Skip to Content.
Sympa Menu

perfsonar-user - RE: [perfsonar-user] Encountered end of file

Subject: perfSONAR User Q&A and Other Discussion

List archive

RE: [perfsonar-user] Encountered end of file


Chronological Thread 
  • From: Louis-Berthier Soullière <>
  • To: Louis-Berthier Soullière <>, Mark Feit <>, "" <>
  • Subject: RE: [perfsonar-user] Encountered end of file
  • Date: Tue, 16 Jul 2019 19:41:32 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1;spf=pass smtp.mailfrom=ssss.gouv.qc.ca;dmarc=pass action=none header.from=ssss.gouv.qc.ca;dkim=pass header.d=ssss.gouv.qc.ca;arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=fAK3PkzFCe0hYjp7/NxUFNWhCsXb/Rl3j4ZEAR4gbXA=; b=CPN5hvJloDR80FoanyNaLM2jCxQaYtqeD8BCMQK6yhnDg8kRns8W1J/gfxY9fVfec831LnCLRwhe/V/8aFJvjo74Hygkbrep1siKvR+AzvNWjP+mRU0W2LengIWyTmCaG+zf++1BgDp7hpfxMV4wwlOs/AVsrkiBZLciX8AF/5NttCYQFnPn4DlRNTcfqcYV9N1/Xeus8jGtgau8REGuh6I4iBuh9wEnFwlz8VGuySo+XvkXzrUvabp+nRqoYzPJr39XLizmRJ3EepchuCmtaEPwE2squTsX9z+Gfl3yWL587iuNg21+PpYdm7NjDej5GSekU8JniiB9BIiZEDzCbw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=aWlcDWycMEkWEJTN7OXmrOzL9R9LP/HWW0UiTnKp2DPegVKW81rzcgc444Q0+/bFf+afC+Dv4FBozTHLLqiftdc6oGWcdtnGG37sum4HfokKtHnvrJzFHr1pXyzqP/UL5hbCqIl6MdaPbrz3d2xI5f8rg5MmFGUdteTW77o/lyH5qGaespWy5n8keuqinBZTZk0TChoqwE5MSyjacbVbqMwqLANPxDpYwg+YbXleRn++jZuZUwoE4JyzXDEZJDItb4h/GeoBM7xd2LrVQDKPd5znD+bTt5UyJAoFWqnVvo0vpuD83XxMzBQbrPOlQYk/bg8oc3pta+Cu4khP+QISEQ==

Ok, so I just did the troubleshoot again with the debug activated on the runner.

 

HOST 10.x.x.1

#pscheduler troubleshoot 10.72.32.28

 

<content posted in previous email>

 

HOST 10.x.x.2

 

# echo "" > /var/log/pscheduler/pscheduler.log

# pscheduler debug on runner

 

# pscheduler monitor

 

2019-07-16T19:33:07Z      Pending      idle --duration PT1S                             (Visible in the debug)

2019-07-16T19:33:08Z      Finished     idle --duration PT1S                              (Visible in the debug)

2019-07-16T19:33:48Z      Pending      simplestream --dest 10.72.32.28    (Not visible in the debug)

2019-07-16T19:33:53Z      Missed       simplestream --dest 10.72.32.28     (Not visible in the debug)

 

# pscheduler debug off

# cp /var/log/pscheduler/pscheduler.log /var/log/pscheduler/pscheduler2.log

# cat /var/log/pscheduler/pscheduler2.log

 

Jul 16 15:32:24 localhost journal: runner DEBUG    Debug started

Jul 16 15:32:24 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:32:24 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:33:02 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:33:02 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:33:02 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:33:02 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:33:02 localhost journal: runner DEBUG    Run 311, starts at 2019-07-16 19:33:07+00:00

Jul 16 15:33:02 localhost journal: runner DEBUG    311: Created worker

Jul 16 15:33:02 localhost journal: runner DEBUG    Dropping wait time to 0:00:05

Jul 16 15:33:02 localhost journal: runner DEBUG    Putting runs on deck: [311L]

Jul 16 15:33:02 localhost journal: runner DEBUG    311: Thread running

Jul 16 15:33:02 localhost journal: runner DEBUG    Next run or check in 0:00:05

Jul 16 15:33:02 localhost journal: runner DEBUG    Start at 2019-07-16 19:33:07+00:00

Jul 16 15:33:02 localhost journal: runner DEBUG    311: Sleeping 0:00:04.247848 until test start

Jul 16 15:33:07 localhost journal: runner INFO     311: Running https://10.x.x.2/pscheduler/tasks/c647ab3f-9de9-4f38-9fde-7db3488c8023/runs/e22547a5-0558-433f-bc29-b59e0a894fff

Jul 16 15:33:07 localhost journal: runner INFO     311: With snooze: idle --duration PT1S

Jul 16 15:33:07 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:33:07 localhost journal: runner DEBUG    311: Tool input: {"task-uuid": "c647ab3f-9de9-4f38-9fde-7db3488c8023", "participant": 0, "schedule": {"duration": "PT1S", "start": "2019-07-16T19:33:07Z"}, "participant-data": [{}], "test": {"type": "idle", "spec": {"duration": "PT1S", "schema": 1}}, "limits-passed": [], "schema": 1}

Jul 16 15:33:07 localhost journal: runner DEBUG    311: Start time difference is 0:00:00.009052

Jul 16 15:33:07 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:33:07 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:33:08 localhost journal: runner INFO     311: Run succeeded.

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Run returned {"diags": "", "result": {"time-slept": "PT1S"}, "succeeded": true, "error": ""}

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Stored local result

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Doing lead participant duties

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Only one participant; not waiting.

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Runs are ['https://10.x.x.2/pscheduler/tasks/c647ab3f-9de9-4f38-9fde-7db3488c8023/runs/e22547a5-0558-433f-bc29-b59e0a894fff']

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Local run returned 0

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Accumulated local result

Jul 16 15:33:08 localhost journal: runner DEBUG    311: 0 failures

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Merging results: [{u'diags': u'', u'result': {u'time-slept': u'PT1S'}, u'succeeded': True, u'error': u''}]

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Merging {"test": {"type": "idle", "spec": {"duration": "PT1S", "schema": 1}}, "results": [{"diags": "", "result": {"time-slept": "PT1S"}, "succeeded": true, "error": ""}]}

Jul 16 15:33:08 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:33:08 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:33:08 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Merged results: 0 {"duration": "PT1S", "succeeded": true, "schema": 1}

Jul 16 15:33:08 localhost journal: runner DEBUG

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Full result: [{"diags": "", "result": {"time-slept": "PT1S"}, "succeeded": true, "error": ""}]

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Merged result: {"duration": "PT1S", "succeeded": true, "schema": 1}

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Run complete

Jul 16 15:33:08 localhost journal: runner DEBUG    311: Thread finished

Jul 16 15:33:08 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:33:08 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:33:08 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:33:45 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:33:45 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:33:45 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:33:48 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:33:48 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:33:48 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:34:31 localhost journal: runner DEBUG    Schedule change.

Jul 16 15:34:31 localhost journal: runner DEBUG    No workers to run.

Jul 16 15:34:31 localhost journal: runner DEBUG    Next run or check in 0:01:00

Jul 16 15:34:42 localhost journal: runner DEBUG    Debug discontinued

 

De : [mailto:] De la part de Louis-Berthier Soullière
Envoyé : 16 juillet 2019 08:42
À : Mark Feit <>;
Objet : RE: [perfsonar-user] Encountered end of file

 

Hi Mark,

 

I restarted the httpd service and it is now working.

 

But I got another error when doing the troubleshoot (btw, i changed the MTU on the interfaces of both server to 1460)

 

localhost:

 

  Measuring MTU... 65535 (Local)

  Looking for pScheduler... OK.

  Checking clock... OK.

  Exercising API... Status... Tests... Tools... OK.

  Idle test.... 4 seconds.... Checking archiving... OK.

 

10.x.x.1 :

 

  Measuring MTU... 1460+

  Looking for pScheduler... OK.

  Checking clock... OK.

  Exercising API... Status... Tests... Tools... OK.

  Idle test.... 1 seconds.... Checking archiving... OK.

 

localhost and 10.x.x.2 :

 

  Checking IP addresses... IPv4

  Measuring MTU... 1460+

  Checking timekeeping... OK.

  Simple stream test.... 9 seconds.... Failed.

 

Task failed to run properly.

 

2019-07-16T12:06:32Z on 127.0.0.1 and 10.x.x.2 with simplestreamer:

 

simplestream --dest 10.x.x.2

 

Run did not complete: Failed

 

This run likely failed because the clocks on participants differed by 0:00:01.941681.

 

Limit system diagnostics for this run:

 

  No limits were applied

 

Diagnostics from 127.0.0.1:

  Try 1 failed: Failed to connect: [Errno 111] Connection refused

  Try 2 failed: Failed to connect: [Errno 111] Connection refused

  Try 3 failed: Failed to connect: [Errno 111] Connection refused

  Try 4 failed: Failed to connect: [Errno 111] Connection refused

  Try 5 failed: Failed to connect: [Errno 111] Connection refused

  Ran out of time trying to connect.

 

Error from 127.0.0.1:

  Failed to connect: [Errno 111] Connection refused

 

Diagnostics from 10.x.x.1 :

  No result was produced

 

Error from 10.x.x.2:

  No result was produced

 

============

 

I synchronized both clock manually before the operation, and both use the same server.

 

# service ntpd stop
# ntpdate ntp.rtss.qc.ca
# service ntpd start

 

I even tried to switch to chrony because it is suppose to handle the network latency better (satellite), no luck.

 

On both host :

 

[]# psc ping localhost

localhost: pScheduler is alive

 

If I try to fetch manually the api version from the host 10.x.x.2, it work :

 

# curl -ks https://10.x.x.2/pscheduler/api

4

 

Log of 10.x.x.1 :

 

Jul 16 08:09:54 perfsonar journal: runner INFO     205654: With simplestreamer: simplestream --dest 10.x.x.1

Jul 16 08:09:55 perfsonar journal: runner INFO     23875: Posted result to https://10.x.x.1/pscheduler/tasks/de96c854-9933-472d-b8f7-c52612c8b2b9/runs/805a53b3-d80d-4b53-bb35-9f3da25b667d

Jul 16 08:09:55 perfsonar journal: runner INFO     23877: Posted result to https://10.x.x.1/pscheduler/tasks/b3c3941e-fac9-4d03-9f5a-e359244e2db8/runs/056dcc00-a9e0-4903-95fc-47b50d9cda98

Jul 16 08:09:59 perfsonar journal: runner INFO     205654: Run succeeded.

Jul 16 08:10:05 perfsonar journal: runner INFO     22182: Posted result to https://10.x.x.1/pscheduler/tasks/02f151e9-18ae-4370-b7fb-bc29eb263a25/runs/49c23c92-e7da-4d62-8012-2f816a72c467

Jul 16 08:10:05 perfsonar journal: runner INFO     22180: Posted result to https://10.x.x.1/pscheduler/tasks/936ac75d-0020-49be-93f2-2a37743ff900/runs/e1a798c8-5a90-4cc6-b399-661fc6347edf

Jul 16 08:10:14 perfsonar journal: runner WARNING  205654: Unable to retrieve run https://10.x.x.2/pscheduler/tasks/bf27fdfe-7c89-4db3-ab34-a5180152b35c/runs/fb8a42c9-f614-4029-a474-d2fea305c0bf

 

Log of 10.x.x.2 :

 

Nothing

 

Louis

 

De : Mark Feit []
Envoyé : 15 juillet 2019 18:57
À : Louis-Berthier Soullière <>;
Objet : Re: [perfsonar-user] Encountered end of file

 

Louis-Berthier Soullière writes:

 

I'm trying to run this command with no success :

 

# pscheduler ping 10.x.x.1

10.x.x.1 : Encountered end of file

- Running pS-Toolkit-4.2.0b1-CentOS7-FullInstall-x86_64-2019Jul09.iso on both side

 

One key difference between the beta and production versions is the introduction of different library to do the HTTP(S) operations.  It’s cURL, which has an established track record, and we’ve had it in the test bed for months.  The error you’re seeing comes from cURL.

 

“pscheduler ping 10.x.x.1” does the equivalent of “curl -sk https://10.x.x.1/pscheduler/”.  If you can do that on the command line and get the same result, the problem isn’t likely a problem with pScheduler.

 

- High latency between the two server : 530ms

- Tried to simulate latency between two local server and cannot replicate the issue (not a latency problem)

- The test work one way but not the other. (Server A to B = OK, B to A = issue)

 

Is this the same pair of systems you were working with in ticket #884?  Your last comment indicates that there’s a MTU mismatch between the two systems, which would cause problems in the larger-to-smaller direction.  If you’re on satellite, some providers have been known to do things that make path MTU discovery not work.

 

Is it possible to disable all limits for testing purpose?

 

This isn’t a limits-related problem, but you can disable the limit system by deleting or renaming /etc/pscheduler/limits.conf.  The change will take effect within 30 seconds.

 

--Mark

 




Archive powered by MHonArc 2.6.19.

Top of Page