perfsonar-user - RE: [perfsonar-user] Encountered end of file
Subject: perfSONAR User Q&A and Other Discussion
List archive
- From: Louis-Berthier Soullière <>
- To: Louis-Berthier Soullière <>, Mark Feit <>, "" <>
- Subject: RE: [perfsonar-user] Encountered end of file
- Date: Tue, 16 Jul 2019 19:41:32 +0000
- Arc-authentication-results: i=1; mx.microsoft.com 1;spf=pass smtp.mailfrom=ssss.gouv.qc.ca;dmarc=pass action=none header.from=ssss.gouv.qc.ca;dkim=pass header.d=ssss.gouv.qc.ca;arc=none
- Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=fAK3PkzFCe0hYjp7/NxUFNWhCsXb/Rl3j4ZEAR4gbXA=; b=CPN5hvJloDR80FoanyNaLM2jCxQaYtqeD8BCMQK6yhnDg8kRns8W1J/gfxY9fVfec831LnCLRwhe/V/8aFJvjo74Hygkbrep1siKvR+AzvNWjP+mRU0W2LengIWyTmCaG+zf++1BgDp7hpfxMV4wwlOs/AVsrkiBZLciX8AF/5NttCYQFnPn4DlRNTcfqcYV9N1/Xeus8jGtgau8REGuh6I4iBuh9wEnFwlz8VGuySo+XvkXzrUvabp+nRqoYzPJr39XLizmRJ3EepchuCmtaEPwE2squTsX9z+Gfl3yWL587iuNg21+PpYdm7NjDej5GSekU8JniiB9BIiZEDzCbw==
- Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=aWlcDWycMEkWEJTN7OXmrOzL9R9LP/HWW0UiTnKp2DPegVKW81rzcgc444Q0+/bFf+afC+Dv4FBozTHLLqiftdc6oGWcdtnGG37sum4HfokKtHnvrJzFHr1pXyzqP/UL5hbCqIl6MdaPbrz3d2xI5f8rg5MmFGUdteTW77o/lyH5qGaespWy5n8keuqinBZTZk0TChoqwE5MSyjacbVbqMwqLANPxDpYwg+YbXleRn++jZuZUwoE4JyzXDEZJDItb4h/GeoBM7xd2LrVQDKPd5znD+bTt5UyJAoFWqnVvo0vpuD83XxMzBQbrPOlQYk/bg8oc3pta+Cu4khP+QISEQ==
Ok, so I just did the troubleshoot again with the debug activated on the runner.
HOST 10.x.x.1 #pscheduler troubleshoot 10.72.32.28
<content posted in previous email>
HOST 10.x.x.2
# echo "" > /var/log/pscheduler/pscheduler.log # pscheduler debug on runner
# pscheduler monitor
2019-07-16T19:33:07Z Pending idle --duration PT1S (Visible in the debug) 2019-07-16T19:33:08Z Finished idle --duration PT1S (Visible in the debug) 2019-07-16T19:33:48Z Pending simplestream --dest 10.72.32.28 (Not visible in the debug) 2019-07-16T19:33:53Z Missed simplestream --dest 10.72.32.28 (Not visible in the debug)
# pscheduler debug off # cp /var/log/pscheduler/pscheduler.log /var/log/pscheduler/pscheduler2.log # cat /var/log/pscheduler/pscheduler2.log
Jul 16 15:32:24 localhost journal: runner DEBUG Debug started Jul 16 15:32:24 localhost journal: runner DEBUG No workers to run. Jul 16 15:32:24 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:33:02 localhost journal: runner DEBUG Schedule change. Jul 16 15:33:02 localhost journal: runner DEBUG No workers to run. Jul 16 15:33:02 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:33:02 localhost journal: runner DEBUG Schedule change. Jul 16 15:33:02 localhost journal: runner DEBUG Run 311, starts at 2019-07-16 19:33:07+00:00 Jul 16 15:33:02 localhost journal: runner DEBUG 311: Created worker Jul 16 15:33:02 localhost journal: runner DEBUG Dropping wait time to 0:00:05 Jul 16 15:33:02 localhost journal: runner DEBUG Putting runs on deck: [311L] Jul 16 15:33:02 localhost journal: runner DEBUG 311: Thread running Jul 16 15:33:02 localhost journal: runner DEBUG Next run or check in 0:00:05 Jul 16 15:33:02 localhost journal: runner DEBUG Start at 2019-07-16 19:33:07+00:00 Jul 16 15:33:02 localhost journal: runner DEBUG 311: Sleeping 0:00:04.247848 until test start Jul 16 15:33:07 localhost journal: runner INFO 311: Running https://10.x.x.2/pscheduler/tasks/c647ab3f-9de9-4f38-9fde-7db3488c8023/runs/e22547a5-0558-433f-bc29-b59e0a894fff Jul 16 15:33:07 localhost journal: runner INFO 311: With snooze: idle --duration PT1S Jul 16 15:33:07 localhost journal: runner DEBUG Schedule change. Jul 16 15:33:07 localhost journal: runner DEBUG 311: Tool input: {"task-uuid": "c647ab3f-9de9-4f38-9fde-7db3488c8023", "participant": 0, "schedule": {"duration": "PT1S", "start": "2019-07-16T19:33:07Z"}, "participant-data": [{}], "test": {"type": "idle", "spec": {"duration": "PT1S", "schema": 1}}, "limits-passed": [], "schema": 1} Jul 16 15:33:07 localhost journal: runner DEBUG 311: Start time difference is 0:00:00.009052 Jul 16 15:33:07 localhost journal: runner DEBUG No workers to run. Jul 16 15:33:07 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:33:08 localhost journal: runner INFO 311: Run succeeded. Jul 16 15:33:08 localhost journal: runner DEBUG 311: Run returned {"diags": "", "result": {"time-slept": "PT1S"}, "succeeded": true, "error": ""} Jul 16 15:33:08 localhost journal: runner DEBUG 311: Stored local result Jul 16 15:33:08 localhost journal: runner DEBUG 311: Doing lead participant duties Jul 16 15:33:08 localhost journal: runner DEBUG 311: Only one participant; not waiting. Jul 16 15:33:08 localhost journal: runner DEBUG 311: Runs are ['https://10.x.x.2/pscheduler/tasks/c647ab3f-9de9-4f38-9fde-7db3488c8023/runs/e22547a5-0558-433f-bc29-b59e0a894fff'] Jul 16 15:33:08 localhost journal: runner DEBUG 311: Local run returned 0 Jul 16 15:33:08 localhost journal: runner DEBUG 311: Accumulated local result Jul 16 15:33:08 localhost journal: runner DEBUG 311: 0 failures Jul 16 15:33:08 localhost journal: runner DEBUG 311: Merging results: [{u'diags': u'', u'result': {u'time-slept': u'PT1S'}, u'succeeded': True, u'error': u''}] Jul 16 15:33:08 localhost journal: runner DEBUG 311: Merging {"test": {"type": "idle", "spec": {"duration": "PT1S", "schema": 1}}, "results": [{"diags": "", "result": {"time-slept": "PT1S"}, "succeeded": true, "error": ""}]} Jul 16 15:33:08 localhost journal: runner DEBUG Schedule change. Jul 16 15:33:08 localhost journal: runner DEBUG No workers to run. Jul 16 15:33:08 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:33:08 localhost journal: runner DEBUG 311: Merged results: 0 {"duration": "PT1S", "succeeded": true, "schema": 1} Jul 16 15:33:08 localhost journal: runner DEBUG Jul 16 15:33:08 localhost journal: runner DEBUG 311: Full result: [{"diags": "", "result": {"time-slept": "PT1S"}, "succeeded": true, "error": ""}] Jul 16 15:33:08 localhost journal: runner DEBUG 311: Merged result: {"duration": "PT1S", "succeeded": true, "schema": 1} Jul 16 15:33:08 localhost journal: runner DEBUG 311: Run complete Jul 16 15:33:08 localhost journal: runner DEBUG 311: Thread finished Jul 16 15:33:08 localhost journal: runner DEBUG Schedule change. Jul 16 15:33:08 localhost journal: runner DEBUG No workers to run. Jul 16 15:33:08 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:33:45 localhost journal: runner DEBUG Schedule change. Jul 16 15:33:45 localhost journal: runner DEBUG No workers to run. Jul 16 15:33:45 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:33:48 localhost journal: runner DEBUG Schedule change. Jul 16 15:33:48 localhost journal: runner DEBUG No workers to run. Jul 16 15:33:48 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:34:31 localhost journal: runner DEBUG Schedule change. Jul 16 15:34:31 localhost journal: runner DEBUG No workers to run. Jul 16 15:34:31 localhost journal: runner DEBUG Next run or check in 0:01:00 Jul 16 15:34:42 localhost journal: runner DEBUG Debug discontinued
De :
[mailto:] De la part de Louis-Berthier Soullière
Hi Mark,
I restarted the httpd service and it is now working.
But I got another error when doing the troubleshoot (btw, i changed the MTU on the interfaces of both server to 1460)
localhost:
Measuring MTU... 65535 (Local) Looking for pScheduler... OK. Checking clock... OK. Exercising API... Status... Tests... Tools... OK. Idle test.... 4 seconds.... Checking archiving... OK.
10.x.x.1 :
Measuring MTU... 1460+ Looking for pScheduler... OK. Checking clock... OK. Exercising API... Status... Tests... Tools... OK. Idle test.... 1 seconds.... Checking archiving... OK.
localhost and 10.x.x.2 :
Checking IP addresses... IPv4 Measuring MTU... 1460+ Checking timekeeping... OK. Simple stream test.... 9 seconds.... Failed.
Task failed to run properly.
2019-07-16T12:06:32Z on 127.0.0.1 and 10.x.x.2 with simplestreamer:
simplestream --dest 10.x.x.2
Run did not complete: Failed
This run likely failed because the clocks on participants differed by 0:00:01.941681.
Limit system diagnostics for this run:
No limits were applied
Diagnostics from 127.0.0.1: Try 1 failed: Failed to connect: [Errno 111] Connection refused Try 2 failed: Failed to connect: [Errno 111] Connection refused Try 3 failed: Failed to connect: [Errno 111] Connection refused Try 4 failed: Failed to connect: [Errno 111] Connection refused Try 5 failed: Failed to connect: [Errno 111] Connection refused Ran out of time trying to connect.
Error from 127.0.0.1: Failed to connect: [Errno 111] Connection refused
Diagnostics from 10.x.x.1 : No result was produced
Error from 10.x.x.2: No result was produced
============
I synchronized both clock manually before the operation, and both use the same server.
# service ntpd stop
I even tried to switch to chrony because it is suppose to handle the network latency better (satellite), no luck.
On both host :
[]# psc ping localhost localhost: pScheduler is alive
If I try to fetch manually the api version from the host 10.x.x.2, it work :
# curl -ks https://10.x.x.2/pscheduler/api 4
Log of 10.x.x.1 :
Jul 16 08:09:54 perfsonar journal: runner INFO 205654: With simplestreamer: simplestream --dest 10.x.x.1 Jul 16 08:09:55 perfsonar journal: runner INFO 23875: Posted result to https://10.x.x.1/pscheduler/tasks/de96c854-9933-472d-b8f7-c52612c8b2b9/runs/805a53b3-d80d-4b53-bb35-9f3da25b667d Jul 16 08:09:55 perfsonar journal: runner INFO 23877: Posted result to https://10.x.x.1/pscheduler/tasks/b3c3941e-fac9-4d03-9f5a-e359244e2db8/runs/056dcc00-a9e0-4903-95fc-47b50d9cda98 Jul 16 08:09:59 perfsonar journal: runner INFO 205654: Run succeeded. Jul 16 08:10:05 perfsonar journal: runner INFO 22182: Posted result to https://10.x.x.1/pscheduler/tasks/02f151e9-18ae-4370-b7fb-bc29eb263a25/runs/49c23c92-e7da-4d62-8012-2f816a72c467 Jul 16 08:10:05 perfsonar journal: runner INFO 22180: Posted result to https://10.x.x.1/pscheduler/tasks/936ac75d-0020-49be-93f2-2a37743ff900/runs/e1a798c8-5a90-4cc6-b399-661fc6347edf Jul 16 08:10:14 perfsonar journal: runner WARNING 205654: Unable to retrieve run https://10.x.x.2/pscheduler/tasks/bf27fdfe-7c89-4db3-ab34-a5180152b35c/runs/fb8a42c9-f614-4029-a474-d2fea305c0bf
Log of 10.x.x.2 :
Nothing
Louis
De : Mark Feit []
Louis-Berthier Soullière writes:
I'm trying to run this command with no success :
# pscheduler ping 10.x.x.1 10.x.x.1 : Encountered end of file … - Running pS-Toolkit-4.2.0b1-CentOS7-FullInstall-x86_64-2019Jul09.iso on both side
One key difference between the beta and production versions is the introduction of different library to do the HTTP(S) operations. It’s cURL, which has an established track record, and we’ve had it in the test bed for months. The error you’re seeing comes from cURL.
“pscheduler ping 10.x.x.1” does the equivalent of “curl -sk https://10.x.x.1/pscheduler/”. If you can do that on the command line and get the same result, the problem isn’t likely a problem with pScheduler.
- High latency between the two server : 530ms - Tried to simulate latency between two local server and cannot replicate the issue (not a latency problem) - The test work one way but not the other. (Server A to B = OK, B to A = issue)
Is this the same pair of systems you were working with in ticket #884? Your last comment indicates that there’s a MTU mismatch between the two systems, which would cause problems in the larger-to-smaller direction. If you’re on satellite, some providers have been known to do things that make path MTU discovery not work.
Is it possible to disable all limits for testing purpose?
This isn’t a limits-related problem, but you can disable the limit system by deleting or renaming /etc/pscheduler/limits.conf. The change will take effect within 30 seconds.
--Mark
|
- [perfsonar-user] Encountered end of file, Louis-Berthier Soullière, 07/15/2019
- <Possible follow-up(s)>
- Re: [perfsonar-user] Encountered end of file, Mark Feit, 07/15/2019
- RE: [perfsonar-user] Encountered end of file, Louis-Berthier Soullière, 07/16/2019
- RE: [perfsonar-user] Encountered end of file, Louis-Berthier Soullière, 07/16/2019
- Re: [perfsonar-user] Encountered end of file, Mark Feit, 07/16/2019
- RE: [perfsonar-user] Encountered end of file, Louis-Berthier Soullière, 07/19/2019
- RE: [perfsonar-user] Encountered end of file, Louis-Berthier Soullière, 07/16/2019
Archive powered by MHonArc 2.6.19.