Skip to Content.
Sympa Menu

perfsonar-user - Re: [perfsonar-user] "pscheduler result" raises exception {"result-merged": null}

Subject: perfSONAR User Q&A and Other Discussion

List archive

Re: [perfsonar-user] "pscheduler result" raises exception {"result-merged": null}


Chronological Thread 
  • From: Brian Candler <>
  • To: "" <>
  • Subject: Re: [perfsonar-user] "pscheduler result" raises exception {"result-merged": null}
  • Date: Fri, 6 Sep 2019 11:40:45 +0100
  • Domainkey-signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:subject:to :references:message-id:date:mime-version:in-reply-to :content-type:content-transfer-encoding; q=dns; s=sasl; b=FiaxuX zqoRcxVzOLGfPHaeKKsaQOhN6+ITQVYaCUpGPJjpOR/4uUDXnHnzMaBG/w0y9rS/ KEBgtZOnSzDEmL8uBs/D6cb2LKjw2YdAIgyccrBX+aGOI+zHS3KOGVKbH5S44cSE pwJwbdrSOiQwMuw8zVT1PdYH0JkziFR6FSx3k=

I am making slight progress, although I'm having to do far more reverse-engineering than I'd hope.  To summarise, the current symptoms are:

- throughput tasks submitted at the command line (pscheduler task ...) display results for both outbound and inbound tests, in the CLI response.

- but scheduled tasks (created in the GUI) are storing results for outbound, but not inbound.  Specifically, the toolkit front page list of tests shows "n/a" for the inbound throughput; and the graph page shows an outbound line but no inbound line.

Here's what I believe I have discovered.

1. for inbound tests (i.e. --source <remotenode> --dest <localnode>), the "lead participant" is the remote node

2. the lead node is the only one which stores the result_merged value in the database.  This means that

    pscheduler result https://localhost/pscheduler/tasks/<tuuid>/runs/<ruuid>
complains that result-merged is missing, but

    pscheduler result https://<remotenode>/pscheduler/tasks/<tuuid>/runs/<ruuid>
shows the full result

That explains my 'pscheduler result' problem: I should have been asking the lead node.

3. This implies to me that the archiver, which writes the results to esmond, runs on the lead node - which in this case is the remote node

This in turn implies that either esmond on the local node must be fully open to writes from the whole Internet (ouch!); or that some sort of access token is given to the remote node to grant it permission to write to the local node.

Fortunately, I have full control of both the local node (Ubuntu 18.04 / perfsonar-toolkit 4.2.0) and the remote node (Ubuntu 16.04 / perfsonar-toolkit 4.1.6) so was able to turn on debugging at the far end.

Debugging on the remote node says archiving was successful:

Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG    Debug started
Sep  6 10:49:53 ns1 archiver-esmond/archive message repeated 2 times: [ DEBUG    Debug started]
Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG Archiver received: {u'last-attempt': None, u'attempts': 0, u'result':...
Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG Found metadata key d7bd3b9b8f28474b97a730a49364ccb4 in cache for https://localhost/pscheduler/tasks/b28773ff-3c0e-4b5d-99e5-cf9a1fc143f8@https://[A:A:A:A::45]/esmond/perfsonar/archive/, so running in fast mode
Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG Found metadata key d7bd3b9b8f28474b97a730a49364ccb4 in cache for https://localhost/pscheduler/tasks/b28773ff-3c0e-4b5d-99e5-cf9a1fc143f8@https://[A:A:A:A::45]/esmond/perfsonar/archive/, so running in fast mode
Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG fast_mode is True
Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG fast_mode is True
Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG Putting data to https://[A:A:A:A::45]/esmond/perfsonar/archive/d7bd3b9b8f28474b97a730a49364ccb4/: {'data': [{'ts': 1567763364, 'val': [{'val': 206006572.70301065, 'event-type': 'throughput'}, {'val': 637, 'event-type': 'packet-retransmits'}, ...snip... {'val': {'href': 'https://localhost/pscheduler/tasks/b28773ff-3c0e-4b5d-99e5-cf9a1fc143f8/runs/94b5a7a6-5c85-4c0c-b369-f6325f153f2d'}, 'event-type': 'pscheduler-run-href'}]}]}
Sep  6 10:49:53 ns1 archiver-esmond/archive DEBUG Putting data to https://[A:A:A:A::45]/esmond/perfsonar/archive/d7bd3b9b8f28474b97a730a49364ccb4/: {'data': ...snip...
Sep  6 10:49:53 ns1 archiver DEBUG    174540: Returned JSON from archiver: {u'succeeded': True}
Sep  6 10:49:53 ns1 archiver DEBUG    174540: Succeeded: 94b5a7a6-5c85-4c0c-b369-f6325f153f2d to esmond
Sep  6 10:49:53 ns1 archiver DEBUG    174540: Thread finished
Sep  6 10:49:53 ns1 archiver DEBUG    Notifications: archiving_change
Sep  6 10:49:53 ns1 archiver DEBUG    Nothing to archive; finding time until next archiving.
Sep  6 10:49:53 ns1 archiver DEBUG    Nothing in the future.
Sep  6 10:49:53 ns1 archiver DEBUG    Waiting 15.0 for change or notification

(Aside: it's weird that those lines are logged twice, apart from "Archiver received". Those messages come from /usr/lib/pscheduler/classes/archiver/esmond/archive)

But the specified archive on the local node appears to be empty: event-types is empty list.

root@perf1:~# curl -sk https://localhost/esmond/perfsonar/archive/d7bd3b9b8f28474b97a730a49364ccb4/
{"url":"https://localhost/esmond/perfsonar/archive/d7bd3b9b8f28474b97a730a49364ccb4/","metadata-key":"d7bd3b9b8f28474b97a730a49364ccb4","subject-type":"point-to-point","event-types":[],"uri":"/esmond/perfsonar/archive/d7bd3b9b8f28474b97a730a49364ccb4/"}

On the local node, /var/log/apache2/access.log shows success (status code 201) for the write:

B:B:B:B:feff:ff:fe00:f16 - - [06/Sep/2019:09:49:53 +0000] "PUT /esmond/perfsonar/archive/d7bd3b9b8f28474b97a730a49364ccb4/ HTTP/1.1" 201 1558 "-" "python-requests/2.9.1"

At this point I'm pretty stumped; I don't know how to go about debugging esmond internals.  /var/log/esmond/esmond.log is zero bytes.

There *is* one other thing that might be related.  The local node has no reverse DNS for its IPv6 address; I haven't gotten round to getting it delegated yet.  Hence you can see the URL generated with https://[ipaddress]/.  But that URL is valid, and in any case, if it couldn't be contacted, the archiver would have reported an error.  And of course, I see the successful access log.

To try to rule that out, I've just added an /etc/hosts entry on the remote host.

Regards,

Brian Candler.




Archive powered by MHonArc 2.6.19.

Top of Page