perfsonar-user - Re: [perfsonar-user] Node not reporting to Maddash

Subject: perfSONAR User Q&A and Other Discussion

  • From: Raul Lopes <>
  • To: Mark Feit <>, "" <>
  • Subject: Re: [perfsonar-user] Node not reporting to Maddash
  • Date: Wed, 7 Jul 2021 14:54:55 +0000

Hi,

I've rebooted the two nodes and the problem remains:
 - esmond reports success in archiving (see below);
 - I see an exchange between the maddash and the nodes every hour: a small number of packets flow in both directions.
 - no data shows in the dashboard.



Raul

[psremote@ps08-em1 ~]$ pscheduler result --archivings https://ps08-lat.rl.ac.uk/pscheduler/tasks/7aaaf59e-eac9-4b73-aae7-1ce584a24218/runs/2a4131ef-d95b-4cb5-9c6e-d233ca595433

2021-07-07T14:11:19Z on ps08-lat.rl.ac.uk with powstream:

latencybg --source ps08-lat.rl.ac.uk --dest ps18-lat.chobs.rl.ac.uk --packet-
  count 600 --packet-interval 0.1 --data-ports 8760-9960

Packet Statistics
-----------------
Packets Sent ......... 600 packets
Packets Received ..... 600 packets
Packets Lost ......... 0 packets
Packets Duplicated ... 0 packets
Packets Reordered .... 0 packets

One-way Latency Statistics
--------------------------
Delay Median ......... 8.98 ms
Delay Minimum ........ 0.06 ms
Delay Maximum ........ 96.90 ms
Delay Mean ........... 14.00 ms
Delay Mode ........... 0.10 ms
Delay 25th Percentile ... 0.12 ms
Delay 75th Percentile ... 23.06 ms
Delay 95th Percentile ... 41.50 ms
Max Clock Error ...... 0.98 ms
Common Jitter Measurements:
    P95 - P50 ........ 32.53 ms
    P75 - P25 ........ 22.94 ms
    Variance ......... 222.84 ms
    Std Deviation .... 14.93 ms
Histogram:
    0.06 ms: 1 packets
    0.07 ms: 11 packets
    0.08 ms: 22 packets
    0.09 ms: 27 packets
    0.10 ms: 34 packets
    0.11 ms: 31 packets
    0.12 ms: 25 packets
    0.13 ms: 9 packets
    0.14 ms: 3 packets
    0.15 ms: 3 packets


TTL Statistics
--------------
TTL Median ........... 249.00
TTL Minimum .......... 249.00
TTL Maximum .......... 249.00
TTL Mean ............. 249.00
TTL Mode ............. 249.00
TTL 25th Percentile ... 249.00
TTL 75th Percentile ... 249.00
TTL 95th Percentile ... 249.00

..............

Histogram:
    249: 600 packets

Archivings:

  To esmond, Finished
    2021-07-07T15:11:19+01:00 Succeeded
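
Since the archiver reports success but nothing reaches the dashboard, one way to narrow it down might be to ask esmond directly whether it holds recent data for this source/destination pair. The sketch below only builds the query URL. The `/esmond/perfsonar/archive/` path and the `source`, `destination`, `event-type` and `time-range` filters are taken from the esmond REST interface as I understand it; the function name is hypothetical, and which host actually runs the archive is an assumption (the thread does not say), so substitute the right one.

```python
from urllib.parse import urlencode


def esmond_query_url(archive_host, source, destination,
                     event_type="histogram-owdelay", time_range=3600):
    """Build a metadata query URL for an esmond measurement archive.

    archive_host is whichever machine runs esmond (an assumption here).
    time_range is the number of seconds to look back from now.
    """
    params = {
        "source": source,
        "destination": destination,
        "event-type": event_type,
        "time-range": time_range,
        "format": "json",
    }
    return "https://{}/esmond/perfsonar/archive/?{}".format(
        archive_host, urlencode(params))


if __name__ == "__main__":
    # Host names copied from the pscheduler output above; using the
    # source node as the archive host is only a guess.
    print(esmond_query_url("ps08-lat.rl.ac.uk",
                           "ps08-lat.rl.ac.uk",
                           "ps18-lat.chobs.rl.ac.uk"))
```

Fetching that URL with any HTTP client and getting back an empty JSON list would suggest the data never landed in the archive despite the "Succeeded" status; a non-empty list would point at the path between the archive and MaDDash instead.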


From: Raul Lopes <>
Sent: 07 July 2021 11:00
To: Mark Feit <>; <>; Raul Lopes <>
Subject: Re: Node not reporting to Maddash
 
Hi,

I've rebooted one of the nodes that is failing to publish, so that I could watch all the services start. I see strange errors in /var/log/messages:

Jul  7 10:53:10 ps07-em1 journal: ticker WARNING  Queue maintainer got exception server closed the connection unexpectedly
Jul  7 10:53:10 ps07-em1 journal: ticker WARNING  #011This probably means the server terminated abnormally
Jul  7 10:53:10 ps07-em1 journal: ticker WARNING  #011before or while processing the request.
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    Program threw an exception after -1 day, 23:00:11.484495
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    Exception: psycopg2.OperationalError: server closed the connection unexpectedly
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    #011This probably means the server terminated abnormally
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    #011before or while processing the request.
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    Traceback (most recent call last):
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR      File "/usr/lib/python3.6/site-packages/pscheduler/saferun.py", line 76, in safe_run
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR        function()
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR      File "/usr/libexec/pscheduler/daemons/runner", line 986, in <lambda>
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR        pscheduler.safe_run(lambda: main_program())
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR      File "/usr/libexec/pscheduler/daemons/runner", line 879, in main_program
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR        cursor.execute("SELECT heartbeat('runner')")
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    psycopg2.OperationalError: server closed the connection unexpectedly
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    #011This probably means the server terminated abnormally
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    #011before or while processing the request.
Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    Waiting 0.25 seconds before restarting
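
For anyone trying to read the excerpt above: the `#011` sequences are how syslog writes a tab character (octal 011), and the "-1 day, 23:00:11.484495" figure is Python's way of printing a negative time interval. The sketch below is a small helper, not part of pscheduler, that strips the journal prefix, decodes the octal escapes and keeps only the runner's lines. The function names and the prefix pattern are my own and assume the exact line layout shown in this message.

```python
import re
from datetime import timedelta

# "Mon DD HH:MM:SS host journal: " prefix, as laid out in the excerpt.
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+journal:\s?")
# syslog escapes control characters as '#' plus three octal digits.
OCTAL = re.compile(r"#([0-7]{3})")


def clean(line):
    """Drop the syslog prefix and turn #NNN escapes back into characters."""
    body = PREFIX.sub("", line.rstrip("\n"))
    return OCTAL.sub(lambda m: chr(int(m.group(1), 8)), body)


def exception_lines(lines, tag="safe_run/runner ERROR"):
    """Return the text logged under one tag, keeping traceback indentation."""
    out = []
    for line in lines:
        body = clean(line)
        if body.startswith(tag):
            # The logger pads the tag with four spaces; remove only those.
            out.append(re.sub(r"^ {1,4}", "", body[len(tag):]))
    return out


def reported_runtime_seconds():
    """The '-1 day, 23:00:11.484495' interval from the log, in seconds."""
    return timedelta(days=-1, hours=23, seconds=11,
                     microseconds=484495).total_seconds()


if __name__ == "__main__":
    sample = [
        "Jul  7 10:53:10 ps07-em1 journal: safe_run/runner ERROR    "
        "#011before or while processing the request.",
    ]
    print(exception_lines(sample))
    print(reported_runtime_seconds())
```

The interval works out to roughly minus 3588.5 seconds, so the runner believed it had been running for just under minus one hour when the database connection dropped. What that implies is not established in this thread.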


Would anyone have a clue?

Raul

From: <> on behalf of Raul Lopes <>
Sent: 06 July 2021 20:38
To: Mark Feit <>; <>
Subject: Re: [perfsonar-user] Node not reporting to Maddash
 
Hi,

Is this normal?

[root@ps01-em1 PSREMOTE]# systemctl status cassandra.service
● cassandra.service - SYSV: Starts and stops Cassandra
   Loaded: loaded (/etc/rc.d/init.d/cassandra; bad; vendor preset: disabled)
   Active: active (exited) since Tue 2021-07-06 11:50:29 BST; 8h ago
     Docs: man:systemd-sysv-generator(8)
  Process: 25742 ExecStop=/etc/rc.d/init.d/cassandra stop (code=exited, status=1/FAILURE)
  Process: 25811 ExecStart=/etc/rc.d/init.d/cassandra start (code=exited, status=0/SUCCESS)

Jul 06 11:50:28 ps01-em1.rl.ac.uk systemd[1]: cassandra.service: control process exited, code=exited status=1
Jul 06 11:50:28 ps01-em1.rl.ac.uk systemd[1]: Stopped SYSV: Starts and stops Cassandra.
Jul 06 11:50:28 ps01-em1.rl.ac.uk systemd[1]: Unit cassandra.service entered failed state.
Jul 06 11:50:28 ps01-em1.rl.ac.uk systemd[1]: cassandra.service failed.
Jul 06 11:50:28 ps01-em1.rl.ac.uk systemd[1]: Starting SYSV: Starts and stops Cassandra...
Jul 06 11:50:28 ps01-em1.rl.ac.uk su[25820]: (to cassandra) root on none
Jul 06 11:50:29 ps01-em1.rl.ac.uk cassandra[25811]: Starting Cassandra: OK
Jul 06 11:50:29 ps01-em1.rl.ac.uk systemd[1]: Started SYSV: Starts and stops Cassandra.


I assume it is.
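
A note on reading that status: for a SysV init script wrapped by systemd, "active (exited)" generally only says the init script returned success, not that the Cassandra process is still alive. A quick cross-check is to see whether anything is listening on Cassandra's ports. The sketch below does that with a plain TCP connect. Ports 9160 (Thrift) and 9042 (CQL) are Cassandra's usual defaults and are an assumption here; the actual values are in cassandra.yaml on the node. The function name is my own.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: treat all as "not listening".
        return False


if __name__ == "__main__":
    # Assumed default Cassandra ports; check cassandra.yaml for the
    # values actually configured on this host.
    for port in (9160, 9042):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print("port {}: {}".format(port, state))
```

If both ports are closed on the archive host, Cassandra is probably not running despite the "active" status, and its own log would be the next place to look.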

Regards, Raul

From: Mark Feit <>
Sent: 06 July 2021 15:06
To: Raul Lopes <>; <>
Subject: Re: Node not reporting to Maddash
 

Raul Lopes writes:

pscheduler result doesn't give any hint of error.

Archivings:

  To esmond, Finished
    2021-07-06T12:52:52+01:00 Succeeded

That being the case, the Esmond archiver believes it successfully handed the data over to Esmond and there’s an internal problem.  That’s definitely Andy territory.

--Mark

 





