perfsonar-user - Re: [perfsonar-user] perfSONAR host stop report the test results

Subject: perfSONAR User Q&A and Other Discussion

  • From: Pedro Reis <>
  • To:
  • Subject: Re: [perfsonar-user] perfSONAR host stop report the test results
  • Date: Mon, 12 Dec 2016 15:38:29 +0000
  • Organization: FCT | FCCN

Hello all,

I would like to report a similar situation.

I have two toolkits that send their measurements to a single archive.
Last Saturday, around 8 AM, the MA had a minor problem and stopped
processing data (I had to reset the Python environment again).

Since I fixed the MA, the regulartesting.log on both toolkits has been
full of messages like this:
2016/12/12 15:11:02 (42410) WARN> regulartesting.pl:103 main::__ANON__ -
Warned: IPC::DirQueue: killed stale lockfile:
/var/lib/perfsonar/regulartesting/esmond_latency_<MA_FQDN_HERE>/active/active/50.20161212035623328938.EMjQ4Mw
at /usr/share/perl5/IPC/DirQueue.pm line 519.
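One way to tell whether these warnings concern old, backed-up jobs is to pull the job timestamp out of the lockfile name and compare it with the log time. Below is a minimal Python sketch, assuming each warning sits on a single line in the real log (the mail client wrapped it here) and that the job file name follows the pattern PRIORITY.YYYYMMDDhhmmss plus six digits of microseconds, then a suffix. Both examples in this thread appear to match that pattern, but it is an assumption about IPC::DirQueue, not documented behaviour. The function name `parse_stale_lock` is hypothetical and not part of perfSONAR.

```python
import re
from datetime import datetime

# Assumption: each warning is a single line in the actual regulartesting.log.
LOG_TIME_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
STALE_RE = re.compile(r"killed stale lockfile:\s+(\S+)")
# Assumption: job names look like PRIORITY.YYYYMMDDhhmmss<6-digit microseconds>.SUFFIX
JOB_RE = re.compile(r"^(\d+)\.(\d{14})(\d{6})\.\S+$")


def parse_stale_lock(line):
    """Return a dict describing one stale-lockfile warning, or None if it doesn't match."""
    t = LOG_TIME_RE.match(line)
    s = STALE_RE.search(line)
    if not (t and s):
        return None
    parts = s.group(1).split("/")
    j = JOB_RE.match(parts[-1])
    if not j:
        return None
    log_time = datetime.strptime(t.group(1), "%Y/%m/%d %H:%M:%S")
    job_time = datetime.strptime(j.group(2), "%Y%m%d%H%M%S").replace(
        microsecond=int(j.group(3)))
    # Treat the directory right after "regulartesting" as the queue (MA) name.
    queue = None
    if "regulartesting" in parts:
        idx = parts.index("regulartesting")
        if idx + 1 < len(parts):
            queue = parts[idx + 1]
    return {
        "queue": queue,
        "priority": int(j.group(1)),
        "job_time": job_time,
        # Naive subtraction: assumes both timestamps use the same local clock.
        "age": log_time - job_time,
    }


if __name__ == "__main__":
    import sys
    # Usage: python parse_stale.py < /path/to/regulartesting.log
    for raw in sys.stdin:
        info = parse_stale_lock(raw)
        if info:
            print(info["queue"], info["job_time"], "age:", info["age"])
```

For the warning quoted above, this reports a job roughly 11 hours older than the log entry; whether that means the queue is replaying pre-fix jobs depends on the filename assumption holding.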

And the regulartesting process is using almost all of the CPU:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 1425 perfsona  20   0  237m 112m 1244 R 99.8  0.7  2:29.82 regulartesting.
  935 cassandr  20   0 21.6g 1.4g  16m S 29.0  9.4  3:20.16 java
 2505 apache    20   0  673m  46m 4712 S 11.2  0.3  0:03.48 httpd
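To keep an eye on CPU usage without an interactive session, the process table can be captured with `top -b -n 1` (batch mode, one iteration) and filtered. The minimal Python sketch below parses a snapshot like the one above and lists processes at or above a CPU threshold. It assumes the input has already been trimmed to the process-table header and rows; the 90% threshold and the function names `parse_top` and `busy` are arbitrary choices for illustration.

```python
# Snapshot copied from the message above (process-table header plus rows).
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1425 perfsona 20 0 237m 112m 1244 R 99.8 0.7 2:29.82 regulartesting.
935 cassandr 20 0 21.6g 1.4g 16m S 29.0 9.4 3:20.16 java
2505 apache 20 0 673m 46m 4712 S 11.2 0.3 0:03.48 httpd
"""


def parse_top(text):
    """Turn a top process table into a list of dicts keyed by column name."""
    lines = [l for l in text.splitlines() if l.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


def busy(rows, threshold=90.0):
    """Return (PID, COMMAND, %CPU) for rows at or above the threshold."""
    return [(r["PID"], r["COMMAND"], float(r["%CPU"]))
            for r in rows if float(r["%CPU"]) >= threshold]


if __name__ == "__main__":
    print(busy(parse_top(SAMPLE)))
```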


I'm hoping regulartesting is just working through the data from the
measurements that failed while the MA was down, because I looked for
other errors in the other logs and didn't find anything relevant.
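If regulartesting really is working through a backlog, the number of files in its queue directory should shrink over time. Here is a minimal Python sketch for checking that, assuming the queue lives under the directory named in the warning (`/var/lib/perfsonar/regulartesting/esmond_latency_<MA_FQDN_HERE>`) and that its subdirectories hold roughly one file per queued or active job. That layout is an assumption about IPC::DirQueue, not something confirmed in this thread, and `queue_file_counts` is a hypothetical helper.

```python
import os


def queue_file_counts(queue_root):
    """Count regular files (recursively) under each immediate subdirectory of queue_root."""
    counts = {}
    for entry in sorted(os.scandir(queue_root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for _dirpath, _dirnames, filenames in os.walk(entry.path):
                total += len(filenames)
            counts[entry.name] = total
    return counts


if __name__ == "__main__":
    import sys
    import time
    # Hypothetical usage: pass the esmond_latency_<MA_FQDN_HERE> directory from the log.
    root = sys.argv[1]
    for _ in range(3):
        print(time.strftime("%H:%M:%S"), queue_file_counts(root))
        time.sleep(60)
```

Steady or growing counts across samples would suggest the queue is not draining, which would argue against the backlog explanation.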

As of today, I'm still not seeing any new measurements at the MA :(
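A quick way to confirm from the outside whether the MA has stored anything recently is to query esmond's REST archive for latency metadata over a short window. This minimal Python sketch assumes the esmond perfSONAR archive endpoint `/esmond/perfsonar/archive/` with the `event-type` and `time-range` (seconds) filters described in the esmond client API documentation; check them against the esmond version you run. The host `ma.example.org` and the helper names are placeholders, and interpreting an empty result as "nothing stored in that window" is itself an assumption.

```python
import json
import urllib.parse
import urllib.request


def archive_query_url(ma_host, event_type="histogram-owdelay", time_range=3600,
                      scheme="http"):
    """Build an esmond archive query URL (endpoint and filters assumed, see above)."""
    params = urllib.parse.urlencode({"event-type": event_type,
                                     "time-range": time_range})
    return f"{scheme}://{ma_host}/esmond/perfsonar/archive/?{params}"


def recent_metadata(ma_host, **kwargs):
    """Fetch and decode the JSON metadata list returned by the archive."""
    url = archive_query_url(ma_host, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder host; substitute your MA's FQDN.
    items = recent_metadata("ma.example.org")
    print(len(items), "metadata entries with data in the last hour")
```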

Com meus melhores cumprimentos | Best Regards
Pedro Reis
Área de Serviços de Rede | Network Services Area
FCT|FCCN
Av. do Brasil, n.º 101
1700-066 Lisboa - Portugal
Telefone|Phone +351 218 440 100; Fax +351 218 472 167
www.fccn.pt

On 2016/06/19 18:15, Lixin Liu wrote:
> It appears the problem is gone; I'm not sure how, but test results are
> available again.
>
> Thanks,
>
> Lixin.
>
> On 2016-06-18, 10:14 PM, "Lixin Liu" <> on behalf of <> wrote:
>
> Hi,
>
> One of my latency hosts stopped reporting test results sometime early
> today.
> I see the load on the process
>
> perfSONAR Regular Testing: Measurement Archive:
> esmond_latency_localhost
>
> is always at 100%, and regulartesting.log keeps showing errors like this:
>
> 2016/06/18 21:48:23 (2656) WARN> regulartesting.pl:103 main::__ANON__ -
> Warned: IPC::DirQueue: killed stale lockfile:
> /var/lib/perfsonar/regulartesting/esmond_latency_localhost/active/active/50.20160618092329343238.EMjI4OQ
> at /usr/share/perl5/IPC/DirQueue.pm line 519.
> 2016/06/18 21:48:24 (2656) WARN> regulartesting.pl:103 main::__ANON__ -
> Warned: IPC::DirQueue: killed stale lockfile:
> /var/lib/perfsonar/regulartesting/esmond_latency_localhost/active/active/50.20160618092329700337.EMjI5MQ
> at /usr/share/perl5/IPC/DirQueue.pm line 519.
>
> I hope someone can help me figure out what needs to be done to resolve
> this problem. The hostname of the machine is lat-usask.westgrid.ca.
>
> Thanks,
>
> Lixin Liu
> Compute Canada & WestGrid
