perfsonar-user - Re: [perfsonar-user] perfSONAR host stop report the test results

Subject: perfSONAR User Q&A and Other Discussion

  • From: Pedro Reis
  • To: "Garnizov, Ivan (RRZE)"
  • Subject: Re: [perfsonar-user] perfSONAR host stop report the test results
  • Date: Wed, 14 Dec 2016 08:29:24 +0000
  • Organization: FCT | FCCN

Hello Ivan, All,

Thanks for the info.
The system eventually recovered after a few hours processing the stale
lockfiles.
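
For anyone watching a similar recovery, a minimal sketch for tallying the "killed stale lockfile" warnings per queue directory. It assumes the log lines look like the ones quoted further down in this thread; `count_stale_lockfiles` is a hypothetical helper name, not part of perfSONAR.

```python
import re
from collections import Counter

# Pattern based on the warning text quoted in this thread. The queue name is
# assumed to be the first path component under
# /var/lib/perfsonar/regulartesting/, as in those log lines.
STALE_RE = re.compile(
    r"killed stale lockfile:\s*/var/lib/perfsonar/regulartesting/([^/\s]+)/"
)


def count_stale_lockfiles(log_text):
    """Return a Counter mapping queue directory name -> number of warnings."""
    return Counter(STALE_RE.findall(log_text))
```

Read the contents of regulartesting.log into a string and pass it to the function. Running it a few minutes apart gives a rough idea of whether warnings are still being produced for a given queue.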

Now, I was expecting to see the results in the MA, but I'm getting a big
blank. On both toolkits the readings are there, but it looks like they
didn't manage to (re)send them to the MA, or the MA didn't process the
information correctly :(
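
One rough way to check whether results are still waiting on the toolkit side is to count the files left under each queue directory in /var/lib/perfsonar/regulartesting/. The sketch below assumes that leftover files correspond to results not yet sent to the MA, which this thread does not confirm; `queue_backlog` is a hypothetical helper name.

```python
import os


def queue_backlog(base_dir):
    """Return {queue directory name: number of files found beneath it}."""
    backlog = {}
    for entry in sorted(os.listdir(base_dir)):
        queue_path = os.path.join(base_dir, entry)
        if not os.path.isdir(queue_path):
            continue  # only per-queue directories are of interest
        # Walk the whole subtree so nested layouts such as active/active/
        # are included in the count.
        backlog[entry] = sum(len(files) for _, _, files in os.walk(queue_path))
    return backlog
```

Calling `queue_backlog("/var/lib/perfsonar/regulartesting")` as a user with read access returns one count per queue directory. Counts that shrink over time would suggest the backlog is draining.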

Com meus melhores cumprimentos | Best Regards
Pedro Reis
Área de Serviços de Rede | Network Services Area
FCT|FCCN
Av. do Brasil, n.º 101
1700-066 Lisboa - Portugal
Telefone|Phone +351 218 440 100; Fax +351 218 472 167
www.fccn.pt

On 2016/12/13 09:53, Garnizov, Ivan (RRZE) wrote:
> Hi Pedro,
>
> Generally, all the scheduled tests on the toolkit generate files in the
> /var/lib/perfsonar/regulartesting/ folder.
> I would suggest first suspending the regulartesting service for a while,
> then stopping all the active processes that try to write to that folder
> and (optionally) cleaning up after them.
> A very good option for you could be to just restart the server, but make
> a note of the folders currently present in /var/lib/perfsonar/regulartesting/
> You might want to clean those up afterwards. You will not have easy options
> to do this after the restart.
>
> Regards,
> Ivan Garnizov
>
> /GEANT SA1T2: pS deployments GN Operations/
> /GEANT SA2T3: pS development team/
> /GEANT SA3T5: eduPERT team/
>
>
>
> -----Original Message-----
> On Behalf Of Pedro Reis
> Sent: Monday, 12 December 2016 16:38
> Subject: Re: [perfsonar-user] perfSONAR host stop report the test results
>
> Hello all,
>
> I would like to report a similar situation.
>
> I have 2 toolkits that are sending their measurements to a single archive.
> Last Saturday around 8 AM the MA had a slight problem and stopped
> processing information (I had to set up the Python environment again).
>
> Since I fixed the MA, the two toolkits have had their regulartesting.log
> full of messages like this:
> 2016/12/12 15:11:02 (42410) WARN> regulartesting.pl:103 main::__ANON__ -
> Warned: IPC::DirQueue: killed stale lockfile:
> /var/lib/perfsonar/regulartesting/esmond_latency_<MA_FQDN_HERE>/active/active/50.20161212035623328938.EMjQ4Mw
> at /usr/share/perl5/IPC/DirQueue.pm line 519.
>
> And the regulartesting process is using up almost all the CPU:
> PID  USER     PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
> 1425 perfsona 20 0  237m  112m 1244 R 99.8 0.7  2:29.82 regulartesting.
> 935  cassandr 20 0  21.6g 1.4g 16m  S 29.0 9.4  3:20.16 java
> 2505 apache   20 0  673m  46m  4712 S 11.2 0.3  0:03.48 httpd
>
>
> I'm hoping the regular testing is just processing the data from the
> failed connections/measurements from when the MA was down,
> because I looked for other errors/problems in the other logs and didn't
> find anything relevant!
>
> So far I'm still not seeing any new measurements at the MA :(
>
> Com meus melhores cumprimentos | Best Regards
> Pedro Reis
> Área de Serviços de Rede | Network Services Area
> FCT|FCCN
> Av. do Brasil, n.º 101
> 1700-066 Lisboa - Portugal
> Telefone|Phone +351 218 440 100; Fax +351 218 472 167
> www.fccn.pt <http://www.fccn.pt>
>
> On 2016/06/19 18:15, Lixin Liu wrote:
>> It appears the problem is gone. I'm not sure how, but test results are
>> available again.
>>
>> Thanks,
>>
>> Lixin.
>>
>> On 2016-06-18, 10:14 PM, "Lixin Liu" wrote:
>>
>> Hi,
>>
>> One of my latency hosts stopped reporting test results starting sometime
>> early today.
>> I see the load on the process
>>
>> perfSONAR Regular Testing: Measurement Archive:
>> esmond_latency_localhost
>>
>> is always at 100% and regulartesting.log continues showing errors like
>> this:
>>
>> 2016/06/18 21:48:23 (2656) WARN> regulartesting.pl:103 main::__ANON__ -
>> Warned: IPC::DirQueue: killed stale lockfile:
>> /var/lib/perfsonar/regulartesting/esmond_latency_localhost/active/active/50.20160618092329343238.EMjI4OQ
>> at /usr/share/perl5/IPC/DirQueue.pm line 519.
>> 2016/06/18 21:48:24 (2656) WARN> regulartesting.pl:103 main::__ANON__ -
>> Warned: IPC::DirQueue: killed stale lockfile:
>> /var/lib/perfsonar/regulartesting/esmond_latency_localhost/active/active/50.20160618092329700337.EMjI5MQ
>> at /usr/share/perl5/IPC/DirQueue.pm line 519.
>>
>> I hope someone could help me to figure out what needs to be done to
>> resolve this
>> problem. The hostname of the machine is lat-usask.westgrid.ca.
>>
>> Thanks,
>>
>> Lixin Liu
>> Compute Canada & WestGrid
