
perfsonar-user - Re: [perfsonar-user] runaway powstream processes

Subject: perfSONAR User Q&A and Other Discussion

  • From: Andrew Lake
  • To: Marian Babik
  • Subject: Re: [perfsonar-user] runaway powstream processes
  • Date: Thu, 30 Nov 2017 09:41:02 -0500

Hi Marian,

What happens if you stop pscheduler-runner (/etc/init.d/pscheduler-runner stop)? It might take a minute, but all powstream processes should go away once you do that. If not, you should be able to kill them manually (pkill -9 -f powstream). Once all the powstreams are gone, start the runner again (/etc/init.d/pscheduler-runner start): does the incorrect number of processes reappear? I'm trying to determine whether too many tests are being scheduled, which is the likely cause if you still have too many processes after restarting pscheduler-runner. If you don't have too many processes after the restart, it may be that something didn't get killed properly.
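
The stop, check, kill, start sequence described above could be scripted roughly as follows. This is only a sketch: wait_for_zero, count_powstream and restart_and_check are hypothetical helper names, and the 60-attempt wait is an assumed value. The init script path and the pkill command are the ones quoted in the message.

```shell
#!/bin/sh
# Sketch only: helper names and the 60-attempt limit are assumptions,
# not part of perfSONAR or pscheduler.

# Run a counting command once per second until it prints 0.
# $1 = maximum number of attempts, remaining arguments = command to run.
wait_for_zero() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        n=$("$@")
        if [ "$n" -eq 0 ]; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then sleep 1; fi
    done
    return 1
}

# Count processes whose command line mentions powstream.
# The [p] bracket keeps grep from matching its own command line.
count_powstream() {
    ps -eo args | grep -c '[p]owstream' || true
}

restart_and_check() {
    /etc/init.d/pscheduler-runner stop
    # Wait about a minute for powstream to exit, then kill any leftovers.
    wait_for_zero 60 count_powstream || pkill -9 -f powstream
    /etc/init.d/pscheduler-runner start
}
```

After restart_and_check, running count_powstream again a few minutes later shows whether the process count climbs back above what the configured tests should produce.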

Thanks,
Andy


On November 30, 2017 at 7:40:46 AM, Marian Babik wrote:

Hi,
Following the 4.0.2 upgrade, a latency node (SLC6) that is configured to run 128 one-directional latency tests, and that was rebooted recently (uptime of just 22 hours), gives me:
[root@perfsonar-lt ~]# ps waxxxu | grep "powstream/run" | wc -l
496
[root@perfsonar-lt ~]# ps waxxxu | grep powstream | wc -l
1236

I think I should expect 128 powstream/run processes and about 4x that number of powstream processes, right? So this looks way off.
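
One caveat with the two counts above: grep powstream also matches the powstream/run lines and the grep process itself, so the second number includes the first. A minimal sketch that reports the two kinds of process separately is shown below. It assumes the command lines are fed in on stdin, and summarize_powstream is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: reads process command lines on stdin (for example from "ps -eo args")
# and prints how many contain "powstream/run" and how many contain
# "/usr/bin/powstream". Each line is counted at most once.
summarize_powstream() {
    awk '
        /powstream\/run/        { run++; next }
        /\/usr\/bin\/powstream/ { bin++ }
        END { printf "run=%d bin=%d\n", run, bin }
    '
}
```

Typical use would be ps -eo args | summarize_powstream, and the run= figure is the one to compare against the 128 configured tests.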

The box has 16G RAM and is under considerable memory pressure:
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep "powstream" | awk '{cpu += $1; rss += $2} END {print rss}'
9276304
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep "cassandra" | awk '{cpu += $1; rss += $2} END {print rss}'
4685812
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep httpd | awk '{cpu += $1; rss += $2} END {print rss}'
1258452
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep httpd24 | awk '{cpu += $1; rss += $2} END {print rss}'
1057600
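
ps reports rss in kilobytes, so the powstream total above is roughly 8.8 GiB of the 16G available. Note also that grep httpd matches the httpd24 lines too, so those two totals overlap. A small sketch that sums RSS for one substring and prints it in GiB could look like this. rss_total_gib is a hypothetical helper name, and it assumes input lines in the form produced by ps -A --no-headers -o rss,command, with rss as the first column.

```shell
#!/bin/sh
# Sketch: sum the RSS (first column, in KiB) of every input line that
# contains the fixed substring given as $1, and print the total in GiB.
rss_total_gib() {
    awk -v pat="$1" '
        index($0, pat) { kib += $1 }
        END { printf "%s %.2f GiB\n", pat, kib / 1048576 }
    '
}
```

To keep the awk process itself out of the numbers, the ps output can be captured first, for example snap=$(ps -A --no-headers -o rss,command) followed by printf '%s\n' "$snap" | rss_total_gib powstream.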

Trying to understand what’s going on gives me (we have separate meshes for IPv4 and IPv6, so the reported ratio is about right):
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -6 " | wc -l
144
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | wc -l
597

I should have just one source, since only one direction is tested:
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -6 " | awk '{print $23}' | sort | uniq -c
142 perfsonar-lt
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | awk '{print $23}' | sort | uniq -c
3 <IP of perfsonar-lt>
591 perfsonar-lt
3 [perfs-wig-lt]:861
So this looks good (perfs-wig-lt is configured manually). For destinations I get:

[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | awk '{print $24}' | sort | uniq | wc -l
100
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -6 " | awk '{print $24}' | sort | uniq | wc -l
28
so 128 is correct.

However, looking for counts per destination:
# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | awk '{print $24}' | sort | uniq -c | sort -k 1 --numeric | tail
gives me something like this:
16 [host1]:861
16 [host2]:861
16 [host3]:861
16 [host4]:861
...
19 [host5]:861
29 [host6]:861
30 [host7 (btw. @aglt2)]:861

Running ps -ef | grep <host7> then shows 30 powstream processes (/usr/bin/powstream), all with start times (STIME) within a 3-4 minute window.
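
To pick out only the destinations with more processes than the rest, the per-destination counts above can be filtered against a limit. This is a sketch under assumptions: over_threshold is a hypothetical helper name, the limit is whatever count is considered normal (16 in the listing above), and the input is one destination per line, such as the $24 column extracted by the earlier pipeline.

```shell
#!/bin/sh
# Sketch: read one destination per line on stdin, count occurrences,
# and print "count destination" for every destination seen more than $1 times.
over_threshold() {
    sort | uniq -c | awk -v max="$1" '$1 > max { print $1, $2 }'
}
```

For example, ps -ef | grep "/usr/bin/powstream" | grep " -4 " | awk '{print $24}' | over_threshold 16 would list only destinations such as host5, host6 and host7 from the output above.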

Any hints as to what’s going on?

Thanks,
Marian









