perfsonar-user - [perfsonar-user] runaway powstream processes

Subject: perfSONAR User Q&A and Other Discussion

  • From: Marian Babik <>
  • Subject: [perfsonar-user] runaway powstream processes
  • Date: Thu, 30 Nov 2017 12:40:08 +0000

Hi,
following the 4.0.2 upgrade, a latency node (SLC6) that was recently rebooted
(uptime of just 22 hours) and is configured to run 128 one-directional latency
tests gives me:
[root@perfsonar-lt ~]# ps waxxxu | grep "powstream/run" | wc -l
496
[root@perfsonar-lt ~]# ps waxxxu | grep powstream | wc -l
1236

I think I should expect 128 powstream/run processes and about 4x that number
of powstream processes, right? So this looks way off.
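
One caveat on the counts above: "ps | grep PATTERN" can also match the grep process itself, so the totals may be slightly inflated. A minimal sketch of a counter that reads ps output on stdin and skips grep lines follows; the function name count_procs is hypothetical and not part of perfSONAR.

```shell
# count_procs PATTERN
# Count lines on stdin that contain the fixed string PATTERN,
# ignoring lines that belong to a grep process.
count_procs() {
    grep -F -- "$1" | grep -v -c -E '(^| )grep ' || true
}

# Usage (same ps invocation as in the message):
#   ps waxxxu | count_procs "powstream/run"
#   ps waxxxu | count_procs "powstream"
```

The trailing "|| true" only keeps the function's exit status at zero when nothing matches; the printed count is still 0 in that case.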

The box has 16 GB of RAM and is under considerable memory pressure:
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep "powstream" | awk '{cpu += $1; rss += $2} END {print rss}'
9276304
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep "cassandra" | awk '{cpu += $1; rss += $2} END {print rss}'
4685812
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep httpd | awk '{cpu += $1; rss += $2} END {print rss}'
1258452
[root@perfsonar-lt ~]# ps -A --no-headers -o pcpu,rss,command | grep httpd24 | awk '{cpu += $1; rss += $2} END {print rss}'
1057600
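
For scale: ps reports rss in KiB, so the powstream total above (9276304) is roughly 8.85 GiB and the cassandra total (4685812) roughly 4.47 GiB. A minimal sketch of a helper that does this conversion directly, assuming input in the form produced by "ps -A --no-headers -o rss,command"; the name rss_gib is hypothetical. The pattern is passed through the environment rather than on the awk command line, so the awk process itself does not match it.

```shell
# rss_gib PATTERN
# Read "rss command..." lines on stdin (rss in KiB) and print the
# summed RSS, in GiB, of all lines containing the fixed string PATTERN.
rss_gib() {
    PAT="$1" awk '
        index($0, ENVIRON["PAT"]) { kib += $1 }
        END { printf "%.2f\n", kib / 1048576 }
    '
}

# Usage:
#   ps -A --no-headers -o rss,command | rss_gib powstream
#   ps -A --no-headers -o rss,command | rss_gib cassandra
```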

Trying to understand what's going on, I get the following (we have separate
meshes for IPv4 and IPv6, so the reported ratio is about right):
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -6 " | wc -l
144
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | wc -l
597

I should have just one source, since only one direction is tested:
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -6 " | awk '{print $23}' | sort | uniq -c
142 perfsonar-lt
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | awk '{print $23}' | sort | uniq -c
3 <IP of perfsonar-lt>
591 perfsonar-lt
3 [perfs-wig-lt]:861
So this looks good (perfs-wig-lt is configured manually). For destinations I
get:

[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | awk '{print $24}' | sort | uniq | wc -l
100
[root@perfsonar-lt ~]# ps -ef | grep "/usr/bin/powstream" | grep " -6 " | awk '{print $24}' | sort | uniq | wc -l
28
So 100 + 28 = 128 unique destinations, which is correct.

However, counting processes per destination:
# ps -ef | grep "/usr/bin/powstream" | grep " -4 " | awk '{print $24}' | sort | uniq -c | sort -k 1 --numeric | tail
gives me something like this:
16 [host1]:861
16 [host2]:861
16 [host3]:861
16 [host4]:861
...
19 [host5]:861
29 [host6]:861
30 [host7 (btw. @aglt2)]:861

Running "ps -ef | grep <host7>" then shows 30 powstream processes
(/usr/bin/powstream), all with a start time (STIME) within 3-4 minutes.
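
To list only the destinations that exceed the common per-destination count, something like the following could be used. This is a sketch under assumptions: the helper name busy_dests is hypothetical, field 24 is simply the column used in the commands above, and the limit of 16 is taken from the most common count in the output shown, not from any documented perfSONAR value.

```shell
# busy_dests FIELD LIMIT
# Read ps lines on stdin, tally whitespace-separated field number FIELD,
# and print "count value" for every value seen more than LIMIT times,
# sorted by ascending count.
busy_dests() {
    awk -v f="$1" -v limit="$2" '
        NF >= f { seen[$f]++ }
        END { for (d in seen) if (seen[d] > limit) print seen[d], d }
    ' | sort -k1,1n -k2
}

# Usage (field and limit are assumptions, see above):
#   ps -ef | grep "/usr/bin/powstream" | grep " -4 " | busy_dests 24 16
```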

Any hints as to what's going on?

Thanks,
Marian






