
perfsonar-user - Re: [perfsonar-user] more than 2000 threads

Subject: perfSONAR User Q&A and Other Discussion

  • From: Andrew Lake <>
  • To: Marian Babik <>, Pete Siemsen <>
  • Cc: "" <>
  • Subject: Re: [perfsonar-user] more than 2000 threads
  • Date: Wed, 6 Dec 2017 15:01:33 +0100

Hi Pete,

To echo some of what Ivan said, I don’t think a couple hundred powstreams should cause you to exceed any thread ulimits, since all of those processes are single-threaded. Do you have very many httpd processes? I wonder if Apache got stuck for some reason, since it does some threading. What else is on the system? I assume esmond. Does it have MaDDash as well? That could also put some pressure on httpd.

You could also try restarting Apache (you might need a pkill if things are similarly stuck). Unfortunately, you may also need to reboot, because once you start exceeding ulimits it can break all sorts of things in unexpected ways.

Thanks,
Andy
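
One way to check which processes actually hold the threads is to sum the per-process thread count (NLWP) by command name. This is a minimal sketch, assuming a Linux host with the procps `ps`; the helper names are hypothetical and not part of perfSONAR or check_mk.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of perfSONAR or check_mk.

# Read "NLWP COMMAND" lines on stdin and print total threads per command,
# largest first.
sum_threads_by_command() {
    awk '{
        count = $1
        cmd = $0
        sub(/^ *[0-9]+ +/, "", cmd)   # strip the leading thread count
        total[cmd] += count
    } END {
        for (cmd in total) print total[cmd], cmd
    }' | sort -rn
}

# Read the same input and print the overall thread count.
total_threads() {
    awk '{ sum += $1 } END { print sum + 0 }'
}

# nlwp = number of threads in the process, comm = command name.
# The empty "=" headers suppress the header line.
ps -e -o nlwp= -o comm= | sum_threads_by_command | head -n 10
ps -e -o nlwp= -o comm= | total_threads
```

If httpd (or anything other than powstream) dominates the first listing, that is probably where to look first. The second number is the system-wide thread total, which is presumably what the 2000-thread monitoring threshold is compared against.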




On December 6, 2017 at 4:27:06 AM, Marian Babik () wrote:

Hi Pete,
My understanding was that Andy suggested restarting the pscheduler runner just to better understand where exactly to look for the root cause of the high number of threads/processes. Since stopping the pscheduler runner leaves quite a number of powstream processes around, there are indeed runaway processes that are no longer managed. I suspect it’s a bug that will need to be fixed, to be confirmed by Andy or Mark. As far as I can tell, there is no workaround for it at the moment.

Just to check whether this is perhaps OS-specific, are your machines on CentOS 7, still on SL6, or something else?

Thanks,
Marian


> On Dec 5, 2017, at 8:03 PM, Pete Siemsen <> wrote:
>
> We monitor our perfSONAR machines with Nagios check-mk, which by default warns
> if there are more than 2000 threads on a machine. I'm getting that alarm. I
> found an email sequence where Andy suggested stopping/starting powstream. I
> did that but am still seeing "too many" threads. Should I raise the threshold
> above 2000?
>
> Here's what I did to no effect:
>
> perfsonar-1850# /etc/init.d/pscheduler-runner stop
> Stopping pScheduler runner: [ OK ]
> perfsonar-1850# /etc/init.d/pscheduler-runner status
> runner is stopped
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 357
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 354
> (waited 5 minutes)
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 354
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 353
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 352
> perfsonar-1850# pkill -9 -f powstream
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 9
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 9
> perfsonar-1850# /etc/init.d/pscheduler-runner start
> Starting pScheduler runner: [ OK ]
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 197
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 202
> perfsonar-1850# uptime
> 11:53:52 up 3 days, 21:50, 1 user, load average: 1.30, 1.02, 1.12
> (waited an hour)
> perfsonar-1850# ps -ef | grep "/usr/bin/powstream" | wc -l
> 203
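
A note on the counts in the quoted session: a `ps -ef | grep ... | wc -l` pipeline usually counts the grep process itself, so each figure is likely one higher than the real number of powstream processes. A minimal sketch that avoids this is below; `count_powstream` is a hypothetical helper name, and it assumes a `ps` that supports the `args` output field.

```shell
#!/bin/sh
# Sketch only: count_powstream is a hypothetical helper name.

# Read full command lines on stdin and count those containing
# /usr/bin/powstream. The [/] bracket expression matches a literal "/",
# but grep's own command line contains "[/]usr", so it does not match itself.
count_powstream() {
    grep -c '[/]usr/bin/powstream'
}

# grep exits non-zero when the count is 0, hence the "|| true".
ps -e -o args= | count_powstream || true
```

Running this before and after stopping pscheduler-runner should show whether the leftover processes are really powstream instances rather than an artifact of the counting pipeline.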



