
Re: [perfsonar-user] Logstash killing CPU on pS-Toolkit node


  • From: Kathy Benninger <>
  • To: Andrew Lake <>
  • Cc: perfsonar-user <>, Kathy Benninger <>
  • Subject: Re: [perfsonar-user] Logstash killing CPU on pS-Toolkit node
  • Date: Sat, 13 May 2023 17:59:29 -0400

Hi Andy,

The problem with high CPU utilization was resolved with "yum reinstall perfsonar-logstash-output-plugin".

"install" didn't do anything because perfsonar-logstash-output-plugin was already installed and up to date.

When I first tried "yum reinstall perfsonar-logstash-output-plugin", the perfSONAR node sat there thinking for a while and then posted several errors about being unable to reach rubygems, ending with:

ERROR: Installation Aborted, message: Could not fetch specs from https://rubygems.org/ due to underlying error <timed out (https://rubygems.org/specs.4.8.gz)>

After a bit of investigation, our problem seems to have been caused by incorrect routing on that server, which prevented it from reaching rubygems. After that was fixed, the reinstall ran cleanly.
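
In case it helps anyone else hitting the same error, the checks and the fix boiled down to roughly the following (a sketch only; the routing problem and its fix will depend on your own network setup):

  # Confirm the node can reach rubygems.org; the plugin reinstall fetches gem specs from there
  curl -sI https://rubygems.org/specs.4.8.gz | head -n 1

  # Show which route traffic to rubygems.org takes; a bad route here was our culprit
  ip route get "$(getent hosts rubygems.org | awk '{print $1; exit}')"

  # Once connectivity is good, reinstall the plugin and restart logstash
  yum reinstall -y perfsonar-logstash-output-plugin
  systemctl restart logstash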

Thanks for your help!

Kathy


On 5/10/2023 5:48 PM, Andrew Lake wrote:
Hi,

Thanks. If you do a "yum install perfsonar-logstash-output-plugin" or a "yum reinstall perfsonar-logstash-output-plugin", do things get any better?

Andy

On Wed, May 10, 2023, 5:27 PM Kathy Benninger <> wrote:
Hi Andy,

"/var/log/logstash/logstash-plain.log” has many repeating error messages, just since midnight. The error file has

[2023-05-10T00:00:07,364][ERROR][logstash.plugins.registry] Unable to load plugin. {:type=>"output", :name=>"opensearch"}

There are also errors logged from logstash.runner and logstash.agent.
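
For reference, I was poking at it with roughly these commands (the logstash-plugin path is the usual RPM install location; adjust if yours differs):

  # How noisy the log is, and what the most recent errors look like
  grep -c ERROR /var/log/logstash/logstash-plain.log
  grep ERROR /var/log/logstash/logstash-plain.log | tail -n 5

  # Check whether the opensearch output plugin is actually installed
  /usr/share/logstash/bin/logstash-plugin list | grep -i opensearch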
 



On 5/10/2023 10:56 AM, Andrew Lake wrote:
Hi Kathy,

Anything interesting in "/var/log/logstash/logstash-plain.log"? Interesting would include frequent errors or anything like that. Possibly it's doing something unexpected.

Thanks,
Andy 


On May 9, 2023 at 9:37:00 PM, Kathy Benninger () wrote:

Greetings!

I have a bare-metal server that was up to date and running fine with the
version 4 pS-Toolkit. After the 5.0 upgrade, the perfSONAR is reporting high
CPU utilization by logstash. All testing is stopped. (Actually, pscheduler
testing doesn't run, and I believe I also have to fix the limits.conf issue,
though I never modified the limits file.)

The OS is CentOS Linux release 7.9.2009 (Core)

Example utilization:

[benninge@ps100g-10g ~]$ top -H
top - 13:41:53 up  1:55,  2 users,  load average: 3.33, 3.11, 3.10
Threads: 542 total,   6 running, 536 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  2.0 sy, 72.1 ni, 25.5 id,  0.0 wa,  0.0 hi, 0.0 si,  0.0 st
KiB Mem : 47884028 total, 19617196 free, 27611884 used,   654948 buff/cache
KiB Swap: 52428796 total, 52428796 free,        0 used. 19800300 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM   TIME+ COMMAND
 42145 logstash  39  19 3889836 456264  14888 R 91.4  1.0 0:02.76 java
 42158 logstash  39  19 3889836 456264  14888 R 71.2  1.0 0:02.15 C2 CompilerThre
 42153 logstash  39  19 3889836 456264  14888 R 70.2  1.0 0:02.12 C2 CompilerThre
 42154 logstash  39  19 3889836 456264  14888 R 46.4  1.0 0:01.40 C1 CompilerThre
 42147 logstash  39  19 3889836 456264  14888 R  6.6  1.0 0:00.20 CMS Main Thread
 42146 logstash  39  19 3889836 456264  14888 S  1.7  1.0 0:00.05 GC Thread#0
  7325 benninge  20   0  121236   2792   1452 S  0.7  0.0 0:53.08 htop
 40457 benninge  20   0  160652   2712   1568 R  0.7  0.0 0:01.20 top
 42162 logstash  39  19 3889836 456264  14888 S  0.7  1.0 0:00.02 GC Thread#2
     9 root      20   0       0      0      0 S  0.3  0.0 0:11.79 rcu_sched
  1387 opensea+  20   0   26.6g  23.1g  23432 S  0.3 50.7 0:01.52 G1 Service
  1540 opensea+  20   0   26.6g  23.1g  23432 S  0.3 50.7 0:03.36 pa-collectors-t
  4261 opensea+  20   0   26.6g  23.1g  23432 S  0.3 50.7 0:00.25 opensearch[ps10
  6144 opensea+  20   0   26.6g  23.1g  23432 S  0.3 50.7 0:00.02 opensearch[ps10
  3054 root      20   0  638544  15176   5096 S  0.3  0.0 0:03.51 f2b/f.sshd
 24256 root      20   0  638544  15176   5096 S  0.3  0.0 0:02.45 f2b/observer
  3184 perfson+  20   0  313844  27952   5276 S  0.3  0.1 0:33.51 python3
  7148 benninge  20   0  157500   2552   1188 S  0.3  0.0 0:00.72 sshd
 10376 benninge  20   0  157500   2560   1204 S  0.3  0.0 0:00.35 sshd
 42148 logstash  39  19 3889836 456264  14888 S  0.3  1.0 0:00.01 VM Thread
 42161 logstash  39  19 3889836 456264  14888 S  0.3  1.0 0:00.01 GC Thread#1
 42163 logstash  39  19 3889836 456264  14888 S  0.3  1.0 0:00.01 GC Thread#3

I saw that Cas D'Angelo had reported a similar problem on 24-Apr-2023, but I
do not see a resolution. This node was not an archive host, but it was a
standalone host storing its own measurements.
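
A few more things I can check to narrow this down (a sketch only; suggestions welcome):

  # Per-thread CPU view limited to the logstash JVM
  top -H -p "$(pgrep -o -f logstash)"

  # Is the logstash service stuck, or is it crash-looping?
  systemctl status logstash --no-pager
  journalctl -u logstash --since today | tail -n 50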

Thanks for any guidance!

Kathy Benninger
Pittsburgh Supercomputing Center





