
Subject: perfSONAR User Q&A and Other Discussion


[perfsonar-user] perfsonar server load high


  • From: Zhi-Wei Lu <>
  • To: "" <>
  • Subject: [perfsonar-user] perfsonar server load high
  • Date: Mon, 18 Dec 2017 23:48:02 +0000

At today's perfclub meeting, I noticed that throughput from our server to CENIC was terrible. I then noticed that the server's load was as high as 30+. Since there were a few recent perfSONAR-related packages, I rebooted the server; once the system came back up, the load was high again right away. I wonder if anyone has seen a similar problem. In the owamp_bwctl log, I saw entries such as these:

Dec 18 15:25:04 melange owampd[20058]: FILE=owampd.c, LINE=806, Control session terminated abnormally...

Dec 18 15:25:06 melange bwctld[21040]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message

Dec 18 15:25:07 melange bwctld[21142]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message

Dec 18 15:25:09 melange bwctld[21185]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message

Dec 18 15:25:20 melange bwctld[21390]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message

Dec 18 15:25:31 melange owampd[20963]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.

Dec 18 15:25:31 melange owampd[20963]: FILE=owampd.c, LINE=806, Control session terminated abnormally...

Dec 18 15:25:31 melange owampd[20950]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.

Dec 18 15:25:31 melange owampd[20950]: FILE=owampd.c, LINE=806, Control session terminated abnormally...

Dec 18 15:26:08 melange owampd[21585]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.

Dec 18 15:26:08 melange owampd[21585]: FILE=owampd.c, LINE=806, Control session terminated abnormally...

Dec 18 15:26:08 melange owampd[21583]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.

Dec 18 15:26:08 melange owampd[21583]: FILE=owampd.c, LINE=806, Control session terminated abnormally...

Dec 18 15:26:35 melange owampd[22207]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.

Dec 18 15:26:35 melange owampd[22207]: FILE=owampd.c, LINE=806, Control session terminated abnormally...
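To see which of these messages dominate, the log can be tallied by daemon and message text. The following is only a minimal sketch, assuming every line has the same shape as the excerpts above; `LINE_RE`, `summarize`, and `SAMPLE` are hypothetical names for illustration and are not part of perfSONAR.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts above:
# "Dec 18 15:25:04 melange owampd[20058]: FILE=owampd.c, LINE=806, <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+(?P<daemon>\w+)\[\d+\]:\s+"
    r"FILE=(?P<file>[^,]+),\s+LINE=(?P<line>\d+),\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count log lines per (daemon, message); lines that do not match are skipped."""
    counts = Counter()
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match:
            counts[(match.group("daemon"), match.group("msg"))] += 1
    return counts


# A few lines copied from the excerpt above, used only as sample input.
SAMPLE = [
    "Dec 18 15:25:04 melange owampd[20058]: FILE=owampd.c, LINE=806, Control session terminated abnormally...",
    "Dec 18 15:25:06 melange bwctld[21040]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message",
    "Dec 18 15:25:07 melange bwctld[21142]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message",
    "Dec 18 15:25:31 melange owampd[20963]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.",
]

if __name__ == "__main__":
    for (daemon, msg), n in summarize(SAMPLE).most_common():
        print(n, daemon, msg)
```

In practice the input would be the lines of the owamp_bwctl log file itself rather than `SAMPLE`.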

In meshconfig-agent.log, there are many warnings as well:

2017/12/18 15:19:00 (10386) WARN> perfsonar_meshconfig_agent:145 main::__ANON__ - Warned: Use of uninitialized value $address in exists at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Tests/BwctlBase.pm line 384.

2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test trace(128.120.80.74->nautilus.sr.unh.edu), continuing with rest of config: 500 INTERNAL SERVER ERROR: Unable to determine participants: Process took too long to run.

2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(128.120.80.74->tc1-teng8-2.net.ohio-state.edu): 400 BAD REQUEST: Can't find pScheduler or BWCTL on tc1-teng8-2.net.ohio-state.edu

 

2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(tc1-teng8-2.net.ohio-state.edu->128.120.80.74): 400 BAD REQUEST: Can't find pScheduler or BWCTL on tc1-teng8-2.net.ohio-state.edu

 

2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(128.120.80.74->b06sr1-vlan254.tele.iastate.edu): 400 BAD REQUEST: Can't find pScheduler or BWCTL on b06sr1-vlan254.tele.iastate.edu

 

2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(b06sr1-vlan254.tele.iastate.edu->128.120.80.74): 400 BAD REQUEST: Can't find pScheduler or BWCTL on b06sr1-vlan254.tele.iastate.edu

 

2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test rtt(2607:f810:330:1ffe::f->perfsonar-011.net.berkeley.edu): 400 BAD REQUEST: Neither the source nor destination is running pScheduler.

 

2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(128.120.80.74->nautilus.sr.unh.edu): 400 BAD REQUEST: Can't find pScheduler or BWCTL on nautilus.sr.unh.edu

 

There were also errors in clean_esmond_db.log:

query error for metadata_key=bdd71f21372749cf90d63c6544bda3df, event_type=time-error-estimates, summary_type=base, summary_window=0, begin_time=1476263098, end_time=1476349498, error=An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was TTransportException: Could not connect to localhost:9160

Error connecting to remote JMX agent!

java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 127.0.0.1; nested exception is:

        java.net.SocketException: Network is unreachable (connect failed)]

        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)

        at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:268)

        at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:151)

        at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:121)

        at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1276)

Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 127.0.0.1; nested exception is:

        java.net.SocketException: Network is unreachable (connect failed)]

        at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:142)

        at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:204)

        at javax.naming.InitialContext.lookup(InitialContext.java:415)

        at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1928)

        at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1895)

        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)

        ... 4 more

Caused by: java.rmi.ConnectIOException: Exception creating connection to: 127.0.0.1; nested exception is:

        java.net.SocketException: Network is unreachable (connect failed)

        at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:631)

        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)

        at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)

        at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:338)

        at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:112)
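Both errors above are failed connections to the local host (localhost:9160 in the esmond query error, 127.0.0.1 in the JMX trace). A quick way to confirm whether a local TCP port accepts connections is sketched below. This is only a minimal check under stated assumptions: `can_connect` is a hypothetical helper name, and port 9160 is taken from the error message above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a TCP connection; return (True, None) on success, else (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and "Network is unreachable".
        return False, str(exc)


if __name__ == "__main__":
    # Port 9160 comes from the clean_esmond_db.log error quoted above.
    ok, err = can_connect("localhost", 9160)
    print("localhost:9160", "reachable" if ok else "unreachable: " + err)
```

If the check fails with the same "Network is unreachable" text for a loopback address, the state of the loopback interface may be worth looking at before the services themselves.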

 

Please let me know if you know of a solution to this problem. Thank you.

 

Zhi-Wei Lu

IET-CR-Network Operations Center

University of California, Davis

(530) 752-0155

 



