While at today's perfclub meeting, I noticed that our server had terrible throughput to CENIC. I then noticed that the server's load was as high as 30+, apparently since a few recent perfSONAR-related package updates. I rebooted the server, but
once the system came back, the load was high again right away. I wonder if anyone has seen a similar problem. In the owamp_bwctl log, I saw entries such as these:
Dec 18 15:25:04 melange owampd[20058]: FILE=owampd.c, LINE=806, Control session terminated abnormally...
Dec 18 15:25:06 melange bwctld[21040]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message
Dec 18 15:25:07 melange bwctld[21142]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message
Dec 18 15:25:09 melange bwctld[21185]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message
Dec 18 15:25:20 melange bwctld[21390]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message
Dec 18 15:25:31 melange owampd[20963]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.
Dec 18 15:25:31 melange owampd[20963]: FILE=owampd.c, LINE=806, Control session terminated abnormally...
Dec 18 15:25:31 melange owampd[20950]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.
Dec 18 15:25:31 melange owampd[20950]: FILE=owampd.c, LINE=806, Control session terminated abnormally...
Dec 18 15:26:08 melange owampd[21585]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.
Dec 18 15:26:08 melange owampd[21585]: FILE=owampd.c, LINE=806, Control session terminated abnormally...
Dec 18 15:26:08 melange owampd[21583]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.
Dec 18 15:26:08 melange owampd[21583]: FILE=owampd.c, LINE=806, Control session terminated abnormally...
Dec 18 15:26:35 melange owampd[22207]: FILE=protocol.c, LINE=1900, _OWPWriteStopSessions: called in wrong state.
Dec 18 15:26:35 melange owampd[22207]: FILE=owampd.c, LINE=806, Control session terminated abnormally...
…
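To see which messages dominate an excerpt like the one above, the lines can be tallied per daemon and message. This is only a minimal sketch: the regular expression assumes the syslog-style layout shown above, and the names `LINE_RE`, `tally`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, based on the excerpt above:
#   "Mon DD HH:MM:SS host daemon[pid]: FILE=..., LINE=..., message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+(?P<daemon>\w+)\[\d+\]:\s+"
    r"FILE=(?P<file>[^,]+),\s+LINE=(?P<line>\d+),\s+(?P<msg>.*)$"
)

def tally(lines):
    """Count log lines grouped by (daemon, message); skip lines that do not match."""
    counts = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m:
            counts[(m["daemon"], m["msg"])] += 1
    return counts

# Three lines copied from the excerpt above, used only as sample input.
SAMPLE = [
    "Dec 18 15:25:04 melange owampd[20058]: FILE=owampd.c, LINE=806, Control session terminated abnormally...",
    "Dec 18 15:25:06 melange bwctld[21040]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message",
    "Dec 18 15:25:07 melange bwctld[21142]: FILE=sapi.c, LINE=391, BWLControlAccept(): Unable to read ClientGreeting message",
]

if __name__ == "__main__":
    for (daemon, msg), n in tally(SAMPLE).most_common():
        print(f"{n:4d}  {daemon}: {msg}")
```

To run it against the real log, pass an open file object to `tally` instead of `SAMPLE`.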
In meshconfig-agent.log, there are many warnings as well:
2017/12/18 15:19:00 (10386) WARN> perfsonar_meshconfig_agent:145 main::__ANON__ - Warned: Use of uninitialized value $address in exists at /usr/lib/perfsonar/bin/../lib/perfSONAR_PS/RegularTesting/Tests/BwctlBase.pm line 384.
2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem adding test trace(128.120.80.74->nautilus.sr.unh.edu), continuing with rest of config: 500 INTERNAL SERVER ERROR: Unable to determine participants: Process took too long to run.
2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(128.120.80.74->tc1-teng8-2.net.ohio-state.edu): 400 BAD REQUEST: Can't find pScheduler or BWCTL on tc1-teng8-2.net.ohio-state.edu
2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(tc1-teng8-2.net.ohio-state.edu->128.120.80.74): 400 BAD REQUEST: Can't find pScheduler or BWCTL on tc1-teng8-2.net.ohio-state.edu
2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(128.120.80.74->b06sr1-vlan254.tele.iastate.edu): 400 BAD REQUEST: Can't find pScheduler or BWCTL on b06sr1-vlan254.tele.iastate.edu
2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(b06sr1-vlan254.tele.iastate.edu->128.120.80.74): 400 BAD REQUEST: Can't find pScheduler or BWCTL on b06sr1-vlan254.tele.iastate.edu
2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test rtt(2607:f810:330:1ffe::f->perfsonar-011.net.berkeley.edu): 400 BAD REQUEST: Neither the source nor destination is running pScheduler.
2017/12/18 15:25:24 (10386) WARN> perfsonar_meshconfig_agent:430 main:: - Problem determining which pscheduler to submit test to for creation, skipping test throughput(128.120.80.74->nautilus.sr.unh.edu): 400 BAD REQUEST: Can't find pScheduler or BWCTL on nautilus.sr.unh.edu
There were also errors in clean_esmond_db.log:
…
query error for metadata_key=bdd71f21372749cf90d63c6544bda3df, event_type=time-error-estimates, summary_type=base, summary_window=0, begin_time=1476263098, end_time=1476349498, error=An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was TTransportException: Could not connect to localhost:9160
Error connecting to remote JMX agent!
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 127.0.0.1; nested exception is:
java.net.SocketException: Network is unreachable (connect failed)]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:370)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:268)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:151)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:121)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1276)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 127.0.0.1; nested exception is:
java.net.SocketException: Network is unreachable (connect failed)]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:142)
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:204)
at javax.naming.InitialContext.lookup(InitialContext.java:415)
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1928)
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1895)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287)
... 4 more
Caused by: java.rmi.ConnectIOException: Exception creating connection to: 127.0.0.1; nested exception is:
java.net.SocketException: Network is unreachable (connect failed)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:631)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:338)
at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:112)
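The errors above amount to failed TCP connections to the local host (localhost:9160 in the query error, 127.0.0.1 in the Java trace). A quick way to recheck this from the server itself is a plain connection attempt. This is only a minimal sketch: the helper name `can_connect` is made up for illustration, and port 9160 is simply the one named in the log above.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return (True, "") if a TCP connection succeeds, else (False, reason)."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers refused connections, unreachable networks, and timeouts.
        return False, str(exc)

if __name__ == "__main__":
    # Port taken from the "Could not connect to localhost:9160" message above.
    ok, reason = can_connect("127.0.0.1", 9160)
    print("127.0.0.1:9160 reachable" if ok else f"127.0.0.1:9160 failed: {reason}")
```

A "Network is unreachable" reason for 127.0.0.1 would point at the loopback interface rather than at the database process, while "Connection refused" would suggest that nothing is listening on that port.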
Please let me know if you know of a solution to this problem. Thank you.
Zhi-Wei Lu
IET-CR-Network Operations Center
University of California, Davis
(530) 752-0155