Skip to Content.
Sympa Menu

ndt-users - Re: failed middlebox testing

Subject: ndt-users list created

List archive

Re: failed middlebox testing


Chronological Thread 
  • From: Clayton Keller <>
  • To:
  • Subject: Re: failed middlebox testing
  • Date: Thu, 01 Sep 2005 14:03:47 -0500

Richard Carlson wrote:
Hi Clay;

One other thing. What happens if you run the web100srv process as a normal user (not root). This should skip the packet_pair monitoring step. I think the server will still choke on a sysctl command, but it should finish the c2s speed test before it crashes. This would help isolate where the problem is.

Rich
At 11:40 AM 9/1/2005, Clayton Keller wrote:

Richard Carlson wrote:

Hi Clay;

[snip snip snip]
Here are the current processes after I have restarted them:

root 8370 1 96 14:20 pts/1 00:00:01 /usr/local/sbin/web100srv -a -l /var/log/web100/web100srv.log
root 8377 1 0 14:20 pts/1 00:00:00 /usr/local/sbin/fakewww -l /var/log/web100/fakewww.log

I have configured iptables to allow connections on tcp - dest. ports 3001, 3002, 3003, and 7123.

When configuring web100 in the kernel (2.6.12.5 - kernel.org ), I have the following configured:

--- IP: Web100 networking enhancements
[*] Web100: Extended TCP statistics
(384) Web100:
Default file permissions (0) Web100: Default gid
[*] Web100: Net100 extensions
[*] Web100: Netlink event notification service

GID 0 is root.

File permissions for /usr/local/sbin/web100srv rwxr-xr-x root.root
All files in /usr/local/ndt are root.root with the exception of tcpbw100.html which is root.users. All files in this folder are rw-r--r--

The files in /usr/local/lib are all root.root as well, including the libpcap.a file that was compiled prior to installation of ndt-3.1.4a.


All of this sounds and looks correct. And just to be clear, the client sees the middlebox test end and starts the "client to server" test.
This test fails with the "Server failed: ..." message.

While running the test I went ahead and did a packet capture as well. The following information is being passed on the connection to port 3003:

ip.web.100.server;ip.client.doing.test;1456;-1;-1;


This is the result of the middlebox test. The IP addresses, MSS value and the window scale values. Port 3003 is then closed and re-opened.

I also see SYN, ACK, and ACK FIN traffic passing on 3001 and 3002.

I still am seeing the 'Go' flag error. I thank you for all the help thus far, and am curious what ideas you have as far as proceeding further with this.


No problem, thanks for putting up with my buggy code. .-)
OK, what is the last thing you seen on port 3001. This is the control channel and it should be sending a message back to the client after the middlebox test ends. Right now I'm simply reusing an old buffer so the string should be the port numbers "3002 3003". You should see this message twice. Once before the middlebox test and again after. If you only see it once, then the server is probably failing to initialize the Ethernet interface (the libpcap stuff). You can try forcing libpcap to use a specific interface with the -i flag. So if you are have 1 network interface then the option "-ieth0" should be added to the command line.
It may also help to see where the debug messages stop. Turn on a couple layers (at least 2) and let me know what the last message before the sig 11 comes. You might need to redirect the stderr output to a file or the messages may scroll off the screen. I'm looking for what comes after the ""C2S test calling init_pkttrace() with pd=" message.
Rich

Clay


------------------------------------


Rich,

You are correct, the Middlebox test completes, and when the client to server test starts is when we get the 'Go' failure:

TCP/Web100 Network Diagnostic Tool v5.3.3e
click START to begin
Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client to server) . . . . . Server failed: 'Go' flag not received

I've gathered ouput when specifying the interface for libpcap and without. Both are returning similar results.

Here is the output with the following command up to the SIGNAL 11's:

# ./web100srv -dd -a -ieth0 -l /var/log/web100/web100srv.log &> output.txt

# cat output.txt | more
ANL/Internet2 NDT ver 3.1.4
Variables file = /usr/local/ndt/web100_variables
log file = /var/log/web100/web100srv.log
Debug level set to 2
server ready on port 3001
web100_init() read 69 variables from file
Signal 17 received from process 9709
successfully locked '/tmp/view.string' for updating
sending '0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,' to tmp file
socket info src=ip.web.100.srvr:3001, dst=ip.pc.runnig.test:1683
listening for Inet connection on sock2, fd=3
server ports 3002 3003
listening for Inet connection on sock3, fd=5
Middlebox test, Port 3003 waiting for incoming connection
Set MSS to 536, Window size set to 16777216KB
socket info src=ip.web.100.srvr:3003, dst=ip.pc.running.test:1684
Sent 'GO' signal, waiting for incoming connection on sock2
C2S test calling init_pkttrace() with pd=0x95060002
Opening network interface 'eth0' for packet-pair timing
pcap_open_live() returned pointer 0x808c620
installing pkt filter for 'host ip.pc.running.test and port 1685'
Initial pkt src data = 0
Signal 11 received from process 9715

And here is the output when not specifying the ethernet device along with it's command up to the SIGNAL 11's:

# ./web100srv -dd -a -l /var/log/web100/web100srv.log &> output2.txt

# cat output2.txt | more
ANL/Internet2 NDT ver 3.1.4
Variables file = /usr/local/ndt/web100_variables
log file = /var/log/web100/web100srv.log
Debug level set to 2
server ready on port 3001
web100_init() read 69 variables from file
Signal 17 received from process 9720
successfully locked '/tmp/view.string' for updating
sending '0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,' to tmp file
socket info src=ip.web.100.srvr:3001, dst=ip.pc.running.test:1689
listening for Inet connection on sock2, fd=3
server ports 3002 3003
listening for Inet connection on sock3, fd=5
Middlebox test, Port 3003 waiting for incoming connection
Set MSS to 536, Window size set to 16777216KB
socket info src=ip.web.100.srvr:3003, dst=ip.pc.running.test:1690
Sent 'GO' signal, waiting for incoming connection on sock2
C2S test calling init_pkttrace() with pd=0x9b060002
Opening network interface 'eth0' for packet-pair timing
pcap_open_live() returned pointer 0x808c828
installing pkt filter for 'host ip.pc.running.test and port 1691'
Initial pkt src data = 0
Signal 11 received from process 9724

During a packet capture, I see another 3003 packet that looks to be the port reopening like you said after the Middlebox test is ran. There is then this information passed from port 3001 before and immediately after the Middlebox test:

03002 30033002 3003

And this is as you described. After the Middlebox test and after this is passed on 3001 there looks to be a connection attempt on 3002. Which from looking at the port information is where we are seeing the installing pkt filter in the debug output...

At this point the signal 11's hit, and I manually shutdown the web100srv with a ctrl-c.

Let me know if you need me to gather any more information on this for you. I hope this information is helping you as well as we continue to look into this.

Clay





------------------------------------

Rich,

tcpdump is working...it is version 3.8.

There were however two versions of libpcap installed. It appears that 0.8.3 is installed during the initial OS install. I did install libpcap-0.9.3 from source. I went ahead and removed the RPM package of 0.8.3 that had been installed by Fedora during the installation. After doing so, I recompiled NDT-3.1.4a with only libpcap-0.9.3 installed from source with libpcap.a installed in the dir above and the other files installed in the following locations:

/usr/local/lib/libpcap.a
/usr/local/include/pcap.h
/usr/local/include/pcap-bpf.h
/usr/local/include/pcap-namedb.h

Here is a debug with only libpcap-0.9.3 installed and a recompile of NDT-3.1.4a:

# ./web100srv -dd -i lo -a -l /var/log/web100/web100srv.log &> output3.txt

# cat output3.txt | more
ANL/Internet2 NDT ver 3.1.4
Variables file = /usr/local/ndt/web100_variables
log file = /var/log/web100/web100srv.log
Debug level set to 2
server ready on port 3001
web100_init() read 69 variables from file
Signal 17 received from process 16226
successfully locked '/tmp/view.string' for updating
sending '0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,' to tmp file
socket info src=ip.web.100.srvr:3001, dst=ip.pc.running.test:2448
listening for Inet connection on sock2, fd=3
server ports 3002 3003
listening for Inet connection on sock3, fd=5
Middlebox test, Port 3003 waiting for incoming connection
Set MSS to 536, Window size set to 16777216KB
socket info src=ip.web.100.srvr:3003, dst=ip.pc.running.test:2449
Sent 'GO' signal, waiting for incoming connection on sock2
C2S test calling init_pkttrace() with pd=0x92090002
Opening network interface 'lo' for packet-pair timing
pcap_open_live() returned pointer 0x808c620
installing pkt filter for 'host ip.pc.running.test and port 2450'
Initial pkt src data = 0
Signal 11 received from process 16232

This looks to be identical results as what we had before using eth0 and when both libpcaps were installed.

This is running on an IBM 306. The network card as picked up by the stock Fedora Kernel was using e1000. When compiling 2.6.12.5 from kernel.org sources I also compiled in this driver. The card is an integrated gigabit ethernet card. Here is a line from /var/log/dmesg loading the driver for both eth0 and eth1 (eth1 is disabled at this time):

e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection

I went ahead and did a "make uninstall" with the source compiled libpcap, and re-installed the RPM package via yum. It installed libpcap-0.8.3-13.FC4.i386. The following files were installed:

/usr/lib/libpcap.so.0
/usr/lib/libpcap.so.0.8.3
/usr/lib/libpcap.so
/usr/lib/libpcap.a
/usr/lib/libpcap.so.0.8
/usr/include/pcap-namedb.h
/usr/include/pcap.h
/usr/include/pcap-bpf.h

After doing so, I was able to run the test successfully using the following:

# ./web100srv -dd -i eth0 -a -l /var/log/web100/web100srv.log &> output3.txt

# cat output3.txt
ANL/Internet2 NDT ver 3.1.4
Variables file = /usr/local/ndt/web100_variables
log file = /var/log/web100/web100srv.log
Debug level set to 2
server ready on port 3001
web100_init() read 69 variables from file
Initial counter Values Totalcnt = 0, Total Mismatch = 0, Total Bad Cables = 0
Updated counter values Totalcnt = 1, Total Mismatch = 0, Total Bad Cables = 0
Individual counts = [0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0]
Initial counter Values Totalcnt = 1, Total Mismatch = 0, Total Bad Cables = 0
Updated counter values Totalcnt = 2, Total Mismatch = 0, Total Bad Cables = 0
Individual counts = [0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0]
Signal 17 received from process 20783
successfully locked '/tmp/view.string' for updating
sending '82289,80763,95569,95490,2,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,Sep 1 13:48:11,Sep 1 13:48:11' to tmp file
socket info src=ip.web.100.srvr:3001, dst=ip.pc.running.test:2528
listening for Inet connection on sock2, fd=3
server ports 3002 3003
listening for Inet connection on sock3, fd=5
Middlebox test, Port 3003 waiting for incoming connection
Set MSS to 536, Window size set to 16777216KB
socket info src=ip.web.100.srvr:3003, dst=ip.pc.running.test:2529
Sent 'GO' signal, waiting for incoming connection on sock2
C2S test calling init_pkttrace() with pd=0xe2090002
Opening network interface 'eth0' for packet-pair timing
pcap_open_live() returned pointer 0x806e620
installing pkt filter for 'host ip.pc.running.test and port 2530'
Initial pkt src data = 0
socket info src=ip.web.100.srvr:3002, dst=ip.pc.running.test:2530
New packet trace started -- initializing counters
78034 Kbs outbound
Signal USR1(10) sent to child [20787]
Signal 10 received from process 20787
Receied SIGUSRx signal terminating data collection loop for pid=20787
12:48:55.210043 wrote 64 bytes: link counters are ' 1 1 9 248 317 65887 248 38 2848 0 0 2 46.45'
12:48:55.210043 wrote 64 bytes: link counters are ' 0 0 1 96 389 34008 322 0 0 0 0 0 104.04'
64 bytes read ' 1 1 9 248 317 65887 248 38 2848 0 0 2 46.45' from monitor pipe
64 bytes read ' 0 0 1 96 389 34008 322 0 0 0 0 0 104.04' from monitor pipe
listening for Inet connection on sock3, fd=3
waiting for data on sock3
S2C test calling init_pkttrace() with pd=0xe6090002
Opening network interface 'eth0' for packet-pair timing
pcap_open_live() returned pointer 0x806e620
installing pkt filter for 'host ip.pc.running.test and port 2534'
Initial pkt src data = 0
socket info src=ip.web.100.srvr:3003, dst=ip.pc.running.test:2534
Signal 17 received from process 20786
New packet trace started -- initializing counters
sent 0 bytes to client in -98079738589049698940182229907747274657893567126099197952.00 seconds
Signal USR2(12) sent to child [20788]
Signal 12 received from process 20788
Receied SIGUSRx signal terminating data collection loop for pid=20788
12:49:05.456800 wrote 64 bytes: link counters are ' 0 0 2 159 1867 225 1709 110 1788 12943 0 103 3855.15'
12:49:05.456800 wrote 64 bytes: link counters are ' 0 0 2 1781 447 259 1727 166 5191 28430 91 20 1318.50'
Read ' 0 0 2 159 1867 225 1709 110 1788 12943 0 103 3855.15' from monitor pipe
Read ' 0 0 2 1781 447 259 1727 166 5191 28430 91 20 1318.50' from monitor pipe
95545 Kbps inbound
socket info src=ip.web.100.srvr:3003, dst=ip.pc.running.test:2534
libweb100: warning: accessing depricated variable AckPktsIn
libweb100: warning: accessing depricated variable AckPktsOut
Variable 13 (CwndRestores) not found in KIS
Variable 22 (MaxCaCwnd) not found in KIS
Variable 30 (MaxSaCwnd) not found in KIS
Client --> Server data detects link = FastEthernet
Client <-- Server Ack's detect link = FastEthernet
Server --> Client data detects link = 10 Gigabit Enet
Server <-- Client Ack's detect link = 10 Gigabit Enet
Updating admin_view variables: Total count = 2
Initial counter Values Totalcnt = 2, Total Mismatch = 0, Total Bad Cables = 0
Updated counter values Totalcnt = 3, Total Mismatch = 0, Total Bad Cables = 0
Individual counts = [0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0]
Signal 17 received from process 20786
successfully locked '/tmp/view.string' for updating
sending '82289,78033,95569,95490,3,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,Sep 1 13:48:11,Sep 1 13:48:54' to tmp file
Signal 17 received from process 20783
Received SIGCHLD signal for active web100srv process [20783]
Signal 2 received from process 20783

I tested without the "-i eth0" and the test was successful again. The issues with "-m" and the failed middlebox testing do appear to be IPTABLES related.

Rich, I would say this was an issue with broken software on my part, with the two different versions of libpcap conflicting, or is there an issue with the program finding/using libpcap from its locations when compiled from source? I do know that I had /usr/local/lib in ld.so.conf and had ran /sbin/ldconfig. On the previous test server, both versions of libpcap were also installed, but I did not run into this issue. Any ideas?

Clay



Archive powered by MHonArc 2.6.16.

Top of Page