
ndt-users - Re: failed middlebox testing

  • From: Clayton Keller <>
  • To: Richard Carlson <>
  • Cc:
  • Subject: Re: failed middlebox testing
  • Date: Wed, 31 Aug 2005 14:52:39 -0500

Richard Carlson wrote:
Hi Clayton;

At 09:53 AM 8/31/2005, Clayton Keller wrote:

Richard Carlson wrote:

Hi Clayton;
At 11:10 AM 8/30/2005, wrote:

I have previously configured the application and have it operational. After setting up Web100 and NDT on another system, I am running into what appear to be issues with web100srv.

I am running kernel 2.6.12.5 with web100-2.5.4, web100_userland-1.5.4, and NDT-3.1.4a. The current version of Java is 1.4.2_09, and I have tried both the libpcap files provided by Fedora Core 4 and a locally compiled libpcap-0.9.3.


All of this sounds normal.

I have used the following options when running web100srv:

./web100srv -a -m -l /var/log/web100/web100srv.log


OK, -a says to generate the admin view, -m says to let multiple clients run simultaneously, and -l specifies the log file.

When running with -d I see the following:

# ./web100srv -d -m -l /var/log/web100/web100srv.log
Reading config file /etc/ndt.conf to obtain options
ANL/Internet2 NDT ver 3.1.4
Variables file = /usr/local/ndt/web100_variables
log file = /var/log/web100/web100srv.log
Debug level set to 1
server ready on port 3001
web100_init() read 69 variables from file

Upon starting a test I see the following:

Signal 17 received from process 6956


Signal 17 indicates that the child process 6956 was stopped or terminated.
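Signal numbers in these debug lines depend on the platform. A minimal Python sketch, assuming Linux (where 17 is SIGCHLD and 11 is SIGSEGV): the hypothetical helper describe_signal_line below turns a "Signal N received from process P" line into a signal name using the standard signal module. It is an illustration, not part of NDT.

```python
import re
import signal

# Matches debug lines such as "Signal 17 received from process 6956".
SIGNAL_LINE = re.compile(r"Signal (\d+) received from process (\d+)")

def describe_signal_line(line):
    """Return (signal_name, pid) for a matching log line, or None otherwise.

    Names come from the running system's signal table, so the mapping is
    platform-specific (on Linux, 17 is SIGCHLD and 11 is SIGSEGV).
    """
    match = SIGNAL_LINE.search(line)
    if match is None:
        return None
    signum, pid = int(match.group(1)), int(match.group(2))
    try:
        name = signal.Signals(signum).name
    except ValueError:  # number not defined on this platform
        name = f"unknown signal {signum}"
    return name, pid
```

For example, describe_signal_line("Signal 11 received from process 1234") gives ("SIGSEGV", 1234), which makes a flood of crash reports easy to tell apart from routine child-exit notices.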

successfully locked '/tmp/view.string' for updating
sending '0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,' to tmp file
listening for Inet connection on sock2, fd=3
server ports 32778 32779
listening for Inet connection on sock3, fd=5
Middlebox test, Port 32779 waiting for incoming connection
Set MSS to 536, Window size set to 16777216KB


At this point the server should have ports 32778 and 32779 in a listen state. Is that true? Try running a "netstat -nat" command on the server. The ports should be in some state (WAITING, LISTEN, or something).
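The same question can also be asked from the client machine, which additionally exercises any firewall in between. A minimal sketch using only Python's standard socket module; port_accepts_connections is a hypothetical helper that reports whether a TCP handshake to the given port completes.

```python
import socket

def port_accepts_connections(host, port, timeout=3.0):
    """Try a TCP connect to (host, port); return True if the handshake completes.

    A refused connection (nothing listening) and a timeout (packets silently
    dropped, e.g. by a firewall) both return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and socket.timeout
        return False
```

Run on the server, port_accepts_connections("127.0.0.1", 32778) checks the listener alone; run from the client against the server's address, a False for the ephemeral ports but True for 3001 would suggest filtering rather than a server problem.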

I then receive an error that the server failed middlebox testing.

On the working system, I see much more output when the test begins...

Any help would be appreciated on this issue, and if more information is needed, I can work on providing that as well.


What happens if you run without the "-m" flag? Does it work then?
What type of port security did you enable? Using the "-m" flag means that the NDT server will use ephemeral ports for the client connections. If you have "iptables" enabled, then the client may not be able to connect to the server.
I just tried using the "-m" flag on one of my test systems and it ran properly, so I'd suspect an iptables problem.
Regards;
Rich Carlson


When running without "-m", I receive the following output:

Checking for Middleboxes . . . . . . . . . . . . . . . . . . Done
running 10s outbound test (client to server) . . . . . Server failed: 'Go' flag not received


So a connection is opened and closed on port 3003, and the client moves on to the next test. The NDT server and client communicate with each other over port 3001. Since there are multiple tests being run, I created a simple message-passing protocol that allows the server to control the client's actions. The original client ran on a timer, meaning it started each new test at a specific time. I changed that behavior so that the client enters a wait state at the end of each test. The server sends a message to the client on port 3001 to move the client out of this wait state and onto the next test. The client also starts a timer to avoid hanging forever if the server dies.

What is happening here is that the server is dying and the client is timing out. This is what the "Server failed: ..." message means.
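The wait-with-timeout behavior described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the NDT client (which is Java): the one-line b"GO\n" message and the names wait_for_go and ServerFailed are hypothetical, since the actual format of the control messages on port 3001 is not described in this thread.

```python
import socket

GO_MESSAGE = b"GO\n"  # hypothetical framing, chosen for illustration only

class ServerFailed(Exception):
    """Raised when the server does not release the client from its wait state."""

def wait_for_go(control, timeout):
    """Block on the control connection until the server sends GO_MESSAGE.

    The timeout keeps the client from hanging forever if the server dies.
    """
    control.settimeout(timeout)
    buf = b""
    try:
        while not buf.endswith(b"\n"):
            chunk = control.recv(1)  # one byte at a time: never over-reads
            if not chunk:  # server closed the connection
                raise ServerFailed("'Go' flag not received (connection closed)")
            buf += chunk
    except socket.timeout:
        raise ServerFailed("'Go' flag not received (timed out)") from None
    if buf != GO_MESSAGE:
        raise ServerFailed(f"unexpected control message {buf!r}")
```

Reading a byte at a time is slow but never consumes bytes that belong to the next control message; the timeout is what turns a dead server into a "'Go' flag not received" failure instead of a hang.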

When the test runs and this error is returned, I see a flood of "Signal 11 received from process XXXX" messages.


Signal 11 is an invalid memory reference. Are you running the server as root? There might be a bug in the code that causes it to crash if it isn't root. It needs root access to run the packet-pair bottleneck link detection algorithm (it does a raw read on the network interface).
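A privilege check at startup could turn this failure mode into an explicit message. A minimal sketch, assuming a POSIX system; require_root is a hypothetical helper, not NDT code.

```python
import os

def require_root(feature):
    """Raise PermissionError unless the effective UID is 0 (root).

    Raw reads on a network interface normally need root (or, on Linux, the
    CAP_NET_RAW capability), so checking up front gives a clear error
    instead of a failure partway through a test.
    """
    euid = os.geteuid()
    if euid != 0:
        raise PermissionError(f"{feature} requires root; effective uid is {euid}")
```

A server would call something like require_root("packet-pair bottleneck detection") once, before accepting clients.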


Also, I do see it listening for connections on the ports indicated in the debug output when running with "-m":

tcp 0 0 0.0.0.0:32775 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:32776 0.0.0.0:* LISTEN
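When checking several systems, the LISTEN entries can be pulled out of netstat -nat output programmatically. A minimal sketch, assuming the classic net-tools column layout shown above (protocol, Recv-Q, Send-Q, local address, foreign address, state); listening_ports is a hypothetical helper.

```python
def listening_ports(netstat_output):
    """Return the sorted local TCP ports in LISTEN state from `netstat -nat` text."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data lines: proto recv-q send-q local-address foreign-address state.
        # Header lines ("Active ...", "Proto ...") fail the "tcp" test.
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # rsplit handles IPv6 addresses such as ":::3001" as well.
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return sorted(ports)
```

Fed the two lines above, it returns [32775, 32776].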

So this could very well be an IPTABLES issue. Is there a range of ports the server will listen on when running with "-m"? I guess the ESTABLISHED and RELATED IPTABLES rule is not matching these connections.


No, I don't have a port range when running in -m mode. The server will request a pair of open ports from the kernel and it will simply use what it gets back.
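A common way to request open ports from the kernel is to bind to port 0 and read back the port it chose; on Linux those ports come from the ephemeral range in /proc/sys/net/ipv4/ip_local_port_range, which is the range a firewall rule would have to cover for "-m" mode. A minimal Python sketch, assuming Linux; it illustrates the mechanism and is not NDT's own code, and both function names are hypothetical.

```python
import socket

def request_port_pair():
    """Ask the kernel for two free TCP ports by binding to port 0.

    Returns the listening sockets (keep them open, or the ports may be
    handed out again) together with the port numbers the kernel picked.
    """
    socks, ports = [], []
    for _ in range(2):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("0.0.0.0", 0))  # port 0 means "any free port"
        s.listen(1)
        socks.append(s)
        ports.append(s.getsockname()[1])
    return socks, ports

def local_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the Linux ephemeral port range as (low, high).

    Often (32768, 61000) on 2.6-era kernels, but the value varies by system.
    """
    with open(path) as f:
        low, high = map(int, f.read().split())
    return low, high
```

Ports such as 32775 through 32779 from the debug output fall in the typical range for kernels of this era, which is why a rule allowing only 3001, 3002, and 3003 would not admit them.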

I turned IPTABLES off momentarily for testing, and with the "-m" option I still receive the error:

running 10s outbound test (client to server) . . . . . Server failed: 'Go' flag not received


So this seems to indicate that the original problem really is an IPTABLES issue. You get past the middlebox test and the server crashes when it tries to start the client->server speed test (the same as when you run without the -m option).

Again, a number of "Signal 11 received from process XXXX" messages keep flooding the output with debug on, until I kill web100srv.


This is also a bug in my code. I should handle signal 11s as a permanent error and terminate the process. I'll fix this soon.
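A complementary, parent-side change is to look at why each child ended instead of only logging the signal number. A minimal Python sketch, assuming a fork-per-client server; child_outcome is a hypothetical helper that decodes a raw wait status from os.waitpid() and flags a SIGSEGV death as a permanent error. It does not address the flood inside the crashing process itself, which needs the fix described above.

```python
import os
import signal

def child_outcome(status):
    """Describe a raw wait status from os.waitpid() and flag crashes as fatal.

    Returns (description, fatal). A child killed by SIGSEGV is treated as a
    permanent error rather than something to log and keep waiting on.
    """
    if os.WIFSIGNALED(status):
        sig = signal.Signals(os.WTERMSIG(status))
        return f"killed by {sig.name}", sig == signal.SIGSEGV
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}", False
    return f"stopped or continued (raw status {status:#x})", False
```

In a SIGCHLD handler, the parent would call os.waitpid(-1, os.WNOHANG), pass the status here, and stop serving (or restart cleanly) when fatal is True.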

I've been looking for any information on the 'Go' flag; again, any input or information would be greatly appreciated.


See the previous email I sent to this list. The 'Go' flag is part of the client/server communications. It allows the server to control the client's state.

As I noted above, this may be a process ownership problem. Try running the server as root and with the -m flag turned off. What happens then?

Rich

Clay




Here are the current processes after I have restarted them:

root 8370 1 96 14:20 pts/1 00:00:01 /usr/local/sbin/web100srv -a -l /var/log/web100/web100srv.log
root 8377 1 0 14:20 pts/1 00:00:00 /usr/local/sbin/fakewww -l /var/log/web100/fakewww.log

I have configured iptables to allow TCP connections to destination ports 3001, 3002, 3003, and 7123.

When configuring Web100 in the kernel (2.6.12.5 from kernel.org), I have the following configured:

--- IP: Web100 networking enhancements
[*] Web100: Extended TCP statistics
(384) Web100: Default file permissions
(0) Web100: Default gid
[*] Web100: Net100 extensions
[*] Web100: Netlink event notification service

GID 0 is root.

File permissions for /usr/local/sbin/web100srv: rwxr-xr-x root.root
All files in /usr/local/ndt are root.root, with the exception of tcpbw100.html, which is root.users. All files in this folder are rw-r--r--.
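The numeric kernel option and the ls-style strings describe the same permission bits. A short Python sketch using the standard stat module: reading the kernel option "(384) Web100: Default file permissions" as decimal, 384 equals octal 0600 (rw-------), while web100srv's rwxr-xr-x is 0755 and the /usr/local/ndt files' rw-r--r-- is 0644. permission_string is a hypothetical helper.

```python
import stat

def permission_string(mode):
    """Render permission bits as ls does, e.g. 0o644 -> 'rw-r--r--'."""
    # S_IFREG supplies a regular-file type so filemode() prints '-' first; drop it.
    return stat.filemode(stat.S_IFREG | mode)[1:]

if __name__ == "__main__":
    print(oct(384), permission_string(384))  # 0o600 rw-------
    print(permission_string(0o755))          # rwxr-xr-x (web100srv)
    print(permission_string(0o644))          # rw-r--r-- (/usr/local/ndt files)
```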

The files in /usr/local/lib are all root.root as well, including the libpcap.a file that was compiled prior to installation of ndt-3.1.4a.

While running the test I went ahead and did a packet capture as well. The following information is being passed on the connection to port 3003:

ip.web.100.server;ip.client.doing.test;1456;-1;-1;
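To compare captures from the working and failing systems, the record seen on port 3003 can be split into its fields. A minimal sketch, assuming only what the capture shows: semicolon-terminated fields, the first two being the server and client addresses (redacted above). The meaning of the numeric fields (1456, -1, -1) is not stated in this thread, so the hypothetical helper leaves them as plain integers.

```python
def parse_semicolon_record(record):
    """Split a ';'-terminated record such as 'a;b;1456;-1;-1;' into fields.

    Returns (server, client, numbers). The trailing ';' produces an empty
    last element, which is dropped. No meaning is assumed for the numbers.
    """
    fields = record.split(";")
    if fields and fields[-1] == "":
        fields.pop()
    server, client, *rest = fields
    return server, client, [int(v) for v in rest]
```

Running it on the record from each system and comparing the numeric lists side by side is a quick way to spot where the two diverge.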

I also see SYN, ACK, and ACK FIN traffic passing on ports 3001 and 3002.

I am still seeing the 'Go' flag error. Thank you for all the help thus far; I am curious what ideas you have for proceeding further with this.

Clay


