ndt-dev - [ndt] r358 committed - Adding in credits for the translation work. Adding a SVN dump for ...

Subject: NDT-DEV email list created

  • Subject: [ndt] r358 committed - Adding in credits for the translation work. Adding a SVN dump for ...
  • Date: Fri, 28 May 2010 11:57:23 +0000

Revision: 358
Author: jwzurawski
Date: Fri May 28 04:57:08 2010
Log: Adding in credits for the translation work. Adding a SVN dump for
release purposes.

-jason


http://code.google.com/p/ndt/source/detail?r=358

Added:
/trunk/CHANGES
Modified:
/trunk/AUTHORS
/trunk/FILES
/trunk/Makefile.in

=======================================
--- /dev/null
+++ /trunk/CHANGES Fri May 28 04:57:08 2010
@@ -0,0 +1,1701 @@
+------------------------------------------------------------------------
+r357 | jwzurawski | 2010-05-18 13:32:00 -0400 (Tue, 18 May 2010) | 5 lines
+
+Updating translation
+
+-jason
+
+
+------------------------------------------------------------------------
+r356 | jwzurawski | 2010-05-07 07:37:48 -0400 (Fri, 07 May 2010) | 5 lines
+
+Adding support for pt_BR to the translations.
+
+-jason
+
+
+------------------------------------------------------------------------
+r355 | rcarlson501 | 2010-05-06 18:47:15 -0400 (Thu, 06 May 2010) | 5 lines
+
+uncomment close() function to release sock fd's
+
+RaC
+
+
+------------------------------------------------------------------------
+r354 | jwzurawski | 2010-05-06 09:28:11 -0400 (Thu, 06 May 2010) | 6 lines
+
+Applying security patch for issue experienced in Safari - applet would
+freeze and not continue.
+
+-jason
+
+
+------------------------------------------------------------------------
+r353 | jwzurawski | 2010-05-06 09:26:03 -0400 (Thu, 06 May 2010) | 5 lines
+
+Adding the correct patch
+
+-jason
+
+
+------------------------------------------------------------------------
+r352 | jwzurawski | 2010-05-06 09:25:14 -0400 (Thu, 06 May 2010) | 5 lines
+
+whoops, this is the wrong patch to be adding
+
+-jason
+
+
+------------------------------------------------------------------------
+r351 | jwzurawski | 2010-05-06 09:19:34 -0400 (Thu, 06 May 2010) | 5 lines
+
+Adding a patch for Tcpbw100.java to fix a freezing error.
+
+-jason
+
+
+------------------------------------------------------------------------
+r350 | jwzurawski | 2010-05-05 07:20:44 -0400 (Wed, 05 May 2010) | 5 lines
+
+Adding a contributed patch for fakewww.
+
+-jason
+
+
+------------------------------------------------------------------------
+r349 | rcarlson501 | 2010-04-22 22:07:19 -0400 (Thu, 22 Apr 2010) | 10 lines
+
+more error processing
+
+Call exit(-2) when a child generates a SIGSEGV.
+
+Handle the case where a child receives a blank 'go' message by
+terminating the process.
+
+4/22/09 RAC
+
+
+------------------------------------------------------------------------
+r348 | rcarlson501 | 2010-04-21 20:45:18 -0400 (Wed, 21 Apr 2010) | 12 lines
+
+Improve error handling when communicating with child. Use select() with
+a timer to prevent child processes from hanging in an accept() state.
+The timer will now expire and the child will return an error flag back to the parent
+process.
+
+Also handle write errors and terminate if a non-EINTR error is encountered
+while processing a write() call.
+
+bumped version number to 3.6.3
+
+RAC 4/21/10
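
The select()-with-timer approach described in r348 might look roughly like the sketch below. The helper name accept_with_timeout() and its return codes are assumptions made for illustration; they are not taken from the NDT source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not NDT's actual code): wait up to timeout_sec
 * seconds for a pending connection on listen_fd, then accept() it.
 * Returns the new socket, -1 on error, or -2 if the timer expired.
 */
int accept_with_timeout(int listen_fd, int timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int rc;

        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(listen_fd + 1, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return accept(listen_fd, NULL, NULL);
        if (rc == 0)
            return -2;              /* timer expired, nothing to accept */
        if (errno != EINTR)
            return -1;              /* real select() failure */
        /* interrupted by a signal: loop and wait again (timer restarts) */
    }
}
```

A caller could map the -2 result onto whatever error flag the child passes back to the parent.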
+
+------------------------------------------------------------------------
+r347 | jwzurawski | 2010-04-15 14:16:28 -0400 (Thu, 15 Apr 2010) | 5 lines
+
+Adding a second samknows patch
+
+-jason
+
+
+------------------------------------------------------------------------
+r346 | jwzurawski | 2010-04-14 11:15:20 -0400 (Wed, 14 Apr 2010) | 6 lines
+
+adding the 'SamKnows' patch to Tcpbw100.java to the repo. We will be
+sorting out how to apply this at a later date.
+
+-jason
+
+
+------------------------------------------------------------------------
+r345 | jwzurawski | 2010-04-13 22:14:36 -0400 (Tue, 13 Apr 2010) | 5 lines
+
+adding more localizations
+
+-jason
+
+
+------------------------------------------------------------------------
+r343 | jwzurawski | 2010-04-12 19:00:51 -0400 (Mon, 12 Apr 2010) | 5 lines
+
+Adding support for some localizations.
+
+-jason
+
+
+------------------------------------------------------------------------
+r342 | rcarlson501 | 2010-04-09 13:34:54 -0400 (Fri, 09 Apr 2010) | 6 lines
+
+update version number to 3.6.2b
+Matches Tcpbw100.java version
+
+RAC 4/9/10
+
+
+------------------------------------------------------------------------
+r341 | rcarlson501 | 2010-04-09 12:49:47 -0400 (Fri, 09 Apr 2010) | 11 lines
+
+Part of the ToDo on error handling. The server can send the '9988' signal
+multiple times. Old clients will see this and report a 'server busy' signal.
+The NDT applet and commandline client will report a 'server busy' if no
+valid wait time was received, otherwise they will report that an 'unknown fault'
+occurred.
+
+Note messages are now in a .properties file for multi-lingual support.
+
+RAC 4/9/10
+
+
+------------------------------------------------------------------------
+r340 | rcarlson501 | 2010-04-09 12:23:03 -0400 (Fri, 09 Apr 2010) | 98 lines
+
+Mostly debugging changes. Added multiple log_println() lines and updated
+others to include the PID on the line. This aids in debugging multi-client
+operations by allowing the admin to tie messages to a specific process.
+
+There were some more substantive changes as well:
+
+test_sfw_srv.c
+ Changed testTime value. The max value was reduced from 30 sec to 3 sec;
+ there is no reason to wait longer. There are now 2 possible values, 1 sec
+ and 3 sec. This is based on the MaxRTO value.
+ Todo: fix this to use RTO values and make testTime a float instead
+ of int var. Both the client and server code need to change to
+ implement this approach.
+
+network.c
+ Handle interrupts/signals during read()/write() calls. These functions
+ can exit before reading/writing if an interrupt occurs. In multi-client
+ mode this is quite possible. The new code handles up to 5 interrupts
+ before returning an error (this number may need to change).
+
+ I also fixed the error handling to have send_msg() return an error
+ indication if the write() failed. The calling routine can then determine
+ what to do.
+
+testoptions.c
+ Improved the error handling around the TEST_PREPARE messages. This is
+ the send_msg() call that tells the client to begin a new test. It typically
+ sends a text message (usually a port number) along with this flag. (ToDo:
+ it should also send a flag indicating which test to run so the client can
+ skip failed tests.) The return code from send_msg() is used to determine
+ if the client got this message. If not, the test is aborted. (ToDo: implement
+ a TEST_ABORT message to let the client know this test is being skipped.)
+
+ Also moved the I2AddrFree() calls inside the if() loop. The middlebox, c2s,
+ and s2c tests all have a master if() loop. The main run_test() routine calls
+ each test in turn. The test routine determines if it should do something or
+ just return. The Free() call should only be used if the test was run.
+
+testoptions.h
+ Added new 'int state' value to the testoptions struct. ToDo: use this
+ state var to keep track of where in the test process (prepare, start, running,
+ finalize) the server is. This would allow the server to clean up if a test
+ aborted or failed.
+
+web100clt.c
+ Handle condition where the CreateConnectSocket() call failed. In this case
+ the client was unable to open the control socket to the server. The
+ client now aborts and reports the fault instead of trying to continue
+
+web100srv.c
+ Fixed bug when trying to dispatch a waiting client in multi-client mode. The
+ server would find that a client was able to run, but the goto: call was
+ inside an if() statement instead of after it, so the call would only happen
+ if there were lots of clients in the queue.
+
+ Fixed bug where the server would try to 'start' a client multiple times. The
+ server now checks the client's 'running' flag before sending the 'start' signal.
+
+ Moved the check for a stuck client. The server now does the following tasks:
+ process pending signals
+ dispatch waiting clients if there is a run slot
+ Also update waiting clients when they move up in the queue
+ handle fault conditions (ToDo: improve this function.)
+ process new test requests
+
+ Handle SIGPIPE (13) signals
+
+ Improved error handling in the run_test() routine. Each child has a run_test()
+ routine that controls the testing. The test order is fixed and each test routine
+ is called in sequence. The error code for a failed test is now reported. (ToDo:
+ further improvements are needed to handle the case where a test fails while
+ running. At the present time, faults are caught when the prepare signal is sent
+ to the client. The server needs to track the process and handle other conditions.)
+
+ToDo:
+In order to handle error conditions and faults better, the server needs the ability
+to inform the client that a fault has occurred and to 'skip ahead' in the test sequence.
+This will require changes to both the server and the client code. Since there are now multiple
+clients, this will need to be done in a group manner and backward compatibility issues need
+to be addressed.
+
+At the present time the client can't get an abort signal after it has received a valid
+'wait time' signal. The client can track this and issue different messages depending
+on what state it is in, thus overloading the '9999 - server busy' signal. This will
+be implemented shortly in the NDT managed clients, and the process started to work out
+a better solution with the other client developers.
+
+The client should also tell the server some info (OS type, client name, browser (if
+applicable)). This would help when post processing data and this will go into the .meta
+file. This can also be done in a backward compatible manner. The current NDT managed
+clients have a flag to indicate their old/new state. This flag can be used/changed to
+let the server maintain this compatibility with old clients.
+
+RAC 4/9/2010
+
+
+
+
+------------------------------------------------------------------------
+r338 | jwzurawski | 2010-04-08 16:53:43 -0400 (Thu, 08 Apr 2010) | 5 lines
+
+Merging changes from jz-localization into the trunk.
+
+-jason
+
+
+------------------------------------------------------------------------
+r331 | rcarlson501 | 2010-03-25 18:23:58 -0400 (Thu, 25 Mar 2010) | 5 lines
+
+adding author/version/IP info.
+
+-jason
+
+
+------------------------------------------------------------------------
+r330 | rcarlson501 | 2010-03-25 18:09:38 -0400 (Thu, 25 Mar 2010) | 5 lines
+
+Adding a donar init script, addresses issue 38.
+
+-jason
+
+
+------------------------------------------------------------------------
+r327 | jwzurawski | 2010-03-25 13:12:13 -0400 (Thu, 25 Mar 2010) | 5 lines
+
+Adding 'x' to the list for getopt. This addresses issue 18.
+
+-jason
+
+
+------------------------------------------------------------------------
+r326 | rcarlson501 | 2010-03-24 15:18:47 -0400 (Wed, 24 Mar 2010) | 5 lines
+
+Adding a log rotation script for use on MLab.
+
+-jason
+
+
+------------------------------------------------------------------------
+r325 | rcarlson501 | 2010-03-23 23:16:43 -0400 (Tue, 23 Mar 2010) | 5 lines
+
+catch return code for send_msg call when doing the S2C test_prepare message
+exchange.
+
+RAC 3/23/10
+
+------------------------------------------------------------------------
+r324 | rcarlson501 | 2010-03-23 22:36:09 -0400 (Tue, 23 Mar 2010) | 6 lines
+
+more debugging in s2c test; testing shows not all clients
+are entering this test loop.
+
+RAC 3/23/10
+
+
+------------------------------------------------------------------------
+r323 | rcarlson501 | 2010-03-23 22:02:58 -0400 (Tue, 23 Mar 2010) | 5 lines
+
+add a couple of debug messages around the s2c test loop. Testing
+is showing the server is getting stuck in this area.
+
+RAC 3/23/10
+
+------------------------------------------------------------------------
+r322 | rcarlson501 | 2010-03-23 20:58:45 -0400 (Tue, 23 Mar 2010) | 6 lines
+
+add some debug messages and rewrite the c2s accept() loop to
+detect and recover from an interrupt.
+
+rac /23/09
+
+
+------------------------------------------------------------------------
+r319 | rcarlson501 | 2010-03-22 23:56:00 -0400 (Mon, 22 Mar 2010) | 12 lines
+
+Updating files to handle case where a write() or read() can return due
+to an interrupt. In this case no data is written/read and the server may
+not move to the next test. This would cause the server to time out the client
+and the client would report a failed test.
+
+Previous changes also include a reduction in the firewall test time. The
+original version had a max time of 30 sec. This may cause an alarm() signal
+to go off, terminating the server process. The max time was reduced to 3 sec.
+
+RAC 3/22/10
+
+
+------------------------------------------------------------------------
+r318 | rcarlson501 | 2010-03-21 15:11:15 -0400 (Sun, 21 Mar 2010) | 9 lines
+
+The write() function can get terminated by an interrupt. When there
+are multiple clients running, the possibility of this happening increases.
+This update wraps the write() calls in a for() loop. This way the write()
+can get repeated up to 4 times. If all 4 write()'s fail then the test
+will fail.
+
+RAC 3/21/10
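
As a rough illustration of the retry loop described in r318, the sketch below retries a write() that was interrupted by a signal and gives up after 4 failed attempts. The name write_retry() and the short-write handling are assumptions added for the example, not the code that was committed.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WRITE_ATTEMPTS 4    /* attempt limit taken from the log entry */

/*
 * Hypothetical helper: write all len bytes to fd. A write() interrupted
 * by a signal (EINTR) is repeated; after MAX_WRITE_ATTEMPTS interrupted
 * attempts, or on any other error, the call fails.
 * Returns len on success, -1 on failure.
 */
ssize_t write_retry(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;
    int failures = 0;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR && ++failures < MAX_WRITE_ATTEMPTS)
                continue;       /* interrupted: try the same write again */
            return -1;          /* out of attempts, or a non-EINTR error */
        }
        p += n;                 /* short write: advance and keep going */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

Checking errno for EINTR keeps genuine failures (for example a closed descriptor) from being retried needlessly.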
+
+
+------------------------------------------------------------------------
+r317 | rcarlson501 | 2010-03-21 14:04:42 -0400 (Sun, 21 Mar 2010) | 9 lines
+
+Updates to server code
+
+catch/report sig13, sigpipe
+
+Remove alarm() timers around individual tests
+
+rac 3/21/10
+
+
+------------------------------------------------------------------------
+r312 | jwzurawski | 2010-03-16 13:02:10 -0400 (Tue, 16 Mar 2010) | 5 lines
+
+Fixes for issue 16. All links have been updated and checked.
+
+-jason
+
+
+------------------------------------------------------------------------
+r308 | jwzurawski | 2010-03-08 17:34:02 -0500 (Mon, 08 Mar 2010) | 5 lines
+
+Reverting protocol messages to r278. Change is due to MLab use.
+
+-jason
+
+
+------------------------------------------------------------------------
+r307 | rcarlson501 | 2010-03-02 14:13:35 -0500 (Tue, 02 Mar 2010) | 6 lines
+
+Updated copy of the Tcpbw100.java file. Contains references to API and
+error codes. Commit is in conjunction with MLab development.
+
+-jason
+
+
+------------------------------------------------------------------------
+r305 | rcarlson501 | 2010-02-28 15:04:47 -0500 (Sun, 28 Feb 2010) | 11 lines
+
+More bug fixes.
+
+changed exit() call to return -1 in err_sys() function. This function is called
+by the main web100srv process and it shouldn't exit!
+
+Changed logging level for web100 data text, reduces the amount of text in the
+debug log file.
+
+RAC 2/28/10
+
+
+------------------------------------------------------------------------
+r304 | rcarlson501 | 2010-02-28 14:51:47 -0500 (Sun, 28 Feb 2010) | 8 lines
+
+change alarm() time from 60 sec to 120 sec. This alarm is supposed to
+prevent clients from remaining stuck in the queue forever, but the
+normal queue walking process should provide that protection. This
+alarm() may be removed in the near future.
+
+RAC -2.28.10
+
+
+------------------------------------------------------------------------
+r303 | rcarlson501 | 2010-02-28 14:29:32 -0500 (Sun, 28 Feb 2010) | 40 lines
+
+More changes to resolve bugs in the mlab distro.
+
+From looking at the code this weekend (2.27.10) and running tests it appears
+that part of the problem is that the server and/or client is timing out on
+reads/writes and then the test fails. As a specific example the network.c
+file contains the readn() function, which is called by the read_msg() function.
+This routine reads data from the network and returns the data it found. I
+earlier found that the read() call would exit if an interrupt was received so
+this could cause the readn() routine to fail. I also noticed that it could hang
+forever if nothing ever arrived on the socket. To resolve these problems I
+added a select with a timer to prevent an indefinite hang, and handled the errno=EINTR
+case. This should have fixed things, but I then found that both the server code AND
+the client code use these same readxxx() functions, and the timeout for the server was
+way too short for the client. This caused the client to exit before the server sent it
+the wait time signal. (At least this happened in multi-client mode.) The solution was
+to make the time much longer (was 10 sec, now it's 600 sec). This may need to be
+revisited.
+
+I then found a couple of bugs in the web100srv.c code. In 1 case if the client times out
+the waiting variable was decremented twice, causing the server to miscount waiting clients.
+I also moved one of the test conditions to handle errors better. The server was attempting
+to run tests with invalid test suite data; it now detects this condition.
+
+Handled error and exit conditions better when a client can't get into the queue.
+
+Handled a full queue bug that caused an extra client to enter the queue.
+
+Improved the exit and error reporting for the command-line and java client.
+
+Fixed a bug in deploying the janalyze class and jar files.
+
+incremented the version number to 3.6.1
+
+Remaining task -- The error messages on the Java applet have been updated to help
+identify what was going on when the fault occurred. This version needs to be
+patched with Seth's version and a new signed applet needs to be generated.
+
+RAC 2/28/10
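
The readn() change described in r303 (a select() timer to avoid an indefinite hang, plus EINTR handling) could be sketched as follows. The function name readn_timeout(), its signature and the per-chunk timeout semantics are assumptions for illustration; the real routine lives in network.c and may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a readn()-style helper: read up to len bytes
 * from fd, waiting at most timeout_sec seconds for each chunk to arrive.
 * Returns the number of bytes read (short on EOF or timeout), -1 on error.
 */
ssize_t readn_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        fd_set rfds;
        struct timeval tv;
        ssize_t n;
        int rc;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;       /* signal arrived: go back to waiting */
            return -1;
        }
        if (rc == 0)
            break;              /* timer expired: stop instead of hanging */

        n = read(fd, p + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* read() interrupted: retry */
            return -1;
        }
        if (n == 0)
            break;              /* peer closed the connection */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

Taking the timeout as a parameter is one way to let the server and the client use different limits, which is the mismatch the log entry describes.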
+
+
+------------------------------------------------------------------------
+r296 | racarlson | 2010-02-21 14:06:25 -0500 (Sun, 21 Feb 2010) | 7 lines
+
+clear send buffer (buff) before writing s2c test results into buffer. This buffer is used to
+hold the 8K of text being sent to the client. Set the entire buffer to 0 before loading in
+the test results.
+
+RAC 2.21.10
+
+
+------------------------------------------------------------------------
+r295 | racarlson | 2010-02-11 11:31:38 -0500 (Thu, 11 Feb 2010) | 17 lines
+
+This is a revision to v3.6.0
+
+Modified web100srv.c to handle error conditions better. If a child gets stuck or some other
+error occurs, then the code takes the following actions:
+
+1) get the PID from the process at the head of the FIFO
+2) call the child_sig() function with a -1
+3) the child_sig process will remove the process from the head of the FIFO queue
+4) then call kill() with a SIGTERM for this pid
+5) finally call child_sig() again with the pid so the wait4() will clean up the kernel state
+
+This should keep the server going and prevent the current situation where the main process
+gets into a tight loop looking for some process to kill/cleanup.
+
+RAC 2/11/10
+
+
+------------------------------------------------------------------------
+r294 | racarlson | 2010-02-09 20:30:07 -0500 (Tue, 09 Feb 2010) | 38 lines
+
+Modifications to fix bugs in multi-client mode operations.
+
+Bumped version number to 3.6.0, with intermediate versions of 3.5.15, 3.5.16, 3.5.17 & 3.5.18
+Version 3.6.0 should be a working version that doesn't crash the server and clients don't get
+partial results.
+
+Partial results: Changed web100-pcap.c and testoptions.c to resolve this bug. The problem
+was that the pkt-pair timing data wasn't getting delivered to the parent (testing) process.
+The parent would then hang in a wait state until a SIGALRM fired. By then it was usually
+too late. In testing with 2 clients, one wired and the other wireless, I found that the
+wireless (100+ msec RTT) would see ALRM's and failed tests, but the wired client would run
+to completion. I finally noticed that the parent was reading data from the pipe in larger chunks
+than it should have. That is, the child was writing 2 lines, but in some cases the parent got
+everything in a single read. This would hang the parent on the 2nd read.
+
+To solve this I reworked the read() section of the code. It now lives inside a select() call
+and a for() loop. After the 1st read, the code loops back to the select() to wait for the
+2nd line. I also added a short 30 msec delay (using usleep()) into the web100-pcap.c file.
+This went between the 2 writes. This gives the parent time to pick up the 1st line before
+looking for the 2nd. Testing now shows this code is working correctly.
+
+Server crashes and hangs: This was the 2nd major problem with the multi-client code. In fact
+I fixed this 1st and then found the above problem. To solve this I reworked the select(), read(),
+and write() code to correctly handle an Interrupt (EINTR error). These functions typically wait
+for an event. However, if an interrupt occurs, then they exit and report this using the EINTR
+error code. Previously, the code didn't handle this correctly so it would hang or proceed when
+I wasn't expecting it to.
+
+I also reworked the SIGCHLD processing and the child_sig() routine. This routine now handles
+both pkt-pair children and test children properly.
+
+Finally I added in a little more error handling into the main test loop. I now detect when there
+is something in the queue, but the waiting variable says it should be empty. I also added some
+code to catch an error when the waiting and/or mclients variable went below zero.
+
+RAC 2/9/10
+
+
+------------------------------------------------------------------------
+r293 | jwzurawski | 2010-01-26 13:44:17 -0500 (Tue, 26 Jan 2010) | 5 lines
+
+Testing SVN notification
+
+-jason
+
+
+------------------------------------------------------------------------
+r292 | racarlson | 2010-01-13 20:24:05 -0500 (Wed, 13 Jan 2010) | 15 lines
+
+Update files to handle interrupt signals during select() function call. The
+select() function will exit if a signal is received. However, the code may
+still be waiting for a read to complete, and the signal should be handled, but
+then loop back to continue waiting for the select to timeout or the read to
+complete. The select() call now checks for this condition and returns to the
+wait state. It still needs to check for/handle some of the signals.
+
+Also, fixed bug in single user mode operations. Multiple clients were
+starting instead of properly queuing.
+
+Bumped version to 3.5.14
+
+RAC 1/14/10
+
+
+------------------------------------------------------------------------
+r291 | jwzurawski | 2010-01-12 16:27:35 -0500 (Tue, 12 Jan 2010) | 7 lines
+
+Replacing instances of 'MKDIR_P' with 'mkdir_p' in some makefile
+definitions. This was causing 'make install' to fail for versions 3.5.7
+through 3.5.13.
+
+-jason (1/12/09)
+
+
+------------------------------------------------------------------------
+r290 | racarlson | 2010-01-04 21:34:32 -0500 (Mon, 04 Jan 2010) | 7 lines
+
+update files to catch up with mlab patches
+
+update to version 3.5.13
+
+1/4/10
+
+
+------------------------------------------------------------------------
+r289 | racarlson | 2009-12-04 17:16:10 -0500 (Fri, 04 Dec 2009) | 25 lines
+
+catching up with work done on mlab4 node.
+
+update version to 3.5.12 in configure.ac and Tcpbw100.java
+
+Add select() call to readn() function in network.c. This prevents the server from
+blocking forever when trying to read data from a remote client. The select() will
+wait 13 seconds (or something like that) for data. If nothing arrives, the subroutine
+will return an error.
+
+Changed the signal processing for SIGTERM to ignore these signals for the parent NDT process.
+These should never happen, and the init.d script uses sigkill to stop/restart the ndtd process.
+
+Updated the signal handling for SIGCHLD & SIGALRM events when the pcap child processes terminate. For
+some reason, these children don't always throw the SIGCHLD signal until the alarm() timer
+expires. Once they do, the CHLD signal is generated/processed. Modified the SIGALRM handler to
+monitor the waid_sig global flag. If this flag is set, then the pcap child has done its stuff and
+the server should simply process the CHLD signal and continue testing. Otherwise the client has
+disappeared and we should terminate this test. This solves the problem where a test appears to
+complete properly, but then the server throws a 'protocol error' message and kills off the test.
+
+Changed a few alarm timer values as well.
+
+RAC 12/2/09
+
+
+------------------------------------------------------------------------
+r288 | racarlson | 2009-10-22 20:13:14 -0400 (Thu, 22 Oct 2009) | 10 lines
+
+add config.h include statement to logging.c file, brings in the defines from the configure process.
+
+changed the waitpid() routine in testoptions.c to make it look at the return code and detect if
+the waitpid() function returned due to a signal or the child terminating.
+
+Start looking at ways to detect if a test timed-out so the next test could run if desired.
+
+RAC 10/22/09
+
+
+------------------------------------------------------------------------
+r287 | racarlson | 2009-10-14 13:56:36 -0400 (Wed, 14 Oct 2009) | 4 lines
+
+added debug line to see why compression routine wasn't being called.
+
+RAC - 10/14/09
+
+------------------------------------------------------------------------
+r286 | racarlson | 2009-10-14 13:23:04 -0400 (Wed, 14 Oct 2009) | 35 lines
+
+More memory leak fixes and a couple of bug fixes.
+
+I found an on-line reference from 2003 that indicated there was a bug in the libpcap
+freecode() routine. I appeared to be bumping into this bug, so this function is
+commented out for now in the web100-pcap.c file. I also commented out the alldevfree()
+call. This should be revisited later, but since the child process that runs this code
+terminates, it should free up any malloc'ed memory.
+
+Possible bug fix in web100srv.c - When creating a new child a block of memory is
+malloc'ed and later free'ed. I noticed that some of the strings contained extraneous
+characters. The code now calls memset() to 0 out the block of memory before using it.
+The extraneous characters are now gone.
+
+Probable bug fix in web100srv.c - All clients now go through the FIFO linked list to
+control the testing. In pre-3.5 versions only the single client mode operation used the
+FIFO queue; multi-client mode bypassed this queue. The v3.5 code was modified to send all
+clients through the queue so clients could wait if the server was busy.
+
+In v3.5.10, a semaphore was added to protect the queue pointer manipulation routines (adding
+and removing clients from this queue). This caused the server to hang at a semaphore wait state
+instead of crashing due to pointer corruptions. I finally traced this down to a child_sig()
+call being made in the middle of a queue update. The child_sig() routine can also update the
+queue, and this was causing the hang/crash. The child_sig() call has been moved to after the
+pointer manipulation is completed.
+
+Also, implemented better SIGCHLD handling. This signal is handled by a short routine that
+checks to see which process generated the signal. If one of the pkt-pair children generated
+it, then ignore this signal; those signals are handled by waitpid() calls after each test
+completes. SIGCHLD signals for each test child should be handled by the main process, by
+calling the child_sig() routine. The main process also detects "defunct" processes and cleans
+them up by making repeated calls to the child_sig() function.
+
+RAC 10/14/09
+
+
+------------------------------------------------------------------------
+r285 | racarlson | 2009-10-13 10:49:45 -0400 (Tue, 13 Oct 2009) | 10 lines
+
+Clean up memory leaks reported by valgrind program http://valgrind.org
+
+Added in new error detection routine in main for() loop. If running in multi-client
+mode and the number of waiting clients (in the queue) is less than the number of mclients
+then we probably missed a signal. Test for this condition and if true, call the signal
+handler routine child_sig() to clean up.
+
+RAC 10/13/09
+
+
+------------------------------------------------------------------------
+r284 | racarlson | 2009-10-09 16:19:49 -0400 (Fri, 09 Oct 2009) | 4 lines
+
+Update fifo pointers after removing stuck client from queue.
+
+RAC 10/9/09
+
+------------------------------------------------------------------------
+r283 | racarlson | 2009-10-09 15:46:38 -0400 (Fri, 09 Oct 2009) | 14 lines
+
+Convert wait() to waitpid() function in testoptions.c file. This call is made after the c2s & s2c
+tests run, to catch/close the pkt-pair child process. The wait() call responded to any child, while
+the waitpid() call only responds to a specific child. This may fix a bug with multi-client mode where the
+server gets multiple signals.
+
+remove a possible extraneous call to child_sig() when a client is listed as stuck in the fifo queue.
+The mlab servers are entering a state where a new client is delayed from entering the run state if
+a previous client pushed the parent into this stuck state.
+
+Update version in configure.ac and .java files to 3.5.11
+
+RAC 10/9/09
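
The difference between wait() and waitpid() noted in r283 can be sketched as below: waitpid() with an explicit PID reaps only that child, whichever child happens to exit first. The helper name reap_child() and its return convention are assumptions for the example, not code from testoptions.c.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: reap exactly the child with the given pid.
 * Unlike wait(), which returns for any child, waitpid(pid, ...) ignores
 * the others. A call interrupted by a signal (EINTR) is repeated.
 * Returns the child's exit status, or -1 if it could not be reaped or
 * did not exit normally.
 */
int reap_child(pid_t pid)
{
    int status;
    pid_t rc;

    do {
        rc = waitpid(pid, &status, 0);
    } while (rc < 0 && errno == EINTR);

    if (rc != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Inspecting the return value and errno is also how a caller can tell a signal interruption apart from the child actually terminating.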
+
+
+------------------------------------------------------------------------
+r282 | racarlson | 2009-09-17 17:15:33 -0400 (Thu, 17 Sep 2009) | 10 lines
+
+Fixed configure.ac to detect and report if the zlib.h and pcap.h header files are
+loaded on the system. If pcap.h doesn't exist, some client things will be built,
+if zlib.h doesn't exist the web100srv process will build, but it will not attempt
+to compress snaplog and/or tcpdump files.
+
+Bumped version number to 3.5.10
+
+RAC 9/17/09
+
+
+------------------------------------------------------------------------
+r281 | racarlson | 2009-09-14 12:28:38 -0400 (Mon, 14 Sep 2009) | 6 lines
+
+wrap compression routines in #ifdef HAVE_ZLIB statements. The code should build even if
+the zlib library isn't installed, you just can't compress the logs then.
+
+RAC 9/14/09
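
The #ifdef HAVE_ZLIB wrapping mentioned in r281 might be structured like this sketch. The function names compress_log() and compression_available() are hypothetical, and HAVE_ZLIB is assumed to be defined by the configure step when zlib.h is found.

```c
#include <stdio.h>
#ifdef HAVE_ZLIB
#include <zlib.h>
#endif

/* Reports at run time whether this build includes zlib support. */
int compression_available(void)
{
#ifdef HAVE_ZLIB
    return 1;
#else
    return 0;
#endif
}

/*
 * Hypothetical helper: gzip the file src into dst.
 * Returns 0 on success, or -1 on failure or when built without zlib.
 */
int compress_log(const char *src, const char *dst)
{
#ifdef HAVE_ZLIB
    char buf[4096];
    size_t n;
    int ok = 1;
    gzFile out;
    FILE *in = fopen(src, "rb");

    if (in == NULL)
        return -1;
    out = gzopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (gzwrite(out, buf, (unsigned)n) != (int)n) {
            ok = 0;             /* compressed write failed */
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (gzclose(out) != Z_OK)
        ok = 0;
    return ok ? 0 : -1;
#else
    (void)src;                  /* no zlib: the code still builds, */
    (void)dst;                  /* it just cannot compress the logs */
    return -1;
#endif
}
```

Keeping the fallback inside the same function means callers do not need their own #ifdef blocks.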
+
+
+------------------------------------------------------------------------
+r280 | racarlson | 2009-09-10 12:16:13 -0400 (Thu, 10 Sep 2009) | 9 lines
+
+Changes to the build process (makefile.am's) and configure.ac to detect if the libz library
+is found. This is needed to compress the snaplog & tcpdump files. The logging.c code should (will)
+be modified to include a def statement so it compiles without the zlib.h file, disabling the
+compression function.
+
+This also updates the aclocal.m4 file to use automake v1.11
+RAC 9/10/09
+
+
+------------------------------------------------------------------------
+r279 | racarlson | 2009-09-10 11:14:16 -0400 (Thu, 10 Sep 2009) | 4 lines
+
+bump the version number in the applet to match the server version number (3.5.9)
+
+RAC 0/10/09
+
+------------------------------------------------------------------------
+r278 | racarlson | 2009-09-09 17:00:45 -0400 (Wed, 09 Sep 2009) | 14 lines
+
+changes to support compression of tcpdump, snaplog, and cputime files.
+
+The configure.ac file changed due to the need to add the libz library to the linker
+
+Test code went into web100-pcap.c and testoptions.c, but it was removed and everything
+was put into the logging.c file.
+
+The web100srv.c file has an update to the writeMeta() routine to call it with more options
+that need to be passed in to determine if compression is requested.
+
+Note: compression is enabled by default. The -z command line option disables this function.
+
+RAC 9/9/09
+
+------------------------------------------------------------------------
+r277 | racarlson | 2009-09-08 12:37:09 -0400 (Tue, 08 Sep 2009) | 13 lines
+
+Added new field to ndtchild structure. This field keeps track of the child's running/not-running state.
+This was added to support multi-client operations.
+
+possible bug fix for web100-pcap.c. Some servers are throwing a SIGSEGV signal after the pkt-pair child
+process finishes collecting data.
+
+Other changes support multi-client operations: all clients now enter the queue and then get dispatched
+when they are ready to run. Multi-clients get dispatched immediately, up to the max_client limit; FIFO
+clients get dispatched one at a time.
+
+RAC 9/8/09
+
+
+------------------------------------------------------------------------
+r276 | racarlson | 2009-08-03 15:25:16 -0400 (Mon, 03 Aug 2009) | 13 lines
+
+Bug fix.
+
+Server wasn't handling clients with improperly formed test requests (e.g., telnet'ing to test port would cause
+server to kill itself). The fix was to catch the return code from the initialize_tests() routine. Now this
+routine returns a negative number on failure and a positive number on success. The return code is then
+checked in the web100srv.c file and if negative, the child is killed and the server loops back to see if another
+client has arrived.
+
+Incremented to ver 3.5.8
+
+Rich
+
+
+------------------------------------------------------------------------
+r275 | racarlson | 2009-07-24 10:22:00 -0400 (Fri, 24 Jul 2009) | 10 lines
+
+Bug fixes
+
+web100-pcap.c: initial ifspeed value wasn't being set to -1
+
+converted from gethostbyaddr() to getnameinfo() routine. getnameinfo() is v4/v6 compatible
+so I don't need to do the conversion.
+
+RAC 7/24/09
+
+
+------------------------------------------------------------------------
+r274 | racarlson | 2009-07-17 20:31:45 -0400 (Fri, 17 Jul 2009) | 8 lines
+
+bug fix to multi-client code
+
+mclients counter was being incremented in parent and decremented in child.
+Obviously this isn't right. mclients counter now decremented when termination
+signal is caught.
+
+RAC 7/17/09
+
+------------------------------------------------------------------------
+r273 | racarlson | 2009-07-17 13:09:49 -0400 (Fri, 17 Jul 2009) | 6 lines
+
+bump version number for previous multi-client bug fix
+
+now v3.5.7
+
+RAC 7/17/09
+
+------------------------------------------------------------------------
+r272 | racarlson | 2009-07-17 13:04:32 -0400 (Fri, 17 Jul 2009) | 8 lines
+
+Bug fix - there was no limit to the number of clients when running in multi-client mode.
+
+Now the max_client variable is used for both the max number of clients in the queue (FIFO mode)
+or the max number of simultaneous clients (multi-client mode).
+
+RAC 7/17/09
+
+
+------------------------------------------------------------------------
+r271 | racarlson | 2009-07-16 10:59:50 -0400 (Thu, 16 Jul 2009) | 7 lines
+
+check returned ifspeed value. If it wasn't found it will be -1, reset that to 10 before
+entering the pkt-pair bin scan. Otherwise the loop will be from 0 to -1, which could
+take a very long time .-)
+
+RAC 7/16/09
+
+
+------------------------------------------------------------------------
+r270 | racarlson | 2009-07-15 17:43:24 -0400 (Wed, 15 Jul 2009) | 10 lines
+
+Add in code to capture interface speed, based on ethtool code. During initialization, the
+server walks the list of interfaces and grabs the current speed for each up interface.
+This data is then used to limit the pkt-pair search to find the bottleneck link type.
+
+The intent is to reduce overestimates of the link speed when the local host is doing
+interrupt coalescing. Bumped version number to 3.5.6 in configure.ac and Tcpbw100.java
+
+RAC 7/15/09
+
+
+------------------------------------------------------------------------
+r269 | racarlson | 2009-07-14 12:17:53 -0400 (Tue, 14 Jul 2009) | 9 lines
+
+bug fix
+
+Don't use s2c2 gt s2c speeds as an indication of duplex mismatch when running in
+multi-client mode. The CWND limited speed may be greater than the unlimited CWND
+case due to congestion on the local link.
+
+RAC 7/14/09
+
+
+------------------------------------------------------------------------
+r268 | racarlson | 2009-07-14 11:56:34 -0400 (Tue, 14 Jul 2009) | 15 lines
+
+Bug fixes-
+
+1) zero out buffer used to receive parent-to-child "go" message. The buffer had
+extraneous characters which caused the last requested test to fail. The
+value wasn't a test number, but a string with the extraneous data attached.
+
+2) fixed multi-client test mode. Server initialization code used to be handled
+once the test started; it was moved to the init stage, but the test for multi-client
+mode occurred before the test_suite was initialized. Moved test for multi-client
+to after initialization step.
+
+v3.5.5 should be ready for release
+
+RAC 7/14/09
+
+------------------------------------------------------------------------
+r267 | racarlson | 2009-07-14 11:00:19 -0400 (Tue, 14 Jul 2009) | 7 lines
+
+Bug fix. Server was sending waiting messages to last client in the queue instead
+of sending a message to each client. Changed send_msg() call to use ctlsockfd stored
+in ndtchild struct instead of the last set value.
+
+RAC 7/14/09
+
+
+------------------------------------------------------------------------
+r266 | racarlson | 2009-07-14 10:29:39 -0400 (Tue, 14 Jul 2009) | 21 lines
+
+The c2s routine had a fixed number of file descriptors (32) it would read from. This
+should have been a variable (mon_pipe[0]+1). The result was that if more than 16 clients
+were in the queue, the c2s test routine would never exit properly. This would result in
+the FINALIZE signal never getting sent to the client, so the s2c test would also fail.
+
+I made 2 changes,
+ 1) changed fixed value to variable
+ 2) changed code to handle select timeout properly, causing the c2s test
+ to signal the client the exit status.
+FIXME, I need a better way to handle this type of error.
+
+Added alarm(90) signal to client, make it exit after 90 seconds if a test starts
+and never completes.
+
+moved test for too many clients up before doing the server initialize code. The
+intent is to get rid of clients that exceed the queue limit without sending them
+a waiting in queue message.
+
+RAC 7/14/09
+
+
+------------------------------------------------------------------------
+r265 | racarlson | 2009-07-10 12:49:51 -0400 (Fri, 10 Jul 2009) | 9 lines
+
+there is a problem with the test_suite string that is being passed down to the
+child. The string has extraneous characters on the end and this causes the
+client to fail on the last test (s2c_speed).
+
+Temporary fix to make the copy 7 bytes (1 8 2 4) instead of a strlen() variable.
+Need to fix later.
+
+RAC 7/10/09
+
+------------------------------------------------------------------------
+r264 | racarlson | 2009-07-09 13:20:27 -0400 (Thu, 09 Jul 2009) | 4 lines
+
+removed code to set meta.family value. Now set in testopts.c
+
+RAC 7/9/09
+
+------------------------------------------------------------------------
+r263 | racarlson | 2009-07-09 13:17:31 -0400 (Thu, 09 Jul 2009) | 13 lines
+
+bug fix to ndttrace logging filename. The file name is generated in the child process
+and then it needs to be passed back to the parent to get listed in the metadata file.
+
+There already was a pipe and the child sent a message to the parent once the filter was
+in place. Now that message is either the ndttrace name, if logging is requested or
+a "Ready" message.
+
+also fixed bug in setting the hostname. The meta.family variable is now set right after
+the middlebox test is requested.
+
+rac 7/9/09
+
+
+------------------------------------------------------------------------
+r262 | racarlson | 2009-07-09 11:00:55 -0400 (Thu, 09 Jul 2009) | 19 lines
+
+Added new thread to remove zombie clients from queue list. A zombie client
+is one where the user has requested a test but left before the test ran.
+The old code would catch this after a 30 sec timeout when it tried to start
+the test. This new code spawns a thread to walk through the queue list to
+see if the client is still there.
+
+This required adding a new message type and test flag to the server and
+client code. The client now indicates on the initial request if it can
+respond to these queue queries. The server now remembers the client's old/new
+status (pre 3.5.5 is old) and only sends probes to new clients.
+
+Code changes to the initialize_test() routine and passing parameters between
+the parent and child enable this new function.
+
+Incremented version number and changed Applet version to match the servers
+
+RAC 7/9/09
+
+
+------------------------------------------------------------------------
+r261 | racarlson | 2009-07-07 17:20:57 -0400 (Tue, 07 Jul 2009) | 12 lines
+
+Added more logging functions. The detailed log files (snaplog, ndttrace, cputime)
+all drop into a YYYY/DD/MM directory structure under the serverdata directory. The
+sub-dirs are created automatically if they don't exist.
+
+A new metadata file fn.meta is also created. It contains details about the files that
+got created (snaplog, ndttrace, cputime), the client & server IP/hostname, and some other
+minor details. The Janalyze program should probably be changed to look for these .meta files
+to create the analysis work instead of parsing the web100srv.log file.
+
+RAC 7/7/09
+
+
+------------------------------------------------------------------------
+r260 | racarlson | 2009-07-01 18:24:58 -0400 (Wed, 01 Jul 2009) | 7 lines
+
+More changes to pcap routines. Change code to manually set src/dst address/port info during
+the initialization phase. This replaces the old code where the address/port info was
+automatically gathered once the data packets started flowing. This should fix a bug in the
+code that hits the m-lab nodes running in virtual machine space.
+
+Rich 7/1/09
***The diff for this file has been truncated for email.***
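
Note on the r266 entry above: it describes replacing a fixed descriptor count (32) in the select() call with a value derived from the descriptor itself (mon_pipe[0]+1), and handling the select timeout. The following is only a minimal sketch of that general pattern, not the NDT code. The function name wait_for_readable() and its parameters are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not from NDT): wait up to timeout_sec seconds for
 * any of the nfd descriptors in fds[] to become readable.
 * Returns 1 and stores the descriptor in *ready_fd when one is readable,
 * 0 on timeout, and -1 on error.
 */
int wait_for_readable(const int *fds, int nfd, int timeout_sec, int *ready_fd)
{
    fd_set rset;
    struct timeval tv;
    int maxfd = -1;
    int i, rc;

    assert(fds != NULL && nfd > 0 && ready_fd != NULL);

    FD_ZERO(&rset);
    for (i = 0; i < nfd; i++) {
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* The first argument must be the highest descriptor plus one,
     * computed at run time rather than hard-coded. */
    rc = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;              /* 0 = timeout, -1 = error */

    for (i = 0; i < nfd; i++) {
        if (FD_ISSET(fds[i], &rset)) {
            *ready_fd = fds[i];
            return 1;
        }
    }
    return -1;                  /* not expected when rc > 0 */
}
```

A caller would treat a return of 0 as the timeout case and report the failure to its peer instead of waiting indefinitely, which is the behaviour the log entry asks for.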
=======================================
--- /trunk/AUTHORS Wed Jun 30 08:40:21 2004
+++ /trunk/AUTHORS Fri May 28 04:57:08 2010
@@ -8,3 +8,28 @@

The base Web100 library and routines were done by the
Web100 team at PSC http://www.web100.org
+
+The NDT Java applet has been localized by the following people, many thanks to:
+
+ The Catalan translation was provided by:
+ Victor Saez - vicsaez _at_ gmail _dot_ com
+ David Rincon - drincon _at_ entel_dot _upc _dot_ edu
+ Laia Ecunillera - ecunillera _at_ cesca _dot_ es
+
+ The French translation was provided by:
+ Aris Adamantiadis - aris.adamantiadis _at_ belnet _dot_ be
+
+ The Norwegian translation was provided by:
+ Jon Hellan - jon.kare.hellan _at_ uninett _dot_ no
+
+ The Dutch translation was provided by:
+ Merlijn Hofstra - mhofstra _at_ gmail _dot_ com
+
+ The Brazilian Portuguese translation was provided by:
+ Patricia Dourado - paty_dourado _at_ unifacs_dot _edu _dot_ br
+ Leobino Sampaio - leobino _at_ gmail _dot_ com
+ Jose Augusto Suruagy Monteiro - suruagy _at_ unifacs _dot_ br
+
+ The Russian translation was provided by:
+ Maxim Grigoriev - maxim _at_ fnal _dot_ gov
+
=======================================
--- /trunk/FILES Wed Jun 30 08:40:21 2004
+++ /trunk/FILES Fri May 28 04:57:08 2010
@@ -11,6 +11,7 @@
COPYING - A symlink to the COPYRIGHT notice
NEWS - Any new features or functions
INSTALL - Instruction on what to install
+CHANGES - SVN Log of Changes

README - A brief description of the NDT program
Readme-fakewww - A brief overview of the fakewww program
=======================================
--- /trunk/Makefile.in Sun Feb 28 11:29:32 2010
+++ /trunk/Makefile.in Fri May 28 04:57:08 2010
@@ -53,7 +53,7 @@
subdir = .
DIST_COMMON = README $(am__configure_deps) $(srcdir)/Makefile.am \
$(srcdir)/Makefile.in $(srcdir)/config.h.in \
- $(top_srcdir)/configure AUTHORS COPYING ChangeLog INSTALL NEWS \
+ $(top_srcdir)/configure AUTHORS CHANGES COPYING ChangeLog INSTALL NEWS \
config/compile config/depcomp config/install-sh config/missing \
config/mkinstalldirs
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4

