ndt-dev - [ndt] r340 committed - Mostly debugging changes. Added multiple log_println() lines and upda...

Subject: NDT-DEV email list created

  • Subject: [ndt] r340 committed - Mostly debugging changes. Added multiple log_println() lines and upda...
  • Date: Fri, 09 Apr 2010 16:42:29 +0000

Revision: 340
Author: rcarlson501
Date: Fri Apr 9 09:23:03 2010
Log: Mostly debugging changes. Added multiple log_println() lines and updated
others to include the PID on the line. This aids in debugging multi-client
operations by allowing the admin to tie messages to a specific process.

There were some more substantive changes as well:

test_sfw_srv.c
Changed the testTime value. The max value was reduced from 30 sec to 3 sec;
there is no reason to wait longer. There are now 2 possible values, 1 sec
and 3 sec. The choice is based on the MaxRTO value.
ToDo: fix this to use RTO values and make testTime a float instead of an
int var. Both the client and server code need to change to implement this
approach.
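
The selection described above can be sketched as a small helper. This is only an illustration, not the committed code: the name sfw_test_time() is hypothetical, and the inputs are assumed to be the web100 MaxRTT/MaxRTO values in milliseconds, as the division by 1000.0 in the diff suggests.

```c
#include <assert.h>

/* Pick the simple-firewall test time (in seconds) from the larger of
 * the two timer values, both assumed to be in milliseconds. */
int sfw_test_time(int max_rtt_ms, int max_rto_ms)
{
    assert(max_rtt_ms >= 0 && max_rto_ms >= 0);

    if (max_rtt_ms > max_rto_ms)
        max_rto_ms = max_rtt_ms;

    /* Above 3 seconds use the 3 sec cap, otherwise wait 1 sec. */
    if (((double) max_rto_ms) / 1000.0 > 3.0)
        return 3;
    return 1;
}
```

Because the comparison is strict, a value of exactly 3000 ms still selects the 1 sec test time.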

network.c
Handle interrupts/signals while read()/write() are transferring data. These
functions can exit before reading/writing if an interrupt occurs. In
multi-client mode this is quite possible. The new code handles up to 5
interrupts before returning an error (this number may need to change).

I also fixed the error handling to have send_msg() return an error
indication if the write() failed. The calling routine can then determine
what to do.
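
A bounded retry of this kind might look like the sketch below. It is a variation on the idea, not the code in network.c: write_retry() and the WR_ERR_* codes are hypothetical names, and the sketch counts EINTR results from write() directly, whereas the committed send_msg() retries when writen() returns 0. The two distinct negative codes echo the way the diff separates a hard failure from running out of retries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define WR_ERR_IO   (-1)   /* hypothetical: write() failed outright */
#define WR_ERR_INTR (-3)   /* hypothetical: too many interrupts */

/* Write len bytes to fd, restarting after a signal interrupts write().
 * Gives up once more than max_intr interrupts have been seen. */
ssize_t write_retry(int fd, const void *buf, size_t len, int max_intr)
{
    const char *p = buf;
    size_t sent = 0;
    int interrupts = 0;

    assert(buf != NULL);

    while (sent < len) {
        ssize_t n = write(fd, p + sent, len - sent);
        if (n == -1) {
            if (errno == EINTR) {
                if (++interrupts > max_intr)
                    return WR_ERR_INTR;
                continue;          /* interrupted: try the same bytes again */
            }
            return WR_ERR_IO;      /* any other error is reported to the caller */
        }
        sent += (size_t) n;        /* partial writes simply advance the offset */
    }
    return (ssize_t) sent;
}
```

A caller would treat any negative return as "message not delivered" and decide whether to abort the test. Note that this sketch also reports EAGAIN as a hard error, which would need different handling on a non-blocking socket.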

testoptions.c
Improved the error handling around the TEST_PREPARE messages. This is the
send_msg() call that tells the client to begin a new test. It typically
sends a text message (usually a port number) along with this flag. (ToDo:
it should also send a flag indicating which test to run so the client can
skip failed tests.) The return code from send_msg() is used to determine
if the client got this message. If not, the test is aborted. (ToDo:
implement a TEST_ABORT message to let the client know this test is being
skipped.)

Also moved the I2AddrFree() calls inside the if() block. The middlebox, c2s,
and s2c tests all have a master if() block. The main run_test() routine
calls each test in turn. The test routine determines if it should do
something or just return. The Free() call should only be used if the test
was run.
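
The two points above (abort when the prepare message is not delivered, and free the address only when the test actually ran) can be sketched together. Everything here is a stand-in: run_one_test(), send_ok(), send_fail(), the -4 code and the malloc'd buffer that plays the role of the I2Addr are hypothetical, and the counter exists only to make the behaviour visible. Unlike the committed code, the sketch also releases the address on the early-return path.

```c
#include <assert.h>
#include <stdlib.h>

int addr_frees = 0;   /* illustration only: counts how often we release */

/* Hypothetical stand-ins for send_msg(..., TEST_PREPARE, ...). */
int send_ok(const char *port)   { (void) port; return 0; }
int send_fail(const char *port) { (void) port; return -1; }

int run_one_test(int enabled, int (*send_prepare)(const char *))
{
    assert(send_prepare != NULL);

    if (enabled) {
        char *addr = malloc(16);   /* stands in for the I2Addr */
        int rc;

        if (addr == NULL)
            return -4;             /* hypothetical out-of-memory code */

        rc = send_prepare("3003");
        if (rc < 0) {              /* the client never got the message */
            free(addr);
            addr_frees++;
            return rc;             /* abort this test and report why */
        }

        /* ... the test itself would run here ... */

        free(addr);                /* released only because the test ran */
        addr_frees++;
    }
    return 0;                      /* test disabled: nothing was allocated */
}
```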

testoptions.h
Added a new 'int State' value to the testoptions struct. ToDo: use this
State var to keep track of where in the test process (prepare, start,
running, finalize) the server is. This would allow the server to clean up
if a test aborted or failed.
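
One possible shape for that ToDo is sketched below. The commit only adds the int field; the TS_* constants and both helper functions are assumptions made for illustration, with the phase names taken from the list above.

```c
#include <assert.h>

/* Hypothetical values for the new State field. */
enum test_state {
    TS_NONE = 0,
    TS_PREPARE,
    TS_START,
    TS_RUNNING,
    TS_FINALIZE
};

/* Human-readable phase name, e.g. for a log_println() call. */
const char *test_state_name(int state)
{
    assert(state >= TS_NONE && state <= TS_FINALIZE);

    switch (state) {
    case TS_PREPARE:  return "prepare";
    case TS_START:    return "start";
    case TS_RUNNING:  return "running";
    case TS_FINALIZE: return "finalize";
    default:          return "none";
    }
}

/* A test that stopped in any phase other than "none" left work behind. */
int test_needs_cleanup(int state)
{
    assert(state >= TS_NONE && state <= TS_FINALIZE);
    return state != TS_NONE;
}
```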

web100clt.c
Handle the condition where the CreateConnectSocket() call failed. In this
case the client was unable to open the control socket to the server. The
client now aborts and reports the fault instead of trying to continue.

web100srv.c
Fixed a bug when trying to dispatch a waiting client in multi-client mode.
The server would find that a client was able to run, but the goto statement
was inside an if() statement instead of after it, so the jump would only
happen if there were lots of clients in the queue.

Fixed a bug where the server would try to 'start' a client multiple times.
The server now checks the client's 'running' flag before sending the 'start'
signal.
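
The two fixes above can be illustrated with a stripped-down queue. The struct and function names are hypothetical (the real server uses struct ndtchild and a goto to the dispatch code); the sketch only shows the search running to completion before the dispatch step, and the 'running' check that prevents a second start.

```c
#include <assert.h>
#include <stddef.h>

struct qclient {
    int pid;
    int running;            /* 1 once the client has been started */
    struct qclient *next;
};

/* Walk the queue and return the first client that is not yet running. */
struct qclient *next_runnable(struct qclient *head)
{
    struct qclient *c;

    for (c = head; c != NULL; c = c->next)
        if (c->running == 0)
            return c;
    return NULL;
}

/* Start a client at most once; returns 1 if it was started now. */
int start_client(struct qclient *c)
{
    if (c == NULL || c->running == 1)
        return 0;
    c->running = 1;
    return 1;
}

/* The dispatch step follows the search unconditionally rather than
 * sitting inside a nested condition. Returns the pid, or -1. */
int dispatch_one(struct qclient *head)
{
    struct qclient *c = next_runnable(head);

    if (!start_client(c))
        return -1;
    assert(c->running == 1);
    return c->pid;
}
```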

Moved the check for a stuck client. The server now does the following tasks:
  process pending signals
  dispatch waiting clients if there is a run slot
    (also update waiting clients when they move up in the queue)
  handle fault conditions (ToDo: improve this function.)
  process new test requests

Handle SIGPIPE (13) signals

Improved error handling in the run_test() routine. Each child has a
run_test() routine that controls the testing. The test order is fixed and
each test routine is called in sequence. The error code for a failed test is
now reported. (ToDo: further improvements are needed to handle the case
where a test fails while running. At the present time, faults are caught
when the prepare signal is sent to the client. The server needs to track the
process and handle other conditions.)

ToDo:
In order to handle error conditions and faults better, the server needs the
ability to inform the client that a fault has occurred and to 'skip ahead'
in the test sequence. This will require changes to both the server and the
client code. Since there are now multiple clients, this will need to be done
in a group manner and backward compatibility issues need to be addressed.

At the present time the client can't get an abort signal after it has
received a valid 'wait time' signal. The client can track this and issue
different messages depending on what state it is in, thus overloading the
'9999 - server busy' signal. This will be implemented shortly in the NDT
managed clients, and the process started to work out a better solution with
the other client developers.
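
The client-side tracking described above could be sketched as follows. The codes 0, 9977 and 9988 are the ones checked in the web100clt.c hunk below and 9999 is the code named in the paragraph; the enum, the function name and the already_queued parameter are hypothetical, and treating every other value as a wait indication is an assumption.

```c
#include <assert.h>

enum wait_kind {
    WAIT_GO,            /* 0: start testing */
    WAIT_QUEUED,        /* any other value: still waiting */
    WAIT_SERVER_FAULT,  /* 9977 */
    WAIT_QUEUE_FULL,    /* 9988 */
    WAIT_SERVER_BUSY,   /* 9999 before any wait time was received */
    WAIT_ABORTED        /* 9999 after a valid wait time was received */
};

/* already_queued is nonzero once the client has seen a valid wait time,
 * which lets the same 9999 code carry two meanings. */
enum wait_kind classify_wait(int xwait, int already_queued)
{
    assert(xwait >= 0);

    switch (xwait) {
    case 0:    return WAIT_GO;
    case 9977: return WAIT_SERVER_FAULT;
    case 9988: return WAIT_QUEUE_FULL;
    case 9999: return already_queued ? WAIT_ABORTED : WAIT_SERVER_BUSY;
    default:   return WAIT_QUEUED;
    }
}
```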

The client should also tell the server some info (OS type, client name,
browser (if applicable)). This would help when post-processing data, and
this will go into the .meta file. This can also be done in a backward
compatible manner. The current NDT managed clients have a flag to indicate
their old/new state. This flag can be used/changed to let the server
maintain this compatibility with old clients.

RAC 4/9/2010




http://code.google.com/p/ndt/source/detail?r=340

Modified:
/trunk/src/network.c
/trunk/src/test_sfw_srv.c
/trunk/src/testoptions.c
/trunk/src/testoptions.h
/trunk/src/web100clt.c
/trunk/src/web100srv.c

=======================================
--- /trunk/src/network.c Sun Feb 28 11:29:32 2010
+++ /trunk/src/network.c Fri Apr 9 09:23:03 2010
@@ -333,21 +333,39 @@
send_msg(int ctlSocket, int type, void* msg, int len)
{
unsigned char buff[3];
+ int rc, i;

assert(msg);
assert(len >= 0);

+ /* memset(0, buff, 3); */
buff[0] = type;
buff[1] = len >> 8;
buff[2] = len;

- if (writen(ctlSocket, buff, 3) != 3) {
- return -1;
- }
- if (writen(ctlSocket, msg, len) != len) {
- return -2;
- }
- log_println(8, ">>> send_msg: type=%d, len=%d", type, len);
+ for (i=0; i<5; i++) {
+ rc = writen(ctlSocket, buff, 3);
+ if (rc == 3)
+ break;
+ if (rc == 0)
+ continue;
+ if (rc == -1)
+ return -1;
+ }
+ if (i == 5)
+ return -3;
+ for (i=0; i<5; i++) {
+ rc = writen(ctlSocket, msg, len);
+ if (rc == len)
+ break;
+ if (rc == 0)
+ continue;
+ if (rc == -1)
+ return -2;
+ }
+ if (i == 5)
+ return -3;
+ log_println(8, ">>> send_msg: type=%d, len=%d, msg=%s, pid=%d", type, len, msg, getpid());
return 0;
}

@@ -412,8 +430,11 @@
if (n == -1) {
if (errno == EINTR)
continue;
- if (errno != EAGAIN)
- return 0;
+ if (errno != EAGAIN) {
+ log_println(6, "writen() Error! write(%d) failed with err='%s(%d) pic=%d'", fd,
+ strerror(errno), errno, getpid());
+ return -1;
+ }
}
assert(n != 0);
if (n != -1) {
@@ -465,10 +486,8 @@
if (n == -1) {
if (errno == EINTR)
continue;
- if (errno == ECONNRESET)
- return(ECONNRESET);
if (errno != EAGAIN)
- return 0;
+ return -errno;
}
if (n != -1) {
received += n;
=======================================
--- /trunk/src/test_sfw_srv.c Sun Mar 21 11:04:42 2010
+++ /trunk/src/test_sfw_srv.c Fri Apr 9 09:23:03 2010
@@ -121,6 +121,7 @@
web100_group* group;
int maxRTT, maxRTO;
char hostname[256];
+ int rc;

assert(ctlsockfd != -1);
assert(options);
@@ -154,8 +155,11 @@
maxRTO = atoi(web100_value_to_text(web100_get_var_type(var), buff));
if (maxRTT > maxRTO)
maxRTO = maxRTT;
- if ((((double) maxRTO) / 1000.0) < 3.0)
- testTime = (((double) maxRTO) / 1000.0) * 4 ;
+ if ((((double) maxRTO) / 1000.0) > 3.0)
+ /* `testTime = (((double) maxRTO) / 1000.0) * 4 ; */
+ testTime = 3;
+ else
+ testTime = 1;
}
else {
log_println(0, "Simple firewall test: Cannot find connection");
@@ -167,7 +171,8 @@
log_println(1, " -- SFW time: %d", testTime);

sprintf(buff, "%d %d", sfwsockport, testTime);
- send_msg(ctlsockfd, TEST_PREPARE, buff, strlen(buff));
+ if ((rc = send_msg(ctlsockfd, TEST_PREPARE, buff, strlen(buff))) < 0)
+ return (rc);

msgLen = sizeof(buff);
if (recv_msg(ctlsockfd, &msgType, buff, &msgLen)) {
@@ -178,6 +183,7 @@
return 1;
}
if (check_msg_type("Simple firewall test", TEST_MSG, msgType, buff, msgLen)) {
+ log_println(0, "Fault, unexpected message received!");
sprintf(buff, "Server (Simple firewall test): Invalid port number received");
send_msg(ctlsockfd, MSG_ERROR, buff, strlen(buff));
I2AddrFree(sfwsrv_addr);
=======================================
--- /trunk/src/testoptions.c Tue Mar 23 20:16:43 2010
+++ /trunk/src/testoptions.c Fri Apr 9 09:23:03 2010
@@ -263,7 +263,7 @@
int maxseg=1456;
/* int maxseg=1456, largewin=16*1024*1024; */
/* int seg_size, win_size; */
- int midfd, j;
+ int midfd, j, ret;
struct sockaddr_storage cli_addr;
/* socklen_t optlen, clilen; */
socklen_t clilen;
@@ -333,7 +333,8 @@
log_println(1, " -- port: %d", options->midsockport);

sprintf(buff, "%d", options->midsockport);
- send_msg(ctlsockfd, TEST_PREPARE, buff, strlen(buff));
+ if ((ret = send_msg(ctlsockfd, TEST_PREPARE, buff, strlen(buff))) < 0)
+ return ret;

/* set mss to 1456 (strange value), and large snd/rcv buffers
* should check to see if server supports window scale ?
@@ -414,6 +415,7 @@
send_msg(ctlsockfd, TEST_FINALIZE, "", 0);
log_println(1, " <--------- %d ----------->", options->child0);
setCurrentTest(TEST_NONE);
+ /* I2AddrFree(midsrv_addr); */
}
/* I2AddrFree(midsrv_addr); */
return 0;
@@ -529,7 +531,8 @@

log_println(1, "Sending 'GO' signal, to tell client %d to head for the next test", testOptions->child0);
sprintf(buff, "%d", testOptions->c2ssockport);
- send_msg(ctlsockfd, TEST_PREPARE, buff, strlen(buff));
+ if ((ret = send_msg(ctlsockfd, TEST_PREPARE, buff, strlen(buff))) < 0)
+ return ret;

clilen = sizeof(cli_addr);
/* j = 0; */
@@ -559,7 +562,7 @@
if (getuid() == 0) {
pipe(mon_pipe1);
if ((mon_pid1 = fork()) == 0) {
- close(ctlsockfd);
+ /* close(ctlsockfd); */
close(testOptions->c2ssockfd);
close(recvsfd);
log_println(5, "C2S test Child %d thinks pipe() returned fd0=%d, fd1=%d",
@@ -809,10 +812,13 @@

log_println(1, " <----------- %d -------------->", testOptions->child0);
setCurrentTest(TEST_NONE);
- }
/* I2AddrFree(c2ssrv_addr); */
I2AddrFree(src_addr);
/* testOptions->child1 = mon_pid1; */
+ }
+ /* I2AddrFree(c2ssrv_addr); */
+ /* I2AddrFree(src_addr); */
+ /* testOptions->child1 = mon_pid1; */
return 0;
}

@@ -948,10 +954,14 @@
/* Data received from speed-chk, tell applet to start next test */
sprintf(buff, "%d", testOptions->s2csockport);
j = send_msg(ctlsockfd, TEST_PREPARE, buff, strlen(buff));
- if (j == -1)
+ if (j == -1) {
log_println(6, "S2C %d Error!, Test start message not sent!", testOptions->child0);
- if (j == -2)
+ return j;
+ }
+ if (j == -2) {
log_println(6, "S2C %d Error!, server port [%s] not sent!", testOptions->child0, buff);
+ return j;
+ }

/* ok, await for connect on 3rd port
* This is the second throughput test, with data streaming from
@@ -984,7 +994,7 @@
if (getuid() == 0) {
pipe(mon_pipe2);
if ((mon_pid2 = fork()) == 0) {
- close(ctlsockfd);
+ /* close(ctlsockfd); */
close(testOptions->s2csockfd);
close(xmitsfd);
log_println(5, "S2C test Child thinks pipe() returned fd0=%d, fd1=%d", mon_pipe2[0], mon_pipe2[1]);
@@ -1304,9 +1314,12 @@

log_println(1, " <------------ %d ------------->", testOptions->child0);
setCurrentTest(TEST_NONE);
+ /* I2AddrFree(s2csrv_addr); */
+ I2AddrFree(src_addr);
+ /* testOptions->child2 = mon_pid2; */
}
/* I2AddrFree(s2csrv_addr); */
- I2AddrFree(src_addr);
+ /* I2AddrFree(src_addr); */
/* testOptions->child2 = mon_pid2; */
return 0;
}
=======================================
--- /trunk/src/testoptions.h Tue Feb 9 17:30:07 2010
+++ /trunk/src/testoptions.h Fri Apr 9 09:23:03 2010
@@ -32,6 +32,7 @@
pid_t child2;

int sfwopt;
+ int State;
} TestOptions;

int wait_sig;
=======================================
--- /trunk/src/web100clt.c Sun Feb 28 11:29:32 2010
+++ /trunk/src/web100clt.c Fri Apr 9 09:23:03 2010
@@ -761,6 +761,10 @@
}
if (xwait == 0) /* signal from ver 3.0.x NDT servers */
break;
+ if (xwait == 9977) {
+ fprintf(stderr, "Server Fault: Test terminated for unknown reason, plase try again later.\n");
+ exit(0);
+ }
if (xwait == 9988) {
fprintf(stderr, "Server Busy: Too many clients waiting in queue, plase try again later.\n");
exit(0);
=======================================
--- /trunk/src/web100srv.c Thu Mar 25 10:12:13 2010
+++ /trunk/src/web100srv.c Fri Apr 9 09:23:03 2010
@@ -734,7 +734,7 @@
void *
zombieWorker(void *head_ptr) {

- struct ndtchild *tmp_ptr, *tmp, *pre_ptr;
+ struct ndtchild *tmp_ptr, *tmp, *pre_ptr=NULL;
int i=0, rc;
struct timeval sel_tv;
fd_set rfd;
@@ -763,7 +763,8 @@
continue;
}
log_println(6, "New client found, checking for response, child=%d", tmp_ptr->pid);
- send_msg(tmp_ptr->ctlsockfd, SRV_QUEUE, tmpstr, strlen(tmpstr));
+ rc = send_msg(tmp_ptr->ctlsockfd, SRV_QUEUE, tmpstr, strlen(tmpstr));
+ log_println(6, "send_msg() returned %d during zombie check on client %d", rc, tmp_ptr->pid);
FD_ZERO(&rfd);
FD_SET(tmp_ptr->ctlsockfd, &rfd);
sel_tv.tv_sec = 1;
@@ -789,7 +790,7 @@
log_println(6, "Free'd semaphore lock - 4");
break;
default:
- log_println(6, "new client responded, bumping pointers child=%d", tmp_ptr->pid);
+ log_println(6, "%d new client(s) responded, bumping pointers child=%d", rc, tmp_ptr->pid);
recv_msg(tmp_ptr->ctlsockfd, &msgType, buff, &msgLen);
tmp_ptr = tmp_ptr->next;
pre_ptr = pre_ptr->next;
@@ -901,7 +902,7 @@
sprintf(buff, "v%s", VERSION);
send_msg(ctlsockfd, MSG_LOGIN, buff, strlen(buff));

-log_println(3, "run_test() routine, asking for test_suite = %s", test_suite);
+ log_println(3, "run_test() routine, asking for test_suite = %s", test_suite);
send_msg(ctlsockfd, MSG_LOGIN, test_suite, strlen(test_suite));
/* if ((n = initialize_tests(ctlsockfd, &testopt, conn_options))) {
log_println(0, "ERROR: Tests initialization failed (%d)", n);
@@ -924,32 +925,40 @@
}

/* alarm(15); */
- log_println(6, "Setting 15 sec alarm for middlebox test");
- if (test_mid(ctlsockfd, agent, &*testopt, conn_options, &s2c2spd)) {
- log_println(0, "Middlebox test FAILED!");
+ log_println(6, "Starting middlebox test");
+ if ((ret = test_mid(ctlsockfd, agent, &*testopt, conn_options, &s2c2spd)) != 0) {
+ if (ret < 0)
+ log_println(6, "Middlebox test failed with rc=%d", ret);
+ log_println(0, "Middlebox test FAILED!, rc=%d", ret);
testopt->midopt = TOPT_DISABLED;
}

/* alarm(20); */
- log_println(6, "re-Setting 20 sec alarm for simple firewall test");
- if (test_sfw_srv(ctlsockfd, agent, &*testopt, conn_options)) {
- log_println(0, "Simple firewall test FAILED!");
+ log_println(6, "Starting simple firewall test");
+ if ((ret = test_sfw_srv(ctlsockfd, agent, &*testopt, conn_options)) != 0) {
+ if (ret < 0)
+ log_println(6, "SFW test failed with rc=%d", ret);
+ log_println(0, "Simple firewall test FAILED!, rc=%d", ret);
testopt->sfwopt = TOPT_DISABLED;
}

/* alarm(25); */
- log_println(6, "re-Setting 20 sec alarm for c2s throughput test");
- if (test_c2s(ctlsockfd, agent, &*testopt, conn_options, &c2sspd, set_buff, window, autotune,
- device, &options, record_reverse, count_vars, spds, &spd_index)) {
- log_println(0, "C2S throughput test FAILED!");
+ log_println(6, "Starting c2s throughput test");
+ if ((ret = test_c2s(ctlsockfd, agent, &*testopt, conn_options, &c2sspd, set_buff, window, autotune,
+ device, &options, record_reverse, count_vars, spds, &spd_index)) != 0) {
+ if (ret < 0)
+ log_println(6, "C2S test failed with rc=%d", ret);
+ log_println(0, "C2S throughput test FAILED!, rc=%d", ret);
testopt->c2sopt = TOPT_DISABLED;
}

/* alarm(25); */
- log_println(6, "re-Setting 20 sec alarm for s2c throughput test");
- if (test_s2c(ctlsockfd, agent, &*testopt, conn_options, &s2cspd, set_buff, window, autotune,
- device, &options, spds, &spd_index, count_vars, &peaks)) {
- log_println(0, "S2C throughput test FAILED!");
+ log_println(6, "Starting s2c throughput test");
+ if ((ret = test_s2c(ctlsockfd, agent, &*testopt, conn_options, &s2cspd, set_buff, window, autotune,
+ device, &options, spds, &spd_index, count_vars, &peaks)) != 0) {
+ if (ret < 0)
+ log_println(6, "S2C test failed with rc=%d", ret);
+ log_println(0, "S2C throughput test FAILED!, rc=%d", ret);
testopt->s2copt = TOPT_DISABLED;
}

@@ -1319,6 +1328,7 @@
DataDirName = NULL;

memset(&testopt, 0, sizeof(testopt));
+ /* sigset_t newmask, oldmask; */

#ifdef AF_INET6
#define GETOPT_LONG_INET6(x) "46"x
@@ -1577,10 +1587,13 @@
new.sa_handler = cleanup;

/* Grab all signals and run them through my cleanup routine. 2/24/05 */
+ /* sigemptyset(&newmask);
+ * sigemptyset(&oldmask); */
for (i=1; i<32; i++) {
if ((i == SIGKILL) || (i == SIGSTOP))
continue; /* these signals can't be caught */
sigaction(i, &new, NULL);
+ /* sigaddset(&newmask, i); */
}

/*
@@ -1695,6 +1708,7 @@

if (sig13 == 1) {
log_println(5, "todo: Handle SIGPIPE signal, terminate child?");
+ child_sig(0);
sig13 = 0;
}

@@ -1703,9 +1717,69 @@
child_sig(0);
}

+ if ((multiple == 1) && (mclients < max_clients) && (waiting >= max_clients)) {
+ /* this condition means that there are clients waiting and there are open slots
+ * in the test queue, so dispatch another client.
+ * RAC 12/11/09
+ */
+ log_println(5, "Empty slot in test queue, find new client to dispatch");
+ /* tmp_ptr = head_ptr; */
+ mchild = head_ptr;
+ i = 0;
+ while (mchild != NULL) {
+ i++; /* Keep count of how many times we go through this loop */
+ log_println(2, "walking queue look for non-running client current=%d, running=%d, next=0x%x",
+ mchild->pid, mchild->running, mchild->next);
+ if (mchild->running == 0) {
+ /* mchild = tmp_ptr; */
+ log_println(6, "found non-running client %d, update queue and dispatch this client",
+ mchild->pid);
+ break;
+ }
+ mchild = mchild->next;
+ }
+ if (i > max_clients) {
+ log_println(6, "walked through running client list, no empty slots!");
+ continue;
+ }
+
+ /* if ((mchild->next == NULL) && (mchild->running == 0))
+ * mchild = tmp_ptr;
+ * if (mchild != head_ptr) {
+ */
+ tmp_ptr = mchild;
+ /* update queued clients, send message to client when it moves
+ * up in the queue enough to get closer to running a test. This happens
+ * when the client falls into the next lower maxquee bin
+ * RAC 3/21/10
+ */
+ int rac;
+ if (waiting > (2*max_clients)) {
+ for (i=max_clients; i<=waiting; i++) {
+ if (tmp_ptr == NULL)
+ break;
+ if (i == (2*max_clients)) {
+ rac = send_msg(tmp_ptr->ctlsockfd, SRV_QUEUE, "1", 1);
+ log_println(6, "sent 45 sec update message to client %d on fd=%d, send_msg() returned %d",
+ tmp_ptr->pid, tmp_ptr->ctlsockfd, rac);
+ }
+ if (i == (3*max_clients)) {
+ rac = send_msg(tmp_ptr->ctlsockfd, SRV_QUEUE, "2", 1);
+ log_println(6, "sent 90 sec update message to client %d on fd=%d, send_msg() returned %d",
+ tmp_ptr->pid, tmp_ptr->ctlsockfd, rac);
+ }
+ tmp_ptr = tmp_ptr->next;
+ }
+ }
+ goto dispatch_client;
+ }
+
if ((waiting < 0) || (mclients < 0)) {
log_println(6, "Fault: Negative number of clents waiting=%d, mclients=%d, nuke them", waiting, mclients);
while (head_ptr != NULL) {
+ send_msg(head_ptr->ctlsockfd, SRV_QUEUE, "9888", 4);
+ shutdown(head_ptr->ctlsockfd, SHUT_WR);
+ close(head_ptr->ctlsockfd);
tpid = head_ptr->pid;
child_sig(-1);
kill(tpid, SIGTERM);
@@ -1713,11 +1787,15 @@
}
waiting = 0;
mclients = 0;
+ continue;
}

if ((waiting == 0) && (head_ptr != NULL)) {
- log_println(6, "Fault: Something in queue, but no waiting clients");
+ log_println(6, "Fault: Something [%d] in queue, but no waiting clients", head_ptr->pid);
while (head_ptr != NULL) {
+ send_msg(head_ptr->ctlsockfd, SRV_QUEUE, "9777", 4);
+ shutdown(head_ptr->ctlsockfd, SHUT_WR);
+ close(head_ptr->ctlsockfd);
tpid = head_ptr->pid;
child_sig(-1);
kill(tpid, SIGTERM);
@@ -1725,64 +1803,33 @@
}
waiting = 0;
mclients = 0;
+ continue;
}

if (head_ptr != NULL) {
if ((time(0) - head_ptr->stime) > 60) {
- log_println(6, "Fault: Something in queue, but child has exceeded wait time");
+ log_println(6, "Fault: Something in queue, but child %d (fd=%d) has exceeded wait time",
+ head_ptr->pid, head_ptr->ctlsockfd);
+ /* Should send new 9977 'test aborted' signal to client. Using this
+ * for now.
+ *
+ * rac 3/26/10
+ */
+ log_println(6, "pid=%d, client='%s', stime=%ld, qtime=%ld now=%ld", head_ptr->pid, head_ptr->addr,
+ head_ptr->stime, head_ptr->qtime, time(0));
+ log_println(6, "pipe-fd=%d, running=%d, ctlsockfd=%d, client-type=%d, tests='%s'",
+ head_ptr->pipe, head_ptr->running, head_ptr->ctlsockfd,
+ head_ptr->oldclient, head_ptr->tests);
+ send_msg(head_ptr->ctlsockfd, SRV_QUEUE, "9666", 4);
+ shutdown(head_ptr->ctlsockfd, SHUT_WR);
+ close(head_ptr->ctlsockfd);
tpid = head_ptr->pid;
child_sig(-1);
kill(tpid, SIGTERM);
child_sig(tpid);
+ continue;
}
}
-
- if ((multiple == 1) && (mclients < max_clients) && (waiting >= max_clients)) {
- /* this condition means that there are clients waiting and there are open slots
- * in the test queue, so dispatch another client.
- * RAC 12/11/09
- */
- log_println(5, "Empty slot in test queue, find new client to dispatch");
- tmp_ptr = head_ptr;
- mchild = head_ptr;
- while (tmp_ptr != NULL) {
- log_println(2, "walking queue look for non-running client current=%d, running=%d, next=0x%x",
- tmp_ptr->pid, tmp_ptr->running, tmp_ptr->next);
- if (tmp_ptr->running == 0) {
- mchild = tmp_ptr;
- break;
- }
- tmp_ptr = tmp_ptr->next;
- }
- if ((tmp_ptr->next == NULL) && (tmp_ptr->running == 0))
- mchild = tmp_ptr;
- if (mchild != head_ptr) {
- tmp_ptr = mchild;
- /* update queued clients, send message to client when it moves
- * up in the queue enough to get closer to running a test. This happens
- * when the client falls into the next lower maxquee bin
- * RAC 3/21/10
- */
- if (waiting > (2*max_clients)) {
- for (i=max_clients; i<=waiting; i++) {
- if (tmp_ptr == NULL)
- break;
- if (i == (2*max_clients)) {
- log_println(6, "Updating client list position client %d moved now 45 sec away",
- tmp_ptr->pid);
- send_msg(tmp_ptr->ctlsockfd, SRV_QUEUE, "1", 1);
- }
- if (i == (3*max_clients)) {
- log_println(6, "Updating client list position client %d moved now 90 sec away",
- tmp_ptr->pid);
- send_msg(tmp_ptr->ctlsockfd, SRV_QUEUE, "2", 1);
- }
- tmp_ptr = tmp_ptr->next;
- }
- goto multi_client;
- }
- }
- }

if ((multiple == 1) && (mclients > waiting)) {
log_println(5, "Multi-client mode has uncaught terminated clients mclient=%d, waiting=%d", mclients, waiting);
@@ -1799,16 +1846,19 @@
sel_tv.tv_sec = 3;
sel_tv.tv_usec = 0;
log_println(3, "Waiting for new connection, timer running");
+sel_11:
rc = select(listenfd+1, &rfd, NULL, NULL, &sel_tv);
if ((rc == -1) && (errno == EINTR))
/* continue; */ /* a signal caused the select() to exit, re-enter loop & check */
- continue;
+ goto sel_11;
tt = time(0);
+
+/*
if (head_ptr != NULL) {
log_println(3, "now = %ld Process started at %ld, run time = %ld",
tt, head_ptr->stime, (tt - head_ptr->stime));
if ((tt - head_ptr->stime) > 60) {
- /* process is stuck at the front of the queue. */
+ /-* process is stuck at the front of the queue. *-/
fp = fopen(get_logfile(),"a");
if (fp != NULL) {
fprintf(fp, "%d children waiting in queue: Killing off stuck process %d at %15.15s\n",
@@ -1817,12 +1867,17 @@
}
log_println(6, "%d children waiting in queue: Killing off stuck process %d at %15.15s\n",
waiting, head_ptr->pid, ctime(&tt)+4);
- /* kill(tmp_ptr->pid, SIGTERM); */
- /* kill(head_ptr->pid, SIGCHLD); */
- /* clean up more and inform the client that the test is ending
+ -* kill(tmp_ptr->pid, SIGTERM); *-
+ -* kill(head_ptr->pid, SIGCHLD); *-
+ -* clean up more and inform the client that the test is ending
* rac 2/27/10
- */
- send_msg(head_ptr->ctlsockfd, SRV_QUEUE, "9999", 4);
+ *-
+ log_println(6, "pid=%d, client='%s', stime=%ld, qtime=%ld now=%ld", head_ptr->pid, head_ptr->addr,
+ head_ptr->stime, head_ptr->qtime, time(0));
+ log_println(6, "pipe-fd=%d, running=%d, ctlsockfd=%d, client-type=%d, tests='%s'",
+ head_ptr->pipe, head_ptr->running, head_ptr->ctlsockfd,
+ head_ptr->oldclient, head_ptr->tests);
+ send_msg(head_ptr->ctlsockfd, SRV_QUEUE, "9555", 4);
shutdown(head_ptr->ctlsockfd, SHUT_WR);
close(head_ptr->ctlsockfd);
tpid = head_ptr->pid;
@@ -1833,22 +1888,26 @@
if (((multiple == 0) && (waiting == 1)) ||
((multiple == 1) && (mclients == 0)))
testing = 0;
- /* should not decrement waiting here, it was decrementd in the child_sig() routine
+ -* should not decrement waiting here, it was decrementd in the child_sig() routine
* RAC 2/27/09
- */
- /* if (waiting > 0)
+ *-
+ -* if (waiting > 0)
* waiting--;
- */
+ *-
if (waiting == 0)
mclients = 0;
}
}
+ */
}
else {
/* Nothing is in the queue, so wait forever until a new connection request arrives */
log_println(3, "Timer not running, waiting for new connection");
mclients = 0;
+sel_12:
rc = select(listenfd+1, &rfd, NULL, NULL, NULL);
+ if ((rc == -1) && (errno == EINTR))
+ goto sel_12; /* a signal caused the select() to exit, re-enter loop & check */
}

if (rc < 0) {
@@ -1875,18 +1934,26 @@
goto ChldRdy;
/* } */
clilen = sizeof(cli_addr);
- for (;;) {
+ memset(&cli_addr, 0, clilen);
+ log_println(6, "Select() found %d clients ready, highest fd=%d", rc, listenfd);
+ if (rc > 1) {
+ for (i=3; i<=listenfd; i++) {
+ if (FD_ISSET(i, &rfd)) {
+ listenfd = i;
+ break;
+ }
+ }
+ }
+ for (i=0;i<5;i++) {
+ ctlsockfd = 0;
ctlsockfd = accept(listenfd, (struct sockaddr *) &cli_addr, &clilen);
- if (ctlsockfd < 0) {
- if (errno == EINTR)
+ if ((ctlsockfd == -1) && (errno == EINTR))
continue; /*sig child */
- perror("Web100srv server: accept error");
- break;
- }
size_t tmpstrlen = sizeof(tmpstr);
+ memset(tmpstr, 0, tmpstrlen);
I2Addr tmp_addr = I2AddrBySockFD(get_errhandle(), ctlsockfd, False);
I2AddrNodeName(tmp_addr, tmpstr, &tmpstrlen);
- I2AddrFree(tmp_addr);
+ /* I2AddrFree(tmp_addr); */
log_println(4, "New connection received from 0x%x [%s] sockfd=%d.", tmp_addr, tmpstr, ctlsockfd);
break;
}
@@ -1925,7 +1992,8 @@
*/
}

- pipe(chld_pipe);
+ if (pipe(chld_pipe) == -1)
+ log_println(6, "pipe() failed errno=%d", errno);
chld_pid = fork();

switch (chld_pid) {
@@ -1939,7 +2007,7 @@
log_println(5, "Parent process spawned child = %d", chld_pid);
log_println(5, "Parent thinks pipe() returned fd0=%d, fd1=%d", chld_pipe[0], chld_pipe[1]);

- close(chld_pipe[0]);
+ /* close(chld_pipe[0]); */

/* Check to see if we have more than max_clients waiting in the queue
* If so, tell them to go away.
@@ -1947,9 +2015,11 @@
*/
if (((multiple == 0) && (waiting >= (max_clients-1))) ||
((multiple == 1) && (waiting >= ((4*max_clients)-1)))) {
- log_println(0, "Too many clients/mclients (%d) waiting to be served, Please try again later.", chld_pid);
+ log_println(0, "Too many clients/mclients (%d) waiting to be served, Please try again later.",
+ chld_pid);
sprintf(tmpstr, "9988");
send_msg(ctlsockfd, SRV_QUEUE, tmpstr, strlen(tmpstr));
+ close(chld_pipe[0]);
close(chld_pipe[1]);
shutdown(ctlsockfd, SHUT_WR);
close(ctlsockfd);
@@ -1964,6 +2034,7 @@
t_opts = initialize_tests(ctlsockfd, &testopt, test_suite);
if (t_opts < 1) {
log_println(3, "Invalid test suite string '%s' received, terminate child", test_suite);
+ close(chld_pipe[0]);
close(chld_pipe[1]);
shutdown(ctlsockfd, SHUT_WR);
close(ctlsockfd);
@@ -1979,6 +2050,7 @@
continue;
}
log_println(6, "creating new child - semaphore locked");
+ /*sigprocmask(SIG_BLOCK, &newmask, &oldmask); */
new_child->pid = chld_pid;
strncpy(new_child->addr, rmt_host, strlen(rmt_host));
strncpy(new_child->host, name, strlen(name));
@@ -1997,6 +2069,7 @@
memset(new_child->tests, 0, sizeof(test_suite));
memcpy(new_child->tests, test_suite, strlen(test_suite));
new_child->next = NULL;
+ /* sigprocmask(SIG_SETMASK, &oldmask, NULL); */
sem_post(&ndtq);
log_println(6, "Free'd ndtq semaphore lock - 1");
if (multiple == 1)
@@ -2006,12 +2079,14 @@
log_println(3, "initialize_tests returned old/new client = %d, test_suite = %s",
new_child->oldclient, new_child->tests);

+ /* close(chld_pipe[0]); */
+
if ((testing == 1) && (queue == 0)) {
log_println(3, "queuing disabled and testing in progress, tell client no");
- send_msg(ctlsockfd, SRV_QUEUE, "9999", 4);
+ send_msg(new_child->ctlsockfd, SRV_QUEUE, "9444", 4);
close(chld_pipe[1]);
- shutdown(ctlsockfd, SHUT_WR);
- close(ctlsockfd);
+ shutdown(new_child->ctlsockfd, SHUT_WR);
+ close(new_child->ctlsockfd);
log_println(6, "no queuing, free new_child=0x%x", new_child);
free(new_child);
continue;
@@ -2057,7 +2132,7 @@
log_println(3, "%d clients waiting, telling client (%d) testing will begin within %d minutes",
(waiting-1), tmp_ptr->pid, (waiting-1));
sprintf(tmpstr, "%d", (waiting-1));
- send_msg(ctlsockfd, SRV_QUEUE, tmpstr, strlen(tmpstr));
+ send_msg(tmp_ptr->ctlsockfd, SRV_QUEUE, tmpstr, strlen(tmpstr));
continue;
}

@@ -2141,22 +2216,29 @@
* request.
*/

+dispatch_client:
memset(tmpstr, 0, sizeof(tmpstr));
if (multiple == 1) {
- log_println(3, "New mclient '%d'(%d) asking for service", mclients, mchild->pid);
if (mchild == NULL)
mchild = head_ptr;
+ if (mchild->running == 1)
+ continue;
+ log_println(3, "New mclient '%d'(%d) asking for service", mclients, mchild->pid);
mchild->stime = time(0);
mchild->running = 1;
mclients++;
sprintf(tmpstr, "go %d %s", t_opts, mchild->tests);
+ log_println(5, "sending 'GO' signal to client msg='%s'", tmpstr);
send_msg(mchild->ctlsockfd, SRV_QUEUE, "0", 1);
for (i=0; i<5; i++) {
rc = write(mchild->pipe, tmpstr, strlen(tmpstr));
+ log_println(6, "write(%d) returned %d, errno=%d", mchild->pid, rc, errno);
if ((rc == -1) && (errno == EINTR))
continue;
if (rc == strlen(tmpstr))
break;
+ log_println(6, "Failed to write 'GO' message to client %d, reason=%d, errno=%d",
+ mchild->pid, rc, errno);
/* TODO: handle other error conditions */
}
close(mchild->pipe);
@@ -2166,6 +2248,7 @@
head_ptr->stime = time(0);
head_ptr->running = 1;
sprintf(tmpstr, "go %d %s", t_opts, head_ptr->tests);
+ log_println(5, "sending 'GO' signal to client msg='%s'", tmpstr);
send_msg(head_ptr->ctlsockfd, SRV_QUEUE, "0", 1);
for (i=0; i<5; i++) {
rc = write(head_ptr->pipe, tmpstr, strlen(tmpstr));
@@ -2205,10 +2288,11 @@
* RAC 3/18/10
*/
rc = read(chld_pipe[0], buff, 32);
+ log_println(6, "Child %d received '%s' from parent", getpid(), buff);
if ((rc == -1) && (errno == EINTR))
continue;
if (strncmp(buff, "go", 2) == 0) {
- log_println(6, "Got 'go' signal from parent, ready to start testing");
+ log_println(6, "Got 'go' signal from parent, ready to start testing %d", getpid());
break;
}
}

