perfsonar-dev - Ann Arbor - Computer Room downtime issues
Subject: perfsonar development work
- From: "Jeff W. Boote" <>
- To: "" <>
- Subject: Ann Arbor - Computer Room downtime issues
- Date: Thu, 06 Dec 2007 10:39:47 -0700
Hi All,
First of all, I want to apologize for not notifying anyone of the 30-minute downtime we had on November 22 (the Thanksgiving holiday here in the U.S.). I was not reading email for the week before that, and did not read the announcement until it was too late to notify perfSONAR developers.
Second, I would like to let you know some of our plans to mitigate future problems in this area.
On the communications side, we are arranging for everyone on the perfsonar-dev email list to receive notifications of planned downtimes of the Ann Arbor machine room. (This usually happens only about once every 6-12 months. On November 22, the main switch for the room was being upgraded.)
Please realize that scheduling these downtimes is very difficult. Internet2 has developers and groups interacting all over the world, and coming up with times that won't affect anyone is likely impossible.
That said, if you see a planned downtime announcement that would adversely affect the project, please let me (), Eric () or Jason () know and we can inquire about rescheduling it. Please do NOT try to contact the person who sends the announcement - they will just have to figure out who to forward your request to.
We also intend to make some of the services more redundant to mitigate the effect of any downtime. (Especially since not all downtimes are planned.)
This is the list of current services run by Internet2 in the Ann Arbor computer room that perfSONAR developers depend upon to some degree:
Service                                    Fail-over
perfsonar-dev email list server            none
svn server                                 none
bugzilla                                   none
www.perfsonar.net (including downloads)    3-4 min
The email list server does not have a fail-over. If you attempt to send to the perfsonar-dev list, your own SMTP server will retry for several hours until it can get through.
It would be non-trivial to provide fail-over for svn and bugzilla due to database synchronization issues. We do not believe the cost/complexity of such a solution is worth it.
The web-site fail-over will happen by having a read-only version of the web-site on a backup server. Fail-over is accomplished by DNS changes. It takes about 3-4 minutes for the fail-over to take place, so if something happens to make the Ann Arbor site unreachable, it could result in 3-4 minutes of downtime for the web site.
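As a rough illustration of the decision behind a DNS-based fail-over, here is a minimal Python sketch. It is not the mechanism actually used for www.perfsonar.net; the function names, the TCP probe on port 80, and the example addresses (taken from the documentation range 192.0.2.0/24) are all assumptions made for the example. It only picks which address the DNS record should point at; updating the record itself is left out.

```python
import socket


def is_reachable(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical health check: a real deployment might probe HTTP instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def select_address(primary, backup, probe=is_reachable):
    """Return the address the DNS record should point at.

    The primary is preferred whenever the probe reports it reachable;
    otherwise the read-only backup is selected.
    """
    return primary if probe(primary) else backup


# Stubbed probes so the example runs without touching the network.
print(select_address("192.0.2.10", "192.0.2.20", probe=lambda host: True))
print(select_address("192.0.2.10", "192.0.2.20", probe=lambda host: False))
```

The gap between the primary becoming unreachable and clients following the new record is what shows up as the few minutes of downtime described above.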
The data will get to the backup server by way of regular rsync runs. (I believe the current update frequency is 1 hour.) It will not be possible to update the web site during a fail-over. (That is, the release management team cannot upload any new packages during this time.) However, the web site should still be usable to fetch current packages.
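For readers unfamiliar with this kind of mirroring, the following Python sketch builds the sort of rsync command a scheduled job might run. The paths, the host name backup.example.org, and the choice of the `-a` and `--delete` options are assumptions for illustration; the actual job on the Internet2 servers may look quite different.

```python
import shlex


def build_rsync_command(source_dir, dest, delete=True):
    """Return the argv list for a one-way mirror of source_dir to dest.

    -a        archive mode (recursive, preserves permissions and times)
    --delete  remove files on the mirror that no longer exist at the source
    """
    cmd = ["rsync", "-a"]
    if delete:
        cmd.append("--delete")
    # A trailing slash makes rsync copy the directory's contents rather
    # than creating the directory itself inside the destination.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(dest)
    return cmd


# Hypothetical source path and backup host, shown as a shell-ready string.
command = build_rsync_command("/srv/www", "backup.example.org:/srv/www")
print(shlex.join(command))
```

With an hourly schedule, the read-only copy could lag the primary by up to roughly an hour, plus however long the copy itself takes.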
Most of the web-site fail-over is already in place. The exception is the downloads area, which is held back by disk-space constraints. Additional disk space will be added so that the downloads can be included before the end of the year.
Thanks,
jeff