[arrl-odv:15703] Re:RE: FW: Network outtage

Bob:

Let me try to take your points in order:

1. We have already instituted procedures to ensure that this specific situation will not happen again. In restoring the service yesterday, one of the 2 domain controllers was relocated to the main distribution frame (MDF), the central point of the HQ network. The MDF will be supported by a gas-powered generator, which will be installed as soon as CNG (the local gas company) and, more specifically, the local town inspector finish their jobs.

2. The power outage was discovered by the building manager, who didn't notify the ISD staff. From their perspective and that of the monitoring software, the servers did power up properly and were running. The problem was with the operating software on the network switch when power was restored. The ISD staff should have been notified and would have come in to ensure everything was restored. We've reiterated the point that ISD must be notified.

3. Our current UPS capabilities support the orderly shutdown of the servers, which occurred as planned. The issue was created not on shutdown but on the powering up of the network switches. From all indications, although we're checking this now, there was a spike on powering up that came through the circuitry from the outside and bypassed the UPS protection due to a ground fault. This affected the configuration files on the network switch located in the data center, and there was a loss of network connectivity for the servers as a result. Additional UPS power wouldn't have impacted this situation.

If you have any additional questions, please let me know.

73,

Barry J. Shelley, N1VXY
Chief Financial Officer
ARRL, Inc.--The national association for Amateur Radio
225 Main St.
Newington, CT 06111
Telephone: (860) 594-0212
E-mail: bshelley@arrl.org
Web: www.arrl.org

________________________________
From: Bob Vallio [mailto:rbvallio@gmail.com]
Sent: Monday, June 18, 2007 2:02 PM
To: arrl-odv
Subject: [arrl-odv:15702] Re: FW: Network outtage

It would seem to be good to put into place a process whereby we will not get bitten by the same bug during the next power failure.

Some questions:

Is the power outage indication transmitted by any means to staff personnel? If not, can such a system be instituted?

UPS lasts for some finite time, generally long enough for servers and systems to be closed in an orderly and non-file-damaging method. Do we need more UPS capacity, or a method whereby the staff will be provided sufficient time to get to the office to perform shutdown routines, or both?

If these issues are already being addressed, please advise of the projected implementation time. If they are not being addressed, please advise of the plans to do so.

Thanks,

Bob -- W6RGG

On 6/18/07, Kramer, Harold, WJ1B <wj1b@arrl.org> wrote:

A follow-up from Don Durand:

Harold

_____________________________________________
From: Durand, Don
Sent: Monday, June 18, 2007 12:38 PM
To: Kramer, Harold, WJ1B
Subject: Network outtage

On Saturday, HQ experienced 2 power outages. When the network switches lost power (after draining the battery charges on the UPSs), some of the configuration files for these devices were corrupted. While we have processes in place to watch the network and determine its status at any given point in time, no alarms were triggered following the return of power. These processes are not designed to look at individual files.

This configuration corruption resulted in our inability to access our corporate network, including email and the Siebel membership database. [And Logbook of the World - HK]

These configuration files have been rebuilt, and all systems returned to normal status as of 12:30 PM, 06/18/2007.

Don