As some of you may be aware, we had a service-impacting issue on 2/14/2017.  At approximately 5:40PM, the data center experienced a utility power outage.  At that point, the data center's UPS should have taken over the responsibility of providing power.  Due to an as-yet-unknown failure in the UPS system, this did not happen.  Utility power was restored quickly.  The UPS vendor was dispatched to provide assistance and produce a Root Cause Analysis.  We are awaiting further information from the data center maintenance staff on this.  For most shared and dedicated server customers, the return of utility power restored service.

Virtual Private Servers
An additional anomaly on the Shared Storage Subsystem prevented it from coming up cleanly, which caused further downtime for these services.  The issue was escalated to both the storage vendor and the virtualization software vendor (EMC and VMware) to reach resolution.  It was discovered that one of the storage processors did not boot cleanly after a couple of attempts and required manual intervention.  Following this intervention, the storage subsystem recovered and all virtual servers came online.
As a result of the hard crash, database corruption was discovered this morning on MySQL.  It was necessary to roll the data back to the previous day's backup, as the transaction logs could not be cleared without compromising the integrity of the data.

Official response - Power issues

At approximately 17:33 EST, the Atlanta Data Center experienced a brief utility power sag from Georgia Power, and the Building Monitoring System began reporting loss of utility power. Local CSC staff began investigating and discovered that one of our UPS modules had failed to operate correctly, resulting in a brief interruption of critical power to some of our customers' “A” power feeds. In some cases, PDU breakers tripped. The CSC Team then initiated the emergency response plan. The UPS module in question failed to carry its portion of the shared load during this event, causing the remaining modules to pick up the additional load. The other modules could not maintain proper output voltage, resulting in a brief loss of critical power to some customers on their “A” feeds. The remaining UPS modules recovered within milliseconds and have been stable. UPS vendors have been onsite inspecting this system and now have all modules back in service. All systems are currently operating as intended.

Wednesday, February 15, 2017
