Incident Post Mortem
Disruption of Service: February 12, 2019
At Glia, trusted customer success is one of our core values, and delivering the highest standard in system availability, performance, and security is our top priority. We know that trust begins with transparency, and with that in mind, we are providing the full root cause analysis for the disruption of service described below.
On February 12th, 2019 at 11:37 UTC, we deployed an update to our web applications, following normal operating procedure. As a result of this update, visitors using Internet Explorer 11 (IE11), an older browser, encountered errors when visiting our clients’ sites. These errors prevented the Glia visitor application from loading correctly and, in some cases, caused the client sites themselves to malfunction. After receiving reports of issues from our clients, we rolled back the changes, and by 15:00 UTC the issue was fully resolved.
A change that is incompatible with IE11 would not, on its own, have been enough to cause this incident, because we test the visitor application and its changes in IE11. However, our test site included a component that globally installed the polyfills required by our visitor application. Because those polyfills were already present, the issue did not appear during our checks on the test site before the production deployment.
Once in production, a problem of this severity would normally have been caught immediately by our monitoring, and the change would have been rolled back within minutes. However, the systems that monitor visitor application loading failures were not sensitive enough to detect an issue that occurred only in IE11, so we first learned of the problem from client reports.
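To illustrate why a failure confined to one browser can slip past an overall alert, here is a minimal sketch in TypeScript. It is not Glia's monitoring code: the `LoadEvent` shape, the function names, and any threshold passed in are assumptions made purely for illustration.

```typescript
// Hypothetical record of one attempt to load the visitor application.
interface LoadEvent {
  browser: string; // e.g. "IE11" or "Chrome"
  ok: boolean;     // true if the application loaded successfully
}

// Failure rate across all events, regardless of browser.
function overallFailureRate(events: LoadEvent[]): number {
  if (events.length === 0) return 0;
  const failures = events.filter((e) => !e.ok).length;
  return failures / events.length;
}

// Failure rate computed separately for each browser.
function failureRateByBrowser(events: LoadEvent[]): Map<string, number> {
  const totals = new Map<string, { failed: number; total: number }>();
  for (const e of events) {
    const t = totals.get(e.browser) || { failed: 0, total: 0 };
    t.total += 1;
    if (!e.ok) t.failed += 1;
    totals.set(e.browser, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, browser) => rates.set(browser, t.failed / t.total));
  return rates;
}

// Browsers whose own failure rate exceeds the (assumed) threshold.
function browsersOverThreshold(events: LoadEvent[], threshold: number): string[] {
  const flagged: string[] = [];
  failureRateByBrowser(events).forEach((rate, browser) => {
    if (rate > threshold) flagged.push(browser);
  });
  return flagged;
}
```

With made-up traffic in which a small share of visitors use IE11 and every one of their loads fails, the overall rate can stay below an alert threshold while the IE11 rate is 100%. Grouping by browser before applying the threshold is one way to surface that case.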
To prevent similar incidents in the future, we are strengthening our automated and manual testing procedures to guarantee compatibility with all supported browsers. In particular, we are adding more automated tests for IE11 and ensuring that our test site catches transpilation and polyfilling errors such as this one. We are also improving our monitors so that they quickly catch cases where our visitor application fails to load in a particular browser. Finally, we are exploring different mechanisms to minimize the impact that exceptions thrown by Glia's application have on the performance of our clients' websites.
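As a rough sketch of two of the ideas above (checking for required browser features explicitly, and containing exceptions so they do not reach the host page), consider the following TypeScript. Every name here is hypothetical, including `REQUIRED_GLOBALS`, `missingGlobals`, and `safeBootstrap`, and the list of required features is an assumption; it does not describe Glia's actual implementation.

```typescript
// Globals the application is ASSUMED to need; the real list is not in the report.
const REQUIRED_GLOBALS: string[] = ["Promise", "Map", "Symbol"];

// Returns the required globals that are missing from the given scope.
function missingGlobals(scope: Record<string, unknown>, required: string[]): string[] {
  return required.filter((name) => typeof scope[name] === "undefined");
}

// Starts the application only if every required global is present, and
// contains any runtime exception instead of letting it reach the host page.
// Returns true if start() completed, false otherwise.
function safeBootstrap(
  scope: Record<string, unknown>,
  required: string[],
  start: () => void,
  report: (message: string) => void
): boolean {
  const missing = missingGlobals(scope, required);
  if (missing.length > 0) {
    report("missing required features: " + missing.join(", "));
    return false;
  }
  try {
    start();
    return true;
  } catch (err) {
    report("bootstrap failed: " + String(err));
    return false;
  }
}
```

Passing the scope explicitly lets a test simulate a browser without polyfills, rather than relying on whatever the test page happens to install globally. Note that a `try`/`catch` of this kind only contains runtime exceptions: a syntax error in untranspiled code is raised when the script is parsed and cannot be caught from inside that same script.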
Our Commitment to You:
We sincerely apologize for the impact this incident had on you and your organization. It is our goal to provide world-class service to our customers, and we are continuously assessing and improving our tools, processes, and architecture in order to provide customers with the best service possible.