On January 18th at 4:17pm EST, Bloomerang teams received alerts that customers were unable to log in to the CRM application. Our incident management team began triage by 4:21pm EST, and additional teams were assembled to review their areas and identify the cause of the issue. The investigation determined that a production database change implemented at 4:10pm EST needed to be rolled back. Once the rollback plan was identified, teams restored services by 5:05pm EST.
The database change was preparatory work for a larger change planned for later in the evening. Neither the change nor the rollback procedure affected the integrity of customer data.
Per process, our teams remained engaged in the triage for an additional period to confirm restoration and perform any remaining cleanup. During the triage, our incident management team provided updates via the external status page; the incident and triage were marked as resolved by 5:59pm EST.
A change to our production services disrupted the login process.
The solution was a rollback of a database change implemented at 4:10pm EST.
Action Items | Tentative Completion Date |
---|---|
Internal teams responsible for the production change will perform a retrospective and identify a safe path forward to implement this particular product enhancement. | Friday, 1/19/2024 |
Identify additional checks for this type of change in the CRM staging environment. | Monday, 2/5/2024 |
The amount of time taken to roll back could be decreased; additional rollback procedures will be incorporated for pre-deployment changes. | Monday, 2/5/2024 |