Incident Summary
On April 17, between 8:00 a.m. and 12:15 p.m. (Ecuador time), a platform-level issue caused intermittent errors in the message delivery service.
Message sending within the platform was directly affected, and some processes failed occasionally. Because the errors were intermittent, the service was never completely unavailable, but its behavior was inconsistent.
Impact
The incident impacted users consuming the message delivery service, causing intermittent failures in the querying and management of phone numbers associated with WABA accounts.
The impact was classified as medium, as not all requests failed and the service was not fully unavailable.
Detection
The issue was identified through error reports and service monitoring, which showed an irregular failure rate in responses.
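As an illustration of how an irregular failure rate can be surfaced, the following is a minimal sketch of a sliding-window check. It does not describe the platform's actual monitoring: the class name FailureRateWindow, the window size, and the threshold are all hypothetical values chosen for the example.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the failure rate over the most recent `size` responses (hypothetical helper)."""

    def __init__(self, size: int, threshold: float) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.threshold = threshold
        # True marks a failed response; old entries drop off automatically.
        self._results = deque(maxlen=size)

    def record(self, failed: bool) -> None:
        self._results.append(failed)

    def failure_rate(self) -> float:
        if not self._results:
            return 0.0
        return sum(self._results) / len(self._results)

    def is_alerting(self) -> bool:
        # Alert only once the window is full, to avoid noise from tiny samples.
        return (
            len(self._results) == self._results.maxlen
            and self.failure_rate() >= self.threshold
        )
```

A rate computed over a rolling window is useful for intermittent failures because a partial error rate can trigger an alert even when the service never goes fully down.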
Response
Once the incident was detected, the team analyzed recent platform behavior and identified a recently deployed change as the potential cause.
The change was immediately rolled back, and a fix was deployed across all clusters, accompanied by monitoring to validate service stability.
Root Cause
The incident was caused by a recent platform-level change that introduced unexpected behavior in the message delivery service, resulting in intermittent errors.
Resolution and Preventive Measures
Applied Solution:
Rollback of the change that caused the incident
Full deployment of the fix across all clusters
Validation of endpoint stability
Preventive Measures:
Strengthen pre-deployment testing for critical endpoints
Implement stricter monitoring to detect intermittent errors
Apply progressive deployments (controlled rollout)
Improve post-deployment validation
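The progressive deployment and post-deployment validation measures above could be combined along the following lines. This is only a sketch under stated assumptions: the function progressive_rollout, its callbacks (deploy, error_rate, rollback), the stage percentages, and the 1% error budget are hypothetical and are not taken from the platform's tooling.

```python
from typing import Callable, Sequence


def progressive_rollout(
    stages: Sequence[int],
    deploy: Callable[[int], None],
    error_rate: Callable[[], float],
    rollback: Callable[[], None],
    max_error_rate: float = 0.01,
) -> bool:
    """Deploy to increasing traffic percentages; roll back at the first unhealthy stage.

    All callbacks are placeholders for whatever the deployment tooling provides.
    Returns True if every stage passed validation, False if a rollback occurred.
    """
    for percent in stages:
        deploy(percent)
        # Post-deployment validation: check the observed error rate before widening.
        if error_rate() > max_error_rate:
            rollback()
            return False
    return True
```

For example, calling it with stages such as [5, 25, 100] would expose the change to a small share of traffic first, so an intermittent failure would stop the rollout before it reached all clusters.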