Incident Summary
On June 30, 2026, between approximately 16:30 and 17:00 (GMT-5), one of the platform's highest-demand databases experienced intermittent availability for approximately 30 minutes. Under high traffic volume, costly query operations executed by long-running sessions progressively consumed the available connections until none remained, causing transaction rejections and isolated processing delays. The root cause was identified and resolved, and the service was fully restored. As a preventive measure, automatic controls were reinforced on the highest-traffic databases.
Impact
During the incident window, some bots experienced intermittent failures and response delays across messaging channels (WhatsApp, Facebook, Instagram, and web). In isolated cases, interactions were not completed; these correspond to transactions that were rejected during the temporary connection saturation. The impact was transient and limited to the incident window; no data loss occurred, and the service was fully restored following the fix.
Detection
The incident was automatically detected by our infrastructure monitoring systems, which immediately notified the technical team upon identifying anomalous behavior in the affected database, enabling a timely response.
Response
Once the incident was detected, the technical team investigated the state of the affected database and identified a set of long-running sessions executing costly query operations. These sessions were holding the available connections until the limit was saturated, which impacted transaction processing. As an immediate mitigation, those sessions were terminated, releasing the connections and restoring normal transaction processing. In parallel, as a stabilization measure, the cache layer was reinforced to reduce load on the affected database, and data access for the operations involved was optimized so that the service remained stable under the traffic volume.
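The mitigation step of picking out long-running sessions can be illustrated with a minimal sketch. The report does not name the database engine or the tooling used, so the Session record, its fields, and the 300-second cutoff below are hypothetical and only show the selection logic, not the commands actually run during the incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical snapshot of one database session (not a real engine's schema)."""
    session_id: int
    running_seconds: float  # how long the current query has been running
    holds_connection: bool


def select_sessions_to_terminate(sessions, max_seconds):
    """Return sessions holding a connection longer than max_seconds, longest first."""
    offenders = [
        s for s in sessions
        if s.holds_connection and s.running_seconds > max_seconds
    ]
    return sorted(offenders, key=lambda s: s.running_seconds, reverse=True)


# Illustrative data only; the cutoff of 300 seconds is an assumption.
snapshot = [
    Session(1, 2.0, True),
    Session(2, 950.0, True),
    Session(3, 1200.0, True),
    Session(4, 4000.0, False),  # not holding a connection, so never selected
]
for s in select_sessions_to_terminate(snapshot, max_seconds=300):
    print(s.session_id, s.running_seconds)
```

Sorting the longest-running sessions first reflects one reasonable choice: terminating those sessions releases the connections that have been held the longest.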
Root Cause
The root cause of the incident was connection saturation in the affected database. Under high traffic volume, a set of costly query operations executed by long-running sessions held connections for longer than usual and progressively consumed them until the available limit was exhausted; as a result, new transactions were intermittently rejected. The trigger was the sustained traffic volume toward that database; no deployment or code change was identified as associated with the start of the incident.
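The saturation mechanism described above can be reproduced in a small simulation. This is a simplified sketch under stated assumptions: the pool limit, arrival rate, and hold times are invented for illustration and are not the platform's real values, and simulate_pool is a hypothetical helper, not part of any production system.

```python
def simulate_pool(limit, ticks, arrivals_per_tick, hold_ticks,
                  slow_every, slow_hold_ticks):
    """Tick-based simulation of a bounded connection pool.

    Every request holds one connection for hold_ticks ticks, except that
    each slow_every-th request is a costly query held for slow_hold_ticks.
    Returns (peak connections in use, number of rejected requests).
    """
    release_at = []  # tick at which each in-use connection is released
    peak = rejected = seen = 0
    for tick in range(ticks):
        # Release connections whose query has finished.
        release_at = [t for t in release_at if t > tick]
        for _ in range(arrivals_per_tick):
            seen += 1
            if len(release_at) >= limit:
                rejected += 1  # pool exhausted: the transaction is rejected
                continue
            hold = slow_hold_ticks if seen % slow_every == 0 else hold_ticks
            release_at.append(tick + hold)
        peak = max(peak, len(release_at))
    return peak, rejected


# Same traffic volume in both runs; only the share of costly queries differs.
baseline = simulate_pool(limit=20, ticks=100, arrivals_per_tick=5,
                         hold_ticks=2, slow_every=10**9, slow_hold_ticks=50)
with_slow = simulate_pool(limit=20, ticks=100, arrivals_per_tick=5,
                          hold_ticks=2, slow_every=5, slow_hold_ticks=50)
print("baseline (peak, rejected):", baseline)
print("with costly queries (peak, rejected):", with_slow)
```

In the baseline run the pool stays well below its limit and nothing is rejected. Once one request in five holds its connection for much longer, the same traffic fills the pool and new requests start being rejected, which matches the pattern described in this section.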
Resolution
The immediate action was to identify and terminate the sessions running the costly operations that were keeping connections saturated, which released the connections and normalized the service within minutes. As a permanent fix, the processing of these query operations was optimized to reduce their cost and duration under high traffic. Additional database configuration adjustments are also under evaluation. Automatic controls on the highest-traffic databases were reinforced with specific thresholds and alerts on connection usage, so that this type of saturation can be detected early and addressed before it impacts the service. Recovery was confirmed through continuous monitoring of connection and resource usage metrics, which remained stable following resolution.
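A threshold check on connection usage, in the spirit of the reinforced controls, might look like the following sketch. The function name, the 70% and 90% ratios, and the alert labels are assumptions chosen for illustration; the report does not state the actual thresholds or the monitoring tool in use.

```python
def classify_connection_usage(in_use, limit, warn_ratio=0.70, critical_ratio=0.90):
    """Map current connection usage to an alert level.

    warn_ratio and critical_ratio are illustrative defaults, not the
    thresholds actually configured on the platform.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = in_use / limit
    if ratio >= critical_ratio:
        return "critical"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


# Hypothetical readings against an assumed limit of 100 connections.
for in_use in (40, 75, 95):
    print(in_use, classify_connection_usage(in_use, limit=100))
```

Using two levels leaves room to act on a warning before the pool is close to exhaustion, instead of alerting only at the point where transactions are already being rejected.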
Preventive Measures
The processing of these query operations in the affected database was optimized, reducing their cost and duration under high traffic.
Automatic controls were reinforced with specific thresholds and alerts on connection usage, accelerating detection and escalation of this type of saturation during traffic spikes.
Additional database configuration adjustments are currently under evaluation.