Jelou - Notice history

All systems operational

Jul 2026

No notices reported this month

Jun 2026

WhatsApp Messaging Service – Increased Latency & Delayed Delivery
  • Postmortem

    Incident Summary
    On June 30, 2026, between approximately 16:30 and 17:00 (GMT-5), one of the platform's highest-demand databases experienced roughly 30 minutes of intermittent service. Under high traffic volume, costly queries run by long-running sessions progressively consumed the available connections until none remained, causing transaction rejections and isolated processing delays. The root cause was identified and resolved, and the service was fully restored. As a preventive measure, automatic controls were reinforced on the highest-traffic databases.

    Impact
    During the incident window, some bots experienced intermittency and response delays across messaging channels (WhatsApp, Facebook, Instagram, and web). In isolated cases, certain interactions were not completed because the corresponding transactions were rejected during the temporary connection saturation. The impact was transient and limited to the event window; no data loss occurred and the service was fully restored following the fix.

    Detection
    The incident was automatically detected by our infrastructure monitoring systems, which immediately notified the technical team upon identifying anomalous behavior in the affected database, enabling a timely response.

    Response
    Once the incident was detected, the technical team investigated the state of the affected database and identified a set of long-running sessions executing costly queries that were holding the available connections until they were saturated, which affected transaction processing. As an immediate mitigation, those sessions were terminated, releasing the connections and restoring normal transaction processing. In parallel, as a stabilization measure, the cache layer was reinforced to reduce load on the affected database, and data access for the operations involved was optimized so that the service remained stable under the traffic volume.
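
    The mitigation step of picking out long-running sessions can be illustrated with a minimal sketch. It assumes session data has already been read from the database's monitoring views; the Session shape, the field names, and the 600-second limit are hypothetical and are not taken from the incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One database session as a monitoring query might report it (hypothetical shape)."""
    session_id: int
    running_seconds: float   # how long the current query has been running
    holds_connection: bool   # whether the session still occupies a connection


def select_sessions_to_terminate(sessions, max_running_seconds):
    """Return sessions that hold a connection and exceed the limit, longest first."""
    offenders = [
        s for s in sessions
        if s.holds_connection and s.running_seconds > max_running_seconds
    ]
    return sorted(offenders, key=lambda s: s.running_seconds, reverse=True)


# Illustrative data only; none of these values come from the incident.
sessions = [
    Session(101, 4.0, True),
    Session(102, 950.0, True),
    Session(103, 1800.0, True),
    Session(104, 2400.0, False),
]
to_terminate = select_sessions_to_terminate(sessions, max_running_seconds=600)
print([s.session_id for s in to_terminate])
```

    Sorting longest first means the sessions that have held a connection the longest are released first; the termination call itself depends on the database engine and is left out.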

    Root Cause
    The root cause of the incident was connection saturation in the affected database. Under high traffic volume, a set of costly query operations executed by long-running sessions held connections for longer than usual and progressively consumed them until the available limit was exhausted; as a result, new transactions were intermittently rejected. The trigger was the sustained traffic volume toward that database; no deployment or code change was identified as associated with the start of the incident.

    Resolution
    The immediate action was to identify and terminate the sessions running the costly operations that kept connections saturated, which released the connections and normalized the service within minutes. As a permanent fix, the processing of these query operations was optimized to reduce their cost and duration under high traffic; a review of the database configuration is also under evaluation. Additionally, automatic controls on the highest-traffic databases were reinforced with specific thresholds and alerts for connection usage, so that this type of saturation can be detected early and addressed before it impacts the service. Recovery was confirmed through continuous monitoring of connection and resource usage metrics, which remained stable following resolution.

    Preventive Measures

    • The processing of these query operations in the affected database was optimized, reducing their cost and duration under high traffic.

    • Automatic controls were reinforced with specific thresholds and alerts on connection usage, so that traffic spikes and this type of saturation are detected and escalated more quickly.

    • Additional database configuration adjustments are currently under evaluation.
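
    A threshold-based alert on connection usage, as described in the measures above, could look roughly like the following sketch. The 70% and 90% thresholds, the function names, and the sample figures are assumptions chosen for illustration; the actual values used on the platform are not stated in this postmortem.

```python
def connection_usage(active_connections, max_connections):
    """Return the fraction of the connection limit currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return active_connections / max_connections


def classify_usage(usage, warn_at=0.70, critical_at=0.90):
    """Map a usage fraction to an alert level (thresholds are hypothetical)."""
    if usage >= critical_at:
        return "critical"
    if usage >= warn_at:
        return "warning"
    return "ok"


# Illustrative samples: (active connections, connection limit).
samples = [(120, 500), (380, 500), (470, 500)]
levels = [classify_usage(connection_usage(a, m)) for a, m in samples]
print(levels)
```

    A warning level below the critical one leaves time to act while connections are still available, which is the early-detection goal the measures describe.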

  • Resolved
    This incident has been resolved.
  • Investigating

    Responses to WhatsApp messages are currently delayed. We are investigating this incident.

May 2026

WhatsApp Messaging Service – Increased Latency & Delayed Delivery
  • Postmortem

    Incident Summary

    An overload in the bot system limited its ability to process incoming messages in a timely manner. As the resources supporting the bot became saturated, wait times grew excessive and end users did not receive automated responses during the event.

    Impact

    The impact was limited to the interruption of service flows that depend on real-time message processing and user responses. Static or independent flows continued to run normally.

    Detection

    The incident was identified through automatic performance alerts and high latency in bot responses, in addition to reports from the support team and customers experiencing failures in digital signature flows. This confirmed service degradation in the processing of interactive messages.

    Response

    Once the issue was identified, the engineering team diagnosed the root cause and deployed a fix in under 30 minutes from escalation, increasing the performance and capacity of the resources supporting the bot to normalize the service.

    Root Cause

    The incident was triggered by two customers sending campaigns simultaneously. This produced a sudden, concentrated spike in activity that outpaced the automatic scaling mechanisms, so the system could not add capacity in time to contain the load and resources were temporarily saturated.

    Resolution

    As a definitive resolution, scaling controls were adjusted and the base performance capacity of the bot infrastructure was increased. This adjustment ensures the system has the necessary headroom to absorb mass events and sudden activity spikes without degrading service.
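
    Why extra base capacity helps when a spike outpaces autoscaling can be shown with a small simulation. This is a simplified model under stated assumptions: capacity grows by a fixed step per tick and only after a backlog is observed. All numbers and names are invented for illustration and do not describe the actual bot infrastructure.

```python
def simulate_peak_backlog(demand_per_tick, base_capacity, scale_step, max_capacity):
    """Return the largest backlog seen when capacity can grow by scale_step per tick."""
    capacity = base_capacity
    backlog = 0
    peak_backlog = 0
    for demand in demand_per_tick:
        # Process as much of the pending work as current capacity allows.
        work = backlog + demand
        processed = min(work, capacity)
        backlog = work - processed
        peak_backlog = max(peak_backlog, backlog)
        # The scaler reacts after observing a backlog, so new capacity
        # only becomes available on the next tick.
        if backlog > 0 and capacity < max_capacity:
            capacity = min(max_capacity, capacity + scale_step)
    return peak_backlog


# Hypothetical traffic: a calm baseline, then a concentrated three-tick spike.
demand = [100, 100, 900, 900, 900, 100, 100]

low_base = simulate_peak_backlog(demand, base_capacity=200, scale_step=100, max_capacity=1000)
high_base = simulate_peak_backlog(demand, base_capacity=1000, scale_step=100, max_capacity=1000)
print(low_base, high_base)
```

    In this model the low base capacity accumulates a backlog because each scaling step arrives a tick late, while the higher base absorbs the same spike without queuing.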

  • Resolved
    This incident has been resolved.
  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Investigating

    Responses to WhatsApp messages are currently delayed. We are investigating this incident.

Workflow Intermittency
  • Postmortem

    Incident Summary

    On May 19, 2026, during an automated process to harden the infrastructure and strengthen its controls, a temporary disruption occurred in some platform services. The intervention began at approximately 11:10 (UTC-5); the infrastructure team applied the corresponding fix at around 11:30, monitored behavior until 11:50, and confirmed full service stabilization at approximately 12:07.

    Impact

    The total period between the application of the change and full service stabilization was approximately one hour. During that interval, certain services related to workflows and integrations experienced intermittency and availability errors for approximately 30 minutes, while the team applied the fix and validated recovery. The impact was limited to temporary service availability. There was no data loss. There was no security compromise or unauthorized access.
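
    The "approximately one hour" figure can be checked against the approximate timestamps given in the Incident Summary (11:10, 11:30, 11:50, 12:07). The sketch below only does the arithmetic; the phase labels are shorthand introduced here, not terms from the incident report.

```python
from datetime import datetime

# Approximate timestamps (UTC-5) from the Incident Summary; labels are shorthand.
timeline = [
    ("change applied", "11:10"),
    ("fix applied", "11:30"),
    ("monitoring ended", "11:50"),
    ("stabilization confirmed", "12:07"),
]


def minutes_between(start, end):
    """Return whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Duration of each phase between consecutive timeline entries.
phase_minutes = [
    minutes_between(timeline[i][1], timeline[i + 1][1])
    for i in range(len(timeline) - 1)
]
total_minutes = minutes_between(timeline[0][1], timeline[-1][1])
print(phase_minutes, total_minutes)
```

    The phases come to 20, 20, and 17 minutes, for a total of 57 minutes from the change to confirmed stabilization.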

    Detection

    The behavior was flagged by the service Health Status controls, which notify via Slack and email upon detection of disruptions, allowing the infrastructure team to immediately begin reviewing the event.

    Response

    Once the behavior was identified, the infrastructure team analyzed the affected services and components, determined the origin of the event, applied the fix at 11:30, carried out validations and post-fix monitoring until 11:50, and confirmed full platform stabilization at 12:07.

    Root Cause

    During an automated process to harden the infrastructure and strengthen its controls, an internal connectivity dependency between platform services reacted in an unexpected way. The dependency was not recorded in sufficient detail in the configuration inventory, so the change caused a temporary degradation in communication between components and, consequently, the transient unavailability of the affected services during the stabilization period.
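
    One way to catch this kind of gap before a change is to compare the dependencies actually observed for each component against the configuration inventory. The sketch below assumes both are available as simple mappings; the component names, the data, and the function are hypothetical and do not describe the team's real tooling.

```python
def undocumented_dependencies(change_targets, observed_dependencies, inventory):
    """Return (component, dependency) pairs touched by a change but absent from the inventory."""
    missing = []
    for component in sorted(change_targets):
        documented = inventory.get(component, set())
        for dependency in sorted(observed_dependencies.get(component, set())):
            if dependency not in documented:
                missing.append((component, dependency))
    return missing


# Hypothetical inventory: what the documentation says each component depends on.
inventory = {
    "workflow-api": {"queue"},
    "integrations": {"workflow-api"},
}
# Hypothetical observations: what each component actually connects to.
observed = {
    "workflow-api": {"queue", "internal-gateway"},
    "integrations": {"workflow-api"},
}

gaps = undocumented_dependencies({"workflow-api", "integrations"}, observed, inventory)
print(gaps)
```

    Any pair returned marks a dependency that should be documented and tested before the change proceeds.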

    Resolution

    As part of the continuous improvement of controls already operated by the engineering team, the following mechanisms are being reinforced:

    • Deepening the validation controls applied before and after each infrastructure change, incorporating more thorough connectivity tests into the standard intervention protocol.

    • Expanding the scope of the monitoring and early alerting mechanisms already in use, with more agile escalation to on-call teams to further shorten detection and response times.

    • Strengthening the internal connectivity dependency inventory maintained by the team, expanding its level of detail so that each future change is systematically cross-referenced against it prior to implementation.

    • Reinforcing the incremental change protocol already in place, strengthening the practice of creating and validating new specific configurations before retiring previous ones.

    These measures enhance the operational maturity of the platform, reduce the risk of recurrence, and consolidate the resilience with which the team already manages infrastructure maintenance and strengthening activities. We sincerely apologize for any inconvenience caused and thank you for your understanding.

  • Resolved
    This incident has been resolved.
  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Identified

    We are working on a fix for this incident.

  • Investigating

    We are currently investigating an issue that may be affecting agent performance. Our technical team is actively reviewing the situation and will provide an update within the next 30 minutes. We apologize for any inconvenience this may cause.

May 2026 to Jul 2026
