Jelou - Incident History

Systems operating normally

Incident history

Dec 2025

RCA - Latency in bots associated with an automatic component update
  • Resolved
    This incident has been resolved.
  • Investigating

    1. Incident Summary

    On December 31, 2025, between 6:00 a.m. and 7:00 a.m. (local time), an automatic update of a system component temporarily impacted bot functionality, causing the bots not to respond correctly during that period.

    The technical team detected the situation in a timely manner and performed a manual rollback, successfully restoring normal service operation.
    The total duration of the incident was approximately 60 minutes.

    2. Impact

    During the period between 6:00 a.m. and 7:00 a.m., the bots experienced intermittent response behavior, which may have resulted in a suboptimal experience for end users.
    No data loss or full platform unavailability was identified.

    3. Detection

    The incident was identified by the technical team through system monitoring, after detecting variations in response times following an automatic update.

    Additionally, it was identified that an automated database backup process may have contributed to the latency observed during the same period.
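    The detection described above (flagging deviations in response times after a change) can be sketched as a simple baseline comparison. This is an illustrative sketch only; the window size and sigma threshold are assumptions, not Jelou's actual monitoring configuration:

    ```python
    from statistics import mean, stdev

    def latency_alert(baseline_ms, current_ms, sigmas=3.0):
        """Flag a response-time sample that deviates from a recent baseline.

        baseline_ms: latency samples (ms) taken before the change.
        current_ms:  the latest observed latency (ms).
        sigmas:      standard deviations above the mean that trigger an alert
                     (illustrative threshold, not a real Jelou setting).
        """
        mu = mean(baseline_ms)
        sigma = stdev(baseline_ms)
        return current_ms > mu + sigmas * sigma

    # Baseline around ~200 ms; a 950 ms sample after an update stands out.
    baseline = [190, 210, 205, 198, 202, 195, 208, 200]
    print(latency_alert(baseline, 950))   # True
    print(latency_alert(baseline, 205))   # False
    ```

    A real monitoring pipeline would run this continuously over a sliding window, but the core signal (current latency versus a pre-change baseline) is the same.
    
    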

    4. Response

    The technical team carried out the following actions:

    • Review of recent changes applied to the environment.

    • Identification of the automatic component update as the associated factor.

    • Execution of a manual rollback to return to a stable version.

    • Continuous monitoring to confirm service stability.

    The incident was successfully resolved by the internal team.

    5. Cause

    • Automatic update of a system component, which caused a temporary impact on bot response capacity.

    • Simultaneous execution of an automated database backup process, which may have increased latency.


    6. Solution

    • Manual rollback of the updated component.

    • Validation of proper bot functionality.

    • Confirmation of system stability following the intervention.

Error Messages in Executions
  • Postmortem

    RCA – 12/17/25

    1. Incident Summary

    On December 17, 2025, alerts were triggered related to the performance and availability of services dependent on the database. Jelou’s technical team prioritized the platform review and confirmed that certain query processes were experiencing performance degradation, impacting the proper operation of associated services.

    2. Impact

    During the incident, customers interacting with chats and integrated corporate services experienced slow responses and intermittent behavior, including increased wait times on some queries, partially affecting the overall user experience.

    3. Detection

    The incident was detected through performance monitoring systems and automated alerts, which indicated an abnormal increase in database response times. Additionally, support tickets were received from the customer support team reporting slowness and service inconsistencies.

    4. Response

    Jelou’s technical team conducted a thorough analysis of the database status, reviewing performance metrics, query load, and resource utilization. During this process, inefficiencies in query execution were identified, causing overload and negatively impacting system responsiveness.

    5. Root Cause

    The investigation determined that the incident was related to an issue in the database query optimization structure, resulting in certain operations executing inefficiently, increasing processing times and resource consumption.

    6. Resolution

    To resolve the incident, corrective adjustments were applied to the database configuration and optimization, restoring the necessary conditions for proper query processing. These actions normalized performance and restored stability to the affected services.
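    The class of fix described — making queries execute along an efficient access path rather than an expensive one — can be illustrated with a small, self-contained sketch. SQLite stands in for the production database here, and the table and index names are hypothetical:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, chat_id INTEGER, body TEXT)"
    )

    def plan(sql):
        # EXPLAIN QUERY PLAN reveals whether a query scans the whole table
        # or uses an index to find matching rows.
        return " ".join(row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

    query = "SELECT body FROM messages WHERE chat_id = 42"
    before = plan(query)   # a full table scan: slow under load

    # Corrective adjustment: add an index so the same query becomes a search.
    cur.execute("CREATE INDEX idx_messages_chat ON messages (chat_id)")
    after = plan(query)    # now an index search on idx_messages_chat

    print(before)
    print(after)
    ```

    The same diagnostic loop (inspect the plan, adjust indexing or configuration, confirm the plan changed) applies to any relational database, only with that engine's own EXPLAIN facility.
    
    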

    7. Mitigation

    As a preventive measure, change management and validation controls for the database were reinforced, along with continuous monitoring of critical performance metrics. In addition, early warning alert mechanisms remain active to proactively detect any anomalous behavior that could impact Jelou’s services.

  • Resolved
    This incident has been resolved.
  • Investigating

    Some error messages are currently appearing in certain chat executions.
    Our technical team is reviewing this behavior and making the necessary adjustments to ensure proper service operation.

Nov 2025

Global Cloudflare Connectivity Outage
  • Resolved
    This incident has been resolved.
  • Monitoring

    Cloudflare implemented a fix and is currently monitoring the result.

  • Identified

    Issue: Cloudflare is experiencing a global connectivity outage (7:10 am UTC-05).
    Impact: You may have trouble accessing apps.jelou.ai.
    Jelou Status: All Jelou services remain operational, but external access is affected.
    Temporary Solution: Please use our contingency URL to log in:
    https://apps.01lab.co/login
    Updates: Follow live updates here: https://status.jelou.ai
    Source: Cloudflare incident details: https://www.cloudflarestatus.com

Oct 2025

Issue with Page Loading
  • Postmortem

    RCA

    Incident Summary

    On October 30, 2025, between 15:50 and 18:39, a visual incident occurred on the platform during the deployment of the service hosted on Cloudflare.
    During that period, some static frontend assets were cached with an inconsistent version, causing visual errors in the application interface.

    The backend, APIs, and chatbots continued to operate normally, so the overall functionality and availability of the system were not affected.


    Impact

    The incident only affected the visual presentation of the platform, with no impact on service operations or user communication.
    Some users may have experienced inconsistencies in styles or interface appearance.
    The technical team immediately applied corrective actions by performing a cache purge and redeployment, restoring the normal visual display.


    Detection

    The issue was detected internally through technical team monitoring and reports of visual anomalies in the system.
    It was confirmed that the cached resource versions did not match the latest release, which caused the visual inconsistencies.


    Response

    Once the root cause was identified, the technical team performed a manual cache purge in Cloudflare and redeployed the frontend.
    This action regenerated the visual assets and restored the correct system appearance without impacting functionality.
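    A purge like the one described is typically a single call to Cloudflare's documented zone cache-purge endpoint. A minimal sketch of building that call (the zone ID and API token are placeholders, and `purge_everything` is one of several supported purge modes — a targeted purge by URL is also possible):

    ```python
    import json
    import urllib.request

    def build_purge_request(zone_id, api_token):
        """Build the Cloudflare cache-purge request (constructed, not sent).

        POST /zones/{zone_id}/purge_cache with {"purge_everything": true}
        invalidates every cached asset for the zone, so the next request
        fetches fresh copies of the frontend bundle from the origin.
        """
        url = f"https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache"
        body = json.dumps({"purge_everything": True}).encode()
        return urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={
                "Authorization": f"Bearer {api_token}",
                "Content-Type": "application/json",
            },
        )

    req = build_purge_request("ZONE_ID_PLACEHOLDER", "API_TOKEN_PLACEHOLDER")
    # urllib.request.urlopen(req) would execute the purge; omitted here.
    ```

    Pairing the purge with a redeploy, as the team did, ensures the regenerated assets are the ones that repopulate the cache.
    
    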


    Root Cause

    The incident originated from an inconsistent version of static frontend assets generated during deployment, which remained temporarily cached and served to some users.
    This affected only the visual layer of the system and did not impact operations.


    Resolution and Preventive Actions

    • A cache purge was performed and the frontend was redeployed, restoring the correct visual styles.

    • An automatic integrity verification process for assets will be implemented after each deployment.

    • Frontend monitoring will be strengthened to detect cache-related visual errors earlier.
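    The asset integrity check proposed above can be as simple as comparing content hashes of deployed files against the build manifest. A hedged sketch — the manifest format and file names here are assumptions, not Jelou's actual deployment tooling:

    ```python
    import hashlib
    import tempfile
    from pathlib import Path

    def verify_assets(build_dir, manifest):
        """Compare the sha256 of each deployed asset against the build manifest.

        manifest: {relative_path: expected_sha256_hex}. Any mismatch means a
        stale or inconsistent asset is being served and a purge is needed.
        """
        mismatched = []
        for rel_path, expected in manifest.items():
            data = (Path(build_dir) / rel_path).read_bytes()
            if hashlib.sha256(data).hexdigest() != expected:
                mismatched.append(rel_path)
        return mismatched

    # Demo with a temporary build directory (paths are illustrative).
    build = Path(tempfile.mkdtemp())
    (build / "app.js").write_bytes(b"console.log('v2')")
    fresh = hashlib.sha256(b"console.log('v2')").hexdigest()
    print(verify_assets(build, {"app.js": fresh}))   # []
    ```

    Run after each deployment, a check like this would have caught the cached-version mismatch before users saw it.
    
    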

  • Resolved

    This incident has been resolved.

    We will post the RCA in a later update.

  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Investigating

    We’re currently investigating this issue. Please avoid refreshing the page while we work on a fix.

Oct 2025 to Dec 2025
