Jelou - Incident History

All systems operational

Incident History

Apr 2026

Multimedia Reception Issues
  • Postmortem

    RCA
    1. Incident Summary
    On April 16, 2026, between 09:20 and 12:00 (Ecuador time), a partial degradation of the chatbot service was identified. The incident affected the visualization of multimedia resources (images, videos, and documents) within user interfaces, without compromising data integrity. The service was fully restored following corrective actions.

    2. Impact
    During the incident window, the following effects were observed:

    • Multimedia visualization: Temporarily unavailable due to an error in access link generation.

    • File uploads: Fully operational at all times, with no data loss or alteration.

    • Text messaging: Functionality remained 100% operational.

    • Biometric flow: Partially affected during image validation, video processing, and PDF generation; subsequently verified and normalized.

    3. Detection

    • 09:20 AM: Preventive monitoring systems detected an increase in access errors related to multimedia loading.

    • 09:45 AM: Impact was confirmed through end-user reports, aligning with monitoring indicators.

    4. Response and Resolution
    An inconsistency was identified in configuration parameters applied during an automated security policy update. This adjustment temporarily affected the generation of access links for multimedia resources. After corrective intervention and full validation of upload and visualization flows, the service was completely restored at 12:00 PM.

    Root Cause
    The incident originated from an automated adjustment in the security policies of file management services. This change applied a new set of permissions to temporary access links, resulting in the disruption of multimedia content visualization within the platform.
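
    The RCA does not name the underlying file store, but temporary access links of this kind are typically presigned URLs. As a minimal sketch, assuming an S3-compatible object store and the AWS SDK for JavaScript v3 (the region, bucket name, and expiry below are illustrative), link generation would look roughly like this; the signing role's read permission is exactly the kind of parameter an automated policy update can silently change:

    import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
    import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

    // Hypothetical region and bucket; the RCA does not disclose the real store.
    const s3 = new S3Client({ region: "us-east-1" });

    // Generate a temporary (presigned) access link for a multimedia object.
    // The signing credentials must hold read permission on the bucket: the
    // kind of permission an automated policy update can silently revoke.
    export async function multimediaLink(key: string): Promise<string> {
      const command = new GetObjectCommand({
        Bucket: "multimedia-bucket-example", // illustrative name
        Key: key,
      });
      // Links expire after one hour; tightening this window or the signing
      // role's permissions breaks visualization while uploads keep working.
      return getSignedUrl(s3, command, { expiresIn: 3600 });
    }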

    Preventive Actions
    To reduce the risk of recurrence:

    • Cross-validation: Automated multimedia rendering tests will be incorporated into security configuration deployment processes (see the sketch after this list).

    • Proactive observability: Monitoring mechanisms will be expanded to independently detect reading and visualization errors, complementing existing file upload alerts.

    • Policy review: Additional verification controls will be implemented for parameters modified by automated security processes.
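
    As a minimal sketch of the cross-validation check, assuming Node 18+ (for the global fetch) and reusing the hypothetical multimediaLink() helper from the sketch above: run after each security configuration deployment, it fails the rollout if a known test asset can no longer actually be read.

    // multimediaLink() is the hypothetical presigned-link helper sketched
    // in the Root Cause section above.
    import { multimediaLink } from "./multimedia-links";

    // Post-deploy smoke test: verify multimedia links are still readable.
    async function verifyMultimediaRendering(sampleKey: string): Promise<void> {
      const url = await multimediaLink(sampleKey);
      const response = await fetch(url);

      // An access-policy regression typically surfaces here as a 403.
      if (!response.ok) {
        throw new Error(`Read check failed: HTTP ${response.status} for ${sampleKey}`);
      }

      // Guard against links that resolve but return an error page instead
      // of the actual image, video, or PDF.
      const contentType = response.headers.get("content-type") ?? "";
      if (!/^(image|video|application)\//.test(contentType)) {
        throw new Error(`Unexpected content type: ${contentType}`);
      }
    }

    // Example: block the rollout if a known test asset is unreadable.
    verifyMultimediaRendering("smoke-tests/sample-image.png").catch((err) => {
      console.error(err);
      process.exit(1); // fail the deployment pipeline
    });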

  • Resolved
    This incident has been resolved.
  • Monitoring

    Image rendering for new uploads is now fully restored.
    The issue causing intermittent display failures has been identified and mitigated.

    We will restore visibility of images affected during the incident window.

  • Investigating

    We are currently experiencing an issue affecting the reception of multimedia content. Our team is investigating the problem and working on a resolution. We will provide updates as more information becomes available.

Mar 2026

🚨 Platform Service Interruption
  • Postmortem

    RCA

    Incident Summary

    On March 16, 2026, between 03:44 and 07:45 UTC, a temporary degradation occurred in the message processing and workflow services. The issue was triggered by an automatic failover event in the cache cluster. During the failover process, some services experienced an interruption in communication with the in-memory database, which resulted in temporary latency in request processing.

    Impact

    During the incident window, the system experienced temporary delays in the processing of some messages. Certain users may have noticed increased response times from the service.

    The cache hit rate temporarily decreased from its normal operating level, which led to increased processing times for some requests. Service performance gradually recovered as the system re-established normal operations.

    Detection

    The incident was proactively detected through automated monitoring alerts configured in the system. These alerts identified variations in the performance of the workflow service and in the operation of the cache cluster.

    The alerts enabled the infrastructure team to begin investigating the event promptly.
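
    The postmortem does not say which cache engine or monitoring stack is in place. Assuming a Redis-compatible cluster reached through the ioredis client, a probe for the hit-rate drop described above can be sketched from the server's own keyspace counters; the endpoint and the 80% threshold are illustrative.

    import Redis from "ioredis";

    // Hypothetical endpoint; the postmortem does not name the real cluster.
    const cache = new Redis({ host: "cache.internal.example", port: 6379 });

    // Read Redis' cumulative keyspace counters from INFO stats.
    async function keyspaceCounters(): Promise<{ hits: number; misses: number }> {
      const stats = await cache.info("stats");
      const hits = Number(/keyspace_hits:(\d+)/.exec(stats)?.[1] ?? 0);
      const misses = Number(/keyspace_misses:(\d+)/.exec(stats)?.[1] ?? 0);
      return { hits, misses };
    }

    // Poll once a minute and warn when the hit rate over the last interval
    // drops below an illustrative threshold, as happened during the failover.
    let previous = { hits: 0, misses: 0 };
    setInterval(async () => {
      const current = await keyspaceCounters();
      const hits = current.hits - previous.hits;
      const misses = current.misses - previous.misses;
      previous = current;

      const total = hits + misses;
      const rate = total === 0 ? 1 : hits / total;
      if (rate < 0.8) {
        console.warn(`Cache hit rate degraded to ${(rate * 100).toFixed(1)}%`);
        // A real setup would page the infrastructure team here.
      }
    }, 60_000);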

    Response

    The infrastructure team analyzed cache cluster metrics and dependent services to determine the source of the degradation. It was identified that an automatic failover event had occurred in the cluster as part of the high availability and protection mechanisms of the managed cache service.

    It was confirmed that this event was not related to any security incident. The system stabilized progressively as services re-established their connections and the cache returned to normal operational levels.

    Root Cause

    The incident was caused by the combination of the following factors:

    1. An automatic failover event occurred in the cache cluster at 03:44 UTC. These events are part of the high availability mechanisms of the managed cache service and may occur either as scheduled maintenance or in response to infrastructure conditions.

    2. During the failover process, platform services were required to re-establish connections with the new active node, which temporarily impacted cache efficiency.

    3. The temporary reduction in cache efficiency increased request processing times during the system stabilization period.

    Resolution and Improvements

    In response to this incident, the engineering team is implementing the following improvements:

    1. Optimization of service connectivity configurations with the cache cluster to ensure that failover events remain fully transparent to platform operations.

    2. Implementation of proactive multi-layer monitoring capable of detecting failover events early and tracking their impact across platform services.

    3. Deployment of a centralized observability dashboard to improve incident identification and accelerate resolution times for the infrastructure team.

    4. Optimization of service auto-reconnection mechanisms to minimize recovery time during failover events.
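
    On point 4, the cache technology is not named in this postmortem. Assuming a managed Redis cluster accessed through ioredis, the auto-reconnection tuning might look like the sketch below (endpoint and backoff values are illustrative). During a failover the demoted node briefly rejects writes with READONLY errors, and reconnecting on that error moves traffic to the new primary immediately instead of waiting for a timeout.

    import Redis from "ioredis";

    // Hypothetical connection settings; the endpoint and backoff values are
    // illustrative, not the platform's actual configuration.
    const cache = new Redis({
      host: "cache.internal.example",
      port: 6379,

      // Retry quickly with a capped backoff so a failover is absorbed in
      // seconds rather than stalling message processing.
      retryStrategy: (attempt) => Math.min(attempt * 200, 2000),

      // During a failover the demoted node answers writes with a READONLY
      // error; reconnecting immediately moves traffic to the new primary.
      reconnectOnError: (err) => err.message.includes("READONLY"),

      // Fail fast on commands queued against a dead connection instead of
      // letting requests pile up and inflate latency.
      maxRetriesPerRequest: 3,
    });

    cache.on("reconnecting", () => {
      console.warn("Cache connection lost; attempting to reconnect");
    });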

    These measures are designed to prevent the recurrence of similar situations and ensure service continuity during infrastructure maintenance events.

  • Resolved
    This incident has been resolved.
  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Investigating

    We are currently experiencing a service disruption affecting platform functionality. At the moment, chats may not respond and some platform features may be unavailable.

    Our technical team is already investigating the issue with high priority and working to restore the service as soon as possible.

    We will share another update with more information within the next 30 minutes.

    Thank you for your patience.

Feb 2026
