AM1 - Elevated Error Rate – Webpage Label Loading

Incident Report for BarTender Cloud Status Page

Postmortem

INCIDENT RCA
AM1 – Elevated Error Rate – Webpage Label Loading
Mon, March 16, 2026

Time: 09:30 to 09:45 PDT
Duration: Approximately 15 minutes
Severity: Degraded Performance

Customer Impact
Users on AM1 were unable to open labels from the Print Console during the incident window, experiencing timeouts and errors. No data was lost or corrupted, and all previously submitted print jobs were unaffected.

Root Cause
An unusually high volume of automated print-to-PDF requests arriving in a short period overloaded the print job scheduling service and triggered a restart. During the restart, the service temporarily lost visibility into in-progress jobs, which were then incorrectly flagged as failed. An automated recovery process could not distinguish these jobs from genuine failures and attempted to reprocess them, adding further load and extending the disruption.

Containment Actions

  1. Restarted the scheduling service to restore normal processing.
  2. Confirmed recovery through monitoring of job acceptance and error rates.
  3. Identified and flagged the source of the automated request surge for follow-up.
     

Corrective Actions

  1. Job state persistence. In-progress job state will be stored durably so that it survives service restarts.
  2. Resilient startup. The service will reload active job state before accepting new requests.
  3. Centralized retry logic. Recovery logic will be consolidated into a single component to prevent conflicting actions.
  4. Service readiness checks. Health checks will be added to confirm full initialization before processing begins.
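
As an illustration only, and not a description of the actual BarTender Cloud implementation, the persistence, startup, and readiness items above could be sketched in Python as follows. The class names DurableJobStore and Scheduler, the SQLite table layout, and the ready flag are all hypothetical. They are chosen only to show the pattern: persist each state change, reload in-progress jobs on startup, and refuse new work until that reload has finished.

```python
import sqlite3


class DurableJobStore:
    """Hypothetical store that keeps job state in a SQLite file."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "job_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self._db.commit()

    def set_state(self, job_id, state):
        # Commit on every change so the state survives a process restart.
        self._db.execute(
            "INSERT OR REPLACE INTO jobs (job_id, state) VALUES (?, ?)",
            (job_id, state),
        )
        self._db.commit()

    def in_progress(self):
        rows = self._db.execute(
            "SELECT job_id FROM jobs WHERE state = 'in_progress' ORDER BY job_id"
        ).fetchall()
        return [row[0] for row in rows]


class Scheduler:
    """Hypothetical scheduler that reloads state before accepting work."""

    def __init__(self, store):
        self._store = store
        self._active = set()
        self.ready = False  # a readiness probe would report this flag

    def start(self):
        # Reload jobs that were in progress before the restart, then
        # mark the service ready.
        self._active = set(self._store.in_progress())
        self.ready = True

    def submit(self, job_id):
        if not self.ready:
            raise RuntimeError("scheduler is not ready to accept jobs")
        self._store.set_state(job_id, "in_progress")
        self._active.add(job_id)

    def complete(self, job_id):
        self._store.set_state(job_id, "done")
        self._active.discard(job_id)

    def active_jobs(self):
        return sorted(self._active)
```

In this sketch, a scheduler created after a restart over the same database file would see the jobs that were still in progress, instead of losing track of them and treating them as failed.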

 
Preventive Actions

  • Rate controls to manage sudden automated request spikes.
  • Enhanced monitoring and alerting on scheduling service health and recovery states.
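
For readers unfamiliar with rate controls, one common technique is a token bucket, shown below as a minimal Python sketch. This report does not specify which mechanism will be used, so the choice of a token bucket, the TokenBucket class, and its parameters are assumptions made purely for illustration.

```python
import time


class TokenBucket:
    """Illustrative token bucket: allows short bursts, caps sustained rate."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._clock = clock           # injectable for testing
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        # Refill according to elapsed time, never exceeding capacity.
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller would reject or queue the request
```

With a bucket per automated client, a sudden surge would be rejected or queued once the burst allowance is spent, while normal traffic below the configured rate would pass through unchanged.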

 

We apologize for the disruption and are committed to implementing the improvements outlined above to prevent recurrence.

Posted Mar 18, 2026 - 04:52 PDT

Resolved

This incident has been resolved.
Posted Mar 16, 2026 - 10:45 PDT

Monitoring

The elevated error rate affecting webpage label loading in BTC - AM1 has been addressed. We are continuing to monitor the environment to ensure full stability.
Posted Mar 16, 2026 - 09:45 PDT

Investigating

We are currently experiencing an elevated error rate affecting webpage label loading in the BTC - AM1 environment. Our team is actively investigating the issue and will provide updates as soon as more information is available.
Posted Mar 16, 2026 - 09:30 PDT
This incident affected: AM1 - BarTender Cloud (AM1 - Print Console).