AM1 - Intermittent Printing

Incident Report for BarTender Cloud Status Page

Postmortem

RCA Report: Print Services Degradation

Incident Window: November 5 – 17, 2025 (EST)

Summary (Root Cause)

Between November 5 and 17, some customers experienced intermittent print delays, printer list errors, and periods of degraded performance.

What initially appeared to be separate issues was ultimately traced back to the combination of a database connection-string bug introduced in the November 5 release (v11.9) and limitations in one of our older print service components. During periods of high activity, the Print Controller struggled to restart because multiple services were placing heavy load on shared database resources.

A configuration intended for a single service unintentionally affected others, slowing database performance and delaying print processing during peak usage. Several mitigations were applied between November 8 and 14. First, the database connection pool was increased; because that setting is global, the increase created additional load during busy periods and contributed to degraded performance. We then split the database pool into a larger pool for the Print Controller and a smaller pool for other components, but this did not fully resolve the issue.
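
To illustrate why a global pool increase behaves differently from a scoped one, here is a minimal Python sketch of the arithmetic. The component names, instance counts, and pool sizes are hypothetical; this report does not publish the real values, and the sketch only assumes that each service instance may open up to its configured pool size in connections.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Component:
    name: str
    instances: int
    pool_size: int  # max connections each instance may open (hypothetical)


def max_connections(components: list[Component]) -> int:
    """Worst-case number of connections the database may have to serve."""
    return sum(c.instances * c.pool_size for c in components)


def apply_global_pool(components: list[Component], pool_size: int) -> list[Component]:
    """A global setting changes the pool size of every component."""
    return [replace(c, pool_size=pool_size) for c in components]


def apply_scoped_pool(
    components: list[Component], name: str, pool_size: int
) -> list[Component]:
    """A scoped setting changes only the named component."""
    return [
        replace(c, pool_size=pool_size) if c.name == name else c
        for c in components
    ]


# Hypothetical baseline: one Print Controller plus two other services.
baseline = [
    Component("print-controller", instances=1, pool_size=100),
    Component("service-a", instances=4, pool_size=100),
    Component("service-b", instances=4, pool_size=100),
]

print(max_connections(baseline))                                          # 900
print(max_connections(apply_global_pool(baseline, 300)))                  # 2700
print(max_connections(apply_scoped_pool(baseline, "print-controller", 300)))  # 1100
```

With these made-up numbers, tripling the pool globally triples the worst-case load on the database, while the same increase scoped to the Print Controller adds only 200 connections.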

Stability was fully restored on November 17 by scaling out the Print Controller service (adding instances and adjusting the load balancer) and reverting the temporary database configuration changes to their original, safe values.

In short:

  • A configuration bug exposed weaknesses in an older, non-scalable Print Controller
  • A temporary workaround increased database load across the platform
  • Multiple services experienced slowness as a result
  • Symptoms appeared at different times as traffic rose during peak hours

Timeline of Events (EST)

November 5 – v11.9 Release

  • BTC v11.9 release deployed.
  • This release unintentionally introduced a connection-string bug affecting print services.

Issue 1 – Database Connection String Bug

November 6 (Thursday)

  • EU customers report printer list failures.
  • Root cause: malformed connection strings caused by spaces in the v11.9 configuration.
  • The bug is confirmed to impact the EU Print Controller, which struggles to restart during peak load.
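
A pre-deployment check for this class of bug could look roughly like the Python sketch below. It assumes a semicolon-separated `Key=Value` connection-string format and flags whitespace at the edges of keys or values; the report does not state the actual format or where the spaces occurred, so the function name, the format, and the sample strings are all illustrative assumptions.

```python
def find_whitespace_issues(conn_str: str) -> list[str]:
    """Return a description of each segment with stray whitespace.

    Assumes 'Key=Value;Key=Value' syntax. Spaces inside a key
    (for example 'Max Pool Size') are allowed; only leading or
    trailing whitespace is reported.
    """
    issues: list[str] = []
    for segment in conn_str.split(";"):
        if segment == "":
            continue  # tolerate a trailing semicolon
        key, sep, value = segment.partition("=")
        if not sep:
            issues.append(f"segment without '=': {segment!r}")
            continue
        if key != key.strip():
            issues.append(f"whitespace around key: {key!r}")
        if value != value.strip():
            issues.append(f"whitespace around value for {key.strip()!r}")
    return issues


# Hypothetical connection strings, not taken from the incident.
good = "Server=db.example.internal;Database=print;Max Pool Size=100"
bad = "Server= db.example.internal;Database=print ;Max Pool Size=100"

print(find_whitespace_issues(good))  # []
for issue in find_whitespace_issues(bad):
    print(issue)
```

The sketch reports problems rather than silently trimming them, so a malformed value fails the release check instead of reaching production. It does not handle quoted values that contain semicolons.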

November 8 (Saturday)

  • The fixed connection string is applied in EU.
  • For AM and APAC, the corrected configurations are staged for the next release or service restart.
  • Issue 1 is resolved, but resolving it exposes deeper bottlenecks that lead to Issue 2a.

Issue 2a – Print Controller Restart Failure

November 8 (Saturday)

  • During busy hours, print performance slows.
  • Team increases database connection capacity as a temporary measure to support reconnecting print devices.
  • This helped in the short term but introduced new pressure on shared database resources.

Issue 2b – System Degradation After Pool Increase

November 9 – 14

  • During busy hours, print performance slows.
  • The increased connection pool causes high database load across multiple components.
  • Some customers experience delays, timeouts, or intermittent failures.
  • The team splits the database pool settings to reduce pressure, which improves stability but does not fully resolve the issue.

November 17 (Monday)

  • During the morning peak, print delays return in the AM region.
  • The underlying connection-pool pressure reappears.
  • Additional symptoms surface as other services compete for database access.
  • Final fix applied: all temporary pool increases rolled back and the Print Controller scaled out using a controlled process.
  • System stability fully restored.

Final Resolution

Stability was achieved through two coordinated actions:

  1. Reverting all database settings back to their safe, original values.
  2. Scaling out the Print Controller using a controlled method that improves reliability and reduces pressure during peak demand.

Print Services returned to stable operation after these steps.

Preventive Measures

To prevent this type of issue in the future, we have already made or are making the following improvements:

  • Ensuring database configurations are properly scoped per component
  • Adding protections to prevent a single configuration from impacting the entire platform
  • Improving monitoring for connection-pool usage and print service load
  • Establishing production-like performance testing for print services
  • Scaling the Print Controller horizontally
  • Continuing our roadmap to modernize all critical print services in 2026
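
As one illustration of the connection-pool monitoring item above, the following Python sketch flags components whose pool utilization crosses a threshold. The `PoolSample` type, the 80% threshold, and the sample numbers are assumptions made for this example and do not describe the monitoring that was actually deployed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSample:
    component: str
    in_use: int    # connections currently checked out
    max_size: int  # configured pool ceiling for this component


def saturated_pools(
    samples: list[PoolSample], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Return (component, utilization) for pools at or above the threshold."""
    alerts: list[tuple[str, float]] = []
    for s in samples:
        if s.max_size <= 0:
            raise ValueError(f"invalid pool size for {s.component}")
        utilization = s.in_use / s.max_size
        if utilization >= threshold:
            alerts.append((s.component, round(utilization, 2)))
    return alerts


# Hypothetical readings taken during a peak period.
samples = [
    PoolSample("print-controller", in_use=95, max_size=100),
    PoolSample("service-a", in_use=20, max_size=100),
]

print(saturated_pools(samples))  # [('print-controller', 0.95)]
```

Tracking utilization per component, rather than only in aggregate at the database, is what would make a single saturated pool visible before other services start competing for connections.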

Current Status

🟢 Resolved – Print Services are fully stable and operating normally.

Posted Nov 18, 2025 - 09:55 PST

Resolved

The issue has been fully resolved, and the system is operating normally.
Posted Nov 12, 2025 - 17:35 PST

Monitoring

A fix has been implemented, and we are currently monitoring the results to ensure full stability.
Posted Nov 12, 2025 - 16:21 PST

Update

We are currently experiencing a partial outage. The engineering team is actively investigating and will provide further updates as they become available.
Posted Nov 12, 2025 - 16:08 PST

Update

We are continuing to investigate this issue.
Posted Nov 12, 2025 - 15:50 PST

Update

We continue to investigate the issue. Further updates will be provided as more information becomes available.
Posted Nov 12, 2025 - 14:48 PST

Investigating

Some customers may experience intermittent printing issues. Our team is actively investigating the situation. Further updates will follow.
Posted Nov 12, 2025 - 13:48 PST
This incident affected: AM1 - BarTender Cloud (AM1 - Print Engine Services).