1999425 - Intermittent Phabricator DB issues 2025-11-10--11: “Too many connections” (to DB)

Assignee

Description

•

23 days ago

Since 2025-11-10 22:03 UTC (9:03 AEDT), we have been seeing intermittent issues with Phabricator, which seem to stem from database connection issues.

From moz-phab:

matthew@ZenTower:~/firefox2$ moz-phab submit 
Submitting 3 commits for review
Phabricator Error: Unable to establish a connection to any database host (while trying "phabricator_conduit"). All masters and replicas are completely unreachable.

AphrontConnectionQueryException: Attempt to connect to mozphab_db@mozphab-phabhost-rds-devsvcprod-201709240232.c8twujv3cfsf.us-west-2.rds.amazonaws.com failed with error #1040: Too many connections.

Errors when refreshing pages:

[Core Exception/PhutilAggregateException] Encountered a processing exception, then another exception when trying to build a response for the first exception.
    - PhabricatorDataNotAttachedException: Attempting to access attached data on PhabricatorUser (via getAlternateCSRFString()), but the data is not actually attached. Before accessing attachable data on an object, you must load and attach it.
      
      Data is normally attached by calling the corresponding needX() method on the Query class when the object is loaded. You can also call the corresponding attachX() method explicitly.
    - PhabricatorClusterStrandedException: Unable to establish a connection to any database host (while trying "phabricator_user"). All masters and replicas are completely unreachable.
      
      AphrontConnectionQueryException: Attempt to connect to mozphab_db@mozphab-phabhost-rds-devsvcprod-201709240232.c8twujv3cfsf.us-west-2.rds.amazonaws.com failed with error #1040: Too many connections.

Warnings when refreshing pages:

Unable to establish a connection to any database host (while trying "phabricator_policy"). All masters and replicas are completely unreachable. AphrontConnectionQueryException: Attempt to connect to mozphab_db@mozphab-phabhost-rds-devsvcprod-201709240232.c8twujv3cfsf.us-west-2.rds.amazonaws.com failed with error #1040: Too many connections.

We are also seeing alerts from phabricator-emails firing and resolving, more or less in line with user reports. As early as 2025-11-10 15:20 UTC (2:20 AEDT):

[FIRING:1] Phabricator conduit (Phabricator Emails Errors Logged low)
**Firing**Value: A=1, B=0
Labels:
- alertname = Phabricator Emails Errors Logged
- grafana_folder = Phabricator
- severity = low
- team = conduit
Annotations:
- description = Phabricatoremails logging errors
Source: https://yardstick.mozilla.org/alerting/grafana/cegjcz26knwg0d/view?orgId=1
Silence: https://yardstick.mozilla.org/alerting/silence/new?alertmanager=grafana&matcher=__alert_rule_uid__%3Dcegjcz26knwg0d&matcher=severity%3Dlow&matcher=team%3Dconduit&orgId=1

Logs for both Phabricator and Phabricator-emails don't reveal anything obvious. Neither do the metrics in Yardstick or the RDS metrics in the AWS console.

Olivier Mehani [:shtrom]

Assignee

Updated

•

23 days ago

Assignee: nobody → omehani

Severity: -- → S3

Status: NEW → ASSIGNED

Olivier Mehani [:shtrom]

Assignee

Updated

•

23 days ago

Severity: S3 → S2

Olivier Mehani [:shtrom]

Assignee

Updated

•

23 days ago

Summary: Intermitent Phabricator DB issues 2025-11-10--11 → Intermitent Phabricator DB issues 2025-11-10--11: “Too many connections” (to DB)

Olivier Mehani [:shtrom]

Assignee

Comment 1

•

23 days ago

Ah! There is a dedicated dashboard for RDS metrics. We do see the database connections maxing out:
https://yardstick.mozilla.org/d/beggx0u1j5ekge/mozphab-prod-rds-cloudwatch?orgId=1&from=2025-11-10T20:01:13.307Z&to=2025-11-11T02:01:13.307Z&timezone=browser&viewPanel=panel-4

Olivier Mehani [:shtrom]

Assignee

Comment 2

•

23 days ago

From dkl: Phabricator has a lot of backed up tasks.

Olivier Mehani [:shtrom]

Assignee

Comment 3

•

23 days ago

•

Edited

There were a great many PhabricatorCalendarImportReloadWorker tasks among the Leased Tasks at https://phabricator.services.mozilla.com/daemon/

As they are not critical, dkl killed them.

It's uncertain whether they were a cause, or a consequence, of the observed issues.

Emilio Cobos Álvarez [:emilio]

Comment 4

•

23 days ago

FWIW I can still see problems submitting to phab.

Olivier Mehani [:shtrom]

Assignee

Comment 5

•

23 days ago

From dkl: the database shows all, or almost all, of the maximum 1280 connections in use in SHOW PROCESSLIST
(run with /app/phabricator/bin/storage shell from the phd instance); all but a couple are in state Sleep.
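
A minimal sketch of how such a processlist dump could be tallied, assuming the rows have already been fetched as dictionaries keyed by the SHOW PROCESSLIST column names (Id, Command, Time). The helper name summarize_processlist and the sample rows are hypothetical; only the 1280 limit comes from the observation above. Note that in MySQL's output the Sleep value appears in the Command column.

```python
from collections import Counter

MAX_CONNECTIONS = 1280  # connection limit reported in this bug


def summarize_processlist(rows, max_connections=MAX_CONNECTIONS):
    """Count connections per Command value and report overall usage."""
    by_command = Counter(row["Command"] for row in rows)
    total = sum(by_command.values())
    return {
        "total": total,
        "by_command": dict(by_command),
        "usage_ratio": total / max_connections,
    }


# Hypothetical sample: mostly idle (Sleep) connections plus two active queries.
sample_rows = [
    {"Id": i, "Command": "Sleep", "Time": 1200} for i in range(1278)
] + [
    {"Id": 1278, "Command": "Query", "Time": 0},
    {"Id": 1279, "Command": "Query", "Time": 2},
]

summary = summarize_processlist(sample_rows)
print(summary["total"], summary["by_command"])
```

A usage ratio close to 1.0 with nearly every connection in Sleep would match what dkl reported: the limit is reached by idle connections rather than by running queries.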

Olivier Mehani [:shtrom]

Assignee

Comment 6

•

23 days ago

•

Edited

Also seeing a lot of PhabricatorSearchWorker tasks (in charge of full-text indexing); killing them now with:

/app/phabricator/bin/worker cancel --active --class PhabricatorSearchWorker

Olivier Mehani [:shtrom]

Assignee

Comment 7

•

23 days ago

dkl also noted that the wait_timeout on the RDS instance is very high, 28800 seconds (vs the default 300).
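
A rough sketch of why this setting matters, assuming processlist rows shaped like MySQL's SHOW PROCESSLIST output, where Time is the number of seconds a Sleep connection has been idle. The server only closes an idle connection once it has been inactive for longer than wait_timeout, so a lower value would reclaim idle connections sooner. The helper name idle_beyond and the sample rows are hypothetical; 300 and 28800 are the two values mentioned above.

```python
def idle_beyond(rows, timeout_seconds):
    """Return Sleep connections whose idle Time exceeds timeout_seconds."""
    return [
        row for row in rows
        if row["Command"] == "Sleep" and row["Time"] > timeout_seconds
    ]


# Hypothetical sample rows; Time is in seconds.
rows = [
    {"Id": 1, "Command": "Sleep", "Time": 12},
    {"Id": 2, "Command": "Sleep", "Time": 900},
    {"Id": 3, "Command": "Sleep", "Time": 20000},
    {"Id": 4, "Command": "Query", "Time": 5000},
]

# Connections that would already have been closed under each timeout.
closed_at_300 = idle_beyond(rows, 300)
closed_at_28800 = idle_beyond(rows, 28800)
print([r["Id"] for r in closed_at_300], [r["Id"] for r in closed_at_28800])
```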

Olivier Mehani [:shtrom]

Assignee

Updated

•

23 days ago

Summary: Intermitent Phabricator DB issues 2025-11-10--11: “Too many connections” (to DB) → Intermittent Phabricator DB issues 2025-11-10--11: “Too many connections” (to DB)

Olivier Mehani [:shtrom]

Assignee

Comment 8

•

23 days ago

At 4:30 UTC, :jbuck resized the db to m8g.2xlarge

Olivier Mehani [:shtrom]

Assignee

Comment 9

•

23 days ago

5:06 UTC; it looks like the incident is mitigated.

Olivier Mehani [:shtrom]

Assignee

Updated

•

23 days ago

See Also: → https://mozilla-hub.atlassian.net/browse/SREIN-646

David Lawrence [:dkl]

Updated

•

12 days ago

Status: ASSIGNED → RESOLVED

Closed: 12 days ago

Resolution: --- → FIXED

Olivier Mehani [:shtrom]

Assignee

Updated

•

10 days ago

See Also: → https://docs.google.com/document/d/1tkcGhhDsZPF2dWus5kuC3t_Vx-zJL1FJ5wfWNlt5P0E/edit?tab=t.0

Olivier Mehani [:shtrom]

Assignee

Updated

•

7 days ago

Blocks: 2002695

Olivier Mehani [:shtrom]

Assignee

Updated

•

8 hours ago

Blocks: 2003932


Categories

(Conduit :: Phabricator, defect)

Tracking

(Not tracked)

People

(Reporter: shtrom, Assigned: shtrom)

References

(Blocks 2 open bugs)

Details

Crash Data

Security

(public)
