Many command queue alerts from large code pushes

RESOLVED FIXED

Status

Product: Release Engineering
Component: General
Status: RESOLVED FIXED
Reported: a year ago
Modified: 2 months ago

People

(Reporter: nthomas, Assigned: nthomas)

Tracking

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment)

(Assignee)

Description

a year ago
Recently there have been lots of alerts like
 <nagios-releng> Wed 14:54:57 PDT [4231] [] buildbot-master77.bb.releng.use1.mozilla.com:Command Queue is CRITICAL: 2 dead items
These turn out to be timeouts after 2 minutes while trying to add large changes to the statusdb, e.g.:

2017-05-10 14:00:14,579 - INSERT INTO sourcestamps (branch, revision, patch_id) VALUES (%s, %s, %s)
2017-05-10 14:00:14,579 - ('integration/autoland', 'eb62dc9d8524742ec288004e12d6380f1535c031', None)
2017-05-10 14:00:14,655 - INSERT INTO changes (number, branch, revision, who, comments, `when`) VALUES (%s, %s, %s, %s, %s, %s)
2017-05-10 14:00:14,655 - (9227481L, 'integration/autoland', 'eb62dc9d8524742ec288004e12d6380f1535c031', 'cbook@mozilla.com', 'Merge mozilla-central to autoland', datetime.datetime(2017, 5, 10, 13, 36, 40))
2017-05-10 14:00:14,978 - INSERT INTO file_changes (file_id, change_id) VALUES (%s, %s)
2017-05-10 14:00:14,978 - ((3840659L, 6514173L), (979845L, 6514173L), (712747L, 6514173L), (1945983L, 6514173L), (1946189L, 6514173L)...
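The log shows why a large push is expensive. Each change gets one `changes` row, then one `file_changes` row per touched file. A big merge therefore sends a very large batch.

Below is a minimal sketch of that insert pattern. It uses `sqlite3` as a stand-in for the real MySQL statusdb and makes these assumptions:

* Only the column names come from the log. The schema, types and the `changeid` key column are hypothetical.
* The `change_id` values in the log (6514173) differ from the change number (9227481), so the sketch assumes a separate auto-increment key.
* The log uses MySQL-style `%s` placeholders, while `sqlite3` uses `?`.

```python
import sqlite3

# Hypothetical, simplified schema: only the column names are taken from the
# logged statements; the real statusdb schema is not shown in this bug.
SCHEMA = """
CREATE TABLE changes (
    changeid INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed surrogate key
    number INTEGER, branch TEXT, revision TEXT,
    who TEXT, comments TEXT, "when" TEXT
);
CREATE TABLE file_changes (file_id INTEGER, change_id INTEGER);
"""


def record_change(conn, number, branch, revision, who, comments, when, file_ids):
    """Insert one change plus one file_changes row per touched file.

    Returns the number of file_changes rows written.
    """
    with conn:  # one transaction, committed on success
        cur = conn.execute(
            'INSERT INTO changes (number, branch, revision, who, comments, "when") '
            "VALUES (?, ?, ?, ?, ?, ?)",
            (number, branch, revision, who, comments, when),
        )
        change_id = cur.lastrowid
        # The batch grows linearly with the number of files in the push.
        rows = [(file_id, change_id) for file_id in file_ids]
        conn.executemany(
            "INSERT INTO file_changes (file_id, change_id) VALUES (?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    n = record_change(
        conn, 9227481, "integration/autoland",
        "eb62dc9d8524742ec288004e12d6380f1535c031", "cbook@mozilla.com",
        "Merge mozilla-central to autoland", "2017-05-10 13:36:40",
        range(5000),  # made-up file ids standing in for a large merge
    )
    print(n)  # 5000
```

The 5000 file ids are illustrative only. The point is that the `file_changes` write scales with the file count of the push, not with the number of changes.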

A workaround is to increase the -m argument in /etc/init.d/command_runner from 60 to 600, so the job has time to finish once; subsequent jobs for the same change are then quick. We could also check whether anything uses the file lists on changes; if nothing does, we can stop inserting them.
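This bug doesn't show how command_runner works internally. The sketch below is a hypothetical model of a per-job maximum run time and makes these assumptions:

* `MAX_TIME_SECONDS` stands in for the `-m` value. It is in seconds, matching the attached patch's "10 minutes" for 600.
* A job killed for exceeding the limit is reported as "dead", like the items in the Nagios alert.

It relies only on `subprocess.run(..., timeout=...)` from the Python standard library. That call kills the child and raises `TimeoutExpired` when the limit is exceeded.

```python
import subprocess
import sys

# Hypothetical stand-in for command_runner's -m (max run time) argument,
# in seconds; the workaround raises it from 60 to 600.
MAX_TIME_SECONDS = 600


def run_job(argv, max_time=MAX_TIME_SECONDS):
    """Run one queued job and classify the outcome.

    Returns "done" on exit status 0, "failed" on a non-zero exit status,
    and "dead" if the job was killed for exceeding max_time.
    """
    try:
        result = subprocess.run(argv, timeout=max_time)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return "dead"
    return "done" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # A job that needs ~2 seconds, standing in for a large statusdb insert.
    slow_job = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(run_job(slow_job, max_time=1))   # dead: limit too short
    print(run_job(slow_job, max_time=10))  # done: enough headroom
```

Raising the limit doesn't make the first slow insert any faster. It only lets that insert finish once instead of being killed on every retry.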
(Assignee)

Comment 1

a year ago
Created attachment 8866544
[puppet] Increase timeout to 10 minutes

Workaround/stopgap solution to avoid the manual work of recovering dead queues.
Attachment #8866544 - Flags: review?(catlee)
Attachment #8866544 - Flags: review?(catlee) → review+
Attachment #8866544 - Flags: checked-in+
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
(Assignee)

Updated

a year ago
Duplicate of this bug: 1313947
(Assignee)

Updated

11 months ago
See Also: → bug 1382916
Component: General Automation → General
Product: Release Engineering → Release Engineering