Treeherder stopped ingesting pushes after the production deploy

RESOLVED FIXED

Status

Tree Management
Treeherder: Infrastructure
--
blocker
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: Tomcat, Assigned: fubar)

(Reporter)

Description

3 years ago
Somehow Treeherder is stuck at https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-inbound&revision=6c094e2b6e57, when it should be showing https://hg.mozilla.org/integration/mozilla-inbound/rev/dedc769ea8a8 as the latest push.

I wonder if this is the pushlog problem we have seen before?
(Reporter)

Comment 1

3 years ago
Also, the trees are now closed because of this problem.

Comment 2

3 years ago
15:15 <•mdoglio> edmorley: the etltasks stopped working 15 minutes ago, I guess because of the deployment
15:16 <•mdoglio> fubar: hey there, can you please have a look at the celery worker on the etl nodes?
15:17 <fubar> mdoglio: processes are running
15:17 <fubar> [treeherder-etl1.private.scl3.mozilla.com] out: celery RUNNING pid 2013, uptime 0:17:37
15:17 <fubar> [treeherder-etl2.private.scl3.mozilla.com] out: celery RUNNING pid 22710, uptime 0:18:51
15:17 <fubar> [treeherder-etl1.private.scl3.mozilla.com] out: 495 2034 2013 0 13:59 ? 00:00:04 /usr/bin/python /usr/bin/celery -A treeherder worker -c 3 -Q default -E --maxtasksperchild=500 --logfile=/var/log/celery/celery_worker.log -l INFO -n default.%h
15:18 <•mdoglio> fubar: I'm on etl and I see the worker is started with the wrong script
15:19 <•mdoglio> s/etl/etl1
15:20 <•edmorley> mdoglio: fubar: this is the first deploy after bug 1086934 I guess?
15:20 <firebot> https://bugzil.la/1086934 — FIXED, klibby@mozilla.com — Production's commander_settings.py is missing treeherder-etl from CELERY_HOSTGROUP
15:21 <•mdoglio> fubar: this is the supervisord conf for the etl nodes https://github.com/mozilla/treeherder-service/blob/master/deployment/supervisord/etl_node.conf
15:22 <fubar> mdoglio: I think we missed a step between building the etl nodes and actually using those queues
15:23 <•mdoglio> oh okey
15:23 — fubar adds those to puppet
15:23 <•mdoglio> thanks fubar
Summary: Treeherder missing https://hg.mozilla.org/integration/mozilla-inbound/rev/dedc769ea8a8 and following → Treeherder has stopped ingesting pushes

Updated

3 years ago
Summary: Treeherder has stopped ingesting pushes → Treeherder stopped ingesting pushes after the production deploy
The pushlog and buildapi ingestion services are running now. Thanks fubar!
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Assignee: nobody → klibby

Comment 4

3 years ago
This was presumably a combination of:
1) Recent change to unplug the default celery worker from etl (buildapi,pushlog) queues (https://github.com/mozilla/treeherder-service/commit/5df6bd4212778425adff0e743dd9a09c63bb01c4).
2) Recent fix to the Chief deploy script, since the new ETL nodes were not included in the machines it was updating (bug 1086934).
3) The ETL nodes using the wrong supervisord conf. This wasn't noticed until now because, before #1 landed, the default worker was still handling the ETL queues, and because of the bug behind #2, the ETL workers didn't get the new code (even after #1 landed) until the first deploy after that fix, which was the deploy 30 minutes ago.
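
As a rough illustration of how this kind of gap could be caught after a deploy, here is a minimal Python sketch that checks whether the running celery worker command lines cover every queue we expect to be consumed. It is not part of Treeherder or its deploy tooling. The function names and the `REQUIRED_QUEUES` set are hypothetical, and the queue names `buildapi` and `pushlog` are assumed from the description in point 1. It relies only on celery's `-Q`/`--queues` option taking a comma-separated list, and it deliberately ignores workers started without `-Q` (which consume from whatever queues the settings define).

```python
import shlex

# Assumption: queue names inferred from this bug, not read from Treeherder's settings.
REQUIRED_QUEUES = {"default", "buildapi", "pushlog"}


def queues_from_command(command):
    """Return the queues named via -Q/--queues in a celery worker command line."""
    tokens = shlex.split(command)
    queues = set()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("-Q", "--queues") and i + 1 < len(tokens):
            # Form: -Q default,buildapi
            queues.update(q for q in tokens[i + 1].split(",") if q)
            i += 2
            continue
        if token.startswith("--queues="):
            # Form: --queues=default,buildapi
            value = token.split("=", 1)[1]
            queues.update(q for q in value.split(",") if q)
        i += 1
    return queues


def uncovered_queues(commands, required=REQUIRED_QUEUES):
    """Return the required queues that no listed worker explicitly consumes."""
    covered = set()
    for command in commands:
        covered |= queues_from_command(command)
    return set(required) - covered


if __name__ == "__main__":
    # Command line as reported for treeherder-etl1 in comment 2.
    etl1 = (
        "/usr/bin/python /usr/bin/celery -A treeherder worker -c 3 -Q default -E "
        "--maxtasksperchild=500 --logfile=/var/log/celery/celery_worker.log "
        "-l INFO -n default.%h"
    )
    print(sorted(uncovered_queues([etl1])))
```

Fed the etl1 command line from comment 2, the sketch reports that nothing is listening on the two ETL queues, which matches what mdoglio spotted by hand.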
Assignee: klibby → nobody
Blocks: 1080589
Component: Treeherder → Infrastructure
QA Contact: laura

Updated

3 years ago
Assignee: nobody → klibby