Treeherder stopped ingesting pushes after the production deploy



4 years ago


(Reporter: cbook, Assigned: fubar)





4 years ago
somehow treeherder is stuck at while it should show

I wonder if this is the pushlog problem we have seen before?

Comment 1

4 years ago
also trees closed now for this problem

Comment 2

4 years ago
15:15 <•mdoglio> edmorley: the etltasks stopped working 15 minutes ago, I guess because of the deployment
15:16 <•mdoglio> fubar: hey there, can you please have a look at the celery worker on the etl nodes?
15:17 <fubar> mdoglio: processes are running
15:17 <fubar> [] out: celery RUNNING pid 2013, uptime 0:17:37
15:17 <fubar> [] out: celery RUNNING pid 22710, uptime 0:18:51
15:17 <fubar> [] out: 495 2034 2013 0 13:59 ? 00:00:04 /usr/bin/python /usr/bin/celery -A treeherder worker -c 3 -Q default -E --maxtasksperchild=500 --logfile=/var/log/celery/celery_worker.log -l INFO -n default.%h
15:18 <•mdoglio> fubar: I'm on etl and I see the worker is started with the wrong script
15:19 <•mdoglio> s/etl/etl1
15:20 <•edmorley> mdoglio: fubar: this is the first deploy after bug 1086934 I guess?
15:20 <firebot> — FIXED, — Production's is missing treeherder-etl from CELERY_HOSTGROUP
15:21 <•mdoglio> fubar: this is the supervisord conf for the etl nodes
15:22 <fubar> mdoglio: I think we missed a step between building the etl nodes and actually using those queues
15:23 <•mdoglio> oh okey
15:23 — fubar adds those to puppet
15:23 <•mdoglio> thanks fubar
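The queues a worker consumes from can be read off its command line, as in the process listing pasted above. The following is a minimal Python sketch of that check, not anything from the Treeherder codebase: `queues_from_command` is a hypothetical helper, and it only assumes celery's `-Q` / `--queues` worker option takes a comma-separated list of queue names.

```python
import shlex


def queues_from_command(command):
    """Return the queue names given via -Q/--queues on a celery worker command line.

    Hypothetical helper for illustration; returns an empty list when no
    queue option is present.
    """
    args = shlex.split(command)
    for i, arg in enumerate(args):
        # Form: -Q name1,name2   or   --queues name1,name2
        if arg in ("-Q", "--queues") and i + 1 < len(args):
            return args[i + 1].split(",")
        # Form: --queues=name1,name2
        if arg.startswith("--queues="):
            return arg.split("=", 1)[1].split(",")
    return []


# Command line copied from the process listing in the log above.
command = (
    "/usr/bin/python /usr/bin/celery -A treeherder worker -c 3 -Q default -E "
    "--maxtasksperchild=500 --logfile=/var/log/celery/celery_worker.log "
    "-l INFO -n default.%h"
)

print(queues_from_command(command))
```

Run against the command above, this reports only the `default` queue, which matches the observation that the worker on the etl node was not started for the ETL queues.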
Summary: Treeherder missing and following → Treeherder has stopped ingesting pushes


4 years ago
Summary: Treeherder has stopped ingesting pushes → Treeherder stopped ingesting pushes after the production deploy
The pushlog and buildapi ingestion services are running now. Thanks fubar!
Last Resolved: 4 years ago
Resolution: --- → FIXED
Assignee: nobody → klibby

Comment 4

4 years ago
This was presumably a combination of:
1) Recent change to unplug the default celery worker from etl (buildapi,pushlog) queues.
2) Recent fix to the Chief deploy script, since the new ETL nodes were not included in the machines it was updating (bug 1086934).
3) The ETL nodes using the wrong supervisord conf. This wasn't noticed until now because, until #1 landed, the default worker was doing the ETL work, and because of #2, even after #1 landed the ETL workers didn't get the new code until the first deploy after #2 was fixed (which was the deploy 30 mins ago).
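The combined effect of the three points above can be sketched as a queue coverage check. This is only an illustration in Python under stated assumptions: the queue names `default`, `buildapi` and `pushlog` are taken from this bug, while `unconsumed_queues` and the worker labels are hypothetical and do not come from the Treeherder or puppet configuration.

```python
def unconsumed_queues(required, workers):
    """Return the required queues that no worker consumes from, sorted by name.

    `workers` maps a worker label to the list of queues it was started with.
    Hypothetical helper for illustration only.
    """
    consumed = set()
    for queues in workers.values():
        consumed.update(queues)
    return sorted(set(required) - consumed)


required = ["default", "buildapi", "pushlog"]

# Before #1: the default worker also served the ETL queues, which hid the
# fact that the ETL nodes were running a default-queue worker.
before = {
    "default worker": ["default", "buildapi", "pushlog"],
    "etl worker (wrong conf)": ["default"],
}

# After #1 reached production: the default worker no longer serves the ETL
# queues, and the ETL nodes still only consume from "default".
after = {
    "default worker": ["default"],
    "etl worker (wrong conf)": ["default"],
}

print(unconsumed_queues(required, before))
print(unconsumed_queues(required, after))
```

In the second case the `buildapi` and `pushlog` queues have no consumer, which is consistent with ingestion stopping right after the deploy.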
Assignee: klibby → nobody
Blocks: 1080589
Component: Treeherder → Infrastructure
QA Contact: laura


4 years ago
Assignee: nobody → klibby