Closed Bug 1126549 Opened 10 years ago Closed 10 years ago

populate-performance-series task not being scheduled on stage

Categories

(Tree Management :: Perfherder, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: mdoglio)

Details

For some reason, it appears as if the populate-performance-series task is not being scheduled on stage. Manual inspection indicates that performance artifacts and series properties are being updated by the log parsing task (as you'd expect), e.g.:

https://treeherder.allizom.org/api/project/mozilla-central/performance_artifact/

...however the series summary endpoint (e.g. https://treeherder.allizom.org//api/project/mozilla-central/performance-data/0/get_performance_series_summary/?interval=2592000) returns an empty set, and the actual performance_series table is blank:

> SELECT * FROM mozilla_central_jobs_1.performance_series ORDER BY id DESC LIMIT 100

+------+------------------+------------------+------+--------------+------+---------------+
| id   | interval_seconds | series_signature | type | last_updated | blob | active_status |
+------+------------------+------------------+------+--------------+------+---------------+
| NULL | NULL             | NULL             | NULL | NULL         | NULL | NULL          |
+------+------------------+------------------+------+--------------+------+---------------+
1 rows

:edmorley and I spent about 30 minutes trying to debug this with no real luck. Do you have any idea what might be happening, Mauro? There seem to be no problems on production.

The task is being scheduled here:
https://github.com/mozilla/treeherder-service/blob/master/treeherder/etl/perf_data_adapters.py#L254

And is defined here:
https://github.com/mozilla/treeherder-service/blob/master/treeherder/model/tasks.py#L55
Flags: needinfo?(mdoglio)
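For context, here is a minimal sketch of how a named Celery task like this is typically defined and then enqueued from the ingestion path with apply_async. The function names, arguments, and routing key below are illustrative assumptions, not the actual Treeherder code at the links above.

# Illustrative sketch only -- names, arguments and the routing key are
# assumptions, not Treeherder's actual code (see the links above).
from celery import shared_task

@shared_task(name='populate-performance-series', max_retries=3)
def populate_performance_series(project, series_type, series_data):
    # In the real task this would summarise per-job performance artifacts
    # into the project's performance_series table.
    pass

def schedule_population(project, series_type, series_data):
    # The ETL/adapter side enqueues the task asynchronously; if no worker
    # consumes the queue it lands in, nothing ever writes to the table.
    populate_performance_series.apply_async(
        args=[project, series_type, series_data],
        routing_key='populate_performance_series',
    )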
Prod:

[emorley@treeherder-rabbitmq1.private.scl3 ~]$ grep 'populate-performance' /var/log/celery/celery_worker.log | tail
[2015-01-28 04:26:21,482: INFO/MainProcess] Task populate-performance-series[72ba946c-1925-4148-b623-94f7b9d7522e] succeeded in 8.23880121857s: None
[2015-01-28 04:27:07,200: INFO/MainProcess] Task populate-performance-series[1741108b-fb49-4976-84c0-717187bdac78] succeeded in 110.314319617s: None
[2015-01-28 04:27:07,202: INFO/MainProcess] Received task: populate-performance-series[f3e6c8b8-622b-47a8-adac-99ad49065cb2]
[2015-01-28 04:27:07,207: INFO/MainProcess] Received task: populate-performance-series[043da50d-1663-45ea-833d-d721836bfecd]
[2015-01-28 04:27:15,307: INFO/MainProcess] Received task: populate-performance-series[70f3deaa-d85f-4788-8097-724259f6b0fc]
[2015-01-28 04:27:29,767: INFO/MainProcess] Task populate-performance-series[84a4da72-8b7d-48fc-a7fb-58a8091dcc8d] succeeded in 111.270641737s: None
[2015-01-28 04:27:29,771: INFO/MainProcess] Received task: populate-performance-series[77092656-90f4-4aaf-8064-54a6980f7d9f]
[2015-01-28 04:27:41,205: INFO/MainProcess] Task populate-performance-series[19997949-3a9c-4fcf-98cc-bf74c6e0c442] succeeded in 11.4368384909s: None
[2015-01-28 04:27:41,207: INFO/MainProcess] Received task: populate-performance-series[d143dcd3-b0f5-42c4-a502-984964213b87]
[2015-01-28 04:27:48,855: INFO/MainProcess] Task populate-performance-series[bd92937c-68c1-47bc-ad29-770978468aa9] succeeded in 7.65052897111s: None

Stage:

[emorley@treeherder-rabbitmq1.stage.private.scl3 ~]$ grep 'populate-performance' /var/log/celery/celery_worker.log | tail
[emorley@treeherder-rabbitmq1.stage.private.scl3 ~]$
The task is supposed to be scheduled here:
https://github.com/mozilla/treeherder-service/blob/master/treeherder/etl/perf_data_adapters.py#L254

Which is called from:
https://github.com/mozilla/treeherder-service/blob/93ae4541feb7013bd7181f5a714644a51521e55b/treeherder/model/derived/jobs.py#L2196

There are no errors on New Relic or in the various celery logs I've looked at so far. However:

[emorley@treeherder-processor1.stage.private.scl3 ~]$ nc -vz treeherder-stage-ro-vip.db.scl3.mozilla.com 3306

...doesn't succeed. And prior to the call above, we do:
https://github.com/mozilla/treeherder-service/blob/93ae4541feb7013bd7181f5a714644a51521e55b/treeherder/model/derived/jobs.py#L2168

Which uses get_reference_data_for_perf_signature, which is on read_host:
https://github.com/mozilla/treeherder-service/blob/c8b1c70f30a5a43d464714d1634b823bd049fdbd/treeherder/model/sql/reference.json#L168

...so we're just missing yet more flows, sigh.
Actually ignore that, since (a) I'm not sure processor1 is where these jobs are scheduled, and (b) stage proc1 does have flow to RW, and until recently we would have always been using that, yet still zero rows:

[emorley@treeherder-processor1.stage.private.scl3 ~]$ nc -vzw 5 treeherder-stage-rw-vip.db.scl3.mozilla.com 3306
Connection to treeherder-stage-rw-vip.db.scl3.mozilla.com 3306 port [tcp/mysql] succeeded!
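For what it's worth, the nc checks above can be reproduced from Python on hosts where nc isn't available; this is just a generic TCP connectivity probe using the standard library, nothing Treeherder-specific (the hostname is the stage VIP mentioned above).

# Generic TCP connectivity probe, roughly equivalent to `nc -vzw 5 <host> <port>`.
import socket

def can_connect(host, port, timeout=5):
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except (socket.error, socket.timeout):
        return False

print(can_connect('treeherder-stage-rw-vip.db.scl3.mozilla.com', 3306))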
I can only tell whether the task is running by looking at the logs. I'll take this.
Assignee: nobody → mdoglio
Flags: needinfo?(mdoglio)
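A quicker way to confirm this than grepping worker logs is Celery's inspect API, which asks the running workers directly what they have registered and reserved. The import path below is a guess, not Treeherder's actual module layout.

# Sketch: query the running workers instead of grepping their logs.
# The import path is an assumption; substitute the project's real Celery app.
from myproject.celery_app import app  # hypothetical module path

insp = app.control.inspect()

# Tasks each worker has registered -- 'populate-performance-series' should
# appear here on stage if the workers are able to run it at all.
print(insp.registered())

# Tasks currently executing and tasks reserved (prefetched) from the queues.
print(insp.active())
print(insp.reserved())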
I can now see data in mozilla_central_jobs_1.performance_series, and https://treeherder.allizom.org//api/project/mozilla-central/performance-data/0/get_performance_series_summary/?interval=2592000 no longer returns an empty dataset. :wlach, can you please double-check?
Flags: needinfo?(wlachance)
(In reply to William Lachance (:wlach) from comment #0)
> > SELECT * FROM mozilla_central_jobs_1.performance_series ORDER BY id DESC LIMIT 100
>
> +------+------------------+------------------+------+--------------+------+---------------+
> | id   | interval_seconds | series_signature | type | last_updated | blob | active_status |
> +------+------------------+------------------+------+--------------+------+---------------+
> | NULL | NULL             | NULL             | NULL | NULL         | NULL | NULL          |
> +------+------------------+------------------+------+--------------+------+---------------+
> 1 rows

Strangely this is now returning rows:

> SELECT * FROM mozilla_central_jobs_1.performance_series ORDER BY id DESC LIMIT 10

+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
| id    | interval_seconds | series_signature                         | type       | last_updated | blob | active_status |
+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
| 19278 | 7776000          | 31e0458d778f376ce8c34f816e208884186da4ab | talos_data | 1422517329   | ...  | active        |
| 19277 | 7776000          | dc16afa7ab2945c05aec696049ca919d66a4ee4e | talos_data | 1422517329   | ...  | active        |
| 19276 | 7776000          | 334a5f74b57c0ab70b54f8bf11375bc089d47faa | talos_data | 1422517329   | ...  | active        |
| 19275 | 7776000          | 388aba0ec1ce9519774f789c45fbb639715d3746 | talos_data | 1422517329   | ...  | active        |
| 19274 | 7776000          | 7c7431d5d704cb0afc229791a7666a2744c0704b | talos_data | 1422517329   | ...  | active        |
| 19273 | 7776000          | bf4f17e6d8ff9f78e78104fe87a86460c100bdab | talos_data | 1422517329   | ...  | active        |
| 19272 | 7776000          | e637a25acaf53b700b6995863af919528662ff2e | talos_data | 1422517329   | ...  | active        |
| 19271 | 7776000          | 56e739c9b09d2d8a4855003d6af6987ad8fd1243 | talos_data | 1422517329   | ...  | active        |
| 19270 | 7776000          | 0790f4241237b312adac45fbc640f95d1d01fb0b | talos_data | 1422517329   | ...  | active        |
| 19269 | 7776000          | e4e47a74c5675adecf08304f8cb6986199123b5b | talos_data | 1422517329   | ...  | active        |
| NULL  | NULL             | NULL                                     | NULL       | NULL         | NULL | NULL          |
+-------+------------------+------------------------------------------+------------+--------------+------+---------------+
11 rows
The grep in comment 1 now finds entries in the log - the task wasn't running before, but is now. Perhaps splitting the default queue fixed this? (Though who knows why)
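If the queue split really is what fixed it, the mechanism involved is Celery's task routing configuration, roughly along the lines sketched below (Celery 3.x-style settings). The queue names and values are illustrative assumptions, not the actual Treeherder settings change.

# Illustration of routing one task out of the default queue; values are
# assumptions, not the real Treeherder configuration.
from kombu import Queue

CELERY_QUEUES = (
    Queue('default', routing_key='default'),
    Queue('populate_performance_series',
          routing_key='populate_performance_series'),
)

CELERY_ROUTES = {
    'populate-performance-series': {
        'queue': 'populate_performance_series',
        'routing_key': 'populate_performance_series',
    },
}

# Workers then need to consume the new queue explicitly, e.g.
# celery worker -Q default,populate_performance_series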
Yes, for whatever reason it looks like this just started working. We have the last few pushes' worth of data: https://treeherder.allizom.org/perf.html#/graphs?timerange=604800&seriesList=%5B%5B%22mozilla-central%22,%229cfc271dab9b7fc2c1229736fecfbbc6e7c5fac9%22%5D%5D
Flags: needinfo?(wlachance)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED