Closed
Bug 1126549
Opened 10 years ago
Closed 10 years ago
populate-performance-series task not being scheduled on stage
Categories
(Tree Management :: Perfherder, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: wlach, Assigned: mdoglio)
Details
For some reason, it appears as if the populate-performance-series task is not being scheduled on stage.
Manual inspection indicates that performance artifacts and series properties are being updated by the log parsing task (as you'd expect). e.g. https://treeherder.allizom.org/api/project/mozilla-central/performance_artifact/
... however the series summary endpoint (e.g. https://treeherder.allizom.org//api/project/mozilla-central/performance-data/0/get_performance_series_summary/?interval=2592000) returns an empty set and the actual performance_series table is blank:
> SELECT * FROM mozilla_central_jobs_1.performance_series ORDER BY id DESC LIMIT 100
+ ------- + --------------------- + --------------------- + --------- + ----------------- + --------- + ------------------ +
| id | interval_seconds | series_signature | type | last_updated | blob | active_status |
+ ------- + --------------------- + --------------------- + --------- + ----------------- + --------- + ------------------ +
| NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+ ------- + --------------------- + --------------------- + --------- + ----------------- + --------- + ------------------ +
1 rows
:edmorley and I spend about 30 minutes trying to debug this with no real luck. Do you have any idea what might be happening Mauro? There seem to be no problems on production. The task is being scheduled here:
https://github.com/mozilla/treeherder-service/blob/master/treeherder/etl/perf_data_adapters.py#L254
And is defined here:
https://github.com/mozilla/treeherder-service/blob/master/treeherder/model/tasks.py#L55
Flags: needinfo?(mdoglio)
Comment 1•10 years ago
|
||
Prod:
[emorley@treeherder-rabbitmq1.private.scl3 ~]$ grep 'populate-performance' /var/log/celery/celery_worker.log | tail
[2015-01-28 04:26:21,482: INFO/MainProcess] Task populate-performance-series[72ba946c-1925-4148-b623-94f7b9d7522e] succeeded in 8.23880121857s: None
[2015-01-28 04:27:07,200: INFO/MainProcess] Task populate-performance-series[1741108b-fb49-4976-84c0-717187bdac78] succeeded in 110.314319617s: None
[2015-01-28 04:27:07,202: INFO/MainProcess] Received task: populate-performance-series[f3e6c8b8-622b-47a8-adac-99ad49065cb2]
[2015-01-28 04:27:07,207: INFO/MainProcess] Received task: populate-performance-series[043da50d-1663-45ea-833d-d721836bfecd]
[2015-01-28 04:27:15,307: INFO/MainProcess] Received task: populate-performance-series[70f3deaa-d85f-4788-8097-724259f6b0fc]
[2015-01-28 04:27:29,767: INFO/MainProcess] Task populate-performance-series[84a4da72-8b7d-48fc-a7fb-58a8091dcc8d] succeeded in 111.270641737s: None
[2015-01-28 04:27:29,771: INFO/MainProcess] Received task: populate-performance-series[77092656-90f4-4aaf-8064-54a6980f7d9f]
[2015-01-28 04:27:41,205: INFO/MainProcess] Task populate-performance-series[19997949-3a9c-4fcf-98cc-bf74c6e0c442] succeeded in 11.4368384909s: None
[2015-01-28 04:27:41,207: INFO/MainProcess] Received task: populate-performance-series[d143dcd3-b0f5-42c4-a502-984964213b87]
[2015-01-28 04:27:48,855: INFO/MainProcess] Task populate-performance-series[bd92937c-68c1-47bc-ad29-770978468aa9] succeeded in 7.65052897111s: None
Stage:
[emorley@treeherder-rabbitmq1.stage.private.scl3 ~]$ grep 'populate-performance' /var/log/celery/celery_worker.log | tail
[emorley@treeherder-rabbitmq1.stage.private.scl3 ~]$
Comment 2•10 years ago
|
||
The tasks is supposed to be scheduled here:
https://github.com/mozilla/treeherder-service/blob/master/treeherder/etl/perf_data_adapters.py#L254
Which is called from:
https://github.com/mozilla/treeherder-service/blob/93ae4541feb7013bd7181f5a714644a51521e55b/treeherder/model/derived/jobs.py#L2196
There are no errors on new relic or in the various celery logs I've looked at so far.
However:
[emorley@treeherder-processor1.stage.private.scl3 ~]$ nc -vz treeherder-stage-ro-vip.db.scl3.mozilla.com 3306
...doesn't succeed.
And prior to the call above, we do:
https://github.com/mozilla/treeherder-service/blob/93ae4541feb7013bd7181f5a714644a51521e55b/treeherder/model/derived/jobs.py#L2168
Which uses get_reference_data_for_perf_signature, which is on read_host:
https://github.com/mozilla/treeherder-service/blob/c8b1c70f30a5a43d464714d1634b823bd049fdbd/treeherder/model/sql/reference.json#L168
...so we're just missing yet more flows, sigh.
Comment 3•10 years ago
|
||
Actually ignore that, since (a) I'm not sure processor1 is where these jobs are scheduled, and (b) stage proc1 does have flow to RW, and until recently we would have always been using that, yet still zero rows
[emorley@treeherder-processor1.stage.private.scl3 ~]$ nc -vzw 5 treeherder-stage-rw-vip.db.scl3.mozilla.com 3306
Connection to treeherder-stage-rw-vip.db.scl3.mozilla.com 3306 port [tcp/mysql] succeeded!
Assignee | ||
Comment 4•10 years ago
|
||
I can only tell the task is running looking at the logs. I'll take this
Assignee: nobody → mdoglio
Flags: needinfo?(mdoglio)
Assignee | ||
Comment 5•10 years ago
|
||
I can see data in mozilla_central_jobs_1.performance_series and
https://treeherder.allizom.org//api/project/mozilla-central/performance-data/0/get_performance_series_summary/?interval=2592000
doesn't return an empty dataset.
:wlach can you please double check that?
Flags: needinfo?(wlachance)
Comment 6•10 years ago
|
||
(In reply to William Lachance (:wlach) from comment #0)
> > SELECT * FROM mozilla_central_jobs_1.performance_series ORDER BY id DESC LIMIT 100
>
> + ------- + --------------------- + --------------------- + --------- +
> ----------------- + --------- + ------------------ +
> | id | interval_seconds | series_signature | type |
> last_updated | blob | active_status |
> + ------- + --------------------- + --------------------- + --------- +
> ----------------- + --------- + ------------------ +
> | NULL | NULL | NULL | NULL | NULL
> | NULL | NULL |
> + ------- + --------------------- + --------------------- + --------- +
> ----------------- + --------- + ------------------ +
> 1 rows
Strangely this is now returning rows:
> SELECT * FROM mozilla_central_jobs_1.performance_series ORDER BY id DESC LIMIT 10
+ ------- + --------------------- + --------------------- + --------- + ----------------- + --------- + ------------------ +
| id | interval_seconds | series_signature | type | last_updated | blob | active_status |
+ ------- + --------------------- + --------------------- + --------- + ----------------- + --------- + ------------------ +
| 19278 | 7776000 | 31e0458d778f376ce8c34f816e208884186da4ab | talos_data | 1422517329 | ... | active |
| 19277 | 7776000 | dc16afa7ab2945c05aec696049ca919d66a4ee4e | talos_data | 1422517329 | ... | active |
| 19276 | 7776000 | 334a5f74b57c0ab70b54f8bf11375bc089d47faa | talos_data | 1422517329 | ... | active |
| 19275 | 7776000 | 388aba0ec1ce9519774f789c45fbb639715d3746 | talos_data | 1422517329 | ... | active |
| 19274 | 7776000 | 7c7431d5d704cb0afc229791a7666a2744c0704b | talos_data | 1422517329 | ... | active |
| 19273 | 7776000 | bf4f17e6d8ff9f78e78104fe87a86460c100bdab | talos_data | 1422517329 | ... | active |
| 19272 | 7776000 | e637a25acaf53b700b6995863af919528662ff2e | talos_data | 1422517329 | ... | active |
| 19271 | 7776000 | 56e739c9b09d2d8a4855003d6af6987ad8fd1243 | talos_data | 1422517329 | ... | active |
| 19270 | 7776000 | 0790f4241237b312adac45fbc640f95d1d01fb0b | talos_data | 1422517329 | ... | active |
| 19269 | 7776000 | e4e47a74c5675adecf08304f8cb6986199123b5b | talos_data | 1422517329 | ... | active |
| NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+ ------- + --------------------- + --------------------- + --------- + ----------------- + --------- + ------------------ +
11 rows
Comment 7•10 years ago
|
||
The grep in comment 1 now finds entries in the log - the task wasn't running before, but is now.
Perhaps splitting the default queue fixed this? (Though who knows why)
Reporter | ||
Comment 8•10 years ago
|
||
Yes, for whatever reason it looks like this just started working. We have the last few pushes worth of data: https://treeherder.allizom.org/perf.html#/graphs?timerange=604800&seriesList=%5B%5B%22mozilla-central%22,%229cfc271dab9b7fc2c1229736fecfbbc6e7c5fac9%22%5D%5D
Flags: needinfo?(wlachance)
Assignee | ||
Updated•10 years ago
|
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•