Reduce max-run-time for Marionette jobs in CI
Categories
(Testing :: Marionette, task, P1)
Tracking
(firefox109 fixed)
Tracking | Status | |
---|---|---|
firefox109 | --- | fixed |
People
(Reporter: whimboo, Assigned: whimboo)
References
Details
(Whiteboard: [webdriver:m5])
Attachments
(1 file)
As discovered in https://phabricator.services.mozilla.com/D96466#inline-544594, we currently use a max-run-time of 5400s. That is quite a lot for the Marionette jobs.
We should reduce it to a lower value for all Mn jobs. I will check what our usual runtime is for the job across platforms, and whether we can simply reduce the timeout or have to divide the jobs into chunks.
Comment 1 • 2 years ago (Assignee)
Joel, does any of you have a tool that can scrape jobs of specific types on Treeherder/Taskcluster and fetch their duration? Doing that manually is a bit tedious, and I could imagine that other job types could benefit from it as well.
Comment 2 • 2 years ago
There is no simple tool that I know of; ./mach test-info .. can provide some insight into specific tests, but not really into test jobs. You could query the Treeherder database on Redash, or use ActiveData as well.
I did a quick Redash query using the treeherder datasource:
select
  jt.symbol,
  jt.name,
  j.end_time-j.start_time
from
  job j,
  job_type jt
where
  j.job_type_id=jt.id
  and jt.name like '%marionette%'
  and j.result='success'
limit 100
It should give you a head start if you want to use Redash.
Comment 3 • 2 years ago (Assignee)
As discussed on Matrix, the above query doesn't work, for several reasons. So here is an updated one:
select
  jt.name,
  j.start_time,
  j.end_time,
  timestampdiff(second, j.start_time, j.end_time) as seconds,
  jl.url
from
  job j,
  job_type jt,
  job_log jl
where
  j.job_type_id=jt.id
  and jl.job_id=j.id
  and jt.name like '%opt-marionette-e10s%'
  and jt.name not like '%ccov%'
  and jt.name not like '%devedition%'
  and j.result='success'
  and timestampdiff(second, j.start_time, j.end_time) > 2000
limit 100
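To get from such rows to a "usual runtime" per job type, the exported results could be summarized with a few lines of Python. This is only a rough sketch: the function name `summarize_durations`, the row layout (job type name, start time, end time as ISO-like strings), and the job names in the usage example are assumptions made for illustration, not something Redash or Treeherder provides in this form.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def summarize_durations(rows):
    """Group job durations (in seconds) by job type name.

    `rows` is an iterable of (job_type_name, start_time, end_time) tuples,
    with timestamps as strings like "2022-10-20 10:00:00". This layout is
    an assumption about how the query results were exported.
    """
    durations = defaultdict(list)
    for name, start, end in rows:
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        # Same idea as timestampdiff(second, start_time, end_time) in the query.
        durations[name].append(int(delta.total_seconds()))

    return {
        name: {"count": len(values), "median": median(values), "max": max(values)}
        for name, values in durations.items()
    }


if __name__ == "__main__":
    # Hypothetical rows, made up for illustration only.
    example = [
        ("test-linux/opt-marionette-e10s", "2022-10-20 10:00:00", "2022-10-20 10:35:00"),
        ("test-linux/opt-marionette-e10s", "2022-10-20 11:00:00", "2022-10-20 11:40:00"),
    ]
    for job_name, stats in summarize_durations(example).items():
        print(job_name, stats)
```

The maximum per job type is the number to compare against a candidate max-run-time; the median indicates how much headroom a typical run would have.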
Basically, here is what was suggested to me:
- No job should run longer than 3600s
- for jobs of opt builds 1800s is fine
- for jobs of debug/asan/ccov builds 2700s - 3600s are fine
Jobs that take longer should be split into multiple chunks.
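The suggested limits above can be turned into a small decision helper. The following is a minimal sketch under stated assumptions: the names `MAX_RUN_TIME` and `plan_job` are hypothetical, the upper bound of 3600s is picked for debug/asan/ccov builds, and the work is assumed to split evenly across chunks, which real test suites rarely do.

```python
import math

# Suggested max-run-time per build type, in seconds. For debug/asan/ccov the
# suggestion was 2700s - 3600s; the upper bound is used here as an assumption.
MAX_RUN_TIME = {
    "opt": 1800,
    "debug": 3600,
    "asan": 3600,
    "ccov": 3600,
}


def plan_job(build_type, observed_seconds):
    """Return (max_run_time, chunks) for a job of the given build type.

    `observed_seconds` is the longest runtime seen for the unchunked job.
    The chunk count assumes the runtime divides evenly between chunks,
    which is a simplification.
    """
    limit = MAX_RUN_TIME[build_type]
    chunks = max(1, math.ceil(observed_seconds / limit))
    return limit, chunks


if __name__ == "__main__":
    # An opt job that takes 2100s would not fit into 1800s in a single chunk.
    print(plan_job("opt", 2100))
    # The same runtime fits comfortably for a debug job.
    print(plan_job("debug", 2100))
```

In practice some headroom below the limit would be needed as well, since runtimes vary between runs and platforms.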
Comment 4 • 2 months ago (Assignee)
Let's test whether the proposed timeouts from the last comment work and, if not, whether we could chunk the Mn jobs.
Comment 5 • 2 months ago (Assignee)
I've pushed a try build. Let's see how it works:
https://treeherder.mozilla.org/jobs?repo=try&revision=4dfdeb6dcba9365935790e8eeaf5f3c6253eb055
Comment 6 • 2 months ago (Assignee)
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b034b4e4cb99
[marionette] Reduce max-run-time for jobs in Taskcluster. r=jmaher
Comment 8 • 2 months ago (bugherder)