Intermittent pup [taskcluster:error] Task timeout after 1800 seconds. Force killing container
Categories
(Remote Protocol :: CDP, defect, P5)
Tracking
(firefox-esr91 unaffected, firefox96 unaffected, firefox97 fixed, firefox98 fixed)
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox96 | --- | unaffected |
firefox97 | --- | fixed |
firefox98 | --- | fixed |
People
(Reporter: whimboo, Assigned: jdescottes)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
For bug 1580652 we currently see a lot of task timeouts and nearly all are coming from our Puppeteer job. It looks like this is a regression from bug 1732958.
Checking some of them we seem to stall in the following test:
Target Browser.waitForTarget should wait for a target
Target should create a worker from a shared worker
Target should report when a target url changes
waittask specs Page.waitFor should allow you to select an element with parenthesis-starting xpath
waittask specs Page.waitFor should wait for selector
waittask specs Frame.waitForSelector should respond to node attribute mutation
waittask specs Frame.waitForXPath should allow you to select an element with single slash
waittask specs Frame.waitForXPath should throw when frame is detached
Julian, could you please have a look?
Reporter | ||
Comment 1•2 years ago
|
||
I requested from the sheriffs the failures to be re-classified for this bug.
Assignee | ||
Comment 2•2 years ago
|
||
As discussed on elements, failing runs seem to waste around 10 minutes on a "clone" step:
[vcs 2022-01-08T11:39:54.869Z] fetching hgmointernal config from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgmointernal
[vcs 2022-01-08T11:39:55.013Z] hgmointernal rate miss; using public hg.mozilla.org service
[vcs 2022-01-08T11:39:55.013Z] fetching hg.mozilla.org fingerprint from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgfingerprint
[vcs 2022-01-08T11:39:55.091Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--config', 'hostsecurity.hg.mozilla.org:fingerprints=sha256:4D:EB:21:6E:35:2F:99:C6:8F:C3:47:9B:57:B8:6C:17:15:8F:86:09:D4:6C:17:1D:87:B0:DE:F9:0E:51:70:FC,sha256:FF:E7:8D:93:E9:56:3C:C0:19:FC:00:4C:18:B9:86:E5:08:E5:10:F5:E2:EA:48:E8:22:D3:A3:3A:CA:99:C3:4C', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '4427e203c432ae8a2db310112bbc86a598ca1203', 'https://hg.mozilla.org/mozilla-central', '/builds/worker/checkouts/gecko']
[vcs 2022-01-08T11:39:55.249Z] (using Mercurial 5.8.1)
[vcs 2022-01-08T11:39:55.249Z] ensuring https://hg.mozilla.org/mozilla-central@4427e203c432ae8a2db310112bbc86a598ca1203 is available at /builds/worker/checkouts/gecko
[vcs 2022-01-08T11:39:56.046Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified)
[vcs 2022-01-08T11:39:56.216Z] (sharing from new pooled repository 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29)
[vcs 2022-01-08T11:39:57.010Z] applying clone bundle from https://s3-external-1.amazonaws.com/moz-hg-bundles-us-east-1/mozilla-unified/4427e203c432ae8a2db310112bbc86a598ca1203.stream-v2.hg
https://firefoxci.taskcluster-artifacts.net/Jft1NJWuTm2TmGnWknp8dQ/0/public/logs/live_backing.log
A successful run will not have this 10mn clone: https://firefoxci.taskcluster-artifacts.net/GCjlo76NQ9G09xrDrQyK0w/0/public/logs/live_backing.log
Since a successful run already takes close to 30mn, the 10mn delay could be enough to make the test exceed the 1800s (=30mn) timeout. Since the last sync added new tests, I imagine we run into this issue more consistently anytime this clone happens.
My suggestion is to increase the timeout and see if it solves the issue. Alternatively we could skip failing/timeout tests, which would reduce the runtime to 10mn ish.
Reporter | ||
Comment 3•2 years ago
|
||
Slow cloning is known via bug 1487587. So yes, lets increase the timeout for the puppeteer job to circumvent this problem for now. Thanks!
Assignee | ||
Comment 4•2 years ago
|
||
Slow cloning is about clone taking 15-30mn instead of 10mn, however we have the issue even if the clone is fast.
The tests are only successful when we are able to start without cloning, by using a cached version of the repo.
That's what I assume the following log line is about:
(existing repository shared store: /builds/worker/checkouts/hg-store/8ba995b74e18334ab3707f27e9eb8f4e37ba3d29/.hg)
Assignee | ||
Comment 5•2 years ago
|
||
The test itself takes around 25 minutes. If the task hits the clone step, it will take an additional 10minutes.
Note that there is also a slow clone issue where clone can take between 15 and 30 mn.
Updated•2 years ago
|
Pushed by jdescottes@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f106f16199d0 [remote] Increase timeout for puppeteer tasks to 3600 seconds r=jmaher,whimboo
Comment 7•2 years ago
|
||
bugherder |
Comment 8•2 years ago
|
||
bugherder uplift |
Comment 9•2 years ago
|
||
Set release status flags based on info from the regressing bug 1732958
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Description
•