Closed Bug 1749266 Opened 4 years ago Closed 4 years ago

Intermittent pup [taskcluster:error] Task timeout after 1800 seconds. Force killing container

Tracking

(firefox-esr91 unaffected, firefox96 unaffected, firefox97 fixed, firefox98 fixed)

Status:

RESOLVED FIXED

Milestone:

98 Branch

Tracking Flags:

Tracking

Status

firefox-esr91

---

unaffected

firefox96

---

unaffected

firefox97

---

fixed

firefox98

---

fixed

People

(Reporter: whimboo, Assigned: jdescottes)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Bug 1749266 - [remote] Increase timeout for puppeteer tasks to 3600 seconds 4 years ago Julian Descottes [:jdescottes] 48 bytes, text/x-phabricator-request		Details \| Review

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Description

•

4 years ago

For bug 1580652 we currently see a lot of task timeouts and nearly all are coming from our Puppeteer job. It looks like this is a regression from bug 1732958.

Checking some of them we seem to stall in the following test:

Target Browser.waitForTarget should wait for a target
Target should create a worker from a shared worker
Target should report when a target url changes
waittask specs Page.waitFor should allow you to select an element with parenthesis-starting xpath
waittask specs Page.waitFor should wait for selector
waittask specs Frame.waitForSelector should respond to node attribute mutation
waittask specs Frame.waitForXPath should allow you to select an element with single slash
waittask specs Frame.waitForXPath should throw when frame is detached

Julian, could you please have a look?

Flags: needinfo?(jdescottes)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 1

•

4 years ago

I requested from the sheriffs the failures to be re-classified for this bug.

Summary: Intermittent Pup [taskcluster:error] Task timeout after 1800 seconds. Force killing container → Intermittent pup [taskcluster:error] Task timeout after 1800 seconds. Force killing container

Julian Descottes [:jdescottes]

Assignee

Comment 2

•

4 years ago

As discussed on elements, failing runs seem to waste around 10 minutes on a "clone" step:

[vcs 2022-01-08T11:39:54.869Z] fetching hgmointernal config from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgmointernal
[vcs 2022-01-08T11:39:55.013Z] hgmointernal rate miss; using public hg.mozilla.org service
[vcs 2022-01-08T11:39:55.013Z] fetching hg.mozilla.org fingerprint from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgfingerprint
[vcs 2022-01-08T11:39:55.091Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--config', 'hostsecurity.hg.mozilla.org:fingerprints=sha256:4D:EB:21:6E:35:2F:99:C6:8F:C3:47:9B:57:B8:6C:17:15:8F:86:09:D4:6C:17:1D:87:B0:DE:F9:0E:51:70:FC,sha256:FF:E7:8D:93:E9:56:3C:C0:19:FC:00:4C:18:B9:86:E5:08:E5:10:F5:E2:EA:48:E8:22:D3:A3:3A:CA:99:C3:4C', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '4427e203c432ae8a2db310112bbc86a598ca1203', 'https://hg.mozilla.org/mozilla-central', '/builds/worker/checkouts/gecko']
[vcs 2022-01-08T11:39:55.249Z] (using Mercurial 5.8.1)
[vcs 2022-01-08T11:39:55.249Z] ensuring https://hg.mozilla.org/mozilla-central@4427e203c432ae8a2db310112bbc86a598ca1203 is available at /builds/worker/checkouts/gecko
[vcs 2022-01-08T11:39:56.046Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified)
[vcs 2022-01-08T11:39:56.216Z] (sharing from new pooled repository 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29)
[vcs 2022-01-08T11:39:57.010Z] applying clone bundle from https://s3-external-1.amazonaws.com/moz-hg-bundles-us-east-1/mozilla-unified/4427e203c432ae8a2db310112bbc86a598ca1203.stream-v2.hg

https://firefoxci.taskcluster-artifacts.net/Jft1NJWuTm2TmGnWknp8dQ/0/public/logs/live_backing.log

A successful run will not have this 10mn clone: https://firefoxci.taskcluster-artifacts.net/GCjlo76NQ9G09xrDrQyK0w/0/public/logs/live_backing.log

Since a successful run already takes close to 30mn, the 10mn delay could be enough to make the test exceed the 1800s (=30mn) timeout. Since the last sync added new tests, I imagine we run into this issue more consistently anytime this clone happens.

My suggestion is to increase the timeout and see if it solves the issue. Alternatively we could skip failing/timeout tests, which would reduce the runtime to 10mn ish.

Flags: needinfo?(jdescottes)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 3

•

4 years ago

Slow cloning is known via bug 1487587. So yes, lets increase the timeout for the puppeteer job to circumvent this problem for now. Thanks!

Depends on: 1487587

Julian Descottes [:jdescottes]

Assignee

Comment 4

•

4 years ago

Slow cloning is about clone taking 15-30mn instead of 10mn, however we have the issue even if the clone is fast.
The tests are only successful when we are able to start without cloning, by using a cached version of the repo.

That's what I assume the following log line is about:

(existing repository shared store: /builds/worker/checkouts/hg-store/8ba995b74e18334ab3707f27e9eb8f4e37ba3d29/.hg)

Julian Descottes [:jdescottes]

Assignee

Comment 5

•

4 years ago

Attached file Bug 1749266 - [remote] Increase timeout for puppeteer tasks to 3600 seconds — Details

The test itself takes around 25 minutes. If the task hits the clone step, it will take an additional 10minutes.
Note that there is also a slow clone issue where clone can take between 15 and 30 mn.

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → jdescottes

Status: NEW → ASSIGNED

Pulsebot

Comment 6

•

4 years ago

Pushed by jdescottes@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f106f16199d0 [remote] Increase timeout for puppeteer tasks to 3600 seconds r=jmaher,whimboo

Alexandru Michis [:malexandru]

Comment 7

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f106f16199d0

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox98: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 98 Branch

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 8

•

4 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/8490c61e343b

status-firefox97: affected → fixed

BugBot [:suhaib / :marco/ :calixte]

Comment 9

•

4 years ago

Set release status flags based on info from the regressing bug 1732958

status-firefox-esr91: --- → unaffected

Comment hidden (Intermittent Failures Robot)

BugBot [:suhaib / :marco/ :calixte]

Updated

•

4 years ago

Has Regression Range: --- → yes

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Intermittent pup [taskcluster:error] Task timeout after 1800 seconds. Force killing container

Categories

(Remote Protocol :: CDP, defect, P5)

Tracking

(firefox-esr91 unaffected, firefox96 unaffected, firefox97 fixed, firefox98 fixed)

People

(Reporter: whimboo, Assigned: jdescottes)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Updated

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Attachment

General

Description

File Name

Content Type