Closed Bug 1821513 Opened 1 year ago Closed 1 year ago

[meta] High frequency [taskcluster:error] Task timeout after 2700 seconds / 1800 seconds / 3600 seconds / 7200 seconds. Force killing container. -> Hang during various stages of the build

Categories

(Firefox Build System :: Task Configuration, defect, P1)

defect

Tracking

(firefox-esr102 fixed, firefox112 fixed, firefox113 fixed)

RESOLVED FIXED
113 Branch
Tracking Status
firefox-esr102 --- fixed
firefox112 --- fixed
firefox113 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jcristau)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, meta, Whiteboard: [stockwell infra])

Attachments

(10 files)

10 attachments, 48 bytes each, text/x-phabricator-request

Filed by: imoraru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=408420613&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SxIBvUG0T0aoVj9ZBkrtwg/runs/0/artifacts/public/logs/live_backing.log


[taskcluster 2023-03-09 17:21:13.355Z] === Task Starting ===
 * Starting system message bus dbus       
[setup 2023-03-09T17:21:16.027Z] run-task started in /builds/worker
[cache 2023-03-09T17:21:16.030Z] cache /builds/worker/checkouts is empty; writing requirements: gid=1000 uid=1000 version=1
[cache 2023-03-09T17:21:16.030Z] cache /builds/worker/tooltool-cache is empty; writing requirements: gid=1000 uid=1000 version=1
[volume 2023-03-09T17:21:16.030Z] changing ownership of volume /builds/worker/.cache to 1000:1000
[volume 2023-03-09T17:21:16.030Z] volume /builds/worker/checkouts is a cache
[volume 2023-03-09T17:21:16.030Z] volume /builds/worker/tooltool-cache is a cache
[volume 2023-03-09T17:21:16.030Z] changing ownership of volume /builds/worker/workspace to 1000:1000
[setup 2023-03-09T17:21:16.030Z] running as worker:worker
[vcs 2023-03-09T17:21:16.030Z] fetching hgmointernal config from http://taskcluster/secrets/v1/secret/project/taskcluster/gecko/hgmointernal
[vcs 2023-03-09T17:21:16.127Z] region google/us-central1 not yet supported; using public hg.mozilla.org service
[vcs 2023-03-09T17:21:16.127Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '8aea0e78341420231309f814d26a9819e600c94f', 'https://hg.mozilla.org/integration/autoland', '/builds/worker/checkouts/gecko']
[vcs 2023-03-09T17:21:16.479Z] (using Mercurial 5.8.1)
[vcs 2023-03-09T17:21:16.480Z] ensuring https://hg.mozilla.org/integration/autoland@8aea0e78341420231309f814d26a9819e600c94f is available at /builds/worker/checkouts/gecko
[vcs 2023-03-09T17:21:17.007Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified)
[vcs 2023-03-09T17:21:17.724Z] (sharing from new pooled repository 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29)
[vcs 2023-03-09T17:21:18.377Z] applying clone bundle from https://storage.googleapis.com/moz-hg-bundles-gcp-us-central1/mozilla-unified/4d5e640759776d678d17822c70cece3e395e72c1.stream-v2.hg
[vcs 2023-03-09T17:21:18.460Z] 703336 files to transfer, 4.39 GB of data
[vcs 2023-03-09T17:21:20.466Z] 
[vcs 2023-03-09T17:21:21.463Z] clone [=>                                           ]  269189773/4713057607 34s
[vcs 2023-03-09T17:21:22.463Z] clone [==>                                          ]  359632057/4713057607 37s
<...>
[vcs 2023-03-09T17:22:25.470Z] clone [==================================>          ] 3673829845/4713057607 19s
[vcs 2023-03-09T17:22:26.470Z] clone [==================================>          ] 3762565975/4713057607 17s
[vcs 2023-03-09T17:22:27.502Z] clone [===================================>         ] 3823083326/4713057607 16s
[taskcluster:error] Task timeout after 2700 seconds. Force killing container.
[taskcluster 2023-03-09 18:06:13.764Z] === Task Finished ===
[taskcluster 2023-03-09 18:06:13.765Z] Unsuccessful task run with exit code: -1 completed in 2779.496 seconds

Hi Andrew! Can you please take a look at this? In the last few days we have seen an increase in the frequency of "run" build bustages.
They fail either like this or with failure lines that would be classified under Bug 1705852.
Thank you!

Flags: needinfo?(ahal)

From what I can see so far, the tasks that fail because of the hang during cloning are all on Android 5.0 x86 pgo.
And the ones that fail with

[task 2023-03-09T13:43:26.860Z] adb Using adb 1.0.41
[task 2023-03-09T13:48:26.863Z] 13:48:26     INFO - Sleeping 10 seconds
[task 2023-03-09T13:48:36.867Z] 13:48:36     INFO - >> Verify Android boot completed: Attempt #2 of 30
[task 2023-03-09T13:48:36.872Z] adb Using adb 1.0.41
[taskcluster:error] Task timeout after 2700 seconds. Force killing container.
[taskcluster 2023-03-09 13:53:02.734Z] === Task Finished ===
[taskcluster 2023-03-09 13:53:02.735Z] Unsuccessful task run with exit code: -1 completed in 2780.163 seconds

are on Android 5.0 x86-64 pgo.

It looks like this is not just Android-related; I found the same hang on OSX and Windows.

Summary: Frequent Android [taskcluster:error] Task timeout after 2700 seconds. Force killing container. -> Hang during cloning phase. → Frequent [taskcluster:error] Task timeout after 2700 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. -> Hang during cloning phase.
See Also: → 1821537
Summary: Frequent [taskcluster:error] Task timeout after 2700 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. -> Hang during cloning phase. → Frequent [taskcluster:error] Task timeout after 2700 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. / Task timeout after 3600 seconds. -> Hang during cloning phase.
Summary: Frequent [taskcluster:error] Task timeout after 2700 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. / Task timeout after 3600 seconds. -> Hang during cloning phase. → Frequent [taskcluster:error] Task timeout after 2700 seconds. / Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. -> Hang during cloning phase.
See Also: → 1821586

Could these hangs be explained by the upgrade of Python to version 3.7 in bug 1734402? The failures were initially classified against other, existing bugs for intermittent task timeouts.

Flags: needinfo?(ahochheiden)

It's possible. The build script for mercurial 5.8.1 was slightly altered to work with python 3.7 for ubuntu1804. There could just be an unknown incompatibility there causing the hangs. I will investigate. I'll start by trying to build different (newer) versions of mercurial to see if that changes the behavior here.

Flags: needinfo?(ahochheiden)
Flags: needinfo?(ahal)
See Also: → 1822427
Summary: Frequent [taskcluster:error] Task timeout after 2700 seconds. / Task timeout after 3600 seconds. Force killing container. / [taskcluster:error] Task timeout after 7200 seconds. Force killing container. -> Hang during cloning phase. → High frequency [taskcluster:error] Task timeout after 2700 seconds / 1800 seconds / 3600 seconds / 7200 seconds. Force killing container. -> Hang during various stages of the build
Duplicate of this bug: 1821537
Duplicate of this bug: 1821586
Duplicate of this bug: 1822427
Summary: High frequency [taskcluster:error] Task timeout after 2700 seconds / 1800 seconds / 3600 seconds / 7200 seconds. Force killing container. -> Hang during various stages of the build → [meta] High frequency [taskcluster:error] Task timeout after 2700 seconds / 1800 seconds / 3600 seconds / 7200 seconds. Force killing container. -> Hang during various stages of the build

The severity field for this bug is set to S4. However, the following bug duplicate has higher severity:

:ahal, could you consider increasing the severity of this bug to S2?

For more information, please visit auto_nag documentation.

Flags: needinfo?(ahal)
Assignee: nobody → jcristau
Status: NEW → ASSIGNED
Severity: S4 → S2
Flags: needinfo?(ahal)
Priority: P5 → P1
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/ci/ci-configuration/rev/1e48ac78e98c
add worker image/pool for testing. r=MasterWayZ

For some reason b-linux-gcp works fine with a 20 GB disk, but we were getting errors:
Invalid value for field 'resource.disks[0].initializeParams.diskSizeGb': '20'. Requested disk size cannot be smaller than the image size (75 GB)

Whiteboard: [stockwell disable-recommended]
See Also: → 1822798
Pushed by mgoossens@mozilla.com:
https://hg.mozilla.org/ci/ci-configuration/rev/d444721848b0
Revert disk size change and use new image r=jcristau
Whiteboard: [stockwell disable-recommended] → [stockwell infra]
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/ci/ci-configuration/rev/90efbd04c604
update l1 docker-worker image. r=MasterWayZ
Keywords: leave-open

Imported from version-control-tools revision 0698b5050920.

So far I've been able to reproduce hangs during robustcheckout; a combination of updating robustcheckout to include the change from bug 1822044 and updating Mercurial per bug 1822044 comment 3 will hopefully resolve that.
I haven't yet been able to root-cause the hanging connections, or to reproduce the other kinds of hangs, though.
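
For context, the Mercurial piece of this relies on a client-side HTTP timeout, so a stalled clone connection gets aborted and retried rather than hanging until the task is killed. A minimal sketch of the kind of configuration involved, assuming Mercurial's http.timeout option (the value below is illustrative, not the one used in CI; the debian package patch pushed later in this bug is what makes the option take effect):

[http]
# Give up on a stalled HTTP request after this many seconds so that
# robustcheckout's retry logic can re-attempt the operation.
timeout = 60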

Depends on: 1822044
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/ci/ci-configuration/rev/5e146d52ea54
update l3 docker-worker image. r=MasterWayZ
See Also: → 1823936

Setting a timeout on the download means we can retry if the connection
hangs, instead of sitting around idle until the task itself hits its
maxRunTime.
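
As an illustration of that approach only (a rough sketch, not the actual fetch-content code; the helper name and values are made up):

import time
import urllib.request

def download_with_timeout(url, dest, timeout=60, attempts=5):
    # Hypothetical helper, not fetch-content's real implementation.
    # The read timeout turns a stalled connection into an exception,
    # so the download can be retried well before the task's maxRunTime.
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                    open(dest, "wb") as out:
                while True:
                    chunk = resp.read(64 * 1024)  # raises socket.timeout if the peer stops sending
                    if not chunk:
                        return
                    out.write(chunk)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            if attempt == attempts:
                raise
            print("download stalled (%s); retrying (%d/%d)" % (exc, attempt, attempts))
            time.sleep(attempt * 5)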

Depends on: 1823992
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b3d278d4a6d8
update robustcheckout hg extension. r=releng-reviewers,bhearsum
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/528f46a1abf8
patch mercurial debian package to make http.timeout option work r=releng-reviewers,bhearsum
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c7a5550cc22f
set a timeout for fetch-content downloads. r=releng-reviewers,gbrown

The patches from comments 37-39 should help with the most common issues we've seen here. If that's confirmed, I should be able to move on next week to the remaining places where we see timeouts.

Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ebb447880cf2
bump decision docker image version. r=releng-reviewers,gbrown DONTBUILD
Depends on: 1824937
See Also: → 1823120

It looks like this is pretty much gone now. I see 2 tasks starred with this bug so far this week, neither of which actually shows a hang; they are just tests that legitimately ran longer than their maxRunTime.

Depends on: 1825503
Pushed by jcristau@mozilla.com:
https://hg.mozilla.org/ci/ci-configuration/rev/981118187641
remove test pool, no longer necessary. r=MasterWayZ

Calling this done.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch