Intermittent [test-linux.sh:error] Failed to download and unzip mozharness
Categories
(Firefox Build System :: Task Configuration, task, P5)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
Details
(Keywords: intermittent-failure)
Updated•7 years ago
Comment 38•6 years ago
This download operation should grow a retry loop to handle these circumstances.
Comment 39•6 years ago
It looks like it already has retries. Given the number of failures here, it doesn't seem worth the effort to deal with this.
Comment 46•6 years ago
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=255629043&repo=autoland&lineNumber=883
Comment 49•6 years ago
There are 31 total failures in the last 7 days on android-em-7-0-x86_64 debug and opt and android-hw-p2-8-0-android-aarch64 pgo.
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=259704228&repo=autoland&lineNumber=349
[task 2019-08-02T23:02:43.594Z] Download failed, retrying in 2 seconds...
[task 2019-08-02T23:02:43.594Z] + sleep 2
[task 2019-08-02T23:02:43.594Z] + timeout=4
[task 2019-08-02T23:02:43.594Z] + attempt=2
[task 2019-08-02T23:02:43.594Z] + [[ 2 < 10 ]]
[task 2019-08-02T23:02:43.594Z] + fail 'Failed to download and unzip mozharness'
[task 2019-08-02T23:02:43.594Z] + echo
[task 2019-08-02T23:02:43.594Z]
[task 2019-08-02T23:02:43.594Z] + echo '[test-linux.sh:error]' 'Failed to download and unzip mozharness'
[task 2019-08-02T23:02:43.594Z] [test-linux.sh:error] Failed to download and unzip mozharness
[task 2019-08-02T23:02:43.594Z] + exit 1
[task 2019-08-02T23:02:43.594Z] cleanup
[task 2019-08-02T23:02:43.594Z] + cleanup
[task 2019-08-02T23:02:43.594Z] + local rv=1
[task 2019-08-02T23:02:43.594Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2019-08-02T23:02:43.594Z] + false
[task 2019-08-02T23:02:43.594Z] + exit 1
[task 2019-08-02T23:02:43.594Z]
[task 2019-08-02T23:02:43.594Z] netstat -aop
[task 2019-08-02T23:02:43.594Z] Active Internet connections (servers and established)
[task 2019-08-02T23:02:43.594Z] Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name Timer
[task 2019-08-02T23:02:43.594Z] tcp 0 0 127.0.0.11:39021 : LISTEN - off (0.00/0/0)
[task 2019-08-02T23:02:43.594Z] tcp 0 0 bitbar-ubuntu-238:57946 hg.public.mdc1.mo:https ESTABLISHED 21/generic-worker keepalive (3.30/0/0)
[task 2019-08-02T23:02:43.594Z] tcp 0 0 bitbar-ubuntu-238:43432 ec2-54-70-57-161.:https ESTABLISHED 21/generic-worker off (0.00/0/0)
[task 2019-08-02T23:02:43.594Z] tcp 0 0 bitbar-ubuntu-238:53644 lga15s43-in-f42.1:https ESTABLISHED 19/python off (0.00/0/0)
[task 2019-08-02T23:02:43.594Z] tcp 0 0 localhost:5037 localhost:34519 TIME_WAIT - timewait (59.89/0/0)
[task 2019-08-02T23:02:43.594Z] tcp 0 0 localhost:51314 localhost:60022 ESTABLISHED 21/generic-worker keepalive (25.83/0/0)
[task 2019-08-02T23:02:43.594Z] tcp 0 0 bitbar-ubuntu-238:45536 sfo07s13-in-f10.1:https ESTABLISHED 19/python off (0.00/0/0)
[task 2019-08-02T23:02:43.594Z] tcp6 0 0 [::]:60099 [::]:* LISTEN 50/livelog off (0.00/0/0)
[task 2019-08-02T23:02:43.594Z] tcp6 0 0 [::]:60022 [::]:* LISTEN 50/livelog off (0.00/0/0)
[task 2019-08-02T23:02:43.594Z] tcp6 0 0 localhost:60022 localhost:51314 ESTABLISHED 50/livelog keepalive (89.93/0/0)
[task 2019-08-02T23:02:43.594Z] udp 0 0 127.0.0.11:56180 : - off (0.00/0/0)
[task 2019-08-02T23:02:43.594Z] Active UNIX domain sockets (servers and established)
[task 2019-08-02T23:02:43.594Z] Proto RefCnt Flags Type State I-Node PID/Program name Path
[task 2019-08-02T23:02:43.594Z]
[task 2019-08-02T23:02:43.594Z]
[task 2019-08-02T23:02:43.594Z]
[task 2019-08-02T23:02:43.594Z] script.py exitcode 1
[taskcluster 2019-08-02T23:02:43.611Z] Exit Code: 1
[taskcluster 2019-08-02T23:02:43.611Z] User Time: 411.76ms
[taskcluster 2019-08-02T23:02:43.611Z] Kernel Time: 249.31ms
[taskcluster 2019-08-02T23:02:43.611Z] Wall Time: 1m23.383673664s
[taskcluster 2019-08-02T23:02:43.611Z] Result: FAILED
[taskcluster 2019-08-02T23:02:43.611Z] === Task Finished ===
[taskcluster 2019-08-02T23:02:43.611Z] Task Duration: 1m25.068088501s
[taskcluster 2019-08-02T23:02:44.424Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/dA08qvPJQ62rtS5ESfBF-w/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-08-01T20:27:37.007Z
[taskcluster:error] exit status 1
Tom, can you please assign someone to take a look?
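One detail in the trace above may be worth checking: the loop test is printed as `[[ 2 < 10 ]]` and is followed immediately by `fail`. In bash, `<` inside `[[ ]]` compares strings lexicographically, so `[[ 2 < 10 ]]` is false and a loop written that way would stop after the second attempt; `-lt` or `(( ))` compares numerically. The Python sketch below illustrates the same pitfall next to a retry loop with a numeric attempt check. It is not the code from test-linux.sh: `retry_with_backoff`, its parameters, and the doubling delay are assumptions for illustration (the trace only shows the delay going from 2 to 4).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 10,      # the "10" seen in the trace
    initial_delay: float = 2.0,  # "retrying in 2 seconds"
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts calls have failed."""
    delay = initial_delay
    attempt = 1
    while True:
        try:
            return operation()
        except OSError as exc:
            # Integer comparison: with attempt == 2 this is False, so we retry.
            if attempt >= max_attempts:
                raise RuntimeError(f"gave up after {attempt} attempts") from exc
            sleep(delay)
            delay *= 2  # assumed doubling; the trace shows 2 then 4
            attempt += 1


# The pitfall: comparing the counters as strings ends the loop early.
assert ("2" < "10") is False  # lexicographic: "2" sorts after "1"
assert (2 < 10) is True       # numeric
```

Passing `sleep` as a parameter is only there so the loop can be exercised without real delays.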
Comment 50•6 years ago
Nearly all the recent failures were on packet.net on July 31; there was scheduled maintenance on a packet.net switch on that day.
Comment 51•6 years ago
Thank you Geoff.
Comment 54•6 years ago
New occurrence:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275970280&repo=autoland&lineNumber=311
Comment 57•5 years ago
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit auto_nag documentation.
Comment 58•5 years ago
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=293405153&repo=autoland&lineNumber=502
Comment 59•5 years ago
I've rebooted several of the devices per Wander's recommendations. It doesn't seem to help.
Machine-13 was just rebooted and has had 4 failures.
Comment 60•5 years ago
I've quarantined the workers with a low success rate.
packet_hosts_to_quarantine = [4, 6, 10, 13, 14, 16, 43, 53, 54, 65, 68]
gecko-t-linux.machine-4 {sr: [ ] 0.0%, suc: 0, cmp: 3, exc: 2, rng: 2, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-6 {sr: [ ] 0.0%, suc: 0, cmp: 1, exc: 1, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-10 {sr: [=== ] 37.5%, suc: 6, cmp: 16, exc: 4, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-13 {sr: [ ] 0.0%, suc: 0, cmp: 11, exc: 6, rng: 3, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-14 {sr: [ ] 0.0%, suc: 0, cmp: 2, exc: 0, rng: 4, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-16 {sr: [= ] 16.7%, suc: 2, cmp: 12, exc: 5, rng: 3, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-43 {sr: [ ] 6.2%, suc: 1, cmp: 16, exc: 4, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-53 {sr: [====== ] 66.7%, suc: 4, cmp: 6, exc: 0, rng: 4, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-54 {sr: [== ] 25.0%, suc: 3, cmp: 12, exc: 6, rng: 2, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-65 {sr: [ ] 0.0%, suc: 0, cmp: 10, exc: 5, rng: 5, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-68 {sr: [ ] 0.0%, suc: 0, cmp: 3, exc: 1, rng: 3, alerts: ['Low health (less than 0.85)!']}
Comment 62•5 years ago
(In reply to Andrew Erickson [:aerickson] from comment #60)
I've quarantined the workers with a low success rate.
packet_hosts_to_quarantine = [4, 6, 10, 13, 14, 16, 43, 53, 54, 65, 68]
gecko-t-linux.machine-4 {sr: [ ] 0.0%, suc: 0, cmp: 3, exc: 2, rng: 2, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-6 {sr: [ ] 0.0%, suc: 0, cmp: 1, exc: 1, rng: 1, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-10 {sr: [=== ] 37.5%, suc: 6, cmp: 16, exc: 4, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-13 {sr: [ ] 0.0%, suc: 0, cmp: 11, exc: 6, rng: 3, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-14 {sr: [ ] 0.0%, suc: 0, cmp: 2, exc: 0, rng: 4, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-16 {sr: [= ] 16.7%, suc: 2, cmp: 12, exc: 5, rng: 3, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-43 {sr: [ ] 6.2%, suc: 1, cmp: 16, exc: 4, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-53 {sr: [====== ] 66.7%, suc: 4, cmp: 6, exc: 0, rng: 4, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-54 {sr: [== ] 25.0%, suc: 3, cmp: 12, exc: 6, rng: 2, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-65 {sr: [ ] 0.0%, suc: 0, cmp: 10, exc: 5, rng: 5, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-68 {sr: [ ] 0.0%, suc: 0, cmp: 3, exc: 1, rng: 3, alerts: ['Low health (less than 0.85)!']}
I rebuilt all quarantined machines.
Comment 63•5 years ago
Thanks. :) I'll put them back in gradually and monitor them.
Comment 66•5 years ago
Andrew, we currently have two machines that are causing this failure:
https://firefox-ci-tc.services.mozilla.com/provisioners/terraform-packet/worker-types/gecko-t-linux/workers/packet-sjc1/machine-15
https://firefox-ci-tc.services.mozilla.com/provisioners/terraform-packet/worker-types/gecko-t-linux/workers/packet-sjc1/machine-1
Here is the failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=298366671&repo=autoland&lineNumber=570
Should we quarantine them?
Comment 68•5 years ago
Another one: https://firefox-ci-tc.services.mozilla.com/provisioners/terraform-packet/worker-types/gecko-t-linux/workers/packet-sjc1/machine-8
This one is failing with: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=298376900&repo=autoland&lineNumber=14
Error downloading "public/image.tar.zst" from task ID "O-2tDt-zRWaHMcjqBsqOmQ". Error: connect ETIMEDOUT 35.190.5.182:443 Next Attempt in: 16518.39 ms
Update: https://firefox-ci-tc.services.mozilla.com/provisioners/terraform-packet/worker-types/gecko-t-linux/workers/packet-sjc1/machine-29 too
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=298377286&repo=autoland&lineNumber=884
Comment 69•5 years ago
The issue seems to be bigger, so I filed bug 1631409 for it.
Comment 70•5 years ago
Yes, please feel free to quarantine if this is seen. I will quarantine the above hosts.
Comment 71•5 years ago
packet hosts [15, 1, 53, 29, 14] have been quarantined.
Comment 72•5 years ago
I've also quarantined: [7, 2, 37, 46, 28, 45, 22, 64] due to success rate below 80%.
gecko-t-linux.machine-7 {sr: [ ] 0.0%, suc: 0, cmp: 13, exc: 7, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-2 {sr: [= ] 11.8%, suc: 2, cmp: 17, exc: 3, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-37 {sr: [=== ] 38.5%, suc: 5, cmp: 13, exc: 7, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-46 {sr: [===== ] 50.0%, suc: 1, cmp: 2, exc: 0, rng: 5, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-28 {sr: [===== ] 58.3%, suc: 7, cmp: 12, exc: 4, rng: 4, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-45 {sr: [====== ] 66.7%, suc: 10, cmp: 15, exc: 5, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-22 {sr: [======= ] 73.3%, suc: 11, cmp: 15, exc: 5, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-64 {sr: [======= ] 76.5%, suc: 13, cmp: 17, exc: 3, rng: 0, alerts: ['Low health (less than 0.85)!']}
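The success rates in these reports are consistent with `suc / cmp` (for example 13 / 17 = 76.5%). A minimal Python sketch of picking hosts under a cutoff from such output follows; the function name `hosts_below`, the regular expression, and the 0.80 default are assumptions for illustration, not the script that produced the report.

```python
from __future__ import annotations

import re

# Three lines copied from the health output above.
HEALTH_OUTPUT = """\
gecko-t-linux.machine-7 {sr: [ ] 0.0%, suc: 0, cmp: 13, exc: 7, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-22 {sr: [======= ] 73.3%, suc: 11, cmp: 15, exc: 5, rng: 0, alerts: ['Low health (less than 0.85)!']}
gecko-t-linux.machine-64 {sr: [======= ] 76.5%, suc: 13, cmp: 17, exc: 3, rng: 0, alerts: ['Low health (less than 0.85)!']}
"""

# Captures the host number plus the "suc" and "cmp" counters.
LINE_RE = re.compile(r"machine-(\d+) \{.*?suc: (\d+), cmp: (\d+)")


def hosts_below(report: str, threshold: float = 0.80) -> list[int]:
    """Return host numbers whose suc / cmp ratio is under threshold."""
    hosts = []
    for line in report.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not worker health entries
        host, suc, cmp_ = (int(group) for group in match.groups())
        rate = suc / cmp_ if cmp_ else 0.0  # treat "no completed tasks" as 0%
        if rate < threshold:
            hosts.append(host)
    return hosts


print(hosts_below(HEALTH_OUTPUT))  # prints [7, 22, 64]
```

Recomputing the rate from the counters, rather than parsing the printed percentage, avoids depending on how the report rounds.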
Comment 73•5 years ago
Host 68 has also been quarantined due to its success rate.
Comment 74•5 years ago
(In reply to Andrew Erickson [:aerickson] from comment #71)
packet hosts [15, 1, 53, 29, 14] have been quarantined.
I've rebuilt the offending workers.
Comment 77•5 years ago
New occurrence: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307689462&repo=autoland&lineNumber=487
Comment 80•5 years ago
New occurrence: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=317640636&repo=autoland&lineNumber=238
Comment 83•4 years ago
This is still happening.
Recent failure: https://treeherder.mozilla.org/logviewer?job_id=325762144&repo=autoland&lineNumber=498