gecko-t-bitbar-gw-unit-p2 Android 8.0 Pixel2 failures with | load failed: timed out waiting for reftest-wait to be removed | load failed: timed out after 300000 ms | application timed out after 370 seconds with no output
Categories
(Testing :: General, defect, P3)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell:infra])
Crash Data
Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=292966605&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Tc1icqN9RJi-Wp5IPs3qbg/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Tc1icqN9RJi-Wp5IPs3qbg/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
There are a lot of Android 8.0 Pixel2 jsreftests that fail with the following:
https://treeherder.mozilla.org/logviewer.html#?job_id=292967477&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=292938627&repo=mozilla-central
https://treeherder.mozilla.org/logviewer.html#?job_id=292951061&repo=mozilla-beta
First failed here on autoland: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedJob=292956372&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=reftest%2Candroid%2Cj&revision=651f6b4c7d7414fc73ea15080e8874bddc5c122b
These run on https://firefox-ci-tc.services.mozilla.com/provisioners/proj-autophone/worker-types/gecko-t-bitbar-gw-unit-p2
I retriggered some jobs that were initially green, but those failed too: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&searchStr=reftest%2Candroid%2Cj&revision=5f1e70cbdf730256a7c0c98271692ab3f9ead897&selectedJob=292966605
Comment 1•6 years ago
aerickson: It seems to be happening on trunk and beta regardless of the pushes, which makes me wonder about networking at Bitbar.
Comment 2•6 years ago
Range on autoland between last green one and first failures is this: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&searchStr=reftest%2Candroid%2Cj&tochange=651f6b4c7d7414fc73ea15080e8874bddc5c122b&fromchange=5f1e70cbdf730256a7c0c98271692ab3f9ead897&selectedJob=292910539
Some other reftests that also run on gecko-t-bitbar-gw-unit-p2 are timing out: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=48bba598f1dbb2295bd2d2246a6d3d24727f280a&searchStr=Android%2C8.0%2CPixel2%2CAArch64%2CQuantumRender%2Cdebug&selectedJob=292970205
The queue for this platform in earthangel during the timeframe of these failures was: https://earthangel-b40313e5.influxcloud.net/d/slXwf4emz/workers?orgId=1&var-workerType=gecko-t-bitbar-gw-unit-p2&from=1584037800000&to=1584070200000
Thanks Bob for looking.
Comment 3•6 years ago
I think the spike of failures in Bug 1542628 is also caused by this; those tests also run on gecko-t-bitbar-gw-unit-p2.
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-03-06&endday=2020-03-13&tree=trunk&bug=1542628
Found these mda failures too: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&searchStr=mda%2CAndroid%2C8.0%2CPixel2&revision=aa506197596f9e90a6cc1c6b6879c13df40bd90e&selectedJob=292933295
and: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Ctestfailed%2Cbusted%2Cexception&searchStr=mda%2CAndroid%2C8.0%2CPixel2&fromchange=9c1fa3e6e991aba42289e88f0a09c7f6e500022c&tochange=48bba598f1dbb2295bd2d2246a6d3d24727f280a&selectedJob=292970208
Comment 7•6 years ago
Andrew, I checked several of these and some at least are due to the device losing wifi during the test. I can think of several things that we all could do to make this better.
- Ask Bitbar to figure it out and maybe strengthen the signal for the devices by adding/moving access points.
- Figure out a way to detect lost wifi in a test. Perhaps just checking that the device has an IP address would be sufficient. If we could detect lost wifi, we probably couldn't recover without a failure and an orange result, but we could terminate the test more quickly instead of making it wait for the full timeout.
- When we lose wifi, exit with a retry status quickly so we just try again.
#1 seems like a good first step but is not sufficient in the long run. I'll file bugs for #2 and #3 later this morning.
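Ideas #2 and #3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the harness's actual code: the `wlan0` interface name, the helper names, and the `RETRY_EXIT_STATUS` value are all hypothetical, and the check assumes `adb` is on the host's PATH and that the device's `ip` tool prints the usual `inet <address>/<prefix>` lines.

```python
import re
import subprocess

# Placeholder value (assumption): use whatever status the harness treats as "retry".
RETRY_EXIT_STATUS = 4

# Matches lines such as "    inet 192.168.1.5/24 brd ..." in `ip addr show` output.
_INET_RE = re.compile(r"^\s*inet\s+(\d{1,3}(?:\.\d{1,3}){3})/\d+", re.MULTILINE)


def parse_ipv4_addresses(ip_addr_output):
    """Return the IPv4 addresses listed in `ip -f inet addr show` style output."""
    return _INET_RE.findall(ip_addr_output)


def device_has_address(ip_addr_output):
    """True if at least one non-loopback IPv4 address is present."""
    return any(not addr.startswith("127.")
               for addr in parse_ipv4_addresses(ip_addr_output))


def exit_status_for(ip_addr_output):
    """Return the retry status when wifi looks lost, otherwise None."""
    return None if device_has_address(ip_addr_output) else RETRY_EXIT_STATUS


def query_device(interface="wlan0"):
    """Ask the attached device for its IPv4 addresses on one interface.

    The interface name is an assumption; it may differ per device.
    """
    result = subprocess.run(
        ["adb", "shell", "ip", "-f", "inet", "addr", "show", interface],
        capture_output=True, text=True, timeout=30, check=False,
    )
    return result.stdout
```

A harness could call `exit_status_for(query_device())` periodically while a test runs and stop early with the returned status instead of waiting for the full timeout.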
Comment 8•6 years ago
Sakari/Bitbar says that all devices have strong connections to the Wifi access point. They're open to trying different channels.
Comment 9•6 years ago
Let's revisit when we have bug 1622816 available.
Comment 10•6 years ago
We're getting more failures today. We haven't made any progress. Our planned remedies:
- BC/Snorp: Near term: Working to add code to mark job as RETRY if Wifi fails (via inspecting logcat?).
- aerickson: Long term: Going to investigate using USB Ethernet for devices.
Comment 11•6 years ago
After https://bugzilla.mozilla.org/show_bug.cgi?id=1622816#c6 was merged, it's failing with this message: "No tests run or test summary not found"
Here is the merge:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&searchStr=android%2C8.0%2Cpixel2%2Crefte&revision=7df85231b517bb1864e949b62c567ada54352105&selectedJob=294149475
Before the merge, it failed like this:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=android%2C8.0%2Cpixel2%2Cpgo%2Creftests%2Ctest-android-hw-p2-8-0-arm7-api-16%2Fpgo-geckoview-jsreftest-e10s-1%2Cr%28j1%29&selectedJob=294149475
Comment 12•6 years ago
That is because the app is crashing with a Java exception when the network connection is lost. This will help us diagnose the issue as an infrastructure issue rather than a framework issue. See, for example: "Caused by: java.lang.RuntimeException: Network connection has been lost."
We will be working to get Treeherder to offer this failure for classification and will work to get the jobs retried, since this is an intermittent infra error. Once we have good data, we can work with Bitbar to resolve the issue.
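As a rough illustration of how a job could be flagged for retry from its log, the sketch below scans log lines for the exception text quoted above. The function names and the "retry"/"failure" labels are hypothetical and are not Treeherder's or the harness's actual classification code.

```python
# Exception text quoted from the failing jobs' logs.
NETWORK_LOST_MARKER = "java.lang.RuntimeException: Network connection has been lost"


def find_network_loss(log_lines):
    """Return the 1-based number of the first line with the marker, or None."""
    for number, line in enumerate(log_lines, start=1):
        if NETWORK_LOST_MARKER in line:
            return number
    return None


def classify_job(log_lines):
    """Label a job "retry" when the network-loss exception appears in its log.

    The labels are illustrative placeholders (assumption).
    """
    return "retry" if find_network_loss(log_lines) is not None else "failure"
```

Matching on the full exception text rather than on the generic timeout messages keeps real test timeouts from being retried as if they were infrastructure failures.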
Comment 16•6 years ago
Based on comment 12, removing the disable recommended tag
Comment 19•6 years ago
There were no new failures from the 23rd of March until the 1st of April, when it started failing again: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=3c7d041125689ebaa7d55e0dd1a18afe63ff838d&selectedJob=295779590
Comment 20•6 years ago
Bitbar had some issues with their access points yesterday. They have supposedly fixed them as of 2020-04-01 4:30 PM PDT.
Comment 21•6 years ago
Thank you Bob, removing the disable tag.
Comment 26•6 years ago
Hi, this is a bug whose current Severity is blocker but needs to be updated for the new Severity values as of May 4 2020.
I am moving its severity to S1.
Please review this bug's Severity and let Release Management know if it still is a high Severity bug.
Comment 48•4 years ago
This appears to be very infrequent now... Marking down the severity.