Intermittent layout/style/crashtests/1546255.html | application timed out after 370 seconds with no output
Categories
(Core :: CSS Parsing and Computation, defect, P5)
Tracking
()
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Regression)
Details
(Keywords: crash, intermittent-failure, regression, Whiteboard: [stockwell disabled])
Crash Data
Attachments
(2 files, 1 obsolete file)
Filed by: aciure [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=248019165&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/OL6FdGIxRvCPluOeO2NjtA/runs/1/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/OL6FdGIxRvCPluOeO2NjtA/runs/1/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2019-05-23T17:55:21.129Z] 17:55:21 INFO - REFTEST TEST-PASS | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html | (LOAD ONLY)
[task 2019-05-23T17:55:21.129Z] 17:55:21 INFO - REFTEST TEST-END | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html
[task 2019-05-23T18:01:52.889Z] 18:01:52 INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
[task 2019-05-23T18:01:52.992Z] 18:01:52 INFO - org.mozilla.geckoview.test unexpectedly found running. Killing...
[task 2019-05-23T18:01:52.992Z] 18:01:52 INFO - REFTEST TEST-INFO | started process screentopng
[task 2019-05-23T18:01:53.316Z] 18:01:53 INFO - REFTEST TEST-INFO | screentopng: exit 0
[task 2019-05-23T18:02:07.657Z] 18:02:07 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html | application timed out after 370 seconds with no output
[task 2019-05-23T18:02:07.657Z] 18:02:07 INFO - remoteautomation.py | Application ran for: 0:13:54.845250
[task 2019-05-23T18:02:08.192Z] 18:02:08 INFO - REFTEST INFO | Downloading symbols from: https://queue.taskcluster.net/v1/task/cm0pFFzpTsmXUojTHNF9aA/artifacts/public/build/target.crashreporter-symbols.zip
[task 2019-05-23T18:02:12.945Z] 18:02:12 INFO - REFTEST INFO | Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpRHOTix/37a6e1c4-40e7-2c33-8a1c-492691cd8c13.dmp /tmp/tmpXilQ5r
[task 2019-05-23T18:02:18.804Z] 18:02:18 INFO - REFTEST INFO | Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/37a6e1c4-40e7-2c33-8a1c-492691cd8c13.dmp
[task 2019-05-23T18:02:18.804Z] 18:02:18 INFO - REFTEST INFO | Saved app info as /builds/worker/workspace/build/blobber_upload_dir/37a6e1c4-40e7-2c33-8a1c-492691cd8c13.extra
[task 2019-05-23T18:02:18.807Z] 18:02:18 INFO - REFTEST PROCESS-CRASH | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html | application crashed [@ libc.so + 0x8c66a]
[task 2019-05-23T18:02:18.807Z] 18:02:18 INFO - Crash dump filename: /tmp/tmpRHOTix/37a6e1c4-40e7-2c33-8a1c-492691cd8c13.dmp
[task 2019-05-23T18:02:18.807Z] 18:02:18 INFO - Operating system: Android
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - 0.0.0 Linux 3.10.0+ #1 PREEMPT Thu Jan 5 00:46:30 UTC 2017 x86_64
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - CPU: amd64
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - family 6 model 2 stepping 3
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - 1 CPU
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - GPU: UNKNOWN
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - Crash reason: SIGABRT
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - Crash address: 0xd5c
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - Process uptime: not available
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - Thread 0 (crashed)
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - 0 libc.so + 0x8c66a
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - rax = 0xfffffffffffffffc rdx = 0x0000000000000010
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - rcx = 0xffffffffffffffff rbx = 0x00000000ffffffff
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - rsi = 0x00007fff36f14580 rdi = 0x000000000000001c
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - rbp = 0x00000000ffffffff rsp = 0x00007fff36f14508
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - r8 = 0x0000000000000000 r9 = 0x0000000000000008
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - r10 = 0x00000000ffffffff r11 = 0x0000000000000246
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - r12 = 0x00000000ffffffff r13 = 0x00007708fe5e3f40
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - r14 = 0x0000000000000000 r15 = 0x0000000000000000
[task 2019-05-23T18:02:18.808Z] 18:02:18 INFO - rip = 0x00007708ffc0866a
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - Found by: given as instruction pointer in context
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - 1 libc.so + 0x2a0e6
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - rsp = 0x00007fff36f14510 rip = 0x00007708ffba60e6
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - Found by: stack scanning
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - 2 libutils.so + 0x184e7
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - rsp = 0x00007fff36f14530 rip = 0x00007708ffc914e7
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - Found by: stack scanning
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - 3 libart.so + 0x14cd91
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - rsp = 0x00007fff36f14540 rip = 0x00007708fde58d91
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - Found by: stack scanning
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - 4 system@framework@boot-framework.art + 0x3d5350
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - rsp = 0x00007fff36f14558 rip = 0x0000000071675350
[task 2019-05-23T18:02:18.809Z] 18:02:18 INFO - Found by: stack scanning
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 3•6 years ago
|
||
The priority flag is not set for this bug.
:emilio, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 4•6 years ago
|
||
Do we know why there are no usable stacks on android? I'm not sure about whether you're the right person to ask Mike, feel free to forward as needed.
Comment 5•6 years ago
|
||
minidump_stackwalk can't do anything about system libraries, for which it doesn't have breakpad symbol files.
Comment hidden (Intermittent Failures Robot) |
Comment 7•6 years ago
|
||
The priority flag is not set for this bug.
:heycam, could you have a look please?
For more information, please visit auto_nag documentation.
![]() |
||
Comment 8•6 years ago
|
||
These are all on Android 7.0 (geckoview). Tests hang at the end of 1546255.html. This logcat seems typical:
06-08 13:00:32.290 2417 2432 I Gecko : REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html | 3167 / 3761 (84%)
06-08 13:00:32.290 2417 2432 I Gecko :
06-08 13:00:32.290 2417 2432 I Gecko : {"action":"log","time":1559995232290,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"START http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html"}
06-08 13:00:32.320 2477 2492 W Web Content: [JavaScript Warning: "The stylesheet data:;base64,QGN7IGJ9CkRBTVAsIEhUTUwgey was loaded as CSS even though its MIME type, “text/plainâ€, is not “text/cssâ€." {file: "http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html" line: 0}]
06-08 13:00:32.320 2477 2492 E Web Content: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html" line: 0}]
06-08 13:00:32.330 2417 2432 I Gecko :
06-08 13:00:32.330 2417 2432 I Gecko : {"action":"log","time":1559995232330,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] OnDocumentLoad triggering AfterOnLoadScripts"}
06-08 13:00:32.330 2477 2492 W Web Content: [JavaScript Warning: "Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user’s experience. For more help http://xhr.spec.whatwg.org/" {file: "http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html" line: 21}]
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"Initializing canvas snapshot"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] AfterOnLoadScripts belatedly entering WaitForTestEnd"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] WaitForTestEnd: Adding listeners"}
06-08 13:00:32.340 2477 2492 W Web Content: [JavaScript Warning: "Use of Mutation Events is deprecated. Use MutationObserver instead." {file: "resource://reftest/reftest-content.js" line: 784}]
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"Initializing canvas snapshot"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: dispatching MozReftestInvalidate"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_FOR_REFTEST_WAIT_REMOVAL"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_FOR_SPELL_CHECKS"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_FOR_APZ_FLUSH"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: APZ flush not required"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: STATE_WAITING_TO_FINISH"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: Doing sync flush to compositor"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] MakeProgress: Completed"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] RecordResult fired"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"RecordResult fired"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"test_status","time":1559995232340,"thread":null,"pid":null,"source":"reftest","test":"http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html","subtest":"(LOAD ONLY)","status":"PASS"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"test_end","time":1559995232340,"thread":null,"pid":null,"source":"reftest","test":"http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html","status":"OK"}
06-08 13:00:32.340 2417 2432 I Gecko :
06-08 13:00:32.340 2417 2432 I Gecko : {"action":"log","time":1559995232340,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"Loading a blank page"}
06-08 13:00:32.400 2417 2432 I Gecko :
06-08 13:00:32.400 2417 2432 I Gecko : {"action":"log","time":1559995232400,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] AfterPaintListener in data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E"}
06-08 13:00:32.400 2417 2432 I Gecko :
06-08 13:00:32.400 2417 2432 I Gecko : {"action":"log","time":1559995232400,"thread":null,"pid":null,"source":"reftest","level":"DEBUG","message":"[CONTENT] AfterPaintListener in data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E"}
06-08 13:07:12.090 2417 2423 I art : Thread[3,tid=2423,WaitingInMainSignalCatcherLoop,Thread*=0x73c2032b7000,peer=0x12c4dd30,"Signal Catcher"]: reacting to signal 3
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 11•6 years ago
|
||
There are 22 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-06-23&endday=2019-06-30&tree=trunk&bug=1553971
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=254128988&repo=autoland&lineNumber=14491
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 18•6 years ago
|
||
We had 102 failures in the past week, all on android-em-7-0-x86_64 opt.
Recent log failure:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=255173994&repo=autoland&lineNumber=14569
Comment 19•6 years ago
|
||
This looks to be caused of Bug 1556058. Joel, can we star these timeouts against that bug or should we have bugs for each test timing out?
Comment 20•6 years ago
|
||
the plan is to keep on the new emulator, so we should skip this crashtest on android&&!debug and move on. Every time we switch platforms or run on a new config or platform many tests change- the tests that we are running on the new emulators are running more tests and more reliable, this seems to be one of the few that slipped by in early testing.
Comment 21•6 years ago
•
|
||
Joel, I've come across more timeouts like this so my question was somewhat towards a general rule for them. Should we go ahead and skip tests that are timing out cause of 1556058?
eg: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=255204230&revision=f4081c5e99bf3f47f82f70b09edc7b94cc243841
https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=255025669&revision=b4f0e8b4b6faec847999027e2cff9d5227077d21
https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr68&selectedJob=255027398&revision=501f4667334f5ccaee37ca013d6d1a5923e076f6
https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&selectedJob=255029929&revision=353628fec415324ca6aa333ab6c47d447ecc128e
Comment 22•6 years ago
|
||
good question Cosmin, yes lets do that, but please indicate in bug 1556058 the tests or bug numbers, you could even have this bug as blocking bug 1556058 as a quick way to do it.
Comment 23•6 years ago
|
||
Thanks for the info Joel. Will let the other sheriffs know about this.
Comment 24•6 years ago
|
||
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Comment 27•6 years ago
|
||
Comment 28•6 years ago
|
||
Updated•6 years ago
|
Comment 29•6 years ago
|
||
bugherder |
Comment 30•6 years ago
|
||
How sure are we that this failure had anything to do with this test? Before the timeout, I see:
REFTEST TEST-END | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html
INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
INFO - org.mozilla.geckoview.test unexpectedly found running. Killing...
So this test has ended (and we've probably even navigated away from it) by the time we hit the problem.
It looks like GeckoView is hitting a shutdown hang, or something -- not a hang in this test. And the harness is (kinda-mistakenly) attributing the failure to this test because it happens to be the most recent test that was running, but it seems entirely likely that the shutdown hang was caused by something else.
In other words, I'm wondering whether this failure will now suddenly start being reported for another nearby test (whichever test ends up being the last one now that this test has been disabled)... Worth keeping an eye out for.
Comment 31•6 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #30)
How sure are we that this failure had anything to do with this test?
Comment 19 makes me even more curious about this... Sounds like this may be a problem with the emulator and disabling individual harness-"blamed" tests is unlikely to help?
see also e.g. bug 1471222 where we had a similar class of failures for ASAN testruns, where it seems to be a shutdown hang and a harness issue, though the failure message mistakenly implied that the unlucky last test was somehow involved (though it was not). It looks like we later improved the logging in bug 1475141 to make that clearer, but I'm not sure if that fix benefited the crashtest harness vs. just reftests... ahal, do you know?
Comment 32•6 years ago
•
|
||
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #20)
the plan is to keep on the new emulator, so we should skip this crashtest on android&&!debug and move on.
Joel, are you still sure about this^^ assessment, given that this failure looks like a shutdown hang & potentially has nothing to do with this test?
Perhaps a more-robust fix here is to revert to the old emulator for crashtests, as we did for xpcshell it seems? (bug 1556058 comment 19)
Comment 33•6 years ago
|
||
it is fine to revert to the old emulator, I am not clear on what issues were fixed for crashtests by switching to the new emulator. :gbrown knows a lot more about both emulators and will be back from PTO on Monday, lets get this on his radar.
Comment 34•6 years ago
•
|
||
I also wonder if this is the same issue as bug 1563997. The first comment there is for a debug android Mochitest run (specifically "Android 7.0 x86-64 opt Mochitests test-android-em-7.0-x86_64/opt-mochitest-gpu-e10s"), and it ends with the same pattern that I'm seeing here and quoted in comment 30. In particular, it's got:
[logging for last test finishing]
INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
INFO - org.mozilla.geckoview.test unexpectedly found running. Killing...
...and a 370-second harness timeout.
Assuming this is a more general harness/emulator problem and unrelated to this particular test, we should probably backout comment 27 (unless we're noticing less orange as a result of it, but I'm skeptical that we would be).
Comment 35•6 years ago
|
||
I think bug 1475141 should have also benefited crashtest. But if you're seeing evidence to the contrary then I wouldn't be too surprised if crashtest wasn't working.
Here's a reftest task where it's working fwiw:
https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=218864193&revision=75a0b52395c94a2fdc77642d5dfc6cf0e027ad58
Comment 36•6 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #35)
I think bug 1475141 should have also benefited crashtest. But if you're seeing evidence to the contrary then I wouldn't be too surprised if crashtest wasn't working.
I spun this off as bug 1565286 -- let's move this discussion over there. Thanks!
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 38•6 years ago
|
||
(In reply to Cosmin Sabou [:CosminS] from comment #19)
This looks to be caused of Bug 1556058. Joel, can we star these timeouts against that bug or should we have bugs for each test timing out?
We have a history of failures in this bug beginning May 22, while bug 1556058 did not land anything until June 28; there may have been an increase in failure frequency with the emulator change, but the failure before/after bug 1556058 looks to be the same.
![]() |
||
Comment 39•6 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #30)
In other words, I'm wondering whether this failure will now suddenly start being reported for another nearby test (whichever test ends up being the last one now that this test has been disabled)... Worth keeping an eye out for.
I had the same concern, but actually, I see no new failures since this test was disabled: There are no shutdown timeouts in Android 7.0 crashtests since comment 29.
![]() |
||
Comment 40•6 years ago
|
||
It is okay to revert to the old emulator for crashtests if that is desired, but in light of comment 38 and comment 39 I think the best course of action is to do nothing and/or actually investigate the cause of the hang.
Comment 41•6 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #34)
I also wonder if this is the same issue as bug 1563997. The first comment there is for a debug android Mochitest run (specifically "Android 7.0 x86-64 opt Mochitests test-android-em-7.0-x86_64/opt-mochitest-gpu-e10s"), and it ends with the same pattern that I'm seeing here and quoted in comment 30. In particular, it's got:
[logging for last test finishing]
INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
INFO - org.mozilla.geckoview.test unexpectedly found running. Killing......and a 370-second harness timeout.
Assuming this is a more general harness/emulator problem and unrelated to this particular test, we should probably backout comment 27 (unless we're noticing less orange as a result of it, but I'm skeptical that we would be).
Hi Daniel. There aren't any failures since June 11th according to OF so I presume that the backout you mentioned is not necessary. Please correct me if I'm wrong.
Comment hidden (Intermittent Failures Robot) |
Comment 43•6 years ago
•
|
||
(In reply to Andreea Pavel [:apavel] from comment #41)
Hi Daniel. There aren't any failures since June 11th according to OF
No Android crashtest "application timed out after 370 seconds with no output" issues at all?
I was guessing that this issue would just pop up as a similar failure attributed to another nearby testcase (whichever testcase became the new final test in this bucket), and it would spawn a new intermittent-failure bug. But if that hasn't happened, then I agree the hypothetical backout would not be necessary & the test-disabling should stick.
Comment 44•6 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #43)
(In reply to Andreea Pavel [:apavel] from comment #41)
Hi Daniel. There aren't any failures since June 11th according to OF
No Android crashtest "application timed out after 370 seconds with no output" issues at all?
I was guessing that this issue would just pop up as a similar failure attributed to another nearby testcase (whichever testcase became the new final test in this bucket), and it would spawn a new intermittent-failure bug. But if that hasn't happened, then I agree the hypothetical backout would not be necessary & the test-disabling should stick.
I was referring solely to this failure that it doesn't occur anymore.
Comment 45•6 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #44)
I was referring solely to this failure that it doesn't occur anymore.
Well, all we know from OrangeFactor + this bug is that this failure doesn't occur anymore with this test mentioned in the failure message, which isn't particularly surprising since the test has been disabled. But since this is an error that happens when the whole crashtest run is complete, I was worried that it'd keep failing & just not be reported here since it'd be attributed to some other test.
But, having said that: it looks like there haven't been any new crashtest "timed out after 370 seconds" bugs filed since Comment 27 (when this test was disabled). I did a bugzilla search for that "timed out" phrase in bugs filed since ~July 9th, and I didn't see any results that were for a crashtest run. So, I think this did indeed go away, which means the test-disabling was in fact a Good Thing.
So that means something about 1546255.html is problematic and can cause shutdown hangs, at least on Android.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 50•6 years ago
|
||
This started permafailing in this push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&fromchange=5909193688b5a750be9eeaf6ea70fb59476d5b7f&searchStr=android%2C4.3%2Capi16%2B%2Cdebug%2Creftests%2Cwithout%2Ce10s%2Ctest-android-em-4.3-arm7-api-16%2Fdebug-fennec-crashtest-1proc-9%2Cr-1proc%28c9%29
Daniel, can you take a look at it? I believe is the same issue as discussed above.
Comment 51•6 years ago
•
|
||
I agree, looks like the same issue. But now it's apparently affecting Android Debug as well..
We could paper-over it by adjusting the skip-if(Android&&!isDebugBuild)
annotation to remove the "!isDebug" restriction. But given that we've got a tight range (this particular autoland-->m-c merge), it'd be nice to see if we can figure out where/why this started perma-failing.
:malexandru, one request for you: is it possible for us to backfill this testrun (from comment 50) on Autoland and see if there's one push there where it got bad? (It doesn't look to me like we have any recent "Android 4.3" runs of any sort at all on Autoland right now, but I'm wondering if we can spawn some over the potentially-guilty range...)
Also, one observation (knocking down a possible explanation): I don't think this is just a case of "rebucketing", because the logs from m-c's last good push and first bad push both have this identical messaging/bucket-description:
REFTEST INFO | Running chunk 9 out of 10 chunks. tests 3039-3417/3795
REFTEST SUITE-START | Running 379 tests
REFTEST TEST-START | http://10.0.2.2:8854/tests/layout/style/crashtests/1245260-1.html
(Notably: exact same total number of tests, exact same range of tests (3039-3417), exact same first test name, and the problematic test does get run in both the last-good and the first-bad build (as the final test before we time out, in the latter case).
Comment 52•6 years ago
|
||
I did what you asked and here are the results:
Looks like Bug 1564594 is the culprit here.
Comment 53•6 years ago
|
||
Thanks, malexandru! For now: if this perma-orange is showing up and causing sheriff pain, I think we're fine to land a followup to your earlier patch to remove &&!isDebugBuild
(i.e. just skip this crashtest on Android for the moment).
emilio, it looks like you wrote the crashtest in question, 1546255.html - do you have any idea how the just-landed Bug 1564594 ( https://hg.mozilla.org/integration/autoland/rev/2fc4f931f041ebab1d02fe3433fdaba6f9a49fdf ) could've tickled it and make us reliably timeout after running it? I'm guessing maybe there's a latent race condition that Bug 1564594 caused us to "lose", but I don't really know how....
:twisniewski, same question to you... Do you know if any of your Bug 1564594 code is intended to be active during tests? Is it conceivable that something about that patch could've led to a deadlock/hang somehow?
For reference/convenience, here's one of the failing logs from comment 50:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=263027894&repo=mozilla-central&lineNumber=2058
Relevant lines:
[task 2019-08-22T23:12:19.196Z] 23:12:19 INFO - REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html | 156 / 379 (41%)
[task 2019-08-22T23:12:30.112Z] 23:12:30 INFO - REFTEST TEST-PASS | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html | (LOAD ONLY)
[task 2019-08-22T23:12:30.112Z] 23:12:30 INFO - REFTEST TEST-END | http://10.0.2.2:8854/tests/layout/style/crashtests/1546255.html
[task 2019-08-22T23:25:01.899Z] 23:25:01 INFO - wait for org.mozilla.fennec_aurora complete; top activity=org.mozilla.fennec_aurora
[task 2019-08-22T23:25:02.204Z] 23:25:02 INFO - org.mozilla.fennec_aurora unexpectedly found running. Killing...
(notice the 12m31s minute delay between REFTEST TEST-END and the next line where we detect that something went wrong & bail)
Updated•6 years ago
|
Comment 54•6 years ago
|
||
I don't expect the extension to run on crashtests, but that failure may highlight a deeper issue.
That crashtest is doing a sync AJAX request. So it's not totally unexpected, I suspect, the fact that it runs the extension code?
But there's no race condition there afaik, the crashtest just loads and finishes as you can see by the REFTEST-TEST-END
. So I think something in the landed extension code or related infrastructure gets confused by the sync XHR and we end up running something for a long amount of time.
Comment 55•6 years ago
|
||
Comment 56•6 years ago
|
||
Do you know if any of your Bug 1564594 code is intended to be active during tests? Is it conceivable that something about that patch could've led to a deadlock/hang somehow?
No, the changes in that patch should not have any impact on desktop Firefox at all, and should definitely have nothing to do with this specific test.
All the patch did was update a system addon (the webcompat addon), and in a way that shouldn't impact anything any more than the addon did before.
Comment 57•6 years ago
|
||
(In reply to Thomas Wisniewski [:twisniewski] from comment #56)
No, the changes in that patch should not have any impact on desktop Firefox at all, and should definitely have nothing to do with this specific test.
This is an android failure, fwiw.
Comment 58•6 years ago
|
||
Ah, I see. In that case, I'm afraid that I still don't see how the patch would affect this test, because it ultimately just overrides the user-agent string for navigations to some Google URLs, and blocks specific network requests to the Google serviceworker. I don't see the test making requests to either.
The only things outside of the system addon which the patch does (on Fennec) is to add two properties to the core telemetry ping, and also inform Fennec of an upcoming Fretboard experiment. Both of these are trivial changes to the relevant Java files.
The only thing I could see happening here is that there may be very slight changes to how long the webcompat system addon loads, or this specific patch triggers the addon to somehow fail loading (which shouldn't be the case, or I'd expect many other tests to trip as well). I don't see any references to the webcompat addon in the full log, either.
Comment 59•6 years ago
|
||
Comment 60•6 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 62•6 years ago
|
||
Note that bug 1564594 (the thing that made this perma-orange) has a pending beta-uplift request, so we may need to uplift the test-skipping-patch (comment 60) along with that in order to avoid perma-orange on Beta.
However, I'd very much like to understand what piece of bug 1564594 is causing this failure before we actually do the uplift... I believe Tom is investigating (thanks Tom!)
One initial piece of data: I tried removing bug 1564594's new ua-overrides.js
Google snippet in a Try run, and that was not sufficient to fix the perma-orange:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b9bcfe5f16234eb9437a4521b1c0f07f7551854
However, a full backout of bug 1564594 did fix the perma-orange:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=01802c227ec583ad8f265a1f8c7cc27ca4853e96
...so some other piece of bug 1564594 (besides the Google ua-override.js
snippet) seems to be responsible.
Comment 63•6 years ago
|
||
I ran a series of try-runs over the past couple of days to determine what the issue is:
- disabling just the Enhanced Search code-block which uses the new features in the patch DOES resolve the crash: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2d26df3ce2667d0ea422b34b4f0e83ffe61d618
- disabling just its "blocks" parameter does not resolve the crash: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1a40b72f1c5debc7560e62821a3d7d1aac15ccca
- not using a regex for the "matches" parameter does not resolve the crash: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6ec71019878e37d87441fb91a16c86720b6c2a8
- skipping permanent pref, telemetry and experiments bits does not resolve the crash: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6c6515c5e7ef9fa27edc95d48ab003e3cf772b3
- skipping "blocks" and not using a regex for "matches" DOES resolve the crash: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d88a1ce0a5c27677aa073089220aa388b1a5821f
- having the two webRequest listeners with an <all_urls> pattern seems to be the culprit: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7cc7e866c29f0d0c10ccd732775786ce1008be1e
- indeed, only adding one of the listeners (onBeforeSendHeaders), whether it actually does anything, is enough: https://treeherder.mozilla.org/#/jobs?repo=try&revision=49dcb54d98ab31987f309021d35b30be27dfa3ea
- same when adding adding the other listener (onBeforeRequest) without the handler doing anything: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de3e393ab26d9171845b459cd56a5db0f08c53cc
So the crash is somehow being caused by the addon registering at least one blocking webRequest onBeforeSendHeaders/onBeforeRequest listener with the match pattern ["*://*/*"]
This all-URLs listener does not have to actually do anything, merely having one of them is enough.
I'm not sure why this would cause the test to timeout though.
I don't see any strange timeouts or other behavior when I view the same HTML as the test-case on a local build on my own phone (though it cannot access SpecialPowers, of course).
So maybe it's specific to older Android versions somehow? (My phone is running Android 8).
Comment 64•6 years ago
|
||
Luca, do you have any thoughts on what we could try here?
mixedpuppy also suggested that I try using <all_urls>
instead of *://*/*
as my match-pattern, but that also fails: https://treeherder.mozilla.org/#/jobs?repo=try&revision=152349dfe7e8346c7e70557b94528a963f846a0e
Comment hidden (Intermittent Failures Robot) |
Comment 66•6 years ago
|
||
(Adding :robwu to the ni?)
Rob, maybe you have some clues here? Note that in bug 1543639 it doesn't seem as though it's just webRequest listeners that might trigger test failures, but they do seem to be the problem here.
Comment 67•6 years ago
|
||
(In reply to Thomas Wisniewski [:twisniewski] from comment #64)
Luca, do you have any thoughts on what we could try here?
mixedpuppy also suggested that I try using
<all_urls>
instead of*://*/*
as my match-pattern, but that also fails: https://treeherder.mozilla.org/#/jobs?repo=try&revision=152349dfe7e8346c7e70557b94528a963f846a0e
Apologies for the lag time, I have been trying to reproduce the issue locally in an android emulator, to make it easier for me to take a deeper look into it, it is faster than pushing to try... but the feedback loop while investigating a bug on the android emulator is still a bit slow (it feels like a slow motion scene from a movie :-))
Anyway, I'm pretty sure that using <all_urls>
instead of *://**/*
is not going to make any difference based on what it seems to be the underlying reason for the failure, from what I see the test case that is getting stuck is the only one in this group of crashtests to be using a synchronous XMLHttpRequest.
When that request is intercepted by the webRequest API internals and a blocking webRequest listener is already registered ([1]), the request is suspended but it is never resumed because the promise returned by fire.sync(...)
is never resolved.
It doesn't matter what the webRequest listener contains, it could be even an empty function and the issue would still be triggered.
On Fennec e10s is set to false and both the extension pages and the webpages are going to run in the same process (the main process), and so I think that a synchronous XMLHttpRequest may be preventing the "API:RunListener" event to be even received by the extension code until the synchronous XMLHttpRequest completes (or fails).
The reftests test harness is stuck because, while the synchronous XMLHttpRequest is still suspended, it is calling LoadURI(BLANK_URL_FOR_CLEARING);
and then waiting for the BLANK_URL_FOR_CLEARING data url to be loaded, which never happens ([2]).
It seems a legit bug on Fennec (I haven't tried yet, but I would assume that a similar issue may be triggered by doing a synchronous XMLHttpRequest from a webpage while webcompat or an adblocker extension are installed), but the same scenario shouldn't be a problem for GeckoView, where the webpages are loaded in remote browsers and the extension runs in the main process.
[1]: which happens if the webcompat extension has been already started and it has been able to register its webRequest listeners, and so when running the test file alone and for a single time it seems to be able to complete successfully because the webcompat extension has not yet subscribed its listeners and so it doesn't intercept and suspend the synchronous XMLHttpRequest, but if I run the test case more than once in a row (e.g. using ./mach test --repeat 1 ...
) it gets consistently stuck on the second run (when webcompat is already started and listening for webRequest API events) and fails by timing out.
[2]: loading the data url is having the side-effect of unblocking the XMLHttpRequest, but the data url is still not loaded and reftests onDocumentLoad
is never called and the test harness is stuck until the timeout.
Comment 68•6 years ago
•
|
||
Thanks Luca. Would you have any ideas I could try to work around this? Because I'm seeing other, similar timeout regressions from my patch which likewise make no sense to me, as they only seem to fail in the test framework (not when I run the tests on a nightly build with my patch applied, our of the framework). That makes me suspect that either the test framework uses sync XHRs internally somewhere, or maybe there's more to the story? (you can see all the regressions on parent bug 1564594). It's also worth noting that the system addon was already using webRequest listeners for other site-patches, so it's unclear to me why my patch would suddenly start causing these errors..
Comment 69•6 years ago
•
|
||
(going to answer in bug 1564594)
EDIT: See https://bugzilla.mozilla.org/show_bug.cgi?id=1564594#c40
Comment 70•6 years ago
|
||
(In reply to Thomas Wisniewski [:twisniewski] from comment #68)
Thanks Luca. Would you have any ideas I could try to work around this?
As we briefly discussed over IRC on Friday, using a more specific match pattern (more specific than "*://*/*"
and/or "<all_urls>"
) should be enough to prevent the issue triggered in this crashtests (because the webRequest listener subscribed by webcompat would not be matching the synchronous XMLHttpRquest that this crashtest is using and the crashtest shouldn't get stuck anymore).
I just gave a try to the updated webcompat version (which is not using the "*://*/*"
catch-all anymore) in my local dev environment and the crashtests is now consistently completing without getting stuck anymore (which confirms that the fix is preventing this failure, and very likely also the other regression tracked in the parent Bug 1564594).
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 73•3 years ago
|
||
No reports in 3 years, which I guess is unsurprising given that this was on Android (per comment 8) and we landed patches here to skip the test on Android (comment 59).
I guess we can leave this open to track the fact that this is presumably still an issue if we were to un-skip these tests.
Dropping severity to S4, given that there's nothing actively causing trouble at the moment & it's not clear there were any user-facing issues here.
Comment 74•9 months ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Updated•9 months ago
|
Description
•