Lots of timeouts for wpt tests on Android Fission (TestRunner hit external timeout (this may indicate a hang))
Categories
(GeckoView :: General, task, P1)
Tracking
(firefox132 fixed)
Tracking | Status | |
---|---|---|
firefox132 | --- | fixed |
People
(Reporter: owlish, Assigned: whimboo)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fxdroid][group1])
When running WPT tests on Android Fission with both the isolate-everything and isolate-high-value strategies, there are multiple timeouts in every build (typically about 15, spread across various tests). On debug builds, however, there are close to 1000 of these timeouts [0][1].
There must be some fundamental problem here, and we need to figure out what it is.
[0] https://treeherder.mozilla.org/jobs?repo=try&revision=d8c38daaf1f0f2f3d0ac6b05e4313a99c49846ff&selectedTaskRun=NKrC4W4MRwuF80DWePXUyg.1
[1] https://treeherder.mozilla.org/jobs?repo=try&revision=0598e320d6838a999b4cf4f92ffc2a6873e7e775&selectedTaskRun=V6poAaaeQIaxEKt3vHx3Yw.0
Assignee | ||
Comment 1•3 months ago
|
||
Just to clarify, the failure that is happening here and causing these timeouts is:
TestRunner hit external timeout (this may indicate a hang)
This happens because, for the initial load of the test page, the webprogress listener's state change event is not sent with the STATE_START flag set, since a navigation to about:blank is already in progress when the test window is opened. As a result, the final state change event with STATE_STOP is not considered part of the current page load.
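To make the mechanism concrete, here is a minimal sketch of a listener that only accepts STATE_STOP after it has seen a matching STATE_START. It is a simplified model, not the actual Gecko or wptrunner code: the `PageLoadTracker` class and the `simulate` helper are hypothetical names, and the flag values are used purely for illustration.

```python
# Illustrative flag values; treat them as placeholders for the real
# webprogress listener state flags.
STATE_START = 0x00000001
STATE_STOP = 0x00000010


class PageLoadTracker:
    """Hypothetical listener that trusts STATE_STOP only after STATE_START."""

    def __init__(self):
        self.load_in_progress = False
        self.load_finished = False

    def on_state_change(self, flags):
        if flags & STATE_START:
            self.load_in_progress = True
        if flags & STATE_STOP:
            if self.load_in_progress:
                self.load_in_progress = False
                self.load_finished = True
            # Otherwise the stop is ignored: no matching start was observed.


def simulate(events):
    """Feed a sequence of state flags to a fresh tracker and report the result."""
    tracker = PageLoadTracker()
    for flags in events:
        tracker.on_state_change(flags)
    return tracker.load_finished


# Normal load: a start followed by a stop completes the load.
print(simulate([STATE_START, STATE_STOP]))
# Case described above: the start is never delivered for the test page,
# so the lone stop is dropped and the load never counts as finished.
print(simulate([STATE_STOP]))
```

In this model the second case never reports a finished load, which is the kind of state that would leave a test runner waiting until the external timeout fires.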
Nika tried a patch that sends out extra stop and start events, but that didn't seem to work because front-end code isn't expecting them. When testing with BFCache disabled, the problem seems to be gone:
https://treeherder.mozilla.org/jobs?repo=try&revision=6c6c717d073a3c4015781eb7f2827c1d79321654
So maybe it's an underlying issue with the BFCache implementation on Android?
Besides all that, I'll try to slightly modify wptrunner so that, for now, it loads the test page immediately when the test window is opened. There are also lots of failures, but I'll check whether those reveal issues that were hidden before.
https://treeherder.mozilla.org/jobs?repo=try&revision=ec38def2b71ad9e8fe1ac78663fa2a63c7bd4b98
Reporter | ||
Comment 2•3 months ago
|
||
Hey Makoto, any discoveries in your debugging so far? Are these similar to SHIP timeouts?
Assignee | ||
Comment 3•3 months ago
|
||
Oh, sorry. I thought that this bug was for the Fission SHIP timeouts. If that is not the case, please rename the bug's summary accordingly. At least from my previous work and try builds, they all looked the same.
Comment 4•2 months ago
|
||
(In reply to [:owlish] 🦉 PST from comment #2)
Hey Makoto, any discoveries in your debugging so far? Are these similar to SHIP timeouts?
I guess yes. Actually, even if we don't modify the isolation strategy, encoding/legacy-mb-korean/euc-kr/euckr-decode.html?11001-12000 etc. also time out. When the timeout occurs, the HTTP request isn't sent, and the previous document isn't unloaded. I guess it depends on cache/history.
Bug 1522790 switches to using the Marionette API for starting testharness, instead of using window.open. So I think that most (all?) of these timeouts will be fixed by it.
Reporter | ||
Comment 5•2 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #3)
Oh, sorry. I thought that this bug was for the Fission SHIP timeouts. If that is not the case, please rename the bug's summary accordingly. At least from my previous work and try builds, they all looked the same.
I mean, your edit to the summary was not wrong, SHIP is indeed enabled on Fission. It's just that SHIP can be enabled without Fission, and this bug is not about that :)
Assignee | ||
Comment 6•2 months ago
|
||
Now that bug 1522790 is fixed, can you please check the Fission and SHIP jobs again? They should now work and no longer time out due to this hang in the test runner. But it is bug 1891706 that we have to investigate right now.
Comment 7•1 month ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #6)
Now that bug 1522790 is fixed, can you please check the Fission and SHIP jobs again? They should now work and no longer time out due to this hang in the test runner. But it is bug 1891706 that we have to investigate right now.
Thank you, yes, as long as I run WPT with Fission+SHIP locally, there is no timeout. So I will update the *.ini metadata, and we may use sessionHistoryInParent in the metadata too.
Assignee | ||
Comment 8•1 month ago
|
||
(In reply to Makoto Kato [:m_kato] from comment #7)
Thank you, yes, as long as I run WPT with Fission+SHIP locally, there is no timeout. So I will update the *.ini metadata, and we may use sessionHistoryInParent in the metadata too.
That's great to hear! Please note that Olivia is already doing the metadata update via bug 1919837.
So I assume there is nothing more to do on this bug. Marking it fixed based on my work on bug 1522790.