Closed Bug 1903929 Opened 5 months ago Closed 1 month ago

Lots of timeouts for wpt tests on Android Fission (TestRunner hit external timeout (this may indicate a hang))

Categories

(GeckoView :: General, task, P1)

All
Android
task

Tracking

(firefox132 fixed)

RESOLVED FIXED
132 Branch
Tracking Status
firefox132 --- fixed

People

(Reporter: owlish, Assigned: whimboo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxdroid][group1])

When running WPT tests on Android Fission with both the isolate-everything and isolate-high-value strategies, there are multiple timeouts in every build (typically about 15, spread across various tests). On debug builds, however, there are close to 1000 of these timeouts [0][1].

There must be some fundamental problem going on; we need to figure out what it is.

[0] https://treeherder.mozilla.org/jobs?repo=try&revision=d8c38daaf1f0f2f3d0ac6b05e4313a99c49846ff&selectedTaskRun=NKrC4W4MRwuF80DWePXUyg.1
[1] https://treeherder.mozilla.org/jobs?repo=try&revision=0598e320d6838a999b4cf4f92ffc2a6873e7e775&selectedTaskRun=V6poAaaeQIaxEKt3vHx3Yw.0

Severity: -- → N/A
Type: defect → task
Priority: -- → P1
Whiteboard: [fxdroid][group1]
Assignee: nobody → bugzeeeeee

Just to clarify, the failure that is happening here and causing these timeouts is:

TestRunner hit external timeout (this may indicate a hang)

This is because, for the initial load of the test page, the webprogress listener's state change event is not sent with the STATE_START flag set: a navigation to about:blank is already ongoing when the test window gets opened. As a result, the final state change event with STATE_STOP is not considered to belong to the current page load.
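To illustrate why a missing STATE_START leads to a hang, here is a minimal sketch in Python. It is not the actual wptrunner or Gecko code: the `LoadTracker` class and the `run` helper are hypothetical, and the flag values are only illustrative stand-ins for the nsIWebProgressListener constants of the same name. It assumes a listener that accepts a STATE_STOP only after it has seen a matching STATE_START.

```python
# Illustrative flag values; the names mirror nsIWebProgressListener,
# but this is a standalone model, not real Gecko code.
STATE_START = 0x1
STATE_STOP = 0x10


class LoadTracker:
    """Hypothetical listener that only trusts a STOP it saw a START for."""

    def __init__(self):
        self.seen_start = False
        self.load_finished = False

    def on_state_change(self, flags):
        if flags & STATE_START:
            self.seen_start = True
        if (flags & STATE_STOP) and self.seen_start:
            # START and STOP both observed: the load counts as complete.
            self.load_finished = True
            self.seen_start = False


def run(events):
    """Feed a sequence of state flags to a fresh tracker."""
    tracker = LoadTracker()
    for flags in events:
        tracker.on_state_change(flags)
    return tracker.load_finished


# Normal load: START followed by STOP is recognized as finished.
normal = run([STATE_START, STATE_STOP])

# Case described above: the START for the test page is never delivered,
# so the STOP is ignored and the caller would wait until an external timeout.
missing_start = run([STATE_STOP])
```

In this model `normal` ends up true and `missing_start` stays false, which corresponds to the test runner never seeing the page load complete.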

Nika tried a patch which sends out extra stop and start events, but that didn't seem to work because front-end code isn't expecting that. When testing with BFCache disabled, the problem seems to be gone:

https://treeherder.mozilla.org/jobs?repo=try&revision=6c6c717d073a3c4015781eb7f2827c1d79321654

So maybe it's an underlying issue with the BFCache implementation on Android?

Besides all that, for now I'll try to slightly modify wptrunner so that it loads the test page immediately when the test window gets opened. There are also lots of failures, but I'll check whether those reveal issues that were hidden before.

https://treeherder.mozilla.org/jobs?repo=try&revision=ec38def2b71ad9e8fe1ac78663fa2a63c7bd4b98

Summary: Multiple timeouts in W(wpt) suite on Android Fission → Lots of timeouts for wpt tests on Android Fission with SHiP enabled (TestRunner hit external timeout (this may indicate a hang))
Depends on: 1761634
Depends on: 1522790
Assignee: bugzeeeeee → m_kato

Hey Makoto, any discoveries in your debugging so far? Are these similar to SHIP timeouts?

Flags: needinfo?(m_kato)

Oh sorry. I thought that this bug was for the Fission SHIP timeouts. If that is not the case, please rename the bug's summary accordingly. At least from my previous work and try builds, they all looked the same.

(In reply to [:owlish] 🦉 PST from comment #2)

Hey Makoto, any discoveries in your debugging so far? Are these similar to SHIP timeouts?

I guess yes. Actually, even if we don't modify the isolation strategy, tests such as encoding/legacy-mb-korean/euc-kr/euckr-decode.html?11001-12000 also time out. When the timeout occurs, the HTTP request isn't sent and the previous document isn't unloaded. I guess it depends on cache/history.

Bug 1522790 switches to the Marionette API for starting the testharness, instead of using window.open. So I think that most (all?) of these timeouts will be fixed by it.

Flags: needinfo?(m_kato)
Summary: Lots of timeouts for wpt tests on Android Fission with SHiP enabled (TestRunner hit external timeout (this may indicate a hang)) → Lots of timeouts for wpt tests on Android Fission (TestRunner hit external timeout (this may indicate a hang))

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #3)

Oh sorry. I thought that this bug was for the Fission SHIP timeouts. If that is not the case, please rename the bug's summary accordingly. At least from my previous work and try builds, they all looked the same.

I mean, your edit to the summary was not wrong; SHIP is indeed enabled on Fission. It's just that SHIP can be enabled without Fission, and this bug is not about that :)

No longer depends on: 1761634

Now that bug 1522790 is fixed, can you please check the Fission and SHIP jobs again? They should now work and no longer time out due to this hang in the testrunner. But it is bug 1891706 that we have to investigate right now.

Flags: needinfo?(m_kato)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #6)

Now that bug 1522790 is fixed, can you please check the Fission and SHIP jobs again? They should now work and no longer time out due to this hang in the testrunner. But it is bug 1891706 that we have to investigate right now.

Thank you. Yes, when I run WPT with Fission+SHIP locally, there are no timeouts. So I will update the *.ini metadata, and we may use sessionHistoryInParent in the metadata too.

Flags: needinfo?(m_kato)

(In reply to Makoto Kato [:m_kato] from comment #7)

Thank you. Yes, when I run WPT with Fission+SHIP locally, there are no timeouts. So I will update the *.ini metadata, and we may use sessionHistoryInParent in the metadata too.

That's great to hear! Please note that Olivia is already doing the metadata update via bug 1919837.

So I assume there is nothing more to do on this bug. Marking it fixed based on my work on bug 1522790.

Assignee: m_kato → hskupin
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
See Also: → 1919837
Target Milestone: --- → 132 Branch