Intermittent <test-name> | application crashed [@ mozilla::dom::ScreenOrientation::~ScreenOrientation()]
Categories
(Core :: DOM: Core & HTML, defect, P1)
Tracking
()
People
(Reporter: intermittent-bug-filer, Assigned: marcosc)
References
Details
(4 keywords, Whiteboard: [geckoview:p1])
Crash Data
Attachments
(1 file)
Filed by: asferro [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=255609876&repo=try
Full log: https://queue.taskcluster.net/v1/task/L13q2qAOQPa3qBeWBXU1UQ/runs/0/artifacts/public/logs/live_backing.log
Enabling x86_64 geckoview junit tests triggers this intermittently.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Does anyone familiar with HAL know what's going on here? This assertion happens randomly on our test suite. See this try here: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=255579830&revision=3195db587c1d3b46d315cf3df4ac1d3a146eacb6 (there are pretty much no changes with m-c, just enabling the tests)
Comment 2•5 years ago
|
||
Olli, do we really need to assert in ~ScreenOrientation that we've exited full screen [1] when shutting down?
This assertion failure is causing intermittent GeckoView junit test failures on x86_64. We can't disable individual tests because this assertion failure happens randomly in many tests.
Comment 3•5 years ago
|
||
I don't know. That assertion comes from bug 1131470.
I'm really not familiar with that code, would need to read it all.
It wasn't immediately clear whether the assertion is really something we must have, but not clear either whether we could remove it.
Perhaps baku remembers this better.
Comment 4•5 years ago
|
||
This assertion makes sense because, in theory, we want to release the fullscreen listener before the DTOR. If this doesn't happen, we have a bug somewhere. In general, reading the code quickly, it seems all too fragile:
- it seems that nothing guarantees Unlock to be called.
- the document could be hidden when LockOrientationTask() runs
- the pending promise could probably be overwritten if lock() is called more than once.
I would not remove that assertion. Instead I would improve the code in order to be sure to release the fullscreen listener correctly.
It seems that Marcos touched the code, recently. Maybe he has options and/or time to work on it.
Comment hidden (Intermittent Failures Robot) |
Comment 6•5 years ago
|
||
To add some context: this assertion is preventing us from turning on x86_64
Junit tests for GeckoView.
Assignee | ||
Comment 7•5 years ago
|
||
Apologies for the delay... just getting to this...
(In reply to Andrea Marchesini [:baku] from comment #4)
This assertion makes sense because, in theory, we want to release the fullscreen listener before the DTOR. If this doesn't happen, we have a bug somewhere. In general, reading the code quickly, it seems all too fragile:
- it seems that nothing guarantees Unlock to be called.
- the document could be hidden when LockOrientationTask() runs
- the pending promise could probably be overwritten if lock() is called more than once.
All valid points. For 3, mPromise should be guarding against that...
I would not remove that assertion. Instead I would improve the code in order to be sure to release the fullscreen listener correctly.
It seems that Marcos touched the code, recently. Maybe he has options and/or time to work on it.
I'll got through it today and see if I can tighten some things up and figure out why mFullscreenListener is still alive.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 8•5 years ago
|
||
(In reply to Marcos Caceres [:marcosc] from comment #7)
Apologies for the delay... just getting to this...
(In reply to Andrea Marchesini [:baku] from comment #4)
This assertion makes sense because, in theory, we want to release the fullscreen listener before the DTOR. If this doesn't happen, we have a bug somewhere. In general, reading the code quickly, it seems all too fragile:
- it seems that nothing guarantees Unlock to be called.
This is more than likely the culprit. UnlockDeviceOrientation only gets called iff (mOrientationLock == hal::eScreenOrientation_None)
, so I think that's leaving mFullscreenListener
alive.
- the document could be hidden when LockOrientationTask() runs
There is a check for this on line ~169, so I think this is ok:
if (mDocument->Hidden()) {
// Active orientation lock is not the document's orientation lock.
mPromise->MaybeResolveWithUndefined();
mDocument->SetOrientationPendingPromise(nullptr);
return NS_OK;
}
- the pending promise could probably be overwritten if lock() is called more than once.
AbortOrientationPromises() handles 3, actually.
I would not remove that assertion. Instead I would improve the code in order to be sure to release the fullscreen listener correctly.
Do we know what actual test is triggering this? I couldn't figure it out from the logs.
Comment 9•5 years ago
|
||
This happens in a few tests, e.g.:
org.mozilla.geckoview.test.MediaElementTest.webmTimeMedia
org.mozilla.geckoview.test.crash.CrashTest.crashContent
org.mozilla.geckoview.test.NavigationDelegateTest.onNewSession_supportNoOpener
You need to enable the debug x86_64 tests to run this in try: https://hg.mozilla.org/try/rev/134d603918a51b32c0a332514e52eed099232c7d
Unfortunately I've never been able to reproduce this locally.
Assignee | ||
Comment 10•5 years ago
|
||
(In reply to Agi | :agi | ⏰ EST | he/him from comment #9)
This happens in a few tests, e.g.:
org.mozilla.geckoview.test.MediaElementTest.webmTimeMedia
org.mozilla.geckoview.test.crash.CrashTest.crashContent
org.mozilla.geckoview.test.NavigationDelegateTest.onNewSession_supportNoOpener
I can't find these in Searchfox, or what they look like from a JS perspective? (I've zero Android or jUnit experience, sorry).
You need to enable the debug x86_64 tests to run this in try: https://hg.mozilla.org/try/rev/134d603918a51b32c0a332514e52eed099232c7d
Will add. Thank you!
Unfortunately I've never been able to reproduce this locally.
I have a hypothesis that .exitFullScreen()
and .unlock()
are causing a race condition.
I'll put up a patch, which includes enabling the x86_64
tests on Try... however, I've no idea how to get them to repeat to get a significant sample of runs to see if we can reproduce? Should I just get try to rebuild 20 times? Or is there some other way to get this to run continuously.
I'll try and see if it happens once... then can send it to rebuild 20 times. But if there is some way to get it run the tests 1000 times or so, that would be better.
Assignee | ||
Comment 11•5 years ago
|
||
Comment 12•5 years ago
|
||
(In reply to Marcos Caceres [:marcosc] from comment #10)
(In reply to Agi | :agi | ⏰ EST | he/him from comment #9)
This happens in a few tests, e.g.:
org.mozilla.geckoview.test.MediaElementTest.webmTimeMedia
org.mozilla.geckoview.test.crash.CrashTest.crashContent
org.mozilla.geckoview.test.NavigationDelegateTest.onNewSession_supportNoOpener
I can't find these in Searchfox, or what they look like from a JS perspective? (I've zero Android or jUnit experience, sorry).
You want to look for the last part of the name after the .
, that's the method name, e.g. https://searchfox.org/mozilla-central/rev/4436573fa3d5c4311822090e188504c10c916c6f/mobile/android/geckoview/src/androidTest/java/org/mozilla/geckoview/test/NavigationDelegateTest.kt#1104
You need to enable the debug x86_64 tests to run this in try: https://hg.mozilla.org/try/rev/134d603918a51b32c0a332514e52eed099232c7d
Will add. Thank you!
Unfortunately I've never been able to reproduce this locally.
I have a hypothesis that
.exitFullScreen()
and.unlock()
are causing a race condition.I'll put up a patch, which includes enabling the
x86_64
tests on Try... however, I've no idea how to get them to repeat to get a significant sample of runs to see if we can reproduce? Should I just get try to rebuild 20 times? Or is there some other way to get this to run continuously.I'll try and see if it happens once... then can send it to rebuild 20 times. But if there is some way to get it run the tests 1000 times or so, that would be better.
I don't know of a better way to do that :) if it doesn't happen after 20 runs we should be in the clear, it happens every 2 or 3 runs usually. Note: there are other crashes too, that's normal.
Assignee | ||
Comment 13•5 years ago
|
||
Try chooser doesn't let me set the number of rebuilds 👎... so trying this manually:
./mach try --rebuild 20 -b o -p android -u junit
Will see what comes out (I'm just taking a wild guess at the above)...
https://treeherder.mozilla.org/#/jobs?repo=try&revision=696e0ccee6d19dc5b8bd9bd762b144312cfefa83
Worst case, I can just configure try chooser manually 10 times or so.
Assignee | ||
Comment 14•5 years ago
|
||
Bah, hit it again:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e0acad8a6a729dc869eacd4f0e38ecf54758be5&selectedJob=257819261
Back to square 1.
Assignee | ||
Comment 15•5 years ago
|
||
I wonder if it would be too abhorrent to just explicitly call ::UnlockDeviceOrientation()
in the DTOR? The would give us the guarantee of unlock, it would assure that mFullscreenListener
gets nulled out.
I still think the fragility is coming from the ScreenOrientation::FullscreenEventListener::HandleEvent()
not called before ScreenOrientation is destroyed - but that requires going up one level and figuring out if it is that event not dispatching (if that's the cause). Having looked at the code for a few days now, I can't see what else it could be.
Trying that out...
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5e52103b0235c87d8688ff8bf93c8d4681f3eff
Comment 16•5 years ago
|
||
It could be a solution, yes. But would be nice to have a bug to cleanup the current implementation (P3...?)
Assignee | ||
Comment 17•5 years ago
|
||
We are unsure as to the actual cause of the crash, but we suspect that when fullscreen exists,
the even listener is not being called. That was leaving mFullscreenListener
alive when the DTOR gets called
hitting the MOZ_ASSERT(!mFullscreenListener)
.
We now always all call UnlockDeviceOrientation() in the DTOR to ensure that:
hal::UnlockScreenOrientation()
gets called.mFullscreenListener
gets null'ed- we
RemoveSystemEventListener()
for "fullscreenchange".
Comment 18•5 years ago
|
||
Pushed by mcaceres@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/adc36d5fd87a Intermittent <test-name> | application crashed [@ mozilla::dom::ScreenOrientation::~ScreenOrientation()] r=baku
Comment 19•5 years ago
|
||
bugherder |
Comment 20•5 years ago
|
||
Marcos, should we uplift your fix to GeckoView Beta and Fennec (ESR 68.1)? Does your fix prevent a real crash or just the test assertion failure?
Assignee | ||
Comment 21•5 years ago
|
||
I'm not aware of it crashing anything in practice... it seems to occasionally happen in our testing infrastructure. Other folks here might know if this is actually crashing for real on Android. I'd feel more comfortable not uplifting and letting this sit on nightly for a bit.
Agi, wdyt?
Assignee | ||
Comment 22•5 years ago
|
||
As requested by Baku in https://bugzilla.mozilla.org/show_bug.cgi?id=1564621#c16 I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1571985
Comment 23•5 years ago
|
||
Not aware of any real crash for this, just an assertion failure in debug builds.
Comment 24•5 years ago
|
||
(In reply to :Agi | ⏰ PST | he/him from comment #23)
Not aware of any real crash for this, just an assertion failure in debug builds.
In that case, I'm setting firefox69=wontfix and firefox-esr68=wontfix because we don't need to uplift for GV Beta or Fennec ESR.
Description
•