Closed Bug 1741899 Opened 6 months ago Closed 3 months ago

Intermittent "Display already acquired" crash after GeckoSession was released from GeckoView

Categories

(GeckoView :: General, defect, P1)

Unspecified
All

Tracking

(firefox-esr91 unaffected, firefox96 wontfix, firefox97 wontfix, firefox98+ wontfix, firefox99+ fixed)

RESOLVED FIXED
99 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 + wontfix
firefox99 + fixed

People

(Reporter: sebastian, Assigned: agi)

References

Details

(Whiteboard: [geckoview:m97][geckoview:m98][geckoview:m99])

Attachments

(1 file)

When opening a custom tab in Fenix/Focus and the user selecting "Switch to browser" from the menu, we release the GeckoSession from GeckoView, switch to the browser activity and render the same GeckoSession on the GeckoView in that activity.

This works. But we have still been seeing "Display already acquired" crashes happening (due to a GeckoSession getting rendered by more than one GeckoView), that we were not able to reproduce.

Recently this crash started happening in Focus UI tests and allowed us to debug this further. We verified that we correctly released the GeckoSession before switching to the browser. And we noticed that we were not able to reproduce this issue with a debugger attached, or when waiting before switching to the browser, which may hint at some race condition happening. Then Mihai figured out that stopping the load using stopLoad() (Patch) prevents the crash from happening. So this may be a GeckoView bug, potentially a race?

How often does this seem to happen in real life? I can't find the signature in crash-stats?

Flags: needinfo?(s.kaspari)
Blocks: 1651096
Whiteboard: [geckoview:m97?]
Priority: -- → P1
Whiteboard: [geckoview:m97?] → [geckoview:m97]
Assignee: nobody → agi

I asked on matrix#treeherder whether there was a way to retry tasks many times to debug this bug and apparently there isn't. See Bug 1746426.

Whiteboard: [geckoview:m97] → [geckoview:m97][geckoview:m98]
Severity: -- → S3

This blocks an S1 bug (and basically our #1 top Fenix crash), so I'm not sure an S3 severity really fits.

Flags: needinfo?(etoop)

I can reproduce this crash reverting the change mentioned in Comment 0 in Focus. I'll look into it a little bit.

I need to look more into it, but from a first look it looks like the app is requesting the Display twice in a row causing the crash.

01-21 12:11:33.527 28364 28364 E sferrog : acquireDisplay
01-21 12:11:33.527 28364 28364 E sferrog : java.lang.RuntimeException
01-21 12:11:33.527 28364 28364 E sferrog : 	at org.mozilla.geckoview.GeckoSession.acquireDisplay(GeckoSession.java:2456)
01-21 12:11:33.527 28364 28364 E sferrog : 	at org.mozilla.geckoview.GeckoView.setSession(GeckoView.java:458)
01-21 12:11:33.527 28364 28364 E sferrog : 	at mozilla.components.browser.engine.gecko.GeckoEngineView.render(GeckoEngineView.kt:116)
01-21 12:11:33.527 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.renderTab(EngineViewPresenter.kt:89)
01-21 12:11:33.527 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.onTabToRender(EngineViewPresenter.kt:62)
01-21 12:11:33.527 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.access$onTabToRender(EngineViewPresenter.kt:24)
01-21 12:11:33.527 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$invokeSuspend$$inlined$collect$1.emit(Collect.kt:135)
01-21 12:11:33.527 28364 28364 E sferrog : 	at mozilla.components.support.ktx.kotlinx.coroutines.flow.FlowKt$ifAnyChanged$$inlined$filter$1$2.emit(Collect.kt:143)
01-21 12:11:33.527 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$invokeSuspend$$inlined$map$1$2.emit(Collect.kt:137)
01-21 12:11:33.527 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt.emitAllImpl$FlowKt__ChannelsKt(Channels.kt:62)
01-21 12:11:33.527 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt.access$emitAllImpl$FlowKt__ChannelsKt(Channels.kt:1)
01-21 12:11:33.527 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt$emitAllImpl$1.invokeSuspend(Unknown Source:14)
01-21 12:11:33.527 28364 28364 E sferrog : 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
01-21 12:11:33.527 28364 28364 E sferrog : 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
01-21 12:11:33.527 28364 28364 E sferrog : 	at android.os.Handler.handleCallback(Handler.java:938)
01-21 12:11:33.527 28364 28364 E sferrog : 	at android.os.Handler.dispatchMessage(Handler.java:99)
01-21 12:11:33.527 28364 28364 E sferrog : 	at android.os.Looper.loopOnce(Looper.java:201)
01-21 12:11:33.527 28364 28364 E sferrog : 	at android.os.Looper.loop(Looper.java:288)
01-21 12:11:33.527 28364 28364 E sferrog : 	at android.app.ActivityThread.main(ActivityThread.java:7798)
01-21 12:11:33.527 28364 28364 E sferrog : 	at java.lang.reflect.Method.invoke(Native Method)
01-21 12:11:33.527 28364 28364 E sferrog : 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
01-21 12:11:33.527 28364 28364 E sferrog : 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)
01-21 12:11:33.707 28364 28364 E sferrog : acquireDisplay
01-21 12:11:33.707 28364 28364 E sferrog : java.lang.RuntimeException
01-21 12:11:33.707 28364 28364 E sferrog : 	at org.mozilla.geckoview.GeckoSession.acquireDisplay(GeckoSession.java:2456)
01-21 12:11:33.707 28364 28364 E sferrog : 	at org.mozilla.geckoview.GeckoView.setSession(GeckoView.java:458)
01-21 12:11:33.707 28364 28364 E sferrog : 	at mozilla.components.browser.engine.gecko.GeckoEngineView.render(GeckoEngineView.kt:116)
01-21 12:11:33.707 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.renderTab(EngineViewPresenter.kt:89)
01-21 12:11:33.707 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.onTabToRender(EngineViewPresenter.kt:62)
01-21 12:11:33.707 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.access$onTabToRender(EngineViewPresenter.kt:24)
01-21 12:11:33.707 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$invokeSuspend$$inlined$collect$1.emit(Collect.kt:135)
01-21 12:11:33.707 28364 28364 E sferrog : 	at mozilla.components.support.ktx.kotlinx.coroutines.flow.FlowKt$ifAnyChanged$$inlined$filter$1$2.emit(Collect.kt:143)
01-21 12:11:33.707 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$invokeSuspend$$inlined$map$1$2.emit(Collect.kt:137)
01-21 12:11:33.707 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt.emitAllImpl$FlowKt__ChannelsKt(Channels.kt:62)
01-21 12:11:33.707 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt.access$emitAllImpl$FlowKt__ChannelsKt(Channels.kt:1)
01-21 12:11:33.707 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt$emitAllImpl$1.invokeSuspend(Unknown Source:14)
01-21 12:11:33.707 28364 28364 E sferrog : 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
01-21 12:11:33.707 28364 28364 E sferrog : 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
01-21 12:11:33.707 28364 28364 E sferrog : 	at android.os.Handler.handleCallback(Handler.java:938)
01-21 12:11:33.707 28364 28364 E sferrog : 	at android.os.Handler.dispatchMessage(Handler.java:99)
01-21 12:11:33.707 28364 28364 E sferrog : 	at android.os.Looper.loopOnce(Looper.java:201)
01-21 12:11:33.707 28364 28364 E sferrog : 	at android.os.Looper.loop(Looper.java:288)
01-21 12:11:33.707 28364 28364 E sferrog : 	at android.app.ActivityThread.main(ActivityThread.java:7798)
01-21 12:11:33.707 28364 28364 E sferrog : 	at java.lang.reflect.Method.invoke(Native Method)
01-21 12:11:33.707 28364 28364 E sferrog : 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
01-21 12:11:33.707 28364 28364 E sferrog : 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)
01-21 12:11:33.708 28364 28364 E sferrog : display already acquired
01-21 12:11:33.708 28364 28364 E sferrog : java.lang.RuntimeException
01-21 12:11:33.708 28364 28364 E sferrog : 	at org.mozilla.geckoview.GeckoSession.acquireDisplay(GeckoSession.java:2460)
01-21 12:11:33.708 28364 28364 E sferrog : 	at org.mozilla.geckoview.GeckoView.setSession(GeckoView.java:458)
01-21 12:11:33.708 28364 28364 E sferrog : 	at mozilla.components.browser.engine.gecko.GeckoEngineView.render(GeckoEngineView.kt:116)
01-21 12:11:33.708 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.renderTab(EngineViewPresenter.kt:89)
01-21 12:11:33.708 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.onTabToRender(EngineViewPresenter.kt:62)
01-21 12:11:33.708 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter.access$onTabToRender(EngineViewPresenter.kt:24)
01-21 12:11:33.708 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$invokeSuspend$$inlined$collect$1.emit(Collect.kt:135)
01-21 12:11:33.708 28364 28364 E sferrog : 	at mozilla.components.support.ktx.kotlinx.coroutines.flow.FlowKt$ifAnyChanged$$inlined$filter$1$2.emit(Collect.kt:143)
01-21 12:11:33.708 28364 28364 E sferrog : 	at mozilla.components.feature.session.engine.EngineViewPresenter$start$1$invokeSuspend$$inlined$map$1$2.emit(Collect.kt:137)
01-21 12:11:33.708 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt.emitAllImpl$FlowKt__ChannelsKt(Channels.kt:62)
01-21 12:11:33.708 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt.access$emitAllImpl$FlowKt__ChannelsKt(Channels.kt:1)
01-21 12:11:33.708 28364 28364 E sferrog : 	at kotlinx.coroutines.flow.FlowKt__ChannelsKt$emitAllImpl$1.invokeSuspend(Unknown Source:14)
01-21 12:11:33.708 28364 28364 E sferrog : 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
01-21 12:11:33.708 28364 28364 E sferrog : 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
01-21 12:11:33.708 28364 28364 E sferrog : 	at android.os.Handler.handleCallback(Handler.java:938)
01-21 12:11:33.708 28364 28364 E sferrog : 	at android.os.Handler.dispatchMessage(Handler.java:99)
01-21 12:11:33.708 28364 28364 E sferrog : 	at android.os.Looper.loopOnce(Looper.java:201)
01-21 12:11:33.708 28364 28364 E sferrog : 	at android.os.Looper.loop(Looper.java:288)
01-21 12:11:33.708 28364 28364 E sferrog : 	at android.app.ActivityThread.main(ActivityThread.java:7798)
01-21 12:11:33.708 28364 28364 E sferrog : 	at java.lang.reflect.Method.invoke(Native Method)
01-21 12:11:33.708 28364 28364 E sferrog : 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
01-21 12:11:33.708 28364 28364 E sferrog : 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)

OK So I'm more confident now that this is an AC bug. It looks like something is going sideways when a FirstContentfulPaint fires just at the right time (hence why stopping the load "fixes" the problem)

01-21 17:01:33.127 15930 15930 E sferrog : release
01-21 17:01:33.130 15930 15930 E sferrog : releaseDisplay
01-21 17:01:33.130 15930 15930 E sferrog : migrate
01-21 17:01:33.174 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=false
01-21 17:01:33.175 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=false
01-21 17:01:33.308 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=true
01-21 17:01:33.308 15930 15930 E sferrog : onTabToRender(fb9db19d-20e6-4d5a-87a7-8b34f016ae06)
01-21 17:01:33.309 15930 15930 E sferrog : renderTab(View: 170793867, Session: 109882482)
01-21 17:01:33.310 15930 15930 E sferrog : acquireDisplay
01-21 17:01:33.315 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=true
01-21 17:01:33.477 15930 15930 E sferrog : start
01-21 17:01:33.483 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=true
01-21 17:01:33.483 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=true
01-21 17:01:33.527 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=true
01-21 17:01:33.527 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=true
01-21 17:01:33.541 15930 15930 E sferrog : ifAnyChangedtab.id=fb9db19d-20e6-4d5a-87a7-8b34f016ae06 engineSession=mozilla.components.browser.engine.gecko.GeckoEngineSession@68cac72 crashed=false fCp=true
01-21 17:01:33.542 15930 15930 E sferrog : onTabToRender(fb9db19d-20e6-4d5a-87a7-8b34f016ae06)
01-21 17:01:33.544 15930 15930 E sferrog : renderTab(View: 60118107, Session: 109882482)
01-21 17:01:33.545 15930 15930 E sferrog : acquireDisplay
01-21 17:01:33.545 15930 15930 E sferrog : display already acquired

Current theory:

OK more obscure logs:

01-21 17:50:59.328 22687 22687 E sferrog : [95686291] presenter.start view=174529952 947a2e95-bb6b-4dfe-b14b-addce3776996
01-21 17:50:59.336 22687 22687 E sferrog : [152987644] presenter.start view=174529952 947a2e95-bb6b-4dfe-b14b-addce3776996
...
01-21 17:51:00.464 22687 22687 E sferrog : [95686291] presenter.stop view=174529952 947a2e95-bb6b-4dfe-b14b-addce3776996
...
01-21 17:51:00.636 22687 22687 E sferrog : renderTab(view=174529952, session=137883466)
01-21 17:51:00.776 22687 22687 E sferrog : renderTab(view=193951177, session=137883466)
...
01-21 17:51:00.777 22687 22687 E sferrog : display already acquired

one problematic thing that is happening is that we have multiple EngineViewPresenter per GeckoView instance. So even when one of them is stopped, the other one is still able to receive messages and reattach the session that was previously detached, causing two calls to renderTab with different views on it.

OK So updated current theory, what follows is my understanding (warning: things might be incorrect):

ViewBoundFeatureWrapper is used by AC to manage the lifetime of objects connected to a View. In particular, it's used to start and stop the SessionFeature which in turns will (indirectly) setSession and release the GeckoView instance associated to the feature.

By design, GeckoView can only have one session set at any given time, so it's crucial that only one ViewBoundFeatureWrapper "owns" a GeckoView at any time. Failure to do so, i.e. calling setSession with two different sessions or release twice in a row, will cause the crash in this bug.

Focus (and maybe Fenix too?), however, creates multiple ViewBoundFeatureWrapper instances for the same GeckoView when opening a custom tab, here and here.

(1) Note now, that EngineViewPresenter will call renderTab every time any of the following properties change: the tab Id, the session associated to the tab, the "crashed" propery and the firstContentfulPaint property. renderTab will in turn call setSession on the GeckoView instance every time any of the above properties are updated, even if the property does not change which session is being displayed (e.g. in the case of the firstContentfulPaint event).

The bug is now possible with the following chain of events:

  • User creates a custom tab, this creates two separate ViewBoundFeatureWrapper for the GeckoView in the custom tab
  • User taps on the "Open in Focus" button, while the page is loading
  • While migrating the tab to the main browser, Focus calls release on one of the ViewBoundFeatureWrapper instances to release the Session from the GeckoView instance, but crucially, not on the other ViewBoundFeatureWrapper instance.
  • While the tab is being migrated, a FirstContenfulPaint even fires. The first ViewBoundFeatureWrapper does not respond, as it was correctly stopped in the step above, but the other one does, calling setSession on the Custom Tab view as explained in (1).
  • The tab finished migrating to the browser, which calls setSession again, this time on the main browser View, triggering the crash.

There are other combinations of events that trigger this bug in this area which I'm not going to go into since they involve the same fix.

The two problems that I see here are:

  • We create multiple ViewBoundFeatureWrapper for the same GeckoView. This is always going to end up badly, we should come up with a way to enforce that this never happens.
  • We call setSession regardless of which event is causing ifAnyChanged to trigger in EngineViewPresenter. I think that's wrong and that could also cause problems, we should only do it if a relevant change in the session happened.

(In reply to Agi Sferro | :agi | ni? for questions | ⏰ PST | he/him from comment #9)

OK more obscure logs:

01-21 17:50:59.328 22687 22687 E sferrog : [95686291] presenter.start view=174529952 947a2e95-bb6b-4dfe-b14b-addce3776996
01-21 17:50:59.336 22687 22687 E sferrog : [152987644] presenter.start view=174529952 947a2e95-bb6b-4dfe-b14b-addce3776996

one problematic thing that is happening is that we have multiple `EngineViewPresenter` per `GeckoView` instance. So even when one of them is stopped, the other one is still able to receive messages and reattach the session that was previously detached, causing two calls to `renderTab` with different `view`s on it.

Discussed this on Matrix: I think what you're seeing here is a Focus bug. There shouldn't be two SessionFeature's working at the same time. At most, they should have sequential lifetimes so they do not start together (start -> stop -> start -> stop vs start -> start -> stop -> stop).

I don't expect we see this bug in Fenix because we have ExternalAppBrowserFragment which is used for Custom Tabs that inherits the SessionFeature and does not need to make it's own in the activity.

Asked Amedyne on Slack to find an owner to fix the Focus bug outlined in Comment 10.

Flags: needinfo?(amoya)

@mcarare is looking into this.

Flags: needinfo?(amoya)
Flags: needinfo?(etoop)
Whiteboard: [geckoview:m97][geckoview:m98] → [geckoview:m97][geckoview:m98][geckoview:m99]

Needs further investigation. Agi should talk to Christian about next steps.

Flags: needinfo?(agi)

Historically, we wanted apps to be deliberate about setting/unsetting sessions
on a GeckoView instace because we were worried that they might leak a
GeckoSession (which would leak a tab running in Gecko), so we expected apps to
call view.releaseSession and collect the returned session.

We also expected apps's code to be more straightforward and clearly have a
release/set session pair that could be verifiably happening in order.

All of the above assumptions are broken in AC, where the code for
releaseSession and setSession is driven by a reactive data store.

The state management piece is also so hard for AC that they call
releaseSession conservatively in a lot of places and never actually look at
the return value.

To more align GeckoView with AC's architecture this patch moves into the
direction where apps only setSession and never release.

GeckoView will release the session from the old GeckoView when
appropriate.

The only problem with the above is that we might get into a state
where the app thinks a GeckoView is displaying a session while really it isn't
because the session was set to a different GeckoView.

To detect the above case we introduce a new flag called mIsSessionPoisoned
which is set whenever the GeckoView instance used to hold a reference to a
session which has since gotten attached to a different window. If the app tries
to display such a GeckoView, we will throw an exception in onAttachedToWindow.

After this patch, apps won't need to manage release/set manually and if
they somehow end up in a broken state (e.g. because the app has a bug) we still
correctly crash.

This avoids crashing if the release/set pairs are out of order (e.g.
because onAttachedToWindow, onDetachFromWindow arrive out of order) which
is our leading theory for the crash in the bug attached to this patch.

We're not sure that this patch is going to fix the crashes.

Flags: needinfo?(agi)
Keywords: leave-open
Pushed by asferro@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a7a1766b9d39
Detach session from GeckoView when setting it to other instance. r=calu

We haven't seen any instances of bug 1651096 so far on Nightly since this patch landed there :-)

Still no reports, so we can close this for now.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → 99 Branch

Given the volume of bug 1651096, I think this is worth considering for backport to 98 once y'all are feeling confident in the lack of side effects from this change. Please nominate for release approval when you're ready if you agree.

Flags: needinfo?(agi)

Discussed with csadilek, because the crash has existed for so long, we're OK with waiting a few days to see how the patch behaves in Beta. We're gonna re-evaluate then whether we want to uplift or not.

Flags: needinfo?(agi)

Mobile team wants to let this ride 99 to release.

You need to log in before you can comment on or make changes to this bug.