[geckoview] Fix content process shutdown

Status: NEW
Priority / severity: P2, normal
Opened 2 years ago, updated 5 months ago

People: Reporter: esawin, Assigned: esawin

Tracking: Trunk, All, Android
Firefox Tracking Flags: Not tracked

Attachments: 5

(Assignee)

Description

2 years ago
GeckoView [e10s] currently doesn't shut down the content process properly.

STR 1
1. Open geckoview_example app.
2. Hit the Android back button.
3. Re-open geckoview_example app.

Expected: The app opens and shows the default page.
Actual: Crash (failed IsCurrent() check) or hang.

STR 2
1. Open Focus app.
2. Search for something (or open a specific URL).
3. Delete the session (using button in bottom right corner).
4. Search for something again.

Expected: The search results are displayed.
Actual: Crash or hang.
(Assignee)

Updated

2 years ago
Blocks: 1322573
(Assignee)

Comment 1

2 years ago
Created attachment 8851606 [details] [diff] [review]
0001-Bug-1350924-1.0-Stop-and-kill-the-content-process-wh.patch

Unbind and kill the content service process when destructing the content process host.

Alternatively, we could just _exit(0) in ~ContentChild(), but this seemed like a more explicit and symmetric solution, given that we also launch the content process in the content process host.
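
To illustrate the symmetry argument above, here is a minimal sketch in which the same object that launches the child also unbinds and kills it on destruction. `FakeContentService` and `FakeContentProcessHost` are hypothetical stand-ins, not the classes touched by the patch, and the service only records calls instead of talking to Android.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Android content service connection.
// It only records calls so that the teardown order can be inspected.
struct FakeContentService {
  std::vector<std::string> log;
  bool bound = false;
  bool alive = false;

  void BindAndStart() {
    bound = true;
    alive = true;
    log.push_back("start");
  }
  void Unbind() {
    if (bound) {
      bound = false;
      log.push_back("unbind");
    }
  }
  void Kill() {
    if (alive) {
      alive = false;
      log.push_back("kill");
    }
  }
};

// Hypothetical host: it launches the child, so it also tears it down.
class FakeContentProcessHost {
 public:
  explicit FakeContentProcessHost(FakeContentService& service)
      : mService(service) {}

  void Launch() { mService.BindAndStart(); }

  // Unbind first, then kill, mirroring the description above.
  ~FakeContentProcessHost() {
    mService.Unbind();
    mService.Kill();
  }

 private:
  FakeContentService& mService;
};
```

Letting a `FakeContentProcessHost` go out of scope after `Launch()` leaves the service unbound and not alive, with the log reading start, unbind, kill.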
(Assignee)

Comment 2

2 years ago
Created attachment 8851607 [details] [diff] [review]
0002-Bug-1350924-2.0-Abort-content-process-launch-when-se.patch

Don't try to shut down and restart the content process during start if it's still around. Instead, just abort the start and return.

We will need something more sophisticated here, should we want to support more than one content process.
(Assignee)

Comment 3

2 years ago
Created attachment 8851608 [details] [diff] [review]
0003-Bug-1350924-3.0-Refactor-content-process-service-han.patch

Refactor the content process handling and somewhat simplify the waiting procedure.
(Assignee)

Comment 4

2 years ago
Created attachment 8851618 [details] [diff] [review]
0004-Bug-1350924-4.0-Spin-the-Gecko-thread-event-loop-to-.patch

During the content process startup, we may trigger a deadlock between the Gecko thread and the UI thread.

1. [Gecko thread] Blocked waiting for the content process to launch in GeckoChildProcessHost::LaunchAndWaitForProcessHandle.
2. [UI thread] Blocked in createCompositor dispatch in LayerView.updateCompositor.

With this patch we spin the Gecko thread event loop to unblock the dispatching from the UI thread.

Eventually, we need async compositor creation, which should prevent the deadlock.
(Assignee)

Comment 5

2 years ago
Created attachment 8851983 [details] [diff] [review]
0005-Bug-1350924-5.0-Only-delete-textures-from-a-valid-co.patch

Prevent calling GL functions on a stale context when clearing the texture pool during the first call to TexturePoolOGL::Fill on a new context.
Attachment #8851983 - Flags: review?(snorp)
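
One possible reading of the fix, as a sketch only: the pool remembers which context created its textures and, when `Fill` sees a different context, forgets the old handles without issuing GL calls for them. `FakeGL` and `FakeTexturePool` are hypothetical; the real `TexturePoolOGL` may track context validity differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical GL context that records which textures were deleted on it.
struct FakeGL {
  unsigned next = 1;
  std::vector<unsigned> deleted;

  unsigned CreateTexture() { return next++; }
  void DeleteTexture(unsigned texture) { deleted.push_back(texture); }
};

// Hypothetical pool that never calls into a context it no longer owns.
class FakeTexturePool {
 public:
  void Fill(FakeGL* gl, std::size_t count) {
    if (mGL != gl) {
      // The context changed: the old handles belong to a stale context,
      // so drop them without calling any GL function.
      mTextures.clear();
      mGL = gl;
    }
    while (mTextures.size() < count) {
      mTextures.push_back(mGL->CreateTexture());
    }
  }

  // Regular cleanup: the stored context is the one that made the textures.
  void Clear() {
    if (mGL) {
      for (unsigned texture : mTextures) {
        mGL->DeleteTexture(texture);
      }
    }
    mTextures.clear();
  }

  std::size_t Size() const { return mTextures.size(); }

 private:
  FakeGL* mGL = nullptr;
  std::vector<unsigned> mTextures;
};
```

Filling the pool on one context and then on another deletes nothing on either; only an explicit `Clear()` deletes textures, and only on the current context.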
(Assignee)

Comment 6

2 years ago
Comment on attachment 8851618 [details] [diff] [review]
0004-Bug-1350924-4.0-Spin-the-Gecko-thread-event-loop-to-.patch

This fixes the deadlock in the geckoview_example app, but causes intermittent startup hangs in Focus (haven't debugged yet).

Do you see an issue with the implementation?
Attachment #8851618 - Flags: feedback?(nchen)
(Assignee)

Comment 7

2 years ago
Comment on attachment 8851608 [details] [diff] [review]
0003-Bug-1350924-3.0-Refactor-content-process-service-han.patch

Refactored waitForChild into the more explicit waitForConnect and waitForDisconnect to avoid additional wait helper members and functions. waitForDisconnect is not used yet, so I left it out for now.

Moved content process killing off the UI thread in case the UI thread is blocked.
Attachment #8851608 - Flags: review?(rbarker)
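
The split into two explicit waits could look roughly like the following sketch. It is written in C++ for consistency with the reviewed diff, whereas the refactored code itself lives elsewhere; `FakeServiceConnection`, its callbacks, and the timeout parameter are assumptions for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical connection state with one explicit wait per direction.
class FakeServiceConnection {
 public:
  void OnConnected() { SetConnected(true); }
  void OnDisconnected() { SetConnected(false); }

  // Returns true if the connection was established within the timeout.
  bool WaitForConnect(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mMutex);
    return mCond.wait_for(lock, timeout, [this] { return mConnected; });
  }

  // Returns true if the connection was cleared within the timeout.
  bool WaitForDisconnect(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mMutex);
    return mCond.wait_for(lock, timeout, [this] { return !mConnected; });
  }

 private:
  void SetConnected(bool connected) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mConnected = connected;
    }
    mCond.notify_all();
  }

  std::mutex mMutex;
  std::condition_variable mCond;
  bool mConnected = false;
};
```

Each wait states its own condition, so no shared helper or extra "what am I waiting for" member is needed.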
(Assignee)

Comment 8

2 years ago
Comment on attachment 8851607 [details] [diff] [review]
0002-Bug-1350924-2.0-Abort-content-process-launch-when-se.patch

Return 0 if the connection hasn't been properly cleared, which signals failure to start the content process.
Attachment #8851607 - Flags: review?(snorp)
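
The return convention described in this comment can be sketched as follows, assuming a single content process. `FakeLauncher`, its members, and the concrete pid values are hypothetical; only the "0 means the start was aborted" convention comes from the comment.

```cpp
#include <cassert>

// Hypothetical launcher that supports at most one content process.
class FakeLauncher {
 public:
  // Returns a nonzero pid on success, or 0 if the previous connection
  // has not been cleared yet. No shutdown or restart is attempted here.
  int Start() {
    if (mConnected) {
      return 0;
    }
    mConnected = true;
    return ++mLastPid;
  }

  // Called once the previous content process is really gone.
  void OnDisconnected() { mConnected = false; }

 private:
  bool mConnected = false;
  int mLastPid = 100;  // arbitrary starting value for the sketch
};
```

A second `Start()` while the first process is still connected returns 0; after `OnDisconnected()` the next `Start()` succeeds again.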
(Assignee)

Comment 9

2 years ago
Comment on attachment 8851606 [details] [diff] [review]
0001-Bug-1350924-1.0-Stop-and-kill-the-content-process-wh.patch

Feel free to redirect r? since I'm not sure who is best suited for this.

Stop content process on GeckoChildHost destruction. This seems to work reliably for the geckoview_example app or anytime we close the window.

We need to decide how to handle the case where GeckoView's instance state is saved, which prevents the closing of its window on detaching.
Attachment #8851606 - Flags: review?(snorp)
Attachment #8851983 - Flags: review?(snorp) → review+
Comment on attachment 8851606 [details] [diff] [review]
0001-Bug-1350924-1.0-Stop-and-kill-the-content-process-wh.patch

Review of attachment 8851606 [details] [diff] [review]:
-----------------------------------------------------------------

Looks ok, but I'd rather have rbarker review
Attachment #8851606 - Flags: review?(snorp) → review?(rbarker)
Comment on attachment 8851607 [details] [diff] [review]
0002-Bug-1350924-2.0-Abort-content-process-launch-when-se.patch

Review of attachment 8851607 [details] [diff] [review]:
-----------------------------------------------------------------

-> rbarker
Attachment #8851607 - Flags: review?(snorp) → review?(rbarker)
Comment on attachment 8851618 [details] [diff] [review]
0004-Bug-1350924-4.0-Spin-the-Gecko-thread-event-loop-to-.patch

Review of attachment 8851618 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/GeckoChildProcessHost.cpp
@@ +475,5 @@
>    while (mProcessState < PROCESS_CREATED) {
> +    lock.Wait(100);
> +    if (mProcessState < PROCESS_CREATED) {
> +      MonitorAutoUnlock unlock(mMonitor);
> +      NS_ProcessNextEvent();

NS_ProcessNextEvent() can block (mayWait defaults to true), which essentially creates a "Gecko-blocked-on-UI" situation that's similar to what caused the deadlock from before. Also, processing one event every ~100ms might not be good.

I think you want to call NS_ProcessNextEvent() repeatedly in a loop, without wait. When NS_ProcessNextEvent() blocks, you get out of that by posting an event back to the Gecko thread.
Attachment #8851618 - Flags: feedback?(nchen) → feedback+
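
The suggestion in this review can be sketched with a toy event loop: the waiting thread runs events in a loop with a blocking "process next event" call and no timed wait, and whoever finishes the launch posts an event back to wake it. `FakeEventLoop` and `SpinUntil` are hypothetical stand-ins; they only imitate the blocking behaviour attributed to `NS_ProcessNextEvent()` above and are not Gecko APIs.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-consumer event loop owned by the "Gecko thread".
class FakeEventLoop {
 public:
  // Thread-safe: queue a task and wake the owning thread.
  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mTasks.push_back(std::move(task));
    }
    mCond.notify_one();
  }

  // Blocks until a task is available, then runs exactly one task.
  void ProcessNextEvent() {
    std::function<void()> task;
    {
      std::unique_lock<std::mutex> lock(mMutex);
      mCond.wait(lock, [this] { return !mTasks.empty(); });
      task = std::move(mTasks.front());
      mTasks.pop_front();
    }
    task();  // run outside the lock so tasks may dispatch more tasks
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  std::deque<std::function<void()>> mTasks;
};

// Runs on the owning thread. The flag is only written by tasks that run
// on this same thread, so a plain bool is sufficient in this sketch.
inline void SpinUntil(FakeEventLoop& loop, const bool& done) {
  while (!done) {
    loop.ProcessNextEvent();
  }
}
```

In this model a "UI thread" can dispatch a compositor-creation task and block on its result while the "Gecko thread" sits in `SpinUntil`; the launch completion is delivered as just another posted task that sets the flag, which is what ends the loop.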
Comment on attachment 8851618 [details] [diff] [review]
0004-Bug-1350924-4.0-Spin-the-Gecko-thread-event-loop-to-.patch

Review of attachment 8851618 [details] [diff] [review]:
-----------------------------------------------------------------

This patch scares me to be honest. I think you want to talk to Billm about it for sure. One thing to keep an eye open for is the work under bug 1348361 that might solve some of your problems.

About the preallocated process manager we talked very briefly about, I don't know how that could help your situation here exactly. All it does is launch an extra content process in the background without hosting any tab/window in it, so the next time someone needs a new content process, they can just grab and use it. Since you have a hard upper limit of only one content process alive at any time, I don't see how that would help you here, but I might be missing something.
(Assignee)

Updated

2 years ago
Attachment #8851606 - Flags: review?(rbarker)
(Assignee)

Updated

2 years ago
Attachment #8851608 - Flags: review?(rbarker)
(Assignee)

Updated

2 years ago
Attachment #8851607 - Flags: review?(rbarker)
(Assignee)

Updated

2 years ago
Depends on: 1348361
Component: Embedding: APIs → GeckoView
Product: Core → Firefox for Android
Version: unspecified → Trunk
Priority: -- → P1
P2 because this is not a Klar+GeckoView blocker. Klar doesn't shut down the content process.
Priority: P1 → P2