Closed Bug 771189 Opened 12 years ago Closed 12 years ago

Intermittent deadlock during CreateCompositor / surfaceDestroyed

Categories

(Firefox for Android Graveyard :: General, defect)

x86
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 16

People

(Reporter: gbrown, Assigned: ajuma)

References

Details

Attachments

(1 file)

(This has been described in bug 766059 and 759792, but it is not specific to DLBI or a particular Robocop test, and the description has become confused.)

nsBaseWidget::CreateCompositor, on the Gecko thread, waits for a valid surface via GLController::waitForValidSurface. Usually the surface is valid and all is well, but if the surface is not valid (surfaceDestroyed called last), waitForValidSurface waits until GLController::surfaceChanged is called. surfaceDestroyed and surfaceChanged are called on the UI thread -- if the UI thread is blocked, CreateCompositor will not complete and the Gecko thread will be blocked.

Meanwhile, the UI thread may be running LayerView::surfaceDestroyed, calling -> GeckoLayerClient::compositionPauseRequested -> GeckoAppShell::sendEventToGeckoSync -> GeckoAppShell::geckoEventSync which will block the UI thread waiting for a reply, which requires a functioning Gecko thread.

So if surfaceDestroyed and CreateCompositor are running at about the same time, it seems there is a good chance that we deadlock, with the UI thread waiting for the Gecko thread and the Gecko thread waiting for the UI thread. Eventually the process is killed (ANR).

This condition was first noted May 30 and appears to cause various intermittent failures in tbpl tests.


Java stack traces showing the deadlock:

One thread is at:

06-26 18:31:56.052 32527 32527 W System.err:    at org.mozilla.gecko.GeckoAppShell.geckoEventSync(GeckoAppShell.java:549)
06-26 18:31:56.054 32527 32527 W System.err:    at org.mozilla.gecko.GeckoAppShell.sendEventToGeckoSync(GeckoAppShell.java:507)
06-26 18:31:56.054 32527 32527 W System.err:    at org.mozilla.gecko.gfx.GeckoLayerClient.compositionPauseRequested(GeckoLayerClient.java:448)
06-26 18:31:56.054 32527 32527 W System.err:    at org.mozilla.gecko.gfx.LayerView.surfaceDestroyed(LayerView.java:230)
06-26 18:31:56.054 32527 32527 W System.err:    at android.view.SurfaceView.reportSurfaceDestroyed(SurfaceView.java:586)
06-26 18:31:56.054 32527 32527 W System.err:    at android.view.SurfaceView.updateWindow(SurfaceView.java:478)
06-26 18:31:56.056 32527 32527 W System.err:    at android.view.SurfaceView.onWindowVisibilityChanged(SurfaceView.java:208)
06-26 18:31:56.056 32527 32527 W System.err:    at android.view.View.dispatchWindowVisibilityChanged(View.java:3944)
06-26 18:31:56.056 32527 32527 W System.err:    at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:719)
06-26 18:31:56.058 32527 32527 W System.err:    at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:719)
06-26 18:31:56.058 32527 32527 W System.err:    at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:719)
06-26 18:31:56.058 32527 32527 W System.err:    at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:719)
06-26 18:31:56.058 32527 32527 W System.err:    at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:719)
06-26 18:31:56.058 32527 32527 W System.err:    at android.view.ViewRoot.performTraversals(ViewRoot.java:755)
06-26 18:31:56.058 32527 32527 W System.err:    at android.view.ViewRoot.handleMessage(ViewRoot.java:1752)
06-26 18:31:56.060 32527 32527 W System.err:    at android.os.Handler.dispatchMessage(Handler.java:99)
06-26 18:31:56.060 32527 32527 W System.err:    at android.os.Looper.loop(Looper.java:123)
06-26 18:31:56.062 32527 32527 W System.err:    at android.app.ActivityThread.main(ActivityThread.java:4627)
06-26 18:31:56.062 32527 32527 W System.err:    at java.lang.reflect.Method.invokeNative(Native Method)
06-26 18:31:56.064 32527 32527 W System.err:    at java.lang.reflect.Method.invoke(Method.java:521)
06-26 18:31:56.064 32527 32527 W System.err:    at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:858)
06-26 18:31:56.064 32527 32527 W System.err:    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
06-26 18:31:56.064 32527 32527 W System.err:    at dalvik.system.NativeStart.main(Native Method)

The other thread is at:

06-26 18:31:56.080 32527 32570 W System.err:    at org.mozilla.gecko.gfx.GLController.waitForValidSurface(GLController.java:83)
06-26 18:31:56.080 32527 32570 W System.err:    at dalvik.system.NativeStart.run(Native Method)
06-26 18:31:56.080 32527 32570 W System.err:    at dalvik.system.NativeStart.run(Native Method)

and on the C++ side, I see:

nsWindow::Invalidate posts DRAW
  nsWindow::OnGlobalAndroidEvent receives AndroidGeckoEvent::DRAW, calls
    nsWindow::OnDraw, calls
      nsWindow::GetLayerManager, calls
        nsBaseWidget::CreateCompositor, calls
          CompositorChild::SendPLayersConstructor(LayerManager::LAYERS_OPENGL, ...)
          ... does not return
Blocks: 769524
Blocks: 759792
Quoting :ajuma from https://bugzilla.mozilla.org/show_bug.cgi?id=766059#c23

> This leads to a couple theories:
> 1) We're getting a surfaceDestroyed event (triggering a 
> compositionPauseRequested) before we've even created the compositor. 
> Then we get the paint event, triggering a call to nsWindow::GetLayerManager,
> which causes us to try to create the compositor, but this blocks on getting 
> a surfaceChanged event (by calling waitForValidSurface), but the surfaceChanged
> event doesn't arrive since the UI thread is still handling the 
> surfaceDestroyed event. If this is the problem, we need to track the creation
> of the compositor and avoid sending a compositionPauseRequested when there isn't 
> actually a compositor to pause.

That sounds like the scenario I am seeing, and that seems like a reasonable solution. Any concerns?
This implements the solution from Comment 1 (and handles compositionResumeRequested as well, since it also involves sending a synchronous event to Gecko).
Assignee: nobody → ajuma
Status: NEW → ASSIGNED
Attachment #639427 - Flags: review?(bugmail.mozilla)
Comment on attachment 639427 [details] [diff] [review]
Don't try to pause/resume a non-existent compositor.

Review of attachment 639427 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #639427 - Flags: review?(bugmail.mozilla) → review+
Running the patch locally, I no longer see deadlocks during Robocop tests - it works!
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ca2f332fa91
Target Milestone: --- → Firefox 16
https://hg.mozilla.org/mozilla-central/rev/6ca2f332fa91
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: