Closed
Bug 752368
Opened 13 years ago
Closed 13 years ago
Random black screens when opening links from other applications
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox14 fixed, blocking-fennec1.0 +)
RESOLVED
FIXED
Firefox 15
People
(Reporter: mcote, Assigned: vlad)
References
Details
Crash Data
Attachments
(2 files, 5 obsolete files)
1.23 MB,
text/plain
|
Details | |
5.70 KB,
patch
|
ajuma
:
review+
kats
:
review+
joe
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
At least once a day, on my Motorola Atrix, when I open a link from another Android app (often Google Reader), Nightly will show a black content screen. The chrome is still there, and it looks like it is loading the page (the loading symbol spins for a while, then stops, as usual when loading a page), but the content part of the screen stays completely black. I've seen this for some time but haven't been able to figure out what causes it exactly. I've never seen this problem on my Asus Eee Transformer tablet, but I use it a lot less than my phone.
I'll attach a logcat the next time it happens.
Comment 1•13 years ago
|
||
Crop of these bugs coming up; other similar ones bug 752153, bug 751732, bug 751589.
These need to be sorted out.
Mark, can you try this:
Take a debug Android build and run it (via Android ADB) with MOZ_GL_DEBUG=1 and MOZ_GL_DEBUG_ABORT_ON_ERROR=1, e.g.
adb shell am start -a android.activity.MAIN -n org.mozilla.fennec_<user>/.App --es env0 MOZ_GL_DEBUG=1 --es env1 MOZ_GL_DEBUG_ABORT_ON_ERROR=1
blocking-fennec1.0: --- → ?
Updated•13 years ago
|
Severity: normal → major
Assignee | ||
Comment 2•13 years ago
|
||
I can reproduce this on today's Aurora nightly (20120507) on my Atrix. It took about 5-6 link opens from another app, but eventually the link opened up with the content area black and never came back. App is still responsive; I can get a tab list, etc., but any content areas are all black.
Reporter | ||
Comment 3•13 years ago
|
||
Okay I'm running the debug build with those options. I assume you want a logcat the next time this happens?
Reporter | ||
Comment 4•13 years ago
|
||
Hm I've opened about 15-20 links so far, and no black screen. It looks like fennec might have crashed once, although it came back very quickly (is that level of crash recovery a new addition? it was pretty slick!).
I'll keep trying.
Reporter | ||
Comment 5•13 years ago
|
||
Oh right, the crash must be happening because of the MOZ_GL_DEBUG_ABORT_ON_ERROR setting (I saw the content screen briefly go black before it crashed). So disregard my last comment. :)
I saw the crash again. Attached is a logcat.
Comment 6•13 years ago
|
||
Vlad - Can you renom for beta-blocking if this is more than just the Atrix?
Assignee: nobody → vladimir
blocking-fennec1.0: ? → +
Comment 7•13 years ago
|
||
Attachment #621683 -
Attachment is obsolete: true
Assignee | ||
Comment 8•13 years ago
|
||
Debug build log:
05-07 16:28:22.756 20250 20263 I GeckoApp: Got message: DOMContentLoaded
05-07 16:28:23.166 20250 20250 I GeckoTabs: Removed a tab with id: 3
** hitting back button
05-07 16:28:23.166 1645 2255 I ActivityManager: moveTaskToBack: 63
05-07 16:28:23.176 20250 20250 D PhoneWindow: couldn't save which view has focus because the focused view org.mozilla.gecko.gfx.LayerView@40547248 has no id.
05-07 16:28:23.176 20250 20250 I GeckoApp: application paused
** bunch of Surface is invalid errors
05-07 16:28:23.576 20250 20296 E Surface : surface (identity=612) is invalid, err=-19 (No such device)
...
05-07 16:28:23.666 20250 20250 I GeckoApp: stop
...
05-07 16:28:24.076 20250 20296 E Surface : surface (identity=612) is invalid, err=-19 (No such device)
* I click on link in twitter; intent gets thrown over to Gecko
05-07 16:28:24.386 1645 2223 I ActivityManager: Starting: Intent { act=android.intent.action.VIEW dat=http://t.co/ogZwxyNi flg=0x10000000 cmp=android/com.android.internal.app.ResolverActivity } from pid 19947
05-07 16:28:24.456 20250 20296 E Surface : surface (identity=612) is invalid, err=-19 (No such device)
05-07 16:28:24.656 1645 1995 I ActivityManager: Displayed android/com.android.internal.app.ResolverActivity: +263ms
05-07 16:28:25.316 1645 1675 I ActivityManager: Starting: Intent { act=android.intent.action.VIEW dat=http://t.co/ogZwxyNi flg=0x13800000 cmp=org.mozilla.fennec_vladimir/.App } from pid 1645
05-07 16:28:25.336 20250 20250 W GeckoApp: zerdatime 42188476 - onNewIntent
05-07 16:28:25.336 20250 20250 I GeckoApp: onNewIntent: http://t.co/ogZwxyNi
* gecko gets it
05-07 16:28:25.346 20250 20250 I GeckoApp: restart
* and restarts/starts/resumes
05-07 16:28:25.346 20250 20250 W GeckoApp: zerdatime 42188480 - onStart
05-07 16:28:25.346 20250 20250 I GeckoApp: start
05-07 16:28:25.346 20250 20250 I GeckoApp: resume
05-07 16:28:25.346 20250 20250 I GeckoApp: application resumed
05-07 16:28:25.376 20250 20263 I GeckoTabs: Got message: Tab:Added
05-07 16:28:25.376 20250 20263 I GeckoTabs: Received message from Gecko: 42188517 - Tab:Added
05-07 16:28:25.376 20250 20263 I GeckoTabs: Added a tab with id: 4
05-07 16:28:25.396 20250 20263 I GeckoApp: Got message: Content:LocationChange
05-07 16:28:25.396 20250 20263 I GeckoApp: URI - about:blank
...
no Surface errors or GL errors (with MOZ_GL_DEBUG_ABORT_ON_ERROR) after this point, but the content area just remains black.
Assignee | ||
Comment 9•13 years ago
|
||
The above sequence appears in the attached logs as well (the attached log has multiple runs of Gecko; there are two instances of getting into surface is invalid.
Assignee | ||
Comment 10•13 years ago
|
||
Now with more logging! Here's the relevant sequence:
05-07 19:38:29.818 1644 2265 I ActivityManager: moveTaskToBack: 20
05-07 19:38:29.828 12540 12540 I GeckoTabs: Removed a tab with id: 3
05-07 19:38:29.828 12540 12540 I GeckoApp: application paused
05-07 19:38:29.868 12540 12540 I GeckoToolbar: zerdatime 1919232 - Throbber stop
05-07 19:38:29.868 12540 12540 I GeckoDoorHangerPopup: Showing all doorhangers for tab: 1
05-07 19:38:29.868 12540 12572 I Gecko : Composite (not paused)
05-07 19:38:29.868 12540 12540 I GeckoDoorHangerPopup: Showing all doorhangers for tab: 1
05-07 19:38:29.968 12540 12572 I Gecko : Composite (not paused)
05-07 19:38:29.978 12540 12540 W IInputConnectionWrapper: showStatusIcon on inactive InputConnection
05-07 19:38:29.978 12540 12572 I Gecko : Composite (not paused)
05-07 19:38:29.988 12540 12572 I Gecko : Composite (not paused)
05-07 19:38:30.078 12540 12572 I Gecko : Composite (not paused)
05-07 19:38:30.406 12540 12540 I GeckoApp: stop
** this is where things go wrong.
05-07 19:38:30.586 12540 12572 I Gecko : Composite (not paused)
05-07 19:38:30.586 12540 12572 E Surface : surface (identity=119) is invalid, err=-19 (No such device)
05-07 19:38:30.586 12540 12572 I Gecko : PauseComposition
05-07 19:38:30.586 12540 12572 E Surface : surface (identity=119) is invalid, err=-19 (No such device)
05-07 19:38:30.606 12540 12572 I Gecko : Composite (paused; skipped)
Basically, we're getting an onStop, but (likely) we still have either an outstanding Composite in the queue or one just happened to have gotten issued at the time. We try to process it before we get to the PauseComposition, and things blow up. We never recover from this.
This doesn't seem to be Atrix specific; just for whatever reason it happens to be easier to hit it on that device.
Assignee | ||
Comment 12•13 years ago
|
||
Basically, as soon as we get an onStop, we must not try to composite. This will be tricky, as they're on different threads and the Composite could already be queued as we're processing the onStop. This is a bit of a problem.
Reporter | ||
Comment 13•13 years ago
|
||
Btw, just wanted to mention that I'm more than happy to try out fixes. Doesn't take me too long to repro this problem normally.
Assignee | ||
Comment 14•13 years ago
|
||
Ok, this seems to fix things for me. It's only a prototype, since I'm kind of hijacking SchedulePauseComposition, but it might just be ok (minus the debug printfs).
The basic idea is that pausing composition needs to be synchronous, and needs to be done whenever we get a pause or a stop. I can't reproduce either of the black screen issues (either from new apps or via awesomebar) with this patch applied. (Also, now that I look at it, I should probably just use MutexAutoLock too.)
Attachment #621812 -
Flags: feedback?(bgirard)
Attachment #621812 -
Flags: feedback?(ajuma)
Comment 15•13 years ago
|
||
Comment on attachment 621812 [details] [diff] [review]
prototype fix
I like this approach -- it's too bad that we have to add more synchronization, but it seems unavoidable here.
Attachment #621812 -
Flags: feedback?(ajuma) → feedback+
Assignee | ||
Comment 16•13 years ago
|
||
Ok, cleaned up version of the prototype; using mozilla::Monitor this time, because derp. Debugging crap removed. If we're ok with having SchedulePauseOnCompositorThread block on android until it's complete, then this should be good to go.
Attachment #621812 -
Attachment is obsolete: true
Attachment #621812 -
Flags: feedback?(bgirard)
Attachment #621826 -
Flags: review?(bgirard)
Attachment #621826 -
Flags: review?(ajuma)
Comment 17•13 years ago
|
||
Comment on attachment 621826 [details] [diff] [review]
cleaned up fix
Review of attachment 621826 [details] [diff] [review]:
-----------------------------------------------------------------
Looking at the cleaned up patch reminded me of one more thing we'll need to handle: nsWindow's sCompositorPaused needs to updated (e.g. by sending a COMPOSITOR_PAUSE event to Gecko) before the compositor actually pauses, or else Gecko might try to send a layer tree update after the compositor is paused. This could be a problem (since, effectively, we would need the Java thread to block first on Gecko, and then on the compositor), but might just work.
Alternatively, we need the compositor to be able to handle getting a layer tree update while paused. The problem with getting a layer tree update while paused is that we can't do texture uploads. BenWa's work on making layer tree updates asynch helps here, in that it would allow us to receive the layer tree update while paused but postpone the texture upload till later, but we don't yet have asynch updates for all layer types. Thoughts on this, BenWa?
Assignee | ||
Comment 18•13 years ago
|
||
Any reason to not just set sCompositorPaused to true in nsWindow::SchedulePauseCompositor (either before or after the call to sCompositorParent->SchedulePauseOnCompositorThread())? We know it'll be paused after that returns because the pause will happen synchronously, right?
Comment 19•13 years ago
|
||
(In reply to Vladimir Vukicevic (:vlad) from comment #18)
> Any reason to not just set sCompositorPaused to true in
> nsWindow::SchedulePauseCompositor (either before or after the call to
> sCompositorParent->SchedulePauseOnCompositorThread())? We know it'll be
> paused after that returns because the pause will happen synchronously, right?
The problem is that SchedulePauseCompositor is getting called on the Java thread. So what could happen is that Gecko checks sCompositorPaused, then begins to paint, then SchedulePauseCompositor sets sCompositorPaused to true and pauses the compositor, then Gecko begins a layer tree update. Even if we could split the paint and the layer tree update and have Gecko check sCompositorPaused right before the layer tree update (rather than right before the paint), we could have that Gecko checks the flag, then SchedulePauseCompositor sets the flag and pauses the compositor, and then Gecko does a layer tree update.
So we need to keep sCompositorPaused locked between the time Gecko checks it and the time it actually sends the layer tree update.
Assignee | ||
Comment 20•13 years ago
|
||
Different, simpler approach. I realized that the surfaceDestroyed callback is still valid, and that's where we need to pause composition. The two changes that are needed are to make pausing composition synchronous on the compositor side (as in the previous patch), and new to this patch, to dispatch the pause request to the gecko thread synchronously. This ensures that we won't be in the middle of painting or anything on the gecko thread, since we'll effectively be synchronizing all three threads. Works fine and fixes the problem on my Atrix, works fine on Galaxy Nexus.
Attachment #621826 -
Attachment is obsolete: true
Attachment #621826 -
Flags: review?(bgirard)
Attachment #621826 -
Flags: review?(ajuma)
Attachment #622041 -
Flags: review?(bgirard)
Attachment #622041 -
Flags: review?(ajuma)
Comment 21•13 years ago
|
||
Comment on attachment 622041 [details] [diff] [review]
fix, v2
Review of attachment 622041 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/CompositorParent.cpp
@@ +137,5 @@
> CompositorLoop()->PostTask(FROM_HERE, renderTask);
> }
>
> +#ifdef MOZ_WIDGET_ANDROID
> +static mozilla::Monitor sPauseCompositionMonitor("PauseCompositionMonitor");
This will trigger leak detection since static desconstructor run after leak checks. :(
@@ +184,4 @@
> void
> CompositorParent::SchedulePauseOnCompositorThread()
> {
> +#ifdef MOZ_WIDGET_ANDROID
This function here is only intended to be used by Android as I recall. I don't like having it behave differently in case it gets called by another platform. We should either make it clear that this function is android only or make it behave the same (not be both async/sync).
@@ +193,5 @@
> CompositorLoop()->PostTask(FROM_HERE, pauseTask);
> +
> +#ifdef MOZ_WIDGET_ANDROID
> + // XXX we should use a timeout here
> + lock.Wait();
Let's add a comment to make it clear what we're waiting for here such as:
// Wait for the compositor thread to notify us that it has paused.
::: mobile/android/base/gfx/GeckoLayerClient.java
@@ +457,3 @@
> public void compositionPauseRequested() {
> // We need to coordinate with Gecko when pausing composition, to ensure
> // that Gecko never executes a draw event while the compositor is paused.
We should get this comment updated to be more specific. Specifically we should mention why we're sending a sync event to gecko (to make sure we're not longer drawing), and that this event will in turn sync with the compositor to pause it.
Attachment #622041 -
Flags: review?(bgirard) → review-
Assignee | ||
Comment 22•13 years ago
|
||
Updated with review comments. Moved the Monitor to a class member.
Attachment #622041 -
Attachment is obsolete: true
Attachment #622041 -
Flags: review?(ajuma)
Attachment #622061 -
Flags: review?(bgirard)
Comment 23•13 years ago
|
||
Just a heads-up in case you haven't seen it already: there is a somewhat-related issue that Chris Lord is looking at, see https://bugzilla.mozilla.org/show_bug.cgi?id=751690#c12
Comment 24•13 years ago
|
||
Comment on attachment 622061 [details] [diff] [review]
fix, v3
Review of attachment 622061 [details] [diff] [review]:
-----------------------------------------------------------------
::: mobile/android/base/gfx/GeckoLayerClient.java
@@ +462,5 @@
> + // surfaceDestroyed notification), and to make sure that any in-flight
> + // Gecko draw events have been processed. When this returns, composition is
> + // definitely paused -- it'll synchronize with the Gecko event loop, which
> + // in turn will synchronize with the compositor thread.
> + GeckoAppShell.sendEventToGeckoSync(GeckoEvent.createCompositorPauseEvent());
When Gecko receives this event, it sends sCompositorChild->SendPause(), which synchronously pauses the compositor. So once we use sendEventToGeckoSync here, it seems we have all we need.
Do we still need the changes to SchedulePauseOnCompositorThread? (Is this actually getting called at all with this version of the patch?)
Assignee | ||
Comment 25•13 years ago
|
||
is sCompositorChild->SendPause() synchronous? If so, then we don't, but it seems like we should keep the bits in SchedulePauseOnCompositorThread in case someone needs to call that directly from Java like we used to so that they have the same semantics.
Comment 26•13 years ago
|
||
(In reply to Vladimir Vukicevic (:vlad) from comment #25)
> is sCompositorChild->SendPause() synchronous?
Yes.
> If so, then we don't, but it
> seems like we should keep the bits in SchedulePauseOnCompositorThread in
> case someone needs to call that directly from Java like we used to so that
> they have the same semantics.
That sounds like a good idea.
Updated•13 years ago
|
Attachment #622061 -
Flags: review?(bgirard)
Attachment #622061 -
Flags: review?(ajuma)
Attachment #622061 -
Flags: review+
Updated•13 years ago
|
Attachment #622061 -
Flags: review?(ajuma) → review+
Assignee | ||
Comment 27•13 years ago
|
||
Pushed to inbound as 3502d9840d98
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 28•13 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/b5304fd23df9 - sent robocheck, robocheck2 and robopan into permanent failure (a state only barely distinguishable from their normal state, but still distinguishable).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 29•13 years ago
|
||
I see the following egl errors in the log for the test runs, probably related to the failure in some way:
05-09 20:50:34.216 E/libEGL ( 1020): eglSetSwapRectangleANDROID:1861 error 3008 (EGL_BAD_DISPLAY)
05-09 20:50:34.476 E/SurfaceFlinger( 1020): eglSwapBuffers: EGL error 0x3008 (EGL_BAD_DISPLAY)
Comment 30•13 years ago
|
||
(In reply to Kartikaya Gupta (:kats) from comment #29)
> I see the following egl errors in the log for the test runs, probably
> related to the failure in some way:
>
> 05-09 20:50:34.216 E/libEGL ( 1020): eglSetSwapRectangleANDROID:1861 error
> 3008 (EGL_BAD_DISPLAY)
>
> 05-09 20:50:34.476 E/SurfaceFlinger( 1020): eglSwapBuffers: EGL error 0x3008
> (EGL_BAD_DISPLAY)
If these errors aren't directly related, I wonder if the "error" text in the log is throwing off the log parser.
Assignee | ||
Comment 31•13 years ago
|
||
Shouldn't be related -- they're coming from what looks like android startup stuff? I don't actually see fennec being launched in those logs at all.
Assignee | ||
Comment 32•13 years ago
|
||
I see:
05-09 20:39:17.205 W/System.err( 1451): java.io.FileNotFoundException: /mnt/sdcard/tests/fennec-15.0a1.en-US.android-arm.apk (No such file or directory)
05-09 20:40:38.765 W/System.err( 1451): java.io.FileNotFoundException: /mnt/sdcard/tests/robocop.apk (No such file or directory)
after the respective pushes and install attempts... but there are so many errors and other stuff in that log that look to be just from the test that I can't really understand what's a valid error or not; but I don't see anything that looks like fennec even got installed, much less launched.
Comment 33•13 years ago
|
||
(In reply to Vladimir Vukicevic (:vlad) from comment #32)
> 05-09 20:39:17.205 W/System.err( 1451): java.io.FileNotFoundException:
> /mnt/sdcard/tests/fennec-15.0a1.en-US.android-arm.apk (No such file or
> directory)
>
> 05-09 20:40:38.765 W/System.err( 1451): java.io.FileNotFoundException:
> /mnt/sdcard/tests/robocop.apk (No such file or directory)
These errors are normal. At some point before the file is transfered the host requests a hash of the file or something which results in this exception. It's annoying as hell but I believe benign.
(In reply to Vladimir Vukicevic (:vlad) from comment #31)
> Shouldn't be related -- they're coming from what looks like android startup
> stuff? I don't actually see fennec being launched in those logs at all.
Usually I never actually see fennec being launched in the logs. The logs generated are pretty useless if you try to just interpret them directly. The best way is to compare a working-test log to a non-working-test log and see what's different.
Assignee | ||
Comment 34•13 years ago
|
||
Working through bug 749426, which is what I think is causing the test failures here.
Status: REOPENED → NEW
Assignee | ||
Comment 35•13 years ago
|
||
Testing this patch along with bug 749426 caused the same robo* test failures on the try server, but I was not able to reproduce the failures locally on either a galaxy nexus or on a local tegra board. I'm a bit stumped as to how to proceed; one thing that could be useful is the logcat output from the tinderbox tegra device while the test is running.
Comment 36•13 years ago
|
||
(In reply to Vladimir Vukicevic (:vlad) from comment #35)
> one thing that could be useful is the logcat output from the
> tinderbox tegra device while the test is running.
I have encountered other situations in which this would have been useful as well. It would be nice if the test harness could be modified to provide this always.
Comment 37•13 years ago
|
||
(In reply to Vladimir Vukicevic (:vlad) from comment #35)
> I'm a bit stumped as to
> how to proceed; one thing that could be useful is the logcat output from the
> tinderbox tegra device while the test is running.
Joel, any suggestions? I thought we had a solution for this.
Comment 38•13 years ago
|
||
adb/tcp is the most unreliable form of communication and that is the only way we can harvest the entire set of logcat data. In addition, there would be no way to store this. The volume is so high on the tegras that it would flood the logfile we produce.
I am open to suggestions, but the few attempts we have tried have yielded nothing useful (i.e. it collects the wrong stuff, or not enough).
Comment 39•13 years ago
|
||
one thing that could cut down on the volume is grep'ing the logcat for the process id of fennec
Comment 40•13 years ago
|
||
we find that we miss important things like crash dumps in logcat (which are different than when we crash fennec and produce a crash dump).
I do know there are ways with logcat to filter the output, we might be able filter based on the process and minimize our pipe of information.
The other factor is that this would be collected after the fact, not interleaved with test results.
Comment 41•13 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #40)
> we find that we miss important things like crash dumps in logcat (which are
> different than when we crash fennec and produce a crash dump).
Is that filtering on "Gecko" instead of the pid?
Comment 42•13 years ago
|
||
Can we run something like alogcat on the device and have it collect the full log, and then upload it somewhere using a reliable channel?
Comment 43•13 years ago
|
||
I like that idea. If we could essentially do a log flush before we start the test, record the log, then pull it up, that would work out great. At that point in our test harness script we could do any parsing and filtering necessary.
Reporter | ||
Comment 44•13 years ago
|
||
Just wanted to mention that, this morning, I got the black content on my Asus Eee Transformer as well (latest nightly).
Assignee | ||
Comment 45•13 years ago
|
||
I filed bug 754873 for capturing logcat output.
I tried submitting this patch with nothing but the sendEventToGecko -> sendEventToGeckoSync change (which technically should be all that's required); same problem. I can't reproduce it locally, though I'm about to try something different here shortly.
Assignee | ||
Comment 46•13 years ago
|
||
Try builds with this patch are at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/vladimir@pobox.com-0e539f1d4ce4/try-android/ for testing.
Comment 49•13 years ago
|
||
Here's Vlad's tryserver build with the patch, to try:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/vladimir@pobox.com-0e539f1d4ce4/try-android/
Comment 50•13 years ago
|
||
Oops, meant to post that in the other bug.
Assignee | ||
Comment 51•13 years ago
|
||
I can finally reproduce the deadlock! I needed to run the exact same test run, using SUT and Talos to do it; doing anything as a mochitest etc. didn't work. What's happening is this:
The Android thread posted a native compositing paused event and is waiting for gecko to process it:
java.util.concurrent.CountDownLatch await CountDownLatch.java:207
org.mozilla.gecko.GeckoAppShell geckoEventSync GeckoAppShell.java:618
org.mozilla.gecko.GeckoAppShell sendEventToGeckoSync GeckoAppShell.java:538
org.mozilla.gecko.gfx.GeckoLayerClient compositionPauseRequested GeckoLayerClient.java:475
org.mozilla.gecko.gfx.LayerView surfaceDestroyed LayerView.java:269
android.view.SurfaceView reportSurfaceDestroyed SurfaceView.java:568
The Gecko thread, on the other hand, is here:
org.mozilla.gecko.gfx.LayerView requestRender LayerView.java:182
org.mozilla.gecko.gfx.LayerController setCheckerboardColor LayerController.java:347
org.mozilla.gecko.GeckoApp handleMessage GeckoApp.java:861
org.mozilla.gecko.GeckoAppShell handleGeckoMessage GeckoAppShell.java:1865
org.mozilla.gecko.GeckoAppShell nativeRun GeckoAppShell.java:-2
org.mozilla.gecko.GeckoAppShell nativeRun GeckoAppShell.java:-2
org.mozilla.gecko.GeckoAppShell runGecko GeckoAppShell.java:505
org.mozilla.gecko.GeckoThread run GeckoThread.java:114
So, Gecko is tryng to call LayerView.requestRender() while we're already in the middle of LayerView.surfaceDestroyed(). Both those methods are synchronized. Thus, we deadlock.
Comment 52•13 years ago
|
||
odd why we get that only in talos vs mochitest version of this. Could it be something in the profile (pref or extension), or just random chance.
Assignee | ||
Comment 53•13 years ago
|
||
Yeah, I'm confused by that as well; I don't see anything in prefs.js in the profile, at least.
![]() |
||
Updated•13 years ago
|
Crash Signature: [@ WSEGL_GetDrawableParameters]
Assignee | ||
Comment 54•13 years ago
|
||
Updated fix that fixes the deadlock. (Also gives the Gecko thread a name, for easier finding in ddms.)
Removing synchronized from requestRepaint seems safe, since it just flows into posting an event to Gecko.. which is thread safe by design anyway.
Attachment #624782 -
Flags: review?(ajuma)
Comment 55•13 years ago
|
||
Comment on attachment 624782 [details] [diff] [review]
fix, v4
Looks good to me. I think kats should take a look at the Java bits too.
Attachment #624782 -
Flags: review?(bugmail.mozilla)
Attachment #624782 -
Flags: review?(ajuma)
Attachment #624782 -
Flags: review+
Comment 56•13 years ago
|
||
Comment on attachment 624782 [details] [diff] [review]
fix, v4
Review of attachment 624782 [details] [diff] [review]:
-----------------------------------------------------------------
Looks like most of the synchronization in LayerView is just an artifact from long ago, we should probably revisit that at some point. But yeah, this change looks fine.
::: mobile/android/base/gfx/LayerView.java
@@ +200,1 @@
> public long getRenderTime() {
This function is never used; might as well take it out.
Attachment #624782 -
Flags: review?(bugmail.mozilla) → review+
Assignee | ||
Comment 57•13 years ago
|
||
On inbound, with getRenderTime and bits nuked. Fingers crossed that it sticks!
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=4a6bdd1308c3
Comment 58•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago → 13 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Target Milestone: --- → Firefox 15
Updated•13 years ago
|
status-firefox14:
--- → affected
Assignee | ||
Comment 59•13 years ago
|
||
Comment on attachment 624782 [details] [diff] [review]
fix, v4
[Approval Request Comment]
User impact if declined: crashes, freezes, all sorts of badness
Testing completed (on m-c, etc.): local testing on variety of devices; on m-c
Risk to taking this patch (and alternatives if risky): crashes, freezes, all sorts of badness
Attachment #624782 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 60•13 years ago
|
||
Er, ignore the risk to taking this patch line above, meant to delete that! :p
Updated•13 years ago
|
Attachment #624782 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 62•13 years ago
|
||
Comment on attachment 622061 [details] [diff] [review]
fix, v3
># HG changeset patch
># Parent 448f554f6acbfe632da02e23526344f3defebe69
>
>diff --git a/gfx/layers/ipc/CompositorParent.cpp b/gfx/layers/ipc/CompositorParent.cpp
>--- a/gfx/layers/ipc/CompositorParent.cpp
>+++ b/gfx/layers/ipc/CompositorParent.cpp
>@@ -1,4 +1,4 @@
>-/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
>+/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> /* vim: set sw=2 ts=2 et tw=80 : */
> /* ***** BEGIN LICENSE BLOCK *****
> * Version: MPL 1.1/GPL 2.0/LGPL 2.1
>@@ -66,6 +66,7 @@ CompositorParent::CompositorParent(nsIWi
> , mLayersUpdated(false)
> , mCompositorLoop(aMsgLoop)
> , mThreadID(aThreadID)
>+ , mPauseCompositionMonitor("PauseCompositionMonitor")
> {
> MOZ_COUNT_CTOR(CompositorParent);
> }
>@@ -141,6 +142,9 @@ CompositorParent::PauseComposition()
> {
> NS_ABORT_IF_FALSE(CompositorThreadID() == PlatformThread::CurrentId(),
> "PauseComposition() can only be called on the compositor thread");
>+
>+ mozilla::MonitorAutoLock lock(mPauseCompositionMonitor);
>+
> if (!mPaused) {
> mPaused = true;
>
>@@ -148,6 +152,9 @@ CompositorParent::PauseComposition()
> static_cast<LayerManagerOGL*>(mLayerManager.get())->gl()->ReleaseSurface();
> #endif
> }
>+
>+ // if anyone's waiting to make sure that composition really got paused, tell them
>+ lock.NotifyAll();
> }
>
> void
>@@ -169,12 +176,21 @@ CompositorParent::ResumeCompositionAndRe
> ResumeComposition();
> }
>
>+/*
>+ * This will execute a pause synchronously, waiting to make sure that the compositor
>+ * really is paused.
>+ */
> void
> CompositorParent::SchedulePauseOnCompositorThread()
> {
>+ mozilla::MonitorAutoLock lock(mPauseCompositionMonitor);
>+
> CancelableTask *pauseTask = NewRunnableMethod(this,
> &CompositorParent::PauseComposition);
> CompositorLoop()->PostTask(FROM_HERE, pauseTask);
>+
>+ // Wait until the pause has actually been processed by the compositor thread
>+ lock.Wait();
> }
>
> void
>diff --git a/gfx/layers/ipc/CompositorParent.h b/gfx/layers/ipc/CompositorParent.h
>--- a/gfx/layers/ipc/CompositorParent.h
>+++ b/gfx/layers/ipc/CompositorParent.h
>@@ -1,4 +1,4 @@
>-/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
>+/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> /* vim: set sw=4 ts=8 et tw=80 : */
> /* ***** BEGIN LICENSE BLOCK *****
> * Version: MPL 1.1/GPL 2.0/LGPL 2.1
>@@ -52,6 +52,7 @@
> #include "mozilla/layers/PCompositorParent.h"
> #include "mozilla/layers/PLayersParent.h"
> #include "base/thread.h"
>+#include "mozilla/Monitor.h"
> #include "ShadowLayersManager.h"
>
> class nsIWidget;
>@@ -165,6 +166,8 @@ private:
> MessageLoop* mCompositorLoop;
> PlatformThreadId mThreadID;
>
>+ mozilla::Monitor mPauseCompositionMonitor;
>+
> DISALLOW_EVIL_CONSTRUCTORS(CompositorParent);
> };
>
>diff --git a/mobile/android/base/gfx/GeckoLayerClient.java b/mobile/android/base/gfx/GeckoLayerClient.java
>--- a/mobile/android/base/gfx/GeckoLayerClient.java
>+++ b/mobile/android/base/gfx/GeckoLayerClient.java
>@@ -457,7 +457,13 @@ public class GeckoLayerClient implements
> public void compositionPauseRequested() {
> // We need to coordinate with Gecko when pausing composition, to ensure
> // that Gecko never executes a draw event while the compositor is paused.
>- GeckoAppShell.sendEventToGecko(GeckoEvent.createCompositorPauseEvent());
>+ // This is sent synchronously to make sure that we don't attempt to use
>+ // any outstanding Surfaces after we call this (such as from a
>+ // surfaceDestroyed notification), and to make sure that any in-flight
>+ // Gecko draw events have been processed. When this returns, composition is
>+ // definitely paused -- it'll synchronize with the Gecko event loop, which
>+ // in turn will synchronize with the compositor thread.
>+ GeckoAppShell.sendEventToGeckoSync(GeckoEvent.createCompositorPauseEvent());
> }
>
> /** Implementation of LayerView.Listener */
Attachment #622061 -
Attachment is obsolete: true
Assignee | ||
Comment 63•13 years ago
|
||
Updated•13 years ago
|
Reporter | ||
Comment 64•13 years ago
|
||
This should be fixed in Nightly, right? Just checking 'cause I experienced the problem again in today's Nightly...
Mark : Yes, it should be in the nightly. Do you have steps to reproduce?
Reporter | ||
Comment 70•13 years ago
|
||
Naoki: Unfortunately it just happened once as per the normal steps--opened a link from an other application. Had to stop the process and relaunch to get the content screen back.
Assignee | ||
Comment 72•13 years ago
|
||
If you see it again, can you grab the logcat output? It should let us determine if the root cause is the same as what was fixed here, or something different.
Reporter | ||
Comment 76•13 years ago
|
||
Still haven't seen it since. Looks like it is much, much reduced in frequency.
Updated•13 years ago
|
Assignee: vladimir → nobody
Updated•5 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•