Web page or screen you were on when you saw the issue: kijiji.ca Device: Droid Razr running Gingerbread Crash report ID: bp-56e54338-bbd5-44f6-88dc-a53922120517 The site looks fine when it loads, but the top right section (Daily Deals) takes a long time to show up and the horizontal area under "post a classified ad" and "all new ads" usually doesn't show up at all (though it did a couple of times). Any panning or zooming causes the browser to crash almost immediately. Sometimes clicking a link will load the new page, sometimes it will cause the browser to crash.
Also, on first visit, the site serves a location detection message in a yellow box at the top of the screen ("Welcome to Kijiji. We have matched your location to..."). On Fennec, the font is too big, the lines overlap and the message bleeds off the right side of the screen.
Crash looks like a dupe of bug 754056, which supposedly didn't affect aurora/beta. The font being too big is probably a font inflation issue.
There's both crashes from incorrect management of reusable surfaces. But the cause must be different.
It's bug 681114 with STR.
I couldn't reproduce, but we know this crash is still happening. If you can confirm that it still happens and give more detailed steps that would be wonderful. In the meantime we want to increase the runtime checking and crash easier when we enter this bad state to find who's at fault. Going to post that patch in bug 681114.
Seems to only happen for me if I activate the plugin that displays the top side-scrolling promo area. Otherwise it appears fine.
Matej, can you provide the new crash ID with the try build in bug 681114 comment 17?
Matej I reproduced this and added the info Scoobidiver asked for.
Crashes a current nightly as well on the Droid Razr.
This seems like it's probably going to be fixed by the fix to bug 681114.
The problem if-and-only-if flash is loaded. It seems to crash most of the time for me and when it doesn't crash the horizontal flash ad is significantly out of sync when doing async zooming. While reproducing this I was able to reproduce 'JS_DHashTableEnumerate' another top crasher.
This crash seems to be if-and-only-if flash + Android 2.3.x or LOWER.
I've narrowed down the code to flash drawing. I'm still unsure if it's our support code or not yet. I hope to have this narrowed down significantly monday.
Right now I have to narrowed down further to PluginLayer.performUpdates(). Still progressing.
We tried a few theories unsuccessfully: - Click to play - Skia - Plugin Placeholder surface Ali has some Valgrind traces showing some serious heap issues he will post once the trace is completed.
Created attachment 630727 [details] Valgrind log This is the complete Valgrind log of a run of Fennec that included visiting kijiji.ca, clicking to play the Flash content, zooming/panning, and then crashing. The most concerning/suspicious parts of the log are the invalid writes (that is, writing to memory that has already been freed).
(In reply to Ali Juma [:ajuma] from comment #16) > Created attachment 630727 [details] > The most concerning/suspicious parts of the log are the invalid writes (that > is, writing to memory that has already been freed). Well, basically Fennec is hosed as soon as the first invalid read occurs, at android::Surface::getBufferIndex. By this point, the underlying bad event (refcounting problem?) that screws this up has already occurred. The invalid write and invalid free later on are merely follow-on effects. > at 0xAC7121E6: android::Surface::getBufferIndex(android::sp<android::GraphicBuffer> const&) const (Surface.cpp:1025) > by 0xAC71393B: android::Surface::unlockAndPost() (Surface.cpp:1011) > by 0xC279EC1: ??? (in /dev/ashmem) Do you need more frames than this? I have a crappy stack-scanning hack that might get you 3 or 4 frames after the ??? -- or it might not, no promises. You're welcome to try it if you want.
(In reply to Julian Seward from comment #17) > Do you need more frames than this? I have a crappy stack-scanning hack > that might get you 3 or 4 frames after the ??? -- or it might not, no > promises. You're welcome to try it if you want. That would certainly help. I've been looking into the android source code but if we can find if libxul is calling this (or flash) that would help.
Something relevant in the log: W/SurfaceComposerClient( 2762): Destroying surface while a transaction is open. Client 0x219358: destroying surface 142, mTransactionOpen=1
(In reply to Benoit Girard (:BenWa) from comment #19) > Something relevant in the log: > W/SurfaceComposerClient( 2762): Destroying surface while a transaction is > open. Client 0x219358: destroying surface 142, mTransactionOpen=1 It sounds like the right thing to do here might be to just disable placeholders. This seemed to fix the problem in our testing.
Created attachment 631032 [details] [diff] [review] patch
Comment on attachment 631032 [details] [diff] [review] patch [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 727116 User impact if declined: Crashes on 2.3 and lower when running Flash and zooming Testing completed (on m-c, etc.): Tested locally, will be on m-c soon. Risk to taking this patch (and alternatives if risky): mobile only, very low. Disabling a feature via a pref. String or UUID changes made by this patch: none.
Comment on attachment 631032 [details] [diff] [review] patch https://hg.mozilla.org/integration/mozilla-inbound/rev/a9455a51d543
Are there any performance implications disabling place holders?
Let's confirm that this fixes the crash from crash-stats. Product and QA to evaluate the trade off of crashiness vs. the flash zooming experience.
We should keep digging for a better solution as well, because the tradeoff here could be pretty ugly.
https://hg.mozilla.org/mozilla-central/rev/a9455a51d543 (Merged by Ed Morley)
reopening because we're not sure this fixes the crash (should have had a leave open in the whiteboard)
I've tried a few things to try to allow us to turn placeholders back on: 1) Leave the Surface lock acquired in AndroidJNI::getSurfaceBits(). This would prevent the plugin from locking the surface before we removed it, with the goal that it would not then attempt to unlock it after the surface was destroyed. This didn't work, but I can't really explain why. Might warrant more investigation. 2) Instead of removing the SurfaceView, which destroys the underlying Surface (causing our problem), I thought it might be nice if we could just hide the Surface. Conveniently, Surface has methods for doing this. Sadly, they all require a transaction to work, and only privileged processes are allowed to open a transaction. At this point I think we should go ahead with the current fix. I will continue to investigate options for getting this working again, but I don't think we'll be able to get anything in before the next build.
qawanted: take a look at stability report today
(In reply to Tony Chung [:tchung] from comment #31) > qawanted: take a look at stability report today The latest crash in Nightly happened in 15.0a1/20120525 long before the patch landed. The latest crash in Aurora happened in 14.0a2/20120604 before the patch landed.
Comment on attachment 631032 [details] [diff] [review] patch Let's back this out of mozilla-central and mozilla-aurora
(In reply to Brad Lassey [:blassey] from comment #33) > Comment on attachment 631032 [details] [diff] [review] > patch > > Let's back this out of mozilla-central and mozilla-aurora Shouldn't we fix the problems first?
(In reply to Brad Lassey [:blassey] from comment #33) > Comment on attachment 631032 [details] [diff] [review] > patch > > Let's back this out of mozilla-central and mozilla-aurora I don't think turning placeholders back on is a good idea. Keep in mind that the crash we are seeing is a heap corruption which: 1) This is not the only signature caused by the bug. I've seen dozens of unique signature while debugging this problem. Judging how frequent this crash is based on this signature is wrong and misleading. This bug accounts for other crashes! 2) Heap corruption does not necessary lead to crash but can cause unpredictable behavior as well as corrupting data.
Backed out of mozilla-central and mozilla-aurora https://hg.mozilla.org/mozilla-central/rev/e9bf05c14376 https://hg.mozilla.org/releases/mozilla-aurora/rev/94e4994199c0
(In reply to Benoit Girard (:BenWa) from comment #35) > 1) This is not the only signature caused by the bug. I've seen dozens of > unique signature while debugging this problem. Judging how frequent this > crash is based on this signature is wrong and misleading. This bug accounts > for other crashes! It's worth re-emphasizing that this may be responsible for crashes with other signatures and on other sites. For example, I just tried panning and zooming a flash video on cnn.com; I did this three times, and (rather easily) got the following three crashes: bp-79c61338-6e65-4358-bf2c-f08722120612 bp-33617e48-e514-4847-91a6-323b92120612 bp-2ec951d2-5cbf-426c-bbe9-d954d2120612 Importantly, note that each of these crashes has a different signature.
I agree that this could be really bad, fwiw. I hope to have a real fix some time today, however.
Re-nominating as a *hard* blocker because of concerns in Comment 35 and 37. Here's some video of the current behavior: Try 1: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.55.42.mp4 Try 2: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.55.21.mp4 Try 3: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.54.32.mp4 Note that when placeholders run long enough to not crash right away the visual behavior is equivalent to not having placeholders. I was not able to demonstrate this in all my attempts because the heap corruptions is to severe and brings down the browser in seconds. - All hard evidence points to this crash being our single biggest source of crashes - It causes heap corruption which is a security problem and makes it impossible to accurately gather crash stats. I think shipping with placeholder as-is is out of the question! Disabling placeholders at this point is a no-brainer. The only question in my mind is what additional steps can be take to lower the impact. My suggestion is to either live with the current zoom/panning behavior or hiding flash during panning/zooming.
Placeholders off http://www.youtube.com/watch?v=PJDBQMZ_iDk Placeholders on http://www.youtube.com/watch?v=-nWODB94358
I relanded Benoit's original patch on m-c and aurora, and also applied to beta per blassey https://hg.mozilla.org/mozilla-central/rev/d39501066aa4 https://hg.mozilla.org/releases/mozilla-aurora/rev/19a31aa93143 https://hg.mozilla.org/releases/mozilla-beta/rev/03b65c7a3470
The latest crash on Nightly happened in 15.0a1/20120525 and on Aurora in 14.0a2/20120604. There are no crashes in 14.0b7.
Unable to reproduce the issue on Nightly 16.0a1 2012-06-17, Aurora 15.0a2 2012-06-17 or Firefox Mobile 14 beta 7 build 3 on HTC Desire running Android 2.2.2. Verifying the issue as fixed and removing qawaned.
Per comment 44 changing flags to verified