Closed
Bug 756253
Opened 11 years ago
Closed 11 years ago
Beta crashes @ _cairo_user_data_array_fini on kijiji.ca (desktop site) while zooming or panning
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox14 verified, firefox15- verified, firefox16 verified, fennec15+)
VERIFIED
FIXED
Firefox 16
People
(Reporter: matej, Assigned: BenWa)
References
()
Details
(Keywords: crash, productwanted, reproducible, Whiteboard: [Engagement][native-crash][gfx])
Crash Data
Attachments
(2 files)
65.59 KB,
text/plain
|
Details | |
1.05 KB,
patch
|
snorp
:
review+
blassey
:
approval-mozilla-aurora+
blassey
:
approval-mozilla-beta+
BenWa
:
checkin+
|
Details | Diff | Splinter Review |
Web page or screen you were on when you saw the issue: kijiji.ca Device: Droid Razr running Gingerbread Crash report ID: bp-56e54338-bbd5-44f6-88dc-a53922120517 The site looks fine when it loads, but the top right section (Daily Deals) takes a long time to show up and the horizontal area under "post a classified ad" and "all new ads" usually doesn't show up at all (though it did a couple of times). Any panning or zooming causes the browser to crash almost immediately. Sometimes clicking a link will load the new page, sometimes it will cause the browser to crash.
Reporter | ||
Comment 1•11 years ago
|
||
Also, on first visit, the site serves a location detection message in a yellow box at the top of the screen ("Welcome to Kijiji. We have matched your location to..."). On Fennec, the font is too big, the lines overlap and the message bleeds off the right side of the screen.
Comment 2•11 years ago
|
||
Crash looks like a dupe of bug 754056, which supposedly didn't affect aurora/beta. The font being too big is probably a font inflation issue.
Assignee | ||
Comment 3•11 years ago
|
||
There's both crashes from incorrect management of reusable surfaces. But the cause must be different.
Assignee: nobody → bgirard
tracking-firefox15:
--- → ?
Whiteboard: [Engagement] → [Engagement][crash]
Comment 4•11 years ago
|
||
It's bug 681114 with STR.
Severity: normal → critical
blocking-fennec1.0: --- → ?
Crash Signature: [@ _cairo_user_data_array_fini]
Keywords: crash,
reproducible
Hardware: All → ARM
Summary: Beta crashes on kijiji.ca (desktop site) → Beta crashes @ _cairo_user_data_array_fini on kijiji.ca (desktop site) while zooming or panning
Whiteboard: [Engagement][crash] → [Engagement][native-crash]
Version: Trunk → Firefox 14
Assignee | ||
Comment 5•11 years ago
|
||
I couldn't reproduce, but we know this crash is still happening. If you can confirm that it still happens and give more detailed steps that would be wonderful. In the meantime we want to increase the runtime checking and crash easier when we enter this bad state to find who's at fault. Going to post that patch in bug 681114.
Reporter | ||
Comment 6•11 years ago
|
||
Seems to only happen for me if I activate the plugin that displays the top side-scrolling promo area. Otherwise it appears fine.
Comment 7•11 years ago
|
||
Matej, can you provide the new crash ID with the try build in bug 681114 comment 17?
Comment 8•11 years ago
|
||
Matej I reproduced this and added the info Scoobidiver asked for.
Comment 10•11 years ago
|
||
This seems like it's probably going to be fixed by the fix to bug 681114.
blocking-fennec1.0: ? → +
Updated•11 years ago
|
Updated•11 years ago
|
Whiteboard: [Engagement][native-crash] → [Engagement][native-crash][gfx]
Assignee | ||
Comment 11•11 years ago
|
||
The problem if-and-only-if flash is loaded. It seems to crash most of the time for me and when it doesn't crash the horizontal flash ad is significantly out of sync when doing async zooming. While reproducing this I was able to reproduce 'JS_DHashTableEnumerate' another top crasher.
Assignee | ||
Comment 12•11 years ago
|
||
This crash seems to be if-and-only-if flash + Android 2.3.x or LOWER.
Assignee | ||
Comment 13•11 years ago
|
||
I've narrowed down the code to flash drawing. I'm still unsure if it's our support code or not yet. I hope to have this narrowed down significantly monday.
Assignee | ||
Comment 14•11 years ago
|
||
Right now I have to narrowed down further to PluginLayer.performUpdates(). Still progressing.
Assignee | ||
Comment 15•11 years ago
|
||
We tried a few theories unsuccessfully: - Click to play - Skia - Plugin Placeholder surface Ali has some Valgrind traces showing some serious heap issues he will post once the trace is completed.
Comment 16•11 years ago
|
||
This is the complete Valgrind log of a run of Fennec that included visiting kijiji.ca, clicking to play the Flash content, zooming/panning, and then crashing. The most concerning/suspicious parts of the log are the invalid writes (that is, writing to memory that has already been freed).
Comment 17•11 years ago
|
||
(In reply to Ali Juma [:ajuma] from comment #16) > Created attachment 630727 [details] > The most concerning/suspicious parts of the log are the invalid writes (that > is, writing to memory that has already been freed). Well, basically Fennec is hosed as soon as the first invalid read occurs, at android::Surface::getBufferIndex. By this point, the underlying bad event (refcounting problem?) that screws this up has already occurred. The invalid write and invalid free later on are merely follow-on effects. > at 0xAC7121E6: android::Surface::getBufferIndex(android::sp<android::GraphicBuffer> const&) const (Surface.cpp:1025) > by 0xAC71393B: android::Surface::unlockAndPost() (Surface.cpp:1011) > by 0xC279EC1: ??? (in /dev/ashmem) Do you need more frames than this? I have a crappy stack-scanning hack that might get you 3 or 4 frames after the ??? -- or it might not, no promises. You're welcome to try it if you want.
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Julian Seward from comment #17) > Do you need more frames than this? I have a crappy stack-scanning hack > that might get you 3 or 4 frames after the ??? -- or it might not, no > promises. You're welcome to try it if you want. That would certainly help. I've been looking into the android source code but if we can find if libxul is calling this (or flash) that would help.
Assignee | ||
Comment 19•11 years ago
|
||
Something relevant in the log: W/SurfaceComposerClient( 2762): Destroying surface while a transaction is open. Client 0x219358: destroying surface 142, mTransactionOpen=1
Comment 20•11 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #19) > Something relevant in the log: > W/SurfaceComposerClient( 2762): Destroying surface while a transaction is > open. Client 0x219358: destroying surface 142, mTransactionOpen=1 It sounds like the right thing to do here might be to just disable placeholders. This seemed to fix the problem in our testing.
Assignee | ||
Comment 21•11 years ago
|
||
Attachment #631032 -
Flags: review?(snorp)
Updated•11 years ago
|
Attachment #631032 -
Flags: review?(snorp) → review+
Assignee | ||
Comment 22•11 years ago
|
||
Comment on attachment 631032 [details] [diff] [review] patch [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 727116 User impact if declined: Crashes on 2.3 and lower when running Flash and zooming Testing completed (on m-c, etc.): Tested locally, will be on m-c soon. Risk to taking this patch (and alternatives if risky): mobile only, very low. Disabling a feature via a pref. String or UUID changes made by this patch: none.
Attachment #631032 -
Flags: approval-mozilla-beta?
Attachment #631032 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 23•11 years ago
|
||
Comment on attachment 631032 [details] [diff] [review] patch https://hg.mozilla.org/integration/mozilla-inbound/rev/a9455a51d543
Attachment #631032 -
Flags: checkin+
Assignee | ||
Updated•11 years ago
|
status-firefox14:
--- → affected
status-firefox15:
--- → affected
status-firefox16:
--- → affected
Target Milestone: --- → Firefox 16
Comment 24•11 years ago
|
||
Are there any performance implications disabling place holders?
Comment 25•11 years ago
|
||
Let's confirm that this fixes the crash from crash-stats. Product and QA to evaluate the trade off of crashiness vs. the flash zooming experience.
Keywords: productwanted,
qawanted
Comment 26•11 years ago
|
||
We should keep digging for a better solution as well, because the tradeoff here could be pretty ugly.
Comment 27•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a9455a51d543 (Merged by Ed Morley)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 28•11 years ago
|
||
reopening because we're not sure this fixes the crash (should have had a leave open in the whiteboard)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
|
Whiteboard: [Engagement][native-crash][gfx] → [Engagement][native-crash][gfx][leave open]
Updated•11 years ago
|
Attachment #631032 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 29•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/935dd46475ef
Comment 30•11 years ago
|
||
I've tried a few things to try to allow us to turn placeholders back on: 1) Leave the Surface lock acquired in AndroidJNI::getSurfaceBits(). This would prevent the plugin from locking the surface before we removed it, with the goal that it would not then attempt to unlock it after the surface was destroyed. This didn't work, but I can't really explain why. Might warrant more investigation. 2) Instead of removing the SurfaceView, which destroys the underlying Surface (causing our problem), I thought it might be nice if we could just hide the Surface. Conveniently, Surface has methods for doing this. Sadly, they all require a transaction to work, and only privileged processes are allowed to open a transaction. At this point I think we should go ahead with the current fix. I will continue to investigate options for getting this working again, but I don't think we'll be able to get anything in before the next build.
Comment 31•11 years ago
|
||
qawanted: take a look at stability report today
Comment 32•11 years ago
|
||
(In reply to Tony Chung [:tchung] from comment #31) > qawanted: take a look at stability report today The latest crash in Nightly happened in 15.0a1/20120525 long before the patch landed. The latest crash in Aurora happened in 14.0a2/20120604 before the patch landed.
Comment 33•11 years ago
|
||
Comment on attachment 631032 [details] [diff] [review] patch Let's back this out of mozilla-central and mozilla-aurora
Attachment #631032 -
Flags: approval-mozilla-beta?
Attachment #631032 -
Flags: approval-mozilla-beta-
Attachment #631032 -
Flags: approval-mozilla-aurora-
Attachment #631032 -
Flags: approval-mozilla-aurora+
Updated•11 years ago
|
tracking-fennec: --- → 15+
blocking-fennec1.0: + → soft
Comment 34•11 years ago
|
||
(In reply to Brad Lassey [:blassey] from comment #33) > Comment on attachment 631032 [details] [diff] [review] > patch > > Let's back this out of mozilla-central and mozilla-aurora Shouldn't we fix the problems first?
Assignee | ||
Comment 35•11 years ago
|
||
(In reply to Brad Lassey [:blassey] from comment #33) > Comment on attachment 631032 [details] [diff] [review] > patch > > Let's back this out of mozilla-central and mozilla-aurora I don't think turning placeholders back on is a good idea. Keep in mind that the crash we are seeing is a heap corruption which: 1) This is not the only signature caused by the bug. I've seen dozens of unique signature while debugging this problem. Judging how frequent this crash is based on this signature is wrong and misleading. This bug accounts for other crashes! 2) Heap corruption does not necessary lead to crash but can cause unpredictable behavior as well as corrupting data.
Comment 36•11 years ago
|
||
Backed out of mozilla-central and mozilla-aurora https://hg.mozilla.org/mozilla-central/rev/e9bf05c14376 https://hg.mozilla.org/releases/mozilla-aurora/rev/94e4994199c0
Comment 37•11 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #35) > 1) This is not the only signature caused by the bug. I've seen dozens of > unique signature while debugging this problem. Judging how frequent this > crash is based on this signature is wrong and misleading. This bug accounts > for other crashes! It's worth re-emphasizing that this may be responsible for crashes with other signatures and on other sites. For example, I just tried panning and zooming a flash video on cnn.com; I did this three times, and (rather easily) got the following three crashes: bp-79c61338-6e65-4358-bf2c-f08722120612 bp-33617e48-e514-4847-91a6-323b92120612 bp-2ec951d2-5cbf-426c-bbe9-d954d2120612 Importantly, note that each of these crashes has a different signature.
Comment 38•11 years ago
|
||
I agree that this could be really bad, fwiw. I hope to have a real fix some time today, however.
Updated•11 years ago
|
Assignee | ||
Comment 39•11 years ago
|
||
Re-nominating as a *hard* blocker because of concerns in Comment 35 and 37. Here's some video of the current behavior: Try 1: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.55.42.mp4 Try 2: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.55.21.mp4 Try 3: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.54.32.mp4 Note that when placeholders run long enough to not crash right away the visual behavior is equivalent to not having placeholders. I was not able to demonstrate this in all my attempts because the heap corruptions is to severe and brings down the browser in seconds. - All hard evidence points to this crash being our single biggest source of crashes - It causes heap corruption which is a security problem and makes it impossible to accurately gather crash stats. I think shipping with placeholder as-is is out of the question! Disabling placeholders at this point is a no-brainer. The only question in my mind is what additional steps can be take to lower the impact. My suggestion is to either live with the current zoom/panning behavior or hiding flash during panning/zooming.
blocking-fennec1.0: soft → ?
Comment 40•11 years ago
|
||
Placeholders off http://www.youtube.com/watch?v=PJDBQMZ_iDk Placeholders on http://www.youtube.com/watch?v=-nWODB94358
Updated•11 years ago
|
Attachment #631032 -
Flags: approval-mozilla-beta-
Attachment #631032 -
Flags: approval-mozilla-beta+
Attachment #631032 -
Flags: approval-mozilla-aurora-
Attachment #631032 -
Flags: approval-mozilla-aurora+
Comment 41•11 years ago
|
||
I relanded Benoit's original patch on m-c and aurora, and also applied to beta per blassey https://hg.mozilla.org/mozilla-central/rev/d39501066aa4 https://hg.mozilla.org/releases/mozilla-aurora/rev/19a31aa93143 https://hg.mozilla.org/releases/mozilla-beta/rev/03b65c7a3470
Updated•11 years ago
|
Comment 42•11 years ago
|
||
The latest crash on Nightly happened in 15.0a1/20120525 and on Aurora in 14.0a2/20120604. There are no crashes in 14.0b7.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Whiteboard: [Engagement][native-crash][gfx][leave open] → [Engagement][native-crash][gfx]
Comment 44•11 years ago
|
||
Unable to reproduce the issue on Nightly 16.0a1 2012-06-17, Aurora 15.0a2 2012-06-17 or Firefox Mobile 14 beta 7 build 3 on HTC Desire running Android 2.2.2. Verifying the issue as fixed and removing qawaned.
Status: RESOLVED → VERIFIED
Keywords: qawanted
Comment 46•11 years ago
|
||
Per comment 44 changing flags to verified
Updated•2 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•