756253 - Beta crashes @ _cairo_user_data_array_fini on kijiji.ca (desktop site) while zooming or panning

Reporter

Description

•

13 years ago

Web page or screen you were on when you saw the issue: kijiji.ca Device: Droid Razr running Gingerbread Crash report ID: bp-56e54338-bbd5-44f6-88dc-a53922120517 The site looks fine when it loads, but the top right section (Daily Deals) takes a long time to show up and the horizontal area under "post a classified ad" and "all new ads" usually doesn't show up at all (though it did a couple of times). Any panning or zooming causes the browser to crash almost immediately. Sometimes clicking a link will load the new page, sometimes it will cause the browser to crash.

Matej Novak [:matej]

Reporter

Comment 1

•

13 years ago

Also, on first visit, the site serves a location detection message in a yellow box at the top of the screen ("Welcome to Kijiji. We have matched your location to..."). On Fennec, the font is too big, the lines overlap and the message bleeds off the right side of the screen.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 2

•

13 years ago

Crash looks like a dupe of bug 754056, which supposedly didn't affect aurora/beta. The font being too big is probably a font inflation issue.

Benoit Girard (:BenWa)

Assignee

Comment 3

•

13 years ago

There's both crashes from incorrect management of reusable surfaces. But the cause must be different.

Assignee: nobody → bgirard

tracking-firefox15: --- → ?

Whiteboard: [Engagement] → [Engagement][crash]

Scoobidiver (away)

Comment 4

•

13 years ago

It's bug 681114 with STR.

Severity: normal → critical

blocking-fennec1.0: --- → ?

Crash Signature: [@ _cairo_user_data_array_fini]

Keywords: crash, reproducible

Hardware: All → ARM

Summary: Beta crashes on kijiji.ca (desktop site) → Beta crashes @ _cairo_user_data_array_fini on kijiji.ca (desktop site) while zooming or panning

Whiteboard: [Engagement][crash] → [Engagement][native-crash]

Version: Trunk → Firefox 14

Benoit Girard (:BenWa)

Assignee

Updated

•

13 years ago

Blocks: 681114

Benoit Girard (:BenWa)

Assignee

Comment 5

•

13 years ago

I couldn't reproduce, but we know this crash is still happening. If you can confirm that it still happens and give more detailed steps that would be wonderful. In the meantime we want to increase the runtime checking and crash easier when we enter this bad state to find who's at fault. Going to post that patch in bug 681114.

Matej Novak [:matej]

Reporter

Comment 6

•

13 years ago

Seems to only happen for me if I activate the plugin that displays the top side-scrolling promo area. Otherwise it appears fine.

Johnathan Nightingale [:johnath]

Updated

•

13 years ago

Blocks: 756556

Scoobidiver (away)

Updated

•

13 years ago

No longer blocks: 756556

Benoit Girard (:BenWa)

Assignee

Updated

•

13 years ago

No longer blocks: 681114

Benoit Girard (:BenWa)

Assignee

Updated

•

13 years ago

Blocks: 681114

Mark Finkle (:mfinkle) (use needinfo?)

Updated

•

13 years ago

Keywords: qawanted

Scoobidiver (away)

Comment 7

•

13 years ago

Matej, can you provide the new crash ID with the try build in bug 681114 comment 17?

Kevin Brosnan [Ex-Mozilla]

Comment 8

•

13 years ago

Matej I reproduced this and added the info Scoobidiver asked for.

Kevin Brosnan [Ex-Mozilla]

Comment 9

•

13 years ago

Crashes a current nightly as well on the Droid Razr.

Keywords: qawanted

Joe Drew (not getting mail)

Comment 10

•

13 years ago

This seems like it's probably going to be fixed by the fix to bug 681114.

blocking-fennec1.0: ? → +

Alex Keybl [:akeybl]

Updated

•

13 years ago

tracking-firefox15: ? → -

Erin Lancaster [:elan]

Updated

•

13 years ago

Whiteboard: [Engagement][native-crash] → [Engagement][native-crash][gfx]

Benoit Girard (:BenWa)

Assignee

Updated

•

13 years ago

Blocks: 759676

Benoit Girard (:BenWa)

Assignee

Comment 11

•

13 years ago

The problem if-and-only-if flash is loaded. It seems to crash most of the time for me and when it doesn't crash the horizontal flash ad is significantly out of sync when doing async zooming. While reproducing this I was able to reproduce 'JS_DHashTableEnumerate' another top crasher.

Benoit Girard (:BenWa)

Assignee

Comment 12

•

13 years ago

This crash seems to be if-and-only-if flash + Android 2.3.x or LOWER.

Benoit Girard (:BenWa)

Assignee

Comment 13

•

13 years ago

I've narrowed down the code to flash drawing. I'm still unsure if it's our support code or not yet. I hope to have this narrowed down significantly monday.

Benoit Girard (:BenWa)

Assignee

Comment 14

•

13 years ago

Right now I have to narrowed down further to PluginLayer.performUpdates(). Still progressing.

Benoit Girard (:BenWa)

Assignee

Comment 15

•

13 years ago

We tried a few theories unsuccessfully: - Click to play - Skia - Plugin Placeholder surface Ali has some Valgrind traces showing some serious heap issues he will post once the trace is completed.

Ali Juma [:ajuma]

Comment 16

•

13 years ago

Attached file Valgrind log — Details

This is the complete Valgrind log of a run of Fennec that included visiting kijiji.ca, clicking to play the Flash content, zooming/panning, and then crashing. The most concerning/suspicious parts of the log are the invalid writes (that is, writing to memory that has already been freed).

Julian Seward [:jseward]

Comment 17

•

13 years ago

(In reply to Ali Juma [:ajuma] from comment #16) > Created attachment 630727 [details] > The most concerning/suspicious parts of the log are the invalid writes (that > is, writing to memory that has already been freed). Well, basically Fennec is hosed as soon as the first invalid read occurs, at android::Surface::getBufferIndex. By this point, the underlying bad event (refcounting problem?) that screws this up has already occurred. The invalid write and invalid free later on are merely follow-on effects. > at 0xAC7121E6: android::Surface::getBufferIndex(android::sp<android::GraphicBuffer> const&) const (Surface.cpp:1025) > by 0xAC71393B: android::Surface::unlockAndPost() (Surface.cpp:1011) > by 0xC279EC1: ??? (in /dev/ashmem) Do you need more frames than this? I have a crappy stack-scanning hack that might get you 3 or 4 frames after the ??? -- or it might not, no promises. You're welcome to try it if you want.

Benoit Girard (:BenWa)

Assignee

Comment 18

•

13 years ago

(In reply to Julian Seward from comment #17) > Do you need more frames than this? I have a crappy stack-scanning hack > that might get you 3 or 4 frames after the ??? -- or it might not, no > promises. You're welcome to try it if you want. That would certainly help. I've been looking into the android source code but if we can find if libxul is calling this (or flash) that would help.

Benoit Girard (:BenWa)

Assignee

Comment 19

•

13 years ago

Something relevant in the log: W/SurfaceComposerClient( 2762): Destroying surface while a transaction is open. Client 0x219358: destroying surface 142, mTransactionOpen=1

Jeff Muizelaar [:jrmuizel]

Comment 20

•

13 years ago

(In reply to Benoit Girard (:BenWa) from comment #19) > Something relevant in the log: > W/SurfaceComposerClient( 2762): Destroying surface while a transaction is > open. Client 0x219358: destroying surface 142, mTransactionOpen=1 It sounds like the right thing to do here might be to just disable placeholders. This seemed to fix the problem in our testing.

Benoit Girard (:BenWa)

Assignee

Comment 21

•

13 years ago

Attached patch patch — Details — Splinter Review

Attachment #631032 - Flags: review?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Updated

•

13 years ago

Attachment #631032 - Flags: review?(snorp) → review+

Benoit Girard (:BenWa)

Assignee

Comment 22

•

13 years ago

Comment on attachment 631032 [details] [diff] [review] patch [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 727116 User impact if declined: Crashes on 2.3 and lower when running Flash and zooming Testing completed (on m-c, etc.): Tested locally, will be on m-c soon. Risk to taking this patch (and alternatives if risky): mobile only, very low. Disabling a feature via a pref. String or UUID changes made by this patch: none.

Attachment #631032 - Flags: approval-mozilla-beta?

Attachment #631032 - Flags: approval-mozilla-aurora?

Benoit Girard (:BenWa)

Assignee

Comment 23

•

13 years ago

Comment on attachment 631032 [details] [diff] [review] patch https://hg.mozilla.org/integration/mozilla-inbound/rev/a9455a51d543

Attachment #631032 - Flags: checkin+

Benoit Girard (:BenWa)

Assignee

Updated

•

13 years ago

status-firefox14: --- → affected

status-firefox15: --- → affected

status-firefox16: --- → affected

Target Milestone: --- → Firefox 16

JP Rosevear [:jpr]

Comment 24

•

13 years ago

Are there any performance implications disabling place holders?

Brad Lassey [:blassey] (use needinfo?)

Comment 25

•

13 years ago

Let's confirm that this fixes the crash from crash-stats. Product and QA to evaluate the trade off of crashiness vs. the flash zooming experience.

Keywords: productwanted, qawanted

JP Rosevear [:jpr]

Comment 26

•

13 years ago

We should keep digging for a better solution as well, because the tradeoff here could be pretty ugly.

Graeme McCutcheon [:graememcc]

Comment 27

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/a9455a51d543 (Merged by Ed Morley)

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Brad Lassey [:blassey] (use needinfo?)

Comment 28

•

13 years ago

reopening because we're not sure this fixes the crash (should have had a leave open in the whiteboard)

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Joe Drew (not getting mail)

Updated

•

13 years ago

Whiteboard: [Engagement][native-crash][gfx] → [Engagement][native-crash][gfx][leave open]

Brad Lassey [:blassey] (use needinfo?)

Updated

•

13 years ago

Attachment #631032 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Benoit Girard (:BenWa)

Assignee

Comment 29

•

13 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/935dd46475ef

status-firefox15: affected → fixed

status-firefox16: affected → fixed

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 30

•

13 years ago

I've tried a few things to try to allow us to turn placeholders back on: 1) Leave the Surface lock acquired in AndroidJNI::getSurfaceBits(). This would prevent the plugin from locking the surface before we removed it, with the goal that it would not then attempt to unlock it after the surface was destroyed. This didn't work, but I can't really explain why. Might warrant more investigation. 2) Instead of removing the SurfaceView, which destroys the underlying Surface (causing our problem), I thought it might be nice if we could just hide the Surface. Conveniently, Surface has methods for doing this. Sadly, they all require a transaction to work, and only privileged processes are allowed to open a transaction. At this point I think we should go ahead with the current fix. I will continue to investigate options for getting this working again, but I don't think we'll be able to get anything in before the next build.

Tony Chung [:tchung]

Comment 31

•

13 years ago

qawanted: take a look at stability report today

Scoobidiver (away)

Comment 32

•

13 years ago

(In reply to Tony Chung [:tchung] from comment #31) > qawanted: take a look at stability report today The latest crash in Nightly happened in 15.0a1/20120525 long before the patch landed. The latest crash in Aurora happened in 14.0a2/20120604 before the patch landed.

Brad Lassey [:blassey] (use needinfo?)

Comment 33

•

13 years ago

Comment on attachment 631032 [details] [diff] [review] patch Let's back this out of mozilla-central and mozilla-aurora

Attachment #631032 - Flags: approval-mozilla-beta?

Attachment #631032 - Flags: approval-mozilla-beta-

Attachment #631032 - Flags: approval-mozilla-aurora-

Attachment #631032 - Flags: approval-mozilla-aurora+

Brad Lassey [:blassey] (use needinfo?)

Updated

•

13 years ago

tracking-fennec: --- → 15+

blocking-fennec1.0: + → soft

Jeff Muizelaar [:jrmuizel]

Comment 34

•

13 years ago

(In reply to Brad Lassey [:blassey] from comment #33) > Comment on attachment 631032 [details] [diff] [review] > patch > > Let's back this out of mozilla-central and mozilla-aurora Shouldn't we fix the problems first?

Benoit Girard (:BenWa)

Assignee

Comment 35

•

13 years ago

(In reply to Brad Lassey [:blassey] from comment #33) > Comment on attachment 631032 [details] [diff] [review] > patch > > Let's back this out of mozilla-central and mozilla-aurora I don't think turning placeholders back on is a good idea. Keep in mind that the crash we are seeing is a heap corruption which: 1) This is not the only signature caused by the bug. I've seen dozens of unique signature while debugging this problem. Judging how frequent this crash is based on this signature is wrong and misleading. This bug accounts for other crashes! 2) Heap corruption does not necessary lead to crash but can cause unpredictable behavior as well as corrupting data.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 36

•

13 years ago

Backed out of mozilla-central and mozilla-aurora https://hg.mozilla.org/mozilla-central/rev/e9bf05c14376 https://hg.mozilla.org/releases/mozilla-aurora/rev/94e4994199c0

Ali Juma [:ajuma]

Comment 37

•

13 years ago

(In reply to Benoit Girard (:BenWa) from comment #35) > 1) This is not the only signature caused by the bug. I've seen dozens of > unique signature while debugging this problem. Judging how frequent this > crash is based on this signature is wrong and misleading. This bug accounts > for other crashes! It's worth re-emphasizing that this may be responsible for crashes with other signatures and on other sites. For example, I just tried panning and zooming a flash video on cnn.com; I did this three times, and (rather easily) got the following three crashes: bp-79c61338-6e65-4358-bf2c-f08722120612 bp-33617e48-e514-4847-91a6-323b92120612 bp-2ec951d2-5cbf-426c-bbe9-d954d2120612 Importantly, note that each of these crashes has a different signature.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 38

•

13 years ago

I agree that this could be really bad, fwiw. I hope to have a real fix some time today, however.

Scoobidiver (away)

Updated

•

13 years ago

status-firefox15: fixed → affected

status-firefox16: fixed → affected

Benoit Girard (:BenWa)

Assignee

Comment 39

•

13 years ago

Re-nominating as a *hard* blocker because of concerns in Comment 35 and 37. Here's some video of the current behavior: Try 1: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.55.42.mp4 Try 2: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.55.21.mp4 Try 3: https://dl.dropbox.com/u/10523664/bug_756253/2012-06-12%2013.54.32.mp4 Note that when placeholders run long enough to not crash right away the visual behavior is equivalent to not having placeholders. I was not able to demonstrate this in all my attempts because the heap corruptions is to severe and brings down the browser in seconds. - All hard evidence points to this crash being our single biggest source of crashes - It causes heap corruption which is a security problem and makes it impossible to accurately gather crash stats. I think shipping with placeholder as-is is out of the question! Disabling placeholders at this point is a no-brainer. The only question in my mind is what additional steps can be take to lower the impact. My suggestion is to either live with the current zoom/panning behavior or hiding flash during panning/zooming.

blocking-fennec1.0: soft → ?

Kevin Brosnan [Ex-Mozilla]

Comment 40

•

13 years ago

Placeholders off http://www.youtube.com/watch?v=PJDBQMZ_iDk Placeholders on http://www.youtube.com/watch?v=-nWODB94358

Brad Lassey [:blassey] (use needinfo?)

Updated

•

13 years ago

Attachment #631032 - Flags: approval-mozilla-beta-

Attachment #631032 - Flags: approval-mozilla-beta+

Attachment #631032 - Flags: approval-mozilla-aurora-

Attachment #631032 - Flags: approval-mozilla-aurora+

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 41

•

13 years ago

I relanded Benoit's original patch on m-c and aurora, and also applied to beta per blassey https://hg.mozilla.org/mozilla-central/rev/d39501066aa4 https://hg.mozilla.org/releases/mozilla-aurora/rev/19a31aa93143 https://hg.mozilla.org/releases/mozilla-beta/rev/03b65c7a3470

blocking-fennec1.0: ? → ---

status-firefox15: affected → fixed

status-firefox16: affected → fixed

Scoobidiver (away)

Updated

•

13 years ago

status-firefox14: affected → fixed

Scoobidiver (away)

Comment 42

•

13 years ago

The latest crash on Nightly happened in 15.0a1/20120525 and on Aurora in 14.0a2/20120604. There are no crashes in 14.0b7.

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

Whiteboard: [Engagement][native-crash][gfx][leave open] → [Engagement][native-crash][gfx]

Scoobidiver (away)

Updated

•

13 years ago

No longer blocks: 681114

Andrew McCreight [:mccr8]

Updated

•

13 years ago

Blocks: 759675

Adrian Tamas (:AdrianT)

Comment 44

•

13 years ago

Unable to reproduce the issue on Nightly 16.0a1 2012-06-17, Aurora 15.0a2 2012-06-17 or Firefox Mobile 14 beta 7 build 3 on HTC Desire running Android 2.2.2. Verifying the issue as fixed and removing qawaned.

Status: RESOLVED → VERIFIED

Keywords: qawanted

Adrian Tamas (:AdrianT)

Comment 46

•

13 years ago

Per comment 44 changing flags to verified

status-firefox14: fixed → verified

status-firefox15: fixed → verified

status-firefox16: fixed → verified

Nobody; OK to take it and work on it

Updated

•

5 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Valgrind log 13 years ago Ali Juma [:ajuma] 65.59 KB, text/plain		Details
patch 13 years ago Benoit Girard (:BenWa) 1.05 KB, patch	snorp : review+ blassey : approval-mozilla-aurora+ blassey : approval-mozilla-beta+ BenWa : checkin+	Details \| Diff \| Splinter Review