<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Chris Peterson [:cpeterson]

Comment 1

•

11 years ago

This is a top crash on Firefox for Android Nightly.

tracking-fennec: --- → ?

Flags: needinfo?(cpeterson)

Keywords: topcrash-android-armv7

Comment 2

•

11 years ago

Jon: do you know of any recent changes to incremental GC that might particularly affect Android?

Flags: needinfo?(cpeterson) → needinfo?(jcoppeard)

Brad Lassey [:blassey] (use needinfo?)

Comment 3

•

11 years ago

The crashes spiked with the build on 20140505 so I looked for things that landed before then. But TBH nothing jumps out. Most of out changes have been generational or exact rooting, I don't think we've done anything recently that affects the original conservative scanner / incremental GC. Looking at the crash addresses is interesting because most of them are inside what would be the mark bitmap for a chunk: ~60% 0x5a5fea08 - this looks like trying to mark the address 0x5a5a5a00 ~10% 0xfc0a0 - trying to mark a null pointer 0x5a5a5a5a is the pattern jemalloc uses to poison freed memory so it looks like use after free or incorrect tracing code in gecko somewhere.

Flags: needinfo?(jcoppeard)

Updated

•

11 years ago

tracking-fennec: ? → 32+

tracking-firefox32: --- → ?

Flags: needinfo?(nihsanullah)

Keywords: regression

Benjamin Kerensa [:bkerensa]

Updated

•

11 years ago

tracking-firefox32: ? → +

Naveed Ihsanullah [:naveed]

Comment 4

•

11 years ago

Terrence please find an owner for this

Flags: needinfo?(nihsanullah) → needinfo?(terrence)

Assignee

Comment 5

•

11 years ago

There appears to be nothing further we can do here without STR.

Flags: needinfo?(terrence)

Aaron Train [:aaronmt]

Updated

•

11 years ago

Keywords: steps-wanted

Benjamin Smedberg

Comment 6

•

11 years ago

What can we do to make it more reproducible? Are there low-overhead logging mechanisms, or intermediate checks for poison values in the tracer that would move the crash closer to the actual bad code?

Assignee

Updated

•

11 years ago

Blocks: GC.stability

Brad Lassey [:blassey] (use needinfo?)

Assignee

Comment 7

•

11 years ago

Attempts to mark a small or invalid pointer directly will already assert when inserted into the mark stack in nightly builds. This indicates that the corruption is internal to the object and therefore likely mis-usage by the JSAPI consumer. We can potentially also check for such corruption at insertion time, however, there are some extremely subtle issues, unfortunately, with blindly checking the internals of an object. I'm working on isolating a safe subset under bug 994253, but this still requires a substantial amount of work.

Comment 8

•

11 years ago

(In reply to Terrence Cole [:terrence] from comment #5) > There appears to be nothing further we can do here without STR. This crash has spiked in 32, can you have a look at what's landed in that time frame?

Flags: needinfo?(terrence)

Comment 9

•

11 years ago

loading http://www.bmwusa.com reliably crashes nightly with this or the scanshape signature.

Assignee

Comment 10

•

11 years ago

(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #8) > (In reply to Terrence Cole [:terrence] from comment #5) > > There appears to be nothing further we can do here without STR. > > This crash has spiked in 32, can you have a look at what's landed in that > time frame? About 200 patches land each day. Even if someone could pay attention for long enough to read and understand the ~6000 checkins, the actual mistake is likely not something that's going to be reflected there. (In reply to Kevin Brosnan [:kbrosnan] from comment #9) > loading http://www.bmwusa.com reliably crashes nightly with this or the > scanshape signature. This, on the other hand, is quite actionable. I'll try to take a look as soon as I get back home, but I'm traveling until late tonight.

Flags: needinfo?(terrence)

Jason Orendorff [:jorendorff]

Comment 11

•

11 years ago

I haven't been able to reproduce this on Android with the BMW site and latest nightly.

Comment 12

•

11 years ago

(In reply to Kevin Brosnan [:kbrosnan] from comment #9) > loading http://www.bmwusa.com reliably crashes nightly with this or the > scanshape signature. !!!!!! Kevin, is this still the case for you? If so, **PLEASE** grab the recording HTTP proxy of your choice (http://mitmproxy.org/ is one -- it's a pain to get installed, though, and there are other choices) and record the site exactly as seen by the device. If this holds up, it's almost guaranteed to turn into a bugfix, possibly a substantial stability win, in a component with lots of unreproducible crashes. Therefore shoring up the STR here is a huge deal for us.

Flags: needinfo?(kbrosnan)

Comment 13

•

11 years ago

Looks like they are detecting Firefox for Android as a mobile browser now. Previously I was crashing on their desktop site. Using the request desktop site or the Phony extension set to "Desktop Firefox" still crashes.

Flags: needinfo?(kbrosnan)

Comment 14

•

11 years ago

(In reply to Kevin Brosnan [:kbrosnan] from comment #13) Thanks, I've reproduced this now.

Comment 16

•

11 years ago

Bisection shows that this started with changeset accdf191ac4e, part of the fix for bug 969012.

Assignee

Comment 17

•

11 years ago

Attached patch suppress_large_callobj_on_arm-v0.diff — Details — Splinter Review

Great work bisecting this, Jon! The patch in question actually had quite a bit of trouble landing on ARM because it was running into a codegen bug. I rewrote it completely, but it seems maybe that just hid the underlying problem. For now let's just suppress all of the new codepaths on ARM so we can debug at leisure. The attached patch passes all jsapi and jit tests in the ARM simulator and it should be safe and easy to uplift as needed, assuming, of course, that it actually works around the issue successfully. I've put up a try build so that someone that can reproduce can test this out easily: https://tbpl.mozilla.org/?tree=Try&rev=8a4b6dc55317

Assignee: nobody → terrence

Status: NEW → ASSIGNED

Attachment #8437121 - Flags: review?(sphink)

Updated

•

11 years ago

status-firefox31: --- → unaffected

status-firefox32: --- → affected

status-firefox33: --- → affected

Steve Fink [:sfink] [:s:]

Comment 18

•

11 years ago

Can confirm that the patch stops the crashing on bmwusa.com and loading several of the less consistent crashing urls obtained from crash stats.

Updated

•

11 years ago

Attachment #8437121 - Flags: review?(sphink) → review+

https://hg.mozilla.org/integration/mozilla-inbound/rev/37fdae0f24d2

Assignee

Comment 19

•

11 years ago

Ryan VanderMeulen [:RyanVM]

Assignee

Comment 20

•

11 years ago

We still need to find the root cause.

Keywords: leave-open

Comment 21

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/37fdae0f24d2

Marty Rosenberg [:mjrosenb]

Comment 22

•

11 years ago

Attached patch ClobberinTime-r0.patch — Details — Splinter Review

ok, so this is almost certainly *not* the patch that we want, but after a bit of digging, the issue seems to be that on x86 and x64, CallTempReg0 is volatile, but on arm, it is not. we probably just want to choose a new register that is volatile on all platforms. Bonus points if you choose something that is either r0 or r1 on arm, since those are arg0 and arg1, and r0 is also the return value (it'll save a bunch of moves, if nothing else.)

Marty Rosenberg [:mjrosenb]

Updated

•

11 years ago

Attachment #8437527 - Flags: review-

Attachment #8437527 - Flags: feedback?(terrence)

Chris Peterson [:cpeterson]

Comment 23

•

11 years ago

Attached patch bug1011474-fix — Details — Splinter Review

As discussed on IRC, use one of the volatile registers that we save rather than trashing CallTemp1. I verified that this does fix the crash.

Attachment #8437692 - Flags: review?(mrosenberg)

Comment 24

•

11 years ago

We should try to uplift this crash fix to Aurora 32 because it will ship with B2G 2.0. AFAIK, B2G 1.4 is shipping Gecko 30, which is unaffected.

Marty Rosenberg [:mjrosenb]

Comment 25

•

11 years ago

Comment on attachment 8437692 [details] [diff] [review] bug1011474-fix Review of attachment 8437692 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/CodeGenerator.cpp @@ +5452,5 @@ > masm.PushRegsInMask(regs); > > + const Register regTemp = regs.takeGeneral(); > + const Register regRuntime = regTemp; > + regs.add(regTemp); Sadly, I don't think there's a way of getting around this dance.

Attachment #8437692 - Flags: review?(mrosenberg) → review+

Assignee

Comment 26

•

11 years ago

Comment on attachment 8437527 [details] [diff] [review] ClobberinTime-r0.patch Review of attachment 8437527 [details] [diff] [review]: ----------------------------------------------------------------- Looks like Jon beat me to it. :-)

Attachment #8437527 - Flags: feedback?(terrence)

https://hg.mozilla.org/mozilla-central/rev/d1af495b17a7

Assignee

Comment 27

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d1af495b17a7 Let's get this on inbound asap.

Ed Morley [:emorley]

Comment 28

•

11 years ago

Assignee

Comment 29

•

11 years ago

Comment on attachment 8437692 [details] [diff] [review] bug1011474-fix [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 969012 User impact if declined: Increased crashes in b2g and fennec. Testing completed (on m-c, etc.): m-i for a day, now on m-c. Risk to taking this patch (and alternatives if risky): low. String or IDL/UUID changes made by this patch: none.

Attachment #8437692 - Flags: approval-mozilla-aurora?

Markus Popp

Comment 30

•

11 years ago

I got this crash while browsing the new Google Maps: https://crash-stats.mozilla.com/report/index/0165bb83-0f79-4ae4-b768-c3b2a2140612

Assignee

Comment 31

•

11 years ago

(In reply to Markus Popp from comment #30) > I got this crash while browsing the new Google Maps: > > https://crash-stats.mozilla.com/report/index/0165bb83-0f79-4ae4-b768- > c3b2a2140612 Thanks for the crash report! The fix here is only relevant for ARM, so it looks like we have a similar crash affecting linux amd64 that we need to look into too.

Markus Popp

Comment 32

•

11 years ago

(In reply to Terrence Cole [:terrence] from comment #31) > Thanks for the crash report! The fix here is only relevant for ARM, so it > looks like we have a similar crash affecting linux amd64 that we need to > look into too. You're welcome! Firefox crashes quite frequently when accessing the new Google Maps. While I think these other crashes are probably unrelated to this one, I reported them as Bug 1022987. Unlike this crash (1011474) which occurred in the middle of browsing the new Google Maps, those of Bug 1022987 happen when first accessing Google Maps.

Lawrence Mandel [:lmandel] (use needinfo)

Assignee

Comment 33

•

11 years ago

(In reply to Markus Popp from comment #32) > (In reply to Terrence Cole [:terrence] from comment #31) > > Thanks for the crash report! The fix here is only relevant for ARM, so it > > looks like we have a similar crash affecting linux amd64 that we need to > > look into too. > > You're welcome! > > Firefox crashes quite frequently when accessing the new Google Maps. While I > think these other crashes are probably unrelated to this one, I reported > them as Bug 1022987. Unlike this crash (1011474) which occurred in the > middle of browsing the new Google Maps, those of Bug 1022987 happen when > first accessing Google Maps. You are correct, the crash in comment 30 is definitely different from the ones in bug 1022987. I've moved it to Core::Graphics, which should hopefully get some eyes on it. At a guess, the linux radeon driver is probably at fault there, but I'm not at all familiar with our webgl code though.

Comment 34

•

11 years ago

Comment on attachment 8437692 [details] [diff] [review] bug1011474-fix Aurora approval granted. Please verify that this fixes the crash on Aurora as well.

Attachment #8437692 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 35

•

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/7df8944f610b Resolving per IRC discussion with Jon. Please file a new bug for any remaining crashes as it makes tracking much easier than piling multiple different issues into one bug :)

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

status-firefox32: affected → fixed

status-firefox33: affected → fixed

Keywords: leave-open

Resolution: --- → FIXED

Target Milestone: --- → mozilla33

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

status-b2g-v2.0: --- → fixed

status-b2g-v2.1: --- → fixed