Closed
Bug 1011474
Opened 11 years ago
Closed 11 years ago
crash in js::GCMarker::drainMarkStack(js::SliceBudget&)
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
RESOLVED
FIXED
mozilla33
People
(Reporter: gcp, Assigned: terrence)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, topcrash-android-armv7)
Crash Data
Attachments
(3 files)
1.37 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
4.43 KB,
patch
|
mjrosenb
:
review-
|
Details | Diff | Splinter Review |
3.00 KB,
patch
|
mjrosenb
:
review+
lmandel
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-f2596b8a-6d49-4906-b0fd-628c42140516.
=============================================================
This signature exists for a while on desktop, but it's now a (as far as I can see, new) topcrasher on Android. Was reported by some friends that use Fennec as well, though without specific STR.
Updated•11 years ago
|
Component: General → JavaScript: GC
Product: Firefox for Android → Core
Comment 1•11 years ago
|
||
This is a top crash on Firefox for Android Nightly.
Comment 2•11 years ago
|
||
Jon: do you know of any recent changes to incremental GC that might particularly affect Android?
Flags: needinfo?(cpeterson) → needinfo?(jcoppeard)
Comment 3•11 years ago
|
||
The crashes spiked with the build on 20140505 so I looked for things that landed before then. But TBH nothing jumps out. Most of out changes have been generational or exact rooting, I don't think we've done anything recently that affects the original conservative scanner / incremental GC.
Looking at the crash addresses is interesting because most of them are inside what would be the mark bitmap for a chunk:
~60% 0x5a5fea08 - this looks like trying to mark the address 0x5a5a5a00
~10% 0xfc0a0 - trying to mark a null pointer
0x5a5a5a5a is the pattern jemalloc uses to poison freed memory so it looks like use after free or incorrect tracing code in gecko somewhere.
Flags: needinfo?(jcoppeard)
Updated•11 years ago
|
tracking-fennec: ? → 32+
tracking-firefox32:
--- → ?
Flags: needinfo?(nihsanullah)
Keywords: regression
Updated•11 years ago
|
Comment 4•11 years ago
|
||
Terrence please find an owner for this
Flags: needinfo?(nihsanullah) → needinfo?(terrence)
Assignee | ||
Comment 5•11 years ago
|
||
There appears to be nothing further we can do here without STR.
Flags: needinfo?(terrence)
Updated•11 years ago
|
Keywords: steps-wanted
Comment 6•11 years ago
|
||
What can we do to make it more reproducible? Are there low-overhead logging mechanisms, or intermediate checks for poison values in the tracer that would move the crash closer to the actual bad code?
Assignee | ||
Updated•11 years ago
|
Blocks: GC.stability
Assignee | ||
Comment 7•11 years ago
|
||
Attempts to mark a small or invalid pointer directly will already assert when inserted into the mark stack in nightly builds. This indicates that the corruption is internal to the object and therefore likely mis-usage by the JSAPI consumer.
We can potentially also check for such corruption at insertion time, however, there are some extremely subtle issues, unfortunately, with blindly checking the internals of an object. I'm working on isolating a safe subset under bug 994253, but this still requires a substantial amount of work.
Comment 8•11 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #5)
> There appears to be nothing further we can do here without STR.
This crash has spiked in 32, can you have a look at what's landed in that time frame?
Flags: needinfo?(terrence)
Comment 9•11 years ago
|
||
loading http://www.bmwusa.com reliably crashes nightly with this or the scanshape signature.
Assignee | ||
Comment 10•11 years ago
|
||
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #8)
> (In reply to Terrence Cole [:terrence] from comment #5)
> > There appears to be nothing further we can do here without STR.
>
> This crash has spiked in 32, can you have a look at what's landed in that
> time frame?
About 200 patches land each day. Even if someone could pay attention for long enough to read and understand the ~6000 checkins, the actual mistake is likely not something that's going to be reflected there.
(In reply to Kevin Brosnan [:kbrosnan] from comment #9)
> loading http://www.bmwusa.com reliably crashes nightly with this or the
> scanshape signature.
This, on the other hand, is quite actionable. I'll try to take a look as soon as I get back home, but I'm traveling until late tonight.
Flags: needinfo?(terrence)
Comment 11•11 years ago
|
||
I haven't been able to reproduce this on Android with the BMW site and latest nightly.
Comment 12•11 years ago
|
||
(In reply to Kevin Brosnan [:kbrosnan] from comment #9)
> loading http://www.bmwusa.com reliably crashes nightly with this or the
> scanshape signature.
!!!!!!
Kevin, is this still the case for you?
If so, **PLEASE** grab the recording HTTP proxy of your choice (http://mitmproxy.org/ is one -- it's a pain to get installed, though, and there are other choices) and record the site exactly as seen by the device.
If this holds up, it's almost guaranteed to turn into a bugfix, possibly a substantial stability win, in a component with lots of unreproducible crashes. Therefore shoring up the STR here is a huge deal for us.
Flags: needinfo?(kbrosnan)
Comment 13•11 years ago
|
||
Looks like they are detecting Firefox for Android as a mobile browser now. Previously I was crashing on their desktop site. Using the request desktop site or the Phony extension set to "Desktop Firefox" still crashes.
Flags: needinfo?(kbrosnan)
Comment 14•11 years ago
|
||
(In reply to Kevin Brosnan [:kbrosnan] from comment #13)
Thanks, I've reproduced this now.
Comment 16•11 years ago
|
||
Bisection shows that this started with changeset accdf191ac4e, part of the fix for bug 969012.
Assignee | ||
Comment 17•11 years ago
|
||
Great work bisecting this, Jon!
The patch in question actually had quite a bit of trouble landing on ARM because it was running into a codegen bug. I rewrote it completely, but it seems maybe that just hid the underlying problem. For now let's just suppress all of the new codepaths on ARM so we can debug at leisure.
The attached patch passes all jsapi and jit tests in the ARM simulator and it should be safe and easy to uplift as needed, assuming, of course, that it actually works around the issue successfully.
I've put up a try build so that someone that can reproduce can test this out easily: https://tbpl.mozilla.org/?tree=Try&rev=8a4b6dc55317
Updated•11 years ago
|
status-firefox31:
--- → unaffected
status-firefox32:
--- → affected
status-firefox33:
--- → affected
Comment 18•11 years ago
|
||
Can confirm that the patch stops the crashing on bmwusa.com and loading several of the less consistent crashing urls obtained from crash stats.
Updated•11 years ago
|
Attachment #8437121 -
Flags: review?(sphink) → review+
Assignee | ||
Comment 19•11 years ago
|
||
Comment 21•11 years ago
|
||
Comment 22•11 years ago
|
||
ok, so this is almost certainly *not* the patch that we want, but after a bit of digging, the issue seems to be that on x86 and x64, CallTempReg0 is volatile, but on arm, it is not. we probably just want to choose a new register that is volatile on all platforms. Bonus points if you choose something that is either r0 or r1 on arm, since those are arg0 and arg1, and r0 is also the return value (it'll save a bunch of moves, if nothing else.)
Updated•11 years ago
|
Attachment #8437527 -
Flags: review-
Attachment #8437527 -
Flags: feedback?(terrence)
Comment 23•11 years ago
|
||
As discussed on IRC, use one of the volatile registers that we save rather than trashing CallTemp1.
I verified that this does fix the crash.
Attachment #8437692 -
Flags: review?(mrosenberg)
Comment 24•11 years ago
|
||
We should try to uplift this crash fix to Aurora 32 because it will ship with B2G 2.0. AFAIK, B2G 1.4 is shipping Gecko 30, which is unaffected.
Comment 25•11 years ago
|
||
Comment on attachment 8437692 [details] [diff] [review]
bug1011474-fix
Review of attachment 8437692 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/CodeGenerator.cpp
@@ +5452,5 @@
> masm.PushRegsInMask(regs);
>
> + const Register regTemp = regs.takeGeneral();
> + const Register regRuntime = regTemp;
> + regs.add(regTemp);
Sadly, I don't think there's a way of getting around this dance.
Attachment #8437692 -
Flags: review?(mrosenberg) → review+
Assignee | ||
Comment 26•11 years ago
|
||
Comment on attachment 8437527 [details] [diff] [review]
ClobberinTime-r0.patch
Review of attachment 8437527 [details] [diff] [review]:
-----------------------------------------------------------------
Looks like Jon beat me to it. :-)
Attachment #8437527 -
Flags: feedback?(terrence)
Assignee | ||
Comment 27•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1af495b17a7
Let's get this on inbound asap.
Comment 28•11 years ago
|
||
Assignee | ||
Comment 29•11 years ago
|
||
Comment on attachment 8437692 [details] [diff] [review]
bug1011474-fix
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 969012
User impact if declined: Increased crashes in b2g and fennec.
Testing completed (on m-c, etc.): m-i for a day, now on m-c.
Risk to taking this patch (and alternatives if risky): low.
String or IDL/UUID changes made by this patch: none.
Attachment #8437692 -
Flags: approval-mozilla-aurora?
Comment 30•11 years ago
|
||
I got this crash while browsing the new Google Maps:
https://crash-stats.mozilla.com/report/index/0165bb83-0f79-4ae4-b768-c3b2a2140612
Assignee | ||
Comment 31•11 years ago
|
||
(In reply to Markus Popp from comment #30)
> I got this crash while browsing the new Google Maps:
>
> https://crash-stats.mozilla.com/report/index/0165bb83-0f79-4ae4-b768-
> c3b2a2140612
Thanks for the crash report! The fix here is only relevant for ARM, so it looks like we have a similar crash affecting linux amd64 that we need to look into too.
Comment 32•11 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #31)
> Thanks for the crash report! The fix here is only relevant for ARM, so it
> looks like we have a similar crash affecting linux amd64 that we need to
> look into too.
You're welcome!
Firefox crashes quite frequently when accessing the new Google Maps. While I think these other crashes are probably unrelated to this one, I reported them as Bug 1022987. Unlike this crash (1011474) which occurred in the middle of browsing the new Google Maps, those of Bug 1022987 happen when first accessing Google Maps.
Assignee | ||
Comment 33•11 years ago
|
||
(In reply to Markus Popp from comment #32)
> (In reply to Terrence Cole [:terrence] from comment #31)
> > Thanks for the crash report! The fix here is only relevant for ARM, so it
> > looks like we have a similar crash affecting linux amd64 that we need to
> > look into too.
>
> You're welcome!
>
> Firefox crashes quite frequently when accessing the new Google Maps. While I
> think these other crashes are probably unrelated to this one, I reported
> them as Bug 1022987. Unlike this crash (1011474) which occurred in the
> middle of browsing the new Google Maps, those of Bug 1022987 happen when
> first accessing Google Maps.
You are correct, the crash in comment 30 is definitely different from the ones in bug 1022987. I've moved it to Core::Graphics, which should hopefully get some eyes on it. At a guess, the linux radeon driver is probably at fault there, but I'm not at all familiar with our webgl code though.
Comment 34•11 years ago
|
||
Comment on attachment 8437692 [details] [diff] [review]
bug1011474-fix
Aurora approval granted. Please verify that this fixes the crash on Aurora as well.
Attachment #8437692 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 35•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/7df8944f610b
Resolving per IRC discussion with Jon. Please file a new bug for any remaining crashes as it makes tracking much easier than piling multiple different issues into one bug :)
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Updated•11 years ago
|
status-b2g-v2.0:
--- → fixed
status-b2g-v2.1:
--- → fixed
Comment 36•11 years ago
|
||
verified by checking crash stats, no more Android crashes on 33a1 or 32a2.
You need to log in
before you can comment on or make changes to this bug.
Description
•