Closed
Bug 1259699
Opened 8 years ago
Closed 8 years ago
spike in JS-related EXCEPTION_STACK_OVERFLOW crashes on nightly
Categories
(Core :: JavaScript: GC, defect)
Core
JavaScript: GC
Tracking
()
RESOLVED
FIXED
mozilla48
People
(Reporter: dbaron, Assigned: jandem)
References
Details
(Keywords: crash, topcrash)
Crash Data
Attachments
(1 file)
1.07 KB,
patch
|
ted
:
review+
bholley
:
review+
ritu
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
ritu
:
approval-mozilla-esr45+
|
Details | Diff | Splinter Review |
There's been a huge spike in crashes on nightly with EXCEPTION_STACK_OVERFLOW in JS-related signatures: Massive spike on March 24 nightly in _chkstk: https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&platform=Windows&date=%3E%3D2016-03-01&reason=EXCEPTION_STACK_OVERFLOW&signature=_chkstk&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#aggregations Massive spike on March 23 and 24 nightlies in DispatchToTracer<>: https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&platform=Windows&date=%3E%3D2016-03-01&reason=EXCEPTION_STACK_OVERFLOW&signature=DispatchToTracer%3CT%3E&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#aggregations Spikes on March 19 and March 24 in UnmarkGrayTracer::onChild: https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&platform=Windows&date=%3E%3D2016-03-01&reason=EXCEPTION_STACK_OVERFLOW&signature=UnmarkGrayTracer%3A%3AonChild&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#aggregations Spike on March 24 in js::TraceRange<T>: https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&platform=Windows&date=%3E%3D2016-03-01&reason=EXCEPTION_STACK_OVERFLOW&signature=js%3A%3ATraceRange%3CT%3E&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#aggregations These spikes represent a significant portion of the crashes on nightly.
Comment 1•8 years ago
|
||
Hmm, looks like these crashes are all win64, so this must be fallout from bug 1257234. Jan, maybe our safe stack limits need to be adjusted now that we can actually exhaust the stack on win64 (instead of using at most half of it)? That's just a guess though.
Blocks: 1257234
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 2•8 years ago
|
||
Ted, do you know why none of these reports have stack traces? I only see a single frame, so it's impossible to figure out if and where we're overrecursing. (In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #1) > Jan, maybe our safe stack limits need to be adjusted now that > we can actually exhaust the stack on win64 (instead of using at most half of > it)? That's just a guess though. Maybe, but since most (all?) of these crashes are in GC code I wonder if we're missing an overrecursion check there somewhere. We backported bug 1257234, so the simplest and safest short-term fix is probably to subtract 100 KB or so from our stack quota on Win64.
Flags: needinfo?(ted)
Comment 3•8 years ago
|
||
I think unfortunately the _chkstk routine used here doesn't have any unwind info. :-/ I can probably get a plausible stack out of some of these dumps though.
Flags: needinfo?(ted)
Assignee | ||
Comment 4•8 years ago
|
||
I managed to load one of these dumps in Visual Studio and got a good stack trace there. * We have a ton of UnmarkGrayTracer calls on the stack, using up pretty much all of our stack space. I filed bug 1259730 to make this non-recursive. * UnmarkGrayTracer::onChild has an overrecursion check, but after that we push some frames that are quite large (I saw a 3+ KB frame for instance). This seems to be a common issue with MSVC PGO builds unfortunately. * I think the simplest fix is to subtract more than 20 KB on Win64, to account for these large PGO stack frames.
Assignee | ||
Comment 5•8 years ago
|
||
Bump the 20 KB buffer to 40 KB (Win32) or 80 KB (Win64).
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8734745 -
Flags: review?(ted)
Comment 6•8 years ago
|
||
Crash with signature UnmarkGrayTracer::onChild appeared the 2016-03-26 on 47.0a2. It's #9 in topcrash and represents 337 crashes.
status-firefox47:
--- → affected
Comment 7•8 years ago
|
||
We backed out bug 1257234 on beta for test failures. I'm wondering why that happened for beta but not for nightly/aurora (successfully - since it stopped us from hitting this crash in beta 6) Are the nightly/aurora tests for win64 the same, here? Or do we treat them differently?
status-firefox46:
--- → affected
Assignee | ||
Comment 8•8 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #7) > We backed out bug 1257234 on beta for test failures. I'm wondering why that > happened for beta but not for nightly/aurora (successfully - since it > stopped us from hitting this crash in beta 6) > Are the nightly/aurora tests for win64 the same, here? Or do we treat them > differently? It's hard to say, but release/beta builds are slightly different from aurora/nightly builds (IIRC nightly/aurora builds have frame pointers enabled for instance), so that likely affects the code we generate - just enough to push us over the stack limit.
Comment 9•8 years ago
|
||
Comment on attachment 8734745 [details] [diff] [review] Patch Review of attachment 8734745 [details] [diff] [review]: ----------------------------------------------------------------- rs=me on the mechanics, though I have no idea if this is the appropriate amount to subtract.
Attachment #8734745 -
Flags: review+
Comment 11•8 years ago
|
||
I can't really comment on the absolute numbers, but based on the anecdotal evidence of the stack frame sizes in bug 1259796 I think doubling on 64-bit is reasonable.
Comment 12•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/72c2846e2d16
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Jan, this was mentioned as a top crasher on Aurora 47. Should we consider uplifting this fix to Aurora? Thanks!
Flags: needinfo?(jdemooij)
Updated•8 years ago
|
Attachment #8734745 -
Flags: review?(ted) → review+
Assignee | ||
Comment 14•8 years ago
|
||
Comment on attachment 8734745 [details] [diff] [review] Patch We need this small change to fix crashes on Aurora and to uplift bug 1257234 to Beta and ESR45. Approval Request Comment [Feature/regressing bug #]: Bug 1257234. [User impact if declined]: Crashes. [Describe test coverage new/current, TreeHerder]: This is hard to test well. Top crashes and intermittents have stopped after this landed though. [Risks and why]: Low risk. [String/UUID change made/needed]: None.
Flags: needinfo?(jdemooij)
Attachment #8734745 -
Flags: approval-mozilla-esr45?
Attachment #8734745 -
Flags: approval-mozilla-beta?
Attachment #8734745 -
Flags: approval-mozilla-aurora?
status-firefox-esr45:
--- → affected
Comment on attachment 8734745 [details] [diff] [review] Patch Crash fix that seems to have fixed the crashes in Nightly, Aurora47+, ESR45
Attachment #8734745 -
Flags: approval-mozilla-esr45?
Attachment #8734745 -
Flags: approval-mozilla-esr45+
Attachment #8734745 -
Flags: approval-mozilla-aurora?
Attachment #8734745 -
Flags: approval-mozilla-aurora+
Comment 16•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/21c01308022d
Crash Signature: [@ _chkstk]
[@ DispatchToTracer<T>]
[@ UnmarkGrayTracer::onChild]
[@ js::TraceRange<T>] → [@ _chkstk]
[@ DispatchToTracer<T>]
[@ UnmarkGrayTracer::onChild]
[@ js::TraceRange<T>]
[@ JS::CallbackTracer::onShapeEdge ]
Crash Signature: [@ _chkstk]
[@ DispatchToTracer<T>]
[@ UnmarkGrayTracer::onChild]
[@ js::TraceRange<T>]
[@ JS::CallbackTracer::onShapeEdge ] → [@ _chkstk]
[@ DispatchToTracer<T>]
[@ UnmarkGrayTracer::onChild]
[@ js::TraceRange<T>]
[@ JS::CallbackTracer::onShapeEdge ]
[@ js::ObjectGroup::traceChildren ]
Comment 18•8 years ago
|
||
Comment on attachment 8734745 [details] [diff] [review] Patch Fix for topcrash, verified on nightly, let's uplift it for beta 9.
Attachment #8734745 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
This does not apply cleanly to beta: $ hg graft -er 21c01308022d grafting 336149:21c01308022d "Bug 1259699 - Adjust Windows stack limits to account for large PGO stack frames. r=bholley a=ritu" merging js/xpconnect/src/XPCJSRuntime.cpp warning: conflicts while merging js/xpconnect/src/XPCJSRuntime.cpp! (edit, then use 'hg resolve --mark') abort: unresolved conflicts, can't continue
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 20•8 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/2354df610ae50aaf239f25250f11c043b32d36b2
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 21•8 years ago
|
||
https://hg.mozilla.org/releases/mozilla-esr45/rev/ca6da05ad2d8649971a6228f4626730801cd1e9a
Comment 22•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #2) > Ted, do you know why none of these reports have stack traces? I only see a > single frame, so it's impossible to figure out if and where we're > overrecursing. I finally spent the time to track this down. Turns out the Breakpad processor code will refuse to read a memory region that's >1MB: https://chromium.googlesource.com/breakpad/breakpad/+/fee47f4638072e9b049ffa745bbb8aa5be986698/src/processor/minidump.cc#1183 This is pretty silly. I will get that fixed upstream. Bumping that to 2MB in my local Breakpad tree gives me a useful stack from that crash.
Assignee | ||
Comment 23•8 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #22) > I finally spent the time to track this down. Turns out the Breakpad > processor code will refuse to read a memory region that's >1MB: Thanks for tracking that down and fixing it.
You need to log in
before you can comment on or make changes to this bug.
Description
•