Closed
Bug 1280591
Opened 8 years ago
Closed 6 years ago
Crash in js::gc::GCRuntime::markBufferedGrayRoots or js::
Categories
(Core :: JavaScript: GC, defect, P3)
Tracking
People
(Reporter: marcia, Unassigned)
References
Details
(Keywords: crash, topcrash, triage-deferred)
Crash Data
Attachments
(2 files, 2 obsolete files)
6.28 KB, text/plain | Details
984 bytes, patch | terrence: review+ | Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is
report bp-b27d8452-2dfd-405f-af75-82b0a2160617.
=============================================================
Top Mac crash on 47 - currently #4. https://crash-stats.mozilla.com/report/list?signature=js%3A%3A
Lion's share of URLs are from facebook.com.
ni on Terrence - any ideas?
Flags: needinfo?(terrence)
Comment 1•8 years ago
These are all under markBufferedGrayRoots.
Andrew, could a missing ExposeToActiveJS cause this? Is there anything that landed on 13 Jun that touches how we do rooting or traversal of the DOM?
Flags: needinfo?(terrence) → needinfo?(continuation)
Hi Naveed, is this something your team can investigate? This was mentioned as a top crash on release and something we should consider investigating in time for 48 go-live.
Flags: needinfo?(nihsanullah)
Reporter
Comment 3•8 years ago
From a URL perspective, most of them are facebook.com and many of the comments mention doing some operation on Facebook.
Comment 5•8 years ago
(In reply to Terrence Cole [:terrence] from comment #1)
> Andrew, could a missing ExposeToActiveJS cause this?
Not as far as I can think of.
> Is there anything that landed on 13 Jun that touches how we do rooting or traversal of the DOM?
Why June 13? It looks like this crash is happening on all branches.
Flags: needinfo?(continuation)
Reporter
Comment 6•8 years ago
[Tracking Requested - why for this release]: 48 Beta crashes are now showing up, albeit in smaller volume. Nominating for tracking so we can keep an eye on this crash as well as investigate further if possible.
tracking-firefox48: --- → ?
Comment 7•8 years ago
Track this per comment #6.
Updated•8 years ago
Assignee: nobody → terrence
Flags: needinfo?(nihsanullah)
Comment 8•8 years ago
FYI, the crash address in the report in comment 0 is misleading due to bug 974420. From the raw dump tab you can see "rcx": "0xe5e5e5e5e5e00000", and the crashing instruction from that dump is:
http://hg.mozilla.org/releases/mozilla-release/annotate/b0310cb90fd0/js/src/gc/Heap.h#l1154
1154 return isTenured() ? asTenured().getTraceKind() : JS::TraceKind::Object;
1055d54d6: f6 81 e8 ff 0f 00 01 testb $0x1,0xfffe8(%rcx)
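As a back-of-the-envelope check, the register value and the instruction offset decode consistently if one assumes 1 MiB GC chunks and jemalloc's 0xe5 freed-memory fill; a standalone sketch of that arithmetic (the chunk size and fill pattern are assumptions here, and the trailer-field identification comes from comment 10 below):

  // Decode of the faulting access: mask a fully poisoned pointer down to its
  // chunk base (what Cell::chunk() does), then add the displacement from
  // testb $0x1,0xfffe8(%rcx).
  #include <cinttypes>
  #include <cstdint>
  #include <cstdio>

  int main() {
    const uint64_t poison    = 0xe5e5e5e5e5e5e5e5;      // jemalloc freed-memory fill
    const uint64_t chunkMask = (uint64_t(1) << 20) - 1; // 1 MiB GC chunks (assumed)
    const uint64_t chunkBase = poison & ~chunkMask;     // = 0xe5e5e5e5e5e00000, i.e. rcx
    const uint64_t faultAddr = chunkBase + 0xfffe8;     // ChunkSize - 0x18
    printf("chunk base = %#" PRIx64 "\n", chunkBase);
    printf("fault addr = %#" PRIx64 "\n", faultAddr);
    return 0;
  }

In other words, rcx already holds the chunk base computed from a poisoned cell pointer, and the faulting load reads a field 0x18 bytes before the end of that (nonexistent) chunk, consistent with a chunk-trailer access.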
Comment 9•8 years ago
Comment 10•8 years ago
Ted, awesome work!
That pointer is |cell->chunk()->trailer.runtime|. So it's |this| there that's 0xe5: the jemalloc poison pattern. Presumably, the DOM object that was storing the pointer on the C heap was freed before we marked it in the first slice.
We can check when buffering gray roots that the pointer we're seeing is not poisoned. Unfortunately, that will almost certainly just take us to [1], which is also too generic to tell us anything. Whatever is being freed is being freed while it is still held in that table, so just checking the table entries at insertion is not going to give us anything interesting. Moreover, the table entries are very generic, so we wouldn't necessarily even know what we could check.
Andrew is there anywhere in the Tracer called at [1] where we could check the pointer and still have enough information about the type of the owner to print anything useful in the crash message? Also, are there any other places where we buffer gray roots that might be implicated?
1- https://dxr.mozilla.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp#1065
Flags: needinfo?(continuation)
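A minimal sketch of the buffering-time check proposed above (CheckGrayRootNotPoisoned is a hypothetical name, not an existing function; it assumes 64-bit pointers and jemalloc's 0xe5 fill pattern):

  #include <cstdint>
  #include "mozilla/Assertions.h"

  // Hypothetical helper to run on each pointer as it is copied into the
  // gray-root buffer: a pointer read out of freed jemalloc memory is typically
  // the repeated 0xe5 byte (possibly with its low bits already masked off).
  static void CheckGrayRootNotPoisoned(const void* thing) {
    static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");
    const uintptr_t word = reinterpret_cast<uintptr_t>(thing);
    MOZ_RELEASE_ASSERT((word >> 32) != 0xe5e5e5e5u,
                       "buffered gray root looks like jemalloc poison");
  }

As noted above, a check like this would only move the crash earlier; it would not by itself identify which holder failed to unregister.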
Comment 11•8 years ago
What is it that buffers gray roots? I assume that it buffers up gray roots, then executes random code. What ensures that a dying object removes itself from that table?
Flags: needinfo?(continuation)
Comment 12•8 years ago
(In reply to Andrew McCreight [:mccr8] from comment #11)
> What is it that buffers gray roots? I assume that it buffers up gray roots,
> then executes random code. What ensures that a dying object removes itself
> from that table?
Chatted with Andrew about this on IRC. We copy the Cell* out of the CC thing and into a list in the GC in the first slice of GC, so the pointer must have been 0xe5 in the table. This means either that someone added an uninitialized pointer or it was deleted without having DropJSHolder called on it. Since Coverity, ASAN, and Valgrind will all catch the first, it's likely the latter. I think Andrew has some idea of where we can assert in order to catch this case and find out what's failing to drop.
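A self-contained mock of the lifecycle described above (nothing here is Gecko code; HoldJSObjects/DropJSObjects and the cycle collector's holders table are the real pieces, and every name below is a stand-in), showing why a holder freed without dropping leaves poison to be buffered:

  #include <cstdint>
  #include <unordered_set>
  #include <vector>

  // Stand-in for a cycle-collected native that keeps a JS pointer alive.
  struct Holder { uintptr_t cellLikePointer; };

  static std::unordered_set<Holder*> gHolders;  // stand-in for the JS holders table

  void HoldHolder(Holder* h) { gHolders.insert(h); }  // cf. HoldJSObjects
  void DropHolder(Holder* h) { gHolders.erase(h); }   // cf. DropJSObjects

  // Stand-in for the first GC slice: copy each holder's pointer into the
  // gray-root buffer. If a Holder was deleted without DropHolder, |h| dangles
  // and the copied value is whatever the freed memory now holds (the 0xe5
  // pattern in jemalloc builds), which matches the crash reports here.
  void BufferGrayRoots(std::vector<uintptr_t>& grayRoots) {
    for (Holder* h : gHolders) {
      grayRoots.push_back(h->cellLikePointer);
    }
  }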
Comment 13•8 years ago
(ASan detects use after free, but not uninitialized variables, to be pedantic.)
My theory is that uninitialized variables would always be detected, while hitting this UAF would require some very particular timing, so it would be harder to detect.
Anyways, I'm hoping Terrence can write and land a patch that would somehow crash if we try to add a poisoned value to gcGrayRoots, which would give us more information about exactly what kind of gray root it is.
Meanwhile, I'll look over places where we call HoldJSObject and DropJSObject, as those are the most obvious possible culprits. We may also be able to come up with a way to make that less error-prone.
Comment 14•8 years ago
This will at least make us crash sooner in debug builds.
Attachment #8769389 - Flags: review?(sphink)
Comment 15•8 years ago
Comment on attachment 8769389 [details] [diff] [review]
assert_no_poisoned_gray-v0.diff
Review of attachment 8769389 [details] [diff] [review]:
-----------------------------------------------------------------
Do we do msan runs anywhere?
Attachment #8769389 - Flags: review?(sphink) → review+
Comment 16•8 years ago
Can't we just call cell->chunk()->trailer.runtime and do it in release builds or something?
> Do we do msan runs anywhere?
No, but Julian does full Mochitest runs with Valgrind regularly.
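A hedged sketch of the release-build check floated here, using the field path exactly as named above (cell->chunk()->trailer.runtime); whether those members are reachable like this from the buffering code is an assumption, and this is not the patch that eventually landed:

  #include "gc/Heap.h"             // js::gc::Cell and the chunk trailer (assumed)
  #include "mozilla/Assertions.h"

  // Dereference the chunk trailer while buffering the root: on a freed
  // (0xe5-poisoned) cell the load itself crashes here, giving a signature at
  // buffering time rather than in markBufferedGrayRoots; on a live cell from
  // the wrong runtime the comparison fails instead.
  static void AssertGrayRootLooksSane(js::gc::Cell* cell, JSRuntime* expectedRuntime) {
    MOZ_RELEASE_ASSERT(cell);
    MOZ_RELEASE_ASSERT(cell->chunk()->trailer.runtime == expectedRuntime,
                       "buffered gray root's chunk does not belong to this runtime");
  }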
Reporter
Comment 17•8 years ago
Is this a dupe of Bug 1265414, which was filed back in April?
Updated•8 years ago
Crash Signature: [@ js::] → [@ js::] [@ js::gc::GCRuntime::markBufferedGrayRoots]
Summary: Crash in js:: → Crash in js::gc::GCRuntime::markBufferedGrayRoots or js::
Comment 19•8 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/010d92da655dd4619edb1fd9df79318899ac0db9
Bug 1280591 - Assert if we try to add a poisoned root; r=sfink
Updated•8 years ago
Keywords: leave-open
Comment 20•8 years ago
Sorry, I had to back this out since we hit this assertion, e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=31706830&repo=mozilla-inbound, and on other tests as an intermittent failure.
Flags: needinfo?(terrence)
Comment 21•8 years ago
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd1350f7c3e1
Backed out changeset 010d92da655d for frequent assertion failures
Comment 22•8 years ago
It's worrying that this assertion fails so much!
Comment 23•8 years ago
Of the three crashes from Treeherder linked here, one is tracing mWrappedJSRoots, one is for CallbackObject, and one is not showing anything useful in the stack.
I don't understand how this assertion is failing so much, though. Why don't we crash in these conditions normally on Treeherder? Why has ASan never noticed this problem if we hit it 3 times in less than a day?
Updated•8 years ago
Assignee: terrence → continuation
Flags: needinfo?(terrence)
Comment 24•8 years ago
This patch calls the same function we seem to be crashing on in the wild. Does this look reasonable to you, Terrence? I'd push to try, but it is closed.
Attachment #8771427 - Flags: feedback?(terrence)
Comment 25•8 years ago
Comment on attachment 8771427 [details] [diff] [review]
Check if gray roots are valid when we add them.
Review of attachment 8771427 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/gc/RootMarking.cpp
@@ +415,5 @@
> {
> MOZ_ASSERT(runtime()->isHeapBusy());
> + MOZ_RELEASE_ASSERT(thing);
> + // Check if |thing| is a valid JS pointer.
> + MOZ_RELEASE_ASSERT(thing.asCell()->getTraceKind() <= JS::TraceKind::Null);
I don't think |enum class| supports ordering operations, but I guess |enum class : uint8_t| does? I guess if it compiles it's fine, although you might want to use != here instead just to be safe.
I'd also update the comment: it looks like the "validness" is coming from the check whereas that's kinda a sideshow given the null check just above. Maybe something like:
// Check if |thing| is corrupt by calling a method that touches the heap.
Attachment #8771427 - Flags: feedback?(terrence) → feedback+
Comment 26•8 years ago
Attachment #8772620 - Flags: review?(terrence)
Updated•8 years ago
Attachment #8769389 - Attachment is obsolete: true
Attachment #8771427 - Attachment is obsolete: true
Comment 27•8 years ago
Comment on attachment 8772620 [details] [diff] [review]
Check if gray roots are valid when we add them.
Review of attachment 8772620 [details] [diff] [review]:
-----------------------------------------------------------------
Welp, that looks like fewer false positives. Let's hope it turns up some intermittents.
Attachment #8772620 - Flags: review?(terrence) → review+
Comment 28•8 years ago
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/923bcd0c37b0
Check if gray roots are valid when we add them. r=terrence
Comment 29•8 years ago
bugherder
Comment 30•8 years ago
No sign of this on crash stats, as far as I can see, and there's one crash on Nightly with the markBufferedGrayRoots signature, so I don't know.
Reporter
Comment 31•8 years ago
(In reply to Andrew McCreight [:mccr8] from comment #30)
> No sign of this on crash stats, as far as I can see, and there's one crash
> on Nightly with the markBufferedGrayRoots signature, so I don't know.
js:: never had crashes on nightly that I saw, so unfortunately that makes it a bit difficult to evaluate on nightly with only 1 crash. I think we would need more volume on nightly to really evaluate.
The js:: crash is currently the #31 top crash on 47.0.1, all Mac crashes.
Comment 32•8 years ago
(In reply to Marcia Knous [:marcia - use ni] from comment #31)
> js:: never had crashes on nightly that I saw, so unfortunately that makes it
> a bit difficult to evaluate on nightly with only 1 crash. I think we would
> need more volume on nightly to really evaluate.
Sorry, I didn't really say what I meant. What I meant was, my patch was intended to change the crash from markBufferedGrayRoots to something else, by trying to detect the failure earlier. I've seen no evidence of a different signature.
js:: is not on Nightly any more due to bug 1283916, which fixed the signature. I do see a few instances of markBufferedGrayRoots there, which is what I think the proper signature is.
Comment 33•8 years ago
Crash volume for signature 'js::':
- esr (version 45): 35 crashes from 2016-04-07.
Crash volume on the last weeks:
Week N-1 Week N-2 Week N-3 Week N-4 Week N-5 Week N-6 Week N-7
- nightly 0 0 0 0 1 2 3
- aurora 4 2 2 2 0 3 0
- beta 135 115 124 136 104 105 24
- release 2305 2346 2213 2285 2339 2053 573
- esr 7 5 4 3 2 4 3
Affected platform: Mac OS X
status-firefox-esr45: --- → affected
Comment 34•8 years ago
We will ship 48 with this crash; updating the flags.
Comment 35•8 years ago
Crash volume for signature 'js::gc::GCRuntime::markBufferedGrayRoots':
- nightly (version 51): 3 crashes from 2016-08-01.
- aurora (version 50): 73 crashes from 2016-08-01.
- beta (version 49): 35 crashes from 2016-08-02.
- release (version 48): 66 crashes from 2016-07-25.
- esr (version 45): 1306 crashes from 2016-05-02.
Crash volume on the last weeks (Week N is from 08-22 to 08-28):
W. N-1 W. N-2 W. N-3
- nightly 0 2 1
- aurora 29 26 2
- beta 12 9 5
- release 14 26 14
- esr 111 109 94
Affected platforms: Windows, Mac OS X
Crash rank on the last 7 days:
Browser Content Plugin
- nightly
- aurora #128 #35
- beta #1049 #504
- release #1054
- esr #79
status-firefox51: --- → affected
Comment 36•8 years ago
Looks like this is more of an issue on ESR than it is in other current releases. Maybe it was fixed along the way since 45 but not backported to ESR. I don't think we need to track this.
Terrence, does that sound right to you? Do you want this on your radar? At least to fix for the next ESR (52).
Flags: needinfo?(terrence)
Comment 37•8 years ago
The patches here were diagnostic (and didn't catch anything). I think the likely fixes are somewhere in the massive tree of ExposeToActiveJS additions that bz and I made last month. Unfortunately, it's hard to say exactly what in that pile we'd need to uplift, and that's if I've even identified the right cause. I think we have to wont-fix on branches.
Flags: needinfo?(terrence)
Updated•8 years ago
This is a pretty low-volume crash on the current release (49.x) and also on beta 50. We are a week away from RC, so this is now a wontfix for 50.
Updated•8 years ago
Assignee: continuation → nobody
Comment 39•7 years ago
The crash still happens in a fully updated FF 54: https://crash-stats.mozilla.com/report/index/d1d2f2c8-d79a-4fb8-b7a0-1c5a71170701
I was not doing anything in the browser; it just died on its own while I was away, so it is impossible to reproduce.
Updated•7 years ago
Keywords: triage-deferred
Priority: -- → P3
Comment 40•6 years ago
Folding this into Bug 1203273 to track signatures a little better.