Closed Bug 1441143 Opened 4 years ago Closed 4 years ago

MOZ_RELEASE_ASSERT(chunk->arena->mMagic == 0x947d3d24) from addEnumerableDataProperty

Categories

(Core :: JavaScript Engine, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox59 --- unaffected
firefox60 --- fixed

People

(Reporter: mccr8, Unassigned)

References

Details

(4 keywords, Whiteboard: [fixed by backout])

Crash Data

This signature is fairly generic (see bug 1440927), but I noticed that the Nightly incarnation of this signature is almost all MOZ_RELEASE_ASSERT(chunk->arena->mMagic == 0x947d3d24), which likely indicates heap corruption or something else going terribly wrong, and seemed like it is all stacks involving addEnumerableDataProperty.

This bug was filed from the Socorro interface and is
report bp-ad8fe184-4ed2-4ccf-a54b-b11c20180225.
=============================================================

Top 10 frames of crashing thread:

0 mozglue.dll BaseAllocator::realloc memory/build/mozjemalloc.cpp:4205
1 mozglue.dll Allocator<MozJemallocBase>::moz_arena_realloc memory/build/malloc_decls.h:39
2 xul.dll js::MallocProvider<JS::Zone>::pod_realloc<unsigned char> js/src/vm/MallocProvider.h:173
3 xul.dll js::ReallocateObjectBuffer<js::HeapSlot> js/src/gc/Nursery-inl.h:134
4 xul.dll js::NativeObject::growSlots js/src/vm/NativeObject.cpp:415
5 xul.dll js::NativeObject::addEnumerableDataProperty js/src/vm/Shape.cpp:684
6 xul.dll SetNonexistentProperty<1> js/src/vm/NativeObject.cpp:2656
7 xul.dll js::NativeSetProperty<1> js/src/vm/NativeObject.cpp:2808
8 xul.dll js::SetObjectElement js/src/vm/Interpreter.cpp:4635
9  @0x67b5cb7886 

=============================================================

Marcia posted a possible regression range in bug 1440927 comment 1, though I see literally no JS changes in that set, aside from what looks like a math library thing:

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #1)
> Confirming signature since I see crashes in crash stats in 58/59/60. The
> reporter's crash is in 59b10. I moved into a better component, although I am
> not sure it is the correct place.
> 
> On 60 the crashes spiked a bit starting 20180224103027, but appear to have
> started using 20180219100221. The crashes in 60 seem to all have the crash
> reason "EXCEPTION_BREAKPOINT" - https://mzl.la/2EUj3t6 is the possible
> regression range based on Build ID for nightly only. For the last three days
> we are averaging around 20 crashes a day in nightly.
Jan, any ideas what might have changed recently that could have broken this?
Flags: needinfo?(jdemooij)
Keywords: regression
Ah, ok, if I search just for the release assert, then it looks like that started in build 20180224103027, as Marcia noted, which gives this regression range: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ad3c6f89d67752309a473e57a47fb88f9da37683&tochange=b3191953ccdabd335ffb383905a36b0062e547b2

Strings are involved, so as a first guess I'm going to blame bug 903519. :)
Flags: needinfo?(jdemooij) → needinfo?(sphink)
Well, this might be shape related, so searching through that range for "shape" turns up bug 1438086 and bug 1437507.
Marcia, it is possible this is a similar issue to the garbage signatures you posted about on the stability list. If something is trashing memory enough that we're hitting jemalloc asserts, I could imagine that whatever data structures the crash reporter relies on could also get trashed sometimes.

I'm going to mark this sec-critical, because something seems to be going really wrong. Bug 903519 got backed out for causing GC assertions on TreeHerder, so we can see if these mysterious issues go away, too.
Needinfo to Marcia to be sure that she sees this.
Flags: needinfo?(mozillamarcia.knous)
(In reply to Andrew McCreight [:mccr8] from comment #5)
> Needinfo to Marcia to be sure that she sees this.

Thanks Andrew, glad you noticed that there might be a correlation.
Flags: needinfo?(mozillamarcia.knous)
Another way for such an assert to happen is rare cases of double-free, where what is freed is the last allocation in a chunk.
Taking a quick look at one of these crashes, I see that the NativeObject that we are adding properties to has slots_ pointing just after the native object. This should only be possible for Nursery-allocated objects at [1]. Since we call pod_realloc, we failing one of these checks: https://searchfox.org/mozilla-central/rev/056a4057575029f58eb69ae89bc7b6c3c11e2e28/js/src/gc/Nursery.cpp#449,452 .

Looking a little closer, the ChunkLocation of the chunk containing both this object and it's slots_ is 0x00200001 which is illegal. (Nursery should be 0x00000001).

And with that, I am at a loss of what is happening.

[1] https://searchfox.org/mozilla-central/rev/056a4057575029f58eb69ae89bc7b6c3c11e2e28/js/src/gc/Nursery.cpp#308
Bug 903519 seems like a prime candidate, because of the association with both strings and the nursery. It was backed out on the 26th. I see lots of crashes in the 20180226 nightly, but I don't know if it was still in that one or not.

I guess I'll wait to see if the 20180227 nightly shows up.
The backout changeset on hg.mozilla.org says it's in the 20180226230123 nightly, but not in the 20180226100105 one.
And crash stats indicates there haven't been any crashes with 20180226230123 so far.
(In reply to Mike Hommey [:glandium] from comment #11)
> And crash stats indicates there haven't been any crashes with 20180226230123
> so far.

To be clear, there haven't been any crashes with the BaseAllocator::realloc signature for that nightly, but there were other crashes for that nightly.
Blocks: 903519
No longer blocks: 903519
Are we in good state at this point with the backout mentioned?
Flags: needinfo?(continuation)
Yeah, I don't see any crashes with this signature that are hitting the same release assert in the last week, except for a build from before the backout.
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(continuation)
Resolution: --- → WORKSFORME
Whiteboard: [fixed by backout]
Group: javascript-core-security
Note that while in theory this is still active, the restructuring that tcampbell means it either won't happen or won't show up in the same way. (And it's likely that one of the other fixes really did fix this.)
Flags: needinfo?(sphink)
Sorry. "Still active" in terms of it returning if nursery strings are re-enabled. This is certainly not active right now.
You need to log in before you can comment on or make changes to this bug.