Closed Bug 991752 Opened 7 years ago Closed 4 years ago

crash in GetObjectAllocKindForCopy


(Core :: JavaScript: GC, defect)

31 Branch
Windows NT
Not set



Tracking Status
firefox31 + wontfix


(Reporter: lizzard, Unassigned)



(Keywords: crash, Whiteboard: [GGC])

Crash Data


(1 file)

This bug was filed from the Socorro interface and is 
report bp-09c3b5ad-cc94-46a7-9a85-ce8dc2140402.
Another Nursery related topcrasher in Firefox 31.0a1, first appearing in the 2014032903 build, currently with 200 out of 5697 crashes.
Terrence, there are a bunch of related-looking crash signatures coming up since 3-29, I'm wondering if you want me to mark them all as related to any particular bug. For now I'll cc you on them since I think it might be one of your commits from 3-28...
Flags: needinfo?(terrence)
This is currently #9 (1.73%) in Firefox Nightly 31.0a1.
Summary: crash in GetObjectAllocKindForCopy → [GGC?] crash in GetObjectAllocKindForCopy
In a crash comment it's saying "Always crushes on Soundcloud".
Unable to reproduce, 31.0a1 (2014-04-03), win 7 x64.
I can trigger a crash by browsing soundcloud with an optdebug build.  I get the some information, but then gdb dies with an internal error:

Assertion failure: MIR instruction returned value with unexpected type, at /home/jon/work/rooting/js/src/jit/IonMacroAssembler.cpp:1274

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007fffd241f26f in ?? ()
(gdb) bt
#0  0x00007fffd241f26f in ?? ()
#1  0x0000000000000009 in ?? ()
#2  0x00007fffc0211ce0 in ?? ()
#3  0x00007fffffffa2f0 in ?? ()
#4  0x00007fffd1cf4a80 in ?? ()
#5  0x00007fffd317a0e0 in ?? ()
#6  0x00007fffffff9570 in ?? ()
#7  0x00007fffd941bdcc in ?? ()
#8  0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id         Frame 
  57   Thread 0x7fffe2b22700 (LWP 10517) "ImageDecoder #4" 
/build/buildd/gdb-7.6.1/gdb/dwarf2read.c:10350: internal-error: dwarf2_record_block_ranges: Assertion `dwarf2_per_objfile->ranges.readin' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Summary: [GGC?] crash in GetObjectAllocKindForCopy → crash in GetObjectAllocKindForCopy
Whiteboard: [GGC]
Almost all of these are 0xffffffffffffff87 or 0xffffffffffffff85, aka -121 or -123 (which is a large odd negative offset to a NULL pointer.)

The failure is in the line:

    if (obj->is<ArrayObject>()) {

when obj->getClass() is called. obj is a valid pointer. getClass() is type_->clasp(). clasp() is just this->clasp_ here, I think (a plain field access).

The assembly is:

71EEA3A3  mov         eax,dword ptr [ecx+4]  
71EEA3A6  mov         eax,dword ptr [eax]  <-- crash is here

obj is 0x08904ed0, which is in $ecx. type_ is at offset 4. $eax is 0xffffff87. So this isn't an offset from a nullptr, at least not directly. The 0xffffff87 value was stored in type_ (which is a HeapPtrTypeObject.)

terrence, when moving things out of the nursery, does the type_ field get overwritten with a forwarding pointer or something like it? I thought that only touched the first field (shape_), but I don't actually know.

These values are for 32-bit Windows, btw.
Ok, so when moving stuff out of the nursery, everything is reinterpreted as a RelocationOverlay. Which would put 0xbad0bad1 into shape_ and a destination pointer into type_. So if it's the nursery scrozzling this pointer, it's miscomputing the destination pointer.

Unfortunately, the minidump does not contain the memory at 'obj', so I can't look at eg the shape_ pointer.
Blocks: 994589
One crash since bug 992535 landed, so it looks like failure to mark a nursery slot can indeed reliably lead to this behavior. So the pointer we're looking through is likely pointing at the nursery after it has been dead. Oddly, the behavior is the same both before and after landing nursery poisoning. I'm not sure how this makes any sense.
Flags: needinfo?(terrence)
From crash-stats, this is still looking like another issue fixed by bug 992535 which landed in the 2014040803, like bug 991746.  Since then, there have only been a couple of crashes on builds after 0408. This is no longer a topcrasher so may not need to be tracked for 31; however it also isn't completely gone. 

Terence, thanks very much for your explanations!
Untrack (cf comment #9). Thanks Liz!
(In reply to Sylvestre Ledru [:sylvestre] from comment #10)
> Untrack (cf comment #9). Thanks Liz!

We should only do this if we are really sure this is solved. For one thing, GGC crashes can all easily be security issues and for the other, we paid dearly for early untracking in bug 970483.
No update on this for a month and beta 8 is going to be released today. Untracking.
It seems that there might be paths that update the AccessorShapeness of a Shape without shrinking the Shape. This doesn't explain how we can get from a valid getter/setterObj pointer to garbage in that slot, but it's the only lead we really have at this point. This moves the getter/setterObj barrier code up to the shape so that we can double-check hasGetterObject() before updating the barrier. Let's check this in and see if our crash stats move.
Attachment #8588780 - Flags: review?(jcoppeard)
Comment on attachment 8588780 [details] [diff] [review]

Review of attachment 8588780 [details] [diff] [review]:

Looks good!

::: js/src/vm/Shape.h
@@ +1256,5 @@
> +    MOZ_ASSERT(shape);
> +    MOZ_ASSERT(obj);
> +    if (gc::StoreBuffer* sb = reinterpret_cast<gc::Cell*>(obj)->storeBuffer())
> +        sb->putGeneric(ShapeGetterSetterRef(shape, obj));
> +}

Possible improvement: We could change this to just take a shape and check whether either getter or setter (if present) are in the nursery, and then we only push one store buffer entry maximum.
Attachment #8588780 - Flags: review?(jcoppeard) → review+
Crash volume is dramatically reduced with this landing. The remaining trickle appears to be a mixed bag. I'll give some thought to how we can narrow down the common cases.
There are no crash reports with this signature since FF39, so closing this.
Closed: 4 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.