Closed Bug 931812 Opened 6 years ago Closed 6 years ago

GenerationalGC: Browser build test jittest failures: js/src/jit-test/tests/gc/bug-913261.js

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: jonco, Assigned: jonco)

References

Details

Attachments

(3 files)

This recently started failing in browser builds.  It works fine in the shell.

This test also fails intermittently in non-GGC builds, see bug 927754.
Actually this does reproduce in the shell, in 32 bit linux builds:

Catchpoint 1 (signal SIGSEGV), 0x08111f2a in js::gc::StoreBuffer::CellPtrEdge::isNullEdge (
    this=0x970e0f0) at ../gc/StoreBuffer.h:237
237	            return !*edge;

Catchpoint 1 (signal SIGSEGV), 0x08111f2a in js::gc::StoreBuffer::CellPtrEdge::isNullEdge (
    this=0x970e0f0) at ../gc/StoreBuffer.h:237
237	            return !*edge;
(gdb) bt
#0  0x08111f2a in js::gc::StoreBuffer::CellPtrEdge::isNullEdge (this=0x970e0f0)
    at ../gc/StoreBuffer.h:237
#1  0x08112ac8 in js::gc::StoreBuffer::MonoTypeBuffer<js::gc::StoreBuffer::CellPtrEdge>::mark (this=0x9615a00, trc=0xfff11304) at /home/jon/work/rooting/js/src/gc/StoreBuffer.cpp:118
#2  0x08111201 in js::gc::StoreBuffer::mark (this=0x9615938, trc=0xfff11304)
    at /home/jon/work/rooting/js/src/gc/StoreBuffer.cpp:281
#3  0x087509de in js::Nursery::collect (this=0x96158d8, rt=0x9614e20, 
    reason=JS::gcreason::FULL_STORE_BUFFER)
    at /home/jon/work/rooting/js/src/gc/Nursery.cpp:603
#4  0x081cbb98 in js::MinorGC (rt=0x9614e20, reason=JS::gcreason::FULL_STORE_BUFFER)
    at /home/jon/work/rooting/js/src/jsgc.cpp:4799
#5  0x0818133c in js_InvokeOperationCallback (cx=0x9628170)
    at /home/jon/work/rooting/js/src/jscntxt.cpp:1014
#6  0x081813b4 in js_HandleExecutionInterrupt (cx=0x9628170)
    at /home/jon/work/rooting/js/src/jscntxt.cpp:1038
#7  0x086a13f2 in js::jit::InterruptCheck (cx=0x9628170)
    at /home/jon/work/rooting/js/src/jit/VMFunctions.cpp:471
#8  0x0869fd75 in js::jit::CheckOverRecursed (cx=0x9628170)
    at /home/jon/work/rooting/js/src/jit/VMFunctions.cpp:126

This is actually the same as the original bug, bug 913261.
From some manual bisection, it looks like this was triggered by the fix for bug 928971.

This suggests that the script is now being compiled where it wasn't before and that's where the problem is, and running with --no-ion does indeed makes the test pass.
Assignee: nobody → jcoppeard
This is what I think's happening:

A minor GC is marking the watchpoints table as part of the root set.  The table has movable JSObjects as part of the key and part of the value.  The value objects are stored in a RelocatablePtrObject, and some of them have store buffer entries.

When the value is marked, the RelocatablePtrObject is cast to its superclass EnclapsulatedPtrObject and passed to MarkObject.  This updates the value in place, and does not trigger the postbarrier to remove the storebuffer entry.

Many of the watchpoint table keys are also moved, and table entries rekeyed as part of the marking process.  This causes the hashtable to double its size due to the number of removed entries, and the table is reallocated.  

Now when the store buffer gets marked, some of the entries point into the old hash table which has been unmapped by this point, and the program crashes.

There are several issues here:
 - we allow RelocatablePtr (and related) objects to be silently cast to EncapsulatedPtr, circumventing the postbarriers
 - I was surprised that the hash table resized itself in response to rekeying, but I don't know if this is actually a bug
 - since we always mark these values in a minot GC, we don't need to postbarrier them at all, as I pointed out myself in bug 924690

I think the simplest fix is to to mark the store buffer first in a minor GC - this means we don't have to trigger postbarriers on further marking, since marking can only remove entries from the store buffer, and if we mark it first then it will always be empty.  I'm testing this at the moment.

I made an attempt to disconnect the inheritance of RelocatablePtr etc from EncapsulatedPtr, but this didn't turn up any other places where we did this.  It may be worth doing it for safety's sake though.

We should also remove the postbarriering of weakmaps totally, which also fixes the problem.
Patch to mark store buffer first.
Attachment #824171 - Flags: review?(terrence)
Attachment #824171 - Attachment description: mark-sb-first → 1 - Mark store buffer first
Comment on attachment 824171 [details] [diff] [review]
1 - Mark store buffer first

Review of attachment 824171 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the late review. I was trying to remember why I moved the store buffer marking down from the top to after MarkRuntime originally. I think this was because it needed to happen after the full heap marking when we were doing that on overflow, for reasons I still don't remember. I agree this is a better spot for it and I suppose the fuzzers will tell us if we are wrong.

(In reply to Jon Coppeard (:jonco) from comment #3)
> This is what I think's happening:
> 
> A minor GC is marking the watchpoints table as part of the root set.  The
> table has movable JSObjects as part of the key and part of the value.  The
> value objects are stored in a RelocatablePtrObject, and some of them have
> store buffer entries.

That is indeed rather silly. I wonder why we ended up marking through both paths. I guess I probably just forgot that those are marked by the runtime.

> When the value is marked, the RelocatablePtrObject is cast to its superclass
> EnclapsulatedPtrObject and passed to MarkObject.  This updates the value in
> place, and does not trigger the postbarrier to remove the storebuffer entry.

Sadly, this is the only thing that isn't working as planned. Even sadder, it is working as intended: Bill and I decided to make those subclasses so that we would only need one implementation for the markers. I thought that I had added an overload specifically for Relocatable because of an earlier fuzz bug, but I guess that's not there now?

> Many of the watchpoint table keys are also moved, and table entries rekeyed
> as part of the marking process.  This causes the hashtable to double its
> size due to the number of removed entries, and the table is reallocated.  

I think the idea is that it is allowed to resize, but it is safe on OOM failure.

> Now when the store buffer gets marked, some of the entries point into the
> old hash table which has been unmapped by this point, and the program
> crashes.
> 
> There are several issues here:
>  - we allow RelocatablePtr (and related) objects to be silently cast to
> EncapsulatedPtr, circumventing the postbarriers
>  - I was surprised that the hash table resized itself in response to
> rekeying, but I don't know if this is actually a bug
>  - since we always mark these values in a minot GC, we don't need to
> postbarrier them at all, as I pointed out myself in bug 924690
> 
> I think the simplest fix is to to mark the store buffer first in a minor GC
> - this means we don't have to trigger postbarriers on further marking, since
> marking can only remove entries from the store buffer, and if we mark it
> first then it will always be empty.  I'm testing this at the moment.

I agree we should do this to solve the short term problem, but we really need a less fiddly solution for this in the long term. Ideally we could reset the store buffer as part of ::mark and then assert that the store buffer is empty at the end of the minor GC. This would be bug 887030. There are some major challenges though: we'd have to switch all of our hashtables over to custom barriers with attendant manual insertions. And we'd have to find some way to deal with Heap<T>, maybe MOZ_HEAP_CLASS.

> I made an attempt to disconnect the inheritance of RelocatablePtr etc from
> EncapsulatedPtr, but this didn't turn up any other places where we did this.
> It may be worth doing it for safety's sake though.

Yes, that would probably be worth doing. It sounds like you've even got patches ready.

> We should also remove the postbarriering of weakmaps totally, which also
> fixes the problem.

You mean watchpoints, right?
Attachment #824171 - Flags: review?(terrence) → review+
Comment on attachment 824172 [details] [diff] [review]
2 - remove unnecessary postbarriering of watchpoints table

Review of attachment 824172 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #824172 - Flags: review?(terrence) → review+
(In reply to Terrence Cole [:terrence] from comment #6)

Thanks for the review.  I'll work on the RelocatablePtr patches next then.

> You mean watchpoints, right?

Yeah, I mean watchpoints.
Blocks: 931984
Brian is hitting this trying to land bug 931984. I told him to go ahead and land these patches with his changes, although the tree has been closed all day. If the tree does open up before one of us gets to it, please land these.
Whiteboard: [leave open]
This patch adds a set of BarrieredFoo base classes, which are separately derived by EncapsulatedFoo, HeapFoo and RelocatableFoo classes.

The base class doesn't provide any options to change the contained value (except for unsafeGet(), which is used by the Mark functions).

This prevents accidental passing of RelocatableFoo as EncapsulatedFoo, which would have then allowed the contained value to be changed without invoking the post barriers.
Attachment #825833 - Flags: review?(terrence)
Comment on attachment 825833 [details] [diff] [review]
3 - disallow casting of RelocatableFoo to EncapsulatedFoo

Review of attachment 825833 [details] [diff] [review]:
-----------------------------------------------------------------

This is a nice cleanup; I particularly love how this explicitly fires no barriers during marking: it should get us most of the way to removing JSRuntime::needsBarriers. Unfortunately, however, you forgot about the pre-barriers that were implicitly fired by Encapsulated; they're now missing from all of the classes that used to derive from Encapsulated.

::: js/src/gc/Barrier.h
@@ +347,5 @@
>   *
>   * Not to be confused with JS::Heap<T>.
>   */
>  template <class T, class Unioned = uintptr_t>
> +class HeapPtr : public BarrieredPtr<T, Unioned>

I think HeapPtr is going to be missing pre-barriers now since it doesn't derive from Encapsulated anymore?

@@ +418,5 @@
>      }
>  };
>  
>  template <class T>
> +class RelocatablePtr : public BarrieredPtr<T>

Ditto for Relocatable.

@@ +663,5 @@
>   * A pre- and post-barriered heap JS::Value, for use inside the JS engine.
>   *
>   * Not to be confused with JS::Heap<JS::Value>.
>   */
> +class HeapValue : public BarrieredValue

Ditto.

@@ +763,5 @@
>          writeBarrierPost(rt, value, &value);
>      }
>  };
>  
> +class RelocatableValue : public BarrieredValue

Ditto.

@@ +1059,5 @@
>   * A pre- and post-barriered heap jsid, for use inside the JS engine.
>   *
>   * Not to be confused with JS::Heap<jsid>.
>   */
> +class HeapId : public BarrieredId

Ditto.
Attachment #825833 - Flags: review?(terrence)
(In reply to Terrence Cole [:terrence] from comment #12)

Where do these pre-barriers get implicitly fired?  The base class still calls pre() in its destructor.  EncapsulatedFoo currently calls pre() in its assignment operator methods, but these were overridden in both HeapFoo and RelocatableFoo.  The overrides do also call pre().

Am I missing something?
Comment on attachment 825833 [details] [diff] [review]
3 - disallow casting of RelocatableFoo to EncapsulatedFoo

Review of attachment 825833 [details] [diff] [review]:
-----------------------------------------------------------------

You are quite right: I missed that ~BarrieredPtr calls pre(). Sorry, r=me.
Attachment #825833 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/da5df68e8857
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
You need to log in before you can comment on or make changes to this bug.