Closed Bug 945250 Opened 9 years ago Closed 9 years ago

GenerationalGC: Assertion failure: addr % CellSize == 0, at ../gc/Heap.h:1081

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: decoder, Assigned: terrence)

References

Details

(Keywords: assertion, testcase, Whiteboard: [jsbugmon:ignore])

Attachments

(1 file, 1 obsolete file)

The following testcase asserts on mozilla-central built with --enable-exact-rooting --enable-gcgenerational, revision 53d55d2d0a25 (run with --fuzzing-safe):


var invalidTimeZoneNames = [ "", ];
try {
invalidTimeZoneNames.forEach(function (name) {
        var format = new Intl.DateTimeFormat(["de-de"], {timeZone: name});
});
} catch(exc1) {}
setObjectMetadataCallback(function(obj) {});
Assignee: general → terrence
Bugs like this are the reason I have grey hair. Good find Christian!

Sequence of events:
- Allocation of an ICGetProp_NativePrototype into the ICStubSpace LifoAlloc.
- The IC contains a HeapPtrObject; this puts a storebuffer entry referencing the LifoAlloc.
- |setObjectMetadataCallback| calls ReleaseAllJITCode, destructing the baseline script, which destructs the ICStubSpace LifoAlloc.
- The next MinorGC marks through the storebuffer entry pointing into the freed LifoAlloc.

Broken Assumptions / Expectation:
1) ICStubSpace is only destructed at "reasonable" times: e.g. with an empty nursery.
2) Use of a CellIter which impacts the validity of store-buffer entries.
3) Visual inspection and assertions find that we are using CellIter to visit "all" objects when the nursery is not empty.
4) More assertions find that we use some CellIterUnderGC sometimes not under GC.

Resolution:
1) Assert that the nursery is empty when we destruct ICStubSpace.
2+3) Trigger a MinorGC in CellIter's constructor.
4) Manual MinorGC in these cases.

It's going to take me a few more caffeinated hours to track down all the places where we're falling over for this. Hopefully I'll have time to do this tomorrow.
There is, unfortunately, a huge amount going on in this patch. Adding assertions kept finding new ways that we can get into broken states: I had to add and revamp several mechanisms to make all of the new assertions work together.

1) CellIter - The crux of this fuzz bug: if someone uses CellIter to look at "all" things and some of the things are still in the nursery, this is obviously bad. Adding an assertion here kept me backtracking until I had a MinorGC in front of basically every CellIter, so I just added the MinorGC to the CellIter constructor.

2) CellIterUnderGC - Unlike CellIter, this simply asserts as the nursery should be empty as we /should/ be empty when under GC. Unforuntately, this isn't 100% true as some uses of CellIterUnderGC are not under GC: most notably gc/Iteration.cpp. For these cases I added manual MinorGC in front of the iteration. I'm sad there are so many of these, but the new assertion should at least keep us from regressing.

3) Pre-barrier verifier - We need to evict the nursery here too or the verification is not complete when running with GGC. Sadly we enter and leave the verifier in pretty much every heap state and recursively from MinorGC. The cleanest I was able to get this is to manually evict the nursery before updating the heap state when entering the verifier and ensuring we check Nursery::isEmpty in Nursery::collect before infinitely recursing. Then the verifier can disable GGC for the duration without hitting the heap state assertions (because the nursery is already empty).

4) setObjectMetadataCallback - Since we now have two mechanisms that disable GGC from the shell -- the pre barrier verifier and the object metadata callback -- if the two nest incorrectly then we accidentally re-enable GGC when it should still be disabled. I made gcGenerationalEnabled into a disabled counter so that these can nest.

5) setObjectMetadataCallback (again) - The tests can (and do) call this function with a null callback multiple times in a row, breaking our nesting assumptions. I moved the disabling from the top level to just being around the actual invocation of the callback. This is better anyway. Sadly, these can recurse so we do actually need the stacky disabling still.

6) Zeal7 - Now that we are leaning heavily on our ability to test if the nursery is empty, we need to be able to do this correctly in zeal 7 too. This is pretty simple, but we do need a new variable in Nursery. I think I've added adequate asserts to ensure this doesn't get out-of-date.
Attachment #8344779 - Flags: review?(wmccloskey)
Comment on attachment 8344779 [details] [diff] [review]
evict_nursery_for_celliter-v0.diff

Bill appears to be out this week; switching review over, since this is blocking us now.
Attachment #8344779 - Flags: review?(wmccloskey) → review?(jcoppeard)
Comment on attachment 8344779 [details] [diff] [review]
evict_nursery_for_celliter-v0.diff

Review of attachment 8344779 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, just a couple of very minor comments.

::: js/src/gc/Nursery.cpp
@@ +76,5 @@
>  js::Nursery::enable()
>  {
>      if (isEnabled())
>          return;
> +    JS_ASSERT(isEmpty());

This isn't going to assert position() == currentStart_ because isEmpty() checks isEnabled() first.

::: js/src/jsfriendapi.cpp
@@ +936,2 @@
>  #ifdef JSGC_GENERATIONAL
>      MinorGC(rt, JS::gcreason::API);

This works, but do we want |if (IsGenerationalGCEnabled(rt))| here?  It might be clearer.
Attachment #8344779 - Flags: review?(jcoppeard) → review+
A checkin-ready copy.
Attachment #8344779 - Attachment is obsolete: true
Attachment #8346955 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/04c1449b8497
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
What's going on with the object metadata callback changes in this patch?  The {Enable,Disable}GenerationalGC changes cause the testcase in bug 951213 to crash with a stock build (causing that bug to be backed out), asserting !empty() under the disable() call.  It seems rather strange that after a MinorGC() the nursery is not empty, but in any event calling the metadata callback is different from setting it in the first place and the problem with this bug seems to be that ReleaseAllJITCode doesn't trigger a minor GC.

I don't see any reason why calling the metadata callback should require the nursery to be disabled.  There isn't any difference in BaseShape between the metadata pointer and parent pointer, and the parent can be in the nursery.  The initial shape table keys things based off the metadata pointer, but also does that for the prototype, and the prototype can be in the nursery.

Even if the metadata pointer can't be in the nursery, there is a much less heavyweight way of ensuring that is the case --- the metadata callback is now C++ only and can't reenter JS, so just assert it doesn't give back a metadata pointer in the nursery.  The metadata callback is intended to be used in a devtools like setting in opt builds in real browsers, so enabling/disabling the nursery (with a minor GC) on every object allocation would destroy performance.
Flags: needinfo?(terrence)
Blocks: 951213
(In reply to Brian Hackett (:bhackett) from comment #8)
> What's going on with the object metadata callback changes in this patch? 

The rational here is the rather dumb jit-test that just unconditionally and repeatedly calls everything on the global in a loop: e.g. all the shell test functions. By happenstance, this enters the post-barrier verifier (disabling GGC), then calls setObjectMetadataCallback(null), which re-enables GGC, which then allows things to be allocated in the Nursery, which causes the post barrier verifier to assert. Additionally, this same test also calls setObjectMetadataCallback(null) many times in a row, so any trivial stacky solution with nice assertions doesn't work either.

> The {Enable,Disable}GenerationalGC changes cause the testcase in bug 951213
> to crash with a stock build (causing that bug to be backed out), asserting
> !empty() under the disable() call.

I think you mean JS_ASSERT(isEmpty()) at gc/Nursery.cpp:94?

> It seems rather strange that after a
> MinorGC() the nursery is not empty,
> but in any event calling the metadata
> callback is different from setting it in the first place and the problem
> with this bug seems to be that ReleaseAllJITCode doesn't trigger a minor GC.

CellIter::CellIter does a MinorGC, so I'm not sure how this is possible.

> I don't see any reason why calling the metadata callback should require the
> nursery to be disabled.  There isn't any difference in BaseShape between the
> metadata pointer and parent pointer, and the parent can be in the nursery. 
> The initial shape table keys things based off the metadata pointer, but also
> does that for the prototype, and the prototype can be in the nursery.

Makes sense.
 
> Even if the metadata pointer can't be in the nursery, there is a much less
> heavyweight way of ensuring that is the case --- the metadata callback is
> now C++ only and can't reenter JS,

It does re-enter itself, however, so I needed to introduce a stacky enable/disable for this.

> so just assert it doesn't give back a
> metadata pointer in the nursery.  The metadata callback is intended to be
> used in a devtools like setting in opt builds in real browsers, so
> enabling/disabling the nursery (with a minor GC) on every object allocation
> would destroy performance.

Ah. I concluded that you must have had an excellent reason to disable GGC, so I just assumed the called code was both not under our control and of no performance concern. Since all of the other debugger mechanisms already work fine with GGC and it's not impossible here, we should put in the effort to make it work. I'll file a bug.
Flags: needinfo?(terrence)
Keywords: verifyme
In case this is covered automatically, can I find anywhere mozilla-central built with --enable-exact-rooting --enable-gcgenerational?

I can build it myself if not, but it takes a while on my machine.
Flags: needinfo?(choller)
Flags: in-testsuite?
Ioana, you'd probably have to build it yourself to be sure.
You can find builds on Maple (https://tbpl.mozilla.org/?tree=Maple). We should probably just add the test to the suite though.
Thanks guys!

I didn't find Maple builds old enough to still contain this bug (I need to reproduce it locally first, so I can verify it reliably). So I built a mozilla-central JS shell with --enable-exact-rooting --enable-gcgenerational (revision 53d55d2d0a25) and I ran the testcase in comment 0 with --fuzzing-safe. I get no assertions or any other issues.

If anyone has any ideas of what else I could try to reproduce it, let me know. Otherwise, I think Terrence is right and we should just add the test to the suite.
Flags: needinfo?(choller)
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.