Closed Bug 961091 Opened 6 years ago Closed 6 years ago

GenerationalGC: Crashtest 693212.xhtml fails on MacOSX platforms only

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: jonco, Assigned: jonco)

References

Details

Attachments

(3 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=33165753&tree=Try#error0

04:56:08     INFO -  REFTEST TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/content/base/crashtests/693212.xhtml | load failed: timed out waiting for test to complete (waiting for onload scripts to complete)
We are doing hundreds of store buffer compactions for this case, as it seems the operation callback is not being triggered.
So this is quite a pathological case, but this is what is happening:

The test sets an onDOMSubtreeModified handler on a DOM element which also modifies that element, then causes it to fire.  This leads to infinite recursion between the DOM modifying the element and the JS engine running the handler.  This is aborted at some point when we hit the recursion limit (I'm not sure how this actually happens).  The stack unwinds and the DOM creates an exception object for every invocation of the JS engine on the way down.  These objects contain a stack backtrace created by JS::DescribeStack(), each one containing up to 100 FrameDescription objects, these containing a couple of Heap<T>s.  

Constructing these Heap<T>s adds store buffer entries, and we soon fill up the store buffer and trigger the operation handler so that a GC will happen.  But we don't ever actually invoke the operation handler because the interrupt flag is not checked on this path.  We check it (AIUI) on script entry and loop backedges - neither of which we have here.  So, we end up compacting the store buffer for every entry added, which makes this very slow.  Also, we end up growing the store buffer beyond its intended size, which risks an unhandleable OOM.
Assignee: nobody → jcoppeard
Attached patch check-interrupts-on-exit (obsolete) — Splinter Review
So the first thing that can be done to improve this is to make sure the operation callback gets invoked and we get a GC if one has been triggered.

This is a patch to check the interrupt flag before we leave the engine following an Invoke() call.  It seems like this would be sensible to do anyway, rather than allowing the possibility of returning to the browser with an interrupt pending.

I checked and this does happen already in normal browser operation, albeit rarely.  

An alternative to this might be to check for pending GC on allocation of a GC thing.
Attachment #8370011 - Flags: review?(terrence)
The next thing we can do is create fewer storebuffer entries by not constructing FrameDescription and hence Heap<T>s on the stack.
Attachment #8370012 - Flags: review?(terrence)
Finally, after we have set our 'about to overflow' flag and triggered a GC, we don't need to compact the storebuffers unless they actually reach their maximum size.
Attachment #8370014 - Flags: review?(terrence)
Comment on attachment 8370014 [details] [diff] [review]
3 - less-compaction-after-gc-triggered

Review of attachment 8370014 [details] [diff] [review]:
-----------------------------------------------------------------

Great point! r=me
Attachment #8370014 - Flags: review?(terrence) → review+
Comment on attachment 8370012 [details] [diff] [review]
2 - frame-desc-fewer-postbarriers

Review of attachment 8370012 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8370012 - Flags: review?(terrence) → review+
Comment on attachment 8370011 [details] [diff] [review]
check-interrupts-on-exit

Review of attachment 8370011 [details] [diff] [review]:
-----------------------------------------------------------------

The invoke stack is already quite large, but it's still better than a goto. r=me

Please also add this check when allocating in the Nursery's interpreter path. If we are in the interpreter, object allocation perf is much less important. I'm pretty sure only objects are typically allocated in the jit though, so lets restrict the check to nursery allocations for now.
Attachment #8370011 - Flags: review?(terrence) → review+
Stealing to get this in, since Jon is on PTO for the next while.
Assignee: jcoppeard → terrence
And unstealing myself, since I got the dates wrong!
Assignee: terrence → jcoppeard
So the original patch caused some failures that took a while to track down.

The operation callback can attach Ion compilations that have been completed by worker threads, and it seems that this can clear pending exceptions.  This interferes with exception handling and makes jit test ion/bug945811.js fail intermittently.  I don't understand enough about how this works to say if this is a bug or expected.

Anyway, we can just check for whether a GC is necessary at these points instead.
Attachment #8370011 - Attachment is obsolete: true
Attachment #8371677 - Flags: review?(terrence)
Comment on attachment 8371677 [details] [diff] [review]
1 - check-interrupts-on-exit v2

Review of attachment 8371677 [details] [diff] [review]:
-----------------------------------------------------------------

I was worried that doing the full interrupt check would have that problem. This seems less likely to be problematic. r=me

::: js/src/jsgc.cpp
@@ +5060,5 @@
> +
> +#ifdef JSGC_GENERATIONAL
> +    if (rt->gcStoreBuffer.isAboutToOverflow())
> +        MinorGC(cx, JS::gcreason::FULL_STORE_BUFFER);
> +#endif

If gcIsNeeded, the minor GC we see will have gcTriggerReason, not FULL_STORE_BUFFER. I think we should move the MinorGC above the full GC to be sure we see the FULL_STORE_BUFFER with correct timing stats in our logging.

::: js/src/jsgcinlines.h
@@ +433,5 @@
>  #endif
>  
> +        if (rt->interrupt) {
> +            /*
> +             * Invoking the operation handler can fail, and we can't usefully

The two phrases here are complete, so I don't think the comma is necessary.

@@ +434,5 @@
>  
> +        if (rt->interrupt) {
> +            /*
> +             * Invoking the operation handler can fail, and we can't usefully
> +             * handle that here.  Just check in case we need to collect instead.

One space between sentences.

::: js/src/vm/Interpreter.cpp
@@ +493,5 @@
> +{
> +    bool ok = DoInvoke(cx, args, construct);
> +    js::gc::GCIfNeeded(cx);
> +    return ok;
> +}

It may be cleaner to do:

struct AutoGCIfNeeded {
    JSContext *cx_;
    AutoGCIfNeeded(JSContext *cx) : cx_(cx); {}
    ~AutoGCIfNeeded() { js::gc::GCIfNeeded(cx_); }
} gcIfNeeded(cx);

In the body of js::Invoke.
Attachment #8371677 - Flags: review?(terrence) → review+
(In reply to Terrence Cole [:terrence] from comment #12)

Great, yes this is a much better solution for Invoke().
Depends on: 975947
Depends on: 986864
No longer depends on: 986864
I happened across the AutoGCIfNeeded in js::Invoke and I was wondering if the fix to the problem in comment 2 could have instead been to add a CheckForInterrupt into the loop that was leading to the operation callback not being checked enough.  That is, it would seem that the root of the bug was us not checking the op callback frequently enough.
You need to log in before you can comment on or make changes to this bug.