Closed Bug 456826 Opened 13 years ago Closed 13 years ago

TM: fatal assert on OOM when running jit code

Categories

(Core :: JavaScript Engine, defect, P1)

x86
macOS
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: bzbarsky, Assigned: gal)

References

Details

(Keywords: verified1.9.1)

Attachments

(4 files, 8 obsolete files)

1.85 KB, text/plain
Details
16.35 KB, patch
Details | Diff | Splinter Review
6.93 KB, patch
gal
: review+
Details | Diff | Splinter Review
4.41 KB, text/plain
Details
Attached file Testcase
JS shell testcase attached.  It dies with the default maxBytes too, but the lower number might make it easier to debug.  Basically, we go to js_BoxDouble from jit code, fail to do so, side-exit, go to restore our stack, and fail to do _that_, with the result that we have fp->slots pointing into a block of 0xdadadada.  Then later the GC tracer tries to trace these pointers, and becomes highly unhappy, and asserts.

Note that a debug build is needed to guarantee reliable crashing, since in an opt build we'll just trace random memory, which might succeed.
Flags: blocking1.9.1?
Severity: normal → critical
No longer blocks: 449534
Blocks: 449534
Assignee: general → gal
Priority: -- → P1
Ok, here is what I think happens:

We call FlushNative*Frame twice, once in the deep frame recovery for nested trees, and then for the actual frame on the top. This is not safe since the first causes the GC to trigger and we die a horrible death since references have been collected that the native frame still holds.
Attached patch prototype for pool creation (obsolete) — Splinter Review
Here is a patch that adds support for set-aside double pool. The pool is stored as a per-thread JSTraceMonitor.doubleRecoveryList, which is a list of ready to use doubles. It is created on the first trace after the last GC.

It is assumed that the only safe usage for the pool is to take doubles from it right before an explicit js_GC call.
So how come we're only worried about doubles?  What about objects, say?  I'm avoiding temp objects in this code, but if someone doesn't do that..

I guess because we don't need to alloc those for stack restoration?
(In reply to comment #3)
> So how come we're only worried about doubles?  What about objects, say?

Because objects are not inlined and allocated as-is. The problem for doubles comes form the existence of js_BoxDouble.
Attachment #340356 - Attachment is obsolete: true
Attachment #340390 - Attachment is obsolete: true
Attachment #340398 - Attachment is obsolete: true
Attachment #340398 - Attachment is obsolete: false
Attachment #340398 - Attachment is patch: true
Attachment #340398 - Attachment mime type: application/octet-stream → text/plain
Attachment #340398 - Attachment is obsolete: true
Attachment #340415 - Attachment is patch: true
Attachment #340415 - Attachment mime type: application/octet-stream → text/plain
With the patch the test case does work without gcparam(), but using gcparam("maxBytes", 200*1000); silently fails.
Attachment #340415 - Attachment is obsolete: true
Good progress. Thanks a lot Igor.
Attachment #340433 - Attachment is obsolete: true
Attachment #340467 - Flags: review?(brendan)
Could I see the nit-picked, latest and greatest patch?

/be
Comment on attachment 340467 [details] [diff] [review]
Use a recovery pool of doubles to make sure we can safely recover in case of OOM or out of doubles. Doesn't run the test case to completion under memory pressure. Needs debugging.

>+    jsdouble              **recoveryDoublePool;
>+    jsdouble              **recoveryDoublePoolPtr;

Nit about C++-style declarator and type cuddling already fixed.

>+    /* replenish the reserve pool (this might trigger a GC */
>+    if (tm->recoveryDoublePoolPtr <= tm->recoveryDoublePool + MAX_NATIVE_STACK_SLOTS) {

The condition here should be <  not <= -- right?

>+        JSRuntime* rt = cx->runtime;
>+        uintN gcNumber = rt->gcNumber;
>+        const void* ip = f->ip;
>+        if (!ReplenishReservePool(cx, tm) || (gcNumber != rt->gcNumber)) {

Don't overparenthesize higher-precedence ops against || and &&.

>+            *treep = tm->fragmento->newLoop(ip);

Is this happening? Might be worth breakpointing and stress-testing.

>@@ -2605,17 +2624,17 @@ js_ExecuteTree(JSContext* cx, Fragment**
> 
> #ifdef DEBUG
>     // Verify that our state restoration worked
>     for (JSStackFrame* fp = cx->fp; fp; fp = fp->down) {
>         JS_ASSERT(!fp->callee || JSVAL_IS_OBJECT(fp->argv[-1]));
>         JS_ASSERT(!fp->callee || fp->thisp == JSVAL_TO_OBJECT(fp->argv[-1]));
>     }
> #endif
>-
>+    
>     AUDIT(sideExitIntoInterpreter);

Please don't add trailing whitespace.

r=me with all nits picked and whatever debugging ideas the above provokes investigated.

/be
Attachment #340467 - Flags: review?(brendan) → review+
About to push into m-c so closing.

http://hg.mozilla.org/tracemonkey/rev/ca0b904b4b26
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 340504 [details] [diff] [review]
didGC was cross-initialized (goto over 1st initialization), causing excessive GC. Good catch gcc -O3!

>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
>+
>+static bool
>+ReplenishReservePool(JSContext* cx, JSTraceMonitor* tm)
>+{
>+    while (tm->recoveryDoublePoolPtr < tm->recoveryDoublePool + MAX_NATIVE_STACK_SLOTS) {
>+        jsval v;
>+        if (!js_NewDoubleInRootedValue(cx, 0.0, &v))
>+            return false;
>+        *tm->recoveryDoublePoolPtr++ = JSVAL_TO_DOUBLE(v);
>+    }
>+    return true;
>+}

> 
>+    /* replenish the reserve pool (this might trigger a GC */
>+    if (tm->recoveryDoublePoolPtr <= tm->recoveryDoublePool + MAX_NATIVE_STACK_SLOTS) {

Sould it be "<", not "<=" here?

>         tm->globalSlots = new (&gc) SlotList();
>         tm->globalTypeMap = new (&gc) TypeMap();
>+        tm->recoveryDoublePoolPtr = tm->recoveryDoublePool = new double*[MAX_NATIVE_STACK_SLOTS];

Why not new (gc) double*[MAX_NATIVE_STACK_SLOTS]?

> 
> extern void
> js_FinishJIT(JSTraceMonitor *tm)
>@@ -2819,24 +2839,26 @@ js_FinishJIT(JSTraceMonitor *tm)
...
>         tm->fragmento = NULL;
>         delete tm->globalSlots;
>         tm->globalSlots = NULL;
>         delete tm->globalTypeMap;
>         tm->globalTypeMap = NULL;
>+        delete tm->recoveryDoublePool;

It should be delete[] tm->recoveryDoublePool, not delete here.
The new (gc) thing is for nanojit stuff only. Long story. Its a dummy for us.

Will fix the others.
Wow I take that back. Who added all those (&gc) allocator calls? David?
(In reply to comment #17)
> The new (gc) thing is for nanojit stuff only. Long story. Its a dummy for us.

But should the code at least check for null recoveryDoublePool?
Attached patch igor's fixes (obsolete) — Splinter Review
Attachment #340525 - Flags: review?
Attachment #340525 - Flags: review? → review?(igor)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Check for null where? Feel free to grab the bug and patch it up and I reviewed. Might be easier to see that way what you mean :)
Comment on attachment 340504 [details] [diff] [review]
didGC was cross-initialized (goto over 1st initialization), causing excessive GC. Good catch gcc -O3!

>+static bool
>+ReplenishReservePool(JSContext* cx, JSTraceMonitor* tm)
>+{
>+    while (tm->recoveryDoublePoolPtr < tm->recoveryDoublePool + MAX_NATIVE_STACK_SLOTS) {
>+        jsval v;
>+        if (!js_NewDoubleInRootedValue(cx, 0.0, &v))
>+            return false;

When the GC happens in js_NewDoubleInRootedValue, it will reset tm->recoveryDoublePoolPtr back to tm->recoveryDoublePool potentially leading to an infinite loop here when the GC could not free enough memory.
Good point. Not sure how to fix that though. This would happen for total memory < MAX_NATIVE_STACK_SLOTS*sizeof(double)?
I guess we could guard with a 2nd counter and if ptr == pool and counter > 0 then return false;
Feel free to grab this one. I am getting too tired for serious thinking :)
(In reply to comment #20)
> Created an attachment (id=340525) [details]
> igor's fixes

I meant checking for out-of-memory on new. But the code does not do that in any case. I thought that new (gc) would at least set cx->throwing, but it is indeed just an alias for calloc.
Yes, the sad sob story here is that Adobe insists on not initializing members in the constructor, so we have to make sure we calloc it. Adobe's GC takes care of clearing pages on free (ours doesn't).
(In reply to comment #27)
> Yes, the sad sob story here is that Adobe insists on not initializing members
> in the constructor, so we have to make sure we calloc it.

And they would not accept patches that makes nanojit to recover from OOM in new? It is straightforward to trigger real OOM in the browser and those null pointers may well lead to exploitable bugs.
The new version moves the gc run detection to ReplenishReservePool. I also changed the type of recoveryDoublePoolPtr to jsval to better match js_NewDoubleInRootedValue.
Attachment #340525 - Attachment is obsolete: true
Attachment #340530 - Flags: review?
Attachment #340525 - Flags: review?(igor)
Attachment #340530 - Flags: review? → review?(gal)
(In reply to comment #29)
> Created an attachment (id=340530) [details]
> fixing refil to detect GC.
> 
> The new version moves the gc run detection to ReplenishReservePool. I also
> changed the type of recoveryDoublePoolPtr to jsval to better match
> js_NewDoubleInRootedValue.

Note that the code in js_ExecuteTree ignores OOM reports from ReplenishReservePool and just treat them as if GC run, but OOM is not treated in any way there.
No, I am sure OOM handling is welcome in nanojit, just initializing fields in constructors in a no-no.
I don't particularly like the change to jsval. Since they are all doubles we can as well also say double*. r=me with the charge to think about that and maybe reconsider? ;)
Attachment #340530 - Flags: review?(gal) → review+
(In reply to comment #32)
> I don't particularly like the change to jsval.

That changes replaced JSVAL_TO_DOUBLE/DOUBLE_TO_JSVAL pair by single JSVAL_TO_DOUBLE and removed that:

jsval v;
js_NewDoubleInRootedValue(cx, 0.0, &v);

which is a mockery over that "rooted" suffix. Yes, it works here, but history has shown that such code has a tendency to evolve into exploitable GC hazards.


 Since they are all doubles we
> can as well also say double*. r=me with the charge to think about that and
> maybe reconsider? ;)
Brendan also favors your approach so lets go with it. I will apply the patch.
Slight speedup due to the <= to < fix (10ms on SS).

http://hg.mozilla.org/tracemonkey/rev/b44f455eae52
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
I do get the out of memory in the shell and don't get the assert, but do get the following errors:

./js1_6/extensions/regress-456826.js:93: Math is not defined
./js1_6/extensions/regress-456826.js:127: Date is not defined

what's up with that?
/cvsroot/mozilla/js/tests/js1_6/extensions/regress-456826.js,v  <--  regress-456826.js
initial revision: 1.1

http://hg.mozilla.org/mozilla-central/rev/b04c04268a94
Flags: in-testsuite+
Flags: in-litmus-
(In reply to comment #36)

> ./js1_6/extensions/regress-456826.js:93: Math is not defined
> ./js1_6/extensions/regress-456826.js:127: Date is not defined

filed Bug 461158
Flags: blocking1.9.1? → blocking1.9.1+
Keywords: fixed1.9.1
v 1.9.1, 1.9.2
Status: RESOLVED → VERIFIED
Depends on: 475146
You need to log in before you can comment on or make changes to this bug.