Closed Bug 538275 Opened 15 years ago Closed 14 years ago

TM: Crash at regress-479381.js testcase

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking1.9.2 --- -
status1.9.2 --- wanted
blocking1.9.1 --- -
status1.9.1 --- wanted

People

(Reporter: gwagner, Assigned: igor)

References

Details

(Whiteboard: [sg:moderate][needs answer to comment 41 from shaver][3.6.x] fixed-in-tracemonkey)

Attachments

(2 files, 4 obsolete files)

I run the following testcase in a multithreaded shell: 


test();

function test()
{

  if (typeof gczeal != 'function' || typeof scatter != 'function')
  {
    print('Test skipped: requires mulithreads');
  }
  else
  {
    gczeal(2);
    function f() {
      var s;
      for (var i = 0; i < 9999; i++)
        s = 'a' + String(i)[3] + 'b';
      return s;
    }
    print(scatter([f, f, f, f]));
    gczeal(0);
  }
}

GDB says: 
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000001003d23c0
[Switching to process 27152]
0x00000001003d23c0 in ?? ()
(gdb) bt
#0  0x00000001003d23c0 in ?? ()
#1  0x00000001000760bf in JS_CallTracer (trc=0x101385ed0, thing=0x1003d2280, kind=0) at ../jsgc.cpp:2156
#2  0x0000000100076a80 in JSWeakRoots::mark (this=0x100603c30, trc=0x101385ed0) at ../jsgc.cpp:2374
#3  0x0000000100076eac in js_TraceContext (trc=0x101385ed0, acx=0x100603ac0) at ../jsgc.cpp:2456
#4  0x00000001000775d5 in js_TraceRuntime (trc=0x101385ed0, allAtoms=1) at ../jsgc.cpp:2529
#5  0x0000000100079bb6 in js_GC (cx=0x100603610, gckind=GC_LAST_DITCH) at ../jsgc.cpp:3114
#6  0x000000010007b01b in RefillFinalizableFreeList (cx=0x100603610, thingKind=0) at ../jsgc.cpp:1477
#7  0x000000010007b518 in js_NewFinalizableGCThing (cx=0x100603610, thingKind=0) at ../jsgc.cpp:1588
#8  0x00000001000c2a26 in js_NewGCObject (cx=0x100603610) at jsgc.h:275
#9  0x00000001000c313e in js_NewObjectWithGivenProto (cx=0x100603610, clasp=0x100228fe0, proto=0x1003d2240, parent=0x0, objectSize=0) at ../jsobj.cpp:2961
#10 0x00000001000c386e in js_NewObject (cx=0x100603610, clasp=0x100228fe0, proto=0x1003d2240, parent=0x0, objectSize=0) at ../jsobj.cpp:3027
#11 0x00000001000c391d in js_PrimitiveToObject (cx=0x100603610, vp=0x1013861f0) at ../jsobj.cpp:6348
#12 0x00000001000c3a21 in js_ValueToObject (cx=0x100603610, v=4297234324, objp=0x101386248) at ../jsobj.cpp:6370
#13 0x00000001000c3a78 in js_ValueToNonNullObject (cx=0x100603610, v=4297234324) at ../jsobj.cpp:6383
#14 0x0000000100094be0 in js_Interpret (cx=0x100603610) at jsops.cpp:1916
#15 0x00000001000ab951 in js_Invoke (cx=0x100603610, argc=0, vp=0x102815c38, flags=0) at jsinterp.cpp:1384
#16 0x00000001000abf9b in js_InternalInvoke (cx=0x100603610, obj=0x0, fval=4298973632, flags=0, argc=0, argv=0x0, rval=0x100603140) at jsinterp.cpp:1439
#17 0x0000000100012403 in JS_CallFunctionValue (cx=0x100603610, obj=0x0, fval=4298973632, argc=0, argv=0x0, rval=0x100603140) at ../jsapi.cpp:5106
#18 0x0000000100004111 in DoScatteredWork (cx=0x100603610, td=0x1006031a0) at ../../shell/js.cpp:3158
#19 0x0000000100009b82 in RunScatterThread (arg=0x1006031a0) at ../../shell/js.cpp:3192
#20 0x000000010034ceae in PR_JoinThread ()
#21 0x00007fff8004ef8e in _pthread_start ()
#22 0x00007fff8004ee41 in thread_start ()
Assignee: general → igor
This is a regression from the bug 533705. The bug has added JS_OBJ_LOCK to InitScopeForObject. But that function allows for GC to run on another thread. This means that the GC would mark, via a weak root, a partly initialized object. I guess as a workaround the code can just set the map to null before locking the object.
Blocks: 533705
(In reply to comment #1)
> I guess as a workaround the code can just set the map to null before
> locking the object.

That is not enough. While JS_LOCK_OBJ is waiting for a scope, another thread can run a full GC wiping out weak roots. So the code must explicitly root the object.
closing the bug as the LOCK_OBJ race with a GC may have bad implications in other parts of the code.
Group: core-security
Argh, the JS_LOCK_OBJ is for the proto, to close races getting its empty scope that could leak and hand out two empty scopes. Can we do that later in the "new object" process?

/be
JSOP_NEWINIT is another case of the same bug. The code calls JS_LOCK_OBJ on a freshly allocated object without ensuring any rooting.
More issues are related to the property cache. The code often reads the shape before doing JS_OBJ_LOCK. Which, due to GC activity on another thread, can regenerate shapes.
(In reply to comment #6)
> More issues are related to the property cache. The code often reads the shape
> before doing JS_OBJ_LOCK. Which, due to GC activity on another thread, can
> regenerate shapes.

How can the GC run when the first thread reading the shape is in a request?

/be
(In reply to comment #7)
> How can the GC run when the first thread reading the shape is in a request?

JS_LOCK_GC can sort-of suspend the request. Here are stack traces for two relevant threads for a typical crash associated for a variation of the test case:

Thread 5 (Thread 0xb5cffb70 (LWP 16506)):
#0  0xb7ffc1e0 in ?? ()
#1  0x080b8d37 in JS_TraceChildren (trc=0xb5cfeb18, thing=0xb7ffc140, kind=0) at /home/igor/m/tm/js/src/jsgc.cpp:1868
#2  0x080b9bcc in JS_CallTracer (trc=0xb5cfeb18, thing=0xb7ffc140, kind=0) at /home/igor/m/tm/js/src/jsgc.cpp:2156
#3  0x080ba826 in JSWeakRoots::mark (this=0x824e5ec, trc=0xb5cfeb18) at /home/igor/m/tm/js/src/jsgc.cpp:2374
#4  0x080babd9 in js_TraceContext (trc=0xb5cfeb18, acx=0x824e530) at /home/igor/m/tm/js/src/jsgc.cpp:2456
#5  0x080bb17a in js_TraceRuntime (trc=0xb5cfeb18, allAtoms=1) at /home/igor/m/tm/js/src/jsgc.cpp:2529
#6  0x080bbd60 in js_GC (cx=0x824e7b8, gckind=GC_LAST_DITCH) at /home/igor/m/tm/js/src/jsgc.cpp:3114
#7  0x080b7bca in RefillFinalizableFreeList (cx=0x824e7b8, thingKind=0) at /home/igor/m/tm/js/src/jsgc.cpp:1477
#8  0x080b8090 in js_NewFinalizableGCThing (cx=0x824e7b8, thingKind=0) at /home/igor/m/tm/js/src/jsgc.cpp:1588
#9  0x080cfeef in js_NewGCObject (cx=0x824e7b8) at /home/igor/m/tm/js/src/jsgc.h:275
#10 0x080d7338 in js_NewObjectWithGivenProto (cx=0x824e7b8, clasp=0x822e020, proto=0xb7ffc100, parent=0xb7ffc000, objectSize=0) at /home/igor/m/tm/js/src/jsobj.cpp:2961
#11 0x080d75df in js_NewObject (cx=0x824e7b8, clasp=0x822e020, proto=0xb7ffc100, parent=0xb7ffc000, objectSize=0) at /home/igor/m/tm/js/src/jsobj.cpp:3027
#12 0x080c4943 in js_InvokeConstructor (cx=0x824e7b8, argc=0, clampReturn=0, vp=0xb4f0b9f0) at /home/igor/m/tm/js/src/jsinterp.cpp:1878
#13 0x081e033d in js_Interpret (cx=0x824e7b8) at /home/igor/m/tm/js/src/jsops.cpp:2040
#14 0x080c3763 in js_Invoke (cx=0x824e7b8, argc=0, vp=0xb4f0b9e0, flags=0) at /home/igor/m/tm/js/src/jsinterp.cpp:1384
#15 0x080c39d2 in js_InternalInvoke (cx=0x824e7b8, obj=0x0, fval=-1207975744, flags=0, argc=0, argv=0x0, rval=0x824db44) at /home/igor/m/tm/js/src/jsinterp.cpp:1439
#16 0x0806e8ef in JS_CallFunctionValue (cx=0x824e7b8, obj=0x0, fval=-1207975744, argc=0, argv=0x0, rval=0x824db44) at /home/igor/m/tm/js/src/jsapi.cpp:5107
#17 0x08050f46 in DoScatteredWork (cx=0x824e7b8, td=0x824e28c) at /home/igor/m/tm/js/src/shell/js.cpp:3158
#18 0x0805104c in RunScatterThread (arg=0x824e28c) at /home/igor/m/tm/js/src/shell/js.cpp:3192
#19 0x02187b42 in ?? () from /lib/libnspr4.so
#20 0x002d1ab5 in start_thread () from /lib/libpthread.so.0
#21 0x0022883e in clone () from /lib/libc.so.6

Thread 4 (Thread 0xb68b2b70 (LWP 16505)):
#0  0x00d2f424 in __kernel_vsyscall ()
#1  0x002d5c45 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x021815d9 in PR_WaitCondVar () from /lib/libnspr4.so
#3  0x080c8981 in ClaimTitle (title=0x824eeb0, cx=0x824e530) at /home/igor/m/tm/js/src/jslock.cpp:662
#4  0x080ca033 in js_LockTitle (cx=0x824e530, title=0x824eeb0) at /home/igor/m/tm/js/src/jslock.cpp:1254
#5  0x080ca57d in js_LockObj (cx=0x824e530, obj=0xb7ffc100) at /home/igor/m/tm/js/src/jslock.cpp:1405
#6  0x080d709a in InitScopeForObject (cx=0x824e530, obj=0xb7ffc140, proto=0xb7ffc100, ops=0x822d000) at /home/igor/m/tm/js/src/jsobj.cpp:2892
#7  0x080d73d6 in js_NewObjectWithGivenProto (cx=0x824e530, clasp=0x822e020, proto=0xb7ffc100, parent=0xb7ffc000, objectSize=0) at /home/igor/m/tm/js/src/jsobj.cpp:2976
#8  0x080d75df in js_NewObject (cx=0x824e530, clasp=0x822e020, proto=0xb7ffc100, parent=0xb7ffc000, objectSize=0) at /home/igor/m/tm/js/src/jsobj.cpp:3027
#9  0x080c4943 in js_InvokeConstructor (cx=0x824e530, argc=0, clampReturn=0, vp=0xb510b9f0) at /home/igor/m/tm/js/src/jsinterp.cpp:1878
#10 0x081e033d in js_Interpret (cx=0x824e530) at /home/igor/m/tm/js/src/jsops.cpp:2040
#11 0x080c3763 in js_Invoke (cx=0x824e530, argc=0, vp=0xb510b9e0, flags=0) at /home/igor/m/tm/js/src/jsinterp.cpp:1384
#12 0x080c39d2 in js_InternalInvoke (cx=0x824e530, obj=0x0, fval=-1207975744, flags=0, argc=0, argv=0x0, rval=0x824db40) at /home/igor/m/tm/js/src/jsinterp.cpp:1439
#13 0x0806e8ef in JS_CallFunctionValue (cx=0x824e530, obj=0x0, fval=-1207975744, argc=0, argv=0x0, rval=0x824db40) at /home/igor/m/tm/js/src/jsapi.cpp:5107
#14 0x08050f46 in DoScatteredWork (cx=0x824e530, td=0x824e278) at /home/igor/m/tm/js/src/shell/js.cpp:3158
#15 0x0805104c in RunScatterThread (arg=0x824e278) at /home/igor/m/tm/js/src/shell/js.cpp:3192
#16 0x02187b42 in ?? () from /lib/libnspr4.so
#17 0x002d1ab5 in start_thread () from /lib/libpthread.so.0
#18 0x0022883e in clone () from /lib/libc.so.6

Here the thread 5 waits for the title owned by the thread 4. Then the thread 4 calls the GC which is allowed to run.
Igor: agree locking an object (scope title) without rooting is bad, and that indeed suspends a request at the limit, letting GC run.

Didn't JS_LOCK_OBJ_IF_SHAPE solve the problem in comment 6? If not, can we solve it here along with any locking without rooting? Otherwise, separate bug.

/be
(In reply to comment #9)
> Didn't JS_LOCK_OBJ_IF_SHAPE solve the problem in comment 6?

No: the problem is that the matched shape is used to refill the cache after the lock is taken.

> If not, can we
> solve it here along with any locking without rooting?

A simple way to address the property cache problem is to cache only single-threaded object. This way CalimTitle would never be called for property cache operations. As a bonus all those JS_UNLOCK_OBJ in js_Interpret related to the property cache would be unnecessary. But this is for another bug.

To address this one what about always suspending the current request in js_GC? This way the GC could not run in parallel with threads suspended in ClaimTitle. The drawback is that the last ditch GC would need an extra pair of UNLOCK_GC/LOCK_GC calls which could bring the bug 162779 back.
(In reply to comment #10)
> (In reply to comment #9)
> > Didn't JS_LOCK_OBJ_IF_SHAPE solve the problem in comment 6?
> 
> No: the problem is that the matched shape is used to refill the cache after the
> lock is taken.

This is ok if the GC didn't reshape the runtime, which suggests that last-ditch GCs should never reshape the runtime.

> > If not, can we
> > solve it here along with any locking without rooting?
> 
> A simple way to address the property cache problem is to cache only
> single-threaded object. This way CalimTitle would never be called for property
> cache operations. As a bonus all those JS_UNLOCK_OBJ in js_Interpret related to
> the property cache would be unnecessary. But this is for another bug.

This is a good idea, it fits into the grand plan to eliminate JS_THREADSAFE by making all objects ST from birth, requiring special API for making multi-threaded objects, which would have dictionary-mode scopes, titles, etc. See bug 511591 at least. Want to file a metabug or different concrete bug?

> To address this one what about always suspending the current request in js_GC?
> This way the GC could not run in parallel with threads suspended in ClaimTitle.
> The drawback is that the last ditch GC would need an extra pair of
> UNLOCK_GC/LOCK_GC calls which could bring the bug 162779 back.

How about never reshaping from last-ditch GC as easier fix?

/be
(In reply to comment #11)
> This is a good idea, it fits into the grand plan to eliminate JS_THREADSAFE by
> making all objects ST from birth, requiring special API for making
> multi-threaded objects, which would have dictionary-mode scopes, titles, etc.
> See bug 511591 at least. Want to file a metabug or different concrete bug?

I have filed bug 538463 for the specific optimization.

> How about never reshaping from last-ditch GC as easier fix?

That would not help in general as the thread that own the object may call a full GC directly which would also wipe out the weak roots. That is why I prefer a solution that would make JS_LOCK_OBJ not to run the GC ever. Otherwise all JS_LOCK_OBJ calls has to be monitored for possible GC hazards.
> That is why I prefer a solution that would make JS_LOCK_OBJ not to run the GC
> ever. Otherwise all JS_LOCK_OBJ calls has to be monitored for possible GC
> hazards.

Sold -- I guess we will take our chances with bug 162779 again.

/be
Attached patch fix v1 (obsolete) — Splinter Review
Here is an untested work in progress. The main idea here is not to leave the request in ClaimTitle. Instead js_GC calls js_ShareWaitingTitles before it goes to wait for requests to complete.
Attached patch fix v2 (obsolete) — Splinter Review
The new version fixes quite a few bug in the previous patch. It is not that straightforward to ensure proper locking and avoid deadlocks with racing js_GC/ClaimTitle calls.
Attachment #420919 - Attachment is obsolete: true
Attached patch fix v3 (obsolete) — Splinter Review
The new version adds comments and fixes another problematic code. NudgeThread(thread) from jslock.cpp in the current form is not thread-safe as it access thread->contextList from non-current thread. That is not safe as thread->contextList is manipulated outside any locks and should be accessed only from the current thread.

The patch passes the TryServer and shell tests.
Attachment #421220 - Attachment is obsolete: true
Attachment #421263 - Flags: review?(brendan)
Here is a variation of the test case that exposes the problem and contain better coverage for various deadlocking issues:


if (typeof gczeal != 'function' || typeof scatter != 'function')
    print('Test skipped: requires mulithreads');
else
    test();

function test()
{
    gczeal(2);
    function f() {
        var buf = [];
        for (var i = 0; i < 9999; i++)
            buf[i] = 'a' + new String();
        return buf;
    }
    var x = [];
    var threads = 8;
    for (var i = 0; i != threads; ++i)
        x.push(f);
    x = scatter(x);
    if (x.length !== threads)
        throw "Bad length:"+x.length;
    for (var i = 0; i != threads; ++i) {
        var buf = x[i];
        for (var j = 0; j != buf.length; ++j) {
            if (buf[j] !== "a")
                throw "Bad x["+i+"]["+j+"]: "+x[i];
        }
    }
}
Nominating for branches: this bug exposes an old race that could affect a few JS_LOCK_OBJ and friends call sites. It theory it should not affect the default Firefox configuration as even with thread workers the race is not visible. But the bug can bite an extension that uses threads. Note also the fix is not light and requires some baking to ensure that it does not introduce deadlock bugs.
blocking1.9.1: --- → ?
Flags: wanted1.9.0.x?
Flags: blocking1.9.2?
Don't think we want this in 1.9.0. Would be worth taking in 1.9.1 after sufficient baking, but not blocking for now. We'll get back to it after it lands on trunk and then 1.9.2.
blocking1.9.1: ? → ---
Flags: wanted1.9.0.x? → wanted1.9.0.x-
Flags: blocking1.9.2? → blocking1.9.2+
Can I get a blocking rationale here? Blocking the release of 1.9.2 on this issue means a new RC, and moving into February. Also, it wasn't mentioned at all on the Tuesday call, which I find a little odd :(
I don't think we should block on this -- it doesn't affect our default config, and we know of no config (like, available add-on) in which it does.  We can fix it in 1.9.2.1 if we want, and it bakes well.

Moving back to nom-state for more discussion.  Rob: please indicate why you think something should block if you mark it so, especially at this late date.  Thanks!
Flags: blocking1.9.2+ → blocking1.9.2?
We haven't created the 1.9.2.x flags yet (coming soon) but I've been using [3.6.x] in the status whiteboard to indicate that a bug is a candidate for a stability release. If this ends up falling into that bucket, I recommend adding that to the whiteboard and moving the blocking1.9.2 flag to -
To Brendan: do you have time to review this?
Flags: blocking1.9.2? → blocking1.9.2-
Whiteboard: [3.6.x]
I will review tomorrow.

/be
Attached patch fix v4 (obsolete) — Splinter Review
The new version takes advantage of the fact that with the patch the GC cannot race with js_ShareWaitingTitles so the title hold/drop during waiting for the title sharing can be removed. But this lead to discovering of another inconsistency. 

JSScope::drop does not use JS_ATOMIC_DECREMENT under assumption that only the GC or js_ShareWaitingTitles can call it. But this is not true as js_InitClass also calls the method. At that moment the proto can be leaked to another thread via the debugger API which violates the assumption. The patch fixes that via providing separated ensureEmptyScope method.

I will ask for review after testing the new changes.
Attachment #421263 - Attachment is obsolete: true
Attachment #421263 - Flags: review?(brendan)
Comment on attachment 422103 [details] [diff] [review]
fix v4

the patch passes the shell tests and the try server
Attachment #422103 - Flags: review?(brendan)
review ping
Comment on attachment 422103 [details] [diff] [review]
fix v4

I'm under water on other fronts (too many to mention) -- Jason is better reviewer for that reason, and for many others (having documented the JS_THREADSAFE design, written scatter, etc.).

/be
Attachment #422103 - Flags: review?(brendan) → review?(jorendorff)
Comment on attachment 422103 [details] [diff] [review]
fix v4

I was reading the diff and had some nits edited already; here they are:

>+         * Share any title that is owned by the GC thread before we wait to

Nit: comma before "to".

>+         * avoid a deadlock with ClaimTitle. We also set the gcWaiting  flag

Double-space before "flag".

>+         * so that ClaimTitle can claim the title owneship from the GC thread

"ownership"

>+    /*
>+     * We cannot walk here over thread->contextList as that is manipulated
>+     * outside the GC lock and must be accessed only from the current thread.

s/the current thread/that thread/ maybe? Not clear you don't mean by "the current thread" the one running this code, the one holding the GC lock.

>+    for (JSContext *ownercx; (ownercx = title->ownercx) != NULL; ) {

This could be just while (JSContext *ownercx = title->ownercx) and I'd prefer that (if my C++ is correct).

>-    {
>-        JSScope *scope = OBJ_SCOPE(proto)->getEmptyScope(cx, clasp);
>-        if (!scope)
>-            goto bad;
>-        scope->drop(cx, NULL);
>-    }
>+    if (!OBJ_SCOPE(proto)->ensureEmptyScope(cx, clasp))
>+        goto bad;

This is righteous.

So r=me at this point but Jason reviewing too would be appreciated.

/be
Attachment #422103 - Flags: review+
Attached patch fix v5Splinter Review
The new version fixes the nits above.
Attachment #422103 - Attachment is obsolete: true
Attachment #422818 - Flags: review?(jorendorff)
Attachment #422103 - Flags: review?(jorendorff)
blocking1.9.2: --- → ?
Whiteboard: [3.6.x] → [sg:moderate][3.6.x]
In JSThread:
>     /*
>+     * The thread is inside js_GC and waits until the GC can start on this or
>+     * another thread.
>+     */
>+    bool                gcWaiting;

I think you can usefully say a little more here:

    /*
     * The thread is inside js_GC, either waiting until it can start GC, or
     * waiting for GC to finish on another thread. This thread holds no locks;
     * other threads may steal titles from it.
     *
     * Protected by rt->gcLock.
     */

In ClaimTitle:
>+         * titles are shared and the threads that wants to lock them finish

"threads that want"

In js_GC:
>+            if (requestDebit != 0) {
>+                rt->requestCount -= requestDebit;
>+
>+                if (rt->requestCount == 0)
>+                    JS_NOTIFY_REQUEST_DONE(rt);
>+
>+                /*
>+                 * See comments before another call to js_ShareWaitingTitles
>+                 * below.
>+                 */
>+                cx->thread->gcWaiting = true;
>+                js_ShareWaitingTitles(cx);
>+#ifdef JS_TRACER
>+                if (JS_ON_TRACE(cx)) {
>+                    JS_UNLOCK_GC(rt);
>+                    js_LeaveTrace(cx);
>+                    JS_LOCK_GC(rt);
>+                }
>+#endif
>+                do {
>+                    JS_AWAIT_GC_DONE(rt);

Before, this code called js_LeaveTrace before decrementing rt->requestCount and
calling JS_NOTIFY_REQUEST_DONE. I think that was right: js_LeaveTrace shouldn't
be allowed to happen during GC.

r=me with that fixed.
Attachment #422818 - Flags: review?(jorendorff) → review+
(In reply to comment #31)
>     /*
>      * The thread is inside js_GC, either waiting until it can start GC, or
>      * waiting for GC to finish on another thread. This thread holds no locks;
>      * other threads may steal titles from it.

Would "This thread" to start the first sentence be even better? If I have context in mind correctly, I think so.

/be
(In reply to comment #31)
> >+            if (requestDebit != 0) {
> >+                rt->requestCount -= requestDebit;
> >+
> >+                if (rt->requestCount == 0)
> >+                    JS_NOTIFY_REQUEST_DONE(rt);
> >+
> >+                /*
> >+                 * See comments before another call to js_ShareWaitingTitles
> >+                 * below.
> >+                 */
> >+                cx->thread->gcWaiting = true;
> >+                js_ShareWaitingTitles(cx);
> >+#ifdef JS_TRACER
> >+                if (JS_ON_TRACE(cx)) {
> >+                    JS_UNLOCK_GC(rt);
> >+                    js_LeaveTrace(cx);
> >+                    JS_LOCK_GC(rt);
> >+                }

> Before, this code called js_LeaveTrace before decrementing rt->requestCount and
> calling JS_NOTIFY_REQUEST_DONE. I think that was right: js_LeaveTrace shouldn't
> be allowed to happen during GC.

That is a good catch. Moreover, any unlock/lock pair after JS_NOTIFY_REQUEST_DONE but before JS_AWAIT_GC_DONE requires at least a check before the wait that gcLevel > 0. Otherwise the code may loose the JS_NOTIFY_GC_DONE signal when the GC finishes inside that unlock/lock.
That reminds me. While I was looking at js_GC I noticed that the first thing it does is query rt->gcKeepAtoms. Isn't that a race condition? We should query it after the other threads quiesce. This is unrelated to your changes of course.
landed with fixes to address the comment 31 and 32 - https://hg.mozilla.org/tracemonkey/rev/74f02b834385
Whiteboard: [sg:moderate][3.6.x] → [sg:moderate][3.6.x] fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/74f02b834385
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 543839
If you still think this needs to go on the 1.9.2 branch, please nominate the appropriate patch for approval and let us know why. If it's unlikely to hit 1.9.2 users, though, I don't think we should take the risk.
blocking1.9.1: --- → -
blocking1.9.2: ? → -
(In reply to comment #37)
> If it's unlikely to hit 1.9.2 users, though, I don't think we should take the risk.

Potentially risky changes that the patch introduces are not in the code that Firefox users uses. Only extensions that share JS objects accross threads could see a regression, but then this is exactly the code that is affected by the bug.
(In reply to comment #38)
> Potentially risky changes that the patch introduces are not in the code that
> Firefox users uses.

I don't see how that could be the case: it looks like this patch touches js_GC, which Firefox must indeed run at many points, so risk here would certainly seem to extend to all users.  If we have path coverage with tests that gives us high confidence that the cases that Firefox hits (in the presence of workers and such) are unchanged, then that should definitely be considered, but I don't really see how we can say "there isn't potential risk to the code that Firefox uses".  What am I missing?

But even without Firefox, we have a set of known bugs that might affect thread-using extensions, and risk changing to another set.  Even if the bugs are equally severe, and equally likely to happen, we should prefer the status quo: extension authors are likely to have worked around the current set of bugs, as they are accustomed to do when writing threading code against our engine, and giving them a new set of bugs to contend with in a compatible update isn't very nice.
Attached patch fix v5 for 1.9.2Splinter Review
For 192 the patch required trivial merging. Here is the plain diff between trunk and 1.9.2 versions of patch parts that required merging: 

> @@ -904,44 +904,6 @@ js_WaitForGC(JSRuntime *rt)
26c19
< -        LeaveTrace(cx);
---
> -        js_LeaveTrace(cx);

...
> @@ -2969,12 +2969,8 @@ js_InitClass(JSContext *cx, JSObject *ob
411c408
< -    {
---
> -    if (OBJ_IS_NATIVE(proto)) {
417c414
< +    if (!OBJ_SCOPE(proto)->ensureEmptyScope(cx, clasp))
---
> +    if (OBJ_IS_NATIVE(proto) && !OBJ_SCOPE(proto)->ensureEmptyScope(cx, clasp))

As regarding a potential regression risk, that may only can come from parts of the code that are not used by the Firefox default configuration. So I nominate this for 1.9.2 to make sure that extensions that share JS objects across threads would not be a source of GC hazards.
Attachment #426683 - Flags: approval1.9.2.2?
Comment on attachment 426683 [details] [diff] [review]
fix v5 for 1.9.2

We missed 1.9.2.2 so moving the approval request forward.

Shaver, are you still feeling nervous about taking this on the branch?  It doesn't seem like we have good consensus here yet.
Attachment #426683 - Flags: approval1.9.2.2? → approval1.9.2.3?
Whiteboard: [sg:moderate][3.6.x] fixed-in-tracemonkey → [sg:moderate][needs answer to comment 41 from shaver][3.6.x] fixed-in-tracemonkey
I haven't really had an answer to why we should prefer the potential regression space to the existing one, but I would defer to the module owner on this.
(In reply to comment #42)
> I haven't really had an answer to why we should prefer the potential regression
> space to the existing one, but I would defer to the module owner on this.

The potential regression space is large-ish and unknown, not even bounded by source lines the patch touches. Even if the main thread code never suspends a request claiming a title for Firefox-only code, some subtle logic error or coupling to a latent bug could regress Firefox without addons.

The winning aspect of course is the fix for a bad GC safety bug if extensions do cross threads in the wrong ways.

The main albeit crude way to evaluate risk is therefore baking. The patch hit m-c on 2010-02-02. It's now 03-19. Have trunk builds been spun and tested by addons users/testers? I do not know. Should we can wait a bit longer while we find the answer, and if "no", get some testing done?

It's tempting to think code coverage is enough, but with threaded code you can have all-paths coverage and still have bad bugs. This looks like such a bug. But the fix looks good to me, and at some point we will ship it.

Has anyone searched for breakpad reports that might be due to this bug?

/be
(In reply to comment #43)
> The potential regression space is large-ish and unknown, not even bounded by
> source lines the patch touches. Even if the main thread code never suspends a
> request claiming a title for Firefox-only code, some subtle logic error or
> coupling to a latent bug could regress Firefox without addons.
> 
> The winning aspect of course is the fix for a bad GC safety bug if extensions
> do cross threads in the wrong ways.
> 
> The main albeit crude way to evaluate risk is therefore baking. The patch hit
> m-c on 2010-02-02. It's now 03-19. Have trunk builds been spun and tested by
> addons users/testers? I do not know. Should we can wait a bit longer while we
> find the answer, and if "no", get some testing done?

To be complete and include an item among what we do know so far, which I forgot to list:

This bug has not regressed nightly trunk build users, apart from bug 543839, with any symptom that led back via a bug, bisecting, etc., to blame the patch for this bug.

So this is good news, but again it hasn't even been two months.

/be
Comment on attachment 426683 [details] [diff] [review]
fix v5 for 1.9.2

alright, we'll clear this approval request for now and check back later.
Attachment #426683 - Flags: approval1.9.2.3?
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: