Last Comment Bug 675078 - rm JSThreadData and JSThread
: rm JSThreadData and JSThread
Status: RESOLVED FIXED
: dev-doc-needed
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla12
Assigned To: Luke Wagner [:luke]
:
Mentors:
Depends on: 650411
Blocks: 720018 722345 714639 719858 720013 720045 720753 721345 723286 724310 725595
  Show dependency treegraph
 
Reported: 2011-07-28 16:07 PDT by Luke Wagner [:luke]
Modified: 2012-04-28 15:48 PDT (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
rm JSThread/JSThreadData (131.99 KB, patch)
2011-07-28 16:07 PDT, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
rm JSThread/JSThreadData and some low-hanging fruit (204.59 KB, patch)
2012-01-20 16:13 PST, Luke Wagner [:luke]
n.nethercote: review+
Details | Diff | Splinter Review
rm JSThread/ThreadData and some low-hanging fruit (202.12 KB, patch)
2012-01-23 14:24 PST, Luke Wagner [:luke]
igor: review+
Details | Diff | Splinter Review

Description Luke Wagner [:luke] 2011-07-28 16:07:27 PDT
Created attachment 549260 [details] [diff] [review]
rm JSThread/JSThreadData

Once bug 650411 lands, JSThreadData can be rolled into JSRuntime and JSThread isn't needed at all.
Comment 1 Luke Wagner [:luke] 2011-07-29 09:49:04 PDT
Comment on attachment 549260 [details] [diff] [review]
rm JSThread/JSThreadData

It'll be a while before this can land but, FWIW, green on try.
Comment 2 :Ms2ger (⌚ UTC+1/+2) 2011-11-07 23:57:59 PST
Comment on attachment 549260 [details] [diff] [review]
rm JSThread/JSThreadData

>--- a/js/src/jscntxt.h
>+++ b/js/src/jscntxt.h
>+    /* Number of JS_SuspendRequest calls withot JS_ResumeRequest. */
>+    unsigned            suspendCount;
>+
>+# ifdef DEBUG

You just copied this, but "#ifdef"?

>--- a/js/src/jsgc.cpp
>+++ b/js/src/jsgc.cpp
>     /*
>      * Recursive GC is no-op and a call from another thread waits the started
>      * GC cycle to finish.
>      */

Is this still relevant?
Comment 3 Luke Wagner [:luke] 2011-11-08 08:19:36 PST
(In reply to Ms2ger from comment #2)
> >+# ifdef DEBUG
> 
> You just copied this, but "#ifdef"?

Nope: SM style indents 1 space per level of #if nesting.

> >      * Recursive GC is no-op and a call from another thread waits the started
> >      * GC cycle to finish.
> 
> Is this still relevant?

There is a ton of stuff that will be obsolete when bug 650411 can be considered to have "stuck" (esp. in the GC).  Rest assured the rapture awaits.
Comment 4 Luke Wagner [:luke] 2012-01-20 16:13:59 PST
Created attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

 45 files changed, 513 insertions(+), 2752 deletions(-)

The patch straight-up simply removes JSThread and folds ThreadData into JSRuntime.  Secondly, a few obviously useless locks/condvars are removed (which allow jslock.cpp, lock_sparcv8plus.il, and lock_sparcv9.il to be removed).

njn, could you review the memory reporting changes and igor, could you review the rest?
Comment 5 Nicholas Nethercote [:njn] 2012-01-20 16:40:21 PST
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

Review of attachment 590383 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine, just one nit below.  I only reviewed MemoryMetrics.{h,cpp} and XPCJSRuntime.cpp, is that all you wanted me to look at?

When this lands I'd love it if you could write a message to the JSAPI list to explain the changes and benefits!

::: js/src/MemoryMetrics.cpp
@@ +222,5 @@
> +        rt->sizeOfExcludingThis(data->mallocSizeOf,
> +                                &data->runtimeNormal,
> +                                &data->runtimeTemporary,
> +                                &data->runtimeRegexpCode,
> +                                &data->runtimeStackCommitted);

data->runtimeNormal et al are int64_t, but rt->sizeOfExcludingThis expects size_t pointers.  Can you introduce size_t temporaries and pass them to sizeOfExcludingThis, and then assign those temporaries to data->runtimeNormal et al, similar to how the old code worked?
Comment 6 Luke Wagner [:luke] 2012-01-20 16:43:25 PST
(In reply to Nicholas Nethercote [:njn] from comment #5)
> data->runtimeNormal et al are int64_t, but rt->sizeOfExcludingThis expects
> size_t pointers.

The patch changes that :)  (Type error)

> Can you introduce size_t temporaries and pass them to
> sizeOfExcludingThis, and then assign those temporaries to
> data->runtimeNormal et al, similar to how the old code worked?

Do you still want me to do this?
Comment 7 Igor Bukanov 2012-01-20 16:47:45 PST
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

I see in a few places that the patch removes the code that synchronizes with the background thread or removes the locking to allow calling the operation callback triggers from an arbitrary thread. This is just wrong. In general, I would prefer to see a patch that does not remove any GC locks/unlocks unless it is very clear that it would not affect any semantics.

And for this reason i really would like to see first a patch that just stores JSThread in JSRuntime and then have a separated patch that moves the fields into JSRuntime. With the current patch it is way to easy to miss important semantic changes.
Comment 8 Nicholas Nethercote [:njn] 2012-01-20 16:49:40 PST
> The patch changes that :)  (Type error)

Oh, you changed sizeOfExcludingThis().

It's kind of lame, but sizeOfExcludingThis functions always have size_t return types, but memory reporters need PRInt64/int64_t types because IDL doesn't allow size_t and -1 is used to mean "unknown".  So I'd like you to revert the sizeOfExcludingThis type change and add the temporaries.  

(I've been thinking about changing IterateData to have size_t members instead of int64_t members, but until then, these temporaries are the easiest way to keep things consistent with the current scheme.)

Thanks!
Comment 9 Igor Bukanov 2012-01-20 17:07:37 PST
(In reply to Igor Bukanov from comment #7)
> I see in a few places that the patch removes the code that synchronizes with
> the background thread or removes the locking to allow calling the operation
> callback triggers from an arbitrary thread. This is just wrong.

For example, the changes in TraceRuntime in jsgc.cpp removes all JS_THREADSAFE code. But this misses that rt->gcThread != cx->thread() is true when the GC is not running on the current thread and when rt->gcThread is true. Another example is onTooMuchMalloc. It still must do AutoLockGC lock(this) before calling T.riggerGC as the latter can be called from, for example, a timer thread and requires the GC lock
Comment 10 Igor Bukanov 2012-01-20 17:11:33 PST
Another problem with the patch is that the conservative GC changes no longer includes the registers into the the set of the conservative roots. This should be restored.
Comment 11 Luke Wagner [:luke] 2012-01-20 17:43:41 PST
(In reply to Igor Bukanov from comment #9)
> For example, the changes in TraceRuntime in jsgc.cpp removes all
> JS_THREADSAFE code. But this misses that rt->gcThread != cx->thread() is
> true when the GC is not running on the current thread and when rt->gcThread
> is true.

rt->gcThread == cx->thread always.  It is an invariant.  See JS_AbortIfWrongThread in js_GC.  That is why I removed gcThread.

> Another example is onTooMuchMalloc. It still must do AutoLockGC
> lock(this) before calling T.riggerGC as the latter can be called from, for
> example, a timer thread and requires the GC lock

Can't happen.

(In reply to Igor Bukanov from comment #10)
> Another problem with the patch is that the conservative GC changes no longer
> includes the registers into the the set of the conservative roots. This
> should be restored.

Do you realistically think the registers will contain a pointer to a GC-thing that somehow was never spilled half despite being neck-deep in the GC?  I (clearly) do not, but it is trivial to add it back.
Comment 12 Luke Wagner [:luke] 2012-01-20 17:44:31 PST
(Green on try)
Comment 13 Luke Wagner [:luke] 2012-01-20 17:53:31 PST
In general, only three lock-y things were removed: atomState.lock, gcDone, and requestDone.  A simple grepping shows these are only used from the runtime's owner thread.  I initially tried to break this change into a queue of minimal patches, but those intermediate states were a lot harder to reason about than the end state, which is rather simple.
Comment 14 Luke Wagner [:luke] 2012-01-20 19:59:29 PST
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

Given the clarifications in comments 11 and 13 would you consider looking at the patch again?  They significantly simplify reasoning about the patch.  E.g., js_WaitForGC is trivially a nop.
Comment 15 Gregor Wagner [:gwagner] 2012-01-21 02:16:46 PST
Why don't we have to wait for the background thread to finish in TraceRuntime? Is there another check somewhere?
Comment 16 Igor Bukanov 2012-01-21 03:14:36 PST
(In reply to Luke Wagner [:luke] from comment #11)
> rt->gcThread == cx->thread always.  It is an invariant.  See
> JS_AbortIfWrongThread in js_GC.  

Currently rt->gcThread is set only during the GC. The rest of the time it is null. So it is not invariant and that code must be preserved to ensure that we properly synchronizes against the background thread.

> > Another example is onTooMuchMalloc. It still must do AutoLockGC
> > lock(this) before calling TriggerGC as the latter can be called from, for
> > example, a timer thread and requires the GC lock
> 
> Can't happen.

How exactly do we interrupt the main thread with a long running script if not setting the interrupt flags from the another thread? As a part of that implementation TriggerGC uses the GC lock and assumes that it is hold by the caller. Without the lock a potential race in TriggerGC called from onTooMuchMalloc with a timer thread may breaks things.

> Do you realistically think the registers will contain a pointer to a
> GC-thing that somehow was never spilled half despite being neck-deep in the
> GC?

If you can prove that none of the compilers on platforms that we use inline or transform the code to such extent that register spill does not happen, then yes, lets remove the register scan. But I am not aware of such proof and prefer to be on a safe side.

Another thing is that the patch moves recording of the stack pointer for the conservative scanner to MarkConservativeStackRoots. That introduces false positives from various pointers into the GC arenas that were touched in the GC code before doing the conservative scan. I have observed that many times and that is why currently the code recorders the stack bounds separately early in the GC-related calls. That must be preserved.
Comment 17 Luke Wagner [:luke] 2012-01-21 17:49:12 PST
(In reply to Igor Bukanov from comment #16)
> (In reply to Luke Wagner [:luke] from comment #11)
> > rt->gcThread == cx->thread always.  It is an invariant.  See
> > JS_AbortIfWrongThread in js_GC.  
> 
> Currently rt->gcThread is set only during the GC. The rest of the time it is
> null. So it is not invariant and that code must be preserved to ensure that
> we properly synchronizes against the background thread.

You're right "always" was overreaching; "if gcThread is non-null, then it is the same as cx->thread".  Also, yes, I assume you are talking about the one instance in TraceRuntime?

> > > Another example is onTooMuchMalloc. It still must do AutoLockGC
> > > lock(this) before calling TriggerGC as the latter can be called from, for
> > > example, a timer thread and requires the GC lock
> > 
> > Can't happen.
> 
> How exactly do we interrupt the main thread with a long running script if
> not setting the interrupt flags from the another thread?

Given that *only* triggerOperationCallback may happen off the runtime's owner thread, that must mean you thing there is some race condition since there are two states here (interruptFlags/interruptCounter)?  That just points out that we don't need two volatile operation-callback flag variables anymore.

> > Do you realistically think the registers will contain a pointer to a
> > GC-thing that somehow was never spilled half despite being neck-deep in the
> > GC?
> 
> If you can prove that none of the compilers on platforms that we use inline
> or transform the code to such extent that register spill does not happen,
> then yes, lets remove the register scan. But I am not aware of such proof
> and prefer to be on a safe side.

Heh, I thought you'd say that.  Lack of proof didn't stop us from doing conservative GC in the first place (we can't even pretend to being close to having one).  But fine, I'll put it back.  In theory the whole thing is going away eventually anyway.
Comment 18 Luke Wagner [:luke] 2012-01-21 17:50:21 PST
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

There are a few good comments to address, so I'll include them and post a new patch.
Comment 19 Igor Bukanov 2012-01-22 04:39:24 PST
(In reply to Luke Wagner [:luke] from comment #17)
> You're right "always" was overreaching; "if gcThread is non-null, then it is
> the same as cx->thread".  Also, yes, I assume you are talking about the one
> instance in TraceRuntime?

To be clear: in TraceRuntime the condition rt->gcThread != cx->thread() should be replaced with if (!rt->gcRunning) and the rest of the code should be preserved for now.

> Given that *only* triggerOperationCallback may happen off the runtime's
> owner thread, that must mean you thing there is some race condition since
> there are two states here (interruptFlags/interruptCounter)?  That just
> points out that we don't need two volatile operation-callback flag variables
> anymore.

For now just keep that lock in onTooMuchMalloc. Its removal should go to a separated patch. But in this patch surely remove all instances of interruptCounter - that had to be added to deal with web workers scripts that could have been executed against different threads when it was not possible to record a pointer to JSThread::interruptFlags into the jited code.

> Heh, I thought you'd say that.  Lack of proof didn't stop us from doing
> conservative GC in the first place (we can't even pretend to being close to
> having one).

You are right, I was too conservative at that point. But for the case here the argument against the proposed simplification I think is stronger. To avoid false positives (that was observed in practice) sampling of the current stack pointer should be done at the earliest point during the GC when the native stack is not used for any GC structures yet.  But such early sampling increases a chance that we could have a register with GC things that is not yet spilled.

And when you update the patch, could you keep for now the compare-and-swap code in jslock.cpp (but put it under #if 0) and also keep the CPU counter function there? This will minimizes number of conflicts with my patches that tries to push to the background thread more parts of the GC sweeping phase.
Comment 20 Luke Wagner [:luke] 2012-01-22 14:45:36 PST
(In reply to Igor Bukanov from comment #19)
> To avoid false positives (that was observed in practice) sampling of the
> current stack pointer should be done at the earliest point during the GC
> when the native stack is not used for any GC structures yet.

I understand the general concern of trying to minimize unintended false positives.  Given that the try server didn't report any such leaks, though, it seems like we should default to the simplest possible design (which also simplifies the review for this patch).
Comment 21 Luke Wagner [:luke] 2012-01-22 14:59:55 PST
(In reply to Gregor Wagner [:gwagner] from comment #15)
> Why don't we have to wait for the background thread to finish in
> TraceRuntime? Is there another check somewhere?

I was confused by this also.  There is a waitBackgroundSweepOrAllocEnd in GCCycle, but this is missed by calls to TraceRuntime.  (Is that right Igor?)  It would be better if there was a single wait call in MarkRuntime; perhaps we can do that in a future patch...
Comment 22 Igor Bukanov 2012-01-22 16:40:27 PST
(In reply to Luke Wagner [:luke] from comment #20)
> I understand the general concern of trying to minimize unintended false
> positives.  Given that the try server didn't report any such leaks, though,
> it seems like we should default to the simplest possible design

The leaks are not visible at the tryserver, since at shutdown we do not scan the native stacks. But there are some tests that are sensitive to these leaks since the tests depend on the GC collecting things and running finalizers. My past experience tells me that the patch in its current form increases a chance for the tests to fail randomly.

 (which also
> simplifies the review for this patch).

If the patch would have kept ConservativeGCThreadData and RecordNativeStackTopForGC as is (i.e. if it would just do what was proposed in the comment 0), the patch would be smaller and we would not waste time for the discussion here.

Somebody once rightfully observed that efforts for a meaningful review are proportional to something like the squire of the number of lines the patch changes. So patches that mix big mechanical changes like moving fields from ThreadData to JSRuntime and unrelated changes are really bad from that point of view. Yes, it may be easier to write one big patch as the author does not need to introduce some temporary code just to decouple somewhat related changes, but the reviewer efforts to look for and check those changes among the noise from the mechanical code movement increases disproportionately.

(In reply to Luke Wagner [:luke] from comment #21)
> I was confused by this also.  There is a waitBackgroundSweepOrAllocEnd in
> GCCycle, but this is missed by calls to TraceRuntime.  (Is that right Igor?)
> It would be better if there was a single wait call in MarkRuntime; perhaps
> we can do that in a future patch...

TraceRuntime currently contains:

#ifdef JS_THREADSAFE
    {
        JSContext *cx = trc->context;
        JSRuntime *rt = cx->runtime;
        if (rt->gcThread != cx->thread()) {
            AutoLockGC lock(rt);
            AutoGCSession gcsession(cx);

            rt->gcHelperThread.waitBackgroundSweepEnd();
            AutoUnlockGC unlock(rt);

            AutoCopyFreeListToArenas copy(rt);
            RecordNativeStackTopForGC(trc->context);
            MarkRuntime(trc);
            return;
        }
    }
#else
    AutoCopyFreeListToArenas copy(trc->runtime);
    RecordNativeStackTopForGC(trc->context);
#endif

    /*
     * Calls from inside a normal GC or a recursive calls are OK and do not
     * require session setup.
     */
    MarkRuntime(trc);

The condition rt->gcThread != cx->thread() ensures two things: another thread is not in the GC and TraceRuntime is not called from the GC itself.  In this case we call waitBackgroundSweepEnd().

With a single thread runtime we do not need to deal with other threads. But the case of calling TraceRuntime under the GC is possible, for example, when one calls DumpHeap or similar from the debugger. In that case it is very useful that the MarkRuntime scans exactly the same native stack that the real GC sees and dot just sample its own SP each time.

As I wrote the patch should just change the if condition from if (rt->gcThread != cx->thread()) to if (!rt->gcRunning) delegating *any* conservative GC simplifications to another bug.
Comment 23 Luke Wagner [:luke] 2012-01-23 14:24:49 PST
Created attachment 590873 [details] [diff] [review]
rm JSThread/ThreadData and some low-hanging fruit

The patch puts back ConservativeGCData and RecordNativeStackTop.  interruptFlags and interruptCounter are merged into a single JSRuntime::interrupt.
Comment 24 Igor Bukanov 2012-01-23 15:46:45 PST
Comment on attachment 590873 [details] [diff] [review]
rm JSThread/ThreadData and some low-hanging fruit

Review of attachment 590873 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comment fixes.

::: js/src/jsgc.cpp
@@ +2186,4 @@
>  
>      /*
>       * Trigger the GC when it is safe to call an operation callback on any
>       * thread.

Drop "on any thread"

@@ +2896,4 @@
>      rt->gcRunning = true;
>  }
>  
>  /* End the current GC session and allow other threads to proceed. */

Remove the comment.

@@ +2906,5 @@
>  /*
>   * GC, repeatedly if necessary, until we think we have not created any new
>   * garbage and no other threads are demanding more GC. We disable inlining
>   * to ensure that the bottom of the stack with possible GC roots recorded in
>   * js_GC excludes any pointers we use during the marking implementation.

Drop "and no other threads are demanding more GC"

@@ +2920,5 @@
>      /*
>       * Recursive GC is no-op and a call from another thread waits the started
>       * GC cycle to finish.
>       */
> +    if (rt->gcMarkAndSweep)

Drop in the comment everything after "is no-op"

@@ +2929,5 @@
>      /*
>       * Don't GC if any thread is reporting an OOM. We check the flag after we
>       * have set up the GC session and know that the thread that reported OOM
>       * is either the current thread or waits for the GC to complete on this
>       * thread.

Replace the comment with "Don't GC if we are reporting an OOM."
Comment 26 Matt Brubeck (:mbrubeck) 2012-01-24 17:38:08 PST
Sorry, we had to back this out on inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/288eae8384a2

because of an infinite loop in test_ccdump.xul on Win debug builds:
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-debug/1327431171/mozilla-inbound_xp-debug_test-mochitest-other-build1930.txt.gz
Comment 27 Andrew McCreight [:mccr8] 2012-01-24 17:47:04 PST
That test just does a CC dump and ignores the results.  I'm not sure how it could hang.  The only real difference should be that it runs various bits of code to generate node and edge names.
Comment 28 Andrew McCreight [:mccr8] 2012-01-24 17:52:18 PST
I'd think the most likely culprit is NoteJSChild, in nsXPConnect.  There's a glob of code in that is guarded by #ifdef DEBUG (which would explain why it triggers in debug but not opt) and cb.WantDebugInfo() (which would explain why it triggers only during a CC dump), and it probably does JS-y stuff.  Could this patch have broken the JSTracer debugPrinter stuff?

Another suspect would be nsXPConnect::Traverse (that's the chunk that ObjShrink broke, that inspired this test), but that doesn't have the #ifdef DEBUG.
Comment 29 Luke Wagner [:luke] 2012-01-24 18:05:41 PST
The test prints out:

  [Exception... "'[JavaScript Error: "too much recursion"  ...

repeatedly.  Also, test_ccdump.xul calls the cycle collector synchronously (like about:memory and unlike normal cc), so my guess is that the C stack limit is not being appropriately switched between threads.  I really do not like this cycle-collection-off-the-main-thread silliness...
Comment 30 Luke Wagner [:luke] 2012-01-24 18:06:50 PST
So is test_ccdump.xul is running JS during cycle collection?
Comment 31 Andrew McCreight [:mccr8] 2012-01-24 18:08:08 PST
Yeah, it uses some empty functions as callbacks.  Maybe that's bad?  I was idly wondering the other day how it worked at all.
Comment 32 Luke Wagner [:luke] 2012-01-24 19:22:13 PST
Andrew suggested I just temporarily remove test_cc.xul since it seems clearly wrong to run JS during cycle collection and he will just re-add it later with a C++ observer.  Granted, the issue in comment 29 should still be fixed as a followup bug, but it isn't necessary for FF since FF does not run JS during cycle collection.  (Shell workers also don't hit the issue since JS_SetNativeStackQuota is never called on the workers' contexts hence cx->stackLimit == 0.)

https://hg.mozilla.org/integration/mozilla-inbound/rev/e517d4c43143
Comment 34 Eric Shepherd [:sheppy] 2012-04-24 11:29:11 PDT
These weren't really documented at all, so this change has simply been noted on Firefox 12 for developers.
Comment 35 Braden 2012-04-28 15:48:26 PDT
Some of this was documented (and still is): https://developer.mozilla.org/en/SpiderMonkey/JSAPI_Reference/JS_ClearContextThread

A summary of how code using JS_[Set|Clear]ContextThread should be changed would be quite useful.

Note You need to log in before you can comment on or make changes to this bug.