Closed Bug 675078 Opened 13 years ago Closed 12 years ago

rm JSThreadData and JSThread

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla12

People

(Reporter: luke, Assigned: luke)

References

Details

Attachments

(1 file, 2 obsolete files)

Attached patch rm JSThread/JSThreadData (obsolete) — Splinter Review
Once bug 650411 lands, JSThreadData can be rolled into JSRuntime and JSThread isn't needed at all.
Depends on: 650411
Comment on attachment 549260 [details] [diff] [review]
rm JSThread/JSThreadData

It'll be a while before this can land but, FWIW, green on try.
Attachment #549260 - Attachment description: rm JSThread/JSThreadData (WIP) → rm JSThread/JSThreadData
Comment on attachment 549260 [details] [diff] [review]
rm JSThread/JSThreadData

>--- a/js/src/jscntxt.h
>+++ b/js/src/jscntxt.h
>+    /* Number of JS_SuspendRequest calls withot JS_ResumeRequest. */
>+    unsigned            suspendCount;
>+
>+# ifdef DEBUG

You just copied this, but "#ifdef"?

>--- a/js/src/jsgc.cpp
>+++ b/js/src/jsgc.cpp
>     /*
>      * Recursive GC is no-op and a call from another thread waits the started
>      * GC cycle to finish.
>      */

Is this still relevant?
(In reply to Ms2ger from comment #2)
> >+# ifdef DEBUG
> 
> You just copied this, but "#ifdef"?

Nope: SM style indents 1 space per level of #if nesting.

> >      * Recursive GC is no-op and a call from another thread waits the started
> >      * GC cycle to finish.
> 
> Is this still relevant?

There is a ton of stuff that will be obsolete when bug 650411 can be considered to have "stuck" (esp. in the GC).  Rest assured the rapture awaits.
Blocks: 719858
Blocks: 720013
Blocks: 720018
45 files changed, 513 insertions(+), 2752 deletions(-)

The patch straight-up simply removes JSThread and folds ThreadData into JSRuntime.  Secondly, a few obviously useless locks/condvars are removed (which allow jslock.cpp, lock_sparcv8plus.il, and lock_sparcv9.il to be removed).

njn, could you review the memory reporting changes and igor, could you review the rest?
Attachment #549260 - Attachment is obsolete: true
Attachment #590383 - Flags: review?(n.nethercote)
Attachment #590383 - Flags: review?(igor)
Blocks: 720045
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

Review of attachment 590383 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine, just one nit below.  I only reviewed MemoryMetrics.{h,cpp} and XPCJSRuntime.cpp, is that all you wanted me to look at?

When this lands I'd love it if you could write a message to the JSAPI list to explain the changes and benefits!

::: js/src/MemoryMetrics.cpp
@@ +222,5 @@
> +        rt->sizeOfExcludingThis(data->mallocSizeOf,
> +                                &data->runtimeNormal,
> +                                &data->runtimeTemporary,
> +                                &data->runtimeRegexpCode,
> +                                &data->runtimeStackCommitted);

data->runtimeNormal et al are int64_t, but rt->sizeOfExcludingThis expects size_t pointers.  Can you introduce size_t temporaries and pass them to sizeOfExcludingThis, and then assign those temporaries to data->runtimeNormal et al, similar to how the old code worked?
Attachment #590383 - Flags: review?(n.nethercote) → review+
(In reply to Nicholas Nethercote [:njn] from comment #5)
> data->runtimeNormal et al are int64_t, but rt->sizeOfExcludingThis expects
> size_t pointers.

The patch changes that :)  (Type error)

> Can you introduce size_t temporaries and pass them to
> sizeOfExcludingThis, and then assign those temporaries to
> data->runtimeNormal et al, similar to how the old code worked?

Do you still want me to do this?
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

I see in a few places that the patch removes the code that synchronizes with the background thread or removes the locking to allow calling the operation callback triggers from an arbitrary thread. This is just wrong. In general, I would prefer to see a patch that does not remove any GC locks/unlocks unless it is very clear that it would not affect any semantics.

And for this reason i really would like to see first a patch that just stores JSThread in JSRuntime and then have a separated patch that moves the fields into JSRuntime. With the current patch it is way to easy to miss important semantic changes.
Attachment #590383 - Flags: review?(igor) → review-
> The patch changes that :)  (Type error)

Oh, you changed sizeOfExcludingThis().

It's kind of lame, but sizeOfExcludingThis functions always have size_t return types, but memory reporters need PRInt64/int64_t types because IDL doesn't allow size_t and -1 is used to mean "unknown".  So I'd like you to revert the sizeOfExcludingThis type change and add the temporaries.  

(I've been thinking about changing IterateData to have size_t members instead of int64_t members, but until then, these temporaries are the easiest way to keep things consistent with the current scheme.)

Thanks!
(In reply to Igor Bukanov from comment #7)
> I see in a few places that the patch removes the code that synchronizes with
> the background thread or removes the locking to allow calling the operation
> callback triggers from an arbitrary thread. This is just wrong.

For example, the changes in TraceRuntime in jsgc.cpp removes all JS_THREADSAFE code. But this misses that rt->gcThread != cx->thread() is true when the GC is not running on the current thread and when rt->gcThread is true. Another example is onTooMuchMalloc. It still must do AutoLockGC lock(this) before calling T.riggerGC as the latter can be called from, for example, a timer thread and requires the GC lock
Another problem with the patch is that the conservative GC changes no longer includes the registers into the the set of the conservative roots. This should be restored.
(In reply to Igor Bukanov from comment #9)
> For example, the changes in TraceRuntime in jsgc.cpp removes all
> JS_THREADSAFE code. But this misses that rt->gcThread != cx->thread() is
> true when the GC is not running on the current thread and when rt->gcThread
> is true.

rt->gcThread == cx->thread always.  It is an invariant.  See JS_AbortIfWrongThread in js_GC.  That is why I removed gcThread.

> Another example is onTooMuchMalloc. It still must do AutoLockGC
> lock(this) before calling T.riggerGC as the latter can be called from, for
> example, a timer thread and requires the GC lock

Can't happen.

(In reply to Igor Bukanov from comment #10)
> Another problem with the patch is that the conservative GC changes no longer
> includes the registers into the the set of the conservative roots. This
> should be restored.

Do you realistically think the registers will contain a pointer to a GC-thing that somehow was never spilled half despite being neck-deep in the GC?  I (clearly) do not, but it is trivial to add it back.
(Green on try)
In general, only three lock-y things were removed: atomState.lock, gcDone, and requestDone.  A simple grepping shows these are only used from the runtime's owner thread.  I initially tried to break this change into a queue of minimal patches, but those intermediate states were a lot harder to reason about than the end state, which is rather simple.
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

Given the clarifications in comments 11 and 13 would you consider looking at the patch again?  They significantly simplify reasoning about the patch.  E.g., js_WaitForGC is trivially a nop.
Attachment #590383 - Flags: review- → review?(igor)
Why don't we have to wait for the background thread to finish in TraceRuntime? Is there another check somewhere?
(In reply to Luke Wagner [:luke] from comment #11)
> rt->gcThread == cx->thread always.  It is an invariant.  See
> JS_AbortIfWrongThread in js_GC.  

Currently rt->gcThread is set only during the GC. The rest of the time it is null. So it is not invariant and that code must be preserved to ensure that we properly synchronizes against the background thread.

> > Another example is onTooMuchMalloc. It still must do AutoLockGC
> > lock(this) before calling TriggerGC as the latter can be called from, for
> > example, a timer thread and requires the GC lock
> 
> Can't happen.

How exactly do we interrupt the main thread with a long running script if not setting the interrupt flags from the another thread? As a part of that implementation TriggerGC uses the GC lock and assumes that it is hold by the caller. Without the lock a potential race in TriggerGC called from onTooMuchMalloc with a timer thread may breaks things.

> Do you realistically think the registers will contain a pointer to a
> GC-thing that somehow was never spilled half despite being neck-deep in the
> GC?

If you can prove that none of the compilers on platforms that we use inline or transform the code to such extent that register spill does not happen, then yes, lets remove the register scan. But I am not aware of such proof and prefer to be on a safe side.

Another thing is that the patch moves recording of the stack pointer for the conservative scanner to MarkConservativeStackRoots. That introduces false positives from various pointers into the GC arenas that were touched in the GC code before doing the conservative scan. I have observed that many times and that is why currently the code recorders the stack bounds separately early in the GC-related calls. That must be preserved.
(In reply to Igor Bukanov from comment #16)
> (In reply to Luke Wagner [:luke] from comment #11)
> > rt->gcThread == cx->thread always.  It is an invariant.  See
> > JS_AbortIfWrongThread in js_GC.  
> 
> Currently rt->gcThread is set only during the GC. The rest of the time it is
> null. So it is not invariant and that code must be preserved to ensure that
> we properly synchronizes against the background thread.

You're right "always" was overreaching; "if gcThread is non-null, then it is the same as cx->thread".  Also, yes, I assume you are talking about the one instance in TraceRuntime?

> > > Another example is onTooMuchMalloc. It still must do AutoLockGC
> > > lock(this) before calling TriggerGC as the latter can be called from, for
> > > example, a timer thread and requires the GC lock
> > 
> > Can't happen.
> 
> How exactly do we interrupt the main thread with a long running script if
> not setting the interrupt flags from the another thread?

Given that *only* triggerOperationCallback may happen off the runtime's owner thread, that must mean you thing there is some race condition since there are two states here (interruptFlags/interruptCounter)?  That just points out that we don't need two volatile operation-callback flag variables anymore.

> > Do you realistically think the registers will contain a pointer to a
> > GC-thing that somehow was never spilled half despite being neck-deep in the
> > GC?
> 
> If you can prove that none of the compilers on platforms that we use inline
> or transform the code to such extent that register spill does not happen,
> then yes, lets remove the register scan. But I am not aware of such proof
> and prefer to be on a safe side.

Heh, I thought you'd say that.  Lack of proof didn't stop us from doing conservative GC in the first place (we can't even pretend to being close to having one).  But fine, I'll put it back.  In theory the whole thing is going away eventually anyway.
Comment on attachment 590383 [details] [diff] [review]
rm JSThread/JSThreadData and some low-hanging fruit

There are a few good comments to address, so I'll include them and post a new patch.
Attachment #590383 - Flags: review?(igor)
(In reply to Luke Wagner [:luke] from comment #17)
> You're right "always" was overreaching; "if gcThread is non-null, then it is
> the same as cx->thread".  Also, yes, I assume you are talking about the one
> instance in TraceRuntime?

To be clear: in TraceRuntime the condition rt->gcThread != cx->thread() should be replaced with if (!rt->gcRunning) and the rest of the code should be preserved for now.

> Given that *only* triggerOperationCallback may happen off the runtime's
> owner thread, that must mean you thing there is some race condition since
> there are two states here (interruptFlags/interruptCounter)?  That just
> points out that we don't need two volatile operation-callback flag variables
> anymore.

For now just keep that lock in onTooMuchMalloc. Its removal should go to a separated patch. But in this patch surely remove all instances of interruptCounter - that had to be added to deal with web workers scripts that could have been executed against different threads when it was not possible to record a pointer to JSThread::interruptFlags into the jited code.

> Heh, I thought you'd say that.  Lack of proof didn't stop us from doing
> conservative GC in the first place (we can't even pretend to being close to
> having one).

You are right, I was too conservative at that point. But for the case here the argument against the proposed simplification I think is stronger. To avoid false positives (that was observed in practice) sampling of the current stack pointer should be done at the earliest point during the GC when the native stack is not used for any GC structures yet.  But such early sampling increases a chance that we could have a register with GC things that is not yet spilled.

And when you update the patch, could you keep for now the compare-and-swap code in jslock.cpp (but put it under #if 0) and also keep the CPU counter function there? This will minimizes number of conflicts with my patches that tries to push to the background thread more parts of the GC sweeping phase.
(In reply to Igor Bukanov from comment #19)
> To avoid false positives (that was observed in practice) sampling of the
> current stack pointer should be done at the earliest point during the GC
> when the native stack is not used for any GC structures yet.

I understand the general concern of trying to minimize unintended false positives.  Given that the try server didn't report any such leaks, though, it seems like we should default to the simplest possible design (which also simplifies the review for this patch).
(In reply to Gregor Wagner [:gwagner] from comment #15)
> Why don't we have to wait for the background thread to finish in
> TraceRuntime? Is there another check somewhere?

I was confused by this also.  There is a waitBackgroundSweepOrAllocEnd in GCCycle, but this is missed by calls to TraceRuntime.  (Is that right Igor?)  It would be better if there was a single wait call in MarkRuntime; perhaps we can do that in a future patch...
(In reply to Luke Wagner [:luke] from comment #20)
> I understand the general concern of trying to minimize unintended false
> positives.  Given that the try server didn't report any such leaks, though,
> it seems like we should default to the simplest possible design

The leaks are not visible at the tryserver, since at shutdown we do not scan the native stacks. But there are some tests that are sensitive to these leaks since the tests depend on the GC collecting things and running finalizers. My past experience tells me that the patch in its current form increases a chance for the tests to fail randomly.

 (which also
> simplifies the review for this patch).

If the patch would have kept ConservativeGCThreadData and RecordNativeStackTopForGC as is (i.e. if it would just do what was proposed in the comment 0), the patch would be smaller and we would not waste time for the discussion here.

Somebody once rightfully observed that efforts for a meaningful review are proportional to something like the squire of the number of lines the patch changes. So patches that mix big mechanical changes like moving fields from ThreadData to JSRuntime and unrelated changes are really bad from that point of view. Yes, it may be easier to write one big patch as the author does not need to introduce some temporary code just to decouple somewhat related changes, but the reviewer efforts to look for and check those changes among the noise from the mechanical code movement increases disproportionately.

(In reply to Luke Wagner [:luke] from comment #21)
> I was confused by this also.  There is a waitBackgroundSweepOrAllocEnd in
> GCCycle, but this is missed by calls to TraceRuntime.  (Is that right Igor?)
> It would be better if there was a single wait call in MarkRuntime; perhaps
> we can do that in a future patch...

TraceRuntime currently contains:

#ifdef JS_THREADSAFE
    {
        JSContext *cx = trc->context;
        JSRuntime *rt = cx->runtime;
        if (rt->gcThread != cx->thread()) {
            AutoLockGC lock(rt);
            AutoGCSession gcsession(cx);

            rt->gcHelperThread.waitBackgroundSweepEnd();
            AutoUnlockGC unlock(rt);

            AutoCopyFreeListToArenas copy(rt);
            RecordNativeStackTopForGC(trc->context);
            MarkRuntime(trc);
            return;
        }
    }
#else
    AutoCopyFreeListToArenas copy(trc->runtime);
    RecordNativeStackTopForGC(trc->context);
#endif

    /*
     * Calls from inside a normal GC or a recursive calls are OK and do not
     * require session setup.
     */
    MarkRuntime(trc);

The condition rt->gcThread != cx->thread() ensures two things: another thread is not in the GC and TraceRuntime is not called from the GC itself.  In this case we call waitBackgroundSweepEnd().

With a single thread runtime we do not need to deal with other threads. But the case of calling TraceRuntime under the GC is possible, for example, when one calls DumpHeap or similar from the debugger. In that case it is very useful that the MarkRuntime scans exactly the same native stack that the real GC sees and dot just sample its own SP each time.

As I wrote the patch should just change the if condition from if (rt->gcThread != cx->thread()) to if (!rt->gcRunning) delegating *any* conservative GC simplifications to another bug.
The patch puts back ConservativeGCData and RecordNativeStackTop.  interruptFlags and interruptCounter are merged into a single JSRuntime::interrupt.
Attachment #590383 - Attachment is obsolete: true
Attachment #590873 - Flags: review?(igor)
Comment on attachment 590873 [details] [diff] [review]
rm JSThread/ThreadData and some low-hanging fruit

Review of attachment 590873 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with comment fixes.

::: js/src/jsgc.cpp
@@ +2186,4 @@
>  
>      /*
>       * Trigger the GC when it is safe to call an operation callback on any
>       * thread.

Drop "on any thread"

@@ +2896,4 @@
>      rt->gcRunning = true;
>  }
>  
>  /* End the current GC session and allow other threads to proceed. */

Remove the comment.

@@ +2906,5 @@
>  /*
>   * GC, repeatedly if necessary, until we think we have not created any new
>   * garbage and no other threads are demanding more GC. We disable inlining
>   * to ensure that the bottom of the stack with possible GC roots recorded in
>   * js_GC excludes any pointers we use during the marking implementation.

Drop "and no other threads are demanding more GC"

@@ +2920,5 @@
>      /*
>       * Recursive GC is no-op and a call from another thread waits the started
>       * GC cycle to finish.
>       */
> +    if (rt->gcMarkAndSweep)

Drop in the comment everything after "is no-op"

@@ +2929,5 @@
>      /*
>       * Don't GC if any thread is reporting an OOM. We check the flag after we
>       * have set up the GC session and know that the thread that reported OOM
>       * is either the current thread or waits for the GC to complete on this
>       * thread.

Replace the comment with "Don't GC if we are reporting an OOM."
Attachment #590873 - Flags: review?(igor) → review+
Blocks: 720753
That test just does a CC dump and ignores the results.  I'm not sure how it could hang.  The only real difference should be that it runs various bits of code to generate node and edge names.
I'd think the most likely culprit is NoteJSChild, in nsXPConnect.  There's a glob of code in that is guarded by #ifdef DEBUG (which would explain why it triggers in debug but not opt) and cb.WantDebugInfo() (which would explain why it triggers only during a CC dump), and it probably does JS-y stuff.  Could this patch have broken the JSTracer debugPrinter stuff?

Another suspect would be nsXPConnect::Traverse (that's the chunk that ObjShrink broke, that inspired this test), but that doesn't have the #ifdef DEBUG.
The test prints out:

  [Exception... "'[JavaScript Error: "too much recursion"  ...

repeatedly.  Also, test_ccdump.xul calls the cycle collector synchronously (like about:memory and unlike normal cc), so my guess is that the C stack limit is not being appropriately switched between threads.  I really do not like this cycle-collection-off-the-main-thread silliness...
So is test_ccdump.xul is running JS during cycle collection?
Yeah, it uses some empty functions as callbacks.  Maybe that's bad?  I was idly wondering the other day how it worked at all.
Andrew suggested I just temporarily remove test_cc.xul since it seems clearly wrong to run JS during cycle collection and he will just re-add it later with a C++ observer.  Granted, the issue in comment 29 should still be fixed as a followup bug, but it isn't necessary for FF since FF does not run JS during cycle collection.  (Shell workers also don't hit the issue since JS_SetNativeStackQuota is never called on the workers' contexts hence cx->stackLimit == 0.)

https://hg.mozilla.org/integration/mozilla-inbound/rev/e517d4c43143
https://hg.mozilla.org/mozilla-central/rev/79deba022227
https://hg.mozilla.org/mozilla-central/rev/288eae8384a2
https://hg.mozilla.org/mozilla-central/rev/e517d4c43143

\o/
 |
/|
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Keywords: dev-doc-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla12
Blocks: 714639
Blocks: 721345
Blocks: 722345
Blocks: 723286
Blocks: 724310
Blocks: 725595
These weren't really documented at all, so this change has simply been noted on Firefox 12 for developers.
Some of this was documented (and still is): https://developer.mozilla.org/en/SpiderMonkey/JSAPI_Reference/JS_ClearContextThread

A summary of how code using JS_[Set|Clear]ContextThread should be changed would be quite useful.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: