Closed Bug 1437554 Opened 2 years ago Closed 2 years ago

Off-thread parsing can conflict with GC

Categories

(Core :: JavaScript: GC, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: jonco, Assigned: jonco)

Details

Attachments

(1 file)

Bug 1431353 increased the number of helper thread that can be used for off-thread parsing.  Unfortunately these threads can become blocked waiting on the exclusive access lock for the duration of any GC, since this always takes that lock.  This slows down parsing but also slows down GC which uses helper threads for parallelising work.

This is almost a deadlock situation except for the fact that the helper thread system will not use all threads for off-thread parsing, keeping one in reserve due to an unrelated Wasm compilation issues (see GlobalHelperThreadState::canStartParseTask).

The problem is that the GC waits for GC work to run on helper threads while holding the lock.  Running helper threads that try to take the lock will block until the GC releases the lock.
Here's a patch to change the GC to only hold the exclusive access lock if the atoms zone is being collected.  We don't allow collecting the atoms zone while off-thread parsing is happening (it's not safe) so there is no conflict there.

This works by changing most of the places we pass around an AutoLockForExclusiveAccess& to pass an AutoTraceSession&, and changing the lock in that to a Maybe<AutoLockForExclusiveAccess>.  The lock is always taken at the beginning of a trace session, and the GC can release it if it determines that it doesn't need it due to not collecting the atoms zone.

I've done a couple of full try builds and they're looking green.  One issue I found was with mutex ordering under GlobalHelperThreadState::trace().  This runs while holding the helper thread lock and this causes an ordering violation if a tracing assert calls AtomIsPinnedInRuntime().  I worked around this by always taking exclusive access lock here.

I tested this and locally it does allow parse threads to continue to run during a non-atoms GC.
Attachment #8950238 - Flags: review?(sphink)
Comment on attachment 8950238 [details] [diff] [review]
bug1437554-only-lock-for-atoms-gc

Review of attachment 8950238 [details] [diff] [review]:
-----------------------------------------------------------------

This seems like a really great change, especially for parse-heavy workloads (like Facebook). But I wanted to be sure I understand what's going on.

What is the practical difference between ZoneGroupData and ActiveThreadData? Are you switching some things from ZGD -> ATD because we don't have "active thread or parse task" right now? (Which would be fine; I don't think adding that level of specificity to AllowedHelperThread makes sense.) Or am I missing the point?

This patch removes the AutoLockForExclusiveAccess from a number of sweeping functions: purgeRuntime, groupZonesForSweeping, performSweepActions, ... Is it the case that we already didn't need that lock, and were just sort of passing it in because we happened to have it? As in, is its removal really part of the same change, or could it have been removed before this change anyway without changing behavior? (If so, I'm fine with having it rolled in here, but I want to be sure I'm following what's going on.)

::: js/src/jsgc.cpp
@@ +6702,5 @@
>      MOZ_ASSERT(prevState == JS::HeapState::Idle);
>      MOZ_ASSERT(heapState != JS::HeapState::Idle);
>      MOZ_ASSERT_IF(heapState == JS::HeapState::MajorCollecting, AllNurseriesAreEmpty(rt));
> +
> +    maybeLock.emplace(rt);

Maybe comment here

   // Session always begins with lock held, but it may be released later if not needed.

or specifically call out the atoms zone if that makes more sense.

::: js/src/vm/HelperThreads.cpp
@@ +2098,5 @@
> +    // need to take that lock before the helper thread lock, if we don't have it
> +    // already.
> +    Maybe<AutoLockForExclusiveAccess> exclusiveLock;
> +    if (!session.maybeLock.isSome())
> +        exclusiveLock.emplace(trc->runtime());

Hm... does this reintroduce much nonparallelism? If so, and this really is only for an assertion, could this be DEBUG-only?
needinfo for questions on patch; it's an "r+ if what is actually going on is what sfink thinks is going on."
Flags: needinfo?(jcoppeard)
(In reply to Steve Fink [:sfink] [:s:] from comment #2)

> What is the practical difference between ZoneGroupData and ActiveThreadData?

I'm going to work through this to make sure I get this right...

ZoneGroupData checks access in three ways based on the zone group:
 - for zone groups owned by ahelper thread, access is allowed to the helper thread that owns them
 - for non-helper-thread zone groups, access is allowed to the active thread (checked with CurrentThreadCanAccessRuntime()).
 - for the null zone group (i.e. the atoms zone), access is allowed to threads holding the exclusive access lock

ActiveThreadData allows access to the active thread, i.e it checks CurrentThreadCanAccessRuntime().

The patch changes the check on Zone::gcSweepGroupEdges_ and Zone::listNext_ from the former to the latter.  These fields are used by the GC.  For non-helper-thread zones, there is no change in behaviour.  For helper thread zones these fields are unused because we never GC helper thread zones (until they stop being helper thread zones).  For the atoms zone, we no longer require the exclusive access lock but instead require they are accessed by the active thread.

So the net effect is that for the atoms zone we no longer allow access to these fields by parse threads and instead allow access to them by the GC (which runs on the active thread).

BTW I found this pattern of wrapping fields in access checking templates was good at finding problems.

> This patch removes the AutoLockForExclusiveAccess from a number of sweeping
> functions: purgeRuntime, groupZonesForSweeping, performSweepActions, ... Is
> it the case that we already didn't need that lock, and were just sort of
> passing it in because we happened to have it?

Yes, I couldn't see a reason why we had to have that lock there.  Maybe we used to require it?

> As in, is its removal really
> part of the same change, or could it have been removed before this change
> anyway without changing behavior?

We could have removed it previously.  But that wouldn't have changed behaviour since we would still have held the lock.

> ::: js/src/vm/HelperThreads.cpp
> @@ +2098,5 @@
> > +    // need to take that lock before the helper thread lock, if we don't have it
> > +    // already.
> > +    Maybe<AutoLockForExclusiveAccess> exclusiveLock;
> > +    if (!session.maybeLock.isSome())
> > +        exclusiveLock.emplace(trc->runtime());
> 
> Hm... does this reintroduce much nonparallelism? If so, and this really is
> only for an assertion, could this be DEBUG-only?

This is unfortunate but I don't think it will have a large effect.  It is only for an assertion but I don't want to make the locking behaviour different between debug and release builds.  Maybe we could skip the assertion if the thread already held the helper thread lock, that might work better.  But I'll do that as a followup.

Does that all sound OK?
Flags: needinfo?(jcoppeard)
Priority: -- → P1
Excellent, thank you so much for the detailed explanation! That clears up a lot of the perennial uncertainty I've had about what exactly these all mean.
Attachment #8950238 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/af441cafff70
Release the exclusive access lock when not collecting the atoms zone r=sfink
https://hg.mozilla.org/mozilla-central/rev/af441cafff70
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.