Closed Bug 1219418 Opened 4 years ago Closed 4 years ago

TSan: data race js/src/jsgc.cpp:1476 GCSchedulingTunables::SetParameter (race on minEmptyChunkCount_)

Categories

(Core :: General, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: froydnj, Assigned: terrence)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(2 files, 1 obsolete file)

Attached file TSan stack trace
The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

This is a pretty straightforward race: read/write on different threads without intervening locking.  The wrinkle here is that we're reading it on a parallel task GC thread *before* the value is actually set on the main thread.  This seems unfortunate, and possibly leads to screwy behavior because a helper thread doesn't have the value we expect it to.

ni? Terrence for his opinion on this.

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(terrence)
It's impossible to tell from the stacks, but I'm guessing the off-thread access is via expireEmptyChunkPool. This takes a |const AutoLockGC& lock| and expects the chunk pools to not change (although the foreground can continue allocating into any existing chunks, as long as it doesn't need a new one). I would consider access to minEmptyChunks to be under the ambit of things we should only access or change under the GC lock (and probably the rest of these variables too). I think we should add an AutoLockGC to setGCParameter. Patch coming up.
Flags: needinfo?(terrence)
Ironically getProperty is already under the lock. The reason setProperty is not already under the lock is fairly dumb: setting the mark stack limit needs to FinishGC (quite reasonably), which causes us to wait for background sweeping, which needs to take the lock. We're double locked: boom.

I guess we can AutoUnlockGC when setting the mark stack limit, but ugh.
This is a bit ugly, but I think better than what we have before.
Assignee: nobody → terrence
Status: NEW → ASSIGNED
Attachment #8680293 - Flags: review?(jcoppeard)
Forgot to qref.
Attachment #8680293 - Attachment is obsolete: true
Attachment #8680293 - Flags: review?(jcoppeard)
Attachment #8680295 - Flags: review?(jcoppeard)
Comment on attachment 8680295 [details] [diff] [review]
fix_tsan_bug_minEmptyChunkCount-v1.diff

Review of attachment 8680295 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +1293,5 @@
>      setMaxMallocBytes(maxbytes);
>  
> +    const char* size = getenv("JSGC_MARK_STACK_LIMIT");
> +    if (size)
> +        setMarkStackLimit(atoi(size), lock);

Just moving this from Runtime::init -- must have missed it when we split out GCRuntime.

@@ +1557,3 @@
>  {
>      MOZ_ASSERT(!rt->isHeapBusy());
> +    AutoUnlockGC unlock(lock);

Ugh. I think we should be able to move the lock above AutoStopVerifyingBarriers, but that causes other issues. Will try more tomorrow.

@@ +5181,5 @@
>  {
>      /* Update the GC state for zones we have swept. */
>      for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
>          MOZ_ASSERT(zone->isGCSweeping());
> +        AutoLockGC lock(rt);

If we move the lock above the loop, queueZonesForBackgroundSweep double-locks. I'd like to move the helper state and gc locks up to gcCycle; I'm not sure about the helper thread lock though. I think taking the helper thread should be fine, since it's only taken by the background for communication with the main thread, which we're currently blocking. I guess currently, more tasks can run and queue up results while we're GCing? I think I'm a bit terrified that that might be having an impact currently.
Comment on attachment 8680295 [details] [diff] [review]
fix_tsan_bug_minEmptyChunkCount-v1.diff

Review of attachment 8680295 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine.

::: js/src/jsgc.cpp
@@ +1557,3 @@
>  {
>      MOZ_ASSERT(!rt->isHeapBusy());
> +    AutoUnlockGC unlock(lock);

Yes this is a bit unfortunate but necessary I guess.

@@ +5186,3 @@
>          zone->setGCState(Zone::Finished);
>          zone->threshold.updateAfterGC(zone->usage.gcBytes(), invocationKind, tunables,
> +                                      schedulingState, lock);

Another way we could handle the would be to lock once at the start of the method and copy the data we need out of tunables into a separate structure, and pass that down through updateAfterGC() into computeZoneTriggerBytes().
Attachment #8680295 - Flags: review?(jcoppeard) → review+
https://hg.mozilla.org/mozilla-central/rev/07485d615e33
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
You need to log in before you can comment on or make changes to this bug.