Closed Bug 1219418 Opened 9 years ago Closed 9 years ago

TSan: data race js/src/jsgc.cpp:1476 GCSchedulingTunables::SetParameter (race on minEmptyChunkCount_)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla45

Tracking Flags:

Tracking

Status

firefox45

---

fixed

b2g-v2.5

---

fixed

People

(Reporter: froydnj, Assigned: terrence)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(2 files, 1 obsolete file)

TSan stack trace 9 years ago Nathan Froyd [:froydnj] 12.83 KB, text/plain		Details
fix_tsan_bug_minEmptyChunkCount-v0.diff 9 years ago Terrence Cole [:terrence] 13.39 KB, patch		Details \| Diff \| Splinter Review
fix_tsan_bug_minEmptyChunkCount-v1.diff 9 years ago Terrence Cole [:terrence] 15.20 KB, patch	jonco : review+	Details \| Diff \| Splinter Review

Nathan Froyd [:froydnj]

Reporter

Description

•

9 years ago

Attached file TSan stack trace — Details

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

This is a pretty straightforward race: read/write on different threads without intervening locking.  The wrinkle here is that we're reading it on a parallel task GC thread *before* the value is actually set on the main thread.  This seems unfortunate, and possibly leads to screwy behavior because a helper thread doesn't have the value we expect it to.

ni? Terrence for his opinion on this.

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

Flags: needinfo?(terrence)

Terrence Cole [:terrence]

Assignee

Comment 1

•

9 years ago

It's impossible to tell from the stacks, but I'm guessing the off-thread access is via expireEmptyChunkPool. This takes a |const AutoLockGC& lock| and expects the chunk pools to not change (although the foreground can continue allocating into any existing chunks, as long as it doesn't need a new one). I would consider access to minEmptyChunks to be under the ambit of things we should only access or change under the GC lock (and probably the rest of these variables too). I think we should add an AutoLockGC to setGCParameter. Patch coming up.

Flags: needinfo?(terrence)

Terrence Cole [:terrence]

Assignee

Comment 2

•

9 years ago

Ironically getProperty is already under the lock. The reason setProperty is not already under the lock is fairly dumb: setting the mark stack limit needs to FinishGC (quite reasonably), which causes us to wait for background sweeping, which needs to take the lock. We're double locked: boom.

I guess we can AutoUnlockGC when setting the mark stack limit, but ugh.

Terrence Cole [:terrence]

Assignee

Comment 3

•

9 years ago

Attached patch fix_tsan_bug_minEmptyChunkCount-v0.diff (obsolete) — Details — Splinter Review

This is a bit ugly, but I think better than what we have before.

Assignee: nobody → terrence

Status: NEW → ASSIGNED

Attachment #8680293 - Flags: review?(jcoppeard)

Terrence Cole [:terrence]

Assignee

Comment 4

•

9 years ago

Attached patch fix_tsan_bug_minEmptyChunkCount-v1.diff — Details — Splinter Review

Forgot to qref.

Attachment #8680293 - Attachment is obsolete: true

Attachment #8680293 - Flags: review?(jcoppeard)

Attachment #8680295 - Flags: review?(jcoppeard)

Terrence Cole [:terrence]

Assignee

Comment 5

•

9 years ago

Comment on attachment 8680295 [details] [diff] [review]
fix_tsan_bug_minEmptyChunkCount-v1.diff

Review of attachment 8680295 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +1293,5 @@
>      setMaxMallocBytes(maxbytes);
>  
> +    const char* size = getenv("JSGC_MARK_STACK_LIMIT");
> +    if (size)
> +        setMarkStackLimit(atoi(size), lock);

Just moving this from Runtime::init -- must have missed it when we split out GCRuntime.

@@ +1557,3 @@
>  {
>      MOZ_ASSERT(!rt->isHeapBusy());
> +    AutoUnlockGC unlock(lock);

Ugh. I think we should be able to move the lock above AutoStopVerifyingBarriers, but that causes other issues. Will try more tomorrow.

@@ +5181,5 @@
>  {
>      /* Update the GC state for zones we have swept. */
>      for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
>          MOZ_ASSERT(zone->isGCSweeping());
> +        AutoLockGC lock(rt);

If we move the lock above the loop, queueZonesForBackgroundSweep double-locks. I'd like to move the helper state and gc locks up to gcCycle; I'm not sure about the helper thread lock though. I think taking the helper thread should be fine, since it's only taken by the background for communication with the main thread, which we're currently blocking. I guess currently, more tasks can run and queue up results while we're GCing? I think I'm a bit terrified that that might be having an impact currently.

Jon Coppeard (:jonco)

Comment 6

•

9 years ago

Comment on attachment 8680295 [details] [diff] [review]
fix_tsan_bug_minEmptyChunkCount-v1.diff

Review of attachment 8680295 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine.

::: js/src/jsgc.cpp
@@ +1557,3 @@
>  {
>      MOZ_ASSERT(!rt->isHeapBusy());
> +    AutoUnlockGC unlock(lock);

Yes this is a bit unfortunate but necessary I guess.

@@ +5186,3 @@
>          zone->setGCState(Zone::Finished);
>          zone->threshold.updateAfterGC(zone->usage.gcBytes(), invocationKind, tunables,
> +                                      schedulingState, lock);

Another way we could handle the would be to lock once at the start of the method and copy the data we need out of tunables into a separate structure, and pass that down through updateAfterGC() into computeZoneTriggerBytes().

Attachment #8680295 - Flags: review?(jcoppeard) → review+

Terrence Cole [:terrence]

Assignee

Comment 7

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/07485d615e337c907c5ef57b0dc2e7fa0e0bd251
Bug 1219418 - Always access minEmptyChunkCount under the GC lock; r=jonco

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 8

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/07485d615e33

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox45: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla45

Carsten Book [:Tomcat]

Comment 9

•

9 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-b2g44_v2_5/rev/07485d615e33

status-b2g-v2.5: --- → fixed

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

TSan: data race js/src/jsgc.cpp:1476 GCSchedulingTunables::SetParameter (race on minEmptyChunkCount_)

Categories

(Core :: General, defect)

Tracking

()

People

(Reporter: froydnj, Assigned: terrence)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Crash Data

Security

(public)

User Story

Attachments

(2 files, 1 obsolete file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Attachment

General

Description

File Name

Content Type