TSan: multiple data races on JSFunction::flags_ (in JSFunction::maybeRelazify, JSFunction::kind)

RESOLVED FIXED in Firefox 50

Status

()

Core
JavaScript: GC
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: jseward, Assigned: jandem)

Tracking

Trunk
mozilla50
Points:
---

Firefox Tracking Flags

(firefox48 affected, firefox50 fixed)

Details

Attachments

(2 attachments)

(Reporter)

Description

2 years ago
I'm not sure that these are really the same thing -- hence, I'm not 
sure if it is really wise to put them together in the same bug.  But
for better or worse, here they are.

In any case, it's clear there is a bunch of racing happening on 
JSFunction::flags_, which seems to be a collection of bitfields.

STR: do a TSan-enabled build.  Then:

TSAN_OPTIONS=suppressions=/home/sewardj/MOZ/SUPPS/tsan-empty.txt \
   DISPLAY=:1.0 ./mach xpcshell-test --sequential --log-mach - \
   devtools/server/tests/unit/test_nesting-01.js 2>&1 \
   | tee spew-23-tsan-3-kind | grep SUMMARY

Sometimes the summary is for maybeRelazify, sometimes for kind, but it
is always for one or the other.  In other words *a* race is always
reported.
(Reporter)

Comment 1

2 years ago
Created attachment 8743945 [details]
race-maybeRelazify-and-kind

TSan complaints.
(Reporter)

Comment 2

2 years ago
Brian, is this something you can look at?  If not, can you suggest
who would be a suitable person?
Flags: needinfo?(bhackett1024)
It's hard for me to tell whether this is a false positive or not.  The reported race here is between the main thread relazifying functions at the start of a GC, and a helper thread performing an Ion compilation.  We are supposed to be canceling all off thread compilations at the start of the GC, but I can't tell where this happens anymore.  Immediately before the relazification stuff there is this block of code:

    /* For non-incremental GC the following sweep discards the jit code. */
    if (isIncremental) {
        for (GCZonesIter zone(rt); !zone.done(); zone.next()) {
            gcstats::AutoPhase ap(stats, gcstats::PHASE_MARK_DISCARD_CODE);
            zone->discardJitCode(rt->defaultFreeOp());
        }
    }

The discardJitCode call will cause all off thread compilations to be canceled.  But the comment on this block of code is plainly incorrect and doesn't address where jitcode is discarded / off thread compilations canceled for non-incremental GCs.
Flags: needinfo?(bhackett1024) → needinfo?(terrence)
Jan changed how we do relazification a couple months ago. Maybe he knows how this is supposed to work? If not, I'll try to figure out what should be happening.
Flags: needinfo?(terrence) → needinfo?(jdemooij)
(Reporter)

Comment 5

2 years ago
(In reply to comment #3 and comment #4)
I am not sure if this was clear from my original comments, but I suspect
that this is a bitfield race.  That is, there is no logical race, but because
JSFunction::flags_ is treated as a grab-bag of bit-fields, and because individual
fields cannot be updated atomically, races are reported as resulting from accesses
to unrelated parts of the word.  For example

jsfun.h:252:
    void setKind(FunctionKind kind) {
        this->flags_ &= ~FUNCTION_KIND_MASK;
        this->flags_ |= static_cast<uint16_t>(kind) << FUNCTION_KIND_SHIFT;
    }

assigns to some subfield of flags_ and writes back the other fields unchanged,
which potentially overwrites concurrent changes to those other fields made by some
other thread.  Does that seem plausible?
(Assignee)

Comment 6

2 years ago
(In reply to Terrence Cole [:terrence] from comment #4)
> Jan changed how we do relazification a couple months ago. Maybe he knows how
> this is supposed to work?

Before the relazification rewrite we relazified in JSFunction::trace, so I think we had the same issue then (assuming we don't stop stop off-thread compilations before the mark phase)?

> If not, I'll try to figure out what should be happening.

That would be great!

FWIW, there's also a CancelOffThreadIonCompile call in JitCompartment::sweep. Ideally there would be only 1 or 2 places where we have to do that for GC.
Flags: needinfo?(jdemooij) → needinfo?(terrence)
So I think the way it's supposed to work is that LIR should not have any direct GC pointers: we would not know to keep them rooted and they would race with the main thread. If LIR were allowed to have pointers, we'd have to stop and delete all ongoing compilations at the start of every GC slice, rather than once when we sweep the zone. This seems like it would destroy performance pretty effectively, so we should probably find a way to do whatever we're doing with this function without holding the function itself.

Jan, what is this function even doing in LIR?
Flags: needinfo?(terrence) → needinfo?(jdemooij)
Note that this is not a GC race specifically. GC just happens to touch lots of stuff.

Note 2: there are some "template" objects which are used by the BG thread, but we have mechanisms to root these on the main thread and ensure that they are not touched by other code while compilation is ongoing.
(Assignee)

Comment 9

2 years ago
I'll try to set up a TSan shell build locally so I can look at these races.
(Assignee)

Comment 10

2 years ago
Created attachment 8763856 [details] [diff] [review]
Patch

This wraps the JSFunction* pointer in MCall et al in a WrappedFunction class. WrappedFunction has a constructor that copies the relevant data while we're still on the main thread, to avoid races off-thread.

This fixes a ton of failures when I run jit-tests with TSan locally.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8763856 - Flags: review?(efaustbmo)
(Assignee)

Comment 11

2 years ago
Comment on attachment 8763856 [details] [diff] [review]
Patch

Review of attachment 8763856 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +3841,5 @@
> +        nargs_(fun->nargs()),
> +        isNative_(fun->isNative()),
> +        isConstructor_(fun->isConstructor()),
> +        isClassConstructor_(fun->isClassConstructor()),
> +        isSelfHostedBuiltin_(fun->isSelfHostedBuiltin())

I forgot to mention: an alternative is to add a FunctionFlags class wrapping the function's |uint16_t flags|, and then we can use that class here and in JSFunction. That's a bit more involved though and the compiler should be able to emit pretty efficient bit twiddling code for this.
Comment on attachment 8763856 [details] [diff] [review]
Patch

Review of attachment 8763856 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine. I have no objections to this design.

::: js/src/jit/MIR.h
@@ +3850,5 @@
> +    bool isClassConstructor() const { return isClassConstructor_; }
> +    bool isSelfHostedBuiltin() const { return isSelfHostedBuiltin_; }
> +
> +    // fun->native() and fun->jitInfo() can safely be called off-thread: these
> +    // fields never change.

This is where it would be so nice to have a typesystem that could enforce that this is never modified in a racy manner... Oh, well.
Attachment #8763856 - Flags: review?(efaustbmo) → review+
(Reporter)

Comment 13

2 years ago
Can this be landed now?

Comment 14

2 years ago
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/64248267e93e
Fix a TSan data race on JSFunction flags. r=efaust

Comment 15

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/64248267e93e
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox50: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
You need to log in before you can comment on or make changes to this bug.