Closed Bug 1280637 Opened 8 years ago Closed 8 years ago

Add a TSan-enabled TaskCluster JS shell job

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: n.nethercote, Assigned: sfink)

References

(Blocks 1 open bug)

Details

Attachments

(7 files, 2 obsolete files)

We want to start testing the JS shell on automation under TreeHerder. There are still some races present, so we'll need to suppress the existing ones, but even with that it will still prevent new races from landing.

TSan has a ~6x slowdown factor, so we might not want to run all combinations of JIT/non-JIT.
Assignee: nobody → sphink
jseward, can you please provide details on:

- how to build the shell with TSan enabled, and

- how to run the shell with TSan enabled, and

- an appropriate suppression file for the existing races.
Flags: needinfo?(jseward)
Also the script to aggregate/reduce the TSan output.
(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #2)
> Also the script to aggregate/reduce the TSan output.

Not sure that'll be necessary? TSan will spit out errors in the log if it finds any, just like ASan or Valgrind.
Notes to myself: currently the automation builds use gcc on linux, and this will require clang. They get the compiler from the tooltool manifests used for desktop, so we'll have to fork off the shell builds' (or just this build's) tooltool manifest from the browser's, which is kind of unfortunate since we won't automatically inherit any new dependencies. Or I can go back to using multiple tooltool manifests, which apparently is not the model by which they're supposed to be used (the idea is that you have a single tooltool manifest that establishes part of the environment; I don't happen to agree with that idea, but my usual reviewer does).

Or I can investigate making this use its own docker image, and stop using tooltool for installing things like this. Or a combination.
(In reply to Nicholas Nethercote [:njn] from comment #3)
> (In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #2)
> > Also the script to aggregate/reduce the TSan output.
> 
> Not sure that'll be necessary? TSan will spit out errors in the log if it
> finds any, just like ASan or Valgrind.

jseward told me at lunch the other day that tsan generates way too much spew to make sense of. It will be nice to have all the individual occurrences of a race aggregated and sorted by count for us.

But I defer to folks actually doing the implementation, of course :)
(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #5)
> jseward told me at lunch the other day that tsan generates way too much spew
> to make sense of. It will be nice to have all the individual occurrences of
> a race aggregated and sorted by count for us.

Yes.  TSan generates easily a hundred lines of output per race reported,
and often a lot more.  This quickly gets unwieldy to look at and is only
really good for making a zero-vs-nonzero-number-of-races decision.  I'll
attach the script in the next comment.
Attached file summarise_races.sh
A very simple shell script to summarise TSan races, given the
output of TSan.  This can be either sent in on stdin or given
as a command line argument, because of the way /usr/bin/cat works.

You'll need to edit the path inside it, in the sed command on 
line 4.

The, when it works correctly, has the following appearance:

     14 /usr/src/debug/glib-2.46.2/glib/gmem.c:189 in g_free
      3 xpcom/threads/nsThread.cpp:206 in Wait
      2 /usr/include/bits/string3.h:90 in g_slice_alloc0
      2 js/src/gc/Tracer.cpp:70 in JS::Value DoCallback<JS::Value>(JS::CallbackTracer*, JS:
:Value*, char const*)
      2 js/src/gc/Heap.h:659 in unsetDelayedMarking
      2 js/src/gc/Heap.h:581 in getAllocKind
      1 js/src/jspubtd.h:164 in isHeapBusy
      1 js/src/jsgc.cpp:5630 in AutoTraceSession
      1 js/src/gc/Barrier.h:448 in unbarrieredGet
> The, when it works correctly, has the following appearance:

That should be: "The expected output, when it works correctly .."
Flags: needinfo?(jseward)
(In reply to Nicholas Nethercote [:njn] from comment #1)
> jseward, can you please provide details on:
> - how to build the shell with TSan enabled, and
> - how to run the shell with TSan enabled, and

I haven't done any standalone shell runs on TSan, so I am not sure how
to build or start it.  But here's my mozconfig for a TSanified run of
the browser.  I think it is derived from one I inherited from Nathan.

A couple of points to bear in mind when reading the file:

* You need to be sure to use a llvm_symbolizer that matches the Clang
  installation in use.  I got into difficulties when mixing an ad-hoc
  installed Clang with a default (Fedora-supplied) llvm_symbolizer.

* TSan does some somewhat dubious shadow memory trickery that involves
  reserving about 120 terabytes of address space, almost all of which
  is never used.  Some recent-ish Linux kernels have been known to
  have problems with this, and this is the reason for "-fPIC -pie"
  flags.  They may not be necessary for you -- it requires
  experimentation.

For whatever it's worth, this mozconfig, together with Clang 3.8.0,
works fine for the browser on Fedora 23 (x86_64).
I made an abortive attempt to do this in the new-style configure (also using --enable-sanitize=... instead of individual options), but it turned out to be too confusing for me.

I could at least separate this out into an autoconf/sanitize.m4 if you'd like.

I merged in the funky address sanitizer logic from the toplevel old-configure.in to the js/src/old-configure.in, but I don't actually know whether it's still needed.

Also, the original landing intentionally did *not* do the -fsanitize=... stuff as part of this, and I don't know if there was a good reason for that or not. Certainly seems easier to use this way. f?decoder for that.
Attachment #8769293 - Flags: review?(mh+mozilla)
Attachment #8769293 - Flags: feedback?(choller)
I'm actually not sure who should review things like this. Probably, I ought to ask the same person to review actually adding these jobs. Feel free to forward this on if dustin or someone is more appropriate.
Attachment #8769395 - Flags: review?(garndt)
Comment on attachment 8769394 [details] [diff] [review]
Implement tsan (thread sanitizer) and asan (address sanitizer) jobs

Review of attachment 8769394 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/devtools/automation/autospider.py
@@ +170,5 @@
> +    compiler = variant['compiler']
> +elif platform.system() == 'Darwin':
> +    compiler = 'clang'
> +elif platform.system() == 'Windows':
> +    compiler = 'cl'

I'm surprised that this doesn't require 'cl.exe'.

@@ +207,3 @@
>          env['CC'] = '{CC} -m32'.format(**env)
>          env['CXX'] = '{CXX} -m32'.format(**env)
>          env['AR'] = 'ar'

Won't this set -m32 on Windows/cl as well? I guess CC/CXX get overridden fully by configure on that platform?

@@ +316,5 @@
>      results.append(run_test_command([jsapi_test_binary]))
>  if 'jstests' in test_suites:
>      results.append(run_test_command([MAKE, 'check-jstests']))
>  
> +if args.variant == 'tsan':

I see you are also adding asan builds. Do we not need any result aggregation for it, or is that going to come later?

@@ +322,5 @@
> +    sites = Counter()
> +    allfiles = os.listdir(env['MOZ_UPLOAD_DIR'])
> +    for filename in (f for f in allfiles if f.startswith("tsan_log.")):
> +        with open(os.path.join(env['MOZ_UPLOAD_DIR'], filename), 'rb') as fh:
> +            for line in fh.getlines():

|file|'s __iter__ produces the file lines, so it's canonical to leave off the getlines() and just have |for line in fh|.
Attachment #8769394 - Flags: review?(terrence) → review+
Attachment #8769395 - Flags: review?(garndt) → review+
(In reply to Terrence Cole [:terrence] from comment #13)
> Comment on attachment 8769394 [details] [diff] [review]
> Implement tsan (thread sanitizer) and asan (address sanitizer) jobs
> 
> Review of attachment 8769394 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/devtools/automation/autospider.py
> @@ +170,5 @@
> > +    compiler = variant['compiler']
> > +elif platform.system() == 'Darwin':
> > +    compiler = 'clang'
> > +elif platform.system() == 'Windows':
> > +    compiler = 'cl'
> 
> I'm surprised that this doesn't require 'cl.exe'.

Maybe it would be more clear. But when using the Windows shell, I never bother typing the .exe, so I guess this is just habit.

> @@ +207,3 @@
> >          env['CC'] = '{CC} -m32'.format(**env)
> >          env['CXX'] = '{CXX} -m32'.format(**env)
> >          env['AR'] = 'ar'
> 
> Won't this set -m32 on Windows/cl as well? I guess CC/CXX get overridden
> fully by configure on that platform?

Hm, looks like I probably mangled the logic when I translated it from the previous one. It makes more sense to make this conditional on 'gcc.'

> @@ +316,5 @@
> >      results.append(run_test_command([jsapi_test_binary]))
> >  if 'jstests' in test_suites:
> >      results.append(run_test_command([MAKE, 'check-jstests']))
> >  
> > +if args.variant == 'tsan':
> 
> I see you are also adding asan builds. Do we not need any result aggregation
> for it, or is that going to come later?

From what I understand, asan crashes out when it detects a problem, so I don't really *need* anything here.

I could make its failures noncritical, so that it just logs them, with the goal of gathering more than one. But I haven't actually observed an asan failure yet (I should probably inject one and try it out), and I figure this is good enough.

I haven't looked into msan. It would probably want the same sort of handling as tsan.

And as for tsan, I'm making it try-only for now because I was a little shocked at the number of failures it's reporting, and I'd like to look at them further before drawing a line in the sand.
Comment on attachment 8769293 [details] [diff] [review]
Make --enable-thread-sanitizer & friends do more work

decoder told me over IRC that it wasn't done that way because at the time, it was difficult.
Attachment #8769293 - Flags: feedback?(choller)
Comment on attachment 8769293 [details] [diff] [review]
Make --enable-thread-sanitizer & friends do more work

Review of attachment 8769293 [details] [diff] [review]:
-----------------------------------------------------------------

Shouldn't this be accompanied with mozconfig changes?
Attachment #8769293 - Flags: review?(mh+mozilla)
Oh, right. Ok, I updated the existing users, and did a try push that looks awful. But the errors are mostly in the form of timeouts, so I'm guessing this isn't particularly worse than usual: https://treeherder.mozilla.org/#/jobs?repo=try&revision=21e6737f146df050e46d02c1b1a49ca38b4ca06d
Attachment #8770301 - Flags: review?(mh+mozilla)
Attachment #8769293 - Attachment is obsolete: true
Oops, sorry, that was missing some pieces that I accidentally folded into a different patch. Here's the full version.

MozReview-Commit-ID: KinAe8zLivJ
Attachment #8770340 - Flags: review?(mh+mozilla)
Attachment #8770301 - Attachment is obsolete: true
Attachment #8770301 - Flags: review?(mh+mozilla)
Depends on: 1286725
The masm job only works with the interpreter, at least until we inject instrumentation via the jit, so we need an interpreter-only option. Note that it won't actually work without bug 1286725.
Attachment #8770778 - Flags: review?(terrence)
This is handy, especially when working on autospider.py test running fixes.
Attachment #8770780 - Flags: review?(terrence)
Attachment #8770778 - Flags: review?(terrence) → review+
Attachment #8770780 - Flags: review?(terrence) → review+
Comment on attachment 8770340 [details] [diff] [review]
Make --enable-thread-sanitizer & friends do more work

Review of attachment 8770340 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/autoconf/sanitize.m4
@@ +22,5 @@
> +            AC_MSG_ERROR([Couldn't find $MOZ_CLANG_RT_ASAN_LIB.  It should be available in the same location as clang-cl.])
> +        fi
> +        AC_SUBST(MOZ_CLANG_RT_ASAN_LIB_PATH)
> +    fi
> +    CFLAGS="-fsanitize=address -Dxmalloc=myxmalloc -fPIC $CFLAGS"

Might be worth filing a followup to sort out whether -fPIC and -Dxmalloc=myxmalloc are necessary, and if they are, why.
Attachment #8770340 - Flags: review?(mh+mozilla) → review+
(In reply to Mike Hommey [:glandium] from comment #21)
> ::: build/autoconf/sanitize.m4
> > +    CFLAGS="-fsanitize=address -Dxmalloc=myxmalloc -fPIC $CFLAGS"
> 
> Might be worth filing a followup to sort out whether -fPIC and
> -Dxmalloc=myxmalloc are necessary, and if they are, why.

Given that the tree is closed, and I doubt those are useful anymore, https://treeherder.mozilla.org/#/jobs?repo=try&revision=30237c8d2317

I thought -fPIC was the default anyway. And myxmalloc doesn't seem to be defined anywhere. Though I don't know why it works at all, then.
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d3f29964daf8
Make --enable-thread-sanitizer & friends do more work, r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/4126bd41ce8b
Implement tsan (thread sanitizer), asan (address sanitizer), and msan (uninitialized memory) jobs, r=terrence
https://hg.mozilla.org/integration/mozilla-inbound/rev/9e01e72d8ac7
Make sm-{asan,tsan,msan} try-only for now, r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc11f3fed0a5
Implement --build-only, --skip-tests=all, and --jitflags=interp, r=terrence
https://hg.mozilla.org/integration/mozilla-inbound/rev/be79f9b6ecc4
Implement autospider.py --nobuild for running tests on existing build, r=terrence
Blocks: 1287541
Blocks: 1288241
Blocks: 1288596
Blocks: 1289646
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: