Closed Bug 1288596 Opened 9 years ago Closed 9 years ago

Make SM-tc(msan) a first class test job

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox50 --- affected
firefox51 --- fixed

People

(Reporter: n.nethercote, Assigned: sfink)

References

Details

Attachments

(5 files)

Bug 1280637 created a TSan JS shell job. It's currently available on try. The next step is to get it running as a first-class test job -- on every push, and failures necessitate backouts.
Summary: Add a TSan-enabled TaskCluster JS shell job → Make SM-tc(TSan) a first class test job
Blocks: 1289646
I'm actually working on msan right now, but much of the changes are shared, and this is turning out trickier than expected. msan exits immediately with a return code of 77 if it finds an error during the run. The proper fix for this appears to be passing -mllvm -msan-keep-going, but that makes clang infinite loop during register allocation for one of our files. You can change the return code msan uses, so I set it to 0. This way, a test with no msan errors will operate completely correctly. A test with msan errors will pretend to succeed, but will report the msan error, and we can then check that against the set of allowed errors. Except that it's still exiting immediately when it hits an error, which means it might exist before it prints any PASSED notices. And the test harness requires at least one of those to believe an exit code of 0. I can fix that by adding a --no-require-pass to disable that functionality, but now I'm getting some failures when I run the whole suite that I don't see when I run just that one test or everything in that test's directory. I may have to try switching to msan's suppression mechanism instead of counting unique errors. I haven't used it yet. I think I may be able to carve off some incremental pieces to land first. I'm not sure.
Keywords: leave-open
This probably doesn't really belong in this bug, but it's important for making the msan/tsan/asan builds reproducible on developers' desktops. Without this, if you don't point autospider.py to the correct compiler, you'll get a mysterious error about TimeStamp being unavailable on your platform. It's because linking a configure test program is failing due to undefined symbols, but we never checked whether linking worked after setting up the sanitizers, so you can't trace it back to that. This patch reports the actual problem.
Attachment #8776338 - Flags: review?(mh+mozilla)
"MSan does not support suppressions. This is an intentional design choice." Well, ok then. I guess I should at least submit a bug report to clang for the infinite loop thing. Oh, yeah. Another wrinkle I didn't mention: jstests can have headers that include some JS code to run, generally to decide whether to skip the test or not. Unfortunately, it runs the test code with no flags, and so runs with the JITs on. And in that case, even just running !this.hasOwnProperty("Intl") is enough to trigger an msan error currently, which means it exits without any output. That makes the test harness very unhappy, since it was looking for "true" or "false". Sadly, fixing that particular msan error would require implementing msan instrumentation in our JITs, which is not a small task. This is easy to fix, though; I can just have the test harness run js with the JITs disabled for these pre-run checks. But back to the overall problem. For now, I think I will make msan just ignore all test failures, and only look at the number of distinct errors reported. It seems like the crashiness and exit code of msan-instrumented binaries is just not very interesting until you're completely msan-clean.
msan and the JIT are not friendds. When testing reftest conditions, run without the JIT to avoid failing du to an msan error.
Attachment #8776780 - Flags: review?(terrence)
Count up the errors for SM(msan tsan) and treat the job as failed if too many are seen. Btw, I'm uploading these out of order. This is the first patch. But it shouldn't matter much for reviewing.
Attachment #8776786 - Flags: review?(terrence)
If msan detects any errors while running an individual test, it controls the exit code. There is no way to pass through the test's usual exit code. In order to allow msan jobs to be treated as successful as long as they don't exceed a maximum error count, we ignore the test failures and just accumulate msan errors. It would be vastly better to have msan keep going on an error, but doing that (via -mllvm -msan-keep-going) triggers a clang bug that hangs the compile during register allocation. Another change kind of got tangled up in here, and I'm too lazy to sort it out. It seems like the LLVM_SYMBLIZER path wasn't being propagated through. (I didn't notice before because it was in my $PATH, but I just switched to a new computer.) Sending it through MSAN_OPTIONS appears to work.
Attachment #8776787 - Flags: review?(terrence)
Comment on attachment 8776338 [details] [diff] [review] Better error message when compiler is incompatible with sanitizer Review of attachment 8776338 [details] [diff] [review]: ----------------------------------------------------------------- meh
Attachment #8776338 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8776780 [details] [diff] [review] Run reftest condition line with --no-baseline Review of attachment 8776780 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/tests/lib/manifest.py @@ +90,5 @@ > ans = self.cache.get(cond, None) > if ans is None: > cmd = [ > self.js_bin, > + '--no-baseline', # msan will error out if the jit is active More generally, crashing in this code is super-hard to debug, so running with the safest possible configuration (given that you're probably trying to test a questionable patch anyway) makes a lot of sense.
Attachment #8776780 - Flags: review?(terrence) → review+
Attachment #8776786 - Flags: review?(terrence) → review+
Comment on attachment 8776787 [details] [diff] [review] Make SM(msan) ignore test failures and only look at msan errors Review of attachment 8776787 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/devtools/automation/autospider.py @@ +337,5 @@ > results.append(run_test_command([jsapi_test_binary])) > if 'jstests' in test_suites: > results.append(run_test_command([MAKE, 'check-jstests'])) > > +if variant.get('ignore-test-failures'): Add a |# FIXME| comment above this line explaining why we need it. Hopefully the code can go away soon; I guess clang 3.9 should be out any day now.
Attachment #8776787 - Flags: review?(terrence) → review+
Blocks: 1291449
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/82b3f5f3eed4 Add error limits to tsan and msan, r=terrence https://hg.mozilla.org/integration/mozilla-inbound/rev/07c4518ad9ac Better error message when compiler is incompatible with sanitizer, r=glandium https://hg.mozilla.org/integration/mozilla-inbound/rev/6b54837ecee4 Make SM(msan) ignore test failures and only look at msan errors, r=terrence https://hg.mozilla.org/integration/mozilla-inbound/rev/a60e63852496 Run reftest condition line with --no-baseline, r=terrence
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/fb1d26487576 Drop msan max-errors down to 6, since that is what has been observed on try. r=woof!
> > + '--no-baseline', # msan will error out if the jit is active FWIW: MSan/ASan/TSan all use static binary instrumentation, which means they cannot instrument JITted code. For ASan at least this just means that JITted code won't have any checks added to it. For MSan it means that JITted code will lack both checks *and* additional instrumentation required to propagate information about definedness correctly. So MSan + JITs is always likely to be a difficult combination. In contrast, Valgrind uses dynamic binary instrumentation, which makes it slower, but it can instrument JITted code just like any other code.
I've only been doing msan in this bug, so I'll steal this bug for it and make another for tsan.
Summary: Make SM-tc(TSan) a first class test job → Make SM-tc(msan) a first class test job
These are now green. Time to make them tier 1.
Attachment #8777606 - Flags: review?(garndt)
Comment on attachment 8777606 [details] [diff] [review] Promote SM(asan msan) to tier 1 thank you about being explicit with the tier level!
Attachment #8777606 - Flags: review?(garndt) → review+
Keywords: leave-open
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
No longer blocks: tsan
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: