Closed Bug 1288596 Opened 9 years ago Closed 9 years ago

Make SM-tc(msan) a first class test job

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla51

Tracking Flags:

Tracking

Status

firefox50

---

affected

firefox51

---

fixed

People

(Reporter: n.nethercote, Assigned: sfink)

References

Details

Attachments

(5 files)

Better error message when compiler is incompatible with sanitizer 9 years ago Steve Fink [:sfink] [:s:] 1.04 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
Run reftest condition line with --no-baseline 9 years ago Steve Fink [:sfink] [:s:] 1.21 KB, patch	terrence : review+	Details \| Diff \| Splinter Review
Add error limits to tsan and msan 9 years ago Steve Fink [:sfink] [:s:] 3.76 KB, patch	terrence : review+	Details \| Diff \| Splinter Review
Make SM(msan) ignore test failures and only look at msan errors 9 years ago Steve Fink [:sfink] [:s:] 2.88 KB, patch	terrence : review+	Details \| Diff \| Splinter Review
Promote SM(asan msan) to tier 1 9 years ago Steve Fink [:sfink] [:s:] 4.47 KB, patch	garndt : review+	Details \| Diff \| Splinter Review

Nicholas Nethercote [inactive]

Reporter

Description

•

9 years ago

Bug 1280637 created a TSan JS shell job. It's currently available on try. The next step is to get it running as a first-class test job -- on every push, and failures necessitate backouts.

Nicholas Nethercote [inactive]

Reporter

Updated

•

9 years ago

Summary: Add a TSan-enabled TaskCluster JS shell job → Make SM-tc(TSan) a first class test job

Nicholas Nethercote [inactive]

Reporter

Updated

•

9 years ago

Blocks: 1289646

Steve Fink [:sfink] [:s:]

Assignee

Comment 1

•

9 years ago

I'm actually working on msan right now, but much of the changes are shared, and this is turning out trickier than expected. msan exits immediately with a return code of 77 if it finds an error during the run. The proper fix for this appears to be passing -mllvm -msan-keep-going, but that makes clang infinite loop during register allocation for one of our files. You can change the return code msan uses, so I set it to 0. This way, a test with no msan errors will operate completely correctly. A test with msan errors will pretend to succeed, but will report the msan error, and we can then check that against the set of allowed errors. Except that it's still exiting immediately when it hits an error, which means it might exist before it prints any PASSED notices. And the test harness requires at least one of those to believe an exit code of 0. I can fix that by adding a --no-require-pass to disable that functionality, but now I'm getting some failures when I run the whole suite that I don't see when I run just that one test or everything in that test's directory. I may have to try switching to msan's suppression mechanism instead of counting unique errors. I haven't used it yet. I think I may be able to carve off some incremental pieces to land first. I'm not sure.

Keywords: leave-open

Steve Fink [:sfink] [:s:]

Assignee

Comment 2

•

9 years ago

Attached patch Better error message when compiler is incompatible with sanitizer — Details — Splinter Review

This probably doesn't really belong in this bug, but it's important for making the msan/tsan/asan builds reproducible on developers' desktops. Without this, if you don't point autospider.py to the correct compiler, you'll get a mysterious error about TimeStamp being unavailable on your platform. It's because linking a configure test program is failing due to undefined symbols, but we never checked whether linking worked after setting up the sanitizers, so you can't trace it back to that. This patch reports the actual problem.

Attachment #8776338 - Flags: review?(mh+mozilla)

Steve Fink [:sfink] [:s:]

Assignee

Comment 3

•

9 years ago

"MSan does not support suppressions. This is an intentional design choice." Well, ok then. I guess I should at least submit a bug report to clang for the infinite loop thing. Oh, yeah. Another wrinkle I didn't mention: jstests can have headers that include some JS code to run, generally to decide whether to skip the test or not. Unfortunately, it runs the test code with no flags, and so runs with the JITs on. And in that case, even just running !this.hasOwnProperty("Intl") is enough to trigger an msan error currently, which means it exits without any output. That makes the test harness very unhappy, since it was looking for "true" or "false". Sadly, fixing that particular msan error would require implementing msan instrumentation in our JITs, which is not a small task. This is easy to fix, though; I can just have the test harness run js with the JITs disabled for these pre-run checks. But back to the overall problem. For now, I think I will make msan just ignore all test failures, and only look at the number of distinct errors reported. It seems like the crashiness and exit code of msan-instrumented binaries is just not very interesting until you're completely msan-clean.

Steve Fink [:sfink] [:s:]

Assignee

Comment 4

•

9 years ago

Filed https://llvm.org/bugs/show_bug.cgi?id=28786 for the clang bug.

Steve Fink [:sfink] [:s:]

Assignee

Comment 5

•

9 years ago

Attached patch Run reftest condition line with --no-baseline — Details — Splinter Review

msan and the JIT are not friendds. When testing reftest conditions, run without the JIT to avoid failing du to an msan error.

Attachment #8776780 - Flags: review?(terrence)

Steve Fink [:sfink] [:s:]

Assignee

Comment 6

•

9 years ago

Attached patch Add error limits to tsan and msan — Details — Splinter Review

Count up the errors for SM(msan tsan) and treat the job as failed if too many are seen. Btw, I'm uploading these out of order. This is the first patch. But it shouldn't matter much for reviewing.

Attachment #8776786 - Flags: review?(terrence)

Steve Fink [:sfink] [:s:]

Assignee

Comment 7

•

9 years ago

Attached patch Make SM(msan) ignore test failures and only look at msan errors — Details — Splinter Review

If msan detects any errors while running an individual test, it controls the exit code. There is no way to pass through the test's usual exit code. In order to allow msan jobs to be treated as successful as long as they don't exceed a maximum error count, we ignore the test failures and just accumulate msan errors. It would be vastly better to have msan keep going on an error, but doing that (via -mllvm -msan-keep-going) triggers a clang bug that hangs the compile during register allocation. Another change kind of got tangled up in here, and I'm too lazy to sort it out. It seems like the LLVM_SYMBLIZER path wasn't being propagated through. (I didn't notice before because it was in my $PATH, but I just switched to a new computer.) Sending it through MSAN_OPTIONS appears to work.

Attachment #8776787 - Flags: review?(terrence)

Mike Hommey [:glandium]

Comment 8

•

9 years ago

Comment on attachment 8776338 [details] [diff] [review] Better error message when compiler is incompatible with sanitizer Review of attachment 8776338 [details] [diff] [review]: ----------------------------------------------------------------- meh

Attachment #8776338 - Flags: review?(mh+mozilla) → review+

Terrence Cole [:terrence]

Comment 9

•

9 years ago

Comment on attachment 8776780 [details] [diff] [review] Run reftest condition line with --no-baseline Review of attachment 8776780 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/tests/lib/manifest.py @@ +90,5 @@ > ans = self.cache.get(cond, None) > if ans is None: > cmd = [ > self.js_bin, > + '--no-baseline', # msan will error out if the jit is active More generally, crashing in this code is super-hard to debug, so running with the safest possible configuration (given that you're probably trying to test a questionable patch anyway) makes a lot of sense.

Attachment #8776780 - Flags: review?(terrence) → review+

Terrence Cole [:terrence]

Updated

•

9 years ago

Attachment #8776786 - Flags: review?(terrence) → review+

Terrence Cole [:terrence]

Comment 10

•

9 years ago

Comment on attachment 8776787 [details] [diff] [review] Make SM(msan) ignore test failures and only look at msan errors Review of attachment 8776787 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/devtools/automation/autospider.py @@ +337,5 @@ > results.append(run_test_command([jsapi_test_binary])) > if 'jstests' in test_suites: > results.append(run_test_command([MAKE, 'check-jstests'])) > > +if variant.get('ignore-test-failures'): Add a |# FIXME| comment above this line explaining why we need it. Hopefully the code can go away soon; I guess clang 3.9 should be out any day now.

Attachment #8776787 - Flags: review?(terrence) → review+

Steve Fink [:sfink] [:s:]

Assignee

Updated

•

9 years ago

Blocks: 1291449

Pulsebot

Comment 11

•

9 years ago

Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/82b3f5f3eed4 Add error limits to tsan and msan, r=terrence https://hg.mozilla.org/integration/mozilla-inbound/rev/07c4518ad9ac Better error message when compiler is incompatible with sanitizer, r=glandium https://hg.mozilla.org/integration/mozilla-inbound/rev/6b54837ecee4 Make SM(msan) ignore test failures and only look at msan errors, r=terrence https://hg.mozilla.org/integration/mozilla-inbound/rev/a60e63852496 Run reftest condition line with --no-baseline, r=terrence

Pulsebot

Comment 12

•

9 years ago

Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/fb1d26487576 Drop msan max-errors down to 6, since that is what has been observed on try. r=woof!

Nicholas Nethercote [inactive]

Reporter

Comment 13

•

9 years ago

> > + '--no-baseline', # msan will error out if the jit is active FWIW: MSan/ASan/TSan all use static binary instrumentation, which means they cannot instrument JITted code. For ASan at least this just means that JITted code won't have any checks added to it. For MSan it means that JITted code will lack both checks *and* additional instrumentation required to propagate information about definedness correctly. So MSan + JITs is always likely to be a difficult combination. In contrast, Valgrind uses dynamic binary instrumentation, which makes it slower, but it can instrument JITted code just like any other code.

Carsten Book [:Tomcat]

Comment 14

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/82b3f5f3eed4 https://hg.mozilla.org/mozilla-central/rev/07c4518ad9ac https://hg.mozilla.org/mozilla-central/rev/6b54837ecee4 https://hg.mozilla.org/mozilla-central/rev/a60e63852496 https://hg.mozilla.org/mozilla-central/rev/fb1d26487576

Steve Fink [:sfink] [:s:]

Assignee

Comment 15

•

9 years ago

I've only been doing msan in this bug, so I'll steal this bug for it and make another for tsan.

Summary: Make SM-tc(TSan) a first class test job → Make SM-tc(msan) a first class test job

Steve Fink [:sfink] [:s:]

Assignee

Comment 16

•

9 years ago

Attached patch Promote SM(asan msan) to tier 1 — Details — Splinter Review

These are now green. Time to make them tier 1.

Attachment #8777606 - Flags: review?(garndt)

Greg Arndt [:garndt]

Comment 17

•

9 years ago

Comment on attachment 8777606 [details] [diff] [review] Promote SM(asan msan) to tier 1 thank you about being explicit with the tier level!

Attachment #8777606 - Flags: review?(garndt) → review+

Pulsebot

Comment 18

•

9 years ago

Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/1d6da869903e Promote SM(asan msan) to tier 1, r=garndt

Steve Fink [:sfink] [:s:]

Assignee

Updated

•

9 years ago

Keywords: leave-open

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 19

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1d6da869903e

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox51: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla51

Nicholas Nethercote [inactive]

Reporter

Updated

•

9 years ago

No longer blocks: tsan

You need to log in before you can comment on or make changes to this bug.