Closed
Bug 1288596
Opened 9 years ago
Closed 9 years ago
Make SM-tc(msan) a first class test job
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla51
People
(Reporter: n.nethercote, Assigned: sfink)
References
Details
Attachments
(5 files)
1.04 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
1.21 KB,
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
3.76 KB,
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
2.88 KB,
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
4.47 KB,
patch
|
garndt
:
review+
|
Details | Diff | Splinter Review |
Bug 1280637 created a TSan JS shell job. It's currently available on try.
The next step is to get it running as a first-class test job -- on every push, and failures necessitate backouts.
![]() |
Reporter | |
Updated•9 years ago
|
Summary: Add a TSan-enabled TaskCluster JS shell job → Make SM-tc(TSan) a first class test job
Assignee | ||
Comment 1•9 years ago
|
||
I'm actually working on msan right now, but much of the changes are shared, and this is turning out trickier than expected.
msan exits immediately with a return code of 77 if it finds an error during the run. The proper fix for this appears to be passing -mllvm -msan-keep-going, but that makes clang infinite loop during register allocation for one of our files.
You can change the return code msan uses, so I set it to 0. This way, a test with no msan errors will operate completely correctly. A test with msan errors will pretend to succeed, but will report the msan error, and we can then check that against the set of allowed errors.
Except that it's still exiting immediately when it hits an error, which means it might exist before it prints any PASSED notices. And the test harness requires at least one of those to believe an exit code of 0.
I can fix that by adding a --no-require-pass to disable that functionality, but now I'm getting some failures when I run the whole suite that I don't see when I run just that one test or everything in that test's directory.
I may have to try switching to msan's suppression mechanism instead of counting unique errors. I haven't used it yet.
I think I may be able to carve off some incremental pieces to land first. I'm not sure.
Keywords: leave-open
Assignee | ||
Comment 2•9 years ago
|
||
This probably doesn't really belong in this bug, but it's important for making the msan/tsan/asan builds reproducible on developers' desktops.
Without this, if you don't point autospider.py to the correct compiler, you'll get a mysterious error about TimeStamp being unavailable on your platform. It's because linking a configure test program is failing due to undefined symbols, but we never checked whether linking worked after setting up the sanitizers, so you can't trace it back to that.
This patch reports the actual problem.
Attachment #8776338 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 3•9 years ago
|
||
"MSan does not support suppressions. This is an intentional design choice."
Well, ok then. I guess I should at least submit a bug report to clang for the infinite loop thing.
Oh, yeah. Another wrinkle I didn't mention: jstests can have headers that include some JS code to run, generally to decide whether to skip the test or not. Unfortunately, it runs the test code with no flags, and so runs with the JITs on. And in that case, even just running
!this.hasOwnProperty("Intl")
is enough to trigger an msan error currently, which means it exits without any output. That makes the test harness very unhappy, since it was looking for "true" or "false". Sadly, fixing that particular msan error would require implementing msan instrumentation in our JITs, which is not a small task.
This is easy to fix, though; I can just have the test harness run js with the JITs disabled for these pre-run checks.
But back to the overall problem. For now, I think I will make msan just ignore all test failures, and only look at the number of distinct errors reported. It seems like the crashiness and exit code of msan-instrumented binaries is just not very interesting until you're completely msan-clean.
Assignee | ||
Comment 4•9 years ago
|
||
Filed https://llvm.org/bugs/show_bug.cgi?id=28786 for the clang bug.
Assignee | ||
Comment 5•9 years ago
|
||
msan and the JIT are not friendds. When testing reftest conditions, run without the JIT to avoid failing du to an msan error.
Attachment #8776780 -
Flags: review?(terrence)
Assignee | ||
Comment 6•9 years ago
|
||
Count up the errors for SM(msan tsan) and treat the job as failed if too many are seen.
Btw, I'm uploading these out of order. This is the first patch. But it shouldn't matter much for reviewing.
Attachment #8776786 -
Flags: review?(terrence)
Assignee | ||
Comment 7•9 years ago
|
||
If msan detects any errors while running an individual test, it controls the exit code. There is no way to pass through the test's usual exit code. In order to allow msan jobs to be treated as successful as long as they don't exceed a maximum error count, we ignore the test failures and just accumulate msan errors.
It would be vastly better to have msan keep going on an error, but doing that (via -mllvm -msan-keep-going) triggers a clang bug that hangs the compile during register allocation.
Another change kind of got tangled up in here, and I'm too lazy to sort it out. It seems like the LLVM_SYMBLIZER path wasn't being propagated through. (I didn't notice before because it was in my $PATH, but I just switched to a new computer.) Sending it through MSAN_OPTIONS appears to work.
Attachment #8776787 -
Flags: review?(terrence)
Comment 8•9 years ago
|
||
Comment on attachment 8776338 [details] [diff] [review]
Better error message when compiler is incompatible with sanitizer
Review of attachment 8776338 [details] [diff] [review]:
-----------------------------------------------------------------
meh
Attachment #8776338 -
Flags: review?(mh+mozilla) → review+
Comment 9•9 years ago
|
||
Comment on attachment 8776780 [details] [diff] [review]
Run reftest condition line with --no-baseline
Review of attachment 8776780 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/tests/lib/manifest.py
@@ +90,5 @@
> ans = self.cache.get(cond, None)
> if ans is None:
> cmd = [
> self.js_bin,
> + '--no-baseline', # msan will error out if the jit is active
More generally, crashing in this code is super-hard to debug, so running with the safest possible configuration (given that you're probably trying to test a questionable patch anyway) makes a lot of sense.
Attachment #8776780 -
Flags: review?(terrence) → review+
Updated•9 years ago
|
Attachment #8776786 -
Flags: review?(terrence) → review+
Comment 10•9 years ago
|
||
Comment on attachment 8776787 [details] [diff] [review]
Make SM(msan) ignore test failures and only look at msan errors
Review of attachment 8776787 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/devtools/automation/autospider.py
@@ +337,5 @@
> results.append(run_test_command([jsapi_test_binary]))
> if 'jstests' in test_suites:
> results.append(run_test_command([MAKE, 'check-jstests']))
>
> +if variant.get('ignore-test-failures'):
Add a |# FIXME| comment above this line explaining why we need it. Hopefully the code can go away soon; I guess clang 3.9 should be out any day now.
Attachment #8776787 -
Flags: review?(terrence) → review+
Comment 11•9 years ago
|
||
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/82b3f5f3eed4
Add error limits to tsan and msan, r=terrence
https://hg.mozilla.org/integration/mozilla-inbound/rev/07c4518ad9ac
Better error message when compiler is incompatible with sanitizer, r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b54837ecee4
Make SM(msan) ignore test failures and only look at msan errors, r=terrence
https://hg.mozilla.org/integration/mozilla-inbound/rev/a60e63852496
Run reftest condition line with --no-baseline, r=terrence
Comment 12•9 years ago
|
||
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb1d26487576
Drop msan max-errors down to 6, since that is what has been observed on try. r=woof!
![]() |
Reporter | |
Comment 13•9 years ago
|
||
> > + '--no-baseline', # msan will error out if the jit is active
FWIW: MSan/ASan/TSan all use static binary instrumentation, which means they cannot instrument JITted code. For ASan at least this just means that JITted code won't have any checks added to it. For MSan it means that JITted code will lack both checks *and* additional instrumentation required to propagate information about definedness correctly. So MSan + JITs is always likely to be a difficult combination.
In contrast, Valgrind uses dynamic binary instrumentation, which makes it slower, but it can instrument JITted code just like any other code.
Comment 14•9 years ago
|
||
bugherder |
Assignee | ||
Comment 15•9 years ago
|
||
I've only been doing msan in this bug, so I'll steal this bug for it and make another for tsan.
Summary: Make SM-tc(TSan) a first class test job → Make SM-tc(msan) a first class test job
Assignee | ||
Comment 16•9 years ago
|
||
These are now green. Time to make them tier 1.
Attachment #8777606 -
Flags: review?(garndt)
Comment 17•9 years ago
|
||
Comment on attachment 8777606 [details] [diff] [review]
Promote SM(asan msan) to tier 1
thank you about being explicit with the tier level!
Attachment #8777606 -
Flags: review?(garndt) → review+
Comment 18•9 years ago
|
||
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1d6da869903e
Promote SM(asan msan) to tier 1, r=garndt
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Comment 19•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in
before you can comment on or make changes to this bug.
Description
•