Closed Bug 1291954 Opened 5 years ago Closed 3 years ago

Make SM(tsan) a tier 1 build


(Core :: JavaScript Engine, defect, P3)




Tracking Status
firefox62 --- verified


(Reporter: sfink, Assigned: sfink)


(Blocks 2 open bugs)



(8 files)

See also 1288596.
Blocks: 1311733
I don't even remember what I was waiting on to land tsan as a tier 1 job, but I know I felt it was at least somewhat important to get somewhat decent reporting of added failures.

This commit also bumps up the number of expected failures from 14 to 20 unique locations, which is not great.
Attachment #8840728 - Flags: review?(jseward)
Assignee: nobody → sphink
Ah! *This* is the important piece. Without this, a tsan error will make the test fail, which will turn the job red even if it is one of the expected errors.
Attachment #8840733 - Flags: review?(jseward)
Two remaining problems.

A bunch of tests are timing out when run under tsan. Unfortunately, they're jit-tests, which means we don't have a nice mechanism for logging all of the tests that take longer than n seconds so we can skip them. Hopefully the log will have enough info, or I can add in that option there too. (It's really annoying having 2 very similar test harnesses.)

A bunch of new races have been added, probably from the new thread work bhackett has been doing. See bug 1342473. Currently, it looks like it's probably just that the JS test shell is using things in a way not yet supported. (The setup is for cooperative threads, and it looks like shell workers run concurrently on the same runtime.)
Depends on: 1342473
Comment on attachment 8840728 [details] [diff] [review]
Report added tsan failures

Review of attachment 8840728 [details] [diff] [review]:

::: js/src/devtools/automation/variants/tsan
@@ +7,5 @@
>          "JITTEST_EXTRA_ARGS": "--jitflags=debug --ignore-timeouts={DIR}/cgc-jittest-timeouts.txt",
>          "JSTESTS_EXTRA_ARGS": "--exclude-file={DIR}/cgc-jstests-slow.txt",
>          "TSAN_OPTIONS": "log_path={OUTDIR}/sanitize_log"
>      },
> +    "expect-errors": [

It would be nice to add a comment here explaining why some
of these entries are duplicated.  If not here, maybe in the
Python part?
Attachment #8840728 - Flags: review?(jseward) → review+
Attachment #8840733 - Flags: review?(jseward) → review+
Keywords: leave-open
Depends on: 1362239
Message in the commit message: Several jit-tests expect to timeout, and are annotated with an expected status code. Currently, we have to force tsan to report a zero status if it finds an error, since otherwise it will cause lots of tests to fail (due to hitting a tsan-detectable problem.) But those zero exit statuses cause the test to fail. Add --unusable-error-status to treat those as passing.

Now, it turns out that this was *not* the reason for those tests to be failing. They didn't have any tsan errors, so did not trigger the forced-exitstatus setting. But this patch still seems worthwhile if we *do* run into that situation (with any expected error test, not just timeouts.)
Attachment #8865618 - Flags: review?(jcoppeard)
After the next patch I'm going to post, the main purpose of this patch is to tune the set of flags that tsan is running with.
Attachment #8865619 - Flags: review?(jcoppeard)
It turns out that it's some sort of subtle interaction with how tsan manages signal handlers and the signal handler we use for wasm. Just skip these tests for now.
Attachment #8865621 - Flags: review?(jcoppeard)
Comment on attachment 8865618 [details] [diff] [review]
Avoid problems with timeouts when running under tsan

Review of attachment 8865618 [details] [diff] [review]:

Yes, seems sensible.
Attachment #8865618 - Flags: review?(jcoppeard) → review+
Attachment #8865619 - Flags: review?(jcoppeard) → review+
Attachment #8865621 - Flags: review?(jcoppeard) → review+
Pushed by
Avoid problems with timeouts when running under tsan, r=jonco
Skip wasm/asm.js for tsan test, r=jonco
Skip tests that depend on the wasm signal handler, since tsan interferes with it, r=jonco
jonco, nothing much here to review. This moves SM(tsan) to tier 1. I can't think of any remaining reason not to now that it's green. I'll try to fix some of the failures in fast followups.
Attachment #8867031 - Flags: review?(jcoppeard)
Blocks: 1364287
Depends on: 1364288
Comment on attachment 8867031 [details] [diff] [review]
Update tsan failures to current set and move SM(tsan) to tier 1

Review of attachment 8867031 [details] [diff] [review]:

Attachment #8867031 - Flags: review?(jcoppeard) → review+
I don't think this verification approach is going to work. tsan is a dynamic checker, so the error count is random. Also, the reports of "expected error XXX not seen" are noise.

I'm thinking a workable approach would be to run a bunch of times, collect all failures seen, and add them to the expected set. Then make the job fail whenever any new error appears, regardless of the error count. We will just have to accept that low-probability errors will appear occasionally, and we'll need to add them to the expected set. Distinguishing these from legitimate new errors will be the tough part.

Also, this raises the question of when an error can be removed from the expected set. I think for now, we'll have to rely on people to remove errors when they believe they are fixing them, and allowing some number of stale errors to pile up over time. We can periodically purge.

A better approach would be to report the results of every run to a common place, and notice when an error hasn't appeared in N runs. That smells like future work.

Unfortunately, all of this does mean that new errors can hide on a per-file basis. Since line numbers are unstable, the expectation is a mapping of filenames to error counts. So if you introduce a new error in foo.cpp that already expects 3 errors, and you don't happen to hit one of the known errors, then the job will pass and some random later patch will be blamed when it reports 4 errors in that file. Hopefully this shouldn't come up often, and the best fix would be getting down to zero errors anyway.
Pushed by
Update tsan failures to current set and move SM(tsan) to tier 1, r=jonco
Pushed by
followup, revert the part of changeset 2db29775e18f which prematurely declared SM(tsan) tier-1
No, absolutely not. - a TinderboxPrint which appears in the never-seen Job Details panel is not enough, we never should have let you get away with the way hazards are displayed, but at least they are better than this. It's not even tier-2 until it displays failure messages in a treeherder-parsable way in the Failure Summary panel. - I believe that it's the "run-on-projects: []", which my vague memory says is how you say that you only want a job to run on Try and only when someone explicitly asks for it, but it doesn't run on any trees other than Try when you ask for it, making it the weirdest tier-1 ever. - five runs to declare victory was far too few, so I triggered another 45 on and that's a 22% failure rate, not the barely tolerable for a new suite 5%.
Depends on: 1366522
Keywords: triage-deferred
Priority: -- → P3
Depends on: 1458240
sheriffs - anything else you'd want us to do before making this tier 1? The known issues have been miraculously fixed, and all of the fancy output processing I had has been ripped out so that I believe this should report in the same way as eg asan builds.

(Also, re: comment 18: I hadn't realized the hazard error reporting was a problem, so when philor complained I immediately jumped up, paused for a year, then filed bug 1459378. Time is slippery.)
Flags: needinfo?(sphink)
Flags: needinfo?(philringnalda)
Flags: needinfo?(aryx.bugmail)
Looks good to me.
Flags: needinfo?(aryx.bugmail)
Dunno, I guess maybe? There have been zero failures since last Friday, which is nice but makes it impossible to assess whether or not it actually displays its errors reasonably.
Flags: needinfo?(philringnalda)
NI to make sure we don't forget about this (I'm worried it will regress quickly if we don't make it tier 1).
Flags: needinfo?(sphink)
I wanted to check out what failures look like, so I introduced a bunch of races (it didn't complain about my first attempt!). I can't say it looks all that great:

The number of errors is deceptive because this is failing on many, many tests. I scanned through try for a recent tsan failure. Here's one:

Its Failure Summary, in its entirety, is:

TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/gc/bug-1136597.js | ================== (code 66, args "") [1.1 s]
make[1]: *** [check-jit-test] Error 2
make: *** [check-jit-test] Error 2

Yuck. One thing that would make this much better, good enough for landing, is to just do what AddressSanitizer does and highlight the SUMMARY: line on failures. I've opened the PR for that.

So I think I'll go ahead and land this tier-1 now.
Ok, maybe I'll get sheriff review first, just to cover my argyle socks.
Attachment #8976025 - Flags: review?(aryx.bugmail)
Flags: needinfo?(sphink)
Attachment #8976025 - Flags: review?(aryx.bugmail) → review+
Backout by
Backed out changeset a4257fbc42c8 because tsan is permafail right now
Depends on: 1459761
(In reply to Pulsebot from comment #30)
> Pushed by
> Promote SM(tsan) to tier 1, r=Aryx

Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Confirmed running and visible on Treeherder :)
Please add documentation for this test suite to and also explain what kind of failures shall be reported as restricted security bugs and which ones not. Thank you.
Flags: needinfo?(sphink)
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #35)
> Please add documentation for this test suite to
> and

It is already covered under the SM(...) row, which links to <>. I have replaced the contents of the tsan section on that page to remove the outdated whitelist info. Is that adequate? I could add a row specifically for this job, but it would turn out to be a complete duplicate of the SM(...) row.

> also explain what kind of failures shall be reported as restricted security
> bugs and which ones not. Thank you.

I put in some weaselly language for that "ThreadSanitizer failures are generally not security-critical, though they can be." But my confidence in that assessment is not high. Mostly, if this build fails on the initial push of a problem, then it's somewhat likely that it could be a security issue -- but it doesn't really matter with this tier 1, since it'll immediately get backed out anyway. The intermittents are more problematic, and while I could imagine them indeed being security sensitive, I'm not sure if they are any more so than other intermittents?

I dunno, other people may feel wildly differently.
Flags: needinfo?(sphink) → needinfo?(aryx.bugmail)
Ok, that seemed a little too glib, so I updated to "ThreadSanitizer failures are generally not security sensitive, though they can be. The usual rules apply -- eg if it's in GC code, it should be assumed to be sensitive until investigated."
Thank you, that works.
Flags: needinfo?(aryx.bugmail)
Sounds reasonable to me too. I would say that TSAN errors aren't security sensitive the way ASAN errors are, for example.
You need to log in before you can comment on or make changes to this bug.