Closed Bug 940868 Opened 11 years ago Closed 10 years ago

Intermittent debug ASan timeouts in --no-ion /jit-test/tests/TypedObject/jit-complex.js and jit-test/tests/TypedObject/jit-prefix.js

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

Details

(Keywords: intermittent-failure)

Starting in a period of bustage on bustage on bustage on bustage, overlaid with two different sorts of infra bustage, while we were ramming pushes into the tree far faster than it could possibly keep up, we got two instances of debug ASan builds that consistently timed out (as in, we pointlessly run jit-tests twice per run, and both times they timed out) in TypedObject/jit-complex.js and TypedObject/jit-prefix.js in both of the --no-ion flavors:

https://tbpl.mozilla.org/php/getParsedLog.php?id=30803607&tree=Mozilla-Inbound
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan-debug/1384903268/mozilla-inbound-linux64-asan-debug-bm58-build1-build316.txt.gz (tbpl cached its memory of being unable to fetch that log)

TIMEOUT - TypedObject/jit-complex.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-complex.js | --no-baseline --no-ion
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --ion-eager
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --ion-eager --ion-check-range-analysis --no-sse3
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --baseline-eager
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --baseline-eager --no-ti --no-fpu
TIMEOUT - TypedObject/jit-complex.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-complex.js | --no-baseline --no-ion --no-ti
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 
TIMEOUT - TypedObject/jit-prefix.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-prefix.js | --no-baseline --no-ion
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | --ion-eager
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | --ion-eager --ion-check-range-analysis --no-sse3
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | --baseline-eager
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | --baseline-eager --no-ti --no-fpu
TIMEOUT - TypedObject/jit-prefix.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-prefix.js | --no-baseline --no-ion --no-ti
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 
...
TIMEOUT - TypedObject/jit-complex.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-complex.js | --no-baseline --no-ion
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --ion-eager
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --ion-eager --ion-check-range-analysis --no-sse3
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --baseline-eager
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-float64.js | --baseline-eager --no-ti --no-fpu
TIMEOUT - TypedObject/jit-complex.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-complex.js | --no-baseline --no-ion --no-ti
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 
TIMEOUT - TypedObject/jit-prefix.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-prefix.js | --no-baseline --no-ion
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | 
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | --ion-eager
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | --ion-eager --ion-check-range-analysis --no-sse3
TEST-PASS | js/src/jit-test/tests/TypedObject/jit-read-int.js | --baseline-eager
TIMEOUT - TypedObject/jit-prefix.js
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/TypedObject/jit-prefix.js | --no-baseline --no-ion --no-ti
INFO exit-status     : -9
INFO timed-out       : True
INFO stdout          > 
INFO stderr         2> 

A maybe-reasonable regression range is https://tbpl.mozilla.org/?tree=Mozilla-Inbound&tochange=bf15e2032c38&fromchange=54dab4a01a81&jobname=debug%20asan

I already backed out 00644e4b067d for a separate slowdown, so my best-of-all-possible-worlds theory is that it also slowed us down in these two tests, just intermittently enough to time them out, and we'll never see this failure again.

Otherwise... could be anything.
These tests are already very, very slow when the JITs are turned off. In the interest of keeping differential testing on every push for all tests, I don't think a new jit-test flag meaning "slow when the JITs are off" is a good idea.

So I offer a few suggestions here:
 1. Spot-fix these tests so they aren't so slow when the JITs are off: lower the value of N, which seems arbitrary anyway. We should move stress tests elsewhere so they aren't prone to intermittent timeouts on the TBPL slaves when one of them decides it's having a bad day.
 2. Have a separate, non-looping test for correctness, but keep these tests as they are and mark them as "slow".
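As a rough illustration of option 1, here is a hedged sketch: the loop body, N, and the expected value are hypothetical stand-ins, not the actual contents of jit-complex.js.

```javascript
// Hypothetical sketch of suggestion 1: keep the correctness check but cap the
// iteration count so interpreter-only (--no-baseline --no-ion) runs stay fast.
var N = 100; // the stress version used something on the order of 30,000

function runTest() {
  var sum = 0;
  for (var i = 0; i < N; i++) {
    // stand-in for the body exercising the optimized path,
    // e.g. typed-object field reads in the real test
    sum += i;
  }
  return sum;
}

// Still assert correctness: sum of 0..N-1 is N*(N-1)/2.
var result = runTest();
if (result !== N * (N - 1) / 2)
  throw new Error("unexpected result: " + result);
```

The idea is that the assertion keeps the test a conformance test even after the iteration count is cut down.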

We should probably just do 1. Niko, what do you think?
Flags: needinfo?(nmatsakis)
Summary: Intermittent (or permanent) debug ASan timeouts in --no-ion /jit-test/tests/TypedObject/jit-complex.js and jit-test/tests/TypedObject/jit-prefix.js → Intermittent debug ASan timeouts in --no-ion /jit-test/tests/TypedObject/jit-complex.js and jit-test/tests/TypedObject/jit-prefix.js
This is tricky. On the one hand, the tests aren't serving their full purpose as they are, because they do not test performance -- so we can't be sure that the optimizations are even triggering. On the other hand, tests like these did help me to uncover bugs in the optimization paths, and it'd be good to keep testing those pathways. Using ion-eager or lower loop counts has the danger that it doesn't gather enough TI information to get to the fully optimized code. Do we have a solution for conformance testing jit-optimized paths that we employ elsewhere? Poking about in jit-tests, I did not see any such thing. (Flagging jandem for feedback on this point)

I have been planning to draw up some microbenchmarks (and some larger benchmarks) and submit them to AWFY for the purpose of monitoring performance. Those benchmarks should likely validate their results, which helps with conformance testing, but of course AWFY is not designed as a conformance monitoring platform.
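A self-validating microbenchmark along those lines might look like this hedged sketch; the kernel and all names are illustrative, not an actual AWFY benchmark.

```javascript
// Hypothetical sketch: time the hot loop, but also validate the computed
// result so a wrong-answer regression can't hide behind a good-looking number.
function kernel(n) {
  var acc = 0;
  for (var i = 0; i < n; i++)
    acc += (i & 1) ? i : -i; // add odd indices, subtract even ones
  return acc;
}

var N = 100000;
var start = Date.now();
var result = kernel(N);
var elapsed = Date.now() - start;

// Validate: each pair (2k, 2k+1) contributes -2k + (2k+1) = 1,
// so for even N the total is N/2.
if (result !== N / 2)
  throw new Error("benchmark produced wrong result: " + result);
```

This does not make the benchmark a conformance monitor, as noted above, but it at least catches the case where a fast run is fast because it computed the wrong thing.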
Flags: needinfo?(nmatsakis) → needinfo?(jdemooij)
(In reply to Niko Matsakis [:nmatsakis] from comment #5)
> This is tricky. On the one hand, the tests aren't serving their full purpose
> as they are, because they do not test performance -- so we can't be sure
> that the optimizations are even triggering. On the other hand, tests like
> these did help me to uncover bugs in the optimization paths, and it'd be
> good to keep testing those pathways. Using ion-eager or lower loop counts
> has the danger that it doesn't gather enough TI information to get to the
> fully optimized code. Do we have a solution for conformance testing
> jit-optimized paths that we employ elsewhere? Poking about in jit-tests, I
> did not see any such thing. (Flagging jandem for feedback on this point)
> 

Testing that heuristics kick in seems pretty hard, but even then I think the iteration count we need is lower than 30,000.

> I have been planning to draw up some microbenchmarks (and some larger
> benchmarks) and submit them to AWFY for the purpose of monitoring
> performance. Those benchmarks should likely validate their results, which
> helps with conformance testing, but of course AWFY is not designed as a
> conformance monitoring platform.

My point was more that we should leave the correctness tests in jit-tests and the perf stuff to AWFY. I guess what you're saying is that it's not good enough to take performance numbers as an indication that the JIT heuristics are kicking in at the right places?
(In reply to Shu-yu Guo [:shu] from comment #6)
> Testing that heuristics kick in seems pretty hard, but even then I think
> it's lower than 30,000.

Sure. Another option is to add an ion-impatient mode (maybe something the test could even set dynamically via a shell function) that kicks in after 30 iterations or so rather than 1000. Then we could drastically lower the iteration counts. After all, the types will stabilize awfully quickly here.

> My point was more that leave the correctness tests in jit-tests and the perf
> stuff to AWFY. I guess what you're saying that it's not good enough to take
> performance numbers as indication that the JIT heuristics are kicking in at
> the right places?

No, that's not quite what I'm saying. I'm saying that just because it runs fast doesn't mean it does the right thing, so we do want tests that run enough iterations to trigger the JIT, with full type info, and that are part of the conformance suite and hence subject to fuzzing etc.
Note that we have some tests that do:

    if (getBuildConfiguration()['asan'] && getBuildConfiguration()['debug']) quit(0);

But it would be good not to do that, of course; tests shouldn't be very slow if we can avoid it, and we have ASan builds for a reason :)

(In reply to Niko Matsakis [:nmatsakis] from comment #5)
> Do we have a solution for conformance testing
> jit-optimized paths that we employ elsewhere? Poking about in jit-tests, I
> did not see any such thing. (Flagging jandem for feedback on this point)

Nope, we don't have anything like that.

(In reply to Niko Matsakis [:nmatsakis] from comment #7)
> Sure. Another option is to add an ion-impatient mode (maybe something the
> test can even set dynamically using a shell function) that kicks in after 30
> iterations or something rather than 1000. Then we could drastically lower
> the iteration counts. After all, the types will stabilize awfully quick here.

We have that already, something like:

setJitCompilerOption("ion.usecount.trigger", 30);
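For illustration, a hedged sketch of how a test might combine this with a much smaller loop; the guard, N, and the loop body are hypothetical, and setJitCompilerOption is a SpiderMonkey shell builtin that other environments lack.

```javascript
// Hypothetical sketch: lower Ion's warm-up threshold so it compiles after ~30
// iterations, which lets the test loop be far shorter than 30,000 iterations.
// Guarded so the snippet is harmless outside the SpiderMonkey shell.
if (typeof setJitCompilerOption === "function")
  setJitCompilerOption("ion.usecount.trigger", 30);

var N = 200; // small enough to be fast even in interpreter-only runs
var total = 0;
for (var i = 0; i < N; i++)
  total += i * 2;

// Correctness check survives the lowered iteration count:
// sum of 2*i for i in 0..N-1 is N*(N-1).
if (total !== N * (N - 1))
  throw new Error("unexpected total: " + total);
```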
Flags: needinfo?(jdemooij)
Closing bugs where TBPLbot has previously commented, but which have not been modified for >3 months and do not contain the whiteboard strings for disabled/annotated tests or the leave-open keyword. Filter on: mass-intermittent-bug-closure-2014-07
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME