Emscripten has 500+ automatic tests that have uncovered various bugs in SpiderMonkey (in particular, recently in TI). It has been suggested to add some or all of those tests to the JS engine test suite, hence this bug. The tests cover both compiling LLVM to JS and running the generated JS. That includes some very large JS scripts (multiple megabytes), and running the entire suite takes 1-2 hours, so that is the downside to adding them all (though the large JS scripts have tended to find the most JS engine bugs).
To move on this, the relevant people (I am not sure who that is) need to decide if we want this, and if so how many of the tests.
The criterion we want to use is "the tests which increase our test coverage". Does this help or hinder?
We have found several bugs in TI using this, including recent regressions after the original bugs were fixed (the reduced testcases, which had been added to the js tests, still passed, but the original emscripten tests they were reduced from did not). So I believe these tests do increase coverage. However, it might be the case that some of that can be achieved with fuzzing, which AFAIK is not yet done on JaegerMonkey, so maybe once fuzzing is in place these tests will not increase coverage that much. I don't know the answer to that, but I did ask Jesse about one specific case - a bug with >50 local variables - and he said his fuzzer would not have found that one.
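To illustrate the kind of testcase a fuzzer is unlikely to produce, here is a minimal sketch of a function with well over 50 local variables. The generator, names, and values are hypothetical, not taken from the actual emscripten test or the reduced testcase:

```javascript
// Sketch only: build a function whose body declares n locals and returns
// their sum, stressing the ">50 local variables" code path mentioned above.
function makeManyLocals(n) {
  var decls = [], names = [];
  for (var i = 0; i < n; i++) {
    decls.push("var v" + i + " = " + i + ";");
    names.push("v" + i);
  }
  return new Function(decls.join("\n") + "\nreturn " + names.join(" + ") + ";");
}

var f = makeManyLocals(60);          // well past the 50-local threshold
if (f() !== 1770)                    // 0 + 1 + ... + 59 = 1770
  throw new Error("many-locals testcase miscomputed: " + f());
```

A real engine test would compare the result against a known-good value like this, so a miscompile of the many-locals function shows up as a hard failure rather than a silent wrong answer.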
Just to clarify, I wasn't trying to shoot down including the tests, but trying to provide criteria so that you can find some subset which is useful. But in fact, :decoder tells me that coverage is probably not a good metric for including tests, because we're doing pretty well on both line- and branch-coverage, and are finding bugs with fuzz-tests which are not improving coverage stats. However, I believe we can't add 2 hours of tests, no matter what the benefits. I could be wrong - it may be worth checking with RelEng to see what they say. Assuming for a moment that I'm right, then we need to find some subset of the tests that is useful to add to the testsuite. I don't know a way to identify that subset.
(In reply to comment #4) > But in fact, :decoder tells me that coverage is probably not a good metric for > including tests, because we're doing pretty well on both line- and > branch-coverage, and are finding bugs with fuzz-tests which ... are not improving coverage stats.
How about running it once per day on TM, instead of once per push?
I generally run the tests once a week, so I guess there is already some coverage in that sense, if we are happy with just that (rather than preventing badness from landing in the first place). The risk is that I might miss a few weeks now and then. For example, I just found another bug on tracemonkey with this, bug 648769, which looks like a regression from almost two weeks ago.
(In reply to comment #6) > How about running it once per day on TM, instead of once per push? If we do this we can also use it for Kraken and other long-running scripts. Bug 624792 is about adding Kraken to jit-tests, but I had to tweak the tests to run faster. It would be good to check correctness of the full version too.
(In reply to Alon Zakai (:azakai) from comment #7) > The risk is that I might miss a few weeks now and then. For example, I just > found another bug on tracemonkey with this, bug 648769, which looks like a > regression from almost two weeks ago. Bug 679878 is another regression, from about a month ago on the TI branch. What about Jesse's suggestion of running these tests once per day? Ideally on several branches (TI, MC, Ion). It doesn't have to be fancy, just an email to some people if there's a new failure.