Closed Bug 478437 Opened 15 years ago Closed 15 years ago

Lightweight SpiderMonkey regression tests

Categories: Core :: JavaScript Engine, defect
Version: Other Branch
Type: defect
Priority: Not set
Severity: normal

Status: RESOLVED WONTFIX

People: (Reporter: jorendorff, Assigned: jorendorff)

Attachments: (4 files, 1 obsolete file)

bclary is a saint for maintaining and vigilantly running the js/tests test suite.  But.

I think we need a test framework that the *developers* are willing to use.  Something lightweight to complement js/tests.

Requirements offhand:

- stupidly easy to run, just one script (bash or js)
- add a test by appending a tiny scrap of code to an existing file (like trace-test.js)
- running a test standalone should be like this:  1. select the failing test in the test file 2. copy 3. paste into interpreter

Being able to handle everything we might want to test isn't a requirement.  We still have js/tests.  I think I might cheat a little by adding assert() and assertEq() functions to the JS shell.
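
For concreteness, running such a scrap standalone might look like this, assuming the shell's -e flag and the proposed assertEq builtin (a sketch; the builtin doesn't exist yet):

  # a standalone test scrap run via -e; assertEq(actual, expected) is the
  # proposed builtin from this comment, so this is a sketch, not current reality
  ./js -e 'var a = [1, 2, 3].map(function (x) { return 2 * x; }); assertEq(String(a), "2,4,6");'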
You have to run js/tests before any serious patch lands, so is this for smaller patches? I'm wondering if there'll be enough coverage.

Also, it's not hard to run js/tests if you are willing to type extra -L args. The big pain for me is qpopping my patch and rerunning to get a baseline-ish results.txt file (-ish because there's some nondeterminism in those big-O measuring tests) to comm -13 against.
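
Sketched out, that workflow might look like this (filenames are illustrative; the smdebug config is the one used elsewhere in this bug):

  # hypothetical baseline-vs-patched diff workflow
  ./jsDriver.pl -e smdebug -s ../src/js | sort > base-results.txt   # patch qpopped
  # ...qpush the patch, rebuild the shell...
  ./jsDriver.pl -e smdebug -s ../src/js | sort > new-results.txt
  comm -13 base-results.txt new-results.txt   # failures unique to the patched run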

/be
I have bash aliases set up to handle the pain of a full run, so I don't feel much pain there (tho there is pain nonetheless).  Running a single test is harder than it should be, tho, I agree.
Can we just give jsDriver or a wrapper better defaults or utility switches, so that we don't need to do as much of it manually?

Baseline should be "no failures", no?  Do we have a lot of outstanding failures that we aren't attending to?
(In reply to comment #3)

> Baseline should be "no failures", no?  Do we have a lot of outstanding failures
> that we aren't attending to?

Pretty much. I can produce a test list which shouldn't fail on any given branch.
Once bc told me the 'right' set of command line args to use, I found it easy to run. I just have a text file with that command. Putting that in a 1-line bash script would make it easy for everyone, I think.

It shouldn't be too hard to make a shell script to run one test. I think all you have to do is be in the right directory and add the 3 -f blah/shell parts.

When I diff results I always get differences for the time tests and often a few of the big-O ones. A little script to paper over those incidental differences would be nice.
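
A rough sketch of such a script, assuming the incidental differences are timing numbers embedded in the output (the regex is illustrative):

  # hypothetical normalizer: blank out timing numbers so diffs show only real changes
  sed -E 's/[0-9]+(\.[0-9]+)?[ ]*(ms|msec|sec(onds)?)/<time>/g' results.txt > results.norm.txt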
To test patches, once in a while I run

./jsDriver.pl -e smdebug -s .../js -T 30

Here -T is important to filter out memory hogs and other slow tests that run over 30 s. In a few minutes this generates a list of failures that I save as base-failures.txt.

Then it is a matter of running

./jsDriver.pl -e smdebug -s .../js -T 30 -L base-failures.txt

But I agree it should be easier to run individual tests. Currently, after a regression, I often copy the test file and then either delete the test-harness goodies or replace the check functions with === etc., so the shell can run the file alone. It would be really nice if this step could be avoided.
Attached file jstest.sh
simple shell script to execute a single test.
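
The attachment body isn't reproduced here; going by the comment above about "the 3 -f blah/shell parts", a sketch of such a script might look like the following (names and directory layout are assumptions, and the real jstest.sh may differ):

  #!/bin/sh
  # Hypothetical single-test runner: run from js/tests, pass the test path
  # relative to that directory. The real attached script may differ.
  JS=${JS:-../src/js}                      # path to the js shell binary
  TEST=$1                                  # e.g. ecma_3/Array/some-test.js
  SUITE=$(dirname "$(dirname "$TEST")")    # top-level suite dir, e.g. ecma_3
  exec "$JS" -f shell.js -f "$SUITE/shell.js" -f "$(dirname "$TEST")/shell.js" -f "$TEST"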
Attachment #362302 - Attachment mime type: application/octet-stream → text/plain
I already have scripts to run all tests or just one test, and even to run just one test under gdb, and I do run them often.  But.

I see trace-test.js growing while bclary is the only person who contributes to js/tests.  Something is wrong.
Depends on: 480199
Attached patch v1 (obsolete)
Here's what I had in mind.  This moves all the developer-maintained js tests we've already got (NOT js/tests, nor js/src/analysis-tests) into a new directory, js/src/tests.

js/src/tests/crash-tests.js is new.  I threw it together from the existing js/tests. I want to gather some ECMA compliance tests too, in a follow-up bug.

To run these tests, do `make check` in an objdir or just:
  ./tests/runtests.py
Assignee: general → jorendorff
Status: NEW → ASSIGNED
Attachment #365214 - Flags: review?
Oh, I forgot the best part.  You don't need runtests.py or make check.  You can just point the JS shell at the test files.

crash-tests.js takes this one step further: each test is standalone, so you can just copy it and paste it into the shell.
Attachment #365214 - Flags: review? → review?(sayrer)
Attachment #365214 - Flags: review?(sayrer) → review+
Comment on attachment 365214 [details] [diff] [review]
v1

looks good, don't get too verbose in the header comment, since that will just get out of sync. Tell people to run with --help to get options.
Attached patch v2
This version sets the exit code properly and works in more cases, like from a browser objdir.

There is one case where it still doesn't work:

../configure  --enable-readline --enable-threadsafe --with-nspr-cflags=-I/Users/jason/dev/moz/tracemonkey/obj-ff-release/dist/include/nspr '--with-nspr-libs=-L/Users/jason/dev/moz/tracemonkey/obj-ff-release/dist/lib -lnspr4'

and js refuses to run in this case.  I can make it work by setting DYLD_LIBRARY_PATH when running make check.  I claim this is a good baseline.
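
Spelled out, the workaround is the following, where the path is the NSPR objdir from the configure line above:

  # point dyld at the NSPR libs so the shell can start, then run the tests
  DYLD_LIBRARY_PATH=/Users/jason/dev/moz/tracemonkey/obj-ff-release/dist/lib make check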
Attachment #365214 - Attachment is obsolete: true
Attachment #365315 - Flags: review?(sayrer)
What's the status of this?  It looks like it would be really useful to have in the tree.
I still use (from js/tests)

./jsDriver.pl -t -e smdebug -L lc2 lc3 spidermonkey-n.tests slow-n.tests

Some noise from the O(n^2) tests.

It's not true that only bclary adds tests, although he often is the person who commits tests contributed by others.

trace-test.js is great but it doesn't scale to cover the space that js/tests covers, which needs coverage for every changeset to the primary repository.

My point is not that we don't need this bug fixed -- who could object, all else equal -- but that the space is too big to cover in a "lightweight" fashion. We are finding more uncovered paths all the time, usually via browser-dependent bugs that bite web apps. These turn into regression tests under js/tests.

I can't safely (for others' time and my rep) and in good conscience skip the js/tests on any checkin I make, and if there were a lightweight suite, while I would find it useful during rapid-iteration-style development, I would have to run the "full" suite pretty often -- and definitely before committing.

Is the problem with js/tests that there's too much boilerplate to (a) learn and (b) add to a test, to get it into js/tests format, in addition to (c) deciding where in the directory tree it should go?

/be
Of course, I have to generate the baseline failures.txt file pretty much each time. This is a pain. Definitely bugs to fix here, but they'd still be there biting if we fix this bug.

/be
At the moment, I'm interested in the "stupidly easy to run" part from the original description.
(In reply to comment #16)
> Of course, I have to generate the baseline failures.txt file pretty much each
> time. This is a pain.

Why is that so? I always use -T 60 when generating the baseline. That filters out all those slow tests and generates the baseline rather quickly (less than 10 minutes on a 2.6 GHz CPU).
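
Concretely, mirroring the jsDriver.pl commands quoted earlier in this bug with the longer cap (.../js is still a placeholder path):

  # regenerate the baseline with a 60 s per-test cap, then exclude known failures
  ./jsDriver.pl -e smdebug -s .../js -T 60 > base-failures.txt
  ./jsDriver.pl -e smdebug -s .../js -T 60 -L base-failures.txt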
I will try -T; I've been using -L <short-list> for a while at Bob's recommendation, and it runs a lot faster than 10 minutes.

My point is we could use a standard, minimal-arguments command line for everyone to run, or a wrapper script or equivalent. Automatically-updated baseline results would be helpful. This may be on QA's agenda already, Bob should speak up if so.

/be
A reftest-like approach is on my agenda. It is high on my list of priorities but keeps getting bumped by keeping up with regressions in the face of releases.
It seems like documentation is the #1 issue. I found the js tests bewildering at first, but after 5 minutes of bc explaining it to me, it became fairly easy. The readme is well written but new users need examples and an explanation of the output. I use:

    ./jsDriver.pl -e smopt -s <js-shell-path> -L lc2 lc3 spidermonkey-n-1.9.1.tests slow-n.tests -R > <output-file> 2>&1

That seems to filter out the slow tests. I didn't know about the -T option; it seems not to be documented in the readme. I guess -T requires less manual maintenance of a slow tests list.

'Stupidly easy to run' is of course a great goal. Interpreting the results seems to be the main difficulty once you've got yourself a good command line. Diffing a baseline output with a new output works well, except for (a) tests that output times, and (b) the O(n) tests. I think a postprocessing script would fix (a) easily enough, but I've never gotten around to it. I'm not sure about (b) -- I never really understood those tests to begin with and they have only produced spurious failures for me so far. I confess my reaction was to ignore those tests. Maybe they're for expert consumption and needn't be run all the time?
Oh, another comment on this subject. I'm sure it's been said 1000 times, but the 1001st can't hurt. I think it would be better if the regression tests ran as part of the push-to-repo process. That way, developers can't forget to run them or miss a test failure, and they don't have to wait until they're done -- if a test fails, they'll get an email about it.
Can someone remind me of why we need to establish baseline failures at all?  We don't for our other test suites, because...the baseline is "they all pass".  Are there tests that are too setup-sensitive for that to work (recommend: exclude those by default)?  Or maybe we just have old bugs that need attention (recommend: obvious)?
(In reply to comment #22)

the plan is for them to run with the other unit tests.

(In reply to comment #23)

Basically for historical reasons, which will go away when the make check process lands. The current (non-js) test suites have tests added to them on a per-branch/repo basis as bugs are fixed, so that all tests can be run with the expectation that all will pass.

The current js tests are not branch/repo-specific, and they also include pathological tests which are expected to fail in some (good) way.

I'm somewhat frustrated here. The tests are too hard, take too long to run, or whatever, and no one mentions their problems to me, so most people don't run the tests at all, leaving me to continually chase regressions with "we are releasing tomorrow" deadlines existing for months and no time to do anything but play janitor. I work 7 days a week cleaning up after you guys. Cut me a break.
Bob, I run the tests and they have saved me many times -- so many thanks. It's clear dmandelin and others run the tests. I remember early on, gal praised the tests effusively for their completeness.

This bug is motivated in part by a desire to get you out of the release-chasing business. That's a good motivation, but I question the proposed means, unless this is really just "better defaults and baselining for jsDriver.pl".

The utility of a lightweight testsuite is clear enough but it does not come for free. The down side includes the false sense of coverage that it might produce, the duplication of effort with respect to the full suite, and the opportunity cost apart from that duplication (even if we shared tests somehow, what could we do to the full suite with the effort spent on the lightweight tests?).

/be
I too wasn't saying anything bad about the tests or Bob's work on them here. The biggest problem seems to be that new JS hackers have to learn about them by accident and word of mouth. It's not really the fault of the tests' docs themselves; maybe a JS hacker's intro guide is needed. Otherwise I was just suggesting some tools that might make them easier to use; in fact I intended to give it a shot myself but never got around to it.
I first used the tests yesterday, when working on a change to the LIR in nanojit, so here's a newbie user's point of view:

- Having a good test suite is hugely helpful, especially when I'm making changes to code that I'm not yet familiar with.  So your efforts here are greatly appreciated, Bob.

- The tests aren't easy to find for a newbie.  I only learnt about them because someone (Andreas?) CC'd me on this bug.  Prior to that I'd asked about JS tests on IRC and only been told about trace-test.js.  "jsDriver.pl" isn't the most obvious name for a script that runs tests.

- I didn't realise there was a README file until I read #21, that helped.

- Having to exclude tests to make the suite usable is a big obstacle.  I first tried not excluding any tests, and gave up after 90 minutes.  Turns out one test had timed out after an hour.  A test that takes over an hour to run doesn't seem much use.  Using Brendan's suggestion from #15 was very useful, but again I couldn't have discovered that without being CC'd on this bug.

- I now agree with Brendan that having lightweight vs. heavyweight tests isn't much use, so long as the heavyweight tests don't take too long to run.  On my machine a debug run using Brendan's suggestion from #15 takes 4.5 minutes, which is fine.  I was also using trace-test.js as a quick pre-check.

- Having to establish a baseline of failures is very non-obvious; again, I wouldn't have worked that out without being CC'd on this bug.  A test that always (or even often) fails isn't useful.

- I still don't know what the O(n) and O(n^2) tests are.

IMO the ideal suite would have these properties

- Be very easy to run and have an obvious name.  Bob's "make check" process mentioned in #24 sounds like it would fix this.

- No expected failures.  The "make check" process sounds like it might also fix that.  If some tests sometimes fail due to timing issues, maybe the timing-sensitive ones should be run separately?  Not sure.

- No need for exclusions -- all tests should be run each invocation.  That might require killing some of the existing tests, but if people aren't running certain tests anyway that's no great loss.

- A fairly short run-time.  5 minutes is great, 10 is ok, much longer and you'll get people wanting to skip them.

I understand that getting all those properties may well be a lot of work! :)
Many thanks to you, Bob, for all your work!  I use js/tests consistently (last night's red notwithstanding, koff koff), and not just to catch regressions.  When I can't figure out how/whether a given condition could ever possibly arise in the interpreter, I add an assert and run the test suite --- it's a corpus of code to search for examples!  (I've not yet needed to run the whole browser suite to find what I'm looking for.)
(In reply to comment #28)
> When I can't figure out how/whether a given condition could ever possibly arise
> in the interpreter, I add an assert and run the test suite

I've done this too -- my memory has finite capacity beyond what is reserved for better things than oddball (e4x, e.g.) corner cases ;-).

/be
Attached is a script I have been using to run the jstests and compare them against baseline output. Required setup:

- have a baseline js shell build to test against in |js/src/debug-baseline|
  (can also use |opt| instead of |debug|)
- have a js shell you want to test in |js/src/debug|
  (again, can use |opt|)
- have a /tmp directory

Then, you can just run 'python3 test.py debug', and it will do all the rest: run the suite against the baseline (unless those results are cached), run the suite against the test version (unless those results are cached and newer than the js shell build), and then compare the results.

If everything's OK, the last line of output is 'PASSED'. 

If not, there are test suite mismatches or failures, and the output will show them. It also lists any newly passing tests after the word 'FIXES'.
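
For example, assuming the layout just described (all names come from this comment's setup notes):

  # layout assumed by test.py:
  #   js/src/debug-baseline/  - baseline shell build
  #   js/src/debug/           - shell under test
  python3 test.py debug   # runs both (cached where possible), diffs, prints PASSED on success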
Attachment #365315 - Flags: review?(sayrer)
See bug 505588 for progress in js tests.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
(In reply to comment #31)
> See bug 505588 for progress in js tests.

See also bug 509629.