Closed Bug 478437 Opened 15 years ago Closed 15 years ago

Lightweight SpiderMonkey regression tests

Categories: Core :: JavaScript Engine, defect
Version: Other Branch
Type: defect
Priority: Not set
Severity: normal

Status: RESOLVED WONTFIX

People: (Reporter: jorendorff, Assigned: jorendorff)

Attachments: (4 files, 1 obsolete file)

bclary is a saint for maintaining and vigilantly running the js/tests test suite.  But.

I think we need a test framework that the *developers* are willing to use.  Something lightweight to complement js/tests.

Requirements offhand:

- stupidly easy to run, just one script (bash or js)
- add a test by appending a tiny scrap of code to an existing file (like trace-test.js)
- running a test standalone should be like this:  1. select the failing test in the test file 2. copy 3. paste into interpreter

Being able to handle everything we might want to test isn't a requirement.  We still have js/tests.  I think I might cheat a little by adding assert() and assertEq() functions to the JS shell.
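
For concreteness, running such a scrap standalone might look like this, assuming the shell's -e flag and the proposed assertEq builtin (a sketch; the builtin doesn't exist yet):

  # a standalone test scrap run via -e; assertEq(actual, expected) is the
  # proposed builtin from this comment, so this is a sketch, not current reality
  ./js -e 'var a = [1, 2, 3].map(function (x) { return 2 * x; }); assertEq(String(a), "2,4,6");'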
You have to run js/tests before any serious patch lands, so is this for smaller patches? I'm wondering if there'll be enough coverage.

Also, it's not hard to run js/tests if you are willing to type extra -L args. The big pain for me is qpopping my patch and rerunning to get a baseline-ish results.txt file (-ish because there's some nondeterminism in those big-O measuring tests) to comm -13 against.
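
Sketched out, that workflow might look like this (filenames are illustrative; the smdebug config is the one used elsewhere in this bug):

  # hypothetical baseline-vs-patched diff workflow
  ./jsDriver.pl -e smdebug -s ../src/js | sort > base-results.txt   # patch qpopped
  # ...qpush the patch, rebuild the shell...
  ./jsDriver.pl -e smdebug -s ../src/js | sort > new-results.txt
  comm -13 base-results.txt new-results.txt   # failures unique to the patched run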

/be
I have bash aliases set up to handle the pain of a full run, so I don't feel much pain there (tho there is pain nonetheless).  Running a single test is harder than it should be, tho, I agree.
Can we just give jsDriver or a wrapper better defaults or utility switches, so that we don't need to do as much of it manually?

Baseline should be "no failures", no?  Do we have a lot of outstanding failures that we aren't attending to?
(In reply to comment #3)

> Baseline should be "no failures", no?  Do we have a lot of outstanding failures
> that we aren't attending to?

Pretty much. I can produce a test list which shouldn't fail on any given branch.
Once bc told me the 'right' set of command line args to use, I found it easy to run. I just have a text file with that command. Putting that in a 1-line bash script would make it easy for everyone, I think.

It shouldn't be too hard to make a shell script to run one test. I think all you have to do is be in the right directory and add the 3 -f blah/shell parts.

When I diff results I always get differences for the time tests and often a few of the big-O ones. A little script to paper over those incidental differences would be nice.
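
A rough sketch of such a script, assuming the incidental differences are timing numbers embedded in the output (the regex is illustrative):

  # hypothetical normalizer: blank out timing numbers so diffs show only real changes
  sed -E 's/[0-9]+(\.[0-9]+)?[ ]*(ms|msec|sec(onds)?)/<time>/g' results.txt > results.norm.txt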
To test patches, once in a while I run

./jsDriver.pl -e smdebug -s .../js -T 30

Here -T is important to filter out memory hogs and other slow tests that run over 30 s. In a few minutes this generates a list of failures that I save as base-failures.txt.

Then it is a matter of running

./jsDriver.pl -e smdebug -s .../js -T 30 -L base-failures.txt

But I agree it should be easier to run individual tests. Currently, after a regression, I often copy the test file and then either delete the test-harness goodies or replace the check functions with === etc., so the shell can run the file alone. It would be really nice if this step could be avoided.
Attached file jstest.sh
simple shell script to execute a single test.
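
The attachment body isn't reproduced here; going by the comment above about "the 3 -f blah/shell parts", a sketch of such a script might look like the following (names and directory layout are assumptions, and the real jstest.sh may differ):

  #!/bin/sh
  # Hypothetical single-test runner: run from js/tests, pass the test path
  # relative to that directory. The real attached script may differ.
  JS=${JS:-../src/js}                      # path to the js shell binary
  TEST=$1                                  # e.g. ecma_3/Array/some-test.js
  SUITE=$(dirname "$(dirname "$TEST")")    # top-level suite dir, e.g. ecma_3
  exec "$JS" -f shell.js -f "$SUITE/shell.js" -f "$(dirname "$TEST")/shell.js" -f "$TEST"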
Attachment #362302 - Attachment mime type: application/octet-stream → text/plain
I already have scripts to run all tests or just one test, and even to run just one test under gdb, and I do run them often.  But.

I see trace-test.js growing while bclary is the only person who contributes to js/tests.  Something is wrong.
Depends on: 480199
Attached patch v1 (obsolete)
Here's what I had in mind.  This moves all the developer-maintained js tests we've already got (NOT js/tests, nor js/src/analysis-tests) into a new directory, js/src/tests.

js/src/tests/crash-tests.js is new.  I threw it together from the existing js/tests. I want to gather some ECMA compliance tests too, in a follow-up bug.

To run these tests, do `make check` in an objdir or just:
  ./tests/runtests.py
Assignee: general → jorendorff
Status: NEW → ASSIGNED
Attachment #365214 - Flags: review?
Oh, I forgot the best part.  You don't need runtests.py or make check.  You can just point the JS shell at the test files.

crash-tests.js takes this one step further: each test is standalone, so you can just copy it and paste it into the shell.
Attachment #365214 - Flags: review? → review?(sayrer)
Attachment #365214 - Flags: review?(sayrer) → review+
Comment on attachment 365214 [details] [diff] [review]
v1

looks good, don't get too verbose in the header comment, since that will just get out of sync. Tell people to run with --help to get options.
Attached patch v2
This version sets the exit code properly and works in more cases, like from a browser objdir.

There is one case where it still doesn't work:

../configure  --enable-readline --enable-threadsafe --with-nspr-cflags=-I/Users/jason/dev/moz/tracemonkey/obj-ff-release/dist/include/nspr '--with-nspr-libs=-L/Users/jason/dev/moz/tracemonkey/obj-ff-release/dist/lib -lnspr4'

and js refuses to run in this case.  I can make it work by setting DYLD_LIBRARY_PATH when running make check.  I claim this is a good baseline.
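
Spelled out, the workaround is the following, where the path is the NSPR objdir from the configure line above:

  # point dyld at the NSPR libs so the shell can start, then run the tests
  DYLD_LIBRARY_PATH=/Users/jason/dev/moz/tracemonkey/obj-ff-release/dist/lib make check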
Attachment #365214 - Attachment is obsolete: true
Attachment #365315 - Flags: review?(sayrer)
What's the status of this?  It looks like it would be really useful to have in the tree.
I still use (from js/tests)

./jsDriver.pl -t -e smdebug -L lc2 lc3 spidermonkey-n.tests slow-n.tests

Some noise from the O(n^2) tests.

It's not true that only bclary adds tests, although he often is the person who commits tests contributed by others.

trace-test.js is great but it doesn't scale to cover the space that js/tests covers, which needs coverage for every changeset to the primary repository.

My point is not that we don't need this bug fixed -- who could object, all else equal -- but that the space is too big to cover in a "lightweight" fashion. We are finding more uncovered paths all the time, usually via browser-dependent bugs that bite web apps. These turn into regression tests under js/tests.

I can't safely (for others' time and my rep) and in good conscience skip the js/tests on any checkin I make, and if there were a lightweight suite, while I would find it useful during rapid-iteration-style development, I would have to run the "full" suite pretty often -- and definitely before committing.

Is the problem with js/tests that there's too much boilerplate to (a) learn and (b) add to a test, to get it into js/tests format, in addition to (c) deciding where in the directory tree it should go?

/be
Of course, I have to generate the baseline failures.txt file pretty much each time. This is a pain. Definitely bugs to fix here, but they'd still be there biting if we fix this bug.

/be
At the moment, I'm interested in the "stupidly easy to run" part from the original description.
(In reply to comment #16)
> Of course, I have to generate the baseline failures.txt file pretty much each
> time. This is a pain.

Why is that so? I always use -T 60 when generating the baseline. That filters out all those slow tests and generates the baseline rather quickly (less than 10 minutes on a 2.6 GHz CPU).
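
Concretely, mirroring the jsDriver.pl commands quoted earlier in this bug with the longer cap (.../js is still a placeholder path):

  # regenerate the baseline with a 60 s per-test cap, then exclude known failures
  ./jsDriver.pl -e smdebug -s .../js -T 60 > base-failures.txt
  ./jsDriver.pl -e smdebug -s .../js -T 60 -L base-failures.txt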
I will try -T; I've been using -L <short-list> for a while at Bob's recommendation, and it runs a lot faster than 10 minutes.

My point is we could use a standard, minimal-arguments command line for everyone to run, or a wrapper script or equivalent. Automatically-updated baseline results would be helpful. This may be on QA's agenda already, Bob should speak up if so.

/be
A reftest-like approach is on my agenda. It is high on my list of priorities but keeps getting bumped by keeping up with regressions in the face of releases.
It seems like documentation is the #1 issue. I found the js tests bewildering at first, but after 5 minutes of bc explaining it to me, it became fairly easy. The readme is well written but new users need examples and an explanation of the output. I use:

    ./jsDriver.pl -e smopt -s <js-shell-path> -L lc2 lc3 spidermonkey-n-1.9.1.tests slow-n.tests -R > <output-file> 2>&1

That seems to filter out the slow tests. I didn't know about the -T option; it seems not to be documented in the readme. I guess -T requires less manual maintenance of a slow tests list.

'Stupidly easy to run' is of course a great goal. Interpreting the results seems to be the main difficulty once you've got yourself a good command line. Diffing a baseline output with a new output works well, except for (a) tests that output times, and (b) the O(n) tests. I think a postprocessing script would fix (a) easily enough, but I've never gotten around to it. I'm not sure about (b) -- I never really understood those tests to begin with and they have only produced spurious failures for me so far. I confess my reaction was to ignore those tests. Maybe they're for expert consumption and needn't be run all the time?
Oh, another comment on this subject. I'm sure it's been said 1000 times, but the 1001st can't hurt. I think it would be better if the regression tests ran as part of the push-to-repo process. That way, developers can't forget to run them or miss a test failure, and they don't have to wait until they're done -- if a test fails, they'll get an email about it.
Can someone remind me of why we need to establish baseline failures at all?  We don't for our other test suites, because...the baseline is "they all pass".  Are there tests that are too setup-sensitive for that to work (recommend: exclude those by default)?  Or maybe we just have old bugs that need attention (recommend: obvious)?
(In reply to comment #22)

the plan is for them to run with the other unit tests.

(In reply to comment #23)

Basically for historical reasons, which will go away when the make check process lands. The current (non-js) test suites have tests added to them on a per-branch/repo basis as bugs are fixed, so that all tests can be run with the expectation that all will pass.

The current js tests are not branch/repo-specific, and they also include pathological tests which are expected to fail in some (good) way.

I'm somewhat frustrated here. The tests are too hard, take too long to run, or whatever, and no one mentions their problems to me, so most people don't run the tests at all, leaving me to continually chase regressions with "we are releasing tomorrow" deadlines existing for months and no time to do anything but play janitor. I work 7 days a week cleaning up after you guys. Cut me a break.
Bob, I run the tests and they have saved me many times -- so many thanks. It's clear dmandelin and others run the tests. I remember early on, gal praised the tests effusively for their completeness.

This bug is motivated in part by a desire to get you out of the release-chasing business. That's a good motivation, but I question the proposed means, unless this is really just "better defaults and baselining for jsDriver.pl".

The utility of a lightweight testsuite is clear enough but it does not come for free. The down side includes the false sense of coverage that it might produce, the duplication of effort with respect to the full suite, and the opportunity cost apart from that duplication (even if we shared tests somehow, what could we do to the full suite with the effort spent on the lightweight tests?).

/be
I too wasn't saying anything bad about the tests or Bob's work on them here. The biggest problem seems to be that new JS hackers have to learn about them by accident and word of mouth. It's not really the fault of the tests' docs themselves; maybe a JS hacker's intro guide is needed. Otherwise I was just suggesting some tools that might make them easier to use; in fact I intended to give it a shot myself but never got around to it.
I first used the tests yesterday, when working on a change to the LIR in nanojit, so here's a newbie user's point of view:

- Having a good test suite is hugely helpful, especially when I'm making changes to code that I'm not yet familiar with.  So your efforts here are greatly appreciated, Bob.

- The tests aren't easy to find for a newbie.  I only learnt about them because someone (Andreas?) CC'd me on this bug.  Prior to that I'd asked about JS tests on IRC and only been told about trace-test.js.  "jsDriver.pl" isn't the most obvious name for a script that runs tests.

- I didn't realise there was a README file until I read #21, that helped.

- Having to exclude tests to make the suite usable is a big obstacle.  I first tried not excluding any tests, and gave up after 90 minutes.  Turns out one test had timed out after an hour.  A test that takes over an hour to run doesn't seem much use.  Using Brendan's suggestion from #15 was very useful, but again I couldn't have discovered that without being CC'd on this bug.

- I now agree with Brendan that having lightweight vs. heavyweight tests isn't much use, so long as the heavyweight tests don't take too long to run.  On my machine a debug run using Brendan's suggestion from #15 takes 4.5 minutes, which is fine.  I was also using trace-test.js as a quick pre-check.

- Having to establish a baseline of failures is very non-obvious; again, I wouldn't have worked that out without being CC'd on this bug.  A test that always (or even often) fails isn't useful.

- I still don't know what the O(n) and O(n^2) tests are.

IMO the ideal suite would have these properties

- Be very easy to run and have an obvious name.  Bob's "make check" process mentioned in #24 sounds like it would fix this.

- No expected failures.  The "make check" process sounds like it might also fix that.  If some tests sometimes fail due to timing issues, maybe the timing-sensitive ones should be run separately?  Not sure.

- No need for exclusions -- all tests should be run each invocation.  That might require killing some of the existing tests, but if people aren't running certain tests anyway that's no great loss.

- A fairly short run-time.  5 minutes is great, 10 is ok, much longer and you'll get people wanting to skip them.

I understand that getting all those properties may well be a lot of work! :)
Many thanks to you, Bob, for all your work!  I use js/tests consistently (last night's red notwithstanding, koff koff), and not just to catch regressions.  When I can't figure out how/whether a given condition could ever possibly arise in the interpreter, I add an assert and run the test suite --- it's a corpus of code to search for examples!  (I've not yet needed to run the whole browser suite to find what I'm looking for.)
(In reply to comment #28)
> When I can't figure out how/whether a given condition could ever possibly arise
> in the interpreter, I add an assert and run the test suite

I've done this too -- my memory has finite capacity beyond what is reserved for better things than oddball (e4x, e.g.) corner cases ;-).

/be
Attached is a script I have been using to run the jstests and compare them against baseline output. Required setup:

- have a baseline js shell build to test against in |js/src/debug-baseline|
  (can also use |opt| instead of |debug|)
- have a js shell you want to test in |js/src/debug|
  (again, can use |opt|)
- have a /tmp directory

Then, you can just run 'python3 test.py debug', and it will do all the rest: run the suite against the baseline (unless those results are cached), run the suite against the test version (unless those results are cached and newer than the js shell build), and then compare the results.

If everything's OK, the last line of output is 'PASSED'. 

If not, there are test suite mismatches or failures, and the output will show them. It also lists any newly passing tests after the word 'FIXES'.
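
For example, assuming the layout just described (all names come from this comment's setup notes):

  # layout assumed by test.py:
  #   js/src/debug-baseline/  - baseline shell build
  #   js/src/debug/           - shell under test
  python3 test.py debug   # runs both (cached where possible), diffs, prints PASSED on success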
Attachment #365315 - Flags: review?(sayrer)
See bug 505588 for progress in js tests.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
(In reply to comment #31)
> See bug 505588 for progress in js tests.

See also bug 509629.