allow "fails", "random" to be specified for "load" reftests tests

RESOLVED FIXED

Status

enhancement
RESOLVED FIXED
10 years ago
10 years ago

People

(Reporter: bc, Assigned: bc)

Tracking

(Blocks 1 bug)

Trunk
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 attachments, 4 obsolete attachments)

Assignee

Description

10 years ago
Posted patch proposed patch (obsolete) — Splinter Review
I wish to use the reftest framework for running the JavaScript Tests in the browser as part of the effort to get the JavaScript Tests running on tinderbox. See bug 469718.

The ability of the reftests framework to selectively run tests based upon various criteria, such as platform, processor architecture or build type, is important for running the JavaScript tests which may fail for debug but not opt or may fail on Mac OS X intel but not on other platforms.

The minimal requirement to run the JavaScript Tests using reftest is to be able to specify "fails" or "random" for "load" tests. Currently, the reftest manifests preclude using "fails" or "random" with a "load" only test. 

A secondary requirement is the ability to report on the individual test case results from each test file. The simplest approach I can think of involves the reftest extension retrieving the pass/fail information from the test's content and using its knowledge of the manifest entry's conditional value to report expected vs. unexpected failures. This could involve just checking a single global variable or could involve retrieving all of the test results from the test's content.

The attached patch implements one approach to solve this. It assumes that the test exposes a function |getTestCases()| which returns an array of test results and that each item in the array exposes two functions: |testPassed()| and |testDescription()|.

By intention, the patch reuses and duplicates the |outputs| object for writing the reftest results in order to minimize changes to reftest.js. This could be changed to specify |outputs| only once, if the approach is approved.

The patch has been tested by running the reftests and crashtests in opt and debug as well as the JavaScript Tests.
Flags: in-testsuite-
Attachment #384107 - Flags: review?(dbaron)
One thing I'm not sure about here is whether it's best to reuse 'load' or add a new keyword that triggers this behavior of loading a test and then poking into it for scripted test results.  I think I probably prefer a new keyword, but I'd be interested in Waldo's thoughts as well.  (I'm not quite sure what to call it, though... maybe 'script'?)

(In reply to comment #0)
> By intention, the patch reuses and duplicates the |outputs| object for writing
> the reftest results in order to minimize changes to reftest.js. This could be
> changed to specify |outputs| only once, if the approach is approved.

Please change it so outputs is specified only once.  (It's not only approved, but required.)
Assignee

Comment 2

10 years ago
Waldo, did you have any opinion about the new keyword?

David, would a keyword of "test" be ok? That seems more descriptive and on target than "script".
Sorry, slipped below the top of bugmail...

I agree a new keyword should be used; this is quite different from the traditional meaning of "load".

I *don't* think "test" is a good name, that's entirely too overloaded and/or unmeaningful in this context.  "script" is a plausible name; I don't really have any suggestions to make.
Attachment #384107 - Flags: review?(dbaron) → review-
Comment on attachment 384107 [details] [diff] [review]
proposed patch

Could you request review again with the comments above incorporated (merging outputs, using new "script" keyword)?
Assignee

Comment 5

10 years ago
Posted patch patch v2 (obsolete) — Splinter Review
Attachment #384107 - Attachment is obsolete: true
Attachment #388319 - Flags: review?(dbaron)
Assignee

Comment 6

10 years ago
Comment on attachment 388319 [details] [diff] [review]
patch v2

sorry, this fails in the reftest sanity tests.
Attachment #388319 - Flags: review?(dbaron)
Assignee

Comment 7

10 years ago
Posted patch patch v3 (obsolete) — Splinter Review
this runs reftests, crashtests and the jstests afaict.
Attachment #388319 - Attachment is obsolete: true
Attachment #388493 - Flags: review?(dbaron)
Assignee

Comment 8

10 years ago
dbaron: review ping?
Attachment #388493 - Flags: review?(dbaron) → review-
Comment on attachment 388493 [details] [diff] [review]
patch v3

Adding the value EXPECTED_SCRIPT really makes things awfully confusing.
EXPECTED_LOAD was a bit of a hack, but it sort of worked; with this it no
longer makes sense and requires a bunch of extra complication.

Instead, you should:
 * remove EXPECTED_LOAD and EXPECTED_SCRIPT
 * replace the "equal" field with a new "type" field to the objects
   (probably right before "expected") taking values TYPE_REFTEST_EQUAL,
   TYPE_REFTEST_NOTEQUAL, TYPE_LOAD, and TYPE_SCRIPT

>+             getTestCases() returns an array of test result objects
>+             representing the results of the tests performed by the page.
>+
>+             Each test result object has two methods:
>+
>+             testPassed() returns true if the test result object passed,
>+             otherwise it returns false.
>+
>+             testDescription() returns a string describing the test
>+             result.

Why are these methods rather than properties?  Is the idea that the
tests are actually executed when the harness calls the methods?  If so,
it's probably better that the tests not be executed twice (though I have
some suggested changes below that would fix that).

>@@ -843,2 +881,91 @@ function DocumentLoaded()
>         dump("REFTEST TEST-PASS | " + gURLs[0].prettyPath + " | (LOAD ONLY)\n");
>+    }
>+    else if (gURLs[0].expected == EXPECTED_SCRIPT ||
>+        ((gURLs[0].expected == EXPECTED_FAIL ||
>+          gURLs[0].expected == EXPECTED_RANDOM) && !gURLs[0].url2)) {


You're chopping off the last 2 lines of the EXPECTED_LOAD case
(FinishTestItem and return) so that they don't return early anymore, but
they need to.

>+        if (!testwindow.getTestCases)
>+        {
>+            // "script" tests must provide a getTestCases() object.
>+            // Force an unexpected failure.
>+            testpassed = false;
>+            ++gTestResults[outputs[EXPECTED_PASS][testpassed].n];
>+            result = "REFTEST " + outputs[expected][test_passed].s + " | " +
>+                gURLs[0].prettyPath + " | " + // the URL being tested
>+                "missing test results (SCRIPT).\n";
>+
>+            dump(result);
>+        }
>+        else
>+        {
>+            var testcases = testwindow.getTestCases();
>+
>+            if (testcases.length == 0)
>+            {
>+                // Treat empty testcases array as a failure.
>+                test_passed = false;
>+
>+                ++gTestResults[outputs[expected][test_passed].n];
>+
>+                result = "REFTEST " + outputs[expected][test_passed].s + " | " +
>+                    gURLs[0].prettyPath + " | " + // the URL being tested
>+                    "No test results reported.\n";
>+
>+                dump(result);

I'm not entirely sure the subtle distinction here is worth it, but it's
ok with me if you think it is.  If you wanted, you could combine these
two cases by doing:

var testcases;
if (!testwindow.getTestcases ||
    !(testcases = testwindow.getTestCases())) {

    // code you previously had twice goes here

}

In both, rather than setting test_passed = false (you typo it as
testpassed twice, though, leading to one reference to it when it's been
uninitialized), you should just do:
  var output = outputs[EXPECTED_PASS][false];
or
  var output = outputs[expected][false];
and then refer to output.s and output.n.  (I probably should have done
the same at the existing site that uses outputs.  Feel free to fix that
if you want.)

>+                for (var itest = 0; itest < testcases.length; itest++)
>+                {
>+                    test_passed = test_passed && testcases[itest].testPassed();
>+                }
>+                if (testcases.length > 1 || !test_passed)
>+                {
>+                    outputs[EXPECTED_FAIL][true].s = "TEST-PASS";
>+                }

While |outputs| isn't const, I'd really like to continue to behave as
though it is.

Also, (in the same code), I'd really like the numbers in gTestResults to
continue to match the number of output lines we print with test results.

Also, I think the tests that pass in a test that marked failing should
probably be treated as though they were EXPECTED_RANDOM rather than
EXPECTED_FAIL.

So to fix all three of these, you could do something like:

var results = testcases.map(function(test) { return test.testPassed(); });
var anyFailed = results.some(function(result) { return !result; });
var outputPair;
if (anyFailed && expected == EXPECTED_FAIL) {
    // If we're marked as expected to fail, and some (but not all) tests
    // passed, treat those tests as though they were marked random
    // (since we can't tell whether they were really intended to be
    // marked failing or not).
    outputPair = { true: outputs[EXPECTED_RANDOM][true],
                   false: outputs[expected][false] };
} else {
    outputPair = outputs[expected];
}
var index = 0;
results.forEach(function(result) {
    var output = outputPair[result];

    ++gTestResults[output.n];
    result = "REFTEST " + output.s + " | " +
             gURLs[0].prettyPath + " | " + // the URL being tested
             testcase.testDescription() + " item " + (++index) + "\n";
    dump(result);
});



With those issues fixed, I think this looks good, though I'd like to
look at the new patch.  Sorry for the delay in getting to this.
(I should be able to get to a revised version a good bit faster; I've
cleared out much of my review queue backlog in the past few weeks.)
(In reply to comment #8)
> dbaron: review ping?

Oh, and I just saw this review ping about a minute ago. :-)
Assignee

Comment 11

10 years ago
Posted patch patch v4Splinter Review
(In reply to comment #9)
> Instead, you should:
>  * remove EXPECTED_LOAD and EXPECTED_SCRIPT
>  * replace the "equal" field with a new "type" field to the objects
>    (probably right before "expected") taking values TYPE_REFTEST_EQUAL,
>    TYPE_REFTEST_NOTEQUAL, TYPE_LOAD, and TYPE_SCRIPT
> 

done. 

> 
> Why are these methods rather than properties?  Is the idea that the
> tests are actually executed when the harness calls the methods?  If so,
> it's probably better that the tests not be executed twice (though I have
> some suggested changes below that would fix that).
> 

The tests are expected to run at load and to use class="reftest-wait" to inform reftest that they are complete.

I used methods to hopefully make it simpler for others to incorporate test suites using reftest by simply providing the required methods which could wrap and return the test-specific representations in a standard format.

> >@@ -843,2 +881,91 @@ function DocumentLoaded()
> >         dump("REFTEST TEST-PASS | " + gURLs[0].prettyPath + " | (LOAD ONLY)\n");
> >+    }
> >+    else if (gURLs[0].expected == EXPECTED_SCRIPT ||
> >+        ((gURLs[0].expected == EXPECTED_FAIL ||
> >+          gURLs[0].expected == EXPECTED_RANDOM) && !gURLs[0].url2)) {
> 
> 
> You're chopping off the last 2 lines of the EXPECTED_LOAD case
> (FinishTestItem and return) so that they don't return early anymore, but
> they need to.
> 

I wondered how that missed my testing. I actually had that fixed but it was mixed up in the next patch in my queue. Sorry.

> 
> I'm not entirely sure the subtle distinction here is worth it, but it's
> ok with me if you think it is.  If you wanted, you could combine these
> two cases by doing:
> 

The idea was to flag the case where the test writer failed to provide the required function in the one case and to flag where the return list of tests failed to contain any results. In the JS Engine at least, regressions which cause the list to be empty due to silent script termination do occur. I cleaned it up a bit though.

> 
> In both, rather than setting test_passed = false (you typo it as
> testpassed twice, though, leading to one reference to it when it's been

doh.

> So to fix all three of these, you could do something like:

done.
Attachment #388493 - Attachment is obsolete: true
Attachment #393537 - Flags: review?(dbaron)
Comment on attachment 393537 [details] [diff] [review]
patch v4

+        var missing_msg;

Maybe initialize to false?  (Would that avoid a strict warning?)

+        else if (!(testcases = testwindow.getTestCases())) {

declare "var testcases" above the if-else chain?

+            output = outputs[expected][test_passed];

Should probably be "var output ="


r=dbaron with that
Attachment #393537 - Flags: review?(dbaron) → review+
Assignee

Comment 13

10 years ago
(In reply to comment #12)
> (From update of attachment 393537 [details] [diff] [review])
> +        var missing_msg;
> 
> Maybe initialize to false?  (Would that avoid a strict warning?)

No, I don't think there is a strict warning for uninitialized variable declarations. But no harm, no foul. Ok.

> 
> +        else if (!(testcases = testwindow.getTestCases())) {
> 
> declare "var testcases" above the if-else chain?

Darn. Thanks!

> 
> +            output = outputs[expected][test_passed];
> 
> Should probably be "var output ="
> 

I already did that following the definition of |outputs|. Is that ok?
Assignee

Comment 14

10 years ago
Posted patch patch v5 (obsolete) — Splinter Review
incorporates changes from comment 12 but leaves var output below the var outputs.
Assignee

Comment 15

10 years ago
Posted patch patch v6Splinter Review
I missed adding the new type to print-manifest-dirs.py in the previous patches.

David, is it ok to carry the r+ forward to this patch?
Attachment #393943 - Attachment is obsolete: true
Attachment #394309 - Attachment is patch: true
Attachment #394309 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 394309 [details] [diff] [review]
patch v6

Well, since I just looked at the differences, I may as well carry it forward myself. :-)

Though I'm not sure what this line is:
+  topsrcdir = os.path.normpath(topsrcdir)

Is it for something different?
Attachment #394309 - Flags: review+
Assignee

Comment 17

10 years ago
Oh, sorry. I didn't mean to try to slip that by you.

I'm trying to shoe horn the JavaScripts tests into build's Makefiles and have a patch in progress for bug 468913 that does

--- a/js/src/Makefile.in
+++ b/js/src/Makefile.in
@@ -61,2 +61,6 @@ endif
 
+ifdef ENABLE_TESTS
+DIRS += ../tests
+endif
+

While working on the packaging I at first was going to use the normal approach of using print-manifest-dirs.py pointing to the top level manifest file and found that it threw on the line     if d[0] == '/': with an invalid index error due to the /../ in the path.

I've since decided not to use print-manifest-dirs.py in that way since it ends up copying the entire js/tests directory. I can drop the normpath call if you want.
Probably better to put that in bug 468913 if you need it there, I think.
Assignee

Comment 19

10 years ago
http://hg.mozilla.org/mozilla-central/rev/c01c8bea9ead
http://hg.mozilla.org/tracemonkey/rev/d18db5828c6e
Assignee: nobody → bclary
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
Assignee

Updated

10 years ago
Blocks: 511611
Assignee

Comment 20

10 years ago
This was landed on 1.9.2 on Oct 1, 2009
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/e093beeee710
You need to log in before you can comment on or make changes to this bug.