Closed Bug 637400 Opened 11 years ago Closed 5 years ago

Check that the output of all JITFLAGS tests is the same

Categories: Core :: JavaScript Engine, defect
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla51
Tracking Status: firefox51 --- fixed
People: (Reporter: paul.biggar, Assigned: jj, Mentored)
References: (Blocks 1 open bug)
Whiteboard: [good first bug]
Attachments: (1 file, 6 obsolete files)

I'd say this will also need an annotation for when we expect things aren't identical, like things based on timers and callbacks.
Whiteboard: [good first bug]
This is a nice easy bug for someone who knows python and javascript. The attachments are almost complete, but need to be finished and shepherded through.

As background, the file js/src/jit-test/jit_test.py runs all of our tests with a set of flags like -mjp. Those flags are passed to the js executable and control which JITs run. No matter what combination is passed, the tests should output the same result (first patch) and should have the same global state (second patch).
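The idea described above can be sketched in a few lines. This is a hypothetical illustration, not the actual jit_test.py driver code; the runner function and the flag names here are made up:

```python
# Sketch: run the same test under several flag combinations and report any
# whose (stdout, stderr, returncode) differs from the first run.

def check_flag_combinations(run_test, flag_sets):
    """run_test(flags) -> (stdout, stderr, returncode) for one invocation.
    Returns the flag sets whose result differs from the first run's."""
    baseline = None
    mismatches = []
    for flags in flag_sets:
        result = run_test(flags)
        if baseline is None:
            baseline = result          # first run becomes the reference
        elif result != baseline:
            mismatches.append(flags)   # this configuration disagrees
    return mismatches

# Stand-in runner that pretends the "-j" configuration misbehaves.
def fake_runner(flags):
    if "-j" in flags:
        return ("43\n", "", 0)
    return ("42\n", "", 0)

print(check_flag_combinations(fake_runner, [["-m"], ["-j"], ["-p"]]))  # [['-j']]
```

A real harness would invoke the js shell via subprocess instead of a stand-in function, but the comparison step is the same.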

Besides needing to be finished, I think the second patch might be really slow.
Whiteboard: [good first bug] → [mentor=pbiggar]
Hello, I'm new to the Mozilla community. Can I try this one?
Hi Jeremy, welcome to the community!

Paul Biggar isn't with Mozilla anymore, and most likely won't be able to mentor you here. Nevertheless, you're more than welcome to work on this, and I have assigned the bug to you to reflect that.

@terrence: can you mentor this, or recommend someone who can?
Assignee: general → chwing
Status: NEW → ASSIGNED
Flags: needinfo?(terrence)
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: [mentor=pbiggar]
(In reply to Till Schneidereit [:till] from comment #5)
> @terrence: can you mentor this, or recommend someone who can?

Gladly!

(In reply to Jeremy Fouriaux [:Chwing] from comment #4)
> Hello, I'm new to the Mozilla community. Can I try this one?

Yes, absolutely!

The patches above have rotted quite a bit. The jit-tests driver code has moved to tests/lib/jittests.py for one thing. For another, we're working on making the jit-test platform conformant with the reftests platform, so the work should probably get done on that test suite now. Also, the second patch is a terrible idea in general: we shouldn't do that. The first is a great idea though and something we should definitely do, so I'd start there.

Now that I've looked over the first patch, I actually have a couple problems with how it is implemented. If you are looking at this patch for inspiration, here is what I would fix: First, any tests where the output is different should be filed as bugs and fixed rather than annotating the test. Secondly, there is no reason to create a global array for the results: the first result should just go on the TestCase. Thirdly, the diffing strategy only needs to consider the first and most recent results, not all of them, so there is no reason to store any but the first.
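The second and third fixes above can be sketched roughly as follows. This is a hypothetical TestCase shape for illustration only, not the actual jit-test driver code:

```python
class TestCase:
    """Sketch: keep only the first run's output per test, and diff each
    later run against it, as the review suggests."""

    def __init__(self, path):
        self.path = path
        self.first_output = None  # only the first result is stored

    def check(self, output):
        """Record the first run; return True iff a run matches the first."""
        if self.first_output is None:
            self.first_output = output
            return True
        return output == self.first_output

case = TestCase("basic/example.js")
print(case.check("42\n"))  # first run: recorded, True
print(case.check("42\n"))  # same output: True
print(case.check("43\n"))  # differs from the first run: False
```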

Please ask here or in #jsapi on IRC if you have any questions!
Flags: needinfo?(terrence)
Whiteboard: [good first bug][mentor=terrence@mozilla.com]
Terrence, Till: thanks for your advice. I'll get started on it and come back to you with questions.
Jeremy, could you confirm that you're still working on this?
Flags: needinfo?(chwing)
(In reply to Josh Matthews [:jdm] (on vacation until 5/7) from comment #8)
> Jeremy, could you confirm that you're still working on this?

Sorry, I was away on business for a long time. I'm getting back to it now and will hopefully submit a patch today.
Flags: needinfo?(chwing)
I have one question about it: how do you store test outputs? In separate files? Should we create an output file if it does not exist, or mark the test as failed in that case too?
(In reply to Jeremy Fouriaux [:Chwing] from comment #9)
> (In reply to Josh Matthews [:jdm] (on vacation until 5/7) from comment #8)
> > Jeremy, could you confirm that you're still working on this?
> 
> Sorry, I was away on business for a long time. I'm getting back to it now
> and will hopefully submit a patch today.

Hello, is it still relevant to fix this?
Flags: needinfo?(terrence)
(In reply to Jeremy Fouriaux [:Chwing] from comment #11)
> Hello, is it still relevant to fix this?

Yes.

(In reply to Jeremy Fouriaux [:Chwing] from comment #10)
> I have one question about it: how do you store test outputs? In separate
> files? Should we create an output file if it does not exist, or mark the
> test as failed in that case too?

The test output is read into a string and, I believe, stored as part of the TestResult class. There is not enough output to warrant any filesystem use; we should just compare the strings in memory.
Flags: needinfo?(terrence)
I refreshed the first patch and ran the test suite with no regressions, but I still have to write proper tests to verify that the patch works correctly...
Attachment #831503 - Flags: review?(terrence)
Sorry, I mixed things up with my Hg queues; this patch is not working. I'm fixing it now... (sorry again)

(In reply to Jeremy Fouriaux [:Chwing] from comment #14)
> Created attachment 831503 [details] [diff] [review]
> Refresh 1st patch: Check outputs of different jit invocations against each
> other.
> 
> I refreshed the first patch and ran the test suite with no regressions, but
> I still have to write proper tests to verify that the patch works correctly...
Attachment #831503 - Attachment is obsolete: true
Attachment #831503 - Flags: review?(terrence)
Attachment #831514 - Flags: review?(terrence)
Comment on attachment 831514 [details] [diff] [review]
Refresh 1st patch: Check outputs of different jit invocations against each other.

Review of attachment 831514 [details] [diff] [review]:
-----------------------------------------------------------------

I'm afraid this patch is going to take quite a bit more work to be ready for submission. To start with, it is riddled with whitespace issues: there is trailing whitespace all over the place, the blocks alternate oddly between 2 and 4 space indents, and the comments frequently do not line up with the code. Even then, this bug is quite old now: if we compare the list of options in parse_jitflags against the set of supported options in js/src/shell/js.cpp, we see that several are not supported anymore and those that are do nothing. I've actually got an r+'d patch in bug 927685 which removes most of this code.

That said, this bug still needs fixing, just at a larger scope than jitflags. In general, any test invocation of the same script should have the same output, regardless of flags. All this really needs is a post-processing step where you find all the tests that are of the same script and compare their output. It should actually be much simpler than what is here.
Attachment #831514 - Flags: review?(terrence)
Thanks Terrence, and sorry about the indentation; I'll be more careful about that.
I'll work on something more generic covering all flags. Shall I keep the same approach?
That is, does recording all run invocations that differ make sense, or is keeping a record of only the first diff enough?

(In reply to Terrence Cole [:terrence] from comment #17)
> Comment on attachment 831514 [details] [diff] [review]
> Refresh 1st patch: Check outputs of different jit invocations against each
> other.
> 
> Review of attachment 831514 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm afraid this patch is going to take quite a bit more work to be ready for
> submission. To start with, it is riddled with whitespace issues: there is
> trailing whitespace all over the place, the blocks alternate oddly between 2
> and 4 space indents, and the comments frequently do not line up with the
> code. Even then, this bug is quite old now: if we compare the list of
> options in parse_jitflags against the set of supported options in
> js/src/shell/js.cpp, we see that several are not supported anymore and those
> that are do nothing. I've actually got an r+'d patch in bug 927685 which
> removes most of this code.
> 
> That said, this bug still needs fixing, just at a larger scope than
> jitflags. In general, any test invocation of the same script should have the
> same output, regardless of flags. All this really needs is a post-processing
> step where you find all the tests that are of the same script and compare
> their output. It should actually be much simpler than what is here.
Flags: needinfo?(terrence)
Use whatever approach you think works best in practice.
Flags: needinfo?(terrence)
Sorry, I was away from this for a long time. Is it still relevant?
Flags: needinfo?(terrence)
Jeremy: yes, I believe your work on this JITFLAGS test is still relevant. If you have more questions, you can ask here or in the #jsapi channel on Mozilla's IRC server.
Mentor: terrence
Whiteboard: [good first bug][mentor=terrence@mozilla.com] → [good first bug]
Flags: needinfo?(terrence)
Hi,

Is this bug still alive? If no one is working on it, I would like to take it on.
(In reply to Jinank Jain from comment #22)
> Hi,
> 
> Is this bug still alive? If no one is working on it, I would like to take
> it on.

Yes, it's still quite relevant! I expect that having this feature probably would have saved us some time in the past by detecting regressions earlier.

The prior patches in this bug did not actually work, so please disregard them. I'll mark them obsolete to get them out of the way.
Attachment #525643 - Attachment is obsolete: true
Attachment #831514 - Attachment is obsolete: true
Attachment #525644 - Attachment is obsolete: true
(In reply to Terrence Cole [:terrence] from comment #23)
> (In reply to Jinank Jain from comment #22)
> > Hi,
> > 
> > Is this bug still alive? If no one is working on it, I would like to take
> > it on.
> 
> Yes, it's still quite relevant! I expect that having this feature probably
> would have saved us some time in the past by detecting regressions earlier.
> 
> The prior patches in this bug did not actually work, so please disregard
> them. I'll mark them obsolete to get them out of the way.

Could you please elaborate on this bug? Looking at the previous comments, I was not able to make out what is actually needed.
Flags: needinfo?(terrence)
Both jit-tests (js/src/jit-test/jit_tests.py) and jsreftests (js/src/test/jstests.py) get run with the --tbpl flag in automation. This flag causes us to run each test 5 times, passing different flags to the interpreter each time: these flags are defined at [1], the JITFLAGS of the title.

The flags we set here affect how SpiderMonkey does its work, but they should not affect the ultimate result of that work: the stdout, stderr, and return code that each test produces. In theory, these outputs will only ever differ if there is a bug in one of the tested configurations: for example, if one of the JITs does not handle an instruction in the same way as the interpreter. The goal of this bug is to verify that every run of the same test always produces the same results, regardless of the flags we passed. This way, if a change is introduced that affects the output of any one configuration differently from the others, it will get detected immediately in automation, before getting uplifted into a Firefox release.

Unfortunately, jit-tests and jsreftests process their output differently, so you will have to solve this bug independently in the two test suites. I'd focus on jsreftests first, as it has a much cleaner internal implementation. In jsreftests, we collect all outputs into the TestOutput structure at [2] and pass it to ResultSink.push at [3]. I would add a dict to ResultSink that stores each test's name (output.test.path) as the key and the expected output (just the "output" variable, really) as the value. Each time we push a new test, see if the path has already been run; if not, add the output to the map; if it has, check that the output is the same as the one already stored. The rest of the code in ResultSink.push should show you how to produce a new TEST-UNEXPECTED-FAIL if they differ.

You will probably run into a few cases where the output differs already. In those cases you will have to figure out if it is a real bug, or something the test case does explicitly, like returning early if --no-threads is passed. We can discuss what to do if and when it comes up.

Good luck! 

1- http://searchfox.org/mozilla-central/source/js/src/tests/lib/tests.py#16-22
2- http://searchfox.org/mozilla-central/source/js/src/tests/lib/results.py#11-30
3- http://searchfox.org/mozilla-central/source/js/src/tests/lib/results.py#117
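The approach described in the comment above could look roughly like this. This is a simplified sketch, not the real classes from js/src/tests/lib/results.py; the TestOutput fields and the failure reporting are stand-ins:

```python
class TestOutput:
    """Stand-in for the real TestOutput: just the fields we compare."""
    def __init__(self, path, out, err, rc):
        self.path = path   # stand-in for output.test.path
        self.out = out
        self.err = err
        self.rc = rc

    def key(self):
        return (self.out, self.err, self.rc)

class ResultSink:
    def __init__(self):
        self.output_dict = {}  # test path -> first observed TestOutput
        self.failures = []

    def push(self, output):
        seen = self.output_dict.get(output.path)
        if seen is None:
            self.output_dict[output.path] = output      # first run: record it
        elif seen.key() != output.key():
            # The real code would emit a TEST-UNEXPECTED-FAIL line here.
            self.failures.append(output.path)

sink = ResultSink()
sink.push(TestOutput("basic/test1.js", "ok\n", "", 0))
sink.push(TestOutput("basic/test1.js", "ok\n", "", 0))    # same: fine
sink.push(TestOutput("basic/test1.js", "boom\n", "", 1))  # differs: fail
print(sink.failures)  # ['basic/test1.js']
```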
Flags: needinfo?(terrence)
Attached patch try_v1.diff (obsolete) — Splinter Review
Assignee: chwing → jinank94
Attachment #8779599 - Flags: review?(terrence)
Comment on attachment 8779599 [details] [diff] [review]
try_v1.diff

Review of attachment 8779599 [details] [diff] [review]:
-----------------------------------------------------------------

This is a great first attempt, but there is quite a bit that needs to change before we can land this.

First off, it crashes immediately because there is not an output_differ property in NullTestOutput. Please do test your patches before submitting them for review.

::: js/src/tests/jstests.py
@@ +330,4 @@
>  
>  def main():
>      options, prefix, requested_paths, excluded_paths = parse_args()
> +    output_dict = {}

Please move this to ResultSink.__init__.

@@ +369,5 @@
>          try:
>              for out in run_all_tests(test_gen, prefix, results.pb, options):
> +                if out.test.path in output_dict.keys():
> +                    if output_dict[out.test.path] != out:
> +                        out.output_differ = True

Instead of tagging the test output, let's just move this entire block into ResultSink.push and record the error directly. We already do all of our other output processing in that method, so it doesn't make sense to do this one check just outside that method.

::: js/src/tests/lib/results.py
@@ +18,4 @@
>          self.rc = rc       # int:   return code
>          self.dt = dt       # float: run time
>          self.timed_out = timed_out # bool: did the test time out
> +        self.output_differ = False # bool: Check that output differs with different flags for same test

Once we're checking the output status in ResultSink.push, we no longer need to tag the test, so we can remove this line.

@@ +118,5 @@
>      def push(self, output):
>          if output.timed_out:
>              self.counts['TIMEOUT'] += 1
> +        if output.output_differ:
> +            self.counts['FAIL'] += 1

It's not enough to just add to the failure count. This will fail the test suite, but it won't tell the user what failed. You'll want to do the output checking in the else block of |if isinstance(output, NullTestOutput)| and adjust the error reporting code to take that check into account, printing something useful in the output, such as the flags that differ and the differing output.
Attachment #8779599 - Flags: review?(terrence)
What are the tests that need to run before submitting the patch?

For printing the error I am using this:

self.print_automation_result("TEST-UNEXPECTED-FAIL", result.test, time=output.dt,
                             message="Same test with different flag producing different output")
Flags: needinfo?(terrence)
(In reply to Jinank Jain from comment #28)
> What are the tests that need to run before submitting the patch?

The code you're writing is *for* a test suite. If you run js/src/test/jstests.py with no args, it will do nothing and ask for you to tell it where the shell you want to test is. If you pass it a shell so that it can actually test something, it will immediately crash. This implies to me that you didn't even verify that the code you are editing runs, much less does the right thing.

> For printing the error I am using this 
> self.print_automation_result("TEST-UNEXPECTED-FAIL", result.test,
> time=output.dt,
>                             message="Same test with different flag producing
> different output")

You tell me. When that code runs does it look like other adjacent output? Does it provide you with all the info you would need to investigate the problem?
Flags: needinfo?(terrence)
Attached patch try_v2.diff (obsolete) — Splinter Review
I ran the tests and they were working fine this time.
Attachment #8779852 - Flags: review?(terrence)
Comment on attachment 8779852 [details] [diff] [review]
try_v2.diff

Review of attachment 8779852 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8779852 - Flags: review?(terrence) → review+
Attachment #8779599 - Attachment is obsolete: true
Finally after 5+ years this bug is gonna close. :)
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/83860b1448ec
Check that the output of all JITFLAGS tests is the same. r=terrence
Keywords: checkin-needed
Does this patch fix things for jit-tests (js/src/jit-test/jit_tests.py) too? We need a similar implementation for that as well.
Flags: needinfo?(terrence)
(In reply to Jinank Jain from comment #34)
> Does this patch fix things for jit-tests (js/src/jit-test/jit_tests.py)
> too? We need a similar implementation for that as well.

No, jit-tests has separate result processing code. It would be nice to have the same thing happen for that test suite as well. It's generally best to have one patch per bug though, so I've opened bug 1294466 for that work.
Flags: needinfo?(terrence)
(In reply to Wes Kocher (:KWierso) from comment #35)
> I had to back this out for frequent (but not constant) rootanalysis failures
> like
> https://treeherder.mozilla.org/logviewer.html#?job_id=33709134&repo=mozilla-
> inbound
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/80af899d3462

It is causing a lot of test failures. Is there anything that can be done to fix that?
Flags: needinfo?(wkocher)
Flags: needinfo?(terrence)
Flags: needinfo?(jinank94)
(In reply to Jinank Jain from comment #37)
> (In reply to Wes Kocher (:KWierso) from comment #35)
> > I had to back this out for frequent (but not constant) rootanalysis failures
> > like
> > https://treeherder.mozilla.org/logviewer.html#?job_id=33709134&repo=mozilla-
> > inbound
> > 
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/80af899d3462
> 
> It is causing a lot of test failures. Anything that could be done to fix
> that?

Well, the good news is that your work has discovered a flaw in SpiderMonkey! The bad news is that work needs to happen to fix SpiderMonkey before we can enable this patch by default in automation.

As a first step, you should alter the patch such that it is only enabled when a flag is passed to the suite. I'd suggest --check-output/-C. This will let us check the patch in so that others can use the work to improve spidermonkey.
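The suggested gating could be wired up along these lines. This is a minimal argparse sketch; the real option handling in jstests.py differs, and the --check-output/-C naming simply follows the proposal in the comment above:

```python
import argparse

parser = argparse.ArgumentParser()
# Off by default, so automation is unaffected until SpiderMonkey is fixed.
parser.add_argument("-C", "--check-output", action="store_true",
                    help="compare outputs of repeated runs of the same test")

args = parser.parse_args(["--check-output"])
print(args.check_output)  # True

args = parser.parse_args([])
print(args.check_output)  # False
```

The output-comparison code would then run only when args.check_output is set, letting individual developers opt in while automation stays green.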
Flags: needinfo?(terrence)
Attached patch try_v3.diffSplinter Review
I guess this will fix things :)
Attachment #8780231 - Flags: review?(terrence)
Comment on attachment 8780231 [details] [diff] [review]
try_v3.diff

Review of attachment 8780231 [details] [diff] [review]:
-----------------------------------------------------------------

Yup, looks good.
Attachment #8780231 - Flags: review?(terrence) → review+
Attachment #8779852 - Attachment is obsolete: true
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdafadc2edc6
Check that the output of all JITFLAGS tests is the same. r=terrence
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/cdafadc2edc6
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51