Closed Bug 899697 Opened 11 years ago Closed 11 years ago

Intermittent TEST-UNEXPECTED-FAIL | jit-test\tests\auto-regress\bug704136.js | --ion-eager: jit-test\tests\auto-regress\bug704136.js:8:0 ReferenceError: jsTestDriverEnd is not defined

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox25 --- fixed
firefox26 --- fixed
firefox27 --- fixed
firefox-esr24 --- fixed

People

(Reporter: RyanVM, Assigned: jandem)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=25915625&tree=Fx-Team

WINNT 5.2 fx-team build on 2013-07-30 07:56:15 PDT for push 051d88f15bc7
slave: w64-ix-slave130

TEST-PASS | e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js | 
FAIL - e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js
TEST-UNEXPECTED-FAIL | e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js | --ion-eager: e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js:8:0 ReferenceError: jsTestDriverEnd is not defined

TEST-PASS | e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js | --baseline-eager
TEST-PASS | e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js | --baseline-eager --no-ti --no-fpu
TEST-PASS | e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js | --no-baseline --no-ion
TEST-PASS | e:\builds\moz2_slave\fx-team-w32-000000000000000000\build\js\src\jit-test\tests\auto-regress\bug704136.js | --no-baseline --no-ion --no-ti
We've had a lot of intermittent jit-test failures like this one recently: a test that's expected to throw, say, a ReferenceError throws exactly the error we expect, and yet the test is still marked as a failure.

I also had one on Try: https://tbpl.mozilla.org/?tree=Try&rev=dd3bd6f21e16

My Try build, bug 899611 and this bug are all Windows-only. I wonder if there's a threading- or timing-related bug in the jit-test harness or something.
Bug 880086 is another one. The test looks like this:

// |jit-test| error:Error

function jsTestDriverEnd() {}
this.__defineSetter__("x", function () {});
x %= 5;
jsTestDriverEnd();
mjitChunkLimit();

The interpreter throws an error, as expected:

TEST-UNEXPECTED-FAIL | e:\builds\moz2_slave\m-in-w32-000000000000000000000\build\js\src\jit-test\tests\auto-regress\bug726636.js | --no-baseline --no-ion --no-ti: e:\builds\moz2_slave\m-in-w32-000000000000000000000\build\js\src\jit-test\tests\auto-regress\bug726636.js:10:0 ReferenceError: mjitChunkLimit is not defined

And yet it's marked as a failure... This one is also on Windows 7.

It's possible the shell returns an exit code other than 3 for some reason, but I've no idea why. I will try to reproduce this on Windows now.
(In reply to Jan de Mooij [:jandem] from comment #2)
> It's possible the shell returns an exit code other than 3 for some reason,
> but I've no idea why. I will try to reproduce this on Windows now.

I'm able to reproduce this on Windows after running the same test thousands of times. Will add some logging and see if that tells us anything...
OK, so the exit code is 0 instead of 3, even though we do print the uncaught exception... If I change the shell to always return a non-zero value from main(), the jit-test harness still thinks it's 0...

So either the jit-test harness is wrong, or it's a threading issue somehow.
Unfortunately, I won't be near my Windows PC for a few days, so I can't debug this until Monday :( Anybody should feel free to investigate further...

If you run the bug726636.js jit-test in a loop, with an opt32 thread-safe Windows shell build, it should fail in < 15 minutes. I will try this on OS X as well, but it looks like this is Windows only.

The shell seems to exit with code 0 every X thousand runs, no matter what main() returns. I looked for exit(..) calls but couldn't find anything interesting..
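
The loop described above could be scripted roughly as follows. This is a minimal sketch, not part of the jit-test harness: `count_exit_codes` is a hypothetical helper that runs one command repeatedly and tallies the exit codes it sees, so a stray 0 among thousands of 3s becomes visible.

```python
import collections
import subprocess


def count_exit_codes(cmd, runs):
    """Run `cmd` (a list of arguments) `runs` times and tally the exit codes.

    Hypothetical helper for illustration; output is discarded because only
    the exit code matters here.
    """
    codes = collections.Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes[proc.returncode] += 1
    return codes
```

For example, `count_exit_codes([shell_path, test_path], 10000)` would be expected to report only code 3 for a test that throws an uncaught error. Here `shell_path` and `test_path` are placeholders for a local shell build and the bug726636.js test, not names taken from this bug.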
Bug 892697 is another one, also Windows. Test expects an error, shell throws that error, but somehow exits with a code other than 3.

I will get to the bottom of this in a few days, when I'm back.
Assignee: general → jdemooij
Status: NEW → ASSIGNED
Bug 776043 exposed this bug. Before that bug, we'd only check if stderr contained the expected error and we ignored the return code. Now we also make sure the return code is 3.

And indeed, a few days later the first of these bugs was filed. Here's a list of bugs that are all caused by this bug. The list may be incomplete:

Bug 874858, bug 876213, bug 878611, bug 880086, bug 881403, bug 881604, bug 883224, bug 883327, bug 884064, bug 884183, bug 884451, bug 885142, bug 885146, bug 886171, bug 887003, bug 887559, bug 888868, bug 891063, bug 892697, bug 892975, bug 894032, bug 894436, bug 894613, bug 897298, bug 898084, bug 899611, bug 899697, bug 902047, bug 902052

All of these are Windows 7. I'm still investigating what's causing this on Windows.
Even if main() always returns 3, it will fail every X thousand runs. I'm still narrowing it down but I think it's caused by something we do when destroying the JSRuntime.

It's possible for a process to return 0 although the "real" exit code was something else:

http://blogs.msdn.com/b/oldnewthing/archive/2008/05/06/8461730.aspx

My best guess is that it's an NSPR locking/threading thing triggering that somehow.
These intermittent Windows-only jit-test failures keep coming in (philor just filed at least 7 bugs, Ryan has also filed lots of them over the past few weeks and see also the list in comment 7).

I think we should revert bug 776043 for Windows (don't check the return code there). Terrence, are you ok with that?
Flags: needinfo?(terrence)
Or a bit better: on Windows always allow code 0, so that we will still fail if we see another error code.
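
A rough sketch of that check, for illustration only. It is not the attached patch (which is not shown here), and the function name and parameters are assumptions. The idea: the expected error must appear in stderr, and the exit code must be 3, except on Windows, where 0 is also tolerated.

```python
import sys

# Exit code the shell is expected to return for an uncaught error,
# per the discussion in this bug.
UNCAUGHT_ERROR_EXIT_CODE = 3


def expected_error_ok(expected_error, stderr, returncode, platform=sys.platform):
    """Decide whether an error-expecting test passed (hypothetical helper).

    `platform` defaults to sys.platform so the Windows special case can be
    exercised on any machine by passing "win32" explicitly.
    """
    # The expected error name must show up in the shell's stderr output.
    if expected_error not in stderr:
        return False

    allowed_codes = {UNCAUGHT_ERROR_EXIT_CODE}
    # Workaround: on Windows the shell intermittently exits with 0 even
    # though it reported the uncaught error, so accept 0 there as well.
    if platform.startswith("win"):
        allowed_codes.add(0)

    # Any other exit code still counts as a failure.
    return returncode in allowed_codes
```

With this shape, an unexpected exit code other than 0 still fails the test on every platform, which is the advantage over skipping the return-code check on Windows entirely.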
(In reply to Jan de Mooij [:jandem] from comment #10)
> Or a bit better: on Windows always allow code 0, so that we will still fail
> if we see another error code.

Yes, that sounds like an excellent workaround. I'd guess Python's multiprocessing just does not work as well on Windows.

Christian, did you see any notes in the multiprocessing docs that might cause the above when you were parallelizing the jit-tests?
Flags: needinfo?(terrence) → needinfo?(choller)
Attached patch jittest-fix
Thanks, Jan. Let's just patch this now.
Attachment #806129 - Flags: review?(terrence)
Comment on attachment 806129 [details] [diff] [review]
jittest-fix

Review of attachment 806129 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for beating me to this! The logic looks correct: r=me.
Attachment #806129 - Flags: review?(terrence) → review+
https://hg.mozilla.org/mozilla-central/rev/105a4186e79d
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
I assume my info isn't required here anymore :)
Flags: needinfo?(choller)
Depends on: 992276
Depends on: 992279
Depends on: 992282
Depends on: 992283
Bah, bad awesomebar.
No longer depends on: 992276, 992279, 992282, 992283