Resolve timeouts in jit-test parallel suite

RESOLVED FIXED in mozilla31

Status

defect
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: dminor, Unassigned)

Tracking

Trunk
mozilla31
ARM
Android

Firefox Tracking Flags

(Not tracked)

Details

In the process of working on bug 858622, I've run into a couple of tests in the parallel jit-test suite that fail intermittently when running on my pandaboard:

/home/dminor/mozilla-central/js/src/jit-test/tests/parallel/ic-getelement.js
/home/dminor/mozilla-central/js/src/jit-test/tests/parallel/ic-getproperty.js

These need to be resolved or marked as intermittent/skippable before the tests can be scheduled on tbpl.
Assignee: nobody → general
Component: General → JavaScript Engine
Product: Testing → Core
Blocks: 912997
The parallel suite has recently begun failing (possibly intermittently) on Mac OS X 10.6 and Windows XP:

13:27:33     INFO -  TIMEOUT - parallel\timeout-gc.js
13:27:33  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout-gc.js |
13:27:33     INFO -  INFO exit-status     : -1
13:27:33     INFO -  INFO timed-out       : True
13:27:33     INFO -  INFO stdout          >
13:27:33     INFO -  INFO stderr         2>
13:27:33     INFO -  TIMEOUT - parallel\timeout-gc.js
13:27:33  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout-gc.js | --ion-eager --ion-parallel-compile=off
13:27:33     INFO -  INFO exit-status     : -1
13:27:33     INFO -  INFO timed-out       : True
13:27:33     INFO -  INFO stdout          >
13:27:33     INFO -  INFO stderr         2>
13:27:33     INFO -  TIMEOUT - parallel\timeout-gc.js
13:27:35  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout-gc.js | --ion-eager --ion-parallel-compile=off --ion-check-range-analysis --no-sse3
13:27:35     INFO -  INFO exit-status     : -1
13:27:35     INFO -  INFO timed-out       : True
13:27:35     INFO -  INFO stdout          >
13:27:35     INFO -  INFO stderr         2>
13:27:35     INFO -  TIMEOUT - parallel\timeout-gc.js
13:27:35  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout-gc.js | --baseline-eager
13:27:35     INFO -  INFO exit-status     : -1
13:27:35     INFO -  INFO timed-out       : True
13:27:35     INFO -  INFO stdout          >
13:27:35     INFO -  INFO stderr         2>
13:27:35     INFO -  TIMEOUT - parallel\timeout.js
13:27:35  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout.js |
13:27:35     INFO -  INFO exit-status     : -1
13:27:35     INFO -  INFO timed-out       : True
13:27:35     INFO -  INFO stdout          >
13:27:35     INFO -  INFO stderr         2>
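For anyone triaging logs like the one above, here is a minimal sketch of grouping the failing flag variants by test name. It assumes only the `TEST-UNEXPECTED-FAIL | <path> | <flags>` line shape visible in this log; `summarize_failures` and `FAIL_RE` are hypothetical names for illustration, not part of the jit-test harness.

```python
import re

# Matches harness lines of the form:
#   "... TEST-UNEXPECTED-FAIL | <test path> | <flags>"
# The line shape is taken from the log in this bug; the regex itself is an assumption.
FAIL_RE = re.compile(r"TEST-UNEXPECTED-FAIL \| (?P<path>\S+) \|(?P<flags>.*)$")


def summarize_failures(log_text):
    """Group shell flag variants by failing test file name."""
    failures = {}
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match is None:
            continue
        # Log paths use Windows separators here; keep only the file name.
        name = match.group("path").replace("\\", "/").rsplit("/", 1)[-1]
        flags = match.group("flags").strip() or "(default flags)"
        failures.setdefault(name, []).append(flags)
    return failures


sample = "\n".join([
    r"13:27:33  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout-gc.js |",
    r"13:27:35  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout-gc.js | --baseline-eager",
    r"13:27:35  WARNING -  TEST-UNEXPECTED-FAIL | tests\jit-test\jit-test\tests\parallel\timeout.js |",
])

for name, variants in summarize_failures(sample).items():
    print(name, variants)
```

Run against the full log, this should make it easy to see that only `timeout.js` and `timeout-gc.js` are involved, across several flag combinations.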
Blocks: 973900
Summary: Resolve intermittent failures in jit-test parallel suite on Panda → Resolve failures in jit-test parallel suite
Depends on: 977711
Giving these more time (20 minutes) does not help.
These pass on the Linux slaves, which are single core, so I tried running them sequentially (--worker-count=1), but it did not help.
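To illustrate why a longer limit makes no difference when a test genuinely hangs rather than runs slowly, here is a rough sketch using a stand-in child process. This is not the jit-test harness code; `run_with_timeout` is a hypothetical helper, and the `-1` status simply mirrors the `exit-status : -1` / `timed-out : True` lines in the log.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv; return (exit_status, timed_out), using -1 for a timeout."""
    try:
        completed = subprocess.run(
            argv,
            timeout=timeout_seconds,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return -1, True
    return completed.returncode, False


# Stand-in for a hung test: a child process that never exits on its own.
hung_test = [sys.executable, "-c", "while True: pass"]

# Raising the limit only delays the same outcome for a true hang.
for limit in (0.5, 1.0):
    print(limit, run_with_timeout(hung_test, limit))
```

A test that was merely slow would eventually pass once the limit was large enough; a hang reports a timeout at every limit, which matches what I am seeing.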
Summary: Resolve failures in jit-test parallel suite → Resolve timeouts in jit-test parallel suite
I don't see anything obvious left to try with these test cases.

Any more suggestions, or are we down to trying to bisect this? I'm almost certain these were ok back in January when I was going through the first batch of test machine specific failures.
Flags: needinfo?(terrence)
Forwarding need-info to Niko.
Flags: needinfo?(terrence)
Err, there.
Flags: needinfo?(nmatsakis)
I'm not sure what's going on here. There seem to be two distinct problems collected in this bug:

1. `ic-setelement` and `ic-getelement`

2. `timeout` and `timeout-gc`

For the second one, I also started seeing problems locally (but apparently not on tbpl?). This confuses me. Those tests test an infinite loop that is supposed to be interrupted by the shell equivalent of the slow script dialog. It seems like there is a problem! I know that Shu was looking at that at one point, so I will flag him with needinfo as well.
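As a toy model of what these tests rely on (not SpiderMonkey's actual interrupt mechanism, and not a diagnosis of this bug), a watchdog timer requests an interrupt and the looping code has to notice the request. All names below are hypothetical, and the `give_up_seconds` deadline exists only so the sketch terminates.

```python
import threading
import time


def run_loop(polls_interrupt, watchdog_seconds, give_up_seconds):
    """Return True if the watchdog stopped the loop, False if we gave up."""
    interrupt_requested = threading.Event()
    # The watchdog plays the role of the slow-script timer.
    watchdog = threading.Timer(watchdog_seconds, interrupt_requested.set)
    watchdog.start()
    deadline = time.monotonic() + give_up_seconds
    try:
        while True:  # stands in for the test's infinite loop
            if polls_interrupt and interrupt_requested.is_set():
                return True
            if time.monotonic() > deadline:
                # Safety net for this sketch only, so the demo terminates.
                return False
    finally:
        watchdog.cancel()


print(run_loop(polls_interrupt=True, watchdog_seconds=0.1, give_up_seconds=1.0))
print(run_loop(polls_interrupt=False, watchdog_seconds=0.1, give_up_seconds=0.5))
```

In this model the loop only stops if it observes the request; if the request is never observed for whatever reason, the only thing left to end the run is the harness timeout, which is the symptom reported here.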

Regarding the IC tests, I have no idea what could be going on there.

I'm leaving my needinfo since I don't think this comment really provides a lot of info yet. ;)
Flags: needinfo?(shu)
To provide a bit more context, we're running these from the test package on Cedar: https://tbpl.mozilla.org/?tree=Cedar.

They are still running as part of make check, which means that the suite passes on the WinXP builder but fails on the WinXP test machine, and passes on the OS X 10.8 builder but fails on the OS X 10.6 test machine.

The only difference between the WinXP test machine and the build machine that I'm certain about is that the test machines are sensitive to large memory allocations. Multiple small allocations will work fine, but single large allocations that work on the build machine will fail on the test machine. I don't think this is relevant here, but just in case.

I have a WinXP test machine loaner from releng if you have a patch you would like me to test. I can also ask for a 10.6 loaner if that would be useful, but I was going to see if solving it on WinXP fixes it on 10.6 before asking for one.
Dan -- can you point me at an instance where they actually fail?
Flags: needinfo?(nmatsakis) → needinfo?(dminor)
Niko, here is an instance from a recent run on cedar:
https://tbpl.mozilla.org/php/getParsedLog.php?id=36344679&tree=Cedar&full=1
Flags: needinfo?(dminor)
How rare are the timeouts? It is possible that that test hits a nasty corner case in the new scheduler that causes a deadlock or something, but without reliable STR, I can't find any bugs from just auditing the scheduler code. :(
Shu -- I think I am seeing timeouts in these two tests (timeout, timeout-gc) on my local machine quite regularly, actually. I have never seen any problems with the IC tests. The link to TBPL that dminor provided is also for timeout and timeout-gc, so presumably those tests are the actual issue.
(In reply to Shu-yu Guo [:shu] from comment #11)
> How rare are the timeouts? It is possible that that test hits a nasty corner
> case in the new scheduler that causes a deadlock or something, but without
> reliable STR, I can't find any bugs from just auditing the scheduler code. :(

The timeouts always occur when running on the test machines.
(In reply to Dan Minor [:dminor] from comment #13)
> (In reply to Shu-yu Guo [:shu] from comment #11)
> > How rare are the timeouts? It is possible that that test hits a nasty corner
> > case in the new scheduler that causes a deadlock or something, but without
> > reliable STR, I can't find any bugs from just auditing the scheduler code. :(
> 
> The timeouts always occur when running on the test machines.

Is it possible for me to ssh in to these test machines to debug?
(In reply to Shu-yu Guo [:shu] from comment #14)
> (In reply to Dan Minor [:dminor] from comment #13)
> > (In reply to Shu-yu Guo [:shu] from comment #11)
> > > How rare are the timeouts? It is possible that that test hits a nasty corner
> > > case in the new scheduler that causes a deadlock or something, but without
> > > reliable STR, I can't find any bugs from just auditing the scheduler code. :(
> > 
> > The timeouts always occur when running on the test machines.
> 
> Is it possible for me to ssh in to these test machines to debug?

Shu, you can request a test machine by filing a bug under Release Engineering - Loan Requests (you can clone Bug 977711). Thanks for looking at this!
Depends on: 988501
Flags: needinfo?(shu)
These tests have been disabled until they are fixed, so they no longer block removing jit-tests from make check.
No longer blocks: 858621, 973900
Depends on: 998997
https://hg.mozilla.org/mozilla-central/rev/31b79b2c4a7a
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31