Closed
Bug 898658
Opened 11 years ago
Closed 11 years ago
Run the xpcshell tests that fail when run in parallel concurrently.
Categories
(Testing :: XPCShell Harness, defect)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: mihneadb, Assigned: mihneadb)
References
Details
Attachments
(1 file, 6 obsolete files)
38.46 KB, patch (mihneadb: review+)
This is needed so we can land the parallel xpcshell harness while keeping the current orange rates.
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → mihneadb
Assignee | ||
Comment 1•11 years ago
|
||
"INFO - Can't trigger Breakpad, just killing process" seems to be a common timeout reason.
Assignee | ||
Comment 2•11 years ago
|
||
Ted (I don't know anything about Breakpad), would setting up a different symbols dir for the tests that use the crashreport feature fix this?
Flags: needinfo?(ted)
Assignee | ||
Comment 3•11 years ago
|
||
a try run, work in progress - https://tbpl.mozilla.org/?tree=Try&rev=13d5c99e5824
Comment 4•11 years ago
|
||
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> "INFO - Can't trigger Breakpad, just killing process" seems to be a common
> timeout reason.

That's not a timeout reason, that's a symptom. We don't have a way to trigger Breakpad from out-of-process on OS X (bug 525296), so when a test hangs we just print that message and kill the process.
Flags: needinfo?(ted)
Comment 5•11 years ago
|
||
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> This is needed so we can land the parallel xpcshell harness while keeping
> the current orange rates.

I don't understand what you're saying here. Can you explain this in more detail? Why would running tests that fail intermittently in parallel make our orange rate higher than it currently is?
Assignee | ||
Comment 6•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #5)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> > This is needed so we can land the parallel xpcshell harness while keeping
> > the current orange rates.
>
> I don't understand what you're saying here. Can you explain this in more
> detail? Why would running tests that fail intermittently in parallel make
> our orange rate higher than it currently is?

So, we don't know why most of the intermittents are failing (otherwise I expect we would've fixed them), but from what I see/understand many of them have timing issues. A hunch I have is that running the tests in parallel changes the CPU load and, with it, the timings for some tests. For example, I found that the dom/encoding/test/unit/test_singlebytes.js test times out when run in parallel, and I think that's because it is a more CPU-intensive test and takes longer than the default timeout value.

It might also be that some tests have race conditions in them which I have not found, although I managed to run all the tests consistently without failures on two laptops, Linux and Mac OS.
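To illustrate the timeout hypothesis above, here is a minimal Python sketch of how a harness might tell a timeout apart from an ordinary failure. It is not the xpcshell harness code: `run_with_timeout` and the result strings are hypothetical names, and the command being run is just a stand-in for a test process.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run one test command and classify the outcome.

    Hypothetical helper for illustration only; the real harness
    has its own process handling.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "TIMEOUT"
    return "PASS" if proc.returncode == 0 else "FAIL"


if __name__ == "__main__":
    # A stand-in "slow test": sleeps longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_s=0.5))
```

A CPU-heavy test that passes comfortably when run alone could land in the timeout branch once several other tests compete for the same cores, which would match the behavior described for test_singlebytes.js.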
Assignee | ||
Comment 7•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #4)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> > "INFO - Can't trigger Breakpad, just killing process" seems to be a common
> > timeout reason.
>
> That's not a timeout reason, that's a symptom. We don't have a way to
> trigger Breakpad from out-of-process on OS X (bug 525296), so when a test
> hangs we just print that message and kill the process.

Try run with your mozcrash patches: https://tbpl.mozilla.org/?tree=Try&rev=969af08ceb20
Comment 8•11 years ago
|
||
Okay, thanks for the info. It sounds like your parallel xpcshell patch just exacerbates some existing intermittent tests, then, so your plan is to make those run sequentially to not make things worse? If you do this, please file bugs on them and mention the bug numbers in the manifest so we can get them fixed.
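The plan described here (mark the affected tests in the manifest, with a bug number as the reason) could be sketched roughly as follows. This is an illustration under assumptions, not the patch attached to this bug: the `run-sequentially` key name, the sample test names, and the `partition_tests` helper are hypothetical, and `bug NNNNNN` is a placeholder rather than a real bug number.

```python
import configparser

# Hypothetical manifest in INI style. The annotation value records
# why the test is excluded from the parallel pool and which bug tracks it.
MANIFEST = """
[test_fast.js]

[test_singlebytes.js]
run-sequentially = times out under parallel load, bug NNNNNN
"""


def partition_tests(manifest_text):
    """Split manifest entries into parallel-safe and sequential tests."""
    parser = configparser.ConfigParser()
    parser.read_string(manifest_text)
    parallel, sequential = [], []
    for name in parser.sections():
        if parser.has_option(name, "run-sequentially"):
            # Keep the reason so it can be reported alongside the test.
            sequential.append((name, parser.get(name, "run-sequentially")))
        else:
            parallel.append(name)
    return parallel, sequential


if __name__ == "__main__":
    parallel, sequential = partition_tests(MANIFEST)
    print("parallel:", parallel)
    print("sequential:", sequential)
```

Keeping the reason string next to the annotation means the list of sequential tests doubles as a to-do list of bugs to fix, which is the point of the request above.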
Assignee | ||
Updated•11 years ago
|
Summary: Run known intermittent failing tests sequentially → Run the xpcshell tests that fail when run in parallel concurrently.
Assignee | ||
Comment 9•11 years ago
|
||
Assignee | ||
Comment 10•11 years ago
|
||
Changed commit msg.
Assignee | ||
Updated•11 years ago
|
Attachment #786473 -
Attachment is obsolete: true
Assignee | ||
Comment 11•11 years ago
|
||
This is what I got to so far. I ended up running the dom/plugins folder sequentially for now because it still seems to fail on Windows (XP, for example) and it became a massive time sink. There are just 7 tests in there, so there is not really a big perf gain to lose. Will open some follow-ups for these after the new harness and the final version of this patch land.

Try run with the current version: https://tbpl.mozilla.org/?tree=Try&rev=bbf0fbd351bc
Assignee | ||
Updated•11 years ago
|
Attachment #786474 -
Attachment is obsolete: true
Assignee | ||
Comment 12•11 years ago
|
||
New try.. https://tbpl.mozilla.org/?tree=Try&rev=1356f4b64dd8 [increased the timeout, see if it helps]
Assignee | ||
Updated•11 years ago
|
Attachment #787147 -
Attachment is obsolete: true
Assignee | ||
Comment 13•11 years ago
|
||
Goes hand in hand with the parxpc patch. This ended up containing more tests marked to run seq than I would've wished, but as long as this ensures green runs and we still get the speedup, we can always unmark them later if they turn out to be ok.
Attachment #787983 -
Flags: review?(ted)
Assignee | ||
Updated•11 years ago
|
Attachment #787236 -
Attachment is obsolete: true
Assignee | ||
Comment 14•11 years ago
|
||
Added one more test, found in this[1] try run. [1] https://tbpl.mozilla.org/?tree=Try&rev=cb4ced2feb16
Attachment #788272 -
Flags: review?(ted)
Assignee | ||
Updated•11 years ago
|
Attachment #787983 -
Attachment is obsolete: true
Attachment #787983 -
Flags: review?(ted)
Updated•11 years ago
|
Attachment #788272 -
Flags: review?(ted) → review+
Assignee | ||
Comment 15•11 years ago
|
||
Added some more tests because of intermittent failures/timeouts on tbpl.
Assignee | ||
Updated•11 years ago
|
Attachment #788272 -
Attachment is obsolete: true
Assignee | ||
Comment 16•11 years ago
|
||
Comment on attachment 788632: Run the xpcshell tests that fail when run in parallel concurrently.

Keeping r+.
Attachment #788632 -
Flags: review+
Assignee | ||
Comment 17•11 years ago
|
||
Changing dep since bug 887054 will not turn on parxpc in automation.
Assignee | ||
Comment 18•11 years ago
|
||
I'm thinking maybe we shouldn't flag known intermittents to run sequentially since we will end up losing quite a bit on the performance side. Ed, what do you think?
Flags: needinfo?(emorley)
Comment 19•11 years ago
|
||
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> since we will end up losing quite a bit on the performance side.
>
> Ed, what do you think?

Do we have any evidence that the orange rate increases otherwise? (Other than presuming many of the intermittents are timing related, and this will affect timing)
Flags: needinfo?(emorley)
Assignee | ||
Comment 20•11 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+1] from comment #19)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> > I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> > since we will end up losing quite a bit on the performance side.
> >
> > Ed, what do you think?
>
> Do we have any evidence that the orange rate increases otherwise? (Other
> than presuming many of the intermittents are timing related, and this will
> affect timing)

I found that some unfiled tests time out with this patch. I guess that counts as increasing orange rate. I looked into those tests and I think they are just uncovered intermittents. (They *are* intermittents for sure since they only fail once or twice in tens of runs)
Comment 21•11 years ago
|
||
Perhaps we should try with only marking those known to be unreliable in parallel as sequential at first? We can always add known intermittents later if many of them start appearing high on http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost be inclined to just disable those tests until investigated).
Assignee | ||
Comment 22•11 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+1] from comment #21)
> Perhaps we should try with only marking those known to be unreliable in
> parallel as sequential at first? We can always add known intermittents later
> if many of them start appearing high on
> http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost
> be inclined to just disable those tests until investigated).

Ok, I'll set up a run that executes the tests continuously to find more broken tests locally, mark those as run-seq, and then we can try landing a patch.
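A continuous local run of this kind might look like the sketch below. It assumes a hypothetical `run_test` callable that invokes the harness for one test and returns True on success; `find_intermittents` is likewise an illustrative name, not part of the harness.

```python
from collections import Counter


def find_intermittents(run_test, test_names, repetitions):
    """Run every test `repetitions` times and report the flaky ones.

    `run_test` is a hypothetical callable: name -> True on pass.
    Returns {test name: failure count} for tests that failed in
    some runs but not in all of them.
    """
    failures = Counter()
    for _ in range(repetitions):
        for name in test_names:
            if not run_test(name):
                failures[name] += 1
    # Failing every time is a plain breakage, not an intermittent.
    return {
        name: count
        for name, count in failures.items()
        if 0 < count < repetitions
    }
```

Tests reported by such a loop would be the candidates to mark as run-seq. Since the failures discussed in this bug only show up once or twice in tens of runs, the repetition count would need to be fairly large to catch them.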
Assignee | ||
Comment 23•11 years ago
|
||
Went with the approach in bug 906510.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID