Closed Bug 898658 Opened 11 years ago Closed 11 years ago

Run the xpcshell tests that fail when run in parallel concurrently.

Categories

(Testing :: XPCShell Harness, defect)

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: mihneadb, Assigned: mihneadb)

Attachments

(1 file, 6 obsolete files)

This is needed so we can land the parallel xpcshell harness while keeping the current orange rates.
Assignee: nobody → mihneadb
"INFO -  Can't trigger Breakpad, just killing process" seems to be a common timeout reason.
Ted (I don't know anything about Breakpad), would setting up a different symbols dir for the tests that use the crashreport feature fix this?
Flags: needinfo?(ted)
Blocks: parxpc
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> "INFO -  Can't trigger Breakpad, just killing process" seems to be a common
> timeout reason.

That's not a timeout reason, that's a symptom. We don't have a way to trigger Breakpad from out-of-process on OS X (bug 525296), so when a test hangs we just print that message and kill the process.
Flags: needinfo?(ted)
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> This is needed so we can land the parallel xpcshell harness while keeping
> the current orange rates.

I don't understand what you're saying here. Can you explain this in more detail? Why would running tests that fail intermittently in parallel make our orange rate higher than it currently is?
(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #5)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> > This is needed so we can land the parallel xpcshell harness while keeping
> > the current orange rates.
> 
> I don't understand what you're saying here. Can you explain this in more
> detail? Why would running tests that fail intermittently in parallel make
> our orange rate higher than it currently is?

So, we don't know why most of the intermittents are failing (otherwise I expect we would've fixed them), but from what I see/understand many of them have timing issues.

My hunch is that running the tests in parallel changes the CPU load, and with it the timing of some tests. For example, I found that the dom/encoding/test/unit/test_singlebytes.js test times out when run in parallel, and I think that's because it is a more CPU-intensive test that then takes longer than the default timeout value.

It might also be that some tests have race conditions that I have not found, although I managed to run all the tests consistently without failures on two laptops, one running Linux and one running Mac OS.
(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #4)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> > "INFO -  Can't trigger Breakpad, just killing process" seems to be a common
> > timeout reason.
> 
> That's not a timeout reason, that's a symptom. We don't have a way to
> trigger Breakpad from out-of-process on OS X (bug 525296), so when a test
> hangs we just print that message and kill the process.

Try run with your mozcrash patches: https://tbpl.mozilla.org/?tree=Try&rev=969af08ceb20
Okay, thanks for the info. It sounds like your parallel xpcshell patch just exacerbates some existing intermittent tests, then, so your plan is to make those run sequentially to not make things worse? If you do this, please file bugs on them and mention the bug numbers in the manifest so we can get them fixed.
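The plan discussed above (run most tests in parallel, but keep tests flagged in the manifest on a sequential path) could be sketched roughly as follows. This is only an illustration, not the harness code: the manifest text, the `run-sequentially` key name, the `test_fast_*.js` entries, the `bug NNNNNN` placeholder, and the `partition` and `run_all` helpers are all assumptions made for the example.

```python
import configparser
from concurrent.futures import ThreadPoolExecutor

# Hypothetical manifest. Only test_singlebytes.js is named in this bug; the
# other entries, the key name and the bug placeholder are made up.
MANIFEST = """
[test_singlebytes.js]
run-sequentially = times out under parallel load, bug NNNNNN

[test_fast_a.js]

[test_fast_b.js]
"""


def partition(manifest_text):
    """Split manifest sections into (parallel, sequential) lists of test names."""
    parser = configparser.ConfigParser()
    parser.read_string(manifest_text)
    parallel, sequential = [], []
    for name in parser.sections():
        if parser.has_option(name, "run-sequentially"):
            sequential.append(name)
        else:
            parallel.append(name)
    return parallel, sequential


def run_all(manifest_text, run_test, workers=4):
    """Run unflagged tests in a thread pool, then flagged tests one at a time.

    `run_test` is a caller-supplied callable taking a test name and returning
    True on success; how a real test gets launched is out of scope here.
    """
    parallel, sequential = partition(manifest_text)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, ok in zip(parallel, pool.map(run_test, parallel)):
            results[name] = ok
    # The pool has shut down by now, so these tests run with nothing else competing.
    for name in sequential:
        results[name] = run_test(name)
    return results
```

Recording the reason and a bug number as the value of the flag keeps the follow-up work visible in the manifest itself, which is what the comment above asks for.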
Summary: Run known intermittent failing tests sequentially → Run the xpcshell tests that fail when run in parallel concurrently.
Attachment #786473 - Attachment is obsolete: true
This is what I got to so far.

I ended up running the dom/plugins folder sequentially for now because it still
seems to fail on Windows (XP, for example) and it became a massive time sink.
There are just 7 tests in there, so running them in parallel would not be a
big perf gain anyway.

I will open some follow-ups for these after the new harness and the final
version of this patch land.


Try run with the current version: https://tbpl.mozilla.org/?tree=Try&rev=bbf0fbd351bc
Attachment #786474 - Attachment is obsolete: true
Attachment #787147 - Attachment is obsolete: true
Goes hand in hand with the parxpc patch.

This ended up containing more tests marked to run sequentially than I would
have liked, but as long as this ensures green runs and we still get the
speedup, we can always unmark them later if they turn out to be ok.
Attachment #787983 - Flags: review?(ted)
Attachment #787236 - Attachment is obsolete: true
Added one more test, found in this[1] try run.

[1] https://tbpl.mozilla.org/?tree=Try&rev=cb4ced2feb16
Attachment #788272 - Flags: review?(ted)
Attachment #787983 - Attachment is obsolete: true
Attachment #787983 - Flags: review?(ted)
Attachment #788272 - Flags: review?(ted) → review+
Added some more tests because of intermittent failures/timeouts on tbpl.
Attachment #788272 - Attachment is obsolete: true
Comment on attachment 788632 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

keeping r+
Attachment #788632 - Flags: review+
Changing dep since bug 887054 will not turn on parxpc in automation.
Blocks: 660788
No longer blocks: parxpc
I'm thinking maybe we shouldn't flag known intermittents to run sequentially since we will end up losing quite a bit on the performance side.

Ed, what do you think?
Flags: needinfo?(emorley)
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> since we will end up losing quite a bit on the performance side.
> 
> Ed, what do you think?

Do we have any evidence that the orange rate increases otherwise? (Other than presuming many of the intermittents are timing related, and this will affect timing)
Flags: needinfo?(emorley)
(In reply to Ed Morley [:edmorley UTC+1] from comment #19)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> > I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> > since we will end up losing quite a bit on the performance side.
> > 
> > Ed, what do you think?
> 
> Do we have any evidence that the orange rate increases otherwise? (Other
> than presuming many of the intermittents are timing related, and this will
> affect timing)

I found that some tests with no intermittent-failure bug on file time out with this patch. I guess that counts as increasing the orange rate. I looked into those tests and I think they are just newly uncovered intermittents.

(They *are* intermittents for sure since they only fail once or twice in tens of runs)
Perhaps we should start by marking as sequential only those tests known to be unreliable in parallel? We can always add known intermittents later if many of them start appearing high on http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost be inclined to just disable those tests until investigated).
(In reply to Ed Morley [:edmorley UTC+1] from comment #21)
> Perhaps we should try with only marking those known to be unreliable in
> parallel as sequential at first? We can always add known intermittents later
> in many of them start appearing high on
> http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost
> be inclined to just disable those tests until investigated).

Ok, I'll set up a job that runs the tests continuously to find more broken tests locally and mark those as run-seq; we can try landing a patch after that.
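A loop of the kind described above might look roughly like this. It is a minimal sketch, assuming each test can be launched as a standalone command that exits non-zero on failure; the `count_failures` helper and its defaults are hypothetical, and the command for running a single xpcshell test is not given in this bug, so it is left to the caller.

```python
import subprocess
import sys


def count_failures(command, runs=50, timeout=300):
    """Run `command` `runs` times and count failed or timed-out runs.

    A run counts as a failure when the process exits with a non-zero status
    or does not finish within `timeout` seconds.
    """
    failures = 0
    for _ in range(runs):
        try:
            proc = subprocess.run(
                command,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if proc.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in command that always succeeds; replace it with the real
    # invocation for the test under investigation.
    stand_in = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(count_failures(stand_in, runs=3))
```

Since the intermittents mentioned here fail only once or twice in tens of runs, a small `runs` value can easily miss them; a test that passes every time in a short loop is not thereby shown to be safe in parallel.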
Went with the approach in bug 906510.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID