Run the xpcshell tests that fail when run in parallel concurrently.

Status

Testing
XPCShell Harness
RESOLVED INVALID
4 years ago
4 years ago

People

(Reporter: mihneadb, Assigned: mihneadb)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 6 obsolete attachments)

(Assignee)

Description

4 years ago
This is needed so we can land the parallel xpcshell harness while keeping the current orange rates.
(Assignee)

Updated

4 years ago
Assignee: nobody → mihneadb
(Assignee)

Comment 1

4 years ago
"INFO -  Can't trigger Breakpad, just killing process" seems to be a common timeout reason.
(Assignee)

Comment 2

4 years ago
Ted, (I don't know anything about breakpad) would setting up a different symbols dir for the tests that use the crashreport feature fix this?
Flags: needinfo?(ted)
(Assignee)

Comment 3

4 years ago
A try run (work in progress): https://tbpl.mozilla.org/?tree=Try&rev=13d5c99e5824
(Assignee)

Updated

4 years ago
Blocks: 887054

Comment 4

(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> "INFO -  Can't trigger Breakpad, just killing process" seems to be a common
> timeout reason.

That's not a timeout reason, that's a symptom. We don't have a way to trigger Breakpad from out-of-process on OS X (bug 525296), so when a test hangs we just print that message and kill the process.
Flags: needinfo?(ted)

Comment 5

(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> This is needed so we can land the parallel xpcshell harness while keeping
> the current orange rates.

I don't understand what you're saying here. Can you explain this in more detail? Why would running tests that fail intermittently in parallel make our orange rate higher than it currently is?
(Assignee)

Comment 6

4 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #5)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> > This is needed so we can land the parallel xpcshell harness while keeping
> > the current orange rates.
> 
> I don't understand what you're saying here. Can you explain this in more
> detail? Why would running tests that fail intermittently in parallel make
> our orange rate higher than it currently is?

So, we don't know why most of the intermittents are failing (otherwise I expect we would've fixed them), but from what I see and understand, many of them have timing issues.

My hunch is that running the tests in parallel changes the CPU load, and with it the timings for some tests. For example, I found that the dom/encoding/test/unit/test_singlebytes.js test times out when run in parallel. I think that's because it is a more CPU-intensive test, so under load it takes longer than the default timeout value.

It might also be that some tests have race conditions in them which I have not found, although I managed to run all the tests consistently without failures on two laptops, one running Linux and one running Mac OS.
(Assignee)

Comment 7

4 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #4)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> > "INFO -  Can't trigger Breakpad, just killing process" seems to be a common
> > timeout reason.
> 
> That's not a timeout reason, that's a symptom. We don't have a way to
> trigger Breakpad from out-of-process on OS X (bug 525296), so when a test
> hangs we just print that message and kill the process.

Try run with your mozcrash patches: https://tbpl.mozilla.org/?tree=Try&rev=969af08ceb20

Comment 8

Okay, thanks for the info. It sounds like your parallel xpcshell patch just exacerbates some existing intermittent tests, then, so your plan is to make those run sequentially to not make things worse? If you do this, please file bugs on them and mention the bug numbers in the manifest so we can get them fixed.
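
For readers following the thread, here is a minimal sketch of the idea being discussed: tests flagged in the manifest run one at a time, everything else runs in a worker pool. This is not the harness code from the attached patches. The manifest key `run-sequentially`, the test names, the placeholder bug reference, and the helpers `split_tests` and `run_all` are all assumptions made for illustration.

```python
import configparser
from concurrent.futures import ThreadPoolExecutor

# Hypothetical manifest: one section per test. The key name and the
# bug reference are placeholders, not taken from the actual patch.
MANIFEST = """
[test_fast_a.js]
[test_fast_b.js]
[test_plugin.js]
run-sequentially = bug NNNNNN (placeholder), times out under parallel load
"""


def split_tests(manifest_text):
    """Return (parallel, sequential) lists of test names, in manifest order."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(manifest_text)
    parallel, sequential = [], []
    for name in parser.sections():
        if parser.has_option(name, "run-sequentially"):
            sequential.append(name)
        else:
            parallel.append(name)
    return parallel, sequential


def run_all(manifest_text, run_test, jobs=4):
    """Run unflagged tests in a pool, then flagged tests one by one.

    run_test is any callable taking a test name and returning True on pass.
    """
    parallel, sequential = split_tests(manifest_text)
    results = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for name, passed in zip(parallel, pool.map(run_test, parallel)):
            results[name] = passed
    # The pool has shut down here, so nothing else competes for the CPU
    # while the flagged tests run.
    for name in sequential:
        results[name] = run_test(name)
    return results
```

Keeping the reason (and a bug number) next to the flag in the manifest is what makes it possible to remove the flag again once the underlying test is fixed.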
(Assignee)

Updated

4 years ago
Summary: Run known intermittent failing tests sequentially → Run the xpcshell tests that fail when run in parallel concurrently.
(Assignee)

Comment 9

4 years ago
Created attachment 786473 [details] [diff] [review]
mark needed tests to run sequentially
(Assignee)

Comment 10

4 years ago
Created attachment 786474 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

Changed commit msg.
(Assignee)

Updated

4 years ago
Attachment #786473 - Attachment is obsolete: true
(Assignee)

Comment 11

4 years ago
Created attachment 787147 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

This is what I got to so far.

I ended up running the dom/plugins folder sequentially for now because it still
seems to fail on Windows (XP, for example) and it became a massive time sink.
There are just 7 tests in there, so running them in parallel would not gain much.

Will open some follow ups for these after the new harness and the final version
of this patch lands.


Try run with the current version: https://tbpl.mozilla.org/?tree=Try&rev=bbf0fbd351bc
(Assignee)

Updated

4 years ago
Attachment #786474 - Attachment is obsolete: true
(Assignee)

Comment 12

4 years ago
Created attachment 787236 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

New try: https://tbpl.mozilla.org/?tree=Try&rev=1356f4b64dd8

[increased the timeout, see if it helps]
(Assignee)

Updated

4 years ago
Attachment #787147 - Attachment is obsolete: true
(Assignee)

Comment 13

4 years ago
Created attachment 787983 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

Goes hand in hand with the parxpc patch.

This ended up containing more tests marked to run seq than I would've wished,
but as long as this ensures green runs and we still get the speedup, we can
always unmark them later if they turn out to be ok.
Attachment #787983 - Flags: review?(ted)
(Assignee)

Updated

4 years ago
Attachment #787236 - Attachment is obsolete: true
(Assignee)

Comment 14

4 years ago
Created attachment 788272 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

Added one more test, found in this[1] try run.

[1] https://tbpl.mozilla.org/?tree=Try&rev=cb4ced2feb16
Attachment #788272 - Flags: review?(ted)
(Assignee)

Updated

4 years ago
Attachment #787983 - Attachment is obsolete: true
Attachment #787983 - Flags: review?(ted)
Attachment #788272 - Flags: review?(ted) → review+
(Assignee)

Comment 15

4 years ago
Created attachment 788632 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

Added some more tests because of intermittent failures/timeouts on tbpl.
(Assignee)

Updated

4 years ago
Attachment #788272 - Attachment is obsolete: true
(Assignee)

Comment 16

4 years ago
Comment on attachment 788632 [details] [diff] [review]
Run the xpcshell tests that fail when run in parallel concurrently.

keeping r+
Attachment #788632 - Flags: review+
(Assignee)

Comment 17

4 years ago
Changing dep since bug 887054 will not turn on parxpc in automation.
Blocks: 660788
No longer blocks: 887054
(Assignee)

Comment 18

4 years ago
I'm thinking maybe we shouldn't flag known intermittents to run sequentially since we will end up losing quite a bit on the performance side.

Ed, what do you think?
Flags: needinfo?(emorley)

Comment 19

(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> since we will end up losing quite a bit on the performance side.
> 
> Ed, what do you think?

Do we have any evidence that the orange rate increases otherwise? (Other than presuming many of the intermittents are timing related, and this will affect timing)
Flags: needinfo?(emorley)
(Assignee)

Comment 20

4 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #19)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> > I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> > since we will end up losing quite a bit on the performance side.
> > 
> > Ed, what do you think?
> 
> Do we have any evidence that the orange rate increases otherwise? (Other
> than presuming many of the intermittents are timing related, and this will
> affect timing)

I found that some tests with no bug on file time out with this patch. I guess that counts as increasing the orange rate. I looked into those tests and I think they are just newly uncovered intermittents.

(They *are* intermittents for sure, since they only fail once or twice in tens of runs.)

Comment 21

Perhaps we should try with only marking those known to be unreliable in parallel as sequential at first? We can always add known intermittents later if many of them start appearing high on http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost be inclined to just disable those tests until investigated).
(Assignee)

Comment 22

4 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #21)
> Perhaps we should try with only marking those known to be unreliable in
> parallel as sequential at first? We can always add known intermittents later
> if many of them start appearing high on
> http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost
> be inclined to just disable those tests until investigated).

Ok, I'll set up a run that runs tests continuously, trying to find more broken tests locally, mark those as run-seq and we can try landing a patch after that.
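
A rough sketch of what such a continuous run could look like, assuming a `run_suite` callable that executes the whole suite once and returns a pass/fail result per test. The function name `find_intermittents` and the result format are hypothetical, not part of the xpcshell harness.

```python
from collections import Counter


def find_intermittents(run_suite, rounds):
    """Run the suite `rounds` times and return tests that fail only sometimes.

    run_suite() must return a dict mapping test name -> True (pass) / False.
    A test that fails in every round is treated as plainly broken rather
    than intermittent, so it is left out of the result.
    """
    failures = Counter()
    for _ in range(rounds):
        for name, passed in run_suite().items():
            if not passed:
                failures[name] += 1
    return sorted(name for name, count in failures.items() if 0 < count < rounds)
```

The names returned are the candidates to mark as sequential. Since the failures mentioned above show up only once or twice in tens of runs, `rounds` needs to be large enough to catch them.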
(Assignee)

Comment 23

4 years ago
Went with the approach in bug 906510.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → INVALID