Closed Bug 898658 Opened 11 years ago Closed 11 years ago

Run the xpcshell tests that fail when run in parallel concurrently.

Tracking

(Not tracked)

Status:

RESOLVED INVALID

People

(Reporter: mihneadb, Assigned: mihneadb)

Attachments

(1 file, 6 obsolete files)

mark needed tests to run sequentially; 11 years ago; Mihnea Dobrescu-Balaur (:mihneadb); 12.92 KB, patch
Run the xpcshell tests that fail when run in parallel concurrently.; 11 years ago; Mihnea Dobrescu-Balaur (:mihneadb); 12.94 KB, patch
Run the xpcshell tests that fail when run in parallel concurrently.; 11 years ago; Mihnea Dobrescu-Balaur (:mihneadb); 14.74 KB, patch
Run the xpcshell tests that fail when run in parallel concurrently.; 11 years ago; Mihnea Dobrescu-Balaur (:mihneadb); 17.94 KB, patch
Run the xpcshell tests that fail when run in parallel concurrently.; 11 years ago; Mihnea Dobrescu-Balaur (:mihneadb); 32.39 KB, patch
Run the xpcshell tests that fail when run in parallel concurrently.; 11 years ago; Mihnea Dobrescu-Balaur (:mihneadb); 32.77 KB, patch; ted: review+
Run the xpcshell tests that fail when run in parallel concurrently.; 11 years ago; Mihnea Dobrescu-Balaur (:mihneadb); 38.46 KB, patch; mihneadb: review+

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Description

•

11 years ago

This is needed so we can land the parallel xpcshell harness while keeping the current orange rates.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Assignee: nobody → mihneadb

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 1

•

11 years ago

"INFO -  Can't trigger Breakpad, just killing process" seems to be a common timeout reason.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 2

•

11 years ago

Ted, I don't know anything about Breakpad: would setting up a different symbols directory for the tests that use the crash reporting feature fix this?

Flags: needinfo?(ted)

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 3

•

11 years ago

A try run (work in progress): https://tbpl.mozilla.org/?tree=Try&rev=13d5c99e5824

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Blocks: parxpc

(not currently active) Ted Mielczarek

Comment 4

•

11 years ago

(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> "INFO -  Can't trigger Breakpad, just killing process" seems to be a common
> timeout reason.

That's not a timeout reason, that's a symptom. We don't have a way to trigger Breakpad from out-of-process on OS X (bug 525296), so when a test hangs we just print that message and kill the process.

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Comment 5

•

11 years ago

(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> This is needed so we can land the parallel xpcshell harness while keeping
> the current orange rates.

I don't understand what you're saying here. Can you explain this in more detail? Why would running tests that fail intermittently in parallel make our orange rate higher than it currently is?

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 6

•

11 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #5)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #0)
> > This is needed so we can land the parallel xpcshell harness while keeping
> > the current orange rates.
> 
> I don't understand what you're saying here. Can you explain this in more
> detail? Why would running tests that fail intermittently in parallel make
> our orange rate higher than it currently is?

So, we don't know why most of the intermittents are failing (otherwise I expect we would've fixed them), but from what I see and understand, many of them have timing issues.

A hunch I have is that running the tests in parallel changes the CPU load, and with it the timings for some tests. For example, I found that the dom/encoding/test/unit/test_singlebytes.js test times out when run in parallel, and I think that's because it is a more CPU-intensive test that then takes longer than the default timeout value.

It might also be that some tests have race conditions in them that I have not found, although I managed to run all the tests consistently without failures on two laptops, on Linux and Mac OS.
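
To make the timeout hunch above concrete, here is a minimal Python sketch of running one test command under a wall-clock limit. It is not the xpcshell harness's code: `run_with_timeout` and the three result strings are hypothetical names chosen for illustration, and the only real API relied on is `subprocess.run` with its `timeout` argument. A CPU-heavy test that normally finishes just under the limit could flip from PASS to TIMEOUT if load from other parallel jobs slows it down.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run one test command and classify it (hypothetical helper).

    Returns "PASS" for exit code 0, "FAIL" for any other exit code,
    and "TIMEOUT" if the command exceeds the wall-clock limit.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing leaks.
        return "TIMEOUT"
    return "PASS" if proc.returncode == 0 else "FAIL"


if __name__ == "__main__":
    # Stand-in for a slow test: sleeps far longer than the 1 second limit.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, 1))
```

The same command with a larger `timeout_seconds` would be reported as passing, which is why raising the timeout is one of the things worth trying before marking a test as sequential.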

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 7

•

11 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] (post-vacation backlog) from comment #4)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #1)
> > "INFO -  Can't trigger Breakpad, just killing process" seems to be a common
> > timeout reason.
> 
> That's not a timeout reason, that's a symptom. We don't have a way to
> trigger Breakpad from out-of-process on OS X (bug 525296), so when a test
> hangs we just print that message and kill the process.

Try run with your mozcrash patches: https://tbpl.mozilla.org/?tree=Try&rev=969af08ceb20

(not currently active) Ted Mielczarek

Comment 8

•

11 years ago

Okay, thanks for the info. It sounds like your parallel xpcshell patch just exacerbates some existing intermittent tests, then, so your plan is to make those run sequentially to not make things worse? If you do this, please file bugs on them and mention the bug numbers in the manifest so we can get them fixed.
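
A rough sketch of what marking tests in a manifest could look like, and how a harness might split them into a parallel group and a sequential group. Assumptions: the `run-sequentially` key name, the manifest contents, and `test_plugin_example.js` are illustrative only (this thread just says "run-seq"), the bug references are left as placeholders rather than real bug numbers, and Python's `configparser` stands in for whatever manifest parser the real harness uses.

```python
import configparser

# Hypothetical manifest: one section per test file. A test that must not
# run in parallel carries a key whose value says why (ideally a bug number).
MANIFEST = """
[test_fast.js]

[test_singlebytes.js]
run-sequentially = times out under parallel load (bug number to be filed)

[test_plugin_example.js]
run-sequentially = fails in parallel on Windows (bug number to be filed)
"""


def partition(manifest_text, key="run-sequentially"):
    """Split manifest entries into (parallel, sequential) lists.

    parallel is a list of test names; sequential is a list of
    (test name, reason) pairs, both in manifest order.
    """
    parser = configparser.ConfigParser()
    parser.read_string(manifest_text)
    parallel, sequential = [], []
    for name in parser.sections():
        if parser.has_option(name, key):
            sequential.append((name, parser.get(name, key)))
        else:
            parallel.append(name)
    return parallel, sequential


if __name__ == "__main__":
    parallel, sequential = partition(MANIFEST)
    print("parallel:", parallel)
    for name, reason in sequential:
        print("sequential:", name, "-", reason)
```

Keeping the reason next to the flag means each sequential entry points back to the bug that should eventually let it be unmarked.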

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Summary: Run known intermittent failing tests sequentially → Run the xpcshell tests that fail when run in parallel concurrently.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 9

•

11 years ago

Attached patch mark needed tests to run sequentially (obsolete)

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 10

•

11 years ago

Attached patch Run the xpcshell tests that fail when run in parallel concurrently. (obsolete)

Changed commit msg.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Attachment #786473 - Attachment is obsolete: true

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 11

•

11 years ago

Attached patch Run the xpcshell tests that fail when run in parallel concurrently. (obsolete)

This is what I got to so far.

I ended up running the dom/plugins folder sequentially for now because it still
seems to fail on Windows (XP, for example) and it became a massive time sink.
There are just 7 tests in there, so running them in parallel would not be a big
perf gain anyway.

I will open some follow-ups for these after the new harness and the final version
of this patch land.

Try run with the current version: https://tbpl.mozilla.org/?tree=Try&rev=bbf0fbd351bc

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Attachment #786474 - Attachment is obsolete: true

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 12

•

11 years ago

Attached patch Run the xpcshell tests that fail when run in parallel concurrently. (obsolete)

New try run: https://tbpl.mozilla.org/?tree=Try&rev=1356f4b64dd8

(I increased the timeout to see if it helps.)

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Attachment #787147 - Attachment is obsolete: true

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 13

•

11 years ago

Attached patch Run the xpcshell tests that fail when run in parallel concurrently. (obsolete)

Goes hand in hand with the parxpc patch.

This ended up containing more tests marked to run sequentially than I would
have liked, but as long as this ensures green runs and we still get the speedup,
we can always unmark them later if they turn out to be OK.

Attachment #787983 - Flags: review?(ted)

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Attachment #787236 - Attachment is obsolete: true

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 14

•

11 years ago

Attached patch Run the xpcshell tests that fail when run in parallel concurrently. (obsolete)

Added one more test, found in this try run [1].

[1] https://tbpl.mozilla.org/?tree=Try&rev=cb4ced2feb16

Attachment #788272 - Flags: review?(ted)

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Attachment #787983 - Attachment is obsolete: true

Attachment #787983 - Flags: review?(ted)

(not currently active) Ted Mielczarek

Updated

•

11 years ago

Attachment #788272 - Flags: review?(ted) → review+

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 15

•

11 years ago

Attached patch Run the xpcshell tests that fail when run in parallel concurrently.

Added some more tests because of intermittent failures/timeouts on tbpl.

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Updated

•

11 years ago

Attachment #788272 - Attachment is obsolete: true

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 16

•

11 years ago

Comment on attachment 788632
Run the xpcshell tests that fail when run in parallel concurrently.

keeping r+

Attachment #788632 - Flags: review+

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 17

•

11 years ago

Changing dep since bug 887054 will not turn on parxpc in automation.

Blocks: 660788
No longer blocks: parxpc

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 18

•

11 years ago

I'm thinking maybe we shouldn't flag known intermittents to run sequentially since we will end up losing quite a bit on the performance side.

Ed, what do you think?

Flags: needinfo?(emorley)

Ed Morley [:emorley]

Comment 19

•

11 years ago

(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> since we will end up losing quite a bit on the performance side.
> 
> Ed, what do you think?

Do we have any evidence that the orange rate increases otherwise? (Other than presuming many of the intermittents are timing related, and this will affect timing)

Flags: needinfo?(emorley)

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 20

•

11 years ago

(In reply to Ed Morley [:edmorley UTC+1] from comment #19)
> (In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #18)
> > I'm thinking maybe we shouldn't flag known intermittents to run sequentially
> > since we will end up losing quite a bit on the performance side.
> > 
> > Ed, what do you think?
> 
> Do we have any evidence that the orange rate increases otherwise? (Other
> than presuming many of the intermittents are timing related, and this will
> affect timing)

I found that some tests with no intermittent-failure bugs on file time out with this patch. I guess that counts as increasing the orange rate. I looked into those tests and I think they are just intermittents that had not been uncovered before.

(They *are* intermittents for sure, since they only fail once or twice in tens of runs.)

Ed Morley [:emorley]

Comment 21

•

11 years ago

Perhaps we should try with only marking those known to be unreliable in parallel as sequential at first? We can always add known intermittents later if many of them start appearing high on http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost be inclined to just disable those tests until investigated).

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 22

•

11 years ago

(In reply to Ed Morley [:edmorley UTC+1] from comment #21)
> Perhaps we should try with only marking those known to be unreliable in
> parallel as sequential at first? We can always add known intermittents later
> if many of them start appearing high on
> http://brasstacks.mozilla.com/orangefactor/ (though in that case I'd almost
> be inclined to just disable those tests until investigated).

OK, I'll set up a job that runs the tests continuously to find more broken tests locally, mark those as run-seq, and then we can try landing a patch.
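
A minimal sketch of the "run continuously and look for broken tests" idea, assuming each test can be wrapped as a Python callable that returns True on success. `find_intermittents` and the stub tests are hypothetical names for illustration; a real loop would launch the parallel harness rather than call in-process functions. A test counts as intermittent here when it fails in some runs but not all of them.

```python
from collections import Counter


def find_intermittents(tests, runs):
    """Run every test `runs` times; return {name: failure_count} for tests
    that failed at least once but not every time (hypothetical helper)."""
    failures = Counter()
    for _ in range(runs):
        for name, run_test in tests.items():
            if not run_test():
                failures[name] += 1
    return {name: count for name, count in failures.items() if 0 < count < runs}


def make_flaky(fail_on_call):
    """Build a deterministic stand-in for a flaky test: it fails only on
    its Nth invocation, so the demo gives the same result every time."""
    state = {"calls": 0}

    def run_test():
        state["calls"] += 1
        return state["calls"] != fail_on_call

    return run_test


if __name__ == "__main__":
    tests = {
        "stable": lambda: True,   # never fails
        "broken": lambda: False,  # always fails: a real bug, not an intermittent
        "flaky": make_flaky(7),   # fails once in the whole loop
    }
    print(find_intermittents(tests, runs=20))
```

Tests that fail on every run are deliberately left out of the result, since those need fixing or disabling rather than a run-seq mark.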

Mihnea Dobrescu-Balaur (:mihneadb)

Assignee

Comment 23

•

11 years ago

Went with the approach in bug 906510.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → INVALID
