Closed Bug 1548160 Opened 5 years ago Closed 5 years ago

revisit chunking of xpcshelltests

Tracking

(firefox68 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla68

Tracking Flags:

firefox68: Tracking ---, Status fixed

People

(Reporter: egao, Assigned: egao)

References

Details

Attachments

(1 file)

Bug 1548160 - task efficiency: review and reduce chunk count of xpcshell for various platforms (5 years ago, Edwin Takahashi (:egao | infrequent contributor), 47 bytes, text/x-phabricator-request)

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Description

•

5 years ago

•

Edited

Summary

Similar to bug #1548106, xpcshelltest is currently run in many chunks, which I feel may be unnecessary since each chunk introduces additional overhead to set up the environment.

An example push is seen here.

Data

Using X1 as standardized example (where comparable in chunk count), in the mozilla-central revision e8aebe488b2f2e567940577de25013d00e818f7c (linked above):

linux64-shippable: 6 minutes, 17:59:24 - 18:01:06 = 00:01:42
linux64-asan: 16 minutes, 16:49:21 - 16:51:38 = 00:02:17

Thoughts

Each chunk runs very quickly yet requires approximately 1-2 minutes to set up. If we can reduce the number of chunks to 50% of the current value for linux64-debug, for example, it is possible to save 12 minutes of overhead per push.
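The setup share of each X1 job follows directly from the figures in the Data section. Below is a minimal sketch of that arithmetic, assuming the first figure on each line is the total job runtime and the timestamp range is the setup phase; the helper name `setup_share` is hypothetical.

```python
from datetime import datetime

def setup_share(total_minutes, setup_start, setup_end):
    """Return (setup_seconds, fraction of the job spent on setup)."""
    fmt = "%H:%M:%S"
    setup = datetime.strptime(setup_end, fmt) - datetime.strptime(setup_start, fmt)
    seconds = int(setup.total_seconds())
    return seconds, seconds / (total_minutes * 60)

# Figures quoted in the Data section for the X1 chunk on each platform.
# Assumption: (total runtime in minutes, setup start, setup end).
jobs = {
    "linux64-shippable": (6, "17:59:24", "18:01:06"),
    "linux64-asan": (16, "16:49:21", "16:51:38"),
}

for platform, args in jobs.items():
    seconds, share = setup_share(*args)
    print(f"{platform}: {seconds}s setup, {share:.0%} of the job")
```

Under that reading, setup accounts for roughly 28% of the linux64-shippable job and 14% of the linux64-asan job.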

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Updated

•

5 years ago

Type: defect → enhancement

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 1

•

5 years ago

Baseline: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=pending%2Crunning%2Csuperseded%2Cusercancel%2Cretry%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&tier=1%2C2%2C3&group_state=expanded&revision=e8aebe488b2f2e567940577de25013d00e818f7c&searchStr=xpcshell%2Cccov&selectedJob=243356572

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=8707c76c267fec042a9842321562d42482cdd0bb

The above try push shows the results of reducing chunks across the board:

            android-em-4.3-arm7-api-16/debug: 12
            macosx.*[^ccov]/.*: 1
            windows.*[^ccov]/.*: 1
            .*-ccov/.*: 5
            default: 4
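How a table like this might map a test platform to a chunk count can be sketched as follows. This is an illustrative resolver, not the in-tree implementation: `resolve_chunks`, the first-match-wins rule, and the sample platform strings are assumptions. One detail worth noting is that `[^ccov]` is a character class (any single character other than `c`, `o`, or `v`), not "does not contain ccov"; it excludes the ccov platforms only because their names end in `v` right before the `/`.

```python
import re

# Chunk table from the try push above; "default" is the fallback.
CHUNKS = {
    r"android-em-4.3-arm7-api-16/debug": 12,
    r"macosx.*[^ccov]/.*": 1,
    r"windows.*[^ccov]/.*": 1,
    r".*-ccov/.*": 5,
    "default": 4,
}

def resolve_chunks(test_platform, table=CHUNKS):
    """Hypothetical resolver: the first key that fully matches wins."""
    for pattern, chunks in table.items():
        if pattern != "default" and re.fullmatch(pattern, test_platform):
            return chunks
    return table["default"]

# Sample platform strings, chosen for illustration only.
for platform in ("linux64/opt", "macosx64/debug", "macosx64-ccov/debug",
                 "windows10-64/opt", "linux64-ccov/debug"):
    print(platform, resolve_chunks(platform))
```

With these patterns, `macosx64-ccov/debug` falls through the macosx rule and is picked up by the `.*-ccov/.*` rule instead.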

ccov
In the baseline push, ccov builds appear to have uneven chunking, with some chunks going over the 30 minute soft rule and other chunks running for only 10 minutes.

The goal was to reduce the chunks from 8 to a manageable 5, in the hope that some of the faster chunks would be merged together. The shorter chunks are combined, but at the same time the longer chunks also take correspondingly longer to run.

windows, macosx
Chunk count remains at 1, as it currently stands. Runtime of these single-chunk xpcshelltest jobs is sometimes over 30 minutes and sometimes under. Considering the current baseline also sees the same behavior, this is not a concern.

linux
linux32 and linux64 had their chunk counts reduced greatly, from 8 or 12 to 4 across the board.

For most of the linux runs the X4 chunk appears to take the longest, in some cases (linux64 opt) exceeding 45 minutes. Despite this I don't think this is cause for concern since even on the baseline, linux64 opt takes 40 minutes, so the extra 5 minutes consumed in the reduced chunk is easily made up by savings from reduced overhead.

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

5 years ago

40 minutes is long, but it does simplify our chunks- what runtime do we get with 6? closer to 30?

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 3

•

5 years ago

:jmaher - yesterday I ran a try push with 6 chunks for a bunch of platforms including linux variants. It is available here.

First impression is that chunk runtimes don't decrease much between 4 and 6 chunks.
Using linux64/opt as example:

4 chunks: 15, 14, 20, 45
6 chunks: 11, 8, 7, 14, 14, 38

So it looks like reducing chunks from 6 -> 4 redistributed the tests from the very short chunks (8, 7) into the moderately long chunks (11, 14), and some of the redistribution also spilled over into the last chunk, which takes the longest.
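The comparison can be made concrete with a few lines of arithmetic over the linux64/opt numbers above. This is only a sketch; `summarize` is a hypothetical helper, and "imbalance" here simply means the longest chunk divided by the mean chunk.

```python
# Per-chunk runtimes in minutes for linux64/opt, as reported above.
runs = {
    4: [15, 14, 20, 45],
    6: [11, 8, 7, 14, 14, 38],
}

def summarize(runtimes):
    """Return total machine time, longest chunk, and longest/mean imbalance."""
    total = sum(runtimes)
    longest = max(runtimes)
    imbalance = longest / (total / len(runtimes))
    return total, longest, imbalance

for chunks, runtimes in runs.items():
    total, longest, imbalance = summarize(runtimes)
    print(f"{chunks} chunks: total={total} min, longest={longest} min, "
          f"imbalance={imbalance:.2f}x")
```

Total machine time is nearly identical (94 vs 92 minutes), while the longest chunk, which bounds the wall-clock time, only drops from 45 to 38 minutes.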

An idea for another task efficiencies project could be to investigate the chunking mechanism, to better distribute the load for situations like this. Since the chunking mechanism is not related to overhead, it will be outside the scope of this project.

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

5 years ago

overall I don't like the 45, but 6 chunks has a 38. Chunking is basically taking the list of tests we have and dividing them up- we are using chunk_by_slice for xpcshell:
https://searchfox.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#900

and the definition is here:
https://searchfox.org/mozilla-central/source/testing/mozbase/manifestparser/manifestparser/filters.py#153

xpcshell runs tests in parallel and any failures are repeated at the end in series. Unlike mochitest and reftest it doesn't chunk_by_dir.

We could look at the runtimes of the tests and find the individuals which are running longer than normal- either split them up, limit their running to select platforms, isolate them in another job, or chunk_by_runtime and include test weights (as we do for mochitest)
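The difference between slicing by test count and chunking by runtime can be illustrated with a toy example. This is a sketch, not the manifestparser code: `slice_chunks`, `runtime_chunks`, and the per-test runtimes are all made up, and the runtime-aware variant uses a simple greedy longest-first assignment as a stand-in for whatever weighting the real filter applies. It also ignores xpcshell's parallelism and treats a chunk's cost as the sum of its test runtimes.

```python
import heapq

def slice_chunks(runtimes, total_chunks):
    """Split the ordered test list into contiguous slices of equal count."""
    size = len(runtimes) / total_chunks
    return [runtimes[round(i * size):round((i + 1) * size)]
            for i in range(total_chunks)]

def runtime_chunks(runtimes, total_chunks):
    """Greedy: give each test, longest first, to the currently lightest chunk."""
    heap = [(0, i) for i in range(total_chunks)]
    chunks = [[] for _ in range(total_chunks)]
    for t in sorted(runtimes, reverse=True):
        load, i = heapq.heappop(heap)
        chunks[i].append(t)
        heapq.heappush(heap, (load + t, i))
    return chunks

# Made-up per-test runtimes (seconds); the slow tests cluster at the end.
tests = [5, 5, 5, 5, 5, 5, 5, 5, 60, 90, 120, 150]

for name, fn in (("slice", slice_chunks), ("runtime", runtime_chunks)):
    loads = [sum(c) for c in fn(tests, 4)]
    print(name, loads)
```

With this input the slice approach leaves one chunk carrying all of the slow tests, which mirrors the long last chunk seen in the try pushes, while the greedy assignment spreads them out.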

I agree this is out of scope, but good to have an understanding of what we are doing and what we could easily do.

As for chunks, it seems we save about 2-3 minutes per chunk we remove and the runtimes are better except for the single 45 minute, but that is just one job which is an outlier and we already have an outlier. I give this a thumbs up

Geoff Brown [:gbrown]

Updated

•

5 years ago

Priority: -- → P3

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 5

•

5 years ago

Attached file: Bug 1548160 - task efficiency: review and reduce chunk count of xpcshell for various platforms

ccov chunks are set to 6, with exception of macosx64-ccov at 8
various linux platforms saw reduction in chunks from 8 to 4

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 6

•

5 years ago

linux

4 chunks
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=xpcshell&revision=f725d3c1c16b612b9748813d6c4d0bb4844ac575&selectedJob=244078851

observations

chunk runtimes are either uneven, or generally under the 30 minute mark but just so.

5 chunks
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=ae20fb489e94cf6a9df348cc3c9e9f932f3d5408

observations

similar to above

6 chunks
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=xpcshell&revision=787eb0c2a1b1ded6589c4f5d32cff54bbc957b5a&selectedJob=243786868

observations

too many small chunks for not much gain

conclusion

linux chunk sizes of 4 or 5 are preferable; if uneven chunks can be resolved, 4 chunks or even 3 chunks may be preferable.

windows10-aarch64

1 chunk

runtime is too long (> 60 min)

2 chunks

runtime is too long
runtime is uneven

3 chunks

chunk runtime is uneven
currently used in mozilla-central

conclusion

windows10-aarch64 chunk size of 3 (current) or 4 is preferable.

Phabricator Automation

Updated

•

5 years ago

Attachment #9062279 - Attachment description: Bug 1548160 - task efficiency - review and reduce chunk count for various platforms → Bug 1548160 - task efficiency: review and reduce chunk count of xpcshell for various platforms

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Updated

•

5 years ago

No longer depends on: 1548106

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Updated

•

5 years ago

Blocks: task-efficiency-test-overhead

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Updated

•

5 years ago

No longer blocks: task-efficiency-test-overhead

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Updated

•

5 years ago

Blocks: task-efficiency-test-overhead
No longer blocks: test-efficiencies

Pulsebot

Comment 7

•

5 years ago

Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b447fc4d689d
task efficiency: review and reduce chunk count of xpcshell for various platforms r=gbrown,jmaher

Narcis Beleuzu [:NarcisB]

Comment 8

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b447fc4d689d

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

Assignee: nobody → egao

Geoff Brown [:gbrown]

Updated

•

5 years ago

Regressions: 1552580

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

5 years ago

I see that osx debug is 2 chunks and opt is 5 chunks:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa906ac6a62cb0d8e9d8e73b5804183cffc720cd

opt runs fast, 4 chunks in <=10 minutes and 5th chunk is 20+ minutes;
debug runs slow ~17 and ~37 minutes

I suspect there are one or two manifests which are longer- maybe a follow-up here is to split large manifests into smaller ones so we can load balance better?
