Open Bug 895225 Opened 11 years ago Updated 2 years ago

Investigate why there are multiple gigabytes of write I/O during xpcshell tests

Categories

(Testing :: XPCShell Harness, defect)


Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

https://tbpl.mozilla.org/php/getParsedLog.php?id=25391634&tree=Mozilla-Central&full=1 reports nearly 10GB of write I/O during xpcshell test runs on this OS X 10.8 opt build. That seems excessive to me. I think someone should investigate where that I/O is coming from so it can be reduced if possible.

Since the resource monitoring is at the system level, it could be a background service incurring all the I/O. However, other logs for this job show 8.5 and 9.5 GB, so it's likely either something that is always running or it's the tests themselves.
I applied the parallel xpcshell test patch and my SSD capped out at 268 MB/s somewhere in Toolkit (I think it was during the Places tests).
Mihnea: You said you had instrumented the xpcshell test runner to collect process counters. Do you have any results worth publishing?

It was speculated offline that much of this I/O could come from profile first-run activity. For that to be true, I'd expect a graph of write bytes per test to show a long, flat tail of tests each writing a similar, modest amount, rather than a few dominant spikes.
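Once per-test write counts exist, that hypothesis could be checked with a quick script like the sketch below. The function name and the sample data are illustrative assumptions, not anything from the harness:

```python
# Sketch: distinguish a flat long tail (profile first-run cost spread
# evenly across tests) from a few dominant tests, given a mapping of
# test name -> bytes written. Hypothetical helper, not harness code.

def top_share(write_bytes_per_test, top_fraction=0.1):
    """Fraction of total write I/O attributable to the top `top_fraction`
    of tests. A value near `top_fraction` means a flat tail; a value near
    1.0 means a few hot tests dominate."""
    sizes = sorted(write_bytes_per_test.values(), reverse=True)
    top_n = max(1, int(len(sizes) * top_fraction))
    total = sum(sizes)
    return sum(sizes[:top_n]) / total if total else 0.0
```

For example, if ten tests each wrote 100 MB, `top_share(data, 0.1)` would be about 0.1 (flat tail); if one test wrote nearly everything, it would approach 1.0.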

Also, Mihnea's try pushes running the xpcshell suite at very high parallelism seem to indicate that it is I/O bound, not CPU bound: we saw no wall-time or CPU% improvement at higher parallelism levels while I/O wait times increased. This is surprising to me, as I thought most tests wouldn't be doing that much I/O! Depending on the findings, we should consider optimizations to decrease I/O, or automation improvements such as running tests from a RAM disk. But let's collect data on where the I/O is coming from first.
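One simple way to attribute system-wide write I/O to individual tests is to sample a monotonic write-bytes counter before and after each test and record the delta. This is only a sketch of the idea: the sampler is injected as a callable (with psutil installed it could be `lambda: psutil.disk_io_counters().write_bytes`, a real psutil API), and the runner shape is an assumption, not the harness's actual interface:

```python
# Sketch: per-test write-I/O attribution by differencing a monotonically
# increasing system counter around each test run. Because the counter is
# system-wide, background services' writes land in whichever test is
# running at the time (the same caveat as in comment 1).

def measure_per_test_io(tests, run_test, read_write_bytes):
    io_by_test = {}
    for name in tests:
        before = read_write_bytes()
        run_test(name)  # the test itself is a black box here
        io_by_test[name] = read_write_bytes() - before
    return io_by_test
```

Sorting the resulting dict by value would directly answer whether the suite shows a flat tail or a handful of I/O-heavy tests (Places, say).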
Flags: needinfo?(mihneadb)
Product: mozilla.org → Release Engineering
I added support for psutil to the new parallel harness. Now that it has landed and things have cooled off a bit, I'll add per-test data collecting and start studying that data.

Should this be enabled all the time, or should we keep it behind a flag (--profile/--instrument)?
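If it stays behind a flag, the wiring could look like this minimal sketch. The flag name (`--instrument`) and the program name are assumptions for illustration, not the harness's real command line:

```python
import argparse

# Sketch: keep per-test resource collection opt-in behind a flag.
# The --instrument flag name is hypothetical.
parser = argparse.ArgumentParser(prog="runxpcshelltests")
parser.add_argument(
    "--instrument",
    action="store_true",
    default=False,
    help="collect per-test CPU/I-O counters via psutil (off by default)",
)

args = parser.parse_args(["--instrument"])
```

Defaulting to off avoids paying the sampling overhead (and the psutil dependency) on every run while still letting us gather data on demand.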
Flags: needinfo?(mihneadb)
Depends on: 911582
I'm going to move this into Testing :: xpcshell for now. Will open a new bug to track the RelEng side of things.
Component: General Automation → XPCShell Harness
Product: Release Engineering → Testing
QA Contact: catlee
Version: other → Trunk
Blocks: fastci
Severity: normal → S3