Open Bug 1045205 Opened 5 years ago Updated 3 years ago

mochitest performance regression from structure logging on tests with large numbers of assertions

Categories

(Testing :: Mochitest, defect, critical)

defect
Not set
critical

Tracking

(Not tracked)

People

(Reporter: dbaron, Unassigned)

References

Details

(Keywords: perf)

Attachments

(1 file)

Bug 886570 (structured logging in mochitest) caused a performance regression, documented and worked around in bug 1041075, on mochitests with large numbers of assertions, at least on Android.

This performance regression should probably be fixed.
We should certainly optimise here, but it's unclear that "zero performance regression" is a viable goal; structured logging does more work in order to retain more data; just writing a string to stdout is cheap compared to serializing a data structure, reparsing the serialized data and writing it again in a backwards-compatible format.
Given that this happened specifically with a large number of assertions, I'm concerned that there might be a small O(N^2) term here, which could be problematic.
Current theory is that this may be why we're seeing the worst Windows test backlogs in the past few weeks than I can recall ever seeing. Can we please bump the priority on this?
Severity: normal → critical
I did a profile of a mochitest run:
https://pastebin.mozilla.org/5826770

Granted, this isn't windows or debug though. Will do another on a debug build when I get a chance.
Since a profile of a screaming fast linux machine probably isn't all that useful, here's a try run with the same profiling enabled:
https://tbpl.mozilla.org/?tree=Try&rev=05104cac4251

Note the windows jobs will likely take awhile..
Oh, that is profiling the main thread while output processing (where the bulk of the structured logging happens) is on a separate thread.

This run should profile the correct thread:
https://tbpl.mozilla.org/?tree=Try&rev=a2311e4d690c
Here's a profile from the WinXP debug job. It looks more like what I was expecting.

I added up all the cumulative times for functions in structuredlog, and it came to around 26 seconds (this is an upper bound, in reality the time spent in structuredlog is somewhere between 8.5-26 seconds).

What this doesn't take into account are the changes made on the JS side. It's possible those are much slower than before.. I'd be surprised if that were the case, but I guess it's worth grabbing a profile of that as well.
I uploaded a profile from a try run of a subset of mochitests (from layout/style) on Windows debug here:

https://tbpl.mozilla.org/?tree=Try&rev=935ee1785c13

The profiles are uploaded to blobber, I used http://people.mozilla.org/~bgirard/cleopatra/ to view them. Searching through for "StructuredLogger" yields < 1% of time samples, although after reading a little about how the profiler works I'm a little worried we end up with too few samples for this to be accurate.
Profiling Gecko seems unlikely to help when the performance regression is likely to be in other processes, no?
My goal was to profile the parts of Mochitest that are implemented in javascript. The profile has entries mentioning SimpleTest.js and TestRunner.js, but perhaps I haven't done that?
See Also: → 1317732
Blocks: 1317732
See Also: 1317732
In bug 1317732, reducing the (somewhat excessive) logging in test_value_storage.html decreased the run time on Android Debug by more than 50%. That seems to suggest there is potential for significant impact if structured logging performance can be improved.
You need to log in before you can comment on or make changes to this bug.