Non-incremental GCs are skewing tp5o responsiveness
Categories: Testing :: Talos, defect, P3
Tracking: Not tracked
People: Reporter: davehunt, Unassigned
Whiteboard: [fxp]
+++ This bug was initially created as a clone of Bug #1517758 +++
From bug 1517758 comment 4:
The "tp5o responsiveness" benchmark is flawed.
How it works: a background thread sends a trace event to the native event queue (GTK, Windows, etc.) and then we measure the time it takes Gecko to handle it. If that's more than 20 ms, we report it, and the harness does some calculations on these values to determine the final responsiveness score. (We also ensure we don't trigger more than one event in each 10 ms window.)

That's pretty reasonable, but the problem is that the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect(). So what we end up measuring is effectively non-incremental GC times: if non-incremental GCs get a bit slower, we end up with a higher score.
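To make the failure mode concrete, here is a minimal simulation of the idea. The numbers, names, and the model of a blocked event loop are assumptions for illustration only; this is not the actual Talos or Gecko code.

# Illustrative only: trace events are posted every 10 ms and the event loop
# is occasionally blocked, e.g. by a forced non-incremental GC.
EVENT_SPACING_MS = 10   # at most one trace event per 10 ms window
THRESHOLD_MS = 20       # only handling delays above 20 ms are reported

def simulate(blocked_periods_ms, total_ms=1000):
    """blocked_periods_ms: (start, length) intervals during which the event
    loop cannot service the queue (a stand-in for a forced GC)."""
    delays = []
    for posted in range(0, total_ms, EVENT_SPACING_MS):
        handled = posted
        for start, length in blocked_periods_ms:
            if start <= handled < start + length:
                # The event sits in the queue until the blocking work ends.
                handled = start + length
        delay = handled - posted
        if delay > THRESHOLD_MS:
            delays.append(delay)
    return delays

# A single 150 ms forced GC: every trace event posted while it runs is
# reported late, so the "responsiveness" data is dominated by the GC pause.
print(simulate([(300, 150)]))  # [150, 140, 130, ..., 30]

With that one 150 ms pause in place, thirteen consecutive intervals exceed the threshold, and every one of them is really just measuring the forced GC.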
Incremental GCs are what you want when it comes to responsiveness, and by forcing so many non-incremental GCs we're just making things hard for ourselves and skewing the results to something non-representative.

If I make nsDOMWindowUtils::GarbageCollect a no-op, I get a score of 0.04 on Linux64 Opt on Try (compare to 0.97 -> 1.08 in comment 0).
Now why do my patches make non-incremental GCs slower? When we call
windowUtils.garbageCollect(), we preserve JIT code and type information in
that zone because it's active. That now means the whole system zone instead
of just that window. Tracing JIT code and ICs is a bit slower than just
discarding it, hence the GC becoming a bit slower. If I make
GCRuntime::shouldPreserveJITCode always return false, I get a score of
0.69-0.73, better than before.
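As a toy sketch of the tradeoff described above (the Zone/Script structures and trace() are invented for illustration; this is not the SpiderMonkey implementation, and the real GCRuntime::shouldPreserveJITCode logic is more involved):

from dataclasses import dataclass, field

@dataclass
class Script:
    jit_code: object = None
    inline_caches: list = field(default_factory=list)

@dataclass
class Zone:
    scripts: list = field(default_factory=list)

def trace(thing):
    """Stand-in for marking the GC pointers embedded in `thing`."""
    pass

def collect_zone(zone: Zone, preserve_jit_code: bool) -> None:
    for script in zone.scripts:
        if preserve_jit_code:
            # Keeping the code alive means the collector has to walk every
            # script's compiled code and inline caches, which costs time
            # proportional to how much JIT code the zone holds.
            trace(script.jit_code)
            for ic in script.inline_caches:
                trace(ic)
        else:
            # Discarding is cheap: drop the code and recompile later if the
            # script gets hot again.
            script.jit_code = None
            script.inline_caches.clear()

The point in the comment is that after the patches, "that zone" is the whole system zone rather than a single window's zone, so the preserve path has far more JIT code and ICs to walk.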
Comment 1•5 years ago
Some other comments there are relevant:
bug 1517758 comment 11 from bz:
I think the upshot is that you need to either stop doing forced gc in this test or ignore all events which are measuring time across a gc you're forcing yourself.
bug 1517758 comment 13 from bz:
It's not exactly responsiveness data "between the markers". It's data that corresponds to any interval that started before the end of a GC and ended after the start of the same GC.
So depending on the structure of the setup, the data could be coming in after the "end GC" marker, for example.
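The filter bz describes is easy to express; a sketch, assuming measurements and forced GCs are recorded as (start, end) timestamp pairs (an assumption about the data layout, not what Talos actually stores):

def overlaps_forced_gc(interval, forced_gcs):
    """True if the measured interval started before the end of a forced GC
    and ended after the start of that same GC (i.e. it overlaps it)."""
    start, end = interval
    return any(start < gc_end and end > gc_start
               for gc_start, gc_end in forced_gcs)

def filter_measurements(intervals, forced_gcs):
    """Drop any responsiveness interval that spans a GC we forced ourselves."""
    return [iv for iv in intervals if not overlaps_forced_gc(iv, forced_gcs)]

# The second interval overlaps the forced GC at (100, 180) and is discarded,
# even though it ends after the "end GC" marker.
print(filter_measurements([(0, 30), (90, 200), (250, 280)], [(100, 180)]))
# -> [(0, 30), (250, 280)]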
Comment 2•5 years ago
This doesn't just do a non-incremental GC; it does a cycle collection too:
https://searchfox.org/mozilla-central/source/dom/base/nsDOMWindowUtils.cpp#1093
This is something that never happens in real use (unless the system is OOM). So this test is not representative of anything we want to measure.
| Reporter | ||
Comment 3•3 months ago
|
||
@sparky it seems we don't have an owner/contact for tp5o responsiveness outside of the perftest team. Do you think this is worth fixing, or, if the test isn't doing what we want it to, could it make sense to remove the test?

@kshampur I know you're looking into the relative value of our tests, do you have any thoughts on tp5o?
Comment 4•3 months ago
Seems like it's worth fixing to me, considering the comments above. I know we'd like to update the pageset used for these tests eventually, since tp5 is very old now, so this could be done at the same time as those changes.
Comment 5•3 months ago
I agree on fixing it.

I can't find a bug, but it seems we disabled reporting on the responsiveness metric/test? (Does this ring a bell? Perhaps I'm looking in the wrong places...)

Also, I think we should decouple the upgrade of the tp5 pageset from the garbage collection fix.