Non-incremental GCs are skewing tp5o responsiveness
Categories: Testing :: Talos, defect, P3
Tracking: Not tracked
People: Reporter: davehunt, Unassigned
Whiteboard: [fxp]
+++ This bug was initially created as a clone of Bug #1517758 +++
From bug 1517758 comment 4:
The "tp5o responsiveness" benchmark is flawed.
How it works: a background thread sends a trace event to the native event queue (GTK, Windows, etc.) and then we measure the time it takes Gecko to handle it. If that's more than 20 ms, we report it, and the harness does some calculations on these values to determine the final responsiveness score. (We also ensure we don't trigger more than one event in each 10 ms window.)

That's pretty reasonable, but the problem is that the benchmark harness triggers thousands of non-incremental GCs with windowUtils.garbageCollect(). So what we end up measuring is effectively non-incremental GC times: if non-incremental GCs get a bit slower, we end up with a higher score.
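To make the failure mode concrete, here is a minimal simulation of the idea. The numbers, names, and the model of a blocked event loop are assumptions for illustration only; this is not the actual Talos or Gecko code.

# Illustrative only: trace events are posted every 10 ms and the event loop
# is occasionally blocked, e.g. by a forced non-incremental GC.
EVENT_SPACING_MS = 10   # at most one trace event per 10 ms window
THRESHOLD_MS = 20       # only handling delays above 20 ms are reported

def simulate(blocked_periods_ms, total_ms=1000):
    """blocked_periods_ms: (start, length) intervals during which the event
    loop cannot service the queue (a stand-in for a forced GC)."""
    delays = []
    for posted in range(0, total_ms, EVENT_SPACING_MS):
        handled = posted
        for start, length in blocked_periods_ms:
            if start <= handled < start + length:
                # The event sits in the queue until the blocking work ends.
                handled = start + length
        delay = handled - posted
        if delay > THRESHOLD_MS:
            delays.append(delay)
    return delays

# A single 150 ms forced GC: every trace event posted while it runs is
# reported late, so the "responsiveness" data is dominated by the GC pause.
print(simulate([(300, 150)]))  # [150, 140, 130, ..., 30]

With that one 150 ms pause in place, thirteen consecutive intervals exceed the threshold, and every one of them is really just measuring the forced GC.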
Incremental GCs are what you want when it comes to responsiveness, and by forcing so many non-incremental GCs we're just making things hard for ourselves and skewing the results to something non-representative.

If I make nsDOMWindowUtils::GarbageCollect a no-op, I get a score of 0.04 on Linux64 Opt on Try (compare to 0.97 -> 1.08 in comment 0).
Now why do my patches make non-incremental GCs slower? When we call
windowUtils.garbageCollect(), we preserve JIT code and type information in
that zone because it's active. That now means the whole system zone instead
of just that window. Tracing JIT code and ICs is a bit slower than just
discarding it, hence the GC becoming a bit slower. If I make
GCRuntime::shouldPreserveJITCode always return false, I get a score of
0.69-0.73, better than before.
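As a toy sketch of the tradeoff described above (the Zone/Script structures and trace() are invented for illustration; this is not the SpiderMonkey implementation, and the real GCRuntime::shouldPreserveJITCode logic is more involved):

from dataclasses import dataclass, field

@dataclass
class Script:
    jit_code: object = None
    inline_caches: list = field(default_factory=list)

@dataclass
class Zone:
    scripts: list = field(default_factory=list)

def trace(thing):
    """Stand-in for marking the GC pointers embedded in `thing`."""
    pass

def collect_zone(zone: Zone, preserve_jit_code: bool) -> None:
    for script in zone.scripts:
        if preserve_jit_code:
            # Keeping the code alive means the collector has to walk every
            # script's compiled code and inline caches, which costs time
            # proportional to how much JIT code the zone holds.
            trace(script.jit_code)
            for ic in script.inline_caches:
                trace(ic)
        else:
            # Discarding is cheap: drop the code and recompile later if the
            # script gets hot again.
            script.jit_code = None
            script.inline_caches.clear()

The point in the comment is that after the patches, "that zone" is the whole system zone rather than a single window's zone, so the preserve path has far more JIT code and ICs to walk.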
Comment 1•5 years ago
Some other comments there are relevant:
bug 1517758 comment 11 from bz:
I think the upshot is that you need to either stop doing forced gc in this test or ignore all events which are measuring time across a gc you're forcing yourself.
bug 1517758 comment 13 from bz:
It's not exactly responsiveness data "between the markers". It's data that corresponds to any interval that started before the end of a GC and ended after the start of the same GC.
So depending on the structure of the setup, the data could be coming in after the "end GC" marker, for example.
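The filter bz describes is easy to express; a sketch, assuming measurements and forced GCs are recorded as (start, end) timestamp pairs (an assumption about the data layout, not what Talos actually stores):

def overlaps_forced_gc(interval, forced_gcs):
    """True if the measured interval started before the end of a forced GC
    and ended after the start of that same GC (i.e. it overlaps it)."""
    start, end = interval
    return any(start < gc_end and end > gc_start
               for gc_start, gc_end in forced_gcs)

def filter_measurements(intervals, forced_gcs):
    """Drop any responsiveness interval that spans a GC we forced ourselves."""
    return [iv for iv in intervals if not overlaps_forced_gc(iv, forced_gcs)]

# The second interval overlaps the forced GC at (100, 180) and is discarded,
# even though it ends after the "end GC" marker.
print(filter_measurements([(0, 30), (90, 200), (250, 280)], [(100, 180)]))
# -> [(0, 30), (250, 280)]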
Comment 2•5 years ago
This doesn't just do a non-incremental GC; it does a cycle collection too:
https://searchfox.org/mozilla-central/source/dom/base/nsDOMWindowUtils.cpp#1093
This is something that never happens in real use (unless the system is OOM). So this test is not representative of anything we want to measure.
| Reporter | ||
Comment 3•3 months ago
|
||
@sparky it seems we don't have an owner/contact for tp5o responsiveness outside of the perftest team. Do you think this is worth fixing, or, if the test isn't doing what we want it to, could it make sense to remove the test?

@kshampur I know you're looking into the relative value of our tests, do you have any thoughts on tp5o?
Comment 4•3 months ago
Seems like it's worth fixing to me, considering the comments above. I know we'd like to update the pageset used for these tests eventually, since tp5 is very old now, so this could be done at the same time as those changes.
Comment 5•3 months ago
I agree on fixing it.

I can't find a bug, but it seems we disabled reporting on the responsiveness metric/test? (Does this ring a bell? Perhaps I'm looking in the wrong places...)

Also, I think we should decouple the upgrade of the tp5 pageset from the garbage collection fix.