Closed Bug 1243379 Opened 8 years ago Closed 2 months ago

Find a way to report through Telemetry what's eating up GC time

Categories

(Core :: JavaScript: GC, defect)

Tracking

()

RESOLVED WONTFIX

People

(Reporter: Yoric, Unassigned)

Details

While we do not have hard numbers to back this hunch, it looks like longer sessions lead to more time spent doing GC (and CC). It would be useful to find a way to extract some information from the GC telling us where it is spending time.

Note that we do not need exact information, nor do we need it to be on all the time. What we do need is information in-the-wild, not just on our computers.
Some info that could be useful:
- do we have long-lived content memory?
- do we spend lots of time traversing content memory (respectively chrome memory)?
Putting fitzgen in the loop, as he has worked recently on making gc impact understandable.
Flags: needinfo?(nfitzgerald)
(In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from comment #0)
> What we do need is information in-the-wild, not just on our computers.

Actually, I think the first step would be to create a procedure which could work locally, and after it works reasonably effectively, try to generalize it to something which could also carry meaning in an aggregate (telemetry).

I.e. I think we should start with a local tool which could say "X is causing us to spend a lot of time in GC/CC", where X could be a (pinned?) web page, an addon, something in Firefox itself, etc.
There is a lot of existing GC-related telemetry that is pretty heavily used by GC folks -- what is that missing for you?
Flags: needinfo?(nfitzgerald)
Telemetry just tells us that average/median/... times have improved or regressed. It doesn't hint at all at what might be leaking.
As far as I understand we're talking about more detailed information here. Something like names of the roots in the suspected subgraphs or something similar (at least in case of CC, not sure about GC).
There is also the gcreason for why the GC was triggered: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-01-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F47&measure=GC_REASON_2&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-01-25&table=0&trim=1&use_submission_date=0

Whether the GC was incremental or not: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-01-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F47&measure=GC_NON_INCREMENTAL&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-01-25&table=0&trim=1&use_submission_date=0

The longest phase in any slice that goes over 2x the budget: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-01-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F47&measure=GC_SLOW_PHASE&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-01-25&table=0&trim=1&use_submission_date=0

Whether GCs are compartmental or full runtime: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-01-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F47&measure=GC_IS_COMPARTMENTAL&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-01-25&table=0&trim=1&use_submission_date=0

Reason that caused a long (>1ms) minor GC: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-01-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F47&measure=GC_MINOR_REASON_LONG&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-01-25&table=0&trim=1&use_submission_date=0

Plus a bunch of others.
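
As an illustration of the "longest phase in any slice that goes over 2x the budget" probe described above, here is a minimal sketch of that selection rule. It is not the SpiderMonkey implementation; the function name `slow_phase`, its parameters, and the phase names in the usage note are all hypothetical.

```python
from typing import Dict, Optional


def slow_phase(budget_ms: float, slice_ms: float,
               phase_ms: Dict[str, float]) -> Optional[str]:
    """Return the longest phase of a slice that overran its budget.

    Hypothetical helper: a slice counts as "slow" only when its total
    duration exceeds twice its budget, matching the description above.
    """
    if slice_ms <= 2 * budget_ms or not phase_ms:
        # Slice stayed within 2x the budget (or no phases were recorded):
        # nothing to report.
        return None
    # Pick the phase with the largest recorded duration.
    return max(phase_ms, key=phase_ms.get)
```

For example, a 25 ms slice with a 10 ms budget and phases `{"mark": 15, "sweep": 8}` would report `"mark"`, while a 15 ms slice with the same budget would report nothing.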
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #6)
> There is also the gcreason for why the GC was triggered ...
> Plus a bunch of others.

Can any of those help us identify what are the high-level-entities* which are causing us to spend a lot of time on GC or CC?

* "High-level entity" is not well defined, but hopefully it might be a specific web page, an addon name, a Firefox module which leaks, etc. I.e., can they point at something which we can try to investigate further?
Side-note: "specific web page" is probably something that we cannot send through Telemetry, but we might theoretically be able to do some pre-processing, to send metadata:
- how long the page has been opened;
- which DOM features it uses;
- which JS features it uses;
- ...
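
The pre-processing idea in the list above could look roughly like the following sketch, which reduces a page to coarse, non-identifying metadata before anything is reported. Everything here is an assumption made for illustration: the bucket edges, the `PageMetadata` record, and the feature names are hypothetical and do not correspond to any existing Telemetry probe.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical bucket edges, in seconds: <1 min, <10 min, <1 h, <1 day, longer.
OPEN_TIME_BUCKETS = [60, 600, 3600, 86400]


def bucket_open_time(seconds: float) -> int:
    """Return the index of the first bucket whose upper edge exceeds `seconds`."""
    for index, edge in enumerate(OPEN_TIME_BUCKETS):
        if seconds < edge:
            return index
    return len(OPEN_TIME_BUCKETS)


@dataclass(frozen=True)
class PageMetadata:
    """Coarse description of a page; deliberately has no URL field."""
    open_time_bucket: int
    dom_features: Tuple[str, ...]
    js_features: Tuple[str, ...]


def summarize_page(url: str, open_seconds: float,
                   dom_features: Iterable[str],
                   js_features: Iterable[str]) -> PageMetadata:
    # The URL is accepted only to make explicit that it is dropped here.
    del url
    return PageMetadata(
        open_time_bucket=bucket_open_time(open_seconds),
        # Deduplicate and sort so equal feature sets produce equal records.
        dom_features=tuple(sorted(set(dom_features))),
        js_features=tuple(sorted(set(js_features))),
    )
```

Bucketing the open time instead of sending the exact duration is a design choice in this sketch: it keeps the record coarse enough to aggregate while still separating short-lived pages from long-lived ones.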
Severity: normal → S3

sfink: I'm inclined to close this, but if there's still value to thinking about this let me know.

Flags: needinfo?(sfink)

Since I don't expect us to specifically work on this anytime soon unless we get specific examples to poke at, I'm in favor of closing. I would say that today we would satisfy this need with a combination of the telemetry values that fitzgen gave in comment 6 for aggregate in-the-field values, and the devtools memory profiler (that fitzgen implemented the backend for, iirc—notice a theme?) for local examination and identification of the specific types of objects etc.

We don't really have anything for getting actionable specific information from in-the-field executions, sadly, but I can't think of any immediate ideas for how to do that without making it a big Project. I could imagine an automatically-triggered "hey your browser is struggling here, want to collect and send over some debugging info?" mode. Or, when things bog down, have a 0.1% chance of grabbing a random sample with a basic census of JSClass buckets or something. It's a sensitive UX issue, though, especially when balancing privacy vs actionability vs the users' experience: you don't want to mess with kiosk users or to freak out Uncle Kevin who firmly believes the browser running on his Miracast-capable electric toothbrush ought to be able to browse any and everything and if it has trouble it's an evil guvmint conspiracy to force him to spend money on a computer that will cut into his sardine and saltine cracker budget, and anyway it'll spy on him and enrich the billionaire jerks who...

Sorry, got a little carried away there. In short, good idea, but I think to make something like this actually happen we'd need a bug filed with a more specific idea for what to gather and how.
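
For anyone filing that more specific bug, the "0.1% chance of grabbing a basic census" idea from this comment might be sketched as below. This is only an illustration under stated assumptions: `maybe_census`, its parameters, and the notion of passing class names as plain strings are hypothetical, and nothing here reflects an existing Firefox API.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

SAMPLE_PROBABILITY = 0.001  # the "0.1% chance" mentioned in the comment


def maybe_census(class_names: Iterable[str],
                 is_bogged_down: bool,
                 rng: Callable[[], float] = random.random,
                 top_n: int = 5) -> Optional[Dict[str, int]]:
    """Occasionally return object counts bucketed by class name.

    Hypothetical sketch: returns None unless the browser is struggling
    AND this call wins the 0.1% lottery, so the cost is rarely paid.
    """
    if not is_bogged_down or rng() >= SAMPLE_PROBABILITY:
        return None
    counts = Counter(class_names)
    # Keep only the largest buckets to bound the size of the report.
    return dict(counts.most_common(top_n))
```

Passing `rng` as a parameter is purely for testability here; a caller could supply `lambda: 0.0` to force a sample when trying the sketch locally.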

Status: NEW → RESOLVED
Closed: 2 months ago
Flags: needinfo?(sfink)
Resolution: --- → WONTFIX