Handle Telemetry::Accumulate in the GPU process

RESOLVED FIXED

Status

()

defect
P3
normal
RESOLVED FIXED
3 years ago
6 months ago

People

(Reporter: dvander, Assigned: dvander)

Tracking

(Blocks 2 bugs, {feature})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [gfx-noted])

Attachments

(4 attachments, 4 obsolete attachments)

Initially, I figured we wouldn't need any telemetry directly in the GPU process. Anything truly interesting can be sent up to the UI. However, it looks like APZ has some Telemetry instrumentation for checkerboarding, and more cases like this might come up as we near shipping. I'd rather not make each of these its own ad-hoc message with ad-hoc handling. Also, evidence in bug 1218576 is that sending an IPDL message for each Accumulate is too much overhead.

So it sounds like we want the GPU process to be able to call Accumulate and have it just work. To do this it looks like we need:
 (a) Batched accumulate essages in PGPU similar to what PContent has [1].
 (b) Hook those up to Telemetry when XRE_IsGPUProcess is true.
 (c) Refactor Telemetry.cpp to support multiple subprocess suffixes (i.e. #gpu in addition to #content).
 (d) Ensure the telemetry JSM puts #gpu data into pings.
 (e) Ensure that appropriate dashboards will accumulate both #content and #gpu data.

From IRC:
11:59 AM <chutten> Let me draw a diagram in ASCII... content process calls "Accumulate" which calls "RemoteAccumulate" which batches up your call to be sent across IPC. When it's time, content process calls exactly that SendAccumulateChildHistogram as you say, which passes the batch to the parent process' "AccumulateChild" which then adds the necessary suffixes (or passes the
11:59 AM <chutten> necessary booleans) to ensure it is recorded in a Histogram object in RAM
12:00 PM <chutten> Then, when TelemetryController.jsm wants to make a payload object (when someone opens about:telemetry, or we're writing a ping to disk, or whatever) it pulls all histograms' into JS and puts them in the right place in the payload object

[1] https://dxr.mozilla.org/mozilla-central/rev/62f79d676e0e11b3ad59a5425b3ebb3ec5bbefb5/dom/ipc/PContent.ipdl#1224
Keywords: feature
Whiteboard: [gfx-noted]
This just makes sure nsITelemetry actually instantiates in the GPU process. There are three changes here:

(1) Tweaked Telemetry's XPCOM bindings so the GPU process will opt it in.

(2) Since Telemetry is an XPCOM service, it won't instantiate until something requests its CID. Normally this would happen via JSMs, but the GPU process doesn't load those. To work around this I just forced a do_GetService call after the GPU process inits XPCOM. (Normally we would just call a Startup function directly, but Telemetry doesn't expose one.)

(3) Changed InitializeGlobalState to recognize the GPU process.
Assignee: nobody → dvander
Status: NEW → ASSIGNED
Attachment #8793609 - Flags: review?(chutten)
This makes AccumulateChild/etc take a Process enumeration. All the helper functions in TelemetryHistogram have been updated to switch based on this enum instead of an isChild flag. Places that assumed "child" meant "content" now explicitly refer to content.

Some helper functions for deducing and adding suffixes have been introduced as well.

I added an "Unspecified" process type just because it wasn't clear if |isChild == false| implied "parent process". It looked like some cases might just mean "don't try to do any more decoration". If that assumption is wrong then that enum value can be removed.
Attachment #8793611 - Flags: review?(chutten)
Posted patch part 3, gpu process support (obsolete) — Splinter Review
Note that this looks asymmetrical to content process support since GPUParent lives in the child process and GPUChild lives in the UI (parent) process.
Attachment #8793612 - Flags: review?(chutten)
Comment on attachment 8793609 [details] [diff] [review]
part 1, ensure telemetry loads

Review of attachment 8793609 [details] [diff] [review]:
-----------------------------------------------------------------

Pulling :gfritzsche in to look at this module init stuff as my knowledge of it is very weak.
Attachment #8793609 - Flags: review?(gfritzsche)
Attachment #8793609 - Flags: review?(chutten)
Attachment #8793609 - Flags: feedback+
I think this is an interesting enough change that i should look over it anyway.
I'm at a work-week though, so it might take until monday to take a closer look.
Comment on attachment 8793611 [details] [diff] [review]
part 2, multi-child-process support

Review of attachment 8793611 [details] [diff] [review]:
-----------------------------------------------------------------

A few questions, but I'm liking it over all!

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +477,4 @@
>  {
> +  nsDependentCString name(aName);
> +  Process process = GetProcessFromName(name);
> +  const char* suffix = SuffixForProcessType(process);

Hm. I would've if (process == Process::Parent) and saved the suffix for the non-parent case. Either way's fine.

Your comment is correct: !content meant parent within the context of TelemetryHistogram. No need for Process::Unspecified, I think.

@@ +511,5 @@
> +  Histogram** knownList = nullptr;
> +
> +  switch (aProcess) {
> +  case Process::Parent:
> +  case Process::Unspecified:

If you want to keep Unspecified around, fall into the default case of getting the histogram info from gHistograms and calling through to internal_HistogramGet. These are just caches and aren't required.

@@ +652,5 @@
> +
> +  Process process = GetProcessFromName(existing.histogram_name());
> +  switch (process) {
> +  case Process::Parent:
> +  case Process::Unspecified:

ditto

@@ +1329,5 @@
>        return;
>      }
>    } else {
>      Histogram *h;
> +    nsresult rv = internal_GetHistogramByEnumId(aID, &h, Process::Unspecified);

definitely called on parent.

...actually, we don't guarantee that, do we. Hrm.

Well, only the parent versions of histograms can have their recording disabled... this seems like it might be something I missed in bug 1218576.

I have filed bug 1304730 to look into this further. For now, assume parent or whatever is most convenient.

@@ +1495,3 @@
>                                const nsCString& aKey, uint32_t aSample)
>  {
> +  // :TODO:

Checking in a Todo?

@@ +2290,5 @@
>      return;
>    }
>  
>    Histogram *h;
> +  nsresult rv = internal_GetHistogramByEnumId(aId, &h, Process::Unspecified);

If you are willing, through in a MOZ_ASSERT(XRE_IsParentProcess()); in here and assume Parent. It should never actually be called.

...actually, scrap that, I've filed Bug 1304735 to remove ClearHistogram entirely, so do what you will :)

::: toolkit/components/telemetry/TelemetryTypes.h
@@ +8,5 @@
> +
> +namespace mozilla {
> +namespace Telemetry {
> +
> +enum class Process {

Any reason to not use nsXULAppAPI.h's GeckoProcessType? I believe we're allowed to assume that Telemetry is running inside of Gecko.
Attachment #8793611 - Flags: review?(chutten)
Comment on attachment 8793612 [details] [diff] [review]
part 3, gpu process support

Review of attachment 8793612 [details] [diff] [review]:
-----------------------------------------------------------------

Looks suitable, but may need to adapt to changes on earlier patches.

Also, it seems as though :gfritzsche wants in on this :)
Attachment #8793612 - Flags: review?(chutten) → feedback+
Yeah, we can use GeckoProcessType if Unspecified isn't necessary. I'll put a new patch up today. 

I don't know what that TODO is. I don't usually write those, so it was probably a search marker.
Addresses review comments - now uses GeckoProcessType.
Attachment #8793611 - Attachment is obsolete: true
Attachment #8793945 - Flags: review?(gfritzsche)
Posted patch part 3, gpu process support (obsolete) — Splinter Review
Rebased.
Attachment #8793612 - Attachment is obsolete: true
Attachment #8793946 - Flags: review?(gfritzsche)
With this I see gpu stuff get included in my saved sessions. I'm not sure I see anything in the about:telemetry UI though.

A few thoughts:

(1) Is it worth excluding the "gpu" key when the GPU process is not enabled?
(2) What kind of tests can I write to make sure this stays working?
(3) I see a lot of pointless histograms get included, potentially increasing the payload size. Like "SOCIAL_ENABLED_ON_SESSION". It looks like this is by design for "flag" histograms. Is it worth finding a way to exclude stuff that is totally irrelevant in the GPU process?
Attachment #8794458 - Flags: review?(gfritzsche)
(In reply to David Anderson [:dvander] from comment #11)
> Created attachment 8794458 [details] [diff] [review]
> part 4, include GPU histograms in session pings
> 
> With this I see gpu stuff get included in my saved sessions. I'm not sure I
> see anything in the about:telemetry UI though.

In theory the gpu process histograms should just show up, you should see them after refreshing about:telemetry.
They should be visisble in:
* the raw json (switch "ping display" to "raw json")
* the histogram rendering, when switching to "gpu" process on the far right: http://imgur.com/a/5jqL6
(In reply to David Anderson [:dvander] from comment #11)
> (1) Is it worth excluding the "gpu" key when the GPU process is not enabled?

Yes, that would be great to avoid the noise from default flags etc.

> (2) What kind of tests can I write to make sure this stays working?

Is there any existing testing that loads the GPU process?
And could we accumulate some histograms there, then check that they get propagated properly?
Existing test coverage for "content" process Telemetry is in test_ChildHistograms.js, but i don't know if we can spawn a "gpu" process in a feasible way?

> (3) I see a lot of pointless histograms get included, potentially increasing
> the payload size. Like "SOCIAL_ENABLED_ON_SESSION". It looks like this is by
> design for "flag" histograms. Is it worth finding a way to exclude stuff
> that is totally irrelevant in the GPU process?

We need to solve that generally, not just for the "gpu" process. I would not worry about it here.
Bug 1281791 will look into disabling histograms by default in child processes except whitelisted ones.
Comment on attachment 8793609 [details] [diff] [review]
part 1, ensure telemetry loads

Review of attachment 8793609 [details] [diff] [review]:
-----------------------------------------------------------------

Nathan, can you take a look at the Module::ALLOW_IN_GPU_PROCESS additions in Telemetry.cpp?
Do you know if that looks right and if we need all of them?

::: toolkit/components/telemetry/Telemetry.cpp
@@ +1913,5 @@
> +  if (XRE_IsParentProcess() ||
> +      XRE_IsContentProcess() ||
> +      XRE_IsGPUProcess())
> +  {
> +    useTelemetry = true;

Nit: Initialize with these?
const bool useTelemetry = (XRE_Is...
Attachment #8793609 - Flags: review?(gfritzsche)
Attachment #8793609 - Flags: review+
Attachment #8793609 - Flags: feedback?(nfroyd)
Comment on attachment 8793945 [details] [diff] [review]
part 2, multi-child-process support v2

Review of attachment 8793945 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with the below fixed.

::: toolkit/components/telemetry/Telemetry.h
@@ +133,5 @@
>   * Accumulate child data into child histograms
>   *
>   * @param aAccumulations - accumulation actions to perform
>   */
> +void AccumulateChild(GeckoProcessType aProcessType, const nsTArray<Accumulation>& aAccumulations);

Nit: Also update the doc comment above.

@@ +140,5 @@
>   * Accumulate child data into child keyed histograms
>   *
>   * @param aAccumulations - accumulation actions to perform
>   */
> +void AccumulateChildKeyed(GeckoProcessType aProcessType, const nsTArray<KeyedAccumulation>& aAccumulations);

Nit: Also update the doc comment above.

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +1495,5 @@
>    }
>    const HistogramInfo& th = gHistograms[aId];
>    nsCString id;
>    id.Append(th.id());
> +  if (const char* suffix = SuffixForProcessType(aProcessType)) {

suffix should never be null here.
Lets MOZ_ASSERT() that and also early-return if suffix==null.
Attachment #8793945 - Flags: review?(gfritzsche) → review+
Comment on attachment 8793946 [details] [diff] [review]
part 3, gpu process support

Review of attachment 8793946 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +2083,5 @@
> +      nsCString gpuId(id);
> +      gpuId.AppendLiteral(GPU_HISTOGRAM_SUFFIX);
> +      gKeyedHistograms.Put(gpuId,
> +                           new KeyedHistogram(id, expiration, h.histogramType,
> +                                              h.min, h.max, h.bucketCount, h.dataset));

I'm not sure if we need that for the GPU process.
Chris, do you recall offhand what this covers?

@@ +2731,3 @@
>      }
> +    default:
> +      NS_WARNING("Unsupported process type");

Add a MOZ_ASSERT()?
Attachment #8793946 - Flags: review?(gfritzsche) → feedback+
(In reply to Georg Fritzsche [:gfritzsche] from comment #16)
> Comment on attachment 8793946 [details] [diff] [review]
> part 3, gpu process support
> 
> Review of attachment 8793946 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/TelemetryHistogram.cpp
> @@ +2083,5 @@
> > +      nsCString gpuId(id);
> > +      gpuId.AppendLiteral(GPU_HISTOGRAM_SUFFIX);
> > +      gKeyedHistograms.Put(gpuId,
> > +                           new KeyedHistogram(id, expiration, h.histogramType,
> > +                                              h.min, h.max, h.bucketCount, h.dataset));
> 
> I'm not sure if we need that for the GPU process.
> Chris, do you recall offhand what this covers?
Flags: needinfo?(chutten)
Comment on attachment 8794458 [details] [diff] [review]
part 4, include GPU histograms in session pings

Review of attachment 8794458 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine, provided about:telemetry is confirmed to work.
We still need to figure out test-coverage though, as well as not including the data when the GPU process is not enabled (unless we expect this to be the default very soon).
Attachment #8794458 - Flags: review?(gfritzsche) → feedback+
(In reply to Georg Fritzsche [:gfritzsche] from comment #17)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #16)
> > Comment on attachment 8793946 [details] [diff] [review]
> > part 3, gpu process support
> > 
> > Review of attachment 8793946 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: toolkit/components/telemetry/TelemetryHistogram.cpp
> > @@ +2083,5 @@
> > > +      nsCString gpuId(id);
> > > +      gpuId.AppendLiteral(GPU_HISTOGRAM_SUFFIX);
> > > +      gKeyedHistograms.Put(gpuId,
> > > +                           new KeyedHistogram(id, expiration, h.histogramType,
> > > +                                              h.min, h.max, h.bucketCount, h.dataset));
> > 
> > I'm not sure if we need that for the GPU process.
> > Chris, do you recall offhand what this covers?

When gathering the payload in TelemetrySession.jsm, see bug 1218576 comment 49

I never did file that bug for removing the requirement... I have rectified the lapse with bug 1305758
Flags: needinfo?(chutten)
Thanks for the quick reviews! This addresses the review comment, but also adds a small change: if we never launched a GPU process we don't bother forcing flag/etc histograms into JS snapshots.
Attachment #8793946 - Attachment is obsolete: true
Attachment #8795601 - Flags: review?(gfritzsche)
Comment on attachment 8795601 [details] [diff] [review]
part 3, gpu process support

for gfx changes
Attachment #8795601 - Flags: review?(matt.woodrow)
This version only adds the "gpu" key to the payload only if GPU histograms were accumulated. I can confirm that, with the GPU process enabled, they show up in about:telemetry.

Right now there isn't a way to force the GPU process on via tests, but fairly soon (~1month or so) we will try enabling it on Nightly for Windows. So maybe a better thing to do is to test that GPU histograms are accumulated when the GPU process is enabled? If that sounds good, is there an existing test I can use as a template?

I hope to add infrastructure for testing the GPU process soon (bug 1297260), but it's not there yet.
Attachment #8794458 - Attachment is obsolete: true
Attachment #8795603 - Flags: review?(gfritzsche)
Attachment #8795601 - Flags: review?(matt.woodrow) → review+
Attachment #8793609 - Flags: feedback?(nfroyd) → feedback+
Attachment #8795601 - Flags: review?(gfritzsche) → review+
(In reply to David Anderson [:dvander] from comment #22)
> Created attachment 8795603 [details] [diff] [review]
> part 4, include GPU histograms in session pings
> 
> This version only adds the "gpu" key to the payload only if GPU histograms
> were accumulated. I can confirm that, with the GPU process enabled, they
> show up in about:telemetry.
> 
> Right now there isn't a way to force the GPU process on via tests, but
> fairly soon (~1month or so) we will try enabling it on Nightly for Windows.
> So maybe a better thing to do is to test that GPU histograms are accumulated
> when the GPU process is enabled? If that sounds good, is there an existing
> test I can use as a template?

That sounds like a good option.
I don't in which test environments we can invoke the GPU process though.
Do you think that is possible from xpcshell tests? Or should we figure out a different testing environment?
(In reply to David Anderson [:dvander] from comment #22)
> If that sounds good, is there an existing
> test I can use as a template?

How can we trigger the GPU process & in which test suites?
Can we run JS in it to trigger histogram accumulations?
Or can we just use one existing histogram from the tests that are reliably recorded in a spawned GPU process?
Comment on attachment 8795603 [details] [diff] [review]
part 4, include GPU histograms in session pings

Review of attachment 8795603 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me with the below addressed and test coverage settled in a separate patch.

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +1303,5 @@
>        },
>      };
>  
> +    // Only include the GPU process if we've accumulated data for it.
> +    let gpuProcess = {

Minor: This seems redundant?

```
if (HISTOGRAM_SUFFIXES.GPU in histograms || ...) {
  payloadObj.process.gpu = {
    histograms: ...,
    ...
  };
```
Attachment #8795603 - Flags: review?(gfritzsche) → review+
Am going to land these four patches now and do tests as a follow-up shortly, since we'd like to try the GPU process on nightly tomorrow.
Keywords: leave-open

Comment 27

3 years ago
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2963a5b02f01
Ensure we start the Telemetry singleton in the GPU process. (bug 1304494 part 1, r=gfritzsche)
https://hg.mozilla.org/integration/mozilla-inbound/rev/0129ae28198b
Refactor TelemetryHistogram to support multiple child process types. (bug 1304494 part 2, r=gfritzsche)
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebb6abdb33b9
Send accumulated GPU process telemetry to the UI process. (bug 1304494 part 3, r=gfritzsche)
https://hg.mozilla.org/integration/mozilla-inbound/rev/9a76e10e5450
Add GPU process histograms to Telemetry pings. (bug 1304494 part 4, r=gfritzsche)
The leave-open keyword is there and there is no activity for 6 months.
:davidb, maybe it's time to close this bug?
Flags: needinfo?(dbolter)
Jessie can you find out if we need this bug left open?
Flags: needinfo?(dbolter) → needinfo?(jbonisteel)

Updated

6 months ago
Blocks: 1511076
The implementation of telemetry in the GPU process landed and has been working for a while.

It could be good to still add tests for it though. I've filed bug 1511076 to track that. We can close this bug.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 months ago
Flags: needinfo?(jbonisteel)
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.