Default histogram recording to main & content process

RESOLVED FIXED in Firefox 56

Status

()

Toolkit
Telemetry
P1
normal
RESOLVED FIXED
10 months ago
20 days ago

People

(Reporter: gfritzsche, Assigned: chutten)

Tracking

Trunk
mozilla56
Points:
2
Dependency tree / graph

Firefox Tracking Flags

(firefox54 wontfix, firefox55 fix-optional, firefox56 fixed)

Details

(Whiteboard: [measurement:client])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

10 months ago
Similar to bug 1333797 for scalars, we should limit the recorded histograms to specific process types before the GPU process starts to roll out to broader populations [1].

Proposed plan:
- add a required field `record_in_processes` to Histograms.json
- for all existing histograms, fill in `record_in_processes: ["main", "content"]`
- reach out to see which histograms need GPU process recording (:dvander, :ahughes?)
- make sure the new field doesn't surprise/break any data pipeline jobs

1: https://ashughes.com/?p=374
(Reporter)

Updated

8 months ago
Priority: P2 → P1
(Assignee)

Comment 1

8 months ago
The following non-flag histograms were present in payload.processes.gpu amongst Nightly pings submitted on April 18:

CHECKERBOARD_DURATION
CHECKERBOARD_PEAK
CHECKERBOARD_POTENTIAL_DURATION
CHECKERBOARD_SEVERITY
COMPOSITE_FRAME_ROUNDTRIP_TIME
COMPOSITE_TIME
COMPOSITOR_ANIMATION_DURATION
COMPOSITOR_ANIMATION_MAX_CONTIGUOUS_DROPS_APZ
COMPOSITOR_ANIMATION_MAX_CONTIGUOUS_DROPS_CHROME
COMPOSITOR_ANIMATION_MAX_CONTIGUOUS_DROPS_CONTENT
COMPOSITOR_ANIMATION_MAX_LAYER_AREA
COMPOSITOR_ANIMATION_THROUGHPUT_APZ
COMPOSITOR_ANIMATION_THROUGHPUT_CHROME
COMPOSITOR_ANIMATION_THROUGHPUT_CONTENT
CONTENT_RESPONSE_DURATION
D3D11_SYNC_HANDLE_FAILURE
DEVICE_RESET_REASON
FORCED_DEVICE_RESET_REASON
GPU_PROCESS_INITIALIZATION_TIME_MS
MEDIA_DECODER_BACKEND_USED
MEDIA_WMF_DECODE_ERROR
SCROLL_INPUT_METHODS

I expect this is a superset of hgrams we care about having from GPU processes, but we can start here.

(( As for flag histograms, there were none submitted on that day that had a non-false value. I'm thinking we successfully kept them from infecting gpu. ))

Hrm. I wonder if there's any code that expects flag histograms to have a valid value on every process or if I'll be able to cleanly skip all of them when reflecting to JS...
Assignee: nobody → chutten
Status: NEW → ASSIGNED
(Assignee)

Comment 2

8 months ago
Almost forgot gpu's keyed histograms:

[u'IPC_READ_MAIN_THREAD_LATENCY_MS',
 u'IPC_WRITE_LATENCY_MS',
 u'D3D11_COMPOSITING_FAILURE_ID',
 u'OPENGL_COMPOSITING_FAILURE_ID',
 u'CANVAS_WEBGL_ACCL_FAILURE_ID',
 u'IPC_MESSAGE_SIZE',
 u'IPC_READ_LATENCY_MS',
 u'IPC_WRITE_MAIN_THREAD_LATENCY_MS',
 u'MAIN_THREAD_RUNNABLE_MS']
(Assignee)

Comment 3

8 months ago
IPC_{READ,WRITE}_LATENCY_MS are vestigial from bug 1342635 and can be ignored.
Comment hidden (mozreview-request)

Comment 5

7 months ago
mozreview-review
Comment on attachment 8866031 [details]
bug 1335343 - Add initial record_in_processes support

https://reviewboard.mozilla.org/r/137624/#review140994

This looks good to me, thanks. Great job in adding all the entries to the Histograms.json file. I'm holding back the r+ until the issue below is cleared.

::: toolkit/components/telemetry/histogram_tools.py:168
(Diff revision 1)
>  
>      def dataset(self):
>          """Returns the dataset this histogram belongs into."""
>          return self._dataset
>  
>      def labels(self):

I think you also need to add getters here. See the scalars implementation [here](http://searchfox.org/mozilla-central/rev/7057a51c5cf29b5b115b1db19ace2cfe3fd83a0e/toolkit/components/telemetry/parse_scalars.py#241,246). You probably just need to add one to get the enum needed for the implementation.

Are doc updates coming as a separate patch?
Comment hidden (mozreview-request)

Comment 7

7 months ago
mozreview-review
Comment on attachment 8866031 [details]
bug 1335343 - Add initial record_in_processes support

https://reviewboard.mozilla.org/r/137624/#review141086

This looks good now, thanks! Let's not forget to update the docs later!
Attachment #8866031 - Flags: review?(alessio.placitelli) → review+
Comment hidden (mozreview-request)

Comment 9

7 months ago
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/69ddf4e384ab
Add initial record_in_processes support r=Dexter
Backed out for lint failures like https://treeherder.mozilla.org/logviewer.html#?job_id=98041423&repo=autoland

https://hg.mozilla.org/integration/autoland/rev/aa740eb0e52cc6b9f8b24388caa43e2de7b63f47
Flags: needinfo?(chutten)
(Assignee)

Comment 11

7 months ago
Ack. I knew I forgot something.
Flags: needinfo?(chutten)
Comment hidden (mozreview-request)

Comment 13

7 months ago
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ed073a4c343e
Add initial record_in_processes support r=Dexter
(Reporter)

Comment 14

7 months ago
Can you send an update to fhr-dev for this?
You could follow up on the thread: "Making the process choice required for Histograms.json".
Flags: needinfo?(chutten)
(Assignee)

Updated

7 months ago
Flags: needinfo?(chutten)
Keywords: leave-open

Comment 15

7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ed073a4c343e
(Reporter)

Comment 16

7 months ago
I missed that we may have to update data job dependencies for this.
I worry whether this breaks python_moztelemetry or the aggregator (see bug 1343855, comment 8 ff).
Non-blocking, but this broke the (non-critical) probe-scraper [1].

1: https://nbviewer.jupyter.org/url/s3-us-west-2.amazonaws.com/telemetry-public-analysis-2/gfritzsche-telemetry-probe-scraper/data/load_and_run.ipynb
(Reporter)

Comment 17

7 months ago
We should just ignore/allow any additional properties when `self._strict_type_checks` is not set (meaning the script is used in data jobs).
That way we won't run into this repeatedly.
(Reporter)

Updated

7 months ago
Depends on: 1363973
(Reporter)

Updated

7 months ago
Depends on: 1363934
(Reporter)

Updated

7 months ago
Blocks: 1364362
(Reporter)

Updated

7 months ago
Depends on: 1364207
(Reporter)

Updated

7 months ago
Depends on: 1364393
(Reporter)

Updated

7 months ago
No longer depends on: 1363973
(Assignee)

Comment 18

7 months ago
Created attachment 8869157 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

Here's the enforcement patch, with tests.
Attachment #8869157 - Flags: review?(alessio.placitelli)
Comment on attachment 8869157 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

Review of attachment 8869157 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Chris, my comments mostly relate to testing:

- we could make the test probes more straightforward and cover the rest of the processes flags;
- we could test it in test_ChildHistograms, since we have a content process there.

::: toolkit/components/telemetry/Histograms.json
@@ +7561,4 @@
>      "kind": "flag",
>      "description": "a testing histogram; not meant to be touched"
>    },
> +  "TELEMETRY_TEST_NOT_MAIN": {

nit: what about renaming this to TELEMETRY_TEST_CONTENT_PROCESS?

@@ +7571,5 @@
> +    "n_buckets": 10,
> +    "bug_numbers": [1335343],
> +    "description": "a testing histogram; not meant to be touched"
> +  },
> +  "TELEMETRY_TEST_KEYED_NOT_MAIN": {

nit: TELEMETRY_TEST_KEYED_CONTENT_PROCESS

@@ +7583,5 @@
> +    "keyed": true,
> +    "bug_numbers": [1335343],
> +    "description": "a testing histogram; not meant to be touched"
> +  },
> +  "TELEMETRY_TEST_FLAG_NOT_MAIN": {

nit: TELEMETRY_TEST_FLAG_CONTENT_PROCESS

@@ +7597,5 @@
> +    "alert_emails": ["telemetry-client-dev@mozilla.com"],
> +    "expires_in_version": "never",
> +    "kind": "flag",
> +    "bug_numbers": [1335343],
> +    "description": "a testing histogram; not meant to be touched"

Can we also test 'all' and 'all_childs'?

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +131,4 @@
>    uint32_t dataset;
>    uint32_t label_index;
>    uint32_t label_count;
> +  mozilla::Telemetry::Common::RecordedProcessType record_in_processes;

nit: we can add an 'using' directive for RecordedProcessType around line ~40 to make this more readable

@@ +1955,5 @@
>      return;
>    }
>  
> +  const HistogramInfo& h = gHistograms[aID];
> +  if (!CanRecordInProcess(h.record_in_processes, XRE_GetProcessType())) {

nit: can you drop a comment here to mention that we do that to prevent enabling recording from the wrong processes?

@@ +1984,2 @@
>    Histogram *h;
> +  rv = internal_GetHistogramByName(id, &h);

This ends up calling |internal_GetHistogramEnumId| again :( I don't think that would bite performances, as SetHistogramRecordingEnabled shouldn't be called on an hot-path. Can you confirm (or refactor this part to call internal_GetHistogramEnumId once)?

::: toolkit/components/telemetry/histogram-whitelists.json
@@ +2020,4 @@
>      "TELEMETRY_TEST_COUNT_INIT_NO_RECORD",
>      "TELEMETRY_TEST_EXPIRED",
>      "TELEMETRY_TEST_FLAG",
> +    "TELEMETRY_TEST_FLAG_NOT_MAIN",

Can we use some other histogram type so we don't have to add exceptions to the whitelist?

::: toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js
@@ +939,5 @@
>    subsession = h.subsessionSnapshot();
>    Assert.ok(!(KEY in subsession));
>    Assert.equal(h.subsessionSnapshot(KEY).sum, 0);
> +},
> +function* test_record_in_processes() {

Can you put this in a separate add_task()?

@@ +964,5 @@
> +
> +  // Flag histogram disallowed in content still works in main.
> +  content.add(1);
> +  Assert.ok("TELEMETRY_TEST_FLAG_NOT_CONTENT" in Telemetry.histogramSnapshots);
> +  Assert.equal(content.snapshot().sum, 1);

All you can test here is that accumulations from the main process don't make it into "content" histograms.

Wouldn't test_ChildHistogram.js be a more suitable place for this? That way you could check that accumulations from content processes don't make it into main processes histograms, etc.
Attachment #8869157 - Flags: review?(alessio.placitelli)
(Assignee)

Comment 20

7 months ago
Created attachment 8869570 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

I also fleshed out the commit message to be clear on a couple of design necessities.
Attachment #8869157 - Attachment is obsolete: true
Attachment #8869570 - Flags: review?(alessio.placitelli)
Comment on attachment 8869570 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

Review of attachment 8869570 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, thanks. Other than the nit and the test request, I'm just holding back the r+ to understand what's the assumption you're referring to in test_ChildHistograms.js.

Keyed histograms should not be present if empty, regardless of the process. Am I missing something there?

Please also make sure this merges correctly (I'm looking at you TelemetryHistogram.cpp) due to bug 1361661 landing at the same time.

::: toolkit/components/telemetry/tests/unit/test_ChildHistograms.js
@@ +77,4 @@
>                 "Keyed flag test histogram should have the right value.");
>    Assert.equal(kh["TELEMETRY_TEST_KEYED_FLAG"]["b"].sum, 1,
>                 "Keyed flag test histogram should have the right value.");
> +

nit: extra blank line

@@ +135,5 @@
>  
> +  // Check record_in_processes
> +  // Content Process
> +  let hs = payload.processes.content.histograms;
> +  let khs = payload.processes.content.keyedHistograms;

Can we also check if the stored values match with the expected values that we set early in the test? For both the content and the parent process

@@ +148,5 @@
> +  let mainHs = payload.histograms;
> +  let mainKhs = payload.keyedHistograms;
> +  Assert.ok(!("TELEMETRY_TEST_CONTENT_PROCESS" in mainHs), "Should not have content process histogram in main process payload");
> +  //Assert.ok(!("TELEMETRY_TEST_KEYED_CONTENT_PROCESS" in mainKhs), "Should not have keyed content process histogram in main process payload");
> +  // Unfortunately, keyed histograms must be present due to assumptions in TelemetrySession.getKeyedHistograms

Which assumption? What's the problem here? We should not be reporting any empty keyed histogram, regardless of the process. Is this an issue here?
Attachment #8869570 - Flags: review?(alessio.placitelli) → feedback+
(Assignee)

Comment 22

6 months ago
Created attachment 8873584 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

You are of course correct. I thought because of the TelemetrySession assumptions that underlies init code in TelemetryHistogram[1] which gives us the assumption in TelemetrySession that any registered keyedHistogram exists[2] it meant that they existed in the payload.

Here's the fix, including a fix to an issue the test uncovered where adding to a keyed histogram you aren't allowed to record into would populate the key, only failing to put the value into the histogram. (Which, of course, only happens in the parent process, because of course it wasn't evenly prohibitive)

Telemetry tests and lint pass. Ready for another look. (Adding it to :gfritzsche's queue as well due to a holiday).

[1]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp#1854
[2]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySession.jsm#954
Attachment #8869570 - Attachment is obsolete: true
Attachment #8873584 - Flags: review?(gfritzsche)
Attachment #8873584 - Flags: review?(alessio.placitelli)
(Reporter)

Updated

6 months ago
Attachment #8873584 - Flags: review?(gfritzsche)
Comment on attachment 8873584 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

Review of attachment 8873584 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good now, thanks Chris.

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +2244,5 @@
>      mozilla::Telemetry::HistogramID id = mozilla::Telemetry::HistogramID(i);
>  
>      for (uint32_t process = 0; process < static_cast<uint32_t>(ProcessID::Count); ++process) {
> +      if (!CanRecordInProcess(hi.record_in_processes, ProcessID(process)) ||
> +        ((ProcessID(process) == ProcessID::Gpu) && !includeGPUProcess)) {

nit: indentation - I think this needs to be aligned to the "!" on the line above
Attachment #8873584 - Flags: review?(alessio.placitelli) → review+
Note: given the "Nightly" freeze thing, I'm not sure if this is allowed to land before the uplift. Do you mind checking this out?
Flags: needinfo?(chutten)
(Assignee)

Comment 25

6 months ago
(In reply to Alessio Placitelli [:Dexter] from comment #23)
> Comment on attachment 8873584 [details] [diff] [review]
> nit: indentation - I think this needs to be aligned to the "!" on the line
> above

The multi-line conditional just below that begs to differ :)

(In reply to Alessio Placitelli [:Dexter] from comment #24)
> Note: given the "Nightly" freeze thing, I'm not sure if this is allowed to
> land before the uplift. Do you mind checking this out?

Yeah... I'm not the happiest about leaving this to bitrot for a week, but I guess that's the world we live in now. The criteria pretty clearly ask us to err on the side of caution with something like this.
Flags: needinfo?(chutten)
Comment on attachment 8873584 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

Review of attachment 8873584 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/histogram_tools.py
@@ +120,4 @@
>          self._keyed = definition.get('keyed', False)
>          self._expiration = definition.get('expires_in_version')
>          self._labels = definition.get('labels', [])
> +        self._record_in_processes = definition['record_in_processes']

Sorry about this late request, I didn't catch it last time and I had a second look at it due to the failures we're seeing in bug 1368713.

This will throw a KeyError if 'record_in_processes' is not in definition. Maybe the safest path here is to use definition.get('record_in_processes', ['main'] if not self._is_use_counter else ['main', 'content']).

If we do that, I think we could even get rid of the 'use_counters' special case a few lines above.

In any case, we should be updating the docs to mention the defaults here: https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/histograms.html#record-in-processes
It should probably be ok to default to 'main' and 'content'
Luckily, we didn't land this yet :)

Chris, would you mind fixing the stuff from comment 26 before landing this second part? Rajiit found about it in bug 1368713 since the tests we just added were failing... so I guess we're on the right path with that patch \o/
Flags: needinfo?(chutten)
(Assignee)

Comment 28

6 months ago
I don't want there to be defaults for this key. It's required.

So I'll go with get('record_in_processes', None) and then it'll fail its type checks on the client yet succeed on the server and for use counters.
Flags: needinfo?(chutten)
(Assignee)

Comment 29

6 months ago
Created attachment 8877157 [details] [diff] [review]
0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch

I also rebased it atop central while I was looking at it.
Attachment #8873584 - Attachment is obsolete: true
Attachment #8877157 - Flags: review?(alessio.placitelli)
Attachment #8877157 - Flags: review?(alessio.placitelli) → review+
(Assignee)

Updated

6 months ago
Keywords: leave-open → checkin-needed

Comment 30

6 months ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8194cd1a678d
Prevent histogram recording in disallowed processes. r=Dexter
Keywords: checkin-needed
(Assignee)

Updated

6 months ago
Blocks: 1368713

Comment 31

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8194cd1a678d
Status: ASSIGNED → RESOLVED
Last Resolved: 6 months ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1373083
Mark 54 won't fix as 54 was released.
status-firefox54: affected → wontfix
status-firefox55: --- → affected
status-firefox55: affected → fix-optional
(Assignee)

Updated

20 days ago
Duplicate of this bug: 1281791
You need to log in before you can comment on or make changes to this bug.