Closed
Bug 1335343
Opened 7 years ago
Closed 7 years ago
Default histogram recording to main & content process
Categories
(Toolkit :: Telemetry, defect, P1)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox54 | --- | wontfix |
firefox55 | --- | fix-optional |
firefox56 | --- | fixed |
People
(Reporter: gfritzsche, Assigned: chutten)
References
Details
(Whiteboard: [measurement:client])
Attachments
(2 files, 3 obsolete files)
Similar to bug 1333797 for scalars, we should limit the recorded histograms to specific process types before the GPU process starts to roll out to broader populations [1]. Proposed plan: - add a required field `record_in_processes` to Histograms.json - for all existing histograms, fill in `record_in_processes: ["main", "content"]` - reach out to see which histograms need GPU process recording (:dvander, :ahughes?) - make sure the new field doesn't surprise/break any data pipeline jobs 1: https://ashughes.com/?p=374
Reporter | ||
Updated•7 years ago
|
Priority: P2 → P1
Assignee | ||
Comment 1•7 years ago
|
||
The following non-flag histograms were present in payload.processes.gpu amongst Nightly pings submitted on April 18: CHECKERBOARD_DURATION CHECKERBOARD_PEAK CHECKERBOARD_POTENTIAL_DURATION CHECKERBOARD_SEVERITY COMPOSITE_FRAME_ROUNDTRIP_TIME COMPOSITE_TIME COMPOSITOR_ANIMATION_DURATION COMPOSITOR_ANIMATION_MAX_CONTIGUOUS_DROPS_APZ COMPOSITOR_ANIMATION_MAX_CONTIGUOUS_DROPS_CHROME COMPOSITOR_ANIMATION_MAX_CONTIGUOUS_DROPS_CONTENT COMPOSITOR_ANIMATION_MAX_LAYER_AREA COMPOSITOR_ANIMATION_THROUGHPUT_APZ COMPOSITOR_ANIMATION_THROUGHPUT_CHROME COMPOSITOR_ANIMATION_THROUGHPUT_CONTENT CONTENT_RESPONSE_DURATION D3D11_SYNC_HANDLE_FAILURE DEVICE_RESET_REASON FORCED_DEVICE_RESET_REASON GPU_PROCESS_INITIALIZATION_TIME_MS MEDIA_DECODER_BACKEND_USED MEDIA_WMF_DECODE_ERROR SCROLL_INPUT_METHODS I expect this is a superset of hgrams we care about having from GPU processes, but we can start here. (( As for flag histograms, there were none submitted on that day that had a non-false value. I'm thinking we successfully kept them from infecting gpu. )) Hrm. I wonder if there's any code that expects flag histograms to have a valid value on every process or if I'll be able to cleanly skip all of them when reflecting to JS...
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•7 years ago
|
||
Almost forgot gpu's keyed histograms: [u'IPC_READ_MAIN_THREAD_LATENCY_MS', u'IPC_WRITE_LATENCY_MS', u'D3D11_COMPOSITING_FAILURE_ID', u'OPENGL_COMPOSITING_FAILURE_ID', u'CANVAS_WEBGL_ACCL_FAILURE_ID', u'IPC_MESSAGE_SIZE', u'IPC_READ_LATENCY_MS', u'IPC_WRITE_MAIN_THREAD_LATENCY_MS', u'MAIN_THREAD_RUNNABLE_MS']
Assignee | ||
Comment 3•7 years ago
|
||
IPC_{READ,WRITE}_LATENCY_MS are vestigial from bug 1342635 and can be ignored.
Comment hidden (mozreview-request) |
Comment 5•7 years ago
|
||
mozreview-review |
Comment on attachment 8866031 [details] bug 1335343 - Add initial record_in_processes support https://reviewboard.mozilla.org/r/137624/#review140994 This looks good to me, thanks. Great job in adding all the entries to the Histograms.json file. I'm holding back the r+ until the issue below is cleared. ::: toolkit/components/telemetry/histogram_tools.py:168 (Diff revision 1) > > def dataset(self): > """Returns the dataset this histogram belongs into.""" > return self._dataset > > def labels(self): I think you also need to add getters here. See the scalars implementation [here](http://searchfox.org/mozilla-central/rev/7057a51c5cf29b5b115b1db19ace2cfe3fd83a0e/toolkit/components/telemetry/parse_scalars.py#241,246). You probably just need to add one to get the enum needed for the implementation. Are doc updates coming as a separate patch?
Comment hidden (mozreview-request) |
Comment 7•7 years ago
|
||
mozreview-review |
Comment on attachment 8866031 [details] bug 1335343 - Add initial record_in_processes support https://reviewboard.mozilla.org/r/137624/#review141086 This looks good now, thanks! Let's not forget to update the docs later!
Attachment #8866031 -
Flags: review?(alessio.placitelli) → review+
Comment hidden (mozreview-request) |
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/69ddf4e384ab Add initial record_in_processes support r=Dexter
Backed out for lint failures like https://treeherder.mozilla.org/logviewer.html#?job_id=98041423&repo=autoland https://hg.mozilla.org/integration/autoland/rev/aa740eb0e52cc6b9f8b24388caa43e2de7b63f47
Flags: needinfo?(chutten)
Comment hidden (mozreview-request) |
Comment 13•7 years ago
|
||
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ed073a4c343e Add initial record_in_processes support r=Dexter
Reporter | ||
Comment 14•7 years ago
|
||
Can you send an update to fhr-dev for this? You could follow up on the thread: "Making the process choice required for Histograms.json".
Flags: needinfo?(chutten)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(chutten)
Keywords: leave-open
Comment 15•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ed073a4c343e
Reporter | ||
Comment 16•7 years ago
|
||
I missed that we may have to update data job dependencies for this. I worry whether this breaks python_moztelemetry or the aggregator (see bug 1343855, comment 8 ff). Non-blocking, but this broke the (non-critical) probe-scraper [1]. 1: https://nbviewer.jupyter.org/url/s3-us-west-2.amazonaws.com/telemetry-public-analysis-2/gfritzsche-telemetry-probe-scraper/data/load_and_run.ipynb
Reporter | ||
Comment 17•7 years ago
|
||
We should just ignore/allow any additional properties when `self._strict_type_checks` is not set (meaning the script is used in data jobs). That way we won't run into this repeatedly.
Assignee | ||
Comment 18•7 years ago
|
||
Here's the enforcement patch, with tests.
Attachment #8869157 -
Flags: review?(alessio.placitelli)
Comment 19•7 years ago
|
||
Comment on attachment 8869157 [details] [diff] [review] 0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch Review of attachment 8869157 [details] [diff] [review]: ----------------------------------------------------------------- Thanks Chris, my comments mostly relate to testing: - we could make the test probes more straightforward and cover the rest of the processes flags; - we could test it in test_ChildHistograms, since we have a content process there. ::: toolkit/components/telemetry/Histograms.json @@ +7561,4 @@ > "kind": "flag", > "description": "a testing histogram; not meant to be touched" > }, > + "TELEMETRY_TEST_NOT_MAIN": { nit: what about renaming this to TELEMETRY_TEST_CONTENT_PROCESS? @@ +7571,5 @@ > + "n_buckets": 10, > + "bug_numbers": [1335343], > + "description": "a testing histogram; not meant to be touched" > + }, > + "TELEMETRY_TEST_KEYED_NOT_MAIN": { nit: TELEMETRY_TEST_KEYED_CONTENT_PROCESS @@ +7583,5 @@ > + "keyed": true, > + "bug_numbers": [1335343], > + "description": "a testing histogram; not meant to be touched" > + }, > + "TELEMETRY_TEST_FLAG_NOT_MAIN": { nit: TELEMETRY_TEST_FLAG_CONTENT_PROCESS @@ +7597,5 @@ > + "alert_emails": ["telemetry-client-dev@mozilla.com"], > + "expires_in_version": "never", > + "kind": "flag", > + "bug_numbers": [1335343], > + "description": "a testing histogram; not meant to be touched" Can we also test 'all' and 'all_childs'? ::: toolkit/components/telemetry/TelemetryHistogram.cpp @@ +131,4 @@ > uint32_t dataset; > uint32_t label_index; > uint32_t label_count; > + mozilla::Telemetry::Common::RecordedProcessType record_in_processes; nit: we can add an 'using' directive for RecordedProcessType around line ~40 to make this more readable @@ +1955,5 @@ > return; > } > > + const HistogramInfo& h = gHistograms[aID]; > + if (!CanRecordInProcess(h.record_in_processes, XRE_GetProcessType())) { nit: can you drop a comment here to mention that we do that to prevent enabling recording from the wrong processes? @@ +1984,2 @@ > Histogram *h; > + rv = internal_GetHistogramByName(id, &h); This ends up calling |internal_GetHistogramEnumId| again :( I don't think that would bite performances, as SetHistogramRecordingEnabled shouldn't be called on an hot-path. Can you confirm (or refactor this part to call internal_GetHistogramEnumId once)? ::: toolkit/components/telemetry/histogram-whitelists.json @@ +2020,4 @@ > "TELEMETRY_TEST_COUNT_INIT_NO_RECORD", > "TELEMETRY_TEST_EXPIRED", > "TELEMETRY_TEST_FLAG", > + "TELEMETRY_TEST_FLAG_NOT_MAIN", Can we use some other histogram type so we don't have to add exceptions to the whitelist? ::: toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js @@ +939,5 @@ > subsession = h.subsessionSnapshot(); > Assert.ok(!(KEY in subsession)); > Assert.equal(h.subsessionSnapshot(KEY).sum, 0); > +}, > +function* test_record_in_processes() { Can you put this in a separate add_task()? @@ +964,5 @@ > + > + // Flag histogram disallowed in content still works in main. > + content.add(1); > + Assert.ok("TELEMETRY_TEST_FLAG_NOT_CONTENT" in Telemetry.histogramSnapshots); > + Assert.equal(content.snapshot().sum, 1); All you can test here is that accumulations from the main process don't make it into "content" histograms. Wouldn't test_ChildHistogram.js be a more suitable place for this? That way you could check that accumulations from content processes don't make it into main processes histograms, etc.
Attachment #8869157 -
Flags: review?(alessio.placitelli)
Assignee | ||
Comment 20•7 years ago
|
||
I also fleshed out the commit message to be clear on a couple of design necessities.
Attachment #8869157 -
Attachment is obsolete: true
Attachment #8869570 -
Flags: review?(alessio.placitelli)
Comment 21•7 years ago
|
||
Comment on attachment 8869570 [details] [diff] [review] 0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch Review of attachment 8869570 [details] [diff] [review]: ----------------------------------------------------------------- This looks good, thanks. Other than the nit and the test request, I'm just holding back the r+ to understand what's the assumption you're referring to in test_ChildHistograms.js. Keyed histograms should not be present if empty, regardless of the process. Am I missing something there? Please also make sure this merges correctly (I'm looking at you TelemetryHistogram.cpp) due to bug 1361661 landing at the same time. ::: toolkit/components/telemetry/tests/unit/test_ChildHistograms.js @@ +77,4 @@ > "Keyed flag test histogram should have the right value."); > Assert.equal(kh["TELEMETRY_TEST_KEYED_FLAG"]["b"].sum, 1, > "Keyed flag test histogram should have the right value."); > + nit: extra blank line @@ +135,5 @@ > > + // Check record_in_processes > + // Content Process > + let hs = payload.processes.content.histograms; > + let khs = payload.processes.content.keyedHistograms; Can we also check if the stored values match with the expected values that we set early in the test? For both the content and the parent process @@ +148,5 @@ > + let mainHs = payload.histograms; > + let mainKhs = payload.keyedHistograms; > + Assert.ok(!("TELEMETRY_TEST_CONTENT_PROCESS" in mainHs), "Should not have content process histogram in main process payload"); > + //Assert.ok(!("TELEMETRY_TEST_KEYED_CONTENT_PROCESS" in mainKhs), "Should not have keyed content process histogram in main process payload"); > + // Unfortunately, keyed histograms must be present due to assumptions in TelemetrySession.getKeyedHistograms Which assumption? What's the problem here? We should not be reporting any empty keyed histogram, regardless of the process. Is this an issue here?
Attachment #8869570 -
Flags: review?(alessio.placitelli) → feedback+
Assignee | ||
Comment 22•7 years ago
|
||
You are of course correct. I thought because of the TelemetrySession assumptions that underlies init code in TelemetryHistogram[1] which gives us the assumption in TelemetrySession that any registered keyedHistogram exists[2] it meant that they existed in the payload. Here's the fix, including a fix to an issue the test uncovered where adding to a keyed histogram you aren't allowed to record into would populate the key, only failing to put the value into the histogram. (Which, of course, only happens in the parent process, because of course it wasn't evenly prohibitive) Telemetry tests and lint pass. Ready for another look. (Adding it to :gfritzsche's queue as well due to a holiday). [1]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp#1854 [2]: http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySession.jsm#954
Attachment #8869570 -
Attachment is obsolete: true
Attachment #8873584 -
Flags: review?(gfritzsche)
Attachment #8873584 -
Flags: review?(alessio.placitelli)
Reporter | ||
Updated•7 years ago
|
Attachment #8873584 -
Flags: review?(gfritzsche)
Comment 23•7 years ago
|
||
Comment on attachment 8873584 [details] [diff] [review] 0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch Review of attachment 8873584 [details] [diff] [review]: ----------------------------------------------------------------- This looks good now, thanks Chris. ::: toolkit/components/telemetry/TelemetryHistogram.cpp @@ +2244,5 @@ > mozilla::Telemetry::HistogramID id = mozilla::Telemetry::HistogramID(i); > > for (uint32_t process = 0; process < static_cast<uint32_t>(ProcessID::Count); ++process) { > + if (!CanRecordInProcess(hi.record_in_processes, ProcessID(process)) || > + ((ProcessID(process) == ProcessID::Gpu) && !includeGPUProcess)) { nit: indentation - I think this needs to be aligned to the "!" on the line above
Attachment #8873584 -
Flags: review?(alessio.placitelli) → review+
Comment 24•7 years ago
|
||
Note: given the "Nightly" freeze thing, I'm not sure if this is allowed to land before the uplift. Do you mind checking this out?
Flags: needinfo?(chutten)
Assignee | ||
Comment 25•7 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #23) > Comment on attachment 8873584 [details] [diff] [review] > nit: indentation - I think this needs to be aligned to the "!" on the line > above The multi-line conditional just below that begs to differ :) (In reply to Alessio Placitelli [:Dexter] from comment #24) > Note: given the "Nightly" freeze thing, I'm not sure if this is allowed to > land before the uplift. Do you mind checking this out? Yeah... I'm not the happiest about leaving this to bitrot for a week, but I guess that's the world we live in now. The criteria pretty clearly ask us to err on the side of caution with something like this.
Flags: needinfo?(chutten)
Comment 26•7 years ago
|
||
Comment on attachment 8873584 [details] [diff] [review] 0001-bug-1335343-Prevent-histogram-recording-in-disallowe.patch Review of attachment 8873584 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/histogram_tools.py @@ +120,4 @@ > self._keyed = definition.get('keyed', False) > self._expiration = definition.get('expires_in_version') > self._labels = definition.get('labels', []) > + self._record_in_processes = definition['record_in_processes'] Sorry about this late request, I didn't catch it last time and I had a second look at it due to the failures we're seeing in bug 1368713. This will throw a KeyError if 'record_in_processes' is not in definition. Maybe the safest path here is to use definition.get('record_in_processes', ['main'] if not self._is_use_counter else ['main', 'content']). If we do that, I think we could even get rid of the 'use_counters' special case a few lines above. In any case, we should be updating the docs to mention the defaults here: https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/histograms.html#record-in-processes It should probably be ok to default to 'main' and 'content'
Comment 27•7 years ago
|
||
Luckily, we didn't land this yet :) Chris, would you mind fixing the stuff from comment 26 before landing this second part? Rajiit found about it in bug 1368713 since the tests we just added were failing... so I guess we're on the right path with that patch \o/
Flags: needinfo?(chutten)
Assignee | ||
Comment 28•7 years ago
|
||
I don't want there to be defaults for this key. It's required. So I'll go with get('record_in_processes', None) and then it'll fail its type checks on the client yet succeed on the server and for use counters.
Flags: needinfo?(chutten)
Assignee | ||
Comment 29•7 years ago
|
||
I also rebased it atop central while I was looking at it.
Attachment #8873584 -
Attachment is obsolete: true
Attachment #8877157 -
Flags: review?(alessio.placitelli)
Updated•7 years ago
|
Attachment #8877157 -
Flags: review?(alessio.placitelli) → review+
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open → checkin-needed
Comment 30•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8194cd1a678d Prevent histogram recording in disallowed processes. r=Dexter
Keywords: checkin-needed
Comment 31•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8194cd1a678d
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•