Closed
Bug 1261063
Opened 8 years ago
Closed 8 years ago
Remove direct uses of base::histogram instances
Categories
(Toolkit :: Telemetry, defect, P1)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla49
People
(Reporter: gfritzsche, Assigned: gfritzsche)
References
Details
(Whiteboard: [measurement:client])
Attachments
(5 files, 2 obsolete files)
2.57 KB,
patch
|
chutten
:
review+
|
Details | Diff | Splinter Review |
3.41 KB,
patch
|
chutten
:
review+
|
Details | Diff | Splinter Review |
2.93 KB,
patch
|
tnikkel
:
review+
|
Details | Diff | Splinter Review |
3.03 KB,
patch
|
Details | Diff | Splinter Review | |
5.17 KB,
patch
|
jesup
:
review+
|
Details | Diff | Splinter Review |
base::histogram and other instances from histogram.h/.cc is only supposed to used internally in Telemetry code. However, currently there are three files where base::histogram::Add()/Substract() is used directly. We should: * make those use the Accumulate() functions like everything else * remove Telemetry::GetHistogramById() et al from the public interface to avoid those scenarios in the future
Assignee | ||
Comment 1•8 years ago
|
||
Uses of base::histogram via Telemetry::GetHistogramById() etc.: * https://dxr.mozilla.org/mozilla-central/source/image/RasterImage.cpp * https://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp * https://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
Assignee | ||
Updated•8 years ago
|
Summary: Remove uses base::histogram instances directly → Remove direct uses of base::histogram instances
Assignee | ||
Comment 2•8 years ago
|
||
For converting RasterImage.cpp to use the Telemetry::Accumulate() functions instead of GetHistogramById()->Add()/Substract() we need to understand what this is actually supposed to be doing: https://dxr.mozilla.org/mozilla-central/search?q=path%3ARasterImage.cpp+GetHistogram&redirect=false&case=true I see that IMAGE_DECODE_COUNT & IMAGE_MAX_DECODE_COUNT use Add() and Substract() to just keep a count. IMAGE_MAX_DECODE_COUNT is clear and just ends up being the maximum of the instance decode counts. IMAGE_DECODE_COUNT seems to be rather randomly set to the count of whatever RasterImage instance submitted this last. Seth, do you know what question that histogram is actually trying to answer? Are both of those histograms actually being used and monitored?
Flags: needinfo?(seth)
Assignee | ||
Comment 3•8 years ago
|
||
The other histogram that GetHistogramById()->Add()/Substract() is used instead of Telemetry::Accumulate() is WEBRTC_CALL_COUNT in webrtc: https://dxr.mozilla.org/mozilla-central/search?q=path%3APeerConnection+GetHistogramById&redirect=false&case=true This seems to be a really simple count. Jesup, are you actively using/monitoring WEBRTC_CALL_COUNT? If yes, can we just migrate this to a plain "count" histogram and simple always add 1 to it in PeerConnectionImpl::startCallTelem()?
Flags: needinfo?(rjesup)
Assignee | ||
Comment 4•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → gfritzsche
Status: NEW → ASSIGNED
Assignee | ||
Updated•8 years ago
|
Priority: P3 → P2
Comment 5•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #2) > IMAGE_MAX_DECODE_COUNT is clear and just ends up being the maximum of the > instance decode counts. Yep. > IMAGE_DECODE_COUNT seems to be rather randomly set to the count of whatever > RasterImage instance submitted this last. If so, that means it's using the API wrong and we should fix it. I confess that that code predates me and I'm not familiar with the histogram API myself. > Seth, do you know what question that histogram is actually trying to answer? It's simple enough, we'd like to have a histogram of how often an image is decoded. Since we can now discard images (which wasn't true a long time ago), which can force us to redecode them if we paint them again, it's nice to know how often this is happening and if we ever make a change that causes a major shift in the histogram. > Are both of those histograms actually being used and monitored? I don't think anyone is looking at them frequently; to my knowledge, we mostly look at them when doing changes that have the potential to drastically change these values. It'd be nice to keep them.
Flags: needinfo?(seth)
Comment 6•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #3) > The other histogram that GetHistogramById()->Add()/Substract() is used > instead of Telemetry::Accumulate() is WEBRTC_CALL_COUNT in webrtc: > https://dxr.mozilla.org/mozilla-central/ > search?q=path%3APeerConnection+GetHistogramById&redirect=false&case=true > > This seems to be a really simple count. > Jesup, are you actively using/monitoring WEBRTC_CALL_COUNT? Somewhat. It's not a critical stat to us. > If yes, can we just migrate this to a plain "count" histogram and simple > always add 1 to it in PeerConnectionImpl::startCallTelem()? We're trying to accumulate the number of calls per browser execution, not just count the number of calls made. I don't think what your suggesting will get us that. And we can't do it on shutdown, since we can't count on clean shutdown.
Flags: needinfo?(rjesup)
Assignee | ||
Comment 7•8 years ago
|
||
Ok, i think i misunderstood how both these histograms are used. They are not updating one count (for calls / decodes) per browser session, but rather keeping 1 count updated for each of N instances per browser session. A simple fix here would be that we simply expose Telemetry::Subtract(), then we can just keep both use-cases working without changing the semantics.
Assignee | ||
Comment 8•8 years ago
|
||
Seth, for RasterImage.cpp, could you just record IMAGE_DECODE_COUNT once per RasterImage instance when it is destroyed or similar? Or does that usually happen too late in shutdown?
Flags: needinfo?(seth)
Assignee | ||
Comment 9•8 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #6) > > If yes, can we just migrate this to a plain "count" histogram and simple > > always add 1 to it in PeerConnectionImpl::startCallTelem()? > > We're trying to accumulate the number of calls per browser execution, not > just count the number of calls made. > I don't think what your suggesting will get us that. And we can't do it on > shutdown, since we can't count on clean shutdown. Are you worried about shutdown crashes? Or do you actually drop the PeerConnectionImpl instances so you can never cleanly record this on shutdown?
Flags: needinfo?(rjesup)
Comment 10•8 years ago
|
||
> > We're trying to accumulate the number of calls per browser execution, not
> > just count the number of calls made.
> > I don't think what your suggesting will get us that. And we can't do it on
> > shutdown, since we can't count on clean shutdown.
>
> Are you worried about shutdown crashes?
> Or do you actually drop the PeerConnectionImpl instances so you can never
> cleanly record this on shutdown?
Anything that doesn't lead to a clean shutdown will be lost, and those that are lost are the people who generally run-until-crash (or os restart for patching, etc). This means the long tail will get chopped predominately. For example, my non-development browsers would fail to do clean shutdowns more often than they do clean shutdowns.
However... a) this isn't critical to us, and b) if we cared, we could compare to a separate total-number-of-calls telemetry value to calculate the number of "lost calls". It doesn't replace the date, but gives us a bound on the impact.
We could probably shift it to a "record once before shutdown" value.
Flags: needinfo?(rjesup)
Comment 11•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #8) > Seth, for RasterImage.cpp, could you just record IMAGE_DECODE_COUNT once per > RasterImage instance when it is destroyed or similar? > Or does that usually happen too late in shutdown? I think for our use case it'd be fine to record IMAGE_DECODE_COUNT when a RasterImage is destroyed, and just bail if it's being destroyed during shutdown. It shouldn't affect the quality of the results significantly.
Flags: needinfo?(seth)
Assignee | ||
Comment 12•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8736782 -
Attachment is obsolete: true
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #10) > > > We're trying to accumulate the number of calls per browser execution, not > > > just count the number of calls made. > > > I don't think what your suggesting will get us that. And we can't do it on > > > shutdown, since we can't count on clean shutdown. > > > > Are you worried about shutdown crashes? > > Or do you actually drop the PeerConnectionImpl instances so you can never > > cleanly record this on shutdown? > > Anything that doesn't lead to a clean shutdown will be lost, and those that > are lost are the people who generally run-until-crash (or os restart for > patching, etc). This means the long tail will get chopped predominately. > For example, my non-development browsers would fail to do clean shutdowns > more often than they do clean shutdowns. > > However... a) this isn't critical to us, and b) if we cared, we could > compare to a separate total-number-of-calls telemetry value to calculate the > number of "lost calls". It doesn't replace the date, but gives us a bound > on the impact. > > We could probably shift it to a "record once before shutdown" value. Great, can you point to where we could record this before shutdown?
Flags: needinfo?(rjesup)
Comment 14•8 years ago
|
||
PeerConnection.js watches for shutdown notifications and deletes/cleans up peerconnections. That might be a reasonable place to hang it off.
Flags: needinfo?(rjesup)
Assignee | ||
Updated•8 years ago
|
Priority: P2 → P1
Assignee | ||
Updated•8 years ago
|
Points: --- → 3
Assignee | ||
Comment 15•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8737740 -
Attachment is obsolete: true
Assignee | ||
Comment 16•8 years ago
|
||
Assignee | ||
Comment 17•8 years ago
|
||
Assignee | ||
Comment 18•8 years ago
|
||
Comment on attachment 8738562 [details] [diff] [review] Part 1 - Remove public Telemetry functions that return raw histogram instances This removes the public functions that allow to access the base::histogram. We consider those internal, so we remove the access here. Chris, can you take a look here?
Attachment #8738562 -
Flags: review?(chutten)
Assignee | ||
Comment 19•8 years ago
|
||
Comment on attachment 8738563 [details] [diff] [review] Part 2 - Expose C++ function to clear Telemetry histograms For the semantics of IMAGE_MAX_DECODE_COUNT in RasterImage, we need to overwrite the value. The simple approach here is to expose clearing the histogram from C++, so we can remove the Substract()/Add() juggle. We already expose clearing histograms to JavaScript, so that doesn't change things here really. Note: Longer-term we should address those use-cases better, e.g. by adding a kind of "max" or "single value" histogram type, but that seems out of scope for this bug. Chris, can you take a look?
Attachment #8738563 -
Flags: review?(chutten)
Assignee | ||
Comment 20•8 years ago
|
||
Comment on attachment 8738565 [details] [diff] [review] Part 3 - Make RasterImage use the public Telemetry API Seth, this should cover things per the previous discussion here. In rough testing this looks fine, but i didn't manage to get a decode count > 1 - do you have a tip on triggering this?
Attachment #8738565 -
Flags: review?(seth)
Comment 21•8 years ago
|
||
Comment on attachment 8738562 [details] [diff] [review] Part 1 - Remove public Telemetry functions that return raw histogram instances Review of attachment 8738562 [details] [diff] [review]: ----------------------------------------------------------------- So... GetKeyedHistogramById(ID, const nsAString&) wasn't actually implemented or called? It'd be nice if that were mentioned in the commit message, but eh.
Attachment #8738562 -
Flags: review?(chutten) → review+
Comment 22•8 years ago
|
||
Comment on attachment 8738563 [details] [diff] [review] Part 2 - Expose C++ function to clear Telemetry histograms Review of attachment 8738563 [details] [diff] [review]: ----------------------------------------------------------------- LGTM
Attachment #8738563 -
Flags: review?(chutten) → review+
Assignee | ||
Comment 23•8 years ago
|
||
This version should keep the previous behavior as far as i can tell, as there is only one global instance of mConnectionCounter. It loses the one entry in the "0" bucket, but that doesnt seem like a problem for doing call counts.
Assignee | ||
Comment 24•8 years ago
|
||
Given that this apparently only tracks one global call count, we could just move that to a new count histogram. Is that a feasible option jesup?
Attachment #8739443 -
Flags: review?(rjesup)
Comment 26•8 years ago
|
||
Comment on attachment 8739443 [details] [diff] [review] Part 4b (Alternative) - Use a count histogram for WebRTC call counts Review of attachment 8739443 [details] [diff] [review]: ----------------------------------------------------------------- I can't evaluate the impact of this without a better understanding of what this change means. Can you point to more docs on the telemetry types? The include files don't really define the API well from what I saw (may have missed something) ::: toolkit/components/telemetry/Histograms.json @@ +6554,5 @@ > + "WEBRTC_CALL_COUNT_2": { > + "alert_emails": ["?"], > + "bug_numbers": [1261063], > + "expires_in_version": "never", > + "kind": "count", I don't know what a "count" histogram does differently than an "exponential"; I had trouble finding docs on this.
Attachment #8739443 -
Flags: review?(rjesup)
Comment 27•8 years ago
|
||
Comment on attachment 8739412 [details] [diff] [review] Part 4a - Make WebRTC PeerConnection use the public Telemetry API Review of attachment 8739412 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp @@ -356,5 @@ > initGMP(); > > #if !defined(MOZILLA_EXTERNAL_LINKAGE) > mConnectionCounter = 0; > - Telemetry::GetHistogramById(Telemetry::WEBRTC_CALL_COUNT)->Add(0); removing this means we lose the ability to measure what % of sessions made a call at all. (Or have to measure it indirectly as total submissions against number of submissions with a count.) That said, it still would be useful. Or we could add a second item that lets us know that. Hopefully this sort of evaluation would be possible without having to jump through hoops, even without the Add(0)
Updated•8 years ago
|
Flags: needinfo?(rjesup)
Assignee | ||
Comment 28•8 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #26) > I can't evaluate the impact of this without a better understanding of what > this change means. Can you point to more docs on the telemetry types? The > include files don't really define the API well from what I saw (may have > missed something) > > ::: toolkit/components/telemetry/Histograms.json > @@ +6554,5 @@ > > + "WEBRTC_CALL_COUNT_2": { > > + "alert_emails": ["?"], > > + "bug_numbers": [1261063], > > + "expires_in_version": "never", > > + "kind": "count", > > I don't know what a "count" histogram does differently than an > "exponential"; I had trouble finding docs on this. The main documentation is here: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type "count" histograms are different in that they only track one value. > removing this means we lose the ability to measure what % of sessions made a > call at all. (Or have to measure it indirectly as total submissions against > number of submissions with a count.) That said, it still would be useful. > Or we could add a second item that lets us know that. Hopefully this sort > of evaluation would be possible without having to jump through hoops, even > without the Add(0) Right - inferring it is a bit inconvenient at the moment. We could just add a specific probe for that - probably a separate "flag" histogram set from |PeerConnectionCtx::Initialize()| that you can compare with the "count" histogram of actual call counts. Does that sound ok?
Flags: needinfo?(rjesup)
Comment 29•8 years ago
|
||
Seth, Randell, review ping, please. Refactoring Telemetry is a prereq for de-racing Telemetry, which is indirectly blocking on this.
Assignee | ||
Comment 30•8 years ago
|
||
Comment on attachment 8738565 [details] [diff] [review] Part 3 - Make RasterImage use the public Telemetry API This has been sitting for a while - Timothy, do you have time to review this?
Attachment #8738565 -
Flags: review?(tnikkel)
Comment 31•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #28) > (In reply to Randell Jesup [:jesup] from comment #26) > > I can't evaluate the impact of this without a better understanding of what > > this change means. Can you point to more docs on the telemetry types? The > > include files don't really define the API well from what I saw (may have > > missed something) > > > > ::: toolkit/components/telemetry/Histograms.json > > @@ +6554,5 @@ > > > + "WEBRTC_CALL_COUNT_2": { > > > + "alert_emails": ["?"], > > > + "bug_numbers": [1261063], > > > + "expires_in_version": "never", > > > + "kind": "count", > > > > I don't know what a "count" histogram does differently than an > > "exponential"; I had trouble finding docs on this. > > The main documentation is here: > https://developer.mozilla.org/en-US/docs/Mozilla/Performance/ > Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type > "count" histograms are different in that they only track one value. And they have a default value that's submitted as well. That will tell us how many sessions made calls - but unless we can view it with 0 removed, it will be very hard to see the distribution of call numbers > 0. That said, the data would be there, and this isn't a critical item (per above). Please drop a link into the .h or .json file to the docs, or describe these types in the .h (or json) file. > > removing this means we lose the ability to measure what % of sessions made a > > call at all. (Or have to measure it indirectly as total submissions against > > number of submissions with a count.) That said, it still would be useful. > > Or we could add a second item that lets us know that. Hopefully this sort > > of evaluation would be possible without having to jump through hoops, even > > without the Add(0) > > Right - inferring it is a bit inconvenient at the moment. > We could just add a specific probe for that - probably a separate "flag" > histogram set from |PeerConnectionCtx::Initialize()| that you can compare > with the "count" histogram of actual call counts. Now I'm confused; the docs you link to say 'count' histograms have a default (0), so we would get the % that made calls (and in fact it might be annoying to see numbers other than that, but we'll see).
Flags: needinfo?(rjesup)
Comment 32•8 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #31) > > Right - inferring it is a bit inconvenient at the moment. > > We could just add a specific probe for that - probably a separate "flag" > > histogram set from |PeerConnectionCtx::Initialize()| that you can compare > > with the "count" histogram of actual call counts. > > Now I'm confused; the docs you link to say 'count' histograms have a default > (0), so we would get the % that made calls (and in fact it might be annoying > to see numbers other than that, but we'll see). Aha, that was for the a different alternative (non-count-type)
Updated•8 years ago
|
Attachment #8739443 -
Flags: review+
Updated•8 years ago
|
Attachment #8738565 -
Flags: review?(tnikkel)
Attachment #8738565 -
Flags: review?(seth)
Attachment #8738565 -
Flags: review+
Assignee | ||
Comment 33•8 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #31) > (In reply to Georg Fritzsche [:gfritzsche] from comment #28) > > (In reply to Randell Jesup [:jesup] from comment #26) > > > I can't evaluate the impact of this without a better understanding of what > > > this change means. Can you point to more docs on the telemetry types? The > > > include files don't really define the API well from what I saw (may have > > > missed something) > > > > > > ::: toolkit/components/telemetry/Histograms.json > > > @@ +6554,5 @@ > > > > + "WEBRTC_CALL_COUNT_2": { > > > > + "alert_emails": ["?"], > > > > + "bug_numbers": [1261063], > > > > + "expires_in_version": "never", > > > > + "kind": "count", > > > > > > I don't know what a "count" histogram does differently than an > > > "exponential"; I had trouble finding docs on this. > > > > The main documentation is here: > > https://developer.mozilla.org/en-US/docs/Mozilla/Performance/ > > Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type > > "count" histograms are different in that they only track one value. > > And they have a default value that's submitted as well. That will tell us > how many sessions made calls - but unless we can view it with 0 removed, it > will be very hard to see the distribution of call numbers > 0. > > That said, the data would be there, and this isn't a critical item (per > above). > > Please drop a link into the .h or .json file to the docs, or describe these > types in the .h (or json) file. There are no comments allowed in JSON. Do you suggest adding a link to the wiki page in the Telemetry.h header? That sounds useful to me.
Flags: needinfo?(rjesup)
Comment 35•8 years ago
|
||
https://hg.mozilla.org/integration/fx-team/rev/f2a21c428fdf https://hg.mozilla.org/integration/fx-team/rev/7e3d90c76f93 https://hg.mozilla.org/integration/fx-team/rev/beaa0bcb3d16 https://hg.mozilla.org/integration/fx-team/rev/2deebb00193a
Comment 36•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #2) > For converting RasterImage.cpp to use the Telemetry::Accumulate() functions > instead of GetHistogramById()->Add()/Substract() we need to understand what > this is actually supposed to be doing: > https://dxr.mozilla.org/mozilla-central/search?q=path%3ARasterImage. > cpp+GetHistogram&redirect=false&case=true > > I see that IMAGE_DECODE_COUNT & IMAGE_MAX_DECODE_COUNT use Add() and > Substract() to just keep a count. > > IMAGE_MAX_DECODE_COUNT is clear and just ends up being the maximum of the > instance decode counts. > > IMAGE_DECODE_COUNT seems to be rather randomly set to the count of whatever > RasterImage instance submitted this last. > Seth, do you know what question that histogram is actually trying to answer? > Are both of those histograms actually being used and monitored? I filed bug 1269829 to make IMAGE_DECODE_COUNT track something useful.
Comment 37•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f2a21c428fdf https://hg.mozilla.org/mozilla-central/rev/7e3d90c76f93 https://hg.mozilla.org/mozilla-central/rev/beaa0bcb3d16 https://hg.mozilla.org/mozilla-central/rev/2deebb00193a
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox49:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(seth)
You need to log in
before you can comment on or make changes to this bug.
Description
•