Closed Bug 1300491 Opened 3 years ago Closed 3 years ago

Remove nsITelemetry.histogramFrom()

Categories

(Toolkit :: Telemetry, defect, P4)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: gfritzsche, Assigned: adamg2, Mentored)

References

Details

(Whiteboard: [measurement:client] [lang=js] [lang=c++])

Attachments

(1 file)

Similar to bug 1288745, we can remove `histogramFrom()` too.
It is only used in one place:
https://dxr.mozilla.org/mozilla-central/rev/dbe4b47941c7b3d6298a0ead5e40dd828096c808/toolkit/components/telemetry/TelemetrySession.jsm#1227

We should be able to define the "STARTUP_" versions of the interesting histograms in Histograms.json.
I assume this will require instead adding a function to copy state between two registered histograms.
Assignee: nobody → adamgj.wong
Mentor: chutten
Hey, Adam! Are you stuck on this bug? Can I help?
Flags: needinfo?(adamgj.wong)
(In reply to Chris H-C :chutten from comment #1)
> Hey, Adam! Are you stuck on this bug? Can I help?

Hey Chris!

Sorry about the lack of communication, I was out of town this past week and so will likely work through it over the weekend.  Is it urgent?
No, no. Just checking in to ensure you aren't stuck or getting frustrated. :)
Flags: needinfo?(adamgj.wong)
Hmm, maybe you can help me out after all!  I wanted to spend some time to see if I could figure this one out, but I think I need a bit of assistance.

I logged all the calls to histogramFrom to see which histograms are being duplicated (is there an easier way to figure this out rather than printing all the histogram names to a file and then reading it later?) and it appears to be quite a big list.  

Is our intention to create a hard-coded constant histogram for each of them so that they can be accessed using getHistogramById?  Going by the first comment, it appears we want to create the constant STARTUP histograms and then copy over the state from the other registered histogram.  So the only difference between the new functionality and the old is that the resulting STARTUP histogram is registered?
Flags: needinfo?(chutten)
The way I'd do it would be to see from code inspection where it is called. As :gfritzsche mentioned above, it's only in the one place[1]. That place loops over all histogram names (Telemetry.registeredHistograms) and tests them against a regular expression (in isInterestingStartupHistogram). If the name matches, and we have data in the original histogram, we add a STARTUP_ variant.

So, the regex [2] is: /SQLITE|HTTP|SPDY|CACHE|DNS/

So we need a STARTUP_ variant for any histogram whose name has one of those strings in it. I'd pick my favourite appropriate language (or one I wanted to learn how to use better) to write a little script to read in Histograms.json and spit out a json file that was all the histogram definitions whose names match the regex, with STARTUP_ prepended to their names. (( It is important that we have a copy of all the histogram details so they expire and alert and are visible in the same way as their originals )). Then I'd append those to the bottom of Histograms.json, and look into writing that function for copying state between histogram objects.

...and then I'd realize we're doing a lot of work removing one function just to add one function. I'd ask for clarification from the reported about why this is worthwhile, as it isn't immediately clear to me from this or any related bug. We're able to remove one function from nsITelemetry, sure... but we have to add a function to JSHistogram's and JSKeyedHistogram's prototypes instead. Also, by effectively duplicating histograms in Histograms.json, we're just asking for someone to go in and make a change to one of the originals without changing the STARTUP_ one (or vice versa). So that's a maintenance headache we're specifically building.

Unfortunately, I happen to know that :gfritzsche is in training, so he might not be able to reply all that quickly.

Well, you have more than one bug to work on, so let's leave him this question as a ni? for when he gets around to it. This code isn't going anywhere.

[1]: https://dxr.mozilla.org/mozilla-central/rev/dbe4b47941c7b3d6298a0ead5e40dd828096c808/toolkit/components/telemetry/TelemetrySession.jsm#1227
[2]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySession.jsm#655
Flags: needinfo?(chutten) → needinfo?(gfritzsche)
What i think we should get rid of here is the fact that cloneHistogram() dynamically creates new histograms.
If we have these STARTUP_* histograms registered in Histograms.json, the same way as any other histogram, then we:
* have them visible & discoverable (for documentation & data review)
* will have them show up on telemetry.mozilla.org & other datasets and tools
Currently they are tacked on to our histogram store as a special edge case.

Adding an additional function to "copy state from histogram A to histogram B" seems unfortunate, but i think it still gets us uniformity in the other areas.

How about we:
* proceed with the duplication
* more the "cloneHistogram()" into a "copy histogram state" function
* make the TelemetrySession code "copy state" instead of cloning
* assert through a test (or histogram_tools.py check?) that the STARTUP_* histograms definitions match their sources
Flags: needinfo?(gfritzsche)
And while i was writing cloneHistogram(), i really meant to write histogramFrom().
If STARTUP_ isn't available in documentation, data review, tmo, and "other datasets and tools"... then what purpose do they serve? Can we excise STARTUP_* and be done with it?
Great point. I can't find them in t.m.o or in re:dash, because they are not listed in Histograms.json.

Roberto, do you remember if the STARTUP_* histograms are used by anyone historically, if those informed some dashboards, ...?
David, are any of them relevant for your team?

This is about the STARTUP_SQLITE_*, STARTUP_HTTP_*, STARTUP_SPDY_*, STARTUP_CACHE_*, STARTUP_DNS_* histograms.
Flags: needinfo?(rvitillo)
Flags: needinfo?(ddurst)
Assuming we will start looking into startup crashes post-CrashSender, I would expect them to be potentially relevant. Roberto or chutten can probably speak to the dashboards better than I can.
Flags: needinfo?(ddurst)
Alrighty, :ddurst tells me he knows of nothing that uses them.

I've double-checked telemetry-alerts, and it doesn't support them.

If Roberto doesn't come up with anything, I say we write a post to fhr-dev saying we'll be removing them in two days and then start work just removing the functionality.
Oh, but if this is about keeping them vs getting rid of them, if we're not using them for dashboards (and no one on the list says "I'm using them!") then let's get rid of them. At the point at which we'd want to query startup things, we'd use regular telemetry; if we're trying to track deviations from the norm (during startup) for things like those listed, then... we could add them back in.

(Categorically I agree with explicit listing in Histograms.json and copying state rather than outright cloning.)
Flags: needinfo?(rvitillo)
(In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> Roberto, do you remember if the STARTUP_* histograms are used by anyone
> historically, if those informed some dashboards, ...?
> David, are any of them relevant for your team?

I am not aware of any user of those histograms.
Alrighty, so Adam, this is where the fun things happen. We're about to remove some stuff, but we're not sure if anyone's using it. The correct way to give everyone a heads-up and last chance to tell us to stop is to send an email to the mailing list of people who care about these things (in this case: fhr-dev).

So your next steps are:
1) Subscribe to fhr-dev: https://mail.mozilla.org/listinfo/fhr-dev
2) Post to (send an email to) fhr-dev@mozilla.org outlining our intent to remove STARTUP_ histograms and the mechanism for auto-generating them. Include a reference to this bug and your best guess about when you'd like to land this change so people know they have a deadline to respond (and a timeline for when to start looking for things breaking)
3) Code the patch to just straight up remove histogramFrom and all the STARTUP_* you can find.
4) Get it reviewed.
5) When the deadline hits, land the patch.

Sound cool?
Flags: needinfo?(adamgj.wong)
Sounds good!  Thanks for the input everyone!
Flags: needinfo?(adamgj.wong)
The mail went out here:
https://mail.mozilla.org/pipermail/fhr-dev/2016-October/001052.html

Clear write-up Adam, thanks!
You might want to forward that mail to fx-team@ or one of the broader lists; I'm not sure fhr-dev is a wide enough distribution to be confident that any potential stakeholders saw your message.
Thanks Richard, I have forwarded it to fx-team.  Hope I did that right.
Hm that did not work. Is it fx-team@mozilla.org?
This is the right public list for Firefox development topics:
https://mail.mozilla.org/listinfo/firefox-dev

In this case i don't think it's needed though - these are measurements that are not documented anywhere and we have informed the successor team of the one that implemented and used it (ddursts team).
Attachment #8800064 - Flags: review?(chutten)
Comment on attachment 8800064 [details]
Bug 1300491 - Removed histogramFrom and resulting unused functions, all STARTUP_* histograms, and any tests using these.

https://reviewboard.mozilla.org/r/85088/#review83764

Oh, how I love patches with a lot of red.

There's a little bit more we can grab while we're here, so let's make this a clean sweep.

::: toolkit/components/telemetry/TelemetryHistogram.cpp
(Diff revision 1)
> -                                    &id);
> -    if (NS_FAILED(rv)) {
> -      return rv;
> -    }
> -
> -    clone = internal_CloneHistogram(name, id);

This is the only use of internal_CloneHistogram(const nsACString&, mozilla::Telemetry::ID), so we can remove that too

::: toolkit/components/telemetry/TelemetrySession.jsm
(Diff revision 1)
> -    let info =
> -      Telemetry.registeredHistograms(this.getDatasetType(), []);
> -    let snapshots = Telemetry.histogramSnapshots;
> -    for (let name of info) {
> -      // Only duplicate histograms with actual data.
> -      if (this.isInterestingStartupHistogram(name) && name in snapshots) {

This is the only use of isInterestingStartupHistogram (which is, in turn, I believe the only use of _startupHistogramRegex), so we can remove those while we're in the neighbourhood.
Attachment #8800064 - Flags: review?(chutten) → review-
Comment on attachment 8800064 [details]
Bug 1300491 - Removed histogramFrom and resulting unused functions, all STARTUP_* histograms, and any tests using these.

https://reviewboard.mozilla.org/r/85088/#review85046

I'm sorry. I missed one.

::: toolkit/components/telemetry/TelemetrySession.jsm
(Diff revisions 1 - 2)
>    _histograms: {},
>    _initialized: false,
>    _logger: null,
>    _prevValues: {},
> -  // Regex that matches histograms we care about during startup.
> -  // Keep this in sync with gen-histogram-bucket-ranges.py.

Oh darn, I missed this comment the first time 'round. This means gen-histogram-bucket-ranges.py needs to be changed too, to pull out STARTUP_ logic
Attachment #8800064 - Flags: review?(chutten) → review-
Comment on attachment 8800064 [details]
Bug 1300491 - Removed histogramFrom and resulting unused functions, all STARTUP_* histograms, and any tests using these.

https://reviewboard.mozilla.org/r/85088/#review87016

Thank you for getting all these edge cases!
Attachment #8800064 - Flags: review?(chutten) → review+
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4fb54b655a91
Removed histogramFrom and resulting unused functions, all STARTUP_* histograms, and any tests using these. r=chutten
This had to be backed out due to an Android build failure[1]. On that platform, internal_GetSubsessionHistogram isn't present. It's the only user of internal_CloneHistogram(const nsACString_internal&, mozilla::Telemetry::ID, base::Histogram&) left after the removal of histogramFrom.

So the correct thing to do, Adam, would be to move that internal_CloneHistogram inside the same #ifdef block as internal_GetSubsessionHistogram so that they're both only available and unavailable on the same platforms.
Backed out for build bustage in TelemetryHistogram.cpp (defined but not used):

https://hg.mozilla.org/integration/autoland/rev/fd274d4c044396fa00d8358c94714bc87d43b069

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=4fb54b655a91fde12d7ba2cf8205c452da0ed20b
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=5578163&repo=autoland

[task 2016-10-24T15:31:02.072284Z] 15:31:02     INFO -  /home/worker/workspace/build/src/toolkit/components/telemetry/TelemetryHistogram.cpp:564:1: error: 'base::Histogram* {anonymous}::internal_CloneHistogram(const nsACString_internal&, mozilla::Telemetry::ID, base::Histogram&)' defined but not used [-Werror=unused-function]
[task 2016-10-24T15:31:02.074315Z] 15:31:02     INFO -   internal_CloneHistogram(const nsACString& newName,
[task 2016-10-24T15:31:02.074372Z] 15:31:02     INFO -   ^
[task 2016-10-24T15:31:02.074423Z] 15:31:02     INFO -  cc1plus: all warnings being treated as errors

Please update the patch. To be able to push again to mozreview, you have to reopen the review at https://reviewboard.mozilla.org/r/85086/
Flags: needinfo?(adamgj.wong)
I'm slightly confused as to what the problem here is - the issue on ReviewBoard is "There was an error executing the Autoland request on autoland - We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again."

I have tried to rebase my commit which is revision 319176 but rebase fails - "source is ancestor of destination".  Do I have to save a patch of the files I committed, revert the file, update, apply patch and recommit?
Maybe because of your previous push and backout hg thinks your patch landed back on the 24.

You shouldn't have to deal with this nonsense as a new(ish) contributor. Let me take care of getting this landed for you.

Do you need help finding something new to work on? Or do you have your sights set on a particular thing to work on next?
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d19cffeae1fe
Removed histogramFrom and resulting unused functions, all STARTUP_* histograms, and any tests using these. r=chutten
https://hg.mozilla.org/mozilla-central/rev/d19cffeae1fe
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
(In reply to Chris H-C :chutten from comment #32)
> Maybe because of your previous push and backout hg thinks your patch landed
> back on the 24.
> 
> You shouldn't have to deal with this nonsense as a new(ish) contributor. Let
> me take care of getting this landed for you.
> 
> Do you need help finding something new to work on? Or do you have your
> sights set on a particular thing to work on next?

Thanks for fixing that - could I ask how you resolved it?

I was looking through what you recommended before: https://bugzilla.mozilla.org/show_bug.cgi?id=1296654 is still open, so I was thinking of trying that out.
Please by all means take bug 1296654 and see what can be done there. Let me know if I can help.

As to how I fixed it... my setup is a bit unique, as I use git locally for version control. What I did was checkout a copy of the tree as it existed when you first wrote your patch, then applied your hg patch as a commit using `git am`. From there I rebased it onto the latest version of mozilla-inbound. I had to resolve some conflicts, as code had moved somewhat, so I built the tree to make sure it worked and sent it off to `try` to make sure that it built on a few other platforms.

Then I pushed directly to mozilla-inbound, which is an integration branch that is pulled onto mozilla-central (which builds Nightly) every day or so. [1]

Some of these steps you might not yet be able to do with your set of permissions (namely, pushing to try to have it build on other platforms, and pushing directly to mozilla-inbound), and others might not make sense if you don't have the odd git-based setup I have, but hopefully this has been informative.

Allow me to clear the needinfo? flag for you. This flag is generally set when the bug is waiting specifically for information from one or more contributors, and it looks like this bug is done.

[1]: https://wiki.mozilla.org/Tree_Rules/Inbound
Flags: needinfo?(adamgj.wong)
(In reply to Chris H-C :chutten from comment #36)
> Please by all means take bug 1296654 and see what can be done there. Let me
> know if I can help.
> 
> As to how I fixed it... my setup is a bit unique, as I use git locally for
> version control. What I did was checkout a copy of the tree as it existed
> when you first wrote your patch, then applied your hg patch as a commit
> using `git am`. From there I rebased it onto the latest version of
> mozilla-inbound. I had to resolve some conflicts, as code had moved
> somewhat, so I built the tree to make sure it worked and sent it off to
> `try` to make sure that it built on a few other platforms.
> 
> Then I pushed directly to mozilla-inbound, which is an integration branch
> that is pulled onto mozilla-central (which builds Nightly) every day or so.
> [1]
> 
> Some of these steps you might not yet be able to do with your set of
> permissions (namely, pushing to try to have it build on other platforms, and
> pushing directly to mozilla-inbound), and others might not make sense if you
> don't have the odd git-based setup I have, but hopefully this has been
> informative.
> 
> Allow me to clear the needinfo? flag for you. This flag is generally set
> when the bug is waiting specifically for information from one or more
> contributors, and it looks like this bug is done.
> 
> [1]: https://wiki.mozilla.org/Tree_Rules/Inbound

Great, thanks for the detailed explanation!
Duplicate of this bug: 1055208
You need to log in before you can comment on or make changes to this bug.