Last Comment Bug 723846 - make boolean histograms more useful
: make boolean histograms more useful
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Telemetry (show other bugs)
: Trunk
: All All
: -- enhancement (vote)
: mozilla13
Assigned To: Nathan Froyd [:froydnj]
:
Mentors:
: 732310 (view as bug list)
Depends on:
Blocks: 732310 733510
  Show dependency treegraph
 
Reported: 2012-02-03 01:10 PST by Nathan Froyd [:froydnj]
Modified: 2012-03-08 13:55 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
part 1 - add FlagHistogram to ipc/ (4.03 KB, patch)
2012-03-02 07:25 PST, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Review
part 2 - add FlagHistogram support to telemetry (3.27 KB, patch)
2012-03-02 07:26 PST, Nathan Froyd [:froydnj]
taras.mozilla: review-
Details | Diff | Review
part 3 - tests for FlagHistogram (1.65 KB, patch)
2012-03-02 07:26 PST, Nathan Froyd [:froydnj]
no flags Details | Diff | Review
part 2 - add FlagHistogram support to telemetry (9.14 KB, patch)
2012-03-05 08:57 PST, Nathan Froyd [:froydnj]
taras.mozilla: review-
Details | Diff | Review
part 3 - tests for FlagHistogram (3.62 KB, patch)
2012-03-05 08:58 PST, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Review
part 2 - add FlagHistogram support to telemetry (8.86 KB, patch)
2012-03-06 12:06 PST, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Review
part 1 - add FlagHistogram to ipc/ (4.05 KB, patch)
2012-03-06 14:01 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review
part 2 - add FlagHistogram support to telemetry (11.33 KB, patch)
2012-03-06 14:02 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review
part 3 - tests for FlagHistogram (6.19 KB, patch)
2012-03-06 14:03 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review
part 1 - add FlagHistogram to ipc/ (4.05 KB, patch)
2012-03-06 17:36 PST, Nathan Froyd [:froydnj]
nfroyd: review+
Details | Diff | Review

Description Nathan Froyd [:froydnj] 2012-02-03 01:10:04 PST
+++ This bug was initially created as a clone of Bug #723802 +++

Is it necessary to call accumulate with a false value for boolean telemetry when we just want to know how often something happens?  From what I can tell it should be enough to just report the interesting (true) case and then compare that to the number of telemetry reports for the time period.  This is what Telemetry::A11Y_INSTANTIATED currently does but it leads to useless bar graphs showing 100% of reporters with it on without showing the total number of reports that didn't specify a value.  It seems like boolean telemetry probes should have a default value.  The advantage of only reporting in the true case is that it has no perf. impact.
Comment 1 (dormant account) 2012-02-03 02:01:10 PST
I think we should do this on the serverside instead.
Comment 2 Justin Lebar (not reading bugmail) 2012-02-03 10:14:46 PST
> The advantage of only reporting in the true case is that it has no perf. impact.

Have you measured a perf impact of accumulating into a histogram?
Comment 3 Matthew N. [:MattN] (PTO Jun. 29-30) 2012-02-03 11:24:42 PST
(In reply to Taras Glek (:taras) from comment #1)
> I think we should do this on the serverside instead.

That's possible and why I didn't file a bug for it in the first place.  I wanted to know if that's something that will get fixed on the server.  The problem is that the server side currently doesn't know for sure if a boolean will be used just for a true or false value or whether it's reporting both.  I guess they can just check if 100% of the reports are for one side and then assume it should be compared to the total reports?  This only works if you are looking only at data for a browser that reports that telemetry probe, otherwise it'll skew the data.

(In reply to Justin Lebar [:jlebar] from comment #2)
> > The advantage of only reporting in the true case is that it has no perf. impact.
> 
> Have you measured a perf impact of accumulating into a histogram?

That's a quote from me btw.

Note that I wasn't saying there was a perf impact of accumulating, I was just saying that not accumulating has zero perf impact compared to not using telemetry.
Comment 4 Nathan Froyd [:froydnj] 2012-02-03 11:36:29 PST
(In reply to Matthew N. [:MattN] from comment #3)
> (In reply to Taras Glek (:taras) from comment #1)
> > I think we should do this on the serverside instead.
> 
> That's possible and why I didn't file a bug for it in the first place.  I
> wanted to know if that's something that will get fixed on the server.

File a bug for the server side or the client side, doesn't matter much.  If it's not a good idea, somebody will say so.

> The problem is that the server side currently doesn't know for sure if a boolean
> will be used just for a true or false value or whether it's reporting both.

This is a problem, yes.  We could probably just use the heuristic you proposed for the boolean histograms we have now, though.
Comment 5 Justin Lebar (not reading bugmail) 2012-02-03 11:46:14 PST
> The problem is that the server side currently doesn't know for sure if a boolean will be used just 
> for a true or false value or whether it's reporting both.  I guess they can just check if 100% of 
> the reports are for one side and then assume it should be compared to the total reports?

I don't think you should rely on this.  Take a jaunt through the Telemetry dashboard and you'll see a bunch of bogus reporter names.  I don't think it's unlikely that we have bogus values as well.  Just one bogus value out of potentially millions of pings could throw your report off.
Comment 6 alexander :surkov 2012-03-02 07:08:41 PST
*** Bug 732310 has been marked as a duplicate of this bug. ***
Comment 7 Nathan Froyd [:froydnj] 2012-03-02 07:25:26 PST
Created attachment 602356 [details] [diff] [review]
part 1 - add FlagHistogram to ipc/

Here's a proposed solution: let's have a boolean-esque histogram that actually starts with a 0 value and only lets people accumulate a single positive value, kind of like a toggleable flag.  Hence the name FlagHistogram.

I realize that it might be more conceptually elegant to have a separate entity for this sort of thing, a TelemetryFlag or somesuch.  But this is expedient from a code point of view, an existing client's point of view, and from a metrics point of view (histogram dashboard already built, etc.)
Comment 8 Nathan Froyd [:froydnj] 2012-03-02 07:26:08 PST
Created attachment 602357 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

Happily, little code is needed on the telemetry side of things.
Comment 9 Nathan Froyd [:froydnj] 2012-03-02 07:26:44 PST
Created attachment 602359 [details] [diff] [review]
part 3 - tests for FlagHistogram

...and tests to make sure everything works correctly.
Comment 10 (dormant account) 2012-03-02 15:57:00 PST
Comment on attachment 602356 [details] [diff] [review]
part 1 - add FlagHistogram to ipc/

that's a neat hack. This better get documented well
Comment 11 (dormant account) 2012-03-02 16:00:25 PST
Comment on attachment 602356 [details] [diff] [review]
part 1 - add FlagHistogram to ipc/

+  Histogram::Accumulate(1, 1, zero_index);
I'm confused why you don't use 0 as a default value
Comment 12 (dormant account) 2012-03-02 16:02:56 PST
Comment on attachment 602357 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

 * HISTOGRAM_FLAG - For storing single 0/1 values

That's confusing. You should say that this histogram can only contain a single value, ie count == 1.

Also this is kind of weird. This histogram gets instantiated the first time you assign or read from it...so then why have the default value?

I thought the point of these was to exist in every ping, which will allow you to look at stuff like marketshare of l10n.
Comment 13 (dormant account) 2012-03-02 16:04:01 PST
Comment on attachment 602359 [details] [diff] [review]
part 3 - tests for FlagHistogram

+  // Should only switch counts once.
+  h.add(3);

Should this not throw an error?
Comment 14 (dormant account) 2012-03-02 16:06:17 PST
Comment on attachment 602357 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

I'm pretty sure in order to work as intended you have to request all  HISTOGRAM_FLAG histograms to be created the first time histogramSnapshots is called.
Comment 15 Nathan Froyd [:froydnj] 2012-03-02 17:38:37 PST
(In reply to Taras Glek (:taras) from comment #11)
> +  Histogram::Accumulate(1, 1, zero_index);
> I'm confused why you don't use 0 as a default value

I am using 0 as a default value; I compute the index/bucket 0 goes in--that's zero_index--and stick a 1 there.

The inverse might be useful: default to 1 and get flipped to 0.  But I suppose you can have a FOO_DISABLED histogram to capture that semantic.

(In reply to Taras Glek (:taras) from comment #13)
> +  // Should only switch counts once.
> +  h.add(3);
> 
> Should this not throw an error?

I am ambivalent about this.  Doing that would require modifying all the Histogram subclasses to have their Add methods (and Accumulate methods, and SampleSet::Accumulate) return a status code.  I can see the value in doing this and am happy to do it if need be.

(In reply to Taras Glek (:taras) from comment #14)
> I'm pretty sure in order to work as intended you have to request all 
> HISTOGRAM_FLAG histograms to be created the first time histogramSnapshots is
> called.

Ooo, good point.  I'll do that and add a test.

I wonder how badly that would screw with the metrics side of things though; before you could assume that the histogram data was increasing only.  Now we might have these crazy histograms with one set of values one ping and a totally different (non-superset) set next ping.
Comment 16 (dormant account) 2012-03-02 21:10:23 PST
(In reply to Nathan Froyd (:froydnj) from comment #15)

> I wonder how badly that would screw with the metrics side of things though;
> before you could assume that the histogram data was increasing only.  Now we
> might have these crazy histograms with one set of values one ping and a
> totally different (non-superset) set next ping.

I'm not sure what you mean.
Comment 17 Nathan Froyd [:froydnj] 2012-03-05 08:57:22 PST
Created attachment 602932 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

Updated patch, creating flag histograms automagically from addons and "main" telemetry, and fixing a couple of thinkos.
Comment 18 Nathan Froyd [:froydnj] 2012-03-05 08:58:02 PST
Created attachment 602933 [details] [diff] [review]
part 3 - tests for FlagHistogram

Updated tests for addons.
Comment 19 (dormant account) 2012-03-05 13:47:04 PST
Comment on attachment 602932 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

+  printf("Getting histogram %s with type %u\n", name, histogramType);

^ I doubt you meant to put that in.

+  printf("Got histogram %s with type %u!\n", name, histogramType);
   return NS_OK;

+      nsresult rv = GetHistogramByEnumId(Telemetry::ID(i), &h);
+      if (NS_FAILED(rv)) {
+        return rv;
+      }

This is a bit scary. Having one ill-specified flag histogram will bust our entire telemetry reporting. An assert would work better.

I'll trust you on addon histogram stuff(that never fails, right?).
Comment 20 Nathan Froyd [:froydnj] 2012-03-05 13:54:54 PST
(In reply to Taras Glek (:taras) from comment #19)
> +  printf("Getting histogram %s with type %u\n", name, histogramType);
> 
> ^ I doubt you meant to put that in.

Oops, indeed.

> +      nsresult rv = GetHistogramByEnumId(Telemetry::ID(i), &h);
> +      if (NS_FAILED(rv)) {
> +        return rv;
> +      }
> 
> This is a bit scary. Having one ill-specified flag histogram will bust our
> entire telemetry reporting. An assert would work better.

GetHistogramByEnumId doesn't do any real checking of flag histogram parameters; we can make the static asserts for boolean histograms do the same checks for flag histograms, though.

So maybe there's no point in checking the return value here.

> I'll trust you on addon histogram stuff(that never fails, right?).

The addon histogram stuff at least has tests.  ^_^
Comment 21 Nathan Froyd [:froydnj] 2012-03-06 12:06:42 PST
Created attachment 603398 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

Debugging code removed.  We now only complain about finding flag histograms in debug mode with an assert.
Comment 22 (dormant account) 2012-03-06 12:57:57 PST
Comment on attachment 602933 [details] [diff] [review]
part 3 - tests for FlagHistogram

You need a test in test_TelemetryPing to see that a FLAG histogram is being registered with a default 0 value
Comment 23 Nathan Froyd [:froydnj] 2012-03-06 14:01:31 PST
Created attachment 603454 [details] [diff] [review]
part 1 - add FlagHistogram to ipc/

There was a bug in FlagHistogram; the bucket ranges are not initialized in the constructor, but rather by explicit calls in FactoryGet.  So the initial bucketing of 0 was totally bogus.  I am unsure of how the nsITelemetry tests didn't catch this...
Comment 24 Nathan Froyd [:froydnj] 2012-03-06 14:02:49 PST
Created attachment 603455 [details] [diff] [review]
part 2 - add FlagHistogram support to telemetry

Add HISTOGRAM_FLAG macro and check for HISTOGRAM_FLAG in the static checking bits.
Comment 25 Nathan Froyd [:froydnj] 2012-03-06 14:03:48 PST
Created attachment 603456 [details] [diff] [review]
part 3 - tests for FlagHistogram

Add TelemetryPing test to make sure we automagically instantiate flag histograms.  Also add a little extra checking for sane sums to the nsITelemetry tests.
Comment 26 Mozilla RelEng Bot 2012-03-06 14:10:06 PST
Autoland Patchset:
	Patches: 603454, 603455, 603456
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=edb3d7d29932
Try run started, revision edb3d7d29932. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=edb3d7d29932
Comment 27 Nathan Froyd [:froydnj] 2012-03-06 17:36:54 PST
Created attachment 603530 [details] [diff] [review]
part 1 - add FlagHistogram to ipc/

Try showed that Windows and Android builds fail on histogram.cc because MOZ_ASSERT is not defined there (possibly Mac too; those builds aren't done yet).  Curious...

Using the moral equivalent, DCHECK_EQ instead.
Comment 28 Mozilla RelEng Bot 2012-03-06 21:16:32 PST
Try run for edb3d7d29932 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=edb3d7d29932
Results (out of 63 total builds):
    success: 47
    warnings: 7
    failure: 9
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-edb3d7d29932
 Timed out after 06 hours without completing.
Comment 29 Mozilla RelEng Bot 2012-03-07 05:08:18 PST
Autoland Patchset:
	Patches: 603530, 603455, 603456
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=ac8c0c20a522
Try run started, revision ac8c0c20a522. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=ac8c0c20a522
Comment 30 Nathan Froyd [:froydnj] 2012-03-07 08:29:06 PST
Try run looks good, flagging for checkin.
Comment 31 Mozilla RelEng Bot 2012-03-07 09:00:45 PST
Try run for ac8c0c20a522 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ac8c0c20a522
Results (out of 217 total builds):
    success: 172
    warnings: 42
    failure: 3
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-ac8c0c20a522

Note You need to log in before you can comment on or make changes to this bug.