Closed Bug 1188888 Opened 4 years ago Closed 3 years ago

Implement categorical histograms

Categories

(Toolkit :: Telemetry, defect, P1)

defect
Points:
3

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: rvitillo, Assigned: gfritzsche)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

Attachments

(7 files, 11 obsolete files)

7.19 KB, patch
chutten|PTO
: review+
Details | Diff | Splinter Review
26.78 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
3.56 KB, patch
chutten|PTO
: review+
Details | Diff | Splinter Review
1.67 KB, patch
chutten|PTO
: review+
Details | Diff | Splinter Review
2.96 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
23.60 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
13.11 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
It would be convenient to have a histogram for categorical data where each category is represented with a string. Ideally we could start with the implementation of enumerated histograms and add labels to it.
Summary: Implement categorial histograms → Implement categorical histograms
I suggested in email to keep this as type: "enumerated" and add a enumeration field:

{
  "name": "MY_TELEMETRY_ENUM",
  "type": "enumerated",
  "enumeration: [
    "VALUE_1",
    "VALUE_2",
    ...
  ]
}

The enumeration could be exposed as a real c++ enum class, if the names were all tokens:

namespace Telemetry {
enum class MY_TELEMETRY_ENUM_Values {
  VALUE_1 = 0,
  VALUE_2 = 2
}

And also people would be allowed to accumulate string values and have them automatically converted:

void Accumulate(ID id, const nsCString& value);
void Accumulate(const char* name, const nsCString& value);

Georg, would you be willing to mentor somebody through this? It seems like a pretty simple bug for somebody who knows c++, although the xpconnect bits might be hard.
Flags: needinfo?(gfritzsche)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> I suggested in email to keep this as type: "enumerated" and add a
> enumeration field:
> 
> {
>   "name": "MY_TELEMETRY_ENUM",
>   "type": "enumerated",
>   "enumeration: [
>     "VALUE_1",
>     "VALUE_2",
>     ...
>   ]
> }

The `enumeration` part looks sensible, but i think we should use a different type.
We should be able to tell for which histograms we support this (and error out for others).

> The enumeration could be exposed as a real c++ enum class, if the names were
> all tokens:
> 
> namespace Telemetry {
> enum class MY_TELEMETRY_ENUM_Values {
>   VALUE_1 = 0,
>   VALUE_2 = 2
> }

So, we can add a type-safe C++ path for this:
* generate an `enum class X : uint32_t ...` for each enumeration list
* generate a type list `TL` of those enum classes
* provide `template<class E> Accumulate(ID id, E value)`
* have that function `static_assert()` that `E` is an enum and in `TL`
* ... accumulate normally from there

> And also people would be allowed to accumulate string values and have them
> automatically converted:
> 
> void Accumulate(ID id, const nsCString& value);
> void Accumulate(const char* name, const nsCString& value);

Do you think we need that for the C++ part?
We need the IDL/JS interface to take string arguments, but we should try to force C++ to use enums where possible.

> Georg, would you be willing to mentor somebody through this? It seems like a
> pretty simple bug for somebody who knows c++, although the xpconnect bits
> might be hard.

Sure, i'll set this up when i'm back.
Flags: needinfo?(gfritzsche)
OS: Unspecified → All
Priority: -- → P3
Hardware: Unspecified → All
Whiteboard: [measurement:client]
Version: unspecified → Trunk
There have been two cases in the past week where the C++ code had a string and would have had to build a giant switch block to get back to an enumeration, which I was trying to avoid by exposing this directly to C++.
Roberto, i'm thinking that:
* we add a new histogram type, say "labelled" or "categorical"
* for a start, we treat them like enumerated histograms in the aggregator & cerberus
* in a follow-up bug, we add showing labels on t.m.o

How does that sound?
Flags: needinfo?(rvitillo)
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> Roberto, i'm thinking that:
> * we add a new histogram type, say "labelled" or "categorical"
> * for a start, we treat them like enumerated histograms in the aggregator &
> cerberus
> * in a follow-up bug, we add showing labels on t.m.o
> 
> How does that sound?

Sounds good to me. I would prefer "categorical" over "labelled".
Flags: needinfo?(rvitillo)
Points: --- → 3
Priority: P3 → P1
Assignee: nobody → gfritzsche
Some long overdue refactoring of test_nsITelemetry.js.
Attachment #8769664 - Flags: review?(alessio.placitelli)
More prep work: Make the histogram string tables human-readable. I dont see a great reason for the previous structure.
Attachment #8769665 - Flags: review?(chutten)
More prep work: Clean up and refactor the type checking in the histogram script.
Attachment #8769666 - Flags: review?(chutten)
Implementing the categorical histograms and JS API. The C++ API comes in a separate patch.
Attachment #8769667 - Flags: review?(chutten)
Attachment #8769665 - Flags: review?(chutten) → review+
Attachment #8769666 - Flags: review?(chutten) → review+
Comment on attachment 8769667 [details] [diff] [review]
Part 4 - Implement categorical histograms with JS API

Review of attachment 8769667 [details] [diff] [review]:
-----------------------------------------------------------------

Basically there, but I have some questions

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +295,5 @@
> +nsresult
> +HistogramInfo::label_id(const char* label, uint32_t* labelId) const
> +{
> +  MOZ_ASSERT(label);
> +  MOZ_ASSERT(this->histogramType == nsITelemetry::HISTOGRAM_CATEGORICAL);

Do we want to early-return on these failing in opt?

@@ +298,5 @@
> +  MOZ_ASSERT(label);
> +  MOZ_ASSERT(this->histogramType == nsITelemetry::HISTOGRAM_CATEGORICAL);
> +
> +  for (uint32_t i = 0; i < this->label_count; ++i) {
> +    // gHistogramLabelTable contains the indices of the label strings in the

I like "label table". Rolls off the tongue. :)

@@ +377,5 @@
>    switch (histogramType) {
>    case nsITelemetry::HISTOGRAM_EXPONENTIAL:
>      *result = Histogram::FactoryGet(name, min, max, bucketCount, Histogram::kUmaTargetedHistogramFlag);
>      break;
>    case nsITelemetry::HISTOGRAM_LINEAR:

Do we need a "// Fall Through." comment or something here to prove that we're doing this a-purpose? *checks style guide* I guess we don't, but I'd kinda like one :)

::: toolkit/components/telemetry/gen-histogram-data.py
@@ +47,5 @@
> +        label_index = 0
> +        if len(labels) > 0:
> +            label_index = label_count
> +            label_table.append((histogram.name(), string_table.stringIndexes(labels)))
> +            label_count = label_count + len(labels)

Nit: label_count += len(labels)

::: toolkit/components/telemetry/tests/unit/test_nsITelemetry.js
@@ +216,5 @@
> +    Assert.throws(() => h1.add(s));
> +  }
> +
> +  let snapshot = h1.snapshot();
> +  Assert.equal(snapshot.sum, 6);

Uh... sum(snapshot.counts) is seven, no?
Attachment #8769667 - Flags: review?(chutten) → review-
(In reply to Chris H-C :chutten from comment #10)
> ::: toolkit/components/telemetry/tests/unit/test_nsITelemetry.js
> @@ +216,5 @@
> > +    Assert.throws(() => h1.add(s));
> > +  }
> > +
> > +  let snapshot = h1.snapshot();
> > +  Assert.equal(snapshot.sum, 6);
> 
> Uh... sum(snapshot.counts) is seven, no?

`.sum` is the sum of the submitted values and we submit 3 zero-bucket values.
`.sum` is basically `sum(.ranges * .counts)`.
(In reply to Georg Fritzsche [:gfritzsche] from comment #11)
> (In reply to Chris H-C :chutten from comment #10)
> > ::: toolkit/components/telemetry/tests/unit/test_nsITelemetry.js
> > @@ +216,5 @@
> > > +    Assert.throws(() => h1.add(s));
> > > +  }
> > > +
> > > +  let snapshot = h1.snapshot();
> > > +  Assert.equal(snapshot.sum, 6);
> > 
> > Uh... sum(snapshot.counts) is seven, no?
> 
> `.sum` is the sum of the submitted values and we submit 3 zero-bucket values.
> `.sum` is basically `sum(.ranges * .counts)`.

... at least for linear/enumerated/categorical histograms where the bucket size is 1.
For differently sized-buckets this doesn't hold though.
Ack, sorry about that. (Marked as spam)

r+ with or without nits addressed.
(In reply to Chris H-C :chutten from comment #10)
> @@ +298,5 @@
> > +  MOZ_ASSERT(label);
> > +  MOZ_ASSERT(this->histogramType == nsITelemetry::HISTOGRAM_CATEGORICAL);
> > +
> > +  for (uint32_t i = 0; i < this->label_count; ++i) {
> > +    // gHistogramLabelTable contains the indices of the label strings in the
> 
> I like "label table". Rolls off the tongue. :)

"i wasn't able to disable the label table" ...

> @@ +377,5 @@
> >    switch (histogramType) {
> >    case nsITelemetry::HISTOGRAM_EXPONENTIAL:
> >      *result = Histogram::FactoryGet(name, min, max, bucketCount, Histogram::kUmaTargetedHistogramFlag);
> >      break;
> >    case nsITelemetry::HISTOGRAM_LINEAR:
> 
> Do we need a "// Fall Through." comment or something here to prove that
> we're doing this a-purpose? *checks style guide* I guess we don't, but I'd
> kinda like one :)

I would agree if we have any code there, but with the switch labels directly following each other it seems like an obvious pattern.
Comment on attachment 8769664 [details] [diff] [review]
Part 1 - Refactor test_nsITelemetry.js

Review of attachment 8769664 [details] [diff] [review]:
-----------------------------------------------------------------

Good think you're taking care of this file. I think we can get rid of some more stuff (e.g. |expect_fail|). Other than that, there are just a few other minor style comments.

::: toolkit/components/telemetry/tests/unit/test_nsITelemetry.js
@@ +17,3 @@
>  }
>  
> +function expect_fail(f) {

Since we're here, can't we use Assert.throws instead of defining this?

@@ +26,5 @@
> +  }
> +  do_check_true(failed);
> +}
> +
> +function expect_success(f) {

Mh, why is this needed? Can't we simply call f() within the test and let it break it when throwing?

@@ +103,5 @@
>  }
>  
> +// This MUST be the very first test of this file.
> +add_task({
> +  skip_if: () => gIsAndroid 

Trailing whitespace.

@@ +128,5 @@
> +    let [min, max, bucket_count] = [1, INT_MAX - 1, 10]
> +    check_histogram(histogram_type, "test::"+histogram_type, min, max, bucket_count);
> +
> +    const nh = Telemetry.newHistogram;
> +    expect_fail(() => nh("test::min", "never", histogram_type, 0, max, bucket_count));

I think we can use |Assert.throws| in place of |expect_fail| and get rid of the latter.

@@ +136,5 @@
>  
> +add_task(function test_noSerialization() {
> +  // Instantiate the storage for this histogram and make sure it doesn't
> +  // get reflected into JS, as it has no interesting data in it.
> +  let h = Telemetry.getHistogramById("NEWTAB_PAGE_PINNED_SITES_COUNT");

nit: You're not using |h|, I think we can get rid of it

@@ +145,1 @@
>  {

nit: Let's move "{" to the previous line

@@ +167,1 @@
>  {

nit: same here and in other places within this file.

@@ +460,1 @@
>  {

nit: since we're here, would you move the opening "{" to the previous line?

@@ +509,1 @@
>  {

nit: move "{" to the previous line

@@ +565,1 @@
>  {

nit: same here

@@ +737,1 @@
>  {

nit: move "{" to the previous line

@@ +781,3 @@
>  
> +add_task({
> +  skip_if: () => gIsAndroid 

nit: trailing whitespace

@@ +867,3 @@
>  
> +add_task({
> +  skip_if: () => gIsAndroid 

nit: trailing whitespace
Attachment #8769664 - Flags: review?(alessio.placitelli) → feedback+
(In reply to Alessio Placitelli [:Dexter] from comment #16)
> Good think you're taking care of this file. I think we can get rid of some
> more stuff (e.g. |expect_fail|).

I'm not planning to do that in this bug.
Comment on attachment 8769664 [details] [diff] [review]
Part 1 - Refactor test_nsITelemetry.js

(In reply to Georg Fritzsche [:gfritzsche] from comment #17)
> (In reply to Alessio Placitelli [:Dexter] from comment #16)
> > Good think you're taking care of this file. I think we can get rid of some
> > more stuff (e.g. |expect_fail|).
> 
> I'm not planning to do that in this bug.

It looks good then, with the other nits addressed.
Attachment #8769664 - Flags: feedback+ → review+
Nits addressed, also removed unused function check_bug_numbers from histogram_tools.
Attachment #8769667 - Attachment is obsolete: true
Attachment #8769697 - Flags: review+
Attachment #8769664 - Attachment is obsolete: true
Also fix not using generators for the add_task() functions.
Attachment #8769699 - Attachment is obsolete: true
Attachment #8769704 - Flags: review+
Attached patch Part 5 - Restrict label values (obsolete) — Splinter Review
We should restrict the label values to sane contents from the start.
Attachment #8770256 - Flags: review?(chutten)
Attached patch Part 6 - Add mozilla::tl::Or (obsolete) — Splinter Review
This adds the C++ API and enum generation.
Attachment #8770258 - Flags: review?(chutten)
This restricts the histogram names to sane contents instead of making the C++ build fail when adding unsupported characters.
Attachment #8770259 - Flags: review?(chutten)
Comment on attachment 8770257 [details] [diff] [review]
Part 6 - Add mozilla::tl::Or

This adds tl::Or<> to the MFBT, which is used in part 7 to check for whether a type is in a known set of types:

> template<class T> struct IsCategoricalLabelEnum : tl::Or<
>   IsSame<T, LABELS_TELEMETRY_TEST_CATEGORICAL>::value,
>   IsSame<T, LABELS_TELEMETRY_TEST_CATEGORICAL_OPTOUT>::value
> >::Type {};

Not the most pretty way i think, but it works.
Attachment #8770257 - Flags: review?(jwalden+bmo)
Comment on attachment 8770256 [details] [diff] [review]
Part 5 - Restrict label values

Review of attachment 8770256 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/histogram_tools.py
@@ +257,5 @@
> +        labels = definition.get('labels')
> +        if not labels:
> +            return
> +
> +        limit = 20

Shouldn't this be a constant someplace instead of a local?

@@ +262,5 @@
> +        invalid = filter(lambda l: len(l) > limit, labels)
> +        if len(invalid) > 0:
> +            raise ValueError, 'Label values for %s exceed length limit of %d: %s' % (name, limit, ', '.join(invalid))
> +
> +        pattern = '^[a-z][a-z0-9_]+[a-z0-9]$'

Maybe a comment that this forces labels to conform to identifiers allowed by C++ version whatever?
Attachment #8770256 - Flags: review?(chutten)
Attachment #8770257 - Flags: review?(jwalden+bmo) → review+
Attachment #8770514 - Flags: review?(chutten)
Attachment #8770256 - Attachment is obsolete: true
Updated doc comments for the API and dropped the EnableIf for a static_assert.
Attachment #8770515 - Flags: review?(chutten)
Attachment #8770258 - Attachment is obsolete: true
Attachment #8770258 - Flags: review?(chutten)
Attachment #8770259 - Attachment is obsolete: true
Attachment #8770259 - Flags: review?(chutten)
Attachment #8770514 - Flags: review?(chutten) → review+
Comment on attachment 8770515 [details] [diff] [review]
Part 7 - Implement C++ API for categorical histograms

Review of attachment 8770515 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Telemetry.h
@@ +106,5 @@
> +};
> +
> +/**
> + * Adds sample to a categorical histogram defined in TelemetryHistogramEnums.h
> + * This allows passings strings matching the labels defined in Histograms.json

"passings"

I'm not sure that this sentence is very clear. What does the caller care about the index of the label string?

Maybe something like: "This string will be matched against the labels defined in Histograms.json. If the string does not match a label defined for the histogram, nothing will be recorded"

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +1132,1 @@
>        JS_ReportError(cx, "Unknown label for categorical histogram");

By changing the semantics here, technically the error could be with the id (line 611 returns failure status from internal_GetHistogramByEnumId) _or_ the label. I'm not too worried.

::: toolkit/components/telemetry/gen-histogram-enum.py
@@ +91,5 @@
> +        print("\nenum class %s : uint32_t {" % name, file=output)
> +        print("  %s" % ",\n  ".join(labels), file=output)
> +        print("};", file=output)
> +
> +    print("\ntemplate<class T> struct IsCategoricalLabelEnum : tl::Or<", file=output)

I'm not so familiar with enum class finagling. Please ask for a second pair of eyes to look at these parts.
Attachment #8770515 - Flags: review?(chutten) → feedback+
Attachment #8770516 - Flags: review?(chutten) → review+
Nathan, can you take specifically take a look over the label enum handling? This affects IsCategoricalLabelEnum, CategoricalLabelId and their uses. chutten covered the rest.
Attachment #8770582 - Flags: review?(nfroyd)
Attachment #8770515 - Attachment is obsolete: true
Attachment #8770582 - Flags: review?(chutten)
(In reply to Georg Fritzsche [:gfritzsche] from comment #32)
> Created attachment 8770582 [details] [diff] [review]
> Part 7 - Implement C++ API for categorical histograms
> 
> Nathan, can you take specifically take a look over the label enum handling?
> This affects IsCategoricalLabelEnum, CategoricalLabelId and their uses.
> chutten covered the rest.

For supporting "categorical histograms" with a predefined set we generate C++ enums.
Basically for Histograms.json entries like this:
>  "TELEMETRY_TEST_CATEGORICAL": {
>    "alert_emails": ["telemetry-client-dev@mozilla.com"],
>    "bug_numbers": [1188888],
>    "expires_in_version": "never",
>    "kind": "categorical",
>    "labels": [
>      "CommonLabel",
>      "Label2",
>      "Label3"
>    ],
>    "description": "a testing histogram; not meant to be touched"
>  },
>  "TELEMETRY_TEST_CATEGORICAL_OPTOUT": {
>    "alert_emails": ["telemetry-client-dev@mozilla.com"],
>    "bug_numbers": [1188888],
>    "expires_in_version": "never",
>    "releaseChannelCollection": "opt-out",
>    "kind": "categorical",
>    "labels": [
>      "CommonLabel",
>      "Label4",
>      "Label5",
>      "Label6"
>    ],
>    "description": "a testing histogram; not meant to be touched"
>  },

... we generate one `enum class` for each categorical histograms labels and some type-checking. Here is a shortened version of the generated TelemetryHistogramEnums.h:
https://pastebin.mozilla.org/8883568
Comment on attachment 8770582 [details] [diff] [review]
Part 7 - Implement C++ API for categorical histograms

Review of attachment 8770582 [details] [diff] [review]:
-----------------------------------------------------------------

r1/2 for the telemetry semantics
Attachment #8770582 - Flags: review?(chutten) → review+
Comment on attachment 8769665 [details] [diff] [review]
Part 2 - Generate readable Telemetry string tables

Review of attachment 8769665 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by review.  The tables are written the way they are for a reason. :)

::: toolkit/components/telemetry/shared_telemetry_utils.py
@@ +26,5 @@
>          the table if it's not there.
>          :param string: the input string.
>          """
>          if string in self.table:
> +            return self.table.index(string)

You have turned a constant-time operation into a linear-time operation here...and you actually do a linear-time operation twice, because of the membership check and the indexing operation are both linear.

@@ +37,5 @@
>          :param f: the output stream.
>          :param name: the name of the output array.
>          """
> +        entries = self.table
> +        f.write('const char* %s[] = {\n' % name)

You have drastically increased the space requirements for the table here, because now in addition to the string data itself, you require pointers to individual strings, and those pointers on some platforms also require extra space in the form of relocations.  Please don't do that.
Attachment #8769665 - Flags: review-
Comment on attachment 8770582 [details] [diff] [review]
Part 7 - Implement C++ API for categorical histograms

Review of attachment 8770582 [details] [diff] [review]:
-----------------------------------------------------------------

The template definitions need a little bit of massaging, and label_id needs to be reworked.

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +600,5 @@
> +nsresult
> +internal_HistogramAddCategorical(mozilla::Telemetry::ID id, const nsCString& label)
> +{
> +  uint32_t labelId = 0;
> +  if (NS_FAILED(gHistograms[id].label_id(label.get(), &labelId))) {

I'll comment on this here rather than in the patch that implements it: I think this style of API in C++ is a horrible idea, because you should almost always have constant string values, and this API means that you're always doing work at runtime that you shouldn't have to do!

That said, bsmedberg indicated in comment 3 that we need something like this, and while I'm curious about the examples, I'll take his word for it.

But the underlying implementation of label_id is terrible: it should be logarithmic time rather than linear time, because we should be able to arrange it such that labels are stored in sorted order.  So can we fix that, please?

::: toolkit/components/telemetry/gen-histogram-enum.py
@@ +93,5 @@
> +        print("};", file=output)
> +
> +    print("\ntemplate<class T> struct IsCategoricalLabelEnum : tl::Or<", file=output)
> +    print(",\n".join(["  IsSame<T, %s>::value" % name for name,_,_ in enums]), file=output)
> +    print(">::Type {};", file=output)

This method is not optimal.  What you really want is something like:

template<typename T>
struct IsCategorialLabelEnum : public mozilla::FalseType {};

template<>
struct IsCategoricalLabelEnum<ENUM_TYPE1> : public mozilla::TrueType {};
template<>
struct IsCategoricalLabelEnum<ENUM_TYPE2> : public mozilla::TrueType {};

and so forth.

@@ +97,5 @@
> +    print(">::Type {};", file=output)
> +
> +    print("\ntemplate<class T> struct CategoricalLabelId {};", file=output)
> +    for name,_,id in enums:
> +        print("template<> struct CategoricalLabelId<%s> : IntegralConstant<uint32_t, %s>::Type {};" % (name, id), file=output)

This doesn't need ::Type, as IntegralConstant<>::Type is just IntegralConstant<>.
Attachment #8770582 - Flags: review?(nfroyd) → feedback+
(In reply to Nathan Froyd [:froydnj] from comment #35)
> Comment on attachment 8769665 [details] [diff] [review]
> Part 2 - Generate readable Telemetry string tables
> 
> Review of attachment 8769665 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Drive-by review.  The tables are written the way they are for a reason. :)

Ok, i think that shows that we need a big comment there on the reason :)
Do you think this adds unacceptable space usage across platforms with current compiler generations?
(In reply to Nathan Froyd [:froydnj] from comment #36)
> But the underlying implementation of label_id is terrible: it should be
> logarithmic time rather than linear time, because we should be able to
> arrange it such that labels are stored in sorted order.  So can we fix that,
> please?

So, from what i see we have few options here:
(1) generate that mapping/sorting at compile-time: this means additional space usage (i.e. would have to be outside of the common histogram string table)
(2) generate that kind of (label -> id) mapping on start-up: this means storage costs for all labels (potentially many), many of which might not actually be used
(3) build a lazy map on label use: we can cache the results locally, speeding up repeated use

(1) and (2) don't sound great. If lookup perf is a concern, lets do (3)?
Flags: needinfo?(nfroyd)
(In reply to Georg Fritzsche [:gfritzsche] from comment #37)
> (In reply to Nathan Froyd [:froydnj] from comment #35)
> > Drive-by review.  The tables are written the way they are for a reason. :)
> 
> Ok, i think that shows that we need a big comment there on the reason :)
> Do you think this adds unacceptable space usage across platforms with
> current compiler generations?

Yes.  Granted, the space increase is not *huge* (~40K on x86-64 Linux, roughly half that on 32-bit platforms), but it is certainly a regression that's completely avoidable, and it would be an even more noticeable regression with multiple content processes.  It doesn't really matter what the compiler generation is, either.

If you want to make it more readable, adding a (generated) comment for each string that shows the actual value would be reasonable.

(In reply to Georg Fritzsche [:gfritzsche] from comment #38)
> (In reply to Nathan Froyd [:froydnj] from comment #36)
> > But the underlying implementation of label_id is terrible: it should be
> > logarithmic time rather than linear time, because we should be able to
> > arrange it such that labels are stored in sorted order.  So can we fix that,
> > please?
> 
> So, from what i see we have few options here:

In my reading, I missed that the linear search is actually per-histogram, rather than global.  So the number of entries is (presumably) small, which makes a linear search much more reasonable.  So I was out of line in grousing about the linear lookup: my apologies.  Let's keep the linear lookup for now.  (I think you can do this without any extra space and some cleverness, but given that linear search is over a small number of entries, I'm not going to spend a lot of time thinking about it.)
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #39)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #37)
> > (In reply to Nathan Froyd [:froydnj] from comment #35)
> > > Drive-by review.  The tables are written the way they are for a reason. :)
> > 
> > Ok, i think that shows that we need a big comment there on the reason :)
> > Do you think this adds unacceptable space usage across platforms with
> > current compiler generations?
> 
> Yes.  Granted, the space increase is not *huge* (~40K on x86-64 Linux,
> roughly half that on 32-bit platforms), but it is certainly a regression
> that's completely avoidable, and it would be an even more noticeable
> regression with multiple content processes.  It doesn't really matter what
> the compiler generation is, either.
> 
> If you want to make it more readable, adding a (generated) comment for each
> string that shows the actual value would be reasonable.

Sounds good, that was my planned fallback option here.
Attachment #8769665 - Attachment is obsolete: true
Attachment #8769697 - Attachment is obsolete: true
Attachment #8772444 - Flags: review+
Attachment #8770257 - Attachment is obsolete: true
Addressed the other two points.
Attachment #8772445 - Flags: review?(nfroyd)
Attachment #8770582 - Attachment is obsolete: true
Comment on attachment 8772437 [details] [diff] [review]
Part 2 - Generate readable Telemetry string tables

Review of attachment 8772437 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/shared_telemetry_utils.py
@@ +62,4 @@
>          f.write("const char %s[] = {\n" % name)
> +        for (string, offset) in entries:
> +            if "*/" in string:
> +                raise ValueError, "String in string table contains unexpected sequence '*/': %s" % string

Nice catch to save somebody some future frustration.
Attachment #8772437 - Flags: review?(nfroyd) → review+
Comment on attachment 8772445 [details] [diff] [review]
Part 6 - Implement C++ API for categorical histograms

Review of attachment 8772445 [details] [diff] [review]:
-----------------------------------------------------------------

r=me.  Thanks for catching that the mozilla::Or change was no longer needed.
Attachment #8772445 - Flags: review?(nfroyd) → review+
Pushed by georg.fritzsche@googlemail.com:
https://hg.mozilla.org/integration/fx-team/rev/6a5f0538b3c9
Part 1 - Refactor test_nsITelemetry.js. r=dexter
https://hg.mozilla.org/integration/fx-team/rev/58344005efca
Part 2 - Generate readable Telemetry string tables. r=froydnj
https://hg.mozilla.org/integration/fx-team/rev/45b83bf04a26
Part 3 - Refactor the histogram script type & presence checking. r=chutten
https://hg.mozilla.org/integration/fx-team/rev/a662e130c9de
Part 4 - Implement categorical histograms with JS API. r=chutten
https://hg.mozilla.org/integration/fx-team/rev/9f2ef29dee01
Part 5 - Restrict label values. r=chutten
https://hg.mozilla.org/integration/fx-team/rev/13b22aa53d11
Part 6 - Implement C++ API for categorical histograms. r=chutten,r=froyndj
https://hg.mozilla.org/integration/fx-team/rev/d8dcff77656e
Part 7 - Bonus - Explicitly restrict histogram names to current patterns. r=chutten
Blocks: 1288130
Blocks: 1288444
Blocks: 1297060
Duplicate of this bug: 1166341
You need to log in before you can comment on or make changes to this bug.