Closed Bug 1245910 Opened 8 years ago Closed 8 years ago

Histograms.json entries contain strings or expressions instead of number or booleans

Categories

(Toolkit :: Telemetry, defect, P4)

defect

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: gfritzsche, Assigned: hu.eric, Mentored)

References

Details

(Whiteboard: [measurement:client] [lang=python])

Attachments

(1 file, 3 obsolete files)

In Histograms.json, multiple fields contain string or expression values when they should just be numbers/booleans:
https://dxr.mozilla.org/mozilla-central/search?q=path%3AHistograms.json+regexp%3A%28\%22n_buckets\%22%29|%28\%22n_values\%22%29|%28\%22low\%22%29|%28\%22high\%22%29|%28\%22keyed\%22%29%3A\W\%22&redirect=true&case=false

We should fix this in this bug.
Then we can make the script (histogram_tools.py) that uses them enforce types or apply a schema.
Eric, would you be interested in taking this?

After bug 920169 we can actually convert the remaining values to numbers and make histogram_tools.py enforce them to be numbers when parsing the histogram descriptions.
Anything that is not a number should print a descriptive error and stop the build, similar to how the bug_numbers check works:
https://dxr.mozilla.org/mozilla-central/rev/576a6dcde5b68c2ea45324ed5ce1dabb7d833d09/toolkit/components/telemetry/histogram_tools.py#253
Mentor: gfritzsche
Flags: needinfo?(pineapple.rice)
Whiteboard: [measurement:client] → [measurement:client] [lang=python]
Yep, I would!
Flags: needinfo?(pineapple.rice)
Assignee: nobody → pineapple.rice
If I understand the description correctly, the search above is a superset of the Histograms.json fields to be changed.  It includes some fields that have numbers as values.

If that's correct so far, then this slightly modified search contains the only the fields to be changed, yes?  (Ignoring the string-value expressions that were changed in bug 920169).
https://dxr.mozilla.org/mozilla-central/search?q=path%3AHistograms.json+regexp%3A((%5C%22n_buckets%5C%22)%7C(%5C%22n_values%5C%22)%7C(%5C%22low%5C%22)%7C(%5C%22high%5C%22)%7C(%5C%22keyed%5C%22))%3A%5CW*%22&redirect=false&case=false
(In reply to Eric Hu from comment #4)
> If I understand the description correctly, the search above is a superset of
> the Histograms.json fields to be changed.  It includes some fields that have
> numbers as values.

Right, your regex seems to cover this better.
FWIW, once we enforce the number types in the Python script, we should catch anything we missed when the script runs as part of the build.
- Add compile time type checks on Histograms.json
- Remove string to int/bool coercion for 4 Histogram keys:
  - n_buckets
  - n_values
  - low
  - high
  - keyed
Comment on attachment 8720334 [details] [diff] [review]
Convert string values to integer and boolean values

I wasn't sure if it would be worth inserting comments in Histograms.json to show the original form of the computed values (i.e. 64 * 1024).  What are your thoughts on this?

I'm unsure what the purpose of the low(), high(), and n_values() methods were before.  Perhaps using a function to access those values served some purpose I'm unaware of?
Attachment #8720334 - Flags: review?(gfritzsche)
Comments aren't supported in JSON. It's one of the longstanding complaints about the format, actually.
Comment on attachment 8720334 [details] [diff] [review]
Convert string values to integer and boolean values

Review of attachment 8720334 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this is looking good.
There is only one issue below, the rest are nits or requests.

::: toolkit/components/telemetry/histogram_tools.py
@@ +220,5 @@
>              raise KeyError, 'alert_emails must be an array if present (in Histogram %s)' % name
>  
>          Histogram.check_expiration(name, definition)
>          Histogram.check_bug_numbers(name, definition)
> +        Histogram.check_expression_types(name, definition)

This is a nice general approach and simplifies the assumptions in this whole grown script.
Let's do this check first before the other two here.

We also can remove the `check_numeric_limits()` helper now.
Can you please move the `check_name()` call over here too while we're at it?

@@ +259,5 @@
> +                "n_buckets": int,
> +                "n_values": int,
> +                "low": int,
> +                "high": int,
> +                "keyed": bool

Would you mind just adding the string type fields here too? That's:
expires_in_version, kind, description, cpp_guard, releaseChannelCollection

(coming from https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Declaring_a_Histogram)

@@ +261,5 @@
> +                "low": int,
> +                "high": int,
> +                "keyed": bool
> +            }
> +        for key in type_checked_fields:

for key, type in field_types.iteritems():
  ...

@@ +264,5 @@
> +            }
> +        for key in type_checked_fields:
> +            value = definition.get(key)
> +            key_type = type_checked_fields[key]
> +            if not value:

Nit: This will not behave right for falsy values. Instead do:

if not key in definition:
  continue
...

@@ +268,5 @@
> +            if not value:
> +                continue
> +            if not type(value) is key_type:
> +                raise ValueError, ('value for key "{0}" in Histogram "{1}" '
> +                        'should be a {2}').format(key, name, key_type)

Minor nit: Just "should be {2}"? This avoids e.g. "should be a int".
Attachment #8720334 - Flags: review?(gfritzsche) → feedback+
(In reply to Eric Hu from comment #7)
> I'm unsure what the purpose of the low(), high(), and n_values() methods
> were before.  Perhaps using a function to access those values served some
> purpose I'm unaware of?

They are usage from outside of this class, e.g. in gen-histogram-data.py (which uses the generated Histogram info and generates C++ header data from it).
- Add compile time type checks on Histograms.json
- Remove string to int/bool coercion for 4 Histogram keys:
  - n_buckets
  - n_values
  - low
  - high
  - keyed
- Remove check_numeric (redundant with check_expression_types)
Attachment #8720334 - Attachment is obsolete: true
Comment on attachment 8720829 [details] [diff] [review]
Convert string values to integer and boolean values

Thanks for the reminder about JSON comments, Chris.  I always forget about that.  That rules that change out.

I made most of your suggested changes, Georg.  I moved check_name into verify_attributes.  I couldn't come up with a new name for `check_expression_types` that included the responsibility of `check_name`.  What are your thoughts?
Attachment #8720829 - Flags: review?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] from comment #10)
> (In reply to Eric Hu from comment #7)
> > I'm unsure what the purpose of the low(), high(), and n_values() methods
> > were before.  Perhaps using a function to access those values served some
> > purpose I'm unaware of?
> 
> They are usage from outside of this class, e.g. in gen-histogram-data.py
> (which uses the generated Histogram info and generates C++ header data from
> it).

Ah, so Histogram variables like _low are private outside of histogram_tools.py without a method definition.  That's good to know.
Comment on attachment 8720829 [details] [diff] [review]
Convert string values to integer and boolean values

Review of attachment 8720829 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Building locally it seems like the strings in Histograms.json can be of both "str" and "unicode" type.
Lets handle that... or if the string handling is too much side-tracking for you i'm happy to move that to a follow-up bug.

::: toolkit/components/telemetry/histogram_tools.py
@@ +208,5 @@
>              and not isinstance(definition['alert_emails'], list)):
>              raise KeyError, 'alert_emails must be an array if present (in Histogram %s)' % name
>  
> +        Histogram.check_name(name)
> +        Histogram.check_expression_types(name, definition)

I'd suggest to name this "check_field_types".
"check_name" can stay separate, that seems fine :)

@@ +251,5 @@
> +                "n_values": int,
> +                "low": int,
> +                "high": int,
> +                "keyed": bool,
> +                "expires_in_version": str,

Building locally, the string fields can be either "unicode" or "str" for me.
Can we just check for either here?

@@ +258,5 @@
> +                "cpp_guard": str,
> +                "releaseChannelCollection": str
> +            }
> +        for key, key_type in type_checked_fields.iteritems():
> +            value = definition.get(key)

Nit: we don't need to .get() the value before the check on the next line.

@@ +263,5 @@
> +            if not key in definition:
> +                continue
> +            if not type(value) is key_type:
> +                raise ValueError, ('value for key "{0}" in Histogram "{1}" '
> +                        'should be {2}').format(key, name, key_type)

Nit: This prints e.g. "... should be <type 'str'>".
Let's use key_type.__name__ to make it print like "... should be of type 'str'".
Attachment #8720829 - Flags: review?(gfritzsche) → feedback+
- Add compile time type checks on Histograms.json
- Remove string to int/bool coercion for 4 Histogram keys:
  - n_buckets
  - n_values
  - low
  - high
  - keyed
- Remove check_numeric (redundant with check_expression_types)
Attachment #8720829 - Attachment is obsolete: true
Comment on attachment 8721311 [details] [diff] [review]
Convert string values to integer and boolean values

Changes made--str/unicode checking is specific to Python 2.

I like check_field_types as a name as well.
Attachment #8721311 - Flags: review?(gfritzsche)
Comment on attachment 8721311 [details] [diff] [review]
Convert string values to integer and boolean values

Review of attachment 8721311 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
We need one change to cover an error i see building locally, then i can get this landed.

::: toolkit/components/telemetry/histogram_tools.py
@@ +261,5 @@
> +        for key, key_type in type_checked_fields.iteritems():
> +            if not key in definition:
> +                continue
> +            if not isinstance(definition[key], key_type):
> +                type_name = "str" if key_type is basestring else key_type.__name

That should be `key_type.__name__`.
Lets make it "string" to be readable - people edit JSON, not Python.
Attachment #8721311 - Flags: review?(gfritzsche) → feedback+
- Add compile time type checks on Histograms.json
- Remove string to int/bool coercion for 4 Histogram keys:
  - n_buckets
  - n_values
  - low
  - high
  - keyed
- Remove check_numeric (redundant with check_expression_types)
Attachment #8721311 - Attachment is obsolete: true
Comment on attachment 8721358 [details] [diff] [review]
Convert string values to integer and boolean values

Woops, that was a dumb mistake with __name__.  Updated the error message as well.
Attachment #8721358 - Flags: review?(gfritzsche)
Comment on attachment 8721358 [details] [diff] [review]
Convert string values to integer and boolean values

Review of attachment 8721358 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this looks good :)
I ran a quick JS scratchpad on comparing the Histograms.json data before and after the patch, so that plus the new type checking makes me pretty confident about the changes.
Attachment #8721358 - Flags: review?(gfritzsche) → review+
https://hg.mozilla.org/mozilla-central/rev/b35e412b5449
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Depends on: 1263189
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: