Closed
Bug 1245910
Opened 8 years ago
Closed 8 years ago
Histograms.json entries contain strings or expressions instead of number or booleans
Categories
(Toolkit :: Telemetry, defect, P4)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla47
Tracking | Status | |
---|---|---|
firefox47 | --- | fixed |
People
(Reporter: gfritzsche, Assigned: hu.eric, Mentored)
References
Details
(Whiteboard: [measurement:client] [lang=python])
Attachments
(1 file, 3 obsolete files)
272.25 KB,
patch
|
gfritzsche
:
review+
|
Details | Diff | Splinter Review |
In Histograms.json, multiple fields contain string or expression values when they should just be numbers/booleans: https://dxr.mozilla.org/mozilla-central/search?q=path%3AHistograms.json+regexp%3A%28\%22n_buckets\%22%29|%28\%22n_values\%22%29|%28\%22low\%22%29|%28\%22high\%22%29|%28\%22keyed\%22%29%3A\W\%22&redirect=true&case=false We should fix this in this bug. Then we can make the script (histogram_tools.py) that uses them enforce types or apply a schema.
Reporter | ||
Comment 1•8 years ago
|
||
We can also get rid of the awkward value coercion code here: https://dxr.mozilla.org/mozilla-central/rev/2dfb45d74f42d2a0010696f5fd47c7a7da94cedb/toolkit/components/telemetry/histogram_tools.py#262
Reporter | ||
Comment 2•8 years ago
|
||
Eric, would you be interested in taking this? After bug 920169 we can actually convert the remaining values to numbers and make histogram_tools.py enforce them to be numbers when parsing the histogram descriptions. Anything that is not a number should print a descriptive error and stop the build, similar to how the bug_numbers check works: https://dxr.mozilla.org/mozilla-central/rev/576a6dcde5b68c2ea45324ed5ce1dabb7d833d09/toolkit/components/telemetry/histogram_tools.py#253
Mentor: gfritzsche
Flags: needinfo?(pineapple.rice)
Whiteboard: [measurement:client] → [measurement:client] [lang=python]
If I understand the description correctly, the search above is a superset of the Histograms.json fields to be changed. It includes some fields that have numbers as values. If that's correct so far, then this slightly modified search contains the only the fields to be changed, yes? (Ignoring the string-value expressions that were changed in bug 920169). https://dxr.mozilla.org/mozilla-central/search?q=path%3AHistograms.json+regexp%3A((%5C%22n_buckets%5C%22)%7C(%5C%22n_values%5C%22)%7C(%5C%22low%5C%22)%7C(%5C%22high%5C%22)%7C(%5C%22keyed%5C%22))%3A%5CW*%22&redirect=false&case=false
Reporter | ||
Comment 5•8 years ago
|
||
(In reply to Eric Hu from comment #4) > If I understand the description correctly, the search above is a superset of > the Histograms.json fields to be changed. It includes some fields that have > numbers as values. Right, your regex seems to cover this better. FWIW, once we enforce the number types in the Python script, we should catch anything we missed when the script runs as part of the build.
- Add compile time type checks on Histograms.json - Remove string to int/bool coercion for 4 Histogram keys: - n_buckets - n_values - low - high - keyed
Comment on attachment 8720334 [details] [diff] [review] Convert string values to integer and boolean values I wasn't sure if it would be worth inserting comments in Histograms.json to show the original form of the computed values (i.e. 64 * 1024). What are your thoughts on this? I'm unsure what the purpose of the low(), high(), and n_values() methods were before. Perhaps using a function to access those values served some purpose I'm unaware of?
Attachment #8720334 -
Flags: review?(gfritzsche)
Comment 8•8 years ago
|
||
Comments aren't supported in JSON. It's one of the longstanding complaints about the format, actually.
Reporter | ||
Comment 9•8 years ago
|
||
Comment on attachment 8720334 [details] [diff] [review] Convert string values to integer and boolean values Review of attachment 8720334 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this is looking good. There is only one issue below, the rest are nits or requests. ::: toolkit/components/telemetry/histogram_tools.py @@ +220,5 @@ > raise KeyError, 'alert_emails must be an array if present (in Histogram %s)' % name > > Histogram.check_expiration(name, definition) > Histogram.check_bug_numbers(name, definition) > + Histogram.check_expression_types(name, definition) This is a nice general approach and simplifies the assumptions in this whole grown script. Let's do this check first before the other two here. We also can remove the `check_numeric_limits()` helper now. Can you please move the `check_name()` call over here too while we're at it? @@ +259,5 @@ > + "n_buckets": int, > + "n_values": int, > + "low": int, > + "high": int, > + "keyed": bool Would you mind just adding the string type fields here too? That's: expires_in_version, kind, description, cpp_guard, releaseChannelCollection (coming from https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Declaring_a_Histogram) @@ +261,5 @@ > + "low": int, > + "high": int, > + "keyed": bool > + } > + for key in type_checked_fields: for key, type in field_types.iteritems(): ... @@ +264,5 @@ > + } > + for key in type_checked_fields: > + value = definition.get(key) > + key_type = type_checked_fields[key] > + if not value: Nit: This will not behave right for falsy values. Instead do: if not key in definition: continue ... @@ +268,5 @@ > + if not value: > + continue > + if not type(value) is key_type: > + raise ValueError, ('value for key "{0}" in Histogram "{1}" ' > + 'should be a {2}').format(key, name, key_type) Minor nit: Just "should be {2}"? This avoids e.g. "should be a int".
Attachment #8720334 -
Flags: review?(gfritzsche) → feedback+
Reporter | ||
Comment 10•8 years ago
|
||
(In reply to Eric Hu from comment #7) > I'm unsure what the purpose of the low(), high(), and n_values() methods > were before. Perhaps using a function to access those values served some > purpose I'm unaware of? They are usage from outside of this class, e.g. in gen-histogram-data.py (which uses the generated Histogram info and generates C++ header data from it).
Assignee | ||
Comment 11•8 years ago
|
||
- Add compile time type checks on Histograms.json - Remove string to int/bool coercion for 4 Histogram keys: - n_buckets - n_values - low - high - keyed - Remove check_numeric (redundant with check_expression_types)
Attachment #8720334 -
Attachment is obsolete: true
Assignee | ||
Comment 12•8 years ago
|
||
Comment on attachment 8720829 [details] [diff] [review] Convert string values to integer and boolean values Thanks for the reminder about JSON comments, Chris. I always forget about that. That rules that change out. I made most of your suggested changes, Georg. I moved check_name into verify_attributes. I couldn't come up with a new name for `check_expression_types` that included the responsibility of `check_name`. What are your thoughts?
Attachment #8720829 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #10) > (In reply to Eric Hu from comment #7) > > I'm unsure what the purpose of the low(), high(), and n_values() methods > > were before. Perhaps using a function to access those values served some > > purpose I'm unaware of? > > They are usage from outside of this class, e.g. in gen-histogram-data.py > (which uses the generated Histogram info and generates C++ header data from > it). Ah, so Histogram variables like _low are private outside of histogram_tools.py without a method definition. That's good to know.
Reporter | ||
Comment 14•8 years ago
|
||
Comment on attachment 8720829 [details] [diff] [review] Convert string values to integer and boolean values Review of attachment 8720829 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! Building locally it seems like the strings in Histograms.json can be of both "str" and "unicode" type. Lets handle that... or if the string handling is too much side-tracking for you i'm happy to move that to a follow-up bug. ::: toolkit/components/telemetry/histogram_tools.py @@ +208,5 @@ > and not isinstance(definition['alert_emails'], list)): > raise KeyError, 'alert_emails must be an array if present (in Histogram %s)' % name > > + Histogram.check_name(name) > + Histogram.check_expression_types(name, definition) I'd suggest to name this "check_field_types". "check_name" can stay separate, that seems fine :) @@ +251,5 @@ > + "n_values": int, > + "low": int, > + "high": int, > + "keyed": bool, > + "expires_in_version": str, Building locally, the string fields can be either "unicode" or "str" for me. Can we just check for either here? @@ +258,5 @@ > + "cpp_guard": str, > + "releaseChannelCollection": str > + } > + for key, key_type in type_checked_fields.iteritems(): > + value = definition.get(key) Nit: we don't need to .get() the value before the check on the next line. @@ +263,5 @@ > + if not key in definition: > + continue > + if not type(value) is key_type: > + raise ValueError, ('value for key "{0}" in Histogram "{1}" ' > + 'should be {2}').format(key, name, key_type) Nit: This prints e.g. "... should be <type 'str'>". Let's use key_type.__name__ to make it print like "... should be of type 'str'".
Attachment #8720829 -
Flags: review?(gfritzsche) → feedback+
Assignee | ||
Comment 15•8 years ago
|
||
- Add compile time type checks on Histograms.json - Remove string to int/bool coercion for 4 Histogram keys: - n_buckets - n_values - low - high - keyed - Remove check_numeric (redundant with check_expression_types)
Attachment #8720829 -
Attachment is obsolete: true
Assignee | ||
Comment 16•8 years ago
|
||
Comment on attachment 8721311 [details] [diff] [review] Convert string values to integer and boolean values Changes made--str/unicode checking is specific to Python 2. I like check_field_types as a name as well.
Attachment #8721311 -
Flags: review?(gfritzsche)
Reporter | ||
Comment 17•8 years ago
|
||
Comment on attachment 8721311 [details] [diff] [review] Convert string values to integer and boolean values Review of attachment 8721311 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! We need one change to cover an error i see building locally, then i can get this landed. ::: toolkit/components/telemetry/histogram_tools.py @@ +261,5 @@ > + for key, key_type in type_checked_fields.iteritems(): > + if not key in definition: > + continue > + if not isinstance(definition[key], key_type): > + type_name = "str" if key_type is basestring else key_type.__name That should be `key_type.__name__`. Lets make it "string" to be readable - people edit JSON, not Python.
Attachment #8721311 -
Flags: review?(gfritzsche) → feedback+
Assignee | ||
Comment 18•8 years ago
|
||
- Add compile time type checks on Histograms.json - Remove string to int/bool coercion for 4 Histogram keys: - n_buckets - n_values - low - high - keyed - Remove check_numeric (redundant with check_expression_types)
Attachment #8721311 -
Attachment is obsolete: true
Assignee | ||
Comment 19•8 years ago
|
||
Comment on attachment 8721358 [details] [diff] [review] Convert string values to integer and boolean values Woops, that was a dumb mistake with __name__. Updated the error message as well.
Attachment #8721358 -
Flags: review?(gfritzsche)
Reporter | ||
Comment 20•8 years ago
|
||
Comment on attachment 8721358 [details] [diff] [review] Convert string values to integer and boolean values Review of attachment 8721358 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this looks good :) I ran a quick JS scratchpad on comparing the Histograms.json data before and after the patch, so that plus the new type checking makes me pretty confident about the changes.
Attachment #8721358 -
Flags: review?(gfritzsche) → review+
Comment 22•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b35e412b5449
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in
before you can comment on or make changes to this bug.
Description
•