bugzilla.mozilla.org will be intermittently unavailable on Saturday, March 24th, from 16:00 until 20:00 UTC.

Make duplicate Histograms.json keys build errors

RESOLVED FIXED in Firefox 55



a year ago
a year ago


(Reporter: gfritzsche, Assigned: Deepjyoti Mondal, Mentored)



Firefox Tracking Flags

(firefox55 fixed)


(Whiteboard: [measurement:client] [lang=python] [good next bug])


(1 attachment, 2 obsolete attachments)

Per bug 1353666, it is possible to have duplicate keys in Histograms.json.

We should add checking to histogram_tools.py [1] that makes this an error at build-time.
We parse the file at [2], where json.load() already returns a dict.
Either we can make json.load() be stricter or we need to add other checking methods.

1: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/histogram_tools.py
2: https://dxr.mozilla.org/mozilla-central/rev/81e37ef1360ba4505726ddf542ebdcc952a57578/toolkit/components/telemetry/histogram_tools.py#435

Comment 1

a year ago
@Georg Hi I would like to work on this. How should I proceed?
Flags: needinfo?(gfritzsche)
You can test the behavior by adding duplicate keys like the ones from bug 1353666 to Histograms.json.

A quick web search finds suggestions like this to deal with duplicate keys:

From there raise an exception, ideally with the offending histogram names and property keys.
Assignee: nobody → djmdeveloper060796
Flags: needinfo?(gfritzsche)

Comment 3

a year ago
Created attachment 8856343 [details] [diff] [review]

Added check for duplicate keys.
Attachment #8856343 - Flags: review?(gfritzsche)
Comment on attachment 8856343 [details] [diff] [review]

Review of attachment 8856343 [details] [diff] [review]:

Thanks for the patch.
This looks close already, but i think we can actually do this more efficiently.

::: toolkit/components/telemetry/histogram_tools.py
@@ +429,5 @@
> +# The json being parsed should not contain duplicate keys. This function
> +# checks for duplicate keys.If duplicate key is present it raises a
> +# ValueError mentioning the duplicate key

- "JSON"
- We can describe this a bit shorter. Together with the change below e.g.:
  "This hook function loads the histograms into an OrderedDict.
  It will raise a ValueError if duplicate keys are found."

@@ +434,5 @@
> +def dict_raise_on_duplicates(ordered_pairs):
> +    d = {}
> +    for key, value in ordered_pairs:
> +        if key in d:
> +            raise ValueError("duplicate key: %r" % (key,))

No need for "%r", we only expect string keys.
"Found duplicate key in Histograms file: %s"

@@ +448,5 @@
>  def from_Histograms_json(filename):
> +    # Checking for duplicate keys
> +    with open(filename, 'r') as f:
> +        try:
> +            histograms = json.load(f, object_pairs_hook=dict_raise_on_duplicates)

Thinking about this, this will load the file from disk two times, which can impact build time.
Let's instead:
- change the object_pairs_hook below to use the new helper function
- return an OrderDict from the helper function
- rename the helper function to something more suitable like "load_histograms_into_dict"
Attachment #8856343 - Flags: review?(gfritzsche) → feedback+

Comment 5

a year ago
Created attachment 8856541 [details] [diff] [review]

I did the changes you mentioned above.
Attachment #8856343 - Attachment is obsolete: true
Attachment #8856541 - Flags: review?(gfritzsche)
Comment on attachment 8856541 [details] [diff] [review]

Review of attachment 8856541 [details] [diff] [review]:

Thanks, this looks good! I only have two small change requests below.

::: toolkit/components/telemetry/histogram_tools.py
@@ +434,5 @@
> +def load_histograms_into_dict(ordered_pairs):
> +    d = collections.OrderedDict()
> +    for key, value in ordered_pairs:
> +        if key in d:
> +            raise ValueError("Found duplicate key in Histograms file: %s" % (key,))

Nit: You don't need the trailing comma here. Just use e.g.:
> ("..." % key)

@@ +436,5 @@
> +    for key, value in ordered_pairs:
> +        if key in d:
> +            raise ValueError("Found duplicate key in Histograms file: %s" % (key,))
> +        else:
> +            d[key] = value

We don't need an `else:` here - if the above condition is true we always throw.
Attachment #8856541 - Flags: review?(gfritzsche) → feedback+
I also saw Python linting errors:
> <...>/toolkit/components/telemetry/histogram_tools.py
>   432:1  error  too many blank lines (3)         E303 (flake8)
>   434:1  error  expected 2 blank lines, found 3  E302 (flake8)

Please run this on the code:
> ./mach lint -l flake8 toolkit/components/telemetry

Comment 8

a year ago
Created attachment 8857086 [details] [diff] [review]

@George did the changes you mentioned. Lint tests are passing.
Attachment #8856541 - Attachment is obsolete: true
Attachment #8857086 - Flags: review?(gfritzsche)
Comment on attachment 8857086 [details] [diff] [review]

Review of attachment 8857086 [details] [diff] [review]:

Attachment #8857086 - Flags: review?(gfritzsche) → review+
Keywords: checkin-needed

Comment 10

a year ago
Pushed by cbook@mozilla.com:
Made duplicate Histograms.json keys build errors. r=gfritzsche
Keywords: checkin-needed

Comment 11

a year ago
Last Resolved: a year ago
status-firefox55: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.