Closed Bug 1353678 Opened 9 years ago Closed 9 years ago

Make duplicate Histograms.json keys build errors

Categories

(Toolkit :: Telemetry, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: gfritzsche, Assigned: djmdeveloper060796, Mentored)

Details

(Whiteboard: [measurement:client] [lang=python] [good next bug])

Attachments

(1 file, 2 obsolete files)

Per bug 1353666, it is possible to have duplicate keys in Histograms.json. We should add checking to histogram_tools.py [1] that makes this an error at build-time. We parse the file at [2], where json.load() already returns a dict. Either we can make json.load() be stricter or we need to add other checking methods. 1: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/histogram_tools.py 2: https://dxr.mozilla.org/mozilla-central/rev/81e37ef1360ba4505726ddf542ebdcc952a57578/toolkit/components/telemetry/histogram_tools.py#435
@Georg Hi I would like to work on this. How should I proceed?
Flags: needinfo?(gfritzsche)
You can test the behavior by adding duplicate keys like the ones from bug 1353666 to Histograms.json. A quick web search finds suggestions like this to deal with duplicate keys: http://stackoverflow.com/questions/14902299/json-loads-allows-duplicate-keys-in-a-dictionary-overwriting-the-first-value From there raise an exception, ideally with the offending histogram names and property keys.
Assignee: nobody → djmdeveloper060796
Flags: needinfo?(gfritzsche)
Attached patch bug1353678.patch (obsolete) — Splinter Review
Added check for duplicate keys.
Attachment #8856343 - Flags: review?(gfritzsche)
Comment on attachment 8856343 [details] [diff] [review] bug1353678.patch Review of attachment 8856343 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the patch. This looks close already, but i think we can actually do this more efficiently. ::: toolkit/components/telemetry/histogram_tools.py @@ +429,5 @@ > > > +# The json being parsed should not contain duplicate keys. This function > +# checks for duplicate keys.If duplicate key is present it raises a > +# ValueError mentioning the duplicate key Nits: - "JSON" - We can describe this a bit shorter. Together with the change below e.g.: "This hook function loads the histograms into an OrderedDict. It will raise a ValueError if duplicate keys are found." @@ +434,5 @@ > +def dict_raise_on_duplicates(ordered_pairs): > + d = {} > + for key, value in ordered_pairs: > + if key in d: > + raise ValueError("duplicate key: %r" % (key,)) No need for "%r", we only expect string keys. "Found duplicate key in Histograms file: %s" @@ +448,5 @@ > def from_Histograms_json(filename): > + # Checking for duplicate keys > + with open(filename, 'r') as f: > + try: > + histograms = json.load(f, object_pairs_hook=dict_raise_on_duplicates) Thinking about this, this will load the file from disk two times, which can impact build time. Let's instead: - change the object_pairs_hook below to use the new helper function - return an OrderDict from the helper function - rename the helper function to something more suitable like "load_histograms_into_dict"
Attachment #8856343 - Flags: review?(gfritzsche) → feedback+
Attached patch bug1353678.patch (obsolete) — Splinter Review
I did the changes you mentioned above.
Attachment #8856343 - Attachment is obsolete: true
Attachment #8856541 - Flags: review?(gfritzsche)
Comment on attachment 8856541 [details] [diff] [review] bug1353678.patch Review of attachment 8856541 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this looks good! I only have two small change requests below. ::: toolkit/components/telemetry/histogram_tools.py @@ +434,5 @@ > +def load_histograms_into_dict(ordered_pairs): > + d = collections.OrderedDict() > + for key, value in ordered_pairs: > + if key in d: > + raise ValueError("Found duplicate key in Histograms file: %s" % (key,)) Nit: You don't need the trailing comma here. Just use e.g.: > ("..." % key) @@ +436,5 @@ > + for key, value in ordered_pairs: > + if key in d: > + raise ValueError("Found duplicate key in Histograms file: %s" % (key,)) > + else: > + d[key] = value We don't need an `else:` here - if the above condition is true we always throw.
Attachment #8856541 - Flags: review?(gfritzsche) → feedback+
I also saw Python linting errors: > <...>/toolkit/components/telemetry/histogram_tools.py > 432:1 error too many blank lines (3) E303 (flake8) > 434:1 error expected 2 blank lines, found 3 E302 (flake8) Please run this on the code: > ./mach lint -l flake8 toolkit/components/telemetry
Attached patch bug1353678.patchSplinter Review
@George did the changes you mentioned. Lint tests are passing.
Attachment #8856541 - Attachment is obsolete: true
Attachment #8857086 - Flags: review?(gfritzsche)
Comment on attachment 8857086 [details] [diff] [review] bug1353678.patch Review of attachment 8857086 [details] [diff] [review]: ----------------------------------------------------------------- Thanks!
Attachment #8857086 - Flags: review?(gfritzsche) → review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0c63f689efc7 Made duplicate Histograms.json keys build errors. r=gfritzsche
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: