Closed Bug 781531 Opened 8 years ago Closed 8 years ago

generate TelemetryHistograms.h from JSON

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: froydnj, Assigned: froydnj)

References

Details

Attachments

(8 files, 4 obsolete files)

Attached file strawman Histograms.json file (obsolete) —
We have a couple bugs open (bug 748417, bug 753447, bug 778213) that would be somewhat easier to pull off if our histogram data were stored in a real format, rather than something shoved through the preprocessor.

Attached is a strawman proposal for what the data format would look like.  The only thing I'm not sure about is that the cpp_guard seems to easy to leave out, leading to unintentional bustage somewhere along the way.
Attachment #650557 - Flags: feedback?(taras.mozilla)
Attachment #650557 - Attachment mime type: application/json → text/plain
Comment on attachment 650557 [details]
strawman Histograms.json file

Yeah I agree with the general idea
Attachment #650557 - Flags: feedback?(taras.mozilla) → feedback+
Depends on: 782373
The rather large patch that removes the old stuff and adds the new stuff.

Will attach conversion scripts in a moment.
Attachment #650557 - Attachment is obsolete: true
Attachment #652839 - Flags: review?(taras.mozilla)
This is the Python conversion script.  It requires a shell wrapper for dealing with various bits of cpp and whatnot that I will attach shortly.
Attached file shell portion of the conversion script (obsolete) —
...and the promised shell part.  Put convert-to-json.{py,sh} in the same directory and run:

./convert-to-json.sh /path/to/TelemetryHistograms.h /path/to/output/json/file

The script should work just fine on any TelemetryHistograms.h from mozilla-central as of today; past revisions may require a bit of tweaking.
Adding Harsha to point him at the scripts.
Thanks. Just to confirm if i checkout mozilla-central I don't need to apply your patch to TelemetryHistogram.h ?
(In reply to schintalapani from comment #6)
> Thanks. Just to confirm if i checkout mozilla-central I don't need to apply
> your patch to TelemetryHistogram.h ?

That's correct.  Though if you checkout mozilla-central and apply the patch, you shouldn't need to use the conversion scripts, since the patch includes toolkit/components/telemetry/Histograms.json.  You can use that json file directly.
Comment on attachment 652839 [details] [diff] [review]
convert TelemetryHistograms.h to JSON

gen-histogram-data.py needs license boilerplate.
There no useful comments in the file. Having a general description near top would be handy.
Also need description various gotchas, ie distinction between 'BOOLEAN' and 'boolean'


For a followup bug:
I'm not sure a single histogram json is a great idea. I think we should break .jsons into a directory of their own and aggregate them based on various build flags.

For example,
cpp_guard stuff could just live on filename level, ala histograms.XP_WIN.json
We could still autogenerate networking histograms via the build system.
We can also better group related histograms in own files, do channel-specific histograms, etc.
Attachment #652839 - Flags: review?(taras.mozilla) → review-
(In reply to Taras Glek (:taras) from comment #8)
> gen-histogram-data.py needs license boilerplate.

Ah, yes, license boilerplate.  Forgot about that.

> There no useful comments in the file. Having a general description near top
> would be handy.

I refuse to break convention with the many millions of lines of source code already in mozilla-central by addressing this concern. ;)

> Also need description various gotchas, ie distinction between 'BOOLEAN' and
> 'boolean'

That particular distinction will go away once all the appropriate histograms are converted to use HISTOGRAM_BOOLEAN.

> For a followup bug:
> I'm not sure a single histogram json is a great idea. I think we should
> break .jsons into a directory of their own and aggregate them based on
> various build flags.
> 
> For example,
> cpp_guard stuff could just live on filename level, ala histograms.XP_WIN.json
> We could still autogenerate networking histograms via the build system.
> We can also better group related histograms in own files, do
> channel-specific histograms, etc.

I'd have to think about this.  It seems like it would be useful to have as little build knowledge as necessary so that any processing of the JSON can be done without having to perform a full build.  Splitting things into individual files wouldn't necessarily go against that, but autogeneration via the build system would.  I suppose you could provide pushbutton scripts for such purposes, to be used via the build system and any other interested parties.
Comment on attachment 652839 [details] [diff] [review]
convert TelemetryHistograms.h to JSON

Please separate the mechanical .h->.json translation into own patch in future patches.

I think we should factor out
+    with open(filename, 'r') as f:
+        histograms = json.load(f, object_pairs_hook=json.OrderedDict)
...
+        for (name, definition) in histograms.iteritems():
Into own python module even if it's a trivial amount of code.

This will make it easier to evolve our json functionality faster.

+if __name__ == '__main__':
+    main(sys.argv[1:])

Why not just do main(sys.argv[1])?

Overall this looks ok.
Attachment #652844 - Attachment mime type: application/x-sh → text/plain
Attachment #652840 - Attachment mime type: text/x-python → text/plain
Attached file reference.json
Hi,
  Please take a look at the reference.json and confirm if its the right file. I looked at the older reference.json its completely different from this one so confused.
Thanks,
Harsha
Attachment #654027 - Attachment mime type: application/json → text/plain
Attached file Histograms.json
(In reply to schintalapani from comment #11)
>   Please take a look at the reference.json and confirm if its the right
> file. I looked at the older reference.json its completely different from
> this one so confused.

I'm not sure what you mean by "older reference.json"; the file I'm attaching is the version I've submitted in my patches and will, with minor modifications, be committed to the repository once the surrounding infrastructure is approved.

Note that it has many more histograms than the reference.json you submitted; how did you generate your reference.json?
I checked out mozilla-central repo and used TelemetryHistograms.h and convert-to-json.sh to generate the reference.json
(In reply to schintalapani from comment #13)
> I checked out mozilla-central repo and used TelemetryHistograms.h and
> convert-to-json.sh to generate the reference.json

What platform are you using?
OS X 10.8
The TelemetryHistograms.h to JSON bit of the patch, separated out for ease of review.
Attachment #654344 - Flags: review?(taras.mozilla)
The rest of the story.  License boilerplate added: do we need a header on Histograms.json?  That's going to complicate parsing...
Attachment #654345 - Flags: review?(taras.mozilla)
Attached file shell portion of the conversion script (obsolete) —
(In reply to schintalapani from comment #15)
> OS X 10.8

Hum, that's strange.  Maybe the sed on your machine doesn't work the same as sed on my Linux box.  Could you try the attached script, please?
Attachment #652844 - Attachment is obsolete: true
I am getting this on OS X 
sed: 1: "/^\#define HISTOGRAM_/, ...": expected context address
I ran it in archlinux it works and matches your json output.
Thanks,
Harsha
Attachment #654346 - Attachment is obsolete: true
This is the older version of json I am talking about and there is no "buckets" integer array element in this new version.
(In reply to schintalapani from comment #21)
> This is the older version of json I am talking about and there is no
> "buckets" integer array element in this new version.

Ah, we are talking past each other, then.

The JSON I am attaching to this bug is merely a description of the essential bits of histograms.  The JSON you want is that generated by the scripts attached to bug 748417.  I don't believe it's particularly difficult to get from this bug's JSON description to the JSON you want, but the code to do that has not yet been written. :)
(In reply to Nathan Froyd (:froydnj) from comment #22)
> attached to bug 748417.  I don't believe it's particularly difficult to get
> from this bug's JSON description to the JSON you want, but the code to do
> that has not yet been written. :)

By written, Nathan meant cut'n'pasted :)

(In reply to Nathan Froyd (:froydnj) from comment #17)
> Created attachment 654345 [details] [diff] [review]
> Histograms.json processing machinery
> 
> The rest of the story.  License boilerplate added: do we need a header on
> Histograms.json?  That's going to complicate parsing...

It's a datafile, lets skip boilerplate because format does not allow it.
Comment on attachment 654344 [details] [diff] [review]
TelemetryHistograms.h -> JSON patch part

Thanks for factoring this out.
Attachment #654344 - Flags: review?(taras.mozilla) → review+
Comment on attachment 654345 [details] [diff] [review]
Histograms.json processing machinery

+histograms_file := $(srcdir)/Histograms.json
+histogram_enum_file := TelemetryHistogramEnums.h
+histogram_data_file := TelemetryHistogramData.inc

lowercase LHS seems to be pretty uncommon in our makefiles

+def allowed_keys(*keylist):
+    always_allowed = ['kind', 'description', 'cpp_guard']
+    return list(keylist) + always_allowed
For sanity, please rename the function to get_allowed_keys() so we aren't mixing a variable a function of the same name within the same func. I'm not even sure what the value of this function is.


+    with open(filename, 'r') as f:
+        histograms = json.load(f, object_pairs_hook=json.OrderedDict)
+        return histograms.iteritems()

This may be better with yield. but matter of taste really.


+if __name__ == '__main__':
+    main(sys.argv[1:])

Why not just do main(sys.argv[1])?(this is a holdover from last review)
(In reply to Taras Glek (:taras) from comment #24)
> Comment on attachment 654344 [details] [diff] [review]
> TelemetryHistograms.h -> JSON patch part
> 
> Thanks for factoring this out.

Note I do not want to land this with "BOOLEAN". I do not want metrics to deal with that bogus distinction
(In reply to Taras Glek (:taras) from comment #25)
> +histograms_file := $(srcdir)/Histograms.json
> +histogram_enum_file := TelemetryHistogramEnums.h
> +histogram_data_file := TelemetryHistogramData.inc
> 
> lowercase LHS seems to be pretty uncommon in our makefiles

The new hotness is that uppercase names are for build-system special variables and lowercase names are for makefile-local things.

> +def allowed_keys(*keylist):
> +    always_allowed = ['kind', 'description', 'cpp_guard']
> +    return list(keylist) + always_allowed
> For sanity, please rename the function to get_allowed_keys() so we aren't
> mixing a variable a function of the same name within the same func. I'm not
> even sure what the value of this function is.

I can just remove it.

> +    with open(filename, 'r') as f:
> +        histograms = json.load(f, object_pairs_hook=json.OrderedDict)
> +        return histograms.iteritems()
> 
> This may be better with yield. but matter of taste really.

as in:

  for item in histograms.iteritems():
    yield item

?  I don't care either way (it'll be changed to yield once other refactoring goes in).
 
> +if __name__ == '__main__':
> +    main(sys.argv[1:])
> 
> Why not just do main(sys.argv[1])?(this is a holdover from last review)

It is The Python Way.  Particularly useful if you're going to use a given file as a module, but you also want it to act as a script (testing, etc.).  I don't particularly care either way.
Comment on attachment 654345 [details] [diff] [review]
Histograms.json processing machinery

Other than minor nits I mentioned, this looks good. 
I'd like glandium to review the makefile bits.
Attachment #654345 - Flags: review?(taras.mozilla)
Attachment #654345 - Flags: review?(mh+mozilla)
Attachment #654345 - Flags: review+
(In reply to Taras Glek (:taras) from comment #26)
> > TelemetryHistograms.h -> JSON patch part
>
> Note I do not want to land this with "BOOLEAN". I do not want metrics to
> deal with that bogus distinction

Here's a version of the script which handles boolean histograms that get defined with the HISTOGRAM macro as opposed to the HISTOGRAM_BOOLEAN macro.

It also makes attempts to coerce expressions used for the upper bounds of histograms to numbers; doing this means that we can generate bucket ranges for a larger number of histograms than we could before.  I suppose we could just stick the coercion into the reading library; I am ambivalent about this one way or the other.
Attachment #652840 - Attachment is obsolete: true
Comment on attachment 654345 [details] [diff] [review]
Histograms.json processing machinery

Review of attachment 654345 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Makefile.in
@@ +71,5 @@
> +	$(PYTHON) $(srcdir)/gen-histogram-data.py $< > $@
> +
> +Telemetry.$(OBJ_SUFFIX): $(histogram_data_file)
> +
> +GARBAGE += $(histogram_data_file)

You want to remove histogram_enum_file too.
Attachment #654345 - Flags: review?(mh+mozilla) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/b4316e1c474d
Assignee: nobody → nfroyd
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/b4316e1c474d
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in before you can comment on or make changes to this bug.