Last Comment Bug 781531 - generate TelemetryHistograms.h from JSON
: generate TelemetryHistograms.h from JSON
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Telemetry (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla17
Assigned To: Nathan Froyd [:froydnj]
: telemetry
: Georg Fritzsche [:gfritzsche]
Mentors:
Depends on: 782373 790615
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-09 07:56 PDT by Nathan Froyd [:froydnj]
Modified: 2012-09-12 09:22 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
strawman Histograms.json file (1.04 KB, text/plain)
2012-08-09 07:56 PDT, Nathan Froyd [:froydnj]
taras.mozilla: feedback+
Details
convert TelemetryHistograms.h to JSON (115.87 KB, patch)
2012-08-17 11:20 PDT, Nathan Froyd [:froydnj]
taras.mozilla: review-
Details | Diff | Splinter Review
python portion of the conversion script (3.46 KB, text/plain)
2012-08-17 11:21 PDT, Nathan Froyd [:froydnj]
no flags Details
shell portion of the conversion script (1.08 KB, text/plain)
2012-08-17 11:23 PDT, Nathan Froyd [:froydnj]
no flags Details
reference.json (37.10 KB, text/plain)
2012-08-21 17:09 PDT, Harsha [:harsha]
no flags Details
Histograms.json (67.40 KB, text/plain)
2012-08-22 10:23 PDT, Nathan Froyd [:froydnj]
no flags Details
TelemetryHistograms.h -> JSON patch part (108.07 KB, patch)
2012-08-22 13:10 PDT, Nathan Froyd [:froydnj]
taras.mozilla: review+
Details | Diff | Splinter Review
Histograms.json processing machinery (11.22 KB, patch)
2012-08-22 13:11 PDT, Nathan Froyd [:froydnj]
taras.mozilla: review+
mh+mozilla: review+
Details | Diff | Splinter Review
shell portion of the conversion script (1.09 KB, text/plain)
2012-08-22 13:15 PDT, Nathan Froyd [:froydnj]
no flags Details
shell portion of the conversion script (1.08 KB, text/plain)
2012-08-22 14:16 PDT, Nathan Froyd [:froydnj]
no flags Details
older version of reference.json (178.43 KB, application/json)
2012-08-22 15:29 PDT, Harsha [:harsha]
no flags Details
python portion of the conversion script (4.07 KB, text/x-python)
2012-08-24 10:34 PDT, Nathan Froyd [:froydnj]
no flags Details

Description Nathan Froyd [:froydnj] 2012-08-09 07:56:42 PDT
Created attachment 650557 [details]
strawman Histograms.json file

We have a couple bugs open (bug 748417, bug 753447, bug 778213) that would be somewhat easier to pull off if our histogram data were stored in a real format, rather than something shoved through the preprocessor.

Attached is a strawman proposal for what the data format would look like.  The only thing I'm not sure about is that the cpp_guard seems to easy to leave out, leading to unintentional bustage somewhere along the way.
Comment 1 (dormant account) 2012-08-10 09:45:14 PDT
Comment on attachment 650557 [details]
strawman Histograms.json file

Yeah I agree with the general idea
Comment 2 Nathan Froyd [:froydnj] 2012-08-17 11:20:15 PDT
Created attachment 652839 [details] [diff] [review]
convert TelemetryHistograms.h to JSON

The rather large patch that removes the old stuff and adds the new stuff.

Will attach conversion scripts in a moment.
Comment 3 Nathan Froyd [:froydnj] 2012-08-17 11:21:20 PDT
Created attachment 652840 [details]
python portion of the conversion script

This is the Python conversion script.  It requires a shell wrapper for dealing with various bits of cpp and whatnot that I will attach shortly.
Comment 4 Nathan Froyd [:froydnj] 2012-08-17 11:23:18 PDT
Created attachment 652844 [details]
shell portion of the conversion script

...and the promised shell part.  Put convert-to-json.{py,sh} in the same directory and run:

./convert-to-json.sh /path/to/TelemetryHistograms.h /path/to/output/json/file

The script should work just fine on any TelemetryHistograms.h from mozilla-central as of today; past revisions may require a bit of tweaking.
Comment 5 Nathan Froyd [:froydnj] 2012-08-17 11:24:55 PDT
Adding Harsha to point him at the scripts.
Comment 6 Harsha [:harsha] 2012-08-17 12:34:40 PDT
Thanks. Just to confirm if i checkout mozilla-central I don't need to apply your patch to TelemetryHistogram.h ?
Comment 7 Nathan Froyd [:froydnj] 2012-08-17 12:42:16 PDT
(In reply to schintalapani from comment #6)
> Thanks. Just to confirm if i checkout mozilla-central I don't need to apply
> your patch to TelemetryHistogram.h ?

That's correct.  Though if you checkout mozilla-central and apply the patch, you shouldn't need to use the conversion scripts, since the patch includes toolkit/components/telemetry/Histograms.json.  You can use that json file directly.
Comment 8 (dormant account) 2012-08-20 13:54:29 PDT
Comment on attachment 652839 [details] [diff] [review]
convert TelemetryHistograms.h to JSON

gen-histogram-data.py needs license boilerplate.
There no useful comments in the file. Having a general description near top would be handy.
Also need description various gotchas, ie distinction between 'BOOLEAN' and 'boolean'


For a followup bug:
I'm not sure a single histogram json is a great idea. I think we should break .jsons into a directory of their own and aggregate them based on various build flags.

For example,
cpp_guard stuff could just live on filename level, ala histograms.XP_WIN.json
We could still autogenerate networking histograms via the build system.
We can also better group related histograms in own files, do channel-specific histograms, etc.
Comment 9 Nathan Froyd [:froydnj] 2012-08-20 14:11:27 PDT
(In reply to Taras Glek (:taras) from comment #8)
> gen-histogram-data.py needs license boilerplate.

Ah, yes, license boilerplate.  Forgot about that.

> There no useful comments in the file. Having a general description near top
> would be handy.

I refuse to break convention with the many millions of lines of source code already in mozilla-central by addressing this concern. ;)

> Also need description various gotchas, ie distinction between 'BOOLEAN' and
> 'boolean'

That particular distinction will go away once all the appropriate histograms are converted to use HISTOGRAM_BOOLEAN.

> For a followup bug:
> I'm not sure a single histogram json is a great idea. I think we should
> break .jsons into a directory of their own and aggregate them based on
> various build flags.
> 
> For example,
> cpp_guard stuff could just live on filename level, ala histograms.XP_WIN.json
> We could still autogenerate networking histograms via the build system.
> We can also better group related histograms in own files, do
> channel-specific histograms, etc.

I'd have to think about this.  It seems like it would be useful to have as little build knowledge as necessary so that any processing of the JSON can be done without having to perform a full build.  Splitting things into individual files wouldn't necessarily go against that, but autogeneration via the build system would.  I suppose you could provide pushbutton scripts for such purposes, to be used via the build system and any other interested parties.
Comment 10 (dormant account) 2012-08-20 14:34:45 PDT
Comment on attachment 652839 [details] [diff] [review]
convert TelemetryHistograms.h to JSON

Please separate the mechanical .h->.json translation into own patch in future patches.

I think we should factor out
+    with open(filename, 'r') as f:
+        histograms = json.load(f, object_pairs_hook=json.OrderedDict)
...
+        for (name, definition) in histograms.iteritems():
Into own python module even if it's a trivial amount of code.

This will make it easier to evolve our json functionality faster.

+if __name__ == '__main__':
+    main(sys.argv[1:])

Why not just do main(sys.argv[1])?

Overall this looks ok.
Comment 11 Harsha [:harsha] 2012-08-21 17:09:46 PDT
Created attachment 654027 [details]
reference.json

Hi,
  Please take a look at the reference.json and confirm if its the right file. I looked at the older reference.json its completely different from this one so confused.
Thanks,
Harsha
Comment 12 Nathan Froyd [:froydnj] 2012-08-22 10:23:25 PDT
Created attachment 654255 [details]
Histograms.json

(In reply to schintalapani from comment #11)
>   Please take a look at the reference.json and confirm if its the right
> file. I looked at the older reference.json its completely different from
> this one so confused.

I'm not sure what you mean by "older reference.json"; the file I'm attaching is the version I've submitted in my patches and will, with minor modifications, be committed to the repository once the surrounding infrastructure is approved.

Note that it has many more histograms than the reference.json you submitted; how did you generate your reference.json?
Comment 13 Harsha [:harsha] 2012-08-22 10:52:55 PDT
I checked out mozilla-central repo and used TelemetryHistograms.h and convert-to-json.sh to generate the reference.json
Comment 14 Nathan Froyd [:froydnj] 2012-08-22 12:17:52 PDT
(In reply to schintalapani from comment #13)
> I checked out mozilla-central repo and used TelemetryHistograms.h and
> convert-to-json.sh to generate the reference.json

What platform are you using?
Comment 15 Harsha [:harsha] 2012-08-22 13:05:59 PDT
OS X 10.8
Comment 16 Nathan Froyd [:froydnj] 2012-08-22 13:10:49 PDT
Created attachment 654344 [details] [diff] [review]
TelemetryHistograms.h -> JSON patch part

The TelemetryHistograms.h to JSON bit of the patch, separated out for ease of review.
Comment 17 Nathan Froyd [:froydnj] 2012-08-22 13:11:55 PDT
Created attachment 654345 [details] [diff] [review]
Histograms.json processing machinery

The rest of the story.  License boilerplate added: do we need a header on Histograms.json?  That's going to complicate parsing...
Comment 18 Nathan Froyd [:froydnj] 2012-08-22 13:15:47 PDT
Created attachment 654346 [details]
shell portion of the conversion script

(In reply to schintalapani from comment #15)
> OS X 10.8

Hum, that's strange.  Maybe the sed on your machine doesn't work the same as sed on my Linux box.  Could you try the attached script, please?
Comment 19 Harsha [:harsha] 2012-08-22 13:52:09 PDT
I am getting this on OS X 
sed: 1: "/^\#define HISTOGRAM_/, ...": expected context address
I ran it in archlinux it works and matches your json output.
Thanks,
Harsha
Comment 20 Nathan Froyd [:froydnj] 2012-08-22 14:16:02 PDT
Created attachment 654363 [details]
shell portion of the conversion script
Comment 21 Harsha [:harsha] 2012-08-22 15:29:31 PDT
Created attachment 654388 [details]
older version of reference.json

This is the older version of json I am talking about and there is no "buckets" integer array element in this new version.
Comment 22 Nathan Froyd [:froydnj] 2012-08-23 09:10:26 PDT
(In reply to schintalapani from comment #21)
> This is the older version of json I am talking about and there is no
> "buckets" integer array element in this new version.

Ah, we are talking past each other, then.

The JSON I am attaching to this bug is merely a description of the essential bits of histograms.  The JSON you want is that generated by the scripts attached to bug 748417.  I don't believe it's particularly difficult to get from this bug's JSON description to the JSON you want, but the code to do that has not yet been written. :)
Comment 23 (dormant account) 2012-08-23 14:08:47 PDT
(In reply to Nathan Froyd (:froydnj) from comment #22)
> attached to bug 748417.  I don't believe it's particularly difficult to get
> from this bug's JSON description to the JSON you want, but the code to do
> that has not yet been written. :)

By written, Nathan meant cut'n'pasted :)

(In reply to Nathan Froyd (:froydnj) from comment #17)
> Created attachment 654345 [details] [diff] [review]
> Histograms.json processing machinery
> 
> The rest of the story.  License boilerplate added: do we need a header on
> Histograms.json?  That's going to complicate parsing...

It's a datafile, lets skip boilerplate because format does not allow it.
Comment 24 (dormant account) 2012-08-23 14:38:08 PDT
Comment on attachment 654344 [details] [diff] [review]
TelemetryHistograms.h -> JSON patch part

Thanks for factoring this out.
Comment 25 (dormant account) 2012-08-23 15:05:15 PDT
Comment on attachment 654345 [details] [diff] [review]
Histograms.json processing machinery

+histograms_file := $(srcdir)/Histograms.json
+histogram_enum_file := TelemetryHistogramEnums.h
+histogram_data_file := TelemetryHistogramData.inc

lowercase LHS seems to be pretty uncommon in our makefiles

+def allowed_keys(*keylist):
+    always_allowed = ['kind', 'description', 'cpp_guard']
+    return list(keylist) + always_allowed
For sanity, please rename the function to get_allowed_keys() so we aren't mixing a variable a function of the same name within the same func. I'm not even sure what the value of this function is.


+    with open(filename, 'r') as f:
+        histograms = json.load(f, object_pairs_hook=json.OrderedDict)
+        return histograms.iteritems()

This may be better with yield. but matter of taste really.


+if __name__ == '__main__':
+    main(sys.argv[1:])

Why not just do main(sys.argv[1])?(this is a holdover from last review)
Comment 26 (dormant account) 2012-08-23 15:19:04 PDT
(In reply to Taras Glek (:taras) from comment #24)
> Comment on attachment 654344 [details] [diff] [review]
> TelemetryHistograms.h -> JSON patch part
> 
> Thanks for factoring this out.

Note I do not want to land this with "BOOLEAN". I do not want metrics to deal with that bogus distinction
Comment 27 Nathan Froyd [:froydnj] 2012-08-23 15:47:07 PDT
(In reply to Taras Glek (:taras) from comment #25)
> +histograms_file := $(srcdir)/Histograms.json
> +histogram_enum_file := TelemetryHistogramEnums.h
> +histogram_data_file := TelemetryHistogramData.inc
> 
> lowercase LHS seems to be pretty uncommon in our makefiles

The new hotness is that uppercase names are for build-system special variables and lowercase names are for makefile-local things.

> +def allowed_keys(*keylist):
> +    always_allowed = ['kind', 'description', 'cpp_guard']
> +    return list(keylist) + always_allowed
> For sanity, please rename the function to get_allowed_keys() so we aren't
> mixing a variable a function of the same name within the same func. I'm not
> even sure what the value of this function is.

I can just remove it.

> +    with open(filename, 'r') as f:
> +        histograms = json.load(f, object_pairs_hook=json.OrderedDict)
> +        return histograms.iteritems()
> 
> This may be better with yield. but matter of taste really.

as in:

  for item in histograms.iteritems():
    yield item

?  I don't care either way (it'll be changed to yield once other refactoring goes in).
 
> +if __name__ == '__main__':
> +    main(sys.argv[1:])
> 
> Why not just do main(sys.argv[1])?(this is a holdover from last review)

It is The Python Way.  Particularly useful if you're going to use a given file as a module, but you also want it to act as a script (testing, etc.).  I don't particularly care either way.
Comment 28 (dormant account) 2012-08-23 15:51:10 PDT
Comment on attachment 654345 [details] [diff] [review]
Histograms.json processing machinery

Other than minor nits I mentioned, this looks good. 
I'd like glandium to review the makefile bits.
Comment 29 Nathan Froyd [:froydnj] 2012-08-24 10:34:53 PDT
Created attachment 655047 [details]
python portion of the conversion script

(In reply to Taras Glek (:taras) from comment #26)
> > TelemetryHistograms.h -> JSON patch part
>
> Note I do not want to land this with "BOOLEAN". I do not want metrics to
> deal with that bogus distinction

Here's a version of the script which handles boolean histograms that get defined with the HISTOGRAM macro as opposed to the HISTOGRAM_BOOLEAN macro.

It also makes attempts to coerce expressions used for the upper bounds of histograms to numbers; doing this means that we can generate bucket ranges for a larger number of histograms than we could before.  I suppose we could just stick the coercion into the reading library; I am ambivalent about this one way or the other.
Comment 30 Mike Hommey [:glandium] 2012-08-24 10:57:53 PDT
Comment on attachment 654345 [details] [diff] [review]
Histograms.json processing machinery

Review of attachment 654345 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Makefile.in
@@ +71,5 @@
> +	$(PYTHON) $(srcdir)/gen-histogram-data.py $< > $@
> +
> +Telemetry.$(OBJ_SUFFIX): $(histogram_data_file)
> +
> +GARBAGE += $(histogram_data_file)

You want to remove histogram_enum_file too.
Comment 32 Ryan VanderMeulen [:RyanVM] 2012-08-24 20:00:57 PDT
https://hg.mozilla.org/mozilla-central/rev/b4316e1c474d

Note You need to log in before you can comment on or make changes to this bug.