Closed Bug 748417 Opened 8 years ago Closed 7 years ago

write python script to convert TelemetryHistograms.h to JSON

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla18

People

(Reporter: taras.mozilla, Assigned: froydnj)

References

Details

Attachments

(5 files, 15 obsolete files)

13.11 KB, patch
taras.mozilla
: review+
Details | Diff | Splinter Review
11.41 KB, patch
taras.mozilla
: review+
Details | Diff | Splinter Review
166.94 KB, text/plain
Details
3.03 KB, patch
taras.mozilla
: review+
Details | Diff | Splinter Review
15.15 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
We need this to validate histograms on serverside
Assignee: nobody → taras.mozilla
If this is going to happen, it would be great if the script could also build gHistograms during the build, so that we didn't have char*s in gHistograms, thereby making things smaller and avoiding relocations.  Maybe a separate bug?
The valid histograms change over time; the plan is to keep this JSON object around for each buildid?
> Maybe a separate bug?

Think so, yes.  :)
Blocks: 748444
Attached patch python wip (obsolete) — Splinter Review
Attached file sample json (obsolete) —
Sample json output. This reports units, axis labels.
I could port c++ logic to report exact bucket #s and replace those with units when available. However before I do that work, xstevens, is that required?

This script also highlighted some telemetry abuse, perhaps we should run it as part of the build as a way to generate nathan's table(This script would become a module for bug 748444)
If anyone has a good idea for a strategy for unit testing python in our code, I'm open to suggestions.
You could just abuse the C preprocessor to generate that output, you know.  (Save for maybe the labels bit, which I didn't look at very closely.)  That would help avoid issues with multi-line HISTOGRAM definitions.

The idl parser has unit testing bits, though I don't know if those bits get run as part of the build.
> That would help avoid issues with multi-line HISTOGRAM definitions.

And with escaped " marks.
I don't think I could. I don't know how to collapse "string""s" with CPP. Python provides an opportunity for extra error/convention checking, parsing string comment for units/labels/etc.
(In reply to Taras Glek (:taras) from comment #9)
> I don't think I could. I don't know how to collapse "string""s" with CPP.
> Python provides an opportunity for extra error/convention checking, parsing
> string comment for units/labels/etc.

Mmm, good point.

Don't forget that you have to handle all the #defines in the .h too.  Maybe filter through CPP to get something consistent to run through Python?
(In reply to Nathan Froyd (:froydnj) from comment #10)
> (In reply to Taras Glek (:taras) from comment #9)
> > I don't think I could. I don't know how to collapse "string""s" with CPP.
> > Python provides an opportunity for extra error/convention checking, parsing
> > string comment for units/labels/etc.
> 
> Mmm, good point.
> 
> Don't forget that you have to handle all the #defines in the .h too.  Maybe
> filter through CPP to get something consistent to run through Python?

That's what the script does...It's the hip new thing, making up for lacking CPP features with python :)
(In reply to Taras Glek (:taras) from comment #11)
> (In reply to Nathan Froyd (:froydnj) from comment #10)
> > Don't forget that you have to handle all the #defines in the .h too.  Maybe
> > filter through CPP to get something consistent to run through Python?
> 
> That's what the script does...It's the hip new thing, making up for lacking
> CPP features with python :)

That's what I get for trying to make intelligent comments when I'm tired.
FYI subprocesss.check_output requires Python 2.7, and that is not a build requirement iirc.  So if you wanted to run an automated test, I think you'd have to do something else.
Attachment #618000 - Attachment is obsolete: true
Attached file sample json (obsolete) —
Attachment #618004 - Attachment is obsolete: true
Attached file script v2 (obsolete) —
This now handles multiple histograms per line.
Attachment #618360 - Attachment is obsolete: true
Just a quick status update on this:

The script requires features that only exist in python 2.7 and higher.  I tried to run it with Python 3, but ran into some different errors that mean it might not be compatible out of the box there.
The metrics infrastructure that we could easily run this script on only have Python 2.6 installed by default, so if we need to run this on our side, we have to figure out whether we try to get a new python version in our environment or whether the script can be ported to work on 2.6.

When run on Python 2.7, it threw this error:
  File "h_to_json.py", line 94, in parse
    success = self.add(*HGRAM_RE.match(line).groups()) and success
AttributeError: 'NoneType' object has no attribute 'groups'

Which is likely just an edge case that needs to be handled with a try/catch and some logging.  The script isn't overly complex, so anyone could productionize it given some time, but we need to figure out who is the best person to get it done based on the priority of the validation feature to other Telemetry and Metrics work.

I'd be really happy if this might be something that could be incorporated into the build environment such that Metrics can just expect the JSON output to arrive in a pre-designated place for us to consume whenever needed.  Lawrence said he'd chat with one of the build guys to get a sense of whether that is doable.
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #17)
> Just a quick status update on this:
> 
> The script requires features that only exist in python 2.7 and higher.  I
> tried to run it with Python 3, but ran into some different errors that mean
> it might not be compatible out of the box there.
> The metrics infrastructure that we could easily run this script on only have
> Python 2.6 installed by default, so if we need to run this on our side, we
> have to figure out whether we try to get a new python version in our
> environment or whether the script can be ported to work on 2.6.
> 
> When run on Python 2.7, it threw this error:
>   File "h_to_json.py", line 94, in parse
>     success = self.add(*HGRAM_RE.match(line).groups()) and success
> AttributeError: 'NoneType' object has no attribute 'groups'
> 
> Which is likely just an edge case that needs to be handled with a try/catch
> and some logging.  The script isn't overly complex, so anyone could
> productionize it given some time, but we need to figure out who is the best
> person to get it done based on the priority of the validation feature to
> other Telemetry and Metrics work.

I wrote this script to spare you guys the indignity of parsing TelemetryHistograms.h all on your own. 

> 
> I'd be really happy if this might be something that could be incorporated
> into the build environment such that Metrics can just expect the JSON output
> to arrive in a pre-designated place for us to consume whenever needed. 
> Lawrence said he'd chat with one of the build guys to get a sense of whether
> that is doable.

The the only way to get this into the build is to get it doing something useful(ie bug 748444). I agree we should move in that direction. However pushing json from our build infrastructure onto metrics sounds like overengineering. Metrics can setup a cronjob to pull the script + telemetryhistograms.h from hg nightly and validate off that. 

We will also need to back away from using newer python features(2.7 ftw). My coding time is limited, I wrote code accordingly so people don't block on me.

In the meantime the script is a near-complete wip and it's a reasonable expectation to have someone complete the script enough to do a few validation runs. mreid has already shown a knack for that. There is no need to block on any infrastructure in the nearterm.
Attached patch better histogram tools (obsolete) — Splinter Review
We had some confusion over in bug 781531 about how to adjust to the new world order of histograms being defined in JSON.  The scripts here will need some updating for generating, e.g. bucket information.  As a first step, we need to refactor the bits we have so people aren't forced to continually derive the ranges of enumerated histograms, and so forth.

This patch introduces a histogram_tools.Histogram class that performs a lot of the heavy lifting, adjusts histogram_tools.from_file to return instances of the class, rather than (name, definition) pairs, and generally shuffles the world around.  It applies *on top* of the patches from bug 781531 and produces identical results for the generated files.

Generating the bucket data should be a much easier task with this patch in place, and I'll get to that tomorrow.

Only asking for f? at this point because the dependent bits haven't been r+'d yet.  Feel free to r+ it or just request that this be rolled into bug 781531.
Attachment #654774 - Flags: feedback?(taras.mozilla)
Comment on attachment 654774 [details] [diff] [review]
better histogram tools

rubberstamp. I like the general idea.

This needs more commenting so i can follow along before I can r+ it.
Attachment #654774 - Flags: feedback?(taras.mozilla) → feedback+
Attached patch better histogram tools (obsolete) — Splinter Review
Rebased to what landed on inbound today.

What do you mean by "better commenting"?  Better explanation or more comments in the source code itself?
Attachment #654774 - Attachment is obsolete: true
(In reply to Nathan Froyd (:froydnj) from comment #21)
> Created attachment 655143 [details] [diff] [review]
> better histogram tools
> 
> Rebased to what landed on inbound today.
> 
> What do you mean by "better commenting"?  Better explanation or more
> comments in the source code itself?

Descriptive comments describing key functions in the module.
Now with some comments.
Attachment #655143 - Attachment is obsolete: true
Attachment #655686 - Flags: review?(taras.mozilla)
If we're going to generate range information in Python separately from the range information in the histogram class, we should make sure the two implementations match.  This patch does that for DEBUG builds.
Attachment #655789 - Flags: review?(taras.mozilla)
Comment on attachment 655686 [details] [diff] [review]
better histogram tools

+        table_dispatch(definition['kind'], table,
+                       lambda allowed_keys: Histogram.check_keys(name, definition, allowed_keys))

It's too bad Guido hates currying.

I noticed that you went from no comments to:
+    def description(self):
+        """Return the description of the histogram."""
+        return self._description

My rule of thumb is that code has to be obvious enough to not need comments. Prior iteration wasn't obvious enough to me, this one tipped the balance the other way around.
Attachment #655686 - Flags: review?(taras.mozilla) → review+
Comment on attachment 655789 [details] [diff] [review]
check Python-generated range information at compile time



+  // Check that what Histogram thinks the ranges are lines up with what
+  // our informational scripts think the ranges are.
typo


r+. I do not feel strongly about nits below, use your judgement.

+               "Script calculates invalid # of buckets");
I think C++/Python bucket # mismatch might be easier to diagnose if it goes wrong

+                 "Script calculates invalid information!");
same idea here


try_to_coerce_to_integer is a bit of a mess with (either this didn't work or we are using an old crappy python). I do not think it's a good idea to have codepaths that diverge on our infrastructure and local machines. I think we should make toplevel except: clause as the default. ast module looks nice, but in this case it does not seem to simplify life.

Also to be pedantic, try_to_coerce_to_integer actually coerces to a primitive, integer in the name is misleading.
Attachment #655789 - Flags: review?(taras.mozilla) → review+
(In reply to Taras Glek (:taras) from comment #25)
> It's too bad Guido hates currying.

It's too bad Guido hates functional programming generally.

> I noticed that you went from no comments to:
> +    def description(self):
> +        """Return the description of the histogram."""
> +        return self._description
> 
> My rule of thumb is that code has to be obvious enough to not need comments.
> Prior iteration wasn't obvious enough to me, this one tipped the balance the
> other way around.

I think you are being inconsistent.

(In reply to Taras Glek (:taras) from comment #26)
> +  // Check that what Histogram thinks the ranges are lines up with what
> +  // our informational scripts think the ranges are.
> typo

Where?

> +               "Script calculates invalid # of buckets");
> I think C++/Python bucket # mismatch might be easier to diagnose if it goes
> wrong
> 
> +                 "Script calculates invalid information!");
> same idea here

I will make these more informative.

> try_to_coerce_to_integer is a bit of a mess with (either this didn't work or
> we are using an old crappy python). I do not think it's a good idea to have
> codepaths that diverge on our infrastructure and local machines. I think we
> should make toplevel except: clause as the default. ast module looks nice,
> but in this case it does not seem to simplify life.

Fair enough.
https://hg.mozilla.org/integration/mozilla-inbound/rev/01b657335f1d
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8c53a2ecb2b
Assignee: taras.mozilla → nfroyd
Status: NEW → ASSIGNED
OS: Windows 7 → All
Hardware: x86_64 → All
Whiteboard: [leave open]
A new version of Taras's original script that generates the required json from Histograms.json.

Daniel, does this script output everything Metrics would like to know about?

Example output being attached shortly.
Attachment #656554 - Flags: review?(taras.mozilla)
Attachment #656554 - Flags: feedback?(deinspanjer)
Attached file sample json
New JSON generated from the new script.
Attachment #618361 - Attachment is obsolete: true
We may want to make the histogram parameters for boolean and flag histograms in the Python scripts match the parameters in the C++ source code...otherwise things look a little weird in the generated JSON.
Comment on attachment 656554 [details] [diff] [review]
generate bucket information for histograms for metrics's consumption

I believe we also need the histogram type so we can distinguish between linear and such.  Please forgive the - if this is no longer relevant for some reason.

Adding Harsha for feedback since he is working on the validation part.
Attachment #656554 - Flags: feedback?(schintalapani)
Attachment #656554 - Flags: feedback?(deinspanjer)
Attachment #656554 - Flags: feedback-
Ah, fair point about the histogram type.
Attachment #656554 - Attachment is obsolete: true
Attachment #656554 - Flags: review?(taras.mozilla)
Attachment #656554 - Flags: feedback?(schintalapani)
Attachment #656562 - Flags: feedback?(schintalapani)
Would the output of this script be wrapped up in something higher level that would describe the product, channel, and any other attributes that might be relevant such as platform or locale?  Would that information be inferred from the location in the source tree from which this output is retrieved?
Nathan, this also needs to duplicate histograms that match a certain regexp into STARTUP_ ones as per http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#513
(In reply to Daniel Einspanjer :dre [:deinspanjer] from comment #35)
> Would the output of this script be wrapped up in something higher level that
> would describe the product, channel, and any other attributes that might be
> relevant such as platform or locale?  Would that information be inferred
> from the location in the source tree from which this output is retrieved?

We could do that.  I'm not clear on what the best way to distribute this to you from a build is; it's not a user-facing thing, so it shouldn't be packaged and it's not a development/sdk-like thing, so it shouldn't go in the sdk, but it should somehow go alongside the packages themselves...
(In reply to Taras Glek (:taras) from comment #36)
> Nathan, this also needs to duplicate histograms that match a certain regexp
> into STARTUP_ ones as per
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/
> TelemetryPing.js#513

That's a good point.  Fixed.
Attachment #656562 - Attachment is obsolete: true
Attachment #656562 - Flags: feedback?(schintalapani)
Attachment #656585 - Flags: feedback?(schintalapani)
we need both bucket info and histogram type.
(In reply to schintalapani from comment #39)
> we need both bucket info and histogram type.

What's "bucket info" in this context?  Just the lower bounds of each bucket, or something more elaborate?
its an array containing all the values
example
"buckets": [0, 1, 3, 8, 21, 57, 154, 414, 1114, 3000]
Hi Nathan,
   Is the script ready to use.
Thanks
Please run the script, examine the output, and indicate whether you think the output is sufficient for your needs.  If it is sufficient, we'll get it checked into the tree next week.
could you give me some instructions running it
there is "script v2" what should be the input for this
and what about this one
https://bug748417.bugzilla.mozilla.org/attachment.cgi?id=656585

thanks,
Harsha
https://bug748417.bugzilla.mozilla.org/attachment.cgi?id=656585 is what you want to use.  Save it on your machine as gen-histogram-bucket-ranges.py and run from a command prompt:

python /path/to/gen-histogram-bucket-ranges.py /path/to/Histograms.json > output.json

and examine the output.json file.
Nathan,
   from where can i get histogram_tools package
(In reply to schintalapani from comment #46)
>    from where can i get histogram_tools package

In the same directory where Histograms.json lives.  You can either run python from that directory to pick it up automatically, or set the PYTHONPATH environment variable.
confused I only downloaded the code from attachments to this bug. Is there anything else I am missing?
You need this file: https://bug748417.bugzilla.mozilla.org/attachment.cgi?id=656585

In addition, you also need a checkout of mozilla-central or mozilla-inbound, to get access to histogram_tools.py and Histograms.json, both of which can be found in the toolkit/components/telemetry/ subdirectory of your checkout.
Made few changes to structure of final json output. 
Could you tell me which appVersion should this be applied to.
Attachment #658230 - Attachment mime type: text/x-python → text/plain
Attached file generate-histogram-bucket-ranges.py (obsolete) —
Attachment #662218 - Attachment mime type: text/x-python → text/plain
Attached file generate bucket information, vbetter (obsolete) —
Here's a refactored version that's somewhat clearer to read, as well as correcting a bug with the handling of STARTUP_ histograms.
Attachment #618861 - Attachment is obsolete: true
Attachment #656585 - Attachment is obsolete: true
Attachment #658230 - Attachment is obsolete: true
Attachment #662217 - Attachment is obsolete: true
Attachment #662218 - Attachment is obsolete: true
Attachment #656585 - Flags: feedback?(schintalapani)
Attached patch patch (obsolete) — Splinter Review
Here's an actual patch with the new script.
Attachment #664293 - Attachment is obsolete: true
Attachment #665463 - Flags: review?(taras.mozilla)
Comment on attachment 665463 [details] [diff] [review]
patch

+        if histogram.low() == 0:
+            parameters['min'] = 1
+        else:
+            parameters['min'] = histogram.low()

what's going on here?
(In reply to Taras Glek (:taras) from comment #56)
> +        if histogram.low() == 0:
> +            parameters['min'] = 1
> +        else:
> +            parameters['min'] = histogram.low()
> 
> what's going on here?

Oh, I bet this is leftover code from when we gave incorrect parameters for boolean and flag histograms, and the low value of 0 was screwing with something.  This should just be:

   parameters['min'] = histogram.low()
(In reply to Nathan Froyd (:froydnj) from comment #57)
> (In reply to Taras Glek (:taras) from comment #56)
> > +        if histogram.low() == 0:
> > +            parameters['min'] = 1
> > +        else:
> > +            parameters['min'] = histogram.low()
> > 
> > what's going on here?
> 
> Oh, I bet this is leftover code from when we gave incorrect parameters for
> boolean and flag histograms, and the low value of 0 was screwing with
> something.  This should just be:
> 
>    parameters['min'] = histogram.low()

k, please submit a patch that doesn't merely test whether I'm reading it :)
Attached patch patchSplinter Review
Attachment #665463 - Attachment is obsolete: true
Attachment #665463 - Flags: review?(taras.mozilla)
Attachment #665743 - Flags: review?(taras.mozilla)
Attachment #665743 - Flags: review?(taras.mozilla) → review+
[Approval Request Comment]
This uplift request is for the benefit of the metrics team, who uses these files to validate Telemetry data coming in.  They can use the files from mozilla-central for helping to validate Telemetry pings from FF17, but it'd be more convenient, I think, to have the files in the correct tree.  That's my understanding of the situation anyway; if Harsha or Daniel says it doesn't really matter, then let's drop this uplift request.

Uplifting this means bug 789371 should also be uplifted; I will make that request separately and it's a small dependency in any event.

Bug caused by (feature/regressing bug #): No bug.
User impact if declined: None.
Testing completed (on m-c, etc.): Tested on m-c for this release cycle.
Risk to taking this patch (and alternatives if risky): Very low.  The gen-histogram-bucket-changes.py file is not part of the build, and the changes to the build from the other files in the patch have been extensively tested this release cycle.
String or UUID changes made by this patch: None.
Attachment #667170 - Flags: review+
Attachment #667170 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/80ad55f342c9
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment on attachment 667170 [details] [diff] [review]
aurora uplift of histogram_tools.py and gen-histogram-bucket-ranges.py

Please re-nominate when it's clear that this is necessary and bug 789371 is ready for approval as well.
Attachment #667170 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
You need to log in before you can comment on or make changes to this bug.