Closed Bug 1211599 Opened 9 years ago Closed 9 years ago

Enforce 100-bucket limit on Telemetry histograms

Categories

(Toolkit :: Telemetry, defect, P4)

defect

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox44 --- affected
firefox45 --- fixed

People

(Reporter: vladan, Assigned: chutten)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

Attachments

(2 files, 5 obsolete files)

Attached file problemHistograms.txt
As discussed on fhr-dev, we should standardize on a maximum of 100 buckets in a histogram and enforce it in the histogram generation script.

Currently, there are 176 histograms with n_buckets > 100, so we may have to whitelist them.

Setting the maximum threshold higher would be of little help:

# of histograms with 100-199 buckets: 12
# of histograms with 200-299 buckets: 24
# of histograms with 400 buckets:  1
# of histograms with 500 or 512 buckets:  5
# of histograms with 700 buckets:  1
# of histograms with 1,000 buckets:  131
# of histograms with 10,000 buckets:  2

The 1,000 bucket histograms are the biggest challenge.
Bug 1167292 created the biggest histograms, with 10,000 buckets in 2 histograms
Blocks: 1201022
Priority: -- → P4
Whiteboard: [measurement:client]
Chris: can you add a 100 bucket maximum in toolkit/components/telemetry/histogram_tools.py? Whitelist the existing histograms with more than 100 buckets.. you can put their names in some kind of whitelist file in the same directory

Georg: any objections?
Assignee: nobody → chutten
Just a simple JSON file with the 185+ histograms who have numeric buckets > 100, and the code to throw an exception when one's seen.

Also, since this works in other places that might not have the whitelist present, default to allowing everything in the event the whitelist isn't found. Complain, but allow.
Attachment #8678277 - Flags: review?(nfroyd)
Comment on attachment 8678277 [details] [diff] [review]
0001-Bug-1211599-Only-allow-whitelisted-histograms-to-hav.patch

Review of attachment 8678277 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/bucket-whitelist.json
@@ +1,4 @@
> +{
> +  "MEMORY_RESIDENT": true,
> +  "MEMORY_JS_MAIN_RUNTIME_TEMPORARY_PEAK": true,
> +  "MEMORY_JS_GC_HEAP": true,

Add a JS comment to the header warning people not to edit the whitelist without Perf team's approval (include my IRC nick or email address)

::: toolkit/components/telemetry/histogram_tools.py
@@ +244,5 @@
>          self._high = try_to_coerce_to_number(high)
>          self._n_buckets = try_to_coerce_to_number(n_buckets)
> +        if self._n_buckets > 100 and type(self._n_buckets) is int:
> +            try:
> +                with open(os.path.join(os.path.abspath(os.path.realpath(os.path.dirname(__file__))), 'bucket-whitelist.json'), 'r') as f:

this line is pretty ugly, can you figure out the path on a separate line and put it in a temporary variable

@@ +250,5 @@
> +                        whitelist = json.load(f)
> +                    except ValueError, e:
> +                        raise BaseException, 'error parsing bucket whitelist'
> +                if self._name not in whitelist:
> +                    raise KeyError, 'Non-whitelisted histogram %s cannot have > 100 buckets' % self._name

I wouldn't want people to just whitelist their large histogram after getting this message :) How about "New histograms are not allowed to have more than 100 buckets. Histograms with hundreds of buckets increase mobile memory use, slow down histogram aggregation, etc. Contact :vladan or Perf team if you think an exception needs to be made"
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #3)
> Comment on attachment 8678277 [details] [diff] [review]
> 0001-Bug-1211599-Only-allow-whitelisted-histograms-to-hav.patch
> 
> Review of attachment 8678277 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/bucket-whitelist.json
> @@ +1,4 @@
> > +{
> > +  "MEMORY_RESIDENT": true,
> > +  "MEMORY_JS_MAIN_RUNTIME_TEMPORARY_PEAK": true,
> > +  "MEMORY_JS_GC_HEAP": true,
> 
> Add a JS comment to the header warning people not to edit the whitelist
> without Perf team's approval (include my IRC nick or email address)

JSON files aren't permitted to have comments and still be JSON files. I double-checked and python's json.load unfortunately doesn't seem to like comments (ie, it's a strict JSON parser, not a JS object parser).

> 
> ::: toolkit/components/telemetry/histogram_tools.py
> @@ +244,5 @@
> >          self._high = try_to_coerce_to_number(high)
> >          self._n_buckets = try_to_coerce_to_number(n_buckets)
> > +        if self._n_buckets > 100 and type(self._n_buckets) is int:
> > +            try:
> > +                with open(os.path.join(os.path.abspath(os.path.realpath(os.path.dirname(__file__))), 'bucket-whitelist.json'), 'r') as f:
> 
> this line is pretty ugly, can you figure out the path on a separate line and
> put it in a temporary variable

Sure.

> @@ +250,5 @@
> > +                        whitelist = json.load(f)
> > +                    except ValueError, e:
> > +                        raise BaseException, 'error parsing bucket whitelist'
> > +                if self._name not in whitelist:
> > +                    raise KeyError, 'Non-whitelisted histogram %s cannot have > 100 buckets' % self._name
> 
> I wouldn't want people to just whitelist their large histogram after getting
> this message :) How about "New histograms are not allowed to have more than
> 100 buckets. Histograms with hundreds of buckets increase mobile memory use,
> slow down histogram aggregation, etc. Contact :vladan or Perf team if you
> think an exception needs to be made"

Agreed. I'll change it up.
Updated error message, split path calculation to separate line for clarity.
Attachment #8678277 - Attachment is obsolete: true
Attachment #8678277 - Flags: review?(nfroyd)
Attachment #8678305 - Flags: review?(vladan.bugzilla)
Attachment #8678305 - Flags: review?(nfroyd)
Comment on attachment 8678305 [details] [diff] [review]
0001-Bug-1211599-Only-allow-whitelisted-histograms-to-hav.patch

Review of attachment 8678305 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/histogram_tools.py
@@ +244,5 @@
>          self._high = try_to_coerce_to_number(high)
>          self._n_buckets = try_to_coerce_to_number(n_buckets)
> +        if self._n_buckets > 100 and type(self._n_buckets) is int:
> +            try:
> +                whitelist_path = os.path.join(os.path.abspath(os.path.realpath(os.path.dirname(__file__))), 'bucket-whitelist.json');

semicolons in python?

@@ +247,5 @@
> +            try:
> +                whitelist_path = os.path.join(os.path.abspath(os.path.realpath(os.path.dirname(__file__))), 'bucket-whitelist.json');
> +                with open(whitelist_path, 'r') as f:
> +                    try:
> +                        whitelist = json.load(f)

are you loading the whitelist file for every histogram?
Attachment #8678305 - Flags: review?(vladan.bugzilla) → review-
Now we're only parsing it the once, and I've removed a spurious semi.
Attachment #8678305 - Attachment is obsolete: true
Attachment #8678305 - Flags: review?(nfroyd)
Attachment #8678333 - Flags: review?(vladan.bugzilla)
Attachment #8678333 - Flags: review?(nfroyd)
Comment on attachment 8678333 [details] [diff] [review]
0001-Bug-1211599-Only-allow-whitelisted-histograms-to-hav.patch

Review of attachment 8678333 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/bucket-whitelist.json
@@ +1,2 @@
> +{
> +  "MEMORY_RESIDENT": true,

why not make this an array? the "true" values seems redundant

::: toolkit/components/telemetry/histogram_tools.py
@@ +77,5 @@
>  always_allowed_keys = ['kind', 'description', 'cpp_guard', 'expires_in_version',
>                         'alert_emails', 'keyed', 'releaseChannelCollection']
>  
> +try:
> +    whitelist_path = os.path.join(os.path.abspath(os.path.realpath(os.path.dirname(__file__))), 'bucket-whitelist.json')

are realpath & abspath needed?

@@ +82,5 @@
> +    with open(whitelist_path, 'r') as f:
> +        try:
> +            whitelist = json.load(f)
> +        except ValueError, e:
> +            raise BaseException, 'error parsing bucket whitelist'

include the path & filename in the error message to make it easier to debug

@@ +84,5 @@
> +            whitelist = json.load(f)
> +        except ValueError, e:
> +            raise BaseException, 'error parsing bucket whitelist'
> +except IOError:
> +    whitelist = None

declare this at the top of the file, and use a more descriptive name like n_buckets_whitelist

@@ +85,5 @@
> +        except ValueError, e:
> +            raise BaseException, 'error parsing bucket whitelist'
> +except IOError:
> +    whitelist = None
> +    print 'Unable to load whitelist. Assume all histograms are acceptable.'

ditto on including path/filename where script looked for the whitelist

@@ +255,5 @@
>          self._high = try_to_coerce_to_number(high)
>          self._n_buckets = try_to_coerce_to_number(n_buckets)
> +        if self._n_buckets > 100 and type(self._n_buckets) is int:
> +            if whitelist is None:
> +                print 'Histogram %s allowed to have > 100 buckets because the whitelist file could not be read' % self._name

this is going to print ~190 error messages, just put up one big scary message in the IOError handler instead
Attachment #8678333 - Flags: review?(vladan.bugzilla)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #8)
> Comment on attachment 8678333 [details] [diff] [review]
> 0001-Bug-1211599-Only-allow-whitelisted-histograms-to-hav.patch
> 
> Review of attachment 8678333 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/bucket-whitelist.json
> @@ +1,2 @@
> > +{
> > +  "MEMORY_RESIDENT": true,
> 
> why not make this an array? the "true" values seems redundant

This way I can test for membership instead of having to indexOf each time. I could convert from list to set inside Python to accomplish a similar thing, but this way it is easier to extend in the future if we want this whitelist to mean more than just "presence in this list means we're good for > 100 buckets". Something like "MEMORY_RESIDENT": {n_buckets_limit: 400, can_access_awesome_thing1: 'read-only'}, for instance.
 
> ::: toolkit/components/telemetry/histogram_tools.py
> @@ +77,5 @@
> >  always_allowed_keys = ['kind', 'description', 'cpp_guard', 'expires_in_version',
> >                         'alert_emails', 'keyed', 'releaseChannelCollection']
> >  
> > +try:
> > +    whitelist_path = os.path.join(os.path.abspath(os.path.realpath(os.path.dirname(__file__))), 'bucket-whitelist.json')
> 
> are realpath & abspath needed?

For traversing symlinks it would be. I'm not sure if it's relevant, I copypasta'd from the many SCRIPT_DIR examples in other python files: https://dxr.mozilla.org/mozilla-central/search?q=SCRIPT_+realpath&redirect=true&case=true 

> @@ +82,5 @@
> > +    with open(whitelist_path, 'r') as f:
> > +        try:
> > +            whitelist = json.load(f)
> > +        except ValueError, e:
> > +            raise BaseException, 'error parsing bucket whitelist'
> 
> include the path & filename in the error message to make it easier to debug

Okily dokily.

> @@ +84,5 @@
> > +            whitelist = json.load(f)
> > +        except ValueError, e:
> > +            raise BaseException, 'error parsing bucket whitelist'
> > +except IOError:
> > +    whitelist = None
> 
> declare this at the top of the file, and use a more descriptive name like
> n_buckets_whitelist

Alrighty.

> @@ +85,5 @@
> > +        except ValueError, e:
> > +            raise BaseException, 'error parsing bucket whitelist'
> > +except IOError:
> > +    whitelist = None
> > +    print 'Unable to load whitelist. Assume all histograms are acceptable.'
> 
> ditto on including path/filename where script looked for the whitelist

Makes sense.

> @@ +255,5 @@
> >          self._high = try_to_coerce_to_number(high)
> >          self._n_buckets = try_to_coerce_to_number(n_buckets)
> > +        if self._n_buckets > 100 and type(self._n_buckets) is int:
> > +            if whitelist is None:
> > +                print 'Histogram %s allowed to have > 100 buckets because the whitelist file could not be read' % self._name
> 
> this is going to print ~190 error messages, just put up one big scary
> message in the IOError handler instead

I figured if it was in automation no one would be looking anyway, and if it wasn't in automation we'd want the user to be scared. I'll try to be just as scary with one line. 'Tis the season, after all.
Comment on attachment 8678333 [details] [diff] [review]
0001-Bug-1211599-Only-allow-whitelisted-histograms-to-hav.patch

Review of attachment 8678333 [details] [diff] [review]:
-----------------------------------------------------------------

Vladan's review seems good enough to me, and it sounds like there are some revisions that are going to be made anyway.

::: toolkit/components/telemetry/histogram_tools.py
@@ +85,5 @@
> +        except ValueError, e:
> +            raise BaseException, 'error parsing bucket whitelist'
> +except IOError:
> +    whitelist = None
> +    print 'Unable to load whitelist. Assume all histograms are acceptable.'

Nit: I think the error reads slightly better as "Assuming all histograms..."

@@ +258,5 @@
> +            if whitelist is None:
> +                print 'Histogram %s allowed to have > 100 buckets because the whitelist file could not be read' % self._name
> +            elif self._name not in whitelist:
> +                raise KeyError, ('New histograms are not permitted to have more than 100 buckets. '
> +                                'Histograms with large numbers of buckets use disproportionately-high amounts of resources. '

Nit: no hyphen here.
Attachment #8678333 - Flags: review?(nfroyd)
Attachment #8678859 - Flags: review?(vladan.bugzilla)
Attachment #8678859 - Flags: review?(nfroyd)
Attachment #8678333 - Attachment is obsolete: true
Comment on attachment 8678859 [details] [diff] [review]
0001-Bug-1211599-Only-allow-whitelisted-histograms-to-hav.patch

Review of attachment 8678859 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Chris H-C :chutten from comment #9)
> (In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #8)
> > why not make this an array? the "true" values seems redundant
> 
> This way I can test for membership instead of having to indexOf each time. I
> could convert from list to set inside Python to accomplish a similar thing,
> but this way it is easier to extend in the future if we want this whitelist
> to mean more than just "presence in this list means we're good for > 100
> buckets". Something like "MEMORY_RESIDENT": {n_buckets_limit: 400,
> can_access_awesome_thing1: 'read-only'}, for instance.

You have a point about future extensibility, but then the values in the dict should be {} or {n_buckets: true}. I suggest for this patch we just keep a list in the file and convert it to a set as you said.

> > are realpath & abspath needed?
> 
> For traversing symlinks it would be. I'm not sure if it's relevant, I
> copypasta'd from the many SCRIPT_DIR examples in other python files:
> https://dxr.mozilla.org/mozilla-central/
> search?q=SCRIPT_+realpath&redirect=true&case=true 

Ok, let's keep it
Attachment #8678859 - Flags: review?(vladan.bugzilla) → review+
Bug 1211599 - Only allow whitelisted histograms to have > 100 buckets.
Comment on attachment 8678896 [details]
MozReview Request: Bug 1211599 - Only allow whitelisted histograms to have > 100 buckets.

mozreview test publish. Feel free to ignore.
Attachment #8678896 - Attachment is obsolete: true
Attachment #8678859 - Flags: review?(nfroyd) → review+
Patch updated as asked. Now to figure out try.
Attachment #8678859 - Attachment is obsolete: true
Attachment #8680018 - Flags: review+
Clean trybuild of win32 and xpcshell tests on Win7: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3d86e92d4cd2
Status: NEW → ASSIGNED
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c0a60c2c5a83
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
removing the b2g 2.5 flag since this commit has been reverted due to an incorrect merge, sorry for the confusion
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: