Closed Bug 1331915 Opened 7 years ago Closed 7 years ago

Add Telemetry for Graphite Library

Categories

(Core :: Graphics: Text, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: tjr, Assigned: tjr)

References

Details

Attachments

(1 file)

Graphite is... not the most security-hardened library. We want to add Telemetry for the Graphite library so we can determine how often it is used. This will help us know how much it is relied upon by our user base.

I'm proposing the following metrics:
 - An exponential keyed histogram. The count is incremented with each page load that uses graphite. This will tell us how _much_ graphite is used.
 - A second exponential keyed histogram. This count; however, behaves like a flag (like how this flag was implemented: https://hg.mozilla.org/integration/mozilla-inbound/rev/8b9681467229) and will tell us how many _sessions_ use graphite.

The keys are the browser's language pack.  Keying it by language pack will tell us if certain language packs use the feature disproportionately (which seems likely). 

So we collect two pieces, both opt-in - about how many of our users (well, sessions) use graphite; and how much is it being used by those who do. Is this the best approach?
Flags: needinfo?(benjamin)
This is too complex ;-)

You should be able to use one scalar count (or count histogram). That can then be rather easily processed to tell you both how many session or users are triggering graphite, and how much.

Similarly, we already collect both locale and country-geo as part of the telemetry environment, so you don't need to key on that at all.

And I'll totally mark data-r on a counter!

Scalars are the new hotness for counting things: see https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/scalars.html and ping the #telemetry IRC channel for questions.
Flags: needinfo?(benjamin)
Ahha, I had been introduced to scalars, but didn't know post-processing could tell me breakdowns on sessions/users/locales/country-geo.  I will go with a single scalar then!
Attachment #8829653 - Flags: review?(jfkthame)
Comment on attachment 8829653 [details]
Bug 1331915 Add Telemetry probe to Graphite library usage

https://reviewboard.mozilla.org/r/106658/#review107806

Looks fine. Though unless there's some kind of hidden cleverness, it seems quite limited; it would be nice to have insight into things like how many different fonts in the wild depend on graphite, or how many different pages it gets used on. You can't figure out things like that from this count, can you? Still, any data is more than no data. :)
Attachment #8829653 - Flags: review?(jfkthame) → review+
This counter should increment per-page per-font. So a page with 4 fonts increments this 4 times. Above I'm told we can post-process this to tell us
- What percentage of users have used graphite
- What percentage of sessions have used graphite

I think we could made an upper-bounds guess about what percentage of page loads have used graphite. Assuming we can post-progress a value for 'percentage of page loads', then this value would be an upper bound (since some page loads will have more than one).  

I can add a new scalar (or change this one) to increment once per page to hopefully give us a more accurate 'percentage of page loads' value. (This seems more accurate really so I plan to make this change.)

I could add a new keyed scalar (with the key being the font name) to increment on each different font, which would tell us which fonts are most popular.

I can add a histogram that tells us what percentage of fonts we encounter use Graphite.

Ultimately, if graphite is rarely used my plan is to request it be disabled by default either on a language pack basis (I know this isn't done currently but assume I'm willing to implement it) or wholesale with the goal of reducing attack surface. (I'm open to other mitigations of course (whitelisting, prompts, etc but my goal is attack surface reduction.) So I'd like to make the telemetry be whatever information you would want me to use to convince you this is okay. What would you like? =)

bsmedberg: From a scalar value can we post-process to get a 'percentage of page loads' value? Are you opposed to any of proposed telemetry items?
Flags: needinfo?(jfkthame)
Flags: needinfo?(benjamin)
For any % you need to make sure you have a numerator and a denominator. We have counts for CONTENT_DOCUMENTS_DESTROYED and TOP_LEVEL_CONTENT_DOCUMENTS_DESTROYED but those are currently not opt-out metrics. Then you need to make sure your numerator matches. I believe the use-counter mechanism has a way to make that possible/easier, but I'm not sure; talk to froydnj about the mechanics of that.

I'm concerned about font names: that is rather uncontrolled and potentially identifying data, so it's risky to collect that and we'd have to put some additional controls on how we submit/aggregate/display/use the data to preserve privacy.
Flags: needinfo?(benjamin)
(In reply to Tom Ritter [:tjr] from comment #6)
> Ultimately, if graphite is rarely used my plan is to request it be disabled
> by default either on a language pack basis (I know this isn't done currently
> but assume I'm willing to implement it) or wholesale with the goal of
> reducing attack surface. (I'm open to other mitigations of course
> (whitelisting, prompts, etc but my goal is attack surface reduction.) So I'd
> like to make the telemetry be whatever information you would want me to use
> to convince you this is okay. What would you like? =)

I fully expect you'll find that graphite is "rarely" used, for some reasonable definition of "rarely".

However, I would be against making it disabled by default, because that will severely compromise the value that it offers. The primary "customers" for graphite are user (or language) communities whose needs are not adequately met by mainstream OpenType font solutions, which generally means lesser-known minority groups in areas such as south-east Asia. The fact that graphite is available -by default- in Firefox means that an author can create a site for such a community using a graphite-dependent font and be confident that it will render properly for "naive" Firefox users who have simply downloaded the browser -- not discovered how to go into about:config and tweak settings or install a particular add-on or anything.

Making graphite somehow "opt-in" instead of available by default will put a user-education barrier in the way of anyone wishing to produce content for these communities. Yes, they're a small minority, but they should have the same ability to use the Web in their own languages as the rest of us. These are people who are already at risk of being marginalized by much of our industry. Mozilla should be their advocate and supporter, doing all we can to help them bridge the "digital divide".

"Our mission is to ensure the Internet is a global public resource, open and accessible to all. An Internet that truly puts people first, where individuals can shape their own experience and are empowered, safe and independent."[1] Having graphite font support available (to anyone who needs it, not only to the few who discover how to enable it) is one part of making the Web "open and accessible to *all*", and not only to well-off majority-language communities.

[1] https://www.mozilla.org/en-US/mission/
Flags: needinfo?(jfkthame)
(In reply to Jonathan Kew (:jfkthame) from comment #8)
> (In reply to Tom Ritter [:tjr] from comment #6)
> > Ultimately, if graphite is rarely used my plan is to request it be disabled
> > by default either on a language pack basis (I know this isn't done currently
> > but assume I'm willing to implement it) or wholesale with the goal of
> > reducing attack surface. (I'm open to other mitigations of course
> > (whitelisting, prompts, etc but my goal is attack surface reduction.) So I'd
> > like to make the telemetry be whatever information you would want me to use
> > to convince you this is okay. What would you like? =)
> 
> I fully expect you'll find that graphite is "rarely" used, for some
> reasonable definition of "rarely".
> 
> However, I would be against making it disabled by default, because that will
> severely compromise the value that it offers. 

Thanks for the context. This use-case will absolutely be considered once we have collected data. 

The goal of this work is to provide the data to help us make both data-driven and qualitative decisions about Firefox. We need all the context we can get in order to make the best decisions for the experience and safety of our users.
(In reply to Selena Deckelmann :selenamarie :selena use ni? from comment #9)
> (In reply to Jonathan Kew (:jfkthame) from comment #8)
> > (In reply to Tom Ritter [:tjr] from comment #6)
> > > Ultimately, if graphite is rarely used my plan is to request it be disabled
> > > by default either on a language pack basis (I know this isn't done currently
> > > but assume I'm willing to implement it) or wholesale with the goal of
> > > reducing attack surface. (I'm open to other mitigations of course
> > > (whitelisting, prompts, etc but my goal is attack surface reduction.) So I'd
> > > like to make the telemetry be whatever information you would want me to use
> > > to convince you this is okay. What would you like? =)
> > 
> > I fully expect you'll find that graphite is "rarely" used, for some
> > reasonable definition of "rarely".
> > 
> > However, I would be against making it disabled by default, because that will
> > severely compromise the value that it offers. 
> 
> Thanks for the context. This use-case will absolutely be considered once we
> have collected data. 
> 
> The goal of this work is to provide the data to help us make both
> data-driven and qualitative decisions about Firefox. We need all the context
> we can get in order to make the best decisions for the experience and safety
> of our users.

Some more context for Graphite:
http://junglecode.net/graphite-smart-fonts-in-firefox/

As noted, Graphite is meant to serve communities for whom standard font rendering technologies can't adequately render their language. By definition, that's a small and discontiguous population. We knew that when we shipped the feature, and I'm fairly certain the Telemetry experiment will confirm that as well.
Rebased the patch, it's been a while so I just want to double check that we can mark this as checkin-needed.
Flags: needinfo?(jfkthame)
Seems OK to me, let's go ahead.
Flags: needinfo?(jfkthame)
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/03a3ed5fbce1
Add Telemetry probe to Graphite library usage r=jfkthame
Keywords: checkin-needed
Backed out for telemetry bustage:

https://hg.mozilla.org/integration/autoland/rev/b820712f307783b262fe59fdf5687386bc3d3a72

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=03a3ed5fbce14c38e0b2bd452b6aed182dfcd957&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=94631452&repo=autoland

[task 2017-04-26T22:59:52.574382Z] 22:59:52     INFO -  ../../config/nsinstall -R -m 644 'xpcAccEvents.h' '../../dist/include'
[task 2017-04-26T22:59:52.575954Z] 22:59:52     INFO -  gmake[5]: Leaving directory '/home/worker/workspace/build/src/obj-firefox/accessible/xpcom'
[task 2017-04-26T22:59:52.576041Z] 22:59:52     INFO -  ../../../config/nsinstall -R -m 644 'TelemetryEventEnums.h' '../../../dist/include/mozilla'
[task 2017-04-26T22:59:52.631957Z] 22:59:52     INFO -  /home/worker/workspace/build/src/obj-firefox/_virtualenv/bin/python -m mozbuild.action.webidl /home/worker/workspace/build/src/dom/bindings
[task 2017-04-26T22:59:52.803134Z] 22:59:52     INFO -  Traceback (most recent call last):
[task 2017-04-26T22:59:52.803424Z] 22:59:52     INFO -    File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
[task 2017-04-26T22:59:52.804176Z] 22:59:52     INFO -      "__main__", fname, loader, pkg_name)
[task 2017-04-26T22:59:52.804285Z] 22:59:52     INFO -    File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
[task 2017-04-26T22:59:52.804352Z] 22:59:52     INFO -      exec code in run_globals
[task 2017-04-26T22:59:52.805193Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/python/mozbuild/mozbuild/action/file_generate.py", line 108, in <module>
[task 2017-04-26T22:59:52.805612Z] 22:59:52     INFO -      sys.exit(main(sys.argv[1:]))
[task 2017-04-26T22:59:52.806055Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/python/mozbuild/mozbuild/action/file_generate.py", line 63, in main
[task 2017-04-26T22:59:52.806165Z] 22:59:52     INFO -      ret = module.__dict__[method](output, *args.additional_arguments)
[task 2017-04-26T22:59:52.806256Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/gen-scalar-data.py", line 85, in main
[task 2017-04-26T22:59:52.806705Z] 22:59:52     INFO -      scalars = parse_scalars.load_scalars(filenames[0])
[task 2017-04-26T22:59:52.807151Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/parse_scalars.py", line 298, in load_scalars
[task 2017-04-26T22:59:52.807284Z] 22:59:52     INFO -      scalar_list.append(ScalarType(group_name, probe_name, scalar_info))
[task 2017-04-26T22:59:52.807742Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/parse_scalars.py", line 33, in __init__
[task 2017-04-26T22:59:52.807842Z] 22:59:52     INFO -      self.validate_types(definition)
[task 2017-04-26T22:59:52.807936Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/parse_scalars.py", line 129, in validate_types
[task 2017-04-26T22:59:52.808818Z] 22:59:52     INFO -      '. See: {}#required-fields'.format(BASE_DOC_URL))
[task 2017-04-26T22:59:52.809241Z] 22:59:52     INFO -  TypeError: graphite - record_in_processes must be list. See: https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/scalars.html#required-fields
[task 2017-04-26T22:59:52.813565Z] 22:59:52     INFO -  backend.mk:37: recipe for target 'TelemetryScalarData.h' failed
[task 2017-04-26T22:59:52.813820Z] 22:59:52     INFO -  gmake[5]: *** [TelemetryScalarData.h] Error 1
[task 2017-04-26T22:59:52.814393Z] 22:59:52     INFO -  gmake[5]: *** Deleting file 'TelemetryScalarData.h'
[task 2017-04-26T22:59:52.814483Z] 22:59:52     INFO -  gmake[5]: *** Waiting for unfinished jobs....
[task 2017-04-26T22:59:52.890184Z] 22:59:52     INFO -  Traceback (most recent call last):
[task 2017-04-26T22:59:52.890495Z] 22:59:52     INFO -    File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
[task 2017-04-26T22:59:52.890742Z] 22:59:52     INFO -      "__main__", fname, loader, pkg_name)
[task 2017-04-26T22:59:52.890981Z] 22:59:52     INFO -    File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
[task 2017-04-26T22:59:52.891197Z] 22:59:52     INFO -      exec code in run_globals
[task 2017-04-26T22:59:52.892859Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/python/mozbuild/mozbuild/action/file_generate.py", line 108, in <module>
[task 2017-04-26T22:59:52.893016Z] 22:59:52     INFO -      sys.exit(main(sys.argv[1:]))
[task 2017-04-26T22:59:52.893821Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/python/mozbuild/mozbuild/action/file_generate.py", line 63, in main
[task 2017-04-26T22:59:52.893915Z] 22:59:52     INFO -      ret = module.__dict__[method](output, *args.additional_arguments)
[task 2017-04-26T22:59:52.894089Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/gen-scalar-enum.py", line 38, in main
[task 2017-04-26T22:59:52.894356Z] 22:59:52     INFO -      scalars = parse_scalars.load_scalars(filenames[0])
[task 2017-04-26T22:59:52.894831Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/parse_scalars.py", line 298, in load_scalars
[task 2017-04-26T22:59:52.894920Z] 22:59:52     INFO -      scalar_list.append(ScalarType(group_name, probe_name, scalar_info))
[task 2017-04-26T22:59:52.895052Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/parse_scalars.py", line 33, in __init__
[task 2017-04-26T22:59:52.895244Z] 22:59:52     INFO -      self.validate_types(definition)
[task 2017-04-26T22:59:52.895570Z] 22:59:52     INFO -    File "/home/worker/workspace/build/src/toolkit/components/telemetry/parse_scalars.py", line 129, in validate_types
[task 2017-04-26T22:59:52.895852Z] 22:59:52     INFO -      '. See: {}#required-fields'.format(BASE_DOC_URL))
[task 2017-04-26T22:59:52.896174Z] 22:59:52     INFO -  TypeError: graphite - record_in_processes must be list. See: https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/scalars.html#required-fields
[task 2017-04-26T22:59:52.900682Z] 22:59:52     INFO -  backend.mk:44: recipe for target 'TelemetryScalarEnums.h' failed
[task 2017-04-26T22:59:52.900985Z] 22:59:52     INFO -  gmake[5]: *** [TelemetryScalarEnums.h] Error 1
Flags: needinfo?(tom)
Well that's embarrassing. Should be fixed now though... (Successful local build and all...)
Flags: needinfo?(tom)
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a6a95c27df82
Add Telemetry probe to Graphite library usage r=jfkthame
Keywords: checkin-needed
Comment on attachment 8829653 [details]
Bug 1331915 Add Telemetry probe to Graphite library usage

https://reviewboard.mozilla.org/r/106658/#review137658

::: toolkit/components/telemetry/Scalars.yaml:215
(Diff revision 5)
> +      The number of times a graphite2 font has been loaded.
> +    expires: "60"
> +    kind: uint
> +    notification_emails:
> +      - tom@mozilla.com
> +    release_channel_collection: opt-in

Note: this patch never went through a final data review, which should have happened before landing.

Why is this opt-in? It seems unlikely to me that you're going to get useful data about this feature from prerelease channels, considering that it's highly geography-dependent and we know that beta is skewed towards western geos.
Flags: needinfo?(tom)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #20)
> Comment on attachment 8829653 [details]
> Bug 1331915 Add Telemetry probe to Graphite library usage
> 
> https://reviewboard.mozilla.org/r/106658/#review137658
> 
> ::: toolkit/components/telemetry/Scalars.yaml:215
> (Diff revision 5)
> > +      The number of times a graphite2 font has been loaded.
> > +    expires: "60"
> > +    kind: uint
> > +    notification_emails:
> > +      - tom@mozilla.com
> > +    release_channel_collection: opt-in
> 
> Note: this patch never went through a final data review, which should have
> happened before landing.

My misunderstanding, sorry. I'll be more careful in the future.

> Why is this opt-in? It seems unlikely to me that you're going to get useful
> data about this feature from prerelease channels, considering that it's
> highly geography-dependent and we know that beta is skewed towards western
> geos.

After talking with bsmedberg we're going to make this opt-out. I'll provide a followup patch and have him approve it.
Flags: needinfo?(tom)
Keywords: leave-open
I think because this landed I need to make the opt-out switch a new bug. Opened Bug 1360892 for this.
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8829653 [details]
Bug 1331915 Add Telemetry probe to Graphite library usage

Approval Request Comment
[Feature/Bug causing the regression]: We would like to collect telemetry about a feature we have concerns about.

[User impact if declined]: We will be delayed in making security decisions for our users.

[Is this code covered by automated tests?]: Yes, we have tests that test graphite, so it will hit this call.

[Has the fix been verified in Nightly?]: Yes.
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-04-30&keys=__none__!__none__!__none__&max_channel_version=nightly%252F55&measure=SCALARS_BROWSER.USAGE.GRAPHITE&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-04-28&table=0&trim=1&use_submission_date=0

[Needs manual test from QE? If yes, steps to reproduce]: No

[List of other uplifts needed for the feature/fix]: The child to this bug, which is Bug 1360892

[Is the change risky?]: No

[Why is the change risky/not risky?]: It adds a single Telemetry call. If it was going to crash, it should be crashing on Nightly.

[String changes made/needed]: None
Attachment #8829653 - Flags: approval-mozilla-beta?
Comment on attachment 8829653 [details]
Bug 1331915 Add Telemetry probe to Graphite library usage

Required by bug 1360892. Beta54+. Should be in 54 beta 6.
Attachment #8829653 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Tom Ritter [:tjr] from comment #24)
> [Is this code covered by automated tests?]: Yes, we have tests that test
> graphite, so it will hit this call.
> 
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on Tom's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: