Add a preference to prevent local font enumeration

NEW
Unassigned

Status

()

Core
Layout
P3
normal
6 years ago
2 months ago

People

(Reporter: cviecco, Unassigned)

Tracking

(Blocks: 2 bugs)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [fingerprinting][tor][tor-standalone])

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

6 years ago
According to the Panopticlick project, fonts provide 16 bits of entropy about a browser (http://panopticlick.eff.org/browser-uniqueness.pdf). We need a preference that limits this.
The Tor developers have developed a patch that limits the number of fonts rendered per document (https://trac.torproject.org/projects/tor/ticket/2872), but even not-so-advanced attackers can still enumerate the fonts, just more slowly.

The best solution would be to allow only generic fonts and webfonts. Initial work would focus on limiting fonts to the generic set only (http://www.w3.org/TR/CSS2/fonts.html#generic-font-families).
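To make the "bits of entropy" figure concrete: 16 bits corresponds to telling apart roughly 2^16 = 65,536 equally likely configurations. The following is a minimal sketch of how such a figure can be computed from a sample of observed font lists. The sample data and the function name are hypothetical and are not taken from the Panopticlick paper.

```python
import math
from collections import Counter


def shannon_entropy_bits(observations):
    """Return the Shannon entropy, in bits, of a list of observed values."""
    counts = Counter(observations)
    total = len(observations)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample: each tuple is the font list one browser exposed.
sample = [
    ("Arial", "Times New Roman"),
    ("Arial", "Times New Roman"),
    ("Arial", "Helvetica"),
    ("DejaVu Sans",),
]

print(shannon_entropy_bits(sample))  # 1.5
```

The more evenly the observed font lists are spread over many distinct values, the higher the entropy, and the more useful the font list is for telling browsers apart.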
(Reporter)

Updated

6 years ago
Assignee: nobody → cviecco
(Reporter)

Comment 1

6 years ago
Created attachment 602035 [details] [diff] [review]
Use only generic fonts

First patch, for using only generic fonts. Has no test suite yet.
(Reporter)

Comment 2

6 years ago
Created attachment 603051 [details] [diff] [review]
Test for the bug.

This patch contains only the test suite. It tests changing and resetting the preference.
(Reporter)

Comment 3

6 years ago
This has been tied to the following feature page (https://wiki.mozilla.org/Privacy/Features/Pref_to_limit_number_of_fonts_loaded_per_tab). 

The preference probably needs a better name (limit_local_font_usage vs. use_generic_fonts_only).

But I think the patch is ready for comments.
(Reporter)

Updated

6 years ago
Attachment #602035 - Flags: review?
(Reporter)

Updated

6 years ago
Attachment #603051 - Flags: review?
Component: Private Browsing → Layout
Product: Firefox → Core
QA Contact: private.browsing → layout

Comment 4

6 years ago
Camilo, perhaps you'd like to choose a reviewer? Right now, the patches are being requested for review "from the air".
(Reporter)

Comment 5

6 years ago
Comment on attachment 603051 [details] [diff] [review]
Test for the bug.

Review of attachment 603051 [details] [diff] [review]:
-----------------------------------------------------------------

Removing the review request for the test (for now)
Attachment #603051 - Flags: review?
(Reporter)

Comment 6

6 years ago
Comment on attachment 602035 [details] [diff] [review]
Use only generic fonts

Review of attachment 602035 [details] [diff] [review]:
-----------------------------------------------------------------

removing review request for now
Attachment #602035 - Flags: review?
(Reporter)

Comment 7

6 years ago
Created attachment 610678 [details] [diff] [review]
patch to hide local fonts, faking some ms corefonts
Attachment #602035 - Attachment is obsolete: true
(Reporter)

Comment 8

6 years ago
Created attachment 610978 [details] [diff] [review]
patch to hide local fonts, faking presence of mscorefonts
Attachment #610678 - Attachment is obsolete: true
(Reporter)

Comment 9

6 years ago
Created attachment 611655 [details] [diff] [review]
use only generic fonts (and fake mscorefont presence)

Potential issues (that I know could be problematic):
1. The name of the pref is use_generic_fonts_only; in the future I would also like to show downloadable fonts, so the name is not future-proof.
2. getCoreGenericID is in layout/style/nsRuleNode.cpp while getGenericID is in gfx/src/nsFont.cpp, but it did not 'feel' right to put the new code there.
Attachment #610978 - Attachment is obsolete: true
Attachment #611655 - Flags: review?(bzbarsky)

Comment 10

6 years ago
(In reply to Camilo Viecco from comment #0)
> According to the Panopticlick project, fonts provide 16 bits of entropy about a
> browser (http://panopticlick.eff.org/browser-uniqueness.pdf).

I thought the 16 bits of entropy claim was mostly about what's available via plugins that provide a *list* of fonts in a system-specific order (order of installation, I believe).

I think the amount of entropy provided if the list isn't ordered is considerably smaller.  Furthermore, this doesn't seem that useful a change to make if most users have the Flash or Java plugins that allow this enumeration.


Additionally, this preference seems almost identical to (the opposite of) the existing use_document_fonts pref, except for the added logic to fake generics for the MS core fonts.  Why is it implemented as an entirely separate branch?

Finally, if the goal is to implement something that could potentially be enabled by default or by wider swaths of users, this is the wrong approach.  You'd want to whitelist different sets of fonts for different platforms (on the assumption that platform can be detected and there's never going to be a way around that), and you'd want to continue allowing downloadable fonts.
(Reporter)

Comment 11

6 years ago
(In reply to David Baron [:dbaron] from comment #10)
Thank you for taking a look at this.

> (In reply to Camilo Viecco from comment #0)
> > According to the Panopticlick project, fonts provide 16 bits of entropy about a
> > browser (http://panopticlick.eff.org/browser-uniqueness.pdf).
> 
> I thought the 16 bits of entropy claim was mostly about what's available via
> plugins that provide a *list* of fonts in a system-specific order (order of
> installation, I believe).
Ordered fonts gave 17.1 bits of entropy, unordered 16.0 (section 6.4 of the paper).

> I think the amount of entropy provided if the list isn't ordered is
> considerably smaller.  Furthermore, this doesn't seem that useful a change
> to make if most users have the Flash or Java plugins that allow this
> enumeration.

The security roadmap has a P1 feature to have click-to-play for plugins per site. So, by default, users would not have these font lists exposed via plugins.

> 
> 
> Additionally, this preference seems almost identical to (the opposite of)
> the existing use_document_fonts pref, except for the added logic to fake
> generics for the MS core fonts.  Why is it implemented as an entirely
> separate branch?

I think you are right about this. I actually like the idea of changing the behavior of 'use document fonts' for what I am trying to do. I wrote it as a separate flag to be able to keep the old behavior, but I am very inclined to use your idea.

> Finally, if the goal is to implement something that could potentially be
> enabled by default or by wider swaths of users, this is the wrong approach. 
> You'd want to whitelist different sets of fonts for different platforms (on
> the assumption that platform can be detected and there's never going to be
> a way around that), and you'd want to continue allowing downloadable fonts.

I do not see this being enabled by default for large swaths of users; the primary user set would be privacy-aware users (and privacy-aware extensions) that would enable it.
I agree on the difficulty of hiding your OS platform, but I am ambivalent on the idea of whitelists per OS (I do not want to give up hope yet). My wishlist would be to figure out the 20 most used fonts on the web and build per-OS mappings.
Downloadable fonts should be enabled, but I see this as a follow-up bug.

Thanks again

Comment 12

6 years ago
(In reply to Camilo Viecco from comment #11)
> I agree on the difficulty of hiding your OS platform, but I am ambivalent on
> the idea of whitelists per OS (I do not want to give up hope yet). My
> wishlist would be to figure out the 20 most used fonts on the web and build
> per-OS mappings.
> Downloadable fonts should be enabled, but I see this as a follow up bug. 

Mappings don't help if you're mapping the font to something with different font metrics, since that's a detectable difference.

Enabling downloadable fonts wouldn't be appropriate for the existing use of the preference, though.

Comment 13

6 years ago
(In reply to Camilo Viecco from comment #11)
> Downloadable fonts should be enabled, but I see this as a follow up bug. 

Also, this doesn't make sense as a followup bug since it would call for implementing this preference in an entirely different layer of the code.


That said, another major problem here is that what you've implemented is very Latin-centric; it's not going to work for other scripts, which have other fonts.

Comment 14

6 years ago
Comment on attachment 611655 [details] [diff] [review]
use only generic fonts (and fake mscorefont presence)

r- per comments, but in general dbaron is a better reviewer for this than I am....
Attachment #611655 - Flags: review?(bzbarsky) → review-
(Reporter)

Comment 15

6 years ago
(In reply to David Baron [:dbaron] from comment #13)
> (In reply to Camilo Viecco from comment #11)
> > Downloadable fonts should be enabled, but I see this as a follow up bug. 
> 
> Also, this doesn't make sense as a followup bug since it would call for
> implementing this preference in an entirely different layer of the code.

OK. Would you then be OK with a patch that allows only generics and downloadable
fonts? Ignore the problem of bad font stacks. Yes, I accept the fact that
sites would still be able to fingerprint the five generic fonts.
If you agree with this, do you have any pointers on where in the source code
to start?

Thank you

> 
> That said, another major problem here is that what you've implemented is
> very Latin-centric; it's not going to work for other scripts, which have
> other fonts.

I have tested the patch on OSX and Linux (Ubuntu 11.10), going to Arabic,
Israeli, Thai, Chinese, Japanese and Korean sites, and they still rendered
readably. That is, the default fonts for these OSes have very good Unicode
coverage. So I don't see it as a problem.

Comment 16

(In reply to Camilo Viecco from comment #15)
> (In reply to David Baron [:dbaron] from comment #13)
> > That said, another major problem here is that what you've implemented is
> > very Latin-centric; it's not going to work for other scripts, which have
> > other fonts.
> 
> I have tested the patch on OSX and Linux (Ubuntu 11.10), going to Arabic,
> Israeli, Thai, Chinese, Japanese and Korean sites, and they still rendered
> readably. That is, the default fonts for these OSes have very good Unicode
> coverage. So I don't see it as a problem.

Sorry, it's your fallback from common names to generics that's Latin-centric.  You'll still get text because we'll still fall back to anything that we can find; it's just in non-Latin scripts you'll no longer get serif/sans-serif distinctions -- in other words, for non-Latin scripts this will be fully equivalent to the existing disable-document-fonts pref, whereas the "smarts" you're adding are Latin-script only (or maybe Latin+Cyrillic+Greek for some of the fonts).

Comment 17

5 years ago
I think this bug needs more research to back up (1) that the original research is actually valid and (2) that the solution proposed makes sense relative to the problem as defined.  Specifically, Flash provides API's that allow a much-higher entropy version of the font list (i.e. the order of fonts as provided by the OS).  Same is true for Java.  Unless you're including an explicit block on these API's, there isn't much sense neutering the browser like this.  Note that there *isn't* a non-chrome way of enumerating fonts other than by testing a huge list of fonts.

Fallback fonts are used more than most users understand.  It's actually hard to specify an explicit list of "these are the only fonts we'll ever need", because that will vary by locale (and many non-English locale users use an English version of the browser, so you can't realistically key off the browser locale).  Even the basic Facebook homepage uses multiple fonts to display the languages at the bottom of the page.

But I think the key here is to confirm the research before running off and neutering the browser without understanding the tradeoffs involved.  I'm surprised at the number of Mozilla engineers who are ready to propose this sort of solution when realistically all sorts of facets of browser behavior are capable of providing a way of identifying the local machine, especially any behavior associated with an OS or hardware API (e.g. canvas, WebGL, font rasterization).  

I think Peter from EFF would love to help (he's based in SF).  I think it's important to try and reproduce his results.  I'm still skeptical about the claims of fontlists providing such a high-degree of user identification.

Comment 18

5 years ago
(In reply to John Daggett (:jtd) from comment #17)
> I think this bug needs more research to back up (1) that the original
> research is actually valid and (2) that the solution proposed makes sense
> relative to the problem as defined.  Specifically, Flash provides API's that
> allow a much-higher entropy version of the font list (i.e. the order of
> fonts as provided by the OS).  Same is true for Java.  Unless you're
> including an explicit block on these API's, there isn't much sense neutering
> the browser like this.  Note that there *isn't* a non-chrome way of
> enumerating fonts other than by testing a huge list of fonts.

See previous comments about click-to-play Java+Flash.

> But I think the key here is to confirm the research before running off and
> neutering the browser without understanding the tradeoffs involved.  I'm
> surprised at the number of Mozilla engineers who are ready to propose this
> sort of solution when realistically all sorts of facets of browser behavior
> are capable of providing a way of identifying the local machine, especially
> any behavior associated with an OS or hardware API (e.g. canvas, WebGL, font
> rasterization).  

Well, I don't think that "OMG, fingerprinting iz hard" is a valid argument against dealing with it. It's likely that most engineers realize that while the list is long, it is not infinite. Where there is a will, there is a way.

Additionally, we're not talking about neutering the browser in all cases. We're only talking about making changes to provide those users who decide to use Private Browsing Mode with actual privacy...

> I think Peter from EFF would love to help (he's based in SF).  I think it's
> important to try and reproduce his results.  I'm still skeptical about the
> claims of fontlists providing such a high-degree of user identification.

Last time I pinged pde about this, his opinion was that the browser should ship with a common set of fonts for all languages that the browser would use in Private Browsing Mode, at the exclusion of OS-supplied fonts. For example: https://secure.wikimedia.org/wikipedia/en/wiki/Unicode_typeface#List_of_Unicode_fonts

To me, this seems heavier than finding a heuristic such as "use only generic fonts" or "use only the first N probed fonts", but it would be more effective against fingerprinting, especially in the face of dbaron's comments about non-Latin font issues.

Comment 19

5 years ago
(In reply to Mike Perry from comment #18)
> Well, I don't think that "OMG, fingerprinting iz hard" is a valid
> argument against dealing with it. It's likely that most engineers
> realize that while the list is long, it is not infinite. Where there
> is a will, there is a way.

To put it in simpler terms, I think the only effective way to prevent
fingerprinting in some form of anonymity mode is to disable script,
either completely or selectively.  I don't think you can run script in
the browser and assure anonymity by neutering all possible APIs that
could provide sidechannel way of identifying a given computer.  No
matter how much we will it, we can't turn iron into gold.  And we
don't get extra points for trying.

> Last time I pinged pde about this, his opinion was that the browser
> should ship with a common set of fonts for all languages that the
> browser would use in Private Browsing Mode, at the exclusion of
> OS-supplied fonts. For example:
> https://secure.wikimedia.org/wikipedia/en/wiki/
> Unicode_typeface#List_of_Unicode_fonts
> 
> To me, this seems more heavy than finding a heuristic such as "use
> only generic fonts" or "use only the first N probed fonts", but it
> would be more effective against fingerprinting, especially in the
> face of dbarons comments about non-Latin font issues.

Heh, this is something of a long-cherished goal.  Unfortunately, font
rendering varies widely across OS/platforms (hint: fingerprinting
opportunity) and there's a "long tail" of scripts which are supported
by only a handful of fonts on modern OS's.  Those happen to also be
the scripts for which font production is especially complicated,
either because of the large number of glyphs needed (Chinese/Japanese)
or the complexities of the script itself (Arabic, Indic, Lao, Burmese,
Khmer, Tibetan, etc).  That's a noble goal but not a practical one.

Comment 20

5 years ago
P.S. if this is simply for experimentation, why not simply set 'browser.display.use_document_fonts' to 0?

The existing code in nsRuleNode::ComputeFontData seems very similar to what you're attempting to do:

http://mxr.mozilla.org/mozilla-central/source/layout/style/nsRuleNode.cpp#3139

Comment 21

5 years ago
(In reply to John Daggett (:jtd) from comment #20)
> P.S. if this is simply for experimentation, why not simply set
> 'browser.display.use_document_fonts' to 0?

Some data points here: We have been using this mechanism for exactly the problem at hand since June 2011, and so far we have received a handful of user requests for help reverting it because it caused usability issues (some of them making website navigation nearly impossible). Looking at our small, Western-centric userbase, I doubt that setting this preference to 0 is a viable general solution, even if it is "just" for a proper private browsing mode.
(Reporter)

Comment 22

5 years ago
(In reply to John Daggett (:jtd) from comment #20)
> P.S. if this is simply for experimentation, why not simply set
> 'browser.display.use_document_fonts' to 0?
> 
> The existing code in nsRuleNode::ComputeFontData seems very similar to what
> you're attempting to do:
> 
> http://mxr.mozilla.org/mozilla-central/source/layout/style/nsRuleNode.
> cpp#3139

Initially that is what my patch did (see https://bugzilla.mozilla.org/attachment.cgi?id=602035&action=diff), but after using the web for a few hours I started to realize how different (and ugly) many sites looked. In particular, many sites had font stacks that did not include a generic font, or the generic font was not in the same style (serif vs. sans-serif) as the preferred fonts. That is why the latest version of the patch applies some substitutions during font resolution, so that internally we map 'Arial' to the generic sans-serif font, and similarly for the other MS core fonts.

However, I now think that having the map for 'Impact', 'Windings' and 'Arial Black' is a waste of CPU cycles, as these fonts are not considered 'safe' and website developers should not expect them to be present on all systems.
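The substitution described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the mapping table, the function name, and the choice of "serif" as the final default are all assumptions.

```python
# Hypothetical table: a few MS core font names mapped to CSS generic families.
CORE_FONT_TO_GENERIC = {
    "arial": "sans-serif",
    "verdana": "sans-serif",
    "trebuchet ms": "sans-serif",
    "times new roman": "serif",
    "georgia": "serif",
    "courier new": "monospace",
    "comic sans ms": "cursive",
}

# The five CSS 2 generic font families.
GENERICS = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def resolve_to_generic(font_stack, default="serif"):
    """Pick the generic family for the first recognisable entry in a font stack."""
    for raw in font_stack.split(","):
        name = raw.strip().strip("'\"").lower()
        if name in GENERICS:
            return name
        if name in CORE_FONT_TO_GENERIC:
            return CORE_FONT_TO_GENERIC[name]
    # Nothing recognised: behave like the plain generics-only approach.
    return default


print(resolve_to_generic("Arial, Helvetica"))           # sans-serif
print(resolve_to_generic("'Times New Roman', serif"))   # serif
print(resolve_to_generic("Futura"))                     # serif
```

With such a table, a stack like "Arial, Helvetica" that names no generic still ends up in the intended sans-serif style, which is the case the plain generics-only patch handled poorly.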

After faking the core fonts, sites looked much closer to their normal rendering than with the generic-only approach. And while the great majority looked different, the difference (to me) was not significant.
(Reporter)

Comment 23

5 years ago
(In reply to Camilo Viecco from comment #22)
Oops, ignore the last paragraph of comment 22.

Comment 24

5 years ago
Given that a pref (browser.display.use_document_fonts) already exists to effectively disable the ability to detect font families, I don't think we should go forward with the work on this patch unless it's part of a larger design for some form of "anonymity mode", one for which an overall strategy has been laid out that clearly defines the role of script under that mode (i.e. running/not running/running but with a specific set of APIs disabled or running in some form of generic rendering mode).

This bug just seems like an effort to make browser.display.use_document_fonts=0 rendering "suck less".  Experimentation with the existing pref is already possible, that should be enough for now until the larger question of whether script is running or not in "anonymity mode" is resolved.

Comment 25

5 years ago
(In reply to John Daggett (:jtd) from comment #19)
> (In reply to Mike Perry from comment #18)
> > Well, I don't think that "OMG, fingerprinting iz hard" is a valid
> > argument against dealing with it. It's likely that most engineers
> > realize that while the list is long, it is not infinite. Where there
> > is a will, there is a way.
> 
> To put it in simpler terms, I think the only effective way to prevent
> fingerprinting in some form of anonymity mode is to disable script,
> either completely or selectively.  I don't think you can run script in
> the browser and assure anonymity by neutering all possible APIs that
> could provide sidechannel way of identifying a given computer.  No
> matter how much we will it, we can't turn iron into gold.  And we
> don't get extra points for trying.

I disagree almost entirely. Javascript does not need to be disabled (in fact, a lot of the information is now available to CSS media queries), and you do get points for trying. Fingerprinting defenses are by definition best-effort. You go for the highest entropy sources first.

For information on how the Tor Project is approaching fingerprinting, see https://www.torproject.org/projects/torbrowser/design/#fingerprinting-linkability. Note that document is a draft, and is subject to change. If anyone spots obvious issues, I am open to changes. Our browser actively tracks Firefox.

> > Last time I pinged pde about this, his opinion was that the browser
> > should ship with a common set of fonts for all languages that the
> > browser would use in Private Browsing Mode, at the exclusion of
> > OS-supplied fonts. For example:
> > https://secure.wikimedia.org/wikipedia/en/wiki/
> > Unicode_typeface#List_of_Unicode_fonts
> > 
> > To me, this seems more heavy than finding a heuristic such as "use
> > only generic fonts" or "use only the first N probed fonts", but it
> > would be more effective against fingerprinting, especially in the
> > face of dbarons comments about non-Latin font issues.
> 
> Heh, this is something of a long-cherished goal.  Unfortunately, font
> rendering varies widely across OS/platforms (hint: fingerprinting
> opportunity) and there's a "long tail" of scripts which are supported
> by only a handful of fonts on modern OS's.  Those happen to also be
> the scripts for which font production is especially complicated,
> either because of the large number of glyphs needed (Chinese/Japanese)
> or the complexities of the script itself (Arabic, Indic, Lao, Burmese,
> Khmer, Tibetan, etc).  That's a noble goal but not a practical one.

Yeah, this was my suspicion. Currently, Tor's approach is to limit the number of font probes and fonts used per document: https://gitweb.torproject.org/torbrowser.git/blob/maint-2.2:/src/current-patches/firefox/0013-Limit-the-number-of-fonts-per-document.patch

I am currently up in the air as to if this approach is superior to an 'allow only generics' or not. It seems more complicated than we thought to define 'generic' though, so perhaps limiting the number of font queries is the simplest way to minimize entropy?

I've emailed pde about this bug, in case he has any additional comments.

Comment 26

5 years ago
(In reply to Mike Perry from comment #25)
> I am currently up in the air as to if this approach is superior to an 'allow
> only generics' or not. It seems more complicated than we thought to define
> 'generic' though, so perhaps limiting the number of font queries is the
> simplest way to minimize entropy?

What's a "font query"?  Measuring the offsetWidth of dynamic content is influenced by font metrics along with any box metric affected by line layout.  And which font is used may be determined by fallback, either to pref fonts per script (e.g. font.name.sans-serif.ja) or by general system fallback.  That's why the relatively simple facebook.com page uses a whole slew of fonts, they're needed to display the languages at the bottom of the page, none of which are specified directly. So the metrics for a string containing characters from a number of scripts will vary widely based on the fonts used in fallback.  Likewise, the coverage of core webfonts varies depending upon OS/software version (e.g if Office installs a font I can detect that using the difference between the version in use and the base OS version).  Changes to the hinting results in rasterization differences.  Differences in shaping behavior allow me to sniff things like the precise Uniscribe/CoreText version, which varies with the specific set of security updates installed.  Finding the exact script for all this may be tricky but running it won't be.  And to neuter this you'll probably need to neuter things that would cause problems for correct rendering of international text.
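The basic width-measurement probe referred to above can be sketched as follows. This is a simulation only: `fake_measure` is a hypothetical stand-in for rendering a test string in the browser and reading its offsetWidth, and the widths are made-up numbers.

```python
def detect_fonts(candidates, measure, generic="monospace"):
    """Return the candidates whose rendered width differs from the generic fallback."""
    baseline = measure(generic)
    return [f for f in candidates if measure(f"'{f}', {generic}") != baseline]


# Hypothetical widths for fonts "installed" on the simulated machine.
INSTALLED_WIDTHS = {"Arial": 412, "Georgia": 448}
FALLBACK_WIDTH = 520


def fake_measure(font_stack):
    """Stand-in for measuring a test string rendered with the given font stack."""
    for raw in font_stack.split(","):
        name = raw.strip().strip("'")
        if name in INSTALLED_WIDTHS:
            return INSTALLED_WIDTHS[name]
    return FALLBACK_WIDTH


print(detect_fonts(["Arial", "Futura", "Georgia"], fake_measure))
# ['Arial', 'Georgia']
```

A probe like this misses a font whose metrics happen to match the fallback, and it says nothing about the other channels listed above (fallback per script, hinting, shaping), which is the point of the comment: the width check is only the simplest of several measurable differences.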

> I've emailed pde about this bug, in case he has any additional comments.

I'm not sure this bug is the best place for a more general discussion of fingerprinting, starting a discussion on dev.platform ML (or some related list) might be better I think.

Comment 27

5 years ago
Hey all, I'm new here, and it is interesting to see that you are working on a solution to fix font based fingerprinting.

Actually, we're also working on an experiment to determine whether font based cross-browser fingerprinting can be realized or not. Our goal is not the fingerprinting method itself, but rather to find a solution to this problem.

Our site for collecting fingerprints:
http://fingerprint.pet-portal.eu/?lang=en

Based on previously collected data, we wrote a paper showing that JS-based font detection is a rather good entropy source, but the data was insufficient to determine whether it can be used for cross-browser fingerprinting:
http://pet-portal.eu/articles/view/37/?set_language=eng

This is why we started the second round of fingerprint collection.

Besides, we created a small Firefox extension that disables JS-based font detection by removing the offsetWidth and offsetHeight getter functions. This works well for many sites, but in some cases it causes bugs. (The extension is called FireGloves.)

> Yeah, this was my suspicion. Currently, Tor's approach is to limit the
> number of font probes and fonts used per document:
> https://gitweb.torproject.org/torbrowser.git/blob/maint-2.2:/src/current-
> patches/firefox/0013-Limit-the-number-of-fonts-per-document.patch
> 
> I am currently up in the air as to if this approach is superior to an 'allow
> only generics' or not. It seems more complicated than we thought to define
> 'generic' though, so perhaps limiting the number of font queries is the
> simplest way to minimize entropy?
Our new, second idea is very similar to this: how about making the limit site-dependent? Furthermore, the fonts required by a site could be cached, and the browser would load only those fonts that were required previously. Of course, users would also need to control the cache size per site, and to flush the cache if required.

This could mean better user experience, and weaker linkability, since the user could not be tested with different fonts per page load.
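A minimal sketch of the per-site cache idea, assuming a simple in-memory structure; the class name, method names and the default size of 8 are hypothetical and only illustrate the behaviour described above.

```python
class SiteFontCache:
    """Remember which local fonts each site has used, up to a per-site cap."""

    def __init__(self, max_fonts_per_site=8):
        self.max_fonts_per_site = max_fonts_per_site
        self._allowed = {}  # site -> set of font names already granted

    def may_use(self, site, font):
        """Return True if the site may use this local font."""
        fonts = self._allowed.setdefault(site, set())
        if font in fonts:
            return True  # previously required by this site, always allowed
        if len(fonts) < self.max_fonts_per_site:
            fonts.add(font)
            return True
        return False  # cache is full: new fonts are refused

    def flush(self, site):
        """Forget the cached fonts for one site (user-triggered)."""
        self._allowed.pop(site, None)


cache = SiteFontCache(max_fonts_per_site=2)
print(cache.may_use("example.org", "Arial"))    # True
print(cache.may_use("example.org", "Georgia"))  # True
print(cache.may_use("example.org", "Futura"))   # False
print(cache.may_use("example.org", "Arial"))    # True
```

Because the granted set survives page loads, a site cannot test a fresh batch of fonts on each visit, which is the linkability benefit claimed above.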
(Reporter)

Comment 28

5 years ago
It seems like this patch should be rewritten so that remote fonts can be used, as the current patch breaks pdf.js. (Pdf.js transforms the pdf fonts into downloadable fonts and injects them into the document).

We should also accept the fact that these limits would not prevent some fingerprinting due to default Unicode coverage on different OS platforms (in general OSX>Win>Linux>other).

Comment 29

5 years ago
(In reply to Camilo Viecco from comment #28)
> It seems like this patch should be rewritten so that remote fonts can be
> used, as the current patch breaks pdf.js. (Pdf.js transforms the pdf fonts
> into downloadable fonts and injects them into the document).

Given that 'browser.display.use_document_fonts' already provides controls like this I'm not sure I see a clear reason to rework the patch.

> We should also accept the fact that these limits would not prevent some
> fingerprinting due to default Unicode coverage on different OS platforms (in
> general OSX>Win>Linux>other).

If there's a strong correlation between "sources" of entropy, then whacking one source simply shifts the method but doesn't change the result. This is exactly the whack-a-mole approach to fingerprinting in general that I don't think makes sense.  

The key question is whether enabling script is appropriate in any form of privacy mode.  If it is, then what's the strategy for determining the set of API's that are accessible in this mode?  There's a raft of ways to measure text and grab the results of rendering text that effectively expose machine-specific parameters.  Unicode coverage is only one of a slew of side channels to these (see comment 26).

The baseline question that needs to be answered is whether once all these capabilities are somehow neutered or the normal behavior altered, is the result still more usable than simply disabling script?
Assignee: cviecco → nobody

Comment 30

5 years ago
(In reply to John Daggett (:jtd) from comment #29)
> (In reply to Camilo Viecco from comment #28)
> > It seems like this patch should be rewritten so that remote fonts can be
> > used, as the current patch breaks pdf.js. (Pdf.js transforms the pdf fonts
> > into downloadable fonts and injects them into the document).
> 
> Given that 'browser.display.use_document_fonts' already provides controls
> like this I'm not sure I see a clear reason to rework the patch.

The way to think about this is in terms of the number of bits of fingerprinting you pay for rendering quality. use_document_fonts costs the least, but it sucks. Shipping a fixed font pack is a hard technical endeavor, but costs about the same in terms of bits.

For 0-1.5 more bits (depending on if you can still infer OS even with use_document_fonts at false), we can map a fixed set of "generics" to OS supplied fonts, and cede OS fingerprinting in exchange for a practical solution that has slightly better rendering than use_document_fonts and improves the status quo.

For a tiny bit more (3 bits total max, but probably less in total joint entropy, maybe even also equivalent to OS detection in joint entropy with everything else), we can simply behave just like use_document_fonts after the first 8 font query attempts from a top-level domain. In response to your question in comment 26 about what is a "font query", in my patch it is a CSS font selection operation in nsRuleNode::ComputeFontData. Sid has pointed out some other issues with my patch, but they're not impossible to fix, as far as I know.

For where we're at now (lots of fingerprinting issues to address), I think my approach might be the most direct, simplest one that reduces the bits available from fonts from something like 14 down to at most 3, but probably less in the joint entropy.

> > We should also accept the fact that these limits would not prevent some
> > fingerprinting due to default Unicode coverage on different OS platforms (in
> > general OSX>Win>Linux>other).
> 
> If there's a strong correlation between "sources" of entropy, then whacking
> one source simply shifts the method but doesn't change the result. This is
> exactly the whack-a-mole approach to fingerprinting in general that I don't
> think makes sense.  
> 
> The key question is whether enabling script is appropriate in any form of
> privacy mode.  If it is, then what's the strategy for determining the set of
> API's that are accessible in this mode?  There's a raft of ways to measure
> text and grab the results of rendering text that effectively expose
> machine-specific parameters.  Unicode coverage is only one of a slew of side
> channels to these (see comment 26).

I think limiting the number of fonts a url can attempt to use is a direct solution that avoids the need to worry about what the script can do with them once they're loaded. You cede the 3 bits total max for the font selection itself. I don't think measuring the fonts after they exist will get you a whole lot more data than simply choosing the best 8 fonts to segment the userbase on OS.

> The baseline question that needs to be answered is whether once all these
> capabilities are somehow neutered or the normal behavior altered, is the
> result still more usable than simply disabling script?

The web is pretty unusable without JS. Also, Javascript is a VM whose implementation is fully under our control. We can put any abstraction layer we want into it for Private Browsing Mode. There is no technical reason it can't be done. It's all a matter of cost/benefit analysis, and going for the biggest payoffs first.

Comment 31

5 years ago
Sid: Does the nobody assignment mean that Camilo has given up on the generics idea? If so, I can devote some more effort to cleaning up my implementation. IIUC, your complaints were twofold:

1. The nsPresContext is a bad place to store the font count, since refresh will just clear it and allow more than 8 fonts to get loaded in total. How about using the content preferences database to hold the count, instead?

2. Pass a hint down to the nsRuleNode evaluation code if the font is specified as a remote font, and allow it to be exempt from the limit.
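
To make the two points above concrete, here is a minimal sketch of a per-site limiter. It is written in Python for illustration and is not the actual patch. The class name `FontQueryLimiter`, the `is_remote` flag, and the decision to count distinct family names rather than raw query attempts are all assumptions. The count lives in a store keyed by the top-level site instead of in per-page state, so a refresh does not reset it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

MAX_LOCAL_FONT_QUERIES = 8  # the limit discussed in this bug


@dataclass
class FontQueryLimiter:
    """Hypothetical per-site counter for local font queries."""

    limit: int = MAX_LOCAL_FONT_QUERIES
    # Keyed by top-level site, not by page, so reloading a page does not
    # clear the count (assumption: some persistent store backs this dict).
    _seen: Dict[str, Set[str]] = field(default_factory=dict)

    def allow(self, top_level_site: str, family: str, is_remote: bool = False) -> bool:
        # Remote (downloadable) fonts say nothing about what is installed
        # locally, so they are exempt from the limit.
        if is_remote:
            return True
        seen = self._seen.setdefault(top_level_site, set())
        key = family.casefold()
        if key in seen:
            # Asking again about an already counted family leaks nothing new.
            return True
        if len(seen) >= self.limit:
            # Over the limit: the caller falls back to default fonts.
            return False
        seen.add(key)
        return True
```

With this sketch, the ninth distinct local family requested by a site is refused, while remote fonts and repeated queries for already counted families still go through.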
(In reply to Mike Perry from comment #31)
> Sid: Does the nobody assignment mean that Camilo has given up on the
> generics idea? If so, I can devote some more effort to cleaning up my
> implementation.

I don't think he's given up, but he's focusing on other things that have more momentum since this has become a time sink with little forward progress.  If you have time to devote to cleaning up your patch, I'm sure nobody would object.
(In reply to Sid Stamm [:geekboy] from comment #32)
> (In reply to Mike Perry from comment #31)
> > Sid: Does the nobody assignment mean that Camilo has given up on the
> > generics idea? If so, I can devote some more effort to cleaning up my
> > implementation.
> 
> I don't think he's given up, but he's focusing on other things that have
> more momentum since this has become a time sink with little forward
> progress.  If you have time to devote to cleaning up your patch, I'm sure
> nobody would object.

You're sure nobody would object to his cleaning up the patch, or to our accepting the patch?  If your assertion is the latter, it would probably be good to share information about what the patch is.
(In reply to David Baron [:dbaron] from comment #33)
> You're sure nobody would object to his cleaning up the patch, or to our
> accepting the patch?  If your assertion is the latter, it would probably be
> good to share information about what the patch is.

Sorry, that was unclear.  I am sure nobody would object to his cleaning up the patch (and presenting it as an option).  I am not sure whether or not there would be objections to accepting it.

Comment 36

5 years ago
I'm working on this one.

(In reply to Mike Perry from comment #31)
> 1. The nsPresContext is a bad place to store the font count, since refresh
> will just clear it and allow more than 8 fonts to get loaded in total. How
> about using the content preferences database to hold the count, instead?

After reading a lot, I think the best place is the session history. I'm not 100% sure it's the best place, but it's the best I can think of.

> 2. Pass a hint down to the nsRuleNode evaluation code if the font is
> specified as a remote font, and allow it to be exempt from the limit.

I'll post a patch in a few days.
(Reporter)

Comment 37

4 years ago
Created attachment 775737 [details] [diff] [review]
rebased patch
So a few thoughts here on ways forward:

 (1) From a narrow (getting this patch in) perspective, I'd note that we already have a preference (the use-document-fonts preference) that's just as privacy-preserving as what this patch does, and very similar.  In fact, this patch **duplicates the code** (that's wrong, it should share code) for that preference and then adds one feature on top, for inferring generics (e.g., sans-serif) from common family names.  If you want this code in, I think the thing to do is to add a preference for that one new feature (which could be usable with or without the use-document-fonts pref).

 (2) From a broader perspective, I'd like to have an understanding of what the threat model is here.  I think trying to block detection of the user's OS when JS is enabled is futile; there are just too many ways to do that.  Is the goal to reduce entropy?  To reduce it to the level of not allowing more than detection of Windows vs. Mac vs. Linux?  Or versions thereof?  Or is there some acceptable level of entropy from fonts?  (There are all sorts of interesting attacks here, some of which could attack differences between versions of the same font (yes, fonts have versions), and some of which wouldn't require JS.)
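
As an illustration of the one new feature mentioned in point (1), inferring a generic family from common family names, here is a minimal Python sketch. The lookup table and the function name `infer_generic` are hypothetical and are not taken from the patch, whose actual table is not shown in this bug.

```python
# Hypothetical table mapping common family names to CSS generic families.
GENERIC_FOR_FAMILY = {
    "arial": "sans-serif",
    "helvetica": "sans-serif",
    "verdana": "sans-serif",
    "times new roman": "serif",
    "georgia": "serif",
    "courier new": "monospace",
}

GENERICS = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def infer_generic(font_family_list, default="serif"):
    """Return the generic family implied by the first recognizable name."""
    for name in font_family_list:
        # Normalize: drop surrounding whitespace and quotes, ignore case.
        key = name.strip().strip("'\"").casefold()
        if key in GENERICS:
            return key
        if key in GENERIC_FOR_FAMILY:
            return GENERIC_FOR_FAMILY[key]
    # Nothing recognizable: use the default generic.
    return default
```

The page's requested names are never matched against installed fonts here, only against the fixed table, so the result does not depend on what the user has installed.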

Comment 39

4 years ago
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #38)
> So a few thoughts here on ways forward:
>  (2) From a broader perspective, I'd like to have an understanding of what
> the threat model is here.  I think trying to block detection of the user's
> OS when JS is enabled is futile; there are just too many ways to do that. 
> Is the goal to reduce entropy?  To reduce it to the level of not allowing
> more than detection of Windows vs. Mac vs. Linux?  Or versions thereof?  Or
> is there some acceptable level of entropy from fonts?  (There are all sorts
> of interesting attacks here, some of which could attack differences between
> versions of the same font (yes, fonts have versions), and some of which
> wouldn't require JS.)

The goal is to reduce how distinguishable one browser instance is from another to the server. There are *a lot* of ways to fingerprint a user from the server; one of those is to basically check fonts. This doesn't necessarily have to do with OS detection; it has to do with normalizing the way browsers "behave" so that it will be harder to tell one from the other.

If I'm using a VPN or Tor or just a proxy, so that many users have the same IP, and I've disabled JS (no longer possible afaik), then if I have a weird font or combination of fonts, I can be distinguished from a group of other VPN/Tor/Proxy users.

Maybe this should turn into "Set use_document_fonts to 0 when browsing in private mode"?
I'm aware of the idea of fingerprinting.  I'm asking whether this bug is just a general request for "less entropy" or whether there's a specific threshold to which there's a need to reduce the amount of information conveyed by installed fonts (which, honestly, there's still probably a good bit even with use_document_fonts turned off).  In other words, I'd like a goal that's clearly stated enough so that it's possible to tell whether it's been met.
(And I think if the requirements are strict, the only answer is to ship fonts with the browser and use only those fonts and downloadable fonts, and never use fonts from the system. Note that this means shipping fonts with the browser for all languages supported.)

Comment 42

4 years ago
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #40)
> I'm aware of the idea of fingerprinting.  I'm asking whether this bug is
> just a general request for "less entropy" or whether there's a specific
> threshold to which there's a need to reduce the amount of information
> conveyed by installed fonts (which, honestly, there's still probably a good
> bit even with use_document_fonts turned off).  In other words, I'd like a
> goal that's clearly stated enough so that it's possible to tell whether it's
> been met.

I think the whole idea would be to reduce to the minimum. I don't think there's a specific threshold from which it would be good enough.

(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #41)
> (And I think if the requirements are strict, the only answer is to ship
> fonts with the browser and use only those fonts and downloadable fonts, and
> never use fonts from the system. Note that this means shipping fonts with
> the browser for all languages supported.)

Indeed, so I guess it's a matter of reducing the fingerprintability of the browser through fonts by making it an equivalent check as that of the OS (i.e. checking fonts and the order of the list will be equivalent to checking the host OS).

Does use_document_fonts=0 establish any order in the list?
I don't know of any ways that browsers expose an order of the font list.  It's plugins (Flash and/or Java, if memory serves) that do that.

Comment 44

4 years ago
Then part of this should be to make that ordering deterministic, otherwise it'll be determined by something like inodes of the host machine.

Comment 45

4 years ago
Or maybe another bug, but it's all towards the same goal.

Does this clarify anything?
Issues with plugins exposing a system-specific order of font enumeration should be in a separate bug.

Comment 47

4 years ago
Came across this bug entry while researching ways to lower fingerprinting entropy.
And I agree that font enumeration is not really an issue that can or needs to be handled by Firefox,
as opposed to bug 757726, which is about JavaScript plugin enumeration and seems to be solved for a future release.

But for those that want to limit the Flash plugin font enumeration have a shot at adding the line
"DisableDeviceFontEnumeration = 1" (without quotes)
to your "mms.cfg" located at /Library/Application Support/Macromedia on OS X - other OS's locations for this file can be found here http://helpx.adobe.com/flash-player/kb/administration-configure-auto-update-notification.html
(In reply to Camilo Viecco (:cviecco) from comment #28)
> It seems like this patch should be rewritten so that remote fonts can be
> used, as the current patch breaks pdf.js. (Pdf.js transforms the pdf fonts
> into downloadable fonts and injects them into the document).

bug 789788 requires a similar solution; it's probably worth figuring out a design that will address both before digging too far into either.
See Also: → bug 1041818
I'm going to toss in my proposal in here as well.

My understanding is that there are a couple of ways that we're "using" local fonts.

First off we let pages do "font-family: 'SomeLocalFont'". Second, when we come upon a character which none of the page-selected fonts provide a glyph for, we look through the local fonts to try to find a font which covers that character. (I'm much less sure about the second one, so please correct me if I'm wrong).

The first one is the one I'm most concerned about. The fact that webpages can use local fonts isn't just a fingerprinting problem. It's also an interoperability problem. It means that a website that a web developer creates might look great on the developer's computer. But once rendered by an end user, it'll use completely different fonts and look significantly different.

This may not at all be obvious to the developer.

This is especially problematic given that different OSs have different fonts by default. So even a developer that takes care to not use fonts that he/she has installed might still create a page that only looks good on Windows.

So my proposal is to hardcode a short list of local font names that are allowed to be used as font-family from http(s) pages. Hopefully we can make this list very short and also hardcode good fallback fonts if the given local font is not available.

This should both improve interoperability as well as reduce fingerprinting.

I.e. I'm not proposing to change the second usage of local fonts at all.

Long term it'd be good to reduce the need to scan local fonts for the second use case as well. Again both for fingerprinting reasons, and for cross-user/cross-OS reasons. This could maybe be done by shipping better fonts. But I'm not proposing that we do that now.
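
A rough sketch of the hardcoded-list proposal, in Python for illustration only. The list contents, the fallback choices, and the function name `resolve_family` are hypothetical. A requested local font is honoured only if it is on the fixed list; a listed font that is not installed resolves to its hardcoded fallback; anything else is treated as if it were not installed.

```python
# Hypothetical short list of local fonts that pages may name,
# each with a hardcoded fallback for when it is not installed.
ALLOWED_LOCAL_FONTS = {
    "arial": "sans-serif",
    "times new roman": "serif",
    "courier new": "monospace",
}


def resolve_family(requested, installed):
    """Return the family to use for `requested`, or None to skip it."""
    key = requested.casefold()
    if key not in ALLOWED_LOCAL_FONTS:
        # Not on the list: behave as if the font were not installed,
        # whether or not it really is.
        return None
    if key in {name.casefold() for name in installed}:
        return requested
    # On the list but missing locally: use the hardcoded fallback.
    return ALLOWED_LOCAL_FONTS[key]
```

Because an unlisted font yields the same answer whether or not it is installed, a page cannot use this path to learn about fonts outside the list.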
(In reply to David Baron [:dbaron] (UTC-7, busy Oct 7-9) (needinfo? for questions) from comment #40)
> I'm aware of the idea of fingerprinting.  I'm asking whether this bug is
> just a general request for "less entropy" or whether there's a specific
> threshold to which there's a need to reduce the amount of information
> conveyed by installed fonts (which, honestly, there's still probably a good
> bit even with use_document_fonts turned off).  In other words, I'd like a
> goal that's clearly stated enough so that it's possible to tell whether it's
> been met.

I'd say that the specific threshold that we should aim for is "few enough total number of bits that using fingerprinting as a user identifier is not a viable large scale business model".

I.e. I don't think that we can aim for "no one is ever uniquely identifiable, ever".

So say that means the total number of bits should be significantly lower than 20. Something like 15 probably achieves this. But this is the total number of bits. The less we spend on fonts, the more other web platform features we can build, even if they expose some fingerprintability.

Sadly the Panopticlick website isn't very good at surfacing correlation between different features. For example the HTTP_ACCEPT header should essentially be entirely correlated with the UserAgent. Yet it shows up as 4 bits of information for me.

It seems to me that we should be able to get down to very few bits of information from fonts once you take UserAgent into account. I.e. the set of fonts should largely be a function of OS+OSversion+browser+browserversion.
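
The correlation point can be made concrete with a small Python sketch that compares the entropy of one feature on its own with its entropy once another feature is already known. The sample data is invented purely for illustration and is not a measurement; the function names are arbitrary.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


# Invented toy sample: each user agent always sends the same Accept header.
user_agents = ["A", "A", "B", "B", "C", "C", "D", "D"]
accept_headers = ["x", "x", "y", "y", "x", "x", "z", "z"]

standalone_bits = entropy(accept_headers)
extra_bits = conditional_entropy(accept_headers, user_agents)
```

In this toy sample the Accept header looks like 1.5 bits on its own but contributes 0 bits once the user agent is known. That is the same kind of gap between per-feature and joint figures described above.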

But I don't think we can get there in a single bug. Nor do I think we can get there right away. But if we don't land any patches until we have a perfectly figured out plan how to get all the way to the end, we likely won't do anything until it's too late.

Google is already adding features which add bits of fingerprinting because "fingerprinting is already possible".

Fingerprinting is a big deal. It means that any website you go to will know your age, name, home town, political affiliation, etc. Without you having to log in, even in private browsing mode.
Whiteboard: [fingerprinting][tor]
Here's a link that tracks the latest version of the patch Tor Browser uses to whitelist fonts:
https://torpat.ch/13313

(At present, fonts are whitelisted on linux using a fontconfig file.)
Depends on: 1121643
Priority: -- → P3

Comment 52

a year ago
Suggestion:
> Add option to disallow using system fonts, but retain ability to use downloadable fonts.

Currently the *opposite* is possible (allow system fonts, disallow web fonts):
Set `browser.display.use_document_fonts` to `1` (allow sites to specify fonts)
Set `gfx.downloadable_fonts.enabled` to `false` (disallow using downloadable fonts)

Comment 53

a year ago
Hello everyone,

I did a few tests to find the easiest way to track a browser and apparently the font enumeration is the winner.

Excluding cases with people who have strange and/or rare fonts installed on their systems (graphic designers, etc), the general population can also be tracked because a lot of Windows applications install their own fonts (some custom icons, some others open license fonts).

The reason fonts became a privacy "issue" is that all other methods have been covered via various privacy plugins. For example, WebRTC fingerprinting can be avoided by using a plugin (Disable WebRTC). Canvas fingerprinting can also be disabled by a simple plugin (CanvasBlocker). Referrers can be removed with another plugin (Referrer Control) and finally cookies and session storage can be auto-deleted after an expiration time with another plugin (Self-Destructing Cookies).

That leaves only one method of browser fingerprinting and possibly a privacy risk: fonts.

I mention privacy because the browser is leaking information about font names, for example, commercial/custom fonts used internally by businesses, thus leaking the identity of the owner of the browser/computer.

Updated

a year ago
Blocks: 1260929

Updated

11 months ago
Whiteboard: [fingerprinting][tor] → [fingerprinting][tor][tor-standalone]

Comment 54

9 months ago
Now that the dependency bug 1121643 is fixed, what is left to do to close this one as well?
(In reply to Selek Respa from comment #54)
> Now that the dependency bug 1121643 is fixed, what is left to do to close
> this one as well?

Nothing, perhaps? AFAICS, the font whitelist added in 1121643 offers a way to address the issues here: just set a whitelist of a few standard fonts, and those are the only fonts that will be seen as "installed".

(In reply to Mark from comment #53)
> I did a few tests to find the easiest way to track a browser and apparently
> the font enumeration is the winner.

Out of curiosity, how are you enumerating fonts? AFAIK the browser doesn't directly expose them, though there are certainly techniques to test (indirectly) for the availability of a given font.

Updated

8 months ago
Blocks: 1329996

Comment 56

7 months ago
I honestly can't understand what sort of person comes up with all these ideas about sending more and more identifiable information of the user. Some people are either clueless about privacy or they are definitely working against it on purpose.


(In reply to Jonathan Kew (:jfkthame) from comment #55)
> Out of curiosity, how are you enumerating fonts? AFAIK the browser doesn't
> directly expose them, though there are certainly techniques to test
> (indirectly) for the availability of a given font.

Apparently javascript (and not Flash) exposes font names, you can test this yourself, here:

https://www.privacytools.io/#fingerprint
(In reply to Mark from comment #56)
> (In reply to Jonathan Kew (:jfkthame) from comment #55)
> > Out of curiosity, how are you enumerating fonts? AFAIK the browser doesn't
> > directly expose them, though there are certainly techniques to test
> > (indirectly) for the availability of a given font.
> 
> Apparently javascript (and not Flash) exposes font names, you can test this
> yourself, here:
> 
> https://www.privacytools.io/#fingerprint

No, all that is doing is (as I suggested earlier) testing for the presence of specific font names, it's not enumerating all the installed fonts (because that isn't exposed to JS). Hence, for example, the panopticlick site comes up with a list of 29 common system fonts on my browser, whereas my actual list of installed fonts is close to 500.
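
The difference between probing and enumerating can be sketched as follows, in Python as a simulation rather than real browser code. The `is_available` callback stands in for whatever indirect test a page uses (an assumption here; such tests typically compare how text renders with and without the candidate font), and all font names are invented.

```python
def probe_fonts(candidates, is_available):
    """Return the candidates that the availability test reports as present."""
    return [name for name in candidates if is_available(name)]


# Invented example data: what is installed vs. what the page thinks to ask about.
installed = {"Arial", "Courier New", "Internal Corp Sans", "Obscure Icons"}
candidates = ["Arial", "Courier New", "Helvetica", "Comic Sans MS"]

detected = probe_fonts(candidates, lambda name: name in installed)
```

The prober only ever learns about names on its own candidate list. Installed fonts it never asks about stay invisible, although a prober that guesses a rare name could still test for it.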