Open Bug 1582687 Opened 5 years ago Updated 10 months ago

Block user-installed fonts by default

Categories

(Core :: Graphics: Text, enhancement)

People

(Reporter: hsivonen, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: parity-safari, Whiteboard: [fingerprinting])

Attachments

(4 files)

(This bug is not a duplicate of bug 1336208, which asks for 1) bundling fonts and 2) blocking user-installed fonts when an off-by-default pref is set. This bug explicitly doesn't ask for bundling fonts with Firefox and asks for blocking to be on-by-default.)

User-installed fonts are a fingerprinting vector. Yet, they have very limited utility for browsing the Web, because Web developers have little reason (for legitimate, non-fingerprinting reasons) to specify fonts that are neither known to be bundled with popular operating systems nor provided by the site itself. While some users might like to change the default font to a self-installed font, most users probably never touch the font settings.

I suggest that by default (with an override in the font prefs) we block local fonts that aren't known to be bundled with the operating system Firefox is running on. For Linux, it's impossible to cover all distros, so we should probably cover a reasonable selection of the major ones.

This would still leave fingerprinting surface in cases where operating system-bundled fonts are installed on-demand depending on locale settings instead of being installed by default for all users of a given operating system, but it would still be a lot better than letting scripts fingerprint users on fonts that are known to exist but are known not to be operating system-bundled. (Also, many Linux users would probably be unhappy if mscorefonts were blocked, so that's one bit of fingerprinting surface. We should probably still block most of the optional fonts available in distro repos and should encourage distros to install broad Unicode coverage by default to minimize configuration differences among installations.)
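To make the proposal concrete, here is a minimal sketch of the allow-list idea in Python. It is only an illustration, not Gecko code: the function name, the `allow_all_local_fonts` flag, and the tiny font set are all hypothetical, and a real allow-list would have to be compiled per operating system and version.

```python
from typing import Iterable

# Hypothetical allow-list: a tiny illustrative subset, not a real OS font inventory.
OS_BUNDLED_FONTS = {"arial", "times new roman", "segoe ui"}


def visible_fonts(installed: Iterable[str],
                  bundled: set[str] = OS_BUNDLED_FONTS,
                  allow_all_local_fonts: bool = False) -> list[str]:
    """Return the installed fonts that web content would be allowed to use."""
    if allow_all_local_fonts:
        # User override in the font prefs: expose every local font.
        return sorted(installed)
    # Default: expose only fonts known to ship with the operating system.
    return sorted(f for f in installed if f.casefold() in bundled)


# Made-up machine: two OS-bundled fonts plus two self-installed ones.
installed = ["Arial", "Segoe UI", "Comic Neue", "Fira Code"]
print(visible_fonts(installed))
print(visible_fonts(installed, allow_all_local_fonts=True))
```

The first call hides the two self-installed fonts; the second shows what the override would expose.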

(In reply to Henri Sivonen (:hsivonen) from comment #0)

I suggest that by default (with an override in the font prefs) we block local fonts that aren't known to be bundled with the operating system Firefox is running on. For Linux, it's impossible to cover all distros, so we should probably cover a reasonable selection of the major ones.

This sounds like a great idea.

This would still leave fingerprinting surface in cases where operating system-bundled fonts are installed on-demand depending on locale settings instead of being installed by default for all users of a given operating system, but it would still be a lot better than letting scripts fingerprint users on fonts that are known to exist but are known not to be operating system-bundled.

Agreed.

(Also, many Linux users would probably be unhappy if mscorefonts were blocked, so that's one bit of fingerprinting surface. We should probably still block most of the optional fonts available in distro repos and should encourage distros to install broad Unicode coverage by default to minimize configuration differences among installations.)

AFAIK mscorefonts (in most distros I've tried) is a user-installable package, is that right? So we'd be exposing information about whether the user has installed that package or not, which is probably exposing less information than we'd be exposing about the individual fonts in the bundle for users who have the same number of custom fonts installed in other environments.

Supporting mscorefonts by default and recommending distros to move it to the list of installed-by-default extensions sounds like a good middle-ground solution.

Whiteboard: [fingerprinting]

(In reply to :ehsan akhgari from comment #1)

AFAIK mscorefonts (in most distros I've tried) is a user-installable package, is that right?

Yes.

Supporting mscorefonts by default and recommending distros to move it to the list of installed-by-default extensions sounds like a good middle-ground solution.

Distros won't move mscorefonts to installed-by-default, due to unusual licensing involved. The situation is not good for privacy, but I still expect many users to reject this feature if we block mscorefonts. :-(

(In reply to Henri Sivonen (:hsivonen) from comment #2)

Supporting mscorefonts by default and recommending distros to move it to the list of installed-by-default extensions sounds like a good middle-ground solution.

Distros won't move mscorefonts to installed-by-default, due to unusual licensing involved. The situation is not good for privacy, but I still expect many users to reject this feature if we block mscorefonts. :-(

Fair enough.

see Bug 1388743 (and also my email)

(In reply to Simon Mainey from comment #4)

see Bug 1388743 (and also my email)

I don't quite understand what the telemetry question is and how it relates to the actionability of this bug. It's pretty clear that a) some users have self-installed fonts and b) it's easy enough for a fingerprinter to get a list of plausible font names to try for that to be a fingerprinting vector.
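As a rough illustration of point b), the sketch below models a fingerprinter that probes a fixed list of font names and records one bit per name. Everything in it is made up for the example (the candidate list, the two users, the function name); it only shows why self-installed fonts split users apart and why an OS-only allow-list removes that signal.

```python
# Hypothetical list of font names a fingerprinting script might probe for.
CANDIDATES = ["Arial", "Fira Code", "Comic Neue", "Helvetica Neue"]


def probe_signature(visible_fonts: set[str]) -> tuple[bool, ...]:
    """One bit per candidate: is the font visible to web content?"""
    return tuple(name in visible_fonts for name in CANDIDATES)


# Two made-up users on the same OS; only their self-installed fonts differ.
os_fonts = {"Arial"}
alice = os_fonts | {"Fira Code"}
bob = os_fonts | {"Comic Neue"}

# With all local fonts exposed, the signatures tell the users apart.
print(probe_signature(alice) != probe_signature(bob))

# With only OS-bundled fonts exposed, both users look identical.
print(probe_signature(alice & os_fonts) == probe_signature(bob & os_fonts))
```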


TIL: There's a thread on this topic in the CSSWG issue tracker:
https://github.com/w3c/csswg-drafts/issues/4055

It raises the issue of support for minority (on a global scale) scripts and bandwidth issues.

A worthwhile quick exercise is looking at the list of writing systems by adoption and seeing which ones various operating systems ship fonts for and which ones don't get fonts. https://en.wikipedia.org/wiki/List_of_writing_systems#List_of_writing_scripts_by_adoption

On vanilla Fedora, the writing systems that made it onto the adoption list in Wikipedia (which is obviously not exhaustive of the long tail) but don't get fonts in the default install are:

  • Javanese
  • Sundanese
  • Batak
  • Balinese
  • Modern Yi
  • Mongolian
  • New Tai Lue

The list is the same for me on Ubuntu where I have some self-installed fonts, which are not supposed to cover scripts that aren't covered by the default install.

The scripts without fonts on Windows 10 1809 are:

  • Sundanese
  • Batak
  • Balinese

On Mac and on Nokia 9, an Android One phone running Android 9, I see fonts for all the writing systems listed. If this can be achieved on a phone, it should be within the realm of possibility for Linux distros and Windows to achieve it, too.

Of the scripts that don't have fonts on some popular operating system despite making it to the list on Wikipedia, the one reported to have the most characters is Modern Yi at 1165. The others are below 100 (plus ligatures/shaping). The list is incomplete, but my understanding is that living scripts in the long tail that didn't make it to the list also have repertoires that are sufficiently small for site-supplied fonts to be small as WOFF2.

It is, of course, possible for even tiny WOFF2 fonts to be "too large" in places with really bad networks, but the usual concerns about font sizes for Chinese and Japanese are not what should be considered here, since fonts for Chinese and Japanese are part of every operating system's default bundle.

One bandwidth concern that was raised is that if you install local copies of popular fonts from Google Fonts, you save on bandwidth. While this is true, this isn't even a usual power user action but a very specialist optimization. We shouldn't set the defaults for this use case. People who have the know-how to pursue this will be able to figure out how to uncheck a box in the prefs if this is the bandwidth/privacy tradeoff they wish to make.

Chinese and Japanese are not what should be considered here, since fonts for Chinese and Japanese are part of every operating system's default bundle.

(Subsequent discussion on the CSS WG issue shows that it's not that simple in the case of Windows.)

(In reply to Henri Sivonen (:hsivonen) from comment #5)

(In reply to Simon Mainey from comment #4)

see Bug 1388743 (and also my email)

I don't quite understand what the telemetry question is and how it relates to the actionability of this bug. It's pretty clear that a) some users have self-installed fonts and b) it's easy enough for a fingerprinter to get a list of plausible font names to try for that to be a fingerprinting vector.

What we've been unable to find is a comprehensive database of what OS/versions come with what fonts.

What we have found is that the same font between versions can be detectably different.

So the telemetry aimed to help us better understand what information we would still leak by using only system fonts, and what would be usably available by users if we did.

(In reply to Tom Ritter [:tjr] from comment #7)

What we've been unable to find is a comprehensive database of what OS/versions come with what fonts.

It seems to me that designing a telemetry experiment for that in a privacy-preserving way (that users also understand to be privacy-preserving) would be much harder than taking the time to install a bunch of OSs from scratch and taking notes.

AFAICT, Windows 10 has a "download all" option, and on macOS FontBook shows the downloadables as gray, so it's easy to trigger download on macOS as well. (AFAICT, there's just one of these: Myriad Arabic)

This doesn't show how many anonymity sets the user population is actually divided into due to conditionally-present system fonts, but getting the full list of system-bundled fonts for Windows 10 and macOS doesn't look too hard.

For Ubuntu and Fedora, it might be worth asking their developers and looking at their installer sources to see if there are fonts that are installed on the condition of a localization being enabled.

Estimating actual user configs by experimenting with installation probably won't work for Debian: A while back I installed Debian in Japanese in the hope of seeing what Japanese text input method it installs by default, and I discovered that the answer is "none" even when you've chosen Japanese as the language of the installer and the to-be-installed system! Trying to discover the actual Debian configurations by telemetry probably would not be well received.

What we have found is that the same font between versions can be detectably different.

Yeah, it seems like a good assumption that OS version splits the anonymity set. Since unshipping fonts is really disruptive for existing word processing documents, figuring out the allow-list for the latest version of a given OS and assuming that recent-ish OS versions had the same fonts or a subset probably goes pretty far.

(In reply to Henri Sivonen (:hsivonen) from comment #8)

This doesn't show how many anonymity sets the user population is actually divided into due to conditionally-present system fonts, but getting the full list of system-bundled fonts for Windows 10 and macOS doesn't look too hard.

Microsoft has pretty comprehensive documentation about what the conditionally-present buckets are:
https://docs.microsoft.com/en-us/typography/fonts/windows_10_font_list

There are 24 of these, and one, "Pan-European Supplemental Fonts", appears to be unlike the others in that it's documented not to autoinstall with any language. Indeed, a fresh en-US install of Windows 10 does not have the Pan-European Supplemental Fonts set despite it being relevant to English. (It might not be entirely unreasonable for a system font allow-list for Windows 10 to include everything except the Pan-European Supplemental Fonts set, for which the balance between "essential for the Web experience" and "arbitrary fingerprinting bit" arguably goes the other way relative to the other conditionally-present packs.)

Apple has comprehensive-looking docs as well:
https://support.apple.com/en-us/HT208968

Curiously, there are three categories:

  1. Included by default
  2. Downloadable
  3. Fonts available by name but not by requesting the system font list!

The existence of category 3 is news to me. It appears to contain old versions of Apple-shipped fonts, Noto, and a mix of others whose history I don't recognize. Now I'm curious about how category 3 behaves relative to Firefox.

Maybe a more interesting measurement question than "What fonts do users have?", which is problematic for privacy, is "Which ones of these do Web devs specify and expect to work without @font-face?", which could be measured by the Web crawler that we already have for measuring trackers.
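A crude sketch of what such a crawler-side measurement could look like, in Python. This is an assumption-laden illustration, not a description of the existing crawler: the regexes, the function names, and the sample stylesheet are all hypothetical, and a real measurement would use a proper CSS parser rather than regular expressions (this one ignores the `font` shorthand, comments, and `!important`, for example).

```python
import re

# Generic family keywords never refer to a specific installed font.
GENERIC = {"serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui"}

FONT_FACE_RE = re.compile(r"@font-face\s*\{[^}]*\}", re.IGNORECASE)
FAMILY_RE = re.compile(r"font-family\s*:\s*([^;}]+)", re.IGNORECASE)


def _names(value: str) -> list[str]:
    """Split a font-family value into individual, unquoted family names."""
    return [part.strip().strip("\"'") for part in value.split(",") if part.strip()]


def local_font_requests(css: str) -> set[str]:
    """Family names a stylesheet uses without supplying them via @font-face."""
    supplied: set[str] = set()
    for block in FONT_FACE_RE.findall(css):
        for value in FAMILY_RE.findall(block):
            supplied.update(n.casefold() for n in _names(value))
    rest = FONT_FACE_RE.sub("", css)  # look only outside the @font-face blocks
    requested: set[str] = set()
    for value in FAMILY_RE.findall(rest):
        requested.update(n.casefold() for n in _names(value))
    return requested - supplied - GENERIC


# Made-up stylesheet: one site-supplied font, two fonts expected to be local.
css = """
@font-face { font-family: "Site Sans"; src: url(site-sans.woff2); }
body { font-family: "Site Sans", "Gill Sans", Verdana, sans-serif; }
"""
print(sorted(local_font_requests(css)))
```

Aggregating the returned names over many crawled pages would give a picture of which local fonts sites rely on, without collecting anything from users.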

Notably, macOS downloadable fonts can be installed by the user one-by-one whereas the Windows 10 conditional packs other than "Pan-European Supplemental Fonts" install with language support, so the macOS downloadables are probably closer to any other user-installed font in terms of fingerprint than the Windows 10 packs are.

The existence of category 3 is news to me. It appears to contain old versions of Apple-shipped fonts, Noto, and a mix of others whose history I don't recognize. Now I'm curious about how category 3 behaves relative to Firefox.

I'm looking at High Sierra rather than Mojave, but the story is similar. These fonts are found in several directories under "/Library/Application Support/Apple/Fonts/". Firefox explicitly makes the fonts from the "Language Support" directory (primarily lots of Noto fonts) available, as these provide coverage for a bunch of Unicode ranges that are otherwise unsupported by any default fonts, but not the others (fonts associated with iLife and iWork, and deprecated fonts from older OS versions).

Notably, macOS downloadable fonts can be installed by the user one-by-one

I believe that on different localized versions of macOS, some of these fonts may be installed by default, so that's another way they end up being present on users' systems.

(In reply to Jonathan Kew (:jfkthame) from comment #11)

These fonts are found in several directories under "/Library/Application Support/Apple/Fonts/". Firefox explicitly makes the fonts from the "Language Support" directory (primarily lots of Noto fonts) available, as these provide coverage for a bunch of Unicode ranges that are otherwise unsupported by any default fonts, but not the others (fonts associated with iLife and iWork, and deprecated fonts from older OS versions).

Nice! Thanks.

(In reply to Jonathan Kew (:jfkthame) from comment #12)

I believe that on different localized versions of macOS, some of these fonts may be installed by default, so that's another way they end up being present on users' systems.

Sigh. Everything is more difficult than it seems. :-(

I guess this needs research into whether certain system language settings trigger installation and whether Web sites actually commonly specify these for non-fingerprinting purposes and in such a way that these fonts actually get used (as opposed to being listed "just in case" at the end of the CSS font list). It would also be useful to have some characterization of the level of value added (i.e. are these more like the Windows 10 Pan-European Supplemental Fonts, which are basically yet more fonts for languages already well served by the other fonts, or are these more like the Windows 10 Japanese pack, without which there's no mincho-style Japanese font at all).

Attachment #9098235 - Attachment description: fc-list --verbose from Fedora 31 beta → fc-list --verbose from Fedora 31 beta (English)

Some super-quick observations:

  • The hidden Noto directory on macOS is surprisingly small in terms of disk space. It should be quite possible for Windows 10, Ubuntu, and Fedora to catch up with macOS, Chrome OS, and Android on this point if they cared to.
  • Despite much Unicode activity being historic scripts and emoji these days, there do appear to be living scripts still being proposed, so we should be careful not to break the Web adoption path for those.
  • At least in case of English, "Normal" vs. "Minimal" Ubuntu install mode does not affect fonts.
  • Ubuntu does the Windows 10 thing where en-US ships with broad script coverage but limited font styles, and enabling some languages autoinstalls more font styles.
  • I didn't test if an Ubuntu install in a non-European script drops any fonts present in en-US.
  • I don't see UI in Fedora for adding language packs. E.g. IMEs ship out-of-the-box unlike on en-US Ubuntu and there's a broader set of CJK font weights out-of-the-box than on en-US Ubuntu.

Even if having at most one language pack with conditional fonts installed would keep the number of anonymity sets still useful with Windows 10 and Ubuntu, without OS cooperation, it's hard to provide fingerprinting protection for users who install more than one pack without installing all of them. (Privacy-wise, it would be good if installing more than one installed them all.)
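For a sense of scale, here is the arithmetic as a short Python sketch, taking the 24 conditionally-present Windows 10 buckets from the Microsoft list mentioned earlier as the number of optional packs. It counts configurations that are possible in principle, which is an upper bound; it says nothing about how users are actually distributed across them.

```python
from math import comb

OPTIONAL_PACKS = 24  # conditionally-present Windows 10 font buckets, per the Microsoft list

# At most one pack installed: the base install plus one configuration per pack.
at_most_one = comb(OPTIONAL_PACKS, 0) + comb(OPTIONAL_PACKS, 1)

# Any combination of packs installed: every subset is a distinct configuration.
any_subset = 2 ** OPTIONAL_PACKS

print(at_most_one)  # 25
print(any_subset)   # 16777216
```

With at most one pack, users fall into a couple of dozen buckets per OS version. Once arbitrary combinations are in play, the number of distinguishable configurations grows exponentially, which is why "more than one but not all" is the troublesome case.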

Maybe this needs a three-state setting:

  1. Allow only fonts that are part of the operating system's minimal install (allows language pack users to appear as non-language pack users to the Web).
  2. (Default) Allow fonts bundled with the operating system (i.e. optional language packs allowed), mscorefonts on Linux, and fonts that add supplementary-plane scripts (to avoid adverse effects on the adoption path of scripts still being added to Unicode).
  3. Allow all local fonts.

(Item 2 might need refinement for Mac to exclude bundled-but-optional fonts, since one-by-one install of those is pretty bad for fingerprinting.)
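The three-state setting could be sketched roughly as follows, in Python. The enum, the `FontInfo` fields, and the sample fonts are hypothetical names invented for this illustration; how a font would actually be classified into these categories is exactly the open research question in this bug.

```python
from dataclasses import dataclass
from enum import Enum


class FontVisibility(Enum):
    MINIMAL_INSTALL = 1  # item 1: fonts in the OS's minimal install only
    OS_BUNDLED = 2       # item 2 (default): plus language packs and listed exceptions
    ALL_LOCAL = 3        # item 3: every local font


@dataclass(frozen=True)
class FontInfo:
    name: str
    in_minimal_install: bool = False
    in_os_language_pack: bool = False
    is_mscorefont: bool = False  # only meaningful on Linux
    covers_supplementary_plane_script: bool = False


def is_allowed(font: FontInfo,
               level: FontVisibility = FontVisibility.OS_BUNDLED) -> bool:
    """Decide whether web content may use this local font at the given level."""
    if level is FontVisibility.ALL_LOCAL:
        return True
    if font.in_minimal_install:
        return True
    if level is FontVisibility.MINIMAL_INSTALL:
        return False
    # Default level: optional OS packs, mscorefonts, and fonts for
    # supplementary-plane scripts are allowed in addition to the base set.
    return (font.in_os_language_pack
            or font.is_mscorefont
            or font.covers_supplementary_plane_script)


# Made-up fonts, one per category of interest.
base = FontInfo("Base Sans", in_minimal_install=True)
pack = FontInfo("Pack Mincho", in_os_language_pack=True)
user = FontInfo("User Display")

print(is_allowed(pack, FontVisibility.MINIMAL_INSTALL))  # hidden at the strictest level
print(is_allowed(pack))                                  # visible at the default level
print(is_allowed(user))                                  # self-installed BMP font stays hidden
```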

That kind of thing would still protect the common case of users who have at most one language pack and who install (knowingly or, as part of an app install, unknowingly) additional BMP fonts for non-Web uses. It would still be rather unsatisfactory how much accidental fingerprinting opportunity that kind of default formulation would leave open (e.g. installing more than one language pack without installing them all).

For Windows 7 and Linux distros that leave configuring stuff more up to the user, the default would probably need to be item 3.

(In reply to Henri Sivonen (:hsivonen) from comment #18)

(Privacy-wise, it would be good if installing more than one installed them all.)

FWIW, the base set for Windows 10 is 343 MB and the full set appears to be "Size: 3.73 GB, Size on disk: 3.39 GB". (File system compression at work maybe?)

Blocking user-installed fonts may also improve content processes' startup perf and memory usage.

Severity: normal → S3