Some Fonts Display as Blank due to content-process sandbox

VERIFIED FIXED in Firefox 58

Status

()

defect
P2
normal
VERIFIED FIXED
2 years ago
2 years ago

People

(Reporter: meiamsome, Assigned: jfkthame)

Tracking

57 Branch
mozilla59
Unspecified
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(relnote-firefox 58+, firefox58 fixed, firefox59 verified)

Details

(Whiteboard: sb+)

Attachments

(6 attachments, 6 obsolete attachments)

117.20 KB, image/png
Details
16.48 KB, patch
lsalzman
: review+
Details | Diff | Splinter Review
5.57 KB, patch
gcp
: review+
Details | Diff | Splinter Review
5.84 KB, patch
lsalzman
: review+
Details | Diff | Splinter Review
38.07 KB, patch
lsalzman
: review+
Details | Diff | Splinter Review
2.15 KB, patch
lsalzman
: review+
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20171024001719

Steps to reproduce:

Go to some websites with specific font declarations, for example:

http://jupyter-notebook.readthedocs.io/en/stable/config.html





Actual results:

Fonts inside the pre and code elements on this website are rendered as blank strings

Specifically, their font declaration is:

font-family: Consolas,"Andale Mono WT","Andale Mono","Lucida Console","Lucida Sans Typewriter","DejaVu Sans Mono","Bitstream Vera Sans Mono","Liberation Mono","Nimbus Mono L",Monaco,"Courier New",Courier,monospace


Expected results:

It should use a font that exists rather than using a blank font
Cannot reproduce the problem. Font is displayed correctly here (but running Firefox 56 on Fedora 27).
Component: Untriaged → Graphics: Text
Product: Firefox → Core
Are you using a flatpak distribution of Firefox?

If you go to about:config and change the setting "security.sandbox.content.level" to 1 (instead of the default, which I think is 3), and then restart the browser, does that make any difference?
Flags: needinfo?(meiamsome)
I'm not sure what flatpak means in this context - I am using Firefox from the firefox-next Ubuntu PPA.

That configuration was at the default value of 3. Changing to 1 or 2 results in the font being displayed correctly.
Flags: needinfo?(meiamsome)
(In reply to meiamsome from comment #3)
> I'm not sure what flatpak means in this context - I am using Firefox from
> the firefox-next Ubuntu PPA.
> 
> That configuration was at the default value of 3. Changing to 1 or 2 results
> in the font being displayed correctly.

OK, that implies the problem is caused by the content-process sandbox preventing access to some of your font files. Moving this bug report to the Sandboxing component.

Do you know whether you have fonts installed in locations *other* than under the standard directories such as /usr/share, /usr/local/share, /usr/X11R6/lib/X11/fonts, or ~/.fonts ?
Component: Graphics: Text → Security: Process Sandboxing
OS: Unspecified → Linux
Aha, that certainly is the case.

This appears to have stemmed from a multitude of problems:

The specific font of issue was "Lucida Sans Typewriter", found by removing fonts form the list until it started working.
This font is included in Java, but I have never installed Java so that is a bit weird.

So, run fc-list | grep Lucidia, shows that the fonts are from a JVM install, but one that is included as part of Processing - and I had set up in my Downloads folder for whatever reason, but why is that being registered on fontconfig? After looking through the configuration, it looks like by default the downloads folder is included in ~/.config/fontconfig/conf.d/09-Directories.conf Removing that entry also resolved the problem.

Let me know if this is still something to look into for Firefox & if you need anything from me!
OK, this illustrates a scenario where the current rules being used for the content-process sandbox will result in pretty bad breakage for users. (If a widespread fontconfig setup includes the Downloads folder, I'm sure you are not alone in ending up with fonts there that fontconfig can "see", with the result that Firefox tries to use them, but fails due to sandboxing.)

Gian-Carlo, Jed: this is similar to the issue with flatpak font directories in bug 1396733. Anything we can do to make it better? The user-experience for affected users here is pretty dire...
Flags: needinfo?(jld)
Flags: needinfo?(gpascutto)
Summary: Fonts Display as Blank → Some Fonts Display as Blank due to content-process sandbox
Which Ubuntu version is this? Including the Download directory in the fontconfig dirs is (for many other reasons than sandboxing!) hopefully not the actual default...

I don't see any good or easy fix for this. Blocking the downloads dir from being accessed from content is obvious "correctly working as intended". Assuming that moving GTK out of content is going to be a many months project, what other options do we have? Is loading fonts over IPC from the parent (like printing!) a possibility? At what level do those font selections and substitutions get made? Anything that can see which fonts are in the defaults dir and which one's aren't (and hence untrusted)?
For example, an attack could work like this: trick the user into downloading a font that claims to be "Ubuntu" but substitutes several characters (you can probably do nice things with kerning or ligatures here by replacing groups?), then redirect him to "paypal.com" (which is what your font will render, but not where the user will be!), and voila, undetectable phishing.

That's the kind of thing I have in mind when I say "I hope this is not the default"!
Flags: needinfo?(gpascutto)
Yes, I agree that allowing arbitrary fonts to load from the Downloads directory seems unwise in general. (I don't know where the OP's configuration comes from; my Ubuntu system doesn't appear to have anything like that.)

(FWIW, I'm not sure whether your suggested scenario in comment 8 is directly relevant, though... isn't the URL bar rendered by the chrome process, not content? In which case it wouldn't be blocked from using a fake Ubuntu font from the Downloads dir, would it?)

I don't really know what a good solution here would look like. If nothing else, it would be useful if we could somehow inform the user that fonts are being blocked, and what they should do to resolve this (e.g. install desired fonts in standard locations); the current behavior where text simply disappears from the page, with no clue to the user, isn't good. But I'm not sure where/how we would detect such an issue...
(In reply to Jonathan Kew (:jfkthame) from comment #9)
> (FWIW, I'm not sure whether your suggested scenario in comment 8 is directly
> relevant, though... isn't the URL bar rendered by the chrome process, not
> content? In which case it wouldn't be blocked from using a fake Ubuntu font
> from the Downloads dir, would it?)

Exactly. This setup is a security hole, and that's unrelated to the sandboxing system (it does end up highlighting the issue!). It probably works against other browsers, too...

> I don't really know what a good solution here would look like. If nothing
> else, it would be useful if we could somehow inform the user that fonts are
> being blocked, 

That's going to be a major UX pain in itself. If we can (even) detect this case, can we stop the font substitution...somewhere?
Maybe if we could modify this so the content-process rules are checked? Due to the layering (this is in third-party Skia code), doing this on a non-messy way is going to be hard, but if it looks like that could work I can try!

http://searchfox.org/mozilla-central/source/gfx/skia/skia/src/ports/SkFontMgr_fontconfig.cpp#723
Whiteboard: sb?
(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)
> > I don't really know what a good solution here would look like. If nothing
> > else, it would be useful if we could somehow inform the user that fonts are
> > being blocked, 
> 
> That's going to be a major UX pain in itself. If we can (even) detect this
> case, can we stop the font substitution...somewhere?

There's not really a "substitution" going on, I don't think. We ask fontconfig for the "system" font set at startup, and if fontconfig has been told to include ~/Downloads (for example) in the configuration, then whatever fonts are found there will be part of the system font set. And so they'll be eligible for selection when we do the CSS font-matching algorithm based on the font-* properties and the characters in the text.

I suppose we could probably test each font in gfxFcPlatformFontList::AddFontSetFamilies to see whether the file is readable, and if not, omit it from our list; but I'm concerned what the startup-perf impact of that would be. Currently, I expect we are able to initialize our font list without hitting every font file on disk, because we just get the details out of fontconfig's cache. If we have to check the true accessibility of each file, there'll presumably be a pretty severe i/o cost.

We could perhaps make this cheaper if gfxFcPlatformFontList::AddFontSetFamilies were made aware of the font directories that the sandbox allows, so it could determine which fonts to ignore by just comparing paths rather than actually doing disk access. Or is there some kind of sandbox-related API whereby the process can ask "will I be allowed to read /path/to/some/file?" more cheaply than doing an open() or similar call and seeing if it fails?
>Or is there some kind of sandbox-related API whereby the process can ask "will I be allowed to read /path/to/some/file?" 
>more cheaply than doing an open() or similar call and seeing if it fails?

There is no such API, but presuming that this runs in chrome near startup, we can indeed check the paths against the allowed list. I can expose such an API. Let me have a look at AddFontSetFamilies.
So I wonder if something like this (untested) might work? Though I'm concerned it'll hurt startup time somewhat... but maybe we should test this approach, at least.
(IIUC, this runs during the *content* process startup. It also runs in the chrome process; each has its own platform font list. I assume it's only in the content process that we run into issues here.)
(In reply to Jonathan Kew (:jfkthame) from comment #15)
> (IIUC, this runs during the *content* process startup. It also runs in the
> chrome process; each has its own platform font list. I assume it's only in
> the content process that we run into issues here.)

But if it runs during content process startup, how does it end up with fonts it can't access? It uses the cached list from somewhere? 

If the font selection happens in content (why else is it building that list in content), can we do the check whether we can actually open the font in a lazy manner at the time it is needed?
BTW. The problem of doing the access check in content is that you are going to generate a trap and IPC for every single file you try to access (due to the sandbox). Of course that's not as bad as on chrome startup but the performance impact of that could be pretty noticeable. We've certainly been sandboxing on the assumption that we want to get rid of content I/O, not add it :)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #16)
> But if it runs during content process startup, how does it end up with fonts
> it can't access? It uses the cached list from somewhere? 

Font selection happens in content, yes; it's a per-character function that depends on the result of resolving all the CSS rules in effect, combined with browser prefs, and the actual characters that turn out to be present in the text.

My understanding is that fontconfig returns (cached) details of all the fonts present in the directories it has been told to scan, but does not filter this according to what is or isn't accessible to the current process. It's been possible in the past to cause problems by using user/group permissions to make font files unreadable to the user's process; the fonts can still be in fontconfig's set, so we select them during CSS font-matching, but then they'd fail to render. So the sandbox rules just introduce a new model of footgun here.

> If the font selection happens in content (why else is it building that list
> in content), can we do the check whether we can actually open the font in a
> lazy manner at the time it is needed?

Possibly, but that is a really hot codepath (font selection happens at least once per individual character of text, and can be a significant contributor to layout profiles), so I'd be very wary of introducing any additional work there.
Are you able to detect when drawing the font failed, and then take a sort of exception to check whether it was due to lack of file access, and then take a fallback path (that does the access check)?

It's probably much faster to recover from failure than  to try to predict it.
(This would also solve your non-sandboxing case!)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #17)
> BTW. The problem of doing the access check in content is that you are going
> to generate a trap and IPC for every single file you try to access (due to
> the sandbox). Of course that's not as bad as on chrome startup but the
> performance impact of that could be pretty noticeable. We've certainly been
> sandboxing on the assumption that we want to get rid of content I/O, not add
> it :)

Yeah, I can certainly believe there's a perf concern with this.

Can the content process ask the sandbox (as a single IPC operation) for a list of allowed paths, so that it can then check the font paths without having to do IPC for each one of them? I guess that'd be non-trivial, though, given all the ways "allowed paths" might be specified.... :(
(In reply to Gian-Carlo Pascutto [:gcp] from comment #19)
> Are you able to detect when drawing the font failed, and then take a sort of
> exception to check whether it was due to lack of file access, and then take
> a fallback path (that does the access check)?

I'm not sure, offhand. I guess that failure happens somewhere within freetype or skia, and I'm not sure how (if at all) it's exposed to us; it'll take a bit of digging.

Also ni'ing Lee, who might be more familiar with this stuff. (See above for background.)
Flags: needinfo?(lsalzman)
Not without duplicating the full Sandboxing ruleset (which supports prefixes, symlink handling, etc). If we'd allow copying that list into content, then we could pre-filter Sandbox IO requests that we know are going to fail (not 100% sure this is possible with symlinks and what not) and limit this handling to the sandboxing code. But it's throw away work, obviously, because we don't want to allow file IO.

If we run the code in chrome on startup, that can do the path verification without IPC. If you can then send back a list of paths/fonts to remove back to content, that seems like a cleaner approach.

If the file path for the font is readily available, the access right can be cached in a hashtable. This may be acceptable for the hot codepath.

Detecting a failure from freetype/skia would be preferable, though.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #23)
> If we run the code in chrome on startup, that can do the path verification
> without IPC. If you can then send back a list of paths/fonts to remove back
> to content, that seems like a cleaner approach.

On some platforms (currently Android and macOS, IIRC) we build the font list once in the chrome process, and then pass it to the content process on startup (instead of letting content build its own version of the list). So perhaps we should aim to do the same thing for Linux; and make sure that when chrome originally builds the list, it checks the paths against the sandbox rules, so it doesn't tell content about any fonts that won't be accessible.
Does Android use regular fontconfig or does it have magic? If it has no magic, then making the code usable on regular Linux is hopefully fairly doable?
It doesn't use fontconfig at all, we just iterate over the font files present in a known standard location. It's Linux that has "magic", to try and respect the user's fontconfig setup.
(Because fontconfig is about more than just providing a list of font files; it also does aliasing, substitutions, lang-based preferences, and stuff like that...)
Having a solution that does not require invasive changes to Cairo and Skia is definitely preferred here. Having file access fail and then having to go back and check for access would require at the very least some annoying mucking in Cairo. We don't really know what file we're going to be looking at until font matching has occurred, which is happening down in Cairo.

It would be nicer if we could ahead of time use the results we're getting inside gfxFcPlatformFontList::InitFontListForPlatform to just whitelist whatever there is at that point in time, or something to the effect of FcConfigGetFontDirs and whitelisting dirs instead of individual fonts.
Flags: needinfo?(lsalzman)
(In reply to Lee Salzman [:lsalzman] from comment #28)
> It would be nicer if we could ahead of time use the results we're getting
> inside gfxFcPlatformFontList::InitFontListForPlatform to just whitelist
> whatever there is at that point in time, or something to the effect of
> FcConfigGetFontDirs and whitelisting dirs instead of individual fonts.

This is what I'd suggest.  We're already whitelisting a few other configurable paths, as well as doing something similar indirectly by letting symlink targets inherit read permissions.  Longer-term we'll want font loading to move out of the content process, and my understanding is that trusted UI like the address bar is rendered in the parent process anyway, so if we're really concerned about font download attacks then that should be addressed directly and browser-wide, not as a side-effect of sandboxing.
Flags: needinfo?(jld)
Flags: needinfo?(gpascutto)
Release Note Request (optional, but appreciated)
[Why is this notable]: Webpages missing text are a pretty fatal problem from a browser users' point of view
[Affects Firefox for Android]: No
[Suggested wording]: Fonts installed in non-standard locations might show up blank in webpages. (Linux only)
[Links (documentation, blog post, etc)]: This bug, I might write a small blog about it.
relnote-firefox: --- → ?
Flags: needinfo?(gpascutto)
Attachment #8923483 - Attachment is obsolete: true
I've put together a patch that switches the content process from getting the list of fonts directly from fontconfig (which may include fonts that are inaccessible due to sandboxing) and instead passes it from the chrome process to each child (similar to what we're already doing for child-process startup perf reasons on macOS).

This means that if we add a hook to let the chrome process validate each font path against the sandbox rules, and filter out any fonts that will be blocked, we should be able to resolve this issue.

:gcp, could you look at providing an API (per comment 13) that we can call from gfxFcPlatformFontList::AddFontSetFamilies (in chrome) to check whether a given path will be readable for the content process?
Flags: needinfo?(gpascutto)
See the TODO comment in patch 2 for where we need to call a sandbox-related API to check font-file paths.
Jonathan, what about whitelisting the paths instead (as proposed by comment 28/29)? If you can see the file path in chrome, then it's just as easy to add a single-file exception as it is to feed back a "yes/no".
Flags: needinfo?(gpascutto)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #35)
> Jonathan, what about whitelisting the paths instead (as proposed by comment
> 28/29)? If you can see the file path in chrome, then it's just as easy to
> add a single-file exception as it is to feed back a "yes/no".

Would that be able to deal with on-the-fly changes? New fonts could be installed while the browser is running, and will show up in fontconfig's list, but (IIUC) we wouldn't be able to add those to the whitelist for an already-started content process, which therefore could end up with broken rendering.
Ok, then disregard the proposed patch. I didn't know the list could change while running.

On Linux the whitelist can be modified on-the-fly though...in theory. In practice our code seems to make it const. I guess a bunch of locking would need to be added if we want to support on-the-fly modification. So let's not go there.
 
ContentParent -> mSandboxBroker -> mPolicy -> Lookup() & MAY_READ is what you need.
Attachment #8924965 - Attachment is obsolete: true
Cool, thanks - I'll try to take a look at that when I'm back on the Linux machine.

A question, though: given that we're setting up the font list during gfxPlatform initialization in the chrome process, will there actually be any content processes yet? What's the proper way to get hold of a ContentParent here, in order to interrogate its sandbox policy?
If there are no content processes, there won't be a policy yet.

Create a temporary one in your code and use that to do lookups, thus avoiding ContentParent entirely?
http://searchfox.org/mozilla-central/rev/af86a58b157fbed26b0e86fcd81f1b421e80e60a/dom/ipc/ContentParent.cpp#2395
OK, here's an attempt at a patch to check sandbox policy when building the font list. I'll see if I can set up a "broken" font configuration locally, and test whether this does in fact resolve things as intended.
Yes, this seems to do the trick. I configured my fontconfig to pick up fonts from an extra "local" directory (which won't be known to the sandbox), and put a copy of my system's default bold sans-serif font in there. Sure enough, with current Nightly, this caused bold text to disappear on web pages that use sans-serif, because fontconfig finds the "local" copy but the sandbox prevents content using it.

With the patches above, the problem doesn't occur, because we know the local font is blocked and so exclude it from the font list; so then we continue to use the standard, system-installed one.
Minor fixup - include the same checks as ContentParent for whether sandboxing is actually enabled.
Attachment #8925046 - Attachment is obsolete: true
Attachment #8924944 - Flags: review?(lsalzman)
Attachment #8924945 - Flags: review?(lsalzman)
Attachment #8925068 - Flags: review?(gpascutto)
Comment on attachment 8924945 [details] [diff] [review]
patch 2 - Rework the fontconfig-based platform font list implementation to pass the list of available font patterns from chrome to content, instead of letting the content process get it directly from fontconfig

Also tagging :jld for review here, because of the added sync-ipc message. Note that this is used only in the (rare) case that the font configuration changes at runtime; normally, we just get the list via SetXPCOMProcessAttributes at content startup time.
Attachment #8924945 - Flags: review?(jld)
Attachment #8924944 - Flags: review?(lsalzman) → review+
Attachment #8924945 - Flags: review?(lsalzman) → review+
Comment on attachment 8925068 [details] [diff] [review]
patch 3 - Check the sandbox policy to verify font files will be readable by the content process before including them in the system font list

Review of attachment 8925068 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work.

::: gfx/thebes/gfxFcPlatformFontList.cpp
@@ +1311,5 @@
>  }
>  
>  void
> +gfxFcPlatformFontList::AddFontSetFamilies(FcFontSet* aFontSet,
> +                                          SandboxPolicy* aPolicy,

I guess this could be const, though we don't consistently care about that in our code.

@@ +1483,5 @@
> +    // Create a temporary SandboxPolicy to check font paths; use a fake PID
> +    // to avoid picking up any PID-specific rules by accident.
> +    SandboxBrokerPolicyFactory policyFactory;
> +    if (GetEffectiveContentSandboxLevel() > 0 &&
> +        !PR_GetEnv("MOZ_DISABLE_CONTENT_SANDBOX")) {

Maybe this env var check could be folded into GetEffectiveContentSandboxLevel (not in this bug/patch, though).
Attachment #8925068 - Flags: review?(gpascutto) → review+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #46)
> Maybe this env var check could be folded into
> GetEffectiveContentSandboxLevel (not in this bug/patch, though).

Turns out we already have bug 1365257 for that, which we should probably revisit.
Sorry for the churn; re-requesting review as this is a changed (significantly improved) version of patch 2. Instead of having the content processes send a request to the parent for the new font list when they observe a fontconfig change, we can drive this more simply from the parent-process end: only it needs to monitor for fontconfig changes, and then it can send a message to all its child content processes with the new list. This avoids running a separate CheckFontUpdates task on a repeating timer in each child process.
Attachment #8925241 - Flags: review?(lsalzman)
Attachment #8924945 - Attachment is obsolete: true
Attachment #8924945 - Flags: review?(jld)
This is just a small "bonus" patch: having introduced the UpdateFontList message sent by ContentParent::NotifyUpdatedFonts, we might as well use this mechanism on macOS as well to push a new font list to the content processes, so they don't need to register separately with the CFNotificationCenter to handle font-list changes.
Attachment #8925242 - Flags: review?(lsalzman)
Attachment #8925241 - Flags: review?(lsalzman) → review+
Attachment #8925242 - Flags: review?(lsalzman) → review+
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c6ad01bf0b66
patch 1 - Wrap the font family record used to pass font info from chrome to content on macOS in a union, in preparation for using the same mechanism with a different type of font record on Linux. (No functional change here, just the added union type and some renamings from 'font family list' to 'font list' to be more generic.) r=lsalzman
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf35d13e27f3
patch 2 - Rework the fontconfig-based platform font list implementation to pass the list of available font patterns from chrome to content, instead of letting the content process get it directly from fontconfig. r=lsalzman
https://hg.mozilla.org/integration/mozilla-inbound/rev/a1c1fe33a052
patch 3 - Check the sandbox policy to verify font files will be readable by the content process before including them in the system font list. r=gps
https://hg.mozilla.org/integration/mozilla-inbound/rev/f9763adebe68
patch 4 - Also adopt the ContentParent::NotifyUpdatedFonts method on macOS, so that only the parent process needs to register with CFNotificationCenter for font-changed notifications. r=lsalzman
Backout by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/df53224b9171
Backed out csets f9763adebe68, a1c1fe33a052, bf35d13e27f3, c6ad01bf0b66 for ASan failures.
This failed to build on linux64-asan, which is an easy fix: the "dummy" SandboxPolicy type needs to be a fully-defined (empty) struct, not just a forward-declared one. (So the ASAN build apparently doesn't have MOZ_CONTENT_SANDBOX defined, FWIW. Not sure if that's a deliberate choice?)

But what's more troubling is that it also failed on all the talos tests, apparently failing to instantiate any fonts. Yet reftests etc for the same builds run fine. And I can't seem to reproduce this locally; I've tried running ./mach talos-test ... on my linux64 opt build, and it works fine for me.

So that needs to be diagnosed and fixed before re-landing here. It's not yet clear to me what's going wrong there.
(In reply to Jonathan Kew (:jfkthame) from comment #53)
> (So the ASAN build apparently doesn't have MOZ_CONTENT_SANDBOX defined, FWIW. Not sure if that's a deliberate choice?)

Deliberate.  There are some fundamental issues with LSan; see bug 1287971 (but see also bug 1386297 comment #1 about another possible way to handle that).
Just a heads up, there was a pretty big AWSY regression when this landed initially. Jonathan can you make sure to do an AWSY run before landing again? Let me know if there's anything I can do to help out.
Flags: needinfo?(jfkthame)
(In reply to Jonathan Kew (:jfkthame) from comment #53)
> But what's more troubling is that it also failed on all the talos tests,
> apparently failing to instantiate any fonts. Yet reftests etc for the same
> builds run fine. And I can't seem to reproduce this locally; I've tried
> running ./mach talos-test ... on my linux64 opt build, and it works fine for
> me.
> 
> So that needs to be diagnosed and fixed before re-landing here. It's not yet
> clear to me what's going wrong there.

Well, @#$%! We're failing on systems with fontconfig < 2.9, because FcNameUnparse discards the FC_FILE element from the pattern, and so the content processes end up unable to find any font files when using the font patterns that were serialized and passed from the chrome process.

This FcNameUnparse bug was fixed in https://bugs.freedesktop.org/show_bug.cgi?id=26718 (and fontconfig-2.9.0 was released in early 2012), but it looks like we're running the talos jobs with an older version.

It's tempting to simply require a more up-to-date fontconfig library, but I guess if our talos systems are using such an old version, there are probably other people in the same position, and it'd be nice if Firefox continued to work for them. I think we should be able to work around the bug without too much pain; I'll give it a shot.
Flags: needinfo?(jfkthame)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #55)
> Just a heads up, there was a pretty big AWSY regression when this landed
> initially. Jonathan can you make sure to do an AWSY run before landing
> again? Let me know if there's anything I can do to help out.

Yeah, thanks - I'll look into it.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #55)
> Just a heads up, there was a pretty big AWSY regression when this landed
> initially. Jonathan can you make sure to do an AWSY run before landing
> again? Let me know if there's anything I can do to help out.

There was indeed a leak bug in the part-2 patch (failing to release an extra reference to the FcPattern for each font), but fixing this doesn't seem to have much impact on the AWSY figures, afaict.

I suspect what we're seeing is related to allocation happening internally in fontconfig as we serialize and deserialize all the font patterns (to send them across the IPC channel to the content process), but not sure how to get more specific than that.
Once more, with fingers crossed. The main fix here is the added code to work around the bug in fontconfig versions < 2.9, by manually adding the FC_FILE element to the serialized patterns (see gfxFcPlatformFontList::ReadSystemFontList). Also added FcPatternDestroy in InitFontListForPlatform() to release the deserialized pattern correctly (the font family record holds its own strong reference, so it won't actually get deleted until the whole list is destroyed/recreated). Tryserver says this no longer turns things red -- even the talos jobs with old fontconfig: https://treeherder.mozilla.org/#/jobs?repo=try&revision=88e23ebb0f97f13174172fe954251b32f809a78b.
Attachment #8926316 - Flags: review?(lsalzman)
Attachment #8925242 - Attachment is obsolete: true
Attachment #8926316 - Flags: review?(lsalzman) → review+
Sorry, *this* (not part 4) was the patch where I actually intended to ask for re-review, as this is where the old-fontconfig workaround actually lives. This was the main fix needed to make the talos jobs happy.
Attachment #8926420 - Flags: review?(lsalzman)
Attachment #8925241 - Attachment is obsolete: true
Attachment #8926420 - Flags: review?(lsalzman) → review+
Assignee: nobody → jfkthame
Priority: -- → P2
Whiteboard: sb? → sb+
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ea8ee40ed426
patch 1 - Wrap the font family record used to pass font info from chrome to content on macOS in a union, in preparation for using the same mechanism with a different type of font record on Linux. (No functional change here, just the added union type and some renamings from 'font family list' to 'font list' to be more generic.) r=lsalzman
https://hg.mozilla.org/integration/mozilla-inbound/rev/72a6f5f3512c
patch 2 - Rework the fontconfig-based platform font list implementation to pass the list of available font patterns from chrome to content, instead of letting the content process get it directly from fontconfig. r=lsalzman
https://hg.mozilla.org/integration/mozilla-inbound/rev/eec946b59360
patch 3 - Check the sandbox policy to verify font files will be readable by the content process before including them in the system font list. r=gps
https://hg.mozilla.org/integration/mozilla-inbound/rev/75e7f32c3365
patch 4 - Also adopt the ContentParent::NotifyUpdatedFonts method on macOS, so that only the parent process needs to register with CFNotificationCenter for font-changed notifications. r=lsalzman
Latest try pushes seem to indicate that the big AWSY regression is no longer happening, so I've re-landed the (updated) patches here; let's see how things look on inbound.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
This seems to have broken system-installed fonts completely for some Linux users; see bug 1416229. In view of those reports, I have backed out the patches here, pending further investigation:

https://hg.mozilla.org/integration/mozilla-inbound/rev/43cf3f8f1fd6808a500c4e47e43a0fb939685de2
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1416359
This patch (on top of the existing part 2) should solve the issue with fontconfig 2.11.0. I verified that a try build including this patch works as expected on a fresh Debian9 VM (whereas yesterday's Nightly showed the broken-fonts problem there). Thanks again for all your help diagnosing this.
Attachment #8927572 - Flags: review?(lsalzman)
Comment on attachment 8927572 [details] [diff] [review]
patch 2.1 - Work around FcNameParse bug in fontconfig versions around 2.11.0, by escaping any leading space in the encoded charset element

This is good, but you may want to make the comment more detailed to say that 2.10.94 is when they introduced the code to skip leading whitespace, and 2.11.1 is when they changed the charset format to no longer be susceptible to this bug.
Attachment #8927572 - Flags: review?(lsalzman) → review+
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/163c693012c0
patch 1 - Wrap the font family record used to pass font info from chrome to content on macOS in a union, in preparation for using the same mechanism with a different type of font record on Linux. (No functional change here, just the added union type and some renamings from 'font family list' to 'font list' to be more generic.) r=lsalzman
https://hg.mozilla.org/integration/mozilla-inbound/rev/e549fef3f2a2
patch 2 - Rework the fontconfig-based platform font list implementation to pass the list of available font patterns from chrome to content, instead of letting the content process get it directly from fontconfig. r=lsalzman
https://hg.mozilla.org/integration/mozilla-inbound/rev/c32418d90230
patch 2.1 - Work around FcNameParse bug in fontconfig versions around 2.11.0, by escaping any leading space in the encoded charset element. r=lsalzman
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a0e53eec16a
patch 3 - Check the sandbox policy to verify font files will be readable by the content process before including them in the system font list. r=gps
https://hg.mozilla.org/integration/mozilla-inbound/rev/93bf1194f09a
patch 4 - Also adopt the ContentParent::NotifyUpdatedFonts method on macOS, so that only the parent process needs to register with CFNotificationCenter for font-changed notifications. r=lsalzman
Backout mentioned in comment 65 canceled the regression highlighted in bug 1416177:

== Change summary for alert #10479 (as of Fri, 10 Nov 2017 14:10:45 GMT) ==

Improvements:

  7%  Heap Unclassified summary linux32-stylo-disabled opt stylo-disabled     46,015,099.99 -> 42,989,389.83
  6%  Heap Unclassified summary linux64 opt                                   59,785,811.13 -> 56,270,043.14
  5%  Heap Unclassified summary linux64-stylo-sequential opt stylo-sequential 59,130,393.17 -> 56,106,901.11
  5%  Heap Unclassified summary linux64-stylo-disabled opt stylo-disabled     59,982,189.82 -> 57,242,856.73

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=10479
...and the re-landing in comment 68 will no doubt bring back the heap-unclassified regression. I'll comment further in bug 1416177.
See Also: → 1416569
Duplicate of this bug: 1416569
Comment on attachment 8924944 [details] [diff] [review]
patch 1 - Wrap the font family record used to pass font info from chrome to content on macOS in a union, in preparation for using the same mechanism with a different type of font record on Linux. (No functional change here, just the added union type and s

Approval Request Comment [for the complete set of patches]

[Feature/Bug causing the regression]: content-process sandbox

[User impact if declined]: Totally broken fonts (i.e. text completely missing on web pages) for some Linux users, depending on local font configuration.

[Is this code covered by automated tests?]: This is the platform font initialization code, so heavily exercised by all tests; but the specific scenario that leads to broken rendering is not covered by automated tests as it depends on local font configuration.

[Has the fix been verified in Nightly?]: yes

[Needs manual test from QE? If yes, steps to reproduce]: Install copies of common fonts such as Open Sans, DejaVu, FreeSans, etc in a *non-standard* local directory (NOT under /etc/fonts or ~/.fonts and so on), and configure fontconfig to see them; without this fix, this will break usage of those fonts in web content.

[List of other uplifts needed for the feature/fix]: none

[Is the change risky?]: Somewhat, particularly as it depends on an apparently little-tested feature of fontconfig (we had to work around two distinct fontconfig bugs in the course of getting this landed!)

[Why is the change risky/not risky?]: Substantial reworking of how the content process on Linux acquires the list of available fonts; this is based on code we're also using on macOS, so the general approach is pre-existing, but the specific implementation is new (this is per-platform code) and depends on fontconfig functionality we were not previously using, so there is some risk of new bugs. Landing early in beta-58 seems advisable to maximize testing.

[String changes made/needed]: n/a
Attachment #8924944 - Flags: approval-mozilla-beta?
Hi :jfkthame, 
Are all of the patches you want to uplift or only patch 1? Is this a new regression in 58?
Flags: needinfo?(jfkthame)
This is for the complete set of patches; they were created in separate parts for easier review, but all need to work together.

The regression was new in Firefox 57, I believe, because that's where we shipped content-process sandbox level 3 by default (bug 1308400), and that's what causes the breakage.
Flags: needinfo?(jfkthame)
Hi Brindusa, could you help find someone to verify if this issue was fixed as expected on the latest Nightly build ASAP? Thanks!
Flags: qe-verify+
Flags: needinfo?(brindusa.tot)
Hi all, 
I did a test which was successful. 

Here is what I did
1. Download nightly from https://download.mozilla.org/?product=firefox-nightly-latest-l10n-ssl&os=linux64&lang=de

2. Untarred and 

cd firefox

3. Starting firefox like follows:

./firefox -P nightly -no-remote

4. Creating the new profile nightly

5. Checking in about:config that security.sandbox.content.level=3 which was the case.

6. Opened the following web pages which were wrongly displayed before

https://fedoramagazine.org/
https://fedoramagazine.org/announcing-fedora-27/
https://www.youtube.com/

All are looking fine now.
Duplicate of this bug: 1418651
Duplicate of this bug: 1418618
Duplicate of this bug: 1418607
Hi Manfred,
Can you reproduce the issue without the fix and the issue is gone after using the nightly build? Or do you follow the steps described in comment #73?
Flags: needinfo?(ml_news)
For what it's worth: I reported this bug as bug number 1418607. I tried the steps suggested by Manfred in Comment 77. The nightly build worked nicely for me.
Hi Gerry,
Yes. When using the Fedora Firefox which is firefox-57.0-2.fc27.x86_64 then the error is still there (when having set security.sandbox.content.level=3). So I haven't messed around with fonts as it is easy to reproduce the issue in 57.0 and to see it is fixed in nightly.
Flags: needinfo?(ml_news)
Reproduced the issue using Firefox Beta 57.0b12 (Build ID: 20171026211016) and Firefox Nightly 58.0a1 (Build ID: 20171112220346) on Ubuntu 16.04 x64 following the STR from comment 73 and using the link: http://jupyter-notebook.readthedocs.io/en/stable/config.html

Verified fixed using the latest Nightly 59.0a1 (Build ID: 20171120222519).
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Flags: needinfo?(brindusa.tot)
Duplicate of this bug: 1419444
Comment on attachment 8924944 [details] [diff] [review]
patch 1 - Wrap the font family record used to pass font info from chrome to content on macOS in a union, in preparation for using the same mechanism with a different type of font record on Linux. (No functional change here, just the added union type and s

Fix the broken font issue when visiting some websites and was verified. Let's take it in Beta58.
Attachment #8924944 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8925068 [details] [diff] [review]
patch 3 - Check the sandbox policy to verify font files will be readable by the content process before including them in the system font list

Review of attachment 8925068 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFcPlatformFontList.cpp
@@ +1482,5 @@
> +#ifdef MOZ_CONTENT_SANDBOX
> +    // Create a temporary SandboxPolicy to check font paths; use a fake PID
> +    // to avoid picking up any PID-specific rules by accident.
> +    SandboxBrokerPolicyFactory policyFactory;
> +    if (GetEffectiveContentSandboxLevel() > 0 &&

This check should be done versus level 2, i.e. we only remove fonts if the read sandbox is enabled, not sandboxing in general.

I only realized this after a user in https://bugzilla.mozilla.org/show_bug.cgi?id=1418240#c13 remarked that *all* his fonts are in non-standard locations. Which means that this patch would break falling back to level 2.
I filed Bug 1419673, let's track that separately.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #87)
> This check should be done versus level 2, i.e. we only remove fonts if the
> read sandbox is enabled, not sandboxing in general.
> 
> I only realized this after a user in
> https://bugzilla.mozilla.org/show_bug.cgi?id=1418240#c13 remarked that *all*
> his fonts are in non-standard locations. Which means that this patch would
> break falling back to level 2.

Huh, I see what you mean ... ouch! The best way forward in that case is for the user to add their font location(s) to the whitelist pref, I think. Still, we should adjust this so that level 2 works for them.

(I guess I assumed -- incorrectly -- that the policy->Lookup() call would tell us whether the file was readable according to the current sandbox level; but that's not how it works, is it?)
Hm, as an outsider regarding the coding you all do I have to admit I'm a bit confused.

I have fonts somewhere below the following directories according to the output of fc-list:

/usr/share/fonts/
/usr/share/X11/fonts/
/usr/local/texlive/2017/texmf-dist
/usr/local/texlive/texmf-local/fonts/

So it seems I have non-standard font directories in use. Nevertheless, the error was fixed by nightly when security.sandbox.content.level=3.
(In reply to Manfred L. from comment #90)
> Hm, as an outsider regarding the coding you all do I have to admit I'm a bit
> confused.
> 
> I have fonts somewhere below the following directories according to the
> output of fc-list:
> 
> /usr/share/fonts/
> /usr/share/X11/fonts/
> /usr/local/texlive/2017/texmf-dist
> /usr/local/texlive/texmf-local/fonts/
> 
> So it seems I have non-standard font directories in use. Nevertheless, the
> error was fixed by nightly when security.sandbox.content.level=3.

That's as expected. You have your distribution's standard fonts in the usual places (under /usr/share), and in addition you have texlive fonts installed under /usr/local. With sandboxing, only the standard ones are usable for Firefox (unless you customize the sandbox whitelist pref). If you had *all* your fonts in non-standard places, you'd be seeing issues, as none of them would be available in the browser.
Thanks Jonathan,
Ok, so the font directories occuring in the list provided by fc-list are regarded as standard locations.
(In reply to Jonathan Kew (:jfkthame) from comment #89)
> (I guess I assumed -- incorrectly -- that the policy->Lookup() call would
> tell us whether the file was readable according to the current sandbox
> level; but that's not how it works, is it?)

No, you're right. That should actually work correctly. At worst it's a bit of extra work when it's not needed, and it's only not needed in the atypical case. So carry on!
(In reply to Manfred L. from comment #92)
> Thanks Jonathan,
> Ok, so the font directories occuring in the list provided by fc-list are
> regarded as standard locations.

The first two are, the texlive ones aren't. Normally, one wouldn't care about this, as webpages shouldn't be requesting TexLive fonts! But it seems the combination of weird font rules on webpages and fontconfig behavior on Linux means that they sometimes end up using them.

You're not seeing any problems on Nightly because it can detect this case and exclude those fonts.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #93)
> No, you're right. That should actually work correctly. At worst it's a bit
> of extra work when it's not needed, and it's only not needed in the atypical
> case. So carry on!

Oh, OK! That's a relief. :)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #94)
> (In reply to Manfred L. from comment #92)
> > Thanks Jonathan,
> > Ok, so the font directories occuring in the list provided by fc-list are
> > regarded as standard locations.
> 
> The first two are, the texlive ones aren't. Normally, one wouldn't care
> about this, as webpages shouldn't be requesting TexLive fonts! But it seems
> the combination of weird font rules on webpages and fontconfig behavior on
> Linux means that they sometimes end up using them.

TeX Live includes a bunch of fonts that may overlap with ones commonly found in distros and/or specified by web pages, like Libertine, Fira, FreeFont, etc.; and if these are exposed to fontconfig, they could easily lead to the issues here.
Duplicate of this bug: 1418240
Duplicate of this bug: 1422294
Gerry, want this for 58 release notes?  (I'm n-i you on these since I'm coming across them in triage for 59 notes)
Flags: needinfo?(gchang)
Liz, sure. Added it to release notes.
Flags: needinfo?(gchang)
Added to Fx58 release notes.
Duplicate of this bug: 1427980
You need to log in before you can comment on or make changes to this bug.