Closed Bug 1495900 Opened 6 years ago Closed 4 years ago

starting Chrome makes Firefox content processes hang for about two minutes, due to FontConfig font rescanning (FcInitReinitialize)

Categories

(Core :: Graphics: Text, defect, P3)

Desktop
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla77
Tracking Status
firefox64 - wontfix
firefox65 --- wontfix
firefox76 --- fixed
firefox77 --- fixed

People

(Reporter: dbaron, Assigned: jfkthame)

References

Details

(Keywords: perf)

Attachments

(2 files)

Sometime in the past few weeks I've noticed a regression:  when I start Chromium (Ubuntu's packaged version of it) to test something, all of my Firefox content processes hang for about 2 minutes.  This is pretty bad!

Today I decided to profile, and the profile at https://perfht.ml/2y8pYIS showed what I expected to be the problem:  FontConfig.  In particular, we spend basically all of the time inside FcInitReinitialize.

I'm not sure if the thing that caused the regression is something in Chromium (e.g., a change in version 69, which I got the upgrade to on 2018-09-12, which seems plausible for when this started happening), or whether the regression is on the Firefox side.

[Tracking Requested - why for this release]:
Either way, spending this much time rescanning font directories just because I started Chromium doesn't seem acceptable, and seems likely to be problematic for any Web developers running Linux.

On the flip side, it's possible my times are particularly long because I have a lot of fonts installed.  (I don't have a particularly large number of active tabs -- probably about 20-30, though I have a very large number of dormant tabs that haven't been loaded but are part of my session.  If we do something once per tab, that could certainly be a problem...)

The bulk of the time (again, see the profile at https://perf-html.io/public/e0db389afe6c3fdcafc8df3db7f0ec0447a1f432 ) is spent in stacks like this:

TT_New_Context
FT_Load_Glyph
FcPatternFormat
FcPatternFormat
FcFreeTypeQueryFace
FcFileIsDir
FcFileIsDir
FcDirScan
FcConfigAppFontClear
FcConfigBuildFonts
FcInitLoadConfig
FcInitReinitialize
gtk_settings_get_type
g_closure_invoke
g_signal_handler_disconnect
g_signal_emit_valist
g_signal_emit
g_cclosure_marshal_BOOLEAN__BOXED_BOXEDv
g_object_notify_by_pspec
gtk_main_do_event
gdk_event_free
gdk_x11_drag_context_get_type
g_main_context_dispatch
g_main_context_dispatch
g_main_context_iteration
nsAppShell::ProcessNextNativeEvent(bool)
nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool)
non-virtual thunk to nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool)
nsThread::ProcessNextEvent(bool, bool*)
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::Run()
nsBaseAppShell::Run()
XRE_RunAppShell()
MessageLoop::Run()
XRE_InitChildProcess(int, char**, XREChildData const*)
XRE_InitChildProcess
Flags: needinfo?(jfkthame)
OS: Unspecified → Linux
Hardware: Unspecified → Desktop
Looks like this is not only within fontconfig, it's within fontconfig called from GTK code. That seems like it may not really be under our control. I guess what's happening is that when chromium is started, it's changing the (global) fontconfig configuration in such a way that fontconfig (in any process that uses it) decides it needs to reinitialize everything.

What version of fontconfig is installed? This might be a fontconfig issue that is fixed in newer versions; see also bug 1411338 comment 11.
Flags: needinfo?(jfkthame)
Flags: needinfo?(dbaron)
$ dpkg -l 'fontconfig*' | grep "^ii"
ii  fontconfig        2.12.6-0ubuntu2 amd64        generic font configuration library - support binaries
ii  fontconfig-config 2.12.6-0ubuntu2 all          generic font configuration library - configuration
Flags: needinfo?(dbaron)
Here's a more accurate callstack from gdb.

There seems to be a single call per child process, and that single call really does take a few minutes.
Oh, and the version you really wanted is:
ii  libfontconfig1:amd64        2.12.6-0ubuntu2 amd64        generic font configuration library - runtime
ii  libfontconfig1:i386         2.12.6-0ubuntu2 i386         generic font configuration library - runtime
but it's the same.
Judging by the discussion in bug 1411338, I suspect this is a fontconfig 2.12.x issue and the fix is to update to fontconfig-2.13 or later. Could you try that and confirm whether it resolves the problem?
Flags: needinfo?(dbaron)
(In reply to Jonathan Kew (:jfkthame) from comment #5)
> Judging by the discussion in bug 1411338, I suspect this is a fontconfig
> 2.12.x issue and the fix is to update to fontconfig-2.13 or later. Could you
> try that and confirm whether it resolves the problem?

That reduces the pause from 2-3 minutes to about 4-5 seconds, though there's still a pause.

(I updated to the fontconfig packages from cosmic (i.e., Ubuntu 18.10), which are version 2.13.0-5ubuntu3.)
Flags: needinfo?(dbaron)
Is there anything we can do here to avoid/reduce the pause or is this a wontfix?
I'm not sure. I don't see an obvious way forward, but the behavior is pretty bad. :(

Are any other (gtk-using) applications similarly affected? Given that the pause occurs because a fontconfig re-scan appears to be triggered from inside gtk event-handling code, it seems like other processes might be suffering the same issue.
See Also: → 1499824
Just wanted to pipe up and confirm that installing the latest fontconfig packages from cosmic significantly reduced the wait time for me, which makes running my tests a whole lot more manageable :)

For the record I just installed these debs:

http://mirrors.kernel.org/ubuntu/pool/main/f/fontconfig/fontconfig-config_2.13.0-5ubuntu3_all.deb
http://mirrors.kernel.org/ubuntu/pool/main/f/fontconfig/libfontconfig1_2.13.0-5ubuntu3_amd64.deb
Is there anything more we can do here? Not sure what to do with the Fx64 tracking request since this sounds like an issue that's out of our control.
Flags: needinfo?(jfkthame)
I don't know of any way forward here, unfortunately.

The one thing I'd be curious to know is whether other Gtk-based applications are similarly affected; it seems like they should be, given that the issue seems to be happening where fontconfig is used from within Gtk code rather than directly from Gecko. (It may be much less noticeable, though, if the app isn't visibly busy during the Chrome launch.)

David, can you try to see whether other apps are affected on your system, especially when using the older fontconfig? If this actually is unique to Firefox, that would suggest there might be something about how we use Gtk that's provoking it.
Flags: needinfo?(jfkthame) → needinfo?(dbaron)
FWIW I asked a contact on the ubuntu desktop team a couple of weeks ago, it sounds like they hadn't seen this reported on their side, either for firefox or other applications.
I'm reasonably confident that gnome-terminal wasn't affected, although I didn't retest (by downgrading) just now.  Are there other apps that are more Firefox-like that you'd want me to check?
Flags: needinfo?(dbaron)
I tried setting a breakpoint on settings_update_fontconfig in some other GTK apps (gnome-system-monitor, gnome-terminal-server), and I didn't even hit it when starting Chromium, whereas I do hit it in Firefox.
I wonder if apps like gnome-system-monitor or terminal don't do as much with font configuration? How about something like Gedit or  LibreOffice -- do they hit settings_update_fontconfig at all?

Speculation: maybe it's related to the fact that we register a callback to be notified when there's a change in Gtk default settings (see <https://searchfox.org/mozilla-central/rev/007b66c1f5f7a1b9a900a2038652a16a020f010c/widget/gtk/nsWindow.cpp#4073-4081>). If you comment out that block of code, does it make any difference?
So one difference I found between Firefox and gnome-system-monitor is that in Firefox, we execute this check_manager_window function that sets up the gdk_xsettings_manager_window_filter event filter that leads to an XEvent (property change) being turned into the fontconfig notification.  However, in gnome-system-monitor there's a call to gtk_init but it doesn't reach check_manager_window.

Here's the Firefox callstack:
#0  0x00007f071684ebf0 in check_manager_window (x11_screen=0x5612c32c4020 [GdkX11Screen], notify_changes=notify_changes@entry=0) at ../../../../../gdk/x11/xsettings-client.c:527
#1  0x00007f071684edc6 in _gdk_x11_xsettings_init (x11_screen=<optimized out>)
    at ../../../../../gdk/x11/xsettings-client.c:618
#2  0x00007f0716830e3c in _gdk_x11_display_open (display_name=<optimized out>)
    at ../../../../../gdk/x11/gdkdisplay-x11.c:1608
#3  0x00007f0716805d5d in gdk_display_manager_open_display (manager=<optimized out>, name=0x5612c32a1480 ":0") at ../../../../gdk/gdkdisplaymanager.c:472
#4  0x00007f0716cf4ffa in gtk_init_check (argc=<optimized out>, argv=<optimized out>)
    at ../../../../gtk/gtkmain.c:1104
#5  0x00007f0716cf5029 in gtk_init (argc=<optimized out>, argv=<optimized out>)
    at ../../../../gtk/gtkmain.c:1161
#6  0x00007f070b12d90c in mozilla::dom::ContentChild::Init(MessageLoop*, int, char const*, IPC::Channel*, unsigned long, bool) (this=0x5612c329d060, aIOLoop=
    0x7f0703d66cc8, aParentPid=10218, aParentBuildID=0x7ffd28fa8a3a "20180917145248", aChannel=0x7f06fc002560, aChildID=1, aIsForBrowser=<optimized out>)
    at /home/dbaron/builds/mozilla-central/mozilla/dom/ipc/ContentChild.cpp:656
gedit, like gnome-system-manager, does not hit settings_update_fontconfig, and also does not call check_manager_window within gtk_init.
And the difference is because in gedit, gdk_display_manager_open_display tries _gdk_wayland_display_open first, and it succeeds, so there's no x11 display at all.  So, basically, I think it doesn't happen in gedit because the notification code for fontconfig changes was never hooked up for wayland.
Not sure if the above leads to anything useful...
Flags: needinfo?(jfkthame)
This is likely wontfix for 64 at this point, though it'd be good to figure out a way to avoid this.
Keywords: perf
Priority: -- → P3

FWIW, this issue probably also happens with Android Emulator as well.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #26)

FWIW, this issue probably also happens with Android Emulator as well.

Android does not use FontConfig, so is not affected by this.

(In reply to Lee Salzman [:lsalzman] from comment #27)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #26)

FWIW, this issue probably also happens with Android Emulator as well.

Android does not use FontConfig, so is not affected by this.

Oh, you mean qemu or something like that doesn't use it at all on Linux? I've been seeing quite often similar issue when I run the emulator.

Hmm actually qemu-system-x86_64 doesn't linked to libfontconfig.so (I am not sure it's dynamically loaded or not), so probably some other processes might affect Firefox.

(In reply to svansintjan from comment #11)

Just wanted to pipe up and confirm that installing the latest fontconfig
packages from cosmic significantly reduced the wait time for me, which makes
running my tests a whole lot more manageable :)

For the record I just installed these debs:

http://mirrors.kernel.org/ubuntu/pool/main/f/fontconfig/fontconfig-config_2.
13.0-5ubuntu3_all.deb
http://mirrors.kernel.org/ubuntu/pool/main/f/fontconfig/libfontconfig1_2.13.
0-5ubuntu3_amd64.deb

Looks like the files specified in these links have a problem. Firefox is complaining that the files have 'Virus or Malware' when trying to download them. I downloaded the same package from 'Launchpad.net', it did not complain - https://launchpad.net/ubuntu/disco/amd64/fontconfig/2.13.1-2ubuntu2

Probably the links need to be removed and people should download the packages from Launchpad.

I am using Ubuntu 18.04, is it possible to fix this? I cant seem to install the deb files provided.

Hi folks,

David, can you try to see whether other apps are affected on your system

actually, when in my spare time, there are only three desktop apps that i use on my system:
Firefox, Chromium, and skype. There is actually no difference in handling w.r.t this problem.
So starting skype oftentimes freezes chromium and firefox, as well as starting chromium might freeze the others.

This is likely wontfix for 64 at this point, though it'd be good to figure out a way to avoid this.

Well, we are at 71 now (in my distribution), and the problem is still there.

That reduces the pause from 2-3 minutes to about 4-5 seconds, though there's still a pause.

Well, even 4-5 seconds are too much, when there is a video or a real time financial trader running in the browser,
or a chat with a friend just having asked you the "ultimate question" and you cannot answer for several minutes ....


As user "contacto" above, I cannot install 2.13 on my system due to missing dependencies, and the next stable long time version will not come so soon.

Can you please let us know, why firefox has to do this job synchroneously and wait for fontconfig to finish? Can't that be done in an async way?

Why does firefox have to update the fonts anyway? If the user installs new fonts, why just don't wait until the user restarts firefox to see the new fonts in the app? How about a user-option to opt-out of this behaviour?

Finally: What can we do now to fix this problem?

I seem to have run into this when starting Atom, which uses Electron (which is itself based on Chromium).

Every firefox process is at max CPU. Looking at strace it seems to mainly hang out after opening a NotoSansCJK or NotoSerifCJK font. These fonts have 2^16 - 1 glyphs, the maximum for a truetype font. It's possible the delay wouldn't be perceivable without such large fonts installed.

Firefox 72.0.1+build1-0ubuntu0.18.04.1
libfontconfig1 2.12.6-0ubuntu2
Ubuntu 18.04.3 LTS
x86-64

I believe I'm using X, not Wayland.

Can't seem to trigger it with other GTK apps. Tried epiphany, gedit, evince, Gimp, Inkscape, QuodLibet, and LibreOffice Writer (not sure if that's GTK), and Thunderbird (although I don't have an account configured in Thunderbird).

Morgan,

yes, I have that fonts, too. But even when deactivating NotoCJK, the load of the firefox (sub)processes/threads goes up to 95% (while with the Noto*CJK fonts all of them went up to 100+%).

Still waiting here for an answer that I can use as a workaround, since I cannot install the 2.13 lib version.

Waiting for a workaround too :(

This is a very frustrating bug, hoping for a fix soon as possible.

See Also: → 1411338

Out of curiosity, does this repro with MOZ_DISABLE_CONTENT_SANDBOX=1? I expect it'd be at least much faster with that...

No. As far as I can tell it eliminates the hang completely.

So this may be because we route all the system calls through the parent process, which means that file reads from content have a lot of overhead... Next thing to figure out would be whether the FcReinitialize stuff still happens consistently with the sandbox disabled...

Jed, do you know if the broker thread is registered with the profiler? It'd be kinda cool to see where time is spent there if the sandboxing is indeed the culprit.

Flags: needinfo?(jld)

I don't think they are, but that could be changed; this is the top of the broker threads' main function.

But something else to consider: is fontconfig trying to shell out to fc-cache from the content processes? That will fail under sandboxing, and that might explain some of the difference between sandboxed and unsandboxed. One experiment to try there is to set security.sandbox.content.level to 1, which should block fork but allow direct (unbrokered) filesystem access.

Flags: needinfo?(jld)

(In reply to David Baron :dbaron: 🏴󠁵󠁳󠁣󠁡󠁿 ⌚UTC-8 from comment #0)

Today I decided to profile, and the profile at https://perfht.ml/2y8pYIS
showed what I expected to be the problem: FontConfig. In particular, we
spend basically all of the time inside FcInitReinitialize.

In particular, this appears to be spending almost all of the time doing computations in fontconfig itself, not blocked on opening files (seen as recvmsg with inverted call stacks), which — assuming the nature of the problem hasn't changed since then — supports the idea that the cause of the bad performance isn't the file brokering, but it might be fontconfig falling back to ignoring a cache after failing to run an external command to rebuild it.

I've keep security.sandbox.content.level to 4, but whitelisted fontconfig caches

user_pref("security.sandbox.content.read_path_whitelist", "/var/cache/fontconfig/,/home/lynn/.cache/fontconfig/");

Now touch ~/.local/share/fonts/ doesn't make firefox to eat 100% of CPU.

According to comment 44, it looks like the problem here arises because the sandbox is blocking fontconfig from reading its caches (which then results in it doing a full rescan in each process).

So if we add these cache dirs to the sandbox policy, it may help with this issue.

Flags: needinfo?(jfkthame)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

(In reply to Alexey Ten (Lynn) from comment #44)

I've keep security.sandbox.content.level to 4, but whitelisted fontconfig caches

user_pref("security.sandbox.content.read_path_whitelist", "/var/cache/fontconfig/,/home/lynn/.cache/fontconfig/");

Now touch ~/.local/share/fonts/ doesn't make firefox to eat 100% of CPU.

FWIW, with the pref value (though I used my home directory) Firefox crashes when a wpt finished running on my environment.

bp-8fcb3fe8-6515-4444-8d06-792780200408

FWIW, with the pref value (though I used my home directory) Firefox crashes when a wpt finished running on my environment.

Huh, that's a bit worrying..... it looks a bit like bug 1627605, but that only applies when gfx.e10s.font-list.shared is enabled, which I don't see in your crash report. Setting needinfo? to remind myself to see if I can reproduce that locally.

Flags: needinfo?(jfkthame)

Hmm I don't recall when I set gfx.e10s.font-list.shared to false and re-set to true, but today I can't reproduce the crash so far, I think I am using a nightly which is not yet including bug 1627605.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a58f96ac7e03
Add fontconfig cache directories to content-process sandbox read paths. r=jld
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77

Isn't firefox-77 a tad ahead? For example, Ubuntu 18.04 is a LTS release that should benefit from having stable software, and Firefox is at 75.

This bug is a bit of a show-stopper for people who have to develop websites and test them on both Firefox and Chromium, and/or simultaneously use an Electron-based editor such as the enormously popular VSCode.

What other workarounds are available in a shorter timeframe?

Dennis: there was a workaround posted in this bug thread that may help you, namely, upgrade libfontconfig1 to 2.13.*. I am also on Ubuntu 18.04/Firefox 75 and this bug isn't causing me problems now. You will probably have to grab a newer libfontconfig deb from a later release of Ubuntu.

(In reply to Dennis Mayr from comment #53)

Ubuntu LTS has special policy for some programs (e.g. Firefox) to upgrade them. I'm using Ubuntu 16.04 and have latest stable Firefox. So this is not a problem.

What other workarounds are available in a shorter timeframe?

https://bugzilla.mozilla.org/show_bug.cgi?id=1495900#c44

Any chance this can be backported to the firefox 66 branch?

Firefox 66 is highly highly insecure! Please update!

D'oh, I meant the 76 branch, of course.

Comment on attachment 9139105 [details]
Bug 1495900 - Add fontconfig cache directories to content-process sandbox read paths. r=jld

Beta/Release Uplift Approval Request

  • User impact if declined: Long pause (jank) in Firefox as a result of starting up chrome (or possibly other triggers)
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: On affected system, launch chrome while firefox is running, observe whether firefox pauses for a noticeable period. (Dependent on fontconfig version and possibly other aspects of system configuration.)
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Trivial, just whitelisting directory where fontconfig caches may be stored
  • String changes made/needed:
Flags: needinfo?(jfkthame)
Attachment #9139105 - Flags: approval-mozilla-beta?

Comment on attachment 9139105 [details]
Bug 1495900 - Add fontconfig cache directories to content-process sandbox read paths. r=jld

May help ameliorate jank experienced by some users by whitelisting the fontconfig cache directory. Approved for 76.0rc1.

Attachment #9139105 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Hello,

Since I was unable to reproduce this issue in first place, I cannot confirm if it's indeed fixed or not.

Tried to reproduce it using Firefox 67.0, 75.0, 76.0b4 and 73.0a1 (2020-01-01) on multiple Ubuntu 18.04 x64 machines with the STR found in this bug and within the duplicates, I’ve also used a Firefox 68.7.0esr build from Ubuntu Store. It worked fine on each attempt, Firefox worked as expected.
I assume this might also be related to the older Chromium/Chrome versions that were available within that timeframe.
The Chromium version that is installed on the machines I've used is 81.0.4044.122.

Flags: qe-verify+

Bug 1719279 suggests this fix never worked.

Depends on: 1719279
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: