Closed Bug 1384701 Opened 7 years ago Closed 7 years ago

Stylo: Linux debug parallel reftest / mochitest runs crash on shutdown

Categories

(Core :: CSS Parsing and Computation, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: jryans, Assigned: karlt)

References

Details

Attachments

(3 files)

See all 8 crashing reftest runs here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6813000f2a777de4ed8d410522ef9dc4bf8f6de5&filter-searchStr=test-linux64-stylo%2Fdebug%20tc-R(

An example log at the crash:

https://treeherder.mozilla.org/logviewer.html#?job_id=117730537&repo=try&lineNumber=39514

The tests themselves seem to pass, but then we crash on shutdown, perhaps because of the assertion "firefox: ../../../../src/cairo-hash.c:217: _cairo_hash_table_destroy: Assertion `hash_table->live_entries == 0' failed."
:bholley, this is blocking the work to test on more desktop platforms.  We haven't been testing the configuration of Stylo + e10s off + parallel traversal correctly, which I ran into while working on bug 1380053.

What's the right way to proceed here?  Do we want to fix it even though it's e10s off only?  If so, who is best placed to do so?  Or else should we disable this test configuration?
Flags: needinfo?(bobbyholley)
I'll try disabling these runs for now to unblock the other platforms at least.
Mochitest style runs are also affected.
Summary: Stylo: Linux64 debug e10s-off parallel reftest runs crash on shutdown → Stylo: Linux64 debug e10s-off parallel reftest / mochitest runs crash on shutdown
Looks like the crashing thread goes through cairo, which then tries to call "firefox!arena_dalloc [mozjemalloc.cpp:6813000f2a77 : 1115 + 0x5]" which appears to fail.

:bholley and I discussed on IRC that we'll disable this configuration for now to unblock other platforms while we investigate a fix here.

:glandium, any ideas about the cairo / jemalloc issue here?  It only appears to happen for e10s-off + parallel Stylo + debug runs on Linux64, so necessary options include:

STYLO_THREADS=4
STYLO_FORCE_ENABLED=1
--disable-e10s
(debug build)
Flags: needinfo?(bobbyholley) → needinfo?(mh+mozilla)
Just to emphasize it: we aren't testing this correctly on try currently, because we were not setting STYLO_THREADS to 4 when e10s was off.  So we'll need to double-check the try runs when testing any fix to be sure we've covered the right thing here.
Priority: -- → P2
Now that bug 1380053 has landed (which disabled the failing configs here), you can re-enable them for testing on try by searching tests.yml[1] for the lines mentioning this bug and removing them so that e10s-off runs as well.

[1]: https://hg.mozilla.org/mozilla-central/diff/30fb5403e45c/taskcluster/ci/test/tests.yml
Priority: P2 → --
The same issue also seems to occur in a parallel _e10s on_ run of the DevTools test devtools/client/responsive.html/test/browser/browser_window_close.js.
After some painful experience with a one-click loaner (most of which revolves around bug 1386946), I managed to get a gdbserver attached to the crashing firefox, which just showed that our crash reporter can't really handle to get a good stack trace from system libraries, even when the debug packages are installed (incidentally, libc's was, but not libcairo's), and the trace in the logs is just quoting irrelevant things.

Here the stack trace from gdb's perspective:

Thread 1 "firefox" received signal SIGABRT, Aborted.
0x00007ffff6b79428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6b79428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff6b7b02a in __GI_abort () at abort.c:89
#2  0x00007ffff6b71bd7 in __assert_fail_base (fmt=<optimized out>, 
    assertion=assertion@entry=0x7ffff343eaad "hash_table->live_entries == 0", 
    file=file@entry=0x7ffff343ea90 "../../../../src/cairo-hash.c", line=line@entry=217, 
    function=function@entry=0x7ffff343ec70 <__PRETTY_FUNCTION__.10447> "_cairo_hash_table_destroy")
    at assert.c:92
#3  0x00007ffff6b71c82 in __GI___assert_fail (
    assertion=assertion@entry=0x7ffff343eaad "hash_table->live_entries == 0", 
    file=file@entry=0x7ffff343ea90 "../../../../src/cairo-hash.c", line=line@entry=217, 
    function=function@entry=0x7ffff343ec70 <__PRETTY_FUNCTION__.10447> "_cairo_hash_table_destroy")
    at assert.c:101
#4  0x00007ffff33975bc in _cairo_hash_table_destroy (hash_table=<optimized out>)
    at ../../../../src/cairo-hash.c:217
#5  0x00007ffff33cd697 in _cairo_scaled_font_map_destroy ()
    at ../../../../src/cairo-scaled-font.c:444
#6  0x00007ffff338ed69 in cairo_debug_reset_static_data () at ../../../../src/cairo-debug.c:67
#7  0x00007fffe8a563d7 in MOZ_gdk_display_close(_GdkDisplay*) ()
   from target:/home/worker/workspace/build/application/firefox/libxul.so
#8  0x00007fffe8a5e90c in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
   from target:/home/worker/workspace/build/application/firefox/libxul.so
#9  0x00007fffe8a5ea80 in XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
   from target:/home/worker/workspace/build/application/firefox/libxul.so
#10 0x00000000004064b6 in do_main(int, char**, char**) ()
#11 0x0000000000405c5c in main ()

Which says two things:
- yes, the assertion "firefox: ../../../../src/cairo-hash.c:217: _cairo_hash_table_destroy: Assertion `hash_table->live_entries == 0' failed." is what's causing it
- no, jemalloc has nothing to do with it.
Flags: needinfo?(mh+mozilla)
I see this assertion all the time locally with --disable-e10s on linux64.
Jonathan, do you know what the cause of this assertion (_cairo_scaled_font_map_destroy is not empty) generally is?  Is it leaking a gfxFontEntry?
Flags: needinfo?(jfkthame)
Given this blocks more mochitests, it should pretty much have a higher priority.
Priority: P4 → P2
(In reply to Cameron McCormack (:heycam) from comment #12)
> Jonathan, do you know what the cause of this assertion
> (_cairo_scaled_font_map_destroy is not empty) generally is?  Is it leaking a
> gfxFontEntry?

Yes, that seems likely. It might be interesting to catch this in a debugger (rr?) and see exactly what entry is left there during shutdown.

(I tried to reproduce this and bug 1387490 in a local linux64 build with stylo, but so far I'm finding them elusive here...)
Flags: needinfo?(jfkthame)
Are you passing --disable-e10s? That makes it happen consistently for me.
Flags: needinfo?(jfkthame)
Yes, I'm using --disable-e10s.

I did get the assertion once after loading the testcase from bug 1387490 and then quitting the browser, but then when I tried to repeat that in order to record/debug it, I couldn't get it to happen again. I'm guessing something racy may be going on, and my particular system has a hard time getting the timing just right to catch it.
Flags: needinfo?(jfkthame)
Summary: Stylo: Linux64 debug e10s-off parallel reftest / mochitest runs crash on shutdown → Stylo: Linux64 debug parallel reftest / mochitest runs crash on shutdown
(In reply to Jonathan Kew (:jfkthame) from comment #16)
> Yes, I'm using --disable-e10s.
> 
> I did get the assertion once after loading the testcase from bug 1387490 and
> then quitting the browser, but then when I tried to repeat that in order to
> record/debug it, I couldn't get it to happen again. I'm guessing something
> racy may be going on, and my particular system has a hard time getting the
> timing just right to catch it.

You may want to clear all cache before trying again probably?

Also you can probably try applying the patch in bug 1369815 then executing "mach mochitest layout/style/test/chrome". This is a reasonably small test set, which, according to that bug, can reproduce this issue without stylo.
Flags: needinfo?(jfkthame)
Depends on: 1387490
This also appears to affect Linux 32-bit Stylo runs:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4225df6913ab00cb97bcdb17b8c29a24ee52673b

(Ignore the browser chrome part, that seems unrelated...) Mochitest chrome, a11y, and clipboard seem to trigger this issue on Linux 32-bit with Stylo enabled.
Summary: Stylo: Linux64 debug parallel reftest / mochitest runs crash on shutdown → Stylo: Linux debug parallel reftest / mochitest runs crash on shutdown
So I did several try push yesterday with the assertion removed... but those patch doesn't seem to work.

It seems on Linux, we use the system cairo rather than our in-tree one, and it seems we don't switch to the in-tree one even if I make MOZ_TREE_CAIRO set unconditionally in old-configure.in. I have no idea why...
We *are* using the in-tree cairo. However, *gtk* is using system cairo, and there's no way around that.
So, not all tests trigger this assertion.

I've figured out that accessible/tests/mochitest/test_aria_token_attrs.html is one test which can trigger this shutdown assertion in a11y.

Given the testcase in bug 1387490, I suspect this is related to the <input>s in this test as well.
I'm still finding this very hard to reproduce locally with any consistency.

I did catch it in rr once (after loading the testcase from bug 1387490, then quitting the browser), and poked around enough to confirm that we had one stray gfxFont instance still alive with refcnt=1. This was an instance of the DejaVu Serif font, which was created early in the run (it was the second gfxFont instance to be created).

Exploring this led me to notice bug 1393185, which was causing a lot of refcounting noise, though I don't believe it is directly connected to the eventual assertion here. But having touched my build there, I can no longer replay that recording in order to investigate further; and now I can't seem to trigger the assertion again to get a fresh recording. :(
Flags: needinfo?(jfkthame)
(In reply to Xidorn Quan [:xidorn] UTC+10 (PTO, back Aug 28) from comment #24)
> I've figured out that accessible/tests/mochitest/test_aria_token_attrs.html
> is one test which can trigger this shutdown assertion in a11y.

I got this test to trigger the cairo hashtable assertion during shutdown, though it's worth noting that it is preceded by a couple of other assertions as well, starting with a LOUD WARNING ABOUT JSRuntime LEAKAGE:

GECKO(9593) | WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
GECKO(9593) | [9593] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp, line 1040
GECKO(9593) | #01: mozilla::ShutdownXPCOM(nsIServiceManager*) (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:1040)
GECKO(9593) | #02: NS_ShutdownXPCOM (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:825)
GECKO(9593) | #03: ScopedXPCOMStartup::~ScopedXPCOMStartup() (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:1478)
GECKO(9593) | #04: operator delete(void*) (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/mozalloc.h:224)
GECKO(9593) | #05: XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:4905)
GECKO(9593) | #06: mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/Bootstrap.cpp:46)
GECKO(9593) | #07: do_main(int, char**, char**) (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:237 (discriminator 2))
GECKO(9593) | #08: main (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:309)
GECKO(9593) | #09: __libc_start_main (/build/glibc-bfm8X4/glibc-2.23/csu/../csu/libc-start.c:325)
GECKO(9593) | #10: _start (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox)
GECKO(9593) | #11: ??? (???:???)
GECKO(9593) | [9593] ###!!! ASSERTION: 2543 dynamic atom(s) with non-zero refcount: New Folder…,inactive,menu_openFile,panel-wide-item,onDOMMenuInactive,context_unpinTab,context_reloadTab,notification-anchor-icon,context-sep-ctp,addon-webext-perm-notification-content,BMB_bookmarksPopup,menu_sendLink,cmd_cut,View:ReaderView,pointerlock-warning,cmd_quitApplication,Email Image…,X,onsizemodechange,messageCloseButton,...: 'nonZeroRefcountAtomsCount == 0', file /home/jkew/mozdev/mozilla-central/xpcom/ds/nsAtom
GECKO(9593) | #01: nsACString::~nsACString() (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/nsTSubstring.h:347)
GECKO(9593) | #02: mozilla::BaseAutoLock<mozilla::Mutex>::~BaseAutoLock() (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Mutex.h:171)
GECKO(9593) | #03: mozilla::ShutdownXPCOM(nsIServiceManager*) (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:1050)
GECKO(9593) | #04: NS_ShutdownXPCOM (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:825)
GECKO(9593) | #05: ScopedXPCOMStartup::~ScopedXPCOMStartup() (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:1478)
GECKO(9593) | #06: operator delete(void*) (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/mozalloc.h:224)
GECKO(9593) | #07: XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:4905)
GECKO(9593) | #08: mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/Bootstrap.cpp:46)
GECKO(9593) | #09: do_main(int, char**, char**) (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:237 (discriminator 2))
GECKO(9593) | #10: main (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:309)
GECKO(9593) | #11: __libc_start_main (/build/glibc-bfm8X4/glibc-2.23/csu/../csu/libc-start.c:325)
GECKO(9593) | #12: _start (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox)
GECKO(9593) | #13: ??? (???:???)
GECKO(9593) | firefox: ../../../../src/cairo-hash.c:217: _cairo_hash_table_destroy: Assertion `hash_table->live_entries == 0' failed.
TEST-INFO | Main app process: killed by out-of-range signal, number 134
Buffered messages finished
238 ERROR TEST-UNEXPECTED-FAIL | accessible/tests/mochitest/test_aria_token_attrs.html | application terminated with exit code -134

I don't know whether those earlier leak messages have any connection with the cairo-hash issue (due to holding something alive beyond the expected shutdown time) or not.

However, if I run this under rr, the assertions magically go away; despite repeated attempts, it consistently exits cleanly. Which makes investigation harder... and suggests that this may be some kind of timing-dependent shutdown race, and the overhead of rr recording (on my machine, anyhow) is enough to perturb the behavior and prevent it happening.
It's... mysterious.

So if we run only the specific test on try, we leak the world as well: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d148caee5ca701aa4a7a983caa1d2218cf66b15d&filter-tier=3&selectedJob=125062322

However, if we run several tests, we end up having this assertion without leaking the world: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aebe4461764721d163c8b59870d06f869cf57159&filter-tier=3&selectedJob=125029871

That says, there may actually be two different issues on this...


It is also weird that why this happens only for debug build. If this is an assertion in system cairo (which gtk uses), then it should always assert, or never assert, if the leak happens consistently. If this is timing-dependent, then probably we are just lucky on opt builds that this situation never happens...
(In reply to Xidorn Quan [:xidorn] UTC+10 (PTO, back Aug 28) from comment #27)

> It is also weird that why this happens only for debug build. If this is an
> assertion in system cairo (which gtk uses), then it should always assert, or
> never assert, if the leak happens consistently. If this is timing-dependent,
> then probably we are just lucky on opt builds that this situation never
> happens...

IIRC, this assertion comes from (somewhere underneath) cairo_debug_reset_static_data, which I don't believe we call at all in release builds.
Bug 1393585 is another report of what I'd guess is probably the same issue. :jesup reports (on irc) similar inability to reproduce under rr.
I have repro'd it under rr -- add the -h flag when recording (chaos mode).  Unfortunately, my kernel has a bug and when I tried to replay the trace, rr failed -- I'll update my kernel/etc and try again.
Flags: needinfo?(jfkthame)
So, if I skip the whole MOZ_gdk_display_close (via undefining CLEANUP_MEMORY in toolkit/xre/nsAppRunner.cpp), the task would finish successfully without any leak detected. [1]

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=18c0d85723fbb3289843068e1409817626da898d&filter-tier=3
Actually... it doesn't seem to fail anymore! Let me see other tasks.
No longer blocks: 1387993
Karl is going to try to reproduce.
Assignee: nobody → karlt
FWIW, as I mentioned in bug 1387993 comment 4, this crash goes away for most of the tasks, probably because of bug 1383332 and/or bug 1393632 which make us do fewer parallelism. tc-M(c1) on Linux x64 seems to be the only affected task now. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=183bca8aff06dc698683c94cdea36489e21c40a5&selectedJob=126332600
And in particular, using a checking from before that bug landed (last week) might improve reproducibility.
Bug 1387490 discovered this same issue via fuzzing.  It has a small test case that might aid in reproducing, however it was found before the traversal changes that Xidorn mentions in comment 34, so you may still need to go back in time before those traversal changes for the best change at reproduction.
Bug 1393585 may also have useful STR here.
So if commenting out the assertion shows that we don't get any of our own leak reports when the assertion would have happened, that suggests the problem might be related to the memory management of GTK, GDK, or cairo objects.

Have we done anything to verify that any of those objects whose reference-counting is single-threaded are being reference-counted only on the correct threads?  It seems like reference-count races could lead to something like this.

Or might there be similar issues related to management of statics, either in our code or in those libraries?
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #40)
> So if commenting out the assertion shows that we don't get any of our own
> leak reports when the assertion would have happened, that suggests the
> problem might be related to the memory management of GTK, GDK, or cairo
> objects.

Not so fast... I probably should do a try push again with an earlier commit, because the push in comment 31 doesn't show anything simply because it doesn't fail anymore as I mentioned in comment 32, and I realized what happened after that (in comment 34).
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #41)
> Not so fast... I probably should do a try push again with an earlier commit,
> because the push in comment 31 doesn't show anything simply because it
> doesn't fail anymore as I mentioned in comment 32, and I realized what
> happened after that (in comment 34).

Alright, I just did a try push [1] with a m-c commit on Aug 21 which is supposed to still have this problem, and without triggering the assertion, we don't have any other leak, so it probably means the problem is related to other libraries we link to.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9564909c81aa6abdacdda26d17a7122bfa9f585&filter-tier=3
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #18)
> FWIW, this seems to affect non-stylo as well. See bug 1369815 where the test
> change leads to a shutdown assertion similar to bug 1387490.

https://bugzilla.mozilla.org/show_bug.cgi?id=1369815#c35 seems to indicate
that was a PresContext leak, and so is unrelated.
(In reply to Karl Tomlinson (:karlt) from comment #43)
> (In reply to Xidorn Quan [:xidorn] UTC+10 from comment #18)
> > FWIW, this seems to affect non-stylo as well. See bug 1369815 where the test
> > change leads to a shutdown assertion similar to bug 1387490.
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1369815#c35 seems to indicate
> that was a PresContext leak, and so is unrelated.

Yep, that one is unrelated.
(In reply to Karl Tomlinson (:karlt) from comment #45)
> I'm yet to reproduce this bug locally.
> 
> I've run the build from comment 42, with STYLO_THREADS=4
> STYLO_FORCE_ENABLED=1
> in the environment, against gtk, pango, cairo, and glibc libraries from
> Ubuntu
> 16.04, with Ubuntu's Ambiance theme, and failed to reproduce with steps of
> bug
> 1387490 and bug 1393585.  I've tried this with and without rr record -h.

To be clear, the build in comment 42 has the cairo leak checking turned off (with the hopes that it would allow us to reach some gecko leak checking, which didn't happen). So you wouldn't reproduce anything with that build.

> 
> I've similarly run caps/tests/mochitest/test_bug995943.xul against a recent
> local build and against a local build based on e16dba457260 (i.e. before
> changes for bug 1383332 and bug 1393632).
> 
> I did catch one world leak, but this did not trigger this assertion.
> This was with rr record -h from a local build of e16dba457260, closing the
> browser after loading https://bugzilla.mozilla.org/attachment.cgi?id=8893860.
> e10s was disabled but there was still a child (perhaps for tab previews?):

Yes, I see this sometimes as well. Would be interesting to investigate, but ideally we'd focus on the specific cairo crash.

I just grabbed a build (--enable-debug --disable-optimize --enable-stylo) from a few weeks ago that I had lying around, based on [1]. I was able to reproduce it reliably with:

./mach run --disable-e10s https://en.wikipedia.org/wiki/Barack_Obama --debugger=rr --debugger-args="record -h"

The --disable-e10s is necessary to reproduce. The backtrace isn't that useful because my system cairo doesn't have any debug symbols. Not sure if there's an easy way to install symbols after the fact?

So anyway, I have a recording, and can play human gdbserver over IRC if need be. But hopefully you can reproduce it yourself using the above. Let me know.

[1] https://hg.mozilla.org/mozilla-central/rev/d25db0546c92
Flags: needinfo?(karlt)
No longer blocks: 1387993
Thank you for that.

It seems the key to reproducing reliably is to load an input before the
Nightly Start Page.

STYLO_THREADS=4 STYLO_FORCE_ENABLED=1 ./mach run --disable-e10s 'data:text/html,<input>'

will reproduce reliably with the parent of
https://hg.mozilla.org/mozilla-central/rev/a9f5863c9d48
but not with that change for bug 1393632.

I have a recording and will investigate the cause.

(FWIW the world leak happens even with stylo off.)
Flags: needinfo?(karlt)
Blocks: 1349417
Status: NEW → ASSIGNED
Comment on attachment 8904420 [details]
bug 1384701 remove MOZ_WIDGET_GTK == 2 code from nsLookAndFeel

https://reviewboard.mozilla.org/r/176244/#review181370

(might not be the appropriate person to review this, but it looks ok to me)
Attachment #8904420 - Flags: review?(manishearth) → review+
Awesome - thanks for figuring this our Karl!
Comment on attachment 8904421 [details]
bug 1384701 get system font name and size from widget style context instead of GtkSettings

https://reviewboard.mozilla.org/r/176246/#review181384
Attachment #8904421 - Flags: review?(manishearth) → review+
Comment on attachment 8904422 [details]
bug 1384701 get system fonts in EnsureInit() which is on main thread even with servo

https://reviewboard.mozilla.org/r/176248/#review181392

looks ok to me, but I might not be the best person to review this

::: commit-message-9ae5f:16
(Diff revision 1)
> +
> +A GtkEntry already exists on the main thread, as well as style contexts for
> +most other system fonts, and so it is more efficient to create these on the
> +main thread while the style contexts exist.
> +
> +Doing this also avoids the need for Gecko_nsFont_InitSystem() to hold a global

should the lock be removed?
Attachment #8904422 - Flags: review?(manishearth) → review+
Thanks for this! I'm not overly familiar with all this code so you may want to get a font peer to review it, but it looks good from my side!
(In reply to Manish Goregaokar [:manishearth] from comment #56)
> Thanks for this! I'm not overly familiar with all this code so you may want
> to get a font peer to review it, but it looks good from my side!

Thank you for looking these over.

This is ancient code.  It's been moved around but hasn't really changed
significantly since changes for bug 176842.  The people involved are no longer
around.

I've tested that scaling works as expected with GDK_SCALE=2 in the
environment, and that the style resolves with this in
~/.config/gtk-3.0/gtk.css:

entry {
    font-family: serif;
}

dbaron more or less suggested this change in
https://bugzilla.mozilla.org/show_bug.cgi?id=96971#c0
"It would make it much easier (well, I would be able to avoid duplicating a
lot more code) to implement system fonts correctly on GTK, since for GTK we
have to create native widgets and poke at their style information, and I'd
rather not duplicate all the creation/destruction code that already exists in
nsLookAndFeelGTK for the system colors implementation."

I'm happy if someone else would like to review, but I don't think we need to
wait for dbaron to return and review this.  GTK3 is different from GTK2
anyway.
Comment on attachment 8904422 [details]
bug 1384701 get system fonts in EnsureInit() which is on main thread even with servo

https://reviewboard.mozilla.org/r/176248/#review181392

> should the lock be removed?

My point here was meant to be that, if GTK calls were to continue to be risked
off main thread from nsLookAndFeel::GetFontImpl(), then
Gecko_nsFont_InitSystem() would need to compete with
Gecko_GetLookAndFeelSystemColor() for the same lock.

Each of those functions does hold a lock, but they hold different locks, and
so are not mutually exclusive.

From the perspective of the GTK port, there is no longer a need for any lock
around calls to LookAndFeel::GetFont(), and nsRuleNode::ComputeSystemFont()
also looks thread-safe to me.

However, the WINNT port at least does not look thread-safe [3], and so the
lock can't just be removed.  From a quick look at the Mac port, it might be
thread-safe or possible to make it thread-safe, but that requires more
investigation.

[3] http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/widget/windows/nsLookAndFeel.cpp#713
Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4166c5ecc5d7
remove MOZ_WIDGET_GTK == 2 code from nsLookAndFeel r=manishearth
https://hg.mozilla.org/integration/autoland/rev/143ef903d8f6
get system font name and size from widget style context instead of GtkSettings r=manishearth
https://hg.mozilla.org/integration/autoland/rev/e711937ff6fa
get system fonts in EnsureInit() which is on main thread even with servo r=manishearth
(In reply to Pulsebot from comment #59)
> Pushed by ktomlinson@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/4166c5ecc5d7
> remove MOZ_WIDGET_GTK == 2 code from nsLookAndFeel r=manishearth

What does this have to do with this bug? Wasn't that still kept for redhat?
Flags: needinfo?(karlt)
(In reply to Mike Hommey [:glandium] from comment #60)
> What does this have to do with this bug?

The file modified here is hard to understand with all the GTK2 #ifs.
It is much easier to understand the associated code portions and changes here without the many #ifs.

Maintaining support for GTK2 in the system fonts code modified to address this
bug would require more #ifs.

> Wasn't that still kept for redhat?

https://bugzilla.mozilla.org/show_bug.cgi?id=1278282#c7 indicated that RedHat
no longer need this.
Flags: needinfo?(karlt)
See Also: → 1278282
Depends on: 1400817
You need to log in before you can comment on or make changes to this bug.