Closed Bug 1721014 Opened 3 years ago Closed 3 years ago

Firefox/WebRender crashes while rendering glyphs

Categories

(Core :: Graphics: WebRender, defect)

Firefox 90
defect

RESOLVED FIXED
92 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox90 --- wontfix
firefox91 --- fixed
firefox92 --- fixed

People

(Reporter: robert, Assigned: lsalzman)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

Using:

  • Arch Linux (up to date as of 2021-07-17)
  • FreeType freetype2-2.10.4-1
  • Either Firefox 90.0 (64-bit) from the Arch repositories, or the official Firefox Nightly 92.0a1 (2021-07-16) (64-bit)

  1. Visit https://github.com/alacritty/vtebench/blob/d3ca4c823be0a28ec33dde752398d7b48d856c08/benchmarks/unicode/symbols
  2. Use the scrollbar under the source to scroll to the right-hand side

Actual results:

Firefox segfaults. No crash is listed in about:crashes.

Using a nightly build, and a version of freetype2 built from the official Arch PKGBUILD (with debug symbols), I get the following backtrace from gdb in thread WRWorker#4:

#0  0x00007ffff5d0b675 in FT_Outline_Decompose (outline=outline@entry=0x7fffc43042f0, func_interface=func_interface@entry=0x7ffff5dcdc00 <func_interface>, user=user@entry=0x7fffc43041d0) at /usr/src/debug/freetype2/src/base/ftoutln.c:46
#1  0x00007ffff5d6dc07 in gray_convert_glyph_inner (worker=worker@entry=0x7fffc43041d0, continued=continued@entry=0) at /usr/src/debug/freetype2/src/smooth/ftgrays.c:1644
#2  0x00007ffff5d6de31 in gray_convert_glyph (worker=worker@entry=0x7fffc43041d0) at /usr/src/debug/freetype2/src/smooth/ftgrays.c:1723
#3  0x00007ffff5d6e208 in gray_raster_render (raster=<optimized out>, params=<optimized out>) at /usr/src/debug/freetype2/src/smooth/ftgrays.c:1839
#4  0x00007ffff5d6d963 in ft_smooth_raster_lcd (bitmap=0x7fffe3885e28, outline=0x7fffe3885e58, render=0x7fffa9f7e900) at /usr/src/debug/freetype2/src/smooth/ftsmooth.c:292
#5  ft_smooth_render (render=0x7fffa9f7e900, slot=0x7fffe3885d90, mode=<optimized out>, origin=<optimized out>) at /usr/src/debug/freetype2/src/smooth/ftsmooth.c:517
#6  0x00007ffff5d0fa0d in FT_Render_Glyph_Internal (library=0x7fffab8f6e90, slot=0x7fffe3885d90, render_mode=FT_RENDER_MODE_LCD) at /usr/src/debug/freetype2/src/base/ftobjs.c:4652
#7  0x00007ffff0bc5df5 in webrender::glyph_rasterizer::GlyphRasterizer::flush_glyph_requests::{{closure}} () at /tmp/firefox/libxul.so
#8  0x00007ffff1c302cd in rayon::iter::plumbing::bridge_producer_consumer::helper () at /tmp/firefox/libxul.so
#9  0x00007ffff1c17178 in rayon_core::join::join_context::{{closure}} () at /tmp/firefox/libxul.so
-- lots more rayon internals --
#3035 0x00007ffff1c33228 in <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute () at /tmp/firefox/libxul.so
#3036 0x00007ffff03652ec in rayon_core::registry::WorkerThread::wait_until_cold () at /tmp/firefox/libxul.so
#3037 0x00007ffff1c171cc in rayon_core::join::join_context::{{closure}} () at /tmp/firefox/libxul.so
#3038 0x00007ffff1c30247 in rayon::iter::plumbing::bridge_producer_consumer::helper () at /tmp/firefox/libxul.so
#3039 0x00007ffff1c31947 in <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute () at /tmp/firefox/libxul.so
#3040 0x00007ffff03652ec in rayon_core::registry::WorkerThread::wait_until_cold () at /tmp/firefox/libxul.so
#3041 0x00007ffff1e8df7b in rayon_core::registry::ThreadBuilder::run () at /tmp/firefox/libxul.so
#3042 0x00007ffff1e8ce14 in std::sys_common::backtrace::__rust_begin_short_backtrace () at /tmp/firefox/libxul.so
#3043 0x00007ffff1e8cffd in core::ops::function::FnOnce::call_once{{vtable.shim}} () at /tmp/firefox/libxul.so
#3044 0x00007ffff1ee02ca in std::sys::unix::thread::Thread::new::thread_start () at /tmp/firefox/libxul.so
#3045 0x00007ffff7f84259 in start_thread () at /usr/lib/libpthread.so.0
#3046 0x00007ffff7b315e3 in clone () at /usr/lib/libc.so.6

Expected results:

Firefox should not crash.

If it does crash, the crash should be listed in about:crashes.

The Bugbug bot thinks this bug should belong to the 'Core::Graphics: WebRender' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

Got a crash on latest Wintel x64 machine; more importantly, the browser just froze with no CPU usage.
Edit: The WER crash reporter also kicked in. Is this security-sensitive (s-s)?

Edit 2:
Doesn't reproduce in safe mode (Basic)
Doesn't reproduce with D3D11
Reproduces with SW-WR
Reproduces with HW-WR
Setting the shared font list to true/false makes no difference.

Crash Signature: [@ _chkstk | webrender::glyph_rasterizer::{{impl}}::flush_glyph_requests::{{closure}} ] [@ RtlpLowFragHeapAllocFromContext | RtlpAllocateHeapInternal | class SafeInt<T> SafeInt<T>::operator+ ]
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached file about:support

Instant crash without crash report on Xwayland, Debian Testing, Intel.

However, Nightly with the GPU process may freeze, either briefly or permanently, and shows the following in about:support:

Unable to set glyph size and transform: 23

Blocks: wr-stability
Keywords: crash, regression
OS: Unspecified → All
Hardware: Unspecified → All
See Also: → 1565588

The browser freezes in Nightly 92.0a1 on Windows 10 with Compositing: WebRender.

(Mayank Bansal from comment #3)

OS: Windows_NT 10.0 19042
Report ID: bp-bc63312f-b65f-4eb7-afb9-023180210717 [@ _chkstk | webrender::glyph_rasterizer::{{impl}}::flush_glyph_requests::{{closure}} ]
Submitted: 6 minutes ago

Report ID: bp-ade6e170-eada-434a-8a16-e4f6e0210717 [@ RtlpLowFragHeapAllocFromContext | RtlpAllocateHeapInternal | class SafeInt<T> SafeInt<T>::operator+ ]
Submitted: 7 minutes ago

Report ID: bp-ee3c10e9-a8cc-41ad-9abe-f29f00210717 [@ _chkstk | webrender::glyph_rasterizer::{{impl}}::flush_glyph_requests::{{closure}} ]
Submitted: 7 minutes ago

Report ID: bp-eb62a66e-c00d-4b81-a293-b70550210717 [@ OOM | small ]
Submitted: 7 minutes ago


Xwayland, Debian Testing, Intel

  • regular Nightly: instant browser crash without crash reporter.
  • debug build: no crash, short hang, errors in terminal: "failed to load glyph!" and "Unable to set glyph size and transform: 23".

mozregression --launch 2021-07-17 --pref gfx.webrender.all:true -a https://github.com/alacritty/vtebench/blob/d3ca4c823be0a28ec33dde752398d7b48d856c08/benchmarks/unicode/symbols -B debug

0:39.00 INFO: b'[Child 31447, Main Thread] WARNING: failed to load glyph!: file /builds/worker/checkouts/gecko/gfx/thebes/gfxFT2FontBase.cpp:634'
0:39.00 INFO: b'[Child 31447, Main Thread] WARNING: failed to load glyph!: file /builds/worker/checkouts/gecko/gfx/thebes/gfxFT2FontBase.cpp:634'
0:39.00 INFO: b'[Child 31447, Main Thread] WARNING: failed to load glyph!: file /builds/worker/checkouts/gecko/gfx/thebes/gfxFT2FontBase.cpp:634'
0:39.00 INFO: b'[Child 31447, Main Thread] WARNING: failed to load glyph!: file /builds/worker/checkouts/gecko/gfx/thebes/gfxFT2FontBase.cpp:634'
0:39.25 INFO: b'WARNING: cell content 0x7fc7799ba688 has large inline size 32695692'
0:39.25 INFO: b'WARNING: cell content 0x7fc7790fcd70 has large inline size 32695692'
0:40.02 INFO: b'[Child 31538, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7046'
0:44.36 INFO: b'[GFX1-]: Unable to set glyph size and transform: 23'
0:44.36 INFO: b'[GFX1-]: Unable to set glyph size and transform: 23'
0:44.36 INFO: b'[2021-07-17T15:26:56Z ERROR webrender::platform::unix::font] Unable to set glyph size and transform: 23'
0:44.36 INFO: b'[2021-07-17T15:26:56Z ERROR webrender::platform::unix::font] Unable to set glyph size and transform: 23'

From https://searchfox.org/mozilla-central/rev/740e77e6aec278385381ba9c22f9d88d91c2b858/gfx/thebes/gfxFT2FontBase.cpp#634:

// FT_Face was somehow broken/invalid? Don't try to access glyph slot.
// This probably shouldn't happen, but does: see bug 1440938.
NS_WARNING("failed to load glyph!");

Flags: needinfo?(lsalzman)
See Also: → 1440938

Suspected regression from bug 1664084. I ran mozregression and that bug was in the range, but once the bisection moved into autoland builds, mozregression's final output was incorrect.

Crash Signature: [@ _chkstk | webrender::glyph_rasterizer::{{impl}}::flush_glyph_requests::{{closure}} ] [@ RtlpLowFragHeapAllocFromContext | RtlpAllocateHeapInternal | class SafeInt<T> SafeInt<T>::operator+ ] → [@ _chkstk | webrender::glyph_rasterizer::{{impl}}::flush_glyph_requests::{{closure}} ] [@ RtlpLowFragHeapAllocFromContext | RtlpAllocateHeapInternal | class SafeInt<T> SafeInt<T>::operator+ ] [@ ClientSideCacheContext::FindInSharedCache ] [@ dwrote::g…
See Also: → 1721155

Rayon's collect primitive is somehow establishing recursive dependencies on
waiting for task completion, such that if a lot of glyph jobs are submitted
all at once, this can result in huge recursive stack chains. This simplifies
the glyph job queuing to just send everything immediately over the result
channel, rather than waiting for all jobs in the batch (via collect). We then
rely upon sorting upon receipt to put everything back in a sane order.
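
The approach described above (send each result over a channel as soon as it is ready, then sort on receipt) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: it uses plain `std::thread` workers and `std::sync::mpsc` in place of rayon and WebRender's own result channel, and `GlyphResult`, `rasterize` and `rasterize_batch` are hypothetical names that do not come from WebRender.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a rasterized glyph (not WebRender's real type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GlyphResult {
    /// Position of the request within the submitted batch.
    pub index: usize,
    /// Placeholder for the rasterizer's output.
    pub coverage: u32,
}

/// Hypothetical placeholder for the expensive rasterization step.
fn rasterize(index: usize, glyph_id: u32) -> GlyphResult {
    GlyphResult { index, coverage: glyph_id.wrapping_mul(31) }
}

/// Rasterize a batch on `workers` threads. Each worker sends every result
/// over the channel immediately instead of collecting the batch first; the
/// receiver restores the original order by sorting on `index`.
pub fn rasterize_batch(glyph_ids: &[u32], workers: usize) -> Vec<GlyphResult> {
    let (tx, rx) = mpsc::channel::<GlyphResult>();
    let workers = workers.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` would panic.
    let chunk_len = ((glyph_ids.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        for (chunk_no, chunk) in glyph_ids.chunks(chunk_len).enumerate() {
            let tx = tx.clone();
            scope.spawn(move || {
                for (offset, &glyph_id) in chunk.iter().enumerate() {
                    let index = chunk_no * chunk_len + offset;
                    // No waiting on sibling jobs: send as soon as it is done.
                    tx.send(rasterize(index, glyph_id))
                        .expect("receiver is still alive");
                }
            });
        }
    });

    // Drop the last sender so the receiving iterator terminates.
    drop(tx);

    // Results arrive in completion order; sort to recover batch order.
    let mut results: Vec<GlyphResult> = rx.iter().collect();
    results.sort_by_key(|r| r.index);
    results
}
```

In this sketch, calling `rasterize_batch(&ids, 4)` returns the results ordered by `index` regardless of which worker finished first, and no job ever blocks on the completion of another job in the batch.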

Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2920d37e1da0
Avoid use of rayon par_iter collect in WR glyph resolution. r=gw
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

Comment on attachment 9232462 [details]
Bug 1721014 - Avoid use of rayon par_iter collect in WR glyph resolution. r?gw

Beta/Release Uplift Approval Request

  • User impact if declined: Potential crashes on all platforms when rendering large lines of text.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): We have had the patch living in nightly for a week, and it seems to have not caused any regressions so far.
  • String changes made/needed:
Flags: needinfo?(lsalzman)
Attachment #9232462 - Flags: approval-mozilla-beta?

Comment on attachment 9232462 [details]
Bug 1721014 - Avoid use of rayon par_iter collect in WR glyph resolution. r?gw

We have almost no crashes on beta, so this doesn't seem like a problem worth fixing in 91. Also, the patch caused a perf regression and today is the last beta, so I think we can just let it ride to 92. Thanks.

Attachment #9232462 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

It's worth noting that crashes may be under-reported, at least on Linux: Firefox crashes, but no crash report is generated (see the original report).

(In reply to Pascal Chevrel:pascalc from comment #12)

For my part, I believe the crash is the bigger concern, and I would prefer to have it fixed. The perf regression appears to affect only macOS 10.14 (not 10.15+) and no other platforms. Given its limited scope, I do not believe the perf regression outweighs the stability concerns.

(In reply to Lee Salzman [:lsalzman] from comment #14)

Given that we don't actually know the number of actual crashes and that you think this is a bigger problem than it looks, I am going to trust your expertise on this and take the patch into beta, thanks!

Attachment #9232462 - Flags: approval-mozilla-beta- → approval-mozilla-beta+