Closed Bug 1647035 Opened 7 months ago Closed 4 months ago

Missing characters with Webrender on Linux (independent of Mesa/llvmpipe/proprietary Nvidia)

Categories

(Core :: Graphics: WebRender, defect)

77 Branch
Desktop
Linux
defect

VERIFIED FIXED
83 Branch
Tracking Status
firefox77 --- disabled
firefox78 --- disabled
firefox79 --- disabled
firefox81 --- disabled
firefox82 --- disabled
firefox83 --- fixed

People

(Reporter: do-not-mail-the-lion, Assigned: nical)

Details

(Keywords: correctness)

Attachments

(7 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0

Steps to reproduce:

Actual results:

Text was missing letters as shown in screenshot attachment

Expected results:

Text should appear normally.

OS: Unspecified → Linux
Hardware: Unspecified → Desktop

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

Thanks for the report! Please open about:support, click on "copy text to clipboard" and paste it here.

Attached file about:support text

Is this problem easy to reproduce for you or do you need luck and time?
Can you reproduce this bug right after Firefox startup? How many tabs do you have? Do you use multiple windows? (bug 1553578)

Please try disabling all dmabuf prefs on about:config, restart Firefox and use it for a while to check if the bug doesn't come back.

(Your configuration is different from regular Firefox: Flatpak, Wayland (bug 1543600; instead of XWayland) and DMABUF are all experimental and not even enabled in Firefox Nightly yet.) Mesa 20.0.5 (bug 1634213) had an Intel regression which was fixed with Mesa 20.0.8 and 20.1.0.

Blocks: wr-linux
Keywords: correctness

Very easy to reproduce, no luck. Had other people test it too.

It's reproducible regardless of startup, tabs and windows.

Also reproducible with XWayland window and DMABUF options set to false.

Can you reproduce it with https://nightly.mozilla.org as well?

Yeah, reproducible with Nightly using WebRender. Worth mentioning that this is using system Mesa 20.0.7.

Hmm... Could you try to test older Firefox (Nightly) versions?
$ pip3 install --upgrade mozregression
First check if you can reproduce with recent Nightly:
$ ~/.local/bin/mozregression --launch 2020-06-19 --pref gfx.webrender.all:true -a https://www.gnome.org/challenge/faq/
Then try to find a problem-free version. Go back version by version.
$ mozregression --launch 74 --pref gfx.webrender.all:true -a https://www.gnome.org/challenge/faq/
If you encounter a problem-free version, you could try to find a regression range like this:
$ mozregression --good 67 --bad 69 --pref gfx.webrender.all:true -a https://www.gnome.org/challenge/faq/
If you get tab crashes on load, append " security.sandbox.content.level:0" to pref, to disable the sandbox.
https://wiki.mozilla.org/Release_Management/Calendar
If you can't find a regression range, it might be caused by a problem outside of Firefox.

From 67 to 69, we had another bad text rendering which I will upload. But it seems that the text that we have today in 77 was present all the way back in 70. I'm unsure what this indicates.

Perhaps it will be fixed with newer Mesa. It seems my machine has 20.1.1 as an update on the host system, so I'll try that with mozregression. (It will still be broken in the Flatpak until freedesktop updates.) Will report back after a reboot.

Mesa 20.1.1 did not fix it.

With llvmpipe it looks totally fine, right?
$ LIBGL_ALWAYS_SOFTWARE=1 mozregression --launch 2020-06-19 --pref gfx.webrender.all:true -a https://www.gnome.org/challenge/faq/

That doesn't work either. I guess this shows it's not a mesa bug? But it's still linux-specific since I've had someone try on windows and it worked fine.

Lee, fonts seem broken on this user's Fedora 32. What could help to narrow this down further?

Flags: needinfo?(lsalzman)
Severity: -- → S3

I've had someone test Nvidia too, so it seems to affect all Linux users using WebRender on both X and Wayland. Feel free to change the bug summary to reflect that. (We also tested llvmpipe, so I think the summary should already have been changed.)

Has no one at Mozilla been able to reproduce this? I can provide more information if needed for reproduction purposes.

Reproduced this on the latest FF Beta 79b7 on Arch with mesa 20.1.3

Reproduced on the same laptop as above (Optimus laptop) on an X11 native session with Nvidia. I think this should rule out any mesa/Intel/Wayland concerns.

from comment 16 (Beta 79/Gnome Wayland/Mesa/Arch Linux):

(#0) Error: Unable to load glyph: 3
etc.

from comment 17 (Beta79/Gnome X11/Proprietary Nvidia/Arch Linux):

Failure Log
(#0): GP+[GFX1-]: Unable to load glyph: 3
(#233): GP+[GFX1-]: Unable to load glyph: 3
(#234): GP+[GFX1-]: Unable to load glyph: 3
(#235): GP+[GFX1-]: Unable to load glyph: 3
etc.

https://searchfox.org/mozilla-central/rev/82c04b9cad5b98bdf682bd477f2b1e3071b004ad/gfx/wr/webrender/src/platform/unix/font.rs#530
This error was added by bug 1562892.

Summary: Missing characters with Webrender on AMD or Intel systems → Missing characters with Webrender on Linux (independent of Mesa/llvmpipe/proprietary Nvidia)

I can't reproduce this at all on Debian. I looked at bug 1562892, but I don't believe that has any relation to this, as there was only a superficial change to the error handling code.

Flags: needinfo?(lsalzman)

(In reply to Lee Salzman [:lsalzman] from comment #19)

I can't reproduce this at all on Debian. I looked at bug 1562892, but I don't believe that has any relation to this, as there was only a superficial change to the error handling code.

I've had some more people test it, and one of them was a Debian user. Since the same version of fontconfig is shipped on both OSs, we think it might be related to the version of the Cantarell font being shipped, which breaks subtly with FreeType. Might be worth looking into.

Jonathan, does this reproduce for you on WebRender + Linux?

Flags: needinfo?(jfkthame)

I'm not seeing such visual bugs, but I've also seen a font error message today. This may be completely offtopic:
With MOZ_X11_EGL=1 (may be irrelevant) on Xwayland, Debian Testing I saw this in my console once:

[2020-07-20T20:41:47Z ERROR webrender::platform::unix::font] Unable to set glyph size and transform: 23

The 3 above and my 23 seem to correspond to these FreeType error codes:
https://www.freetype.org/freetype2/docs/reference/ft2-error_code_values.html

FT_ERRORDEF_( Invalid_File_Format, 0x03, "broken file" )

FT_ERRORDEF_( Invalid_Face_Handle, 0x23, "invalid face handle" )
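To make the mapping easier to check, here is a minimal Python sketch that decodes a logged code against a small subset of the FreeType error table linked above. The helper name `describe_ft_error` is hypothetical, and the sketch assumes the log prints the code in decimal, which this thread does not confirm. Under that assumption, 23 would be 0x17 (Invalid_Pixel_Size) rather than 0x23, so both readings are shown.

```python
# Subset of the FreeType error table (fterrdef.h); only a few entries
# are listed here, this is not the full table.
FT_ERRORS = {
    0x03: ("Invalid_File_Format", "broken file"),
    0x17: ("Invalid_Pixel_Size", "invalid pixel size"),
    0x23: ("Invalid_Face_Handle", "invalid face handle"),
}


def describe_ft_error(code: int) -> str:
    """Return a readable description of a FreeType error code."""
    name, text = FT_ERRORS.get(code, ("Unknown", "not in this subset"))
    return f"0x{code:02X} ({code}): {name}, {text}"


if __name__ == "__main__":
    # Assumption: the log line prints the error code in decimal.
    print(describe_ft_error(3))
    print(describe_ft_error(23))
    # Alternative reading: the digits "23" are hexadecimal.
    print(describe_ft_error(int("23", 16)))
```

Either way, the 3 from the failure logs is Invalid_File_Format; only the reading of the 23 depends on the number base.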

(In reply to Lee Salzman [:lsalzman] from comment #21)

Jonathan, does this reproduce for you on WebRender + Linux?

I don't see missing characters, the text all seems to be there just fine.

What I do see is that the regular vs bold styles of Cantarell seem to be getting mixed up: the text that should be styled as weight:normal is appearing bold, and if I use the Inspector to increase font-weight to bold, it becomes regular! This may be related to my having a variable-font version of Cantarell installed locally... ah, yes: after removing that font, I get the expected faces (from the site's @font-face resources).

Do you have a Cantarell font installed locally? If so, what exact font files are installed?

Flags: needinfo?(jfkthame)
Flags: needinfo?(do-not-mail-the-lion)
Flags: needinfo?(Vash63)
Blocks: wr-linux-mvp
No longer blocks: wr-linux
Flags: needinfo?(Vash63)

That package provides both the variable font Cantarell-VF.otf and a set of individual styled faces (Cantarell-Regular.otf, Cantarell-Bold.otf, etc). It seems a bit rash to install both the VF and the individual faces, as this leads to ambiguity: they have the same family name, and the variable font exposes styles that match (or should we say clash with?) the individual faces, making it unclear what should be used.

With all these installed on my Linux machine, I still don't see any problem of missing glyphs, but I do get confused styles -- bold where it should be regular, and vice versa. This seems to be a result of the VF file being present; if I remove that and just leave the individual explicitly-styled faces, things look fine.

No idea about the missing characters, though; I can't reproduce that at all.
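For anyone who wants to check their own machine for the VF/static overlap described above, here is a rough Python sketch. The function names are hypothetical, the directories scanned are only common Linux font locations (an assumption, your distribution may differ), and the check relies purely on the file names mentioned in this bug (Cantarell-VF.otf versus Cantarell-Regular.otf and friends).

```python
from pathlib import Path
from typing import Iterable, List


def find_cantarell_files(font_dirs: Iterable[str]) -> List[str]:
    """Collect the names of Cantarell .otf files under the given directories."""
    names: List[str] = []
    for d in font_dirs:
        root = Path(d).expanduser()
        if root.is_dir():
            names.extend(p.name for p in root.rglob("Cantarell*.otf"))
    return sorted(names)


def has_vf_and_static(names: Iterable[str]) -> bool:
    """True if both the variable font and individual styled faces are present."""
    is_vf = [n.lower().startswith("cantarell-vf") for n in names]
    return any(is_vf) and not all(is_vf)


if __name__ == "__main__":
    # Assumed locations; adjust for your system.
    found = find_cantarell_files(["/usr/share/fonts", "~/.fonts", "~/.local/share/fonts"])
    print("Cantarell files:", found)
    print("VF and static faces both installed:", has_vf_and_static(found))
```

This only reports which files are installed; it says nothing about which face fontconfig actually selects.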

I wonder if the workaround we have on some Android devices and on macOS with AMD GPUs, enforcing a large horizontal alignment in the texture atlas, would help here.

I tried both of these builds and still reproduced the issue on the Intel/Wayland configuration. Didn't try the Nvidia/X11 setup.

See Also: → 1657932
See Also: → 1658423
Flags: needinfo?(aosmond)

@aosmond: Can you repro?

See Also: 1658423
See Also: → 1513185

Could you try this build: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/FEGcsVvWTYuavx8o23pG9w/runs/0/artifacts/public/build/target.tar.bz2

It contains a number of assertions in the text rendering code. If you can run it from a terminal and see what comes out in the log, the hope is that on your configuration it will crash right away, and we'll be able to narrow down the search knowing which assertion failed.

Flags: needinfo?(Vash63)

The only output I get from that when viewing the test site w/ WebRender enabled is:

ExceptionHandler::GenerateDump cloned child 8800
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.

It does indeed crash as soon as I try to load the test page.

Flags: needinfo?(Vash63)

Thanks for testing it. Now we know that there is a good chance that the issue is either related to a specific version of freetype or specific system fonts.
It's a bit frustrating that the assertion message didn't show up. It seems that it doesn't for release builds but does for debug builds, so here is a debug build of the same revision:

https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/WuNejuBtSF-8zP4wc5SdmA/runs/0/artifacts/public/build/target.tar.bz2

Could you try this one and report what you see in the log?

Flags: needinfo?(Vash63)
See Also: → 1664003
Attached file output.log

The browser hung when trying to load the page, but there is plenty in the log this time. Also interesting but probably unrelated: this build wouldn't work for me with EGL enabled at all (it failed to create a context), whereas normal Nightly works fine with MOZ_X11_EGL=1.

This log is after unsetting MOZ_X11_EGL and then enabling webrender.

Flags: needinfo?(Vash63)
See Also: → 1666078
Duplicate of this bug: 1666078

I can reproduce this on one of my laptops. It's running Fedora (latest stable) on an XPS 13.

The issue is happening specifically with the regular Cantarell system font (/usr/share/fonts/cantarell/Cantarell-VF.otf) only when using font variations. If I change the code to use the default face, then text renders (although I suppose not with the desired settings).

FreeType detects an invalid format in the font when running cff_blend_build_vector. The code is:

1401│     /* protect against malformed fonts */
1402│     if ( !( lenNDV == 0 || NDV ) )
1403│     {
1404│       FT_TRACE4(( " cff_blend_build_vector:"
1405│                   " Malformed Normalize Design Vector data\n" ));
1406│       error = FT_THROW( Invalid_File_Format );
1407|       goto Exit;
1408│     }

With lenNDV = 1 and NDV being a null pointer.
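To spell out why that combination trips the check, here is a small Python model of the condition `!( lenNDV == 0 || NDV )`, with `None` standing in for a null pointer. It is only an illustration of the boolean logic, not FreeType code, and the name `check_ndv` is made up for this sketch.

```python
from typing import Optional, Sequence

FT_ERR_OK = 0x00
FT_ERR_INVALID_FILE_FORMAT = 0x03


def check_ndv(len_ndv: int, ndv: Optional[Sequence[int]]) -> int:
    """Model of the C guard `!( lenNDV == 0 || NDV )`; None plays the role of NULL."""
    if not (len_ndv == 0 or ndv is not None):
        # A non-zero coordinate count with no coordinate array is rejected.
        return FT_ERR_INVALID_FILE_FORMAT
    return FT_ERR_OK


if __name__ == "__main__":
    # The failing case from this bug: one axis reported, but no coordinates.
    print(check_ndv(1, None))
    # Cases that pass the guard: no axes at all, or axes with coordinates.
    print(check_ndv(0, None))
    print(check_ndv(1, [0]))
```

So the guard only fires when a non-zero axis count is paired with a missing coordinate array, which is exactly the state observed here.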

The stacktrace is:

#0  cff_blend_build_vector (blend=<optimized out>, vsindex=<optimized out>, lenNDV=<optimized out>, NDV=<optimized out>) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/cff/cffload.c:1407
#1  0x00007ffff5760d30 in cff_parse_blend (parser=<optimized out>) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/cff/cffparse.c:1012
#2  0x00007ffff5762687 in cff_parser_run (parser=<optimized out>, start=<optimized out>, limit=<optimized out>) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/cff/cffparse.c:1576
#3  0x00007ffff5762a1a in cff_load_private_dict (font=<optimized out>, subfont=<optimized out>, lenNDV=<optimized out>, NDV=<optimized out>) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/cff/cffload.c:1925
#4  0x00007ffff57b5e97 in cf2_font_setup (transform=0x7fffb23cb1d0, font=0x7fff94045800) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/psaux/psfont.c:300
#5  cf2_getGlyphOutline (glyphWidth=<synthetic pointer>, transform=0x7fffb23cb1d0, charstring=0x7fffb23cb1f0, font=0x7fff94045800) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/psaux/psfont.c:511
#6  cf2_decoder_parse_charstrings (decoder=<optimized out>, charstring_base=<optimized out>, charstring_len=<optimized out>) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/psaux/psft.c:435
#7  0x00007ffff5761632 in cff_slot_load (glyph=<optimized out>, size=<optimized out>, glyph_index=<optimized out>, load_flags=<optimized out>) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/cff/cffgload.c:441
#8  0x00007ffff573be38 in FT_Load_Glyph (face=0x7fff93cbb800, glyph_index=360, load_flags=1114624) at /usr/src/debug/freetype-2.10.2-1.fc32.x86_64/src/base/ftobjs.c:949

The value of lenNDV comes from tt_get_var_blend, where num_coords is set to 1, read from face->blend->num_axis. NDV comes from face->blend->normalizedcoords in the same function; however, I'm having trouble tracing where these come from because the code is optimized out. (This is all happening in the system-provided version of FreeType, which in my case is 2.10.2. I also checked that the C++ code is using the same system-provided FreeType version.)

Interestingly, I get this issue to a different degree every time I reload that page. Sometimes a large portion of the glyphs end up being rendered, sometimes none of them do, and all variations in between. Every time I run into the same if ( !( lenNDV == 0 || NDV ) ) branch when FT_Load_Glyph fails.

I was able to reproduce this on a Fedora 32 install, but not on my main Debian Testing machine.

One thing I did notice from looking at the Gecko code is that it calls FT_Get_Var_Design_Coordinates [1] and skips adding variations if that function fails. From what I could see, we never call this in the WR font implementation.

I wonder if in regular Gecko this function fails, resulting in no variations being added and thus not hitting this code path?

[1] https://searchfox.org/mozilla-central/source/gfx/2d/UnscaledFontFreeType.cpp#129

Status: UNCONFIRMED → NEW
Ever confirmed: true

FT_Get_Var_Design_Coordinates doesn't fail on this test case for me.

However, if I call FT_Get_Var_Design_Coordinates on the original face before calling FT_Set_Var_Design_Coordinates on the new face with the variations, even if I ignore the result, the page renders correctly.

It's a bit odd, but I suppose FreeType does some mutation in there and expects that we always call this function on a face when using font variations (we don't in the parent process because we have already called it in the content process on a face built from the same font).

The bug doesn't affect every user. It affects at least Fedora users running FreeType 2.10.2 with the system font Cantarell regular. If FT_Get_Var_Design_Coordinates isn't called on the original FT_Face before FT_Set_Var_Design_Coordinates is called, the new font face with variations gets into an invalid state, causing FT_Load_Glyph to fail with the Invalid_File_Format error (0x3). FT_Get_Var_Design_Coordinates was not called in the parent/GPU process because it is done on a face built from the same font in the content process. The workaround is to call it and ignore the result.
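The call order described above can be sketched roughly as follows. This is not the actual patch: `apply_variations`, `fake_get` and `fake_set` are hypothetical stand-ins for whatever binding wraps FT_Get_Var_Design_Coordinates and FT_Set_Var_Design_Coordinates, and the sketch only shows the ordering (query the original face first, discard the result, then set the coordinates on the new face).

```python
from typing import Any, Callable, List, Sequence, Tuple


def apply_variations(
    get_coords: Callable[[Any, int], Any],
    set_coords: Callable[[Any, Sequence[int]], int],
    original_face: Any,
    new_face: Any,
    coords: Sequence[int],
) -> int:
    """Query the original face first (result ignored), then set coords on the new face."""
    # Workaround step: the return value is deliberately discarded.
    get_coords(original_face, len(coords))
    return set_coords(new_face, coords)


# Hypothetical recorders used only to show the order of the two calls.
log: List[Tuple[str, str]] = []


def fake_get(face: str, count: int) -> List[int]:
    log.append(("get", face))
    return [0] * count


def fake_set(face: str, coords: Sequence[int]) -> int:
    log.append(("set", face))
    return 0


if __name__ == "__main__":
    apply_variations(fake_get, fake_set, "original", "with-variations", [400])
    print(log)
```

The point is only that the getter runs on the original face before the setter runs on the new one; what FreeType does internally in that getter is not established in this thread.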

Assignee: nobody → nical.bugzilla
Status: NEW → ASSIGNED

Are you also escalating this to the FreeType bug tracker?

Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5f7108650a79
Avoid incorrect font variation initialization. r=gw,lsalzman
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

Vash63, could you try the latest Nightly and confirm that the bug is fixed on your configuration?

Flags: needinfo?(Vash63)
Flags: needinfo?(aosmond)
Duplicate of this bug: 1657932

Yep, I can confirm it's fixed!

Flags: needinfo?(Vash63)
Status: RESOLVED → VERIFIED
Flags: needinfo?(do-not-mail-the-lion)