Closed Bug 1554819 Opened 9 months ago Closed 3 months ago

Crash in fontlist::Pointer::ToPtr

Categories

(Core :: Layout: Text and Fonts, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 --- disabled
firefox70 --- disabled
firefox71 --- disabled
firefox72 --- fixed

People

(Reporter: hiro, Assigned: jfkthame)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

Today, I enabled gfx.e10s.font-list.shared, and saw two crashes so far.

bp-50ad5f4e-7c57-4f0b-b750-f2b7e0190527
bp-5ccc1039-972e-47b2-a4dc-2273d0190527

More.

bp-04a11ca2-5d59-4268-8a3c-f2e4f0190527
bp-6a84a36b-0cdd-421c-a98e-59fe00190527

The crash seems to frequently happen but unpredictable at all, all the crashes happened when I didn't touch the browser at all, e.g. during typing something in gnome-terminal or some such.
Anyways, font sharing doesn't work on WebRender yet?

All these seem to be happening at the same place during font-matching:

0 	libxul.so 	mozilla::fontlist::Pointer::ToPtr(mozilla::fontlist::FontList*) const 	clang/include/c++/6.4.0/bits/atomic_base.h:396 	context
1 	libxul.so 	mozilla::fontlist::Family::SearchAllFontsForChar(mozilla::fontlist::FontList*, GlobalFontMatch*) 	gfx/thebes/SharedFontList.cpp:301 	cfi
2 	libxul.so 	gfxFontGroup::FindFallbackFaceForChar(mozilla::fontlist::Family*, unsigned int) 	gfx/thebes/gfxTextRun.cpp:2727 	cfi
3 	libxul.so 	gfxFontGroup::FindFontForChar(unsigned int, unsigned int, unsigned int, mozilla::unicode::Script, gfxFont*, FontMatchType*) 	gfx/thebes/gfxTextRun.cpp:0 	cfi

Thanks for the report, will look into it...

Assignee: nobody → jfkthame

FYI, I've seen a couple of these after enabling the pref too. Usually when switching between tabs, but nothing reproducible so far.

Crash Signature: @ mozilla::fontlist::Pointer::ToPtr → [@ mozilla::fontlist::Pointer::ToPtr ]
Keywords: crash

Seems like this is null implying embedded nulls in facePtrs, which seems to imply uninitialized shared memory and/or an out of bounds read that just happens to be in mapped memory (or possibly that's valid and we're just not checking). Note: facePtrs itself can be null when there's an error in the call to Faces so we should probably check that as well.

It's not yet clear to me what's leading to this, but I do think we should null-check the result of Faces(), which currently we fail to do in a couple of places. If that fails it seems possible it could lead to a crash like this.

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cf1abd5f97b7
Add a couple of missing null-checks in SharedFontList.cpp. r=jwatt

We should probably leave this open for a while until we see if the patch above actually makes any difference.

Keywords: leave-open

I have enabled the pref since yesterday, it was pretty stable yesterday, but this morning I got this crash;
bp-1c741652-dbeb-437f-9753-1f3170190613

(In reply to Hiroyuki Ikezoe (:hiro) from comment #11)

I have enabled the pref since yesterday, it was pretty stable yesterday, but this morning I got this crash;
bp-1c741652-dbeb-437f-9753-1f3170190613

Is there any chance you can reproduce with a debug build? One of our assertions might give more details on what's going on. Something like this should get you up and running with a debug build and a clean profile if you don't have easy access to one:

cd /tmp
virtualenv venv && source venv/bin/activate
pip install mozdownload mozinstall mozrunner
mozdownload --debug-build -t daily && mozinstall *.bz2 && mozrunner -b firefox/firefox --pref=gfx.e10s.font-list.shared:True
Flags: needinfo?(hikezoe)

As I mentioned in comment 1, the crash is unpredictable at all. In the case of the crash in comment 11, it happened when I didn't touch the browser window at all. Anyway I will try.

Hmm, one thing I notice is that all the crash reports here seem to involve XUL frames in some way... e.g. your most recent one crashed under nsTextBoxFrame::DrawText. (Just an observation at this point; I don't know why it would matter. gfxFontGroup::FindFontForChar shouldn't care what kind of frame we're dealing with, it just seems an odd correlation.)

Blocks: 1533462

(In reply to Eric Rahm [:erahm] from comment #12)

cd /tmp
virtualenv venv && source venv/bin/activate
pip install mozdownload mozinstall mozrunner
mozdownload --debug-build -t daily && mozinstall *.bz2 && mozrunner -b firefox/firefox --pref=gfx.e10s.font-list.shared:True

Today, I realized the pref isn't set properly. The pref value should be true instead of True.

It's pretty hard to catch this crash on debug builds. Actually I got two crashes on the download builds by mozdownload in comment 12, but in both cases there is nothing suspicious message and it happened suddenly and it seems the crash happened in the parent process so that there is nothing I could do. Also, I started using local debug builds, but it's also really painful, it's crashed by other assertions. :/ Anyways, I am still trying to catch some clues on this.

I just realized that this crash likely happens while running wpt on an Android emulator x86_64. bp-38b1150b-97bf-4cf9-9fb6-ec8cf0190627 and bp-01b28ff8-b7da-4e64-98ea-059bb0190627 are such crashes (unfortunately it's on an opt build). And then I tried a local debug build, I got a content crash. Oddly there was no backtraces appeared so I am not sure it's caused by this ToPtr() issue or not, but I found suspicious messages on console;

[GPU 10652, Compositor] WARNING: bad Shmem: file /home/hiro/central/ipc/glue/ProtocolUtils.cpp, line 531

There were 16 warnings. Interestingly the warnings seem to be output a while ago the crash happens (maybe a few tens of seconds).

Also, FWIW, when I run wpt on Android emulator, any contents inside the firefox instance becomes irresponsible just like bug 1495900 (see bug 1495900 comment 26)

The crash was also triggered by launching Chrome. And this time I got an assertion;

Assertion failure: block < aFontList->mBlocks.Length(), at /home/hiro/central/gfx/thebes/SharedFontList.cpp:60

I am pretty sure that this crash is pretty much same as this bug. Also note that before this assertion I also got the 'bad Shmem' warnings.

That's said, I suppose the assertion doesn't provide useful information so much, memory corruption happened before we reached to the assertion?

Jonathan, is there anything what I can do for you?

Flags: needinfo?(hikezoe) → needinfo?(jfkthame)

Adding the crash signature for comment 18. There are a few of these crashes in the 6-26 Nightly.

Crash Signature: [@ mozilla::fontlist::Pointer::ToPtr ] → [@ mozilla::fontlist::Pointer::ToPtr ] [@ InvalidArrayIndex_CRASH | mozilla::fontlist::Pointer::ToPtr ]
Duplicate of this bug: 1581715

Steps to reproduce:

  1. Set gfx.e10s.font-list.shared to true.
  2. Restart Firefox.
  3. Download Font Loader.
  4. Download Franklin Gothic Book Regular.ttf.
  5. Open http://linux.voyager.hr/grsec_last_stable/changelog-stable2.txt.
  6. Press Shift-Reload.
  7. Open the Font Loader, Click on the Add Fonts button, Select the font file Franklin Gothic Book Regular.ttf then click Open.
  8. Click on the Load button.

Actual results:

The tab crashed.

Crash report: bp-dc902d43-a705-403b-87d6-749fd0190917

Top 10 frames of crashing thread:

0 xul.dll mozilla::fontlist::Pointer::ToPtr gfx/thebes/SharedFontList.cpp:40
1 xul.dll mozilla::fontlist::Family::SearchAllFontsForChar gfx/thebes/SharedFontList.cpp:304
2 xul.dll class gfxFont* gfxFontGroup::FindFallbackFaceForChar gfx/thebes/gfxTextRun.cpp:2714
3 xul.dll gfxFontGroup::FindFontForChar gfx/thebes/gfxTextRun.cpp
4 xul.dll gfxFontGroup::InitScriptRun<char16_t> gfx/thebes/gfxTextRun.cpp:2486
5 xul.dll static void gfxFontGroup::InitTextRun<char16_t> gfx/thebes/gfxTextRun.cpp:2408
6 xul.dll gfxFontGroup::MakeTextRun gfx/thebes/gfxTextRun.cpp:2280
7 xul.dll BuildTextRunsScanner::BuildTextRunForFrames layout/generic/nsTextFrame.cpp:2482
8 xul.dll BuildTextRunsScanner::FlushFrames layout/generic/nsTextFrame.cpp:1640
9 xul.dll BuildTextRunsScanner::ScanFrame layout/generic/nsTextFrame.cpp:1964

I was able to reproduce the crash from comment 21 a few days ago, but this no longer reproduces for me with latest Nightly; blinky, does it still reproduce for you, or is it resolved?

Hiro, are you still able to trigger this on current builds with any consistency?

Flags: needinfo?(over68)
Flags: needinfo?(jfkthame)
Flags: needinfo?(hikezoe.birchill)

I can reproduce with the latest Nightly build.

bp-d71e994e-836e-42b7-a772-389ab0190923

Flags: needinfo?(over68)

Hmm, that's frustrating.... trying to reproduce (on Win10) with that page, I saw it crash once, but now I'm unable to get it to crash again despite many attempts. It feels like perhaps there's something very timing-sensitive about it...

Is the crash reliably reproducible for you on Win7, or is it erratic there as well?

Not always reproducible from the first time, try several times, before every trying press Shift-Reload.

OK, thanks. I'll try again once my new debug build is ready and see if I can get to the root of this.

Note the crash cannot be reproduced if the file is run locally.

I find it difficult to reproduce the crash on Win10, I got one bp-9ce0df3e-79b0-477d-8c2c-3a3170190923.

FWIW, I haven't seen this crash since last Thursday. This is the last crash on my side; bp-07e92f16-2bc3-4a93-9a7e-12c380190919
Before the last Thursday I had been seeing the crash when I launched Chrome or run wpt or some such. But now it just causes long hang (bug 14910010).

Flags: needinfo?(hikezoe.birchill)

I can reproduce the crash on Win10 in https://archive.org/stream/ChineseCharacters/Chinese%20Characters_djvu.txt with the steps in comment 21.

Crash report from Windows 10: bp-41fc5d43-626c-48a7-a8ae-5b68f0190925
Crash report from Windows 7: bp-46a9115c-cd38-487c-9504-95e2d0190925

Can you still reproduce this with latest Nightly (since other font-list fixes have landed)? I'm currently not succeeding in reproducing locally, but it's always been unreliable for me (probably dependent on various factors involving installed fonts, timing, etc.)

Flags: needinfo?(over68)

I can reproduce with the latest Nightly build.

bp-c7bb4fc5-11b2-4f2d-9935-c58ea0191004

Flags: needinfo?(over68)

OK, thanks ... will keep looking into it, then.

Steps to reproduce 2:

  1. Open this page.
  2. Click on the icon on the right side at the bottom of the page to open the paragraph.
  3. Open the Font Loader, Click on the Add Fonts button, Select the font file Franklin Gothic Book Regular.ttf then click Open.
  4. Click on the Load button.
  5. Scroll to the bottom of the page.
  6. Scroll to the bottom of the paragraph.

See https://youtu.be/RtR46ToRlMs

Actual results:

The tab crashed.

Crash report: bp-2f7f8dc2-3dab-4300-af20-686000191007

Top 10 frames of crashing thread:

0 xul.dll mozilla::fontlist::Pointer::ToPtr gfx/thebes/SharedFontList.cpp:40
1 xul.dll mozilla::fontlist::Family::SearchAllFontsForChar gfx/thebes/SharedFontList.cpp:304
2 xul.dll class gfxFont* gfxFontGroup::FindFallbackFaceForChar gfx/thebes/gfxTextRun.cpp:2723
3 xul.dll gfxFontGroup::FindFontForChar gfx/thebes/gfxTextRun.cpp
4 xul.dll gfxFontGroup::InitScriptRun<char16_t> gfx/thebes/gfxTextRun.cpp:2495
5 xul.dll static void gfxFontGroup::InitTextRun<char16_t> gfx/thebes/gfxTextRun.cpp:2417
6 xul.dll gfxFontGroup::MakeTextRun gfx/thebes/gfxTextRun.cpp:2289
7 xul.dll void mozilla::dom::CanvasBidiProcessor::SetText dom/canvas/CanvasRenderingContext2D.cpp:3496
8 xul.dll nsBidiPresUtils::ProcessText layout/base/nsBidiPresUtils.cpp:2239
9 xul.dll nsresult mozilla::dom::CanvasRenderingContext2D::DrawOrMeasureText dom/canvas/CanvasRenderingContext2D.cpp:3821

I can reproduce the crash on Win10 with the steps in comment 36.

bp-31bbd4e4-671d-419d-9858-472320191105

The most recent form of this crash (comment 36, which I can currently reproduce fairly reliably) occurs because the CanvasRenderingContext2D may have cached a gfxFontGroup instance (or several, even, if it has multiple states in its mStyleStack), and those fontgroups contain font objects that are invalidated by a fontlist rebuild. In general, we aim to completely reflow everything after a fontlist change, so that textruns, fontgroups, etc all get flushed and rebuilt, but the fontgroup(s) attached to CanvasRenderingContext2D escape this. So we need an alternative mechanism to ensure those fontgroups drop their no-longer-valid references to the old fontlist.

I have a patch that appears to resolve this, at least for my system. There's a build at https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3acded119dd427a1ef85c2f8119610e976b122d&selectedJob=276447171; would you be able to give that a try and see if it fixes things for you? (Thanks so much for all your testing & reports!)

Flags: needinfo?(over68)

I can not reproduce the crash with the steps in comment 36, but I can reproduce it with the steps in comment 21. I'll follow up in bug 1581715. Thanks.

Flags: needinfo?(over68)
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6e016592c81
Ensure fontgroups attached to CanvasRenderingContext2D get reset after a shared-fontlist rebuild. r=jwatt

Removing leave-open, as we've landed a specific bug-fix here that resolves at least one of the crash scenarios. Remaining similar crashes are being followed up in bug 1581715.

Status: NEW → RESOLVED
Closed: 3 months ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
See Also: → 1581715
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.