Closed Bug 544617 Opened 10 years ago Closed 9 years ago

crash [@ _cairo_dwrite_font_face_scaled_font_create]

Categories

(Core :: Graphics, defect, critical)

x86
Windows Vista
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: wsmwk, Assigned: jfkthame)

Details

(Keywords: crash)

Crash Data

Attachments

(3 files, 1 obsolete file)

crash [@ _cairo_dwrite_font_face_scaled_font_create]

regression?

There have been 6 crashes in total over the last 5 months, all on trunk. The earliest build I can find is 2010012300. Most occur within 60 seconds of startup.

my crash bp-3b9339f3-c13f-4de2-85ed-f079b2100205
0	xul.dll	_cairo_dwrite_font_face_scaled_font_create	 gfx/cairo/cairo/src/cairo-dwrite-font.cpp:352
1	xul.dll	_moz_cairo_scaled_font_create	gfx/cairo/cairo/src/cairo-scaled-font.c:1014
2	xul.dll	_cairo_gstate_ensure_scaled_font	gfx/cairo/cairo/src/cairo-gstate.c:1481
3	xul.dll	_cairo_gstate_glyph_extents	gfx/cairo/cairo/src/cairo-gstate.c:1556
4	xul.dll	_moz_cairo_glyph_extents	gfx/cairo/cairo/src/cairo.c:3161
5	xul.dll	gfxFont::SetupGlyphExtents	gfx/thebes/src/gfxFont.cpp:1185
6	xul.dll	gfxFont::Measure	
7	xul.dll	gfxTextRun::MeasureText	gfx/thebes/src/gfxFont.cpp:2843
8	xul.dll	gfxTextRun::BreakAndMeasureText	gfx/thebes/src/gfxFont.cpp:3001
9	xul.dll	nsTextFrame::Reflow	layout/generic/nsTextFrameThebes.cpp:6298
10	xul.dll	nsLineLayout::ReflowFrame	layout/generic/nsLineLayout.cpp:852
11	xul.dll	nsBlockFrame::ReflowInlineFrame	layout/generic/nsBlockFrame.cpp:3715
12	xul.dll	nsBlockFrame::DoReflowInlineFrames	layout/generic/nsBlockFrame.cpp:3510
13	xul.dll	nsBlockFrame::ReflowInlineFrames	layout/generic/nsBlockFrame.cpp:3364
14	xul.dll	nsBlockFrame::ReflowLine	layout/generic/nsBlockFrame.cpp:2439
15	xul.dll	nsBlockFrame::ReflowDirtyLines	layout/generic/nsBlockFrame.cpp:1885
16	xul.dll	nsBlockFrame::Reflow	layout/generic/nsBlockFrame.cpp:993
17	xul.dll	nsFrame::BoxReflow	layout/generic/nsFrame.cpp:6499
18	xul.dll	nsFrame::RefreshSizeCache	layout/generic/nsFrame.cpp:6085
19	xul.dll	nsFrame::GetPrefSize	layout/generic/nsFrame.cpp:6169
20	xul.dll	nsSprocketLayout::GetPrefSize	layout/xul/base/src/nsSprocketLayout.cpp:1366
21	xul.dll	nsBoxFrame::GetPrefSize	layout/xul/base/src/nsBoxFrame.cpp:808
22	xul.dll	nsPopupSetFrame::DoLayout	
23	xul.dll	nsIFrame::Layout	layout/xul/base/src/nsBox.cpp:543
24	xul.dll	nsSprocketLayout::Layout	layout/xul/base/src/nsSprocketLayout.cpp:521
25	xul.dll	nsBoxFrame::DoLayout	layout/xul/base/src/nsBoxFrame.cpp:938
26	xul.dll	nsIFrame::Layout	layout/xul/base/src/nsBox.cpp:543
27	xul.dll	nsStackLayout::Layout	layout/xul/base/src/nsStackLayout.cpp:342
28	xul.dll	nsBoxFrame::DoLayout	layout/xul/base/src/nsBoxFrame.cpp:938
29	xul.dll	nsBoxFrame::Reflow	layout/xul/base/src/nsBoxFrame.cpp:748
30	xul.dll	nsContainerFrame::ReflowChild	layout/generic/nsContainerFrame.cpp:756
31	xul.dll	nsBoxFrame::IsFrameOfType	layout/xul/base/src/nsBoxFrame.h:165
32	xul.dll	XPCConvert::NativeData2JS	js/src/xpconnect/src/xpcprivate.h:2998 


others
bp-a5548fde-7859-4b07-b7cb-fda902100130
bp-ffd70f80-f675-4d8f-b2d4-96ac12100126
bp-25e44a6c-0fc8-4ad2-9059-599252100124
bp-37c046c7-b694-4cc6-a6b1-c166e2100124
bp-c0d1f08b-9226-44bd-884c-ca2272100124
This code isn't even checked in yet.

Bas, do you know of any crashes in your DirectWrite patch?
These must be from the try-server build. We didn't distribute that, but I guess people must have been running it anyway.

Those try-server builds didn't even contain some of the needed code, though, so I wouldn't make too much of it. I haven't seen this crash before, nor do I know which line that would have been at the time. I'll make sure to pay some extra attention to it, though. It may be a real problem on a system with corrupt fonts or something like that, but I doubt it.
I don't know about the others, but mine is a stock nightly build:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.3a1pre) Gecko/20100123 Minefield/3.7a1pre
Interesting. This must be memory corruption somehow then, which would be very surprising. As Joe said, the code to get DirectWrite isn't checked in yet. This could mean that a cairo toy face somehow had the wrong backend; if that were the case, a crash would make a lot of sense.
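To illustrate why a face with the wrong backend would crash: a minimal sketch, assuming font faces carry a backend tag and backend-specific code downcasts from a common base. The types and the as_dwrite helper below are hypothetical stand-ins, not cairo's real structs. An unchecked cast of a non-DWrite face would read unrelated memory as DWrite data; a checked cast returns nullptr instead.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT cairo's types.
enum class FaceBackend { Toy, Win32, DWrite };

struct FontFace {
    FaceBackend backend;  // tag identifying which backend owns this face
};

struct DWriteFontFace {
    FontFace base;        // first member, so &face.base and &face coincide
    const char* family;   // stands in for the backend-specific payload
};

struct Win32FontFace {
    FontFace base;
    int logfontHeight;    // a different payload with a different layout
};

// Checked downcast: only hand back a DWriteFontFace when the tag says so.
// Casting without this check is the failure mode described above.
const DWriteFontFace* as_dwrite(const FontFace* face) {
    if (face == nullptr || face->backend != FaceBackend::DWrite) {
        return nullptr;
    }
    return reinterpret_cast<const DWriteFontFace*>(face);
}
```

With a check like this, a GDI face reaching DWrite-only code would show up as a null result at the call site rather than as a pointer to a random address.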
Turns out DWrite code is indeed checked in - see for example http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-dwrite-font.cpp

It'd be nice to get a way to reproduce this.
(In reply to comment #5)
> Turns out DWrite code is indeed checked in - see for example
> http://mxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-dwrite-font.cpp
> 
> It'd be nice to get a way to reproduce this.

Well, yes. The cairo backend is indeed checked in. But a dwrite toy face can (in theory) never be created with the current thebes code. So if code ends up with this stack trace something weird must still have gone wrong.
My dwrite crash could well be a wild signature; it wouldn't be the first time. Some of my crashes happen after brief hangs/looping. I can't speak to the other crashes.

no new crashes found on crash-stats
Attached file stack
I just hit this on http://ie.microsoft.com/testdrive/Performance/Preschool/Default.html I only loaded the page, then got distracted for a few minutes, and when I came back it had crashed here. The stack is slightly different, though.
Since I was debugging another crash, I have this caught in the debugger. dwriteface points to a random address.
In case I hit this again, what could I check to help?
What version were you running, a current trunk build or something older?

Note that your stack has gfxGDIFont::InitTextRun but _cairo_dwrite_font_face_scaled_font_create, a combination that should never happen - no wonder dwriteface looks random. This reminds me of bug 634762 comment #37. It's not identical, as yours doesn't involve gfxUniscribeShaper, but perhaps there's a similar issue. If you can reproduce this, I'd very much like to have STR.
Right before I crashed in the preschool demos the GUI was completely destroyed. (Tabs were erased, buttons were, no text, etc. I assumed that's because of rendering getting screwed up.)

But, my stack looks completely different from most of the ones above, and appears mildly corrupted. (0x8000009, 0x3, etc)

Also, mine happened right after JS.

https://crash-stats.mozilla.com/report/index/bp-5f26a312-45a6-46c8-9698-b58642110305
(In reply to comment #11)
> Right before I crashed in the preschool demos the GUI was completely destroyed.
> (Tabs were erased, buttons were, no text, etc. I assumed that's because of
> rendering getting screwed up)
> 
> But, my stack looks completely different from most of the ones above, and
> appears mildly corrupted. (0x8000009, 0x3, etc)
> 
> Also, mine happened right after JS.
> 
> https://crash-stats.mozilla.com/report/index/bp-5f26a312-45a6-46c8-9698-b58642110305

This crash has D3D10 layers disabled but DirectWrite is being used. As far as I know, that is not a supported configuration. Do you know why you're using DirectWrite without D3D10 layers?
(In reply to comment #12)
> (In reply to comment #11)
> > Right before I crashed in the preschool demos the GUI was completely destroyed.
> > (Tabs were erased, buttons were, no text, etc. I assumed that's because of
> > rendering getting screwed up)
> > 
> > But, my stack looks completely different from most of the ones above, and
> > appears mildly corrupted. (0x8000009, 0x3, etc)
> > 
> > Also, mine happened right after JS.
> > 
> > https://crash-stats.mozilla.com/report/index/bp-5f26a312-45a6-46c8-9698-b58642110305
> 
> This crash has D3D10 layers disabled but DirectWrite is being used. As far as I
> know, that is not a supported configuration. Do you know why you're using
> DirectWrite without D3D10 layers?

I have no idea.
(In reply to comment #13)
> (In reply to comment #12)
>> Do you know why you're using DirectWrite without D3D10 layers?
> I have no idea.

Can you attach the contents of about:support?
Attached file about:support
I got this crash on my laptop with an Intel graphics card
(In reply to comment #12)
> (In reply to comment #11)
> > Right before I crashed in the preschool demos the GUI was completely destroyed.
> > (Tabs were erased, buttons were, no text, etc. I assumed that's because of
> > rendering getting screwed up)
> > 
> > But, my stack looks completely different from most of the ones above, and
> > appears mildly corrupted. (0x8000009, 0x3, etc)
> > 
> > Also, mine happened right after JS.
> > 
> > https://crash-stats.mozilla.com/report/index/bp-5f26a312-45a6-46c8-9698-b58642110305
> 
> This crash has D3D10 layers disabled but DirectWrite is being used. As far as I
> know, that is not a supported configuration. Do you know why you're using
> DirectWrite without D3D10 layers?

I don't think DirectWrite is actually being used in any of these stacks. Every stack where this is happening has other weird things about it.
If any of you can reliably reproduce this, please grab the tryserver build from http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jkew@mozilla.com-67c74bc2468c and let me know if it resolves the problem.
(In reply to comment #17)
> If any of you can reliably reproduce this, please grab the tryserver build from
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jkew@mozilla.com-67c74bc2468c
> and let me know if it resolves the problem.

Although I couldn't reproduce it very reliably before, I have had no success reproducing it at all with the build you provided. WFM.
Try server patch
http://hg.mozilla.org/try/rev/67c74bc2468c

Jonathan, why were you thinking this code might be a problem?
Sorry, but I was somehow not cc'ed on the bug and I missed a bunch of questions.

I crash with a local debug build made from the RC1 changeset.
This profile is new:

Direct2D Enabled false
DirectWrite Enabled false (6.1.7600.20830, font cache n/a)
GPU Accelerated Windows 1/1 Direct3D 9

Notice that on the same page I see video memory going up until OOM (see bug 591358), so this could be a similar problem.

(In reply to comment #11)
> Right before I crashed in the preschool demos the GUI was completely destroyed.
> (Tabs were erased, buttons were, no text, etc. I assumed that's because of
> rendering getting screwed up)

This happened to me in the other crash bug, but was due to OOM conditions where we were unable to create any cairo context.
(In reply to comment #19)
> Try server patch
> http://hg.mozilla.org/try/rev/67c74bc2468c
> 
> Jonathan, why were you thinking this code might be a problem?

I was trying to figure out how we could end up crashing at _cairo_dwrite_.... when we're using GDI fonts. Looking at Marco's stack from comment 8 and Alex's from comment 11, I think we must be calling gfxFont::InitTextRun at times when the font is not necessarily selected into the relevant gfxContext. That's no problem if we're using dwrite, because the code uses the dwrite font objects themselves, not a "current font" in the context. But with GDI, we rely on the font being selected into the Windows DC in order to measure glyphs.

That's handled for glyph measurement by ensuring (in gfxGDIFont::GetGlyphWidth) that the DC is set up and the font selected. However, bug 605043 added GetRoundOffsetsToPixels in the harfbuzz shaper, which is called before we actually start shaping. This gets the cairo context and retrieves its scaled_font to check its options. But if the cairo_scaled_font has not actually been set up yet for this context, this will trigger creation of a fallback font via the "toy" APIs in cairo, which takes a dwrite path. I think that's how we're getting into the cairo_dwrite code even though we're supposedly using GDI fonts. And that's a Bad Thing.

So to pre-empt this possibility, the patch here moves the DC setup and font selection out of the GDI and Uniscribe shapers, and the GetGlyphWidth method that harfbuzz calls, and hoists it up to gfxGDIFont::InitTextRun so that it's guaranteed to be done before any attempt to access the cairo_scaled_font in the context, regardless of which shaper we're using and what context is being passed in.

I admit I don't really understand all the possible ways InitTextRun may get called, and the gfx and cairo contexts it may be passed; but this patch looks to me as though it will make this part of the code more robust and less dependent on exactly how callers have set up the context. That seems like a good thing.
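The ordering problem described above can be modelled with a small sketch. Everything in it is a hypothetical model for illustration: Context, GDIFont, QueryRoundingOptions and the Backend enum are assumed names, not real Gecko or cairo APIs. It assumes only what the comment states, namely that asking a context for its scaled font when none has been set lazily creates a fallback font through a different path.

```cpp
#include <cassert>

// Hypothetical model; none of these names are real Gecko or cairo APIs.
enum class Backend { None, GDI, ToyFallback };

struct Context {
    Backend scaledFont = Backend::None;

    // Models the lazy behaviour: if no scaled font has been set on the
    // context, asking for one creates a fallback via the "toy" path.
    Backend GetScaledFont() {
        if (scaledFont == Backend::None) {
            scaledFont = Backend::ToyFallback;
        }
        return scaledFont;
    }
};

struct GDIFont {
    // Installs this font's own scaled font into the context.
    void SetupCairoFont(Context& ctx) const { ctx.scaledFont = Backend::GDI; }

    // Stands in for a shaper pre-step that inspects the context's font
    // options before shaping starts.
    static Backend QueryRoundingOptions(Context& ctx) {
        return ctx.GetScaledFont();
    }

    // Unguarded: relies on the caller having set up the context already.
    Backend InitTextRunUnguarded(Context& ctx) const {
        return QueryRoundingOptions(ctx);
    }

    // Guarded: sets the font up first, so the query can never trigger
    // the fallback path, whatever state the caller left the context in.
    Backend InitTextRunGuarded(Context& ctx) const {
        SetupCairoFont(ctx);
        return QueryRoundingOptions(ctx);
    }
};
```

In this model, the unguarded call on a fresh context ends up with the fallback font, while the guarded call always sees the GDI font. The point is only the ordering: setup happens before anything reads the context's font.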
Comment on attachment 517708 [details] [diff] [review]
patch, gfxGDIFont::InitTextRun should not assume the context is set up already

Ugh - this regresses printing for GDI fonts (messed-up spacing). :(
Attachment #517708 - Flags: review?(jdaggett) → review-
OK, this is a more conservative fix that should avoid the crash scenario that shows up in the stacks in comment 8 and 11, and not regress printing. It doesn't directly modify the Windows DC (like the previous version did, incorrectly), it just ensures the Cairo font is set up earlier.
Attachment #517708 - Attachment is obsolete: true
Attachment #518342 - Flags: review?(jdaggett)
Attachment #518342 - Flags: review?(jdaggett) → review+
http://hg.mozilla.org/mozilla-central/rev/b343cef45420
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-cedar]
Target Milestone: --- → mozilla2.2
Assignee: nobody → jfkthame
Crash Signature: [@ _cairo_dwrite_font_face_scaled_font_create]