Closed Bug 732925 Opened 12 years ago Closed 11 years ago

Crash in gfxGDIFont::InitTextRun

Categories

(Core :: Graphics, defect)

11 Branch
x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: scoobidiver, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

It happens mainly on Windows XP (90% of crashes).

Signature 	_alloca_probe | gfxGDIFont::InitTextRun(gfxContext*, gfxTextRun*, wchar_t const*, unsigned int, unsigned int, int, bool) More Reports Search
UUID	66c0d5b5-07af-4e91-a193-bfee62120305
Date Processed	2012-03-05 12:10:28
Uptime	599
Last Crash	20.3 hours before submission
Install Age	20.2 hours since version was first installed.
Install Time	2012-03-04 15:53:56
Product	Firefox
Version	11.0
Build ID	20120228210006
Release Channel	beta
OS	Windows NT
OS Version	5.1.2600 Service Pack 3
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 28 stepping 10
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x10f000
App Notes 	
AdapterVendorID: 0x8086, AdapterDeviceID: 0xa011, AdapterSubsysID: 03491025, AdapterDriverVersion: 6.14.10.5260
D3D10 Layers? D3D10 Layers-
D3D9 Layers? D3D9 Layers-
Processor Notes 	
EMCheckCompatibility	True
Total Virtual Memory	2147352576
Available Virtual Memory	1920266240
System Memory Use Percentage	62

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	_alloca_probe 	chkstk.asm:99
1 	xul.dll 	gfxGDIFont::InitTextRun 	gfx/thebes/gfxGDIFont.cpp:181
2 	xul.dll 	gfxFont::SplitAndInitTextRun 	gfx/thebes/gfxFont.cpp:1594
3 	xul.dll 	gfxFontGroup::InitScriptRun 	gfx/thebes/gfxFont.cpp:2624
4 	xul.dll 	gfxFontGroup::InitTextRun 	gfx/thebes/gfxFont.cpp:2573
5 	xul.dll 	gfxFontGroup::MakeTextRun 	gfx/thebes/gfxFont.cpp:2507
6 	xul.dll 	TextRunWordCache::MakeTextRun 	gfx/thebes/gfxTextRunWordCache.cpp:859
7 	xul.dll 	MakeTextRun 	layout/generic/nsTextFrameThebes.cpp:594
8 	xul.dll 	ParseManifest 	xpcom/components/ManifestParser.cpp:476
9 		@0x0 	
10 		@0x80000009 	
11 	xul.dll 	nsRuleNode::ComputeFontData 	layout/style/nsRuleNode.cpp:3162
12 	xul.dll 	nsRuleNode::WalkRuleTree 	layout/style/nsStyleStructList.h:78
13 	xul.dll 	nsRuleNode::WalkRuleTree 	layout/style/nsRuleNode.cpp:1944
14 	xul.dll 	nsTextFrame::EnsureTextRun 	layout/generic/nsTextFrameThebes.cpp:2279
15 	xul.dll 	nsTextFrame::ReflowText 	layout/generic/nsTextFrameThebes.cpp:7260

More reports at:
https://crash-stats.mozilla.com/report/list?signature=_chkstk%20|%20gfxGDIFont%3A%3AInitTextRun%28gfxContext*%2C%20gfxTextRun*%2C%20wchar_t%20const*%2C%20unsigned%20int%2C%20unsigned%20int%2C%20int%2C%20bool%29
https://crash-stats.mozilla.com/report/list?signature=_alloca_probe%20|%20gfxGDIFont%3A%3AInitTextRun%28gfxContext*%2C%20gfxTextRun*%2C%20wchar_t%20const*%2C%20unsigned%20int%2C%20unsigned%20int%2C%20int%2C%20bool%29
It sounds to me (_chkstk, _alloca_probe) like we're running out of stack space. Might be worth trying to increase the stack size and see if that reduces these kind of crashes?

It makes sense that this would happen primarily on XP: first, XP always uses the GDI font backend, whereas Vista and Win7 may be using DirectWrite instead (so the stacks would be different even if we ran into similar issues), and second, I wouldn't be surprised if later Windows versions allocate a larger stack in the first place.
On the assumption that these reports may suggest we're running too close to the limit of stack space on WinXP, what would you think of doing something like this to bump up the space a bit and see if that helps? I would think most XP users could afford to sacrifice an extra 0.5MB to the stack if it'll provide stability.
Attachment #602927 - Flags: review?(dbaron)
Presumably this affects only the main thread and not other threads?

What's using alloca, for what, and why?

I don't really feel like I'm the right reviewer for this.  Maybe bsmedberg?
(In reply to David Baron [:dbaron] from comment #3)
> Presumably this affects only the main thread and not other threads?

I believe so.

> What's using alloca, for what, and why?

I don't think it's called directly by our code - my guess is that it's used internally by Windows APIs (uniscribe?) that we call.

> I don't really feel like I'm the right reviewer for this.  Maybe bsmedberg?

OK. (I picked you because you'd reviewed bug 582910, that's all. Redirecting to bsmedberg...)
Attachment #602927 - Flags: review?(dbaron) → review?(benjamin)
See also bug 733372, where it looks like we're intermittently hitting a stack overflow during GC on Win32 debug builds, suggesting that perhaps we are routinely running a bit too close to the limit for comfort.
<jfkthame> no
* askalski has quit (Quit: Wychodzi)
<jfkthame> hmm, SetGlyphsFromRun doesn't seem to have as many autoarrays as i was thinking
The actual overflowing function is gfxHarfBuzzShaper::InitTextRun at http://hg.mozilla.org/releases/mozilla-beta/annotate/8c9e4873d419/gfx/thebes/gfxHarfBuzzShaper.cpp#l716

_chkstk is a compiler-generated call for any frame which is larger than one OS page. This frame appears to be 0x517C (20k) large, so _chkstk goes through and hits each page in order to trigger Windows guard page behavior.

I'm worried that what we're actually seeing here is an infinite-recursion problem in code below the stack (the stack is equally or even more busted in MSVC than in crash-stats), and that increasing the main-thread stack size is not a good solution.
It looks like the compiler has inlined some of the harfbuzz functions that are called from gfxHarfBuzzShaper::InitTextRun, and so their stack frames get merged with this one.

In current trunk code, I don't think this should be a major issue - it looks like the frame would add up to around 1.5K, provided I've found the main arrays involved:

hb_shape
> hb_shape_full
> > hb_ot_shape:
    - hb_ot_shape_plan_t
      - hb_ot_map_t
        - hb_tag_t (4) * 2                8
        - feature_map_t (64) * 8        512
        - lookup_map_t (8) * 32         256
        - pause_map_t (12) * 1           12
> > > hb_ot_shape_plan_internal
      - hb_ot_shape_planner_t
        - hb_ot_map_builder_t
          - hb_tag_t (4) * 2              8
          - feature_info_t (28) * 16    448
          - pause_info_t (12) * 1 * 2    24
plus various miscellaneous variables, pointers, etc.

However, an earlier version of the HB code (as found in current mozilla-beta) had much larger stack allocations for the feature and lookup arrays, which explains the big frame we're seeing in the beta crashes.

Still, if attempting to allocate a 20K stack frame (in a non-recursive function) can cause a crash, I think that could suggest we're running dangerously close to the edge.

Of course, it's possible we have an infinite-recursion problem somewhere below this, in which case enlarging the stack will only delay the crash and not prevent it. But IMO it'd be worth trying, to see if it eliminates these reports.
There are some fairly large stack frames in here, but after examining the minidump more carefully, we're only using about 110k of stack total:

At the access violation at 0x10f000
according to the stack scan, ESP of XRE_main is 0x0012f108

That's 131336 bytes. Unless we are mistakenly setting the stack size to *less* than 1MB, I don't think that's the problem here, although I don't know what the problem actually is. I would suspect mandatory ASLR if that were actually enabled, but that was backed out.

Scoobidiver, is there a way to see if aggregate chkstk/alloca_probe crashes went up on any particular nightly?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #9)
> Scoobidiver, is there a way to see if aggregate chkstk/alloca_probe crashes
> went up on any particular nightly?
You need to file a bug to reprocess those crash reports from a date before bug 727963 landed.
Attachment #602927 - Flags: review?(benjamin) → review-
Blocks: 737942
There's some interesting discussion here:
http://apocryph.org/2008/06/01/visual_c_apps_crashing_chkstk_under_load/

which mentions that one possible failure mode here is that _chkstk tries to commit previously reserved stack memory and fails because the system is out of physical memory. Looking at a few of these crash reports, they don't seem to be in that situation, though (judging from the reported numbers). It might be useful to run a report and check the system memory usage % numbers.

There's also a MSDN article that describes the behavior of _chkstk:
http://support.microsoft.com/kb/100775
There have been no crashes for the last four weeks after 11.0.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: