Closed Bug 553963 Opened 11 years ago Closed 11 years ago

crash in [@ TextRunWordCache::CacheHashEntry::KeyEquals(TextRunWordCache::CacheHashKey const*) const ]

Categories

(Core :: Graphics, defect, P2)

x86
Windows XP
defect

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: polidobj, Assigned: jfkthame)

References

Details

(Keywords: crash)

Crash Data

Attachments

(5 files)

This bug is the Windows version of bug 527280 since that bug is for Mac.

I've been seeing a crash with this signature for week or so.  
e.g.
http://crash-stats.mozilla.com/report/index/56422440-9135-41f2-bc12-ced052100321

I figured the start of the crashing coincided with my changing the font
properties.  Now I have a fairly reproducible way to crash.  The key seems to
be to use the Antique  Olive Roman AOR font.  I like the font but it has caused
problems before like after the change to cairo.  I believe I got the font with
Photoshop 7 years ago.  

I was crashing often with my daily profile but it took me a while to get
something in a clean profile.  

1a. In the font dialog change the proportional font from using serif to san
serif.  
1b. Change the default san serif font to AOR. 
1c. Uncheck the box to that allows pages to choose their own font.
2.  Goto this page:
http://www.rotoworld.com/content/playernews.aspx?sport=NHL&line=106324

This won't crash there.  But browsing in other tabs then changing back to that
tab may crash.  Also sometimes scrolling the page crashes too.  The page zoom
may also be a factor.  At some zoom levels some of the words are not drawn.  I
have not noticed that happening before with previous incidences of these
crashes.  I've only seen that on this page so I'm not sure of the relevance of
that.  

And flash is not involved here since I have it disabled in my daily use
profile.  And the crash occurs in a clean profile with flash disabled too.
Blocks: 527280
This seems to have started with the 3-16 nightly.  
3-15 - ok - http://hg.mozilla.org/mozilla-central/rev/17a3e86d5ec5
3-16 - crashes - http://hg.mozilla.org/mozilla-central/rev/050887c64183

http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2010-03-15+08%3A07%3A15&enddate=2010-03-16+02%3A02%3A50

The crash and the missing text both start with that nightly.
The Antique Olive Roman I have is a true type font.

From the properties extension:
Version 1.0
Creation Date
Oct 02, 2000

Font Vendor - Agfa
Copyright 2000 Agfa Monotype Corporation

"Antique Olive Roman contains 324 glyphs and no standard kern pairs. This font does not include embedded bitmaps."

Font Encoding Type - Unicode (ISO 10646-2)

Supported Unicode Ranges
Basic Latin
Latin-1-Supplement
Latin Extended-A

Supported Code Pages
1252 Latin 1
1250 Latin 2:East Europe
1254 Turkish
1257 Windows Baltic

I'm not browsing anywhere in particular in other tabs.  But since it's a clean profile some tabs had articles from the BBC from the headlines feed.
Severity: normal → critical
Keywords: crash
I think time stamps may have gotten mixed up because of timezones so my pushlog range given above is incorrect.  I noticed that this was checked in earlier on 3-15:
http://hg.mozilla.org/mozilla-central/rev/8ff6056308bd

IMO that seems like the best candidate for this since it looks to involve TextRuns.
Blocks: 502906
(In reply to comment #4)

> http://hg.mozilla.org/mozilla-central/rev/8ff6056308bd
> 
> IMO that seems like the best candidate for this since it looks to involve
> TextRuns.

That seems like a reasonable assumption.  Without having reproduced this yet, I'm guessing this is another manifestation of a memory stomping bug of some sort where word cache data is affected because of the frequency and short lifecycle of these bugs.  My guess is that this patch perturbed memory usage patterns just enough to allow your testcase to be more likely to cause a crash.
Possibly related: bug 555091
Reproduced using reporter's instructions:
http://crash-stats.mozilla.com/report/index/bp-2835d83e-f269-4c5a-91ac-d54702100326
Assignee: nobody → jdaggett
Could this be related?  For some time now I've been observing the
following invalid read when a Win32 build renders, uh, Oriental-style
characters (I mean, Chinese, Korean, Japanese).  This is 1.9.2.

At first I thought this was a bug in Wine, but now I'm not so sure.
Over the weekend I rewrote the part of Wine that informs Valgrind of
the state of the Win32 application's heap, so I'm a bit more confident
this isn't bogus.  Can we get Purify on the case, to get a second
opinion?

The invalid read is consistently reported on the specified URL, and
others containing said characters.  The diagnosis of the invalid
address ("Address XX is YY bytes inside a block of size ZZ free'd at
..") is unreliable, though; often the address is not identified at
all.

So .. this gives the impression we're drawing text using heap
allocated resources that have already been freed.  I peered at some of
the stated code locations but failed to make any sense of it.

If anyone can tell me how to get hold of & install the Antique Olive
Roman font, I could try to reproduce this using the rotoworld.com
page Brian mentions.




ITERATEPAGES: entry number 3 =  http://blog.mozillaonline.com

Invalid read of size 2
   at 0x8FC5FCA: X11DRV_XRender_ExtTextOut (xrender.c:1664)
   by 0x8FAB2E8: X11DRV_ExtTextOut (text.c:55)
   by 0x78A14F2: ExtTextOutW (font.c:1969)
   by 0x10B67CE7: _cairo_win32_surface_show_glyphs (cairo-win32-surface.c:1780)
   by 0x10B5BE59: _cairo_surface_show_text_glyphs (cairo-surface.c:2668)
   by 0x10B83E35: _cairo_gstate_show_text_glyphs (cairo-gstate.c:1706)
   by 0x10B65160: _moz_cairo_show_glyphs (cairo.c:3241)
   by 0x10B15A06: GlyphBuffer::Flush (gfxfont.cpp:761)
   by 0x10B15828: gfxFont::Draw (gfxfont.cpp:899)
   by 0x10B1FC7A: gfxWindowsFont::Draw (gfxwindowsfonts.cpp:1166)
   by 0x10B189F4: gfxTextRun::DrawGlyphs (gfxfont.cpp:2063)
   by 0x10B190A4: gfxTextRun::Draw (gfxfont.cpp:2274)
   by 0x104554D9: nsTextFrame::DrawText (nstextframethebes.cpp:4766)
   by 0x104552F1: nsTextFrame::PaintText (nstextframethebes.cpp:4753)
   by 0x10452D64: nsDisplayText::Paint (nstextframethebes.cpp:3854)
   by 0x104FD9FF: nsDisplayList::Paint (nsdisplaylist.cpp:405)
   by 0x104FEF49: nsDisplayWrapList::Paint (nsdisplaylist.cpp:1004)
   by 0x104FF6CB: nsDisplayClip::Paint (nsdisplaylist.cpp:1200)
   by 0x104FD9FF: nsDisplayList::Paint (nsdisplaylist.cpp:405)
   by 0x104F539A: nsLayoutUtils::PaintFrame (nslayoututils.cpp:1145)
   by 0x1026A59E: PresShell::Paint (nspresshell.cpp:5842)
   by 0x1032FB90: nsViewManager::RenderViews (nsviewmanager.cpp:533)
   by 0x1032F88A: nsViewManager::Refresh (nsviewmanager.cpp:492)
   by 0x10330D84: nsViewManager::DispatchEvent (nsviewmanager.cpp:1008)
   by 0x104FBFDF: HandleEvent (nsview.cpp:167)
   by 0x10A21C19: nsWindow::DispatchEvent (nswindow.cpp:2973)
   by 0x10A21D5F: nsWindow::DispatchWindowEvent (nswindow.cpp:3006)
   by 0x10A2C1C1: nsWindow::OnPaint (nswindowgfx.cpp:520)
   by 0x10A23C66: nsWindow::ProcessMessage (nswindow.cpp:3908)
   by 0x10A22E7D: nsWindow::WindowProc (nswindow.cpp:3608)
   by 0x7806719: ??? (in /space2/sewardj/MOZ/WINE/Inst/lib/wine/user32.dll.so)
   by 0x7808398: call_window_proc (winproc.c:242)
   by 0x780983B: WINPROC_call_window (winproc.c:908)
   by 0x77CD92D: DispatchMessageW (message.c:3143)
   by 0x10A1CFAE: nsAppShell::ProcessNextNativeEvent (nsappshell.cpp:179)
   by 0x10A1D43B: nsBaseAppShell::DoProcessNextNativeEvent 
                  (nsbaseappshell.cpp:151)

 Address 0x1126a664 is 243,356 bytes inside a block of size 269,760 free'd
   at 0x6FF6546: RtlFreeHeap (heap.c:213)
   by 0x8F7563A: X11DRV_DIB_DestroyXImage (dib.c:246)
   by 0x8F7D47A: X11DRV_DIB_DeleteDIBSection (dib.c:4870)
   by 0x8F694C3: X11DRV_DeleteBitmap (bitmap.c:460)
   by 0x7885151: BITMAP_DeleteObject (bitmap.c:636)
   by 0x78B4EFC: DeleteObject (gdiobj.c:825)
   by 0x10B65FFD: _cairo_win32_surface_finish (cairo-win32-surface.c:496)
   by 0x10B59D26: _moz_cairo_surface_finish (cairo-surface.c:537)
   by 0x10B59BFA: _moz_cairo_surface_destroy (cairo-surface.c:440)
   by 0x10B665D1: _cairo_win32_surface_release_dest_image
                  (cairo-win32-surface.c:753)
   by 0x10B5A41B: _cairo_surface_release_dest_image (cairo-surface.c:1285)
   by 0x10B7CA59: _fallback_fini (cairo-surface-fallback.c:101)
   by 0x10B7C997: _cairo_surface_fallback_composite
                  (cairo-surface-fallback.c:1198)
   by 0x10B5AA1B: _cairo_surface_composite (cairo-surface.c:1606)
   by 0x10B7BC95: _composite_trap_region (cairo-surface-fallback.c:457)
   by 0x10B7B4D1: _clip_and_composite_trapezoids (cairo-surface-fallback.c:641)
   by 0x10B7C3FF: _cairo_surface_fallback_fill (cairo-surface-fallback.c:990)
   by 0x10B5B37E: _cairo_surface_fill (cairo-surface.c:2004)
   by 0x10B83475: _cairo_gstate_fill (cairo-gstate.c:1076)
   by 0x10B64B74: _moz_cairo_fill_preserve (cairo.c:2203)
   by 0x10B0FF32: gfxContext::Fill (gfxcontext.cpp:147)
   by 0x1045E1DD: nsCSSRendering::PaintBackgroundWithSC
                  (nscssrendering.cpp:2173)
   by 0x1045C64B: nsCSSRendering::PaintBackground (nscssrendering.cpp:1367)
   by 0x104FE3A4: nsDisplayBackground::Paint (nsdisplaylist.cpp:716)
   by 0x104FD9FF: nsDisplayList::Paint (nsdisplaylist.cpp:405)
   by 0x104FEF49: nsDisplayWrapList::Paint (nsdisplaylist.cpp:1004)
   by 0x104FF6CB: nsDisplayClip::Paint (nsdisplaylist.cpp:1200)
   by 0x104FD9FF: nsDisplayList::Paint (nsdisplaylist.cpp:405)
   by 0x104F539A: nsLayoutUtils::PaintFrame (nslayoututils.cpp:1145)
   by 0x1026A59E: PresShell::Paint (nspresshell.cpp:5842)
   by 0x1032FB90: nsViewManager::RenderViews (nsviewmanager.cpp:533)
   by 0x1032F88A: nsViewManager::Refresh (nsviewmanager.cpp:492)
   by 0x10330D84: nsViewManager::DispatchEvent (nsviewmanager.cpp:1008)
   by 0x104FBFDF: HandleEvent (nsview.cpp:167)
   by 0x10A21C19: nsWindow::DispatchEvent (nswindow.cpp:2973)
   by 0x10A21D5F: nsWindow::DispatchWindowEvent (nswindow.cpp:3006)
Julian, if you've got all the sources, can you trace the uninitialized data back to see where it comes from?
(In reply to comment #9)
Roc, this is an out-of-range read (reading freed mem).  Nothing to
do with initialisation.
Right OK. The question is, can you debug it?

Maybe that DeleteObject is deleting a font while it's still in use or something?
(In reply to comment #11)

My über-lame analysis so far (further insights welcomed):
(nb, analysis pertains to 1.9.2)

V's error messages provides stacks for two events.  The earlier event
is a block being freed.  The later event is a 16-bit read inside that
now-freed block.

The two stacks share a common prefix, that is, the part from main
inwards to nsDisplayList::Paint (nsdisplaylist.cpp:405).  The prefix
is:

   ## inner direction
   by 0x104FD9FF: nsDisplayList::Paint (nsdisplaylist.cpp:405)
   by 0x104FEF49: nsDisplayWrapList::Paint (nsdisplaylist.cpp:1004)
   by 0x104FF6CB: nsDisplayClip::Paint (nsdisplaylist.cpp:1200)
   by 0x104FD9FF: nsDisplayList::Paint (nsdisplaylist.cpp:405)
   by 0x104F539A: nsLayoutUtils::PaintFrame (nslayoututils.cpp:1145)
   by 0x1026A59E: PresShell::Paint (nspresshell.cpp:5842)
   by 0x1032FB90: nsViewManager::RenderViews (nsviewmanager.cpp:533)
   by 0x1032F88A: nsViewManager::Refresh (nsviewmanager.cpp:492)
   by 0x10330D84: nsViewManager::DispatchEvent (nsviewmanager.cpp:1008)
   by 0x104FBFDF: HandleEvent (nsview.cpp:167)
   by 0x10A21C19: nsWindow::DispatchEvent (nswindow.cpp:2973)
   by 0x10A21D5F: nsWindow::DispatchWindowEvent (nswindow.cpp:3006)
   by ...
   by main
   ## outer direction

So the trouble starts in nsDisplayList::Paint (nsdisplaylist.cpp:405),
where we have a loop iterating over a bunch of nsDisplayItems*s, doing
Paint on each in turn:

  for (nsDisplayItem* i = GetBottom(); i != nsnull; i = i->GetAbove()) {
    i->Paint(aBuilder, aCtx);
  }

The crudest interpretation is that the Paint call to one of the
items in the list caused some resource to be freed, that was needed
by a Paint call for some later item in the list.  (Hmm.  Is the
same item allowed to be in the list twice?  Does the order matter?)

Anyway: following the chain of calls leading to the free.  We have:

nsDisplayList::Paint (nsdisplaylist.cpp:405)
-> nsDisplayBackground::Paint (nsdisplaylist.cpp:716)
 -> nsCSSRendering::PaintBackground (nscssrendering.cpp:1367)
  -> nsCSSRendering::PaintBackgroundWithSC (nscssrendering.cpp:2173)
   -> gfxContext::Fill (gfxcontext.cpp:147)

now we're into Cairo:

-> _moz_cairo_fill_preserve (cairo.c:2203)
 -> _cairo_gstate_fill (cairo-gstate.c:1076)
  -> _cairo_surface_fill (cairo-surface.c:2004)
   -> _cairo_surface_fallback_fill (cairo-surface-fallback.c:990)
    -> _clip_and_composite_trapezoids (cairo-surface-fallback.c:641)
     -> _composite_trap_region (cairo-surface-fallback.c:457)
      -> _cairo_surface_composite (cairo-surface.c:1606)
       -> _cairo_surface_fallback_composite (cairo-surface-fallback.c:1198)

At this point _cairo_surface_fallback_composite calls
_fallback_fini(&state), thus setting itself on the road to freeing
something or t'other:

-> _fallback_fini (cairo-surface-fallback.c:101)
 -> _cairo_surface_release_dest_image (cairo-surface.c:1285)
  -> _cairo_win32_surface_release_dest_image (cairo-win32-surface.c:753)
   -> _moz_cairo_surface_destroy (cairo-surface.c:440)
    -> _moz_cairo_surface_finish (cairo-surface.c:537)
     -> _cairo_win32_surface_finish (cairo-win32-surface.c:496)
      -> DeleteObject (gdiobj.c:825)

Following the chain from _cairo_surface_fallback_composite to
DeleteObject, it is possible to see that the thing passed to
DeleteObject is

  state.image_extra->bitmap

where "fallback_state_t state" is a local var in
_cairo_surface_fallback_composite.  It (and hence presumably the
.image_extra->bitmap field) is initialised via a call to
_fallback_init at cairo-surface-fallback.c:1180.

I guess the next questions are:

* why does the chain of calls from _fallback_fini to 
  DeleteObject consider it OK to unilaterally call DeleteObject on
  state.image_extra->bitmap?

* and what does _fallback_init initialise said field to?
Jeff, can you answer the questions in comment #12?
> I guess the next questions are:
> 
> * why does the chain of calls from _fallback_fini to 
>   DeleteObject consider it OK to unilaterally call DeleteObject on
>   state.image_extra->bitmap?

bitmaps are not shared across cairo_win32_surfaces. The image data in the bitmap can be, however, anyone that shares the image data is required to hold a reference to the win32_surface. However I expect that this might not be happening properly by the users of _cairo_win32_surface_acquire_dest_image. We currently have a patch (wrap-source_image.patch) that fixes this for acquire_source_image, perhaps something similar is needed for acquire_dest_image.

> * and what does _fallback_init initialise said field to?

It looks like image_extra can either be null or a new cairo_win32_surface.
I looked into whether the situation is similar to acquire_source_image and it looks like it's not. In fact I don't really understand what's going on.

_cairo_win32_surface_release_dest_image will only call DeleteObject when image_extra has been set. image_extra is only set when we create a win32 surface for acquire_dest_image. This win32 surface should only be alive for the duration of the fallback. (from _fallback_init to _fallback_fini). So I'd expect it to be long gone by the call to _moz_cairo_show_glyphs.

I guess we'll need to figure out how the fallback win32 surface is escaping from the fallback path...
Could we add some kind of runtime check to determine whether/when that is actually happening? That would let us either rule out this possibility or perhaps make it easier to reproduce the bug.
Priority: -- → P2
I've got this consistently reproducing on a tinderbox log right now on minefield nightlies.

http://tinderbox.mozilla.org/showlog.cgi?tree=MozillaTry&errorparser=unittest&logfile=1280456125.1280457205.17031.gz&buildtime=1280456125&buildname=WINNT%205.2%20tryserver%20debug%20test%20mochitests-1%2f5&fulltext=1

I'll save it and see if I can reproduce in recording.
I could not reproduce in the recording VM. I can reproduce in a debugger, if that helps. I'm next going to try installing all my fonts from the host machine onto my VM guest.
It would be useful to know more about the state we're in when we crash --- what's the state of CacheHashEntry, what's the state of CacheHashEntry->mTextRun.

I suspect the bug is probably that we're accessing a deleted textrun that's still in the cache.

Can you reproduce in a debug build or only an opt build? If you could reproduce in a debug build, or in a build with just gfx/thebes with DEBUG, then it would be very useful to know if any asserts are firing, such as the one in TextRunWordCache::RemoveTextRun. If those aren't firing we could probably add some more useful asserts.
Local debug build:

###!!! ASSERTION: CopyItemSplitOversize() failed: 'NS_SUCCEEDED(nrs)', file c:/builds/mozilla-central/ff-debug/gfx/thebes/../../../src/gfx/thebes/gfxUniscribeShaper.cpp, line 518

 	xul.dll!Uniscribe::Itemize()  Line 518	C++
 	xul.dll!gfxUniscribeShaper::InitTextRun(aContext=0x09a35500, aTextRun=0x0ec19c28, aString=0x0b6d0048, aRunStart=0x00000000, aRunLength=0x0001adfa, aRunScript=0x00000019)  Line 570	C++
 	xul.dll!gfxGDIFont::InitTextRun(aContext=0x09a35500, aTextRun=0x0ec19c28, aString=0x0b6d0048, aRunStart=0x00000000, aRunLength=0x0001adfa, aRunScript=0x00000019)  Line 179	C++
 	xul.dll!gfxFontGroup::InitTextRun(aContext=0x09a35500, aTextRun=0x0ec19c28, aString=0x0b6d0048, aTotalLength=0x00054c4a, aScriptRunStart=0x00000000, aScriptRunEnd=0x00022309, aRunScript=0x00000019)  Line 2240	C++
 	xul.dll!gfxFontGroup::InitTextRun(aContext=0x09a35500, aTextRun=0x0ec19c28, aString=0x0b6d0048, aLength=0x00054c4a)  Line 2209	C++
 	xul.dll!gfxFontGroup::MakeTextRun(aString=0x0b6d0048, aLength=0x00054c4a, aParams=0x003e68b0, aFlags=0x01010109)  Line 2185	C++
 	xul.dll!TextRunWordCache::MakeTextRun(aText=0x0bf90048, aLength=0x0022c7e2, aFontGroup=0x09d3b0f0, aParams=0x003e7d24, aFlags=0x01010108)  Line 693	C++
 	xul.dll!gfxTextRunWordCache::MakeTextRun(aText=0x0bf90048, aLength=0x0022c7e2, aFontGroup=0x09d3b0f0, aParams=0x003e7d24, aFlags=0x01010108)  Line 1003	C++
 	xul.dll!MakeTextRun(aText=0x0bf90048, aLength=0x0022c7e2, aFontGroup=0x09d3b0f0, aParams=0x003e7d24, aFlags=0x01010108)  Line 446	C++
 	xul.dll!BuildTextRunsScanner::BuildTextRunForFrames(aTextBuffer=0x0c3e900c)  Line 1807	C++
 	xul.dll!BuildTextRunsScanner::FlushFrames(aFlushLineBreaks=0x00000000, aSuppressTrailingBreak=0x00000000)  Line 1238	C++
 	xul.dll!BuildTextRunsScanner::ScanFrame(aFrame=0x06d9f910)  Line 1390	C++
 	xul.dll!BuildTextRuns(aContext=0x09a35500, aForFrame=0x0eb77920, aLineContainer=0x09bb71b0, aForFrameLine=0x003e98d8)  Line 1148	C++
 	xul.dll!nsTextFrame::EnsureTextRun(aReferenceContext=0x09a35500, aLineContainer=0x09bb71b0, aLine=0x003e98d8, aFlowEndInTextRun=0x003e9480)  Line 2040	C++
 	xul.dll!nsTextFrame::Reflow(aPresContext=0x06b5c730, aMetrics={...}, aReflowState={...}, aStatus=0x6f40bd66)  Line 6295	C++
 	xul.dll!nsLineLayout::ReflowFrame(aFrame=0x0eb77920, aReflowStatus=0x6f40bd66, aMetrics=0x00000000, aPushedFrame=0x00000000)  Line 853	C++
 	xul.dll!nsBlockFrame::ReflowInlineFrame(aState={...}, aLineLayout={...}, aLine={...}, aFrame=0x0eb77920, aLineReflowStatus=0x003e9844)  Line 3722	C++
 	xul.dll!nsBlockFrame::DoReflowInlineFrames(aState={...}, aLineLayout={...}, aLine={...}, aFloatAvailableSpace={...}, aAvailableSpaceHeight=0x00000000, aFloatStateBeforeLine=0x003e98b0, aKeepReflowGoing=0x003e9cfc, aLineReflowStatus=0x003e9980, aAllowPullUp=0x00000001)  Line 3517	C++
 	xul.dll!nsBlockFrame::ReflowInlineFrames(aState={...}, aLine={...}, aKeepReflowGoing=0x003e9cfc)  Line 3371	C++
 	xul.dll!nsBlockFrame::ReflowLine(aState={...}, aLine={...}, aKeepReflowGoing=0x003e9cfc)  Line 2467	C++
 	xul.dll!nsBlockFrame::ReflowDirtyLines(aState={...})  Line 1907	C++
 	xul.dll!nsBlockFrame::Reflow(aPresContext=0x06b5c730, aMetrics={...}, aReflowState={...}, aStatus=0x00000000)  Line 1009	C++
 	xul.dll!nsBlockReflowContext::ReflowBlock(aSpace={...}, aApplyTopMargin=0x00000001, aPrevMargin={...}, aClearance=0x00000000, aIsAdjacentWithTop=0x00000000, aLine=0x09bb74e0, aFrameRS={...}, aFrameReflowStatus=0x00000000, aState={...})  Line 310	C++
 	xul.dll!nsBlockFrame::ReflowBlockFrame(aState={...}, aLine={...}, aKeepReflowGoing=0x003ea874)  Line 3090	C++
 	xul.dll!nsBlockFrame::ReflowLine(aState={...}, aLine={...}, aKeepReflowGoing=0x003ea874)  Line 2412	C++
 	xul.dll!nsBlockFrame::ReflowDirtyLines(aState={...})  Line 1907	C++
 	xul.dll!nsBlockFrame::Reflow(aPresContext=0x06b5c730, aMetrics={...}, aReflowState={...}, aStatus=0x00000000)  Line 1009	C++
 	xul.dll!nsBlockReflowContext::ReflowBlock(aSpace={...}, aApplyTopMargin=0x00000001, aPrevMargin={...}, aClearance=0x00000000, aIsAdjacentWithTop=0x00000000, aLine=0x09bb75a8, aFrameRS={...}, aFrameReflowStatus=0x00000000, aState={...})  Line 310	C++
 	xul.dll!nsBlockFrame::ReflowBlockFrame(aState={...}, aLine={...}, aKeepReflowGoing=0x003eb3ec)  Line 3090	C++
 	xul.dll!nsBlockFrame::ReflowLine(aState={...}, aLine={...}, aKeepReflowGoing=0x003eb3ec)  Line 2412	C++
 	xul.dll!nsBlockFrame::ReflowDirtyLines(aState={...})  Line 1907	C++
 	xul.dll!nsBlockFrame::Reflow(aPresContext=0x06b5c730, aMetrics={...}, aReflowState={...}, aStatus=0x00000000)  Line 1009	C++
 	xul.dll!nsBlockReflowContext::ReflowBlock(aSpace={...}, aApplyTopMargin=0x00000001, aPrevMargin={...}, aClearance=0x00000000, aIsAdjacentWithTop=0x00000001, aLine=0x09bb2ee0, aFrameRS={...}, aFrameReflowStatus=0x00000000, aState={...})  Line 310	C++
 	xul.dll!nsBlockFrame::ReflowBlockFrame(aState={...}, aLine={...}, aKeepReflowGoing=0x003ebf64)  Line 3090	C++
 	xul.dll!nsBlockFrame::ReflowLine(aState={...}, aLine={...}, aKeepReflowGoing=0x003ebf64)  Line 2412	C++
 	xul.dll!nsBlockFrame::ReflowDirtyLines(aState={...})  Line 1907	C++
 	xul.dll!nsBlockFrame::Reflow(aPresContext=0x06b5c730, aMetrics={...}, aReflowState={...}, aStatus=0x00000000)  Line 1009	C++
 	xul.dll!nsContainerFrame::ReflowChild(aKidFrame=0x09bb2a70, aPresContext=0x06b5c730, aDesiredSize={...}, aReflowState={...}, aX=0x00000000, aY=0x00000000, aFlags=0x00000000, aStatus=0x00000000, aTracker=0x00000000)  Line 738	C++
 	xul.dll!nsCanvasFrame::Reflow(aPresContext=0x06b5c730, aDesiredSize={...}, aReflowState={...}, aStatus=0x00000000)  Line 491	C++
 	xul.dll!nsContainerFrame::ReflowChild(aKidFrame=0x06daeef8, aPresContext=0x06b5c730, aDesiredSize={...}, aReflowState={...}, aX=0x00000000, aY=0x00000000, aFlags=0x00000003, aStatus=0x00000000, aTracker=0x00000000)  Line 738	C++
 	xul.dll!nsHTMLScrollFrame::ReflowScrolledFrame(aState=0x003ec940, aAssumeHScroll=0x00000001, aAssumeVScroll=0x00000001, aMetrics=0x003ec870, aFirstPass=0x00000001)  Line 513	C++
 	xul.dll!nsHTMLScrollFrame::ReflowContents(aState=0x003ec940, aDesiredSize={...})  Line 606	C++
 	xul.dll!nsHTMLScrollFrame::Reflow(aPresContext=0x06b5c730, aDesiredSize={...}, aReflowState={...}, aStatus=0x00000000)  Line 812	C++
 	xul.dll!nsContainerFrame::ReflowChild(aKidFrame=0x06daf080, aPresContext=0x06b5c730, aDesiredSize={...}, aReflowState={...}, aX=0x00000000, aY=0x00000000, aFlags=0x00000000, aStatus=0x00000000, aTracker=0x00000000)  Line 738	C++
 	xul.dll!ViewportFrame::Reflow(aPresContext=0x06b5c730, aDesiredSize={...}, aReflowState={...}, aStatus=0x00000000)  Line 285	C++
 	xul.dll!PresShell::DoReflow(target=0x06dae090, aInterruptible=0x00000001)  Line 7522	C++
 	xul.dll!PresShell::ProcessReflowCommands(aInterruptible=0x00000001)  Line 7647	C++
 	xul.dll!PresShell::FlushPendingNotifications(aType=Flush_InterruptibleLayout)  Line 4829	C++
 	xul.dll!nsRefreshDriver::Notify(__formal=0x098b34d8)  Line 226	C++

###!!! ASSERTION: UniscribeItem is too big, ScriptShape() will fail!: 'mMaxGlyphs < 65535', file c:/builds/mozilla-central/ff-debug/gfx/thebes/../../../src/gfx/thebes/gfxUniscribeShaper.cpp, line 88

###!!! ASSERTION: Hmm, something went wrong, aOffset should have been found: 'mGlyphRuns[start].mCharacterOffset <= aOffset', file c:/builds/mozilla-central/ff-debug/gfx/thebes/../../../src/gfx/thebes/gfxFont.cpp, line 3638
.... repeated many times
Then, many repetitions later:

nsLineLayout: Text(0)@17EE4140 metrics=62967840,960!
nsLineLayout: Text(0)@17EE9D70 metrics=62967840,960!
###!!! ASSERTION: Textrun was not completely removed from the cache!: 'aTextRun->mCachedWords == 0', file c:/builds/mozilla-central/ff-debug/gfx/thebes/../../../src/gfx/thebes/gfxTextRunWordCache.cpp, line 873
In this case, at least, the failure seems to start with an error return from the Uniscribe function ScriptBreak, called from CopyItemSplitOversize with an item length (character count) of 65537. ScriptBreak returns E_INVALIDARG (presumably it doesn't like counts > 64K, although the MSDN doc does not explicitly say so); this causes CopyItemSplitOversize to return early with a failure, and things go downhill from there.
blocking2.0: --- → betaN+
OK, I assume you can fix that.

However, it also seems to me that gfxFontGroup::MakeTextRun should be checking for failure from InitTextRun and returning null on failure, then we should be checking and handling that in gfxTextRunWordCache.
This fixes bsmedberg's assert-then-crash example, at least in my testing. We can actually avoid calling ScriptBreak here at all, as the gfxFontGroup has already called SetupClusterBoundaries by the time we reach this code, so we can use those instead. I also reduced the max item size constant from 32K to 16K, as I also encountered a ScriptShape failure when it was exactly 32K. So a more conservative limit seems to be called for.
Attachment #462520 - Flags: review?(jdaggett)
(In reply to comment #24)
> However, it also seems to me that gfxFontGroup::MakeTextRun should be checking
> for failure from InitTextRun and returning null on failure, then we should be
> checking and handling that in gfxTextRunWordCache.

I don't think returning null from gfxFontGroup::MakeTextRun really makes sense, as this function may call gfxFont::InitTextRun many times for various sub-runs of the text, and we might encounter failure in one of these while others succeed. Returning null for the textrun as a whole seems too severe of a response. It's also not clear to me how gfxTextRunWordCache could usefully handle this.

If one of the individual fonts' InitTextRun calls fails, how about treating that particular run as missing glyphs? I think we could do that with a minor change in gfxFontGroup::InitTextRun(), and that's probably the most reasonable thing we can do if text shaping has failed completely, even though ComputeRanges found a font that claimed to support the characters involved.
This attempts to handle shaper failure in a more robust/useful way, by marking the relevant text run as "missing glyphs" (to be rendered as hexboxes) rather than leaving the textRun in a broken state.
Attachment #462557 - Flags: review?(roc)
The original testcase listed in the description still fails.  This has been masked recently by all the changes made to work around Uniscribe rendering bugs on XP.  The attached patch simulates the same thing as using 'text-rendering: optimizeLegibility' on a testpage.  The fallback from Uniscribe to GDI shaper was broken by the original shaper code bug back in March, it looks like we're not cleaning out the glyph runs correctly such that there are text runs with zero glyph runs.  This produces lots of assertions:

WARNING: Uniscribe failed to place with font: file c:/dev/mozcentral/gfx/thebes/gfxUniscribeShaper.cpp, line 630
###!!! ASSERTION: invalid array index: 'i < Length()', file c:\dev\mozcentral\obj-debug\dist\include\nsTArray.h, line 339

and 

###!!! ASSERTION: Textrun was not completely removed from the cache!: 'aTextRun->mCachedWords == 0', file c:/dev/mozcentral/gfx/thebes/gfxTextRunWordC
ache.cpp, line 873

Eventually a crash occurs indexing into a zero-length array.
I think I've identified the problem with Uniscribe-to-GDI fallback that's leading to the missing glyph runs and failure in the original report here. The problem is that when the Uniscribe shaper fails, it calls gfxTextRun::ResetGlyphRuns() to completely clear the glyph run information in the textRun. This is no longer appropriate; even though shaping has failed, we still want the glyph run entry for the glyphs that the GDI shaper is going to provide (or for missing glyph info, in the last-resort case where no shaper succeeds).

So the attached patch appears to resolve the problem with Antique Olive Roman in my testing on XP, at least. It's still not clear WHY Uniscribe fails to shape some text runs with this font, but at least we'll now handle the failure gracefully.

All three of the patches here should be taken, I believe: they each address separate, valid bugs in the text handling, any of which could lead to a crash of this nature.
Attachment #463231 - Flags: review?(jdaggett)
Comment on attachment 462520 [details] [diff] [review]
patch, v1 - fix handling of overlong text runs in the Uniscribe shaper

Looks good.  Confirmed that it fixes the original reported testcase. 
Plus'ing with a couple nits.

-        if (ESTIMATE_MAX_GLYPHS(itemLength) > 65535) {
+        if (ESTIMATE_MAX_GLYPHS(itemLength) > 32767) {

Was there a reason for changing the run length limit here?  I think a
#define with a comment explaining the limit would be a good idea.

I think we also need an assertion in
gfxTextRun::FindFirstGlyphRunContaining on (mGlyphRuns.Length() != 0),
since that appears to be the way other code is treating this function
(i.e. using the return value to index directly into the glyph run
array).
Attachment #462520 - Flags: review?(jdaggett) → review+
Attachment #463231 - Flags: review?(jdaggett) → review+
http://hg.mozilla.org/mozilla-central/rev/fc3b32b2f050
(part 1 - overlong text runs for uniscribe)

http://hg.mozilla.org/mozilla-central/rev/444328d96da2
(part 2 - handling of shaper failure)

http://hg.mozilla.org/mozilla-central/rev/153bd6dc88be
(part 3 - uniscribe to gdi fallback)
Assignee: jdaggett → jfkthame
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This was backed out ?  Should this be re-opened ?
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Resolution: FIXED → ---
Now why the hell did bugzilla change the status from Resolved Fixed to Unconfirmed just by adding a comment - ???
What the...?! Yes, this should have been re-opened; I was intending to do that. But we can't get there from Unconfirmed.... guess I'll try changing it to Resolved, and then Re-open it.
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: FIXED → ---
Backed out because the assertion suggested in comment #31 ended up firing and causing lots of orange. :(
Well, this is good actually.  If we're seeing cases where mGlyphRuns.length is 0, then we should be able to track down the codepath causing that and fix.  There are several places elsewhere in the code where it's clearly assumed that length > 0, so this probably goes a long way to explaining many of the hard-to-pin-down crashes in text run code that show up in crash reports going back to 3.0.
(In reply to comment #37)
> Well, this is good actually.  If we're seeing cases where mGlyphRuns.length is
> 0, then we should be able to track down the codepath causing that and fix. 
> There are several places elsewhere in the code where it's clearly assumed that
> length > 0, so this probably goes a long way to explaining many of the
> hard-to-pin-down crashes in text run code that show up in crash reports going
> back to 3.0.

I don't think it's that simple. I've been looking into this and there are situations where length=0 is apparently ok. We have places (e.g. in gfxTextRun::MeasureText) where we use a pattern like

    GlyphRunIterator iter(this, aStart, aLength);
    while (iter.NextRun()) {
        .....
    }

and these are sometimes called for zero-length text ranges where there may be no glyphRuns present. The GlyphRunIterator constructor will call FindFirstGlyphRunContaining() to initialize its mNextIndex member, which will thus be set to zero. This is correct because mNextIndex == mGlyphRuns.Length() indicates the end of the iteration has already been reached, and GlyphRunIterator::NextRun will immediately return false in this case. This happens hundreds of times in the course of a normal reftest run, for example.
We can probably have a slightly weaker assertion that

    (mCharacterCount == 0 || mGlyphRuns.Length() > 0)

here, as the only case where no glyph runs could be valid would be a zero-length run; I'll test this and re-land with that in place if all goes well.

If we want to avoid ever calling this code with zero-length runs, so that the strong assertion about glyph runs is correct, we could do that but it would involve adding length checks further up the call hierarchy. It's not clear to me whether that's worthwhile - it would short-circuit the code paths for empty runs, but at the cost of adding extra conditionals to all the non-empty ones. If we want to look into that further, I think it should be in a separate bug, not here.
http://hg.mozilla.org/mozilla-central/rev/ddd9c8f8b0e2 (pt 1, with revised assertion as per comment #39)
http://hg.mozilla.org/mozilla-central/rev/8b5f0928066c (pt 2)
http://hg.mozilla.org/mozilla-central/rev/465524bbbb5b (pt 3)
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Crash Signature: [@ TextRunWordCache::CacheHashEntry::KeyEquals(TextRunWordCache::CacheHashKey const*) const ]
You need to log in before you can comment on or make changes to this bug.