Closed Bug 554544 Opened 14 years ago Closed 13 years ago

Wikipedia crashes [@ nsDisplayTextDecoration::Paint(nsDisplayListBuilder*, nsIRenderingContext*) ] and [@ nsTextFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) ]

Categories

(Core :: Layout, defect)

All
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- .5-fixed

People

(Reporter: jaas, Assigned: jfkthame)

References

Details

(Keywords: topcrash, Whiteboard: [qa-needs-STR])

Crash Data

Attachments

(2 files, 1 obsolete file)

There seems to be a new topcrash in Firefox 3.6.2 for Mac OS X. Wikipedia is often mentioned as the site on which the crash occurred.

http://crash-stats.mozilla.com/report/index/9fc6a455-6378-477b-b193-831fa2100323

Happens to both PPC and i386 users.
Summary: wikipedia crash [@ nsDisplayTextDecoration::Paint(nsDisplayListBuilder*, nsIRenderingContext*)] → wikipedia crash [@ nsDisplayTextDecoration::Paint]
Actually, this is not a 3.6.2 regression (crash-stats wasn't working for 3.6 a few minutes ago). It did rise to the #2 spot though.

Crash from 3.6.0:

http://crash-stats.mozilla.com/report/index/4b03dc52-6b9f-4423-baff-e12a82100322
This crash almost always happens on Mac OS X 10.4, but there are a few reports from Mac OS X 10.6.
Keywords: topcrash
Crashing on null font metrics? is that possible?
I wonder if it's somehow possible for gfxMacPlatformFontList::GetDefaultFont() to return null? It's not supposed to, but if it ever does then we could crash like this, I think.

I suspect Wikipedia shows up as a common site because it has a bunch of other-language links including various non-Latin scripts, and so it triggers a lot of font fallback trying to display these.
I have seen Twitter comments about this but I have not yet been able to trigger the crash from URLs I have harvested. Will try again today and report back.
There are also many crashes at nsTextFrame::Reflow(nsPresContext*,
nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) that
seem related (those on the Mac, at least):

http://crash-stats.mozilla.com/report/list?platform=mac&query_search=signature&query_type=exact&query=nsTextFrame%3A%3AReflow%28nsPresContext%2A%2C%20nsHTMLReflowMetrics%26%2C%20nsHTMLReflowState%20const%26%2C%20unsigned%20int%26%29&date=&range_value=1&range_unit=weeks&process_type=all&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&signature=nsTextFrame%3A%3AReflow%28nsPresContext%2A%2C%20nsHTMLReflowMetrics%26%2C%20nsHTMLReflowState%20const%26%2C%20unsigned%20int%26%29

They happen when a NULL nsIFontMetrics* is derefenced (one returned by
PropertyProvider.GetFontMetrics()):

http://hg.mozilla.org/releases/mozilla-1.9.2/annotate/cd857b3b0e33/layout/generic/nsTextFrameThebes.cpp#l6319

Like crashes originally reported here, they mostly happen on OS X
10.4.11.  But there are also a few on OS X 10.6.2.  The comments
indicate that many of them happen on Wikipedia.
Summary: wikipedia crash [@ nsDisplayTextDecoration::Paint] → Wikipedia crashes [@ nsDisplayTextDecoration::Paint] and [@ nsTextFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) ]
Summary: Wikipedia crashes [@ nsDisplayTextDecoration::Paint] and [@ nsTextFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) ] → Wikipedia crashes [@ nsDisplayTextDecoration::Paint ] and [@ nsTextFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) ]
Summary: Wikipedia crashes [@ nsDisplayTextDecoration::Paint ] and [@ nsTextFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) ] → Wikipedia crashes [@ nsDisplayTextDecoration::Paint(nsDisplayListBuilder*, nsIRenderingContext*) ] and [@ nsTextFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) ]
Bug 558101 may also be related -- it's also about crashes on
dereferencing a NULL nsIFontMetrics pointer.

Interestingly, that bug happens on both Windows and OS X.
Jonathan, could you work up some kind of fix here, at least so we don't crash?
Assignee: nobody → jfkthame
I'm not sure we know where the null nsIFontMetrics originally comes from. Comment #4 mentioned an idea but I don't know if that's actually happening. Anyhow, I'll see what I can come up with tomorrow.
I'm guessing this shows up particularly often on Wikipedia because of the frequent links to other language (and script) versions, which trigger fallback to font preferences and even the system font-list search much more often than typical sites.

The link with font searching/fallback/etc is highlighted by comments such as "This was an attempt to access Wikipedia—crash happened several times. Have now changed the font choices as suggested on a site that I came across via Google. This seems to have worked."

I haven't been able to reproduce this locally, but my current guess is that somehow the gfxFontGroup is ending up with no actual fonts in its list. This would lead to failure when layout tries to get metrics.

A few possible scenarios where this might happen would be if the platform's GetDefaultFont returns a "broken" family with no valid members (we've seen problems with such families previously); or if we have a valid gfxFontEntry, but instantiating the actual gfxFont fails (e.g., because of error returns from platform API when we try to set up the metrics). Bad fonts and/or corrupted system font caches may well be contributing factors.

This patch attempts to mitigate the risk of ending up without a valid font by adding a "last resort" fallback step to gfxFontGroup::BuildFontList(). If we still have no fonts in the font list after trying GetDefaultFont(), it will retrieve the platform font list and simply take the first gfxFont that it successfully instantiates. Unless the platform is so broken that we can't access any fonts, this ought to return *something* valid, so that subsequent GetMetrics calls will be safe.

I don't have any way to test this as I have not been able to reproduce the failure, but it should have no effect in any "normal" situation where we're finding fonts successfully; it just steps in to provide a "font of last resort" when the alternative is an empty font list and subsequent crash. As such, I think it would be safe to land and see if it has any effect on the crash stats.
Attachment #438101 - Flags: review?(roc)
Would it make more sense to say that GetDefaultFont must never return null (unless the system is utterly broken beyond usability) and therefore GetDefaultFont should be responsible for doing this fallback search if necessary?

Then if GetDefaultFont can't find any font at all, it could just do an NS_RUNTIMEABORT. That would at least tell us more clearly what the problem was, if it actually happens, which presumably it won't.
We already "require" GetDefaultFont to return something valid, in the sense that we NS_ASSERT if it fails (but of course that doesn't help people with non-debug builds). We could change that to an NS_RUNTIMEABORT, on the grounds that if we can't find a font at all, we're doomed. (I don't think that can be the actual point of failure in these crashes, though, because we immediately proceed to dereference the returned defaultFont, and would probably crash at that point if it were null.) My worry, though, is that the defaultFont reference itself (gfxFontEntry) is valid, but then calling FindOrMakeFont() on it fails.

So to move the "last-ditch fallback" search into GetDefaultFont, we'd have to change it so that it actually instantiates and returns the gfxFont object, not just the gfxFontEntry as it currently does. This is because the problem might be that we successfully find a font entry in the list, but hit a failure at the time of gfxFont creation.

I'm a bit reluctant to do this because it would mean duplicating more code in the various platform implementations of GetDefaultFont (we have 4 at the moment, potentially more if we can get Linux onto the same font-list APIs.) It's gfxFontGroup::BuildFontList() that handles the actual instantiation of gfxFont objects, so that seems the most logical place to handle potential failure of that process.
OK that makes sense.

Can we add an NS_RUNTIMEABORT to catch the case where "font of last resort" fails?
Attachment #438101 - Attachment is obsolete: true
Attachment #438351 - Flags: review?(roc)
Attachment #438101 - Flags: review?(roc)
Comment on attachment 438351 [details] [diff] [review]
patch, v2: die with NS_RUNTIMEABORT if no font can be found

great!
Attachment #438351 - Flags: review?(roc) → review+
Pushed to trunk: http://hg.mozilla.org/mozilla-central/rev/ed979b721cb7

I don't think we'll really know whether (or how much) this helps until we land it on 1.9.2 and see what impact it has on crash stats, though, as the crash reports are coming from FF 3.6.x, AFAICS. I'll put up a patch for moz-192 tomorrow.
On 1.9.2, the gfxFontGroup font list is built in platform-specific subclasses; this patch only touches the Mac (ATSUI) implementation at present, as that's the platform we're concerned about in this bug. We could consider adding similar code to the Windows implementation, but I'd prefer not to touch it unless we see there's a real problem happening.
Attachment #438449 - Flags: review?(roc)
Comment on attachment 438449 [details] [diff] [review]
patch for 1.9.2 branch ATSUI code to avoid empty font list

Requesting approval to land this on 1.9.2 branch. The intention is to protect us from "bad font" situations that lead to null-deref crashes later in layout. As we don't have a reproducible test case for the reported crashes, it's hard to be sure whether this will really resolve them all; only future crash-stats will indicate this.

The code here should be safe to land, as it represents fallback precautionary measures that have no effect in normal use; it only comes into play when normal font setup fails and we would otherwise be headed for a potential crash.
Attachment #438449 - Flags: approval1.9.2.4?
Blocks: 560813
I think bug 560813 might be the same, so the reporter there can be asked to test fixes...
Comment on attachment 438449 [details] [diff] [review]
patch for 1.9.2 branch ATSUI code to avoid empty font list

a=LegNeato for 1.9.2.5. Please ONLY land this on mozilla-1.9.2 default, as we
are still working on 1.9.2.4 on the relbranch
Attachment #438449 - Flags: approval1.9.2.4? → approval1.9.2.5+
I landed the above patch on 1.9.2 for you:
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/cabaf5f4cd0f

Marking as .5-fixed, since I think that's what's expected, even though I'm not sure it's know whether this will actually be fixed.
Checking crash-stats, I don't see any reports from Firefox versions later than 3.6.6. (They're mostly from users on OS X 10.4.11, a few from earlier 10.4.x release, and a handful are from 10.6.4.)

It's not obvious to me from the branch pushlog what may have fixed it since the FF3.6.6 release, but it seems to no longer occur. OK to resolve this?
Re-checking crash-stats for the last 30 days, I still can't find any reports from post-3.6.6 versions. Closing this as FIXED; I don't think there's anything more we can usefully do here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Crash Signature: [@ nsDisplayTextDecoration::Paint(nsDisplayListBuilder*, nsIRenderingContext*) ] [@ nsTextFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: