Closed Bug 110793 Opened 23 years ago Closed 23 years ago

Moz cores with SIGFPE in GTK font code

Categories

(SeaMonkey :: General, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: david, Assigned: adam)

Attachments

(2 files, 6 obsolete files)

When trying to view this email w/ either messenger or nav, moz will core.  I'm
rebuilding w/ debug enabled now to produce a stack trace.

<bold>Please save unfinished work before viewing this email.</bold>

- cvs code from early nov18/2001
- gcc 2.95.3
- -O3 opt w/out debug
Version: 3.0 → 4.1.3
Moz cores on http://godaddy.com/ also
Severity: major → critical
Summary: Moz segfaults with SIGFPE → Moz cores with SIGFPE
Changed the product to Browser.
Assignee: wtc → asa
Component: NSPR → Browser-General
Product: NSPR → Browser
QA Contact: wtc → doronr
Version: 4.1.3 → other
Another website which moz cores on; http://securityfocus.com/
godaddy and securityfocus both
WFM (2001111808 Win2K)
Ooops, sorry, http://stuph.org/broken2.eml works, too ;-)
I'm seeing SIGFPE (divide by zero) in the new X11 anti-aliased
font code, Linux/x86.  Not seeing it on Solaris6.  Bet it's
dependent on the fonts in the font-path.
Another email that cores, http://stuph.org/broken3.eml.

I cvs updated X about two weeks ago.

I'm not exactly sure what the problem is, in broken2.eml, there isn't a
font-face used anywhere.  broken3.eml has Times New Roman and Arial.  I use
those fonts in other web pages, both of which are presented fine.

Perhaps it is a particular glyph?
************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "Component returned failure code: 0xbfffb670
[nsICurrentCharsetListener.SetCurrentCharset]"  nsresult: "0xbfffb670
(<unknown>)"  location: "JS frame :: chrome://global/content/charsetOverlay.js
:: charsetLoadListener :: line 211"  data: no]
************************************************************
XXX Damage rectangle (2808,559,209,209) does not intersect the widget's view
(0,0,0,0)!
XXX Damage rectangle (2808,559,209,209) does not intersect the widget's view
(0,0,208,208)!
nsWidget::~nsWidget() of toplevel: 13 widgets still exist.
###!!! ASSERTION: cannot handle open-ended tag name query: 'Error', file
nsContentTagTestNode.cpp, line 84
###!!! Break: at file nsContentTagTestNode.cpp, line 84
Start reading in bookmarks.html
Finished reading in bookmarks.html  (63309 microseconds)
Opening file cookperm.txt failed
XXX Damage rectangle (0,0,209,209) does not intersect the widget's view (0,0,0,0)!
/usr/src/cvs/mozilla/dist/bin/run-mozilla.sh: line 72: 12480 Floating point
exception(core dumped) $prog ${1+"$@"}

Definitely something to do with fonts.  Dang it, I really need to recompile all
my libs w/ debugging.

#0  0x42234702 in nsFontMetricsGTK::RealizeFont ()
   from /usr/local/src/cvs/mozilla/dist/bin/components/libgfx_gtk.so
#1  0x42234502 in nsFontMetricsGTK::Init ()
   from /usr/local/src/cvs/mozilla/dist/bin/components/libgfx_gtk.so
#2  0x400386d4 in nsFontCache::GetMetricsFor () at eval.c:41
#3  0x400376df in DeviceContextImpl::GetMetricsFor () at eval.c:41
#4  0x41f57bb3 in nsTextFrame::Reflow ()

more info coming..
I'm poking around in the AA code right now.  Basically this:
NS_ASSERTION(unscaled_width<=mUnscaledMax.width, "unexpected glyph width");
is firing; I'm tracing it back now to see where things are
going awry.  (Brian?)
Status: UNCONFIRMED → NEW
Ever confirmed: true
    1263 
    1264 fprintf(stderr, "b3.1: mMaxAscent(%i), mEmHeight(%d), lineSpacing(%d)\n",
    1265     mMaxAscent, mEmHeight, lineSpacing);
    1266 
    1267   mEmAscent = nscoord(mMaxAscent * mEmHeight / lineSpacing);
    1268 
    1269 fprintf(stderr, "b3.2: mMaxAscent(%i), mEmHeight(%d), lineSpacing(%d)\n",
    1270     mMaxAscent, mEmHeight, lineSpacing);


b3.1: mMaxAscent(130), mEmHeight(143), lineSpacing(156)
b3.2: mMaxAscent(130), mEmHeight(143), lineSpacing(156)
b3.1: mMaxAscent(130), mEmHeight(143), lineSpacing(156)
b3.2: mMaxAscent(130), mEmHeight(143), lineSpacing(156)
b3.1: mMaxAscent(-2147483648), mEmHeight(117), lineSpacing(0)

Hmm, that's odd.  Where is this INT_MIN value (-2147483648) coming from for mMaxAscent?
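For readers following the debug output above: the last line logged shows lineSpacing(0) immediately before the division on the mEmAscent line, and on x86 Linux an integer division by zero is reported as SIGFPE despite the "floating point" name. The following is only a minimal sketch of guarding such a division. `SafeEmAscent`, `nscoord_t`, and the `fallback` parameter are hypothetical names for illustration and are not part of nsFontMetricsGTK.cpp.

```cpp
#include <cstdint>

// Stand-in for Mozilla's nscoord; assumed here to be a 32-bit signed integer.
using nscoord_t = std::int32_t;

// Hypothetical helper (not Mozilla code): computes
// maxAscent * emHeight / lineSpacing, but returns `fallback` instead of
// dividing when lineSpacing is zero, because integer division by zero
// raises SIGFPE on x86 Linux.
nscoord_t SafeEmAscent(nscoord_t maxAscent, nscoord_t emHeight,
                       nscoord_t lineSpacing, nscoord_t fallback) {
  if (lineSpacing == 0) {
    return fallback;
  }
  // Widen to 64 bits so the product cannot overflow before dividing.
  // The quotient is assumed to fit back into 32 bits.
  std::int64_t product = static_cast<std::int64_t>(maxAscent) * emHeight;
  return static_cast<nscoord_t>(product / lineSpacing);
}
```

A guard like this only hides the symptom; it does not explain why the metrics are bad in the first place.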
Attached patch: avoid crash (do not check in) (obsolete)
Okay, the attached patch avoids the crash by returning
early when metrics get funky, but then of course you
end up with pieces of text undrawn.

Hope it helps someone track down the real problem...
I'm seeing the crash on all attempts to use AA-text
anywhere, so I don't think it's specific to rendering
a particular bad glyph.

FYI I'm using the XFree86 4.1.0 built-in font rasterizer
for standard font types -- no freetype rendering, no xft,
no font-server or other funky stuff, though my installed
fonts are more various than the default pack of fonts that
comes with X11.

Adam, how does that explain a few things?

a) broken2.eml doesn't specify any fonts which should use my default fonts
b) my default fonts work fine for other emails
c) this started happening w/ moz code <24 hours ago
d) i haven't changed anything in X for ~2 weeks
e) i haven't touched AA

Granted, I'm not up on AA fonts.  Is it on "by default" w/ X?  I haven't paid
any attention to xf86 mail in months.
Okay, here's where I stand...

godaddy and securityfocus work fine.

http://stuph.org/broken2.eml crashes due to a font
problem I haven't identified.

http://chinese.china.com/zh_tw/ (or any other AAfont-
triggering page) crashes in the antiJag routines without
the attached patch; works fine on another machine though.
I just checked in AA (non-TrueType) code this weekend.

Looking ...
David, the new AA stuff is very recent and moz-specific.
It's not directly handled by X itself.
> http://chinese.china.com/zh_tw/ (or any other AAfont-
> triggering page) crashes in the antiJag routines

http://chinese.china.com/zh_tw/ is definitely one of my test pages

I wonder why it would even be calling an anti-jag since that is only used
for scaling up.

as a temporary workaround one can disable the anti-aliased scaled bitmap (AASB)
fonts in unix.js 
(http://lxr.mozilla.org/seamonkey/source/modules/libpref/src/unix/unix.js)
by changing:

pref("font.scale.aa_bitmap.enable", true);

to

pref("font.scale.aa_bitmap.enable", false);
I need to find a way to duplicate this crash.
Assignee: asa → bstell
Thanks Brian, that fixes broken2.eml and china.com
here, confirming the problem as an AA-text one I suppose.
I was about to rudely assign the bug to you but I see
you've done that yourself now.  =)
why does it work on all my systems and crash on others?
I imagine it'd be dependent on the contents of
the available fonts, Brian.
Adam: Until I can reproduce the problem would you mind helping?

Can you stop the code in a debugger at the crash point and
report the values in fontInfo? (e.g., do they look reasonable, uninitialized,
or all zero)
http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#1247
1247   XFontStruct *fontInfo = xFont->GetXFontStruct();

The values should have been set in:
http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsXFontAAScaledBitmap.cpp#496
496 nsXFontAAScaledBitmap::LoadFont()
Adam: would it be possible to set the environment variable NS_FONT_DEBUG=D
and capture moz's output when the crash happens and attach it to this bug?
<pokearound>
Well, I can't seem to see where mScaledMax.lbearing
is getting trashed, but it does indeed seem to be
(it's becoming the most-negative int).
</pokearound>

I'll do the inspection you request in a sec...
I can't reproduce the crash with any page mentioned in this report.
I'm running 2001-11-19-08-trunk comm. build on RH7.1 Ja.
Two, one just before the faulty font, and the faulty font:

FindFont(a/0x0061), nsFontMetricsGTK.cpp 4345
    FindStyleSheetSpecificFont, nsFontMetricsGTK.cpp 3876
        familyName = times, nsFontMetricsGTK.cpp 3889
        TryFamily times with lang group = x-western, nsFontMetricsGTK.cpp 3824
      TryLangGroup lang group = x-western, aName = *-times-*-*,
nsFontMetricsGTK.cpp 3806
      lang group = x-western, nsFontMetricsGTK.cpp 4272
      iso8859-1 ffre = *-times-iso8859-1, nsFontMetricsGTK.cpp 4302
        TryNodes aFFREName = *-times-iso8859-1, nsFontMetricsGTK.cpp 3715
        load font adobe-times-iso8859-1, nsFontMetricsGTK.cpp 3191
bitmap font:_______ adobe-times-iso8859-1
                    desired=32, scaled=32, bitmap=34, nsFontMetricsGTK.cpp 2803
loaded -adobe-times-bold-r-normal--34-240-100-100-p-177-iso8859-1
b3.0: f(13.000000), fontInfo->descent(7)
b3.1: mMaxAscent(351), mEmHeight(442), lineSpacing(442)
b3.2: mMaxAscent(351), mEmHeight(442), lineSpacing(442)

FindFont(a/0x0061), nsFontMetricsGTK.cpp 4345
    FindStyleSheetSpecificFont, nsFontMetricsGTK.cpp 3876
        familyName = times, nsFontMetricsGTK.cpp 3889
        TryFamily times with lang group = x-western, nsFontMetricsGTK.cpp 3824
      TryLangGroup lang group = x-western, aName = *-times-*-*,
nsFontMetricsGTK.cpp 3806
      lang group = x-western, nsFontMetricsGTK.cpp 4272
      iso8859-1 ffre = *-times-iso8859-1, nsFontMetricsGTK.cpp 4302
        TryNodes aFFREName = *-times-iso8859-1, nsFontMetricsGTK.cpp 3715
        load font adobe-times-iso8859-1, nsFontMetricsGTK.cpp 3191
anti-aliased bitmap scaled font: adobe-times-iso8859-1
                    desired=48, aa-scaled=48, bitmap=34, aa_bitmap=34,
nsFontMetricsGTK.cpp 2776
scaled font:_______ adobe-times-iso8859-1
                    desired=48, scaled=48, bitmap=34, nsFontMetricsGTK.cpp 2809
loaded -adobe-times-bold-r-normal--34-240-100-100-p-177-iso8859-1
b3.0: f(13.000000), fontInfo->descent(-2147483648)
b3.1: mMaxAscent(-2147483648), mEmHeight(624), lineSpacing(0)

(that was broken2.eml)

I'm attaching broken3.eml
(broken3.eml font debug information attached)
I've got a hunch that mRatio is (nan) for some reason.
Here's the last font loaded on http://godaddy.com/ when it cores

FindFont(a/0x0061), nsFontMetricsGTK.cpp 4345
    FindStyleSheetSpecificFont, nsFontMetricsGTK.cpp 3876
    FindStyleSheetGenericFont, nsFontMetricsGTK.cpp 3983
      user pref font.name.serif.x-western = adobe-times-iso8859-1,
nsFontMetricsGTK.cpp 4221
        TryNode aName = adobe-times-iso8859-1, nsFontMetricsGTK.cpp 3741
        load font adobe-times-iso8859-1, nsFontMetricsGTK.cpp 3191
anti-aliased bitmap scaled font: adobe-times-iso8859-1
                    desired=9, aa-scaled=9, bitmap=11, aa_bitmap=17,
nsFontMetricsGTK.cpp 2776
scaled font:_______ adobe-times-iso8859-1
                    desired=9, scaled=9, bitmap=11, nsFontMetricsGTK.cpp 2809
loaded -adobe-times-medium-r-normal--17-120-100-100-p-84-iso8859-1
b3.0: f(13.000000), fontInfo->descent(-2147483648)
b3.1: mMaxAscent(-2147483648), mEmHeight(117), lineSpacing(0)
The nsXFontAAScaledBitmap constructor is
initializing mRatio with aSize/aUnscaledSize
which in this case is 0.0/0.0.
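To illustrate why 0.0/0.0 is a problem: on IEEE 754 platforms that quotient is NaN, and any metric later multiplied by it is NaN as well. On x86, converting a NaN to a 32-bit int typically produces -2147483648, which would be consistent with the values logged in this bug, although the C++ standard leaves that conversion undefined. The sketch below is an assumption-laden illustration only. `RawRatio` and `GuardedRatio` are hypothetical names, not functions in nsXFontAAScaledBitmap.cpp.

```cpp
#include <cmath>

// Mirrors the shape of the computation described above: size / unscaledSize.
// With both inputs zero this is 0.0/0.0, which IEEE 754 defines as NaN.
double RawRatio(unsigned size, unsigned unscaledSize) {
  return static_cast<double>(size) / static_cast<double>(unscaledSize);
}

// Hypothetical guard (an illustration, not the checked-in fix): fall back
// to a neutral ratio of 1.0 when either size is zero.
double GuardedRatio(unsigned size, unsigned unscaledSize) {
  if (size == 0 || unscaledSize == 0) {
    return 1.0;
  }
  return RawRatio(size, unscaledSize);
}
```

Checking the raw result with `std::isnan` is a quick way to confirm a hunch like the one above before changing any logic.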
Attached patch: fix for testing (obsolete)
This patch bulletproofs the mRatio calculation in
the constructor against bad input metrics (well, the third
hunk does, the rest is diagnostic).  This fixes all problems on
this system.  David?
Attachment #58410 - Attachment is obsolete: true
Attached patch: minimal fix (obsolete)
Just the mRatio sanity fix.
Attachment #58442 - Attachment is obsolete: true
I applied patch 58443, no more crashing.  Thank you
Is this an interim fix, is the font metrics code going to be looked over for a
different patch?
As far as I'm concerned, I'm happy enough with the
final fix there, and won't be working on this any
further.  I can't speak for anyone else!
Adam: thanks for your help so far. Would you mind breakpointing
in nsXFontAAScaledBitmap::nsXFontAAScaledBitmap when (aSize == 0) ?

I'd like to understand why this would happen.
Is there anything in particular you want me to glean
from a backtrace at that point?  I don't have enough
time or disk-space for a debug-build, so I'll just
recompile the relevant pieces with -ggdb...
I'd like to know why it got called with a size of 0
It should have been called from nsFontMetricsGTK

http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#2007
2007     mXFont = new nsXFontAAScaledBitmap(GDK_DISPLAY(),
2008                                        DefaultScreen(GDK_DISPLAY()),
2009                                        gdkFont, mSize, mAABaseSize);

What page would ask for a size of 0?
is http://stuph.org/broken2.eml supposed to be a porn site?
Sorry Brian, now I can't make it misbehave even without
the fix.  Great, eh?

Afraid I have to sleep now... past midnight... work tomorrow... zzzz...
Attached patch: patch; why is the req. size 0? (obsolete)
This patch is probably not the right fix.

This patch would detect and reject a request for a zero size but why
is that size being asked for?
Attachment #58443 - Attachment is obsolete: true
Applying only attachment 58462 yields the crash again.  You
probably knew that, but I thought I'd point it out.
Yes, broken2.eml is porn advertisement.
in #11 it says: "avoid crash (do not check in)"

+  if (aX-mScaledMax.lbearing <= 0 ||
+      aY-mScaledMax.ascent <= 0) {
+    //    fprintf(stderr, "ERK!\n");
+    return;
+  }

Since lbearing is allowed to be positive, zero, or negative it has to be the
ascent value.

in #35 it is odd that it checks for aUnscaledSize and aSize and stops
the crash

-  mRatio               = ((double)aSize)/((double)aUnscaledSize);
+  if (aUnscaledSize == 0 || aSize == 0)
+    mRatio = 1.0F;
+  else
+    mRatio               = ((double)aSize)/((double)aUnscaledSize);

but attachment 58462 which tests the values that are passed in as
aUnscaledSize and aSize does not stop the crash

+    NS_ASSERTION(mSize, "requested font size is 0");
+    NS_ASSERTION(mAABaseSize, "font size to scale is 0");
+    if ((mSize==0) || (mAABaseSize==0))
+      return;
-  mRatio               = ((double)aSize)/((double)aUnscaledSize);
+  if (aUnscaledSize == 0 || aSize == 0)
+    mRatio = 1.0F;
+  else
+    mRatio               = ((double)aSize)/((double)aUnscaledSize);

and

+    NS_ASSERTION(mSize, "requested font size is 0");
+    NS_ASSERTION(mAABaseSize, "font size to scale is 0");
+    if ((mSize==0) || (mAABaseSize==0))
+      return;

are *catching* the same thing but then reacting differently
to it.  The former says 'okie dokey!', nurses mRatio into a
sane state and keeps on truckin', while the latter drops out
early (wouldn't you want to nsnullify mXFont?).
And would it be a good idea to nsnullify mXFont at
about line 2001 also?  Might we not return with a
dangling pointer?  From a quick browse of
nsFontMetricsGTK.cpp it looks like mXFont could be
consequential, but I'm obviously not too familiar
with that code.
> are *catching* the same thing but then reacting differently to it.

good point!

Still, why is a 0-sized font being requested?

Perhaps the size should be forced to a minimum of 1 pixel
Status: NEW → ASSIGNED
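The "minimum of 1 pixel" idea floated above could look roughly like the sketch below. `ClampFontSize` is a hypothetical helper and the 16-bit size type is an assumption made for illustration. Nothing in this bug shows such a clamp being checked in.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: clamps a requested pixel size to at least 1 so that
// later ratio computations never see a zero size.
std::uint16_t ClampFontSize(std::uint16_t requested) {
  return std::max<std::uint16_t>(requested, 1);
}
```

Clamping would mask a zero-size request rather than explain it, which is the open question in the comment above.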
> wouldn't you want to nsnullify mXFont?

actually I think mXFont is already null.
Attachment #58462 - Attachment is obsolete: true
FYI, with the 'early escape' patch mozilla goes
on to get a SIGFPE on line 1264 of nsFontMetricsGTK.cpp
-- lineSpacing comes out as 0.0
nsFontMetricsGTK.cpp:1248
fontInfo->ascent == fontInfo->descent == -2147483648

Great, huh?

I'm poking around to try to see why.
These get set in nsXFontAAScaledBitmap::LoadFont() in
gfx/src/gtk/nsXFontAAScaledBitmap.cpp
I'm there now -- the values look fine when they
pop out of
|mUnscaledFontInfo = (XFontStruct *)GDK_FONT_XFONT(mGdkFont);|
there.  But if they're then having |SCALED_SIZE()| done to them
and mRatio is bad... hmm.

Okay, so that wild goosechase arced all the way back to the mRatio problem. 
Hmph.
I'm afraid that attachment 58554 doesn't make a difference here.
Humm...

I understand how this code stops the crash but I do not understand how
we could have gotten here with these values.

+  if (aUnscaledSize == 0 || aSize == 0)
+    mRatio = 1.0F;
+  else
+    mRatio               = ((double)aSize)/((double)aUnscaledSize);

Can you put this code back in and breakpoint in the debugger and
find out what is being passed in?

Yup Brian, I'm working on that.
thank you for your continuing help and patience
Uh-oh.  Oh no no no no.  Looks like a gcc bug.
??
Well, I'm out of time again so will have another look
tomorrow.  This is probably specific to GCC 2.95.3.  I
think I may have a workaround but I've been delayed by
an hour or two by a totally unrelated bug so I haven't
gotten to test it properly yet.
Hmm, moz is coring on http://mailbits.com/Rdr.asp?s=5815, is this also a font
issue or should I open another bug?
belay that, it's a different bug.  if you are running messenger in a tab and you
open a link from an email w/ middle click to open a new tab and the new link
spawns more than one window, moz crashes.

searching|opening a new bug.
david: there is another font bug 110084
I'd like to steal this bug if I may.
Assignee: bstell → adam
Status: ASSIGNED → NEW
Read it and weep.  Ugh.

Works around what I can only see as being an argument-marshalling bug in GCC
2.95.3.  Does the trick here  (n.b. explicitly casting these constructor
arguments up to PRUint32 does not work as an alternative).

David, please give this one a try.
Attachment #58554 - Attachment is obsolete: true
(comment) That doesn't right, I don't see why there is a bug w/ a 32bit integer.
 Integers on most 32bit platforms are 32bits, afaik, only Microsoft land has a
16bit integer, also called a 'short int'.
Are you saying that it doesn't work, or that it doesn't look right?
Doesn't look right, sorry, typo.  It'll take me a while to get results, I had
just clobbered my tree.
Brendan: would you kindly add comments to this bug as you see fit?
gcc bug workarounds should be commented noisily as such.  Ordinarily we
discourage narrow int parameter types where chopping is not required.

/be
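For context on "chopping": converting a wider unsigned value to a 16-bit unsigned type keeps only the low 16 bits, so a narrow parameter type silently changes out-of-range arguments. The sketch below uses the standard fixed-width types as stand-ins for the NSPR ones. `ChopTo16` is a hypothetical name for illustration.

```cpp
#include <cstdint>

// Converting to an unsigned 16-bit type is well defined in C++: the result
// is the original value modulo 65536, i.e. only the low 16 bits survive.
std::uint16_t ChopTo16(std::uint32_t value) {
  return static_cast<std::uint16_t>(value);
}
```

Font pixel sizes like the ones logged in this bug (9, 32, 48) fit comfortably in 16 bits, so chopping would not alter them.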
How's this one?
Attachment #58727 - Attachment is obsolete: true
Comment on attachment 58843
Fix again, with comments on use of PRUint16

r/sr=brendan@mozilla.org, whatever you need.

/be
Attachment #58843 - Flags: superreview+
David, have you had a chance to verify the final fix yet?
Status: NEW → ASSIGNED
Mildly elucidating summary.
Summary: Moz cores with SIGFPE → Moz cores with SIGFPE in GTK font code
I'm getting a FPE with the following fragment under Linux:
(Renders ok when font size set to "6")

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
        <body>
                <font size="7">Hi there</font>
        </body>
</html>
Any further information, Michael?  Build, compiler?

With attached fix or without?
Yes it works.

Sorry, Thanksgiving and all.
Thanks David.

Brian, might I trouble you for a sr/r= to
mirror brendan's, and a checkin?
Keywords: patch
Works for me after applying the patch.

For the records: yesterday's CVS, gcc 2.95.2 -O2 -march=i586 build
Thanks, Michael.
The patch seems harmless enough. The only issue I have is why there
is a problem (hence why the patch would help). 

Would you kindly add a reference or explanation re: "a GCC 2.95[.3] bug which 
would otherwise cause these parameters to be corrupted in the callee"

with that comment added: r=bstell@netscape.com
by-the-way: I don't mean to say you are wrong, I just don't understand
the issue and hence don't understand the fix.
*** Bug 111736 has been marked as a duplicate of this bug. ***
As 111736 is marked as dupe of this one, I do my ranting here. I'm not convinced
that this is actually a dupe, as the only link here that causes a crash (for me)
is "http://chinese.china.com/zh_tw/".

Oh and the reason for this entry. New build from CVS, updated 2 hours ago,
crashes on "http://www.altavista.com" (did not try altavista with the old build).

This is a must fix. As Mozilla is cripple-ware right now.
Jarmo: have you tried attachment 58843 ?
Regarding comment #85 I can dust off the x86 disassembler
to compare before/after code but I fear it can't be soon.
I don't have a reference in the GCC bug database (does such
a thing exist?) if that's what you're after, I'm afraid.  If
you wish to reproduce the problem yourself to at least assure
yourself that it is indeed a GCC fault (this'll provide the 'if'
but not the 'why') then build an opt (-O2+) moz build on
2.95.3-x86 and printf() the parameters last thing before the
function is called and first thing within the function itself.

I do not remember now the path of thinking or analysis that
arrived at the specific fix submitted except that it occurred
to me as something that should work as I was going to sleep,
and it did indeed work when trying it upon waking.  I'm sure
that's not set your mind at ease at all.  :)
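The reproduction recipe above (print the parameters just before the call and again first thing inside the callee) can be sketched as follows. `Callee`, `Received`, and `ArgumentsSurviveCall` are hypothetical stand-ins for the real constructor and its call site. With a correct compiler the two sets of values always match; a mismatch under an optimized build would point at argument passing.

```cpp
#include <cstdint>
#include <cstdio>

// What the callee observed on entry, handed back so the caller can compare.
struct Received {
  std::uint32_t size;
  std::uint32_t unscaledSize;
};

// Hypothetical callee standing in for the function under suspicion.
// It logs its arguments first thing on entry.
Received Callee(std::uint32_t size, std::uint32_t unscaledSize) {
  std::fprintf(stderr, "callee entry: size=%u unscaledSize=%u\n",
               static_cast<unsigned>(size),
               static_cast<unsigned>(unscaledSize));
  return Received{size, unscaledSize};
}

// Logs the arguments last thing before the call, then reports whether the
// values seen inside the callee match what was passed.
bool ArgumentsSurviveCall(std::uint32_t size, std::uint32_t unscaledSize) {
  std::fprintf(stderr, "caller: size=%u unscaledSize=%u\n",
               static_cast<unsigned>(size),
               static_cast<unsigned>(unscaledSize));
  Received seen = Callee(size, unscaledSize);
  return seen.size == size && seen.unscaledSize == unscaledSize;
}
```

As the comment above says, this shows whether the values get corrupted across the call, not why.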
Comment on attachment 58843
Fix again, with comments on use of PRUint16

I'm not against the patch in any way. I just don't understand it.

I think it would be okay to check it in and if that fixes the problem
I'm good with that.
Attachment #58843 - Flags: review+
I tried the attachment and it seems to fix it.
checked in
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Ah, thanks!
Product: Browser → Seamonkey