Closed Bug 119042 Opened 23 years ago Closed 23 years ago

mixing PRUint32 and int terms in an array index expression crashes 64-bit gcc code

Categories

(Core :: Internationalization, defect)

DEC
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla0.9.9

People

(Reporter: ajschult784, Assigned: bstell)

References

()

Details

(Keywords: crash, intl, platform-parity, Whiteboard: DEC EXPERT NEEDED driver:brendan)

Attachments

(4 files)

Mozilla/5.0 (X11; U; Linux alpha; en-US; rv:0.9.7) Gecko/20011226

Mozilla crashes on any page with font size=7 or higher.

100% reproducible.
Attached file simple testcase
Keywords: crash
worksforme, PC Linux build 2002-01-07-06.  Is this an alpha-only problem?  Or
just not an issue anymore?
please provide versions/vendors of: Mozilla (cvs build, ftp.mozilla.org, ...), 
Xserver, Linux Kernel, Linux Distribution, c/c++ Compiler, CLibrary, Font 
server(s).
please indicate the .mozconfig or ./configure options you used (especially -g, 
-O or equivalents --enable-debug/--disable-debug --enable-optimize)
This was from the Mozilla binary RPMs by RedHat.  The machine is running RedHat
7.1 and XFree86 4.0.3-21 (RedHat binary RPM), 2.4.9-12 kernel (also RedHat
binary RPM).  The font server is the standard one that comes with XFree86
(XFree86-xfs-4.0.3-21.alpha.rpm)

I am also not seeing this bug on PC (Linux or Win98)
I suspect this may be a 64-bit issue, thought I can't repro it on Solaris.  Over
to layout... setting status to NEW to get this investigated by someone who has
the appropriate hardware.

Andrew, any chance of a stack trace with symbols?
Assignee: asa → attinasi
Status: UNCONFIRMED → NEW
Component: Browser-General → Layout
Ever confirmed: true
Keywords: pp
QA Contact: doronr → petersen
I don't have the opportunity to compile my own build, so I can't get you the
symbols...  :-(

I tried logging in to the Linux machine from a Unix (OSF4.0) Alpha machine
(mozilla running on Linux, but displayed on OSF), and it came up without crashing.

Also, I've noticed that if I have two Mozillas running at the same time and one
of them experiences the problem, they both crash.

I also told XFree86 to use an OSF machine as its font server, and it still crashed.
Sounds like a font stuff bug then, not layout directly.  Over to Intl.
Assignee: attinasi → yokoyama
Component: Layout → Internationalization
QA Contact: petersen → ruixu
over to shanjian.
Assignee: yokoyama → shanjian
Keywords: intl
QA Contact: ruixu → ylong
I tried running Mozilla on my PC (Linux, i686, Build 2002011508) and displaying
on my Alpha-Linux box.  This also worked without a crash.
Here's the stack trace (0.9.7 source):

(gdb) bt
#0  0x20002e6c2a8 in scale_imageAntiJag (aSrc=0x11fffcd30, aDst=0x12092a210)
    at nsXFontAAScaledBitmap.cpp:1125
#1  0x20002e68a98 in nsXFontAAScaledBitmap::GetScaledGreyImage (
    this=0x120c20400, aChar=0x120c317a0 "too big", aGreyImage=0x11fffd748)
    at nsXFontAAScaledBitmap.cpp:428
#2  0x20002e67a34 in nsXFontAAScaledBitmap::DrawText8or16 (this=0x120c20400, 
    aDrawable=0x120a894b0, aGC=0x120496b40, aX=8, aY=48, 
    a8or16String=0x120c317a0, aLength=7) at nsXFontAAScaledBitmap.cpp:259
#3  0x20002e675e8 in nsXFontAAScaledBitmap::DrawText8 (this=0x21ffecd63, 
    aDrawable=0x120a894b0, aGC=0x120496b40, aX=8, aY=48, 
    aString=0x120c317a0 "too big", aLength=7) at nsXFontAAScaledBitmap.cpp:137
#4  0x20002e5b130 in nsRenderingContextGTK::DrawString (this=0x120b9fcf0, 
    aString=0x120c317a0 "too big", aLength=7, aX=0, aY=760, aSpacing=0x0)
    at nsRenderingContextGTK.cpp:1486
#5  0x20003a52b74 in nsTextFrame::PaintAsciiText (this=0x120c2dd68, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aStyleContext=0x120c2dd00, aTextStyle=@0x11fffdee0, dx=0, dy=0)
    at nsTextFrame.cpp:3109
#6  0x20003a4b7a4 in nsTextFrame::Paint (this=0x120c2dd68, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffdff8, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsTextFrame.cpp:1460
#7  0x20003912630 in nsContainerFrame::PaintChild (this=0xff, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe188, aFrame=0x120c2dd68, 
    aWhichLayer=eFramePaintLayer_Overlay, aFlags=0) at nsContainerFrame.cpp:251
#8  0x200039123f4 in nsContainerFrame::PaintChildren (this=0x120c2d898, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe188, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsContainerFrame.cpp:195
#9  0x2000395e880 in nsHTMLContainerFrame::Paint (this=0x120c2d898, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe188, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsHTMLContainerFrame.cpp:135
#10 0x20003912630 in nsContainerFrame::PaintChild (this=0xff, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe538, aFrame=0x120c2d898, 
    aWhichLayer=eFramePaintLayer_Overlay, aFlags=0) at nsContainerFrame.cpp:251
#11 0x200038ee700 in nsBlockFrame::PaintChildren (this=0x120c2be58, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe538, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsBlockFrame.cpp:5467
#12 0x200038ee088 in nsBlockFrame::Paint (this=0x120c2be58, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe538, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsBlockFrame.cpp:5338
#13 0x20003912630 in nsContainerFrame::PaintChild (this=0xff, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe8e8, aFrame=0x120c2be58, 
    aWhichLayer=eFramePaintLayer_Overlay, aFlags=0) at nsContainerFrame.cpp:251
#14 0x200038ee700 in nsBlockFrame::PaintChildren (this=0x120c2ba68, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe8e8, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsBlockFrame.cpp:5467
#15 0x200038ee088 in nsBlockFrame::Paint (this=0x120c2ba68, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffe8e8, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsBlockFrame.cpp:5338
#16 0x20003912630 in nsContainerFrame::PaintChild (this=0xff, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffec90, aFrame=0x120c2ba68, 
    aWhichLayer=eFramePaintLayer_Overlay, aFlags=0) at nsContainerFrame.cpp:251
#17 0x200039123f4 in nsContainerFrame::PaintChildren (this=0x1209738e8, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffec90, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsContainerFrame.cpp:195
#18 0x2000395e880 in nsHTMLContainerFrame::Paint (this=0x1209738e8, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffec90, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsHTMLContainerFrame.cpp:135
#19 0x20003962f70 in CanvasFrame::Paint (this=0x1209738e8, 
    aPresContext=0x120b23dd0, aRenderingContext=@0x120b9fcf0, 
    aDirtyRect=@0x11fffec90, aWhichLayer=eFramePaintLayer_Overlay, aFlags=0)
    at nsHTMLFrame.cpp:393
#20 0x20003a07588 in PresShell::Paint (this=0x120b31e30, aView=0x120c2d390, 
    aRenderingContext=@0x120b9fcf0, aDirtyRect=@0x11fffec90)
    at nsPresShell.cpp:5579
#21 0x20003eebc04 in nsView::Paint (this=0x120c2d390, rc=@0x120b9fcf0, 
    rect=@0x11fffec90, aPaintFlags=0, aResult=@0x11fffec84) at nsView.cpp:270
#22 0x20003f01c40 in nsViewManager::RenderDisplayListElement (
    this=0x120b2aa70, element=0x120b12130, aRC=@0x120b9fcf0)
    at nsViewManager.cpp:1215
#23 0x20003f017d4 in nsViewManager::RenderViews (this=0x120b2aa70, 
    aRootView=0x120c2cb40, aRC=@0x120b9fcf0, aRect=@0x11fffeee0, 
    aResult=@0x11fffeea8) at nsViewManager.cpp:1138
#24 0x20003eff72c in nsViewManager::Refresh (this=0x120b2aa70, 
    aView=0x120c2cb40, aContext=0x120b9fcf0, aRegion=0x12086fb30, 
    aUpdateFlags=1) at nsViewManager.cpp:684
#25 0x20003f0412c in nsViewManager::DispatchEvent (this=0x120b2aa70, 
    aEvent=0x11ffff0f8, aStatus=0x11fffefd8) at nsViewManager.cpp:1762
#26 0x20003eeb15c in HandleEvent (aEvent=0x11ffff0f8) at nsView.cpp:80
#27 0x20001098364 in nsWidget::DispatchEvent (this=0x120c2cf90, 
    aEvent=0x11ffff0f8, aStatus=@0x11ffff0c8) at nsWidget.cpp:1408
#28 0x20001097cd8 in nsWidget::DispatchWindowEvent (this=0x120c2cf90, 
    event=0x11ffff0f8) at nsWidget.cpp:1299
#29 0x200010a5238 in nsWindow::DoPaint (this=0x120c2cf90, aX=0, aY=0, 
    aWidth=1024, aHeight=655, aClipRegion=0x120c2cc00) at nsWindow.cpp:794
#30 0x200010a5618 in nsWindow::Update (this=0x120c2cf90) at nsWindow.cpp:840
#31 0x200010a4dac in nsWindow::UpdateIdle (data=0x0) at nsWindow.cpp:704
#32 0x2000081a61c in g_idle_dispatch () from /usr/lib/libglib-1.2.so.0
#33 0x200008190b8 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0
#34 0x20000819840 in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#35 0x20000819a5c in g_main_run () from /usr/lib/libglib-1.2.so.0
#36 0x2000069f4c0 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#37 0x2000106f79c in nsAppShell::Run (this=0x120098a30) at nsAppShell.cpp:349
#38 0x20000fbf624 in nsAppShellService::Run (this=0xff)
    at nsAppShellService.cpp:302
#39 0x1200187b0 in main1 (argc=1, argv=0x11ffff598, nativeApp=0x0)
    at nsAppRunner.cpp:1264
#40 0x120019824 in main (argc=1, argv=0x11ffff598) at nsAppRunner.cpp:1594
#41 0x20000a8f0ac in __libc_start_main (main=0x120019570 <main>, argc=1, 
    ubp_av=0x11ffff598, init=0x1200120e8 <_init>, 
    fini=0x2000002d738 <_dl_debug_mask>, rtld_fini=0x120afd1f0, 
    stack_end=0x11ffff580) at ../sysdeps/generic/libc-start.c:129
Andrew, thanks!

Brian, this is crashing somewhere in AA code land....
Assignee: shanjian → bstell
Before I generated that stack trace, I had deleted the XFree86-100dpi fonts from
my system.  Mozilla crashed on almost everything except www.mozilla.org/start
(even though the 100dpi font directories were deleted from my xfs config file).
I didn't notice that because I was just loading the big font testcase
over-and-over.

After posting the stack trace here, I noticed my bigger problems and
re-installed the 100dpi fonts.  Mozilla works again, and I generated a new stack
trace for the big font problem... it is identical, except that gdb can't make it
past #33 now for some reason.

Anyway, I thought that bit of info might come in handy.
comment #12 is not neccessarily true... I don't think I did a stack trace on
normal font page without 100dpi fonts, and the big font crash could have kicked
in before the 100dpi crash.  argh.
Ssince I do not see any thing obiviously wrong I expect I will need to
debug this.

Is there a Dec system I can build and debug on?
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9
Jim dunn might help find a machine (or better, a person) to help you 
identify the problem. 
I asked my compaq/dec buddies and they don't have an
alpha-linux box... sorry
Summary: mozilla crashes when font size>=7 → HELP NEEDED: mozilla crashes when font size>=7
Whiteboard: HELP NEEDED
here is my question after I look at the surronding code

1. what is sizeof(int) on x86? what is the sizeof(int) on the platform it crash.
> 1001 int expand = (((dst_width<<8)/aSrc->GetWidth())+255)>>8;

2. why expand is declare as "int" instead of one of the "PRUint32" or "PRInt32"
(or 16, )
What it is the assumption of the length in here?

3. why buffer and exp_buffer are declared as 64K elements array of PRUint8? Why
you pick that size?

4. can aSrc->GetWidth be 0?
>1001 int expand = (((dst_width<<8)/aSrc->GetWidth())+255)>>8;
how can we guranteed expand is not infinete ? should we put
if(aSrc->GetWidth() > 0) around line 1001 ?

5. about line 1001. What is our assumption of the max value of 
dst_width ?  can it be 0xFFFFFFFF (PRUint32). If so what will happen
when we have (dst_width<<8) ? will it truncated to
0xFFFFFF00 ?
since "int expand" is declared as a signed interger, will the result
be a negative number?

6. no null checking of expanded_data after malloc.
1020 if (exp_len > sizeof(exp_buffer))
1021 expanded_data = (PRUint8*)malloc(expanded_width*expanded_height);


7. how about add some assertion code here?
1027 for (int j=0; j<expand; j++) {
+     NS_ASSERTION( (exp_index + j) < exp_len, "wrong index");

+     NS_ASSERTION( padded_index < padded_src_len, "wrong index");
1028 expanded_data[exp_index+j] = padded_src[padded_index];
1029 }
 what should be the padded_src_len here ??? 



why you use "int" instead of "PRUint32" for 
1023 for (int i=0; i<expand; i++) {
1027 for (int j=0; j<expand; j++) {

8. 

1124 es[expand-1-j+((expand-1-i)*expanded_width)] =
1125 255-(((((i+j)<<8)/jag_len)*FILL_VALUE)>>8);
how can we be sure
expand-1-j+((expand-1-i)*expanded_width)
is in the right boundary ? ( > 0 and smaller than the size?)

can we add some assertion code here?


I think we should do at least the 1 and 2 in the following
1. check null after call malloc
2. use "PRUint32" or other to replace "int" 
3. add NS_ASSERTION in some places. 

If 1 and 2 didn't fix the crash, add 3 probabaly will help us ping point the issue. 
sizeof(int) on DEC Alpha and i686 are both 4.

Here are the values of some relevant variables when it crashes.  they all look
OK to me...

expand=2
i=0
j=0
jag_len=1
x=4
y=11
expanded_width=26
expanded_height=82
border_width=2
padded_width=13

es[0]=0
es[1]=255
es-expanded_data=688

on line 1164-1165, the index for es evaluates to 1, and the RHS evaluates to 0.

es was not malloc'd because 26*82 < 65536.  it should be using exp_buffer's
space, which is much larger than 688 (so 1164-1165 is not going out-of-bounds).
shouldn't it be:

1020   if (exp_len*sizeof(PRUint8) < sizeof(exp_buffer))

?

that seems rather serious.  
for my testcase, exp_buffer is still large enough in the present case to hold
expanded_data, so I don't think this is causing the current problem.
> 1. what is sizeof(int) on x86? what is the sizeof(int) on the platform it 
crash.
> > 1001 int expand = (((dst_width<<8)/aSrc->GetWidth())+255)>>8;
> 
> 2. why expand is declare as "int" instead of one of the "PRUint32" or 
"PRInt32"
> (or 16, )
> What it is the assumption of the length in here?

The value of "expand" will generally be in the range of 1 to 5 but could be in 
the 
range of 1 to 1000.

> 3. why buffer and exp_buffer are declared as 64K elements array of PRUint8? 
Why
> you pick that size?

The intent is to avoid malloc'ing temporary buffers. Any size would be
okay. I choose a large size.

> 4. can aSrc->GetWidth be 0?
> >1001 int expand = (((dst_width<<8)/aSrc->GetWidth())+255)>>8;
> how can we guranteed expand is not infinete ? should we put
> if(aSrc->GetWidth() > 0) around line 1001 ?

The code should never be here with a source image who's width
is 0. If you want you can add an assertion (or check in the caller)
but this is clearly the wrong place for a run time check.

> 5. about line 1001. What is our assumption of the max value of 
> dst_width ?  can it be 0xFFFFFFFF (PRUint32). If so what will happen
> when we have (dst_width<<8) ? will it truncated to
> 0xFFFFFF00 ?
> since "int expand" is declared as a signed interger, will the result
> be a negative number?

Do we expect an glyph to be expanded to 4 billion pixel wide?

> 6. no null checking of expanded_data after malloc.
> 1020 if (exp_len > sizeof(exp_buffer))
> 1021 expanded_data = (PRUint8*)malloc(expanded_width*expanded_height);

A check here would be a good thing to add.

> 7. how about add some assertion code here?
> 1027 for (int j=0; j<expand; j++) {
> +     NS_ASSERTION( (exp_index + j) < exp_len, "wrong index");
> 
> +     NS_ASSERTION( padded_index < padded_src_len, "wrong index");

These won't hurt but I doubt they would help.
Did you see anything wrong here or is this just a guess?

> 1028 expanded_data[exp_index+j] = padded_src[padded_index];
> 1029 }
>  what should be the padded_src_len here ??? 

It should be padded_width * padded_height.

> why you use "int" instead of "PRUint32" for 
> 1023 for (int i=0; i<expand; i++) {
> 1027 for (int j=0; j<expand; j++) {

Again, expand will be a small number.

> 1124 es[expand-1-j+((expand-1-i)*expanded_width)] =
> 1125 255-(((((i+j)<<8)/jag_len)*FILL_VALUE)>>8);
> how can we be sure
> expand-1-j+((expand-1-i)*expanded_width)
> is in the right boundary ? ( > 0 and smaller than the size?)
> 
> can we add some assertion code here?

An assertion would be okay.

Did you look at the math or is this just a guess?

> I think we should do at least the 1 and 2 in the following
> 1. check null after call malloc
> 2. use "PRUint32" or other to replace "int" 
> 3. add NS_ASSERTION in some places. 
> 

The check after malloc would be good.

Using PRUint32 will not make any difference.

The is nothing wrong with adding some assertion.

However, this feels like we are trying to guess an answer. I would
strongly prefer we actually understand the problem before we try to
fix it. 

> If 1 and 2 didn't fix the crash, add 3 probabaly will help us ping point the 
issue. 

Putting in fixes based on guesses is a form of superstition not 
engineering. If there are *no* other options then it may be an okay
thing to try. But it should be considered a "last resort" not a "common
debugging methodology".
> shouldn't it be:
> 
> 1020   if (exp_len*sizeof(PRUint8) < sizeof(exp_buffer))
> 
> ?
> 
> that seems rather serious.  

Isn't sizeof(PRUint8) is 1 (one) which make it identical to the existing
statement?

1020   if (exp_len > sizeof(exp_buffer))


> Isn't sizeof(PRUint8) is 1 (one)

yup.  momentary lapse of intelligence.
np. thanks for testing

By chance, are you running a debugger?
yes, I'm using gdb.
I revisitted the 100dpi issue (see comment #12 and #13).  With 100dpi fonts not
installed it crashes on <h1> as well as font size=6 and 7.  I did a debug run on
the <h1> tag, and it crashes at 1125, with different values for some variables,
although they are still all reasonable.

With 100dpi fonts installed, everything but the font size=7 works ok, and it
crashes at line 1165 (not line 1125 as stated in the backtrace, that was without
100dpi fonts).

So, forcing Mozilla to use 75dpi instead of 100dpi fonts seems to intensify the
problem.
I added the following to nsXFontAAScaledBitmap.cpp to test the crash

1155   for (i=0; i<jag_len; i++)
1156     for (j=0; j<jag_len-i; j++)
1157     {
1158       k = (i+j);
1159       k = ((i+j)<<8);
1160       l = jag_len;
1161       k = k/l;
1162       k = k*FILL_VALUE;
1163       k = k>>8;
1164       es[j+(i*expanded_width)] =
1165                 (((((i+j)<<8)/jag_len)*FILL_VALUE)>>8);
1166     }

gdb says that lines 1159 and 1160 set k=0 and l=1 (k and l are declared as
PRUint32).  when attempting to execute line 1161, gdb says a segmentation fault
occurs and jumps to line 1173 (old line 1165)

1172       es[expand-1-j+(i*expanded_width)] =
1173                     (((((i+j)<<8)/jag_len)*FILL_VALUE)>>8);
followup to comment #26:  Mozilla does not anti-alias 100dpi fonts below size 7,
but probably does size 6 and <h1> with 75dpi fonts.  Mozilla always seg faults
when attempting to antialias fonts.

changing the bug description.
Summary: HELP NEEDED: mozilla crashes when font size>=7 → HELP NEEDED: mozilla crashes when antialiasing fonts
changed name from:

  (HELP NEEDED: mozilla crashes when font size>=7)
  HELP NEEDED: mozilla crashes when antialiasing fonts

to:

  DEC EXPERT NEEDED: mozilla crashes when dividing 0 by 1
Summary: HELP NEEDED: mozilla crashes when antialiasing fonts → DEC EXPERT NEEDED: mozilla crashes when dividing 0 by 1
Whiteboard: HELP NEEDED → DEC EXPERT NEEDED
This still happens for me with today's gdb source, so I added a description of
my problems to bug58.  Here's a shorter URL for the bug:

http://sources.redhat.com/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gdb&pr=58
How about adding a printf just before the line that crashes to display
the numbers? This may be a bit clumsy but it does not require gdb.
Doing printf's, Mozilla makes it through the first set (old line 1158), but
chokes on the second (old line 1164) trying to access es[expand-1-j].

Here is the output:

(this is at old line 1157)
i=0 j=0 expanded_width=26
j+(i*expanded_width)=0
es-exp_buffer=688
es[j+(i*expanded_width)]=255
success!
es[j+(i*expanded_width)]=0


(this is at old line 1164)
i=0 j=0 expanded_width=26 expand=2
expand-1-j+(i*expanded_width)=1
es-exp_buffer=688
*(es+1)=255
*(es+expand-1)=255
*(es+expand-1-j)=255
*(es+expand-1-j+(0*expanded_width))=255
*(es+expand-1-j+(i*expanded_width))=255
es[1]=255
es[expand-1]=255
(segmentation fault trying to print es[expand-1-j])

note that *(es+expand-1-j) works fine, but es[expand-1-j] crashes.

I persisted...
es[1-j]=255
es[expand+j]=0
es[expand-j]=0
es[expand+j-1]=255
(segmentation fault trying to print es[expand-1+j], even though es[expand+j-1]
worked fine!)
Andrew,

This is just a wild guess: could you try setting the size of the stack
buffers to something smaller, perhaps 8K?

http://lxr.mozilla.org/seamonkey/source/gfx/src/x11shared/nsXFontAAScaledBitmap.
cpp#987

987   PRUint8 buffer[65536];
988   PRUint8 *padded_src = aSrc->GetBuffer();
989   PRUint8 exp_buffer[65536];
I tried 8K and then 1K.  1K is small enough that it malloc's new space for
expanded_data.

Both 8K and 1K crashed identically to 64K buffers.
ok, here we go.  if i and j and expanded_width are declared as int's instead of
PRUint8, 8 and 32, then the Mozilla *does not crash*.  expand is an int, so, in
the computation:

expand-j-1+(i*expanded_width)

there are 3 different types and I guess the compiler can't come up with a decent
way to handle it.

leaving i, j and expanded_width as they are (PRUint8/32) and changing expand to
PRUint32 also prevents the crash, but causes Mozilla to hangly shortly
thereafter.  It seems to finish the x and y loops, but doesn't make it out of
the routine.  I'll post a followup when I track it down.

Note that PRUint8 is actually a char.  PRUint32 should be a real int, although
changing expanded_width to int from PRUint32 was necessary to fix the problem!
Attached patch patchSplinter Review
i and j just needed to be PRUint32 like they were before.
I hope no one is using PRUint8 for a local variable -- use PRUintn or just
unsigned int if you want a "at least 16 bits" unsigned integral type that's
likely to be optimal (register-sized or just as fast is if it were).  I ask cuz
I couldn't see a PRUint8 local in the current AA source.

I've seen this kind of bug before, long ago on a 64-bit system (pre-Mozilla, on
bug link available).  Is it possible that the sum or any intermediates in
expand-j-1+(i*expanded_width) is overflowing 32 bits and being represented using
64 bits, instead wrapping around 2^32 - 1?  What are the instructions generated
for that index expression?

/be
IOW, there may be no compiler bug here -- I bet ptrdiff_t is a signed 64-bit
integral type on the target system, and the compiler may be rightly promoting
intermediates or the final result of the index expression to that type.

/be
Blocks: 115520
Sorry for the spam, I hit enter when focus was in a text input, before I was
done.  drivers@mozilla.org would like to fix this for 0.9.8 -- patch coming up soon?

/be
Whiteboard: DEC EXPERT NEEDED → DEC EXPERT NEEDED driver:brendan
ptrdiff_t  is a 64 bit long when using c++ or when __STDC__ is defined. 
otherwise it is a 32 bit int.
Andrew: does attachment 66054 [details] [diff] [review] actually fix the crash?
yes.  no crash and font properly anti-aliased.
changing summary back now that we know what's going on.
Summary: DEC EXPERT NEEDED: mozilla crashes when dividing 0 by 1 → anti-aliasing causes crash on 64bit machines
Let's get this in for 0.9.8 -- bstell, please r=.

/be
I will build on Linux/Pc and test.
Comment on attachment 66054 [details] [diff] [review]
patch

The patch seems harmless. It would be nice if someone could explain why this
has an effect.
Attachment #66054 - Flags: review+
Comment on attachment 66054 [details] [diff] [review]
patch

The patch seems harmless. It would be nice if someone could explain why this
has an effect. r=bstell@ix.netcom.com
Comment on attachment 66054 [details] [diff] [review]
patch

sr=brendan@mozilla.org.

I can a=brendan this too for checkin into the 0.9.8 tree (hurry, if you get it
into the trunk before 8am, you'll not have to update the branch being cut later
today).

It appears that the fact that expand has type int in the current checked-in
code causes the compiler to generate instructions that "overflow" rather than
"32-bit-wrap" negative intermediate results.  Whereas if all terms in the index
expression have type PRUint32, as they do with the patch (please correct me if
I'm wrong), then the index is computed using 32-bit modular (unsigned)
arithmetic, and all is well.  As I wrote in a previous comment, I dimly recall
just such a bug years ago, in Netscape 4.x JS sources, IIRC.  I bet the
compiler is operating within the standard.  Ask dbaron or waldemar if you want
an authoritative ruling, but for now, let's go with what works.

/be
Attachment #66054 - Flags: superreview+
A little disassembly before and after the patch, of the code generated for the
index expression, would go a long way.  Is gdb behaving enough to do that?

dbaron, waldemar: please see comment #50 in particular.

/be
Summary: anti-aliasing causes crash on 64bit machines → mixing PRUint32 and int terms in an array index expression crashes 64-bit gcc code
Andrew, I'm confused by the fix and the statement in "Comment #27": 
> ...
> 1159       k = ((i+j)<<8);
> 1160       l = jag_len;
> 1161       k = k/l;
> ...
> gdb says that lines 1159 and 1160 set k=0 and l=1 (k and l are declared as
> PRUint32).  when attempting to execute line 1161, gdb says a segmentation
> fault occurs and jumps to line 1173 (old line 1165)

How is the patch different than this?

I'm willing to let this go but I do find it odd that integer math would
crash a Dec system. I'd be happier if I actually understood the problem
because then I would feel confident we had actually solved it and not just
guessed.


I think comment #27 is actually a gdb bug.  I was able to reproduce that
behavior (dividiing 0 by 1 make gdb "continue") in other (unrelated) parts of
the code like nsViewManager.cpp
the is an asm dump from gdb (via ddd).	I don't know where the line (1164-1165)
ends, or where it crashes...
the crash occurs at +4164>:	ldq_u	t1,0(t3)
% cat testuint.c
#include <stdio.h>

int main ()
{
   unsigned char *es;
   unsigned char exp_buffer[65536];
   unsigned int j;
   int expand;

   es=exp_buffer;
   j=0;
   expand=2;
   es[1]=0;
   printf ("es[expand-1-j]=%d\n",es[expand-1-j]);
   return 0;
}

% c++ -o testuint testuint.cpp
% testuint
Segmentation fault

works fine if "expand" is an unsigned int.
could you try this:

   int expand;
+  int tmp;

   es=exp_buffer;
   j=0;
   expand=2;
   es[1]=0;
+  tmp = expand-1-j;
+  printf ("es[tmp]=%d\n",es[tmp]);
   printf ("es[expand-1-j]=%d\n",es[expand-1-j]);
ok, I added that, and one more line

------------------------
es[tmp]=0
es[expand-j-1]=0
Segmentation fault
------------------------

Perhaps the negative intermediate might be "-1-j".  But why isn't "-j-1" also a
negative intermediate.
could you try this:

+  int I;
+  unsigned int uI;
+  long sL;
+  unsigned long uL;

   unsigned int j;
   int expand;

   j=0;
   expand=2;

+  I = expand-1-j;
+  printf("I = %d\n", I);
+
+  uI = expand-1-j;
+  printf("uI = %d\n", uI);
+
+  sL = expand-1-j;
+  printf("sL = %ld\n", sL);
+  uL = expand-1-j;
+  printf("uL = %ld\n", uL);
+
I = 1
es[I]=0
uI = 1
es[uI]=0
sL = 1
es[sL]=0
uL = 1
es[uL]=0
-------------
it seems that the arithmetic must be done within the [] of es.
okay, how about this:

    long sL;
    printf("&es[1] = %P\n", &es[1]);
    printf("expand-1-j] = %d\n", expand-1-j);
    printf("&es[expand-1-j] = %P\n", &es[expand-1-j]);
    sL = &es[expand-1-j] - &es[1];
    printf("sL = %ld\n", sL);
    printf("&es[expand-1-j] - &es[1] = %d\n", &es[expand-1-j] - &es[1]);

that seems to nail it down:

-------------------------------------------
&es[1] = 0x11ffff489
expand-1-j = 1
&es[expand-1-j] = 0x21ffff489
sL = &es[expand-1-j] - &es[1]
   = 4294967296
&es[expand-1-j] - &es[1] = 0
-------------------------------------------
> &es[1] = 0x11ffff489
> expand-1-j = 1
> &es[expand-1-j] = 0x21ffff489

the array math is off by 0x1_0000_0000; 
ie: the 33rd bit

I'll wager that -1 is represented by gcc as 0xFFFF_FFFF and got promoted to
0x0000_0000_FFFF_FFFF instead of 0xFFFF_FFFF_FFFF_FFFF when it was promoted
to 64 bits for adding to expand.

> sL = &es[expand-1-j] - &es[1]
>    = 4294967296

64 bit math keeps the 33rd bit

> &es[expand-1-j] - &es[1] = 0

32 bit math drops the 33rd bit

Marking this bug fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
(just for the record) checked in at 6:46 am
In response to comment 50, the compiler has a major bug here.  This behavior is
higly illegal according to both the C and C++ standards.  The standards define
the expression es[expand-1-j] to be equivalent to *(es + (expand-1-j)).  By the
rules of C and C++ arithmetic, expand-1-j is calculated as (expand-1)-j.  The
type of expand-1 is int, and its value is 1.  The type of (expand-1)-j is
unsigned int (see the first section of chapter 5 of the C++ spec), and its value
is 1.  Thus, es[expand-1-j] = *(es + (unsigned int)1), as you'd expect.
No negative numbers were generated anywhere.  No correct implementation of C or
C++ can generate an extra offset of 0x100000000 here.
Andrew:
Could you please check it on latast build see you still can see the crash cause
I don't have right machine here?  Thanks a lot!
Thanks to bstell for the checkin, and to waldemar for the C++ standard citation,
and especially to ajschult for the work tracking this down.  Has anyone filed a
bug with the gcc keepers?

/be
I filed a bug with RedHat, as I am using their stock binaries.
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=58746
Andrew: Thanks for putting up with and trying out all my variations. I really 
prefer to understand a problem before I let it go.

It is so cool to see bugzilla used at Redhat!
I rebuilt last night from CVS, and it works great with the patch.
FYI, if I use gcc-2.96-98, which is the compiler for Red Hat 7.2, I don't see
this bug, at least not using the test case provided in the Red Hat bug.  I also
don't see it with gcc-2.96-85.
this is Alpha-only, and RedHat 7.2 has not yet been released for Alpha.  You are
testing on Alpha, right?
Oh.  No.  *blush*
I've opened bug 121642 to add a test for the malloc return value
as recommended by ftang in comment #17
Mark as verified cause seperated bugs filed.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: