Closed Bug 94375 Opened 23 years ago Closed 23 years ago

-O3 build will open the window but segv [WAS: gcc3 -O3 Garbage characters in time on status bar.]

Categories

(Core :: Graphics: ImageLib, defect)

x86
Linux
defect
Not set
trivial

Tracking

RESOLVED FIXED

People

(Reporter: loz, Assigned: pavlov)

Details

I've only seen this happen on the following system:
Complete gcc3.0 Linux from scratch build.
Mozilla built with -march=athlon -mcpu=athlon -O3.
With these settings I often get one of the characters ":;<=> or ?" in the second
decimal place, or the units column, of the time taken to render.
This happens especially often when drawing one of the debug test pages that
renders very quickly.
If I apply the following patch to
dist/bin/chrome/comm/content/navigator/nsBrowserStatusHandler.js :
196,198c196,197
<           //msg = gNavigatorBundle.getString("nv_done");
<           //msg = msg.replace(/%elapsed%/, elapsed);
< 	
			msg = elapsed;
---
>           msg = gNavigatorBundle.getString("nv_done");
>           msg = msg.replace(/%elapsed%/, elapsed);

Then the problem goes away (at the expense of the status bar message being
rather terse). This would imply that the problem is somewhere within the
JavaScript replace function. Compiling the js directory with -O2 instead of -O3
makes the problem go away. I cannot tell whether this is because the problem is
fixed or whether the browser can no longer render fast enough to experience the
problem. Just compiling jsstr.c with -O2 does not fix the problem.
i'm confused by the patch, because the code isn't commented out (see url)
Assignee: asa → blake
Component: Browser-General → XP Apps: GUI Features
QA Contact: doronr → sairuh
I'm guessing he just has the patch the wrong way around. So he's actually
commenting out those two lines. I suspect that gcc 3.0 (which we don't support
yet, mind you) 's optimization does something messy to the js code.

Brendan? Shaver? Interested in taking a closer look, or should this just be
futured for now?
Yep, sorry, I got the patch backwards (Doh! and a three-liner at that).
I've found a couple more things. 
1. The problem goes away if I output every string that is getting converted in
jsstr.c - js_ValueToString to stdout. Whether this is because the compiler is
now inlining differently, or because the extra time this takes has stopped a
race condition from triggering, I don't know yet.
2. The corruption has happened by the time do_replace (again in jsstr.c) is called.
Assignee: blake → rogerl
Component: XP Apps: GUI Features → Javascript Engine
QA Contact: sairuh → pschwartau
Summary: Garbage characters in time on status bar. → gcc3 -O3 Garbage characters in time on status bar.
ok so let's try jseng and the gcc3 mozilla people
Does compiling some other js*.c file -O2 make the problem go away?  If so, which
file?  Loz, how did you divine that the corruption occurred before do_replace is
called?  Maybe it can be fenced in -- is there a way to bound the earliest point
it could have happened in the str_replace control flow?

/be

P.S.  Thanks timeless, well-assigned this time!
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reassigning to Kenton.
Assignee: rogerl → khanson
I saw the corruption in do_replace (and also str_replace) by putting code like
this in (I'm afraid I'm not a C programmer, and I don't know the mozilla code
base, so there is almost certainly a function for this somewhere)...

    uint lozi;
    for (lozi = 0; (lozi < 10) && (rdata->repstr->chars[lozi] != 0); lozi++) {
      printf("%c", rdata->repstr->chars[lozi]);
    }
    printf("\n");

I'm wondering why the garbage characters come from a small set, and these are
(in ASCII charset) immediately after the digits.
Could someone suggest an order in which to try compiling the files in js with
-O2, so that I can narrow things down a bit?
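The observation above about the garbage characters can be made concrete. In ASCII the six characters ":;<=>?" are codes 58 to 63, directly after '9' (57), so they are exactly what a digit-to-character conversion yields when handed a "digit" value of 10 to 15. The following is only a minimal sketch of that arithmetic; the helper names are hypothetical and nothing in this thread confirms that the affected code looks like this.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns a digit value into a character with no range
// check, the way hand-written number formatting often does.
char unchecked_digit(int value) {
    return static_cast<char>('0' + value);
}

// Same conversion, but asserting that the value really is a decimal digit.
char checked_digit(int value) {
    assert(value >= 0 && value <= 9);
    return static_cast<char>('0' + value);
}

// Builds "U.TH" from separate digit values (units, tenths, hundredths).
// If a miscompiled computation passes hundredths == 10, the result ends
// in ':' instead of a digit.
std::string format_elapsed(int units, int tenths, int hundredths) {
    std::string out;
    out += unchecked_digit(units);
    out += '.';
    out += unchecked_digit(tenths);
    out += unchecked_digit(hundredths);
    return out;
}
```

Under this assumption, a value that is off by a carry (10 where 0 plus a carry was expected) would show up as ':' in the last column, which matches the pattern described in the report.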
Ah, I'm learning a little about this C lark. By trial and error it seems that
compiling jsinterp.c and jsstr.c as -O2 removes the problem, though either on
its own is not sufficient.
Could this be related to http://bugzilla.mozilla.org/show_bug.cgi?id=83388 ?
Can you try compiling those files with -O3 -fno-strict-aliasing? Does that cause
the problem to go away?
Compiling jsstr.c and jsinterp.c with -O3 -fno-strict-aliasing does not make the
problem go away. I'll try and get a download of gcc-3.0.1 sometime soon and see
if that makes a difference.
gcc-3.0.1 doesn't fix this problem, though it did fix a different problem I was
having. I could not previously compile
layout/html/forms/src/nsFormControlHelper.cpp at -O3 (if interested it was
logged at http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=3485&database=gcc )
gcc-3.0.1 does not fix this problem. Now, however, recompiling js/src/jsdtoa.c
with -O1 (-mcpu=athlon -march=athlon) does appear to fix it. Out of curiosity I
tried compiling this file at -O3 with gcc-3.0, and the rest of the app using
gcc-3.0.1, but this didn't help either.
I use a CVS build of gcc-3.0.2, which solves this problem (current Debian unstable package)
WFM with cvs from 20011020
Marking WFM per the last comments.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
Loz, does this solution work for you, too? If so, you can mark this
bug "Verified"; otherwise, you can reopen it, thanks -
This still happens for me with gcc-3.0.2, mozilla-0.9.5.
Maybe my setup is wrong - I installed gcc to a new directory, and then compiled
Mozilla with my new gcc bin directory at the front of my PATH (typing gcc
--version gave 3.0.2). Is this sufficient for Mozilla's build process?
I've also not recompiled any additional libraries (e.g. glibc) - would I need to
do this to resolve the problem?

cheers

Loz
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Remarking worksforme.  The fix was checked in after 0.9.5 branched, so you'll
need to pull from current CVS to get the fix.
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
I'll try it when I get home tonight - do I need a complete CVS update, or are
there a couple of files I can update by hand?

Loz
Many changes have been made, and there was no patch for this bug
specifically that made it WORKSFORME - I would pull everything.
I'll try this if someone can give me a CVS command that will pull something
recent and compilable, I'm getting errors in directory/c-sdk/ldap and if I don't
build that part I get a SEGV on startup. I used cvs -z3 update -A to refresh the
source.
Loz, can you paste what errors you're getting in directory/c-sdk/ldap?
That will help; thanks -
The error with c-sdk/ldap was an undefined symbol LDAP_CONTROL_PROXYAUTH in
directory/c-sdk/ldap/libraries/libldap/proxyauthctrl.c
There was also a warning when configuring that no configure information existed
in directory/c-sdk/ldap.

I tried this last night
cvs update -A client.mk
make -f client.mk checkout

I think this is probably the correct way to get the latest stuff that people
believe is compilable. And it did compile. Unfortunately it segv'd immediately
after putting the window up. I left a debug build compiling when I left for work.
I'll try a -O2 build at some point too.
I've had a chance to try three builds. 
The -O2 build apparently runs fine.
A -O3 --enable-debug build also appears to run ok.
However, the -O3 build will open the window but segv as soon as it attempts to
render a page. (Setting the start page to blank means I can actually see the
window and interact with it before it crashes)
If there is a way I can interpret the core file to get something useful for
people then fire away. Any other ideas as to what I can try are gratefully received.
I don't know how useful this is, but this is the backtrace I get from my
core file for -O3. If there is a way to use the symbols from my debug build to
put some more meat on this then I'm all ears.

#0  0x40d24db7 in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#0  0x40d24db7 in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#1  0x412e012c in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#2  0x412df258 in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#3  0x412deefb in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#4  0x412ddf2c in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#5  0x412dfa88 in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#6  0x412e0230 in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#7  0x4015163d in
_ZN6nsPipe17nsPipeInputStream12ReadSegmentsEPFjP14nsIInputStreamPvPKcjjPjES3_jS6_
() from /home/loz/work/mozilla/dist/bin/libxpcom.so
#8  0x412dfaeb in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#9  0x40d2892b in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#10 0x40d274ed in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#11 0x408eb32f in NSGetModule ()
   from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#12 0x408af9a6 in NSGetModule ()
   from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#13 0x408a04ea in NSGetModule ()
   from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#14 0x4016a292 in PL_HandleEvent ()
   from /home/loz/work/mozilla/dist/bin/libxpcom.so
#15 0x40169637 in PL_ProcessPendingEvents ()
   from /home/loz/work/mozilla/dist/bin/libxpcom.so
#16 0x4016b874 in _ZN16nsEventQueueImpl20ProcessPendingEventsEv ()
   from /home/loz/work/mozilla/dist/bin/libxpcom.so
#17 0x40957266 in NSGetModule ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#18 0x409571ef in NSGetModule ()
Maybe this is a bit more useful (I updated CVS again, and this time remembered
not to strip the libraries when building (doh!)). This is the backtrace from my
-O3 build.

#0  0x40e03db7 in _ZN12imgContainer11AppendFrameEP14gfxIImageFrame ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#1  0x4129a12c in _Z14HaveDecodedRowPvPhiiiihi ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#2  0x41299258 in _Z10output_rowP10gif_struct ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#3  0x41298efb in _Z6do_lzwP10gif_structPKh ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#4  0x41297f2c in _Z9gif_writeP10gif_structPKhj ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#5  0x41299a88 in _ZN13nsGIFDecoder211ProcessDataEPhj ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#6  0x4129a230 in _Z11ReadDataOutP14nsIInputStreamPvPKcjjPj ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#7  0x40156edd in
_ZN6nsPipe17nsPipeInputStream12ReadSegmentsEPFjP14nsIInputStreamPvPKcjjPjES3_jS6_
() from /home/loz/work/mozilla/dist/bin/libxpcom.so
#8  0x41299aeb in _ZN13nsGIFDecoder29WriteFromEP14nsIInputStreamjPj ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#9  0x40e0792b in
_ZN10imgRequest15OnDataAvailableEP10nsIRequestP11nsISupportsP14nsIInputStreamjj ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#10 0x40e064ed in
_ZN13ProxyListener15OnDataAvailableEP10nsIRequestP11nsISupportsP14nsIInputStreamjj
()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#11 0x409cb33f in
_ZN12nsJARChannel15OnDataAvailableEP10nsIRequestP11nsISupportsP14nsIInputStreamjj ()
   from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#12 0x4098f9a6 in _ZN22nsOnDataAvailableEvent11HandleEventEv ()
   from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#13 0x409804ea in _ZN23nsARequestObserverEvent13HandlePLEventEP7PLEvent ()
   from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#14 0x4016ef90 in PL_ProcessEventsBeforeID ()
   from /home/loz/work/mozilla/dist/bin/libxpcom.so
#15 0x407f815d in _Z12processQueuePvS_ ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#16 0x40138dad in _ZN11nsVoidArray17EnumerateForwardsEPFiPvS0_ES0_ ()
   from /home/loz/work/mozilla/dist/bin/libxpcom.so
#17 0x407f819e in _ZN10nsAppShell15ProcessBeforeIDEm ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#18 0x407ff1c6 in _Z16handle_gdk_eventP9_GdkEventPv ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#19 0x4037ec0e in gdk_event_dispatch () from /opt/gnome/lib/libgdk-1.2.so.0
#20 0x403afcf0 in g_main_dispatch () from /opt/gnome/lib/libglib-1.2.so.0
#21 0x403afff8 in g_main_iterate () from /opt/gnome/lib/libglib-1.2.so.0
#22 0x403b04bc in g_main_run () from /opt/gnome/lib/libglib-1.2.so.0
#23 0x402c838f in gtk_main () from /opt/gnome/lib/libgtk-1.2.so.0
#24 0x407f7e43 in _ZN10nsAppShell3RunEv ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#25 0x407dac12 in _ZN17nsAppShellService3RunEv ()
   from
/home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnsappshell.so
#26 0x804c814 in _Z5main1iPPcP11nsISupports ()
#27 0x804bb1c in main ()
#28 0x404f1861 in __libc_start_main () at soinit.c:56
If I revert imgContainer.cpp and imgContainer.h in modules/libpr0n/src to
MOZILLA_0_9_5_RELEASE then the application runs ok, and the garbage problem in
the status bar goes away.
Should I mark this bug as verified and raise a new one for the new problem with -O3?
Perhaps we should keep this bug open and just change the summary,
since it contains the stack traces already. Would that be OK? 
I've got no problem with that - anyone else?
Reopening and resummarizing for the crash issue; if my summary is not
precise enough, please adjust; thanks. Will also need advice as to 
whether JavaScript Engine is the correct component for this bug. 
Based on the stack traces, does it look like ImageLib might be the 
correct component? 
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: gcc3 -O3 Garbage characters in time on status bar. → -O3 build will open the window but segv [WAS: gcc3 -O3 Garbage characters in time on status bar.]
I'd guess it should be ImageLib as the problem goes away when I revert two files
in libpr0n (imgContainer.cpp and imgContainer.h) to the 0.9.5 release. But, I'm
not a C++ programmer...
> I'd guess it should be ImageLib as the problem goes away when I revert two
> files in libpr0n (imgContainer.cpp and imgContainer.h) to the 0.9.5 release.

Can you try the following hack?

--- imgContainer.h-save Tue Nov  6 10:19:24 2001
+++ imgContainer.h      Tue Nov  6 10:30:44 2001
@@ -42,9 +42,9 @@
 
 #include "nsWeakReference.h"
 
-#ifdef __GNUC__
-#define CANT_INLINE_GETTER
-#endif
+//#ifdef __GNUC__
+//#define CANT_INLINE_GETTER
+//#endif
 
 #define NS_IMGCONTAINER_CID \
 { /* 5e04ec5e-1dd2-11b2-8fda-c4db5fb666e0 */         \
@@ -81,8 +81,9 @@
   nsresult inlinedGetFrameAt(PRUint32 index, gfxIImageFrame **_retval);
 #else
   inline nsresult inlinedGetFrameAt(PRUint32 index, gfxIImageFrame **_retval) {
-    *_retval = NS_STATIC_CAST(gfxIImageFrame*, mFrames.ElementAt(index));
-    if (!*_retval) return NS_ERROR_FAILURE;
+    nsISupports *_elem = mFrames.ElementAt(index);
+    if (!_elem) return NS_ERROR_FAILURE;
+    *_retval = NS_STATIC_CAST(gfxIImageFrame*, _elem);
     return NS_OK;
   }
 #endif


This is just a guess, but I cannot see anything other than that which would
prevent gcc from inlining the function.
Reassigning to ImageLib based on Loz's findings at 2001-11-06 03:40 above. 
Assignee: khanson → pavlov
Status: REOPENED → NEW
Component: Javascript Engine → ImageLib
QA Contact: pschwartau → tpreston
dup!
Ulrich's patch fixes this for me. I've marked it as WORKSFORME on the off-chance
that that's the right thing to do.

cheers all.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
I just recalled the cvs log message for the last patch to imgContainer.h:

revision 1.12
date: 2001/10/28 21:02:04;  author: dbaron%fas.harvard.edu;  state: Exp;  lines:
+8 -0
Fix -O2 optimization crash with gcc 2.96 or 3.0.{1,2} by not inlining |#ifdef
__GNUC__|.  b=106891  r=pavlov  sr=brendan

Does anyone want to check that Ulrich's patch doesn't cause this to regress
(apologies if I'm being dumb) - I'll put a message on the list for 106891.
> Does anyone want to check that Ulrich's patch doesn't cause this to regress
> (apologies if I'm being dumb) - I'll put a message on the list for 106891.

If the patch does work, the original change (to prevent inlining) only hid the
problem.  Since the non-inline code was the same, the generated code should,
given the right situation, have been the same.

Maybe somebody can send me the disassembled code which had the problem so that I
can check what really is the problem.
dbaron: can you look at this and send drepper the disassembled code?
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
We checked in a workaround for this a while ago.  I don't remember the bug
number, but I'm marking this fixed.  If someone finds the other bug, please dup it.
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED