Closed Bug 94375 Opened 24 years ago Closed 24 years ago

-O3 build will open the window but segv [WAS: gcc3 -O3 Garbage characters in time on status bar.]

Categories

(Core :: Graphics: ImageLib, defect)

x86
Linux
defect
Not set
trivial

Tracking

()

RESOLVED FIXED

People

(Reporter: loz, Assigned: pavlov)

Details

I've only seen this happen on the following system: a complete gcc 3.0 Linux From Scratch build, with Mozilla built with -march=athlon -mcpu=athlon -O3. With these settings I often get one of the characters :;<=> or ? in the second decimal place, or the units column, of the time taken to render. This happens especially often when drawing one of the debug test pages that renders very quickly.
If I apply the following patch to dist/bin/chrome/comm/content/navigator/nsBrowserStatusHander.js :

196,198c196,197
< //msg = gNavigatorBundle.getString("nv_done");
< //msg = msg.replace(/%elapsed%/, elapsed);
< msg = elapsed;
---
> msg = gNavigatorBundle.getString("nv_done");
> msg = msg.replace(/%elapsed%/, elapsed);

Then the problem goes away (at the expense of the status bar message being rather terse). This would imply that the problem is somewhere within the JavaScript replace function. Compiling the js directory with -O2 instead of -O3 makes the problem go away. I cannot tell whether this is because the problem is fixed or whether the browser can no longer render fast enough to trigger the problem. Just compiling jsstr.c with -O2 does not fix the problem.
i'm confused by the patch, because the code isn't commented out (see url)
Assignee: asa → blake
Component: Browser-General → XP Apps: GUI Features
QA Contact: doronr → sairuh
I'm guessing he just has the patch the wrong way around. So he's actually commenting out those two lines. I suspect that the optimization in gcc 3.0 (which we don't support yet, mind you) does something messy to the js code. Brendan? Shaver? Interested in taking a closer look, or should this just be futured for now?
Yep, sorry, I got the patch backwards (Doh! and a three-liner at that). I've found a couple more things.
1. The problem goes away if I output every string that is getting converted in jsstr.c - js_ValueToString to stdout. Whether this is because the compiler is now inlining differently, or because the extra time taken has stopped a race condition from triggering, I don't know yet.
2. The corruption has already happened by the time do_replace (again in jsstr.c) is called.
Assignee: blake → rogerl
Component: XP Apps: GUI Features → Javascript Engine
QA Contact: sairuh → pschwartau
Summary: Garbage characters in time on status bar. → gcc3 -O3 Garbage characters in time on status bar.
ok so let's try jseng and the gcc3 mozilla people
Does compiling some other js*.c file -O2 make the problem go away? If so, which file? Loz, how did you divine that the corruption occurred before do_replace is called? Maybe it can be fenced in -- is there a way to bound the earliest point it could have happened in the str_replace control flow? /be P.S. Thanks timeless, well-assigned this time!
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reassigning to Kenton.
Assignee: rogerl → khanson
I saw the corruption in do_replace (and also str_replace) by putting code like this in (I'm afraid I'm not a C programmer, and I don't know the mozilla code base, so there is almost certainly a function for this somewhere)...

uint lozi;
for (lozi = 0; (lozi < 10) && (rdata->repstr->chars[lozi] != 0); lozi++) {
    printf("%c", rdata->repstr->chars[lozi]);
}
printf("\n");

I'm wondering why the garbage characters come from a small set, and why these are (in the ASCII charset) immediately after the digits. Could someone give me an order of files to try compiling -O2 in js to see if I can narrow things down a bit?
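One possible reading of the observation above, offered as a guess rather than a finding from this thread: in ASCII the six characters : ; < = > ? occupy the codes directly after '9', so they are exactly what a formatter emits if it computes a "digit" value of 10 to 15 and adds it to '0'. The sketch below only demonstrates that arithmetic. The helper names are hypothetical and are not taken from the Mozilla sources.

```c
#include <assert.h>

/* Hypothetical helper: turns a digit value into a character the way many
   number formatters do, by adding it to '0'. Values above 9 are NOT
   rejected, which is the point of the demonstration. */
static char digit_to_char(int digit_value)
{
    assert(digit_value >= 0);
    return (char)('0' + digit_value);
}

/* Returns 1 if c is one of the six characters reported in this bug
   (':' through '?', i.e. ASCII 0x3A to 0x3F), otherwise 0. */
static int is_reported_garbage(char c)
{
    return c >= ':' && c <= '?';
}
```

Under this assumption, digit_to_char(10) gives ':' and digit_to_char(15) gives '?', which matches the reported set exactly. It would point at a digit being miscomputed during number-to-string conversion rather than at random memory corruption.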
Ah, I'm learning a little about this C lark. By trial and error it seems that compiling jsinterp.c and jsstr.c as -O2 removes the problem, though either on its own is not sufficient.
Can you try compiling those files with -O3 -fno-strict-aliasing? Does that cause the problem to go away?
Compiling jsstr.c and jsinterp.c with -O3 -fno-strict-aliasing does not make the problem go away. I'll try and get a download of gcc-3.0.1 sometime soon and see if that makes a difference.
gcc-3.0.1 doesn't fix this problem, though it did fix a different problem I was having. I could not previously compile layout/html/forms/src/nsFormControlHelper.cpp at -O3 (if interested it was logged at http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view&pr=3485&database=gcc )
gcc-3.0.1 does not fix this problem. Now, however, recompiling js/src/jsdtoa.c with -O1 (-mcpu=athlon -march=athlon) does appear to fix it. Out of curiosity I tried compiling this file at -O3 with gcc-3.0, and the rest of the app using gcc-3.0.1, but this didn't help either.
I use a CVS build of gcc-3.0.2 (the current Debian unstable package), which solves this problem. WFM with CVS from 20011020.
Marking WFM per the last comments.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
Loz, does this solution work for you, too? If so, you can mark this bug "Verified"; otherwise, you can reopen it, thanks -
This still happens for me with gcc-3.0.2, mozilla-0.9.5. Maybe my setup is wrong - I installed gcc to a new directory, and then compiled Mozilla with my new gcc bin directory at the front of my PATH (typing gcc --version gave 3.0.2). Is this sufficient for Mozilla's build process? I've also not recompiled any additional libraries (e.g. glibc) - would I need to do this to resolve the problem? cheers Loz
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Remarking worksforme. The fix was checked in after 0.9.5 branched, so you'll need to pull from current CVS to get the fix.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
I'll try it when I get home tonight - do I need a complete CVS update, or are there a couple of files I can update by hand? Loz
Many changes have been made, and there was no patch for this bug specifically that made it WORKSFORME - I would pull everything.
I'll try this if someone can give me a CVS command that will pull something recent and compilable. I'm getting errors in directory/c-sdk/ldap, and if I don't build that part I get a SEGV on startup. I used cvs -z3 update -A to refresh the source.
Loz, can you paste what errors you're getting in directory/c-sdk/ldap? That will help; thanks -
The error with c-sdk/ldap was an undefined symbol LDAP_CONTROL_PROXYAUTH in directory/c-sdk/ldap/libraries/libldap/proxyauthctrl.c. There was also a warning when configuring that no configure information existed in directory/c-sdk/ldap. I tried this last night:

cvs update -A client.mk
make -f client.mk checkout

I think this is probably the correct way to get the latest stuff that people believe is compilable. And it did compile. Unfortunately it segv'd immediately after putting the window up. I left a debug build compiling when I left for work. I'll try a -O2 build at some point too.
I've had a chance to try three builds. The -O2 build apparently runs fine. A -O3 --enable-debug build also appears to run ok. However, the -O3 build will open the window but segv as soon as it attempts to render a page. (Setting the start page to blank means I can actually see the window and interact with it before it crashes.) If there is a way I can interpret the core file to get something useful for people then fire away. Any other ideas as to what I can try are gratefully received.
I don't know how useful this is, but this is the backtrace I get from my core file for -O3. If there is a way to use the symbols from my debug build to put some more meat on this then I'm all ears.

#0 0x40d24db7 in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#1 0x412e012c in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#2 0x412df258 in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#3 0x412deefb in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#4 0x412ddf2c in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#5 0x412dfa88 in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#6 0x412e0230 in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#7 0x4015163d in _ZN6nsPipe17nsPipeInputStream12ReadSegmentsEPFjP14nsIInputStreamPvPKcjjPjES3_jS6_ () from /home/loz/work/mozilla/dist/bin/libxpcom.so
#8 0x412dfaeb in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#9 0x40d2892b in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#10 0x40d274ed in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#11 0x408eb32f in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#12 0x408af9a6 in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#13 0x408a04ea in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#14 0x4016a292 in PL_HandleEvent () from /home/loz/work/mozilla/dist/bin/libxpcom.so
#15 0x40169637 in PL_ProcessPendingEvents () from /home/loz/work/mozilla/dist/bin/libxpcom.so
#16 0x4016b874 in _ZN16nsEventQueueImpl20ProcessPendingEventsEv () from /home/loz/work/mozilla/dist/bin/libxpcom.so
#17 0x40957266 in NSGetModule () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#18 0x409571ef in NSGetModule ()
Maybe this is a bit more useful (I updated CVS again, and this time remembered not to strip the libraries when building (doh!)). This is the backtrace from my -O3 build.

#0 0x40e03db7 in _ZN12imgContainer11AppendFrameEP14gfxIImageFrame () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#1 0x4129a12c in _Z14HaveDecodedRowPvPhiiiihi () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#2 0x41299258 in _Z10output_rowP10gif_struct () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#3 0x41298efb in _Z6do_lzwP10gif_structPKh () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#4 0x41297f2c in _Z9gif_writeP10gif_structPKhj () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#5 0x41299a88 in _ZN13nsGIFDecoder211ProcessDataEPhj () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#6 0x4129a230 in _Z11ReadDataOutP14nsIInputStreamPvPKcjjPj () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#7 0x40156edd in _ZN6nsPipe17nsPipeInputStream12ReadSegmentsEPFjP14nsIInputStreamPvPKcjjPjES3_jS6_ () from /home/loz/work/mozilla/dist/bin/libxpcom.so
#8 0x41299aeb in _ZN13nsGIFDecoder29WriteFromEP14nsIInputStreamjPj () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimggif.so
#9 0x40e0792b in _ZN10imgRequest15OnDataAvailableEP10nsIRequestP11nsISupportsP14nsIInputStreamjj () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#10 0x40e064ed in _ZN13ProxyListener15OnDataAvailableEP10nsIRequestP11nsISupportsP14nsIInputStreamjj () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libimglib2.so
#11 0x409cb33f in _ZN12nsJARChannel15OnDataAvailableEP10nsIRequestP11nsISupportsP14nsIInputStreamjj () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#12 0x4098f9a6 in _ZN22nsOnDataAvailableEvent11HandleEventEv () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#13 0x409804ea in _ZN23nsARequestObserverEvent13HandlePLEventEP7PLEvent () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnecko.so
#14 0x4016ef90 in PL_ProcessEventsBeforeID () from /home/loz/work/mozilla/dist/bin/libxpcom.so
#15 0x407f815d in _Z12processQueuePvS_ () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#16 0x40138dad in _ZN11nsVoidArray17EnumerateForwardsEPFiPvS0_ES0_ () from /home/loz/work/mozilla/dist/bin/libxpcom.so
#17 0x407f819e in _ZN10nsAppShell15ProcessBeforeIDEm () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#18 0x407ff1c6 in _Z16handle_gdk_eventP9_GdkEventPv () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#19 0x4037ec0e in gdk_event_dispatch () from /opt/gnome/lib/libgdk-1.2.so.0
#20 0x403afcf0 in g_main_dispatch () from /opt/gnome/lib/libglib-1.2.so.0
#21 0x403afff8 in g_main_iterate () from /opt/gnome/lib/libglib-1.2.so.0
#22 0x403b04bc in g_main_run () from /opt/gnome/lib/libglib-1.2.so.0
#23 0x402c838f in gtk_main () from /opt/gnome/lib/libgtk-1.2.so.0
#24 0x407f7e43 in _ZN10nsAppShell3RunEv () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libwidget_gtk.so
#25 0x407dac12 in _ZN17nsAppShellService3RunEv () from /home/loz/work/mozilla-0.9.5+3.0.2-athlon-O3/dist/bin/components/libnsappshell.so
#26 0x804c814 in _Z5main1iPPcP11nsISupports ()
#27 0x804bb1c in main ()
#28 0x404f1861 in __libc_start_main () at soinit.c:56
If I revert imgContainer.cpp and imgContainer.h in modules/libpr0n/src to MOZILLA_0_9_5_RELEASE then the application runs ok, and the garbage problem in the status bar goes away. Should I mark this bug as verified and raise a new one for the new problem with -O3?
Perhaps we should keep this bug open and just change the summary, since it contains the stack traces already. Would that be OK?
I've got no problem with that - anyone else?
Reopening and resummarizing for the crash issue; if my summary is not precise enough, please adjust; thanks. Will also need advice as to whether JavaScript Engine is the correct component for this bug. Based on the stack traces, does it look like ImageLib might be the correct component?
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: gcc3 -O3 Garbage characters in time on status bar. → -O3 build will open the window but segv [WAS: gcc3 -O3 Garbage characters in time on status bar.]
I'd guess it should be ImageLib as the problem goes away when I revert two files in libpr0n (imgContainer.cpp and imgContainer.h) to the 0.9.5 release. But, I'm not a C++ programmer...
> I'd guess it should be ImageLib as the problem goes away when I revert two
> files in libpr0n (imgContainer.cpp and imgContainer.h) to the 0.9.5 release.

Can you try the following hack?

--- imgContainer.h-save Tue Nov 6 10:19:24 2001
+++ imgContainer.h Tue Nov 6 10:30:44 2001
@@ -42,9 +42,9 @@
 #include "nsWeakReference.h"
 
-#ifdef __GNUC__
-#define CANT_INLINE_GETTER
-#endif
+//#ifdef __GNUC__
+//#define CANT_INLINE_GETTER
+//#endif
 
 #define NS_IMGCONTAINER_CID \
 { /* 5e04ec5e-1dd2-11b2-8fda-c4db5fb666e0 */ \
@@ -81,8 +81,9 @@
 nsresult inlinedGetFrameAt(PRUint32 index, gfxIImageFrame **_retval);
 #else
 inline nsresult inlinedGetFrameAt(PRUint32 index, gfxIImageFrame **_retval) {
- *_retval = NS_STATIC_CAST(gfxIImageFrame*, mFrames.ElementAt(index));
- if (!*_retval) return NS_ERROR_FAILURE;
+ nsISupports *_elem = mFrames.ElementAt(index);
+ if (!_elem) return NS_ERROR_FAILURE;
+ *_retval = NS_STATIC_CAST(gfxIImageFrame*, _elem);
 return NS_OK;
 }
 #endif

This is just a guess, but I cannot see anything other than that which would prevent gcc from inlining the function.
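The reordering in the second hunk (fetch into a local, test the local, and only then convert and store through the out parameter) can be sketched in plain C. This is an illustration only: the types, the element_at helper and the result codes are hypothetical stand-ins, not the Mozilla API, and it makes no claim about why gcc miscompiled the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Mozilla types; not the real API. */
typedef struct frame { int id; } frame;

typedef struct frame_list {
    void  *items[4];
    size_t count;
} frame_list;

enum { RESULT_OK = 0, RESULT_FAILURE = -1 };

/* Returns the raw element, or NULL if index is out of range. */
static void *element_at(const frame_list *list, size_t index)
{
    return index < list->count ? list->items[index] : NULL;
}

/* Mirrors the patched order: the element is held in a local, checked,
   and only a non-NULL element is converted and written to *out. */
static int get_frame_at(const frame_list *list, size_t index, frame **out)
{
    void *elem;

    assert(list != NULL && out != NULL);
    elem = element_at(list, index);
    if (elem == NULL)
        return RESULT_FAILURE;
    *out = (frame *)elem;
    return RESULT_OK;
}
```

One visible difference from the pre-patch shape: on failure this version never writes through the out parameter, whereas the original stored the (null) result first and then read it back through the pointer to test it.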
Reassigning to ImageLib based on Loz's findings at 2001-11-06 03:40 above.
Assignee: khanson → pavlov
Status: REOPENED → NEW
Component: Javascript Engine → ImageLib
QA Contact: pschwartau → tpreston
dup!
Ulrich's patch fixes this for me. I've marked it as WORKSFORME on the off-chance that that's the right thing to do. cheers all.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
I just recalled the cvs log message for the last patch to imgContainer.h:

revision 1.12
date: 2001/10/28 21:02:04; author: dbaron%fas.harvard.edu; state: Exp; lines: +8 -0
Fix -O2 optimization crash with gcc 2.96 or 3.0.{1,2} by not inlining |#ifdef __GNUC__|. b=106891 r=pavlov sr=brendan

Does anyone want to check that Ulrich's patch doesn't cause this to regress (apologies if I'm being dumb)? I'll put a message on the list for 106891.
> Does anyone want to check that Ulrich's patch doesn't cause this to regress
> (apologies if I'm being dumb) - I'll put a message on the list for 106891.

If the patch does work, the original change (to prevent inlining) only hid the problem. Since the non-inline code was the same, the generated code should, given the right situation, have been the same. Maybe somebody can send me the disassembled code which had the problem so that I can check what the problem really is.
dbaron: can you look at this and send drepper the disassembled code?
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
we checked in a workaround for this a while ago. i don't remember the bug number, but i'm marking this fixed. if someone finds the other bug, please dup it.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED