Closed Bug 94734 Opened 24 years ago Closed 23 years ago

crash on a bugzilla search

Categories

(Core :: Networking: HTTP, defect, P1)

defect

Tracking

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: bobbell, Assigned: darin.moz)

Details

(Keywords: 64bit, crash, Whiteboard: [adt2 RTM] [ETA 07/31])

Attachments

(6 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; OSF1 alpha; en-US; rv:0.9.3) Gecko/20010804
BuildID: 2001080416

When I do a bugzilla search, I get the message instructing me to wait. When the screen should refresh with the results, mozilla crashes.

Reproducible: Always

Steps to Reproduce:
1. Go to bugzilla.mozilla.org
2. Search for a bug (e.g., "corrupt font")
3. Wait for results

Actual Results:
mozilla crashed instead of displaying results. The "Just a moment" (or whatever the exact text is) screen did display, but not the results.

Expected Results:
display the results, as my copy of Navigator 4.76 does.
Stack trace (note: mozilla was built without symbols):

$ dbx /usr/local/mozilla/mozilla-bin core
dbx version 5.1
Type 'help' for help.
Core file created by program "mozilla-bin"
warning: /usr/local/mozilla/mozilla-bin has no symbol table -- very little is supported without it

thread 0x8 signal Segmentation fault at >*[__nxm_thread_kill, 0x3ff805cb1d8] ret zero, (ra), 1
(dbx) t
>  0 __nxm_thread_kill(0xb, 0x0, 0x3ff805b7914, 0x3ffc01b2000, 0x3ffc01b2000) [0x3ff805cb1d8]
   1 pthread_kill(0x0, 0x11fffaab8, 0x0, 0x11fffc010, 0x1) [0x3ff805b7934]
   2 (unknown)() [0x3ff805cf854]
   3 (unknown)() [0x3ff807f369c]
   4 exc_raise_signal_exception(0xb0ffe0003, 0x86, 0x0, 0x3ffbfcb9504, 0x1) [0x3ff807f3a08]
   5 (unknown)() [0x3ff805b9470]
DBX Fault: Segmentation fault
(dbx)
Reporter, can you try a recent nightly build available at: http://ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-alpha-dec-osf4.0f.tar.gz
This problem still exists with the latest nightly build.
Marking NEW. BTW, this seems to be the first OSF/1 bug report.
Status: UNCONFIRMED → NEW
Ever confirmed: true
This bug is still being produced with Mozilla Build 2001101719. I've verified that other co-workers can also reproduce this bug.
We must have a stack trace to assign this bug to the correct component; a bug left in Browser-General will never get fixed. On i386 Linux, Windows, and Mac we have Talkback. Reporter: is it possible for you to build a debug build, or an optimized build with symbols?
Yes, I believe I can do a debug build. Please specify exactly what parameters you would like me to build Mozilla with, as I don't normally compile it myself and therefore don't know how such a build should be done. Also specify whether you would like me to build Mozilla from a nightly release or from the last milestone.
Please read http://www.mozilla.org/build (it should contain the build instructions). Please build a nightly build if possible. Thanks!
I built mozilla from nightly source, using '-O -g3' flags passed to the Compaq Tru64 'cc' and 'cxx' compilers. I didn't do anything else special. I've attached the script 'buildit' that I used to build mozilla.

NOTE: I first ran mozilla on the machine where I built it. This started it with a fairly clean profile. The bug was reproduced. However, because that machine was running very low on disk space, I subsequently ran it from a second machine, NFS-mounting mozilla from the build machine. This picked up my own profile, and also made it easier to debug the crash. The bug was reproduced in the same fashion. Running mozilla from the second machine via NFS from the first is how I gathered the information in this bug report.

When mozilla ran, it generated a lot of output to the terminal window from which I ran it. I've attached the file 'mozilla.err', which is a copy-and-paste of this text output.

Mozilla still crashes as expected when retrieving the results of a search from bugzilla.mozilla.org. Below you will find a debugging session on that dump. I can give you the dump if you want, or you can tell me what I should investigate.

/usr/bin/dbx /usr/local/rpm/tmp/mozilla/dist/bin/mozilla-bin core
dbx version 5.1
Type 'help' for help.
Core file created by program "mozilla-bin"

thread 0xf signal Segmentation fault at >*[__nxm_thread_kill, 0x3ff805cb1d8] ret zero, (ra), 1
(/usr/bin/dbx) t
>  0 __nxm_thread_kill(0xb, 0x0, 0x3ff805b7914, 0x3ffc01b2000, 0x3ffc01b2000) [0x3ff805cb1d8]
   1 pthread_kill(0x0, 0x11fffaef8, 0x0, 0x11fffc010, 0x1) [0x3ff805b7934]
   2 (unknown)() [0x3ff805cf854]
   3 (unknown)() [0x3ff807f369c]
   4 exc_raise_signal_exception(0xb0ffe0003, 0x86, 0x0, 0x3ffbf938b40, 0x1) [0x3ff807f3a08]
   5 (unknown)() [0x3ff805b9470]
   6 OnDataAvailable__16nsMultiMixedConvXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi() ["nsMultiMixedConv.cpp":545, 0x3ffbf938b40]
   7 OnDataAvailable__18nsDocumentOpenInfoXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi() ["nsURILoader.cpp":259, 0x3ffbf78de08]
   8 OnDataAvailable__19nsStreamListenerTeeXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi() ["nsStreamListenerTee.cpp":56, 0x3ffbf91abc8]
   9 OnDataAvailable__13nsHttpChannelXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi() ["nsHttpChannel.cpp":2359, 0x3ffbf9a3104]
  10 HandleEvent__22nsOnDataAvailableEventXv() ["nsStreamListenerProxy.cpp":192, 0x3ffbf917ea0]
  11 HandlePLEvent__23nsARequestObserverEventXP7PLEvent() ["nsRequestObserverProxy.cpp":79, 0x3ffbf8f04f4]
  12 PL_HandleEvent(self = (unallocated - symbol optimized away)) ["plevent.c":590, 0x3ffbfee1668]
  13 PL_ProcessPendingEvents(self = (unallocated - symbol optimized away)) ["plevent.c":520, 0x3ffbfee1440]
  14 ProcessPendingEvents__16nsEventQueueImplXv() ["nsEventQueue.cpp":388, 0x3ffbfee573c]
  15 event_processor_callback__XPvi17GdkInputCondition() ["nsAppShell.cpp":184, 0x3ffbf1568b4]
  16 our_gdk_io_invoke__XP11_GIOChannel12GIOConditionPv() ["nsAppShell.cpp":76, 0x3ffbf15622c]
  17 (unknown)() [0x300030155c4]
  18 (unknown)() [0x3000301786c]
  19 (unknown)() [0x3000301808c]
  20 g_main_run(0x1, 0x1, 0x300018d5cf0, 0x0, 0x300018d5dac) [0x3000301827c]
  21 gtk_main(0x3ffbf147ed0, 0x0, 0x11fffbe00, 0x3ffbfee8dd0, 0x140105a80) [0x300018d5da8]
  22 Run__10nsAppShellXv() ["nsAppShell.cpp":364, 0x3ffbf157240]
  23 Run__17nsAppShellServiceXv() ["nsAppShellService.cpp":302, 0x3ffbe0a3a9c]
  24 main1__XiPPcP11nsISupports() ["nsAppRunner.cpp":1303, 0x12001466c]
  25 main() ["nsAppRunner.cpp":1629, 0x1200154f4]
(/usr/bin/dbx)

Please tell me if you need more information. I may also attempt recompiling with '-g' instead of '-g3' to gather even more debugging information.
Thanks for your stack trace !!! -> Networking
Assignee: asa → darin
Component: Browser-General → Networking: HTTP
QA Contact: doronr → tever
reporter: can you reproduce this problem in a more recent nightly build?
I just reproduced this with 2002013113. I used an optimized build, from a custom RPM. I'll look into generating a debug build again, but this may take a little time.
bobbell: before you bother building debug, you might try capturing an HTTP log while reproducing the crash. Here's how:

  setenv NSPR_LOG_MODULES nsHttp:5
  setenv NSPR_LOG_FILE http.log

then just attach http.log to this bug report. thx!
Since it seems that a debug build will take a little longer to generate, I've attached the log as per the instructions given. I went straight to http://bugzilla.mozilla.org/ and searched for "foo bar" (no quotes). I was given the "please wait" screen, and then mozilla crashed (as usual) instead of displaying the results of the search.
Target Milestone: --- → Future
*** Bug 124557 has been marked as a duplicate of this bug. ***
I am seeing this with Mozilla 0.9.9 on Linux/Alpha (from the Red Hat rawhide RPM), although it seems to happen only when the query returns a long list of bugs (>70). It also crashes on Red Hat's bugzilla, but there it occurs even with short lists. It crashes in nsMultiMixedConv::OnDataAvailable (as on OSF). I ran with NSPR_LOG_MODULES a couple of times, with results similar to attachment 67747, although one time it gave me:

1026[120246da0]: nsHttpHandler::ReclaimConnection [conn=20b17e70(bugzilla.redhat.com:80) keep-alive=1]
1026[120246da0]: adding connection to idle list [conn=20b17e70]
1026[120246da0]: active connection count is now 0
1026[120246da0]: nsHttpHandler::ProcessTransactionQ_Locked
1026[120246da0]: >> unable to process transaction queue at this time

I can attach the full log if you want to see it. I also ran sniffit, and it appears to crash in different places (one time in the bug list, one time in the footer). OS --> All, please
It does seem to happen sometimes with shorter lists, but more consistently with longer lists.
1026[120246da0]: >> unable to process transaction queue at this time is neither an error nor a warning. it just indicates that there are no pending transactions to process.
I was unable to reproduce this crash with an unoptimized build from CVS a few days ago. Currently attempting to recompile with optimization...
Depends on: 134221
No crash with optimization on. I also built from SRPM 20020327 (since the original crasher for me was the 0.9.9 RPM). Also no crash. Red Hat's bugzilla also works, and it was a more consistent crasher than Mozilla's bugzilla. Works for me.

bobbell: if you still can't get it to compile (bug 134221), you might try the latest nightly (20020312), assuming you have OSF v4.x.
I am running Tru64 UNIX V5.1, though I do find it intriguing that a V4.0F build is occurring nightly. It should still run on V5.1, so I may give it a shot. Also note that bug 134221 does not prevent compiling; it prevents the use of the compiled program.
Mozilla 1.0RC1 Alpha-Linux RPM is still crashing here.
Mozilla 1.0rc1 from SRPM works fine...
Keywords: crash
OS=>All (although could be build config on Linux)
OS: OSF/1 → All
*** Bug 143696 has been marked as a duplicate of this bug. ***
Attached patch Fix
I just came across this problem today on a Tru64 UNIX 5.0 system. After debugging this for a little while, I found the bug. The code that was causing grief was

  if (!mPartChannel || !(cursor[bufLen-1] == nsCRT::LF) )

bufLen is declared as PRUint32. Changing the expression to

  if (!mPartChannel || !(cursor[int(bufLen-1)] == nsCRT::LF) )

solved the problem.
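To see why the cast changes the outcome, here is a minimal C sketch of the index arithmetic alone; no memory is read. It assumes PRUint32 is a 32-bit unsigned type (modeled with uint32_t) and an LP64 target such as Tru64 or Alpha Linux; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: models NSPR's PRUint32 as a 32-bit unsigned type. */
typedef uint32_t PRUint32;

/* Index used by the original cursor[bufLen-1]. The subtraction is done in
 * unsigned 32-bit arithmetic, so bufLen == 0 wraps to 4294967295. */
PRUint32 original_index(PRUint32 bufLen)
{
    return bufLen - 1;
}

/* Index used by the proposed cursor[int(bufLen-1)]. Converting 4294967295 to
 * a signed 32-bit type is implementation-defined; two's-complement compilers
 * produce -1, so the expression reads cursor[-1]. */
int32_t cast_index(PRUint32 bufLen)
{
    return (int32_t)(bufLen - 1);
}

/* Distance between the two indices, computed in 64-bit arithmetic the way an
 * LP64 compiler adds a zero-extended unsigned index to a pointer. */
int64_t index_gap(PRUint32 bufLen)
{
    int64_t gap = (int64_t)original_index(bufLen) - (int64_t)cast_index(bufLen);
    /* The two agree unless the signed cast went negative, in which case they
     * differ by exactly 2^32 bytes (4 GiB). */
    assert(gap == 0 || gap == ((int64_t)1 << 32));
    return gap;
}
```

With bufLen == 0 the original expression reads about 4 GiB past cursor, while the cast version reads the byte just before it. The cast therefore avoids the segfault but still reads cursor[-1].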
(and there was much rejoicing) is that not a compiler bug, then? nominating for mozilla1.0.1 since we finally know what's going on and have a fix.
Keywords: mozilla1.0.1
Can I have a review for this fix?
Keywords: review
Shanmugavelu: can you explain how your patch solves this crash? is bufLen sometimes zero? is that the problem?
Yes. The problem is that bufLen in the expression

  if (!mPartChannel || !(cursor[bufLen-1] == nsCRT::LF) )

is 0. bufLen has been defined as a PRUint32.

(ladebug) p bufLen
0
(ladebug) whatis bufLen
PRUint32 bufLen
then the question becomes: "is cursor[-1] valid?" if not, then we need to protect against evaluating this expression when bufLen == 0.
Attached file testcase
proper testcase output:
11ffff814 11ffff814 11ffff814 11ffff814
actual testcase output:
11ffff814 51ffff814 11ffff814 11ffff814
(that's with cc; I don't have C++ on OSF)

This bug also bites on Alpha-Linux with C, but not C++ (at least for me, gcc-2.96-87). Blizzard might be using an older compiler to make RPMS. Dunno. Also, if bufLen is positive, the bug doesn't bite.
actually, I can get the bad behavior in c++ with gcc-2.96-101 (so Blizzard is actually using a newer compiler). It seems like this might have been caused (on Linux) by fixing Redhat bug 58746: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=58746 For all I know, this is proper behavior in 64-bit land (since it happens on OSF and Linux), although it then ought to show up on Solaris/Sun. adding 64bit keyword.
Keywords: 64bit
dereferencing memory you don't own is never valid. simply casting bufLen - 1 to a signed integer is not a solution. we need to understand how it is that bufLen can be 0 since the original author clearly didn't think that was possible. if we decide that it is rightly possible, then the code needs to avoid bufLen - 1 when bufLen == 0, else we need to fix the code that is leading to bufLen == 0.
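If bufLen == 0 turns out to be legitimate, the code has to avoid evaluating bufLen - 1 at all in that case. A minimal C sketch of such a guard follows; ends_with_lf is a hypothetical helper, not Mozilla code, and uint32_t stands in for PRUint32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Mozilla code: returns 1 if the last of the bufLen
 * bytes starting at cursor is LF ('\n', the value of nsCRT::LF), else 0.
 * A zero-length buffer has no last byte, so it returns 0 before the index
 * bufLen - 1 is ever computed. */
int ends_with_lf(const char *cursor, uint32_t bufLen)
{
    assert(cursor != NULL || bufLen == 0);
    if (bufLen == 0)
        return 0;
    return cursor[bufLen - 1] == '\n';
}
```

Whether an empty buffer should count as "does not end in LF" is a design choice; this sketch picks that answer so the caller never reads outside the bufLen bytes it was given.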
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: Future → mozilla1.0.1
I also see bufLen==0 on i686, but it just doesn't crash. Here's what happens during the bad pass through OnDataAvailable:

buffer="Set-Cookie: LASTORDER=bugs.bug_id ; path=/; expires=Sun, 30-Jun-2029 00:00:00 GMT\nSet-Cookie: BUGLIST=\n\n"
mFirstOnData is false
mProcessingHeaders is true

On return from ParseHeaders, bufLen is 0 and done is true, so mProcessingHeaders is set to false. Because mProcessingHeaders is false, it does:

  if (!mPartChannel || !(cursor[bufLen-1] == nsCRT::LF) )
      bufAmt = PR_MIN(mTokenLen - 1, bufLen);

which causes the crash. Note that cursor[-1] is actually part of a buffer it had allocated earlier (it is not dereferencing memory it doesn't own).
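One plausible reason i686 survives while 64-bit targets crash is address wraparound. The C sketch below models the address that cursor[bufLen - 1] resolves to, using plain integers so no out-of-bounds pointer is formed. It is a simplified model of typical code generation (add the zero-extended index to the pointer), not something the C standard guarantees; the function names and addresses are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit target (i686): address arithmetic wraps modulo 2^32, so
 * cursor + 0xFFFFFFFF lands on cursor - 1, a byte inside the same buffer. */
uint32_t target_addr_32(uint32_t cursor)
{
    return cursor + UINT32_C(0xFFFFFFFF);
}

/* LP64 target (Alpha, IA-64): the 32-bit unsigned index is zero-extended to
 * 64 bits before it is added, so nothing wraps and the result lies about
 * 4 GiB past cursor, typically in unmapped memory. */
uint64_t target_addr_64(uint64_t cursor)
{
    uint32_t index = UINT32_C(0) - 1;   /* bufLen - 1 with bufLen == 0 */
    assert(index == UINT32_C(0xFFFFFFFF));
    return cursor + (uint64_t)index;
}
```

For example, target_addr_32(0x08049000) gives 0x08048FFF (one byte back), while target_addr_64(0x11ffff814) gives 0x21ffff813 (4 GiB minus one byte forward).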
Attached patch alternate fix
this isn't going to make 1.0.1 ... -> 1.2alpha
Target Milestone: mozilla1.0.1 → mozilla1.2alpha
cc'ing valeski, who seems to have written most of the relevant code here (according to CVS blame). valeski: could you explain the desired/expected behavior in the particular case where it crashes? Thanks.
*** Bug 159619 has been marked as a duplicate of this bug. ***
from bug 159619: also crashes IA64
Hardware: DEC → All
Comment on attachment 86721 (alternate fix)
sr=darin
this patch looks very good. bufAmt shouldn't change from zero if bufLen is zero, so adding this check definitely doesn't change the intended logic of the block.
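The reviewer's argument can be checked on a reduced model. If bufAmt starts at 0 and the branch can only set it to PR_MIN(mTokenLen - 1, bufLen), then with bufLen == 0 the branch would produce 0 anyway, so skipping it changes nothing except avoiding the cursor[bufLen-1] read. The C sketch below assumes that shape; it is not the text of attachment 86721, which may place the check differently. PR_MIN is redefined locally as a stand-in for NSPR's macro, and hasPartChannel stands in for mPartChannel.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for NSPR's PR_MIN macro. */
#define PR_MIN(x, y) ((x) < (y) ? (x) : (y))

/* Hypothetical reduction of the block in nsMultiMixedConv::OnDataAvailable.
 * Assumption: bufAmt starts at 0 and is only raised inside the branch. */
uint32_t compute_buf_amt(const char *cursor, uint32_t bufLen,
                         uint32_t mTokenLen, int hasPartChannel)
{
    uint32_t bufAmt = 0;
    assert(mTokenLen >= 1);
    /* The bufLen test short-circuits, so cursor[bufLen - 1] is never
     * evaluated for an empty buffer. */
    if (bufLen && (!hasPartChannel || cursor[bufLen - 1] != '\n'))
        bufAmt = PR_MIN(mTokenLen - 1, bufLen);
    return bufAmt;
}
```

For a non-empty buffer the result matches the unguarded expression; for an empty buffer it is 0, which is also what PR_MIN(mTokenLen - 1, 0) would have produced.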
Attachment #86721 - Flags: superreview+
-> going to shoot for getting this into both 1.1 and 1.0.1
Priority: P3 → P1
Target Milestone: mozilla1.2alpha → mozilla1.1beta
Comment on attachment 86721 (alternate fix)
r=blizzard
Attachment #86721 - Flags: review+
Comment on attachment 86721 (alternate fix)
a=brendan@mozilla.org for trunk and branch. /be
Attachment #86721 - Flags: approval+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [adt1 RTM]
-> mozilla1.0.1 waiting for ADT approval.
Target Milestone: mozilla1.1beta → mozilla1.0.1
Lowering to adt2 since it appears to only affect 64bit machines.
Whiteboard: [adt1 RTM] → [adt2 RTM]
can someone with a 64-bit system confirm that this patch makes bugzilla usable again? i don't think tever has access to such a machine, and we really need to get this verified ASAP. thx!
Tested this on a Tru64 UNIX system. Works fine.
marking VERIFIED per previous comment.
Status: RESOLVED → VERIFIED
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch. pls check this in asap, then replace the "mozilla1.0.1+" with "fixed1.0.1". thanks!
Blocks: 143047
Keywords: adt1.0.1 → adt1.0.1+
Whiteboard: [adt2 RTM] → [adt2 RTM] [ETA 07/31]
fixed1.0.1
*** Bug 162446 has been marked as a duplicate of this bug. ***