Closed Bug 94734 Opened 23 years ago Closed 22 years ago

crash on a bugzilla search

Categories

(Core :: Networking: HTTP, defect, P1)

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: bobbell, Assigned: darin.moz)

Details

(Keywords: 64bit, crash, Whiteboard: [adt2 RTM] [ETA 07/31])

Attachments

(6 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; OSF1 alpha; en-US; rv:0.9.3) Gecko/20010804
BuildID:    2001080416

When I do a bugzilla search, I get the message instructing me to wait.  When the
screen should refresh with the results, mozilla crashes.

Reproducible: Always
Steps to Reproduce:
1. Go to bugzilla.mozilla.org
2. Search for a bug (e.g., "corrupt font")
3. Wait for results

Actual Results:  mozilla crashed instead of displaying results.  The "Just a
moment" (or whatever the exact text is) screen did display, but not the results.

Expected Results:  display the results, as my copy of Navigator 4.76 does.
Stack trace (note mozilla was built without symbols):
$ dbx /usr/local/mozilla/mozilla-bin core
dbx version 5.1
Type 'help' for help.
Core file created by program "mozilla-bin"

warning: /usr/local/mozilla/mozilla-bin has no symbol table -- very little is
supported without it


thread 0x8 signal Segmentation fault at >*[__nxm_thread_kill, 0x3ff805cb1d8]   
ret     zero, (ra), 1
(dbx) t
>  0 __nxm_thread_kill(0xb, 0x0, 0x3ff805b7914, 0x3ffc01b2000, 0x3ffc01b2000)
[0x3ff805cb1d8]
   1 pthread_kill(0x0, 0x11fffaab8, 0x0, 0x11fffc010, 0x1) [0x3ff805b7934]
   2 (unknown)() [0x3ff805cf854]
   3 (unknown)() [0x3ff807f369c]
   4 exc_raise_signal_exception(0xb0ffe0003, 0x86, 0x0, 0x3ffbfcb9504, 0x1)
[0x3ff807f3a08]
   5 (unknown)() [0x3ff805b9470]

DBX Fault: Segmentation fault
(dbx) 

Reporter, can you try a recent nightly build available at:
http://ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-alpha-dec-osf4.0f.tar.gz
This problem still exists with the latest nightly build.
Marking NEW.

BTW, this seems to be the first OSF/1 bug report.
Status: UNCONFIRMED → NEW
Ever confirmed: true
This bug is still being produced with Mozilla Build 2001101719.  I've verified
that other co-workers can also reproduce this bug.
we must have a stack trace to assign it to the correct component.
Browser general will never fix a bug...
On I386-Linux, win and mac we have talkback...

Reporter:
Is it possible for you to build a debug build or an optimized build with
symbols?
Yes, I believe I can do a debug build.  Please specify exactly what parameters
you would like me to build Mozilla with, as I don't normally compile it myself
and therefore don't know how such a build should be done.  Also specify whether
you would like me to build Mozilla from a nightly release or from the last
milestone.
Please read http://www.mozilla.org/build (should contain the build instructions)

Please build a nightly build if possible..

Thanks !
I built mozilla from nightly source, using '-O -g3' flags passed to the Compaq
Tru64 'cc' and 'cxx' compilers.  I didn't do anything else special.  I've
attached the script 'buildit' that I used to build mozilla.

NOTE: I first ran mozilla on the machine where I built it.  This started it with
a fairly clean profile.  The bug was reproduced.  However, because that machine
was running very low on disk space, I subsequently ran it from a second machine,
NFS mounting mozilla from the build machine.  This picked up my own profile, and
also made it easier to debug the crash.  The bug was reproduced in the same
fashion.  Running mozilla from the second machine via NFS from the first is how
I gathered the information in this bug report.

When mozilla ran it generated a lot of output to the terminal window from which
I ran it.  I've attached the file 'mozilla.err', which is a copy-and-paste of
this text output.

Mozilla still crashes as expected when retrieving results of a search from
bugzilla.mozilla.org.  Below you will find a debugging session on that dump.  I
can give you the dump if you want, or you can tell me what I should investigate.

/usr/bin/dbx /usr/local/rpm/tmp/mozilla/dist/bin/mozilla-bin core
dbx version 5.1
Type 'help' for help.
Core file created by program "mozilla-bin"

thread 0xf signal Segmentation fault at >*[__nxm_thread_kill, 0x3ff805cb1d8]   
ret     zero, (ra), 1
(/usr/bin/dbx) t
>  0 __nxm_thread_kill(0xb, 0x0, 0x3ff805b7914, 0x3ffc01b2000, 0x3ffc01b2000)
[0x3ff805cb1d8]
   1 pthread_kill(0x0, 0x11fffaef8, 0x0, 0x11fffc010, 0x1) [0x3ff805b7934]
   2 (unknown)() [0x3ff805cf854]
   3 (unknown)() [0x3ff807f369c]
   4 exc_raise_signal_exception(0xb0ffe0003, 0x86, 0x0, 0x3ffbf938b40, 0x1)
[0x3ff807f3a08]
   5 (unknown)() [0x3ff805b9470]
   6
OnDataAvailable__16nsMultiMixedConvXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi()
["nsMultiMixedConv.cpp":545, 0x3ffbf938b40]
   7
OnDataAvailable__18nsDocumentOpenInfoXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi()
["nsURILoader.cpp":259, 0x3ffbf78de08]
   8
OnDataAvailable__19nsStreamListenerTeeXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi()
["nsStreamListenerTee.cpp":56, 0x3ffbf91abc8]
   9
OnDataAvailable__13nsHttpChannelXP10nsIRequestP11nsISupportsP14nsIInputStreamUiUi()
["nsHttpChannel.cpp":2359, 0x3ffbf9a3104]
  10 HandleEvent__22nsOnDataAvailableEventXv() ["nsStreamListenerProxy.cpp":192,
0x3ffbf917ea0]
  11 HandlePLEvent__23nsARequestObserverEventXP7PLEvent()
["nsRequestObserverProxy.cpp":79, 0x3ffbf8f04f4]
  12 PL_HandleEvent(self = (unallocated - symbol optimized away))
["plevent.c":590, 0x3ffbfee1668]
  13 PL_ProcessPendingEvents(self = (unallocated - symbol optimized away))
["plevent.c":520, 0x3ffbfee1440]
  14 ProcessPendingEvents__16nsEventQueueImplXv() ["nsEventQueue.cpp":388,
0x3ffbfee573c]
  15 event_processor_callback__XPvi17GdkInputCondition() ["nsAppShell.cpp":184,
0x3ffbf1568b4]
  16 our_gdk_io_invoke__XP11_GIOChannel12GIOConditionPv() ["nsAppShell.cpp":76,
0x3ffbf15622c]
  17 (unknown)() [0x300030155c4]
  18 (unknown)() [0x3000301786c]
  19 (unknown)() [0x3000301808c]
  20 g_main_run(0x1, 0x1, 0x300018d5cf0, 0x0, 0x300018d5dac) [0x3000301827c]
  21 gtk_main(0x3ffbf147ed0, 0x0, 0x11fffbe00, 0x3ffbfee8dd0, 0x140105a80)
[0x300018d5da8]
  22 Run__10nsAppShellXv() ["nsAppShell.cpp":364, 0x3ffbf157240]
  23 Run__17nsAppShellServiceXv() ["nsAppShellService.cpp":302, 0x3ffbe0a3a9c]
  24 main1__XiPPcP11nsISupports() ["nsAppRunner.cpp":1303, 0x12001466c]
  25 main() ["nsAppRunner.cpp":1629, 0x1200154f4]
(/usr/bin/dbx) 

Please tell me if you need more information.  I may also attempt recompiling
with '-g' instead of '-g3' to gather even more debugging information.
Thanks for your stack trace !!!

-> Networking
Assignee: asa → darin
Component: Browser-General → Networking: HTTP
QA Contact: doronr → tever
reporter: can you reproduce this problem in a more recent nightly build?
I just reproduced this with 2002013113.  I used an optimized build, from a
custom RPM.  I'll look into generating a debug build again, but this may take a
little time.
bobbell: before you bother building debug, you might try capturing a HTTP log
while reproducing the crash.  here's how:

  setenv NSPR_LOG_MODULES nsHttp:5
  setenv NSPR_LOG_FILE http.log

then just attach http.log to this bug report.  thx!
Since it seems that debug build will take a little longer to generate, I've
attached the log as per the instructions given.  I went straight to
http://bugzilla.mozilla.org/ and searched for "foo bar" (no quotes).  I was
given the "please wait" screen, and then mozilla crashed (as usual) instead of
displaying the results of the search.
Target Milestone: --- → Future
*** Bug 124557 has been marked as a duplicate of this bug. ***
I am seeing this with Mozilla 0.9.9 on Linux alpha (from redhat rawhide RPM),
although it seems to only happen when the query returns a long list of bugs
(>70).  It also crashes on RedHat's bugzilla, but occurs there even with short
lists.

It crashes in nsMultiMixedConv::OnDataAvailable (as in OSF).

I did the NSPR_LOG_MODULES a couple times with similar results to attachment
67747 [details], although one time it gave me:

1026[120246da0]: nsHttpHandler::ReclaimConnection
[conn=20b17e70(bugzilla.redhat.com:80) keep-alive=1]
1026[120246da0]: adding connection to idle list [conn=20b17e70]
1026[120246da0]: active connection count is now 0
1026[120246da0]: nsHttpHandler::ProcessTransactionQ_Locked
1026[120246da0]: >> unable to process transaction queue at this time

I can attach the full log if you want to see it.

I also did sniffit and it appears to crash in different places (one time in the
bug list, one time in the footer).

OS --> All, please
It does seem to happen sometimes with shorter lists, but more consistently with
longer lists.
1026[120246da0]: >> unable to process transaction queue at this time

is neither an error nor a warning.  it just indicates that there are no pending
transactions to process.
I was unable to reproduce this crash with an unoptimized build from CVS a few
days ago.  Currently attempting to recompile with optimization...
Depends on: 134221
no crash with optimization on.  I also built from SRPM 20020327 (since original
crasher for me was 0.9.9 RPM).  Also no crash.  RedHat's bugzilla also works,
which was a more consistent crasher than Mozilla's bugzilla.
worksforme

bobbell: if you still can't get it to compile (bug 134221), you might try the
latest nightly (20020312), assuming you have OSF v4.x
I am running Tru64 UNIX V5.1, though I do find it intriguing that a
V4.0F build is occurring nightly.  It should still run on V5.1, so I may give it
a shot.

Also note that bug 134221 does not prevent compiling; it prevents the use of the
compiled program.
Mozilla 1.0RC1 Alpha-Linux RPM is still crashing here.
Mozilla 1.0rc1 from SRPM works fine...
Keywords: crash
OS=>All (although could be build config on Linux)
OS: OSF/1 → All
*** Bug 143696 has been marked as a duplicate of this bug. ***
Attached patch Fix
I just came across this problem today on a Tru64 UNIX 5.0 system.
After debugging it for a little while, I found the bug.
The code that was causing grief was:

if (!mPartChannel || !(cursor[bufLen-1] == nsCRT::LF) )

bufLen is declared as PRUint32.
Changing the expression to

if (!mPartChannel || !(cursor[int(bufLen-1)] == nsCRT::LF) )

solved the problem.
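
A minimal sketch of why the cast changes the result on a 64-bit target, assuming PRUint32 is a 32-bit unsigned type (modelled here with std::uint32_t). The function names are hypothetical and the code only computes offsets; it never dereferences anything. With bufLen == 0 the unsigned form asks for an offset of 0xFFFFFFFF bytes (about 4 GB past cursor), while the cast form asks for -1. On a 32-bit target the pointer addition itself wraps, which would likely explain why the same expression lands on cursor - 1 there and does not crash.

```cpp
#include <cstdint>

// Offset requested by cursor[bufLen - 1] when bufLen is a 32-bit unsigned
// value. The subtraction is done in 32-bit unsigned arithmetic, so 0 - 1
// wraps to 0xFFFFFFFF; that value is then zero-extended for the pointer
// addition on a 64-bit target.
std::int64_t UnsignedIndexOffset(std::uint32_t bufLen) {
    const std::uint32_t index = bufLen - 1u;
    return static_cast<std::int64_t>(index);
}

// Offset requested by cursor[int(bufLen - 1)]. The wrapped value is first
// converted to a signed 32-bit integer (0xFFFFFFFF becomes -1 on two's
// complement platforms; guaranteed only since C++20) and is then
// sign-extended for the pointer addition.
std::int64_t SignedIndexOffset(std::uint32_t bufLen) {
    const std::uint32_t wrapped = bufLen - 1u;
    const std::int32_t index = static_cast<std::int32_t>(wrapped);
    return static_cast<std::int64_t>(index);
}
```

For any bufLen >= 1 both functions return the same offset, so the difference only shows up in the bufLen == 0 case.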
(and there was much rejoicing)
is that not a compiler bug, then?

nominating for mozilla1.0.1 since we finally know what's going on and have a fix.
Keywords: mozilla1.0.1
Can I have review for this fix.
Keywords: review
Shanmugavelu: can you explain how your patch solves this crash?  is bufLen
sometimes zero?  is that the problem?
Yes. The problem is that bufLen in the expression
if (!mPartChannel || !(cursor[bufLen-1] == nsCRT::LF) )
is 0. bufLen is defined as a PRUint32.

(ladebug) p bufLen
0
(ladebug) whatis bufLen
PRUint32 bufLen
then the question becomes: "is cursor[-1] valid?"

if not, then we need to protect against evaluating this expression when bufLen == 0.
Attached file testcase
proper testcase output:
11ffff814 11ffff814 11ffff814 11ffff814
actual testcase output:
11ffff814 51ffff814 11ffff814 11ffff814
(that's with cc, I don't have c++ on OSF)

this bug also bites on Alpha-Linux with C, but not C++ (at least for me,
gcc-2.96-87).  Blizzard might be using an older compiler to make RPMs.  dunno.

also, if bufLen is positive, the bug doesn't bite.
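
The attached testcase is not reproduced in this report, so the following is only a sketch of arithmetic that would be consistent with the two outputs above. The odd value 51ffff814 differs from 11ffff814 by 0x400000000, which is 4 x 2^32. That is what one would expect if a 4-byte element were indexed with a zero-extended 0xFFFFFFFF instead of -1. The base address 0x11ffff818, the element size of 4, and the function name are assumptions made for illustration.

```cpp
#include <cstdint>

// Address of element `index` in an array of `elemSize`-byte elements that
// starts at `base`, computed in 64-bit modular arithmetic so that negative
// indexes behave like pointer arithmetic on a 64-bit machine.
std::uint64_t ElementAddress(std::uint64_t base, std::int64_t index,
                             std::uint64_t elemSize) {
    return base + static_cast<std::uint64_t>(index) * elemSize;
}
```

Under these assumptions, ElementAddress(0x11ffff818, -1, 4) gives 0x11ffff814 and ElementAddress(0x11ffff818, 0xFFFFFFFF, 4) gives 0x51ffff814.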
actually, I can get the bad behavior in c++ with gcc-2.96-101 (so Blizzard is
actually using a newer compiler).  It seems like this might have been caused (on
Linux) by fixing Redhat bug 58746:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=58746

For all I know, this is proper behavior in 64-bit land (since it happens on OSF
and Linux), although it then ought to show up on Solaris/Sun.

adding 64bit keyword.
Keywords: 64bit
dereferencing memory you don't own is never valid.  simply casting bufLen - 1 to
a signed integer is not a solution.  we need to understand how it is that bufLen
can be 0 since the original author clearly didn't think that was possible.  if
we decide that it is rightly possible, then the code needs to avoid bufLen - 1
when bufLen == 0, else we need to fix the code that is leading to bufLen == 0.
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: Future → mozilla1.0.1
I also see bufLen==0 on i686, but it just doesn't crash.
Here's what happens during the bad pass through OnDataAvailable

buffer="Set-Cookie: LASTORDER=bugs.bug_id ; path=/; expires=Sun, 30-Jun-2029
00:00:00 GMT\nSet-Cookie: BUGLIST=\n\n"

mFirstOnData is false
mProcessingHeaders is true
on return from ParseHeaders, bufLen is 0 and done is true, so mProcessingHeaders
is set to false.

because mProcessingHeaders is false, it does:
      if (!mPartChannel || !(cursor[bufLen-1] == nsCRT::LF) )
            bufAmt = PR_MIN(mTokenLen - 1, bufLen);

which causes the crash.  note that cursor[-1] is actually part of buffer it had
allocated earlier (it is not dereferencing memory it doesn't own).
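
The pass described above can be modelled with a small sketch. ConsumeHeaders is a hypothetical, heavily simplified stand-in for header parsing (it is not the real ParseHeaders): it treats the first blank line as the end of the header block and handles only bare LF line endings. Fed the buffer quoted above, it consumes every byte, which leaves zero bytes for the body check. That is the state in which bufLen - 1 wraps.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for header parsing: returns how many bytes are
// consumed up to and including the blank line that ends the header block,
// or 0 if the block is not complete yet.
std::size_t ConsumeHeaders(const std::string& buffer) {
    const std::size_t pos = buffer.find("\n\n");
    if (pos == std::string::npos) {
        return 0;
    }
    return pos + 2;
}

// Bytes left in the buffer after the headers have been consumed.
std::size_t RemainingAfterHeaders(const std::string& buffer) {
    return buffer.size() - ConsumeHeaders(buffer);
}
```

With the buffer from this comment, the remaining length is 0 and the last consumed byte is the LF that ends the blank line, which matches the observation that cursor[-1] still lies inside the allocated buffer.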
Attached patch alternate fix
this isn't going to make 1.0.1 ...

-> 1.2alpha
Target Milestone: mozilla1.0.1 → mozilla1.2alpha
cc'ing valeski, who seems to have written most of the relevant code here
(according to CVS blame)

valeski: could you explain the desired/expected behavior in the particular case
here where it crashes?  thanks.
*** Bug 159619 has been marked as a duplicate of this bug. ***
from bug 159619: also crashes IA64
Hardware: DEC → All
Comment on attachment 86721 [details] [diff] [review]
alternate fix

sr=darin

this patch looks very good.  bufAmt shouldn't change from zero if bufLen is
zero, so adding this check definitely doesn't change the intended logic of the
block.
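
The attached patch itself is not quoted in this report, so the following is only a sketch of the kind of guard the review describes, not the actual change. ComputeBufAmt and its parameters are hypothetical stand-ins for the member variables in nsMultiMixedConv::OnDataAvailable, and '\n' stands in for nsCRT::LF. The point it illustrates is that when bufLen is zero, bufAmt stays zero and cursor[bufLen - 1] is never evaluated.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the block under review. Assumes tokenLen >= 1,
// as the original expression mTokenLen - 1 implicitly does.
std::uint32_t ComputeBufAmt(bool havePartChannel, const char* cursor,
                            std::uint32_t bufLen, std::uint32_t tokenLen) {
    std::uint32_t bufAmt = 0;
    if (bufLen == 0) {
        // Nothing is buffered, so there is no last byte to inspect.
        return bufAmt;
    }
    if (!havePartChannel || cursor[bufLen - 1] != '\n') {
        bufAmt = std::min<std::uint32_t>(tokenLen - 1, bufLen);
    }
    return bufAmt;
}
```

For non-empty input the result is the same as the unguarded expression would give, which is consistent with the review's remark that the check does not change the intended logic.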
Attachment #86721 - Flags: superreview+
-> going to shoot for getting this into both 1.1 and 1.0.1
Priority: P3 → P1
Target Milestone: mozilla1.2alpha → mozilla1.1beta
Comment on attachment 86721 [details] [diff] [review]
alternate fix

r=blizzard
Attachment #86721 - Flags: review+
Comment on attachment 86721 [details] [diff] [review]
alternate fix

a=brendan@mozilla.org for trunk and branch.

/be
Attachment #86721 - Flags: approval+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: [adt1 RTM]
-> mozilla1.0.1

waiting for ADT approval.
Target Milestone: mozilla1.1beta → mozilla1.0.1
Lowering to adt2 since it appears to only affect 64bit machines.
Whiteboard: [adt1 RTM] → [adt2 RTM]
can someone with a 64-bit system confirm that this patch makes bugzilla usable
again?  i don't think tever has access to such a machine, and we really need to
get this verified ASAP.  thx!
Tested this on a Tru64 UNIX system. Works fine.
marking VERIFIED per previous comment.
Status: RESOLVED → VERIFIED
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch. pls check
this in asap, then replace the "mozilla1.0.1+" with "fixed1.0.1". thanks!
Blocks: 143047
Keywords: adt1.0.1 → adt1.0.1+
Whiteboard: [adt2 RTM] → [adt2 RTM] [ETA 07/31]
fixed1.0.1
*** Bug 162446 has been marked as a duplicate of this bug. ***