Bug 199021 - [FIXr]Trunk M140A crash [@ imgRequestProxy::OnStartDecode]
Status: VERIFIED FIXED
Keywords: crash, topcrash
Product: Core
Classification: Components
Component: ImageLib
Version: Trunk
Platform: x86 Windows XP
Importance: P1 critical
Target Milestone: mozilla1.4beta
Assigned To: Boris Zbarsky [:bz] (TPAC)
QA Contact: Terri Preston
Duplicates: 199415 200842 201846
Blocks: 83774
Reported: 2003-03-24 13:06 PST by Jay Patel [:jay]
Modified: 2003-04-13 08:07 PDT
asa: blocking1.4a-


Attachments
my startup page (2.48 KB, text/html)
2003-03-30 03:48 PST, Henrik Gemal
fix for the crash (3.63 KB, patch)
2003-04-01 08:59 PST, Boris Zbarsky [:bz] (TPAC)
pavlov: review+
dbaron: superreview+

Description Jay Patel [:jay] 2003-03-24 13:06:31 PST
This might not be a topcrasher, but I'm going to make it one for now.  This
crash started with 3/22 MozillaTrunk builds.  Although there are a lot of
crashes reported, it looks like it is just 1 user crashing.

Any idea what might be causing him/her to crash here?  Here is the latest from
Talkback:

Count   Offset    Real Signature
[ 33   imgRequestProxy::OnStartDecode 6293b964 - imgRequestProxy::OnStartDecode ]
 
     Crash date range: 2003-03-23 to 2003-03-23
     Min/Max Seconds since last crash: 2 - 503
     Min/Max Runtime: 3 - 14019
 
     Count   Platform List 
     33   Windows NT 5.1 build 2600
 
     Count   Build Id List 
     32   2003032208
     1   2003032308
 
     No of Unique Users         1
 
 Stack trace(Frame) 

	 imgRequestProxy::OnStartDecode
[c:/builds/seamonkey/mozilla/modules/libpr0n/src/imgRequestProxy.cpp  line 315]
	 imgRequest::OnStartDecode
[c:/builds/seamonkey/mozilla/modules/libpr0n/src/imgRequest.cpp  line 371]
	 nsGIFDecoder2::BeginGIF
[c:/builds/seamonkey/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp  line 243]
	 gif_write
[c:/builds/seamonkey/mozilla/modules/libpr0n/decoders/gif/GIF2.cpp  line 622]
	 nsGIFDecoder2::ProcessData
[c:/builds/seamonkey/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp  line 196]
	 ReadDataOut
[c:/builds/seamonkey/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp  line 138]
	 nsPipeInputStream::ReadSegments
[c:/builds/seamonkey/mozilla/xpcom/io/nsPipe3.cpp  line 719]
	 nsGIFDecoder2::WriteFrom
[c:/builds/seamonkey/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp  line 216]
	 ReadDataOut
[c:/builds/seamonkey/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp  line 135]
	 nsCOMPtr_base::~nsCOMPtr_base
[c:/builds/seamonkey/mozilla/xpcom/glue/nsCOMPtr.cpp  line 66]
	 ProxyListener::OnStartRequest
[c:/builds/seamonkey/mozilla/modules/libpr0n/src/imgLoader.cpp  line 852]
	 ProxyListener::OnDataAvailable
[c:/builds/seamonkey/mozilla/modules/libpr0n/src/imgLoader.cpp  line 872]
	 nsFileChannel::OnDataAvailable
[c:/builds/seamonkey/mozilla/netwerk/protocol/file/src/nsFileChannel.cpp  line 588]
	 nsInputStreamPump::OnStateTransfer
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsInputStreamPump.cpp  line 409]
	 nsInputStreamPump::OnInputStreamReady
[c:/builds/seamonkey/mozilla/netwerk/base/src/nsInputStreamPump.cpp  line 322]
	 nsInputStreamReadyEvent::EventHandler
[c:/builds/seamonkey/mozilla/xpcom/io/nsStreamUtils.cpp  line 112]
	 PL_HandleEvent
[c:/builds/seamonkey/mozilla/xpcom/threads/plevent.c  line 664]
	 PL_ProcessPendingEvents
[c:/builds/seamonkey/mozilla/xpcom/threads/plevent.c  line 597]
	 _md_EventReceiverProc
[c:/builds/seamonkey/mozilla/xpcom/threads/plevent.c  line 1385]
	 USER32.dll + 0x4455 (0x77d44455)
	 USER32.dll + 0x95d5 (0x77d495d5)
	 nsAppShellService::Run
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp  line 480]
	 main1
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp  line 1286]
	 main
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp  line 1644]
	 WinMain
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp  line 1665]
	 WinMainCRTStartup()
	 kernel32.dll + 0x214c7 (0x77e814c7)
Comment 1 Christian :Biesinger (don't email me, ping me on IRC) 2003-03-24 14:09:22 PST
  if (mListener) {
    // Hold a ref to the listener while we call it, just in case.
    nsCOMPtr<imgIDecoderObserver> kungFuDeathGrip(mListener);
    mListener->OnStartDecode(this); <--- This is line 315

Would it be possible to find out what the pointer pointed to at the time of the
crash?
Comment 2 Christian :Biesinger (don't email me, ping me on IRC) 2003-03-24 14:11:34 PST
could maybe be caused by Bug 196797 which made mListener a weak reference.
cc'ing bz.
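
To illustrate the concern above: a death grip only helps if the listener is still alive when the grip is taken. With a raw weak pointer there is no way to tell. The sketch below is not the libpr0n code. It uses standard C++ std::shared_ptr and std::weak_ptr in place of XPCOM's nsCOMPtr, and Observer and Proxy are hypothetical stand-ins. It shows a weak reference that can be checked before use, so a dead listener is skipped instead of called.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for imgIDecoderObserver.
struct Observer {
  int startDecodeCalls = 0;
  void OnStartDecode() { ++startDecodeCalls; }
};

// Hypothetical stand-in for imgRequestProxy, holding a checked weak reference.
struct Proxy {
  std::weak_ptr<Observer> mListener;

  // Returns true if a live listener was notified.
  bool OnStartDecode() {
    // lock() yields a strong reference only if the observer is still alive;
    // that strong reference then acts as the death grip for the call.
    std::shared_ptr<Observer> kungFuDeathGrip = mListener.lock();
    if (!kungFuDeathGrip) {
      return false;  // listener already destroyed: nothing to call
    }
    kungFuDeathGrip->OnStartDecode();
    return true;
  }
};
```

With a plain raw pointer, the equivalent of the second call (after the observer is gone) would be a call through freed memory, which is the kind of crash reported here.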
Comment 3 Jay Patel [:jay] 2003-03-24 14:28:05 PST
Some register info from detailed Talkback report (internal folks can go to
http://climate.netscape.com/reports/incidenttemplate.cfm?bbid=18390251):

Registers:
EAX:	02199568 	EBX:	00000000 	ECX:	67616d69 	EDX:	00000000
ESI:	02199c20 	EDI:	02199858 	ESP:	0012fff8 	EBP:	00000000
EIP:	00000000 	cf pf af zf sf of IF df nt RF vm   IOPL: 0
CS:	001b	DS:	0023	SS:	0023	ES:	0023	FS:	0038	GS:	0000
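
One thing that can be checked from the registers alone is whether a value looks like text. The sketch below (the helper name RegisterAsAscii is made up for this note) decodes a 32-bit value as four bytes in x86 little-endian memory order. Applied to the ECX value above it yields "imag", which might hint that memory which once held an object was later reused for string data. That is only a guess without the disassembly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Interpret a 32-bit register value as four bytes in little-endian memory
// order (x86), replacing non-printable bytes with '.'.
inline std::string RegisterAsAscii(std::uint32_t value) {
  std::string out;
  for (int i = 0; i < 4; ++i) {
    const unsigned char byte =
        static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    out += (byte >= 0x20 && byte < 0x7F) ? static_cast<char>(byte) : '.';
  }
  return out;
}
```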
Comment 4 David Baron :dbaron: ⌚️UTC+1 (busy September 14-25) 2003-03-24 14:48:21 PST
The register info isn't useful without the disassembly.
Comment 5 Boris Zbarsky [:bz] (TPAC) 2003-03-24 15:03:51 PST
This is almost certainly a result of the weak ref change.  Interesting thing 
is, what changed on the 22nd to cause this?  Or is that when this one person 
started seeing it?

Do we have any idea what URLs (s)he's crashing on?
Comment 6 Jay Patel [:jay] 2003-03-24 17:17:37 PST
It does look like just one person...and we don't have any user comments or urls
for what (s)he might have been doing at the time of the crash.
Comment 7 David Baron :dbaron: ⌚️UTC+1 (busy September 14-25) 2003-03-26 06:34:32 PST
This doesn't look like just one person anymore.  There are five different OS
versions in the OS fields.

The only user comment among the 115 reports so far is:
(18470728) Comments: starting mozilla
Comment 8 Boris Zbarsky [:bz] (TPAC) 2003-03-28 09:17:36 PST
This sounds like someone is not clearing the mListener properly when the 
listener is destroyed....  There is an assert in place for cases when that 
happens, and I _thought_ I'd fixed all the places that triggered it....  :(
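
A minimal sketch of the invariant described above, assuming a raw non-owning back pointer: whoever owns the listener has to clear the proxy's mListener before the listener goes away. The Proxy and Listener types here are hypothetical stand-ins, not the real imgRequestProxy or frame classes, and the sketch assumes the proxy outlives the listener.

```cpp
#include <cassert>

struct Listener;  // hypothetical stand-in for the decoder observer

// Hypothetical stand-in for imgRequestProxy with a raw, non-owning pointer.
struct Proxy {
  Listener* mListener = nullptr;
  bool OnStartDecode();
};

struct Listener {
  explicit Listener(Proxy& proxy) : mProxy(proxy) { mProxy.mListener = this; }

  // Clear the proxy's weak pointer on destruction so it never dangles.
  ~Listener() {
    if (mProxy.mListener == this) {
      mProxy.mListener = nullptr;
    }
  }

  Listener(const Listener&) = delete;
  Listener& operator=(const Listener&) = delete;

  void OnStartDecode() { ++mCalls; }
  int mCalls = 0;

 private:
  Proxy& mProxy;  // assumed to outlive this listener in the sketch
};

// Returns true if a listener was present and notified.
inline bool Proxy::OnStartDecode() {
  if (!mListener) {
    return false;
  }
  mListener->OnStartDecode();
  return true;
}
```

If any code path destroys the listener without running the clearing step, the null check in OnStartDecode passes on a stale pointer and the call goes into freed memory.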
Comment 9 Jay Patel [:jay] 2003-03-28 13:44:01 PST
*** Bug 199415 has been marked as a duplicate of this bug. ***
Comment 10 Boris Zbarsky [:bz] (TPAC) 2003-03-29 22:06:42 PST
Taking so I keep track of this....
Comment 11 Henrik Gemal 2003-03-30 03:46:18 PST
I keep crashing when I start Mozilla. I crash in 9 out of 10 starts.
I'm on WinXP using the latest nightly builds.
My start page is a local page (file://) that contains some images, etc.
Comment 12 Henrik Gemal 2003-03-30 03:48:50 PST
Created attachment 118893
my startup page

Save it and set it as the default start page, then start Mozilla.
Comment 13 Boris Zbarsky [:bz] (TPAC) 2003-03-30 06:59:42 PST
Henrik, thanks for the testcase!  I'll try it tonight.
Comment 14 Boris Zbarsky [:bz] (TPAC) 2003-03-31 06:33:41 PST
I've managed to reproduce this once, but that's it.... (with the startup page
testcase).  Henrik, do you see it consistently?  If so, could you possibly
reduce the HTML as much as possible to the bare minimum testcase?
Comment 15 Henrik Gemal 2003-03-31 08:30:23 PST
It also seems to be happening in Phoenix.

I narrowed it down to something like:

<img src="">
<img src="http://cluster.chart.dk/chart.asp?id=79683&amp;style=7">

If I change <img src=""> into a valid image URL like:
<img src="http://gemal.dk/pics/gemaldk.png">
it loads ok!

It seems that any combination of an empty src="" followed by another img crashes.
This also crashes:
<img src="">
<img src="http://gemal.dk/pics/gemaldk.png">

Strangely enough it seems to crash more often if the TalkBack client isn't
running in the background. Must be something else...:)

I'm using Proxomitron as a proxy.

Hope this is enough info. Let me know if I can help or test something...
Comment 16 Henrik Gemal 2003-03-31 08:35:55 PST
I also crash if I'm not using a proxy...:(

crashing in 9 out of 10 starts

Crash without proxy:
TB18649389W
Comment 17 Jim Dunn 2003-03-31 13:15:41 PST
With my debug build I am crashing semi-randomly using the test
http://home.maine.rr.com/jimdunn/199021.html
which is basically what Henrik mentioned.  I say semi-randomly because
I saw the crash the first time I went to the site and then
couldn't recreate it no matter how many times I cleared the cache
and/or history.  However, when I rebuilt one of the mozilla
components (libpr0n) and re-ran, I crashed immediately in 
    void imgRequestProxy::OnStartDecode() on
        mListener->OnStartDecode(this);

Not sure if this helps.
Comment 18 Boris Zbarsky [:bz] (TPAC) 2003-03-31 14:46:30 PST
I don't get it.  :(  src="" should do absolutely nothing (it's totally
equivalent to having no src attribute at all as far as the image loading code is
concerned).  It should not cause any calls into libpr0n....

I saw exactly what jdunn did -- I could repro once with any given build, then it
would never crash.  I tried touching nsGIFDecoder2.cpp and rebuilding
modules/libpr0n, but that did not bring the crash back....

jdunn, any specific steps to get the crash to reappear?
Comment 19 Jim Dunn 2003-03-31 15:23:05 PST
I have seen this once or twice since, but I've tried everything to get a consistent
recurrence and can't.  I will continue to play with it, but not sure
if I will find anything else.
Comment 20 Boris Zbarsky [:bz] (TPAC) 2003-03-31 15:40:20 PST
Yeah... part of the issue is that if we're jumping into some random spot, we may
well not crash offhand... anyone know of any tools I could use that would catch
such accesses to deleted memory 100% of the time?
Comment 21 Henrik Gemal 2003-03-31 22:40:24 PST
remember you have to test it like this:

download the testcase to a local file
set mozilla to start with the local file
Comment 22 Boris Zbarsky [:bz] (TPAC) 2003-03-31 22:42:13 PST
That's exactly how I've been testing....
Comment 23 Henrik Gemal 2003-03-31 22:57:09 PST
my start file is now just:

<img src="">
<img src="http://gemal.dk/pics/gemaldk.png">

and this crashes moz build 2003033108

are you running debug?

If I start mozilla mail first and then a browser window I *never* crash.

What else could be different?

I'm also experiencing it on Win2k on a different machine.

It says that the crash is in imgRequestProxy. Does that have anything to do with
using a proxy? I can also reproduce it without a proxy.
Comment 24 Henrik Gemal 2003-03-31 23:07:01 PST
OK... things are now getting weird. I'm using McAfee VirusScan. If I exit McAfee
VirusScan I don't crash!

I'm using McAfee VirusScan 4.5.1 SP1
VirusDefs: 4.0.4254
Scan Engine: 4.2.40

I'm running McAfee VirusScan on both of the PCs where I crash.

Any clue why this is happening?
Comment 25 Christian :Biesinger (don't email me, ping me on IRC) 2003-04-01 05:19:40 PST
bz: try valgrind maybe? http://developer.kde.org/~sewardj/
Comment 26 Boris Zbarsky [:bz] (TPAC) 2003-04-01 07:08:29 PST
Henrik, I've tried reproducing it with mozilla nightlies, my opt self-build, and
three different debug self-builds (with slightly different options).  In each
case, I got it to happen once and only once...  imgRequestProxy has nothing to
do with real proxies.  Any idea why virus scan is affecting it?  Are there any
settings for the virus scanner you can fiddle with to attempt to narrow down the
problem?

Biesi, I'll try to give it a shot... I'm actually more interested in something
that will cause a SIGSEGV 100% of the time when accessing deleted memory.  That
would allow me to reproduce the crash reliably, I'd hope...
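
One way to get that kind of guaranteed fault is a debug allocator that gives each allocation its own page and, on free, revokes access to the page instead of returning it. This is roughly the idea behind page-protecting debug allocators such as Electric Fence. The sketch below is an assumption-laden illustration for Linux/POSIX only: GuardedBlock, GuardedAlloc and GuardedFree are hypothetical names, and nothing here is hooked into Mozilla's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical debug-only allocation record: pointer plus mapped size.
struct GuardedBlock {
  void* ptr;
  std::size_t size;
};

// Give every allocation its own page-aligned mapping.
inline GuardedBlock GuardedAlloc(std::size_t bytes) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t size = ((bytes + page - 1) / page) * page;
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    return GuardedBlock{nullptr, 0};
  }
  return GuardedBlock{p, size};
}

// "Free" by revoking all access rather than unmapping. The address range is
// never handed out again, so any later read or write through a stale pointer
// hits a PROT_NONE page and faults.
inline bool GuardedFree(const GuardedBlock& block) {
  return block.ptr != nullptr &&
         mprotect(block.ptr, block.size, PROT_NONE) == 0;
}
```

The cost is at least one page per allocation and an address space that only grows, so this is only practical for a targeted debugging run.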
Comment 27 Henrik Gemal 2003-04-01 07:16:25 PST
If I have the virus scanner running I seem to crash more often. But I also crash
without it running.
Currently I'm testing an XPI package so I restart a lot. Perhaps installing an
XPI package could have something to do with it? Who knows...:(

will try to test some more
Comment 28 Boris Zbarsky [:bz] (TPAC) 2003-04-01 08:59:22 PST
Created attachment 119071
fix for the crash

I finally managed to reproduce this consistently...  Should have known
something like this was up when we were crashing in the _GIF_ decoder while
loading a _PNG_ image.	;)

So the problem here is that the "broken" image starts the icon loads, passing
in its mListener as the observer, then immediately blows away the frame to
replace itself with alt text, killing the listener in the process.

But the second image has started to load by that point, looks like, so the
iconload does not die, and the icon load requests use a bogus listener
pointer....

The whole thing is highly timing-dependent, which is why it's a little hard to
reproduce.  I'd suspect the virus scanner just affects some timing somewhere
and makes the problem more likely to happen.

This patch is basically wallpaper -- the real fix would be to have the iconload
itself observing the load and to have the images register with the iconload to
be notified of stuff... but I'd rather do that as part of a more thorough
rewrite of the iconload business.
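
A rough sketch of what that "real fix" could look like, under stated assumptions: the icon loader is the only observer of the icon request, and frames register and unregister with it. IconLoad, Frame, AddFrame and RemoveFrame are hypothetical names for illustration and do not reflect the attached patch or the actual layout code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an image frame that wants icon notifications.
struct Frame {
  int notifications = 0;
  void OnIconLoaded() { ++notifications; }
};

// Hypothetical shared icon loader. It alone observes the icon request, so the
// image library never holds a pointer to a short-lived frame.
class IconLoad {
 public:
  void AddFrame(Frame* frame) { mFrames.push_back(frame); }

  // A frame must call this before it is destroyed.
  void RemoveFrame(Frame* frame) {
    mFrames.erase(std::remove(mFrames.begin(), mFrames.end(), frame),
                  mFrames.end());
  }

  // Called when the icon request makes progress; forwards to the frames
  // that are still registered. In this sketch frames must not register or
  // unregister from inside the callback.
  void OnStartDecode() {
    for (Frame* frame : mFrames) {
      frame->OnIconLoaded();
    }
  }

  std::size_t FrameCount() const { return mFrames.size(); }

 private:
  std::vector<Frame*> mFrames;  // non-owning; frames unregister themselves
};
```

In the scenario described above, the "broken" image would unregister itself when its frame is replaced by alt text, and the still-running icon load would simply have one fewer frame to notify.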
Comment 29 David Baron :dbaron: ⌚️UTC+1 (busy September 14-25) 2003-04-01 09:07:11 PST
Comment on attachment 119071
fix for the crash

sr=dbaron, but I'd rather have someone more familiar with this code review.
Comment 30 Boris Zbarsky [:bz] (TPAC) 2003-04-01 09:13:19 PST
Comment on attachment 119071
fix for the crash

Seeing as attinasi is not here... pav, could you review?
Comment 31 Boris Zbarsky [:bz] (TPAC) 2003-04-01 17:35:27 PST
Fix checked in for 1.4b.
Comment 32 Jay Patel [:jay] 2003-04-07 14:54:41 PDT
Adding M140A to summary for future reference.  Although this is fixed on the
MozillaTrunk, it did not make it into Mozilla 1.4 alpha.

According to Talkback, there have been 0 crashes since the checkin.  Marking
verified.
Comment 33 Jim Dunn 2003-04-10 08:01:32 PDT
*** Bug 200842 has been marked as a duplicate of this bug. ***
Comment 34 Boris Zbarsky [:bz] (TPAC) 2003-04-13 08:07:15 PDT
*** Bug 201846 has been marked as a duplicate of this bug. ***
