81118 - Crash in nsImageListener::FrameChanged

Reporter

Description

•

23 years ago

The recent orange on the coffee tinderbox is jrgm's page-load test dying (on the
eighth page, or so, in the test). I duplicated this in my (tip) debug build, and
got the following stack trace:

#0  0x419565b6 in nsImageListener::FrameChanged (this=0x88525b8,
aContainer=0x8b084d8, aContext=0x87b77f0, newframe=0x8a17990,
dirtyRect=0xbffff370) at nsImageFrame.cpp:1638
#1  0x4179a371 in imgRequestProxy::FrameChanged (this=0x88526d8,
container=0x8b084d8, cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at
imgRequestProxy.cpp:260
#2  0x41797a7f in imgRequest::FrameChanged (this=0x8852b78, container=0x8b084d8,
cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at imgRequest.cpp:375
#3  0x4179378c in imgContainer::Notify (this=0x8b084d8, timer=0x8789a50) at
imgContainer.cpp:439
#4  0x4155e370 in nsTimerGtk::FireTimeout (this=0x8789a50) at nsTimerGtk.cpp:186
[and so forth...]

It looks as though |this| is getting clobbered in memory, maybe.

I'm not certain, but I think it might be related to pavlov's May 3rd fix for
78015 combined with mstoltz's enabling checkin for 78831.

Other folks on the hook (just looking brainlessly at tinderbox and seeing when
solid orange started) are waterson, hewitt, danm, yokoyama, ddrinan, and nhotta
(cc'ing).

Dan Rosen

Reporter

Comment 1

•

23 years ago

More info: it's dying on the "My Compuserve" page. I'll attach a full stack
trace and some variables in a moment. Setting as smoketest blocker.

Severity: critical → blocker

Keywords: crash, smoketest

Dan Rosen

Reporter

Comment 2

•

23 years ago

Attached file full stacktrace, data in memory at |this| — Details

Chris Waterson

Assignee

Comment 3

•

23 years ago

This happens on windows, too, but it looks an awful lot like a dependency
problem. (On one side of the call, the object looks fine; on the other side, it
thinks the mRefCnt slot is the mFrame slot.) I'm clobbering now to see if it
goes away. dr: did you start with a clean tree?

John Morrison

Comment 4

•

23 years ago

Maybe we need "yet another" clobber of coffee ... mcafee?

Dan Rosen

Reporter

Comment 5

•

23 years ago

waterson: My debug build has one change in it, in nsCSSFrameConstructor's
internal logic. I think that isn't affecting things for me -- I think I'm seeing
the same problem as is being seen on coffee. It is a depend build, though, not
clobber, if that's useful info...

If this is happening on windows, is the only reason why the windows tinderboxen
are green instead of orange that they don't run the page loader tests?

Platform, OS -> All

URL: http://jrgm.mcom.com/page-loader/load...

OS: Linux → All

Hardware: PC → All

John Morrison

Comment 6

•

23 years ago

They don't run the page load test. In fact, I don't think they (currently) run
any tests. So, as long as they compile ... green.

R.K.Aa.

Comment 7

•

23 years ago

RH7.1 w/gcc2.96-81
slightly different backtrace, the beginning looks like this:

#0  0x00000029 in __strtol_internal (nptr=0x918f5b4 "\200pLA", endptr=0x8965eb8,
base=-1073746288, group=0) at eval.c:40
#1  0x4139e9fa in nsImageFrame::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libgklayout.so
#2  0x413a12b8 in nsImageListener::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libgklayout.so
#3  0x41326271 in imgRequestProxy::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libimglib2.so

etc.

Dan Rosen

Reporter

Comment 8

•

23 years ago

Oh, I suppose I should mention, I'm using gcc-2.96-81 and a snapshot gdb
(insight) 20010427.

Can anybody tell whether either of the patches for bug 78015 or bug 78831 might
be the culprit? Or the checkins just before we went orange on coffee? I still
can't see who this should belong to.

Dan Rosen

Reporter

Comment 9

•

23 years ago

Looking more closely at:

http://bonsai.mozilla.org/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=989949120&maxdate=989952659

(which is the cycle when coffee went orange), I can't see how any of these would
have caused this problem. Removing some cc's of the obviously innocent. The one
checkin at that time which confuses me is yokoyama's. The checkin comment
doesn't have what seems to be the right bug number, so I'm just having trouble
deciphering this. It probably doesn't have anything to do with this either,
though...

Maybe the fact that it's dying on the "My Compuserve" page is useful
information? Dunno.

Stuart Parmenter

Comment 10

•

23 years ago

I havn't checked anything in that would have just started causing this at all to
this.  Start looking at the other checkins.

Stuart Parmenter

Comment 11

•

23 years ago

dr: the patches you are talking about have been in the tree for a long time.  I
really don't think they are coming back to bite us in the ass right now.

Whiteboard: NOTMYBUG

John Morrison

Comment 12

•

23 years ago

1) I did a full clobber build on win2k; still crashing at the same point

2) I looked at the content, and after experimenting a bit, it turns out
   that it depends on the page that it is leaving, the "static" copy being
   http://jrgm.mcom.com/perf/loadtime5/base/web.icq.com/index.html     

3) But going to the static copy is not enough, it's only when the page 
   loads are changed together that I can duplicate this crash. (Some
   (perhaps uncancelled) timer issue?)

4) I noticed the the first interval of orange was crashing on page "38",
   but that consistently this was crashing on page "8" since then. So, 
   making an assumption that maybe the first crash was '**** happens', 
   then maybe the current crash was due to checkins in the second interval.
   I tried backing out this checkin from waterson,
     cvs update -j3.431 -j3.430 mozilla/layout/html/base/src/nsBlockFrame.cpp
   and, for reasons that I cannot explain that fixes this crash. I tried 
   backing out and then reapplying twice each, and I crash everytime with in 
   applied, and never crash with it backed out. Could someone try this on Linux 
   with a current build (do we even know if this is happening on Mac). 

Removing the "NOTMYBUG" but leaving with pavlov, since he's coming back from 
Denny's in a bit I think, and perhaps he can see if this fixes the crash for
him.

To save time in debugging this (and show that it is the page exited 
that is driving the crash), I set up a simple test at 
  http://jrgm.mcom.com/yapl-crash/index.html
that loops through only three pages, including compuserve and icq, but 
in a different order (msnbc->compuserve->icq->msnbc...). It will crash
as msnbc is shown for a second time.

Whiteboard: NOTMYBUG

Stuart Parmenter

Comment 13

•

23 years ago

I don't have any tip builds right now (that arn't ripped apart).  reassigning 
to waterson.  let me know if i can help

Assignee: pavlov → waterson

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 14

•

23 years ago

This seems like it's probably the same as bug 80203, which has a lot of gdb
output but doesn't seem to be well-owned right now.  The exact location of the
crash varied a little on that bug, although some of the stacks here are another
stack frame up from anything noted there.

Chris Waterson

Assignee

Comment 15

•

23 years ago

Back to pavlov. The problem is that the imgRequestProxy::Cancel() method isn't 
being called, its mListener pointing to an nsImageListener on a dead frame. This 
really isn't my bug, pav. Backatcha.

Assignee: waterson → pavlov

Chris Waterson

Assignee

Comment 16

•

23 years ago

Allright, taking this back. Looks like my checkins for bug 43914 caused this 
regression. Backing those out fixes the crash.

Assignee: pavlov → waterson

Chris Waterson

Assignee

Comment 17

•

23 years ago

Backed out; added dependency to 43914.

Blocks: 43914

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Dan Rosen

Reporter

Comment 18

•

23 years ago

waterson: is this really fixed? why are we still seeing orange in tinderbox?

Tracy Walker [:tracy]

Comment 19

•

23 years ago

I ran jrgms loadtime test on the linux build 2001-05-16-12-trunk and mac 
2001-05-16-12-trunk.  It worked fine on both...cruised right on through icq.com.

Status: RESOLVED → VERIFIED

Christopher Blizzard (:blizzard)

Comment 20

•

23 years ago

This isn't fixed.  I just crashed with this stack trace which includes your fixes:

#4  <signal handler called>
#5  0x4181455c in nsFrame::Invalidate (this=0x8e81f8c, aPresContext=0x8df70b0, 
    aDamageRect=@0xbffff250, aImmediate=0) at nsFrame.cpp:2197
#6  0x418225b6 in nsImageFrame::FrameChanged (this=0x8e81f8c, 
    aContainer=0x8e83210, aPresContext=0x8df70b0, aNewFrame=0x8e894b8, 
    aDirtyRect=0xbffff3a0) at nsImageFrame.cpp:430
#7  0x41824eec in nsImageListener::FrameChanged (this=0x8e868a8, 
    aContainer=0x8e83210, aContext=0x8df70b0, newframe=0x8e894b8, 
    dirtyRect=0xbffff3a0) at ../../../../dist/include/nsCOMPtr.h:642
#8  0x410b85b1 in imgRequestProxy::FrameChanged (this=0x8e84b38, 
    container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0)
    at imgRequestProxy.cpp:260
#9  0x410b6fb0 in imgRequest::FrameChanged (this=0x8e83050, 
    container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0)
    at imgRequest.cpp:375

Status: VERIFIED → REOPENED

Resolution: FIXED → ---

Dan Rosen

Reporter

Comment 21

•

23 years ago

Didn't think so <grumble> ... Pavlov, I know you checked in a week or so ago,
but it looks like mstoltz's checkin to nsImageFrame.cpp (rev 1.170) yesterday
activated your code. That's my reason for suspecting that your checkin might
conceivably be biting us in the ass only now. (OTOH, I haven't looked at the
code, sooo...)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

•

23 years ago

blizzard:  I think that's bug 80203.  (Some of us have been seeing that stack
for days now.  In fact, I think it's all bug 80203 and waterson's changes
somehow mysteriously triggered that bug on the page in jrgm's load tests.  Or
something like that...)

Christopher Blizzard (:blizzard)

Comment 23

•

23 years ago

I suspect you are correct.  OK, what is the real fix here?  Who needs to own
this and get it fixed?

Chris Waterson

Assignee

Comment 24

•

23 years ago

Urgh. Still looking at this.

Chris McAfee

Comment 25

•

23 years ago

over to pavlov, after we put both pavlov and waterson in the ring.

Assignee: waterson → pavlov

Status: REOPENED → NEW

Stuart Parmenter

Comment 26

•

23 years ago

Ok, someone is freeing the frame but not calling its Destroy() method.  mFrame
gets nulled out in the imageframe's Destroy() method.... investigating checkins
to layout.

Gagan

Updated

•

23 years ago

Target Milestone: --- → mozilla0.9.1

Chris Waterson

Assignee

Comment 27

•

23 years ago

The image frame that is not being destroyed is a direct child of a <td> frame.

Chris Waterson

Assignee

Comment 28

•

23 years ago

...I see the <td>'s frame get destroyed, but not the img frame.

Chris McAfee

Comment 29

•

23 years ago

back over to waterson.

Assignee: pavlov → waterson

rbs

Comment 30

•

23 years ago

waterson wrote:
> The image frame that is not being destroyed is a direct child of a <td> frame.

That's an interesting clue. That shouldn't happen. It means that an assumption
is broken. The _direct and sole_ child of a <td> frame should be a block frame
(not an image frame or anything else). The table code relies on this assumption.
Unpredictable disaster may strike if the assumption is broken.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 31

•

23 years ago

Or did waterson mean that it was the nsBlockFrame that was the inner frame for a
TD element?  (When I was poking at bug 80203 in the debugger the I think the
parent was an nsBlockFrame.)

rbs

Comment 32

•

23 years ago

Is there a public URL which triggers this bug?

Chris Waterson

Assignee

Comment 33

•

23 years ago

I've investigated further. This is indeed due to my changes in bug 43914, where 
I'll attach a test case that illustrates the problem.

Status: NEW → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → FIXED

Stuart Parmenter

Comment 34

•

23 years ago

*** Bug 81063 has been marked as a duplicate of this bug. ***

Terri Preston

Comment 35

•

23 years ago

Verified fixed per email with jrgm, thanks for the help John!

Status: RESOLVED → VERIFIED