Crash in nsImageListener::FrameChanged

VERIFIED FIXED in mozilla0.9.1

Status

()

Core
ImageLib
--
blocker
VERIFIED FIXED
17 years ago
17 years ago

People

(Reporter: Dan Rosen, Assigned: Chris Waterson)

Tracking

({crash, smoketest})

Trunk
mozilla0.9.1
crash, smoketest
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment)

(Reporter)

Description

17 years ago
The recent orange on the coffee tinderbox is jrgm's page-load test dying (on the
eighth page, or so, in the test). I duplicated this in my (tip) debug build, and
got the following stack trace:

#0  0x419565b6 in nsImageListener::FrameChanged (this=0x88525b8,
aContainer=0x8b084d8, aContext=0x87b77f0, newframe=0x8a17990,
dirtyRect=0xbffff370) at nsImageFrame.cpp:1638
#1  0x4179a371 in imgRequestProxy::FrameChanged (this=0x88526d8,
container=0x8b084d8, cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at
imgRequestProxy.cpp:260
#2  0x41797a7f in imgRequest::FrameChanged (this=0x8852b78, container=0x8b084d8,
cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at imgRequest.cpp:375
#3  0x4179378c in imgContainer::Notify (this=0x8b084d8, timer=0x8789a50) at
imgContainer.cpp:439
#4  0x4155e370 in nsTimerGtk::FireTimeout (this=0x8789a50) at nsTimerGtk.cpp:186
[and so forth...]

It looks as though |this| is getting clobbered in memory, maybe.

I'm not certain, but I think it might be related to pavlov's May 3rd fix for
78015 combined with mstoltz's enabling checkin for 78831.

Other folks on the hook (just looking brainlessly at tinderbox and seeing when
solid orange started) are waterson, hewitt, danm, yokoyama, ddrinan, and nhotta
(cc'ing).
(Reporter)

Comment 1

17 years ago
More info: it's dying on the "My Compuserve" page. I'll attach a full stack
trace and some variables in a moment. Setting as smoketest blocker.
Severity: critical → blocker
Keywords: crash, smoketest
(Reporter)

Comment 2

17 years ago
Created attachment 34685 [details]
full stacktrace, data in memory at |this|
(Assignee)

Comment 3

17 years ago
This happens on windows, too, but it looks an awful lot like a dependency
problem. (On one side of the call, the object looks fine; on the other side, it
thinks the mRefCnt slot is the mFrame slot.) I'm clobbering now to see if it
goes away. dr: did you start with a clean tree?

Comment 4

17 years ago
Maybe we need "yet another" clobber of coffee ... mcafee?
(Reporter)

Comment 5

17 years ago
waterson: My debug build has one change in it, in nsCSSFrameConstructor's
internal logic. I think that isn't affecting things for me -- I think I'm seeing
the same problem as is being seen on coffee. It is a depend build, though, not
clobber, if that's useful info...

If this is happening on windows, is the only reason why the windows tinderboxen
are green instead of orange that they don't run the page loader tests?

Platform, OS -> All
OS: Linux → All
Hardware: PC → All

Comment 6

17 years ago
They don't run the page load test. In fact, I don't think they (currently) run
any tests. So, as long as they compile ... green.

Comment 7

17 years ago
RH7.1 w/gcc2.96-81
slightly different backtrace, the beginning looks like this:

#0  0x00000029 in __strtol_internal (nptr=0x918f5b4 "\200pLA", endptr=0x8965eb8,
base=-1073746288, group=0) at eval.c:40
#1  0x4139e9fa in nsImageFrame::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libgklayout.so
#2  0x413a12b8 in nsImageListener::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libgklayout.so
#3  0x41326271 in imgRequestProxy::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libimglib2.so

etc.
(Reporter)

Comment 8

17 years ago
Oh, I suppose I should mention, I'm using gcc-2.96-81 and a snapshot gdb
(insight) 20010427.

Can anybody tell whether either of the patches for bug 78015 or bug 78831 might
be the culprit? Or the checkins just before we went orange on coffee? I still
can't see who this should belong to.
(Reporter)

Comment 9

17 years ago
Looking more closely at:

http://bonsai.mozilla.org/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=989949120&maxdate=989952659

(which is the cycle when coffee went orange), I can't see how any of these would
have caused this problem. Removing some cc's of the obviously innocent. The one
checkin at that time which confuses me is yokoyama's. The checkin comment
doesn't have what seems to be the right bug number, so I'm just having trouble
deciphering this. It probably doesn't have anything to do with this either,
though...

Maybe the fact that it's dying on the "My Compuserve" page is useful
information? Dunno.

Comment 10

17 years ago
I havn't checked anything in that would have just started causing this at all to
this.  Start looking at the other checkins.

Comment 11

17 years ago
dr: the patches you are talking about have been in the tree for a long time.  I
really don't think they are coming back to bite us in the ass right now.
Whiteboard: NOTMYBUG

Comment 12

17 years ago
1) I did a full clobber build on win2k; still crashing at the same point

2) I looked at the content, and after experimenting a bit, it turns out
   that it depends on the page that it is leaving, the "static" copy being
   http://jrgm.mcom.com/perf/loadtime5/base/web.icq.com/index.html     

3) But going to the static copy is not enough, it's only when the page 
   loads are changed together that I can duplicate this crash. (Some
   (perhaps uncancelled) timer issue?)

4) I noticed the the first interval of orange was crashing on page "38",
   but that consistently this was crashing on page "8" since then. So, 
   making an assumption that maybe the first crash was '**** happens', 
   then maybe the current crash was due to checkins in the second interval.
   I tried backing out this checkin from waterson,
     cvs update -j3.431 -j3.430 mozilla/layout/html/base/src/nsBlockFrame.cpp
   and, for reasons that I cannot explain that fixes this crash. I tried 
   backing out and then reapplying twice each, and I crash everytime with in 
   applied, and never crash with it backed out. Could someone try this on Linux 
   with a current build (do we even know if this is happening on Mac). 

Removing the "NOTMYBUG" but leaving with pavlov, since he's coming back from 
Denny's in a bit I think, and perhaps he can see if this fixes the crash for
him.

To save time in debugging this (and show that it is the page exited 
that is driving the crash), I set up a simple test at 
  http://jrgm.mcom.com/yapl-crash/index.html
that loops through only three pages, including compuserve and icq, but 
in a different order (msnbc->compuserve->icq->msnbc...). It will crash
as msnbc is shown for a second time.

Whiteboard: NOTMYBUG

Comment 13

17 years ago
I don't have any tip builds right now (that arn't ripped apart).  reassigning 
to waterson.  let me know if i can help
Assignee: pavlov → waterson
This seems like it's probably the same as bug 80203, which has a lot of gdb
output but doesn't seem to be well-owned right now.  The exact location of the
crash varied a little on that bug, although some of the stacks here are another
stack frame up from anything noted there.
(Assignee)

Comment 15

17 years ago
Back to pavlov. The problem is that the imgRequestProxy::Cancel() method isn't 
being called, its mListener pointing to an nsImageListener on a dead frame. This 
really isn't my bug, pav. Backatcha.
Assignee: waterson → pavlov
(Assignee)

Comment 16

17 years ago
Allright, taking this back. Looks like my checkins for bug 43914 caused this 
regression. Backing those out fixes the crash.
Assignee: pavlov → waterson
(Assignee)

Comment 17

17 years ago
Backed out; added dependency to 43914.
Blocks: 43914
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
(Reporter)

Comment 18

17 years ago
waterson: is this really fixed? why are we still seeing orange in tinderbox?
I ran jrgms loadtime test on the linux build 2001-05-16-12-trunk and mac 
2001-05-16-12-trunk.  It worked fine on both...cruised right on through icq.com.
Status: RESOLVED → VERIFIED
This isn't fixed.  I just crashed with this stack trace which includes your fixes:

#4  <signal handler called>
#5  0x4181455c in nsFrame::Invalidate (this=0x8e81f8c, aPresContext=0x8df70b0, 
    aDamageRect=@0xbffff250, aImmediate=0) at nsFrame.cpp:2197
#6  0x418225b6 in nsImageFrame::FrameChanged (this=0x8e81f8c, 
    aContainer=0x8e83210, aPresContext=0x8df70b0, aNewFrame=0x8e894b8, 
    aDirtyRect=0xbffff3a0) at nsImageFrame.cpp:430
#7  0x41824eec in nsImageListener::FrameChanged (this=0x8e868a8, 
    aContainer=0x8e83210, aContext=0x8df70b0, newframe=0x8e894b8, 
    dirtyRect=0xbffff3a0) at ../../../../dist/include/nsCOMPtr.h:642
#8  0x410b85b1 in imgRequestProxy::FrameChanged (this=0x8e84b38, 
    container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0)
    at imgRequestProxy.cpp:260
#9  0x410b6fb0 in imgRequest::FrameChanged (this=0x8e83050, 
    container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0)
    at imgRequest.cpp:375
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 21

17 years ago
Didn't think so <grumble> ... Pavlov, I know you checked in a week or so ago,
but it looks like mstoltz's checkin to nsImageFrame.cpp (rev 1.170) yesterday
activated your code. That's my reason for suspecting that your checkin might
conceivably be biting us in the ass only now. (OTOH, I haven't looked at the
code, sooo...)
blizzard:  I think that's bug 80203.  (Some of us have been seeing that stack
for days now.  In fact, I think it's all bug 80203 and waterson's changes
somehow mysteriously triggered that bug on the page in jrgm's load tests.  Or
something like that...)
I suspect you are correct.  OK, what is the real fix here?  Who needs to own
this and get it fixed?
(Assignee)

Comment 24

17 years ago
Urgh. Still looking at this.

Comment 25

17 years ago
over to pavlov, after we put both pavlov and waterson in the ring.

Assignee: waterson → pavlov
Status: REOPENED → NEW

Comment 26

17 years ago
Ok, someone is freeing the frame but not calling its Destroy() method.  mFrame
gets nulled out in the imageframe's Destroy() method.... investigating checkins
to layout.

Updated

17 years ago
Target Milestone: --- → mozilla0.9.1
(Assignee)

Comment 27

17 years ago
The image frame that is not being destroyed is a direct child of a <td> frame.
(Assignee)

Comment 28

17 years ago
...I see the <td>'s frame get destroyed, but not the img frame.

Comment 29

17 years ago
back over to waterson.
Assignee: pavlov → waterson

Comment 30

17 years ago
waterson wrote:
> The image frame that is not being destroyed is a direct child of a <td> frame.

That's an interesting clue. That shouldn't happen. It means that an assumption
is broken. The _direct and sole_ child of a <td> frame should be a block frame
(not an image frame or anything else). The table code relies on this assumption.
Unpredictable disaster may strike if the assumption is broken.
Or did waterson mean that it was the nsBlockFrame that was the inner frame for a
TD element?  (When I was poking at bug 80203 in the debugger the I think the
parent was an nsBlockFrame.)

Comment 32

17 years ago
Is there a public URL which triggers this bug?
(Assignee)

Comment 33

17 years ago
I've investigated further. This is indeed due to my changes in bug 43914, where 
I'll attach a test case that illustrates the problem.
Status: NEW → RESOLVED
Last Resolved: 17 years ago17 years ago
Resolution: --- → FIXED

Comment 34

17 years ago
*** Bug 81063 has been marked as a duplicate of this bug. ***

Comment 35

17 years ago
Verified fixed per email with jrgm, thanks for the help John!
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.