Closed Bug 81118 Opened 23 years ago Closed 23 years ago

Crash in nsImageListener::FrameChanged

Categories

(Core :: Graphics: ImageLib, defect)

defect
Not set
blocker

Tracking

()

VERIFIED FIXED
mozilla0.9.1

People

(Reporter: dr, Assigned: waterson)

References

()

Details

(Keywords: crash, smoketest)

Attachments

(1 file)

The recent orange on the coffee tinderbox is jrgm's page-load test dying (on the
eighth page, or so, in the test). I duplicated this in my (tip) debug build, and
got the following stack trace:

#0  0x419565b6 in nsImageListener::FrameChanged (this=0x88525b8,
aContainer=0x8b084d8, aContext=0x87b77f0, newframe=0x8a17990,
dirtyRect=0xbffff370) at nsImageFrame.cpp:1638
#1  0x4179a371 in imgRequestProxy::FrameChanged (this=0x88526d8,
container=0x8b084d8, cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at
imgRequestProxy.cpp:260
#2  0x41797a7f in imgRequest::FrameChanged (this=0x8852b78, container=0x8b084d8,
cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at imgRequest.cpp:375
#3  0x4179378c in imgContainer::Notify (this=0x8b084d8, timer=0x8789a50) at
imgContainer.cpp:439
#4  0x4155e370 in nsTimerGtk::FireTimeout (this=0x8789a50) at nsTimerGtk.cpp:186
[and so forth...]

It looks as though |this| is getting clobbered in memory, maybe.

I'm not certain, but I think it might be related to pavlov's May 3rd fix for
78015 combined with mstoltz's enabling checkin for 78831.

Other folks on the hook (just looking brainlessly at tinderbox and seeing when
solid orange started) are waterson, hewitt, danm, yokoyama, ddrinan, and nhotta
(cc'ing).
More info: it's dying on the "My Compuserve" page. I'll attach a full stack
trace and some variables in a moment. Setting as smoketest blocker.
Severity: critical → blocker
Keywords: crash, smoketest
This happens on windows, too, but it looks an awful lot like a dependency
problem. (On one side of the call, the object looks fine; on the other side, it
thinks the mRefCnt slot is the mFrame slot.) I'm clobbering now to see if it
goes away. dr: did you start with a clean tree?
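
A minimal sketch of the kind of layout mismatch a stale depend build could produce, assuming one translation unit was compiled against an older header. Both struct definitions and the `SlotsCollide` helper are hypothetical stand-ins, not the real nsImageListener declaration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout as compiled into the callee (current header).
struct ListenerAsBuiltByCallee {
    const void*    vtable;   // stand-in for the vtable pointer
    std::uintptr_t mRefCnt;  // reference count lives in the second slot
    void*          mFrame;   // back-pointer lives in the third slot
};

// Hypothetical layout as seen by a caller built against a stale header
// that has no mRefCnt member.
struct ListenerAsSeenByStaleCaller {
    const void* vtable;
    void*       mFrame;      // caller expects the back-pointer in the second slot
};

// True when the stale caller's mFrame offset lands on the callee's mRefCnt.
constexpr bool SlotsCollide() {
    return offsetof(ListenerAsSeenByStaleCaller, mFrame) ==
           offsetof(ListenerAsBuiltByCallee, mRefCnt);
}
```

If the two sides disagree like this, the stale side would read the reference count as if it were a frame pointer, which is why a clobber build is the first thing to try.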
Maybe we need "yet another" clobber of coffee ... mcafee?
waterson: My debug build has one change in it, in nsCSSFrameConstructor's
internal logic. I think that isn't affecting things for me -- I think I'm seeing
the same problem as is being seen on coffee. It is a depend build, though, not
clobber, if that's useful info...

If this is happening on windows, is the only reason why the windows tinderboxen
are green instead of orange that they don't run the page loader tests?

Platform, OS -> All
OS: Linux → All
Hardware: PC → All
They don't run the page load test. In fact, I don't think they (currently) run
any tests. So, as long as they compile ... green.
RH7.1 w/gcc2.96-81
slightly different backtrace, the beginning looks like this:

#0  0x00000029 in __strtol_internal (nptr=0x918f5b4 "\200pLA", endptr=0x8965eb8,
base=-1073746288, group=0) at eval.c:40
#1  0x4139e9fa in nsImageFrame::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libgklayout.so
#2  0x413a12b8 in nsImageListener::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libgklayout.so
#3  0x41326271 in imgRequestProxy::FrameChanged () from
/home/dark/DISK/mozilla/dist/bin/components/libimglib2.so

etc.
Oh, I suppose I should mention, I'm using gcc-2.96-81 and a snapshot gdb
(insight) 20010427.

Can anybody tell whether either of the patches for bug 78015 or bug 78831 might
be the culprit? Or the checkins just before we went orange on coffee? I still
can't see who this should belong to.
Looking more closely at:

http://bonsai.mozilla.org/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=989949120&maxdate=989952659

(which is the cycle when coffee went orange), I can't see how any of these would
have caused this problem. Removing some cc's of the obviously innocent. The one
checkin at that time which confuses me is yokoyama's. The checkin comment
doesn't have what seems to be the right bug number, so I'm just having trouble
deciphering this. It probably doesn't have anything to do with this either,
though...

Maybe the fact that it's dying on the "My Compuserve" page is useful
information? Dunno.
I haven't checked in anything that would have just started causing this.
Start looking at the other checkins.
dr: the patches you are talking about have been in the tree for a long time.  I
really don't think they are coming back to bite us in the ass right now.
Whiteboard: NOTMYBUG
1) I did a full clobber build on win2k; still crashing at the same point

2) I looked at the content, and after experimenting a bit, it turns out
   that it depends on the page that it is leaving, the "static" copy being
   http://jrgm.mcom.com/perf/loadtime5/base/web.icq.com/index.html     

3) But going to the static copy is not enough, it's only when the page 
   loads are chained together that I can duplicate this crash. (Some
   (perhaps uncancelled) timer issue?)

4) I noticed that the first interval of orange was crashing on page "38",
   but that consistently this was crashing on page "8" since then. So, 
   making an assumption that maybe the first crash was '**** happens', 
   then maybe the current crash was due to checkins in the second interval.
   I tried backing out this checkin from waterson,
     cvs update -j3.431 -j3.430 mozilla/layout/html/base/src/nsBlockFrame.cpp
   and, for reasons that I cannot explain, that fixes this crash. I tried 
   backing out and then reapplying twice each, and I crash every time with it 
   applied, and never crash with it backed out. Could someone try this on Linux 
   with a current build? (Do we even know if this is happening on Mac?)

Removing the "NOTMYBUG" but leaving with pavlov, since he's coming back from 
Denny's in a bit I think, and perhaps he can see if this fixes the crash for
him.

To save time in debugging this (and show that it is the page exited 
that is driving the crash), I set up a simple test at 
  http://jrgm.mcom.com/yapl-crash/index.html
that loops through only three pages, including compuserve and icq, but 
in a different order (msnbc->compuserve->icq->msnbc...). It will crash
as msnbc is shown for a second time.

Whiteboard: NOTMYBUG
I don't have any tip builds right now (that aren't ripped apart). Reassigning 
to waterson. Let me know if I can help.
Assignee: pavlov → waterson
This seems like it's probably the same as bug 80203, which has a lot of gdb
output but doesn't seem to be well-owned right now.  The exact location of the
crash varied a little on that bug, although some of the stacks here are another
stack frame up from anything noted there.
Back to pavlov. The problem is that the imgRequestProxy::Cancel() method isn't 
being called, so its mListener is left pointing to an nsImageListener on a dead 
frame. This really isn't my bug, pav. Backatcha.
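
A minimal sketch of why the missing Cancel() call matters, assuming cancellation simply drops the proxy's listener pointer. `SketchListener`, `SketchProxy`, and `DeliveriesAroundCancel` are hypothetical stand-ins, not the real imgRequestProxy or nsImageListener code:

```cpp
#include <cassert>

// Hypothetical listener: counts the notifications it receives.
struct SketchListener {
    int notifications = 0;
    void FrameChanged() { ++notifications; }
};

// Hypothetical proxy holding a non-owning pointer to its listener.
struct SketchProxy {
    SketchListener* listener = nullptr;

    // Assumed behavior: cancelling drops the listener so nothing more is forwarded.
    void Cancel() { listener = nullptr; }

    // Forwards a frame change only while a listener is still attached.
    bool FrameChanged() {
        if (!listener) return false;
        listener->FrameChanged();
        return true;
    }
};

// Sends one notification before Cancel() and one after, then reports how many
// actually reached the listener.
int DeliveriesAroundCancel() {
    SketchListener listener;
    SketchProxy proxy;
    proxy.listener = &listener;
    proxy.FrameChanged();   // delivered
    proxy.Cancel();
    proxy.FrameChanged();   // dropped: no listener attached any more
    return listener.notifications;
}
```

Without the Cancel() step, the second notification would go through a pointer to a listener whose frame may already be gone, which matches the animation timer firing into FrameChanged in the stack traces.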
Assignee: waterson → pavlov
All right, taking this back. Looks like my checkins for bug 43914 caused this 
regression. Backing those out fixes the crash.
Assignee: pavlov → waterson
Backed out; added dependency to 43914.
Blocks: 43914
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
waterson: is this really fixed? why are we still seeing orange in tinderbox?
I ran jrgm's loadtime test on the linux build 2001-05-16-12-trunk and mac 
2001-05-16-12-trunk.  It worked fine on both...cruised right on through icq.com.
Status: RESOLVED → VERIFIED
This isn't fixed.  I just crashed with this stack trace which includes your fixes:

#4  <signal handler called>
#5  0x4181455c in nsFrame::Invalidate (this=0x8e81f8c, aPresContext=0x8df70b0, 
    aDamageRect=@0xbffff250, aImmediate=0) at nsFrame.cpp:2197
#6  0x418225b6 in nsImageFrame::FrameChanged (this=0x8e81f8c, 
    aContainer=0x8e83210, aPresContext=0x8df70b0, aNewFrame=0x8e894b8, 
    aDirtyRect=0xbffff3a0) at nsImageFrame.cpp:430
#7  0x41824eec in nsImageListener::FrameChanged (this=0x8e868a8, 
    aContainer=0x8e83210, aContext=0x8df70b0, newframe=0x8e894b8, 
    dirtyRect=0xbffff3a0) at ../../../../dist/include/nsCOMPtr.h:642
#8  0x410b85b1 in imgRequestProxy::FrameChanged (this=0x8e84b38, 
    container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0)
    at imgRequestProxy.cpp:260
#9  0x410b6fb0 in imgRequest::FrameChanged (this=0x8e83050, 
    container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0)
    at imgRequest.cpp:375
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Didn't think so <grumble> ... Pavlov, I know you checked in a week or so ago,
but it looks like mstoltz's checkin to nsImageFrame.cpp (rev 1.170) yesterday
activated your code. That's my reason for suspecting that your checkin might
conceivably be biting us in the ass only now. (OTOH, I haven't looked at the
code, sooo...)
blizzard:  I think that's bug 80203.  (Some of us have been seeing that stack
for days now.  In fact, I think it's all bug 80203 and waterson's changes
somehow mysteriously triggered that bug on the page in jrgm's load tests.  Or
something like that...)
I suspect you are correct.  OK, what is the real fix here?  Who needs to own
this and get it fixed?
Urgh. Still looking at this.
over to pavlov, after we put both pavlov and waterson in the ring.

Assignee: waterson → pavlov
Status: REOPENED → NEW
Ok, someone is freeing the frame but not calling its Destroy() method.  mFrame
gets nulled out in the imageframe's Destroy() method.... investigating checkins
to layout.
Target Milestone: --- → mozilla0.9.1
The image frame that is not being destroyed is a direct child of a <td> frame.
...I see the <td>'s frame get destroyed, but not the img frame.
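
A minimal sketch of the invariant being described, assuming a parent's Destroy() walks its child list and an image frame's Destroy() clears the listener's back-pointer. `SketchFrame`, `SketchImageFrame`, and `ListenerClearedWhenParentDestroyed` are hypothetical, not the real layout classes:

```cpp
#include <cassert>
#include <vector>

// Hypothetical frame: Destroy() propagates to whatever is in the child list.
struct SketchFrame {
    std::vector<SketchFrame*> children;  // non-owning, for illustration only
    bool destroyed = false;
    virtual ~SketchFrame() = default;
    virtual void Destroy() {
        for (SketchFrame* child : children) child->Destroy();
        children.clear();
        destroyed = true;
    }
};

// Hypothetical image frame: Destroy() nulls the listener's pointer to it.
struct SketchImageFrame : SketchFrame {
    SketchFrame** listenerSlot = nullptr;  // stands in for the listener's mFrame
    void Destroy() override {
        if (listenerSlot) *listenerSlot = nullptr;
        SketchFrame::Destroy();
    }
};

// Destroys the parent cell and reports whether the listener's pointer to the
// image frame was cleared as a result.
bool ListenerClearedWhenParentDestroyed(bool childStillLinked) {
    SketchFrame cell;
    SketchImageFrame image;
    SketchFrame* listenerFrame = &image;
    image.listenerSlot = &listenerFrame;
    if (childStillLinked) cell.children.push_back(&image);
    cell.Destroy();
    return listenerFrame == nullptr;
}
```

When the image frame has dropped out of its parent's child list, the parent's teardown never reaches it, and the listener keeps a pointer that is about to dangle once the frame's memory is released.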
back over to waterson.
Assignee: pavlov → waterson
waterson wrote:
> The image frame that is not being destroyed is a direct child of a <td> frame.

That's an interesting clue. That shouldn't happen. It means that an assumption
is broken. The _direct and sole_ child of a <td> frame should be a block frame
(not an image frame or anything else). The table code relies on this assumption.
Unpredictable disaster may strike if the assumption is broken.
Or did waterson mean that it was the nsBlockFrame that was the inner frame for a
TD element?  (When I was poking at bug 80203 in the debugger, I think the
parent was an nsBlockFrame.)
Is there a public URL which triggers this bug?
I've investigated further. This is indeed due to my changes in bug 43914, where 
I'll attach a test case that illustrates the problem.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
*** Bug 81063 has been marked as a duplicate of this bug. ***
Verified fixed per email with jrgm, thanks for the help John!
Status: RESOLVED → VERIFIED