Closed
Bug 81118
Opened 23 years ago
Closed 23 years ago
Crash in nsImageListener::FrameChanged
Categories
(Core :: Graphics: ImageLib, defect)
Core
Graphics: ImageLib
Tracking
()
VERIFIED
FIXED
mozilla0.9.1
People
(Reporter: dr, Assigned: waterson)
References
()
Details
(Keywords: crash, smoketest)
Attachments
(1 file)
2.04 KB,
text/plain
|
Details |
The recent orange on the coffee tinderbox is jrgm's page-load test dying (on the eighth page, or so, in the test). I duplicated this in my (tip) debug build, and got the following stack trace: #0 0x419565b6 in nsImageListener::FrameChanged (this=0x88525b8, aContainer=0x8b084d8, aContext=0x87b77f0, newframe=0x8a17990, dirtyRect=0xbffff370) at nsImageFrame.cpp:1638 #1 0x4179a371 in imgRequestProxy::FrameChanged (this=0x88526d8, container=0x8b084d8, cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at imgRequestProxy.cpp:260 #2 0x41797a7f in imgRequest::FrameChanged (this=0x8852b78, container=0x8b084d8, cx=0x0, newframe=0x8a17990, dirtyRect=0xbffff370) at imgRequest.cpp:375 #3 0x4179378c in imgContainer::Notify (this=0x8b084d8, timer=0x8789a50) at imgContainer.cpp:439 #4 0x4155e370 in nsTimerGtk::FireTimeout (this=0x8789a50) at nsTimerGtk.cpp:186 [and so forth...] It looks as though |this| is getting clobbered in memory, maybe. I'm not certain, but I think it might be related to pavlov's May 3rd fix for 78015 combined with mstoltz's enabling checkin for 78831. Other folks on the hook (just looking brainlessly at tinderbox and seeing when solid orange started) are waterson, hewitt, danm, yokoyama, ddrinan, and nhotta (cc'ing).
More info: it's dying on the "My Compuserve" page. I'll attach a full stack trace and some variables in a moment. Setting as smoketest blocker.
Assignee | ||
Comment 3•23 years ago
|
||
This happens on windows, too, but it looks an awful lot like a dependency problem. (On one side of the call, the object looks fine; on the other side, it thinks the mRefCnt slot is the mFrame slot.) I'm clobbering now to see if it goes away. dr: did you start with a clean tree?
Comment 4•23 years ago
|
||
Maybe we need "yet another" clobber of coffee ... mcafee?
waterson: My debug build has one change in it, in nsCSSFrameConstructor's internal logic. I think that isn't affecting things for me -- I think I'm seeing the same problem as is being seen on coffee. It is a depend build, though, not clobber, if that's useful info... If this is happening on windows, is the only reason why the windows tinderboxen are green instead of orange that they don't run the page loader tests? Platform, OS -> All
Comment 6•23 years ago
|
||
They don't run the page load test. In fact, I don't think they (currently) run any tests. So, as long as they compile ... green.
RH7.1 w/gcc2.96-81 slightly different backtrace, the beginning looks like this: #0 0x00000029 in __strtol_internal (nptr=0x918f5b4 "\200pLA", endptr=0x8965eb8, base=-1073746288, group=0) at eval.c:40 #1 0x4139e9fa in nsImageFrame::FrameChanged () from /home/dark/DISK/mozilla/dist/bin/components/libgklayout.so #2 0x413a12b8 in nsImageListener::FrameChanged () from /home/dark/DISK/mozilla/dist/bin/components/libgklayout.so #3 0x41326271 in imgRequestProxy::FrameChanged () from /home/dark/DISK/mozilla/dist/bin/components/libimglib2.so etc.
Oh, I suppose I should mention, I'm using gcc-2.96-81 and a snapshot gdb (insight) 20010427. Can anybody tell whether either of the patches for bug 78015 or bug 78831 might be the culprit? Or the checkins just before we went orange on coffee? I still can't see who this should belong to.
Looking more closely at: http://bonsai.mozilla.org/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=989949120&maxdate=989952659 (which is the cycle when coffee went orange), I can't see how any of these would have caused this problem. Removing some cc's of the obviously innocent. The one checkin at that time which confuses me is yokoyama's. The checkin comment doesn't have what seems to be the right bug number, so I'm just having trouble deciphering this. It probably doesn't have anything to do with this either, though... Maybe the fact that it's dying on the "My Compuserve" page is useful information? Dunno.
Comment 10•23 years ago
|
||
I havn't checked anything in that would have just started causing this at all to this. Start looking at the other checkins.
Comment 11•23 years ago
|
||
dr: the patches you are talking about have been in the tree for a long time. I really don't think they are coming back to bite us in the ass right now.
Whiteboard: NOTMYBUG
Comment 12•23 years ago
|
||
1) I did a full clobber build on win2k; still crashing at the same point 2) I looked at the content, and after experimenting a bit, it turns out that it depends on the page that it is leaving, the "static" copy being http://jrgm.mcom.com/perf/loadtime5/base/web.icq.com/index.html 3) But going to the static copy is not enough, it's only when the page loads are changed together that I can duplicate this crash. (Some (perhaps uncancelled) timer issue?) 4) I noticed the the first interval of orange was crashing on page "38", but that consistently this was crashing on page "8" since then. So, making an assumption that maybe the first crash was '**** happens', then maybe the current crash was due to checkins in the second interval. I tried backing out this checkin from waterson, cvs update -j3.431 -j3.430 mozilla/layout/html/base/src/nsBlockFrame.cpp and, for reasons that I cannot explain that fixes this crash. I tried backing out and then reapplying twice each, and I crash everytime with in applied, and never crash with it backed out. Could someone try this on Linux with a current build (do we even know if this is happening on Mac). Removing the "NOTMYBUG" but leaving with pavlov, since he's coming back from Denny's in a bit I think, and perhaps he can see if this fixes the crash for him. To save time in debugging this (and show that it is the page exited that is driving the crash), I set up a simple test at http://jrgm.mcom.com/yapl-crash/index.html that loops through only three pages, including compuserve and icq, but in a different order (msnbc->compuserve->icq->msnbc...). It will crash as msnbc is shown for a second time.
Whiteboard: NOTMYBUG
Comment 13•23 years ago
|
||
I don't have any tip builds right now (that arn't ripped apart). reassigning to waterson. let me know if i can help
Assignee: pavlov → waterson
This seems like it's probably the same as bug 80203, which has a lot of gdb output but doesn't seem to be well-owned right now. The exact location of the crash varied a little on that bug, although some of the stacks here are another stack frame up from anything noted there.
Assignee | ||
Comment 15•23 years ago
|
||
Back to pavlov. The problem is that the imgRequestProxy::Cancel() method isn't being called, its mListener pointing to an nsImageListener on a dead frame. This really isn't my bug, pav. Backatcha.
Assignee: waterson → pavlov
Assignee | ||
Comment 16•23 years ago
|
||
Allright, taking this back. Looks like my checkins for bug 43914 caused this regression. Backing those out fixes the crash.
Assignee: pavlov → waterson
Assignee | ||
Comment 17•23 years ago
|
||
Backed out; added dependency to 43914.
Reporter | ||
Comment 18•23 years ago
|
||
waterson: is this really fixed? why are we still seeing orange in tinderbox?
Comment 19•23 years ago
|
||
I ran jrgms loadtime test on the linux build 2001-05-16-12-trunk and mac 2001-05-16-12-trunk. It worked fine on both...cruised right on through icq.com.
Status: RESOLVED → VERIFIED
Comment 20•23 years ago
|
||
This isn't fixed. I just crashed with this stack trace which includes your fixes: #4 <signal handler called> #5 0x4181455c in nsFrame::Invalidate (this=0x8e81f8c, aPresContext=0x8df70b0, aDamageRect=@0xbffff250, aImmediate=0) at nsFrame.cpp:2197 #6 0x418225b6 in nsImageFrame::FrameChanged (this=0x8e81f8c, aContainer=0x8e83210, aPresContext=0x8df70b0, aNewFrame=0x8e894b8, aDirtyRect=0xbffff3a0) at nsImageFrame.cpp:430 #7 0x41824eec in nsImageListener::FrameChanged (this=0x8e868a8, aContainer=0x8e83210, aContext=0x8df70b0, newframe=0x8e894b8, dirtyRect=0xbffff3a0) at ../../../../dist/include/nsCOMPtr.h:642 #8 0x410b85b1 in imgRequestProxy::FrameChanged (this=0x8e84b38, container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0) at imgRequestProxy.cpp:260 #9 0x410b6fb0 in imgRequest::FrameChanged (this=0x8e83050, container=0x8e83210, cx=0x0, newframe=0x8e894b8, dirtyRect=0xbffff3a0) at imgRequest.cpp:375
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 21•23 years ago
|
||
Didn't think so <grumble> ... Pavlov, I know you checked in a week or so ago, but it looks like mstoltz's checkin to nsImageFrame.cpp (rev 1.170) yesterday activated your code. That's my reason for suspecting that your checkin might conceivably be biting us in the ass only now. (OTOH, I haven't looked at the code, sooo...)
blizzard: I think that's bug 80203. (Some of us have been seeing that stack for days now. In fact, I think it's all bug 80203 and waterson's changes somehow mysteriously triggered that bug on the page in jrgm's load tests. Or something like that...)
Comment 23•23 years ago
|
||
I suspect you are correct. OK, what is the real fix here? Who needs to own this and get it fixed?
Assignee | ||
Comment 24•23 years ago
|
||
Urgh. Still looking at this.
Comment 25•23 years ago
|
||
over to pavlov, after we put both pavlov and waterson in the ring.
Assignee: waterson → pavlov
Status: REOPENED → NEW
Comment 26•23 years ago
|
||
Ok, someone is freeing the frame but not calling its Destroy() method. mFrame gets nulled out in the imageframe's Destroy() method.... investigating checkins to layout.
Assignee | ||
Comment 27•23 years ago
|
||
The image frame that is not being destroyed is a direct child of a <td> frame.
Assignee | ||
Comment 28•23 years ago
|
||
...I see the <td>'s frame get destroyed, but not the img frame.
Comment 30•23 years ago
|
||
waterson wrote:
> The image frame that is not being destroyed is a direct child of a <td> frame.
That's an interesting clue. That shouldn't happen. It means that an assumption
is broken. The _direct and sole_ child of a <td> frame should be a block frame
(not an image frame or anything else). The table code relies on this assumption.
Unpredictable disaster may strike if the assumption is broken.
Or did waterson mean that it was the nsBlockFrame that was the inner frame for a TD element? (When I was poking at bug 80203 in the debugger the I think the parent was an nsBlockFrame.)
Comment 32•23 years ago
|
||
Is there a public URL which triggers this bug?
Assignee | ||
Comment 33•23 years ago
|
||
I've investigated further. This is indeed due to my changes in bug 43914, where I'll attach a test case that illustrates the problem.
Status: NEW → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 34•23 years ago
|
||
*** Bug 81063 has been marked as a duplicate of this bug. ***
Comment 35•23 years ago
|
||
Verified fixed per email with jrgm, thanks for the help John!
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•