Closed Bug 144315 Opened 22 years ago Closed 22 years ago

Trunk M11A crash [@ nsImageListener::FrameChanged] [@ nsBox::Redraw] [@ 0x00000000]

Categories

(Core :: Layout, defect, P1)

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: jay, Assigned: bryner)

Details

(Keywords: crash, testcase, topcrash+, Whiteboard: [adt2 rtm] custrtm-)

Attachments

(7 files)

Although bug 133410 and bug 138292 have been verified fixed, we are still seeing
crashes at nsImageListener::FrameChanged with recent MozillaTrunk builds.  Those
2 bugs might have some clues to what's going on, but we need to address these
new crashes:

Count   Offset    Real Signature
[ 35   nsImageListener::FrameChanged 1c0e1f8a - nsImageListener::FrameChanged ]
[ 26   nsImageListener::FrameChanged f0971e0e - nsImageListener::FrameChanged ]
[ 21   nsImageListener::FrameChanged 0be4b6aa - nsImageListener::FrameChanged ]
[ 9   nsImageListener::FrameChanged 2a5e057a - nsImageListener::FrameChanged ]
[ 5   nsImageListener::FrameChanged 70ae2a6a - nsImageListener::FrameChanged ]
[ 4   nsImageListener::FrameChanged a88c85df - nsImageListener::FrameChanged ]
[ 4   nsImageListener::FrameChanged 937cff02 - nsImageListener::FrameChanged ]
[ 3   nsImageListener::FrameChanged f4f6126b - nsImageListener::FrameChanged ]
 
     Crash date range: 2002-05-04 to 2002-05-12
     Min/Max Seconds since last crash: 29 - 446127
     Min/Max Runtime: 410 - 484914
     Keyword List : click(4),  
     Count   Platform List 
     51   Windows NT 5.0 build 2195
     49   Windows NT 5.1 build 2600
     7   Windows 98 4.10 build 67766446
 
     Count   Build Id List 
     20   2002050708
     15   2002050408
     14   2002050807
     9   2002050308
     8   2002050608
     8   2002050604
     8   2002050508
     6   2002050504
     5   2002050908
     3   2002051008
     3   2002051004
     3   2002050705
     2   2002050904
     2   2002050404
     1   2002051204
 
     No of Unique Users        73
 
 Stack trace(Frame) 

	 nsImageListener::FrameChanged
[nsImageFrame.cpp  line 2383] 
	 imgRequestProxy::FrameChanged
[imgRequestProxy.cpp  line 294] 
	 imgRequest::FrameChanged
[imgRequest.cpp  line 338] 
	 imgContainer::Notify
[imgContainer.cpp  line 459] 
	 nsTimerImpl::Fire
[nsTimerImpl.cpp  line 357] 
	 nsTimerManager::FireNextIdleTimer
[nsTimerImpl.cpp  line 591] 
	 nsAppShell::Run
[nsAppShell.cpp  line 134] 
	 nsAppShellService::Run
[nsAppShellService.cpp  line 451] 
	 main1
[nsAppRunner.cpp  line 1472] 
	 main
[nsAppRunner.cpp  line 1808] 
	 WinMain
[nsAppRunner.cpp  line 1826] 
	 WinMainCRTStartup()  
	 kernel32.dll + 0x1eb69 (0x77e7eb69)   
 
     (6211424)	URL: http://slashdot.org
     (6159191)	Comments: Click boom bah! Nothing out of the ordinary. Single window.I think
these crashes are intention so you can gather marketroidle demographics
information  like what other programs I'm running at the time. Try tossing some
more code at the screen to see if
     (6159191)	Comments:  it sticks. We don need no steenkin algorithms.
     (6147840)	URL: www.ubid.com
     (6115349)	URL: http://www.ubid.com/actn/opn/getpage.asp?AuctionId=7214002
     (6101905)	URL: groups.yahoo.com
     (6101822)	URL: groups.yahoo.com
     (6067037)	URL: www.paypal.com
     (6067037)	Comments: I was trying to login to their secure site
     (6066401)	Comments: Moving back and forth between eBay & Half.com.  Was doing a "back"
from Half.com to eBay when it errored.
     (6054065)	URL: www.blockbuster.com
     (6041117)	URL: http://www.ubid.com/actn/opn/getpage.asp?AuctionId=7214002
     (6041117)	Comments: Initial click on the page
     (6038291)	Comments: scrolled a bugzilla query result-page before it had fully loaded
     (6033643)	URL: http://gamefix.free.fr
     (6032400)	URL: http://gamefix.free.fr
     (6032355)	URL: http://www.winace.com
     (6012854)	Comments: I was surfing eBay
     (6012843)	Comments: I was surfing eBay
     (6012830)	Comments: when pressing Home button
     (6012671)	URL: http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&Item=1535479505
     (6003592)	URL: http://www.football365.com/Homegrounds/Chelsea/News/index.shtml
     (6003592)	Comments: Clicked on  Rangers Hero link
     (5992341)	Comments: clicked on linik regarding 'armored ascii bug' in google search
for 'armored ascii'. Kept going to linuxtoday site and when clicked back would
not go back (link was redirecting me?) . Hit back a couple times then crashed.
     (5967321)	URL: www.neimanmarcus.com
     (5956412)	URL: http://www.wrestlingheadlines.com/index2.html
     (5929043)	Comments: browsing a web site

We have found at least 1 reproducible testcase.  jrgm was able to crash by doing
this:
"Try loading http://gamefix.free.fr/ and then 'about:blank', and repeating that
once or twice."
Adding crash keywords and nominating.  We need to keep an eye on the branch to
see if this is crashing there as well.
As I note in bug 142830, while the talkback stacks for the crash at 
http://gamefix.free.fr/ name nsImageListener::FrameChanged as the top of stack 
for the crash, when I reproduce the crash in an opt-with-symbols build on win2k
I am actually crashing in nsImageBoxListener::OnStopDecode()

But, at any rate, the crash at http://gamefix.free.fr/ is very reproducible 
after one to three tries of alternating between loading gamefix and 
about:blank in current trunk builds (I can't seem to do it in branch builds 
though).
nsbeta1+.
Reassigning to Alex.
Assignee: attinasi → alexsavulov
Keywords: nsbeta1 → nsbeta1+
Whiteboard: [adt1 rtm]
Priority: -- → P1
Target Milestone: --- → mozilla1.0
Whiteboard: [adt1 rtm] → [adt1 rtm] custrtm-
Lowering impact to [ADT2 RTM], per ADT triage.
Whiteboard: [adt1 rtm] custrtm- → [adt2 rtm] custrtm-
OS should be all, I crash with this sig in linux, and there are several linux
signatures in talkback.

However someone else will have to make that change.
INTERMEDIATE ANALYSIS RESULTS:

- happens randomly on all kinds of urls, which means it is not related to a specific
HTML/whatever construct (slashdot.org is a frequently visited page, so we would
have many more crash reports than one)

- if the crash occurs in nsImageListener::FrameChanged then it is possible that we
are dealing with a corrupt frame pointer to the nsImageFrame that created this
nsImageListener in its RealLoadImage (what a name). 

- the question that arises here is: Why does the listener exist beyond the
life-end of its owner/client (nsImageFrame)?

investigating
Blocks: 146027
there is a problem with this nsImageListener used by the nsImageFrame:

- by tracking the construction/destruction of the image listener and of the frame
i was able to demonstrate to myself that we leak an nsImageListener in some
circumstances
- in that case the listener might get notified (FrameChanged, for
example) and the crash occurs. i suspect that slower connections might have this
problem. for some reason the imgRequestProxy that gets a reference (pointer) to
the nsImageListener in the form of an imgIDecoderObserver might not have finished
the load by the time the frame was deleted.
- what i don't know yet is what the hell deletes the frame without invoking
Destroy and why?

simple way to reproduce:

- open the browser about:blank (not to get confused)
- browse to www.google.com (an image frame for the logo will be created with the
corresponding image listener)
- reload the page with Refresh+SHIFT

at this point we have already leaked the previous image listener. (note that the
page also contains an empty 1px/1px image that can be ignored). 
aaaaaah, there is another problem:

it is not that the frame is deleted without calling destroy but there is something
wrong with that listener: something keeps a reference on it and it cannot be
deleted.
hmmm, actually it is not really a leak. the nsImageListener keeps existing until i
 browse again to an image free page.

example:

- load about:blank
- load google.com (now the first image listener gets created)
- reload (the image frame gets destroyed, the listener not)
- click on one of the links there (the listener still lives)
- go to any random page with images (the listener still lives)
- go to any random page with images (the listener still lives)
...
- load about:blank (the listener gets destroyed)

for gods sake:

W     W H   H Y   Y  ???
W  W  W HHHHH  Y Y  ?   ? 
W  W  W H   H   Y      ?
 W  W   H   H   Y     ?
                      ?

goddamit
ccing waterson, dbaron, dp for more information. 
If you want to see why an image listener stays alive longer than you expected,
use refcount logging.  See http://mozilla.org/performance/leak-brownbag.html and
http://mozilla.org/performance/refcnt-balancer.html and
http://www.mozilla.org/performance/leak-tutorial.html and
http://lxr.mozilla.org/seamonkey/source/xpcom/doc/MemoryTools.html .
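
To illustrate the idea behind refcount logging (this is a minimal sketch, not XPCOM's actual logging machinery), the hypothetical `LoggedRefCounted` class below records who takes and who drops each reference, so a holder that never releases stands out. Every name in it is invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical illustration only; XPCOM's real refcount logging works
// through its own macros and log files, not through this class.
class LoggedRefCounted {
public:
    int AddRef(const std::string& owner) {
        mHolders.push_back(owner);  // remember who took the reference
        return ++mRefCnt;
    }

    int Release(const std::string& owner) {
        auto it = std::find(mHolders.begin(), mHolders.end(), owner);
        assert(it != mHolders.end() && "Release without matching AddRef");
        mHolders.erase(it);
        return --mRefCnt;
    }

    // Owners that still hold a reference.
    const std::vector<std::string>& Outstanding() const { return mHolders; }

private:
    int mRefCnt = 0;
    std::vector<std::string> mHolders;
};

// The frame lets go of its listener, but somebody else does not.
std::vector<std::string> WhoKeepsListenerAlive() {
    LoggedRefCounted listener;
    listener.AddRef("image frame");
    listener.AddRef("unknown holder");
    listener.Release("image frame");  // frame destroyed
    return listener.Outstanding();
}
```

Calling `WhoKeepsListenerAlive()` leaves exactly one entry, the holder whose AddRef was never balanced, which is the kind of answer the balancer tools linked above are meant to give for real objects.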

Note that nsImageFrame::Destroy nulls the pointer back to the frame, and that
this crash is due to calling through a null vtable pointer, which probably means
the object has been deleted.  This makes it seem like the cause is leaked frames
-- do the leak logs (XPCOM_MEM_LEAK_LOG) for the pages in question show
|nsFrame|s leaked (which means that their destructors weren't called, but we
then deleted the arena that contained them)?  This could cause such a crash.
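
The back-pointer contract described here can be sketched as follows. This is a simplified model under stated assumptions, not the Gecko code: `Frame` and `Listener` are hypothetical stand-ins for nsImageFrame and nsImageListener, and `std::shared_ptr` stands in for XPCOM reference counting.

```cpp
#include <cassert>
#include <memory>

class Frame;  // hypothetical stand-in for nsImageFrame

// Hypothetical stand-in for nsImageListener.
class Listener {
public:
    explicit Listener(Frame* frame) : mFrame(frame) {}
    void SetFrame(Frame* frame) { mFrame = frame; }
    // Returns true if the notification reached a live frame.
    bool FrameChanged();

private:
    Frame* mFrame;  // raw, non-owning back pointer
};

class Frame {
public:
    Frame() : mListener(std::make_shared<Listener>(this)) {}
    std::shared_ptr<Listener> GetListener() const { return mListener; }
    void Invalidate() { ++mInvalidations; }

    // Orderly teardown: break the back pointer before the frame goes away.
    void Destroy() {
        assert(mListener);
        mListener->SetFrame(nullptr);
        mListener.reset();
    }

private:
    std::shared_ptr<Listener> mListener;
    int mInvalidations = 0;
};

bool Listener::FrameChanged() {
    if (!mFrame) {
        return false;  // frame went through Destroy; drop the notification
    }
    mFrame->Invalidate();  // this call is what crashes if mFrame dangles
    return true;
}

// Something else (e.g. an image request) keeps the listener alive
// past the lifetime of the frame that created it.
bool NotifyAfterDestroy() {
    Frame* frame = new Frame();
    std::shared_ptr<Listener> held = frame->GetListener();
    bool deliveredWhileAlive = held->FrameChanged();
    frame->Destroy();
    delete frame;
    bool deliveredAfterDestroy = held->FrameChanged();
    return deliveredWhileAlive && !deliveredAfterDestroy;
}
```

As long as every frame goes through `Destroy`, a late notification is harmless. If the frame's memory goes away without that call, `mFrame` keeps its old value and the next `FrameChanged` calls through freed memory, which matches the crash signature in this bug.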
yeah, i noticed that. so what i need to find out is what and why is there that
"something" that keeps the reference to the listener pointer for the entire time
we have images on the loaded pages. i think that actually that something that
lives on keeps the reference indirectly through some of its owned objects.
i also think this is a general problem and the crash occurs randomly depending
on some time factors. thanks for the refcount logging hint. i will work on it.
let's see what comes out of it.
i was stupid all of the time. aaaaaah!

ok i know now how this works: as soon as a page with images is loaded, the first
frame also loads the gif's for "loading" and "broken" (although broken is not
needed yet) then the actual image gets loaded. now the problem is that the image
loader creates a proxy for every one of the three images and they are added to
the observer list of the requester. but because they are loaded for the first
image frame, they are associated with the image listener that was created by
that frame. after that when a new page with images is loaded the frame gets
destroyed but the proxies for "loading" and "broken" are cached. so they still
keep a reference on that image listener that was created by the defunct frame.
so actually we don't leak the listener at all. we just use it multiple times
until an image frame manages to delete mIconLoad if(mIconLoad->Release()) during
destroy.
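
The lifetime pattern described above can be modelled roughly like this. It is a sketch under assumptions: `IconLoad`, `ImageFrame` and `gIconLoad` are simplified, hypothetical stand-ins (only the member name mIconLoad appears in the comment), and `std::shared_ptr` replaces XPCOM refcounting.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Listener {};  // hypothetical stand-in for nsImageListener

// Hypothetical stand-in for the shared "loading"/"broken" icon requests.
struct IconLoad {
    std::vector<std::shared_ptr<Listener>> observers;
    int users = 0;
};

IconLoad* gIconLoad = nullptr;

struct ImageFrame {
    std::shared_ptr<Listener> listener = std::make_shared<Listener>();

    ImageFrame() {
        if (!gIconLoad) {
            // The first image frame kicks off the icon loads, so its
            // listener is the one registered as their observer.
            gIconLoad = new IconLoad();
            gIconLoad->observers.push_back(listener);  // "loading" icon
            gIconLoad->observers.push_back(listener);  // "broken" icon
        }
        ++gIconLoad->users;
    }

    ~ImageFrame() {
        assert(gIconLoad);
        // Only the last image frame to go away releases the icon loads.
        if (--gIconLoad->users == 0) {
            delete gIconLoad;
            gIconLoad = nullptr;
        }
    }
};

struct ListenerLifetime {
    bool aliveAfterFirstFrameGone;
    bool aliveAfterAllFramesGone;
};

ListenerLifetime TrackFirstListener() {
    auto first = std::make_unique<ImageFrame>();
    std::weak_ptr<Listener> watch = first->listener;
    auto second = std::make_unique<ImageFrame>();  // next page with images
    first.reset();                                 // first frame destroyed
    bool afterFirst = !watch.expired();
    second.reset();                                // image-free page
    bool afterAll = !watch.expired();
    return {afterFirst, afterAll};
}
```

In this model the first listener survives its own frame and is only freed once no image frame is left, which is the same sequence as the google.com / about:blank experiment earlier in the thread.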

now, if a frame gets deleted otherwise than using Destroy, it does not reset the
listener to not keep the pointer to the frame and that means crash. for now, the
only thing i can imagine is either:

- modifying the way we load the two icons "loading" and "broken" and keep them
separated from any actual image frame or

- just move the part for the listener from nsImageFrame::Destroy to ~nsImageFrame

// set the frame to null so we don't send messages to a dead object.
  if (mListener)
    NS_REINTERPRET_CAST(nsImageListener*, mListener.get())->SetFrame(nsnull);

  mListener = nsnull;

cc'ing pavlov for his opinion
this is the only way to make sure that the frame is set to null when the frame
gets destroyed, since i cannot reproduce the crash and we don't know how it
happens that the frame gets destroyed without calling Destroy. i know, it is
not the real fix but it is better than nothing. i invite anyone acquainted with the
image stuff to take a closer look and if they think this can be solved otherwise,
then go on.
that won't fix it.  the destructor calls Destroy().

Frames are arena allocated, so it is easy for one to get stomped on if not
handled properly.  this is usually caused by bad html that causes us to create
frames in places that we normally wouldn't create.
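
Why arena allocation makes a "leaked" frame dangerous can be shown with a toy arena. This is a sketch only: `Arena`, `Frame` and `Listener` here are hypothetical and far simpler than the presshell's frame arena.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

struct Listener {
    void* frame = nullptr;  // non-owning back pointer
};

struct Frame {
    Listener* listener;
    explicit Frame(Listener* l) : listener(l) { listener->frame = this; }
    ~Frame() { listener->frame = nullptr; }  // orderly teardown
};

// Toy bump allocator: memory is handed out in order and released all
// at once, without running any destructors.
class Arena {
public:
    explicit Arena(std::size_t bytes) : mStorage(bytes) {}

    void* Allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        std::size_t start = (mUsed + align - 1) / align * align;
        if (start + bytes > mStorage.size()) {
            return nullptr;
        }
        mUsed = start + bytes;
        return mStorage.data() + start;
    }

    void Clear() { mUsed = 0; }  // memory may now be reused

private:
    std::vector<unsigned char> mStorage;
    std::size_t mUsed = 0;
};

// Frame is never torn down; the arena is simply thrown away.
bool LeakedFrameLeavesStalePointer() {
    Arena arena(256);
    Listener listener;
    void* slot = arena.Allocate(sizeof(Frame));
    assert(slot);
    new (slot) Frame(&listener);
    arena.Clear();
    return listener.frame != nullptr;  // still points into the arena
}

// Frame is torn down before its memory is released.
bool DestroyedFrameClearsPointer() {
    Arena arena(256);
    Listener listener;
    void* slot = arena.Allocate(sizeof(Frame));
    assert(slot);
    Frame* frame = new (slot) Frame(&listener);
    frame->~Frame();
    arena.Clear();
    return listener.frame == nullptr;
}
```

The sketch never dereferences the stale pointer; it only shows that the listener still holds an address inside memory the arena considers free, which is the state a later FrameChanged notification would trip over.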
which destructor calls Destroy?

nsFrame::Destroy calls delete this; then presshell::FreeFrame

nsImageFrame::~nsImageFrame is empty
nsFrame::~nsFrame does not call Destroy
although hmm, you're right calling delete on a frame won't remove the frame from
the arena. the frame would still be there.
But frames are _always_ destroyed via the Destroy method, so I don't see how
this would fix it. Or have you found a code path where the frame is deleted
directly? (Since nsFrame's dtor is protected, it would need to be done by a
subclass...)
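
The point about the protected destructor can be checked at compile time. The sketch below uses a hypothetical `Frame` class, not nsFrame, and plain heap allocation rather than the arena:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class following the same pattern: a protected destructor
// forces all teardown through Destroy().
class Frame {
public:
    static Frame* Create(bool* destroyedFlag) {
        return new Frame(destroyedFlag);
    }

    void Destroy() {
        *mDestroyed = true;  // teardown work (e.g. detaching listeners)
        delete this;         // allowed: we are inside the class
    }

protected:
    explicit Frame(bool* destroyedFlag) : mDestroyed(destroyedFlag) {
        assert(destroyedFlag);
    }
    ~Frame() = default;

private:
    bool* mDestroyed;
};

// Code outside the class hierarchy cannot write `delete frame;`.
static_assert(!std::is_destructible<Frame>::value,
              "Frame must only be destroyed through Destroy()");

bool DestroyThroughMethod() {
    bool destroyed = false;
    Frame* frame = Frame::Create(&destroyed);
    frame->Destroy();
    return destroyed;
}
```

With this shape, the only way to skip `Destroy` is from a subclass or by never destroying the object at all, which is why the discussion moves on to leaked frames.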
I don't know exactly what's going on here.

Fact is that nsImageFrame has a public destructor for whatever reasons so what
Waterson said is possible.
I was thinking that something destroys the frames not using Destroy but that's
why I thought to put the ...mListener.get())->SetFrame(nsnull)... into destructor.

Pavlov suggests that bad HTML could be the reason for that causing corruptions
in the arena regions. Is it possible that the arena overlaps regions or
something? If nothing attempts to destroy/delete those frames, why is that vfptr
corrupt?

Ok, two ideas of what I could do:

- make the destructor of nsImageFrame protected and see how many things we break
(great tip Chris, I missed that detail, d'oh)
- learn what that timer thing does, i bet there are animated gif's involved (it
might be a combination of animated gif and bad html)
It seems like it's worth looking for cases where we "leak" frames (i.e., don't
call destructor or Destroy) on the pages where this crash happens.  (Are there
any assertions on the page related to frame construction?)
sorry, my mistake i was looking at the wrong class: ~nsImageFrame is protected
*** Bug 152475 has been marked as a duplicate of this bug. ***
I can reproduce this crash with the steps in bug 152475.. (everytime)
BTW: we need a new target milestone...
matti,

what is the connection speed u have? i sit on a >2Mb/s thing and i don't see the
crash. the images are coming and there is no crash.
Target Milestone: mozilla1.0 → mozilla1.0.1
I can crash, given the steps in bug 152475, clicking on the upper-right red 
arrow, within one to three pages using 2002-06-17-07-trunk (the trunk) on win2k

However, I _cannot_ get the same crash when using 2002-06-17-07-1.0 (the
branch) on win2k.

Note: this is with a connection from B21 in Mt. View (i.e., high bandwidth).
i click 1x right and 1x the red-upper corner
connection: DSL 768 down/128kbit upstream

I tested this with my optimized build (crash) and after that with my debug to
create the stack. Tried with 1.0: no crash, optimized trunk again : crash


thx adrian. i've just checked talkback's data and it _is_ happening on all 
platforms. changing platform to "all"
Hardware: PC → All
the stack sig is also showing up on macs when looking at talkback data.
OS: Windows 2000 → All
sorry for the whitespace...
Updating summary with M11A since this is a topcrasher with Mozilla 1.1 Alpha.  I
will attach some Talkback reports.
Summary: Trunk crash [@ nsImageListener::FrameChanged] → Trunk M11A crash [@ nsImageListener::FrameChanged]
I have crashes from Mozilla 1.1 Alpha to help us reproduce this on the
MozillaTrunk.
Well, for whatever reason, this crash is not showing up in large numbers with
Mozilla 1.0 (although there are quite a few crashes like sister bug 146027).
Quick Bandage/Fix, I'll explain in more detail later (still trying to better
hash out what's going on).. if it works for everyone else, that is.

Steps that make me crash 100% of the time:
1. Go to
http://www.streetmap.co.uk/newmap.srf?x=526982&y=179499&z=0&sv=sw7+1ne&st=2&tl=Postcode+sw7+1ne&pc=sw7+1ne&mapp=newmap.srf&searchp=newsearch.srf
Wait for it to load.
2. In the same window, type www.google.com into the URL bar and press enter
3. Should crash

After step 1, you should get at least 2 assertions
###!!! ASSERTION: initial containing block already created: 'nsnull ==
mInitialContainingBlock', file nsCSSFrameConstructor.cpp, line 8916
and
###!!! ASSERTION: unexpected next reflow command frame: '*iter ==
mFrames.FirstChild()', file nsHTMLFrame.cpp, line 522
If you didn't get those assertions (in debug mode), or the crash didn't happen
when you went to google, just click the arrows for a while.
I very much doubt the patch is fixing the root of the problem, but rather 
patching one of the effects of it (perhaps the only effect?).  The patch exits 
the function if mInitialContainingBlock is null.  This gets rid of the second 
assertion entirely, and the page doesn't appear to be affected adversely.

A few things I've learned while playing (I may be duplicating other people's 
findings here):
- The second assertion is caused by the first.

- In nsImageListener::FrameChanged, mFrame is invalid.  The frame has since 
been destroyed.

- One of the imageframes gets created with that first assertion error.  Then, 
when all the imageframes are destroyed (nsImageFrame::~nsImageFrame), it 
doesn't get destroyed (so it never gets added to the list of stuff to 
destroy?). After everything's been removed (& another page is loaded), the 
animated gif timer fires for the lost imageframe, as per imgContainer::Notify() 
in the stack.  Things flow until the eventual crash due to mFrame being invalid.

- It has something to do with <iframe> and <frame> (HTML objects).  The 
animated gif that causes the crash is inside a <iframe>.  The first assertion 
comes from ContentInserted which is called by nsPresShell::InitialReflow with a 
null container.

- InitialReflow is called twice for an <iframe>!  This is probably why the 
assertion occurs.
it is a broken frame tree. we know that. i'm on it. found a way to reproduce this.
*** Bug 153059 has been marked as a duplicate of this bug. ***
I've been trying to reproduce the crash from "sister" bug 146027 and was able to
crash, but with the stack signature and trace from this bug:

Incident ID 7553267
Stack Signature nsImageListener::FrameChanged 9f3827be
Email Address jpatel@netscape.com
Product ID MozillaTrunk
Build ID 2002061808
Trigger Time 2002-06-20 15:47:56
Platform Win32
Operating System Windows NT 4.0 build 1381
Module gklayout.dll
URL visited bug 146027
User Comments Using Ly's steps to repro at the MIB website. 1. Type in or paste
http://www.sonypictures.com/movies/meninblack/ into URL bar. 2. Click
enter...and 2 new windows will pop up. A popup ad and a new window with the
content of the MIB page. The original window only has the navigation menu at the
top (the content is missing b/c it is in a new window). 3. Close popup add and
new window with the MIB content in it. Only the original window should remain
open. 4. Click on URL bar. The URL will be highlighted...you can click again to
unhighlight it (although I doubt it makes a difference). Then press Enter to
reload the page....BOOM!
Trigger Reason Access violation
Source File Name c:/builds/seamonkey/mozilla/layout/html/base/src/nsImageFrame.cpp
Trigger Line No. 2380
Stack Trace
nsImageListener::FrameChanged
[c:/builds/seamonkey/mozilla/layout/html/base/src/nsImageFrame.cpp, line 2380]
imgRequestProxy::FrameChanged
[c:/builds/seamonkey/mozilla/modules/libpr0n/src/imgRequestProxy.cpp, line 295]
imgRequest::FrameChanged
[c:/builds/seamonkey/mozilla/modules/libpr0n/src/imgRequest.cpp, line 339]
imgContainer::Notify
[c:/builds/seamonkey/mozilla/modules/libpr0n/src/imgContainer.cpp, line 460]
nsTimerImpl::Fire [c:/builds/seamonkey/mozilla/xpcom/threads/nsTimerImpl.cpp,
line 352]
nsTimerManager::FireNextIdleTimer
[c:/builds/seamonkey/mozilla/xpcom/threads/nsTimerImpl.cpp, line 588]
nsAppShell::Run [c:/builds/seamonkey/mozilla/widget/src/windows/nsAppShell.cpp,
line 134]
nsAppShellService::Run
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp, line 458]
main1 [c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp, line 1472]
main [c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp, line 1808]
WinMain [c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp, line 1826]
WinMainCRTStartup()
KERNEL32.dll + 0x1ba06 (0x77f1ba06)

I have been able to consistently crash with the steps from my incident
above...although I think you will only crash if one of the frame content is
opened up in a new window instead of the frame in the original window (read: if
2 new windows don't pop up when you load the url, you probably won't crash).
Added http://www.sonypictures.com/movies/meninblack/ to URL field, since we have
been able to reproduce at that URL.
yeah, i was able to repro this crash with the steps i used in bug 146027 comment
#24 consistently. this morning, i wasn't able to repro using arron's algorithm.
i pulled my trunk build at this time Jun 18 05:37. jay and i were talking and it
looks like it's popup windows related.
Here is an important question guys: 

from my experiments, i get the broken frame tree only when there is a popup
window around. (all the URLs i was able to test successfully so far bring up a
popup window)

Did anyone reproduce this (or its twin bug 146027) on URLs that do not pop up
a second window?

I downloaded a new nightly and my steps don't work anymore until I hit the back
button after I go to google.
No matter, if you have a way to get the crash, that's all that's important.

The MIB also has a frameset, just like the streetmap site does.  I can't get the
MIB site to crash (go figure), but I'm guessing that it does the two
InitialReflows for one frame like I said in Comment 34.

I'll leave this bug to the more capable hands of people who understand frame
trees and layout :)
Arron:

thanks for the help. yeah, i saw the same things you saw. it seems that all of
these things are needed in order to repro: a frameset, a popup, some animated gif. I
was able to repro the crash in many ways: refreshing, browsing to another page,
closing the window, and so on. In one of the testcases we totally fail to build
the frame tree, all that you can see there is that nice gray background. first i
will construct a local testcase because QA will need something to verify the fix
and i need something that does not disappear from one day to another :-) then
i'll proceed to the fix.
I can now reproduce the MIB crash.  I had to turn on "Open Unrequested Windows"
to make it crash though.  The streetmap one didn't require that setting turned
on.  Another thing that I may not have mentioned (clearly) is that I've found
when the crash happens, it's actually the preceding page that is causing it
(it's the one that has the bad frame tree).  That's why my testcase worked
(well, for me :P) only when I went to google (it could have been any page).

I've tried making a testcase, even by saving the pages that crashed my mozilla,
but once it was local, it didn't crash anymore.  I'm not sure if this hints to a
network library issue, or perhaps incremental loads (from the net) cause the bug
to appear, but instantaneous loads (from HD) do not.

Good luck :)
actually it shouldn't make a difference but it could (i mean network vs. local).
necko is used in both cases, but I still think the cause is pure layout. the
london streetmap testcase was not crashing on my machine.
I'm a long time user of Moz.
bug 153059 was submitted by me. And although I did give info there about what I
browse, I forgot another addition.
Comment #39 's "it looks like it's popup windows related" sparked a thought. I
already said in my bug that I browse 4 different forums. 2 of them have
advertisements, the other 2 don't. The 2 that do are a Snitz forum and a phpBB
forum, the ones that don't are vBB and Beehive (A new development looking like
Delphi, but started from scratch)

Where am I going to? I have set in my old and my new profiles that "open
unrequested windows" (pop-ups), "lower and raise windows" and "move or resize
windows" are off. I also use "start as: Blank page".

None of the forums I go to have pop-up ads, but I can get a consistent crash
when I open up a 4th tabbed browser window. And the 4th is usually one of the
forums without ads, but with frames. I do open the 4th tab from a link on one of
the other forums already open. It loads completely (I have a 1Mbit/256k ADSL
connection, no proxy), before I even have the chance to change it to the forum I
want to go to (either through typing it in in the address bar or through
bookmarks). 

My crashes were at first in <unknown> (the most useful crash in history ;))
Later they changed consistently to GKLAYOUT.DLL and as of late I also have
FULLSOFT.DLL in the list.

This comment may just add confusion to the ranks ;)
But I hope it helps in some ways.

*** Bug 153211 has been marked as a duplicate of this bug. ***
Let me report that I can crash Mozilla 2002061708 trunk for MacOS9.x at a URL
that does not have popup windows. It seems the URL contains an iframe that may be
causing the crash.
TB7567162Y crash occurred when I visited 
http://book.asahi.com/review/?info=b&no=1 ,
and then clicked links within the page such as;
http://book.asahi.com/review/?info=s&no=1
http://book.asahi.com/review/?info=s&no=2
http://book.asahi.com/review/?info=s&no=4 ,
 and then repeated going back and forward several times.
aha, hmmm, i do believe this is happening, however i have problems making it
crash. maybe there is a timing problem in the way i click the links.
hirata: i was _not_ able to crash using your steps 8(
weird...when i put some printf statements in, the crash doesn't happen anymore.
i took them out thinking i was crazy but, you know, it starts crashing again. i
spoke  to darinf and he says that that may indicate "event queuing issue".
printf's cause some delays in event queue processing. it is normal that a crash
disappears when you printf. happened to me a lot of times. at the beginning i
thought i was on crack but i wasn't. it points to the fact that multithreading is
involved. (our necko works multithreaded)
The leaked frames, at least on http://www.sonypictures.com/movies/meninblack/ ,
seem to be due to the assertion
NS_PRECONDITION(nsnull == mInitialContainingBlock, "initial containing block
already created");
in nsCSSFrameConstructor.cpp, line 14386.

That is happening because we're doing InitialReflow twice due to bryner's patch
for bug 138237 and we're leaking one of the frame trees.  I think backing out
that patch also changes the layout of the page (restores it, I think), although
my memory could be confusing me at this point.
Thanks David for confirming my comment in Comment #34 about InitialReflow being
called twice. 

I reverted the patch in bug 138237 and the MIB crash no longer occurs (I checked
2 times with the patch removed, 2 times patch in)

However, I still get that assertion firing when I browse streetmap.co.uk. 
InitialReflow in this case being called from the same place.  Partial stack as
follows:

PresShell::InitialReflow(PresShell * const 0x03bb86d0, int 15, int 15) line 2767
HTMLContentSink::StartLayout() line 3745
HTMLContentSink::OpenBody(HTMLContentSink * const 0x03bd59c0, const
nsIParserNode & {...}) line 2971
CNavDTD::OpenBody(const nsCParserNode * 0x03b75ec0) line 3156 + 31 bytes
CNavDTD::OpenContainer(const nsCParserNode * 0x03b75ec0, nsHTMLTag
eHTMLTag_body, int 1, nsEntryStack * 0x00000000) line 3401 + 12 bytes

So the two calls (the second one ending up in that assertion) appear to come
from the same place.  This does indeed sound like a timing issue.  Would reflow
start if one of the animated gifs fires off and says it has a new image frame to
display?  I'm thinking maybe the animated gif is firing off before InitialReflow
gets called, and it is causing an InitialReflow. Then when the page is really ready
(or <iframe>, in the case of streetmap), the second InitialReflow is called.
this crash (or bug 146027?) seems to be showing up on linux as
nsBox::Redraw (talkback data)
and
0x00000000 - (various)   (bug 153211 and talkback data)

this crash is also occurring on the branch (linux branch build 20020621)
Summary: Trunk M11A crash [@ nsImageListener::FrameChanged] → Trunk M11A crash [@ nsImageListener::FrameChanged] [@ nsBox::Redraw] [@ 0x00000000]
I take back what I said in Comment #53 (Sorry, I'm still learning)

streetmap does NOT call InitialReflow twice from the same place.  Whenever
there are two InitialReflows being called instead of one, the stack appears as
what this attachment is.  The second InitialReflow is the proper one.  I don't
know how the first one happens, but it shouldn't.  Almost looks like some content is
being added, which somehow results in an InitialReflow.
The extra InitialReflow I mention in Comment #55 is put in by Bug 52334
Stack Trace of MIB, confirming comment #52
ack, ok.  So, one thing we could do is add the check for |mDidInitialReflow|
inside InitialReflow(), instead of checking it at the call site.  Otherwise,
we'd have to add new virtual functions to ask the presshell if it's been
reflowed (since InitialReflow is called from the content library).

If we did this, would we want to do something else instead (if mDidInitialReflow
is already true), or should we just drop the reflow?  What if the width and
height are different from what it was originally reflowed with?
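
The first option (guarding inside InitialReflow and dropping the repeat call) might look roughly like this. It is a sketch with a hypothetical `PresShellSketch` class; only the names InitialReflow and mDidInitialReflow come from the comment above, and the open question about a differing width and height is deliberately left unanswered.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for the presshell.
class PresShellSketch {
public:
    // Returns true if this call actually performed the initial reflow.
    bool InitialReflow(int width, int height) {
        assert(width >= 0 && height >= 0);
        if (mDidInitialReflow) {
            // Guard lives here, so callers in other libraries need no
            // new virtual "has it been reflowed?" query.
            return false;
        }
        mDidInitialReflow = true;
        mWidth = width;
        mHeight = height;
        ++mFrameTreesBuilt;  // stands in for constructing the frame tree
        return true;
    }

    int FrameTreesBuilt() const { return mFrameTreesBuilt; }
    int Width() const { return mWidth; }
    int Height() const { return mHeight; }

private:
    bool mDidInitialReflow = false;
    int mWidth = 0;
    int mHeight = 0;
    int mFrameTreesBuilt = 0;
};

int FrameTreesAfterTwoCalls() {
    PresShellSketch shell;
    shell.InitialReflow(15, 15);    // early caller
    shell.InitialReflow(800, 600);  // later caller, silently dropped
    return shell.FrameTreesBuilt();
}
```

The guard guarantees a single frame tree, but note what it gives up: whichever caller arrives first wins, and the later caller's dimensions are discarded.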
Brian, I tried that with streetmap, and what happens is the <iframe>s are just 
holes in the page (background color where the iframe should be).  

We can't throw away the second call to InitialReflow (at least for streetmap, 
haven't tried it in MIB), because it's the second one that actually draws it 
(properly).

For each InitialReflow, I think we need to do something like

if (we are not planning an InitialReflow via CNavDTD::OpenBody)
  InitialReflow();

CNavDTD::OpenBody == (The "Second InitialReflow" stack trace in Attachment 
88789 [details] and 8878f)

Or maybe there's an easier way :P I'm just throwing ideas out
How about doing a ResizeReflow() if we've already done an InitialReflow?
bryner:

i think we have to change the way you handle the NS_GOTFOCUS in the presshell.
i have here a local test case that reproduces the crash, and it shows me that
the call to InitialReflow occurs too soon. How about deferring the focus message
and posting it on the event queue instead of reflowing?
i will try to see if i can post a focus message to the view manager instead of
initial reflowing
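
The deferral idea can be sketched with a plain FIFO of callbacks. This is an illustration under assumptions, not the Mozilla event queue API: `EventQueueSketch` and the event labels are invented.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded event queue.
class EventQueueSketch {
public:
    void Post(std::function<void()> event) {
        assert(event);
        mEvents.push(std::move(event));
    }

    void ProcessPending() {
        while (!mEvents.empty()) {
            std::function<void()> event = std::move(mEvents.front());
            mEvents.pop();
            event();
        }
    }

private:
    std::queue<std::function<void()>> mEvents;
};

std::vector<std::string> DeferredFocusOrder() {
    EventQueueSketch queue;
    std::vector<std::string> order;

    // Focus arrives first, but is queued instead of handled on the spot.
    queue.Post([&order] { order.push_back("focus"); });

    // The content sink's own initial reflow then runs synchronously.
    order.push_back("initial reflow");

    // The queued focus event is only handled on the next trip
    // through the event loop.
    queue.ProcessPending();
    return order;
}
```

In this model the focus handling always lands after the regular initial reflow, so it no longer has a reason to trigger one itself. The sketch does not cover the case where the shell is destroyed before the queued event runs.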
Brian:
Calling ResizeReflow() in InitialReflow if mDidInitialReflow is already true 
still causes the frame to become blank.

I believe Alexandru is on the right track with trying to defer some of the 
messages (so that the proper InitialReflow gets called before the bad ones try, 
if I'm not mistaken.)
That would work, but we'd need to make sure that the focus event is processed
prior to the presshell being destroyed (I'm thinking of the case described in
bug 138237.)

If this is too hairy, I could change it to just update the focus controller
state if we haven't done an initial reflow, and not worry about doing a reflow
or dispatching the event to the frames/content.  That would also fix 138237.
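
The simpler alternative proposed here (record the focus state, skip reflow and dispatch until the initial reflow has happened) could be sketched like this. All class and member names below are hypothetical; this is not the attached patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the focus controller state.
struct FocusControllerSketch {
    bool active = false;
};

// Hypothetical, heavily simplified stand-in for the presshell.
class FocusAwareShell {
public:
    explicit FocusAwareShell(FocusControllerSketch* controller)
        : mFocusController(controller) {
        assert(controller);
    }

    void InitialReflow() {
        mDidInitialReflow = true;
        ++mReflows;
    }

    void HandleGotFocus() {
        mFocusController->active = true;  // always record the state
        if (!mDidInitialReflow) {
            return;  // no frame tree yet: no reflow, no dispatch
        }
        ++mEventsDispatched;  // stands in for dispatching to frames/content
    }

    int Reflows() const { return mReflows; }
    int EventsDispatched() const { return mEventsDispatched; }

private:
    FocusControllerSketch* mFocusController;
    bool mDidInitialReflow = false;
    int mReflows = 0;
    int mEventsDispatched = 0;
};

struct FocusOutcome {
    bool active;
    int reflows;
    int dispatched;
};

FocusOutcome FocusBeforeInitialReflow() {
    FocusControllerSketch controller;
    FocusAwareShell shell(&controller);
    shell.HandleGotFocus();  // focus arrives before any reflow
    return {controller.active, shell.Reflows(), shell.EventsDispatched()};
}
```

An early focus event then leaves the controller marked active without ever causing a reflow, so the only initial reflow is the one the content sink requests.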
agree. give it a try.
Attached patch: patch
This needs a fair amount of testing, but as far as I can tell it fixes the new
tab focus problem and doesn't cause the regression in this bug.
Comment on attachment 88801
patch

congratulations! tested with my testcase that i will attach for QA. please
verify bug 138237 too.

r= alexsavulov
Attachment #88801 - Flags: review+
Attached file packed testcase for qa
unpack to temp, open test.html. (based on the window.open setting in prefs you
will/won't get to see a popup). if you don't see an image in stest.html window,
the bug is there. if you hit refresh the app will crash with the stack.
btw: the testcase in attachment 88809 is ZIP packed
excellent!  1 down, hopefully only 1 to go.  Now we just need to stop the 2 
InitialReflows in the case of streetmap.co.uk (which doesn't do popups, but 
has a couple of misbehaving <iframe>s)
what, that one still crashes? oh, ye f_ing gods! (is what H.S.Thompson would say
to this ;-)
This is the crash stack you get whenever we leak frames that contain (animated?)
images inside them.
but that is a different problem than the one caused by the additional InitialReflow
from bug 138237. applying bryner's patch fixes that problem (can be tested with
attachment 88809; if you don't apply the patch you get the stack with the image
listener calling FrameChanged on the leaked image frame)

i cannot reproduce that http://streetmap.co.uk crash at all!!! i learned the
streets of london (the song ;-) by the time i tried to reproduce that
crash. does that happen on all platforms? please apply bryner's patch 88801 and
test with that. if the crash is still reproducible, then we might consider opening
additional bugs. besides that 
hmm, what i meant is:

the attachment 88809 reveals the crash with the stack 

nsImageListener::FrameChanged
[nsImageFrame.cpp  line 2383] 

....

bryner's patch 88801 repairs that.

so after applying this patch is the crash on http://streetmap.co.uk still there?
awright! i was able to crash!!! so bryner's patch solves only one part of the
problem. i'm converted and i believe you guys now! the big problem is that i
needed to traverse england W->E to get the crash :-) can someone try to get a
time reduced testcase please? :-)
Try this:
http://www.animecity.nu/mozilla/lynxos/index.html

There are a few rules that seem to be required:

- One <script src="..."> inside iframe.html
- Animated gif inside iframe.html
- One <script src="..."> inside index.html
- local src doesn't seem to always work

I hope it crashes for you.  

If it does, then the testcase is as follows:

- index.html with one script src and one iframe opening to iframe.html
- iframe.html with one script src
- both script files are empty (one blank line)

On a side note, without the animated gif, this testcase would be the same 
problem exhibited in the URL field of Bug 148827 (not the testcase in that bug, 
which appears to be a slightly different problem)
*** Bug 153799 has been marked as a duplicate of this bug. ***
I've found a crash testcase that doesn't involve iframes or scripting at all.

http://jslab.org/mozilla/xsldom.xml

Not 100% reproducible.  XML/XSLT.
Comment on attachment 88801
patch

sr=blake
Attachment #88801 - Flags: superreview+
I checked this patch into the trunk.  Leaving the bug open since there are other
causes for this crash.
will open a new bug for the remaining issue.
bryner,

let's close this bug fixed so that it can be verified and so that you can check
it on the branch too. i reassign it to you since you fixed it. thanks!

all,

let's continue discussions on bug 153815, so we have separate issues. please
feel free to transfer any important information to that bug.

Assignee: alexsavulov → bryner
Ok, resolving as fixed, since this is checked into the trunk, and nominating for
branch approval.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
please checkin to the 1.0.1 branch. once there, remove the "mozilla1.0.1+"
keyword and add the "fixed1.0.1" keyword.
petersen, can you verify this on the trunk?
I no longer crash at the MIB website with MozillaTrunk builds 2002062508.
Talkback data also shows this stack signature last showing up with MozillaTrunk
builds from 6/22.  

Fix looks good on the MozillaTrunk for the one case we are worried about here. 
Marking verified.

Other issues with similar crashes are being dealt with in bug 153815.
Status: RESOLVED → VERIFIED
Adding adt1.0.1+ on behalf of the adt for checkin to the 1.0 branch.  Please
checkin to the branch asap. When you check this into the branch, please change
the mozilla1.0.1+ keyword to fixed1.0.1
Keywords: adt1.0.1 → adt1.0.1+
checked into the branch.
Verified in OS X (2002-07-16-05) and Windows ME (2002-07-16-08) branch build.
Keywords: verified1.0.1
No longer blocks: 146027
Crash Signature: [@ nsImageListener::FrameChanged] [@ nsBox::Redraw] [@ 0x00000000]