M091 leaking PresShell leaves live frame tree without view tree - Trunk & N610

VERIFIED FIXED in mozilla0.9.4

Status

()

defect
--
critical
VERIFIED FIXED
18 years ago
18 years ago

People

(Reporter: timeless, Assigned: dbaron)

Tracking

({crash, topcrash})

Trunk
mozilla0.9.4
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

()

Attachments

(11 attachments)

11.79 KB, text/plain
Details
9.83 KB, text/plain
Details
12.06 KB, image/gif
Details
293 bytes, text/html
Details
10.64 KB, patch
Details | Diff | Splinter Review
6.33 KB, text/plain
Details
11.67 KB, text/plain
Details
48.51 KB, patch
Details | Diff | Splinter Review
17.68 KB, patch
Details | Diff | Splinter Review
37.03 KB, patch
Details | Diff | Splinter Review
39.96 KB, patch
Details | Diff | Splinter Review
Reporter

Description

18 years ago
solaris7sparc xlib w/ 3 patches [none near layout]
browsing zdnet
Program received signal SIGSEGV, Segmentation fault.
0xfc90f5a0 in ?? () from 
/tmp/obj-sparc-sun-solaris2.7/dist/bin/components/libgklayout.so
(gdb) bt
#0  0xfc90f5a0 in nsFrame::Invalidate (this=0xa51b04, aPresContext=0x9490f0,
    aDamageRect=@0xffbef058, aImmediate=0) at 
/tmp/mozilla/layout/html/base/src/nsFrame.cpp:2181
#1  0xfc92b9ac in nsImageFrame::FrameChanged (this=0xa51b04, 
aContainer=0x912950,
    aPresContext=0x9490f0, aNewFrame=0xa2ea58, aDirtyRect=0xffbef300)
    at /tmp/mozilla/layout/html/base/src/nsImageFrame.cpp:434
#2  0xfc930a60 in nsImageListener::FrameChanged (this=0xa51128, 
aContainer=0x912950,
    aContext=0x9490f0, newframe=0xa2ea58, dirtyRect=0xffbef300)
    at /tmp/mozilla/layout/html/base/src/nsImageFrame.cpp:1644
#3  0xfd709990 in imgRequestProxy::FrameChanged (this=0xa2b9a8, 
container=0x912950, cx=0x0,
    newframe=0xa2ea58, dirtyRect=0xffbef300)
    at /tmp/mozilla/modules/libpr0n/src/imgRequestProxy.cpp:260
#4  0xfd705550 in imgRequest::FrameChanged (this=0xa31df0, container=0x912950, 
cx=0x0,
    newframe=0xa2ea58, dirtyRect=0xffbef300) at 
/tmp/mozilla/modules/libpr0n/src/imgRequest.cpp:375
#5  0xfd6feb8c in imgContainer::Notify (this=0x912950, timer=0x0)
    at /tmp/mozilla/modules/libpr0n/src/imgContainer.cpp:428
#6  0xfcf4261c in nsTimerXlib::Fire (this=0xb70fa0)
    at /tmp/mozilla/widget/timer/src/unix/xlib/nsTimerXlib.cpp:211
#7  0xfcf427ac in nsTimerXlib::ProcessTimeouts (array=0x516ca0)
    at /tmp/mozilla/widget/timer/src/unix/xlib/nsTimerXlib.cpp:281
#8  0xfcf42be0 in NS_ProcessTimeouts (aDisplay=0x1667e0)
    at /tmp/mozilla/widget/timer/src/unix/xlib/nsTimerXlib.cpp:438
#9  0xfe3a1ba0 in CallProcessTimeoutsProc (aDisplay=0x1667e0)
    at /tmp/mozilla/widget/src/xlib/nsAppShell.cpp:230
#10 0xfe3a28dc in nsAppShell::Run (this=0xf4120) at 
/tmp/mozilla/widget/src/xlib/nsAppShell.cpp:474
#11 0xfdf3c5c0 in ?? () from 
/tmp/obj-sparc-sun-solaris2.7/dist/bin/components/libnsappshell.so
#12 0x1b630 in main1 (argc=3, argv=0xffbef92c, nativeApp=0x0)
    at /tmp/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1010
#13 0x1c530 in main (argc=3, argv=0xffbef92c) at 
/tmp/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1311
(gdb) print view
$5 = (nsIView *) 0xa31500
(gdb) print *view
$6 = {<nsISupports> = {_vptr. = 0x0}, <No data fields>}

This is wierd, because view isn't an nsCOMPtr.  [refcounting problems/pointing 
to freed memory?]

markh says that the blind cast in 1982 looks suspect.
Severity: normal → critical
Keywords: crash
Reporter

Updated

18 years ago
Blocks: 79119

Comment 1

18 years ago
timeless, please add the particular URL that you crashed on?  Also, what is the
behavior of that URL on Linux?  It will be easier to debug this for us if we can
get a reproducible crasher test case on either Windows, Mac, or Linux.  Thanks!

Shiva, would you check the latest Talkback reports to see if the stack reported
on this bug is a top crash?  Thanks!
Keywords: qawanted
This stack signature is not appearing in the topcrash in Trunk or M09. We are
not actively tracking Sun Solaris crashes.
Reporter

Comment 3

18 years ago
Sorry, my sessions self destruct w/in 24 hours, we'll have to try to figure out 
what we're clobbering.  Or i'll have to figure out how to get purify working.
I've seen this twice recently on Linux, and I'll attach some info from the
debugger.  I think it may be image-code related, although I'm not sure.  It
seems like the frame could be deleted by this point, and since the frame's
mParent looks like garbage, I'm suspicious that this may be calling through a
garbage frame.  (I'm not sure what a deleted frame would look like, having been
allocated in an arena, although I'd think the vtable pointer would still get set
to null.)

I think nsImageFrame should still null the frame in its nsImageListener when it
is destroyed, although I suspect that won't fix this bug (although it might).


views aren't refcounted, so it makes no sense to hold them in nsCOMPtrs. 
(retitling appropriately)
Summary: Segfault in nsFrame::Invalidate non nsCOMPtr view is garbage (null nsCOMPtr) → Segfault in nsFrame::Invalidate on garbage/deleted view
Never mind my comment about the image listener...  I missed the method in the
header file when I was searching for places where mFrame could be set.
Hrm.  A destroyed frame should have its vtbl pointer clobbered.  So this makes
me think something else is overwriting the memory.  Or else I'm misinterpreting
the debugger output...
In ftp://ftp.mozilla.org/pub/data/crash-data/detailed-crash-analysis-all.html I
found 2 talkback reports of this crash, one on Linux and one on Windows.

ID: 30287919, build 2001-05-09-22, Linux
ID: 30158630, build 2001-05-08-06, Windows

The latter shows the crash as happening (I think) on the
  frame->GetView(aPresContext, aView);
line in nsFrame::GetOffsetFromView.
OS: Solaris → All
Hardware: Sun → All

Comment 9

18 years ago
I'm seeing this crash too. Only recently -- within the past couple of days --
but *very* frequently. I snooped around in gdb and saw pretty much the same
stuff dbaron posted. The one possibly interesting piece of info I have in
addition is, of the several times I saw this crash, the data in memory where
view should have been living was entirely different each time. A few times it
was different binary junk, one other time it was a string holding an URL.

I initially tried to fix this naively by nsCOMPtr'ing it, but this was my first
foray into the mystical world of views and frames... They're not refcounted.
Ouch. But I can't vouch for dbaron's theory that the memory is getting stomped
over -- as far as I can tell, the view is getting free'd prematurely...

This could be a case for adding refcounting to views...

Regardless, setting keywords (adding "topcrash" as well, just given the
ridiculous frequency I've hit this). Also, reassigning to default owner of
layout.
Assignee: nisheeth → karnaze
Reporter

Comment 11

18 years ago
*** Bug 80576 has been marked as a duplicate of this bug. ***

Comment 12

18 years ago
I get the same stack trace as this with the attached test case in bug #80571. Dupe ?

Comment 13

18 years ago
80571 does not crashes any more.
Please supply the url where the crash occurs for this bug.
Keywords: donttest
I crash after browsing http://nytimes.com/ .  It happens frequently, but I don't
see a way to reproduce 100% of the time.  (And it crashes enough that I've
stopped using Mozilla and gone back to a combination of Opera 5 and Netscape 4.x.)

Comment 16

18 years ago
Verified on
build: 2001-05-14-04-Trunk
platform: WinNT

Marking this as WFM, as it does not crashes for me on http://nytimes.com.
Also, see my previous comments.
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → WORKSFORME
Reopening.  You have to browse around for a bit.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
This is not the type of crash that can be reproduced 100% of the time.  However,
it's still a bug.  If the browser crashes after 5 minutes of usage on site X,
it's a bug, even if the crash can't be reproduced every time.

Comment 20

18 years ago
Yeah I get this crash many times per day now.  Nightly builds from today and the
last few days have an mtbf of about 5 minutes.  Also this is not limited to
Xlib, there are a couple of stacks here using GTK+.

Comment 21

18 years ago
Clicking on "flowers" on the left side of the following page still crashes 100%
reproducably for me in nsFrame::Invalidate.
http://www.toronto.com/E/F/TORON/0020/07/35/
Linux (Redhat 7) 

Comment 22

18 years ago
Yes, that "Flowers" link crashes every time, stacktrace is same every time.
Linux (rh 7.0), build from 2001/05/15

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 11889)]
0x40ddb98c in nsFrame::Invalidate ()
(gdb) bt
#0  0x40ddb98c in nsFrame::Invalidate ()
#1  0x40de990e in nsImageFrame::FrameChanged ()
#2  0x40dec248 in nsImageListener::FrameChanged ()
#3  0x40d705ad in imgRequestProxy::FrameChanged ()
#4  0x40d6efac in imgRequest::FrameChanged ()
#5  0x40d6cdb3 in imgContainer::Notify ()
#6  0x40f50225 in nsTimerGtk::FireTimeout ()
#7  0x40f5041e in process_timers ()
#8  0x40f504b3 in TimerCallbackFunc ()
#9  0x40676983 in g_timeout_dispatch ()
#10 0x40675987 in g_main_dispatch ()
#11 0x40676001 in g_main_iterate ()
#12 0x406761cc in g_main_run ()
#13 0x4058ce57 in gtk_main ()
#14 0x404cc2a2 in nsAppShell::Run ()
#15 0x404ad976 in nsAppShellService::Run ()
#16 0x0804d87b in main1 ()
#17 0x0804e117 in main ()
#18 0x401efb5c in __libc_start_main ()
Whiteboard: critical to mozilla 0.9.1
What is going on with this bug?

Comment 24

18 years ago
I'm looking into it for karnaze - seems I am the layout-image guy nowadays.
Assignee: karnaze → attinasi
Status: REOPENED → NEW
I crash 100% of the time just after leaving the "simple testcase" by clicking on
the imagemap link to mozilla.org.

Comment 28

18 years ago
Nice work timeless!  Crashes on my Linux nightly 100%.  This explains why I see
the crashes I see at seemingly innocent times.  They are happening after I LEAVE
a page with the offending content.

Comment 29

18 years ago
Thanks for the testcase, David. It will crash my Linux build every time, but not
my Mac build for some reason. 
I think this bug may be caused by the view tree having been destroyed by the
destruction of the view manager but the frame tree still being kept alive.
If that's the case, it could be caused by someone leaking a PresShell.  The
solution could be to move some of what's in ~nsPresShell into
nsPresShell::Destroy and call nsPresShell::Destroy from
DocumentViewerImpl::Destroy.
PresShell::EndObservingDocument already exists, so we wouldn't need to create
PresShell::Destroy.
Moving the entire contents of the destructor of PresShell into
PresShell::EndObservingDocument fixes the problem, as I suspected.  This is a
good thing anyway because it means the lifetime of the frames is driven entirely
by user input and never waits for destructors that might wait for GC, cycles
that are never broken, etc.

However, doing this also messes up hyatt's hack to display the old document
while loading the new, I think because he didn't listen to my advice on bug
76495 and move the call to DocumentViewerImpl::Destroy to right before he lets
go of the doc viewer.

Comment 33

18 years ago
Good information, David. So, do you want to take this, give it to Hyatt, or
what? I'm miles behind you here...

Comment 36

18 years ago
This looks okay to me. Hyatt, what do you think?
This patch causes some DOM problems (e.g., the DOM code on tinderbox) and causes
a JS error when bringing up view-source.  I'll investigate why that is tomorrow...
The source of the leaked presshell was in IBMBIDI code.  I think the reason this
affects Linux the most is because of autocopy.

Fixing this leak will probably quiet the bug down, although I also do want to
fix it so that if we leak a pres shell on a page with animated images we don't
crash.

Index: nsCopySupport.cpp
===================================================================
RCS file: /cvsroot/mozilla/layout/base/src/nsCopySupport.cpp,v
retrieving revision 1.4
diff -u -d -r1.4 nsCopySupport.cpp
--- nsCopySupport.cpp	2001/04/13 22:07:57	1.4
+++ nsCopySupport.cpp	2001/05/18 02:29:39
@@ -92,7 +92,7 @@
   
   nsCOMPtr<nsIDocument> doc = do_QueryInterface(aDoc);
   if (doc) {
-    nsIPresShell* shell = doc->GetShellAt(0);
+    nsCOMPtr<nsIPresShell> shell(dont_AddRef(doc->GetShellAt(0)));
     if (shell) {
       nsCOMPtr<nsIPresContext> context;
       shell->GetPresContext(getter_AddRefs(context) );
*** Bug 80355 has been marked as a duplicate of this bug. ***
*** Bug 81264 has been marked as a duplicate of this bug. ***
dbaron's patch fixes two "crash on sending email" bugs on linux.

I've marked them dups of this bug.

r=sspitzer on that last patch.  thanks dbaron.

Comment 42

18 years ago
*** Bug 81432 has been marked as a duplicate of this bug. ***

Comment 43

18 years ago
*** Bug 80571 has been marked as a duplicate of this bug. ***
OK, leak fix is checked in to stop the crash.

Leaving this open to fix the problem that if we leak a pres shell without a doc
viewer, we have a frame tree with no view tree.

Comment 46

18 years ago
You should not remove the SetPreviousViewer line.  Otherwise an entire chain of 
viewers can be built up and you'll never release any (e.g., when you're running 
the i-bench tests which never unsuppress since they change pages during the 
onload handler).

Comment 47

18 years ago
Destroy() should not tear down the view tree and should be safe to call from 
the docshell.  By delaying the Destroy() you don't disconnect the scripts, and 
that's even more dangerous than the way it is now.

I'm suspecting that jst's XPCDOM landing changed the timing of some things and 
we just need to adjust for that.

Comment 48

18 years ago
Well, this is obviously from the code I wrote, so taking the bug. :)
Assignee: attinasi → hyatt
Target Milestone: --- → mozilla0.9.1

Updated

18 years ago
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Retitling to reflect the problem that remains (since we're not leaking the pres
shell anymore).
Summary: Segfault in nsFrame::Invalidate on garbage/deleted view → leaking PresShell leaves live frame tree without view tree
Reporter

Comment 50

18 years ago
hyatt: a reminder that this bug is marked critical for mozilla0.9.1 and you 
just changed the milestone away from that.  Because you're a netscape employee 
I'm guessing you need to work on things that are nsbeta1 so nominating.
Keywords: nsbeta1

Comment 51

18 years ago
Timeless, dbaron checked in a fix already for the crasher.  That's why I think 
this can wait for 0.9.2.

Comment 52

18 years ago
This is related to a problem reported in bug 80382 (changing DPI setting crashes 
on Linux). dbaron's fix fixed that one too, and the crash is in the presShell, 
in a reflowevent using a presShell actually. This may be a more general problem 
when presShell's are leaked.

Also, clearing status whiteboard as it no longer applies since the crash was 
fixed.
Whiteboard: critical to mozilla 0.9.1
Reporter

Comment 53

18 years ago
ok, cleaning out keywords.
No longer blocks: 79119
Severity: critical → major

Comment 54

18 years ago
putting [@ nsFrame::Invalidate] back into summary so we can keep track of this 
"topcrash".  i understand that the crash has been fixed, but the talkback team 
would like to keep the stack signature in the summary for tracking.  

this crash last appeared in large numbers in talkback data with build 
2001051722...so leaving the crash and topcrash keywords out.
Summary: leaking PresShell leaves live frame tree without view tree → leaking PresShell leaves live frame tree without view tree [@ nsFrame::Invalidate]
I think this one (or something closely related; problems with views being
destroyed) is still with us.  I hit a crash using a FreeBSD4.1 Mozilla
20010529xx build that appears to be a shell has a view that was freed.  See the
backtrace.  The view's vptr points to the string
"chrome://global/content/bindings/scrollbar.xml#scrollbarbutton.accDoDefaultAction()".

Note: I have changes in my tree (while playing with bug 80327) in this function;
the section of code around the crash looks like this (==> is the crash point):

  nsIView* view;

  GetView(aPresContext, &view);
  if (view) {
    view->GetViewManager(*getter_AddRefs(viewManager));
    if (viewManager) {
      viewManager->UpdateView(view, damageRect, flags);
    }
  } else {
    nsRect    rect(damageRect);
    nsPoint   offset;
  
    GetOffsetFromView(aPresContext, offset, &view);
    NS_ASSERTION(nsnull != view, "no view");
    rect += offset;
==> view->GetViewManager(*getter_AddRefs(viewManager));
    if (viewManager) {
      viewManager->UpdateView(view, rect, flags);
    }
  }

The assertion did not fire (view isn't null; just deallocated).

I was at http://www.planethalflife.com/features/mailbag; I think I had just
switched to that screen and then exposed the window (it was partially covered).

Adding back crash.  Topcrash may be wanted too.
Keywords: crash
BTW, this crash is repeatable by going to http://www.planethalflife.com and
clicking on MailBag (in the left column).

Comment 58

18 years ago
The http://www.planethalflife.com/features/mailbag link (and navigating to it)
works OK for me using the 0531 build on Win2K - no crash, no problems. Also
tested it with a CVS build from this morning, no problem there either. Once my
linux build finishes I'll try that.

Comment 59

18 years ago
IMO the presshell should just own a ref to the view tree.  Then the leak won't 
create any sort of bad situation.
That would solve some of the related problems, but not all of them.  For
example, I suspect the underlying issue here of causing bugs like bug 80327 and
bug 83613.  (And I also suspect it may be related to the bugs where content
continues appearing after it shouldn't anymore.)
Guys - I think this is a real problem.  Since this is a "memory gets reused
after free" bug, the behavior will vary wildly under different OS's and
allocators.  In order to truely test this, you need to use some tool or system
option that trashes memory on free().  WFM on another OS really doesn't tell us
if the bug is there or not.  I've made a number of clobber builds, and checked
every line of every diff I have (almost all are for 80327 - making
nsIViewManager *'s into nsCOMPtr's and checking for nsnull's).

It doesn't crash 100% of the time for me; only about 80.  It also crashes on
other sites and causes random crashes.
upping severity.  Retested with bits from today, problem is still there.
removing donttest.

I tried using MALLOC_OPTIONS=J in FreeBSD (trash allocated and freed memory to
d0d0d0d0 etc), but the same problem.  The memory _is_ being reallocated as
something else.  I'm tempted to run it with dmalloc or some such, but I'm afraid
of what I'll find...  (dmalloc has an option (compile-time) to "hold" freed
memory for some amount of time before allowing it to be re-allocated, so that
you can more easily find reuse-of-freed-memory bugs.  http://www.dmalloc.org)

dmalloc with a FREED_POINTER_DELAY of 100 (or more; I use 1000) instead of 10
(default) will use a lot more memory, but it will also be more likely to catch
reuse of freed memory.  It also checks freed memory for modification on
allocation and periodically while free.
Severity: major → critical
Keywords: donttest

Comment 63

18 years ago
Adding M091 to the summary and topcrash keyword for tracking. This one is still
living in the talkback M091 Windows reports from 6/12.
Keywords: topcrash
Summary: leaking PresShell leaves live frame tree without view tree [@ nsFrame::Invalidate] → M091 leaking PresShell leaves live frame tree without view tree [@ nsFrame::Invalidate]
Ok, I have some more info.  Dmalloc merely proves that it is a
memory-reallocation case (i'll keep trying with it, and release some dmalloc
patches for the build process).  With various debugs I've managed to (I believe)
get a stack trace of where the view is being freed (GDB trace will be attached):

nsView::Destroy
nsViewManager::Release
DocumentViewerImpl::~DocumentViewerImpl
DocumentViewerImpl::Release
nsCOMPtr<nsIContentViewer>::assign_with_AddRef
DocumentViewerImpl::SetPreviousViewer
PresShell::UnsuppressAndInvalidate
PresShell::UnsuppressPainting
PresShell::sPaintSuppressionCallback
nsTimerGtk::FireTimeout
...

This occurs when a whole bunch of views are deleted, after the new views are
created (I assume for the new page (mailbag)), and before we get the "Document
... loaded successfully".  (I put debugs on view creation/deletion printing
pointer values, then looked back to see when that view was deleted.)

I don't crash 100% of the time on the page load, sometimes it's when I go back
to the previous page (though the errant view delete occured while loading the
mailbag page).

Again, the sequence is goto http://www.planethalflife.com, scroll down one page,
click on Mailbag in the left column.  (and then maybe click back)  It may be
much harder to reproduce at will in Win32 or Linux due to a different allocator,
however apparently crashes in Invalidate are common in the field (from
talkback).

I suspect it's a problem either with the SetPreviousViewer stuff, or with
something being kept around when it should have been released.

The crashes are all apparently for animations - it's as if animations from
previous pages aren't always removed properly, or their timers aren't.

I think the summary is now may be quite wrong.

But isn't the reason the timer wasn't unset earlier that the pres shell was
leaked?  I don't see why the comments above imply that the summary is wrong.
Hmm.  I didn't realize that the timer was tied into the presshell.  I'm rather
confused as to the relationship of views, viewmanagers, appshells, presshells,
images, documents, etc.  I didn't think that the view would be destroyed if the
presshell still knew about it (or about the viewmanager).  As hyatt mentioned,
perhaps part of the problem is that presshell's have weak references to
viewmanagers, so the viewmanager (and views) can disappear from under it.  That
would solve the crash, though it wouldn't solve the leaking presshell.

After all, isn't this what COM is for, to track reference counts?  The presshell
really does need a reference to the viewmanager, and if that creates loops then
resolve them.  The comment in nsPresShell.cpp is this:

  nsIViewManager* mViewManager;   // [WEAK] docViewer owns it so I don't have to

The docview has in fact gone away, of course (see my gdb trace).  IMHO, it seems
to me like weak references are just fodder for bugs (if not initially, then when
someone who doesn't know about it mods the code).  If there's a circular
reference, there should be some other way to break it.
I think I'm also seeing a crash in nsCaret.cpp::GetViewForRendering() during
shutdown due to this (a freed/reallocated view).
Mozilla 20010620xx

Updated

18 years ago
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Is there any progress on this, or anyone working on it currently?  This is a
topcrash bug which I hit frequently.
I'm not quite sure why you're still seeing it and nobody else is.  Maybe you
could try to figure out why you're leaking pres shells?  (If you're familiar
with refcount logging, you could use it after applying the patch in attachment
35072 [details] [diff] [review].)

My understanding is that this is blocked on rewriting the way paint suppression
works, which I think is something hyatt is planning to do sometime in the next
few months or so.  Once paint suppression works differently (which I think
involves making the global window object a proxy for a per-document object or
something like that) then we can do something like my patch in attachment 35072 [details] [diff] [review]
to fix the underlying issue here.  (There *might* be some other hack to fix it
that wouldn't leak, although I'm not sure.)
I suspect that the memory allocator under BSD is more likely to re-allocate the
freed view than Linux or Win32.  Note: I'm still doing a new build; I just got
back from 2 weeks of vacation and was wondering if there'd been any progress. 
I'll re-test as soon as I get a good build.

Also, there was this note:

------- Additional Comments From David Hyatt 2001-05-18 15:04 -------

Timeless, dbaron checked in a fix already for the crasher.  That's why I think 
this can wait for 0.9.2.
------

However, as noted in responses I made shortly afterwards, I still crash, and I
just looked and the your patch (the attachment) has NOT been checked in.
I don't think this has a thing to do with the memory allocator under BSD.  We
crash quite hard on Linux when we leak a pres shell on a page with animated
images.  The question is why you are leaking pres shells, or, if you aren't,
what else is causing the crash.

My large patch wasn't checked in because it breaks paint suppression and it
hasn't been very well tested.

As far as I can tell, you are the *only* one seeing the crash, and unless you
can gather more data on what is happening (such as why you're the only one
seeing it or what is actually happening in the code), it's probably not going to
be fixed.  Complaints won't fix the crash.  We're not going to be able to do the
long term fix easily, so it's not going to happen immediately.
I thought (from the comment below) that this was being seen often in talkback
reports.  It doesn't sound like I'm the only one having this problem.  If I am,
I'll put more work into trying to find out what's happening.  Unfortunately,
there's no stack traceback code in nsTraceRefcnt.cpp for FreeBSD, which makes
things harder.  I'm willing to run any tests anyone wants.  I'm about to try
your patch.

------- Additional Comments From greer@netscape.com 2001-06-12 15:23 -------

Adding M091 to the summary and topcrash keyword for tracking. This one is still
living in the talkback M091 Windows reports from 6/12.

Comment 74

18 years ago
The latest Talkback data only shows a few of these crashes...so it's not really 
a "topcrasher", but it's still definitely happening on both the Trunk and the 
N610 branch.

Here is the latest Trunk crash:

Incident ID 32778341 
 Stack Signature
              nsFrame::Invalidate ac4f7d66 
 Bug ID
 Trigger Time 
              2001-07-11 12:33:52 
 User Comments 
              searching for newsgroups to subscribe to 
 Build ID
              2001071107 
 Product ID
              MozillaTrunk 
 Platform ID
              Win32 
 Stack Trace

nsFrame::Invalidate 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsFrame.cpp, line 2172] 
nsOutlinerBodyFrame::InvalidateRow 
[d:\builds\seamonkey\mozilla\layout\xul\base\src\outliner\src\nsOutlinerBodyFram
e.cpp, line 559] 
nsOutlinerSelection::SetCurrentIndex 
[d:\builds\seamonkey\mozilla\layout\xul\base\src\outliner\src\nsOutlinerSelectio
n.cpp, line 556] 
nsOutlinerSelection::Select 
[d:\builds\seamonkey\mozilla\layout\xul\base\src\outliner\src\nsOutlinerSelectio
n.cpp, line 330] 
XPTC_InvokeByIndex 
[d:\builds\seamonkey\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp, 
line 139] 
XPCWrappedNative::CallMethod 
[d:\builds\seamonkey\mozilla\js\src\xpconnect\src\xpcwrappednative.cpp, line 
1883] 
XPC_WN_CallMethod 
[d:\builds\seamonkey\mozilla\js\src\xpconnect\src\xpcwrappednativejsops.cpp, 
line 1253] 
js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 809] 
js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 2702] 
js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 825] 
js_InternalInvoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 897] 
JS_CallFunctionValue [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 3322] 
nsJSContext::CallEventHandler 
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSEnvironment.cpp, line 944] 
nsJSEventListener::HandleEvent 
[d:\builds\seamonkey\mozilla\dom\src\events\nsJSEventListener.cpp, line 140] 
nsXBLPrototypeHandler::ExecuteHandler 
[d:\builds\seamonkey\mozilla\content\xbl\src\nsXBLPrototypeHandler.cpp, line 
431] 
DoMouse [d:\builds\seamonkey\mozilla\content\xbl\src\nsXBLMouseHandler.cpp, line 
102] 
nsXBLMouseHandler::MouseDown 
[d:\builds\seamonkey\mozilla\content\xbl\src\nsXBLMouseHandler.cpp, line 108] 
nsEventListenerManager::HandleEvent 
[d:\builds\seamonkey\mozilla\content\events\src\nsEventListenerManager.cpp, line 
1251] 
nsXULElement::HandleDOMEvent 
[d:\builds\seamonkey\mozilla\content\xul\content\src\nsXULElement.cpp, line 
3633] 
PresShell::HandleEventInternal 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5637] 
PresShell::HandleEvent 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5559] 
nsView::HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 377] 
nsView::HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 350] 
nsView::HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 350] 
nsViewManager::DispatchEvent 
[d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 2056] 
HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 68] 
nsWindow::DispatchEvent 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 724] 
nsWindow::DispatchWindowEvent 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 741] 
nsWindow::DispatchMouseEvent 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 4242] 
ChildWindow::DispatchMouseEvent 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 4489] 
nsWindow::ProcessMessage 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 3233] 
nsWindow::WindowProc 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 989] 
USER32.DLL + 0x2e98 (0x77e12e98) 
USER32.DLL + 0x30e0 (0x77e130e0) 
USER32.DLL + 0x5824 (0x77e15824) 
nsWebShellWindow::ShowModal 
[d:\builds\seamonkey\mozilla\xpfe\appshell\src\nsWebShellWindow.cpp, line 1085] 
nsContentTreeOwner::ShowAsModal 
[d:\builds\seamonkey\mozilla\xpfe\appshell\src\nsContentTreeOwner.cpp, line 393] 
nsWindowWatcher::OpenWindowJS 
[d:\builds\seamonkey\mozilla\embedding\components\windowwatcher\src\nsWindowWatc
her.cpp, line 701] 
GlobalWindowImpl::OpenInternal 
[d:\builds\seamonkey\mozilla\dom\src\base\nsGlobalWindow.cpp, line 3136] 
GlobalWindowImpl::OpenDialog 
[d:\builds\seamonkey\mozilla\dom\src\base\nsGlobalWindow.cpp, line 2339] 
XPTC_InvokeByIndex 
[d:\builds\seamonkey\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp, 
line 139] 
XPCWrappedNative::CallMethod 
[d:\builds\seamonkey\mozilla\js\src\xpconnect\src\xpcwrappednative.cpp, line 
1883] 
XPC_WN_CallMethod 
[d:\builds\seamonkey\mozilla\js\src\xpconnect\src\xpcwrappednativejsops.cpp, 
line 1253] 
js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 809] 
js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 2702] 
js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 825] 
js_InternalInvoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 897] 
JS_CallFunctionValue [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 3322] 
nsJSContext::CallEventHandler 
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSEnvironment.cpp, line 944] 
nsJSEventListener::HandleEvent 
[d:\builds\seamonkey\mozilla\dom\src\events\nsJSEventListener.cpp, line 140] 
nsEventListenerManager::HandleEventSubType 
[d:\builds\seamonkey\mozilla\content\events\src\nsEventListenerManager.cpp, line 
1162] 
nsEventListenerManager::HandleEvent 
[d:\builds\seamonkey\mozilla\content\events\src\nsEventListenerManager.cpp, line 
1332] 
nsXULElement::HandleDOMEvent 
[d:\builds\seamonkey\mozilla\content\xul\content\src\nsXULElement.cpp, line 
3633] 
PresShell::HandleEventInternal 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5632] 
PresShell::HandleEventWithTarget 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5604] 
nsEventStateManager::CheckForAndDispatchClick 
[d:\builds\seamonkey\mozilla\content\events\src\nsEventStateManager.cpp, line 
2456] 
nsEventStateManager::PostHandleEvent 
[d:\builds\seamonkey\mozilla\content\events\src\nsEventStateManager.cpp, line 
1542] 
PresShell::HandleEventInternal 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5653] 
PresShell::HandleEvent 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5559] 
nsView::HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 377] 
nsViewManager::DispatchEvent 
[d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 2056] 
HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 68] 
nsWindow::DispatchEvent 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 724] 
nsWindow::DispatchWindowEvent 
[d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 741] 

And a branch crash:

Incident ID 32566398 
 Stack Signature
              nsFrame::Invalidate a5077aff 
 Bug ID
 Trigger Time 
              2001-07-05 15:08:44 
 User Comments 
              quitting, while saving a number of drafts 
 Build ID
              2001070306 
 Product ID
              Netscape6.10 
 Platform ID
              Win32 
 Stack Trace

nsFrame::Invalidate 
[d:\builds\seamonkey\mozilla\layout\html\base\src\nsFrame.cpp, line 2193] 
nsOutlinerBodyFrame::InvalidateRow 
[d:\builds\seamonkey\mozilla\layout\xul\base\src\outliner\src\nsOutlinerBodyFram
e.cpp, line 559] 
nsOutlinerBodyFrame::InvalidateRange 
[d:\builds\seamonkey\mozilla\layout\xul\base\src\outliner\src\nsOutlinerBodyFram
e.cpp, line 593] 
nsOutlinerRange::Invalidate 
[d:\builds\seamonkey\mozilla\layout\xul\base\src\outliner\src\nsOutlinerSelectio
n.cpp, line 197] 
nsOutlinerSelection::ClearSelection 
[d:\builds\seamonkey\mozilla\layout\xul\base\src\outliner\src\nsOutlinerSelectio
n.cpp, line 448] 
nsMsgDBView::OnAnnouncerGoingAway 
[d:\builds\seamonkey\mozilla\mailnews\base\src\nsMsgDBView.cpp, line 3427] 
nsMsgDatabase::NotifyAnnouncerGoingAway 
[d:\builds\seamonkey\mozilla\mailnews\db\msgdb\src\nsMsgDatabase.cpp, line 481] 
nsMsgDatabase::ForceClosed 
[d:\builds\seamonkey\mozilla\mailnews\db\msgdb\src\nsMsgDatabase.cpp, line 977] 
nsMsgDatabase::CleanupCache 
[d:\builds\seamonkey\mozilla\mailnews\db\msgdb\src\nsMsgDatabase.cpp, line 520] 
nsGenericModule::Shutdown 
[d:\builds\seamonkey\mozilla\xpcom\components\nsGenericFactory.cpp, line 223] 
nsGenericModule::~nsGenericModule 
[d:\builds\seamonkey\mozilla\xpcom\components\nsGenericFactory.cpp, line 201] 
nsGenericModule::`scalar deleting destructor' 
nsThreadPool::Release [d:\builds\seamonkey\mozilla\xpcom\threads\nsThread.cpp, 
line 467] 
nsDll::Shutdown [d:\builds\seamonkey\mozilla\xpcom\components\xcDll.cpp, line 
479] 
nsFreeLibrary 
[d:\builds\seamonkey\mozilla\xpcom\components\nsNativeComponentLoader.cpp, line 
381] 
nsFreeLibraryEnum 
[d:\builds\seamonkey\mozilla\xpcom\components\nsNativeComponentLoader.cpp, line 
430] 
_hashEnumerate [d:\builds\seamonkey\mozilla\xpcom\ds\nsHashtable.cpp, line 194] 
PL_HashTableEnumerateEntries [../../../lib/ds/plhash.c, line 414] 
nsHashtable::Enumerate [d:\builds\seamonkey\mozilla\xpcom\ds\nsHashtable.cpp, 
line 360] 
nsNativeComponentLoader::UnloadAll 
[d:\builds\seamonkey\mozilla\xpcom\components\nsNativeComponentLoader.cpp, line 
991] 
nsComponentManagerImpl::UnloadLibraries 
[d:\builds\seamonkey\mozilla\xpcom\components\nsComponentManager.cpp, line 1913] 
nsComponentManagerImpl::Shutdown 
[d:\builds\seamonkey\mozilla\xpcom\components\nsComponentManager.cpp, line 373] 
NS_ShutdownXPCOM [d:\builds\seamonkey\mozilla\xpcom\build\nsXPComInit.cpp, line 
503] 
netscp6.exe + 0x11d1 (0x004011d1) 
netscp6.exe + 0x3251 (0x00403251) 
KERNEL32.DLL + 0x17d08 (0x77e97d08) 
Summary: M091 leaking PresShell leaves live frame tree without view tree [@ nsFrame::Invalidate] → M091 leaking PresShell leaves live frame tree without view tree - Trunk & N610 [@ nsFrame::Invalidate]

Comment 75

18 years ago
This is not the same bug (what you just posted).
Removing "[@ nsFrame::Invalidate]" from summary since the current bug on that
stack signature is a separate bug (and should be filed separately if it isn't
already).
Summary: M091 leaking PresShell leaves live frame tree without view tree - Trunk & N610 [@ nsFrame::Invalidate] → M091 leaking PresShell leaves live frame tree without view tree - Trunk & N610
The bug in that frame appears to be failure to get an outline style.

I just re-verified I'm seeing the crash.  Note: I tried dbaron's patch.  With
the patch installed, I couldn't start mozilla without crashing due to an
nsSupportsArray having a garbage object in it.  Without the patch I still crash
on mailbag.

I'm going to attach a simple patch that tries to trash views that are deleted
and watch for that in a few places.  The memory on my system is reallocated
quickly, though, unless I use something like dmalloc to delay reuse of freed
memory.  ElectricFence or some such might work at this too.

If you turn on the BSD allocator built into nspr, you might have more luck
reproducing this, perhaps.
Because the tracebacks now appear to be something else, I just did a clean pull
and build of 0.9.2.  _One_ test of the page didn't cause a failure.  This isn't
proof that something in my tree is causing this, but it's certainly pointing
that way.  I'm doing a full diff (which I have done before while debugging this)
and will check or remove every difference -- better yet, I'll drop them into my
0.9.2 tree until it breaks as well.

When I first hit this bug (before dbaron's fix for the IBMBIDI presshell leak),
I started cleaning up the usage of GetViewManager() (1/3 of the uses used
nsCOMPtr's, 1/4 of them checked the result, a bunch called NS_RELEASE directly,
etc).  It's possible that I messed one up, though I've already gone back once
and checked every file with a visual diff in emacs.

People may not care, but I'll attach a copy of all the changes in my tree. I
just went over them with a fine-tooth comb, and they all look good to me. There
was one thing I wondered about (I think it was a CVS merge artifact), so I fixed
that and will retry once my build finishes (I have to go home, so I may not test
it until Monday).

Comment 82

18 years ago
we ought to get the reviews, look at the risks and consider for a hitchhiker on
the 0.9.2 branch.
The patch above should fix bug 89626 (topcrash).  However, there is some risk. 
I certainly need to figure out how to test full page plugins.  I should also
make sure it doesn't do anything weird to things like focus, although it
shouldn't...
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I want to fix this for 0.9.3.
Assignee: hyatt → dbaron
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.4 → mozilla0.9.3

Comment 85

18 years ago
Rather than eyeballing this, it would probably be best if we just went through
it and you explained your changes to me in person.

Comment 86

18 years ago
sr=hyatt

Comment 87

18 years ago
r/sr=waterson
Fix checked in 2001-07-24 21:30/21:31 PDT.
Status: NEW → RESOLVED
Last Resolved: 18 years ago18 years ago
Resolution: --- → FIXED
Backed out to fix bug 92325 and probably also bug 92443.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Reporter

Updated

18 years ago
Blocks: 79119
Changes from the previous patch:
 * fix printing crash #1 by moving what used to be in the pres shell's
destructor into Destroy rather than EndObservingDocument so as not to change the
protocol
 * fix printing crash #2 by calling Destroy in ~PrintObject.  (Why does printing
need to create two pres shells (and thus 2 frame trees)?)
 * do lots of cleanup of nsDocumentViewer.cpp to bring it into the modern era of
XPCOM use
 * Check whether mStyleSet is null in the pres shell's destructor, and if not,
call Destroy and assert (I'm hoping this would fix bug 92443, but I'm not sure
since I'm not seeing the assertion or the crash).

Comment 92

18 years ago
Compared to the previous patch you don't make any changes to nsPluginViewer.cpp. 
If you plan to, please be aware of restructuring the modules/plugin folder which 
is being performed now. See bug 76602.

Comment 93

18 years ago
sr=hyatt
Oops, the same diffs to nsPluginViewer should have been in that patch as in the
first patch.
Fix checked in 2001-07-31 20:15 PDT.
Status: REOPENED → RESOLVED
Last Resolved: 18 years ago18 years ago
Resolution: --- → FIXED

Updated

18 years ago
Blocks: 90574

Updated

18 years ago
No longer blocks: 90574

Comment 98

18 years ago
Marking verified per last comments.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.