Closed Bug 93105 Opened 23 years ago Closed 12 years ago

used to Crash entering Real Player site

Categories

(Core Graveyard :: Plug-ins, defect, P3)

x86
All

Tracking

(Not tracked)

RESOLVED INVALID
Future

People

(Reporter: tracy, Assigned: dbaron)

References

()

Details

Attachments

(7 files, 1 obsolete file)

seen on commercial trunk builds

windows 2001-08-01-06-trunk
mac 2001-08-01-04-trunk

-go to the above site to run Real Player.

the page begins to load then crashes.  I'll grab the talk back incident and post 
next.
Keywords: smoketest
here's the Mac talkback incident:

Incident ID 33594905 
 Stack Signature
               0x000045b0 8ea96d58 
 Bug ID
 Trigger Time 
               2001-08-01 08:41:20 
 User Comments 
               Crash loading Real Player site 
 Build ID
               2001080104 
 Product ID
               MozillaTrunk 
 Platform ID
               MacOS 
 Stack Trace

0x000045b0 
PresShell::UnsuppressPainting() [nsPresShell.cpp, line 4648] 
PresShell::sPaintSuppressionCallback() [nsPresShell.cpp, line 2761] 
nsTimerImpl::Fire() [nsTimerMac.cpp, line 149] 
nsTimerPeriodical::FireAndReprimeTimer() [nsTimerPeriodical.cpp, line 172] 
nsTimerPeriodical::ProcessExpiredTimer() [nsTimerPeriodical.cpp, line 265] 
nsTimerPeriodical::ProcessTimers() [nsTimerPeriodical.cpp, line 150] 
nsTimerPeriodical::RepeatAction() [nsTimerPeriodical.cpp, line 107] 
Repeater::DoRepeaters() [nsRepeater.cpp, line 119] 
nsMacMessagePump::DispatchEvent() [nsMacMessagePump.cpp, line 465] 
nsMacMessagePump::DoMessagePump() [nsMacMessagePump.cpp, line 269] 
nsAppShell::Run() [nsAppShell.cpp, line 110] 
nsAppShellService::Run() [nsAppShellService.cpp, line 423] 
Netscape 6 + 0x6fd8 (0x1f79cda8) 
Netscape 6 + 0x7d5c (0x1f79db2c) 
cc:ing dbaron, I suspect this could be caused by the checkin to
nsObjectFrame.cpp last night.

Why change cv->SetPreviousViewer(nsnull) to cv->Show()?                        
                             

In the case of an OBJECT frame, we want to turn off paint suppression to ensure
any plugins still running are cleaned up before we start to create new ones.

I think this code may be in the wrong place anyway. A better place would be in
nsPuginInstanceOwner::Init() because we only want to do this for a new plugin,
not a new frame. 

Yeah, this is probably me.  I thought I would have avoided this, though, with
the change I made between the two times I checked this in.

The change from SetPreviousViewer to Show was because I changed the doc viewer
so Show calls SetPreviousViewer rather than the other way around.
Assignee: av → dbaron
This bug is the same as bug 92443, which I fixed by backing out the changes that
I checked in again last night.  I thought a change that I made in the changes
for bug 80203 that I checked in again would have fixed this problem, but
apparently not.  I really don't see how this could be happening, but I'll try to
poke around.

However, I wasn't able to reproduce this on Windows the last time it came up,
and I don't even have access to a Windows box right now, so I may need some help
debugging or something...
Attached patch potential fixSplinter Review
....I'm still updating my trunk tree so I can't test my patch (or even compile)
or try dbaron's.
I can reproduce this on Linux by visiting the above URL, but I _don't_ have the
Flash plugin installed. (The unknown plugin dialog comes up, and then I crash.)
Here is the stack trace:

#0  0x41a30230 in PresShell::UnsuppressAndInvalidate (this=0x8877480)
    at ../../../../../mozilla/layout/html/base/src/nsPresShell.cpp:4616
#1  0x41a3045b in PresShell::UnsuppressPainting (this=0x8877480)
    at ../../../../../mozilla/layout/html/base/src/nsPresShell.cpp:4648
#2  0x41a2906e in PresShell::sPaintSuppressionCallback (aTimer=0x89123f8, 
    aPresShell=0x8877480)
    at ../../../../../mozilla/layout/html/base/src/nsPresShell.cpp:2761
#3  0x4087b6a6 in nsTimerGtk::FireTimeout (this=0x89123f8)
    at ../../../../../../mozilla/widget/timer/src/unix/gtk/nsTimerGtk.cpp:183
#4  0x4087b90f in process_timers (array=0x8439440)
    at ../../../../../../mozilla/widget/timer/src/unix/gtk/nsTimerGtk.cpp:256
#5  0x4087b9c8 in TimerCallbackFunc (data=0x0)
    at ../../../../../../mozilla/widget/timer/src/unix/gtk/nsTimerGtk.cpp:276
#6  0x40488731 in g_timeout_dispatch () from /usr/lib/libglib-1.2.so.0
#7  0x404877f3 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0
#8  0x40487dd9 in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#9  0x40487f8c in g_main_run () from /usr/lib/libglib-1.2.so.0
#10 0x4039c803 in ?? () from /usr/lib/libgtk-1.2.so.0
#11 0x41254139 in ?? ()
   from /usr/src/seamonkey-clean/debug/dist/bin/components/libwidget_gtk.so
#12 0x42059f8d in ?? ()
   from /usr/src/seamonkey-clean/debug/dist/bin/components/libnsappshell.so
#13 0x0805b0c8 in main1 (argc=1, argv=0xbffffa04, nativeApp=0x0)
    at ../../../mozilla/xpfe/bootstrap/nsAppRunner.cpp:1290
#14 0x0805bd5b in main (argc=1, argv=0xbffffa04)
    at ../../../mozilla/xpfe/bootstrap/nsAppRunner.cpp:1599
#15 0x405d2177 in ?? () from /lib/i686/libc.so.6


In |UnsuppressAndInvalidate()|, mFrameManager == 0, so we crash trying to
dereference it to get the root frame.
peterl: s/mPresContext/aPresContext/
peterl: that patch fixes the crash for me. If I understand it correctly, the
first plugin instance that actually gets created (which happens during reflow,
as opposed to frame construction, right?) will cause us to boot the old content
viewer. Is that right?
I don't see the crash without the patch, but I'm pretty sure the above patch
should fix the problem.  The timer holds a strong reference to the pres shell,
so it could be being destroyed when the timer is canceled or released (varies by
platform).

But actually since some platforms look like they release the callback in
Cancel(), I'm going to make one more change.
Actually, there's no need for any other change...
My initial reaction is that both patches are probably good, but peterl's is a
little risky (could it regress the bug that that code was originally added to
fix?), and that mine might fix some other crash with the same stack trace, but
not related to plugins.

(I'm still waiting for waterson to test mine, though.)
..still updating...<has every file changed?>..

Chris: exactly right! for a plugin there could possibly be several frames (such
as caused by a reframe), but there should only one plugin instance[owner].

dbaron: even with your patch, doesn't it sound wrong to call cv->Show() in
nsObjectFrame::Init()?

OK, apparently my patch doesn't fix it.
Calling |cv->Show()| is no more wrong than calling |SetPreviousViewer| was
before, since they do the same thing.  However, it does seem wrong, yes.

If it doesn't cause bug 85334 to regress, r=dbaron on your fix.
Aargh.  twalker's stack was missing a frame at the top, and I was mislead. 
Compare to waterson's stack.  I'm not completely sure what the chain of events
is here, but I'm going to check this in to fix the blocker and then we can work
out the real fix later today.
Let's go with dbaron's last patch. I'm not sure what'll happen with peterl's,
but it also seems reasonable. (Are there any issues with forcing a paint while
we're in the middle of reflow? Probably no more or less than forcing a paint
during frame construction...)
r/sr=waterson on attachment 44268 [details] [diff] [review], FWIW.
peterl: ought we check your patch in as well? 
Downgrading from blocker since it's no longer a blocker but we still need to
figure out the long-term fix.
Severity: blocker → critical
yeah, but I need to run it through the smoketests and some basic regression
tests first. 
This stack shows how we get to the nested |Destroy|:

#4  0x420512de in PresShell::Destroy() (this=0x432030e0)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:1677
#5  0x416c17aa in DocumentViewerImpl::Destroy() (this=0x42b60098)
    at /builds/seamonkey/mozilla/content/base/src/nsDocumentViewer.cpp:1205
#6  0x416c2651 in DocumentViewerImpl::Show() (this=0x4322bd58)
    at /builds/seamonkey/mozilla/content/base/src/nsDocumentViewer.cpp:1437
#7  0x4205bd5a in PresShell::UnsuppressAndInvalidate() (this=0x432030e0)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:4611
#8  0x4205c020 in PresShell::UnsuppressPainting() (this=0x432030e0)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:4654
#9  0x42054fc6 in PresShell::sPaintSuppressionCallback(nsITimer*, void*) (
    aTimer=0x4323f3e0, aPresShell=0x432030e0)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:2762
#10 0x41da8d81 in nsTimerGtk::FireTimeout() (this=0x4323f3e0)
    at /builds/seamonkey/mozilla/widget/timer/src/unix/gtk/nsTimerGtk.cpp:183
#11 0x41da919d in process_timers(nsVoidArray*) (array=0x8377798)
    at /builds/seamonkey/mozilla/widget/timer/src/unix/gtk/nsTimerGtk.cpp:256
#12 0x41da925b in TimerCallbackFunc(void*) (data=0x0)
    at /builds/seamonkey/mozilla/widget/timer/src/unix/gtk/nsTimerGtk.cpp:276
#13 0x407a3983 in g_timeout_dispatch () from /usr/lib/libglib-1.2.so.0
#14 0x407a2987 in g_main_dispatch () from /usr/lib/libglib-1.2.so.0
#15 0x407a3001 in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#16 0x407a31cc in g_main_run () from /usr/lib/libglib-1.2.so.0
#17 0x406b8e57 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#18 0x410fea89 in nsAppShell::Run() (this=0x8141530)
    at /builds/seamonkey/mozilla/widget/src/gtk/nsAppShell.cpp:348
#19 0x40d4c11e in nsAppShellService::Run() (this=0x8138ef8)
    at /builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp:424
#20 0x0805c6d8 in main1(int, char**, nsISupports*) (argc=2, argv=0xbffff6dc, 
    nativeApp=0x0)
    at /builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1294
#21 0x0805d348 in main (argc=2, argv=0xbffff6dc)
    at /builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1603
peterl's patch will leak, since he changed the |Show| to |SetPreviousViewer|. 
What you used to get by calling |SetPreviousViewer(nsnull)| you now get by
calling |Show|.  peterl's patch doesn't destroy the previous viewer.  (I changed
the semantics of SetPreviousViewer so that it doesn't destroy the previous
viewer and doesn't automatically show, because that wouldn't work for what I
needed to do.)
Yeah, I quickly found the leak too and was just about to post another patch with
|Show()| as it now looks like that does what I want and seems to work.

How are you getting the nested |Destroy()|?
Hmmm.  I suspect the real problem is that we need to destroy the pres shell's
paint suppression timer from within the doc viewer's |close| method rather than
its |destroy| method.  (But I still don't understand why we would have chaining
here.)
Hmm.  That doesn't make sense.  We should be (are we?) |destroy|ing any
intermediate doc viewers in the chain immediately after |close|ing them.
Hmm.  Is what this page is doing redirecting from a page that has a plugin?

I think perhaps the right thing to do whereever we put the code that peterl's
patch moves would be the following (replacing |if (cv) cv->Show()|):

  if (cv) {
    nsCOMPtr<nsIPresShell> shell;
    cv->GetPresShell(getter_AddRefs(shell));
    if (shell == this)
      cv->Show();
  }

Here's why:
 * The point of this code is that it does nothing (or does it trigger a repaint
we don't need or something like that?  It could depend on the widget code...) in
some cases since the cv is already shown (and therefore has no previous
viewer).  However, if the cv is not shown, we want to show it.
 * The problem here is I think that we have a document viewer that's already
being replaced, but into which frames are still being created (why?).  So we try
to show the document viewer for |this| pres shell.  It's already being show, but
we don't know that, since we don't check that the newest document viewer is
actually the one for the |this| pres shell.

In fact, we might even want to change it to:

  if (cv) {
    nsCOMPtr<nsIContentViewer> prevViewer;
    cv->GetPreviousViewer(prevViewer);
    if (prevViewer) {
      nsCOMPtr<nsIPresShell> shell;
      cv->GetPresShell(getter_AddRefs(shell));
      if (shell == this)
        cv->Show();
    }
  }

because then we would only bother calling |Show| if we weren't shown already.
Actually, the solution I proposed would break the frames case, which is what
this is.  (A frame has its on doc viewer, no?)  But why is the frames case
broken already?  Could we be setting a previous viewer when we should just be
destroying...?
That sounds right. The intent of this code is to ensure that the previous viewer
is gone. Since things are changed things around now, perhaps we can just
directly destroy the previous viewer?
We can't just destroy the old one, since we need to ensure that something is
displayed (I think).

I'm thinking more and more that this crash might be specific to pages with
frames.  I see Destroy called twice when loading this page:

1)
#3  0x420512de in PresShell::Destroy() (this=0x42d233d8)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:1677
#4  0x416c17aa in DocumentViewerImpl::Destroy() (this=0x429f2548)
    at /builds/seamonkey/mozilla/content/base/src/nsDocumentViewer.cpp:1205
#5  0x416c2651 in DocumentViewerImpl::Show() (this=0x42dd3280)
    at /builds/seamonkey/mozilla/content/base/src/nsDocumentViewer.cpp:1437
#6  0x4205bd5a in PresShell::UnsuppressAndInvalidate() (this=0x429f4088)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:4611
#7  0x4205c020 in PresShell::UnsuppressPainting() (this=0x429f4088)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:4654
#8  0x42054fc6 in PresShell::sPaintSuppressionCallback(nsITimer*, void*) (
    aTimer=0x42f4b058, aPresShell=0x429f4088)

2)
#3  0x420512de in PresShell::Destroy() (this=0x429d0e60)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:1677
#4  0x416c17aa in DocumentViewerImpl::Destroy() (this=0x42f1a048)
    at /builds/seamonkey/mozilla/content/base/src/nsDocumentViewer.cpp:1205
#5  0x416c2651 in DocumentViewerImpl::Show() (this=0x42d68f98)
    at /builds/seamonkey/mozilla/content/base/src/nsDocumentViewer.cpp:1437
#6  0x4205bd5a in PresShell::UnsuppressAndInvalidate() (this=0x42f117c8)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:4611
#7  0x4205c020 in PresShell::UnsuppressPainting() (this=0x42f117c8)
    at /builds/seamonkey/mozilla/layout/html/base/src/nsPresShell.cpp:4654
#8  0x416c119d in DocumentViewerImpl::LoadComplete(unsigned) (this=0x42d68f98, 
    aStatus=0)
    at /builds/seamonkey/mozilla/content/base/src/nsDocumentViewer.cpp:1093

hyatt, any ideas?
What do you mean by "frames have their own document viewer"?  AFAIK, there is
(excluding printing) a one-to-one correspondence with document viewers and
docshells.
smoketested mac build 2001-08-01-13-trunk and this no longer crashes.
marking fixed
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Moving it from nsObjectFrame to nsPluginInstanceOwner seems right thing to do in 
light of the fact that reframe can occur. r=av on 44273.
Peter, is this connected somehow to bug 90268?
Sort of....this will make plugins more defensive against a reframe, which we
don't do well at anyway right now. My patch is checked in.
Hyatt said he thought the real problem might be that close() isn't getting
called for subframes.
Yes, the Close of the subframe isn't getting called until the Destroy of the
parent, since it's triggered by the frame destruction of nsHTMLFrameInnerFrame,
which calls Destroy on the docshell, which calls Close and Destroy.  That seems
like it could be part of the problem.
I think that might explain the problem.  Not calling Close on the sub-docshell
is a good thing, since it means that at least frames are fully alive until the
page stops being displayed (and we don't need the earlier Close because we're
not going to reuse the script object).  However, it means that we can get a
Show() triggered by a plugin within one of those frames that thinks it's still
the related to the docshell's current content viewer, even though it's not.  Or
something like that.

I'll think about it more in the morning...
Sorry, I gotta call you on this fix. Purify is getting a FMR on the access to
mIsDestroying. Is that comment in the patch literal? Did you anticipate that the
object might have been fully destroyed (in the C++ dtor has already run) sense?

I'll attach a Purify trace. It is from a browser-buster run and I don't know
what url it was at or anything.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Well, that would be fixed by
  nsCOMPtr<nsIPresShell> kungFuDeathGrip(this);
...but I'd really like to figure out why this is happening.  I think that purify
trace might help, but I need to think about this some more.  I suspect the bug
might be specific to pages with plugins within frames.

Anyone else have any ideas why this is happening?  I also thought we wanted to
check in peterlubczynski's fix as well...
I just noticed that twalker resolved this as fixed when I had intentionally left
it open to look at further.  I guess I should've noticed the first time, but...
Status: REOPENED → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.4
What that purify trace shows, I think, is that the pres shell getting the
unsupression callback is no longer active, but for some reason its paint
suppression timer wasn't stopped (|Destroy| wasn't called?).
Hrm.  Even with frames, all the paint suppression timers should be stopped in
the same event, since PresShell::Destroy both destroys the frame tree and stops
the paint suppression timer, and the subframes call Destroy on their pres
shell's in the destructor of the frames, which is triggered by the destruction
of the frame tree.  So I still have no clue what's happening here.
The obvious band-aid on a band-aid:

Index: nsPresShell.cpp
===================================================================
RCS file: /cvsroot/mozilla/layout/html/base/src/nsPresShell.cpp,v
retrieving revision 3.435
diff -u -d -r3.435 nsPresShell.cpp
--- nsPresShell.cpp	2001/08/19 04:48:34	3.435
+++ nsPresShell.cpp	2001/08/21 03:40:31
@@ -4597,6 +4597,7 @@
       nsCOMPtr<nsIContentViewer> cv;
       cvc->GetContentViewer(getter_AddRefs(cv));
       if (cv) {
+        nsCOMPtr<nsIPresShell> kungFuDeathGrip(this);
         cv->Show();
         // Calling |Show| may destroy us.  Not sure why yet, but it's
         // a smoketest blocker.
r=jag
sr=attinasi
a=asa on behalf of drivers
OK, checked that in 2001-08-24 07:00 PDT.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
dbaron: Are you still contemplating a better fix here or should this be marked
FIXED?
I'm still contemplating a better fix -- sometime I'd like to sit down and really
figure out what's causing this.  However, it's not my top priority right now...
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Target Milestone: mozilla0.9.6 → ---
Keywords: smoketest
wfm 20020525
Summary: Crash entering Real Player site → used to Crash entering Real Player site
Severity: critical → major
Priority: P1 → P3
Target Milestone: --- → Future
this has faded away with time, I see nocrash on trunk 1021 on NT. marking wfm
Status: ASSIGNED → RESOLVED
Closed: 23 years ago22 years ago
Resolution: --- → WORKSFORME
See the summary of the bug.  "used to crash"
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Status: REOPENED → ASSIGNED
The bandaid code is hit when you load any document that contains something like
<iframe src="javascript:1"/> 

Unforunately this fix returns early without cleaning up the SetFocusSuppression
call. As a result, focus suppression gets stuck on.

The end user experiences it this way (bug 247323): when they switch tabs, a lot
of things no longer work because focus hasn't switched to the new window.

I have a patch for bug 247323 which seems to be a bandaid on top of a bandaid.
However, perhaps a better fix is if someone wants to look into this for the
proper fix, rather than:
         // Calling |Show| may destroy us.  Not sure why yet, but it's
         // a smoketest blocker.

We should also have an assertion somewhere when focus suppression gets stuck on,
and some documentation as to what it's for. I get find comments or docs related
to it.
Comment on attachment 44268 [details] [diff] [review]
simple fix that works (waterson's idea)

mozilla/layout/html/base/src/nsPresShell.cpp	3.426
Attachment #44268 - Attachment is obsolete: true
this is being kept for ... comment 46?
is there a better summary?
probably something about a maze of twisty suppressions, nested frames, destroy()s and timers.
QA Contact: shrir → plugins
I think this bug has outlived its usefulness.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago12 years ago
Resolution: --- → INVALID
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: