Closed Bug 232798 Opened 21 years ago Closed 21 years ago

Crash on clicking link in popup window [@ OnLinkClickEvent::HandleEvent]

Categories

(Core :: DOM: UI Events & Focus Handling, defect, P1)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla1.7alpha

People

(Reporter: tenthumbs, Assigned: bzbarsky)

References

()

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files, 2 obsolete files)

Go to
<http://www.jandr.com/JRProductPage.process?RestartFlow=t&Section_Id=1014&Product_Id=3706314>.
Click on "Multisync" link. Click on "Close window" in popup. Crash.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 8902)]
0x00000000 in ?? ()

(gdb) bt
#0  0x00000000 in ?? ()
#1  0x4199f45a in OnLinkClickEvent::HandleEvent() (this=0x8776868)
    at nsWebShell.cpp:431
#2  0x419995bf in HandlePLEvent (aEvent=0x89723d0) at nsWebShell.cpp:448
#3  0x406f8fdc in PL_HandleEvent (self=0x8776868) at plevent.c:671
#4  0x406f8e99 in PL_ProcessPendingEvents (self=0x812e448) at plevent.c:606
#5  0x406fbeeb in nsEventQueueImpl::ProcessPendingEvents() (this=0x810af90)
    at nsEventQueue.cpp:391
#6  0x41661d05 in event_processor_callback (data=0x0, source=5, 
    condition=GDK_INPUT_READ) at nsAppShell.cpp:186
#7  0x416616fe in our_gdk_io_invoke (source=0x89723d0, condition=G_IO_IN, 
    data=0x810af90) at nsAppShell.cpp:71
#8  0x402797cf in g_io_unix_dispatch (source_data=0x89723d0, 
    current_time=0xbffff340, user_data=0x8306b90) at giounix.c:135
#9  0x4027ad9b in g_main_dispatch (dispatch_time=0xbffff340) at gmain.c:656
#10 0x4027b345 in g_main_iterate (block=0, dispatch=1) at gmain.c:877
#11 0x4027b4d6 in g_main_run (loop=0x8306c30) at gmain.c:935
#12 0x401a3137 in gtk_main () at gtkmain.c:524
#13 0x416620c2 in nsAppShell::Run() (this=0x80d6390) at nsAppShell.cpp:317
#14 0x416105ba in nsAppShellService::Run() (this=0x8171518)
    at nsAppShellService.cpp:483
#15 0x08068100 in main1 (argc=2, argv=0xbffff694, nativeApp=0x89723d0)
    at nsAppRunner.cpp:1291
#16 0x08068f71 in main (argc=2, argv=0xbffff694) at nsAppRunner.cpp:1678
#17 0x4044d17d in __libc_start_main (main=0x8068dc0 <main>, argc=2, 
    ubp_av=0xbffff694, init=0x8064aa8 <_init>, fini=0x8099970 <_fini>, 
    rtld_fini=0x4000a534 <_dl_fini>, stack_end=0xbffff68c)
    at ../sysdeps/generic/libc-start.c:129

I'm not sure this is actually where it goes bad but it's the only clue.

My Linux debug trunk build from this morning.
Bug also occurs in Mozilla 2004-01-28-08 trunk Linux.
AFAICT, the regression occured between 2004-01-26-08 and 2004-01-27-08.
-> Event Handling.
Assignee: adamlock → events
Severity: normal → critical
Component: Embedding: Docshell → Event Handling
Keywords: crash, regression
QA Contact: adamlock → ian
Summary: Crash on clicking link in popup window → Crash on clicking link in popup window [@ OnLinkClickEvent::HandleEvent]
Works for me with:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7a) Gecko/20040130 Firebird/0.8.0+
It looks like the nsWebShell destructor fires before the content node calls
OnLinkClick on it... How is that managing to happen?  We're getting the link
handler off the prescontext; is the prescontext holding on to a webshell past
the point where it's destroyed?  Who's supposed to clear out the mLinkHandler
pointer in the prescontext?

I suspect that dbaron's changes to clean up stuff more quickly during GC are
what's biting us here.... up to now, that bug masked the problem here.
Attached patch PatchSplinter Review
I think we need to be prepared that after every HandleDOMEvent the content node

could have been removed from the document and the document might not exist.
Attachment #140388 - Flags: review?(bz-vacation)
The question is, if the script event handler removes the <a> from the document,
should the link still trigger?

In addition to that, we still need to resolve this link handler issue -- just
because we fix this place that accesses the dangling pointer doesn't mean
someone else won't do it....  For that matter, why is this pointer on the
prescontext?
Blocks: 233240
No longer blocks: 233240
Flags: blocking1.7a?
Blocks: 233240
*** Bug 233543 has been marked as a duplicate of this bug. ***
We can also probably work around this by keeping a strong reference to the link
handler in TriggerLink, as was the case before my patch for bug 229371.  I
normally keep strong references to _everything_ related to the document or
presentation that I obtain before dispatching a DOM event and then need to
access again after the event... I didn't catch this one.
The problem is that by TriggerLink it's too late.  The document is already gone
(because the JS code has already executed).
Attached patch Maybe more something like this? (obsolete) — Splinter Review
Can someone who can reproduce this bug (I can't anymore, for some reason) try
this?  This should fix the root of the problem (the dangling pointer).
Comment on attachment 141108 [details] [diff] [review]
Maybe more something like this?

I still crash with this patch applied.
You're reset of the link handler was not effective
since the pres context was null in this case.
(but I think this patch is good and should be added
anyway, to cover other potential problems).
Attached file Trace + stack (obsolete) —
Here is a bit of tracing of what's going on.
Look for:  nsWebShell::Destroy 0x87b6cd8 (context=(nil))
> nsDocShell::Destroy 0x87b6cd8 AFTER mContentViewer->Destroy()

Hmm.. so maybe the following code in DocumentViewerImpl::~DocumentViewerImpl
needs to move to DocumentViewerImpl::Destroy instead?  Does that help?

533   // clear weak references before we go away
534   if (mPresContext) {
535     mPresContext->SetContainer(nsnull);
536     mPresContext->SetLinkHandler(nsnull);
537   }

If that helps, I am a little confused -- we're clearly getting a null context in
the webshell, but that context comes off the doc viewer.... and the doc viewer
does not clear its mPresContext pointer, really...  and I made sure to add my
GetPresContext() call _before_ calling nsDocShell::Destroy....
Moving that block to (the end of) Destroy did not help.
Trace + stack is exactly the same (except for the actual addresses).
Is mPresContext non-null in DocumentViewerImpl::Destroy?
Attached file Trace + stack
There were two calls to DocumentViewerImpl::Destroy -
in one of them mPresContext was null.
Look for: DocumentViewerImpl::Destroy
Attachment #141113 - Attachment is obsolete: true
Ah, now we're getting somewhere.  That's also the webshell that didn't get a
prescontext in my patch, and it's the webshell that crashes at the end.

Looking at the doc viewer code, we should probably set the link handler and
container to null in DocumentViewerImpl::Hide right before we set mPresContext
to null.

And as long as we're here, we should add an assert in Show() right before we
assign into mPresContext to the effect that mPresContext is null at that point
(if it's not, bad things happen).
With this in DocumentViewerImpl::Hide as you suggested fixes the crash:
  if (mPresContext) {
    mPresContext->SetContainer(nsnull);
    mPresContext->SetLinkHandler(nsnull);
  }
I've dropped the nsWebShell changes; any time we go through there
nsDocShell::Destroy will deal for us anyway.
Attachment #140388 - Attachment is obsolete: true
Attachment #141108 - Attachment is obsolete: true
Attachment #141120 - Flags: superreview?(jst)
Attachment #141120 - Flags: review?(jst)
Comment on attachment 140388 [details] [diff] [review]
Patch

Actually, leaving this not-obsolete for now; I'd still like to know whether a
link that gets removed from the DOM by the onclick should still trigger...
Attachment #140388 - Attachment is obsolete: false
Attachment #140388 - Flags: superreview?(jst)
Comment on attachment 141120 [details] [diff] [review]
So this should work, right?

Yes, this works fine.
Comment on attachment 141120 [details] [diff] [review]
So this should work, right?

r+sr=jst
Attachment #141120 - Flags: superreview?(jst)
Attachment #141120 - Flags: superreview+
Attachment #141120 - Flags: review?(jst)
Attachment #141120 - Flags: review+
Assignee: events → bzbarsky
Priority: -- → P1
Target Milestone: --- → mozilla1.7alpha
Patch checked in for 1.7a.  Thanks for the help in debugging this, Mats!

Leaving open for now pending resolution of the question on removed nodes.
I guess we should plus this and mark it fixed, right? ;-)
Flags: blocking1.7a? → blocking1.7a+
Yeah, except for the marking fixed part.  See comment 22.
Interesting... I get a different crash in a similar case (i.e., click on a link
in a popup window). This crash is in the method PresShell::DidDoReflow
(nsPresShell.cp:6349), because the mViewManager is null (DidDoReflow was called
from PresShell::ProcessReflowCommands, in turn called from
ReflowEvent::HandleEvent). What happens when I click the link is a bit different
though:

* The location URI for the popup window is changed by JavaScript.

* The contents from the new URI opens a new popup window (this requires popup
blocking to be disabled, or the site to be white-listed) and closes the old.

* Firefox crashes when both windows are open, and there's nothing rendered
in the new window (nothing at all, so it is in reality transparent, until the
Windows XP crash dialog opens).

Should I file this as a separate bug?
Yes, please.  Make sure it happens in a current trunk build, then file it (cc
me, dbaron, roc, and Mats, please).
OK, filed as bug 234435.
marking this fixed so it gets on the testing radar.  let's follow up with other
bugs if more are needed.  thanks
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment on attachment 140388 [details] [diff] [review]
Patch

See bug 234722
Attachment #140388 - Flags: superreview?(jst)
Attachment #140388 - Flags: superreview-
Attachment #140388 - Flags: review?(bzbarsky)
Attachment #140388 - Flags: review-
Crash Signature: [@ OnLinkClickEvent::HandleEvent]
Component: Event Handling → User events and focus handling
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: