Closed Bug 373911 Opened 17 years ago Closed 17 years ago

xbl destructor bound to body causes trouble [@ nsXBLBinding::AllowScripts]

Categories

(Core :: XBL, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: guninski, Assigned: smaug)

References

()

Details

(Keywords: fixed1.8.0.15, verified1.8.1.10, verified1.8.1.9, Whiteboard: [sg:critical?])

Attachments

(6 files, 2 obsolete files)

xbl destructor bound to body causes trouble

xbl destructor bound to body doing location.replace('javascript:...')
and then enumerating |document| causes crash with scary registers,
almost sure use after free.

 
#6  0xb5efd303 in nsINodeInfo::GetDocument (this=0x5b0020)
    at ../../dist/include/content/nsINodeInfo.h:291
#7  0xb5efeae0 in nsINode::GetOwnerDoc (this=0x8f6f7a0)
    at ../../dist/include/content/nsINode.h:205
#8  0xb6396c53 in nsXBLBinding::AllowScripts (this=0x8f75fe8)
    at /opt/joro/firefox/mozilla/content/xbl/src/nsXBLBinding.cpp:1198
#9  0xb6397a1d in nsXBLBinding::ExecuteDetachedHandler (this=0x8f75fe8)
    at /opt/joro/firefox/mozilla/content/xbl/src/nsXBLBinding.cpp:801
#10 0xb63b8657 in ExecuteDetachedHandler (aBinding=0x8f75fe8, aClosure=0x0)
    at /opt/joro/firefox/mozilla/content/xbl/src/nsBindingManager.cpp:842
#11 0xb7d57098 in nsVoidArray::EnumerateForwards (this=0xbfd2f780, 
    aFunc=0xb63b85fc <ExecuteDetachedHandler>, aData=0x0)
    at nsVoidArray.cpp:678
#12 0xb63b85ea in nsBindingManager::ExecuteDetachedHandlers (this=0x8f8c050)
    at /opt/joro/firefox/mozilla/content/xbl/src/nsBindingManager.cpp:855
#13 0xb64257a2 in nsGlobalWindow::PostHandleEvent (this=0x8f5c1d0,

(gdb) frame 6
#6  0xb5efd303 in nsINodeInfo::GetDocument (this=0x5b0020)
    at ../../dist/include/content/nsINodeInfo.h:291
291         return mOwnerManager->GetDocument();
(gdb) p mOwnerManager 
Cannot access memory at address 0x5b0020
(gdb) x/i $eip
0xb5efd303 <_ZNK11nsINodeInfo11GetDocumentEv+21>:       mov    0x14(%eax),%eax
(gdb) p/x $eax
$1 = 0x5b0020
(gdb) x/4x $eax
0x5b0020:       Cannot access memory at address 0x5b0020
(gdb)
Assignee: nobody → general
Severity: normal → critical
Component: General → XBL
Product: Firefox → Core
QA Contact: general → ian
Summary: xbl destructor bound to body causes trouble → xbl destructor bound to body causes trouble [@ nsXBLBinding::AllowScripts]
Attached file testcase
I think this testcase shows the same crash. It happens when the window gets destroyed in the destructor.
martijn, your testcase seems very similar to these:
Bug 332971 and Bug 332807

will attach my testcases in this bug.
Assignee: general → Olli.Pettay
Severity: critical → normal
Component: XBL → DOM: Events
Component: DOM: Events → XBL
This fixes the crash.
Bound elements must stay alive while iterating through xbl destructors.

nsBindingList AddRefs/Releases automatically.

(Have to write a testcase first, but maybe similar thing is needed for 
constructors.)
Attachment #258579 - Flags: superreview?(bzbarsky)
Attachment #258579 - Flags: review?(bzbarsky)
(In reply to comment #6)
> 
> (Have to write a testcase first, but maybe similar thing is needed for 
> constructors.)
> 

it works from constructors.
in "testcase inside iframe" just move the code from the destructor to the constructor.
mean constructors also crash.

martijn testcase crashes 2.0.0.2
Attached patch fixes contructor too (obsolete) — Splinter Review
The constructor crash doesn't seem to be XBL problem.
I'm not too happy with the change I had to do to presshell.
Attachment #258579 - Attachment is obsolete: true
Attachment #258579 - Flags: superreview?(bzbarsky)
Attachment #258579 - Flags: review?(bzbarsky)
do you expect problems xbl event handler to kill as much content as it can and then try to access dead objects?
I don't expect, but if you find any problems, please CC me to the bug.

Event handling is quite often protected because of the fact that
event target chain is kept alive until the end of dispatch.
Did you mean to ask for review on that last patch?
Attachment #258647 - Flags: review?(bzbarsky)
(In reply to comment #2)
> I think this testcase shows the same crash. It happens when the window gets
> destroyed in the destructor.
> 

not quite sure it is the same bug, though definitely quite similar problem.
your testcase works on 2.0.0.2 so you may consider filing a different bug (there are similar bugs xbl removing elements, your one may be a duplicate of some of them)

Comment on attachment 258647 [details] [diff] [review]
fixes contructor too

>Index: content/xbl/src/nsBindingManager.cpp
>+    closure->mBindings.AppendElement(aBinding);

Say this succeeds....

>+    closure->mBoundElements.AppendObject(aBinding->GetBoundElement());

And this fails with out-of-memory.  Won't you crash the same way?

More interestingly, should bindings just hold a strong ref to the bound element?

>Index: layout/base/nsPresShell.cpp

I'm not entirely convinced of this change.  Why is it needed, given http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/base/nsPresShell.cpp&rev=3.982&mark=4569-4574#4564 ?
>Index: layout/generic/nsGfxScrollFrame.cpp

>+  nsCOMPtr<nsIPresShell> presShell = mOuter->GetPresContext()->GetPresShell();
>+  presShell->FlushPendingNotifications(Flush_OnlyReflow);

Any other callers of this that should be holding strong refs?
Attachment #258647 - Flags: review?(bzbarsky) → review-
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
More to the point, if the presshell can go away during that flush, why is it safe to access mAsyncScrollPortEvent at that point?  That doesn't make sense to me...

I'd be happy to just have Flush_OnlyReflow be able to destroy the presshell (in which case I would probably disallow passing it period, in favor of Flush_Layout), but then all consumers need to deal with it.
I think we could just keep viewmanager alive in flushpendingnotifications.
I didn't make boundelement strong because we need to iterate through
bindings anyway and if the patch is needed in 1.8, there is no
cycle collector there.

Changes to nsGfxScrollFrameInner::FireScrollPortEvent aren't needed 
anymore.
Attachment #258647 - Attachment is obsolete: true
Attachment #267140 - Flags: review?(bzbarsky)
I'm not sure I follow this patch.  Is the viewmanager destroyed as part of a Flush_OnlyReflow flush?  Or as part of a Flush_StyleReresolves flush?  The former is a problem.  The latter is acceptable.  Can you tell me exactly what the crashing sequence is here?  That would make it easier to review things...

Oh, and you can't remove those XXX comments, because the problem they're talking about is still there even with this patch applied.
hmm,  I can't get the constructor (nor destructor) testcase to crash if 
I have only the fixes for content/xbl. Something has changed...
er, nm, ofc it doesn't crash anymore, but I'll still think about this some more.
Attachment #267140 - Flags: review?(bzbarsky)
(In reply to comment #19)
> hmm,  I can't get the constructor (nor destructor) testcase to crash if 
> I have only the fixes for content/xbl. Something has changed...
> 

looks like someone broke inline xbl which broke martijn's xbl testcase (it doesn't load)

xulifr3.html in this bug still crashes locally with this stack:
#6  0x08732e6a in nsXBLBinding::AllowScripts (this=0x9d684a0)
    at /opt/joro/firefox-cvs/mozilla/content/xbl/src/nsXBLBinding.cpp:1237
#7  0x08733c47 in nsXBLBinding::ExecuteDetachedHandler (this=0x9d684a0)
    at /opt/joro/firefox-cvs/mozilla/content/xbl/src/nsXBLBinding.cpp:835
#8  0x0874ac57 in ExecuteDetachedHandler (aBinding=0x9d684a0, aClosure=0x0)
    at /opt/joro/firefox-cvs/mozilla/content/xbl/src/nsBindingManager.cpp:872
#9  0xb7d3c5dc in nsVoidArray::EnumerateForwards (this=0xbfe90a54, 
    aFunc=0x874ac0e <ExecuteDetachedHandler>, aData=0x0) at nsVoidArray.cpp:678
#10 0x0874ac00 in nsBindingManager::ExecuteDetachedHandlers (this=0x9c50ed8)
    at /opt/joro/firefox-cvs/mozilla/content/xbl/src/nsBindingManager.cpp:885
#11 0x0879c291 in nsGlobalWindow::PostHandleEvent (this=0x9b5c4c0, 
inline xbl works here fine and both destructor testcases crash.
(In reply to comment #22)
> inline xbl works here fine and both destructor testcases crash.
> 

    hmm, when reloading martijn's attachment "testcase" from bugzilla the stack is scary: xulifr3.html still works.
    #5  <signal handler called>
    #6  0x0823969c in nsNodeInfoManager::GetDocument (this=0xd8d8d8d8)
        at ../../dist/include/content/nsNodeInfoManager.h:113
    #7  0x082396b6 in nsINodeInfo::GetDocument (this=0xa0f6e18)
        at ../../dist/include/content/nsINodeInfo.h:291
    #8  0x08239818 in nsINode::GetOwnerDoc (this=0x9b4c198)
        at ../../dist/include/content/nsINode.h:213
    #9  0x08732e4b in nsXBLBinding::AllowScripts (this=0x9b50260)
        at /opt/joro/firefox-cvs/mozilla/content/xbl/src/nsXBLBinding.cpp:1232

    (gdb) frame 6
    #6  0x0823969c in nsNodeInfoManager::GetDocument (this=0xd8d8d8d8)
        at ../../dist/include/content/nsNodeInfoManager.h:113
    113         return mDocument;
    (gdb) x/i $pc
    0x823969c <_ZNK17nsNodeInfoManager11GetDocumentEv+6>:   mov    0xc(%eax),%eax
    (gdb) p/x $eax
    $1 = 0xd8d8d8d8

(In reply to comment #18)
> I'm not sure I follow this patch.  Is the viewmanager destroyed as part of a
> Flush_OnlyReflow flush?  Or as part of a Flush_StyleReresolves flush?  The
> former is a problem.  The latter is acceptable.


ProcessReflowCommands() may cause PresShell's Destroy to be called.
That is because DidDoReflow() calls HandlePostedReflowCallbacks().
Comment on attachment 267140 [details] [diff] [review]
Keep viewmanager alive

So processing reflow commands may destroy presshell. Need to look at other cases when Flush_OnlyReflow is used.

What problem is there still with viewmanager? At least it gets EndUpdateViewBatch if 
BeginUpdateViewBatch is called.
Attachment #267140 - Flags: review?(bzbarsky)
Which reflow callback can destroy presshells?

I'm happy to get rid of FlushOnlyReflow; that's come up in other bugs too.  But then we absolutely need to fix the callers -- every single caller is doing it because it can't handle frametree changes.

As for viewmanager, batches are stored on the root of the viewmanager tree.  So ending the batch after it's been removed from the tree has no effect -- the rest of the tree will never render right again.  And the batch counter on that viewmanager will underflow, of course.

So basically this patch is wallpaper.  I'd rather we fixed the real problem(s).
Some attributes are set in reflow callbacks, for example in xul menu code and
scrollbars IIRC.
OK.  Sounds like we need to make the assumption that flushing layout can always run arbitrary script, and change all callers accordingly.  We'll want this change too, I guess, but it's not nearly enough.
I really really don't want layout to run scripts during reflow. Not only does that make layout code a lot harder, it would also mean that in a pile of places where we can perform reflow, such as when getting layout DOM properties like offset* and during nsIMutationObserver notifications.

Aren't those attribute-sets using PR_FALSE for aNotify and should therefore not run scripts? If that's not the case, could we set those attributes off an event instead?
Scripts don't (or shouldn't) run during reflow, but it is possible that scripts 
run right *after* reflow, when reflow callbacks are called.
See comment #24.
Sicking's point is that for everyone but the actual frame tree code and presshell it's the same thing.
Comment on attachment 267140 [details] [diff] [review]
Keep viewmanager alive

r+ as wallpaper if you restore the comment you shouldn't be removing.  But we really need to fix the actual bug.  Before we ship 1.9.  Otherwise we're in for a world of hurt...
Attachment #267140 - Flags: review?(bzbarsky) → review+
Comment on attachment 267140 [details] [diff] [review]
Keep viewmanager alive

I'll post a follow-up if this gets sr.
Attachment #267140 - Flags: superreview?(dbaron)
Status: NEW → ASSIGNED
Comment on attachment 267140 [details] [diff] [review]
Keep viewmanager alive

sr=dbaron
Attachment #267140 - Flags: superreview?(dbaron) → superreview+
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
Whiteboard: [sg:critical?
Whiteboard: [sg:critical? → [sg:critical?]
Might want this on branch to fix bug 400735, if this patch does so.
Flags: blocking1.8.1.9?
Flags: blocking1.8.1.10?
Blocks: 400735
The presshell part isn't perfect, but prevents possible crashes.
It isn't needed to fix Bug 400735.
Comparing to trunk version, the xbl part has manual AddRef/Release
Attachment #285866 - Flags: superreview?(bzbarsky)
Attachment #285866 - Flags: review?(bzbarsky)
Comment on attachment 285866 [details] [diff] [review]
possible patch for 18

Looks good.
Attachment #285866 - Flags: superreview?(bzbarsky)
Attachment #285866 - Flags: superreview+
Attachment #285866 - Flags: review?(bzbarsky)
Attachment #285866 - Flags: review+
Attachment #285866 - Flags: approval1.8.1.9?
Comment on attachment 285866 [details] [diff] [review]
possible patch for 18

approved for 1.8.1.9 and 1.8.1.10, a=dveditz for release-drivers

For 1.8.1.9 please land on the GECKO181_20071004_RELBRANCH. The regular 1.8 branch will become 1.8.1.10
Attachment #285866 - Flags: approval1.8.1.9?
Attachment #285866 - Flags: approval1.8.1.9+
Attachment #285866 - Flags: approval1.8.1.10+
Checked in to MOZILLA_1_8_BRANCH and GECKO181_20071004_RELBRANCH
Flags: blocking1.8.1.9?
Flags: blocking1.8.1.9+
Flags: blocking1.8.1.10?
Flags: blocking1.8.1.10+
verified fixed 1.8.1.9 using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.9pre) Gecko/2007102403 BonEcho/2.0.0.9pre and the testcase from martijn from this bug. 

No Crash on testcase -> adding verified keyword
Why would this fix bug 400735? The patch here makes us more stable while executing destructors, but bug 400735 is about us crashing while running constructors.
Ah, true. I made the 1.8 patch because of the testcase in bug 400735 and
because of https://bugzilla.mozilla.org/show_bug.cgi?id=400735#c11.
But it seems that the testcase is possibly wrong.
Flags: blocking1.8.1.11+ → blocking1.8.1.10+
verified fixed 1.8.1.10 using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.10) Gecko/2007111504 Firefox/2.0.0.10 and the testcases from this bug -> no crash


-> adding verified keyword
Group: security
(In reply to comment #32)
> r+ as wallpaper if you restore the comment you shouldn't be removing.  But we
> really need to fix the actual bug.  Before we ship 1.9.  Otherwise we're in for
> a world of hurt..

(In reply to comment #33))
> I'll post a follow-up if this gets sr.

Did a follow-up bug on this ever get filed?
Comment on attachment 285866 [details] [diff] [review]
possible patch for 18

a=asac for 1.8.0.15

(same patch as shipped by distros)
Attachment #285866 - Flags: approval1.8.0.15+
please land on 1.8.0 branch
Flags: blocking1.8.0.15+
Keywords: checkin-needed
MOZILLA_1_8_0_BRANCH:

Checking in content/xbl/src/nsBindingManager.cpp;
/cvsroot/mozilla/content/xbl/src/nsBindingManager.cpp,v  <--  nsBindingManager.cpp
new revision: 1.136.2.1.4.5; previous revision: 1.136.2.1.4.4
done
Checking in layout/base/nsPresShell.cpp;
/cvsroot/mozilla/layout/base/nsPresShell.cpp,v  <--  nsPresShell.cpp
new revision: 3.852.2.11.2.11; previous revision: 3.852.2.11.2.10
done
You need to log in before you can comment on or make changes to this bug.