Closed Bug 736695 Opened 8 years ago Closed 8 years ago

crash in nsGenericElement::UnbindFromTree , when I open Customize Toolbar with Video DownloadHelper 4.9.8 installed

Categories

(Core :: DOM: Core & HTML, defect, P2, critical)

15 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla16
Tracking Status
firefox14 + unaffected
firefox15 + unaffected

People

(Reporter: alice0775, Assigned: bzbarsky)

References

Details

(4 keywords)

Crash Data

Attachments

(3 files, 4 obsolete files)

4.28 KB, patch
smaug
: review+
Details | Diff | Splinter Review
5.35 KB, patch
smaug
: review+
Details | Diff | Splinter Review
4.93 KB, patch
smaug
: review+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is 
report bp-e8bdcfd6-c965-429f-b50a-605402120317 .
============================================================= 

Reproducible: Always

Steps to Reproduce:
1. Start Nightly with new profile
2. Install Video DownloadHelper 4.9.8 and Restart browser
   ( https://addons.mozilla.org/ja/firefox/addon/video-downloadhelper/ )
3. Open Customize Toolbar (Alt V T C)

If browser does not crash, carry out the following STR.
4. Drag & drop an Icon of Video DownloadHelper to Toolbar from palette.
5. Click Done
6. Open Customize Toolbar (Alt V T C)

Actual Results:
 Browser crashes

Expected Results:
 Shoud not.
Err I forgot build ID.
http://hg.mozilla.org/mozilla-central/rev/e5f6caa40409
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120316 Firefox/14.0a1 ID:20120316031151
Regression window:
No crash:
http://hg.mozilla.org/mozilla-central/rev/a888f210af4e
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120314 Firefox/14.0a1 ID:20120314130626
Crash
http://hg.mozilla.org/mozilla-central/rev/5280e98d2d77
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120314 Firefox/14.0a1 ID:20120314131528
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a888f210af4e&tochange=5280e98d2d77
Triggered by:
5280e98d2d77	Kyle Huey — Bug 730587: Stash a pointer to the subtree root on DOM nodes. r=smaug, sr=j
Blocks: 730587
Blocks: 561277
Crash Signature: [@ nsGenericElement::UnbindFromTree(bool, bool)] → [@ nsGenericElement::UnbindFromTree(bool, bool)] [@ nsGenericElement::UnbindFromTree ]
Component: General → DOM
Keywords: topcrash
OS: Windows 7 → All
Product: Firefox → Core
QA Contact: general → general
Hardware: x86 → All
Version: Trunk → 14 Branch
Uninstalling this extension fixes the issue.  This should probably be resolved INVALID.
(In reply to Bill Gianopoulos [:WG9s] from comment #3)
> Uninstalling this extension fixes the issue.  This should probably be
> resolved INVALID.
I disagree because it's a well identified regression in the DOM component that VDH has only highlighted. If you don't want to fix it, it's at least an Extension Compatibility bug.
Attached patch Possible patch (obsolete) — Splinter Review
This fixes the crash.  What else it breaks ... who knows.
bz is going to take this.
Assignee: khuey → bzbarsky
OK.  The thing Kyle and I talked about totally doesn't work, because the destructor rearranges the DOM that UnbindFromTree is iterating.  So a scriptrunner but more carefully it is.  Here's hoping.
So I tried putting all of the XBL stuff on a scriptrunner, but that fails leads to crashes in tests because we end up trying to reresolve style on elements which have no current doc (and indeed no parent).

Furthermore, the content has a null GetPrimaryFrame() (because it's not in the document, so it's using mSubtreeRoot), but it's the mContent for a frame that's still in the DOM tree.

The fundamental problem here is that destructors can touch the anonymous content (and in the textbox case apparently do, which is why Kyle's patch failed), but that anonymous content is also supposed to get unbound by binding teardown... and may have destructors of its own.  I'd guess that somewhere in here we end up with an event posted that unbinds an element even though some other destructor code somewhere stuck it back in the tree.  Or something.

I'm not quite sure where to go with this, without breaking compat.  :(
Oh, I see the real problem.  We're inserting a node that's already in the doc, so we remove it in the old location, then insert in the new one, then end up running the now-async destructor code and then it's all broken.

We need to do the remove and insert in separate update batches or something.
Since I was cc'ed:

I really think that we need to fix XBL's insanity of running too much code inside traditionally unsafe code paths like UnbindFromTree and nsIMutationObserver callbacks. Moving to script runners really feel like the correct solution. So even if it brings some risk as far as backwards incompatibility goes, I think it's a bullet we'll need to bite sooner or later.

Beyond that there's not enough context here for me to understand what's going on, feel free to ping me on irc if you want more specific input.
Priority: -- → P2
Whiteboard: [need review]
Comment on attachment 613307 [details] [diff] [review]
Proposed scary fix

Review of attachment 613307 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsGenericElement.cpp
@@ +4208,5 @@
> +  if (aRefChild) {
> +    return aParent->IndexOf(aRefChild);
> +  }
> +
> +  return aParent->GetChildCount();

I know optimizers used to deal better with ternary operator than multiple return statements. Not sure if that's still the case, but I find them basically as readable so I tend to prefer the ternary version.

@@ +4296,5 @@
> +      // Reget the insPos
> +      insPos = GetInsPos(this, aRefChild);
> +      if (insPos < 0) {
> +        return NS_ERROR_DOM_NOT_FOUND_ERR;
> +      }

You also need to check IsAllowedAsChild.

But I think this is wrong. It's not removing a to-be-replaced child that we want to do in a separate script-blocker. The node here can't ever be inserted into another parent during the scriptblocker.

It's where we remove aNewChild from it's old parent that we should be inside a separate scriptblocker.
Attachment #613307 - Flags: review?(jonas) → review-
> It's not removing a to-be-replaced child that we want to do in a separate script-blocker.

Hmm.  Because it's not going to be reinserted?  But that's not the important part here.  We need to remove the aRefChild _before_ we remove aNewChild, right?  And if the latter can't be under the same scripblocker as the insert (because it _is_ being reinserted), then neither can the former one be, since scriptblockers can't be "canceled" temporarily.  Hence the need for all of these to be under separate scriptblockers.
So what I think we need to do is to:

* Rearrange mutations such that we first remove newChild from its old parent, then (if needed) remove refChild from its current parent, then insert newChild in its new parent.
* Create one mutationObserverBatch and one scriptblocker which spans removing newChild from its old parent
* Create one mutationObserverBatch+scriptblocker which spans the other mutations
* Insert code in between the batches/blockers which re-checks invariants.
Duplicate of this bug: 745142
What does the spec say?
This is spiking again on trunk, can we find a fix before it's uplifted to Aurora and affects even more users?
I really doubt that, esp. with the approval-only business.  I'll try to get a patch done this week, but no guarantees.... if someone else has time to deal, please do so.
Duplicate of this bug: 749158
Duplicate of this bug: 750143
Could we please have some progress here? This signature is still a large topcrash on Nightly and now on Aurora as well.
> Could we please have some progress here?

If you want things to not regress in the process, then probably not in the next week unless you find someone else to do it.

For purposes of Aurora, does backing out bug 730587 make sense?
And to be clear, the changes being discussed earlier in this bug are _really_ scary.  Very high web compat risk, very high security regression risk, very high crash regression (worse than this one) risk.
(In reply to Boris Zbarsky (:bz) from comment #22)
> For purposes of Aurora, does backing out bug 730587 make sense?
Makes sense to me.
Duplicate of this bug: 752214
I backed bug 730587 out of Aurora.
Version: 14 Branch → 15 Branch
Customize toolbar is a instant browser crash here on 15.0a1 (2012-05-19).

I'd guess you all know this it is being reported by the auto bug reporting.
> * Rearrange mutations such that we first remove newChild from its old parent

Fwiw, that's what the spec at http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-node-replace says right now too:


  To replace a child with node within a parent, run these steps: 
  ...
    8. Adopt node with parent's node document. 
    9. Remove child from its parent with the suppress observers flag set. 

The adopt removes "node" from the DOM.

I'll try to do that.
Blocks: 760769
Attachment #613307 - Attachment is obsolete: true
Attachment #629449 - Attachment is obsolete: true
Attachment #629449 - Flags: review?(jonas)
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Review of attachment 629448 [details] [diff] [review]:
-----------------------------------------------------------------

Waiting with reviewing the rest until I'm sure I understand this patch correctly.

::: content/base/src/nsGenericElement.cpp
@@ +4205,5 @@
> +
> +  // Record the node to insert before, if any
> +  nsINode* nodeToInsertBefore;
> +  if (aReplace) {
> +    nodeToInsertBefore = aRefChild ? aRefChild->GetNextSibling() : nsnull;

If aReplace is then aRefChild will be true too, otherwise we would have bailed up at the top of the function.

@@ +4216,5 @@
> +    nodeToInsertBefore = nodeToInsertBefore->GetNextSibling();
> +  }
> +
> +  mozAutoDocUpdate batch(GetCurrentDoc(), UPDATE_CONTENT_MODEL, true);
> +  nsAutoMutationBatch mb;

Hmm.. I think this mutation-batch should start after removing newchild from its old parent. I.e. we want two "batches", one for removing newchild from its old parent, and one for replacing refchild with newchild (or just inserting newchild)

@@ +4245,5 @@
> +  PRInt32 insPos;
> +  if (nodeToInsertBefore) {
> +    insPos = IndexOf(nodeToInsertBefore);
> +    if (insPos < 0) {
> +      // XXXbz How the heck would _that_ happen, exactly?

This can happen as a result of XBL, right?

@@ +4254,5 @@
> +    insPos = GetChildCount();
> +  }
> +
> +  // If we're replacing and we haven't removed aRefChild yet, do so now
> +  if (aReplace && aRefChild != aNewChild) {

Why the |aRefChild != aNewChild| check? Shouldn't we treat that as removing the child and then re-inserting it? It seems like that was what the old code did?
Attachment #629448 - Flags: review?(jonas) → review-
> If aReplace is then aRefChild will be true too

Hmm.  Yeah, ok.  Will fix.

> I.e. we want two "batches"

That happens in part 2.  Part 1 is about changing the order without changing anything about the mutation batching behavior.

> This can happen as a result of XBL, right?

Hmm.  I guess so, if aRefChild is anonymous content.  OK, I'll fix the comment accordingly.

> Why the |aRefChild != aNewChild| check?

Because if aRefChild == aNewChild then we already removed it when we removed aNewChild above.  So there is nothing else to remove here (and in fact trying will trigger the asserts in that block, since aRefChild->GetNextSibling() is now null).
Duplicate of this bug: 762016
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Jonas is AWOL....
Attachment #629448 - Flags: review- → review?(bugs)
Attachment #629450 - Flags: review?(jonas) → review?(bugs)
Attachment #629452 - Flags: review?(jonas) → review?(bugs)
Comment on attachment 629452 [details] [diff] [review]
part 3.  Tear down XBL bindings off a scriptrunner when unbinding nodes.


>+class RemoveFromBindingManagerRunnable : public nsRunnable {
{ should be in the next line

>+public:
>+  RemoveFromBindingManagerRunnable(nsBindingManager* manager,
>+                                   Element* element,
>+                                   nsIDocument* doc,
>+                                   nsIContent* bindingParent):
aParameterName, please


>+    mManager(manager), mElement(element), mDoc(doc),
>+    mBindingParent(bindingParent)
>+  {}
>+
>+  NS_IMETHOD Run() {
{ to next line
Attachment #629452 - Flags: review?(bugs) → review+
I'll review the difficult parts tomorrow :)
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Apparently sicking had reviewed this already, and I found the same issues before
noticing that.
Attachment #629448 - Flags: review?(bugs) → review-
Attachment #629450 - Flags: review?(bugs) → review+
What issues?  Did I not address them in comment 34?  Do you just want to see a patch with the changes from comment 34 applied?
(In reply to Boris Zbarsky (:bz) from comment #40)
> Do you just want to see a patch with the changes from comment 34 applied?
Yes please :)
Attachment #629448 - Attachment is obsolete: true
Attachment #631925 - Flags: review?(bugs) → review+
Depends on: 765502
Duplicate of this bug: 752186
No longer blocks: 760769
Duplicate of this bug: 760769
Since this is RESOLVED FIXED, when will the "FIX" hit the Aurora channel?
Sorry asked the wrong question. When will the "FIX" hit FF15 which is impacted by this bug?
When bug 730587 is backed out from Aurora.  Kyle was aiming to do that yesterday, but presumably ran out of time...
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #51)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/44103bdf65b1

Just updated FF15a2 as of 8:20 AM CDST [6:20 AM PDST] and it still crashes when trying to open Customize Toolbar with Video DownloadHelper 4.9.8 installed.
The first Aurora build that contains the fix is 15.0a2/20120621042006.
Thanks for the info. I assume that it is in the update channel.
This FF15a2 build on OSX 

20120621042006
http://hg.mozilla.org/releases/mozilla-aurora/rev/d557426b0c8d

is still crashing when trying to open Customize Toolbar with Video DownloadHelper 4.9.8 installed.
haha, thanks for catching that.  I only backed out the followup to the bug :-P
Depends on: 773297
Duplicate of this bug: 919585
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.