Last Comment Bug 736695 - crash in nsGenericElement::UnbindFromTree , when I open Customize Toolbar with Video DownloadHelper 4.9.8 installed
: crash in nsGenericElement::UnbindFromTree , when I open Customize Toolbar wi...
Status: RESOLVED FIXED
: crash, regression, reproducible, topcrash
Product: Core
Classification: Components
Component: DOM (show other bugs)
: 15 Branch
: All All
: P2 critical with 3 votes (vote)
: mozilla16
Assigned To: Boris Zbarsky [:bz] (Out June 25-July 6)
:
Mentors:
: 745142 749158 750143 752186 752214 760769 762016 919585 (view as bug list)
Depends on: 764591 765502 773297
Blocks: 561277 730587
  Show dependency treegraph
 
Reported: 2012-03-16 20:46 PDT by Alice0775 White
Modified: 2014-03-04 06:59 PST (History)
23 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
unaffected
+
unaffected


Attachments
Possible patch (2.59 KB, patch)
2012-03-26 14:47 PDT, Kyle Huey [:khuey] (khuey@mozilla.com)
no flags Details | Diff | Review
Proposed scary fix (9.75 KB, patch)
2012-04-09 08:53 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
jonas: review-
Details | Diff | Review
part 1. Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild. (4.95 KB, patch)
2012-06-02 00:48 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
bugs: review-
Details | Diff | Review
part 2. Use separate mutation observer batches and script blockers for the removal of newChild from its parent and the other mutations we do during an insert. (4.26 KB, patch)
2012-06-02 00:48 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
no flags Details | Diff | Review
part 2. Use separate mutation observer batches and script blockers for the removal of newChild from its parent and the other mutations we do during an insert. (4.28 KB, patch)
2012-06-02 00:52 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
bugs: review+
Details | Diff | Review
part 3. Tear down XBL bindings off a scriptrunner when unbinding nodes. (5.35 KB, patch)
2012-06-02 00:53 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
bugs: review+
Details | Diff | Review
part 1. Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild. (4.93 KB, patch)
2012-06-11 09:45 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
bugs: review+
Details | Diff | Review

Description Alice0775 White 2012-03-16 20:46:52 PDT
This bug was filed from the Socorro interface and is 
report bp-e8bdcfd6-c965-429f-b50a-605402120317 .
============================================================= 

Reproducible: Always

Steps to Reproduce:
1. Start Nightly with new profile
2. Install Video DownloadHelper 4.9.8 and Restart browser
   ( https://addons.mozilla.org/ja/firefox/addon/video-downloadhelper/ )
3. Open Customize Toolbar (Alt V T C)

If browser does not crash, carry out the following STR.
4. Drag & drop an Icon of Video DownloadHelper to Toolbar from palette.
5. Click Done
6. Open Customize Toolbar (Alt V T C)

Actual Results:
 Browser crashes

Expected Results:
 Shoud not.
Comment 1 Alice0775 White 2012-03-16 20:48:20 PDT
Err I forgot build ID.
http://hg.mozilla.org/mozilla-central/rev/e5f6caa40409
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120316 Firefox/14.0a1 ID:20120316031151
Comment 2 Alice0775 White 2012-03-16 21:13:21 PDT
Regression window:
No crash:
http://hg.mozilla.org/mozilla-central/rev/a888f210af4e
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120314 Firefox/14.0a1 ID:20120314130626
Crash
http://hg.mozilla.org/mozilla-central/rev/5280e98d2d77
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120314 Firefox/14.0a1 ID:20120314131528
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a888f210af4e&tochange=5280e98d2d77
Triggered by:
5280e98d2d77	Kyle Huey — Bug 730587: Stash a pointer to the subtree root on DOM nodes. r=smaug, sr=j
Comment 3 Bill Gianopoulos [:WG9s] 2012-03-25 13:43:56 PDT
Uninstalling this extension fixes the issue.  This should probably be resolved INVALID.
Comment 4 Scoobidiver (away) 2012-03-25 23:36:30 PDT
(In reply to Bill Gianopoulos [:WG9s] from comment #3)
> Uninstalling this extension fixes the issue.  This should probably be
> resolved INVALID.
I disagree because it's a well identified regression in the DOM component that VDH has only highlighted. If you don't want to fix it, it's at least an Extension Compatibility bug.
Comment 5 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-03-26 14:47:54 PDT
Created attachment 609501 [details] [diff] [review]
Possible patch

This fixes the crash.  What else it breaks ... who knows.
Comment 6 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-04-03 11:57:54 PDT
bz is going to take this.
Comment 7 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-03 14:25:30 PDT
OK.  The thing Kyle and I talked about totally doesn't work, because the destructor rearranges the DOM that UnbindFromTree is iterating.  So a scriptrunner but more carefully it is.  Here's hoping.
Comment 8 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-04 11:28:37 PDT
So I tried putting all of the XBL stuff on a scriptrunner, but that fails leads to crashes in tests because we end up trying to reresolve style on elements which have no current doc (and indeed no parent).

Furthermore, the content has a null GetPrimaryFrame() (because it's not in the document, so it's using mSubtreeRoot), but it's the mContent for a frame that's still in the DOM tree.

The fundamental problem here is that destructors can touch the anonymous content (and in the textbox case apparently do, which is why Kyle's patch failed), but that anonymous content is also supposed to get unbound by binding teardown... and may have destructors of its own.  I'd guess that somewhere in here we end up with an event posted that unbinds an element even though some other destructor code somewhere stuck it back in the tree.  Or something.

I'm not quite sure where to go with this, without breaking compat.  :(
Comment 9 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-04 11:44:16 PDT
Oh, I see the real problem.  We're inserting a node that's already in the doc, so we remove it in the old location, then insert in the new one, then end up running the now-async destructor code and then it's all broken.

We need to do the remove and insert in separate update batches or something.
Comment 10 Jonas Sicking (:sicking) PTO Until July 5th 2012-04-04 14:15:17 PDT
Since I was cc'ed:

I really think that we need to fix XBL's insanity of running too much code inside traditionally unsafe code paths like UnbindFromTree and nsIMutationObserver callbacks. Moving to script runners really feel like the correct solution. So even if it brings some risk as far as backwards incompatibility goes, I think it's a bullet we'll need to bite sooner or later.

Beyond that there's not enough context here for me to understand what's going on, feel free to ping me on irc if you want more specific input.
Comment 11 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-09 08:53:35 PDT
Created attachment 613307 [details] [diff] [review]
Proposed scary fix

A try run is at https://tbpl.mozilla.org/?tree=Try&rev=c0b100307ac8
Comment 12 Jonas Sicking (:sicking) PTO Until July 5th 2012-04-09 17:22:11 PDT
Comment on attachment 613307 [details] [diff] [review]
Proposed scary fix

Review of attachment 613307 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsGenericElement.cpp
@@ +4208,5 @@
> +  if (aRefChild) {
> +    return aParent->IndexOf(aRefChild);
> +  }
> +
> +  return aParent->GetChildCount();

I know optimizers used to deal better with ternary operator than multiple return statements. Not sure if that's still the case, but I find them basically as readable so I tend to prefer the ternary version.

@@ +4296,5 @@
> +      // Reget the insPos
> +      insPos = GetInsPos(this, aRefChild);
> +      if (insPos < 0) {
> +        return NS_ERROR_DOM_NOT_FOUND_ERR;
> +      }

You also need to check IsAllowedAsChild.

But I think this is wrong. It's not removing a to-be-replaced child that we want to do in a separate script-blocker. The node here can't ever be inserted into another parent during the scriptblocker.

It's where we remove aNewChild from it's old parent that we should be inside a separate scriptblocker.
Comment 13 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-09 18:42:50 PDT
> It's not removing a to-be-replaced child that we want to do in a separate script-blocker.

Hmm.  Because it's not going to be reinserted?  But that's not the important part here.  We need to remove the aRefChild _before_ we remove aNewChild, right?  And if the latter can't be under the same scripblocker as the insert (because it _is_ being reinserted), then neither can the former one be, since scriptblockers can't be "canceled" temporarily.  Hence the need for all of these to be under separate scriptblockers.
Comment 14 Jonas Sicking (:sicking) PTO Until July 5th 2012-04-10 12:35:17 PDT
So what I think we need to do is to:

* Rearrange mutations such that we first remove newChild from its old parent, then (if needed) remove refChild from its current parent, then insert newChild in its new parent.
* Create one mutationObserverBatch and one scriptblocker which spans removing newChild from its old parent
* Create one mutationObserverBatch+scriptblocker which spans the other mutations
* Insert code in between the batches/blockers which re-checks invariants.
Comment 15 Scoobidiver (away) 2012-04-13 13:17:30 PDT
*** Bug 745142 has been marked as a duplicate of this bug. ***
Comment 16 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-04-13 13:20:16 PDT
What does the spec say?
Comment 17 Robert Kaiser (not working on stability any more) 2012-04-19 06:50:24 PDT
This is spiking again on trunk, can we find a fix before it's uplifted to Aurora and affects even more users?
Comment 18 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-19 08:10:40 PDT
I really doubt that, esp. with the approval-only business.  I'll try to get a patch done this week, but no guarantees.... if someone else has time to deal, please do so.
Comment 19 Scoobidiver (away) 2012-04-27 01:00:55 PDT
*** Bug 749158 has been marked as a duplicate of this bug. ***
Comment 20 Scoobidiver (away) 2012-04-30 02:11:11 PDT
*** Bug 750143 has been marked as a duplicate of this bug. ***
Comment 21 Robert Kaiser (not working on stability any more) 2012-04-30 07:12:52 PDT
Could we please have some progress here? This signature is still a large topcrash on Nightly and now on Aurora as well.
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-30 07:16:59 PDT
> Could we please have some progress here?

If you want things to not regress in the process, then probably not in the next week unless you find someone else to do it.

For purposes of Aurora, does backing out bug 730587 make sense?
Comment 23 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-04-30 07:17:51 PDT
And to be clear, the changes being discussed earlier in this bug are _really_ scary.  Very high web compat risk, very high security regression risk, very high crash regression (worse than this one) risk.
Comment 24 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-04-30 07:20:39 PDT
(In reply to Boris Zbarsky (:bz) from comment #22)
> For purposes of Aurora, does backing out bug 730587 make sense?
Makes sense to me.
Comment 25 Scoobidiver (away) 2012-05-05 13:46:55 PDT
*** Bug 752214 has been marked as a duplicate of this bug. ***
Comment 26 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-05-07 12:31:39 PDT
I backed bug 730587 out of Aurora.
Comment 27 Yani 2012-05-19 10:58:32 PDT
Customize toolbar is a instant browser crash here on 15.0a1 (2012-05-19).

I'd guess you all know this it is being reported by the auto bug reporting.
Comment 28 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-01 14:08:23 PDT
> * Rearrange mutations such that we first remove newChild from its old parent

Fwiw, that's what the spec at http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-node-replace says right now too:


  To replace a child with node within a parent, run these steps: 
  ...
    8. Adopt node with parent's node document. 
    9. Remove child from its parent with the suppress observers flag set. 

The adopt removes "node" from the DOM.

I'll try to do that.
Comment 29 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-02 00:48:09 PDT
Created attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.
Comment 30 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-02 00:48:52 PDT
Created attachment 629449 [details] [diff] [review]
part 2.  Use separate mutation observer batches and script blockers for the removal of newChild from its parent and the other mutations we do during an insert.
Comment 31 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-02 00:52:44 PDT
Created attachment 629450 [details] [diff] [review]
part 2.  Use separate mutation observer batches and script blockers for the removal of newChild from its parent and the other mutations we do during an insert.
Comment 32 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-02 00:53:51 PDT
Created attachment 629452 [details] [diff] [review]
part 3.  Tear down XBL bindings off a scriptrunner when unbinding nodes.
Comment 33 Jonas Sicking (:sicking) PTO Until July 5th 2012-06-03 21:05:08 PDT
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Review of attachment 629448 [details] [diff] [review]:
-----------------------------------------------------------------

Waiting with reviewing the rest until I'm sure I understand this patch correctly.

::: content/base/src/nsGenericElement.cpp
@@ +4205,5 @@
> +
> +  // Record the node to insert before, if any
> +  nsINode* nodeToInsertBefore;
> +  if (aReplace) {
> +    nodeToInsertBefore = aRefChild ? aRefChild->GetNextSibling() : nsnull;

If aReplace is then aRefChild will be true too, otherwise we would have bailed up at the top of the function.

@@ +4216,5 @@
> +    nodeToInsertBefore = nodeToInsertBefore->GetNextSibling();
> +  }
> +
> +  mozAutoDocUpdate batch(GetCurrentDoc(), UPDATE_CONTENT_MODEL, true);
> +  nsAutoMutationBatch mb;

Hmm.. I think this mutation-batch should start after removing newchild from its old parent. I.e. we want two "batches", one for removing newchild from its old parent, and one for replacing refchild with newchild (or just inserting newchild)

@@ +4245,5 @@
> +  PRInt32 insPos;
> +  if (nodeToInsertBefore) {
> +    insPos = IndexOf(nodeToInsertBefore);
> +    if (insPos < 0) {
> +      // XXXbz How the heck would _that_ happen, exactly?

This can happen as a result of XBL, right?

@@ +4254,5 @@
> +    insPos = GetChildCount();
> +  }
> +
> +  // If we're replacing and we haven't removed aRefChild yet, do so now
> +  if (aReplace && aRefChild != aNewChild) {

Why the |aRefChild != aNewChild| check? Shouldn't we treat that as removing the child and then re-inserting it? It seems like that was what the old code did?
Comment 34 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-04 08:15:24 PDT
> If aReplace is then aRefChild will be true too

Hmm.  Yeah, ok.  Will fix.

> I.e. we want two "batches"

That happens in part 2.  Part 1 is about changing the order without changing anything about the mutation batching behavior.

> This can happen as a result of XBL, right?

Hmm.  I guess so, if aRefChild is anonymous content.  OK, I'll fix the comment accordingly.

> Why the |aRefChild != aNewChild| check?

Because if aRefChild == aNewChild then we already removed it when we removed aNewChild above.  So there is nothing else to remove here (and in fact trying will trigger the asserts in that block, since aRefChild->GetNextSibling() is now null).
Comment 35 Scoobidiver (away) 2012-06-07 02:55:52 PDT
*** Bug 762016 has been marked as a duplicate of this bug. ***
Comment 36 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-08 14:37:59 PDT
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Jonas is AWOL....
Comment 37 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-06-10 13:22:24 PDT
Comment on attachment 629452 [details] [diff] [review]
part 3.  Tear down XBL bindings off a scriptrunner when unbinding nodes.


>+class RemoveFromBindingManagerRunnable : public nsRunnable {
{ should be in the next line

>+public:
>+  RemoveFromBindingManagerRunnable(nsBindingManager* manager,
>+                                   Element* element,
>+                                   nsIDocument* doc,
>+                                   nsIContent* bindingParent):
aParameterName, please


>+    mManager(manager), mElement(element), mDoc(doc),
>+    mBindingParent(bindingParent)
>+  {}
>+
>+  NS_IMETHOD Run() {
{ to next line
Comment 38 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-06-10 13:30:46 PDT
I'll review the difficult parts tomorrow :)
Comment 39 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-06-11 03:49:09 PDT
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Apparently sicking had reviewed this already, and I found the same issues before
noticing that.
Comment 40 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-11 08:06:33 PDT
What issues?  Did I not address them in comment 34?  Do you just want to see a patch with the changes from comment 34 applied?
Comment 41 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-06-11 09:40:26 PDT
(In reply to Boris Zbarsky (:bz) from comment #40)
> Do you just want to see a patch with the changes from comment 34 applied?
Yes please :)
Comment 42 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-11 09:45:03 PDT
Created attachment 631925 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Here you go!
Comment 45 Scoobidiver (away) 2012-06-19 07:23:53 PDT
*** Bug 752186 has been marked as a duplicate of this bug. ***
Comment 46 Scoobidiver (away) 2012-06-20 06:04:57 PDT
*** Bug 760769 has been marked as a duplicate of this bug. ***
Comment 47 RobertJ 2012-06-20 11:56:38 PDT
Since this is RESOLVED FIXED, when will the "FIX" hit the Aurora channel?
Comment 48 :Ms2ger 2012-06-20 11:57:39 PDT
Next merge on 2012-07-16. See <https://wiki.mozilla.org/RapidRelease/Calendar>.
Comment 49 RobertJ 2012-06-20 12:00:33 PDT
Sorry asked the wrong question. When will the "FIX" hit FF15 which is impacted by this bug?
Comment 50 Boris Zbarsky [:bz] (Out June 25-July 6) 2012-06-20 12:08:11 PDT
When bug 730587 is backed out from Aurora.  Kyle was aiming to do that yesterday, but presumably ran out of time...
Comment 51 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-06-21 01:07:37 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/44103bdf65b1
Comment 52 RobertJ 2012-06-21 06:29:10 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #51)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/44103bdf65b1

Just updated FF15a2 as of 8:20 AM CDST [6:20 AM PDST] and it still crashes when trying to open Customize Toolbar with Video DownloadHelper 4.9.8 installed.
Comment 53 Scoobidiver (away) 2012-06-21 06:37:42 PDT
The first Aurora build that contains the fix is 15.0a2/20120621042006.
Comment 54 RobertJ 2012-06-21 06:45:49 PDT
Thanks for the info. I assume that it is in the update channel.
Comment 55 RobertJ 2012-06-21 09:27:43 PDT
This FF15a2 build on OSX 

20120621042006
http://hg.mozilla.org/releases/mozilla-aurora/rev/d557426b0c8d

is still crashing when trying to open Customize Toolbar with Video DownloadHelper 4.9.8 installed.
Comment 56 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-06-21 09:30:09 PDT
haha, thanks for catching that.  I only backed out the followup to the bug :-P
Comment 57 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-06-21 13:07:07 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/0f805ac53831

It should be fixed in tomorrow's Aurora.
Comment 58 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2014-03-04 06:59:55 PST
*** Bug 919585 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.