crash in nsGenericElement::UnbindFromTree , when I open Customize Toolbar with Video DownloadHelper 4.9.8 installed

RESOLVED FIXED in mozilla16

Status

()

Core
DOM
P2
critical
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: Alice0775 White, Assigned: bz)

Tracking

(Blocks: 1 bug, 4 keywords)

15 Branch
mozilla16
crash, regression, reproducible, topcrash
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(firefox14+ unaffected, firefox15+ unaffected)

Details

(crash signature)

Attachments

(3 attachments, 4 obsolete attachments)

4.28 KB, patch
smaug
: review+
Details | Diff | Splinter Review
5.35 KB, patch
smaug
: review+
Details | Diff | Splinter Review
4.93 KB, patch
smaug
: review+
Details | Diff | Splinter Review
(Reporter)

Description

6 years ago
This bug was filed from the Socorro interface and is 
report bp-e8bdcfd6-c965-429f-b50a-605402120317 .
============================================================= 

Reproducible: Always

Steps to Reproduce:
1. Start Nightly with new profile
2. Install Video DownloadHelper 4.9.8 and Restart browser
   ( https://addons.mozilla.org/ja/firefox/addon/video-downloadhelper/ )
3. Open Customize Toolbar (Alt V T C)

If browser does not crash, carry out the following STR.
4. Drag & drop an Icon of Video DownloadHelper to Toolbar from palette.
5. Click Done
6. Open Customize Toolbar (Alt V T C)

Actual Results:
 Browser crashes

Expected Results:
 Shoud not.
(Reporter)

Comment 1

6 years ago
Err I forgot build ID.
http://hg.mozilla.org/mozilla-central/rev/e5f6caa40409
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120316 Firefox/14.0a1 ID:20120316031151
(Reporter)

Comment 2

6 years ago
Regression window:
No crash:
http://hg.mozilla.org/mozilla-central/rev/a888f210af4e
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120314 Firefox/14.0a1 ID:20120314130626
Crash
http://hg.mozilla.org/mozilla-central/rev/5280e98d2d77
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120314 Firefox/14.0a1 ID:20120314131528
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a888f210af4e&tochange=5280e98d2d77
Triggered by:
5280e98d2d77	Kyle Huey — Bug 730587: Stash a pointer to the subtree root on DOM nodes. r=smaug, sr=j
Blocks: 730587
Keywords: regression, reproducible

Updated

6 years ago
Blocks: 561277

Updated

6 years ago
Crash Signature: [@ nsGenericElement::UnbindFromTree(bool, bool)] → [@ nsGenericElement::UnbindFromTree(bool, bool)] [@ nsGenericElement::UnbindFromTree ]
Component: General → DOM
Keywords: topcrash
OS: Windows 7 → All
Product: Firefox → Core
QA Contact: general → general
Hardware: x86 → All
Version: Trunk → 14 Branch
Assignee: nobody → khuey

Updated

6 years ago
tracking-firefox14: --- → ?
Uninstalling this extension fixes the issue.  This should probably be resolved INVALID.

Comment 4

6 years ago
(In reply to Bill Gianopoulos [:WG9s] from comment #3)
> Uninstalling this extension fixes the issue.  This should probably be
> resolved INVALID.
I disagree because it's a well identified regression in the DOM component that VDH has only highlighted. If you don't want to fix it, it's at least an Extension Compatibility bug.
Created attachment 609501 [details] [diff] [review]
Possible patch

This fixes the crash.  What else it breaks ... who knows.

Updated

6 years ago
tracking-firefox14: ? → +
Attachment #609501 - Attachment is obsolete: true
bz is going to take this.
Assignee: khuey → bzbarsky
OK.  The thing Kyle and I talked about totally doesn't work, because the destructor rearranges the DOM that UnbindFromTree is iterating.  So a scriptrunner but more carefully it is.  Here's hoping.
So I tried putting all of the XBL stuff on a scriptrunner, but that fails leads to crashes in tests because we end up trying to reresolve style on elements which have no current doc (and indeed no parent).

Furthermore, the content has a null GetPrimaryFrame() (because it's not in the document, so it's using mSubtreeRoot), but it's the mContent for a frame that's still in the DOM tree.

The fundamental problem here is that destructors can touch the anonymous content (and in the textbox case apparently do, which is why Kyle's patch failed), but that anonymous content is also supposed to get unbound by binding teardown... and may have destructors of its own.  I'd guess that somewhere in here we end up with an event posted that unbinds an element even though some other destructor code somewhere stuck it back in the tree.  Or something.

I'm not quite sure where to go with this, without breaking compat.  :(
Oh, I see the real problem.  We're inserting a node that's already in the doc, so we remove it in the old location, then insert in the new one, then end up running the now-async destructor code and then it's all broken.

We need to do the remove and insert in separate update batches or something.
Since I was cc'ed:

I really think that we need to fix XBL's insanity of running too much code inside traditionally unsafe code paths like UnbindFromTree and nsIMutationObserver callbacks. Moving to script runners really feel like the correct solution. So even if it brings some risk as far as backwards incompatibility goes, I think it's a bullet we'll need to bite sooner or later.

Beyond that there's not enough context here for me to understand what's going on, feel free to ping me on irc if you want more specific input.
Created attachment 613307 [details] [diff] [review]
Proposed scary fix

A try run is at https://tbpl.mozilla.org/?tree=Try&rev=c0b100307ac8
Attachment #613307 - Flags: review?(jonas)
Priority: -- → P2
Whiteboard: [need review]
Comment on attachment 613307 [details] [diff] [review]
Proposed scary fix

Review of attachment 613307 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsGenericElement.cpp
@@ +4208,5 @@
> +  if (aRefChild) {
> +    return aParent->IndexOf(aRefChild);
> +  }
> +
> +  return aParent->GetChildCount();

I know optimizers used to deal better with ternary operator than multiple return statements. Not sure if that's still the case, but I find them basically as readable so I tend to prefer the ternary version.

@@ +4296,5 @@
> +      // Reget the insPos
> +      insPos = GetInsPos(this, aRefChild);
> +      if (insPos < 0) {
> +        return NS_ERROR_DOM_NOT_FOUND_ERR;
> +      }

You also need to check IsAllowedAsChild.

But I think this is wrong. It's not removing a to-be-replaced child that we want to do in a separate script-blocker. The node here can't ever be inserted into another parent during the scriptblocker.

It's where we remove aNewChild from it's old parent that we should be inside a separate scriptblocker.
Attachment #613307 - Flags: review?(jonas) → review-
> It's not removing a to-be-replaced child that we want to do in a separate script-blocker.

Hmm.  Because it's not going to be reinserted?  But that's not the important part here.  We need to remove the aRefChild _before_ we remove aNewChild, right?  And if the latter can't be under the same scripblocker as the insert (because it _is_ being reinserted), then neither can the former one be, since scriptblockers can't be "canceled" temporarily.  Hence the need for all of these to be under separate scriptblockers.
So what I think we need to do is to:

* Rearrange mutations such that we first remove newChild from its old parent, then (if needed) remove refChild from its current parent, then insert newChild in its new parent.
* Create one mutationObserverBatch and one scriptblocker which spans removing newChild from its old parent
* Create one mutationObserverBatch+scriptblocker which spans the other mutations
* Insert code in between the batches/blockers which re-checks invariants.

Updated

5 years ago
Duplicate of this bug: 745142
What does the spec say?

Comment 17

5 years ago
This is spiking again on trunk, can we find a fix before it's uplifted to Aurora and affects even more users?
I really doubt that, esp. with the approval-only business.  I'll try to get a patch done this week, but no guarantees.... if someone else has time to deal, please do so.

Updated

5 years ago
Duplicate of this bug: 749158

Updated

5 years ago
Duplicate of this bug: 750143

Comment 21

5 years ago
Could we please have some progress here? This signature is still a large topcrash on Nightly and now on Aurora as well.
> Could we please have some progress here?

If you want things to not regress in the process, then probably not in the next week unless you find someone else to do it.

For purposes of Aurora, does backing out bug 730587 make sense?
And to be clear, the changes being discussed earlier in this bug are _really_ scary.  Very high web compat risk, very high security regression risk, very high crash regression (worse than this one) risk.
(In reply to Boris Zbarsky (:bz) from comment #22)
> For purposes of Aurora, does backing out bug 730587 make sense?
Makes sense to me.

Updated

5 years ago
Duplicate of this bug: 752214
I backed bug 730587 out of Aurora.
status-firefox14: --- → unaffected
status-firefox15: --- → affected
tracking-firefox15: --- → ?
Version: 14 Branch → 15 Branch
tracking-firefox15: ? → +

Comment 27

5 years ago
Customize toolbar is a instant browser crash here on 15.0a1 (2012-05-19).

I'd guess you all know this it is being reported by the auto bug reporting.
> * Rearrange mutations such that we first remove newChild from its old parent

Fwiw, that's what the spec at http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-node-replace says right now too:


  To replace a child with node within a parent, run these steps: 
  ...
    8. Adopt node with parent's node document. 
    9. Remove child from its parent with the suppress observers flag set. 

The adopt removes "node" from the DOM.

I'll try to do that.
Blocks: 760769
Created attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.
Attachment #629448 - Flags: review?(jonas)
Attachment #613307 - Attachment is obsolete: true
Created attachment 629449 [details] [diff] [review]
part 2.  Use separate mutation observer batches and script blockers for the removal of newChild from its parent and the other mutations we do during an insert.
Attachment #629449 - Flags: review?(jonas)
Created attachment 629450 [details] [diff] [review]
part 2.  Use separate mutation observer batches and script blockers for the removal of newChild from its parent and the other mutations we do during an insert.
Attachment #629450 - Flags: review?(jonas)
Attachment #629449 - Attachment is obsolete: true
Attachment #629449 - Flags: review?(jonas)
Created attachment 629452 [details] [diff] [review]
part 3.  Tear down XBL bindings off a scriptrunner when unbinding nodes.
Attachment #629452 - Flags: review?(jonas)
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Review of attachment 629448 [details] [diff] [review]:
-----------------------------------------------------------------

Waiting with reviewing the rest until I'm sure I understand this patch correctly.

::: content/base/src/nsGenericElement.cpp
@@ +4205,5 @@
> +
> +  // Record the node to insert before, if any
> +  nsINode* nodeToInsertBefore;
> +  if (aReplace) {
> +    nodeToInsertBefore = aRefChild ? aRefChild->GetNextSibling() : nsnull;

If aReplace is then aRefChild will be true too, otherwise we would have bailed up at the top of the function.

@@ +4216,5 @@
> +    nodeToInsertBefore = nodeToInsertBefore->GetNextSibling();
> +  }
> +
> +  mozAutoDocUpdate batch(GetCurrentDoc(), UPDATE_CONTENT_MODEL, true);
> +  nsAutoMutationBatch mb;

Hmm.. I think this mutation-batch should start after removing newchild from its old parent. I.e. we want two "batches", one for removing newchild from its old parent, and one for replacing refchild with newchild (or just inserting newchild)

@@ +4245,5 @@
> +  PRInt32 insPos;
> +  if (nodeToInsertBefore) {
> +    insPos = IndexOf(nodeToInsertBefore);
> +    if (insPos < 0) {
> +      // XXXbz How the heck would _that_ happen, exactly?

This can happen as a result of XBL, right?

@@ +4254,5 @@
> +    insPos = GetChildCount();
> +  }
> +
> +  // If we're replacing and we haven't removed aRefChild yet, do so now
> +  if (aReplace && aRefChild != aNewChild) {

Why the |aRefChild != aNewChild| check? Shouldn't we treat that as removing the child and then re-inserting it? It seems like that was what the old code did?
Attachment #629448 - Flags: review?(jonas) → review-
> If aReplace is then aRefChild will be true too

Hmm.  Yeah, ok.  Will fix.

> I.e. we want two "batches"

That happens in part 2.  Part 1 is about changing the order without changing anything about the mutation batching behavior.

> This can happen as a result of XBL, right?

Hmm.  I guess so, if aRefChild is anonymous content.  OK, I'll fix the comment accordingly.

> Why the |aRefChild != aNewChild| check?

Because if aRefChild == aNewChild then we already removed it when we removed aNewChild above.  So there is nothing else to remove here (and in fact trying will trigger the asserts in that block, since aRefChild->GetNextSibling() is now null).

Updated

5 years ago
Duplicate of this bug: 762016
tracking-firefox16: --- → ?
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Jonas is AWOL....
Attachment #629448 - Flags: review- → review?(bugs)
Attachment #629450 - Flags: review?(jonas) → review?(bugs)
Attachment #629452 - Flags: review?(jonas) → review?(bugs)
Comment on attachment 629452 [details] [diff] [review]
part 3.  Tear down XBL bindings off a scriptrunner when unbinding nodes.


>+class RemoveFromBindingManagerRunnable : public nsRunnable {
{ should be in the next line

>+public:
>+  RemoveFromBindingManagerRunnable(nsBindingManager* manager,
>+                                   Element* element,
>+                                   nsIDocument* doc,
>+                                   nsIContent* bindingParent):
aParameterName, please


>+    mManager(manager), mElement(element), mDoc(doc),
>+    mBindingParent(bindingParent)
>+  {}
>+
>+  NS_IMETHOD Run() {
{ to next line
Attachment #629452 - Flags: review?(bugs) → review+
I'll review the difficult parts tomorrow :)
Comment on attachment 629448 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Apparently sicking had reviewed this already, and I found the same issues before
noticing that.
Attachment #629448 - Flags: review?(bugs) → review-

Updated

5 years ago
Attachment #629450 - Flags: review?(bugs) → review+
What issues?  Did I not address them in comment 34?  Do you just want to see a patch with the changes from comment 34 applied?
(In reply to Boris Zbarsky (:bz) from comment #40)
> Do you just want to see a patch with the changes from comment 34 applied?
Yes please :)
Created attachment 631925 [details] [diff] [review]
part 1.  Rearrange node insertion/replacement such that we remove the newChild from its parent before removing the refChild, if we need to remove refChild.

Here you go!
Attachment #631925 - Flags: review?(bugs)
Attachment #629448 - Attachment is obsolete: true

Updated

5 years ago
Attachment #631925 - Flags: review?(bugs) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/7e5996b0a171
https://hg.mozilla.org/integration/mozilla-inbound/rev/e48ff34679e3
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca716959ca2b

We still need to back out bug 730587 on Aurora 15.
Flags: in-testsuite?
Whiteboard: [need review]
Target Milestone: --- → mozilla16
https://hg.mozilla.org/mozilla-central/rev/7e5996b0a171
https://hg.mozilla.org/mozilla-central/rev/e48ff34679e3
https://hg.mozilla.org/mozilla-central/rev/ca716959ca2b
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Depends on: 764591
tracking-firefox16: ? → ---
(Reporter)

Updated

5 years ago
Depends on: 765502

Updated

5 years ago
Duplicate of this bug: 752186

Updated

5 years ago
No longer blocks: 760769
Duplicate of this bug: 760769

Comment 47

5 years ago
Since this is RESOLVED FIXED, when will the "FIX" hit the Aurora channel?
Next merge on 2012-07-16. See <https://wiki.mozilla.org/RapidRelease/Calendar>.

Comment 49

5 years ago
Sorry asked the wrong question. When will the "FIX" hit FF15 which is impacted by this bug?
When bug 730587 is backed out from Aurora.  Kyle was aiming to do that yesterday, but presumably ran out of time...
https://hg.mozilla.org/releases/mozilla-aurora/rev/44103bdf65b1
status-firefox15: affected → unaffected

Comment 52

5 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #51)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/44103bdf65b1

Just updated FF15a2 as of 8:20 AM CDST [6:20 AM PDST] and it still crashes when trying to open Customize Toolbar with Video DownloadHelper 4.9.8 installed.

Comment 53

5 years ago
The first Aurora build that contains the fix is 15.0a2/20120621042006.

Comment 54

5 years ago
Thanks for the info. I assume that it is in the update channel.

Comment 55

5 years ago
This FF15a2 build on OSX 

20120621042006
http://hg.mozilla.org/releases/mozilla-aurora/rev/d557426b0c8d

is still crashing when trying to open Customize Toolbar with Video DownloadHelper 4.9.8 installed.
haha, thanks for catching that.  I only backed out the followup to the bug :-P
https://hg.mozilla.org/releases/mozilla-aurora/rev/0f805ac53831

It should be fixed in tomorrow's Aurora.
(Reporter)

Updated

5 years ago
Depends on: 773297
Duplicate of this bug: 919585
You need to log in before you can comment on or make changes to this bug.