Closed Bug 325730 Opened 19 years ago Closed 18 years ago

Crash changing document during DOMNodeRemoved [@ nsAttrAndChildArray::InsertChildAt]

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: jruderman, Assigned: sicking)

References

Details

(4 keywords, Whiteboard: [sg:critical?])

Crash Data

Attachments

(5 files, 4 obsolete files)

Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a1) Gecko/20060202 Firefox/1.6a1 (late build)

Testcase triggers

###!!! ASSERTION: out-of-bounds: 'aPos <= ChildCount()', file /Users/admin/trunk/mozilla/content/base/src/nsAttrAndChildArray.cpp, line 143

and then crashes, quits, or otherwise dies.
Attached file testcase
Attached file stack trace
ditto 1.8 and 1.9a1 winxp builds.
OS: MacOS X → All
Hardware: Macintosh → All
Whiteboard: [sg:critical?]
Attached patch Patch to fix (obsolete) — Splinter Review
This should fix a pile of mutations-in-mutations bugs. The code keeps a mutation-counter that can be used to see if any unexpected mutations occur. If unexpected mutations occurs the code updates neccesary indexes as well as checks if to-be-inserted still are allowed to be inserted.

I ran the code on the testcase in this bug as well as some testcases i wrote locally that used fragments without crashing. I also set breakpoints in all codepaths that are run for mutations-in-mutations and none were run during normal browsing or poking around in chrome dialogs.


I also made us share a lot of code between doInsertBefore and doReplace since with proper fixes they are practically the same. An added advantage of this is that the fast-path code for fragments is used even for ReplaceChild.


I removed the fragment code that tried to revert changes if inserting a fragment failed halfway through. Failing to insert nodes happens pretty much only in OOM cases, so trying to recover seems pretty futile. And keeping the nodes in the fragment during insertion is begging for mutations headache. Especially since they children previously were unbound from the fragment, but still kept in the childarray.
Attachment #211971 - Flags: superreview?(jst)
Attachment #211971 - Flags: review?(bzbarsky)
This code depends on bug 327256, but i don't want to mark the dependency since this one is marked security sensitive.
No longer depends on: 327256
Oh, there is another slight change in functionality with this patch:

When calling a.replaceChild(b, c) we used to do:

b.parentNode.removeChild(b);
a.removeChild(c);
a.insertBefore(b, ...);

We now do:

a.removeChild(c);
b.parentNode.removeChild(b);
a.insertBefore(b, ...);


But I don't see anything in the spec that says anything about which order things should be done in. I also made

a.replaceChild(b, b);
and
a.insertBefore(b, b);

both be no-ops.
Comment on attachment 211971 [details] [diff] [review]
Patch to fix

I haven't really sorted through this yet; just some general initial comments:

>@@ -2486,17 +2490,17 @@ nsGenericElement::doInsertChildAt(nsICon
>+    if (aParent && aParent->IsInDoc() &&

Why?  Same at a few other places.

>-nsGenericElement::isSelfOrAncestor(nsIContent *aNode,
>-   * If aPossibleAncestor doesn't have children it can't be our ancestor

Do we want to add that optimization to ContentIsDescendantOf?  I dunno that it's worth it...

>+  if (aParent && nsContentUtils::ContentIsDescendantOf(aParent, aNewChild)) {
>+    return PR_FALSE;
>+  }
>+
>+

Extra line here?  Why?

>+    LL_ADD(gen, sMutationCount, 1);

I believe NSPR has officially dropped support for platforms without native 64-bit types.  In other words, you can just use PRU?Int64 as a native type without having to bother with the LL macros.

More thorough review hopefully by early next week at the latest.
> >@@ -2486,17 +2490,17 @@ nsGenericElement::doInsertChildAt(nsICon
> >+    if (aParent && aParent->IsInDoc() &&
> 
> Why?  Same at a few other places.

I wanted to make absolutly sure we don't fire mutation events on nodes not in the tree. This is so that I wouldn't have to worry about mutations firing while I remove the childnodes from the documentFragment.

However we already behave this way. HasMutationListeners will return false if there is no current document. Just wanted to make it a bit more explicit.

Alternativly, i could add code that deals with mutations while removing the childnodes of the fragment.

> >-nsGenericElement::isSelfOrAncestor(nsIContent *aNode,
> >-   * If aPossibleAncestor doesn't have children it can't be our ancestor
> 
> Do we want to add that optimization to ContentIsDescendantOf?  I dunno that
> it's worth it...

I doubt it is. In most cases the parent chain isn't more then a handful of levels, and in that case the virtual call is probably as much work as just walking up the parent chain. Additionally, it seems like a lot of the time we'd end up both walking the parent chain and checking the childcount.

> >+  if (aParent && nsContentUtils::ContentIsDescendantOf(aParent, aNewChild))
> >+    return PR_FALSE;
> >+  }
> >+
> >+
> 
> Extra line here?  Why?

Fixed

> >+    LL_ADD(gen, sMutationCount, 1);
> 
> I believe NSPR has officially dropped support for platforms without native
> 64-bit types.  In other words, you can just use PRU?Int64 as a native type
> without having to bother with the LL macros.

Yeah, brendan told me about this yesterday too. I'll remove the macros.
Attached patch Patch v1.1 (obsolete) — Splinter Review
I also added a bounds check at the top of doInsertChildAt. Just in case all else fails, or someone calls nsINode::InsertChildAt directly.
Attachment #211971 - Attachment is obsolete: true
Attachment #212126 - Flags: superreview?(jst)
Attachment #212126 - Flags: review?(bzbarsky)
Attachment #211971 - Flags: superreview?(jst)
Attachment #211971 - Flags: review?(bzbarsky)
> I wanted to make absolutly sure we don't fire mutation events on nodes not in
> the tree.

Sure, but why?  I see nothing about this in the spec, and we generally fire events on nodes not in the tree; times when we don't are by and large bugs.

> This is so that I wouldn't have to worry about mutations firing while
> I remove the childnodes from the documentFragment.

I don't think this one issue is worth blocking all mutation events for disconnected subtrees, frankly.

> Alternativly, i could add code that deals with mutations while removing the
> childnodes of the fragment.

I'd prefer that.
So maybe I'm anal for even thinking about this, but it seems like a more reliable way for doing this (independent of the number of mutations that can be done from JS in any reasonable ammount of time, i.e. 32/64 bit values wrapping or not) would be to not count the number of nested mutations that happen as the result of doing a mutation, but to set a counter to be how many we want to allow before double checking things (which at least for now is always 1), and decrementing that counter when doing nested mutations and stopping the decrementing when the counter hits 0. IOW, we wouldn't know how many mutations happend, only if more than N (which is always 1 atm) happened, and really, that's all we care about here.

Jonas and I talked about this, and he thinks we could do this with a simple helper class that would do the decrementing etc for us.

I's not entirely clear to either of us if it's worth doing, but conceptually it'd be a safer approach.
If it's not too much more complex, let's do it just to be safe.  But wrapping over a 64-bit counter is really really hard.  I guess we shouldn't make assumptions about how fast computers will be tomorrow, eh?  ;)
For the record, on my shiny dual pentium 3GHz it'd take 35 million years to cycle a 64bit. :)
Attached patch Patch v2 (obsolete) — Splinter Review
Two things are updated in this patch:

1. Use an alternative approach to the mutationcounter. This approach is safe
   against wrapping and is fairly clean. All uglyness is kept in a new
   nsMutationGuard class.
2. Don't add IsInDoc checks for mutations and instead deal with mutations when
   the nodes are removed from the fragment.
Attachment #212126 - Attachment is obsolete: true
Attachment #212276 - Flags: superreview?(jst)
Attachment #212276 - Flags: review?(bzbarsky)
Attachment #212126 - Flags: superreview?(jst)
Attachment #212126 - Flags: review?(bzbarsky)
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Flags: blocking1.8.0.3?
Comment on attachment 212276 [details] [diff] [review]
Patch v2

Looks good. sr=jst
Attachment #212276 - Flags: superreview?(jst) → superreview+
For the record, this doesn't compile on linux as is (from nsGenericElement.cpp):

for (PRUint32 j = ...) {
  ...
}

if (j != ...)
  ...

You'll need to declare j outside of the for loop, of course.
Comment on attachment 212276 [details] [diff] [review]
Patch v2

>Index: content/base/src/nsGenericElement.cpp

>+nsGenericElement::doReplaceOrInsertBefore(PRBool 
>+  // Figure out which index to insert

s/insert/insert at/

>+    if (guard.Mutated(1)) {

I guess we don't really want to do the security check here, eh?  Should be OK, I guess....

>+      if (!IsAllowedAsChild(newContent, nodeType, aParent, aDocument,
>+                            PR_FALSE, refContent)) {

PR_FALSE because we've already removed the thing we're replacing, so all that's left is an insert, right?  Document that, please.


>+      // We don't need to update i if someone mutates the DOM. The only thing
>+      // that'd happen is that the resulting child list might be unexpected,
>+      // but we should never crash since RemoveChildAt is OOB safe.

Maybe "might be shorter than expected"?  And expand out "out-of-bounds"?

>+        for (PRUint32 j = i + 1; j < count; ++j) {

Hmm.  So if every insert mutates, this will be O(N^2) in the number of kids in the fragment...  We may not care, but if we do and we're willing to have a slightly different behavior, we could just have a boolean "do check" flag and check only the node we're about to insert if the flag is set.  The flag would start false and go true and stay true if there is ever a mutation.

That would mean we could insert a bunch of the kids before bailing out on one, though.  Either way, I guess.

In any case, the loop body duplicates code up above, no?  Want to factor it out into a function?  No need to even inline it; codesize over perf in this case, imo.

>+    nsINode* oldParent = newContent->GetParent();
>+    if (!oldParent && newContent->IsInDoc()) {
>+      oldParent = newContent->GetCurrentDoc();

Why bother with the IsInDoc() check? GetCurrentDoc() will do that anyway, no?

>+    if (oldParent) {
>+      nsMutationGuard guard;
> 
>+      PRInt32 removeIndex = oldParent->IndexOf(newContent);
>+      NS_ASSERTION(removeIndex >= 0 && removeIndex != insPos,
>+                   "invalid removeIndex");

Why assert if removeIndex == insPos?  They're indices for different nodes in different child lists, no?

Also, this assert _will_ fire for XUL, generally speaking (since XUL's IsInDoc()/GetCurrentDoc() lies for cloned XUL nodes).  Just so you're warned.  ;)

In any case, you need to handle the case of removeIndex == -1 safely.  :(

>+      // Adjust insert index if the node we ripped out was a sibling
>+      // of the node we're inserting before

Wouldn't it make more sense to do this after the remove succeeds?

>+      if (guard.Mutated(1)) {

Again, I feel like I've seen the code in this if before.  Factor it out some?

>Index: content/base/src/nsGenericElement.h
>+ * You can then at any time call Mutated to check if any unexpected mutations
>+ * has occured.

"have occured".

>+class nsMutationGuard {

It'd really help if this were documented a little (like what sMutationCount and mDelta really mean.  Could you do that, please?

>+  ~nsMutationGuard()
>+  {
>+    sMutationCount =
>+      mDelta > sMutationCount ? 0 : sMutationCount - mDelta;

OK, why that check?  I'm clearly not following something here...

>+  PRBool Mutated(PRUint8 aIgnoreCount)
>+  {
>+    return sMutationCount < NS_STATIC_CAST(PRUint32, eMaxMutations - aIgnoreCount);

Wait.  We decrement sMutationCount on every mutation, right?  eMaxMutations - aIgnoreCount is guaranteed between 0 and 300.  If more than 300 mutations happen under a single guard, then sMutationCount will be very large and this will return false, no?

What am I missing?

>+  // This function should be called whenever a mutation that we want to keep
>+  // track of.

"happens", right?  ;)

>    * Actual implementation of the DOM InsertBefore method.  Shared by

and replaceChild too, right?

>    * @param aNewChild The child to insert

What happend to @param aReplace?

>    * Actual implementation of the DOM ReplaceChild method.  Shared by

Shouldn't that decl go away, like the impl did?

I'd like to understand nsMutationGuard before giving r+.
Comment on attachment 212276 [details] [diff] [review]
Patch v2

>+class nsMutationGuard {

Ah, I see.  The key is that sMutationCount is guaranteed to be in the range 0 to eMaxMutations.  That should be documented, and asserted in ~nsMutationGuard after setting the value, as well as in Mutated() before returning.

And mDelta should be documented as data we need to restore the correct mutation count for an outer guard.

We should also document that if you have two nested guards, calling Mutated() on an outer guard will give the wrong results; I wonder whether we can assert in that situation somehow...  Not a big deal either way, though.

r=bzbarsky with the comments addressed.
Attachment #212276 - Flags: review?(bzbarsky) → review+
> >+    if (guard.Mutated(1)) {
> 
> I guess we don't really want to do the security check here, eh?  Should be OK,
> I guess....

Not sure what you mean here. You were thinking if we need to rerun the CanCallerAccess check in case of mutation in case the node was moved to another document?

> >+      if (!IsAllowedAsChild(newContent, nodeType, aParent, aDocument,
> >+                            PR_FALSE, refContent)) {
> 
> PR_FALSE because we've already removed the thing we're replacing, so all that's
> left is an insert, right?  Document that, please.

Done. I did notice that I was passing PR_FALSE in the first call too though. I fixed that to pass aReplace

> >+      // We don't need to update i if someone mutates the DOM. The only thing
> >+      // that'd happen is that the resulting child list might be unexpected,
> >+      // but we should never crash since RemoveChildAt is OOB safe.
> 
> Maybe "might be shorter than expected"?  And expand out "out-of-bounds"?

unexpected in any way really. Children can be added, replaced and removed while we're clearing the fragment.

> >+        for (PRUint32 j = i + 1; j < count; ++j) {
> 
> Hmm.  So if every insert mutates, this will be O(N^2) in the number of kids in
> the fragment...  We may not care, but if we do and we're willing to have a
> slightly different behavior, we could just have a boolean "do check" flag and
> check only the node we're about to insert if the flag is set.  The flag would
> start false and go true and stay true if there is ever a mutation.

I did this. It actually removes a bit of code duplication so it's a good idea. I don't really care about the changed behaviour because there is no 'right' behaviour for this stuff.

> In any case, the loop body duplicates code up above, no?  Want to factor it 
> out into a function?  No need to even inline it; codesize over perf in this 
> case, imo.

I'd rather not factor it out. I think the code would be even harder to follow actually. It's a little bit better once 

> >+    if (oldParent) {
> >+      nsMutationGuard guard;
> > 
> >+      PRInt32 removeIndex = oldParent->IndexOf(newContent);
> >+      NS_ASSERTION(removeIndex >= 0 && removeIndex != insPos,
> >+                   "invalid removeIndex");
> 
> Why assert if removeIndex == insPos?  They're indices for different nodes in
> different child lists, no?

Right, there should be an additional |oldParent == container| check in there.

> Also, this assert _will_ fire for XUL, generally speaking (since XUL's
> IsInDoc()/GetCurrentDoc() lies for cloned XUL nodes).  Just so you're warned. 
> ;)

I fixed this by adding an extra IndexOf if newContent is a xul element.

> In any case, you need to handle the case of removeIndex == -1 safely.  :(

That shouldn't happen if I address the xul crappiness, right?

> >+      // Adjust insert index if the node we ripped out was a sibling
> >+      // of the node we're inserting before
> 
> Wouldn't it make more sense to do this after the remove succeeds?

Well.. sure. Not that it makes a difference, done anyway.


Don't really think that assertions for sMutationCount are needed. It's only ever adjusted down and then clearly capped. The only time it's adjusted up it's set to eMaxMutations. And all modifications live inside the nsMutationGuard class.

Let me know if you want them anyway.
Attached patch address review comments (obsolete) — Splinter Review
Attachment #212276 - Attachment is obsolete: true
> You were thinking if we need to rerun the CanCallerAccess check in case of
> mutation in case the node was moved to another document?

Yeah, basically....  I don't think we need to, but I'm not completely sure.

> It's a little bit better once 

Once what?

> That shouldn't happen if I address the xul crappiness, right?

Yeah, indeed.  File a bug on removing this XUL crap workaround and mark it dependent on the bug about XUL lying?

> Don't really think that assertions for sMutationCount are needed.

I think they'd make it much clearer why the code works, basically.  You could also just add documentation about the invariants that are maintained in the form of comments; either way as long as the invariants are documented.  ;)

Did you mean to actually attach the updated patch for this bug, btw?  ;)
> > You were thinking if we need to rerun the CanCallerAccess check in case of
> > mutation in case the node was moved to another document?
> 
> Yeah, basically....  I don't think we need to, but I'm not completely sure.

The only way we'd need the checks is if the nodes were moved to another document by chrome code. And if that happens I think we'd have bigger issues.

> > It's a little bit better once 
> 
> Once what?

Sorry, once I add the |mutated| flag that gets rid of the O(N^2) complexity.

> > That shouldn't happen if I address the xul crappiness, right?
> 
> Yeah, indeed.  File a bug on removing this XUL crap workaround and mark it
> dependent on the bug about XUL lying?

I added a comment referring to the existing bug.

> > Don't really think that assertions for sMutationCount are needed.
> 
> I think they'd make it much clearer why the code works, basically.  You could
> also just add documentation about the invariants that are maintained in the
> form of comments; either way as long as the invariants are documented.  ;)

Yeah, i added a couple of comments to that effect.

> Did you mean to actually attach the updated patch for this bug, btw?  ;)

Doh! Too similar filenames. I'm at home now so i'll attach the updated patch once i'm in the office tomorrow.
This is the right patch
Attachment #214397 - Attachment is obsolete: true
Comment on attachment 214450 [details] [diff] [review]
address review comments

Yeah, this looks great. Let's land it!
Attachment #214450 - Flags: review?(bzbarsky) → review+
Landed! We should be able to run a lot more mutation tests now.

This patch actually also fixed an xbl bug, i forgot to mention this before. We used to prevent moving direct anonymous children to elsewhere in insertBefore (bug 308120). However the same fix was needed to replaceChild. Since the two functions now use the same codepath it's fixed in both places.
...and marking fixed
Status: NEW → RESOLVED
Closed: 18 years ago
Flags: blocking1.9a1? → blocking-aviary1.0.9?
Resolution: --- → FIXED
jonas, boris: is this something you guys intend to port to gecko 1.8.1?  what about 1.8.0.3?
Yes, I just wanted to let it bake on trunk first. Note that there are blocking requiests for most branches.
OK, I asked because I did not see branch approval requests for the patch.
Comment on attachment 214450 [details] [diff] [review]
address review comments

We have too many branches :(
Attachment #214450 - Flags: approval1.8.0.3?
Attachment #214450 - Flags: approval1.7.14?
Attachment #214450 - Flags: approval-branch-1.8.1?
Attachment #214450 - Flags: approval-aviary1.0.9?
Flags: blocking1.8.1?
Flags: blocking1.8.1+
Flags: blocking1.8.0.3?
Flags: blocking1.8.0.3+
Flags: blocking1.7.14?
Comment on attachment 214450 [details] [diff] [review]
address review comments

Since this is a pretty big patch, we'd like to see this baking for Gecko 1.8.1 as well before taking it for 1.8.0.x.
Attachment #214450 - Flags: approval-branch-1.8.1? → approval-branch-1.8.1?(jst)
Blocks: 330925
Attachment #214450 - Flags: approval-branch-1.8.1?(jst) → approval-branch-1.8.1+
Sicking: need this landed for 1.8.1 ASAP if it's going to get 1.8.0.3 approval
This patch includes fixes for this bug, bug 330925 and bug 330084
Comment on attachment 214450 [details] [diff] [review]
address review comments

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #214450 - Flags: approval1.8.0.3? → approval1.8.0.3+
verified with Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Group: security
This caused bug 360021 on branches
Depends on: zdi-can-126
Crashtest checked in.
Flags: in-testsuite+
Flags: blocking-aviary1.0.9?
Comment on attachment 214450 [details] [diff] [review]
address review comments

Clearing stale requests.
Attachment #214450 - Flags: approval1.7.14?
Attachment #214450 - Flags: approval-aviary1.0.9?
Crash Signature: [@ nsAttrAndChildArray::InsertChildAt]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: