Closed Bug 902618 Opened 6 years ago Closed 5 years ago

Parser notifies about subtree roots, but not their descendants

Categories

(Core :: HTML: Parser, defect)

26 Branch
x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: milakam, Assigned: smaug)

References

Details

Attachments

(7 files, 6 obsolete files)

Attached file test.html
User Agent: Opera/9.80 (Windows NT 6.1; WOW64) Presto/2.12.388 Version/12.16

Steps to reproduce:

Open attached testcase


Actual results:

Iframe with the nested site http://example.com is loaded, because no event is fired.


Expected results:

Inserting the nested Iframe with document.write should fire a event which triggers the javascript code within the MutationObserver (that then changes the Iframe source and prevents loading of http://example.com).

This works in Chrome (but not in IE11), so if anyone finds some infos in the spec, please confirm the bug. I only found this discussion:
W3 discussion: http://lists.w3.org/Archives/Public/public-webapps/2012AprJun/1217.html
Component: Untriaged → DOM
Product: Firefox → Core
Summary: MutationObserver event not fired for nested Iframe → MutationObserver event not fired for document.write
Attachment #787119 - Attachment mime type: text/plain → text/html
> Iframe with the nested site http://example.com is loaded, because no event is fired.

Which browser version is that in, exactly?  I see the problem in Firefox 22 but not Firefox 23.
Ah, nevermind.  That's just because the site being loaded is non-https.  If I save the testcase locally I see example.com load.

There is in fact a mutation observer notification fired here for the <div> containing the <iframe>.  It's not clear to me whether there is supposed to be one for the <iframe> as well...  Olli?
Flags: needinfo?(bugs)
(I'll get to this later this week)
Any news? I hope this is a bug and that it can be fixed, because right now I've to use a second for loop as workaround:

for (var i = 0; i < mutation.addedNodes.length; i++) {
	if (mutation.addedNodes[i].nodeName == 'IFRAME') {
		var mystuff;
	}
	
	if (mutation.addedNodes[i].hasChildNodes() && mutation.type == 'childList') {
		var ifr = mutation.addedNodes[i].getElementsByTagName('IFRAME');
		for (var j=0; j<ifr.length; j++) {
			var myotherstuff;
		}
	}
}

The problem with that is that in most cases both loops are called and FF starts with loop "1" and then "0". So even if there is no standard reference and Chrome would be wrong firing an event for document.write, there is defenitely sth. going on here.
Sorry, been busy with other things...
Flags: needinfo?(bugs)
Flags: needinfo?(bugs)
On latest Aurora (20131014004003), the content is not loaded, the message in the browser's console is: 
Blocked loading mixed active content "http://example.com/", same as on Chrome.

Reporter, do you still see the issue? Thank you.
This is _still_ in my todo list.
petruta.rasa@softvision.ro, you need to download the testcase or otherwise run it from a non-https url.
(In reply to Olli Pettay [:smaug] from comment #7)
> This is _still_ in my todo list.
Any updates? Anyhow we can support (files to start with a patch, code snippets to verify, specs...)?
Seems to be getting worse. 2nd added test case is not using any document.write or frames. However, FF doesn't seem to be firing events for nested tags in general (this is against the standard).

Please let me know if I should open a separate bug for it, but I guess it's the same root cause as the with the first testcase.
2nd case is different, and nothing requires Chrome's behavior. It is about HTML parsing.


And yes, this is still in my todo list, but that list just happens to be rather long.
(Anyone willing to take this bug, feel free to)
Oh, wait, why is this a bug at all. We get the notification about adding div to the document.
And that div already contains iframe.
Flags: needinfo?(bugs)
So if there is a bug, it isn't in MutationObserver but in the HTML parsing.
Should we add div to the document before iframe to the div?
Flags: needinfo?(hsivonen)
Component: DOM → HTML: Parser
Summary: MutationObserver event not fired for document.write → Parser notifies about subtree roots, but not their descendants
OK, I think it's clear that it's a bug, because it breaks the spec:
http://www.w3.org/TR/dom/#mutation-observers

"subtree
Set to true if mutations to not just target, but also target's descendants are to be observed."

whereas "descendants" is linked and points to this:

"An object A is called a descendant of an object B, if either A is a child of B or A is a child of an object C that is a descendant of B."

Otherwise the whole usage of subtree would be useless, because there would be no difference to childlist.

>Should we add div to the document before iframe to the div?
Again, from my point of view Chrome implemented this correctly. And it fires events "top to bottom". So if you're talking about the 2nd testcase it fires events for the following:
1. container div
2. nested link

Hopes this lights things a little bit up. Happy new year everybody ;)
> because it breaks the spec:

The real question here is what the DOM mutations look like. That is, given this markup:

  <body><div><iframe></div></body>

does the parser insert the div into the body and then the iframe into the div, or does it insert the iframe into the div and then the div into the body?  And if the parsing spec allows either one, should the page be able to tell the two apart via mutation observers on the body?
(In reply to Olli Pettay [:smaug] from comment #14)
> So if there is a bug, it isn't in MutationObserver but in the HTML parsing.

document.written nodes are parser-inserted nodes. I thought MutationObservers were supposed to exclude parser-performed insertions since the beginning of the MutationObserver idea was introduced. Did that change along the way?

> Should we add div to the document before iframe to the div?

Yes.

(In reply to Vacation Dec 19 to Jan 1 from comment #16)
> > because it breaks the spec:
> 
> The real question here is what the DOM mutations look like. That is, given
> this markup:
> 
>   <body><div><iframe></div></body>
> 
> does the parser insert the div into the body and then the iframe into the
> div,

Yes.

> or does it insert the iframe into the div and then the div into the
> body? 

No.

> And if the parsing spec allows either one, should the page be able to
> tell the two apart via mutation observers on the body?

The parsing spec only allows the first case, AFAIK.
Flags: needinfo?(hsivonen)
(In reply to Henri Sivonen (:hsivonen) from comment #17) 
> document.written nodes are parser-inserted nodes. I thought
> MutationObservers were supposed to exclude parser-performed insertions since
> the beginning of the MutationObserver idea was introduced. Did that change
> along the way?
Mutation Events aren't dispatched, but MutationObserver should work just fine.


> > Should we add div to the document before iframe to the div?
> 
> Yes.
...so this bug is that we don't notify about iframe being added to the DOM tree.
(In reply to Olli Pettay [:smaug] from comment #18)
> ...so this bug is that we don't notify about iframe being added to the DOM
> tree.

Back when I wrote the notification code (before MutationObserver existed), I tried very carefully to avoid over-notification and to do the minimal amount of notification that our layout code required. Does the MutationObserver code expect different notification semantics than layout code? Why?
I don't know about "layout code", but there should be no over-notification. The difference between MutationObserver and Mutation Events is described here:
https://hacks.mozilla.org/2012/05/dom-mutationobserver-reacting-to-dom-changes-without-killing-browser-performance/

"The key advantage to this new specification over the deprecated DOM Mutation Events spec is one of efficiency. If you are observing a node for changes, your callback will not be fired until the DOM has finished changing. When the callback is triggered, it is supplied a list of the changes to the DOM, which you can then loop through and choose to react to."

"If you play with the live example, you’ll notice some quirks in behaviour, in particular that the callback is triggered when you press enter in each li, in particular when the user action results in a node being added or removed from the DOM. This is an important distinction to be made from other techniques such as binding events to key presses or more common events like ‘click’. MutationObservers work differently from these techniques because they are triggered by changes in the DOM itself, not by events generated either via JS or user interaction."
Attached patch wip 1 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=93f422835b9e

notify always, and keep the parent alive when adding a child.
WIP patch, but at least locally feels ok.
But let's see what talos tells us.
Attachment #8376459 - Flags: feedback?(hsivonen)
Looks like some failures from the test for bug Bug 650493.
I guess I want to use nodeutils explicitly and not notify.
talos doesn't show any tp regressions, at least on osx and linux. win is still running.
I'm very worried about performance here. We should make sure that adding a mutationobserver on the documentElement doesn't slow down pageload meaningfully, even for large documents. We should expect things like addons to add mutationobservers, and of course pages might add observers themselves.
So far I haven't found any case where the approach would slow down page load if there are no DOM MutationObservers.
Loading mxr'ed nsCSSFrameConstructor.cpp with a MutationObserver observing all the changes
slows down page load ~10%, which is very much acceptable. Also, based on profiles, I could quite
easily get that down to 5%, I think (better nsTArray usage, less Addref/Release etc).

(HTML spec wasn't a good testcase, since all the time is taken in layout.)
Comment on attachment 8376459 [details] [diff] [review]
wip 1

(clearing feedback? since I have a new patch coming.)
Attachment #8376459 - Flags: feedback?(hsivonen)
Attached patch patch v1 (obsolete) — Splinter Review
This passes on try.
Looks like we had a bit buggy <output> handling (and the spec is super odd when handling <output>'s content).

I need to write still some tests and go through DoneAddingChildren usage, since
it seems to be quite error prone.
Attachment #8376459 - Attachment is obsolete: true
Attachment #8378262 - Flags: feedback?(hsivonen)
Attached patch v2Splinter Review
Safer DoneAddingChildren. It is a bit ugly, but I don't see anything in parser
to explicitly force the right ordering.

Some tests coming https://tbpl.mozilla.org/?tree=Try&rev=d303888ef78f
Assignee: nobody → bugs
Attachment #8378262 - Attachment is obsolete: true
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8378262 - Flags: feedback?(hsivonen)
Attachment #8378425 - Flags: review?(hsivonen)
Attached patch some testsSplinter Review
https://tbpl.mozilla.org/?tree=Try&rev=01bfaa28be04

Rather basic tests, but at least Blink gives the same behavior.
Note, characterData changes can depend on implementation and on network packets and what not.
Attachment #8379225 - Flags: review?(hsivonen)
Comment on attachment 8378425 [details] [diff] [review]
v2

>-  if (aName == nsHtml5Atoms::title) {
>-    nsHtml5TreeOperation* treeOp = mOpQueue.AppendElement();
>-    NS_ASSERTION(treeOp, "Tree op allocation failed.");
>-    treeOp->Init(eTreeOpDoneAddingChildren, aElement);
>-    return;
>-  }
>   if (aName == nsHtml5Atoms::style || (aNamespace == kNameSpaceID_XHTML && aName == nsHtml5Atoms::link)) {
>     nsHtml5TreeOperation* treeOp = mOpQueue.AppendElement();
>     NS_ASSERTION(treeOp, "Tree op allocation failed.");
>     treeOp->Init(eTreeOpUpdateStyleSheet, aElement);
>     return;
>   }
>   if (aNamespace == kNameSpaceID_SVG) {

Removing the aName == nsHtml5Atoms::title case above the aNamespace == kNameSpaceID_SVG check and moving it below seems to break SVG title node handling. Is there a reason why it's not broken that I'm missing?

>diff --git a/parser/html/nsHtml5TreeOperation.cpp b/parser/html/nsHtml5TreeOperation.cpp
>--- a/parser/html/nsHtml5TreeOperation.cpp
>+++ b/parser/html/nsHtml5TreeOperation.cpp
>@@ -66,17 +66,17 @@ class MOZ_STACK_CLASS nsHtml5OtherDocUpd
> 
>     ~nsHtml5OtherDocUpdate()
>     {
>       if (MOZ_UNLIKELY(mDocument)) {
>         mDocument->EndUpdate(UPDATE_CONTENT_MODEL);
>       }
>     }
>   private:
>-    nsIDocument* mDocument;
>+    nsCOMPtr<nsIDocument> mDocument;

Why add refcounting here?

>-        rv = text->SetText(buffer, length, false);
>+        rv = text->SetText(buffer, length, true);

Is this actually safe when it comes to legacy mutation events?

r+ from me if the above things indeed can be explained to be OK. However, I can't assess the full impact of this change and the number of update batches this adds to parsing scares me, so I think the general approach needs a review from bz.
Attachment #8378425 - Flags: review?(hsivonen)
Attachment #8378425 - Flags: review?(bzbarsky)
Attachment #8378425 - Flags: review+
Attachment #8379225 - Flags: review?(hsivonen) → review+
> Removing the aName == nsHtml5Atoms::title case above the aNamespace ==
> kNameSpaceID_SVG check and moving it below seems to break SVG title node
> handling. Is there a reason why it's not broken that I'm missing?
Bug. Will fix.

> 
> Why add refcounting here?
I want to have sane refcounting. notifying anything may run scripts, so better to make sure things stay alive

> 
> >-        rv = text->SetText(buffer, length, false);
> >+        rv = text->SetText(buffer, length, true);
> 
> Is this actually safe when it comes to legacy mutation events?
It is, but it is also not needed since 'text' isn't anyway in the document yet. I'll remove this change.
Comment on attachment 8378425 [details] [diff] [review]
v2

~mozAutoDocUpdate can run script and do other nasty things.  Are all the new consumers OK with that?  Or do the notifications need to happen at a safe point of some sort?  nsHtml5TreeOperation::AppendToDocument seems to at least assert there is a preexisting scriptblocker on the stack... should the other functions be doing that too?  What guarantees that there is that other script blocker?

> nsHtml5TreeOperation::AppendTextToTextNode(const char16_t* aBuffer,

Please document why we're notifying manually instead of just passing true for aNotify?

Similar in nsHtml5TreeOperation::Append and so forth.

r=me if that is addressed, I guess.  The performance here really worries me, though....
Attachment #8378425 - Flags: review?(bzbarsky) → review+
So far I haven't found a testcase where this slows down page loads.
(MutationObserver case is separate thing).
We definitely need to get better data on how this affects performance when MutationObservers are attached before we can land this.
Well, the patch makes us to work per spec.
..but I'm not landing this right now. Still doing more profiling etc.
(In reply to Olli Pettay [:smaug] from comment #36)
> Well, the patch makes us to work per spec.

Yes, but should we still run the old code when there are no mutation observers? And is it a good idea that the spec makes parser-performed mutations observable?

BTW, did you test that this works per spec for the innerHTML setter? That is, does the innerHTML setter look like a DocumentFragment gets inserted instead of looking like the parser parser straight into the document, which is what we actually do?

(In reply to Boris Zbarsky [:bz] (reviews will be slow; ask someone else) from comment #33)
> ~mozAutoDocUpdate can run script and do other nasty things.

The whole tree op batch is wrapped in an update batch. The script only run when the outer doc update ends, right?
> The script only run when the outer doc update ends, right?

Yes.
(In reply to Henri Sivonen (:hsivonen) from comment #38)
> Yes, but should we still run the old code when there are no mutation
> observers?
I don't see why, unless someone finds a testcase where the patch slows down page loads.
We should try to avoid having multiple code paths and behaviors for the same thing.


> And is it a good idea that the spec makes parser-performed
> mutations observable?
Yes it is. We want to be able to observe all the mutations to a dom tree.
Consistency in APIs is good.


> BTW, did you test that this works per spec for the innerHTML setter? That
> is, does the innerHTML setter look like a DocumentFragment gets inserted
> instead of looking like the parser parser straight into the document, which
> is what we actually do?
innerHTML is very different beast, and MutationObserver needs to handle it rather specially, since
it looks like a batch. We do have tests for it.


> 
> (In reply to Boris Zbarsky [:bz] (reviews will be slow; ask someone else)
> from comment #33)
> > ~mozAutoDocUpdate can run script and do other nasty things.
> 
> The whole tree op batch is wrapped in an update batch. The script only run
> when the outer doc update ends, right?
Which reminds me... I could in theory not have AutoDocUpdates, but having extra ones
doesn't really cause harm and makes the code less error prone.
(And even the extra addref/release should matter perf wise, since we do addref/release anyway for the
document, which means the document ends up to purple buffer anyway.)


I'm still trying to hunt some b2g profiles. I was told html parsing was showing up in the profiles
(but then someone else told it is not).
Attached patch wip merge (obsolete) — Splinter Review
(There has been quite some changes to parser)
https://tbpl.mozilla.org/?tree=Try&rev=44ffea346adf

(This may cause some assertions to fire)
Attachment #8388963 - Attachment is obsolete: true
One day I'll manage to push to try using non-broken tree :/

https://tbpl.mozilla.org/?tree=Try&rev=67a15827b5da
@Olli
This is amazing, thank you and everyone else for the hard work. I'm eager to test it live, when it's landed.
I need to re-do some testing and then either get the patch landed or change the spec so that
Gecko's current behavior is ok.
Attached patch up-to-dateSplinter Review
Still can't see regressions in page load, and some, expected perf regression when there are mutation observers. This is with a pgo build on linux.
Duplicate of this bug: 998503
(In reply to Olli Pettay [:smaug] from comment #50)
> Created attachment 8403236 [details] [diff] [review]
> up-to-date
> 
> Still can't see regressions in page load, and some, expected perf regression
> when there are mutation observers. This is with a pgo build on linux.
Please let us know how to help. Are there any testcases that we can create, any test builds?
https://hg.mozilla.org/mozilla-central/rev/48b9c15368aa
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
(bz wants me to post some perf data here)
Flags: needinfo?(bugs)
So observing 'document, {childList: true, characterData: true, subtree: true}'
while loading http://mxr.mozilla.org/mozilla-central/source/layout/base/nsCSSFrameConstructor.cpp
slowed down from ~1260ms to ~1360ms (pretty high variance). ~8%

Same with http://dom.spec.whatwg.org/ went from ~865ms to ~915ms. ~6%

A very artificial test where there is style="display:none;" in 
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsCSSFrameConstructor.cpp body element
shows I guess the worst case regression 340ms -> 410ms. So 20%, which is not nice but it is totally
unrealistic case. It also shows that page load performance tend to be heavily layout bound - nothing new there.
(I'm not aware of any significant optimizations done to reflow for ages, and that takes ~50% of the page load time
even in the case there are mutation observers)

Without mutation observer the change didn't cause any regressions (with or without display: none)


So, we follow the spec now better, and there hasn't been any exact proposals for other behaviors.
If there are good, realistic reasons for other behavior, and someone even specs it and other browser vendors are happy with it,
sure we can change the behavior. But the current one just happens to be quite natural given how html parsing is spec'ed.
Flags: needinfo?(bugs)
Also, if someone thinks this causes real world perf issues, I'd like to see profiles.
(Gecko profiler works just fine on b2g). There might be cases we can optimize in MutationObserver handling.
(In reply to Olli Pettay [:smaug] from comment #57)
> So observing 'document, {childList: true, characterData: true, subtree:
> true}'
> while loading
> http://mxr.mozilla.org/mozilla-central/source/layout/base/
> nsCSSFrameConstructor.cpp
> slowed down from ~1260ms to ~1360ms (pretty high variance). ~8%
> 
> Same with http://dom.spec.whatwg.org/ went from ~865ms to ~915ms. ~6%

Is this when loading these files locally from cache? Or were they loaded from network? It'd be interesting to see numbers when loading files from memory cache, but I have no idea how to cause that to happen.
local files
Hi all,

many thanks for this great & huge fix. It works great in my productive version. 

However I realized a small issue with createElement. The event is fired correctly for one element, but not for nested ones (no addedNode event fired for the iframe):

d=document.createElement('div');
i=document.createElement('iframe');
d.appendChild(i);
document.body.appendChild(d);

Please let me know if this is intended or if I should open a new bug.
That's intended, yes.  You get a mutation observer notification for each mutation at the time the mutation happens.  If you attach a mutation observer to "d" before appending "i", it will be notified, for example.
Hm, I never get a notification. Even as I use this code before any other appendigs:

MutationObserver = window.MutationObserver;
var observer = new MutationObserver(function(mutations) {
  mutations.forEach(function(mutation) {
	var addedNodes = mutation.addedNodes;
	for (var i = 0; i < addedNodes.length; i++) {
		if (addedNodes[i].nodeName === 'IFRAME') { alert('yes, notified'); }
	}
  });
});
observer.observe(document, {childList: true, subtree: true, attributes: true});
I said:

  If you attach a mutation observer to "d".

You're not doing that.  You're doing it on the document.  The mutation when "i" is appended to "d" happens at a point when "d" does not have the document on its parent chain, so mutation observers on the document aren't notified about it.

This is all pretty clear in the spec.  Start with https://dom.spec.whatwg.org/#concept-node-insert where the mutation observer bits are in step 6.  This links to https://dom.spec.whatwg.org/#queue-a-mutation-record which queues things on the ancestors of "target" (in this case that's "i").
> "target" (in this case that's "i")

Er, I meant "d".
Thank you very much Boris, I'll adjust my scripts.
Wait, is there a browser in which the script in comment 63 gets a notification for "i" from the code in comment 61?
No, Konqueror's KHTML hasn't implemented MutationObserver, it's Webkit view is not executing anything after MO function and Chrome & IE are not notifying.
You need to log in before you can comment on or make changes to this bug.