Closed Bug 895239 Opened 11 years ago Closed 11 years ago

OutputNonTextContentAsPlaceholder fails to output non-textual container element as placeholder

Categories

(Core :: DOM: Serializers, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla26
blocking-b2g leo+
Tracking Status
firefox24 --- wontfix
firefox25 --- wontfix
firefox26 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- wontfix
b2g-v1.1hd --- fixed

People

(Reporter: xyuan, Assigned: xyuan)

References

Details

Attachments

(1 file, 4 obsolete files)

This is a followup of Bug 890580.

By design, the plaintext serializer with OutputNonTextContentAsPlaceholder flag should convert iframe to placeholder character. But the current implementation converts iframe to empty string.
Severity: normal → major
blocking-b2g: --- → leo+
Priority: -- → P1
Attached patch iframe_as_placeholder (obsolete) — Splinter Review
Without this patch, `<iframe></iframe>` and `<iframe>XXX</iframe>` is serialized to empty string and "XXX" respectively. 

With this patch and OutputNonTextContentAsPlaceholder flag, <iframe> will be serialized to the placeholder character - U+FFFC (OBJECT REPLACEMENT CHARACTER).

Hi Olli, as Henry(:hsivonen) isn't available until 29 July, could help to review the patch?
Attachment #780153 - Flags: review?(bugs)
Fix a typo in the test_bug895239.html.
Attachment #780153 - Attachment is obsolete: true
Attachment #780153 - Flags: review?(bugs)
Attachment #780155 - Flags: review?(bugs)
Comment on attachment 780155 [details] [diff] [review]
Serialize iframe as placeholder charater with test case

># HG changeset patch
># Parent fb4bf993a58a153a1d63302ccd2aa3103975f850
># User Yuan Xulei <xyuan@mozilla.com>
>Serialize iframe as placeholder if OutputNonTextContentAsPlaceholder is set. r=smaug
>
>diff --git a/content/base/src/nsDocumentEncoder.cpp b/content/base/src/nsDocumentEncoder.cpp
>--- a/content/base/src/nsDocumentEncoder.cpp
>+++ b/content/base/src/nsDocumentEncoder.cpp
>@@ -477,21 +477,26 @@ nsDocumentEncoder::SerializeToStringRecu
> 
>   if (!aDontSerializeRoot) {
>     rv = SerializeNodeStart(maybeFixedNode, 0, -1, aStr, aNode);
>     NS_ENSURE_SUCCESS(rv, rv);
>   }
> 
>   nsINode* node = serializeClonedChildren ? maybeFixedNode : aNode;
> 
>-  for (nsINode* child = nsNodeUtils::GetFirstChildOfTemplateOrNode(node);
>-       child;
>-       child = child->GetNextSibling()) {
>-    rv = SerializeToStringRecursive(child, aStr, false);
>-    NS_ENSURE_SUCCESS(rv, rv);
>+  // If OutputNonTextContentAsPlaceholder is set, don't serialize child nodes
>+  // of <iframe>.
>+  if (!((mFlags & OutputNonTextContentAsPlaceholder) &&
>+      aNode->Tag() == nsGkAtoms::iframe)) {
This code really shouldn't have to deal with iframes.
(In general all the special handling for some particular element in nsDocumentEncoder class is a bug, IMO, and looks like
it is currently limited to selection handling only, and <br>)
You need to somehow deal this stuff in serializer.
And is iframe really the only case broken?
Attachment #780155 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #3)
> This code really shouldn't have to deal with iframes.
> (In general all the special handling for some particular element in
> nsDocumentEncoder class is a bug, IMO, and looks like
> it is currently limited to selection handling only, and <br>)
> You need to somehow deal this stuff in serializer.

I tried to keep the handling logic inside serializer, but it seems impossible to put all in serializer. 

Traditionally, to be compatible with browsers which don't support <iframe>, bea short text is added between <iframe> and </iframe>. This short text is interpreted as child nodes of <iframe>.
These child nodes should be skipped during the serializing process. 

Since nsDocumentEncoder is responsible for node traversal and determines which node to serialize and which node not, we are able to skip the child nodes of <iframe> in this class.

While nsPlainTextSerializer is only responsible for converting node to text. It doesn't know the node context. When receiving child nodes of <iframe>, nsPlainTextSerializer cannot tell whether they belong to <iframe> and then cannot skip them. 

> And is iframe really the only case broken?

No, unfortunately I found another broken case of <canvas>. I'll figure out a list of all broken tags.
Well, it really needs to go to serializer. You can skip stuff there. It already keeps track of
various states. Add something to AppendElementStart and AppendElementEnd or so.
Summary: OutputNonTextContentAsPlaceholder fails to output iframe as placeholder → OutputNonTextContentAsPlaceholder fails to output non-textual container element as placeholder
This patch handles all text serialization work in nsPlainTextSerializer. A member variable of mIgnoredChildNodeLevel is added to ignore child nodes of <iframe>.

By checking HTML element list(https://developer.mozilla.org/en-US/docs/Web/HTML/Element), I got other similar broken cases, including <audio>, <canvas>, <meter>, <progress>, <object>, <svg> and <video>. 

This patch also fixed these cases.
Attachment #780155 - Attachment is obsolete: true
Attachment #783250 - Flags: review?(bugs)
Comment on attachment 783250 [details] [diff] [review]
Serialize non-textual container element as placeholder charater.

Based on bugzilla, Henri should be back now.

Henri, please re-assign the review to me if you don't want or don't have time to
review this.

Yuan, please re-assign the review to me if you don't hear back from hsivonen in the next few days.


(Looks like the patch is rather huge mainly because of whitespace fixes.)
Attachment #783250 - Flags: review?(bugs) → review?(hsivonen)
Attached patch no_whitespace_fixes (obsolete) — Splinter Review
Olli, thank you.

Should I remove the whitespace fixes?

To help with review, I create this patch without whitespace fixeds.
I guess you could just create a -w patch too.
hg diff -w or some such. That shouldn't contain the whitespace changes.
Comment on attachment 783250 [details] [diff] [review]
Serialize non-textual container element as placeholder charater.

Review of attachment 783250 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Olli, Henri seems too busy to review the patch and could you give a help?
Attachment #783250 - Flags: review?(hsivonen) → review?(bugs)
Ok, reviewing.
Comment on attachment 783250 [details] [diff] [review]
Serialize non-textual container element as placeholder charater.

Thanks for the -w patch!

Perhaps rename IsChildNodeIgnorable to ShouldReplaceContainerWithPlaceholder()

>-NS_IMETHODIMP 
>+bool
>+nsPlainTextSerializer::IsChildNodeIgnorable(nsIAtom* aTag)
>+{
>+  // If nsIDocumentEncoder::OutputNonTextContentAsPlaceholder is set,
>+  // non-textual container element should be serialized as placeholder
>+  // character and its child nodes should be ignored. See bug 895239.
>+  if (!(mFlags & nsIDocumentEncoder::OutputNonTextContentAsPlaceholder)) {
>+    return false;
>+  }
>+
>+  return
>+    (aTag == nsGkAtoms::audio) ||
>+    (aTag == nsGkAtoms::canvas) ||
>+    (aTag == nsGkAtoms::iframe) ||
>+    (aTag == nsGkAtoms::meter) ||
>+    (aTag == nsGkAtoms::progress) ||
>+    (aTag == nsGkAtoms::object) ||
>+    (aTag == nsGkAtoms::svg) ||
>+    (aTag == nsGkAtoms::video);
>+}
Hmm, it is not super clear how we should handle this case when some fallback is applied.
Like <object> may have some sane fallback content.
But I guess this is good enough for now.

>@@ -244,16 +244,17 @@ MOCHITEST_FILES_A = \
> 		test_bug282547.html \
> 		bug282547.sjs \
> 		test_domparser_null_char.html \
> 		test_bug811701.html \
> 		test_bug811701.xhtml \
> 		test_bug820909.html \
> 		test_bug704063.html \
> 		test_bug894874.html \
>+                test_bug895239.html \
indentation (looks like tabs are used for other cases here)
Attachment #783250 - Flags: review?(bugs) → review+
I asked around, and I was told the \u syntax is not supported everywhere.
So, \x syntax need to be used. Fix the case in this and the case added in bug 890580
Olli, thanks for you help. The path addresses all your comments.

1. Renames IsChildNodeIgnorable to ShouldReplaceContainerWithPlaceholder.

2. Fixes indentation in content/base/test/Makefile.in.

3. Uses `\x` instead of `\u` to escape special unicode character in C++ code. Fixs the cases both in this and bug 890580.
Attachment #783250 - Attachment is obsolete: true
Attachment #783503 - Attachment is obsolete: true
Attachment #785952 - Flags: review?(bugs)
Comment on attachment 785952 [details] [diff] [review]
Serialize non-textual container element as placeholder charater.

Please push to try before to m-i.
Attachment #785952 - Flags: review?(bugs) → review+
Try server result:
https://tbpl.mozilla.org/?tree=Try&rev=6dbafd0b8ee3

The mochitest on window and linux has been passed and it's ready to check in.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/5d6e9e87f6f9
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Needs a branch-specific patch for uplift.
Hi Ryan, the patch of bug 890580 is required to uplift before this. After bug 890580 lands on b2g18, I guess this patch could be appied to b2g18 branch without confict.
Flags: needinfo?(xyuan)
Ryan, thanks for your help!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: