The default bug view has changed. See this FAQ.

improvements to the XML and XHTML serializers

RESOLVED FIXED

Status

()

Core
Serializers
--
enhancement
RESOLVED FIXED
9 years ago
6 years ago

People

(Reporter: Laurent Jouanneau, Assigned: Laurent Jouanneau)

Tracking

(Depends on: 1 bug, Blocks: 6 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 14 obsolete attachments)

9.10 KB, text/plain
Details
15.44 KB, text/plain
Details
2.59 KB, text/plain
Details
341.72 KB, patch
Details | Diff | Splinter Review
(Assignee)

Description

9 years ago
In the current version of Gecko, the XML serializer is used to serialize XHTML content. However, it doesn't serialize like the HTML serializer does for HTML documents (pretty-printing, specific serialization for some tags etc.) so it isn't really usefull for Composer.

I'm going to fix this bug. I think I will create a new nsXHTMLContentSerializer class. It will support features of nsHTMLContentSerializer, and of course will take care of all XML stuff like namespaces etc.
(Assignee)

Updated

9 years ago
Depends on: 424359
(Assignee)

Comment 1

9 years ago
After working few hours on the serializers, I decided to make improvements in the XML serializer at the same time, to do pretty printing, wrapping etc. In fact, I need it for my wysiwyg XML editor (ETNA).
Status: NEW → ASSIGNED
Summary: new XHTML serializer → improvements of the XML and XHTML serializers
(Assignee)

Comment 2

9 years ago
Created attachment 330248 [details] [diff] [review]
Work in progress #1

First patch. It is not finished. Worked only on the XML serializer so I didn't updated mochitests for the HTML serializer. Indentation doesn't work very well and then some xpcshell tests fail for the XML serializer. You need to apply the patch for bug 424359 before applying this patch.
(Assignee)

Comment 3

9 years ago
Created attachment 334297 [details] [diff] [review]
Work in progress #2

The XML serializer is almost finished. Indentation is better than in the old HTML serializer (text content is well indented). Some tests should be still added to check attributes and namespaces serialization. I worked on the HTML serializer but it's not finished.
Attachment #330248 - Attachment is obsolete: true
(Assignee)

Comment 4

9 years ago
Created attachment 335045 [details] [diff] [review]
Work in progress #3

HTML and XML serializers are fines now. All tests pass :-)

I'm working now on a new nsXHMLContentSerializer class, which will inherit from nsXMLContentSerializer and nsHTMLContentSerializer will inherit from nsXHTMLContentSerializer.
Attachment #334297 - Attachment is obsolete: true
(Assignee)

Updated

9 years ago
Component: XML → Serializers
QA Contact: xml → dom-to-text
(Assignee)

Updated

9 years ago
(Assignee)

Comment 5

9 years ago
Created attachment 338851 [details] [diff] [review]
the patch, version 1.0

I think my patch is ready for a first review.

Remember that you need to apply the patch for bug 424359 before applying this patch.

So, what's this patch doing ?

- it adds a new content serializer for XHTML : there is a new class nsXHTMLContentSerializer which inherits from nsXMLContentSerializer. This serializer is called only for pages which have a application/xhtml+xml content type. This serializer respects the XML serialization specification, but also specificities of HTML serialization (pre elements, script elements etc..).
- most of features of nsHTMLContentSerializer are moved into the XML serializer, or into the XHTML serializer.
- nsHTMLContentSerializer now inherits from nsXHTMLContentSerializer.
- the XML serializer supports now this flags : OutputFormatted, OutputWrap (+ OutputLFLineBreak and OutputCRLineBreak added by the patch of bug 424359)
- indentation and wrapping are better (there were some problems in the HTML serializer).
- unit tests are added (and I will add more tests later)
- documentation of flags on nsIDocumentEncoder is updated
Attachment #335045 - Attachment is obsolete: true
Attachment #338851 - Flags: review?(Olli.Pettay)
Summary: improvements of the XML and XHTML serializers → improvements to the XML and XHTML serializers

Comment 6

9 years ago
Created attachment 342079 [details]
comments

See comments in the attachment

This is so huge patch that few iterations are needed.
Sorry that reviewing has taken so much time.

Updated

9 years ago
Attachment #338851 - Flags: review?(Olli.Pettay) → review-
(Assignee)

Comment 7

9 years ago
> This is so huge patch that few iterations are needed.
> Sorry that reviewing has taken so much time.

Don't worry, no problem, I understand :-)

Here are responses to some of your questions in your comments. 

> But why does OutputRaw wrap? How is XHTML handled.

There isn't any wrap when you set the OutputRaw flag. But if you don't set any flags (no OutputFormatted, no OutputRaw, no OutputWrap), it acts like OutputRaw but it wraps long lines. And I don't know why, so I just kept the current behavior. Perhaps it is related to the editor, or mail...

> do we really want HTML entities in XHTML?

Yes, the specification of XHTML says : "The XHTML entity sets are the same as for HTML 4". http://www.w3.org/TR/xhtml1/dtds.html#h-A2

>Actually, why are these no-op? What if CDATA or PI is added to an HTML document using scripts? > How does this work currently?

Sorry, I thought that SGML didn't support CDATA or PI, and this is not the case, according to the specification, even if the HTML specification says that the user agent should ignore them http://www.w3.org/TR/html4/appendix/notes.html#sgmlfeatures.
The current HTML serializer supports CDATA and PI. I will remove this empty methods.

>And when is mozdirty used? We really shouldn't have anything like that.
> It might be better to use some property in a content object. (Set/GetProperty)

We already talked about it on irc :-). I created a new bug 459212

>+nsXHTMLContentSerializer::AppendEndOfElementStart(nsIDOMElement *aOriginalElement,
>+...
>But this method is in XHTML serializer, not HTML.

Yes, of course, because the end of an HTML element is always ">", so we don't need this method in the HTML serializer. The behavior of this methods is for old HTML browser which doesn't support XHTML.

>Make sure this all works with nested bodies and when there are several sibling bodies. (Maybe you're testing that, haven't yet read the tests)

Yes there are tests, which are provided by the patch of bug 424359.

>+    // for backward compatibility with old HTML user agents,
>+    // empty elements should have an ending tag, so we mustn't call
>+    // nsXMLContentSerializer::CheckElementEnd
>But this is XHTML serializer. Why do we need to care about HTML UAs?

Why not ? It is needed by the editor, to generate pages that can be viewed in old HTML UAs. If a web developer creates a page with Mozilla Composer, BlueGriffon or else, I think he would like to see his page in his old browser (well, there are some chance that the rendering is not the expected one, but if we can minimize unexpected behaviors, at least by doing a good serialization for this old browsers...).

>+void
>+nsXHTMLContentSerializer::IncrIndentation(nsIAtom* aName)
> ...
>Why most /***/? Should you check namespace?

In fact, I hesitate: we can increment indentation on only few elements, but is it really necessary ? I don't know.

I will attach a new patch with all corrections you ask in your other comments. It is almost ready, I just have to add new tests.
(Assignee)

Comment 8

9 years ago
Created attachment 342862 [details] [diff] [review]
version 1.1

It fixes bad things pointed by Olli: bad spelling in a comment, style nits, indentation, assignement of the forceFormat attribute, namespace checking in various methods, new test with a <pre> from an other namespace.

Thank you Olli for your review
Attachment #338851 - Attachment is obsolete: true
Attachment #342862 - Flags: review?(Olli.Pettay)
(Assignee)

Comment 9

9 years ago
Comment on attachment 342862 [details] [diff] [review]
version 1.1

I will update the patch tomorrow, since some changes have been made on the patch of bug 424359.
Attachment #342862 - Flags: review?(Olli.Pettay)
(Assignee)

Comment 10

9 years ago
Created attachment 343031 [details] [diff] [review]
patch 1.2 : updated after changes on the patch of bug 424359
Attachment #342862 - Attachment is obsolete: true
Attachment #343031 - Flags: review?(Olli.Pettay)
Comment on attachment 343031 [details] [diff] [review]
patch 1.2 : updated after changes on the patch of bug 424359

Please use -P when generating patches.

Whenever casting is needed, use C++ style (static_cast<> etc.), not C's '()'.
>-    if (mIsCopying && mIsFirstChildOfOL && (aTagName == nsGkAtoms::li) && 
>-        (attrName == nsGkAtoms::value)){
>+    if (mIsCopying && mIsFirstChildOfOL && (aTagName == nsGkAtoms::li)
>+        && (attrName == nsGkAtoms::value)) {
No need for this. It just makes the patch even bigger.

>@@ -588,19 +186,6 @@
>     }
> 
>     attrName->ToString(nameStr);
>-    
>-    /*If we already crossed the MaxColumn limit or 
>-    * if this attr name-value pair(including a space,=,opening and closing quotes) is greater than MaxColumn limit
>-    * then start the attribute from a new line.
>-    */
>-
>-    if (mDoFormat
>-        && (mColPos >= mMaxColumn
>-            || ((PRInt32)(mColPos + nameStr.Length() +
>-                          valueStr.Length() + 4) > mMaxColumn))) {
>-        aStr.Append(mLineBreak);
>-        mColPos = 0;
>-    }
Without -P it is hard to say which method this code belongs to.

>+NS_IMETHODIMP
>+nsXHTMLContentSerializer::Init(PRUint32 aFlags, PRUint32 aWrapColumn,
>+                              const char* aCharSet, PRBool aIsCopying,
>+                              PRBool aIsWholeDocument)
>+{
>+  // to keep compatibility with existing calling code
>+  // XXXLJ perhaps we should remove these two tests ?
>+  if (aFlags & nsIDocumentEncoder::OutputFormatted ) {
>+      aFlags = aFlags | nsIDocumentEncoder::OutputWrap;
>+
>+  } else if (!(aFlags & nsIDocumentEncoder::OutputRaw)) {
>+      aFlags = aFlags | nsIDocumentEncoder::OutputWrap;
>+  }
Could you explain why this all? What compatibility?

>+nsXHTMLContentSerializer::HasLongLines(const nsString& text, PRInt32& aLastNewlineOffset)
>+{
>+  PRUint32 start=0;
>+  PRUint32 theLen=text.Length();
Add space before and after '='

>+  PRBool rv = PR_FALSE;
>+  aLastNewlineOffset = kNotFound;
>+  for (start = 0; start < theLen; ) {
>+    PRInt32 eol = text.FindChar('\n', start);
>+    if (eol < 0) {
>+      eol = text.Length();
>+    }
>+    else {
>+      aLastNewlineOffset = eol;
>+    }
>+    if (PRInt32(eol - start) > kLongLineLen)
>+      rv = PR_TRUE;
>+    start = eol+1;
Nit, Add space before and after '+'

>+  if (aNamespaceID != kNameSpaceID_XHTML && !mIsHTMLSerializer) {
>+    return PR_FALSE;
>+  }
I guess this should be
if ((mIsHTMLSerializer && aNamespaceID != kNameSpaceID_None) ||
    (!mIsHTMLSerializer && aNamespaceID != kNameSpaceID_XHTML))
Similar thing also elsewhere.

>+nsXHTMLContentSerializer::IncrIndentation(nsIAtom* aName)
>+{
>+  /*if ((aName == nsGkAtoms::head) ||
>+      (aName == nsGkAtoms::table) ||
>+      (aName == nsGkAtoms::tr) ||
>+      (aName == nsGkAtoms::ul) ||
>+      (aName == nsGkAtoms::ol) ||
>+      (aName == nsGkAtoms::dl) ||
>+      (aName == nsGkAtoms::tbody) ||
>+      (aName == nsGkAtoms::form) ||
>+      (aName == nsGkAtoms::frameset) ||
>+      (aName == nsGkAtoms::blockquote) ||
>+      (aName == nsGkAtoms::li) ||
>+      (aName == nsGkAtoms::dt) ||
>+      (aName == nsGkAtoms::dd)) {*/
>+      nsXMLContentSerializer::IncrIndentation(aName);
>+  //}
>+}
>+
>+void
>+nsXHTMLContentSerializer::DecrIndentation(nsIAtom* aName)
>+{
>+  /*if ((aName == nsGkAtoms::head) ||
>+      (aName == nsGkAtoms::table) ||
>+      (aName == nsGkAtoms::tr) ||
>+      (aName == nsGkAtoms::ul) ||
>+      (aName == nsGkAtoms::ol) ||
>+      (aName == nsGkAtoms::dl) ||
>+      (aName == nsGkAtoms::li) ||
>+      (aName == nsGkAtoms::tbody) ||
>+      (aName == nsGkAtoms::form) ||
>+      (aName == nsGkAtoms::blockquote) ||
>+      (aName == nsGkAtoms::dt) ||
>+      (aName == nsGkAtoms::dd) ||
>+      (aName == nsGkAtoms::frameset)) {*/
>+      nsXMLContentSerializer::DecrIndentation(aName);
>+  //}
>+}
Why you have commented out most of the code? Either the code should be there
and do something reasonable, or it shouldn't be there at all.


>+  // To keep track of startvalue of OL and first list item for nested lists
>+  struct olState {
>+    olState(PRInt32 aStart, PRBool aIsFirst):startVal(aStart),isFirstListItem(aIsFirst)
>+    {
>+    }
>+    PRInt32 startVal;
>+    PRBool isFirstListItem;
>+  };
>+
>+  nsAutoVoidArray   mOLStateStack;// Stack to store one olState struct per <OL>.
I know you're just moving this code, but could you make this something like
nsTArray<olState> mOLStateStack;
That way there isn't need to manually |new| and |delete|
You may want to add a copy constructor to struct olState.


>@@ -468,13 +565,13 @@
>-    
>+
No need for whitespace changes. Those just make the patch even bigger.
Same thing also elsewhere.

>+void
>+nsXMLContentSerializer::IncrIndentation(nsIAtom* aName)
>+{
>+  // we want to keep the source readable
>+  if(mDoWrap && mIndent.Length() >= mMaxColumn - 15) {
>+    mIndentOverflow ++;
>+  }
What is this magical 15?

>+  else {
>+    mIndent.AppendLiteral("   ");
Why 3 spaces? Maybe better to #define it somewhere.

>+void
>+nsXMLContentSerializer::AppendToStringConvertLF(const nsAString& aStr,
>+                                                nsAString& aOutputStr)
>+{
>+  if (mDoRaw) {
>+    nsAutoString str (aStr);
>+    PRInt32 lastNewlineOffset = str.RFindChar('\n');
>+    AppendToString(aStr, aOutputStr);
>+
>+    if (lastNewlineOffset != kNotFound) {
>+      PRInt32 lastNewlineOffset2 = str.RFindChar('\r');
>+      if (lastNewlineOffset2 != kNotFound) {
>+        lastNewlineOffset = lastNewlineOffset2;
>+      }
>+
>+      mColPos = aStr.Length() - lastNewlineOffset;
Could you add some comment here what these offsets are meant for.
...
>+        // if there was a line break before this substring
>+        // AppendNewLineToString was call, so we should reverse
>+        // this flag
>+        mMayIgnoreLineBreakSequence = PR_FALSE;
'was call' -> 'was called'?

>+nsXMLContentSerializer::AppendWrapped_NonWhitespaceSequence(
...
>+    if (aPos == aEnd || foundWhitespaceInLoop) {
>+      // there is enough room for the complete block we found
>+      if (mDoFormat && !mColPos) {
>+        AppendIndentation(aOutputStr);
>+      }
>+
>+      if (mAddSpace) {
>+        aOutputStr.Append(PRUnichar(' '));
>+        mAddSpace = PR_FALSE;
>+      }
So we want both AppendIndentation and Append(PRUnichar(' ')) ?

>+        if (foundWrapPosition) {
>+          if (!mColPos && mDoFormat) {
>+            AppendIndentation(aOutputStr);
>+          }
>+          if (mAddSpace) {
>+            aOutputStr.Append(PRUnichar(' '));
>+            mAddSpace = PR_FALSE;
>+          }
Same here. Is there a test for the case when !mColPos && mDoFormat && mAddSpace?
(if that is even possible case.)

>+          // try some simple fallback logic
>+          // go forward up to the next whitespace position,
>+          // in the worst case this will be all the rest of the data
>+          mColPos += length;
You added this. Could you explain why it is needed.

>+  // if the current line already has text on it, such as a tag,
>+  // leading whitespace is significant
>+  PRBool mayIgnoreStartOfLineWhitespaceSequence = (!mColPos 
>+                                            || (mIsIndentationAddedOnCurrentLine 
>+                                                && sequenceStartAfterAWhitespace
>+                                                && mColPos == mIndent.Length()));
Perhaps newline after '=' and start the expression from the next line.

>+void
>+nsXMLContentSerializer::AppendWrapped_WhitespaceSequence(
>+                        nsASingleFragmentString::const_char_iterator &aPos,
>+                        const nsASingleFragmentString::const_char_iterator aEnd,
>+                        const nsASingleFragmentString::const_char_iterator aSequenceStart,
>+                        nsAString &aOutputStr)
>+{
Note to myself: so this isn't replacing nsHTMLContentSerializer::AppendWrapped_WhitespaceSequence,
but there is nsXMLContentSerializer::AppendFormatedWrapped_WhitespaceSequence
for that, sort of. Documented in nsXMLContentSerializer.h ...

>+  // Handle the complete sequence of whitespace.
>+  // Continue to iterate until we find the first non-whitespace char.
>+  // Updates "aPos" to point to the first unhandled char.
>+  mAddSpace = PR_FALSE;
>+  mIsIndentationAddedOnCurrentLine = PR_FALSE;
>+
>+  PRBool leaveLoop = PR_FALSE;
>+  nsASingleFragmentString::const_char_iterator lastPos = aPos;
>+
>+  do {
>+    switch (*aPos) {
>+      case ' ':
>+      case '\t':
>+        // if there are two many spaces on a line, we wrap
'two many' -> 'too many'

>\ No newline at end of file
Fix these warnings
Attachment #343031 - Flags: review?(Olli.Pettay) → review-
New patch coming?
(Assignee)

Comment 13

9 years ago
Yes, it's coming :-) I already updated my local patch with your comments, but I added also some unit tests to test serialization of OL element with the htmlcopyencoder, and also on the serialization of a selection, of a single node etc.  It appears that some of these tests fail. My patch brokes some few things :-). I currently try to fix these bugs.

There are also few problems when we use the XHTML serializer on a HTML document (and I didn't already test the HTML serializer on an XHTML document). I will try to fix that too. I think I have also to modify the htmlcopyencoder to use the XHTML serializer when the document is in XHTML.

I think a new version of the patch will be ready at the end of the week.
(Assignee)

Updated

9 years ago
Blocks: 56921
(Assignee)

Comment 14

9 years ago
Created attachment 348652 [details] [diff] [review]
patch v2

Here is a new version of the patch.

It includes all wanted improvements:
- used static_cast instead of ()
- canceled changes on whitespaces and indentation
- added spaces where needed
- changed test on namespaces as indicated
- removed nsXHTMLContentSerializer::IncrIndentation and nsXHTMLContentSerializer::DecrIndentation
- changed the type of mOLStateStack to nsTArray<olState>
- put the "magical 15" and the default indentation into #define
- removed unuseful code in nsXMLContentSerializer::AppendToStringConvertLF
- fixed tests in nsXMLContentSerializer::AppendWrapped_NonWhitespaceSequence and added details in comments

And I added also:
- some unit tests on the serialization of selections, range, node etc..
- some unit tests on the serialization of OL lists
- unit tests on htmlcopyencoder
- a test for bug 270145 to verify I do not break something
- support of "autoplay" and "controls" attributes on video and audio elements

And I changed flags in nsContentAreaDragDrop to keep the original serialization of dragged content.
Attachment #343031 - Attachment is obsolete: true
Attachment #348652 - Flags: review?(Olli.Pettay)
(Assignee)

Comment 15

9 years ago
Created attachment 348653 [details] [diff] [review]
diff between v1.2 and v2

Just a diff between patch v1.2 and patch v2...
The patch breaks /tests/embedding/test/test_bug293834.html
and I think also /tests/toolkit/components/url-classifier/tests/mochitest/test_classifier.html
(Assignee)

Comment 17

8 years ago
Created attachment 357873 [details] [diff] [review]
patch v2.1

This is an update of the patch for the current trunk (patch on a Makefile.in was broken). No modification in the part of the patch on the serializer itself.

This new patch includes a modification on test_bug293834.html to pass the test. No modification with test_classifier.html since it passes on my machine.
Attachment #348652 - Attachment is obsolete: true
Attachment #357873 - Flags: review?(Olli.Pettay)
Attachment #348652 - Flags: review?(Olli.Pettay)
Do you know why the patch breaks test_bug293834.html? And is the new behavior
more correct than the old one?
(Assignee)

Comment 19

8 years ago
In fact, in this test, there is a serialisation, and the serialized content is then "unserialized" in a new iframe. Then the test check if a simple string "form state" is in textarea or input contents. However, the new serializer has a better wrapping, and then there is now a linebreak in some input contents : "form\nstate". This is why the test failed. So I changed the regular expression in the test : from /form state/  to /form\s+state/.
Created attachment 357942 [details]
comments

Updated

8 years ago
Attachment #357942 - Attachment is patch: false
(Assignee)

Comment 21

8 years ago
Created attachment 358388 [details] [diff] [review]
patch v2.2

>@@ -1415,17 +1415,18 @@ nsTransferableFactory::SerializeNodeOrSe
>Is there some testcase for this? Or is it even possible to test this
>automatically? At least you have tested this manually, right?

It seems to be complex to do such testcase automatically (perhaps there are some functions in mochitests to facilitate to write test case for it ? Or am I wrong ?). So yes, I've done manual tests. However, I saw that this method use the "html copy encoder", so I added some test on it in test_htmlcopyencoder.html and test_htmlcopyencoder.xhtml.


>+      mPreLevel--;
>Use prefix operator (it is faster, though compiler probably knows how to optimize it)

Done, and also in all other lines where ++ and -- is used.

> mPreLevel decrement happens now somewhere else, right?

Yes, in nsXHTMLContentSerializer::MaybeLeaveFromPreContent()

>+PRBool
>+nsXHTMLContentSerializer::IsJavaScript(nsIAtom* aAttrNameAtom, const nsAString& aValueString)
>This should handle also XUL and SVG inside XHTML, right? Perhaps the method
>could have an nsIContent* aContent parameter, and based on its type
>it could check whether the attribute is an event attribute name.

Yes you're right. So I moved the IsJavascript method to the nsXMLContentSerializer, then XHTML/SVG/XUL attributes into any XML content can be checked too. Its new signature is
   PRBool IsJavaScript(nsIContent * aContent,
                       nsIAtom* aAttrNameAtom,
                       PRInt32 aAttrNamespaceID,
                       const nsAString& aValueString);

And I restored the original IsJavascript method into the nsHTMLContentSerializer.

>+void 
>+nsXHTMLContentSerializer::AppendEndOfElementStart(nsIDOMElement *aOriginalElement,
>+                                                  nsIAtom * aName,
>+                                                  PRInt32 aNamespaceID,
>+                                                  nsAString& aStr)
>+{
>+  // this method is not called by nsHTMLContentSerializer
>+  // so we don't have to check HTML element, just XHTML
>Could you add an assertion that this isn't an nsHTMLContentSerializer

I added 
  NS_ASSERTION(!mIsHTMLSerializer, "nsHTMLContentSerializer shouldn't call this method !");

Also in other places where you told me...

>+nsXHTMLContentSerializer::IsShorthandAttr(const nsIAtom* aAttrName,
>+                                        const nsIAtom* aElementName)
>Is there a testcase which checks that shorthand attributes works in HTML and XHTML,
>and don't if the element is in some other namespace?

No. I added some in file_xhtmlserializer_1. In XHTML, this method is called only when we are on a XHTML element.
So it is not called on any other elements. However, it was called for attributes which have a specific namespace, then I fixed it.

>+  PRPackedBool mInPseudoCDATA;
>Perhaps better to call this something else than ***CDATA.
>Maybe mInNoEntityEncoding ?

Ok, I renamed it to mDisableEntityEncoding.

>Btw, how does this handle the case when 'script' or 'style' etc. have
>child elements? I know, that isn't probably valid, but still possible
>if the DOM is modified by a script. Perhaps using a counter would be
>better than just a simple flag.

Yes, you're right, I changed this.

>+  // To keep track of startvalue of OL and first list item for nested lists
>+  struct olState {
>I'd prefer having an explicit copy-constructor, and maybe even operator=
>(depends on what nsTArray requires)

I added an explicit copy-constructor.

>+// the string used to indent.
>+#define INDENT_STRING "   "
>Why 3 spaces?

This is an arbitrary value..

>+
>+  mDoRaw = (mFlags & nsIDocumentEncoder::OutputRaw) ? PR_TRUE
>+                                                    : PR_FALSE;
>mDoRaw = !!(mFlags & nsIDocumentEncoder::OutputRaw);

ok

>+
>+  mDoFormat = (mFlags & nsIDocumentEncoder::OutputFormatted 
>+                && !mDoRaw) ? PR_TRUE
>+                            : PR_FALSE;
>Why not just
>  mDoFormat = (mFlags & nsIDocumentEncoder::OutputFormatted && !mDoRaw);

ok

>+
>+  mDoWrap = (mFlags & nsIDocumentEncoder::OutputWrap
>+            && !mDoRaw) ? PR_TRUE
>+                        : PR_FALSE;
>Similar thing here

ok

>Use NS_NAMED_LITERAL_STRING for <![CDATA[

ok. Done for  "<!--" too.

>   if (!data.IsEmpty()) {
>-    AppendToString(NS_LITERAL_STRING(" "), aStr);
>+    AppendToString(PRUnichar(' '), aStr);
>Nit, is there any reason for this change?

Because I think it uses less memory since with a PRUnichar, we don't instantiate any object...

>-        if ( *buf == PRUnichar('\'') )
>-        {
>+        if (*buf == PRUnichar('\'')) {
>Why this change? The coding style is anyway a bit strange around here.

Ok, I canceled this change.


I fixed also all other little things indicated in your comments: about syntax, about whitespaces, about C++/Idl comments etc. I fixed also testcases which didn't pass on windows.
Attachment #348653 - Attachment is obsolete: true
Attachment #357873 - Attachment is obsolete: true
Attachment #358388 - Flags: review?(Olli.Pettay)
Attachment #357873 - Flags: review?(Olli.Pettay)
Created attachment 358437 [details]
comments
Comment on attachment 358388 [details] [diff] [review]
patch v2.2

I could mark this r+ now, but this needs good sr too. The patch is just so huge (although lots of it is moving code).
Attachment #358388 - Flags: review?(Olli.Pettay) → review+
(Assignee)

Comment 24

8 years ago
Created attachment 359039 [details] [diff] [review]
patch v2.3

Fixed Olli's latest comments. For the isJavascript method, I need to know how to make the difference between a HTML content and a XML node without a namespace. Should I do a QueryInterface each time I run this method ?
Attachment #358388 - Attachment is obsolete: true
Attachment #359039 - Flags: superreview?(peterv)
Maybe aContent->IsNodeOfType(nsINode::eHTML)
(Assignee)

Comment 26

8 years ago
Created attachment 359058 [details] [diff] [review]
patch v2.4

Thanks Olli, I removed the isJavascript method from nsHTMLContentSerializer.

Oh, and the bug for MathML is bug 475518.
Attachment #359039 - Attachment is obsolete: true
Attachment #359058 - Flags: superreview?(peterv)
Attachment #359039 - Flags: superreview?(peterv)
(Assignee)

Updated

8 years ago
Attachment #359058 - Flags: superreview?(peterv) → superreview?(jonas)
Comment on attachment 359058 [details] [diff] [review]
patch v2.4

>@@ -685,649 +265,143 @@ nsHTMLContentSerializer::AppendElementSt
>       //If OL has "start" attribute, first LI element has to start with that value
>       //Therefore subtracting 1 as all the LI elements are incrementing it before using it;
>       //In failure of ToInteger(), default StartAttrValue to 0.
>       if (NS_SUCCEEDED(rv))
>         startAttrVal--; 
>       else
>         startAttrVal = 0;
>     }
>-    olState* state = new olState(startAttrVal, PR_TRUE);
>-    if (state)
>-      mOLStateStack.AppendElement(state);
>+    olState state (startAttrVal, PR_TRUE);
>+    mOLStateStack.AppendElement(state);

No space before the '('.

Also, I think you could write this as
mOLStateStack.AppendElement(olState(startAttrVal, PR_TRUE));

>+nsXHTMLContentSerializer::SerializeAttributes(nsIContent* aContent,
...
>+  NS_NAMED_LITERAL_STRING(_mozStr, "_moz");
...
>+    // Filter out any attribute starting with [-|_]moz
>+    const char* sharedName;
>+    attrName->GetUTF8String(&sharedName);
>+    if ((('_' == *sharedName) || ('-' == *sharedName)) &&
>+        !nsCRT::strncmp(sharedName+1, kMozStr, PRUint32(sizeof(kMozStr)-1))) {
>+      continue;
>+    }

is the '+1' there really correct? Seems like this will filter attributes starting with "__moz" or "-_moz"

Other than that I couldn't find anything. But this patch is a bit enormous and so hard to review. sr=me either way
Attachment #359058 - Flags: superreview?(jonas) → superreview+
(Assignee)

Comment 28

8 years ago
Created attachment 372288 [details] [diff] [review]
patch v2.5

Unfortunately, the applying of the patch 2.4 fails on the current trunk because of some recents changes on nsHTMLContentSerializer.* and nsXMLContentSerializer.*. This changes are on the nsVoidArray properties (http://hg.mozilla.org/mozilla-central/rev/52c7758d5a4d and http://hg.mozilla.org/mozilla-central/rev/659355060532 ). On nsHTMLContentSerializer, my patch 2.4 contains already this changes (asked by Olli, see comments below), but it doesn't includes changes on nsXMLContentSerializer.

I updated my patch.

>Also, I think you could write this as
>mOLStateStack.AppendElement(olState(startAttrVal, PR_TRUE));

Yes, it is already made in the trunk

>is the '+1' there really correct? Seems like this will filter attributes
starting with "__moz" or "-_moz"

no, because kMozStr equals to "moz", not "_moz". I added some tests about this in this new patches.

Thanks for your superreview Jonas :-)
Attachment #359058 - Attachment is obsolete: true
(Assignee)

Comment 29

8 years ago
Created attachment 372289 [details] [diff] [review]
the diff between the latest two patches

here are the differences between patch v2.4 and v2.5.

I forgot to say that I also updated some few unit tests that I forgot to modify for patch 2.4 (difference of number of spaces for indentation).

For patch 2.5, I launched mochitests and xpcshell tests on linux and mac, and all is green, except on some tests in toolkit/content/tests/widgets/test_textbox_number.xul and browser/base/content/test/test_contextmenu.html. However I don't think it is related to the serializer.
(Assignee)

Comment 30

8 years ago
Should I ask a new review/superreview or can we land the patch in the trunk ?
The patch is so huge that perhaps worth to use tryserver before landing to trunk.
(Assignee)

Comment 32

8 years ago
Ok Olli. I haven't rights to use tryserver. Could somebody test my patch on a tryserver please ?
(Assignee)

Comment 33

8 years ago
Olli has landed the patch into tryservers. The results are :

For build only, it's ok
- linux: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239890526.19483.gz
- mac: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886954.1239892679.28620.gz
- windows: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239894765.32652.gz

For build+test
- mac: ok: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886954.1239895206.842.gz
- leaks on linux : http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239894676.32544.gz
However, these leaks appear on many components which are not related to the serializer I mean (ex: mozstorage).

- a test failed on windows http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239898098.6218.gz

Apparently, it is because of a difference on end-of-line characters.. I will fix this test :-/
(Assignee)

Comment 34

8 years ago
Created attachment 373667 [details] [diff] [review]
patch v2.6

I just fixed the test which fails. It was just a missing flag OutputLFLineBreak when calling the encoder.
Attachment #372288 - Attachment is obsolete: true
Attachment #372289 - Attachment is obsolete: true
(Assignee)

Comment 35

8 years ago
Olli, could you test this patch on try servers please ?
Uploaded to tryserver. Look for 'laurentj_serializers_26'
(Assignee)

Comment 37

8 years ago
Thank you Olli.

All box are greens except one. Here are the reports for all green boxes:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240276856.6161.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270883.31148.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270911.31177.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270971.31240.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258529.1240264875.21996.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240260238.1240269576.29357.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258529.1240260818.15811.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240259546.1240265209.22553.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240260811.1240267460.26305.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240261231.1240272761.927.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240278058.7574.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277494.6906.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277619.7035.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277623.7038.gz

The fail :
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240269405.29116.gz&fulltext=1

The browser crashed during talos tests. I don't know why. Log file is not very detailed about the fail.

Is it significant or can we commit the patch into mozilla-central ?
(Assignee)

Comment 38

8 years ago
Yesterday Olli has re-pushed the patch on try servers, and all were greens.
Keywords: checkin-needed
I'll push this once my build is ready.
http://hg.mozilla.org/mozilla-central/rev/f6d27eb05ec1
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
(Assignee)

Comment 41

8 years ago
Thank you Olli for all, and thanks Jonas for your superreview :-)
Depends on: 494103
This seems to have added a warning which looked like a serious bug to me at first glance (virtual methods not being overridden correctly):

mozilla/src/content/base/src/nsXHTMLContentSerializer.h:114:
warning: ‘virtual void nsXHTMLContentSerializer::SerializeAttributes(nsIContent*, nsIDOMElement*, nsAString_internal&, const nsAString_internal&, nsIAtom*, nsAString_internal&, PRUint32, PRBool)’ was hidden
by ‘virtual void nsHTMLContentSerializer::SerializeAttributes(nsIContent*, nsIDOMElement*, nsAString_internal&, const nsAString_internal&, nsIAtom*, nsAString_internal&)’

After spending an hour or so reading through this code, I _think_ that it's correct.  But is there a reason the HTML one has a different signature?  Or isn't called SerializeHTMLAttributes or something?
(Assignee)

Comment 43

8 years ago
Actually, I think there is no reason to define this nsHTMLContentSerializer::SerializeAttributes method with the same name of the  nsXHTMLContentSerializer one. Yes, we could call it SerializeHTMLAttributes.

Do you want I create a bug and provide a patch ?
That would be wonderful!
(Assignee)

Comment 45

8 years ago
done: https://bugzilla.mozilla.org/show_bug.cgi?id=513647
(Assignee)

Updated

8 years ago
Blocks: 502177

Updated

8 years ago
Depends on: 524975

Updated

7 years ago
Depends on: 545644
Depends on: 546474
Depends on: 549295
This patch broke the wrapping behavior when serializing HTML.  See bug 549295.
(Assignee)

Updated

7 years ago
Depends on: 541937
What's the use case for having the W3CEntities (MathML) configuration?
You need to log in before you can comment on or make changes to this bug.