Last Comment Bug 422403 - improvements to the XML and XHTML serializers
: improvements to the XML and XHTML serializers
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Serializers (show other bugs)
: Trunk
: All All
-- enhancement (vote)
: ---
Assigned To: Laurent Jouanneau
:
: Andrew Overholt [:overholt]
Mentors:
Depends on: 524975 424359 494103 541937 545644 546474 549295
Blocks: 45628 55313 278518 281407 333060 502177 56921 236290 368437
  Show dependency treegraph
 
Reported: 2008-03-12 07:22 PDT by Laurent Jouanneau
Modified: 2011-04-21 02:22 PDT (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Work in progress #1 (74.52 KB, patch)
2008-07-18 08:55 PDT, Laurent Jouanneau
no flags Details | Diff | Splinter Review
Work in progress #2 (100.80 KB, patch)
2008-08-18 09:14 PDT, Laurent Jouanneau
no flags Details | Diff | Splinter Review
Work in progress #3 (137.30 KB, patch)
2008-08-22 07:36 PDT, Laurent Jouanneau
no flags Details | Diff | Splinter Review
the patch, version 1.0 (272.30 KB, patch)
2008-09-16 06:25 PDT, Laurent Jouanneau
bugs: review-
Details | Diff | Splinter Review
comments (9.10 KB, text/plain)
2008-10-07 08:53 PDT, Olli Pettay [:smaug] (pto-ish for couple of days)
no flags Details
version 1.1 (278.24 KB, patch)
2008-10-13 05:41 PDT, Laurent Jouanneau
no flags Details | Diff | Splinter Review
patch 1.2 : updated after changes on the patch of bug 424359 (277.96 KB, patch)
2008-10-14 03:28 PDT, Laurent Jouanneau
bugs: review-
Details | Diff | Splinter Review
patch v2 (331.35 KB, patch)
2008-11-17 14:38 PST, Laurent Jouanneau
no flags Details | Diff | Splinter Review
diff between v1.2 and v2 (79.39 KB, patch)
2008-11-17 14:39 PST, Laurent Jouanneau
no flags Details | Diff | Splinter Review
patch v2.1 (332.58 KB, patch)
2009-01-20 15:11 PST, Laurent Jouanneau
no flags Details | Diff | Splinter Review
comments (15.44 KB, text/plain)
2009-01-21 04:58 PST, Olli Pettay [:smaug] (pto-ish for couple of days)
no flags Details
patch v2.2 (344.29 KB, patch)
2009-01-23 05:48 PST, Laurent Jouanneau
bugs: review+
Details | Diff | Splinter Review
comments (2.59 KB, text/plain)
2009-01-23 10:30 PST, Olli Pettay [:smaug] (pto-ish for couple of days)
no flags Details
patch v2.3 (344.39 KB, patch)
2009-01-27 06:43 PST, Laurent Jouanneau
no flags Details | Diff | Splinter Review
patch v2.4 (343.95 KB, patch)
2009-01-27 09:12 PST, Laurent Jouanneau
jonas: superreview+
Details | Diff | Splinter Review
patch v2.5 (341.70 KB, patch)
2009-04-12 06:00 PDT, Laurent Jouanneau
no flags Details | Diff | Splinter Review
the diff between the latest two patches (35.47 KB, patch)
2009-04-12 06:12 PDT, Laurent Jouanneau
no flags Details | Diff | Splinter Review
patch v2.6 (341.72 KB, patch)
2009-04-20 09:49 PDT, Laurent Jouanneau
no flags Details | Diff | Splinter Review

Description User image Laurent Jouanneau 2008-03-12 07:22:46 PDT
In the current version of Gecko, the XML serializer is used to serialize XHTML content. However, it doesn't serialize like the HTML serializer does for HTML documents (pretty-printing, specific serialization for some tags etc.) so it isn't really usefull for Composer.

I'm going to fix this bug. I think I will create a new nsXHTMLContentSerializer class. It will support features of nsHTMLContentSerializer, and of course will take care of all XML stuff like namespaces etc.
Comment 1 User image Laurent Jouanneau 2008-04-03 09:35:00 PDT
After working few hours on the serializers, I decided to make improvements in the XML serializer at the same time, to do pretty printing, wrapping etc. In fact, I need it for my wysiwyg XML editor (ETNA).
Comment 2 User image Laurent Jouanneau 2008-07-18 08:55:19 PDT
Created attachment 330248 [details] [diff] [review]
Work in progress #1

First patch. It is not finished. Worked only on the XML serializer so I didn't updated mochitests for the HTML serializer. Indentation doesn't work very well and then some xpcshell tests fail for the XML serializer. You need to apply the patch for bug 424359 before applying this patch.
Comment 3 User image Laurent Jouanneau 2008-08-18 09:14:56 PDT
Created attachment 334297 [details] [diff] [review]
Work in progress #2

The XML serializer is almost finished. Indentation is better than in the old HTML serializer (text content is well indented). Some tests should be still added to check attributes and namespaces serialization. I worked on the HTML serializer but it's not finished.
Comment 4 User image Laurent Jouanneau 2008-08-22 07:36:06 PDT
Created attachment 335045 [details] [diff] [review]
Work in progress #3

HTML and XML serializers are fines now. All tests pass :-)

I'm working now on a new nsXHMLContentSerializer class, which will inherit from nsXMLContentSerializer and nsHTMLContentSerializer will inherit from nsXHTMLContentSerializer.
Comment 5 User image Laurent Jouanneau 2008-09-16 06:25:52 PDT
Created attachment 338851 [details] [diff] [review]
the patch, version 1.0

I think my patch is ready for a first review.

Remember that you need to apply the patch for bug 424359 before applying this patch.

So, what's this patch doing ?

- it adds a new content serializer for XHTML : there is a new class nsXHTMLContentSerializer which inherits from nsXMLContentSerializer. This serializer is called only for pages which have a application/xhtml+xml content type. This serializer respects the XML serialization specification, but also specificities of HTML serialization (pre elements, script elements etc..).
- most of features of nsHTMLContentSerializer are moved into the XML serializer, or into the XHTML serializer.
- nsHTMLContentSerializer now inherits from nsXHTMLContentSerializer.
- the XML serializer supports now this flags : OutputFormatted, OutputWrap (+ OutputLFLineBreak and OutputCRLineBreak added by the patch of bug 424359)
- indentation and wrapping are better (there were some problems in the HTML serializer).
- unit tests are added (and I will add more tests later)
- documentation of flags on nsIDocumentEncoder is updated
Comment 6 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2008-10-07 08:53:35 PDT
Created attachment 342079 [details]
comments

See comments in the attachment

This is so huge patch that few iterations are needed.
Sorry that reviewing has taken so much time.
Comment 7 User image Laurent Jouanneau 2008-10-10 03:26:00 PDT
> This is so huge patch that few iterations are needed.
> Sorry that reviewing has taken so much time.

Don't worry, no problem, I understand :-)

Here are responses to some of your questions in your comments. 

> But why does OutputRaw wrap? How is XHTML handled.

There isn't any wrap when you set the OutputRaw flag. But if you don't set any flags (no OutputFormatted, no OutputRaw, no OutputWrap), it acts like OutputRaw but it wraps long lines. And I don't know why, so I just kept the current behavior. Perhaps it is related to the editor, or mail...

> do we really want HTML entities in XHTML?

Yes, the specification of XHTML says : "The XHTML entity sets are the same as for HTML 4". http://www.w3.org/TR/xhtml1/dtds.html#h-A2

>Actually, why are these no-op? What if CDATA or PI is added to an HTML document using scripts? > How does this work currently?

Sorry, I thought that SGML didn't support CDATA or PI, and this is not the case, according to the specification, even if the HTML specification says that the user agent should ignore them http://www.w3.org/TR/html4/appendix/notes.html#sgmlfeatures.
The current HTML serializer supports CDATA and PI. I will remove this empty methods.

>And when is mozdirty used? We really shouldn't have anything like that.
> It might be better to use some property in a content object. (Set/GetProperty)

We already talked about it on irc :-). I created a new bug 459212

>+nsXHTMLContentSerializer::AppendEndOfElementStart(nsIDOMElement *aOriginalElement,
>+...
>But this method is in XHTML serializer, not HTML.

Yes, of course, because the end of an HTML element is always ">", so we don't need this method in the HTML serializer. The behavior of this methods is for old HTML browser which doesn't support XHTML.

>Make sure this all works with nested bodies and when there are several sibling bodies. (Maybe you're testing that, haven't yet read the tests)

Yes there are tests, which are provided by the patch of bug 424359.

>+    // for backward compatibility with old HTML user agents,
>+    // empty elements should have an ending tag, so we mustn't call
>+    // nsXMLContentSerializer::CheckElementEnd
>But this is XHTML serializer. Why do we need to care about HTML UAs?

Why not ? It is needed by the editor, to generate pages that can be viewed in old HTML UAs. If a web developer creates a page with Mozilla Composer, BlueGriffon or else, I think he would like to see his page in his old browser (well, there are some chance that the rendering is not the expected one, but if we can minimize unexpected behaviors, at least by doing a good serialization for this old browsers...).

>+void
>+nsXHTMLContentSerializer::IncrIndentation(nsIAtom* aName)
> ...
>Why most /***/? Should you check namespace?

In fact, I hesitate: we can increment indentation on only few elements, but is it really necessary ? I don't know.

I will attach a new patch with all corrections you ask in your other comments. It is almost ready, I just have to add new tests.
Comment 8 User image Laurent Jouanneau 2008-10-13 05:41:37 PDT
Created attachment 342862 [details] [diff] [review]
version 1.1

It fixes bad things pointed by Olli: bad spelling in a comment, style nits, indentation, assignement of the forceFormat attribute, namespace checking in various methods, new test with a <pre> from an other namespace.

Thank you Olli for your review
Comment 9 User image Laurent Jouanneau 2008-10-13 14:21:18 PDT
Comment on attachment 342862 [details] [diff] [review]
version 1.1

I will update the patch tomorrow, since some changes have been made on the patch of bug 424359.
Comment 10 User image Laurent Jouanneau 2008-10-14 03:28:02 PDT
Created attachment 343031 [details] [diff] [review]
patch 1.2 : updated after changes on the patch of bug 424359
Comment 11 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2008-10-16 07:24:49 PDT
Comment on attachment 343031 [details] [diff] [review]
patch 1.2 : updated after changes on the patch of bug 424359

Please use -P when generating patches.

Whenever casting is needed, use C++ style (static_cast<> etc.), not C's '()'.
>-    if (mIsCopying && mIsFirstChildOfOL && (aTagName == nsGkAtoms::li) && 
>-        (attrName == nsGkAtoms::value)){
>+    if (mIsCopying && mIsFirstChildOfOL && (aTagName == nsGkAtoms::li)
>+        && (attrName == nsGkAtoms::value)) {
No need for this. It just makes the patch even bigger.

>@@ -588,19 +186,6 @@
>     }
> 
>     attrName->ToString(nameStr);
>-    
>-    /*If we already crossed the MaxColumn limit or 
>-    * if this attr name-value pair(including a space,=,opening and closing quotes) is greater than MaxColumn limit
>-    * then start the attribute from a new line.
>-    */
>-
>-    if (mDoFormat
>-        && (mColPos >= mMaxColumn
>-            || ((PRInt32)(mColPos + nameStr.Length() +
>-                          valueStr.Length() + 4) > mMaxColumn))) {
>-        aStr.Append(mLineBreak);
>-        mColPos = 0;
>-    }
Without -P it is hard to say which method this code belongs to.

>+NS_IMETHODIMP
>+nsXHTMLContentSerializer::Init(PRUint32 aFlags, PRUint32 aWrapColumn,
>+                              const char* aCharSet, PRBool aIsCopying,
>+                              PRBool aIsWholeDocument)
>+{
>+  // to keep compatibility with existing calling code
>+  // XXXLJ perhaps we should remove these two tests ?
>+  if (aFlags & nsIDocumentEncoder::OutputFormatted ) {
>+      aFlags = aFlags | nsIDocumentEncoder::OutputWrap;
>+
>+  } else if (!(aFlags & nsIDocumentEncoder::OutputRaw)) {
>+      aFlags = aFlags | nsIDocumentEncoder::OutputWrap;
>+  }
Could you explain why this all? What compatibility?

>+nsXHTMLContentSerializer::HasLongLines(const nsString& text, PRInt32& aLastNewlineOffset)
>+{
>+  PRUint32 start=0;
>+  PRUint32 theLen=text.Length();
Add space before and after '='

>+  PRBool rv = PR_FALSE;
>+  aLastNewlineOffset = kNotFound;
>+  for (start = 0; start < theLen; ) {
>+    PRInt32 eol = text.FindChar('\n', start);
>+    if (eol < 0) {
>+      eol = text.Length();
>+    }
>+    else {
>+      aLastNewlineOffset = eol;
>+    }
>+    if (PRInt32(eol - start) > kLongLineLen)
>+      rv = PR_TRUE;
>+    start = eol+1;
Nit, Add space before and after '+'

>+  if (aNamespaceID != kNameSpaceID_XHTML && !mIsHTMLSerializer) {
>+    return PR_FALSE;
>+  }
I guess this should be
if ((mIsHTMLSerializer && aNamespaceID != kNameSpaceID_None) ||
    (!mIsHTMLSerializer && aNamespaceID != kNameSpaceID_XHTML))
Similar thing also elsewhere.

>+nsXHTMLContentSerializer::IncrIndentation(nsIAtom* aName)
>+{
>+  /*if ((aName == nsGkAtoms::head) ||
>+      (aName == nsGkAtoms::table) ||
>+      (aName == nsGkAtoms::tr) ||
>+      (aName == nsGkAtoms::ul) ||
>+      (aName == nsGkAtoms::ol) ||
>+      (aName == nsGkAtoms::dl) ||
>+      (aName == nsGkAtoms::tbody) ||
>+      (aName == nsGkAtoms::form) ||
>+      (aName == nsGkAtoms::frameset) ||
>+      (aName == nsGkAtoms::blockquote) ||
>+      (aName == nsGkAtoms::li) ||
>+      (aName == nsGkAtoms::dt) ||
>+      (aName == nsGkAtoms::dd)) {*/
>+      nsXMLContentSerializer::IncrIndentation(aName);
>+  //}
>+}
>+
>+void
>+nsXHTMLContentSerializer::DecrIndentation(nsIAtom* aName)
>+{
>+  /*if ((aName == nsGkAtoms::head) ||
>+      (aName == nsGkAtoms::table) ||
>+      (aName == nsGkAtoms::tr) ||
>+      (aName == nsGkAtoms::ul) ||
>+      (aName == nsGkAtoms::ol) ||
>+      (aName == nsGkAtoms::dl) ||
>+      (aName == nsGkAtoms::li) ||
>+      (aName == nsGkAtoms::tbody) ||
>+      (aName == nsGkAtoms::form) ||
>+      (aName == nsGkAtoms::blockquote) ||
>+      (aName == nsGkAtoms::dt) ||
>+      (aName == nsGkAtoms::dd) ||
>+      (aName == nsGkAtoms::frameset)) {*/
>+      nsXMLContentSerializer::DecrIndentation(aName);
>+  //}
>+}
Why you have commented out most of the code? Either the code should be there
and do something reasonable, or it shouldn't be there at all.


>+  // To keep track of startvalue of OL and first list item for nested lists
>+  struct olState {
>+    olState(PRInt32 aStart, PRBool aIsFirst):startVal(aStart),isFirstListItem(aIsFirst)
>+    {
>+    }
>+    PRInt32 startVal;
>+    PRBool isFirstListItem;
>+  };
>+
>+  nsAutoVoidArray   mOLStateStack;// Stack to store one olState struct per <OL>.
I know you're just moving this code, but could you make this something like
nsTArray<olState> mOLStateStack;
That way there isn't need to manually |new| and |delete|
You may want to add a copy constructor to struct olState.


>@@ -468,13 +565,13 @@
>-    
>+
No need for whitespace changes. Those just make the patch even bigger.
Same thing also elsewhere.

>+void
>+nsXMLContentSerializer::IncrIndentation(nsIAtom* aName)
>+{
>+  // we want to keep the source readable
>+  if(mDoWrap && mIndent.Length() >= mMaxColumn - 15) {
>+    mIndentOverflow ++;
>+  }
What is this magical 15?

>+  else {
>+    mIndent.AppendLiteral("   ");
Why 3 spaces? Maybe better to #define it somewhere.

>+void
>+nsXMLContentSerializer::AppendToStringConvertLF(const nsAString& aStr,
>+                                                nsAString& aOutputStr)
>+{
>+  if (mDoRaw) {
>+    nsAutoString str (aStr);
>+    PRInt32 lastNewlineOffset = str.RFindChar('\n');
>+    AppendToString(aStr, aOutputStr);
>+
>+    if (lastNewlineOffset != kNotFound) {
>+      PRInt32 lastNewlineOffset2 = str.RFindChar('\r');
>+      if (lastNewlineOffset2 != kNotFound) {
>+        lastNewlineOffset = lastNewlineOffset2;
>+      }
>+
>+      mColPos = aStr.Length() - lastNewlineOffset;
Could you add some comment here what these offsets are meant for.
...
>+        // if there was a line break before this substring
>+        // AppendNewLineToString was call, so we should reverse
>+        // this flag
>+        mMayIgnoreLineBreakSequence = PR_FALSE;
'was call' -> 'was called'?

>+nsXMLContentSerializer::AppendWrapped_NonWhitespaceSequence(
...
>+    if (aPos == aEnd || foundWhitespaceInLoop) {
>+      // there is enough room for the complete block we found
>+      if (mDoFormat && !mColPos) {
>+        AppendIndentation(aOutputStr);
>+      }
>+
>+      if (mAddSpace) {
>+        aOutputStr.Append(PRUnichar(' '));
>+        mAddSpace = PR_FALSE;
>+      }
So we want both AppendIndentation and Append(PRUnichar(' ')) ?

>+        if (foundWrapPosition) {
>+          if (!mColPos && mDoFormat) {
>+            AppendIndentation(aOutputStr);
>+          }
>+          if (mAddSpace) {
>+            aOutputStr.Append(PRUnichar(' '));
>+            mAddSpace = PR_FALSE;
>+          }
Same here. Is there a test for the case when !mColPos && mDoFormat && mAddSpace?
(if that is even possible case.)

>+          // try some simple fallback logic
>+          // go forward up to the next whitespace position,
>+          // in the worst case this will be all the rest of the data
>+          mColPos += length;
You added this. Could you explain why it is needed.

>+  // if the current line already has text on it, such as a tag,
>+  // leading whitespace is significant
>+  PRBool mayIgnoreStartOfLineWhitespaceSequence = (!mColPos 
>+                                            || (mIsIndentationAddedOnCurrentLine 
>+                                                && sequenceStartAfterAWhitespace
>+                                                && mColPos == mIndent.Length()));
Perhaps newline after '=' and start the expression from the next line.

>+void
>+nsXMLContentSerializer::AppendWrapped_WhitespaceSequence(
>+                        nsASingleFragmentString::const_char_iterator &aPos,
>+                        const nsASingleFragmentString::const_char_iterator aEnd,
>+                        const nsASingleFragmentString::const_char_iterator aSequenceStart,
>+                        nsAString &aOutputStr)
>+{
Note to myself: so this isn't replacing nsHTMLContentSerializer::AppendWrapped_WhitespaceSequence,
but there is nsXMLContentSerializer::AppendFormatedWrapped_WhitespaceSequence
for that, sort of. Documented in nsXMLContentSerializer.h ...

>+  // Handle the complete sequence of whitespace.
>+  // Continue to iterate until we find the first non-whitespace char.
>+  // Updates "aPos" to point to the first unhandled char.
>+  mAddSpace = PR_FALSE;
>+  mIsIndentationAddedOnCurrentLine = PR_FALSE;
>+
>+  PRBool leaveLoop = PR_FALSE;
>+  nsASingleFragmentString::const_char_iterator lastPos = aPos;
>+
>+  do {
>+    switch (*aPos) {
>+      case ' ':
>+      case '\t':
>+        // if there are two many spaces on a line, we wrap
'two many' -> 'too many'

>\ No newline at end of file
Fix these warnings
Comment 12 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2008-10-31 08:56:12 PDT
New patch coming?
Comment 13 User image Laurent Jouanneau 2008-11-03 05:42:44 PST
Yes, it's coming :-) I already updated my local patch with your comments, but I added also some unit tests to test serialization of OL element with the htmlcopyencoder, and also on the serialization of a selection, of a single node etc.  It appears that some of these tests fail. My patch brokes some few things :-). I currently try to fix these bugs.

There are also few problems when we use the XHTML serializer on a HTML document (and I didn't already test the HTML serializer on an XHTML document). I will try to fix that too. I think I have also to modify the htmlcopyencoder to use the XHTML serializer when the document is in XHTML.

I think a new version of the patch will be ready at the end of the week.
Comment 14 User image Laurent Jouanneau 2008-11-17 14:38:07 PST
Created attachment 348652 [details] [diff] [review]
patch v2

Here is a new version of the patch.

It includes all wanted improvements:
- used static_cast instead of ()
- canceled changes on whitespaces and indentation
- added spaces where needed
- changed test on namespaces as indicated
- removed nsXHTMLContentSerializer::IncrIndentation and nsXHTMLContentSerializer::DecrIndentation
- changed the type of mOLStateStack to nsTArray<olState>
- put the "magical 15" and the default indentation into #define
- removed unuseful code in nsXMLContentSerializer::AppendToStringConvertLF
- fixed tests in nsXMLContentSerializer::AppendWrapped_NonWhitespaceSequence and added details in comments

And I added also:
- some unit tests on the serialization of selections, range, node etc..
- some unit tests on the serialization of OL lists
- unit tests on htmlcopyencoder
- a test for bug 270145 to verify I do not break something
- support of "autoplay" and "controls" attributes on video and audio elements

And I changed flags in nsContentAreaDragDrop to keep the original serialization of dragged content.
Comment 15 User image Laurent Jouanneau 2008-11-17 14:39:58 PST
Created attachment 348653 [details] [diff] [review]
diff between v1.2 and v2

Just a diff between patch v1.2 and patch v2...
Comment 16 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-01-19 13:07:41 PST
The patch breaks /tests/embedding/test/test_bug293834.html
and I think also /tests/toolkit/components/url-classifier/tests/mochitest/test_classifier.html
Comment 17 User image Laurent Jouanneau 2009-01-20 15:11:03 PST
Created attachment 357873 [details] [diff] [review]
patch v2.1

This is an update of the patch for the current trunk (patch on a Makefile.in was broken). No modification in the part of the patch on the serializer itself.

This new patch includes a modification on test_bug293834.html to pass the test. No modification with test_classifier.html since it passes on my machine.
Comment 18 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-01-21 02:02:07 PST
Do you know why the patch breaks test_bug293834.html? And is the new behavior
more correct than the old one?
Comment 19 User image Laurent Jouanneau 2009-01-21 02:13:05 PST
In fact, in this test, there is a serialisation, and the serialized content is then "unserialized" in a new iframe. Then the test check if a simple string "form state" is in textarea or input contents. However, the new serializer has a better wrapping, and then there is now a linebreak in some input contents : "form\nstate". This is why the test failed. So I changed the regular expression in the test : from /form state/  to /form\s+state/.
Comment 20 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-01-21 04:58:03 PST
Created attachment 357942 [details]
comments
Comment 21 User image Laurent Jouanneau 2009-01-23 05:48:18 PST
Created attachment 358388 [details] [diff] [review]
patch v2.2

>@@ -1415,17 +1415,18 @@ nsTransferableFactory::SerializeNodeOrSe
>Is there some testcase for this? Or is it even possible to test this
>automatically? At least you have tested this manually, right?

It seems to be complex to do such testcase automatically (perhaps there are some functions in mochitests to facilitate to write test case for it ? Or am I wrong ?). So yes, I've done manual tests. However, I saw that this method use the "html copy encoder", so I added some test on it in test_htmlcopyencoder.html and test_htmlcopyencoder.xhtml.


>+      mPreLevel--;
>Use prefix operator (it is faster, though compiler probably knows how to optimize it)

Done, and also in all other lines where ++ and -- is used.

> mPreLevel decrement happens now somewhere else, right?

Yes, in nsXHTMLContentSerializer::MaybeLeaveFromPreContent()

>+PRBool
>+nsXHTMLContentSerializer::IsJavaScript(nsIAtom* aAttrNameAtom, const nsAString& aValueString)
>This should handle also XUL and SVG inside XHTML, right? Perhaps the method
>could have an nsIContent* aContent parameter, and based on its type
>it could check whether the attribute is an event attribute name.

Yes you're right. So I moved the IsJavascript method to the nsXMLContentSerializer, then XHTML/SVG/XUL attributes into any XML content can be checked too. Its new signature is
   PRBool IsJavaScript(nsIContent * aContent,
                       nsIAtom* aAttrNameAtom,
                       PRInt32 aAttrNamespaceID,
                       const nsAString& aValueString);

And I restored the original IsJavascript method into the nsHTMLContentSerializer.

>+void 
>+nsXHTMLContentSerializer::AppendEndOfElementStart(nsIDOMElement *aOriginalElement,
>+                                                  nsIAtom * aName,
>+                                                  PRInt32 aNamespaceID,
>+                                                  nsAString& aStr)
>+{
>+  // this method is not called by nsHTMLContentSerializer
>+  // so we don't have to check HTML element, just XHTML
>Could you add an assertion that this isn't an nsHTMLContentSerializer

I added 
  NS_ASSERTION(!mIsHTMLSerializer, "nsHTMLContentSerializer shouldn't call this method !");

Also in other places where you told me...

>+nsXHTMLContentSerializer::IsShorthandAttr(const nsIAtom* aAttrName,
>+                                        const nsIAtom* aElementName)
>Is there a testcase which checks that shorthand attributes works in HTML and XHTML,
>and don't if the element is in some other namespace?

No. I added some in file_xhtmlserializer_1. In XHTML, this method is called only when we are on a XHTML element.
So it is not called on any other elements. However, it was called for attributes which have a specific namespace, then I fixed it.

>+  PRPackedBool mInPseudoCDATA;
>Perhaps better to call this something else than ***CDATA.
>Maybe mInNoEntityEncoding ?

Ok, I renamed it to mDisableEntityEncoding.

>Btw, how does this handle the case when 'script' or 'style' etc. have
>child elements? I know, that isn't probably valid, but still possible
>if the DOM is modified by a script. Perhaps using a counter would be
>better than just a simple flag.

Yes, you're right, I changed this.

>+  // To keep track of startvalue of OL and first list item for nested lists
>+  struct olState {
>I'd prefer having an explicit copy-constructor, and maybe even operator=
>(depends on what nsTArray requires)

I added an explicit copy-constructor.

>+// the string used to indent.
>+#define INDENT_STRING "   "
>Why 3 spaces?

This is an arbitrary value..

>+
>+  mDoRaw = (mFlags & nsIDocumentEncoder::OutputRaw) ? PR_TRUE
>+                                                    : PR_FALSE;
>mDoRaw = !!(mFlags & nsIDocumentEncoder::OutputRaw);

ok

>+
>+  mDoFormat = (mFlags & nsIDocumentEncoder::OutputFormatted 
>+                && !mDoRaw) ? PR_TRUE
>+                            : PR_FALSE;
>Why not just
>  mDoFormat = (mFlags & nsIDocumentEncoder::OutputFormatted && !mDoRaw);

ok

>+
>+  mDoWrap = (mFlags & nsIDocumentEncoder::OutputWrap
>+            && !mDoRaw) ? PR_TRUE
>+                        : PR_FALSE;
>Similar thing here

ok

>Use NS_NAMED_LITERAL_STRING for <![CDATA[

ok. Done for  "<!--" too.

>   if (!data.IsEmpty()) {
>-    AppendToString(NS_LITERAL_STRING(" "), aStr);
>+    AppendToString(PRUnichar(' '), aStr);
>Nit, is there any reason for this change?

Because I think it uses less memory since with a PRUnichar, we don't instantiate any object...

>-        if ( *buf == PRUnichar('\'') )
>-        {
>+        if (*buf == PRUnichar('\'')) {
>Why this change? The coding style is anyway a bit strange around here.

Ok, I canceled this change.


I fixed also all other little things indicated in your comments: about syntax, about whitespaces, about C++/Idl comments etc. I fixed also testcases which didn't pass on windows.
Comment 22 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-01-23 10:30:59 PST
Created attachment 358437 [details]
comments
Comment 23 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-01-23 10:32:37 PST
Comment on attachment 358388 [details] [diff] [review]
patch v2.2

I could mark this r+ now, but this needs good sr too. The patch is just so huge (although lots of it is moving code).
Comment 24 User image Laurent Jouanneau 2009-01-27 06:43:25 PST
Created attachment 359039 [details] [diff] [review]
patch v2.3

Fixed Olli's latest comments. For the isJavascript method, I need to know how to make the difference between a HTML content and a XML node without a namespace. Should I do a QueryInterface each time I run this method ?
Comment 25 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-01-27 07:06:58 PST
Maybe aContent->IsNodeOfType(nsINode::eHTML)
Comment 26 User image Laurent Jouanneau 2009-01-27 09:12:03 PST
Created attachment 359058 [details] [diff] [review]
patch v2.4

Thanks Olli, I removed the isJavascript method from nsHTMLContentSerializer.

Oh, and the bug for MathML is bug 475518.
Comment 27 User image Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-04-09 14:30:05 PDT
Comment on attachment 359058 [details] [diff] [review]
patch v2.4

>@@ -685,649 +265,143 @@ nsHTMLContentSerializer::AppendElementSt
>       //If OL has "start" attribute, first LI element has to start with that value
>       //Therefore subtracting 1 as all the LI elements are incrementing it before using it;
>       //In failure of ToInteger(), default StartAttrValue to 0.
>       if (NS_SUCCEEDED(rv))
>         startAttrVal--; 
>       else
>         startAttrVal = 0;
>     }
>-    olState* state = new olState(startAttrVal, PR_TRUE);
>-    if (state)
>-      mOLStateStack.AppendElement(state);
>+    olState state (startAttrVal, PR_TRUE);
>+    mOLStateStack.AppendElement(state);

No space before the '('.

Also, I think you could write this as
mOLStateStack.AppendElement(olState(startAttrVal, PR_TRUE));

>+nsXHTMLContentSerializer::SerializeAttributes(nsIContent* aContent,
...
>+  NS_NAMED_LITERAL_STRING(_mozStr, "_moz");
...
>+    // Filter out any attribute starting with [-|_]moz
>+    const char* sharedName;
>+    attrName->GetUTF8String(&sharedName);
>+    if ((('_' == *sharedName) || ('-' == *sharedName)) &&
>+        !nsCRT::strncmp(sharedName+1, kMozStr, PRUint32(sizeof(kMozStr)-1))) {
>+      continue;
>+    }

is the '+1' there really correct? Seems like this will filter attributes starting with "__moz" or "-_moz"

Other than that I couldn't find anything. But this patch is a bit enormous and so hard to review. sr=me either way
Comment 28 User image Laurent Jouanneau 2009-04-12 06:00:21 PDT
Created attachment 372288 [details] [diff] [review]
patch v2.5

Unfortunately, the applying of the patch 2.4 fails on the current trunk because of some recents changes on nsHTMLContentSerializer.* and nsXMLContentSerializer.*. This changes are on the nsVoidArray properties (http://hg.mozilla.org/mozilla-central/rev/52c7758d5a4d and http://hg.mozilla.org/mozilla-central/rev/659355060532 ). On nsHTMLContentSerializer, my patch 2.4 contains already this changes (asked by Olli, see comments below), but it doesn't includes changes on nsXMLContentSerializer.

I updated my patch.

>Also, I think you could write this as
>mOLStateStack.AppendElement(olState(startAttrVal, PR_TRUE));

Yes, it is already made in the trunk

>is the '+1' there really correct? Seems like this will filter attributes
starting with "__moz" or "-_moz"

no, because kMozStr equals to "moz", not "_moz". I added some tests about this in this new patches.

Thanks for your superreview Jonas :-)
Comment 29 User image Laurent Jouanneau 2009-04-12 06:12:21 PDT
Created attachment 372289 [details] [diff] [review]
the diff between the latest two patches

here are the differences between patch v2.4 and v2.5.

I forgot to say that I also updated some few unit tests that I forgot to modify for patch 2.4 (difference of number of spaces for indentation).

For patch 2.5, I launched mochitests and xpcshell tests on linux and mac, and all is green, except on some tests in toolkit/content/tests/widgets/test_textbox_number.xul and browser/base/content/test/test_contextmenu.html. However I don't think it is related to the serializer.
Comment 30 User image Laurent Jouanneau 2009-04-12 06:14:56 PDT
Should I ask a new review/superreview or can we land the patch in the trunk ?
Comment 31 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-04-12 12:26:32 PDT
The patch is so huge that perhaps worth to use tryserver before landing to trunk.
Comment 32 User image Laurent Jouanneau 2009-04-15 13:34:44 PDT
Ok Olli. I haven't rights to use tryserver. Could somebody test my patch on a tryserver please ?
Comment 33 User image Laurent Jouanneau 2009-04-16 09:35:29 PDT
Olli has landed the patch into tryservers. The results are :

For build only, it's ok
- linux: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239890526.19483.gz
- mac: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886954.1239892679.28620.gz
- windows: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239894765.32652.gz

For build+test
- mac: ok: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886954.1239895206.842.gz
- leaks on linux : http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239894676.32544.gz
However, these leaks appear on many components which are not related to the serializer I mean (ex: mozstorage).

- a test failed on windows http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239898098.6218.gz

Apparently, it is because of a difference on end-of-line characters.. I will fix this test :-/
Comment 34 User image Laurent Jouanneau 2009-04-20 09:49:22 PDT
Created attachment 373667 [details] [diff] [review]
patch v2.6

I just fixed the test which fails. It was just a missing flag OutputLFLineBreak when calling the encoder.
Comment 35 User image Laurent Jouanneau 2009-04-20 09:50:19 PDT
Olli, could you test this patch on try servers please ?
Comment 36 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-04-20 13:15:18 PDT
Uploaded to tryserver. Look for 'laurentj_serializers_26'
Comment 37 User image Laurent Jouanneau 2009-04-21 01:52:50 PDT
Thank you Olli.

All box are greens except one. Here are the reports for all green boxes:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240276856.6161.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270883.31148.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270911.31177.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270971.31240.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258529.1240264875.21996.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240260238.1240269576.29357.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258529.1240260818.15811.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240259546.1240265209.22553.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240260811.1240267460.26305.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240261231.1240272761.927.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240278058.7574.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277494.6906.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277619.7035.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277623.7038.gz

The fail :
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240269405.29116.gz&fulltext=1

The browser crashed during talos tests. I don't know why. Log file is not very detailed about the fail.

Is it significant or can we commit the patch into mozilla-central ?
Comment 38 User image Laurent Jouanneau 2009-04-25 01:35:57 PDT
Yesterday Olli has re-pushed the patch on try servers, and all were greens.
Comment 39 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-04-25 02:15:04 PDT
I'll push this once my build is ready.
Comment 40 User image Olli Pettay [:smaug] (pto-ish for couple of days) 2009-04-25 04:19:14 PDT
http://hg.mozilla.org/mozilla-central/rev/f6d27eb05ec1
Comment 41 User image Laurent Jouanneau 2009-04-27 05:52:02 PDT
Thank you Olli for all, and thanks Jonas for your superreview :-)
Comment 42 User image Boris Zbarsky [:bz] (still a bit busy) 2009-08-04 08:02:52 PDT
This seems to have added a warning which looked like a serious bug to me at first glance (virtual methods not being overridden correctly):

mozilla/src/content/base/src/nsXHTMLContentSerializer.h:114:
warning: ‘virtual void nsXHTMLContentSerializer::SerializeAttributes(nsIContent*, nsIDOMElement*, nsAString_internal&, const nsAString_internal&, nsIAtom*, nsAString_internal&, PRUint32, PRBool)’ was hidden
by ‘virtual void nsHTMLContentSerializer::SerializeAttributes(nsIContent*, nsIDOMElement*, nsAString_internal&, const nsAString_internal&, nsIAtom*, nsAString_internal&)’

After spending an hour or so reading through this code, I _think_ that it's correct.  But is there a reason the HTML one has a different signature?  Or isn't called SerializeHTMLAttributes or something?
Comment 43 User image Laurent Jouanneau 2009-08-19 03:41:53 PDT
Actually, I think there is no reason to define this nsHTMLContentSerializer::SerializeAttributes method with the same name of the  nsXHTMLContentSerializer one. Yes, we could call it SerializeHTMLAttributes.

Do you want I create a bug and provide a patch ?
Comment 44 User image Boris Zbarsky [:bz] (still a bit busy) 2009-08-24 09:12:32 PDT
That would be wonderful!
Comment 45 User image Laurent Jouanneau 2009-08-31 02:00:14 PDT
done: https://bugzilla.mozilla.org/show_bug.cgi?id=513647
Comment 46 User image Boris Zbarsky [:bz] (still a bit busy) 2010-03-03 10:08:10 PST
This patch broke the wrapping behavior when serializing HTML.  See bug 549295.
Comment 47 User image Henri Sivonen (:hsivonen) 2011-04-21 02:22:44 PDT
What's the use case for having the W3CEntities (MathML) configuration?

Note You need to log in before you can comment on or make changes to this bug.