Closed Bug 422403 Opened 17 years ago Closed 15 years ago

improvements to the XML and XHTML serializers

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: laurent, Assigned: laurent)

References

(Blocks 4 open bugs)

Details

Attachments

(4 files, 14 obsolete files)

Work in progress #1 16 years ago Laurent Jouanneau 74.52 KB, patch		Details \| Diff \| Splinter Review
Work in progress #2 16 years ago Laurent Jouanneau 100.80 KB, patch		Details \| Diff \| Splinter Review
Work in progress #3 16 years ago Laurent Jouanneau 137.30 KB, patch		Details \| Diff \| Splinter Review
the patch, version 1.0 16 years ago Laurent Jouanneau 272.30 KB, patch	smaug : review-	Details \| Diff \| Splinter Review
comments 16 years ago Olli Pettay [:smaug][bugs@pettay.fi] 9.10 KB, text/plain		Details
version 1.1 16 years ago Laurent Jouanneau 278.24 KB, patch		Details \| Diff \| Splinter Review
patch 1.2 : updated after changes on the patch of bug 424359 16 years ago Laurent Jouanneau 277.96 KB, patch	smaug : review-	Details \| Diff \| Splinter Review
patch v2 16 years ago Laurent Jouanneau 331.35 KB, patch		Details \| Diff \| Splinter Review
diff between v1.2 and v2 16 years ago Laurent Jouanneau 79.39 KB, patch		Details \| Diff \| Splinter Review
patch v2.1 16 years ago Laurent Jouanneau 332.58 KB, patch		Details \| Diff \| Splinter Review
comments 16 years ago Olli Pettay [:smaug][bugs@pettay.fi] 15.44 KB, text/plain		Details
patch v2.2 16 years ago Laurent Jouanneau 344.29 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
comments 16 years ago Olli Pettay [:smaug][bugs@pettay.fi] 2.59 KB, text/plain		Details
patch v2.3 16 years ago Laurent Jouanneau 344.39 KB, patch		Details \| Diff \| Splinter Review
patch v2.4 16 years ago Laurent Jouanneau 343.95 KB, patch	sicking : superreview+	Details \| Diff \| Splinter Review
patch v2.5 15 years ago Laurent Jouanneau 341.70 KB, patch		Details \| Diff \| Splinter Review
the diff between the latest two patches 15 years ago Laurent Jouanneau 35.47 KB, patch		Details \| Diff \| Splinter Review
patch v2.6 15 years ago Laurent Jouanneau 341.72 KB, patch		Details \| Diff \| Splinter Review

Laurent Jouanneau

Assignee

Description

•

17 years ago

In the current version of Gecko, the XML serializer is used to serialize XHTML content. However, it doesn't serialize like the HTML serializer does for HTML documents (pretty-printing, specific serialization for some tags etc.) so it isn't really usefull for Composer.

I'm going to fix this bug. I think I will create a new nsXHTMLContentSerializer class. It will support features of nsHTMLContentSerializer, and of course will take care of all XML stuff like namespaces etc.

Laurent Jouanneau

Assignee

Updated

•

17 years ago

Depends on: 424359

Laurent Jouanneau

Assignee

Comment 1

•

17 years ago

After working few hours on the serializers, I decided to make improvements in the XML serializer at the same time, to do pretty printing, wrapping etc. In fact, I need it for my wysiwyg XML editor (ETNA).

Status: NEW → ASSIGNED

Summary: new XHTML serializer → improvements of the XML and XHTML serializers

Laurent Jouanneau

Assignee

Comment 2

•

16 years ago

Attached patch Work in progress #1 (obsolete) — Details — Splinter Review

First patch. It is not finished. Worked only on the XML serializer so I didn't updated mochitests for the HTML serializer. Indentation doesn't work very well and then some xpcshell tests fail for the XML serializer. You need to apply the patch for bug 424359 before applying this patch.

Laurent Jouanneau

Assignee

Comment 3

•

16 years ago

Attached patch Work in progress #2 (obsolete) — Details — Splinter Review

The XML serializer is almost finished. Indentation is better than in the old HTML serializer (text content is well indented). Some tests should be still added to check attributes and namespaces serialization. I worked on the HTML serializer but it's not finished.

Attachment #330248 - Attachment is obsolete: true

Laurent Jouanneau

Assignee

Comment 4

•

16 years ago

Attached patch Work in progress #3 (obsolete) — Details — Splinter Review

HTML and XML serializers are fines now. All tests pass :-)

I'm working now on a new nsXHMLContentSerializer class, which will inherit from nsXMLContentSerializer and nsHTMLContentSerializer will inherit from nsXHTMLContentSerializer.

Attachment #334297 - Attachment is obsolete: true

Laurent Jouanneau

Assignee

Updated

•

16 years ago

Component: XML → Serializers

QA Contact: xml → dom-to-text

Laurent Jouanneau

Assignee

Updated

•

16 years ago

Blocks: 333060, 281407, 278518, 236290

Laurent Jouanneau

Assignee

Comment 5

•

16 years ago

Attached patch the patch, version 1.0 (obsolete) — Details — Splinter Review

I think my patch is ready for a first review.

Remember that you need to apply the patch for bug 424359 before applying this patch.

So, what's this patch doing ?

- it adds a new content serializer for XHTML : there is a new class nsXHTMLContentSerializer which inherits from nsXMLContentSerializer. This serializer is called only for pages which have a application/xhtml+xml content type. This serializer respects the XML serialization specification, but also specificities of HTML serialization (pre elements, script elements etc..).
- most of features of nsHTMLContentSerializer are moved into the XML serializer, or into the XHTML serializer.
- nsHTMLContentSerializer now inherits from nsXHTMLContentSerializer.
- the XML serializer supports now this flags : OutputFormatted, OutputWrap (+ OutputLFLineBreak and OutputCRLineBreak added by the patch of bug 424359)
- indentation and wrapping are better (there were some problems in the HTML serializer).
- unit tests are added (and I will add more tests later)
- documentation of flags on nsIDocumentEncoder is updated

Attachment #335045 - Attachment is obsolete: true

Attachment #338851 - Flags: review?(Olli.Pettay)

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

16 years ago

Summary: improvements of the XML and XHTML serializers → improvements to the XML and XHTML serializers

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

16 years ago

Attached file comments — Details

See comments in the attachment

This is so huge patch that few iterations are needed.
Sorry that reviewing has taken so much time.

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

16 years ago

Attachment #338851 - Flags: review?(Olli.Pettay) → review-

Laurent Jouanneau

Assignee

Comment 7

•

16 years ago

> This is so huge patch that few iterations are needed.
> Sorry that reviewing has taken so much time.

Don't worry, no problem, I understand :-)

Here are responses to some of your questions in your comments. 

> But why does OutputRaw wrap? How is XHTML handled.

There isn't any wrap when you set the OutputRaw flag. But if you don't set any flags (no OutputFormatted, no OutputRaw, no OutputWrap), it acts like OutputRaw but it wraps long lines. And I don't know why, so I just kept the current behavior. Perhaps it is related to the editor, or mail...

> do we really want HTML entities in XHTML?

Yes, the specification of XHTML says : "The XHTML entity sets are the same as for HTML 4". http://www.w3.org/TR/xhtml1/dtds.html#h-A2

>Actually, why are these no-op? What if CDATA or PI is added to an HTML document using scripts? > How does this work currently?

Sorry, I thought that SGML didn't support CDATA or PI, and this is not the case, according to the specification, even if the HTML specification says that the user agent should ignore them http://www.w3.org/TR/html4/appendix/notes.html#sgmlfeatures.
The current HTML serializer supports CDATA and PI. I will remove this empty methods.

>And when is mozdirty used? We really shouldn't have anything like that.
> It might be better to use some property in a content object. (Set/GetProperty)

We already talked about it on irc :-). I created a new bug 459212

>+nsXHTMLContentSerializer::AppendEndOfElementStart(nsIDOMElement *aOriginalElement,
>+...
>But this method is in XHTML serializer, not HTML.

Yes, of course, because the end of an HTML element is always ">", so we don't need this method in the HTML serializer. The behavior of this methods is for old HTML browser which doesn't support XHTML.

>Make sure this all works with nested bodies and when there are several sibling bodies. (Maybe you're testing that, haven't yet read the tests)

Yes there are tests, which are provided by the patch of bug 424359.

>+    // for backward compatibility with old HTML user agents,
>+    // empty elements should have an ending tag, so we mustn't call
>+    // nsXMLContentSerializer::CheckElementEnd
>But this is XHTML serializer. Why do we need to care about HTML UAs?

Why not ? It is needed by the editor, to generate pages that can be viewed in old HTML UAs. If a web developer creates a page with Mozilla Composer, BlueGriffon or else, I think he would like to see his page in his old browser (well, there are some chance that the rendering is not the expected one, but if we can minimize unexpected behaviors, at least by doing a good serialization for this old browsers...).

>+void
>+nsXHTMLContentSerializer::IncrIndentation(nsIAtom* aName)
> ...
>Why most /***/? Should you check namespace?

In fact, I hesitate: we can increment indentation on only few elements, but is it really necessary ? I don't know.

I will attach a new patch with all corrections you ask in your other comments. It is almost ready, I just have to add new tests.

Laurent Jouanneau

Assignee

Comment 8

•

16 years ago

Attached patch version 1.1 (obsolete) — Details — Splinter Review

It fixes bad things pointed by Olli: bad spelling in a comment, style nits, indentation, assignement of the forceFormat attribute, namespace checking in various methods, new test with a <pre> from an other namespace.

Thank you Olli for your review

Attachment #338851 - Attachment is obsolete: true

Attachment #342862 - Flags: review?(Olli.Pettay)

Laurent Jouanneau

Assignee

Comment 9

•

16 years ago

Comment on attachment 342862 [details] [diff] [review]
version 1.1

I will update the patch tomorrow, since some changes have been made on the patch of bug 424359.

Attachment #342862 - Flags: review?(Olli.Pettay)

Laurent Jouanneau

Assignee

Comment 10

•

16 years ago

Attached patch patch 1.2 : updated after changes on the patch of bug 424359 (obsolete) — Details — Splinter Review

Attachment #342862 - Attachment is obsolete: true

Attachment #343031 - Flags: review?(Olli.Pettay)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

16 years ago

Comment on attachment 343031 [details] [diff] [review]
patch 1.2 : updated after changes on the patch of bug 424359

Please use -P when generating patches.

Whenever casting is needed, use C++ style (static_cast<> etc.), not C's '()'.
>-    if (mIsCopying && mIsFirstChildOfOL && (aTagName == nsGkAtoms::li) && 
>-        (attrName == nsGkAtoms::value)){
>+    if (mIsCopying && mIsFirstChildOfOL && (aTagName == nsGkAtoms::li)
>+        && (attrName == nsGkAtoms::value)) {
No need for this. It just makes the patch even bigger.

>@@ -588,19 +186,6 @@
>     }
> 
>     attrName->ToString(nameStr);
>-    
>-    /*If we already crossed the MaxColumn limit or 
>-    * if this attr name-value pair(including a space,=,opening and closing quotes) is greater than MaxColumn limit
>-    * then start the attribute from a new line.
>-    */
>-
>-    if (mDoFormat
>-        && (mColPos >= mMaxColumn
>-            || ((PRInt32)(mColPos + nameStr.Length() +
>-                          valueStr.Length() + 4) > mMaxColumn))) {
>-        aStr.Append(mLineBreak);
>-        mColPos = 0;
>-    }
Without -P it is hard to say which method this code belongs to.

>+NS_IMETHODIMP
>+nsXHTMLContentSerializer::Init(PRUint32 aFlags, PRUint32 aWrapColumn,
>+                              const char* aCharSet, PRBool aIsCopying,
>+                              PRBool aIsWholeDocument)
>+{
>+  // to keep compatibility with existing calling code
>+  // XXXLJ perhaps we should remove these two tests ?
>+  if (aFlags & nsIDocumentEncoder::OutputFormatted ) {
>+      aFlags = aFlags | nsIDocumentEncoder::OutputWrap;
>+
>+  } else if (!(aFlags & nsIDocumentEncoder::OutputRaw)) {
>+      aFlags = aFlags | nsIDocumentEncoder::OutputWrap;
>+  }
Could you explain why this all? What compatibility?

>+nsXHTMLContentSerializer::HasLongLines(const nsString& text, PRInt32& aLastNewlineOffset)
>+{
>+  PRUint32 start=0;
>+  PRUint32 theLen=text.Length();
Add space before and after '='

>+  PRBool rv = PR_FALSE;
>+  aLastNewlineOffset = kNotFound;
>+  for (start = 0; start < theLen; ) {
>+    PRInt32 eol = text.FindChar('\n', start);
>+    if (eol < 0) {
>+      eol = text.Length();
>+    }
>+    else {
>+      aLastNewlineOffset = eol;
>+    }
>+    if (PRInt32(eol - start) > kLongLineLen)
>+      rv = PR_TRUE;
>+    start = eol+1;
Nit, Add space before and after '+'

>+  if (aNamespaceID != kNameSpaceID_XHTML && !mIsHTMLSerializer) {
>+    return PR_FALSE;
>+  }
I guess this should be
if ((mIsHTMLSerializer && aNamespaceID != kNameSpaceID_None) ||
    (!mIsHTMLSerializer && aNamespaceID != kNameSpaceID_XHTML))
Similar thing also elsewhere.

>+nsXHTMLContentSerializer::IncrIndentation(nsIAtom* aName)
>+{
>+  /*if ((aName == nsGkAtoms::head) ||
>+      (aName == nsGkAtoms::table) ||
>+      (aName == nsGkAtoms::tr) ||
>+      (aName == nsGkAtoms::ul) ||
>+      (aName == nsGkAtoms::ol) ||
>+      (aName == nsGkAtoms::dl) ||
>+      (aName == nsGkAtoms::tbody) ||
>+      (aName == nsGkAtoms::form) ||
>+      (aName == nsGkAtoms::frameset) ||
>+      (aName == nsGkAtoms::blockquote) ||
>+      (aName == nsGkAtoms::li) ||
>+      (aName == nsGkAtoms::dt) ||
>+      (aName == nsGkAtoms::dd)) {*/
>+      nsXMLContentSerializer::IncrIndentation(aName);
>+  //}
>+}
>+
>+void
>+nsXHTMLContentSerializer::DecrIndentation(nsIAtom* aName)
>+{
>+  /*if ((aName == nsGkAtoms::head) ||
>+      (aName == nsGkAtoms::table) ||
>+      (aName == nsGkAtoms::tr) ||
>+      (aName == nsGkAtoms::ul) ||
>+      (aName == nsGkAtoms::ol) ||
>+      (aName == nsGkAtoms::dl) ||
>+      (aName == nsGkAtoms::li) ||
>+      (aName == nsGkAtoms::tbody) ||
>+      (aName == nsGkAtoms::form) ||
>+      (aName == nsGkAtoms::blockquote) ||
>+      (aName == nsGkAtoms::dt) ||
>+      (aName == nsGkAtoms::dd) ||
>+      (aName == nsGkAtoms::frameset)) {*/
>+      nsXMLContentSerializer::DecrIndentation(aName);
>+  //}
>+}
Why you have commented out most of the code? Either the code should be there
and do something reasonable, or it shouldn't be there at all.


>+  // To keep track of startvalue of OL and first list item for nested lists
>+  struct olState {
>+    olState(PRInt32 aStart, PRBool aIsFirst):startVal(aStart),isFirstListItem(aIsFirst)
>+    {
>+    }
>+    PRInt32 startVal;
>+    PRBool isFirstListItem;
>+  };
>+
>+  nsAutoVoidArray   mOLStateStack;// Stack to store one olState struct per <OL>.
I know you're just moving this code, but could you make this something like
nsTArray<olState> mOLStateStack;
That way there isn't need to manually |new| and |delete|
You may want to add a copy constructor to struct olState.


>@@ -468,13 +565,13 @@
>-    
>+
No need for whitespace changes. Those just make the patch even bigger.
Same thing also elsewhere.

>+void
>+nsXMLContentSerializer::IncrIndentation(nsIAtom* aName)
>+{
>+  // we want to keep the source readable
>+  if(mDoWrap && mIndent.Length() >= mMaxColumn - 15) {
>+    mIndentOverflow ++;
>+  }
What is this magical 15?

>+  else {
>+    mIndent.AppendLiteral("   ");
Why 3 spaces? Maybe better to #define it somewhere.

>+void
>+nsXMLContentSerializer::AppendToStringConvertLF(const nsAString& aStr,
>+                                                nsAString& aOutputStr)
>+{
>+  if (mDoRaw) {
>+    nsAutoString str (aStr);
>+    PRInt32 lastNewlineOffset = str.RFindChar('\n');
>+    AppendToString(aStr, aOutputStr);
>+
>+    if (lastNewlineOffset != kNotFound) {
>+      PRInt32 lastNewlineOffset2 = str.RFindChar('\r');
>+      if (lastNewlineOffset2 != kNotFound) {
>+        lastNewlineOffset = lastNewlineOffset2;
>+      }
>+
>+      mColPos = aStr.Length() - lastNewlineOffset;
Could you add some comment here what these offsets are meant for.
...
>+        // if there was a line break before this substring
>+        // AppendNewLineToString was call, so we should reverse
>+        // this flag
>+        mMayIgnoreLineBreakSequence = PR_FALSE;
'was call' -> 'was called'?

>+nsXMLContentSerializer::AppendWrapped_NonWhitespaceSequence(
...
>+    if (aPos == aEnd || foundWhitespaceInLoop) {
>+      // there is enough room for the complete block we found
>+      if (mDoFormat && !mColPos) {
>+        AppendIndentation(aOutputStr);
>+      }
>+
>+      if (mAddSpace) {
>+        aOutputStr.Append(PRUnichar(' '));
>+        mAddSpace = PR_FALSE;
>+      }
So we want both AppendIndentation and Append(PRUnichar(' ')) ?

>+        if (foundWrapPosition) {
>+          if (!mColPos && mDoFormat) {
>+            AppendIndentation(aOutputStr);
>+          }
>+          if (mAddSpace) {
>+            aOutputStr.Append(PRUnichar(' '));
>+            mAddSpace = PR_FALSE;
>+          }
Same here. Is there a test for the case when !mColPos && mDoFormat && mAddSpace?
(if that is even possible case.)

>+          // try some simple fallback logic
>+          // go forward up to the next whitespace position,
>+          // in the worst case this will be all the rest of the data
>+          mColPos += length;
You added this. Could you explain why it is needed.

>+  // if the current line already has text on it, such as a tag,
>+  // leading whitespace is significant
>+  PRBool mayIgnoreStartOfLineWhitespaceSequence = (!mColPos 
>+                                            || (mIsIndentationAddedOnCurrentLine 
>+                                                && sequenceStartAfterAWhitespace
>+                                                && mColPos == mIndent.Length()));
Perhaps newline after '=' and start the expression from the next line.

>+void
>+nsXMLContentSerializer::AppendWrapped_WhitespaceSequence(
>+                        nsASingleFragmentString::const_char_iterator &aPos,
>+                        const nsASingleFragmentString::const_char_iterator aEnd,
>+                        const nsASingleFragmentString::const_char_iterator aSequenceStart,
>+                        nsAString &aOutputStr)
>+{
Note to myself: so this isn't replacing nsHTMLContentSerializer::AppendWrapped_WhitespaceSequence,
but there is nsXMLContentSerializer::AppendFormatedWrapped_WhitespaceSequence
for that, sort of. Documented in nsXMLContentSerializer.h ...

>+  // Handle the complete sequence of whitespace.
>+  // Continue to iterate until we find the first non-whitespace char.
>+  // Updates "aPos" to point to the first unhandled char.
>+  mAddSpace = PR_FALSE;
>+  mIsIndentationAddedOnCurrentLine = PR_FALSE;
>+
>+  PRBool leaveLoop = PR_FALSE;
>+  nsASingleFragmentString::const_char_iterator lastPos = aPos;
>+
>+  do {
>+    switch (*aPos) {
>+      case ' ':
>+      case '\t':
>+        // if there are two many spaces on a line, we wrap
'two many' -> 'too many'

>\ No newline at end of file
Fix these warnings

Attachment #343031 - Flags: review?(Olli.Pettay) → review-

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 12

•

16 years ago

New patch coming?

Laurent Jouanneau

Assignee

Comment 13

•

16 years ago

Yes, it's coming :-) I already updated my local patch with your comments, but I added also some unit tests to test serialization of OL element with the htmlcopyencoder, and also on the serialization of a selection, of a single node etc.  It appears that some of these tests fail. My patch brokes some few things :-). I currently try to fix these bugs.

There are also few problems when we use the XHTML serializer on a HTML document (and I didn't already test the HTML serializer on an XHTML document). I will try to fix that too. I think I have also to modify the htmlcopyencoder to use the XHTML serializer when the document is in XHTML.

I think a new version of the patch will be ready at the end of the week.

Laurent Jouanneau

Assignee

Updated

•

16 years ago

Blocks: 56921

Laurent Jouanneau

Assignee

Comment 14

•

16 years ago

Attached patch patch v2 (obsolete) — Details — Splinter Review

Here is a new version of the patch.

It includes all wanted improvements:
- used static_cast instead of ()
- canceled changes on whitespaces and indentation
- added spaces where needed
- changed test on namespaces as indicated
- removed nsXHTMLContentSerializer::IncrIndentation and nsXHTMLContentSerializer::DecrIndentation
- changed the type of mOLStateStack to nsTArray<olState>
- put the "magical 15" and the default indentation into #define
- removed unuseful code in nsXMLContentSerializer::AppendToStringConvertLF
- fixed tests in nsXMLContentSerializer::AppendWrapped_NonWhitespaceSequence and added details in comments

And I added also:
- some unit tests on the serialization of selections, range, node etc..
- some unit tests on the serialization of OL lists
- unit tests on htmlcopyencoder
- a test for bug 270145 to verify I do not break something
- support of "autoplay" and "controls" attributes on video and audio elements

And I changed flags in nsContentAreaDragDrop to keep the original serialization of dragged content.

Attachment #343031 - Attachment is obsolete: true

Attachment #348652 - Flags: review?(Olli.Pettay)

Laurent Jouanneau

Assignee

Comment 15

•

16 years ago

Attached patch diff between v1.2 and v2 (obsolete) — Details — Splinter Review

Just a diff between patch v1.2 and patch v2...

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

16 years ago

The patch breaks /tests/embedding/test/test_bug293834.html
and I think also /tests/toolkit/components/url-classifier/tests/mochitest/test_classifier.html

Laurent Jouanneau

Assignee

Comment 17

•

16 years ago

Attached patch patch v2.1 (obsolete) — Details — Splinter Review

This is an update of the patch for the current trunk (patch on a Makefile.in was broken). No modification in the part of the patch on the serializer itself.

This new patch includes a modification on test_bug293834.html to pass the test. No modification with test_classifier.html since it passes on my machine.

Attachment #348652 - Attachment is obsolete: true

Attachment #357873 - Flags: review?(Olli.Pettay)

Attachment #348652 - Flags: review?(Olli.Pettay)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 18

•

16 years ago

Do you know why the patch breaks test_bug293834.html? And is the new behavior
more correct than the old one?

Laurent Jouanneau

Assignee

Comment 19

•

16 years ago

In fact, in this test, there is a serialisation, and the serialized content is then "unserialized" in a new iframe. Then the test check if a simple string "form state" is in textarea or input contents. However, the new serializer has a better wrapping, and then there is now a linebreak in some input contents : "form\nstate". This is why the test failed. So I changed the regular expression in the test : from /form state/  to /form\s+state/.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 20

•

16 years ago

Attached file comments — Details

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

16 years ago

Attachment #357942 - Attachment is patch: false

Laurent Jouanneau

Assignee

Comment 21

•

16 years ago

Attached patch patch v2.2 (obsolete) — Details — Splinter Review

>@@ -1415,17 +1415,18 @@ nsTransferableFactory::SerializeNodeOrSe
>Is there some testcase for this? Or is it even possible to test this
>automatically? At least you have tested this manually, right?

It seems to be complex to do such testcase automatically (perhaps there are some functions in mochitests to facilitate to write test case for it ? Or am I wrong ?). So yes, I've done manual tests. However, I saw that this method use the "html copy encoder", so I added some test on it in test_htmlcopyencoder.html and test_htmlcopyencoder.xhtml.


>+      mPreLevel--;
>Use prefix operator (it is faster, though compiler probably knows how to optimize it)

Done, and also in all other lines where ++ and -- is used.

> mPreLevel decrement happens now somewhere else, right?

Yes, in nsXHTMLContentSerializer::MaybeLeaveFromPreContent()

>+PRBool
>+nsXHTMLContentSerializer::IsJavaScript(nsIAtom* aAttrNameAtom, const nsAString& aValueString)
>This should handle also XUL and SVG inside XHTML, right? Perhaps the method
>could have an nsIContent* aContent parameter, and based on its type
>it could check whether the attribute is an event attribute name.

Yes you're right. So I moved the IsJavascript method to the nsXMLContentSerializer, then XHTML/SVG/XUL attributes into any XML content can be checked too. Its new signature is
   PRBool IsJavaScript(nsIContent * aContent,
                       nsIAtom* aAttrNameAtom,
                       PRInt32 aAttrNamespaceID,
                       const nsAString& aValueString);

And I restored the original IsJavascript method into the nsHTMLContentSerializer.

>+void 
>+nsXHTMLContentSerializer::AppendEndOfElementStart(nsIDOMElement *aOriginalElement,
>+                                                  nsIAtom * aName,
>+                                                  PRInt32 aNamespaceID,
>+                                                  nsAString& aStr)
>+{
>+  // this method is not called by nsHTMLContentSerializer
>+  // so we don't have to check HTML element, just XHTML
>Could you add an assertion that this isn't an nsHTMLContentSerializer

I added 
  NS_ASSERTION(!mIsHTMLSerializer, "nsHTMLContentSerializer shouldn't call this method !");

Also in other places where you told me...

>+nsXHTMLContentSerializer::IsShorthandAttr(const nsIAtom* aAttrName,
>+                                        const nsIAtom* aElementName)
>Is there a testcase which checks that shorthand attributes works in HTML and XHTML,
>and don't if the element is in some other namespace?

No. I added some in file_xhtmlserializer_1. In XHTML, this method is called only when we are on a XHTML element.
So it is not called on any other elements. However, it was called for attributes which have a specific namespace, then I fixed it.

>+  PRPackedBool mInPseudoCDATA;
>Perhaps better to call this something else than ***CDATA.
>Maybe mInNoEntityEncoding ?

Ok, I renamed it to mDisableEntityEncoding.

>Btw, how does this handle the case when 'script' or 'style' etc. have
>child elements? I know, that isn't probably valid, but still possible
>if the DOM is modified by a script. Perhaps using a counter would be
>better than just a simple flag.

Yes, you're right, I changed this.

>+  // To keep track of startvalue of OL and first list item for nested lists
>+  struct olState {
>I'd prefer having an explicit copy-constructor, and maybe even operator=
>(depends on what nsTArray requires)

I added an explicit copy-constructor.

>+// the string used to indent.
>+#define INDENT_STRING "   "
>Why 3 spaces?

This is an arbitrary value..

>+
>+  mDoRaw = (mFlags & nsIDocumentEncoder::OutputRaw) ? PR_TRUE
>+                                                    : PR_FALSE;
>mDoRaw = !!(mFlags & nsIDocumentEncoder::OutputRaw);

ok

>+
>+  mDoFormat = (mFlags & nsIDocumentEncoder::OutputFormatted 
>+                && !mDoRaw) ? PR_TRUE
>+                            : PR_FALSE;
>Why not just
>  mDoFormat = (mFlags & nsIDocumentEncoder::OutputFormatted && !mDoRaw);

ok

>+
>+  mDoWrap = (mFlags & nsIDocumentEncoder::OutputWrap
>+            && !mDoRaw) ? PR_TRUE
>+                        : PR_FALSE;
>Similar thing here

ok

>Use NS_NAMED_LITERAL_STRING for <![CDATA[

ok. Done for  "<!--" too.

>   if (!data.IsEmpty()) {
>-    AppendToString(NS_LITERAL_STRING(" "), aStr);
>+    AppendToString(PRUnichar(' '), aStr);
>Nit, is there any reason for this change?

Because I think it uses less memory since with a PRUnichar, we don't instantiate any object...

>-        if ( *buf == PRUnichar('\'') )
>-        {
>+        if (*buf == PRUnichar('\'')) {
>Why this change? The coding style is anyway a bit strange around here.

Ok, I canceled this change.


I fixed also all other little things indicated in your comments: about syntax, about whitespaces, about C++/Idl comments etc. I fixed also testcases which didn't pass on windows.

Attachment #348653 - Attachment is obsolete: true

Attachment #357873 - Attachment is obsolete: true

Attachment #358388 - Flags: review?(Olli.Pettay)

Attachment #357873 - Flags: review?(Olli.Pettay)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 22

•

16 years ago

Attached file comments — Details

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 23

•

16 years ago

Comment on attachment 358388 [details] [diff] [review]
patch v2.2

I could mark this r+ now, but this needs good sr too. The patch is just so huge (although lots of it is moving code).

Attachment #358388 - Flags: review?(Olli.Pettay) → review+

Laurent Jouanneau

Assignee

Comment 24

•

16 years ago

Attached patch patch v2.3 (obsolete) — Details — Splinter Review

Fixed Olli's latest comments. For the isJavascript method, I need to know how to make the difference between a HTML content and a XML node without a namespace. Should I do a QueryInterface each time I run this method ?

Attachment #358388 - Attachment is obsolete: true

Attachment #359039 - Flags: superreview?(peterv)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 25

•

16 years ago

Maybe aContent->IsNodeOfType(nsINode::eHTML)

Laurent Jouanneau

Assignee

Comment 26

•

16 years ago

Attached patch patch v2.4 (obsolete) — Details — Splinter Review

Thanks Olli, I removed the isJavascript method from nsHTMLContentSerializer.

Oh, and the bug for MathML is bug 475518.

Attachment #359039 - Attachment is obsolete: true

Attachment #359058 - Flags: superreview?(peterv)

Attachment #359039 - Flags: superreview?(peterv)

Laurent Jouanneau

Assignee

Updated

•

16 years ago

Attachment #359058 - Flags: superreview?(peterv) → superreview?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 27

•

15 years ago

Comment on attachment 359058 [details] [diff] [review]
patch v2.4

>@@ -685,649 +265,143 @@ nsHTMLContentSerializer::AppendElementSt
>       //If OL has "start" attribute, first LI element has to start with that value
>       //Therefore subtracting 1 as all the LI elements are incrementing it before using it;
>       //In failure of ToInteger(), default StartAttrValue to 0.
>       if (NS_SUCCEEDED(rv))
>         startAttrVal--; 
>       else
>         startAttrVal = 0;
>     }
>-    olState* state = new olState(startAttrVal, PR_TRUE);
>-    if (state)
>-      mOLStateStack.AppendElement(state);
>+    olState state (startAttrVal, PR_TRUE);
>+    mOLStateStack.AppendElement(state);

No space before the '('.

Also, I think you could write this as
mOLStateStack.AppendElement(olState(startAttrVal, PR_TRUE));

>+nsXHTMLContentSerializer::SerializeAttributes(nsIContent* aContent,
...
>+  NS_NAMED_LITERAL_STRING(_mozStr, "_moz");
...
>+    // Filter out any attribute starting with [-|_]moz
>+    const char* sharedName;
>+    attrName->GetUTF8String(&sharedName);
>+    if ((('_' == *sharedName) || ('-' == *sharedName)) &&
>+        !nsCRT::strncmp(sharedName+1, kMozStr, PRUint32(sizeof(kMozStr)-1))) {
>+      continue;
>+    }

is the '+1' there really correct? Seems like this will filter attributes starting with "__moz" or "-_moz"

Other than that I couldn't find anything. But this patch is a bit enormous and so hard to review. sr=me either way

Attachment #359058 - Flags: superreview?(jonas) → superreview+

Laurent Jouanneau

Assignee

Comment 28

•

15 years ago

Attached patch patch v2.5 (obsolete) — Details — Splinter Review

Unfortunately, the applying of the patch 2.4 fails on the current trunk because of some recents changes on nsHTMLContentSerializer.* and nsXMLContentSerializer.*. This changes are on the nsVoidArray properties (http://hg.mozilla.org/mozilla-central/rev/52c7758d5a4d and http://hg.mozilla.org/mozilla-central/rev/659355060532 ). On nsHTMLContentSerializer, my patch 2.4 contains already this changes (asked by Olli, see comments below), but it doesn't includes changes on nsXMLContentSerializer.

I updated my patch.

>Also, I think you could write this as
>mOLStateStack.AppendElement(olState(startAttrVal, PR_TRUE));

Yes, it is already made in the trunk

>is the '+1' there really correct? Seems like this will filter attributes
starting with "__moz" or "-_moz"

no, because kMozStr equals to "moz", not "_moz". I added some tests about this in this new patches.

Thanks for your superreview Jonas :-)

Attachment #359058 - Attachment is obsolete: true

Laurent Jouanneau

Assignee

Comment 29

•

15 years ago

Attached patch the diff between the latest two patches (obsolete) — Details — Splinter Review

here are the differences between patch v2.4 and v2.5.

I forgot to say that I also updated some few unit tests that I forgot to modify for patch 2.4 (difference of number of spaces for indentation).

For patch 2.5, I launched mochitests and xpcshell tests on linux and mac, and all is green, except on some tests in toolkit/content/tests/widgets/test_textbox_number.xul and browser/base/content/test/test_contextmenu.html. However I don't think it is related to the serializer.

Laurent Jouanneau

Assignee

Comment 30

•

15 years ago

Should I ask a new review/superreview or can we land the patch in the trunk ?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 31

•

15 years ago

The patch is so huge that perhaps worth to use tryserver before landing to trunk.

Laurent Jouanneau

Assignee

Comment 32

•

15 years ago

Ok Olli. I haven't rights to use tryserver. Could somebody test my patch on a tryserver please ?

Laurent Jouanneau

Assignee

Comment 33

•

15 years ago

Olli has landed the patch into tryservers. The results are :

For build only, it's ok
- linux: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239890526.19483.gz
- mac: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886954.1239892679.28620.gz
- windows: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239894765.32652.gz

For build+test
- mac: ok: http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886954.1239895206.842.gz
- leaks on linux : http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239894676.32544.gz
However, these leaks appear on many components which are not related to the serializer I mean (ex: mozstorage).

- a test failed on windows http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1239886953.1239898098.6218.gz

Apparently, it is because of a difference on end-of-line characters.. I will fix this test :-/

Laurent Jouanneau

Assignee

Comment 34

•

15 years ago

Attached patch patch v2.6 — Details — Splinter Review

I just fixed the test which fails. It was just a missing flag OutputLFLineBreak when calling the encoder.

Attachment #372288 - Attachment is obsolete: true

Attachment #372289 - Attachment is obsolete: true

Laurent Jouanneau

Assignee

Comment 35

•

15 years ago

Olli, could you test this patch on try servers please ?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 36

•

15 years ago

Uploaded to tryserver. Look for 'laurentj_serializers_26'

Laurent Jouanneau

Assignee

Comment 37

•

15 years ago

Thank you Olli.

All box are greens except one. Here are the reports for all green boxes:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240276856.6161.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270883.31148.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270911.31177.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240270971.31240.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258529.1240264875.21996.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240260238.1240269576.29357.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258529.1240260818.15811.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240259546.1240265209.22553.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240260811.1240267460.26305.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240261231.1240272761.927.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240278058.7574.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277494.6906.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277619.7035.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240277623.7038.gz

The fail :
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240258500.1240269405.29116.gz&fulltext=1

The browser crashed during talos tests. I don't know why. Log file is not very detailed about the fail.

Is it significant or can we commit the patch into mozilla-central ?

Laurent Jouanneau

Assignee

Comment 38

•

15 years ago

Yesterday Olli has re-pushed the patch on try servers, and all were greens.

Keywords: checkin-needed

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 39

•

15 years ago

I'll push this once my build is ready.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 40

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/f6d27eb05ec1

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Laurent Jouanneau

Assignee

Comment 41

•

15 years ago

Thank you Olli for all, and thanks Jonas for your superreview :-)

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 494103

Boris Zbarsky [:bzbarsky]

Comment 42

•

15 years ago

This seems to have added a warning which looked like a serious bug to me at first glance (virtual methods not being overridden correctly):

mozilla/src/content/base/src/nsXHTMLContentSerializer.h:114:
warning: ‘virtual void nsXHTMLContentSerializer::SerializeAttributes(nsIContent*, nsIDOMElement*, nsAString_internal&, const nsAString_internal&, nsIAtom*, nsAString_internal&, PRUint32, PRBool)’ was hidden
by ‘virtual void nsHTMLContentSerializer::SerializeAttributes(nsIContent*, nsIDOMElement*, nsAString_internal&, const nsAString_internal&, nsIAtom*, nsAString_internal&)’

After spending an hour or so reading through this code, I _think_ that it's correct.  But is there a reason the HTML one has a different signature?  Or isn't called SerializeHTMLAttributes or something?

Laurent Jouanneau

Assignee

Comment 43

•

15 years ago

Actually, I think there is no reason to define this nsHTMLContentSerializer::SerializeAttributes method with the same name of the  nsXHTMLContentSerializer one. Yes, we could call it SerializeHTMLAttributes.

Do you want I create a bug and provide a patch ?

Boris Zbarsky [:bzbarsky]

Comment 44

•

15 years ago

That would be wonderful!

Laurent Jouanneau

Assignee

Comment 45

•

15 years ago

done: https://bugzilla.mozilla.org/show_bug.cgi?id=513647

Laurent Jouanneau

Assignee

Updated

•

15 years ago

Blocks: 502177

Mats Palmgren (inactive)

Updated

•

15 years ago

Depends on: 524975

Mats Palmgren (inactive)

Updated

•

15 years ago

Depends on: 545644

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 546474

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Depends on: 549295

Boris Zbarsky [:bzbarsky]

Comment 46

•

15 years ago

This patch broke the wrapping behavior when serializing HTML.  See bug 549295.

Laurent Jouanneau

Assignee

Updated

•

15 years ago

Depends on: 541937

Henri Sivonen (:hsivonen)

Comment 47

•

13 years ago

What's the use case for having the W3CEntities (MathML) configuration?

You need to log in before you can comment on or make changes to this bug.