Closed Bug 424359 Opened 17 years ago Closed 16 years ago

Html serializer : bugs with OutputBodyOnly and OutputLF/CRLineBreak flags

Categories

(Core :: DOM: Serializers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: laurent, Assigned: laurent)

References

Details

Attachments

(2 files, 9 obsolete files)

The HTML serializer contains two bugs:

1) with nsIDocumentEncoder::OutputBodyOnly, there is an </html> tag at the end of the string
2) with nsIDocumentEncoder::OutputCRLineBreak, some end of line are not replaced by a CR  (ex: end of lines after the doctype, inside comments).
Status: NEW → ASSIGNED
Attached patch First patch, work in progress (obsolete) — Splinter Review
Here is a first version of the patch. I added some mochitests (html serializer) and xpcshell test (xml serializer).

To resolve the second bug, I modified the XML serializer. So now, the XML serializer supports OutputCR/LFLineBreak flags ;-)

I will submit a second version, because I think my tests don't pass on plateforms like Mac and Windows, because of the default line break used on them.
Assignee: nobody → laurent
Attached patch version 2 of the patch (obsolete) — Splinter Review
Here is the updated patch, ready for a review.
Attachment #310987 - Attachment is obsolete: true
Attachment #312721 - Flags: superreview?(peterv)
Attachment #312721 - Flags: review?(jonas)
Flags: wanted1.9.0.x?
Attachment #312721 - Flags: review?(jonas) → review?(Olli.Pettay)
Comment on attachment 312721 [details] [diff] [review]
version 2 of the patch

>@@ -855,16 +834,20 @@ nsHTMLContentSerializer::AppendElementEn
>     AppendToString(mLineBreak, aStr);
>     mMayIgnoreLineBreakSequence = PR_TRUE;
>     mColPos = 0;
>   }
>   else {
>     MaybeFlagNewline(aElement);
>   }
> 
>+  if (name == nsGkAtoms::body) {
>+    mInBody = PR_FALSE;
>+  }
>+
Mainly because of this change, could you add tests which have
more than one <body> element in the document.
And what happens if there are nested <body> elements?
(that may happen with non-valid-XHTML or if someone modifies the DOM.)

>+++ content/base/test/test_htmlserializer_1.html	2008-03-31 15:12:28.000000000 +0200
>@@ -0,0 +1,157 @@
>+<!DOCTYPE HTML>
>+<html>
>+<!--
>+-->
>+<head>
>+  <title>Test for HTML serializer</title>
>+  <script type="text/javascript" src="/MochiKit/MochiKit.js"></script>
>+  <script type="text/javascript" src="/tests/SimpleTest/SimpleTest.js"></script>
>+  <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css" />
>+</head>
>+<body>
>+<!--<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=">Mozilla Bug </a>-->
Could you please rename this file to test_bug424359-1.html and make the link point to
this bug.	

>+++ content/base/test/test_htmlserializer_2.html	2008-03-31 15:12:24.000000000 +0200
>@@ -0,0 +1,96 @@
>+<!DOCTYPE HTML>
>+<html>
>+<!--
>+-->
>+<head>
>+  <title>Test HTML serializer with entities</title>
>+  <script type="text/javascript" src="/MochiKit/MochiKit.js"></script>
>+  <script type="text/javascript" src="/tests/SimpleTest/SimpleTest.js"></script>
>+  <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css" />
>+</head>
>+<body>
>+<!--<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=">Mozilla Bug </a>-->
This file could be then test_bug424359-2.html
Attached patch Version 3 of the patch (obsolete) — Splinter Review
This new version fixes some issues when there are several body elements

* if body elements are nested, only the top body element is generated. However, the content of child body elements are generated inside the top body element (in fact, I follow the same behavior of the HTML parser)
* if there are two sibling body elements, only the first one is generated.

I renamed also the two test files.
Attachment #312721 - Attachment is obsolete: true
Attachment #328506 - Flags: review?(Olli.Pettay)
Attachment #312721 - Flags: superreview?(peterv)
Attachment #312721 - Flags: review?(Olli.Pettay)
Comment on attachment 328506 [details] [diff] [review]
Version 3 of the patch

Please use -p with diff.

>   if (name == nsGkAtoms::body) {
>-    mInBody = PR_TRUE;
>+    mInBody ++;
>+    if (mInBody > 1) {
>+      // we ignore nested body elements
>+      return NS_OK;
>+    }
>+    if (mAlreadyHaveBody) {
>+      return NS_OK;
>+    }
Extra space between mInBody and ++;
(and I'd prefer ++mBody)


>+  if (name == nsGkAtoms::body && mInBody > 1) {
>+    // we ignore nested body elements
>+    mInBody--;
>+    return NS_OK;
>+  }

Could this be
  if (name == nsGkAtoms::body && --mInBody > 1) {
    // we ignore nested body elements
    return NS_OK;
  }
Then you don't need to set mInBody explicitly to 0 later in the method.

But still some comments about the approach.

(In reply to comment #4)
> * if body elements are nested, only the top body element is generated. However,
> the content of child body elements are generated inside the top body element
> (in fact, I follow the same behavior of the HTML parser)
Sounds reasonable, though to keep the patch as simple as possible and not
change the behavior too much, perhaps serializer could create those inner bodys too.

> * if there are two sibling body elements, only the first one is generated.
This is tricky. What would be great, IMO, was to be able to push the content of
latter sibling body elements to the first body, but that would require
quite large changes.
What is the current behavior? Generate all the sibling bodys? Maybe there isn't
need to change that.

So to fix this bug, is it enough to use the version 2 patch but change
mInBody to PRInt32 and increase and decrease it when needed?
In my opinion, I think this isn't the role of the serializer to fix problems with invalid elements. I we start to try to fix some things like I do in my patch with the body element, then why not to verify other things like H1 inside P, DIV inside SPAN, P as child of UL etc. ? It would be too complex, and It would be outside of the goals of a serializer. The serializer should only take care about the syntax it generates, so it should provide a valid SGML document or a valid XML document.

So ok, I can remove behaviors I added, so nested body elements  or sibling body elements will be generated, like in the current behavior. So no need to change mInBody to PRInt32, and changes made in the version 2 of the patch are enough.

If you are ok, I will make a new patch with new unit tests.

(Argh, I wasn't CC'd to this bug)

(In reply to comment #6)
> So ok, I can remove behaviors I added, so nested body elements  or sibling body
> elements will be generated, like in the current behavior. So no need to change
> mInBody to PRInt32, and changes made in the version 2 of the patch are enough.
> 
> If you are ok, I will make a new patch with new unit tests.
As far as I see, there is still need to change mInBody ot PRInt32. Otherwise
you set mInBody to PR_FALSE too early (when closing the innermost <body> in
the case of nested <body>s.) 

Yes, you're right. I'm going to make a new patch today.
Attached patch Version 4 of the patch (obsolete) — Splinter Review
Here is the updated patch.
Attachment #329680 - Flags: review?(Olli.Pettay)
Attachment #328506 - Attachment is obsolete: true
Attachment #328506 - Flags: review?(Olli.Pettay)
Sorry for the spam. I forgot to create the diff with the -p flag.
Attachment #329680 - Attachment is obsolete: true
Attachment #329683 - Flags: review?(Olli.Pettay)
Attachment #329680 - Flags: review?(Olli.Pettay)
Comment on attachment 329683 [details] [diff] [review]
Version 4 of the patch with function names

I guess this is ok.
Attachment #329683 - Flags: review?(Olli.Pettay) → review+
Attachment #329683 - Flags: superreview?(jst)
Not sure we actually want this in 1.9.0.x (the stable branch), but probably in 1.9.1?
Flags: wanted1.9.1?
Whiteboard: [needs sr jst]
Component: XML → Serializers
QA Contact: xml → dom-to-text
Blocks: 305711
Same patch, but the applying of the previous one fails on the current trunk, because of a change in a C++ comment.
Attachment #329683 - Attachment is obsolete: true
Attachment #329683 - Flags: superreview?(jst)
Attachment #342864 - Flags: superreview?(bzbarsky)
Comment on attachment 342864 [details] [diff] [review]
same patch updated for the current trunk

So... it seems like some of the "HTML output only" comments describe things as they are, whereas other are things as they "should" be (that is, some things are HTML-only for good reasons, and some just happen to be that way for now).  I'd rather not document the things that are HTML-only by accident as being HTML-only...  Or at least make it very clear that this can change.

Past that, please remove the random whitespace changes and please use -p when diffing like Olli asked you to.  I'm not going to spend time reviewing this patch in its current state because it'll be so much easier to review with the above two changes...
Attached patch same patch with -p -U 8 (obsolete) — Splinter Review
Sorry Boris for the format of the patch. I currently have my patch in a hg queue, and it seems that hg qrefresh doesn't use the diff parameters from my .hgrc. I didn't know it and I forgot to verify the output format. I removed also the random whitespace changes.
Attachment #342864 - Attachment is obsolete: true
Attachment #342882 - Flags: superreview?(bzbarsky)
Attachment #342864 - Flags: superreview?(bzbarsky)
Ah, for mq you want to do:

[diff]
git = 1
showfunc = true
unified = 8

in your .hgrc.  That will make things happy.  Looking at the patch now.
Comment on attachment 342882 [details] [diff] [review]
same patch with -p -U 8

>+++ b/content/base/src/nsHTMLContentSerializer.h
>+  PRInt32  mInBody;

Probably better as PRUint32.

>+++ b/content/base/src/nsXMLContentSerializer.h
>+  // current colonne position

s/colonne/column/

>   PRPackedBool mAddNewline;
>+
> };

No need for that blank line

With those nits, looks great.  Thanks for writing all those tests!
Attachment #342882 - Flags: superreview?(bzbarsky) → superreview+
I pushed changeset 43d8e57c41e9 with those three things changed.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Backed out, because this caused test failures on Windows.  Note that I left the test files in the tree and just backed out the file changes (by accident, mostly).

The failures seem to be due to the XML serializer changes, at least in part:

TEST-UNEXPECTED-FAIL | ../../../_tests/xpcshell-simple/content/unit/test_xmlserializer.js | test failed, see log
NEXT ERROR *** TEST-UNEXPECTED-FAIL | ../../../_tests/xpcshell-simple/content/unit/test_xmlserializer.js | <?xml version="1.0" encoding="UTF-8"?>
<foo/>
<!-- é --> == <?xml version="1.0" encoding="UTF-8"?>

<foo/>

<!-- é -->

A number of mochitest failures also, in both test_bug424359-1.html and test_bug424359-2.html.  See log at http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1223917750.1223920953.6499.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This new version includes Boris's changes, plus some changes in tests: I added the use of the OutputLFLineBreak flag to all tests, so the encoder won't use the default line break of the platform. I think tests will then pass on windows. I haven't a windows system here at home to verify it, but I could test at office tomorrow morning...
Attachment #342882 - Attachment is obsolete: true
Attachment #342941 - Flags: superreview?(bzbarsky)
I pushed that patch to the try server with the label "CRLFStuff" so we can try to get some data on it.
Er.. except try server doesn't run unit tests.  :(
Thanks to a patch which fixes a VC7.1 bustage, and which has just landed into the trunk, I finally could compile on my window desktop at the office :-) On this machine, all tests pass with my latest patch (mochitest + xpcshell). And on my linux box, same result: all is green.
Attachment #342941 - Flags: superreview?(bzbarsky) → superreview+
Pushed that last diff as changeset 9fde2bdd93ef.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
backed out 9fde2bdd93ef due to unit test failures
(In reply to comment #25)
> backed out 9fde2bdd93ef due to unit test failures

It caused Windows FF+TB+SM to fail at
http://mxr.mozilla.org/mozilla-central/source/content/test/unit/test_xml_serializer.js?mark=276#265
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch patch for failed tests (obsolete) — Splinter Review
Here is an additionnal patch which fixes the test. I launch xpcshell tests in all directories, mochitests in all directories, and it's ok. Some tests failed on my machine but apparently, it is not related to the serialization, it's mainly on some tests which do some calculation (on the layout, SVG...), and some others about network apis...

I hope it will be enough.  I'm sorry for these failures :-(
Attachment #343551 - Flags: superreview?(bzbarsky)
Comment on attachment 343551 [details] [diff] [review]
patch for failed tests

>+var LB;

Please add some comments about what this variable really means?

And no need for two newlines after; one is enough.

>+  if(("@mozilla.org/windows-registry-key;1" in C) || ("nsILocalFileOS2" in I))

Gah.  Can you file a followup bug on having a way to get the platform linebreak string from some service and replacing this ickiness with that?
Yes, I agree this plateform test is ugly. I could use the app-info component with its nsIXULRuntime interface, and add a new property on it (I filled bug 460573), or use at least its OS property.

But according to the documentation, http://developer.mozilla.org/En/Writing_xpcshell-based_unit_tests , it is not always available in xpcshell (someone on #developers told me the same thing), although I can use it on my build on windows, and it is used in this test : uriloader/exthandler/tests/mochitest/handlerApps.js 

So, is the documentation deprecated or is there really a risk to use it in xpcshell ?
bsmedberg might be able to answer the nsIXULRuntime question.... at the very least, I suspect it's only available if you are running a toolkit app.  Mochitests run in browser, so that works.  You're writing an xpcshell test, though, so I suspect it's not guaranteed to work.
Attached patch patch for failed tests v2 (obsolete) — Splinter Review
Thanks Boris, I will ask him.

Here is the patch with a new comment on the variable.
Attachment #343551 - Attachment is obsolete: true
Attachment #343885 - Flags: superreview?(bzbarsky)
Attachment #343551 - Flags: superreview?(bzbarsky)
s/use/uses/, and I'd put the comment where the variable is declared, not where it's assigned.
nsIXULRuntime is not available by default in xpcshell, I think... but there's no reason you couldn't fail gracefully back to unix-style line endings in that case, is there?
The problem is that the serializer will create platform line-breaks, period.  The reference string needs to have them too, therefore.
I changed the comments.
Attachment #343885 - Attachment is obsolete: true
Attachment #344043 - Flags: superreview?(bzbarsky)
Attachment #343885 - Flags: superreview?(bzbarsky)
Attachment #344043 - Flags: superreview?(bzbarsky) → superreview+
Pushed changeset f19fc94f348b
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
Whiteboard: [needs sr jst]
Unless I'm missing some baby-killing aspect of this bug it doesn't sound like a level of risk we'd want to take in a 1.9.0.x security update. Fixed in FF3.1
Flags: wanted1.9.1?
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: