17017 - newlines outside </body> are inserted inside the body node

Reporter

Description

•

25 years ago

If the source HTML file has a newline after the </body> tag (which I assume
most do), the parser inserts that newline as a text node inside, rather than
outside, the body node in the content tree.

The easiest way to see this is to run the editor on the file
htmlparser/tests/outsinks/simple.html, then do a Debug->Dump Content Tree (or do
the equivalent debug action in viewer).  Note the three adjacent newlines near
the end but inside the body node; from the source file, they should be two
adjacent newlines inside the body plus another newline outside the body.

For extra credit, it would be nice to get the right number of newlines after the
</html>, but I can live without that and special-case it in the output sink; I
could special-case a newline after the </body> tag, too, if necessary. But I
can't detect that the extra newline node shouldn't be there inside the body, so
the output will show an extra newline before the </body> tag where there
shouldn't be one.
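
To make the output-side problem concrete, here is a minimal sketch in Python. It is not the Mozilla content tree or output sink: the Node class, the serialize function, and the tiny two-element document are all hypothetical, chosen only to show that a serializer walking the tree has to emit whatever text nodes it finds, so a newline moved inside the body comes out before the </body> tag.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a content-tree node (not a Mozilla class)."""
    tag: str                      # element name, or "#text" for a text node
    text: str = ""                # only used when tag == "#text"
    children: list = field(default_factory=list)


def text(s):
    """Build a text node holding the string s."""
    return Node("#text", s)


def serialize(node):
    """Emit the tree exactly as stored; the tree is the only input available."""
    if node.tag == "#text":
        return node.text
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


# Assumed source text: two newlines before </body>, one newline after it.
SOURCE = "<html><body>\n<p>hi</p>\n\n</body>\n</html>"

# Tree that matches the source: the last newline is a sibling of body.
expected_tree = Node("html", children=[
    Node("body", children=[text("\n"), Node("p", children=[text("hi")]), text("\n\n")]),
    text("\n"),
])

# Tree as described in this report: the newline ends up inside body.
observed_tree = Node("html", children=[
    Node("body", children=[text("\n"), Node("p", children=[text("hi")]), text("\n\n\n")]),
])

print(repr(serialize(expected_tree)))
print(repr(serialize(observed_tree)))
```

With the first tree the serializer reproduces the source; with the second it has no information that would let it move the third newline back outside the body.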

Akkana Peck

Reporter

Updated

•

25 years ago

Summary: newlines after </body> are inserted inside the body node → newlines outside </body> are inserted inside the body node

Akkana Peck

Reporter

Comment 1

•

25 years ago

I'm guessing this is the same problem as the longstanding problem of newlines in
the head getting inserted into the body node.  For instance, the sample
attachment to bug 15674:

 <!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html40/loose.dtd">

<html>

<head>
<title>An Example</title>
</head>

<body>
<h1>Test</h1>

</body>
</html>

gets parsed into this tree:

html refcount=4<
  head refcount=3<
    title refcount=3<
      Text refcount=3<An Example>
    >
  >
  Text refcount=4< \n\n\n\n\n\n>
  body refcount=4<
    Text refcount=4<\n>
    h1 refcount=4<
      Text refcount=8<Test>
    >
    Text refcount=4<\n\n\n\n>
  >
>

and there's no way for the output routines to know which of these are valid body
newlines and which are not, so they all go into the output.
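
For comparison, the following sketch records where each whitespace-only run sits in the source text. It uses Python's standard html.parser module, not the Mozilla parser, and the WhitespaceLocator class, the locate_whitespace helper, and the "(document)" label are names made up for this illustration. The sample is a trimmed copy of the document above with the doctype left out.

```python
from html.parser import HTMLParser

# Trimmed copy of the sample document (doctype omitted).
SAMPLE = """<html>

<head>
<title>An Example</title>
</head>

<body>
<h1>Test</h1>

</body>
</html>
"""


class WhitespaceLocator(HTMLParser):
    """Record the innermost open element for every whitespace-only text run."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []   # currently open elements, outermost first
        self.found = []   # (parent element name, whitespace string) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data and not data.strip():
            parent = self.stack[-1] if self.stack else "(document)"
            self.found.append((parent, data))


def locate_whitespace(source):
    """Return the (parent, whitespace) pairs for source, in document order."""
    locator = WhitespaceLocator()
    locator.feed(source)
    locator.close()
    return locator.found


for parent, whitespace in locate_whitespace(SAMPLE):
    print(f"{parent:12} {whitespace!r}")
```

Going by the source positions alone, the newline after </body> belongs to the html element and the one after </html> belongs to no element at all, which is the placement the dumped tree above does not preserve.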

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 2

•

25 years ago

This seems like bug 12998 at the opposite end...

Akkana Peck

Reporter

Comment 3

•

25 years ago

Yes, it does look like a dup of 12998.  That one is marked as fixed, but I'm
still seeing it.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 4

•

25 years ago

By "at the opposite end" I meant that bug 12998 was for extra whitespace at
the beginning of the body element and this is extra whitespace at the end.
Bug 12998 is fixed, and this one isn't.

Akkana Peck

Reporter

Comment 5

•

25 years ago

This one is for whitespace at both ends (see my second comment to the bug
report); and as of Friday, I was still seeing whitespace at both ends being
inserted in the wrong place.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 6

•

25 years ago

OK... but at the beginning that extra whitespace is outside of body (it was once
inside) whereas at the end it's inside.  It's still not exactly where it was,
though...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 7

•

25 years ago

"where it was" == "where it was in the document"

harishd

Assignee

Updated

•

25 years ago

Assignee: harishd → akkana

harishd

Assignee

Comment 8

•

25 years ago

Akkana, it looks like there is not much that could be done on the parser end.

Giving the bug back to you.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 9

•

25 years ago

IMO, this should be fixed in the parser, just like bug 12998 was.  It's a bug in
conformance with the W3C DOM Level 1 spec.

Akkana Peck

Reporter

Updated

•

25 years ago

Assignee: akkana → harishd

Akkana Peck

Reporter

Comment 10

•

25 years ago

If the parser inserts newline nodes in the wrong place in the DOM tree, there's
absolutely nothing the output system can do to distinguish these nodes from
nodes which really were in the original source.

Akkana Peck

Reporter

Updated

•

25 years ago

Assignee: harishd → rickg

Akkana Peck

Reporter

Comment 11

•

25 years ago

Rick and I discussed this at some length.  He says that the whitespace at the
end of the document (after the </body>, and not associated with any other text
or markup besides </html>) should be trivial to fix.  For the whitespace inside
the head, at least some of it should also be easy to leave in place instead of
coalescing it into one text node between the head and the body; he's going to
look into that.

Akkana Peck

Reporter

Comment 12

•

25 years ago

It turns out that this also happens for tables: all newlines occurring inside
table nodes are coalesced into a couple of text nodes outside the table
boundaries.

harishd

Assignee

Updated

•

25 years ago

Assignee: rickg → harishd

harishd

Assignee

Comment 13

•

25 years ago

stealing!!

harishd

Assignee

Comment 14

•

25 years ago

Fixed by updating CNavDTD.

Status: NEW → RESOLVED

Closed: 25 years ago

Resolution: --- → FIXED

bsharma

Comment 15

•

24 years ago

updated qa contact.

QA Contact: janc → bsharma

bsharma

Comment 16

•

23 years ago

Verified on:
build: 2001-04-02-09-Mtrunk
platform: WinNT

Marking it verified as per the developer comments.

I do not know how to view the content tree using release builds.

Status: RESOLVED → VERIFIED

Categories

(Core :: DOM: HTML Parser, defect, P3)

People

(Reporter: akkzilla, Assigned: harishd)
