Closed Bug 306353 Opened 19 years ago Closed 19 years ago

XHTML parsed as XML with doctype has trilicense/other comments from dtd in DOM

Categories

(Core :: XML, defect)

defect
Not set
minor

Tracking

()

RESOLVED FIXED

People

(Reporter: bmills, Unassigned)

References

()

Details

(Keywords: regression, verified1.8)

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a1) Gecko/20050815 Firefox/1.0+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a1) Gecko/20050815 Firefox/1.0+

My web server happens to serve files ending in .xml as text/xml, and xhtml may
be served as xml (although that is not recommended).  Deer Park Alpha 2 decides
to parse text/xml documents with an XHTML doctype as XHTML but display it as
XML, and in the process unrolls some internal XML into the document.  The end
result is that my document, which is not in any way covered by the MPL/GPL/LGPL,
is shown as an XML parse tree /with an MPL/GPL block that isn't supposed to be
there/.

It is technically allowable to treat my xml-served XHTML as plain old style-less
XML.  However, if that is the case it should not be combined with internal GPL'd
documents.  Alternatively, it could be displayed and rendered properly as XHTML,
which would be preferable but is not at all required.

See http://www.w3.org/TR/xhtml-media-types/#text-xml .

Representing a non-GPL'd document as a GPL'd document is a potential legal
liability, so this really ought to be fixed, altough it doesn't come up much in
practice.

Reproducible: Always

Steps to Reproduce:
View http://www.andrew.cmu.edu/user/bmills/test.xml .
Actual Results:  
An XML parse tree is shown, including several spurious comments with the MPL/GPL
and other comments not in the actual document.

Expected Results:  
EITHER correctly render the XHTML as XHTML, OR display the correct XML parse
tree WITHOUT spurious comments.
Not that we should really be spewing our comments like that, but, if you add the
utterly required xmlns="http://www.w3.org/1999/xhtml" attribute on your html
element, it will work like you expect.
Assignee: nobody → xml
Severity: normal → minor
Component: View Source → XML
OS: MacOS X → All
Product: Firefox → Core
QA Contact: view.source → ashshbhatt
Hardware: Macintosh → All
Summary: XHTML displayed as XML shows MPL/GPL/LGPL and other comments not in original document → XHTML as text/xml with missing xmlns spews trilicense comments
Version: unspecified → Trunk
Just noticed that myself, but since the xmlns is technically required for it to
be XHTML it shouldn't even be parsed as XHTML and the garbage shouldn't show up.
 An error message would even be fine -- "this document has an XHTML doctype but
is not valid XHTML and may contain errors".  Regardless, the comments should
absolutely not be leaking out into a purportedly faithful parse tree, valid
XHTML or not.
Also, Safari appears to tolerate the missing xmlns, so this may be a
compatibility issue down the road.  Possibly needing a quirks-mode kind of
tolerance?
Marking as a regression, I don't see the extra nodes in Moz 1.7.11/Windows
Keywords: regression
The comments are coming from xhtml11.dtd. Maybe we should strip them with the
preprocessor (same for mathml.dtd).
Status: UNCONFIRMED → NEW
Ever confirmed: true
what do you mean with "parsed as XHTML"? the file is parsed as XML like any
other XML mime type...
Well, during parsing the system treats it as XHTML and not just XML, because the
XHTML DTD is what's causing the spurious comments.

So I suppose the real problem is that comments from the local DTD are being
shown in the parse tree of non-local markup that references a corresponding DTD,
in this case the Firefox version of the xhtml DTD.
Nearly filed my own bug on this...

The missing xmlns doesn't matter, if you look in the DOM Inspector they're there
even in ones with an xmlns. Also, it also affects application/xhtml+xml documents.

It doesn't matter if the file is local or not, I've seen it either way. 

The only thing that does seem to matter is if the doctype declaration is there.

I'll work on finding a regression range. I suspect it's in the past 2-3 days
max, though.
Seems I was wrong on the regression: it was between 2005-05-13 and 2005-05-14.
Bonsai points to bug 288133. ccing bzbarsky as he made those changes.

Changing summary to more accurately reflect the problem.
Summary: XHTML as text/xml with missing xmlns spews trilicense comments → XHTML parsed as XML with doctype has trilicense/other comments from dtd in DOM
The regression range isn't that important, this is probably fallout from
updating Expat. We're now doing the right thing (including the comments from a
DTD), we just need to strip the license from the dtd's that we ship.
Hmm... are we really supposed to include comments and text from an external DTD
in the main DOM?  That seems odd...
Attached patch I think we want this behavior (obsolete) — Splinter Review
It really makes no sense to put comments and newlines from an external DTD into
the main page DOM...  I think we should just ignore those callbacks altogether
while we're parsing an external DTD.
Attachment #194367 - Flags: superreview?(peterv)
Attachment #194367 - Flags: review?(peterv)
Comment on attachment 194367 [details] [diff] [review]
I think we want this behavior

As discussed, let's add this to PIs too.
Attachment #194367 - Flags: superreview?(peterv)
Attachment #194367 - Flags: superreview+
Attachment #194367 - Flags: review?(peterv)
Attachment #194367 - Flags: review+
XML Information Set does specify this for comments, but not for PIs: "There is a
comment information item for each XML comment in the original document, except
for those appearing in the DTD (which are not represented)." This either means
they forgot or we need to handle PIs in internal/external subsets :-(.
I realized that the comments part wouldn't reserialize correctly as it was
written too...
Attachment #194367 - Attachment is obsolete: true
Attachment #194413 - Flags: superreview?(peterv)
Attachment #194413 - Flags: review?(peterv)
Attachment #194413 - Flags: superreview?(peterv)
Attachment #194413 - Flags: superreview+
Attachment #194413 - Flags: review?(peterv)
Attachment #194413 - Flags: review+
Comment on attachment 194413 [details] [diff] [review]
More comprehensive

I think we want this for 1.8b.	Without this patch, random parts of external
DTDs can end up in a page DOM.
Attachment #194413 - Flags: approval1.8b4?
Bz,

Can we get a risk-assement on this bug this late in the game?
Fixed on trunk.

schrep, the risk is fairly low.  The only codepaths affected are:

1)  Loading of an external DTD that contains comments, whitespace or PIs
2)  Loading of an internal subset that contains comments or PIs

The former case only happens with our in-tree DTDs, since we don't generally
load external DTDs.  In all cases, the only real change is to not create nodes
in the DOM for the stuff from the DTD, which is effectively a regression fix.  I
don't see this fix breaking anything.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment on attachment 194413 [details] [diff] [review]
More comprehensive

Thnx for the analysis bz.
Attachment #194413 - Flags: approval1.8b4? → approval1.8b4+
Fixed on branch.
Keywords: fixed1.8
*** Bug 306977 has been marked as a duplicate of this bug. ***
Keywords: fixed1.8verified1.8
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: