Bug 13015: Mozilla confused by comments in titles
Status: VERIFIED WORKSFORME (closed)
Opened 25 years ago; closed 25 years ago
Category: Core :: DOM: HTML Parser (defect, P3)
Milestone: M13
Reporter: jmorzins; Assigned: harishd
What were you trying to do?
Read the web page http://technology.news.com.au/news/4276366.htm, a page that contains a comment in its title.

What's wrong?
Mozilla does not recognize the comment as a comment, and renders it as part of the page's title.

What should have happened?
Mozilla should have omitted the comment before storing the page's title.

Got any documentation?
I've put up a simple example page, http://web.mit.edu/jmorzins/www/netscape-title-bug.html, which validator.w3.org accepts as valid HTML 4.0 and which contains a comment inside its title. When the page is viewed in Netscape, Netscape is unaware of the comment and treats the page as if its title were "<!--This title has a comment in it.--> This is the title".

Thank you,
Jacob Morzinski
Excerpt (w3c.org): "Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup."
The content model of TITLE is #PCDATA, not CDATA, so I think comments are allowed. By "other markup", the spec means other elements.
Updated•25 years ago
OS: Linux → All
Hardware: PC → All
Comment 5•25 years ago
Yup, comments in <title>...</title> should be treated exactly like comments in, say, <span>...</span>.
Comment 6•25 years ago
When asked the question "HTML comments in <title> elements - valid or not?", the www-html mailing list replied. (See the thread of the same name in <URL:http://lists.w3.org/Archives/Public/www-html/1999Nov/>, now dormant.)

Summarizing the responses, the important point is that comments are considered markup in HTML; in fact, they are defined as markup in the SGML productions for HTML (see <URL:http://www.w3.org/MarkUp/SGML/productions.html#prod91>). Given that the only markup allowed in <title>s is character entities (<URL:http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.2>), comments must not occur within <title>s, even though the content model of TITLE is #PCDATA.

This is one of the places where the specification constrains valid HTML beyond the constraints imposed by the DTD. For better or for worse, not all of the HTML specification's requirements can be expressed in a DTD. (The following statement about any application of SGML, from <URL:http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.1>, applies here: "3. A specification that describes the semantics to be ascribed to the markup. This specification also imposes syntax restrictions that cannot be expressed within the DTD.")

(It was pointed out that most major browsers treat the <title> element as though its content model were CDATA, sidestepping the question of how to parse character entities but no other markup. Mozilla gets this right: "Réné" shows up as four characters, two of them accented "e"s, and a title like "Why to avoid using <FONT> tags" shows up exactly like that.)

It looks like this bug report can be marked INVALID.

(BTW, no validator that works only from SGML DTDs can catch compliance issues with aspects of the specification that are not encoded in the DTD, like this one. Such a validator would also allow WIDTH="ceci n'est pas une pipe" in order to allow WIDTH="50%", because of <!ENTITY % Length "CDATA" -- nn for pixels or nn% for percentage length -->.)
Comment 7•25 years ago
No, we should be treating comments in <TITLE> blocks the same as in any other #PCDATA blocks. The spec quote you give is a validity constraint, which means that documents that break it are invalid. It does not apply to user agents (web browsers). The part of the spec that applies to the parser is indeed the DTD, and that says that we should parse comments in TITLE elements.
Comment 8•25 years ago
If markup in the #PCDATA of <title>s *could* be honoured just as in any other #PCDATA, it would make sense to leave it at that, and the validity constraint "Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup" at <URL:http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.2> would be neither necessary nor meaningful. But even if #PCDATA in <title>s were parsed like any other #PCDATA, the markup it contained still could not be used where <title> data ends up in the way the same markup can be used in other contexts.

As far as I can tell, neither the spec nor the DTD defines what a user agent should do with comments or other markup that are invalidly present. For comments, the choice is between honouring them (parsing them out) and leaving them as they are. For other markup, the choices are to try to honour it, to parse it out as if it were unknown markup, or to leave it as it is. Given that the spec *does* say that markup other than character entities (including comments) is not to appear in <title>s, an author who puts it there anyway has no reason to expect any particular one of those choices, and certainly not consistently.

Given those choices, the lack of direction from the spec, and the fact that #PCDATA in <title> is a special case in practice (whatever else is done, a typical application title bar cannot honour any markup other than character entities and comments, the latter honoured by removing them), the simplest way to handle the content of <title> elements is exactly how Mozilla currently handles it: replace the character entities with the characters they represent and otherwise display the text as-is. Aside from that, given a choice between an implementation that discourages invalid HTML by putting it on display and one that honours invalid HTML and thus does nothing to discourage it, what exactly is gained by putting in additional effort to make the latter happen?

As far as I can see, the validity of this bug report depends on which of those two implementations is better. Is the latter truly *required* by the DTD for <title>, regardless of what the spec says, or is an implementor free to choose an implementation truer to the intent of the spec as a whole? This is the main question.

Finally, if #PCDATA in <title> were parsed exactly like any other #PCDATA, what would be done with any other invalidly present markup once it had been parsed and made part of the DOM? This awkward question is not about parsing but about what happens after parsing, and if the answer is "nothing useful", why parse markup (including comments) at all? The obvious rejoinder I can see is "No, treat <title> content as #PCDATA, except for markup that isn't comments or character entities", but would that not be just as much a violation of the DTD? The only other rejoinder I can see is "Because the DTD says to". So is that the absolute last word, and what then is to be done with the parsed <I>, <B> and <FONT> tags (deprecated as they are, these are about the only markup that those flouting the spec would want to put in) that were never valid in the first place?
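The two title-handling strategies debated in comments 7 and 8 can be contrasted in a short sketch. This is a hypothetical Python illustration, not Mozilla's parser code: `title_as_is` decodes character references and leaves everything else literal (the behaviour comment 8 describes), while `title_comments_stripped` first removes `<!-- ... -->` comments, as comment 7 argues #PCDATA parsing requires. Both function names are invented for illustration, the regex is a simplification of SGML comment declarations (it ignores `--` pairs inside a declaration), and trimming surrounding whitespace is an assumption about how a title bar would show the result.

```python
import html
import re

# Simplified comment matcher: "<!--" up to the nearest "-->", possibly spanning lines.
# Real SGML comment declarations are more complex; this is an assumption for illustration.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def title_as_is(raw: str) -> str:
    """Decode character references only; any other markup stays literal."""
    return html.unescape(raw).strip()


def title_comments_stripped(raw: str) -> str:
    """Drop comments as in other #PCDATA, then decode character references."""
    # Strip comments *before* decoding, so "&lt;!-- --&gt;" (escaped text)
    # is never mistaken for a real comment.
    without_comments = COMMENT_RE.sub("", raw)
    return html.unescape(without_comments).strip()


if __name__ == "__main__":
    raw = "<!--This title has a comment in it.--> This is the title"
    print(title_as_is(raw))
    print(title_comments_stripped(raw))
```

The ordering inside `title_comments_stripped` is the design choice worth noting: a title written as `&lt;!-- x --&gt;` is escaped text, not a comment, so it should survive both strategies as the literal characters `<!-- x -->`.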
I'm closing this bug. As far as I can tell, we're behaving identically to both Navigator and IE.
Updated•25 years ago
Status: RESOLVED → VERIFIED
Comment 10•25 years ago
ok