Closed Bug 477455 Opened 15 years ago Closed 15 years ago

Parser does not wait for "?>" to close blocks that begin with "<?"

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

RESOLVED INVALID

People

(Reporter: GPHemsley, Unassigned)

Details

The markup parser knows that blocks that begin with "<?" are special (the source code viewer highlights them in pink), so it should know that those blocks can only end with "?>". It currently doesn't, meaning it ends the block whenever it encounters any old ">". This doesn't seem right to me.

The XML declaration requires the question mark end tag, as does PHP. I can't think of any markup language (or any other language, for that matter) that opens with a question mark but doesn't close with one.
Well, HTML actually requires this. http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.6 is the relevant specification.
Hmm... well the beginning of that Appendix B says that it is informative, not normative, but it says that everything listed there is already defined elsewhere. Could you point me to where that issue is defined in the normative part of the spec?

Also, does anyone anywhere use these SGML "processing instructions" as they are defined in the spec? Seriously, just point me to one valid use that Firefox is expected to (and does) interpret correctly.

(However, XSLT processing instructions might have to be looked into. I can't figure out if they end with ">" or "?>". And those would be necessary to support, right? [Even if I don't know what they're for....])
(In reply to comment #2)
> Hmm... well the beginning of that Appendix B says that it is informative, not
> normative, but it says that everything listed there is already defined
> elsewhere. Could you point me to where that issue is defined in the normative
> part of the spec?

It isn't, except if you count that HTML is an SGML application.

Currently HTML5 specifies <? ... >. I don't have strong feelings about this one way or the other; you might want to bring it up on the WHATWG list.
<? ... ?> is an XMLism.
<? ... > is an SGMLism.
HTML comes from an SGML heritage (it predates XML).

However this is all rather academic, because HTML doesn't actually have any valid <? ... > constructs, and so there's not really anything to support here.
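
The difference between the two conventions can be sketched in a few lines of Python. This is only an illustration under assumed names: `end_sgml_pi` and `end_xml_pi` are hypothetical helpers, not code from any browser, and treating an unterminated block as running to the end of the input is an assumption made for the sketch. Each helper takes the index of a "<?" and returns the index just past the point where that rule would close the block.

```python
def end_sgml_pi(text: str, start: int) -> int:
    """SGML-style rule: a block opened by '<?' closes at the first '>'."""
    assert text.startswith("<?", start)
    close = text.find(">", start + 2)
    # Assumption for this sketch: an unterminated block runs to end of input.
    return len(text) if close == -1 else close + 1


def end_xml_pi(text: str, start: int) -> int:
    """XML-style rule: a block opened by '<?' closes only at '?>'."""
    assert text.startswith("<?", start)
    close = text.find("?>", start + 2)
    # Same assumption as above for an unterminated block.
    return len(text) if close == -1 else close + 2


# Hypothetical sample input containing a '>' inside the block.
sample = "<?demo a > b ?><p>after</p>"
sgml_block = sample[:end_sgml_pi(sample, 0)]
xml_block = sample[:end_xml_pi(sample, 0)]
print(repr(sgml_block))
print(repr(xml_block))
```

On this sample the SGML-style rule yields `<?demo a >`, while the XML-style rule yields `<?demo a > b ?>`. The two rules only disagree when a ">" appears inside the block.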
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → INVALID
(In reply to comment #4)
> <? ... ?> is an XMLism.
> <? ... > is an SGMLism.
> HTML comes from an SGML heritage (it predates XML).
> 
> However this is all rather academic, because HTML doesn't actually have any
> valid <? ... > constructs, and so there's not really anything to support here.
Well, if HTML doesn't have valid <? ... > constructs to support, what's the harm in changing it to <? ... ?> parsing?
What's the benefit?

Changing things usually has a cost associated with it, in terms of needed coding, testing, documenting, etc., and often causes compatibility problems. There are almost certainly pages that depend on the current error handling. Since all the browsers pretty much agree on this, why change it?
Well, I was really prompted to report this when I loaded a page that contained PHP code in it. Rather than match the <?php with ?>, the parser (and the view-source parser, as I recall) stopped at a > within the code, and thus left the remaining PHP out in the open for the markup parser to mishandle.

Plus, I like symmetry. :)
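
The behaviour described here can be reproduced outside the browser. As a rough sketch, Python's standard-library `html.parser.HTMLParser` is documented as using the SGML syntactic rules for processing instructions, so it also closes a "<?php" block at the first ">". The `Recorder` class and the sample string are made up for illustration; this is not Firefox's parser, only a parser that follows the same convention.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Hypothetical helper that records what the parser reports."""

    def __init__(self):
        super().__init__()
        self.instructions = []  # contents of each '<? ... >' block
        self.text = []          # character data outside any markup

    def handle_pi(self, data):
        self.instructions.append(data)

    def handle_data(self, data):
        self.text.append(data)


# Hypothetical PHP snippet served to the client unprocessed.
page = '<?php if ($a > 1) echo "ok"; ?>'

recorder = Recorder()
recorder.feed(page)
recorder.close()

# Text may arrive in several pieces, so join it before inspecting it.
leaked = "".join(recorder.text)
print(recorder.instructions)
print(repr(leaked))
```

The block is reported as ending at the ">" of the comparison, and everything after it, including the closing "?>", is handed on as ordinary text.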
Don't send PHP code to the browser. It'll fail just like if you send C++ code to the browser. :-)