Bug 94284 - Implement null end tags (SHORTTAG NET) <foo/.../
Status: RESOLVED WONTFIX
Whiteboard: [Hixie-P2]
Keywords: testcase
Product: Core
Classification: Components
Component: HTML: Parser
Version: Trunk
Platform: All All
Importance: -- normal
Target Milestone: ---
Assigned To: parser
QA Contact: Moied
URL: http://www.nyct.net/~aray/sgml/short/...
Duplicates: 177075
Depends on: 107904
Blocks:
Reported: 2001-08-08 04:13 PDT by Andreas M. "Clarence" Schneider
Modified: 2004-11-03 21:26 PST

Description Andreas M. "Clarence" Schneider 2001-08-08 04:13:27 PDT
Currently we support XML-style empty tags in HTML. This should be quirks mode
only because it blocks implementation of correct SGML behavior: "<foo/" is a
net (null end tag) enabling start tag and should be closed by a single '/' or by
the next start tag, but not by a '>'.

I expect few problems with broken pages because a typical use like
<title/><... will become the same as <title>&gt;</title><... if we implement
null end tags at the same time; that's not too bad IMO. I do not know yet what
the correct behavior is for elements declared as empty. <br/> might cause
problems.
Assigning this to myself, but low priority for now.
Comment 1 Christopher Hoess (gone) 2001-08-08 09:43:33 PDT
Look around line 617 of nsHTMLTokenizer.cpp, and possibly also line 765...an 
extra condition there checking the flag for parsing mode should do it.
Comment 2 Andreas M. "Clarence" Schneider 2001-08-08 10:42:09 PDT
But that won't implement null end tags. We need to decide what to do instead.
OK, we could simply ignore '/', but then nobody would understand why we are
doing this.
Comment 3 Henri Sivonen (:hsivonen) 2001-08-08 13:26:13 PDT
Please note that XHTML sent as text/html activates the standards mode *and*
contains XML-style tags.
Comment 4 Bernard Alleysson 2001-08-08 14:06:38 PDT
Maybe related to bug 84633? (this was marked WONTFIX :-))
See also bug 84939.
Would this be a fix for this kind of bug (in quirks mode)?
Comment 5 Andreas M. "Clarence" Schneider 2001-08-08 15:16:56 PDT
Henri: Yes, this doesn't apply to XHTML served as text/html.

Bernard: Actually this would fix bugs like bug 84633 if applied to quirks mode
(aside from inserting a '>' into the element), but it would break other things
instead. So I propose behavior like in XHTML for quirks mode (as it is now) and
correct SGML/HTML behavior for standards mode.
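
The split proposed here can be sketched as a small decision function. This is
only an illustration in Python: Mode, slash_policy, and the xhtml_as_html flag
are hypothetical names, not taken from the Mozilla parser.

```python
from enum import Enum


class Mode(Enum):
    QUIRKS = "quirks"
    STANDARDS = "standards"


def slash_policy(mode, xhtml_as_html=False):
    """Return how '/' in a start tag would be treated under the proposal.

    xhtml_as_html is an assumed flag for XHTML documents served as
    text/html, which the proposal leaves alone.
    """
    if xhtml_as_html or mode is Mode.QUIRKS:
        return "xml-style"  # '<foo/>' stays an XML-style empty tag, as now
    return "sgml-net"       # '<foo/' is a NET-enabling start tag
```

How a real parser would detect XHTML served as text/html is left open here.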
Comment 6 Christopher Hoess (gone) 2001-08-09 12:11:23 PDT
Clarence: I was just looking at the parts of the code that allow XML-style empty
tags to be parsed in HTML, which will have to be pushed into a conditional so
that they only work in Quirks Mode or XHTML and XML modes.  For modifying things
to allow "/" to close the tag, I think you'll want to modify the conditional on
line 545, and probably the ConsumeEndTag function on line 806.
Comment 7 Andreas M. "Clarence" Schneider 2001-08-10 00:04:04 PDT
I know the code because I'm rewriting it for bug 57724. If '/' closes a tag, the
next '/' should be recognized as the end tag closing the element too, e.g.
<head>  <title /my title/ </head> .
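
A minimal sketch of the NET handling described in this comment, written in
Python rather than the tokenizer's C++. The names expand_net, START, and END are
made up for illustration; attributes, EMPTY elements, and end tags implied by a
following start tag are not modeled.

```python
import re

# A start tag name followed by '/' (NET-enabling) or '>' (ordinary).
START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*(/|>)")
# An ordinary end tag.
END = re.compile(r"</([A-Za-z][A-Za-z0-9]*)\s*>")


def expand_net(text):
    """Rewrite '<name/content/' as '<name>content</name>'."""
    out = []
    open_net = []  # elements opened by a NET-enabling start tag
    i = 0
    while i < len(text):
        m = END.match(text, i)
        if m:
            out.append(f"</{m.group(1)}>")
            i = m.end()
            continue
        m = START.match(text, i)
        if m:
            name, closer = m.groups()
            if closer == "/":
                open_net.append(name)  # the next bare '/' closes this element
            out.append(f"<{name}>")
            i = m.end()
            continue
        ch = text[i]
        if ch == "/" and open_net:
            out.append(f"</{open_net.pop()}>")  # null end tag
        elif ch == ">" and open_net:
            out.append("&gt;")  # a '>' after '<foo/' is character data
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

With this sketch, expand_net("<head>  <title /my title/ </head>") returns
"<head>  <title>my title</title> </head>".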
Comment 8 Heikki Toivonen (remove -bugzilla when emailing directly) 2001-08-10 09:59:17 PDT
Just out of curiosity: why are you implementing this? Do you plan to make
Mozilla into an SGML browser (a la DocZilla) or what? I don't think there are
too many HTML authors that know about SGML's NET (null end tag). I have an SGML
background but I have never seen NET being used in a production environment.
Comment 9 Andreas M. "Clarence" Schneider 2001-08-10 10:25:09 PDT
Heikki: I noticed that we treat "/>" wrong, and because HTML *is* an SGML language
I think Mozilla should support basic SGML parsing. I'm familiar with the parser
code, so it wouldn't be too hard for me to implement this.

If you think we shouldn't support such SGML features, we should discuss it in
the newsgroups and decide then if this ever should be fixed or if it's a wontfix.
Comment 10 Heikki Toivonen (remove -bugzilla when emailing directly) 2001-08-10 11:16:03 PDT
Well, we are seeing a few sites (I think even top100) abuse /> syntax in HTML
(like <select />My item</select>). Besides <br /> (notice the space) has been
documented as the way to encode your empty tags for now if you plan a transition
to XHTML because it works in most browsers. If/when the time comes when you want
to switch to XHTML for real, you can simply switch the mime type. My guess is
people also use the HTML doctype instead of XHTML doctype.

I am a little scared that we will break sites that work now. I think I am
against this change if it is just for standards compliance. After all, we need
to be as lenient as possible with HTML. We are already receiving too much flak
with our handling of bad HTML.
Comment 11 Andreas M. "Clarence" Schneider 2001-08-10 14:56:08 PDT
AIUI, <br /> has been documented in http://www.w3.org/TR/xhtml1/#guidelines for
documents containing an XHTML doctype only. I understand your concerns, but why
do we have strict parsing mode then? Nearly nobody uses comments like
"<!---- --> -- >", but we parse them nevertheless. That breaks pages too.

I noticed that Netscape no longer mentions standards compliance as a reason
to use Netscape 6.1 ( http://home.netscape.com/browsers/6/switch.html ). We
might indeed consider HTML as a dying language, limit our support for it to the
extent other browsers have supported it in the past and continue to have the
same old bugs. But then we shouldn't claim conformance with HTML 4.x. The SGML
declaration of HTML 4.01 includes NET functionality.

I'm not very keen on fixing this. But I don't want us to just ignore such
bugs (there are others too, e.g. bug 47522, and I could file some more). If we
decide not to fix them, that's OK IMO, as long as we don't keep it secret that
we're going to drop full HTML 4 support.
Comment 12 Heikki Toivonen (remove -bugzilla when emailing directly) 2001-08-10 15:02:37 PDT
Is someone using NET in HTML, and expecting it to work? Can you show me browsers
where it works? If there are no people using this feature, and/or no browsers
supporting it I don't see the point (especially if it can break other pages).
Comment 13 Andreas M. "Clarence" Schneider 2001-08-10 15:45:30 PDT
Nobody is using it because it doesn't work anywhere yet. But it's still a bug as
long as we claim conformance to HTML 4. We might do better to fix bugs like
bug 74201 instead, though.
Comment 14 Boris Zbarsky [:bz] 2002-05-18 10:27:25 PDT
Note that the HTML parser now parses <foo /> as just <foo> (instead of the
<foo></foo> it used to treat it as).
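
As a rough illustration of that behavior (not the parser's actual code), the
Python sketch below rewrites an XML-style empty tag into a plain start tag. The
helper name drop_trailing_slash is hypothetical, and tags with attributes are
not handled.

```python
import re

# '<name/>' or '<name />' with no attributes; end tags are not matched.
XML_STYLE_EMPTY = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*/>")


def drop_trailing_slash(html):
    """Treat '<foo />' as just '<foo>' by ignoring the trailing slash."""
    return XML_STYLE_EMPTY.sub(r"<\1>", html)
```

For example, drop_trailing_slash("<select />My item</select>") returns
"<select>My item</select>".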
Comment 15 Christopher Hoess (gone) 2002-10-28 05:39:26 PST
*** Bug 177075 has been marked as a duplicate of this bug. ***
Comment 16 Hixie (not reading bugmail) 2002-10-28 07:24:49 PST
From bug 177075:

We should implement the null end tag SHORTTAG feature of HTML.

For elements with close tags that would mean looking for a "/" instead of a
"</foo>"; for tags that have no close tags, it would mean accepting "/" as an
end character as well as ">". Note that other elements can be nested inside
elements that use a NET.

This should only be done in standards mode, not in quirks mode. (In fact, I
would even suggest avoiding doing it in almost standards mode, since those
documents are numerous, and usually XHTML.)

Testcases: http://www.hixie.ch/tests/adhoc/html/parsing/shorttag/net/


Reassigning to bz since the current assignee has not worked on this for a while
and I'd like this to be on someone's radar.

This sounds relatively easy to implement.
Comment 17 Boris Zbarsky [:bz] 2002-10-28 08:42:43 PST
Um.. I never plan to work on this.  I never thought I'd see Ian make the "this
sounds relatively easy to implement" mistake... ;)  No substantive change to the
tokenizer is easy and testing the cascading effects on the dtd and the content
sink (and then fixing the problems) would take forever.  If I _do_ decide to
mess with the tokenizer, I'll be fixing our hang bugs and line number
miscounting and the like....

Sorry, but reassigning to default component owner.
Comment 18 Boris Zbarsky [:bz] 2002-10-28 08:42:59 PST
.
Comment 19 Blake Kaplan (:mrbkap) 2004-10-15 16:56:39 PDT
While this is implementable, this would break XHTML files served as text/html.
Making this standards only makes implementing it twice as hard anyway. Also no
other major browser implements this feature, so I'm marking this WONTFIX.
Comment 20 Hixie (not reading bugmail) 2004-11-03 21:26:19 PST
bah!
