Closed Bug 94284 Opened 23 years ago Closed 20 years ago

Implement null end tags (SHORTTAG NET) <foo/.../

Categories

Component: Core :: DOM: HTML Parser
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX

People

Reporter: c
Assignee: Unassigned

Details

Keywords: testcase
Whiteboard: [Hixie-P2]

Currently we support XML-style empty tags in HTML. This should be quirks mode
only, because it blocks implementation of correct SGML behavior: "<foo/" is a
NET-enabling (null end tag) start tag, and the element it opens should be closed
by a single '/' or by the next start tag, but not by a '>'.

I expect few problems with broken pages: if we implement null end tags at the
same time, a typical use like <title/><... will become the same as
<title>&gt;</title><..., which is not too bad IMO. I do not yet know what the
correct behavior is for elements declared as empty. <br/> might cause
problems.
Assigning this to myself, but low priority for now.
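The SGML behavior described above can be illustrated with a toy tokenizer. This is a sketch under stated assumptions, not Mozilla's nsHTMLTokenizer: tags carry no attributes or internal whitespace, EMPTY_ELEMENTS is a hypothetical three-element subset of the elements HTML 4 declares EMPTY, and a '/' in content is assumed to close the innermost open NET-enabled element (plus anything opened inside it). End tags implied by the DTD, such as the "closed by the next start tag" case, are not modelled.

```python
import re

# Hypothetical subset of HTML 4 elements declared EMPTY (no content, no end tag).
EMPTY_ELEMENTS = {"br", "hr", "img"}
NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def tokenize(src: str) -> list[tuple[str, str]]:
    """Toy SGML-style tokenizer with null end tags (no attributes, no whitespace in tags)."""
    tokens: list[tuple[str, str]] = []
    open_elems: list[tuple[str, bool]] = []  # (name, opened by a NET-enabling "<name/"?)
    text: list[str] = []

    def flush_text() -> None:
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    def close_from(k: int) -> None:
        # Close open_elems[k] and everything opened after it, innermost first.
        for name, _ in reversed(open_elems[k:]):
            tokens.append(("end", name))
        del open_elems[k:]

    i = 0
    while i < len(src):
        ch = src[i]
        if src.startswith("</", i):
            # Ordinary end tag: here the '/' belongs to "</", it is not a NET.
            m = NAME.match(src, i + 2)
            if m and src.startswith(">", m.end()):
                flush_text()
                names = [n for n, _ in open_elems]
                name = m.group().lower()
                if name in names:  # end tags for elements that aren't open are dropped
                    close_from(len(names) - 1 - names[::-1].index(name))
                i = m.end() + 1
                continue
        elif ch == "<":
            m = NAME.match(src, i + 1)
            delim = src[m.end():m.end() + 1] if m else ""
            if delim in (">", "/"):
                flush_text()
                name = m.group().lower()
                tokens.append(("start", name))
                if name not in EMPTY_ELEMENTS:
                    # "<name/" enables NET: the element ends at the next "/", not at ">".
                    open_elems.append((name, delim == "/"))
                # For an EMPTY element, "<br/" is already complete; a following ">" is text.
                i = m.end() + 1
                continue
        elif ch == "/" and any(net for _, net in open_elems):
            flush_text()
            # NET: close the innermost NET-enabled element (and anything still open inside it).
            close_from(max(k for k, (_, net) in enumerate(open_elems) if net))
            i += 1
            continue
        text.append(ch)  # anything else, including a "<" that starts no tag, is text
        i += 1
    flush_text()
    return tokens
```

With this sketch, tokenize("<foo/bar/") yields a foo start tag, the text "bar" and a foo end tag, while tokenize("<br/>") yields a br start tag followed by the literal text ">", which is the <br/> problem mentioned above.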
Priority: -- → P5
Look around line 617 of nsHTMLTokenizer.cpp, and possibly also line 765; an
extra condition there that checks the parsing-mode flag should do it.
But that won't implement null end tags. We need to decide what to do instead.
OK, we could simply ignore the '/', but then nobody would understand why we are
doing this.
Please note that XHTML sent as text/html activates standards mode *and*
contains XML-style tags.
Maybe related to bug 84633? (That one was marked WONTFIX :-))
See also bug 84939.
Would this fix that kind of bug (in quirks mode)?
Henri: Yes, this doesn't apply to XHTML served as text/html.

Bernard: Actually, this would fix bugs like bug 84633 if applied to quirks mode
(aside from inserting a '>' into the element), but it would break other things
instead. So I propose keeping XHTML-like behavior in quirks mode (as it is now)
and correct SGML/HTML behavior in standards mode.
Clarence: I was just looking at the parts of the code that allow XML-style empty
tags to be parsed in HTML; they will have to be put inside a conditional so that
they only work in quirks mode or in the XHTML and XML modes. To allow "/" to
close the tag, I think you'll want to modify the conditional on line 545, and
probably the ConsumeEndTag function on line 806.
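The mode-dependent conditional proposed above could look roughly like the following sketch. Mode and interpret_slash_tag are hypothetical names, not the tokenizer's real flags or functions; the sketch only shows how the literal input "<foo/>" would be read under the proposed split: XML-style empty elements in quirks, XHTML and XML modes, and SGML NET in standards mode.

```python
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical parse-mode flags (the real tokenizer uses different names)."""
    QUIRKS = auto()
    STANDARDS = auto()
    XHTML = auto()
    XML = auto()


def interpret_slash_tag(name: str, mode: Mode, declared_empty: bool) -> tuple[list[tuple[str, str]], bool]:
    """Read the literal input f"<{name}/>" under the proposed mode split.

    Returns the tokens produced and whether a NET-enabled element is left open.
    """
    if mode in (Mode.QUIRKS, Mode.XHTML, Mode.XML):
        # XML-style empty tag: "/>" ends both the start tag and the element.
        return [("start", name), ("end", name)], False
    # Standards mode (SGML): "<name/" ends the start tag, so ">" is character data.
    # A non-EMPTY element stays open until the next "/" (the null end tag).
    return [("start", name), ("text", ">")], not declared_empty
```

Keeping the decision in one function mirrors the "push it into a conditional" idea: the quirks branch preserves today's behavior, and only the standards branch changes.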
I know the code because I'm rewriting it for bug 57724. If '/' closes a tag, the
next '/' should be recognized as an end tag closing the element too, e.g.
<head>  <title /my title/ </head>
Just out of curiosity: why are you implementing this? Do you plan to make
Mozilla into an SGML browser (a la DocZilla) or what? I don't think there are
many HTML authors who know about SGML's NET (null end tag). I have an SGML
background, but I have never seen NET used in a production environment.
Heikki: I noticed that we handle "/>" incorrectly, and because HTML *is* an SGML
language, I think Mozilla should support basic SGML parsing. I'm familiar with
the parser code, so it wouldn't be too hard for me to implement this.

If you think we shouldn't support such SGML features, we should discuss it in
the newsgroups and then decide whether this should ever be fixed or is a WONTFIX.
Well, we are seeing a few sites (I think even some in the top 100) abuse the />
syntax in HTML (like <select />My item</select>). Besides, <br /> (notice the
space) has been documented as the way to encode your empty tags for now if you
plan a transition to XHTML, because it works in most browsers. If/when the time
comes to switch to XHTML for real, you can simply switch the MIME type. My guess
is that people also use the HTML doctype instead of the XHTML doctype.

I am a little scared that we will break sites that work now. I think I am
against this change if it is just for standards compliance. After all, we need
to be as lenient as possible with HTML. We are already receiving too much flak
for our handling of bad HTML.
AIUI, <br /> is documented in http://www.w3.org/TR/xhtml1/#guidelines only for
documents containing an XHTML doctype. I understand your concerns, but then why
do we have a strict parsing mode? Hardly anybody uses comments like
"<!---- --> -- >", but we still parse them the SGML way, and that breaks pages too.

I noticed that Netscape no longer mentions standards compliance as a reason
to use Netscape 6.1 (http://home.netscape.com/browsers/6/switch.html). We
might indeed consider HTML a dying language, limit our support for it to the
extent other browsers have supported it in the past, and continue to have the
same old bugs. But then we shouldn't claim conformance with HTML 4.x. The SGML
declaration of HTML 4.01 includes NET functionality.

I'm not very keen on fixing this. But I don't want us to just ignore such
bugs (there are others too, e.g. bug 47522, and I could file some more). If we
decide not to fix them, IMO that's OK as long as we don't keep it secret that
we're dropping full HTML 4 support.
Is anyone using NET in HTML and expecting it to work? Can you show me browsers
where it works? If nobody uses this feature and/or no browsers support it, I
don't see the point (especially if it can break other pages).
Nobody uses it because it doesn't work anywhere yet. But it's still a bug as
long as we claim conformance to HTML 4. We might be better off fixing bugs like
bug 74201 instead, though.
QA Contact: bsharma → moied
Depends on: 107904
Note that the HTML parser now parses <foo /> as just <foo> (instead of treating
it as <foo></foo>, as it used to).
*** Bug 177075 has been marked as a duplicate of this bug. ***
From bug 177075:

We should implement the null end tag SHORTTAG feature of HTML.

For elements with close tags, that would mean looking for a "/" instead of a
"</foo>"; for tags that have no close tags, it would mean accepting "/" as an
end character as well as ">". Note that other elements can be nested inside
elements that use a NET.

This should only be done in standards mode, not in quirks mode. (In fact, I
would even suggest not doing it in almost standards mode, since those
documents are numerous and usually XHTML.)

Testcases: http://www.hixie.ch/tests/adhoc/html/parsing/shorttag/net/


Reassigning to bz since the current assignee has not worked on this for a while
and I'd like this to be on someone's radar.

This sounds relatively easy to implement.
Assignee: c → bzbarsky
Keywords: testcase
Summary: Make XML-style empty tags in HTML like <foo/> quirks mode only → Implement null end tags (NET) <foo/.../
Whiteboard: [Hixie-P2]
Um... I never plan to work on this. I never thought I'd see Ian make the "this
sounds relatively easy to implement" mistake... ;) No substantive change to the
tokenizer is easy, and testing the cascading effects on the DTD and the content
sink (and then fixing the problems) would take forever. If I _do_ decide to
mess with the tokenizer, I'll be fixing our hang bugs, line-number miscounting
and the like...

Sorry, but reassigning to the default component owner.
Assignee: bzbarsky → harishd
Target Milestone: --- → Future
Priority: P5 → --
Target Milestone: Future → ---
Assignee: harishd → parser
Summary: Implement null end tags (NET) <foo/.../ → Implement null end tags (SHORTTAG NET) <foo/.../
While this is implementable, it would break XHTML files served as text/html.
Making it standards-mode-only would make implementing it twice as hard anyway.
Also, no other major browser implements this feature, so I'm marking this WONTFIX.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX