Currently we support XML-style empty tags in HTML. This should be quirks mode only because it blocks implementation of correct SGML behavior: "<foo/" is a NET (null end tag) enabling start tag and should be closed by a single '/' or by the next start tag, but not by a '>'. I expect few problems with broken pages, because a typical use like <title/><... will become the same as <title>></title><... if we implement null end tags at the same time. That's not too bad IMO. I do not yet know what the correct behavior is for elements declared as empty; <br/> might cause problems. Assigning this to myself, but low priority for now.
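To make the difference concrete, here is a minimal sketch of the two interpretations described above. It is not the nsHTMLTokenizer.cpp code: the Mode enum, the Tokenize function and the "S:"/"E:"/"T:" token strings are all hypothetical names made up for illustration. It ignores attributes, comments and entities, and it follows this comment's reading that a NET-enabled element is closed by the next '/' or by the next tag.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parsing modes; the real parser has its own mode flag.
enum class Mode { Quirks, Standards };

// Returns tokens rendered as "S:name" (start tag), "E:name" (end tag)
// and "T:text" (character data). Attributes are not handled.
std::vector<std::string> Tokenize(const std::string& in, Mode mode) {
  std::vector<std::string> out;
  std::string netOpen;  // element waiting for its null end tag, if any
  std::string text;
  auto flush = [&] {
    if (!text.empty()) { out.push_back("T:" + text); text.clear(); }
  };
  std::size_t i = 0;
  while (i < in.size()) {
    char c = in[i];
    if (c == '/' && !netOpen.empty()) {  // a lone '/' is the null end tag
      flush();
      out.push_back("E:" + netOpen);
      netOpen.clear();
      ++i;
      continue;
    }
    if (c != '<') { text += c; ++i; continue; }
    flush();
    if (!netOpen.empty()) {  // assumption: the next tag also closes it
      out.push_back("E:" + netOpen);
      netOpen.clear();
    }
    ++i;  // skip '<'
    bool isEnd = i < in.size() && in[i] == '/';
    if (isEnd) ++i;
    std::string name;
    while (i < in.size() && std::isalnum(static_cast<unsigned char>(in[i])))
      name += in[i++];
    while (i < in.size() && in[i] == ' ') ++i;
    if (isEnd) {
      out.push_back("E:" + name);
      if (i < in.size() && in[i] == '>') ++i;
      continue;
    }
    out.push_back("S:" + name);
    if (i < in.size() && in[i] == '/') {
      ++i;
      if (mode == Mode::Standards) {  // SGML: NET-enabling start tag
        netOpen = name;               // a following '>' becomes content
        continue;
      }
      out.push_back("E:" + name);     // quirks: XML-style empty tag
    }
    if (i < in.size() && in[i] == '>') ++i;
  }
  flush();
  if (!netOpen.empty()) out.push_back("E:" + netOpen);
  return out;
}
```

With this sketch, "<title/><p>" yields S:title, E:title, S:p in Mode::Quirks, and S:title, T:>, E:title, S:p in Mode::Standards, which matches the <title>></title> outcome described above.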
Look around line 617 of nsHTMLTokenizer.cpp, and possibly also line 765...an extra condition there checking the flag for parsing mode should do it.
But that won't implement null end tags. We need to decide what to do instead. OK, we could simply ignore the '/', but then nobody would understand why we are doing this.
Please note that XHTML sent as text/html activates standards mode *and* contains XML-style tags.
Henri: Yes, this doesn't apply to XHTML served as text/html. Bernard: Actually this would fix bugs like bug 84633 if applied to quirks mode (aside from inserting a '>' into the element), but it would break other things instead. So I propose behavior like in XHTML for quirks mode (as it is now) and correct SGML/HTML behavior for standards mode.
Clarence: I was just looking at the parts of the code that allow XML-style empty tags to be parsed in HTML, which will have to be pushed into a conditional so that they only work in Quirks Mode or XHTML and XML modes. For modifying things to allow "/" to close the tag, I think you'll want to modify the conditional on line 545, and probably the ConsumeEndTag function on line 806.
I know the code because I'm rewriting it for bug 57724. If '/' closes a tag, the next '/' should be recognized as the end tag closing the element too, e.g. <head> <title /my title/ </head> .
Just out of curiosity: why are you implementing this? Do you plan to make Mozilla into an SGML browser (a la DocZilla) or what? I don't think there are too many HTML authors that know about SGML's NET (null end tag). I have an SGML background but I have never seen NET being used in a production environment.
Heikki: I noticed that we treat "/>" wrong and because HTML *is* an SGML language I think Mozilla should support basic SGML parsing. I'm familiar with the parser code, so it wouldn't be too hard for me to implement this. If you think we shouldn't support such SGML features, we should discuss it in the newsgroups and decide then if this ever should be fixed or if it's a wontfix.
Well, we are seeing a few sites (I think even top100) abuse /> syntax in HTML (like <select />My item</select>). Besides <br /> (notice the space) has been documented as the way to encode your empty tags for now if you plan a transition to XHTML because it works in most browsers. If/when the time comes when you want to switch to XHTML for real, you can simply switch the mime type. My guess is people also use the HTML doctype instead of XHTML doctype. I am a little scared that we will break sites that work now. I think I am against this change if it is just for standards compliance. After all, we need to be as lenient as possible with HTML. We are already receiving too much flak with our handling of bad HTML.
AIUI, <br /> has been documented in http://www.w3.org/TR/xhtml1/#guidelines for documents containing an XHTML doctype only. I understand your concerns, but why do we have strict parsing mode then? Hardly anybody uses comments like "<!---- --> -- >", but we parse them nevertheless. That breaks pages too. I noticed that Netscape no longer mentions standards compliance as a reason to use Netscape 6.1 ( http://home.netscape.com/browsers/6/switch.html ). We might indeed consider HTML a dying language, limit our support for it to the extent other browsers have supported it in the past and continue to have the same old bugs. But then we shouldn't claim conformance with HTML 4.x. The SGML declaration of HTML 4.01 includes NET functionality. I'm not very keen on fixing this. But I don't want us to just ignore such bugs (there are others too, e.g. bug 47522, and I could file some more). If we decide not to fix them, it's IMO OK as long as we don't keep it secret that we're going to drop full HTML 4 support.
Is anyone using NET in HTML, and expecting it to work? Can you show me browsers where it works? If there are no people using this feature, and/or no browsers supporting it, I don't see the point (especially if it can break other pages).
Nobody is using it because it doesn't work anywhere yet. But it's still a bug as long as we claim conformance to HTML 4. It might be better to fix bugs like bug 74201 instead, though.
Note that the HTML parser now parses <foo /> as just <foo> (instead of the <foo></foo> it used to treat it as).
*** Bug 177075 has been marked as a duplicate of this bug. ***
From bug 177075: We should implement the null end tag SHORTTAG feature of HTML. For elements with close tags that would mean looking for a "/" instead of a "</foo>"; for tags that have no close tags, it would mean accepting "/" as an end character as well as ">". Note that other elements can be nested inside elements with a NET. This should only be done in standards mode, not in quirks mode. (In fact, I would even suggest avoiding doing it in almost standards mode, since those documents are numerous, and usually XHTML.) Testcases: http://www.hixie.ch/tests/adhoc/html/parsing/shorttag/net/ Reassigning to bz since the current assignee has not worked on this for a while and I'd like this to be on someone's radar. This sounds relatively easy to implement.
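As a rough sketch of the mode gating suggested here, and of what "accepting '/' as an end character" would mean for a tag with no close tag such as <br />: the ParseMode enum and the NetEnabled and EmptyTagLength functions below are hypothetical names for illustration, not existing parser code, and the handling of a '/' that is not followed by '>' outside NET mode is my own assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mode enum; the real parser tracks its mode differently.
enum class ParseMode { Quirks, AlmostStandards, FullStandards };

// Gate as suggested above: NET only in full standards mode, so quirks
// and almost standards documents keep the XML-style behavior.
bool NetEnabled(ParseMode mode) { return mode == ParseMode::FullStandards; }

// Given the characters following the name of a tag that has no close
// tag (e.g. the " />" of "<br />"), returns how many of them belong to
// the tag itself. Whatever remains is ordinary content.
std::size_t EmptyTagLength(const std::string& rest, ParseMode mode) {
  for (std::size_t i = 0; i < rest.size(); ++i) {
    if (rest[i] == '>') return i + 1;      // ordinary tag close
    if (rest[i] == '/') {
      if (NetEnabled(mode)) return i + 1;  // '/' alone ends the tag
      // XML-style: "/>" together closes the tag.
      if (i + 1 < rest.size() && rest[i + 1] == '>') return i + 2;
      // Assumption: any other '/' is skipped and scanning continues.
    }
  }
  return rest.size();  // unterminated tag: consume everything
}
```

For the input " />" this returns 2 in FullStandards, leaving the '>' behind as character data, and 3 in Quirks and AlmostStandards. That leftover '>' is the kind of breakage for <br /> that the earlier comments worry about.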
Um.. I never plan to work on this. I never thought I'd see Ian make the "this sounds relatively easy to implement" mistake... ;) No substantive change to the tokenizer is easy and testing the cascading effects on the dtd and the content sink (and then fixing the problems) would take forever. If I _do_ decide to mess with the tokenizer, I'll be fixing our hang bugs and line number miscounting and the like.... Sorry, but reassigning to default component owner.
While this is implementable, this would break XHTML files served as text/html. Making this standards only makes implementing it twice as hard anyway. Also no other major browser implements this feature, so I'm marking this WONTFIX.