Closed Bug 101600 Opened 20 years ago Closed 17 years ago
If Transitional DOCTYPE does not end in //EN, standards mode is triggered
I have two documents containing basically the same content, but written in different languages, therefore having different DOCTYPEs.

I read bug 42525 - Additional Comments From Henri Sivonen 2000-11-01 13:51
[...]
* Among others, no doctype, HTML 2.0, HTML 3.2 and these doctype declarations trigger the quirks layout mode:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
[...]

My documents use the following DOCTYPEs:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//en">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//de">

The URLs for those documents are (same order as DOCTYPEs above):
http://www.kairo.at/error404-main.html.en
http://www.kairo.at/error404-main.html.de

If you look at the documents, you'll see the contents of the .en document get almost centered vertically. It uses a table height of 70%, which is only rendered that way in quirks mode. In the .de document, there is no centering, because standards mode is triggered.

I believe that //en or //de in the DOCTYPE shouldn't result in treating the document that differently, or you would assume that English documents are buggier than those of other languages...
reassigning to dbaron, who wrote that code.
Assignee: attinasi → dbaron
*** Bug 101686 has been marked as a duplicate of this bug. ***
dbaron's new doctype sniffing code has a list of quirky doctypes. Anything else gets the standards treatment for forward compatibility purposes. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//de"> is not on the list of quirky doctypes, because it was thought not to exist on the Web, as there is no such W3C doctype. The public identifier is bogus, because it contains "de" where the right chars would be "EN". The substring "EN" indicates the natural language in which the normative version of the markup language specification is written. The string is always "EN" for HTML, because only the English version of the specification is normative. Also, in the real public identifier, the string "EN" should be in upper case. Composer 4.x emits a doctype with bogus case, so the matching of quirky doctypes is case-insensitive. Now that I think about it, I have seen this once before in the Mozilla newsgroups. Someone had substituted "EN" with "PL" for a Polish page.
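As a rough illustration of the behaviour described in this comment (not the actual Mozilla code), the list-based approach might look like the Python sketch below. The function name `layout_mode` and the set `QUIRKY_PUBLIC_IDS` are hypothetical, the set contains only public identifiers quoted in this bug, and system identifiers (URLs) are ignored for simplicity.

```python
# Hypothetical sketch of list-based doctype sniffing; not Mozilla's real code.
# Only public identifiers quoted in this bug are listed, stored in lower case
# so that the comparison is case-insensitive.
QUIRKY_PUBLIC_IDS = {
    "-//w3c//dtd html 4.01 transitional//en",
    "-//w3c//dtd html 4.0 transitional//en",
}


def layout_mode(public_id):
    """Return 'quirks' or 'standards' for a doctype public identifier.

    Pass None when the document has no doctype at all.
    """
    if public_id is None:
        # The absence of a doctype triggers quirks mode.
        return "quirks"
    if public_id.lower() in QUIRKY_PUBLIC_IDS:
        # Known quirky doctype, matched regardless of case.
        return "quirks"
    # Anything unknown gets standards mode for forward compatibility.
    return "standards"


print(layout_mode("-//W3C//DTD HTML 4.0 Transitional//en"))
print(layout_mode("-//W3C//DTD HTML 4.0 Transitional//de"))
```

Under these assumptions the //en variant is matched despite its lower case, while the //de variant falls through to standards mode because it is not on the list.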
Note: the //EN part of the DOCTYPE does *not* refer to the language of the document. It refers to the language of the HTML elements. Therefore, it is always English. To change the language of the document, you should use the 'lang' attribute on the 'html' element, as in: <html lang="de"> ... </html>
*** Bug 102495 has been marked as a duplicate of this bug. ***
Why is the 'standards treatment' not the same as the 'EN' treatment, when 'EN' is the standard? This would help many visitors of such wrong websites.
>Why is the 'standards treatment' not the same as the 'EN' treatment, when >'EN' is the standard? There are too many sites out there that use the HTML 4.0 Transitional doctype and don't comply with the W3C Recommendations. However, the main reason is bug 22274. See http://www.hut.fi/u/hsivonen/doctype on how to use the standards mode with transitional docs. BTW, KaiRo, how did you happen to put //de there? Is there an erroneous tutorial somewhere out there?
> Why is the 'standards treatment' not the same as the 'EN' treatment, when 'EN' > is the standard? We have a list of known existing doctypes that should trigger quirks mode (as should the absence of a doctype). To be future-compatible, all other doctypes trigger standards mode. I'm hesitant to add variants for all sorts of languages to this list just because a handful of people fiddle with public identifiers thinking that they represent the language of the document. See: http://www.people.fas.harvard.edu/~dbaron/mozilla/doctypes
If I understand you correctly, this is a web designer's bug, and we don't want to add additional "buggy" DOCTYPEs to our quirks list, so this is a WONTFIX and it should be marked as that... BTW, the pages I initially used to file this bug are fixed and now show correct doctypes (but also doctypes that trigger standards and not quirks mode)
BTW, Henri, this was no erroneous documentation, this was just wrong guessing at my side :(
*** Bug 110265 has been marked as a duplicate of this bug. ***
In my opinion a good browser should be able to handle mistakes of web designers, at least the most frequent ones. IE, Opera, Netscape and even Mozilla up to 0.94 can do it. Why not 0.95 and newer ones? If I understand you correctly, there can't be anything other than 'EN' after '//'. So why don't we ignore these two letters and assume there is 'EN'? I don't know if this is the right way to handle this, but you can't correct every web designer, and the end user doesn't care whose fault it is. The only thing he can notice is: every other browser can handle it and Mozilla can't.
The reason is that we're trying to parse doctype declarations in a forward-compatible way. See http://mozilla.org/docs/web-developer/quirks/ Marking as WONTFIX.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Do you really think there will be any other language for HTML than English in the future??? I don't. But many websites with this bug are a fact and will be there in the future.
4 doesn't qualify as many.
So you won't fix it. OK. I don't know how many sites with that bug are out there, but I can imagine there are more than four. Perhaps, when I have time, I will make a working version for myself.
I found a similar problem at http://www.clubic.com This time it's with //FR. This page uses an obviously incorrect doctype of !DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//FR". I feel like Mozilla's doctype detection should not be so tight with doctypes that are obviously old and should be in quirks mode. I think it would be better if the doctype detection were more of a pattern matching, so "-//W3C//DTD HTML 3. anything" would force quirks mode. I don't see the harm of this for compatibility. I'd say the same thing about 4.0 Transitional. I seriously doubt that HTML 4.0 will ever support a different language, so why not do pattern matching (or just match the substring) instead of an exact string match for older doctypes? (Just noticed that I appear to be saying the same thing as comment #12.) BTW, this site reportedly worked in Mozilla in 0.9.4 and previous and also Netscape 6.2 and previous.
In your comment:

------- Additional Comment #17 From Tim Powell 2002-02-06 08:36 -------
I found a similar problem at http://www.clubic.com This time it's with //FR. This page uses an obviously incorrect doctype of !DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//FR".

This would be an evangelism issue. If you find sites that are incorrectly using the doctype, then please feel free to open a new bug pointing to the URL and assign the bug to evangelism. Our evang team can contact those sites letting them know the appropriate constructs.
Evangelism shouldn't be necessary for this type of issue. It wastes many people's time: Web developers who need to change doctypes for one browser when others work, Mozilla evangelists who need to try to persuade web developers to make the change, users and bugzilla QA who try to figure out the problem, Mozilla developers who get bug reports and then need to try and figure out if this is a real bug or a markup problem. And the end result after successfully persuading the web developer to change things is exactly what? Mozilla now works like it should have in the first place? There's no benefit here. I can see doing evangelism for broken nav4 versus mozilla/netscape 6 code or for DHTML. There's little logic to this. I'm failing to see how treating older and somewhat broken doctypes as quirks hurts anybody. Because of the real time drain and for compatibility (see the compat keyword's description) this bug should be reopened and fixed.
Without question, the person who put //FR in the doctype made a mistake. email@example.com, do you have a proposed algorithm described in more detail than "DWIM"? Try searching the discussions on the W3C's mailing lists and in comp.infosystems.www.* and you'll find that elaborate DWIM gets more confusing than using a simple list of doctypes.
> Without question, the person who put //FR in the doctype made a mistake. Yes. No argument there. How should Mozilla handle the mistake? Should it intentionally break a page (handle as strict) that works in other browsers? It's obvious from the DTD that this should be handled by quirks. > do you have a proposed algorithm described in more detail than "DWIM" I thought I was being clear above, but in case I wasn't, I'm suggesting that Mozilla just do substring matches instead of exact string matches of the following DTDs and treat them as quirks: -//W3C//DTD HTML 4.01 T, -//W3C//DTD HTML 4.0 T, -//W3C//DTD HTML 3., and -//W3C//DTD HTML 2. I suggest 4.0 and 4.01 T to make up for likely typos in the word Transitional, but don't feel strongly about this. I would also think that the comparisons should be done case-insensitively. It sounds from comment #3 like that is already the case, so that's good.
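For concreteness, the proposal in this comment could be sketched in Python roughly as follows. This is only an illustration of the suggestion, not anything Mozilla implemented. It reads "substring match" as a match at the start of the public identifier, and the names `QUIRKY_PREFIXES` and `layout_mode_by_prefix` are hypothetical.

```python
# Hypothetical sketch of the proposed matching; not Mozilla's real code.
# The four strings are the ones listed in the comment, lower-cased so the
# comparison can be done case-insensitively.
QUIRKY_PREFIXES = (
    "-//w3c//dtd html 4.01 t",
    "-//w3c//dtd html 4.0 t",
    "-//w3c//dtd html 3.",
    "-//w3c//dtd html 2.",
)


def layout_mode_by_prefix(public_id):
    """Return 'quirks' if the public identifier starts with a listed prefix.

    Pass None when the document has no doctype at all.
    """
    if public_id is None:
        return "quirks"
    # str.startswith accepts a tuple and returns True if any prefix matches.
    if public_id.lower().startswith(QUIRKY_PREFIXES):
        return "quirks"
    return "standards"


print(layout_mode_by_prefix("-//W3C//DTD HTML 3.2//FR"))
print(layout_mode_by_prefix("-//W3C//DTD HTML 4.01//EN"))
```

With this reading, the //FR and //de variants mentioned in this bug land in quirks mode, while an identifier such as "-//W3C//DTD HTML 4.01//EN" still gets standards mode because none of the listed prefixes matches it.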
The reason we do this is that guessing algorithms just aren't forward compatible. You seem to have forgotten about (or never known about) the problem that we behaved differently for 4.01 Transitional doctypes depending on whether the URI used was the TR/REC-html40/ URL or the TR/REC-html40-YYYYMMDD/ URL because it changed whether the word "loose" was in the first 25 characters of the URI.
I fail to see how a substring match is "guessing". I remain (blissfully?) unaware of any problem with loose in a 4.01 URI but it sounds awful. It seems to me that Mozilla could quite reasonably consider all HTML 2.x and HTML 3.x DTDs as needing quirks mode regardless of URI. This would also be consistent with IE6 at least as its doctype handling is documented: http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/doctype.asp It would also seem that the matching for 4.0x Transitional could ignore the language such as //EN (but care about the URI).
*** Bug 104372 has been marked as a duplicate of this bug. ***
*** Bug 159625 has been marked as a duplicate of this bug. ***
*** Bug 166260 has been marked as a duplicate of this bug. ***
*** Bug 170949 has been marked as a duplicate of this bug. ***
*** Bug 192515 has been marked as a duplicate of this bug. ***
Even <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//DE"> gets rendered in standards mode, which is definitely no Good Idea(tm). Please reopen.
Two arguments for reopening: 1. Many non-English web pages that are typical candidates for being rendered in quirks mode (as they have totally broken HTML, not only in the DOCTYPE) have this, and so they don't work in Mozilla. 2. If such a wrong DOCTYPE doesn't show that the page author didn't understand HTML (and so is quite likely to have made many mistakes that he didn't see in his MSIE), I don't know what can show it.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I'm not convinced.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 18 years ago
Resolution: --- → WONTFIX
OK, reopening, although only to add a few doctypes to the list, not to modify general parsing of what should be an opaque string.
Status: RESOLVED → REOPENED
Priority: -- → P2
Resolution: WONTFIX → ---
Target Milestone: --- → mozilla1.5alpha
Here goes a list of candidates:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//DE">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//DE">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//DE">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//DE">
I'm quite sure that 'strict', 'frameset' and all those with a URL are not hit by this, as editors don't set them as a default (which will be "fixed" by the author). Also found some //FR and one //NL in the dups.
Priority: P2 → --
Target Milestone: mozilla1.5alpha → ---
FWIW, doctype sniffing is confusing already (not to myself but to people who haven't read the code). Every time doctype sniffing changes, it becomes more confusing. And besides, changing this now would make Mozilla and Safari behave differently--making things more confusing. > as editors don't set them as a default Do you mean some bogo-editors set //DE doctypes by default? Or that authors change the editor defaults to //DE?
> And besides, changing this now would make Mozilla and Safari behave differently--making things more confusing. They already behave differently when it comes to SGML comments and to image alignment. Also, khtml always accepts CSS lengths without a unit, but these are the 3 points breaking so many pages. > Do you mean some bogo-editors set //DE doctypes by default? Or that authors change the editor defaults to //DE? I hope only the latter.
I have checked all the duplicates of this bug and all of them seem to have fixed their DOCTYPEs:

Bugzilla    URL                                                      DOCTYPE
bug 101686  http://news.mail.ru/                                     no DOCTYPE at all
bug 102495  http://www.tvtv.de                                       is using //EN
bug 110265  http://www.tvtv.de                                       is using //EN
bug 104372  http://www.clubic.com                                    is using //EN
bug 159625  URL is 404 but other pages at http://www.talkline.de     is using //EN
bug 166260  http://linuxfocus.org/Deutsch/July2002/article239.shtml  using //EN
bug 170949  URL is 404 but other pages at http://www.everlage.de/    is using //EN
bug 192515  no URL provided

As far as I can tell there are very few sites using these invalid DOCTYPEs, and for those that do, is their layout really broken because of it? Does anyone know of a major site that is suffering from this problem? IMO, this problem can and should be handled through evangelization.
*** Bug 209378 has been marked as a duplicate of this bug. ***
*** Bug 221661 has been marked as a duplicate of this bug. ***
Do we still want to add to the list of DOCTYPEs?
QA Contact: petersen → ian
WONTFIX. Not enough sites depend on this.
Status: REOPENED → RESOLVED
Closed: 18 years ago → 17 years ago
Resolution: --- → WONTFIX