Bugzilla

Comment 9

25 years ago

I just wrote a Perl script that traverses dmoz.org and identifies the DOCTYPE on each site. My computer isn't in a position to run it over the whole of dmoz, but I'll attach the script so that anyone who wants to run it over a larger subset of the web can do so (it should run on any system that supports Perl and wget). The results for the first 100 sites (alphabetically by category on dmoz) are as follows:

 7  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 4  <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
 2  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
 1  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
 1  <!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 3.2//EN'>
 1  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
85  NO DOCTYPE

Since only 15 of these have doctypes at all, this is clearly not a large enough sample, but the script can keep going indefinitely...

Suggested tweaks to the script if anyone wants to enhance it:

- Some sort of parallelism (e.g. run 100 threads, or at least open 100 HTTP connections).
- The ability to save its state before terminating and then resume from the same place.
- Pick out the META GENERATOR tag if one exists, so that we can identify what popular editors are putting in the doctype.
- Some kind of crawling within the sites themselves (perhaps to a depth of 2 or 3), rather than only fetching the single page linked from dmoz. The current process doesn't even attempt to find the actual frames in a frameset, let alone linked pages.
- Although wget knows that the file is text/html, the script doesn't retrieve this information. If dmoz links to a plain-text file that has <!doctype somewhere in the middle, you'll get a potentially very strange result...

If I get a chance to address any of these issues, I'll post updated versions of the script in this bug. In the meantime, if anyone else wants to work on it (or just run it over a larger number of sites), go for it. (attachment to follow...)
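The Perl script itself isn't reproduced in this thread. As a rough illustration of its counting step only, here is a C++17 sketch that pulls the first `<!doctype ...>` out of each already-fetched page and tallies the results the way the list above does. The names `extractDoctype` and `tallyDoctypes` are hypothetical, fetching (the wget part) is left out, and because this sketch scans the whole body it has the same plain-text weakness described in the last bullet.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Returns the first "<!doctype ...>" declaration in `html`, matched
// case-insensitively, or std::nullopt if there is none. The whole body is
// scanned, so a plain-text file mentioning "<!doctype" mid-file still matches.
std::optional<std::string> extractDoctype(const std::string& html) {
    static const std::string kMarker = "<!doctype";
    for (std::size_t i = 0; i + kMarker.size() <= html.size(); ++i) {
        bool match = true;
        for (std::size_t j = 0; j < kMarker.size(); ++j) {
            if (std::tolower(static_cast<unsigned char>(html[i + j])) != kMarker[j]) {
                match = false;
                break;
            }
        }
        if (!match) continue;
        std::size_t end = html.find('>', i);
        if (end == std::string::npos) return std::nullopt;  // unterminated declaration
        return html.substr(i, end - i + 1);
    }
    return std::nullopt;
}

// Counts each declaration verbatim, so case variants of the same doctype are
// tallied separately. Pages without a declaration go under "NO DOCTYPE".
std::map<std::string, int> tallyDoctypes(const std::vector<std::string>& pages) {
    std::map<std::string, int> counts;
    for (const auto& page : pages) {
        std::optional<std::string> dt = extractDoctype(page);
        ++counts[dt ? *dt : std::string("NO DOCTYPE")];
    }
    return counts;
}
```

Because keys are compared verbatim, the uppercase and lowercase spellings of the 4.0 Transitional doctype land in separate rows, matching how the survey results above are reported.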

Comment 10

25 years ago

Attached file perl script to find doctypes of sites trawled from dmoz

Henri Sivonen (:hsivonen)

Comment 11

25 years ago

I'd just like to add something (I've already posted this as a bug, which was marked "invalid", and as a comment on another thread). The subject of this bug is one of the things I feel Mozilla is most lacking. There *must* be a way to enable standards-compliant rendering with a transitional doctype. A "quirks mode" must be useful to many people (though not to me), but there must be a way around it without having to switch to a strict doctype.

Why? Take, for example, one widely used HTML authoring tool: Dreamweaver. Pages generated by it have (questionably) transitional content, and they look very good in IE. But if you look at them in Mozilla, you get the same effect as in NS4, which is not as good as IE's.

One example: tables with colored borders (both the outline and the borders inside the table). Dreamweaver does this by setting a cell spacing greater than 0 and a background color on the table. IE shows them correctly (even without a DOCTYPE), because as far as I can tell it is not using a "quirks mode". But Mozilla in quirks mode goes back to NS4's behaviour: it shows the spacing between the cells as a white background instead of the table background, ruining the effect. Using a strict DTD makes Mozilla show them correctly, but it breaks almost the whole page, since Dreamweaver uses transitional syntax.

I'm not a huge fan of Dreamweaver myself (actually, I hate it most of the time), and I'm not a web designer (I do web programming), but the designers here use it most of the time. And because of this, I can say that results are much better in IE than in Mozilla.

If anyone is interested in seeing the example in action, see:
http://www.geocities.com/mvanzin/mozilla/table-strict.htm
http://www.geocities.com/mvanzin/mozilla/table-loose.htm
Same HTML, different DTDs (both with the DTD URL), and different results in Mozilla.

I'm not sure whether my intentions for the second table are correct, but the first one (the case described above) should render the same in both cases (that is, the same as in strict mode). Hope this adds some food for thought. (BTW, I'm adding myself to the CC list.)

Comment 12

25 years ago

This bug is not about transitional doctypes. This is about unknown doctypes. Let's keep the bug focused. (It is possible to activate the standards layout mode for transitional documents. See: http://www.hut.fi/u/hsivonen/doctype.html)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 13

25 years ago

Oops, sorry (the original bug, 42525, was about transitional doctypes; I think I didn't check the summary when moving to this new bug). BTW, thanks for the link. (I think that 4.0 Transitional should also trigger standards mode, but...)

Assignee

Comment 14

25 years ago

The issue about table backgrounds is, I think, covered by bug 4510.

Comment 15

25 years ago

Hmm, maybe I can add something to the discussion then. :-) I'll base my comments on my previous one (using the fact that many people use WYSIWYG HTML tools today, and those tools are mainly focused on producing nice output in IE while maintaining an acceptable result in NS4).

I'd say that using strict mode for unknown doctypes would be a very bad idea in this case. If you look around, you will see very few pages using strict layout, and those that do generally use the correct doctype declaration. The same cannot be said about the much more common case of transitional doctypes: most pages use transitional syntax, and many of them do not contain a doctype declaration. That leaves us with three options: transitional with standards mode, transitional with quirks mode, and the old HTML 3 mode.

The first one is the best, in my opinion. I changed the doctype in documents at work today (using Henri's tip) and the results were better, but with minor quirks I will be watching more closely tomorrow. The second one should also be OK, but then we fall back to my rants above. :-) It would not be my preference to use this mode, but... The third one is out of the question, I think: we would be mostly ignoring CSS, and the results would be horrible.

This is a pretty tricky topic, but I think that it should, at least for now, be resolved based on today's trends in HTML authoring. A nice idea (I think this was already suggested in some way) would be to make one of them the default and create a hidden preference (only editable in the prefs file) to change it. That way, Mozilla can easily adapt to whatever decision is made, and if people later decide to change it, no problems would arise.

Comment 16

25 years ago

Remember that, by definition, "unknown" doctypes *exclude* all doctypes for the most popular authoring tools (because, as described in this bug, we would have to search out and identify what authoring tools use before we could fix this, and then they become "known"). The aim here is, I think, that we should do something like the following:

1) Identify the doctypes that are widely used on the net, including *at least* the ones used by popular authoring tools.
2) Decide what to do with these popular doctypes on a case-by-case basis, and encode that knowledge (the majority of these will probably require quirks mode, with a few exceptions such as XHTML, all STRICT variations and 4.0 Transitional with URL).
3) Treat doctypes that are *still* unknown as if they were strict.

The rationale here is that we need to render the web as it is now (for compatibility), but we also need to render HTML5, XHTML 2, SOMEFUTUREML 7.6, etc., which we don't know about yet. The ones we *do* know about, we can decide what to do with, but the ones we *don't* are ones that haven't been invented yet... and those will *certainly* require strict handling.
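A minimal C++17 sketch of the three steps, assuming the survey from steps 1 and 2 has already produced a table of public IDs with a decided mode for each. The table entries, the `LayoutMode` enum and the function names are illustrative assumptions, not Mozilla's code; public IDs are compared case-insensitively and the system identifier (the "with URL" case) is ignored, both simplifications.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

enum class LayoutMode { Quirks, Standards };

// Lowercases ASCII so "-//W3C//..." and "-//w3c//..." hit the same entry.
std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Steps 1 and 2: surveyed doctypes, each with a decided mode.
// These entries are illustrative, not Mozilla's actual list.
const std::unordered_map<std::string, LayoutMode>& knownDoctypes() {
    static const std::unordered_map<std::string, LayoutMode> table = {
        {"-//w3c//dtd html 3.2 final//en", LayoutMode::Quirks},
        {"-//w3c//dtd html 4.0 transitional//en", LayoutMode::Quirks},
        {"-//softquad software//dtd hotmetal pro 6.0::19990601::extensions to html 4.0//en",
         LayoutMode::Quirks},
        {"-//w3c//dtd html 4.01//en", LayoutMode::Standards},
        {"-//w3c//dtd xhtml 1.1//en", LayoutMode::Standards},
    };
    return table;
}

// Step 3: a public ID nobody has catalogued is assumed to belong to a newer
// language, so it gets standards ("strict") layout.
LayoutMode modeForPublicId(const std::string& publicId) {
    const auto& table = knownDoctypes();
    auto it = table.find(toLowerAscii(publicId));
    return it != table.end() ? it->second : LayoutMode::Standards;
}
```

The design point is the default branch: only catalogued public IDs can opt into quirks mode, so a doctype for a language invented later falls through to standards layout without any change to the table.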

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 17

25 years ago

I agree that future (and thus yet unknown) DTDs should be treated in strict mode. But I think that making such a move *now* would break more things than it fixes, at least until we get more standards compliance from everyone (browsers and authoring tools). Bug 55916 has an example of such behaviour. If a new version of a popular tool is released with a new DTD declaration, what will the result be? Maybe it will still output transitional syntax for better backward compatibility, but declare the DTD in a way that fools the doctype parser into choosing strict parsing... and we would be breaking the rendering again.

Assignee

Comment 18

25 years ago

Nominating for mozilla 0.9. I think we need to fix this for mozilla 1.0 and we need to get a bit of testing in before that happens. I am willing to fix this.

Keywords: rtm → mozilla0.9

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Updated

25 years ago

Blocks: 60511

bsharma

Comment 19

24 years ago

updated qa contact.

QA Contact: janc → bsharma

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 20

24 years ago

Because of bug 55916, I propose we WONTFIX this bug. If we want to encourage standards support, we should be encouraging XHTML, and we already do all of text/xml in standards mode.

Keywords: qawanted

Whiteboard: [rtm-] → WONTFIX?

Assignee

Comment 21

24 years ago

That's a silly reason. We should just add HoTMetaL's DOCTYPE to our list of quirks doctypes. This is crucial for forward compatibility, since AFAICT XHTML won't be able to be sent as text/xml for a long time, as there are still browsers around today that don't support it.

Comment 22

24 years ago

As I see it, it's six of one and half a dozen of the other. We're going to get as many people writing new text/html pages with DOCTYPEs we don't recognise and wanting strict layout as we are people publishing old pages with silly DOCTYPEs containing typos or other weird things and expecting a compatible rendering. XHTML2 is not going to be backwards compatible with XHTML1 or HTML4, and the latest version of XHTML, 1.1, is already treated in strict mode:
http://www.bath.ac.uk/%7Epy8ieh/cgi/compat-test.pl?DOCTYPE=%3C%21DOCTYPE+html+PUBLIC+%22-%2F%2FW3C%2F%2FDTD+XHTML+1.1%2F%2FEN%22++%22http%3A%2F%2Fwww.w3.org%2FTR%2Fxhtml11%2FDTD%2Fxhtml11.dtd%22%3E&MODE=full
So the only likely forward-compatibility problem is already covered.

Matthew T (active 1999-2002)

Comment 23

24 years ago

I've said this before, but I really think we should do some type of doctype-crawl of the web to find out what's actually out there. Perhaps we could ask the ODP/dmoz people, or google, or altavista, or somebody with a big database of web pages whether they have any exhaustive lists of the DOCTYPEs found on pages in their catalogs. Then we can look at pretty much all existing doctypes on a case by case basis - and hard-code this knowledge for these doctypes - making it much easier to say "if nobody's EVER used it before, then it's almost guaranteed to want standard rendering".

Comment 24

24 years ago

> We're going to get as many people writing new text/html pages with DOCTYPEs
> we don't recognise and wanting strict layout as we are people publishing old
> pages with silly DOCTYPEs with typos or other weird things and expecting a
> compatible rendering.

I disagree. If there's an error in your page as fundamental as a bad DOCTYPE, then all bets are (or should be) off as to how a browser will render it. I'm with David on this one -- forward compatibility is more important than backward compatibility.

Keywords: mozilla0.9 → mozilla0.9.2

Stefan Huszics

Comment 25

24 years ago

> If there's an error in your page as fundamental as a bad DOCTYPE, then all
> bets are (or should be) off as to how a browser will render it.

I agree completely. However, note one important thing about the doctype if this bug ever gets a "fix":
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 ...etc
is exactly the same as
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 ...etc
since HTML markup is case-insensitive. So the casing of the word "HTML" (and only that word) is irrelevant and should not make the doctype invalid. For "proof", compare with the corresponding XHTML declaration (XHTML is case-sensitive):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 ...
At least, this is how I have interpreted the difference between upper and lower case in HTML/XHTML declarations.
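To make the proposed rule concrete, here is a small C++17 sketch of the comparison described in this comment: the root name after `<!DOCTYPE` is compared case-insensitively, while the quoted public identifier must match exactly. `parseDoctype` and `sameHtmlDoctype` are hypothetical helpers that handle only the `<!DOCTYPE root PUBLIC "..." ...>` form; treating the public identifier as case-sensitive follows this commenter's interpretation and is an assumption of the sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

struct DoctypeDecl {
    std::string rootName;  // e.g. "HTML" or "html"
    std::string publicId;  // the formal public identifier, without quotes
};

bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Minimal parser for <!DOCTYPE root PUBLIC "fpi" ...>. SYSTEM-only
// declarations and internal subsets are not handled.
std::optional<DoctypeDecl> parseDoctype(const std::string& decl) {
    std::istringstream in(decl);
    std::string open, root, keyword;
    if (!(in >> open >> root >> keyword)) return std::nullopt;
    if (!equalsIgnoreCase(open, "<!DOCTYPE") || !equalsIgnoreCase(keyword, "PUBLIC"))
        return std::nullopt;
    // The public ID is the first quoted literal; either quote style is allowed.
    std::size_t q = decl.find_first_of("\"'");
    if (q == std::string::npos) return std::nullopt;
    std::size_t close = decl.find(decl[q], q + 1);
    if (close == std::string::npos) return std::nullopt;
    return DoctypeDecl{root, decl.substr(q + 1, close - q - 1)};
}

// Root name compared case-insensitively; public ID compared exactly.
bool sameHtmlDoctype(const std::string& a, const std::string& b) {
    std::optional<DoctypeDecl> da = parseDoctype(a);
    std::optional<DoctypeDecl> db = parseDoctype(b);
    return da && db && equalsIgnoreCase(da->rootName, db->rootName) &&
           da->publicId == db->publicId;
}
```

With this rule, `<!DOCTYPE HTML PUBLIC ...>` and `<!DOCTYPE html PUBLIC ...>` are equivalent, while two declarations whose public IDs differ only in case are not.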

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 26

24 years ago

I'm taking this bug, P1/critical (for standards support)/0.9.5. See my comments on bug 60511.

Assignee: rickg → dbaron

Severity: normal → critical

Priority: P3 → P1

Target Milestone: --- → mozilla0.9.5

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Updated

24 years ago

Status: NEW → ASSIGNED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 27

24 years ago

Attached patch preliminary patch

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 28

24 years ago

I still need to go through the code that I removed a little more carefully, since the code I added is an updated version of the old patch I had on bug 44340, and the code I am replacing may have changed more than I noticed since then. I also need to do a good bit of testing...

Whiteboard: WONTFIX?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 29

24 years ago

I filed bug 98218, which exists with or without my patch. I'll attach a much-improved patch shortly (although it still uses obsolete string code).

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Updated

24 years ago

Blocks: 44340, 55916, 61901

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 30

24 years ago

Attached patch much improved patch

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 31

24 years ago

Attached file the new code within the above patch (easier to read than the diff)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 32

24 years ago

Oops, I just noticed the bad formatting in ParsePS, and fixed it in my tree.

Markus Hübner

Comment 33

24 years ago

What impact will this bug have on the thousands of websites out there that have no doctype specified? What are the consequences for rendering, and for how those websites appear to customers?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 34

24 years ago

No impact -- documents with no DTD will be rendered in quirks mode, just as now. This bug is only about *unknown* DTDs, not missing DTDs.

Assignee

Comment 35

24 years ago

What Ian said.

harishd

Comment 36

24 years ago

David: I like your changes a lot. The only thing that I didn't like is inlining DetermineHTMLParseMode(). What's the reason behind it?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 37

24 years ago

The reason I made it |inline| was that it's only used once -- this may as well all get compiled into one big function (it should be slightly more efficient that way), but I'd rather not *think* about it as one big function. Of course, changing |inline| to |static| would probably be only a negligible slowdown (assuming the compiler is even capable of inlining the function), and it doesn't really matter to me.

Henri Sivonen (:hsivonen)

Comment 38

24 years ago

In the past Metrius has used an FPI like this: "-//Metrius//DTD Metrius Presentational//EN" on pages of their clients. I suggest including it on the list of quirky doctypes. Otherwise, bug 22274 will occur on Motorola's site. BTW, in DetermineHTMLParseMode() there's a call like this aBuffer.InsertWithConversion( "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n", 0); What's the purpose of that one? Is it used when Editor creates a new doc?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 39

24 years ago

I updated my proposal at http://www.people.fas.harvard.edu/~dbaron/mozilla/doctypes to reflect one bit of current practice: we *are* using strict mode for HTML 4.01 Transitional and Frameset doctypes when a system identifier is present. (I also fixed some escaping errors in some of the links.)

I also updated both the proposal and the code in my tree (a one-line change removing that doctype from the list; not worth posting a new patch) when I discovered that the old code treated "-//IETF//DTD HTML i18n//EN" as a strict-mode public ID, and we haven't had any problems caused by that. Finally, I updated both the proposal and the code for the Metrius doctype mentioned above (as eQuirks, not eQuirks3).

I verified that I was not changing the behavior on any of the tests listed on that page (other than the one mentioned above that I changed) where I was not expecting to change the behavior. The only changes were:

* on the 2nd, 3rd, and 4th items in the strict mode list (system identifier only, neither system nor public identifier, and internal subset), which I think are safe changes.
* on the public ID "-//SoftQuad Software//DTD HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//EN" in the quirks list, which is bug 55916.

So I think the patch is tested well enough that it's ready for checkin early in a milestone. I suspect we'll get a few reports of obscure doctypes that my search for doctypes missed. I plan to post to n.p.m.layout and n.p.m.seamonkey, and also email some Netscape tech evangelism folks so they are aware of the change.

So I think the patch is ready for review. It's just the patch attached above, with the one formatting change in ParsePS, the one doctype declaration mentioned above removed from the list of quirky doctypes, and the Metrius one mentioned above added (as eQuirks). I expect we'll have to add a few more public IDs to the list in the coming weeks, but that's why I want to check it in early in the milestone cycle.
(To respond to Henri Sivonen's comment: I'm not sure what that InsertWithConversion is there for, but it was there and I didn't want to remove it for fear I'd break something.)
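A hedged C++17 sketch of the decision rules described in this comment, plus comment 34's rule that documents with no doctype stay in quirks mode. `DoctypeInfo`, `Mode` and `determineMode` are hypothetical names and not the patch's `DetermineHTMLParseMode()`; the patch's distinction between eQuirks and eQuirks3 is collapsed into a single quirks value, the quirks list is a three-entry illustrative excerpt, and the order of the checks is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <set>
#include <string>

enum class Mode { Quirks, Standards };

// Hypothetical summary of what was extracted from the document's prologue.
struct DoctypeInfo {
    bool present = false;                 // was there a <!DOCTYPE ...> at all?
    std::optional<std::string> publicId;  // formal public identifier, if any
    std::optional<std::string> systemId;  // system identifier (DTD URL), if any
    bool hasInternalSubset = false;       // [ ... ] declarations inside the doctype
};

std::string lowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

Mode determineMode(const DoctypeInfo& d) {
    // A missing doctype stays in quirks mode, as before.
    if (!d.present) return Mode::Quirks;
    // Internal subset, system identifier only, or neither identifier: strict.
    if (d.hasInternalSubset || !d.publicId) return Mode::Standards;

    const std::string pid = lowerAscii(*d.publicId);
    // HTML 4.01 Transitional/Frameset are strict only with a system identifier.
    if (pid == "-//w3c//dtd html 4.01 transitional//en" ||
        pid == "-//w3c//dtd html 4.01 frameset//en")
        return d.systemId ? Mode::Standards : Mode::Quirks;

    // Known quirky public IDs (a tiny, illustrative excerpt).
    static const std::set<std::string> kQuirky = {
        "-//metrius//dtd metrius presentational//en",
        "-//softquad software//dtd hotmetal pro 6.0::19990601::extensions to html 4.0//en",
        "-//w3c//dtd html 3.2 final//en",
    };
    // Anything else, including "-//IETF//DTD HTML i18n//EN", is treated as strict.
    return kQuirky.count(pid) ? Mode::Quirks : Mode::Standards;
}
```

Note that the HTML 4.01 Transitional check has to come before the generic list lookup, since the same public ID maps to different modes depending on whether a system identifier is present.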

harishd

Comment 40

24 years ago

Comment on attachment 48191 (much improved patch)

Change inline to static. With that, r=harishd

Attachment #48191 - Flags: review+

Comment 41

24 years ago

harishd: Why do you think it is better for that function to be static rather than inline? I'm trying to learn the "tricks" of the trade, and can't quite understand this particular request. Thanks in advance for any explanation! :-)

Marc Attinasi

Comment 42

24 years ago

Not to speak for Harish, but generally 'static' is used to prevent a method from being exported from a module. It is useful, often necessary, if you want to make sure that you don't end up with several different global functions with the same name clashing at link time. 'inline' will likewise prevent the method from being exported, so I think inline is fine here.

harishd

Comment 43

24 years ago

Inline functions, especially large ones, can introduce code bloat, which in turn can hurt performance, so I would prefer inlining only smaller functions. On the other hand, the inline keyword does not force the compiler to inline a function (I think); it leaves that to the compiler's discretion. Personally, I prefer not to guess what compilers will do :-)
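As a compact reference for the `inline` versus `static` exchange above, here is a C++ sketch showing the keywords side by side; the helper names are hypothetical and unrelated to the patch. One detail worth adding: in standard C++, a non-static `inline` function still has external linkage. It avoids link-time clashes because the one-definition rule allows identical definitions in several translation units, not because it is hidden from other modules. And, as noted above, `inline` is only a hint that the compiler may ignore.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// `static`: internal linkage. No other translation unit can see this name,
// so a same-named helper in another .cpp file cannot clash at link time.
static bool isHtmlSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

// `static inline`: still internal linkage; `inline` merely invites the
// compiler to expand the body at the call site, which it may decline.
static inline std::size_t skipLeadingSpace(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size() && isHtmlSpace(s[i])) ++i;
    return i;
}

// Plain `inline`: external linkage. Identical definitions may appear in
// several translation units (e.g. via a header) and the program behaves as
// if there were one. Such a function should not call internal-linkage
// helpers if it lives in a header, or the copies would no longer be identical.
inline bool beginsWithMarkup(const std::string& s) {
    return !s.empty() && s.front() == '<';
}

// A single caller, similar to the "only used once" situation discussed above.
bool looksLikeMarkup(const std::string& doc) {
    std::size_t start = skipLeadingSpace(doc);
    return beginsWithMarkup(doc.substr(start));
}
```

Whichever keyword is chosen, an optimizing compiler is free to inline a small single-use helper like `skipLeadingSpace` or to emit it as an ordinary call.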