Closed
Bug 89885
Opened 23 years ago
Closed 21 years ago
Front page needs charset parameter
Categories
(www.mozilla.org :: General, defect)
www.mozilla.org
General
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: choess, Assigned: endico)
References
()
Details
(Whiteboard: start reading at comment 33)
Attachments
(2 obsolete files)
The front page of www.mozilla.org is missing the correct doctype (HTML 4.01 Transitional), preventing it from validating; in addition, an ampersand in the link to the Galeon download at sourceforge.net should be transformed into the entity &amp;. I know that the whole page is supposed to be overhauled Real Soon Now with a Zope-based framework ("It should be in a state to be hacked on very soon (in the next week, I hope)." -Gervase Markham, July 2, 2001, n.p.m.documentation), but I'm filing this bug in case its arrival is delayed, as the matter has attracted comment.
Comment 1•23 years ago
Comment 2•23 years ago
The & problem seems to be gone. I attached a patch that adds an appropriate doctype to the front page. The change doesn't affect display but it would make the front page look better to validators.
Keywords: patch
Comment 3•23 years ago
Endico, could you, please, take a look at the patch and check it in, if it is OK?
Comment 4•23 years ago
This is one of these things that is really just a niggle but it is so easy to fix. Would it be an idea to cc: tor@cs.brown.edu as that address shows up most often in the CVS log? It would just be nice to get this looking nice to the w3 validator.
Comment 5•23 years ago
*** Bug 101552 has been marked as a duplicate of this bug. ***
Comment 6•23 years ago
CCing brendan, as he expressed an interest. Dawn - any reason we can't check this in? Gerv
Comment 7•23 years ago
Dawn told me that we can't check this in because she thinks it will affect how browsers display the pages, and that we would be labelling some broken HTML (on the pages that haven't been fixed yet) with a DOCTYPE that they didn't match. Will checking this in affect browser display in any way? Gerv
Comment 8•23 years ago
> Dawn told me that we can't check this in because she thinks it will affect how
> browsers display the pages,
There are three browsers that are known to pay attention to the doctype. These are Mozilla, Mac IE 5 and Windows IE 6. The doctype suggested in the patch makes all three go into their respective standards modes. Switching the mode in Mac IE 5 would not change the layout of http://www.mozilla.org/index.html in any way. In Mozilla the margin/padding around the text changes slightly. Nothing drastic. (IMO, the slight change in Mozilla shouldn't block this change. It would be *very* bad from the evang point of view.) I'm unable to test with Windows IE 6. I'd appreciate it if someone else took a look. However, I think it is very safe to expect Windows IE 6 to have no problem with it.
> and that we would be labelling some broken HTML (on
> the pages that haven't been fixed yet) with a DOCTYPE that they didn't match.
The intent is not to add the doctype to the wrapper. This particular change is only about the front page template.
Comment 9•23 years ago
I wrote "It would be *very* bad from the evang point of view." I meant: It would be *very* bad from the evang point of view if mozilla.org itself refused to use the standards mode of Mozilla.
Comment 10•23 years ago
Dawn: does that address your concerns? Gerv
Comment 11•23 years ago
what happens the next time newsbot gives us an & in a url?
Comment 12•23 years ago
Dawn: ping? Gerv
Comment 13•23 years ago
>what happens the next time newsbot gives us an & in a url?
So the broken URL came from newsbot? I suggest fixing the newsbot URL output then.
> I'm unable to test with Windows IE 6.
I got access to a Windows machine with IE 6 on it. There's one issue with IE 6. It is easy to fix. I'll attach a new patch.
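As an illustration of the suggestion to fix the URL output at the source, here is a minimal Python sketch of escaping a URL before it is written into an href attribute. Nothing in this bug says how newsbot is implemented; the function name render_link and the example URL are hypothetical.

```python
from html import escape


def render_link(url, label):
    """Build an <a> element with markup-significant characters escaped.

    Hypothetical helper: escape() turns & into &amp;, < into &lt;,
    > into &gt;, and (with quote=True) double quotes into &quot;.
    """
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))


# A query string with a raw ampersand, as in the download link described above.
print(render_link("http://example.org/get?group=1&file=2", "Download"))
```

One caveat with this approach: if the incoming URL is already escaped, it would be escaped a second time, so the generator has to know which form it receives.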
Comment 14•23 years ago
Attachment #42388 -
Attachment is obsolete: true
Comment 15•23 years ago
Dawn or Myk - could this please be checked in? Gerv
Reporter | ||
Comment 16•23 years ago
Apropos of this, isn't it about time to update the copyright notice?
Updated•23 years ago
Blocks: advocacybugs
Comment 17•22 years ago
*** Bug 132749 has been marked as a duplicate of this bug. ***
Comment 18•22 years ago
OK, so the hold-up to making mozilla.org valid HTML is...?
Comment 19•22 years ago
accepting QA for mozilla developer docs. some of these bugs have been around for a _long_ time. Reporters, would you please review the bugs, see if the issues have been resolved, and close bugs appropriately. I will do a full review of all bugs not touched in one week (8th April). Thanks. </spam>
QA Contact: endico → imajes
Comment 20•22 years ago
The patch misses this part:
@@ -115,7 +116,7 @@
 <a href="http://lxr.mozilla.org/mozilla1.0/source/">
 LXR for the Mozilla 1.0 branch</a>
 </li>
-<ul>
+</ul>
 <!-- End of Body of 1.0 Countdown -->
 </TD>
is there a reason why this bug is being ignored? it would be trivial to fix. also, it makes mozilla look better if the front page is a correct HTML page. (shouldn't this be in product mozilla.org component webmaster@mozilla.org ?)
Comment 21•22 years ago
yeah, I think so
Component: Mozilla Developer → webmaster@mozilla.org
Product: Documentation → mozilla.org
Version: unspecified → other
Comment 22•22 years ago
*** Bug 138807 has been marked as a duplicate of this bug. ***
Comment 23•22 years ago
changing summary for easier searching
Summary: Invalid HTML on front page → Invalid HTML on front page [page does not validate, mozilla.org has no doctype]
Comment 24•22 years ago
*** Bug 144413 has been marked as a duplicate of this bug. ***
Comment 25•22 years ago
How about marking this bug mozilla1.0? I think this should be fixed before 1.0 comes out.
Comment 26•22 years ago
*** Bug 148184 has been marked as a duplicate of this bug. ***
Comment 27•22 years ago
Do you need help with this huge change? hohoho Please, commit this.. this is a shame. And how come there have been no comments by the bug owner so far? Is "Dawn Endico" the only person with write access to Mozilla web pages?
Assignee | ||
Comment 28•22 years ago
i checked this in and also added the align=left attributes to the 'towards 1.0' but the w3 validator is down and the others i found don't do file uploads so i didn't check it.
Comment 29•22 years ago
I've used an offline SGML validator. This is what I got:
nsgmls:index.html:119:3:E: document type does not allow element "UL" here; assuming missing "LI" start-tag
nsgmls:index.html:122:15:E: "UL" not finished but containing element ended
nsgmls:index.html:122:15:E: end tag for "UL" omitted, but its declaration does not permit this
nsgmls:index.html:119:0: start tag was here
nsgmls:index.html:122:15:E: end tag for "UL" omitted, but its declaration does not permit this
nsgmls:index.html:106:0: start tag was here
Comment 30•22 years ago
This is probably all due to the same issue, namely a <ul> being where an </ul> should be:
<ul>
[...]
LXR for the Mozilla 1.0 branch</a>
</li>
<ul>
The last <ul> should be </ul>.
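The nsgmls complaint comes down to a list being opened directly inside another list instead of inside a list item. A rough Python sketch of a check for that one pattern follows; it is not a validator and not a substitute for nsgmls, and the names ListNestingChecker and find_list_problems are made up for this example.

```python
from html.parser import HTMLParser


class ListNestingChecker(HTMLParser):
    """Flag <ul>/<ol> start tags whose nearest open list-related parent is a list."""

    LISTS = ("ul", "ol")

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open ul/ol/li elements, innermost last
        self.problems = []  # (line, column, message) tuples

    def handle_starttag(self, tag, attrs):
        # HTML 4 lets </li> be omitted, so a new <li> closes an open one.
        if tag == "li" and self.stack and self.stack[-1] == "li":
            self.stack.pop()
        if tag in self.LISTS and self.stack and self.stack[-1] in self.LISTS:
            line, column = self.getpos()
            message = "<%s> directly inside <%s>" % (tag, self.stack[-1])
            self.problems.append((line, column, message))
        if tag in self.LISTS or tag == "li":
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; unrelated end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_list_problems(html_text):
    checker = ListNestingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


# Reduced version of the markup quoted above: the final <ul> should be </ul>.
broken = "<ul>\n<li>LXR for the Mozilla 1.0 branch</li>\n<ul>\n"
for line, column, message in find_list_problems(broken):
    print("line %d, column %d: %s" % (line, column, message))
```

With the last tag changed to </ul>, the same input produces no reports.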
Assignee | ||
Comment 31•22 years ago
just checked in a fix for that.
Reporter | ||
Comment 32•22 years ago
Validator is back up again. Page validates. Thanks a million, Dawn; I greatly appreciate this.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 33•22 years ago
Thank you for checking this in. The page is still missing the charset parameter in the Content-Type header, though. Is there another bug about that?
Comment 34•22 years ago
eh... why? RFC 2616 (HTTP) specifies this:
The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.
Therefore, for Latin1, no charset needs to be specified, since that's the default anyway.
Comment 35•22 years ago
biesi: the problem is, that the validator gives a rather big warning if you don't specify a charset and you won't see this nice "valid HTML"-Button until the page has a charset given.
Assignee | ||
Comment 36•22 years ago
that's a problem with the validator then
Comment 37•22 years ago
No, the document cited is old. The W3C changed the requirement to always specify the encoding. From http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2 :
The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the "charset" parameter is absent from the "Content-Type" header field. In practice, this recommendation has proved useless because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the "charset" parameter.
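The disagreement between the two quoted specifications can be made concrete with a small Python sketch. It reads the charset parameter from a Content-Type value using the standard library's email.message.Message, then applies either the RFC 2616 default or the HTML 4 "assume nothing" rule. The function name effective_charset and its flag are illustrative only.

```python
from email.message import Message


def effective_charset(content_type, assume_rfc2616_default):
    """Return the charset a client would use for a Content-Type value.

    assume_rfc2616_default=True follows the RFC 2616 text quoted earlier
    (text/* defaults to ISO-8859-1); False follows the HTML 4 text quoted
    above (no default is assumed, so the result may be None).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # lowercased value, or None if absent
    if declared is not None:
        return declared
    if assume_rfc2616_default and msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return None


print(effective_charset("text/html; charset=EUC-JP", False))
print(effective_charset("text/html", True))
print(effective_charset("text/html", False))
```

Under the second reading, a bare text/html header leaves the client with nothing to go on, which is the situation the validator warning describes.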
Comment 38•22 years ago
Stupid W3C... Well, adding <meta http-equiv="content-type" content="text/html; charset=iso-8859-1"> in the <head> should fix it, or?
Comment 39•22 years ago
biesi: Yes, it would.
Comment 40•22 years ago
OK, now the front page validates, but many other pages do not. See:
http://www.htmlhelp.com/cgi-bin/validate.cgi?url=http%3A%2F%2Fwww.mozilla.org&warnings=yes&spider=yes&hidevalid=yes
What should be done:
- reopen this bug?
- open a global bug for all pages?
- open a bug for each page?
Regards,
Comment 41•22 years ago
Open a new bug, definitely - this one is about front page, as the summary says.
Comment 43•22 years ago
I am sorry, but the http://www.mozilla.org/index.html page still does not validate. Using http://validator.w3.org I get the following message:
I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document. The sources I tried are:
* The HTTP Content-Type field.
* The XML Declaration.
* The HTML "META" element.
And I even tried to autodetect it using the algorithm defined in Appendix F of the XML 1.0 Recommendation.
Since none of these sources yielded any usable information, I will not be able to validate this document. Sorry. Please make sure you specify the character encoding in use.
Comment 44•22 years ago
I am sorry, but I cannot reopen the bug. Can the submitter or owner please reopen it?
Comment 45•22 years ago
http://webtools.mozilla.org/web-sniffer/view.cgi?url=http%3A%2F%2Fwww.mozilla.org%2Findex.html Front page does not provide a charset.
Whiteboard: No charset provider - not fixed?
Comment 46•22 years ago
Section 2, paragraph 3 of the Mozilla.org style guide says that META tags should not be used. So either this part of the style guide should be changed, or the charset encoding should be sent in the HTTP header. The style guide can be found at http://www.mozilla.org/README-style.html .
Comment 47•22 years ago
Re: Comment #46 From Oliver Klee 2002-12-16 04:34
> Section 2, paragraph 3 of the Mozilla.org style guide says that META tags
> should not be used.
Which I can agree with (HTTP headers are definitely preferable).
> or the charset encoding should be sent in the HTTP header.
Exactly. As the websniffer URI I've given says, the encoding is *not* sent. Is this a new bug?
Comment 48•22 years ago
Thanks for the reference to the style guide.
The style guide says:
<< Composer likes to put in noise like this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
While a nice touch in theory, back here in the real world, that tag makes
3.0-vintage Navigators load the document twice and generally lose their minds.
Don't go there. If you use Composer, take this junk out before publishing it.
>>
The reason given for not using META (Netscape 3.0) seems very out of date.
I would fix the style guide.
By the way, I would add to the style guide a first rule saying
"Always validate your HTML code before publishing it."
Reporter | ||
Comment 49•22 years ago
There are other significant issues with the use of <meta> tags, such as the fact that they're not actually a very good way to set the charset. HTTP headers are much preferable. This probably depends on some server-upgrade bug or another...
Comment 50•22 years ago
I do not know what the best way to give the character set is. What I can see is that now it is not given. If there are really good reasons not to use META tags, this should be documented in the style guide. The reason given in the current document seems very poor to me.
Some facts:
1) META tags are used on some pages to specify the character set as ISO Latin 1:
http://www.mozilla.org/status/
http://www.mozilla.org/hacking/
http://www.mozilla.org/status/2002-11-20.html
2) META tags are used to specify a different character set than Latin 1, e.g.
http://www.mozilla.org/releases/mozilla1.3a/
3) META tags are used to specify other things, e.g. http://www.mozilla.org/hacking/
<meta name="GENERATOR" content="Mozilla/4.73 (Macintosh; I; PPC) [Netscape]">
Reporter | ||
Comment 51•22 years ago
See <URL:http://ppewww.ph.gla.ac.uk/~flavell/charset/ns-burp.html>, for instance. This is really something that should be solved by having the server send a charset parameter with the Content-Type; that's considerably less hackish.
Comment 52•22 years ago
A real HTTP header is certainly preferable in the HTTP context. The meta thing could still be useful for those who browse the pages from a local filesystem after saving them to disk. So, what methods does the Netscape Enterprise 3.6 server provide for setting the charset parameter on a per-file, per-directory or per-server basis? Who can change the configuration *and* can be persuaded into doing so in the foreseeable future?
Comment 53•22 years ago
Page still does not have a character set..... REOPENED
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Updated•22 years ago
Attachment #61450 -
Attachment is obsolete: true
Comment 54•22 years ago
What options does the server provide for fixing this and who is able and willing to alter the settings?
Comment 55•22 years ago
After some digging at w3c.org, I found the following at http://www.w3.org/International/O-charset.html which says:
---
It is very important that the character encoding of any XML or (X)HTML document is clearly labeled. This can be done in the following ways:
* Use the 'charset' parameter in the Content-Type header of HTTP. Example: Content-Type: text/html; charset=EUC-JP
* For XML, use the encoding pseudo-attribute in the xml declaration at the start of a document or the text declaration at the start of an entity. Example: <?xml version="1.0" encoding="iso-8859-1" ?>
* For HTML, use the <meta> tag inside <head>. Example: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" > For XHTML, you need a slash at the end: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
---
I found that adding <?xml version="1.0" encoding="iso-8859-1" ?> as the very first line of the HTML file (I'm using HTML 4.01) fixed the problem with the W3C Validator.
Reporter | ||
Comment 56•22 years ago
> I found that adding <?xml version="1.0" encoding="iso-8859-1" ?>
> as the very first line of the HTML file (I'm using HTML 4.01) fixed the
> problem with the W3C Validator.
That's a problem with the validator. (Really, it's a problem with the XHTML
1.0 spec, Appendix C, but let's not go there.) The advice given is only
appropriate for XML (including XHTML) documents, not HTML 4.01. (And
philosophically, charset should only really be handled at the HTTP level,
but let's not go there, either.)
Comment 57•22 years ago
Putting Charset into the HTTP header (i.e. via the Web Server) restricts the server to one charset, which means you can't have multi-lingual pages. Mind you, I think I saw, in passing, that some web servers allow multiple charsets. Putting a XML statement in a HTML file is very wrong and, to be honest, I'm extremely surprised that it worked as my Doctype is HTML 4.01. The W3C Validator should've complained very loudly about that. Anyway, it appears that the Charset problem is outside the realm of Mozilla as it involves the W3C and web servers and can be fixed within XHTML which I understand supersedes HTML. Should this bug be marked fixed as before?
Comment 58•22 years ago
This should not be marked fixed. Also, the character encoding does not need to be the same server-wise even if the information is put in the HTTP headers (unless the server has limitations, that is). The ideal fix would be adjusting the server configuration so that the right charset parameter is sent on the HTTP level. If no one is going to fix the server configuration, then the <meta> tag would be the plan B. The XML declaration doesn't belong to HTML at all (though technically it would be an unrecognized processing instruction). (And moving to XHTML is a whole other can of worms, so let's not go there.)
Reporter | ||
Comment 59•22 years ago
The Netscape Enterprise server documentation suggests it can do conneg on charset, so it should be quite possible.
Keywords: patch
Summary: Invalid HTML on front page [page does not validate, mozilla.org has no doctype] → Front page needs charset parameter
Whiteboard: No charset provider - not fixed?
Comment 60•22 years ago
From WaSP's Ask W3C column for December 2002:
Specifying Character Encoding
This month kicks off our new “WaSP Asks the W3C” Question and Answer project. In this project, frequently asked questions posed to WaSP by Web authors and designers regarding standards are submitted by WaSP members to the W3C’s Quality Assurance Group for information. The answers are published and archived both here and on the W3C Web Standards Education list, where follow-up discussion also takes place. Signup details can be found at the end of this article.
WaSP asks
There are several ways of specifying the character encoding for a particular document. Which of the following methods (or combination thereof) does the W3C recommend, and why?
* Have the server administrator set the proper encoding via the HTTP headers returned by the Web server
* Have the author add the encoding with a meta element
* XHTML authors can add the character encoding using the XML declaration
The W3C responds
These three ways of providing the character encoding of a document are not equivalent. When trying to figure out the character encoding of a resource, user agents will try, in this order:
* The HTTP Content-Type header sent by the server
* The XML declaration (only for XHTML documents)
* The HTML/XHTML meta element
* Other ways. There are algorithms to guess the character encoding, for example
Since the HTTP Content-Type header has precedence, and is also the easiest information to retrieve (user-agents do not have to parse the resource to get it), it is almost always the preferred way to provide the character encoding for an (X)HTML document. However, in at least two cases, this is simply not possible:
* The document author does not have any way to configure the server to send the proper HTTP Content-Type header
* The document is not served via HTTP.
In these cases, an HTML document should provide the character encoding via a meta element, and an XML document can provide it via the XML declaration. If the XML document uses one of the default encodings (UTF-8 or UTF-16) no declaration is needed to manage the character encoding.
To sum it up
* Wrong. The webmaster sets a default character encoding to be sent by the server but does not let the author override it or the info is not provided anywhere whatsoever
* Good. The character encoding is not set at the server level but properly declared through the HTML meta element (and/or the XML declaration for XHTML documents)
* Best. The character encoding is properly set at the server level, either with a default that authors can override or on a per-document basis, and is also available at the document level (both in the XML declaration if applicable and the meta element) for standalone use
Examples
Example of an XHTML 1.0 document written in French with an ISO-8859-1 encoding:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">
<head>
<title>Exemple de document XHTML 1.0</title>
</head>
<body>
<h1>Portrait Intérieur</h1>
<h2>Rainer-Maria Rilke</h2>
<p>Ce ne sont pas des souvenirs<br />
qui, en moi, t'entretiennent ;<br />
tu n'es pas non plus mienne<br />
par la force d'un beau désir.</p>
</body>
</html>
Example of an HTML 4.01 document written in French with a UTF-8 encoding:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="fr">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>Exemple de document HTML 4.01</title>
</head>
<body>
<h1>Portrait Intérieur</h1>
<h2>Rainer-Maria Rilke</h2>
<p>Ce ne sont pas des souvenirs<br>
qui, en moi, t'entretiennent ;<br>
tu n'es pas non plus mienne<br>
par la force d'un beau désir.</p>
</body>
</html>
On the popular Apache Web server, the HTTP Content-Type header for a resource can be set up in the .htaccess file, as follows:
<Files example.html>
ForceType text/html;charset=ISO-8859-1
</Files>
This would force the file example.html to be served as ISO-8859-1 even if the server had a different global configuration.
WaSP comments
WaSP and W3C member Tim Bray commented on this answer and said: “If you know that the document you’re sending is going to get read by an XML processor, the server should get the charset right. If the server makes any mistake the rules say that the processor is supposed to do the wrong thing! On the other hand, if the document is going to any kind of HTML reader, the server can usefully try to help and do what is suggested here. So it turns out that it matters whether you serve it as html or xhtml+xml.”
How to serve HTML and XHTML will be discussed in the next issue of WaSP Asks the W3C.
References
* About Charset Parameters
* About Character Encodings
* HTML 4.0 specification on character encodings
* XHTML 1.0 specification on character encodings
* XML 1.0 specification on character encodings
Discussion
For clarification and discussion on this topic, please address your comments and questions to the W3C Web Standards Education list. To subscribe to the list, send an email to public-evangelist-request@w3.org with “Subject: subscribe”. You can read archived posts at http://lists.w3.org/Archives/Public/public-evangelist/.
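The lookup order described in the W3C answer (HTTP header, then XML declaration for XHTML, then meta element) can be sketched in Python as below. This is a simplified illustration and not how any particular browser or the W3C validator implements it: the regular expressions only cover the common spellings shown in this bug, the guessing step is left out, and the name find_encoding and its is_xhtml flag are assumptions made for the example.

```python
import re
from email.message import Message

# Simplified patterns; real parsers handle many more variations.
XML_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def find_encoding(content_type, body, is_xhtml=False):
    """Return (encoding, source) using the precedence quoted above.

    content_type is the HTTP header value (or None when not served via
    HTTP); body is the raw document bytes. Returns (None, None) when no
    source labels the encoding.
    """
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset, "http"
    head = body[:1024]  # declarations are expected near the top of the file
    if is_xhtml:
        match = XML_DECL.search(head)
        if match:
            return match.group(1).decode("ascii").lower(), "xml"
    match = META.search(head)
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    return None, None


page = b'<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"></head></html>'
print(find_encoding("text/html", page))
print(find_encoding("text/html; charset=ISO-8859-1", page))
```

The second call shows why the header is the preferred place for the label: when both are present, the header wins and the meta element is never consulted.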
Comment 61•22 years ago
For the charset on the whole web site there is bug 154570. Comment 57 is wrong. You can set the charset in http headers on a file by file basis (at least on reasonable web servers like Apache). pi
Updated•21 years ago
Comment 63•21 years ago
see attachment 127836 for an xhtml transitional version of the mozilla.org home page, which uses an xml prolog to declare its content-type. i realize that it's better to send charset information on the server-side, but that's bug 154570, and i'd say that it's better to have the page validate now and later than just later.
Reporter | ||
Comment 64•21 years ago
Um, serving XHTML as text/html causes more problems than it solves. (Hixie can talk your ear off about this, if necessary). Maybe the current meltdown will wind up with us moving to Apache, which would have been the sensible choice all along?
Comment 65•21 years ago
as we can't use <meta> or an XML prolog, i checked out some docs on netscape enterprise:
http://kuhub.cc.ku.edu/www/html/721final/6563/6563pro_001.html#writingnsconfigfiles
it looks like we can do this (note: i can't test this b/c i don't have an ns server):
<Files index.html>
AddType exp=index.html type=text/html;charset=iso-8859-1
</Files>
however, i'm not sure that child directories won't inherit this, so perhaps we could add something like:
<Files ?*/index.html>
AddType exp=index.html type=text/html
</Files>
Comment 66•21 years ago
Just for reference, the documentation you linked to is for Netscape FastTrack Server 7.2. We're running Netscape Enterprise Server 3.0.
Comment 67•21 years ago
ben@netscape.com maintains the front page; CCing him. I don't know which bug the referenced attachment 127836 (XHTML version of the new front page) is attached to, though... Gerv
Comment 68•21 years ago
Attachment #127836 is attached to Bug #154570 ("www.mozilla.org doesn't send charset information")
Comment 69•21 years ago
This charset discussion has been running for more than 15 months now. Can't we just insert the meta tag for now until we can reconfigure the server and/or port the page to XHTML 1.1?
Whiteboard: start reading at comment 33
Comment 70•21 years ago
The server move is happening this coming week. Fiddling with server-provided headers will be much easier once we're running on Apache.
Comment 71•21 years ago
Okay, according to http://www.delorie.com/web/headers.cgi?url=http%3A%2F%2Fwww.mozilla.org , the server now runs Apache/1.3.27 (Unix) (Red-Hat/Linux). It sends no charset currently.
Comment 72•21 years ago
It *just* got moved this afternoon. Apache can do it, but not by default. :) We still have to configure it. It will happen sometime in the next week or so (we're all volunteers here).
Comment 73•21 years ago
Dave, thanks for the work. Let me just give a reminder about bug 154570 (which would fix this bug, but does not really block it). AddDefaultCharset On in the config would do the job. It can be overridden on a per-file or per-type basis. pi
Comment 74•21 years ago
Request this be closed as the beta site is live ( / validates as HTML 4.01 Strict)? I would close this but I don't have permissions (and don't fulfill req's to get permissions) yet.
Comment 75•21 years ago
fixed by redesign.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 21 years ago
Resolution: --- → FIXED
Updated•16 years ago
Product: mozilla.org → Websites
Updated•12 years ago
Component: www.mozilla.org → General
Product: Websites → www.mozilla.org