Closed Bug 89885 Opened 23 years ago Closed 21 years ago

Front page needs charset parameter

Categories

(www.mozilla.org :: General, defect)

defect
Not set
trivial

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: choess, Assigned: endico)

References

()

Details

(Whiteboard: start reading at comment 33)

Attachments

(2 obsolete files)

The front page of www.mozilla.org is missing the correct doctype (HTML 4.01
Transitional), preventing it from validating; in addition, an ampersand in the
link to the Galeon download at sourceforge.net should be transformed into the
entity &amp;amp;.  I know that the whole page is supposed to be overhauled Real Soon
Now with a Zope-based framework ("It should be in a state to be hacked on very
soon (in the next week, I hope.)", Gervase Markham, July 2, 2001,
n.p.m.documentation), but I'm filing this bug in case its arrival is delayed, as
the matter has attracted comment.
The & problem seems to be gone. I attached a patch that adds an appropriate
doctype to the front page. The change doesn't affect display but it would make
the front page look better to validators.
Keywords: patch
Endico, could you please take a look at the patch and check it in if it is OK?
This is one of these things that is really just a niggle but it is so easy to fix.

Would it be an idea to cc: tor@cs.brown.edu as that address shows up most often
in the CVS log?

It would just be nice to get this looking nice to the w3 validator.
*** Bug 101552 has been marked as a duplicate of this bug. ***
CCing brendan, as he expressed an interest.

Dawn - any reason we can't check this in?

Gerv
Dawn told me that we can't check this in because she thinks it will affect how
browsers display the pages, and that we would be labelling some broken HTML (on
the pages that haven't been fixed yet) with a DOCTYPE that they didn't match.

Will checking this in affect browser display in any way?

Gerv
> Dawn told me that we can't check this in because she thinks it will affect how
> browsers display the pages,

There are three browsers that are known to pay attention to the doctype. These
are Mozilla, Mac IE 5 and Windows IE 6. The doctype suggested in the patch makes
all three go in their respective standards modes.

Switching the mode in Mac IE 5 would not change the layout of
http://www.mozilla.org/index.html in any way. In Mozilla the margin/padding
around the text changes slightly. Nothing drastic. (IMO, the slight change in
Mozilla shouldn't block this change. It would be *very* bad from the evang point
of view.)

I'm unable to test with Windows IE 6. I'd appreciate it if someone else took a
look. However, I think it is very safe to expect Windows IE 6 to have no problem
with it.

> and that we would be labelling some broken HTML (on
> the pages that haven't been fixed yet) with a DOCTYPE that they didn't match.

The intent is not to add the doctype to the wrapper. This particular change is
only about the front page template.
I wrote "It would be *very* bad from the evang point
of view." I meant: It would be *very* bad from the evang point of view if
mozilla.org itself refused to use the standards mode of Mozilla.
Dawn: does that address your concerns?

Gerv
what happens the next time newsbot gives us an & in a url?
Dawn: ping?

Gerv
>what happens the next time newsbot gives us an & in a url?

So the broken URL came from newsbot? I suggest fixing the newsbot URL output then.
> I'm unable to test with Windows IE 6.

I got access to a Windows machine with IE 6 on it. There's one issue with IE 6.
It is easy to fix. I'll attach a new patch.
Attached patch Patch v2 (IE6-friendly) (obsolete)
Attachment #42388 - Attachment is obsolete: true
Dawn or Myk - could this please be checked in?

Gerv
Apropos of this, isn't it about time to update the copyright notice?
Blocks: advocacybugs
*** Bug 132749 has been marked as a duplicate of this bug. ***
OK, so the hold-up to making mozilla.org valid HTML is...?
accepting QA for mozilla developer docs.

some of these bugs have been around for a _long_ time. Reporters, would you
please review the bugs, see if the issues have been resolved, and close bugs
appropriately.

I will do a full review of all bugs not touched in one week (8th April). 

Thanks.

</spam>
QA Contact: endico → imajes
The patch misses this part:
@@ -115,7 +116,7 @@
 <a href="http://lxr.mozilla.org/mozilla1.0/source/">
 LXR for the Mozilla 1.0 branch</a>
 </li>
-<ul>
+</ul>
 
 <!-- End of Body of 1.0 Countdown -->
            </TD>


is there a reason why this bug is being ignored? it would be trivial to fix.
also, it makes mozilla look better if the front page is a correct HTML page.

(shouldn't this be in product mozilla.org component webmaster@mozilla.org ?)
yeah, I think so
Component: Mozilla Developer → webmaster@mozilla.org
Product: Documentation → mozilla.org
Version: unspecified → other
*** Bug 138807 has been marked as a duplicate of this bug. ***
changing summary for easier searching
Summary: Invalid HTML on front page → Invalid HTML on front page [page does not validate, mozilla.org has no doctype]
*** Bug 144413 has been marked as a duplicate of this bug. ***
How about marking this bug mozilla1.0? I think this should be fixed before 1.0
comes out.
*** Bug 148184 has been marked as a duplicate of this bug. ***
Do you need help with this huge change? hohoho

Please commit this. This is a shame. And how come there have been no comments
by the bug owner so far? Is "Dawn Endico" the only person with write access to
Mozilla web pages?
i checked this in and also added the align=left attributes to the 'towards 1.0'
but the w3 validator is down and the others i found don't do file uploads so
i didn't check it.
I've used an offline SGML validator. This is what I got:

nsgmls:index.html:119:3:E: document type does not allow element "UL" here;
assuming missing "LI" start-tag
nsgmls:index.html:122:15:E: "UL" not finished but containing element ended
nsgmls:index.html:122:15:E: end tag for "UL" omitted, but its declaration does
not permit this
nsgmls:index.html:119:0: start tag was here
nsgmls:index.html:122:15:E: end tag for "UL" omitted, but its declaration does
not permit this
nsgmls:index.html:106:0: start tag was here
This is probably all due to the same issue, namely a <ul> being where an </ul> 
should be:
<ul>
[...]
LXR for the Mozilla 1.0 branch</a>
</li>
<ul>


last <ul> should be </ul>
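A rough Python sketch of how a stray `<ul>` like this could be caught before check-in. It uses only the standard-library `html.parser`; the class and function names are made up for illustration, and it merely counts `<ul>` opens against closes, so it is no substitute for nsgmls or the W3C validator.

```python
from html.parser import HTMLParser


class UlBalance(HTMLParser):
    """Count unclosed <ul> elements; a crude check, not a validator."""

    def __init__(self):
        super().__init__()
        self.open_lists = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase, so <UL> and <ul> both match.
        if tag == "ul":
            self.open_lists += 1

    def handle_endtag(self, tag):
        if tag == "ul":
            self.open_lists -= 1


def unclosed_lists(markup):
    """Return how many <ul> elements are left open at the end of the markup."""
    checker = UlBalance()
    checker.feed(markup)
    checker.close()
    return checker.open_lists


# The fragment from the front page, with the bogus second <ul>.
broken = """<ul>
<li><a href="http://lxr.mozilla.org/mozilla1.0/source/">
LXR for the Mozilla 1.0 branch</a>
</li>
<ul>"""
# Replace the last <ul> with </ul>, as in the fix described above.
fixed = broken[: broken.rindex("<ul>")] + "</ul>"

print(unclosed_lists(broken))  # 2
print(unclosed_lists(fixed))   # 0
```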
just checked in a fix for that.
Validator is back up again. Page validates. Thanks a million, Dawn; I greatly
appreciate this.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Thank you for checking this in. The page is still missing the charset parameter
in the Content-Type header, though. Is there another bug about that?
eh... why? RFC 2616 (HTTP) specifies this:
   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.

Therefore, for Latin1, no charset needs to be specified, since that's the
default anyway.
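For illustration, a minimal Python sketch of the RFC 2616 rule quoted above, assuming the raw Content-Type header value is already in hand. The helper names `charset_param` and `effective_charset` are hypothetical: an explicit charset parameter wins, and a `text/*` type without one falls back to ISO-8859-1.

```python
def charset_param(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    # Split "text/html; charset=ISO-8859-1" into the media type and its parameters.
    for param in content_type.split(";")[1:]:
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == "charset":
            # Values may be quoted strings; charset names are case-insensitive.
            return value.strip().strip('"').lower()
    return None


def effective_charset(content_type):
    """Apply the RFC 2616 (section 3.7.1) default for text/* media types."""
    explicit = charset_param(content_type)
    if explicit is not None:
        return explicit
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("text/"):
        return "iso-8859-1"
    return None


print(effective_charset("text/html"))                 # iso-8859-1
print(effective_charset("text/html; charset=UTF-8"))  # utf-8
print(effective_charset("application/xml"))           # None
```

This fallback is exactly what the HTML 4 text quoted further down tells user agents not to rely on; a client following that rule would return `None` instead of defaulting.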
biesi: the problem is that the validator gives a rather big warning if you
don't specify a charset and you won't see this nice "valid HTML"-Button until
the page has a charset given.
that's a problem with the validator then
No, the document cited is old. The W3C changed the requirement to always specify
the encoding.

From http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2 :

The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default
character encoding when the "charset" parameter is absent from the
"Content-Type" header field. In practice, this recommendation has proved useless
because some servers don't allow a "charset" parameter to be sent, and others
may not be configured to send the parameter. Therefore, user agents must not
assume any default value for the "charset" parameter.
Stupid W3C...

Well, adding <meta http-equiv="content-type" content="text/html;
charset=iso-8859-1"> in the <head> should fix it, right?
biesi: Yes, it would.
OK, now the front page validates, but many other pages do not.

See:
http://www.htmlhelp.com/cgi-bin/validate.cgi?url=http%3A%2F%2Fwww.mozilla.org&warnings=yes&spider=yes&hidevalid=yes

What should be done:
- reopen this bug ?
- open a global bug for all pages?
- open a bug for each page?

Regards,
Open a new bug, definitely - this one is about front page, as the summary says.
v
Status: RESOLVED → VERIFIED
I am sorry, but the http://www.mozilla.org/index.html page still
does not validate.

Using http://validator.w3.org

I get the following message:

 I was not able to extract a character encoding labeling from any of the valid
sources for such information. Without encoding information it is impossible to
validate the document. The sources I tried are:

    * The HTTP Content-Type field.
    * The XML Declaration.
    * The HTML "META" element.

And I even tried to autodetect it using the algorithm defined in Appendix F of
the XML 1.0 Recommendation.

Since none of these sources yielded any usable information, I will not be able
to validate this document. Sorry. Please make sure you specify the character
encoding in use. 

I am sorry, but I cannot reopen the bug.
Can the submitter or owner please reopen it?
http://webtools.mozilla.org/web-sniffer/view.cgi?url=http%3A%2F%2Fwww.mozilla.org%2Findex.html

Front page does not provide a charset.
Whiteboard: No charset provider - not fixed?
Section 2, paragraph 3 of the Mozilla.org style guide says that META tags should
not be used. So either this part of the style guide should be changed, or the
charset encoding should be sent in the HTTP header.

The style guide can be found at http://www.mozilla.org/README-style.html .
Re: Comment #46 From Oliver Klee 2002-12-16 04:34 -------

> Section 2, paragraph 3 of the Mozilla.org style guide says that META tags
> should not be used.

Which I can agree to (HTTP headers are definitely preferable).

> or the charset encoding should be sent in the HTTP header.

Exactly. As the websniffer URI I've given says, the encoding is *not* sent.

Is this a new bug?
Thanks for the reference to the style guide.

The style guide says:

<< Composer likes to put in noise like this:

      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

While a nice touch in theory, back here in the real world, that tag makes
3.0-vintage Navigators load the document twice and generally lose their minds.
Don't go there. If you use Composer, take this junk out before publishing it. 
>>

The reason given for not using META (Netscape 3.0) seems very out of date.
I would fix the style guide.

By the way, I would add to the style guide a first rule saying
"Always validate your HTML code before publishing it."
There are other significant issues with the use of <meta> tags, such as 
they're not actually a very good way to set the charset. HTTP headers are 
much preferable. This probably depends on some server-upgrade bug or 
another...
I do not know what the best way to give the character set is.
What I can see is that now it is not given.

If there are really good reasons not to use META tags, this should
be documented in the style guide. The reason given in the current
document seems very weak to me.

Some facts:

1) META tags are used on some pages, to specify the character set as
ISO Latin 1
http://www.mozilla.org/status/
http://www.mozilla.org/hacking/
http://www.mozilla.org/status/2002-11-20.html

2) META tags are used to specify a different character set than
Latin 1, e.g.
http://www.mozilla.org/releases/mozilla1.3a/

3) META tags are used to specify other things, e.g.
http://www.mozilla.org/hacking/
<meta name="GENERATOR" content="Mozilla/4.73 (Macintosh; I; PPC) [Netscape]">
See <URL:http://ppewww.ph.gla.ac.uk/~flavell/charset/ns-burp.html>, for 
instance. This is really something that should be solved by having the 
server send a charset parameter with the Content-Type; that's 
considerably less hackish.
A real HTTP header is certainly preferable in the HTTP context. The meta thing
could still be useful for those who browse the pages from a local filesystem
after saving them to disk.

So, what methods does the Netscape Enterprise 3.6 server provide for setting the
charset parameter on a per-file, per-directory or per-server basis? Who can
change the configuration *and* can be persuaded into doing so in the foreseeable
future?
Page still does not have a character set.....

REOPENED
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Attachment #61450 - Attachment is obsolete: true
What options does the server provide for fixing this and who is able and willing
to alter the settings?
After some digging at w3c.org, I found the following at
http://www.w3.org/International/O-charset.html, which says:

---
It is very important that the character encoding of any XML or (X)HTML document
is clearly labeled. This can be done in the following ways:

    * Use the 'charset' parameter in the Content-Type header of HTTP. Example:
      Content-Type: text/html; charset=EUC-JP
    * For XML, use the encoding pseudo-attribute in the xml declaration at the
start of a document or the text declaration at the start of an entity. Example:
      <?xml version="1.0" encoding="iso-8859-1" ?>
    * For HTML, use the <meta> tag inside <head>. Example:
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" >

      For XHTML, you need a slash at the end:

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
---

I found that adding <?xml version="1.0" encoding="iso-8859-1" ?>
as the very first line of the HTML file (I'm using HTML 4.01) fixed the problem
with the W3C Validator.
> I found that adding <?xml version="1.0" encoding="iso-8859-1" ?>
> as the very first line of the HTML file (I'm using HTML 4.01) fixed the         
> problem with the W3C Validator.
 
That's a problem with the validator. (Really, it's a problem with the XHTML 
1.0 spec, Appendix C, but let's not go there.) The advice given is only 
appropriate for XML (including XHTML) documents, not HTML 4.01. (And 
philosophically, charset should only really be handled at the HTTP level, 
but let's not go there, either.)

Putting Charset into the HTTP header (i.e. via the Web Server) restricts the
server to one charset, which means you can't have multi-lingual pages. Mind you,
I think I saw, in passing, that some web servers allow multiple charsets.

Putting a XML statement in a HTML file is very wrong and, to be honest, I'm
extremely surprised that it worked as my Doctype is HTML 4.01. The W3C Validator
should've complained very loudly about that. 

Anyway, it appears that the Charset problem is outside the realm of Mozilla as
it involves the W3C and web servers and can be fixed within XHTML which I
understand supersedes HTML.

Should this bug be marked fixed as before?
This should not be marked fixed. Also, the character encoding does not need to
be the same server-wide even if the information is put in the HTTP headers
(unless the server has limitations, that is).

The ideal fix would be adjusting the server configuration so that the right
charset parameter is sent on the HTTP level.

If no one is going to fix the server configuration, then the <meta> tag would be
the plan B. The XML declaration doesn't belong to HTML at all (though
technically it would be an unrecognized processing instruction).

(And moving to XHTML is a whole other can of worms, so let's not go there.)
The Netscape Enterprise server documentation suggests it can do 
conneg on charset, so it should be quite possible.
Keywords: patch
Summary: Invalid HTML on front page [page does not validate, mozilla.org has no doctype] → Front page needs charset parameter
Whiteboard: No charset provider - not fixed?
From WaSP's Ask W3C column for December 2002:

Specifying Character Encoding

This month kicks off our new “WaSP Asks the W3C” Question and Answer project. In
this project, frequently asked questions posed to WaSP by Web authors and
designers regarding standards are submitted by WaSP members to the W3C’s Quality
Assurance Group for information. The answers are published and archived both
here and on the W3C Web Standards Education list, where follow-up discussion
also takes place. Signup details can be found at the end of this article.
WaSP asks

There are several ways of specifying the character encoding for a particular
document. Which of the following methods (or combination thereof) does the W3C
recommend, and why?

    * Have the server administrator set the proper encoding via the HTTP headers
returned by the Web server
    * Have the author add the encoding with a meta element
    * XHTML authors can add the character encoding using the XML declaration

The W3C responds

These three ways of providing the character encoding of a document are not
equivalent. When trying to figure out the character encoding of a resource, user
agents will try, in this order:

    * The HTTP Content-Type header sent by the server
    * The XML declaration (only for XHTML documents)
    * The HTML/XHTML meta element
    * Other ways. There are algorithms to guess the character encoding, for example

Since the HTTP Content-Type header has precedence, and is also the easiest
information to retrieve (user-agents do not have to parse the resource to get
it), it is almost always the preferred way to provide the character encoding for
an (X)HTML document.

However, in at least two cases, this is simply not possible:

    * The document author does not have any way to configure the server to send
the proper HTTP Content-Type header
    * The document is not served via HTTP.

In these cases, an HTML document should provide the character encoding via a
meta element, and an XML document can provide it via the XML declaration. If the
XML document uses one of the default encodings (UTF-8 or UTF-16) no declaration
is needed to manage the character encoding.
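A simplified Python sketch of that lookup order, assuming the Content-Type header and the document text are already available as strings. The function `detect_charset` and its regular expressions are hypothetical; real user agents use a proper byte-level prescan rather than regexes, and the last step (guessing) is left out.

```python
import re

# Simplified patterns (assumption): adequate for well-formed examples only.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
HTTP_EQUIV_CT = re.compile(r"http-equiv\s*=\s*[\"']?content-type", re.IGNORECASE)
CHARSET = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)
XML_DECL = re.compile(r"^<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']")


def detect_charset(content_type, document, is_xhtml=False):
    """Return (charset, source) using the W3C order, or (None, None)."""
    # 1. A charset parameter on the HTTP Content-Type header takes precedence.
    if content_type:
        match = CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "http"
    # 2. The XML declaration, consulted only for XHTML (XML) documents.
    if is_xhtml:
        match = XML_DECL.match(document)
        if match:
            return match.group(1).lower(), "xml-declaration"
    # 3. A <meta http-equiv="Content-Type"> element in the document.
    for tag in META_TAG.findall(document):
        if HTTP_EQUIV_CT.search(tag):
            match = CHARSET.search(tag)
            if match:
                return match.group(1).lower(), "meta"
    # 4. Heuristic guessing is out of scope for this sketch.
    return None, None


html_page = '<head><meta http-equiv="content-type" content="text/html; charset=UTF-8"></head>'
xhtml_page = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<html xmlns="http://www.w3.org/1999/xhtml"></html>'

print(detect_charset("text/html; charset=ISO-8859-1", html_page))  # ('iso-8859-1', 'http')
print(detect_charset("text/html", html_page))                      # ('utf-8', 'meta')
print(detect_charset("application/xhtml+xml", xhtml_page, is_xhtml=True))  # ('iso-8859-1', 'xml-declaration')
print(detect_charset("text/html", xhtml_page))                     # (None, None)
```

The last call mirrors the point made earlier in the thread: an XML declaration at the top of an HTML 4.01 page carries no weight for an HTML reader.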
To sum it up

    * Wrong. The webmaster sets a default character encoding to be sent by the
server but does not let the author override it or the info is not provided
anywhere whatsoever
    * Good. The character encoding is not set at the server level but properly
declared through the HTML meta element (and/or the XML declaration for XHTML
documents)
    * Best. The character encoding is properly set at the server level, either
with a default that authors can override or on a per-document basis, and is also
available at the document level (both in the XML declaration if applicable and
the meta element) for standalone use

Examples

Example of an XHTML 1.0 document written in French with an ISO-8859-1 encoding:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">

<head>
<title>Exemple de document XHTML 1.0</title>
</head>

<body>
<h1>Portrait Intérieur</h1>
<h2>Rainer-Maria Rilke</h2>
<p>Ce ne sont pas des souvenirs<br />
qui, en moi, t'entretiennent ;<br />
tu n'es pas non plus mienne<br />
par la force d'un beau désir.</p>
</body>
</html>

Example of an HTML 4.01 document written in French with a UTF-8 encoding:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
		 
<html lang="fr">

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

<title>Exemple de document HTML 4.01</title>
</head>

<body>
<h1>Portrait Intérieur</h1>
<h2>Rainer-Maria Rilke</h2>

<p>Ce ne sont pas des souvenirs<br>
qui, en moi, t'entretiennent ;<br>
tu n'es pas non plus mienne<br>
par la force d'un beau désir.</p>
</body>
</html>

On the popular Apache Web server, the HTTP Content-Type header for a resource
can be set up in the .htaccess file, as follows:

<Files example.html>
ForceType text/html;charset=ISO-8859-1
</Files>

This would force the file example.html to be served as ISO-8859-1 even if the
server had a different global configuration.
WaSP comments

WaSP and W3C member Tim Bray commented on this answer and said:

“If you know that the document you’re sending is going to get read by an XML
processor, the server should get the charset right. If the server makes any
mistake the rules say that the processor is supposed to do the wrong thing! On
the other hand, if the document is going to any kind of HTML reader, the server
can usefully try to help and do what is suggested here. So it turns out that it
matters whether you serve it as html or xhtml+xml.”

How to serve HTML and XHTML will be discussed in the next issue of WaSP Asks the
W3C.
References

    * About Charset Parameters
    * About Character Encodings
    * HTML 4.0 specification on character encodings
    * XHTML 1.0 specification on character encodings
    * XML 1.0 specification on character encodings

Discussion

For clarification and discussion on this topic, please address your comments and
questions to the W3C Web Standards Education list.

To subscribe to the list, send an email to public-evangelist-request@w3.org with
“Subject: subscribe”. You can read archived posts at
http://lists.w3.org/Archives/Public/public-evangelist/.
For the charset on the whole web site there is bug 154570.

Comment 57 is wrong. You can set the charset in http headers on a file by file
basis (at least on reasonable web servers like Apache).

pi
Blocks: validate
Depends on: 154570
removing dependency on an invalid bug
No longer blocks: advocacybugs
see attachment 127836 for an xhtml transitional version of the mozilla.org home
page, which uses an xml prolog to declare its content-type.

i realize that it's better to send charset information on the server-side, but
that's bug 154570, and i'd say that it's better to have the page validate now
and later than just later.
Um, serving XHTML as text/html causes more problems than it solves. (Hixie can
talk your ear off about this, if necessary). Maybe the current meltdown will
wind up with us moving to Apache, which would have been the sensible choice all
along?
as we can't use <meta> or an XML prolog, i checked out some docs on netscape
enterprise:

http://kuhub.cc.ku.edu/www/html/721final/6563/6563pro_001.html#writingnsconfigfiles

it looks like we can do this (note: i can't test this b/c i don't have an ns
server):

<Files index.html>
 AddType exp=index.html type=text/html;charset=iso-8859-1
</Files>

however, i'm not sure that child directories won't inherit this, so perhaps we
could add something like:

<Files ?*/index.html>
 AddType exp=index.html type=text/html
</Files>
Just for reference, the documentation you linked to is for Netscape FastTrack
Server 7.2.  We're running Netscape Enterprise Server 3.0.
ben@netscape.com maintains the front page; CCing him. I don't know which bug the
referenced attachment 127836 (XHTML version of the new front page) is attached
to, though...

Gerv
Attachment #127836 is attached to Bug #154570 ("www.mozilla.org doesn't send
charset information")
This charset discussion has been running for more than 15 months now. Can't we just insert
the meta tag for now until we can reconfigure the server and/or port the page to
XHTML 1.1?
Whiteboard: start reading at comment 33
The server move is happening this coming week.  Fiddling with server-provided
headers will be much easier once we're running on Apache.
Okay, according to
http://www.delorie.com/web/headers.cgi?url=http%3A%2F%2Fwww.mozilla.org , the
server now runs Apache/1.3.27 (Unix)  (Red-Hat/Linux). It sends no charset
currently.
It *just* got moved this afternoon.
Apache can do it, but not by default. :)  We still have to configure it.

It will happen sometime in the next week or so (we're all volunteers here).
Dave,

thanks for the work. Just a reminder about bug 154570 (which would fix
this bug, but does not really block it).

AddDefaultCharset On
in the config would do the job. It can be overridden on a per-file or per-type basis.
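A toy Python model of that arrangement: a server-wide default charset that individual files can override. This is not how Apache is implemented; `content_type_header` and the `ja/*` pattern are hypothetical, and the default value assumes Apache 1.3's documented behaviour that `AddDefaultCharset On` means iso-8859-1.

```python
from fnmatch import fnmatch

# Assumption: Apache 1.3 documents "AddDefaultCharset On" as adding iso-8859-1.
SERVER_DEFAULT_CHARSET = "iso-8859-1"


def content_type_header(path, media_type, overrides, default=SERVER_DEFAULT_CHARSET):
    """Build a Content-Type value: a matching per-file override wins over the default."""
    # Overrides are checked in insertion order; the first matching pattern applies.
    for pattern, charset in overrides.items():
        if fnmatch(path, pattern):
            return f"{media_type}; charset={charset}"
    # In this sketch only text/* responses receive the server-wide default.
    if media_type.startswith("text/"):
        return f"{media_type}; charset={default}"
    return media_type


# Hypothetical override: a directory of pages stored as EUC-JP.
overrides = {"ja/*": "euc-jp"}
print(content_type_header("index.html", "text/html", overrides))     # text/html; charset=iso-8859-1
print(content_type_header("ja/index.html", "text/html", overrides))  # text/html; charset=euc-jp
print(content_type_header("logo.png", "image/png", overrides))       # image/png
```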

pi
Request this be closed as the beta site is live ( / validates as HTML 4.01
Strict)?  I would close this but I don't have permissions (and don't fulfill
req's to get permissions) yet.
fixed by redesign.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 21 years ago
Resolution: --- → FIXED
Verifying fixed.
Status: RESOLVED → VERIFIED
No longer blocks: validate
Product: mozilla.org → Websites
Component: www.mozilla.org → General
Product: Websites → www.mozilla.org