Closed Bug 258478 Opened 17 years ago Closed 16 years ago

Fix Update to use UTF-8 (Wrong Meta Charset Information)

Product :: Component: addons.mozilla.org Graveyard :: Public Pages
Type: defect
Priority: Not set
Severity: normal
Tracking: not tracked
Status: RESOLVED FIXED
Reporter: volkmar
Assignee: annevk
Attachments: 1 file

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20040908 Firefox/0.10
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20040908 Firefox/0.10

Wrong Meta Charset Information and illegal Characters on
"http://update.mozilla.org/extensions/showlist.php"

Reproducible: Always
Steps to Reproduce:
"Sorry, I am unable to validate this document because on lines 58, 122  it
contained one or more bytes that I cannot interpret as utf-8 (in other words,
the bytes found are not valid values in the specified Character Encoding).
Please check both the content of the file and the character encoding indication."

 57: </DIV><DIV id="content">
 58: #### encoding problem on this line, not shown ####
 59: <SPAN class="listtitle">Firefox Extensions &#187; All </SPAN><br>Extensions
1 - 10 of 53

Missing line is something like
<DIV id="listnav"><DIV class="pagenum"  style="margin-right: 95px;">Page 0 of
0</DIV>

121: 
122: #### encoding problem on this line, not shown ####
123: Jump to: <A HREF="?pageid="></A>&nbsp;

Missing line is something like
Status: UNCONFIRMED → NEW
Ever confirmed: true
Those are the same lines that Mozilla renders as "?". It's a bullet character that
isn't HTML-encoded (which I've already fixed locally, in the new look).

Though aren't there author names that are exhibiting the same problems under UTF-8?
> #2
> Though aren't there author names that are exhibiting the same problems under
UTF-8?

I don't understand your question. The problem is that the page is sent with an
HTTP header saying the charset is UTF-8, while the page itself is obviously encoded
as ISO-8859-1, according to its meta hack:

http://validator.w3.org/check?uri=http%3A%2F%2Fupdate.mozilla.org%2Fextensions%2Fshowlist.php%3Fcategory%3DAll&charset=iso-8859-1+%28Western+Europe%29&ss=1&verbose=1
Yes, but that's exactly why the meta tag is allowed.  It sounds like Wolf
already has this fixed locally.
Whiteboard: fixed-development
The page really should not provide conflicting charset encoding info, and should
probably stick to 7-bit ASCII with entities, for safety.
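A minimal Python sketch of the "7-bit ASCII with entities" approach, for illustration only (the site itself ran PHP, and `to_ascii_entities` is a hypothetical helper, not UMO code). The standard `xmlcharrefreplace` error handler turns every non-ASCII character, such as the » in the list title and the unencoded bullet on showlist.php, into a numeric character reference, so the output bytes are identical whether a client decodes them as UTF-8 or ISO-8859-1:

```python
def to_ascii_entities(text: str) -> bytes:
    """Encode text as 7-bit ASCII, escaping each non-ASCII character as &#NNN;."""
    return text.encode("ascii", errors="xmlcharrefreplace")


title = to_ascii_entities("Firefox Extensions \u00bb All")  # U+00BB is the » sign
bullet = to_ascii_entities("\u2022")                         # U+2022 is the bullet

print(title)   # b'Firefox Extensions &#187; All'
print(bullet)  # b'&#8226;'
```

The trade-off is the one raised in the reply that follows: once the HTTP header and the document agree on UTF-8, the entities are no longer needed.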
(In reply to comment #5)
> The page really should not provide conflicting charset encoding info, 

This means the meta hack must be in accordance with HTTP header's charset info.

> and should probably stick to 7-bit ASCII with entities, for safety.
Unicode characters in UTF-8 are ok. No need at all for char entities.
*** Bug 261583 has been marked as a duplicate of this bug. ***
Whiteboard: fixed-development
Even if I choose Western encoding, Firefox automagically changes back to
(obviously wrong) UTF-8 on every next page in extension list, because that's
what is given in HTTP response header.

Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10.1
Sounds like a server config issue, now that I think about it.

I bet it's because the Default Charset for the new server is UTF-8 and not
ISO-8859-1, which needs to be changed. 
Whiteboard: [Server Config Blocking]
*** Bug 263969 has been marked as a duplicate of this bug. ***
Severity: trivial → minor
OS: Linux → All
Hardware: PC → All
Whiteboard: [Server Config Blocking] → [Server Config]
This problem can also be found on the page
https://update.mozilla.org/themes/moreinfo.php?id=213

Author name shown is V&#65533;ctor Fern&#65533;ndez. Changing the character encoding to
ISO-8859-1, we can see the correct name: Víctor Fernández.
Sorry, two "?" should have been shown in the place of "&#65533;" in the last
comment.
*** Bug 264681 has been marked as a duplicate of this bug. ***
Moving to Server Operations, where it should've been a while ago. Oops.

The default character set for update.mozilla.org on iguana needs to be changed
from UTF-8 to ISO-8859-1. This bug appeared when we moved to iguana from rodan
which had ISO-8859-1 as default.
Assignee: psychoticwolf → myk
Severity: minor → normal
Component: Update → Server Operations
QA Contact: mozilla.update → justdave
Whiteboard: [Server Config]
(In reply to comment #14)

> The default character set for update.mozilla.org on iguana needs to be changed
> from UTF-8 to ISO-8859-1. This bug appeared when we moved to iguana from rodan
> which had ISO-8859-1 as default.

Aren't there a bunch of folks working on making update.mozilla.org localizable
right now?

Fix the app to use UTF-8 instead of ISO-8859-1.  Trust me.

You're shooting the localization efforts in the foot if you don't.
Component: Server Operations → Update
Assignee: myk → psychoticwolf
QA Contact: justdave → mozilla.update
(In reply to comment #15)
> Fix the app to use UTF-8 instead of ISO-8859-1.  Trust me.

And this especially applies to the data in the database (such as author names),
which isn't going to change when the localization stuff is pulled in.
This fixes the featured update box and the ?'s on showlist.php.
Comment on attachment 163496 [details] [diff] [review]
Fixes obvious non-UTF-8 issues.

Patch checked in to branch, applied to site.
(In reply to comment #17)
Thanks a lot. Now it looks much better. But UMO pages are still in a very very
bad condition. The silly wrong meta hack is still around. The validator[1] says:

    The character encoding specified in the HTTP 
    header (utf-8) is different from the value in 
    the <meta> element (iso-8859-1). I will use 
    the value from the HTTP header (utf-8) for 
    this validation.


[1] http://validator.w3.org/check?verbose=1&uri=http%3A//update.mozilla.org/extensions/showlist.php
I'm aware of that, that patch didn't even attempt to address that issue. :-)
Summary: Wrong Meta Charset Information and illegal Characters on "http://update.mozilla.org/extensions/showlist.php" → Fix Update to use UTF-8 (Wrong Meta Charset Information)
Bulk Moving Web Site bugs to new component.
(Filter: massumowebsitespam)
Component: Update → Web Site
Product: mozilla.org → Update
Version: other → unspecified
Blocks: 245948
Just remove the META element altogether. It's not needed.
Target Milestone: --- → 1.0
Still a problem with update-beta.
(In reply to comment #23)
> Still a problem with update-beta.

This bug isn't marked as being fixed either. :-)
Version: unspecified → 0.9
The UTF-8/ISO-8859-1 meta conflict should be solved on
http://update-beta.mozilla.org.

Though, incoming data isn't guaranteed to be UTF-8 in the DB. That will likely
not be solved for 1.0. Changing Target Milestone.
Assignee: psychoticwolf → psychoticwolf
Target Milestone: 1.0 → 1.1
I would like to point out comment 22 once again. I was also wondering why it can't
be guaranteed UTF-8. If people are using a form to put data in the database, and
that form is on a UTF-8 encoded page with UTF-8 as the value of the charset
parameter, there shouldn't be a single problem.
(In reply to comment #22)
> Just remove the META element altogether. It's not needed.

I think w3 recommends having it.
I don't think so -- unless proved otherwise.

W3 (WWW) works perfectly well without <meta>, relying on the Content-Type HTTP response
header... or one of the other headers.

And just in case you meant W3C... Well, that's a public organization, so
either you'll find their specs recommending <meta>, or there's no such
recommendation.
All pages now validate.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
This has nothing to do with validation. And, by the way, see also comment 22.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
META tags are used for supplemental information or to override server settings.
 Since we're controlling charset at the server level, we don't need a meta tag.  
(In reply to comment #31)
> META tags are used for supplemental information or to override server settings.

Actually, it's the other way around.  According to the RFCs, the Content-Type
header the server sends overrides the META tag.
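The precedence described here is also what the validator applied earlier in this bug, when it used the HTTP header's utf-8 instead of the meta element's iso-8859-1. A minimal Python sketch of that rule, assuming a hypothetical helper `effective_charset` and a deliberately simplified meta regex (real browsers use a far more thorough sniffing procedure). The ISO-8859-1 fallback is the HTTP/1.1 (RFC 2616) default for text/* types:

```python
import re
from email.message import Message
from typing import Optional

# Simplified pattern for <meta ... charset=NAME ...>; not a full HTML parser.
META_CHARSET = re.compile(rb"""<meta[^>]*charset=["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE)


def effective_charset(content_type: Optional[str], body: bytes,
                      default: str = "iso-8859-1") -> str:
    """Pick a document charset: HTTP header first, then <meta>, then a default."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        header_charset = msg.get_content_charset()  # lowercased, or None if absent
        if header_charset:
            return header_charset  # the server's header overrides any <meta>
    match = META_CHARSET.search(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return default  # RFC 2616 default for text/* when nothing is declared


page = b'<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_charset("text/html; charset=UTF-8", page))  # utf-8
print(effective_charset("text/html", page))                 # iso-8859-1
```

The first call reproduces the situation on showlist.php: the conflicting meta element is ignored, which is why removing it changes nothing for compliant clients.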
Assignee: psychoticwolf → nobody
Status: REOPENED → NEW
Assignee: nobody → justdave
Unfortunately, some browsers are not RFC-compliant in this case, Dave.
I am not aware of a single widely used browser (that has something useful to do
on UMO) that does not support this. Could you perhaps list them? And point to
the test case you used?
Why is this assigned to me?  What would you like me to do?
(In reply to comment #35)
> Why is this assigned to me?  What would you like me to do?

As far as I can tell, in all pages referenced in the discussion thread the META
and HTTP header content type information are equal to UTF-8. So, unless someone
can throw a page where non-ASCII characters are badly displayed (I have
unfortunately not seen any page showing non-English characters), this bug should
be closed as fixed.
(In reply to comment #36)
> As far as I can tell, in all pages referenced in the discussion thread the META
> and HTTP header content type information are equal to UTF-8. So, unless someone
> can throw a page where non-ASCII characters are badly displayed (I have
> unfortunately not seen any page showing non-English characters), this bug should
> be closed as fixed.

No it should not. See for instance comment 31. This bug is fixed once the META
element is gone.
Assignee: justdave → Bugzilla-alanjstrBugs
Patch with R+ on bug 279004
Assignee: Bugzilla-alanjstrBugs → bug
The META element is gone in CVS. Marking FIXED per comment 37. The fix can be
found in bug 279004.
Status: NEW → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
And the backend data is ensured to be UTF-8? Really? Who wrote the code for that?
Wolf, what do you mean by backend data? As long as the input pages are served
with charset=UTF-8 there will not be a problem. There is no additional action required.
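Browsers normally submit form data in the encoding of the page that contains the form, but nothing stops a script or a misconfigured client from posting other bytes, so a cheap check before writing to the database addresses Wolf's concern. A minimal Python sketch, assuming a hypothetical helper `require_utf8` (the site used PHP; this is not UMO code):

```python
def require_utf8(raw: bytes) -> str:
    """Return the decoded text, or raise ValueError if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")  # strict by default: any invalid byte raises
    except UnicodeDecodeError as exc:
        raise ValueError(f"input is not valid UTF-8 (byte offset {exc.start})") from exc


print(require_utf8("Fernández".encode("utf-8")))  # Fernández
# require_utf8("Fernández".encode("iso-8859-1")) raises ValueError (byte offset 4)
```

Rejecting the input, rather than decoding it with replacement characters, keeps the U+FFFD symptoms seen earlier in this bug out of stored author names.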
Product: addons.mozilla.org → addons.mozilla.org Graveyard