Closed Bug 931401 Opened 11 years ago Closed 11 years ago

Firefox page info says windows-1252 but it's really US-ASCII

Categories

(Core :: DOM: Core & HTML, defect)

Version: 19 Branch
Platform: x86_64
OS: All
Type: defect
Priority: Not set
Severity: normal

RESOLVED DUPLICATE of bug 890478

People

(Reporter: eggert, Unassigned)

References

Details

(Keywords: regression)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 (Beta/Release)
Build ID: 20130917154415

Steps to reproduce:

Start up Firefox and cut and paste this URL:

http://www.cs.ucla.edu/~eggert/tz/tz-link.htm

to visit the page. Then type Control-I to get Page Info.


Actual results:

It reports "Encoding: windows-1252".


Expected results:

It should report "Encoding: US-ASCII", since that's what's in the HTTP header and in the
contents, which say "<meta http-equiv="Content-type" content='text/html; charset="US-ASCII"'>".
Pushlog
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=539bb6ca633a&tochange=3267977b0f8a
Blocks: 801402
Component: General → DOM
Keywords: regression
OS: Linux → All
Version: 24 Branch → 19 Branch
Status: UNCONFIRMED → NEW
Ever confirmed: true
us-ascii is a label for windows-1252, per <http://encoding.spec.whatwg.org/>.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
<http://encoding.spec.whatwg.org/> is incorrect.  US-ASCII is not an alias for Windows-1252.  Windows-1252 is an 8-bit encoding and has assignments for most characters 128-255.  US-ASCII has assignments only for characters 0-127.  If you look at the IANA assignments for character sets <http://www.iana.org/assignments/character-sets/character-sets.xhtml>, which is the official registry for encodings, you'll see that they are distinct.
Comment 2 and comment 3 are at cross purposes: <http://encoding.spec.whatwg.org/> is aware that US-ASCII and Windows-1252 are distinct encodings; what it is saying is that user agents must interpret web content labeled as US-ASCII as if it were labeled as Windows-1252.
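
The label handling described above can be sketched in a few lines. This is an illustrative sketch only, not Gecko's code: `resolve_label` and `LABEL_TO_ENCODING` are hypothetical names, and the table is a small excerpt of the label list in the Encoding Standard rather than the full set.

```python
from typing import Optional

# Excerpt (not the full table) of label -> encoding name pairs
# from the Encoding Standard. Note that "us-ascii" and "ascii"
# both resolve to windows-1252.
LABEL_TO_ENCODING = {
    "ansi_x3.4-1968": "windows-1252",
    "ascii": "windows-1252",
    "cp1252": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "us-ascii": "windows-1252",
    "windows-1252": "windows-1252",
    "x-cp1252": "windows-1252",
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
}

# ASCII whitespace as the standard defines it: tab, LF, FF, CR, space.
ASCII_WHITESPACE = "\t\n\x0c\r "


def resolve_label(label: str) -> Optional[str]:
    """Hypothetical helper: map a charset label to an encoding name.

    Strips leading/trailing ASCII whitespace and compares
    case-insensitively. Returns None for labels missing from the
    excerpt above.
    """
    normalized = label.strip(ASCII_WHITESPACE).lower()
    return LABEL_TO_ENCODING.get(normalized)


print(resolve_label("US-ASCII"))
```

Under these assumptions, the page from the bug report, which is labeled `US-ASCII`, resolves to `windows-1252`, which is what Page Info reports.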
(In reply to Paul Eggert from comment #3)
> <http://www.iana.org/assignments/character-sets/character-sets.xhtml>, which
> is the official registry for encodings, you'll see that they are distinct.

The Gecko implementation doesn't treat the IANA registry as authoritative, because treating http://encoding.spec.whatwg.org/ as authoritative is better for compatibility with Web content and other browsers. This results in some political friction, sure, but it also results in more useful software.
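
One way to see why decoding US-ASCII-labeled content as windows-1252 is harmless for well-formed pages: the two encodings agree on every byte in the range 0-127 and differ only above it. The sketch below uses Python's built-in `ascii` and `cp1252` codecs as stand-ins for the two encodings; the variable names are illustrative.

```python
# Bytes 0-127 decode to the same text under both encodings.
low_half = bytes(range(128))
same_low_half = low_half.decode("ascii") == low_half.decode("cp1252")

# Byte 0x80 is the euro sign in windows-1252...
euro = b"\x80".decode("cp1252")

# ...but has no assignment in US-ASCII, so a strict decoder rejects it.
try:
    b"\x80".decode("ascii")
    ascii_accepts_high_byte = True
except UnicodeDecodeError:
    ascii_accepts_high_byte = False

print(same_low_half, euro, ascii_accepts_high_byte)
```

A page that really contains only US-ASCII bytes therefore renders identically either way, while a mislabeled page containing bytes above 127 still decodes to readable text instead of failing.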
Status: RESOLVED → VERIFIED
Status: VERIFIED → RESOLVED
Closed: 11 years ago
Resolution: INVALID → DUPLICATE
Component: DOM → DOM: Core & HTML