Closed Bug 802030 Opened 12 years ago Closed 10 years ago

Stop treating us-ascii, iso-8859-1, and Windows-1252 as distinct encodings

Categories

(Core :: Internationalization, defect)

Type: defect
Priority: Not set
Severity: normal

RESOLVED DUPLICATE of bug 936466

People

(Reporter: hsivonen, Unassigned)

References

(Blocks 1 open bug)

Details

At present, our character encoding infrastructure treats iso-8859-1 and Windows-1252 as distinct encodings even though they have identical decoders and having a true iso-8859-1 encoder is kind of pointless. In the Encoding Standard, iso-8859-1 is merely an alias for Windows-1252. We should get rid of the separate iso-8859-1 encoding and make its labels aliases for Windows-1252.
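A minimal sketch of what making the iso-8859-1 labels aliases for Windows-1252 means at lookup time. It loosely follows the Encoding Standard's label lookup: strip leading and trailing ASCII whitespace, ASCII-lowercase the label, then look it up in a table. The table holds only a hand-picked subset of the labels the spec maps to windows-1252, and `get_encoding` is a hypothetical helper name, not a Gecko or spec API.

```python
import string
from typing import Optional

# Hypothetical subset of the labels that the Encoding Standard maps to
# windows-1252; the spec's own table has more entries.
WINDOWS_1252_LABELS = (
    "ascii", "cp1252", "iso-8859-1", "iso8859-1", "iso_8859-1",
    "l1", "latin1", "us-ascii", "windows-1252", "x-cp1252",
)

# Every label resolves to one canonical encoding name.
LABEL_TO_ENCODING = {label: "windows-1252" for label in WINDOWS_1252_LABELS}

# ASCII whitespace (TAB, LF, FF, CR, SPACE) and an ASCII-only lowercasing
# table, so non-ASCII characters in a label are left untouched.
ASCII_WHITESPACE = "\t\n\f\r "
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def get_encoding(label: str) -> Optional[str]:
    """Return the canonical encoding name for label, or None if unknown."""
    key = label.strip(ASCII_WHITESPACE).translate(ASCII_LOWER)
    return LABEL_TO_ENCODING.get(key)


print(get_encoding(" ISO-8859-1\n"))  # windows-1252
print(get_encoding("US-ASCII"))       # windows-1252
print(get_encoding("iso-8859-2"))     # None (not in this toy table)
```

Under this scheme a caller that passes `iso-8859-1` gets the name `windows-1252` back, not the label it supplied.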

Risk: it is possible that there exists a site that reads an encoding label supplied by Gecko and expects it to say iso-8859-1 and 10 deal if it says Windows-1252.

(In reply to Henri Sivonen (:hsivonen) from comment #0)
>  and 10 deal 

and can't deal

Blocks: 562096

This also applies to the following:

* iso-8859-11 is the same as windows-874 in the spec and in IE/WebKit.
* tis-620 is the same as windows-874 in the spec and in IE/WebKit.
* us-ascii is the same as windows-1252 in the spec, but not in any browser.
* iso-8859-9 is the same as windows-1254 in the spec and WebKit, but not in IE.
* gbk is the same as gb2312 in the spec and WebKit, but not in IE.
* big5-hkscs is the same as big5 in the spec and IE, but not in WebKit.
* euc-kr is the same as x-windows-949 in the spec and in IE/WebKit.
* iso-8859-6-e and iso-8859-6-i are the same as iso-8859-6 in the spec and WebKit.  IE seems not to recognize them at all.
* iso-8859-8-e is the same as iso-8859-8 in the spec and WebKit.  IE seems not to recognize it.
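The list above can be transcribed into a small table to see which aliases no surveyed browser had adopted yet. This is only a restatement of the comment's observations, which came from `.characterSet` in IE and Chrome 23 dev and reflect a 2012-era draft of the spec, so the current spec may differ. The names `OBSERVED`, `aliased_nowhere`, and `aliased_by` are hypothetical.

```python
# Observations transcribed from the list above. "WebKit" means Chrome 23 dev,
# and the data comes from .characterSet only, not from decoder analysis.
OBSERVED = {
    # label: (what the spec draft treats it as, browsers that already agree)
    "iso-8859-11": ("windows-874", {"IE", "WebKit"}),
    "tis-620": ("windows-874", {"IE", "WebKit"}),
    "us-ascii": ("windows-1252", set()),
    "iso-8859-9": ("windows-1254", {"WebKit"}),
    "gbk": ("gb2312", {"WebKit"}),
    "big5-hkscs": ("big5", {"IE"}),
    "euc-kr": ("x-windows-949", {"IE", "WebKit"}),
    "iso-8859-6-e": ("iso-8859-6", {"WebKit"}),
    "iso-8859-6-i": ("iso-8859-6", {"WebKit"}),
    "iso-8859-8-e": ("iso-8859-8", {"WebKit"}),
}


def aliased_nowhere(observed):
    """Labels the spec aliases but no surveyed browser treats that way."""
    return sorted(label for label, (_, engines) in observed.items() if not engines)


def aliased_by(observed, engine):
    """Labels that the given engine already treats as the spec's target."""
    return sorted(label for label, (_, engines) in observed.items() if engine in engines)


print(aliased_nowhere(OBSERVED))   # ['us-ascii']
print(aliased_by(OBSERVED, "IE"))  # ['big5-hkscs', 'euc-kr', 'iso-8859-11', 'tis-620']
```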

Some or all of these should probably be in different bugs, though. In particular, all of them except iso-8859-9/windows-1254 are already implemented in at least one browser, so they should be safer to change than the encodings covered by this bug.

I should add that the data from the previous comment comes only from .characterSet, and didn't involve analysis of encoders or decoders.  But I hope that if .characterSet is the same in a browser, the encoder/decoder is the same too.
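One way to check whether two encodings really share a decoder, without relying on `.characterSet`, is to compare the decoders on all 256 byte values. The sketch below does this for a literal ISO-8859-1 decoder and an approximation of the Encoding Standard's windows-1252 decoder. Python's built-in `cp1252` codec leaves five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, whereas the Encoding Standard maps each of them to the C1 control with the same value, so the sketch falls back to `chr(b)` for those. The helper names are hypothetical.

```python
def whatwg_windows_1252(b: int) -> str:
    """Approximate the Encoding Standard's windows-1252 decoder for one byte.

    Python's cp1252 codec rejects 0x81, 0x8D, 0x8F, 0x90 and 0x9D; the
    Encoding Standard instead maps each to the C1 control of the same value,
    so those bytes fall back to chr(b).
    """
    try:
        return bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        return chr(b)


def literal_iso_8859_1(b: int) -> str:
    """ISO-8859-1 taken literally: each byte becomes the same-valued code point."""
    return bytes([b]).decode("latin-1")


def differing_bytes(decode_a, decode_b):
    """Byte values (0-255) on which two single-byte decoders disagree."""
    return [b for b in range(256) if decode_a(b) != decode_b(b)]


diff = differing_bytes(whatwg_windows_1252, literal_iso_8859_1)
print(len(diff))                       # 27
print(hex(min(diff)), hex(max(diff)))  # 0x80 0x9f
```

The two decoders disagree on 27 bytes, all in 0x80–0x9F; a decoder that implemented ISO-8859-1 literally would, for example, turn 0x80 into the C1 control U+0080 rather than the euro sign U+20AC.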

And I also should add that by "WebKit" I mean "Chrome 23 dev". Anne tells me Safari uses a different ICU version.

This should cover us-ascii too. I'll open a new bug for the other ones, since they're more likely to be web-compatible.

Summary: Stop treating iso-8859-1 and Windows-1252 as distinct encodings → Stop treating us-ascii, iso-8859-1, and Windows-1252 as distinct encodings

The browser-side label handling is done. Blocking on mailnews as far as getting rid of the extra code goes.

Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE