Closed Bug 802030 Opened 12 years ago Closed 11 years ago

Stop treating us-ascii, iso-8859-1, and Windows-1252 as distinct encodings

Status:

RESOLVED DUPLICATE of bug 936466

People

(Reporter: hsivonen, Unassigned)

References

(Blocks 1 open bug)

Details

Henri Sivonen (:hsivonen)

Reporter

Description

12 years ago

At present, our character encoding infrastructure treats iso-8859-1 and Windows-1252 as distinct encodings even though they have identical decoders and having a true iso-8859-1 encoder is kind of pointless. In the Encoding Standard, iso-8859-1 is merely an alias for Windows-1252. We should get rid of the separate iso-8859-1 encoding and make its labels aliases for Windows-1252. Risk: it is possible that there exists a site that reads an encoding label supplied by Gecko, expects it to say iso-8859-1, and can't deal with it if it says Windows-1252.
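A minimal sketch of what "make its labels aliases for Windows-1252" means at lookup time, assuming the Encoding Standard's "get an encoding" steps (strip leading and trailing ASCII whitespace, then match the label ASCII case-insensitively). The function name `get_encoding` is hypothetical, and the label list is assumed from the current Encoding Standard and restricted to the windows-1252 set; this is not Gecko's code.

```python
from typing import Optional

# ASCII whitespace per the WHATWG Infra standard: TAB, LF, FF, CR, SPACE.
_ASCII_WHITESPACE = "\t\n\f\r "

# Labels assumed to map to windows-1252 (all other encodings omitted).
_WINDOWS_1252_LABELS = frozenset({
    "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1",
    "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591",
    "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii",
    "windows-1252", "x-cp1252",
})


def _ascii_lowercase(s: str) -> str:
    # Lowercase only A-Z; str.lower() would also fold non-ASCII letters.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def get_encoding(label: str) -> Optional[str]:
    """Return the encoding name a label resolves to, or None if unknown."""
    normalized = _ascii_lowercase(label.strip(_ASCII_WHITESPACE))
    if normalized in _WINDOWS_1252_LABELS:
        # Every label in the set resolves to one encoding, so callers
        # see "windows-1252" and never a separate "iso-8859-1".
        return "windows-1252"
    return None


print(get_encoding(" ISO-8859-1\n"))  # windows-1252
print(get_encoding("utf-8"))          # None (not covered by this sketch)
```

Because resolution collapses to a single name, a page that inspects the resolved name sees "windows-1252", which is exactly the compatibility risk described above.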

Henri Sivonen (:hsivonen)

Reporter

Comment 1

12 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #0)
> and 10 deal

and can't deal

Masatoshi Kimura [:emk]

Updated

12 years ago

Blocks: 562096

Aryeh Gregor (:ayg) (no longer with Mozilla)

Comment 2

12 years ago

This also applies to the following:

* iso-8859-11 is the same as windows-874 in the spec and in IE/WebKit.
* tis-620 is the same as windows-874 in the spec and in IE/WebKit.
* us-ascii is the same as windows-1252 in the spec, but not in any browser.
* iso-8859-9 is the same as windows-1254 in the spec and WebKit, but not in IE.
* gbk is the same as gb2312 in the spec and WebKit, but not in IE.
* big5-hkscs is the same as big5 in the spec and IE, but not in WebKit.
* euc-kr is the same as x-windows-949 in the spec and in IE/WebKit.
* iso-8859-6-e and iso-8859-6-i are the same as iso-8859-6 in the spec and WebKit. IE seems not to recognize them at all.
* iso-8859-8-e is the same as iso-8859-8 in the spec and WebKit. IE seems not to recognize it.

Some or all of these should probably be in different bugs, though. In particular, all of them except iso-8859-9/windows-1254 are already implemented in at least one browser, so they should be safer to change than this one.
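Each item above claims that two labels resolve to one encoding. A hypothetical alias table transcribed from that list can check such claims mechanically. The encoding names are the ones used in this comment (later revisions of the Encoding Standard name some of these encodings differently), and `canonical`/`same_encoding` are illustrative helpers, not a browser API.

```python
# Alias table transcribed from the list above (2012-era names).
_ALIASES = {
    "iso-8859-11": "windows-874",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
    "iso-8859-9": "windows-1254",
    "gbk": "gb2312",
    "big5-hkscs": "big5",
    "euc-kr": "x-windows-949",
    "iso-8859-6-e": "iso-8859-6",
    "iso-8859-6-i": "iso-8859-6",
    "iso-8859-8-e": "iso-8859-8",
}


def canonical(label: str) -> str:
    # Simplified normalization; the spec strips only ASCII whitespace
    # and lowercases only ASCII letters.
    key = label.strip().lower()
    # Labels missing from the table are treated as their own canonical name.
    return _ALIASES.get(key, key)


def same_encoding(a: str, b: str) -> bool:
    # Two labels name the same encoding if they share a canonical name.
    return canonical(a) == canonical(b)


print(same_encoding("TIS-620", "iso-8859-11"))     # True
print(same_encoding("iso-8859-9", "windows-1252"))  # False
```

A flat label-to-name map is enough here because every alias in the list points directly at its target; no chains need to be followed.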

Aryeh Gregor (:ayg) (no longer with Mozilla)

Comment 3

12 years ago

I should add that the data from the previous comment comes only from .characterSet, and didn't involve analysis of encoders or decoders. But I hope that if .characterSet is the same in a browser, the encoder/decoder is the same too.
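`.characterSet` only reveals which name a label resolves to. A byte-by-byte comparison shows where a literal ISO-8859-1 decoder and a windows-1252 decoder would actually diverge. This sketch uses Python's `latin-1` and `cp1252` codecs; the identity fallback for the five bytes `cp1252` leaves undefined is an assumption meant to mirror the Encoding Standard's windows-1252 index.

```python
from typing import List


def decode_latin1(data: bytes) -> str:
    # Literal ISO-8859-1: each byte becomes the code point with the same value.
    return data.decode("latin-1")


def decode_windows_1252(data: bytes) -> str:
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
    # Assumption: the Encoding Standard maps those to U+0081, U+008D, etc.,
    # so fall back to the identity mapping for them.
    out = []
    for b in data:
        try:
            out.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            out.append(chr(b))
    return "".join(out)


def differing_bytes() -> List[int]:
    # Byte values the two decoders turn into different characters.
    return [
        b for b in range(256)
        if decode_latin1(bytes([b])) != decode_windows_1252(bytes([b]))
    ]


diff = differing_bytes()
print(len(diff), hex(min(diff)), hex(max(diff)))  # 27 0x80 0x9f
```

The two decoders differ on 27 byte values, all in the 0x80 to 0x9F range; every other byte decodes identically.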

Aryeh Gregor (:ayg) (no longer with Mozilla)

Comment 4

12 years ago

And I also should add that by "WebKit" I mean "Chrome 23 dev". Anne tells me Safari uses a different ICU version.

Aryeh Gregor (:ayg) (no longer with Mozilla)

Comment 5

12 years ago

This should cover us-ascii too. I'll open a new bug for the other ones, since they're more likely to be web-compatible.

Summary: Stop treating iso-8859-1 and Windows-1252 as distinct encodings → Stop treating us-ascii, iso-8859-1, and Windows-1252 as distinct encodings
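With us-ascii folded in, content labeled us-ascii is decoded as windows-1252, so bytes above 0x7F become windows-1252 characters instead of being errors for a strict ASCII decoder. A minimal sketch, assuming Python's `cp1252` codec plus a hypothetical error handler (the name `c1-passthrough` is made up here) for the five bytes `cp1252` leaves undefined:

```python
import codecs


def _c1_passthrough(err: UnicodeDecodeError):
    # Map each byte cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    # to the code point of the same value, then resume after it.
    bad = err.object[err.start:err.end]
    return "".join(chr(b) for b in bad), err.end


codecs.register_error("c1-passthrough", _c1_passthrough)


def decode_us_ascii_label(data: bytes) -> str:
    # Assumes "us-ascii" resolves to windows-1252, as in the Encoding Standard,
    # so bytes >= 0x80 decode to windows-1252 characters instead of failing.
    return data.decode("cp1252", errors="c1-passthrough")


sample = b"\x93quoted\x94"
print(decode_us_ascii_label(sample))
try:
    sample.decode("ascii")  # a strict US-ASCII decoder rejects 0x93
except UnicodeDecodeError as exc:
    print("strict ascii:", exc.reason)
```

The handler-based approach keeps the standard codec for the 251 defined bytes and only patches the gaps, rather than maintaining a full custom table.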

Henri Sivonen (:hsivonen)

Reporter

Comment 6

11 years ago

The browser-side label handling is done. Blocking on mailnews as far as getting rid of the extra code goes.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE


Categories

(Core :: Internationalization, defect)

Security

(public)
