Closed Bug 99426 Opened 23 years ago Closed 23 years ago

Shouldn't translate Windows-1252 characters when document is ISO-8859-1

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jmd, Assigned: shanjian)

References

Details

(Keywords: intl, Whiteboard: jmd-remind)

Attachments

(1 file)

I believe ISO-8859 specifically reserves 0x80 through 0x9f for control characters; no printable characters should be there. Translating characters in that range when the document specifically requests ISO-8859-1 rendering just furthers Microsoft's "embrace and extend". Using characters in that range is considered bad netiquette, which Mozilla should not help proliferate.
Keywords: intl
QA Contact: andreasb → ylong
Switching component to "Internationalization".
Assignee: rchen → yokoyama
Component: Localization → Internationalization
QA Contact: ylong → andreasb
QA Contact: andreasb → ylong
Assigning to shanjian.
Assignee: yokoyama → shanjian
ISO-8859 left 0x80 to 0x9f unassigned; the range is not reserved for anything. In 8859-1 encoded text we shouldn't see anything in this range. When it does happen, it is very likely that the user intended win1252. Many programmers do not know the difference between 8859-1 and win1252, let alone average users, so if we don't handle those code points they will consider it a bug in Mozilla. Given the real situation Mozilla-based browsers are in, we can do nothing to stop this practice from proliferating. I would absolutely agree with you if we could make a difference. The fact is, we can't blame users for this practice, nor can we stop it. There are much more evil things in this world we need to fight, so let's make the compromise here.
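The compromise described above can be sketched in a few lines of Python. This is only an illustration and not Mozilla's actual code: the function name `decode_lenient_latin1` is hypothetical, and it assumes Python's built-in `cp1252` codec, which leaves five bytes in the range (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined. For those bytes the sketch falls back to the plain ISO-8859-1 mapping.

```python
def decode_lenient_latin1(data: bytes) -> str:
    """Decode bytes labelled ISO-8859-1, treating 0x80-0x9F as windows-1252.

    Hypothetical helper for illustration only.
    """
    out = []
    for b in data:
        if 0x80 <= b <= 0x9F:
            try:
                # Most bytes in this range have a printable windows-1252 meaning.
                out.append(bytes([b]).decode("cp1252"))
            except UnicodeDecodeError:
                # Undefined in Python's cp1252 codec: keep the code point
                # that a strict ISO-8859-1 decode would produce.
                out.append(chr(b))
        else:
            # Outside the disputed range, ISO-8859-1 maps byte N to U+00NN.
            out.append(chr(b))
    return "".join(out)


# 0x93 and 0x94 are the windows-1252 curly double quotes.
quoted = decode_lenient_latin1(b"\x93hi\x94")
strict = b"\x93hi\x94".decode("iso-8859-1")
```

Here `quoted` contains U+201C and U+201D around "hi", whereas `strict` contains the code points U+0093 and U+0094, which is the behaviour the reporter is asking for.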
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
To my understanding, ISO 8859 references the C1 control set from ISO 6429 as the controls for 80-9f. Of course, ISO standards aren't exactly available, so this is all hearsay.

=80 U+0080 PADDING CHARACTER
=81 U+0081 HIGH OCTET PRESET
=82 U+0082 BREAK PERMITTED HERE
=83 U+0083 NO BREAK HERE
=84 U+0084 INDEX
=85 U+0085 NEXT LINE
=86 U+0086 START OF SELECTED AREA
=87 U+0087 END OF SELECTED AREA
=88 U+0088 CHARACTER TABULATION SET
=89 U+0089 CHARACTER TABULATION WITH JUSTIFICATION
=8A U+008A LINE TABULATION SET
=8B U+008B PARTIAL LINE FORWARD
=8C U+008C PARTIAL LINE BACKWARD
=8D U+008D REVERSE LINE FEED
=8E U+008E SINGLE-SHIFT TWO
=8F U+008F SINGLE-SHIFT THREE
=90 U+0090 DEVICE CONTROL STRING
=91 U+0091 PRIVATE USE ONE
=92 U+0092 PRIVATE USE TWO
=93 U+0093 SET TRANSMIT STATE
=94 U+0094 CANCEL CHARACTER
=95 U+0095 MESSAGE WAITING
=96 U+0096 START OF GUARDED AREA
=97 U+0097 END OF GUARDED AREA
=98 U+0098 START OF STRING
=99 U+0099 SINGLE GRAPHIC CHARACTER INTRODUCER
=9A U+009A SINGLE CHARACTER INTRODUCER
=9B U+009B CONTROL SEQUENCE INTRODUCER
=9C U+009C STRING TERMINATOR
=9D U+009D OPERATING SYSTEM COMMAND
=9E U+009E PRIVACY MESSAGE
=9F U+009F APPLICATION PROGRAM COMMAND

I don't think those are going to be implemented; however, I'd eventually like to take a look at the wording of 8859 regarding the range. Marking status.
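As a rough check of the mapping listed above, the following sketch uses Python's standard `unicodedata` module to confirm that a strict ISO-8859-1 decode sends bytes 0x80-0x9F to code points that Unicode classes as control characters (category `Cc`). The helper `c1_offsets` is a hypothetical name, included only to show how a page using this byte range could be detected.

```python
import unicodedata


def c1_offsets(data: bytes) -> list:
    """Return the offsets of bytes that fall in the 0x80-0x9F range.

    Hypothetical helper for illustration only.
    """
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


# A strict ISO-8859-1 decode maps byte 0xNN straight to U+00NN.
decoded = bytes(range(0x80, 0xA0)).decode("iso-8859-1")

# Every resulting code point is in Unicode general category "Cc" (control).
all_controls = all(unicodedata.category(ch) == "Cc" for ch in decoded)

# Offsets of the two disputed bytes in a short sample.
offsets = c1_offsets(b"a\x93b\x94")
```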
Whiteboard: jmd-remind
I don't have the ISO 8859 standard document either, so my understanding of this is also based on various indirect sources. As I understand it, the C1 control set is an application-level matter. ISO 8859 intentionally leaves those code points unassigned so that applications which use the ISO 6429 C1 control set remain possible. That is to say, C1 control code points (0x80 to 0x9f) should not live beyond their application's scope. They are meaningless in the context of general information exchange, especially in an HTML document.
Attachment #49177 - Attachment mime type: text/html → text/html; charset=iso-8859-1