Replace ISO-8859-9 (latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec

RESOLVED FIXED in mozilla12

Status

Core
Internationalization
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: GPHemsley, Assigned: emk)

Tracking

unspecified
mozilla12

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

6 years ago
According to the recently-spun-off Encoding Standard [1], Gecko does not currently support the full list of aliases for the windows-1254 encoding, which are as follows:
"csisolatin5", "iso-8859-9", "iso-ir-148", "l5", "latin5", and "windows-1254". 

It is noted in [1] that these aliases should already be supported per the HTML(5| Living) Standard.

For the most recent version of the Encoding Standard, see [2].

I don't know the implementation details of such a thing, but this seems to me to be a candidate Good First Bug.

[1] http://dvcs.w3.org/hg/encoding/raw-file/8cafea8b65f9/Overview.html#windows-1254
[2] http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#windows-1254
> I don't know the implementation details of such a thing

You add the relevant entries to intl/locale/src/charsetalias.properties
(Reporter)

Comment 2

6 years ago
Oh, perhaps the problem here is that they're all mapped to ISO-8859-9 instead of windows-1254:
http://hg.mozilla.org/mozilla-central/file/ed47a41ba26a/intl/locale/src/charsetalias.properties#l270

   #
   # Aliases for ISO-8859-9
   #
   latin5=ISO-8859-9
   iso_8859-9=ISO-8859-9
   # Currently .properties cannot handle : in key
   #iso_8859-9:1989=ISO-8859-9
   iso-ir-148=ISO-8859-9
   l5=ISO-8859-9
   csisolatin5=ISO-8859-9
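
For readers unfamiliar with the alias file, here is a minimal Python sketch of how a label could be resolved through such a table. It only illustrates the `key=value` format quoted above; `parse_aliases` and `resolve_charset` are hypothetical helpers, not Gecko functions, and the real lookup code may behave differently.

```python
# Text copied from the charsetalias.properties excerpt quoted above.
PROPERTIES_TEXT = """\
#
# Aliases for ISO-8859-9
#
latin5=ISO-8859-9
iso_8859-9=ISO-8859-9
# Currently .properties cannot handle : in key
#iso_8859-9:1989=ISO-8859-9
iso-ir-148=ISO-8859-9
l5=ISO-8859-9
csisolatin5=ISO-8859-9
"""


def parse_aliases(text):
    """Parse key=value lines, skipping blank lines and # comments (hypothetical helper)."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        aliases[key.strip().lower()] = value.strip()
    return aliases


ALIASES = parse_aliases(PROPERTIES_TEXT)


def resolve_charset(label):
    """Return the canonical name for a label, or None if the label is unknown."""
    return ALIASES.get(label.strip().lower())
```

With this table, `resolve_charset("Latin5")` yields `ISO-8859-9`, which is the behavior this comment points at: the labels are known, but they resolve to ISO-8859-9 rather than windows-1254.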
(Assignee)

Comment 3

6 years ago
What label should be used on sending?
For example, we use ISO-8859-1 instead of windows-1252 unless the text contains windows-1252 specific characters.
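
A rough Python sketch of the label selection described here: try the narrower encoding first and fall back to the wider one only when the text needs it. `pick_label` is a hypothetical name, the UTF-8 fallback is an assumption added to make the example total, and Python's codecs stand in for Gecko's converters.

```python
def pick_label(text, candidates=(("ISO-8859-1", "latin-1"),
                                 ("windows-1252", "cp1252"))):
    """Return (label, encoded bytes) for the first candidate that can encode text.

    Hypothetical helper; the candidate order mirrors the comment above:
    prefer ISO-8859-1 and use windows-1252 only when required.
    """
    for label, codec in candidates:
        try:
            return label, text.encode(codec)
        except UnicodeEncodeError:
            continue
    # Assumption for this sketch: fall back to UTF-8 when nothing else fits.
    return "UTF-8", text.encode("utf-8")
```

The same pattern would apply to the ISO-8859-9 / windows-1254 pair by passing `(("ISO-8859-9", "iso8859_9"), ("windows-1254", "cp1254"))` as `candidates`.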

Comment 4

6 years ago
Is that difference required? It seems better to always use windows-1252 and windows-1254. Pretty sure that is how it works in Opera.
(Assignee)

Comment 5

6 years ago
Our charset converter is not only for the Web browser.
It will violate RFCs to always use windows-1252/1254 in mail messages.

Comment 6

6 years ago
How exactly would that violate those RFCs? The sender is in charge of picking the encoding, no?
(Assignee)

Comment 7

6 years ago
Created attachment 584157 [details]
Encoding label selection test

At least IE9, Chrome for Win, Safari for Win, and Firefox Nightly do not always use windows-1252/1254. I know Opera is the dominant browser in the world, but I don't think it's a good idea to change all other browsers to align with Opera.
(This test doesn't work with Opera. Is there a way to detect the internal encoding name on Opera?)
(Assignee)

Updated

6 years ago
Attachment #584157 - Attachment is patch: false
Attachment #584157 - Attachment mime type: text/plain → text/html
(Assignee)

Comment 8

6 years ago
Our behavior is consistent with IE9 and "correct" per IANA registry. Although the Encoding Standard can override any other standards by using the magic word "willful violation", it needs a good reason.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX

Comment 9

6 years ago
When it comes to decoding the actual octets Gecko is the only browser that decodes iso-8859-9 per iso-8859-9 rather than windows-1254. It may be that implementations do not do correct label reporting, but that does not mean they decode it differently.
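
The decoding difference can be illustrated with Python's built-in `iso8859_9` and `cp1254` codecs, which are used here as stand-ins for the browser decoders (an assumption of this sketch). The two tables agree everywhere except the 0x80 to 0x9F range, where ISO-8859-9 yields C1 control characters and windows-1254 yields printable characters such as the euro sign.

```python
def differing_bytes():
    """List the byte values that the two codecs decode differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        iso = raw.decode("iso8859_9")
        # Python's cp1254 leaves a few bytes undefined, so map those to U+FFFD.
        win = raw.decode("cp1254", errors="replace")
        if iso != win:
            diffs.append(b)
    return diffs


sample = bytes([0x80, 0x93, 0xD0, 0xFD])
as_iso = sample.decode("iso8859_9")  # C1 controls followed by Turkish letters
as_win = sample.decode("cp1254")     # euro sign, left quote, Turkish letters
```

In this sample the Turkish letters decode identically under both labels; only the first two bytes differ.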

I was just saying that it seems simpler to always use windows-1252/windows-1254. If there are good reasons not to do that, we can certainly change things around, but the original bug as filed still seems accurate.

Updated

6 years ago
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(Assignee)

Comment 10

6 years ago
We can replace the ISO-8859-9 decoder's mapping table with the windows-1254 one, as we did for the ISO-8859-11 decoder.
However, we will not simply turn ISO-8859-9 into an alias of windows-1254. That would affect our mail message headers, and some recipients may not handle the label windows-1254 even if they can handle "incorrectly" labeled ISO-8859-9 messages that actually contain windows-1254 specific characters.
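
A minimal Python sketch of the asymmetry proposed here: the ISO-8859-9 label decodes with the windows-1254 table, but stays a separate label with its own encoder. The table names and both functions are hypothetical, Python's codecs stand in for Gecko's converters, and the strict ISO-8859-9 encoder is an assumption of this sketch rather than something the patch is confirmed to do.

```python
# Hypothetical tables: decoding is shared, encoding is kept separate per label.
DECODERS = {"iso-8859-9": "cp1254", "windows-1254": "cp1254"}
ENCODERS = {"iso-8859-9": "iso8859_9", "windows-1254": "cp1254"}


def decode(label, data):
    """Decode bytes; both labels share the windows-1254 mapping table."""
    return data.decode(DECODERS[label.lower()], errors="replace")


def encode(label, text):
    """Encode text; each label keeps its own encoder (assumed strict here)."""
    return text.encode(ENCODERS[label.lower()], errors="replace")
```

Under these assumptions, byte 0x80 labeled ISO-8859-9 decodes to the euro sign, while encoding the euro sign under the ISO-8859-9 label does not produce 0x80, so outgoing labels keep their original meaning.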
(Assignee)

Comment 11

6 years ago
Created attachment 584182 [details]
Compare iso-8859-9 encoder vs. windows-1254 encoder

IE's iso-8859-9 and windows-1254 encoders actually differ, while both decoders are the same.
(Assignee)

Updated

6 years ago
Summary: Support aliases for windows-1254 encoding (latin5, iso-8859-9, etc.) → Replace iso-8859-9 (, latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec
(Assignee)

Comment 12

6 years ago
Created attachment 584185 [details]
Compare iso-8859-9 encoder vs. windows-1254 encoder

Sorry, the previous test had a bug. (But the difference is still present)
Attachment #584182 - Attachment is obsolete: true
(Assignee)

Comment 13

6 years ago
Created attachment 584194 [details] [diff] [review]
patch
Assignee: smontagu → VYV03354
Status: REOPENED → ASSIGNED
Attachment #584194 - Flags: review?(smontagu)
(Assignee)

Comment 14

6 years ago
I think the Encoding Standard should have an explanation about these "asymmetric" encodings so that browser vendors do not have to reverse engineer to find the trick.
(Assignee)

Comment 15

6 years ago
https://tbpl.mozilla.org/?tree=Try&rev=8d8012438396

Comment 16

6 years ago
If there is agreement that we should do this for iso-8859-9 / windows-1254 it should certainly reflect that. I'm not convinced this is an actual problem for email clients though. And as far as browsers go Opera and Chrome both use the iso-8859-9 labels to mean windows-1254. However, please file a bug on the Encoding Standard so it can be considered. There's a pointer at the top of the document.
(Assignee)

Comment 17

6 years ago
Filed W3C bug 15332.
(Reporter)

Comment 18

6 years ago
I'm not sure how this bug got so cranky so quickly, but I ask that we all please Assume Good Faith here.[1] I'm fairly certain that Anne's goal is to create an interoperable standard for encodings, not impose Opera's methods on everyone else.

From the point of view of someone not familiar with the inner workings of all these things, it is not clear to me whether the question of "which RFCs would this change violate?" has been answered, nor which IANA registry we are discussing. Also, I was not aware that IANA registries themselves could articulate rules—aren't they just databases of information that are relevant to certain RFCs?

Also, I should note that this new Encoding Standard is barely two weeks old. There is no reason to criticize its contents just yet. File bugs and participate in discussion first.

One final thought: Anne has collected a lot of data about how browsers handle these various encodings.[2][3] They might be worth a look.

[1] http://en.wikipedia.org/wiki/Wikipedia:Assume_good_faith
[2] http://dvcs.w3.org/hg/encoding/raw-file/8cafea8b65f9/single-octet-research.html
[3] http://lists.w3.org/Archives/Public/www-archive/2011Dec/att-0021/encoding-labels.html
Status: ASSIGNED → NEW
Summary: Replace iso-8859-9 (, latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec → Replace ISO-8859-9 (latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec
(Reporter)

Updated

6 years ago
Status: NEW → ASSIGNED
Comment on attachment 584194 [details] [diff] [review]
patch

Review of attachment 584194 [details] [diff] [review]:
-----------------------------------------------------------------

This is consistent with what we do with other encodings (e.g. EUC-JP[1] and Big5[2]). It doesn't conform with what the HTML5 spec currently says wrt "misinterpreting encodings for compatibility", which expects the misinterpretation to be symmetric. Will https://www.w3.org/Bugs/Public/show_bug.cgi?id=15332 get backported to HTML5?

[1]Bug 600715
[2]Bug 310299
Attachment #584194 - Flags: review?(smontagu) → review+
(Assignee)

Comment 20

6 years ago
Created attachment 584256 [details] [diff] [review]
patch for check in. r=smontagu
Attachment #584194 - Attachment is obsolete: true
Attachment #584256 - Flags: review+
(Assignee)

Updated

6 years ago
Keywords: checkin-needed
http://hg.mozilla.org/integration/mozilla-inbound/rev/4fb24658d1f2
Keywords: checkin-needed
Target Milestone: --- → mozilla12
https://hg.mozilla.org/mozilla-central/rev/4fb24658d1f2
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED