Bug 712876
Opened 13 years ago
Closed 13 years ago
Replace ISO-8859-9 (latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec
Categories
(Core :: Internationalization, defect)
Tracking
RESOLVED
FIXED
mozilla12
People
(Reporter: GPHemsley, Assigned: emk)
Attachments
(3 files, 2 obsolete files)
According to the recently-spun-off Encoding Standard [1], Gecko does not currently support the full list of aliases for the windows-1254 encoding, which are as follows:
"csisolatin5", "iso-8859-9", "iso-ir-148", "l5", "latin5", and "windows-1254".
It is noted in [1] that these aliases should already be supported per the HTML(5| Living) Standard.
For the most recent version of the Encoding Standard, see [2].
I don't know the implementation details of such a thing, but this seems to me to be a candidate Good First Bug.
[1] http://dvcs.w3.org/hg/encoding/raw-file/8cafea8b65f9/Overview.html#windows-1254
[2] http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#windows-1254
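For illustration, the label list quoted above can be read as a lookup table that sends every alias to one decoder. This is a minimal Python sketch, not Gecko code: the names `WINDOWS_1254_LABELS` and `resolve_label` are hypothetical, the set holds only the six labels quoted in this report, and case-insensitive matching with surrounding whitespace stripped is an assumption.

```python
# Hypothetical helper for illustration; not Gecko's implementation.
# Only the six labels quoted in this bug report are listed.
WINDOWS_1254_LABELS = {
    "csisolatin5",
    "iso-8859-9",
    "iso-ir-148",
    "l5",
    "latin5",
    "windows-1254",
}


def resolve_label(label):
    """Return 'windows-1254' if label is one of its aliases, else None."""
    # Assumption: labels match case-insensitively, ignoring outer whitespace.
    normalized = label.strip().lower()
    return "windows-1254" if normalized in WINDOWS_1254_LABELS else None
```

Under these assumptions, `resolve_label("Latin5")` and `resolve_label("ISO-8859-9")` both resolve to `"windows-1254"`, while an unlisted label such as `"utf-8"` yields `None`.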
Comment 1•13 years ago
> I don't know the implementation details of such a thing
You add the relevant entries to intl/locale/src/charsetalias.properties
Reporter | ||
Comment 2•13 years ago
Oh, perhaps the problem here is that they're all mapped to ISO-8859-9 instead of windows-1254:
http://hg.mozilla.org/mozilla-central/file/ed47a41ba26a/intl/locale/src/charsetalias.properties#l270
270 #
271 # Aliases for ISO-8859-9
272 #
273 latin5=ISO-8859-9
274 iso_8859-9=ISO-8859-9
275 # Currently .properties cannot handle : in key
276 #iso_8859-9:1989=ISO-8859-9
277 iso-ir-148=ISO-8859-9
278 l5=ISO-8859-9
279 csisolatin5=ISO-8859-9
Assignee | ||
Comment 3•13 years ago
What label should be used on sending?
For example, we use ISO-8859-1 instead of windows-1252 unless the text contains windows-1252-specific characters.
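The sending rule described here (prefer the ISO label, fall back to the windows label only when needed) could be sketched roughly as follows. This is an assumption-laden illustration in Python, not the actual Gecko encoder logic; `choose_label` is a hypothetical name, and Python's `iso8859_1` and `cp1252` codecs stand in for the real converters.

```python
def choose_label(text):
    """Pick the first label whose codec can represent the whole text.

    Hypothetical sketch: tries ISO-8859-1 first and falls back to
    windows-1252 only if the text needs characters outside ISO-8859-1.
    Returns None if neither encoding can represent the text.
    """
    candidates = (
        ("ISO-8859-1", "iso8859_1"),
        ("windows-1252", "cp1252"),
    )
    for label, codec in candidates:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # this codec cannot represent the text; try the next
        return label
    return None
```

With this sketch, plain Western European text keeps the ISO-8859-1 label, and text containing a character such as the euro sign is labeled windows-1252.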
Comment 4•13 years ago
Is that difference required? It seems better to always use windows-1252 and windows-1254. Pretty sure that is how it works in Opera.
Assignee | ||
Comment 5•13 years ago
Our charset converter is not only for the Web browser.
It will violate RFCs to always use windows-1252/1254 in mail messages.
Comment 6•13 years ago
How exactly would that violate those RFCs? The sender is in charge of picking the encoding, no?
Assignee | ||
Comment 7•13 years ago
At least IE9, Chrome for Win, Safari for Win, and Firefox Nightly do not always use windows-1252/1254. I know Opera is the dominant browser in the world, but I don't think it's a good idea to change all other browsers to align with Opera.
(This test doesn't work with Opera. Is there a way to detect the internal encoding name on Opera?)
Assignee | ||
Updated•13 years ago
Attachment #584157 -
Attachment is patch: false
Attachment #584157 -
Attachment mime type: text/plain → text/html
Assignee | ||
Comment 8•13 years ago
Our behavior is consistent with IE9 and "correct" per IANA registry. Although the Encoding Standard can override any other standards by using the magic word "willful violation", it needs a good reason.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Comment 9•13 years ago
When it comes to decoding the actual octets Gecko is the only browser that decodes iso-8859-9 per iso-8859-9 rather than windows-1254. It may be that implementations do not do correct label reporting, but that does not mean they decode it differently.
I was just saying that it seems simpler to always use windows-1252/windows-1254 but if there are good reasons not to do that we can certainly change things around, but the original bug as filed still seems accurate.
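To make the decoding difference concrete: the two encodings agree on most octets and differ in the 0x80 to 0x9F range, where ISO-8859-9 has C1 control characters and windows-1254 has printable characters. The Python sketch below uses Python's own `iso8859_9` and `cp1254` codecs as stand-ins; it says nothing about how any browser implements its tables, and the sample bytes are chosen purely for illustration.

```python
# Sample octets: 0x80, 0x93 and 0x94 fall in the range where the two
# encodings differ; 0xDD is a Turkish letter that both encodings share.
data = bytes([0x80, 0x93, 0xDD, 0x94])

# Strict ISO-8859-9: 0x80-0x9F decode to C1 control characters.
as_iso = data.decode("iso8859_9")

# windows-1254: the same octets decode to printable characters.
as_win = data.decode("cp1254")

# Print the code points side by side for comparison.
for octet, iso_char, win_char in zip(data, as_iso, as_win):
    print(f"0x{octet:02X}  iso-8859-9: U+{ord(iso_char):04X}  "
          f"windows-1254: U+{ord(win_char):04X}")
```

A page labeled iso-8859-9 that contains curly quotes or a euro sign authored on Windows therefore only renders as intended when the windows-1254 table is used for decoding.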
Updated•13 years ago
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Assignee | ||
Comment 10•13 years ago
We can replace the ISO-8859-9 decoder's mapping table with the windows-1254 one, as we did for the ISO-8859-11 decoder.
However, we will not simply turn ISO-8859-9 into an alias of windows-1254. That would affect our mail message headers, and some recipients may not handle the label windows-1254 even if they can handle "incorrectly" labeled ISO-8859-9 messages that actually contain windows-1254-specific characters.
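The proposal amounts to an "asymmetric" encoding: lenient on input, unchanged on output. A rough Python sketch of that shape follows. The class name `AsymmetricLatin5` is hypothetical, Python's `cp1254` and `iso8859_9` codecs stand in for Gecko's converters, and the assumption that the encoder stays strictly ISO-8859-9 is mine for the sake of the example, not something the patch is confirmed to do.

```python
class AsymmetricLatin5:
    """Hypothetical sketch of an asymmetric ISO-8859-9 converter."""

    # The label used on outgoing content stays ISO-8859-9.
    label = "ISO-8859-9"

    @staticmethod
    def decode(data):
        # Decoding uses the windows-1254 table, so octets in 0x80-0x9F
        # become printable characters instead of C1 controls.
        return data.decode("cp1254")

    @staticmethod
    def encode(text):
        # Assumption: encoding remains strict ISO-8859-9, so text with
        # windows-1254-only characters raises UnicodeEncodeError.
        return text.encode("iso8859_9")
```

In this sketch, incoming mislabeled content decodes as windows-1254, while outgoing content that carries the ISO-8859-9 label never contains windows-1254-specific octets.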
Assignee | ||
Comment 11•13 years ago
IE's iso-8859-9 encoder actually differs from its windows-1254 encoder, while the two decoders are the same.
Assignee | ||
Updated•13 years ago
Summary: Support aliases for windows-1254 encoding (latin5, iso-8859-9, etc.) → Replace iso-8859-9 (, latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec
Assignee | ||
Comment 12•13 years ago
Sorry, the previous test had a bug. (But the difference is still present)
Attachment #584182 -
Attachment is obsolete: true
Assignee | ||
Comment 13•13 years ago
Assignee: smontagu → VYV03354
Status: REOPENED → ASSIGNED
Attachment #584194 -
Flags: review?(smontagu)
Assignee | ||
Comment 14•13 years ago
I think the Encoding Standard should have an explanation about these "asymmetric" encodings so that browser vendors do not have to reverse engineer to find the trick.
Assignee | ||
Comment 15•13 years ago
Comment 16•13 years ago
If there is agreement that we should do this for iso-8859-9 / windows-1254 it should certainly reflect that. I'm not convinced this is an actual problem for email clients though. And as far as browsers go Opera and Chrome both use the iso-8859-9 labels to mean windows-1254. However, please file a bug on the Encoding Standard so it can be considered. There's a pointer at the top of the document.
Assignee | ||
Comment 17•13 years ago
Filed W3C bug 15332.
Reporter | ||
Comment 18•13 years ago
I'm not sure how this bug got so cranky so quickly, but I ask that we all please Assume Good Faith here.[1] I'm fairly certain that Anne's goal is to create an interoperable standard for encodings, not impose Opera's methods on everyone else.
From the point of view of someone not familiar with the inner workings of all these things, it is not clear to me whether the question of "which RFCs would this change violate?" has been answered, nor which IANA registry we are discussing. Also, I was not aware that IANA registries themselves could articulate rules; aren't they just databases of information that are relevant to certain RFCs?
Also, I should note that this new Encoding Standard is barely two weeks old. There is no reason to criticize its contents just yet. File bugs and participate in discussion first.
One final thought: Anne has collected a lot of data about how browsers handle these various encodings.[2][3] They might be worth a look.
[1] http://en.wikipedia.org/wiki/Wikipedia:Assume_good_faith
[2] http://dvcs.w3.org/hg/encoding/raw-file/8cafea8b65f9/single-octet-research.html
[3] http://lists.w3.org/Archives/Public/www-archive/2011Dec/att-0021/encoding-labels.html
Status: ASSIGNED → NEW
Summary: Replace iso-8859-9 (, latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec → Replace ISO-8859-9 (latin5, etc.) decoder with windows-1254 decoder per HTML5/Encoding spec
Reporter | ||
Updated•13 years ago
Status: NEW → ASSIGNED
Comment 19•13 years ago
Comment on attachment 584194 [details] [diff] [review]
patch
Review of attachment 584194 [details] [diff] [review]:
-----------------------------------------------------------------
This is consistent with what we do with other encodings (e.g. EUC-JP[1] and Big5[2]). It doesn't conform with what the HTML5 spec currently says wrt "misinterpreting encodings for compatibility", which expects the misinterpretation to be symmetric. Will https://www.w3.org/Bugs/Public/show_bug.cgi?id=15332 get backported to HTML5?
[1]Bug 600715
[2]Bug 310299
Attachment #584194 -
Flags: review?(smontagu) → review+
Assignee | ||
Comment 20•13 years ago
Attachment #584194 -
Attachment is obsolete: true
Attachment #584256 -
Flags: review+
Assignee | ||
Updated•13 years ago
Keywords: checkin-needed
Comment 21•13 years ago
Keywords: checkin-needed
Target Milestone: --- → mozilla12
Comment 22•13 years ago
Status: ASSIGNED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED