Closed Bug 1227006 Opened 9 years ago Closed 7 years ago

APIs for getting encoding should return "GBK" for gbk encoding.

Categories

(Core :: Internationalization, defect)

40 Branch
defect
Not set
normal

RESOLVED FIXED
mozilla56

People

(Reporter: crimsteam, Assigned: hsivonen)

Details

(Whiteboard: [fixed by encoding_rs])

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0
Build ID: 20151029151421

Steps to reproduce:

Per DOM:
https://dom.spec.whatwg.org/#dom-document-characterset
the characterSet, charset and inputEncoding attributes should all return "GBK" for the gbk encoding.

All labels for the gbk encoding (https://encoding.spec.whatwg.org/#names-and-labels) produce these results in the following browsers:

Firefox
document.characterSet: gbk
document.charset: gbk
document.inputEncoding: gbk 

Chrome
document.characterSet: GBK
document.charset: GBK
document.inputEncoding: GBK

IE11
document.characterSet: gb2312
document.charset: gb2312
document.inputEncoding: GB2312

Opera (Presto)
document.characterSet: gbk
document.charset: undefined
document.inputEncoding: undefined
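
For anyone re-running the comparison above, here is a minimal sketch of how the three getters could be collected and checked in one go. The helper names `collectEncodingGetters` and `allGettersReturn` are hypothetical and not part of any browser API; only the three attribute names come from the report.

```javascript
// Hypothetical helper: read the three encoding-related attributes from a
// Document-like object. A missing attribute shows up as undefined, which
// matches what the Opera (Presto) results look like.
function collectEncodingGetters(doc) {
  return {
    characterSet: doc.characterSet,
    charset: doc.charset,
    inputEncoding: doc.inputEncoding,
  };
}

// Hypothetical helper: true only if all three getters return `expected`.
function allGettersReturn(doc, expected) {
  const values = collectEncodingGetters(doc);
  return Object.values(values).every((value) => value === expected);
}
```

In a browser console on a page served with any gbk label, `allGettersReturn(document, "GBK")` would be expected to be true once the behavior requested here is in place.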

Or maybe change the spec to output the original name "gbk"? All other encoding names are consistent between Firefox and Chrome.
This is a general intl thing, not a DOM thing.

We'd need to switch the lines in intl/locale/unix/unixcharset.properties to alias gbk to GBK, not vice versa, if we actually want to make this change.
Component: DOM → Internationalization
If I'm following https://www.w3.org/Bugs/Public/show_bug.cgi?id=27436 correctly, especially comment number 10, the right fix is to change the internal canonical name from "gbk" to "GBK" (and double check whether any other names are affected). Henri, is that correct?
Flags: needinfo?(hsivonen)
annevk, in https://www.w3.org/Bugs/Public/show_bug.cgi?id=27435#c13 you said you aligned with Blink without saying why. Why did you align with Blink instead of Gecko&Presto? (https://www.w3.org/Bugs/Public/show_bug.cgi?id=27436#c10 gives a potential reason, but your comment doesn't cite it.)

(In reply to Simon Montagu :smontagu from comment #2)
> If I'm following https://www.w3.org/Bugs/Public/show_bug.cgi?id=27436
> correctly, especially comment number 10, the right fix is to change the
> internal canonical name from "gbk" to "GBK" (and double check whether any
> other names are affected). Henri, is that correct?

Changing an internal name is more trouble than special-casing this in the getters for document.characterSet, document.charset and document.inputEncoding. I'd be OK with just doing that.
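
To illustrate the special-casing option, a minimal sketch follows. It is written in JavaScript for brevity rather than as Gecko code; `DOM_NAME_OVERRIDES` and `domEncodingName` are hypothetical names, and the assumption is that "gbk" is the only internal name that needs a different DOM-visible spelling.

```javascript
// Hypothetical table: internal canonical names whose DOM-visible spelling
// differs. Assumption: only "gbk" needs an override.
const DOM_NAME_OVERRIDES = new Map([["gbk", "GBK"]]);

// Hypothetical getter helper: translate the internal canonical name into the
// name reported by characterSet, charset and inputEncoding. Names without an
// override pass through unchanged.
function domEncodingName(internalName) {
  if (DOM_NAME_OVERRIDES.has(internalName)) {
    return DOM_NAME_OVERRIDES.get(internalName);
  }
  return internalName;
}
```

With this approach the internal name stays "gbk" everywhere else, so existing comparisons against the lowercase string keep working.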

However, if you want to keep the DOM case and the Gecko case consistent, the s/gbk/GBK/ substitution in C++/XUL/chrome JS needs to be thorough. Fortunately, there aren't too many places to change if https://mxr.mozilla.org/mozilla-central/search?string=%22gbk%22&case=on and https://mxr.mozilla.org/mozilla-central/search?string=gbk&case=1&find=\.properties%24&findi=\.xul%24&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central are any indication. In Firefox, the potentially gbk-valued pref goes through label resolution anyway (https://mxr.mozilla.org/mozilla-central/source/dom/encoding/FallbackEncoding.cpp#57) and bug 1177830 probably ends up handling the email prefs. The per-NNTP-server settings might break, though; I didn't check whether those go through label resolution.
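
As a rough illustration of why values that go through label resolution survive a rename of the canonical name, here is a sketch of label resolution restricted to the gbk labels listed in the Encoding Standard. It is not the Gecko implementation; `resolveGbkLabel` and `GBK_CANONICAL_NAME` are hypothetical names, and "GBK" is assumed to be the new canonical spelling.

```javascript
// Assumption: the canonical name after the rename discussed in this bug.
const GBK_CANONICAL_NAME = "GBK";

// Labels for the gbk encoding, as listed in the Encoding Standard.
const GBK_LABELS = new Set([
  "chinese", "csgb2312", "csiso58gb231280", "gb2312", "gb_2312",
  "gb_2312-80", "gbk", "iso-ir-58", "x-gbk",
]);

// Hypothetical resolver: strip leading/trailing ASCII whitespace, lowercase
// ASCII letters only, then look the label up. Returns null for labels that
// do not belong to gbk (other encodings are out of scope for this sketch).
function resolveGbkLabel(label) {
  const trimmed = label.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  const lowered = trimmed.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
  return GBK_LABELS.has(lowered) ? GBK_CANONICAL_NAME : null;
}
```

A stored value of "gbk" or "GBK" resolves to the same canonical name either way, whereas code that compares the stored string directly against "gbk" would stop matching after the rename.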
Flags: needinfo?(hsivonen) → needinfo?(annevk)
Given the commit message, that bug was considered, so I suspect that greater IANA compatibility was indeed the reason.
Flags: needinfo?(annevk)
Why was the compatibility name for gb18030 removed? It was present when compatibility names were added:
https://github.com/whatwg/dom/commit/03e170351f095e4fe749e0259a3aafc0cbb49c91
IANA registration is uppercase "GB18030".
It was removed in https://github.com/whatwg/dom/commit/dd172fa5f8c2fc82d0c66b7f9305fd59666c95ba. I'm guessing that, since all of Gecko/WebKit/Blink agreed, that made more sense.
I filed https://github.com/w3c/web-platform-tests/issues/2453 on the test Ms2ger mentioned.
OK. Let's update the canonical name for "gbk".
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: encoding_rs
Fixed by bug 1261841.
Assignee: nobody → hsivonen
Whiteboard: [fixed by encoding_rs]
Target Milestone: --- → mozilla56
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED