Closed Bug 844082 Opened 7 years ago Closed 7 years ago

Simplified Chinese should use GBK as the fallback encoding instead of GB18030

Categories

(Mozilla Localizations :: zh-CN / Chinese (Simplified), defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hsivonen, Assigned: hsivonen)

Details

Attachments

(1 file)

No description provided.
Oops. I pressed enter too soon.

Currently, the Simplified Chinese localization sets GB18030 as the fallback encoding for unlabeled legacy content. Since GBK has been around for longer than GB18030, one would expect users to encounter more unlabeled GBK content than unlabeled GB18030 content.

I haven't had a chance to verify what the fallback encoding is in Internet Explorer on a Simplified Chinese Windows system, but if the fallback in IE is equivalent to our GBK rather than our GB18030, we should change our fallback to GBK.
Note that GB2312 will go away once we implement the Encoding Standard. A GBK decoder decodes GB2312 content.
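To illustrate the claim that a GBK decoder decodes GB2312 content, here is a minimal sketch using Python's built-in codecs. Python's `gb2312` and `gbk` codecs are stand-ins, not Gecko's decoders, and the helper names `can_encode` and `decodes_same` are made up for this example.

```python
def can_encode(text, encoding):
    """Return True if every character in `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def decodes_same(data, encoding_a, encoding_b):
    """Return True if `data` decodes to the same string under both encodings."""
    return data.decode(encoding_a) == data.decode(encoding_b)


# Simplified Chinese text that exists in GB2312.
sample = "\u4e2d\u6587"  # "中文"
gb2312_bytes = sample.encode("gb2312")

# The same bytes decode identically with the GBK codec.
print(decodes_same(gb2312_bytes, "gb2312", "gbk"))

# A traditional character (U+9F8D) is in GBK but not in GB2312,
# so GBK is the larger of the two repertoires.
print(can_encode("\u9f8d", "gb2312"), can_encode("\u9f8d", "gbk"))
```

This only spot-checks a couple of characters; it does not prove the relationship for the whole code space.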
Assignee: shaohua.wen → hsivonen
Status: NEW → ASSIGNED
How can it be tested?
(In reply to Shaohua Wen from comment #3)
> How can it be tested?

Try loading http://hsivonen.iki.fi/test/moz/check-charset.htm in IE on a Windows system that was installed with Simplified Chinese as the primary language and whose encoding settings the user has not reconfigured, and see which name gets displayed.
So for Firefox, the correct behavior should be to display GBK as the fallback charset, right?

I just tried with the latest Aurora and got this:
Check the fallback charset

Your fallback charset is: GB2312
(In reply to Shaohua Wen from comment #5)
> So for Firefox, the correct behavior should be to display GBK as the fallback
> charset, right?

Yes, if that's what IE does.

> I just tried with the latest Aurora and got this:
> Check the fallback charset
> 
> Your fallback charset is: GB2312

That's weird. The result doesn't match what's in the localization repository. If you check about:config, is the line intl.charset.default bold and its value set to GB2312?

Anyway, GBK is a superset of GB2312, and my understanding is that our GB2312 is already actually a GBK decoder. The plan is to make GB2312 an alias for GBK and just have one encoding internally.
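The aliasing plan can be sketched roughly as follows. This is a hypothetical illustration in Python, not Gecko code: the `ALIASES` table, `resolve_label`, and `pick_encoding` are invented names, and the table lists only the three labels discussed in this bug.

```python
# Hypothetical label table: "gb2312" is treated as an alias for "gbk",
# so only one decoder is needed internally for both labels.
ALIASES = {
    "gb2312": "gbk",
    "gbk": "gbk",
    "gb18030": "gb18030",
}


def resolve_label(label):
    """Map a charset label to its canonical encoding name, or None if unknown."""
    return ALIASES.get(label.strip().lower())


def pick_encoding(declared_label, fallback="gbk"):
    """Use the declared label when it is recognized; otherwise use the fallback.

    `fallback` plays the role of the locale's fallback encoding for
    unlabeled legacy content (the value of intl.charset.default).
    """
    if declared_label is not None:
        resolved = resolve_label(declared_label)
        if resolved is not None:
            return resolved
    return fallback


print(pick_encoding("GB2312"))                       # labeled content
print(pick_encoding(None))                           # unlabeled content
print(pick_encoding(None, fallback="windows-1252"))  # a different locale
```

With a table like this, a page labeled GB2312 and an unlabeled page in a zh-CN build would both end up on the same GBK decoder.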
Our GB2312 decoder is the same as the GB18030 decoder.
BTW, given that GB18030 is a superset of GBK, why is the GBK decoder needed in the first place? Only to align with what IE is doing?
That said, we will need a more restrictive *encoder* for compatibility, like EUC-JP and Big5.
If you think we can merge gb2312, gbk, and gb18030, I'm for it. Give me updated tables for the specification and I'll write it in.
(In reply to Masatoshi Kimura [:emk] from comment #7)
> Our GB2312 decoder is the same as the GB18030 decoder.
> BTW, given that GB18030 is a superset of GBK, why is the GBK decoder
> needed in the first place? Only to align with what IE is doing?

I thought GBK and GB18030 were mutually incompatible supersets of GB2312.

If GB18030 is a superset of GBK, that changes things.
Indeed, our current implementation has some subtle differences between our GBK decoder and our GB18030 decoder.
In Encoding Standard, however, the gb18030 decoder is just the gbk decoder with the gb18030 flag set.
http://encoding.spec.whatwg.org/#gb18030
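As a rough illustration of GB18030 being a superset of GBK, the following sketch uses Python's built-in `gbk` and `gb18030` codecs. These are neither Gecko's decoders nor the Encoding Standard's algorithm, and they can differ from both in edge cases, so treat this as a spot check only. The helper name `encodable` is made up for the example.

```python
def encodable(text, encoding):
    """Return True if `text` can be encoded in `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Two-byte GBK sequences decode to the same text under GB18030.
gbk_text = "\u9f8d\u4e2d\u6587"  # "龍中文"
gbk_bytes = gbk_text.encode("gbk")
print(gbk_bytes.decode("gb18030") == gbk_text)

# GB18030 adds four-byte sequences, which is how it reaches code points
# that GBK cannot represent, such as U+1F600 outside the BMP.
astral = "\U0001F600"
print(encodable(astral, "gbk"), encodable(astral, "gb18030"))
print(len(astral.encode("gb18030")))
```

Because the four-byte range exists only in GB18030, a single decoder with a flag for that range is one plausible way to serve both labels.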
I have checked in your patch and the new test result in aurora is OK:
Check the fallback charset

Your fallback charset is: gbk
What did the test say in IE?
IE8: 
Check the fallback charset
Your fallback charset is: windows-1252
(In reply to Shaohua Wen from comment #14)
> IE8: 
> Check the fallback charset
> Your fallback charset is: windows-1252

Is this on Simplified Chinese-localized Windows? If yes, that's ... surprising.
In my IE9 on Windows 7 and a clean IE10 on Windows 8 (both are zh-CN versions), they say:
Your fallback charset is: gb2312
My IE8 is not on a Simplified Chinese version of Windows.
(In reply to YF (Yang) from comment #16)
> Your fallback charset is: gb2312

Excellent. AFAICK, our GBK is the same thing as gb2312 in IE.

Marking FIXED per comment 12.

Thank you!
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED