Mr. Xianping Ge reported a Unicode-to-GBK converter bug to me over several emails. I am relaying them here to the Bugzilla database.

================================ Msg 2 =====================

In message <7DAA70BEB463D211AC3E00A0C96B7AB20359EFEA@orsmsx41.jf.intel.com>, "Xu, Yueheng" writes:

> Dear Mr. Ge,
>
> Thank you for your time spent on this. Did you test your local
> build on both Windows and Linux platforms?

I tested it on Linux. I do not have a Windows box.

> I have not touched that code for months since the last check-in, and now
> I am busy with other things. Let's test the changes thoroughly offline
> before we check them in. Do you have a test page that contains characters
> that are in GBK but not in GB2312?
>
> Can you send me those test files (or put them on a publicly accessible web
> server) so I can verify the problems you mentioned and also verify that
> any changes we make are effective?

http://www.ics.uci.edu/~xge/clinux/gbk-test-files/
    Tao-Hua-Yuan.gbk.html
    zhu-rong-ji.gbk.html
    xcin.gbk.html

Talk to you later.
--
Xianping
firstname.lastname@example.org

=============================== Msg 1 =====================

>> The problem is that Mozilla does not load my "gb13000.1993-1" font
>> for rendering GBK text. It seems to get the glyphs from GB2312, Japanese,
>> etc., and cannot display the character 'Rong(2)' in 'Premier Zhu Rongji'.
>> Any suggestions?
>
> As for GBK support in Mozilla, it is already working; you just need to set
> the charset to x-gbk in your HTML page's meta tag and it will work.
> If it is not working, blame Frank Tang ( email@example.com ). See the
> email paragraph below.
>
> GB2312V2 - for now "windows-936"
> HZ - "HZ-GB-2312"
> GBK - "x-gbk"

I spent last night figuring this out. I removed all other Han fonts (GB, Big5, Japanese, etc.) and Mozilla loaded my "gb13000.1993-1" font, but it still had problems correctly rendering a GBK (or even a simple GB2312) file.
I looked into some of your files (under mozilla/intl/uconv/ucvcn) and found some typos. Attached is the patch (based on M11) for your consideration; can you merge it into the main CVS if it's OK?

Here is a short description of the patches:

1. Associate the X charset encoding "gb13000.1993-1" with the MIME type "x-gbk":
    gfx/src/gtk/nsFontMetricsGTK.cpp
    gfx/src/xlib/nsFontMetricsXlib.cpp

2. Add a menu item for GBK (close to the menu item for GB):
    editor/ui/composer/content/editorOverlay.xul
    editor/ui/composer/locale/en-US/editorOverlay.dtd
    mailnews/base/resources/locale/en-US/messenger.dtd
    mailnews/compose/resources/content/messengercompose.xul
    mailnews/compose/resources/locale/en-US/messengercompose.dtd
    xpfe/browser/resources/content/navigatorOverlay.xul
    xpfe/browser/resources/locale/en-US/navigator.dtd

3. Associate the Linux locale zh_CN.GBK with "x-gbk":
    intl/uconv/src/unixcharset.properties

4. Some typos and bugs corrected. (New bugs introduced? :-)
    intl/uconv/ucvcn/nsGB2312ToUnicodeV2.cpp
    intl/uconv/ucvcn/nsGBKToUnicode.cpp
    intl/uconv/ucvcn/nsUnicodeToGB2312V2.cpp
    intl/uconv/ucvcn/nsUnicodeToGBK.cpp
    intl/uconv/ucvcn/nsUnicodeToHZ.cpp

The "corrections" for these files:

- Change 0x41 to 0x40 for the starting value of the GBK right byte.
- Change the "row size" from (0x00FE - 0x0080) to 0x00BF (== 0xFE - 0x3F).
- Remove conflicts between the variables named i in the outer and inner blocks.
- The results of (i / 0x00BF + 0x0081) and (i % 0x00BF + 0x0040) should not be (or need not be) OR'ed with 0x80. For GBK, OR'ing the right byte may be wrong, e.g. when it lies between 0x40 and 0x80.
- Big-endian problem. The first byte of a Uint16 may not be the least significant byte on a big-endian machine:

    + #if 0 // This will run into trouble for big-endian machines
        pSrcDBCode = (DByte *)pSrc;
        *aDest = pSrcDBCode->leftbyte;
    + #else
    +   *aDest = (unsigned char)(*pSrc);
    + #endif

- Struct packing. In some places, you alias (using a pointer) the two bytes in a char array by a DByte struct.
The two bytes in a struct may or may not be packed tightly; there might be a hole if the two bytes are 4-byte aligned. This does not seem to be a problem with GCC, but personally I think the alternative (working directly with aDest[0] = 'x', aDest[1] = 'x') is simpler and more robust. Five years ago I wrote some C code on SCO Unix, and it took me a long time to find a bug introduced by struct packing/padding.
--
Xianping
firstname.lastname@example.org
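For readers following the corrections above, here is a minimal sketch of the index arithmetic and of writing the two bytes without aliasing a struct. It is not the code from the patch: the helper names (IndexToGbk, PutGbk, GbkBytes) are hypothetical, and the assumption that a 16-bit code holds the left byte in its high 8 bits is made only for illustration. The constants 0x00BF, 0x0081 and 0x0040 are the ones quoted in the description.

```cpp
#include <cassert>
#include <cstdint>

// Constants quoted in the patch description.
constexpr unsigned kRowSize   = 0x00BF;  // 0xFE - 0x3F: right bytes 0x40..0xFE
constexpr unsigned kLeftBase  = 0x0081;  // first GBK left (lead) byte
constexpr unsigned kRightBase = 0x0040;  // first GBK right (trail) byte

// Hypothetical holder for the result; used by value, never aliased
// over a char buffer, so padding inside it does not matter.
struct GbkBytes {
  unsigned char left;
  unsigned char right;
};

// Hypothetical helper: map a linear table index to a GBK byte pair.
// No "| 0x80" is applied; right bytes below 0x80 are valid in GBK.
GbkBytes IndexToGbk(unsigned i) {
  GbkBytes b;
  b.left  = static_cast<unsigned char>(i / kRowSize + kLeftBase);
  b.right = static_cast<unsigned char>(i % kRowSize + kRightBase);
  return b;
}

// Hypothetical helper: store a 16-bit code into a byte buffer.
// ASSUMPTION: the code is laid out as (left << 8) | right.
// Shifts act on the value, so the output is the same on little-endian
// and big-endian hosts, and no struct layout is involved.
void PutGbk(uint16_t code, unsigned char* aDest) {
  aDest[0] = static_cast<unsigned char>(code >> 8);
  aDest[1] = static_cast<unsigned char>(code & 0xFF);
}

// Small self-check of the row boundaries.
void SelfCheck() {
  assert(IndexToGbk(0).right == 0x40);
  assert(IndexToGbk(kRowSize - 1).right == 0xFE);
  assert(IndexToGbk(kRowSize).left == 0x82);
}
```

With a row size of 0x00BF, index 0 maps to 0x81 0x40, index 0xBE to 0x81 0xFE, and index 0xBF starts the next row at 0x82 0x40. The sketch treats the table as including the 0x7F column, as the row-size arithmetic in the description implies.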
Code is ready for check-in, pending review from Ftang@netscape.com.
Status: NEW → ASSIGNED
erik - Please review the change from email@example.com.

Created an attachment (id=6404)
nsUnicodeToGBK.diff - r=ftang

Created an attachment (id=6402)
nsFontMetricsXlib.diff <- erik, please review. I don't think this makes sense, since no code refers to GBK here.

Created an attachment (id=6401)
nsFontMetricsGTK.diff <- erik, please review. I think it is reasonable, but we should let erik approve this.
Can we mark this M16?
As far as I know, the Xlib version is not being maintained/developed any more, so there is no need to check in that fix. Also, I agree with Frank that it is incomplete anyway. The other one (for GTK) is fine.
Can you check this in by 5/16? If so, please mark this M16.
I will check the fix in this week.
Target Milestone: --- → M16
The fix was checked in last night, but a GBK font probably needs to be installed to test it. On the test page zhu-rong-ji.gbk.html listed below, a GBK-enabled browser should render Mr. Zhu Rong-Ji's name correctly; a GB2312-only browser will miss the character 'Rong'.

http://www.ics.uci.edu/~xge/clinux/gbk-test-files/
    Tao-Hua-Yuan.gbk.html
    zhu-rong-ji.gbk.html
    xcin.gbk.html
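To make the "in GBK but not in GB2312" distinction concrete when checking test pages, here is a rough sketch that classifies a two-byte sequence by byte range alone. The function names are hypothetical and none of this comes from the Mozilla converters. It assumes the usual two-byte layouts: GBK left byte 0x81..0xFE with right byte 0x40..0xFE except 0x7F, and GB2312 (as EUC-CN) left byte 0xA1..0xF7 with right byte 0xA1..0xFE. It does not check whether a code point is actually assigned.

```cpp
#include <cassert>

// Range check only: is (left, right) a well-formed GBK two-byte code?
bool IsGbkPair(unsigned char left, unsigned char right) {
  return left >= 0x81 && left <= 0xFE &&
         right >= 0x40 && right <= 0xFE && right != 0x7F;
}

// Range check only: does (left, right) fall inside the GB2312 area
// as encoded in EUC-CN? Unassigned rows are not excluded here.
bool IsGb2312Pair(unsigned char left, unsigned char right) {
  return left >= 0xA1 && left <= 0xF7 &&
         right >= 0xA1 && right <= 0xFE;
}

// True for pairs that a GBK converter accepts but that lie outside
// the GB2312 byte ranges, e.g. any pair whose right byte is < 0xA1.
bool IsGbkOnlyPair(unsigned char left, unsigned char right) {
  return IsGbkPair(left, right) && !IsGb2312Pair(left, right);
}

// Small self-check on the range boundaries.
void SelfCheck() {
  assert(IsGbkOnlyPair(0x81, 0x40));   // first GBK code
  assert(!IsGbkOnlyPair(0xA1, 0xA1));  // first GB2312 code
  assert(!IsGbkPair(0x81, 0x7F));      // 0x7F is never a right byte
}
```

A pair with a right byte in 0x40..0xA0 is the kind of input that a converter OR'ing the right byte with 0x80 would mangle, which is why such characters make useful test cases.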
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
I verified this in 2000-05-31-08 build.
Status: RESOLVED → VERIFIED