Closed
Bug 31380
Opened 25 years ago
Closed 24 years ago
Unicode to GBK converter not working for some GBK chars
Categories
(Core :: Internationalization, defect, P3)
Tracking
VERIFIED
FIXED
M16
People
(Reporter: yueheng.xu, Assigned: yueheng.xu)
Details
Attachments
(3 files)
857 bytes, patch
484 bytes, patch
1.88 KB, patch
Mr. Xianping Ge reported a Unicode to GBK converter bug to me in a few email communications. Here I relay them to the Bugzilla database.

================================ Msg 2 =====================

In message <7DAA70BEB463D211AC3E00A0C96B7AB20359EFEA@orsmsx41.jf.intel.com>, "Xu, Yueheng" writes:
> Dear Mr. Ge,
>
> Thank you for your time spent on this. Did you test your local
> build on both Windows and Linux platforms?

I tested it on Linux. I do not have a Windows box.

> I have not touched that code for months since the last check-in and now
> I am busy with other things. Let's test them thoroughly offline before
> we check them in. Do you have a test page that contains those characters
> that are in GBK but not in GB2312?
>
> Can you send me those test files (or put them on a publicly accessible web
> server) so I can verify the problems you mentioned and also verify that
> any changes we make are effective?

http://www.ics.uci.edu/~xge/clinux/gbk-test-files/
  Tao-Hua-Yuan.gbk.html
  zhu-rong-ji.gbk.html
  xcin.gbk.html

Talk to you later.
--
Xianping xge@ics.uci.edu

=============================== Msg 1 =====================

>> The problem is that Mozilla does not load my "gb13000.1993-1" font
>> for rendering GBK text. It seems to get the glyphs from GB2312, Japanese,
>> etc, and cannot display the character 'Rong(2)' in 'Premier Zhu Rongji'.
>> Any suggestion?
>>
> As for GBK support in Mozilla, it is already working; you just need to set
> charset to x-gbk in your HTML page's meta tag and it will work.
> If it is not working, blame Frank Tang ( ftang@netscape.com ). See the
> email paragraph below.
>
> GB2312V2 - for now "windows-936"
> HZ - "HZ-GB-2312"
> GBK - "x-gbk"

I spent last night figuring this out. I removed all other Han fonts (GB, Big5, Japanese, etc) and Mozilla loaded my "gb13000.1993-1" font, but it still had problems correctly rendering a GBK (or a simple GB2312) file. I looked into some of your files (under mozilla/intl/uconv/ucvcn) and found some typos.

Attached is the patch (based on M11) for your consideration; can you merge it into the main CVS if it's OK? Here is a short description of the patches:

1. Associate X charset encoding "gb13000.1993-1" with mime type "x-gbk":
   gfx/src/gtk/nsFontMetricsGTK.cpp
   gfx/src/xlib/nsFontMetricsXlib.cpp

2. Add a menu item for GBK (close to the menu item GB):
   editor/ui/composer/content/editorOverlay.xul
   editor/ui/composer/locale/en-US/editorOverlay.dtd
   mailnews/base/resources/locale/en-US/messenger.dtd
   mailnews/compose/resources/content/messengercompose.xul
   mailnews/compose/resources/locale/en-US/messengercompose.dtd
   xpfe/browser/resources/content/navigatorOverlay.xul
   xpfe/browser/resources/locale/en-US/navigator.dtd

3. Associate Linux locale zh_CN.GBK with "x-gbk":
   intl/uconv/src/unixcharset.properties

4. Some typos and bugs corrected. (New bugs introduced? :-)
   intl/uconv/ucvcn/nsGB2312ToUnicodeV2.cpp
   intl/uconv/ucvcn/nsGBKToUnicode.cpp
   intl/uconv/ucvcn/nsUnicodeToGB2312V2.cpp
   intl/uconv/ucvcn/nsUnicodeToGBK.cpp
   intl/uconv/ucvcn/nsUnicodeToHZ.cpp

The "corrections" for these files:

- Change 0x41 to 0x40 for the starting value of the GBK right byte.
- Change "row size" from (0x00FE - 0x0080) to 0x00BF (== 0xFE - 0x3F).
- Remove conflicts between the variables i in outer and inner blocks.
- The results of (i / 0x00BF + 0x0081) and (i % 0x00BF + 0x0040) should not be (or need not be) OR'ed with 0x80. For GBK, OR'ing the right byte may be wrong, e.g. when it lies between 0x40 and 0x80.
- Big-endian problem. The first byte of a Uint16 may not be the least significant byte on a big-endian machine:

    + #if 0 // This will run into trouble for big-endian machines
      pSrcDBCode = (DByte *)pSrc;
      *aDest = pSrcDBCode->leftbyte;
    + #else
    + *aDest= (unsigned char)(*pSrc);
    + #endif

- Struct packing. In some places, you alias (using a pointer) the two bytes in a char array by a DByte struct. The two bytes in a struct may or may not be packed tightly; there might be a hole if the two bytes are 4-byte aligned. This does not seem to be a problem with GCC, but personally I think the alternative (working directly with aDest[0] = 'x', aDest[1] = 'x') is simpler and more robust. Five years ago I wrote some C code on SCO Unix, and it took me a long time to find a bug introduced by struct packing/padding.

--
Xianping xge@ics.uci.edu
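For readers following the corrections listed under item 4, the index arithmetic and the byte-order point can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the attached patches: the names GbkBytes, IndexToGbk, GbkToIndex and StoreDoubleByte are hypothetical, and the layout of the 16-bit code (left byte in the high 8 bits) is assumed for the example.

```cpp
#include <cstdint>

// Constants taken from the corrections described in the report.
const unsigned kGbkRowSize   = 0x00BF;  // 0xFE - 0x3F right-byte slots per row
const unsigned kGbkLeftBase  = 0x0081;  // first value of the left (lead) byte
const unsigned kGbkRightBase = 0x0040;  // first value of the right (trail) byte

// Hypothetical holder for one double-byte code; used only as a return
// value, never as an alias over a raw byte buffer.
struct GbkBytes {
  std::uint8_t left;
  std::uint8_t right;
};

// Map a linear table index to its two bytes.
// Note there is no "| 0x80": a right byte such as 0x40 must stay 0x40.
inline GbkBytes IndexToGbk(unsigned index) {
  GbkBytes bytes;
  bytes.left  = static_cast<std::uint8_t>(index / kGbkRowSize + kGbkLeftBase);
  bytes.right = static_cast<std::uint8_t>(index % kGbkRowSize + kGbkRightBase);
  return bytes;
}

// Inverse mapping: two bytes back to the linear table index.
inline unsigned GbkToIndex(std::uint8_t left, std::uint8_t right) {
  return (left - kGbkLeftBase) * kGbkRowSize + (right - kGbkRightBase);
}

// Write a 16-bit code into a byte buffer with shifts instead of struct
// aliasing, so the result does not depend on host byte order or padding.
// Assumption for this sketch: the left byte is in the high 8 bits.
inline void StoreDoubleByte(std::uint16_t code, unsigned char* dest) {
  dest[0] = static_cast<unsigned char>(code >> 8);
  dest[1] = static_cast<unsigned char>(code & 0xFF);
}
```

With these constants, index 0 maps to the bytes 0x81 0x40 and index 0xBF starts the next row at 0x82 0x40. OR'ing the right byte 0x40 with 0x80 would give 0xC0, which is a different code.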
Assignee
Comment 4•24 years ago
Code is ready for check-in, pending review from Ftang@netscape.com.
Status: NEW → ASSIGNED
Comment 5•24 years ago
erik - please review the change from yueheng.xu@intel.com.

Created an attachment (id=6404) nsUnicodeToGBK.diff - r=ftang

Created an attachment (id=6402) nsFontMetricsXlib.diff <- erik, please review. I don't think this makes sense, since no code refers to GBK here.

Created an attachment (id=6401) nsFontMetricsGTK.diff <- erik, please review. I think it is reasonable, but we should let erik approve this.
Comment 6•24 years ago
Can we mark this M16?
Comment 7•24 years ago
As far as I know, the Xlib version is not being maintained/developed any more, so there is no need to check in that fix. Also, I agree with Frank that it is incomplete anyway. The other one (for GTK) is fine.
Comment 8•24 years ago
Can you check this in by 5/16? If so, please mark this M16.
Assignee
Comment 10•24 years ago
The fix was checked in last night, but you probably need a GBK font installed to test it. With the test page zhu-rong-ji.gbk.html below, a GBK-enabled browser should render Mr. Zhu Rong-Ji's name correctly. A GB2312-only browser will miss the character 'Rong'.

http://www.ics.uci.edu/~xge/clinux/gbk-test-files/
  Tao-Hua-Yuan.gbk.html
  zhu-rong-ji.gbk.html
  xcin.gbk.html
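To pick test bytes that exercise the GBK-only part of the table, a coarse range check like the one below may help. It is a sketch under stated assumptions, not part of the checked-in fix: the function names are hypothetical, the ranges are the usual double-byte ranges (GBK: left 0x81 to 0xFE, right 0x40 to 0xFE except 0x7F; GB2312 in its EUC-CN form: left 0xA1 to 0xF7, right 0xA1 to 0xFE), and it tests byte ranges only, not whether a character is actually assigned at that position.

```cpp
#include <cstdint>

// True if (left, right) is a well-formed GBK double-byte pair:
// left in 0x81..0xFE, right in 0x40..0xFE excluding 0x7F.
inline bool IsGbkPair(std::uint8_t left, std::uint8_t right) {
  return left >= 0x81 && left <= 0xFE &&
         right >= 0x40 && right <= 0xFE && right != 0x7F;
}

// True if the pair lies in the EUC-CN byte range occupied by GB2312:
// left in 0xA1..0xF7, right in 0xA1..0xFE.
inline bool IsInGb2312Range(std::uint8_t left, std::uint8_t right) {
  return left >= 0xA1 && left <= 0xF7 &&
         right >= 0xA1 && right <= 0xFE;
}

// True if the pair is valid GBK but outside the GB2312 range, i.e. a
// position that a GB2312-only converter cannot be expected to handle.
inline bool IsGbkOnly(std::uint8_t left, std::uint8_t right) {
  return IsGbkPair(left, right) && !IsInGb2312Range(left, right);
}
```

For example, the pair 0x81 0x40 is reported as GBK-only, while 0xB0 0xA1 falls inside the GB2312 range.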
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED