Closed Bug 33162 Opened 25 years ago Closed 23 years ago

missing Japanese characters on Linux

Categories

(Core :: Internationalization, defect, P2)

All
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: masaki.katakai, Assigned: bstell)

References

Details

(Keywords: intl, Whiteboard: [PDT+]wait for tree open to check in)

Attachments

(8 files)

I'll create an attachment that shows a snapshot of the screen. I checked how Mozilla and Communicator 4.x display Japanese characters, and found that many characters are displayed as `?'. For example, EUC 0xa1e2 is missing and is drawn as `?'. On Windows, it looks fine.
It seems we have two problems here.
1. Some characters are rendered with a font that we believe has the glyph but does not. That is the cause of the rendering of t
2. We have bugs in the Unicode to JIS X 0208 conversion for the following characters: U+2015, U+2010, U+2225, U+2260, etc.
Marking this as assigned. cata, can you help look at the Unicode to JIS X 0208 converter table? I think we have a bug there.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
move to M16
Target Milestone: --- → M16
Keywords: beta2
reassign to bobj
Assignee: ftang → bobj
Status: ASSIGNED → NEW
Reassigned to nhotta. Tentatively set TM to M17.
Assignee: bobj → nhotta
Target Milestone: M16 → M17
Accepting, if this is just a change to the conversion table, but no more unix bugs to me, please (>_<).
Status: NEW → ASSIGNED
Keywords: nsbeta2
Putting on [nsbeta2+] radar for beta2 fix.
Whiteboard: [nsbeta2+]
Reassign to ftang.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Take it back from nhotta.
Status: NEW → ASSIGNED
cata- can you help erik take a look at this ?
Assignee: ftang → cata
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
reassign to ftang
Assignee: cata → ftang
Status: ASSIGNED → NEW
Please help. There are two kinds of problem: some characters are displayed as blank and some are displayed as ?. My feeling is that it is probably related to the nsIAtom issue, but I am not sure.
Assignee: ftang → erik
Status: ASSIGNED → NEW
The Unix version of Mozilla has improved since this bug report was filed, but we still have some question marks. For example: EUC 0xA2A8 -> U+203B (REFERENCE MARK). This character is not being displayed properly on Unix because we have a hack that zeroes out all Unicodes from U+0000 to U+2200 in the font. We do this because JIS X 0208 fonts contain some of the CP1252 characters such as smart quotes, and since JIS fonts are much larger than 8859-1 fonts on Unix, CP1252 documents look strange (e.g. a very large smart quote inside small Western text). This one might be a bit complicated to fix. We probably want to use the JIS glyphs when the surrounding text is (nearly) as large as the JIS font. When the surrounding text is much smaller than the JIS font, we may want to use our transliteration routine instead (e.g. a smart quote becomes a regular ASCII quote). Even better might be a solution where we try to stick to the same font as much as possible, without switching to a JIS font that may be too tall, too wide, or look strange. However, this would violate the CSS2 spec, which says that we must go down the font-family list *for each character* in the element. MSIE doesn't follow CSS2 in this regard, and maybe that is justifiable (and CSS2 is "wrong"). Accepting bug for now.
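To make the effect of that hack concrete, here is a minimal sketch, not the actual Mozilla code. It assumes a hypothetical per-font coverage bitmap (`CoverageMap`) and a cutoff that clears everything below U+2200; the comment does not say whether U+2200 itself was cleared, so that boundary is an assumption.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical coverage map: bit i is set when the font claims a glyph for U+i (BMP only).
using CoverageMap = std::bitset<0x10000>;

// Assumed cutoff taken from the comment above; the exact boundary is a guess.
constexpr std::uint32_t kCutoff = 0x2200;

// Clear every code point below the cutoff so glyph lookup skips this font for them.
inline void zeroOutBelowCutoff(CoverageMap& map) {
    for (std::uint32_t cp = 0; cp < kCutoff; ++cp) {
        map.reset(cp);
    }
}

// True when the font still claims the code point after any clearing.
inline bool fontClaims(const CoverageMap& map, std::uint32_t cp) {
    return cp < map.size() && map.test(cp);
}
```

With a map like this, U+2018 (a smart quote) and U+203B (REFERENCE MARK) both drop out of the JIS font, while U+222A (UNION) stays, which matches the symptom that REFERENCE MARK falls through to `?`.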
Status: NEW → ASSIGNED
As I mentioned, I see two kinds of problem. Some characters display as blank and some display as a question mark.
The following characters display as ?
EUC-JP[0xA1BD] == U+2015
EUC-JP[0xA1BE] == U+2010 # HYPHEN
EUC-JP[0xA2A8] == U+203b # REFERENCE MARK
EUC-JP[0xA2F2] == U+212b # ANGSTROM SIGN
The following characters display as blank
EUC-JP[0xA2C0] == U+222A # UNION
EUC-JP[0xA2C1] == U+2229 # INTERSECTION
EUC-JP[0xA2CC] == U+FFE2 # NOT SIGN
EUC-JP[0xA2DC] == U+2220 # ANGLE
EUC-JP[0xA2DD] == U+22A5 # UP TACK
EUC-JP[0xA2E1] == U+2261 # IDENTICAL TO
EUC-JP[0xA2E2] == U+2252 # APPROXIMATELY EQUAL TO OR THE IMAGE OF
EUC-JP[0xA2E5] == U+221A # SQUARE ROOT
EUC-JP[0xA2E8] == U+2235 # BECAUSE
EUC-JP[0xA2E9] == U+222b # INTEGRAL
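This bug quotes code points both as EUC-JP bytes (0xA2A8) and as JIS X 0208 codes (0x2240), so a small conversion helper may make the lists easier to compare. The sketch below relies only on the standard relationship that a two-byte EUC-JP code is the JIS X 0208 code with the high bit set on each byte; the function names are illustrative and not taken from the Mozilla source.

```cpp
#include <cstdint>

// True when both bytes fall in the EUC-JP range used for JIS X 0208 (0xA1..0xFE).
constexpr bool isEucJis0208(std::uint16_t euc) {
    return (euc >> 8) >= 0xA1 && (euc >> 8) <= 0xFE &&
           (euc & 0xFF) >= 0xA1 && (euc & 0xFF) <= 0xFE;
}

// Clear the high bit of each byte: EUC-JP 0xA2A8 becomes JIS 0x2228.
constexpr std::uint16_t eucToJis(std::uint16_t euc) {
    return static_cast<std::uint16_t>(euc & 0x7F7F);
}

// Set the high bit of each byte: JIS 0x2240 becomes EUC-JP 0xA2C0.
constexpr std::uint16_t jisToEuc(std::uint16_t jis) {
    return static_cast<std::uint16_t>(jis | 0x8080);
}
```

For example, `eucToJis(0xA2C0)` gives 0x2240, so the UNION entry in the list above is the same character as JIS 0x2240.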
The first kind of problem mentioned by Frank is caused by the hack Erik talked about in his email. The second problem is not seen, at least on the Exceed X server and on HP. I believe that is probably a font problem and not our concern. If CSS2 cannot produce a consistent look, that is a problem. I believe the problem mostly exists between single-byte charset fonts and double-byte charset fonts. If a string is in a single-byte charset, we should let the single-byte charset font take precedence. The same should happen for double-byte. Will that solve the problem?
Shanjian, in this case we cannot give precedence to a single-byte font because there is no single-byte font with those characters (e.g. smart quotes). I suppose we could ignore the double-byte font in that case, and then fall back to the transliteration using an ASCII font. But that means that we need to look at neighboring characters, and that is quite a big change that I haven't tried or even given much thought to yet.
After discussing with Erik, we decided to mark this bug nsbeta2-. The reasons are as follows. We cannot see the "shown as blank" problem on Erik's machine; either one of Erik's other fixes fixed it or there is some strange issue with my fonts. In any case, the missing characters are not frequently used, so we can hold this until after nsbeta2. The only character in this category with higher usage is EUC-JP[0xA2E8] == U+2235 # BECAUSE, but it is still rarely used. Of the characters shown as '?', EUC-JP[0xA2A8] == U+203b # REFERENCE MARK has higher usage, but I don't think we really need it for nsbeta2.
mark it nsbeta2-
Whiteboard: [nsbeta2+] → [nsbeta2-]
Keywords: nsbeta2 → nsbeta3
Adding nsbeta2 keyword to bugs with nsbeta2 triage value in status field so the queries don't get screwed up
Keywords: nsbeta2
nsbeta3+ P1 per bug meeting
Priority: P3 → P1
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]
This still happens in 2000-08-14-12 Linux build.
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+]possible patch in hand
mark it as P2
Priority: P1 → P2
A Japanese user made a good testcase. http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html His platform is Sparc, Solaris 2.6.
erik- should we check in your patch now? sooner is better for this particular issue, right ?
The patch has some serious problems, so we can't use it. However, I have come up with a different change that works quite well. It zeroes out only the Unicodes corresponding to CP1252's 0x80-0x9F range (instead of all Unicodes less than 0x2200).
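A rough sketch of the narrower approach described here, assuming the same kind of hypothetical coverage bitmap (`CoverageMap`); the real patch is not shown in this bug. The table lists the Unicode values of the CP1252 bytes 0x80-0x9F, leaving out the five bytes that CP1252 does not assign (0x81, 0x8D, 0x8F, 0x90, 0x9D).

```cpp
#include <array>
#include <bitset>
#include <cstdint>

// Hypothetical coverage map: bit i is set when the font claims a glyph for U+i (BMP only).
using CoverageMap = std::bitset<0x10000>;

// Unicode values of the assigned CP1252 bytes in the range 0x80..0x9F, in byte order.
constexpr std::array<std::uint16_t, 27> kCp1252Extensions = {{
    0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030,
    0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022,
    0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x017E, 0x0178,
}};

// Remove only the CP1252 extension characters from a double-byte font's coverage.
inline void removeCp1252Extensions(CoverageMap& map) {
    for (std::uint16_t cp : kCp1252Extensions) {
        map.reset(cp);
    }
}
```

Under this scheme U+203B (REFERENCE MARK) stays available from the JIS font, while characters such as U+2018 and U+2030 are still removed from it.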
Whiteboard: [nsbeta2-][nsbeta3+]possible patch in hand → [nsbeta2-][nsbeta3+] fix in hand
could you put the new patch here ? let's review and check it in.
New patch checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Target Milestone: M17 → M18
I verified this in 2000-09-13-12 Linux build.
Status: RESOLVED → VERIFIED
It seems 2240-2269 weren't displayed. http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt
Reopened. As Koike-san mentioned, there are still some problems:
- JIS 0x2240-0x2269 characters cannot be displayed; they are just blank
- JIS 0x2273, 0x2277, 0x2278 are not displayed properly
- JIS 0x2146-0x2149 are displayed as HALFWIDTH
I'll attach a snapshot. For the 0x2240-0x2269 problem, I think the mapping table for JIS X 0208 is not correct. For example, JIS 0x2240 should display the glyph at code point 0x2240 of the jis-fixed Japanese font; however, it seems that the character 0x2240 is mapped to JIS 0x2d7d. JIS 0x2d?? is called the IBM-NEC vendor-specific area, and the jis-fixed Japanese font does not contain any characters in this area. The sun-gothic fonts bundled with Solaris provide those characters, so when I switched to sun-gothic for jisx208, I could see the characters. However, we should not use that area, since it depends on the fonts. What do you think?
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Attached image snapshot
I don't think we can make everybody happy for Beta 3 or even RTM. Since we are using Unicode internally, and since most Unix systems are missing the CP1252 fonts (they only have iso8859-1), and since JIS X 0208 happens to have some of the CP1252 characters, Mozilla picks up the huge Japanese fonts even when the surrounding text is small Western text (e.g. for quotation marks aka smart quotes). There are a lot of CP1252 documents out there because of the dominance of Windows, so we need to pay attention to those characters. Unfortunately, the hack that we came up with has some nasty effects on Japanese documents, so we are forced to choose the lesser of two evils. Much of this can be blamed on the poor state of fonts on X, the dominance of Windows, Mozilla's decision to implement CSS2 to the letter, and the imminence of RTM (it's too late to make big changes). However, I'm reassigning this bug to Frank, to have somebody look at the JIS <-> Unicode conversion tables, especially JIS 2240 - 2269.
Assignee: erik → ftang
Status: REOPENED → NEW
I think the proper action for now is: 1. Ignore the problem of the characters that overlap with windows-1252. 2. Fix the conversion table entries that map to the NEC 0x2Dxx range. I am working on 2. and should have an updated table very soon.
Fixed in my local tree. The change is in the mozilla/intl/uconv/ucvja/jis0208.uf file. This file is compressed binary data, so I don't think anyone can review the changes. I removed the NEC-specific mappings from it.
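Since the .uf table itself is binary, the idea behind the change can only be sketched. The example below assumes a hypothetical list of candidate JIS codes for one Unicode character and picks the first one outside row 13 (0x2Dxx), the row the thread calls the IBM-NEC vendor-specific area. The candidate values come from the report above; the functions are illustrative and are not part of uconv.

```cpp
#include <cstdint>
#include <vector>

// True for codes in row 13 (0x2D21..0x2D7E), which JIS X 0208 itself leaves unassigned.
inline bool isNecRow13(std::uint16_t jis) {
    return (jis >> 8) == 0x2D;
}

// Return the first candidate outside the vendor row, or 0 when every candidate is vendor-specific.
inline std::uint16_t pickStandardCode(const std::vector<std::uint16_t>& candidates) {
    for (std::uint16_t jis : candidates) {
        if (!isNecRow13(jis)) {
            return jis;
        }
    }
    return 0;
}
```

For the case reported above, `pickStandardCode({0x2D7D, 0x2240})` returns 0x2240, the code point that the jis-fixed font actually contains.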
PDT agrees P2 for Frank's part (2)
Whiteboard: [nsbeta2-][nsbeta3+] fix in hand → [nsbeta2-][nsbeta3+][PDTP2] fix in hand
mark it fixed.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
Please see: http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html
I found the following problems:
* JIS code 224C is drawn as blank.
* JIS codes 2146, 2147, 2148, 2149 are not drawn as FULLWIDTH characters.
* JIS codes 2273, 2277, 2278 are not drawn with the corresponding fonts.
Please compare http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt on Mozilla with http://www.netlaputa.ne.jp/~vmi/software/mozilla/term.gif
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Frank, please take a look at JIS 224C. Maybe we have a problem in the converter? JIS 2146-9 are drawn half-width due to the CP1252 problem. I don't think we will fix this for Netscape 6. Sorry. We may want to consider drawing JIS 2273, 2277 and 2278 in the JIS font, since they are probably rare in CP1252 documents, and it probably doesn't matter if we draw them too large in CP1252 docs anyway.
Somehow U+FFE2 also maps to 0x7C7B in the CP932 table, which causes this problem. The half-width issue won't be solved: JIS X 0208 only specifies which characters it encodes, not the width of the glyphs. Moving this bug to Future.
Status: REOPENED → ASSIGNED
Whiteboard: [nsbeta2-][nsbeta3+][PDTP2] fix in hand → [nsbeta2-][nsbeta3-][PDTP2] fix in hand
Target Milestone: M18 → Future
Keywords: intl, nsbeta1
Clearing out the status whiteboard, Target Milestone and keyword fields. Reassigning this to bstell to look at again after we ship 6.0 RTM.
Assignee: ftang → bstell
Status: ASSIGNED → NEW
Keywords: nsbeta2, nsbeta3
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2] fix in hand
Target Milestone: Future → ---
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
Target Milestone: --- → mozilla0.9.1
Status: NEW → ASSIGNED
brian - what's the status on this one?
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.1
224c seems okay
2146, 2147, 2148, 2149 are half width
2273, 2277, 2278 appear to be transliterated, which gives a poor result
Target Milestone: mozilla0.9.1 → mozilla0.9.2
changing to nsbeta1-. this one does not meet beta stopper guidelines.
Keywords: nsbeta1 → nsbeta1-
Keywords: nsCatFood, rtm
Adding nsCatFood and RTM keywords. This looks pretty bad.
This is a difficult one. The current behavior is because of a hack erik put in to prevent JIS fonts from being used to display smart quotes in windows-1252 documents. I wonder whether we still need this after bstell adds the per language group font fallback. (Not sure; we may still need it because we will probably still hit the JIS font before we hit the transliteration fallback.)
Is there a way we can decide whether to zero out based on the language group?
can someone create a new screen shot?
pdt+ based on 6/11 pdt meeting.
Whiteboard: [PDT+]
> can someone create a new screen shot? Only JIS symbol characters, as follows: http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html This image is jis.txt displayed by kterm & Mozilla (checkout from CVS at 2001-06-12 JST). Configuration of Japanese monospace font is misc-fixed-jisx0208.1983-0 (14dot). JIS[0x2144] is not monospace font. JIS[0x2146-0x2149] is not displayed. JIS[0x2273,0x2277,0x2278] is not valid font.
IWAMURO, Would it be possible to get a file like jis.txt but with just the characters of interest?
These characters:
JIS[0x2144] is not monospace font.
JIS[0x2146-0x2149] is not displayed.
JIS[0x2273,0x2277,0x2278] is not valid font.
are disabled by this code:
     * XXX This is a bit of a hack. Documents containing the CP1252
     * extensions of Latin-1 (e.g. smart quotes) will display with those
     * special characters way too large. This is because they happen to
     * be in these large double byte fonts. So, we disable those
     * characters here. Revisit this decision later.
     */
    if (aSelf->Convert == DoubleByteConvert) {
      PRUint32* map = aSelf->mMap;
      REMOVE_CHAR(map, 0x20AC);
      ...
The goal of this code was to use transliteration in western documents instead of the large glyphs from double byte fonts.
1.117 <erik@netscape.com> 11 Sep 2000 14:03 bug 33162; instead of zeroing out all Unicodes less than 0x2200, we just zero out the common ones that correspond to CP1252 (for things like smart quotes), so that we can still see most of the JIS X 0208 characters;
Perhaps instead of disabling these glyphs from the double byte fonts so the glyph lookup runs all the way to the transliterator, we should add an early transliterator to the loaded fonts list for non-double byte documents. This way non-double byte documents will transliterate instead of using these double byte glyphs, and double byte documents will use these glyphs.
> Perhaps instead of disabling these glyphs from the double byte fonts so the > glyph lookup runs all the way to the tranliterator we should add an early > transliterator to the loaded fonts list for non-double byte documents. This made the Asian font work but broke the euro on lang groups that are western-ish but not "x-western" like baltic which is "x-baltic".
The reason the previous idea broke the euro is that the current code by disabling the Asian glyphs allows the glyph search code to find the euro in one of the other single byte fonts (iso-8859-15) before hitting the transliterator. When I added the early transliterator it stopped the font search from finding the Asian glyphs but also stopped the font search from finding the other single byte glyphs. The correct fix would be to have 2 copies of all the Asian font maps: one for single byte documents and one for double byte documents. This however would be a very big change and seems highly unlikely to be approved any time before the next release. Perhaps we should look at the user's locale and if it is an Asian locale not disable the glyphs. That way Asian users will see the Asian (bigger) glyphs and non-Asian users will see the smaller (non-Asian) glyphs.
attachment 38989 re-enables double byte special chars. For single byte documents it adds a special char transliterator before the double byte fonts are checked, so that the oversized glyphs in double byte fonts will not be used in single byte type docs.
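The ordering idea in this patch can be illustrated with a toy glyph search, which is not the code in the attachment. Everything in the sketch is hypothetical: the `Source` type, the source names, and the coverage predicates (which use deliberately crude ranges). It only shows that a special-character transliterator placed ahead of the double byte font catches smart quotes in single byte documents and is skipped for double byte documents.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry in the glyph search order.
struct Source {
    std::string name;
    std::function<bool(std::uint32_t)> supports;
};

// Toy predicate: only the four smart quotes count as "special chars" here.
inline bool isCp1252Special(std::uint32_t cp) {
    return cp == 0x2018 || cp == 0x2019 || cp == 0x201C || cp == 0x201D;
}

// Build the search order; the early transliterator is added only for single byte documents.
inline std::vector<Source> buildSearchOrder(bool singleByteDoc) {
    std::vector<Source> order;
    order.push_back({"latin1-font", [](std::uint32_t cp) { return cp < 0x100; }});
    if (singleByteDoc) {
        order.push_back({"special-char-transliterator", isCp1252Special});
    }
    // Toy coverage for the double byte font: anything at or above U+2000.
    order.push_back({"jisx0208-font", [](std::uint32_t cp) { return cp >= 0x2000; }});
    order.push_back({"fallback-transliterator", [](std::uint32_t) { return true; }});
    return order;
}

// Walk the order and return the name of the first source that supports the character.
inline std::string findSource(const std::vector<Source>& order, std::uint32_t cp) {
    for (const Source& s : order) {
        if (s.supports(cp)) {
            return s.name;
        }
    }
    return "";
}
```

In this toy model a smart quote (U+201C) in a single byte document resolves to the early transliterator, while the same character in a double byte document resolves to the JIS font.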
Whiteboard: [PDT+] → [PDT+] have patch, need r= sr= a=
Whiteboard: [PDT+] have patch, need r= sr= a= → [PDT+] have patch r=ftang, need sr= a=
change status to "[PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a="
Whiteboard: [PDT+] have patch r=ftang, need sr= a= → [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a=
sr=blizzard
Whiteboard: [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a= → [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a=
a= asa@mozilla.org for checkin to the trunk. (on behalf of drivers)
Blocks: 83989
Whiteboard: [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a= → [PDT+]wait for tree open to check in
checked into trunk
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
I checked that all JIS symbol characters were displayed correctly. Thanks.
It looks fine with me on 0.9.2(06-25) build, mark it as verified.
Status: RESOLVED → VERIFIED
    if (western_font) {
      NS_ASSERTION(western_font->SupportsChar(aChar), "font supposed to support this char");
      return font;
    }
This should return western_font, not font.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
from email:
> I just checked 33162, and it was not reopened yet. Is it a bugzilla problem?
> Anyway, you can put r=shanjian there.
>
> thanks
>
> shanjian
Status: REOPENED → ASSIGNED
Target Milestone: mozilla0.9.2 → ---
fixed in bug 86368
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Marking it as verified. Please reopen if there is still a problem.
Status: RESOLVED → VERIFIED