Closed Bug 33162 Opened 24 years ago Closed 23 years ago

missing Japanese characters on Linux

Categories

(Core :: Internationalization, defect, P2)

All
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: masaki.katakai, Assigned: bstell)

References

Details

(Keywords: intl, Whiteboard: [PDT+]wait for tree open to check in)

Attachments

(8 files)

I'll create an attachment that shows a screen snapshot.

I checked how Mozilla and Communicator 4.x display
Japanese characters.

I found that many characters are displayed as `?'. For example,
EUC 0xa1e2 is missing and is drawn as `?'.

On Windows, it looks fine.
It seems we have two problems here.

1. Some characters are rendered with a font that we believe has the
character glyph, but it doesn't. That is the cause of the rendering of t

2. We have bugs in the Unicode to JIS X 0208 conversion for the following
characters:

U+2015,U+2010,U+2225,U+2260,etc

Marking this as assigned. cata- can you help look at the Unicode to JIS X 0208
converter table? I think we have a bug there.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
move to M16
Target Milestone: --- → M16
Keywords: beta2
reassign to bobj
Assignee: ftang → bobj
Status: ASSIGNED → NEW
Reassigned to nhotta. Tentatively set TM to M17.
Assignee: bobj → nhotta
Target Milestone: M16 → M17
Accepting, if this is just a change to the conversion table, but no more unix 
bugs to me, please (>_<).
Status: NEW → ASSIGNED
Keywords: nsbeta2
Putting on [nsbeta2+] radar for beta2 fix.
Whiteboard: [nsbeta2+]
Reassign to ftang.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Take it back from nhotta.
Status: NEW → ASSIGNED
cata- can you help erik take a look at this ?
Assignee: ftang → cata
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
reassign to ftang
Assignee: cata → ftang
Status: ASSIGNED → NEW
Please help. There are two kinds of problems: some characters are displayed as
blank and some are displayed as ?.
My feeling is that this is probably related to the nsIAtom issue, but I am not sure.
Assignee: ftang → erik
Status: ASSIGNED → NEW
The Unix version of Mozilla has improved since this bug report was filed, but we
still have some question marks. For example:

  EUC 0xA2A8 -> U+203B (REFERENCE MARK)

This character is not being displayed properly on Unix because we have a hack
that zeroes out all Unicodes from U+0000 to U+2200 in the font. We do this
because JIS X 0208 fonts contain some of the CP1252 characters such as smart
quotes, and since JIS fonts are much larger than 8859-1 fonts on Unix, CP1252
documents look strange (e.g. very large smart quote inside small Western text).
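
To make the effect of that hack concrete, here is a minimal sketch; it is not the actual Mozilla code. It assumes a font's coverage is stored as one bit per code point. `CoverageMap`, `ClearBelow`, and `MakeToyJisMap` are hypothetical names, and the three sample code points are just a small selection for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical coverage map: one bit per BMP code point (an assumption,
// not the real font map structure).
using CoverageMap = std::bitset<0x10000>;

// Clear every code point below `limit`, mimicking the hack described above.
inline void ClearBelow(CoverageMap& map, std::uint32_t limit) {
  assert(limit <= map.size());
  for (std::uint32_t ch = 0; ch < limit; ++ch) {
    map.reset(ch);
  }
}

// Build a toy JIS X 0208 coverage map with a few illustrative code points.
inline CoverageMap MakeToyJisMap() {
  CoverageMap map;
  map.set(0x2018);  // LEFT SINGLE QUOTATION MARK (also in CP1252)
  map.set(0x203B);  // REFERENCE MARK (EUC 0xA2A8)
  map.set(0x222A);  // UNION
  return map;
}
```

After `ClearBelow(map, 0x2200)` on the toy map, U+2018 and U+203B are no longer covered by the JIS font, while U+222A still is. That is the trade-off: the smart quote is kept out of Western text, but the reference mark is lost as well.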

This one might be a bit complicated to fix. We probably want to use the JIS
glyphs when the surrounding text is (nearly as) large as the JIS font. When the
surrounding text is much smaller than the JIS font, then we may want to use our
transliteration routine instead (e.g. smart quote becomes regular ASCII quote).
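
A rough sketch of that size-based choice follows. It is only an illustration of the idea, not code from the tree: `ChooseGlyphSource` and `GlyphSource` are hypothetical names, and the 0.8 ratio is an arbitrary assumption standing in for "nearly as large".

```cpp
#include <cassert>

enum class GlyphSource { JisFont, Transliteration };

// Hypothetical threshold: how close the surrounding text size must be to
// the JIS font size before the JIS glyph is considered acceptable.
constexpr double kMinSizeRatio = 0.8;

// Pick the JIS glyph when the surrounding text is nearly as large as the
// JIS font; otherwise fall back to transliteration (e.g. an ASCII quote).
inline GlyphSource ChooseGlyphSource(int surroundingPx, int jisFontPx) {
  assert(surroundingPx > 0 && jisFontPx > 0);
  const double ratio = static_cast<double>(surroundingPx) / jisFontPx;
  return ratio >= kMinSizeRatio ? GlyphSource::JisFont
                                : GlyphSource::Transliteration;
}
```

With these assumed numbers, 14px text next to a 14px JIS font would use the JIS glyph, while 10px text next to a 24px JIS font would be transliterated.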

Even better might be a solution where we try to stick to the same font as much
as possible, without switching to a JIS font that may be too tall, too wide, or
look strange. However, this would violate the CSS2 spec, which says that we must
go down the font-family list *for each character* in the element. MSIE doesn't
follow CSS2 in this regard, and maybe that is justifiable (and CSS2 is "wrong").

Accepting bug for now.
Status: NEW → ASSIGNED
As I mentioned, I see two kinds of problems: some characters display as blank and
some display as a question mark.
The following characters display as ?:
EUC-JP[0xA1BD] == U+2015 # HORIZONTAL BAR
EUC-JP[0xA1BE] == U+2010 # HYPHEN
EUC-JP[0xA2A8] == U+203b # REFERENCE MARK
EUC-JP[0xA2F2] == U+212b # ANGSTROM SIGN
The following characters display as blank:
EUC-JP[0xA2C0] == U+222A # UNION
EUC-JP[0xA2C1] == U+2229 # INTERSECTION
EUC-JP[0xA2CC] == U+FFE2 # NOT SIGN
EUC-JP[0xA2DC] == U+2220 # ANGLE
EUC-JP[0xA2DD] == U+22A5 # UP TACK
EUC-JP[0xA2E1] == U+2261 # IDENTICAL TO
EUC-JP[0xA2E2] == U+2252 # APPROXIMATELY EQUAL TO OR THE IMAGE OF
EUC-JP[0xA2E5] == U+221A # SQUARE ROOT
EUC-JP[0xA2E8] == U+2235 # BECAUSE
EUC-JP[0xA2E9] == U+222b # INTEGRAL
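
Since this bug mixes EUC-JP byte values (0xA2C0) with JIS X 0208 codes (0x2240) for the same characters, a small conversion sketch may help when comparing reports. EUC-JP encodes a JIS X 0208 character by setting the high bit of each byte; the function names below are hypothetical helpers, not Mozilla APIs.

```cpp
#include <cassert>
#include <cstdint>

// Convert a two-byte EUC-JP value (JIS X 0208 code set) to its JIS code
// by clearing the high bit of both bytes.
inline std::uint16_t EucJpToJis(std::uint16_t euc) {
  assert((euc & 0x8080) == 0x8080);
  return static_cast<std::uint16_t>(euc & 0x7F7F);
}

// Convert a JIS X 0208 code to its EUC-JP value by setting the high bit
// of both bytes.
inline std::uint16_t JisToEucJp(std::uint16_t jis) {
  assert((jis & 0x8080) == 0);
  return static_cast<std::uint16_t>(jis | 0x8080);
}
```

For example, EUC-JP 0xA2A8 (REFERENCE MARK) is JIS 0x2228, and the blank range EUC-JP 0xA2C0-0xA2E9 is JIS 0x2240-0x2269.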
The first kind of problem mentioned by Frank is caused by the
hack Erik talked about in his email. The second problem is not
seen, at least on the Exceed X server and on HP. I believe it
is probably a font problem and is not our concern.

If CSS2 cannot produce a consistent look, that is a problem.
I believe the problem mostly exists between single-byte charset
fonts and double-byte charset fonts. If a string is in a single-byte
charset, we should let the single-byte charset font take precedence.
The same should happen for double-byte. Will that solve the
problem?
Shanjian, in this case we cannot give precedence to a single-byte font because
there is no single-byte font with those characters (e.g. smart quotes). I
suppose we could ignore the double-byte font in that case, and then fall back to
the transliteration using an ASCII font. But that means that we need to look at
neighboring characters, and that is quite a big change that I haven't tried or
even given much thought to yet.
After discussing with Erik, we decided to nsbeta2- this bug. The reasons are as
follows:
1. We cannot see the "shown as blank" problem on Erik's machine; either Erik's
other fix fixed it or there is some strange issue with my fonts. In any case, the
missing characters are not frequently used, so we could hold this until
after nsbeta2. The only character in this category with higher usage is
EUC-JP[0xA2E8] == U+2235 # BECAUSE
but it is still rarely used.

2. Among the characters shown as '?',
EUC-JP[0xA2A8] == U+203b # REFERENCE MARK
has higher usage, but I don't think we really need it for nsbeta2.
Marking it nsbeta2-.
Whiteboard: [nsbeta2+] → [nsbeta2-]
Keywords: nsbeta2 → nsbeta3
Adding nsbeta2 keyword to bugs with nsbeta2 triage value in status field so the 
queries don't get screwed up
Keywords: nsbeta2
nsbeta3+ P1 per bug meeting
Priority: P3 → P1
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]
This still happens in 2000-08-14-12 Linux build.
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+]possible patch in hand
mark it as P2
Priority: P1 → P2
A Japanese user made a good testcase.

http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html

His platform is Sparc, Solaris 2.6.


erik- should we check in your patch now? sooner is better for this particular 
issue, right ?
The patch has some serious problems, so we can't use it. However, I have come
up with a different change that works quite well. It zeroes out only the
Unicodes corresponding to CP1252's 0x80-0x9F range (instead of all Unicodes less
than 0x2200).
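
As an illustration of the narrower approach, the sketch below clears only the Unicode code points that CP1252 assigns to bytes 0x80-0x9F. This is an assumption-laden sketch rather than the checked-in patch: `CoverageMap`, `Covers`, and `ClearCp1252Extras` are hypothetical names, and the real change may not remove exactly this set.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical coverage map: one bit per BMP code point.
using CoverageMap = std::bitset<0x10000>;

// Unicode code points assigned to CP1252 bytes 0x80-0x9F
// (0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned).
constexpr std::uint16_t kCp1252Extras[] = {
    0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030,
    0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022,
    0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x017E, 0x0178,
};

inline bool Covers(const CoverageMap& map, std::uint32_t ch) {
  assert(ch < map.size());
  return map.test(ch);
}

// Remove only the CP1252 extension characters from a double-byte font map.
inline void ClearCp1252Extras(CoverageMap& map) {
  for (std::uint16_t ch : kCp1252Extras) {
    map.reset(ch);
  }
}
```

With this set, U+203B (REFERENCE MARK) stays available from the JIS font, while characters that overlap with CP1252, such as the curly quotes U+2018-U+201D and the daggers U+2020 and U+2021, are still removed.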
Whiteboard: [nsbeta2-][nsbeta3+]possible patch in hand → [nsbeta2-][nsbeta3+] fix in hand
could you put the new patch here ? let's review and check it in.
New patch checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Target Milestone: M17 → M18
I verified this in 2000-09-13-12 Linux build.
Status: RESOLVED → VERIFIED
It seems 2240-2269 weren't displayed.

http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt

Reopened.

As Koike-san mentioned, there are still some problems:

 - JIS 0x2240-0x2269 characters cannot be displayed, just blank
 - JIS 0x2273, 0x2277, 0x2278 are not displayed properly
 - JIS 0x2146-0x2149 are displayed as HALFWIDTH

I'll attach a snapshot.

For the 0x2240-0x2269 problem, I think the mapping table for
JIS X 0208 is not correct.
For example, JIS 0x2240 should display the glyph at exactly the 0x2240
code point of the jis-fixed Japanese font; however, it
seems that the character 0x2240 is mapped to JIS 0x2d7d.
JIS 0x2d?? is called the IBM-NEC vendor-specific area, and
the jis-fixed Japanese font does not contain any characters
in this area.

The sun-gothic fonts bundled with Solaris provide those
characters, so when I switched to sun-gothic for jisx208,
I could see the characters. However, we should not
use that area. It depends on the fonts.
What do you think?


Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Attached image snapshot
I don't think we can make everybody happy for Beta 3 or even RTM. Since we are
using Unicode internally, and since most Unix systems are missing the CP1252
fonts (they only have iso8859-1), and since JIS X 0208 happens to have some of
the CP1252 characters, Mozilla picks up the huge Japanese fonts even when the
surrounding text is small Western text (e.g. for quotation marks aka smart
quotes). There are a lot of CP1252 documents out there because of the dominance
of Windows, so we need to pay attention to those characters.

Unfortunately, the hack that we came up with has some nasty effects on Japanese
documents, so we are forced to choose the lesser of two evils.

Much of this can be blamed on the poor state of fonts on X, the dominance of
Windows, Mozilla's decision to implement CSS2 to the letter, and the imminence
of RTM (it's too late to make big changes).

However, I'm reassigning this bug to Frank, to have somebody look at the JIS <->
Unicode conversion tables, especially JIS 2240 - 2269.
Assignee: erik → ftang
Status: REOPENED → NEW
I think the proper action for now is:
1. Ignore the problem of those characters which overlap with windows-1252.
2. Fix the conversion table entries that map to the NEC 0x2Dxx range.
I am working on 2. and should have an updated table very soon.
Fixed in my local tree. The change is in the mozilla/intl/uconv/ucvja/jis0208.uf
file. This file is compressed binary data, so I don't think anyone can review the
changes. I removed the NEC-specific mappings from it.
PDT agrees P2 for Frank's part (2)
Whiteboard: [nsbeta2-][nsbeta3+] fix in hand → [nsbeta2-][nsbeta3+][PDTP2] fix in hand
mark it fixed.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
Please see:
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html

I found the following problems:
* JIS code 224C is drawn as blank.
* JIS codes 2146, 2147, 2148, 2149 are not drawn as FULLWIDTH characters.
* JIS codes 2273, 2277, 2278 are not drawn with the corresponding font glyphs.
  Please compare
  http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt
  on Mozilla with
  http://www.netlaputa.ne.jp/~vmi/software/mozilla/term.gif

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Frank, please take a look at JIS 224C. Maybe we have a problem in the converter?

JIS 2146-9 are drawn half-width due to the CP1252 problem. I don't think we will
fix this for Netscape 6. Sorry.

We may want to consider drawing JIS 2273, 2277 and 2278 in the JIS font, since
they are probably rare in CP1252 documents, and it probably doesn't matter if
we draw them too large in CP1252 docs anyway.
Somehow U+FFE2 also maps to 0x7C7B in the CP932 table,
which causes this problem.
The half-width issue won't be solved. JIS X 0208 only specifies which characters it
encodes; it does not specify the width of the glyphs.
Moving this bug to Future.
Status: REOPENED → ASSIGNED
Whiteboard: [nsbeta2-][nsbeta3+][PDTP2] fix in hand → [nsbeta2-][nsbeta3-][PDTP2] fix in hand
Target Milestone: M18 → Future
Keywords: intl, nsbeta1
Clearing out the status whiteboard, Target Milestone and keyword fields.
Reassign this to bstell to look again after we ship 6.0 RTM
Assignee: ftang → bstell
Status: ASSIGNED → NEW
Keywords: nsbeta2, nsbeta3
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2] fix in hand
Target Milestone: Future → ---
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
Target Milestone: --- → mozilla0.9.1
Status: NEW → ASSIGNED
brian - what's the status on this one?
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.1
224c seems okay
2146, 2147, 2148, 2149 are half width
2273, 2277, 2278 appear to be transliterated which gives a poor result
Target Milestone: mozilla0.9.1 → mozilla0.9.2
changing to nsbeta1-. this one does not meet beta stopper guidelines.
Keywords: nsbeta1 → nsbeta1-
Keywords: nsCatFood, rtm
Adding nsCatFood and RTM keywords. This looks pretty bad.
This is a difficult one.
The current behavior is because of a hack Erik put in to prevent a JIS font from
being used to display smart quotes in windows-1252 documents.
I wonder whether we still need this after bstell added the per-language-group font
fallback. (Not sure; we may still need it because we will probably still hit the JIS
font before we hit the transliteration fallback.)
Is there a way to decide whether to zero out or not based on the language group?
can someone create a new screen shot?
pdt+ based on 6/11 pdt meeting.
Whiteboard: [PDT+]
> can someone create a new screen shot?

Only JIS symbol characters, as follows:
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html

This image is jis.txt displayed by kterm & Mozilla (checked out from CVS on
2001-06-12 JST).
The configured Japanese monospace font is misc-fixed-jisx0208.1983-0 (14dot).

JIS[0x2144] is not monospace font.
JIS[0x2146-0x2149] is not displayed.
JIS[0x2273,0x2277,0x2278] is not valid font.
IWAMURO,

Would it be possible to get a file like jis.txt but with just the
characters of interest?

these characters:

  JIS[0x2144] is not monospace font.
  JIS[0x2146-0x2149] is not displayed.
  JIS[0x2273,0x2277,0x2278] is not valid font.

are disabled by this code:

        /*
         * XXX This is a bit of a hack. Documents containing the CP1252
         * extensions of Latin-1 (e.g. smart quotes) will display with those
         * special characters way too large. This is because they happen to
         * be in these large double byte fonts. So, we disable those
         * characters here. Revisit this decision later.
         */
        if (aSelf->Convert == DoubleByteConvert) {
          PRUint32* map = aSelf->mMap;
          REMOVE_CHAR(map, 0x20AC);
...

The goal of this code was to use transliteration in western documents instead
of the large glyphs from double byte fonts.

  1.117 <erik@netscape.com> 11 Sep 2000 14:03
  bug 33162; instead of zeroing out all Unicodes less than 0x2200, we just
  zero out the common ones that correspond to CP1252 (for things like smart
  quotes), so that we can still see most of the JIS X 0208 characters;
  
Perhaps instead of disabling these glyphs from the double byte fonts so the
glyph lookup runs all the way to the transliterator we should add an early
transliterator to the loaded fonts list for non-double byte documents.

This way non-double byte documents will transliterate instead of using these
double byte glyphs and double byte documents will use these glyphs.
> Perhaps instead of disabling these glyphs from the double byte fonts so the
> glyph lookup runs all the way to the transliterator we should add an early
> transliterator to the loaded fonts list for non-double byte documents.

This made the Asian font work but broke the euro on lang groups that are
western-ish but not "x-western" like baltic which is "x-baltic".
The reason the previous idea broke the euro is that the current code by
disabling the Asian glyphs allows the glyph search code to find the euro
in one of the other single byte fonts (iso-8859-15) before hitting the
transliterator. When I added the early transliterator it stopped the font 
search from finding the Asian glyphs but also stopped the font search from 
finding the other single byte glyphs.

The correct fix would be to have 2 copies of all the Asian font maps: one
for single byte documents and one for double byte documents. This however 
would be a very big change and seems highly unlikely to be approved any
time before the next release.

Perhaps we should look at the user's locale and if it is an Asian locale
not disable the glyphs.

That way Asian users will see the Asian (bigger) glyphs and non-Asian 
users will see the smaller (non-Asian) glyphs.
Attachment 38989 re-enables double byte special chars.

For single byte documents it adds a special char transliterator before the double
byte fonts are checked, so that the oversized glyphs in double byte fonts will
not be used in single byte type docs.
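
The ordering idea can be sketched as follows. This is a toy model under stated assumptions, not the attached patch: `FontEntry`, `BuildFontList`, and `FindFont` are hypothetical, the coverage predicates are deliberately tiny, and the only point is where the special char transliterator sits in the search list.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an entry in the loaded-fonts list.
struct FontEntry {
  std::string name;
  std::function<bool(std::uint32_t)> supports;
};

// Smart quotes only, for brevity; the real set of special chars is larger.
inline bool IsSpecialChar(std::uint32_t ch) {
  return ch == 0x2018 || ch == 0x2019 || ch == 0x201C || ch == 0x201D;
}

// Build the search list. For single byte documents, a transliterator that
// claims only the special chars is placed ahead of the double byte font.
inline std::vector<FontEntry> BuildFontList(bool singleByteDoc) {
  std::vector<FontEntry> fonts;
  fonts.push_back({"iso8859-15", [](std::uint32_t ch) {
                     return ch < 0x80 || ch == 0x20AC;  // toy coverage
                   }});
  if (singleByteDoc) {
    fonts.push_back({"special-char-transliterator", IsSpecialChar});
  }
  fonts.push_back({"jisx0208", [](std::uint32_t ch) {
                     return IsSpecialChar(ch) || ch == 0x203B;  // toy coverage
                   }});
  return fonts;
}

// Return the name of the first entry that supports `ch`, or "" if none does.
inline std::string FindFont(const std::vector<FontEntry>& fonts,
                            std::uint32_t ch) {
  for (const FontEntry& font : fonts) {
    assert(font.supports != nullptr);
    if (font.supports(ch)) return font.name;
  }
  return "";
}
```

In this model a smart quote in a single byte document stops at the transliterator, the same quote in a double byte document reaches the JIS font, and the euro is still found in the single byte font because the transliterator only claims the special chars.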
Whiteboard: [PDT+] → [PDT+] have patch, need r= sr= a=
Whiteboard: [PDT+] have patch, need r= sr= a= → [PDT+] have patch r=ftang, need sr= a=
change status to "[PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a="
Whiteboard: [PDT+] have patch r=ftang, need sr= a= → [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a=
sr=blizzard
Whiteboard: [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a= → [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a=
a= asa@mozilla.org for checkin to the trunk.
(on behalf of drivers)
Blocks: 83989
Whiteboard: [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a= → [PDT+]wait for tree open to check in
checked into trunk
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
I checked that all JIS symbol characters were displayed correctly.
Thanks.
It looks fine to me on the 0.9.2 (06-25) build; marking it as verified.
Status: RESOLVED → VERIFIED
      if (western_font) {
        NS_ASSERTION(western_font->SupportsChar(aChar), "font supposed to support this char");
        return font;
      }

This should return western_font not font.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
from email:

> I just checked 33162, and it was not reopened yet. Is it a bugzilla problem? 
> Anyway, you can put r=shanjian there. 
> 
> thanks 
> 
> shanjian 
Status: REOPENED → ASSIGNED
Target Milestone: mozilla0.9.2 → ---
fixed in bug 86368
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Marking it as verified. Please reopen if there is still a problem.
Status: RESOLVED → VERIFIED