missing Japanese characters on Linux

VERIFIED FIXED

Status


Priority: P2
Severity: normal
Status: VERIFIED FIXED
Reported: 19 years ago
Modified: 18 years ago

People

(Reporter: masaki.katakai, Assigned: bstell)

Tracking

({intl})

Trunk
All
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [PDT+]wait for tree open to check in)

Attachments

(8 attachments)

(Reporter)

Description

19 years ago
I'll create an attachment that shows a snapshot of the screen.

I checked how Mozilla and Communicator 4.x display
Japanese characters.

I found that many characters are displayed as `?'. For example,
EUC 0xA1E2 is missing and is drawn as `?'.

On Windows, it looks fine.

Comment 2

19 years ago
It seems we have two problems here.

1. Some characters are rendered by a font that we believe has the
character glyph, but it does not. That is the cause of the rendering of t...

2. We have bugs in the Unicode to JIS X 0208 conversion for the following characters:

U+2015, U+2010, U+2225, U+2260, etc.

Comment 3

19 years ago
Marking this as assigned. cata, can you help look at the Unicode to JIS X 0208
converter table? I think we have a bug there.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Comment 4

19 years ago
move to M16
Target Milestone: --- → M16

Updated

19 years ago
Keywords: beta2

Comment 5

19 years ago
reassign to bobj
Assignee: ftang → bobj
Status: ASSIGNED → NEW

Comment 6

19 years ago
Reassigned to nhotta. Tentatively set TM to M17.
Assignee: bobj → nhotta
Target Milestone: M16 → M17

Comment 7

19 years ago
Accepting, if this is just a change to the conversion table, but no more unix 
bugs to me, please (>_<).
Status: NEW → ASSIGNED

Updated

19 years ago
Keywords: nsbeta2

Comment 8

19 years ago
Putting on [nsbeta2+] radar for beta2 fix.
Whiteboard: [nsbeta2+]

Comment 9

19 years ago
Reassign to ftang.
Assignee: nhotta → ftang
Status: ASSIGNED → NEW

Comment 10

19 years ago
Take it back from nhotta.
Status: NEW → ASSIGNED

Comment 11

19 years ago
cata, can you help Erik take a look at this?
Assignee: ftang → cata
Status: ASSIGNED → NEW

Updated

19 years ago
Status: NEW → ASSIGNED

Comment 12

19 years ago
reassign to ftang
Assignee: cata → ftang
Status: ASSIGNED → NEW

Comment 14

19 years ago
Please help. There are two kinds of problems: some characters are displayed as blank
and some are displayed as `?'.
My feeling is that it is probably related to the nsIAtom issue, but I am not sure.
Assignee: ftang → erik
Status: ASSIGNED → NEW

Comment 15

19 years ago
The Unix version of Mozilla has improved since this bug report was filed, but we
still have some question marks. For example:

  EUC 0xA2A8 -> U+203B (REFERENCE MARK)

This character is not being displayed properly on Unix because we have a hack
that zeroes out all Unicodes from U+0000 to U+2200 in the font. We do this
because JIS X 0208 fonts contain some of the CP1252 characters such as smart
quotes, and since JIS fonts are much larger than 8859-1 fonts on Unix, CP1252
documents look strange (e.g. very large smart quote inside small Western text).

This one might be a bit complicated to fix. We probably want to use the JIS
glyphs when the surrounding text is (nearly as) large as the JIS font. When the
surrounding text is much smaller than the JIS font, we may want to use our
transliteration routine instead (e.g. a smart quote becomes a regular ASCII quote).

Even better might be a solution where we try to stick to the same font as much
as possible, without switching to a JIS font that may be too tall, too wide, or
look strange. However, this would violate the CSS2 spec, which says that we must
go down the font-family list *for each character* in the element. MSIE doesn't
follow CSS2 in this regard, and maybe that is justifiable (and CSS2 is "wrong").

Accepting bug for now.
Status: NEW → ASSIGNED
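To make the per-character rule concrete, here is a minimal C++ sketch of CSS2-style fallback as described above: every character walks the font list independently and takes the first font that has a glyph. The Font struct and the PickFontForChar/PickFontsForRun names are hypothetical illustrations, not Mozilla's nsFontMetrics code; a real implementation would also deal with font loading, sizes and transliteration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical font model: a name plus the set of code points it has glyphs for.
struct Font {
    std::string name;
    std::set<char32_t> coverage;
    bool SupportsChar(char32_t c) const { return coverage.count(c) != 0; }
};

// CSS2-style matching: walk the font list in order and take the first font
// that has a glyph for this one character. Returns the font's index, or -1
// if no font covers it (a caller would then transliterate or draw '?').
int PickFontForChar(const std::vector<Font>& fonts, char32_t c) {
    for (std::size_t i = 0; i < fonts.size(); ++i) {
        if (fonts[i].SupportsChar(c)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Resolves each character of a text run independently, so neighbouring
// characters can end up in different fonts.
std::vector<int> PickFontsForRun(const std::vector<Font>& fonts,
                                 const std::u32string& text) {
    std::vector<int> picks;
    picks.reserve(text.size());
    for (char32_t c : text) {
        picks.push_back(PickFontForChar(fonts, c));
    }
    return picks;
}
```

With an iso8859-1 font listed before a jisx0208 font, the smart quotes around "Hi" resolve to the JIS font (index 1) while the letters stay in the Latin font (index 0), which is the mixed-size rendering described above.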

Comment 16

19 years ago
As I mentioned, I see two kinds of problems: some characters display as blank
and some display as a question mark.
The following characters display as `?':
EUC-JP[0xA1BD] == U+2015 # HORIZONTAL BAR
EUC-JP[0xA1BE] == U+2010 # HYPHEN
EUC-JP[0xA2A8] == U+203B # REFERENCE MARK
EUC-JP[0xA2F2] == U+212B # ANGSTROM SIGN
The following characters display as blank:
EUC-JP[0xA2C0] == U+222A # UNION
EUC-JP[0xA2C1] == U+2229 # INTERSECTION
EUC-JP[0xA2CC] == U+FFE2 # NOT SIGN
EUC-JP[0xA2DC] == U+2220 # ANGLE
EUC-JP[0xA2DD] == U+22A5 # UP TACK
EUC-JP[0xA2E1] == U+2261 # IDENTICAL TO
EUC-JP[0xA2E2] == U+2252 # APPROXIMATELY EQUAL TO OR THE IMAGE OF
EUC-JP[0xA2E5] == U+221A # SQUARE ROOT
EUC-JP[0xA2E8] == U+2235 # BECAUSE
EUC-JP[0xA2E9] == U+222B # INTEGRAL
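The list above uses EUC-JP codes, while later comments use JIS codes. For JIS X 0208 characters the two differ only in the high bit of each byte: EUC-JP sets bit 7 on both JIS bytes. A small C++ sketch of the conversion (the function names are illustrative, not taken from the Mozilla converters):

```cpp
#include <cassert>
#include <cstdint>

// A JIS X 0208 code is two bytes (row, cell), each in 0x21..0x7E.
bool IsValidJis0208(std::uint16_t jis) {
    const auto hi = static_cast<std::uint8_t>(jis >> 8);
    const auto lo = static_cast<std::uint8_t>(jis & 0xFF);
    return hi >= 0x21 && hi <= 0x7E && lo >= 0x21 && lo <= 0x7E;
}

// EUC-JP encodes the same two bytes with the high bit set on each.
std::uint16_t JisToEuc(std::uint16_t jis) {
    assert(IsValidJis0208(jis));
    return static_cast<std::uint16_t>(jis | 0x8080);
}

// Clearing the high bit of both bytes recovers the JIS code.
std::uint16_t EucToJis(std::uint16_t euc) {
    const auto jis = static_cast<std::uint16_t>(euc & 0x7F7F);
    assert(IsValidJis0208(jis));
    return jis;
}
```

For example, EucToJis(0xA2C0) gives 0x2240 and EucToJis(0xA2E9) gives 0x2269, so the blank characters listed above fall in the JIS 0x2240-0x2269 range reported later, and JIS 0x224C corresponds to EUC-JP 0xA2CC (U+FFE2 NOT SIGN).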

Comment 17

19 years ago
The first kind of problem mentioned by Frank is caused by the
hack Erik described in his email. The second problem does not
appear, at least on the Exceed X server and on HP. I believe it
is probably a font problem and not our concern.

If following CSS2 cannot produce consistent-looking text, that is a problem.
I believe the problem mostly exists between single-byte charset
fonts and double-byte charset fonts. If a string is in a single-byte
charset, we should let the single-byte charset font take precedence,
and the same should happen for double-byte. Will that solve the
problem?

Comment 19

19 years ago
Shanjian, in this case we cannot give precedence to a single-byte font because
there is no single-byte font with those characters (e.g. smart quotes). I
suppose we could ignore the double-byte font in that case, and then fall back to
the transliteration using an ASCII font. But that means that we need to look at
neighboring characters, and that is quite a big change that I haven't tried or
even given much thought to yet.

Comment 20

19 years ago
After discussing this with Erik, we decided to nsbeta2- this bug, for the
following reasons:
1. We cannot see the "shown as blank" problem on Erik's machine; either Erik's
other fix fixed it or there is some strange issue with my fonts. In any case,
the missing characters are not frequently used, so we can hold this until
after nsbeta2. The only character in this category with higher usage is
EUC-JP[0xA2E8] == U+2235 # BECAUSE
but it is still rarely used.

2. Of the characters shown as '?',
EUC-JP[0xA2A8] == U+203B # REFERENCE MARK
has higher usage, but I don't think we really need it for nsbeta2.

Comment 21

19 years ago
mark it nsbeta2-
Whiteboard: [nsbeta2+] → [nsbeta2-]

Updated

19 years ago
Keywords: nsbeta2 → nsbeta3
Adding nsbeta2 keyword to bugs with nsbeta2 triage value in status field so the 
queries don't get screwed up
Keywords: nsbeta2

Comment 23

19 years ago
nsbeta3+ P1 per bug meeting
Priority: P3 → P1
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]

Comment 24

19 years ago
This still happens in 2000-08-14-12 Linux build.

Updated

19 years ago
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+]possible patch in hand

Comment 25

19 years ago
mark it as P2
Priority: P1 → P2

Comment 26

19 years ago
A Japanese user made a good testcase.

http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html

His platform is Sparc, Solaris 2.6.


Comment 27

19 years ago
erik, should we check in your patch now? Sooner is better for this particular
issue, right?

Comment 28

19 years ago
The patch has some serious problems, so we can't use it. However, I have come
up with a different change that works quite well. It zeroes out only the
Unicodes corresponding to CP1252's 0x80-0x9F range (instead of all Unicodes less
than 0x2200).
Whiteboard: [nsbeta2-][nsbeta3+]possible patch in hand → [nsbeta2-][nsbeta3+] fix in hand
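The difference in scope between the old hack and the revised one can be sketched in C++ as below. The CharMap class is a hypothetical one-bit-per-code-point coverage map in the spirit of the per-font PRUint32 map and REMOVE_CHAR macro quoted later in this bug; its layout and the function names are assumptions. The table lists the Unicode targets of the 27 defined Windows-1252 bytes in 0x80-0x9F; the check-in message describes zeroing out only "the common ones", so the actual patch may have removed a subset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-font coverage map: one bit per BMP code point.
class CharMap {
public:
    CharMap() : bits_(0x10000 / 32, 0) {}
    void Add(char16_t c) { bits_[c >> 5] |= (1u << (c & 31)); }
    void Remove(char16_t c) { bits_[c >> 5] &= ~(1u << (c & 31)); }
    bool Has(char16_t c) const { return ((bits_[c >> 5] >> (c & 31)) & 1u) != 0; }

private:
    std::vector<std::uint32_t> bits_;
};

// Unicode targets of the 27 defined Windows-1252 bytes 0x80-0x9F
// (0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined).
const char16_t kCp1252Extras[] = {
    0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030,
    0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022,
    0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x017E, 0x0178};

// Old hack: hide every code point below U+2200 in a double-byte font.
void RemoveBelow2200(CharMap& map) {
    for (unsigned c = 0; c < 0x2200; ++c) {
        map.Remove(static_cast<char16_t>(c));
    }
}

// Revised approach: hide only the characters CP1252 adds in 0x80-0x9F.
void RemoveCp1252Extras(CharMap& map) {
    for (char16_t c : kCp1252Extras) {
        map.Remove(c);
    }
}
```

Under the old hack, U+203B REFERENCE MARK and U+2015 are hidden along with the smart quotes; under the revised one only CP1252 extensions such as U+201C are hidden, so those JIS glyphs become visible again.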

Comment 29

19 years ago
Could you put the new patch here? Let's review and check it in.

Comment 31

19 years ago
New patch checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 19 years ago
Resolution: --- → FIXED
Target Milestone: M17 → M18

Comment 32

19 years ago
I verified this in 2000-09-13-12 Linux build.
Status: RESOLVED → VERIFIED

Comment 33

19 years ago
It seems JIS 2240-2269 weren't displayed.

http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt

(Reporter)

Comment 34

19 years ago
Reopened.

As Koike-san mentioned, there are still some problems:

 - JIS 0x2240-0x2269 characters cannot be displayed; they are just blank
 - JIS 0x2273, 0x2277, 0x2278 are not displayed properly
 - JIS 0x2146-0x2149 are displayed as HALFWIDTH

I'll attach a snapshot.

For the 0x2240-0x2269 problem, I think the mapping table for
JIS X 0208 is not correct.
For example, JIS 0x2240 should be displayed using the glyph at
code point 0x2240 of the jis-fixed Japanese font; however, it
seems that the character 0x2240 is mapped to JIS 0x2D7D.
JIS 0x2D?? is the IBM-NEC vendor-specific area, and the
jis-fixed Japanese font does not contain any characters
in this area.

The sun-gothic fonts bundled with Solaris provide those
characters, so when I switched to sun-gothic for JIS X 0208,
I could see the characters. However, we should not
use that area, since it depends on the font.
What do you think?

Status: VERIFIED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 35

19 years ago
Posted image snapshot

Comment 36

19 years ago
I don't think we can make everybody happy for Beta 3 or even RTM. Since we are
using Unicode internally, and since most Unix systems are missing the CP1252
fonts (they only have iso8859-1), and since JIS X 0208 happens to have some of
the CP1252 characters, Mozilla picks up the huge Japanese fonts even when the
surrounding text is small Western text (e.g. for quotation marks aka smart
quotes). There are a lot of CP1252 documents out there because of the dominance
of Windows, so we need to pay attention to those characters.

Unfortunately, the hack that we came up with has some nasty effects on Japanese
documents, so we are forced to choose the lesser of two evils.

Much of this can be blamed on the poor state of fonts on X, the dominance of
Windows, Mozilla's decision to implement CSS2 to the letter, and the imminence
of RTM (it's too late to make big changes).

However, I'm reassigning this bug to Frank, to have somebody look at the JIS <->
Unicode conversion tables, especially JIS 2240 - 2269.
Assignee: erik → ftang
Status: REOPENED → NEW

Comment 37

19 years ago
I think the proper action for now is:
1. Ignore the problem of the characters that overlap with windows-1252.
2. Fix the conversion table entries that map to the NEC 0x2Dxx range.
I am working on 2 and should have an updated table very soon.

Comment 38

19 years ago
Fixed in my local tree. The change is in the file mozilla/intl/uconv/ucvja/jis0208.uf.
This file is compressed binary data, so I don't think anyone can review the
changes. I removed the NEC-specific mappings from it.
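Since jis0208.uf is compressed binary data, the change itself cannot be shown here. The C++ sketch below only illustrates the failure mode: if the Unicode-to-JIS direction is built from a table that lists both a standard JIS code and an NEC row-13 (0x2Dxx) code for the same Unicode character, the vendor code can win, and a jis-fixed font has no glyph there. The table entries are hypothetical; the 0x2240/0x2D7D pair for U+222A UNION mirrors the report in comment 34, and "later entries overwrite earlier ones" is an assumption about how such a table might be built.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One entry of a hypothetical JIS X 0208 <-> Unicode source table.
struct JisEntry {
    std::uint16_t jis;
    char16_t unicode;
};

// NEC's vendor-specific characters live in row 13, i.e. lead byte 0x2D.
bool IsNecRow13(std::uint16_t jis) { return (jis >> 8) == 0x2D; }

// Builds the Unicode -> JIS direction used to find a glyph in a jisx0208
// font. Later entries overwrite earlier ones, so a vendor duplicate listed
// after the standard code wins unless row 13 is dropped.
std::map<char16_t, std::uint16_t> BuildUnicodeToJis(
    const std::vector<JisEntry>& table, bool dropNecRow13) {
    std::map<char16_t, std::uint16_t> out;
    for (const JisEntry& e : table) {
        if (dropNecRow13 && IsNecRow13(e.jis)) {
            continue;
        }
        out[e.unicode] = e.jis;
    }
    return out;
}
```

Dropping the row-13 entries, which comment 38 did by editing the table itself rather than filtering at build time, leaves U+222A mapped back to 0x2240.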

Comment 39

19 years ago
PDT agrees P2 for Frank's part (2)
Whiteboard: [nsbeta2-][nsbeta3+] fix in hand → [nsbeta2-][nsbeta3+][PDTP2] fix in hand

Comment 40

19 years ago
mark it fixed.
Status: NEW → RESOLVED
Last Resolved: 19 years ago → 19 years ago
Resolution: --- → FIXED

Comment 41

19 years ago
Please see:
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html

I found the following problems:
* JIS code 224C is drawn as blank.
* JIS codes 2146, 2147, 2148 and 2149 are not drawn as FULLWIDTH characters.
* JIS codes 2273, 2277 and 2278 are not drawn in the corresponding fonts.
  Please compare
  http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt
  on Mozilla with
  http://www.netlaputa.ne.jp/~vmi/software/mozilla/term.gif

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 42

19 years ago
Frank, please take a look at JIS 224C. Maybe we have a problem in the converter?

JIS 2146-9 are drawn half-width due to the CP1252 problem. I don't think we will
fix this for Netscape 6. Sorry.

We may want to consider drawing JIS 2273, 2277 and 2278 in the JIS font, since
they are probably rare in CP1252 documents, and it probably doesn't matter if
we draw them too large in CP1252 docs anyway.

Comment 43

19 years ago
Somehow U+FFE2 also maps to 0x7C7B in the CP932 table,
which causes this problem.
The half-width issue won't be solved. JIS X 0208 only specifies which characters it
encodes; it does not specify the width of the glyphs.
Moving this bug to Future.
Status: REOPENED → ASSIGNED
Whiteboard: [nsbeta2-][nsbeta3+][PDTP2] fix in hand → [nsbeta2-][nsbeta3-][PDTP2] fix in hand
Target Milestone: M18 → Future

Updated

19 years ago
Keywords: intl, nsbeta1

Comment 44

18 years ago
Clearing out the status whiteboard, Target Milestone and keyword fields.
Reassigning this to bstell to look at again after we ship 6.0 RTM.
Assignee: ftang → bstell
Status: ASSIGNED → NEW
Keywords: nsbeta2, nsbeta3
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2] fix in hand
Target Milestone: Future → ---

Comment 45

18 years ago
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
(Assignee)

Updated

18 years ago
Target Milestone: --- → mozilla0.9.1
(Assignee)

Updated

18 years ago
Status: NEW → ASSIGNED
brian - what's the status on this one?
(Assignee)

Updated

18 years ago
Target Milestone: mozilla0.9.1 → mozilla0.9.2
(Assignee)

Updated

18 years ago
Target Milestone: mozilla0.9.2 → mozilla0.9.1
(Assignee)

Comment 47

18 years ago
224C seems okay.
2146, 2147, 2148, 2149 are half width.
2273, 2277, 2278 appear to be transliterated, which gives a poor result.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Changing to nsbeta1-. This one does not meet the beta stopper guidelines.
Keywords: nsbeta1 → nsbeta1-

Updated

18 years ago
Keywords: nsCatFood, rtm
Adding nsCatFood and RTM keywords. This looks pretty bad.

Comment 51

18 years ago
This is a difficult one. 
The current behavior is caused by a hack Erik put in to prevent the JIS font from
being used to display smart quotes in windows-1252 documents.
I wonder whether we still need it now that bstell has added per-language-group font
fallback. (Not sure; we may still need it, because we will probably still hit the JIS
font before we hit the transliteration fallback.)

Comment 52

18 years ago
Is there a way to decide whether to zero out or not based on the language group?

Comment 53

18 years ago
can someone create a new screen shot?

Comment 54

18 years ago
PDT+ based on the 6/11 PDT meeting.

Updated

18 years ago
Whiteboard: [PDT+]

Comment 55

18 years ago
> can someone create a new screen shot?

Only JIS symbol characters, as follows:
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html

This image is jis.txt displayed by kterm & Mozilla (checkout from CVS at
2001-06-12 JST).
The Japanese monospace font is configured as misc-fixed-jisx0208.1983-0 (14 dot).

JIS[0x2144] is not monospace font.
JIS[0x2146-0x2149] is not displayed.
JIS[0x2273,0x2277,0x2278] is not valid font.
(Assignee)

Comment 56

18 years ago
IWAMURO,

Would it be possible to get a file like jis.txt but with just the
characters of interest?

(Assignee)

Comment 57

18 years ago
these characters:

  JIS[0x2144] is not monospace font.
  JIS[0x2146-0x2149] is not displayed.
  JIS[0x2273,0x2277,0x2278] is not valid font.

are disabled by this code:

     * XXX This is a bit of a hack. Documents containing the CP1252
     * extensions of Latin-1 (e.g. smart quotes) will display with those
     * special characters way too large. This is because they happen to
     * be in these large double byte fonts. So, we disable those
     * characters here. Revisit this decision later.
     */
    if (aSelf->Convert == DoubleByteConvert) {
      PRUint32* map = aSelf->mMap;
      ...
      REMOVE_CHAR(map, 0x20AC);
...

The goal of this code was to use transliteration in western documents instead
of the large glyphs from double byte fonts.

  1.117 <erik@netscape.com> 11 Sep 2000 14:03
  bug 33162; instead of zeroing out all Unicodes less than 0x2200, we just
  zero out the common ones that correspond to CP1252 (for things like smart
  quotes), so that we can still see most of the JIS X 0208 characters;
  
Perhaps, instead of disabling these glyphs in the double-byte fonts so that the
glyph lookup runs all the way to the transliterator, we should add an early
transliterator to the loaded fonts list for non-double-byte documents.

This way non-double byte documents will transliterate instead of using these
double byte glyphs and double byte documents will use these glyphs.
(Assignee)

Comment 62

18 years ago
> Perhaps instead of disabling these glyphs from the double byte fonts so the
> glyph lookup runs all the way to the tranliterator we should add an early
> transliterator to the loaded fonts list for non-double byte documents.

This made the Asian fonts work, but broke the euro for language groups that are
Western-ish but not "x-western", such as Baltic, which is "x-baltic".
(Assignee)

Comment 63

18 years ago
The reason the previous idea broke the euro is that the current code, by
disabling the Asian glyphs, allows the glyph search code to find the euro
in one of the other single-byte fonts (iso-8859-15) before hitting the
transliterator. When I added the early transliterator, it stopped the font
search from finding the Asian glyphs, but it also stopped the font search from
finding the other single-byte glyphs.

The correct fix would be to have 2 copies of all the Asian font maps: one
for single byte documents and one for double byte documents. This however 
would be a very big change and seems highly unlikely to be approved any
time before the next release.

Perhaps we should look at the user's locale and if it is an Asian locale
not disable the glyphs.

That way Asian users will see the Asian (bigger) glyphs and non-Asian 
users will see the smaller (non-Asian) glyphs.
(Assignee)

Comment 65

18 years ago
Attachment 38989 re-enables double-byte special chars.

For single-byte documents it adds a special-char transliterator before the double-byte
fonts are checked, so that the oversized glyphs in double-byte fonts will
not be used in single-byte documents.
Whiteboard: [PDT+] → [PDT+] have patch, need r= sr= a=
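The lookup order described in comments 57-65 can be sketched as a flat search list. This C++ sketch assumes, for simplicity, that all fonts are already loaded and searched in a fixed order (Mozilla actually loads fonts on demand); the Source struct, the IsSpecialChar set and the function names are hypothetical, and attachment 38989 itself is not reproduced here. The point it shows is the position of the early transliterator: after the single-byte fonts and before the double-byte fonts, and only for single-byte documents.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical entries in a glyph search list.
enum class Kind { SingleByte, DoubleByte, Transliterator };

struct Source {
    std::string name;
    Kind kind;
    std::set<char32_t> coverage;  // unused for transliterators
};

// Hypothetical "special" characters the early transliterator intercepts
// (smart quotes and the euro sign, for illustration).
bool IsSpecialChar(char32_t c) {
    return c == 0x2018 || c == 0x2019 || c == 0x201C || c == 0x201D ||
           c == 0x20AC;
}

// Single-byte fonts first; for single-byte documents an early transliterator
// goes next, ahead of the double-byte fonts.
std::vector<Source> BuildSearchList(const std::vector<Source>& singleByte,
                                    const std::vector<Source>& doubleByte,
                                    bool doubleByteDocument) {
    std::vector<Source> list(singleByte);
    if (!doubleByteDocument) {
        Source early{"early-transliterator", Kind::Transliterator,
                     std::set<char32_t>{}};
        list.push_back(early);
    }
    list.insert(list.end(), doubleByte.begin(), doubleByte.end());
    return list;
}

// Returns the name of the first source able to render c, or "" if none.
std::string FindSource(const std::vector<Source>& list, char32_t c) {
    for (const Source& s : list) {
        const bool ok = (s.kind == Kind::Transliterator)
                            ? IsSpecialChar(c)
                            : s.coverage.count(c) != 0;
        if (ok) {
            return s.name;
        }
    }
    return "";
}
```

In this model a Western document still gets the euro from the iso8859-15 font (so the regression described in comment 63 does not occur), smart quotes in a Western document are transliterated instead of being drawn with oversized JIS glyphs, and a Japanese document still uses the JIS glyphs. Characters outside the special set, such as U+203B, still fall through to the JIS font in either case.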
(Assignee)

Updated

18 years ago
Whiteboard: [PDT+] have patch, need r= sr= a= → [PDT+] have patch r=ftang, need sr= a=

Comment 67

18 years ago
change status to "[PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a="
Whiteboard: [PDT+] have patch r=ftang, need sr= a= → [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a=
sr=blizzard
(Assignee)

Updated

18 years ago
Whiteboard: [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a= → [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a=
a= asa@mozilla.org for checkin to the trunk.
(on behalf of drivers)
Blocks: 83989

Updated

18 years ago
Whiteboard: [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a= → [PDT+]wait for tree open to check in
(Assignee)

Comment 70

18 years ago
checked into trunk
Status: ASSIGNED → RESOLVED
Last Resolved: 19 years ago → 18 years ago
Resolution: --- → FIXED

Comment 71

18 years ago
I checked that all JIS symbol characters were displayed correctly.
Thanks.

Comment 72

18 years ago
It looks fine to me on the 0.9.2 (06-25) build; marking it as verified.
Status: RESOLVED → VERIFIED
(Assignee)

Comment 73

18 years ago
  if (western_font) {
    NS_ASSERTION(western_font->SupportsChar(aChar), "font supposed to support this char");
    return font;
  }

This should return western_font not font.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 74

18 years ago
from email:

> I just checked 33162, and it was not reopened yet. Is it a bugzilla problem? 
> Anyway, you can put r=shanjian there. 
> 
> thanks 
> 
> shanjian 
Status: REOPENED → ASSIGNED
(Assignee)

Updated

18 years ago
Target Milestone: mozilla0.9.2 → ---
(Assignee)

Comment 75

18 years ago
fixed in bug 86368
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago → 18 years ago
Resolution: --- → FIXED

Comment 76

18 years ago
Marking it as verified. Please reopen if it still has problems.
Status: RESOLVED → VERIFIED