Closed Bug 33162 Opened 25 years ago Closed 23 years ago

missing Japanese characters on Linux

Status:

VERIFIED FIXED

People

(Reporter: masaki.katakai, Assigned: bstell)

Details

(Keywords: intl, Whiteboard: [PDT+]wait for tree open to check in)

Attachments

(8 files)

differences between Mozilla and Communicator4.x 25 years ago Masaki Katakai 96.50 KB, image/jpeg
This is one possible solution, that hasn't been tested enough yet. 24 years ago Erik van der Poel 2.09 KB, patch
new patch, zeroing out all the CP1252 0x80-9F chars 24 years ago Erik van der Poel 1.61 KB, patch
snapshot 24 years ago Masaki Katakai 79.36 KB, image/jpeg
smart_quotes.tar.gz - html pages with a smart quote in ncr, named-entity, etc. 23 years ago kill this account 567 bytes, application/octet-stream
page listing special chars - http://home.earthlink.net/~bobbau/platforms/specialchars/#windows 23 years ago kill this account 20.15 KB, text/html
html pages with all the disabled double byte special chars; tar/gz 23 years ago kill this account 1.07 KB, application/octet-stream
patch; re-enable double byte special chars; add transliterator for single byte docs 23 years ago kill this account 11.35 KB, patch

Masaki Katakai

Reporter

Description

•

25 years ago

I'll create an attachment that shows a snapshot of the screen. I checked how Mozilla and Communicator 4.x display Japanese characters and found that many characters are displayed as `?'. For example, EUC 0xa1e2 is missing and is drawn as `?'. On Windows, it looks fine.

Masaki Katakai

Reporter

Comment 1

•

25 years ago

Attached image: differences between Mozilla and Communicator4.x

Frank Tang

Comment 2

•

25 years ago

It seems we have two problems here.
1. Some characters are rendered with a font that we believe has the character glyph, but it doesn't. That is the cause of the rendering of t
2. We have bugs in the Unicode to JIS X 0208 conversion for the following characters: U+2015, U+2010, U+2225, U+2260, etc.

Frank Tang

Comment 3

•

25 years ago

Marking this as assigned. cata, can you help look at the Unicode to JIS X 0208 converter table? I think we have a bug there.

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Frank Tang

Comment 4

•

25 years ago

move to M16

Target Milestone: --- → M16

Frank Tang

Updated

•

25 years ago

Keywords: beta2

Frank Tang

Comment 5

•

25 years ago

reassign to bobj

Assignee: ftang → bobj

Status: ASSIGNED → NEW

bobj

Comment 6

•

25 years ago

Reassigned to nhotta. Tentatively set TM to M17.

Assignee: bobj → nhotta

Target Milestone: M16 → M17

nhottanscp

Comment 7

•

25 years ago

Accepting, if this is just a change to the conversion table, but no more unix bugs to me, please (>_<).

Status: NEW → ASSIGNED

leger

Updated

•

25 years ago

Keywords: nsbeta2

leger

Comment 8

•

25 years ago

Putting on [nsbeta2+] radar for beta2 fix.

Whiteboard: [nsbeta2+]

nhottanscp

Comment 9

•

25 years ago

Reassign to ftang.

Assignee: nhotta → ftang

Status: ASSIGNED → NEW

Frank Tang

Comment 10

•

25 years ago

Take it back from nhotta.

Status: NEW → ASSIGNED

Frank Tang

Comment 11

•

24 years ago

cata, can you help erik take a look at this?

Assignee: ftang → cata

Status: ASSIGNED → NEW

cata

Updated

•

24 years ago

Status: NEW → ASSIGNED

Frank Tang

Comment 12

•

24 years ago

reassign to ftang

Assignee: cata → ftang

Status: ASSIGNED → NEW

Frank Tang

Comment 13

•

24 years ago

Internal test cases: http://babel/Intl_Client/browser/fonts/multibyte_tests/euc-jp/allchars_euc.html

Status: NEW → ASSIGNED

Frank Tang

Comment 14

•

24 years ago

Please help. There are two kinds of problem. Some characters are displayed as blank and some characters are displayed as ?. My feeling is that it is probably related to the nsIAtom issue, but I am not sure.

Assignee: ftang → erik

Status: ASSIGNED → NEW

Erik van der Poel

Comment 15

•

24 years ago

The Unix version of Mozilla has improved since this bug report was filed, but we still have some question marks. For example: EUC 0xA2A8 -> U+203B (REFERENCE MARK)

This character is not being displayed properly on Unix because we have a hack that zeroes out all Unicodes from U+0000 to U+2200 in the font. We do this because JIS X 0208 fonts contain some of the CP1252 characters such as smart quotes, and since JIS fonts are much larger than 8859-1 fonts on Unix, CP1252 documents look strange (e.g. very large smart quote inside small Western text).

This one might be a bit complicated to fix. We probably want to use the JIS glyphs when the surrounding text is (nearly as) large as the JIS font. When the surrounding text is much smaller than the JIS font, then we may want to use our transliteration routine instead (e.g. smart quote becomes regular ASCII quote).

Even better might be a solution where we try to stick to the same font as much as possible, without switching to a JIS font that may be too tall, too wide, or look strange. However, this would violate the CSS2 spec, which says that we must go down the font-family list *for each character* in the element. MSIE doesn't follow CSS2 in this regard, and maybe that is justifiable (and CSS2 is "wrong").

Accepting bug for now.
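To make the side effect of the hack concrete, here is a minimal sketch in Python. It is not Mozilla code: the coverage map is modeled as a plain set of code points, and the function name and the handling of the exact boundary at 0x2200 are assumptions made for illustration.

```python
# Hypothetical model: a font's coverage map is the set of Unicode code points
# it claims to have glyphs for. The real data structure is not shown in this
# bug; a Python set stands in for it.
CUTOFF = 0x2200  # bound of the zeroed range described in the comment above


def zero_out_below_cutoff(coverage, cutoff=CUTOFF):
    """Return a copy of `coverage` without any code point below `cutoff`.

    Whether U+2200 itself was zeroed is not clear from the bug; this sketch
    keeps it.
    """
    return {cp for cp in coverage if cp >= cutoff}


# A few code points that a JIS X 0208 font covers (all mentioned in this bug,
# except the hiragana letter added as an ordinary Japanese character).
jis_font_coverage = {
    0x2018,  # LEFT SINGLE QUOTATION MARK (a CP1252 "smart quote")
    0x203B,  # REFERENCE MARK
    0x222A,  # UNION
    0x3042,  # HIRAGANA LETTER A
}

hacked = zero_out_below_cutoff(jis_font_coverage)

for cp in sorted(jis_font_coverage):
    state = "kept" if cp in hacked else "zeroed out"
    print(f"U+{cp:04X}: {state}")
```

The smart quote is removed as intended, but U+203B is removed with it, so the JIS font is never chosen for the reference mark.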

Status: NEW → ASSIGNED

Frank Tang

Comment 16

•

24 years ago

As I mentioned, I see two kinds of problem. Some characters display as blank and some display as a question mark.

The following characters display as ?
EUC-JP[0xA1BD] == U+2015
EUC-JP[0xA1BE] == U+2010 # HYPHEN
EUC-JP[0xA2A8] == U+203b # REFERENCE MARK
EUC-JP[0xA2F2] == U+212b # ANGSTROM SIGN

The following characters display as blank
EUC-JP[0xA2C0] == U+222A # UNION
EUC-JP[0xA2C1] == U+2229 # INTERSECTION
EUC-JP[0xA2CC] == U+FFE2 # NOT SIGN
EUC-JP[0xA2DC] == U+2220 # ANGLE
EUC-JP[0xA2DD] == U+22A5 # UP TACK
EUC-JP[0xA2E1] == U+2261 # IDENTICAL TO
EUC-JP[0xA2E2] == U+2252 # APPROXIMATELY EQUAL TO OR THE IMAGE OF
EUC-JP[0xA2E5] == U+221A # SQUARE ROOT
EUC-JP[0xA2E8] == U+2235 # BECAUSE
EUC-JP[0xA2E9] == U+222b # INTEGRAL
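Later comments in this bug quote JIS codes (e.g. 0x2240) rather than EUC-JP codes (e.g. 0xA2C0). The two differ only by 0x80 in each byte, which the following Python sketch shows. The function names are made up for this example and are not part of any Mozilla API.

```python
def euc_to_jis(euc):
    """Convert a two-byte EUC-JP code (e.g. 0xA2A8) to its JIS X 0208 code."""
    hi, lo = euc >> 8, euc & 0xFF
    if not (0xA1 <= hi <= 0xFE and 0xA1 <= lo <= 0xFE):
        raise ValueError(f"not a two-byte EUC-JP JIS X 0208 code: {euc:#06x}")
    return ((hi - 0x80) << 8) | (lo - 0x80)


def jis_to_euc(jis):
    """Convert a JIS X 0208 code (e.g. 0x2240) to its EUC-JP code."""
    hi, lo = jis >> 8, jis & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a JIS X 0208 code: {jis:#06x}")
    return ((hi + 0x80) << 8) | (lo + 0x80)


def kuten(jis):
    """Return the (row, cell) pair of a JIS X 0208 code, both counted from 1."""
    return (jis >> 8) - 0x20, (jis & 0xFF) - 0x20


# The characters reported above as displaying '?'.
for euc in (0xA1BD, 0xA1BE, 0xA2A8, 0xA2F2):
    jis = euc_to_jis(euc)
    row, cell = kuten(jis)
    print(f"EUC-JP {euc:#06x} -> JIS {jis:#06x} (row {row}, cell {cell})")
```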

Shanjian Li

Comment 17

•

24 years ago

The first kind of problem mentioned by frank is caused by the hack Erik talked about in his email. The second problem is not seen, at least on the Exceed X server and on HP. I believe that is probably a font problem and is not our concern. If CSS2 cannot produce a consistent look, that is a problem. I believe the problem mostly exists between single-byte charset fonts and double-byte charset fonts. If a string is in a single-byte charset, we should let the single-byte charset font take precedence. The same should apply to double-byte. Will that solve the problem?

Erik van der Poel

Comment 18

•

24 years ago

Attached patch: This is one possible solution, that hasn't been tested enough yet.

Erik van der Poel

Comment 19

•

24 years ago

Shanjian, in this case we cannot give precedence to a single-byte font because there is no single-byte font with those characters (e.g. smart quotes). I suppose we could ignore the double-byte font in that case, and then fall back to the transliteration using an ASCII font. But that means that we need to look at neighboring characters, and that is quite a big change that I haven't tried or even given much thought to yet.

Frank Tang

Comment 20

•

24 years ago

After discussing with erik, we decided to nsbeta2- this bug. The reasoning is as follows. We cannot see the "shown as blank" problem on erik's machine; either erik's other fix fixed it or there is some strange issue with my fonts. In any case, the missing characters are not frequently used, so we can hold this until after nsbeta2. The only character in this category with higher usage is EUC-JP[0xA2E8] == U+2235 # BECAUSE, but it is still rarely used. Among the characters shown as '?', EUC-JP[0xA2A8] == U+203b # REFERENCE MARK has higher usage, but I don't think we really need it for nsbeta2.

Frank Tang

Comment 21

•

24 years ago

mark it nsbeta2-

Whiteboard: [nsbeta2+] → [nsbeta2-]

Teruko Kobayashi

Updated

•

24 years ago

Keywords: nsbeta2 → nsbeta3

Daniel Veditz [:dveditz]

Comment 22

•

24 years ago

Adding nsbeta2 keyword to bugs with nsbeta2 triage value in status field so the queries don't get screwed up

Keywords: nsbeta2

Frank Tang

Comment 23

•

24 years ago

nsbeta3+ P1 per bug meeting

Priority: P3 → P1

Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]

Teruko Kobayashi

Comment 24

•

24 years ago

This still happens in 2000-08-14-12 Linux build.

Updated

•

24 years ago

Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+]possible patch in hand

Frank Tang

Comment 25

•

24 years ago

mark it as P2

Priority: P1 → P2

Koike Kazuhiko

Comment 26

•

24 years ago

A Japanese user made a good testcase. http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html His platform is Sparc, Solaris 2.6.

Frank Tang

Comment 27

•

24 years ago

erik, should we check in your patch now? Sooner is better for this particular issue, right?

Erik van der Poel

Comment 28

•

24 years ago

The patch has some serious problems, so we can't use it. However, I have come up with a different change that works quite well. It zeroes out only the Unicodes corresponding to CP1252's 0x80-0x9F range (instead of all Unicodes less than 0x2200).
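For reference, the set of Unicodes behind CP1252's 0x80-0x9F range can be listed with a short Python sketch. It uses Python's built-in `cp1252` codec as a stand-in for whatever table the patch uses (an assumption), and the function name is made up for this example.

```python
def cp1252_high_control_unicodes():
    """Return the Unicode code points assigned to CP1252 bytes 0x80-0x9F."""
    result = set()
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # this byte has no assignment in the codec's table
        result.add(ord(ch))
    return result


narrow = cp1252_high_control_unicodes()

# Smart quotes are still zeroed out by the narrower rule...
print(0x2018 in narrow, 0x201D in narrow)
# ...but REFERENCE MARK is no longer affected.
print(0x203B in narrow)
# Every code point in the narrow set was also covered by the old cutoff.
print(all(cp < 0x2200 for cp in narrow))
```

The narrower set still contains the quotation marks, daggers and per mille sign, which is consistent with the JIS 0x2146-0x2149, 0x2273, 0x2277 and 0x2278 problems reported later in this bug.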

Whiteboard: [nsbeta2-][nsbeta3+]possible patch in hand → [nsbeta2-][nsbeta3+] fix in hand

Frank Tang

Comment 29

•

24 years ago

Could you put the new patch here? Let's review and check it in.

Erik van der Poel

Comment 30

•

24 years ago

Attached patch: new patch, zeroing out all the CP1252 0x80-9F chars

Erik van der Poel

Comment 31

•

24 years ago

New patch checked in.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Target Milestone: M17 → M18

Teruko Kobayashi

Comment 32

•

24 years ago

I verified this in 2000-09-13-12 Linux build.

Status: RESOLVED → VERIFIED

Koike Kazuhiko

Comment 33

•

24 years ago

It seems 2240-2269 weren't displayed. http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt

Masaki Katakai

Reporter

Comment 34

•

24 years ago

Reopened. As Koike-san mentioned, there are still some problems:
- JIS 0x2240-0x2269 characters cannot be displayed, just blank
- JIS 0x2273, 0x2277, 0x2278 are not displayed properly
- JIS 0x2146-0x2149 are displayed as HALFWIDTH

I'll attach a snapshot. For the 0x2240-0x2269 problem, I think the mapping table for JIS X 0208 is not correct. For example, JIS 0x2240 should display the exact 0x2240 code point of the jis-fixed Japanese font; however, it seems that the character 0x2240 is mapped to JIS 0x2d7d. JIS 0x2d?? is called the IBM-NEC vendor-specific area, and the jis-fixed Japanese font does not contain any characters in this area. The sun-gothic fonts bundled with Solaris provide those characters, so when I switched to sun-gothic for jisx208, I could see the characters. However, we should not use that area, since it depends on the fonts. What do you think?
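The idea of avoiding the vendor-specific area can be sketched in Python as below. This is only an illustration: the function names are hypothetical, the real converter table (jis0208.uf) is a binary file whose format is not shown in this bug, and the candidate pair 0x2D7D / 0x2240 is simply the one reported in the comment above.

```python
NEC_ROW = 13  # JIS 0x2Dxx is row 13, the vendor-specific area


def jis_row(jis):
    """Return the row number (counted from 1) of a two-byte JIS code."""
    return (jis >> 8) - 0x20


def is_nec_row13(jis):
    """True if the code lies in 0x2D21-0x2D7E, the area the comment warns about."""
    return jis_row(jis) == NEC_ROW and 0x21 <= (jis & 0xFF) <= 0x7E


def pick_standard_code(candidates):
    """Return the first candidate outside the vendor area, or None if all are inside."""
    for jis in candidates:
        if not is_nec_row13(jis):
            return jis
    return None


# A character with both a vendor-area code and a standard code: prefer the
# standard one, which plain JIS X 0208 fonts such as jis-fixed contain.
print(hex(pick_standard_code([0x2D7D, 0x2240])))
```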

Status: VERIFIED → REOPENED

Resolution: FIXED → ---

Masaki Katakai

Reporter

Comment 35

•

24 years ago

Attached image: snapshot

Erik van der Poel

Comment 36

•

24 years ago

I don't think we can make everybody happy for Beta 3 or even RTM. Since we are using Unicode internally, and since most Unix systems are missing the CP1252 fonts (they only have iso8859-1), and since JIS X 0208 happens to have some of the CP1252 characters, Mozilla picks up the huge Japanese fonts even when the surrounding text is small Western text (e.g. for quotation marks aka smart quotes). There are a lot of CP1252 documents out there because of the dominance of Windows, so we need to pay attention to those characters. Unfortunately, the hack that we came up with has some nasty effects on Japanese documents, so we are forced to choose the lesser of two evils. Much of this can be blamed on the poor state of fonts on X, the dominance of Windows, Mozilla's decision to implement CSS2 to the letter, and the imminence of RTM (it's too late to make big changes). However, I'm reassigning this bug to Frank, to have somebody look at the JIS <-> Unicode conversion tables, especially JIS 2240 - 2269.

Assignee: erik → ftang

Status: REOPENED → NEW

Frank Tang

Comment 37

•

24 years ago

I think the proper action for now is:
1. Ignore the problem of those characters which overlap with windows-1252.
2. Fix the conversion table that maps to the NEC 0x2Dxx range.

I am working on 2. and should have an updated table very soon.

Frank Tang

Comment 38

•

24 years ago

Fixed in my local tree. The change is in the mozilla/intl/uconv/ucvja/jis0208.uf file. This file is compressed binary data, so I don't think anyone can review the changes. I removed the NEC-specific mappings from it.

Phil Peterson

Comment 39

•

24 years ago

PDT agrees P2 for Frank's part (2)

Whiteboard: [nsbeta2-][nsbeta3+] fix in hand → [nsbeta2-][nsbeta3+][PDTP2] fix in hand

Frank Tang

Comment 40

•

24 years ago

Marking it fixed.

Status: NEW → RESOLVED

Closed: 24 years ago → 24 years ago

Resolution: --- → FIXED

IWAMURO, Motonori

Comment 41

•

24 years ago

Please see: http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html

I found the following problems:
* JIS code 224C is drawn as blank.
* JIS codes 2146, 2147, 2148, 2149 are not drawn as FULLWIDTH characters.
* JIS codes 2273, 2277, 2278 are not drawn with the corresponding fonts.

Please compare http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt on Mozilla with http://www.netlaputa.ne.jp/~vmi/software/mozilla/term.gif

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Erik van der Poel

Comment 42

•

24 years ago

Frank, please take a look at JIS 224C. Maybe we have a problem in the converter? JIS 2146-9 are drawn half-width due to the CP1252 problem. I don't think we will fix this for Netscape 6. Sorry. We may want to consider drawing JIS 2273, 2277 and 2278 in the JIS font, since they are probably rare in CP1252 documents, and it probably doesn't matter if we draw them too large in CP1252 docs anyway.
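The overlap between these JIS codes and CP1252 can be checked with a small Python sketch. It relies on Python's built-in `euc_jp` and `cp1252` codecs as stand-ins for Mozilla's converters, which is an assumption; the two tables may not agree for every character. The function names are made up for this example.

```python
def jis_to_unicode(jis):
    """Decode a JIS X 0208 code via its EUC-JP form (each byte plus 0x80)."""
    euc = bytes([(jis >> 8) + 0x80, (jis & 0xFF) + 0x80])
    return euc.decode("euc_jp")


def cp1252_byte(ch):
    """Return the CP1252 byte for `ch`, or None if CP1252 cannot encode it."""
    try:
        return ch.encode("cp1252")[0]
    except UnicodeEncodeError:
        return None


def collides_with_cp1252_high_range(jis):
    """True if the character also lives in CP1252's 0x80-0x9F range."""
    byte = cp1252_byte(jis_to_unicode(jis))
    return byte is not None and 0x80 <= byte <= 0x9F


# The seven codes discussed above, plus JIS 0x2228 (REFERENCE MARK) for contrast.
for jis in (0x2146, 0x2147, 0x2148, 0x2149, 0x2273, 0x2277, 0x2278, 0x2228):
    ch = jis_to_unicode(jis)
    print(f"JIS {jis:04X} U+{ord(ch):04X} collides={collides_with_cp1252_high_range(jis)}")
```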

Frank Tang

Comment 43

•

24 years ago

Somehow U+FFE2 also maps to 0x7C7B in the CP932 table, which causes this problem. The half-width issue won't be solved. JIS X 0208 only specifies which characters it encodes; it does not specify the width of the glyphs. Moving this bug to Future.

Status: REOPENED → ASSIGNED

Whiteboard: [nsbeta2-][nsbeta3+][PDTP2] fix in hand → [nsbeta2-][nsbeta3-][PDTP2] fix in hand

Target Milestone: M18 → Future

Cindy Roberts

Updated

•

24 years ago

Keywords: intl, nsbeta1

Frank Tang

Comment 44

•

24 years ago

Clearing out the status whiteboard, Target Milestone and keyword fields. Reassigning this to bstell to look at again after we ship 6.0 RTM.

Assignee: ftang → bstell

Status: ASSIGNED → NEW

Keywords: nsbeta2, nsbeta3

Whiteboard: [nsbeta2-][nsbeta3-][PDTP2] fix in hand

Target Milestone: Future → ---

Teruko Kobayashi

Comment 45

•

24 years ago

Changed QA contact to ylong@netscape.com.

QA Contact: teruko → ylong