Closed
Bug 33162
Opened 25 years ago
Closed 23 years ago
missing Japanese characters on Linux
Categories
(Core :: Internationalization, defect, P2)
Tracking
()
VERIFIED
FIXED
People
(Reporter: masaki.katakai, Assigned: bstell)
References
Details
(Keywords: intl, Whiteboard: [PDT+]wait for tree open to check in)
Attachments
(8 files)
96.50 KB, image/jpeg
2.09 KB, patch
1.61 KB, patch
79.36 KB, image/jpeg
567 bytes, application/octet-stream
20.15 KB, text/html
1.07 KB, application/octet-stream
11.35 KB, patch
I'll create an attachment that shows a snapshot of the screen.
I checked how Mozilla and Communicator 4.x display
Japanese characters.
I found that many characters are displayed as `?'. For example,
EUC 0xa1e2 is missing and is drawn as `?'.
On Windows, it looks fine.
Reporter | ||
Comment 1•25 years ago
|
||
Comment 2•25 years ago
|
||
It seems we have two problems here.
1. Some characters are rendered with a font that we believe has the
character glyph but actually doesn't. That is the cause of the rendering of t
2. We have bugs in the Unicode to JIS X 0208 conversion for the following characters:
U+2015, U+2010, U+2225, U+2260, etc.
Comment 3•25 years ago
|
||
Marking this as assigned. cata- can you help look at the Unicode to JIS X 0208
converter table? I think we have some bugs there.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Reassigned to nhotta. Tentatively set TM to M17.
Assignee: bobj → nhotta
Target Milestone: M16 → M17
Comment 7•25 years ago
|
||
Accepting, if this is just a change to the conversion table, but no more unix
bugs to me, please (>_<).
Status: NEW → ASSIGNED
Comment 11•24 years ago
|
||
cata- can you help erik take a look at this ?
Assignee: ftang → cata
Status: ASSIGNED → NEW
Comment 13•24 years ago
|
||
internal test cases-
http://babel/Intl_Client/browser/fonts/multibyte_tests/euc-jp/allchars_euc.html
Status: NEW → ASSIGNED
Comment 14•24 years ago
|
||
Please help. There are two kinds of problem. Some characters are displayed as blank
and some characters are displayed as ?.
My feeling is that it is probably related to the nsIAtom issue, but I am not sure.
Assignee: ftang → erik
Status: ASSIGNED → NEW
Comment 15•24 years ago
|
||
The Unix version of Mozilla has improved since this bug report was filed, but we
still have some question marks. For example:
EUC 0xA2A8 -> U+203B (REFERENCE MARK)
This character is not being displayed properly on Unix because we have a hack
that zeroes out all Unicodes from U+0000 to U+2200 in the font. We do this
because JIS X 0208 fonts contain some of the CP1252 characters such as smart
quotes, and since JIS fonts are much larger than 8859-1 fonts on Unix, CP1252
documents look strange (e.g. very large smart quote inside small Western text).
This one might be a bit complicated to fix. We probably want to use the JIS
glyphs when the surrounding text is (nearly as) large as the JIS font. When the
surrounding text is much smaller than the JIS font, then we may want to use our
transliteration routine instead (e.g. smart quote becomes regular ASCII quote).
Even better might be a solution where we try to stick to the same font as much
as possible, without switching to a JIS font that may be too tall, too wide, or
look strange. However, this would violate the CSS2 spec, which says that we must
go down the font-family list *for each character* in the element. MSIE doesn't
follow CSS2 in this regard, and maybe that is justifiable (and CSS2 is "wrong").
Accepting bug for now.
Status: NEW → ASSIGNED
Comment 16•24 years ago
|
||
As I mentioned, I see two kinds of problem. Some characters display as blank and
some display as question marks.
The following characters display as ?
EUC-JP[0xA1BD] == U+2015
EUC-JP[0xA1BE] == U+2010 # HYPHEN
EUC-JP[0xA2A8] == U+203b # REFERENCE MARK
EUC-JP[0xA2F2] == U+212b # ANGSTROM SIGN
The following characters display as blank
EUC-JP[0xA2C0] == U+222A # UNION
EUC-JP[0xA2C1] == U+2229 # INTERSECTION
EUC-JP[0xA2CC] == U+FFE2 # NOT SIGN
EUC-JP[0xA2DC] == U+2220 # ANGLE
EUC-JP[0xA2DD] == U+22A5 # UP TACK
EUC-JP[0xA2E1] == U+2261 # IDENTICAL TO
EUC-JP[0xA2E2] == U+2252 # APPROXIMATELY EQUAL TO OR THE IMAGE OF
EUC-JP[0xA2E5] == U+221A # SQUARE ROOT
EUC-JP[0xA2E8] == U+2235 # BECAUSE
EUC-JP[0xA2E9] == U+222b # INTEGRAL
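For anyone re-checking entries like the ones above, here is a minimal sketch in Python. It is not Mozilla's converter: it relies on Python's built-in euc_jp codec, and the helper name describe_euc is made up for illustration. It decodes a two-byte EUC-JP code and reports the Unicode code point and character name in the same layout as the list.

```python
import unicodedata


def describe_euc(code: int) -> str:
    """Decode a two-byte EUC-JP code such as 0xA2A8 and describe the result."""
    raw = code.to_bytes(2, "big")      # 0xA2A8 -> b"\xa2\xa8"
    char = raw.decode("euc_jp")        # Python's codec, not Mozilla's table
    return f"EUC-JP[0x{code:04X}] == U+{ord(char):04X} # {unicodedata.name(char)}"


# A few of the codes mentioned in this bug (hypothetical selection).
for code in (0xA2A8, 0xA2F2, 0xA2E8, 0xA2E9):
    print(describe_euc(code))
```

A mismatch between this output and what Mozilla displays would point at the converter table or the font, but it cannot tell the two apart on its own.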
Comment 17•24 years ago
|
||
The first kind of problem mentioned by Frank is caused by the
hack Erik talked about in his email. The second problem is not
seen, at least on the Exceed X server and on HP. I believe it
is probably a font problem and is not our concern.
If CSS2 cannot produce a consistent look, that is a problem.
I believe the problem mostly exists between single-byte charset
fonts and double-byte charset fonts. If a string is in a single-byte
charset, we should let the single-byte charset font take precedence.
The same thing should happen for double-byte. Will that solve the
problem?
Comment 18•24 years ago
|
||
Comment 19•24 years ago
|
||
Shanjian, in this case we cannot give precedence to a single-byte font because
there is no single-byte font with those characters (e.g. smart quotes). I
suppose we could ignore the double-byte font in that case, and then fall back to
the transliteration using an ASCII font. But that means that we need to look at
neighboring characters, and that is quite a big change that I haven't tried or
even given much thought to yet.
Comment 20•24 years ago
|
||
After discussing this with erik, we decided to nsbeta2- this bug. The reasons are the
following:
1. We cannot see the "shown as blank" problem on erik's machine; either erik's
other fix fixed it or there is some strange issue with my fonts. In any case, the
missing characters are not frequently used characters, so we can
hold this until after nsbeta2. The only character in this category with higher usage is
EUC-JP[0xA2E8] == U+2235 # BECAUSE
but it is still rarely used.
For those characters shown as '?',
EUC-JP[0xA2A8] == U+203b # REFERENCE MARK
has higher usage, but I don't think we really need it for nsbeta2.
Comment 22•24 years ago
|
||
Adding nsbeta2 keyword to bugs with nsbeta2 triage value in status field so the
queries don't get screwed up
Keywords: nsbeta2
Comment 23•24 years ago
|
||
nsbeta3+ P1 per bug meeting
Priority: P3 → P1
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]
Comment 24•24 years ago
|
||
This still happens in 2000-08-14-12 Linux build.
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+]possible patch in hand
Comment 26•24 years ago
|
||
A Japanese user made a good testcase.
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html
His platform is Sparc, Solaris 2.6.
Comment 27•24 years ago
|
||
erik- should we check in your patch now? sooner is better for this particular
issue, right ?
Comment 28•24 years ago
|
||
The patch has some serious problems, so we can't use it. However, I have come
up with a different change that works quite well. It zeroes out only the
Unicodes corresponding to CP1252's 0x80-0x9F range (instead of all Unicodes less
than 0x2200).
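The difference between the two rules can be sketched in Python. This is an illustration only, not the checked-in patch: the function names and the tiny font coverage set are hypothetical, and the CP1252 code points come from Python's cp1252 codec rather than from Mozilla's source.

```python
def cp1252_extension_codepoints() -> set:
    """Unicode code points assigned to the CP1252 bytes 0x80-0x9F."""
    result = set()
    for byte in range(0x80, 0xA0):
        try:
            result.add(ord(bytes([byte]).decode("cp1252")))
        except UnicodeDecodeError:
            # A few bytes in this range are unassigned in Python's codec.
            continue
    return result


def old_rule(coverage: set) -> set:
    """Blanket rule: drop every code point below U+2200."""
    return {cp for cp in coverage if cp >= 0x2200}


def new_rule(coverage: set) -> set:
    """Narrow rule: drop only the CP1252 0x80-0x9F equivalents."""
    return coverage - cp1252_extension_codepoints()


# Hypothetical slice of a JIS X 0208 font's coverage, using characters
# named in this bug: smart quotes, REFERENCE MARK, ANGSTROM SIGN, BECAUSE.
jis_font = {0x2018, 0x2019, 0x203B, 0x212B, 0x2235}

print(sorted(hex(cp) for cp in old_rule(jis_font)))
print(sorted(hex(cp) for cp in new_rule(jis_font)))
```

Under the narrow rule U+203B and U+212B stay available in the JIS font, while the smart quotes are still removed, which is the trade-off described in this comment.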
Whiteboard: [nsbeta2-][nsbeta3+]possible patch in hand → [nsbeta2-][nsbeta3+] fix in hand
Comment 29•24 years ago
|
||
could you put the new patch here ? let's review and check it in.
Comment 30•24 years ago
|
||
Comment 31•24 years ago
|
||
New patch checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Target Milestone: M17 → M18
Comment 33•24 years ago
|
||
It seems 2240-2269 weren't displayed.
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt
Reporter | ||
Comment 34•24 years ago
|
||
Reopened.
As Koike-san mentioned, there are still some problems:
- JIS 0x2240-0x2269 characters cannot be displayed, just blank
- JIS 0x2273, 0x2277, 0x2278 are not displayed properly
- JIS 0x2146-0x2149 are displayed as HALFWIDTH
I'll attach a snapshot.
For the 0x2240-0x2269 problem, I think the mapping table for
JIS X 0208 is not correct.
For example, JIS 0x2240 should display the glyph at exactly the 0x2240
code point of the jis-fixed Japanese font; however, it
seems that the character 0x2240 is mapped to JIS 0x2d7d.
JIS 0x2d?? is called the IBM-NEC vendor-specific area, and
the jis-fixed Japanese font does not contain any characters
in this area.
The sun-gothic fonts bundled with Solaris provide those
characters, so when I switched to sun-gothic for jisx208,
I could see the characters. However, we should not
use that area. It depends on the fonts.
What do you think?
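The code arithmetic behind this report can be sketched in Python. The helper names are hypothetical; the only facts assumed are that an EUC-JP two-byte code is the JIS code with the high bit set on both bytes, and that JIS X 0208 assigns characters in rows 1-8 and 16-84, leaving row 13 (0x2D??) to vendor extensions.

```python
def euc_to_jis(euc: int) -> int:
    """Clear the high bit of both bytes, e.g. EUC-JP 0xA2C0 -> JIS 0x2240."""
    return euc - 0x8080


def jis_row_cell(jis: int) -> tuple:
    """Return the (row, cell) pair of a JIS code, each counted from 1."""
    return (jis >> 8) - 0x20, (jis & 0xFF) - 0x20


def in_standard_rows(jis: int) -> bool:
    """True if the code lies in a row that JIS X 0208 itself assigns."""
    row, _ = jis_row_cell(jis)
    return 1 <= row <= 8 or 16 <= row <= 84


for code in (0x2240, 0x2D7D):
    print(hex(code), jis_row_cell(code), in_standard_rows(code))
```

A check like in_standard_rows could be run over a Unicode to JIS table to list entries that depend on vendor-specific fonts.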
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 35•24 years ago
|
||
Comment 36•24 years ago
|
||
I don't think we can make everybody happy for Beta 3 or even RTM. Since we are
using Unicode internally, and since most Unix systems are missing the CP1252
fonts (they only have iso8859-1), and since JIS X 0208 happens to have some of
the CP1252 characters, Mozilla picks up the huge Japanese fonts even when the
surrounding text is small Western text (e.g. for quotation marks aka smart
quotes). There are a lot of CP1252 documents out there because of the dominance
of Windows, so we need to pay attention to those characters.
Unfortunately, the hack that we came up with has some nasty effects on Japanese
documents, so we are forced to choose the lesser of two evils.
Much of this can be blamed on the poor state of fonts on X, the dominance of
Windows, Mozilla's decision to implement CSS2 to the letter, and the imminence
of RTM (it's too late to make big changes).
However, I'm reassigning this bug to Frank, to have somebody look at the JIS <->
Unicode conversion tables, especially JIS 2240 - 2269.
Assignee: erik → ftang
Status: REOPENED → NEW
Comment 37•24 years ago
|
||
I think the proper action for now is:
1. Ignore the problem of those characters which overlap with windows-1252.
2. Fix the conversion table entries that map to the NEC 0x2Dxx range.
I am working on 2. and should have an updated table very soon.
Comment 38•24 years ago
|
||
Fixed in my local tree. The change is in the mozilla/intl/uconv/ucvja/jis0208.uf
file. This file is compressed binary data, so I don't think anyone can review the
changes. I removed the NEC-specific mappings from it.
Comment 39•24 years ago
|
||
PDT agrees P2 for Frank's part (2)
Whiteboard: [nsbeta2-][nsbeta3+] fix in hand → [nsbeta2-][nsbeta3+][PDTP2] fix in hand
Comment 40•24 years ago
|
||
mark it fixed.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
Comment 41•24 years ago
|
||
Please see:
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html
I found the following problems:
* JIS code 224C is drawn as blank.
* JIS codes 2146, 2147, 2148, 2149 are not drawn as FULLWIDTH characters.
* JIS codes 2273, 2277, 2278 are not drawn with the corresponding font.
Please compare
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.txt
on Mozilla with
http://www.netlaputa.ne.jp/~vmi/software/mozilla/term.gif
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 42•24 years ago
|
||
Frank, please take a look at JIS 224C. Maybe we have a problem in the converter?
JIS 2146-9 are drawn half-width due to the CP1252 problem. I don't think we will
fix this for Netscape 6. Sorry.
We may want to consider drawing JIS 2273, 2277 and 2278 in the JIS font, since
they are probably rare in CP1252 documents, and it probably doesn't matter if
we draw them too large in CP1252 docs anyway.
Comment 43•24 years ago
|
||
Somehow U+FFE2 also maps to 0x7C7B in the CP932 table,
which causes this problem.
The half-width issue won't be solved. JIS X 0208 only specifies which characters it
encodes; it does not specify the width of the glyphs.
Moving this bug to Future.
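To illustrate why a duplicate entry matters, here is a sketch in Python of building a Unicode to JIS reverse table from (JIS, Unicode) pairs. It is not the format of jis0208.uf, and it assumes (this is a guess, not something confirmed in this bug) that a naive builder lets the later entry win. The pair list uses only the codes quoted in this bug; the function names are hypothetical.

```python
def in_standard_rows(jis: int) -> bool:
    """True if the JIS code is in rows 1-8 or 16-84 of JIS X 0208."""
    row = (jis >> 8) - 0x20
    return 1 <= row <= 8 or 16 <= row <= 84


def build_reverse_table(pairs):
    """Map Unicode -> JIS, preferring standard rows over vendor rows."""
    reverse = {}
    for jis, ucs in pairs:
        current = reverse.get(ucs)
        if current is None or (in_standard_rows(jis) and not in_standard_rows(current)):
            reverse[ucs] = jis
    return reverse


# (JIS code, Unicode) pairs: the standard entry first, the vendor duplicate last.
pairs = [(0x224C, 0xFFE2), (0x7C7B, 0xFFE2)]

naive = {ucs: jis for jis, ucs in pairs}   # later entry silently wins
fixed = build_reverse_table(pairs)

print(hex(naive[0xFFE2]), hex(fixed[0xFFE2]))
```

With the naive table, U+FFE2 is sent to 0x7C7B, a position a plain JIS X 0208 font does not contain; preferring the standard row keeps it at 0x224C.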
Status: REOPENED → ASSIGNED
Whiteboard: [nsbeta2-][nsbeta3+][PDTP2] fix in hand → [nsbeta2-][nsbeta3-][PDTP2] fix in hand
Target Milestone: M18 → Future
Comment 44•24 years ago
|
||
Clearing out the status whiteboard, Target Milestone and keyword fields.
Reassigning this to bstell to look at again after we ship 6.0 RTM.
Assignee | ||
Updated•24 years ago
|
Target Milestone: --- → mozilla0.9.1
Assignee | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Comment 46•24 years ago
|
||
brian - what's the status on this one?
Assignee | ||
Updated•24 years ago
|
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Assignee | ||
Updated•24 years ago
|
Target Milestone: mozilla0.9.2 → mozilla0.9.1
Assignee | ||
Comment 47•24 years ago
|
||
224c seems okay
2146, 2147, 2148, 2149 are half width
2273, 2277, 2278 appear to be transliterated which gives a poor result
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Comment 48•24 years ago
|
||
changing to nsbeta1-. this one does not meet beta stopper guidelines.
Comment 49•24 years ago
|
||
Adding nsCatFood and RTM keywords. This looks pretty bad.
Comment 50•24 years ago
|
||
Adding nsCatFood and RTM keywords. This looks pretty bad.
Comment 51•23 years ago
|
||
This is a difficult one.
The current behavior is because of a hack erik put in to prevent the JIS font from being used to
display smart quotes in windows-1252 documents.
I wonder whether we still need this after bstell added the per-language-group font
fallback. (Not sure; we may still need it because we will probably still hit the JIS
font before we hit the transliteration fallback.)
Comment 52•23 years ago
|
||
Is there a way we can decide whether to zero out or not based on the language group?
Comment 53•23 years ago
|
||
can someone create a new screen shot?
Comment 54•23 years ago
|
||
pdt+ based on the 6/11 pdt meeting.
Updated•23 years ago
|
Whiteboard: [PDT+]
Comment 55•23 years ago
|
||
> can someone create a new screen shot?
Only JIS symbol characters, as follows:
http://www.netlaputa.ne.jp/~vmi/software/mozilla/jis.html
This image is jis.txt displayed by kterm & Mozilla (checkout from CVS at
2001-06-12 JST).
Configuration of Japanese monospace font is misc-fixed-jisx0208.1983-0 (14dot).
JIS[0x2144] is not monospace font.
JIS[0x2146-0x2149] is not displayed.
JIS[0x2273,0x2277,0x2278] is not valid font.
Assignee | ||
Comment 56•23 years ago
|
||
IWAMURO,
Would it be possible to get a file like jis.txt but with just the
characters of interest?
Assignee | ||
Comment 57•23 years ago
|
||
these characters:
JIS[0x2144] is not monospace font.
JIS[0x2146-0x2149] is not displayed.
JIS[0x2273,0x2277,0x2278] is not valid font.
are disabled by this code:
 * XXX This is a bit of a hack. Documents containing the CP1252
 * extensions of Latin-1 (e.g. smart quotes) will display with those
 * special characters way too large. This is because they happen to
 * be in these large double byte fonts. So, we disable those
 * characters here. Revisit this decision later.
 */
if (aSelf->Convert == DoubleByteConvert) {
  PRUint32* map = aSelf->mMap;
  REMOVE_CHAR(map, 0x20AC);
...
The goal of this code was to use transliteration in western documents instead
of the large glyphs from double byte fonts.
1.117 <erik@netscape.com> 11 Sep 2000 14:03
bug 33162; instead of zeroing out all Unicodes less than 0x2200, we just
zero out the common ones that correspond to CP1252 (for things like smart
quotes), so that we can still see most of the JIS X 0208 characters;
Perhaps, instead of disabling these glyphs from the double byte fonts so the
glyph lookup runs all the way to the transliterator, we should add an early
transliterator to the loaded fonts list for non-double byte documents.
This way non-double byte documents will transliterate instead of using these
double byte glyphs and double byte documents will use these glyphs.
Assignee | ||
Comment 58•23 years ago
|
||
The code for "zeroing out all Unicodes less than 0x2200" was added in rev 1.48
to fix bugzilla bug 4760:
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsFontMetricsGTK.cpp&root=/cvsroot&subdir=mozilla/gfx/src/gtk&command=DIFF_FRAMESET&rev1=1.47&rev2=1.48
http://bugzilla.mozilla.org/show_bug.cgi?id=4760
Assignee | ||
Comment 59•23 years ago
|
||
Assignee | ||
Comment 60•23 years ago
|
||
Assignee | ||
Comment 61•23 years ago
|
||
Assignee | ||
Comment 62•23 years ago
|
||
> Perhaps instead of disabling these glyphs from the double byte fonts so the
> glyph lookup runs all the way to the transliterator we should add an early
> transliterator to the loaded fonts list for non-double byte documents.
This made the Asian font work but broke the euro on lang groups that are
western-ish but not "x-western" like baltic which is "x-baltic".
Assignee | ||
Comment 63•23 years ago
|
||
The reason the previous idea broke the euro is that the current code, by
disabling the Asian glyphs, allows the glyph search code to find the euro
in one of the other single byte fonts (iso-8859-15) before hitting the
transliterator. When I added the early transliterator, it stopped the font
search from finding the Asian glyphs but also stopped the font search from
finding the other single byte glyphs.
The correct fix would be to have 2 copies of all the Asian font maps: one
for single byte documents and one for double byte documents. This however
would be a very big change and seems highly unlikely to be approved any
time before the next release.
Perhaps we should look at the user's locale and if it is an Asian locale
not disable the glyphs.
That way Asian users will see the Asian (bigger) glyphs and non-Asian
users will see the smaller (non-Asian) glyphs.
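The ordering problem described above can be modelled with a small Python sketch. This is a simplified illustration, not nsFontMetricsGTK.cpp: the Font class, the font names, the coverage sets and the idea that the transliterator handles the euro are all assumptions made for the example.

```python
class Font:
    """A named font (or transliterator) and the characters it can draw."""

    def __init__(self, name, chars):
        self.name = name
        self.chars = set(chars)


def find_font(font_list, ch):
    """Return the name of the first entry in font_list that supports ch."""
    for font in font_list:
        if ch in font.chars:
            return font.name
    return None


EURO, SMART_QUOTE = "\u20ac", "\u201c"

latin1 = Font("iso8859-1", "Aa")
latin9 = Font("iso8859-15", "Aa" + EURO)
jis = Font("jisx0208", SMART_QUOTE)
translit = Font("transliterator", EURO + SMART_QUOTE)

# Transliterator placed too early: it shadows iso8859-15 as well as JIS.
too_early = [latin1, translit, latin9, jis]
# Transliterator placed just before the double byte fonts.
single_byte_doc = [latin1, latin9, translit, jis]
# Double byte documents keep the JIS glyphs ahead of transliteration.
double_byte_doc = [latin1, latin9, jis, translit]

for label, fonts in (("too early", too_early),
                     ("single byte doc", single_byte_doc),
                     ("double byte doc", double_byte_doc)):
    print(label, find_font(fonts, EURO), find_font(fonts, SMART_QUOTE))
```

In this model the "too early" list loses the real euro glyph, while placing the transliterator only ahead of the double byte fonts keeps the euro and still avoids the oversized quote in single byte documents.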
Assignee | ||
Comment 64•23 years ago
|
||
Assignee | ||
Comment 65•23 years ago
|
||
Attachment 38989 re-enables the double byte special chars.
For single byte documents it adds a special char transliterator before the double
byte fonts are checked, so that the oversized glyphs in double byte fonts will
not be used in single byte type docs.
Whiteboard: [PDT+] → [PDT+] have patch, need r= sr= a=
Comment 66•23 years ago
|
||
r=ftang for http://bugzilla.mozilla.org/showattachment.cgi?attach_id=38989 (
06/18/01 14:51 )
Assignee | ||
Updated•23 years ago
|
Whiteboard: [PDT+] have patch, need r= sr= a= → [PDT+] have patch r=ftang, need sr= a=
Comment 67•23 years ago
|
||
change status to "[PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a="
Whiteboard: [PDT+] have patch r=ftang, need sr= a= → [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a=
Comment 68•23 years ago
|
||
sr=blizzard
Assignee | ||
Updated•23 years ago
|
Whiteboard: [PDT+] r=ftang, ask blizzard to sr= (6/19 9;40) also need a= → [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a=
Comment 69•23 years ago
|
||
a= asa@mozilla.org for checkin to the trunk.
(on behalf of drivers)
Blocks: 83989
Updated•23 years ago
|
Whiteboard: [PDT+] r=ftang, sr=blizzard, (6/19 12:49 asked for a=) need a= → [PDT+]wait for tree open to check in
Assignee | ||
Comment 70•23 years ago
|
||
checked into trunk
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
Comment 71•23 years ago
|
||
I checked that all JIS symbol characters were displayed correctly.
Thanks.
Comment 72•23 years ago
|
||
It looks fine to me on the 0.9.2 (06-25) build; marking it as verified.
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 73•23 years ago
|
||
if (western_font) {
  NS_ASSERTION(western_font->SupportsChar(aChar), "font supposed to support this char");
  return font;
}
This should return western_font not font.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 74•23 years ago
|
||
from email:
> I just checked 33162, and it was not reopened yet. Is it a bugzilla problem?
> Anyway, you can put r=shanjian there.
>
> thanks
>
> shanjian
Status: REOPENED → ASSIGNED
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.2 → ---
Assignee | ||
Comment 75•23 years ago
|
||
fixed in bug 86368
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 76•23 years ago
|
||
Marking it as verified. Please reopen if there is still a problem.
Status: RESOLVED → VERIFIED