Closed Bug 167136 Opened 22 years ago Closed 21 years ago

Allowed blank(space) glyph list have to be updated

Categories

(Core :: Internationalization, defect)

x86
Windows 2000
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: jshin1987, Assigned: jshin1987)

Details

(Keywords: intl)

Attachments

(5 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721

In the page at the URL given above, the Hangul vowel filler (U+1160) is rendered as
a question mark. The font specified in the page (CODE2000:
http://home.att.net/~jameskass)
has a non-spacing blank glyph for U+1160, but Mozilla regards the blank glyph
as invalid and falls back to the question mark for U+1160.


Reproducible: Always

Steps to Reproduce:
1. Install the CODE2000 font available at http://home.att.net/~jameskass
2. Launch Mozilla.
3. Go to http://jshin.net/i18n/korean/fillers.html


Actual Results:  
The Hangul vowel filler (U+1160) following a Hangul leading consonant is rendered
as a question mark.

Expected Results:  
The Hangul vowel filler (U+1160) should be rendered as a non-spacing/combining/zero-width
blank.

It's easy to fix, and I'll attach the patch.
Attached patch: patch
Add U+1160 to the list of characters that are allowed to have a 'blank' glyph.

I haven't added U+115F (Hangul leading consonant filler) because it appears
to be rendered fine without being added to the list.
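
For readers without the patch at hand, here is a minimal sketch of the kind of check being described. The names IsAllowedBlankChar and ShouldFallBack are hypothetical, and every entry other than U+1160 is only an assumption about what such a list might already contain; the actual macro in the Mozilla source may look different.

```cpp
// Hypothetical stand-in for the macro the patch touches. Only U+1160 comes
// from this bug; the other entries are assumed for illustration.
constexpr bool IsAllowedBlankChar(char16_t ch) {
  return ch == 0x0020 ||                    // SPACE
         ch == 0x00A0 ||                    // NO-BREAK SPACE
         ch == 0x1160 ||                    // HANGUL JUNGSEONG FILLER (this bug)
         (ch >= 0x2000 && ch <= 0x200B) ||  // EN QUAD .. ZERO WIDTH SPACE
         ch == 0x3000;                      // IDEOGRAPHIC SPACE
}

// A blank glyph is trusted only for characters on the list; for anything
// else the renderer would fall back (e.g. to a question mark).
constexpr bool ShouldFallBack(char16_t ch, bool glyphIsBlank) {
  return glyphIsBlank && !IsAllowedBlankChar(ch);
}
```

With U+1160 on the list, a blank glyph for it no longer triggers the fallback, while a blank glyph for an ordinary letter still does.
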
Keywords: intl
intl.
Assignee: kmcclusk → yokoyama
Status: UNCONFIRMED → NEW
Component: GFX Compositor → Internationalization
Ever confirmed: true
QA Contact: petersen → ruixu
Keith Packard (a member of the XFree86 Core team and the maintainer of
the fontconfig package) went through the Unicode
character table and came up with a more extensive list of characters
that are supposed to have a 'blank' visual representation (empty outline).
His original list came from the Mozilla source.
Below is the list taken from his email about the issue:


range              added to fc    comments
U+180B - U+180E    no             (but I don't have a Mongolian font to check against)
U+200C - U+200F    yes            (the Unicode description isn't clear)
U+2028 - U+2029    no             (these seem like they're supposed to be drawn)
U+202A - U+202F    yes            (these also appear blank from the description)
U+3164             yes            (HANGUL FILLER, similar to U+1160)
U+FEFF             yes            (byte order detector (ZERO WIDTH NO-BREAK SPACE))
U+FFA0             yes            HALFWIDTH HANGUL FILLER (similar to U+3164)
U+FFF9 - U+FFFB    yes            INTERLINEAR ANNOTATION marks for furigana

I guess some of the characters listed above are already taken care of by Mozilla (e.g.
ZWNBSP/BOM), but I believe the others have to be added.
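
To make the list above concrete, here is a rough sketch of how the rows marked "yes" (plus U+1160 from the original report) could be kept as a sorted range table and searched. The type CharRange, the table kBlankRanges and the function IsInBlankRanges are made-up names for illustration; this is not what fontconfig or Mozilla actually do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

struct CharRange {
  uint16_t first;
  uint16_t last;
};

// Sorted, non-overlapping ranges: U+1160 plus the rows marked "yes" above.
static const CharRange kBlankRanges[] = {
    {0x1160, 0x1160},  // HANGUL JUNGSEONG FILLER
    {0x200C, 0x200F},
    {0x202A, 0x202F},
    {0x3164, 0x3164},  // HANGUL FILLER
    {0xFEFF, 0xFEFF},  // ZERO WIDTH NO-BREAK SPACE
    {0xFFA0, 0xFFA0},  // HALFWIDTH HANGUL FILLER
    {0xFFF9, 0xFFFB},  // INTERLINEAR ANNOTATION marks
};

bool IsInBlankRanges(uint16_t ch) {
  const CharRange* begin = std::begin(kBlankRanges);
  // Find the first range that starts strictly after ch.
  const CharRange* it = std::upper_bound(
      begin, std::end(kBlankRanges), ch,
      [](uint16_t c, const CharRange& r) { return c < r.first; });
  if (it == begin) return false;  // ch lies before every range
  --it;                           // the only range that could contain ch
  assert(it->first <= ch);
  return ch <= it->last;
}
```

A binary search over a handful of ranges is cheap, but every lookup still costs a few comparisons, which is one reason a bitmap-style structure may be preferable on a hot path.
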

FYI, the related thread in XF86-font list begins at

http://www.xfree86.org/pipermail/fonts/2002-September/002099.html

Although deprecated, U+206A - U+206D appear to have to be included as well.
As for U+206E and U+206F, I'm not sure.

BTW, I'm wondering how these characters are handled on MacOS 9/X, gtk and X11.
At least in gtk, Mozilla doesn't have this problem rendering the page given
at the URL with the same TrueType font (CODE2000). Are they handled at a higher
layer before reaching the lower level of font access?
 
QA Contact: ruixu → ylong
Changing the summary line because it's not just about the Hangul vowel filler but also
involves many other characters.
Also reassigning it to myself.
Assignee: yokoyama → jshin
Summary: U+1160(Hangul Vowel filler) is rendered as a question mark → Allowed blank(space) glyph list have to be updated
A simplistic patch for this bug is to just modify the macro that checks whether a
character is allowed to be blank. However, as comment #5 shows, there are a few
too many of them to use a simple macro. Would there be a better way
to deal with this list (a data structure?)?
Adding shanjian to CC to seek his opinion on the best way to represent
the list of blank characters as he was the last one to change the line
in question :-)
I ended up using CCMap. This may or may not be excessive for this
simple task. It seems to be all right,
considering that the map is created only once per session at the beginning
and the CCMap accessor macro is fast.
Shanjian, can you review?
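
As a rough illustration of why a compressed map suits this job, here is a toy two-level bitmap. It is NOT the real CCMap layout from nsCompressedCharMap.cpp; the class SparseCharMap and its methods are invented for this sketch. It only shows the general idea: build once, allocate storage only for the 256-character pages that contain a set bit, and answer lookups with two array indexings and a bit test.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy two-level bitmap over the BMP (hypothetical, not the CCMap format).
class SparseCharMap {
 public:
  SparseCharMap() { index_.fill(-1); }  // -1 means "page has no bits set"

  void SetRange(uint16_t first, uint16_t last) {
    assert(first <= last);
    // Use a 32-bit counter so the loop terminates when last == 0xFFFF.
    for (uint32_t ch = first; ch <= last; ++ch) {
      Page& page = EnsurePage(ch >> 8);
      page[(ch & 0xFF) >> 5] |= 1u << (ch & 0x1F);
    }
  }

  bool Has(uint16_t ch) const {
    const int idx = index_[ch >> 8];
    if (idx < 0) return false;  // whole page is empty
    const uint32_t word = pages_[idx][(ch & 0xFF) >> 5];
    return ((word >> (ch & 0x1F)) & 1u) != 0;
  }

  std::size_t AllocatedPages() const { return pages_.size(); }

 private:
  using Page = std::array<uint32_t, 8>;  // 8 * 32 = 256 bits per page

  Page& EnsurePage(uint32_t high) {
    if (index_[high] < 0) {
      index_[high] = static_cast<int16_t>(pages_.size());
      pages_.push_back(Page{});  // zero-initialized page
    }
    return pages_[index_[high]];
  }

  std::array<int16_t, 256> index_;  // high byte -> slot in pages_, or -1
  std::vector<Page> pages_;
};
```

Filling it with U+1160, U+200C - U+200F and U+202A - U+202F allocates just two pages (0x11 and 0x20), which hints at why the memory cost stays small for a sparse list like this one.
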
A couple of issues to resolve:

  - Find out which characters currently in the list are reliably filtered out
    upstream (possibly in a platform-independent way) and remove them from the
    list. It seems that which chars are filtered out is not platform-independent
    (e.g. nsFontMetricsWin does not get U+115F from upstream, while
    nsFontMetricsXft gets it unfiltered. I can't check how this is handled on Mac).

  - Think about whether the list needs to be user-configurable (in prefs.js). Some
    fonts have _legitimate_ blank glyphs at code points in the PUA. Obviously, this
    cannot be hard-coded. With CCMap, it's easy to make this user-configurable.
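
To give the second point some shape, here is a sketch of parsing a user-supplied list of extra blank characters. Everything about it is hypothetical: no pref name or string format is defined in this bug, so the comma-separated hex format ("E000-E0FF,F8FF") and the names PrefRange, ParseBlankList and IsUserBlank are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using PrefRange = std::pair<uint16_t, uint16_t>;

// Parse a comma-separated list of hex code points or ranges, for example
// "E000-E0FF,F8FF". Malformed entries and values outside the BMP are skipped.
std::vector<PrefRange> ParseBlankList(const std::string& pref) {
  std::vector<PrefRange> ranges;
  std::istringstream in(pref);
  std::string item;
  while (std::getline(in, item, ',')) {
    char* end = nullptr;
    const unsigned long first = std::strtoul(item.c_str(), &end, 16);
    if (end == item.c_str()) continue;  // no hex digits at all
    unsigned long last = first;
    if (*end == '-') {
      const char* second = end + 1;
      last = std::strtoul(second, &end, 16);
      if (end == second) continue;  // "E000-" with nothing after the dash
    }
    if (*end != '\0' || first > last || last > 0xFFFF) continue;
    ranges.emplace_back(static_cast<uint16_t>(first),
                        static_cast<uint16_t>(last));
  }
  return ranges;
}

bool IsUserBlank(const std::vector<PrefRange>& ranges, uint16_t ch) {
  for (const PrefRange& r : ranges) {
    assert(r.first <= r.second);  // guaranteed by ParseBlankList
    if (ch >= r.first && ch <= r.second) return true;
  }
  return false;
}
```

The parsed ranges could then be merged into whatever map holds the built-in list, so the hard-coded and user-supplied entries share one lookup path.
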

 
jshin,
Thanks a lot for doing this. Using CCMap is the right approach. This has been on
my mind for quite some time and I haven't found the time to do it.
I have a suggestion. Can you write a Perl tool to generate the CCMap in binary
form instead of generating it at run time? That will shrink the memory footprint
and improve startup performance. We will need to apply a similar approach in
several other places. (The punctuation mark check in layout is one example.)
Attached file ccmapbin.pl
Shanjian,
Attached is a simple Perl tool to generate a PRUint16 array for CCMap.
Actually, it generates three files for LE/BE(16bit), BE(32bit) and BE(64bit).
I tested the result (with a simple test program modified from printCCMap()
in nsCompressedCharMap.cpp) on ix86 (32bit LE), Alpha (64bit LE), Sparc (32bit BE),
and PA-RISC (32bit BE) and it worked fine. I couldn't find a 64bit BE machine
(the PA-RISC machine I used is 64bit but its long is only 32bit), but I believe
it should work well there, too.
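
The attached tool is written in Perl and is not reproduced here. Purely to illustrate the idea of emitting a precompiled table, the following C++ sketch renders 16-bit words as the source text of a static array. The function name EmitStaticArray and the output layout are assumptions, and the sketch ignores the endianness and word-size variants mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Render 16-bit words as the source text of a static const array, eight
// values per line, so the table can be compiled in rather than built at
// startup. Hypothetical helper, not the output format of ccmapbin.pl.
std::string EmitStaticArray(const std::string& name,
                            const std::vector<uint16_t>& words) {
  assert(!name.empty());
  std::string out = "static const uint16_t " + name + "[] = {\n";
  char buf[16];
  for (std::size_t i = 0; i < words.size(); ++i) {
    std::snprintf(buf, sizeof buf, "0x%04X,", static_cast<unsigned>(words[i]));
    out += (i % 8 == 0) ? "  " : " ";  // indent at the start of each row
    out += buf;
    if (i % 8 == 7 || i + 1 == words.size()) out += "\n";
  }
  out += "};\n";
  return out;
}
```

The generated text would be checked in or produced at build time and included by the code that does the lookups, which is what removes the per-session construction cost.
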

Can you tell me where else we need this (nsFontMetricsGTK.cpp
is one of them)? Perhaps I'll file a new bug (to put a 'precompiled CCMap' in
place of the character list) and make this bug dependent on it.

BTW, currently it just works on the BMP, but it can be extended easily.
Thanks for your great work!!!

punctuation checking in nsTextFrame is a sure thing:
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsTextFrame.cpp#4645

The CJK and Hangul check in the line breaker is questionable:
http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp

I am sure that we will need this in some other places now and in future. 
Shanjian, 
Thank you for your kind words.

I filed a new bug 180266 for this
and am going to make this bug depend on it. I didn't have to,
but it seems 'conceptually' clearer.

I added Shanjian to the CC of bug 180266, and anyone here
is welcome to add her/himself to the CC list there.

> I am sure that we will need this in some other places now and in future. 

So am I :-) In particular, I guess Mozilla may need to look up Unicode
character class tables in several places (line breaking, rendering/layout - bidi, ...
text boundary identification, editing - search and replace, etc.).
Most TRs at http://www.unicode.org/reports appear relevant to
adopting this approach in Mozilla in one way or another
(UTR 14, UTR 29, UTR 9, UTR 13 to name just a few)...
Depends on: 180266
180266 was just resolved and accordingly this is, too.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED