Closed Bug 167136 Opened 22 years ago Closed 21 years ago

Allowed blank(space) glyph list have to be updated

Status:

RESOLVED FIXED

People

(Reporter: jshin1987, Assigned: jshin1987)

References

(
URL
)

Details

(Keywords: intl)

Attachments

(5 files)

patch, 22 years ago, Jungshik Shin, 781 bytes, patch
a screenshot revealing the problem, 22 years ago, Jungshik Shin, 51.18 KB, image/jpeg
a screenshot taken with a patched mozilla, 22 years ago, Jungshik Shin, 65.85 KB, image/jpeg
a new patch using CCMap (with a more extensive list of blank chars), 22 years ago, Jungshik Shin, 4.33 KB, patch
ccmapbin.pl, 22 years ago, Jungshik Shin, 5.83 KB, text/plain

Jungshik Shin

Assignee

Description

•

22 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; ko-KR; rv:1.1b) Gecko/20020721

In the page at the URL given above, the Hangul vowel filler (U+1160) is rendered
as a question mark. The font specified in the page (CODE2000:
http://home.att.net/~jameskass) has a non-spacing blank glyph for U+1160, but
Mozilla regards the blank glyph as invalid and falls back to the question mark
for U+1160.


Reproducible: Always

Steps to Reproduce:
1. Install the CODE2000 font available at http://home.att.net/~jameskass
2. Launch Mozilla
3. Go to http://jshin.net/i18n/korean/fillers.html


Actual Results:  
The Hangul vowel filler (U+1160) following a Hangul leading consonant is rendered
as a question mark.

Expected Results:  
The Hangul vowel filler (U+1160) should be rendered as a
non-spacing/combining/zero-width blank.

It's easy to fix and I'll attach the patch.

Jungshik Shin

Assignee

Comment 1

•

22 years ago

Attached patch: patch

Add U+1160 to the list of characters that are allowed to have a 'blank' glyph.

I haven't added U+115F (Hangul leading consonant filler) because it appears
to be rendered fine without being added to the list.

Jungshik Shin

Assignee

Updated

•

22 years ago

Keywords: intl

Jungshik Shin

Assignee

Comment 2

•

22 years ago

Attached image: a screenshot revealing the problem

Jungshik Shin

Assignee

Comment 3

•

22 years ago

Attached image: a screenshot taken with a patched mozilla

Boris Zbarsky [:bzbarsky]

Comment 4

•

22 years ago

intl.

Assignee: kmcclusk → yokoyama

Status: UNCONFIRMED → NEW

Component: GFX Compositor → Internationalization

Ever confirmed: true

QA Contact: petersen → ruixu

Jungshik Shin

Assignee

Comment 5

•

22 years ago

Keith Packard (a member of the XFree86 Core team and the maintainer of the
fontconfig package) went through the Unicode character table and came up with
a more extensive list of characters that are supposed to have a 'blank' visual
representation (empty outline). His original list came from the Mozilla source.
Below is the list taken from his email about the issue:


range              added to fc   comments
U+180B - U+180E    no            (but I don't have a Mongolian font to check against)
U+200C - U+200F    yes           (the Unicode description isn't clear)
U+2028 - U+2029    no            (these seem like they're supposed to be drawn)
U+202A - U+202F    yes           (these also appear blank from the description)
U+3164             yes           (HANGUL FILLER, similar to U+1160)
U+FEFF             yes           (byte order detector (ZERO WIDTH NO-BREAK SPACE))
U+FFA0             yes           HALFWIDTH HANGUL FILLER (similar to U+3164)
U+FFF9 - U+FFFB    yes           INTERLINEAR ANNOTATION marks for furigana

I guess some of the characters listed above are already taken care of by Mozilla
(e.g. ZWNBS/BOM), but I believe the others have to be added.

FYI, the related thread in XF86-font list begins at

http://www.xfree86.org/pipermail/fonts/2002-September/002099.html

Jungshik Shin

Assignee

Comment 6

•

22 years ago

Although deprecated, U+206A - U+206D appear to have to be included as well.
As for U+206E and U+206F, I'm not sure.

BTW, I'm wondering how these characters are handled in MacOS 9/X, gtk and X11.
At least in gtk, Mozilla doesn't have this problem rendering the page given
at the URL with the same truetype font(CODE2000). Are they handled at a higher
layer before reaching to the lower level of font access?

Rui Xu

Updated

•

22 years ago

QA Contact: ruixu → ylong

Jungshik Shin

Assignee

Comment 7

•

22 years ago

Changing the summary line because it's not just about the Hangul vowel filler
but also involves many other characters.
Also reassigning it to myself.

Assignee: yokoyama → jshin

Summary: U+1160(Hangul Vowel filler) is rendered as a question mark → Allowed blank(space) glyph list have to be updated

Jungshik Shin

Assignee

Comment 8

•

22 years ago

A simplistic patch for this bug is to just modify the macro that checks whether
a char. is allowed to be blank. However, as comment #5 shows, there are a few
too many of them for a simple macro. Would there be a better way
to deal with this list (a data structure?)?
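
To illustrate the question, one possible data structure is a sorted table of ranges searched with a binary search. This is only a sketch, not code from the Mozilla tree: the names `BlankRange`, `kBlankRanges` and `IsAllowedBlank` are hypothetical, and the table holds U+1160 plus the rows marked 'yes' in comment #5.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Inclusive range of BMP code points (hypothetical type).
struct BlankRange {
  uint16_t first;
  uint16_t last;
};

// Sorted, non-overlapping ranges: U+1160 from the first patch plus the
// "yes" rows of comment #5. Illustrative only, not Mozilla's actual list.
static const BlankRange kBlankRanges[] = {
  {0x1160, 0x1160},  // HANGUL JUNGSEONG FILLER
  {0x200C, 0x200F},
  {0x202A, 0x202F},
  {0x3164, 0x3164},  // HANGUL FILLER
  {0xFEFF, 0xFEFF},  // ZERO WIDTH NO-BREAK SPACE
  {0xFFA0, 0xFFA0},  // HALFWIDTH HANGUL FILLER
  {0xFFF9, 0xFFFB},  // INTERLINEAR ANNOTATION marks
};

// Returns true if ch is listed as allowed to have an empty glyph.
inline bool IsAllowedBlank(uint16_t ch) {
  const BlankRange* end = std::end(kBlankRanges);
  // Find the first range whose upper bound is not below ch.
  const BlankRange* it = std::lower_bound(
      std::begin(kBlankRanges), end, ch,
      [](const BlankRange& r, uint16_t c) { return r.last < c; });
  return it != end && it->first <= ch;
}
```

With this table, `IsAllowedBlank(0x1160)` is true and `IsAllowedBlank(0x2028)` is false. Adding a range means adding one row, at the cost of a logarithmic search per lookup instead of a chain of comparisons.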

Jungshik Shin

Assignee

Comment 9

•

22 years ago

Adding shanjian to CC to seek his opinion on the best way to represent
the list of blank characters as he was the last one to change the line
in question :-)

Jungshik Shin

Assignee

Comment 10

•

22 years ago

Attached patch: a new patch using CCMap (with a more extensive list of blank chars)

I ended up using CCMap. This may or may not be excessive for this
simple task. It seems to be all right, considering that the map is created
only once per session at the beginning and the CCMap accessor macro is fast.
Shanjian, can you review?
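
For readers unfamiliar with the idea, the following is a simplified sketch of a two-level bitmap that is filled once and then queried with two array lookups and a bit test. It is not the CCMap layout from nsCompressedCharMap.cpp, which is more compact; `BlankCharMap` and `MakeDefaultBlankMap` are hypothetical names, and the ranges are U+1160 plus the 'yes' rows from comment #5.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

// Simplified two-level bitmap over the BMP (hypothetical class).
class BlankCharMap {
  using Page = std::array<uint32_t, 8>;  // 256 bits, one per code point

 public:
  // Marks every code point in [first, last] as allowed to be blank.
  void SetRange(uint16_t first, uint16_t last) {
    assert(first <= last);
    // 32-bit counter so the loop also terminates when last == 0xFFFF.
    for (uint32_t ch = first; ch <= last; ++ch) {
      Page& page = PageFor(static_cast<uint16_t>(ch));
      page[(ch & 0xFF) >> 5] |= 1u << (ch & 0x1F);
    }
  }

  // Pages that were never touched stay null and answer false.
  bool Has(uint16_t ch) const {
    const std::unique_ptr<Page>& page = pages_[ch >> 8];
    if (!page) return false;
    return (((*page)[(ch & 0xFF) >> 5] >> (ch & 0x1F)) & 1u) != 0;
  }

 private:
  Page& PageFor(uint16_t ch) {
    std::unique_ptr<Page>& slot = pages_[ch >> 8];
    if (!slot) slot.reset(new Page());  // value-initialized to all zeros
    return *slot;
  }

  std::array<std::unique_ptr<Page>, 256> pages_;
};

// Builds the map once, e.g. at startup.
inline BlankCharMap MakeDefaultBlankMap() {
  BlankCharMap map;
  map.SetRange(0x1160, 0x1160);
  map.SetRange(0x200C, 0x200F);
  map.SetRange(0x202A, 0x202F);
  map.SetRange(0x3164, 0x3164);
  map.SetRange(0xFEFF, 0xFEFF);
  map.SetRange(0xFFA0, 0xFFA0);
  map.SetRange(0xFFF9, 0xFFFB);
  return map;
}
```

A caller would build the map once and then test each character with `Has(ch)`; only the pages that contain at least one listed character are allocated.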

Jungshik Shin

Assignee

Comment 11

•

22 years ago

A couple of issues to resolve:

  - Find out which characters currently in the list are reliably filtered out
    upstream (possibly in a platform-independent way) and remove them from the
    list. It seems that which chars are filtered out is not
    platform-independent (e.g. nsFontMetricsWin does not get U+115F from
    upstream, while nsFontMetricsXft gets it unfiltered. I can't check how
    this is handled on Mac).

  - Think about whether the list needs to be user-configurable (in prefs.js).
    Some fonts have _legitimate_ blank glyphs at code points in the PUA.
    Obviously, this cannot be hard-coded. With CCMap, it's easy to make this
    user-configurable.
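
As a rough illustration of the second point, the sketch below parses a user-supplied list of extra blank characters into ranges that could then be added to the map. The value format (comma-separated hex code points or ranges such as `1160,200C-200F,E000-E0FF`) and the names `CodeRange` and `ParseBlankList` are assumptions made for this example; no such pref exists in this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Inclusive BMP range (hypothetical type).
using CodeRange = std::pair<uint16_t, uint16_t>;

// Parses a comma-separated list of hex code points or ranges, e.g.
// "1160,200C-200F,E000-E0FF". Malformed or out-of-range items are skipped.
inline std::vector<CodeRange> ParseBlankList(const std::string& pref) {
  std::vector<CodeRange> ranges;
  std::size_t pos = 0;
  while (pos <= pref.size()) {
    std::size_t comma = pref.find(',', pos);
    if (comma == std::string::npos) comma = pref.size();
    const std::string item = pref.substr(pos, comma - pos);
    pos = comma + 1;

    char* end = nullptr;
    const unsigned long first = std::strtoul(item.c_str(), &end, 16);
    if (end == item.c_str()) continue;  // no hex digits at all
    unsigned long last = first;
    if (*end == '-') {
      const char* second = end + 1;
      last = std::strtoul(second, &end, 16);
      if (end == second) continue;  // nothing after the dash
    }
    // Reject trailing junk, reversed ranges and anything outside the BMP.
    if (*end != '\0' || first > last || last > 0xFFFF) continue;
    ranges.emplace_back(static_cast<uint16_t>(first),
                        static_cast<uint16_t>(last));
  }
  return ranges;
}
```

For example, `ParseBlankList("1160,E000-E0FF")` yields the two ranges U+1160 and U+E000 - U+E0FF. Silently skipping bad items keeps a typo in the pref from disabling the whole list, though a real implementation might prefer to log it.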

Shanjian Li

Comment 12

•

22 years ago

jshin,
Thanks a lot for doing this. Using CCMap is the right approach. This has been on
my mind for quite some time and I haven't found the time to do it.
I have a suggestion. Can you write a Perl tool to generate the CCMap in binary
form instead of generating it at run time? That will shrink the memory footprint
and improve startup performance. We will need to apply a similar approach in
several other places. (The punctuation mark check in layout is one example.)

Jungshik Shin

Assignee

Comment 13

•

22 years ago

Attached file: ccmapbin.pl

Shanjian,
Attached is a simple Perl tool to generate a PRUint16 array for CCMap.
Actually, it generates three files for LE/BE(16bit), BE(32bit) and BE(64bit).
I tested the result (with a simple test program modified from printCCMap()
in nsCompressedCharMap.cpp) on ix86 (32bit LE), Alpha (64bit LE), Sparc (32bit
BE), and PA-Risc (32bit BE) and it worked fine. I couldn't find a 64bit BE
machine (the PA-Risc machine I used is 64bit but its long is only 32bit), but
I believe it should work well there, too.

Can you tell me where else we need this (nsFontMetricsGTK.cpp
is one of them)? Perhaps I'll file a new bug (to put a 'precompiled CCMap' in
place of the character list) and make this bug dependent on it.

BTW, currently, it just works on BMP, but can be extended easily.
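
The general idea of precompiling can be sketched as a small generator that packs a character list into 16-bit words and prints them as source for a static array. This is not ccmapbin.pl and does not reproduce the CCMap layout or the separate outputs per byte order and word size mentioned above; `PackPage` and `EmitArray` are hypothetical helpers and the one-bit-per-code-point page layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Packs the 256 code points of one page (page == ch >> 8) into sixteen
// 16-bit words, one bit per code point. Characters on other pages are ignored.
inline std::vector<uint16_t> PackPage(uint8_t page,
                                      const std::vector<uint16_t>& chars) {
  std::vector<uint16_t> words(16, 0);
  for (uint16_t ch : chars) {
    if ((ch >> 8) != page) continue;
    const unsigned low = ch & 0xFF;
    words[low >> 4] =
        static_cast<uint16_t>(words[low >> 4] | (1u << (low & 0xF)));
  }
  return words;
}

// Formats the words as source text for a static table, eight per line.
inline std::string EmitArray(const std::string& name,
                             const std::vector<uint16_t>& words) {
  std::string out = "static const uint16_t " + name + "[" +
                    std::to_string(words.size()) + "] = {\n";
  for (std::size_t i = 0; i < words.size(); ++i) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%04X,",
                  static_cast<unsigned>(words[i]));
    out += (i % 8 == 0) ? "  " : " ";
    out += buf;
    if (i % 8 == 7 || i + 1 == words.size()) out += "\n";
  }
  out += "};\n";
  return out;
}
```

Run at build time, the output of `EmitArray` would be written to a header and compiled in, so nothing has to be built at startup and the table can live in read-only data.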

Shanjian Li

Comment 14

•

22 years ago

Thanks for your great work!!!

Punctuation checking in nsTextFrame is a sure thing:
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsTextFrame.cpp#4645

The CJK and Hangul check in the line breaker is questionable:
http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/src/nsJISx4501LineBreaker.cpp

I am sure that we will need this in some other places now and in future.

Jungshik Shin

Assignee

Comment 15

•

22 years ago

Shanjian, 
Thank you for your kind words.

I filed a new bug 180266 for this and am going to make this bug depend on it.
I didn't have to, but it seems like it's more 'conceptually' clear...

I added Shanjian to the CC of bug 180266, and anyone here
is welcome to add her/himself to the CC list there.

> I am sure that we will need this in some other places now and in future. 

So am I :-) In particular, I guess Mozilla may need to look up a Unicode
character class table in several places (line breaking, rendering/layout, bidi,
text boundary identification, editing, search and replace, etc.).
Most TRs at http://www.unicode.org/reports appear relevant to
adopting this approach in Mozilla in one way or another
(UTR 14, UTR 29, UTR 9, UTR 13 to name just a few)...

Depends on: 180266

Jungshik Shin

Assignee

Comment 16

•

21 years ago

180266 was just resolved and accordingly this is, too.

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED
