Bug 118589 (Closed)
Opened 23 years ago, closed 15 years ago
do not draw ? as fallback for Non-Spacing, and control characters
Categories: Core :: Internationalization, defect
Tracking: RESOLVED WONTFIX, Target Milestone: mozilla1.2alpha
People: Reporter: ftang, Assignee: smontagu
References: Blocks 1 open bug
Details: Keywords: intl, Whiteboard: [eta 8/25]
Attachments: 1 file (3.01 KB, text/plain)
We currently draw '?' for every character for which we do not have a glyph in a font. We
should draw nothing, and measure as zero, for all characters in the following
Unicode Character Database general categories:
Mn Mark, Non-Spacing
Cc Other, Control
Cf Other, Format
Lm Letter, Modifier
We could have an XP (cross-platform) ccharmap and use it on all 3 platforms to implement
such a fallback.
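A minimal sketch of the proposed rule, assuming Python's `unicodedata` as the source of general categories. It reflects a much newer Unicode version than the one Mozilla used at the time, so individual assignments may differ. `fallback` and `question_mark_width` are hypothetical names, not Gecko APIs.

```python
import unicodedata

# General categories the report proposes to render as nothing:
# Mn (Mark, Non-Spacing), Cc (Other, Control),
# Cf (Other, Format), Lm (Letter, Modifier).
ZERO_WIDTH_CATEGORIES = frozenset({"Mn", "Cc", "Cf", "Lm"})


def fallback(ch: str, question_mark_width: int = 7) -> tuple[str, int]:
    """Return (text_to_draw, advance_width) for a character that has no glyph.

    question_mark_width is a hypothetical advance width for '?'.
    """
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return "", 0  # proposed: draw nothing, measure as zero
    return "?", question_mark_width  # current behaviour: draw '?'


print(fallback("\u05B0"))  # HEBREW POINT SHEVA (Mn) -> ('', 0)
print(fallback("\u05D0"))  # HEBREW LETTER ALEF (Lo) -> ('?', 7)
```

A purely category-based rule also catches U+0640 ARABIC TATWEEL, which is Lm.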

Comment 1 • 23 years ago (Reporter)

Comment 2 • 23 years ago (Reporter)
simon: I think we could use this to solve the problem of ZWJ and ZWNJ being shown.
We can also turn OFF the display of Hebrew and Arabic marks as '?'.
shanjian: how can we build the ccharmap as a bitmap at compile time (or before check-in)?
For these 582 Unicode code points, we should check right before we draw / measure
as '?'. If the character is one of them, measure it as 0,0 in GetTextDimension and
draw nothing.
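A sketch of the "check right before drawing or measuring as '?'" step, using a bitmap computed ahead of time as a stand-in for a compile-time table. The code point list is a hypothetical sample (a few Hebrew points plus ZWNJ and ZWJ) because the attachment's 582-entry list is not available in this thread. `measure`, `has_glyph` and `glyph_width` are illustrative names, not Gecko's `GetTextDimension`.

```python
from typing import Callable

# Hypothetical sample; a real table would hold the attachment's code points.
ZERO_WIDTH_CODE_POINTS = [0x05B0, 0x05B4, 0x05B7, 0x05B8, 0x200C, 0x200D]


def build_bitmap(code_points: list[int]) -> bytearray:
    """One bit per BMP code point (8 KB), as a build step might emit it."""
    bits = bytearray(0x10000 // 8)
    for cp in code_points:
        bits[cp >> 3] |= 1 << (cp & 7)
    return bits


BITMAP = build_bitmap(ZERO_WIDTH_CODE_POINTS)


def is_zero_width(cp: int) -> bool:
    return cp <= 0xFFFF and bool(BITMAP[cp >> 3] & (1 << (cp & 7)))


def measure(text: str,
            has_glyph: Callable[[str], bool],
            glyph_width: Callable[[str], int],
            question_width: int) -> int:
    """Sum advance widths; a missing glyph costs a '?' unless it is zero-width."""
    total = 0
    for ch in text:
        if has_glyph(ch):
            total += glyph_width(ch)
        elif not is_zero_width(ord(ch)):
            total += question_width  # still falls back to '?'
        # otherwise: zero-width fallback, nothing drawn or measured
    return total


# Pretend only ASCII has glyphs and every glyph is 5 units wide.
print(measure("a\u05B0b", str.isascii, lambda ch: 5, 7))  # 10
print(measure("a\u05D0b", str.isascii, lambda ch: 5, 7))  # 17
```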

Comment 3 • 23 years ago
Frank, should bug 106311 be duped into this?

Comment 4 • 23 years ago (Reporter)
shanjian said the compressed cmap would be more than 1K. We should probably
just use an array with binary search, since we don't care much about
performance here.
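A sketch of the sorted-array alternative using Python's `bisect`. The table contents are a hypothetical excerpt. `worst_case_probes` shows why a table of about 582 entries needs at most 10 comparisons.

```python
from bisect import bisect_left

# Hypothetical excerpt of a sorted table; a real one would list every
# zero-width code point.
ZERO_WIDTH_SORTED = (0x0591, 0x0592, 0x05B0, 0x05B4, 0x05BC, 0x064B, 0x0651,
                     0x200C, 0x200D, 0x200E, 0x200F)


def is_zero_width(cp: int) -> bool:
    """Binary search the sorted table for an exact match."""
    i = bisect_left(ZERO_WIDTH_SORTED, cp)
    return i < len(ZERO_WIDTH_SORTED) and ZERO_WIDTH_SORTED[i] == cp


def worst_case_probes(n: int) -> int:
    """Maximum comparisons a binary search over n sorted entries can need."""
    return n.bit_length()


print(worst_case_probes(582))  # 10
```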

Updated • 23 years ago (Reporter)
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9

Comment 5 • 23 years ago
Frank,
I calculated the size of the ccmap manually (it is not very hard to do). Such a
ccmap should be around 964 bytes. Its advantage is fast access: there are only 3 memory
references before getting the result, whereas a binary search takes up to 10
comparisons. So the ccmap approach might still be the better idea if this kind of lookup
is done frequently in certain user environments.
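The "3 memory references" can be seen in a compressed map built in the same spirit. This sketch uses a 16-entry upper index (bits 15 to 12), 16-entry mid blocks (bits 11 to 8), and 256-bit pages stored as sixteen 16-bit words, with empty blocks shared. That layout is an assumption for illustration, not a copy of Mozilla's CCMAP code. `build_ccmap` and `ccmap_has` are hypothetical names.

```python
EMPTY_MID = 16    # offset of the shared mid block whose entries all point at EMPTY_PAGE
EMPTY_PAGE = 32   # offset of the shared all-zero page


def build_ccmap(code_points):
    """Build a flat list of 16-bit words: upper index, mid blocks, pages."""
    words = [EMPTY_MID] * 16 + [EMPTY_PAGE] * 16 + [0] * 16
    for cp in sorted(set(code_points)):
        if not 0 <= cp <= 0xFFFF:
            raise ValueError(f"U+{cp:X} is outside the BMP")
        upper, mid_index = cp >> 12, (cp >> 8) & 0xF
        if words[upper] == EMPTY_MID:             # allocate a private mid block
            words[upper] = len(words)
            words.extend([EMPTY_PAGE] * 16)
        mid = words[upper]
        if words[mid + mid_index] == EMPTY_PAGE:  # allocate a private page
            words[mid + mid_index] = len(words)
            words.extend([0] * 16)
        page = words[mid + mid_index]
        words[page + ((cp & 0xFF) >> 4)] |= 1 << (cp & 0xF)
    if len(words) > 0x10000:
        raise OverflowError("offsets no longer fit in 16 bits")
    return words


def ccmap_has(words, cp):
    """Three dependent loads: upper index -> mid block -> page word."""
    mid = words[cp >> 12]
    page = words[mid + ((cp >> 8) & 0xF)]
    return bool(words[page + ((cp & 0xFF) >> 4)] & (1 << (cp & 0xF)))


ccmap = build_ccmap([0x05B0, 0x200D])
print(ccmap_has(ccmap, 0x05B0), ccmap_has(ccmap, 0x05D0))  # True False
print(len(ccmap) * 2, "bytes")  # 224 bytes
```

Each distinct 4096-character block and each distinct 256-character page costs 32 bytes, which is how the size of such a map can be worked out by hand.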

Comment 6 • 23 years ago
The map should be a singleton (i.e. only one instance).

Comment 7 • 23 years ago
frank,
My answer to your second question was incorrect. I double-checked the implementation
of CCMAP. It allows several ALU_TYPEs (i.e. 16-bit, 32-bit, 64-bit) to initialize and
access the ccmap. So if we always use 16-bit integers to initialize the array (ccmap), there
will be a difference between BE and LE when ALU_TYPE is 32 or 64 bits.
We should probably get rid of ALU_TYPE. I need to talk to brian.
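The problem can be simulated with a table written as 16-bit units but read back with a 32-bit load. `load32` is a hypothetical helper. The `struct` byte-order prefixes stand in for running on little-endian versus big-endian hardware.

```python
import struct


def load32(words16: list[int], index: int, byteorder: str) -> int:
    """Fill memory with 16-bit stores, then read it back with a 32-bit load.

    byteorder is "<" (little-endian) or ">" (big-endian).
    """
    raw = struct.pack(f"{byteorder}{len(words16)}H", *words16)
    (value,) = struct.unpack_from(f"{byteorder}I", raw, index * 4)
    return value


# The table author set bit 0 of the first 16-bit unit, meaning "code point 0".
words = [0x0001, 0x0000]
print(hex(load32(words, 0, "<")))  # 0x1: bit 0, as intended
print(hex(load32(words, 0, ">")))  # 0x10000: bit 16 on a big-endian machine
```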

Comment 8 • 23 years ago
Brendan asked specifically for variable-sized (ALU) access.
Let's discuss the pluses and minuses of the various options before we make a
decision.

Comment 9 • 23 years ago
If we initialize a ccmap using PRUint16 stores, with bit-setting within 16-bit
units, but access using wider loads, then shanjian is right and we'll definitely
care about byte order (or PRUint16-order within the 32- or 64-bit units). But
can we not use the wider (ALU_TYPE) accesses always, whenever loading or storing?
/be

Comment 10 • 23 years ago
>>But can we not use the wider (ALU_TYPE) accesses always, whenever loading or
>>storing?
Brendan, are you suggesting that we disable wider accesses only in certain
situations (loading or storing)? CCMAP has a flag field we could use, but that would
add one more memory reference to each access.

Comment 11 • 23 years ago
I talked with brian yesterday, and we came up with 3 possible solutions:
1) Dynamically generate the CCMAP
   plus:  We can still use wider access in the ccmap.
          A static initialization array is more readable and easier to maintain.
   minus: There is an additional (runtime) dependency on the ccmap library.
          There is a small additional runtime cost.
2) Totally disable CCMAP wider access, and initialize the ccmap directly
   plus:  All ccmap accesses are just macros, so there is no runtime dependency.
          The CCMAP code will be less complicated, and thus easier to maintain.
   minus: The benefit of wider access is lost.
3) Create a new indexed array similar to ccmap
   plus:  It might have the smallest footprint.
   minus: We would be reinventing the wheel.

Comment 12 • 23 years ago
No, I was suggesting that you use wide accesses for all loads and stores from
the map, where the current code uses a wide access some of the time. Where the
current code uses only PRUint16 accesses, no need to change. But first perhaps
we can ascertain the performance gain of wider accesses?
/be
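Brendan's suggestion can be illustrated by routing every store and every load through the same ALU-width unit. The bit for a code point then always lands where it is read back, regardless of byte order. `WideBitmap` is a hypothetical class, and the byte-order prefix again simulates the machine's endianness.

```python
import struct

UNIT_FORMATS = {16: "H", 32: "I", 64: "Q"}


class WideBitmap:
    """Bitmap whose loads and stores always use the same ALU-width unit."""

    def __init__(self, nbits: int, alu_bits: int = 32, byteorder: str = "<"):
        self.fmt = byteorder + UNIT_FORMATS[alu_bits]
        self.alu_bits = alu_bits
        self.unit_bytes = alu_bits // 8
        units = -(-nbits // alu_bits)  # ceiling division
        self.buf = bytearray(units * self.unit_bytes)

    def _locate(self, bit: int) -> tuple[int, int]:
        """Byte offset of the unit holding `bit`, and the mask within that unit."""
        return (bit // self.alu_bits) * self.unit_bytes, 1 << (bit % self.alu_bits)

    def set(self, bit: int) -> None:
        offset, mask = self._locate(bit)
        (unit,) = struct.unpack_from(self.fmt, self.buf, offset)   # wide load
        struct.pack_into(self.fmt, self.buf, offset, unit | mask)  # wide store

    def test(self, bit: int) -> bool:
        offset, mask = self._locate(bit)
        (unit,) = struct.unpack_from(self.fmt, self.buf, offset)   # wide load
        return bool(unit & mask)


for order in ("<", ">"):
    bitmap = WideBitmap(0x10000, alu_bits=64, byteorder=order)
    bitmap.set(0x05B0)
    print(order, bitmap.test(0x05B0), bitmap.test(0x05B1))  # True False both times
```

The trade-off is that any table emitted at build time must be written in the same unit width and byte order it will be read with.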

Comment 13 • 23 years ago (Reporter)
Moving this to a mozilla1.1 item.
Target Milestone: mozilla0.9.9 → mozilla1.1

Comment 14 • 22 years ago
*** Bug 152958 has been marked as a duplicate of this bug. ***

Updated • 22 years ago (Reporter)
Target Milestone: mozilla1.1alpha → ---

Comment 15 • 22 years ago (Reporter)
OK, I've changed my position.
1. First of all, I think we already have a solution. Instead of inventing a new method,
we can add empty entries to /intl/unicharutil/tables/transliterate.properties to solve
this problem. It works on Windows now, at least; I should try it on Mac and Linux.
For example, if Linux and Mac do not have the Hebrew vowel signs, adding the corresponding
lines to /intl/unicharutil/tables/transliterate.properties will probably turn off vowel
sign rendering when the font is not there, instead of displaying a ? mark.
2. I think my attachment listing which Unicode characters could be treated this way is
wrong. Some characters should not be displayed as nothing; we need to display them
as ? instead.
>Frank, should bug 106311 be duped into this?
No. This bug is about displaying a character as nothing, instead of as a question mark,
when we cannot display it with a valid glyph.
In bug 106311, those characters are not displayed as a question mark; they are
displayed with a glyph that claims to be the glyph for ASCII 0x11. That is a totally
different issue.
This one is about how we handle fallback; that one is about how we decide which glyphs
in a valid TrueType font are invalid.
For now, we know that we probably want to address the following characters:
1. Hebrew accents and points
2. Arabic points
3. Bidi control characters
smontagu, is that right?

Comment 16 • 22 years ago (Reporter)
smontagu: can you give me a list of the Hebrew/Arabic characters for which you think we
should display nothing instead of ? when no font has a glyph for them?
Whiteboard: [eta 8/25]
Target Milestone: --- → mozilla1.2alpha

Comment 17 • 22 years ago (Assignee)
In the following list, I am sure about the Hebrew characters, but it would be
good if someone could give a second opinion about the Arabic.
0591;HEBREW ACCENT ETNAHTA
0592;HEBREW ACCENT SEGOL
0593;HEBREW ACCENT SHALSHELET
0594;HEBREW ACCENT ZAQEF QATAN
0595;HEBREW ACCENT ZAQEF GADOL
0596;HEBREW ACCENT TIPEHA
0597;HEBREW ACCENT REVIA
0598;HEBREW ACCENT ZARQA
0599;HEBREW ACCENT PASHTA
059A;HEBREW ACCENT YETIV
059B;HEBREW ACCENT TEVIR
059C;HEBREW ACCENT GERESH
059D;HEBREW ACCENT GERESH MUQDAM
059E;HEBREW ACCENT GERSHAYIM
059F;HEBREW ACCENT QARNEY PARA
05A0;HEBREW ACCENT TELISHA GEDOLA
05A1;HEBREW ACCENT PAZER
05A3;HEBREW ACCENT MUNAH
05A4;HEBREW ACCENT MAHAPAKH
05A5;HEBREW ACCENT MERKHA
05A6;HEBREW ACCENT MERKHA KEFULA
05A7;HEBREW ACCENT DARGA
05A8;HEBREW ACCENT QADMA
05A9;HEBREW ACCENT TELISHA QETANA
05AA;HEBREW ACCENT YERAH BEN YOMO
05AB;HEBREW ACCENT OLE
05AC;HEBREW ACCENT ILUY
05AD;HEBREW ACCENT DEHI
05AE;HEBREW ACCENT ZINOR
05AF;HEBREW MARK MASORA CIRCLE
05B0;HEBREW POINT SHEVA
05B1;HEBREW POINT HATAF SEGOL
05B2;HEBREW POINT HATAF PATAH
05B3;HEBREW POINT HATAF QAMATS
05B4;HEBREW POINT HIRIQ
05B5;HEBREW POINT TSERE
05B6;HEBREW POINT SEGOL
05B7;HEBREW POINT PATAH
05B8;HEBREW POINT QAMATS
05B9;HEBREW POINT HOLAM
05BB;HEBREW POINT QUBUTS
05BC;HEBREW POINT DAGESH OR MAPIQ
05BD;HEBREW POINT METEG
05BF;HEBREW POINT RAFE
05C1;HEBREW POINT SHIN DOT
05C2;HEBREW POINT SIN DOT
05C4;HEBREW MARK UPPER DOT
0640;ARABIC TATWEEL
064B;ARABIC FATHATAN
064C;ARABIC DAMMATAN
064D;ARABIC KASRATAN
064E;ARABIC FATHA
064F;ARABIC DAMMA
0650;ARABIC KASRA
0651;ARABIC SHADDA
0652;ARABIC SUKUN
0653;ARABIC MADDAH ABOVE
0654;ARABIC HAMZA ABOVE
0655;ARABIC HAMZA BELOW
0670;ARABIC LETTER SUPERSCRIPT ALEF
06D6;ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA
06D7;ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA
06D8;ARABIC SMALL HIGH MEEM INITIAL FORM
06D9;ARABIC SMALL HIGH LAM ALEF
06DA;ARABIC SMALL HIGH JEEM
06DB;ARABIC SMALL HIGH THREE DOTS
06DC;ARABIC SMALL HIGH SEEN
06DF;ARABIC SMALL HIGH ROUNDED ZERO
06E0;ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO
06E1;ARABIC SMALL HIGH DOTLESS HEAD OF KHAH
06E2;ARABIC SMALL HIGH MEEM ISOLATED FORM
06E3;ARABIC SMALL LOW SEEN
06E4;ARABIC SMALL HIGH MADDA
06E7;ARABIC SMALL HIGH YEH
06E8;ARABIC SMALL HIGH NOON
06EA;ARABIC EMPTY CENTRE LOW STOP
06EB;ARABIC EMPTY CENTRE HIGH STOP
06EC;ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE
06ED;ARABIC SMALL LOW MEEM
FB1E;HEBREW POINT JUDEO-SPANISH VARIKA
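One compact way to hold this list is as sorted inclusive ranges searched with `bisect`. This is only a sketch, not the table any Mozilla build shipped. It keeps U+0640 because the list above does; Comments 19 to 22 argue for dropping it, which would mean deleting the `(0x0640, 0x0640)` entry. `draw_nothing` is a hypothetical name.

```python
from bisect import bisect_right

# Comment 17's list collapsed into inclusive (first, last) ranges.
# U+0640 ARABIC TATWEEL is included exactly as listed.
RANGES = [
    (0x0591, 0x05A1), (0x05A3, 0x05B9), (0x05BB, 0x05BD), (0x05BF, 0x05BF),
    (0x05C1, 0x05C2), (0x05C4, 0x05C4),
    (0x0640, 0x0640), (0x064B, 0x0655), (0x0670, 0x0670), (0x06D6, 0x06DC),
    (0x06DF, 0x06E4), (0x06E7, 0x06E8), (0x06EA, 0x06ED),
    (0xFB1E, 0xFB1E),
]
STARTS = [first for first, _ in RANGES]


def draw_nothing(cp: int) -> bool:
    """True if cp falls inside one of the listed ranges."""
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    return i >= 0 and cp <= RANGES[i][1]


print(sum(last - first + 1 for first, last in RANGES))  # 80 code points
```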

Comment 18 • 22 years ago
The Arabic looks right to me.

Comment 19 • 22 years ago
The only one I'm not sure about is U+0640 ARABIC TATWEEL. It's a semi-letter, semi-control
character, and is also sometimes used as a dingbat. I would prefer to remove it from the list.

Comment 20 • 22 years ago
Well, what other implications are there of keeping it on the list? If all there
is to it is that the fallback simply ignores it, then it certainly should
be on the list. After all, the TATWEEL has no real significance (it's a
formatting character used to elongate a word).
But if there's something else I'm missing, then please do enlighten me ;)

Comment 21 • 22 years ago
Well, Tatweel is sometimes used as a hyphen in Persian (the hyphen glyph in many
fonts is a little high for Arabic text), and it is sometimes used as a bullet, ...
These cases are not ignorable, and I would prefer seeing a question mark in these
places rather than nothing, so that I can tell there is a font problem.

Comment 22 • 22 years ago
I am now enlightened, thanks ;) I changed my mind, I would rather not see the
U+0640 in that list.

Comment 23 • 20 years ago (Reporter)
What a hack. I haven't touched Mozilla code for 2 years, and I haven't read these bugs
for 2 years. And they are still there. Just close them as WONTFIX to clean up.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX

Comment 24 • 20 years ago
This issue is being dealt with in bug 205387
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Comment 25 • 20 years ago
Mass reassign. Please excuse the spam.
Assignee: ftang → nobody
Status: REOPENED → NEW

Comment 26 • 20 years ago (Assignee)
This has wider scope than bug 205387.
Assignee: nobody → smontagu
Depends on: 205387

Updated • 15 years ago
QA Contact: amyy → i18n

Comment 27 • 15 years ago (Assignee)
Very, very WONTFIX: displaying nothing for a given character can be used as a phishing vector.
Status: NEW → RESOLVED
Closed: 20 years ago → 15 years ago
Resolution: --- → WONTFIX
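The phishing concern can be illustrated with a hypothetical `rendered_as` helper that models the rejected rule. It assumes no font has a glyph for any character in the proposed categories, while ordinary letters render normally. Under the proposal such characters vanish, so two different strings look identical on screen; with the '?' fallback the difference stays visible.

```python
import unicodedata

ZERO_WIDTH_CATEGORIES = frozenset({"Mn", "Cc", "Cf", "Lm"})


def rendered_as(text: str, fallback: str) -> str:
    """What a reader would see if no font had a glyph for the listed categories.

    fallback="" models the rejected proposal; fallback="?" models current behaviour.
    """
    return "".join(
        fallback if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES else ch
        for ch in text
    )


spoof = "exa\u0307mple.com"  # U+0307 COMBINING DOT ABOVE (Mn) hidden inside
print(rendered_as(spoof, ""))   # example.com (indistinguishable from the real name)
print(rendered_as(spoof, "?"))  # exa?mple.com (the extra character is exposed)
```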