Closed
Bug 204993
Opened 21 years ago
Closed 1 month ago
add transliteration to Xft
Categories: Core Graveyard :: GFX, defect
Tracking: Not tracked
Status: RESOLVED INCOMPLETE
People: Reporter: rbs, Assigned: jshin1987
GfxGTK with core X11 fonts, as well as GfxWin, does what is called "transliteration": when a glyph isn't found for a character, the code substitutes another fallback character or string. For example, if there is no glyph for the euro currency symbol, the font engine uses the string "EUR" as a substitute. There is a transliteration API that provides such predefined mappings. However, using it here requires supporting another class, |nsFontXftSubstitute|, to host the transliterator. See bug 454 for how it was done in GfxGTK; another example, of how it was hooked into GfxWin, can be found in bug 33498. Transliteration helps mitigate the excessive number of question marks/dominos for missing glyphs. It also helps remap characters without a visual representation to nothing. For example, there is no point showing a domino for U+200B ZERO WIDTH SPACE; rather, it can be transliterated to nothing.
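To make the mechanism concrete, here is a minimal C++ sketch of the fallback order described above: use the font's glyph if it has one, otherwise a transliteration, otherwise the unknown-glyph box. The table contents, the function names and the use of U+FFFD as a placeholder for the box are illustrative assumptions; this is not the nsISaveAsCharset API or Mozilla's actual transliteration data.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical transliteration table (not Mozilla's actual data): maps a
// character with no available glyph to a fallback string. An empty string
// means "render nothing".
const std::map<char32_t, std::u32string> kTranslitTable = {
    {U'\u20AC', U"EUR"},  // EURO SIGN
    {U'\u200B', U""},     // ZERO WIDTH SPACE -> nothing
    {U'\u2496', U"15."},  // NUMBER FIFTEEN FULL STOP
};

// Returns the fallback for `ch`, or std::nullopt if there is none.
std::optional<std::u32string> Transliterate(char32_t ch) {
  auto it = kTranslitTable.find(ch);
  if (it == kTranslitTable.end()) return std::nullopt;
  return it->second;
}

// Keeps characters the font covers, transliterates missing ones, and uses
// U+FFFD as a stand-in for the "unknown glyph" box when no fallback exists.
// Assumes the fallback strings themselves (plain ASCII here) are covered.
template <typename HasGlyphFn>
std::u32string SubstituteMissing(const std::u32string& text, HasGlyphFn hasGlyph) {
  std::u32string out;
  for (char32_t ch : text) {
    if (hasGlyph(ch)) {
      out += ch;
    } else if (auto fallback = Transliterate(ch)) {
      out += *fallback;
    } else {
      out += U'\uFFFD';
    }
  }
  return out;
}
```

With a font that covers only ASCII, "5" followed by the euro sign comes out as "5EUR", U+200B disappears, and a CJK character with no table entry falls through to the box.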
Assignee
Comment 1 • 21 years ago
Re bug 128153 comment 65:
> It is a lot easier to turn that into nothingness in the font engine
> than higher up.
Now that you've said that, I fully agree with you about the ease. However, where it ought to be done may not have so clear-cut an answer in every case. For some characters, such as ZWNJ/ZWJ in Indic script rendering, it's certainly the font engine that has to take care of them, in a context-sensitive and engine-dependent manner (see bug 202352; in bug 192088, we tried to deal with them higher up in nsTextFrame and broke Indic script rendering). However, cases like U+2062 might (or might not) need somewhat different treatment. If no font covers invisible characters, the transliteration approach works well. However, when some 'broken'(?) fonts have visible glyphs for invisible characters (as on my system; I don't know which font is to blame), that does not work. We might need to know the 'authorial intent' (if that kind of authorial intent is allowed; this is a big if), and the font engine may not be the best place to figure that out in some cases...
Perhaps what we can do now is make the font engine do a bit more. In addition to using nsFontSubStitute for unknown characters, it also has to turn invisible characters with no visual effect at all into nothingness, even if they have visible glyphs in some (broken?) fonts. This excludes characters like ZWJ/ZWNJ but includes characters like U+2062. (Having said that, I'm not 100% sure whether 'U+0061 U+0062' and 'U+0061 U+2062 U+0062' are supposed to be rendered identically.) ... There might even be an OpenType GSUB feature, meant to be invoked in a controlled manner, that turns the invisible into the visible. Well, I'm getting off track a bit...
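A minimal C++ sketch of the proposal above, assuming a hand-picked list of format characters (hypothetical, and not the Unicode Default_Ignorable_Code_Point property) that the font engine drops before glyph lookup, even when some font has visible glyphs for them. ZWJ/ZWNJ are deliberately left alone so that the shaper still sees them.

```cpp
#include <cassert>
#include <string>

// Hypothetical, illustrative list of invisible characters with no visual
// effect. ZWNJ (U+200C) and ZWJ (U+200D) are NOT listed: they affect
// shaping (e.g. in Indic scripts) and must reach the shaper.
bool IsInvisibleNoEffect(char32_t ch) {
  switch (ch) {
    case 0x200B:  // ZERO WIDTH SPACE
    case 0x2060:  // WORD JOINER
    case 0x2061:  // FUNCTION APPLICATION
    case 0x2062:  // INVISIBLE TIMES
    case 0x2063:  // INVISIBLE SEPARATOR
    case 0xFEFF:  // ZERO WIDTH NO-BREAK SPACE
      return true;
    default:
      return false;
  }
}

// Removes the listed characters from a run before glyph lookup, so a font
// with a visible glyph for, say, U+2062 never gets asked to draw it.
std::u32string StripInvisible(const std::u32string& text) {
  std::u32string out;
  out.reserve(text.size());
  for (char32_t ch : text)
    if (!IsInvisibleNoEffect(ch)) out += ch;
  return out;
}
```

Whether 'U+0061 U+2062 U+0062' should then render exactly like 'U+0061 U+0062' is the open question raised above; this sketch simply assumes it should.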
Comment 2 • 21 years ago
Isn't it possible to just fall back to the transliterator before falling back to the unknown glyph, assuming the flag is set? Am I reading the patch correctly? Personally, I don't think that should require a whole new type, but I haven't thought about it at length.
It is possible to embed it without another class; that's precisely what you are doing at present with the |MiniFont|. But it is a lot easier to read and maintain down the track if it is handled separately. All the logic moves into that one class, which is treated just like any other nsFontXft. In particular, the enumerator callbacks become clean and tidy, since |if (!aFont)| would then mean a genuine out-of-memory condition (or something like that) rather than a nonexistent font, which requires further special-casing sprinkled here and there. Essentially, it supersedes the |MiniFont| and does the extras.
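The design argument can be sketched in C++ as follows. The class names FontXft, RangeFont and FontXftSubstitute are hypothetical stand-ins, not Mozilla's nsFontXft hierarchy. The point is that a substitute font which claims every character can sit at the end of the font list, so the lookup loop needs no special case for "no font found".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class standing in for a font in the fallback list.
class FontXft {
 public:
  virtual ~FontXft() = default;
  virtual bool HasChar(char32_t ch) const = 0;
  virtual std::string Name() const = 0;
};

// A real font, modelled here as covering one contiguous range.
class RangeFont : public FontXft {
 public:
  RangeFont(std::string name, char32_t lo, char32_t hi)
      : name_(std::move(name)), lo_(lo), hi_(hi) {}
  bool HasChar(char32_t ch) const override { return ch >= lo_ && ch <= hi_; }
  std::string Name() const override { return name_; }

 private:
  std::string name_;
  char32_t lo_, hi_;
};

// The substitute claims every character: it would transliterate what it can
// and draw the unknown-glyph box for the rest (absorbing the MiniFont role).
class FontXftSubstitute : public FontXft {
 public:
  bool HasChar(char32_t) const override { return true; }
  std::string Name() const override { return "substitute"; }
};

// Because the substitute is always last and always matches, a null result
// can only mean something went wrong (e.g. the list was never built).
const FontXft* FindFontFor(const std::vector<std::unique_ptr<FontXft>>& fonts,
                           char32_t ch) {
  for (const auto& font : fonts)
    if (font->HasChar(ch)) return font.get();
  return nullptr;
}
```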
Assignee
Comment 4 • 21 years ago
This feature is not in as much demand as it was in the late 1990s, when the font
situation for POSIX/X11 was far worse than now. To take just one example,
ISO-8859-15 was new and fonts with the Euro sign (let alone wider Unicode
coverage) were not commonly available. Back then (and even now for Moz-X11core),
installing more fonts with wider coverage meant a significant slowdown. Anyone
who visited UTF-8 pages with Mozilla-X11 would know what I mean: it takes a
700 MHz Pentium III machine more than a minute to load a Google search result
page (in UTF-8) if Moz-X11core is run in a UTF-8 locale and one has a lot of
large X11 'BDFs' [0] (CJK or iso10646-1 encoding) installed. These days, with
Mozilla-Xft and TTFs, there is no performance penalty for installing more
fonts. Even with a number of fonts with pan-Unicode coverage, it takes a second
or less, so it is not unreasonable to require that fonts covering the scripts
of interest be installed to avoid 'dominos'. Moreover, thanks to fontconfig,
it is very easy to add fonts: just dropping a font into the fontconfig search
path does the job. On most standard Linux distros, it should be rare to see an
excessive number of dominos. If one does, that is a clear indication that more
fonts need to be installed. Transliteration support is a good guard against
that (and perhaps good for small embedded systems), but one can't keep putting
off installing more fonts by relying on transliteration.
Having said that, I think implementing this can build upon the patch for bug
176290, sharing some routines and generally following what's done in GfxGTK
with X11core. However, we should set the fallback to 'NONE' so that only genuine
transliteration is used, and resort to drawing 'unknown glyphs' when
nsSaveAsCharset::Convert fails, because the current unknown-glyph approach is
better than any of the fallbacks offered by nsISaveAsCharset (hexadecimal,
decimal or '?').[1] This means that nsSaveAsCharset::Convert has to be called on
a per-character basis (in nsXftFontSubstitute::HasChar or equivalent), and the
results might need to be cached for speed (and later use).
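A minimal C++ sketch of calling the converter per character and caching the result. The converter is passed in as a hypothetical callback rather than the real nsSaveAsCharset::Convert signature; it is assumed to return std::nullopt when no genuine transliteration exists (the 'NONE' fallback), in which case the caller draws the unknown glyph. Negative results are cached too, since missing characters tend to repeat.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

class TranslitCache {
 public:
  // Hypothetical per-character converter: a genuine transliteration, or
  // std::nullopt when none exists (no hex/decimal/'?' fallback).
  using Converter = std::function<std::optional<std::u32string>(char32_t)>;

  explicit TranslitCache(Converter convert) : convert_(std::move(convert)) {}

  // HasChar-style query: false means "draw the unknown glyph instead".
  bool HasChar(char32_t ch) { return Lookup(ch).has_value(); }

  // Calls the converter at most once per character; later lookups, positive
  // or negative, are served from the cache.
  const std::optional<std::u32string>& Lookup(char32_t ch) {
    auto it = cache_.find(ch);
    if (it == cache_.end()) it = cache_.emplace(ch, convert_(ch)).first;
    return it->second;
  }

 private:
  Converter convert_;
  std::unordered_map<char32_t, std::optional<std::u32string>> cache_;
};
```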
> I don't think that should require a whole new type
Perhaps not, but I'm afraid doing it in situ is not the best path we can
take. In general, transliteration changes the length of the text, so in a
sense it is similar to converting to a custom-font encoding.
[0] TTFs packaged and presented as X11core fonts via a font server fall into
this category as well.
[1] I'm writing from memory. Is 'entity' (*non-numerical*) one of the fallbacks?
If it is, some people might like it better than the current unknown glyph.
> I think implementing this can build upon the patch for 176290
I was about to add that, but you were too quick... It is better to wait on bug 176290, which provides a helpful basis for this.
> This feature is not in as much demand as back in late 1990's when the font
> situation for POSIX/X11 was far worse than now. To take just an example,
I am not interested in yet another font debate on Linux, where the tendency is often to resist font improvements until forced... Even on Windows, where there are already many excellent fonts, my experience with math characters (thousands of them...) shows that this is a helpful feature.
As a further incentive for this to happen on Xft, I was the one who implemented it on GfxWin and did it out of _necessity_.
Assignee
Comment 7 • 21 years ago
> I was the one who implemented it on GfxWin and did it out of _necessity_.
If I sounded otherwise: I don't have the slightest doubt that it was and is necessary. Our difference lies only in degree, and your experience should carry more weight than mine because I haven't tested many MathML pages. Anyway, I already have a sketch of an implementation, but as we agreed, we'd better do it after bug 176290 is resolved.
BTW, one of the Unicode 4.0 data files [1] defines 'Default_Ignorable_Code_Point'. U+2062 (INVISIBLE TIMES) is one such character (ZWJ/ZWNJ are not). According to Mark Davis, characters with this property can be ignored (turned into nothing), but can affect rendering/layout if supported. That is, 'U+0061 U+0062' may be rendered a little differently from 'U+0061 U+2062 U+0062'. My problem is the opposite. One of my fonts (I suspect it's Code2000, which James Kass has been diligently making as 'Pan-Unicodic' as possible) has a visible glyph for this invisible character. So I'm getting a very conspicuous 'dotted x inside a dotted box' where nothing, or just a very thin space, should be... I may be getting off track a bit, but I think of this as the dual of the problem at hand.
[1] http://www.unicode.org/Public/4.0-Update/DerivedCoreProperties-4.0.0.txt
Comment 8 • 21 years ago
Transliteration is quite useful for some of the more exotic characters... How many people have fonts with U+2496 ("15.")?
Assignee
Comment 9 • 21 years ago
If that was directed at me: for the record, I didn't say it's not useful. It's certainly useful for characters like U+2496 (and many others that are in Unicode purely for backward compatibility with legacy character sets, as well as some that are in Unicode on their own merits).[1] However, for some other characters, it _could_ (depending on the situation) be better to alert users to the need to install more fonts with conspicuous 'unknown glyph' symbols. That said, Mozilla's transliteration table is currently not as exhaustive as, say, glibc 2.x's, so this is not an issue. As I wrote, my problem is exactly the opposite: some overaggressive fonts have visible glyphs for invisible characters like U+2062. Of course, this doesn't reduce the usefulness of transliterating U+2496 to U+0031, U+0035, U+002E (FULL STOP).
[1] U+2496 is part of the PRC's GB 2312-80, and I suspect that's the sole reason it's in Unicode/ISO 10646; everybody would object to encoding it if we could start from scratch. Being part of GB 2312-80, it's in every Simplified Chinese font (BDF, TrueType or whatever other format; even some very old X terminals from the early 1990s are likely to have a font or two with it) and in any font that aims to be pan-Unicodic (e.g. shareware Code2000, Cyberbit, Arial Unicode MS, etc.). I also found that some GPL'd Japanese fonts (that come by default in major Linux distros) have it. Note that in this age of TrueType dominance, there's _no_ platform dependency in font availability, barring license issues.
Reporter
Comment 10 • 21 years ago
For quick reference, I am "connecting" this to bug 205387, which is about ignorable characters.
Blocks: 205387
Assignee
Comment 11 • 21 years ago
One of the reasons I haven't implemented this is that I don't like any of the 'transliteration options' available in nsISaveAsCharset. For instance, I prefer 'minifont' to using '&#ddddd;' (NCR). There's a way to have the best of both, but it takes some work: I have to add an API (or options) to nsISaveAsCharset to preserve 'untransliterable' characters, for which I want to use the minifont. On the other hand, the need for nsFontXftSubstitute is clearly there (see bug 221024).
Assignee: blizzard → jshin
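The "best of both" idea from comment 11 can be sketched in C++ as a converter that returns runs instead of a single string. Everything here (Segment, TransliterateKeepingRest, the table) is hypothetical: characters with a transliteration are replaced, and the rest are handed back untouched so the minifont can draw them, instead of being turned into '&#ddddd;' NCRs. The input is assumed to be a run of characters that no installed font covers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One run of output: either transliterated text, or characters with no
// transliteration that are preserved for the minifont (hex box) renderer.
struct Segment {
  bool preserved;
  std::u32string text;
};

std::vector<Segment> TransliterateKeepingRest(
    const std::u32string& text,
    const std::map<char32_t, std::u32string>& table) {
  std::vector<Segment> out;
  auto append = [&out](bool preserved, const std::u32string& s) {
    if (s.empty()) return;  // e.g. a character transliterated to nothing
    if (!out.empty() && out.back().preserved == preserved)
      out.back().text += s;  // extend the current run of the same kind
    else
      out.push_back({preserved, s});
  };
  for (char32_t ch : text) {
    auto it = table.find(ch);
    if (it != table.end())
      append(false, it->second);          // genuine transliteration
    else
      append(true, std::u32string(1, ch));  // keep for the minifont
  }
  return out;
}
```

Returning runs keeps the length change from transliteration explicit, while the caller still knows exactly which characters need the minifont.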
Assignee
Comment 12 • 21 years ago
I've filed bug 230088 for a new API for transliteration that can preserve the untransliterable.
Status: NEW → ASSIGNED
Updated • 16 years ago
Product: Core → Core Graveyard
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → INCOMPLETE