Closed Bug 192088 Opened 21 years ago Closed 21 years ago
zwnj show up as [zwnj]
58.69 KB, image/png
123.02 KB, image/png
619 bytes, text/html
7.45 KB, image/jpeg
24.43 KB, image/png
3.52 KB, patch
|Details | Diff | Splinter Review|
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030205 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030205 Certain Farsi characters display as a box with "zwnj" instead of the character. This is seen in the first paragraph on <http://www.farsikde.org>. I first noticed this in a Jan 16 build of Mozilla, but it's in 2003-02-05-08 (Linux), as well. Netscape 7.x (the latest, got it today) displays these characters correctly. I have modelled the summary line on bug 117041. Reproducible: Always Steps to Reproduce:
Seems fine with xft here.
Confirming and taking. It looks like the problem is there in some form even in xft: in attachment 113691 [details] I can see vertical lines in the same positions as the ZWNJ symbols in attachment 113660 [details].
Assignee: mkaply → smontagu
Status: UNCONFIRMED → NEW
Ever confirmed: true
This shows a few words from the URL, using NCRs to make it clearer what is happening. The first paragraph is written with presentation forms in the source, so it doesn't go through the Arabic shaping routines at nsTextTransformer::DoArabicShaping(), which is where we strip the zero width characters. That is obviously an overoptimization.
Changing OS to All. The ZWNJ isn't rendered on Windows, but the layout of the text is incorrect wherever it appears.
Status: NEW → ASSIGNED
OS: Linux → All
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
> it doesn't go through the Arabic shaping routines at > nsTextTransformer::DoArabicShaping(), which is where we strip the zero > width characters. That is obviously an overoptimization However, ZWJ and ZWNJ have roles to play in Indic scripts and probably elsewhere. I'm afraid simply removing them caused the symptom reported in bug 202352. The web site in question might have to be fixed instead of Mozilla. I mean, if the Arabic presentation forms (in U+FB00 -FDFF) are used, it's not clear to me what ZWNJ/ZWJ are doing there in the middle of Arabic presentation forms. Or, we may have to take care of them in a *context-sensitive* way if possible. (see http://www.unicode.org/book/ch13.pdf : p.317) In case of Xft, I guess it's almost doing the right thing for *non-complex* scripts including Arabic/Farsi text represented with Arabic presentation forms (where I guess ZWJ/ZWNJ can be safely ignored. Even for Latin scripts, ZWJ/ZWNJ can play a role...). The slight 'misalignment' observed in attachment 113691 [details]) may be a font issue where ZWJ/ZWNJ has non-zero-width (advancing) blank glyphs instead of non-advancing blank glyphs. Needless to say, when ZWJ/ZWNJ have to be 'honored', Xft build does not do the right thing, but that's another issue.
As a possible result of removing ZWJ/ZWNJ, Mozilla(20030426) doesn't render the following sequence(at the leftmost end of the second line) correctly : ژه‌ی‌. Because U+06CC(dual-joining) is surrounded by ZWNJ(non-joining), it has to be rendered with a nominal glyph. In case of U+0647, it is following U+0698(Not right-join-causing) and is followed by ZWNJ(non-joining) so that it also has to be rendered with a nominal glyph. MS IE render them both with their nominal glyphs while Mozilla joins them and then ligate them. This should not happen if it's Mozilla's internal ArabicShaping that does the shaping because ArabicShaping should take care of this. If it's the native Arabic shaper on Win2k/XP, this can be explained by the removal of ZWJ/ZWNJ implemented in the patch. In conclusion, I think we should NOT remove ZWJ/ZWNJ. Doing so would deprive the renderer/shaper down the road (either native as for various Indic scripts on Win2k/XP or Mozilla's own as for Hindi shaper for Mozilla-gtk-x11core) of the opportunity to deal with them in a context-sensitive manner (either ignore it or change the rendering behavior of text around them). As for the original bug (apparently filed for gtk-X11core+freetype) build, nsFontMetricsGTK/FT has to prevent ZWJ/ZWNJ from being rendered with 'last resort' glyphs if they can be safely ignored.
Jungshik, you are right. This was the wrong fix and should be backed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
*** Bug 204234 has been marked as a duplicate of this bug. ***
Simon, thanks for taking care of it. With zwnj/zwj passed along intact, attachment 113746 [details] is rendered correctly under Win2k/XP (with isolated forms/nominal glyphs). It also solved a part of bug 202352(one of two problems reported there). As I wrote there, the other may be beyond what Mozilla-Win can do at the moment (unless we want to switch over to Uniscribe APIs soon, which we have to do for editing/selection). BTW, bstell, where does Mozilla-Gtk/FT draw 'last resort' glyphs([zwnj]) when characters are not available in any font on the system? Can you exclude zwnj/zwj from that process? Falling short of handling zwnj/zwj in a context-sensitve manner, we may be better off just ignoring them. Well, this is debatable. What it does now can be better...
Comment on attachment 122359 [details] [diff] [review] back it out Do we still want to keep StripZeroWidthJoinControls() as a function? It's invoked only once with this change.
Let's get in either Simon's (attachment 122359 [details] [diff] [review]) or this quickly. We need at least one platform (Win2k/XP) where Indic scripts(not just Devanagari but other Indic scripts) are rendered on par with MS IE, don't we?
Attachment #122359 - Flags: superreview?(roc+moz) → superreview+
Comment on attachment 122359 [details] [diff] [review] back it out Requesting approval to back out this patch, which caused serious regressions in Persian and many Indian languages.
Attachment #122359 - Flags: approval1.4?
Comment on attachment 122359 [details] [diff] [review] back it out a=asa (on behalf of drivers) for checkin to 1.4
Attachment #122359 - Flags: approval1.4? → approval1.4+
With respect to comment 14, the glyphs being displayed e.g. in attachment 113660 [details] aren't last-resort glyphs created by mozilla. They are coming from the font -mutt-clearlyu-medium-r-normal--17-120-100-100-p-*-iso10646-1
Thanks for figuring out where the glyph comes from. Without any font having visible glyphs for 'non-visible' characters (see bug 205387), transliteration implemented in GFX-Gtk/X11core would turn them to nothingness. Situations like this (some 'aggressive' fonts have visible spacing glyphs for the invisible that control shaping of characters around them, but no toolkit-specific shaper is yet available for them) are hard to deal with. One possibility is to implement 'blackhole' font rbs suggested in bug 205387 for GFX-Gtk/X11core and include ZWJ/ZWNJ in the list of characters to throw into the blackhole font *only until* we implement shaping for them. For instance, Prabhat is aware of problems with ZWJ/ZWNJ in his hindi shaper and plans to fix them. Another is to do nothing. Arguably, it's not a bug to render characters like ZWJ/ZWNJ (that are supposed to have visual effects) with visible glyphs when no other option is available. They're different from characters like INVISIBLE TIMES (U+2062)that has to be black-holed unless explicitly turned visible by yet-to-be-specified CSS or other higher level markups. Especially, in this particular case(http://www.farsikde.org), we may say 'WONT FIX' because U+200C(ZWNJ) follows Arabic presentation forms like U+FEC2 and U+FEAA, which doesn't make a lot of sense to me because Arabic presen. forms are non-joining without being followed by ZWNJ. Alternatively, we may do some special-casing in nsTextTra... Remove zwj/zwnj only when preceded by Arabic presentation forms, but I'm wondering how many web pages mix zwj/zwnj with Arabic presentation forms as is done at http://www.farsikde.org
I have at last got a handle on this, with a lot of thanks due to Roland Mainz: we decide whether X fonts have a glyph for a codepoint by looking at only the |ascent| and |descent| members of the XCharStruct. We should be checking all the members of the struct. See http://tronche.com/gui/x/xlib/graphics/font-metrics/ : "A nonexistent character is represented with all members of its XCharStruct set to zero."
This fixes http://www.farsikde.org, but should be tested more widely.
jshin / katakai, can you test the patch, please ?
The patch itself makes GFX:GTK more 'compliant' to Xlib spec(?) and is useful. However, I'm afraid it doesn't remove the root cause of the problem (comment #21). What happens if a font with visible glyphs for zwj/zwnj is _ahead of_ a font with invisible glyphs for them in the loaded font list? Perhaps what we can do include : 1. black-hole font (as rbs proposed) 2. removes default_ignorable_code_point from ccmap of any _10646_ fonts. this is similar to what's done for CJK special chars (http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#2242) The second is attractive because even when shapers associated with sun.unicode.india-0 or other XLFDs begin to support zwj/zwnj, we don't need to change anything (note that we're removing zwj/zwnj only from ccmap of iso10646-1 fonts). In the meantime, the transliterator (nsFontGTKSubstitute) takes care of zwj/zwnj.
> 2. removes default_ignorable_code_point from ccmap of any _10646_ > fonts. this is similar to what's done for CJK special chars Or, to take a quick route, we can just block them in |GetMapFor10646|. Either way, it'll even more slow down Mozilla-X11core when 10646 fonts are used.
Shall we just dupe this to bug 205387?
Yeah, it is the same root cause. And since you are hot on it right now, it is perhaps best to settle on implementing a definitive solution. *** This bug has been marked as a duplicate of 205387 ***
Status: REOPENED → RESOLVED
Closed: 21 years ago → 21 years ago
Resolution: --- → DUPLICATE
Comment on attachment 123952 [details] [diff] [review] Don't reject glyphs unless every member of the XCharStruct is empty I was wrong: this patch doesn't make any difference and is probably not worth pursuing. The reason that it seemed to work was that I had the patch for bug 204272 in the tree where I was testing and the characters were being stripped anyway.
Attachment #123952 - Attachment is obsolete: true
Comment on attachment 123952 [details] [diff] [review] Don't reject glyphs unless every member of the XCharStruct is empty Well, we may want it at least to conform little bit better to the Xlib specs... :) I'll open a seperate bug for that later today...
Attachment #123952 - Attachment is obsolete: false
Yes, if you care about that issue, it belongs in a separate bug.
Simon Montagu wrote: > Yes, if you care about that issue, it belongs in a separate bug. Filed bug 207438 ("Glyph-detection code for iso10646-1 fonts should conform more to the Xlib specs") for that issue...
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.