Status

()

RESOLVED DUPLICATE of bug 205387
16 years ago
10 years ago

People

(Reporter: bulbul, Assigned: smontagu)

Tracking

({regression})

Trunk
mozilla1.4alpha
x86
All
regression
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(6 attachments, 3 obsolete attachments)

(Reporter)

Description

16 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030205
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030205

Certain Farsi characters display as a box with "zwnj" instead of the character.
This is seen in the first paragraph on <http://www.farsikde.org>. I first
noticed this in a Jan 16 build of Mozilla, but it's in 2003-02-05-08 (Linux), as
well. Netscape 7.x (the latest, got it today) displays these characters correctly.

I have modelled the summary line on bug 117041.

Reproducible: Always

Steps to Reproduce:
(Reporter)

Comment 1

16 years ago
Created attachment 113660 [details]
screenshot of page with zwnj characters
Created attachment 113691 [details]
screenshot with xft

Seems fine with xft here.
(Assignee)

Comment 3

16 years ago
Confirming and taking. It looks like the problem is there in some form even in
xft: in attachment 113691 [details] I can see vertical lines in the same positions as the
ZWNJ symbols in attachment 113660 [details].
Assignee: mkaply → smontagu
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
(Assignee)

Comment 4

16 years ago
Created attachment 113746 [details]
Reduced and enlarged testcase

This shows a few words from the URL, using NCRs to make it clearer what is
happening. The first paragraph is written with presentation forms in the
source, so it doesn't go through the Arabic shaping routines at
nsTextTransformer::DoArabicShaping(), which is where we strip the zero width
characters. That is obviously an overoptimization.
(Assignee)

Comment 5

16 years ago
Changing OS to All. The ZWNJ isn't rendered on Windows, but the layout of the
text is incorrect wherever it appears.
Status: NEW → ASSIGNED
OS: Linux → All
(Assignee)

Comment 6

16 years ago
Created attachment 113771 [details]
Screen capture on W2K
(Assignee)

Comment 7

16 years ago
Created attachment 113773 [details] [diff] [review]
Patch - strip zwj and zwnj even when not doing Arabic shaping
(Assignee)

Updated

16 years ago
Attachment #113773 - Flags: superreview?(roc+moz)
Attachment #113773 - Flags: review?(roc+moz)
(Assignee)

Updated

16 years ago
Target Milestone: --- → mozilla1.4alpha
Attachment #113773 - Flags: superreview?(roc+moz)
Attachment #113773 - Flags: superreview+
Attachment #113773 - Flags: review?(roc+moz)
Attachment #113773 - Flags: review+
(Assignee)

Comment 8

16 years ago
Fix checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED

Comment 9

16 years ago
> it doesn't go through the Arabic shaping routines at
> nsTextTransformer::DoArabicShaping(), which is where we strip the zero 
> width characters. That is obviously an overoptimization

  However, ZWJ and ZWNJ have roles to play in Indic scripts
and probably elsewhere. I'm afraid simply removing them
caused the symptom reported in bug 202352. The web
site in question might have to be fixed instead of Mozilla.
I mean, if the Arabic presentation forms (in U+FB00 -FDFF)
are used, it's not clear to me what ZWNJ/ZWJ are doing there 
in the middle of Arabic presentation forms. Or, we may have to take
care of them  in a *context-sensitive* way if possible.
(see http://www.unicode.org/book/ch13.pdf : p.317) 

In case of Xft, I guess it's almost doing the right thing
for *non-complex* scripts including Arabic/Farsi text
represented with Arabic presentation forms (where I  guess
ZWJ/ZWNJ can be safely ignored. Even for Latin scripts, ZWJ/ZWNJ
can play a role...). The slight 'misalignment' observed in attachment 113691 [details])
may be a font issue where ZWJ/ZWNJ has non-zero-width
(advancing) blank glyphs instead of non-advancing blank glyphs. 
Needless to say, when ZWJ/ZWNJ have to be 'honored', Xft build 
does not do the right thing, but that's another issue.

Comment 10

16 years ago
Created attachment 122284 [details]
screenshot (Win2k) : Mozilla vs MS IE 6

As a possible result of removing ZWJ/ZWNJ, Mozilla(20030426) 
doesn't render the following sequence(at the leftmost end of the second line)
correctly :

 &#x0698;&#x0647;&#x200c;&#x06cc;&#x200c;.

Because U+06CC(dual-joining) is surrounded by ZWNJ(non-joining),
it has to be rendered with a nominal glyph. In case of
U+0647, it is following U+0698(Not right-join-causing)
and is followed by ZWNJ(non-joining) so that it also has to be
rendered with a nominal glyph. MS IE render them both
with their nominal glyphs while Mozilla joins them
and then ligate them.  This should not happen if it's
Mozilla's internal ArabicShaping that does the shaping
because ArabicShaping should take care of this. If it's
the native Arabic shaper on Win2k/XP, this can be explained
by the removal of ZWJ/ZWNJ implemented in the patch.  
  
In conclusion, I think we should NOT remove ZWJ/ZWNJ. Doing
so would  deprive the renderer/shaper down the road (either native
as for various Indic scripts on Win2k/XP or Mozilla's own as for Hindi shaper
for Mozilla-gtk-x11core) of the opportunity to deal with them in a
context-sensitive manner (either ignore it or change the rendering behavior of
text around them).


As for the original bug (apparently filed for gtk-X11core+freetype)
build, nsFontMetricsGTK/FT has to prevent ZWJ/ZWNJ from being
rendered with 'last resort' glyphs if they can be safely ignored.
(Assignee)

Comment 11

16 years ago
Jungshik, you are right. This was the wrong fix and should be backed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 12

16 years ago
*** Bug 204234 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 13

16 years ago
Created attachment 122359 [details] [diff] [review]
back it out
(Assignee)

Updated

16 years ago
Attachment #122359 - Flags: superreview?
Attachment #122359 - Flags: review?(roc+moz)

Updated

16 years ago
Attachment #122359 - Flags: superreview?(roc+moz)
Attachment #122359 - Flags: superreview?
Attachment #122359 - Flags: review?(roc+moz)
Attachment #122359 - Flags: review+
(Assignee)

Updated

16 years ago
Blocks: 202352

Comment 14

16 years ago
Simon, thanks for taking care of it.
With zwnj/zwj passed along intact, attachment 113746 [details] is rendered
correctly under Win2k/XP (with isolated forms/nominal glyphs). 
It also solved a part of bug 202352(one of two problems
reported there). As I wrote there, the other may be beyond what
Mozilla-Win can do at the moment (unless we want to switch over to Uniscribe
APIs soon,  which we have to do for editing/selection).

BTW,  bstell, where does Mozilla-Gtk/FT draw 'last resort' glyphs([zwnj]) when
characters are not available in any  font on the system? Can you exclude
zwnj/zwj from that process? Falling short of handling zwnj/zwj in a
context-sensitve manner,  we may be better
off just ignoring them. Well, this is debatable. What it does now can
be better... 

Comment 15

16 years ago
Comment on attachment 122359 [details] [diff] [review]
back it out

Do we still want to
keep StripZeroWidthJoinControls()
as a function? It's invoked only once with this change.

Comment 16

16 years ago
Created attachment 123371 [details] [diff] [review]
remove |StripZeroWidthJoinControls|

Let's get in either Simon's (attachment 122359 [details] [diff] [review]) or this quickly. 
We need at least one platform (Win2k/XP) where Indic scripts(not just
Devanagari but other Indic scripts) are rendered on par with MS IE, don't we?
Attachment #122359 - Flags: superreview?(roc+moz) → superreview+
(Assignee)

Comment 17

16 years ago
Comment on attachment 122359 [details] [diff] [review]
back it out

Requesting approval to back out this patch, which caused serious regressions in
Persian and many Indian languages.
Attachment #122359 - Flags: approval1.4?

Comment 18

16 years ago
Comment on attachment 122359 [details] [diff] [review]
back it out

a=asa (on behalf of drivers) for checkin to 1.4
Attachment #122359 - Flags: approval1.4? → approval1.4+
(Assignee)

Comment 19

16 years ago
Checked in attachment 122359 [details] [diff] [review]. The original problem will now resurface.
(Assignee)

Comment 20

16 years ago
With respect to comment 14, the glyphs being displayed e.g. in attachment 113660 [details]
aren't last-resort glyphs created by mozilla. They are coming from the font
-mutt-clearlyu-medium-r-normal--17-120-100-100-p-*-iso10646-1

Comment 21

16 years ago
Thanks for figuring out where the glyph comes from. Without any font having
visible glyphs for 'non-visible' characters (see bug 205387), transliteration
implemented in GFX-Gtk/X11core would turn them to nothingness. Situations like
this (some 'aggressive' fonts have visible spacing glyphs for the invisible that
control shaping of characters around them, but no toolkit-specific shaper is yet
available for them) are hard to deal with. 

One possibility is to implement 'blackhole' font rbs suggested in bug 205387 for
GFX-Gtk/X11core and include ZWJ/ZWNJ in the list of characters to throw into the
blackhole font *only until* we implement shaping for them. For instance, Prabhat
is aware of problems with ZWJ/ZWNJ in his hindi shaper and plans to fix them.
  
Another is to do nothing. Arguably, it's not a bug to render characters like
ZWJ/ZWNJ (that are supposed to have visual effects) with visible glyphs when no
other option is available. They're different from characters like INVISIBLE
TIMES (U+2062)that has to be black-holed unless explicitly turned visible by
yet-to-be-specified CSS or other higher level markups. Especially, in this
particular case(http://www.farsikde.org), we may say 'WONT FIX' because
U+200C(ZWNJ) follows Arabic presentation forms like U+FEC2 and U+FEAA, which
doesn't make a lot of sense to me because Arabic presen. forms are non-joining
without being followed by ZWNJ. 

Alternatively, we may do some special-casing in nsTextTra... Remove zwj/zwnj
only when preceded by Arabic presentation forms, but I'm wondering how many web
pages mix zwj/zwnj with Arabic presentation forms as is done at
http://www.farsikde.org


(Assignee)

Comment 22

16 years ago
I have at last got a handle on this, with a lot of thanks due to Roland Mainz:
we decide whether X fonts have a glyph for a codepoint by looking at only the
|ascent| and |descent| members of the XCharStruct. We should be checking all the
members of the struct. See http://tronche.com/gui/x/xlib/graphics/font-metrics/ :
"A nonexistent character is represented with all members of its XCharStruct set
to zero."
(Assignee)

Comment 23

16 years ago
Created attachment 123952 [details] [diff] [review]
Don't reject glyphs unless every member of the XCharStruct is empty

This fixes http://www.farsikde.org, but should be tested more widely.
Attachment #113773 - Attachment is obsolete: true
Attachment #122359 - Attachment is obsolete: true
Attachment #123371 - Attachment is obsolete: true

Comment 24

16 years ago
jshin / katakai, can you test the patch, please ?

Comment 25

16 years ago
The patch itself makes GFX:GTK more 'compliant' to Xlib spec(?) and is useful.
However, I'm afraid it doesn't remove the root cause of the problem (comment
#21). What happens if a font with visible glyphs for zwj/zwnj is _ahead of_ a
font with invisible glyphs for them in the loaded font list? 

Perhaps what we can do include :
  1. black-hole font (as rbs proposed)
  2. removes default_ignorable_code_point from ccmap of any _10646_ fonts.
    this is similar to what's done for CJK special chars  
(http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#2242)

The second is attractive  because even when shapers associated with
sun.unicode.india-0 or other XLFDs begin to support zwj/zwnj, we don't need to
change anything (note that we're removing zwj/zwnj only from ccmap of iso10646-1
fonts). In the meantime, the transliterator (nsFontGTKSubstitute) takes care of
zwj/zwnj.  

Comment 26

16 years ago
>  2. removes default_ignorable_code_point from ccmap of any _10646_ 
> fonts. this is similar to what's done for CJK special chars 
 Or, to take a quick route, we can just block them in |GetMapFor10646|. 

Either way, it'll even more slow down Mozilla-X11core when 10646 fonts are used. 
(Assignee)

Comment 27

16 years ago
Shall we just dupe this to bug 205387?

Comment 28

16 years ago
Yeah, it is the same root cause. And since you are hot on it right now, it is
perhaps best to settle on implementing a definitive solution.

*** This bug has been marked as a duplicate of 205387 ***
Status: REOPENED → RESOLVED
Last Resolved: 16 years ago16 years ago
Resolution: --- → DUPLICATE
(Assignee)

Comment 29

16 years ago
Comment on attachment 123952 [details] [diff] [review]
Don't reject glyphs unless every member of the XCharStruct is empty

I was wrong: this patch doesn't make any difference and is probably not worth
pursuing. The reason that it seemed to work was that I had the patch for bug
204272 in the tree where I was testing and the characters were being stripped
anyway.
Attachment #123952 - Attachment is obsolete: true

Comment 30

16 years ago
Comment on attachment 123952 [details] [diff] [review]
Don't reject glyphs unless every member of the XCharStruct is empty

Well, we may want it at least to conform little bit better to the Xlib specs...
:)

I'll open a seperate bug for that later today...
Attachment #123952 - Attachment is obsolete: false
(Assignee)

Comment 31

16 years ago
Yes, if you care about that issue, it belongs in a separate bug.

Comment 32

16 years ago
Simon Montagu wrote:
> Yes, if you care about that issue, it belongs in a separate bug.

Filed bug 207438 ("Glyph-detection code for iso10646-1 fonts should conform more
to the Xlib specs") for that issue...

Updated

10 years ago
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.