Vowels are not rendered correctly in some Persian/Arabic/Hebrew fonts

RESOLVED FIXED

Status

defect
RESOLVED FIXED
8 years ago
7 years ago

People

(Reporter: semekh.dev, Assigned: jfkthame)

Tracking

(Depends on 1 bug, {regression})

Trunk
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(blocking2.0 Macaw+, status2.0 .1-fixed)

Details

(Whiteboard: [softblocker][fx4-rc-ridealong])

Attachments

(7 attachments, 2 obsolete attachments)

Reporter

Description

8 years ago
User-Agent:       Mozilla/5.0 (X11; Linux i686; rv:2.0b12pre) Gecko/20110216 Firefox/4.0b12pre
Build Identifier: Mozilla/5.0 (X11; Linux i686; rv:2.0b12pre) Gecko/20110216 Firefox/4.0b12pre

When browsing Persian/Arabic pages, vowels are not rendered correctly with some fonts (e.g. Tahoma). That is, مَن turns into something like م‍ َ‍ن

Reproducible: Always

Steps to Reproduce:
1. you should have Tahoma font (install ttf-mscorefonts-installer on Ubuntu)
2. open http://www.tebyan.net/index.aspx?pid=17257&threadID=16957 using FF4
3. open the same page using FF3 and you'll notice the difference
Actual Results:  
Many letters are not joined

Expected Results:  
Letters should be joined

FF3 is ok with that.
I think it is somehow related to HarfBuzz. After going to about:config and changing gfx.font_rendering.harfbuzz.level to 1 (default is 2) the rendering is not broken anymore.

Comment 2

8 years ago
Does the issue still occur if you start Firefox in Safe Mode? http://support.mozilla.com/en-US/kb/Safe+Mode
Version: unspecified → Trunk
Reporter

Comment 3

8 years ago
(In reply to comment #2)
> Does the issue still occur if you start Firefox in Safe Mode?
> http://support.mozilla.com/en-US/kb/Safe+Mode

Nope, that doesn't work either.
As I said, I'm fairly sure it is related to HarfBuzz. FF3 doesn't use HarfBuzz and it's OK. In FF4, if you set the HarfBuzz level to 1, there is no issue.
Component: General → Layout: Text
Product: Firefox → Core
QA Contact: general → layout.fonts-and-text
Assignee

Comment 4

8 years ago
Are you sure about the STR in comment #0? I just installed ttf-mscorefonts-installer on my Ubuntu system, but this does _not_ include the Tahoma font.

So if Tahoma is in fact being used here, where does it come from? And what specific version of the font do you have?
Reporter

Comment 5

8 years ago
(In reply to comment #4)
> Are you sure about the STR in comment #0? I just installed
> ttf-mscorefonts-installer on my Ubuntu system, but this does _not_ include the
> Tahoma font.
> 
> So if Tahoma is in fact being used here, where does it come from? And what
> specific version of the font do you have?

That's right. I don't know where I installed Tahoma from. It may be from ttf-freefarsi.
But I'm sure that ttf-farsiweb contains the font "Nazli", which produces the bug on FF4 (but not FF3) as well.
Assignee

Comment 6

8 years ago
No, there's no Tahoma there. (I'm not aware of any legitimate source for a "free" Tahoma font, it's a commercially-licensed product.)

I can reproduce the problem with the Nazli font, as you say. Inspecting the font with Fontforge confirms that in this font, the Arabic diacritics (fatha, damma, kasra, etc) are given a positive advance width, rather than being designed as zero-width glyphs. That is why they lead to "gaps" in the rendered text.

Some OpenType rendering engines (including, apparently, Pango) will override the designed width of diacritic glyphs, but this behavior is not universal, nor is it clearly specified anywhere as far as I am aware. As such, font designers should not be relying on it; glyphs intended to be used as zero-width diacritics should be designed that way in the font.

I think Behdad has considered adding something like this to Harfbuzz, but I don't know if any firm decision has been taken. In any case, I suggest filing a bug report about the diacritics in the Nazli font, and any others you find that show this problem.
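
The effect described above can be illustrated with a small sketch. The glyph names and advance values below are hypothetical (they are not read from Nazli), and the pen is advanced left to right for simplicity even though Arabic is laid out right to left; the arithmetic is the same either way.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str
    advance: int  # horizontal advance in font units (hypothetical values)


def pen_positions(glyphs):
    """Return the pen origin of each glyph and the total width of the run."""
    x = 0
    origins = []
    for g in glyphs:
        origins.append(x)
        x += g.advance
    return origins, x


meem = Glyph("meem", 600)
noon = Glyph("noon", 500)
fatha_spacing = Glyph("fatha", 300)  # diacritic designed with a positive advance
fatha_zero = Glyph("fatha", 0)       # diacritic designed as a zero-width glyph

bad_origins, bad_width = pen_positions([meem, fatha_spacing, noon])
good_origins, good_width = pen_positions([meem, fatha_zero, noon])

# The extra width is exactly the advance of the diacritic: that is the "gap".
print("gap:", bad_width - good_width)
```

With a zero-width fatha, the noon starts right where the meem ends; with a spacing fatha, it is pushed away by the diacritic's advance.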

Comment 7

8 years ago
(In reply to comment #6)
> I think Behdad has considered adding something like this to Harfbuzz, but I
> don't know if any firm decision has been taken. In any case, I suggest filing a
> bug report about the diacritics in the Nazli font, and any others you find that
> show this problem.

Is it possible to work around this problem by detecting this problem in the font and not using Harfbuzz in that case?

What worries me is that I don't think that there's much hope in actually getting the font fixed, and even if that happens, there would still be a large number of users using the broken version of the font preinstalled on their system.  As far as those users are concerned, this will be a regression from 3.6 for them.
Status: UNCONFIRMED → NEW
blocking2.0: --- → ?
Ever confirmed: true
Keywords: regression

Comment 8

8 years ago
I have plans to fix this.  Just that I'm behind on my plans :(.

http://lists.freedesktop.org/archives/harfbuzz/2010-November/000959.html

Comment 9

8 years ago
I still think we should work around this for 2.0 if possible...

Comment 10

8 years ago
I agree with Ehsan. If we can't fix this for 2.0, we should consider turning off harfbuzz for Arabic, at least on Linux[1]. The rationale is the same that I was given for not turning on harfbuzz for Hebrew: although it makes things better for fonts with good OpenType tables, it makes things worse for legacy fonts.

[1] Personally I have only seen this on Linux, but that may be because my Windows system is a clean install of Windows 7 without any older fonts. cc-ing Amir Aharoni, who mentioned seeing a similar issue on Windows XP.

Comment 11

8 years ago
(In reply to comment #10)
> I agree with Ehsan. If we can't fix this for 2.0, we should consider turning
> off harfbuzz for Arabic, at least on Linux[1]. The rationale is the same that I
> was given for not turning on harfbuzz for Hebrew: although it makes things
> better for fonts with good OpenType tables, it makes things worse for legacy
> fonts.

I agree.  I'm renominating this for blocking again based on this rationale.

> [1] Personally I have only seen this on Linux, but that may be because my
> Windows system is a clean install of Windows 7 without any older fonts. cc-ing
> Amir Aharoni, who mentioned seeing a similar issue on Windows XP.

I can test on Windows and Mac if you tell me which version of what font I need to get installed on my system!
blocking2.0: - → ?
Assignee: nobody → jfkthame
blocking2.0: ? → final+
Whiteboard: [softblocker]
Assignee

Comment 12

8 years ago
(In reply to comment #11)
 
> I can test on Windows and Mac if you tell me which version of what font I need
> to get installed on my system!

If you download the farsifonts-0.4 package (see http://www.farsiweb.ir/wiki/Products/PersianFonts) and install the Nazli font on Windows or Mac, you can reproduce the problem with this; just set the style of some Arabic/Persian text with diacritics to use Nazli.
Assignee

Comment 13

8 years ago
A possible workaround, at least until harfbuzz gains more extensive support for heuristic diacritic positioning, would be to force all diacritics to be zero-width if the font does not actually have a GPOS table.

Behdad, does this seem like a reasonable first step? I realize you may have more sophisticated plans for hb_position_complex_fallback in the long run, but in the meantime we need a solution to avoid the current regression.
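
A minimal sketch of the proposed workaround, not the actual patch: the function name, the `font_has_gpos` flag, and the one-glyph-per-character mapping are all assumptions made for illustration, and marks are identified here by Unicode general category.

```python
import unicodedata


def is_mark(ch):
    """True if the character's Unicode general category is Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")


def fallback_zero_mark_advances(text, advances, font_has_gpos):
    """Force mark advances to zero, but only for fonts without a GPOS table.

    `advances` holds one advance per character of `text` (a simplification:
    real shaping works on glyphs, not characters).
    """
    if font_has_gpos:
        return list(advances)  # trust the font's own positioning
    return [0 if is_mark(ch) else adv for ch, adv in zip(text, advances)]


# MEEM + FATHA + NOON with hypothetical advances in font units.
text = "\u0645\u064E\u0646"
print(fallback_zero_mark_advances(text, [600, 300, 500], font_has_gpos=False))
```

Fonts that do have a GPOS table are left untouched in this sketch, which is why a font with partial GPOS coverage would not be helped by it.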
Attachment #515015 - Flags: review?(jdaggett)
Attachment #515015 - Flags: feedback?(mozilla)

Comment 14

8 years ago
Attachment 515015 [details] [diff] fixes the bug for all Arabic fonts on my system except for DejaVu Sans Mono (which is the font used by default for monospace Arabic).

As the screenshot shows, the bug also manifests itself slightly differently: with most fonts (without the patch) the diacritics appear *between* the letters, but with DejaVu Sans Mono (with or without the patch) the diacritics themselves are correctly placed, but there are empty spaces after them.

There is also another problem that appears even without harfbuzz turned on: on letters with two diacritics, e.g. the REH of الرَّحِيمِ, only one diacritic appears.
Assignee

Comment 15

8 years ago
(In reply to comment #14)
> Created attachment 515027 [details]
> Screenshot of DejaVu Sans fonts with the patch
> 
> Attachment 515015 [details] [diff] fixes the bug for all Arabic fonts on my system except for
> DejaVu Sans Mono (which is the font used by default for monospace Arabic).
> 
> As the screenshot shows, the bug also manifests itself slightly differently:
> with most fonts (without the patch) the diacritics appear *between* the
> letters, but with DejaVu Sans Mono (with or without the patch) the diacritics
> themselves are correctly placed, but there are empty spaces after them.

Interesting. This is a separate issue, in that DejaVu Sans Mono *does* have GPOS support for diacritic positioning, I believe - and hence is not affected by the patch here.

What version of DejaVu Sans Mono do you have? I'm aware that it does some unusual GPOS trickery, and have seen problems as a result of this in the past - but it's been a slightly different problem (overlapping glyphs, not gaps). So I'm curious whether there have been changes in the font.

> There is also another problem that appears even without harfbuzz turned on: on
> letters with two diacritics, e.g. the REH ofالرَّحِيمِ, only one diacritic
> appears.

That sounds like a font bug, then. Do other applications (e.g. gedit) on your system show the same problem with this font?

Comment 16

8 years ago
(In reply to comment #15)
> What version of DejaVu Sans Mono do you have?

The version that was installed with Ubuntu 10.10:

        fontRevision=2.31
        File created: Mon May 31 13:08:55 2010
        File modified: Mon May 31 13:08:55 2010

> That sounds like a font bug, then. Do other applications (e.g. gedit) on your
> system show the same problem with this font?

Yes.
Assignee

Comment 17

8 years ago
OK, I'm starting to understand what's going on here. (And the answer is... it's a mess!)

In the DejaVu Sans Mono font, the "non-spacing" diacritic characters (e.g. in the U+03xx block, as well as the Arabic vowel marks) are designed with a non-zero advance width; they are "spacing diacritics". This was presumably done on the grounds that it's a "monospaced" font where every glyph is expected to have the same width; this was applied even to the "non-spacing" characters. (Whether that's a good idea is questionable, I think - but the important point here is that it was done that way.)

If the font had no GPOS table - i.e., no specific OpenType positioning - then the patch here would resolve the problem by forcing the diacritic glyphs to be zero-width, regardless of their metrics in the font. It's a hack, but it helps for old fonts. However, DejaVu Sans Mono _does_ have OpenType GPOS support (as you can tell by the fact that the diacritics adjust to different heights, according to the base character they're attached to).

Now, an earlier version of the harfbuzz code aimed to deal with fonts like this, in that it explicitly zeroed the advance width of diacritic glyphs as part of executing a MarkToBase attachment. That would have fixed the vowels here, and I believe it's similar to what the old Pango code also did. However, that was deliberately _removed_ from the MarkToBase code a few months ago, because it caused problems with some fonts... in particular, with DejaVu Sans Mono! (Also with Consolas, on Windows.)

The problem is that the GPOS 'mark' feature in DejaVu Sans Mono actually does two things when it attaches a mark glyph to a base. First, it executes a MarkToBase lookup that aligns anchor points on the mark and base glyphs, to get the desired positioning. AND THEN it executes a second lookup that adjusts the advance width of the mark glyph, subtracting its original value so as to leave it zero.

So the problem for harfbuzz is that if it zeroes the advance width when it executes the mark attachment, and then a subsequent lookup in the font subtracts the original value from the mark's advance width, we end up with a NEGATIVE advance, and the next glyph ends up overprinting the base+mark pair. Which is exactly what was happening. And so that zeroing of the advance was removed from the MarkToBase attachment processing.

However, the added twist is that the GPOS table in DejaVu Sans Mono does this subtraction of the advance width for the U+03xx diacritics used in (e.g.) Latin script; but it does NOT have a similar subtable in the Arabic-script 'mark' feature for the U+06xx vowel diacritics in Arabic script. For these, it does a MarkToBase attachment, but it does not touch their advance width. Hence the gaps you see in the rendered text.

I think we can fix this by restoring the feature of zeroing diacritic advance widths even in the case where OpenType positioning is present. But to avoid the problem with DejaVu Sans Mono's own efforts to _also_ zero the width (in the case of the U+03xx diacritics), we need to do this _after_ that lookup has been run.
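
The ordering problem described in this comment can be worked through with a toy model. This is only a sketch of the arithmetic: the function, its flags, and the advance value of 1233 units are hypothetical, and it does not reproduce harfbuzz code.

```python
def final_mark_advance(original, font_subtracts_advance,
                       zero_at_attach=False, zero_at_finish=False):
    """Model the advance of one mark glyph through GPOS processing.

    font_subtracts_advance: the font runs a second lookup that subtracts the
        mark's original advance (true for the Latin U+03xx marks, not for the
        Arabic U+06xx marks, per the analysis above).
    zero_at_attach: the shaper zeroes the advance during MarkToBase (old behavior).
    zero_at_finish: the shaper zeroes the advance after all lookups have run.
    """
    adv = original
    if zero_at_attach:
        adv = 0
    if font_subtracts_advance:
        adv -= original
    if zero_at_finish:
        adv = 0
    return adv


ADV = 1233  # hypothetical advance of a "spacing diacritic" in a monospaced font

# Old behavior: zeroing at attachment, then the font subtracts again.
print(final_mark_advance(ADV, True, zero_at_attach=True))    # negative advance
# After that zeroing was removed: Latin is fine, Arabic keeps its advance.
print(final_mark_advance(ADV, True), final_mark_advance(ADV, False))
# Proposed: zero after the font's lookups have run; both cases end at zero.
print(final_mark_advance(ADV, True, zero_at_finish=True),
      final_mark_advance(ADV, False, zero_at_finish=True))
```

Zeroing last makes the result independent of whether the font also tries to cancel the advance itself.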
Assignee

Comment 18

8 years ago
Updated patch to also zero the advance width of diacritics in hb_ot_layout_position_finish().
Attachment #515015 - Attachment is obsolete: true
Attachment #515015 - Flags: review?(jdaggett)
Attachment #515015 - Flags: feedback?(mozilla)
Attachment #515125 - Flags: review?(jdaggett)
Attachment #515125 - Flags: feedback?(mozilla)

Comment 19

8 years ago
Ok, let me distill what you've discovered...

Comment 20

8 years ago
(In reply to comment #12)
> (In reply to comment #11)
> 
> > I can test on Windows and Mac if you tell me which version of what font I need
> > to get installed on my system!
> 
> If you download the farsifonts-0.4 package (see
> http://www.farsiweb.ir/wiki/Products/PersianFonts) and install the Nazli font
> on Windows or Mac, you can reproduce the problem with this; just set the style
> of some Arabic/Persian text with diacritics to use Nazli.

I could reproduce this easily on Mac with these instructions.

Let me know if you need testing help with this.

Comment 21

8 years ago
Also, we need to test this automatically.  I'd write the test myself if I had the slightest idea how to.  :-)
Flags: in-testsuite?
Assignee

Comment 22

8 years ago
(In reply to comment #21)
> Also, we need to test this automatically.  I'd write the test myself if I had
> the slightest idea how to.  :-)

I'm intending to write reftests to check that adding diacritics to a string in DejaVu Sans Mono does not change its width - both for Latin and Arabic script.
Assignee

Comment 23

8 years ago
These tests are designed to verify that adding some zero-width diacritics to a string in DejaVu Sans Mono does not alter its overall advance. We test both Latin and Arabic script, as the font actually implements them in inconsistent ways such that it's easy to break one when fixing the other! (See comments above.)
Attachment #515870 - Flags: review?(jdaggett)
Assignee

Comment 24

8 years ago
The patch here leads to problems with multiple diacritics in the SBL Hebrew font; see bug 637772.
Assignee

Comment 25

8 years ago
The previous patch caused problems with SBL Hebrew, because in certain cases it ligates a diacritic with the _following_ base glyph; the resulting ligature glyph should of course _not_ be zero-width, but checking the General Category gives us the GC from the diacritic, which is not appropriate here.

A better approach, therefore, is to rely on GDEF glyph classes to identify the Mark glyphs whose advance width should be zeroed, and fall back on GC=M* only if no GDEF is available. With this version, both DejaVu Sans Mono and SBL Hebrew appear to work correctly.
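
The decision rule described here can be sketched as follows. The glyph class numbers are the ones defined for the OpenType GDEF glyph class definition table; the function name and the convention of passing `None` when the font has no GDEF table are assumptions for illustration, and the Hebrew point used below is just an example, not necessarily the one involved in the SBL Hebrew case.

```python
import unicodedata

# Glyph class values from the OpenType GDEF glyph class definition table.
GDEF_BASE, GDEF_LIGATURE, GDEF_MARK, GDEF_COMPONENT = 1, 2, 3, 4


def should_zero_advance(gdef_class, source_char):
    """Decide whether a glyph's advance should be forced to zero.

    gdef_class: the glyph's GDEF class, or None if the font has no GDEF table.
    source_char: the character the glyph came from, used only as a fallback.
    """
    if gdef_class is not None:
        return gdef_class == GDEF_MARK
    return unicodedata.category(source_char).startswith("M")


patah = "\u05B7"  # HEBREW POINT PATAH, general category Mn

# A ligature of a diacritic with the following base inherits the diacritic's
# general category, but GDEF classifies the glyph as a ligature, so it keeps
# its width.
print(should_zero_advance(GDEF_LIGATURE, patah))
# Without GDEF, only the general category is available.
print(should_zero_advance(None, patah))
```

Checking the glyph class rather than the character property matters precisely because substitution can turn a mark character into a glyph that is no longer a mark.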
Attachment #515125 - Attachment is obsolete: true
Attachment #515125 - Flags: review?(jdaggett)
Attachment #515125 - Flags: feedback?(mozilla)
Attachment #516578 - Flags: review?(jdaggett)
Attachment #516578 - Flags: feedback?(mozilla)
** PRODUCT DRIVERS PLEASE NOTE **

This bug is one of 7 automatically changed from blocking2.0:final+ to blocking2.0:.x during the endgame of Firefox 4 for the following reasons:

 - it was marked as a soft blocking issue without a requirement for beta coverage
blocking2.0: final+ → .x+
Assignee

Updated

8 years ago
Duplicate of this bug: 638458
Assignee

Comment 28

8 years ago
Changing platform to All; bug 638458 shows this happening on Windows XP, and it could happen on any platform especially in the presence of older Arabic fonts.
OS: Linux → All
Hardware: x86 → All
Summary: Vowels are not rendered correctly in some Persian/Arabic fonts → Vowels are not rendered correctly in some Persian/Arabic/Hebrew fonts

Comment 29

8 years ago
I saw a seemingly similar problem on this page <http://www.soundsofmychildhood.com/posts/49> on Mac (using the Terafik font).  Jonathan, can you confirm that this is the same bug please?
Assignee

Comment 30

8 years ago
(In reply to comment #29)
> Created attachment 519305 [details]
> Screenshot of the same problem, perhaps
> 
> I saw a seemingly similar problem on this page
> <http://www.soundsofmychildhood.com/posts/49> on Mac (using the Terafik font). 
> Jonathan, can you confirm that this is the same bug please?

Yes, that's the same bug, and is fixed by the patch here.

Updated

8 years ago
Attachment #516578 - Flags: review?(jdaggett) → review+

Updated

8 years ago
Attachment #515870 - Flags: review?(jdaggett) → review+
Assignee

Comment 31

8 years ago
Comment on attachment 516578 [details] [diff] [review]
patch, force diacritics to be zero width - updated

Requesting approval-2.0; this is a significant regression for Arabic-script users, and we should fix it ASAP. (It could affect other scripts as well, though we have not seen specific examples of fonts that show problems.) Low-risk patch that has no effect except when combining diacritics are actually present.
Attachment #516578 - Flags: feedback?(mozilla) → approval2.0?

Comment 32

8 years ago
I agree that something along the lines of your patch is a must for the public release.  Let's at least get it out and see if it regresses any fonts.  We would then get a better picture of exactly how to handle this situation.  Thanks.
Assignee

Updated

8 years ago
Whiteboard: [softblocker] → [softblocker][fx4-rc-ridealong]
Reporter

Comment 33

8 years ago
Seems like FF4 is planned to be released on March 22nd if no major bug is found:
http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/18a347956e4693eb?pli=1
I don't know whether this can be blocking2.0:final+ or not, but it certainly is a major regression for all Persian users (almost all Persian fonts have the issue).
I hope it doesn't cause disruption for such users.

Comment 34

8 years ago
http://hg.mozilla.org/projects/cedar/rev/a82e3f12c621
http://hg.mozilla.org/projects/cedar/rev/cfe55d5089a9
Whiteboard: [softblocker][fx4-rc-ridealong] → [softblocker][fx4-rc-ridealong][fixed-in-cedar]

Comment 36

8 years ago
The reftest for this bug failed:

http://tinderbox.mozilla.org/showlog.cgi?log=Cedar/1300936858.1300938479.32251.gz

so I backed it out:

http://hg.mozilla.org/projects/cedar/rev/fa327e9db425
Whiteboard: [softblocker][fx4-rc-ridealong][fixed-in-cedar] → [softblocker][fx4-rc-ridealong][not-ready]
Assignee

Comment 37

8 years ago
Argh, the reftests need HTTP(..) in the manifest (so that @font-face works).
Assignee

Comment 38

8 years ago
Pushed again, to m-c this time, after checking reftest on tryserver:

http://hg.mozilla.org/mozilla-central/rev/c4113c1aa7e5 - patch
http://hg.mozilla.org/mozilla-central/rev/bf15f5d6cf32 - reftests, with HTTP(..)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [softblocker][fx4-rc-ridealong][not-ready] → [softblocker][fx4-rc-ridealong]

Comment 39

8 years ago
You should double-check that it's still in correctly, since the landing *and backout* just merged in to mozilla-central, *after* your second landing.  (I think it should be, but I don't completely trust hg merging on such things.)
Assignee

Comment 40

8 years ago
(In reply to comment #39)
> You should double-check that it's still in correctly, since the landing *and
> backout* just merged in to mozilla-central, *after* your second landing.  (I
> think it should be, but I don't completely trust hg merging on such things.)

Yes, I noticed that in the pushlog, and was a bit concerned, but it looks fine in current trunk. Thanks for the heads-up, though!

Comment 41

8 years ago
(In reply to comment #39)
> You should double-check that it's still in correctly, since the landing *and
> backout* just merged in to mozilla-central, *after* your second landing.  (I
> think it should be, but I don't completely trust hg merging on such things.)

Have you had a negative experience with this in the past?  I didn't use to trust hg merge for anything that touches the same file, but after using it for quite some time I have been really satisfied with it (zero bad merges), and now I just assume that it works.  But I'd like to know if I'm making the wrong assumption.
Assignee

Updated

8 years ago
Depends on: 644857
Duplicate of this bug: 645392
Depends on: 646371

Comment 43

8 years ago
It is not on Unix/Linux only and it is not connected to the font. It happened on Windows and iMac.
FF version 4.0 is to blame, not local fonts. You cannot control what clients have on their PCs. Instead, look in FF 4 and fix it.
Assignee

Comment 44

8 years ago
(In reply to comment #43)
> It is not on Unix/Linux only and it is not connected to the font. It happened
> on Windows and iMac.
> FF version 4.0 is to blame, not local fonts. You cannot control what
> clients have on their PCs. Instead, look in FF 4 and fix it.

If you actually _look_ at this bug report, you'll see that it is already fixed. You can get a current nightly build from http://nightly.mozilla.org/ if you want to try it.

Comment 45

8 years ago
Right, I did not _look_ at the bug report. The nightly build does fix it.

Comment 46

8 years ago
Comment on attachment 516578 [details] [diff] [review]
patch, force diacritics to be zero width - updated

Approved for the mozilla2.0 repo, a=dveditz for release-drivers
Attachment #516578 - Flags: approval2.0? → approval2.0+
blocking2.0: .x+ → Macaw

Comment 47

8 years ago
Jonathan, let me know if you need me to help land this, please.  Thanks for your patch again! :)
Assignee

Comment 48

8 years ago
(In reply to comment #47)
> Jonathan, let me know if you need me to help land this, please.  Thanks for
> your patch again! :)

I hope to get it landed this morning (UK time), but if it's not done by the time you wake up, feel free to do it for me! :)
Assignee

Updated

8 years ago
Duplicate of this bug: 650470

Comment 51

8 years ago
(In reply to comment #50)
> *** Bug 650470 has been marked as a duplicate of this bug. ***

Why is this bug flagged as "RESOLVED FIXED"? It is not fixed, the problem is still going on. There are still those unwanted spaces between characters.
Reporter

Comment 52

8 years ago
This is resolved, but not included in FF4. It's currently included in Nightly builds.

Updated

8 years ago
Depends on: 589682
Assignee

Updated

8 years ago
Duplicate of this bug: 650635

Comment 54

8 years ago
Another interesting manifestation of this bug can be seen in this test case:

data:text/html;charset=utf-8,<span style="font-size: 72;">خانهٔ&zwj;من</span>

On 4.0, there is a space where the ZWJ character is (note that the presence of the ZWJ character is not significant in this test case; I just included it to make it easier to see the two parts. It's the HAMZA character that triggers this bug).  On 4.0.1-build1, this problem does not exist.  (I reproduced this on Windows, of course; I'm attaching a screenshot of the bug.)

Jonathan, do you think this is worth building a test case based on?
Assignee

Comment 55

8 years ago
(In reply to comment #54)
> Created attachment 526619 [details]
> Screenshot of this bug triggered by HAMZA
> 
> Another interesting manifestation of this bug can be seen in this test case:
> 
> data:text/html;charset=utf-8,<span style="font-size: 72;">خانهٔ&zwj;من</span>
> 
> On 4.0, there is a space where the ZWJ character is (note that the presence of
> the ZWJ character is not significant in this test case, I just included it to
> make it easier to see the two parts. it's the HAMZA character which triggers
> this bug).  On 4.0.1-build1, this problem does not exist.  (I reproduced this
> on Windows, of course; I'm attaching a screenshot of the bug.)
> 
> Jonathan, do you think this is worth building a test case based on?

I don't think this is significantly different from the case with the vowels, for which we've already got a reftest. Although you may think of hamza as a different category from the vowel marks, as far as the OpenType rendering is concerned they're all just non-spacing marks.
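
The point about character categories can be checked against the Unicode database. This sketch assumes the hamza in the test case is the combining character U+0654 ARABIC HAMZA ABOVE; the dictionary labels are just for display.

```python
import unicodedata

MARKS = {
    "fatha": "\u064E",
    "damma": "\u064F",
    "kasra": "\u0650",
    "hamza above": "\u0654",
}

# General category of each mark; "Mn" means non-spacing mark.
categories = {label: unicodedata.category(ch) for label, ch in MARKS.items()}

for label, ch in MARKS.items():
    print(f"U+{ord(ch):04X} {label}: {categories[label]}")

# For contrast, the standalone letter U+0621 ARABIC LETTER HAMZA is a letter.
print("U+0621:", unicodedata.category("\u0621"))
```

All four combining characters share the same general category, which is why a shaper treats the hamza above no differently from the vowel marks.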

Comment 56

8 years ago
(In reply to comment #55)
> I don't think this is significantly different from the case with the vowels,
> for which we've already got a reftest. Although you may think of hamza as a
> different category from the vowel marks, as far as the OpenType rendering is
> concerned they're all just non-spacing marks.

Fair enough!  I trust your judgment here.  :-)
Assignee

Updated

8 years ago
Depends on: 654057

Updated

7 years ago
Flags: in-testsuite? → in-testsuite+