Bug 667166 - wrong shape of letter when it comes at the end of word in the arabic version of Firefox 5.0
Status: RESOLVED FIXED
Keywords: fonts, regression
Product: Core
Classification: Components
Component: Layout: Text
Version: Trunk
Platform: All All
Importance: -- normal
Target Milestone: mozilla8
Assigned To: Jonathan Kew (:jfkthame)
URL: http://www.bbc.co.uk/arabic/sciencean...
Reported: 2011-06-25 01:36 PDT by Rami Ali
Modified: 2011-08-14 04:47 PDT
CC List: 6 users


Attachments
The problem does not exist in Firefox 4.0.1 (213.60 KB, image/png)
2011-06-26 01:03 PDT, Rami Ali
The problem exists in Firefox 5.0 (225.20 KB, image/png)
2011-06-26 01:05 PDT, Rami Ali
patch, apply 'locl' as one of the first features, before Arabic-specific shaping (1007 bytes, patch)
2011-08-11 09:03 PDT, Jonathan Kew (:jfkthame)
jd.bugzilla: review+

Description Rami Ali 2011-06-25 01:36:34 PDT
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0

When the letter "Ha" (in Arabic: ه) comes at the end of a word and is not connected to the previous letter, it is not written correctly (on some websites but not all).

Reproducible: Always

Steps to Reproduce:
1. Visit this link: http://www.bbc.co.uk/arabic/scienceandtech/2011/06/110614_facebook_users.shtml
2. Search for the word هذه; it looks like هذهـ


Actual Results:  
When "Ha" (in Arabic: ه) comes at the end of a word, as in هذه or سماه, and is not connected to the previous letter (for example after "Alif" (in Arabic: ا) or "Dal" (in Arabic: د)),
it is written in a "beginning contextual" form, which is wrong.

Expected Results:  
It should look like a circle: the end contextual form of "Ha" (in Arabic: ه).

This problem may be related to the font. It occurs on all pages of some websites, such as BBC Arabic, but not on all websites. It did not occur in Firefox 4.0 but does in Firefox 5.0.
Comment 1 Jonathan Kew (:jfkthame) 2011-06-25 02:34:48 PDT
I think this is a font issue rather than a Firefox bug.

The BBC Arabic site is using a custom downloadable font "BBCNassim", and this is apparently the glyph shape it renders in this context. (Note that it's not actually the initial form; it's the alternate "do-chashmee" form of HA that is normally used in certain contexts such as for aspiration in Urdu, or sometimes when using letters as numerals in a list, etc. You can tell this by zooming the text to a very large size and observing that the "tail" on the left of the letter is not designed to link to a following letter but has a tapered terminal shape.)

The BBCNassim font apparently has other problems, too; in Nightly builds, its OpenType tables are rejected by the (updated) OTS sanitizer, and so the text does not shape properly at all.
Comment 2 Rami Ali 2011-06-26 01:03:31 PDT
Created attachment 541999
The problem does not exist in Firefox 4.0.1
Comment 3 Rami Ali 2011-06-26 01:05:09 PDT
Created attachment 542000
The problem exists in Firefox 5.0
Comment 4 Thomas Ahlblom 2011-07-23 19:07:41 PDT
Reproduced:
Mozilla/5.0 (X11; Linux x86_64; rv:5.0.1) Gecko/20100101 Firefox/5.0.1
Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20100101 Firefox/6.0
Mozilla/5.0 (X11; Linux x86_64; rv:7.0a2) Gecko/20110723 Firefox/7.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:8.0a1) Gecko/20110723 Firefox/8.0a1

WFM:
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30

Last good nightly: 2011-04-11
First bad nightly: 2011-04-12

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=09b605eb3e0d&tochange=a174b86200d6

The first bad revision is:
changeset:   67845:24edeee30683
parent:      67844:55086e3bfe4d
parent:      67813:ac15602832bd
user:        Ehsan Akhgari <ehsan@mozilla.com>
date:        Mon Apr 11 11:53:07 2011 -0400
summary:     Merge cedar into mozilla-central
Comment 5 Kevin Brosnan [:kbrosnan] 2011-08-03 08:46:14 PDT
Could this be bug 674335?
Comment 6 Thomas Ahlblom 2011-08-03 09:12:27 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:8.0a1) Gecko/20110803 Firefox/8.0a1

Well, it looks like bug 674335 is a duplicate of this one, but setting gfx.downloadable_fonts.sanitize = false as suggested in bug 674335 comment 2 makes no difference for me. I must admit I'm not familiar with Arabic, but I do see a difference at the regression point in comment 4.
Comment 7 Thomas Ahlblom 2011-08-03 09:54:02 PDT
Further local bisecting:

The first bad revision is:
changeset:   67842:fd6a216b1072
user:        Jonathan Kew <jfkthame@gmail.com>
date:        Mon Apr 11 16:33:12 2011 +0100
summary:     bug 644184 - ensure basic arabic shaping features are applied before ligature formation. r=jdaggett

http://hg.mozilla.org/mozilla-central/rev/fd6a216b1072
Comment 8 tntypography 2011-08-10 08:19:31 PDT
> I think this is a font issue rather than a Firefox bug.

It might be either, but both FF 4 and IE9 render it as expected.

The font is programmed so that the isolated shape of Heh renders in contexts that are not alphabetic - it does this by a contextual substitution that renders the shape that is expected after final and isolated glyphs.

This behaviour is there to allow proper Hijri dates that do not require the 'hack' of inserting a tatweel character to get a simulacrum of the correct isolated shape (initial plus tail), a bad practice that users are forced into by fonts that do not take this into account.

> The BBC Arabic site is using a custom downloadable font "BBCNassim", and
> this is apparently the glyph shape it renders in this context.

It shouldn't and doesn't in other rendering engines.

> it's not actually the initial form; it's the alternate "do-chashmee" form of
> HA that is normally used in certain contexts such as for aspiration in Urdu,
> or sometimes when using letters as numerals in a list, etc. 

Not quite: it's not the "do-chashmee" form, and it actually has a different Unicode code point too: U+FEE9.
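The code points being compared here can be checked with Python's standard `unicodedata` module. This is a small editorial sketch, not part of the original thread: it prints the name and compatibility decomposition of plain heh (U+0647), the separate do-chashmee letter (U+06BE), the isolated presentation form U+FEE9, and the tatweel (U+0640) used in the "هـ" workaround mentioned above. The helper name `describe` is hypothetical.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' plus the compatibility decomposition, if any."""
    decomp = unicodedata.decomposition(ch)
    suffix = f" [{decomp}]" if decomp else ""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}{suffix}"

# Plain heh, the do-chashmee letter, heh's isolated presentation form,
# and the tatweel used to fake an "initial plus tail" shape.
for ch in ("\u0647", "\u06BE", "\uFEE9", "\u0640"):
    print(describe(ch))
```

This prints `U+FEE9 ARABIC LETTER HEH ISOLATED FORM [<isolated> 0647]`, i.e. U+FEE9 is a compatibility presentation form of the ordinary letter U+0647, whereas U+06BE (ARABIC LETTER HEH DOACHASHMEE) is a distinct letter with no decomposition.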

> The BBCNassim font apparently has other problems, too; in Nightly builds,
> its OpenType tables are rejected by the (updated) OTS sanitizer, and so the
> text does not shape properly at all.

The language and script tag problems are unlikely to be related to this issue, as the expected behaviour is defined for both dflt and Arabic.

tn
Comment 9 Jonathan Kew (:jfkthame) 2011-08-11 04:11:01 PDT
OK, I think I'm beginning to understand the behavior here, and why it broke with the change in bug 644184.

Prior to that patch, Arabic shaping was implemented by determining the appropriate set of features for each character in the string, collecting the relevant lookups, and executing them in the order defined in the font. This is the generic feature/lookup-processing model described in the OpenType spec, where the order of lookup execution is entirely in the hands of the font developer.

Unfortunately, Microsoft's implementation of certain specific scripts (in Uniscribe, etc) departs from this, and executes individual _features_ sequentially in a predetermined order, instead of collecting the set of features and executing _lookups_ in the font-specified order. Some fonts rely on this (unconsciously, I expect), in that they have their lookups defined in an "incorrect" order (i.e. an order that if used, will not work as intended), and Uniscribe masks this (which I would consider sloppy font programming) by ignoring the font's lookup order and instead imposing its predefined feature order. In particular, some fonts have a ligature lookup ordered before the lookups for the basic Arabic joining features, but rely on it actually being executed later.
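The difference between the two execution models can be sketched as a toy in Python. Everything about the "font" here is hypothetical: the glyph names `lam.init`, `alef.fina` and `lam_alef`, the three lookups, and the tags they carry. Joining analysis is skipped; the toy simply assumes lam takes its initial form and alef its final form. It only illustrates a ligature lookup listed before the joining lookups whose output it needs.

```python
from typing import Callable, Dict, List, Tuple

Run = List[str]                               # a run of glyph names
Lookup = Tuple[str, Callable[[Run], Run]]     # (feature tag, substitution)

def single(mapping: Dict[str, str]) -> Callable[[Run], Run]:
    """Single substitution: replace every glyph that appears in mapping."""
    return lambda run: [mapping.get(g, g) for g in run]

def ligature(first: str, second: str, result: str) -> Callable[[Run], Run]:
    """Ligature substitution: replace each adjacent (first, second) pair with result."""
    def apply(run: Run) -> Run:
        out, i = [], 0
        while i < len(run):
            if i + 1 < len(run) and run[i] == first and run[i + 1] == second:
                out.append(result)
                i += 2
            else:
                out.append(run[i])
                i += 1
        return out
    return apply

# Hypothetical "sloppy" font: the ligature lookup comes FIRST in the font,
# but it only matches the joined forms that the later lookups produce.
FONT_LOOKUPS: List[Lookup] = [
    ("liga", ligature("lam.init", "alef.fina", "lam_alef")),
    ("init", single({"lam": "lam.init"})),
    ("fina", single({"alef": "alef.fina"})),
]

def shape_by_lookup_order(run: Run, lookups: List[Lookup]) -> Run:
    """Generic OpenType model: execute lookups in the order the font lists them."""
    for _tag, lookup in lookups:
        run = lookup(run)
    return run

def shape_by_feature_order(run: Run, lookups: List[Lookup], feature_order: List[str]) -> Run:
    """Uniscribe-style model: execute whole features in a fixed, shaper-defined order."""
    for feature in feature_order:
        for tag, lookup in lookups:
            if tag == feature:
                run = lookup(run)
    return run

if __name__ == "__main__":
    text = ["lam", "alef"]
    print(shape_by_lookup_order(text, FONT_LOOKUPS))
    print(shape_by_feature_order(text, FONT_LOOKUPS, ["init", "fina", "liga"]))
```

With font lookup order the ligature lookup runs before `lam.init`/`alef.fina` exist, so the result is `['lam.init', 'alef.fina']`; imposing the feature order init, fina, liga yields `['lam_alef']`. That is the masking effect described above: the feature-ordered shaper makes the misordered font appear to work.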

For compatibility with such fonts, we made a change to the Arabic shaper in bug 644184 that forced certain features (ccmp, and the core Arabic features init/medi/fina/isol) to be executed earlier than the other "generic" features such as ligatures.

However, in the case of this BBC Nassim font, that change breaks the implementation of 'heh', which relies on the lookups for the 'locl' feature being applied early. Ironically, in this font the actual lookup order is logical, so that our older implementation (applying lookups in the font-defined order) gave the expected result, but moving the Arabic-shaping features ahead of 'locl' causes this regression.

I think we can fix this if we move the 'locl' feature in Arabic to be processed (along with 'ccmp') ahead of the core joining features. Unfortunately, the MS spec for Arabic OT shaping does not mention this feature at all, so it's unclear whether this should be considered "standard" behavior. (See http://www.microsoft.com/typography/otfntdev/arabicot/features.aspx.)
Comment 10 Behdad Esfahbod 2011-08-11 06:13:15 PDT
I think we should do 'locl' combined with 'ccmp'.  No idea why I didn't do that already.
Comment 11 Jonathan Kew (:jfkthame) 2011-08-11 09:03:10 PDT
Created attachment 552385
patch, apply 'locl' as one of the first features, before Arabic-specific shaping

This fixes the issue for the current trunk code by treating 'locl' along with 'ccmp' as one of the first features to be applied.

When we update to a new harfbuzz release, this will be superseded by the revised feature-management there, but we should fix it for now in the version we're currently using.
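The regression and the fix can be illustrated with a second toy model in Python. This is a sketch under loudly stated assumptions: the glyph names `heh.locl`, `heh.tailed` and `heh.round`, the two lookups, and the set of glyphs treated as "letters" are invented to mirror Comments 8 and 9, not read from BBCNassim. Joining analysis is skipped (every heh is treated as isolated), and each shaper ordering is modelled as a list of hypothetical "stages" whose lookups run in the font's own order. This only approximates what the real shaper does.

```python
from typing import Callable, List, Tuple

Run = List[str]
# Hypothetical: glyphs that count as "letters" for the contextual rule.
LETTERS = {"alef", "dal", "thal", "heh"}

def locl_heh_after_letter(run: Run) -> Run:
    """Hypothetical contextual 'locl' lookup: a heh that follows a letter becomes heh.locl."""
    out = list(run)
    for i in range(1, len(out)):
        if out[i] == "heh" and out[i - 1] in LETTERS:
            out[i] = "heh.locl"
    return out

def isol_heh(run: Run) -> Run:
    """Hypothetical 'isol' lookup: plain heh -> tailed form, heh.locl -> round form."""
    forms = {"heh": "heh.tailed", "heh.locl": "heh.round"}
    return [forms.get(g, g) for g in run]

# The toy font's lookups in the font's own (logical) order; it has no 'ccmp' lookup.
LOOKUPS: List[Tuple[str, Callable[[Run], Run]]] = [
    ("locl", locl_heh_after_letter),
    ("isol", isol_heh),
]

def shape(run: Run, stages: List[List[str]]) -> Run:
    """Apply lookups stage by stage; within a stage, keep the font's lookup order."""
    for stage in stages:
        for tag, lookup in LOOKUPS:
            if tag in stage:
                run = lookup(run)
    return run

BEFORE_644184 = [["ccmp", "locl", "isol"]]     # one stage: pure font lookup order
AFTER_644184 = [["ccmp"], ["isol"], ["locl"]]  # joining features forced ahead of 'locl'
PATCHED = [["ccmp", "locl"], ["isol"]]         # 'locl' grouped with 'ccmp'

tail = ["thal", "heh"]  # last two letters of هذه; heh does not join to thal
for label, stages in [("before 644184", BEFORE_644184),
                      ("after 644184", AFTER_644184),
                      ("patched", PATCHED)]:
    print(label, shape(tail, stages))
```

In this toy, font lookup order and the patched grouping both give `['thal', 'heh.round']`. Forcing 'isol' ahead of 'locl' gives `['thal', 'heh.tailed']`, because the contextual rule no longer sees a plain `heh` to match. Grouping 'locl' with 'ccmp' restores the font's intended order for these lookups without undoing the bug 644184 change for fonts that need joining features applied before ligatures.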
Comment 13 Kyle Huey [:khuey] (khuey@mozilla.com) (Away until 6/13) 2011-08-14 04:47:23 PDT
http://hg.mozilla.org/mozilla-central/rev/78dea7cd0f4d
