User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
When the letter "Ha" (in Arabic: ه) comes at the end of a word and is not connected to the preceding letter, it is not rendered correctly (on some websites, but not all).
Steps to Reproduce:
1. Visit this link: http://www.bbc.co.uk/arabic/scienceandtech/2011/06/110614_facebook_users.shtml
2. Search for the word هذه ; it is displayed as هذهـ
When "Ha" (in Arabic: ه) comes at the end of a word such as هذه or سماه and follows a letter that does not connect to it, such as "Alif" (in Arabic: ا) or "Dal" (in Arabic: د), it is drawn in what looks like the "beginning contextual" form, which is wrong.
It should look like a circle, the final contextual form of "Ha" (in Arabic: ه).
This problem may be related to the font. It occurs on all pages of some websites, such as BBC Arabic, but not on all websites. It did not occur in Firefox 4.0 but does occur in Firefox 5.0.
I think this is a font issue rather than a Firefox bug.
The BBC Arabic site is using a custom downloadable font, "BBCNassim", and this is apparently the glyph shape it renders in this context. (Note that it's not actually the initial form; it's the alternate "do-chashmee" form of HA that is normally used in certain contexts, such as for aspiration in Urdu, or sometimes when using letters as numerals in a list. You can tell this by zooming the text to a very large size and observing that the "tail" on the left of the letter is not designed to link to a following letter but has a tapered terminal shape.)
The BBCNassim font apparently has other problems, too; in Nightly builds, its OpenType tables are rejected by the (updated) OTS sanitizer, and so the text does not shape properly at all.
Created attachment 541999
The problem does not exist in Firefox 4.0.1
Created attachment 542000
The problem exists in Firefox 5.0
Mozilla/5.0 (X11; Linux x86_64; rv:5.0.1) Gecko/20100101 Firefox/5.0.1
Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20100101 Firefox/6.0
Mozilla/5.0 (X11; Linux x86_64; rv:7.0a2) Gecko/20110723 Firefox/7.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:8.0a1) Gecko/20110723 Firefox/8.0a1
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30
Last good nightly: 2011-04-11
First bad nightly: 2011-04-12
The first bad revision is:
user: Ehsan Akhgari <email@example.com>
date: Mon Apr 11 11:53:07 2011 -0400
summary: Merge cedar into mozilla-central
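For anyone repeating this kind of regression hunt, the nightly bisection above amounts to a binary search over build dates. Below is a minimal sketch of that search. The `first_bad` helper and the 20-day date range are hypothetical, and the `is_bad` predicate merely stands in for manually loading the test page in a given nightly; only the two reported dates (2011-04-11 and 2011-04-12) come from this bug.

```python
from datetime import date, timedelta


def first_bad(builds, is_bad):
    """Return the index of the first bad build.

    Assumes builds[0] is good, builds[-1] is bad, and that once a build
    is bad every later build is bad too.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical range of nightlies: 2011-04-01 through 2011-04-20.
nightlies = [date(2011, 4, 1) + timedelta(days=i) for i in range(20)]

# Stand-in for "open the BBC Arabic page in this nightly and look at the heh".
idx = first_bad(nightlies, lambda d: d >= date(2011, 4, 12))
print("last good:", nightlies[idx - 1], "first bad:", nightlies[idx])
```

With 20 candidate builds this needs only four manual checks, which is why a one-day regression window is quick to obtain.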
Could this be bug 674335?
Mozilla/5.0 (X11; Linux x86_64; rv:8.0a1) Gecko/20110803 Firefox/8.0a1
Well, it looks like bug 674335 is a duplicate of this one, but setting gfx.downloadable_fonts.sanitize = false, as suggested in bug 674335 comment 2, makes no difference for me. I must admit I'm not familiar with Arabic; however, I do see a difference at the regression range in comment 4.
Further local bisecting:
The first bad revision is:
user: Jonathan Kew <firstname.lastname@example.org>
date: Mon Apr 11 16:33:12 2011 +0100
summary: bug 644184 - ensure basic arabic shaping features are applied before ligature formation. r=jdaggett
> I think this is a font issue rather than a Firefox bug.
It might be either, but both FF 4 and IE9 render it as expected.
The font is programmed so that the isolated shape of Heh renders in contexts that are not alphabetic - it does this by a contextual substitution that renders the shape that is expected after final and isolated glyphs.
This behaviour is there to allow proper Higra dates that do not require the 'hack' of inserting a tatweel character in order to get a simulacrum of the correct isolated shape (initial plus tail) - a bad practice users are forced into by fonts that do not contain this consideration.
> The BBC Arabic site is using a custom downloadable font "BBCNassim", and
> this is apparently the glyph shape it renders in this context.
It shouldn't and doesn't in other rendering engines.
> it's not actually the initial form; it's the alternate "do-chashmee" form of
> HA that is normally used in certain contexts such as for aspiration in Urdu,
> or sometimes when using letters as numerals in a list, etc.
Not quite: it's not the "do-chashmee" form, and it actually has a different Unicode code point too: U+FEE9.
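The code points under discussion can be checked against the Unicode character database. This is a small sketch using Python's standard `unicodedata` module; the `describe` helper is only for illustration, and the list of code points is my own selection of the heh-related characters mentioned in this thread, plus tatweel.

```python
import unicodedata


def describe(cp):
    """Format a code point as 'U+XXXX  CHARACTER NAME'."""
    return f"U+{cp:04X}  {unicodedata.name(chr(cp))}"


# Base heh, the do-chashmee heh, the presentation forms of heh, and tatweel.
for cp in (0x0647, 0x06BE, 0xFEE9, 0xFEEA, 0xFEEB, 0x0640):
    print(describe(cp))

# The presentation form U+FEE9 is a compatibility variant of U+0647,
# so NFKC normalization folds it back to the base letter.
folded = unicodedata.normalize("NFKC", "\uFEE9")
print(describe(ord(folded)))
```

Note that U+FEE9 is a presentation-form code point; in running text the letter is normally encoded as U+0647 and the font selects the contextual glyph.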
> The BBCNassim font apparently has other problems, too; in Nightly builds,
> its OpenType tables are rejected by the (updated) OTS sanitizer, and so the
> text does not shape properly at all.
The language and script tag problems are unlikely to be related to this issue, as the expected behaviour is defined for both dflt and Arabic.
OK, I think I'm beginning to understand the behavior here, and why it broke with the change in bug 644184.
Prior to that patch, Arabic shaping was implemented by determining the appropriate set of features for each character in the string, collecting the relevant lookups, and executing them in the order defined in the font. This is the generic feature/lookup-processing model described in the OpenType spec, where the order of lookup execution is entirely in the hands of the font developer.
Unfortunately, Microsoft's implementation of certain specific scripts (in Uniscribe, etc.) departs from this, and executes individual _features_ sequentially in a predetermined order, instead of collecting the set of features and executing _lookups_ in the font-specified order. Some fonts rely on this (unconsciously, I expect): their lookups are defined in an "incorrect" order (i.e. an order that, if actually used, would not work as intended), which I would consider sloppy font programming, and Uniscribe masks this by ignoring the font's lookup order and imposing its predefined feature order instead. In particular, some fonts have a ligature lookup ordered before the lookups for the basic Arabic joining features, but rely on it actually being executed later.
For compatibility with such fonts, we made a change to the Arabic shaper in bug 644184 that forced certain features (ccmp, and the core Arabic features init/medi/fina/isol) to be executed earlier than the other "generic" features such as ligatures.
However, in the case of this BBC Nassim font, that change breaks the implementation of 'heh', which relies on the lookups for the 'locl' feature being applied early. Ironically, in this font the actual lookup order is logical, so that our older implementation (applying lookups in the font-defined order) gave the expected result, but moving the Arabic-shaping features ahead of 'locl' causes this regression.
I think we can fix this if we move the 'locl' feature in Arabic to be processed (along with 'ccmp') ahead of the core joining features. Unfortunately, the MS spec for Arabic OT shaping does not mention this feature at all, so it's unclear whether this should be considered "standard" behavior. (See http://www.microsoft.com/typography/otfntdev/arabicot/features.aspx.)
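To make the ordering problem concrete, here is a toy model of the three processing orders discussed above: font-defined lookup order, the staged order introduced in bug 644184, and the proposed order with 'locl' moved up alongside 'ccmp'. Everything in it is an assumption for illustration: the glyph names, the contents of the lookups, and the helper functions are hypothetical and are not taken from BBC Nassim or from harfbuzz. It also ignores per-glyph feature masks and treats every lookup as a simple one-to-one substitution.

```python
# Toy model only: glyph names and lookup contents are hypothetical.
# Lookups are listed in the order the (imaginary) font defines them.
FONT_LOOKUPS = [
    ("ccmp", {}),
    ("locl", {"heh": "heh.ar"}),
    # 'isol' gives the tailed shape to a plain heh, and the round shape
    # to a heh that 'locl' has already replaced.
    ("isol", {"heh": "heh.isol_tail", "heh.ar": "heh.isol_round"}),
    ("liga", {}),
]

# Feature stages, each applied in full before the next one starts.
STAGES_AFTER_644184 = [{"ccmp"}, {"init", "medi", "fina", "isol"}]
STAGES_WITH_LOCL_FIRST = [{"ccmp", "locl"}, {"init", "medi", "fina", "isol"}]


def apply_lookups(glyphs, lookups):
    """Apply single-substitution lookups one after another."""
    for _feature, mapping in lookups:
        glyphs = [mapping.get(g, g) for g in glyphs]
    return glyphs


def shape_font_order(glyphs):
    """Generic OpenType model: run all lookups in font-defined order."""
    return apply_lookups(glyphs, FONT_LOOKUPS)


def shape_staged(glyphs, stages):
    """Run the lookups of each stage first, then all remaining lookups."""
    ordered = []
    for stage in stages:
        ordered += [lk for lk in FONT_LOOKUPS if lk[0] in stage]
    staged = set().union(*stages)
    ordered += [lk for lk in FONT_LOOKUPS if lk[0] not in staged]
    return apply_lookups(glyphs, ordered)


word = ["dal", "heh"]  # heh after a non-joining letter, at the end of a word
print("font order:    ", shape_font_order(word))
print("after 644184:  ", shape_staged(word, STAGES_AFTER_644184))
print("locl with ccmp:", shape_staged(word, STAGES_WITH_LOCL_FIRST))
```

In this model the font-defined order and the 'locl'-first order both end at the round glyph, while the staged order from bug 644184 runs 'isol' before 'locl' has had a chance to substitute, leaving the tailed glyph. The real font's lookups are certainly more involved, but the order dependence is the same in kind.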
I think we should do 'locl' combined with 'ccmp'. No idea why I didn't do that already.
Created attachment 552385
patch, apply 'locl' as one of the first features, before Arabic-specific shaping
This fixes the issue for the current trunk code by treating 'locl' along with 'ccmp' as one of the first features to be applied.
When we update to a new harfbuzz release, this will be superseded by the revised feature-management there, but we should fix it for now in the version we're currently using.