Closed Bug 644184 Opened 14 years ago Closed 14 years ago

wrong spelling when two letters are written one after another in the arabic version of Firefox 4.0

Categories

(Core :: Layout: Text and Fonts, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: moh-ter, Assigned: jfkthame)

References

Details

(Keywords: regression)

Attachments

(8 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0
Build Identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0

When I write the letter "Alif" after a "Lam", it doesn't look normal.

Reproducible: Always

Steps to Reproduce:
1. Open any page with the word "La" (لا) written in Arabic.
2. See how it looks.

Actual Results:
When I write the letter "Alif" (http://img832.imageshack.us/img832/3783/alifw.jpg) after a "Lam" (http://img715.imageshack.us/img715/8228/lamj.jpg), it looks like a U (http://img204.imageshack.us/img204/1021/wrongla.jpg).

Expected Results:
It should look like an X (http://img109.imageshack.us/img109/4152/77687119.jpg).

Example (I highlighted the wrong forms in red and the correct ones in green):
From the Firefox 4.0 browser: http://img28.imageshack.us/img28/6049/firefoxprob.jpg
From another browser: http://img33.imageshack.us/img33/2606/otherbrowser.jpg

Note: While I'm typing the letters, they look normal (as in the search bar in the example picture), but once the text is posted (on a forum, for example) and someone else views it, this problem happens.

Note: The problem doesn't occur in 3.6.15 or the older versions I've tried.
Attached image the letter "Alif"
Attached image the letter "Lam"
Attached image Example: Firefox 4.0
Is this the same problem as Bug 635639?
(In reply to comment #7)
> Is this the same problem as Bug 635639?

No, it's not; the bug you posted is already fixed. (The letters of each word must be joined together in Arabic. In Bug 635639 it seems that there were spaces between the letters. That is not the problem I'm facing; the spacing between the letters is normal now.)

My problem is that a certain letter is spelled wrong (or at least it is shown wrong; while typing the word it looks correct). The wrong form looks like a U:
https://bugzilla.mozilla.org/attachment.cgi?id=521203
while it should look like an X:
https://bugzilla.mozilla.org/attachment.cgi?id=521202

For more details please see the attachments.
It seems the pictures I posted were deleted from the host server. I re-uploaded them as attachments. Sorry for the inconvenience.
Please attach an HTML testcase or provide a web link.
This problem happens on all sites, not just Google.

This is the search I did in the example pictures:
http://www.google.com.sa/search?q=%D9%84%D8%A7&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:ar:official&client=firefox-a&safe=active

It also happens on the Arabic version of Firefox's web page:
http://www.mozilla.com/ar/firefox/
[In the upper right corner of the page there is the Firefox logo. To its left is the word الميزات (meaning "features"), and to the left of that is the word الإضافات (meaning "add-ons"), which appears wrong in Firefox 4.0; in other browsers there is no problem.]

I am seeing this problem with the Arabic version of Firefox 4.0. I don't know about the English version.
I found out that the problem comes from the font "Arial". It can be worked around temporarily by changing the font used by Firefox to "Tahoma" or "Times New Roman". I hope this helps. Thanks.
Keywords: regression
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → Trunk
CCing our text folks.
Component: General → Layout: Text
QA Contact: general → layout.fonts-and-text
Could this be font fallback falling back separately for the Lam and the Alif and not finding the Lam-Alif ligature?
(In reply to comment #14)
> Could this be font fallback falling back separately for the Lam and the Alif
> and not finding the Lam-Alif ligature?

I don't think so - that would require the user to have a font that includes Lam, but doesn't include Alif, which seems pretty unlikely. Moreover, in that case they would render without joining at all; here, we're seeing the default joined shapes but not the required ligature.
OS: Windows 7 → Windows XP
OK, I can reproduce this when the font is Arial Bold, but not with Arial Regular. (Note that the Google search given as an example uses bold to highlight the "lam-alif" search term.) This seems likely to be a harfbuzz issue. :( The ligature forms correctly if harfbuzz is disabled.
Assignee: nobody → jfkthame
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → Windows 7
The cause of this is an anomaly in the GSUB table of Arial Bold. In this table, the lookup for the 'init' feature is placed _after_ the lookup for the 'rlig' feature. Harfbuzz gathers all the features to be applied to each glyph, and processes the lookups in the order defined in the font, and therefore the 'init' substitution for the Lam has not yet happened when the 'rlig' ligature lookup is applied, and so the ligature fails.

(Note that this problem does _not_ occur with the final form of the ligature, as can be seen with a simple testcase:

data:text/html,<p style="font: bold 32px arial">&#x644;&#x644;&#x627;

This forms the Lam-Alif ligature because the lookups for 'medi' and 'fina' are correctly placed _before_ the 'rlig' lookup in the font's lookup list.)

This dependency on lookup ordering in the font is explicitly mentioned in the OpenType specification (in several places):

http://www.microsoft.com/typography/otspec/TTOCHAP1.htm:
"In practice, the engine may apply features simultaneously; thus, it is up to the font vendor to ensure that the features’ lookups are ordered to achieve the desired effect..."

http://www.microsoft.com/typography/otspec/chapter2.htm:
"After choosing which features to use, the client assembles all lookups from the selected features. Multiple lookups may be needed to define the data required for different substitution and positioning actions, as well as to control the sequencing and effects of those actions.
"To implement features, a client applies the lookups in the order the lookup definitions occur in the LookupList. As a result, within the GSUB or GPOS table, lookups from several different features may be interleaved during text processing....
"....The lookup sequencing mechanism in TrueType relies on the font to determine the proper order of text-processing operations."

http://www.microsoft.com/typography/otspec/gsub.htm:
"To access GSUB information, clients should use the following procedure:
....
Inspect the FeatureTag of each feature, and select the features to apply to an input glyph string. Each feature provides an array of index numbers into the GSUB LookupList table. Assemble all lookups from the set of chosen features, and apply the lookups in the order given in the LookupList table."

Thus harfbuzz's behavior is correct as per the OpenType specification.

The fact that Arial Bold happens to work "properly" under Uniscribe is because Uniscribe explicitly does _not_ follow the OpenType spec in this regard. The description of Uniscribe's Arabic shaping makes it clear that it applies features individually in sequence, instead of assembling the full collection of lookups (as per the spec).

http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx:
"Next, Uniscribe calls OTLS to apply the features. All OTL processing is divided into a set of predefined features (described and illustrated in the Features section of this document). Each feature is applied, one by one, to the appropriate glyphs in the syllable and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order."

I suspect Uniscribe was implemented this way in order to work "properly" with poorly-constructed fonts where the lookups were not ordered appropriately. However, this behavior is not conformant with the published OpenType specification.

In summary, I believe this is a font bug, not a Gecko or Harfbuzz bug, and I think we should WONTFIX it and instead report a bug against the Arial Bold font that Microsoft is shipping.
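For readers who want to see the ordering problem concretely, here is a minimal Python sketch of the two strategies described in the comment above. It is a toy model, not harfbuzz or Uniscribe code: the glyph names, the `LOOKUP_LIST` layout (with 'init' placed after 'rlig', as reported for Arial Bold), the ligature table, and the feature order used for the feature-by-feature pass are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A glyph is (glyph name, joining position assigned by the shaper).
Glyph = Tuple[str, str]
Lookup = Callable[[List[Glyph]], List[Glyph]]


def positional(feature: str) -> Lookup:
    """Single substitution: glyphs in this joining position get a positional form."""
    def apply(glyphs: List[Glyph]) -> List[Glyph]:
        return [(f"{name}.{feature}", "done") if pos == feature else (name, pos)
                for name, pos in glyphs]
    return apply


# Hypothetical ligature table: the ligature only matches already-shaped forms.
LIGATURES: Dict[Tuple[str, ...], str] = {
    ("lam.init", "alef.fina"): "lam_alef.isol",
    ("lam.medi", "alef.fina"): "lam_alef.fina",
}


def ligate(glyphs: List[Glyph]) -> List[Glyph]:
    """Ligature substitution over adjacent glyph pairs."""
    out: List[Glyph] = []
    i = 0
    while i < len(glyphs):
        pair = tuple(name for name, _ in glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append((LIGATURES[pair], "done"))
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


# Assumed lookup order of the problematic font: 'init' comes after 'rlig'.
LOOKUP_LIST: List[Tuple[str, Lookup]] = [
    ("medi", positional("medi")),
    ("fina", positional("fina")),
    ("rlig", ligate),
    ("init", positional("init")),
]


def apply_in_lookup_list_order(glyphs: List[Glyph]) -> List[str]:
    """Spec-style: apply every selected lookup in LookupList order."""
    for _tag, lookup in LOOKUP_LIST:
        glyphs = lookup(glyphs)
    return [name for name, _ in glyphs]


def apply_feature_by_feature(
    glyphs: List[Glyph],
    feature_order: Tuple[str, ...] = ("init", "medi", "fina", "rlig"),  # assumed order
) -> List[str]:
    """Feature-at-a-time: one pass per feature, in a fixed feature order."""
    for feature in feature_order:
        for tag, lookup in LOOKUP_LIST:
            if tag == feature:
                glyphs = lookup(glyphs)
    return [name for name, _ in glyphs]


lam_alef = [("lam", "init"), ("alef", "fina")]
print(apply_in_lookup_list_order(lam_alef))  # ['lam.init', 'alef.fina']
print(apply_feature_by_feature(lam_alef))    # ['lam_alef.isol']
```

In this model the LookupList-order pass leaves Lam and Alif as two joined glyphs because 'rlig' runs before 'init' has produced `lam.init`, while the feature-by-feature pass forms the ligature. A Lam-Lam-Alif input ligates under both strategies, since 'medi' and 'fina' precede 'rlig' in the assumed lookup list.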
Is it worth filing a bug to change http://www.mozilla.com/ar/firefox/ because of this issue?
(In reply to comment #17)
> In summary, I believe this is a font bug, not a Gecko or Harfbuzz bug, and I
> think we should WONTFIX it and instead report a bug against the Arial Bold font
> that Microsoft is shipping.

Will that actually result in the updated font with this fix getting onto the user's machine (i.e., are these fonts updated as part of Windows Update)?

Can we somehow work around this issue, since it might be considered a regression for those users who do not know what's going on here?
Unfortunately there are many Arabic fonts that rely on the segmented application of the Arabic features. So I plan to implement that in HarfBuzz.
The attached patch fixes the rendering of poorly-constructed fonts like this by ensuring that the lookups for the basic init/medi/fina/isol features are applied before the ligature features.
Attachment #523845 - Flags: review?(jdaggett)
Attachment #523845 - Flags: feedback?(mozilla)
Oops, forgot to refresh the patch with one more detail: ensure that 'ccmp' is applied before other features (including the Arabic basic-shaping ones). Otherwise we might break the rendering of some more complex Arabic fonts that rely on ccmp to compose or decompose letters into skeletal forms + separate dots before shaping.
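The staging described in the last two comments can be sketched as follows. This is only an illustration of the idea ('ccmp' first, then the basic init/medi/fina/isol features, then everything else, keeping font order within each stage); it is not the attached patch. The function name `staged_lookup_order`, the stage table, and the sample lookup list are hypothetical, and the model assumes each lookup belongs to exactly one feature.

```python
from typing import List, Tuple

# Stage 0: composition/decomposition; stage 1: basic Arabic shaping forms.
# Any feature not listed here falls into the final stage.
STAGE_OF_FEATURE = {
    "ccmp": 0,
    "isol": 1, "fina": 1, "medi": 1, "init": 1,
}
LAST_STAGE = 2


def staged_lookup_order(lookups: List[Tuple[int, str]]) -> List[int]:
    """Return lookup indices in application order.

    `lookups` holds (lookup index in the font's LookupList, feature tag).
    Lookups are grouped by stage; within a stage, LookupList order is kept.
    """
    def sort_key(entry: Tuple[int, str]) -> Tuple[int, int]:
        index, tag = entry
        return (STAGE_OF_FEATURE.get(tag, LAST_STAGE), index)

    return [index for index, _tag in sorted(lookups, key=sort_key)]


# Hypothetical font where 'init' and 'ccmp' sit after 'rlig' in the LookupList.
font_lookups = [(0, "fina"), (1, "medi"), (2, "rlig"),
                (3, "init"), (4, "ccmp"), (5, "liga")]
print(staged_lookup_order(font_lookups))  # [4, 0, 1, 3, 2, 5]
```

With this ordering the 'init' lookup (index 3) runs before 'rlig' (index 2) even though the font lists it later, which is the effect the patch description aims for.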
Attachment #523845 - Attachment is obsolete: true
Attachment #523845 - Flags: review?(jdaggett)
Attachment #523845 - Flags: feedback?(mozilla)
Attachment #523846 - Flags: review?(jdaggett)
Attachment #523846 - Flags: feedback?(mozilla)
This adds a reftest so that we'll know if we break this again in some future update; it's marked as random for non-Windows platforms as the behavior will depend on the particular fonts available (e.g. some versions of Arial on OS X don't have the same Arabic-character coverage as the Win7 version).
Attachment #523890 - Flags: review?(jdaggett)
Thanks Jonathan. I like your approach and will adopt it (with modifications) upstream. Two comments about the functionality though:

- 'locl' should be in the pre-script-specific phase. Not sure what other features belong there. I'm guessing 'rtlm' family too. The only font I can find that has the mirroring features is KacstOne, which has it as the very last lookup index. That said, KacstOne also has 'locl' right before 'rtlm', i.e. after shaping. Should investigate. But the Indic OT spec clearly requires 'locl' before the shaping.

- Your approach breaks shaping into two phases even for non-complex cases. That is not desired IMO.
(In reply to comment #25)
> - 'locl' should be in the pre-script-specific phase. Not sure what other
> features belong there. I'm guessing 'rtlm' family too. The only font I can
> find that has the mirroring features is KacstOne, which has it as the very last
> lookup index. That said, KacstOne also has 'locl' right before 'rtlm', ie.
> after shaping. Should investigate.

I haven't investigated what 'locl' in KacstOne actually does, but it's perfectly reasonable to imagine a font designer choosing to do 'locl' late. For example, the font might have localized forms only for certain letters in specific positions. This would be most easily implemented with a late 'locl' lookup.

> - Your approach breaks shaping in two phases even for non-complex cases.
> That is not desired IMO.

Yes, I realize it affects the generic behavior as well. Avoiding this will require a bit more separation of the generic and script-specific paths. I was trying to keep the patch as minimal as possible for now (and it seemed unlikely to actually have ill effects), but eventually I think you're going to have to do that.
(In reply to comment #26)
> (In reply to comment #25)
>
> > - 'locl' should be in the pre-script-specific phase. Not sure what other
> > features belong there. I'm guessing 'rtlm' family too. The only font I can
> > find that has the mirroring features is KacstOne, which has it as the very last
> > lookup index. That said, KacstOne also has 'locl' right before 'rtlm', ie.
> > after shaping. Should investigate.
>
> I haven't investigated what 'locl' in KacstOne actually does, but it's
> perfectly reasonable to imagine a font designer choosing to do 'locl' late. For
> example, the font might have localized forms only for certain letters in
> specific positions. This would be most easily implemented with a late 'locl'
> lookup.

I agree. But any idea how to reconcile this with the Indic spec? I'd hate to have some shapers do it before and some do it after...

> > - Your approach breaks shaping in two phases even for non-complex cases.
> > That is not desired IMO.
>
> Yes, I realize it affects the generic behavior as well. Avoiding this will
> require a bit more separation of the generic and script-specific paths. I was
> trying to keep the patch as minimal as possible for now (and it seemed unlikely
> to actually have ill effects), but eventually I think you're going to have to
> do that.

This one I know how to fix.
Seems like the 'locl' feature in KacstOne maps Latin digits to Arabic digits. Ughh..
Attachment #523890 - Flags: review?(jdaggett) → review+
Comment on attachment 523846
updated patch, also ensure 'ccmp' comes first

Looks fine to me, assuming Behdad is comfortable with this.
Attachment #523846 - Flags: review?(jdaggett) → review+
I believe Behdad wants to do a more thorough fix eventually, but until then we should take this in order to fix the reported issue with Arial Bold on Windows (and various other fonts, I believe).
Correct.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-cedar]
Target Milestone: --- → mozilla2.2
Attachment #523846 - Flags: feedback?(mozilla)
BTW, the Arial Bold shipped with XP does not have the bug. Also, fixed in harfbuzz upstream: http://cgit.freedesktop.org/harfbuzz/commit/?id=b70c96dbe41d6512b80fe3d966a1942e1ef64a4b
(In reply to comment #28)
> Seems like the 'locl' feature in KacstOne maps Latin digits to Arabic
> digits. Ughh..

I use KacstOne as an interface font (using Gnome), and since there was no simple way to get numbers in progress dialogs, the clock etc. shown in Arabic (localised formatters in Arabic glibc locales don't do that), I had to do it in the font. It was not very well thought out and I don't recall the details, but it worked in Pango and that was all that mattered then. If you think the font is broken in some way, I can fix it.
(In reply to comment #35)
> (In reply to comment #28)
> > Seems like the 'locl' feature in KacstOne maps Latin digits to Arabic
> > digits. Ughh..
>
> I use KacstOne as an interface font (using Gnome), and since there was no
> simple way to numbers in progress dialogs, clock etc. in Arabic (localised
> formatters in Arabic glibc locales don't do that), I had to do it in the
> font. It was not very well thought and I don't recall the details, but it
> worked in Pango and that all what mattered then. If you think the font is
> broken in some way, I can fix it.

Well, that behavior is nonstandard and hence broken. Fix glibc I would say.