1045139 - The Arabic text with "MS Sans Serif" font is rendered bad

Reporter

Description

•

10 years ago

The Arabic text does not appear correctly with MS Sans Serif font and HWA disabled.

Testcase https://dl.dropboxusercontent.com/u/95157096/85f61cf7/oj13vf3ek.html


This problem occurs since the version 33.0a1 (2014-06-14).

https://hg.mozilla.org/mozilla-central/rev/86aa28ce309e

Screenshot https://dl.dropboxusercontent.com/u/95157096/85f61cf7/71e5hi0ac.png

Jonathan Kew [:jfkthame]

Comment 1

•

10 years ago

[Tracking Requested - why for this release]: Arabic-script sites that explicitly call for the MS Sans Serif font will show broken text on GDI-based (non-HW-acc) Windows systems.

This is clearly a regression from bug 985220. MS Sans Serif is an old bitmap font, so obviously it doesn't have OpenType tables; Uniscribe must have special-cased it one way or another. We should figure out some way to work around this, perhaps by forcing it to fall back to another font (which must be happening on all other platforms, including Windows+DWrite anyway).

status-firefox32: --- → unaffected

status-firefox33: --- → affected

status-firefox34: --- → affected

tracking-firefox33: --- → ?

tracking-firefox34: --- → ?

Jonathan Kew [:jfkthame]

Updated

•

10 years ago

Blocks: 985220

Status: UNCONFIRMED → NEW

Ever confirmed: true

Jonathan Kew [:jfkthame]

Comment 2

•

10 years ago

One way to address this might be to implement bug 1006122 on Windows, as that should mask out the Arabic range in MS Sans Serif.

Jonathan Kew [:jfkthame]

Updated

•

10 years ago

Depends on: 1006122

Behdad Esfahbod

Comment 3

•

10 years ago

Jonathan, why is the fallback arabic shaping in HarfBuzz not working here?

Jonathan Kew [:jfkthame]

Comment 4

•

10 years ago

(In reply to Behdad Esfahbod from comment #3)
> Jonathan, why is the fallback arabic shaping in HarfBuzz not working here?

I wondered that too, and am intending to take a look but haven't had time yet. I wouldn't be surprised if it turns out the font does not support the presentation-form codepoints, but relied on some other legacy hack for "shaping". But that's pure speculation, until I spend a bit of time in Windows to poke at it...

Behdad Esfahbod

Comment 5

•

10 years ago

Ah, it's Windows-1255 encoding!

Behdad Esfahbod

Comment 6

•

10 years ago

Well, Windows-1256 I mean.

Behdad Esfahbod

Comment 7

•

10 years ago

So, what implementation do you use for get_glyph() callback with this font?

Jonathan Kew [:jfkthame]

Comment 8

•

10 years ago

We should be using gfxGDIFont::GetGlyph(), found at http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxGDIFont.cpp#422, which is based on Uniscribe's ScriptGetCMap() API. So I need to check what that's actually doing.

blinky

Reporter

Comment 9

•

10 years ago

Screenshot of page affected by this bug https://dl.dropboxusercontent.com/u/95157096/85f61cf7/jdgp9tl97.png

Behdad Esfahbod

Comment 10

•

10 years ago

Thanks Jonathan.  I suppose it's returning a custom cmap that maps Arabic chars.  Looks like MS is using a custom glyph encoding for Windows-1256 shaping.  It's using 0x05..0x1F etc for shaped glyphs.

Technically, we can detect this in the arabic fallback shaper code and synthesize a GSUB table for it.  In fact, the GSUB table for it is fixed.  So we just need to do the mapping once and include the binary.

I can write that patch if you are interested in it.

Jonathan Kew [:jfkthame]

Comment 11

•

10 years ago

(In reply to Behdad Esfahbod from comment #10)
> Thanks Jonathan.  I suppose it's returning a custom cmap that maps Arabic
> chars.  Looks like MS is using a custom glyph encoding for Windows-1256
> shaping.  It's using 0x05..0x1F etc for shaped glyphs.
> 
> Technically, we can detect this in the arabic fallback shaper code and
> synthesize a GSUB table for it.  In fact, the GSUB table for it is fixed. 
> So we just need to do the mapping once and include the binary.
> 
> I can write that patch if you are interested in it.

Sounds good, thanks. (Well, kinda. I hate to carry extra code just to support obsolete stuff like this. But there are enough people on affected Windows systems that we should do it, I guess.)

Behdad Esfahbod

Comment 12

•

10 years ago

Ok, here's a patch:

  https://github.com/behdad/harfbuzz/commit/479ae6855d5611f9d05441b320e4db57d3502f4a

There's a tiny commit before it that you need also.

From the commit log:

commit 479ae6855d5611f9d05441b320e4db57d3502f4a
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Wed Jul 30 02:15:44 2014 -0400

    [arabic] Implement Windows-1256 private shaping
    
    Bug 1045139 - The Arabic text with "MS Sans Serif" font is rendered bad
    https://bugzilla.mozilla.org/show_bug.cgi?id=1045139
    
    This is only enabled on Windows platforms, and requires support from
    Uniscribe to work.  But for clients that do hook up to Uniscribe, this
    fixes shaping of Windows-1256-encoded bitmap fonts like "MS Sans Serif".
    
    The code and table together have just less than a 1kb footprint when
    enabled.
    
    UNTESTED.  I might even have broken regular Arabic fallback shaping.

Benjamin Kerensa [:bkerensa]

Comment 13

•

10 years ago

Will track for now.

tracking-firefox33: ? → +

tracking-firefox34: ? → +

Behdad Esfahbod

Comment 14

•

10 years ago

Ok, I fixed that commit a bit.  Look here instead:

https://github.com/behdad/harfbuzz/commits/win1256

It's just the top commit you need.

Jonathan Kew [:jfkthame]

Comment 15

•

10 years ago

Thanks, confirmed that it now builds successfully under MSVC.

As for the behavior... almost, but not quite! With the simple testcase from here[1], the text shapes correctly *except* that the khah in 'al-khat' (U+062E) appears as a final form, rather than medial.

Jonathan Kew [:jfkthame]

Comment 16

•

10 years ago

Looking at http://en.wikipedia.org/wiki/Arabic_alphabet#Table_of_basic_letters with the font forced to MS Sans Serif, I see that all three of JEEM, HAH and KHAH share the same problem: they display final where they should be medial, and they display isolate where they should be final.

Behdad Esfahbod

Comment 17

•

10 years ago

You are right!  My bad.  Fixing.

Behdad Esfahbod

Comment 18

•

10 years ago

Fix pushed:

https://github.com/behdad/harfbuzz/commit/946e610c7fe7ad23db0ea41c55f93c9ab0f7c10c

Also please test:

- Lam-Alef ligature, in final vs isolated forms.  You should get distinct glyphs.

- Diacritic marks:

  * I'm curious what happens with fallback positioning.  If the bitmap glyphs return a big box, positioning should fail, right?  We should detect that and skip positioning,

  * The Shadda followed by Fatha/Zamma/Kasra should ligate.

Jonathan Kew [:jfkthame]

Comment 19

•

10 years ago

(In reply to Behdad Esfahbod from comment #18)
> Fix pushed:
> 
> https://github.com/behdad/harfbuzz/commit/
> 946e610c7fe7ad23db0ea41c55f93c9ab0f7c10c

Great, that appears to have fixed those letters.

> Also please test:
> 
> - Lam-Alef ligature, in final vs isolated forms.  You should get distinct
> glyphs.

I'm getting the isolated form even when there's a preceding left-joining letter, so that doesn't look right.

The other letter where I'm noticing a problem is U+0629 TEH MARBUTA; I get the isolated letter where it ought to be final.

> 
> - Diacritic marks:
> 
>   * I'm curious what happens with fallback positioning.  If the bitmap
> glyphs return a big box, positioning should fail, right?  We should detect
> that and skip positioning,

No fallback positioning seems to be happening, which is fine; the diacritics all stay at a constant level, which AFAIK is how the font has always been expected to render. I'll attach a screenshot of text from quran.com with the font forced to MS Sans Serif/cp1256. There are some glitches where an unsupported diacritic falls back to a different font and interrupts the shaping, but aside from that, and the two issues mentioned above, I think it looks about as good as can be expected.

> 
>   * The Shadda followed by Fatha/Zamma/Kasra should ligate.

Yes, that looks OK. There are occurrences of all three in the screenshot I'll attach.

Jonathan Kew [:jfkthame]

Comment 20

•

10 years ago

Attached image screenshot to illustrate comment 19 — Details

Behdad Esfahbod

Comment 21

•

10 years ago

Fixed both.  please take another look.

Behdad Esfahbod

Comment 22

•

10 years ago

In your screenshot, in 2:16, the first word, third letter (from right), the Lam should be initial but is isolated.  I don't understand how that can happen.

Behdad Esfahbod

Comment 23

•

10 years ago

Is your Unicode Data complete?  Looks like U+0670 SUPERSCRIPT ALEF is breaking the shaping?

Behdad Esfahbod

Comment 24

•

10 years ago

Here's the sequence:

$ ./hb-unicode-decode ﻚﹷﺌﹻﻟﹶٰﻭﺃﹸ | ./hb-unicode-prettyname  
ALEF WITH HAMZA ABOVE + ARABIC DAMMA + WAW + LAM + ARABIC FATHA + SUPERSCRIPT ALEF + YEH WITH HAMZA ABOVE + ARABIC KASRA + KAF + ARABIC FATHA

I'm guessing that SUPERSCRIPT ALEF is coming from another font and breaks the shaping...

Jonathan Kew [:jfkthame]

Comment 25

•

10 years ago

Right, that'll be what's happening there. We don't currently implement context across font runs, so a character like U+0670 that falls back to a different font will cause a break in shaping.

Jonathan Kew [:jfkthame]

Comment 26

•

10 years ago

(In reply to Behdad Esfahbod from comment #21)
> Fixed both.  please take another look.

Yep, medial lam-alef is now working, and teh-marbuta is shaping correctly.

However, now that the mediLamLookup is hooked up, we're getting an extra isolated lam after any instance of medial lam that *isn't* followed by alef.

ISTM that we need to insert the LAM_MEDI_MARKER only in context when there is a following alef.

Behdad Esfahbod

Comment 27

•

10 years ago

Oh shoot.  Yeah, of course.  That's why I hadn't hooked up the lamMedi, I need to write a contextual lookup.  Ok, will do that now.

Behdad Esfahbod

Comment 28

•

10 years ago

Humm.  The LAM_MEDI_MARKER won't work if user disables rlig.  I know that's not supported. Still, I think it's better to add a contextual rlig feature that chooses the correct form based on the previous glyph.  I'll do that instead.

Behdad Esfahbod

Comment 29

•

10 years ago

Humm.  The solution was much easier: we control the order of lookups here, so doing rlig before medi/fina does the trick.  Please try again.

Jonathan Kew [:jfkthame]

Comment 30

•

10 years ago

Almost! You forgot to update the ligature rules to match the original (unshaped) lam/alef glyphs, but once I fixed that locally, it works.

Behdad Esfahbod

Comment 31

•

10 years ago

In master now!

Jonathan Kew [:jfkthame]

Updated

•

10 years ago

Depends on: 1036981

Jonathan Kew [:jfkthame]

Comment 32

•

10 years ago

The harfbuzz update that landed in bug 1036981 includes the fix for this issue.

Status: NEW → RESOLVED

Closed: 10 years ago

No longer depends on: 1006122

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

•

10 years ago

Assignee: nobody → birunthan

status-firefox34: affected → fixed

Target Milestone: --- → mozilla34

Sylvestre Ledru [:Sylvestre]

Comment 33

•

10 years ago

Jonathan, birunthan, what are the plan wrt 33? Are you planning to ask for an uplift of harfbuzz to see that bug fixed in 33? Thanks

Flags: needinfo?(jfkthame)

Flags: needinfo?(birunthan)

Jonathan Kew [:jfkthame]

Comment 34

•

10 years ago

(In reply to Sylvestre Ledru [:sylvestre] from comment #33)
> Jonathan, birunthan, what are the plan wrt 33? Are you planning to ask for
> an uplift of harfbuzz to see that bug fixed in 33? Thanks

My plan was to nominate bug 1036981 for uplift to aurora once it's had a few days to bake on central, in order to fix this issue. (I think that's preferable to trying to cherry-pick just this fix onto our existing older harfbuzz version, though that would be a possible alternative.)

Flags: needinfo?(jfkthame)

Biru [:poiru]

Assignee

Updated

•

10 years ago

Flags: needinfo?(birunthan)

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

•

10 years ago

status-firefox33: affected → fixed