Closed
Bug 1045139
Opened 10 years ago
Closed 10 years ago
The Arabic text with "MS Sans Serif" font is rendered bad
Categories
(Core :: Graphics: Text, defect)
Tracking
()
RESOLVED
FIXED
mozilla34
Tracking | Status | |
---|---|---|
firefox32 | --- | unaffected |
firefox33 | + | fixed |
firefox34 | + | fixed |
People
(Reporter: over68, Assigned: poiru)
References
Details
Attachments
(1 file)
12.18 KB,
image/png
|
Details |
The Arabic text does not appear correctly with MS Sans Serif font and HWA disabled. Testcase https://dl.dropboxusercontent.com/u/95157096/85f61cf7/oj13vf3ek.html This problem occurs since the version 33.0a1 (2014-06-14). https://hg.mozilla.org/mozilla-central/rev/86aa28ce309e Screenshot https://dl.dropboxusercontent.com/u/95157096/85f61cf7/71e5hi0ac.png
Comment 1•10 years ago
|
||
[Tracking Requested - why for this release]: Arabic-script sites that explicitly call for the MS Sans Serif font will show broken text on GDI-based (non-HW-acc) Windows systems. This is clearly a regression from bug 985220. MS Sans Serif is an old bitmap font, so obviously it doesn't have OpenType tables; Uniscribe must have special-cased it one way or another. We should figure out some way to work around this, perhaps by forcing it to fall back to another font (which must be happening on all other platforms, including Windows+DWrite anyway).
status-firefox32:
--- → unaffected
status-firefox33:
--- → affected
status-firefox34:
--- → affected
tracking-firefox33:
--- → ?
tracking-firefox34:
--- → ?
Updated•10 years ago
|
Comment 2•10 years ago
|
||
One way to address this might be to implement bug 1006122 on Windows, as that should mask out the Arabic range in MS Sans Serif.
Comment 3•10 years ago
|
||
Jonathan, why is the fallback arabic shaping in HarfBuzz not working here?
Comment 4•10 years ago
|
||
(In reply to Behdad Esfahbod from comment #3) > Jonathan, why is the fallback arabic shaping in HarfBuzz not working here? I wondered that too, and am intending to take a look but haven't had time yet. I wouldn't be surprised if it turns out the font does not support the presentation-form codepoints, but relied on some other legacy hack for "shaping". But that's pure speculation, until I spend a bit of time in Windows to poke at it...
Comment 5•10 years ago
|
||
Ah, it's Windows-1255 encoding!
Comment 6•10 years ago
|
||
Well, Windows-1256 I mean.
Comment 7•10 years ago
|
||
So, what implementation do you use for get_glyph() callback with this font?
Comment 8•10 years ago
|
||
We should be using gfxGDIFont::GetGlyph(), found at http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxGDIFont.cpp#422, which is based on Uniscribe's ScriptGetCMap() API. So I need to check what that's actually doing.
Screenshot of page affected by this bug https://dl.dropboxusercontent.com/u/95157096/85f61cf7/jdgp9tl97.png
Comment 10•10 years ago
|
||
Thanks Jonathan. I suppose it's returning a custom cmap that maps Arabic chars. Looks like MS is using a custom glyph encoding for Windows-1256 shaping. It's using 0x05..0x1F etc for shaped glyphs. Technically, we can detect this in the arabic fallback shaper code and synthesize a GSUB table for it. In fact, the GSUB table for it is fixed. So we just need to do the mapping once and include the binary. I can write that patch if you are interested in it.
Comment 11•10 years ago
|
||
(In reply to Behdad Esfahbod from comment #10) > Thanks Jonathan. I suppose it's returning a custom cmap that maps Arabic > chars. Looks like MS is using a custom glyph encoding for Windows-1256 > shaping. It's using 0x05..0x1F etc for shaped glyphs. > > Technically, we can detect this in the arabic fallback shaper code and > synthesize a GSUB table for it. In fact, the GSUB table for it is fixed. > So we just need to do the mapping once and include the binary. > > I can write that patch if you are interested in it. Sounds good, thanks. (Well, kinda. I hate to carry extra code just to support obsolete stuff like this. But there are enough people on affected Windows systems that we should do it, I guess.)
Comment 12•10 years ago
|
||
Ok, here's a patch: https://github.com/behdad/harfbuzz/commit/479ae6855d5611f9d05441b320e4db57d3502f4a There's a tiny commit before it that you need also. From the commit log: commit 479ae6855d5611f9d05441b320e4db57d3502f4a Author: Behdad Esfahbod <behdad@behdad.org> Date: Wed Jul 30 02:15:44 2014 -0400 [arabic] Implement Windows-1256 private shaping Bug 1045139 - The Arabic text with "MS Sans Serif" font is rendered bad https://bugzilla.mozilla.org/show_bug.cgi?id=1045139 This is only enabled on Windows platforms, and requires support from Uniscribe to work. But for clients that do hook up to Uniscribe, this fixes shaping of Windows-1256-encoded bitmap fonts like "MS Sans Serif". The code and table together have just less than a 1kb footprint when enabled. UNTESTED. I might even have broken regular Arabic fallback shaping.
Comment 14•10 years ago
|
||
Ok, I fixed that commit a bit. Look here instead: https://github.com/behdad/harfbuzz/commits/win1256 It's just the top commit you need.
Comment 15•10 years ago
|
||
Thanks, confirmed that it now builds successfully under MSVC. As for the behavior... almost, but not quite! With the simple testcase from here[1], the text shapes correctly *except* that the khah in 'al-khat' (U+062E) appears as a final form, rather than medial.
Comment 16•10 years ago
|
||
Looking at http://en.wikipedia.org/wiki/Arabic_alphabet#Table_of_basic_letters with the font forced to MS Sans Serif, I see that all three of JEEM, HAH and KHAH share the same problem: they display final where they should be medial, and they display isolate where they should be final.
Comment 17•10 years ago
|
||
You are right! My bad. Fixing.
Comment 18•10 years ago
|
||
Fix pushed: https://github.com/behdad/harfbuzz/commit/946e610c7fe7ad23db0ea41c55f93c9ab0f7c10c Also please test: - Lam-Alef ligature, in final vs isolated forms. You should get distinct glyphs. - Diacritic marks: * I'm curious what happens with fallback positioning. If the bitmap glyphs return a big box, positioning should fail, right? We should detect that and skip positioning, * The Shadda followed by Fatha/Zamma/Kasra should ligate.
Comment 19•10 years ago
|
||
(In reply to Behdad Esfahbod from comment #18) > Fix pushed: > > https://github.com/behdad/harfbuzz/commit/ > 946e610c7fe7ad23db0ea41c55f93c9ab0f7c10c Great, that appears to have fixed those letters. > Also please test: > > - Lam-Alef ligature, in final vs isolated forms. You should get distinct > glyphs. I'm getting the isolated form even when there's a preceding left-joining letter, so that doesn't look right. The other letter where I'm noticing a problem is U+0629 TEH MARBUTA; I get the isolated letter where it ought to be final. > > - Diacritic marks: > > * I'm curious what happens with fallback positioning. If the bitmap > glyphs return a big box, positioning should fail, right? We should detect > that and skip positioning, No fallback positioning seems to be happening, which is fine; the diacritics all stay at a constant level, which AFAIK is how the font has always been expected to render. I'll attach a screenshot of text from quran.com with the font forced to MS Sans Serif/cp1256. There are some glitches where an unsupported diacritic falls back to a different font and interrupts the shaping, but aside from that, and the two issues mentioned above, I think it looks about as good as can be expected. > > * The Shadda followed by Fatha/Zamma/Kasra should ligate. Yes, that looks OK. There are occurrences of all three in the screenshot I'll attach.
Comment 20•10 years ago
|
||
Comment 21•10 years ago
|
||
Fixed both. please take another look.
Comment 22•10 years ago
|
||
In your screenshot, in 2:16, the first word, third letter (from right), the Lam should be initial but is isolated. I don't understand how that can happen.
Comment 23•10 years ago
|
||
Is your Unicode Data complete? Looks like U+0670 SUPERSCRIPT ALEF is breaking the shaping?
Comment 24•10 years ago
|
||
Here's the sequence: $ ./hb-unicode-decode ﻚﹷﺌﹻﻟﹶٰﻭﺃﹸ | ./hb-unicode-prettyname ALEF WITH HAMZA ABOVE + ARABIC DAMMA + WAW + LAM + ARABIC FATHA + SUPERSCRIPT ALEF + YEH WITH HAMZA ABOVE + ARABIC KASRA + KAF + ARABIC FATHA I'm guessing that SUPERSCRIPT ALEF is coming from another font and breaks the shaping...
Comment 25•10 years ago
|
||
Right, that'll be what's happening there. We don't currently implement context across font runs, so a character like U+0670 that falls back to a different font will cause a break in shaping.
Comment 26•10 years ago
|
||
(In reply to Behdad Esfahbod from comment #21) > Fixed both. please take another look. Yep, medial lam-alef is now working, and teh-marbuta is shaping correctly. However, now that the mediLamLookup is hooked up, we're getting an extra isolated lam after any instance of medial lam that *isn't* followed by alef. ISTM that we need to insert the LAM_MEDI_MARKER only in context when there is a following alef.
Comment 27•10 years ago
|
||
Oh shoot. Yeah, of course. That's why I hadn't hooked up the lamMedi, I need to write a contextual lookup. Ok, will do that now.
Comment 28•10 years ago
|
||
Humm. The LAM_MEDI_MARKER won't work if user disables rlig. I know that's not supported. Still, I think it's better to add a contextual rlig feature that chooses the correct form based on the previous glyph. I'll do that instead.
Comment 29•10 years ago
|
||
Humm. The solution was much easier: we control the order of lookups here, so doing rlig before medi/fina does the trick. Please try again.
Comment 30•10 years ago
|
||
Almost! You forgot to update the ligature rules to match the original (unshaped) lam/alef glyphs, but once I fixed that locally, it works.
Comment 31•10 years ago
|
||
In master now!
Comment 32•10 years ago
|
||
The harfbuzz update that landed in bug 1036981 includes the fix for this issue.
Updated•10 years ago
|
Comment 33•10 years ago
|
||
Jonathan, birunthan, what are the plan wrt 33? Are you planning to ask for an uplift of harfbuzz to see that bug fixed in 33? Thanks
Flags: needinfo?(jfkthame)
Flags: needinfo?(birunthan)
Comment 34•10 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #33) > Jonathan, birunthan, what are the plan wrt 33? Are you planning to ask for > an uplift of harfbuzz to see that bug fixed in 33? Thanks My plan was to nominate bug 1036981 for uplift to aurora once it's had a few days to bake on central, in order to fix this issue. (I think that's preferable to trying to cherry-pick just this fix onto our existing older harfbuzz version, though that would be a possible alternative.)
Flags: needinfo?(jfkthame)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(birunthan)
Updated•10 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•