Bug 694205 - lang attribute not passed to font layout/rendering layer
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Layout: Text (show other bugs)
Version: 7 Branch
Platform: x86 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Duplicates: 741093
Depends on: 703100
Reported: 2011-10-12 17:11 PDT by Steve White
Modified: 2012-09-22 13:15 PDT (History)


Attachments
correct rendering by pango-view (18.66 KB, image/png)
2011-10-12 17:11 PDT, Steve White
pango-view output with DejaVu (834.91 KB, image/png)
2011-10-13 10:06 PDT, Steve White
HTML test file using DejaVu (626 bytes, text/html)
2011-10-13 10:08 PDT, Steve White
Alessandro's test showing Serbian (on my system at least) (2.45 KB, text/html)
2011-10-14 18:50 PDT, Steve White
Hebrew mark placement example (564 bytes, text/html)
2012-02-01 06:45 PST, Steve White
working example (33.53 KB, image/png)
2012-02-01 09:43 PST, Steve White
working example-Hebrew marks (50.89 KB, image/png)
2012-02-01 09:46 PST, Steve White

Description Steve White 2011-10-12 17:11:09 PDT
Created attachment 566692 [details]
correct rendering by pango-view

User Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
Build ID: 20110928224103

Steps to reproduce:

Have a font with a 'locl' lookup for Cyrillic text that replaces a few Russian letterforms with their Serbian/Macedonian variants.

Tested it with pango-view (see attachment), using input like this:
<span lang="ru"> б г д п т </span> <span lang="ru"> <i>б г д п т</i> </span>
<span lang="sr"> б г д п т </span> <span lang="sr"> <i>б г д п т</i> </span>

(Note very few fonts make this distinction.)

Wrote an HTML file showing the affected characters inside <span> tags with lang="sr" and lang="ru", and viewed it in Firefox.
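
A minimal sketch of such a test page, generated from Python. This is not the file attached to this bug: the function name `build_test_page`, the page layout, and the choice of "DejaVu Serif" as the default font are assumptions made for illustration.

```python
# Hypothetical generator for a minimal test page (not the attached file).
SAMPLE = "б г д п т"  # the letters used in the pango-view input above


def build_test_page(font_family="DejaVu Serif"):
    """Return HTML showing the same letters tagged as Russian and as Serbian."""
    rows = []
    for lang in ("ru", "sr"):
        # One upright and one italic run per language, as in the pango-view input.
        rows.append(
            f'<p><span lang="{lang}">{SAMPLE}</span> '
            f'<span lang="{lang}"><i>{SAMPLE}</i></span></p>'
        )
    body = "\n".join(rows)
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f"<style>body {{ font-family: '{font_family}'; font-size: 32px; }}</style>\n"
        f"</head>\n<body>\n{body}\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(build_test_page())
```

If the lang attribute reaches the shaping layer, the two paragraphs should render differently in a font that has the 'locl' lookup.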


Actual results:

The text is displayed, and I can see that the correct font is being used.
(I cleared all the caches and restarted Firefox after installing the font, so I have reason to be confident that the font is the correct one.)
However, no distinction was visible between the lang="sr" text and the lang="ru" text.

This would be consistent with the browser not passing the lang attribute properly to the underlying font rendering software (Pango in this case, I think).


Expected results:

The text from the two languages should have appeared quite different, as in the attached image.
Comment 1 :aceman 2011-10-13 02:38:02 PDT
Does this work in any other browser?
Comment 2 Jonathan Kew (:jfkthame) 2011-10-13 02:44:37 PDT
I thought we'd fixed this (see bug 24139). Please attach your complete (minimal) testcase, and specify the exact font version being used so that we can look into it more closely.
Comment 3 Steve White 2011-10-13 10:06:05 PDT
Created attachment 566872 [details]
pango-view output with DejaVu

pango-view --font "DejaVu Serif Italic 32" --markup Serbian-pango.html

where Serbian-pango.html contains
<span lang="ru"> б г д п т ѓ</span> <span lang="ru"> <i>б г д п т ѓ</i> </span>
<span lang="sr"> б г д п т ѓ</span> <span lang="sr"> <i>б г д п т ѓ</i> </span>
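
The reproduction in this comment can be scripted. The sketch below writes the markup file and builds the same pango-view command line; only the `--font` and `--markup` options shown above are used, and the helper names `write_markup` and `pango_view_command` are hypothetical.

```python
import shlex
import tempfile
from pathlib import Path

# The two lines of Pango markup quoted in this comment.
MARKUP = (
    '<span lang="ru"> б г д п т ѓ</span> <span lang="ru"> <i>б г д п т ѓ</i> </span>\n'
    '<span lang="sr"> б г д п т ѓ</span> <span lang="sr"> <i>б г д п т ѓ</i> </span>\n'
)


def write_markup(directory):
    """Write Serbian-pango.html into `directory` and return its path."""
    path = Path(directory) / "Serbian-pango.html"
    path.write_text(MARKUP, encoding="utf-8")
    return path


def pango_view_command(markup_path, font="DejaVu Serif Italic 32"):
    """Build the argument list for the pango-view invocation shown above."""
    return ["pango-view", "--font", font, "--markup", str(markup_path)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmd = pango_view_command(write_markup(tmp))
        # Print rather than execute, so the sketch has no side effects.
        print(shlex.join(cmd))
```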
Comment 4 Steve White 2011-10-13 10:08:24 PDT
Created attachment 566873 [details]
HTML test file using DejaVu
Comment 5 Steve White 2011-10-13 10:10:14 PDT
Does it work in other browsers?  Not that I can see, in Linux anyway.

Curious.

And OpenOffice etc... Just never mind.  They are pathetically broken when it comes to font features turned on by locale.
Comment 6 Steve White 2011-10-13 15:15:48 PDT
The sample is more interesting with

pango-view --font "DejaVu Serif 32" --markup Serbian-pango.html
Comment 7 Alessandro Ceschini 2011-10-14 10:51:13 PDT
I'm experiencing the same problem. Even this official test page doesn't work with my Firefox 7.0.1 on Ubuntu 11.04. Still, I can see Serbian glyphs on the Serbian Wikipedia, but only there. Why?
Comment 8 Alessandro Ceschini 2011-10-14 10:51:55 PDT
Official test page: http://people.mozilla.org/~jdaggett/webfonts/serbianglyphs.html
Comment 9 Steve White 2011-10-14 16:15:46 PDT
Yes, I looked at some Serbian pages, such as the Serbian Wikipedia page on Macedonian orthography (look up Macedonian orthography, then click on the language "Српски/Srpski").

In some places I see the Serbian "be" form; in others, the Russian.

For those not sensitive to Cyrillic: we're mostly looking at the letter "be", which looks rather like a Greek beta.  The Russian form has a tail that begins on the left side and has a pronounced upward flourish.  The Serbian form starts in the top middle of the circle, and ends more horizontally.  

In the DejaVu fonts, the distinction is made by a 'locl' substitution lookup for Serbian and Macedonian, which should be triggered by lang="sr".
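
As a rough model of what is expected here: the lang attribute selects an OpenType language system, and the 'locl' feature for that language system swaps in the localized glyphs. The language system tags below are the registered OpenType ones; the lookup table and glyph names are invented for illustration and do not reflect DejaVu's actual tables.

```python
# Registered OpenType language system tags (padded to four characters).
LANG_TO_OT_TAG = {"ru": "RUS ", "sr": "SRB ", "mk": "MKD "}

# Hypothetical 'locl' substitutions per language system.
# The glyph names are made up for this sketch.
LOCL = {
    "SRB ": {"be": "be.serbian"},
    "MKD ": {"be": "be.serbian"},
}


def shape(glyphs, lang):
    """Apply the 'locl' substitution for `lang`, if the font defines one."""
    tag = LANG_TO_OT_TAG.get(lang)
    substitutions = LOCL.get(tag, {})
    return [substitutions.get(glyph, glyph) for glyph in glyphs]


if __name__ == "__main__":
    print(shape(["be", "ve"], "ru"))
    print(shape(["be", "ve"], "sr"))
```

If the language never reaches this step, every run is shaped as in the `"ru"` case, which matches the symptom reported in this bug.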

Trying to determine why sometimes Serbian 'be' is appearing in that page, I was changing things all over the small example I attached.  

I am seeing very chaotic behavior here.  Things that should not affect the result do, and things that should select the language fail to.  Really crazy.  Something's buggy.  But I haven't yet determined what triggers it.

It isn't CSS though (at least as set in the page).
Comment 10 Steve White 2011-10-14 18:50:29 PDT
Created attachment 567240 [details]
Alessandro's test showing Serbian (on my system at least)
Comment 11 Steve White 2011-10-14 18:51:09 PDT
Hi,
More experimentation, using Alessandro's example page.

Putting lang="sr" in the <html> tag has a strong influence.  Of course, unless otherwise specified, *all* the Cyrillic text should then be rendered as Serbian.  That should not be necessary, though: putting the attribute on any element should set the default language for that element and its children.

It is clear at least that the lang attribute is not being correctly dealt with in the containment hierarchy.
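
The inheritance rule described above (an element without its own lang takes the language of its nearest ancestor that has one) can be sketched with the standard library's HTML parser. The class and function names are hypothetical, and the sketch assumes every start tag has a matching end tag, so unclosed void elements such as a bare `<br>` are not handled.

```python
from html.parser import HTMLParser


class LangResolver(HTMLParser):
    """Record (text, effective lang) pairs for each non-blank text node."""

    def __init__(self):
        super().__init__()
        self.stack = [None]  # effective language for each open element
        self.runs = []

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        # Inherit the parent's language unless this element sets its own.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, self.stack[-1]))


def effective_langs(html):
    """Return the list of (text, language) runs found in `html`."""
    parser = LangResolver()
    parser.feed(html)
    parser.close()
    return parser.runs
```

For a page with lang="sr" on <html> and lang="ru" on one paragraph, only the text inside that paragraph should resolve to Russian; everything else should resolve to Serbian.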

However, I think it's worse than that.  I seem to be seeing things change sometimes with just a second screen refresh, and in the other test file, simply adding text somewhere could change the apparent language in an unrelated element.

At this moment, I have Serbian Cyrillic showing in Alessandro's example.  See attached.
Comment 12 Jonathan Kew (:jfkthame) 2011-10-14 23:25:16 PDT
I'm confirming this bug, as it's clear there's a problem somewhere that needs investigation and fixing - my hunch is that it may turn out to be a problem with the application of OpenType features, rather than the handling of the 'lang' attribute, but that's only speculation until we track this down.

The erratic behavior described in comment 11 suggests there may be an uninitialized value somewhere that's "randomly" affecting whether the feature gets applied correctly.
Comment 13 Steve White 2012-01-31 09:46:35 PST
Hi,

Just to rattle your cage:  I'm seeing this again in FF 9.0.1 with a font that makes a distinction between Yiddish and Hebrew vowel marks. 

The marks are properly positioned by XeTeX using this font.  Also pango-view has no problem.

Firefox behaves erratically.  If two table cells contain Hebrew, but one has 'lang' attribute "he" and the other "yi", it will position marks according to whichever language came *first*.  Weirder, simply reversing them in the file and re-loading isn't enough to make it forget.  I think I've gone so far as to delete the cache and re-start, to get it to see that I've changed the 'lang' attributes.

It's as though it associates a language with a script, and then won't let go of the association.
Comment 14 Simon Montagu :smontagu 2012-01-31 13:02:37 PST
(In reply to Steve White from comment #13)
> Firefox behaves erratically.  If two table cells contain Hebrew, but one has
> 'lang' attribute "he" and the other "yi", it will position marks according
> to whichever language came *first*.

Do the two table cells contain the same characters? I ask because this reminds me of bug 386339 comment 1. The text run word cache has probably changed a lot since then so it may be completely irrelevant, but fonts with "locl" are certainly an example of what I asked there in bug 386339 comment 6 about the same sequence of Unicode code points not being rendered with the same glyphs.
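
To illustrate the kind of caching problem being asked about, here is a toy model, not Gecko's actual word cache: if shaped results are cached by text alone, the first language seen "wins" for every later run with the same characters. All class names are hypothetical.

```python
class NaiveShaper:
    """Stand-in shaper that records which language a run was shaped with."""

    def shape(self, text, lang):
        return f"{text}@{lang}"


class WordCache:
    """Toy cache of shaped words, keyed with or without the language."""

    def __init__(self, shaper, key_includes_lang):
        self.shaper = shaper
        self.key_includes_lang = key_includes_lang
        self.entries = {}

    def lookup(self, text, lang):
        key = (text, lang) if self.key_includes_lang else text
        if key not in self.entries:
            self.entries[key] = self.shaper.shape(text, lang)
        return self.entries[key]


if __name__ == "__main__":
    word = "\u05d9\u05d9"  # yod yod
    buggy = WordCache(NaiveShaper(), key_includes_lang=False)
    print(buggy.lookup(word, "he"), buggy.lookup(word, "yi"))
    fixed = WordCache(NaiveShaper(), key_includes_lang=True)
    print(fixed.lookup(word, "he"), fixed.lookup(word, "yi"))
```

With the language left out of the key, the second lookup returns the result shaped for the first language, which would look like the "whichever language came first" behavior described in comment 13. Whether this is what actually happened in Firefox is not established in this bug.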
Comment 15 Steve White 2012-02-01 06:45:33 PST
Created attachment 593428 [details]
Hebrew mark placement example

This requires a font which makes a distinction between the placement in Yiddish and Hebrew of vowel marks under the yod and yodyod consonants.
Comment 16 Steve White 2012-02-01 06:53:08 PST
Hi, I tried altering the text in various ways.  It seems to have no effect.
Find the example attached.

1) This needs a special font that makes a distinction between the languages.
   The current SVN version of GNU FreeSans is the example I'm using.
   http://web.cvs.savannah.gnu.org/viewvc/?root=freefont
   You can build this with FontForge.  I could also send you a snapshot if you like.

2) A correspondent has told me that a similar sample works with that version 
   of FreeSans under Mac OS with the latest Firefox 10.  
   I have tried it under Windows with Firefox 10 and it fails.  
   I've asked him for the exact HTML file he's using.
Comment 17 Steve White 2012-02-01 08:12:10 PST
He has verified that the same "yiddish" HTML file works with Firefox 10 on Mac OS.
Again, it does not work for me with Firefox 10 on Windows 7.
Comment 18 Simon Montagu :smontagu 2012-02-01 08:29:26 PST
Steve, can you retest with a nightly? I see the Yiddish/Hebrew issue on Linux, but only with builds before 2012-01-07, so I think it was fixed by bug 703100.
Comment 19 Jonathan Kew (:jfkthame) 2012-02-01 09:03:45 PST
On Windows, I think you'll need to set gfx.font_rendering.harfbuzz.scripts to 7 (instead of the default 3) so that Hebrew script is rendered using harfbuzz in order to get the proper result here.
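
One way to read the two values, assuming the preference is a bitmask: 7 is 3 with one extra bit (value 4) set, and that extra bit is what turns on harfbuzz for Hebrew. The constant name below is hypothetical and the meaning of the bit is inferred only from this comment.

```python
# Assumption inferred from the 3 -> 7 change: bit value 4 enables Hebrew.
HEBREW_BIT = 4


def enables(pref_value, bit):
    """Return True if `bit` is set in the bitmask `pref_value`."""
    return (pref_value & bit) != 0


if __name__ == "__main__":
    for value in (3, 7):
        print(value, enables(value, HEBREW_BIT))
```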
Comment 20 Steve White 2012-02-01 09:43:43 PST
Created attachment 593493 [details]
working example
Comment 21 Steve White 2012-02-01 09:46:44 PST
Created attachment 593496 [details]
working example-Hebrew marks
Comment 22 Steve White 2012-02-01 09:50:15 PST
That's it.

Nightly 12.0a1 (2012-01-31) seems to have the problem solved.

I attached shots of the Russian/Serbian Cyrillic distinction as well as Hebrew/Yiddish.  These are using the SVN versions of GNU FreeFont's FreeSans and FreeSerif.

Thanks guys!
Comment 23 Gordon P. Hemsley [:GPHemsley] 2012-09-22 13:15:22 PDT
*** Bug 741093 has been marked as a duplicate of this bug. ***
