Closed Bug 1108616 Opened 9 years ago Closed 9 years ago

Incorrect Letter Being Displayed (Turkish İ instead of English I in text-transform: uppercase)

Categories

(Core :: Internationalization, defect)

34 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
firefox34 --- wontfix
firefox35 + fixed
firefox36 + fixed
firefox37 + fixed

People

(Reporter: rocket_lsr, Assigned: jfkthame)

References

(Depends on 1 open bug)

Details

(Keywords: regression)

Attachments

(7 files, 2 obsolete files)

1.16 MB, image/png
Details
362 bytes, text/html
Details
8.62 KB, image/png
Details
1.04 MB, image/png
Details
1.34 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
28.60 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
4.64 KB, patch
jtd
: review+
Details | Diff | Splinter Review
Attached image bug.png
User Agent: Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0
Build ID: 20141125180439

Steps to reproduce:

My Firefox is in Turkish. After the latest Firefox update I have visited some websites like http://www.gog.com and noticed this small wrong letter bug.


Actual results:

Websites with Turkish Unicode font support displayed the letter İ instead of I. The attachment is an example for the bug.


Expected results:

The websites should have displayed English character I instead of İ (Turkish uppercase letter for "i" )

As a sidenote, this problem can be temporarily solved by changing the character encoding from Unicode to Western. However once I do that Unicode characters gets displayed incorrectly.
Component: Untriaged → Internationalization
Product: Firefox → Core
Summary: Incorrect Letter Being Displayed While Visiting Websites → Incorrect Letter Being Displayed (Turkish İ instead of English I in text-transform: uppercase)
I have a feeling this is a result of a page being incorrectly tagged as Turkish while displaying English text, but I haven't been able to reproduce.
There are two things that are bothering me, the first one is this problem has arisen with the latest update. There wasn't such thing before. And secondly I have asked different friends of mine to test and verify the issue and they said they had the same thing as well. I'm not sure if this helps, but all of us are Windows 7 x32 Ultimate users, the computer UI's are in Turkish, and installed Firefox in Turkish as well. I have tested this issue with more than one websites by the way. I think that the pages are somehow showing Turkish font instead English one if there font used has "İ" itself. Some fonts are shown correctly probably because not all the fonts have the Unicode Turkish letters embed in it.
This should demonstrate the difference between the transformations made to dotted and dotless i when tagged as English and Turkish.

Can you please verify whether this is rendering as you might expect?
Flags: needinfo?(rocket_lsr)
Um... I'm not sure what am I looking for here... But let me show the output I have recieved when I have pasted the attachment to the notepad. Filename is test.png.
Flags: needinfo?(rocket_lsr)
Attached image test.png
Comment on attachment 8539746 [details]
test.png

Well, pasting the output into Notepad is going to remove the styling and the language tags, but the browser screenshot suggests to me that everything is working as intended.
Attached image Comparison.png
Okay, this was some interesting thing. I thought that maybe font is the cause of this issue and now I have just discovered something different. I still think font is kind of responsible because if I change coding from unicode to western I no longer have any problems with it. However, this time unicode letters get scrambled. As a sidenote I'm still having the issue... And it is only with the capital letter. The screenshot I have provided has an example of the problem I have encountered. Incorrectly displayed words are circled red, they should have appeared like in the green circles. And for some reason one sentence is written with the same font but one part is displayed correct and the other part is incorrect, that section is circled blue.
I'm not entirely sure yet, but I think there may be a real Gecko bug here. I'm able to reproduce the issue on http://www.gog.com/ (as shown in attachment 8539783 [details]) on Mac OS X, too, by setting my primary system language to Turkish. (Note that setting Firefox's preferred content language to Turkish, but leaving the system language set to English, did NOT trigger the problem.)

The top-level navigation items on gog.com (Games, Movies, Community, etc) are styled with text-transform:uppercase, which in this case is applying the Turkish uppercase rules; but a cursory inspection of the page does not show any lang="tr" tagging. My recollection (but I haven't checked this) is that we're only supposed to use the language-specific casing rules for content that is explicitly language-tagged, but in this case it looks as though we're doing it by default based on the system locale. That may be a bug.

Note that the Turkish uppercasing behavior was implemented in bug 231162, which landed for Firefox 14. So that has been in the product (for Turkish-tagged content) for a long time. Given comment 0 ("After the latest Firefox update..."), it sounds like this may be a recent regression.
On OS X, the problem does not occur in Firefox 33.1.1. It does occur in 34.0.5.

--> Confirming this is a regression in Firefox 34.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 7 → All
Hardware: x86 → All
(In reply to Jonathan Kew (:jfkthame) from comment #8)
> any lang="tr" tagging. My recollection (but I haven't checked this) is that
> we're only supposed to use the language-specific casing rules for content
> that is explicitly language-tagged, but in this case it looks as though
> we're doing it by default based on the system locale. That may be a bug.

Currently the only thing that checks mExplicitLanguage is hyphenation:
  http://mxr.mozilla.org/mozilla-central/search?string=mexplicitlanguage
but I agree it would make sense to do so for language-specific casing.
https://hg.mozilla.org/mozilla-central/rev/f0ae82398d23 seems like one possibility, though I haven't tested.

(I don't immediately see how it would influence the codepath nsStyleFont::Init -> nsPresContext::GetLanguageFromCharset -> (getting data from) -> nsPresContext::UpdateCharSet -> nsLanguageAtomService::LookupCharSet -> mozilla::dom::EncodingUtils::LangGroupForEncoding, though, but if something could explain nsLanguageAtomService::LookupCharSet switching to return x-unicode that would explain the behavior.)
The regression occurs on nightly is triggered by https://hg.mozilla.org/mozilla-central/rev/99e040a2e972 (bug 1048050).

However, that's not really the root cause of the bug; it just exposes it, by making "tr-tr" (which we get from the system locale via nsPresContext::UpdateCharSet on a Turkish system) select Turkish casing, rather than strictly checking only for "tr".

To fix, we just need to test whether the language code in the StyleFont was explicit or not, like we do for hyphenation.
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Hmm, I think an additional patch may be needed to cover small-caps behavior... will check on this.
(In reply to Jonathan Kew (:jfkthame) from comment #14)
> Hmm, I think an additional patch may be needed to cover small-caps
> behavior... will check on this.

It looks like it is -- and it also looks like a little more work than the existing patch.
(In reply to David Baron [:dbaron] (UTC-5) (needinfo? for questions) from comment #15)
> (In reply to Jonathan Kew (:jfkthame) from comment #14)
> > Hmm, I think an additional patch may be needed to cover small-caps
> > behavior... will check on this.
> 
> It looks like it is -- and it also looks like a little more work than the
> existing patch.

Indeed. We'll need to pass the explicitLanguage flag down through the device context / font metrics machinery into the gfxFontStyle, so that we can check at shaping time whether to apply language-specific features or not.

(We can't just leave the existing gfxFontStyle::language field as null in the gfxFontStyle if the language wasn't explicit, because in addition to casing/shaping, it also contributes to font selection behavior -- and I think we want to maintain that even when it's an "implicit language".)
Something like this appears to fix all the cases I've tested, including small-caps (both real and synthetic).
Attachment #8539815 - Flags: review?(dbaron)
Comment on attachment 8539815 [details] [diff] [review]
part 2 - Add an explicitLanguage field to gfxFontStyle, pass it down through callers and use it to decide whether language-specific casing and other features should be used

Review of attachment 8539815 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsBlockFrame.cpp
@@ +3016,5 @@
>  
>    if (!aState.IsAdjacentWithTop() ||
>        aChildFrame->StyleBorder()->mBoxDecorationBreak ==
> +        NS_STYLE_BOX_DECORATION_BREAK_CLONE ||
> +      aChildFrame->GetWritingMode().IsOrthogonalTo(aState.mReflowState.GetWritingMode())) {

Oops, this isn't supposed to be here!
We should also check this in the graphite shaping back-end, as well as harfbuzz; updated patch accordingly.
Attachment #8539818 - Flags: review?(dbaron)
Attachment #8539815 - Attachment is obsolete: true
Attachment #8539815 - Flags: review?(dbaron)
Comment on attachment 8539806 [details] [diff] [review]
Language-specific text-transform casing behavior should only be used when content was explicitly tagged for language

Probably good to put styleContext->StyleFont() in a lcal variable.
r=dbaron
Attachment #8539806 - Flags: review?(dbaron) → review+
Comment on attachment 8539818 [details] [diff] [review]
part 2 - Add an explicitLanguage field to gfxFontStyle, pass it down through callers and use it to decide whether language-specific casing and other features should be used

It might have been nice to separate the patch for passing the boolean
around from the patch for behavior changes to the shapers and gfxFont.
(I'm not sure I should really be reviewing the latter.)  (In fact, the
passing-through could have been more than one patch.)

Maybe stick to "ExplicitLanguage" naming rather than mixing it with
"LangExplicit"?

>+        language = hb_language_get_default();

hb_language_get_default looks locale-sensitive if I'm reading
the code correctly.

Maybe you want HB_LANGUAGE_INVALID instead?  Or will that get converted
to the result of hb_language_get_default anyway?


In nsMathMLChar.cpp, maybe (twice) use a local variable for the
const nsStyleFont*.


In CanvasRenderingContext2D.h, you need to initialize fontLangExplicit
to false in the default constructor for ContextState.

(It might be worth running a few tests under valgrind to see if anything
else similar turns up.)


(Also, CanvasRenderingContext2D::SetFont should use the |styleFont|
variable where it sets |language|, though that's a nit on the existing
code.)

r=dbaron with that, although I suspect you should get somebody else
to look at the shaper changes as well
Attachment #8539818 - Flags: review?(dbaron) → review+
(In reply to David Baron [:dbaron] (UTC-5) (needinfo? for questions) from comment #21)

> >+        language = hb_language_get_default();
> 
> hb_language_get_default looks locale-sensitive if I'm reading
> the code correctly.

Hmm, so it does.

> Maybe you want HB_LANGUAGE_INVALID instead?  Or will that get converted
> to the result of hb_language_get_default anyway?

Looks like the right thing to use here is hb_ot_tag_to_language(HB_OT_TAG_DEFAULT_LANGUAGE).
Cleaned up as per comment 21, and split off the font shaping changes to a new part 3 for separate review; carrying over r=dbaron on part 2.
Attachment #8539818 - Attachment is obsolete: true
Attachment #8539835 - Flags: review+
[Tracking Requested - why for this release]:
This is a regression that results in incorrect text rendering on sites using text-transform:uppercase (showing "İ" in place of "I" in uppercased non-Turkish content) for users with their system locale set to Turkish. Similar errors could also affect sites using text-transform:lowercase and small-caps.

Minor rendering errors could also occur for users on other locales (Irish, Greek, Dutch) where we implement language-specific casing behavior.
Tracking, but depending on risk of taking this to Beta 8 (must be ready & nominated by Mon Dec 29th to be considered).
Comment on attachment 8539836 [details] [diff] [review]
part 3 - Only do language-specific shaping when the language was explicitly tagged

Also tagging :roc as a possible reviewer here, as I don't know how available people are over the holiday period ... I'll take a review from whichever of you can get to it first. :)
Attachment #8539836 - Flags: review?(roc)
Comment on attachment 8539836 [details] [diff] [review]
part 3 - Only do language-specific shaping when the language was explicitly tagged

Review of attachment 8539836 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the inclusion of a reftest for this.
Attachment #8539836 - Flags: review?(jdaggett) → review+
(In reply to John Daggett (:jtd) from comment #28)
> with the inclusion of a reftest for this.

I'm not sure how to do that. We already have tests for language-specific rendering with explicitly-tagged content. The issue here arises when running with the *system* locale set to Turkish, which I don't think we handle as part of our automated tests.

(And if we *did* currently run reftests on a Turkish system, the existing test of text-transform:uppercase would have failed.)
Attachment #8539836 - Flags: review?(roc)
Landed the patches here, so we can get this into Nightly and consider it for uplift. If there's some way we could add an automated test (may be tricky, see comment 29), we can do that as a followup.

https://hg.mozilla.org/integration/mozilla-inbound/rev/159bbb595ffb
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab1bcaf07693
https://hg.mozilla.org/integration/mozilla-inbound/rev/20cda36cb549
Target Milestone: --- → mozilla37
Comment on attachment 8539806 [details] [diff] [review]
Language-specific text-transform casing behavior should only be used when content was explicitly tagged for language

Approval Request Comment
[Feature/regressing bug #]: bug 1048050 (though really this just exposed a pre-existing bug, by making us accept locale codes including a region subtag, which we previously didn't recognize)

[User impact if declined]:
Users with their system locale set to Turkish (or Dutch, Irish, Greek) may see incorrect characters on sites that apply case transforms to content without an explicit language tag.

[Describe test coverage new/current, TBPL]:
This code is covered by existing reftests for explicitly lang-tagged content, and for untagged content on an en-US system. The fix here is specific to *other* system locales, which makes it hard to test in automation, but it was tested manually after changing system locale to verify that it behaves as expected.

[Risks and why]:
Appears low risk - this is just about being more conservative in where we apply our existing support for language-specific mappings.

[String/UUID change made/needed]:
none
Attachment #8539806 - Flags: approval-mozilla-beta?
Attachment #8539806 - Flags: approval-mozilla-aurora?
Attachment #8539835 - Flags: approval-mozilla-beta?
Attachment #8539835 - Flags: approval-mozilla-aurora?
Attachment #8539836 - Flags: approval-mozilla-beta?
Attachment #8539836 - Flags: approval-mozilla-aurora?
Attachment #8539806 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8539835 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8539836 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8539806 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8539835 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8539836 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Should we have landed a test for this?
Flags: needinfo?(jfkthame)
Flags: in-testsuite?
(In reply to [Away 24-Dec to 4-Jan] Ryan VanderMeulen [:RyanVM UTC-5] from comment #35)
> Should we have landed a test for this?

At this point, we don't have a usable test that will work in automation. The bug here relates to behavior when the OS locale is set to Turkish, which isn't an environment we run in testing. (Existing tests cover this code when running on an en-US system.)
Flags: needinfo?(jfkthame)
The issue has been fixed with the latest Firefox update! I'd like to thank everyone who worked on a fix to my submitted ticket. THANK YOU SO MUCH! Keep up the good work and have a nice day! 

I think the thread is locked from now on, if not can you lock the thread for me since the issue is now fixed?
Depends on: 1473171
You need to log in before you can comment on or make changes to this bug.