The default bug view has changed. See this FAQ.

Incorrect hyphenation patterns used by CSS moz-hyphens in XHTML content (en-US patterns used, regardless of lang declaration)

RESOLVED FIXED in mozilla17

Status

()

Core
Layout: Text
P3
normal
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: testit, Assigned: dbaron)

Tracking

Trunk
mozilla17
Points:
---

Firefox Tracking Flags

(relnote-firefox -)

Details

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
Created attachment 574156 [details]
hyphenation_no_lang.xhtml

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0
Build ID: 20111104165243

Steps to reproduce:

I used -moz-hyphens in a xhtml file which has no language specified (no lang attribute used).


Actual results:

According to the specification (https://developer.mozilla.org/en/CSS/hyphens), a hyphenation rule should only be applied, if a language is specified using the "lang" attribute.

Which hyphenation rule is applied if no language is specified?
(Reporter)

Updated

5 years ago
Summary: CSS moz-hyphens → CSS moz-hyphens / no language set
None?
Component: General → Style System (CSS)
Product: Firefox → Core
QA Contact: general → style-system
(Reporter)

Comment 2

5 years ago
Sorry, my description was not clear. The actual result is, that a hyphenation rule is applied but it shouldn't.

Updated

5 years ago
Attachment #574156 - Attachment mime type: application/octet-stream → text/xml

Updated

5 years ago
Status: UNCONFIRMED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 702123
(Reporter)

Comment 4

5 years ago
This is not a duplicate of bug 702123. The topic of bug 702123 is, that a hyphenation rule is applied but shouldn't. The topic of this bug is that different hyphenation rules are applied in html and xhtml files.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
(Assignee)

Updated

5 years ago
Summary: CSS moz-hyphens / no language set → -moz-hyphens should not apply hyphenation rules when no language is specified
> According to the specification (https://developer.mozilla.org/en/CSS/hyphens)

That's not the specification.  That's documentation.  And it was just wrong.  I've fixed it.

What hyphenation dictionary is used depends on the encoding of the page and the declared language.

> The topic of this bug is that different hyphenation rules are applied in html and xhtml
> files.

The attached testcase is using UTF-8 as the encoding.  If I load it a UTF-8 HTML (as opposed to ISO-8859-1 HTML), it hyphenates exactly the same as the attached testcase.

So as far as I can tell the behavior is correct; over to layout:text to verify.
Component: Style System (CSS) → Layout: Text
QA Contact: style-system → layout.fonts-and-text
Summary: -moz-hyphens should not apply hyphenation rules when no language is specified → CSS moz-hyphens / no language set
I guess the difference between ISO-8859-1 and UTF-8 could be considered a problem.  Is that the concern?
Oh, and what the actual _spec_ at http://dev.w3.org/csswg/css3-text/#hyphens says is this:

  The UA is therefore only required to automatically hyphenate text for which the author
  has declared a language (e.g. via HTML lang or XML xml:lang) and for which it has an
  appropriate hyphenation resource. 

Note that the UA is allowed to hyphenate other texts too; it's just not required to.
It looks to me like we're not correctly detecting the declared language (or lack thereof) in XHTML documents, and always treating the text as being en-US. Even after adding a specific declaration such as xml:lang="af", I still see the en-US hyphenation dictionary being used.

Not sure yet whether this is a problem specifically in the hyphenation code, or a higher-level issue with xml:lang support.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 7 → All
Hardware: x86_64 → All
Summary: CSS moz-hyphens / no language set → Incorrect hyphenation patterns used by CSS moz-hyphens in XHTML content (en-US patterns used, regardless of lang declaration)
Version: 8 Branch → Trunk
Created attachment 574185 [details]
XHTML testcase for -moz-hyphens:auto that should _not_ apply English rules

In this testcase, the content is explicitly tagged with xml:lang="ar", so we should definitely *not* be applying English rules. (And as we don't ship any rules for Arabic, the net result should be no auto-hyphenation.)

However, it seems that when BuildTextRunsScanner::SetupBreakSinksForTextRun looks at GetStyleVisibility()->mLanguage to find the appropriate language, it's getting "en" despite the "ar" tag in the document.
(Assignee)

Comment 10

5 years ago
I think this is a duplicate of bug 234485.  I thought we'd implemented support for xml:lang, but I don't think we have.
(Assignee)

Comment 11

5 years ago
... except for the part where we should store whether the language information came from something explicit, and only do language-based hyphenation when it is.
(Assignee)

Comment 12

5 years ago
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/4e1148566cbf/explicit-language is an untested patch for the style system side of tracking whether the language is explicitly set.
Duplicate of this bug: 707637
(Assignee)

Comment 14

5 years ago
Created attachment 644757 [details] [diff] [review]
Track whether nsStyleVisibility::mLanguage came from explicit information in the document.  (, patch 1)
Attachment #644757 - Flags: review?(jfkthame)
(Assignee)

Comment 15

5 years ago
Created attachment 644758 [details] [diff] [review]
Only do hyphenation when the language was specified explicitly, rather than using an encoding-inferred language.  (, patch 2)
Attachment #644758 - Flags: review?(jfkthame)
(Assignee)

Updated

5 years ago
Attachment #644757 - Attachment description: Track whether nsStyleVisibility::mLanguage came from explicit information in the document. () UNTESTED → Track whether nsStyleVisibility::mLanguage came from explicit information in the document. (, patch 1)
Comment on attachment 644757 [details] [diff] [review]
Track whether nsStyleVisibility::mLanguage came from explicit information in the document.  (, patch 1)

Review of attachment 644757 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/nsStyleStruct.cpp
@@ +139,5 @@
>        language.FindChar(PRUnichar(',')) == kNotFound) {
>      mLanguage = do_GetAtom(language);
> +    // REVIEW: Should this count as explicit, or should only the lang
> +    // attribute count as explicit?
> +    mExplicitLanguage = true;

My inclination would be to treat only the lang attribute as explicit. That's often the only thing that an author will control (or even be aware of), I suspect. And given that this (for now, at least) is targeted at hyphenation, which in turn is author-controlled via CSS, it seems right to keep the language control also firmly and solely in the hands of the page author, without involving the server configuration.

In which case the initialization of mExplicitLanguage (always false) could be moved to the initializer lists for the non-copy constructors, instead of needing to be included in the Init() method.

::: layout/style/nsStyleStruct.h
@@ +113,5 @@
>    nscoord mScriptMinSize;        // [inherited] length
>    float   mScriptSizeMultiplier; // [inherited]
>    nsCOMPtr<nsIAtom> mLanguage;   // [inherited]
> +  // was mLanguage set based on something in the document?
> +  bool mExplicitLanguage;        // [inherited]

It looks to me like the struct will pack better if you insert this alongside the existing PR[U]Int8 fields, rather than appending it.

(The initializer in nsStyleStruct.cpp would then need to be re-ordered to match.)
Comment on attachment 644758 [details] [diff] [review]
Only do hyphenation when the language was specified explicitly, rather than using an encoding-inferred language.  (, patch 2)

Review of attachment 644758 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me (assuming it passes the tests, of course!)

If it's easy to do, maybe it'd be nice to split the variable-renaming stuff that doesn't involve any functional change (langGroup -> hyphenationLanguage, etc) into a separate patch, and then the actual behavior-changing patch would become much smaller. But not a big deal if you don't want to bother.
Attachment #644758 - Flags: review?(jfkthame) → review+
(Assignee)

Comment 18

5 years ago
Created attachment 652991 [details] [diff] [review]
Track whether nsStyleVisibility::mLanguage came from explicit information in the document.  (, patch 1)
Attachment #652991 - Flags: review?(jfkthame)
(Assignee)

Updated

5 years ago
Attachment #644757 - Attachment is obsolete: true
Attachment #644757 - Flags: review?(jfkthame)
Attachment #652991 - Flags: review?(jfkthame) → review+
(Assignee)

Comment 19

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/198f6c4784cb
https://hg.mozilla.org/integration/mozilla-inbound/rev/67f1eff2cc7c

I think the remainder of this bug is a duplicate of bug 234485.
Priority: -- → P3
(Assignee)

Updated

5 years ago
Assignee: nobody → dbaron
https://hg.mozilla.org/mozilla-central/rev/198f6c4784cb
https://hg.mozilla.org/mozilla-central/rev/67f1eff2cc7c
Status: NEW → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17

Updated

4 years ago
Depends on: 822554
(Assignee)

Comment 21

4 years ago
Created attachment 703339 [details] [diff] [review]
, patch 3:  Make the auto-hyphenation-{8,9,10} reftests not depend on picking the same fonts for different language or encoding values, to fix Android reftest failures from landing of bug 831354.
Attachment #703339 - Flags: review?(jfkthame)
Attachment #703339 - Flags: review?(jfkthame) → review+
(Assignee)

Comment 22

4 years ago
https://tbpl.mozilla.org/?tree=Try&rev=e3df072f8858
https://hg.mozilla.org/integration/mozilla-inbound/rev/47166178f558
https://hg.mozilla.org/mozilla-central/rev/47166178f558
relnote-firefox: --- → ?
Not rel-noting it as majority of the feature  landed in mozilla-17. The patch that recently landed seems to be reftest's hence not tracking for rel-noting
relnote-firefox: ? → -
You need to log in before you can comment on or make changes to this bug.