Closed Bug 1787442 Opened 3 years ago Closed 2 years ago

Hyphenate capitalized words in Finnish

Categories

(Core :: Layout: Text and Fonts, defect)

Firefox 105
defect

Tracking

()

RESOLVED FIXED
122 Branch
Tracking Status
firefox122 --- fixed

People

(Reporter: heikki, Assigned: jfkthame)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:105.0) Gecko/20100101 Firefox/105.0

Steps to reproduce:

There has been discussion around the CSS specifications about how hyphenation should work for capitalized words.

It was previously changed so that capitalized words are not hyphenated, because they can be names. There is a previous issue: https://bugzilla.mozilla.org/show_bug.cgi?id=956213.

But this causes problems in, for example, Finnish. Sentences always start with a capitalized word, so a long first word cannot be hyphenated and layouts break really easily.

There has been a similar problem in Chrome which has been fixed: https://github.com/chromium/chromium/commit/92a0834acb49360fe1e2bd212484ca4fef9fc2ab

There has been a lot of discussion in the matter in the csswg-drafts issue queue: https://github.com/w3c/csswg-drafts/issues/3927

The recommendation was to open an issue in the browser vendor issue queue.

Actual results:

When a sentence starts with a capitalized word, that word is not hyphenated even though hyphens: auto is set.

Expected results:

If a sentence starts with a capital letter, the first word should be hyphenated to avoid layouts breaking easily.

The Bugbug bot thinks this bug should belong to the 'Core::Layout: Text and Fonts' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Layout: Text and Fonts
Product: Firefox → Core

Thanks for the bug report. You mentioned that Chromium had fixed their version of the bug -- would you mind providing a small testcase that can be used to demonstrate the issue which works in Chromium but not in Firefox?

I tried to write an English and a Finnish testcase locally, but:

  • On the English testcase, Chrome matches Firefox (hyphenating lowercase words, vs. refusing-to-hyphenate uppercase words.)
  • On the Finnish testcase, Chrome refuses to hyphenate at all (regardless of upper/lowercase)

(It's entirely possible I'm doing something wrong; hoping you can clear it up with a testcase to help me get past poking around in the dark. :))

Flags: needinfo?(heikki)
Attached file testcase 1 (english)

Here's my English testcase, where Chromium and Firefox are currently in agreement (hyphenating lowercase "something", never hyphenating uppercase "Something")

Versions:
Chrome "dev channel", Version 106.0.5249.12 (Official Build) dev (64-bit)
Firefox Nightly 106.0a1 (2022-08-30) (64-bit)

Attached file testcase 2 (Finnish)

Here's a testcase just like testcase 1 except now in Finnish (lang="fi") and with the Finnish word "Viikonloppuisin" (which seems to translate to "Weekend").

On my system:

  • Firefox treats this the same as it treats English in testcase 1.
  • Chrome behaves differently than testcase 1; it refuses to hyphenate at all. I also tried copypasting a longer chunk of Finnish text into this testcase and Chrome refused to hyphenate that too.

So at first glance, it doesn't look to me like Chrome's behavior is more-correct or shows any sort of subtle fix here; rather, they're just not hyphenating Finnish at all. But as I said, it's quite possible I'm missing something; and if so, I'm curious to see what that might be. :)

Attached file testcase 3 (German)

Taking a closer look at your Chromium commit from comment 0, it looks like it was specific to German, and I do indeed see Chromium hyphenating German content regardless of capitalization.

However, Firefox (at least) seems to do that as well (hyphenating every word in this attachment). So I'm still not seeing a Firefox-vs-Chrome behavior difference where Chrome is doing more hyphenation.

As far as I can tell, Firefox and Chrome both match your expectation for German; and neither of us match your expectation for Finnish, but Firefox does at least some hyphenation there and Chrome seems to do none.

(In reply to Daniel Holbert [:dholbert] from comment #5)

Created attachment 9292354 [details]
testcase 3 (German)

Taking a closer look at your Chromium commit from comment 0, it looks like it was specific to German, and I do indeed see Chromium hyphenating German content regardless of capitalization.

If I'm reading it right, the Chromium commit changed their behavior from only hyphenating capitalized words in de to hyphenating them in all non-en locales.

  hyphenate_capitalized_word_ = !locale.StartsWithIgnoringASCIICase("en");
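
As a rough illustration of that reading, the before/after policy can be modelled in a few lines of Python. This is only a sketch, not Chromium code: the function names are hypothetical, and the "before" rule is assumed to be a prefix match on "de", since the thread only says the old behavior was limited to de.

```python
def hyphenate_capitalized_before(locale: str) -> bool:
    """Assumed old policy: only German locales hyphenate capitalized words."""
    return locale.lower().startswith("de")


def hyphenate_capitalized_after(locale: str) -> bool:
    """New policy, mirroring the quoted line: every locale except English."""
    return not locale.lower().startswith("en")


# Compare the two policies for a few locale tags.
for tag in ("en-US", "de-DE", "fi"):
    print(tag, hyphenate_capitalized_before(tag), hyphenate_capitalized_after(tag))
```

Under this model, a Finnish locale gains hyphenation of capitalized words only with the newer rule, while English is excluded in both cases.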

I wonder if the issue with your Finnish testcase is that you need a Finnish-localized build of Chrome to get fi hyphenation?

To change the behavior for Finnish in Firefox, try creating a boolean pref intl.hyphenate-capitalized.fi and setting it to true. Perhaps that should be set by default.
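
The per-language pref described here can be pictured as a simple lookup. The Python below is a hedged model, not Firefox's implementation: the helper name is hypothetical, a missing pref is assumed to behave like false, and the German pref name is inferred by analogy with the Finnish one.

```python
PREF_PREFIX = "intl.hyphenate-capitalized."


def hyphenates_capitalized(lang: str, prefs: dict) -> bool:
    """Return True if capitalized words may be hyphenated for `lang`."""
    # Assumption: a language with no pref set is treated as false.
    return prefs.get(PREF_PREFIX + lang, False)


# Assumed starting state: only German is enabled by default.
prefs = {PREF_PREFIX + "de": True}
before = hyphenates_capitalized("fi", prefs)

# Equivalent of creating the boolean pref and setting it to true.
prefs[PREF_PREFIX + "fi"] = True
after = hyphenates_capitalized("fi", prefs)

print(before, after)
```

The point of the model is that each language is switched on individually, so any language without its own entry keeps the old behavior.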

Here is an example test case that I created with Codepen: https://codepen.io/hypocriteh/pen/WNzVqVb

<article lang="fi">
<h2>Kansainvälisyystutkimus</h2>
<h2>kansainvälisyystutkimus</h2>
</article>

Really simple example case where Firefox 105.0b3 refuses to hyphenate the first word from the H2 because of the capital first letter. When I change the first letter to be lower case, it works correctly and hyphenates the word as requested.

Same example with Chrome hyphenates both words.

I attached two example screenshots:
https://i.ibb.co/1zjSCP2/chrome.jpg
https://i.ibb.co/qrYhy7d/firefox.jpg

Flags: needinfo?(heikki)

(In reply to Jonathan Kew [:jfkthame] (PTO) from comment #6)

To change the behavior for Finnish in Firefox, try creating a boolean pref intl.hyphenate-capitalized.fi and setting it to true. Perhaps that should be set by default.

I tested setting this to true in my browser, and after a restart hyphenation works as expected. So it does seem like this setting would fix the issue, and it should definitely be the default for the Finnish language.

(In reply to heikki from comment #7)

Here is an example test case that I created with Codepen: https://codepen.io/hypocriteh/pen/WNzVqVb
[...]
Really simple example case where Firefox 105.0b3 refuses to hyphenate the first word from the H2 because of the capital first letter. When I change the first letter to be lower case, it works correctly and hyphenates the word as requested.

Same example with Chrome hyphenates both words.

Interesting; for me, Chrome hyphenates neither word. So perhaps the Chrome behavior that you're seeing is only present in Finnish-localized Chrome builds, as jfkthame noted.

(In any case, it sounds like we've got a straightforward approach for a fix, via the about:config pref noted above. Assuming we do proceed with that, the new default would probably go here, right after the similar setting for German:
https://searchfox.org/mozilla-central/rev/4f4c8e0e84d5a728244f1e820dda14e5cdb81e71/modules/libpref/init/all.js#1887-1890 )

Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Hyphenate capitalized words if the language is not English → Hyphenate capitalized words in Finnish

That would fix the issue for Finnish, yes.

And I can also confirm that Chrome behaves differently depending on the browser's language. In Finland a lot of people install the browser in English and some in Finnish, and I have seen the behavior differ between the two.

I do still wonder about the difference between Chrome and Firefox. Chrome seems to treat all non-English languages the same way, whereas Firefox would require a default to be set for each language separately. I understand that a per-language setting is a nice, versatile option, but in the CSS Draft issue that prompted this discussion there were, for example, Norwegian and Danish users complaining about the same CSS feature. Each of those languages would then need its own Bugzilla issue and its own setting change.

I could of course add a comment to the CSS Draft issue pointing to this discussion, so that the people already involved there could open similar issues.

See Also: → 1706879
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

@jfkthame

I checked the attached patch, which has a comment:

// In German and Finnish, we allow hyphenation of capitalized words; otherwise not.

This is fine for Finnish but I would still point towards the original issue in CSSWG at https://github.com/w3c/csswg-drafts/issues/3927. The issue is about most of the languages requiring hyphenation for long words to prevent layouts from breaking. There are a lot of examples in the issue pointing towards Norwegian, Danish and Swedish languages also: https://github.com/w3c/csswg-drafts/issues/3927#issuecomment-1105116651.

For English hyphenation may be a luxury, but for many languages with longer words (Danish, Norwegian, Swedish, Finnish, German and lots more) hyphenation is an absolute necessity for proper text layout, especially on mobile where lines are quite short. You just broke text layout for a large number of languages.

Based on that issue it seems like having the setting as true by default for more languages would make sense.
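
To make the gap concrete, here is a small Python sketch listing which of the languages named above would still lack a default under a German-plus-Finnish allowlist. The ISO 639-1 codes and the helper name are assumptions made for illustration; the allowlist reflects only the patch comment quoted earlier.

```python
# Languages named in the quoted CSSWG comment, keyed by assumed ISO 639-1 code.
LANGUAGES = {
    "da": "Danish",
    "nb": "Norwegian",
    "sv": "Swedish",
    "fi": "Finnish",
    "de": "German",
}

# Languages covered by the patch comment: German and Finnish.
ENABLED_BY_DEFAULT = {"de", "fi"}


def still_needs_pref(languages, enabled):
    """Return the codes that would not hyphenate capitalized words by default."""
    return sorted(code for code in languages if code not in enabled)


missing = still_needs_pref(LANGUAGES, ENABLED_BY_DEFAULT)
print([LANGUAGES[code] for code in missing])
```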

Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/146ee20a5ad8 Enable hyphenation of capitalized words by default in Finnish. r=layout-reviewers,emilio
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 122 Branch