Hyphenation for Czech and Slovak languages
Categories: Core :: Layout: Text and Fonts, enhancement
Tracking:
Release | Tracking | Status |
---|---|---|
firefox130 | --- | fixed |
People: Reporter: ondrej.sojka, Assigned: jfkthame
Keywords: dev-doc-complete
Attachments: 1 file
Steps to reproduce:
See https://bugzilla.mozilla.org/show_bug.cgi?id=1705881 for repro.
Actual results:
Hyphenation is not working for Czech and Slovak.
This was previously deliberate because of the license on the old patterns, but there are new Czecho-Slovak hyphenation patterns that hyphenate both languages really well.
Expected results:
The new patterns should be included and used by default for both Czech and Slovak languages as they are MIT-licensed and have superior performance.
Disclosure: I'm the author of the patterns.
Comment 1 • 2 months ago (Assignee)
Can you give details of where these new patterns are published? I just checked the tex-hyphen repository at https://github.com/hyphenation/tex-hyphen/ but there don't appear to be any recent updates of Czech/Slovak patterns there...
Comment 2 • 2 months ago (Reporter)
Yes, they are not yet published in tex-hyphen (I wanted a period to catch bugs first, but then didn't get to it). They will be published there by the end of the year, to get them into TeXLive 2025 along with the new multilingual Slavic patterns I'm developing now.
The main repository for the Czecho-Slovak patterns is here: https://github.com/tensojka/cshyphen/blob/master/csskhyphen.pat
They are currently published on GitHub, and Android has copied them into its tree: https://android.googlesource.com/platform/external/hyphenation-patterns/+/refs/heads/main/tensojka/cs/
Comment 3 • 2 months ago (Assignee)
Thank you, that's really helpful. (I'm afraid I hadn't paid close attention to the papers in TUGboat, although it's on my shelf!) This looks like an awesome piece of work, and I'd definitely agree with adding them to Firefox.
Comment 4 • 2 months ago (Assignee)
Just to check, could you confirm the intended \lefthyphenmin and \righthyphenmin values that should be used with these patterns -- should they both be set to 2?
Comment 5 • 2 months ago (Assignee)
(In reply to Jonathan Kew [:jfkthame] from comment #4)
> Just to check, could you confirm the intended \lefthyphenmin and \righthyphenmin values that should be used with these patterns -- should they both be set to 2?
(Ondřej confirmed this to me by email, as bugzilla was giving trouble earlier today.)
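For readers unfamiliar with these two parameters, here is a minimal Python sketch of what a left/right minimum of 2 means in practice. It is not Firefox or TeX code; the function name `allowed_breaks` is hypothetical, and the candidate breaks are taken as given input (they are the ones reported for "důstojnosti" in this bug). A break is kept only if at least `left_min` characters precede it and at least `right_min` characters follow it.

```python
import unicodedata


def allowed_breaks(hyphenated, left_min=2, right_min=2):
    """Return character offsets of breaks that respect both minimums.

    `hyphenated` marks every candidate break with "-", e.g. "dů-s-toj".
    (Hypothetical helper for illustration only.)
    """
    # Normalize so that letters such as "ů" count as one character.
    parts = unicodedata.normalize("NFC", hyphenated).split("-")
    word_len = sum(len(p) for p in parts)
    offsets = []
    pos = 0
    for part in parts[:-1]:
        pos += len(part)
        # Keep the break only if enough characters remain on each side.
        if pos >= left_min and word_len - pos >= right_min:
            offsets.append(pos)
    return offsets


candidates = "dů-s-toj-no-s-ti"
print(allowed_breaks(candidates))        # [2, 3, 6, 8, 9]
print(allowed_breaks(candidates, 3, 3))  # [3, 6, 8]
```

With both minimums set to 2, every candidate break in this word survives; stricter values would drop the first and last ones.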
Comment 6 • 2 months ago (Assignee)
I tried adding these new patterns, and noticed one slightly surprising result when using the first article from UDHR in Czech as an example: for the word "důstojnosti", I'm seeing the hyphenation "dů-s-toj-no-s-ti", where twice the single letter "s" appears with potential hyphenation breaks both before and after it. So in the most extreme case, it could appear as:
dů-
s-
toj-
no-
s-
ti
I don't know anything about the Czech language, so can't really assess this myself, but it seemed a bit odd for "s-" to appear as if it's a complete syllable. Is this OK, or does it indicate a problem (either with the patterns, or with how I prepared them for use in Firefox)?
Comment 7 • 2 months ago (Reporter)
(In reply to Jonathan Kew [:jfkthame] from comment #6)
> I tried adding these new patterns, and noticed one slightly surprising result when using the first article from UDHR in Czech as an example: for the word "důstojnosti", I'm seeing the hyphenation "dů-s-toj-no-s-ti", where twice the single letter "s" appears with potential hyphenation breaks both before and after it. So in the most extreme case, it could appear as:
> dů-
> s-
> toj-
> no-
> s-
> ti
> I don't know anything about the Czech language, so can't really assess this myself, but it seemed a bit odd for "s-" to appear as if it's a complete syllable. Is this OK, or does it indicate a problem (either with the patterns, or with how I prepared them for use in Firefox)?
Yes, it is. Great catch and question!
This is on purpose; the s can be either in the syllable důs or in the syllable stoj, and the typesetting engine then has more freedom in where and whether to hyphenate. It's a bit sad that we can't provide priorities for the hyphenation points as one of the hyphenations is usually preferred (but both are good enough).
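To illustrate the "more freedom" point, here is a small Python sketch (not how Firefox's line breaker is implemented; `break_for_width` is a hypothetical helper). It picks the latest break whose first fragment, including the hyphen, still fits in a given number of characters. The second part list, which offers only the "dů-stoj" choice, is an assumed alternative for comparison and does not come from the patterns.

```python
import unicodedata
from typing import List, Optional, Tuple


def break_for_width(parts: List[str], width: int) -> Optional[Tuple[str, str]]:
    """Pick the latest break whose head (plus "-") fits in `width` characters.

    Returns (head_with_hyphen, remainder), or None if no break fits.
    """
    parts = [unicodedata.normalize("NFC", p) for p in parts]
    best = None
    for i in range(1, len(parts)):
        head = "".join(parts[:i]) + "-"
        if len(head) <= width:
            best = (head, "".join(parts[i:]))
    return best


# Breaks as produced by the patterns: "s" may go with either neighbour.
both = ["dů", "s", "toj", "no", "s", "ti"]
# Assumed alternative with only the "dů-stoj" choice, for comparison.
single = ["dů", "stoj", "no", "s", "ti"]

print(break_for_width(both, 4))    # ('důs-', 'tojnosti')
print(break_for_width(single, 4))  # ('dů-', 'stojnosti')
```

With four characters of room left on the line, the extra break lets one more letter stay on the first line.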
Comment 8 • 2 months ago (Assignee)
OK, thanks for your reply. I'll go ahead and prepare a patch to add this to Firefox shortly.
Comment 9 • 2 months ago (Assignee)
The same patterns are used for both languages, so we just add a single copy under the hyph_cs name, and then use a hyphenation-alias pref to apply the same rules to sk as well.
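The aliasing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ALIASES` and `PATTERN_FILES` tables and the `pattern_file_for` function are hypothetical, and the real pref names and lookup rules in Firefox are not reproduced here. Only the hyph_cs name and the cs/sk pairing come from the patch description.

```python
from typing import Optional

# Hypothetical tables for illustration; Firefox keeps the equivalent
# information in preferences and in its hyphenation manager.
PATTERN_FILES = {"cs": "hyph_cs"}  # one shared copy of the patterns
ALIASES = {"sk": "cs"}             # Slovak reuses the Czech resource


def pattern_file_for(lang: str) -> Optional[str]:
    """Resolve a language tag to a pattern resource name, following aliases."""
    tag = lang.lower()
    seen = set()
    # Follow alias links, guarding against accidental cycles.
    while tag in ALIASES and tag not in seen:
        seen.add(tag)
        tag = ALIASES[tag]
    return PATTERN_FILES.get(tag)


print(pattern_file_for("cs"))  # hyph_cs
print(pattern_file_for("sk"))  # hyph_cs
print(pattern_file_for("pl"))  # None
```

Keeping a single copy and aliasing the second language avoids shipping the same pattern data twice.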
Comment 10 • 2 months ago
[:jfkthame] when the patterns are in code, can you please also update https://github.com/mdn/browser-compat-data/blob/23a691219668ee5d6f3c9c134d989874a2609efd/css/properties/hyphens.json with the version that will contain it, so the support for Czech and Slovak shows in MDN?
Comment 13 • 2 months ago (Assignee)
(In reply to Michal Stanke (Mozilla.cz) [:mstanke][:MikkCZ] from comment #10)
> [:jfkthame] when the patterns are in code, can you please also update https://github.com/mdn/browser-compat-data/blob/23a691219668ee5d6f3c9c134d989874a2609efd/css/properties/hyphens.json with the version that will contain it, so the support for Czech and Slovak shows in MDN?
I've opened https://github.com/mdn/browser-compat-data/pull/23963 to update this.
Comment 14 • 2 months ago
MDN docs, if any, can be tracked on https://github.com/mdn/content/issues/35425