Closed
Bug 1105644
Opened 11 years ago
Closed 10 years ago
Incorrect German hyphenation w/ CSS -moz-hyphens (is there a more up-to-date hyphenation dictionary we could use?)
Categories
(Core :: Internationalization, defect)
Tracking
VERIFIED
FIXED
mozilla38
| Version | Tracking | Status |
|---|---|---|
| firefox38 | --- | fixed |
People
(Reporter: bugzilla, Assigned: jfkthame)
Details
Attachments
(4 files)
| Size | Type | Review |
|---|---|---|
| 4.49 KB | application/x-zip | |
| 2.82 KB | patch | smontagu: review+ |
| 2.53 KB | patch | smontagu: review+ |
| 340 bytes | text/html | |
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141113143407
Steps to reproduce:
The German word "Grundausstattung" is wrongly hyphenated as "Gr-undausstattung" when using -moz-hyphens:auto. The part "Grund" must not be split at all.
I tested in FF33 (regular) and FF35.0a2 (dev edition).
Actual results:
Minimal example to reproduce:
data:text/html,<p style="width:40px;-moz-hyphens:auto;width:40px" lang="de">Grundausstattung</p>
Expected results:
The correct hyphenation is "Grund-aus-stat-tung". That should be shown.
Comment 1•11 years ago
Archaeopteryx, I don't suppose you know if we can update the hyphenation dictionary in use? (see also e.g. bug 966818)
Component: Untriaged → Internationalization
Flags: needinfo?(archaeopteryx)
Product: Firefox → Core
Summary: Incorrect German hyphenation w/ CSS -moz-hyphens → Incorrect German hyphenation w/ CSS -moz-hyphens (is there a more up-to-date hyphenation dictionary we could use?)
Comment 2•11 years ago
KaiRo maintains the most widely used German dictionaries; he is likely to have more information on that.
Flags: needinfo?(archaeopteryx) → needinfo?(kairo)
Comment 3•11 years ago
I only maintain the AMO packaging for spelling dictionaries; the hyphenation stuff is independent of that and is shipped in the product.
Flags: needinfo?(kairo)
| Reporter | ||
Comment 4•10 years ago
So, any news here? I would be happy, if the one in charge would speak up with an opinion.
Unfortunately, one of our customers, a linguist as a matter of fact, spotted so many hyphenation errors that we had to disable -moz-hyphens on his website. Of course, this is an edge case, but I mention it to demonstrate that the bug affects more than just a niche of German.
Comment 5•10 years ago
(In reply to Manuel Strehl from comment #4)
> So, any news here? I would be happy, if the one in charge would speak up
> with an opinion.
There's not really a "one in charge" at Mozilla.
In any case, my knowledge of German is sufficient to agree that this is a bug (and I imagine the other people commenting so far, being native speakers, would agree too); the question is just how to fix it... software like Firefox uses hyphenation dictionaries to know where to hyphenate text in different languages, and so we basically need a better one than what is in use now. I don't know if such a better dictionary exists, nor where to find one, nor how many of the issues the linguist identified would be fixed by such a new dictionary.
Archaeopteryx, or Jonathan, ideas?
Flags: needinfo?(jfkthame)
Flags: needinfo?(archaeopteryx)
| Assignee | ||
Comment 6•10 years ago
This looks to me like it's probably a Gecko bug (rather than a shortcoming of the German hyphenation patterns we're using). Note that the bad hyphenation does *not* occur if you lowercase the word:
data:text/html;charset=utf-8,<div lang="de" style="-moz-hyphens:auto;width:1em">grundausstattung
gives the expected "grund-aus-stat-tung", whereas the capitalized version gives "Gr-und-aus-stat-tung". AFAIK, the result here should not have been dependent on the capitalization.
So I suspect this is a code bug, perhaps in the dom/layout line-breaking code or in the integration of the hyphenation library.
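A toy sketch of how capitalization alone can change the result, assuming Liang/TeX-style patterns that are stored in lowercase and matched case-sensitively. The five patterns below are invented for this example; they are not the real German pattern set, and the code is not Gecko's or libhyphen's implementation.

```python
import re

# Hypothetical toy patterns in Liang/TeX notation, all lowercase. They are
# made up for this example and are NOT the real German pattern set.
TOY_PATTERNS = ["1und", ".gr2u", "d1au", "s1st", "t1tu"]


def parse_pattern(pattern):
    """Split e.g. '.gr2u' into letters '.gru' and per-gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)  # weights[k] = gap before letters[k]
    k = 0
    for ch in pattern:
        if ch.isdigit():
            weights[k] = int(ch)
        else:
            k += 1
    return letters, weights


def hyphenate(word, patterns=TOY_PATTERNS, lowercase=True):
    """Join the pieces of `word` with '-' at every odd-weighted gap."""
    # Assumes lowercasing keeps the length unchanged (true for these words).
    key = word.lower() if lowercase else word
    dotted = "." + key + "."
    levels = [0] * (len(dotted) + 1)  # levels[k] = gap before dotted[k]
    for letters, weights in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                levels[start + offset] = max(levels[start + offset], weight)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(1, len(word)):
        if levels[i + 1] % 2 == 1:  # gap before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("Grundausstattung"))                   # Grund-aus-stat-tung
print(hyphenate("Grundausstattung", lowercase=False))  # Gr-und-aus-stat-tung
```

In this toy, the even-weighted pattern `.gr2u` is what suppresses the break after "Gr"; it only matches when the word starts with a lowercase "g", so the capitalized word loses that protection.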
Flags: needinfo?(jfkthame)
Updated•10 years ago
Flags: needinfo?(archaeopteryx)
| Reporter | ||
Comment 7•10 years ago
Hm, as far as I know, there are no major problems with the TeX ngerman patterns when used in LaTeX. (Correct me if I'm wrong, but those are the ones used, aren't they?) I also use them in an extension to Apache FOP for hyphenation in XSL-FO.
Could the conversion step, or the way the patterns are applied in the code, introduce inaccuracies?
I attach a minimal XSL-FO test case, complete with rendered PDF. It shows that Apache FOP + TeX patterns does not exhibit this wrong hyphenation behaviour.
(To reproduce: unzip, then run `fop -fo test.fo -pdf test.pdf`. Make sure that the "offo" extension for FOP is installed.)
| Reporter | ||
Comment 8•10 years ago
Comment on attachment 8564929 [details]
XSL-FO + rendered PDF: No bug in Fop using TeX patterns
(The test case wrongly had these attributes set: hyphenation-push-character-count="4" hyphenation-remain-character-count="4". Removing them doesn't change the result, though.)
| Assignee | ||
Comment 9•10 years ago
The issue here is that we need to explicitly lowercase the characters in each word before passing to libhyphen for pattern matching; it does not do that internally. We're already having to do a UTF16-to-UTF8 copy, so we can apply lowercasing at the same time.
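A minimal sketch of the idea of folding the lowercasing into the UTF-16-to-UTF-8 copy, so that no second pass over the word is needed. The function name `utf16_to_lower_utf8` is hypothetical and the code is only an illustration in Python; it is not the patch itself, which lives in `intl/hyphenation/nsHyphenator.cpp`.

```python
def utf16_to_lower_utf8(units):
    """Convert UTF-16 code units to lowercased UTF-8 bytes in one pass."""
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        i += 1
        # Combine a high surrogate with the following low surrogate, if any.
        if (0xD800 <= unit <= 0xDBFF and i < len(units)
                and 0xDC00 <= units[i] <= 0xDFFF):
            code_point = 0x10000 + ((unit - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        else:
            code_point = unit
        # Lowercase each character as it is copied; unpaired surrogates
        # cannot be encoded and are replaced with '?' in this sketch.
        out += chr(code_point).lower().encode("utf-8", "replace")
    return bytes(out)


units = [ord(c) for c in "Grundausstattung"]
print(utf16_to_lower_utf8(units))  # b'grundausstattung'
```

Lowercasing character by character keeps the sketch simple; it ignores context-sensitive rules that only a whole-string operation could apply.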
Attachment #8564959 - Flags: review?(smontagu)
| Assignee | ||
Updated•10 years ago
Assignee: nobody → jfkthame
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
| Assignee | ||
Comment 10•10 years ago
This affects a couple of our existing reftests, but I believe the changed behavior is an improvement in these cases, so I'm just fixing the reference files to match. These cases just aren't as glaring as the German example, so we didn't pay sufficient attention to them.
Attachment #8564960 - Flags: review?(smontagu)
| Assignee | ||
Updated•10 years ago
OS: Linux → All
Hardware: x86_64 → All
Version: 35 Branch → Trunk
Updated•10 years ago
Attachment #8564960 - Flags: review?(smontagu) → review+
Comment 11•10 years ago
Comment on attachment 8564959 [details] [diff] [review]
Lowercase words before passing them to libhyphen, so as to match patterns fully
Review of attachment 8564959 [details] [diff] [review]:
-----------------------------------------------------------------
::: intl/hyphenation/nsHyphenator.cpp
@@ +99,5 @@
> + }
> +
> + // XXX What about language-specific casing? Consider Turkish I/i...
> + // In practice, it looks like the current patterns will not be
> + // affected by this, as they treat dotted and undotted i similarly.
I was going to say "what about German SS/ß? That can impact hyphenation" -- but on second thoughts if we're lowercasing and not uppercasing, ẞ will just become ß and SS will become ss, so there shouldn't be a problem, right?
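The Turkish I/i caveat from the quoted patch comment can be illustrated with a short sketch. It contrasts language-independent lowercasing with a Turkish-aware variant; `lower_turkish` is a hypothetical helper written for this example and is not something the patch adds.

```python
def lower_default(text):
    """Language-independent lowercasing (what str.lower() does)."""
    return text.lower()


def lower_turkish(text):
    """Hypothetical Turkish-aware variant: I -> ı and İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


word = "IŞIK"  # Turkish "ışık" (light), written in capitals
print(lower_default(word))  # işik
print(lower_turkish(word))  # ışık
```

The two results differ only in dotted versus undotted i, which, according to the patch comment, the current patterns treat similarly.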
Attachment #8564959 - Flags: review?(smontagu) → review+
Comment 12•10 years ago
(In reply to Simon Montagu :smontagu from comment #11)
> I was going to say "what about German SS/ß? That can impact hyphenation" --
> but on second thoughts if we're lowercasing and not uppercasing, ẞ will just
> become ß and SS will become ss, so there shouldn't be a problem, right?
If ß got uppercased to SS before, that would be an issue for hyphenation, e.g.
Maße > MASSE > Masse
Masse has a different meaning and a different hyphenation:
Ma|ße vs. Mas|se
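The asymmetry between lowercasing and uppercasing for ß can be checked with a small sketch using Python's built-in Unicode case mapping. `hyphenation_key` is a hypothetical name for "the string handed to the pattern matcher"; this is an illustration, not the Gecko code.

```python
def hyphenation_key(word):
    """Lowercase a word the way a pattern matcher would need it."""
    return word.lower()


print(hyphenation_key("MAẞE"))   # maße  (capital ẞ maps to ß)
print(hyphenation_key("MASSE"))  # masse (SS stays a double s)
print("Maße".upper())            # MASSE (uppercasing ß loses the distinction)
print(hyphenation_key("Maße".upper()))  # masse, not maße
```

Lowercasing never turns "ss" into "ß" or the reverse, so the ambiguity only arises if text has already been uppercased before it reaches the hyphenator.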
| Assignee | ||
Comment 13•10 years ago
(In reply to Archaeopteryx [:aryx] from comment #12)
> (In reply to Simon Montagu :smontagu from comment #11)
> > I was going to say "what about German SS/ß? That can impact hyphenation" --
> > but on second thoughts if we're lowercasing and not uppercasing, ẞ will just
> > become ß and SS will become ss, so there shouldn't be a problem, right?
Right.
> If ß got uppercased to SS before, that would be an issue for hyphenation,
> e.g.
>
> Maße > MASSE > Masse
> Masse has a different meaning and a different hyphenation:
> Ma|ße vs. Mas|se
If the text contains "SS" for an uppercase ß, that's the spelling we'll hyphenate; if the author writes "MASSE" and wants it hyphenated as uppercased "maße", i.e. "MA-SSE", they'd have to provide a manual soft hyphen, I think. The browser can't tell which of those words "MASSE" was meant to be.
OTOH, if you're referring to a case where the author writes "maße", and then it is uppercased via text-transform:uppercase ... well, we don't seem to apply auto-hyphenation at all in that case, so the issue doesn't arise. (Though I'm not sure why that is -- perhaps another bug?)
| Assignee | ||
Comment 14•10 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e46f80935409
https://hg.mozilla.org/integration/mozilla-inbound/rev/a36c441817d8
Target Milestone: --- → mozilla38
Comment 15•10 years ago
My bug report on Launchpad could have the same cause: https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1209176
There I compared the hyphenation between Firefox and LibreOffice. Firefox still doesn’t correctly hyphenate the test words in my example HTML page (which also contains lowercase words, see attachment).
Comment 16•10 years ago
Comment 17•10 years ago
Comment on attachment 8565930 [details]
Hyphenation test page (German words)
><html lang="de">
><head><meta charset="utf-8">
><title>Hypenation Test</title>
><style type="text/css">
>p {-moz-hyphens:auto; hyphens:auto;
>width:3em; border-right: 1px solid red;
>font: 1em/1.32 FreeSerif,'Times New Roman',Times,serif;}
></style></head>
><body>
><p>mmmiii Türklinke Übungen wörtlich künftige öffentlich Überschriften überempfindlich</p>
></body>
></html>
Comment 18•10 years ago
Sorry, had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=6720242&repo=mozilla-inbound on ASan builds.
Flags: needinfo?(jfkthame)
| Assignee | ||
Comment 19•10 years ago
Yeah, that's a bug -- I didn't notice that the |begin| variable gets reused later in the method. I've fixed the patch locally, and pushed an asan try run to double-check before relanding.
Flags: needinfo?(jfkthame)
| Assignee | ||
Comment 20•10 years ago
Comment 21•10 years ago
https://hg.mozilla.org/mozilla-central/rev/133f5ec3de01
https://hg.mozilla.org/mozilla-central/rev/7c3e53198e5b
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox38: --- → fixed
Flags: in-testsuite+
Resolution: --- → FIXED
Comment 22•10 years ago
Verified fixed with Nightly 38.0a1 20150219030204 on Windows 8.1
Status: RESOLVED → VERIFIED