Bug 672320 - add hyphenation resources for more locales
Status: RESOLVED FIXED
Keywords: dev-doc-complete
Product: Core
Classification: Components
Component: Internationalization
Version: Trunk
Platform: All All
Importance: -- normal with 6 votes
Target Milestone: mozilla8
Assigned To: Jonathan Kew (:jfkthame)
Depends on: 253317 656248 672472 673704
Blocks: bcp47 656750

Reported: 2011-07-18 12:37 PDT by Jonathan Kew (:jfkthame)
Modified: 2014-06-10 02:30 PDT


Attachments
hyphenation patterns for Swedish (56.16 KB, patch)
2011-07-19 04:10 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
gerv: feedback+
include all available hyphenation patterns in the build (827 bytes, patch)
2011-07-19 04:13 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
basic test for use of the Swedish hyphenation patterns (2.41 KB, patch)
2011-07-19 04:14 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
add a bunch more hyphenation locales derived from TeX patterns (793.17 KB, patch)
2011-07-20 01:59 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
basic reftests for the additional patterns (21.57 KB, patch)
2011-07-20 02:00 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
hyphenation patterns for German (465.25 KB, patch)
2011-07-20 06:26 PDT, Jonathan Kew (:jfkthame)
no flags
reftests for German hyphenation (3.39 KB, patch)
2011-07-20 06:32 PDT, Jonathan Kew (:jfkthame)
no flags
hyphenation patterns for Mongolian (Cyrillic script) (21.34 KB, patch)
2011-07-20 07:16 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
reftest for Mongolian hyphenation (2.00 KB, patch)
2011-07-20 07:17 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
add patterns for Serbo-Croatian (covering Serbian & Bosnian lang tags) (87.91 KB, patch)
2011-07-20 14:44 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
reftest for Serbian hyphenation (3.65 KB, patch)
2011-07-20 14:46 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
false hyphenation war-um instead of wa-rum (202 bytes, text/html)
2011-07-20 16:07 PDT, Stefan
no flags
reftest for German hyphenation - added de-CH (4.47 KB, patch)
2011-07-21 03:45 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
hyphenation patterns for German - added de-CH-* alias (465.30 KB, patch)
2011-07-21 03:47 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
add missing hyph-aliases for "xx-*" -> "xx" (2.18 KB, patch)
2011-07-21 04:02 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 9.1 - patterns for French (24.86 KB, patch)
2011-07-22 01:46 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 9.2 - test for French hyphenation (2.25 KB, patch)
2011-07-22 01:47 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 10.1 - British English patterns (95.04 KB, patch)
2011-07-22 03:13 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
gerv: feedback-
pt 10.2 - tests for British English patterns (3.11 KB, patch)
2011-07-22 06:31 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 11.1 - hyphenation patterns for Russian (104.65 KB, patch)
2011-07-22 06:48 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 11.2 - test for Russian patterns (2.20 KB, patch)
2011-07-22 06:54 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 12.1 - patterns for Norwegian (614.42 KB, patch)
2011-07-22 07:51 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 12.2 - tests for Norwegian (3.24 KB, patch)
2011-07-22 07:56 PDT, Jonathan Kew (:jfkthame)
no flags
pt 13.1 - patterns for Lithuanian (17.23 KB, patch)
2011-07-22 14:12 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 13.2 - test for Lithuanian patterns (2.35 KB, patch)
2011-07-22 14:13 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 14.1 - patterns for Finnish (5.06 KB, patch)
2011-07-23 01:43 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 14.2 - test for Finnish patterns (2.41 KB, patch)
2011-07-23 01:44 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 12.2 - tests for Norwegian (revised) (3.92 KB, patch)
2011-07-26 03:49 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 15.1 - hyphenation patterns for Hungarian (745.83 KB, patch)
2011-08-17 04:29 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 15.2 - reftest for Hungarian hyphenation (2.37 KB, patch)
2011-08-17 04:32 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 16.1 - hyphenation patterns for Italian (8.24 KB, patch)
2011-08-17 13:38 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 16.2 - test for Italian patterns (2.36 KB, patch)
2011-08-17 13:38 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 17.1 - hyphenation patterns for Turkish (9.65 KB, patch)
2011-08-17 13:39 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
pt 17.2 - test for Turkish patterns (1.97 KB, patch)
2011-08-17 13:40 PDT, Jonathan Kew (:jfkthame)
smontagu: review+

Description Jonathan Kew (:jfkthame) 2011-07-18 12:37:51 PDT
This is a followup to bug 253317, which provided the code to support auto-hyphenation, and the hyphenation patterns for en-US.

There are patterns for around 25 more languages available in the TeX community under the LPPL license. As per bug 656248 comment 9, we can use these as a basis for hyphenation files in Gecko, provided we follow proper licensing procedures.

The TeX patterns require a preprocessing and packaging step to make them ready for libhyphen use. This will prevent our versions serving as a direct replacement for the original files, which means we will abide by the LPPL requirements for relicensing a derived work.
Comment 1 Jonathan Kew (:jfkthame) 2011-07-19 04:10:24 PDT
Created attachment 546743 [details] [diff] [review]
hyphenation patterns for Swedish

As an initial test case, this adds Swedish hyphenation patterns (adapted from those used in TeX). We should be able to process a couple dozen more languages following exactly the same pattern, but I figured it would be simpler to review a single one first.

Gerv, please take a look at the LICENSE file and check that it meets your expectations. Note that the header lines that have been added to the patterns, and the fact that they're stripped of the TeX \patterns{...} markup, means that this is not usable as a direct replacement for the original work. (There's also been a substring-merging operation to meet libhyphen requirements, but that would not in itself affect TeX use, so it's not relevant to the relicensing requirements.)
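
The preprocessing described above (stripping the TeX \patterns{...} markup and adding header lines) can be sketched roughly as follows. This is only an illustration, not the conversion script actually used for these patches: the function name is hypothetical, the single "UTF-8" header line is an assumption about what the added header looks like, and the substring-merging step is left out entirely.

```javascript
// Sketch only: turn the body of a TeX \patterns{...} block into a
// one-pattern-per-line list with a leading header line.
// texPatternsToLines and the default "UTF-8" header are assumptions
// made for this illustration.
function texPatternsToLines(texSource, header = "UTF-8") {
  // Drop TeX comments (everything from % to the end of the line).
  const withoutComments = texSource.replace(/%.*$/gm, "");
  // Capture whatever sits between \patterns{ and the closing brace.
  const match = withoutComments.match(/\\patterns\s*\{([^}]*)\}/);
  if (!match) {
    throw new Error("no \\patterns{...} block found");
  }
  // Patterns are whitespace-separated tokens inside the block.
  const patterns = match[1].split(/\s+/).filter((p) => p.length > 0);
  return [header, ...patterns].join("\n") + "\n";
}
```

Because the output no longer contains the \patterns{...} wrapper and starts with an extra header line, TeX could not load it in place of the original file, which is the property the relicensing argument relies on.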
Comment 2 Jonathan Kew (:jfkthame) 2011-07-19 04:13:18 PDT
Created attachment 546746 [details] [diff] [review]
include all available hyphenation patterns in the build

The general consensus seems to be that we should ship all available (subject to licensing) hyphenation resources in the default build, to provide the most uniform behavior and to minimize fingerprintability.
Comment 3 Jonathan Kew (:jfkthame) 2011-07-19 04:14:00 PDT
Created attachment 546747 [details] [diff] [review]
basic test for use of the Swedish hyphenation patterns
Comment 4 Simon Montagu :smontagu 2011-07-19 04:26:38 PDT
Comment on attachment 546743 [details] [diff] [review]
hyphenation patterns for Swedish

Review of attachment 546743 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me
Comment 5 Gervase Markham [:gerv] 2011-07-19 06:42:55 PDT
Comment on attachment 546743 [details] [diff] [review]
hyphenation patterns for Swedish

Blimey, that's complicated! What about "(more information to be added later)"? Other than that, looks OK.

Gerv
Comment 6 Jonathan Kew (:jfkthame) 2011-07-19 06:51:14 PDT
(In reply to comment #5)
> What about "(more information to be added later)"?

That comes directly from the upstream packages I'm using; e.g. see http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/txt/hyph-sv.lic.txt?revision=570&view=markup for the Swedish license file.

I'm using a conversion process that takes our tri-license plus the extra linking wording, and simply appends the old *.lic.txt file verbatim. So if that file contains oddities like this (many of them do!), they'll be preserved as-is.
Comment 7 Jonathan Kew (:jfkthame) 2011-07-20 01:59:19 PDT
Created attachment 547010 [details] [diff] [review]
add a bunch more hyphenation locales derived from TeX patterns

These are the most straightforward cases: patterns licensed under LPPL (so we can relicense our derived work), and tagged with simple language codes. Languages with more than one set of patterns available, or with other complications, will be handled individually so we can review the locale tagging more carefully.
Comment 8 Jonathan Kew (:jfkthame) 2011-07-20 02:00:34 PDT
Created attachment 547011 [details] [diff] [review]
basic reftests for the additional patterns

Just a simple testcase for each supported language, to make sure the patterns are found and used as expected.
Comment 9 Simon Montagu :smontagu 2011-07-20 02:43:52 PDT
Comment on attachment 547010 [details] [diff] [review]
add a bunch more hyphenation locales derived from TeX patterns

rs=me.

I'm assuming that it doesn't matter whether a specific locale appears in our own locale properties files. kmr doesn't yet (though bug 666662 will add it).
Comment 10 Simon Montagu :smontagu 2011-07-20 02:49:51 PDT
Comment on attachment 547011 [details] [diff] [review]
basic reftests for the additional patterns

Review of attachment 547011 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 11 Jonathan Kew (:jfkthame) 2011-07-20 02:51:15 PDT
(In reply to comment #9)

> I'm assuming that it doesn't matter whether a specific locale appears in our
> own locale properties files.

Right, this doesn't affect use of the resource. All that's required is for the lang="xxx" tag on the page to match the locale code of the pattern file.
Comment 12 Jonathan Kew (:jfkthame) 2011-07-20 03:29:34 PDT
Pushed to mozilla-inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/59533328a665 (makefile chg)
http://hg.mozilla.org/integration/mozilla-inbound/rev/b88768594e63 (sv patterns)
http://hg.mozilla.org/integration/mozilla-inbound/rev/337883870ff5 (sv test)
http://hg.mozilla.org/integration/mozilla-inbound/rev/ef182a0608fb (more patterns)
http://hg.mozilla.org/integration/mozilla-inbound/rev/86903332e6a4 (more tests)

Keeping this bug open for now as additional pattern resources are in preparation.
Comment 13 Jonathan Kew (:jfkthame) 2011-07-20 06:26:02 PDT
Created attachment 547062 [details] [diff] [review]
hyphenation patterns for German

For German, there are separate patterns for traditional and reformed orthographies, and for Swiss German. The hyphenation-alias pref is used to select the reformed (1996) patterns as the default for data that is just tagged as lang="de" without a more specific subtag.

(The handling of these tags should probably be updated as part of the BCP47 effort, but for now the hyphenation manager just uses its own limited alias/wildcard scheme.)
Comment 14 Jonathan Kew (:jfkthame) 2011-07-20 06:32:43 PDT
Created attachment 547065 [details] [diff] [review]
reftests for German hyphenation
Comment 15 Jonathan Kew (:jfkthame) 2011-07-20 07:16:38 PDT
Created attachment 547080 [details] [diff] [review]
hyphenation patterns for Mongolian (Cyrillic script)

These patterns are labelled as mn-Cyrl in the TeX archives, but it seems likely some data may just be tagged as lang="mn" without the script subtag. Unless/until we also have patterns for mn-Mong (if hyphenation is even a possibility there, which seems doubtful), I think it's simplest to label this as plain "mn".

If someone uses Mongolian-script text on a web page (whether tagged as "mn" or explicitly as "mn-Mong"), it simply won't be hyphenated, because the patterns won't match any Mongolian-script data.
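
The reasoning here is that patterns made only of Cyrillic letters can never match a word that contains no Cyrillic letters. A tiny sketch of that idea, using a hypothetical helper name and Unicode script property escapes (this is an illustration, not code from the patch):

```javascript
// Hypothetical helper: does the text contain any Cyrillic-script letter?
// Patterns written purely in Cyrillic cannot match text for which this is false.
function hasCyrillic(text) {
  return /\p{Script=Cyrillic}/u.test(text);
}

const cyrillicWord = "\u041C\u043E\u043D\u0433\u043E\u043B"; // "Mongol" in Cyrillic script
const mongolScriptWord = "\u182E\u1823\u1829\u182D\u1823\u182F"; // letters from the Mongolian block
```

Here hasCyrillic(cyrillicWord) is true while hasCyrillic(mongolScriptWord) is false, so labelling the Cyrillic patterns as plain "mn" leaves traditional-script text untouched.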
Comment 16 Jonathan Kew (:jfkthame) 2011-07-20 07:17:17 PDT
Created attachment 547081 [details] [diff] [review]
reftest for Mongolian hyphenation
Comment 18 Jonathan Kew (:jfkthame) 2011-07-20 14:44:49 PDT
Created attachment 547254 [details] [diff] [review]
add patterns for Serbo-Croatian (covering Serbian & Bosnian lang tags)
Comment 19 Jonathan Kew (:jfkthame) 2011-07-20 14:46:19 PDT
Created attachment 547256 [details] [diff] [review]
reftest for Serbian hyphenation

The Serbo-Croatian patterns cover both Latin and Cyrillic orthographies - they are a combination of the two separate sets of LaTeX patterns. This test checks that hyphenation is working for both scripts.
Comment 21 Stefan 2011-07-20 16:04:11 PDT
(In reply to comment #13)
> Created attachment 547062 [details] [diff] [review]
> hyphenation patterns for German

May I report "false" hyphenations here?
Comment 22 Stefan 2011-07-20 16:07:24 PDT
Created attachment 547282 [details]
false hyphenation war-um instead of wa-rum
Comment 23 Stefan 2011-07-20 16:07:57 PDT
Comment on attachment 547282 [details]
false hyphenation war-um instead of wa-rum

><style>
>body {
>   width: 8em;
>   -moz-hyphens: auto;
>   word-wrap: break-word;
>}
></style>
><p lang = "de">
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
>Warum
Comment 24 Jonathan Kew (:jfkthame) 2011-07-21 01:16:59 PDT
(In reply to comment #22)
> Created attachment 547282 [details]
> false hyphenation war-um instead of wa-rum

The hyphenation "war-um" is shown in the wordlist at http://repo.or.cz/w/wortliste.git, which is the upstream source for the German patterns here.

(I'm not a German expert, but I suspect this may be a case where the "correct" hyphenation is open to question, perhaps depending on whether you prefer to give more weight to morphology or phonology, or on other factors.)

It would be best to raise this issue with Werner Lemberg, the author/maintainer of the resources we're using (see the intl/locales/de-1996/hyphenation/LICENSE file); you could file a separate Mozilla bug to track the issue so that it doesn't get lost in the meantime, but it's not really practical to debate and alter individual hyphenations here. If there are problems with the patterns for a particular language, this should be addressed upstream where the patterns are maintained, and then a new revision imported into our codebase.
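
For readers unfamiliar with why a result like "war-um" comes from the pattern data rather than from the engine: TeX-style patterns interleave letters with digits, the highest digit at each gap wins, and an odd value permits a break. The sketch below is a minimal version of that scheme. The two toy pattern sets in the usage note are invented for illustration and are not taken from the German pattern files, and the function names and the 2/2 minimum-length defaults are assumptions.

```javascript
// Split a pattern such as "a1r" into its letters ("ar") and the digit
// weights for the gaps around them ([0, 1, 0]).
function parsePattern(pattern) {
  const letters = [];
  const weights = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      weights[weights.length - 1] = Number(ch);
    } else {
      letters.push(ch);
      weights.push(0);
    }
  }
  return { text: letters.join(""), weights };
}

// Sketch of pattern-based hyphenation: "." marks the word boundaries,
// each gap keeps the maximum weight of all matching patterns, and an
// odd weight allows a break (subject to leftMin/rightMin).
function hyphenate(word, patterns, leftMin = 2, rightMin = 2) {
  const chars = Array.from("." + word.toLowerCase() + ".");
  const scores = new Array(chars.length + 1).fill(0);
  for (const { text, weights } of patterns.map((p) => parsePattern(p))) {
    const pchars = Array.from(text);
    for (let start = 0; start + pchars.length <= chars.length; start++) {
      if (pchars.every((c, k) => chars[start + k] === c)) {
        weights.forEach((w, k) => {
          scores[start + k] = Math.max(scores[start + k], w);
        });
      }
    }
  }
  // scores[j + 1] is the weight of the gap just before word character j.
  const wordChars = Array.from(word);
  const parts = [];
  let current = "";
  wordChars.forEach((c, j) => {
    const allowed = j >= leftMin && wordChars.length - j >= rightMin;
    if (allowed && scores[j + 1] % 2 === 1) {
      parts.push(current);
      current = "";
    }
    current += c;
  });
  parts.push(current);
  return parts.join("-");
}
```

With the toy set ["a1r"], hyphenate("Warum", ["a1r"]) yields "Wa-rum"; with ["r1u", "a2r"] it yields "War-um". Which of the two a browser produces is therefore decided entirely by the imported data, which is why corrections belong upstream.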
Comment 25 Jonathan Kew (:jfkthame) 2011-07-21 01:34:57 PDT
BTW, the dictionary at http://dict.tu-chemnitz.de lists "Word division: wa·r·um", which seems to imply that either "wa-rum" or "war-um" could be permissible.
Comment 26 Simon Montagu :smontagu 2011-07-21 02:59:04 PDT
Comment on attachment 547062 [details] [diff] [review]
hyphenation patterns for German

Review of attachment 547062 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/src/init/all.js
@@ +1115,5 @@
> +// use reformed (1996) German patterns by default unless specifically tagged as de-1901
> +// (these prefs may soon be obsoleted by better BCP47-based tag matching, but for now...)
> +pref("intl.hyphenation-alias.de", "de-1996");
> +pref("intl.hyphenation-alias.de-*", "de-1996");
> +pref("intl.hyphenation-alias.de-DE-1901", "de-1901");

Do we not need a pref entry for de-CH and/or de-CH-*?
Comment 27 Simon Montagu :smontagu 2011-07-21 03:01:49 PDT
Comment on attachment 547065 [details] [diff] [review]
reftests for German hyphenation

Review of attachment 547065 [details] [diff] [review]:
-----------------------------------------------------------------

Here again I'd like to see a test for de-CH
Comment 28 Simon Montagu :smontagu 2011-07-21 03:03:20 PDT
Comment on attachment 547080 [details] [diff] [review]
hyphenation patterns for Mongolian (Cyrillic script)

Review of attachment 547080 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me
Comment 29 Simon Montagu :smontagu 2011-07-21 03:04:11 PDT
Comment on attachment 547081 [details] [diff] [review]
reftest for Mongolian hyphenation

Review of attachment 547081 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 30 Simon Montagu :smontagu 2011-07-21 03:05:52 PDT
Comment on attachment 547254 [details] [diff] [review]
add patterns for Serbo-Croatian (covering Serbian & Bosnian lang tags)

Review of attachment 547254 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 31 Simon Montagu :smontagu 2011-07-21 03:07:08 PDT
Comment on attachment 547256 [details] [diff] [review]
reftest for Serbian hyphenation

Review of attachment 547256 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 32 Jonathan Kew (:jfkthame) 2011-07-21 03:33:38 PDT
(In reply to comment #26)
> Do we not need a pref entry for de-CH and/or de-CH-*?

Not for de-CH, because we have a resource by that name; but you're right, we should have de-CH-* aliased to de-CH.

And now that you mention it, we should be adding an alias for xx-* in each of the "simple" cases where we have patterns for language xx. Otherwise, if (for example) text is tagged with lang="pt-PT" or "pt-BR" instead of plain "pt", the patterns won't be found.

I'll put up a patch to add those en masse. (Again, I anticipate that upcoming BCP47 work may supersede this, but for now...)
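
The lookup behaviour discussed in this comment can be sketched as below. This is a hedged illustration only: the data structures, the function name, and the exact order of the steps are assumptions, not the hyphenation manager's real code. The alias entries mirror the intl.hyphenation-alias.* prefs quoted in comment 26 plus the de-CH-* and pt-* aliases proposed here.

```javascript
// Hypothetical stand-ins for the shipped pattern resources and the
// intl.hyphenation-alias.* prefs; not Gecko APIs.
const patternResources = new Set(["de-1901", "de-1996", "de-CH", "pt"]);
const aliases = new Map([
  ["de", "de-1996"],
  ["de-*", "de-1996"],
  ["de-DE-1901", "de-1901"],
  ["de-CH-*", "de-CH"],
  ["pt-*", "pt"],
]);

// Resolve a lang tag to the name of a pattern resource, or null.
function resolveHyphenationLocale(tag) {
  // 1. A resource with exactly this name needs no alias.
  if (patternResources.has(tag)) return tag;
  // 2. An exact alias redirects to another tag.
  if (aliases.has(tag)) return resolveHyphenationLocale(aliases.get(tag));
  // 3. Otherwise replace the last subtag with "*" and try again.
  const base = tag.endsWith("-*") ? tag.slice(0, -2) : tag;
  const i = base.lastIndexOf("-");
  if (i > 0) return resolveHyphenationLocale(base.slice(0, i) + "-*");
  return null;
}
```

Under these assumptions "pt-BR" resolves to "pt" only because the "pt-*" alias exists; without that entry step 3 finds nothing and the function returns null, which is the failure described above. Likewise "de-CH-1901" reaches "de-CH" through the "de-CH-*" alias, while "de-CH" itself is found directly.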
Comment 33 Jonathan Kew (:jfkthame) 2011-07-21 03:45:16 PDT
Created attachment 547357 [details] [diff] [review]
reftest for German hyphenation - added de-CH

This adds a testcase using de-CH. (It actually yields the same hyphen positions as standard German in this case. There are differences in the patterns, but I don't know enough about them to readily identify specific words that should turn out different.)
Comment 34 Jonathan Kew (:jfkthame) 2011-07-21 03:47:29 PDT
Created attachment 547359 [details] [diff] [review]
hyphenation patterns for German - added de-CH-* alias
Comment 35 Jonathan Kew (:jfkthame) 2011-07-21 04:02:00 PDT
Created attachment 547362 [details] [diff] [review]
add missing hyph-aliases for "xx-*" -> "xx"

This adds the appropriate aliases for the already-landed patterns. (Similar entries should be included with each additional language.)
Comment 36 Stefan 2011-07-21 08:18:53 PDT
(In reply to comment #25)
> either "wa-rum" or "war-um" could be permissible.

I stand corrected. According to § 113 of the rules
http://www.canoo.net/services/GermanSpelling/Amtlich/Trennung/pgf107-112.html#pgf113 both are permissible.

What about "EinsPlus" which is hyphenated as "Ein-sPlus"?
Comment 37 Jonathan Kew (:jfkthame) 2011-07-21 09:53:11 PDT
(In reply to comment #36)

> What about "EinsPlus" which is hyphenated as "Ein-sPlus"?

I assume "einsplus" is not a standard word? This illustrates that we should probably give some kind of special treatment to "words" that have CamelCasing. A similar problem arises in English; picking an arbitrary example, "CinemaScope" gets hyphenated as "Cine-maS-cope". :(

Please file a separate bug about this, however; it's not an issue of patterns for particular locales, it's something I think we should solve in a more general way.
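
One possible shape for the "special treatment" mentioned above, purely as a sketch and not something decided in this bug: detect a lowercase-to-uppercase transition inside a word and split there, so that each piece could be hyphenated on its own. Both function names are hypothetical.

```javascript
// True if a lowercase letter is immediately followed by an uppercase one,
// as in "EinsPlus" or "CinemaScope".
function hasInternalCapital(word) {
  return /\p{Ll}\p{Lu}/u.test(word);
}

// Split a word at each lowercase-to-uppercase boundary without
// consuming any characters.
function splitAtCaseBoundaries(word) {
  return word.split(/(?<=\p{Ll})(?=\p{Lu})/u);
}
```

For example, splitAtCaseBoundaries("CinemaScope") gives ["Cinema", "Scope"], and an ordinary word such as "Warum" comes back as a single piece. Whether the pieces should then be hyphenated separately, or the whole word left alone, is a design question for the follow-up bug.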
Comment 38 Simon Montagu :smontagu 2011-07-21 11:49:34 PDT
Comment on attachment 547357 [details] [diff] [review]
reftest for German hyphenation - added de-CH

Review of attachment 547357 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 39 Simon Montagu :smontagu 2011-07-21 11:50:05 PDT
Comment on attachment 547359 [details] [diff] [review]
hyphenation patterns for German - added de-CH-* alias

Review of attachment 547359 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 40 Simon Montagu :smontagu 2011-07-21 11:51:01 PDT
Comment on attachment 547362 [details] [diff] [review]
add missing hyph-aliases for "xx-*" -> "xx"

Review of attachment 547362 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 42 Jonathan Kew (:jfkthame) 2011-07-22 01:46:17 PDT
Created attachment 547632 [details] [diff] [review]
pt 9.1 - patterns for French

The French patterns are not LPPL-licensed, but have a simple "free" statement that permits redistribution and modified versions; see LICENSE:
+% This file is available for free and can used and redistributed
+% asis for free. Modified versions should have another name.

Gerv, flagging you for feedback so you have a chance to confirm that this is OK.
Comment 43 Jonathan Kew (:jfkthame) 2011-07-22 01:47:11 PDT
Created attachment 547633 [details] [diff] [review]
pt 9.2 - test for French hyphenation
Comment 44 Jonathan Kew (:jfkthame) 2011-07-22 03:13:30 PDT
Created attachment 547649 [details] [diff] [review]
pt 10.1 - British English patterns

Gerv, please check whether the licensing terms are OK here as well. The original terms (included verbatim in our LICENSE file) say:

% Unlimited copying and redistribution of this file
% is permitted so long as the file is not modified
% in any way.
%
% Modifications may be made for private purposes (though
% this is discouraged, as it could result in documents
% hyphenating differently on different systems) but if
% such modifications are re-distributed, the modified
% file must not be capable of being confused with the
% original.  In particular, this means
%
%(a) the filename (the portion before the extension, if any)
%    must not match any of :
%
%        UKHYPH                  UK-HYPH
%        UKHYPHEN                UK-HYPHEN
%        UKHYPHENS               UK-HYPHENS
%        UKHYPHENATION           UK-HYPHENATION
%        UKHYPHENISATION         UK-HYPHENISATION
%        UKHYPHENIZATION         UK-HYPHENIZATION
%
%   regardless of case, and
%
%(b) the file must contain conditions identical to these,
% except that the modifier/distributor may, if he or she
% wishes, augment the list of proscribed filenames.

which looks to me like it covers our situation.
Comment 45 Jonathan Kew (:jfkthame) 2011-07-22 06:31:52 PDT
Created attachment 547677 [details] [diff] [review]
pt 10.2 - tests for British English patterns
Comment 46 Jonathan Kew (:jfkthame) 2011-07-22 06:48:17 PDT
Created attachment 547680 [details] [diff] [review]
pt 11.1 - hyphenation patterns for Russian

Russian patterns - distributed under LPPL 1.2+. (Should have been included with the rest of the LPPL-licensed languages, but the license file had slightly different phrasing and my simple grep missed it.)
Comment 47 Jonathan Kew (:jfkthame) 2011-07-22 06:54:59 PDT
Created attachment 547682 [details] [diff] [review]
pt 11.2 - test for Russian patterns
Comment 48 Jonathan Kew (:jfkthame) 2011-07-22 07:51:52 PDT
Created attachment 547688 [details] [diff] [review]
pt 12.1 - patterns for Norwegian

There are two versions of Norwegian, "nb" (Bokmål) and "nn" (Nynorsk). The macrolanguage "no" is aliased to "nb" on the grounds that this is the more widely used written form.
Comment 49 Jonathan Kew (:jfkthame) 2011-07-22 07:55:00 PDT
Comment on attachment 547688 [details] [diff] [review]
pt 12.1 - patterns for Norwegian

Also tagging Gerv for feedback, just to double-check the licensing is OK. The original files include the statement:

% Copyright (C) 2007 Karl Ove Hufthammer.
% Copying and distribution of this file, with or without modification,
% are permitted in any medium without royalty, provided the copyright
% notice and this notice are preserved.

which seems pretty clear.
Comment 50 Jonathan Kew (:jfkthame) 2011-07-22 07:56:04 PDT
Created attachment 547689 [details] [diff] [review]
pt 12.2 - tests for Norwegian
Comment 51 Gervase Markham [:gerv] 2011-07-22 09:18:14 PDT
French: OK.
Norwegian: OK.

For both of these, leave the license the same as now. Unlike the LPPL, there is no need to make a change.

UK English: less clear. It depends what it means by "modifications may be made for private purposes" and then talking about distributing them! Do you have precedent for free software redistribution (e.g. by Debian)?

Gerv
Comment 52 Jonathan Kew (:jfkthame) 2011-07-22 09:55:40 PDT
(In reply to comment #51)
> French: OK.
> Norwegian: OK.
> 
> For both of these, leave the license the same as now. Unlike the LPPL, there
> is no need to make a change.

OK, thanks.

> UK English: less clear. It depends what it means by "modifications may be
> made for private purposes" and then talking about distributing them!

The only logical interpretation I can come up with (aside from that they're trying to discourage tampering in general) is that for private purposes, the file could be modified "in-place" so as to directly alter the behavior of the overall system (i.e. without a renaming requirement); whereas if you decide to distribute something that is modified, you MUST rename/document/etc so as to avoid the possibility of confusion with the canonical version.

> Do you
> have precedent for free software redistribution (e.g. by Debian)?

I don't think Debian distributes this, but AFAIK their primary reason is that the "source" (the OUP list of hyphenated words from which the patterns were derived) is not available, rather than a licensing concern.

OTOH, OpenOffice distributes a derived version (see http://wiki.services.openoffice.org/wiki/Dictionaries#English_.28AU.2CCA.2CGB.2CNZ.2CUS.2CZA.29) that is relicensed under LGPL.
Comment 53 Jonathan Kew (:jfkthame) 2011-07-22 10:17:58 PDT
Oh, and there's the most obvious free software distribution of this stuff - the TeX Live collection. See http://tug.org/texlive/copying.html for the top-level summary of their license conditions.

MikTeX also distributes it, and so apparently considers it free software (see http://miktex.org/copying).
Comment 54 Jonathan Kew (:jfkthame) 2011-07-22 10:48:14 PDT
Aha, I found some archived discussion of the UK hyphenation patterns and their (strange) license; see http://forum.soft32.com/linux/Strange-license-ukhyphen-ftopict290515.html. The main point of debate there seems to centre around the renaming requirement for modified versions, and in particular the inclusion of a list of prohibited "similar" filenames.
Comment 56 Jonathan Kew (:jfkthame) 2011-07-22 14:12:13 PDT
Created attachment 547798 [details] [diff] [review]
pt 13.1 - patterns for Lithuanian

In this case, the license documentation in hyph-utf8 was less complete, but checking the Lithuanian TeX package that is upstream of that repackaging confirms that the patterns are LPPL-licensed.
Comment 57 Jonathan Kew (:jfkthame) 2011-07-22 14:13:14 PDT
Created attachment 547799 [details] [diff] [review]
pt 13.2 - test for Lithuanian patterns
Comment 59 Jonathan Kew (:jfkthame) 2011-07-23 01:43:15 PDT
Created attachment 547907 [details] [diff] [review]
pt 14.1 - patterns for Finnish
Comment 60 Jonathan Kew (:jfkthame) 2011-07-23 01:44:18 PDT
Created attachment 547908 [details] [diff] [review]
pt 14.2 - test for Finnish patterns
Comment 61 Simon Montagu :smontagu 2011-07-24 09:20:10 PDT
Comment on attachment 547632 [details] [diff] [review]
pt 9.1 - patterns for French

Review of attachment 547632 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me
Comment 62 Simon Montagu :smontagu 2011-07-24 09:24:14 PDT
Comment on attachment 547633 [details] [diff] [review]
pt 9.2 - test for French hyphenation

Review of attachment 547633 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/reftests/text/reftest.list
@@ +126,4 @@
>  == auto-hyphenation-mn-1.html auto-hyphenation-mn-1-ref.html
>  == auto-hyphenation-sh-1.html auto-hyphenation-sh-1-ref.html
>  == auto-hyphenation-sr-1.html auto-hyphenation-sr-1-ref.html
> +== auto-hyphenation-fr-1.html auto-hyphenation-fr-1-ref.html

Nit: life will be easier in the future if you alphabetize the list of tests. (Ditto the prefs in all.js, if they aren't already)
Comment 63 Simon Montagu :smontagu 2011-07-24 09:25:46 PDT
Comment on attachment 547649 [details] [diff] [review]
pt 10.1 - British English patterns

Review of attachment 547649 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 64 Simon Montagu :smontagu 2011-07-24 09:26:45 PDT
Comment on attachment 547677 [details] [diff] [review]
pt 10.2 - tests for British English patterns

Review of attachment 547677 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 65 Simon Montagu :smontagu 2011-07-24 09:27:38 PDT
Comment on attachment 547680 [details] [diff] [review]
pt 11.1 - hyphenation patterns for Russian

Review of attachment 547680 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 66 Simon Montagu :smontagu 2011-07-24 09:28:21 PDT
Comment on attachment 547682 [details] [diff] [review]
pt 11.2 - test for Russian patterns

Review of attachment 547682 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 67 Simon Montagu :smontagu 2011-07-24 09:45:57 PDT
Comment on attachment 547688 [details] [diff] [review]
pt 12.1 - patterns for Norwegian

Review of attachment 547688 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/src/init/all.js
@@ +1149,5 @@
>  pref("intl.hyphenation-alias.sr-*", "sh");
>  pref("intl.hyphenation-alias.bs-*", "sh");
>  
> +// Norwegian has two forms, Bokmål and Nynorsk, with "no" as a macrolanguage encompassing both.
> +// For "no", we'll alias to "nb" (Bokmål) as that is the more widely used written form.

This one made me wonder about a general question: are these prefs overridable by l10ns? On the one hand, I suppose that will let the fingerprinting genie back out of the bottle, but on the other hand since we do have separate nb and nn l10n, I should think the nn version won't be happy with this default.
Comment 68 Simon Montagu :smontagu 2011-07-24 09:49:29 PDT
Comment on attachment 547689 [details] [diff] [review]
pt 12.2 - tests for Norwegian

Review of attachment 547689 [details] [diff] [review]:
-----------------------------------------------------------------

Do you want to add a test for the examples that appear in the licence files as different in nb and nn (attende and betre)?
Comment 69 Simon Montagu :smontagu 2011-07-24 09:50:35 PDT
Comment on attachment 547798 [details] [diff] [review]
pt 13.1 - patterns for Lithuanian

Review of attachment 547798 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 70 Simon Montagu :smontagu 2011-07-24 09:51:18 PDT
Comment on attachment 547799 [details] [diff] [review]
pt 13.2 - test for Lithuanian patterns

Review of attachment 547799 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 71 Simon Montagu :smontagu 2011-07-24 09:52:46 PDT
Comment on attachment 547907 [details] [diff] [review]
pt 14.1 - patterns for Finnish

Review of attachment 547907 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 72 Simon Montagu :smontagu 2011-07-24 09:53:18 PDT
Comment on attachment 547908 [details] [diff] [review]
pt 14.2 - test for Finnish patterns

Review of attachment 547908 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 73 Jonathan Kew (:jfkthame) 2011-07-24 14:03:15 PDT
(In reply to comment #68)
> Do you want to add a test for the examples that appear in the licence files
> as different in nb and nn (attende and betre)?

Sure, I'll add those to the test files, with the appropriate -ref version for each.

(In reply to comment #67)
> This one made me wonder about a general question: are these prefs
> overridable by l10ns?

We discussed this a bit back when hyphenation support was initially being added. As far as I understood, I don't think there is currently any (easy?) way for l10n to override default prefs. I think this is something we ought to support for a number of reasons, not just hyphenation defaults - e.g. it ought to be possible for localizers to customize the default font settings, too.

Besides Norwegian, another example that might deserve l10n treatment is "en"; currently, this maps to "en-US", but if we add en-GB patterns then it would seem reasonable for the en-GB version to change the mapping for "en".

Probably worth opening a new bug on this specific topic, and discussing again with Pike & others. But perhaps we should allow the BCP47 dust to settle before worrying too much about this; I'm hoping we'll find ourselves with nice new BCP47-based lang/locale-matching APIs that can supersede and improve on the current hyph-alias prefs.
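
For readers following the alias discussion, here is a minimal sketch of how hyph-alias lookup with "-*" wildcards could behave. It is an illustration only, not Gecko's actual code: the function name resolveHyphenationLocale, the `available` set, and the fallback order are assumptions, and the alias table just mirrors the mappings mentioned in this bug ("sr-*" and "bs-*" to "sh", "no" to "nb", "en" to "en-US").

```javascript
// Hypothetical alias table, mirroring the intl.hyphenation-alias.* prefs
// mentioned in this bug. Keys and values are lower-cased for comparison.
const aliases = new Map([
  ["sr-*", "sh"],
  ["bs-*", "sh"],
  ["no", "nb"],
  ["en", "en-us"],
]);

// Hypothetical set of locales for which pattern files are installed.
const available = new Set(["sh", "nb", "nn", "en-us"]);

// Resolve a language tag to a locale that has patterns, or null if none.
// Assumed fallback order: exact match, then alias, then replace the last
// subtag with "-*" and try again (e.g. "sr-latn-rs" -> "sr-latn-*" -> "sr-*").
function resolveHyphenationLocale(locale, aliases, available) {
  let tag = locale.toLowerCase();
  const seen = new Set(); // guards against alias loops
  while (!seen.has(tag)) {
    seen.add(tag);
    if (available.has(tag)) {
      return tag;
    }
    if (aliases.has(tag)) {
      tag = aliases.get(tag).toLowerCase();
      continue;
    }
    // Drop a trailing "-*" (if any), then wildcard the last remaining subtag.
    const base = tag.endsWith("-*") ? tag.slice(0, -2) : tag;
    const i = base.lastIndexOf("-");
    if (i < 1) {
      return null; // nothing left to generalize
    }
    tag = base.slice(0, i) + "-*";
  }
  return null;
}

console.log(resolveHyphenationLocale("no", aliases, available));
console.log(resolveHyphenationLocale("sr-Latn-RS", aliases, available));
```

With a table like this, an nn or en-GB localization would only need to change the single "no" or "en" entry, which is the kind of l10n override being discussed above.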
Comment 74 Jonathan Kew (:jfkthame) 2011-07-24 14:05:48 PDT
Gerv: any further thoughts re the UK English question (see comments 51-52)?
Comment 75 leon fan 2011-07-25 00:55:29 PDT
Have we already decided how to deliver these hyphenation resources to end users? Will they all be installed by default, will users have to download them as an add-on, or will they be downloaded automatically when needed?
Comment 76 Jonathan Kew (:jfkthame) 2011-07-26 03:49:43 PDT
Created attachment 548406 [details] [diff] [review]
pt 12.2 - tests for Norwegian (revised)

Added the specific example words that differ between nb/nn locales.
Comment 77 Jonathan Kew (:jfkthame) 2011-07-26 03:53:37 PDT
(In reply to comment #75)
> Have we already decided how to deliver these hyphenation resources to end
> users? Will they all be installed by default, will users have to download
> them as an add-on, or will they be downloaded automatically when needed?

The current approach is to install them all, so as to ensure consistent behavior for everyone. We are also considering other options that we could use if this becomes a problem, either due to size or because of licensing constraints on some of the resources we'd like to offer.
Comment 78 Simon Montagu :smontagu 2011-07-26 04:05:10 PDT
Comment on attachment 548406 [details] [diff] [review]
pt 12.2 - tests for Norwegian (revised)

Review of attachment 548406 [details] [diff] [review]:
-----------------------------------------------------------------
Comment 81 Jonathan Kew (:jfkthame) 2011-08-17 04:29:43 PDT
Created attachment 553729 [details] [diff] [review]
pt 15.1 - hyphenation patterns for Hungarian

The author of the Hungarian patterns has kindly relicensed them under the Mozilla tri-license terms (they were formerly GPL-only), so we can now include these.
Comment 82 Jonathan Kew (:jfkthame) 2011-08-17 04:32:48 PDT
Created attachment 553730 [details] [diff] [review]
pt 15.2 - reftest for Hungarian hyphenation

Test for the Hungarian patterns.

Also, just realized that I forgot the hyph-alias entry for "hu-*" in the pt 15.1 patch; this will be added before landing.
Comment 83 Jonathan Kew (:jfkthame) 2011-08-17 13:38:09 PDT
Created attachment 553887 [details] [diff] [review]
pt 16.1 - hyphenation patterns for Italian
Comment 84 Jonathan Kew (:jfkthame) 2011-08-17 13:38:51 PDT
Created attachment 553888 [details] [diff] [review]
pt 16.2 - test for Italian patterns
Comment 85 Jonathan Kew (:jfkthame) 2011-08-17 13:39:42 PDT
Created attachment 553889 [details] [diff] [review]
pt 17.1 - hyphenation patterns for Turkish
Comment 86 Jonathan Kew (:jfkthame) 2011-08-17 13:40:28 PDT
Created attachment 553890 [details] [diff] [review]
pt 17.2 - test for Turkish patterns
Comment 87 Gervase Markham [:gerv] 2011-08-19 06:00:13 PDT
Comment on attachment 547649 [details] [diff] [review]
pt 10.1 - British English patterns

Need more info on precedent for distribution of the UK English patterns, whose licensing is unclear (e.g. from the Debian project).

Gerv
Comment 88 Mojca Miklavec 2011-08-19 06:20:08 PDT
Original files are at:

http://www.ctan.org/pkg/ukhyph
http://mirrors.ctan.org/language/hyphenation/ukhyphen.tex

But those files are not guaranteed to stay.

If the licence needs to be resolved (there is currently just a free-text licence description), I would suggest contacting the author and also updating the hyph-utf8 repository.
Comment 89 Eric Shepherd [:sheppy] 2011-08-24 13:07:05 PDT
Added a list of added languages to the following page. This is everything that's in the Aurora build as of today.

https://developer.mozilla.org/en/CSS/hyphens
Comment 90 Mojca Miklavec 2011-08-24 13:27:44 PDT
Thanks a lot for this list.

hsb = Upper Sorbian
kmr = Kurmanji (Northern Kurdish)
de-CH = Swiss German, Traditional Orthography (Czech German - are you joking? :)
de-1901 = German, Traditional Orthography
de-1996 = German, Reformed Orthography

I think that Bokmål and Nynorsk could be written with uppercase. I slightly prefer Slovenian to Slovene as adjective/for language name, but this can be an infinite debate/it's a cosmetic issue anyway.
Comment 91 Eric Shepherd [:sheppy] 2011-08-24 13:29:51 PDT
Fixed those. That "Czech German" thing, I dunno where that came from. Erp.
Comment 93 Gordon P. Hemsley [:GPHemsley] 2011-08-25 12:59:41 PDT
(In reply to Mojca Miklavec from comment #90)
> hsb = Upper Sorbian
> kmr = Kurmanji (Northern Kurdish)
> de-CH = Swiss German, Traditional Orthography (Czech German - are you
> joking? :)
> de-1901 = German, Traditional Orthography
> de-1996 = German, Reformed Orthography
> 
> I think that Bokmål and Nynorsk could be written with uppercase. I slightly
> prefer Slovenian to Slovene as adjective/for language name, but this can be
> an infinite debate/it's a cosmetic issue anyway.

I should note that all of these names should eventually be generated by bug 666662.
Comment 94 Ed Morley [:emorley] 2011-08-25 18:41:20 PDT
Landed for mozilla9:
http://hg.mozilla.org/mozilla-central/rev/13e47b981869 (Hungarian)
http://hg.mozilla.org/mozilla-central/rev/f381ae05803a (hu-test)
http://hg.mozilla.org/mozilla-central/rev/002abea8ccb9 (Italian)
http://hg.mozilla.org/mozilla-central/rev/b39232627a54 (it-test)
http://hg.mozilla.org/mozilla-central/rev/53e0de790071 (Turkish)
http://hg.mozilla.org/mozilla-central/rev/079f4e4a1f4b (tr-test)

Leaving open as it's not clear what's left to do here; please close if appropriate. Thanks! :-)
Comment 95 Eric Shepherd [:sheppy] 2011-08-26 05:23:04 PDT
Added hu, it, and tr to the documentation, flagged as being in Firefox 9.
Comment 96 Pablo Rodríguez 2011-09-11 04:41:11 PDT
Would it be possible to add hyphenation for Greek (el-EL) and ancient Greek (grc)?
Comment 97 Jonathan Kew (:jfkthame) 2011-09-11 07:48:49 PDT
(In reply to Pablo Rodríguez from comment #96)
> Would it be possible to add hyphenation for Greek (el-EL) and ancient Greek
> (grc)?

This depends on the availability of hyphenation patterns under licensing terms that allow us to ship them (or rather, a modified form) in Firefox. There are Greek patterns in the hyph-utf8 package in TeX Live, but I don't see any obvious statement of their license terms, so we'd need this to be clarified before trying to use them.
Comment 98 Pablo Rodríguez 2011-09-11 13:21:32 PDT
(In reply to Jonathan Kew from comment #97)
> (In reply to Pablo Rodríguez from comment #96)
> > Would it be possible to add hyphenation for Greek (el-EL) and ancient Greek
> > (grc)?
> 
> There are Greek patterns in the hyph-utf8 package in TeX Live, but I don't
> see any obvious statement of their license terms, so we'd need this to be
> clarified before trying to use them.

CTAN contains the same patterns (http://www.ctan.org/tex-archive/language/hyphenation/elhyphen) and states that they are released under the LaTeX Project Public License (http://mirrors.ctan.org/language/hyphenation/elhyphen/copyrite.txt).

I guess this should be OK.

BTW, patterns for polytonic Greek are also interesting but there is no standard tag for it.

Thanks for your help,


Pablo
Comment 99 Pablo Rodríguez 2011-09-18 03:34:40 PDT
Jonathan,

sorry, but since I got no reply, I don't know whether the suggested patterns for hyphenating ancient, polytonic and monotonic Greek are fine.

Are the patterns at http://www.ctan.org/tex-archive/language/hyphenation/elhyphen to be included in FF/TB for hyphenation?

Thanks for your help,


Pablo
Comment 100 Mojca Miklavec 2011-09-18 03:59:12 PDT
No, the files in elhyphen are for pdfTeX in some weird font encoding. You should use the patterns from hyph-utf8. They are the same patterns, but in proper UTF-8 encoding.

The licence statements were slightly updated on 13th September. See http://tug.org/svn/texhyphen?view=revision&revision=592. That version should be fine for inclusion, at least that was the intention.
Comment 101 Pablo Rodríguez 2011-10-02 09:38:02 PDT
(In reply to Mojca Miklavec from comment #100)
> No, the files in elhyphen are for pdfTeX in some weird font encoding. You
> should use the patterns from hyph-utf8. They are the same patterns, but in
> proper UTF-8 encoding.
> 
> The licence statements were slightly updated on 13th September. See
> http://tug.org/svn/texhyphen?view=revision&revision=592. That version should
> be fine for inclusion, at least that was the intention.

Thanks for the comment, Mojca.

I hope the proper Greek hyphenation dictionaries (at least for monotonic Greek) will be included in Firefox 8 (right now they are missing from https://developer.mozilla.org/en/CSS/hyphens#Gecko_notes).

Thanks again,


Pablo
Comment 102 Santhosh Thottingal 2011-10-09 00:05:57 PDT
Would it be possible to add hyphenation for some Indic languages? The patterns are present here: http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/

I am requesting for these patterns hyph-[hi, sa, ml, ta, gu, as, bn,  or, te,  kn , mr, pa]

Thanks
Comment 103 Jonathan Kew (:jfkthame) 2011-10-09 01:41:49 PDT
(In reply to Santhosh Thottingal from comment #102)
> Would it be possible to add hyphenation for some Indic languages? The
> patterns are present here:
> http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/
> tex/
> 
> I am requesting for these patterns hyph-[hi, sa, ml, ta, gu, as, bn,  or,
> te,  kn , mr, pa]

We can't simply import those patterns, due to licensing issues - they're distributed under the LGPL only. However, I hope to create and add patterns for Indic languages sometime soon.

(In reply to Pablo Rodríguez from comment #101)
> (In reply to Mojca Miklavec from comment #100)
> > No, the files in elhyphen are for pdfTeX in some weird font encoding. You
> > should use the patterns from hyph-utf8. They are the same patterns, but in
> > proper UTF-8 encoding.
> > 
> > The licence statements were slightly updated on 13th September. See
> > http://tug.org/svn/texhyphen?view=revision&revision=592. That version should
> > be fine for inclusion, at least that was the intention.
> 
> Thanks for the comment, Mojca.
> 
> I hope the proper Greek hyphenation dictionaries (at least for monotonic
> Greek) will be included in Firefox 8 (right now they are missing from
> https://developer.mozilla.org/en/CSS/hyphens#Gecko_notes).

I'm intending to return to this soon, but have been busy with other issues, as well as waiting on further clarification of some licensing questions (not just specific to Greek).
Comment 104 Marek Stępień [:marcoos, inactive] 2011-11-22 15:00:18 PST
There is a TeX hyphenation file for Polish:

ftp://ftp.gust.org.pl/TeX/language/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-pl.tex

The license of this file says:

%    Do with this file whatever needs to be done in future for the sake of
%    "a better world" as long as you respect the copyright of original file.
%    If you're the original author of patterns or taking over a new revolution,
%    plese remove all of the TUG comments & credits that we added here -
%    you are the Queen / the King, we are only the servants.

OpenOffice uses it and calls the above license "public domain" (which it probably really isn't) - http://extensions.services.openoffice.org/project/pl-dict - this is more in line with "WTF Public License", unless you treat "a better world" literally, which brings us close to Crockford vs IBM: http://www.youtube.com/watch?v=-hCimLnIsDA :)
Comment 105 Marek Stępień [:marcoos, inactive] 2011-11-22 15:15:31 PST
Reading more into it, this seems to be a community-maintained former abandonware assumed to be open source. Ugh. :(
Comment 106 Jonathan Kew (:jfkthame) 2011-11-22 23:08:52 PST
That material dates back to the days before people obsessed so much over explicit licensing terms when stuff was made available "freely". The original authors are listed (just a bit higher up in the file). I've been in contact with Jacko and Marek regarding getting the license clarified so that we'd be able to use it, and they'd be fine with that, but unfortunately we don't have contact info for Hanna, and so it's difficult to move forward. There's a plan to create a new (and explicitly-licensed) set of Polish patterns, but that will take some time.
Comment 107 Pablo Rodríguez 2011-12-02 12:34:57 PST
(In reply to Jonathan Kew (:jfkthame) from comment #103)
> (In reply to Pablo Rodríguez from comment #101)
> [...]
> I'm intending to return to this soon, but have been busy with other issues,
> as well as waiting on further clarification of some licensing questions (not
> just specific to Greek).

Is there any chance that Firefox 9 will have these hyphenation dictionaries?

Many thanks,


Pablo
Comment 108 Pablo Rodríguez 2011-12-02 12:59:50 PST
Another issue that I think would be interesting to solve alongside hyphenation is URL breaking.

Of course, URL breaking shouldn't use hyphens and probably a good method would be the default one used by the LaTeX package url.sty.

Would it be possible to enable this feature for URLs when hyphenation is active?
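
To make the idea concrete, here is a rough sketch of url.sty-style breaking done at the content level: it marks break opportunities after runs of URL punctuation with a zero-width space (U+200B), so a line can wrap there without a hyphen being shown. This is not something Firefox does and not url.sty's exact rule set; the function name addUrlBreakOpportunities and the choice of break characters are assumptions for illustration.

```javascript
// Zero-width space: an invisible character that permits a line break.
const ZWSP = "\u200B";

// Insert a break opportunity after each run of URL punctuation, provided
// the run is followed by an ordinary character (so "://" stays together
// and nothing is added at the very end of the URL).
// The set of break characters is an illustrative choice.
function addUrlBreakOpportunities(url) {
  return url.replace(/([\/.?&=_#:]+)(?=[^\/.?&=_#:])/g, "$1" + ZWSP);
}

const example = addUrlBreakOpportunities("http://www.ctan.org/pkg/ukhyph");
// Show the break points visibly by replacing U+200B with "|".
console.log(example.split(ZWSP).join("|"));
```

One caveat of this approach is that the inserted characters become part of the text, so a copied URL would contain them; that is a reason to prefer handling this in the layout engine, as the question suggests.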

Thanks for your help,


Pablo
Comment 109 Pablo Rodríguez 2012-06-10 09:38:58 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #103)
> (In reply to Pablo Rodríguez from comment #101)
> > I hope the proper Greek hyphenation dictionaries (at least for monotonic
> > Greek) will be included in Firefox 8 (right now they are missing from
> > https://developer.mozilla.org/en/CSS/hyphens#Gecko_notes).
> 
> I'm intending to return to this soon, but have been busy with other issues,
> as well as waiting on further clarification of some licensing questions (not
> just specific to Greek).

Has this issue improved since last time?

Many thanks for your help,


Pablo
Comment 110 Marcis Gasuns 2013-11-15 12:40:39 PST
Issues for Indian languages seem to get stuck. Too bad,

Great idea anyway,

Marcis
Comment 111 Santhosh Thottingal 2013-11-21 22:04:42 PST
For Indic languages, I am the copyright holder (all licensed under LGPL). Please let me know what you need to get the patterns included. No need to get stuck because of licensing issues here :).
Comment 112 Gordon P. Hemsley [:GPHemsley] 2014-01-02 20:29:29 PST
I notice that this has a target milestone of mozilla8, and a lot of changes were landed during that cycle, yet this is still open.

Jonathan: What is left to be done here? Might it be better off spun into one or more follow-up bugs? (The bug summary itself is somewhat vague about when this might be considered "fixed".)
Comment 113 Jonathan Kew (:jfkthame) 2014-01-03 06:59:59 PST
This basically stalled, pending clarification of some licensing questions.

(Note that LGPL code, for example, was -not- being accepted into Firefox at the time this was active. The policy now states that "it may be permissible to import Third Party Code under the LGPL (version 2.0 upwards) to be Product Code if it's a clearly-demarcated library and will be dynamically linked into the product", which might be OK for hyphenation resources, but we should verify that with Gerv.)

We did add resources for a bunch of languages here, so given that we lost momentum, perhaps it'd be best to resolve this and suggest people file new bugs for specific additional resources, and make them dependencies of the general "Enhance hyphenation" bug 656750.
Comment 114 Jonathan Kew (:jfkthame) 2014-01-03 07:02:16 PST
Pablo, Santhosh, Marcis, Marek, etc: please feel free to file new bugs for any specific languages where suitably-licensed resources are available, and we'll try to get them added. Sorry it's not been at the top of anyone's priority list to keep pushing this forward.
Comment 115 Pablo Rodríguez 2014-01-06 12:16:34 PST
Many thanks for the reply, Jonathan.

Sorry for the obvious question, but I’m afraid that I’m not familiar with Firefox licensing: which are the allowed licenses for hyphenation dictionaries?

Best wishes to all for the new year 2014,

Pablo
Comment 116 Pablo Rodríguez 2014-01-06 12:18:55 PST
Sorry, I forgot to mention: is the information available at https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens#Notes_on_supported_languages up to date?
Comment 117 Jean-Yves Perrier [:teoli] 2014-01-07 00:34:06 PST
It is a wiki, so it is trivial to update it. If new hyphenation dictionaries are added, you can also notify the doc team by setting the dev-doc-needed keyword on the relevant bug (there is no guarantee about how long you'll have to wait, but documenting support for a new hyphenation dictionary is a trivial change and should be done by the release of the relevant Firefox).

Note that new bugs will also help us in keeping the doc updated.
