Bugzilla

Assignee

Comment 3

•

13 years ago

Attached patch part 1 - implement -moz-hyphens property in CSS, based on CSS3 Text draft (obsolete) — Details — Splinter Review

As a preliminary step, this implements a "-moz-hyphens" property based on "hyphens" in the CSS3 Text draft, with values none/manual/auto/all. This will make it possible to control the use of hyphenation from CSS (once an actual hyphenation engine is in place).

Assignee: nobody → jfkthame

Assignee

Comment 4

•

13 years ago

Attached patch part 2 (WIP) - implement nsIHyphenator and nsIHyphenationManager components (obsolete) — Details — Splinter Review

The hyphenation code here is hyphen-2.5, with slight modifications. Currently, the actual hyphenation is done on 8-bit text, which means we have to convert to utf-8 in order to find hyphen positions; for better performance, we could perhaps adapt the core code to work directly on utf16 strings.

Assignee

Comment 5

•

13 years ago

Attached patch part 3 - provide the en-US hyphenation patterns (obsolete) — Details — Splinter Review

Assignee

Comment 6

•

13 years ago

Attached patch part 1 - implement -moz-hyphens property in CSS — Details — Splinter Review

Attachment #522769 - Attachment is obsolete: true

Attachment #522772 - Attachment is obsolete: true

Attachment #522773 - Attachment is obsolete: true

Attachment #523368 - Flags: review?

Assignee

Updated

•

13 years ago

Attachment #523368 - Flags: review? → review?(dbaron)

Assignee

Comment 7

•

13 years ago

Attached patch part 2 - import hyphenation code from hyphen-2.7.1 — Details — Splinter Review

This comes directly from the current (2.7.1) source available at http://sourceforge.net/projects/hunspell/files/Hyphen/. The code is MPL/LGPL-licensed, so there should be no licensing issues with using it in Gecko.

Attachment #523370 - Flags: review?(smontagu)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 8

•

13 years ago

Attached patch part 3 - import en-US hyphenation patterns from hyphen-2.7.1 (obsolete) — Details — Splinter Review

These are the US-English hyphenation patterns, also used by OpenOffice.org, I believe; we may want to look into sharing OOo files if they're installed, but for the time being it seems simplest to include this in the tree, like we do for the US-English spelling dictionary.

Attachment #523371 - Flags: review?(smontagu)

Comment 9

•

13 years ago

Comment on attachment 523368 [details] [diff] [review]
part 1 - implement -moz-hyphens property in CSS

>diff --git a/layout/style/nsCSSPropList.h b/layout/style/nsCSSPropList.h
>+CSS_PROP_TEXT(
>+    -moz-hyphens,
>+    _moz_hyphens,

Don't use the "_moz_" in the name of the constant (and fix nsComputedDOMStyle.cpp to match).

>+    MozHyphens,

use CSS_PROP_DOMPROP_PREFIXED(Hyphens) here instead, and change the ValueForMozHyphens() in nsRuleNode.cpp to ValueForHyphens().

Have you tested that NS_STYLE_HINT_REFLOW is sufficient for a CalcDifference hint?

I presume that you've run the layout/style mochitests and they still pass.

r=dbaron with that

Attachment #523368 - Flags: review?(dbaron) → review+

Assignee

Comment 10

•

13 years ago

Attached patch part 4 - implement nsIHyphenationManager and nsIHyphenator components (obsolete) — Details — Splinter Review

This provides XPCOM components to implement hyphenation for a given locale (based on a file of hyphenation patterns), and to manage the hyphenators, looking them up by language/locale codes.

Currently, the hyphenation patterns for a given locale are never freed (until shutdown) once they've been loaded. If memory usage becomes a concern, we could make the hyphenationManager release them if they haven't been used for a certain amount of time.

Attachment #523373 - Flags: review?(smontagu)

Assignee

Comment 11

•

13 years ago

Attached patch part 5 - support none/manual/auto values for -moz-hyphens in layout (obsolete) — Details — Splinter Review

Implement automatic hyphenation support in layout. This is an "opt-in" feature; hyphenation will only be done if the CSS calls for "-moz-hyphens: auto", AND if the data is tagged as English.

(If additional patterns are added, then of course additional languages will be supported for hyphenation. We might want to make patterns available as part of language-packs for users to add as desired.)

Attachment #523374 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 12

•

13 years ago

Attached patch part 6 - reftests for automatic hyphenation (obsolete) — Details — Splinter Review

Attachment #523375 - Flags: review?(roc)

Comment 13

•

13 years ago

Comment on attachment 523373 [details] [diff] [review]
part 4 - implement nsIHyphenationManager and nsIHyphenator components

Why do we need to use XPCOM here? Now that libxul is a required part of the build, these can just be regular C++ classes nsHyphenator and nsHyphenationManager.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 14

•

13 years ago

Also, if insertHyphens is just for testing, I think we should do without it. We can fake it in the test code by laying out at different widths and seeing what boxes are created, using getClientRects.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

13 years ago

I see that the hyphenator defines the notion of "a word". What does that mean exactly? We have words for selection/caret movement, and words for CSS capitalization, or is it something else? (We also have line break opportunities which are similar)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 16

•

13 years ago

Let's name mHyphenatorInited mHyphenatorInitialized.

+      mFrame->GetStyleVisibility()->mLanguage->ToUTF8String(langString);
+      hyphSvc->GetHyphenator(langString.get(), getter_AddRefs(mHyphenator));

Why not make GetHyphenator take an nsIAtom*?

+    nsAutoTArray<PRUint8,200> hyphens;
+    PRBool inhibitHyphenInInitialWord =
+      startOriginalOffset > 0 &&
+      mFrag->CharAt(startOriginalOffset - 1) == CH_SHY;
+
+    if (NS_SUCCEEDED(mHyphenator->Hyphenate(*text, &hyphens,
+                                            inhibitHyphenInInitialWord)))

Seems like 'hyphens' should be an array of PRPackedBool.

Hmm. Actually I think there's a big problem here, which is that we should be looking at text runs that cross frame boundaries. In fact, I think we probably should be computing hyphenation breaks in nsLineBreaker. Probably the SHY breaks should be computed there too.

Carlo Alberto Ferraris

Comment 17

•

13 years ago

(In reply to comment #11)
> We might want to make patterns available as part of
> language-packs for users to add as desired.
Yes, please do.

Assignee

Comment 18

•

13 years ago

(In reply to comment #13)
> Comment on attachment 523373 [details] [diff] [review]
> part 4 - implement nsIHyphenationManager and nsIHyphenator components
> 
> Why do we need to use XPCOM here? Now that libxul is a required part of the
> build, these can just be regular C++ classes nsHyphenator and
> nsHyphenationManager.

(In reply to comment #14)
> Also, if insertHyphens is just for testing, I think we should do without it. We
> can fake it in the test code by laying out at different widths and seeing what
> boxes are created, using getClientRects.

Does XPCOM give people more flexibility to use (and/or replace) the component, e.g. from JS? For example, someone might want to implement alternative hyphenation components, if the libhyphen algorithm isn't ideal for all languages, or use insertHyphens from JS to find most hyphen positions but then massage the results before replacing text in the DOM.

I'd be OK with a simple C++ implementation, but providing XPCOM interfaces here seemed like the "Gecko-ish" way to do things.

Assignee

Comment 19

•

13 years ago

(In reply to comment #15)
> I see that the hyphenator defines the notion of "a word". What does that mean
> exactly? We have words for selection/caret movement, and words for CSS
> capitalization, or is it something else? (We also have line break opportunities
> which are similar)

(There's also the spell-checker, which has a notion of "word" as the unit to be checked.)

For simplicity, I went with a contiguous sequence of characters with the Unicode category "letter" or "combining mark", as that's what the hyphenation patterns are typically designed to handle. It's possible that we could take advantage of existing code for selection (not caret movement, which seems to include trailing punctuation with the "word", and certainly not line-breaking).

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 20

•

13 years ago

(In reply to comment #16)
> Why not make GetHyphenator take an nsIAtom*?

A string seemed more JS-friendly, but if we decide not to expose this to JS then atoms would be fine, I guess.

> Hmm. Actually I think there's a big problem here, which is that we should be
> looking at text runs that cross frame boundaries. 

That's true, in principle, but I'm not sure how important it is (if it'd be at the cost of significant extra complexity, for example). Cases where a single hyphenatable word crosses frame boundaries are probably not common, and when they do occur (e.g., the author is explicitly styling certain letters of the word, such as highlighting one syllable in bold), it may well be inappropriate to be hyphenating anyway as this is typically a case where the "word" is specifically in focus for some reason, and splitting it across lines then detracts from the intent.

If we can do this readily, I'm fine with it, but if it's going to be tricky, I don't think missing a few potential hyphenation points in such cases is a major issue; it's a limitation we could live with. Authors writing such stuff who _do_ want it hyphenated (which seems like it'll be very rare) could still use &shy; and thereby get explicit control of the possible breaks.

> In fact, I think we probably
> should be computing hyphenation breaks in nsLineBreaker. Probably the SHY
> breaks should be computed there too.

Can nsLineBreaker handle this? Offhand, I don't see anything that suggests it can mark breakpoints as requiring hyphen insertion if they're taken.

Comment 21

•

13 years ago

(In reply to comment #18)
> Does XPCOM give people more flexibility to use (and/or replace) the component,
> e.g. from JS? For example, someone might want to implement alternative
> hyphenation components, if the libhyphen algorithm isn't ideal for all
> languages, or use insertHyphens from JS to find most hyphen positions but then
> massage the results before replacing text in the DOM.
> 
> I'd be OK with a simple C++ implementation, but providing XPCOM interfaces here
> seemed like the "Gecko-ish" way to do things.

Those are bad old ways.

Who's going to replace the hyphenator with a JS implementation? An extension? Seems unlikely. We shouldn't take complexity and perf hits in our code for an extension point that is unlikely to be used. We should implement the simple thing and then if demand arises for an extension point, provide that as an option.

If Web content wants to discover hyphenation points, we need to create a Web API for that. XPCOM doesn't do it.

We may think it's not a major issue to look only within a frame, in practice, but I think it's fundamentally wrong and will probably bite us later. Fundamentally, wrapping arbitrary chunks of text in unstyled <spans> should not change layout. We go to a lot of effort to make that true, e.g. forming ligatures across element boundaries. I really think we should do this right.

nsLineBreaker can't report hyphen breaks right now. It would need to be extended to do that. But I think the line breaker is the logical place to do this; apart from having the text that we need, hyphenation is fundamentally about exposing line break opportunities.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 22

•

13 years ago

(In reply to comment #21)

> Who's going to replace the hyphenator with a JS implementation? An extension?
> Seems unlikely.

It may not be as unlikely as all that. There's already Hyphenator.js (http://code.google.com/p/hyphenator/), for example, showing that people have an interest in the topic. Our hyphenation would basically make that superfluous, but more sophisticated things could be built as refinements on top of our implementation if we expose it to JS - e.g, adding site-specific hyphenation additions/exceptions while still using our hyphenator for the bulk of the work.

Perhaps more likely that replacing the actual hyphenator with a JS implementation, I can imagine people wanting to _use_ the hyphenator from JS and then put the resulting soft-hyphenated text (after post-processing) into the document.

> We may think it's not a major issue to look only within a frame, in practice,
> but I think it's fundamentally wrong and will probably bite us later.
> Fundamentally, wrapping arbitrary chunks of text in unstyled <spans> should not
> change layout. We go to a lot of effort to make that true, e.g. forming
> ligatures across element boundaries. I really think we should do this right.
> 
> nsLineBreaker can't report hyphen breaks right now. It would need to be
> extended to do that. But I think the line breaker is the logical place to do
> this; apart from having the text that we need, hyphenation is fundamentally
> about exposing line break opportunities.

Fair enough; I'll have a look at nsLineBreaker and see what would be involved there.

Comment 23

•

13 years ago

(In reply to comment #22)
> It may not be as unlikely as all that. There's already Hyphenator.js
> (http://code.google.com/p/hyphenator/), for example, showing that people have
> an interest in the topic. Our hyphenation would basically make that
> superfluous, but more sophisticated things could be built as refinements on top
> of our implementation if we expose it to JS - e.g, adding site-specific
> hyphenation additions/exceptions while still using our hyphenator for the bulk
> of the work.

That could be useful for Web sites themselves, so if there's demand we should just create a proper Web-accessible API for that. And that means not going through an XPCOM service.

> Perhaps more likely that replacing the actual hyphenator with a JS
> implementation, I can imagine people wanting to _use_ the hyphenator from JS
> and then put the resulting soft-hyphenated text (after post-processing) into
> the document.

Ditto.

Basically the right approach is to use something fast and lightweight internally (i.e. a regular C++ class, not even using virtuals if we can help it), then if we want a JS API, we create a scriptable wrapper or extension point which uses XPCOM as our WebIDL to define the interface, but does not use the rest of XPCOM such as services.

> Fair enough; I'll have a look at nsLineBreaker and see what would be involved
> there.

Great, thanks.

Assignee

Updated

•

13 years ago

Keywords: dev-doc-needed

Simon Montagu :smontagu

Updated

•

13 years ago

Attachment #523370 - Flags: review?(smontagu) → review+

Simon Montagu :smontagu

Comment 24

•

13 years ago

Comment on attachment 523371 [details] [diff] [review]
part 3 - import en-US hyphenation patterns from hyphen-2.7.1

rs=me

Attachment #523371 - Flags: review?(smontagu) → review+

Assignee

Comment 25

•

13 years ago

Attached patch part 4 - implement nsHyphenationManager and nsHyphenator (obsolete) — Details — Splinter Review

Updated to implement hyphenation manager and hyphenator as simple C++ classes, without XPCOM interfaces.

(Also includes the Makefile changes needed to build the new files, and include the hyphenation patterns in the final package.)

Still to come: an update of part 5, to use the revised interface.

Attachment #523373 - Attachment is obsolete: true

Attachment #523374 - Attachment is obsolete: true

Attachment #525667 - Flags: review?(smontagu)

Attachment #523374 - Flags: review?(roc)

Attachment #523373 - Flags: review?(smontagu)

Assignee

Comment 26

•

13 years ago

Attached patch part 5 - support -moz-hyphens:auto/manual/none in layout (obsolete) — Details — Splinter Review

This implements automatic hyphenation via the nsLineBreaker, so that it works across element boundaries.

Attachment #526972 - Flags: review?(roc)

Assignee

Comment 27

•

13 years ago

Attached patch part 6 - hyphenation reftests (updated) — Details — Splinter Review

Attachment #523375 - Attachment is obsolete: true

Attachment #526973 - Flags: review?(roc)

Attachment #523375 - Flags: review?(roc)

Assignee

Comment 28

•

13 years ago

Nominating for tracking-firefox6: hope to land this on m-c fairly soon, at which point it will be a significant new web-facing feature.

Needs dev-doc for authors re the -moz-hyphens property and the importance of proper language tagging to go with it. Also, localizers may wish to consider preparing and bundling hyphenation patterns for additional languages alongside the en-US patterns here.

tracking-firefox6: --- → ?

Carlo Alberto Ferraris

Comment 29

•

13 years ago

I have the it-IT hyphenation dictionary at hand. Should I attach it?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 30

•

13 years ago

(In reply to comment #29)
> I have the it-IT hyphenation dictionary at hand. Should I attach it?

Is that the dictionary from an OpenOffice or LibreOffice distribution? Or some other source?

I'm not sure what the best approach will be to keep track of additional dictionaries. I think this bug should be for the core support that becomes part of the base Firefox product; I'd suggest checking with Pike about how we should handle the resources for additional languages.

Carlo Alberto Ferraris

Comment 31

•

13 years ago

It's the latest official (open|libre)office3 it-IT hyphenation dictionary.

Comment 32

•

13 years ago

This is very nice!

+   * Possible states are 0 (no break), 1 (break allowed), 2 (hyphenation point)

Just use the FLAG_BREAK names here.

+    memset(breakState.Elements(), 0, length*sizeof(PRUint8));

Use FLAG_BREAK_TYPE_NONE here.

+    nsCAutoString lang;
+    mCurrentWordLangGroup->ToUTF8String(lang);
+    nsRefPtr<nsHyphenator> hyphenator =
+      nsHyphenationManager::Instance()->GetHyphenator(lang.get());

Why doesn't GetHyphenator just take the atom as a parameter?

+      if (mCurrentWordLangGroup && mCurrentWordLangGroup != aLangGroup) {
+        mCurrentWordContainsMixedLang = PR_TRUE;
+      } else {
+        mCurrentWordLangGroup = aLangGroup;
+      }

This chunk of code can be moved out to a shared function, say nsLineBreaker::UpdateCurrentWordLangGroup().

+          if (hyphenator) {
+            nsDependentSubstring string(aText + wordStart, aText + offset);
+            nsAutoTArray<PRPackedBool,200> hyphens;
+            if (NS_SUCCEEDED(hyphenator->Hyphenate(string, hyphens))) {
+              for (PRUint32 i = 0; i + 1 < string.Length(); ++i) {
+                if (hyphens[i]) {
+                  breakState[wordStart + i + 1] =
+                    gfxTextRun::CompressedGlyph::FLAG_BREAK_TYPE_HYPHEN;
+                }
+              }
+            }
+          }

I believe this chunk of code can be shared between AppendText and FlushCurrentWord.

I suspect it would be better to have PropertyProvider::GetHyphenationBreaks look up the hypenation state in the textrun itself, so that callers like BreakAndMeasureText and AddInlineMinWidthForFlow don't have to check for both kinds of hyphens.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 33

•

13 years ago

Are we going to ship en-US hyphenation dictionaries in every build? If not, these reftests are going to break.

Assignee

Comment 34

•

13 years ago

(In reply to comment #32)

I'll go through most of these comments tomorrow, but a couple of responses for now:

> Why doesn't GetHyphenator just take the atom as a parameter?

Because if it doesn't find an exact match, it munges the value to successively drop trailing subtag(s) and calls itself recursively to see if they exist. This means it's dealing with the value as a string and potentially calling itself with a new temporary string, and there's no reason to make it create temporary atoms in order to do that.

> I suspect it would be better to have PropertyProvider::GetHyphenationBreaks
> look up the hypenation state in the textrun itself, so that callers like
> BreakAndMeasureText and AddInlineMinWidthForFlow don't have to check for both
> kinds of hyphens.

Currently, the textrun doesn't explicitly know about &shy; breaks; we'd have to add something to record those during textrun construction in order to be able to do this. I'll take a look... the "obvious" idea of using nsLineBreaker to mark all potential line-breaks doesn't work, as the &shy; characters are filtered out of the text that's passed to the linebreaker.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 35

•

13 years ago

(In reply to comment #33)
> Are we going to ship en-US hyphenation dictionaries in every build? If not,
> these reftests are going to break.

I've been assuming "yes" (just like I think we always ship the en-US spelling dictionary), but if product drivers decide against it (e.g. prefer to ship _only_ a local-language dictionary in non-en-US builds) we'd need to figure out some other approach to testing.

Comment 36

•

13 years ago

(In reply to comment #34)
> Because if it doesn't find an exact match, it munges the value to successively
> drop trailing subtag(s) and calls itself recursively to see if they exist. This
> means it's dealing with the value as a string and potentially calling itself
> with a new temporary string, and there's no reason to make it create temporary
> atoms in order to do that.

Ah OK. But I think it should still take an nsIAtom* parameter and do the conversion internally. That would actually save a bit of code already, and later if we want to we could easily add an extra internal cache mapping nsIAtoms to their hyphenators.

> > I suspect it would be better to have PropertyProvider::GetHyphenationBreaks
> > look up the hypenation state in the textrun itself, so that callers like
> > BreakAndMeasureText and AddInlineMinWidthForFlow don't have to check for both
> > kinds of hyphens.
> 
> Currently, the textrun doesn't explicitly know about &shy; breaks; we'd have to
> add something to record those during textrun construction in order to be able
> to do this. I'll take a look... the "obvious" idea of using nsLineBreaker to
> mark all potential line-breaks doesn't work, as the &shy; characters are
> filtered out of the text that's passed to the linebreaker.

Right. What I was suggesting is modifying PropertyProvider::GetHyphenationBreaks in nsTextFrameThebes to look in the textrun for the auto-hyphenation breaks recorded by nsLineBreaker, and combining those results with the &shy; results that it already computes. That way we won't need to record &shy; breaks in the textrun.

Comment 37

•

13 years ago

Sorry if my comments are out of context, I'm going to go "tl;dr;" on this bug.

Hyphenation is tricky language data, as it's less about which language is the preferred one, but which the user would see. How does it determine which hyphenation dict to use if there are multiple?

If we ship it optionally, it should also be installable through AMO. Might be a good idea to do so either way.

How big is the data, if we'd ship multiple dicts?

Also, making them optional makes us less strict about having that data to be triple licensed.

I don't think I want the hyphenation data to be in intl/hyphenation/locales/en-US/. If we go for a single data file (and not ship en-US by default), it should probably be in intl/locales/en-US/hyphenation/, if we can't put it in toolkit/locales/en-US. We'd also need to see how to integrate that into the l10n repacks right. Which patch does the build magic?

If we ship a bunch independent of localization, I care less about the file location.

Assignee

Comment 38

•

13 years ago

(In reply to comment #37)
> Hyphenation is tricky language data, as it's less about which language is the
> preferred one, but which the user would see. How does it determine which
> hyphenation dict to use if there are multiple?

It will _only_ try to auto-hyphenate if the author specifically requests it with "-moz-hyphens:auto" (eventually "hyphens:auto" once CSS3 Text is sufficiently advanced), AND if the text is tagged with a language code that maps to an available hyphenation resource.

So in the case of a multilingual page, individual elements will be hyphenated according to the appropriate rules for their language (or not hyphenated at all, if we don't have rules for them).

> If we ship it optionally, it should also be installable through AMO. Might be a
> good idea to do so either way.
> 
> How big is the data, if we'd ship multiple dicts?

The en-US data is around 100K (but deflates by about 60%, so it would add about 40K to the installer, I assume). Other languages vary greatly; many of them are MUCH smaller (in the 5K-20K range, before compression), as they have simple rules that can be expressed in just a few patterns, though a handful may be larger than en-US.

Although many of the files are small, the cumulative size of shipping all 60+ languages that we could easily derive from the TeX world (for example) is probably more than we want to include in the default installer.

> Also, making them optional makes us less strict about having that data to be
> triple licensed.

Ok, that's good to know. Licensing may well vary among the available patterns.

> I don't think I want the hyphenation data to be in
> intl/hyphenation/locales/en-US/. If we go for a single data file (and not ship
> en-US by default), it should probably be in intl/locales/en-US/hyphenation/, if
> we can't put it in toolkit/locales/en-US. We'd also need to see how to
> integrate that into the l10n repacks right. Which patch does the build magic?

Currently, it's included in part 4 here. I'm happy to move things around to wherever you think they belong.

> If we ship a bunch independent of localization, I care less about the file
> location.

ISTM that it makes sense to treat hyphenation patterns similarly to spelling dictionaries. We have the en-US dictionary in the mozilla-central tree (in extensions/spellcheck/locales/en-US/hunspell), and it becomes part of a "standard" m-c build, in the {application}/dictionaries folder, but it looks like l10n repacks replace this with a localized dictionary. In the patches here, I was aiming to install the en-US hyphenation data in a "hyphenation" folder alongside "dictionaries", on the assumption that it could then be processed in the same way during repacks.

I don't see any intl/locales resources in the tree at the moment; there's intl/locale, but that is source code for locale-related functionality rather than locale-specific resources.

Comment 39

•

13 years ago

I favor intl/locales/en-US/hyphenation as it allows us to put other locale info that we may have at some point into the same packaging process. I'm thinking about support data for js i18n, in particular.

Re the bundling piece: Should this be part of browser or part of toolkit? We probably want to talk to the mobile guys about that. Not sure if TB cares, or SM.
The rest of the logic looks OK for repacks, the locales/Makefile.in would be the place to make a call on whether we ship en-US in all builds, or not. Might still make sense to do so.

How are we going to do 
pref("intl.hyphenation-alias.en", "en-us");
for l10n builds? Can't we use sane defaults here, and get by without requiring a pref?

Leaves me with two follow up tasks for you: Feedback on mobile on whether they want to ship with this, and feedback from add-ons, both toolkit code and AMO, if they can hook you up with an extension infrastructure for this data.

Assignee

Comment 40

•

13 years ago

(In reply to comment #39)
> I favor intl/locales/en-US/hyphenation ...

OK, I'll move it and refresh the patches accordingly.

> How are we going to do 
> pref("intl.hyphenation-alias.en", "en-us");
> for l10n builds? Can't we use sane defaults here, and get by without requiring
> a pref?

I don't see a way to handle this without a localizable pref or some similiar mechanism. In the en-GB localization, for example, we might ship both en-us and en-gb patterns (or a user might add the en-us ones, if we only ship en-gb there), but I expect we'd want to map "en" -> "en-gb" by default so that text simply tagged as lang="en", without a more specific locale, will be handled using British rules.

I also think it should be possible for users to customize this behavior if they wish, which the pref allows.

Do l10n repacks currently have a mechanism to modify default prefs at all? I thought that was part of the process, but may of course be mistaken.

Martin von Gagern

Comment 41

•

13 years ago

(In reply to comment #39)
> pref("intl.hyphenation-alias.en", "en-us");

Is this really specific to hyphenation? I'd consider the setting "please treat all unknown English locales as US English" to affect several aspects, like hyphenation, spell checking, and perhaps more to come. Having different settings for different facets might cause trouble, particularly in terms of config UI.

Assignee

Comment 42

•

13 years ago

(In reply to comment #39)
> and feedback from add-ons, both toolkit code and AMO,
> if they can hook you up with an extension infrastructure for this data.

Looking on AMO, I see we already have https://addons.mozilla.org/en-US/firefox/language-tools/ with a whole bunch of spell-check dictionaries, etc.; it seems like hyphenation patterns could be delivered in exactly the same way, no?

Assignee

Comment 43

•

13 years ago

Attached patch part 3 - import en-US hyphenation patterns - moved to intl/locale/en-US/hyphenation — Details — Splinter Review

This is the same as the previous "part 3", except that the patterns are placed in intl/locales/en-US/hyphenation as per comment 39.

Carrying forward r=smontagu; Simon, please speak up if you have any concerns with this placement.

Attachment #523371 - Attachment is obsolete: true

Attachment #527285 - Flags: review+

Assignee

Comment 44

•

13 years ago

Attached patch part 3.1 - build support to install hyphenation patterns in the product — Details — Splinter Review

This is the build changes to copy the hyphenation patterns into the {program}/hyphenation folder, split out from the rest of Part 4 for easier review. Pike, does this look like it does the right thing from your point of view?

Attachment #527288 - Flags: review?(l10n)

Assignee

Comment 45

•

13 years ago

Attached patch part 4 - implement nsHyphenationManager and nsHyphenator - updated to use nsIAtom instead of strings in the API (obsolete) — Details — Splinter Review

Updated to use Atom parameters, as per Roc's comments.

Attachment #525667 - Attachment is obsolete: true

Attachment #527290 - Flags: review?(smontagu)

Attachment #525667 - Flags: review?(smontagu)

Assignee

Comment 46

•

13 years ago

Comment on attachment 527290 [details] [diff] [review]
part 4 - implement nsHyphenationManager and nsHyphenator - updated to use nsIAtom instead of strings in the API

Argh, cancelling r? on part 4 as I just realized I attached a stale patch, so don't waste time on it. Update coming.

Attachment #527290 - Flags: review?(smontagu)

Assignee

Comment 47

•

13 years ago

Attached patch part 4 - implement nsHyphenationManager and nsHyphenator - updated to use nsIAtom instead of strings in the API — Details — Splinter Review

Attachment #527290 - Attachment is obsolete: true

Attachment #527486 - Flags: review?(smontagu)

Assignee

Comment 48

•

13 years ago

Attached patch part 5 - support -moz-hyphens:auto/manual/none in layout - updated per comments 32 & 36 — Details — Splinter Review

Attachment #526972 - Attachment is obsolete: true

Attachment #527488 - Flags: review?(roc)

Attachment #526972 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 49

•

13 years ago

(In reply to comment #33)
> Are we going to ship en-US hyphenation dictionaries in every build? If not,
> these reftests are going to break.

Regardless of the decision on this, I think we'll have the en-US hyph dic in the main build (like we do for the spelling dic), so these should work during our standard unit tests.

It's true that they'd fail if they were run with an l10n-repackaged build where the dic has been swapped out, but I don't see a good way to deal with that at this point. (We'd have the same issue with spell-check tests if we were running full unit tests on repacks, wouldn't we?)

Comment 50

•

13 years ago

I guess so.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 51

•

13 years ago

I think it might make sense to just always package the en-US hyphenation dictionary, anyway, in addition to any other languages we choose to package.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

13 years ago

Attachment #526973 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

13 years ago

Attachment #527488 - Flags: review?(roc) → review+

Comment 52

•

13 years ago

I agree that it'd make sense to package the en-US hyphenation dictionary in all builds. That'd enable us to run the hyphenation tests on non-en-US builds, too, which is a good thing to have.

For the locale prefs, doing prefs as part of repacks is tough. There's a template prefs file we include, but I'd rather not have localizers touch that, it's too easy to put scary things in there.

Isn't a language mapping algorithm like BCP47 good enough in general? I don't mind if users have prefs to tweak things on top of that, but I don't think we should have prefs for the simple case of walking up a set of language tags. We currently don't have a reusable component implementing bcp47 sadly, but bug 356038 could add that at some point. Right now, going in chunks of "en-US", "en", 'nothing' should do the trick, I think.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 53

•

13 years ago

Comment on attachment 527288 [details] [diff] [review]
part 3.1 - build support to install hyphenation patterns in the product

I'd make 

+PATTERN_FILES = $(strip $(wildcard $(LOCALE_SRCDIR)/hyphenation/*.dic))

list both the en-US and the locale pattern files so that we ship the en-US ones by default.

r=me with that.

Attachment #527288 - Flags: review?(l10n) → review+

Cédric Corazza

Comment 54

•

13 years ago

I read through rapidly all the comments and didn't read about one concern: bloating.
This is not really about the issue discussed here, but still. Adding en-US dic to l10n locale will bloat Firefox a little more. I remember times where size packages matters, and it was around 6 MB for Windows in these days, then we added l10n dictionaries (1 MB more or so). We are now around 12 MB.
Just ignore my comment if this doesn't matter anymore.
If it does, I will let the "guys who know" file the bug in the right component.
Sorry if made unncessary noise in this bug.

Comment 55

•

13 years ago

We worry about bloat, but adding features generally requires adding to the download size. In this case we expect adding the en-US dictionary to add about 40K to the download, see comment #38.

Simon Montagu :smontagu

Comment 56

•

13 years ago

Comment on attachment 527486 [details] [diff] [review]
part 4 - implement nsHyphenationManager and nsHyphenator - updated to use nsIAtom instead of strings in the API

Review of attachment 527486 [details] [diff] [review]:

r=me

Attachment #527486 - Flags: review?(smontagu) → review+

Assignee

Comment 57

•

13 years ago

Comment on attachment 527486 [details] [diff] [review]
part 4 - implement nsHyphenationManager and nsHyphenator - updated to use nsIAtom instead of strings in the API

Requesting sr for the new APIs here.

Attachment #527486 - Flags: superreview?(dbaron)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

13 years ago

Attachment #527486 - Flags: superreview?(dbaron) → superreview?(roc)

Comment 58

•

13 years ago

Comment on attachment 527486 [details] [diff] [review]
part 4 - implement nsHyphenationManager and nsHyphenator - updated to use nsIAtom instead of strings in the API

Review of attachment 527486 [details] [diff] [review]:

::: intl/hyphenation/public/nsHyphenator.h
@@ +49,5 @@
+public:
+  nsHyphenator(nsIFile *aFile);
+
+  nsrefcnt AddRef(void) {
+    NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt");

Use NS_INLINE_DECL_REFCOUNTING

::: intl/hyphenation/src/hnjalloc.h
@@ +43,5 @@
+#include "prmem.h"
+
+#define hnj_malloc(size)      PR_Malloc(size)
+#define hnj_realloc(p, size)  PR_Realloc(p, size)
+#define hnj_free(p)           PR_Free(p)

Use mozalloc.h and moz_malloc or moz_xmalloc. If you think libhyphen is very robust about checking malloc return values for OOM, then use moz_malloc, otherwise use moz_xmalloc.

Attachment #527486 - Flags: superreview?(roc) → superreview+

Assignee

Comment 59

•

13 years ago

(In reply to comment #58)

> Use NS_INLINE_DECL_REFCOUNTING

Done; although I was thinking we might eventually want to enhance this to have a time-based cache that discards hyphenation tables that haven't been used recently, if memory usage becomes a concern. For that, I think we'd want a custom Release implementation so as to track last-used time. But we can do that in a followup bug if it seems necessary.

> Use mozalloc.h and moz_malloc or moz_xmalloc. If you think libhyphen is very
> robust about checking malloc return values for OOM, then use moz_malloc,
> otherwise use moz_xmalloc.

Done. libhyphen does not check allocator return values, it assumes that hnj_{m,re}alloc  are infallible (normally by using its own wrappers), so they're mapped to the moz_x* versions here.


http://hg.mozilla.org/mozilla-central/rev/852a668d69b9
http://hg.mozilla.org/mozilla-central/rev/fb4db64ca8b8
http://hg.mozilla.org/mozilla-central/rev/861b8bbc25b5
http://hg.mozilla.org/mozilla-central/rev/a5877159c9a1
http://hg.mozilla.org/mozilla-central/rev/3f96f2d3dffe
http://hg.mozilla.org/mozilla-central/rev/cf41ba95de85
http://hg.mozilla.org/mozilla-central/rev/3fd770ef6a65

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Assignee

Comment 60

•

13 years ago

I backed out parts 3.1 (the build support for including the hyphenation dict) and 6 (tests) because of orange in l10n-check:

make libs
make[4]: Entering directory `/builds/slave/cen-lnx/build/obj-firefox/intl/locales'
/builds/slave/cen-lnx/build/obj-firefox/config/nsinstall -R    ../../dist/xpi-stage/locale-x-test/hyphenation
usage: /builds/slave/cen-lnx/build/obj-firefox/config/nsinstall [-C cwd] [-L linkprefix] [-m mode] [-o owner] [-g group]
                                                                [-DdltR] file [file ...] directory
make[4]: make[4]: Leaving directory `/builds/slave/cen-lnx/build/obj-firefox/intl/locales'
*** [libs] Error 2
NEXT ERROR make[3]: *** [default] Error 2
make[3]: Leaving directory `/builds/slave/cen-lnx/build/obj-firefox/intl/locales'
make[2]: Leaving directory `/builds/slave/cen-lnx/build/obj-firefox/browser/locales'
make[2]: *** [libs-x-test] Error 2
make[1]: *** [l10n-check] Error 2
make[1]: Leaving directory `/builds/slave/cen-lnx/build/obj-firefox/browser/locales'
make: *** [l10n-check] Error 2
program finished with exit code 2

As such, the feature is currently disabled - the code is still in the tree, but with no dictionary, it won't actually do anything.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Assignee

Comment 61

•

13 years ago

Re-landed after fixing the path in the makefile so that it won't fail l10n-check this time:
http://hg.mozilla.org/mozilla-central/rev/5cd93fcc75d9
http://hg.mozilla.org/mozilla-central/rev/4ed2169f5b34

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

Eric Shepherd [:sheppy]

Comment 62

•

13 years ago

Documentation:

https://developer.mozilla.org/en/CSS/hyphens

I've composed an example but am awaiting a build with this working before inserting it into the page. The Nightly I'm running now doesn't have it yet.

Also mentioned on Firefox 6 for developers.

Keywords: dev-doc-needed → dev-doc-complete

Assignee

Comment 63

•

13 years ago

(In reply to comment #62)
> Documentation:
> 
> https://developer.mozilla.org/en/CSS/hyphens

Thanks! :)

A couple of notes. First a nit: there's a missing "e" on "The" at the very beginning!

You might want to reference CSS3 Text in addition to Generated-Content; I'm not sure where the final spec will settle down, though personally I think Text is a more appropriate home for it.

  http://www.w3.org/TR/2011/WD-css3-text-20110412/#hyphens

Also, it may be worth pointing out that the "auto" setting depends on appropriate language tagging, as the proper places to break words depend on the language. So Gecko (at least) will not do any automatic hyphenation unless the page/element has an appropriate 'lang' attribute; merely setting "-moz-hyphens:auto" with no 'lang' attribute will have no visible effect.

(As of now we only provide support for auto-hyphenating English, but we're looking at ways to include support for additional languages.)

Comment 64

•

13 years ago

So, I'm not totally comfortable with the choice of only shipping en-US hyphenation support by default.  This is something which affects the layout of web content, and I don't think that we should limit the language specific layout support to only the language the browser UI is in.  If you're a multilingual person, you almost definitely do not run two parallel browsers one in each language.  And we do not restrict other language specific layout features (such as shaping) to only browsers in some locales.  (I know that this example is kind of extreme, but I'm talking in terms of what an average user expects).

So, I'm wondering, is the built-in hyphenation dictionaries choice made in this bug a good one?

Comment 65

•

13 years ago

(In reply to comment #64)
> So, I'm wondering, is the built-in hyphenation dictionaries choice made in this
> bug a good one?

I think we should have more bugs on a good default set of hyphenation dicts to ship, plus good localized defaults, plus a good infra to detect if we should download and use additional dicts by demand.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 66

•

13 years ago

(In reply to comment #65)
> (In reply to comment #64)
> > So, I'm wondering, is the built-in hyphenation dictionaries choice made in this
> > bug a good one?
> 
> I think we should have more bugs on a good default set of hyphenation dicts to
> ship, plus good localized defaults, plus a good infra to detect if we should
> download and use additional dicts by demand.

Is that considered to be something that should be addressed before we can ship this in Firefox?

Comment 67

•

13 years ago

(In reply to comment #64)
> So, I'm not totally comfortable with the choice of only shipping en-US
> hyphenation support by default.  This is something which affects the layout of
> web content, and I don't think that we should limit the language specific
> layout support to only the language the browser UI is in.

There are multiple, somewhat conflicting goals here:
1) Avoid making page layout depend on the browser UI locale.
2) Support auto-hyphenation for the preferred language(s) of each user.
3) Acceptable Firefox download bloat.

What's currently on trunk satisfies #1 and #3. I think this something we can ship; supporting auto-hyphenation for English only is better than not supporting it for any language, IMHO.

I'm not sure what we should do beyond that, though. That's a discussion we should probably have on dev.platform.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 68

•

13 years ago

(In reply to comment #67)
> There are multiple, somewhat conflicting goals here:
> 1) Avoid making page layout depend on the browser UI locale.
> 2) Support auto-hyphenation for the preferred language(s) of each user.
> 3) Acceptable Firefox download bloat.
> 
> What's currently on trunk satisfies #1 and #3.

Just to clarify: the reason the current code satisfies #1 is that we ship the en-US dictionary for *every* locale (and never ship any other dictionaries).

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 69

•

13 years ago

Varying the hyphenation by the local would leak the locale.

Keywords: privacy

Whiteboard: [sg:audit]

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 70

•

13 years ago

Don't we already leak the locale in the Accept-Language header?

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 71

•

13 years ago

IIRC, there are a few ways we leak the locale. I don't know how much of a problem, or not, it is considered. I just want to make sure Sid is aware of the leak.

Phil Ringnalda (:philor)

Comment 72

•

13 years ago

Do we want to package this on Android, or mark the six permaorange reftests as fails-if? (Or, do the right thing, build, package, and test based on an ifdef.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 73

•

13 years ago

We should package it on Android.

Assignee

Comment 74

•

13 years ago

Attached patch wip - package the hyphenation dictionary on android - incomplete (obsolete) — Details — Splinter Review

I don't know how the Android packaging works, and what's required in order to incorporate this. I found mobile/installer/package-manifest.in and mobile/locales/Makefile.in and added appropriate entries to these (see attached patch), but this is not sufficient. I'm seeing hyphenation/en-US/hyph_en_US.dic in the $OBJDIR/dist/fennec directory, but it doesn't end up in the final package.

So could someone who knows how this should work please point me in the right direction - or just do it for me! :)

Attachment #530366 - Flags: feedback?(pavlov)

Updated

•

13 years ago

Blocks: 655078

Comment 75

•

13 years ago

How big are the dictionaries? We explicitly remove spellcheck dictionaries in mobile due to size issues and the fact that the spellcheck UI is not 100% functional.

Assignee

Comment 76

•

13 years ago

They vary greatly, from a few K to several hundred K. The en-US dict (currently the only one in the tree; we're looking into how best to handle other locales) is just over 100K, but compresses to about 42K when zipped.

Comment 77

•

13 years ago

I can try the patch and see how it works. It looks OK.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 78

•

13 years ago

(In reply to comment #70)
> Don't we already leak the locale in the Accept-Language header?

The value of the Accept-Language header can be customized by the user, and the UI locale is not its only source.

That said, if we choose to ship en-US in every locale, this won't be a concern, except for the locales which ship their own version of the dictionary in addition to the en-US one.

Comment 79

•

13 years ago

(In reply to comment #75)
> How big are the dictionaries? We explicitly remove spellcheck dictionaries in
> mobile due to size issues and the fact that the spellcheck UI is not 100%
> functional.

This is probably more important than spellcheck since a) it affects layout and b) autohyphenation would be especially useful on mobile to make text fit in a narrow space.

Comment 80

•

13 years ago

(In reply to comment #67)
> (In reply to comment #64)
> > So, I'm not totally comfortable with the choice of only shipping en-US
> > hyphenation support by default.  This is something which affects the layout of
> > web content, and I don't think that we should limit the language specific
> > layout support to only the language the browser UI is in.
> 
> There are multiple, somewhat conflicting goals here:
> 1) Avoid making page layout depend on the browser UI locale.
> 2) Support auto-hyphenation for the preferred language(s) of each user.
> 3) Acceptable Firefox download bloat.
> 
> What's currently on trunk satisfies #1 and #3. I think this something we can
> ship; supporting auto-hyphenation for English only is better than not
> supporting it for any language, IMHO.
> 
> I'm not sure what we should do beyond that, though. That's a discussion we
> should probably have on dev.platform.

Posted a thread to dev.platform.

Jesse Ruderman

Updated

•

13 years ago

Depends on: 655198

Comment 81

•

13 years ago

Comment on attachment 530366 [details] [diff] [review]
wip - package the hyphenation dictionary on android - incomplete

You missed a spot packager.mk

Android omnijar packaging is different and separate from other omnijar packaging. I tweaked your patch and got the hyphen dict files in the package. I sent the patch to the try server to see if the reftests clear up.

Attachment #530366 - Flags: feedback?(pavlov) → feedback-

Assignee

Comment 82

•

13 years ago

Aha, thanks - I figured there must be something like that I'd missed.

In a local build, I now see the dict in the package as expected. However, testing on my device still doesn't actually work - maybe the code ends up looking in the wrong directory or something.

Comment 83

•

13 years ago

(In reply to comment #82)
> Aha, thanks - I figured there must be something like that I'd missed.
> 
> In a local build, I now see the dict in the package as expected. However,
> testing on my device still doesn't actually work - maybe the code ends up
> looking in the wrong directory or something.

I also found a place in the code where we unpack folders out of the omnijar file on Android.

Do we have any easily accessible test pages to try hyphenation? I already failed once to get any reftests to run on try and I don't have an easy way to try to run the Android reftests myself locally.

Comment 84

•

13 years ago

Not sure of a better place to ask, but are we planning to support reading the hyphen dict from a locale JAR file? It seems natural that the locale specific dictionary could be shipped with a locale itself. It looks like the code is currently designed to only read from extracted, local files - using the GRE or APP folder as the root.

Assignee

Comment 85

•

13 years ago

(In reply to comment #83)
> Do we have any easily accessible test pages to try hyphenation? I already
> failed once to get any reftests to run on try

(That sounds like bug 655046.)

I've put a page at http://people.mozilla.org/~jkew/hyph.html that should demonstrate hyphenation, if it's working.

Assignee

Comment 86

•

13 years ago

(In reply to comment #84)
> Not sure of a better place to ask, but are we planning to support reading
> the hyphen dict from a locale JAR file? It seems natural that the locale
> specific dictionary could be shipped with a locale itself. It looks like the
> code is currently designed to only read from extracted, local files - using
> the GRE or APP folder as the root.

What do we do for spelling dictionaries? ISTM that we probably want a similar strategy for both these kinds of resources. Perhaps it's time to spin off a new bug on that....

Brad Lassey [:blassey] (use needinfo?)

Comment 87

•

13 years ago

Attached patch Android packaging patch — Details — Splinter Review

This patch builds on Jonathan's but adds the hyphenation folder to the Android specific omnijar packager and then extracts the hyphenation folder out of the omnijar file on the Android device, so the toolkit code can access it.

Using this patch I can see hyphenation working on http://people.mozilla.org/~jkew/hyph.html

Attachment #530366 - Attachment is obsolete: true

Attachment #530621 - Flags: review?(blassey.bugs)

Comment 88

•

13 years ago

Comment on attachment 530621 [details] [diff] [review]
Android packaging patch

Review of attachment 530621 [details] [diff] [review]:
-----------------------------------------------------------------

I'd rather not extract these dictionaries from the omnijar, but this will do for now. Please file a follow up bug

Attachment #530621 - Flags: review?(blassey.bugs) → review+

Johnathan Nightingale [:johnath]

Comment 89

•

13 years ago

pushed Android packaging patch:
http://hg.mozilla.org/mozilla-central/rev/de45da66d672

Filed bug 655337 for loading via a JAR friendly method

Mathias Nater

Comment 90

•

13 years ago

Filed https://bugzilla.mozilla.org/show_bug.cgi?id=656750 for enhancements on hyphenation.

Comment 91

•

13 years ago

Marking this tracking+, but it seems like it should pick up status-firefox6:fixed as soon as that flag is available (today, ideally)

tracking-firefox6: ? → +

Asa Dotzler [:asa]

Updated

•

13 years ago

status-firefox6: --- → fixed

Gordon P. Hemsley [:GPHemsley]

Updated

•

13 years ago

Blocks: 656750

Dão Gottwald [:dao]

Updated

•

13 years ago

Target Milestone: --- → mozilla6

Serge Gautherie (:sgautherie)

Assignee

Updated

•

13 years ago

Blocks: 672320

Rene

Comment 92

•

13 years ago

(In reply to comment #21)
> Who's going to replace the hyphenator with a JS implementation? An
> extension? Seems unlikely. We shouldn't take complexity and perf hits in our
> code for an extension point that is unlikely to be used. We should implement
> the simple thing and then if demand arises for an extension point, provide
> that as an option.
>
It's not so unlikely at all. There are currently set of speller extensions for smaller languages like Finnish or Estonian that provide much better quality than hunspell.
I would be very happy if there would be an extension interface for hyphenation (like there is mozISpellCheckingEngine for speller) and kindly ask you to consider this for next update.

Mark Banner (:standard8)

Updated

•

13 years ago

Blocks: 677347

Mojca Miklavec

Comment 93

•

13 years ago

(In reply to Rene from comment #92)
> There are currently set of speller extensions
> for smaller languages like Finnish or Estonian that provide much better
> quality than hunspell.

What extensions are needed for Finnish or Estonian? What are their requirements that current hyphenation algorithm is not able to satisfy? (I'm interested to know what hyphenation points TeX is missing.)

Rene

Comment 94

•

13 years ago

(In reply to Mojca Miklavec from comment #93)
> What extensions are needed for Finnish or Estonian? What are their
> requirements that current hyphenation algorithm is not able to satisfy? (I'm
> interested to know what hyphenation points TeX is missing.)
Proper Estonian hyphenation is a combination of rules (for base words) and morphological analysis (for compounds). TeX handles the first part nicely, but can't handle compounds. I think having TeX hyphenation as a standard solution is acceptable (as Mozilla's first priority should be browsing, not language technology), but there should be an extension point here to implement language specific better quality options.

Benjamin Smedberg

Updated

•

13 years ago

Depends on: 685214

Updated

•

13 years ago

Blocks: 677254