Bug 703861 - German hyphenation: Wrong orthography variant selected (lang="de-AT-1901" doesn't work)
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Layout: Text
Version: Trunk
Platform: All All
Importance: -- normal
Target Milestone: mozilla11
Assigned To: Jonathan Kew (:jfkthame)
Reported: 2011-11-19 03:08 PST by Gernot Katzer
Modified: 2011-12-13 05:08 PST
CC List: 3 users


Attachments
In this file, German (Austria, 1901 [=old orthography]) is mistakenly hyphenated according to rules of 1996 [=new orthography] (3.24 KB, text/html)
2011-11-19 03:08 PST, Gernot Katzer
patch, add an alias for the de-AT-1901 lang tag (1.15 KB, patch)
2011-11-19 06:20 PST, Jonathan Kew (:jfkthame)
smontagu: review+

Description Gernot Katzer 2011-11-19 03:08:03 PST
Created attachment 575645
In this file, German (Austria, 1901 [=old orthography]) is mistakenly hyphenated according to rules of 1996 [=new orthography]

User Agent: Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0
Build ID: 20111104165243

Steps to reproduce:

Open http://www.uni-graz.at/~katzer/gecko_hyphenation_bug.html (also attached to this post)
This document contains four nonsensical paragraphs with different HTML lang attributes; all of them are automatically hyphenated via the CSS rule -moz-hyphens: auto
Adjust the window size so that you see actual hyphenation taking place in all four samples.


Actual results:

The first three paragraphs are treated correctly, yet the fourth is not. It should be formatted identically to the third, but it is formatted identically to the first two.


Expected results:

The text declared as old orthography should be hyphenated according to the rules of the old orthography; instead, it is hyphenated according to the rules of the new orthography.

German has two different orthography conventions, which also differ in hyphenation. The new orthography (introduced in 1996) is selected by default when a lang="de" attribute is encountered; the old orthography is declared by a subtag referring to the year of its introduction, e.g. lang="de-1901". This all works in Firefox.

However, Firefox cannot deal with more complex declarations. In my example, lang="de-AT-1901" (meaning: German as spoken in Austria, written according to the orthography of 1901) seems to go unrecognized and to trigger the default behaviour, that is, 1996-style hyphenation.

Note that the country subtag AT should not influence hyphenation at all, as there are no special “Austrian rules”.
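
To illustrate how the tag is composed, here is a minimal Python sketch that splits a tag such as de-AT-1901 into language, region and variant subtags and drops a region that is assumed not to affect hyphenation. It is not how Firefox handles tags; the function names and the NEUTRAL_REGIONS set are hypothetical, and only AT is listed because that is the only region this report makes a claim about.

```python
import re

# Assumption for this sketch: only AT is treated as irrelevant to German
# hyphenation, following the note above. All other regions are kept.
NEUTRAL_REGIONS = frozenset({"AT"})


def split_lang_tag(tag):
    """Split a simple language tag into (language, region, variants).

    Handles only shapes such as "de", "de-1901" and "de-AT-1901";
    script, extension and private-use subtags are ignored.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    region = None
    variants = []
    for sub in subtags[1:]:
        is_region = re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub)
        is_variant = re.fullmatch(r"[0-9][A-Za-z0-9]{3}|[A-Za-z0-9]{5,8}", sub)
        if region is None and not variants and is_region:
            region = sub.upper()
        elif is_variant:
            variants.append(sub.lower())
    return language, region, variants


def hyphenation_key(tag, neutral_regions=NEUTRAL_REGIONS):
    """Rebuild the tag without a region that does not affect hyphenation."""
    language, region, variants = split_lang_tag(tag)
    parts = [language]
    if region is not None and region not in neutral_regions:
        parts.append(region)
    return "-".join(parts + variants)


print(hyphenation_key("de-AT-1901"))  # de-1901
```

With this sketch, de-CH-1901 is left unchanged, because CH is not in the assumed set of neutral regions.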

I noticed the bug in Firefox 8.0 and confirmed it is still present in 10.0a2 (2011-11-13).
Comment 1 Jonathan Kew (:jfkthame) 2011-11-19 06:15:28 PST
You can work around this by adding a new entry in about:config with the name "intl.hyphenation-alias.de-AT-1901" and the string value "de-1901" (and then restart the browser).

We should probably add this to the predefined set of aliases.

(In the future, we should replace this mechanism with improved support for BCP47-based language tagging and tag matching - see bug 356038.)
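
For readers unfamiliar with the alias idea, the following Python sketch shows how an alias table could redirect de-AT-1901 to the de-1901 patterns. It is only an illustration: the PATTERNS and ALIASES tables and the resolve_patterns function are hypothetical, and the truncation fallback is an assumption chosen to reproduce the reported symptom, not a description of the actual Gecko code.

```python
# Hypothetical tables; names and contents are illustrative only.
PATTERNS = {"de": "<1996 patterns>", "de-1901": "<1901 patterns>"}
ALIASES = {"de-AT-1901": "de-1901"}


def resolve_patterns(lang, aliases=ALIASES, patterns=PATTERNS):
    """Return the key of the pattern set chosen for `lang`, or None."""
    # An alias, if present, replaces the requested tag outright.
    tag = aliases.get(lang, lang)
    # Assumed fallback: drop trailing subtags until a pattern set matches.
    while tag:
        if tag in patterns:
            return tag
        tag = tag.rpartition("-")[0]
    return None


# Without the alias, the tag degrades to plain "de" (1996 rules).
print(resolve_patterns("de-AT-1901", aliases={}))  # de
# With the alias, the 1901 patterns are selected.
print(resolve_patterns("de-AT-1901"))  # de-1901
```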
Comment 2 Jonathan Kew (:jfkthame) 2011-11-19 06:20:11 PST
Created attachment 575659
patch, add an alias for the de-AT-1901 lang tag
Comment 3 j.j. 2011-11-19 07:24:50 PST
Why not also add "de-CH-1901"?
Comment 4 Gernot Katzer 2011-11-20 01:48:05 PST
In the Swiss case, it is not as easy, since they do spell a couple of words differently.

This applies to all words containing ß: In Switzerland, ß is always replaced by ss. Of course, this means that the hyphenation patterns must be recreated, as the vocabulary (the set of all well-formed character strings) is different.

As far as I can tell from the web, the rules go as follows. Note that in Swiss orthography, any ß is replaced by ss; when I write „any word with ß“, this refers to the spelling outside of Switzerland.

*) In old orthography, any word with ß that is hyphenated -ß (before the ß), in Switzerland is hyphenated -ss (before the ss).

*) In new orthography, any word with ß that is hyphenated -ß in Switzerland is hyphenated s-s.

If the non-Swiss orthography hyphenates after the ß, then Swiss orthography does basically the same and hyphenates after the ss. 

Examples: 
Stra-ße “street”: in Switzerland, old Stra-sse, new Stras-se.
Strauß-en-ei “egg of an ostrich”: in Switzerland identical in both orthographies, Strauss-en-ei.
Old Straß-burg “Strasbourg” should be spelled Strass-burg in new orthography (but the rules are not enforced for proper names); in Swiss style, Strass-burg.

Old Orthography: DE,AT -ß  ->  CH -ss,   DE,AT ß-  ->  CH ss-
New Orthography: DE,AT -ß  ->  CH s-s,   DE,AT ß-  ->  CH ss-

(many cases of old ß- have been reformed to new ss-, so that these are now identical in DE,AT,CH; nevertheless, there are enough cases left with ß- in new orthography)

So I guess that de-CH-1901 and de-CH both need hyphenation patterns of their own. However, the dictionaries needed to create the hyphenation patterns can be produced automatically from the standard de-1901 and de dictionaries. Is it correct that Firefox uses the TeX algorithm (and the patgen pattern generator) for this?
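
As a rough illustration of how Swiss word lists might be derived automatically, the Python sketch below applies the mapping given above to an already hyphenated DE/AT word. The function name to_swiss and its orthography argument are hypothetical, the rules are only those stated in this comment, and uppercase ẞ and any exceptions are not handled.

```python
def to_swiss(hyphenated, orthography):
    """Convert a hyphenated DE/AT word to Swiss spelling.

    `hyphenated` marks break points with "-", e.g. "Stra-ße".
    `orthography` is "1901" (old) or "1996" (new).
    """
    if orthography not in ("1901", "1996"):
        raise ValueError("orthography must be '1901' or '1996'")
    if orthography == "1996":
        # New orthography: a break before ß moves between the two s.
        hyphenated = hyphenated.replace("-ß", "s-s")
    # Everywhere else, ß becomes ss and break points stay where they are.
    return hyphenated.replace("ß", "ss")


for word in ("Stra-ße", "Strauß-en-ei", "Straß-burg"):
    print(word, to_swiss(word, "1901"), to_swiss(word, "1996"))
```

Running the loop on the three examples above gives Stra-sse and Stras-se for Stra-ße, and Strauss-en-ei and Strass-burg in both orthographies.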

Please don't take my word for it, but ask someone from Switzerland instead who really knows the rules, while I have just read about them.
Comment 5 Jonathan Kew (:jfkthame) 2011-11-21 00:16:19 PST
Right; I didn't add an alias for de-CH-1901 because de-CH has separate patterns from de-DE, and I am not confident of the relationships among the variants. We could consider this if it is important to Swiss users, and if we get guidance from someone with appropriate expertise, but we should treat it as a separate issue.
Comment 6 Jonathan Kew (:jfkthame) 2011-12-13 05:08:37 PST
https://hg.mozilla.org/mozilla-central/rev/b2c3fd1b871b
