Created attachment 575645 [details]
In this file, German (Austria, 1901 [=old orthography]) is mistakenly hyphenated according to the rules of 1996 [=new orthography]
User Agent: Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0
Build ID: 20111104165243
Steps to reproduce:
Open http://www.uni-graz.at/~katzer/gecko_hyphenation_bug.html (also attached to this post)
This document contains four nonsensical paragraphs with different HTML lang attributes; all of them are automatically hyphenated via the CSS rule -moz-hyphens: auto
Adjust the window size so that you see actual hyphenation taking place in all four samples.
The first three paragraphs are treated correctly, but the fourth is not. It should be formatted identically to the third, but it is formatted identically to the first two.
The text which is declared as old orthography is hyphenated according to the rules of the new orthography.
German has two different orthography conventions which also differ in hyphenation. The new orthography (introduced 1996) is selected by default when a lang="de" attribute is encountered; old orthography is declared by a subtag referring to the year of its introduction, e.g. lang="de-1901". This all works in Firefox.
However, Firefox cannot deal with more complex declarations. In my example, lang="de-AT-1901" (meaning: German as spoken in Austria, written according to the orthography of 1901) seems to go unrecognized and triggers the default behaviour, that is, 1996-style hyphenation.
Note that the country subtag AT should not influence hyphenation at all, as there are no special “Austrian rules”.
I noticed the bug in Firefox 8.0 and confirmed it is still present in 10.0a2 (2011-11-13).
You can work around this by adding a new entry in about:config with the name "intl.hyphenation-alias.de-AT-1901" and the string value "de-1901" (and then restart the browser).
We should probably add this to the predefined set of aliases.
(In the future, we should replace this mechanism with improved support for BCP47-based language tagging and tag matching - see bug 356038.)
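To illustrate the difference between the alias mechanism and tag matching, here is a minimal Python sketch. It is not how Firefox implements this; the names resolve_patterns, AVAILABLE_PATTERNS and ALIASES and the data in them are made up for the example. It tries an exact match, then an alias, and finally drops the region subtag, unless that language-region pair has patterns of its own.

```python
import re

# Hypothetical data: tags that have their own hyphenation patterns.
AVAILABLE_PATTERNS = {"de", "de-1901", "de-ch"}

# Hypothetical alias table, in the spirit of the intl.hyphenation-alias.* prefs.
ALIASES = {"de-at-1901": "de-1901"}

# A BCP 47 region subtag is two letters or three digits.
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")


def resolve_patterns(tag):
    """Return the tag of the pattern set to use, or None if nothing matches."""
    tag = tag.lower()
    if tag in AVAILABLE_PATTERNS:
        return tag
    if tag in ALIASES:
        return ALIASES[tag]
    language, *rest = tag.split("-")
    regions = [s for s in rest if REGION.fullmatch(s)]
    # If the language-region pair has patterns of its own (e.g. de-CH),
    # the region matters, so do not guess by dropping it.
    if any(f"{language}-{r}" in AVAILABLE_PATTERNS for r in regions):
        return None
    stripped = "-".join([language] + [s for s in rest if s not in regions])
    return stripped if stripped in AVAILABLE_PATTERNS else None


print(resolve_patterns("de-AT-1901"))  # resolved through the alias table
print(resolve_patterns("de-AT"))       # resolved by dropping the region
print(resolve_patterns("de-CH-1901"))  # left unresolved on purpose
```

With the region-dropping step, de-AT-1901 would resolve to de-1901 even without an alias entry, while de-CH-1901 stays unresolved instead of silently picking up the wrong patterns.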
Created attachment 575659 [details] [diff] [review]
patch, add an alias for the de-AT-1901 lang tag
Why not also add "de-CH-1901"?
In the Swiss case, it is not as easy, since they do spell a couple of words differently.
This applies to all words containing ß: In Switzerland, ß is always replaced by ss. Of course, this means that the hyphenation patterns must be recreated, as the vocabulary (the set of all well-formed character strings) is different.
As far as I can gather from the web, the rules are as follows. Note that in Swiss orthography, any ß is replaced by ss; when I write „any word with ß“, this refers to the spelling outside of Switzerland.
*) In old orthography, any word with ß that is hyphenated -ß (before the ß), in Switzerland is hyphenated -ss (before the ss).
*) In new orthography, any word with ß that is hyphenated -ß in Switzerland is hyphenated s-s.
If the non-Swiss orthography hyphenates after the ß, then Swiss orthography does basically the same and hyphenates after the ss.
Stra-ße “street”: in Switzerland, old Stra-sse, new Stras-se.
Strauß-en-ei “egg of an ostrich”: in Switzerland, identical, Strauss-en-ei.
Old Straß-burg “Strasbourg” should be spelled Strass-burg in new orthography (but the rules are not enforced for proper names), in Swiss style Strass-burg.
Old Orthography: DE,AT -ß -> CH -ss, DE,AT ß- -> CH ss-
New Orthography: DE,AT -ß -> CH s-s, DE,AT ß- -> CH ss-
(many cases of old ß- have been reformed to new ss-, so that these are now identical in DE,AT,CH; nevertheless, there are enough cases left with ß- in new orthography)
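The mapping summarized above can be written down as a small Python sketch. It assumes the rules are exactly as I described them (which, again, should be checked by someone who knows Swiss orthography); the function name to_swiss and its new_orthography parameter are made up for the example. Input is a word with its DE/AT hyphenation points marked by "-".

```python
def to_swiss(hyphenated, new_orthography):
    """Convert a DE/AT hyphenated word to the Swiss spelling and hyphenation.

    Rules assumed here (as summarized in the comment, not authoritative):
      old orthography: -ß becomes -ss, ß- becomes ss-
      new orthography: -ß becomes s-s, ß- becomes ss-
    """
    if new_orthography:
        # The break moves from before the ß to between the two s.
        hyphenated = hyphenated.replace("-ß", "s-s")
    # Every remaining ß keeps its hyphenation points and simply becomes ss.
    return hyphenated.replace("ß", "ss")


for word in ["Stra-ße", "Strauß-en-ei", "Straß-burg"]:
    print(word, to_swiss(word, False), to_swiss(word, True))
```

Running a function like this over the hyphenated de-1901 and de word lists is the kind of automatic conversion I have in mind for producing Swiss input.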
So I guess that de-CH-1901 and de-CH both need hyphenation patterns of their own. However, the dictionaries needed to create the hyphenation patterns can be produced automatically from the standard de-1901 and de dictionaries. Is it correct that Firefox uses the TeX algorithm (and the patgen pattern generator) for this?
Please don't take my word for it, but ask someone from Switzerland instead who really knows the rules, while I have just read about them.
Right; I didn't add an alias for de-CH-1901 because de-CH has separate patterns from de-DE, and I am not confident of the relationships among the variants. We could consider this if it is important to Swiss users, and if we get guidance from someone with appropriate expertise, but we should treat it as a separate issue.