Closed Bug 740477 Opened 13 years ago Closed 13 years ago

The dutch IJ digraph is not handled correctly by text-transform:capitalize

Categories

(Core :: Internationalization, defect)

x86
macOS
defect
Not set
minor

Tracking

()

RESOLVED FIXED
mozilla14

People

(Reporter: teoli, Assigned: jfkthame)

Details

(Keywords: dev-doc-complete)

Attachments

(2 files, 2 obsolete files)

The Dutch language considers ij a digraph (see link for reference). This means that when a word starting with ij is capitalized, both letters are capitalized.

ijsland -> IJsland

This behaviour is non-standard and affects only text-transform: capitalize, not lowercase or uppercase. There are also a few words in Dutch where ij is not a digraph, but as far as I know the ij is never at the beginning of these words.
Note that you'll get the desired behavior if such words are spelled using the Unicode character U+0133 LATIN SMALL LIGATURE IJ, rather than the separate characters "i" and "j".

Compare:
data:text/html;charset=utf-8,<div style="text-transform: capitalize">ijsland
data:text/html;charset=utf-8,<div style="text-transform: capitalize">ĳsland

However, perhaps we should consider special-case handling for the sequence "ij" in content that is specifically tagged as lang="nl".
Yes, the Unicode character does work, but its use is discouraged by Unicode (it is there mainly for legacy purposes): see Unicode 6.1, Ch3 D66 Compatibility decomposable character. In any case, nobody uses it, as it is not on the Dutch keyboard layout (http://www.goodtyping.com/teclatDUT.htm).
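
To make the point about U+0133 concrete, here is a small Python sketch (illustration only, not part of any patch on this bug). It shows that the ligature has its own single-character uppercase form, which is why capitalize already works for it, and that Unicode gives it a compatibility decomposition to the two separate letters. The variable names are made up for this example.

```python
import unicodedata

# U+0133 LATIN SMALL LIGATURE IJ, the single-character spelling discussed above.
ligature = "\u0133"

# Its uppercase mapping is a single character, U+0132 LATIN CAPITAL LIGATURE IJ,
# so uppercasing only the first character of a word handles it correctly.
upper_ligature = ligature.upper()

# The character is a compatibility decomposable character:
# compatibility normalization (NFKC) turns it into plain "i" + "j".
decomposition = unicodedata.decomposition(ligature)
normalized = unicodedata.normalize("NFKC", ligature)

print(unicodedata.name(ligature))        # LATIN SMALL LIGATURE IJ
print(unicodedata.name(upper_ligature))  # LATIN CAPITAL LIGATURE IJ
print(decomposition)                     # <compat> 0069 006A
print(normalized)                        # ij
```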

I don't think the Dutch-related Flemish languages have their own language codes, so handling "nl" only should be OK.
See Also: → 92176
This implements the requested behavior for elements where lang="nl".

Note that it only applies the digraph-specific behavior (capitalizing the "j" as well) if both "i" and "j" were originally lowercase; thus, "ijsland" -> "IJsland", but "Ijsland" is unchanged by text-transform:capitalize, on the assumption that if it was already entered with mixed case in the "Ij" pair, this was a deliberate choice.
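
For readers who want the rule spelled out, a minimal Python sketch of the behavior described above follows. It is not the Gecko patch: the function name `capitalize_word` is hypothetical, it works on one word at a time, and matching on the primary language subtag (so that "nl-BE" is treated like "nl") is an assumption made for this example.

```python
def capitalize_word(word: str, lang: str = "") -> str:
    """Uppercase the first letter of `word`, as text-transform: capitalize would.

    For Dutch content, a leading lowercase "ij" is treated as a digraph and
    both letters are uppercased. Hypothetical helper, for illustration only.
    """
    if not word:
        return word

    # Assumption: only the primary subtag is compared, so "nl-BE" counts as Dutch.
    is_dutch = lang.lower().split("-")[0] == "nl"

    # The digraph rule applies only when "i" and "j" are adjacent at the start
    # of the word and both were originally lowercase.
    if is_dutch and word.startswith("ij"):
        return "IJ" + word[2:]

    # Default behavior: uppercase the first character, leave the rest untouched.
    return word[0].upper() + word[1:]


print(capitalize_word("ijsland", "nl"))   # IJsland
print(capitalize_word("Ijsland", "nl"))   # Ijsland (mixed case left alone)
print(capitalize_word("ijsland", "en"))   # Ijsland (no Dutch rule)
```

Checking for the literal prefix "ij" covers both conditions at once: the letters must be adjacent, and an already-capitalized "Ij" does not match.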
Attachment #610631 - Flags: review?(smontagu)
Attachment #610637 - Flags: review?(smontagu)
Keywords: dev-doc-needed
Summary: The dutch IJ digram is not handled correctly by text-transform:capitalize → The dutch IJ digraph is not handled correctly by text-transform:capitalize
Comment on attachment 610631 [details] [diff] [review]
patch, implement Dutch-specific capitalization for "ij"

Review of attachment 610631 [details] [diff] [review]:
-----------------------------------------------------------------

This has a bug: when the "j" isn't adjacent to the "i", it still gets capitalized.

Maybe also, instead of adding another boolean dutchCasing (and in later bugs adding greekCasing, lithuanianCasing and I don't know what all else), have an enum of languages and a languageSpecificCasing variable (or some shorter name)? There will only ever be one applicable language, unless I am very much mistaken.
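
The enum suggestion above could look roughly like this in Python. This is a sketch of the idea only: the names `LanguageSpecificCasing` and `casing_for_lang` are invented for this example and are not the identifiers used in the actual patch, and the Greek and Lithuanian members are placeholders for the later bugs the comment alludes to.

```python
from enum import Enum, auto


class LanguageSpecificCasing(Enum):
    """One value per casing behavior, named after an exemplar language."""
    NONE = auto()
    DUTCH = auto()       # "ij" digraph capitalized as "IJ"
    GREEK = auto()       # placeholder for a possible later rule
    LITHUANIAN = auto()  # placeholder for a possible later rule


# Hypothetical lookup table keyed by primary language subtag.
_CASING_BY_LANG = {
    "nl": LanguageSpecificCasing.DUTCH,
    "el": LanguageSpecificCasing.GREEK,
    "lt": LanguageSpecificCasing.LITHUANIAN,
}


def casing_for_lang(lang: str) -> LanguageSpecificCasing:
    """Map a lang attribute value to the single casing behavior that applies."""
    primary = lang.lower().split("-")[0] if lang else ""
    return _CASING_BY_LANG.get(primary, LanguageSpecificCasing.NONE)


print(casing_for_lang("nl"))   # LanguageSpecificCasing.DUTCH
print(casing_for_lang("en"))   # LanguageSpecificCasing.NONE
```

Since only one language applies to a given run of text, a single enum value replaces a growing set of booleans, and callers can branch on it once.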
Attachment #610631 - Flags: review?(smontagu) → review-
Comment on attachment 610637 [details] [diff] [review]
reftest for Dutch "ij" capitalization

Review of attachment 610637 [details] [diff] [review]:
-----------------------------------------------------------------

Add a case with non-adjacent i/j to test the bug I mentioned in the previous comment.
Attachment #610637 - Flags: review?(smontagu) → review+
Good catch, thanks. Fixed in this version.
Assignee: nobody → jfkthame
Attachment #610631 - Attachment is obsolete: true
Attachment #610996 - Flags: review?(smontagu)
Added a case with "ixj" to the test; carry forward r=smontagu.
Attachment #610637 - Attachment is obsolete: true
Attachment #610998 - Flags: review+
Just a (somewhat tangential) thought:
Perhaps it would be better to disentangle the name of the change from the language(s) associated with it?

This is more related to the Turkish transformation than the Dutch one, but it's possible for a transformation to be used by more than one language (as with dotless I). So why not name the transformations after what they do, rather than who uses them? Like capitalizedIJDigraph or capitalizeDotlessI, or eIJDigraph or eDotlessI, or something like that?
(In reply to Gordon P. Hemsley [:gphemsley] from comment #9)
> Just a (somewhat tangential) thought:
> Perhaps it would be better to disentangle the name of the change from the
> language(s) associated with it?

We could, although I think it's perfectly reasonable to use the name of a well-known exemplar language even though the behavior may be "borrowed" by other languages that have a similar writing system. If we were exposing this to users somehow, it would need to be carefully considered, but here it is just a question of naming a local variable within the code.

(Essentially the same thing happens for scripts, many of which are named after the "primary" language that uses them even if they get adopted for writing other languages as well.)

> This is more related to the Turkish transformation than the Dutch one, but
> it's possible for a transformation to be used by more than one language (as
> with dotless I). So why not name the transformations after what they do,
> rather than who uses them? Like capitalizedIJDigraph or capitalizeDotlessI,
> or eIJDigraph or eDotlessI, or something like that?

Personally, I find it most natural to label the behavior as "Turkish" even though it is used by several other languages; I think they have modeled their writing systems on the Turkish one. But I don't feel particularly strongly about it - Simon, any opinion?
At most I think we might add a comment that these are mnemonic names of exemplar languages that have the behaviour we are implementing. Getting it 100% right and pleasing everybody is an unattainable goal anyway -- we have enough problems already with user-facing names for various regions and languages.
(In reply to Jonathan Kew (:jfkthame) from comment #10)
> (In reply to Gordon P. Hemsley [:gphemsley] from comment #9)
> > Just a (somewhat tangential) thought:
> > Perhaps it would be better to disentangle the name of the change from the
> > language(s) associated with it?
> 
> We could, although I think it's perfectly reasonable to use the name of a
> well-known exemplar language even though the behavior may be "borrowed" by
> other languages that have a similar writing system. If we were exposing this
> > to users somehow, it would need to be carefully considered, but here it is
> just a question of naming a local variable within the code.

What happens when one language has multiple different transformation requirements, and then another language has one of those but not another? Wouldn't you wind up being in the same position there anyway?

Also, beyond that, having a more descriptive name would help developers who come along in the future who perhaps are not as familiar with the various idiosyncrasies of the language used to name the variable. Or what if a language that is used as a variable name then decides that they are no longer going to use that rule? Then you have a rule that is named after a language that doesn't even use it.

I am also trying to spread the notion of decoupling a language from its writing system or writing conventions. All languages can be written many different ways; it just makes more sense to name a particular convention after the convention itself, rather than any particular language that might be using it at any given time.

> (Essentially the same thing happens for scripts, many of which are named
> after the "primary" language that uses them even if they get adopted for
> writing other languages as well.)

True, but that's probably a different discussion for a different time. ;)
Attachment #610996 - Flags: review?(smontagu) → review+
Pushed to inbound, with added comments for the enum values:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb53aec4a302
https://hg.mozilla.org/integration/mozilla-inbound/rev/324368cce885
Target Milestone: --- → mozilla14
I've updated https://developer.mozilla.org/en/CSS/text-transform (summary, examples and the browser compatibility table) and added a note in https://developer.mozilla.org/en/Firefox_14_for_developers