Closed
Bug 740477
Opened 13 years ago
Closed 13 years ago
The dutch IJ digraph is not handled correctly by text-transform:capitalize
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla14
People
(Reporter: teoli, Assigned: jfkthame)
References
()
Details
(Keywords: dev-doc-complete)
Attachments
(2 files, 2 obsolete files)
4.27 KB, patch | smontagu: review+
1.74 KB, patch | jfkthame: review+
The Dutch language considers ij a digraph (see link for reference). This means that if a word is capitalized and starts with ij, both letters are capitalized: ijsland -> IJsland. This behaviour is non-standard and affects only text-transform: capitalize, not lowercase or uppercase. There are also a few words in Dutch where ij is not a digraph, but as far as I know the ij is never at the beginning of these words.
Assignee
Comment 1•13 years ago
Note that you'll get the desired behavior if such words are spelled using the Unicode character U+0133 LATIN SMALL LIGATURE IJ, rather than the separate characters "i" and "j". Compare:
data:text/html;charset=utf-8,<div style="text-transform: capitalize">ĳsland
data:text/html;charset=utf-8,<div style="text-transform: capitalize">ijsland
However, perhaps we should consider special-case handling for the sequence "ij" in content that is specifically tagged as lang="nl".
Reporter
Comment 2•13 years ago
Yes, the Unicode character does work, but its use is discouraged by Unicode (it is mainly there for legacy purposes): see Unicode 6.1, Ch3 D66 Compatibility decomposable character. Anyway, nobody uses it, as it is not on the Dutch keyboard layout (http://www.goodtyping.com/teclatDUT.htm). I don't think that the Dutch-related Flemish languages have their own language codes, so "nl" alone should be OK.
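For illustration, the two points made about U+0133 in comments 1 and 2 can be checked against the Unicode character database. This is a minimal sketch using Python's standard `unicodedata` module; the constant and variable names are chosen here for the example and do not come from the patch.

```python
import unicodedata

# U+0133 is the single-character ligature mentioned in comment 1.
LIGATURE_SMALL_IJ = "\u0133"    # LATIN SMALL LIGATURE IJ
LIGATURE_CAPITAL_IJ = "\u0132"  # LATIN CAPITAL LIGATURE IJ

# Being one character, the ligature has a plain uppercase mapping,
# which is why capitalizing it already gives the desired result.
uppercased = LIGATURE_SMALL_IJ.upper()

# It is a compatibility decomposable character: compatibility
# normalization (NFKD) splits it into the two letters "i" and "j",
# while canonical normalization (NFD) leaves it untouched.
compat_form = unicodedata.normalize("NFKD", LIGATURE_SMALL_IJ)
canonical_form = unicodedata.normalize("NFD", LIGATURE_SMALL_IJ)
```

The compatibility decomposition is what ties the ligature back to the ordinary two-letter spelling that Dutch text normally uses.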
Assignee
Comment 3•13 years ago
This implements the requested behavior for elements where lang="nl". Note that it only applies the digraph-specific behavior (capitalizing the "j" as well) if both "i" and "j" were originally lowercase; thus, "ijsland" -> "IJsland", but "Ijsland" is unchanged by text-transform:capitalize, on the assumption that if it was already entered with mixed case in the "Ij" pair, this was a deliberate choice.
Attachment #610631 -
Flags: review?(smontagu)
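The rule described in comment 3 can be sketched as follows. This is not the code from the attached patch; `capitalize_word` is a hypothetical helper, and matching the language tag by exact equality with "nl" is an assumption made for the example.

```python
def capitalize_word(word: str, lang: str = "") -> str:
    """Uppercase the first letter of a word, as text-transform: capitalize does.

    For lang == "nl", a leading "ij" is treated as a digraph, but only when
    both letters were originally lowercase (hypothetical sketch of the rule
    in comment 3, not the actual implementation).
    """
    if not word:
        return word
    # Digraph case: both "i" and "j" are lowercase and adjacent at the start.
    if lang == "nl" and word.startswith("ij"):
        return "IJ" + word[2:]
    # Default case: only the first letter changes; the rest is left as typed.
    return word[0].upper() + word[1:]
```

With this sketch, "ijsland" becomes "IJsland" for Dutch content, while "Ijsland" is returned unchanged because its first letter is already uppercase and the lowercase "ij" test does not match.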
Assignee
Comment 4•13 years ago
Attachment #610637 -
Flags: review?(smontagu)
Reporter
Updated•13 years ago
Keywords: dev-doc-needed
Summary: The dutch IJ digram is not handled correctly by text-transform:capitalize → The dutch IJ digraph is not handled correctly by text-transform:capitalize
Comment 5•13 years ago
Comment on attachment 610631: patch, implement Dutch-specific capitalization for "ij"
Review of attachment 610631:
-----------------------------------------------------------------
This has a bug: when the "j" isn't adjacent to the "i", it still gets capitalized. Maybe also, instead of adding another boolean dutchCasing (and in later bugs adding greekCasing, lithuanianCasing and I don't know what all else), have an enum of languages and a languageSpecificCasing variable (or some shorter name)? There will only ever be one applicable language, unless I am very much mistaken.
Attachment #610631 -
Flags: review?(smontagu) → review-
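To make the two review points concrete, here is a hypothetical sketch of a character-by-character capitalizer that uses a single enum value instead of one boolean per language, and that clears its "expecting j" state whenever the next character is not a "j". The names `LanguageSpecificCasing`, `capitalize_text` and `pending_j` are invented for this example, and treating every non-letter as a word boundary is a simplification; none of this is taken from the patch under review.

```python
from enum import Enum, auto


class LanguageSpecificCasing(Enum):
    """One value per special-cased language, replacing per-language booleans."""
    NONE = auto()
    DUTCH = auto()


def capitalize_text(text: str,
                    casing: LanguageSpecificCasing = LanguageSpecificCasing.NONE) -> str:
    out = []
    at_word_start = True
    # True only for the character immediately after a capitalized
    # word-initial lowercase "i" in Dutch mode.
    pending_j = False
    for ch in text:
        if pending_j and ch == "j":
            out.append("J")
            pending_j = False
            continue
        # Any other character ends the digraph window, so a later
        # non-adjacent "j" (as in "ixj") is left alone.
        pending_j = False
        if ch.isalpha():
            if at_word_start:
                out.append(ch.upper())
                pending_j = (casing is LanguageSpecificCasing.DUTCH and ch == "i")
            else:
                out.append(ch)
            at_word_start = False
        else:
            out.append(ch)
            at_word_start = True
    return "".join(out)
```

Resetting `pending_j` unconditionally before handling the current character is the design choice that guards against the non-adjacent case raised in the review.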
Comment 6•13 years ago
Comment on attachment 610637: reftest for Dutch "ij" capitalization
Review of attachment 610637:
-----------------------------------------------------------------
Add a case with non-adjacent i/j to test the bug I mentioned in the previous comment.
Attachment #610637 -
Flags: review?(smontagu) → review+
Assignee
Comment 7•13 years ago
Good catch, thanks. Fixed in this version.
Assignee: nobody → jfkthame
Attachment #610631 -
Attachment is obsolete: true
Attachment #610996 -
Flags: review?(smontagu)
Assignee
Comment 8•13 years ago
Added a case with "ixj" to the test; carry forward r=smontagu.
Attachment #610637 -
Attachment is obsolete: true
Attachment #610998 -
Flags: review+
Comment 9•13 years ago
Just a (somewhat tangential) thought: Perhaps it would be better to disentangle the name of the change from the language(s) associated with it? This is more related to the Turkish transformation than the Dutch one, but it's possible for a transformation to be used by more than one language (as with dotless I). So why not name the transformations after what they do, rather than who uses them? Like capitalizedIJDigraph or capitalizeDotlessI, or eIJDigraph or eDotlessI, or something like that?
Assignee
Comment 10•13 years ago
(In reply to Gordon P. Hemsley [:gphemsley] from comment #9)
> Just a (somewhat tangential) thought:
> Perhaps it would be better to disentangle the name of the change from the
> language(s) associated with it?
We could, although I think it's perfectly reasonable to use the name of a well-known exemplar language even though the behavior may be "borrowed" by other languages that have a similar writing system. If we were exposing this to users somehow, it would need to be carefully considered, but here it's just a question of naming a local variable within the code. (Essentially the same thing happens for scripts, many of which are named after the "primary" language that uses them even if they get adopted for writing other languages as well.)
> This is more related to the Turkish transformation than the Dutch one, but
> it's possible for a transformation to be used by more than one language (as
> with dotless I). So why not name the transformations after what they do,
> rather than who uses them? Like capitalizedIJDigraph or capitalizeDotlessI,
> or eIJDigraph or eDotlessI, or something like that?
Personally, I find it most natural to label the behavior as "Turkish" even though it is used by several other languages; I think they have modeled their writing systems on the Turkish one. But I don't feel particularly strongly about it - Simon, any opinion?
Comment 11•13 years ago
At most I think we might add a comment that these are mnemonic names of exemplar languages that have the behaviour we are implementing. Getting it 100% right and pleasing everybody is an unattainable goal anyway -- we have enough problems already with user-facing names for various regions and languages.
Comment 12•13 years ago
(In reply to Jonathan Kew (:jfkthame) from comment #10)
> (In reply to Gordon P. Hemsley [:gphemsley] from comment #9)
> > Just a (somewhat tangential) thought:
> > Perhaps it would be better to disentangle the name of the change from the
> > language(s) associated with it?
>
> We could, although I think it's perfectly reasonable to use the name of a
> well-known exemplar language even though the behavior may be "borrowed" by
> other languages that have a similar writing system. If we were exposing this
> to users somehow, it would need to be carefully considered, but here it's
> just a question of naming a local variable within the code.
What happens when one language has multiple different transformation requirements, and then another language has one of those but not another? Wouldn't you wind up being in the same position there anyway?
Also, beyond that, having a more descriptive name would help developers who come along in the future who perhaps are not as familiar with the various idiosyncrasies of the language used to name the variable. Or what if a language that is used as a variable name then decides that they are no longer going to use that rule? Then you have a rule that is named after a language that doesn't even use it.
I am also trying to spread the notion of decoupling a language from its writing system or writing conventions. All languages can be written many different ways; it just makes more sense to name a particular convention after the convention itself, rather than any particular language that might be using it at any given time.
> (Essentially the same thing happens for scripts, many of which are named
> after the "primary" language that uses them even if they get adopted for
> writing other languages as well.)
True, but that's probably a different discussion for a different time. ;)
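The naming alternative argued for in comments 9 and 12 could look roughly like the sketch below, where the enum values describe the behavior and a separate table records which languages use it. This only illustrates the proposal and is not what was committed (comment 11 suggests keeping exemplar-language names with an explanatory comment). The names `CasingBehavior`, `BEHAVIOR_BY_LANGUAGE` and `casing_behavior_for` are hypothetical, and matching on the primary language subtag is an assumption.

```python
from enum import Enum, auto


class CasingBehavior(Enum):
    """Behaviors named after what they do rather than who uses them."""
    DEFAULT = auto()
    IJ_DIGRAPH = auto()  # leading "ij" capitalizes to "IJ"
    DOTLESS_I = auto()   # dotted and dotless "i" are distinct letters


# Several languages can share one behavior: Turkish and Azerbaijani
# both distinguish dotted and dotless "i".
BEHAVIOR_BY_LANGUAGE = {
    "nl": CasingBehavior.IJ_DIGRAPH,
    "tr": CasingBehavior.DOTLESS_I,
    "az": CasingBehavior.DOTLESS_I,
}


def casing_behavior_for(lang: str) -> CasingBehavior:
    """Look up the behavior for a language tag by its primary subtag."""
    primary = lang.lower().split("-")[0]
    return BEHAVIOR_BY_LANGUAGE.get(primary, CasingBehavior.DEFAULT)
```

Under this layout, a language adopting or dropping a convention changes only the table, while the behavior names stay meaningful.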
Updated•13 years ago
Attachment #610996 -
Flags: review?(smontagu) → review+
Assignee
Comment 13•13 years ago
Pushed to inbound, with added comments for the enum values: https://hg.mozilla.org/integration/mozilla-inbound/rev/bb53aec4a302 https://hg.mozilla.org/integration/mozilla-inbound/rev/324368cce885
Target Milestone: --- → mozilla14
Comment 14•13 years ago
https://hg.mozilla.org/mozilla-central/rev/bb53aec4a302 https://hg.mozilla.org/mozilla-central/rev/324368cce885
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reporter
Comment 15•13 years ago
I've updated https://developer.mozilla.org/en/CSS/text-transform (summary, examples and the browser compatibility table) and added a note in https://developer.mozilla.org/en/Firefox_14_for_developers
Keywords: dev-doc-needed → dev-doc-complete