Closed Bug 740477 Opened 13 years ago Closed 13 years ago

The dutch IJ digraph is not handled correctly by text-transform:capitalize

Status:

RESOLVED FIXED

Milestone:

mozilla14

People

(Reporter: teoli, Assigned: jfkthame)

References

(
URL
)

Details

(Keywords: dev-doc-complete)

Attachments

(2 files, 2 obsolete files)

patch, implement Dutch-specific capitalization for "ij"; 13 years ago; Jonathan Kew [:jfkthame]; 3.08 KB, patch; smontagu: review-
reftest for Dutch "ij" capitalization; 13 years ago; Jonathan Kew [:jfkthame]; 1.64 KB, patch; smontagu: review+
patch v2, implement Dutch-specific capitalization for "ij"; 13 years ago; Jonathan Kew [:jfkthame]; 4.27 KB, patch; smontagu: review+
reftest for Dutch "ij" capitalization, v2; 13 years ago; Jonathan Kew [:jfkthame]; 1.74 KB, patch; jfkthame: review+

Jean-Yves Perrier [:teoli]

Reporter

Description

•

13 years ago

The Dutch language treats ij as a digraph (see link for reference). This means that if a word is capitalized and starts with ij, both letters are capitalized.

ijsland -> IJsland

This non-standard behaviour affects only text-transform: capitalize, not lowercase or uppercase. Also, there are a few words in Dutch where ij is not a digraph, but as far as I know the ij is never at the beginning of these words.

Jonathan Kew [:jfkthame]

Assignee

Comment 1

•

13 years ago

Note that you'll get the desired behavior if such words are spelled using the Unicode character U+0133 LATIN SMALL LIGATURE IJ, rather than the separate characters "i" and "j".

Compare:
data:text/html;charset=utf-8,<div style="text-transform: capitalize">ijsland
data:text/html;charset=utf-8,<div style="text-transform: capitalize">ĳsland

However, perhaps we should consider special-case handling for the sequence "ij" in content that is specifically tagged as lang="nl".
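The difference between the two test cases can also be reproduced outside the browser. Below is a minimal Python sketch, assuming a language-neutral rule that uppercases only the first character of a word. The helper name capitalize_first is hypothetical and is not taken from the Gecko code; it only illustrates why the ligature works and the two-letter spelling does not.

```python
# U+0133 LATIN SMALL LIGATURE IJ is one character with its own uppercase
# mapping (U+0132), so a generic "uppercase the first character" rule
# capitalizes the whole digraph. The two-letter spelling "ij" is not.

def capitalize_first(word: str) -> str:
    """Uppercase only the first character (hypothetical, language-neutral)."""
    return word[:1].upper() + word[1:]

cap_ligature = capitalize_first("\u0133sland")  # ligature spelling
cap_separate = capitalize_first("ijsland")      # separate "i" + "j"

print(cap_ligature)
print(cap_separate)
```

With the separate letters, only the "i" is uppercased, which is the behavior reported in this bug.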

Jean-Yves Perrier [:teoli]

Reporter

Comment 2

•

13 years ago

Yes, the Unicode character does work but its use is discouraged by Unicode (it is mainly there for legacy purposes): see Unicode 6.1, Ch3 D66 Compatibility decomposable character. Anyway, nobody uses it as it is not on the Dutch keyboard layout (http://www.goodtyping.com/teclatDUT.htm ).
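The compatibility status of U+0133 can be checked with Python's standard unicodedata module. This is only an illustration of the Unicode data, not of anything in the patch:

```python
import unicodedata

# U+0133 carries a <compat> decomposition to the two letters "i" + "j".
decomp = unicodedata.decomposition("\u0133")

# Compatibility normalization (NFKC) replaces the ligature with "ij";
# canonical normalization (NFC) leaves it untouched.
nfkc = unicodedata.normalize("NFKC", "\u0133sland")
nfc = unicodedata.normalize("NFC", "\u0133sland")

print(decomp)
print(nfkc == "ijsland", nfc == "\u0133sland")
```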

I don't think that Dutch-related Flemish languages have their own language codes, so "nl" alone should be OK.

Simon Montagu :smontagu

Updated

•

13 years ago

Comment 3

•

13 years ago

Attached patch: patch, implement Dutch-specific capitalization for "ij" (obsolete)

This implements the requested behavior for elements where lang="nl".

Note that it only applies the digraph-specific behavior (capitalizing the "j" as well) if both "i" and "j" were originally lowercase; thus, "ijsland" -> "IJsland", but "Ijsland" is unchanged by text-transform:capitalize, on the assumption that if it was already entered with mixed case in the "Ij" pair, this was a deliberate choice.
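The rule described above can be sketched in Python. This is not the actual patch (which is C++ inside Gecko); the function name capitalize_word and the exact-match check on the "nl" language tag are assumptions made for illustration only.

```python
def capitalize_word(word: str, lang: str = "") -> str:
    """Capitalize one word; hypothetical sketch of the described rule."""
    if not word:
        return word
    # Dutch special case: applies only when the word starts with an
    # adjacent, all-lowercase "ij". Mixed-case "Ij" is left alone on the
    # assumption that the author entered it deliberately.
    if lang == "nl" and word.startswith("ij"):
        return "IJ" + word[2:]
    # Default: uppercase the first character only.
    return word[0].upper() + word[1:]

print(capitalize_word("ijsland", "nl"))  # Dutch digraph rule applies
print(capitalize_word("Ijsland", "nl"))  # already mixed case, unchanged
print(capitalize_word("ijsland", "en"))  # no special case outside "nl"
```

Because startswith("ij") requires the two letters to be adjacent, a word such as "ixj" only has its first letter uppercased.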

Attachment #610631 - Flags: review?(smontagu)

Jonathan Kew [:jfkthame]

Assignee

Comment 4

•

13 years ago

Attached patch: reftest for Dutch "ij" capitalization (obsolete)

Attachment #610637 - Flags: review?(smontagu)

Jean-Yves Perrier [:teoli]

Reporter

Updated

•

13 years ago

Keywords: dev-doc-needed

Summary: The dutch IJ digram is not handled correctly by text-transform:capitalize → The dutch IJ digraph is not handled correctly by text-transform:capitalize

Simon Montagu :smontagu

Comment 5

•

13 years ago

Comment on attachment 610631
patch, implement Dutch-specific capitalization for "ij"

Review of attachment 610631:
-----------------------------------------------------------------

This has a bug: when the "j" isn't adjacent to the "i", it still gets capitalized.

Maybe also, instead of adding another boolean dutchCasing (and in later bugs adding greekCasing, lithuanianCasing and I don't know what all else), have an enum of languages and a languageSpecificCasing variable (or some shorter name)? There will only ever be one applicable language, unless I am very much mistaken.
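A rough Python sketch of the enum idea, to show the shape of the suggestion rather than the real Gecko code. All names here (LanguageSpecificCasing, casing_for_lang, the mapping table) are hypothetical, the TURKISH value is included purely as a second illustrative entry, and matching on the primary language subtag is an assumption.

```python
from enum import Enum, auto


class LanguageSpecificCasing(Enum):
    """Hypothetical enum; one value replaces one boolean per language."""
    NONE = auto()
    DUTCH = auto()    # "ij" treated as a digraph when capitalizing
    TURKISH = auto()  # dotted/dotless i distinction


# Illustrative mapping from primary language subtag to casing behavior.
_CASING_BY_LANG = {
    "nl": LanguageSpecificCasing.DUTCH,
    "tr": LanguageSpecificCasing.TURKISH,
}


def casing_for_lang(lang: str) -> LanguageSpecificCasing:
    """Return the single applicable casing mode for a tag such as 'nl-BE'."""
    primary = lang.lower().split("-")[0]
    return _CASING_BY_LANG.get(primary, LanguageSpecificCasing.NONE)


print(casing_for_lang("nl"))
print(casing_for_lang("en"))
```

Since only one language applies to a given run of text, a single enum-valued variable can stand in for any number of per-language flags.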

Attachment #610631 - Flags: review?(smontagu) → review-

Simon Montagu :smontagu

Comment 6

•

13 years ago

Comment on attachment 610637
reftest for Dutch "ij" capitalization

Review of attachment 610637:
-----------------------------------------------------------------

Add a case with non-adjacent i/j to test the bug I mentioned in the previous comment.

Attachment #610637 - Flags: review?(smontagu) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 7

•

13 years ago

Attached patch: patch v2, implement Dutch-specific capitalization for "ij"

Good catch, thanks. Fixed in this version.

Assignee: nobody → jfkthame

Attachment #610631 - Attachment is obsolete: true

Attachment #610996 - Flags: review?(smontagu)

Jonathan Kew [:jfkthame]

Assignee

Comment 8

•

13 years ago

Attached patch: reftest for Dutch "ij" capitalization, v2

Added a case with "ixj" to the test; carry forward r=smontagu.

Attachment #610637 - Attachment is obsolete: true

Attachment #610998 - Flags: review+

Gordon P. Hemsley [:GPHemsley]

Comment 9

•

13 years ago

Just a (somewhat tangential) thought:
Perhaps it would be better to disentangle the name of the change from the language(s) associated with it?

This is more related to the Turkish transformation than the Dutch one, but it's possible for a transformation to be used by more than one language (as with dotless I). So why not name the transformations after what they do, rather than who uses them? Like capitalizedIJDigraph or capitalizeDotlessI, or eIJDigraph or eDotlessI, or something like that?

Jonathan Kew [:jfkthame]

Assignee

Comment 10

•

13 years ago

(In reply to Gordon P. Hemsley [:gphemsley] from comment #9)
> Just a (somewhat tangential) thought:
> Perhaps it would be better to disentangle the name of the change from the
> language(s) associated with it?

We could, although I think it's perfectly reasonable to use the name of a well-known exemplar language even though the behavior may be "borrowed" by other languages that have a similar writing system. If we were exposing this to users somehow, it would need to be carefully considered, but here it's just a question of naming a local variable within the code.

(Essentially the same thing happens for scripts, many of which are named after the "primary" language that uses them even if they get adopted for writing other languages as well.)

> This is more related to the Turkish transformation than the Dutch one, but
> it's possible for a transformation to be used by more than one language (as
> with dotless I). So why not name the transformations after what they do,
> rather than who uses them? Like capitalizedIJDigraph or capitalizeDotlessI,
> or eIJDigraph or eDotlessI, or something like that?

Personally, I find it most natural to label the behavior as "Turkish" even though it is used by several other languages; I think they have modeled their writing systems on the Turkish one. But I don't feel particularly strongly about it - Simon, any opinion?

Simon Montagu :smontagu

Comment 11

•

13 years ago

At most I think we might add a comment that these are mnemonic names of exemplar languages that have the behaviour we are implementing. Getting it 100% right and pleasing everybody is an unattainable goal anyway -- we have enough problems already with user-facing names for various regions and languages.

Gordon P. Hemsley [:GPHemsley]

Comment 12

•

13 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #10)
> (In reply to Gordon P. Hemsley [:gphemsley] from comment #9)
> > Just a (somewhat tangential) thought:
> > Perhaps it would be better to disentangle the name of the change from the
> > language(s) associated with it?
> 
> We could, although I think it's perfectly reasonable to use the name of a
> well-known exemplar language even though the behavior may be "borrowed" by
> other languages that have a similar writing system. If we were exposing this
> to users somehow, it would need to be carefully considered, but here it's
> just a question of naming a local variable within the code.

What happens when one language has multiple different transformation requirements, and then another language has one of those but not another? Wouldn't you wind up being in the same position there anyway?

Also, beyond that, having a more descriptive name would help developers who come along in the future who perhaps are not as familiar with the various idiosyncrasies of the language used to name the variable. Or what if a language that is used as a variable name then decides that they are no longer going to use that rule? Then you have a rule that is named after a language that doesn't even use it.

I am also trying to spread the notion of decoupling a language from its writing system or writing conventions. All languages can be written many different ways; it just makes more sense to name a particular convention after the convention itself, rather than any particular language that might be using it at any given time.

> (Essentially the same thing happens for scripts, many of which are named
> after the "primary" language that uses them even if they get adopted for
> writing other languages as well.)

True, but that's probably a different discussion for a different time. ;)

Simon Montagu :smontagu

Updated

•

13 years ago

Attachment #610996 - Flags: review?(smontagu) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 13

•

13 years ago

Pushed to inbound, with added comments for the enum values:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb53aec4a302
https://hg.mozilla.org/integration/mozilla-inbound/rev/324368cce885

Target Milestone: --- → mozilla14

Ed Morley [:emorley]

Comment 14

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/bb53aec4a302
https://hg.mozilla.org/mozilla-central/rev/324368cce885

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Jean-Yves Perrier [:teoli]

Reporter

Comment 15

•

13 years ago

I've updated https://developer.mozilla.org/en/CSS/text-transform (summary, examples and the browser compatibility table) and added a note in https://developer.mozilla.org/en/Firefox_14_for_developers

Keywords: dev-doc-needed → dev-doc-complete
