Last Comment Bug 731536 - text-transform:capitalize behavior is incorrect per CSS3-Text spec
: text-transform:capitalize behavior is incorrect per CSS3-Text spec
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Layout: Text (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla14
Assigned To: Jonathan Kew (:jfkthame)
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-02-29 02:31 PST by Jonathan Kew (:jfkthame)
Modified: 2012-03-16 05:55 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch, rename IsPunctuationMark and don't use it in text-transform:capitalize (4.85 KB, patch)
2012-02-29 04:28 PST, Jonathan Kew (:jfkthame)
smontagu: review-
Details | Diff | Splinter Review
patch, reftest for text-transform:capitalize with punctuation (2.02 KB, patch)
2012-02-29 04:30 PST, Jonathan Kew (:jfkthame)
smontagu: review+
Details | Diff | Splinter Review
patch, adjust text-transform:capitalize to follow css3 text spec (4.05 KB, patch)
2012-03-05 02:52 PST, Jonathan Kew (:jfkthame)
smontagu: review+
Details | Diff | Splinter Review
followup, add a reftest for Deseret text capitalization with punctuation marks (2.17 KB, patch)
2012-03-14 02:09 PDT, Jonathan Kew (:jfkthame)
smontagu: review+
Details | Diff | Splinter Review

Description Jonathan Kew (:jfkthame) 2012-02-29 02:31:10 PST
The method nsContentUtils::IsPunctuationMark is misleadingly named, because it is designed specifically to support CSS first-letter, but for this purpose it recognizes a specific _subset_ of punctuation marks, not all Unicode characters with GC=P*.

We should rename it to something like IsPunctuationForFirstLetter to make its specialized nature clear.

Currently, there appear to be two clients of this method: the implementation of first-letter, for which it does the right thing, and the implementation of text-transform:capitalize. In the latter case, it leads to the oddity shown by this example:

data:text/html;charset=utf-8,<div style="text-transform:capitalize">(this) "is" [a] –short– -test- «for» *the* ¿capitalize? ¡transform!

where Firefox does _not_ capitalize the words "short" or "test". Safari, OTOH, does capitalize them.

(The current CSS3 Text spec says that "capitalize" should put "the first _letter_ of each word in titlecase", where "letter" is defined as "a character belonging to one of the Letter or Number general categories". According to this, Safari is right and Firefox is wrong. Using a definition of IsPunctuationMark that recognizes all GC=P* characters would be an improvement, though )

So I think we should (a) modify the implementation of capitalize to avoid relying on nsContentUtils::IsPunctuationMark, as that is not the correct test; and (b) change the name of that method to clarify its specialized nature.
Comment 1 Jonathan Kew (:jfkthame) 2012-02-29 02:34:26 PST
(In reply to Jonathan Kew (:jfkthame) from comment #0)
> Using a definition
> of IsPunctuationMark that recognizes all GC=P* characters would be an
> improvement, though )

Oops, unfinished sentence! I was going to say that while a more complete IsPunctuationMark would be an improvement, it still ignores characters with Symbol classes; to follow CSS3 Text, we should be testing for !IsAlphanumeric rather than IsPunctuation to skip over leading characters.
Comment 2 Jonathan Kew (:jfkthame) 2012-02-29 04:28:49 PST
Created attachment 601574 [details] [diff] [review]
patch, rename IsPunctuationMark and don't use it in text-transform:capitalize
Comment 3 Jonathan Kew (:jfkthame) 2012-02-29 04:30:54 PST
Created attachment 601575 [details] [diff] [review]
patch, reftest for text-transform:capitalize with punctuation

It's not an exhaustive test, obviously, but includes a few cases that fail currently and are fixed by the patch here.
Comment 4 Jonathan Kew (:jfkthame) 2012-02-29 14:01:29 PST
It turns out the patch here causes the text-transform/all-title.html reftest to fail, mostly because it changes behavior with various characters in the Symbol category (e.g. the circled letters at U+24Dx). So we'll need to consider carefully what changes are or aren't desirable - and perhaps take this up with the CSS WG.
Comment 5 Simon Montagu :smontagu 2012-02-29 23:43:09 PST
Looking at bug 389707 comment 4, using the same punctuation classes in capitalization as in first-letter was by design. I think we should discuss this in www-style before making any change to behaviour here. I agree with the naming change, though, and in fact I was going to make that change anyway before checking in bug 731222 in response to your review comment there.
Comment 6 Simon Montagu :smontagu 2012-03-01 00:31:45 PST
Comment on attachment 601574 [details] [diff] [review]
patch, rename IsPunctuationMark and don't use it in text-transform:capitalize

Review of attachment 601574 [details] [diff] [review]:
-----------------------------------------------------------------

r- until we get the spec sorted out.
Comment 7 Simon Montagu :smontagu 2012-03-01 00:33:57 PST
Comment on attachment 601575 [details] [diff] [review]
patch, reftest for text-transform:capitalize with punctuation

Review of attachment 601575 [details] [diff] [review]:
-----------------------------------------------------------------

... but feel free to check this in as a failing test, mutatis mutandis, in the meanwhile.
Comment 8 Jonathan Kew (:jfkthame) 2012-03-01 06:15:13 PST
(In reply to Simon Montagu from comment #6)
> Comment on attachment 601574 [details] [diff] [review]
> patch, rename IsPunctuationMark and don't use it in text-transform:capitalize
> 
> Review of attachment 601574 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r- until we get the spec sorted out.

Yep, fair enough!

There was a lengthy www-style thread on text-transform:capitalize a year ago, actually, which I've been re-reading. It's not clear to me that there's much point in re-opening it - there are plenty of cases where no spec or algorithm will be 100% "right", and there's little interest in ratholing about edge cases that can never be fully solved - but I've posted a message basically asking for confirmation that the spec as it currently stands is what we should aim to implement, particularly with regard to the areas where we'll need to modify our current behavior.
Comment 9 Jonathan Kew (:jfkthame) 2012-03-05 02:43:53 PST
Seeing no dissent - or indeed any interest in more discussion on the topic! - in response to http://lists.w3.org/Archives/Public/www-style/2012Mar/0018.html, I think we should simply adjust our implementation to conform to the current css3-text spec, making it capitalize the first alphanumeric character rather than the first character that is not in specific (first-letter) punctuation categories.

This provides a more intuitive result for authors, as well as better matching what the other browsers are doing (though none of them currently follow the spec entirely, as described in the www-style message).
Comment 10 Jonathan Kew (:jfkthame) 2012-03-05 02:52:30 PST
Created attachment 602840 [details] [diff] [review]
patch, adjust text-transform:capitalize to follow css3 text spec

Re-requesting review here (for updated patch), after going through the spec and past www-style discussions. This adjusts our implementation to conform to the spec regarding which character is treated as the "letter" that gets capitalized.

For completeness, I've included surrogate handling in SetupCapitalization, although text-transform:capitalize won't actually work for Deseret until we also fix bugs 210501 and 605021.
Comment 12 Jonathan Kew (:jfkthame) 2012-03-14 02:09:00 PDT
Created attachment 605689 [details] [diff] [review]
followup, add a reftest for Deseret text capitalization with punctuation marks
Comment 15 Marco Bonardo [::mak] 2012-03-16 05:55:44 PDT
https://hg.mozilla.org/mozilla-central/rev/880369fd5beb

Note You need to log in before you can comment on or make changes to this bug.