Last Comment Bug 399941 - first-letter should include both leading and trailing punctuation
: first-letter should include both leading and trailing punctuation
Status: RESOLVED FIXED
: css2
Product: Core
Classification: Components
Component: Layout: Text (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: Simon Montagu :smontagu
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-10-15 20:48 PDT by Simon Montagu :smontagu
Modified: 2008-07-17 14:56 PDT (History)
3 users (show)
roc: wanted‑next+
roc: blocking1.9-
roc: wanted1.9-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Testcase (1.81 KB, text/html)
2007-10-15 20:48 PDT, Simon Montagu :smontagu
no flags Details
Patch (2.13 KB, patch)
2007-10-16 13:44 PDT, Simon Montagu :smontagu
no flags Details | Diff | Review
Patch (3.05 KB, patch)
2007-10-17 16:19 PDT, Simon Montagu :smontagu
no flags Details | Diff | Review
Various edge cases (1.10 KB, text/html)
2008-06-26 07:31 PDT, Simon Montagu :smontagu
no flags Details
Patch (39.64 KB, patch)
2008-06-26 13:17 PDT, Simon Montagu :smontagu
no flags Details | Diff | Review
Addressed comments (56.05 KB, patch)
2008-06-29 04:52 PDT, Simon Montagu :smontagu
roc: review+
roc: superreview+
Details | Diff | Review

Description Simon Montagu :smontagu 2007-10-15 20:48:38 PDT
Created attachment 285035 [details]
Testcase

From http://www.w3.org/TR/CSS21/selector.html#first-letter :
'Punctuation (i.e, characters defined in Unicode in the "open" (Ps), "close" (Pe), "initial" (Pi). "final" (Pf) and "other" (Po) punctuation classes), that precedes or follows the first letter should be included'

Currently we include punctuation preceding the first letter, but not following. Not a regression, but requesting blocking1.9 for standards compliance.
Comment 1 Simon Montagu :smontagu 2007-10-15 20:58:47 PDT
Just for the record, CSS2 only says 'precedes the first letter'.
Comment 2 Jesse Ruderman 2007-10-15 22:38:50 PDT
Comment 1 refers to CSS2.0, which is no longer what "CSS2" means aiui.
Comment 3 Simon Montagu :smontagu 2007-10-16 13:44:00 PDT
Created attachment 285139 [details] [diff] [review]
Patch
Comment 4 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-16 21:24:19 PDT
Should probably advance to the end of the cluster containing any trailing punctuation.

Should probably call FindClusterEnd to do this (and use that to instead of the find-end-of-cluster-loop already in the FindFirstLetterRange).
Comment 5 Simon Montagu :smontagu 2007-10-17 16:19:41 PDT
Created attachment 285290 [details] [diff] [review]
Patch

This passes all the reftests in layout/reftests/first-letter without assertions, plus some extra ones I added to cover trailing punctuation, and the testcase in bug 329069 (which we should really add a reftest for). I want to test more scenarios before asking for review, but interim comments are always welcome.
Comment 6 Simon Montagu :smontagu 2007-10-18 06:19:01 PDT
Comment on attachment 285290 [details] [diff] [review]
Patch

>+  // consume clusters that start with whitespace or punctuation
>   for (i = 0; i < length; ++i) {
>+    if (IsTrimmableSpace(aFrag, aOffset + i) ||
>+        nsTextFrameUtils::IsPunctuationMark(aFrag->CharAt(aOffset + i))) {
>+      iter.SetSkippedOffset(aOffset + i);
>+      FindClusterEnd(aTextRun, aOffset + length, &iter);
>+      i = iter.GetSkippedOffset() - aOffset;
>+    } else {
>       break;
>+    }

I'm not sure about this part of the algorithm. Do we really want to include punctuation interspersed with whitespace? Would this be better?

  for (i = GetTrimmableWhitespaceCount(aFrag, aOffset, length, 1);
       i < length; ++i) {
    if (nsTextFrameUtils::IsPunctuationMark(aFrag->CharAt(aOffset + i))) {

I'm also not sure if I should be using iter.[GS]etSkippedOffset or [GS]etOriginalOffset
Comment 7 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2007-10-18 12:30:21 PDT
> Do we really want to include punctuation interspersed with whitespace?

I think we might as well.

I think instead of making i the key variable, we should just use iter as the key variable. Just compare iter.GetOriginalOffset() to aOffset + aLength.
Comment 8 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-04-17 02:06:04 PDT
Simon, were you planning to finish this?
Comment 9 Simon Montagu :smontagu 2008-06-26 07:26:15 PDT
After fiddling with this for some time, I'm fairly sure that we don't want to include punctuation interspersed with whitespace. Consider this case:

"*" is an asterisk.

The first-letter style would be applied to '"*" i', which seems totally wrong.

A question that bothers me more, though, is: when is a letter not a letter? CSS doesn't explicitly define what "letter" means in this context, though it does go out of its way to specify that the ':first-letter' also applies if the first letter is in fact a digit. To me this implies that if the first letter is in fact a punctuation mark (other than those specified to be included) or a symbol character, :first-letter should not apply.
Comment 10 Simon Montagu :smontagu 2008-06-26 07:31:50 PDT
Created attachment 326912 [details]
Various edge cases
Comment 11 Simon Montagu :smontagu 2008-06-26 13:17:09 PDT
Created attachment 326983 [details] [diff] [review]
Patch

This does what I believe is right, and only applies first-letter style to letters and numbers (with preceding and following punctuation). It also adds support for surrogate characters to IsPunctuationMark(). I'll be adding lots of reftests too.
Comment 12 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-06-26 16:42:11 PDT
+  static nsCOMPtr<nsIUGenCategory> genCatService =
+    do_GetService(NS_UNICHARCATEGORY_CONTRACTID);

You might as well cache this with the other services that nsContentUtils caches.

+  // If the next character is not a letter or number, there is no first-letter.
+  // Return PR_TRUE so that we don't go on looking, but set aLength to 0.

So this means that '<span>"</span> F' does get first-letter style applied to the punctuation. That's unfortunate but I guess anything better is a lot harder, so I'm OK with it.

The code for "consume cluster that starts with punctuation" could be shared.

Looks good other than that.
Comment 13 Simon Montagu :smontagu 2008-06-29 04:52:07 PDT
Created attachment 327312 [details] [diff] [review]
Addressed comments
Comment 14 Simon Montagu :smontagu 2008-07-02 11:27:43 PDT
Checked in
Comment 15 Simon Montagu :smontagu 2008-07-17 14:56:43 PDT
Note: if we take this on the 3.0 branch, the branch patch should include the fix to bug 445626

Note You need to log in before you can comment on or make changes to this bug.