Closed Bug 389421 Opened 15 years ago Closed 15 years ago

"layout.word_select.stop_at_punctuation" preference no longer works

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: roc)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(1 file, 2 obsolete files)

BUILD: Current Linux build

STEPS TO REPRODUCE:

1)  Load http://www.mozilla.org/
2)  Click after the URL in the URL bar
3)  Hit the right arrow to make sure the URL is not selected
4)  Hit Ctrl-backspace

EXPECTED RESULTS: Entire URL is deleted

ACTUAL RESULTS: Only the '/' at the end is deleted.

NOTE:  You can also double-click the URL in the URL bar -- that should select the whole URL but only selects parts between '/' and '.'.

Basically, the behavior is as if layout.word_select.stop_at_punctuation is set to true... but it's not.
Flags: blocking1.9?
There's a front-end patch to fix the URL-bar issue a different way, for what it's worth, to special-case double-clicking in the URL bar on X11-based platforms.  I remember the patch getting review+ in the past few days, but I can't find the bug right now.

If that's fixed, I suspect the pref will no longer be needed.
That's bug 190615.
> If that's fixed, I suspect the pref will no longer be needed.

I do have to say I like being able to select things like URIs and variable names in code by double-clicking, though...  Oh, well.
Attached patch fix (obsolete) — Splinter Review
This reworks PeekOffsetWord a bit. The basic idea is to track punctuation state across frames and take a uniform approach to checking whether we're between punctuation and non-punctuation, and if we are, deciding whether to break or not. When we're between punctuation we always obey the pref and ignore the word-breaker results.

This also fixes bug 385565 and bug 385562. It also fixes some bugs related to word-selection across text nodes; not sure if those are filed.

Kinda guessing at the reviewers here... Uri, if you want me to ask Simon instead, I'll do that. Thanks!
Assignee: nobody → roc
Status: NEW → ASSIGNED
Attachment #274122 - Flags: superreview?(mats.palmgren)
Attachment #274122 - Flags: review?(uriber)
Comment on attachment 274122 [details] [diff] [review]
fix

actually Simon should be able to do this
Attachment #274122 - Flags: review?(uriber) → review?(smontagu)
Comment on attachment 274122 [details] [diff] [review]
fix

>-      // sawBeforeType means "we already saw characters of the type
>+      // mSawBeforeType means "we already saw characters of the type
>       // before the boundary we're looking for". Examples:
>       // 1. If we're moving forward, looking for a word beginning (i.e. a boundary
>       //    between whitespace and non-whitespace), then eatingWS==PR_TRUE means
>       //    "we already saw some whitespace".
>       // 2. If we're moving backward, looking for a word beginning (i.e. a boundary
>       //    between non-whitespace and whitespace), then eatingWS==PR_TRUE means
>       //    "we already saw some non-whitespace".

If you want to keep this comment (which might not be necessary, since you added your own explanation in the declaration of PeekWordState), you can use this opportunity to change "eatingWS" to "state.mSawBeforeType" (I forgot to do this last time I renamed the variable).
Sure. Does the rest of the code look alright to you?
Looks good, I have a question though.
Given layout.word_select.stop_at_punctuation=true and the text
"http://www.mozilla.org/projects/minefield/" with the caret to the right
of the 'j'.

When selecting words forward (Shift+Right) in a LTR input we stop after
the punctuation (so it's included in the selection) and when "deselecting"
words (Shift+Left) we stop before the punctuation (so the punctuation remains
selected).  I assume this is an intended change (for bug 385565).
But what do we want to do in the RTL case? 
With the patch, selecting forward (Shift+Right) in a RTL input we stop *before*
the punctuation (since it selects visually to the left) and for Shift+Left
we stop to the right of the punctuation (so it's not part of the selection).
This is the opposite behaviour from the LTR case.
In other words, what the patch seems to be implementing is stopping
before/after punctuation visually - should we instead implement stopping
before/after punctuation logically?
Comment on attachment 274122 [details] [diff] [review]
fix

On the Fārsī wikipedia page:
http://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C
I get a new assertion when selecting text (Ctrl+Shift+Right)
###!!! ASSERTION: invalid array index: 'i < Length()', file ../../dist/include/xpcom/nsTArray.h, line 318
Attachment #274122 - Flags: superreview?(mats.palmgren) → superreview-
(In reply to comment #8)
> But what do we want to do in the RTL case? 
> With the patch, selecting forward (Shift+Right) in a RTL input

Actually, in RTL input, Shift+Right selects *backward* (logically!)

> we stop *before*
> the punctuation (since it selects visually to the left)

If the text itself is LTR, Shift+Right stops after (i.e., to the right of) the punctuation. If the text is RTL, it still stops after (this time, to the left of) the punctuation - which is consistent with the behavior of Shift-Left in an LTR field.

> and for Shift+Left
> we stop to the right of the punctuation (so it's not part of the selection).

Only if your text is LTR, in which case, "to the right of" means "after", which is consistent with the behavior of Shift+Right in LTR context.

> This is the opposite behaviour from the LTR case.
> In other words, what the patch seems to be implementing is stopping
> before/after punctuation visually - should we instead implement stopping
> before/after punctuation logically?
> 

So, no, it does behave logically in RTL as well as LTR. You just have to remember that when the control direction is RTL, "right arrow" means backward and "left arrow" means forward (regardless of the directionality of the characters themselves).
(In reply to comment #7)
> Sure. Does the rest of the code look alright to you?
> 

I looked over it and didn't spot anything wrong, but that doesn't really say much, as this code causes my brain to spin (I've been too far away from it for too long). So I think Simon should review this as well.
Flags: blocking1.9? → blocking1.9+
Attached patch fix v2 (obsolete) — Splinter Review
Stop that assertion firing.

When moving backwards through text, the "before" offset could be at the end of the text frame. We actually have no break data for that offset, because we don't actually know if a break is allowed there. Make the word-break array big enough to hold a "false" value for that offset. (There's a similar situation for moving forwards: mWordBreaks[0] is always false; we're assuming that there's never a word break at an end of the frame.)
Attachment #274122 - Attachment is obsolete: true
Attachment #275728 - Flags: superreview?(mats.palmgren)
Attachment #275728 - Flags: review?(smontagu)
Attachment #274122 - Flags: review?(smontagu)
Comment on attachment 275728 [details] [diff] [review]
fix v2

Please add XXX to the new comment, and file a bug on it?

Looks good!  sr=mats
Attachment #275728 - Flags: superreview?(mats.palmgren) → superreview+
I noticed that with the patch, word selection does stop (even when stop_at_punctuation=true) on some characters (such as -,_,+,=, of Unicode types Pd, Pc, Sm), on which the old implementation didn't stop.
This means that unlike in 1.8, URLs are not necessarily selectable with a double-click when stop_at_punctuation=true. I'm not sure whether this is considered a problem or not.
Blocks: 206141
Duplicate of this bug: 392412
(In reply to comment #15)
> I noticed that with the patch, word selection does stop (even when
> stop_at_punctuation=true) on some characters (such as -,_,+,=, of Unicode types
> Pd, Pc, Sm), on which the old implementation didn't stop.
> This means that unlike in 1.8, URLs are not necessarily selectable with a
> double-click when stop_at_punctuation=true. I'm not sure whether this is
> considered a problem or not.

I'm slightly confused. You're saying that when stop_at_punctuation is true, in 1.8 we *didn't* stop at -,_,+,=, but now we do?

My reading of nsTextTransformer.cpp suggests we *were* stopping at those characters in 1.8. Even if we weren't, I think it's probably a good thing that we are stopping at those characters now.
(In reply to comment #17)
> (In reply to comment #15)
> > I noticed that with the patch, word selection does stop (even when
> > stop_at_punctuation=true) on some characters (such as -,_,+,=, of Unicode types
> > Pd, Pc, Sm), on which the old implementation didn't stop.
> > This means that unlike in 1.8, URLs are not necessarily selectable with a
> > double-click when stop_at_punctuation=true. I'm not sure whether this is
> > considered a problem or not.
> 
> I'm slightly confused. You're saying that when stop_at_punctuation is true, in
> 1.8 we *didn't* stop at -,_,+,=, but now we do?
> 
> My reading of nsTextTransformer.cpp suggests we *were* stopping at those
> characters in 1.8. Even if we weren't, I think it's probably a good thing that
> we are stopping at those characters now.
> 

Errr... I guess s/stop_at_punctuation=true/stop_at_punctuation=false/g in comment #15.
I'm traveling now, so I can't test with the patch, but I'm pretty sure that's what I meant. Sorry.

I'm also seeing what Uri described in comment 15.
It occurs for both stop_at_punctuation=true/false, but only when
eat_space_to_next_word=false (default on Linux but not Win32).

Testcase: place caret at the left edge, (kbd) word-select to the right:
aa++bb//cc

Fx2 "stops at +"?
             eat_space=0 eat_space=1
stop_at_p=0     no          no
stop_at_p=1     after       after

Fx2 "stops at /"?
             eat_space=0 eat_space=1
stop_at_p=0     no          no
stop_at_p=1     after       after


Trunk with patch2 "stops at +"?
             eat_space=0      eat_space=1
stop_at_p=0  before and after    no
stop_at_p=1  before and after    no

Trunk with patch2 "stops at /"?
             eat_space=0 eat_space=1
stop_at_p=0     no          no
stop_at_p=1     after       after

It seems the patch makes any word breaking character behave like a space
when eat_space_to_next_word=false.  Was that intended?

FWIW, I tested IE7 a bit and it seems they have different word select
behavior in the URL bar compared to a <textarea>.  In the URL bar they
treat plus, dot and SPACE the same, they always stop after it, never
before it (so Firefox 2/Win32 is compatible with that, given the default
eat_space_to_next_word=true). In a <textarea> however, IE7 stops both
before and after plus and dot (but not SPACE), and they treat a sequence
of those as a single unit, eg aa++..++bb stops before
the first plus and then before the first "b".
AFAICT, IE7 treats any sequence of -_#?,;:.+=(){}[]^& as a single unit,
exception for / and \ which seems to get special treatment in <textarea>s
but not the URL bar.

IMO, the current prefs doesn't map very well to native platform behavior.
For example, the native behavior on both Linux and MacOSX when extending
the selection stops before punctuation, but when narrowing the selection
it stops after it (in the direction relative to the anchor point).
For example,
with the caret at the left edge and then typing Ctrl+Shift+Right twice,
it selects "aa" and then "aa++bb", then typing Ctrl+Shift+Left twice
selects "aa++" and then "". (same as for words separated by space)

I'm still ok with the patch, but it would be nice if we could make the
"before and after" instead be "before or after depending on direction
relative to the anchor point" to better match the platform behavior.

FWIW, Opera/Win32 seems to be compatible with the Fx2/IE7-urlbar behavior,
they appear to have the same behavior in the URL bar as in <textarea>.
Opera on Linux appears to match the Win32 version.
Safari have consistent behavior in the URL bar and <textarea>, AFAICT.
Oh crud. The reason is that those characters are deliberately excluded from nsTextFrameUtils::IsPunctuationMark because that's what we wanted from first-letter.

http://mxr.mozilla.org/seamonkey/source/layout/generic/nsTextFrameUtils.cpp#56

So should I add another CCmap for this? or should we have an API that lets me get the Unicode char type of a character? (That would cover all cases like this where performance is not an issue.) Or what? Simon, any suggestions?
(In reply to comment #20)
> should we have an API that lets me
> get the Unicode char type of a character?

http://lxr.mozilla.org/seamonkey/source/intl/unicharutil/public/nsIUGenCategory.h
That component currently isn't built and I'm reluctant to take its footprint hit just for this.

How bad would it be if I just hardcoded the ASCII punctuation characters and treated the rest as not punctuation for the purposes of this pref?
It wouldn't be so bad if we could use that component for the first-letter stuff, i.e., ripping out the ccmap that we're currently using, but we can't because it only gives us a general classification instead of the precise classification.
It looks like the Unicode paths in nsTextTransformer never honoured this pref. I'll revise the patch to hardcode ASCII punctuation knowledge and treat non-ASCII punctuation as non-punctuation.
Attached patch updated patchSplinter Review
Ok, I found that spellchecker has its own private (!) copy of the category tables, which it actually instantiates twice (!!). So I filed bug 393356 to move that back to intl/ where it belongs and where I can use that data for this bug.

This patch makes all P and S type Unicode characters be subject to the stop_at_punctuation pref. I hope this satisfies everyone. Of course the patch in bug 393356 has to land before this can land.
Attachment #275728 - Attachment is obsolete: true
Attachment #277850 - Flags: superreview+
Attachment #277850 - Flags: review?(smontagu)
Attachment #275728 - Flags: review?(smontagu)
(In reply to comment #24)
> It looks like the Unicode paths in nsTextTransformer never honoured this pref.

Yeah, that's bug 188213, which I guess could be marked FIXED now.
Comment on attachment 277850 [details] [diff] [review]
updated patch

r=me, sorry to have taken so long to get to it.

A few thoughts arising from this:
Do you want to make the same change to the only other caller of nsTextFrameUtils::IsPunctuationMark()? Then we could get rid of it, and punct_marks.ccmap too.

It would be cool if ClusterIterator() became surrogate-aware, but that's probably not 1.9 material.
Attachment #277850 - Flags: review?(smontagu) → review+
Surrogates already cluster, so it kinda is surrogate-aware. It's not aware of non-BMP punctuation, that's all.

We have to leave IsPunctuationMark the way it is, because first-letter is very clear about the specific punctuation that "doesn't count" when looking for the end of a first-letter.
checked in.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Blocks: 385565
You need to log in before you can comment on or make changes to this bug.