Open
Bug 939739
Opened 12 years ago
Updated 3 years ago
Wrong word boundary detection for MidLetter character
Categories
(Core :: Graphics: Text, defect)
NEW
People
(Reporter: jmontane, Unassigned)
Details
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 (Beta/Release)
Build ID: 20130911164256
Steps to reproduce:
1.a.- with Firefox: visit a webpage containing the following words; visiting this bug page is enough.
1.b.- with Thunderbird: copy and paste the following text into a mail.
----------8<--8<--8<----------
U+00B7--> abc·def
U+0387--> abc·def
U+05F4 --> abc״def
U+2027 --> abc‧def
U+003A --> abc:def
U+FE13 --> abc︓def
U+FE55 --> abc﹕def
U+FF1A --> abc:def
U+02D7 --> abc˗def
----------8<--8<--8<----------
2.- Double-click on any "abc?def" word.
Actual results:
Only the "abc" (or "def") part of the word is selected.
Expected results:
According to Unicode UAX #29 (TR29) [1], the full word "abc?def" must be selected. The same problem occurs when moving the cursor with Ctrl+Left (or Right) arrow.
After some testing: Safari, Konqueror, Opera, Web (the GNOME browser, often called Epiphany), rekonq and Chromium work as intended. The full word abc?def is selected with a double-click.
[1] http://www.unicode.org/reports/tr29/#MidLetter, see WB6 and WB7.
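For illustration, here is a minimal Python sketch of how rules WB5, WB6 and WB7 keep "abc?def" together. It is not Gecko's code and not a full UAX #29 implementation: the MID_LETTER set is simply the list of code points from the test case above (the actual property values depend on the Unicode version), str.isalpha() is used as a rough stand-in for ALetter, and the helper names are made up for this example.

```python
# Code points this report lists as MidLetter. Assumption: taken from the
# test case in the report, not read from WordBreakProperty.txt.
MID_LETTER = {0x00B7, 0x0387, 0x05F4, 0x2027, 0x003A,
              0xFE13, 0xFE55, 0xFF1A, 0x02D7}


def is_mid_letter(ch):
    return ord(ch) in MID_LETTER


def is_letter(ch):
    # Simplification: isalpha() stands in for the ALetter property.
    return ch.isalpha() and not is_mid_letter(ch)


def no_break(text, i):
    """True if there is no word boundary between text[i-1] and text[i]."""
    prev, cur = text[i - 1], text[i]
    # WB5: letter x letter
    if is_letter(prev) and is_letter(cur):
        return True
    # WB6: letter x MidLetter letter
    if (is_letter(prev) and is_mid_letter(cur)
            and i + 1 < len(text) and is_letter(text[i + 1])):
        return True
    # WB7: letter MidLetter x letter
    if (is_mid_letter(prev) and is_letter(cur)
            and i >= 2 and is_letter(text[i - 2])):
        return True
    # Everything else breaks (a simplification of the remaining rules).
    return False


def split_words(text):
    """Split text into segments at the boundaries found by no_break()."""
    if not text:
        return []
    pieces, start = [], 0
    for i in range(1, len(text)):
        if not no_break(text, i):
            pieces.append(text[start:i])
            start = i
    pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    for cp in sorted(MID_LETTER):
        word = "abc" + chr(cp) + "def"
        print("U+%04X" % cp, split_words(word))
```

With these rules each test word comes back as a single segment, while a trailing MidLetter (for example "abc" followed by U+00B7 and nothing else) is split off, because WB6 requires a letter on both sides.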
Comment 1•12 years ago
Confirmed in FF 28.0a1 (2013-11-27), Win 7 x64
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: x86_64 → All
Comment 2•12 years ago
(In reply to Joan Montané from comment #0)
> After some testing: Safari, Konqueror, Opera, Web GNOME browser(often called
> Epiphany), rekonq and Chromium work as intended. Full word adb?def is
> selected with double-click.
Not quite, in my testing - in the last example (U+02D7 --> abc˗def), both Chrome and Safari on OS X select the parts separately. The others do all select as a single word, though.
Comment 3•12 years ago
(In reply to Jonathan Kew (:jfkthame) from comment #2)
> Not quite, in my testing - in the last example (U+02D7 --> abc˗def), both
> Chrome and Safari on OS X select the parts separately. The others do all
> select as a single word, though.
You are right, my fault. Chrome and Safari treat U+02D7 as a word separator. I guess it's because they use an outdated ICU library. Compare [1] (Unicode 6; date: 2010-08-19) for the ICU data used by Chrome and [2] (Unicode 6.3; date: 2013-07-05) for the current ICU library data around MidLetter characters.
But the issue here is that Mozilla doesn't follow UAX TR29 for double-click (or arrow-key) selection. It's annoying when selecting Catalan text, where "·" (U+00B7) is used as a MidLetter character.
[1] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/icu/source/data/unidata/WordBreakProperty.txt&l=830
[2] http://www.unicode.org/Public/UNIDATA/auxiliary/WordBreakProperty.txt
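A rough way to compare the MidLetter sets of two WordBreakProperty.txt versions, as a Python sketch. The parser assumes the usual UCD layout ("code point or range ; property # comment"). The two sample strings are hand-written in that layout to illustrate the difference described above; they are not copies of either file, and the function and variable names are made up for this example.

```python
def code_points_with_property(data, prop):
    """Collect the code points assigned to prop in UCD-style property text."""
    found = set()
    for line in data.splitlines():
        # Drop the trailing comment and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        # A first field is either "XXXX" or a range "XXXX..YYYY".
        first, _, last = fields[0].partition("..")
        low = int(first, 16)
        high = int(last, 16) if last else low
        found.update(range(low, high + 1))
    return found


# Hand-written samples in the file's layout (assumption: illustrative only).
OLDER_SAMPLE = """\
0041..005A    ; ALetter # L&  [26] LATIN CAPITAL LETTER A..Z
003A          ; MidLetter # Po       COLON
00B7          ; MidLetter # Po       MIDDLE DOT
"""

NEWER_SAMPLE = OLDER_SAMPLE + """\
02D7          ; MidLetter # Sk       MODIFIER LETTER MINUS SIGN
"""

if __name__ == "__main__":
    older = code_points_with_property(OLDER_SAMPLE, "MidLetter")
    newer = code_points_with_property(NEWER_SAMPLE, "MidLetter")
    for cp in sorted(newer - older):
        print("only in newer data: U+%04X" % cp)
```

Feeding the real contents of [1] and [2] to code_points_with_property() instead of the samples would show which MidLetter code points differ between the two data files.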
Updated•3 years ago
Severity: normal → S3