Open Bug 939739 Opened 12 years ago Updated 3 years ago

Wrong word boundary detection for MidLetter character

Categories

(Core :: Graphics: Text, defect)

28 Branch
defect

People

(Reporter: jmontane, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 (Beta/Release)
Build ID: 20130911164256

Steps to reproduce:

1.a.- With Firefox: visit a web page containing the following words; visiting this bug page is enough.
1.b.- With Thunderbird: copy and paste the following text into a mail.

----------8<--8<--8<----------
U+00B7 --> abc·def
U+0387 --> abc·def
U+05F4 --> abc״def
U+2027 --> abc‧def
U+003A --> abc:def
U+FE13 --> abc︓def
U+FE55 --> abc﹕def
U+FF1A --> abc:def
U+02D7 --> abc˗def
----------8<--8<--8<----------

2.- Double-click any "abc?def" word.

Actual results:

Only the "abc" (or "def") part of the word is selected.

Expected results:

Following Unicode UAX #29 (TR29) [1], the full word "abc?def" must be selected.

The same problem occurs when moving the cursor with Ctrl+Left (or Ctrl+Right) arrow.

After some testing: Safari, Konqueror, Opera, Web GNOME browser (often called Epiphany), rekonq and Chromium work as intended. The full word abc?def is selected with a double-click.

[1] http://www.unicode.org/reports/tr29/#MidLetter, see WB6 and WB7.
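For reference, the expected behaviour under WB6 and WB7 can be illustrated with a small Python sketch. This is not Gecko's or ICU's implementation. It covers only WB5, WB6 and WB7 and breaks everywhere else. Two things are assumptions made for illustration: `str.isalpha()` stands in for the ALetter class, and the `MID_LETTER` set is simply the nine code points listed in this report, which may not match the MidLetter set in current Unicode data. The names `MID_LETTER`, `is_letter` and `split_words` are hypothetical.

```python
# Code points from this report, assumed here to have Word_Break=MidLetter.
MID_LETTER = frozenset(
    "\u003A\u00B7\u0387\u05F4\u2027\uFE13\uFE55\uFF1A\u02D7"
)


def is_letter(ch):
    """Rough stand-in for the ALetter class (an assumption, not UAX #29 data)."""
    return ch.isalpha()


def split_words(text):
    """Split text at every position not protected by WB5, WB6 or WB7."""
    segments = []
    start = 0
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        # WB5: do not break between two letters.
        if is_letter(prev) and is_letter(cur):
            continue
        # WB6: ALetter x MidLetter ALetter (no break before the MidLetter).
        if (is_letter(prev) and cur in MID_LETTER
                and i + 1 < len(text) and is_letter(text[i + 1])):
            continue
        # WB7: ALetter MidLetter x ALetter (no break after the MidLetter).
        if (i >= 2 and is_letter(text[i - 2])
                and prev in MID_LETTER and is_letter(cur)):
            continue
        segments.append(text[start:i])
        start = i
    if text:
        segments.append(text[start:])
    return segments
```

With this sketch, `split_words("abc\u00B7def")` yields a single segment, which is the selection a double-click is expected to produce, while a MidLetter character that is not followed by a letter (as in `"abc\u00B7 def"`) still ends the word.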
Confirmed in FF 28.0a1 (2013-11-27), Win 7 x64
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: x86_64 → All
(In reply to Joan Montané from comment #0)
> After some testing: Safari, Konqueror, Opera, Web GNOME browser (often called
> Epiphany), rekonq and Chromium work as intended. Full word abc?def is
> selected with double-click.

Not quite, in my testing - in the last example (U+02D7 --> abc˗def), both Chrome and Safari on OS X select the parts separately. The others do all select it as a single word, though.
(In reply to Jonathan Kew (:jfkthame) from comment #2)
> Not quite, in my testing - in the last example (U+02D7 --> abc˗def), both
> Chrome and Safari on OS X select the parts separately. The others do all
> select as a single word, though.

You are right, my fault. Chrome and Safari treat U+02D7 as a word separator. I guess it's because they use an outdated ICU library. Compare [1] (Unicode 6; date: 2010-08-19), the ICU data used by Chrome, with [2] (Unicode 6.3; date: 2013-07-05), the current ICU library data, around the MidLetter characters.

But the issue here is that Mozilla doesn't follow UAX #29 (TR29) for double-click (or Ctrl+arrow) selection. It's annoying when selecting Catalan text, where "·" (U+00B7) is used as a MidLetter character.

[1] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/icu/source/data/unidata/WordBreakProperty.txt&l=830
[2] http://www.unicode.org/Public/UNIDATA/auxiliary/WordBreakProperty.txt
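The comparison of the two WordBreakProperty.txt files suggested above could be done with a short Python sketch like the following. It assumes the usual UCD line layout (`code point or range ; property value # comment`). The function names and the two excerpts are hypothetical: the excerpts are abridged illustrations of the file format, not copies of either linked file, and they only mirror the guess made in this comment that the older data lacks U+02D7.

```python
def parse_property(text, wanted="MidLetter"):
    """Collect the code points whose property value equals `wanted`."""
    points = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != wanted:
            continue
        first, _, last = fields[0].partition("..")  # "0041..005A" or "00B7"
        start = int(first, 16)
        end = int(last, 16) if last else start
        points.update(range(start, end + 1))
    return points


def added_code_points(old_text, new_text, wanted="MidLetter"):
    """Code points that have `wanted` in new_text but not in old_text."""
    return sorted(parse_property(new_text, wanted)
                  - parse_property(old_text, wanted))


# Hypothetical, abridged excerpts for illustration only.
OLD_EXCERPT = """\
003A          ; MidLetter # Po       COLON
00B7          ; MidLetter # Po       MIDDLE DOT
"""
NEW_EXCERPT = OLD_EXCERPT + (
    "02D7          ; MidLetter # Sk       MODIFIER LETTER MINUS SIGN\n"
)

if __name__ == "__main__":
    for cp in added_code_points(OLD_EXCERPT, NEW_EXCERPT):
        print(f"U+{cp:04X}")
```

To check the real data, the contents of the two files from [1] and [2] would be passed in place of the excerpts.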
Severity: normal → S3