Closed Bug 283415 Opened 19 years ago Closed 17 years ago

Caret must be moved by grapheme cluster boundaries

Categories

(Core :: Internationalization, defect, P3)

x86
Windows XP
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: samphan, Assigned: smontagu)

References

(Blocks 1 open bug)

Details

(Keywords: intl)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

A grapheme cluster is what users think of as a single character, regardless of
its underlying representation, e.g. a base character plus combining characters.
For example, Á (A +  ́) is a grapheme cluster in Latin and นี้ (น +  ี +  ้) is a
grapheme cluster in Thai. For rendering, editing, and other text processing, the
grapheme cluster is treated as a single unit. See bug 283271 for background.

The caret should always be on cluster boundaries, because users think of a
cluster as a character. Otherwise users can do this:
ÁA|̃  {caret after second A}
{backspace}
Á̃
or
ก็น|ี้  {caret after น}
{backspace}
ก็ี้

Currently, however, in Mozilla applications on Windows (and in *nix builds
without --enable-ctl), the caret moves by Unicode character.
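
The backspace problem in the two examples above can be reproduced with a short
Python sketch. It is only an illustration, not Mozilla code: the function names
are hypothetical, and "cluster" is approximated as a base character followed by
nonspacing or enclosing marks (categories Mn and Me), which is much simpler than
the full Unicode segmentation rules.

```python
import unicodedata


def is_mark(ch):
    # Simplifying assumption: only nonspacing (Mn) and enclosing (Me) marks
    # extend a cluster. Real segmentation rules cover more cases.
    return unicodedata.category(ch) in ("Mn", "Me")


def backspace_code_point(text, caret):
    """Delete the single code point before the caret (the reported behaviour)."""
    if caret == 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1


def backspace_cluster(text, caret):
    """Delete the whole approximate cluster that the caret touches."""
    # Snap the caret forward to the end of the cluster it sits inside.
    end = caret
    while end < len(text) and is_mark(text[end]):
        end += 1
    if end == 0:
        return text, 0
    # Walk back over marks to the base character that starts the cluster.
    start = end - 1
    while start > 0 and is_mark(text[start]):
        start -= 1
    return text[:start] + text[end:], start


latin = "A\u0301A\u0303"                   # A + acute, A + tilde
thai = "\u0E01\u0E47\u0E19\u0E35\u0E49"    # ko kai + maitaikhu, no nu + sara ii + mai tho

# Caret after the second base character in both strings (offset 3).
broken_latin, _ = backspace_code_point(latin, 3)   # tilde re-attaches to the first cluster
broken_thai, _ = backspace_code_point(thai, 3)     # marks re-attach to the first cluster
clean_latin, _ = backspace_cluster(latin, 3)       # whole second cluster removed
clean_thai, _ = backspace_cluster(thai, 3)
```

Deleting by code point leaves the orphaned marks attached to the previous
cluster, while deleting by cluster leaves only the first cluster intact.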

Reproducible: Always

Steps to Reproduce:
1) Load the attached HTML sample. It consists of a text input field containing 5
grapheme clusters, 2 Latin and 3 Thai.
2) Move the caret with the arrow keys between the beginning and the end of the
input field. The caret appears to stop when it moves past non-spacing combining
characters. You have to press the arrow key 11 times to move from beginning to
end or vice versa.
3) Move the caret to just after a base character and press backspace. Only the
base character is deleted, and its combining characters attach to the preceding
cluster.


Expected Results:  
The caret must stay on cluster boundaries only. The caret should move past each
cluster with one arrow key press. You should have to press the arrow key 5 times
to move from beginning to end or vice versa.
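
The difference between 11 and 5 key presses can be illustrated with a short
Python sketch. The attached sample is not reproduced here; SAMPLE below is a
hypothetical stand-in with the same shape (2 Latin and 3 Thai clusters, 11 code
points). The function name is made up for this example, and cluster boundaries
are approximated by treating nonspacing and enclosing marks (categories Mn and
Me) as part of the preceding cluster, which is simpler than the full Unicode
rules.

```python
import unicodedata


def cluster_boundaries(text):
    """Return the caret offsets that fall on approximate cluster boundaries."""
    bounds = [0]
    for i in range(1, len(text)):
        # A new cluster starts at every character that is not a combining mark.
        if unicodedata.category(text[i]) not in ("Mn", "Me"):
            bounds.append(i)
    if text:
        bounds.append(len(text))
    return bounds


# Hypothetical stand-in for the attachment: 2 Latin + 3 Thai clusters.
SAMPLE = (
    "A\u0301"                # A + combining acute
    "A\u0303"                # A + combining tilde
    "\u0E01\u0E47"           # ko kai + maitaikhu
    "\u0E19\u0E35\u0E49"     # no nu + sara ii + mai tho
    "\u0E14\u0E35"           # do dek + sara ii
)

presses_by_code_point = len(SAMPLE)                    # one press per code point
presses_by_cluster = len(cluster_boundaries(SAMPLE)) - 1  # one press per cluster
```

With this stand-in string, moving by code point takes 11 presses and moving by
cluster takes 5, matching the counts in the report.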
Try moving the caret in the text input field
Blocks: 283271
Keywords: intl
Severity -> major

reason:
considered a major loss of function, and there's no easy workaround.

allowing the caret to fall inside a grapheme cluster makes it possible to
produce an invalid character sequence from input.

for example, let
- a vertical line | be a grapheme cluster boundary,
- an underscore _ be the caret position,
- "a" and "xyz" be the only two valid character sequences.

if we have a string

  |a|xyz|

and caret is at the end of string

  |a|xyz_

if we allow the caret to move inside a grapheme cluster

  |a|xy_z|

at this point, the user can accidentally add more characters
between 'y' and 'z' or delete some characters.
for instance, if the user presses <backspace>, the string becomes

  |a|x_z|
 
which contains an invalid sequence "xz", an unexpected result for the user
(especially since the caret sometimes does not "visually move": its position on
screen does not change even though its position in memory does, which can be
confusing).


Please feel free to turn the Severity down if you find it inappropriate. Thanks.



Severity: normal → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3
related bug,
bug 283416 : Selection must be done by grapheme cluster boundaries
Blocks: thai
Hello!

This report (bug) does not cover how the caret should move over entities that are not "graphemes", such as ZERO WIDTH SPACE (U+200B), ZERO WIDTH NON-JOINER (U+200C), ZERO WIDTH JOINER (U+200D), LRM, RLM, LRE, RLE, LRO, RLO, PDF and probably some others.

They are "content" inside an editbox, and there should be a way to tell whether such content is present. Identifying these characters can be time-consuming and frustrating. Whatever solution is provided, users should not need special skills to deal with the presence of such characters.

*note*
I have spent many hours analysing wiki pages: looking at the rendering, editing them and looking at the page source. The problem is that you do not know where to search and mostly do not have the tools or the skills. If a user or visitor adds an RLO inside a page, by mistake (via copy and paste) or as a malicious action, the consequences for that paragraph can show up some months later.
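
One way such hidden characters could be made visible is sketched below in Python. This is only an illustration of the idea, not a proposal for the editbox itself: the function name is hypothetical, and it assumes that "invisible" means Unicode general category Cf (format characters), which covers the zero-width and bidi control characters listed above.

```python
import unicodedata


def find_format_characters(text):
    """List (offset, code point, name) for every format character in text."""
    hits = []
    for offset, ch in enumerate(text):
        # Assumption: category Cf is a good enough proxy for "invisible control".
        if unicodedata.category(ch) == "Cf":
            hits.append((offset, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


# A string with a ZERO WIDTH SPACE and a RIGHT-TO-LEFT OVERRIDE hidden inside.
report = find_format_characters("ab\u200bc\u202ed")
for offset, code_point, name in report:
    print(offset, code_point, name)
```

A listing like this lets someone without special tools see that the characters are there and where they sit.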

best regards reinhardt [[user:gangleri]]
WORKSFORME on trunk Fx.  (Reopen if you can still reproduce in a trunk build.)
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME