backspace/delete in HTML editor don't handle Unicode plane 1 characters correctly (UTF-16 surrogate pairs)

RESOLVED FIXED in mozilla1.9.3a5

Status

()

Core
Editor
RESOLVED FIXED
11 years ago
2 years ago

People

(Reporter: dbaron, Assigned: Ehsan)

Tracking

Trunk
mozilla1.9.3a5
x86
Linux
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(blocking2.0 -)

Details

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

11 years ago
This is a followup to bug 237585, which fixed this bug for plaintext editing.

[This comment is in UTF-8.]

Backspace and delete in the editor don't handle Unicode characters outside of plane 0 (the BMP, U+0000 to U+FFFF) correctly.  What appears to be happening is that they treat the two surrogates as separate characters, and a single backspace or delete only deletes one of the surrogates rather than the whole character.

(Note to the uninformed:  UTF-16 encodes characters in the range U+10000 to
U+10FFFF using two 16-bit values, a high surrogate (in the range U+D800 to
U+DBFF) followed by a low surrogate (in the range U+DC00 to U+DFFF).)

It's worth noting that selection seems to handle non-BMP characters fine.  Thus see bug 332635 about one approach that might fix this bug.

The attached testcase contains a document, with designMode set to on, with the string "a
(Reporter)

Comment 1

11 years ago
Created attachment 217099 [details]
testcase
(Reporter)

Updated

11 years ago
Depends on: 332635
QA Contact: editor
Assignee: mozeditor → nobody

Updated

7 years ago
Duplicate of this bug: 545417

Comment 3

7 years ago
How can we add another platform to this defect? It is reproducible on Win XP also.
Ehsan, this bug is hurting some web based editors*, any idea if it can be fixed (easily)?

* e.g. http://dev.fckeditor.net/ticket/5078
(In reply to comment #4)
> Ehsan, this bug is hurting some web based editors*, any idea if it can be fixed
> (easily)?
> 
> * e.g. http://dev.fckeditor.net/ticket/5078

I'll look into it.
(Reporter)

Comment 6

7 years ago
Note that a significant part of comment 0 got corrupted in the bugzilla database character encoding migration.  Does anybody still have the bugmail?
Created attachment 434605 [details]
comment 0 bugmail
Created attachment 434755 [details] [diff] [review]
WIP

This mostly works.  I just need to create a unit test for it, and push it to try to make sure I'm not breaking anything.
Assignee: nobody → ehsan
Status: NEW → ASSIGNED
Created attachment 435287 [details] [diff] [review]
Patch (v1)
Attachment #434755 - Attachment is obsolete: true
Attachment #435287 - Flags: review?(peterv)
Comment on attachment 435287 [details] [diff] [review]
Patch (v1)

Switching the review request to roc.
Attachment #435287 - Flags: review?(peterv) → review?(roc)
+        result = GetStartNodeAndOffset(aSelection, address_of(node), &offset);

why address_of instead of getter_AddRefs?
(In reply to comment #11)
> +        result = GetStartNodeAndOffset(aSelection, address_of(node), &offset);
> 
> why address_of instead of getter_AddRefs?

Because GetStartNodeAndOffset takes an nsCOMPtr<nsIDOMNode>* parameter.  And yes, that's stupid!  Here's bug 569523 to fix that.
OK.

What happens here if we have a cluster consisting of a base character followed by a combining mark, where the combining mark is in plane 1? It seems to me this patch would have backspace delete the entire cluster, whereas we should only delete the mark?
(In reply to comment #13)
> OK.
> 
> What happens here if we have a cluster consisting of a base character followed
> by a combining mark, where the combining mark is in plane 1? It seems to me
> this patch would have backspace delete the entire cluster, whereas we should
> only delete the mark?

If I'm not mistaken, the only thing that this patch does is to extend the selection 1 code point backwards, so that it includes both code points in the surrogate.  Why would it also cause the base character to be deleted as well?
(Reporter)

Updated

7 years ago
Attachment #434605 - Attachment mime type: text/plain → text/plain; charset=UTF-8
did you test it? It looks to me like MoveCaret(DOM_VK_BACK_SPACE) behaves just like DOM_VK_LEFT (in LTR), which I expect would select the entire cluster.
Created attachment 449203 [details] [diff] [review]
Patch (v1.1)

I did test it using all the non-spacing mark characters in plane 1 that I could find in UniView.

I also added a test using one of them.

I should also mention that we didn't support displaying any of those characters (using the default font at least), so I got the "character code in a box" glyph for all of them, but I'm not sure if that matters.
Attachment #435287 - Attachment is obsolete: true
Attachment #449203 - Flags: review?(roc)
Attachment #435287 - Flags: review?(roc)
Attachment #449203 - Flags: review?(roc) → review+
http://hg.mozilla.org/mozilla-central/rev/855a585cfa2e
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
(In reply to comment #16)
> Created attachment 449203 [details] [diff] [review]
> Patch (v1.1)
> 
> I did test it using all the non-spacing mark characters in plane 1 that I could
> find in UniView.
> 
> I also added a test using one of them.
> 
> I should also mention that we didn't support displaying any of those characters
> (using the default font at least), so I got the "character code in a box" glyph
> for all of them, but I'm not sure if that matters.

It turns out that did matter, because when we set the "missing glyph" code in the textrun, we (incorrectly) overwrite the grapheme-cluster flag. This is bug 618870. The patch there fixes the cluster behavior, but causes the unit test here to fail, because <backspace> then deletes the base+mark pair for the string with a plane-1 combining mark. So Roc was correct in comment #15, but the problem was masked by bug 618870 plus the lack of any font support.

Re-opening this; we'll need to fix it (again) so that we can land bug 618870.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 618870
Created attachment 498117 [details] [diff] [review]
patch 2, v1 - fix backspace with plane-1 combining marks

Here's a possible approach that resolves this issue. The basic idea is to enhance nsSelectionAmount to make a distinction between selection by "character" (meaning strictly a single Unicode character - but not a single PRUnichar codepoint, this still needs to be aware of surrogate pairs) and by "cluster", which is the grapheme-cluster behavior of the existing eSelectCharacter option.

The patch then migrates existing users of eSelectCharacter to the new eSelectCluster value, to maintain their behavior, except that the new non-clustering version of eSelectCharacter is used for CharacterExtendForBackspace.
Attachment #498117 - Flags: review?(roc)
Requesting blocking because we need to fix this in order to land the patch for bug 618870, which is what will fix bug 619286, which in turn is already marked as a blocking regression.

In addition, this bug means that anyone actually using plane-1 combining marks (which means they'd have font support) would get incorrect backspace behavior. Our unit tests only pass because the test boxes lack relevant fonts.
blocking2.0: --- → ?
Comment on attachment 498117 [details] [diff] [review]
patch 2, v1 - fix backspace with plane-1 combining marks

r=me provided that you add a comment in IsAcceptableCaretPosition as to what the surrogate pair check is trying to achieve, and also add a test case which triggers this code path in order to make sure that we won't regress it in the future.  And thanks for picking up my slack :-)
Attachment #498117 - Flags: review?(roc) → review+
blocking2.0: ? → -
Attachment #498117 - Flags: approval2.0+
Pushed to trunk (with added comment, and extra mochitest testcases):
http://hg.mozilla.org/mozilla-central/rev/1958ebae93cc
Status: REOPENED → RESOLVED
blocking2.0: - → ?
Last Resolved: 7 years ago7 years ago
Resolution: --- → FIXED
Argh, I forgot that one of the newly-added tests would likely fail until bug 618870 is fixed. So backed this out for now:
http://hg.mozilla.org/mozilla-central/rev/e331809693aa
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #23)
> Argh, I forgot that one of the newly-added tests would likely fail until bug
> 618870 is fixed. So backed this out for now:
> http://hg.mozilla.org/mozilla-central/rev/e331809693aa

Is it possible to disable the test here, land the patch, and re-enable it together with the patch in bug 618870?
blocking2.0: ? → -
Relanded along with the cluster patches that should let it stick this time:
http://hg.mozilla.org/mozilla-central/rev/b433f7b6d033
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago7 years ago
Resolution: --- → FIXED

Updated

2 years ago
Blocks: 1194763
You need to log in before you can comment on or make changes to this bug.