Closed Bug 332636 Opened 18 years ago Closed 14 years ago

backspace/delete in HTML editor don't handle Unicode plane 1 characters correctly (UTF-16 surrogate pairs)

Categories

(Core :: DOM: Editor, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a5
Tracking Status
blocking2.0 --- -

People

(Reporter: dbaron, Assigned: ehsan.akhgari)

References

Details

Attachments

(4 files, 2 obsolete files)

This is a followup to bug 237585, which fixed this bug for plaintext editing.

[This comment is in UTF-8.]

Backspace and delete in the editor don't handle Unicode characters outside of plane 0 (the BMP, U+0000 to U+FFFF) correctly.  What appears to be happening is that they treat the two surrogates as separate characters, and a single backspace or delete only deletes one of the surrogates rather than the whole character.

(Note to the uninformed:  UTF-16 encodes characters in the range U+10000 to
U+10FFFF using two 16-bit values, a high surrogate (in the range U+D800 to
U+DBFF) followed by a low surrogate (in the range U+DC00 to U+DFFF).)

It's worth noting that selection seems to handle non-BMP characters fine.  Thus see bug 332635 about one approach that might fix this bug.

The attached testcase contains a document, with designMode set to on, with the string "a
QA Contact: editor
Assignee: mozeditor → nobody
How can we add another platform to this defect? It is reproducible on Win XP also.
Ehsan, this bug is hurting some web based editors*, any idea if it can be fixed (easily)?

* e.g. http://dev.fckeditor.net/ticket/5078
(In reply to comment #4)
> Ehsan, this bug is hurting some web based editors*, any idea if it can be fixed
> (easily)?
> 
> * e.g. http://dev.fckeditor.net/ticket/5078

I'll look into it.
Note that a significant part of comment 0 got corrupted in the bugzilla database character encoding migration.  Does anybody still have the bugmail?
Attached patch WIP (obsolete) — Splinter Review
This mostly works.  I just need to create a unit test for it, and push it to try to make sure I'm not breaking anything.
Assignee: nobody → ehsan
Status: NEW → ASSIGNED
Attached patch Patch (v1) (obsolete) — Splinter Review
Attachment #434755 - Attachment is obsolete: true
Attachment #435287 - Flags: review?(peterv)
Comment on attachment 435287 [details] [diff] [review]
Patch (v1)

Switching the review request to roc.
Attachment #435287 - Flags: review?(peterv) → review?(roc)
+        result = GetStartNodeAndOffset(aSelection, address_of(node), &offset);

why address_of instead of getter_AddRefs?
(In reply to comment #11)
> +        result = GetStartNodeAndOffset(aSelection, address_of(node), &offset);
> 
> why address_of instead of getter_AddRefs?

Because GetStartNodeAndOffset takes an nsCOMPtr<nsIDOMNode>* parameter.  And yes, that's stupid!  Here's bug 569523 to fix that.
OK.

What happens here if we have a cluster consisting of a base character followed by a combining mark, where the combining mark is in plane 1? It seems to me this patch would have backspace delete the entire cluster, whereas we should only delete the mark?
(In reply to comment #13)
> OK.
> 
> What happens here if we have a cluster consisting of a base character followed
> by a combining mark, where the combining mark is in plane 1? It seems to me
> this patch would have backspace delete the entire cluster, whereas we should
> only delete the mark?

If I'm not mistaken, the only thing that this patch does is to extend the selection 1 code point backwards, so that it includes both code points in the surrogate.  Why would it also cause the base character to be deleted as well?
Attachment #434605 - Attachment mime type: text/plain → text/plain; charset=UTF-8
did you test it? It looks to me like MoveCaret(DOM_VK_BACK_SPACE) behaves just like DOM_VK_LEFT (in LTR), which I expect would select the entire cluster.
Attached patch Patch (v1.1)Splinter Review
I did test it using all the non-spacing mark characters in plane 1 that I could find in UniView.

I also added a test using one of them.

I should also mention that we didn't support displaying any of those characters (using the default font at least), so I got the "character code in a box" glyph for all of them, but I'm not sure if that matters.
Attachment #435287 - Attachment is obsolete: true
Attachment #449203 - Flags: review?(roc)
Attachment #435287 - Flags: review?(roc)
http://hg.mozilla.org/mozilla-central/rev/855a585cfa2e
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
(In reply to comment #16)
> Created attachment 449203 [details] [diff] [review]
> Patch (v1.1)
> 
> I did test it using all the non-spacing mark characters in plane 1 that I could
> find in UniView.
> 
> I also added a test using one of them.
> 
> I should also mention that we didn't support displaying any of those characters
> (using the default font at least), so I got the "character code in a box" glyph
> for all of them, but I'm not sure if that matters.

It turns out that did matter, because when we set the "missing glyph" code in the textrun, we (incorrectly) overwrite the grapheme-cluster flag. This is bug 618870. The patch there fixes the cluster behavior, but causes the unit test here to fail, because <backspace> then deletes the base+mark pair for the string with a plane-1 combining mark. So Roc was correct in comment #15, but the problem was masked by bug 618870 plus the lack of any font support.

Re-opening this; we'll need to fix it (again) so that we can land bug 618870.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 618870
Here's a possible approach that resolves this issue. The basic idea is to enhance nsSelectionAmount to make a distinction between selection by "character" (meaning strictly a single Unicode character - but not a single PRUnichar codepoint, this still needs to be aware of surrogate pairs) and by "cluster", which is the grapheme-cluster behavior of the existing eSelectCharacter option.

The patch then migrates existing users of eSelectCharacter to the new eSelectCluster value, to maintain their behavior, except that the new non-clustering version of eSelectCharacter is used for CharacterExtendForBackspace.
Attachment #498117 - Flags: review?(roc)
Requesting blocking because we need to fix this in order to land the patch for bug 618870, which is what will fix bug 619286, which in turn is already marked as a blocking regression.

In addition, this bug means that anyone actually using plane-1 combining marks (which means they'd have font support) would get incorrect backspace behavior. Our unit tests only pass because the test boxes lack relevant fonts.
blocking2.0: --- → ?
Comment on attachment 498117 [details] [diff] [review]
patch 2, v1 - fix backspace with plane-1 combining marks

r=me provided that you add a comment in IsAcceptableCaretPosition as to what the surrogate pair check is trying to achieve, and also add a test case which triggers this code path in order to make sure that we won't regress it in the future.  And thanks for picking up my slack :-)
Attachment #498117 - Flags: review?(roc) → review+
Pushed to trunk (with added comment, and extra mochitest testcases):
http://hg.mozilla.org/mozilla-central/rev/1958ebae93cc
Status: REOPENED → RESOLVED
blocking2.0: - → ?
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Argh, I forgot that one of the newly-added tests would likely fail until bug 618870 is fixed. So backed this out for now:
http://hg.mozilla.org/mozilla-central/rev/e331809693aa
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #23)
> Argh, I forgot that one of the newly-added tests would likely fail until bug
> 618870 is fixed. So backed this out for now:
> http://hg.mozilla.org/mozilla-central/rev/e331809693aa

Is it possible to disable the test here, land the patch, and re-enable it together with the patch in bug 618870?
Relanded along with the cluster patches that should let it stick this time:
http://hg.mozilla.org/mozilla-central/rev/b433f7b6d033
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Blocks: 1194763
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: