Closed Bug 1068979 Opened 7 years ago Closed 7 years ago

backspacing over multiple individually-pasted supplementary-plane characters in contentEditable breaks surrogate pairs

Categories

(Core :: DOM: Editor, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: jfkthame, Assigned: jfkthame)

References

Details

Attachments

(3 files, 1 obsolete file)

STR:

* Load testcase:
  data:text/html,<div contentEditable>foo&%23x1d400;

* Select the bold (math-alphabet) A character at the end of the text

* Copy it to the clipboard

* Paste repeatedly, to append several copies of it to the text

* Press Backspace repeatedly

Note how on the first backspace, the bold A is deleted correctly. On subsequent backspace keystrokes, it takes two backspaces to delete it; after the first, an isolated surrogate appears as a hexbox, and then the next keystroke removes that.
(In this comment, I'm using [A] to represent the single SMP codepoint from the testcase described above, as I can't enter the true mathematical-bold A into bugzilla!)

Note that if you paste a single [A] into the contentEditable, and then backspace, it deletes fine. However, if you paste several of them, then start backspacing, the last one pasted deletes fine, but the preceding ones break into individual surrogates as they're deleted.

If you interrupt the sequence of backspaces with some other action, such as clicking the mouse (even if it's in the same place as the current insertion point), or moving the caret back and forth with the arrow keys, the first backspace after the interruption works correctly again. Then back to the broken behavior.

Furthermore, this only happens if you paste the sequence of [A]s one by one. Copy a string of them and paste it as a unit, and they'll all delete tidily.

The problem seems to be that after a backspace, the nature of the current selection object (as passed in to nsPlaintextEditor::ExtendSelectionForDelete) changes. Given the text

[A][A][A][A]

pasted into a contentEditable one character at a time, with the insertion point at the end, we get passed a selection whose startNode is the single-character (two-codepoint) node containing the last [A], with an offset of 2 (i.e. the selection is at the end of this text node). Backspace then correctly detects the surrogate pair and deletes both codepoints.

However, on the next backspace, we call ExtendSelectionForDelete with a selection whose startNode is the entire content of the contentEditable element, and offset is 3 (i.e. the selection is after the 3rd child node). So ExtendSelectionForDelete does not see this as a TextNode, and doesn't detect or handle the surrogate pair at all.

Clicking afresh or moving the insertion point with the arrow keys repositions the selection such that we again get a selection that refers to the actual TextNode contents.

I guess a couple of possible fixes for this bug might be either:

(a) ensure that after a <backspace> deletion that ends up removing a TextNode has been processed, the selection is left pointing at the end of the preceding TextNode and not at the TextNode's parent; or

(b) make ExtendSelectionForDelete handle this case by looking (recursively, if necessary?) at the child node immediately preceding the offset, to see if it contains text, and if so, whether surrogates need to be considered.

From here on, someone who understands a bit about how we manage DOM selections might be able to make progress...
AFAICT this fixes the problem, without risk of disrupting anything else (I hope). We'll also want a testcase...
Attachment #8491412 - Flags: review?(ehsan.akhgari)
Simple testcase that fails on current trunk; passes with the patch here.
Attachment #8491444 - Flags: review?(ehsan.akhgari)
Comment on attachment 8491412 [details] [diff] [review]
Handle surrogate pair if necessary when backspacing into a text node.

Review of attachment 8491412 [details] [diff] [review]:
-----------------------------------------------------------------

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2031,5 @@
> +          if (NS_IS_LOW_SURROGATE(text->CharAt(so)) &&
> +              NS_IS_HIGH_SURROGATE(text->CharAt(so - 1))) {
> +            so--;
> +          }
> +        }

I think instead of hacking things here, we should fix this properly by extending ExtendSelectionForDelete to handle the case where the selection points to the container.
Attachment #8491412 - Flags: review?(ehsan.akhgari) → review-
Attachment #8491444 - Flags: review?(ehsan.akhgari) → review+
Fair enough -- but I have no idea how to do that. Here's an attempt; seems to work in my testing, but this is completely unknown territory to me...
Attachment #8491572 - Flags: review?(ehsan.akhgari)
BTW, the bug here can also be demonstrated without needing to copy/paste-in characters. Just load

  data:text/html,<div contenteditable>foo <span>&%23x1d401;</span> bar

and then click at the end of the text and backspace one character at a time.
Added more testcases, including one that fails with the ExtendSelectionForDelete patch because the selection returned is at offset 0 of the node following the surrogate pair.
Attachment #8493237 - Flags: review?(ehsan.akhgari)
Attachment #8491444 - Attachment is obsolete: true
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Comment on attachment 8491572 [details] [diff] [review]
Make ExtendSelectionForDelete descend into child nodes if necessary to check for surrogates.

As discussed on IRC, this isn't adequate - it fails the third example in the attached test patch.
Attachment #8491572 - Flags: review?(ehsan.akhgari) → review-
Comment on attachment 8491412 [details] [diff] [review]
Handle surrogate pair if necessary when backspacing into a text node.

Re-setting r? on this patch; it may be a low-level hack but it works OK for all the examples tested. :)
Attachment #8491412 - Flags: review- → review?(ehsan.akhgari)
Comment on attachment 8491412 [details] [diff] [review]
Handle surrogate pair if necessary when backspacing into a text node.

Review of attachment 8491412 [details] [diff] [review]:
-----------------------------------------------------------------

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2025,5 @@
> +        so--;
> +        eo--;
> +        // bug 1068979: delete both codepoints if surrogate pair
> +        if (so > 0) {
> +          nsRefPtr<Text> nodeAsText = visNode_->GetAsText();

Nit: I would move this up in the |wsType == WSType::text| branch so that nodeAsText can be used for both the if and else branch, and get rid of the declaration of this variable in the else branch.
Attachment #8491412 - Flags: review?(ehsan.akhgari) → review+
Comment on attachment 8493237 [details] [diff] [review]
Tests for backspacing over SMP characters preceding element boundaries due to markup or pasted content.

Review of attachment 8493237 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8493237 - Flags: review?(ehsan.akhgari) → review+
Duplicate of this bug: 1067712
Duplicate of this bug: 1068530
QA Whiteboard: [good first verify]
You need to log in before you can comment on or make changes to this bug.