Closed Bug 1158456 Opened 9 years ago Closed 9 years ago

U+0008 is inserted when removing compositing text with Japanese Input on OS X 10.10

Categories

(Core :: Widget: Cocoa, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: arai, Assigned: arai)

Details

Attachments

(3 files, 3 obsolete files)

Tested on OS X 10.10.3.

Steps to reproduce:
  1. run Nightly with clean profile
  2. focus on location bar
  3. turn Japanese Input on
  4. press A key (あ)
  5. press space key (start translation)
  6. press delete key
  7. press delete key

Expected result (what happens on Finder, Safari, etc):
inserted text (あ) is removed on step 7

Actual result:
inserted text (あ) is removed and U+0008 (00008 in box) is inserted on step 7

Same issue happens on all supported branches (including ESR31), so this shouldn't be a recent regression.
This doesn't happen if I skip step 5 (space key), also I didn't see that issue on OS X 10.9.5.

attached log with NSPR_LOG_MODULES=TextInputHandlerWidgets:5, there TextInputHandler::InsertText with aAttrString="(U+0008)" is called.
following keys also insert control character in step 7
  * esc key
  * tab key
  * F1...F12
  * control+A, control+B, etc
on Finder's search field, only tab key inserts tab character, but other keys which produce control character just deletes the inserted text.
can we just ignore control character passed to InsertText? or should do it in more restricted case (after delete. only if compositing string has 1-length, etc)?
or perhaps there's another place to fix the issue?
Flags: needinfo?(masayuki)
WIP patch which treats control character as empty string.
if there's a input method which can generate and insert control character intentionally with same way, this won't work though.
(In reply to Tooru Fujisawa [:arai] from comment #4)
> can we just ignore control character passed to InsertText?

I think so. Inserting and sending control character could cause some web sites confused.

> or should do it
> in more restricted case (after delete. only if compositing string has
> 1-length, etc)?

I don't think so. Even if IME tries to insert control character, it doesn't make sense in usual cases.

> or perhaps there's another place to fix the issue?

I have no idea. nsPlaintextEditor can ignore control characters, however, it's not reasonable for listeners for DOM events.
Flags: needinfo?(masayuki)
Comment on attachment 8597607 [details] [diff] [review]
Dispatch composition event with empty string if a control character is passed to TextInputHandler::InsertText on OS X.

>diff --git a/widget/cocoa/TextInputHandler.mm b/widget/cocoa/TextInputHandler.mm
>--- a/widget/cocoa/TextInputHandler.mm
>+++ b/widget/cocoa/TextInputHandler.mm
>@@ -2765,16 +2765,22 @@ IMEInputHandler::DispatchCompositionComm
> }
> 
> void
> IMEInputHandler::InitCompositionEvent(WidgetCompositionEvent& aCompositionEvent)
> {
>   aCompositionEvent.time = PR_IntervalNow();
> }
> 
>+static bool
>+IsControlCharacter(char16_t code)

IsControlChar() is already in TextInputHandler.mm. You should use it and allow U+0009 if it's necessary.

>+{
>+  return (code <= 0x1F && code != 0x09) || code == 0x7F;

Hmm, it's difficult to decide if we should allow U+0009 (Horizontal Tab). But it's already allowed now and some IME *might* try to insert it. So, yes, it's perhaps better to keep current behavior for it.

>@@ -2837,17 +2843,21 @@ IMEInputHandler::InsertTextAsCommittingC
>-  DispatchCompositionCommitEvent(&str);
>+  if (str.Length() == 1 && IsControlCharacter(str.CharAt(0))) {
>+    DispatchCompositionCommitEvent(&EmptyString());
>+  } else {
>+    DispatchCompositionCommitEvent(&str);
>+  }

How about to cut control characters from the string? And I think that we should do it at dispatching compositionchange. If we allow control characters only during composing, users must feel odd. So, I think that you should create RemoveControlCharactersFrom(nsAString& aStr) and call it when the str is defined.

# Um, but if so, we can do this in TextComposition::DispatchCompositionEvent()? And it might be better to be controllable with pref like "dom.compositionevent.allow_control_characters".
> # Um, but if so, we can do this in TextComposition::DispatchCompositionEvent()? And it might be better to be controllable with pref like "dom.compositionevent.allow_control_characters".

And then, we can test the behavior with mochitest.
Thank you for your feedback :D

(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #7)
> IsControlChar() is already in TextInputHandler.mm. You should use it and
> allow U+0009 if it's necessary.

If we modify TextComposition::DispatchCompositionEvent instead, should we move IsControlChar to some header to share code? If so, which header should be used?

> >+{
> >+  return (code <= 0x1F && code != 0x09) || code == 0x7F;
> 
> Hmm, it's difficult to decide if we should allow U+0009 (Horizontal Tab).
> But it's already allowed now and some IME *might* try to insert it. So, yes,
> it's perhaps better to keep current behavior for it.

Yeah, as other applications inserts HTAB in that case, it would be better to keep that behavior same with them.

> >@@ -2837,17 +2843,21 @@ IMEInputHandler::InsertTextAsCommittingC
> >-  DispatchCompositionCommitEvent(&str);
> >+  if (str.Length() == 1 && IsControlCharacter(str.CharAt(0))) {
> >+    DispatchCompositionCommitEvent(&EmptyString());
> >+  } else {
> >+    DispatchCompositionCommitEvent(&str);
> >+  }
> 
> How about to cut control characters from the string? And I think that we
> should do it at dispatching compositionchange. If we allow control
> characters only during composing, users must feel odd. So, I think that you
> should create RemoveControlCharactersFrom(nsAString& aStr) and call it when
> the str is defined.

Okay, added to the beginning of TextComposition::DispatchCompositionEvent. Is there any appropriate method/function to remove those characters, instead of writing that code down there?

btw, should we take care of range if we remove control characters? if control character appears at the middle of the range, the range will become different (am I correct?) of course this is a super edge case, which won't happen in practice though.

> # Um, but if so, we can do this in
> TextComposition::DispatchCompositionEvent()? And it might be better to be
> controllable with pref like "dom.compositionevent.allow_control_characters".

Moved there :)
Can we call Preferences::GetBool everytime? or should call it once and cache the value?


Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=820f93051a98
Attachment #8597607 - Attachment is obsolete: true
Attachment #8598178 - Flags: feedback?(masayuki)
(In reply to Tooru Fujisawa [:arai] from comment #9)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #7)
> > IsControlChar() is already in TextInputHandler.mm. You should use it and
> > allow U+0009 if it's necessary.
> 
> If we modify TextComposition::DispatchCompositionEvent instead, should we
> move IsControlChar to some header to share code? If so, which header should
> be used?

I don't think so. widget and XP part are not share utilities. So, don't mind to duplicate IsControlChar() in this case.

> > >@@ -2837,17 +2843,21 @@ IMEInputHandler::InsertTextAsCommittingC
> > >-  DispatchCompositionCommitEvent(&str);
> > >+  if (str.Length() == 1 && IsControlCharacter(str.CharAt(0))) {
> > >+    DispatchCompositionCommitEvent(&EmptyString());
> > >+  } else {
> > >+    DispatchCompositionCommitEvent(&str);
> > >+  }
> > 
> > How about to cut control characters from the string? And I think that we
> > should do it at dispatching compositionchange. If we allow control
> > characters only during composing, users must feel odd. So, I think that you
> > should create RemoveControlCharactersFrom(nsAString& aStr) and call it when
> > the str is defined.
> 
> Okay, added to the beginning of TextComposition::DispatchCompositionEvent.
> Is there any appropriate method/function to remove those characters, instead
> of writing that code down there?

No, it's the best point.

> btw, should we take care of range if we remove control characters?

Oh, good catch. It's necessary.

> if control character appears at the middle of the range, the range will become
> different (am I correct?) of course this is a super edge case, which won't
> happen in practice though.

You're right. When a loop removing control characters finds a control character, it need to find a range which includes the control character and adjust its length and both start and end offsets of other ranges whose start offset or end offset is larger than the control character.

I think that this should be implemented by TextRangeArray. E.g., TextRangeArray::RemoveCharacter(uint32_t aOffset).

# I think that TextRange should store start offset and length, instead of end offset, but it's not scope of this bug.

> > # Um, but if so, we can do this in
> > TextComposition::DispatchCompositionEvent()? And it might be better to be
> > controllable with pref like "dom.compositionevent.allow_control_characters".
> 
> Moved there :)
> Can we call Preferences::GetBool everytime? or should call it once and cache
> the value?

I think that calling Preferences::GetBool() from constructor of TextComposition is reasonable. Then, you can check both behavior from mochitest.
Comment on attachment 8598178 [details] [diff] [review]
Remove control characters from composition string, and add dom.compositionevent.allow_control_characters pref to control it.

>+static void
>+RemoveControlCharactersFrom(nsAString& aStr)
>+{
>+  const char16_t* sourceBegin = aStr.BeginReading();
>+  const char16_t* sourceEnd = aStr.EndReading();
>+
>+  const char16_t* source;
>+  for (source = sourceBegin; source < sourceEnd; ++source) {
>+    if (*source != '\t' && IsControlChar(*source)) {
>+      break;
>+    }
>+  }
>+  if (source == sourceEnd) {
>+    // There is no control characters.
>+    return;
>+  }
>+
>+  size_t leadingLength = source - sourceBegin;
>+
>+  nsString copy(aStr);

nsAutoString, perhaps.

>+  sourceBegin = copy.BeginReading();
>+  sourceEnd = copy.EndReading();

I don't like reuse variables in this code. How about to separate the above part to |size_t FindFirstControlCharacter(const nsAString& aStr)|?

Then, it's result becomes leadningLength. (Although, I like firstControlCharOffset better.)

>+  char16_t* dest = aStr.BeginWriting();
>+  if (!dest) {

nit: NS_WARN_IF()

>+    return;
>+  }
>+
>+  char16_t* curDest = dest;
>+
>+  const char16_t* leading;

Cannot this defined in the condition of the |for|? Defining a variable without initializing makes other developers hard to read it.

>+  const char16_t* leadingEnd = sourceBegin + leadingLength;
>+  for (leading = sourceBegin; leading < leadingEnd; ++leading) {
>+    *curDest = *leading;
>+    ++curDest;
>+  }

I don't understand why this is necessary? This copies the string before first control character to aStr. However, isn't aStr has same string already??

>+
>+  // Skip first occurrence.
>+  source = sourceBegin + leadingLength + 1;
>+
>+  for (; source < sourceEnd; ++source) {
>+    if (*source == '\t' || !IsControlChar(*source)) {
>+      *curDest = *source;
>+      ++curDest;
>+    }
>+  }

So, we need to notify TextRangeArray of removing character, you cannot skip it. The loop should start from the first control character.

>diff --git a/modules/libpref/init/all.js b/modules/libpref/init/all.js
>--- a/modules/libpref/init/all.js
>+++ b/modules/libpref/init/all.js
>@@ -4702,8 +4702,11 @@ pref("media.gmp.insecure.allow", false);
> pref("gfx.vsync.hw-vsync.enabled", true);
> pref("gfx.vsync.compositor", true);
> pref("gfx.vsync.refreshdriver", true);
> 
> // Secure Element API
> #ifdef MOZ_SECUREELEMENT
> pref("dom.secureelement.enabled", false);
> #endif
>+
>+// Allow control characters appear in composition string.
>+pref("dom.compositionevent.allow_control_characters", false);

// Allow control characters appear in composition string.
// When this is false, control characters except
// CHARACTER TABULATION (horizontal tab) are removed from
// both composition string and data attribute of compositionupdate
// and compositionend events.

>+function runControlCharTest()
>+{
>+  textarea.focus();
>+
>+  var result = {};
>+  function clearResult()
>+  {
>+    result = { compositionupdate: null, compositionend: null };
>+  }
>+
>+  function handler(aEvent)
>+  {
>+    result[aEvent.type] = aEvent.data;
>+  }
>+
>+  textarea.addEventListener("compositionupdate", handler, true);
>+  textarea.addEventListener("compositionend", handler, true);
>+
>+  textarea.value = "";
>+
>+  var controlChars = String.fromCharCode.apply(null, Object.keys(Array.from({length:0x20}))) + "\x7F";
>+  controlChars = "";
>+
>+  var data = "AB" + controlChars + "CD";
>+
>+  // input string contains control characters
>+  clearResult();
>+  synthesizeCompositionChange(
>+    { "composition":
>+      { "string": data,
>+        "clauses":
>+        [
>+          { "length": data.length, "attr": COMPOSITION_ATTR_RAW_CLAUSE }
>+        ]
>+      },
>+      "caret": { "start": data.length, "length": 0 }
>+    });
>+
>+  is(result.compositionupdate, "ABCD", "runControlCharTest: control characters in event.data should be removed in compositionupdate event #1");
>+  is(textarea.value, "ABCD", "runControlCharTest: control characters should not appear in textarea #1");
>+
>+  synthesizeComposition({ type: "compositioncommit", data: data });
>+
>+  is(result.compositionend, "ABCD", "runControlCharTest: control characters in event.data should be removed in compositionend event #1");
>+  is(textarea.value, "ABCD", "runControlCharTest: control characters should not appear in textarea #2");
>+
>+  textarea.value = "";
>+
>+  clearResult();
>+
>+  SpecialPowers.setBoolPref("dom.compositionevent.allow_control_characters", true);
>+
>+  // input string contains control characters, allowing control characters
>+  clearResult();
>+  synthesizeCompositionChange(
>+    { "composition":
>+      { "string": data,
>+        "clauses":
>+        [
>+          { "length": data.length, "attr": COMPOSITION_ATTR_RAW_CLAUSE }
>+        ]
>+      },
>+      "caret": { "start": data.length, "length": 0 }
>+    });
>+
>+  is(result.compositionupdate, data, "runControlCharTest: control characters in event.data should be removed in compositionupdate event #1");
>+  is(textarea.value, data, "runControlCharTest: control characters should not appear in textarea #1");
>+
>+  synthesizeComposition({ type: "compositioncommit", data: data });
>+
>+  is(result.compositionend, data, "runControlCharTest: control characters in event.data should be removed in compositionend event #1");
>+  is(textarea.value, data, "runControlCharTest: control characters should not appear in textarea #2");
>+
>+  SpecialPowers.clearUserPref("dom.compositionevent.allow_control_characters");
>+
>+  textarea.removeEventListener("compositionupdate", handler, true);
>+  textarea.removeEventListener("compositionend", handler, true);
>+}

Great!
Attachment #8598178 - Flags: feedback?(masayuki) → feedback+
Comment on attachment 8598765 [details] [diff] [review]
Remove control characters from composition string, and add dom.compositionevent.allow_control_characters pref to control it.

Excellent! Thank you!!
Attachment #8598765 - Flags: review?(masayuki) → review+
sorry had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=9443851&repo=mozilla-inbound
Flags: needinfo?(arai.unmht)
Thanks Tomcat :D

seems that my testcase is wrong for windows, since the failure happens on dom.compositionevent.allow_control_characters=true case, the previous behavior without the patch.
maybe some control character is replaced into 2-length string while processing the event, so the selection offset mismatches.
> 3432 INFO TEST-UNEXPECTED-FAIL | widget/tests/test_composition_text_querycontent.xul | runControlCharTest: selection offset is wrong#3 - got 37, expected 36
Flags: needinfo?(arai.unmht)
(In reply to Tooru Fujisawa [:arai] from comment #17)
> Thanks Tomcat :D
> 
> seems that my testcase is wrong for windows, since the failure happens on
> dom.compositionevent.allow_control_characters=true case, the previous
> behavior without the patch.
> maybe some control character is replaced into 2-length string while
> processing the event, so the selection offset mismatches.
> > 3432 INFO TEST-UNEXPECTED-FAIL | widget/tests/test_composition_text_querycontent.xul | runControlCharTest: selection offset is wrong#3 - got 37, expected 36

On Windows, \n is replaced with \r\n. I guess this is what the reason of the failure is.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #18)
> On Windows, \n is replaced with \r\n. I guess this is what the reason of the
> failure is.

Yeah, that seems to be related, but there no "\r\n" sequence appears in aEvent.data nor textarea.value, as other assertions succeeded.
So I wonder where it happens.

Anyway, as this is not a regression from this patch, I'd like to land it with tweaking the assertion (index+1 on windows),
and leave it to another bug.
(In reply to Tooru Fujisawa [:arai] from comment #19)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #18)
> > On Windows, \n is replaced with \r\n. I guess this is what the reason of the
> > failure is.
> 
> Yeah, that seems to be related, but there no "\r\n" sequence appears in
> aEvent.data nor textarea.value, as other assertions succeeded.
> So I wonder where it happens.

CompositionEvent.data isn't replaced with native line breaking characters. But ConvenrtEventHandler does it. So, the results of query content events are replaced with them.
Thanks :)
just replaced `DIndex` with `DIndex - 1 + kLFLen`, since there is one LF before it.
> +  checkSelection(DIndex - 1 + kLFLen, "", "runControlCharTest", "#3")
no other changes from previous one
Assignee: nobody → arai.unmht
Attachment #8598765 - Attachment is obsolete: true
Attachment #8599872 - Flags: review?(masayuki)
Comment on attachment 8599872 [details] [diff] [review]
Remove control characters from composition string, and add dom.compositionevent.allow_control_characters pref to control it. r=masayuki

Please confirm the result of mochitest-chrome on tryserver before landing (Linux, Mac and Win).
Attachment #8599872 - Flags: review?(masayuki) → review+
https://hg.mozilla.org/mozilla-central/rev/2e212218ba40
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
You need to log in before you can comment on or make changes to this bug.