Open Bug 1541349 Opened 5 years ago Updated 1 year ago

Astral keystrokes show up as two surrogate keystrokes

Categories

(Core :: Widget: Win32, defect, P3)

Unspecified
Windows 10
defect

Tracking

()

People

(Reporter: hsivonen, Unassigned)

References

Details

Attachments

(1 file)

Steps to reproduce

  1. Use Windows 10 (I used 1803)
  2. Right-click an empty part of the Taskbar
  3. Select Show touch keyboard button if not already checked
  4. Navigate to https://hsivonen.com/test/moz/input.html
  5. Click the text field to focus
  6. Click the touch keyboard button in Taskbar
  7. Click the emoji button
  8. Click an emoji

Actual results

The log shows keydown and keyup events whose key string contains an unpaired surrogate. Upon the high surrogate getting entered, it is also visible unpaired in the data string of the input event and in the value of the HTML input element.

This is bad, since invalid UTF-16 is exposed to scripts, which is bad for bridging to Rust code running as wasm. Marking "major" for this reason.

Expected results

Expected one keystroke per Unicode scalar value carrying two UTF-16 code units in the key string for astral characters, as in Edge.

Additional info

KeyboardLayout.cpp seems to be aware of WM_UNICHAR, yet it doesn't seem to have an effect here.

Not a problem when entering emoji via the Microsoft Pinyin IME, which generates emoji via IME API.

OS: Unspecified → Windows 10

(In reply to Henri Sivonen (:hsivonen) from comment #0)

This is bad, since invalid UTF-16 is exposed to scripts, which is bad for bridging to Rust code running as wasm.

See the workarounds proposed in https://github.com/rustwasm/wasm-bindgen/issues/1348

I have trouble upgrading even to Windows 10 1809, but it would be good if someone on the Insider channel of Windows 10 could test the new Adlam keyboard at https://hsivonen.com/test/moz/input.html

(In reply to Henri Sivonen (:hsivonen) from comment #0)

KeyboardLayout.cpp seems to be aware of WM_UNICHAR, yet it doesn't seem to have an effect here.

There's explicitly a TODO about this:
https://searchfox.org/mozilla-central/rev/44a212460990ffffecf50a8e972d3cbde2e7216b/widget/windows/KeyboardLayout.cpp#1590

Wow, surprised!! The behavior is really different from CJKT keyboard layouts and the others.

May I take this?

Component: User events and focus handling → Widget: Win32

(In reply to Masayuki Nakano [:masayuki] (JST, +0900)(Sick, maybe slow response) from comment #4)

May I take this?

Sure. I was hoping you'd take this. :-)

See Also: → 1542252
See Also: → 1542259

The priority flag is not set for this bug.
:jimm, could you have a look please?

Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
Priority: -- → P3

I'll check this tomorrow.

Flags: needinfo?(masayuki)

I'm still looking for how to install non-BMP characters without IME (i.e., directly from key typing) on Linux and macOS to check Blink/WebKit's behavior on Linux and macOS. (If it comes without keyboard events, all browsers just use beforeinput event and input event to input them on macOS and Linux.)

On Windows, Chrome does same things as Gecko, i.e., dispatching 2 sets of keyboard events. I'd like to check this occurs on the other browsers on the other platforms...

On the other hand, Gecko does not check surrogate pair to dispatch keypress events which are combined to a set of keyboard events:
https://searchfox.org/mozilla-central/rev/183b0cfc30f2d40f818a08cbea960f6119e2d196/widget/TextEventDispatcher.cpp#718-728
So, Gecko never dispatches keypress event which represents a non-BMP character in any platforms.

I have a concern about another things about emoji input. If an emoji coming with some variants, keypress events need to dispatch 2 or more times anyway. So, it seems that it might be better to use keypress events in such case.

(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #8)

If an emoji coming with some variants, keypress events need to dispatch 2 or more times anyway.

Yes, but doing it the way IE and EdgeHTML did with the Windows 10 on-screen keyboard (one keystroke per potentially-astral scalar value) ensures that each keystroke represents a valid Unicode scalar value and reflecting any string associated with the event or any string value of the text input field being typed into as UTF-8 to Wasm is OK. (As noted in comment 0, the Microsoft Pinyin IME inputs emoji by IME means and not as keystrokes like the on-screen emoji keyboard, so both ways already exist simultaneously.)

Created a draft patch, this works fine with our editor.

However, I realize that this change breaks following code in the wild:

str += String.fromCharCode(event.charCode);

because it takes a code point in UTF-16. MDN says "Numbers greater than 0xFFFF are truncated".

So, at least for backward compatibility, we need to keep dispatching 2 keypress event, to represent a pair of high-surrogate and low-surrogate with charCode. On the other hand, we can put key can have a surrogate pair for the first keypress, and the other keypress has empty string.

However, Google Chrome does not set key value properly for a surrogate pair (setting both key values to empty string) on Windows. Therefore, web apps cannot use key for handling text input with keypress events. Therefore, changing our behavior must break the web.

WDYT, Henri?

Flags: needinfo?(masayuki) → needinfo?(hsivonen)

Filed .key value issue of non-BMP characters of Chrome for Windows.

How did these apps work in EdgeHTML and Trident? What happens in Safari on Mac?

Flags: needinfo?(hsivonen) → needinfo?(masayuki)

(In reply to Henri Sivonen (:hsivonen) from comment #12)

How did these apps work in EdgeHTML and Trident?

Could be not worked or could have a path with UA String check or something.

What happens in Safari on Mac?

I'm still not sure how to type non-BMP char directly from keyboard (without IME) on macOS and Linux.

Flags: needinfo?(masayuki)

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: major → --

Despite the severity previously being "major", I'm setting this to S4, since this seems to be very-low-impact. :hsivonen/:masayuki, if you disagree, feel free to raise it to S3. (Or even S2, with some explanation.)

Severity: -- → S4
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: