Astral keystrokes show up as two surrogate keystrokes
Categories
(Core :: Widget: Win32, defect, P3)
Tracking
()
People
(Reporter: hsivonen, Unassigned)
References
Details
Attachments
(1 file)
Steps to reproduce
- Use Windows 10 (I used 1803)
- Right-click an empty part of the Taskbar
- Select Show touch keyboard button if not already checked
- Navigate to https://hsivonen.com/test/moz/input.html
- Click the text field to focus
- Click the touch keyboard button in Taskbar
- Click the emoji button
- Click an emoji
Actual results
The log shows keydown
and keyup
events whose key
string contains an unpaired surrogate. Upon the high surrogate getting entered, it is also visible unpaired in the data
string of the input
event and in the value
of the HTML input
element.
This is bad, since invalid UTF-16 is exposed to scripts, which is bad for bridging to Rust code running as wasm. Marking "major" for this reason.
Expected results
Expected one keystroke per Unicode scalar value carrying two UTF-16 code units in the key
string for astral characters, as in Edge.
Additional info
KeyboardLayout.cpp
seems to be aware of WM_UNICHAR
, yet it doesn't seem to have an effect here.
Not a problem when entering emoji via the Microsoft Pinyin IME, which generates emoji via IME API.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 1•6 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #0)
This is bad, since invalid UTF-16 is exposed to scripts, which is bad for bridging to Rust code running as wasm.
See the workarounds proposed in https://github.com/rustwasm/wasm-bindgen/issues/1348
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 2•6 years ago
|
||
I have trouble upgrading even to Windows 10 1809, but it would be good if someone on the Insider channel of Windows 10 could test the new Adlam keyboard at https://hsivonen.com/test/moz/input.html
Reporter | ||
Comment 3•6 years ago
•
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #0)
KeyboardLayout.cpp
seems to be aware ofWM_UNICHAR
, yet it doesn't seem to have an effect here.
There's explicitly a TODO about this:
https://searchfox.org/mozilla-central/rev/44a212460990ffffecf50a8e972d3cbde2e7216b/widget/windows/KeyboardLayout.cpp#1590
Comment 4•6 years ago
|
||
Wow, surprised!! The behavior is really different from CJKT keyboard layouts and the others.
May I take this?
Reporter | ||
Comment 5•6 years ago
|
||
(In reply to Masayuki Nakano [:masayuki] (JST, +0900)(Sick, maybe slow response) from comment #4)
May I take this?
Sure. I was hoping you'd take this. :-)
Comment 6•6 years ago
|
||
The priority flag is not set for this bug.
:jimm, could you have a look please?
Updated•6 years ago
|
Comment 8•3 years ago
|
||
I'm still looking for how to install non-BMP characters without IME (i.e., directly from key typing) on Linux and macOS to check Blink/WebKit's behavior on Linux and macOS. (If it comes without keyboard events, all browsers just use beforeinput
event and input
event to input them on macOS and Linux.)
On Windows, Chrome does same things as Gecko, i.e., dispatching 2 sets of keyboard events. I'd like to check this occurs on the other browsers on the other platforms...
On the other hand, Gecko does not check surrogate pair to dispatch keypress
events which are combined to a set of keyboard events:
https://searchfox.org/mozilla-central/rev/183b0cfc30f2d40f818a08cbea960f6119e2d196/widget/TextEventDispatcher.cpp#718-728
So, Gecko never dispatches keypress
event which represents a non-BMP character in any platforms.
I have a concern about another things about emoji input. If an emoji coming with some variants, keypress
events need to dispatch 2 or more times anyway. So, it seems that it might be better to use keypress
events in such case.
Reporter | ||
Comment 9•3 years ago
|
||
(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #8)
If an emoji coming with some variants,
keypress
events need to dispatch 2 or more times anyway.
Yes, but doing it the way IE and EdgeHTML did with the Windows 10 on-screen keyboard (one keystroke per potentially-astral scalar value) ensures that each keystroke represents a valid Unicode scalar value and reflecting any string associated with the event or any string value of the text input field being typed into as UTF-8 to Wasm is OK. (As noted in comment 0, the Microsoft Pinyin IME inputs emoji by IME means and not as keystrokes like the on-screen emoji keyboard, so both ways already exist simultaneously.)
Comment 10•3 years ago
|
||
Created a draft patch, this works fine with our editor.
However, I realize that this change breaks following code in the wild:
str += String.fromCharCode(event.charCode);
because it takes a code point in UTF-16. MDN says "Numbers greater than 0xFFFF
are truncated".
So, at least for backward compatibility, we need to keep dispatching 2 keypress
event, to represent a pair of high-surrogate and low-surrogate with charCode
. On the other hand, we can put key
can have a surrogate pair for the first keypress
, and the other keypress
has empty string.
However, Google Chrome does not set key
value properly for a surrogate pair (setting both key
values to empty string) on Windows. Therefore, web apps cannot use key
for handling text input with keypress
events. Therefore, changing our behavior must break the web.
WDYT, Henri?
Comment 11•3 years ago
|
||
Filed .key
value issue of non-BMP characters of Chrome for Windows.
Reporter | ||
Comment 12•3 years ago
|
||
How did these apps work in EdgeHTML and Trident? What happens in Safari on Mac?
Comment 13•3 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #12)
How did these apps work in EdgeHTML and Trident?
Could be not worked or could have a path with UA String check or something.
What happens in Safari on Mac?
I'm still not sure how to type non-BMP char directly from keyboard (without IME) on macOS and Linux.
Comment 14•2 years ago
|
||
In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.
Comment 15•2 years ago
•
|
||
Despite the severity previously being "major", I'm setting this to S4, since this seems to be very-low-impact. :hsivonen/:masayuki, if you disagree, feel free to raise it to S3. (Or even S2, with some explanation.)
Updated•2 years ago
|
Description
•