Dragging CJK text from external apps to Firefox results in corrupted characters (U+FFFD)
Categories
(Core :: Widget: Gtk, defect, P3)
Tracking
| Version | Tracking | Status |
|---|---|---|
| firefox-esr115 | --- | unaffected |
| firefox-esr140 | --- | unaffected |
| firefox148 | --- | wontfix |
| firefox149 | --- | verified |
| firefox150 | --- | verified |
People
(Reporter: oceancat365, Assigned: stransky)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: nightly-community, regression)
Attachments
(3 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:146.0) Gecko/20100101 Firefox/146.0
Steps to reproduce:
1. On Arch Linux (with KDE Plasma 6.5.5), open any application other than Firefox (e.g., Kate or Telegram Desktop).
2. Select a string that contains CJK characters in that application.
3. Drag the selected text and drop it onto the Firefox tab bar (which triggers a search).
4. Drag the selected CJK text and drop it into a text input field on a web page opened in Firefox.
5. Drag CJK text to anywhere within Firefox, for example, in the same tab, to another tab, or to the tab bar.
6. Repeat steps 3, 4, and 5 with pure English text.
Actual results:
- When dropping CJK text on the tab bar, Firefox opens a new tab and searches for the Unicode Replacement Character (U+FFFD).
- In an input field, if the CJK text is short, it pastes U+FFFD too. If the text is relatively long, sometimes a portion of the characters is successfully moved, but often it results in corruption.
- Dragging any text within Firefox, with or without CJK characters, works perfectly fine.
- English text works correctly in all scenarios.
Expected results:
Firefox should correctly receive the UTF-8 string from the Wayland drag-and-drop protocol and search for or paste the original CJK text, matching the behavior of English text and internal drag operations.
Comment 1•4 months ago
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•4 months ago (Assignee)
Can you run Firefox from a terminal with the MOZ_LOG="WidgetDrag:5" environment variable set, reproduce the issue, and attach the log here?
Thanks.
Comment 4•3 months ago (Assignee)
I wonder whether we are using the wrong MIME type for the text string, which would corrupt the result. But it looks like we're getting the data correctly in UTF-8 format.
Please check the attached log and look for "DragData() plain data MIME" entry - is the text correct?
Thanks.
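For anyone checking a long WidgetDrag log, the entry can be pulled out with a small filter. This is only a sketch: the two sample lines are copied from the log the reporter quotes later in this bug, and the regular expression and the `plain_data_entries` helper are illustrative assumptions, not part of any Firefox tooling.

```python
import re

# Two entries copied from the log quoted in this bug; a real log has many more lines.
SAMPLE_LOG = """\
[Parent 18562: Main Thread]: D/WidgetDrag [D 2][7f53b58d7400] TargetDataReceived(): plain data, MIME text/plain;charset=utf-8 len = 11
[Parent 18562: Main Thread]: D/WidgetDrag DragData() plain data MIME: text/plain;charset=utf-8 : æè½åä¸
"""

# Assumed line shape: "DragData() plain data MIME: <mime> : <text>"
ENTRY_RE = re.compile(r"DragData\(\) plain data MIME: (\S+) : (.*)")


def plain_data_entries(log_text):
    """Return (mime, text) pairs for every matching log line (hypothetical helper)."""
    return [(m.group(1), m.group(2)) for m in ENTRY_RE.finditer(log_text)]


for mime, text in plain_data_entries(SAMPLE_LOG):
    # repr() makes invisible control characters in the payload visible.
    print(f"{mime}: {text!r}")
```

Printing with `repr()` helps here because a corrupted payload may contain control characters that a terminal would otherwise hide.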
Comment 5•3 months ago
I can reproduce the issue on Kubuntu 24.04 (KDE), on both X11 and Wayland.
Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=8fbadf4f5d50384e02b557b7ac1067c19bd950f5&tochange=f2dbb34f6867ea6067ce07343069a362717a0f63
Hi, I checked the logs as requested.
Testing with the string "我能吞下玻璃而不伤身体" (I can eat glass, it doesn't hurt me) shows the following:
[Parent 18562: Main Thread]: D/WidgetDrag [D 2][7f53b58d7400] nsDragSession::TargetDataReceived(7f53f723b710) MIME text/plain;charset=utf-8
[Parent 18562: Main Thread]: D/WidgetDrag [D 2][7f53b58d7400] TargetDataReceived(): plain data, MIME text/plain;charset=utf-8 len = 11
[Parent 18562: Main Thread]: D/WidgetDrag DragData() plain data MIME: text/plain;charset=utf-8 : æè½åä¸
[Parent 18562: Main Thread]: D/WidgetDrag [D 1][7f53b58d7400] text/plain;charset=utf-8 received
I dug a bit deeper and found that the garbled text is probably caused by truncation at the byte level.
The output æè½åä¸ occurs because the raw UTF-8 bytes are cut off and then interpreted as single-byte characters (likely Latin-1).
- Input string: "我能吞下玻璃而不伤身体" (11 CJK characters)
  - Original hex (UTF-8): `e6 88 91 e8 83 bd e5 90 9e e4 b8 8b e7 8e bb e7 92 83 e8 80 8c e4 b8 8d e4 bc a4 e8 ba ab e4 bd 93` (33 bytes)
  - Log shows `len = 11`. It seems the code uses the character count (11) to determine the buffer size or read length in bytes, instead of the actual byte length.
  - The first 11 bytes of the hex are: `e6 88 91 e8 83 bd e5 90 9e e4 b8`.
  - Interpreting these 11 bytes as Latin-1 gives exactly `æè½åä¸`. The 4th character is corrupted because its 3rd byte was dropped.
- Another case: "测试" (2 CJK characters)
  - Original hex (UTF-8): `e6 b5 8b e8 af 95` (6 bytes)
  - Log shows `len = 2`.
  - The first 2 bytes are `e6 b5`, which renders as `æµ` in the log.
- Non-BMP test (characters that need surrogate pairs in UTF-16): "🐶🐶🐶" (3 characters)
  - Original hex (UTF-8): `f0 9f 90 b6 f0 9f 90 b6 f0 9f 90 b6` (12 bytes)
  - Log shows `TargetDataReceived(): plain data, MIME text/plain;charset=utf-8 len = 3`.
  - The first 3 bytes are `f0 9f 90`, which renders as `ð` in the log.
My assumption is that it requests or reads only N bytes for a string of N characters.
This works for ASCII, where the character-to-byte ratio is 1:1, but truncates multi-byte text.
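The analysis above can be checked with a few lines of Python. This is a sketch of the suspected failure mode only, not Firefox code: `truncate_like_bug` and `visible_latin1` are hypothetical helpers, and the assumption is simply that a code point count is used where a byte count is needed.

```python
# Sketch of the suspected bug: a character count is used as a byte length.
SAMPLES = ["我能吞下玻璃而不伤身体", "测试", "🐶🐶🐶"]


def truncate_like_bug(text: str) -> bytes:
    """Keep only len(text) bytes of the UTF-8 encoding (hypothetical helper)."""
    encoded = text.encode("utf-8")
    char_count = len(text)  # number of code points, not bytes
    return encoded[:char_count]


def visible_latin1(data: bytes) -> str:
    """Decode as Latin-1 and drop non-printable control characters."""
    return "".join(ch for ch in data.decode("latin-1") if ch.isprintable())


for sample in SAMPLES:
    cut = truncate_like_bug(sample)
    # Columns: characters, UTF-8 bytes, truncated hex, how the truncated bytes look as Latin-1
    print(len(sample), len(sample.encode("utf-8")), cut.hex(" "), visible_latin1(cut))
```

For the three samples the character and byte counts come out as 11/33, 2/6, and 3/12, and the Latin-1 rendering of the truncated bytes matches the text seen in the log.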
Comment 7•3 months ago
:handyman, since you are the author of the regressor, bug 1966443, could you take a look?
Comment 8•3 months ago (Assignee)
Thanks for the analysis. It's because we use g_utf8_strlen(), which returns the string length in UTF-8 characters rather than in bytes. I'll look at it.
Comment 9•3 months ago (Assignee)
It's caused by this revision: https://phabricator.services.mozilla.com/D256877
We need to add a corresponding call to DragData that treats the UTF-8 length as a character count and not as a byte count.
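To illustrate the distinction, here is a minimal sketch of turning a character count into the byte length it actually covers. It is not the patch that landed: `utf8_byte_length` is a hypothetical helper, and it assumes the buffer holds valid UTF-8.

```python
def utf8_byte_length(buffer: bytes, char_count: int) -> int:
    """Return how many bytes the first char_count UTF-8 characters occupy.

    Hypothetical helper; assumes buffer holds valid UTF-8.
    """
    seen = 0
    for offset, byte in enumerate(buffer):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            if seen == char_count:
                return offset
            seen += 1
    # Fewer than char_count + 1 characters in the buffer: everything belongs to them.
    return len(buffer)


payload = "测试 ok".encode("utf-8")  # 5 characters, 9 bytes
byte_len = utf8_byte_length(payload, 5)
print(byte_len, payload[:byte_len].decode("utf-8"))
```

Slicing with the converted length keeps whole characters, whereas slicing with the raw character count cuts the buffer in the middle of a multi-byte sequence.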
Comment 10•3 months ago (Assignee)
Comment 11•3 months ago
Comment 12•3 months ago
| bugherder | ||
Comment 13•3 months ago
The patch landed in nightly and beta is affected.
:stransky, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- See https://wiki.mozilla.org/Release_Management/Requesting_an_Uplift for documentation on how to request an uplift.
- If no, please set `status-firefox149` to `wontfix`.
For more information, please visit BugBot documentation.
Comment 14•3 months ago
firefox-beta Uplift Approval Request
- User impact if declined: Broken D&D of CJK text.
- Code covered by automated testing: no
- Fix verified in Nightly: no
- Needs manual QE test: yes
- Steps to reproduce for manual QE testing: D&D any CJK text.
- Risk associated with taking this patch: low
- Explanation of risk level: We use correct text length for D&D.
- String changes made/needed: none
- Is Android affected?: no
Comment 15•3 months ago (Assignee)
Original Revision: https://phabricator.services.mozilla.com/D286641
Comment 16•3 months ago
| uplift | ||
Comment 17•3 months ago
Reproducible on a 2026-03-07 Firefox Nightly build on Ubuntu 22, following the STR from Comment 0.
Verified as fixed on Firefox Nightly 150.0a1 and Firefox 149.0b7 on Ubuntu 22.