Closed
Bug 1497580
Opened 7 years ago
Closed 7 years ago
Improve GTK nsClipboard code (make text/html produce UTF-8)
Categories
(Core :: Widget: Gtk, enhancement)
Core
Widget: Gtk
Tracking
()
RESOLVED
FIXED
mozilla64
People
(Reporter: evilpies, Assigned: evilpies)
References
Details
Attachments
(2 files)
Hi Karl!
I have some question about nsClipboard::SelectionGetEvent. I was investigating nsPrimitiveHelpers::CreateDataFromPrimitive fixing in that function.
https://searchfox.org/mozilla-central/rev/80ac71c1c54af788b32e851192dfd2de2ec18e18/widget/gtk/nsClipboard.cpp#574-604
1) Why 8 bits? CreateDataFromPrimitive can return char or char16_t.
2) What's up with all that crazy code for text/html? Is that really necessary? This also seems to assume char16_t.
| Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(karlt)
Comment 1•7 years ago
|
||
> 2) What's up with all that crazy code for text/html? Is that really necessary?
X11 usually works with UTF-8. The comment seems to claim that a recommendation permits UCS2 for text/html if the data starts with a BOM. I don't find that in a quick google search, but for a convention for UCS2 files. Any X11 usage probably pre-dates UTF-8.
A quick search for UTF-16, BOM, and UCS2 in GTK code does not find any matches, and so I suspect conversion to UTF-8 would provide better compatibility with other apps. If you'd like to change that, check LibreOffice for compat. Also check compat with another Mozilla app (or process), or at least check that the Gecko code receiving HTML works with UTF-8.
> This also seems to assume char16_t.
The format (char or char16_t) depends on the flavor passed to nsPrimitiveHelpers. char16_t is what is used for text/html.
1) Why 8 bits? CreateDataFromPrimitive can return char or char16_t.
It's just sending a stream of bytes, which, from the byte-count aDataLen parameter, seems to be what nsPrimitiveHelpers is providing.
https://searchfox.org/mozilla-central/rev/80ac71c1c54af788b32e851192dfd2de2ec18e18/widget/nsPrimitiveHelpers.cpp#74
Flags: needinfo?(karlt)
| Assignee | ||
Comment 2•7 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #1)
> > 2) What's up with all that crazy code for text/html? Is that really necessary?
>
> X11 usually works with UTF-8. The comment seems to claim that a
> recommendation permits UCS2 for text/html if the data starts with a BOM. I
> don't find that in a quick google search, but for a convention for UCS2
> files. Any X11 usage probably pre-dates UTF-8.
>
> A quick search for UTF-16, BOM, and UCS2 in GTK code does not find any
> matches, and so I suspect conversion to UTF-8 would provide better
> compatibility with other apps. If you'd like to change that, check
> LibreOffice for compat. Also check compat with another Mozilla app (or
> process), or at least check that the Gecko code receiving HTML works with
> UTF-8.
>
Great, I will have a look. Is there anything that produces text/html so I could check pasting into Firefox?
> > This also seems to assume char16_t.
>
> The format (char or char16_t) depends on the flavor passed to
> nsPrimitiveHelpers. char16_t is what is used for text/html.
>
> 1) Why 8 bits? CreateDataFromPrimitive can return char or char16_t.
>
> It's just sending a stream of bytes, which, from the byte-count aDataLen
> parameter, seems to be what nsPrimitiveHelpers is providing.
>
> https://searchfox.org/mozilla-central/rev/
> 80ac71c1c54af788b32e851192dfd2de2ec18e18/widget/nsPrimitiveHelpers.cpp#74
I am mostly just wondering how other apps are supposed to interpret that data, but I don't actually know what other kind of MIME types that we support would be requested by other applications.
| Assignee | ||
Comment 3•7 years ago
|
||
Seems like Libre Office can handle UTF-8 encoded text/html, but Firefox can't ...
Comment 4•7 years ago
|
||
(In reply to Tom Schuster [:evilpie] from comment #2)
> (In reply to Karl Tomlinson (:karlt) from comment #1)
> > A quick search for UTF-16, BOM, and UCS2 in GTK code does not find any
> > matches, and so I suspect conversion to UTF-8 would provide better
> > compatibility with other apps. If you'd like to change that, check
> > LibreOffice for compat. Also check compat with another Mozilla app (or
> > process), or at least check that the Gecko code receiving HTML works with
> > UTF-8.
> >
> Great, I will have a look. Is there anything that produces text/html so I
> could check pasting into Firefox?
I'm guessing that LibreOffice (editing HTML) and/or Chromium produce text/html, but I don't know TBH.
Note that the "STRING" etc block above almost does the same thing (with unnecessary "New" string allocations), but "text/unicode" would need to be changed.
> I am mostly just wondering how other apps are supposed to interpret that
> data, but I don't actually know what other kind of MIME types that we
> support would be requested by other applications.
|selectionTarget| indicates how the data will be interpreted.
Images have already been handled above, and so the common other one would be text/plain. URLs may be another - I don't know the mime type for that.
Comment 5•7 years ago
|
||
(In reply to Tom Schuster [:evilpie] from comment #3)
> Seems like Libre Office can handle UTF-8 encoded text/html, but Firefox
> can't ...
I guess that is this case:
https://searchfox.org/mozilla-central/rev/65f9687eb192f8317b4e02b0b791932eff6237cc/widget/gtk/nsClipboard.cpp#676
I guess ideally Mozilla should present this as "text/html;charset=utf-8". The variation from text/html would need to be performed via a new gtk_target_list_add() special case in nsClipboard::SetData().
Comment 6•7 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #5)
> I guess ideally Mozilla should present this as "text/html;charset=utf-8".
Although that seems unnecessary if the charset may be in the html.
Maybe ConvertHTMLtoUCS2() should just assume utf-8 if no charset.
| Assignee | ||
Comment 7•7 years ago
|
||
Using gtk_targets_include_text actually leads to a small behavior difference,
because gtk_targets_include_text also includes text/plain.
But gtk_selection_data_set_text seems to correctly convert UTF-8 to plain text.
Updated•7 years ago
|
Attachment #9016040 -
Attachment description: Bug 1497580 - Start cleaning up GTK clipboard code. r?karlt → Bug 1497580 - use gtk_target_list_add_text_targets for text/unicode clipboard data r?karlt
| Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Pushed by evilpies@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/80b93ae22c27
use gtk_target_list_add_text_targets for text/unicode clipboard data r=karlt
| Assignee | ||
Comment 9•7 years ago
|
||
I just implemented the idea of adding a `<meta http-equiv="content-type" content="text/html; charset=utf-8">` to all text/html data. At least some version of WebKit seems to do the same: https://github.com/clbr/webkitfltk/blob/b5d1ddf4649ee9d7498a7b78fc48ac76d4adde1b/Source/WebCore/platform/gtk/PasteboardHelper.cpp#L142-L146.
This approach works quite well, even with previous Firefox version. I don't think we can get away with not adding the prefix and always assuming UTF-8, because that would basically break other (older) Gecko based software, as those would not be able to detect the encoding in that case.
| Assignee | ||
Updated•7 years ago
|
Assignee: nobody → evilpies
| Assignee | ||
Comment 10•7 years ago
|
||
I verified that we can still copy from Firefox to an older version of Firefox without this patch.
LibreOffice also still works. Talking to some GTK people on IRC they are also happy about UTF-8 instead of wrongly declared UCS2.
Comment 11•7 years ago
|
||
| bugherder | ||
Comment 12•7 years ago
|
||
Pushed by evilpies@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/89f4156c07a5
Save text/html as UTF-8 to the clipboard. r=karlt
Comment 13•7 years ago
|
||
Backout by dluca@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f30799c0c32
Backed out changeset 89f4156c07a5 for CL failures in dom/base/test/test_bug166235.html
Comment 14•7 years ago
|
||
Backed out changeset 89f4156c07a5 (bug 1497580) for CL failures in dom/base/test/test_bug166235.html
Backout revision https://hg.mozilla.org/integration/autoland/rev/7f30799c0c324bbafe48e9ee63a9a8cc50215037
Failed push https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&group_state=expanded&searchStr=linux%2Cdebug%2Cmochitests%2Ctest-linux32%2Fdebug-mochitest-clipboard%2Cm%28cl%29&revision=89f4156c07a5ab35fd1d9a97354341bee9d11e2a
Failure logs: https://treeherder.mozilla.org/logviewer.html#?job_id=205280909&repo=autoland
:evilpie can you please take a look?
Flags: needinfo?(evilpies)
| Assignee | ||
Comment 15•7 years ago
|
||
I wonder what is going on:
[task 2018-10-13T19:47:23.002Z] 19:47:23 INFO - GECKO(3395) | [3395, Main Thread] ###!!! ASSERTION: nsTDependentString must wrap only null-terminated strings. You are probably looking for nsTDependentSubstring.: 'this->mData[substring_type::mLength] == 0', file /builds/worker/workspace/build/src/obj-firefox/dist/include/nsTString.h, line 491
[task 2018-10-13T19:47:23.003Z] 19:47:23 INFO - GECKO(3395) | #01: nsTString<char>::AssertValidDependentString() [xpcom/string/nsTString.h:489]
[task 2018-10-13T19:47:23.003Z] 19:47:23 INFO -
[task 2018-10-13T19:47:23.004Z] 19:47:23 INFO - GECKO(3395) | #02: nsClipboard::GetData(nsITransferable*, int) [xpcom/string/nsTDependentString.h:66]
So I assume this has something to do with the Substring(data, prefixLen).EqualsLiteral(kHTMLMarkupPrefix)? But doesn't Substring return an nsTDependentSubstring already https://searchfox.org/mozilla-central/rev/26b40a44691e0710838130b614c2f2662bc91eec/xpcom/string/nsTDependentSubstring.h#134.
Flags: needinfo?(evilpies)
Comment 16•7 years ago
|
||
Pushed by evilpies@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/bb2acd046eae
Save text/html as UTF-8 to the clipboard. r=karlt
| Assignee | ||
Comment 17•7 years ago
|
||
I seriously have no idea what was going on there, I just went with strncmp for now, that should work at least.
Comment 18•7 years ago
|
||
Backed out for clipboard failures on test_clipboard_events.html
Push link: https://hg.mozilla.org/integration/autoland/rev/bb2acd046eae1fbedfe11348dc3f31530a645fd8
Backout link: https://hg.mozilla.org/integration/autoland/rev/96a0fd705c2abbdd5e7905362bf7bac74f0db8ee
Log link: https://treeherder.mozilla.org/logviewer.html#?job_id=205946075&repo=autoland&lineNumber=20559
Flags: needinfo?(evilpies)
| Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(evilpies)
Comment 19•7 years ago
|
||
There were also mochitest perma failures on test_bug333064.html
Log link: https://treeherder.mozilla.org/logviewer.html#?job_id=205979304&repo=autoland&lineNumber=12653
| Assignee | ||
Comment 20•7 years ago
|
||
This time the try push looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d37506dfd212e070627dde3d15071628ba4c357b
Comment 21•7 years ago
|
||
Pushed by evilpies@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/695366afa355
Save text/html as UTF-8 to the clipboard. r=karlt
Comment 22•7 years ago
|
||
| bugherder | ||
| Assignee | ||
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
| Assignee | ||
Updated•7 years ago
|
Summary: Clarify nsClipboard::SelectionGetEvent → Improve GTK nsClipboard code (make text/html produce UTF-8)
You need to log in
before you can comment on or make changes to this bug.
Description
•