44496 - Unicode encoded clipboard data should contain charset info

Reporter

Description

25 years ago

Since clipboard data such as "text/html" is Unicode encoded, it should include charset information, like so: "text/html;charset=ISO-10646-UCS-2". This way, programs that expect "text/html" to be plain ASCII can operate properly. For example, Qt apps expect "text/html" to be ASCII. The list of all charset names is at ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
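A minimal sketch of what the receiving side gains from a labelled type string, assuming a simple `type/subtype;name=value` syntax. The helper names (`parse_mime_type`, `decode_clipboard`) and the hand-written codec mapping are hypothetical; in particular, treating ISO-10646-UCS-2 as little-endian UTF-16 is an assumption that only holds for BMP text from a little-endian producer.

```python
from __future__ import annotations


def parse_mime_type(mime: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype;name=value;...' into the media type and its parameters."""
    media_type, *params = mime.split(";")
    parsed: dict[str, str] = {}
    for param in params:
        name, sep, value = param.partition("=")
        if sep:
            # Parameter names are case-insensitive; optional quotes around values are dropped.
            parsed[name.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), parsed


# Assumption: the IANA name is mapped by hand instead of relying on Python's
# codec registry; byte order for UCS-2 would need to be agreed on in practice.
CODEC_ALIASES = {"iso-10646-ucs-2": "utf-16-le"}


def decode_clipboard(mime: str, data: bytes, default_charset: str = "us-ascii") -> str:
    """Decode clipboard bytes using the charset parameter of the type string,
    falling back to `default_charset` when the type carries no label."""
    _, params = parse_mime_type(mime)
    charset = params.get("charset", default_charset).lower()
    return data.decode(CODEC_ALIASES.get(charset, charset))


html = "<b>Grüße</b>"
print(decode_clipboard("text/html;charset=ISO-10646-UCS-2", html.encode("utf-16-le")))
print(decode_clipboard("text/html", b"<b>plain</b>"))
```

With the label present the consumer can decode non-ASCII markup correctly; without it, the sketch falls back to ASCII, which is the behaviour the report attributes to Qt.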

Asa Dotzler [:asa]

Comment 1

25 years ago

cc:'ing clipboard folks for input.

Stuart Parmenter

Comment 2

25 years ago

X clipboard does not do character set stuff. If they want the right data, they should be requesting COMPOUND_TEXT, which will be in the current locale, or UTF8_STRING (which very few apps support). This will give them plain (non-HTML) text, though. I could make text/html always return COMPOUND_TEXT (like TEXT returns either STRING or COMPOUND_TEXT). This might fix Qt's problem here.

Assignee: asa → pavlov

Stuart Parmenter

Comment 3

25 years ago

Moving to Future. I don't think this is critical; if you disagree, please let me know.

Target Milestone: --- → Future

Keyser Sose

Comment 4

25 years ago

Adding [RFE] to Summary.

Summary: Unicode encoded clipboard data should contain charset info → [RFE] Unicode encoded clipboard data should contain charset info

Wesley Tanaka

Comment 5

25 years ago

http://www.unicode.org/iuc/iuc10/x-utf8.html

It would be nice if Mozilla at least did the same thing that Netscape 4.76 does on Linux. For instance, if you copy (Alt-C) the first two characters under "Chinese (Simplified)" at the above URL in Netscape, xclipboard shows 6 ISO Latin characters (the same bytes as the UTF-8 encoding of those two characters). Mozilla doesn't put anything useful on the clipboard at all (a single newline character?). This is Mozilla 2001010606 on Linux, XFree86 3.3.6.
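The "6 ISO Latin characters" follow directly from UTF-8: CJK characters in this range take 3 bytes each, and a consumer that assumes Latin-1 shows one character per byte. A small illustration, using "中文" as a hypothetical stand-in for the characters on that page:

```python
# Hypothetical stand-in for "the first two characters" mentioned above; the
# exact characters on that page are not reproduced here.
text = "中文"

utf8_bytes = text.encode("utf-8")         # each of these characters takes 3 bytes
as_latin1 = utf8_bytes.decode("latin-1")  # what a Latin-1-only consumer displays

print(len(text), len(utf8_bytes), len(as_latin1))  # → 2 6 6

# Nothing is lost: re-encoding as Latin-1 recovers the original UTF-8 bytes.
assert as_latin1.encode("latin-1").decode("utf-8") == text
```

Because the byte sequence survives intact, a consumer that knew the data was UTF-8 could recover the original text, which is the point of labelling it.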

Ilya Konstantinov

Comment 6

24 years ago

Why don't we use UTF-8 as the interchange format? Should be compatible with most legacy things. Or we can encode it in Latin-1, and have all non-Latin characters encoded as accents or Unicode HTML entities? (seems fair to me, as any receiving application which claims to understand HTML should handle those) IMHO, we should go for this option, as it removes all ambiguities.

Ilya Konstantinov

Comment 7

24 years ago

Also, Qt / KDE uses MIME types such as "text/plain;charset=UTF-8" and "text/plain;charset=ISO-10646-UCS-2", so we should provide them. For plain "text/html" type (without the encoding appended), we should encode all non-Latin-1 into HTML character entities.
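For the unlabelled "text/html" case, Python's `xmlcharrefreplace` error handler does exactly this substitution: characters that fit in Latin-1 are kept as single bytes, and everything else becomes a decimal numeric character reference. The helper name `to_latin1_html` is hypothetical.

```python
def to_latin1_html(html: str) -> bytes:
    """Encode HTML as Latin-1, replacing every character outside Latin-1
    with a decimal numeric character reference such as &#20013;."""
    return html.encode("latin-1", errors="xmlcharrefreplace")


print(to_latin1_html("<p>café 中文</p>"))  # → b'<p>caf\xe9 &#20013;&#25991;</p>'
```

One caveat with this approach: HTML parsers decode character references in text and attribute values, but not inside raw-text elements such as `<script>` and `<style>`, so non-Latin-1 characters there would not round-trip.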

Ilya Konstantinov

Comment 8

24 years ago

http://www.newplanetsoftware.com/xdnd/ (the XDND standard) also mentions using the 'mimetype;charset=...' format to specify the encoding of the data. Since the mimetype string in the X clipboard and DnD is an arbitrary text string, I don't anticipate any special compatibility problems (old, unaware applications should be requesting 'text/plain' anyway). On the contrary, it opens exciting new possibilities for integration (e.g. drag-and-drop or copy-and-paste from a spreadsheet into Mozilla Composer while maintaining the formatting).
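If a source offers several 'mimetype;charset=...' strings side by side, a receiver can pick the one whose charset it handles best, while old applications keep asking for the bare type. A minimal sketch of that selection; the preference order and helper names are hypothetical and not taken from the XDND text, and treating an unlabelled entry as US-ASCII is an assumption borrowed from MIME's default for text types.

```python
from __future__ import annotations

# Hypothetical receiver preference: best charset first.
PREFERRED_CHARSETS = ["utf-8", "iso-10646-ucs-2", "us-ascii"]


def split_type(type_string: str) -> tuple[str, str | None]:
    """Return (media type, charset or None) for 'type/subtype;charset=...'."""
    base, *params = type_string.split(";")
    charset = None
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
    return base.strip().lower(), charset


def pick_target(offered: list[str], media_type: str) -> str | None:
    """Choose the offered type string for `media_type` with the most preferred charset.
    Unlabelled entries are assumed to be US-ASCII."""
    best, best_rank = None, len(PREFERRED_CHARSETS)
    for type_string in offered:
        base, charset = split_type(type_string)
        if base != media_type:
            continue
        charset = charset or "us-ascii"
        if charset in PREFERRED_CHARSETS:
            rank = PREFERRED_CHARSETS.index(charset)
            if rank < best_rank:
                best, best_rank = type_string, rank
    return best


offered = ["text/plain", "text/plain;charset=UTF-8", "text/html"]
print(pick_target(offered, "text/plain"))  # → text/plain;charset=UTF-8
```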

Jens Müller (:tessarakt)

Comment 9

23 years ago

I have a similar problem here: I copy in Mozilla (by just selecting the text) and paste in XEmacs (by pressing the middle mouse button). All umlauts and similar characters come out as question marks. I don't know what XEmacs requests, but isn't the default charset of text/plain us-ascii?
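MIME does define US-ASCII as the default charset for an unlabelled text/plain. One plausible way umlauts turn into question marks is a down-conversion to ASCII with "?" substituted for anything unrepresentable; this sketch only reproduces that symptom and does not claim to be what Mozilla or XEmacs actually did.

```python
text = "Müller: Grüße"

# Encoding to US-ASCII with errors="replace" substitutes "?" for each
# character outside ASCII, which matches the symptom described above.
print(text.encode("ascii", errors="replace"))  # → b'M?ller: Gr??e'
```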

timeless

Updated

23 years ago

Component: Browser-General → XP Toolkit/Widgets

Summary: [RFE] Unicode encoded clipboard data should contain charset info → Unicode encoded clipboard data should contain charset info

Stuart Parmenter

Updated

18 years ago

Assignee: pavlov → jag

QA Contact: doronr → xptoolkit.widgets

Mats Palmgren (inactive)

Updated

17 years ago

Assignee: jag → nobody

Anne (:annevk)

Comment 10

5 years ago

Closing this as it's not clear this is still actionable. If this is still problematic a good first step would be to raise this with the standard at https://github.com/w3c/clipboard-apis/issues.

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → INCOMPLETE

Categories

(Core :: XUL, enhancement, P3)

People

(Reporter: matt, Unassigned)

Security

(public)
