Closed Bug 44496 Opened 24 years ago Closed 4 years ago

Unicode encoded clipboard data should contain charset info

Categories

(Core :: XUL, enhancement, P3)

x86
Linux
enhancement

Tracking

()

RESOLVED INCOMPLETE
Future

People

(Reporter: matt, Unassigned)

Details

Since clipboard data like "text/html" is Unicode encoded, it should
include charset information, like so: "text/html;charset=ISO-10646-UCS-2".
This way, programs that expect "text/html" to be plain
ASCII can operate properly.  For example, Qt apps expect "text/html"
to be ASCII.
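
To illustrate the request, here is a rough sketch of how a receiving program might read the charset parameter out of the clipboard type name and decode the data accordingly. Everything here is an assumption made for illustration: `split_target` and `decode_clipboard` are hypothetical helper names, the charset-to-codec table is made up for this example (treating ISO-10646-UCS-2 as big-endian UTF-16 glosses over byte-order questions), and the ASCII fallback for a bare type simply mirrors what this report says Qt expects.

```python
# Hypothetical helpers for illustration; not part of Mozilla or Qt.
CODECS = {
    "iso-10646-ucs-2": "utf-16-be",  # assumption: big-endian, no byte-order mark
    "utf-8": "utf-8",
    "iso-8859-1": "latin-1",
    "us-ascii": "ascii",
}


def split_target(target):
    """Split 'type/subtype;charset=X' into (mime_type, charset or None)."""
    mime_type, _, params = target.partition(";")
    charset = None
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset" and value:
            charset = value.strip().strip('"').lower()
    return mime_type.strip().lower(), charset


def decode_clipboard(target, data):
    """Decode clipboard bytes using the charset named in the target."""
    _, charset = split_target(target)
    # A bare type (or an unknown charset) falls back to ASCII in this sketch.
    codec = CODECS.get(charset, "ascii")
    return data.decode(codec)
```

With the parameter present, a receiver no longer has to guess: `decode_clipboard("text/html;charset=ISO-10646-UCS-2", data)` picks the 16-bit codec, while a bare "text/html" is treated as ASCII.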

The list of all charset information is at
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
cc:'ing clipboard folks for input.
The X clipboard does not carry character set information.  If they want the right
data, they should be requesting COMPOUND_TEXT, which will be in the current locale, or
UTF8_STRING (which very few apps support).  This will give them plain (non-HTML)
text though.  I could make text/html always return COMPOUND_TEXT (just as TEXT
returns either STRING or COMPOUND_TEXT).  This might fix Qt's problem here.
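
A minimal sketch of the "TEXT returns either STRING or COMPOUND_TEXT" behaviour mentioned above. The comment does not say how the choice is made; this assumes the owner answers with STRING whenever the text fits in ISO Latin-1 and with COMPOUND_TEXT otherwise. `choose_text_target` is a hypothetical name, and the sketch only picks the target name, it does not produce COMPOUND_TEXT bytes.

```python
def choose_text_target(text):
    """Return the X target name a TEXT request might be answered with.

    Assumption: STRING is used when every character is representable in
    ISO Latin-1, COMPOUND_TEXT otherwise. Python's latin-1 codec also
    accepts control characters, so this check is a simplification.
    """
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return "COMPOUND_TEXT"
    return "STRING"
```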
Assignee: asa → pavlov
Moving to Future.  I don't think this is critical.  If you disagree, please
let me know.
Target Milestone: --- → Future
Adding [RFE] to Summary.
Summary: Unicode encoded clipboard data should contain charset info → [RFE] Unicode encoded clipboard data should contain charset info
http://www.unicode.org/iuc/iuc10/x-utf8.html

It would be nice if Mozilla at least did the same thing that Netscape 4.76 does
on Linux.  For instance, if you copy (Alt-C) the first two characters under
"Chinese (Simplified)" at the above URL in Netscape, xclipboard shows 6 ISO
Latin characters (the same bytes as the UTF-8 encoding of those first two
characters).  Mozilla doesn't put anything useful on the clipboard at all (a
single newline character?).

This is Mozilla 2001010606 on Linux, XFree86 3.3.6.
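
The "6 ISO Latin characters" observation above can be reproduced with a short sketch. The two characters used here are stand-ins chosen for illustration (they are not necessarily the ones on the linked page); any two characters in that range encode to three UTF-8 bytes each, and a viewer that treats each byte as Latin-1 shows one character per byte.

```python
# Two sample Chinese characters; stand-ins, not necessarily those on the page.
sample = "\u4e2d\u6587"

utf8_bytes = sample.encode("utf-8")        # 3 bytes per character here
as_latin1 = utf8_bytes.decode("latin-1")   # what a Latin-1 viewer would display

print(len(utf8_bytes), len(as_latin1))     # prints "6 6"

# The bytes are intact, so a UTF-8-aware receiver can still recover the text.
recovered = as_latin1.encode("latin-1").decode("utf-8")
```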
Why don't we use UTF-8 as the interchange format? Should be compatible with most
legacy things.

Or we can encode it in Latin-1, and have all non-Latin characters encoded as
accents or Unicode HTML entities? (seems fair to me, as any receiving
application which claims to understand HTML should handle those)
IMHO, we should go for this option, as it removes all ambiguities.
Also, Qt / KDE uses MIME types such as "text/plain;charset=UTF-8" and
"text/plain;charset=ISO-10646-UCS-2", so we should provide them.

For the plain "text/html" type (without the encoding appended), we should encode all
non-Latin-1 characters as HTML character entities.
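
As a sketch of the entity proposal above: Python's built-in `xmlcharrefreplace` error handler does the same kind of conversion, keeping Latin-1 characters as single bytes and turning everything else into numeric character references. `html_to_latin1` is a hypothetical name used only for this illustration; it says nothing about how Mozilla would implement it.

```python
def html_to_latin1(markup):
    """Encode HTML as Latin-1 bytes, using numeric character references
    (&#NNNN;) for characters that Latin-1 cannot represent."""
    return markup.encode("latin-1", errors="xmlcharrefreplace")
```

For example, `html_to_latin1("<p>Grüße 中</p>")` keeps ü and ß as single Latin-1 bytes and emits `&#20013;` for the Chinese character, so the result stays unambiguous for any receiver that parses HTML.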
http://www.newplanetsoftware.com/xdnd/ (the XDND standard) also mentions use of
'mimetype;charset=...' format for specifying the encoding of the data.

Since the mimetype string in X clipboard and DnD is an arbitrary text string, I
don't anticipate any special compatibility problems (old unaware applications
should have requested 'text/plain' anyway). On the contrary, it opens exciting new
possibilities of integration (e.g. drag-and-drop or copy-and-paste from a
spreadsheet into Mozilla Composer, maintaining the formatting).
I have a similar problem here: I copy in Mozilla (by just selecting the text)
and paste in XEmacs (by pressing the middle mouse button).  All umlauts and
similar characters are now question marks.

Dunno what XEmacs requests, but isn't the default charset of text/plain us-ascii?
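
One plausible way to end up with question marks, sketched under the assumption raised above that somewhere along the path the text is converted to US-ASCII with lossy replacement. The thread does not establish what XEmacs actually requests or where the conversion happens; `to_ascii_lossy` is a hypothetical name.

```python
def to_ascii_lossy(text):
    """Convert to US-ASCII, substituting '?' for anything unrepresentable."""
    return text.encode("ascii", errors="replace").decode("ascii")
```

Under that assumption, `to_ascii_lossy("Grüße")` gives `"Gr??e"`, which matches the symptom described.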
Component: Browser-General → XP Toolkit/Widgets
Summary: [RFE] Unicode encoded clipboard data should contain charset info → Unicode encoded clipboard data should contain charset info
Assignee: pavlov → jag
QA Contact: doronr → xptoolkit.widgets
Assignee: jag → nobody

Closing this as it's not clear this is still actionable. If this is still problematic a good first step would be to raise this with the standard at https://github.com/w3c/clipboard-apis/issues.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE