User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.1.4322)
Build Identifier: 1.0 PR

Copying to the clipboard in Firefox 1.0 uses the UTF-8 encoded CF_HTML clipboard format (on Windows), which is the right encoding. Drag-and-drop, however, uses some other encoding with obscure character references. The problem arises when non-English text is dragged from Firefox.

Reproducible: Always

Steps to Reproduce:
Open any web site that uses e.g. Russian, for example http://www.google.ru
1. Select any Russian text there.
2. Drag and drop it into any application that accepts CF_HTML via drag-and-drop. (You may use my http://blocknote.net or http://www.nvu.com/ (Daniel Glazman, Gecko engine) as targets.)
3. Notice how unrecognized character references appear.

Note that if you copy the same text to the clipboard and then paste it into the target applications, everything is fine: CF_HTML contains pure UTF-8 encoded HTML.

Actual Results:
Russian text appears as: <html><body> <!--StartFragment --> Власти США пока не планируют

Expected Results:
Cyrillic characters should be UTF-8 encoded instead of being written as character references from the Latin-1 charset.

Mozilla 1.7.3 has the same problem. It would also be nice if Firefox used CF_HTML v.1.0 instead of the outdated v.0.9. IE 6 uses CF_HTML v.1.0. Don't hesitate to contact me if you need more info here.
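For anyone trying to picture the symptom, here is a minimal Python sketch of one way such output can arise. It assumes (this is a guess, not something confirmed against the Mozilla source) that each byte of the UTF-8 text is treated as a Latin-1 character and then written as an HTML character reference. The function name is hypothetical.

```python
from html.entities import codepoint2name


def utf8_bytes_as_latin1_refs(text: str) -> str:
    """Write every non-ASCII UTF-8 byte of `text` as a Latin-1 character reference.

    Hypothetical helper: it only imitates the suspected failure mode.
    """
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(chr(byte))
        elif byte in codepoint2name:
            # Latin-1 code points 160..255 have named HTML 4 entities.
            out.append(f"&{codepoint2name[byte]};")
        else:
            # Bytes 0x80..0x9F have no named entity; fall back to a numeric reference.
            out.append(f"&#{byte};")
    return "".join(out)


if __name__ == "__main__":
    print(utf8_bytes_as_latin1_refs("Власти"))
```

A drop target that decodes these references gets Latin-1 characters such as "Ð" and "»" rather than Cyrillic, which matches the kind of text shown under Actual Results.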
Daniel, feel like taking a shot at this one?
A prominent, if not free, drop target application to demonstrate this is Microsoft Word.
It seems that in mozilla/content/base/src/nsContentAreaDragDrop.cpp, when nsContentAreaDragDrop::DragGesture is called, there should be an nsIClipboardDragDropHooks override defined for win32 that converts the selected text to UTF-8 in its OnCopyOrDrag function. From what I can tell, this would be the preferable way to do the translation, but I'd like somebody who knows this code better to confirm that I'm not completely confused.
Component: General → Drag and Drop
Product: Firefox → Core
Version: unspecified → Trunk
Created attachment 167339
Don't encode entities

Well, I'm guessing here that we shouldn't be encoding entities. At least, I seem to be able to drag-n-drop http://www.google.ru/ to Word now. But I've no idea whom to ask for (super)review.
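To illustrate the idea behind "don't encode entities" (this is not the attached patch, only a Python sketch with a hypothetical function name): escape just the markup-significant characters and let UTF-8 carry everything else.

```python
import html


def serialize_fragment_utf8(text: str) -> bytes:
    """Escape only &, < and >, then encode the result as UTF-8.

    Hypothetical helper; non-ASCII characters are left as-is rather than
    being turned into character references.
    """
    return html.escape(text, quote=False).encode("utf-8")


if __name__ == "__main__":
    data = serialize_fragment_utf8("Власти <США>")
    print(data.decode("utf-8"))
```

A receiver that reads the payload as UTF-8 then sees the Cyrillic text directly, with no character references to interpret.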
This is an automated message, with ID "auto-resolve01". This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code. While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts. If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it. If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved. Thank you for your help in this matter. The latest beta releases can be obtained from: Firefox: http://www.mozilla.org/projects/firefox/ Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html Seamonkey: http://www.mozilla.org/projects/seamonkey/
This is still happening in 1.5b1. It looks like Neil's patch got forgotten about.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Well, maybe bz can suggest some victims^H^H^H^H^H^H^Hreviewers ;-)
I really don't know.... Maybe one of the editor folks? Because I can see this screwing up drag/drop within editor pretty badly...
The problem here isn't with CF_HTML ("HTML Format"), it is with text/html. They are different formats. We currently generate CF_HTML as UTF-8, but we generate text/html as UCS-2. A similar problem happens if you drag from Chrome onto us: Chrome treats text/html as UTF-8 and provides it to us in that encoding, but we try to read it as UCS-2. It should be UTF-8 for both CF_HTML and text/html. (Note: CF_HTML has the extra headers and such.)

Reference from The 'text/html' Media Type (RFC 2854):
> For the text/html flavor, any registered IANA charset may be used, but UTF-8 is preferred.

This will require changes in a handful of places, but we should be handling this data as UTF-8, so the change is worth it. I verified that generating the text/html data as UTF-8 makes the paste work in the provided program (BlockNote) in Comment 1 for Russian characters. Fix coming tomorrow.
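The difference between the two flavors can be sketched in Python as follows. The CF_HTML header fields (Version, StartHTML, EndHTML, StartFragment, EndFragment) follow the Windows HTML Clipboard Format, with offsets counted in bytes of the UTF-8 payload; the function name, the wrapper markup, and the 10-digit zero padding are illustrative assumptions, not what Gecko actually emits.

```python
HEADER = (
    "Version:0.9\r\n"
    "StartHTML:{start_html:010d}\r\n"
    "EndHTML:{end_html:010d}\r\n"
    "StartFragment:{start_frag:010d}\r\n"
    "EndFragment:{end_frag:010d}\r\n"
)
PREFIX = b"<html><body>\r\n<!--StartFragment-->"
SUFFIX = b"<!--EndFragment-->\r\n</body>\r\n</html>"


def build_cf_html(fragment: str) -> bytes:
    """Wrap an HTML fragment in a CF_HTML payload with UTF-8 byte offsets."""
    body = fragment.encode("utf-8")
    # Offsets are zero-padded to a fixed width, so the header length is constant.
    header_len = len(HEADER.format(start_html=0, end_html=0, start_frag=0, end_frag=0))
    start_html = header_len
    start_frag = start_html + len(PREFIX)
    end_frag = start_frag + len(body)
    end_html = end_frag + len(SUFFIX)
    header = HEADER.format(
        start_html=start_html, end_html=end_html,
        start_frag=start_frag, end_frag=end_frag,
    ).encode("ascii")
    return header + PREFIX + body + SUFFIX


if __name__ == "__main__":
    text = "Власти"
    print(build_cf_html(text).decode("utf-8"))
    # The text/html flavor has no header, so both sides must agree on the encoding.
    as_utf8 = text.encode("utf-8")
    # Reading UTF-8 bytes as UTF-16 (the mismatch described for Chrome -> us):
    print(as_utf8.decode("utf-16-le", errors="replace"))
```

The last line shows why the mismatch matters: the same bytes decode to unrelated characters when the receiver assumes a 16-bit encoding.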
Ran into some other problems on this a couple weeks ago so that is why no fix on this yet. Have some other stuff ahead of this before I get back to this task.