Closed Bug 336012 Opened 18 years ago Closed 18 years ago

[10.4] Copy/paste from browser to Terminal results in encoded URL instead of selected text

Categories

(Core Graveyard :: Widget: Mac, defect)

PowerPC
macOS
defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mak, Assigned: mark)

References

()

Details

(Keywords: regression)

Attachments

(2 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; chrome://navigator/locale/navigator.properties; rv:1.9a1) Gecko/20060429 Camino/1.2+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; chrome://navigator/locale/navigator.properties; rv:1.9a1) Gecko/20060429 Camino/1.2+

Copying text from Camino and pasting it into Terminal.app results in a string like: "%00h%00t%00t%00p%00:%00/%00/%00n%00e%00w%00s%00.%00g%00o%00o%00g%00l%00e%00.%00c%00o%00m%00/". (Luckily copy/paste works fine in the other direction so I could enter the above string.)

Reproducible: Always

Steps to Reproduce:
1. Select text to copy in Camino, and copy using cmd-C or Edit->Copy
2. Paste into Terminal window at shell prompt using cmd-V or Edit->Paste


Actual Results:  
Pasting the string "Mohammed" resulted in the string "%00h%00t%00t%00p%00:%00/%00/%00n%00e%00w%00s%00.%00g%00o%00o%00g%00l%00e%00.%00c%00o%00m%00/"
in the Terminal window. Note that this seems to be some form of the URL
from Camino (in this case http://news.google.com/). 

Expected Results:  
Should have pasted "Mohammed". 

Pasting works correctly from Camino to all othe other apps that I tried, including TextEdit, Mail, Safari, Address Book. Note that pasting from any other app into Terminal seems to work ok.
Does this happen on all URLs, or only specific ones? I can't reproduce by copying from this page, for instance.

cl
This is reproducible for me using both of these Trunk builds on fresh profiles:
* Gecko/20060430 Camino/1.2+
* Gecko/20060430 Minefield/3.0a1

Its not reproducible with a Camino 1.0+ build (1.8 branch) from the same date.


Testpage: http://www.mozilla.org/projects/camino/homepage.html
Can we find out when this regressed?
This regressed between these builds (Camino):
2006042022 (1.2+) <- OK
2006042205 (1.2+) <- Broken

Regression window: http://tinyurl.com/o55tb
You beat me to it! I was just tracking down the cause...

This bug was caused by attachment 218505 [details] [diff] [review] of bug 315370. Backing out the first part of that patch removes the bug.

The part of the patch in question is in content/base/src/nsCopySupport.cpp. I'll reproduce it below. Basically, two shortcut.Append() calls were deleted, and the third arg of AppendString() was changed from kURLMime to kURLDataMime.

Here's the patch:

-            shortcut.Append(PRUnichar('\n'));
-            shortcut.Append(aDoc->GetDocumentTitle());
 
-            // Add the URL DataFlavor to the transferable
-            rv = AppendString(trans, shortcut, kURLMime);
+            // Add the URL DataFlavor to the transferable. Don't use kURLMime, as it will
+            // cause an unnecessary UniformResourceLocator to be added which confuses
+            // some apps eg. Outlook 2000 - (See Bug 315370)
+            rv = AppendString(trans, shortcut, kURLDataMime  );
Status: UNCONFIRMED → NEW
Component: General → Widget: Mac
Ever confirmed: true
OS: Mac OS X 10.3 → Mac OS X 10.4
Product: Camino → Core
Summary: Copy/paste from Camino to Terminal results in gibberish text → Copy/paste from Browser to Terminal results in gibberish text
Version: unspecified → Trunk
And I better make it clear that the patch above is the patch that *caused* the bug... not the patch I applied to reverse it ;)
Can now confirm that it's solely the use of kURLDataMime instead of kURLMime that causes this bug.

I've put a note over on bug 315370, so hopefully it'll get some attention.
Always a good idea to CC the person assigned to the bug that caused the regression :)
I'm wondering if the Mac equivalent of the nsDataObj.cpp code is not adding all the flavours to the clipboard so that "Terminal" picks the wrong one?

Is the order of flavours relevant on a Mac?

I'm not a Mac Developer by any means so I'm a bit out of my depth here. I'd appreciate any pointers or suggestions.

I do have access to a Mac at work, though it isn't set up for development at the moment so could take me a while to get code compiling on it (let alone understand what it's doing!)

-dave
I found a little OS X utility on VersionTracker called "Ugly Multiclip" that, among other things, allows you to have multiple clipboards and view the scrap types and sizes of each.

I created a plain text file with nothing in it but a single number '1'. I opened the file in a copy of Firefox that has the copy/paste bug, and one that doesn't. In each, I copied the text to a clipboard and got info on it.

The first 6 scrap types and sizes were identical for each. In listed order (I assume it's the clipboard order, but can't say for certain), they are:
MZ (26), MZ (74), MZ  (6), utxt (4), TEXT (2), styl (22)
The numbers in brackets are the size of each, in bytes.

The last (9th) scrap type and size was also identical for each:
ut16 (6)

The 7th and 8th scraps are the ones that varied. For the new (buggy paste) browser, they are:
url (68), MOZm (85)

For the older (good paste), they are:
MZ (66), MOZm (111)

The main difference between the two seems to be the inclusion of a "url" scrap in the new copy.

For reference, a copy of the equivalent plain text in *Safari* gives a much smaller clipboard. 4 scraps total:
utf8 (1), utxt (2), ut16 (4), TEXT (1)

Interestingly, if you right/ctrl click on a hyperlink and choose "Copy Link Location" in *Firefox*, then it pastes fine from both old and new browsers, so this form of copy hasn't been affected by the change. For reference, an example of this is:
utxt (36), TEXT (18), styl (22), MOZm (1), ut16 (38)

But if I copy a link from *Safari*, it gives a clipboard with a 'url' scrap type, but that pastes fine into Terminal (it just pastes a text form of the url). So it can't be just the presence of a url scrap that is causing the problem. As a reference, an example of this is:
url (18), urln (48), utf8 (18), utxt (36), ut16 (38), TEXT (18)

What all this means, I have no idea! I hope some of this helps, and I haven't just confused you further...
(Josh, I hope it's OK to cc you on this... maybe you have some insight that will help)

Sorry for all the spam, if I'd discovered Apple's "Clipboard Viewer" (part of the Developer Tools) earlier, it would have saved me posting that last epic.

I now think a couple of different things are going on here:

(1) Paste is preferentially going for the url flavor in the clipboard data, if that exists. Why, I don't know. Perhaps it's the order, but perhaps it's just intended to be that way. Maybe only Mozilla browsers add a url scrap flavour to the clipboard when there's no actual url being copied (yet neglect to add a url scrap when you're actually intending to copy one)! Maybe they shouldn't?

(2) For some reason, the url sent to the clipboard by Firefox has a whitespace inserted before *every* character in the url (' h t t p : / / w . w . w . ' etc.) This isn't the case if I copy a link in Safari, which it comes out clean in a paste, so I suspect it's Firefox's fault. The whitespaces in the url are translated to %00s, which gives the gibberish appearance.

I suspect if (2) is fixed, then Firefox&Camino copy/pastes will still come out (wrongly) as urls in Terminal, but they'll at least be legible.
Assignee: nobody → joshmoz
QA Contact: general → mac
The intention of the original code (which I think goes back to my original patch 244685 for OneNote compatibility) was to pass through the URL of the current page so that it was available to add to the CF_HTML flavour.

The original code used the kURLMime tag which includes the Title and URL. The intention of the recent patch for bug 315370 modified this to just pass the URL through (using kURLDataMime) as that was actually all we needed.

Maybe instead of using that type, I need to create a new private "Mime" type for this so it doesn't get confused.

I'm assuming the spaces between the characters are related to some Unicode thing.

-dave
kURLMime isn't defined as anything in the Mac scrap flavours, so it's never had an effect. I imagine that if it had been linked to 'url ', this problem would have shown up in Terminal long ago.

kURLDataMime, however, is linked to the flavor kScrapFlavorTypeURL in Mac, which is defined as 'url '. In Windows, it's just defined as CF_UNICODETEXT.

I don't understand the significance of this. But it kinda looks like the scrap is being used in different ways in Mac and Windows, and now that's causing an issue. But I'm just guessing.

Simon Fraser created the Mac definition of kURLDataMime in version 1.28 of /source/widget/src/mac/nsMimeMapper.cpp (see bug 193053). Before then, this bug couldn't have occurred because Mac would have ignored it.

I'm cc'ing Simon on this. He should understand the Mac scrap flavours and I'm hoping he can suggest a way around the problem. I hope that's ok, Simon? :)
How/why is this only broken on 10.4, too?
Keywords: regression
Summary: Copy/paste from Browser to Terminal results in gibberish text → [10.4] Copy/paste from Browser to Terminal results in gibberish text
I think the data for the 'url ' flavor should be a C-string url, and not unicode.

Maybe the Terminal didn't accept 'url ' before Tiger?
Not using unicode for the 'url' flavour may result in a legible url being pasted, instead of the "%00" formatted one that we see currently. I'm not sure it'd fix the real problem, though, which is that the copied text is supposed to be pasted, not the url.

It looks like Tiger notices the 'url' flavour as soon as it's added, because an "Apple URL pasteboard type" is added at the same time. I can't see any Moz code that adds this, so I presume it's OS X that does it. Can anyone still using Panther or Jaguar, who also has the Developer Tools, open Clipboard Viewer and check if it's added there?

Oh, also noting that switching to tcsh doesn't change anything.
Once I realized Wayne was talking about one of the sample projects instead of a pre-built Apple tool, I looked at the clipboard after a copy operation from CaminoTrunk in 10.3.9.

I see 11 different types on the clipboard when copying "downloading Camino®. We hope" from http://www.caminobrowser.org/start/.  I assume they're listed in the order of appearance.  (In the following, the "periods" are the visual representation in ClipboardViewer of the null bytes that are the first byte in UTF-16 for characters that are single-byte in UTF-8 or ISO-8859-1; these translate to the spaces in Wayne's comment 11 point 2.)

CorePasteboardFlavorType 0x4D5A0000 (UTF-16 raw text with extra spaces at the sentence break)

CorePasteboardFlavorType 0x4D5A0001 (appears to be the HTML that surrounds the text--but not the actual text, again UTF-16)

CorePasteboardFlavorType 0x4D5A0002 (UTF-16, contains only 0,0)

CorePasteboardFlavorType 0x75747874 (similar to the first CorePasteboardFlavorType, but no extra spaces at the sentence break)

NSStringPboardType (UTF-8 raw text)

CorePasteboardFlavorType 0x54455854 (content identical to NSStringPboardType)

CorePasteboardFlavorType 0x7374796C (22 periods; opens everywhere as a few spaces; I assume this to be UTF-16)

NeXT Rich Text Format v1.0 pasteboard type (self-explanatory)

CorePasteboardFlavorType 0x75726C20 (page URL as UTF-8/UTF-16, i.e, the periods between every actual character) 

Apple URL pasteboard type (page url inside a plist; plist is UTF-8 but the url inside the <string></string> appears to be UTF-16; the contents of the <string></string> are *identical* to what gets pasted in Terminal.app Comment 0)

CorePasteboardFlavorType 0x4D4F5A6D (some sort of listing of MIME types with numbers; text/html, text/_moz_htmlcontent, and text/_moz_htmlinfo are the listed MIME types)

If the order of the types and flavors is the order in which they're on the clipboard, and if the order is the same in 10.4 as it is here on 10.3, I see absolutely no reason why Termial.app should be grabbing the 10th item!  Everyone on 10.4 can do the copy personally and compare the flavors and orders to my "epic" listing (though that won't help the gentleman whose checkin seems to have caused this).
Blocks: 315370
The hex digits are carbon-style clipboard flavors:

0x4D5A0000 = 'MZ\0\0'
0x4D5A0001 = 'MZ\\0\1'
0x4D5A0002 = 'MZ\\0\2'
0x75747874 = 'utxt'
0x54455854 = 'TEXT'
0x7374796C = 'styl'
0x75726C20 = 'url '
0x4D4F5A6D = 'MOZm'
Summary: [10.4] Copy/paste from Browser to Terminal results in gibberish text → [10.4] Copy/paste from browser to Terminal results in encoded URL instead of selected text
There are two problems here:

1. Terminal doesn't accept the copied text, but instead accepts the URL of the document that the text was copied from.  We are now adding the 'url ' flavor to the clipboard, which Terminal seems to prefer over other flavors that contain the data you expect to be pasted ('TEXT' or 'utxt').  This aspect of the bug is probably a Terminalism.

2. The 'url ' data is going onto the clipboard as UTF-16, but Terminal is treating it as though it's byte-per-char data.  Maybe this is our problem.  Maybe the 'url ' flavor is expected to be UTF-8 or something.
Attached patch Patch for part 2 (obsolete) — Splinter Review
This patch solves comment 19 part 2.  It's not a complete fix to this bug, but it makes Mozilla apps put 'url '-flavored data onto the clipboard in the expected one-byte-per-character format.  (URLs are expected to have characters outside of the ASCII range escaped anyway, so the script doesn't make much of a difference.)
Assignee: joshmoz → mark
Status: NEW → ASSIGNED
Attachment #224734 - Flags: review?(hwaara)
Comment on attachment 224734 [details] [diff] [review]
Patch for part 2

r=me, with some extra comment-sprinkling
Attachment #224734 - Flags: review?(hwaara) → review+
Attachment #224734 - Flags: superreview?(mikepinkerton)
don't you have to dispose of |castedUnicode|? |ConvertUnicharLineBreaksInSitu()| may allocate a new buffer if it needs to.
Attached patch Always use utf-8 (obsolete) — Splinter Review
I decided that it's probably totally bogus for 'url '-flavored data to be on the clipboard in the system script, and that it should always be UTF8.  Gotta test that theory first though.

This fixes the other user of the in-situ linebreak converter in that file, but as the comments point out, converting between '\r' and '\n' isn't likely to cause a new allocation.
David Gardiner:

Is there anything special about kURLDataMime (text/x-moz-url-data) here?  That type is mapped to something special, used for drag-n-drop.  Reusing it in bug 315370 had the unintended effect of making some applications (such as the Mac Terminal) pick the URL up from the clipboard instead of the text the user intended.

Could we map this to something that will remain more private?  That would fix comment 19 part 1.
Mark,
 I am pretty sure that we could probably change it to a "private" flavour so that it wouldn't be recognised.

The main reason I picked it was that it seemed logical and fitted the bill as far as what the data was. Unfortunately it looks like not everything else is logical too :-)

-dave
I'm a bit confused. Have we always been putting 'url ' data in Unicode format on the clipboard up until now? How has that worked for all this time?
No, we used kURLMime for this until bug 315370:

http://bonsai.mozilla.org/cvsview2.cgi?subdir=mozilla/content/base/src&command=DIFF_FRAMESET&file=nsCopySupport.cpp&rev2=1.48&rev1=1.47

kURLMime did not map to any specific flavor, so a private custom type would be assigned at runtime.  kURLDataMime always maps to 'url '.

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/widget/src/mac/nsMimeMapper.cpp&rev=1.30&mark=115-116,128-136#97
The data had always been going onto the clipboard as UTF-16, only the flavor has changed, from something private to something that Terminal will prefer over text ('TEXT', 'utxt', ...) for a paste operation.
I talked to ftang about this and we'll just go with UTF-8 all around.  The %-encoding means that it really doesn't matter.  The 'url ' flavor doesn't have any defined encoding probably because it doesn't need one.  So long as the characters are 8 bits, the %-encoding keeps everything in the ASCII subset that's the same across all sane character sets.  Safari and Mozilla both use %-encoded URLs for the 'url ' flavor.  I'll use UTF-8 because 16<->8 transformations are easy.

This patch includes better comments.
Attachment #224734 - Attachment is obsolete: true
Attachment #224764 - Attachment is obsolete: true
Attachment #224881 - Flags: review?(hwaara)
Attachment #224734 - Flags: superreview?(mikepinkerton)
Attached patch Fix for part 1Splinter Review
This moves URL data back into a private flavor on the Mac.  Windows testing needed.  David?
Attachment #224887 - Flags: review?(David.R.Gardiner)
Comment on attachment 224881 [details] [diff] [review]
Patch for part 2, always use utf-8, better comments (checked in, trunk and 1.8.1)

>+          // UTF-8 because it's easy.  UTF-8 will also be used for
>+          // kURLDescripitonMime ('urld'), URLs that carry a description.

Typo.

Thanks for fixing this!
Attachment #224881 - Flags: review?(hwaara) → review+
Attachment #224881 - Flags: superreview?(mikepinkerton)
the comments talk about how |kURLDataMime| needs to be utf8, but saying nothing about |kURLDescriptionMime|, which is also handled in the same case. Seems strange to mention one and not the other, leading me to wonder if something is wrong...
Comment on attachment 224881 [details] [diff] [review]
Patch for part 2, always use utf-8, better comments (checked in, trunk and 1.8.1)

sr=pink
Attachment #224881 - Flags: superreview?(mikepinkerton) → superreview+
Comment on attachment 224881 [details] [diff] [review]
Patch for part 2, always use utf-8, better comments (checked in, trunk and 1.8.1)

Checked in, trunk and 1.8 branch for 1.8.1b1.  Holding open for fix to part 1.
Attachment #224881 - Attachment description: Patch for part 2, always use utf-8, better comments → Patch for part 2, always use utf-8, better comments (checked in, trunk and 1.8.1)
Attachment #224881 - Flags: approval-branch-1.8.1+
With today's nightly (Version 2006060822 (1.2+)), the behavior is different but still not correct. Copying any text from Camino and pasting it into Terminal results in the URL (no embedeed null characters). For example, copying some text on the yahoo.com page and pasting into Terminal results in: "http://www.yahoo.com/".
Mark, that's right.  This bug has two parts (comment 11 and comment 19).  Fixing part 1 will mask part 2, so part 2 gets fixed first.
Comment on attachment 224887 [details] [diff] [review]
Fix for part 1

This looks good to me.

-dave
Attachment #224887 - Flags: review?(David.R.Gardiner) → review+
Attachment #224887 - Flags: superreview?(roc)
Attachment #224887 - Flags: superreview?(roc) → superreview+
Part 1 checked in on trunk.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
(In reply to comment #38)
> Part 1 checked in on trunk.

This bug is broken again. The changes to nsCopySupport.cpp were reversed (seemingly deliberately so) by the check in for bug 255942 (attachment 225367 [details] [diff] [review]). I can't see any reason for it, so it looks like an accident. I've notified Mark Hammond over on that bug. Hopefully he'll fix it soon.

The changes applied to nsClipboard.cpp and nsDataObj.cpp are still intact, but of course won't do much on their own.
See bug 342103 which is for the re-fix.
Verified fixed on trunk
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: