Open Bug 665341 Opened 13 years ago Updated 4 years ago

Pasting from Word still uses local references for images

Categories

(Core :: DOM: Editor, defect, P5)

x86
Windows XP
defect

Tracking

REOPENED

People

(Reporter: amla70, Unassigned)

References

(Blocks 1 open bug)

Details

Bug 490879 fixed the problem of pasting an image directly into a contentEditable area: it no longer uses a file:// URL, and now it's a data: URI.

That's good, but there is another issue that also affects the privacy of the user and the ability to work correctly with such contentEditable elements: if some content with an image is copied from MS Word (I haven't tested other sources yet), a local reference is inserted.

That means that it won't work on the web and that the user has unexpectedly pasted part of the structure of his hard disk into a public page.

Example of such a URL:

<p style="margin-left:18.0pt;">
	<img height="98" src="file:///C:/DOCUME%7E1/ADMINI%7E1/CONFIG%7E1/Temp/msohtml1/01/clip_image002.gif" width="212" /></p>


The same kind of processing that turned standalone images into data: URIs should be applied when pasting HTML.
That's essentially bug 437217, but not really a duplicate. Since MS Office isn't sending an image over the clipboard but a reference to the image it deposited in a temporary file instead, and it's not drag-and-drop either (for which the image would be converted to a data: URI now as well per bug 609632), the file: URI is retained.

Thus, the general question is whether anything that's neither http: nor https: should be resolved to a data: URI, to ensure that no local references are included.
Blocks: 437217
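A minimal sketch of the check that question implies: scan the pasted HTML for src attributes whose scheme is anything other than http:, https: or data: (data: is allowed here because it is self-contained). The function names are hypothetical, and the regex scan is a simplification; a real editor would walk the parsed DOM.

```typescript
// Schemes treated as safe to keep after a paste. Assumption: this follows the
// question above (only http:/https: are remote) plus data:, which is inline.
const SAFE_SCHEMES: string[] = ["http:", "https:", "data:"];

// Returns the lowercased scheme with its trailing colon, or null when the URL
// has no scheme (relative URLs resolve against the page and are left alone).
function schemeOf(url: string): string | null {
  const m = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(url.trim());
  return m ? m[1].toLowerCase() + ":" : null;
}

// Collects every double-quoted src="..." value in a pasted HTML fragment whose
// scheme is not in SAFE_SCHEMES, e.g. the file:///C:/... references that Word
// writes into the clipboard HTML (also inside <v:imagedata src="...">).
function findLocalRefs(html: string): string[] {
  const srcAttr = /\bsrc\s*=\s*"([^"]*)"/gi;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = srcAttr.exec(html)) !== null) {
    const scheme = schemeOf(m[1]);
    if (scheme !== null && SAFE_SCHEMES.indexOf(scheme) === -1) {
      found.push(m[1]);
    }
  }
  return found;
}
```

Run on the example markup above, findLocalRefs returns just the file:///C:/DOCUME%7E1/... URL, which is exactly what would need converting (or dropping) before the HTML lands in the page.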
I think this is a bug in Microsoft Word.  I don't think we should go inside the injected HTML block and try to replace all of the local file references to data: URIs.

How do WebKit, Opera and IE handle this?
Right; in this case Word is generating the HTML, not us; we're just getting HTML from the clipboard....
It is true that the origin of the HTML is external, but that's the point of this bug. When users paste something into a page, they don't care about the origin: they only see that in the other app the text included images, but now the images are gone or show up as little squares, and they blame Firefox and the online editor for not pasting the images.

The web page can't read those files unless the browser provides access to them. So if you don't think it's a good idea to replace them with data: URIs, as is already done for standalone pictures, the other solution would be to expose them as a Files object in the dataTransfer of the paste event, so that the page can choose to read them and store their contents properly before saving the whole HTML.
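Either way, once the file contents are available the rewrite itself is mechanical. A sketch, assuming a hypothetical map from each file: URL in the markup to the bytes read from disk; no browser API provides such a map, it only stands in for the Files object suggested here. Base64 is written out by hand to keep the example self-contained.

```typescript
// Hypothetical shape of a file the browser read on the page's behalf.
type LocalFile = { type: string; bytes: Uint8Array };

const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with "=" padding: 3 input bytes -> 4 chars.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const n = (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    out += B64_ALPHABET[(n >> 18) & 63] + B64_ALPHABET[(n >> 12) & 63];
    out += has1 ? B64_ALPHABET[(n >> 6) & 63] : "=";
    out += has2 ? B64_ALPHABET[n & 63] : "=";
  }
  return out;
}

// Rewrites every src="..." whose value has an entry in `files` into a data:
// URI. References with no entry are left untouched here; a browser could
// instead drop them so that no local path leaks into the page.
function inlineLocalImages(html: string, files: Map<string, LocalFile>): string {
  return html.replace(/(\bsrc\s*=\s*")([^"]*)(")/gi,
    (whole: string, pre: string, url: string, post: string) => {
      const f = files.get(url);
      return f ? pre + "data:" + f.type + ";base64," + toBase64(f.bytes) + post : whole;
    });
}
```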


Regarding other browsers, testing in http://ckeditor.com/demo:
IE8 behaves like Firefox in this case (except that it pastes a .emz instead of a .gif for the same clipboard content; maybe this is related to the cleanup done by the editor).
Chrome 12 behaves exactly like Firefox, although I have to check whether they have started adding support for files in the dataTransfer object, as that would explain the new "Chrome only" feature of pasting images in Gmail. If so, they might extend it to cover this case (I will certainly check that there's a ticket filed on their side about this; they might want to fix it for Gmail and GDocs).
And Opera 11 is a mix: it includes both the .gif and the .emz images.
Can you please test in http://www.mozilla.org/editor/midasdemo/ ?  ckeditor might be doing its own processing in addition to the work done in the browser.

Thanks!
Firefox 4.01 generates this

<!--[if !mso]> <style> v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} </style> <![endif]--><!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:HyphenationZone>21</w:HyphenationZone> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:Compatibility> <w:BreakWrappedTables/> <w:SnapToGridInCell/> <w:WrapTextWithPunct/> <w:UseAsianBreakRules/> <w:DontGrowAutofit/> </w:Compatibility> <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel> </w:WordDocument> </xml><![endif]--><!--[if gte mso 9]><xml> <w:LatentStyles DefLockedState="false" LatentStyleCount="156"> </w:LatentStyles> </xml><![endif]--><!--[if gte mso 10]> <style> /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Tabla normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman"; mso-ansi-language:#0400; mso-fareast-language:#0400; mso-bidi-language:#0400;} </style> <![endif]--> <p class="MsoNormal"><img src="file:///C:/DOCUME%7E1/ADMINI%7E1/CONFIG%7E1/Temp/msohtml1/01/clip_image002.gif" width="212" height="98"></p> 


IE8 gives an error when trying to switch to source mode.

Chrome 12 generates this

<p class="MsoNormal"><!--[if gte vml 1]><v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f"> <v:stroke joinstyle="miter"/> <v:formulas> <v:f eqn="if lineDrawn pixelLineWidth 0"/> <v:f eqn="sum @0 1 0"/> <v:f eqn="sum 0 0 @1"/> <v:f eqn="prod @2 1 2"/> <v:f eqn="prod @3 21600 pixelWidth"/> <v:f eqn="prod @3 21600 pixelHeight"/> <v:f eqn="sum @0 0 1"/> <v:f eqn="prod @6 1 2"/> <v:f eqn="prod @7 21600 pixelWidth"/> <v:f eqn="sum @8 21600 0"/> <v:f eqn="prod @7 21600 pixelHeight"/> <v:f eqn="sum @10 21600 0"/> </v:formulas> <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/> <o:lock v:ext="edit" aspectratio="t"/> </v:shapetype><v:shape id="_x0000_i1025" type="#_x0000_t75" style='width:159pt; height:73.5pt'> <v:imagedata src="file:///C:\DOCUME~1\ADMINI~1\CONFIG~1\Temp\msohtml1\01\clip_image001.emz" o:title=""/> </v:shape><![endif]--><!--[if !vml]--><img width="212" height="98" src="file:///C:/DOCUME~1/ADMINI~1/CONFIG~1/Temp/msohtml1/01/clip_image002.gif" v:shapes="_x0000_i1025"><!--[endif]--></p>


And Opera 11.11

<p class="MsoNormal"><v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f"> <v:stroke joinstyle="miter"> <v:formulas> <v:f eqn="if lineDrawn pixelLineWidth 0"> <v:f eqn="sum @0 1 0"> <v:f eqn="sum 0 0 @1"> <v:f eqn="prod @2 1 2"> <v:f eqn="prod @3 21600 pixelWidth"> <v:f eqn="prod @3 21600 pixelHeight"> <v:f eqn="sum @0 0 1"> <v:f eqn="prod @6 1 2"> <v:f eqn="prod @7 21600 pixelWidth"> <v:f eqn="sum @8 21600 0"> <v:f eqn="prod @7 21600 pixelHeight"> <v:f eqn="sum @10 21600 0"> </v:formulas> <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"> <o:lock v:ext="edit" aspectratio="t"> </v:shapetype><v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:159pt; height:73.5pt"> <v:imagedata src="file:///C:\DOCUME~1\ADMINI~1\CONFIG~1\Temp\msohtml1\01\clip_image001.emz" o:title=""> </v:shape><img width="212" height="98" src="file:///C:\DOCUME~1\ADMINI~1\CONFIG~1\Temp\msohtml1\01\clip_image002.gif" v:shapes="_x0000_i1025"></p>
So, this shows that this is not an interoperability issue, and my comment 2 still applies.  I'm WONTFIXing this bug for now.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
For the record, I don't see this as a bug, but as a new feature.
IE11 has implemented this feature and added an additional twist.

Check their demo at:
http://ie.microsoft.com/testdrive/browser/Editing_Paste_Image/

If you leave the default "no script" option and copy an image to the clipboard (from a screenshot, for example), both Firefox and IE11 generate an image with a data URI. Chrome requires switching to "blob script" to use the paste event (IE11 also pastes in this case, by generating a blob URI).

Now open a document in MS Word with some text and one or more images, and try pasting again. Look at the pasted data between all the xml-msword crap:
Firefox currently removes all the references to the image (so the privacy part of this bug is fixed). Only the text is pasted, and the user says "Boo!"
Chrome seems to paste just what is in the clipboard, but that's a reference to a local file so it's not usable.

IE11 pastes the HTML embedding the image as base64 (if the "no script" option is selected) or as blob references (by using the paste event and the files list). So now the page can use the content that exists in Word, and the user is very happy because he can finish his task easily.

Is it possible to reopen the bug and review this decision?

The goal would be to have a behavior like IE 11:
By default, convert local references in HTML fragments (currently I think only MS Word generates them) to base64 data URIs, and as a further enhancement provide an equivalent of the msConvertURL method: http://msdn.microsoft.com/en-us/library/ie/dn254951%28v=vs.85%29.aspx
Oops, after reviewing the settings in this version of MS Word and disabling the VML option in the web export panel, I see that Firefox hasn't changed anything: the image is still pasted as a reference to a local file.
(In reply to comment #10)
> Oops, after reviewing the settings in this version of MS Word and disabling
> the VML option in the web export panel, I see that Firefox hasn't changed
> anything: the image is still pasted as a reference to a local file.

So are you still suggesting that we should do something here?  If yes, please elaborate.  I'm not sure if I understand what exactly you're asking us to do here.
Comment #10 was just a correction of a statement in #9 that Firefox had changed its behavior since 2011 to fully remove the image; it seems the change was actually in my version of MS Word (I don't use it except for testing :-).

The initial report still stands, and IE11 has been released with the requested behavior.

1. In MS Word, open a document with some text and at least one image.
2. Go to http://ie.microsoft.com/testdrive/browser/Editing_Paste_Image/ and paste with Firefox, look at the "pasted markup" and you'll get something like this:

<p style="margin-bottom:0cm;margin-bottom:.0001pt">Text before</p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt"><span style="mso-no-proof:
yes"><img src="file:///C:\Users\Alfonso\AppData\Local\Temp\msohtmlclip1\01\clip_image001.png" height="211" width="335"></span></p>
...

3. Paste with IE11: by default it converts the image to base64 data instead of a file: URL.
Text before</p>
...
<img height="211" width="335" src="
...


This feature makes it possible to really copy and paste a document with its content from MS Word to a web page and include the referenced images. 
This is an example that takes the data and stores the embedded images as files on the server: http://martinezdelizarrondo.com/ckplugins/simpleuploads.demo4/

Thanks for the quick reply
OK, so it looks like what Microsoft has implemented in IE11 is an msConvertURL method on the files collection of window.clipboardData.  See <http://msdn.microsoft.com/en-us/library/ie/dn254951%28v=vs.85%29.aspx>.  I can't find a spec for this method.  As long as this is an IE proprietary API, then this remains WONTFIX I'm afraid.
The msConvertURL method is used to convert the entries of the .files property of the clipboardData object to Blob objects. But even without any scripting or new API, when a user pastes from MS Word into a contentEditable element in IE11, they now get the images embedded as base64 data.

The first part of this post shows a table comparing the two pasting options available in IE11:
http://blogs.msdn.com/b/ie/archive/2013/10/24/enhanced-rich-editing-experiences-in-ie11.aspx

This ticket only requests the first column, "DataURI". msConvertURL is used for the "Blob" column and, as you said, the first task would be to find a spec for it.
Firefox already supports pasting single images as data URIs; extending this to handle file: URIs and convert those files to base64 data would mean greater compatibility with MS Office (I guess the same clipboard behavior is present in Outlook, for example).
I see.  OK, reopening this bug then!  Thanks for taking the time to convince me.  :-)
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
As the editor of the Clipboard Events spec, I'm proposing a somewhat different take on this: "cid:"-URIs for embeds. See http://dev.w3.org/2006/webapi/clipops/clipops.html (search for "cid:").

The idea is that rather than embedding potentially very huge data: URLs or reference local files in the embedded markup, we add a reference to the DataTransferItemList, and use the index of this reference to construct a cid: URI in the markup that clipboardEvent.getData('text/html') will see. The script processing this data can then pull out the cid: URIs, do drag-and-drop style file uploads for referenced clipboard parts, and update the data to refer to the locations on the server eventually (maybe first using an intermediate placeholder image or something like that.)

I'm looking for feedback regarding whether this is implementable and a good solution. I haven't had much (if any) feedback from implementors on this issue yet.
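A rough sketch of the script side of this proposal. It assumes the markup carries URIs of the form cid:<index>, where the index points into the DataTransferItemList; the draft's actual syntax may differ, and the upload itself is left out. The two hypothetical helpers list which items need uploading, then swap each cid: reference for its server URL once that is known.

```typescript
// Assumption: each embedded part appears in the markup as "cid:<n>", where n
// is the part's index in the paste event's DataTransferItemList.

// Distinct indices referenced by the markup, in order of first appearance,
// i.e. the clipboard items the script needs to upload.
function referencedItems(html: string): number[] {
  const cidRef = /\bcid:(\d+)/g;
  const seen: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = cidRef.exec(html)) !== null) {
    const idx = parseInt(m[1], 10);
    if (seen.indexOf(idx) === -1) {
      seen.push(idx);
    }
  }
  return seen;
}

// Once uploads finish, replaces each cid:<n> with the server URL recorded for
// item n. Items without a URL yet keep their cid: reference unchanged.
function resolveCids(html: string, uploaded: Map<number, string>): string {
  return html.replace(/\bcid:(\d+)/g, (whole: string, idx: string) => {
    const url = uploaded.get(parseInt(idx, 10));
    return url !== undefined ? url : whole;
  });
}
```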
(In reply to Hallvord R. M. Steen from comment #16)
> As the editor of the Clipboard Events spec, I'm proposing a somewhat
> different take on this: "cid:"-URIs for embeds. See
> http://dev.w3.org/2006/webapi/clipops/clipops.html (search for "cid:").

Hmm, a cid: URI would be an addition to the Web platform, since it's currently not used anywhere else in the platform (that at least I know of.)  As such, I think the decision on whether or not we should do that should happen in a broader forum.  I'm not sure what that forum would be though, Anne do you know?

> The idea is that rather than embedding potentially very huge data: URLs or
> reference local files in the embedded markup, we add a reference to the
> DataTransferItemList, and use the index of this reference to construct a
> cid: URI in the markup that clipboardEvent.getData('text/html') will see.
> The script processing this data can then pull out the cid: URIs, do
> drag-and-drop style file uploads for referenced clipboard parts, and update
> the data to refer to the locations on the server eventually (maybe first
> using an intermediate placeholder image or something like that.)

Hmm, I'm not sure why that is better than looking at data: URIs, uploading them to a server and replacing them with another URI pointing to the resource stored on the server.  What am I missing?
Flags: needinfo?(annevk)
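For comparison, the data: URI route described in this reply can be sketched the same way: parse each data: src (RFC 2397 syntax), hand it to an upload step, and put the returned URL in its place. `store` is a hypothetical, synchronous stand-in for the upload; real code would be asynchronous.

```typescript
// Parsed form of "data:[<mediatype>][;base64],<data>" (RFC 2397).
type DataUri = { mime: string; base64: boolean; payload: string };

// Returns null for anything that is not a data: URI. An empty media type
// falls back to "text/plain" (RFC 2397 also implies charset=US-ASCII).
function parseDataUri(uri: string): DataUri | null {
  if (uri.slice(0, 5).toLowerCase() !== "data:") {
    return null;
  }
  const comma = uri.indexOf(",");
  if (comma === -1) {
    return null;
  }
  const meta = uri.slice(5, comma).split(";");
  const base64 = meta.length > 1 && meta[meta.length - 1].toLowerCase() === "base64";
  return { mime: meta[0] || "text/plain", base64, payload: uri.slice(comma + 1) };
}

// Replaces every data: src with whatever `store` returns for it, i.e. the
// URL of the copy saved on the server.
function externalizeDataUris(html: string, store: (d: DataUri) => string): string {
  return html.replace(/(\bsrc\s*=\s*")(data:[^"]*)(")/gi,
    (whole: string, pre: string, uri: string, post: string) => {
      const parsed = parseDataUri(uri);
      return parsed ? pre + store(parsed) + post : whole;
    });
}
```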
I'll ask on public-webapps for a wider review (could also take it to the general whatwg list of course, though that list is too busy for me to follow on a regular basis).

Regarding data: URLs, isn't there a limit to how much data it's convenient (or even possible) to handle in such a format? I suppose we want to support use cases like, say, pasting some HTML that references a local video file. Encoding every binary part no matter its size into a data: URL seems like a kludge to me..
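To put a number on the size concern: base64, the usual encoding for binary data: URLs, turns every 3 input bytes into 4 output characters (padding the last group), so an n-byte file becomes a payload of length L(n). Worked for a hypothetical 30 MiB video:

```latex
L(n) = 4\left\lceil \frac{n}{3} \right\rceil,
\qquad
L(30 \cdot 2^{20}) = 4 \cdot \frac{31\,457\,280}{3} = 41\,943\,040 = 40 \cdot 2^{20}
```

That is 40 MiB of text, one third more than the file itself, sitting in a single attribute before the data: prefix is even counted.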
(BTW cid: URIs are of course used millions of times every day in E-mailed HTML. You may define that as outside the scope of the web platform, it is nevertheless a simple and very widely deployed solution to this very problem. I do agree though that this would be the first time it's used in a W3C spec intended for wider web usage, and )
Oops, sent prematurely. I wanted to finish by saying that I fully agree with you that it requires wider review and discussion.
(In reply to Hallvord R. M. Steen from comment #18)
> I'll ask on public-webapps for a wider review (could also take it to the
> general whatwg list of course, though that list is too busy for me to follow
> on a regular basis).

Sounds good.

> Regarding data: URLs, isn't there a limit to how much data it's convenient
> (or even possible) to handle in such a format?

There might be implementation-defined limitations.  Gecko doesn't cap the maximum length, as far as I can see in the code.

> I suppose we want to support
> use cases like, say, pasting some HTML that references a local video file.
> Encoding every binary part no matter its size into a data: URL seems like a
> kludge to me..

Yeah I guess processing such huge data: URIs won't be very efficient.

(In reply to Hallvord R. M. Steen from comment #19)
> (BTW cid: URIs are of course used millions of times every day in E-mailed
> HTML. You may define that as outside the scope of the web platform, it is
> nevertheless a simple and very widely deployed solution to this very
> problem. I do agree though that this would be the first time it's used in a
> W3C spec intended for wider web usage, and )

Yeah I'm aware of the usage of cid: URIs for email, but I'm unaware of any browser engine (including Gecko) implementing those URIs, since they're not used anywhere on the web.
Flags: needinfo?(annevk)
There wasn't much feedback on this issue as far as I remember. I guess lack of discussion is bad rather than good..?
Just a note: if the source is LibreOffice 4.4, the image is embedded as base64 data, so any page that already handles this (as well as normal image pastes in Firefox) would just work if you made the suggested change and achieved parity with IE11.

Each browser has several problems related to pasting from the native OS, and this is one of the problems in Firefox (as is the inability to paste files outside a contentEditable element).
Still reproducible with Microsoft Word 2013, the file referenced is an image in %AppData%\Local\Temp.
Some observations:
Comment #23:
Yes, LibreOffice places a copied image as base64 data onto the clipboard, so it pastes into contenteditables (incl. Thunderbird's composition window).

Comment #24:
Yes, MS Office places the image as a link to somewhere in %AppData%\Local\Temp.
Pasting into Thunderbird works nicely, since some logic jumps in, grabs the image, adds it as an attachment and changes the link to a content ID, for example: <img src="cid:part1.01070202.08030108@domain.com">
Something similar was mentioned in comment #16 to #19.

Pasting into a contenteditable (URL: data:text/html, <html contenteditable>) shows a broken image. Not sure why.

However, using htmlpaste.html (attachment 8593110 [details]) from bug 586587 and a version of FF with its patch applied, it does paste correctly. Again, I'm not sure why this works, whereas pasting directly into a contenteditable does not.

Anyway, that's all idle talk since the bug is about converting the "file:"-URI to a "data:"-URI or a "cid:"-URI.
(In reply to Jorg K (GMT+2) from comment #25)
> Pasting into a contenteditable (URL: data:text/html, <html
> contenteditable>) shows a broken image. Not sure why.
Pasting into a "regular" page that contains a contenteditable works as expected, but again the MIDAS demo (http://www-archive.mozilla.org/editor/midasdemo/) doesn't. So there are subtle differences.
> Pasting into a "regular" page that contains a contenteditable works as
> expected, but again the MIDAS demo
> (http://www-archive.mozilla.org/editor/midasdemo/) doesn't.

Was that "regular" page loaded from file://? http(s):// resources are not supposed to have access to file:// content, so pasting markup with file:// paths shouldn't work on http(s):// pages.
I don't think I understand the question. The "regular" page is this:
  <html><body>
  <div style="background-color:#eee;border:1px solid #000;padding:10px; 
  height:200px;" contenteditable=""></div>
  </body></html>
loaded from a local file.
What doesn't work is http://www-archive.mozilla.org/editor/midasdemo/ and this URL:
  data:text/html, <html contenteditable>
I looked at the latter with some developer tools:
FF says: "Could not load the image".
Chrome says: "Not allowed to load local resource".
Looks like it is some security issue, as you indicated.
Sorry, this really has little to do with this bug.
Ah, it probably worked because you loaded it from a local file. Local URLs are, I believe, allowed to load images from the filesystem?
Indeed. Moving the document to a server and pasting gives the same as the Midas demo: A broken image.
So we can close this side issue. Sorry for bringing it up.
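The rule this side discussion converged on can be written down as a toy model: a file: image loads only when the document itself came from file:, so the local test page shows the image while the MIDAS demo (http:) and a data:text/html document show a broken one. This is a simplified, hypothetical model of the observed behavior, not Gecko's actual security check.

```typescript
// Hypothetical helper: lowercased scheme without the colon, or "" if none.
function schemeName(url: string): string {
  const i = url.indexOf(":");
  return i === -1 ? "" : url.slice(0, i).toLowerCase();
}

// Toy model of the behaviour observed in this thread: a file: image is only
// loaded into a document that was itself loaded from file:. Every other
// image is allowed here (real browsers apply many more checks).
function canLoadImage(documentUrl: string, imageUrl: string): boolean {
  if (schemeName(imageUrl) !== "file") {
    return true;
  }
  return schemeName(documentUrl) === "file";
}
```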
In the Thunderbird compose window, pasting from MS Word now converts the file: URLs to data: URLs. This works from TB 52 onwards.
No longer blocks: attach-paradigm-fail

Bulk-downgrade of unassigned, untouched DOM/Storage bugs' priority.

If you have reason to believe this is wrong, please write a comment and ni :jstutte.

Severity: normal → S4
Priority: -- → P5