Closed Bug 613130 Opened 14 years ago Closed 14 years ago

Paste list structure from MS-Word styles lost

Categories

(Core :: DOM: Editor, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- final+
status1.9.2 --- wontfix
status1.9.1 --- wontfix

People

(Reporter: yaojun85, Assigned: ehsan.akhgari)

References

()

Details

(Keywords: regression)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.12 (KHTML, like Gecko) Chrome/9.0.576.0 Safari/534.12
Build Identifier: 3.6.12

When pasting a list structure from MS Word document on Windows, the "mso:list" style originally presented in a span element that was symboling a list bullet indicator doesn't anymore appear, it's a regression observed since Firefox 3.6.



Reproducible: Always

Steps to Reproduce:
1.Copy the list structure in attached document
2.Paste it into "contentediable" frame
3.Check the HTML codes in 'Source' mode.
Actual Results:  
<p class="MsoNormal" style="margin-left: 18pt; text-indent: -18pt;"><span style="" lang="EN-US"><span style="">1.<span style="font: 7pt &quot;Times New Roman&quot;;">&nbsp;&nbsp;&nbsp; </span></span></span><span dir="LTR"></span><span lang="EN-US">List item1</span></p>


Expected Results:  
<p class="MsoNormal" style="margin-left: 18pt; text-indent: -18pt;"><span style="mso-list: Ignore" lang="EN-US"><span style="">1.<span style="font: 7pt &quot;Times New Roman&quot;;">&nbsp;&nbsp;&nbsp; </span></span></span><span dir="LTR"></span><span lang="EN-US">List item1</span></p>


The 'mso:list' style is widely used by WYSIWYG editors' "paste from word" functionality, to transform paragraphs acts as list item into a semantic list structure, and thus is broken.
Blocks: 424615
Depends on: 596300
Keywords: regression
(In reply to comment #0)
> The 'mso:list' style is widely used by WYSIWYG editors' "paste from word"
> functionality, to transform paragraphs acts as list item into a semantic list
> structure, and thus is broken.

Which WYSIWYG editors are you talking about?  To the best of my knowledge, Microsoft Word is the only one using this.
Both TinyMCE and CKEditor have functionality to handle pasting from Word and converting Word list to real semantic UL/OL lists. So there is currently a lot of frustration out there regarding the resent changes in Firefox from our users. I'm not sure how this would be best handled. I think instead of white listing styles it should blacklist the ones that are security issues like "expression" and "behavior".
What recent changes?

AFAIK no version of Firefox has ever preserved unknown style rules.
Ah, this is really about whether the contents of the 'style' attribute are preserved.

I think we do probably have a bug. We should preserve the original value of the 'style' attribute in the DOM and hand it out on a getAttribute() (until/unless someone modifies the .style properties of course). I'm sure this is already filed somewhere.
(In reply to comment #5)
> Ah, this is really about whether the contents of the 'style' attribute are
> preserved.
> 
> I think we do probably have a bug. We should preserve the original value of the
> 'style' attribute in the DOM and hand it out on a getAttribute() (until/unless
> someone modifies the .style properties of course). I'm sure this is already
> filed somewhere.

That's not what happens.  When we're pasting something, we look at all of the style attributes, feed their contents to our CSS parser, look through the parsed styles and remove -moz-binding properties, and serialize the result as the value of the style attribute to be pasted.  There is no original value of the style attribute in the DOM for us to use.
This is the code in question, FWIW <http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/content/html/document/src/nsHTMLFragmentContentSink.cpp#1037>.  We do this all before injecting the style attribute in the DOM.
I guess if we don't encounter any -moz-binding values we could revert to the original value, but that would mean that we're basically blacklisting, and not white-listing... and that we'll be broken if a style attribute has both mso:list and -moz-binding.
I suppose what you really want here is a whitelist of properties that Gecko doesn't support, but we know are safe, and then we let those pass through. Maybe that can be hacked into the CSS parser somehow...
I'd probably be OK with the approach in comment #8 though.
(In reply to comment #9)
> I suppose what you really want here is a whitelist of properties that Gecko
> doesn't support, but we know are safe, and then we let those pass through.
> Maybe that can be hacked into the CSS parser somehow...

I don't have any idea how to generate such a whitelist.  I would argue that coming up with a comprehensive list is next to impossible.  If I'm not overlooking an obvious way to generate that list, I think comment 8 is the only sensible option here.
Since I'm not the original reporter of this issue I didn't do to much research
on why the paste logic is broken when it comes to lists. I think it's related
to conditional comments being stripped not styles. I compared the pasted
contents between 3.0.16 and 3.6.13 and the older version didn't have the
mso:list style either but the conditional comment "<!--[if !supportLists]-->"
is stripped. I'm not sure why it strips these comments but we look for either
that comment or the mso:list style in our code at least. Are there security
issues with comments as well?
Assignee: nobody → ehsan
Status: UNCONFIRMED → ASSIGNED
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
blocking2.0: --- → ?
Ever confirmed: true
(In reply to comment #12)
> Since I'm not the original reporter of this issue I didn't do to much research
> on why the paste logic is broken when it comes to lists. I think it's related
> to conditional comments being stripped not styles. I compared the pasted
> contents between 3.0.16 and 3.6.13 and the older version didn't have the
> mso:list style either but the conditional comment "<!--[if !supportLists]-->"
> is stripped. I'm not sure why it strips these comments but we look for either
> that comment or the mso:list style in our code at least. Are there security
> issues with comments as well?

Oh, no, there are no security issues with regards to comments.  But we've allowed comments since bug 572642, so they shouldn't be a problem.
The other issue with allowing unknown properties is that there may be unsafe values of some unknown properties (e.g., expression()).
(In reply to comment #14)
> The other issue with allowing unknown properties is that there may be unsafe
> values of some unknown properties (e.g., expression()).

True.  Catching those is not easy, though, because we don't support the expression property in our parser, and figuring out whether the original string did contain that and it was dropped by the parser is not possible, is it?
That's what I meant in comment #9. Whitelisting "unknown" properties is the only way to fix this bug and also protect against potentially dangerous unknown properties from leaking into pasted content, but it seems a bit quixotic.
(In reply to comment #16)
> That's what I meant in comment #9. Whitelisting "unknown" properties is the
> only way to fix this bug and also protect against potentially dangerous unknown
> properties from leaking into pasted content, but it seems a bit quixotic.

Well, whitelisting all of the unknown properties except the unsafe ones is very hard (and it's essentially the same as blacklisting).  And we can't use it anyways, until our CSS parser can retain such properties.
@Robert
It would not be that ridiculous of whitelisting mso-* if it's sure that clipboard comes from an office application context.
(In reply to comment #12)
> Since I'm not the original reporter of this issue I didn't do to much research
> on why the paste logic is broken when it comes to lists. I think it's related
> to conditional comments being stripped not styles. I compared the pasted
> contents between 3.0.16 and 3.6.13 and the older version didn't have the
> mso:list style either but the conditional comment "<!--[if !supportLists]-->"
> is stripped. I'm not sure why it strips these comments but we look for either
> that comment or the mso:list style in our code at least. Are there security
> issues with comments as well?
This's confirmed also, while I guess it desires a new bug, I'm not sure bug 572642 has tackled it.
(In reply to comment #18)
> @Robert
> It would not be that ridiculous of whitelisting mso-* if it's sure that
> clipboard comes from an office application context.

We don't have a reliable way to determine the source of the clipboard data.
(In reply to comment #19)
> (In reply to comment #12)
> > Since I'm not the original reporter of this issue I didn't do to much research
> > on why the paste logic is broken when it comes to lists. I think it's related
> > to conditional comments being stripped not styles. I compared the pasted
> > contents between 3.0.16 and 3.6.13 and the older version didn't have the
> > mso:list style either but the conditional comment "<!--[if !supportLists]-->"
> > is stripped. I'm not sure why it strips these comments but we look for either
> > that comment or the mso:list style in our code at least. Are there security
> > issues with comments as well?
> This's confirmed also, while I guess it desires a new bug, I'm not sure bug
> 572642 has tackled it.

Can you please file a new bug on the comment issue?
As I tried to explain in bug 596300 comment 42, there are two options to take care of all the problems related to this:

1: Provide the raw HTML in the onBeforePaste event with the clipboardData object. That way, if the application is designed to handle pasting let it do it, the application will know what are the kind of formatting it must allow and with this help it will be able to do it properly.

2: Create a new execCommand switch so an application can opt-out of all this sanitation and keep working as it does in other browsers.

As a general rule, if the application isn't doing anything special with the pasted data it won't care about bogus styles that can't be understood by the browser like those mso- that in general can be seen as garbage, so providing the data only to those that request it (as in 1) seems to me the better solution.
(In reply to comment #22)
> As I tried to explain in bug 596300 comment 42, there are two options to take
> care of all the problems related to this:
> 
> 1: Provide the raw HTML in the onBeforePaste event with the clipboardData
> object. That way, if the application is designed to handle pasting let it do
> it, the application will know what are the kind of formatting it must allow and
> with this help it will be able to do it properly.

But that doesn't change the data which is actually pasted in the DOM, so I'm not sure how this would help.

> 2: Create a new execCommand switch so an application can opt-out of all this
> sanitation and keep working as it does in other browsers.
> 
> As a general rule, if the application isn't doing anything special with the
> pasted data it won't care about bogus styles that can't be understood by the
> browser like those mso- that in general can be seen as garbage, so providing
> the data only to those that request it (as in 1) seems to me the better
> solution.

Well, the actual concern is to only pass safe data to web applications, in order to protect web applications against cross-domain attacks.  It's not really a good option to require web applications to opt in to this protection... :(
The basis of this ticket is that the application tries to detect the paste event and transform the data provided into something filtered and adjusted to its configuration. 
If we can get the raw HTML with clipboardData then we will cancel the default paste and proceed to adjust it before inserting ourselves into the DOM.

If you provide this sanitation as a toStaticHtml then we can call it in those situations where the user wants it that way (after we have done our clean up switching styles to semantic elements), there are other times when we must allow event handlers, inline scripts...
How does that mitigate the security concerns, then?
(In reply to comment #21)
> Can you please file a new bug on the comment issue?
Bug 572642 is filed for it.

(In reply to comment #18)
> We don't have a reliable way to determine the source of the clipboard data.
Ok...but is there any known security problem with "mso-*"?
(In reply to comment #26)
> (In reply to comment #21)
> > Can you please file a new bug on the comment issue?
> Bug 572642 is filed for it.
Sorry, I actually mean Bug 613912.
I think we should do comment #8 for Firefox 4 and later develop a safer solution based on parsing unknown properties and passing through some subset of them. (Personally I think we could pass through all prefixed properties whose prefix isn't known to be used by browsers.)
Attached patch Patch (v1)Splinter Review
This patch implements comment 8.  Asking review from bz on the content part and from roc on the layout part.
Attachment #492879 - Flags: review?(roc)
Attachment #492879 - Flags: review?(bzbarsky)
Comment on attachment 492879 [details] [diff] [review]
Patch (v1)

I don't like it, but it seems like the best of the bad options.
Attachment #492879 - Flags: review?(roc) → review+
Whiteboard: [has patch][needs review bz]
this is a regression from which landing? guessing bug 520189
Not going to block a branch release on this, but will look at the patch if you request branch approval (after review and trunk landing).
blocking1.9.1: ? → ---
blocking1.9.2: ? → ---
(In reply to comment #31)
> this is a regression from which landing? guessing bug 520189

Yes.
Comment on attachment 492879 [details] [diff] [review]
Patch (v1)

r=me.  I hate the web.  :(
Attachment #492879 - Flags: review?(bzbarsky) → review+
Whiteboard: [has patch][needs review bz] → [needs landing]
Landed the patch:

http://hg.mozilla.org/mozilla-central/rev/4da141a1fbff

It would be great if someone could verify this fix in tomorrow's nightly.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [needs landing]
Target Milestone: --- → mozilla2.0b8
Depends on: 616723
No longer depends on: 616723
Juan, Marcia, is this something you guys can verify please?
I installed TinyMCE, and I opened one of the examples there (simple.html). Then, using MS Word 2010, I opened the document containing a list and copied all. When I paste this into the example and select one of the entries (a. aasdsd) and I view source, I get this:

4.0 Beta 7: 
<p class="MsoNormal" style="margin-left: 0.75in; text-indent: -0.25in;"><span style=""><span style="">a.<span style="font: 7pt &quot;Times New Roman&quot;;">&nbsp;&nbsp;&nbsp; </span></span></span>aasdsd</p>

4.0b8pre(12/06): 
<p class="MsoNormal" style="margin-left: 0.75in; text-indent: -0.25in;" _mce_style="margin-left: .75in; text-indent: -.25in; mso-list: l0 level2 lfo1; tab-stops: list .75in;"><span style="" _mce_style="mso-fareast-font-family: Arial; mso-bidi-font-family: Arial;"><span style="" _mce_style="mso-list: Ignore;">a.<span style="font: 7pt &quot;Times New Roman&quot;;" _mce_style="font: 7.0pt &quot;Times New Roman&quot;;">&nbsp;&nbsp;&nbsp; </span></span></span>aasdsd</p>

Garry, it doesn't look exactly as the expected results in comment #0. Does this depend on Word version?
(In reply to comment #37)
> I installed TinyMCE, and I opened one of the examples there (simple.html).
> Then, using MS Word 2010, I opened the document containing a list and copied
> all. When I paste this into the example and select one of the entries (a.
> aasdsd) and I view source, I get this:
> 
> 4.0 Beta 7: 
> <p class="MsoNormal" style="margin-left: 0.75in; text-indent: -0.25in;"><span
> style=""><span style="">a.<span style="font: 7pt &quot;Times New
> Roman&quot;;">&nbsp;&nbsp;&nbsp; </span></span></span>aasdsd</p>
> 
> 4.0b8pre(12/06): 
> <p class="MsoNormal" style="margin-left: 0.75in; text-indent: -0.25in;"
> _mce_style="margin-left: .75in; text-indent: -.25in; mso-list: l0 level2 lfo1;
> tab-stops: list .75in;"><span style="" _mce_style="mso-fareast-font-family:
> Arial; mso-bidi-font-family: Arial;"><span style="" _mce_style="mso-list:
> Ignore;">a.<span style="font: 7pt &quot;Times New Roman&quot;;"
> _mce_style="font: 7.0pt &quot;Times New Roman&quot;;">&nbsp;&nbsp;&nbsp;
> </span></span></span>aasdsd</p>
> 
> Garry, it doesn't look exactly as the expected results in comment #0. Does this
> depend on Word version?

This is possibly good, but can you please test it in MidasDemo instead (because TinyMCE might be adding its own stuff...)?
>Garry, it doesn't look exactly as the expected results in comment #0. Does this
>depend on Word version?
Not a big issue, the structure might defers from IE and Firefox (I'm actually making the TC refering to IE case) as long as the styles are preserved, confirm to be fixed on 4.0b8pre(12/06), a wonderful fix finally for wysiwygers.
Any chance to have this landing on Firefox 3.6?
(In reply to comment #40)
> Any chance to have this landing on Firefox 3.6?

I think that this fix is too risky for Firefox 3.6.  Please upgrade to Firefox 4, which has this bug fixed.
Not to say that this issue is affecting me directly, but all CKEditor users around the world. Anyway, it's up to you guys to decide.
Hmm, well, I gave it a bit more thought, and actually read the patch more closely.  The patch touches a lot of code, which makes me nervous, but then again it fixes a regression introduced on the branch...  roc: what do you think?
It is not a serious security or stability issue so it doesn't meet branch criteria. I'd have a hard time justifying taking it on the branch.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: