Open Bug 822741 Opened 9 years ago Updated 5 years ago

HTML formatting (color) correctly shown in composition editor after Copy & Paste from MS Word, but lost by TB when saving as draft or sending (e.g.<span style="font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:red">), HTML attr badly wrapped

Categories: MailNews Core :: Composition (defect)
Platform: x86_64, Windows 7
Type: defect; Priority: Not set; Severity: normal
Tracking: Not tracked
People: Reporter: gworley; Assignee: Unassigned
References: Blocks 1 open bug
Attachments: 7 files

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0; WUID=e44baded1c92cf712b201676e1a4568c; WTB=3253) Gecko/20100101 Firefox/16.0
Build ID: 20121024073032

Steps to reproduce:

Every Monday I send a calendar, prepared in Microsoft Word, to a group of volunteers for a homeless shelter. I copy the Word document and paste it into a new message. The colors look correct before the message is sent. When the message is received, the colors are gone. The main color that is missing is red: if the font color is red, it is received as black. If the text is highlighted in yellow, it is received as plain black text. I even tried saving the calendar as HTML, and the colors in the HTML source appeared to be right. I edited the colors in Thunderbird; some were then preserved, but others were still stripped, and I should not have to re-edit the calendar after pasting it just to get the colors right. The message that is saved in the Sent folder doesn't keep the correct colors.


Actual results:

Colors are stripped when message is sent.


Expected results:

Formatting of the message should be maintained when the message is sent.
Was it actually sent as text/html, or as text/plain?
(Check the message source of the sent mail copy in the Sent folder.)
If it was sent as text/plain instead of the expected text/html, is this the phenomenon of bug 677171?
See also the bugs referenced in that bug.

IIRC, Format/Auto-Detect sends as text/html when a color is set.
Is the "Plain text mail only" attribute set for the mail recipient in your address book?

If it was sent as text/html, which option was selected under "View/Message Body As" when you viewed the sent mail copy in the Sent folder?
I realize that all HTML can be stripped by the receiving end. One of the receivers was complaining about not seeing the formatting, and the first question I asked that person was whether they were set to plain text or HTML. What is strange is that some of the formatting is getting through. If a time period is open, I was highlighting black text in yellow, and the highlighting was being stripped. So I changed the text color to very dark blue with yellow highlighting, and that was sent and received correctly. The shelter is not normally open during the day, but on Christmas Day it was decided to keep it open for the entire day. I highlighted the day shift with a green background and white letters. It was received with a green background and black letters. We have groups that work on specific days every month, and I was changing their text to red, and that was being stripped out. I sent it from another client and Thunderbird displayed it fine.

"View/Message Body As" is set for "Original HTML."
(In reply to George Worley from comment #0)
> Created attachment 693518 [details]
> Calendar from the sent folder

Is this the mail that produces your problem?
If so, with the attached mail I could observe that "color:red in the style of the span is not applied" at the following place in the text/html part (first row of the first table in the HTML).
(Note: highlighting in yellow is rendered as expected in many columns of many rows in the table.)
>     <table class="MsoTableGrid"
>       style="border-collapse:collapse;border:none;mso-border-alt:solid
>       windowtext .5pt; mso-yfti-tbllook:1536;mso-padding-alt:0in 5.4pt
>       0in 5.4pt" border="1" cellpadding="0" cellspacing="0">
>       <tbody>
>         <tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes">
>           <td colspan="8" style="width:9.15in;border:solid windowtext
>             1.0pt; mso-border-alt:solid windowtext .5pt;padding:0in
>             5.4pt 0in 5.4pt" valign="top" width="878">
>             <p class="MsoNoSpacing" style="text-align:center"
>               align="center"><b style="mso-bidi-font-weight:normal"><span
>                   style="font-size:8.0pt;font-family: &quot;Times New
>                   Roman&quot;,&quot;serif&quot;;color:red">January 2013<o:p></o:p></span></b></p>
>           </td>
>         </tr>

(1) Because the character entity (&quot;) is used in the HTML, it is interpreted as '"' by HTML parsing.
> <span style="font-size:8.0pt;font-family: "Times New Roman","serif";color:red">January 2013<o:p></o:p></span>
(2) Because '"' marks the start/end of an attribute value in HTML, it is identical to the following.
> <span style="font-size:8.0pt;font-family: " unknown/broken attributes of span element>January 2013<o:p></o:p></span>
(3) So the string "January 2013" is rendered, correctly, with the color requested at the higher-level element (in this case, #000000).

Because the following header is present, the mail was generated by Tb.
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Thunderbird/17.0
And, because you say "I copy the Word document and paste into a new message", the <span> tag with the style attribute was generated by your MS Word.

=> INVALID, because fault in MS Word.

What happens when you do the following instead of "Copy&Paste to Tb's mail composition window"?
(1) Save the MS Word document as an HTML file using MS Word itself.
(2) From MS Word, request sending mail via the default mailer,
    and Save As Draft in the mailer (or Send Later if Thunderbird).
   (2-A) Default mailer = MS's one such as Windows Mail
   (2-B) Default mailer = Thunderbird
Summary: Sent email doesn't have the colors → Sent email doesn't have the colors, if HTML data is Copy&Paste'ed from MS Word to Tb's mail composition window, because broken style attribute data of HTML element is Copy&Paste'ed from MS Word)
Attachment #693522 - Attachment description: The MS Word Document that is copied in pasted into the message. → Testcase1.docx: MS Word document whose content is copied and pasted into the message
We all know that MSO produces bad formats, but in this case, the problem seems to be within TB.

STR

1) Open Testcase1.docx of attachment 693522 [details] with Word 2010 (suppose Word 2007 should be same).

2) Ctrl+A to select whole content of Testcase1.docx, then Ctrl+C to copy.

3) In TB, Ctrl+N to start new composition with "Compose messages in HTML format" = true

4) Ctrl+V to paste content into TB editor

5) verify color formatting in TB editor

6) Ctrl+S to "Save Draft"

7) go to your drafts folder, and verify color formatting of saved draft in TB message reader

Actual Result

after 5) pasted content showing correctly in TB editor, with all formatting preserved (red headline "January 2013", yellow background for text of some calendar entries)

after 7) color formatting no longer seen in saved draft

Expected Result

after 7) color formatting should not be removed or altered in any way when saving as draft or sending

As a side note, that's irrespective of the correctness of the format: Even if the HTML were actually malformed (I don't think so), but we actually manage to parse and show it correctly in editor, then we also need to preserve it correctly when saving as draft or sending, according to the UX-principle of wysiwyg.
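
The well-formedness point can be checked outside Thunderbird. The following is a minimal sketch, assuming Python's standard `html.parser` is an acceptable stand-in for an HTML attribute parser (it is not Gecko's parser); `AttrCollector` and `parse_first_tag` are hypothetical helper names. In this parser at least, `&quot;` is decoded only after the attribute value has been delimited, so the entity does not end the style attribute:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def parse_first_tag(markup: str):
    """Return (tag name, attribute dict) of the first start tag in markup."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags[0]


# Single-line form of the span, as pasted from MS Word.
pasted = ('<span style="font-size:8.0pt;font-family:&quot;Times New Roman&quot;,'
          '&quot;serif&quot;;color:red">January 2013</span>')
tag_name, attributes = parse_first_tag(pasted)
print(tag_name, attributes)
```

The whole declaration list, including `color:red`, ends up in one `style` value, which is consistent with the editor showing the pasted content correctly.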
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Sent email doesn't have the colors, if HTML data is Copy&Paste'ed from MS Word to Tb's mail composition window, because broken style attribute data of HTML element is Copy&Paste'ed from MS Word) → HTML formatting (color) correctly shown in composition editor after Copy & Paste from MS Word, but lost by TB when saving as draft or sending (e.g. <span style="font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:red">)
Here's the reason why this breaks:

working (as pasted into editor, no line breaks, no spaces):

<span style="font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:red">
(showing correctly as Times New Roman with red text color)

broken (when saving as draft/sending, TB inserts a line break and at least 2 extra spaces):

<span style="font-family:&quot;Times New[CR][LF]
[space][space]Roman&quot;,&quot;serif&quot;;color:red">
(showing wrongly as default font with black text color)

This could even be an HTML parsing bug in browser.
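
One possible reading of why the wrapped form loses the color as well as the font: in CSS, an unescaped line break inside a quoted string ends the string early, so the quotes that follow pair up differently and `;color:red` ends up inside a string. The sketch below is a deliberately simplified model of that rule, not a CSS parser and not Gecko code; `split_declarations` is a hypothetical helper that splits a style value at semicolons lying outside strings.

```python
def split_declarations(style: str) -> list:
    """Split a style value at ';' characters that lie outside CSS strings.

    Simplified model: a quote opens a string, the same quote closes it,
    and a raw CR or LF also ends the string (a "bad string").
    Backslash escapes are ignored in this sketch.
    """
    parts, current, quote = [], [], None
    for ch in style:
        if quote:
            if ch == quote or ch in "\r\n":
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ";":
            parts.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


# Style values after entity decoding: as pasted, and as saved with CRLF + 2 spaces.
as_pasted = 'font-family:"Times New Roman","serif";color:red'
as_saved = 'font-family:"Times New\r\n  Roman","serif";color:red'

pasted_parts = split_declarations(as_pasted)  # two declarations
saved_parts = split_declarations(as_saved)    # one declaration; ';' is inside a string
print(pasted_parts)
print(saved_parts)
```

Under this model the saved form contains no separate `color:red` declaration at all, which would match the observed default font with black text.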
Here's a reduced testcase 2a to expose the root problem as described in comment 6.
This is the message "as-composed", with font-family and color showing correctly (as Times New Roman, red).
(In reply to Thomas D. from comment #6)
> <span style="font-family:&quot;Times New
> Roman&quot;,&quot;serif&quot;;color:red">
> (showing correctly as Times New Roman with red text color)
> 
> broken (when saving as draft/sending, TB inserts a line break and at least 2
> extra spaces):
> <span style="font-family:&quot;Times New[CR][LF]
> [space][space]Roman&quot;,&quot;serif&quot;;color:red">
> (showing wrongly as default font with black text color)

This is a phenomenon of Bug 650206 (a line break is unconditionally inserted at 72 Unicode chars, even in the middle of an attribute value in an HTML tag).
See also bug 653342 for unconditional insertion of a line break between continuous CJK characters.
The "excess space after the inserted line break" may be a phenomenon like the following.
(a) excess formatting of HTML (indentation by extra spaces at the beginning of a line),
    which is seen in bug 653342.
(b) format=flowed-like line break (and space) insertion.
(c) bad handling of a character entity (&quot; in this case) in editing.
    IIRC, I saw a bug for a phenomenon like the following.
       "word1&entity;<space>word2" is changed to "<space>word1&entity;word2"
Testcase 2b is what you get after saving Testcase 2a as draft, as described in comment 6. Imo, Browser is supposed to ignore the [CR][LF][Space][Space] and parse this correctly. So I think this is a bug in the browser core.

Ludovic, can you cc somebody to evaluate if this is a bug in browser core?
Flags: needinfo?(ludovic)
Per comment 8 / Bug 650206: probably not a bug in browser core, but in the HTML formatter, which should not force linewrap inside HTML attributes. Is there any way of wrapping over-long HTML attribute values correctly?
Say you have

<a href="#" title="A very long title attribute which continues beyond a single line and still goes on and on and on...">

Is there any way of wrapping the title attribute correctly and precisely in HTML source, e.g. after the word "continues"?
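
A line break inside a quoted attribute value becomes part of that value, so one conservative option is to break only at spaces between attributes and leave over-long values on one line. The following is a minimal sketch of that idea, not the Gecko serializer; `split_outside_quotes` and `wrap_tag` are hypothetical names, and the width of 72 is taken from Bug 650206.

```python
def split_outside_quotes(tag: str) -> list:
    """Split a tag at spaces that are not inside a quoted attribute value."""
    tokens, current, quote = [], [], None
    for ch in tag:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == " ":
            if current:
                tokens.append("".join(current))
                current = []
            continue
        current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wrap_tag(tag: str, width: int = 72) -> str:
    """Greedily wrap a tag, breaking only between attributes."""
    lines, line = [], ""
    for token in split_outside_quotes(tag):
        candidate = token if not line else line + " " + token
        if line and len(candidate) > width:
            lines.append(line)
            line = token
        else:
            line = candidate
    if line:
        lines.append(line)
    return "\r\n".join(lines)


span = ('<span style="font-family:&quot;Times New Roman&quot;,'
        '&quot;serif&quot;;color:red">')
wrapped_span = wrap_tag(span)
print(wrapped_span)
```

With this approach a single attribute value longer than the width still produces an over-long line, so it would have to be combined with a transfer encoding that tolerates long lines.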
(In reply to Thomas D. from comment #7)
> Created attachment 694782 [details]
> Reduced Testcase2a.eml: As composed, showing font-family & color correctly

Even when the original has no extra space at the beginning of the single-line <span> ... </span>,
> <span style="font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:red">Hello world</span>
Tb generated the following two lines in the Drafts folder after Edit As New, Save As Draft.
>     <span style="font-family:&quot;Times New
>       Roman&quot;,&quot;serif&quot;;color:red">Hello world</span>
> (4 spaces before <span, 6 spaces before Roman)

The extra spaces between "Times New" and "Roman" are apparently generated by Tb's excess formatting of HTML (indentation by extra spaces).
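
The two-line output above is what a plain whitespace-based wrapper produces. As an illustration only, the sketch below uses Python's `textwrap` (not the Gecko serializer) with the 72-column width mentioned for Bug 650206 and the 6-space continuation indent observed in the draft; both values are assumptions carried over from this bug.

```python
import textwrap

# The span as it sits in the document body, with the 4-space indent seen in the draft.
original = ('    <span style="font-family:&quot;Times New Roman&quot;,'
            '&quot;serif&quot;;color:red">Hello world</span>')

# Wrap at any whitespace, with no knowledge of tags or quoted attribute values.
wrapped_lines = textwrap.wrap(
    original,
    width=72,
    subsequent_indent=" " * 6,
    break_on_hyphens=False,
)
for line in wrapped_lines:
    print(repr(line))
```

The break lands between "New" and "Roman" because the next whitespace-delimited chunk would push the first line past 72 columns, and the wrapper does not know it is inside a quoted value.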

Setting dependency to Bug 650206 for ease of analysis and tracking.
Depends on: 650206
Summary: HTML formatting (color) correctly shown in composition editor after Copy & Paste from MS Word, but lost by TB when saving as draft or sending (e.g. <span style="font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:red">) → HTML formatting (color) correctly shown in composition editor after Copy & Paste from MS Word, but lost by TB when saving as draft or sending (e.g.<span style="font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:red">) HTML attr badly wrapped
(In reply to Thomas D. from comment #10)
> Is there any way of wrapping over-long HTML attribute values correctly?

Is there any clear definition in HTML of "line break in HTML source text"?

If I understand the HTML definition(s) correctly, the treatment of a line break inside a continuous string in HTML source is "handle it appropriately as you like", although HTML documents ask for "line break between ASCII chars == space", "line break between CJK chars == null", and perhaps "line break between CJK chars and ASCII chars, or vice versa == space". I don't know the current precise definition in recent HTML5.
And, IIRC, I couldn't find a clear/solid definition in HTML for "line break within a string quoted by single quotes (or double quotes) in an attribute value of an HTML element".

Further, even if the issue around "line break in HTML source text" is completely resolved, the issue that "a line in the mail data stream is a line sent via the SMTP protocol" remains, because SMTP has a line length limit.
This is the reason why the patch proposed in bug 653342 does "send the text/html part in base64 if an HTML text line is too long".
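
To illustrate the base64 approach in general terms (this is not the patch from bug 653342): base64 output can be folded into short lines regardless of how long the HTML source lines are, and decoding restores the HTML byte for byte. A minimal sketch using Python's standard `base64` module; `encode_html_part` is a hypothetical helper name, and the 998-character figure is the line limit from RFC 5322.

```python
import base64


def encode_html_part(html: str) -> bytes:
    """Base64-encode an HTML body; encodebytes folds output at 76 characters."""
    return base64.encodebytes(html.encode("utf-8"))


# One HTML source line of more than 998 characters, with no line breaks.
long_line = "<p>" + "calendar entry " * 80 + "</p>"

encoded = encode_html_part(long_line)
longest = max(len(line) for line in encoded.splitlines())
restored = base64.decodebytes(encoded).decode("utf-8")
print(len(long_line), longest, restored == long_line)
```

The HTML itself never needs to be re-wrapped, so no line break can land inside an attribute value.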
Component: Untriaged → Composition
Product: Thunderbird → MailNews Core
An interesting note here -- the last time I sent the calendar out I tried something just for the heck of it...

1) Opened up Windows Mail Live (Yuck)
2) Connected an account to my Gmail account using imap
3) Created a new message
4) Copied and Pasted the Calendar edited in MS Word.
5) Save the message as a draft
6) Opened up Thunderbird
7) Opened up the saved draft message.
8) Much to my surprise, all of the formatting was there.
9) Edited the message as new
10) Changed the message account so I was sending from the Shelter's email address
11) Added the greeting and listserv address.
12) Sent
13) Message was received with all formatting intact. 

The next time I send the calendar to the volunteers, I will try creating a draft in my Gmail IMAP folder to see if the formatting is preserved with Thunderbird, and report back my findings.
Component: Composition → Message Compose Window
Product: MailNews Core → Thunderbird
Component: Message Compose Window → Composition
Product: Thunderbird → MailNews Core
Flags: needinfo?(ludovic)
Similar problem with formatting happens when I copy & paste from Google Doc documents.  The text looks OK in Composer window, but it gets broken if saved as Draft or sent.

I quite often find myself in a situation where the email is already gone, everyone has received it, and the formatting is awfully broken.  Unreadable.

If it's not possible to prevent this HTML wrapping, at least let me see that it is broken before I send it out, so I can correct it...  Saving to drafts before sending -- for every message it would be too much.

I'll attach samples below.
test message as it looks in composer
test message as it looks when saved in drafts
email source as it is saved in drafts
Similar problem (HTML wrapping on send, and composer not showing what recipients will see) described here: bug 1097174.
Bad wrapping is due to the M-C serialiser. We should create a meta bug to collect all these problems.
I'm still encountering this bug in TB 52.1.1! I pasted some HTML code into the signature setting for my account and could not work out why TB's own DOM inspector was saying "Invalid property value". It's mauling the code even though it could just use "quoted-printable" encoding and leave the HTML alone completely. How bizarre.

E.g., some parts of long lines that get wrapped and indented automatically, breaking the font declaration:

>              width="20"><strong style="color:#D61F30;
>                white-space:nowrap; font-family:'Titillium
>                Web',sans-serif; font-size:10px !important;
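
The quoted-printable idea can be tried in isolation. A minimal sketch with Python's standard `quopri` module, using the span from the reduced testcase rather than the signature above; it only illustrates the encoding and says nothing about how Thunderbird would have to be changed. Soft line breaks (a trailing "=") keep every encoded line short, and decoding removes them again:

```python
import quopri

# The indented span from the reduced testcase, as one 100+ character line.
html_line = ('    <span style="font-family:&quot;Times New Roman&quot;,'
             '&quot;serif&quot;;color:red">Hello world</span>').encode("ascii")

qp_encoded = quopri.encodestring(html_line)
qp_longest = max(len(line) for line in qp_encoded.splitlines())
qp_restored = quopri.decodestring(qp_encoded)

print(qp_encoded.decode("ascii"))
print(qp_longest, qp_restored == html_line)
```

Because the soft breaks belong to the transfer encoding and not to the HTML, the decoded style attribute is identical to the one that was composed.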
Easily reproducible with attachment 694782 [details]. Import that message, edit as new, save, look at draft.

This is a Core::Serializer problem. For saving a message, we go to Gecko and request the DOM of the editor in the composed message as text. That's called serialising DOM to text. Then we take that text and store it as message. If it's incorrectly serialised, the HTML will be broken.

Looks like the serialiser doesn't recognise the |&quot;Times New Roman&quot;| in "font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:red" as a nested string and inserts a line break in the middle of the font name.
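
Until the serialiser is fixed, a saved draft can at least be checked for this kind of damage. The following is a rough sketch, assuming the draft source is available as a string and that Python's `email` and `html.parser` modules are close enough to what a mail client does; `WrappedAttrFinder` and `find_wrapped_attributes` are hypothetical names. It reports every attribute whose value contains a line break:

```python
from email import message_from_string
from html.parser import HTMLParser


class WrappedAttrFinder(HTMLParser):
    """Record attributes whose value contains a CR or LF."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and ("\n" in value or "\r" in value):
                self.hits.append((tag, name, value))


def find_wrapped_attributes(eml_source: str) -> list:
    """Scan all text/html parts of a message for wrapped attribute values."""
    finder = WrappedAttrFinder()
    for part in message_from_string(eml_source).walk():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "utf-8"
            body = part.get_payload(decode=True).decode(charset, errors="replace")
            finder.feed(body)
    finder.close()
    return finder.hits


# Hypothetical minimal draft, modelled on the saved form of the reduced testcase.
draft = (
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
    '    <span style="font-family:&quot;Times New\r\n'
    '      Roman&quot;,&quot;serif&quot;;color:red">Hello world</span>\r\n'
)
hits = find_wrapped_attributes(draft)
for tag, name, value in hits:
    print(tag, name, repr(value))
```

A line break in an attribute value is not always harmful (for example in a long `title`), so hits are candidates to inspect and not proof of breakage.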

The bad news: Core::Serialisers is an unowned module.