Created attachment 551352
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build ID: 20110615151330
Steps to reproduce:
I updated from TB 3.1.11 to 5.0 and tried to read UTF-7-encoded mail.
UTF-7-encoded mail such as the attached sample has not been displayed properly since the update. Screenshots of the same (included) sample mail are in the attached file:
* Garbled rendering by TB5
* Correct rendering by TB 3.1.11 (on a second machine)
The original environment was Windows 7 (64-bit). I verified the problem with a clean profile and under Linux (Knoppix).
The character interpretation should have stayed the same as before!
The problem seems to be fairly deep. HTML mail is completely garbled: "<" is shown as "+ADw+", so tags are no longer recognized as such. (The attached example is plain text, to keep things simple.)
I think UTF-7 was dropped as a recognized encoding for the HTML parser, though I'm not certain.
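For context on why the text shows up like this: UTF-7 (RFC 2152) writes non-ASCII characters, and optionally ASCII punctuation such as "<", as a "+"-prefixed run of modified base64 over UTF-16 code units, usually terminated by "-". "<" (U+003C) becomes "+ADw". If the client does not decode the charset, these runs are displayed literally and an HTML parser never sees a tag. The following is a minimal, unhardened decoder written only for illustration (it is not Thunderbird's decoder); its results can be compared with Python's built-in "utf-7" codec.

```python
# Modified base64 alphabet used by UTF-7 (RFC 2152); '=' padding is never used.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_VALUE = {c: i for i, c in enumerate(B64_ALPHABET)}


def _decode_shifted_run(run: str) -> str:
    """Turn one run of modified base64 into text via its UTF-16 code units."""
    acc = 0    # bit accumulator
    nbits = 0  # number of not-yet-consumed bits in acc
    units = bytearray()
    for c in run:
        acc = (acc << 6) | B64_VALUE[c]
        nbits += 6
        if nbits >= 16:
            nbits -= 16
            unit = (acc >> nbits) & 0xFFFF
            units += unit.to_bytes(2, "big")
            acc &= (1 << nbits) - 1  # keep only the leftover bits
    # Remaining bits (fewer than 16) are zero padding and are discarded.
    return units.decode("utf-16-be")


def decode_utf7(data: str) -> str:
    """Minimal UTF-7 decoder sketch: no validation of padding bits or lone surrogates."""
    out = []
    i = 0
    while i < len(data):
        if data[i] != "+":
            out.append(data[i])  # directly encoded character
            i += 1
            continue
        j = i + 1
        while j < len(data) and data[j] in B64_VALUE:
            j += 1
        run = data[i + 1:j]
        # "+-" (or a bare '+') yields a literal '+'; otherwise decode the run.
        out.append(_decode_shifted_run(run) if run else "+")
        if j < len(data) and data[j] == "-":
            j += 1  # the '-' terminator is absorbed, not output
        i = j
    return "".join(out)


if __name__ == "__main__":
    print(decode_utf7("+ADw-p+AD4-Hello+ADw-/p+AD4-"))  # prints <p>Hello</p>
```

Only after this decoding step does "+ADw-p+AD4-" become "<p>", which is what an HTML parser needs to see.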
Bug 414064 dropped support for UTF-7,
but see also bug 587475.
Addendum: If one replies to UTF-7-encoded mail, the reply is itself UTF-7 encoded. Apparently TB5 can write UTF-7; it just can't read it, not even in its own "Sent" folder. This seems odd and half-done.
Removing UTF-7 support in an automatic update is bad: I am losing quite a lot of mail history. There is no easy way back to 3.1.11, because the Lightning add-on updated itself too, including its calendar data, which is no longer backward compatible.
Created attachment 561056
Another example using utf-7
UTF-7 is still in use; at least some mail servers send delivery reports in it.
This is such a notification (edited to exclude confidential info).
Removing support for UTF-7 is fine as far as creating new content is concerned, but removing the ability to read old messages (and not-so-old ones issued by still-functioning software) isn't acceptable. This raises the old question of being able to read old documents saved in proprietary file formats, which new international standards (e.g. ODF) are addressing. This move does the opposite: it makes documents composed in a (once) standard format unreadable.
And yet another thought: while striving to implement HTML5, are you now going to drop support for HTML4 and older as well? Your HTML5 implementation may be completely UTF-7-free, but you could leave UTF-7 alone elsewhere.
By the way, this message shows another bug in displaying an attached RFC 822 message: its attachments are shown as attachments of the main message. That is a separate issue; I'll look for an existing report or file a new one.
Removing support for reading UTF-7 wasn't intentional, and that is why this bug exists. Evidently, when bug 414064 and bug 587475 landed, we broke something, which is why we're tracking this for TB 8; we'll try to get it fixed there.
Unfortunately we've not been able to fix this in time for 8, so we'll shoot for 9 instead.
Created attachment 572568
*IF* we are OK with supporting UTF-7 even in HTML messages, this is relatively straightforward: we just have to use GetUnicodeDecoderInternal all the time.
Last time I tried this, view source didn't work with UTF-7 messages, but now it does, presumably because of the change to use the HTML5 parser in view source. The UTF-7 isn't decoded in view source, but that is no worse than the status quo with quoted-printable and base64 encoded messages.
Simon, thanks for the patch. My understanding is that we don't do UTF-7 in HTML in the browser because of XSS exploits. I assume e-mail is not vulnerable because we don't have JavaScript turned on. RSS feed messages, on the other hand, might be vulnerable, except that I'm not sure how much of the feed content actually goes through libmime. Cc'ing dveditz to see if this scares him.
I think it should be possible to not do utf-7 decoding in html parts, though it is libmime, so nothing is easy. Do you know for sure that with this patch we actually do utf-7 decoding of html?
> I think it should be possible to not do utf-7 decoding in html parts, though
> it is libmime, so nothing is easy. Do you know for sure that with this patch
> we actually do utf-7 decoding of html?
Yes, I've tested it with plain-text and HTML messages (though not RSS feeds, I have to admit ;-)
I think it shouldn't be too hard to exclude HTML messages from UTF-7 decoding, but note that comment 0 does specifically mention HTML messages as part of the problem.
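The trade-off discussed here, decoding UTF-7 in every part versus skipping it in HTML parts to close off the UTF-7 XSS vector, can be sketched as a single policy flag. Everything below is hypothetical: "should_decode_utf7", "decode_part" and "allow_utf7_in_html" are illustrative names, not libmime or Gecko APIs, and only the label "utf-7" is recognized. Always using GetUnicodeDecoderInternal, as the patch does, corresponds roughly to "allow_utf7_in_html=True".

```python
def should_decode_utf7(content_type: str, allow_in_html: bool) -> bool:
    """Hypothetical policy: always decode UTF-7 in text/plain, optionally in text/html."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/plain":
        return True
    if mime == "text/html":
        return allow_in_html
    return False


def decode_part(body: bytes, content_type: str, charset: str,
                allow_utf7_in_html: bool) -> str:
    """Decode a MIME part's bytes, applying the hypothetical UTF-7 policy."""
    cs = charset.strip().lower()
    if cs == "utf-7" and not should_decode_utf7(content_type, allow_utf7_in_html):
        # Leave the text undecoded so "+ADw-" is shown literally and never parsed as "<".
        return body.decode("ascii", errors="replace")
    return body.decode(cs)


if __name__ == "__main__":
    part = b"+ADw-b+AD4-hi+ADw-/b+AD4-"
    print(decode_part(part, "text/html", "utf-7", allow_utf7_in_html=False))
    print(decode_part(part, "text/html", "utf-7", allow_utf7_in_html=True))
```

With the flag off, HTML parts keep the garbled look from comment 0; with it on, they render as intended, which is why the choice hinges on whether HTML in mail can be abused the way it can in the browser.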
(In reply to Simon Montagu from comment #10)
> I think it shouldn't be too hard to exclude HTML messages from UTF-7
> decoding, but note that comment 0 does specifically mention HTML messages as
> part of the problem.
It does, but the message is actually of type text/plain. I'm curious whether there's really text/html mail out there in UTF-7. But in any case, I do appreciate the argument that we shouldn't break old mail, and as long as dveditz is OK with this, I'm OK with it.
(In reply to David :Bienvenu from comment #11)
> It does, but the message is actually of type text/plain. I'm curious if
> there's really text/html mail out there with utf-7.
Yes, those text/html emails do exist. We still receive quite a few of them, for example with these headers:
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5931
Comment on attachment 572568
We could probably have a simple unit test for this, like mailnews/mime/test/unit/test_mimeStreaming.js, except that you'd have to verify the results, which that test doesn't do currently.
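The point about verifying results is that a useful test must assert on the decoded text, not merely that the message streams without error. As a Python analogue only (this is not the mailnews JS test harness, and the sample message is made up), the standard library "email" package can parse a UTF-7 text/html part and the test can check that "<p>" and the non-ASCII letters come out correctly:

```python
import email

# Made-up sample: "<p>Grüße</p>" encoded as UTF-7, as a text/html part.
RAW_MESSAGE = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: UTF-7 sample\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-7"\n'
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "+ADw-p+AD4-Gr+APw-+AN8-e+ADw-/p+AD4-\n"
)


def decoded_body(raw: str) -> str:
    """Undo the transfer encoding, then decode with the declared charset."""
    msg = email.message_from_string(raw)
    charset = msg.get_content_charset() or "us-ascii"
    payload = msg.get_payload(decode=True)  # bytes after undoing the CTE
    return payload.decode(charset)


if __name__ == "__main__":
    body = decoded_body(RAW_MESSAGE)
    # The check the existing streaming test lacks: compare against expected text.
    assert body.strip() == "<p>Grüße</p>", body
    print(body.strip())
```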
Are we waiting for the test before checking this in?
No, we can check this in, I think.
There's one thing I've been asked to test with this, which I'll do in a little while, so please don't land just yet.
Checked in: http://hg.mozilla.org/comm-central/rev/dc9e0a572606
Checked into branches: