Bug 677111 - utf-7 encoded characters are not interpreted properly in TB 5.0.
Status: RESOLVED FIXED
Keywords: dataloss, regression
Product: MailNews Core
Classification: Components
Component: Internationalization
Version: 5.0
Platform: x86_64 Windows 7
Importance: -- major with 1 vote
Target Milestone: Thunderbird 12.0
Assigned To: Simon Montagu :smontagu
Mentors:
Depends on:
Blocks: 587475
Reported: 2011-08-07 14:55 PDT by mm-muell
Modified: 2012-01-24 13:17 PST
CC List: 10 users
bugzillamozillaorg_serge_20140323: in-testsuite?
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
thunderbirt-utf7-bug.zip (55.48 KB, application/octet-stream)
2011-08-07 14:55 PDT, mm-muell
no flags
Another example using utf-7 (2.54 KB, text/plain)
2011-09-19 15:57 PDT, Mike Kaganski
no flags
Patch (2.14 KB, patch)
2011-11-07 12:07 PST, Simon Montagu :smontagu
mozilla: review+
standard8: approval-comm-aurora+
standard8: approval-comm-beta+

Description mm-muell 2011-08-07 14:55:54 PDT
Created attachment 551352
thunderbirt-utf7-bug.zip

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build ID: 20110615151330

Steps to reproduce:

I updated from TB 3.1.11 to 5.0 and tried to read utf-7 encoded mail.


Actual results:

Since the update, UTF-7 encoded mail such as the attached sample is no longer displayed properly. Screenshots of the same (included) sample mail as displayed by each version are in the attached file:

*Garbled interpretation by TB5
*An OK one by TB 3.1.11 (on second machine).

The original environment was Windows 7-64. I verified the problem with a clean profile and under Linux (Knoppix).


Expected results:

The characters should be interpreted the same way as before.
The problem seems to go quite deep: HTML mail is completely garbled, because "<" becomes "+ADw+" and tags are no longer recognized as such. (The attached example is plain text to keep things simple.)
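For readers unfamiliar with the encoding: UTF-7 (RFC 2152) writes most ASCII characters directly and wraps everything else in a "+...-" run of modified base64 over UTF-16 code units, so "<" (U+003C) is normally written as "+ADw-". If the bytes are shown as plain ASCII instead of being decoded, those runs appear verbatim, which matches the garbling described above. The decoder below is only an illustrative sketch of that rule (the name `decode_utf7` is ours, not Thunderbird's); it is lenient about malformed input and can be cross-checked against Python's built-in `utf-7` codec.

```python
# Modified base64 alphabet used inside UTF-7 shifted sequences (RFC 2152).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_VALUE = {c: i for i, c in enumerate(B64)}


def decode_utf7(text: str) -> str:
    """Illustrative UTF-7 decoder: direct ASCII passes through, '+...-' runs are base64 over UTF-16BE."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch != "+":
            out.append(ch)  # directly encoded character
            i += 1
            continue
        i += 1
        if i < n and text[i] == "-":
            out.append("+")  # "+-" is the escape for a literal '+'
            i += 1
            continue
        # Accumulate 6-bit values; emit a 16-bit code unit whenever one is complete.
        bits, nbits, units = 0, 0, []
        while i < n and text[i] in B64_VALUE:
            bits = (bits << 6) | B64_VALUE[text[i]]
            nbits += 6
            if nbits >= 16:
                nbits -= 16
                units.append((bits >> nbits) & 0xFFFF)
                bits &= (1 << nbits) - 1  # keep only the leftover bits
            i += 1
        if i < n and text[i] == "-":
            i += 1  # an explicit '-' terminator is absorbed
        # Join the UTF-16 code units (surrogate pairs included) back into text.
        out.append(b"".join(u.to_bytes(2, "big") for u in units).decode("utf-16-be"))
    return "".join(out)


print(decode_utf7("+ADw-b+AD4-bold+ADw-/b+AD4-"))  # prints <b>bold</b>
```

A client that skips this step would display "+ADw-b+AD4-bold..." literally, so any HTML tags in the part are never recognized.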
Comment 1 Joshua Cranmer [:jcranmer] 2011-08-08 07:38:05 PDT
I think UTF-7 was dropped as a recognized charset for the HTML parser, though I'm not certain.
Comment 2 Robert Longson 2011-08-08 07:40:36 PDT
bug 414064 dropped support for utf7
Comment 3 Matthias Versen [:Matti] 2011-08-09 03:00:53 PDT
but there is bug 587475
Comment 4 mm-muell 2011-08-09 07:57:55 PDT
Addendum: if one replies to UTF-7 encoded mail, the reply is itself UTF-7 encoded. Apparently TB5 can still write UTF-7; it just can't read it, not even in its own "Sent" folder. This seems odd and half-done.

Removing UTF-7 support in an automatic update is bad: I am losing quite a lot of mail history. There is no easy way back to 3.1.11, because the Lightning add-on updated itself too, including its calendar data, which is no longer backward-compatible.
Comment 5 Mike Kaganski 2011-09-19 15:57:52 PDT
Created attachment 561056
Another example using utf-7

UTF-7 is still in use: at least some mail servers send delivery reports in it.
This attachment is such a notification (edited to remove confidential information).

Dropping support for UTF-7 is fine as far as creating new content is concerned, but removing the ability to read old messages (and not-so-old ones produced by software that is still in use) isn't acceptable. This raises the old question of being able to read old documents saved in proprietary file formats, which new international standards (e.g. ODF) are meant to address. This change does the opposite: it makes documents written in a (once) standard format unreadable.
And another thought: while striving to implement HTML5, will you now drop support for HTML4 and older too? Your HTML5 implementation may be completely UTF-7-free, but you could leave UTF-7 support in place elsewhere.

By the way, this message also shows another bug in displaying an attached RFC 822 message: its attachments are shown as attachments of the main message. That is a separate issue; I'll look for an existing report or file a new one.
Comment 6 Mark Banner (:standard8) 2011-09-20 06:31:09 PDT
Removing support for reading UTF-7 wasn't intentional, and that is why this bug exists. Evidently, when bug 414064 and bug 587475 landed, we broke something; that's why we're tracking this for TB 8, and we'll try to get it fixed there.
Comment 7 Mark Banner (:standard8) 2011-10-28 11:42:24 PDT
Unfortunately we've not been able to fix this in time for 8, so we'll shoot for 9 instead.
Comment 8 Simon Montagu :smontagu 2011-11-07 12:07:44 PST
Created attachment 572568
Patch

*IF* we are OK with supporting UTF-7 even in HTML messages, this is relatively straightforward: we just have to use GetUnicodeDecoderInternal all the time.

Last time I tried this, view source didn't work with UTF-7 messages, but now it does, presumably because of the change to use the HTML5 parser in view source. The UTF-7 isn't decoded in view source, but that is no worse than the status quo with quoted-printable and base64 encoded messages.
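The patch description implies two decoder lookups: a regular one that refuses charsets restricted for web content (as UTF-7 was after bug 414064) and GetUnicodeDecoderInternal, which does not apply that restriction. The Python model below only illustrates that design choice under this assumption; `get_decoder`, `RESTRICTED_CHARSETS` and the `internal` flag are hypothetical names, not Mozilla APIs.

```python
from typing import Callable

# Hypothetical: charsets that the ordinary (web-facing) lookup refuses.
RESTRICTED_CHARSETS = {"utf-7"}


def get_decoder(charset: str, internal: bool = False) -> Callable[[bytes], str]:
    """Return a bytes->str decoder; only the internal path may use restricted charsets."""
    name = charset.strip().lower()
    if name in RESTRICTED_CHARSETS and not internal:
        raise LookupError(f"charset {name!r} is not available on this path")
    return lambda data: data.decode(name)


# Modelled libmime behaviour with the patch: always take the internal path.
decode = get_decoder("UTF-7", internal=True)
print(decode(b"+ADw-p+AD4-"))  # prints <p>
```

Always taking the internal path is what makes HTML parts decodable as well, which is the trade-off raised in the following comments.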
Comment 9 David :Bienvenu 2011-11-07 12:34:10 PST
Simon, thanks for the patch. My understanding is that we don't do UTF-7 in HTML in the browser because of XSS exploits. I assume e-mail is not vulnerable because we don't have JS turned on. RSS feed messages, on the other hand, might be vulnerable, except that I'm not sure how much of the feed content actually goes through libmime. Cc'ing dveditz to see if this scares him.

I think it should be possible to not do utf-7 decoding in html parts, though it is libmime, so nothing is easy. Do you know for sure that with this patch we actually do utf-7 decoding of html?
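The XSS concern above comes from UTF-7 hiding markup from anything that inspects the undecoded bytes: a check for "<script" finds nothing, yet a UTF-7 decoder turns the same bytes into a script tag. The snippet below is a generic illustration of that class of problem using a deliberately naive, hypothetical filter; it does not demonstrate any actual Thunderbird or feed vulnerability.

```python
def naive_filter_blocks(raw: bytes) -> bool:
    """Hypothetical filter that looks for a script tag in the undecoded bytes."""
    return b"<script" in raw.lower()


payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

print(naive_filter_blocks(payload))  # prints False: no literal '<' in the bytes
print(payload.decode("utf-7"))       # prints <script>alert(1)</script>
```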
Comment 10 Simon Montagu :smontagu 2011-11-07 12:52:50 PST
> I think it should be possible to not do utf-7 decoding in html parts, though
> it is libmime, so nothing is easy. Do you know for sure that with this patch
> we actually do utf-7 decoding of html?

Yes, I've tested it with plain text and html messages (but not rss feeds, I have to admit ;-) 
I think it shouldn't be too hard to exclude HTML messages from UTF-7 decoding, but note that comment 0 does specifically mention HTML messages as part of the problem.
Comment 11 David :Bienvenu 2011-11-07 13:07:40 PST
(In reply to Simon Montagu from comment #10)

> I think it shouldn't be too hard to exclude HTML messages from UTF-7
> decoding, but note that comment 0 does specifically mention HTML messages as
> part of the problem.

It does, but the message is actually of type text/plain. I'm curious if there's really text/html mail out there with utf-7. But in any case, I do appreciate the argument that we shouldn't break old mail, and as long as dveditz is OK with this, I'm ok with it.
Comment 12 phil 2011-11-08 00:41:44 PST
(In reply to David :Bienvenu from comment #11)

> It does, but the message is actually of type text/plain. I'm curious if
> there's really text/html mail out there with utf-7.

Yes, such text/html emails do exist. We still receive quite a few of them:

[...]
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5931
[...]
------=_NextPart_001_0013_01CC9E01.3F5BABB0
Content-Type: text/html;
	charset="utf-7"
Content-Transfer-Encoding: quoted-printable
[...]
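The headers in comment 12 stack two encodings: the Content-Transfer-Encoding (quoted-printable) has to be undone first, and only then can the bytes be decoded with the declared charset (utf-7). A minimal Python sketch of that order using the standard `quopri` module; the sample body is invented for illustration and is not taken from the real message.

```python
import quopri

# Invented sample: an HTML fragment encoded as UTF-7, then as quoted-printable
# (the transfer encoding escapes '=' as "=3D").
qp_body = b"+ADw-p class=3D+ACI-x+ACI-+AD4-Hi+ADw-/p+AD4-"

# Step 1: undo the Content-Transfer-Encoding.
utf7_bytes = quopri.decodestring(qp_body)

# Step 2: decode with the charset from the Content-Type header.
html = utf7_bytes.decode("utf-7")
print(html)  # prints <p class="x">Hi</p>
```

Skipping step 1 still "works" as UTF-7 but leaves the literal "=3D" in the markup, so the order matters.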
Comment 13 David :Bienvenu 2011-11-09 13:46:53 PST
Comment on attachment 572568
Patch

We could probably have a simple unit test for this, like mailnews/mime/test/unit/test_mimeStreaming.js, except that you'd have to verify the results, which that test doesn't do currently.
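The point here is that such a test has to check the decoded output, not just that parsing succeeds. The sketch below shows that shape with Python's `unittest`, using the standard `email` package as a stand-in for libmime; it is not the xpcshell harness that test_mimeStreaming.js uses, and `render_body` and the sample messages are hypothetical.

```python
import email
import unittest


def render_body(raw: str) -> str:
    """Hypothetical stand-in for libmime: undo the transfer encoding, then the charset."""
    msg = email.message_from_string(raw)
    data = msg.get_payload(decode=True)  # handles quoted-printable/base64
    charset = msg.get_content_charset() or "us-ascii"
    return data.decode(charset).rstrip("\n")


class Utf7RenderingTest(unittest.TestCase):
    def test_plain_text_part(self):
        raw = (
            'Content-Type: text/plain; charset="utf-7"\n'
            "Content-Transfer-Encoding: 7bit\n"
            "\n"
            "Hi Mom -+Jjo--!\n"
        )
        self.assertEqual(render_body(raw), "Hi Mom -\u263a-!")

    def test_html_part_quoted_printable(self):
        raw = (
            'Content-Type: text/html; charset="utf-7"\n'
            "Content-Transfer-Encoding: quoted-printable\n"
            "\n"
            "+ADw-b+AD4-Gr+APw-+AN8-e+ADw-/b+AD4-\n"
        )
        self.assertEqual(render_body(raw), "<b>Gr\u00fc\u00dfe</b>")
```

Saved to a file, this runs with `python -m unittest <file>`.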
Comment 14 Ludovic Hirlimann [:Usul] 2011-12-08 05:20:44 PST
Are we waiting for the test before checking this in?
Comment 15 David :Bienvenu 2011-12-08 08:07:52 PST
No, we can check this in, I think.
Comment 16 Mark Banner (:standard8) 2012-01-20 03:29:24 PST
There's one thing I've been asked to test with this, which I'll do in a little while, so please don't land just yet.
Comment 17 Mark Banner (:standard8) 2012-01-21 03:05:13 PST
Checked in: http://hg.mozilla.org/comm-central/rev/dc9e0a572606
