UTF-8 encoding mangles multipart messages, breaks whine emails

RESOLVED FIXED in Bugzilla 2.22

Status


--
major
RESOLVED FIXED
14 years ago
11 years ago

People

(Reporter: cedric.caron, Assigned: karl)

Tracking


2.21
Bugzilla 2.22
regression
Bug Flags:
approval +
blocking2.22 +

Details

Attachments

(3 attachments)

(Reporter)

Description

14 years ago
My server is running under Windows 2000 using IIS, Perl 5.8.7 and the current 
CVS version.

I configured an event, but the e-mail I receive is empty.

The title is OK but the body is empty.

Any idea what additional tests I can do to fix this problem?

PS: I have the patch from bug 135812 applied.
(Assignee)

Comment 1

14 years ago
I have several questions:

0) Do you see anything if you open the email in a different program?
1) What does the 'Content-Type' header say?  Or is there no Content-Type 
header in the email at all?
2) What query does your whining entry execute?
3) Does this problem appear on 2.20rc2?  Does this problem appear if you use 
the latest CVS version, without any extra patches on it?
4) What is the current subject and body of the whining entry?  What happens if 
you change them?
(Reporter)

Comment 2

14 years ago
0) I tried with Outlook 2003 and the web interface provided by my ISP, and the 
e-mail really seems empty.

1) See the next comment.

2) A very simple query returning all the unconfirmed, new, assigned and 
reopened bugs of a single product.

3) My server is in "production" and it's difficult to switch to another version 
(I know CVS versions are not recommended for production).

4) The subject is "bugzilla status", and I tried with the same body and with an 
empty body.
(Reporter)

Comment 3

14 years ago
Header of the received e-mail:

MIME-Version: 1.0
Subject: [Bugzilla] Bugzilla Status
X-Mailer: Mail::Mailer[v1.67] Net::SMTP[v2.29]
X-Bugzilla-Type: whine
Content-Type: multipart/alternative;
 boundary="-----=====-----2584--1124229602-----";
 charset="UTF-8"
To: cedric.caron@urbanet.ch
Content-Transfer-Encoding: quoted-printable
From: bugzilla-daemon@orchid-management.com
Message-ID: <PROXYNQptgFrYILoygX0000002e@proxy.orchid-management.com>
X-OriginalArrivalTime: 16 Aug 2005 22:00:06.0453 (UTC) FILETIME=[DCA15650:01C5A2AD]
Date: 17 Aug 2005 00:00:06 +0200
X-Virus-Scanned: ClamAV version 0.86.2, clamav-milter version 0.86 on mx-04.tornado.cablecom.ch
X-Virus-Status: Clean
X-Spam-Checker-Version: SpamAssassin 2.64-hispeed (2004-01-11)
X-Spam-Status: No, hits=1.2 required=5.0 tests=AWL,CLICK_BELOW,HTML_30_40,
	HTML_LINK_CLICK_HERE,HTML_MESSAGE,MIME_MISSING_BOUNDARY,NO_REAL_NAME 
	autolearn=no version=2.64-hispeed
X-Spam-Level: *

(Reporter)

Comment 4

14 years ago
Created attachment 192995 [details]
e-mail dump...

I used the testfile mail mode to capture the mail sent by the server.

The header seems OK, but the body is full of "3D", which prevents correct
MIME decoding...

Any idea what the problem could be?
(Reporter)

Comment 5

14 years ago
It looks like the code from bug 126266 tries to encode the e-mail and destroys 
the nice MIME e-mail generated by whine.pl (my database is configured to use the 
UTF-8 charset).

(Assignee)

Comment 6

14 years ago
Thank you for the email dump.  I believe you may be correct.

The charset and Content-Transfer-Encoding headers were not originally set by
whine.pl.  As you noted, the new code from bug 126266 tries to modify the email.

My guess is that the new re-encoding code from bug 126266 did not properly
change the headers of the two alternative parts, which is why the email did not
display.  If this is correct, then this bug is probably a regression.
(Assignee)

Comment 7

14 years ago
Tested on landfill and did not see this problem.  However, I assume that is
because the utf8 parameter is set to "off" on landfill.  Anyway, here's my guess
as to what went wrong:


whine.pl calls Bugzilla::BugMail::MessageToMTA.  On line 635 of BugMail.pm, Perl
discovers that the utf8 parameter is on (which I assume is true from comment 5)
and either the header or body is not 7-bit clean (I'm fuzzy in this area: I'm
not sure why the message is not 7-bit clean, but this seems to be the case). 
Bugzilla::BugMail::encode_message is called for the message header & body (line
636/660).

encode_message calls MIME::Parser::parse_data (line 668) on the headers (JUST
the headers, not the body), returning a MIME::Entity.  A call to
MIME::Entity::head is executed on the returned entity (line 669), which
gives us a MIME::Head object.  The utf8 character set is added to the
Content-Type header (line 675), and various headers are examined.  Eventually,
we get to the body.

The quoted-printable encoding is only set on line 714, which tells us that (a)
the body is not 7-bit clean (line 709), and (b) more than half of the message is
7-bit clean (line 713-714).  The quoted-printable header is added (line 714) and
the processed message is returned back to the caller (at line 636).  The message
is then sent out.
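
A minimal sketch of that decision, written in Python purely for illustration (BugMail.pm itself is Perl). The function name choose_transfer_encoding and the base64 fallback for mostly-8-bit bodies are assumptions, not taken from the actual code:

```python
def choose_transfer_encoding(body: bytes) -> str:
    """Pick a Content-Transfer-Encoding for a message body (illustrative only)."""
    # Count the bytes that are not 7-bit clean.
    high = sum(1 for byte in body if byte > 127)
    if high == 0:
        # Already 7-bit clean (or empty): no re-encoding needed.
        return "7bit"
    if high * 2 < len(body):
        # More than half of the body is 7-bit clean: quoted-printable
        # keeps the ASCII portion readable.
        return "quoted-printable"
    # Assumed fallback for bodies that are mostly 8-bit data.
    return "base64"


# A mostly-ASCII French string with one accented character.
print(choose_transfer_encoding("caf\u00e9 au lait".encode("latin-1")))
```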


The flaw seems to be at the beginning of encode_message.  Throughout
encode_message, the $body is only touched to be encoded.  encode_message never
checks to see if the body of the message contains multiple parts, so the headers
of those parts are never properly updated.  This is unfortunate for whine.pl,
which sends out multipart/alternative messages.  I would guess the solution
would be to change encode_message to check for a multipart message.  Each part
in the body would then be split up, encoded (with appropriate headers inserted),
combined, and sent out.


For now, my suggestion is this: set the 'regression' keyword, change hardware to
'All', change OS to 'All', set severity to 'major', target for 2.22, and assign
to glob (the assignee of bug 126266).  Also, change summary to something like
"UTF-8 encoding mangles multipart messages, breaks whine emails".
(Reporter)

Comment 8

14 years ago
(In reply to comment #7)

> (I'm fuzzy in this area: I'm not sure why the message is not 7-bit clean, 
> but this seems to be the case). 

The bugs in my database are in French and contain accented characters encoded using 8 bits.
(Reporter)

Updated

14 years ago
Assignee: erik → bugzilla
Severity: normal → major
Keywords: regression
OS: Windows 2000 → All
Hardware: PC → All
Summary: whine send empte e-mail → UTF-8 encoding mangles multipart messages, breaks whine emails
> The bugs in my database are in French and contain accented characters encoded using 8 bits.

the quick fix is for whine.pl to use quoted-printable or base64 encoding when
constructing the mime message if it's not 7-bit clean.

i don't think i'll have time to look at this for a while, reassigning to nobody.
Assignee: bugzilla → nobody

Updated

14 years ago
Target Milestone: --- → Bugzilla 2.22
(Reporter)

Comment 10

14 years ago
Another solution may be to look for a boundary in the header, replace this 
boundary with a UTF-8-safe version and optionally restore the original after 
conversion.
(Reporter)

Comment 11

14 years ago
A quick fix for the boundary problem:

In whine.pl, replace 

$args->{'boundary'} = "-----=====-----" . $$ . "--" . time() . "-----";

with

$args->{'boundary'} = "----BugMail----" . $$ . "--" . time() . "-----";

This allows the e-mail to be displayed, but all the 8-bit characters are 
corrupted...
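
A rough illustration of why the "=" characters in the boundary matter, sketched in Python rather than Perl. It assumes the whole body, boundary lines included, is run through a quoted-printable encoder; the process ID and timestamp are the ones from the header in comment 3:

```python
import quopri

# Boundary as generated by the current whine.pl, and with the proposed prefix.
old_boundary = b"-----=====-----2584--1124229602-----"
new_boundary = b"----BugMail----2584--1124229602-----"

# Quoted-printable escapes every literal "=" as "=3D", so the encoded
# boundary line no longer matches the boundary declared in Content-Type.
print(quopri.encodestring(old_boundary))

# Without "=" characters the boundary passes through unchanged.
print(quopri.encodestring(new_boundary))
```

This only covers the boundary lines; the part headers and the 8-bit text inside each part are still encoded as if they were one flat body.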
(Assignee)

Comment 12

14 years ago
Should this be blocking 2.20.1 or 2.22?
Flags: blocking2.20.1?
(Assignee)

Updated

14 years ago
Flags: blocking2.20.1?
Flags: blocking2.22?
another way to fix this is to call encode_message() for each of the individual mime parts, then join them together with mime boundaries and a normal message header.

this way the entire message will be 7-bit clean (as it'll already be encoded) so encode_message won't mangle the boundaries.

That'd mean the callsite were responsible to do the encoding.
How about we let the callsite hand over the parts individually, and BugMail.pm does the encoding and boundary-ing?
Flags: blocking2.22?
Flags: blocking2.22+
Flags: blocking2.20.1+
Target Milestone: Bugzilla 2.22 → Bugzilla 2.20
<bkor> justdave: bug 304885 is not a 2.20.1 blocker -- 2.20 does not have the utf8 parameter or the bugmail stuff to encode header/body for utf-8 (breaking whine)
Flags: blocking2.20.1+
Target Milestone: Bugzilla 2.20 → Bugzilla 2.22
(Assignee)

Comment 16

13 years ago
I have started to work on this.  If I don't have anything in a week I'll give it back to nobody.
Assignee: nobody → karl
(Assignee)

Comment 17

13 years ago
Created attachment 203747 [details] [diff] [review]
Patch v1

OK, here we go...

The first thing I do is remove the = characters from the whine mail boundary, as they do freak out the encoding.

(In reply to comment #14)
> That'd mean the callsite were responsible to do the encoding.
> How about we let the callsite hand over the parts individually, and BugMail.pm
> does the encoding and boundary-ing?

(In reply to comment #13)
> another way to fix this is to call encode_message() for each of the individual
> mime parts, then join them together with mime boundaries and a normal message
> header.

Both of those methods would require a good bit of additional work on the part of the code creating the message.  It would be nice if it would be as simple as calling MessageToMTA and letting BugMail do the work, as it partially does now.  That's the idea I went with, and here is what I have:

First, I now pass the entire message into encode_message, and the entire message is now parsed by MIME::Parser.  This takes care of recognizing and parsing all parts of the message.  The code responsible for the actual encoding is now in a function called encode_message_entity, which takes a MIME::Entity as its parameter (and returns same).  encode_message receives the entity (which contains the newly-encoded data), extracts the header & body, and returns them as before.

encode_message_entity contains much of the code from encode_message, with little change.  There are, however, a few notable changes:

* If a multipart message is detected, extract each part as its own entity and call encode_message_entity, instead of trying to examine a body that is not going to exist (multipart messages don't have bodies, just parts).
* Do not try to set the content-type or charset on parts that contain no body.
* Do not do any actual encoding.  Instead, just set the appropriate header and MIME::Tools will handle the encoding for us (since we have given it a body, not just headers)!
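
The shape of that recursion can be sketched roughly as follows. This is Python for illustration only, not the Perl from the patch: the Entity class is a hypothetical stand-in for MIME::Entity, and always choosing quoted-printable is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for a parsed MIME entity (not MIME::Entity itself)."""
    headers: Dict[str, str] = field(default_factory=dict)
    charset: str = ""
    body: bytes = b""
    parts: List["Entity"] = field(default_factory=list)


def encode_entity(entity: Entity) -> Entity:
    """Set encoding headers on leaf parts, recursing into multipart containers."""
    if entity.parts:
        # A multipart container has no body of its own: handle each part instead.
        entity.parts = [encode_entity(part) for part in entity.parts]
        return entity
    if not entity.body:
        # No body: leave the content type and charset untouched.
        return entity
    entity.charset = "UTF-8"
    if any(byte > 127 for byte in entity.body):
        # Only set the header; the MIME library is assumed to do the
        # actual encoding when the message is written out.
        entity.headers["Content-Transfer-Encoding"] = "quoted-printable"
    return entity


whine = Entity(
    headers={"Content-Type": "multipart/alternative"},
    parts=[
        Entity(headers={"Content-Type": "text/plain"},
               body="r\u00e9sum\u00e9 des bugs".encode("utf-8")),
        Entity(headers={"Content-Type": "text/html"}, body=b"<p>bug list</p>"),
    ],
)
encode_entity(whine)
```

After the call, only the text/plain part carries a Content-Transfer-Encoding header; the container and the 7-bit-clean HTML part are left without one.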

I tested this by creating a whining event that contained many interesting characters (ü, é, â, etc.) in both the subject & body.  In all cases (with quoted-printable and base64 encoding) my mailer (Apple Mail) was able to decode and display the message.  Of course, this still needs additional testing 8-)
Attachment #203747 - Flags: review?
(Assignee)

Updated

13 years ago
Status: NEW → ASSIGNED
Comment on attachment 203747 [details] [diff] [review]
Patch v1

r=glob

very nice, good job :)



nits (can be fixed on checkin):

>     # read header into MIME::Entity

this comment is no longer accurate - probably best to delete it

>+        foreach my $part ($entity->parts) {
>+          my $newpart = encode_message_entity($part);
>+          push @$newparts, $newpart;

indentation
Attachment #203747 - Flags: review? → review+
(Assignee)

Updated

13 years ago
Flags: approval?
Flags: approval? → approval+
(Assignee)

Comment 19

13 years ago
Created attachment 206702 [details] [diff] [review]
Checked-In Version
(Assignee)

Comment 20

13 years ago
Checking in whine.pl;
/cvsroot/mozilla/webtools/bugzilla/whine.pl,v  <--  whine.pl
new revision: 1.20; previous revision: 1.19
done
Checking in Bugzilla/BugMail.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/BugMail.pm,v  <--  BugMail.pm
new revision: 1.61; previous revision: 1.60
done
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED