Closed Bug 696005 Opened 13 years ago Closed 13 years ago

emails can become corrupted if modified via the mailer_before_send due to premature UTF8 fixups

Categories

(Bugzilla :: Email Notifications, defect)

4.0.2
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 4.4

People

(Reporter: glob, Assigned: glob)

References

Details

Attachments

(5 files)

Attached file broken email
bug 689601 is sending corrupt emails on production, however the emails are not corrupt on our development environments. this is probably due to the older modules installed on production. i'll attach a patch to disable this feature until we've had time to investigate fully what is going on. see attached file; the attachment to it is quote-printed but this isn't specified in the header; and there's some other quote-printed weirdness going on.
Attached image broken email screenshot
Attached file good email
Attached image good email screenshot
the "good" versions are what my dev system generates.
Ok after digging into this more I have tracked it down to the offending part. In Bugzilla/Mailer.pm $email->walk_parts is checking each sub part to see if it is 7bit clean and if not it is adding the "Content-Encoding-Type: quoted-printable" header to each part. In this case your realname "<glob>" is not 7bit clean and so the header is added. This converts it to the appropriate ASCII equivalent "=3DE2=3D80=3D=B9glob=3D=3DE2=3D80=3DBA". It is also converting other characters in the body that it shouldn't. If I comment out the line in $email->walk_parts that adds the header, all looks proper in the email notifications but this may cause issues that we don't know about which is why the header was added originally a while ago. One thing that is wierd is that it adds the quoted-printable header to each of the sub parts as well as the main headers for the email which I am not sure it is supposed to be doing. This is causing characters in the sub headers to also be converted which is confusing to email clients. Example: From - Thu, 20 Oct 2011 21:51:54 +0000 From: bugzilla-daemon@mozilla.com^M To: glob@mozilla.com^M Subject: review requested: [Bug 171126] Search for bugs page doesn't show^M "Sort By" correctly. : [Attachment 563059 [details]] test patch^M Date: Thu, 20 Oct 2011 21:51:54 +0000^M ... Content-Type: multipart/mixed; boundary="1319147514.56C80.28538"; charset="UTF-8"^M Content-Transfer-Encoding: quoted-printable^M --1319147514.56C80.28538^M From: bugzilla-daemon@mozilla.com^M To: glob@mozilla.com^M Subject: review requested: [Bug 171126] Search for bugs page doesn't show^M "Sort By" correctly. : [Attachment 563059 [details]] test patch^M Date: Thu, 20 Oct 2011 21:51:54 +0000^M ... Content-Type: text/plain; charset=3D"UTF-8"^M Content-Transfer-Encoding: quoted-printable^M Notice: Content-Type: text/plain; charset=3D"UTF-8". So it seems that the body_set is being reran on the entire body causing the headers to become corrupt. This is as far as I got. I will play with it some more tomorrow to see where a workaround or other fix can be used to make this work properly. One possibility would be to convert the non-ASCII characters to printable equivalents for the message body using: $text =~ s/([^\x20-\x7E])/sprintf("&#x%X;", ord($1))/eg; url: http://www.perlmonks.org/?node_id=870912 What are our thoughts? dkl
Status: NEW → ASSIGNED
Attached patch patch v1Splinter Review
the crux of the problem is the message is fixed with regards to utf8 before it is passed to the hook. the email object doesn't "realise" that it has this has happened, so it gets a second encoding pass after we modifying the email. the fix is simple: move the walk_parts block to after the hook.
Attachment #568604 - Flags: review?(dkl)
Component: Extensions: Other → Email Notifications
Product: bugzilla.mozilla.org → Bugzilla
QA Contact: extensions → default-qa
Summary: bug 689601 is sending corrupt emails on production → emails can become corrupted if modified via the mailer_before_send due to premature UTF8 fixups
Version: Current → 4.0.2
Comment on attachment 568604 [details] [diff] [review] patch v1 Review of attachment 568604 [details] [diff] [review]: ----------------------------------------------------------------- Definitely improved as now I get the quoted-printable header only for the subparts and not the outer email body which makes more sense. Also the subpart headers do not have their content-type header munged now which is an improvement. Problem is I can understand the quoted characters displaying for your realname which has unicode but it is still causing other characters to be quoted which should not. --------------- From - Fri, 21 Oct 2011 22:01:08 +0000 From: bugzilla-daemon@mozilla.com^M To: glob@mozilla.com^M Subject: review requested: [Bug 171126] Search for bugs page doesn't show^M "Sort By" correctly. : [Attachment 563059 [details]] test patch^M Date: Fri, 21 Oct 2011 22:01:08 +0000^M ... Content-Type: multipart/mixed; boundary="1319234468.C8b38A7B0.2766"; charset="us-ascii"^M ^M ^M --1319234468.C8b38A7B0.2766^M From: bugzilla-daemon@mozilla.com^M To: glob@mozilla.com^M Subject: review requested: [Bug 171126] Search for bugs page doesn't show^M "Sort By" correctly. : [Attachment 563059 [details]] test patch^M Date: Fri, 21 Oct 2011 22:01:08 +0000^M ... Content-Type: text/plain; charset="UTF-8"^M Content-Transfer-Encoding: quoted-printable^M ^M David Lawrence [:dkl] <dkl@mozilla.com> has asked Byron Jones =E2=80=B9glob=^M =E2=80=BA^M <glob@mozilla.com> for review:^M Bug 171126: Search for bugs page doesn't show "Sort By" correctly.^M http://fedora/stable/show_bug.cgi?id=3D171126^M ^M Attachment 563059 [details]: test patch^M http://fedora/stable/attachment.cgi?id=3D563059&action=3Dedit^M Splinter Review^M http://fedora/stable/page.cgi?id=3Dsplinter.html&bug=3D171126&attachment=3D=^M 563059^M --1319234468.C8b38A7B0.2766^M Date: Fri, 21 Oct 2011 18:01:08 -0400 ... Content-Transfer-Encoding: quoted-printable Index: importxml.pl^M =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=^M =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=^M =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D^M RCS file: /cvsroot/mozilla/webtools/bugzilla/importxml.pl,v^M retrieving revision 1.104^M diff -p -u -r1.104 importxml.pl^M --- importxml.pl 23 May 2011 17:07:20 -0000 1.104^M +++ importxml.pl 5 Oct 2011 21:17:12 -0000^M @@ -854,8 +854,6 @@ sub process_bug {^M }^M =20^M # Status & Resolution^M - my $has_res =3D defined($bug_fields{'resolution'});^M - my $has_status =3D defined($bug_fields{'bug_status'});^M my $valid_res =3D check_field('resolution',=20=20^M scalar $bug_fields{'resolution'},=20^M undef, ERR_LEVEL );^M @@ -910,10 +908,10 @@ sub process_bug {^M }^M }^M =20^M ... --1319234468.C8b38A7B0.2766--^M Note the URLs have quoted characters: http://fedora/stable/attachment.cgi?id=3D563059&action=3Dedit^M Splinter Review^M http://fedora/stable/page.cgi?id=3Dsplinter.html&bug=3D171126&attachment=3D=^M 563059^M
Attachment #568604 - Flags: review?(dkl) → review-
(In reply to David Lawrence [:dkl] from comment #6) > Problem is I can understand the quoted characters displaying for your > realname which has unicode but it is still causing other characters to > be quoted which should not. there's nothing wrong with quote printable encoding as long as it's done correctly, which is now is with this patch. for example, the urls were encoded *twice*, resulting in > https://bugzilla.mozilla.org/attachment.cgi?id=3D3D568331&action=3D3Dedit > ^^^^^ now it's only once, and will be decoded correctly by email clients: > http://fedora/stable/attachment.cgi?id=3D563059&action=3Dedit > ^^^
(In reply to Byron Jones ‹:glob› from comment #7) > now it's only once, and will be decoded correctly by email clients: > > > http://fedora/stable/attachment.cgi?id=3D563059&action=3Dedit Thats fine. My question was why the letter '=' needed to be encoded in the first place. It is a normal ascii character. I can understand the characters in your realname, but it seemed like it was getting overly greedy. This is the first time for me dealing with an issue like this so I learn something new all of the time. dkl
(In reply to David Lawrence [:dkl] from comment #8) > (In reply to Byron Jones ‹:glob› from comment #7) > > now it's only once, and will be decoded correctly by email clients: > > > > > http://fedora/stable/attachment.cgi?id=3D563059&action=3Dedit > > Thats fine. My question was why the letter '=' needed to be encoded in the > first place. It is a normal ascii character. I can understand the characters > in your realname, but it seemed like it was getting overly greedy. = is the start-of-escape character in quoted-printed encoding, so it always needs to be escaped.
(In reply to Byron Jones ‹:glob› from comment #9) > = is the start-of-escape character in quoted-printed encoding, so it always > needs to be escaped. Well that explains it ;)
Attachment #568604 - Flags: review- → review?(dkl)
Comment on attachment 568604 [details] [diff] [review] patch v1 I am r+ as this corrects the problem keeping the attachment feature from going out. Although I am concerned that this is an upstream hook that is being moved and will be a BMO customization to maintain going forward. I would like to have this change pushed upstream if possible or possible find a better extension-like solution in the future. Thanks dkl
Attachment #568604 - Flags: review?(dkl) → review+
(In reply to David Lawrence [:dkl] from comment #11) > I would like to have this change pushed upstream if possible or possible find > a better extension-like solution in the future. this _is_ an upstream bug.
(In reply to Byron Jones ‹:glob› from comment #12) > (In reply to David Lawrence [:dkl] from comment #11) > > I would like to have this change pushed upstream if possible or possible find > > a better extension-like solution in the future. > > this _is_ an upstream bug. Cool. If there is already a bug report then I apologize for not searching for it before commenting. Either way then, this patch should pose no problem getting accepted since it fixes the upstream issue. dkl
Hey! Looks like nobody requested approval. Also, it's funny, you guys have actually come up against an old historical problem that I would really love to fix. When we wrote this code, the world still had MTAs in it that couldn't deal with 8-bit messages, and our code itself was not 8-bit clean. However, now that we actually support utf8 in all of Bugzilla, what would be even better here would be to simply set the Content-Transfer-Encoding as "8bit" if it's in utf8 and then make sure Perl thinks it's UTF-8! Not only would this resolve your bug, it would significantly decrease the size of emails sent by Bugzilla when they have to be encoded! :-) The only other fix that would be required, in order to actually fix your bug, would be to have it expect attachments. So the Content-Transfer-Encoding changes would only be done to the body or the multipart/alternative pieces. (I do some walking of a multipart/alternative in email_in.pl, perhaps that code could be abstracted out.)
(In reply to Max Kanat-Alexander from comment #14) > Hey! Looks like nobody requested approval. oops, requested. > what would be even better here would be to simply set the Content-Transfer-Encoding > as "8bit" if it's in utf8 and then make sure Perl thinks it's UTF-8! sounds reasonable, however that change should be tracked in a different bug. > The only other fix that would be required, in order to actually fix your > bug, would be to have it expect attachments. not sure what you're getting at here to be honest; attachments work with the attached patch applied.
Flags: approval?
Does it need to be checked in on branches? If yes, should it be considered as a blocker?
Flags: blocking4.2?
(In reply to Frédéric Buclin from comment #16) > Does it need to be checked in on branches? If yes, should it be considered > as a blocker? this issue was introduced by 3.6, however i don't think it's important enough to warrant branch check-in.
(In reply to Byron Jones ‹:glob› from comment #17) > this issue was introduced by 3.6, however i don't think it's important > enough to warrant branch check-in. OK, cool. So not a blocker. I will let mkanat approve this patch as he seems to have some concerns with it.
Flags: blocking4.2?
This patch is fine. But I would really like another patch that fixes up the content encoding to just be 8bit. We've now had numerous problems that have resulted from our encoding, I'd rather just not have to re-encode anything.
Flags: approval? → approval+
Committing to: bzr+ssh://bjones%40mozilla.com@bzr.mozilla.org/bugzilla/trunk/ modified Bugzilla/Mailer.pm Committed revision 8021.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Bugzilla 5.0
Blocks: 714488
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: