Closed Bug 1078314 Opened 6 years ago Closed 6 years ago

Missing links and broken unicode characters in some bugmail

Categories

(bugzilla.mozilla.org :: Extensions, defect)

Production
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: glob)

References

Details

Attachments

(4 files)

Attached file email.txt
As seen over the weekend some of the bugmail sent out by bugzilla.mozilla.org contains broken links and special characters. Talking to dkl on IRC about it, and he pointed out http://git.mozilla.org/?p=webtools/bmo/bugzilla.git;a=commit;h=9552ab99622d784611d254268d3688558d24fe6e [github] for the possible regressor. More testing would be necessary.

A bug mail which I got with broken data is attached.
Assignee: nobody → dylan
Duplicate of this bug: 1079574
The change that introduced this bug appears to be:

    # fix encoding
    my $body = $part->body;
    if (Bugzilla->params->{'utf8'}) {
        $part->charset_set('UTF-8');
        my $raw = $part->body_raw;
        if (utf8::is_utf8($raw)) {
            utf8::encode($raw);
            $part->body_set($raw);
        }    
    }    
    $part->encoding_set('quoted-printable') if !is_7bit_clean($body);

Looks innocent, right? (note: calling utf8::is_utf8 is useless (bug 1003996) but that isn't the problem here). The problem is an interaction between $part->encoding_set() and $part->body_raw.

I only realized this after reading the documentation: https://metacpan.org/pod/Email::MIME#encoding_set

> Convert the message body and alter the Content-Transfer-Encoding header using this method. Your
> message body, the output of the body() method, will remain the same. The raw body, output with the
> body_raw() method, will be changed to reflect the new encoding.

Right. So any function that relies on the value of body_raw() AND calls encoding_set() will not be idempotent.
Bugzilla::Mailer *should* not be encoding this, because it only does that when the charset is ascii (!?)
but some debugging prints should solve this issue.
Duplicate of this bug: 1079903
Summary: Missing links and broken umlauts in some bugmail → Missing links and broken unicode characters in some bugmail
In desperation, some bug spam. I'm not seeing the double encoding on my dev machine despite many variations of testing...

Some ŭtf8 ßtuff æñḑ suçȟ
I've already received two of them from bug 1072743 (comment 14 and comment 16), maybe you want to try copying one of those comments.
Hmm... I think I found the pattern -- when the person that caused the email to be sent does not have editbugs, the email will be borked.
Attached file 1072497.eml
Nope. Above comment is wrong. But I have now have a strong suspicion that all the bad emails are coming from jobqueue2. We need to figure out how those systems differ, if they do.

Here is an example of a "good" email
Attachment #8502599 - Attachment mime type: message/rfc822 → text/plain
Attached file 508541.eml
And this is a bad email.
taking.

(In reply to Dylan William Hardison [:dylan] from comment #2)
> Right. So any function that relies on the value of body_raw() AND calls
> encoding_set() will not be idempotent.

the raw parts are recreated with:

# force Email::MIME to re-create all the parts.  without this
# as_string() doesn't return the updated body for multi-part sub-parts.
$email->parts_set([ $email->subparts ]);

> But I have now have a strong suspicion that all the bad emails are coming from jobqueue2.
> We need to figure out how those systems differ, if they do.

this is another red herring -- i have a corrupt email generated by jobqueue1.

> Created attachment 8502602 [details]
> 508541.eml
> And this is a bad email.

odd; this email looks good to me.



it looks like the parts are being modified twice; the html in the bad email is characteristic of those modified by the HTML::Tree package.  i suspect this is all fallout from my lack of action on bug 714724 :(
Assignee: dylan → glob
Severity: normal → critical
Duplicate of this bug: 1086577
Attached patch 1078314_1.patchSplinter Review
the crux of the change here is "don't mess with the encoding if it has already been messed with".

i've created a standard _fix_encoding sub and use it in all extensions that touch bugmail.  unfortunately i can't put it into a shared library because some of the extensions need to be able to be deployed onto other sites.
Attachment #8509649 - Flags: review?(dylan)
Comment on attachment 8509649 [details] [diff] [review]
1078314_1.patch

Review of attachment 8509649 [details] [diff] [review]:
-----------------------------------------------------------------

As far as I can test, this works.
r=dylan
Attachment #8509649 - Flags: review?(dylan) → review+
To ssh://gitolite3@git.mozilla.org/webtools/bmo/bugzilla.git
   d70f291..6dc2473  master -> master
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: Extensions: BMO → Extensions
You need to log in before you can comment on or make changes to this bug.