Closed Bug 723944 Opened 12 years ago Closed 12 years ago

Plain-text only emails are mangled when they contain non-ASCII characters

Categories

(Bugzilla :: Email Notifications, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Bugzilla 4.2

People

(Reporter: philb, Assigned: LpSolit)

References

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Opera/9.80 (X11; Linux x86_64; U; en-GB) Presto/2.10.229 Version/11.61

Steps to reproduce:

Have your preferences set to plain-text emails only
Create a bug, or comment on a bug
In the description/comment, include non-ASCII characters (e.g. "┌────")


Actual results:

No email is sent


Expected results:

Email should have been sent.

Note that if HTML email is chosen, it is sent, along with the plain-text version. It's only when you have plain-text-only that nothing is sent.
This is in 4.2rc2.
utf8 param is on, and this worked fine in 3.6.
I tested with 4.3 using SMTP. Emails are sent when the email format is plain text, but the content is total garbage. For instance:

HTML format:     testостанов] - неожиданное падение
plain text only: testоÑтанов] - неожиданное падение

I assume this is a different symptom of the same root cause, so marking as blocker.
Severity: normal → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking4.2+
Target Milestone: --- → Bugzilla 4.2
Phil, did you try with SMTP too, or did you try with Sendmail?
> Phil, did you try with SMTP too, or did you try with Sendmail?

My config is SMTP (with jobqueue). Didn't try sendmail.
This is a regression due to bug 589128. When the user pref has been implemented, multipart/alternative has been replaced by $parts[0]->content_type, i.e. text/plain, for text only bugmails. If I set the content-type back to multipart/alternative, the display is fine again.

Phil, could you test something for me? In the file Bugzilla/BugMail.pm at line 406, replace

  if (scalar(@parts) == 1) {

by

  if (0 && scalar(@parts) == 1) {

Does this fix the problem for you? If yes, then this is indeed the same root cause.
Depends on: 589128
Keywords: regression
Ah, I found something interesting. In MessageToMTA(), we call $email->walk_parts to fix the encoding, but only if the Content-Type header doesn't define the charset itself. This is why we do this check:

  my $content_type = $part->content_type || '';
  if ($content_type !~ /;/) {

So I checked the value of $content_type with text only emails and with text+HTML emails:

Text only: Content-Type: text/plain; charset="us-ascii"

text+HTML: Content-Type text/plain
           Content-Type text/html

So with text+HTML emails, the Content-Type header has no charset defined, and we do the encoding as expected. But in the case of text only emails, charset="us-ascii" is present in the Content-Type header and so we skip the encoding, leading to the garbage I see.
The problem comes from:

  $email->content_type_set($parts[0]->content_type);

$parts[0]->content_type correctly returns 'text/plain', but right after this call, $email->content_type returns text/plain; charset="us-ascii". So for some unknown reason, $email->content_type_set appends charset="us-ascii" despite we didn't tell it to do so.

@rjbs: is this behavior expected?
> Does this fix the problem for you? If yes, then this is indeed the same root
> cause.

Yes, that fixes it.
my patch on bug 714724 fixes this by removing the walk_parts hack and setting the headers correctly before passing the message to Email::MIME.

do you think that's something we could target for 4.2?
(In reply to Byron Jones ‹:glob› from comment #10)
> do you think that's something we could target for 4.2?

I'm not sure. This is a pretty major change, and we are very close from the final release of 4.2. Your patch could have unexpected side-effects, especially with those not using utf8 or with MTAs which do not support 8bit correctly, and I wouldn't want to deal with such problems for 4.2.
Attached patch patch, v1Splinter Review
Here is a workaround which seems safer for 4.2. For 4.4, we can go with what glob suggested.
Assignee: email-notifications → LpSolit
Status: NEW → ASSIGNED
Attachment #597414 - Flags: review?(glob)
Comment on attachment 597414 [details] [diff] [review]
patch, v1

r=glob

kludge, but it works.
Attachment #597414 - Flags: review?(glob) → review+
Flags: approval4.2+
Flags: approval+
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/Mailer.pm
Committed revision 8114.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified Bugzilla/Mailer.pm
Committed revision 8029.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Summary: Plain-text email not sent when it contains non-ASCII chars → Plain-text only emails are mangled when they contain non-ASCII characters
I apologize if I'm making a superfluous comment, but are values in headers also being encoded properly? I'm examining today whether I need to add to our 3.6-based version what we have in our 2.22-based version, because we put a user's name in the From: of outgoing emails:

sub encoded_name {
   my $self = shift;
   
   if (!Param('utf8') || is_7bit_clean($self->{name})) {
      return $self->{name};
   } else {
      my $encoded = encode_qp($self->{name}, '');
      return "=?UTF-8?Q?${encoded}?=";
   }
}

I'm thinking about adding an 'qp_encoded' filter to handle this, but I'm also wondering whether other header fields should be encoded too.
(In reply to David Marshall from comment #15)
> I apologize if I'm making a superfluous comment, but are values in headers
> also being encoded properly? I'm examining today whether I need to add to
> our 3.6-based version what we have in our 2.22-based version, because we put
> a user's name in the From: of outgoing emails:
> 
> sub encoded_name {
>    my $self = shift;
>    
>    if (!Param('utf8') || is_7bit_clean($self->{name})) {
>       return $self->{name};
>    } else {
>       my $encoded = encode_qp($self->{name}, '');
>       return "=?UTF-8?Q?${encoded}?=";
>    }
> }
> 
> I'm thinking about adding an 'qp_encoded' filter to handle this, but I'm
> also wondering whether other header fields should be encoded too.

Yikes, never mind. There really is a lot of difference between 2.22 and 3.6
This is not fixed in current 4.2.

No plain-text emails with non-ASCII chars are being sent in my install.

text+HTML format email is better in that it now has correct charset:

Content-Type: text/plain; charset="UTF-8"
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Phil Barrett from comment #17)
> No plain-text emails with non-ASCII chars are being sent in my install.

If you disable JobQueue, do things work correctly? Also, with JobQueue disabled, could you enable the smtp_debug parameter to check where the problem is?
> If you disable JobQueue, do things work correctly?

Yes.

> Also, with JobQueue
> disabled, could you enable the smtp_debug parameter to check where the
> problem is?

I'll attach the log. I can't see anything obviously wrong.
Attached file Log for comment 19
(In reply to Phil Barrett from comment #19)
> > If you disable JobQueue, do things work correctly?
> 
> Yes.

OK, so this problem is fixed when not using JobQueue. Could you file a separate bug about JobQueue?
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: