387860 - Subject lines in mails may contain mangled multi-byte characters

Reporter

Description

•

18 years ago

If a multi-byte character is in an unlucky spot, it may get split in two, effectively breaking it. http://landfill.bugzilla.org/bugzilla-tip/show_bug.cgi?id=5475 is an example for this: its summary “Test der Umlaute öäü” gets damaged in mails. You can reproduce this by having an e-mail sent to you for it.

Marc Schumann [:Wurblzap]

Reporter

Comment 1

•

18 years ago

Oh yeah, here's how the Subject header looks like: Subject: [Bug 62] New:=?UTF-8?Q?=20Test=20der=20Umlaute=20=C3=B6=C3?= =?UTF-8?Q?=A4=C3=BC?= The second umlaut gets split across the line break.

Ilya Slobodin

Assignee

Comment 2

•

18 years ago

Attached patch Patch for 3.0 — Details — Splinter Review

This patch is based on Dan Kogai the Encode maintainer's explanation: http://www.nntp.perl.org/group/perl.unicode/2004/02/msg2404.html Dan Kogai: It LOOKS LIKE a bug but it is not. [...] Why that happens is that you use Encode::encode("UTF-8"). That means the utf8 flag of the resulting $string is OFF. Without UTF-8 flag perl takes it as "\xA4\xC3", not "\N{LATIN CAPITAL LETTER A WITH TILDE}". Now try the code below. use Encode qw(encode); my $string = Encode::decode("UTF-8", "ääääääääääääääääääääääääääääääääääää"); # ^^^^^^ print Encode::encode("MIME-Q", $string), "\n" [...]

Marc Schumann [:Wurblzap]

Reporter

Comment 3

•

18 years ago

So... Does this mean this bug here will be fixed by bug 363153?

Max Kanat-Alexander

Comment 4

•

18 years ago

(In reply to comment #3) > So... Does this mean this bug here will be fixed by bug 363153? It means that this bug is technically part of that bug, but due to the nature of the technical difficulties we're having over there, it might be a good idea to test and consider this patch. (With the addition of spaces around the "=" of course.)

Assignee: email-notifications → islobodin

Max Kanat-Alexander

Updated

•

18 years ago

Attachment #274755 - Flags: review?(wurblzap)

Marc Schumann [:Wurblzap]

Reporter

Comment 5

•

18 years ago

Comment on attachment 274755 [details] [diff] [review] Patch for 3.0 With the patch, the subject line no longer reads Subject: [Bug 5475]=?UTF-8?Q?=20Test=20der=20Umlaute=20=C3=B6=C3?= =?UTF-8?Q?=A4=C3=BC?= but Subject: [Bug 5475]=?UTF-8?Q?=20Test=20der=20Umlaute=20=C3=B6?= =?UTF-8?Q?=C3=A4=C3=BC?= which works -- r=Wurblzap. I can do the check-in after a+, touched up as mentioned in comment 4, if you don't have CVS access, Ilya.

Attachment #274755 - Flags: review?(wurblzap) → review+

Marc Schumann [:Wurblzap]

Reporter

Updated

•

18 years ago

Flags: approval?

Flags: approval3.0?

Max Kanat-Alexander

Comment 6

•

18 years ago

I can approve this for tip now, but since we're frozen for QA, LpSolit has to approve it for 3.0. Make sure that you add spaces around the "=" on checkin.

Status: NEW → ASSIGNED

Flags: approval? → approval+

Frédéric Buclin

Comment 7

•

18 years ago

Go for it for 3.0.1. Be sure to test this patch with both utf8 turned on and off, in case one of the two cases is broken. Also, is data seen differently when being newly added in 3.0 or upgraded from 2.20 where UTF-8 wasn't in use? Just to be sure. I don't want an emergency release next week. Please land it today to catch it in our QA tests.

Flags: approval3.0? → approval3.0+

Target Milestone: Bugzilla 3.2 → Bugzilla 3.0

Version: 3.1 → 3.0

Marc Schumann [:Wurblzap]

Reporter

Comment 8

•

18 years ago

(In reply to comment #7) > Go for it for 3.0.1. Be sure to test this patch with both utf8 turned on and > off, in case one of the two cases is broken. Also, is data seen differently It works with utf8 turned on and turned off. > when being newly added in 3.0 or upgraded from 2.20 where UTF-8 wasn't in use? I can't (or won't; your pick) test the 2.20-upgrade part of this. Please reconfirm a3.0+.

Flags: approval3.0+ → approval3.0?

Marc Schumann [:Wurblzap]

Reporter

Comment 9

•

18 years ago

Thanks for the patch, Ilya. Tip: Checking in Bugzilla/Mailer.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Mailer.pm,v <-- Mailer.pm new revision: 1.11; previous revision: 1.10 done 3.0: Postponed until a3.0+.

Frédéric Buclin

Comment 10

•

18 years ago

Bah... I wanted to test with a 2.20.5 installation where some comments were written with the UTF8 encoding, and some others with the ISO-8859-1 encoding (selected from my web browser), and then upgrading to 3.0.1 to see what happen. After the upgrade, the utf8 param was off, as expected when upgrading from 2.20. Then I 1) ran recode.pl 2) turned on the utf8 param and 3) ran checksetup.pl, and guess what? "Test der Umlaute öäü" which was written as UTF8 is still there, but the same string which was written with the ISO-8859-1 encoding has disappeared and has been replaced by "Test der Umlaute". All the text which was following this string has been clipped too. I ran ./contrib/recode.pl --charset=iso-8859-1 --show-failures, and it reported no error at all, but I now lost some data. I'm not blaming this patch as it has nothing to do with the problem I just had. I'm only saying that this sucks as I cannot test this patch with a real upgraded DB.

Marc Schumann [:Wurblzap]

Reporter

Comment 11

•

18 years ago

Yeah, this matches what people experience and report to support-bugzilla@.

Max Kanat-Alexander

Comment 12

•

18 years ago

There is no difference between an upgraded DB and an originally-3.0 DB. If you're experiencing dataloss even though you ran recode, file another bug. We've had a few reports of that already.

Teemu Mannermaa (:wicked)

Comment 13

•

18 years ago

Comment on attachment 274755 [details] [diff] [review] Patch for 3.0 This seems to fix the issue atleast for the testcase in this bug on a 3.0 branch installation with utf8 enabled. I did also test this with an update from 2.22 to 3.0 that had a latin1 subject on a bug. (Note that recode.pl is irrevelant at this case because it doesn't touch latin1 strings.) However, when I tested this with utf8 flag disabled (on DB that's still latin1) I found that it didn't fix the error. It still results malformed UTF8 strings on subjects with UTF8 letters. It seems Encode::MIME::Header (the Perl module that handless MIME-Q encoding type for Encode module) version that comes with Perl 5.8.5 always encodes things in UTF8 and uses Perl utf8 mode. Simplest fix for this would to be to remove if clause and always run string through UTF8 decoding. This does seem to work for UTF8 subjects. For subjects with latin1 encoding, this will just strip out the non-UTF8 characters. However, without this subjects are always said to be in UTF8 but instead contain 8-bit latin1 characters which makes TB to ignore the whole encoded word and thus strips out almost all of the subject. This means non-utf8 converted systems are already badly broken as things are now so this could be considered to be a separate bug. Giving this r- for now. I can reverse that if we don't care about non-utf8 mode as everybody should just convert their databases because otherwise there are lot's of things broken anyway.

Attachment #274755 - Flags: review-

Marc Schumann [:Wurblzap]

Reporter

Comment 14

•

18 years ago

Even if we do care about non-utf8 mode, we can surely handle this in a separate bug, I think. With this patch here, nothing changes for non-utf8 mode. Would you consider this and maybe r+?

Max Kanat-Alexander

Comment 15

•

18 years ago

(In reply to comment #13) > However, when I tested this with utf8 flag disabled (on DB that's still latin1) > I found that it didn't fix the error. That's OK, Bugzilla is not expected to handle utf8 with the utf8 flag off. > Giving this r- for now. I can reverse that if we don't care about non-utf8 mode > as everybody should just convert their databases because otherwise there are > lot's of things broken anyway. Yes, just reverse it. We don't care about people who have utf8 in their database but haven't turned on the utf8 parameter.

Frédéric Buclin

Comment 16

•

18 years ago

(In reply to comment #15) > Yes, just reverse it. We don't care about people who have utf8 in their > database but haven't turned on the utf8 parameter. No, my own tests also show that non-UTF8 data in an installation with utf8 turned off is also broken.

Marc Schumann [:Wurblzap]

Reporter

Comment 17

•

18 years ago

(In reply to comment #16) > No, my own tests also show that non-UTF8 data in an installation with utf8 > turned off is also broken. Well it's broken without this patch as well, right? So nothing new there. I don't think we should attempt to fix this very very old bugginess and the UTF-8 bugginess at the same time.

Max Kanat-Alexander

Comment 18

•

18 years ago

(In reply to comment #16) > No, my own tests also show that non-UTF8 data in an installation with utf8 > turned off is also broken. Yes, that's to be expected and is essentially impossible to fix. The data in a non-utf8 installation could be in ANY encoding! Email is latin1 by default, so we do our best to encode everything as latin1, and if users don't want to use utf8, they have to customize Bugzilla just like they always did. However, they should just be using utf8 if they have non-latin encodings.

Teemu Mannermaa (:wicked)

Comment 19

•

18 years ago

Comment on attachment 274755 [details] [diff] [review] Patch for 3.0 > > No, my own tests also show that non-UTF8 data in an installation with utf8 > Yes, that's to be expected and is essentially impossible to fix. The data in No, problem is that we SAY it's UTF8 even for latin1 encoded strings. This obviously breaks decoding process and standard compliant systems ignore whole encoded word. We simply can't use MIME-Q encoding in Encode.pm when in non-UTF8 mode as that will make this decision for us, implicitly. This regressed whenever we started to encode headers with Encode.pm. But I agree that we should handle that regression in some other bug and just note for this bug that non-UTF8 mode still mangless multi-byte characters.

Attachment #274755 - Flags: review- → review+

Frédéric Buclin

Comment 20

•

18 years ago

a=me for 3.0.1 based on my discussion with wicked on IRC. Please land this patch asap.

Flags: approval3.0? → approval3.0+

Marc Schumann [:Wurblzap]

Reporter

Comment 21

•

18 years ago

Branch: Checking in Bugzilla/Mailer.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Mailer.pm,v <-- Mailer.pm new revision: 1.7.2.4; previous revision: 1.7.2.3 done

Status: ASSIGNED → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED