Closed Bug 506927 Opened 15 years ago Closed 8 years ago

utf-8 subject decoding error (" "=space is inserted in RFC2047 encoded word of Quoted-printable)

Categories

(Thunderbird :: Mail Window Front End, defect)

x86
Windows Vista
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: raoul, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: testcase, Whiteboard: dupme)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.2pre) Gecko/20090727 Shredder/3.0b4pre ID:20090727034746

the following subject is not decoded:

Subject: =?utf-8?Q?Heute=20ist=20M=C3=A4=C3=A4=C3=A4hhday!=20Nutze=20Deine=20Chance=20und=20geh=20auf=20Schn= =C3=A4ppchenjagd!?=


i will attach the email source.

Reproducible: Always
Attached image screenshot
Attached file mail source
i think that the subject might have some errors in it. nevertheless, another 
email client i use (roundcube 0.2.x) does a better job in decoding this subject:

> Heute ist Määähhday! Nutze Deine Chance und geh auf Schn= äppchenjagd!

maybe thunderbird should also try to decode as much as possible.

cheers,
raoul
looks like a duplicate of bug 493544, what do you think, Raoul?
Whiteboard: dupme
Version: unspecified → Trunk
i came across that bug too, but got lost in WADA's "code" and then thought (and still think) that this is not a multi-line issue.

trying to understand the comments in bug 493544 though, i think that this bug can/might be resolved together with bug 493544.
(In reply to comment #2)
> mail source

> Subject: =?utf-8?Q?Heute=20ist=20M=C3=A4=C3=A4=C3=A4hhday!=20Nutze=20Deine=20Chance=20und=20geh=20auf=20Schn= =C3=A4ppchenjagd!?=

This is definitely not a DUP of bug 493544. The header is completely corrupted: a space has been inserted into an RFC2047 encoded word by the mail sender or a mail server.
What software generated such a header? Why is such software used as a mailer that sends mail to external recipients, or to manage mail data at a mail server? Is this spam or phishing mail?

A quirk is possible in this case: remove the white space character after the [CRLF] used for header folding. But that means ignoring the most basic rule of mail, RFC822/RFC2822 (i.e. not respecting RFC822/RFC2822).
Another workaround for this case: introduce an "ignore the white space character used for header folding" bug for compatibility with bad software.
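
For reference, RFC 2047 limits an encoded-word to 75 characters and forbids SPACE and "?" inside its encoded-text, which is why the header above cannot be parsed strictly. A minimal Python sketch of such a strict check follows. The names `STRICT_ENCODED_WORD` and `encoded_word_problems` are made up for this illustration and are not Thunderbird code.

```python
import re

# Hypothetical strict pattern for one RFC 2047 encoded-word:
# =?charset?encoding?encoded-text?= where encoded-text is printable
# ASCII without "?" or SPACE.
STRICT_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[!->@-~]+\?=")


def encoded_word_problems(token: str) -> list:
    """Return a list of reasons why token is not a valid encoded-word."""
    problems = []
    if any(ch in token for ch in " \t\r\n"):
        problems.append("whitespace inside encoded-word")
    if len(token) > 75:
        problems.append("longer than 75 characters")
    if not problems and STRICT_ENCODED_WORD.fullmatch(token) is None:
        problems.append("not a well-formed encoded-word")
    return problems


# The Subject value from this report, including the stray space.
reported = "=?utf-8?Q?Heute=20ist=20M=C3=A4=C3=A4=C3=A4hhday!=20Nutze=20Deine=20Chance=20und=20geh=20auf=20Schn= =C3=A4ppchenjagd!?="

print(encoded_word_problems(reported))
print(encoded_word_problems("=?utf-8?Q?Schn=C3=A4ppchenjagd!?="))
```

The reported value fails on both counts (embedded space and length), while a short, well-formed encoded-word yields an empty list.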
another fact pointing to a broken email script is that the first 
part of the subject is exactly 100 bytes in size:

 =?utf-8?Q?Heute=20ist=20M=C3=A4=C3=A4=C3=A4hhday!=20Nutze=20Deine=20Chance=20und=20geh=20auf=20Schn=

suggesting a wrong construction of a folded multiple-line subject.

(un)fortunately, i am not familiar enough with (broken) email clients to judge whether a quirk would do any good in this special case.

anyways, i'll also report to the creator of this very email.
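
The 100-byte observation above is easy to double-check. This small Python sketch counts the bytes of the first part of the Subject value (everything before the stray space) and compares the result with the 75-character limit RFC 2047 sets for one encoded-word. The variable names are only illustrative.

```python
# First part of the reported Subject value, up to the stray space.
first_part = "=?utf-8?Q?Heute=20ist=20M=C3=A4=C3=A4=C3=A4hhday!=20Nutze=20Deine=20Chance=20und=20geh=20auf=20Schn="

# The value is pure ASCII, so characters and bytes are counted alike.
size = len(first_part.encode("ascii"))
over_limit = size - 75

print(size, "bytes,", over_limit, "more than one encoded-word may have")
```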
Keywords: testcase
(In reply to comment #7)
> another fact pointing to a broken email script is the fact that the first 
> part of the subject is exactly 100bytes in size:

"Quirks is possible" in my comment #6 was incorrect. It looks like the following happened:
  Software at the mail server tried to split the long Subject: header,
  and inserted a space for folding without regard for the RFC 2047 encoding.
  And it somehow failed to insert [CRLF] before the space inserted for folding.
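
For comparison, here is a hedged Python sketch of how a long subject could be folded correctly: the text is split into several short encoded-words, each closed with "?=" before the line break, so the folding white space never lands inside an encoded-word. It assumes the intended subject was "Schnäppchenjagd" without the stray "= ". `q_encode_char` and `fold_subject` are hypothetical helpers written for this example, not the code of any real mailer.

```python
from email.header import decode_header, make_header


def q_encode_char(ch: str) -> str:
    """Q-encode one character as UTF-8, keeping only conservative safe ASCII."""
    if ch.isascii() and (ch.isalnum() or ch in "!*+-/"):
        return ch
    return "".join(f"={byte:02X}" for byte in ch.encode("utf-8"))


def fold_subject(text: str, line_limit: int = 76) -> str:
    """Build a folded Subject: header made of several short encoded-words."""
    prefix, suffix = "=?utf-8?Q?", "?="
    lead, current, lines = "Subject: ", "", []
    for ch in text:
        piece = q_encode_char(ch)
        used = len(lead) + len(prefix) + len(current) + len(suffix)
        if current and used + len(piece) > line_limit:
            # Close the current encoded-word; the next one starts on a
            # continuation line, i.e. after CRLF plus one space.
            lines.append(lead + prefix + current + suffix)
            lead, current = " ", ""
        current += piece
    if current:
        lines.append(lead + prefix + current + suffix)
    return "\r\n".join(lines) if lines else lead.rstrip()


# Assumed intended subject (the stray "= " removed).
subject = "Heute ist Määähhday! Nutze Deine Chance und geh auf Schnäppchenjagd!"
folded = fold_subject(subject)
print(folded)

# Round trip through the standard library decoder as a sanity check.
value = folded[len("Subject: "):]
print(str(make_header(decode_header(value))) == subject)
```

Splitting on character boundaries keeps each multi-byte UTF-8 sequence inside one encoded-word, and every physical line stays within 76 characters.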
Summary: utf-8 subject decoding error → utf-8 subject decoding error (" "=space is inserted in RFC2047 encoded word of Quoted-printable)
I just updated to Thunderbird 3.0RC1, and noticed the subject error with a mailing list I'm on.  It displays in forwarded e-mail, and in the message pane and headers as a single question mark on a black background (see below). 

Subject: �
Date: Mon, 30 Nov 2009 08:05:14 -0800
From: The Christian Science Monitor <daily@csmonitor.com>
Reply-To: daily@csmonitor.com

Today's News: November 30, 2009

---------------------------------------------------
However, when I look at the source, it appears that they are trying to "encode" the subject as utf-8.  I pasted this quoted into an e-mail, so it has an extra ">" at the beginning of each line.

> Subject: =?utf-8?Q?Three=20questions=20Obama=20must=20answer=20in=20Afghanistan=20speech,=20Christmas=20cactuses=20=96=20what=20you=20need=20to=20know?=
> Date: Mon, 30 Nov 2009 08:05:14 -0800
> X-Delivery: Custom 31161
> Reply-To: daily@csmonitor.com
> X-Complaints-To: abuse@elabs5.com
> Message-Id: <20091130160613.35CB52DFA553@elabs5.com>
> Content-Type: text/plain; charset="utf-8"
> From: "=?utf-8?Q?The=20Christian=20Science=20Monitor?=" <daily@csmonitor.com>

Anyway, I think it may be the same problem as described above.
(In reply to comment #10)
> Subject: =?utf-8?Q?Three=20questions=20Obama=20must=20answer=20in=20Afghanistan=20speech,=20Christmas=20cactuses=20=96=20what=20you=20need=20to=20know?=

(i)  The encoded word is longer than RFC2047 allows.
(ii) cactuses=20=96=20what exists in the long encoded word.
     =96 (0x96) is invalid data in UTF-8.

If cactuses=20=96=20what is replaced by cactuses=20Z=20what (=96 -> Z), Tb decodes the Subject: as expected. Tb appears to be tolerant of (i).
> Three questions Obama must answer in Afghanistan speech, Christmas cactuses Z what you need to know

If windows-1252 is properly set,
> Subject: =?windows-1252?Q?Three=20questions=20Obama=20must=20answer=20in=20Afghanistan=20speech,=20Christmas=20cactuses=20=96=20what=20you=20need=20to=20know?=
the subject is displayed as follows by Tb 3.
> Three questions Obama must answer in Afghanistan speech, Christmas cactuses – what you need to know 

> Anyway, I think it may be the same problem as described above.

Your case is completely different from this bug, even though =?utf-8?Q? is common to both. Dan Schwartz, if you believe a quirk in Tb is mandatory for your case, please open a separate enhancement bug. In your case, I think "ignoring only =96 and what follows it" or "displaying U+FFFD for some bytes around =96" is possible, provided the quirk poses no security risk.
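
The byte-level point can be illustrated with a short Python sketch: 0x96 on its own is not valid UTF-8, a replacing decoder turns it into U+FFFD, and windows-1252 maps it to an en dash. `decode_with_fallback` is a hypothetical helper showing one possible quirk; it is not what Thunderbird does.

```python
def decode_with_fallback(raw: bytes, declared: str = "utf-8",
                         fallback: str = "windows-1252") -> str:
    """Try the declared charset first; on failure, guess the fallback charset."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        # Assumption for this sketch: the sender really meant windows-1252.
        return raw.decode(fallback, errors="replace")


# The bytes behind "cactuses=20=96=20what" after Q-decoding.
raw = b"cactuses \x96 what"

as_replacement = raw.decode("utf-8", errors="replace")  # 0x96 becomes U+FFFD
as_fallback = decode_with_fallback(raw)                 # 0x96 becomes U+2013

print(as_replacement)
print(as_fallback)
```

Guessing a fallback charset can misread mail that is broken in other ways, so the replacement-character route is the more conservative of the two.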
(In reply to comment #11)

Thanks for seeing that.  I just noticed the same behavior in TB 2.23, and today's newsletter came through fine.  Clearly an incident of upgrading and seeing some unexpected results due to a source error.
Ping.
Blocks: RFC2047
Will this problem be worked on further? In Thunderbird 13.0.1 we still face the same kind of problem with:

Subject: =?utf-8?B?44CQTklTU0FO5pyI5YiKNDDmnJ/jgJHmo67lkbzlkLjkuIDlpI/vvIzlhrfmsKPpm6jlraPlhY3osrvmqqLpqZc===?= 

Thank you and Regards.
=?utf-8?Q?Heute=20ist=20M=C3=A4=C3=A4=C3=A4hhday!=20Nutze=20Deine=20Chance=20und=20geh=20auf=20Schn= =C3=A4ppchenjagd!?=

is invalid as can be seen here: http://www.jorgk.com/decode (using the PHP decoder). Result is:
Heute ist Määähhday! Nutze Deine Chance und geh auf Schn= äppchenjagd!

That is what TB shows in the thread pane and will also show in the message pane once bug 1146099 has landed.
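
The lenient result quoted above can be reproduced with a few lines of Python. This is a hand-written sketch of a forgiving Q decoder, not the PHP decoder from the linked page and not Thunderbird's implementation. `ENCODED_WORD` and `decode_q_lenient` are names invented for this example.

```python
import re

# Lenient pattern: unlike RFC 2047, it allows spaces inside the payload.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Qq]\?([^?]*)\?=")


def decode_q_lenient(header_value: str) -> str:
    """Decode a single Q encoded-word, passing malformed parts through as-is."""
    match = ENCODED_WORD.fullmatch(header_value.strip())
    if match is None:
        return header_value
    charset, payload = match.groups()
    payload = payload.replace("_", " ")
    # Turn each =XX escape into the byte it stands for; a lone "=" that is
    # not followed by two hex digits is left untouched.
    raw = re.sub(r"=([0-9A-Fa-f]{2})",
                 lambda m: chr(int(m.group(1), 16)), payload)
    return raw.encode("latin-1", errors="replace").decode(charset, errors="replace")


reported = "=?utf-8?Q?Heute=20ist=20M=C3=A4=C3=A4=C3=A4hhday!=20Nutze=20Deine=20Chance=20und=20geh=20auf=20Schn= =C3=A4ppchenjagd!?="
decoded = decode_q_lenient(reported)
print(decoded)
```

The stray "= " survives in the output because it is not a valid =XX escape, which matches the text shown in the thread pane.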

Re. comment #15:
=?utf-8?B?44CQTklTU0FO5pyI5YiKNDDmnJ/jgJHmo67lkbzlkLjkuIDlpI/vvIzlhrfmsKPpm6jlraPlhY3osrvmqqLpqZc===?= 
works now and shows:
【NISSAN月刊40期】森呼吸一夏，冷氣雨季免費檢驗
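
One detail of the comment #15 header: its Base64 payload ends in "===", although 87 data characters need exactly one "=" of padding. The Python sketch below normalizes the padding before decoding. `decode_b_payload` is a hypothetical helper for illustration, and whether the surplus padding is what tripped Thunderbird 13 is not established in this bug.

```python
import base64


def decode_b_payload(payload: str, charset: str = "utf-8") -> str:
    """Decode a B-encoded payload after normalizing its '=' padding."""
    data = payload.rstrip("=")
    # Base64 input must be a multiple of 4 characters long.
    data += "=" * (-len(data) % 4)
    return base64.b64decode(data).decode(charset, errors="replace")


# Payload of the encoded-word from comment #15, surplus padding included.
payload = "44CQTklTU0FO5pyI5YiKNDDmnJ/jgJHmo67lkbzlkLjkuIDlpI/vvIzlhrfmsKPpm6jlraPlhY3osrvmqqLpqZc==="
subject = decode_b_payload(payload)
print(subject)
```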
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID