Closed Bug 1454257 Opened 6 years ago Closed 6 years ago

TB not treating folding with empty continuation line consistently, some message headers not recognised after first line with only white-space

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(thunderbird60 fixed, thunderbird61 fixed)

RESOLVED FIXED
Thunderbird 61.0
Tracking Status
thunderbird60 --- fixed
thunderbird61 --- fixed

People

(Reporter: jpolblanc, Assigned: jorgk-bmo)

References

Details

Attachments

(2 files, 11 obsolete files)

216 bytes, message/rfc822
Details
4.09 KB, patch
aceman
: review+
Details | Diff | Splinter Review
Attached image filtre impots.jpg (obsolete) —
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36

Steps to reproduce:

I recently received an important email from the tax administration. In this email the address of the sender includes diacritics: it is :
"Direction Générale des Finances Publiques  <ne-pas-repondre@dgfip.finances.gouv.fr>". 
My Filter for this type of message is the following (see screenshot attached):
condition : From --> is ended by --> @dgfip.finances.gouv.fr
action : move message to --> mailbox "tax" on local folders
Btw I have 300 other similar filters that work fine (but I presume all with sender names without diacritics).
My version of Thunderbird is 52.7.0 on Mac OS X 10.11.3. Thunderbird is saying itself "uptodate".



Actual results:

The diacritics are displayed well but the filtering system (automatic sorting according to the email address) does not work.
Result: the message did not put in the mailbox "tax" I had planned for emails from taxes and suddenly I took more than ten days to see this message (which was quite important). I guess I'm not the only one in this case.


Expected results:

The message should be moved to the correct mailbox "tax". 

I add that by looking at the source code of the message, the name and address of the sender appear as follows:

From: =?UTF-8?Q?Direction_G=c3=a9n=c3=a9rale_des_Finances_Publiques?=<ne-pas-repondre@dgfip.finances.gouv.fr>

which seems to me quite normal and correct. Besides, it is perfectly decoded in the cartridge above the message itself.
I created myself a message with that address and set up a filter to tag the message. With the message stored in a local folder, it's all working for me in TB 52 and our Daily version TB 61.

Where is the stored when it is received? In an IMAP folder or a local folder via POP? And the target folder is IMAP or a local folder?
You can see on this screenshot two types of messages :
1) messages with the correspondent "ne pas repondre@dgfip.finances.gouv.fr"
These messages work without any problem.
2) messages with the correspondent "-" (just an hyphen). These ones are the ones that don't work.
Hello Jorg,

Thanks for your test and answer.

It's received in a local folder via POP, and the target folder is a local folder too.
You said that you created yourself a message with that address. Did you include the text with the diacritics ?

Btw I've remarked something more : In my mail box the "bad" messages appear with just an hyphen in the correspondent column. See the screenshot I've just attached.

So TB doesn't recognise correctly the From: field. I'll try to copy the headers of the message in my next comment.
From - Sat Apr 14 14:03:24 2018
X-Account-Key: account2
X-UIDL: 408628.OispcFzMd7Nvbj,DnLSDTNR6uak=
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:                                                                                 
Return-Path: ne-pas-repondre@dgfip.finances.gouv.fr
Received: from zimbra8-e1.priv.proxad.net (LHLO zimbra8-e1.priv.proxad.net)
 (172.20.243.158) by zimbra8-e1.priv.proxad.net with LMTP; Sat, 14 Apr 2018
 14:01:31 +0200 (CEST)
Received: from pf2pusi003.dgfip.finances.gouv.fr (mx22-g26.priv.proxad.net [172.20.243.92])
	by zimbra8-e1.priv.proxad.net (Postfix) with ESMTP id 14F032955AA
	for <jpolblanc@free.fr>; Sat, 14 Apr 2018 14:01:31 +0200 (CEST)
Received: from pf2pusi003.dgfip.finances.gouv.fr ([145.242.11.105])
	by mx1-g20.free.fr (MXproxy) for jpolblanc@free.fr;
	Sat, 14 Apr 2018 14:01:31 +0200 (CEST)
X-ProXaD-SC: state=HAM score=0
X-ProXaD-Cause: (null)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
	d=dgfip.finances.gouv.fr; s=default; t=1523707290;
	bh=Yt20F7G+tdBKUBrm/2TPSJ5LYwpAJ4XrQSjWL0y93Nw=;
	h=Date:Subject:To:From;
	b=mTY6MyDsNe/WGpxwQgsD8szOSAJL7YI1DWrgfxBxQufunyBy46nw9dveZ49/8OXaO
	 AEV/aARjsUDjuc8ZNXeI5mNT+rq7DO7X2eCyFYaEFmcKUrOq66v4FSnB0nkt2xQUMI
	 6dhF2mJYGTBQcPmKmOmGBwMKCkbOsmuLXwOVOYVw9je8kXvJ16tkhTmWx+t/xSFpWO
	 hklnQNFaKvCUnw876fnddMuTyKoYX2BEFGWpj+OZkoomY9j+kox97zTz1pWDNGJnWJ
	 PtOWBe1whDCkfAAZmt2H7aAzFLddYob+GAdvLSeu0O/KsoaNFhK4LuNWQX2oYGBjix
	 KDH2nds8atHJg==
MIME-Version: 1.0
Content-Transfer-Encoding: binary
Content-Type: multipart/alternative; boundary="_----------=_15237072904155349794"
Date: Sat, 14 Apr 2018 14:01:30 +0200
Subject: =?UTF-8?Q?Ouverture_du_service_de_d=c3=a9claration_en_ligne?=
To: jpolblanc@free.fr
 
From: =?UTF-8?Q?Direction_G=c3=a9n=c3=a9rale_des_Finances_Publiques?=<ne-pas-repondre@dgfip.finances.gouv.fr>
X-Mailer: MIME::Lite::HTML 1.23
Message-Id: <20180414120130.E330A824EC@pf2pusi003.dgfip.finances.gouv.fr>

This is a multi-part message in MIME format.

--_----------=_15237072904155349794
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Date: Sat, 14 Apr 2018 14:01:30 +0200

Ouverture du service de d=E9claration en ligne
[...]
If the message is really as you have pasted here, it is corrupt.

There *must* not be an empty line between headers since an empty line signifies the end of the headers. Is there really an empty line between To: and From:? That would explain why From: is not recognised.

The other thing I don't understand is the message body: d=E9claration. =E9 is not a valid UTF-8 character, hence the body isn't displayed properly.

How about you save one of those messages as an .eml file and attach it here. Of course you can edit the file in a text editor before and remove personal information.

I suggest you contact the French tax authorities and ask them not to waste tax payers' money by sending out e-mail which violates international standards :-(
Alfred, another "standards" issue. Should the empty line which appears to contain a space be treated as header folding. Looks like in the thread pane the From: isn't shown but the in the header pane it is shown. So it appears that two different parts of the system treat these headers differently.
Flags: needinfo?(infofrommozilla)
Summary: bad filtering when the sender name has diacritics (accented characters in French) - problem with UTF8 I presume → Headers in message headers not recognised after first empty line (or line with only white-space)
I've tried to do the same as you : modify one of my account so it has a similar text in its address and send me an email. No problem in this case, as for you : the filter works fine. 
So I've carefully examined the sources of the two messages: the bad one from the tax admin, the good one from my modified address... And I think I've found something. 
Bellow are the two From: lines.

The first one is :
From: =?UTF-8?Q?Direction_G=c3=a9n=c3=a9rale_des_Finances_Publiques?=<ne-pas-repondre@dgfip.finances.gouv.fr>

The second one is : 
From: =?UTF-8?Q?JPaul_G=c3=a9n=c3=a9ral_des_finances_publiques?=
 <jipaulb@free.fr>

As you can see, there is a space character in this second one (so the line is folded), but not in the first one.

I think it's a bug from the sender, but its a pity that TB can't detect it and resolve it. What do you think about that ?

If I knew how to do, I'd try to redirect my bad message to you. I'm sure in this case you could experiment the same as me.

Sincerely,
  JPaul.
So we sent our comments simultaneously.

And I agree with you the message from tax admin is corrupted. I already tried to contact them suggesting that they don't use diacritics. But I'll now ask them what you suggest !..

Thanks a lot

   JPaul.
Why don't you attach the message so we can see what it really is? You can drag the message to the desktop or a folder or use "File > Save as".
(In reply to Jorg K (GMT+1) from comment #6)
> two different parts of the system treat these headers differently.

If you mean message pane vs folder pane - known bug?  (but I'm not finding the one I remember seeing)
They can use diacritics all they like. German Umlaute äöü, Spanish/French accents áóú, Greek, Hebrew, Chinese, Korean, Japanese *all* work if properly encoded and with proper header structure. Trust me, the accented characters are not the problem here.
(In reply to Wayne Mery (:wsmwk) from comment #10)
> If you mean message pane vs folder pane - known bug?  (but I'm not finding
> the one I remember seeing)
I've fixed a few bugs, or so I thought, where those two were treated differently (and I could find them, too, for example bug 1146099), but this is about how headers as a whole are interpreted and what is the assumed end of the headers.

I might prepare a simple example to show the problem.
Simple test mail to demonstrate the problem:

===
Date: Sat, 14 Apr 2018 14:01:30 +0200
Subject: Badly folded headers, one line with space between To and From
To: recipient@free.fr
 
From: sender@also-free.fr

Here's out body
===

There's a line with a space between To: and From:. TB doesn't display the From: on the thread pane but it does show it in the headers pane (the headers display just above the preview).

Apparently filtering on the From: header also doesn't work in this case, no surprise.
Here is the file. Sorry I forgot to attach it.

By the way, what do you mean by "Alfred an other standards issue" ?
Here is the file. Sorry I forgot to attach it.

By the way, what do you mean by "Alfred an other standards issue" ?
Hello Jorg,
Sorry for the double attachment.
And I eventually understood the sentence "Alfred, an other standards issue".

Now three questions : Should I complain to the administration of taxes that they do not meet the standards, or no ?
Is the empty line (with a space) the only problem ?
Or is it also necessary, as I thought, to put a space between the sign = and the sign < of the "From:" header ?
Attachment #8968364 - Attachment mime type: message/rfc822 → text/plain
Attachment #8968365 - Attachment mime type: message/rfc822 → text/plain
Comment on attachment 8968365 [details]
Ouverture du service de déclaration en ligne.eml

Message attached twice, obsoleting second copy.
Attachment #8968365 - Attachment is obsolete: true
(In reply to JiPaulB from comment #15)
> By the way, what do you mean by "Alfred an other standards issue" ?
Alfred is a volunteer you works on bugs related to the standards compliance of TB.

The problem here is that you have a badly folded header. Mail headers do not allow empty lines, but they do allow continuation on another line if the continuation line starts with white-space.

So legal are:

===
Date: Sat, 14 Apr 2018 14:01:30 +0200
Subject: Badly folded headers, one line with space between To and From
To: recipient@free.fr
From: sender@also-free.fr
===

or

===
Date: Sat, 14 Apr 2018 14:01:30 +0200
Subject: Badly folded headers, one line with space between To and From
To: recipient@free.fr
From: here-is-a-long-sender-name
  sender@also-free.fr
===

Illegal is an absolutely empty line between headers:

===
Date: Sat, 14 Apr 2018 14:01:30 +0200
Subject: Badly folded headers, one line with space between To and From
To: recipient@free.fr

From: sender@also-free.fr
===

Here we have an edge case, a line with just a space (which you can't see).

===
Date: Sat, 14 Apr 2018 14:01:30 +0200
Subject: Badly folded headers, one line with space between To and From
To: recipient@free.fr
 
From: sender@also-free.fr
===

Ideally TB would just merge the space into the previous line, but that is not what happens. So we need to see what the standard says about this case.

(In reply to JiPaulB from comment #16)
> Now three questions : Should I complain to the administration of taxes that
> they do not meet the standards, or no ?
Yes. Also, if you view the body as plain text
  View > Message Body As > Plain Text
you see:
  L'ann�e derni�re, vous avez d�clar� vos revenus en ligne et nous vous en remercions.
Clearly these people can't assemble a valid e-mail. Which is funny, since half the French government actually use TB (the other half uses Outlook).

> Is the empty line (with a space) the only problem ?
No, the plain text part is also invalid.

> Or is it also necessary, as I thought, to put a space between the sign = and
> the sign < of the "From:" header ?
That will just add a space between the display name and the e-mail address but not help us here.

In the long term we should fix TB to deal with the situation better, however, we cannot dedicate much time to fixing issues for non-compliant messages. That's why we need Alfred to tell us whether the "folding with empty line" is semi-legal or not. That will adjust the priorities.
Status: UNCONFIRMED → NEW
Component: Filters → MIME
Ever confirmed: true
Product: Thunderbird → MailNews Core
Summary: Headers in message headers not recognised after first empty line (or line with only white-space) → TB not treating folding with empty continuation line consistently, some message headers not recognised after first line with only white-space
Version: 52 Branch → 52
I debugged this a bit. Looks like the information stored in the nsIMsgHdr is already wrong. No surprise since this feeds the thread pane where we see the issue. Most likely the header values are also used in filtering which isn't working as per the original report. I see debug like this for my test message:

=== IsReplyAllEnabled: msgHdr.author=
=== IsReplyAllEnabled: msgHdr.recipients=recipient@free.fr From: sender@also-free.fr

So the task here is to find where the header's author and recipients gets parsed and stored in the database. That is likely to happen when the message is received or imported from file.
Just for the record: A little research shows that nsMsgHdr::SetAuthor() is called in nsParseMailMessageState::FinalizeHeaders(). That's called in nsParseMailMessageState::ParseFolderLine() and just before we have a call to nsParseMailMessageState::ParseHeaders() which is this lovely function that does some home-grown parsing:
https://dxr.mozilla.org/comm-central/rev/5ca608b153140e34720bde4bb237c368ab357756/mailnews/local/src/nsParseMailbox.cpp#929

I'll debug this when I find the time.
There is a clear logic error in this code at
https://dxr.mozilla.org/comm-central/rev/5ca608b153140e34720bde4bb237c368ab357756/mailnews/local/src/nsParseMailbox.cpp#1094
      // eliminate any additional white space
      while (buf < buf_end &&
              (*buf == '\n' || *buf == '\r' || *buf == ' ' || *buf == '\t'))

'buf' points to the line-end preceding the folding space on the continuation line. This code will "eat" a white-space only continuation line including its line-end and glue the next header is finds onto the previous one as observed.

So

To: xx
 <-- single space here
From: yy

becomes To: xx From: yy.

:-(
Flags: needinfo?(infofrommozilla)
Better test case with a lot of "stressful" folding in the subject.
Attachment #8968065 - Attachment is obsolete: true
Attachment #8968186 - Attachment is obsolete: true
Attachment #8968237 - Attachment is obsolete: true
Attachment #8968364 - Attachment is obsolete: true
This will deal with one empty continuation line correctly, but won't get the subject in the test case right yet.
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
This works now. I removed the crazy writeOffset logic which was also faulty.

For the review, keep in mind: (buf - writeOffset) is the new bufWrite.

So that makes
       header->value = value;
-      header->length = buf - header->value - writeOffset;
+      header->length = bufWrite - value;
easy to understand :-)
Attachment #8968912 - Flags: review?(acelists)
Comment on attachment 8968912 [details] [diff] [review]
1454257-handle-empty-continuation.patch (v2)

Review of attachment 8968912 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/local/src/nsParseMailbox.cpp
@@ -1093,4 @@
>  
> -      // eliminate any additional white space
> -      while (buf < buf_end &&
> -              (*buf == '\n' || *buf == '\r' || *buf == ' ' || *buf == '\t'))

This was the fatal flaw. If the line only contains white-space, this will eat into the next non-folded line.

'buf' pointed to CRLF or CR or LF just before the continuation line, so the line end can be ignored in some other way, like I did by simply overwriting it with spaces. We of course don't want to ignore the next line end!

I hope the new code is clearer now and the comments actually explain what it does. Using read and write pointers is of course easier to understand than the crazy offset logic.
In the header pane multiple spaces are compressed, so let's to that in the thread pane, too.
Attachment #8968849 - Attachment is obsolete: true
Attachment #8968852 - Attachment is obsolete: true
Attachment #8968912 - Attachment is obsolete: true
Attachment #8968912 - Flags: review?(acelists)
Added a few lines for white-space compression. Now the attached test message looks identical in the thread and the header pane.
Attachment #8969007 - Flags: review?(acelists)
Attachment #8969003 - Attachment mime type: message/rfc822 → text/plain
Making the test case a little harder still.
Attachment #8969003 - Attachment is obsolete: true
So the new test shows that the code now removes any multiple spaces wherever they happen: Leading, trailing, inter-word.

This matches JS Mime behaviour as can be seen in the header pane (above the preview).
Small improvement: Since we're already not copying subsequent spaces (added in v3), the backtracking can be replaced by a simple check of the last byte written.

Use interdiff to see the evolvement from v2 to v3 to v4.

This simplifies the algorithm, right? ;-)
Attachment #8969007 - Attachment is obsolete: true
Attachment #8969007 - Flags: review?(acelists)
Attachment #8970032 - Flags: review?(acelists)
I've discussed the patch with Aceman on IRC and he suggested some improvements which are included here.
Attachment #8970032 - Attachment is obsolete: true
Attachment #8970032 - Flags: review?(acelists)
Attachment #8970047 - Flags: review?(acelists)
Comment on attachment 8970032 [details] [diff] [review]
1454257-handle-empty-continuation.patch (v4)

Review of attachment 8970032 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, simplifying this with 2 pointers instead of 1 + offset seems OK.

I can't judge whether this code is now correct, but I have tested it with the sample .eml mesage and it seems to work.
I also killed the .msf file of one of my large test folders and let it rebuild with this patch and all seems fine at first glance.

It would be good if the sample message was turned into a test later on.

I suggest we first let this bake on nightly for some time (not pushing it to beta anytime soon).

::: mailnews/local/src/nsParseMailbox.cpp
@@ +1088,5 @@
>      {
> +      // Overwrite the end of line. Regardless of whether we had
> +      // CRLF, CR or LF, we can always nuke two characters.
> +      buf[0] = ' ';
> +      buf[1] = ' ';

I'm unneasy about writing to buf. If we decided the chars at buf are to be skipped let's skip them explicitly, via some buf+2.

@@ +1093,3 @@
>  
> +      // Add a single folding space if necessary.
> +      if (*(bufWrite - 1) != ' ' && *(bufWrite - 1 ) != '\t')

Surplus space after '- 1 '.
Attachment #8970032 - Attachment is obsolete: false
Attachment #8970047 - Flags: review?(acelists) → review+
Attachment #8970032 - Attachment is obsolete: true
(In reply to :aceman from comment #33)
> I suggest we first let this bake on nightly for some time (not pushing it to
> beta anytime soon).
This decision is up to the very capable release manager ;-)
Since Dailies are pretty much unavailable, we don't have that luxury. But if we uplift to beta, we can get feedback soon.
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/e98ab41b1ba1
handle empty continuation line correctly. r=aceman
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 61.0
Attachment #8970047 - Flags: approval-comm-beta+
Blocks: 1456001
Flags: in-testsuite+
OS: Unspecified → All
Hardware: Unspecified → All
I'd prefer this late in the beta cycle which is our prep for esr60 that we be uplifting stability, security issues, and recent regression, and leaving other issues to being uplifted in future esr.
Bug 674247 is a Dupe of this.
Nothing new under the sun ;-)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: