Closed Bug 678322 Opened 13 years ago Closed 6 years ago

Filters not working for headers that appear more than once (except Received: header, which are processed as expected)

Categories

(MailNews Core :: Filters, defect)

defect
Not set
major

Tracking

(thunderbird_esr6063+ fixed, thunderbird63 fixed)

RESOLVED FIXED
Thunderbird 63.0
Tracking Status
thunderbird_esr60 63+ fixed
thunderbird63 --- fixed

People

(Reporter: mozilla_bugzilla, Assigned: arekm)

References

Details

(Keywords: regression, Whiteboard: [good first bug][lang=c++])

Attachments

(3 files, 5 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build ID: 20110615151330

Steps to reproduce:

I upgraded from the latest version in the 3.x-range, to 5.x.


Actual results:

The upgrade went well, but afterwards i noticed some of my filters had stopped working. The filters are used daily, so I discovered the problem quickly after upgrading.

On investigation I noticed the filters that are set on headers that can appear more than once have stopped, others are still functioning normally.

For example:
The Delivered-To header can appear more than once in a single e-mail. I use this header to filter some e-mail to a specific folder. This used to work fine in all other previous versions of Thunderbird, but in version 5.0 it's broken.

The Delivered-To header that matched on previous versions is the last one in the document, for example:
Delivered-To: a@domain.nl
Received: (qmail 27509 invoked by uid 1003); 11 Aug 2011 15:23:32 -0000
Delivered-To: b@domain.nl
Received: (qmail 27505 invoked by uid 1003); 11 Aug 2011 15:23:32 -0000
Delivered-To: c@domain.nl
Received: (qmail 27501 invoked from network); 11 Aug 2011 15:23:32 -0000

I was matching on c@domain.nl.


Expected results:

I was expecting the same behaviour as previous versions, where my rule would match and e-mails would be filtered.

Also, I'm curious how you would prefer this to work:
- Should a rule on a header that appears multiple times, be triggered when any of those headers match? (I believe it should, because I think most users would expect that behaviour)

- What should happen if I create multiple rules in a filter, targetting the same header, with different matches? Like:

IF Delivered-To IS b@domain.nl
AND
IF Delivered-To IS c@domain.nl
THEN
[...]

Of course I would expect this to work if both where true, but is this also taken care of in the code handling filters?
Oh I'm sorry;
My specific version is shown as 5.0 and release notes point to "v. 5.0, released: June 28, 2011" on https://www.mozilla.org/en-US/thunderbird/5.0/releasenotes/?uri=/thunderbird/releasenotes&locale=en-US&version=5.0&os=WINNT&buildid=20110624125636
Severity: normal → major
Component: General → Filters
Keywords: regression
Product: Thunderbird → MailNews Core
QA Contact: general → filters
Martijn, do you experience the problem also with v6 or v7?
v7 beta at http://www.mozilla.org/en-US/thunderbird/channel/
Hi Wayne. I just had the opportunity to install and test on 6.0.1, and the problem persists. Both for filters that should run on retrieving mail, as well as running them manually on a mailbox that contains mails that should have been filtered.

I'm not sure if/when I will have the opportunity to test v7.
(don't trouble yourself to test 7, I doubt it will help)
If this is truly a regression, perhaps it's caused by bug 124641, and not fixed by Bug 655578 - Some emails aren't filtered on list-id anymore (because 655578 if fixed in v5)

FWIW, I don't see any bugs filed since 2011-04-29, when 124641 checked in, that might be regressions.
Hi, same here and from my tests I may shed some light on the problem:

My guess is, that only one occurance of the respective Header is used. Up to V3 it was the last line, now it is the first.
There a good chance your guess is correct. When my filters worked I was matching against the last appearance of that particular header.
I made some additional tests. It seems Cc works so maybe V3 just worked as expected while V5+ only uses first occurance for Delivered-To. I also did some tests with Received which should normally give the same results in my case but if I didn´t mess up anything, it seemed to work!? Can you confirm Martijn?

Btw just in case someone blames us for being the only ones: Many users are probably using To/Cc but if you get messages as Bcc often, that doesn´t work. And I saw users having a combination in their filter and they didn´t notice, that this part didn´t work anymore...
I tested the Received-header and indeed... it works like it should. I did seperate tests for first and last appearance and both filters worked.

So mh, what's the difference between Received and Delivered-To in terms of parsing filters? Is Received a header that comes with any Thunderbird install to filter on? If that is the case then the title for this bug should probably be: "Filters not working for CUSTOM headers that can appear more than once"

Can anyone clarify?
Received is as Delivered-To in my user defined header lines. But Received is no workaround in my case, as Received does not necessarily contain a for ... per line and may contain other addresses like the sender.
Indeed, Received is not a substitute for Delivered-To. My Received lines do not include anything that identify the receiving mailaddress.

So, we need someone to shed some light on what we're experiencing here :)
nsMsgSearchTerm::MatchArbitraryHeader special cases the received header. We could either add the delivered-to header to that special casing, or try to have a general approach for headers that appear multiple times.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I'd vote for the general approach, to match all occurrences of the header. Bug 687350 seem to be the same request as this one.

I am not sure from the comments if the all occurrences matching does not work for all headers (except special cased Received) or it only does not work for custom headers. I will test that later today.
It seems it doesn't work on standard headers too. There probably aren't many that can appear multiple times. I tested on date by manually editing the mbox file to duplicate line with the header and changing the date. I also found Edit->Find->Search messages also has this problem.
I think the semantics of IS and CONTAINS is not clear to me. 

Let's look at an example:

  Delivered-To: foo@domain.com
  Delivered-To: bar@domain.com


What "Delivered-To IS foo@domain.com" mean?

1. If IS means that it is the entire header, then it
   should return false, because there are other headers. 

2. If IS means header contains a string that IS "foo@domain.com", 
   then it should return true.

My guess is, that someone changed the semantics of IS form second meaning to the first one. And I guess it also impacted the meaning of CONTAINS (bug 697096):

I think CONTAINS should always mean, any header of the given type contains the string. 

But there is an ambiguity with CONTAINS. Lets take this example:
What does "Delivered-To CONTAINS bar@domain.com" match?

It clearly matches:
  Delivered-To: foo@domain.com
  Delivered-To: bar@domain.com

But does it match
  Delivered-To: foo@domain.com
  Delivered-To: foobar@domain.com

It "bar@domain.com" is seen as a substring, it matches. If it is seen
one line then it does not match.

I thing there has to be a clear semantics what
  IS, CONTAINS etc mean
and a clear semantics how the pattern is interpreted in the case of CONTAINS....
I agree with this. There should be some spec.
I'd actually say that the matching should be done on each occurrence of the header separately. So that would give behaviour 2. for IS. For CONTAINS, it would match both of your examples.
Blocks: 705587
FYI.
Following is permitted number of headers defined in RFC 5322.
> http://tools.ietf.org/html/rfc5322#section-3.6

(1) Max number = unlimited
>    +----------------+--------+------------+
>    | Field          | Min    | Max number |
>    |                | number |            |
>    +----------------+--------+------------+
>    | trace          | 0      | unlimited  |
>    | resent-date    | 0*     | unlimited* |
>    | resent-from    | 0      | unlimited* |
>    | resent-sender  | 0*     | unlimited* |
>    | resent-to      | 0      | unlimited* |
>    | resent-cc      | 0      | unlimited* |
>    | resent-bcc     | 0      | unlimited* |
>    | resent-msg-id  | 0      | unlimited* |
>    | comments       | 0      | unlimited  |
>    | keywords       | 0      | unlimited  |
>    | optional-field | 0      | unlimited  |
>    +----------------+--------+------------+

(2) Max number = 1
>    +----------------+--------+------------+
>    | Field          | Min    | Max number |
>    |                | number |            |
>    +----------------+--------+------------+
>    | orig-date      | 1      | 1          |
>    | from           | 1      | 1          |
>    | sender         | 0*     | 1          |
>    | reply-to       | 0      | 1          |
>    | to             | 0      | 1          |
>    | cc             | 0      | 1          |
>    | bcc            | 0      | 1          |
>    | message-id     | 0*     | 1          |
>    | in-reply-to    | 0*     | 1          |
>    | references     | 0*     | 1          |
>    | subject        | 0      | 1          |
>    +----------------+--------+------------+

If Max number=unlimited, same treatment as current one on Received: is needed.
However, if max number=1, "multiple headers" means malformed mail, and first header should be used for thread pane display & message pane display, and in message filtering & search. Currently reported bugs for it;
  bug 172104       : Display of multiple Subject:
  (not opened yet) : Display of multiple From: ( bug 310189 comment #7)
  bug 310189       : Filter of multiple Subject:, From:.
probable source of regression, both checked in 2011-04-29:
* Bug 124641 - Filter or Search: does not handle multi-line (wrapped, folded) headers correctly when search term spans lines 
* Bug 178870 - [news filters] implement filter after the fact (make Run Filters on Folder available for newsgroups) 

so 20110429 build should work. 20110430 build should fail
ftp://ftp.mozilla.org/pub/thunderbird/nightly/2011/04
Summary: Filters not working for headers that can appear more than once → Filters not working for headers that can appear more than once (except Received: header. if Received:, multiple headers are processed as expected))
So accumulation should be used for any header that isn't explicitly limited to one?
See isReceivedHeader http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#810
I don't fully understand what is going on in that isReceivedHeader code, but my feeling is that the behavior should be:

IS:  If any one of the headers exactly matches the string.
CONTAINS:  If any one of the headers contains the string.

If you mean by accumulation that the contents of the headers are concatenated before checking, which I *think* is what that code is doing, that would break the IS test.  As a result, only CONTAINS would work on multiple headers.  That would be less than ideal, but MUCH better than the current situation.
(1) Platform -> All? Probably fails everywhere. Fails on WV, Tb 11.0
(2) +1 on Comment 21. Not ideal, but I can live with it.
See 3 years earlier:
https://bugzilla.mozilla.org/show_bug.cgi?id=16913#c101
[Filter news based on any headers]
(In reply to NoOp from comment #23)
> See 3 years earlier:
> https://bugzilla.mozilla.org/show_bug.cgi?id=16913#c101
> [Filter news based on any headers]

How is this related ?
I'm struggling with this bug myself for a custom X-List header. Since it's a non-standard header, it falls under the optional-field specification in RFC 5322 and as such an unlimited number of these should be parsed. It appears that neither "is" nor "contains" works when trying to match in a filter.
Is nobody really interested in actually fixing bugs in Thunderbird anymore? Every damn version since 3 has only been integrating Firefox bugs.
Not likely... this has been around for more than a few years.
@Wayne re Comment 23: read the bug link and you will see that the 2009 comment I made is exactly what is happening here. This bug subject:
Bug 678322 - Filters not working for headers that can appear more than once (except Received: header. if Received:, multiple headers are processed as expected)) 

My comment in 16913:
<quote>
 NoOp 2009-11-07 17:31:48 PST

I see the subject bug is "Filter news based on any headers", however the filters only work on the _first_ such header. For example; if you have multiple 'Reply-To', multiple 'Delivered-To' or 'Original-Received' headers, etc., you are unable to filter on the second (or third) header of the same type. Sample:

Delivered-To: mailing list users@openoffice.org
Delivered-To: moderator for users@openoffice.org

Only the first header is possible, the 'moderator' filter is ignored. This may be related to 
https://bugzilla.mozilla.org/show_bug.cgi?id=387337
[news server filters not working in SM 2.0a1]

Please reopen, or resolve 387337 - thanks.
</quote>
Does this report apply to IMAP configuration too?

Found also that one: Bug 184490 which is a very old one with 41 votes.

How can this kind of bug be kept unsolved? Filtering is a key point especially now with all the social media, mailing lists available.
The context of this bug seems to have flipped.  I had a bunch of filters setup (and working) which looked at Delivered-To headers and moved messages into appropriate folders.  After upgrading from TB 23 to TB 24.0.1 those filters stopped working.  I found that the filters only seem to look at the last (earliest) Delivered-To header now.  I suspect (but can't test) that only the first (latest) Delivered-To header was being inspected previously.

I want to request the functionality that was detailed in comment 21 above, specifically for custom headers:

IS:  If any one of the headers exactly matches the string.
CONTAINS:  If any one of the headers contains the string.
So, any news about this yet?
TB 24.5.0 has the same problem, namely custom headers not being matched if repeated.
This still appears to be an issue with TB 31.2.0

Example: detection of emails from Outlook using the Content-Type tag and searching for application/ms-tnef.  This tag occurs three times in the file, so matching on 'CONTAINS' should test all instances of this tag.

Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_0023_01D0089B.1137FF60"
Content-Type: text/plain;
	charset="utf-8"
Content-Type: application/ms-tnef;
	name="winmail.dat"
(In reply to Richard M from comment #32)
> Content-Type: multipart/mixed;	boundary="----=_NextPart_000_0023_01D0089B.1137FF60"
> Content-Type: text/plain; charset="utf-8"
> Content-Type: application/ms-tnef; name="winmail.dat"

Your case is different from this bug.
This bu is for:
   Mail = HeaderPart + One_Null_Line + MailPayloadPart.
   "Messge header" in context of  "message filter" is : message header in HeaderPart.
         i.e. If POP3, messaage headers which cn be obtained by "TOP 0".
               If imap,  message headers which can be obtained by "uid fetch nn Body[Heders]".
         i.e. "message header in sub parts of multipart", which is content of MailPayloadPart, is not involved.
   "Messge header" in this bug is : message header in HeaderPart of "1 < max number ".
        For example, Received:, Resent-To:.
.  Currently, Tb scans multiple Received: only. Tb doesn't scan multiple header lines such as Resent-To:

"Content-Type: header you referred" is content of MailPayloadPart in above descrition.
To support it, both of following is needed.
   - Extend definition of "message header for message filter" to "message header in all sub parts under multipart"
      including "sub parts in nested multipart".
   - Scan multiple header lines in all sub parts, for example, Content-Type:, as done on Received: currently. 
And, you maybe request "message filter for message header in  data of message/rfc822 part(==attached mail). 
And, you maybe request "message filter for message header after close boundary".
Mozilla/5.0 (X11; Linux x86_64; rv:37.0) Gecko/20100101 Thunderbird/37.0a2 ID:20150216004005 CSet: 8f81d26fca7d

I have filters set up to tag a mail with a custom tag if the "Authentication-Results" header field contains specific strings (such as google.com or dkim=pass).  For any email where there is a single line in the headers for "Authentication-Results" then the filter works.  For some emails there are two lines in the headers starting with "Authentication-Results", and the filter fails to work.  The lines where it fails are typically similar to:

Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of xxxx@yyyyyyyy.com designates XX.ZZ.YY.JJJ as permitted sender) smtp.mail=user@yyyyyyyy.com;
       dkim=pass header.i=@yyyyyyyy.com;
       dmarc=pass (p=NONE dis=NONE) header.from=yyyyyyyy.com
Received: from xxx.yyy.yyyyyyyy.com (xxx.yyy.yyyyyyyy.com [10.164.28.53])
	by zz.yyyyyyyy.com (Postfix) with ESMTP id 9E7A8200088
	for <mike.cloaked@gmail.com>; Mon, 16 Feb 2015 17:32:03 -0500 (EST)
Received: from zzz.yyyyyyyy.com (xxx.yyyyyyyy.com [10.164.28.115])
	by xxx.yyyyyyyy.com (Postfix) with ESMTPSA id 4EF67420420
	for <mike.cloaked@gmail.com>; Mon, 16 Feb 2015 17:32:03 -0500 (EST)
Authentication-Results: xxx.yyy.yyyyyyyy.com; dmarc=none header.from=yyyyyyyy.com

In all cases where there is only the first Authentication-Results line but not the last one then the filter works fine.
I notice that the Assigned to: field for this bug is designated as assigned to "Nobody" - does this mean that there is nobody working on this at all?  The problem is, for those hitting this bug, a considerable frustration. It would be really nice to see this worked on.
Is there any diagnostics that I can provide that will progress this bug or triage it?  It would be so good if this could have a solution.
I think it's well understood, just need someone to work on it. 
Should probably just do the same as for the Received header - http://hg.mozilla.org/comm-central/annotate/fbb26b3fc314/mailnews/base/search/src/nsMsgSearchTerm.cpp#l786 for every header that isn't explicitly allowed only once per rfc 5322.
OS: Windows 7 → All
Hardware: x86_64 → All
Whiteboard: [good first bug][lang=c++]
Summary: Filters not working for headers that can appear more than once (except Received: header. if Received:, multiple headers are processed as expected)) → Filters not working for headers that appear more than once (except Received: header, which are processed as expected)
Just noticed that some of my filters using "Delivered-To" "contains" as a criterion (in a "Match any of the following") list are not working, probably for the reason cited in this bug ("Delivered-To" appears more than once in the headers).  Generally, I want my filters to react to the first appearance, reading from the top (the last appearance chronologically), and that's not happening.

I'm using Thunderbird version 38.5.0, which has been very stable for me.  
Has this bug been fixed by 45.6.0?  Please let me know.  If so, I'll take the risk of updating.
Thanks.
(In reply to D Holzmman from comment #38)
> Just noticed that some of my filters using "Delivered-To" "contains" as a
> criterion (in a "Match any of the following") list are not working, probably
> for the reason cited in this bug ("Delivered-To" appears more than once in
> the headers).  Generally, I want my filters to react to the first
> appearance, reading from the top (the last appearance chronologically), and
> that's not happening.
> 
> I'm using Thunderbird version 38.5.0, which has been very stable for me.  
> Has this bug been fixed by 45.6.0?  Please let me know.  If so, I'll take
> the risk of updating.
> Thanks.

") list" should be " list)".
sorry for the misplaced parenthesis.
This is still an issue in 52.7.0, in particular as described in comment 29.

This bug has prevented me from updating Thunderbird for over 7 years now, being stuck on 3.1.20 for all that time. Due to more and more SSL connections failing on Thunderbird 3.x, I am now forced to update to a recent version, and deal with this ancient filter bug. Please, somebody, implement the behavior described in comments 21 and 29, and fix this!
And in 60.0.
Here is my patch trying to address the regression.

First problem was

msg->GetStringProperty(m_arbitraryHeader.get(), getter_Copies(dbHdrValue));
return MatchRfc2047String(dbHdrValue, charset, charsetOverride, pResult);

GetStringProperty will access only single header even if header occurs multiple times with different values. Thus all other ocurrences of the header were ignored.
Now we use that only if we get match, otherwise we process all headers line by line.

Second problem was that the rest of code only waited for first header occurence. Now we process all headers one by one (while still leaving "Received" processing).

Works for me on version 60+patch on Linux.
Attachment #9003138 - Flags: review?(jorgk)
Why leave the Received special cased?

Can you add a testcase?
Then run it, e.g.

  ./mach xpcshell-test comm/mailnews/base/test/unit/test_search.js
Assignee: nobody → arekm
Status: NEW → ASSIGNED
Comment on attachment 9003138 [details] [diff] [review]
process all headers if the same header exists multiple times with some values

Review of attachment 9003138 [details] [diff] [review]:
-----------------------------------------------------------------

Can you please supply a patch in the right format applicable to the comm-central repository. A straight diff on comm/mailnews/base/search/src/nsMsgSearchTerm.cpp.org is not correct. Also a HG header is missing. As Magnus said, a test would be good.

::: comm/mailnews/base/search/src/nsMsgSearchTerm.cpp.org
@@ +766,5 @@
>    nsCString dbHdrValue;
>    msg->GetStringProperty(m_arbitraryHeader.get(), getter_Copies(dbHdrValue));
> +  if (!dbHdrValue.IsEmpty()) {
> +    // match value with the other info but it doesn't check all header occurences
> +    // if the same header exists multiple times with different values, so we use it only if we match

Nit: Please turn this into proper English sentences with sentence case and full stop.

@@ +813,5 @@
>          {
>            searchingHeaders = false;   // then stop examining the headers
>            result = stringMatches;
> +        } else if (!isContinuationHeader && !isReceivedHeader) {
> +          // prepare for new header

Nit: Start with sentence case and add a fullstop.
Attachment #9003138 - Flags: review?(jorgk)
Looking at the history of the bug, Magnus are Aceman would be the better reviewers here.
Updated version.

Simplified.

Dropped special Received handling (it was introduced in https://bugzilla.mozilla.org/show_bug.cgi?id=124641#c99) since makes no sense now as we search all headers.

Added tests.
Attachment #9003138 - Attachment is obsolete: true
Attachment #9003436 - Flags: review?(jorgk)
JorgK, ok, will do (didn't notice new comment).
Attachment #9003436 - Flags: review?(jorgk)
Updated comments (hopefully in "proper" English). HG patch format.
Attachment #9003436 - Attachment is obsolete: true
Attachment #9003480 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9003480 [details] [diff] [review]
process all headers if the same header exists multiple times with some values

Review of attachment 9003480 [details] [diff] [review]:
-----------------------------------------------------------------

The function you're changing is spaghetti code and already not working in corner cases. Without any other changes, remove the e-mail body from bugmail12 which is valid. The test will now fail.

IMHO it would be good to rewrite it completely to decouple the matching from the collecting of the full header with all continuation lines. I'm not even sure it will work when matching 'ends with' when not all continuation lines have been accumulated.

On the plus side, this seems to be working and the test passes even if you split the lines and use continuation lines as I suggested.

I guess I can take further reviews of this since I have already spent a considerable amount of time looking at this now, unless Magnus prefers to do it.

::: mailnews/base/search/src/nsMsgSearchTerm.cpp
@@ +767,5 @@
>    msg->GetStringProperty(m_arbitraryHeader.get(), getter_Copies(dbHdrValue));
> +  if (!dbHdrValue.IsEmpty()) {
> +    // Match value with the other info. It doesn't check all header occurences,
> +    // so we use it only if we match and do line by line headers parsing otherwise.
> +    nsresult rfc_rv = MatchRfc2047String(dbHdrValue, charset, charsetOverride, pResult);

Please use rv, no need for another variable. You can reinitialise to NS_OK.

@@ +768,5 @@
> +  if (!dbHdrValue.IsEmpty()) {
> +    // Match value with the other info. It doesn't check all header occurences,
> +    // so we use it only if we match and do line by line headers parsing otherwise.
> +    nsresult rfc_rv = MatchRfc2047String(dbHdrValue, charset, charsetOverride, pResult);
> +    if (*pResult == true)

if (*pResult)

@@ +775,3 @@
>  
>    nsMsgBodyHandler * bodyHandler =
>      new nsMsgBodyHandler (scope, length, msg, db, headers, headersSize,

Please remove the |NS_ENSURE_TRUE(bodyHandler, NS_ERROR_OUT_OF_MEMORY);|. 'new' is infallible, if the object cannot be allocated, control will not return to the caller.

@@ +793,4 @@
>      bool isContinuationHeader = searchingHeaders ?
>        NS_IsAsciiWhitespace(buf.CharAt(0)) : false;
>  
> +    // We try to match the header from all times through the loop, which should now

Nit: "from all times" isn't real good English. You mean "all iterations"?

@@ +793,5 @@
>      bool isContinuationHeader = searchingHeaders ?
>        NS_IsAsciiWhitespace(buf.CharAt(0)) : false;
>  
> +    // We try to match the header from all times through the loop, which should now
> +    //  have accumulated over possible multiple lines.

Nit: Only one space after //.

@@ -794,5 @@
>  
> -    // We try to match the header from the last time through the loop, which should now
> -    //  have accumulated over possible multiple lines. For all headers except received,
> -    //  we process a single accumulation, but process accumulated received at the end.
> -    if (!searchingHeaders || (!isContinuationHeader &&

Why is it OK to remove '!searchingHeaders ||'. If you come to the end of the message, searchingHeaders is set to false, so the time has come to inspect the accumulated values. That's how I read it anyway.

@@ +807,2 @@
>        }
> +      // Prepare for new header.

Clearer: // Prepare for repeated header of the same type.

@@ +809,5 @@
> +      headerFullValue.Truncate();
> +    }
> +
> +    // We got result or finished processing all lines.
> +    if (!searchingHeaders)

I can't really say that I fully understand the logic. Is it possible that we match 'ends with' without considering all continuations?

Like:
Custom: this is
 a line
 and another on.

So will this code match 'ends with' on "a line" since it's not collecting the last continuation?

::: mailnews/test/data/bugmail12
@@ +55,5 @@
>  X-Bugzilla-Target-Milestone: ---
>  X-Bugzilla-Changed-Fields: Blocks
> +X-Duplicated-Header: one value
> +X-Duplicated-Header: second
> +X-Duplicated-Header: third value for test purposes

You want test cases with continuation lines, like:
X-Duplicated-Header: one
 value
X-Duplicated-Header: third value
 for test
 purposes

Maybe do a 'contains' test with:
X-Duplicated-Header: here comes a continuation line
 that contains the word 'fourth' we're looking for
 and then another one.

And add a test for 'ends with' "the end":
Custom: This is
 the end
 my friend
> Without any other changes, remove the e-mail body from bugmail12 which is valid. The test will now fail.

That's some bug that exists already in TB, so is unrelated to my patch. First GetNextLine call in nsMsgSearchTerm::MatchArbitrary() is failing which I traced to nsMsgBodyHandler::GetNextLocalLine inside GetNextLine. Looks like nsMsgBodyHandler class is not ready for handling email without body.

> Why is it OK to remove '!searchingHeaders ||'

!isContinuationHeader also handles !searchingHeaders case (added comment in patch)


Also fixed comments and added more tests.
Attachment #9003480 - Attachment is obsolete: true
Attachment #9003480 - Flags: review?(mkmelin+mozilla)
Attachment #9003649 - Flags: review?(jorgk)
Should this test pass?

  { testString: "value",
    testAttribute: OtherHeader,
    op: DoesntContain,
    customHeader: "X-Duplicated-Header",
    count: 1},

There is a header
  X-Duplicated-Header: second
which matches the condition, of course there are also duplicated headers which contain "value" and hence don't match it.

I added a bit of debug to see what's going on:
      // Match value with the other info.
      rv = MatchRfc2047String(headerFullValue, charset, charsetOverride, &stringMatches);
      if (m_operator == nsMsgSearchOp::DoesntContain) {
        printf("=== %s, %d\n", headerFullValue.get(), stringMatches?1:0);
      }
and
  *pResult = result;
  if (m_operator == nsMsgSearchOp::DoesntContain) {
    printf("=== %d\n", result?1:0);
  }
  return rv;

I get:
 0:00.94 pid:900 testing for string 'value'
 0:00.94 pid:900 === one value, 0
 0:00.94 pid:900 === 0
 0:00.94 pid:900 Finished search does 1 equal 0?
 0:00.94 FAIL onSearchDone - [onSearchDone : 37] 1 == 0

So only "one value" is looked at. matchExpected==false, MatchRfc2047String() returns false, so we exit early with result==false.

If you reorder the headers, like so:
  X-Duplicated-Header: one value
  X-Duplicated-Header: second
you get:
 0:00.92 pid:10260 testing for string 'value'
 0:00.92 pid:10260 === second, 1
 0:00.92 pid:10260 === one value, 0
 0:00.92 pid:10260 === 0
 0:00.92 pid:10260 Finished search does 1 equal 0?
 0:00.93 FAIL onSearchDone - [onSearchDone : 37] 1 == 0

So despite "second" being returned as a match for 'doesn't contain' "value", it's not recognised.
Your test should always fail regardless of headers order IMO because there is one (actually more than one) X-Duplicated-Header that contains "value".

Such behaviour makes sense from what you would expect (as as user) from filters, so:

contains - in any duplicated header
doesn't contain - in all duplicated headers

And i think it works that way.



For behaviour:
doesn't contain - in any duplicated header, so in other words - if one header doesn't contain then we "match" makes no sense to me in terms of usefulness. We need to make sure that all X-Duplicated-Header do not contain our phrase.
OK. We're still recovering from yesterday's bustage, so I'm busy fighting fires. The patch is OK now, but I'll do some comment and white-space adjustments and I might rename some variables so the logic becomes clearer. 'searchingHeaders' doesn't appear like such a good name, any idea for a replacement since you've looked at the code for a while?

The block controlled by |if (!isContinuationHeader && !headerFullValue.IsEmpty())| should be preceded by a comment like this:

// If we're not on a continuation header the header value is not empty,
// we have finished accumulating the header value by iterating over all
// header lines. Now we need to check whether the value is a match.
processHeaders?
I've cleaned up the code a bit: Changed comments and removed trivial and wrong comments. Fixed white-space issues, renamed a variable.

I also added trailing spaces to the test message to test the stripping of trailing spaces.

Since this will be landed in your name, please f+ the changes if you're happy with them.

Sadly I cannot run the test locally right now since debug builds are still busted, see bug 1485820 comment #50.
Attachment #9004009 - Flags: review+
Attachment #9004009 - Flags: feedback?(arekm)
Attachment #9003649 - Flags: review?(jorgk)
Looks good. Builds and passes tests here.
Attachment #9004009 - Flags: feedback?(arekm)
Attachment #9004009 - Flags: feedback+
Attachment #9003649 - Attachment is obsolete: true
https://hg.mozilla.org/comm-central/rev/74ed2e4ba2456c97a4266b4c58faf4bf24b4d21e
process all duplicate headers when searching. r=jorgk
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 63.0
Looks like this caused a test failure, most likely a test that needs to be adjusted, see:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=74ed2e4ba2456c97a4266b4c58faf4bf24b4d21e&selectedJob=195950603

TEST-UNEXPECTED-FAIL | comm/mailnews/imap/test/unit/test_imapSearch.js | xpcshell return code: 0
TEST-UNEXPECTED-FAIL | comm/mailnews/imap/test/unit/test_imapSearch.js | onSearchDone - [onSearchDone : 37] 0 == 1

Failure of calendar/test/unit/test_attendee.js is caused by another bug (bug 1485820). Can you please run that test and submit a patch to fix it. Otherwise I have to back out the fix altogether.
Status: RESOLVED → REOPENED
Flags: needinfo?(arekm)
Resolution: FIXED → ---
This test also uses bugmail12 and this one fails:

  { testString: "QAcontact filters@mail.bugs",
    testAttribute: OtherHeader,
    op: Isnt,
    count: 0},  <== passes of you put 1 here.

The one above

  { testString: "QAcontact filters@mail.bugs",
    testAttribute: OtherHeader,
    op: Is,
    count: 1},

passes. The message contains X-Bugzilla-Watch-Reason: QAcontact filters@mail.bugs.
Attached patch 678322-follow-up.patch (obsolete) — Splinter Review
I got this from Arkadiusz via IRC/pastebin. Tested and working.

Thanks for your contribution, the investigation and bearing with me over all the nits ;-)
Flags: needinfo?(arekm)
Attachment #9004088 - Flags: review+
As discussed on IRC, that was another logic error. So we go with this?
Attachment #9004088 - Attachment is obsolete: true
Attachment #9004095 - Flags: feedback?(arekm)
Grrr, typos: ..., there was another logic error.
Attachment #9004095 - Flags: feedback?(arekm) → feedback+
https://hg.mozilla.org/comm-central/rev/9289d8a37eb2dae82f2ecc0076ec8396126a5526
Follow-up: fix logic error and cater for IMAP case where headers are not available. r=jorgk
Attachment #9004095 - Flags: review+
Attached patch More testsSplinter Review
nsMsgSearchTerm::MatchArbitraryHeader() can do matching using message string
properties and by matching headers line by line. test_search.js was only
testing second case. Now we also test first case.
Attachment #9004507 - Flags: review?(jorgk)
or better:

Bug 678322 - Follow-up: more tests including matching based on message string properties

nsMsgSearchTerm::MatchArbitraryHeader() can do matching using message string
properties and by matching headers line by line.

test_search.js was only testing second case for duplicated headers. Now we also test
first case.

Add more tests for second case.
Comment on attachment 9004507 [details] [diff] [review]
More tests

Review of attachment 9004507 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent! Exactly what I had in mind ;-) - Thanks again for your contribution and persistence. Interested in fixing more bugs?

::: mailnews/test/data/bugmail12
@@ +66,5 @@
>  X-Duplicated-Header: This is
>     the end
>        my friend
> +X-Duplicated-Header-DB: this header
> +   tests DB

I'll take the liberty to add some trailing spaces here so that code gets exercised as well. I'll do it before landing, no new patch required.
Attachment #9004507 - Flags: review?(jorgk) → review+
(In reply to Arkadiusz Miśkiewicz from comment #66)
> or better:
Will do.
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/a09a76a92abf
Follow-up: more tests including matching based on message string properties. r=jorgk
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Comment on attachment 9004009 [details] [diff] [review]
process-all-headers6.patch - with ws and comment changes by JK

We should consider this for TB 60 uplift.
Attachment #9004009 - Flags: approval-comm-esr60?
Attachment #9004009 - Flags: approval-comm-esr60? → approval-comm-esr60+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: