Closed Bug 1105937 Opened 10 years ago Closed 6 years ago

'body contains...' Search yields false positive results ("Body search of multipart mail" searches data of next mail in msgStore file)

Categories

(MailNews Core :: Search, defect)

Platform: x86_64, Windows 8.1
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1259534

People

(Reporter: bz, Unassigned)

References

Details

User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141106120505

Steps to reproduce:

I encounter this bug somewhat often. When performing a body search on a folder containing a large number of emails (>40000), some incorrect results are mixed in with the correct ones. The incorrect results always appear to be "next to" correct ones in time: a false positive has the same date as one of the correct results. For example:

right click Local Folders->Search Messages->Body contains "dr.werner.linden@t-online.de"


Actual results:

I get 4 email results.  3 of them are correct and have dates of 5/29/2007, 5/30/2007, and 4/4/2009.  1 of them is an incorrect false positive and has a date of 5/30/2007.


Expected results:

The false positive should have been excluded from the results.
One more note to add: I've seen other reports of similar-sounding bugs that were attributed to body searches including email header text. This is not a duplicate of those. I am positive that, for the incorrect result in my example above (and in the other cases where I've encountered the bug), the message source does not contain the search string anywhere.
(In reply to Marcel from comment #0)
> right click Local Folders->Search Messages->Body contains "dr.werner.linden@t-online.de"
> Actual results:
> I get 4 email results.  3 of them are correct and have dates of 5/29/2007, 5/30/2007, and 4/4/2009. 
> 1 of them is an incorrect false positive and has a date of 5/30/2007.

"dr.werner.linden@t-online.de" is a string usually seen in message headers.
Bug 697021?
Read Bug 697021 Comment #7, please. This still occurs in Tb 32.1.0.

Call the true positive mails Mail#1, Mail#2, and Mail#3, and call the false positive mail Mail#4.
(1) Is Mail#4 a multipart message? Or a text/plain or text/html message?
(2) Create a new folder, FolderA, and copy Mail#1, Mail#2, Mail#3, and Mail#4 into it, in this order.
      Do a Body search. Is Mail#4 returned?
(3) Create a new folder, FolderB, and copy Mail#4 into it.
      Do a Body search. Is Mail#4 returned?
      Then additionally copy Mail#1, Mail#2, and Mail#3, in this order.
      Do a Body search. Is Mail#4 returned?
(4) Show the "Order Received" column in the folder under Local Folders, and sort by "Order Received" in ascending order.
      (For a local mail folder, the "Order Received" value is the same as the messageOffset of the mail.)
      Call the mail shown directly under Mail#4 "Mail#4X".
      Does the top part of the message source (the header portion) of Mail#4X contain the string "dr.werner.linden@t-online.de"?
(5) If yes, additionally copy Mail#4X to FolderA.
      Do a Body search. Is Mail#4 returned?
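To find Mail#4X outside Thunderbird, a minimal Python sketch can list the byte offset of every message in a local folder file. This is not Thunderbird code; `mbox_offsets` and `next_message_offset` are hypothetical helpers. It assumes the folder file is an mbox in which each message starts with a "From " line at the top of the file or after an empty line, and that messageOffset is the byte position of that line, so the message with the next larger offset is the one shown under Mail#4 when sorting by "Order Received".

```python
from typing import List, Optional


def mbox_offsets(data: bytes) -> List[int]:
    """Byte offset of each message's "From " separator line in mbox data."""
    offsets = []
    pos = 0
    prev_blank = True  # the start of the file counts as a separator boundary
    for line in data.splitlines(keepends=True):
        # Only a "From " line at file start or after an empty line begins a
        # message; body lines starting with "From " are normally escaped.
        if prev_blank and line.startswith(b"From "):
            offsets.append(pos)
        prev_blank = line.strip(b"\r\n") == b""
        pos += len(line)
    return offsets


def next_message_offset(offsets: List[int], offset: int) -> Optional[int]:
    """Offset of the message stored right after `offset`, or None if it is last."""
    later = [o for o in offsets if o > offset]
    return min(later) if later else None
```

Comparing `next_message_offset(offsets, <offset of Mail#4>)` with the "Order Received" value of the row under Mail#4 is one way to double-check that Mail#4X really is the next message in the file.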
This may indeed be a dup of bug 697021, if I am understanding it correctly. However, I should note that I see this bug often, even when the search string is not an email address, i.e. when searching for a string that would not typically be found in a message header. Answers to your questions:

> (1) Is Mail#4 multipart message? Or text/plain or text/html message?

It is "Content-Type: multipart/alternative" with 1 part "text/plain" and 1 part "text/html".  There is no attachment.

> (2) Create new folder, FolderA, copy  Mail#1,Mail#2, Mail#3, and Mail#4, in this order. Do Body search. Is Mail#4 returned?

No the false positive is not returned in this situation.

> (3) Create new folder, FolderB, copy Mail #4. Do Body search. Is Mail#4 returned?

No.

> Copy Mail#1,Mail#2, Mail#3, in this order, additionally. Do Body search. Is Mail#4 returned?

Yes!

> (4) Show "Order Received" column at the folder of Local Folders, sort by "Order Received" in ascending order. (when local mail folder, "Order Received" column value is same as messageOffset of mail) Call "mail shown under Mail#4" Mail#4X. Is there string of "dr.werner.linden@t-online.de" in top part of message source(header portion) of the Mail#4X?

No, the string "dr.werner.linden@t-online.de" is definitely not contained in the source of Mail#4X.
(In reply to Marcel from comment #3)
> > (4) Show "Order Received" column at the folder of Local Folders, sort by "Order Received" in ascending order.
> > (when local mail folder, "Order Received" column value is same as messageOffset of mail). Call "mail shown under Mail#4"
> > Mail#4X. Is there string of "dr.werner.linden@t-online.de" in top part of message source(header portion) of the Mail#4X?
> No, string "dr.werner.linden@t-online.de" is definitely not contained in the source of Mail 4x.

As seen in bug 697021 comment #1, the "next mail" can be a deleted mail.
(6) Create a new folder, FolderC, copy Mail#4 into it, then copy Mail#1, Mail#2, and Mail#3, in this order.
      Copy another mail, Mail#5, which never contains the search string.
      Shift+Delete Mail#1, Mail#2, and Mail#3. Do a Body search. Is Mail#4 returned?
      Note: at this step, you can see the X-Mozilla-Status: data of the deleted mails in a text editor.
      Compact FolderC. Do a Body search. Is Mail#4 returned?

If a deleted mail (EXPUNGE bit in X-Mozilla-Status: is on) exists between Mail#4 and Mail#4X, the phenomenon can be explained.
View the content of the local folder file (for a folder named FolderX, the file is named FolderX, not FolderX.msf) using a text editor.
Is there mail data of a deleted mail between Mail#4 and Mail#4X?
If yes, is the search string contained in the mail data held just after Mail#4?
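The EXPUNGE check can also be scripted. The value of X-Mozilla-Status: is a hexadecimal flag word; this minimal Python sketch assumes the expunged flag is 0x0008 (Thunderbird's nsMsgMessageFlags.Expunged), so a value such as 0009 would mean read + expunged. `is_expunged` and `message_status` are hypothetical helpers, and offsets are character offsets, which equal byte offsets only for ASCII data.

```python
import re

EXPUNGED = 0x0008  # assumed value of nsMsgMessageFlags.Expunged


def is_expunged(header_block: str) -> bool:
    """True if the X-Mozilla-Status: header in `header_block` has the expunged bit."""
    m = re.search(r"^X-Mozilla-Status:\s*([0-9A-Fa-f]{4})", header_block, re.MULTILINE)
    return bool(m) and (int(m.group(1), 16) & EXPUNGED) != 0


def message_status(data: str):
    """List (offset, expunged) for each message in mbox text.

    Simplification: any line starting with "From " is treated as a separator;
    real mbox files escape such body lines as ">From ".
    """
    starts = [m.start() for m in re.finditer(r"^From ", data, re.MULTILINE)]
    results = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        headers = data[start:end].split("\n\n", 1)[0]  # header part only
        results.append((start, is_expunged(headers)))
    return results
```

Any entry reported as expunged whose offset lies between those of Mail#4 and Mail#4X is a deleted mail that a body search running past the end of Mail#4 could still read.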
> Create new folder, FolderC, copy Mail#4, and copy Mail#1,Mail#2, Mail#3, in this order. Copy other mail , Mail#5, which never contains search string in it. Shift+Delete of Mail#1, Mail#2, Mail#3. Do Body search. Is Mail#4 returned?

No.

> Note: At this step, you can see "data in X-Mozilla-Status: of deleted mail" by Text Editor. Compact of FolderC. Do Body search. Is Mail#4 returned?

No.

> View content of local folder file(if folder named FolderX, file named FolderX, not FolderX.msf) using Text Editor.  Is there mail data of deleted mail between Mail#4 and Mail#4X?

OK, now I am confused as to which folder you want me to test on here. When you say "FolderX", do you mean the new folder "FolderC" that you just had me create for the previous test? Or the original folder of >40000 backed-up mails that the body search yields false positives from? Actually, the original folder contains several subfolders, so there would be more than one mail file to check.
(In reply to Marcel from comment #6)
> When you say "FolderX" do you mean the new folder "FolderC" ... ?

No. In that context, "FolderX" means "any folder held on the local HDD whose X-Mozilla-Status:, message data, etc. you want to check".
(In reply to Marcel from comment #6)
> (i)   Create new folder, FolderC, copy Mail#4, and copy Mail#1,Mail#2, Mail#3, in this order. 
> ==> Add this step, please. (i-X) Do Body search. Is Mail#4 returned?
> (ii)  Copy other mail, Mail#5, which never contains search string in it. 
> ==> Add this step, please. (ii-X) Do Body search. Is Mail#4 returned?
> (iii) Shift+Delete of Mail#1, Mail#2, Mail#3. 
> (iv) Do Body search. Is Mail#4 returned?
> 
> No.

Please add steps (i-X) and (ii-X).
Needless to say, Mail#4 is the false positive mail, and Mail#1/#2/#3 are the true positive mails you referred to in your comment #0. And you said "Yes!" in your comment #3. Check (i-X) above is exactly the same as the "Yes!" case in your comment #3...
(In reply to Marcel from comment #6)
> Create new folder, FolderC, copy Mail#4, and copy Mail#1,Mail#2, Mail#3, in this order. 
> Copy other mail, Mail#5, which never contains search string in it. 
> Shift+Delete of Mail#1, Mail#2, Mail#3.  Do Body search. Is Mail#4 returned?
> 
> No.

See attachment 8532425, which is attached to bug 697021 comment #22.
With it, you can see the phenomenon "any string in mail#1/mail#2 returns mail#0", and, if you Shift+Delete mail#1/mail#2, that whether the next mail(s) are deleted or not is irrelevant to the problem.
If the "false positive by Body search of a multipart message in a local mail folder" is caused by "string in next mail(s) is searched", the mechanism is perhaps the following.
     Body Search starts from the line after the "null line" (empty line) that delimits the header part and the mail payload part.
     Searched lines ==      "number of mail payload lines of the multipart mail"
                                    + "number of message header lines in sub-parts of the multipart mail"
     When a "message header line in a sub-part of the multipart mail" is skipped during body search,
     the processed line count is perhaps not incremented.
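The hypothesis above can be made concrete with a simplified Python model. This is not Thunderbird's nsMsgSearchTerm::MatchBody; `body_search`, its parameters, and the sample folder are hypothetical. It assumes the stored body length is a line count that includes sub-part header lines, and that the search skips those header lines without searching them. With `count_skipped=False` (the suspected bug), every skipped header line lets the scan run one line further into the next message in the folder file.

```python
def body_search(lines, start, length, needle, count_skipped=False):
    """Simplified model of a line-budgeted body search.

    lines:  the folder file as a list of lines
    start:  index of the first body line (just after the header/body empty line)
    length: stored body length in lines, including sub-part header lines
    """
    processed = 0
    in_part_headers = False
    i = start
    while processed < length and i < len(lines):
        line = lines[i]
        i += 1
        if in_part_headers and line != "":
            # Sub-part header line (e.g. Content-Type:): skipped, not searched.
            if count_skipped:
                processed += 1  # fixed model: header lines consume the budget
            continue  # with count_skipped=False they do not, so the scan overruns
        if line == "":
            in_part_headers = False  # an empty line ends a sub-part header block
        processed += 1
        if line.startswith("--") and not line.endswith("--"):
            in_part_headers = True  # opening MIME boundary: sub-part headers follow
        elif needle in line:
            return True
    return False


FOLDER = [
    "From - Mon Dec 08 10:00:00 2014",                  # multipart mail (Mail#4)
    "Content-Type: multipart/alternative; boundary=b",
    "",
    "--b",                                              # first body line (index 3)
    "Content-Type: text/plain",
    "Content-Transfer-Encoding: 7bit",
    "",
    "hello",
    "--b--",                                            # last body line (index 8)
    "From - Wed Dec 10 17:51:17 2014",                  # next mail in the file
    "X-Y-Z: ???!!!",
    "",
    "unrelated body",
]
```

In this sample the body is lines 3 to 8 (6 lines) and the text/plain sub-part has two header lines, so `body_search(FOLDER, 3, 6, "???!!!")` reads two lines of the next message and returns True (a false positive), while passing `count_skipped=True` stops at `--b--` and returns False. Each additional sub-part header line would expose one more line of the following message.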
Body Search code:
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgLocalSearch.cpp#491
> http://mxr.mozilla.org/comm-central/source/mailnews/db/msgdb/src/nsMsgHdr.cpp#539
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#913
>   nsMsgSearchTerm::MatchBody (nsIMsgSearchScopeTerm *scope, uint64_t offset, uint32_t length /*in lines*/, const char *folderCharset, nsIMsgDBHdr *msg, nsIMsgDatabase* db, bool *pResult)
One of the reasons why the problem occurs in Body search of multipart mail is:
-  What counts as the message body in a multipart message, what counts as an attachment, and what is neither body nor attachment
   is not clearly, cleanly defined in Tb.
For Body search, the text/html or text/plain part under the first multipart/alternative, or the first text/plain or text/html part under multipart/mixed, can be used as the definition of "message body in multipart mail", as is done when forwarding inline.
If only the text/plain or text/html part is passed to the "Body Search" function, I believe this bug will be resolved.
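As an illustration of the proposed "message body" definition, Python's standard email package has a similar heuristic: `EmailMessage.get_body()` picks the preferred text/plain or text/html part of a multipart message. This minimal sketch searches only that part, never headers or data that follows the message; it is an analogy, not Thunderbird's implementation, and `body_contains` and `make_sample` are hypothetical names.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def body_contains(raw: bytes, needle: str) -> bool:
    """Search only the preferred text body part of one message."""
    msg = message_from_bytes(raw, policy=policy.default)  # parses to EmailMessage
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:  # no text/plain or text/html body part found
        return False
    return needle in body.get_content()  # decoded text of that part only


def make_sample() -> bytes:
    """Build a multipart/alternative message (text/plain + text/html)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "rcpt@example.com"
    msg["Subject"] = "test"
    msg.set_content("plain text part")
    msg.add_alternative("<p>html part</p>", subtype="html")
    return msg.as_bytes()
```

Because the search input is a single parsed part, a header value such as the sender address, or any line of the next message in the folder file, cannot produce a match.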
Summary: 'body contains...' Search yields false positive results → 'body contains...' Search yields false positive results ("Body search of multipart mail" searches data of next mail in msgStore file)
At least the problem you saw in comment #3 is the problem I had reported in the impossible-to-understand bug 697021.
Confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Product: Thunderbird → MailNews Core
Sorry for delayed reply, here are results of your last test requests:

> (i)   Create new folder, FolderC, copy Mail#4, and copy Mail#1,Mail#2, Mail#3, in this order. 
> ==> Add this step, please. (i-X) Do Body search. Is Mail#4 returned?

Yes, the false positive is returned.

> (ii)  Copy other mail, Mail#5, which never contains search string in it. 
> ==> Add this step, please. (ii-X) Do Body search. Is Mail#4 returned?

Yes, the false positive is still returned in this case.

> (iii) Shift+Delete of Mail#1, Mail#2, Mail#3. 
> (iv) Do Body search. Is Mail#4 returned?

Yes, the false positive is still returned in this case.
(In reply to Marcel from comment #14)

It's a result consistent with your comment #3.
If you do the following, you can see clearly what happened.
1. Copy the file named FolderC from comment #14 to FolderD, and edit FolderD (not FolderD.msf) in a text editor.
    Insert a line into the mail data for Mail#1 (the 2nd mail), which is placed after the mail data for Mail#4 (the multipart mail) at the top of the file.
      From - Wed Dec 10 17:51:17 2014
      X-Y-Z: ???!!!"                                   <= insert this line.
      X-Account-Key: account3
      X-UIDL: 00001dbb51cb6554
      X-Mozilla-Status: 0009
      X-Mozilla-Status2: 00000000
      X-Mozilla-Keys:
2. Restart Tb and run "Repair Folder" on FolderD.
3. Do a Body Search for "X-Y-Z:", "???!!!", "X-Y-Z: ???!!!", etc.
This is the "false positive" you saw.
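Step 1 can also be scripted. A minimal Python sketch, assuming the folder file is an mbox whose messages each start with a "From " line at the top of the file or after an empty line; `insert_after_from_line` is a hypothetical helper, and its output should be written to the copy (FolderD), never to the original folder file.

```python
def insert_after_from_line(data: bytes, msg_index: int, new_line: bytes) -> bytes:
    """Return a copy of mbox `data` with `new_line` added right after the
    "From " separator line of message number `msg_index` (0-based)."""
    lines = data.splitlines(keepends=True)
    seen = -1
    prev_blank = True  # the start of the file counts as a separator boundary
    for i, line in enumerate(lines):
        if prev_blank and line.startswith(b"From "):
            seen += 1
            if seen == msg_index:
                # Reuse the separator line's own line ending (LF or CRLF).
                eol = line[len(line.rstrip(b"\r\n")):]
                if not eol:  # separator is the file's last line: terminate it first
                    eol = b"\n"
                    lines[i] = line + eol
                lines.insert(i + 1, new_line.rstrip(b"\r\n") + eol)
                return b"".join(lines)
        prev_blank = line.strip(b"\r\n") == b""
    raise ValueError("mbox data has no message with index %d" % msg_index)
```

For example, with `from pathlib import Path`, `Path("FolderD").write_bytes(insert_after_from_line(Path("FolderC").read_bytes(), 1, b"X-Y-Z: ???!!!"))` writes the modified copy; index 1 selects the second message (Mail#1). Steps 2 and 3 still apply afterwards.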

How many lines of the subsequent mail are searched depends on the number of header lines (or bytes) in the sub-parts under the multipart, for example Content-Type: text/plain (image/jpeg, application/pdf), Content-Disposition: attachment; filename=..., etc.
If you reduce the number of crafted header lines in the sub-parts under the multipart in attachment 8532425, you can reduce the number of lines of the subsequent mail that Body Search searches.
Please try this with TB 59 or later. In bug 1259534 I fixed the problem that searches went beyond the end of the message yielding false positives from a subsequent message.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE