Closed Bug 576994 Opened 14 years ago Closed 5 years ago

"Body" quick filter option searches body and seemingly-random header values

Categories: Thunderbird :: Search, defect
Platform: x86_64, Windows 7
Type: defect
Priority: Not set
Severity: major

Tracking: Not tracked

Status: RESOLVED WORKSFORME

People: Reporter: erik; Assignee: Unassigned

References: Depends on 1 open bug, Blocks 1 open bug

Attachments: 1 file

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.7) Gecko/20100701 Firefox/3.6.7 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100608 Mnenhy/0.8.3 Thunderbird/3.1

The "Body" filter option new to Thunderbird 3.1 searches the entire message, including the headers. Sometimes this is useful, but sometimes it's worse than useless. If I'm searching for messages on an email list in which my name is mentioned, EVERY message appears because my email address, containing my name, is in one of the headers. Headers and body need to be separate search fields.

Worse, this is inconsistent. If I search for "received" (something that's in every single email header) in a folder containing 1071 messages, I get only 64 hits, most of which don't contain the word anywhere in the body (a text search confirms that it's only in the "Received:" header). A search for my first name in the same folder gives me 951 hits. Both of these searches should return 1071 hits if they're searching the full header, and both should return FAR fewer hits if they're searching only the message body.

Reproducible: Sometimes

Steps to Reproduce:
1. Search for something that's common in an email header but not in email text, with only the "Body" filter enabled
2. Look in the results for messages that don't appear to have that in the text.
3. Look at said message's source (Ctrl-U) and verify that the text is only in a header.
Actual Results:  
Many messages will only contain the searched-for text in headers. Many other messages which contain the same text in the header will NOT be found.

Expected Results:  
Only messages containing the text in the message *BODY* should be found. A separate filter should be available for "any header."

"Advanced filtering" is one of the common selling points of Thunderbird, so I'm making this bug "major" - this is advertised as a major feature, and it is broken.
er, I forgot to edit the first paragraph upon noticing that the results were inconsistent with my original impression that full headers were being searched. "EVERY message appears because" should have read "most messages appear because"
Version: unspecified → 3.1
Component: Folder and Message Lists → Search
QA Contact: folders-message-lists → search
strange, I don't remember getting Wayne's comment in my email inbox, even though I wrote this bug and am on its CC list.

Of that list, the only bug whose description seems to match is 519202, but that's a [META] entry, not its own bug. All of the other bugs listed seem to be about search functions simply failing, or not working properly in specific situations (e.g. while searching IMAP accounts - the problem described above is on locally stored mail downloaded from POP3 servers).

I haven't yet noticed a *specific* message that I can't find in search, so I'm afraid I can't say whether there's anything specifically different about the messages that aren't being found.
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Whiteboard: dupme?
Bug 379988 is not a duplicate of this, Aureliano. Or if it is, Bug 379988 is described incorrectly. That bug claims that the search function always searches headers, when it does not. It's entirely inconsistent, searching some headers in some messages and not in others.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Depends on: 379988
To all problem reporters:

Which email headers do you mean: the headers of which mail, and, for a multipart mail, the headers of which part?
Please distinguish the following cases.
(1) Primary message headers of the mail (the lines from the top of the mail down to the blank line that separates the mail headers from the mail payload).
(2) Message headers of a part in a multipart mail (the headers just after a boundary).
(3) Message headers in attached mail data, i.e. the payload of a message/rfc822 part.
(4) Primary message headers of the next mail in the mail folder file. (see bug 697021)
(5) Primary message headers of some other mail.
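
For a single saved message (for example one saved from the Ctrl-U source view), cases (1) to (3) can be told apart mechanically. The following is a minimal sketch using Python's standard `email` package; the function names `locate_term` and `_scan` and the labels they return are illustrative only and are not part of Thunderbird. Cases (4) and (5) involve other messages in the folder file, so they are not covered here.

```python
import email
from email.message import Message


def _scan(part: Message, label: str, needle: str, hits: list[str]) -> None:
    # Record `label` if any "Name: value" header line of this part contains the term.
    if any(needle in f"{name}: {value}".lower() for name, value in part.items()):
        hits.append(label)
    if part.is_multipart():
        is_rfc822 = part.get_content_type() == "message/rfc822"
        for sub in part.get_payload():
            # Case (3): the payload of a message/rfc822 part is a complete attached mail.
            # Case (2): otherwise `sub` is an ordinary MIME part following a boundary.
            sub_label = "attached-mail" if is_rfc822 else "part:" + sub.get_content_type()
            _scan(sub, sub_label, needle, hits)


def locate_term(raw: bytes, term: str) -> list[str]:
    """Return the header blocks that contain `term` (case-insensitive).

    "primary"       -> case (1), the top-level message headers
    "part:<type>"   -> case (2), headers of a MIME part
    "attached-mail" -> case (3), headers inside a message/rfc822 part
    Body text is deliberately not searched.
    """
    hits: list[str] = []
    _scan(email.message_from_bytes(raw), "primary", term.lower(), hits)
    return hits
```

Calling `locate_term(open("message.eml", "rb").read(), "erik")` would list every header block in which the term occurs; an empty list means the term is only in body text, if anywhere.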

Bug 379988 refers to the Content-Disposition: header.
Because Content-Disposition: is usually not used in the primary headers of a mail, it's probably (2) or (3). I couldn't see the problem with (2) in Tb 7 or a recent trunk nightly. So, if it happens, I think it's usually the known problem with (3).

As for the Received: header: almost every mail contains it, and a mailbox file usually contains multiple mails, so the phenomenon of bug 697021 can happen.
If the body search text is exactly the same string as the topmost (most recently added) Received: header line of the next mail, which mail is returned by the body search?
 
Note-1: Show the "Order Received" column and sort in ascending order. Please Compact the folder before checking.
Note-2: Please check with a local mail folder. With IMAP, the problem may not be persistent.
(Replacement of questions in comment #6) 

There are two known problems that definitely produce the phenomenon of "false positive because body search actually searches message headers": (1) bug 697021 (with a multipart mail, the message headers of the next mail are searched), (2) bug 700541 (message header data in a message/rfc822 part).

To all problem reporters in this bug:
Which is your case?
  (a) Same as bug 697021, (b) Same as bug 700541,
  (c) Different from these two bugs, (d) Combination of (a)/(b)/(c).
If (c) or (d), can you find a bug for the same problem as yours among the bugs listed in the dependency tree of Bug 519202 (which is in the Blocks: field of this bug)?
Erik, see questions in comment 7. Can you reply?
Sorry, the comment 7 email notification somehow escaped me.
This appears to be unrelated to both of the bugs cited in comment 7, if I understand their descriptions correctly.
Bug 697021 appears to relate to content split across multiple messages, like you'd see in a USENet binary newsgroup, but that's incredibly rare in emails.
Bug 700541 appears to relate to email forwards, where the header of the forwarded message is sometimes part of the message body, if the person forwarding the message doesn't trim properly.

I'm getting hits from the email header itself. For example, if I do a body-only search for "Erik" in a folder for an email list (in TB 10.01), which has 3776 messages in it, I get 3269 hits, even though the vast majority of those discussions don't involve me at all.  Picking one at random, I see my email address (which contains my name) in the "Envelope-To:" header and in one of the "Received:" header blocks.  It's nowhere else in the message at all. My name should be in those same headers in the 507 emails that didn't come up in the search, though (and there's no really easy way that I know of to determine which 507 messages those are).

Incidentally, in that same folder, if I search for the FQDN of the mail server that's in the same "Received:" header block as my email address, I get 149 hits.  I should get 0, since none are in the message body, or 3776, since it's in every message header.  If I search for "hostgator.com," which is in the same "Received:" header by virtue of being my mail server, I get only 11 hits.  There's no consistency at all here.
Additionally, none of the bugs listed in the description for Bug 519202 appear to describe this one, either. I've just uploaded an example email as an attachment. This is from the Mozilla Thunderbird YahooGroup, and my name appears in three headers: "Return-path," "Envelope-to," and "Received," but nowhere in the body. This same message does NOT come up in a body search for "hostgator.com"
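Since there is no easy way inside Thunderbird to see which messages a body search should or should not return, a rough cross-check can be done outside it. The sketch below assumes the local folder is a plain mbox file (Thunderbird's default storage for local folders) and uses Python's standard `mailbox` module to bucket each message by whether the term occurs in its decoded body parts or only in its top-level headers. `classify` is a hypothetical helper; decoding is simplified (undecodable bytes are replaced, HTML is not stripped), so its output is a reference to compare against Thunderbird's hits, not a model of Thunderbird's own matching rules.

```python
import mailbox


def classify(mbox_path: str, term: str) -> tuple[list[int], list[int]]:
    """Split message keys into (header_only, in_body) for a case-insensitive term."""
    needle = term.lower()
    header_only: list[int] = []
    in_body: list[int] = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            # Top-level headers only; the mbox "From " separator line is not included.
            headers = "\n".join(f"{k}: {v}" for k, v in msg.items()).lower()
            # Concatenate the decoded payloads of all leaf MIME parts.
            body = b"".join(
                part.get_payload(decode=True) or b""
                for part in msg.walk()
                if not part.is_multipart()
            ).decode("utf-8", errors="replace").lower()
            if needle in body:
                in_body.append(key)
            elif needle in headers:
                header_only.append(key)
    finally:
        box.close()
    return header_only, in_body
```

Messages in `header_only` that Thunderbird nevertheless returns for a "Body" search are the false positives worth inspecting.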
Attachment #597987 - Attachment mime type: application/octet-stream → text/plain
(In reply to Erik Harris from comment #9)
> Bug 697021 appears to relate to content split across multiple messages, like
> you'd see in a USENet binary newsgroup, but that's incredibly rare in emails.

Do you understand that bug correctly? That bug can occur at any time in a local mail folder if a multipart mail exists in that folder.

(In reply to Erik Harris from comment #10)
> Created attachment 597987 [details]
> Example message that comes up in a "Body" search for "erik" despite not
> containing the string in the body.

Do you see your problem even when the mail is the only mail in a local mail folder and the local mail folder is Compacted?
Did you show "Order Received" column and check message source of mail placed just after the mail?

Message headers of the attached mail.
> X-Account-Key: account4
> X-UIDL: UID175997-1200796571
> Content-Type: multipart/alternative; boundary="Boundary_(ID_NrdJku9Ca6mB+8HXynE41Q)"

The phenomenon of Bug 697021 is that the mail placed just before the attached mail is returned by a Body search for "X-UIDL: UID175997-1200796571" if that preceding mail is a multipart mail.
(1) Create a new local mail folder, show the "Order Received" column, and sort
    in ascending order by "Order Received". Column value = offset in file.
(2) Copy a multipart mail other than the attached mail to this folder. Call it mail-1.
(3) Copy the attached mail to this folder. Call it mail-2.
(4) Body search for "X-UIDL: UID175997-1200796571" => mail-1 is returned.
(5) Copy another mail to this folder. Call it mail-3.
(6) Shift+Delete of mail-2.
    Body search for "X-UIDL: UID175997-1200796571" => mail-1 is still returned.
(7) Compact.
    Body search for "X-UIDL: UID175997-1200796571" returns nothing.
If mail-3 has "Received: from eric.com", a Body search for "eric" returns mail-2. This occurs even after mail-3 is deleted, until Compact is executed.

> There's no consistency at all here.

As seen above, the problem depends on the location of mails in the local mail folder file.
- If mail of "eric in message header" is placed just after the mail,
  and if the "eric in message header" is placed at top part of the mail,
  and even if the mail of "eric in message header" is deleted mail,
  that bug occurs.
- If mail of "eric in message header" is placed just after the mail,
  and if "eric in message header" is placed at bottom part of the mail,
  that bug doesn't occur. (number of searched header lines is not so large) 
- If mail of "eric in message header" is not placed just after the mail,
  that bug doesn't occur.
Please rule out this already known issue from your problems first.

If IMAP and Online search are used for a Body search (Edit/Find/Search, Search Folder), a problem like bug 721167 can occur.
Please be sure to rule out already known issues from your problems.
(In reply to WADA from comment #12)
> Do you understand that bug [Bug 697021] correctly? That bug can occur any time in local
> mail folder if multipart mail exists in local mail folder.

I can't guarantee that I fully understand either of them, which is why I recapped what the descriptions seemed to me to say.

> Do you see your problem even when the mail is the only mail in a local mail
> folder and the local mail folder is Compacted?

I just created a new folder with only the above-attached message in it, and it doesn't come up in a search for "erik"

> Did you show "Order Received" column and check message source of mail placed
> just after the mail?

Check the one immediately after for what?  The subsequent message is a message I sent, so it has my name in the signature block, so it definitely should show up in a search for my name.

> As seen in above, problem depends on location of mails in local mail folder
> file.
> - If mail of "eric in message header" is placed just after the mail,
>   and if the "eric in message header" is placed at top part of the mail,
>   and even if the mail of "eric in message header" is deleted mail,
>   that bug occurs.

In this case, why does a search for "some other text in the same message header" NOT necessarily show up, even when that alternate text appears in every single message header in the folder? In this case, both "erik" and some other strings (such as "hostgator.com") appear in every header, but the searches for the various ubiquitous strings do not yield the same results. If it were simply a matter of "multipart header values cause some other message headers to be searched as body text," shouldn't the results for searches of ubiquitous header strings be identical?

Without doing a robust analysis of the hundreds or thousands of messages in a mailbox in order to find some pattern to the search results, the only way I know to clearly illustrate this bug is to use strings that are present in every message's header.

> - If mail of "eric in message header" is placed just after the mail,
>   and if "eric in message header" is placed at bottom part of the mail,
>   that bug doesn't occur. (number of searched header lines is not so large) 

I don't understand this. A search doesn't place the result in a specific part of the email, it just displays a list of emails where the search registers as a hit. And in all cases, header data is at the top part of the mail, not the bottom part.

> If IMAP and Online search is executed for Body(Edit/Find/Search, Search
> Folder), problem like bug 721167 can occur.

I should've specified that I'm using POP/SMTP, not IMAP.
(In reply to Erik Harris from comment #13)
> I just created a new folder with only the above-attached message in it, and
> it doesn't come up in a search for "erik"

If so, the searched header is apparently not a header of that mail (call it mail-1).
(1) Copy another mail to the folder (call it mail-2).
(2) Create a tag (call it tag001; neither mail-1 nor mail-2 contains the string "tag001").
(3) Add tag001 to mail-2. tag001 is written in the X-Mozilla-Keys: header.
    X-Mozilla-Keys: tag001 
(4) Body search for tag001 => mail-1 is returned.
(5) Remove tag001 from mail-2. tag001 is removed from the X-Mozilla-Keys: header.
    X-Mozilla-Keys:
(6) Body search for tag001 => mail-1 is not returned.
(7) Add tag001 to mail-2 again. tag001 is written in the X-Mozilla-Keys: header.
    X-Mozilla-Keys: tag001
    Body search for tag001 returns mail-1.
(8) Shift+Delete mail-2 => X-Mozilla-Status: 0009 
(9) Body search for tag001 => mail-1 is returned.
This indicates that part of the message headers of the subsequent mail is always searched.
With a crafted mail (only a few message headers), the search ends near the top of the message body of the subsequent mail.

> > Did you show "Order Received" column and check message source of mail placed
> > just after the mail?
> Check the one immediately after for what?  The subsequent message is a
> message I sent, so it has my name in the signature block, so it definitely
> should show up in a search for my name.

Not in the search results of the Body search for "erik".
In an already Compacted folder, with all messages shown in the thread pane and sorted by "Order Received", look at the message that comes right after the wrongly hit message. If the search term is not contained in any message header of that subsequent message, it's definitely not Bug 697021. If the search term is contained in a message header of the subsequent message, it's perhaps Bug 697021, although that's not certain yet.

> Without doing a robust analysis of the hundreds or thousands of messages
> in a mailbox in order to find some pattern to the search results,
> the only way I know to clearly illustrate this bug is to use strings
> that are present in every message's header.

Because the headers of the subsequent message are searched due to Bug 697021, "false positives by bug 697021" can occur on many multipart mails if such a Body search is executed in your environment.
FYI, the phenomenon of bug 697021:
> <----- mail-1 (multipart/xxx) -----><------------- mail-2 ------------->
> <-- head --><-------- body --------><-------- head --------><-- body -->
>             <------ Body search by bug 697021 ------>
If mail-2 is a crafted mail, the following is observed:
> <----- mail-1 (multipart/xxx) -----><------------- mail-2 ------------->
> <-- head --><-------- body --------><- head -><-------- body ---------->
>             <------ Body search by bug 697021 ------>
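The diagrams above can be turned into a toy model that shows why the results look inconsistent: a multipart mail's body search range runs a few lines past its own end into the next stored mail's headers, and deleted but not yet compacted mails still occupy space in the file. This is a hypothetical simulation of the behaviour described in this bug, not Thunderbird's code; in particular the `overrun` line count is an assumed parameter, since the comments only say the number of extra header lines searched is "not so large".

```python
from dataclasses import dataclass


@dataclass
class Mail:
    headers: list[str]
    body: list[str]
    multipart: bool = False
    deleted: bool = False  # Shift+Delete: hidden, but its lines stay in the file until Compact


def body_search(folder: list[Mail], term: str, overrun: int) -> list[int]:
    """Indexes of non-deleted mails whose (buggy) body search range contains `term`."""
    lines: list[str] = []
    ranges: list[tuple[int, int]] = []
    for mail in folder:  # lay the mails out back to back, as in the folder file
        lines.extend(mail.headers)
        start = len(lines)
        lines.extend(mail.body)
        ranges.append((start, len(lines)))
    needle = term.lower()
    hits: list[int] = []
    for i, (mail, (start, end)) in enumerate(zip(folder, ranges)):
        if mail.deleted:
            continue  # not listed, but its lines can still be reached by a neighbour's overrun
        if mail.multipart:
            end = min(end + overrun, len(lines))  # assumed overrun into the next mail
        if any(needle in line.lower() for line in lines[start:end]):
            hits.append(i)
    return hits


def compact(folder: list[Mail]) -> list[Mail]:
    """Compact: physically drop deleted mails from the file."""
    return [m for m in folder if not m.deleted]
```

With a multipart mail followed by a mail whose first header is `X-Mozilla-Keys: tag001`, a search for "tag001" returns the multipart mail even after the second mail is deleted, and stops doing so only after `compact`; a term that sits deeper in the next mail's headers than `overrun` lines is never hit. That matches the tag001 steps and the top-versus-bottom-of-header observations earlier in this bug.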
Depends on: 697021
(don't see why this is still unconfirmed)
Status: UNCONFIRMED → NEW
Ever confirmed: true
See Also: → 794501

Many broken things were fixed in body search. This will work now.

Status: NEW → RESOLVED
Closed: 13 years ago → 5 years ago
Resolution: --- → WORKSFORME
