Closed Bug 249841 Opened 20 years ago Closed 15 years ago

False positives from message bodies search in newsgroups and IMAP accounts

Categories

(Thunderbird :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b4

People

(Reporter: jmprange, Assigned: asuth)

References

Details

(Keywords: fixed-aviary1.0)

Attachments

(4 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7) Gecko/20040703 Firefox/0.9.0+ Mnenhy/0.6.0.101
Build Identifier: version 0.7+ (20040704)

With a newsgroup folder selected, using the quick search bar to search message
bodies or the right-click "Search Messages..." "Body" "contains" "string"
search, I get some posts that don't contain the string. In these cases I was
searching for my last name, "Prange", which I'm sure I'd notice if it were in
the post. It does find some posts that do contain the string, but I'm not sure
whether it finds all of them. As far as I've noticed, this only occurs in
newsgroups, but I haven't entirely ruled out other folders.

Reproducible: Sometimes
Steps to Reproduce:
1. Right-click on a newsgroup folder.
2. Click on "Search Messages...", choose "Body" from the drop-down menu.
3. Leave "contains" as is, type a search string into form.
4. Click on "Search".
Actual Results:  
It shows some posts that *don't* contain the search string, as well as some that do.

Expected Results:  
Only posts that do contain the search string.

Note that this doesn't seem to be entirely random; if I search for a nonsense
string that probably doesn't occur in any posts, I get a "No matches found".
Seeing this on Win XP and Mac.
I've been able to reproduce this on IMAP mail folders as well.  I also sent one
of the false positive messages to a POP account, searched for the same text and
did not get a false positive. 
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 98 → All
QA Contact: walkerrunner
Hardware: PC → All
here is a reproducible test case:

- In Seamonkey, browse to http://wp.netscape.com/zh/hk/
- File | Send Page
- When the mail send window comes up, enter an IMAP account to send the page to.
- Allow the mail client to convert to UTF-8
- With Firefox, set up the IMAP account (if you hadn't already), get new messages.
- Select a view that contains the message of the Chinese site sent from Seamonkey.
- Select Message Body in search and then type "done"
Notice the message of the Chinese site is shown.

- Look at the message source and run a Find on "done"
"done" does not exist in the source.
Summary: False positives from message bodies search in newsgroups → False positives from message bodies search in newsgroups and IMAP accounts
I'm now also seeing false positives on a search in message body for "mozilla".
I have many test emails with only the word test in the body.  However the sender
contains "mozilla".

I haven't been able to identify why the Chinese page gets a false positive with
the word "done".
Flags: blocking-aviary1.0?
I have seen this in the message search function as well (at least for local
folders).

It appears that the search function scans the message source, not just the
message body. I searched for "6301" and numerous messages were returned where
this string was part of the MIME encoding of an attachment but not the body.

1. Right click on a folder -> Search Messages
2. Body Contains "string"

version 0.8 (20040913) on Win XP.
yes, body search might be better named "all of message" since it does look
through the whole message, not just the body. It has always worked that way.
This makes it easy to search for special headers. We could just search the
message body but then you'd have to add custom headers to search for particular
headers or values...
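The difference between the two behaviours discussed above can be sketched as follows. This is only an illustration, not the Thunderbird implementation: `BodyOf`, `MatchesEntireMessage`, and `MatchesBodyOnly` are hypothetical helpers, and the sketch assumes LF line endings and a plain, case-sensitive substring match.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns everything after the first blank line of a
// raw message, i.e. the part that is not a header. Assumes LF line endings.
std::string BodyOf(const std::string& rawMessage) {
  const std::string::size_type pos = rawMessage.find("\n\n");
  return pos == std::string::npos ? std::string() : rawMessage.substr(pos + 2);
}

// "Entire message" behaviour: the needle may match anywhere in the raw
// source, including header lines such as From: or Subject:.
bool MatchesEntireMessage(const std::string& raw, const std::string& needle) {
  assert(!needle.empty());
  return raw.find(needle) != std::string::npos;
}

// Body-only behaviour: the headers are dropped before matching.
bool MatchesBodyOnly(const std::string& raw, const std::string& needle) {
  assert(!needle.empty());
  return BodyOf(raw).find(needle) != std::string::npos;
}
```

With a test mail whose sender contains "mozilla" and whose body is just "test", the first function reports a hit for "mozilla" and the second does not. Note that even the body-only variant would still see the encoded text of MIME attachments, since it does no MIME parsing.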
ok, let's change the wording on this..    some suggestions

"search entire message"
"search whole message"
search Subject, Sender, Subject or Sender, "Entire Message"?
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Attached patch the fix - Splinter Review
changes the verbiage to Entire Message. David's suggestion of All of Message may
also work if we don't end up liking this.

I also moved the Virtual Folder's item in the search drop down to the bottom of
the quick search list to clean up the UI a bit.
fixed branch and trunk. 
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird0.9
looks like it has been fixed on the branch, so could somebody please add the 
fixed-aviary1.0 keyword?
(i know this is trivial but it would be useful for querying the overall number
of remaining bugs)
Keywords: fixed-aviary1.0
Verified with Windows Thunderbird build from 2004-09-29
Status: RESOLVED → VERIFIED
My apologies for taking so long to take another look at this bug.

First off, the search string doesn't occur *anywhere* in the message source, not
even in the header lines.

"Entire Message" may well be better, but it appears only for the "Quick Find"
message bar. The context menu "Search Messages..." dropdown still shows "Body".
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
minus'ing for the remaining issue of the search dialog
Flags: blocking-aviary1.0+ → blocking-aviary1.0-
Target Milestone: Thunderbird0.9 → Thunderbird1.1
Also false positives in message subjects in newsgroups with 30k messages
James M. Prange (or anyone else): do you still see this problem?
Magnus,

Yes, of course I still see this bug; did you expect it to fix itself?

This is with version 1.5.0.9.
I can't reproduce with Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.2pre) Gecko/20070116 Thunderbird/2.0b2 ID:2007011615.

Additionally, this bug created bug 271222, which is easily reproducible.
Yes I see it. For me the false results are only when I'm offline from the IMAP server. Results are a) missing messages that contain the keyword, and b) false positives.

When I'm online, the IMAP server appears to do the search and the results are correct.

I'm running 2.0.0.0 and I've had similar problems with 1.5.0.10 and 1.5.0.9.

QA Contact: twalker → general
This bug is still in 2.0.0.4. Please, please, any progress on this??? I see this thread has been running since 2004!!
client: Thunderbird 2.0.0.4/1.5.0.12
OS:     Windows XP
IMAP server: unknown

I haven't discovered any false positive results yet but the online IMAP search of message texts and subject lines fails (i.e., doesn't yield any results), when

a) the search string contains any non-US-ASCII characters, e.g. 'ß' (0xDF), 'ü' (0xFC), 'ç' (0xE7, ISO-8859),

b) the messages are UTF-8 encoded, regardless of the search string.


The offline search however seems to be correct, this is quite the contrary to Comment 18.
I just grepped through my inbox file (168MB) on a particular keyword and the grep results exactly matched the emails returned by the IMAP server when I search using 'Entire Message.' When I repeat the search disconnected from the IMAP, Thunderbird gives me the false positives and missing results as mentioned above. Sorry that we've got opposite problems!

I'm more than happy to work with any of you who program Thunderbird. I'm a regular Beta tester for a variety of software and I do some C,C++,GUI programming in my job - mostly under MSVC6 & 2005.

This problem is still in 2.0.0.5 (Windows).

Can anyone confirm that they are looking at this? Or that they even care? Thunderbird's "entire message" search is currently unusable when offline from the IMAP server.

As I said above, I'm happy to work with anyone who programs Thunderbird. e.g running tests on dev builds etc.
This problem is still in 2.0.0.6 (two different PC's running Windows XP Pro SP2).

Please, please.... the silence is deafening. I can't be the only person in the world who is having problems with this.

As I said above, I'm happy to work with anyone who programs Thunderbird. e.g
running tests on dev builds etc.
And it's still in 3.0a1pre (2007080505).
BTW: Google desktop returns the correct results so I guess the only reliable option for offline Thunderbird message searching is to install Google desktop.
please, someone (or everyone) attach a test message to the bug.
I'm sorry. What do you mean by 'attach a test message'?

To the Thunderbird developers.... would it be useful if I created an IMAP account on my website for you, and loaded the account with messages until the problem appears? You could either just sync to the account and run the search strings I send you which show the problem, or I could even send you the whole inbox file....

(In reply to comment #28)
> I'm sorry. What do you mean by 'attach a test message'?

Michael

save one or more of the messages that are a false positive with
 file > save as > file
attach to this bug by clicking "Add an attachment"
This is one of a number of false positives returned when searching for the keyword "courtney" using "Entire Message." Note that this happens when Thunderbird is offline - i.e. not connected to the IMAP server. When connected to the IMAP server, the search is performed in the IMAP server and the results are fine.
I'm seeing this, too. In my case, the false positives are every single message in the IMAP folder. No matter what I choose from the search drop-down (From, Subject, etc.) it always returns all the messages in the folder.
Assignee: mscott → nobody
Status: REOPENED → NEW
I think there is some order in when false positives are generated.

1) I have a newsgroup - it is NOT configured for offline use
2) I download a message for offline use
3) Any word that is in that message causes false positives
I've done some debug. Interesting things happen in:
mailnews/base/search/src/nsMsgLocalSearch.cpp
at nsresult nsMsgSearchOfflineMail::Search (PRBool *aDone)
function.

Something goes wrong during execution of:
err = MatchTermsForSearch (msgDBHdr, m_searchTerms, charset.get(), m_scope, m_db, &expressionTree, &match);
At this place msgDBHdr seems to be OK. I've added some messages here and they suggest that the loop really iterates through all the threads/messages. I've checked this with GetSubject() and it was changing.

What is strange is that match had random values. 0 or 1 but not depending on a real match.

So I think
mailnews/base/search/src/nsMsgLocalSearch.cpp:
nsresult nsMsgSearchOfflineMail::MatchTerms(...)
needs more debugging, but I have to go to sleep right now :-)
Debug process report, tell me if You don't want so much text :-)

mailnews/base/search/src/nsMsgSearchTerm.cpp:
nsresult nsMsgSearchTerm::MatchBody(...)

at line
nsMsgBodyHandler * bodyHan  = new nsMsgBodyHandler (scope, offset, length, msg, db);
the lines retrieved don't come from the message they should, but from some message downloaded for offline use (the News/<server>/<newsgroup> file). The retrieval does not stop at the beginning of the next message and continues through its raw headers.
The number of lines we get at:
bodyHan->GetNextLine(buf)
is equal to the length variable.
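The overrun described above can be modelled with a toy offline store. This is a sketch under assumptions, not the real `nsMsgBodyHandler`: `ScanLines` is a hypothetical function, and the store is simply a vector of lines with messages laid out back to back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the body handler: scans exactly `numLines` lines
// of the store, starting at `firstLine`, and reports whether any contains
// `needle`. Nothing here knows where the message really ends.
bool ScanLines(const std::vector<std::string>& store, std::size_t firstLine,
               std::size_t numLines, const std::string& needle) {
  assert(firstLine <= store.size());
  for (std::size_t i = firstLine;
       i < store.size() && i - firstLine < numLines; ++i) {
    if (store[i].find(needle) != std::string::npos)
      return true;
  }
  return false;
}
```

If the first message really occupies three lines but the caller passes a larger count, the scan continues into the raw headers of the next message, and a word that only appears there is reported as a hit for the first message.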
Attached patch First draft of a possible patch (obsolete) — Splinter Review
1) omit non-offline messages on nntp
2) correct lines count of a downloaded message, so that it preserves the XOVER value
And I forgot to mention, for this patch to work, You need to delete the files containing lines count of already downloaded NNTP messages.
Thanks for working on this! A few minor comments about the patch:
 - remove all the debugging info, no NS_WARNING just to print debug info ;)
 - use moz string classes where applicable, like nsCAutoString for server_type
 - initialize the nsresult rv = on the row you're using it the first time
 - space after if in | if(!strcmp(server_type ... |, though you can use server_type.LowerCaseEqualsLiteral("nntp") instead
 - consistent indentation, esp looks funky in nsNewsFolder.cpp (no tabs please, if that's what is causing it)

I didn't check, but to get reviewed the patch should apply to the latest hg source - http://developer.mozilla.org/en/Comm-central_source_code_(Mercurial)

After you made the adjustments (and the patch is working), make sure to ask review. http://developer.mozilla.org/en/docs/Getting_your_patch_in_the_tree
Assignee: nobody → wodny
Target Milestone: Thunderbird1.1 → ---
Thank You for the advisory.
I hope this one looks better.

Diff based on hg repo from Mon Sep 01 00:12:18 2008 +0100
This patch doesn't affect IMAP - one reason is that I wasn't able to replicate the bug for IMAP.
Attachment #336172 - Attachment is obsolete: true
Attachment #336410 - Flags: review?(mkmelin+mozilla)
Attachment #336410 - Flags: review?(mkmelin+mozilla)
Comment on attachment 336410 [details] [diff] [review]
First version looking good (I suppose)

You need a mailnews backend reviewer, try bienvenu perhaps?
http://www.mozilla.org/owners.html#mail-and-news-backend

- I don't know why you put the comments for m_pastHeaderLines *after*
- "!= NS_OK" isn't usually used. NS_FAILED?
Can anyone point me to a patched test build that I can test on IMAP?
Magnus Melin:
OK, I've corrected the patch, I hope You will once again take a look. If there are no further problems I will add the review flag set to bienvenu.

Michael Smithers:
I can provide You with a patched version compiled on Ubuntu 8.04.1 (32bit)
Attachment #336410 - Attachment is obsolete: true
Thanks very much. I'm on WinXP. If it's possible to run a Ubuntu exe, I'm very interested in trying. (I haven't had much luck compiling 2.0.0.16 on windows - both with Cygwin and MSVC8 w MozillaBuild tools.)
Looks ok style wise I'd say.
Attachment #336585 - Flags: review?(bienvenu)
This bug is still in 2.0.0.17. Any more news on the fix?

Thanks,
Michael
Whiteboard: [has patch] [needs review]
Comment on attachment 336585 [details] [diff] [review]
corrected once again, NS_FAILED, comment goes before commented part

thx for the patch, sorry for the delay.

I think we should be handling the offline store issue differently - MatchBody should simply not be getting hold of any body if the message isn't in the offline store - I'm curious why it's getting any data for the message at all...
Attachment #336585 - Flags: review?(bienvenu) → review-
This patch is also associated with this bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=452924
that is why it makes so many changes to code and even a class

I hope that the path found that leads to the error will be helpful.

I'm sorry but I can't see another approach now and won't have time to search for one in the nearest future. I don't want to block works by keeping myself as an assignee.
Assignee: wodny → nobody
Both IMAP and NNTP offline storage include the header lines in their line count:
http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsImapMailFolder.cpp#4188
http://mxr.mozilla.org/comm-central/source/mailnews/news/src/nsNewsFolder.cpp#1774

This is in contrast with local messages which do speak in body lines:
http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsParseMailbox.cpp#833

News also speaks in body lines, but that does not affect body searching.

The book-keeping lines that precede the message in the offline store do not get counted:
http://mxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgDBFolder.cpp#1567

The provided patch makes nsMsgBodyHandler change its understanding of the line count provided to it based on whether the message header is offline or not.  This stops us from reading into the next message(s) in the offline store (and producing a false positive if the headers/body of that message match.)  This is a pragmatic solution since it avoids having to blow everything away, but it's sad that there's the asymmetry in the use of the property.
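A minimal sketch of that asymmetry, assuming a store modelled as a vector of lines: `BodyContains` and its `countIncludesHeaders` parameter are hypothetical names, and the real handler differs in detail. The only point illustrated is that the same stored line count leads to different stopping points depending on whether header lines are charged against it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a body search over one message in an offline store.
// `lineCount` is the stored line count for the message. When
// `countIncludesHeaders` is true (the offline IMAP/NNTP case described
// above), header lines consume the budget; otherwise only body lines do.
bool BodyContains(const std::vector<std::string>& store, std::size_t firstLine,
                  std::size_t lineCount, bool countIncludesHeaders,
                  const std::string& needle) {
  assert(firstLine <= store.size());
  std::size_t remaining = lineCount;
  bool pastHeaders = false;
  for (std::size_t i = firstLine; i < store.size() && remaining > 0; ++i) {
    const std::string& line = store[i];
    if (!pastHeaders) {
      if (countIncludesHeaders)
        --remaining;               // header lines are part of the count
      if (line.empty())
        pastHeaders = true;        // blank line ends the header block
      continue;                    // headers are never matched
    }
    --remaining;
    if (line.find(needle) != std::string::npos)
      return true;
  }
  return false;
}
```

For a three-line message (header, blank line, one body line) followed by a second message, treating the count as body lines reads two lines too far and can match the second message's headers; treating it as a total stops at the message boundary.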
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Attachment #389659 - Flags: superreview?(bienvenu)
Attachment #389659 - Flags: review?(bienvenu)
this patch looks fine, but I'm not seeing any hits on an "entire message" quick search in my imap folder with or w/o this patch. So I'm trying to figure that out...
ok, the no hits problem was due to a patch I had in my tree. Things look better w/ this patch, but I'm finding that I'm not getting hits on the last line of a message - perhaps this patch exposes an issue where we're not always looking at the last body line.
yeah, so we're not including the x-mozilla-status lines we add in the line count, so we're off by 2. We may be off by 3, actually, because I think we're not counting the From header either. So I think we need to account for these issues as well, to avoid false negatives.

I wonder if we can't use the offline msg size somehow, which appears to be more accurate than the line count...
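The idea of bounding the search by the offline message size rather than by a line count could look roughly like this. It is a sketch of the idea only: `MatchesWithinBytes` is a hypothetical function, the store is a single string, and the sketch matches against headers as well as body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical byte-bounded search: only the `size` bytes starting at
// `offset` are considered, so an inaccurate line count cannot cause the scan
// to stop early or to run into the next message.
bool MatchesWithinBytes(const std::string& store, std::size_t offset,
                        std::size_t size, const std::string& needle) {
  assert(offset <= store.size());
  // substr clamps the length if offset + size runs past the end of the store.
  const std::string message = store.substr(offset, size);
  return message.find(needle) != std::string::npos;
}
```

With an accurate byte size, the last line of the message is always inside the searched range and the first line of the following message never is.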
Attachment #389659 - Flags: superreview?(bienvenu)
Attachment #389659 - Flags: superreview+
Attachment #389659 - Flags: review?(bienvenu)
Attachment #389659 - Flags: review+
Comment on attachment 389659 [details] [diff] [review]
v1 make nsMsgBodyHandler understand that offline messages have line counts including the header

how would you feel about nsMsgBodyHandler starting like this:

nsMsgBodyHandler::nsMsgBodyHandler (nsIMsgSearchScopeTerm * scope, PRUint32 offset, PRUint32 numLines, nsIMsgDBHdr* msg, nsIMsgDatabase * db)
{
  m_scope = scope;
  m_localFileOffset = offset;
  m_numLocalLines = numLines;
  PRUint32 flags;
  m_lineCountInBodyLines = NS_SUCCEEDED(msg->GetFlags(&flags)) ?
    !(flags & nsMsgMessageFlags::Offline) : PR_TRUE;
  // account for added x-mozilla-status lines, and envelope line.
  if (!m_lineCountInBodyLines)
    m_numLocalLines += 3;
  m_msgHdr = msg;
  m_db = db;
 
that fixes the last line search problem for me.  If that's OK, then r/sr=me...
pushed:
http://hg.mozilla.org/comm-central/rev/a82b10cd7416

Yes, that is fine with me.  Once the IMAP tests get cleaned up it will be easier to unit test this.  sid0 has foolishly volunteered to clean them up.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago → 15 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [has patch] [needs review]
Target Milestone: --- → Thunderbird 3.0b4
Ok, I am a non-programmer and I don't find any of this helps me.
I really need this function to work or I have to get another email program.
Can someone create an exe file so I can apply the fix?
Flags: in-testsuite?