Bug 383895 - Thunderbird Message Body/fulltext search or filter is painfully slow compared to Outlook on folders with lots of messages (both "quick search" and "search messages")
Status: NEW
Keywords: perf
Product: Thunderbird
Classification: Client Software
Component: Search
Version: 3.0
Hardware: All All
Importance: -- normal with 3 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Duplicates: 380898

Reported: 2007-06-09 20:26 PDT by domido
Modified: 2016-01-27 16:00 PST
CC List: 10 users
vseerror: needinfo? (ReubenGarrett)
vseerror: needinfo? (twentyex)



Description domido 2007-06-09 20:26:03 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Build Identifier: 2

I have many, many messages (thousands) in the local Drafts folder and in some other folders under the "Local Folders" node.

I usually search on "Entire Message", and it takes 20 seconds.
In Outlook it took 2-3 seconds.

It is strange, since a simple instr()/strstr() should do it, no?
If I open the Drafts file in TextPad and search there, it takes less than 1 second.
So when I search on "Entire Message" in Thunderbird, it should do something simple like instr()/strstr(). So why does it take so long?

(compacting did not help).

Is there anything I can do to make it faster, such as installing an add-on?

Please look here too:
http://forums.mozillazine.org/viewtopic.php?p=2920716#2920716

Reproducible: Always

Steps to Reproduce:
1. Create a folder with ~10000 messages.
2. Search for "aaa" using the "Entire Message" option.
3. It takes too much time.
Comment 1 domido 2007-06-10 14:15:02 PDT
The Drafts file is ~30 MB: 95% of it is text, 5% JPGs.
That's it, no other formats.

Thanks.
Comment 2 WADA 2007-06-10 18:35:04 PDT
Possibly the result of searching for the word in a huge binary attachment.

Bug 379988 indicates that the search covers the whole mail payload (all lines after the empty line that follows the mail headers).
> 379988 – search body should not match words in MIME headers
And Bug 37031 and Bug 132340 possibly apply to the image/jpeg part.
> 37031 – searching message body yields false positives because base64 encoded binary attachments are treated as plaintext
> 132340 – Local body search does not work if the body is encoded as Base64

(Q1) Do your mails have large binary attachments?
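To illustrate the concern raised through bug 37031 and bug 132340, here is a minimal Python sketch (not Thunderbird code) of why a plain strstr() over the stored mbox bytes is not the same as searching the message text. A body stored with base64 transfer encoding does not contain its own words until it is decoded. Only the standard `base64` module is used, and the function names are illustrative.

```python
import base64


def raw_contains(payload: str, needle: str) -> bool:
    """strstr()-style search over the payload exactly as stored in the mbox."""
    return needle in payload


def decoded_contains(b64_payload: str, needle: str, charset: str = "utf-8") -> bool:
    """Decode a base64 body part first, then search the resulting text."""
    text = base64.b64decode(b64_payload).decode(charset)
    return needle in text


# A text/plain body sent with "Content-Transfer-Encoding: base64".
body = "Meeting notes: the aaa review is on Friday."
stored = base64.b64encode(body.encode("utf-8")).decode("ascii")

# The base64 alphabet contains no spaces, so a phrase with a space can never
# match the undecoded payload, even though the message text contains it.
print(raw_contains(stored, "aaa review"))      # False
print(decoded_contains(stored, "aaa review"))  # True
```

The reverse case also occurs. The base64 data of a JPEG can contain a short search term by chance, which produces the false positive described in bug 37031.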
Comment 3 domido 2007-06-14 17:12:05 PDT
No.

As I wrote:
The Drafts file (the file I search) is ~40 MB: 95% of it is text, 5% JPGs.
That's it, no other formats.
It contains something like 10-15 images of about 100 KB each; everything else is text messages.

You can reproduce this.
Just make a text message of ~5 KB, duplicate it until the file reaches ~40 MB, and search for some text with "Entire Message".
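The reproduction recipe in comment 3 can be scripted. Below is a minimal sketch using Python's standard `mailbox` module. The file name `Drafts`, the sender address, and the filler text are placeholders, and the output is an ordinary mbox file rather than anything Thunderbird-specific.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

# 45-character line; short lines let set_content() keep a plain 7bit body.
LINE = "The quick brown fox jumps over the lazy dog.\n"


def build_test_mbox(path: str, target_bytes: int, body_bytes: int = 5 * 1024) -> int:
    """Append copies of a ~5 KB text message until about target_bytes of
    body text has been written. Returns the number of messages added."""
    body = LINE * max(1, body_bytes // len(LINE))
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        count = 0
        while count * len(body) < target_bytes:
            msg = EmailMessage()
            msg["From"] = "tester@example.com"
            msg["To"] = "tester@example.com"
            msg["Subject"] = f"Test message {count}"
            msg.set_content(body)
            box.add(msg)
            count += 1
        box.flush()
    finally:
        box.close()
    return count


if __name__ == "__main__":
    # Small demo size; use about 40 * 1024 * 1024 to match the report.
    out = os.path.join(tempfile.mkdtemp(), "Drafts")
    n = build_test_mbox(out, target_bytes=200_000)
    print(n, "messages,", os.path.getsize(out), "bytes")
```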

I can understand that Thunderbird does some parsing there, but searching for a word in a 30-100 MB file should not take so long, no?

Thanks.
Comment 4 domido 2007-06-14 17:17:57 PDT
Just to clarify, I am not searching in one message using Ctrl+F.
I use the search box on the lower right side with the "entire message" filter.

Comment 5 domido 2007-06-14 17:19:18 PDT
Just to clarify, I am not searching in one message using Ctrl+F.
I use the search box in the *UPPER* right side with the "entire message" filter.

Comment 6 WADA 2007-06-14 19:01:47 PDT
(In reply to comment #5)
> I use the search box in the *UPPER* right side with the "entire message" filter.

The long search time looks to be simply the result of total_search_time = (mm sec per mail) * (NN mails), where "mm" is larger than in IE.

I tested with the following folders (sizes and counts are approximate).
  Mbox1:  60MB(10000 mails),             Mbox2: 120MB(20000 mails)=Mbox1+Mbox1,
  Mbox3: 240MB(40000 mails)=Mbox2+Mbox2, Mbox4: 480MB(80000 mails)=Mbox3+Mbox3
  (Mbox1 is a storehouse of past Junk mails, and many are text mails)
The first hit was displayed very quickly, each following hit was displayed as it was found, and the search continued to the end.
Total time is roughly Mbox1: N sec, Mbox2: N*2 sec, Mbox3: N*4 sec, Mbox4: N*8 sec.
In my environment and with the tested mails, N was around 10 sec, so N*4 comes to around 1 minute.
No increase in virtual memory size was observed while searching (it stayed at one size).
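The model in comment 6 can be restated as a worked estimate. The per-message cost m below is derived here from the rough figures above, not measured:

```latex
\begin{aligned}
T(n) &\approx m\,n && \text{(search time grows linearly with the message count } n\text{)}\\
n_k &= 2^{k-1} n_1 \;\Rightarrow\; T_k \approx 2^{k-1} N && (N = T_1,\ \text{each Mbox}_k \text{ doubles Mbox}_{k-1})\\
m &\approx \frac{N}{n_1} = \frac{10\ \text{s}}{10\,000} = 1\ \text{ms per message}\\
T_3 &\approx 4N \approx 40\ \text{s}, \qquad T_4 \approx 8N \approx 80\ \text{s}
\end{aligned}
```

Eight times the messages costs eight times the time. That is linear growth, not exponential growth, which is the point of the "*8, not **8" remark in comment 14.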

> Is there anything i can do to make it faster?

If this were an SQL DB, indexing would be the first and most effective solution, but a mail folder file is a plain text file, and there is no index of words or text in the mail DB...
I can't imagine any solution other than "improving the performance of text search in a single mail". This may be achievable with an extension.
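For comparison with the indexing WADA mentions, here is a minimal inverted-index sketch in Python. It illustrates the general idea (word to set of message ids, built once, queried many times) and does not reflect gloda's actual schema or tokenizer. `InvertedIndex` and `tokenize` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Lower-cased word tokens; a real indexer would do much more."""
    return [t.lower() for t in TOKEN.findall(text)]


class InvertedIndex:
    """Maps each word to the set of message ids whose body contains it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, msg_id: int, body: str) -> None:
        # Paid once per message at indexing time, not on every search.
        for word in tokenize(body):
            self.postings[word].add(msg_id)

    def search(self, query: str) -> Set[int]:
        """Ids of messages containing every query word (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "Quarterly report attached")
    idx.add(2, "Lunch on Friday?")
    idx.add(3, "Report draft for Friday")
    print(sorted(idx.search("report friday")))  # [3]
```

This design moves the cost from search time to indexing time. That trade-off shows up in comment 29, where building the gloda index for a very large folder took many hours.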
Comment 7 domido 2007-06-16 08:04:00 PDT
Hi WADA.
Maybe there is something to improve in the way it does the search.
For example, try to search for a string that does not exist.
It should take you N.

Now open that mail file in TextPad and search for that string.
You will see that it takes N/C, where C is quite "big".

I know that Thunderbird does more than a simple instr, but if I am searching "Entire Message" and not for an email address/sender... then I do not understand why it takes so long.

Why not do something like this:
---
loop://on all the emails in the file, starting with iStart=1
i=instr(iStart,sEmails,sSearch)
if i==0 then exit function
//found an email - show it
ShowResults(ExtractEmail(sEmails,i))
iStart=i+len(sSearch)//continue searching after this hit
end loop//search for another email
-----
?

 
Comment 8 domido 2007-06-16 08:10:01 PDT
In addition to my last comment:
Thunderbird is quite "new", which is why this problem is not widespread yet, but it will be in the future. Since the day I started using it I have reached 40 MB of mail that I need to keep, and you have ~500 MB; what will happen next year?
If possible, the "Entire Message" search needs to be improved.

I am not an expert in the Thunderbird code. I once downloaded the sources, saw how many files there are, panicked and left it, but maybe someone who reads this can improve it or write an extension before this problem becomes widespread.

Comment 9 domido 2007-06-27 11:41:13 PDT
Hi.

I am not sure how things work here; I am new to this Bugzilla thing.
Will anyone ever work on this issue, or will the problem I posted here just join the thousands of other bug reports?

Thanks.
Comment 10 Wayne Mery (:wsmwk, NI for questions) 2007-07-22 17:40:12 PDT
WFM version 3.0a1pre (2007071705) 
<5sec for 44MB local folder with many binary attachments.
<20sec for several folders of 20k messages.

What file size does *Windows* report for both the Outlook folder and the Thunderbird folder? Your Thunderbird folder will be in the directory
C:\Documents and Settings\<username>\Application Data\Thunderbird\Profiles\<something>.<profilename>\Mail\Local Folders
Comment 11 WADA 2007-08-07 04:08:36 PDT
(In reply to comment #10)
> <20sec for several folders of 20k messages.

To Wayne Mery: 
The bug opener's key claim is:
  If there is only one mail that matches the search criteria, and that mail is the last
  mail in search order, it'll take 20 seconds to get the found mail.
  But it'll take only a few seconds with IE.
Comment 12 Wayne Mery (:wsmwk, NI for questions) 2007-08-07 09:15:35 PDT
Is it fair to compare Thunderbird search to a text-file search, or to Outlook, what with MIME encoding and such?


(In reply to comment #6)
> (In reply to comment #5)
> > I use the search box in the *UPPER* right side with the "entire message" filter.
> 
> "take long" looks to be simply a result of total_search_time=(mm sec per a
> mail)*(NN mails), and "mm" is larger than IE.
> 
> I tested with next folders (size/number is rough/about one).
>   Mbox1:  60MB(10000 mails),             Mbox2: 120MB(20000 mails)=Mbox1+Mbox1,
>   Mbox3: 240MB(40000 mails)=Mbox2+Mbox2, Mbox4: 480MB(80000 mails)=Mbox3+Mbox3
>   (Mbox1 is storehouse of Junk mails in the past, and many are text mails)
> First hit was displayed very shortly, and following hits were displayed on each
> hit, and continued searching, then end.
> Tolal time is roughly Mbox1:N sec, Mbox2:N*2 sec, Mbox3:N*4 sec, Mbox4: N*8
> sec.

Wada, did you compact the folder between tests?

With my Mbox1 of 22,000 messages, the search took 12 sec. Mbox2 (~44k messages) took ~60 seconds, but after compacting, the Mbox2 search took ~26 seconds, so the increase in search time was linear.


(In reply to comment #11)
> (In reply to comment #10)
> > <20sec for several folders of 20k messages.
> 
> To Wayne Mery: 
> Bug opener's key claim is;
>   If there is only one mail which hits search criteria, and if mail is the last
>   mail in search order of mails, it'll take 20 seconds to get the found mail.
>   But it'll take only a few seconds when IE.

My timing ended when Thunderbird finished searching, not at the search hit(s).
Comment 13 domido 2007-08-07 10:26:51 PDT
As I see it:
Why not add another search option:
Entire Message With MIME, or something like that.

In this case TB would search just like you search a text file.
This would be very fast and would do the job.

What do you think?
Comment 14 WADA 2007-08-07 13:04:12 PDT
(In reply to comment #12)
> so search time increase was linear.
That is what I said. (Wayne Mery, please note that it is "*8", not "**8".)
Comment 15 WADA 2007-08-08 01:29:46 PDT
(In reply to comment #13)
> Entire Message With MIME or something like that.
What do you mean by "Entire Message With MIME"?
Comment 16 domido 2007-08-08 02:47:44 PDT
(In reply to comment #15)
> (In reply to comment #13)
> > Entire Message With MIME or something like that.
> What do you mean by "Entire Message With MIME"?
> 

As I understand it, a simple instr/strstr takes less than 1 second, BUT searching the entire message with TB takes long BECAUSE it does some MIME processing.

What I suggest is that TB offer another search option that does this:

---
loop://on all the messages in the file, starting with iStart=1
i=instr(iStart,sEmails,sSearch)
if i==0 then exit function
//found an email - show it
AddResultsToTBListView(Extract1Email(sEmails,i))
iStart=i+len(sSearch)//continue searching from that location
end loop//search for another email
-----


This way TB will search without doing any processing.
In the LOOP it will just search for the "string" using a fast search function like strstr/instr, and when it finds one it will show it and continue searching the text file from that location.
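Here is a runnable Python version of the loop proposed in comments 7 and 16. It assumes the folder is a single mbox file in which every message begins with a line starting with "From " (the mbox convention). The scan works on raw bytes, so it deliberately skips MIME decoding, charset conversion, and case folding; that is the accuracy trade-off discussed in comment 18. `raw_scan` and `SEPARATOR` are illustrative names.

```python
from typing import Iterator, Tuple

SEPARATOR = b"\nFrom "  # each mbox message after the first starts with "From "


def raw_scan(mbox: bytes, needle: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte offsets of each message containing needle.

    Find the next hit, widen it to the enclosing message, report it, then
    resume after that message so each message is reported at most once.
    """
    pos = 0
    while True:
        hit = mbox.find(needle, pos)
        if hit == -1:
            return
        # Start of the enclosing message: the last separator before the hit.
        sep = mbox.rfind(SEPARATOR, 0, hit)
        start = 0 if sep == -1 else sep + 1
        # End of the enclosing message: the next separator after the hit.
        nxt = mbox.find(SEPARATOR, hit)
        end = len(mbox) if nxt == -1 else nxt + 1
        yield start, end
        pos = end


if __name__ == "__main__":
    sample = (
        b"From a@example.com Mon Jan  1 00:00:00 2007\nSubject: one\n\nhello aaa\n"
        b"\nFrom b@example.com Mon Jan  1 00:00:00 2007\nSubject: two\n\nnothing here\n"
        b"\nFrom c@example.com Mon Jan  1 00:00:00 2007\nSubject: three\n\naaa and aaa\n"
    )
    for start, end in raw_scan(sample, b"aaa"):
        print(sample[start:end].splitlines()[1])  # b'Subject: one', then b'Subject: three'
```

Unlike the pseudocode, this resumes at the end of the matched message instead of right after the hit, so a message with several hits appears once. A needle that itself contains a "From " line boundary is not handled.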

What do you think?
Comment 17 Wayne Mery (:wsmwk, NI for questions) 2007-08-08 05:29:05 PDT
(In reply to comment #15)
> (In reply to comment #13)
> > Entire Message With MIME or something like that.
> What do you mean by "Entire Message With MIME"?

I don't know the quick search code (and MIME may be a lame or incorrect example), but doesn't it search each individual message and adjust for the language/charset of each message?

Or, does quick search scan the entire mbox as one entity?
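Wayne's charset question can be made concrete with a small Python illustration. This is an assumption about why per-message work is needed, not a description of the quick-search code. The same word is stored as different bytes depending on each message's charset, so a single byte-level strstr() cannot match every message. The charset names are standard Python codec names.

```python
def raw_match(payload: bytes, query: str) -> bool:
    """Byte-level strstr(): implicitly assumes every message is UTF-8."""
    return query.encode("utf-8") in payload


def decoded_match(payload: bytes, charset: str, query: str) -> bool:
    """Decode with the message's own declared charset, then compare text."""
    return query in payload.decode(charset)


# The same sentence as it might be stored by two different mail clients.
text = "Viele Grüße, Herr Müller"
latin1_body = text.encode("iso-8859-1")
utf8_body = text.encode("utf-8")

query = "Müller"
print(raw_match(latin1_body, query), raw_match(utf8_body, query))  # False True
print(decoded_match(latin1_body, "iso-8859-1", query),
      decoded_match(utf8_body, "utf-8", query))                    # True True
```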
Comment 18 domido 2007-08-08 09:20:14 PDT
Not sure why it takes so long.
Maybe someone who knows the TB code can explain and solve the mystery.
Anyway, implementing a "Fast Entire Search" can be only about 90% accurate (since it uses strstr only), and when I say 90% I mean that it will work 100% for 90% of the users and less well for the remaining 10%.

It's just that today I prefer to open the mail file that contains all the messages in TextPad and do the search there rather than in TB.
Implementing the "Fast Search" using strstr would solve it.
Comment 19 domido 2007-09-12 15:10:08 PDT
Any progress?
Comment 20 ovidiu 2008-05-15 09:57:26 PDT
Can bug 380898 and bug 312282 be considered similar or related, even though they are about TB 1.5 and Linux?

Also, is this Core, not Thunderbird?
Comment 22 Andrew Sutherland [:asuth] 2009-02-22 00:28:10 PST
Right, quick search is not especially clever.  It tends to be pretty quick if you don't search the body because all of the other search modes have all the data they need in the .msf file already (which is already parsed into memory).  Body search will be slower.

We are not going to improve quick search in this regard, but gloda search (bug 474701) should resolve this use case. Not sure what is best bugzilla-wise; this bug will have no action directly taken on it, but it would be nice to convey (especially once 474701 is fixed) that this bug should no longer be a problem.
Comment 23 Wayne Mery (:wsmwk, NI for questions) 2009-02-23 06:23:42 PST
Still, why is Outlook faster? Are message bodies indexed?
Comment 24 Thomas D. (currently busy elsewhere; needinfo?me) 2009-10-13 05:33:45 PDT
Contrary to the assumption in comment #22, bug 474701 has not resolved this use case at all.

1) While "Search all messages" (using gloda) is lightning-fast, it's also very complex UI-wise:
- no find-as-you-type feedback to see if your searchwords are good enough
- two extra tabs till you get to the message you're looking for
- first results window hard to parse visually
- first results window showing only few results
- can take many clicks to get to looked-for message
- no (efficient) way of refining the search using additional search words

2) Besides, the search behaviour of "Search all messages" is not clear at all (someone yet needs to explain to me why "susann" will find "Susanne", while searching for "susan" won't, and more issues like that).

Sorry, I don't want nor have time to post separate bugs for that, maybe later.

3) "Search all messages" forces me to search all of my Thunderbird accounts, just every single folder that there is, ever. That means I will see a lot of results that are NOT what I'm actually looking for. In 90% of use cases, I DO know which folder the mail I'm looking for is in, and in 80% of use cases, it's my central inbox where I save copies of my sent mail, too. So just searching fulltext of that single folder would be sufficient and yield better results. So Quick filter does exactly the right thing, but it's painfully slow on fulltext (this bug).


Bottom line is this:

4) Lots (i.e. potentially millions) of users are likely to stick to the traditional quick search (what we now call filters), if only because they're used to it from TB2, and it's A LOT easier UI-wise, as shown in 1).

5) It follows from 1-3 that this bug (and others) will continue to be a major problem for a potentially very big number of users. -> confirming.

6) Although the symptoms are the same, this bug is NOT a duplicate of Bug 513247 - Search in message bodies is extremely slow (should check fast criteria like date first). That bug addresses only part of the problem, namely that we are body-searching messages that we shouldn't even search because they are already excluded by other search criteria, which we should check first. So that one is about more intelligent handling of multiple search parameters.
This one, by contrast, is only about the slow speed of body searches themselves.
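Bug 513247, as summarized in item 6, is about ordering: test the cheap criteria first (which, per comment 22, come from summary data already in memory) and only then pay for a body search. Here is a minimal Python sketch of that short-circuit. `Summary`, `search`, and `body_contains` are hypothetical stand-ins rather than Thunderbird APIs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass
class Summary:
    """Hypothetical per-message summary data that is already in memory."""
    msg_id: int
    sender: str
    sent: date


def search(summaries: Iterable[Summary],
           cheap_tests: List[Callable[[Summary], bool]],
           body_contains: Callable[[int], bool]) -> List[int]:
    """Return ids passing every test, reading bodies only for survivors."""
    hits = []
    for s in summaries:
        # all() stops at the first cheap test that fails...
        if not all(test(s) for test in cheap_tests):
            continue
        # ...so the expensive body check runs only for messages that passed.
        if body_contains(s.msg_id):
            hits.append(s.msg_id)
    return hits


if __name__ == "__main__":
    bodies = {1: "invoice attached", 2: "invoice for May", 3: "lunch?"}
    reads = []

    def body_contains(msg_id: int) -> bool:
        reads.append(msg_id)  # record which bodies were actually read
        return "invoice" in bodies[msg_id]

    summaries = [
        Summary(1, "a@example.com", date(2009, 1, 5)),
        Summary(2, "b@example.com", date(2009, 10, 1)),
        Summary(3, "b@example.com", date(2009, 10, 2)),
    ]
    recent = [lambda s: s.sent >= date(2009, 6, 1)]
    print(search(summaries, recent, body_contains))  # [2]
    print(reads)                                     # [2, 3]
```

As item 6 points out, this only reduces how many bodies are read. The cost of each body search, which is the subject of this bug, is unchanged.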

7) Proposed fix for this bug:
- Most needed: Make quick filters "Search Message body" and "Search Entire Message" (Bug 271222) gloda-enabled (which is possible, looking at Bug 380898, comment #4 by Andrew Sutherland).
- Make fixing this bug high priority, if not for TB3, then immediately after.
- In the long run, make all quick filters gloda-enabled.

Comment #23 should be a good question to start with, for added motivation.
Comment 25 Thomas D. (currently busy elsewhere; needinfo?me) 2009-10-13 05:40:56 PDT
*** Bug 380898 has been marked as a duplicate of this bug. ***
Comment 26 ovidiu 2009-10-19 12:12:30 PDT
> (someone yet needs to explain to me why "susann" will find "Susanne", while
> searching for "susan" won't, and more issues like that).
> 
> Sorry, I don't want nor have time to post separate bugs for that, maybe later.

bug 523183
Comment 27 Wayne Mery (:wsmwk, NI for questions) 2009-12-31 06:05:06 PST
slow <> major, notwithstanding comparison to other products. 
https://bugzilla.mozilla.org/page.cgi?id=fields.html#importance
Comment 28 WADA 2010-05-08 09:44:00 PDT
domido (bug opener), a solution, Gloda (Global Search and Indexer), is already available in the official release of Tb. Do you still see "slowness in body text search of local mail folders" with Tb 3.0?

Note: If you use a Roaming Profile, auto-sync for IMAP is enabled (enabled by default upon first use of Tb 3.0), offline use is on for IMAP folders (set by default upon first use of Tb 3.0), and you have big IMAP folders, a "too big Roaming Profile" problem happens due to big offline-store files. Disable it under Synchronization & Storage for each IMAP account ([ ] Keep messages for this account on this computer).
Comment 29 :aceman 2011-10-25 03:18:40 PDT
I have set up a test folder with about 600,000 messages totaling 2.1 GB. I will do some tests of search speed with and without gloda. But creating the gloda index has already taken about 10 hours and is 30% finished. So I'll look into it in several days :)
Comment 30 Wayne Mery (:wsmwk, NI for questions) 2011-10-25 04:52:51 PDT
(In reply to aceman from comment #29)
> creating gloda index already took about 10 hours and is 30% finished. So
> I'll look into it in several days :)

aceman, indexing will be faster with Daily trunk builds as of 10/24, with the new gloda patches (a schema change will force a full reindex if you use an old profile). And, if you are compiling your tbird, the undelivered patch in bug 585429 should get you a modest speed improvement. (Your PC isn't going to sleep? :)  )

I suspect domido (the reporter) is gone.
Comment 31 :aceman 2011-10-25 05:10:59 PDT
(In reply to Wayne Mery (:wsmwk) from comment #30)
> aceman, indexing will be faster with 10/24 Daily trunk as of 10/24 builds -
> new gloda patches (schema change will force full reindex if using an old
> profile).
I am testing on TB8 so far.

> And, if you are compiling your tbird, undelivered patch in bug
> 585429 should get you a modest speed improvement.
I do compile it (to test some patches) but do not install/run it afterward. I didn't expect it to take 2 GB of disk and 1.2 GB of RAM :) But yes, I must get around to running the trunk; I just can't do everything at once.

>(your PC isn't going to sleep? :)  )
Actually it is. I closed TB, and at the next start it continued indexing where it had stopped. A good test by itself :)
Comment 32 twentyex 2013-05-31 13:34:47 PDT
I find it still painfully slow.
Comment 33 martin.monperrus 2013-07-08 01:46:24 PDT
It was very confusing for me.
 
Since Gloda works well, one could:
- either use Gloda when "Body" is selected
- or remove "Body" from the quick filter bar.
Comment 34 Wayne Mery (:wsmwk, NI for questions) 2016-01-27 16:00:34 PST
(In reply to twentyex from comment #32)
> I find it still painfully slow.

twentyex, Reuben,

Using a current version, what kind of search times do you see, on what hardware and folder size?
