Closed Bug 1584558 Opened 5 months ago Closed 5 months ago

Search Messages with criteria "message body contains" is slow (5 minutes) and makes TB unusable

Categories

(Thunderbird :: Search, defect)

defect
Not set

Tracking

(thunderbird_esr6870+ fixed, thunderbird70 fixed, thunderbird71 fixed)

RESOLVED FIXED
Thunderbird 71.0
Tracking Status
thunderbird_esr68 70+ fixed
thunderbird70 --- fixed
thunderbird71 --- fixed

People

(Reporter: bsh, Assigned: jorgk-bmo)

Details

(Keywords: perf)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0

Steps to reproduce:

Searched my archived emails in a local folder, looking for email body containing a word.

Actual results:

The search took like five minutes, and while it ran, I could not open any of the found messages, and could not even use TB while the search was running, it responded extremely slowly (like, it took a minute to just switch to the main window...)
CPU usage was very low on just one core, so no idea why it made the whole application so slow.

Expected results:

On 60.9 and earlier versions, the same search takes like 15 seconds, and while it is searching I can still use TB in the main window and I can also open the found messages immediately, no problem.

  1. Does this happen for both words that are common hits (like hundreds) and words that are uncommon (one or two hits)?
  2. If it happens for both...
    2a. start win7 in safe mode https://support.microsoft.com/en-us/help/17419/windows-7-advanced-startup-options-safe-mode#start-computer-safe-mode=windows-7 and then start Thunderbird. Do you stlil see the problem?
    2b. Do Help > Troubleshooting > open folder for the profile > how big is global-messages-db.sqlite?
Flags: needinfo?(bsh)

Sorry, I forgot to add that I can not provide any more info because I switched back to TB 60.9 due to way too many bugs.

2b. Do Help > Troubleshooting > open folder for the profile > how big is global-messages-db.sqlite?

Global-messages.db is still there though, and it is 1 245 184 bytes. Did you move to sqlite, like firefox did a while ago? Because FF is also f'in slow when I try to delete the history and behaves the same way as TB did with this version: deleting a month's worth of history cripples FF for a good five minutes... wonderful...

Flags: needinfo?(bsh)
Keywords: perf

We've been sqlite for a decade. But there's no point comparing to Firefox - but usecase and code for Thunderbird is completely different. Also, 1.3gb global-messages-db is an OK size if you are running on modern hardware, eg. not some slacker 3600rpm laptop disk.

But your comment about Firefox suggests something on your system interfering with disk IO - like antivirus.

You cannot make a guess at question 1 based on the word(s) you used?

Flags: needinfo?(bsh)

(In reply to Wayne Mery (:wsmwk) from comment #3)

We've been sqlite for a decade. But there's no point comparing to Firefox - but usecase and code for Thunderbird is completely different. Also, 1.3gb global-messages-db is an OK size if you are running on modern hardware, eg. not some slacker 3600rpm laptop disk.

1.3 MEGABYTE, not gigabyte. The archive is of 15+ GB though, although ordered in yearly subfolder structures, and i've been searching for a mail in only last year's subfolder. The word i was searching for is very uncommon. there were like 5 hits in total.

But your comment about Firefox suggests something on your system interfering with disk IO - like antivirus.

I am running firefox on dozens of systems, like this one here is a i7 4771 with 32GB ram and a samsung 850 pro ssd, at home i have a 7700k with 32GB too and same ssd, and dozens of other computer at the office. all of them are different systems but with good hardware, and firefox history deletion is stupid slow on all of them, no exceptions, it beahves the same on all systems, i always wondered why no one ever reported this bug.

You cannot make a guess at question 1 based on the word(s) you used?

ok i repeated the search now on TB 60.9: took less than 10 seconds and there are 12 hits out of 3317 messages.

Flags: needinfo?(bsh)

It does sound then like the results should be quick.

To be clear, you were using "Search" (ctrl+K) and not Filters (ctrl+shift+K) and not Search Messages (ctrl+shift+F)?

I've been using "search messages" (right click menu item on the tree browser) with criteria "message body contains".

This is about "body search", not gloda. Doing a body search on my bugmail folder with 145000 messages takes 40 seconds the first time and 13 seconds the second time when the OS has cached the file. I don't think there would be a difference between TB 60 and TB 68, but I haven't tried. You could try doing the exact same search after a restart of the PC since, as I said, search times depend heavily on whether the file was cached by the OS or not. For reference: My bugmail mbox file is 600 MB big and has 14 million lines. Every body search will read through the entire file.

I have no idea what gloda is or where it comes from.
The archive folder i've been searching has 4 mbox files, 3.8 gigs total. I have repeated the search on a fresh restart and it again took 10 seconds on 60.9.

So the archive you searched was what size, 3-4gb?

I tested 68.1.1 on my local mail of 3.5gb of a few hundred folders and took less than 1 minute and didn't block the UI. Also EVO 850 SSD. In short, works for me and so I'd say something is off on your system. When you update again to 68, please test with Windows started in safe mode.

Status: UNCONFIRMED → RESOLVED
Closed: 5 months ago
Component: Untriaged → Search
Resolution: --- → INCOMPLETE
Summary: Searching Emails is extremely slow and makes TB unusable → Search Messages with criteria "message body contains" is slow (5 minutes) and makes TB unusable

GLOda is GLObal search, the first of the three search methods I mentioned, and the one that uses global-messages-db.sqlite

I have no idea what gloda is or where it comes from.

Gloda is the Global indexer that indexes all messages into the global-messages-db.sqlite database. You can turn the feature on/off in the options or on a per-folder basis.

I think we should at least investigate whether TB 60 is faster than TB 68. I can't see why, but for piece of mind, don't you think, Wayne?

Flags: needinfo?(vseerror)

Per my test, it's working fine. So we'd only investigate if reporter retests with Windows in safe mode.

Flags: needinfo?(vseerror)

OK, I tried my large bugmail folder on TB 60 and TB 68 and saw similar body search times. About 35 sec on first search, 14 sec on subsequent one.

Ok I have recorded a video on 60.9, then reinstalled 68.1.1 and recorded that too. But it contains private information so I wouldn't share here. If interested, send me an email or something.
Then I also tried safe mode (where I couldn't use OBS so no video): it was indeed faster but still much slower than 60.9 and the UI didn't respond while the search ran.

A few things here:

Send me a link to the video in a private message.

Performance analysis and identifying bottleneck is an art/skill/science to itself. Sadly we don't have the manpower to dedicate to a detailed analysis here. In general, the backend for that search hasn't changed much at all between TB 60 and TB 68. It's possible that there were some smaller changes that weren't backported to TB 60, but looking at it, even things like bug 1360803 were backported.

We've also changed the visual feedback during the search. We just displayed some yellow background, now there is a spinner that rotates. That might also eat CPU time. You can disable that spinner in a personal userChrome.css file. I have some customised settings and I use:

#threadTree[filterActive="searching"] {
  /* background: linear-gradient(#ffc, #ffc); */
  background-color: #ffffcc !important;
}

#threadTree[filterActive="matches"] {
/*  background: linear-gradient(#f0ffff, #f0ffff); */
/*  background-color: #f2f9fc; */
    background-color: #f0ffff ! important;
}

#threadTree[filterActive="nomatches"] {
  background: repeating-linear-gradient(-45deg, #fff0f4,
              #fff0f4 5px, white 5px, white 10px) !important;
}

The interesting part is the [filterActive="searching"]. I think you can get rid of the spinner by using background: linear-gradient(#ffc, #ffc); which is what I have commented out above since I like the rotating graphic.

Does that give any improvement?

Other than that, I can just suggest that you download debug versions of TB 60 and TB 68 (containing symbols) and use some profiling tools to find out where the time is spent.

Sorry, we don't have a better answer right now.

(In reply to Jorg K (GMT+2) from comment #15)

Send me a link to the video in a private message.

I don't know how to send private messages on this "forum", so I've sent you and email to your email address shown here. Did you mean that? And did you get it?

Since when does Search Messages (non-gloda search) have a spinner? I've never seen it - behind a preference?

The profile tool of choice is Gecko Profiler

  1. Use Thunderbird 68 or newer
  2. Install profiler add-on into thunderbird - get the add-on file from https://github.com/firefox-devtools/Gecko-Profiler-Addon/blob/master/gecko_profiler.xpi?raw=true and in Tools > add-ons click the gear to install add-on from file
  3. Follow instructions at https://profiler.firefox.com/ Also see videos, which are based on Firefox but equally applicable to Thunderbird.
  4. Create a profiler URL and post it here, along with a description of what/how you tested.

Yes, I meat that, I got it and I watched it ... and I won't share it ;-)

I was always using the Quick Filter Bar for the searches, but I can see that this is using the search window. So I did the same: Searched for Wayne (my favourite search) in the body. All worked nicely, the results were rolling in and at the same time I could open them. Nothing I'd do since I open messages in tabs and that brings the "big" main window to the foreground and I lose focus of the search window. I tried both TB 60 and TB 68. During the search, neither version is super responsive, but messages do open while the search is going on. I definitely don't experience what I saw in the video for TB 68 I received. Are you using any add-ons?

@Wayne: Sorry, the QBF search in the main window displays a spinner these days, can you see it? I was on the wrong boat.

I am using "Import/Export Tools", "Remove Duplicate Messages (alternate)", and "Signature Switch" addons, but when I upgrade (lol) to TB, all of these are disabled due to incompatibility with the new version. So the answer is: no active addons for TB 68.

There are compatible versions of "Signature Switch" and "Import Export Tools NG" available at ATN. I'm using them myself.

Let me repeat: I don't experience with TB 68 what I can see in your video. For me, TB 60 and TB 68 behave approximately the same.

I'm glad it works for you, but it doesn't help me at all, because for me it does not work (as you've seen) and the only difference is TB itself. The bug shouldn't be closed just because it doesn't affect you.
Oh and test on windows 7 as well, because it is not dead yet. (I'm referring to the missing window borders bug here, apparently NONE of the developers ever tried to run a RELEASE on windows 7?! that is just tragically ridiculous.)
So I'm back on 60.9 and I'm staying there. this is tooooo buggy.

Your report is appreciated, but we cannot address something we cannot reproduce. Comment 17 would give us exactly what is needed.

And your presumption about people not testing on windows 7 is entirely incorrect. I use win7 and have been using versions newer than 60 for almost a year. Plus we have several thousand daily users of beta on windows 7.

Hope this helps:
https://perfht.ml/2nDM2cD

what's up with that vc runtime memcopy taking ages? strange.
how do I do a profile using TB 60.9? Can't seem to find the icon for the profiler anywhere, so I can't start it. Customize menu only has an icon for Remove duplicates addon, but not for the profiler.

Interesting result. So it spends a lot of time in nsMsgBodyHandler::Base64Decode() which hasn't changed in ages and in there in the nsTSubstring::Replace() function that comes from the Mozilla platform. The bulk of the time is spent in memmove(). The actual reading through the file in nsFileInputStream::ReadLine() takes comparatively little time.

Unless nsTSubstring::Replace() is dramatically worse now, which I don't observe, there seems to be an issue with your vcruntime140.dll. Note that TB 60 is using the same runtime.

Jorg, I find it odd that we're not seeing any indication in the profile of file read activity in the OS. Or am I missing something?

What antivirus are you running?
Curious - is each your search iterating strictly over one folder? (i.e.you are not including subfolders)

(In reply to Janos Kövari from comment #14)

Then I also tried safe mode (where I couldn't use OBS so no video): it was indeed faster but still much slower than 60.9 and the UI didn't respond while the search ran.

Please in the future time your testse and provide numbers of (+/- ten) seconds so that we can accurately compare example tests.

Jorg, I find it odd that we're not seeing any indication in the profile of file read activity in the OS. Or am I missing something?

Comment #25 said: The actual reading through the file in nsFileInputStream::ReadLine() takes comparatively little time.

(In reply to Wayne Mery (:wsmwk) from comment #26)

Jorg, I find it odd that we're not seeing any indication in the profile of file read activity in the OS. Or am I missing something?

What antivirus are you running?
Curious - is each your search iterating strictly over one folder? (i.e.you are not including subfolders)

(In reply to Janos Kövari from comment #14)

Then I also tried safe mode (where I couldn't use OBS so no video): it was indeed faster but still much slower than 60.9 and the UI didn't respond while the search ran.

Please in the future time your testse and provide numbers of (+/- ten) seconds so that we can accurately compare example tests.

avast business antivirus pro (this is my work computer)
this is the structure: Local folders -> Archive folder, and then in the Archive there are subolders for each year (from 2005 up to 2018). Then in each of those yearly subfolders, there are two subfolders: mailaccount1 and mailaccount2. then in both of them, there are two subfolders: incoming and sent items.
when I was searching, i was searchin the entire "2018" folder including its subfolders. that's the ~3.5 GB total size, ~3300 emails, 4 mbox files. The bulk of it is in my work mail account sent item, many emails have 10+ meg attachments.
while i was testing in safe mode, the search took about 1 minute or so, while the ui was unresponsive. this is 5 times faster than normal, but 5 times slower than with TB 60.9.
there will be no future testing. I need to use this damn thing. I'm back on 60.9.

And I'm still convinced that this is a bug in TB itself. Everything points towards it.
While Antivirus can certainly affect performance, but it is the same antivirus with both 60.9 and 68.1.1, so why would it not slow down 60.9 but only 68.1.1? Also running in safe mode without antivirus still shows that 68.11 is much slower than 60.9.
The vcredit memcopy thing is certainly peculiar, but if TB 60.9 (and certainly many other program written in vs) is using the same file, why only 68 is affected? Something must have changed in TB that would make it slow.

Oh one thing I forgat: in that capture, I had to increase the record lenght and buffer sizes in the profiler addon. I also cancelled the search after about 30 seconds to stay in the 60s record window. (but since the UI did not respond to the abort, only much later, I was lucky to have stayed within that 60s. so the capture is not the full search, only part of it.

Attached image profiler.png

OK, I think we need a more scientific approach here. When I was saying that TB 60 and TB 68 behave approximately the same for me, I was testing on a 600 MB folder purely of bugmail here from BMO. All messages are plaintext, QP encoded, UTF-8. There is not a single base64 encoded message in the folder. So my results are not related to yours and I assume Wayne's results are also not representative. So I suggest this:

  1. I'll make that folder available to you and you can compare the performance. I'll do that now via WeTransfer
  2. If you want, you can make your "test" folder available to me, and I can see what's happening with it.
  3. We should see what may have changed in nsTSubstring::Replace or nsTSubstring::ReplacePrepInternal.

Here's a screenshot from https://profiler.firefox.com/public/96a4a92d8ebad889935288bbd1e2f068fad82c2e/calltree/?globalTrackOrder=0&hiddenLocalTracksByPid=1184-0-2-3-4-11-12-13&localTrackOrderByPid=1184-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17~&thread=0&v=4

Henri, if you look at the picture attached in comment #31, you can see that our message body search in base64 encoded messages spends an awful lot of time in nsTSubstring::ReplacePrepInternal. That function was changed between mozilla60 and mozilla68 here:
https://hg.mozilla.org/mozilla-central/rev/400d4a90fe04#l3.73

Is there any reason to assume that this is significantly slower now?

Or maybe our nsMsgBodyHandler::Base64Decode()
https://searchfox.org/comm-central/rev/266e9cc242cd0de076e85eb4aa0b8392fcb2ca01/mailnews/base/search/src/nsMsgBodyHandler.cpp#448
using PL_Base64Decode() needs an overhaul.

Re-opening the bug for further investigation.

Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(hsivonen)
Resolution: INCOMPLETE → ---

Actually, should it decode bas64 at all? those are the binary attachments, and are not part of the body (multipart). I don't want to search through the mime parts.
Maybe 60.9 didn't do that, that's why it is faster? Just guessing.

Hmm, well, bodies can be base4-encoded as well. We fixed a lot of body search issues in the last year or so, but all fixes should be equally present in TB 60 and TB 68. I've just confirmed that by comparing the 60 and 68 version of nsMsgBodyHandler.cpp which is where that happens. The code also only decodes text parts of the message. So in summary: As far as the base64 decoding goes, nothing has changed in TB code and all is correctly handled.

OK, that fixes the totally crazy code in nsMsgBodyHandler::Base64Decode().

Janos, would you be willing to try a test build of TB 68?

Assignee: nobody → jorgk
Status: REOPENED → ASSIGNED
Attachment #9097256 - Flags: review?(acelists)

I forgot to say: mach xpcshell-test comm/mailnews/base/test/unit/test_searchBody.js still passes.

Comment on attachment 9097256 [details] [diff] [review]
1584558-fix-clumsy-stuff.patch

Review of attachment 9097256 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this should be much faster without any memmoves. Should be safe enough to run on the raw char* string when just replacing one char for another without changing length of the string. Otherwise we could use nsCString.ReplaceChar() or similar. nsCString.Replace() seems to be the wrong method to call for this job.

I tested manually that this still works on searching a base64 encoded message body.
Attachment #9097256 - Flags: review?(acelists) → review+
Keywords: checkin-needed

Great find, and approach. I made too many assumptions and should have dug deeper in the trace. (Also found the profiler doesn't work in my 64bit Thunderbird.)

QFB would have the same issue?

Keywords: checkin-needed
Keywords: checkin-needed

QFB would have the same issue?

Absolutely. All the same search code.

(In reply to Jorg K (GMT+2) from comment #35)

OK, that fixes the totally crazy code in nsMsgBodyHandler::Base64Decode().

Janos, would you be willing to try a test build of TB 68?

Is it the Base64Decode(), "which hasn't changed in ages"?! :D
If the test build also has the fix for the missing window frames, then i could give it a try.

(In reply to Janos Kövari from comment #40)

Is it the Base64Decode(), "which hasn't changed in ages"?! :D

Yes, our algorithm hasn't changed, but the helper function for string manipulation (Replace character) has changed in the library coming from mozilla-central (shared with Firefox). That revealed we do not need Replace function which does copying of the string around (if the original and replacement substrings aren't the same length), when we just need an in-place replacement of one character with another one.

Please try the test build once Jorg produces it.

Also many thanks for the profile, it did pin-point the problem nicely.

Flags: needinfo?(hsivonen)

(In reply to Jorg K (GMT+2) from comment #32)

Henri, if you look at the picture attached in comment #31, you can see that our message body search in base64 encoded messages spends an awful lot of time in nsTSubstring::ReplacePrepInternal. That function was changed between mozilla60 and mozilla68 here:
https://hg.mozilla.org/mozilla-central/rev/400d4a90fe04#l3.73

Is there any reason to assume that this is significantly slower now?

In opt builds, no, it's not supposed to be slower. If it is, I'd like to know why. (In debug builds, it does work to mark logically-uninitialized memory as poisoned, which it didn't do before.)

Or maybe our nsMsgBodyHandler::Base64Decode()
https://searchfox.org/comm-central/rev/266e9cc242cd0de076e85eb4aa0b8392fcb2ca01/mailnews/base/search/src/nsMsgBodyHandler.cpp#448
using PL_Base64Decode() needs an overhaul.

Pretty clearly it does. The current implementation is shockingly inefficient regardless of how fast exactly Replace is.

If the test build also has the fix for the missing window frames, then i could give it a try.

Yes, I have that included. Would you like a 32bit or 64bit binary?

32

@Janos: Will do in the next few hours.

@Henri: Henri, thanks for the comment. "In opt builds, no, it's not supposed to be slower. If it is, I'd like to know why." - I'm not sure how we can do that. All the information we have is what the profiler reported. A lot of memmove from nsTSubstring::ReplacePrepInternal on a single character replace. Perhaps you could just run 100K iterations of that using the current and previous code under defined conditions.

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/382c8c0732f0
Replace clumsy loops using Replace() with single character replacement in nsMsgBodyHandler::Base64Decode(). r=aceman

Status: ASSIGNED → RESOLVED
Closed: 5 months ago5 months ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 71.0
Attachment #9097256 - Flags: approval-comm-esr68+
Attachment #9097256 - Flags: approval-comm-beta+

TB 68.1.2 or TB 68.2:
https://hg.mozilla.org/releases/comm-esr68/rev/fb4daed5bea1130f3e73829e644c4b822d374aa9

I've landed this so I can be the test rabbit. I'll also make the version available to Janos. This will also get beta exposure before it hits prime time. That said, the fix is really straight forward and well understood.

Janos, you can run this from a ZIP file or installer:
https://queue.taskcluster.net/v1/task/Z1xm-L8JQJi4DAla9zfIGw/runs/0/artifacts/public/build/target.zip
https://queue.taskcluster.net/v1/task/Z1xm-L8JQJi4DAla9zfIGw/runs/0/artifacts/public/build/install/sea/target.installer.exe

We'll ship the fix in TB 68.1.2 or TB 68.2. Note that Calendar's MiniMonth is broken in that version. I expect a fix for that during the course of the day, if you prefer to wait.

Good work. Search is fast again.
The profiler doesn't work though and is buggy, I tried to capture a new profile but couldn't. So all I can say is it is now searchig fast and without blocking the UI.

Henri, this change made it fast again https://hg.mozilla.org/comm-central/rev/382c8c0732f0, so I suspect there's something wrong with the new Replace.

mmm, just a question:
I saw that the zip from Jörg comes with msvcp140.dll and vcruntime140.dll, which are ver. 14.15.something, and these are basically the 2017 version of vc runtime libraries.
the installed Thunderbird version 60.9 has version 14.11.something of these two files.
my system has the vc runtime 2015 installed (and some earlier versions too), that is ver. 14.0.something.

so, which dll is actually used by TB? I just ran it from the folder extracted from the zip, no install or nothing.
could it still be that the newer versions of these dll's are used and that is speeding things up or have some bugs fixed?

It uses the DLL in the folder of the executable.

so there is another difference between the test build and the 60.9 version, not just the optimized replace function.

(In reply to Jorg K (GMT+2) from comment #50)

Henri, this change made it fast again https://hg.mozilla.org/comm-central/rev/382c8c0732f0,

This should make it a lot faster even if there wasn't anything wrong with Replace...

so I suspect there's something wrong with the new Replace.

Now that I think about it, it's possible that the new Replace is more eager to shrink the string's buffer to fit. However, you really need to end up replacing a lot of characters relative to the length of the string to make it roughly halve its length before the buffer shrinking is supposed to happen.

(In reply to Henri Sivonen from comment #42)

Pretty clearly it does. The current implementation is shockingly inefficient regardless of how fast exactly Replace is.

Really? It scans the whole string twice, but that's not so shocking.

(In reply to aceman from comment #41)

Replace function which does copying of the string around (if the original and replacement substrings aren't the same length)

(In reply to Henri Sivonen from comment #42)

it's not supposed to be slower. If it is, I'd like to know why. (In debug builds, it does work to mark logically-uninitialized memory as poisoned, which it didn't do before.)

(In reply to Jorg K from comment #45)

A lot of memmove from nsTSubstring::ReplacePrepInternal on a single character replace.

aceman correctly described what used to happen, and Jorg K correctly described what happens now, so it didn't take me much digging to find out the cause - the old Replace optimised away the memmove if the original and replacement strings were the same length, and the new Replace removed that code, and didn't replace it with any equivalent code.

Henri?

Flags: needinfo?(hsivonen)

(In reply to Jorg K (GMT+1) (PTO to 5th Jan 2020, sporadically reading bugmail) from comment #57)

Henri?

Filed as bug 1605063. Thanks for tracking this down!

Flags: needinfo?(hsivonen)
You need to log in before you can comment on or make changes to this bug.