Incorrect message size in summary file causes global search indexing to hang
Categories
(Thunderbird :: Search, defect, P2)
Tracking
(thunderbird_esr140 affected, thunderbird146 affected, thunderbird147 affected, thunderbird148 affected)
People
(Reporter: bugzilla, Assigned: welpy-cw)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: perf, regression, Whiteboard: [has performance profile])
Attachments
(3 files)
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:143.0) Gecko/20100101 Firefox/143.0
Steps to reproduce:
Use thunderbird normally. Notice that search results are very incomplete; then see that messages are constantly being indexed (but no progress is being made).
I have two email accounts, both with many (~100k) messages. I had trouble with the indexer before bug 1941019, but that problem went away when upgrading to 140ESR.
I am now on 143 (not-ESR, as this is what the TB download page recommends. I think the problem may have returned with 142 already, though I'm not sure.
I've tried the steps I took in previous bug 1941019 to no avail. Specifically:
- Delete the "global-messages-db.sqlite" file
- Repair the folder it seems to be stuck on
- Disable all extensions
- Start TB without being connected to internet (what got me unstuck with previous bug).
Actual results:
"Indexing messages" gets stuck (e.g., current status is "Indexing 32 of 68701 messages in [[redacted]]").
Expected results:
Indexing messages should make progress, or at least give some information on why it is stuck (e.g., is there a malformed email somewhere?).
| Reporter | ||
Comment 1•7 months ago
|
||
To add: I'm on linux, running the version as downloaded from https://www.thunderbird.net/en-US/thunderbird/all/
Updated•7 months ago
|
Comment 2•7 months ago
|
||
While giving feedback for the call for testing on EWS feature (https://thunderbird.topicbox.com/groups/planning/T00a28d546ed1b7d1-M6077113cd8e22173d5ee7454/ews-in-thunderbird-144-0b1-call-for-testing) I also noticed the indexing of newly created account to be very slow with TB 144.0b2 (64-bit) on Windows 11 Pro.
With TB 140.3.0esr (64-bit) on Windows 10 Pro, as you would see attached, the indexing is extremely slow...
Comment 3•7 months ago
|
||
Here is a performance profile recorded under TB 140.3.0esr (64-bit) with a a very large IMAP mailbox while it was processing some indexing extremely slowly: https://share.firefox.dev/46sMihI
Hope that help.
Comment 4•7 months ago
|
||
Does the indexing process index only the headers of the email or also the body?
I don't know if that matter, but in my case all folders are set not to be used offline ("Select this folder for offline use" option is disabled on all folders) so I would expect the indexing not to index the body of message as it isn't present in the local cache and I would not want the indexing process to download all messages locally (eg 100GB worth of emails).
If a search is requested on the body content, I would expect the search to occur on the server side where the body content is present. "Run search on the server" to be enabled automatically when searching criteria include Body field while working in online mode. This option is available in Message Search feature, and Quick Filter search... but is it implemented in the Global Search feature which currently may require Body indexing to work?
Updated•7 months ago
|
Updated•7 months ago
|
Comment 5•7 months ago
•
|
||
(In reply to Richard Leger from comment #4)
Does the indexing process index only the headers of the email or also the body?
Headers, and first 20k (IIRC) of body.
I don't know if that matter, but in my case all folders are set not to be used offline ("Select this folder for offline use" option is disabled on all folders) so I would expect the indexing not to index the body of message as it isn't present in the local cache and I would not want the indexing process to download all messages locally (eg 100GB worth of emails).
Then if doing just headers, indexing should be very fast.
If a search is requested on the body content, I would expect the search to occur on the server side where the body content is present. "Run search on the server" to be enabled automatically when searching criteria include Body field while working in online mode. This option is available in Message Search feature, and Quick Filter search... but is it implemented in the Global Search feature which currently may require Body indexing to work?
AFAIK global search only operates on data that has been downloaded. If the body has not been downloaded ("sync") then the body content will not be found.
Comment 6•7 months ago
|
||
(In reply to Richard Leger from comment #3)
Here is a performance profile recorded under TB 140.3.0esr (64-bit) with a a very large IMAP mailbox while it was processing some indexing extremely slowly: https://share.firefox.dev/46sMihI
Hope that help.
Thanks so much.
However, can you redo the profile with 140.3.1, available today? Because 140.3.0 is missing important performance fixes:
- Bug 1973165 - Large load times on first launch when trying to view messages from Inbox (and other UI blockage) while messages are downloading
- Bug 1971648 - Add contain: strict to pane elements to reduce layout complexity, and avoid performance impact from animation in status bar
(FWIW they are in release 142)
Comment 7•7 months ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #6)
However, can you redo the profile with 140.3.1, available today?
In TB 140.3.1esr (64-bit) on Windows 11 Pro
While repairing one folder containing 2000ish emails - 1.3GB total size (as shown by TB)
https://share.firefox.dev/3KpJl9b
Hope that help.
A bit faster (but much less email than indexed previously) seems to slow down as the process progress.
About 7s to index 250ish emails as a rough estimate from a visual point of view (Activity Manager), seems quite long!
(In reply to Wayne Mery (:wsmwk) from comment #5)
(In reply to Richard Leger from comment #4)
Does the indexing process index only the headers of the email or also the body?
Headers, and first 20k (IIRC) of body.
When offline use is disabled I would not expect TB to have to download 20k of body from the server because if you have 200000 emails or more that is a lot of data to download (bandwidth usage). That should not be required.
Comment 8•7 months ago
|
||
I was not implying that 20k is downloaded if you have sync disabled. :)
IF the body is downloaded, by autosync or by manually accessing a message, then the message body gets indexed.
| Reporter | ||
Comment 9•7 months ago
|
||
Note that, for me, it is not slow, it is stuck. The indexer will do a handful of messages when I start thunderbird up (last time 32, now 146), but then just stay there. It doesn't seem to matter how long I wait at this point, it won't proceed (until I close and reopen TB).
Comment 10•7 months ago
|
||
(In reply to Mark Mueller from comment #9)
Note that, for me, it is not slow, it is stuck. The indexer will do a handful of messages when I start thunderbird up (last time 32, now 146), but then just stay there. It doesn't seem to matter how long I wait at this point, it won't proceed (until I close and reopen TB).
Are you able to reproduce the steps?
Any error in the console (CTRL+SHIFT+J) when the issue raise (indexing seems to run & then stops)?
Maybe if it happens again, try to capture a performance profile (https://support.mozilla.org/en-US/kb/profiling-thunderbird-performance) during the indexing process (while it is running & stopping if possible) and publish it here.
| Reporter | ||
Comment 11•7 months ago
|
||
(In reply to Richard Leger from comment #10)
(In reply to Mark Mueller from comment #
Note that, for me, it is not slow, it is stuck. The indexer will do a handful of messages when I start thunderbird up (last time 32, now 146), but then just stay there. It doesn't seem to matter how long I wait at this point, it won't proceed (until I close and reopen TB).
Are you able to reproduce the steps?
Any error in the console (CTRL+SHIFT+J) when the issue raise (indexing seems to run & then stops)?
Maybe if it happens again, try to capture a performance profile (https://support.mozilla.org/en-US/kb/profiling-thunderbird-performance) during the indexing process (while it is running & stopping if possible) and publish it here.
If I disable all addons, then no errors show in the error console (but indexer still gets stuck). I don't think addons are relevant, but they cause some errors which clutter the error console.
I don't think performance profile is relevant, as thunderbird is not using high resources when stuck (0.01% CPU, and around 850MB of RAM).
Comment 12•7 months ago
|
||
@Mark,
(In reply to Mark Mueller from comment #11)
I don't think performance profile is relevant, as thunderbird is not using high resources when stuck (0.01% CPU, and around 850MB of RAM).
I tend to think it may potentially help developers identify what may be going on under the hood... and it does not take long to capture and publish as far as you can reproduce the issue, and capture it while it is happening especially in absence of error in the console.
Alternatively or in complement, if you are up for it, as you seems to be able to reproduce the issue, you could try the https://mozilla.github.io/mozregression tool to identify the set of patches that caused the regression in your case between 140ESR which seems ok and 143 that seems ko. I often use the 6.3.0 version on Windows (not the latest because it does not pass the virus total test yet), they are option available for Linux.
@Wayne,
- Is there a better way to capture indexing logs to get more info on the issue? By rasing log level perhaps?
- I don't think neither Mark or me are currently using the new indexing system coming up, https://source-docs.thunderbird.net/en/latest/panorama/index.html, is there a way to check?
In the ESR branch (https://www.thunderbird.net/en-US/thunderbird/releases), I only see few fixes related to indexing in TB Version 140.0esr (https://www.thunderbird.net/en-US/thunderbird/140.0esr/releasenotes), could they have had side effects? I don't see anything related to indexing in successive release notes up to 143.
| Reporter | ||
Comment 13•6 months ago
|
||
| Reporter | ||
Comment 14•6 months ago
|
||
(In reply to Richard Leger from comment #12)
@Mark,
(In reply to Mark Mueller from comment #11)
I don't think performance profile is relevant, as thunderbird is not using high resources when stuck (0.01% CPU, and around 850MB of RAM).
I tend to think it may potentially help developers identify what may be going on under the hood... and it does not take long to capture and publish as far as you can reproduce the issue, and capture it while it is happening especially in absence of error in the console.
Sorry, took me some time to get to this. I did a profiler capture. I had to follow these steps ( https://bugzilla.mozilla.org/show_bug.cgi?id=1920387#c4 ) as the default guide gave me an error after the capture. Recording is here: https://bugzilla.mozilla.org/attachment.cgi?id=9520059
I can only start the recording after TB is already running. After starting the recording, TB activities manager showed the indexer starting, and it got stuck now at 4 messages indexed.
Note that I'm trusting the privacy preserving is OK, and no passwords / tokens are being shared. Please let me know if that's not true.
Alternatively or in complement, if you are up for it, as you seems to be able to reproduce the issue, you could try the https://mozilla.github.io/mozregression tool to identify the set of patches that caused the regression in your case between 140ESR which seems ok and 143 that seems ko. I often use the 6.3.0 version on Windows (not the latest because it does not pass the virus total test yet), they are option available for Linux.
I wanted to do this, but realistically, won't have time in the near future.
Comment 15•6 months ago
|
||
(In reply to Mark Mueller from comment #14)
Recording is here: https://bugzilla.mozilla.org/attachment.cgi?id=9520059
Ta.
It is usually good practice to Load profile from file on https://profiler.firefox.com and then Upload Local Profile (top right corner) + Upload to get an URL to the loaded profile that you can copy here e.g https://share.firefox.dev/4qgKJLK so it gets easily accessible for everyone.
I see Jank caused by Mailnews (indexing?) and an increase of memory usage.
Hopefully that will help developers identify the possible issue here.
Updated•6 months ago
|
Updated•6 months ago
|
Updated•6 months ago
|
Comment 16•6 months ago
|
||
I can reproduce this bug on 140.3.0esr:
- It starts indexing and then stop "in the middle"
- Consumes 0% CPU
- tried to let it stuck the whole night
- this morning it's still at the exact same point, still at 0% CPU.
Comment 17•6 months ago
|
||
After re-starting it a few time (deleting global-messages-db.sqlite between each run) too see if it halts at the same point: it does not.
On the first runs it halted during the first directory (Sent in my case), but now it successfully indexed "Sent" and got stuck indexing another directory.
Comment 18•6 months ago
|
||
Tried running mozregression. As the bug does not happen fast it's often hard to tell "will be bug happen in one minute or is it a good build?", yet I tried (took hours, haha), and here what I got:
245:16.95 INFO: Last good revision: 3db59b66aa1225bfaa239ca2c435a612cf22e63a
245:16.95 INFO: First bad revision: 6ab306398cb1fe19f73f76b0ebd5f42edad79d0e
245:16.95 INFO: Pushlog:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=3db59b66aa1225bfaa239ca2c435a612cf22e63a&tochange=6ab306398cb1fe19f73f76b0ebd5f42edad79d0e
So the bug may have appeared around 134.0a1.
But also it looks like it disappeared later (I have not bisected the fix yet), but I found it works in 146.0a1.
Problem is: the bug exists in current ESR (140).
Comment 19•6 months ago
|
||
Forget what I wrote about 146.0a1: it has the bug too, I may have not waited long enough when I tried it :(((
Comment 20•6 months ago
|
||
I'm getting better at identifying the bug, I have a folder named other, when it freezes it freeze around the same number of emails (1185, 1139, 1131, 1189, 1161, 1169, 1236 from my last bisect). I bet the small differences are due to parallelism or something.
Happily, my 2nd bisect gave me the exact same result than my first one:
533:34.39 INFO: Narrowed integration regression window from [3db59b66, d4568348] (6 builds) to [3db59b66, 6ab30639] (2 builds) (~1 steps left)
533:34.39 INFO: No more integration revisions, bisection finished.
533:34.39 INFO: Last good revision: 3db59b66aa1225bfaa239ca2c435a612cf22e63a
533:34.39 INFO: First bad revision: 6ab306398cb1fe19f73f76b0ebd5f42edad79d0e
533:34.39 INFO: Pushlog:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=3db59b66aa1225bfaa239ca2c435a612cf22e63a&tochange=6ab306398cb1fe19f73f76b0ebd5f42edad79d0e
No worries, I did not spent 533 minutes bisecting, had to AFK multiples times.
Comment 21•6 months ago
|
||
(In reply to Julien Palard from comment #18)
Tried running mozregression.
Well done Julien! Hopefully that will help the dev team identify and fix the issue.
For the record what is your OS (Linux?Windows?), OS Architecture (32bits?64bits), TB Architecture (32bits?64bits? Indicated in Menu > Help > About within the version).
Comment 22•6 months ago
|
||
I'm using Debian Trixie, on a 64 bits architecture (AMD Ryzen 7 7840HS) running TH 140.3.0esr (64-bit) from the Debian repositories.
Comment 23•6 months ago
|
||
Could be from bug 1924291 then.
Comment 24•6 months ago
|
||
(In reply to Richard Leger from comment #3)
Here is a performance profile recorded under TB 140.3.0esr (64-bit) with a a very large IMAP mailbox while it was processing some indexing extremely slowly: https://share.firefox.dev/46sMihI
(In reply to Richard Leger from comment #7)
(In reply to Wayne Mery (:wsmwk) from comment #6)
However, can you redo the profile with 140.3.1, available today?
In TB 140.3.1esr (64-bit) on Windows 11 Pro
While repairing one folder containing 2000ish emails - 1.3GB total size (as shown by TB)
https://share.firefox.dev/3KpJl9bA bit faster (but much less email than indexed previously) seems to slow down as the process progress.
About 7s to index 250ish emails as a rough estimate from a visual point of view (Activity Manager), seems quite long!
To my eye there are significant differences between Richard's two profiles
- 140.3.0 has much more painting, for example https://share.firefox.dev/47gGpDo. But it also has multiple memory increases of 500mb.
- 140.3.1 does have significant gloda activity as seen in https://share.firefox.dev/3Whwto8 - not surprising given you did a "repair" - but fewer (and less) memory increase
Comment 25•6 months ago
|
||
Just received 140.4.0esr through a Debian trixie update, tried a reindex just in case. Still stuck at the same spot, at 0% CPU, similar as before.
Comment 26•6 months ago
|
||
Someone who understands the indexer should do the following:
Check for stop conditions. At what time might the indexer decide to not continue.
Or alternatively, figure out what's necessary to keep the indexer going. Because it's a background activity, I'm guessing that we're posting events in some way, that schedule our code to regularly continue with that activity.
I suggest to add a minimal amount of logging for those events.
This could be used by people who can reproduce the bug. They could confirm whether those "keep active" events keep coming, or whether they eventually stop (at the time indexing fails). This could then be followed by more logging that watches for the various scenarios that could cause the keep-running events to stop.
(These are high-level thoughts, I don't know how it works exactly.)
This bug needs an owner who understands this code, can provide experimental builds with logging, that the effected users could test.
Comment 27•6 months ago
|
||
I'm marking this bug as confirmed, it sounds like we got plausible reporting that it's a real issue.
Updated•6 months ago
|
Updated•6 months ago
|
Comment 28•4 months ago
|
||
Got Thunderbird 140.6.0esr-1 through Debian Forky, and I'm still having this bug.
| Assignee | ||
Comment 29•4 months ago
|
||
I was able to reproduce the hang when indexing an IMAP folder with offline use: Patching the size and offlineMsgSize fields of a message in the .msf file (similar to this test) leads to MboxMsgInputStream::Read failing with NS_MSG_ERROR_UNEXPECTED_SIZE. In this case, onStopRequest will be called with that statuscode, but OnStopRunningUrl will be passed NS_OK, so the message is never finalized, causing the indexing progress to hang. (For a local folder this works as expected, both listeners receive the error code.)
Since this is regressed by the combination of bug 1851074 and bug 1924291, marking both bugs as responsible.
| Assignee | ||
Comment 30•4 months ago
|
||
Updated•4 months ago
|
| Assignee | ||
Updated•3 months ago
|
Updated•3 months ago
|
Updated•3 months ago
|
Comment 31•3 months ago
|
||
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/3f5fe56706d2
Fix Gloda indexing for offline IMAP folders with corrupt summary files. r=mkmelin
Description
•