Closed Bug 1991497 Opened 7 months ago Closed 3 months ago

Incorrect message size in summary file causes global search indexing to hang

Categories

(Thunderbird :: Search, defect, P2)

Thunderbird 134
Desktop
All
defect

Tracking

(thunderbird_esr140 affected, thunderbird146 affected, thunderbird147 affected, thunderbird148 affected)

RESOLVED FIXED
149 Branch
Tracking Status
thunderbird_esr140 --- affected
thunderbird146 --- affected
thunderbird147 --- affected
thunderbird148 --- affected

People

(Reporter: bugzilla, Assigned: welpy-cw)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, regression, Whiteboard: [has performance profile])

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:143.0) Gecko/20100101 Firefox/143.0

Steps to reproduce:

Use thunderbird normally. Notice that search results are very incomplete; then see that messages are constantly being indexed (but no progress is being made).

I have two email accounts, both with many (~100k) messages. I had trouble with the indexer before bug 1941019, but that problem went away when upgrading to 140ESR.

I am now on 143 (not-ESR, as this is what the TB download page recommends. I think the problem may have returned with 142 already, though I'm not sure.

I've tried the steps I took in previous bug 1941019 to no avail. Specifically:

  • Delete the "global-messages-db.sqlite" file
  • Repair the folder it seems to be stuck on
  • Disable all extensions
  • Start TB without being connected to internet (what got me unstuck with previous bug).

Actual results:

"Indexing messages" gets stuck (e.g., current status is "Indexing 32 of 68701 messages in [[redacted]]").

Expected results:

Indexing messages should make progress, or at least give some information on why it is stuck (e.g., is there a malformed email somewhere?).

To add: I'm on linux, running the version as downloaded from https://www.thunderbird.net/en-US/thunderbird/all/

OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Component: Untriaged → Search

While giving feedback for the call for testing on EWS feature (https://thunderbird.topicbox.com/groups/planning/T00a28d546ed1b7d1-M6077113cd8e22173d5ee7454/ews-in-thunderbird-144-0b1-call-for-testing) I also noticed the indexing of newly created account to be very slow with TB 144.0b2 (64-bit) on Windows 11 Pro.

With TB 140.3.0esr (64-bit) on Windows 10 Pro, as you would see attached, the indexing is extremely slow...

Here is a performance profile recorded under TB 140.3.0esr (64-bit) with a a very large IMAP mailbox while it was processing some indexing extremely slowly: https://share.firefox.dev/46sMihI

Hope that help.

Does the indexing process index only the headers of the email or also the body?
I don't know if that matter, but in my case all folders are set not to be used offline ("Select this folder for offline use" option is disabled on all folders) so I would expect the indexing not to index the body of message as it isn't present in the local cache and I would not want the indexing process to download all messages locally (eg 100GB worth of emails).
If a search is requested on the body content, I would expect the search to occur on the server side where the body content is present. "Run search on the server" to be enabled automatically when searching criteria include Body field while working in online mode. This option is available in Message Search feature, and Quick Filter search... but is it implemented in the Global Search feature which currently may require Body indexing to work?

(In reply to Richard Leger from comment #4)

Does the indexing process index only the headers of the email or also the body?

Headers, and first 20k (IIRC) of body.

I don't know if that matter, but in my case all folders are set not to be used offline ("Select this folder for offline use" option is disabled on all folders) so I would expect the indexing not to index the body of message as it isn't present in the local cache and I would not want the indexing process to download all messages locally (eg 100GB worth of emails).

Then if doing just headers, indexing should be very fast.

If a search is requested on the body content, I would expect the search to occur on the server side where the body content is present. "Run search on the server" to be enabled automatically when searching criteria include Body field while working in online mode. This option is available in Message Search feature, and Quick Filter search... but is it implemented in the Global Search feature which currently may require Body indexing to work?

AFAIK global search only operates on data that has been downloaded. If the body has not been downloaded ("sync") then the body content will not be found.

(In reply to Richard Leger from comment #3)

Here is a performance profile recorded under TB 140.3.0esr (64-bit) with a a very large IMAP mailbox while it was processing some indexing extremely slowly: https://share.firefox.dev/46sMihI

Hope that help.

Thanks so much.

However, can you redo the profile with 140.3.1, available today? Because 140.3.0 is missing important performance fixes:

  • Bug 1973165 - Large load times on first launch when trying to view messages from Inbox (and other UI blockage) while messages are downloading
  • Bug 1971648 - Add contain: strict to pane elements to reduce layout complexity, and avoid performance impact from animation in status bar

(FWIW they are in release 142)

Flags: needinfo?(bugzilla)
See Also: → 1941019
See Also: → 1934197

(In reply to Wayne Mery (:wsmwk) from comment #6)

However, can you redo the profile with 140.3.1, available today?

In TB 140.3.1esr (64-bit) on Windows 11 Pro
While repairing one folder containing 2000ish emails - 1.3GB total size (as shown by TB)
https://share.firefox.dev/3KpJl9b
Hope that help.

A bit faster (but much less email than indexed previously) seems to slow down as the process progress.
About 7s to index 250ish emails as a rough estimate from a visual point of view (Activity Manager), seems quite long!

(In reply to Wayne Mery (:wsmwk) from comment #5)

(In reply to Richard Leger from comment #4)

Does the indexing process index only the headers of the email or also the body?

Headers, and first 20k (IIRC) of body.

When offline use is disabled I would not expect TB to have to download 20k of body from the server because if you have 200000 emails or more that is a lot of data to download (bandwidth usage). That should not be required.

I was not implying that 20k is downloaded if you have sync disabled. :)

IF the body is downloaded, by autosync or by manually accessing a message, then the message body gets indexed.

Note that, for me, it is not slow, it is stuck. The indexer will do a handful of messages when I start thunderbird up (last time 32, now 146), but then just stay there. It doesn't seem to matter how long I wait at this point, it won't proceed (until I close and reopen TB).

Flags: needinfo?(bugzilla)

(In reply to Mark Mueller from comment #9)

Note that, for me, it is not slow, it is stuck. The indexer will do a handful of messages when I start thunderbird up (last time 32, now 146), but then just stay there. It doesn't seem to matter how long I wait at this point, it won't proceed (until I close and reopen TB).

Are you able to reproduce the steps?

Any error in the console (CTRL+SHIFT+J) when the issue raise (indexing seems to run & then stops)?

Maybe if it happens again, try to capture a performance profile (https://support.mozilla.org/en-US/kb/profiling-thunderbird-performance) during the indexing process (while it is running & stopping if possible) and publish it here.

(In reply to Richard Leger from comment #10)

(In reply to Mark Mueller from comment #

Note that, for me, it is not slow, it is stuck. The indexer will do a handful of messages when I start thunderbird up (last time 32, now 146), but then just stay there. It doesn't seem to matter how long I wait at this point, it won't proceed (until I close and reopen TB).

Are you able to reproduce the steps?

Any error in the console (CTRL+SHIFT+J) when the issue raise (indexing seems to run & then stops)?

Maybe if it happens again, try to capture a performance profile (https://support.mozilla.org/en-US/kb/profiling-thunderbird-performance) during the indexing process (while it is running & stopping if possible) and publish it here.

If I disable all addons, then no errors show in the error console (but indexer still gets stuck). I don't think addons are relevant, but they cause some errors which clutter the error console.

I don't think performance profile is relevant, as thunderbird is not using high resources when stuck (0.01% CPU, and around 850MB of RAM).

@Mark,

(In reply to Mark Mueller from comment #11)

I don't think performance profile is relevant, as thunderbird is not using high resources when stuck (0.01% CPU, and around 850MB of RAM).

I tend to think it may potentially help developers identify what may be going on under the hood... and it does not take long to capture and publish as far as you can reproduce the issue, and capture it while it is happening especially in absence of error in the console.

Alternatively or in complement, if you are up for it, as you seems to be able to reproduce the issue, you could try the https://mozilla.github.io/mozregression tool to identify the set of patches that caused the regression in your case between 140ESR which seems ok and 143 that seems ko. I often use the 6.3.0 version on Windows (not the latest because it does not pass the virus total test yet), they are option available for Linux.

@Wayne,

In the ESR branch (https://www.thunderbird.net/en-US/thunderbird/releases), I only see few fixes related to indexing in TB Version 140.0esr (https://www.thunderbird.net/en-US/thunderbird/140.0esr/releasenotes), could they have had side effects? I don't see anything related to indexing in successive release notes up to 143.

Flags: needinfo?(vseerror)

(In reply to Richard Leger from comment #12)

@Mark,

(In reply to Mark Mueller from comment #11)

I don't think performance profile is relevant, as thunderbird is not using high resources when stuck (0.01% CPU, and around 850MB of RAM).

I tend to think it may potentially help developers identify what may be going on under the hood... and it does not take long to capture and publish as far as you can reproduce the issue, and capture it while it is happening especially in absence of error in the console.

Sorry, took me some time to get to this. I did a profiler capture. I had to follow these steps ( https://bugzilla.mozilla.org/show_bug.cgi?id=1920387#c4 ) as the default guide gave me an error after the capture. Recording is here: https://bugzilla.mozilla.org/attachment.cgi?id=9520059

I can only start the recording after TB is already running. After starting the recording, TB activities manager showed the indexer starting, and it got stuck now at 4 messages indexed.

Note that I'm trusting the privacy preserving is OK, and no passwords / tokens are being shared. Please let me know if that's not true.

Alternatively or in complement, if you are up for it, as you seems to be able to reproduce the issue, you could try the https://mozilla.github.io/mozregression tool to identify the set of patches that caused the regression in your case between 140ESR which seems ok and 143 that seems ko. I often use the 6.3.0 version on Windows (not the latest because it does not pass the virus total test yet), they are option available for Linux.

I wanted to do this, but realistically, won't have time in the near future.

(In reply to Mark Mueller from comment #14)

Recording is here: https://bugzilla.mozilla.org/attachment.cgi?id=9520059

Ta.

It is usually good practice to Load profile from file on https://profiler.firefox.com and then Upload Local Profile (top right corner) + Upload to get an URL to the loaded profile that you can copy here e.g https://share.firefox.dev/4qgKJLK so it gets easily accessible for everyone.

I see Jank caused by Mailnews (indexing?) and an increase of memory usage.

Hopefully that will help developers identify the possible issue here.

Summary: Indexing messages gets stuck → Indexing messages gets stuck - Mailnews causing Junk?
Keywords: perf
Whiteboard: [has performance profile]

I can reproduce this bug on 140.3.0esr:

  • It starts indexing and then stop "in the middle"
  • Consumes 0% CPU
  • tried to let it stuck the whole night
  • this morning it's still at the exact same point, still at 0% CPU.

After re-starting it a few time (deleting global-messages-db.sqlite between each run) too see if it halts at the same point: it does not.

On the first runs it halted during the first directory (Sent in my case), but now it successfully indexed "Sent" and got stuck indexing another directory.

Tried running mozregression. As the bug does not happen fast it's often hard to tell "will be bug happen in one minute or is it a good build?", yet I tried (took hours, haha), and here what I got:

245:16.95 INFO: Last good revision: 3db59b66aa1225bfaa239ca2c435a612cf22e63a
245:16.95 INFO: First bad revision: 6ab306398cb1fe19f73f76b0ebd5f42edad79d0e
245:16.95 INFO: Pushlog:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=3db59b66aa1225bfaa239ca2c435a612cf22e63a&tochange=6ab306398cb1fe19f73f76b0ebd5f42edad79d0e

So the bug may have appeared around 134.0a1.

But also it looks like it disappeared later (I have not bisected the fix yet), but I found it works in 146.0a1.

Problem is: the bug exists in current ESR (140).

Forget what I wrote about 146.0a1: it has the bug too, I may have not waited long enough when I tried it :(((

I'm getting better at identifying the bug, I have a folder named other, when it freezes it freeze around the same number of emails (1185, 1139, 1131, 1189, 1161, 1169, 1236 from my last bisect). I bet the small differences are due to parallelism or something.

Happily, my 2nd bisect gave me the exact same result than my first one:

533:34.39 INFO: Narrowed integration regression window from [3db59b66, d4568348] (6 builds) to [3db59b66, 6ab30639] (2 builds) (~1 steps left)
533:34.39 INFO: No more integration revisions, bisection finished.
533:34.39 INFO: Last good revision: 3db59b66aa1225bfaa239ca2c435a612cf22e63a
533:34.39 INFO: First bad revision: 6ab306398cb1fe19f73f76b0ebd5f42edad79d0e
533:34.39 INFO: Pushlog:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=3db59b66aa1225bfaa239ca2c435a612cf22e63a&tochange=6ab306398cb1fe19f73f76b0ebd5f42edad79d0e

No worries, I did not spent 533 minutes bisecting, had to AFK multiples times.

(In reply to Julien Palard from comment #18)

Tried running mozregression.

Well done Julien! Hopefully that will help the dev team identify and fix the issue.
For the record what is your OS (Linux?Windows?), OS Architecture (32bits?64bits), TB Architecture (32bits?64bits? Indicated in Menu > Help > About within the version).

I'm using Debian Trixie, on a 64 bits architecture (AMD Ryzen 7 7840HS) running TH 140.3.0esr (64-bit) from the Debian repositories.

Could be from bug 1924291 then.

(In reply to Richard Leger from comment #3)

Here is a performance profile recorded under TB 140.3.0esr (64-bit) with a a very large IMAP mailbox while it was processing some indexing extremely slowly: https://share.firefox.dev/46sMihI

(In reply to Richard Leger from comment #7)

(In reply to Wayne Mery (:wsmwk) from comment #6)

However, can you redo the profile with 140.3.1, available today?

In TB 140.3.1esr (64-bit) on Windows 11 Pro
While repairing one folder containing 2000ish emails - 1.3GB total size (as shown by TB)
https://share.firefox.dev/3KpJl9b

A bit faster (but much less email than indexed previously) seems to slow down as the process progress.
About 7s to index 250ish emails as a rough estimate from a visual point of view (Activity Manager), seems quite long!

To my eye there are significant differences between Richard's two profiles

Flags: needinfo?(vseerror)

Just received 140.4.0esr through a Debian trixie update, tried a reindex just in case. Still stuck at the same spot, at 0% CPU, similar as before.

Someone who understands the indexer should do the following:

Check for stop conditions. At what time might the indexer decide to not continue.
Or alternatively, figure out what's necessary to keep the indexer going. Because it's a background activity, I'm guessing that we're posting events in some way, that schedule our code to regularly continue with that activity.
I suggest to add a minimal amount of logging for those events.
This could be used by people who can reproduce the bug. They could confirm whether those "keep active" events keep coming, or whether they eventually stop (at the time indexing fails). This could then be followed by more logging that watches for the various scenarios that could cause the keep-running events to stop.
(These are high-level thoughts, I don't know how it works exactly.)

This bug needs an owner who understands this code, can provide experimental builds with logging, that the effected users could test.

I'm marking this bug as confirmed, it sounds like we got plausible reporting that it's a real issue.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Severity: -- → S2
Priority: -- → P2
Keywords: regression

Got Thunderbird 140.6.0esr-1 through Debian Forky, and I'm still having this bug.

I was able to reproduce the hang when indexing an IMAP folder with offline use: Patching the size and offlineMsgSize fields of a message in the .msf file (similar to this test) leads to MboxMsgInputStream::Read failing with NS_MSG_ERROR_UNEXPECTED_SIZE. In this case, onStopRequest will be called with that statuscode, but OnStopRunningUrl will be passed NS_OK, so the message is never finalized, causing the indexing progress to hang. (For a local folder this works as expected, both listeners receive the error code.)

Since this is regressed by the combination of bug 1851074 and bug 1924291, marking both bugs as responsible.

OS: Linux → All
Regressed by: 1851074, 1924291
Hardware: x86_64 → Desktop
Summary: Indexing messages gets stuck - Mailnews causing Junk? → Incorrect message size in summary file causes global search indexing to hang
Version: Thunderbird 143 → Thunderbird 134
Assignee: nobody → h.w.forms
Status: NEW → ASSIGNED
Target Milestone: --- → 148 Branch
Attachment #9536347 - Attachment description: Bug 1991497 - Fix Gloda indexing for offline IMAP folders with corrupt summary files. r=#thunderbird-reviewers → Bug 1991497 - Fix Gloda indexing for offline IMAP folders with corrupt summary files. r=mkmelin
Target Milestone: 148 Branch → 149 Branch

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/3f5fe56706d2
Fix Gloda indexing for offline IMAP folders with corrupt summary files. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: