Closed Bug 1592256 Opened 4 months ago Closed 4 months ago

The messages.continueList gradually gets slower

Categories

(Thunderbird :: Add-Ons: Extensions API, defect)

Tracking

(thunderbird_esr68 71+ fixed, thunderbird71 fixed, thunderbird72 fixed)

RESOLVED FIXED
Thunderbird 72.0
Tracking Status
thunderbird_esr68 71+ fixed
thunderbird71 --- fixed
thunderbird72 --- fixed

People

(Reporter: mihaicodrean, Assigned: darktrojan)

Details

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0

Steps to reproduce:

Iterate through the messages in a large folder (45k+ items) using the WebExtensions "messages" API, through the listMessages helper function from: https://thunderbird-webextensions.readthedocs.io/en/68/how-to/messageLists.html

Actual results:

The "continueList" call gradually gets slower, to the point that it's not acceptable.

Expected results:

I would have expected the performance to be linear.
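
For anyone trying to reproduce this, here is a minimal sketch of how the per-call time can be measured. It is not the listMessages helper from the docs, just the same list()/continueList() loop with a timer around each call. The `timePages` name and the `messagesApi` parameter are my own (the parameter is only there so the loop can be run outside Thunderbird); in an extension you would pass `browser.messages`.

```javascript
// Sketch only: page through a folder and record how long each call takes.
// `messagesApi` is a stand-in for browser.messages (assumption for testability).
async function timePages(messagesApi, folder) {
  const timings = []; // one entry per page: { count, ms }

  let start = Date.now();
  let page = await messagesApi.list(folder);
  timings.push({ count: page.messages.length, ms: Date.now() - start });

  // page.id is null once the last page has been returned.
  while (page.id) {
    start = Date.now();
    page = await messagesApi.continueList(page.id);
    timings.push({ count: page.messages.length, ms: Date.now() - start });
  }
  return timings;
}
```

Logging `timings` for a large folder should show whether the `ms` values grow from page to page.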

Hey I'm just an extension developer like you.

I think I know why this happens (but I don't know for sure). Basically, it has to do with how Thunderbird indexes messages and how it handles compression. Messages in certain folders are 'harder' for Thunderbird to reach than others (especially when compressed). As an example, try using messages.list() on Inbox. Then try using it on Drafts and Trash. You should notice a big difference in the time it takes (call Date.now() before and after each). That's because Drafts and Trash are stored in a way that makes them harder to get to: we don't look at Drafts or the Trash very often, so they aren't in a 'convenient' place for Thunderbird.

In addition, when you start using messages.list(), Thunderbird has to start remembering message ids, and I think it uses some sort of internal data structure to do so (again, just a guess). This might also cause things to slow down slightly, although it shouldn't be that much (or at least, it wasn't for me with 50,000 messages).

Suffice it to say, getting messages from Archives can sometimes take as long as 1.5 seconds in my experience.

A bigger problem I see with all of this is that getting messages this way sometimes freezes up the UI. So that 1.5 second freeze is really noticeable.

(In reply to michael.pope.email from comment #1)

Thanks for jumping in. In my case:

  • The first batches of 100 messages each are decently fast
  • But then the decay gets noticeable and by the time it gets over 5000 messages, the time spent in continueList() is 2-3 seconds for each call
  • The UI is barely usable during this time
  • What's also interesting is that this seems to be CPU-bound (20% on an i7, almost constantly), with little to no disk activity
  • No apparent memory leak, as far as I can tell by watching the private bytes
  • I did a "Repair Folder", just in case, but that didn't change anything
  • Also, in my case it's a custom folder, not the standard ones.

I'm surprised that you report 1.5 seconds to fetch 50,000 message headers. If I had that kind of performance, I would be happy.

(In reply to Mihai from comment #2)

I should clarify: that's 1.5 seconds to fetch 100 messages (using messages.continueList() or messages.list()). But that's only in the deep Archives (really old ones).

Yeah, it's definitely CPU-bound. Don't know why though. What I've done with my extension is create a few queues. I measure how long the continueList() or list() call took, and then I setTimeout() for a certain period based on that (using async.js is so nice for this). That allows me to keep my extension's overall power usage listed as very low, even with the occasional burst up to 15% or so. It does mean that you only get a new list of 100 messages every 4-10 seconds though, which is a lot slower. So try putting the delays in and I think things will work better. As I mentioned in the post above, the call freezes up the UI a bit, so it's good to put some space in between the calls anyway.
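
A rough sketch of the delay idea described above, written with plain promises instead of async.js queues. Everything named here (`listThrottled`, `messagesApi`, `onPage`, `factor`, `maxDelayMs`) is made up for illustration; in an extension `messagesApi` would be `browser.messages`.

```javascript
// Sketch only: pause between pages in proportion to how long the last call took.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function listThrottled(messagesApi, folder, onPage, factor = 2, maxDelayMs = 10000) {
  let start = Date.now();
  let page = await messagesApi.list(folder);

  for (;;) {
    const elapsed = Date.now() - start; // duration of the call that produced this page
    await onPage(page.messages);
    if (!page.id) break; // no id means this was the last page

    // Rest for `factor` times the last call's duration, capped at maxDelayMs,
    // so the UI gets some time back between calls.
    await sleep(Math.min(elapsed * factor, maxDelayMs));

    start = Date.now();
    page = await messagesApi.continueList(page.id);
  }
}
```

The trade-off is the one already mentioned: total time goes up, but each pause gives the UI room to respond.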

What are you hoping to do with your extension?

Flags: needinfo?(geoff)

(In reply to michael.pope.email from comment #3)

That allows me to keep my extension's overall power usage listed as very low, even with the occasional burst up to 15% or so.

If we're talking workarounds, lowering the messagesPerPage property value could also help. Good idea with the periodic sleep.
I'd rather see this addressed though.

What are you hoping to do with your extension?

I need to analyze the emails in a couple of folders based on their subject and date.

(In reply to Mihai from comment #4)

I'll try the messagesPerPage and see how that works. If I can reduce the UI freeze time that would be fantastic - a whole lot better than sleep.

Sweet! Are you trying to remove duplicates or something?

Mine is trying to synchronize tags on email servers that don't support natural tag synchronization (which is really heavy on the CPU in Thunderbird, because I keep having to use .getFull() to get the message UUID)

(In reply to michael.pope.email from comment #5)

I'll try the messagesPerPage and see how that works. If I can reduce the UI freeze time that would be fantastic - a whole lot better than sleep.

I was thinking both a smaller page & sleep. In my case it won't cut it, given the exponential slowdown.

Are you trying to remove duplicates or something?

No, although that could make a useful extension.

Mine is trying to synchronize tags on email servers that don't support natural tag synchronization (which is really heavy on the CPU in Thunderbird, because I keep having to use .getFull() to get the message UUID)

Nice!

(In reply to Mihai from comment #6)

I just realized, messagesPerPage isn't in the documentation. Is it a property on browser.messages or chrome.messages?

Just curious, for your use case, how come it won't cut it? Are you dealing with rapidly changing folders (so you have to get the messages ASAP)?

(In reply to michael.pope.email from comment #7)

I just realized, messagesPerPage isn't in the documentation. Is it a property on browser.messages or chrome.messages?

It's a preference: "extensions.webextensions.messagesPerPage". But I see that it's not yet in the Beta channel - I'm on 71.0b1 and I don't have it yet. You could try to build Thunderbird etc.

Just curious, for your use case, how come it won't cut it? Are you dealing with rapidly changing folders (so you have to get the messages ASAP)?

It won't work for me because I need a decent time for an initial full folder parse. A total time over, say one minute, won't be acceptable for my purposes.

(In reply to Mihai from comment #8)

It's a preference: "extensions.webextensions.messagesPerPage". But I see that it's not yet in the Beta channel - I'm on 71.0b1 and I don't have it yet. You could try to build Thunderbird etc.

Ah, yeah, guess I'll have to wait then. Not worth building for... but it'll be nice to have when it comes out.

It won't work for me because I need a decent time for an initial full folder parse. A total time over, say one minute, won't be acceptable for my purposes.

Ah yeah. I was going to say you could try and grab the selected messages (since it's a secondary message list automatically delivered by an event listener), but if you need every message in a folder, that won't cut it.
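
For reference, a small sketch of the selected-messages approach, assuming the `mailTabs.onSelectedMessagesChanged` event, which hands the listener a message list. The `watchSelection` wrapper and its parameters are hypothetical; in an extension `mailTabsApi` would be `browser.mailTabs`.

```javascript
// Sketch only: react to the user's selection instead of walking a whole folder.
// `mailTabsApi` is a stand-in for browser.mailTabs (assumption for testability).
function watchSelection(mailTabsApi, onSelection) {
  // The first argument identifies the tab; it is ignored here.
  const listener = (_tab, messageList) => onSelection(messageList.messages);
  mailTabsApi.onSelectedMessagesChanged.addListener(listener);

  // Return a function that stops watching.
  return () => mailTabsApi.onSelectedMessagesChanged.removeListener(listener);
}
```

This only passes along the first page of the selection; a large selection may need continueList() on the list's id as well.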

The decay seems to be linear (not exponential), as in the attached chart.

What's also interesting is that the very last fetch, which returns only about half a page, takes a proportionally shorter time, which would indicate that the decay is per-message.

I see the problem. It's very likely because when we assign an ID to a message, we have to look through every message that already has an ID to check whether we've seen it already. If we did the lookup with some kind of hash instead, each lookup would go from linear to constant time.
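
To illustrate the difference, here is a simplified sketch, not the actual Thunderbird code. It assumes a message is identified by a folder URI plus a message key, and the class names and the JSON-based hash are invented for the example.

```javascript
// Sketch only: two ways to hand out stable IDs for (folderURI, messageKey) pairs.

// Linear: every lookup scans all messages seen so far, so cost grows per message.
class LinearTracker {
  constructor() {
    this.nextId = 1;
    this.entries = [];
  }
  getId(folderURI, messageKey) {
    for (const entry of this.entries) {
      if (entry.folderURI === folderURI && entry.messageKey === messageKey) {
        return entry.id;
      }
    }
    const id = this.nextId++;
    this.entries.push({ id, folderURI, messageKey });
    return id;
  }
}

// Hashed: a Map keyed by a string built from the same two fields,
// so each lookup takes roughly constant time.
class HashedTracker {
  constructor() {
    this.nextId = 1;
    this.ids = new Map();
  }
  getId(folderURI, messageKey) {
    const hash = JSON.stringify([folderURI, messageKey]); // hypothetical hash
    let id = this.ids.get(hash);
    if (id === undefined) {
      id = this.nextId++;
      this.ids.set(hash, id);
    }
    return id;
  }
}
```

Both classes return the same IDs for the same sequence of messages; only the cost per lookup differs.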

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(geoff)

Yes, that works. Now to work out a hash that's better than the simple one I used to test.

Assignee: nobody → geoff
Status: NEW → ASSIGNED

Great news, Geoff! Looking forward to the patch.

Attachment #9106111 - Flags: review?(mkmelin+mozilla)
Attachment #9106111 - Flags: approval-comm-esr68?
Attachment #9106111 - Flags: approval-comm-beta?
Comment on attachment 9106111
1592256-message-lookup-1.diff

Review of attachment 9106111:
-----------------------------------------------------------------

LGTM! r=mkmelin
Attachment #9106111 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/4865f56365ce
Cache WebExtension message identifiers to improve lookup performance from O(n) to O(1). r=mkmelin DONTBUILD

Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 72.0
Attachment #9106111 - Flags: approval-comm-beta? → approval-comm-beta+

What we stored in the Map should be removed when it isn't relevant any more.
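
A sketch of what that clean-up might look like, under the assumption that there are two lookups to keep in step (ID to message, and hash to ID). The `MessageIdCache` class, its method names and the hash format are illustrative only and are not taken from the patch.

```javascript
// Sketch only: an ID cache with two lookups that are cleaned up together.
class MessageIdCache {
  constructor() {
    this.nextId = 1;
    this.byId = new Map();   // id -> { folderURI, messageKey }
    this.byHash = new Map(); // hash of (folderURI, messageKey) -> id
  }

  static hash(folderURI, messageKey) {
    return JSON.stringify([folderURI, messageKey]); // hypothetical hash
  }

  getId(folderURI, messageKey) {
    const hash = MessageIdCache.hash(folderURI, messageKey);
    let id = this.byHash.get(hash);
    if (id === undefined) {
      id = this.nextId++;
      this.byHash.set(hash, id);
      this.byId.set(id, { folderURI, messageKey });
    }
    return id;
  }

  get(id) {
    return this.byId.get(id) || null;
  }

  // Call when the entry is no longer relevant (for example, the message is gone).
  // Both maps are updated so neither one keeps a stale entry.
  remove(folderURI, messageKey) {
    const hash = MessageIdCache.hash(folderURI, messageKey);
    const id = this.byHash.get(hash);
    if (id === undefined) return false;
    this.byHash.delete(hash);
    this.byId.delete(id);
    return true;
  }
}
```

Without the `remove` step, both maps would keep growing for as long as the cache lives.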

Attachment #9106358 - Flags: review?(mkmelin+mozilla)
Attachment #9106358 - Flags: approval-comm-esr68?
Attachment #9106358 - Flags: approval-comm-beta?
Attachment #9106358 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/39b07c84497e
Follow-up: remove data from cache when it is no longer relevant. r=mkmelin

Attachment #9106358 - Flags: approval-comm-beta? → approval-comm-beta+
Attachment #9106111 - Flags: approval-comm-esr68? → approval-comm-esr68+
Attachment #9106358 - Flags: approval-comm-esr68? → approval-comm-esr68+