The messages.continueList gradually gets slower
Categories
(Thunderbird :: Add-Ons: Extensions API, defect)
Tracking
(thunderbird_esr6871+ fixed, thunderbird71 fixed, thunderbird72 fixed)
People
(Reporter: mihaicodrean, Assigned: darktrojan)
Details
Attachments
(3 files)
22.87 KB, image/png
2.56 KB, patch (mkmelin: review+; jorgk-bmo: approval-comm-beta+, approval-comm-esr68+)
1.19 KB, patch (mkmelin: review+; jorgk-bmo: approval-comm-beta+, approval-comm-esr68+)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Steps to reproduce:
Iterate through the messages in a large folder (45k+ items) using the WebExtensions "messages" API, through the listMessages helper function from: https://thunderbird-webextensions.readthedocs.io/en/68/how-to/messageLists.html
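For reference, the iteration can be instrumented roughly as follows. This is only a sketch, not the documentation's helper verbatim: `timePages` and its `messagesApi` parameter (standing in for `browser.messages`) are names assumed here so the loop can also be exercised with a stub outside Thunderbird.

```javascript
// Walk every page of a folder and record how long each list/continueList call takes.
// `messagesApi` is assumed to expose list(folder) and continueList(listId),
// each resolving to { id, messages } the way browser.messages does.
async function timePages(messagesApi, folder, now = () => Date.now()) {
  const timings = []; // one entry per page: { count, ms }

  let start = now();
  let page = await messagesApi.list(folder);
  timings.push({ count: page.messages.length, ms: now() - start });

  // A page without an id is the last one.
  while (page.id) {
    start = now();
    page = await messagesApi.continueList(page.id);
    timings.push({ count: page.messages.length, ms: now() - start });
  }
  return timings;
}
```

Inside an extension this would be called as `await timePages(browser.messages, folder)`; plotting `ms` against the page index should make any per-page slowdown visible.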
Actual results:
The "continueList" call gradually gets slower, to the point that it's not acceptable.
Expected results:
I would have expected the performance to be linear.
Comment 1•1 year ago
Hey I'm just an extension developer like you.
I think I know why this happens (but I don't know for sure). Basically, it has to do with how Thunderbird indexes messages, and how it works with compression. Certain folders and messages make it 'harder' for Thunderbird to get at the other messages (especially when compressed). As an example, try using messages.list() on Inbox. Then try using it on Drafts and Trash. You should notice a big difference in the time it takes (use Date.now() before and after each). That's because Drafts and Trash are stored in a way that makes them harder to get to - we don't look at Drafts or the Trash very often, so they aren't in a 'convenient' place for Thunderbird to get.
In addition, when you start using messages.list(), Thunderbird has to start remembering message ids, and I think it uses some sort of internal data structure to do so (again, just a guess). This might also cause things to slow down slightly, although it shouldn't be that much (or at least, it wasn't for me with 50,000 messages).
Suffice to say, getting messages from Archives can sometimes take as long as 1.5 seconds in my experience.
A bigger problem I see with all of this is that getting messages this way sometimes freezes up the UI. So that 1.5 second freeze is really noticeable.
(In reply to michael.pope.email from comment #1)
Thanks for jumping in. In my case:
- The first batches of 100 messages each are decently fast
- But then the decay gets noticeable and by the time it gets over 5000 messages, the time spent in continueList() is 2-3 seconds for each call
- The UI is barely usable during this time
- What's also interesting is that this seems to be CPU-bound (20% on an i7, almost constantly), with little to no disk activity
- No apparent memory leak, as far as I can tell by watching the private bytes
- I did a "Repair Folder", just in case, but that didn't change anything
- Also, in my case it's a custom folder, not the standard ones.
I'm surprised that you report 1.5 seconds to fetch 50,000 message headers. If I had that kind of performance, I would be happy.
Comment 3•1 year ago
(In reply to Mihai from comment #2)
I should clarify: 1.5 seconds to fetch 100 messages (using messages.continueList() or messages.list()). But that's only in the deep Archives (really old ones).
Yeah, it's definitely CPU bound. Don't know why though. What I've done with my extension is create a few queues. I measure how long the continueList() or list() was, and then I setTimeout() for a certain period based on that (using async.js is so nice for this). That allows me to keep my extension's overall power usage listed as very low, even with the occasional burst up to 15% or so. It does mean that you only get a new list of 100 messages every 4-10 seconds though, which is a lot slower. So try putting the delays in and I think things will work better. As I mentioned in the post above, the call freezes up the UI a bit, so it's good to put some space in between them anyways.
What are you hoping to do with your extension?
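The throttling workaround described in this comment could look something like the sketch below. It uses plain promises rather than async.js queues, and `listThrottled`, `messagesApi` (standing in for `browser.messages`), `restFactor` and `maxRestMs` are all names made up for illustration, not Thunderbird settings.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch all pages of a folder, pausing after each call for a multiple of the
// time that call took. `onPage` receives the messages of each page.
async function listThrottled(
  messagesApi,
  folder,
  onPage,
  { restFactor = 3, maxRestMs = 10000 } = {}
) {
  let start = Date.now();
  let page = await messagesApi.list(folder);
  for (;;) {
    // Time spent in the list/continueList call only, not in onPage.
    const elapsed = Date.now() - start;
    await onPage(page.messages);
    if (!page.id) break; // no id means this was the last page

    // Rest in proportion to the cost of the previous call so the UI can catch up.
    await sleep(Math.min(elapsed * restFactor, maxRestMs));
    start = Date.now();
    page = await messagesApi.continueList(page.id);
  }
}
```

This trades total run time for responsiveness: the slower the calls get, the longer the pauses, so it hides the slowdown rather than fixing it.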
(In reply to michael.pope.email from comment #3)
That allows me to keep my extension's overall power usage listed as very low, even with the occasional burst up to 15% or so.
If we're talking workarounds, lowering the messagesPerPage property value could also help. Good idea with the periodic sleep.
I'd rather see this addressed though.
What are you hoping to do with your extension?
I need to analyze the emails in a couple of folders based on their subject and date.
Comment 5•1 year ago
(In reply to Mihai from comment #4)
I'll try the messagesPerPage and see how that works. If I can reduce the UI freeze time that would be fantastic - a whole lot better than sleep.
Sweet! Are you trying to remove duplicates or something?
Mine is trying to synchronize tags on email servers that don't support natural tag synchronization (which is really heavy on the CPU in Thunderbird, because I keep having to use .getFull() to get the message UUID)
(In reply to michael.pope.email from comment #5)
I'll try the messagesPerPage and see how that works. If I can reduce the UI freeze time that would be fantastic - a whole lot better than sleep.
I was thinking both a smaller page & sleep. In my case it won't cut it, given the exponential slowdown.
Are you trying to remove duplicates or something?
No, although that could make a useful extension.
Mine is trying to synchronize tags on email servers that don't support natural tag synchronization (which is really heavy on the CPU in Thunderbird, because I keep having to use .getFull() to get the message UUID)
Nice!
Comment 7•1 year ago
(In reply to Mihai from comment #6)
I just realized, messagesPerPage isn't in the documentation. Is it a property on browser.messages or chrome.messages?
Just curious, for your use case, how come it won't cut it? Are you dealing with rapidly changing folders (so you have to get the messages ASAP)?
(In reply to michael.pope.email from comment #7)
I just realized, messagesPerPage isn't in the documentation. Is it a property on browser.messages or chrome.messages?
It's a preference: "extensions.webextensions.messagesPerPage". But I see that it's not yet in the Beta channel - I'm on 71.0b1 and I don't have it yet. You could try to build Thunderbird etc.
Just curious, for your use case, how come it won't cut it? Are you dealing with rapidly changing folders (so you have to get the messages ASAP)?
It won't work for me because I need a decent time for an initial full folder parse. A total time over, say, one minute won't be acceptable for my purposes.
Comment 9•1 year ago
(In reply to Mihai from comment #8)
It's a preference: "extensions.webextensions.messagesPerPage". But I see that it's not yet in the Beta channel - I'm on 71.0b1 and I don't have it yet. You could try to build Thunderbird etc.
Ah, yeah, guess I'll have to wait then. Not worth building for... but it'll be nice to have when it comes out.
It won't work for me because I need a decent time for an initial full folder parse. A total time over, say one minute, won't be acceptable for my purposes.
Ah yeah. I was going to say you could try and grab the selected messages (since it's a secondary message list automatically delivered by an event listener), but if you need every message in a folder, that won't cut it.
Reporter | ||
Comment 10•1 year ago
The decay seems to be linear (not exponential), as in the attached chart.
What's also interesting is that the very last fetch for about half of a page yields a directly proportional time, which would indicate that the decay is per-message.
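One way to read this observation is with a toy cost model. It assumes, purely as a guess at this point in the thread, that every newly fetched message is compared against all messages fetched so far; `pageCost` is a hypothetical helper that counts those comparisons.

```javascript
// Comparisons needed for one page when every new message is checked against
// all messages seen so far (one linear scan per message).
function pageCost(alreadySeen, pageSize) {
  let comparisons = 0;
  for (let i = 0; i < pageSize; i++) {
    comparisons += alreadySeen + i;
  }
  return comparisons;
}

const fullPage = pageCost(5000, 100); // 504950 comparisons
const halfPage = pageCost(5000, 50);  // 251225 comparisons
console.log((halfPage / fullPage).toFixed(2)); // prints "0.50"
```

Under that assumption the cost of a page grows linearly with the number of messages already fetched, and a half-size page at the same position costs about half as much, which matches a per-message rather than a per-call slowdown.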
Assignee | ||
Comment 11•1 year ago
I see the problem. It's very likely because when we assign an ID to a message we have to look through every message that already has an ID, to check if we've seen it already. If we did the lookup with some kind of hash instead that would go from linear to constant time.
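The difference between the two lookups can be sketched as follows. This is an illustration of the idea only, not the code in Thunderbird: `LinearTracker`, `HashedTracker`, and the choice of folder URI plus message key as the identity of a message are assumptions made for the example.

```javascript
// Linear version: scan every tracked message to see whether this one is known.
// Each call costs O(n) in the number of messages already tracked.
class LinearTracker {
  constructor() {
    this.entries = [];
  }
  getId(folderURI, messageKey) {
    const index = this.entries.findIndex(
      (e) => e.folderURI === folderURI && e.messageKey === messageKey
    );
    if (index !== -1) return index + 1;
    this.entries.push({ folderURI, messageKey });
    return this.entries.length;
  }
}

// Hashed version: derive a string key from the message and look it up in a Map,
// which takes constant time on average no matter how many messages are tracked.
class HashedTracker {
  constructor() {
    this.nextId = 1;
    this.idsByHash = new Map();
  }
  getId(folderURI, messageKey) {
    const hash = JSON.stringify([folderURI, messageKey]); // hypothetical hash
    let id = this.idsByHash.get(hash);
    if (id === undefined) {
      id = this.nextId++;
      this.idsByHash.set(hash, id);
    }
    return id;
  }
}
```

Both trackers hand out the same IDs for the same sequence of messages; only the cost of each lookup differs.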
Assignee | ||
Comment 12•1 year ago
Yes, that works. Now to work out a hash that's better than the simple one I used to test.
Reporter | ||
Comment 13•1 year ago
Great news, Geoff! Looking forward to the patch.
Assignee | ||
Comment 14•1 year ago
Comment 15•1 year ago
Comment on attachment 9106111 [details] [diff] [review] 1592256-message-lookup-1.diff Review of attachment 9106111 [details] [diff] [review]: ----------------------------------------------------------------- LGTM! r=mkmelin
Comment 16•1 year ago
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/4865f56365ce
Cache WebExtension message identifiers to improve lookup performance from O(n) to O(1). r=mkmelin DONTBUILD
Assignee | ||
Comment 17•1 year ago
What we stored in the Map should be removed when it isn't relevant any more.
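A minimal sketch of what such cleanup might look like, assuming the cache maps a per-message hash to an ID and keeps a reverse map so an entry can be dropped by ID. `MessageIdCache`, its methods, and the idea that removal happens when a message goes away (for example when it is deleted) are all assumptions for illustration; the comment does not say when entries stop being relevant.

```javascript
class MessageIdCache {
  constructor() {
    this.nextId = 1;
    this.idsByHash = new Map();   // hash of (folder, key) -> id
    this.entriesById = new Map(); // id -> { folderURI, messageKey }
  }

  // Hypothetical hash: any stable string derived from the message identity works.
  static hash(folderURI, messageKey) {
    return JSON.stringify([folderURI, messageKey]);
  }

  // Return the existing id for a message, or assign a new one.
  getId(folderURI, messageKey) {
    const hash = MessageIdCache.hash(folderURI, messageKey);
    let id = this.idsByHash.get(hash);
    if (id === undefined) {
      id = this.nextId++;
      this.idsByHash.set(hash, id);
      this.entriesById.set(id, { folderURI, messageKey });
    }
    return id;
  }

  // Drop both directions of the mapping so the Maps do not grow without bound.
  remove(id) {
    const entry = this.entriesById.get(id);
    if (!entry) return false;
    this.entriesById.delete(id);
    this.idsByHash.delete(MessageIdCache.hash(entry.folderURI, entry.messageKey));
    return true;
  }

  get size() {
    return this.idsByHash.size;
  }
}
```

Note that in this sketch a message that is removed and later seen again gets a fresh ID rather than its old one.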
Comment 18•1 year ago
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/39b07c84497e
Follow-up: remove data from cache when it is no longer relevant. r=mkmelin
Comment 19•1 year ago
TB 71 beta 3:
https://hg.mozilla.org/releases/comm-beta/rev/01133d7ade6a1467245c07b47842c6423815a623
https://hg.mozilla.org/releases/comm-beta/rev/0dc166cf7ad001f0b78a44f49f7f881bedea967e
Comment 20•1 year ago
TB 68.3 ESR:
https://hg.mozilla.org/releases/comm-esr68/rev/b583a52fa22aa4245a34758a9fef6d15a0adc39a
https://hg.mozilla.org/releases/comm-esr68/rev/e70a1c6b1fa8d7ab002707b52d76ca6486fa0e06