984875 - Autocomplete hangs Thunderbird for > 20 seconds when searching for single character in large AB

Reporter

Description

•

11 years ago

Using Earlybird 29.0a2 (2014-03-17) although I've been seeing this for months, if not years. Steps to reproduce: * Start new mail * Type "a" into To: box Expected: either no autocomplete (which is fair enough, for one character) or quick autocomplete Actual: browser hangs unresponsively for > 30 seconds before delivering autocomplete answers. I have 4708 entries in my Personal Address Book and another 9800 or so in Collected Addresses. That doesn't seem like an unreasonably large number. My abook.mab is 2.6MB and my history.mab is 3MB. A modern machine should be able to search 6MB of data in milliseconds. Deleting the contents of history.mab means the delay is no longer long enough to trigger Ubuntu's "frozen app" UI, but it's still long. Then, deleting half the addresses in abook.mab makes it somewhat better. Perhaps there's an n^2 algorithm running in here somewhere? I thought about profiling but I can't get the Gecko profiler to work with Thunderbird. I don't get a "long-running script" dialog when I get these autocomplete hangs. Gerv

Magnus Melin [:mkmelin]

Comment 1

•

11 years ago

Regression? We did change some things in autocomplete. E.g. bug 529584 could affect you.

Gervase Markham [:gerv]

Reporter

Comment 2

•

11 years ago

Yes; having tested, this doesn't happen on TB 24.4.0. So it looks like a more recent regression. It could well be bug 529584. How can I tell whether it is? Gerv

Gervase Markham [:gerv]

Reporter

Comment 3

•

11 years ago

With mconley's help, I got the profiler working. Here's the profile: http://people.mozilla.org/~bgirard/cleopatra/?1395158682834#report=ca71d30896bd922a40d82a65884d883b13bb8370 (The first 20 seconds are the important ones.) I'm no expert, but this is what I've found so far: * It calls _searchWithinEmails 66 times - I don't have 66 address books! I do have 33 lists in one of my address books, though. 33 is suspiciously half of 66, but why 2 calls each? * Looks to me like we are calling toLocaleLowerCase() on the same strings an unnecessarily large number of times during this process, and multiple times in quick succession on the same string - see the functions _searchWithinEmails: http://dxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#171 and _checkDuplicate: http://dxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#259 * Looks like http://dxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#253 is n^2 in the number of results returned, because for each result, it loops over all the current results. Gerv

Mike Conley (:mconley) (:⚙️)

Updated

•

11 years ago

Flags: needinfo?(mconley)

Mike Conley (:mconley) (:⚙️)

Comment 4

•

11 years ago

Cc'ing aceman who worked on bug 529584. Any chance these two things could be related, aceman?

Flags: needinfo?(mconley) → needinfo?(acelists)

Mike Conley (:mconley) (:⚙️)

Updated

•

11 years ago

Flags: needinfo?(mconley)

:aceman

Assignee

Comment 5

•

11 years ago

If you see the patch there https://bug529584.bugzilla.mozilla.org/attachment.cgi?id=803332 you'll see it just changes .startsWith to .contains without changing any loops or number of times .toLocaleLowercase() or _searchWithinEmails() is called. So we would need to find out which is the last commit to the nsAbAutoCompleteSearch.js file for TB 24 (from hg log).

Flags: needinfo?(acelists)

Gervase Markham [:gerv]

Reporter

Comment 6

•

11 years ago

(In reply to :aceman from comment #5) > If you see the patch there > https://bug529584.bugzilla.mozilla.org/attachment.cgi?id=803332 you'll see > it just changes .startsWith to .contains without changing any loops or > number of times .toLocaleLowercase() or _searchWithinEmails() is called. My observations on the code are not meant to be an indication of cause :-) .startsWith(), in the common case, compares one single letter (the first one). .contains(), in the common case, compares far more (depending on which string matching algorithm you are using). So your patch will definitely have had _some_ performance impact. My question is: did you measure it? Do you know how big it was? Did you test with large address books? Gerv

Mark Banner (:standard8)

Comment 7

•

11 years ago

startsWith -> contains is likely a performance impact. Ok, so I'm not sure I like some of this code, although I originally wrote it ;-) Some potential optimisations: - _searchWithinEmails doesn't need to be called for mailing lists. _searchCards has already looked at the mailing list values iirc, and the childCards don't need to be searched because we have the "feature" that all cards have to be in the parent address book. It would probably be worth investigating why its called 66 times rather than 33 in Gerv's case though. - As Gerv says, _checkDuplicate is sub-optimal. The results list should already be sorted, so we may be able to do some optimisations based around that, e.g. post de-duplicating rather than doing it as we go.

Gervase Markham [:gerv]

Reporter

Comment 8

•

11 years ago

And it would be good to find a way of caching the values of toLocaleLowerCase - or, at least, caching a bit which says "this email address already happens to be in LocaleLowerCase form", which is true of most email addresses. Let me know if you need more from me; otherwise, I'll assume you guys can take it from here. Gerv

:aceman

Assignee

Comment 9

•

11 years ago

No I have not measured the impact, I do not have large address book. But these are questions mconley would probably know to answer.

Wayne Mery (:wsmwk)

Updated

•

11 years ago

Keywords: perf

Mike Conley (:mconley) (:⚙️)

Comment 10

•

11 years ago

So, does anybody have cycles to push forward with Standard8's suggestions in comment 7? Because I currently don't.

Flags: needinfo?(mconley)

:aceman

Assignee

Comment 11

•

11 years ago

Is anybody able to provide me a big addressbook DB the size gerv has so I can test on it? It can contain just some testing cards.

Flags: needinfo?(mconley)

Mike Conley (:mconley) (:⚙️)

Comment 12

•

11 years ago

Gerv: Is it a ridiculous request to have you share your address book with aceman? If so, I can try to get around to generating a massive address book programmatically, but I'm not sure when I'll find the time. -Mike

Flags: needinfo?(mconley) → needinfo?(gerv)

:aceman

Assignee

Comment 13

•

11 years ago

Wayne sent me some test addressbooks already so I will try with those first.

Flags: needinfo?(gerv)

Gervase Markham [:gerv]

Reporter

Comment 14

•

11 years ago

I can share my address book if need be; let me know if the ones you have don't work out. Gerv

Mike Conley (:mconley) (:⚙️)

Comment 15

•

11 years ago

(In reply to Gervase Markham [:gerv] from comment #14) > I can share my address book if need be; let me know if the ones you have > don't work out. > > Gerv I've just realized that I have leftover code from the new address book work I was doing that will help me create a pretty fat address book if needs be. So we can go that route first, before having gerv share his address book. -Mike

:aceman

Assignee

Comment 16

•

11 years ago

It looks like _searchWithinEmails() is going away in bug 558931 so I will profile this after that rework lands.

Depends on: 558931

Wayne Mery (:wsmwk)

Updated

•

11 years ago

Keywords: regression

Updated

•

11 years ago

Summary: Autocomplete hangs Thunderbird for > 20 seconds → Autocomplete hangs Thunderbird for > 20 seconds when searching for single character in large AB

Whiteboard: [regression:TB27-bug 529584] → [regression:TB27-bug 529584 and prior search design]

Thomas D. (:thomas8)

Comment 17

•

11 years ago

Gerv, does {typing "foo" very quickly} produce less interruption than just "f"? Do we have a delay before triggering search when typing more letters? Could this be adjusted? As Gerv suspects in comment 0, this appears to be a problem long before bug 529584 ever landed: Bug 705837 - High memory and CPU usage, and slow autocomplete during address autocomplete using large addressbook Mark's comment 7 points to some of the previously existing routines that might be a starting point for performance improvements. Btw, _searchWithinEmails() function is about to be removed by patch in bug 558931.

Updated

•

11 years ago

Depends on: 1000775

Gervase Markham [:gerv]

Reporter

Comment 18

•

11 years ago

(In reply to Thomas D. from comment #17) > Gerv, does {typing "foo" very quickly} produce less interruption than just > "f"? > Do we have a delay before triggering search when typing more letters? Could > this be adjusted? Yes, yes and possibly, although I suggest a wiser temporary fix if you are diving into that code would be not to trigger search until 3 characters had been typed. Gerv

:aceman

Assignee

Comment 19

•

11 years ago

Attached patch WIP patch (obsolete) — Details — Splinter Review

So this is a WIP patch I came up. To be applied on top of bug 558931. With the 2 addressbooks comprising 20000 cards, autocompleting 'a' took 30 seconds without bug 558931 and 20 seconds with bug 558931. The test addressbooks have many duplicate cards. The search yields about 900 results. With this patch the autocomplete takes about 4-5 seconds. I can't make it much less because just running the AB query (in C++) takes 2-3 seconds regardless of what the search string is. We can still do the refusal to autocomplete when the string is less than X characters. Where would that belong? In the compose backend or better into the autocomplete startSearch() itself?

Assignee: nobody → acelists

Status: NEW → ASSIGNED

Attachment #8413059 - Flags: feedback?(standard8)

Attachment #8413059 - Flags: feedback?(mconley)

Thomas D. (:thomas8)

Comment 20

•

11 years ago

(In reply to :aceman from comment #19) > Created attachment 8413059 [details] [diff] [review] > WIP patch > > So this is a WIP patch I came up. To be applied on top of bug 558931. > With the 2 addressbooks comprising 20000 cards, autocompleting 'a' took 30 > seconds without bug 558931 and 20 seconds with bug 558931. > The test addressbooks have many duplicate cards. The search yields about 900 > results. > > With this patch the autocomplete takes about 4-5 seconds. I can't make it > much less because just running the AB query (in C++) takes 2-3 seconds > regardless of what the search string is. Wow, thanks, that's a great performance gain coming with Aceman's patch here. Given the minimum default time for C++ AB queries of 2-3 seconds, 4-5 seconds for 20000 cards is acceptable imho; many average private users won't ever have ABs of that size. > We can still do the refusal to autocomplete when the string is less than X > characters. Where would that belong? In the compose backend or better into > the autocomplete startSearch() itself? I think after the huge performance gains offered by Aceman's patch here, preventing autocompletion for search strings of less than 3 chars is no longer needed (and even Gerv's comment 18 refers to that option as a "temporary" fix, not an ideal solution). FF awesome bar frecency algorithm also responds to single characters, and we have plans to implement real frecency for TB recipient autocomplete in Bug 382415 (to replace poorly designed popularityIndex we have now). In that perspective, preventing autocomplete for less than 3 chars will prevent a lot of highly efficient and legitimate use patterns, because even single-character searches can work very well if combined with frecency algorithm (as seen in FF). We already have Bug 970456 where reporter complains that after Bug 529584, his two-character search patterns now yield different results. I suspect average users will not be amused if we make their one- or two-letter frecency searches return nothing, and I strongly discourage taking that route.

Gervase Markham [:gerv]

Reporter

Comment 21

•

11 years ago

I agree. Thanks for doing the testing - these figures are great. Full steam ahead with both patches! Gerv

:aceman

Assignee

Comment 22

•

11 years ago

For reference, my tests were done on an old AMD Phenom II CPU at 3.6Ghz on a DEBUG build of TB (which is generally slower than release).

Mike Conley (:mconley) (:⚙️)

Comment 23

•

11 years ago

Comment on attachment 8413059 [details] [diff] [review] WIP patch Review of attachment 8413059 [details] [diff] [review]: ----------------------------------------------------------------- Minus the logging stuff and vars I'm seeing strewn about, this is definitely the right idea. Can we do the same thing with the nsAbLDAPAutoCompleteSearch.js too? ::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js @@ +362,5 @@ > this._searchCards(searchQuery, dir, result); > } > } > + > + for (let [value, resultItem] of result._collectedValues.entries()) { I think you can just do this with the spread operator[1]: result._searchResults = [...result._collectedValues.values()]; [1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_operator

Attachment #8413059 - Flags: feedback?(mconley) → feedback+

:aceman

Assignee

Comment 24

•

11 years ago

What crazy syntax they come up with these days... Maybe I could do LDAP in a new bug, but I have no way to test it.

Mark Banner (:standard8)

Comment 25

•

11 years ago

Comment on attachment 8413059 [details] [diff] [review] WIP patch Review of attachment 8413059 [details] [diff] [review]: ----------------------------------------------------------------- Seems reasonable to me

Attachment #8413059 - Flags: feedback?(standard8) → feedback+

:aceman

Assignee

Comment 26

•

11 years ago

Attached patch patch v2 (obsolete) — Details — Splinter Review

This tweaks it a bit to make the mailnews/addrbook/test/test_nsAbAutoCompleteSearch*.js tests pass. To produce the expected order of results. I would like to use a.localeCompare(b) in the final sorting function but that would change the order of the results and would need to adapt some of the tests. I am not up to that yet because the used AB cards are stored in .mab files. So we need to live with the fact it sorts "DisplayName <>" before "dis <>" and worse. Thomas could file that as a followup bug :)

Attachment #8413059 - Attachment is obsolete: true

Attachment #8413934 - Flags: review?(standard8)

Magnus Melin [:mkmelin]

Comment 27

•

11 years ago

(In reply to :aceman from comment #24) > What crazy syntax they come up with these days... > > Maybe I could do LDAP in a new bug, but I have no way to test it. If you want an online ldap address book of at least reasonable size, you could use this: https://howto.adams.edu/index.php/Thunderbird_-_Add_LDAP_to_Address_Book - that's what I do for random ldap testing. Seems it's open for everyone from everywhere.

Magnus Melin [:mkmelin]

Comment 28

•

11 years ago

Comment on attachment 8413934 [details] [diff] [review] patch v2 Review of attachment 8413934 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js @@ +362,5 @@ > + ((a.card == b.card && a.isPrimaryEmail)? -1 : 0) || > + ((a.value < b.value) ? -1 : (a.value == b.value) ? 0 : 1); > + // TODO: this should actually use a.value.localeCompare(b.value) . > + } > + result._searchResults.sort(order_by_popularity_and_email); i'd just inline the sort function

Magnus Melin [:mkmelin]

Comment 29

•

11 years ago

(In reply to :aceman from comment #19) > We can still do the refusal to autocomplete when the string is less than X > characters. Where would that belong? In the compose backend or better into > the autocomplete startSearch() itself? I think that would belong in startSearch. Now that we search many more fields, I think 2 chars should probably be required.

Gervase Markham [:gerv]

Reporter

Comment 30

•

11 years ago

Do not underestimate the joyful usefulness of being able to type simply "r" and getting the email address for my wife, who I email a lot. Even if we have to pre-cache 26 results, we should retain this ability. Also, an inability to search approximately 3MB of data (for a large address book) near-instantaneously would be, frankly, a massive embarrassment. Gerv

Mark Banner (:standard8)

Comment 31

•

11 years ago

Comment on attachment 8413934 [details] [diff] [review] patch v2 Review of attachment 8413934 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, r=Standard8 ::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js @@ +358,5 @@ > + // Order by descending popularity, then primary email before secondary > + // for the same card, then for differing cards sort by email. > + function order_by_popularity_and_email(a, b) { > + return (b.popularity - a.popularity) || > + ((a.card == b.card && a.isPrimaryEmail)? -1 : 0) || nit: space before ?

Attachment #8413934 - Flags: review?(standard8) → review+

:aceman

Assignee

Comment 32

•

11 years ago

Attached patch patch v2.1 — Details — Splinter Review

Thanks.

Attachment #8413934 - Attachment is obsolete: true

Attachment #8428850 - Flags: review+

:aceman

Assignee

Updated

•

11 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 33

•

11 years ago

https://hg.mozilla.org/comm-central/rev/57ec47cea66e

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 32.0

Wayne Mery (:wsmwk)

Comment 34

•

11 years ago

This will be an issue for enterprises

tracking-thunderbird_esr31: --- → ?

Thomas D. (:thomas8)

Comment 35

•

11 years ago

(In reply to Wayne Mery (:wsmwk) from comment #34) > This will be an issue for enterprises Thanks Wayne, good catch. I wasn't aware this hasn't landed for TB31 yet. It definitely should. This brings a huge performance gain to autocomplete. FTR: Contrary to some rumours currently around, please note that per Aceman's analysis of very large AB in Bug 984875 Comment 19, bug 558931 (split multiword search and algorithm redesign/streamlining) has actually *improved* the performance by 33% (down from 30secs to 20secs; before this bug 984875). On top of that, this bug 984875 lands another performance gain of 75% or more (down from 20secs to 4-5 secs, and that's probably still in Debug mode). So the total performance gain should be an impressive 83% (although for anything beyond single-letter searches, we have to look at more factors like length and number of search words, speed of typing, pauses etc. which actually make a difference because of separate algorithms for first vs. subsequent searches). So far we have no systematic analysis / actual measurement of the potential performance impact of Bug 529584 (change of algorithm from beginsWith to Contains), but I'm inclined to think that it should be at least balanced out, probably positively surpassed by that huge performance gain of 83% after bug 558931 and this bug 984875.

Thomas D. (:thomas8)

Comment 36

•

11 years ago

From user feedback on various bugs, this is a major issue that affects workflows significantly. Can we please land this on TB31 asap?

Severity: normal → major

Thomas D. (:thomas8)

Updated

•

11 years ago

Comment 37

•

11 years ago

Comment on attachment 8428850 [details] [diff] [review] patch v2.1 [Approval Request Comment] Regression caused by (bug #): bug 558931. User impact if declined: some users get 10-15s hangish behavior when using autocomplete Testing completed (on c-c, etc.): haven't seen any problems from this on trunk Risk to taking this patch (and alternatives if risky): medium, but it does appear to help and has no known regressions

Attachment #8428850 - Flags: approval-comm-esr31?

Thomas D. (:thomas8)

Comment 38

•

11 years ago

(In reply to Magnus Melin from comment #37) > Comment on attachment 8428850 [details] [diff] [review] > patch v2.1 > > [Approval Request Comment] +1. This bug fixes a bad and inacceptable performance problem in recipient autocomplete, improving the performance by 75% per aceman's comment 19 (down to 4-5 secs from 20 secs for XXL AB). > Regression caused by (bug #): bug 558931. FTR: Magnus, per aceman's measured analysis of comment 19, bug 558931 came with a performance *gain* of 33% (down to 20secs from 30secs). Per reporter's (Gerv's) comment 0, he has been > seeing this for months, if not years. So if at all (which is unproven as of yet), this was regressed by bug 529584 (as suspected by Magnus comment 1) which introduced "Contains" algorithm; otherwise (per whiteboard) it's bad code found in prior search design (e.g. inefficient deduplication) and just more exposed now by bug 529584 (and if you search for e.g. 2 single letters, then perhaps bug 558931 splitting multiword searches might double the exposure).

Blocks: 1000775, 558931, 529584

No longer depends on: 1000775, 558931

Updated

•

11 years ago

Attachment #8428850 - Flags: approval-comm-esr31? → approval-comm-esr31+

Mark Banner (:standard8)

Updated

•

11 years ago

tracking-thunderbird_esr31: ? → 32+

Michael Evangelo

Comment 40

•

11 years ago

I see patch v2.1 at the top of this bug. I click on it and I'm reading Greek. So what do we do with this patch? How is this problem fixed if its just a bunch of gibberish? How do we use the patch?

:aceman

Assignee

Comment 41

•

11 years ago

The patch was accepted to be included into TB31.1. So just wait until it is merged there and you should get the update automatically.

Mark Banner (:standard8)

Comment 44

•

11 years ago

https://hg.mozilla.org/releases/comm-esr31/rev/e0b033b62d2c

status-thunderbird_esr31: --- → fixed

Nicola Ferralis

Comment 45

•

11 years ago

(In reply to Mark Banner (:standard8) from comment #44) > https://hg.mozilla.org/releases/comm-esr31/rev/e0b033b62d2c Has this been merged and available with the nightly builds?

Magnus Melin [:mkmelin]

Comment 46

•

11 years ago

Yes that link means it landed for the tb31.1 build.

Thomas D. (:thomas8)

Updated

•

10 years ago

Updated

•

10 years ago

WIP patch 11 years ago :aceman 10.64 KB, patch	standard8 : feedback+ mconley : feedback+	Details \| Diff \| Splinter Review
patch v2 11 years ago :aceman 8.88 KB, patch	standard8 : review+	Details \| Diff \| Splinter Review
patch v2.1 11 years ago :aceman 8.88 KB, patch	aceman : review+ standard8 : approval-comm-esr31+	Details \| Diff \| Splinter Review