Closed Bug 970456 Opened 10 years ago Closed 10 years ago

Recipient autocomplete: Implement scoring algorithm to prioritize "beginsWith" wordwise matches over "contains" matches: search for "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the middle of word)

Categories

(Thunderbird :: Address Book, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(thunderbird36 fixed, thunderbird_esr31 35+ fixed)

RESOLVED FIXED
Thunderbird 36.0
Tracking Status
thunderbird36 --- fixed
thunderbird_esr31 35+ fixed

People

(Reporter: dholbert, Assigned: mkmelin)

References

Details

(Keywords: perf, regression, ux-efficiency, Whiteboard: [read comment 177 before commenting][fixed for thunderbird 31.4.0 and up][makes popularityIndex the secondary sorting criterion after primary scoring based on matching patterns; initial display name top-scoring will be reverted by bug 1134986])

Attachments

(1 file, 7 obsolete files)

Today, I opened a compose window and typed (as I'm used to doing) "Be" [tab] into the "To" field, expecting it to autocomplete to my friend Ben's email address.

Instead, it autocompleted to a family member's email address, who happens to have "be" in the middle of their last name. (holBErt)


Based on some discussion in #maildev, I've learned that:
 (a) Thunderbird used to choose the list of autocomplete contacts based on startsWith() -- but now it uses contains() (which changed in bug 529584)
 (b) This holBErt family-member is pushing Ben out of the way because I've exchanged more email with this person than with my friend Ben.


However: I don't usually start typing in my contacts from the middle of their name, and I suspect that's true of most users. And I think it's a bit ridiculous that Thunderbird is now matching contacts by assuming that users are equally likely to start typing a contact's name at any point in the name. (It's *extremely* likely that folks will start typing from the beginning of names, and our heuristics should understand that.)


So -- my "Expected Behavior" (with the new .contains()-based matching) would probably be that Thunderbird would sort all of the startsWith() contacts first, based on popularity, and *then* sort all of the contains()-but-doesn't-startsWith() contacts after them.  So all of my "holBErt" contacts would still be listed when I type "Be", but they'd be listed *after* all of the Bens, Bettys, etc.
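
The expected ordering described above can be sketched as a two-level sort. This is only a minimal illustration, not Thunderbird's code: the `Contact` class, the `rank` function and the sample names and popularity values are all made up for the example, and `popularity` merely stands in for the address book's popularityIndex.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    popularity: int  # hypothetical stand-in for popularityIndex


def rank(contacts, query):
    """Keep all contains() matches, but list startsWith() matches first.

    Within each group, more popular contacts come first.
    """
    q = query.lower()
    matches = [c for c in contacts if q in c.name.lower()]
    # False sorts before True, so names starting with the query lead the list.
    return sorted(
        matches,
        key=lambda c: (not c.name.lower().startswith(q), -c.popularity),
    )


# Illustrative data only.
contacts = [
    Contact("Mother Holbert", 100),
    Contact("Ben Johnson", 10),
    Contact("Betty Smith", 5),
    Contact("Carl Jones", 50),
]
ranked = [c.name for c in rank(contacts, "be")]
print(ranked)
```

With this sample data the Holbert entry is still returned for "be", but after the two entries whose names start with it, regardless of popularity.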
Version info, from About Earlybird: 29.0a2 (2014-02-10)
Summary: "Be" doesn't autocomplete to my friend "Ben" in my compose window anymore (because I have frequently-contacted holBErt contacts) → "Be" doesn't autocomplete to my friend "Ben" in my compose window anymore (because I have frequently-contacted holBErt contacts, with "Be" in the middle of their last name)
Summary: "Be" doesn't autocomplete to my friend "Ben" in my compose window anymore (because I have frequently-contacted holBErt contacts, with "Be" in the middle of their last name) → "Be" doesn't autocomplete to my friend "Ben" in my compose window anymore (because I have frequently-contacted holBErt contacts, with "be" in the middle of their last name)
I really agree with you! However, some people in bug 529584 seem to start typing from the middle of a name ;)

There is some refactoring going on in bug 558931 in this area so we can look at your proposal to refine the search after that is finished.
Keywords: regression
(In reply to :aceman from comment #2)
> I really agree with you! However, some people in bug 529584 seem to start
> typing from the middle of a name ;)

Sure - but when there are both start-of-name matches and middle-of-name matches, I think it's reasonable to expect that Thunderbird would sort the start-of-name matches first.

> There is some refactoring going on in bug 558931 in this area so we can look
> at your proposal to refine the search after that finished.

OK, thanks.

[I tried to put that bug number into the "depends" field on this bug, but apparently that creates a cycle involving bug 529584 (which I believe this is correctly marked as 'blocking', in the '$X is a regression caused by $Y' sense). So I'm leaving this with no official link to bug 558931 at the moment, but feel free to adjust the dependencies if you like.]
Hi Daniel,

I sympathise with your disappointment that auto-complete no longer behaves exactly as it did before (returning your results in a different order); in fact, the complaint of many users that auto-complete doesn't return the desired matching results *at all* has been the main motivation for the changes in bug 529584 and bug 558931.

I have several pieces of good news for you, and a bit of correction and perspective.

1) good news 1: for many use cases, bug 558931 (introducing "contains") *ensures* that searching for the beginning of names, email addresses etc. as you do is possible *at all*. So that's actually in favor of your scenario (while I'm not denying your change of result order also relates to that, but not exclusively).
Example:
Card with display name "Anne Marie Johnson", email "a.marie.j@foo.bar.baz", no other fields filled(!).
Before bug 558931, searching any of the following yielded no results, but does now:
"Marie", "John", "marie.j", "bar.baz"
Iow, we've made it so that searching for the beginning of middle names, double-names, inner parts / subdomains / domains of email addresses actually works now where it didn't before.
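
To make the example concrete, here is a small sketch comparing a field-level "beginsWith" test with a "contains" test on that card. The dictionary layout and the function names are assumptions for illustration only; they are not the address book's actual API.

```python
# The sample card from the example above; only two fields are filled.
card = {"display_name": "Anne Marie Johnson", "email": "a.marie.j@foo.bar.baz"}


def field_begins_with(card, term):
    """True if any field of the card starts with the term (old behaviour)."""
    t = term.lower()
    return any(value.lower().startswith(t) for value in card.values())


def field_contains(card, term):
    """True if any field of the card contains the term (new behaviour)."""
    t = term.lower()
    return any(t in value.lower() for value in card.values())


terms = ["Marie", "John", "marie.j", "bar.baz"]
results = {t: (field_begins_with(card, t), field_contains(card, t)) for t in terms}
for term, (begins, contains) in results.items():
    print(f"{term!r}: beginsWith={begins}, contains={contains}")
```

None of the four terms sits at the start of a field, so only the "contains" test finds the card.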

2) good news 2: From your description, I think what you really want is a permanent 1:1 translation of your search term ("Be") into "Ben Johnson" (your friend). Fortunately, TB (before and after the bugs mentioned) already has this feature: Nickname field! (Unfortunately, it doesn't work 100% as it should, due to Bug 325458, and perhaps Bug 295428). Iow, conceptually, your use case seems to be about "static alias names" which is exactly what the "nickname" field provides (albeit buggy). You can work around bug 325458 by choosing a truly unique nickname like "bbb" for your friend Ben.

Now before we proceed, some correction and perspective.

3) Correction: Your scenario isn't really a regression from bug 558931 which introduced "contains". I'll explain why:
Daniel has 2 cards.
Card C1: e.g. Display Name "Ben Johnson" (his friend), has lower popularity index (say 10).
Card C2: e.g. Display Name "Mother Holbert" (family member), has higher popularity index (say 100).
So why exactly is it that before bug 558931, when Daniel types "Be", his friend "Ben" is always returned but not his Mum with higher popularity?
It's not *just* because the old algorithm uses "beginsWith" (hence doesn't find Mum where "be" is in the middle of the word); it's mainly because "Ben Johnson", in Daniel's particular and entirely coincidental set of contacts, happens to be *the only* contact that has any field starting with "Be". Iow, even before/without bug 558931 (introducing contains), Daniel's scenario ("be" -> "Ben Johnson") is not the stable 1:1 translation that it appears to be; rather, it's entirely coincidental and unstable. It will fail for most users as soon as there's any other contact having any field starting with "be". Example:
Suppose Daniel's AB has the following cards:
"Ben Bighead"
"Ben Bulldozer"
"Ben Bully"
"Ben Johnson"
"Celine Bell"
"Foo Bar <bemail@foo.bar>"
With that, searching for "be" will return all of the above as matches, ordered by popularity.

So Daniel's current expectation that "be" always reliably returns "Ben Johnson" is an artefact owed only to the uniqueness of that particular card in Daniel's AB; it's not a feature that TB has ever had.
To be clear: Even before bug 558931 (introducing contains), just typing two letters ("be") will yield a potentially unlimited number of false positives, depending on your AB content, ordered exclusively on popularity. In the current interim stage (after bug 558931, without bug 529584) you'll get a few more results in that particular set (for very short search terms!), but bug 558931 is entirely unrelated to the order of results (which is the main problem here as Daniel wants the "right" match to come topmost).

3) Given 2), it's obvious in the current design(!) that very short searchwords (2 characters in Daniel's case) can yield a potentially unlimited number of false positives, simply because 2 letter searchwords are not unique enough for any real-world dataset (unless we introduce frecency, which is another story, see below). However, even in the current interim stage, number of false positives will reduce drastically if you use just a few more letters, say 3 or 4-letter searchword ("Ben" is a bad example because it only has 3 letters...):
E.g. in the above scenario of 3):
"be" -> 6 matches
"bel" -> 1 match
"ben" -> 4 matches
"ben j" -> 1 match

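The effect of longer search words on the number of matches can be checked with a short sketch. It assumes a plain case-insensitive substring test against the six sample entries listed earlier in this comment; the `matches` helper is hypothetical and deliberately ignores how fields are really stored.

```python
# The six sample entries from the example address book above.
entries = [
    "Ben Bighead",
    "Ben Bulldozer",
    "Ben Bully",
    "Ben Johnson",
    "Celine Bell",
    "Foo Bar <bemail@foo.bar>",
]


def matches(term):
    """Return the entries containing the term, ignoring case."""
    t = term.lower()
    return [e for e in entries if t in e.lower()]


counts = {term: len(matches(term)) for term in ["be", "bel", "ben", "ben j"]}
for term, count in counts.items():
    print(f"{term!r}: {count} match(es)")
```
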
4) Good news 4; Perspective 1:
bug 529584 (introducing *foo* AND *bar*) is the logical twin of bug 558931 (introducing *foo bar* aka contains). After bug 529584, searching for the needle in the haystack will be *a lot* more efficient than before, and we'll enable users to limit the number of results radically after adapting their search habits only slightly.
E.g., Daniel will be able to use the following search:
"be jo" -> very likely, unique match for "Ben Johnson".
If that's not enough because you also have "Bella Johannson", a few more letters will definitely do the trick:
"Ben Jo" -> most likely, unique match for "Ben Johnson".
"Ben John" -> 98% likely, unique match for "Ben Johnson".
Of course, it all depends on the uniqueness of your search terms against your data, but that's not different from now.
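
A split multiword search of the kind described here (every word must be contained somewhere) could look roughly like the following. This is a sketch under assumptions, not the patch itself: `matches_all_words` and `search` are hypothetical names, and the search runs over plain display names only.

```python
def matches_all_words(text, query):
    """True if every whitespace-separated word of the query occurs in the text."""
    haystack = text.lower()
    return all(word in haystack for word in query.lower().split())


# Illustrative display names, partly taken from the examples in this comment.
names = ["Ben Johnson", "Bella Johannson", "Ben Bully", "Celine Bell"]


def search(query):
    return [n for n in names if matches_all_words(n, query)]


print(search("be jo"))
print(search("Ben Jo"))
print(search("Ben John"))
```

With these names, "be jo" still matches both "Ben Johnson" and "Bella Johannson", while "Ben Jo" narrows the result to "Ben Johnson" alone.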

5) More perspective:
I'm not denying that we can improve the order of autocomplete results; but that's independent of fixing the bad ux problems (missing results) addressed by bug 529584 and bug 558931, and the additional search power for partial matches introduced there.
- Yes, we could probably do something along the lines of Daniel's proposal of reordering results (beginsWith matches with higher priority than contains matches). However, it won't be easy to code, and the devil is certainly in the detail. I'm not sure it's worth the effort.
- If users like Daniel don't like "Nickname" (e.g. because it requires manual setup) - fair enough. Then you're probably looking for "dynamic alias names" aka "frecency", which is Bug 382415. That would enable TB to be smart and automatically learn about the user's expectations when he types "be" to find "Ben Johnson" (and not the other Bens in the same AB).
Again (to counter all myths in this area), TB currently does *not* have any algorithm of frecency; we just have a dumb counter of "popularityIndex" which does *not* consider the timeline ("recent usage") of your favorite contacts at all. So again, that's another bug which is not related or regressed in any way by bug 529584 and bug 558931; if anything, these bugs might expose those other bugs a bit more depending on your particular data sets and scenarios. And yes, I'm much in favor of doing bug 382415 to add "frecency" algorithm to autocomplete (as in FF awesome bar); unfortunately, that won't be easy either.

I'll take some more time to think about this; but from the above (especially 2 and 3) I have a strong suspicion that this bug is not a regression and it's actually invalid because it's based on assumptions that apply neither to the old nor to the new algorithm of autocomplete.
Yes, but the proposal for enhancement (ordering "begins with" matches to the top) is certainly valid and useful. I just don't know if current infrastructure in the search allows any prioritization. But at worst, it should be possible to simulate it by running two queries - the old beginsWith and the new Contains - and just concatenating the results (removing dupes, as matches from the first query will be in the second one too).
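
The two-query fallback can be sketched as follows. It is only an illustration of the idea: the query functions are hypothetical stand-ins for the real address book searches, they work on plain display names, and each is assumed to return its results already in the desired (e.g. popularity) order.

```python
def begins_with_query(names, term):
    """Stand-in for the old search: names starting with the term."""
    t = term.lower()
    return [n for n in names if n.lower().startswith(t)]


def contains_query(names, term):
    """Stand-in for the new search: names containing the term anywhere."""
    t = term.lower()
    return [n for n in names if t in n.lower()]


def combined(names, term):
    """beginsWith results first, then the remaining contains results."""
    first = begins_with_query(names, term)
    seen = set(first)
    # Every beginsWith match is also a contains match, so skip those dupes.
    rest = [n for n in contains_query(names, term) if n not in seen]
    return first + rest


# Illustrative names only.
names = ["Mother Holbert", "Ben Johnson", "Celine Bell", "Betty Smith"]
result = combined(names, "be")
print(result)
```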
If somebody wants to look into autocomplete results order (beginsWith matches before contains matches), pls ensure that you don't do that field-wise, but wordwise, because a technical "contains" match on a field, e.g. "ben" -> display name: "Albert Ben Ali" is a "beginsWith" match from the user's semantic pov.
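
The difference between a field-wise and a wordwise "beginsWith" test is small in code. The sketch below assumes words are separated by whitespace only, which is a simplification; both function names are made up for the example.

```python
def field_begins_with(field, term):
    """True only if the whole field starts with the term."""
    return field.lower().startswith(term.lower())


def word_begins_with(field, term):
    """True if any whitespace-separated word of the field starts with the term."""
    t = term.lower()
    return any(word.startswith(t) for word in field.lower().split())


name = "Albert Ben Ali"
print(field_begins_with(name, "ben"))  # field-wise: not a beginsWith match
print(word_begins_with(name, "ben"))   # wordwise: a beginsWith match
print(word_begins_with("Mother Holbert", "be"))  # "be" is mid-word here
```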
Enhancement; not a regression per comment 4.
Severity: normal → enhancement
Keywords: regression
OS: Linux → All
Hardware: x86_64 → All
Summary: "Be" doesn't autocomplete to my friend "Ben" in my compose window anymore (because I have frequently-contacted holBErt contacts, with "be" in the middle of their last name) → Recipient autocomplete: prioritize "beginsWith" wordwise matches over "contains" matches: search for "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the middle of word)
(In reply to :aceman from comment #5)
> Yes, but the proposal for enhancement (ordering "begins with" matches) to
> the top is certainly valid and useful. I just don't know if current
> infrastructure in the search allows any prioritization. But at worst, it
> should be able to be similated by running two queries - the old beginsWith
> and the new Contains and just concatenating the results (removing dupes as
> matches from first query will be in the second one too).

+1. Won't be easy. But still, thanks Daniel for the idea.

Daniel, fwiw, pls consider that semantic boundaries (especially in other languages e.g. German) do not necessarily coincide with word boundaries (see my 558931 Comment #93), so there are lots of usecases for "natural" searches in the *middle* of words (contains):
Annamaria (joined first names)
Donaudampfschiffahrtskapitän (extreme example of joined nouns for a job title in German; also for company names etc.)
Johnson-Miller (double names joined by hyphen; that's 1 word technically, isn't it?)
airmozilla.org (joined words in domains; very frequent!!! searching for "mozilla" should still find this, shouldn't it?)
Christmas vs. XMas vs. X-Mas (just search for "mas" to find them all!)
"Highway Administration" vs. "Motorway Administration" (don't remember? just search for "way admin")

In addition to that, even the less or non-semantic use (searching for word/name fragments) can be extremely helpful when you are not sure about the exact (initial) spelling of words (for more details, see my Bug 558931 Comment 67, and (better) Bug 558931 Comment 109).

In view of all these, I am not sure how much the distinction between "beginsWith" and "contains" matches will actually help; as I said, the devil is definitely in the detail.

What if there's a card that matches "moz" with "contains" in many fields:
Card1:
Display Name: "Airmozilla management"
Email: "management@airmozilla.org"

vs. another card that has just a single match:
Card2:
Display Name: "Government of Mozambique"
Email: "info@mz.gov"

Isn't card 1 a better match although it's "only" based on contains matches?
Should email matches get some priority over names matches, or vice versa?
It's not all that simple as it looks.
(In reply to Thomas D. from comment #4)

> 4) Good news 4; Perspective 1:
> bug 529584 (introducing *foo* AND *bar") is the logical twin of bug 558931
> (introducing *foo bar* aka contains). After bug 529584, searching for the
> needle in the haystack will be *a lot* more efficient than before, and we'll
> enable users to limit the number of results radically after adapting their
> search habits only slightly.
> E.g., Daniel will be able to use the following search:
> "be jo" -> very likely, unique match for "Ben Johnson".
> If that's not enough because you also have "Bella Johannson", a few more
> letters will definitely do the trick:
> "Ben Jo" -> most likely, unique match for "Ben Johnson".
> "Ben John" -> 98% likely, unique match for "Ben Johnson".
> Of course, it all depends on the uniqueness of your search terms against
> your data, but that's not different from now.

Display name "Ben Johnson" isn't a good example for bug 529584, which is better illustrated by fields having more than two words, e.g.:
Display Name "Ben Daniel Johnson" and then you can still search for "be jo" etc.
(In an attempt of correction to Thomas D.'s comment #4)
I think that the bug numbers got swapped (bug 529584 took the place of bug 558931 and vice versa).
Just a thought.

Thanks.
(In reply to Suyash Agarwal (:sshagarwal) from comment #10)
> (In an attempt of correction to Thomas D.'s comment #4)
> I think that the bug numbers got swapped (bug 529584 took the place of bug
> 558931 and vice versa).

Oh Suyash, thanks a lot for that one, good catch. Indeed, I mixed up the two bug numbers and they are consistently wrong and should be the other way round (sorry for that):

(In reply to Thomas D. from comment #4)
> 1) good news 1: for many use cases, bug 558931 (introducing "contains")
> 4) Good news 4; Perspective 1:
> bug 529584 (introducing *foo* AND *bar") is the logical twin of bug 558931
> (introducing *foo bar* aka contains). After bug 529584, searching for the
> needle in the haystack will be *a lot* more efficient than before,
(In reply to Thomas D. from comment #9)
> Display name "Ben Johnson" isn't a good example for bug 529584, which is better
> illustrated by fields having more than two words, e.g.:
> Display Name "Ben Daniel Johnson" and then you can still search for "be jo" etc.

So for all of comment 4 and comment 9, any mention of bug 558931 and bug 529584 needs to be swapped, because:

bug 529584 has introduced *foo bar* (aka "contains" full search string instead of previous "beginsWith" full search string)

bug 558931 will introduce *foo* AND *bar* (aka split multiword search which breaks up the full search string; still using contains)

There's also a broken link in comment 8:
Semantic boundaries vs. word boundaries is explained in bug 558931 Comment #93.
I think comment 8 (bottom part) implies that this idea still needs a lot more thought before implementing -> adjusting summary accordingly.

I'll add another doubt/problem:

Card 1:
Display name: "John Doe"
Email1: <geek.helpdesk@foo.bar>
Email2: <geek.admin@foo.bar>

Card 2:
Display name: "John Doe (private)"
Email: <privategeek@foo.bar>

Especially in the current flawed design of the old address book, it's very possible that users have multiple cards for the same person, either to work around the limited number of email addresses per card, or due to other bugs and design flaws in the way we collect email addresses.

This bug proposes to toplist "beginsWith" matches, and then list "contains" matches further down in the results list, after all the "beginsWith" matches.
For the above scenario, that will have very confusing results:

search for "geek" (and prioritize results per current unrefined proposal of this bug)
-> card 1, "John Doe" will end up at the top of the list.
-> card 2, "John Doe (private)" will end up somewhere at the bottom of the list.

Yet it's the same person. Even if it were just similar names, I'd probably expect them to be sorted next to each other, similar to what the current algorithm does (I think), where we have some secondary alphabetical sorting after the primary popularityIndex sorting (I'm not sure how that alphabetical sorting works).

So there are a number of variables that influence how "good" a result is, and it's not clear yet how they interact with "contains" matches vs. "beginsWith" matches.
- in which field was the match and how relevant is that field for result sorting? iow, perhaps "contains" matches in certain important fields (like name fields) might be better matches than "begins" matches in less important fields (like notes)?
- in how many fields was the match? e.g. should "contains" match in two fields of one card be prioritized over "beginsWith" match in single field of another card?
- how do we account for semantic matches that don't coincide with word boundaries?
- do we need to account for problems due to current wrong design of AB? (see above in this comment)
- etc.
Summary: Recipient autocomplete: prioritize "beginsWith" wordwise matches over "contains" matches: search for "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the middle of word) → Recipient autocomplete: Explore idea of prioritizing "beginsWith" wordwise matches over "contains" matches: search for "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the middle of word)
The other question is:

If we introduce "frecency" (Bug 382415) which will prioritize the most recently used and most frequently chosen autocomplete results, won't that more or less obsolete any other sorting algorithms for the result list?

I strongly suspect we should start with Bug 382415 and then take it from there if we still need more sorting as proposed in this bug. Don't start with this bug.
See Also: → 972690
Blocks: 972690
So IIUC this bug is about how to prioritize various autocomplete orders:
- frecency
- beginsWith ("jo" or "ge" in "John Doe" <geek.admin@example.net>)
- I suggest beginsField ("Do", "ad", "ex" in the same case)
- contains ("k.a", "xam")
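
The three match kinds in this list can be told apart with a small classifier. This is a sketch, not a proposal for the actual implementation: the tier names are neutral labels chosen for the example, and treating every non-alphanumeric character as a word separator is an assumption made so that "ad" in geek.admin counts as a word start.

```python
import re

# The example card from the list above, as two plain field values.
FIELDS = ["John Doe", "geek.admin@example.net"]

# Lower number = better match tier.
ORDER = {"field_start": 0, "word_start": 1, "contains": 2}


def classify(term, fields=FIELDS):
    """Return the best match tier of the term over all fields, or None."""
    t = term.lower()
    best = None
    for field in fields:
        f = field.lower()
        if f.startswith(t):
            tier = "field_start"
        elif any(w.startswith(t) for w in re.split(r"[^a-z0-9]+", f)):
            tier = "word_start"
        elif t in f:
            tier = "contains"
        else:
            continue
        if best is None or ORDER[tier] < ORDER[best]:
            best = tier
    return best


for term in ["jo", "ge", "Do", "ad", "ex", "k.a", "xam"]:
    print(term, "->", classify(term))
```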
(In reply to Tony Mechelynck [:tonymec] from comment #15)
> So IIUC this bug is about how to prioritize various autocomplete orders:
> - frecency

Exactly. Sorting by "frecency" means ordering of *all* results based on a combined calculated value, where the most recently used and most frequently chosen results for your specific search input rank highest. So *all* of your results are sorted based on that value. After that, sorting by any other criterion becomes meaningless and impossible (except for those edge cases where the "frecency" value happens to be exactly the same for more than one address, so that's when you need another fallback sort algorithm). Otherwise, if after frecency sorting you start sorting by any other criteria (e.g. alphabetical "word beginsWith"), it means you ignore/break up frecency sorting and make that other criterion the dominating sort order (the legitimate exception being nicknames, which must indeed be toplisted ignoring "frecency").

So while your proposal sounds reasonable at first, it doesn't apply for the majority of cases because sorting by "frecency" breaks all other sort orders (and I think that's what we ultimately want for autocomplete; users who want alphabetical lists are better advised to search in contacts side bar/AB).

Complex sorting of results per Tony's comment 15 only applies for results having exactly the same "frecency" value, including zero frecency when you start on a new AB which has never been used. However over time, as frecency values get set for each result (which happens quickly), such sorting becomes largely irrelevant.
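
The point about fallback sorting can be shown with a composite sort key. This sketch assumes a precomputed `frecency` number per result (how it would be calculated is not covered here) and a `tier` value where 0 means a "beginsWith" match and 1 a "contains" match; all names and numbers are invented for illustration.

```python
# Illustrative results for the search "be".
results = [
    {"name": "Albert Ali", "frecency": 0, "tier": 1},
    {"name": "Ben Johnson", "frecency": 40, "tier": 0},
    {"name": "Betty Smith", "frecency": 0, "tier": 0},
    {"name": "Mother Holbert", "frecency": 90, "tier": 1},
]

# frecency dominates; tier and name only break ties between equal frecency.
ordered = sorted(
    results,
    key=lambda r: (-r["frecency"], r["tier"], r["name"].lower()),
)
names_in_order = [r["name"] for r in ordered]
print(names_in_order)
```

Here the secondary criteria only decide the order of the two results with zero frecency; the "contains" match with the highest frecency still ends up first.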

So I maintain that this bug and other bugs which suggest alphabetical sorting aren't very useful and are based on factual misunderstandings that make them appear invalid. Unless you are suggesting we shouldn't use "frecency", but that would require a new bug which will certainly end up "wontfix".

I think the way forward is to fully implement "frecency" in bug 382415 (to replace currently used popularityIndex), and ensure full nickname matches are toplisted for those users who want stable relationships between their few-letter search inputs and results (bug 325458).

> - beginsWith ("jo" or "ge" in "John Doe" <geek.admin@example.net>)
> - I suggest beginsField ("Do", "ad", "ex" in the same case)

That's not beginsField, I think you meant "word beginsWith" (where words are separated by spaces). So that still wouldn't catch "ad" in geek.admin, geek_admin, geek-admin etc. which would require a lot of special-casing again which is much easier and more completely done with the currently implemented "contains" logic.

> - contains ("k.a", "xam")

The examples are misleading, because "contains" also covers semantic boundaries inside words (which are only human-parseable), like for geekadmin@ - searching for "admin" makes perfect sense semantically and is not structurally different from searching beginning of words of fields: Iow, technically it's a "contains" search but semantically it's a "beginsWith" search so there's no reason to treat that any different from other "beginsWith" search results. And in other languages like German, combined words that are not separated by spaces or hyphens etc. can be much more frequent compared to English.
I'm still struggling to understand why the ONLY match when I type "me " (that's an m, an e and a space) is:

Time Warner Cable <connections@twcupdate.com>

and not the addressbook entry

me <gwelsh@hawaii.rr.com>

that's not even a priority issue, that's a failure to find a simple word.
(In reply to :aceman from comment #5)
> Yes, but the proposal for enhancement (ordering "begins with" matches) to
> the top is certainly valid and useful. I just don't know if current
> infrastructure in the search allows any prioritization. But at worst, it
> should be able to be similated by running two queries - the old beginsWith
> and the new Contains and just concatenating the results (removing dupes as
> matches from first query will be in the second one too).

Aceman, as explained in my comment 16: If we accept we ultimately want "frecency" concept (as in FF awesome bar) which dynamically adapts to users' search patterns and learns what the users' favourite results are (e.g. when user types "be" he wants "Ben" and not "Betty" or "Holbert"), then practically all other sorting algorithms become a largely irrelevant edge case which only applies as a secondary sorting of results having exactly the same "frecency" value, which is unlikely except for a virgin AB which has never been used. So imo we should focus on implementing "frecency" (bug 382415) which will much better assist users who have commented here, because all other sorting algorithms are becoming useless for bigger datasets (and I think our address books are growing daily, aren't they?). E.g., when your AB has "Ben", "Betty", "Bert", "Bernadette", "Bernd" etc. - any sorting based on alphabetical "word begins with" will not help you to pick your favourite, but with "frecency" algorithm, TB will dynamically learn that when you type "be" you're looking for "Ben". And when "Ben" becomes irrelevant, TB will again dynamically adjust and learn from the results you pick that now it's "Betty" which you mean when typing "be", so "Betty" will move up in your results lists if you pick her more often...

It's unfortunate that the combination of bugs and incomplete result sets based on former misdesign of "field beginsWith" search and "popularityIndex" in TB has tricked users into believing we already have a working implementation of "frecency" and "nicks" and now that we show complete result sets those bugs become more exposed and some users get irritated because result sets are a bit different than before (while a lot of other users will be happy that they can finally find more efficiently what they are looking for!). Also consider current autocomplete search algorithm on trunk is transitional until bug 558931 has landed, which will dramatically increase search power with less search input, but it might require a bit of adjustment of user search patterns...
If frecency isn't already stored in the AB database, adding it is so hard nobody would do it right now. We'd better wait for the new addressbook by mconley (hopefully it is not harder to operate than the current code).
Do you have the string "me <gwelsh@hawaii.rr.com>" stored in any field of the relevant addressbook card? I really doubt that. Usually "gwelsh@hawaii.rr.com" would be in Email field and "me" in Lastname or Display name. So there is no "me " string to find.

(In reply to gwelsh from comment #17)
> I'm still struggling to understand why the ONLY match when I type "me "
> (that an m an e and a space) is: 
> 
> Time Warner Cable <connections@twcupdate.com>
> 
> and not the addressbook entry
> 
> me <gwelsh@hawaii.rr.com>
> 
> that's not even a priority issue, that's a failure to find a simple word.
(In reply to gwelsh from comment #17)
> I'm still struggling to understand why the ONLY match when I type "me "
> (that an m an e and a space) is: 
> 
> Time Warner Cable <connections@twcupdate.com>
> 
> and not the addressbook entry
> 
> me <gwelsh@hawaii.rr.com>
> 
> that's not even a priority issue, that's a failure to find a simple word.

Gwelsh, is that on Trunk or TB24 release version? Are you using LDAP?
Either way, I don't see how this could happen on either of them.
If it does happen for you, then please file a bug providing detailed steps to reproduce (including all details of fields of the two sample cards; LDAP or not; POP vs. IMAP), actual result, expected result. tia.

The good news being "failures to find a simple word" are exactly what we've addressed for a lot of scenarios in bug 529584 and bug 558931 (pending).
(In reply to :aceman from comment #20)
> Do you have the string "me <gwelsh@hawaii.rr.com>" stored in any field of
> the relevant addressbook card? I really doubt that. Usually
> "gwelsh@hawaii.rr.com" would be in Email field and "me" in Lastname or
> Display name. So there is no "me " string to find.
> 
Oh, good catch. Indeed. So scrap my comment 21.

Gwelsh, current patch for bug 558931 will trim whitespace around search words, i.e. you type "me " and we'll search for "me" only. That will include your desired result, but also find "Time Warner Cable". It will still be possible to search for "me " if you actually use double quotes around it. However, as Aceman explained, the string "me " (with trailing space) won't be found because it's not found in any field of your card.
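
The trimming and quoting behaviour described above can be modelled in a few lines. This is a simplified model of the description, not the patch from bug 558931: `normalize_search` and `search` are hypothetical names, and each card is reduced to a name field and an email field.

```python
def normalize_search(raw):
    """Return the literal string to search for.

    Input wrapped in double quotes is taken verbatim (whitespace kept);
    anything else has its surrounding whitespace trimmed.
    """
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    return raw.strip()


# The two cards from the report, as (name, email) pairs.
cards = [
    ("Time Warner Cable", "connections@twcupdate.com"),
    ("me", "gwelsh@hawaii.rr.com"),
]


def search(raw):
    needle = normalize_search(raw).lower()
    return [name for name, email in cards
            if needle in name.lower() or needle in email.lower()]


print(search("me "))    # trimmed to "me"
print(search('"me "'))  # quoted, so the trailing space is kept
```

In this model the unquoted search finds both cards, while the quoted one only finds "Time Warner Cable", because no single field of the other card contains "me" followed by a space.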

Having said that, I do sympathise with your idea from a user's pov:

User types "me" and TB returns "me <foo@bar.com>". So to the user, it looks like typing/finding the full combined string "me <foo@bar.com>". User is not aware that this is constructed from Name field + Email field. I just tested this on TB 24 and the UX is indeed bad, autocomplete match even suddenly changes when you continue typing the space after the name, which is ux-inconsistent with matching/highlighting the rest of the found entry.
So indeed, we could explore the idea of allowing such matches (needs new RFE).

Aceman, this could be easy to implement as we just need to add the combination of Name* + " " + "<" + email + ">" to the combined string of field contents which Suyash is searching in bug 558931, where Name* needs to be whichever name that autocomplete shows in its results.
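
A rough sketch of that idea: add the displayed "Name <email>" string to the strings that are searched. The card layout and the helper names are assumptions for illustration, and the search term is deliberately not trimmed here.

```python
def searchable_strings(card):
    """Field contents plus the combined string shown in autocomplete results."""
    display = f'{card["name"]} <{card["email"]}>'
    return [card["name"], card["email"], display]


def matches(card, needle):
    n = needle.lower()
    return any(n in s.lower() for s in searchable_strings(card))


# The two cards from the report.
cards = [
    {"name": "Time Warner Cable", "email": "connections@twcupdate.com"},
    {"name": "me", "email": "gwelsh@hawaii.rr.com"},
]


def find(needle):
    return [c["name"] for c in cards if matches(c, needle)]


print(find("me "))
print(find("me <"))
```

With the combined string included, "me " (with the space) now also finds the "me" card, and typing on to "me <" leaves it as the only match.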
(In reply to Thomas D. from comment #22)

However, Gwelsh, good news for you: After bug 558931, it will be much easier for you to handle such scenarios (even without defining nick name):

just search for "me gw" or "me ii" or "me rr" or "me rr." or anything like that and there's a 95% chance that your address will be the only result. For most users, typing "Some Name <..." will actually be quite clumsy because most names are longer than "me" (i.e. the prospects of implementation for that RFE outlined in comment 22 are low).

Otherwise, even in current release version TB 24, here's a better solution for you:
What you really want is that typing "me" will always and exclusively return your email address "me <gwelsh@h...>", isn't it? So you want a permanently stable relationship between your search word and one particular result card. For that purpose, please define a unique string in the "Nickname" field of your result card. Due to Bug 325458, unfortunately you can't yet use just "me" because it's not unique enough (must be unique against all fields of all cards), but a Nickname like "#me" or "me*" or "mmm" will do the trick.
I recall having made a comment on some bug that we should allow searching for "<foo" to find email addresses like foobar@baz.com, because such searches make sense from user's pov because that's what we return in the autocomplete matches.
(In reply to Thomas D. from comment #22)
> Aceman, this could be easy to implement as we just need to add the
> combination of Name* + " " + "<" + email + ">" to the combined string of
> field contents which Suyash is searching in bug 558931, where Name* needs to
> be whichever name that autocomplete shows in its results.

If such a string is not already in the database, constructing it for all cards and then checking if it matches could be quite costly.
(In reply to :aceman from comment #19)
> If frecency isn't already stored in the AB database, adding it is so hard
> nobody would do it right now. we better wait for the new addressbook by
> mconley (hopefully it is not harder to operate than the current code).

I recall that mconley has practically offered that project for continuation by somebody else (with support by mconley), so waiting for him you might wait forever...
(In reply to :aceman from comment #20)
> Do you have the string "me <gwelsh@hawaii.rr.com>" stored in any field of
> the relevant addressbook card? I really doubt that. Usually
> "gwelsh@hawaii.rr.com" would be in Email field and "me" in Lastname or
> Display name. So there is no "me " string to find.
> 

Oh, I see.  I have added the space, and now it finds it, but lists it second, which is back entirely on the topic of this bug.  Finding letters at the beginning of a word is how humans spell; it should be how autocomplete works.  Or if programmers can't agree that's what users want, then make it a choice to have a real autocomplete OR a search.
Severity: enhancement → normal
Attached patch wip fix (obsolete) — Splinter Review
This orders results based 
 - first based on indexof searchstring in fulladdress:  usually you would get it right if you don't start in the middle of the word, but i do think people would usually get it right for people they frequently correspond with
 - then popularity (like before)
 - then alphabetical (like before)
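
A minimal sketch of the three-level ordering listed above, not the attached patch itself. The result shape (`fullAddress`, `popularity`) and the name `compareWipResults` are assumptions made for illustration only.

```javascript
// Hypothetical result objects: { fullAddress: string, popularity: number }.
function compareWipResults(searchString, a, b) {
  const needle = searchString.toLocaleLowerCase();
  const idxA = a.fullAddress.toLocaleLowerCase().indexOf(needle);
  const idxB = b.fullAddress.toLocaleLowerCase().indexOf(needle);
  // 1) earlier match position first, 2) higher popularity first,
  // 3) alphabetical by full address.
  return (idxA - idxB) ||
         (b.popularity - a.popularity) ||
         a.fullAddress.localeCompare(b.fullAddress);
}

const results = [
  { fullAddress: "Dan Holbert <dan@example.com>", popularity: 50 },
  { fullAddress: "Ben Smith <ben@example.com>", popularity: 3 },
  { fullAddress: "Betty Jones <betty@example.com>", popularity: 7 },
];
// Betty (index 0, popularity 7), Ben (index 0, popularity 3), Dan (index 7).
const sorted = results.slice().sort((a, b) => compareWipResults("be", a, b));
```

Note that a result matched only through a field outside the full address has index -1 in this sketch and would sort first; the sketch does not try to handle that case.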

It doesn't order correctly over multiple addressbooks (local+ldap) but that didn't work earlier either. I guess the autocomplete controller would have to do that, so it would be harder.

I think this patch solves a lot of the most obvious problems... tests and stuff would have to be adjusted still. What do people think?
Assignee: nobody → mkmelin+mozilla
Would it be possible to make this change on top of bug 984875?
That should be straight forward yes.
(In reply to Magnus Melin from comment #28)
> Created attachment 8421659 [details] [diff] [review]
> wip fix
> 
> This orders results based 
>  - first based on indexof searchstring in fulladdress:

fulladdress could be things like
"John Doe <johnny.d@asdf.com>"
"Doe, John <johnny.doe77@asdf.com>"
"Anne Marie Johnson <sweetness@asdf.com>"
"Marie Peters <peters.marie@asdf.com>"
"Peter Sample (Mozilla CEO) <samplepete@mozilla.com>"
"Mozilla Messaging <mozmess@mozilla.com>"
"Sam Peterson" <samp@asdf.com>

> usually you would get
> it right if you don't start in the middle of the word, but i do think people
> would usually get it right for people they frequently correspond with

Codewise, this looks a lot more efficient than status quo, so that's good.

I'm still very sceptical about sorting algorithm, but it's hard to tell, and without real frecency algorithm and working nicknames, anything we do here is just half the way and will always fail for a lot of cases. So I think this just changes the set of failures, but maybe it'll work better for many traditional usecases.

So for a lot of valid scenarios except the most basic searches, this effectively removes the last semblance of frecency (in its poor manifestation as PopularityIndex). It's not about starting in the "middle of a word"; it's about starting with any word from the Names or Email of the full address that users remember best about the person or use deliberately to exclude false positives.
E.g. in the above examples, depending on personal preferences and setups, users might just as well search for words which are more towards the middle of fullstring, like
"Marie" in "Anne Marie Johnson",
"John" in "Doe, John",
"Mozilla" in "Peter Sample (Mozilla CEO)"
"Samp" in "Peter Sample (Mozilla CEO)"

For all of these, I'm not sure why they should be downgraded in the results list just based on IndexOf search phrase in fullAddress, or why they should never ever get to the top regardless of their potentially higher popularityIndex because of other, similar entries which happen to have a better IndexOf (searchword happens to be found more to the left of full address)...

So what if users search based on email address? With this patch, if the email of the target card is more memorable but different from its display name, and if there are any other matches where the start of the target's email happens to occur in another person's display name, the intended match based on email will NEVER get to the top of the list (regardless of higher popularityIndex).

I'd also like to know how matches on other fields like First, Last, Nickname, which aren't in fulladdress string, are sorted in?
I understand their indexOf will be 0, so they'll all end up at the top and then sorted by popularity, isn't it? Well for nickname matches, that's a bit better...

>  - then popularity (like before)
>  - then alphabetical (like before)

--> Wouldn't this be the right corner to finally toplist full nickname matches (Bug 325458)?
Of course frecency is the ultimate solution, but toplisting full nickname matches would also help a great deal for users who want stable 1 on 1 relationships between their favourite search words and specific target cards. And still I think that's what the sort of users who reported this bug really want.

But in spite of my scepticism, it's really hard to predict all possible scenarios, data structures, datasets, personal user search habits, and resulting effects enough to veto this, so if others think it'll work better, feel free to go ahead with it...
I guess we could use the best indexOf (with offset) for each first name, last name, email part. So far I just wanted to test if it seemed to work reasonably with the very basic case.
Attached image TB autocomplete slow.png (obsolete) —
I'm seeing another symptom, which I hope might be solved by this bug. If not I'll be happy to file a new bug.

I used to be able to type a couple letters, fairly quickly hit enter, and have autocomplete pick the most used address.  Below is a rough time line because the behavior changed at several points prior to bug 959209 - please forgive the anal list:
- A 2013-10-26 27.0a1 bug 529584 domain matching patch lands
- B 2014-01-28 29.0a1 works nicely
- C 2014-03-02 30.0a1 works, but a tad slow
- D 2014-04-27 31.0a1 works, but a tad slow as C above
- E 2014-04-28 31.0a1 picks up bug 959209 toolkit autocomplete
- F 2014-04-29 31.0a1 fails as per screen shot

So the kicker is that I get my desired good behavior back (A+B) if I (partly?) revert the behavior of bug 529584 (A) by using morecols.autocomplete.match_just_beginning = true provided by 
addon https://nic-nac-project.org/~kaosmos/morecols-en.html  

If this bug 970456 reporter's behavior is improved by morecols.autocomplete.match_just_beginning = true then perhaps we have a common bug. The question is, why did toolkit autocomplete (E) make domain matching (A) worse??

Bonus if the solution also solves the slight slowness which started somewhere between B and C.
Keywords: perf, regression
Blocks: 1012397
No longer blocks: 1012397
See new Bug 1058583, "Address Book Popularity Index needs to age", and a proposed, simple algorithm
since version 31.0

If you type in a letter 'e' then it will return virtually the entire address book.
If any contact has the letter e anywhere in any:
First name, Last name, Display name, email address or secondary email address, then the search will return practically all entries. 
So entering eg: o or c will get picked up due to email addresses ending in com or org

A search should filter not return the entire address book.

This is what I discovered:


I do not have a huge address book, but I have noticed that the autocomplete is now returning a huge number of addresses.

eg: If I type 'a' then virtually every address is offered.

it seems to be scanning for anything that has an 'a' anywhere in the First Name, Last Name, Display Name, email address or additional email address.

But any email address with a Display Name beginning with A will not necessarily be easily findable, as they are not necessarily listed first or in the top four. In fact they could be about 20th in the list or more!

for example: my mother has Display Name as 'Mum' her email address does not have a letter 'a' in it. Her surname does not have an 'a' in it Her first name has a letter 'a' in the middle of the name. This email address is returned in the top 4.

If I type 'm' then I get the first 'M' display name = 'Mum' as number 4 in the list and my website email address 'mail@ etc' will be first in the list, although my name has no 'm' anywhere in it.

The ones where display name have an M at start of name were listed as numbers 4, 5, 7, 14, 35 and later. A total of 149 were listed as returned by the autosearch. Note: 49 contacts are in my PAB, 60 in another, 94 in another, 83 in Collected Addresses.

some of these were duplicates as I keep an additional record of important in other folder, just in case I lose an address book.

So the search is effectively returning virtually everything.

I also noticed that it returned my brothers email address which has no 'm' anywhere in his name or email address, but there was an 'm' in the middle of his additional email address, so both were returned in the search.


I use the MoreFunctionsForAddressBook extension.
Tools > Options > Advanced > General tab
click on 'Config Editor'
it will tell you to be careful :)
In top search type:  morecols

look for this line:
morecols.autocomplete.match_just_beginning;  false
double click on that line to toggle the 'false' to a 'true'
close window - top right X
click on OK to close the Options window.

Applied same 'm' in a TO field, This time it returned 32 instead of 149.

So I regard this as a massive improvement. 'M' is now returned for anything where the first letter is 'M' in First name or Surname, or Display Name or email address. It will also return both email addresses if the additional one has 'm' matching the conditions listed above.

Switching on this option in config editor now makes the autocomplete search work as reasonably  expected when searching on a one letter condition and even better for subsequent letters.

So for those experiencing a slowdown, could it be slowing down because of the nature of the search to return virtually the whole address books and with no apparent order ? I do not experience slowdown as I have few contacts in relative terms, but if the search was going through hundreds or thousands of contacts then it would explain why it was causing problems.

If the search parameters could be adjusted so that the returns are the same as I can get with MoreFunctionsForAddressBook, and then list them in order of Display name first, it would make the search useful. 

But currently without the addon to assist this function is virtually redundant; you have to type the entire name to get a useful shortlist. If the search starts searching immediately and their address books are large, there have been reports that they cannot enter a second letter because of slowdown.

One example of questions here:
https://support.mozilla.org/en-US/questions/1011952
This seems to do it.
Attachment #8421659 - Attachment is obsolete: true
Attachment #8462876 - Attachment is obsolete: true
Attachment #8484401 - Flags: review?(standard8)
Attachment #8484401 - Flags: feedback?(acelists)
Status: NEW → ASSIGNED
Comment on attachment 8484401 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

Yes, this is along the idea I had how to solve this. I can't test it as my build is still broken. Somebody just needs to check how much this slows down searches on 20000 cards.
Attachment #8484401 - Flags: feedback?(acelists) → feedback+
Did you have a large test ab you can send to me?
Also, we should get you building again. Why is it broken? At least the linux tree is building ok.
(In reply to Magnus Melin from comment #39)
> Did you have a large test ab you can send to me?
Yes, I have one but I am not sure it is distributable. I got it from Wayne, so he should be able to decide that.

> Also, we should get you building again. Why is it broken? At least the linux
> tree is building ok.
I am on linux but I am hitting bug 1054034. Must be something in my setup as nobody else is seeing it. But I get this problem for 2 months now.
Flags: needinfo?(vseerror)
Comment on attachment 8484401 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

Review of attachment 8484401 [details] [diff] [review]:
-----------------------------------------------------------------

Magnus, big thank you for taking the initiative here in spite of my vocal scepticism.

From user comments on respective bugs, there's no doubt the ordering system in its current entirety (including age-old shortcomings which are now more exposed) is a problem for a number of users. Notwithstanding the need of further improvements and redesign to address the old shortcomings, this looks like an immediate, smart and efficient solution to improve the UX. As I mentioned, this solution involves at least partially abandoning the priority of popularityIndex, but what's really smart about this patch is that popularityIndex is still used for prioritizing most wordwise beginsWith matches, and only disregarded for inner contains matches (which after this patch - I think - can never make it to the top as long as there are beginsWith matches). So yeah and well done, this looks like it could work as a good compromise formula. And we're definitely not worse than before the twin bugs because at least we're now showing inner contains matches, only at the end of the list.

I have a few questions, and ideas how to improve the "wordwise beginsWith priority" algorithm introduced with this patch.

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +127,5 @@
>    /**
> +   * Gets the score of the (full) address, given the search input. We want
> +   * results that match the beginning of a "word" in the result to score better
> +   * than a result that matches only in the middle of the word.
> +   * 

nit: whitespace

@@ +133,5 @@
> +   * @param aSearchString - user given search string
> +   * @retrun a score; a lower score is better than a higher one
> +   */
> +  _getScore: function(aAddress, aSearchString) {
> +    aAddress = aAddress.toLocaleLowerCase();

I think we have to truncate top level domains from aAddress to avoid reintroducing the .tld problem reported by users ("u" matches all ".uk" addresses), when we check for dot-separated words in email address below.

For a given aAddress
> aAddress = John Doe <jd@domain.uk>
that should be truncated here (before word testing) to become 
> aAddress = "John Doe <jd@domain" [sic]

Double tlds like .co.uk will still fail for search input "c" after that because we keep ".co", but perhaps that's an edge case.

@@ +136,5 @@
> +  _getScore: function(aAddress, aSearchString) {
> +    aAddress = aAddress.toLocaleLowerCase();
> +    aSearchString = aSearchString.toLocaleLowerCase();
> +    let idx = aAddress.indexOf(aSearchString);
> +    if (idx > 0) {

How does this behave for Display names surrounded by quotes?
"John Doe!" <jd@domain.com>
Just asking, perhaps the quotes have already been trimmed away for aAddress as we don't tend to show quotes in the recipient area. If not, the leading quote needs to be trimmed above where we set aAddress.

@@ +139,5 @@
> +    let idx = aAddress.indexOf(aSearchString);
> +    if (idx > 0) {
> +      // We want to treat firstname, lastname and dot separated parts of the
> +      // email address as equals. E.g. for "John Doe <jd@example.com>"
> +      // both "John" and "Doe" and "jd" should score the same.

Suggested comment rephrasing including improvements suggested below (dot separated matches should be included in the example):
// We want to treat firstname, lastname and semantically distinct
// parts of the email address, separated by dots etc., as "words"
// that qualify for scoring high with beginsWith matches.
// E.g. for "John Doe <jd.private@subdomain.domain.com>"
// beginsWith matches for "John", "Doe", "jd", "private",
// "subdomain", and "domain" should equally score high.

@@ +141,5 @@
> +      // We want to treat firstname, lastname and dot separated parts of the
> +      // email address as equals. E.g. for "John Doe <jd@example.com>"
> +      // both "John" and "Doe" and "jd" should score the same.
> +      let prevChar = aAddress.charAt(idx - 1);
> +      if (prevChar == " " || prevChar == "<" || prevChar == ".")

wrt {prevChar == "."}, I think this would reintroduce the .tld problem ("u" matches all ".uk" addresses) unless we truncate the tld from aAddress as I suggested above.

I think we should recognise a few more "words" here (deliberate duplicate examples for some patterns to show use cases):

John Doe (private) <jd@domain.com> -> John, Doe, private, jd, domain
John Doe (Mozilla) <jd@domain.com> -> John, Doe, Mozilla, jd, domain
John Doe (:jd) <jd@domain.com> -> John, Doe, :jd, jd, domain
Foxy <john-doe@domain.com> -> Foxy, john, doe, domain
Foxy <john_doe@domain.com> -> Foxy, john, doe, domain
John Doe <jd-technologies@domain.com> -> John, Doe, jd, technologies, domain
John Doe <jd@delta-technologies.com> -> John, Doe, jd, delta, technologies

We can make all of these real-life use cases succeed with better scoring by testing for a few more word boundary characters, "-", "_", "(" and ":" :

if (prevChar == " " || prevChar == "<" || prevChar == "." ||
    prevChar == "-" || prevChar == "_" || prevChar == "(" || prevChar == ":")
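
The same check can be sketched as a standalone predicate. This is only an illustration: the names `WORD_BOUNDARY_CHARS` and `matchStartsWord` are hypothetical, and like the hunk under review it looks only at the first occurrence of the search string.

```javascript
// Boundary characters from the suggestion above (hypothetical constant name).
const WORD_BOUNDARY_CHARS = [" ", "<", ".", "-", "_", "(", ":"];

// True if the first occurrence of searchString in address begins a "word".
function matchStartsWord(address, searchString) {
  const haystack = address.toLocaleLowerCase();
  const idx = haystack.indexOf(searchString.toLocaleLowerCase());
  if (idx === -1) {
    return false; // not found in the displayed address at all
  }
  if (idx === 0) {
    return true; // match at the very start of the string
  }
  return WORD_BOUNDARY_CHARS.includes(haystack.charAt(idx - 1));
}
```

With this set, "private" in "John Doe (private) <jd@domain.com>" and "doe" in "Foxy <john_doe@domain.com>" count as word starts, while "be" in "Dan Holbert <dan@example.com>" does not.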

@@ +384,5 @@
>        }
>  
>        result._searchResults = [...result._collectedValues.values()];
> +      result._searchResults.sort(function(a, b) {
> +        // Order by 1) ascending score, then 2) descending popularity, 

Thank you for documenting the sort order in detail!
nit: whitespace

@@ +387,5 @@
> +      result._searchResults.sort(function(a, b) {
> +        // Order by 1) ascending score, then 2) descending popularity, 
> +        // then 3) primary email before secondary for the same card, then
> +        // 4) by differing cards sort by email.
> +        return (a.score - b.score) || 

nit: whitespace
Attachment #8484401 - Flags: feedback+
Keywords: ux-efficiency
Updated version. Added some more special chars, but also I think we can completely forget scoring on the domain altogether. Of course there are personal domains and such, but generally it's not specific enough to deserve to be toplisted if there are better matches.

I also make "lastname-matches" score a little worse than direct matches, as it does seem reasonable to give an exact display name match some precedence.
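
For readers following along, the scoring described in this comment might look roughly like the sketch below, pieced together from the hunks quoted in the reviews. It is not the attached patch: cutting the domain at the last "@", the exact boundary character set, and the worst-score constant are assumptions.

```javascript
// Sketch only; a lower score is better.
function getScore(address, searchString) {
  // Assumption: ignore the domain by cutting at the last "@".
  const atIdx = address.lastIndexOf("@");
  let haystack = atIdx === -1 ? address : address.slice(0, atIdx);
  haystack = haystack.toLocaleLowerCase();
  const needle = searchString.toLocaleLowerCase();
  let idx = haystack.indexOf(needle);
  if (idx > 0) {
    const prevCh = haystack.charAt(idx - 1);
    if (" .:(_-<".includes(prevCh)) {
      idx = 0; // match at a word boundary
    }
    idx++; // keeps a match at the very start (score 0) ahead of other hits
  } else if (idx === -1) {
    // Assumption: matched only in the domain or in some other card field.
    idx = Number.MAX_SAFE_INTEGER;
  }
  return idx;
}
```

Under these assumptions, "be" scores 0 for "Ben Smith <ben@example.com>", 1 for "Smith, Ben <bsmith@example.com>" and 8 for "Dan Holbert <dan@example.com>". A search for "com" gets the same worst score for abc@computer.tld and abc@xyz.com, so popularity alone decides between them.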
Attachment #8484401 - Attachment is obsolete: true
Attachment #8484401 - Flags: review?(standard8)
Attachment #8485391 - Flags: review?(standard8)
Bug 1058583, which proposes a simple solution to the frequency/recency problem, resolves some of the lingering issues addressed in Comment 41 about the [currently mis-named] PopularityIndex.   

I would advocate for not removing that PopularityIndex, and instead making the aging work as proposed in bug 1058583.

Also: Does this bug 970456 and the proposed solution[s] address this odd behavior (or is this yet another bug...):
I have many cases where two people who share the same email.  Thus, within the large address book, these exist:
   1.  John Doe <JohnSusan@Domain.Com>
   2.  Susan Doe <JohnSusan@Domain.Com>
   3.  Rye, Susan <someother@email.com>
If I type:
   Susan D
All 3 addresses are selected out of the entire address book as possibilities; I would not have expected #3.  When I finally typed the " D" [space "D"], I would have thought that the name string "Susan D" was specific enough to call out #2 only, and not #1.  I have been writing a lot to 'Susan' lately, and would have expected it to be #1 on the offered list.  If I then tab out, item 1, the "John Doe <...>", is entered, and I would have expected Susan Doe.  ((I suspect my bug 1058583 fix would address this, ordering them better)).   Is this addressed by the current, proposed fix, or is it a different bug?

Another possible quirk is exposed by this search:  The popularity sort as configured, now, appears to sort by the popularity of specific email addresses, rather than popularity of address book entries.  This sorting distinction is subtle, but fixable.   It seems like the sort occurs after duplicates are removed, but maybe duplicates should be removed after sorting entries [???].  Is this a different/separate bug?
That behaviour is probably enabled by bug 1000775. You could file a new bug asking for some escaping character that makes the whole string being matched, not individual words from it. E.g. when the string is input as quoted "Susan D".
john: the order you'd get with this patch is #2, #3, #1
Comment on attachment 8485391 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

Review of attachment 8485391 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the new patch. Lgtm.

Some really small nits.

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +129,5 @@
> +   * results that match the beginning of a "word" in the result to score better
> +   * than a result that matches only in the middle of the word.
> +   *
> +   * @param aAddress - full address, including display name and address
> +   * @param aSearchString - user given search string

nit: odd syntax/spelling here, perhaps better:

@param aSearchString - search string provided by user

@@ +130,5 @@
> +   * than a result that matches only in the middle of the word.
> +   *
> +   * @param aAddress - full address, including display name and address
> +   * @param aSearchString - user given search string
> +   * @retrun a score; a lower score is better than a higher one

spelling: @return

@@ +133,5 @@
> +   * @param aSearchString - user given search string
> +   * @retrun a score; a lower score is better than a higher one
> +   */
> +  _getScore: function(aAddress, aSearchString) {
> +    // We'll do this case insensitively and ignoring the domain.

nit: ignore (instead of ignoring)

@@ +141,5 @@
> +    let idx = aAddress.indexOf(aSearchString);
> +    if (idx > 0) {
> +      // We want to treat firstname, lastname and word boundary(ish) parts of
> +      // the email address the same. E.g. for "John Doe (:xx) <jd.who@example.com>"
> +      // all of these should be the same score: "John", "Doe", "xx", "jd", "who".

:xx (with leading colon) is missing in the comment because it's also recognized

@@ +148,5 @@
> +          prevCh == "(" || prevCh == "_" || prevCh == "-" || prevCh == "<")
> +        idx = 0;
> +      idx++; // Add one, so exact start match will still be the first hit.
> +    }
> +    else if (idx == -1) {

A comment here might be helpful:
// If aSearchString is not found in the displayed address (but in
// some other field), return highest score so that it gets sorted down

@@ +391,5 @@
>        }
>  
>        result._searchResults = [...result._collectedValues.values()];
> +      result._searchResults.sort(function(a, b) {
> +        // Order by 1) ascending score, then 2) descending popularity, 

nit: trailing whitespace
Attachment #8485391 - Flags: feedback+
I'm probably not going to get around to reviewing this for a week or so. Note, that the patch should include an extension to the current extensive xpcshell-tests to show that it is doing what it is meant to.
(In reply to Magnus Melin from comment #42)
> Added some more special chars,

Suggestion: If user explicitly types such a special char, it should be excluded from the search filter, e.g. ".uk" finds all .uk addresses.

> but also I think we can completely forget scoring on the domain altogether.

I don't agree thinking on cases like:
info@MyDomain.TLD
mail@MyDomain.TLD
contact@MyDomain.TLD
Maybe "info" generally should be scored very very low.

.co.uk : maybe there could be a blacklist for such known double TLDs.
(In reply to Ulf Zibis from comment #50)
> (In reply to Magnus Melin from comment #42)
> > Added some more special chars,
> 
> Suggestion: If user explicitly types such a special char, it should be
> excluded from the search filter, e.g. ".uk" finds all .uk addresses.

Ulf, I think there's a misunderstanding. This bug is only about changing the *sort order* of the result set, it will not reduce the result set itself. So yes, with the current patch applied, .uk will still find all .uk addresses, and only if you have ".uk" at the beginning of a display name (unlikely), that contact might be toplisted.

> > but also I think we can completely forget scoring on the domain altogether.
> 
> I don't agree thinking on cases like:
> info@MyDomain.TLD
> mail@MyDomain.TLD
> contact@MyDomain.TLD
> Maybe "info" generally should be scored very very low.

Same here.
I understand that with current patch we are comparing the full multiword searchphrase with the result addresses as shown in autocomplete dropdown list. If you search for just "MyDomain", you'll most likely only get matches having @mydomain.tld, and these will be sorted according to popularity.
More likely you'll search for something like "info mydo", which is specific enough to only return a few or even just one matching address. As you won't have a contact with display name "info mydo", this patch does not affect such usecases at all.
(In reply to Thomas D. from comment #51)
> Ulf, I think there's a misunderstanding. This bug is only about changing the
> *sort order* of the result set, it will not reduce the result set itself.

Ah yes, thanks. But I think my comments will apply similarly on sort order.

> So yes, with the current patch applied, .uk will still find all .uk addresses,
> and only if you have ".uk" at the beginning of a display name (unlikely),
> that contact might be toplisted.

Fine!

> > > but also I think we can completely forget scoring on the domain altogether.
> > 
> > I don't agree thinking on cases like:
> > info@MyDomain.TLD
> > mail@MyDomain.TLD
> > contact@MyDomain.TLD
> > Maybe "info" generally should be scored very very low.

If I type "com" I want abc@computer.tld listed before numerous abc@xyz.com matches, even if the abc@computer.tld has a small popularity count.
If I type "info" I want abc@information.tld listed before numerous info@xyz.tld matches, even if the abc@information.tld has a small popularity count.
(In reply to Ulf Zibis from comment #52)
> (In reply to Thomas D. from comment #51)
> > Ulf, I think there's a misunderstanding. This bug is only about changing the
> > *sort order* of the result set, it will not reduce the result set itself.
> 
> Ah yes, thanks. But I think my comments will apply similarly on sort order.

I don't know, I find it really hard to imagine the net effects of this. Perhaps we can examine that after this patch has landed.

> If I type "com" I want abc@computer.tld listed before numerous abc@xyz.com
> matches, even if the abc@computer.tld has a small popularity count.

Per Comment 42 we ignore the entire domain for scoring:

> Updated version. Added some more special chars, but also I think we can
> completely forget scoring on the domain altogether. Of course there are
> personal domains and such, but generally it's not specific enough to deserve
> to be toplisted if there are better matches.

So "com" anywhere in domain scores equally lowest and gets downgraded to the bottom of the list if there are better matches having "com" in Display name or user part of email address. So if abc@computer.tld has a smaller popularity count than any of the abc@xyz.com addresses with .com tld, the tld matches will be toplisted based solely on their popularity (although it's very unlikely that the user wants to find a specific address by searching for its tld!).

@Magnus:
- Is my description of this case correct?
- If yes, is it good/desired behaviour that after your patch, an address with
high popularityIndex but matching exclusively on tld
can still get sorting priority over
matches at the beginning of domain or subdomain name, but with lower popularityIndex?

> If I type "info" I want abc@information.tld listed before numerous
> info@xyz.tld matches, even if the abc@information.tld has a small popularity
> count.

I suspect the performance costs of trying to get that right would be too high, because it would require a personal blocklist of lower-ranking user names, and we can't really foretell the personal significance of such user names (perhaps there's very few info@ addresses, and perhaps one of them is very popular, and I prefer to find it that way...). It's also a good example showing why what we do here is probably more like trying to patch the worst side effects of the status quo, whereas imo a more sustainable and universal solution must be based on Bug 382415 (and Bug 325458), both of which provide for a personalized user experience of ranking results, as opposed to the one-for-all attempts here.
Flags: needinfo?(mkmelin+mozilla)
(In reply to Thomas D. from comment #53)
Thanks for your detailed thoughts.
> > If I type "info" ...
> I suspect the performance costs of trying to get that right would be too high,
> because it would require a personal blocklist of lower-ranking user names, ...
> ... a more sustainable and universal solution must be based on Bug 382415
Sounds reasonable!

Another thought, not sure if it was yet mentioned:
Often, if I type a short search string it takes too much time until the matches are listed, even if in the end my favourite was in top position. So my question:
Would it be possible to display first results before the db-query is completely finished, maybe with the help of multiple db-queries on descending score levels?
The various comments by users with their own particular expectations raise this thought in my mind:  the permutations for a sort mechanism will never find a simple and universally understood, let alone generally popular, sorting index.

In contrast, the use patterns of a properly aged address book entry list, a true PopularityIndex (in lieu of the current AddressUseCount which is incorrectly labeled as a popularity index) would likely provide a universal sort solution that could be improved by some simple universal sorts that exclude TLDs (but not the entire domain name).

My thinking is that the benefits of any changes/improvements to the sort system would be masked by the efficacy that address list aging provides.

That final thought makes me wonder if the landing of this patch should come after the aging bug patch is landed?  With aging in place, a simple solution set/path would appear for making a fast operating sort system, one that eliminates the delays
(In reply to john ruskin from comment #56)
> The various comments by users with their own particular expectations raise
> this thought in my mind:  the permutations for a sort mechanism will never
> find a simple and universally understood, let alone generally popular,
> sorting index.

I'm truly astonished at how much thought has gone into this function.  All along I thought it was supposed to produce a simple alphabetical result.  Then it turns out that TBird was actually designed this way!

I had always assumed that it was just a horribly programmed function that produced inexplicable results.  Is it too much to hope that there's a switch introduced that would allow a simple alpha search result so that we alpha types could get on with our chores in a simple, understandable way?
Magnus, since this bug is all about the autocomplete results sorting algorithm, would this be the right place to fix Bug 325458 where result from contacts having searchphrase==nickname must get highest absolute priority in autocomplete results?
Rant: Looking at Bug 79013 Comment 5 (if true), it's also interesting to see how we're in the process of re-creating a search algorithm which looks quite similar to what TB already had in 2001...
Blocks: 325458
(In reply to José Josephus from comment #57)
> Is it too much to hope that there's a switch
> introduced that would allow a simple alpha search result so that we alpha
> types could get on with our chores in a simple, understandable way?

I've explained to José in an extensive private mail why what he considers "simple alphabetical search" is NOT simple at all in the real world of incomplete and varied address data across private, business, and intercultural scenarios and personal AB data and search preferences. That simplicity already fails when trying to pick the right field for alphabetical sorting between Display Name, First Name, and Last Name, which will always fail for large subsets of users. Introducing *optional* alphabetical sorting also comes with even more code and UI complexity which needs to be maintained; even if there was a viable plan I doubt we'd want/have resources for that. Instead, if we follow the plan of fixing more bugs in the old and current algorithm (as advocated by me) including this bug, there's a good chance that even the "alphabetical types" of users like José will actually be happy, regardless whether or not they understand how it works now or worked before.
(In reply to Thomas D. from comment #60)
> (In reply to José Josephus from comment #57)
> > Is it too much to hope that there's a switch
> > introduced that would allow a simple alpha search result so that we alpha
> > types could get on with our chores in a simple, understandable way?
> 
> I've explained to José in an extensive private mail why what he considers
> "simple alphabetical search" is NOT simple at all in the real world of
> incomplete and varied address data across private, business, and
> intercultural scenarios and personal AB data and search preferences. 

Moreover, we already *have* alphabetical recipient search, just use contacts side bar (F9 in composition), and enjoy alphabetical results lists. Autocomplete has been designed for finding a single recipient (the needle in the haystack), for which alphabetical sorting is much less suitable than just finding what users are actually looking for, with whatever algorithm is appropriate for that.
Is there a way to bring back the old habit? For example, I have a user whose email starts with the letters GM. With older Thunderbird versions, when I started typing GM it was the first and always the proper hit when I pressed Enter. Now the list starts with addresses on gmail.com and the username is completely ignored in the search, so if I'm not careful I send email to the wrong person.
(In reply to Robert from comment #62)
> Is there a way to bring back the old habit?

That's exactly what this bug is trying to do, but in more flexible ways.

> Now, it starts with addresses on gmail.com, username completely ignored from search so if I'm not
> careful I send email to wrong person.

No, user name isn't ignored but the popularityIndex of other cards is higher, i.e. historically, you've written more messages to those gmail contacts compared to the actual contact you're trying to find with "GM".
Summary: Recipient autocomplete: Explore idea of prioritizing "beginsWith" wordwise matches over "contains" matches: search for "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the middle of word) → Recipient autocomplete: Implement scoring algorithm to prioritize "beginsWith" wordwise matches over "contains" matches: search for "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the middle of word)
Replying to Thomas, in comment 63:

Thomas, the simplicity of entering recently used, commonly used addresses is no longer present.  The complexity of the search tool now in place, in combination with the AddressUseCount (incorrectly named "PopularityIndex"), creates a morass for usage.  Going to a contacts list, with the required keystroke/mouse-click combinations, slows down what used to be a quick and easy task for many folks (me included).

I have an additional suggestion, which might provide a solution path for all users, despite their varied expectations.  Follow me, for a moment.

There is now, as in the past, a set of meta searches; those MetaSearches are ranked, then sorted and displayed by rank.
The MetaSearches look in different places.  They include searches of: eMailAddress; NickNames; DomainNames; (TLDs); FirstNames; LastNames; Last&First or First&Last; etc.
The ranking also relies on the AddressUseCount.
The choices for ordering the steps, and then ranking, are now in the hands of the designer/programmer.

So this is my suggestion, in two parts.
First, enable the AddressUseCount aging which is found within Bug 1058583 "Address Book Popularity Index needs to age "

Second (and this is the essence of my suggestion, here):
For each of the MetaSearches programmed, create a preference which has a value of 0-n.
A value of "0" means that MetaSearch is not undertaken.
A non-zero value means that the search is undertaken.  Searches are undertaken in the order of the preference -- 1 is first and n is last.  MetaSearchPrefs can share a number, but there would be no expectation of ordering among those that do.
Default values are those that the programmer wants; the programmer obviously includes all the possible ones, throws out the ones he hates (setting their prefs to "0"), and crafts an order of undertaking/execution for the rest.
Any ranking (whether by one of the MetaSearches or by reliance on AddressUseCount) is executed in the sequence given by its MetaSearchPreference value.

Now, here is the beauty of this.  Any user can shut down, or for that matter, reorder all the rankings, by merely changing the preference values, in about:config.   Realistically, someone will quickly create an Add-On which provides some guidance and means to elegantly and simply set values.  Realistically, folks will create directions for changing things to (whatever one wants in a search tool), and they will appear all over the internet.

My programming experience tells me that this global approach should be relatively easy to engage.   

An additional beauty is that as the designer's search/sort is re-programmed (that's inevitable...), what is really happening is the changing/addition of MetaSearches, the order in which those MetaSearches are presented, and the nature of the final sort.   -Each- of those steps occurs in some sequence, the MetaSearchPrefs values define that sequence, the MetaSearchPrefs are -always- accessible to users, and the user is ultimately in control.

With this simple and moderate change (from a fixed sequence to a sequence ordered by a MetaSearchPref Values), we address ("provide a means to cure") -all- of the problems that crop up in -all- of the bugs, suggestions and complaints that revolve about the autocomplete function.

I would create a separate bug, and make the same suggestion, there, creating the necessary, depends-on and see-also, but I would probably mess up the references.   Let me know if you want me to craft that bug.
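
For readers trying to picture the proposal, here is a minimal sketch of preference-ordered meta searches. It is only an illustration under stated assumptions: the pref names, the card fields (`nickName`, `displayName`, `email`, `useCount`) and the helper functions are all hypothetical and do not correspond to any existing Thunderbird preference or API.

```javascript
// Hypothetical sketch: the pref names, card fields, and helpers below are
// invented for illustration and are not Thunderbird APIs.

// 0 disables a meta search; 1..n gives its place in the ranking order.
const defaultPrefs = {
  nickName: 1,
  displayName: 2,
  emailLocalPart: 3,
  domainName: 0, // disabled by default in this sketch
};

// Case-insensitive "begins with" test that tolerates missing fields.
const startsWith = (text, q) => (text || "").toLowerCase().startsWith(q);

// One predicate per meta search.
const metaSearches = {
  nickName: (card, q) => startsWith(card.nickName, q),
  displayName: (card, q) => startsWith(card.displayName, q),
  emailLocalPart: (card, q) => startsWith(card.email.split("@")[0], q),
  domainName: (card, q) => startsWith(card.email.split("@")[1], q),
};

// Names of the enabled meta searches, ordered by their pref value.
function enabledSearches(prefs) {
  return Object.keys(prefs)
    .filter((name) => prefs[name] > 0)
    .sort((a, b) => prefs[a] - prefs[b]);
}

// Rank each card by the first enabled meta search it satisfies,
// then break ties by descending use count.
function rankCards(cards, query, prefs) {
  const q = query.toLowerCase();
  const order = enabledSearches(prefs);
  const ranked = [];
  for (const card of cards) {
    const rank = order.findIndex((name) => metaSearches[name](card, q));
    if (rank !== -1) {
      ranked.push({ card, rank });
    }
  }
  ranked.sort((a, b) => a.rank - b.rank || b.card.useCount - a.card.useCount);
  return ranked.map((entry) => entry.card);
}
```

With `domainName` left at 0, a query of "gm" would not return gmail.com contacts at all; setting it to 4 would list them after any nickname, display name, or local-part matches.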
John, thanks for providing specific suggestions here and on the popularity aging Bug 1058583, from your programming background, on possible ways of improving the autocomplete algorithm.

I don't feel in a position (yet) to judge the feasibility of your proposals and I won't claim to fully understand them although I think I get the general direction of the idea.
I suspect they are still much less easy to implement than you think, and perhaps not so easy to use and understand for anyone except power users, and we'll certainly run into other unexpected problems.
Anyway, it wouldn't be my call because I certainly can't code them.
So I'm encouraging others to comment on John's proposals!

(In reply to john ruskin from comment #64)
> Replying to Thomas, in comment 63:
> 
> Thomas, the simplicity of entering recently used, commonly used addresses is
> no longer present.  The complexity of the search tool now in place and in
> combination with the AddressUseCount (incorrectly named "PopularityIndex")
> creates a morass for useage.  Going to a contacts list, with the required
> keystroke/mousestroke combinations, slows down what used to be a quick and
> easy use for many folks (me included)

I understand that this very bug here will fix that for most cases reported by users, by mimicking parts of the old order which users have come to expect.

From a socio-pragmatic, manpower resources and workflow pov, this bug already has a patch which most likely improves things a lot for a lot of users, and we need to deliver that fix fast because as you say, it's really troubling users right now. So I think we should land this first and then take it from there. Remember we're currently working exclusively on volunteer manpower, which is very limited.

And yes, please file your comment 64 as a new bug; it's good to have your idea linked and sketched here, but we definitely can't discuss or do that here. Feel free to add dependencies and see-also as you see fit after careful consideration. Thanks.
See new bug 1067681

In my view, the resolution of Bug 1057681, or this bug 970456, do not firmly depend on the resolution of each other.  However, it would be wise to enable this mechanism in the current changes suggested within this Bug 970456, and/or other bugs which address problems with the address autocomplete.

Once the suggested backbone contemplated in 1067681 is in place, current or other (future) changes would just follow the pattern set up as suggested there.

Thomas -- I am not sure whether or not to set dependencies, here or there, as I'm not quite sure how it impacts the process...   I'll leave it to you.
ooops.   Comment 66 should have said: 
     "In my view, the resolution of bug 1067681...
                                          ^

(In reply to john ruskin from comment #66)
> See new bug 1067681
> 
> In my view, the resolution of Bug 1057681, . . .
                                      ^
See Also: → 1067681
See Also: → 1058583
Blocks: 1068570
Comment on attachment 8485391 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

I'm not really getting the time to look at this. So passing it on. I think I said before that this should get tests, but just in case I didn't, now you know.
Attachment #8485391 - Flags: review?(standard8) → review?(neil)
Yes, I haven't had time to complete tests yet.
> Moreover, we already *have* alphabetical recipient search, just use contacts
> side bar (F9 in composition), and enjoy alphabetical results lists.
> Autocomplete has been designed for finding a single recipient (the needle in
> the haystack), for which alphabetical sorting is much less suitable than
> just finding what users are actually looking for, with whatever algorithm is
> appropriate for that.

FWIW, I'm commenting as an unsophisticated untrained TB user.  Since (for several years) I've always been uncertain of what the "autocomplete" function was trying to do (it's not immediately obvious), I've organized my Address Book as a simple database, searchable on ONE name field (Display).  I don't use ANY of the other fields. This allows me to find any name in an organized, searchable, repeatable way.  Since nobody here in user-land has ever explained why autocomplete doesn't produce usable results, we out here have always assumed that it's an inherent TB bug, and for years we have simply worked around it.  I still work around it, having maintained my Address Book of 800 records, indexed only on the Display field.  After using TB for over 10 years, I've skipped the various search & find algorithms and now I depend on an alpha search only, based only on the contents of the Display field.

For me, it's now too late to reorganize my Address Book. I would now like it if Autocomplete would offer to search only on the Display field, in alphabetical order, just as the sidebar listing shows.  I regret that we're long past the time when I could have used Autocomplete as you would like it to be, but it's too late to try to understand what the developers are trying to do, and I'm now too old to start over again.
(In reply to Thomas D. from comment #53)
> @Magnus:
> - Is my description of this case correct?

Yes.

> - If yes, is it good/desired behaviour that after your patch, an address with
> high popularityIndex but matching exclusively on tld
> can still get sorting priority over
> matches at the beginning of domain or subdomain name, but with lower
> popularityIndex?

I think so. Higher popularity -> should be sorted higher up.
Flags: needinfo?(mkmelin+mozilla)
Comment on attachment 8485391 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

Review of attachment 8485391 [details] [diff] [review]:
-----------------------------------------------------------------

It occurred to me we shouldn't use the score at all unless the search string is at a word boundary. I'll try out that change.
Attachment #8485391 - Flags: review?(neil)
Blocks: 1069826
> I think so. Higher popularity -> should be sorted higher up.

Please!  Never mind popularity until the rest of the module is working properly.  At present, the entire module is deeply damaged.  Can the volunteers be persuaded to get back to basics?
A thought, since I am in algorithm mode, lately . . .
(and, not knowing how the address book is indexed, or the code for a search)

I wonder if a Bloom filter might be worthwhile in the context of maintaining an address book, and of searches.  With an interesting set of hashes, and in combination with a properly aged PopularityIndex, the Bloom filter's propensity for a small subset of false positives wouldn't matter, as they would find themselves lower on the resulting search list.

This enhancement to the address book, if it is not already part of the search, would probably speed up complex searches (eliminating some of the ugly waits...).  Thoughts?

I'd appreciate an education on the search and index use for the address book, in any case....
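
To make the suggestion concrete, here is a minimal Bloom filter sketch. It is not based on how the address book is actually indexed; the bit count, the two hash functions, and the idea of indexing every word prefix are assumptions chosen for illustration. Note that a Bloom filter only answers "was this exact key added?", so it can pre-screen beginsWith-style lookups on indexed prefixes but cannot by itself answer arbitrary contains() queries.

```javascript
// Minimal Bloom filter sketch; sizes and hash choices are illustrative assumptions.
class BloomFilter {
  constructor(bitCount = 1024, hashCount = 3) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // FNV-1a style 32-bit hash over the string's UTF-16 code units.
  static fnv1a(text) {
    let h = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
      h ^= text.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return h >>> 0;
  }

  // djb2 style 32-bit hash (h * 33 + code unit).
  static djb2(text) {
    let h = 5381;
    for (let i = 0; i < text.length; i++) {
      h = (Math.imul(h, 33) + text.charCodeAt(i)) | 0;
    }
    return h >>> 0;
  }

  // Derive hashCount bit positions via double hashing: h1 + i * h2.
  positions(key) {
    const h1 = BloomFilter.fnv1a(key);
    const h2 = (BloomFilter.djb2(key) | 1) >>> 0; // force an odd stride
    const result = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push((h1 + i * h2) % this.bitCount);
    }
    return result;
  }

  add(key) {
    for (const pos of this.positions(key)) {
      this.bits[pos >> 3] |= 1 << (pos & 7);
    }
  }

  // false means "definitely absent"; true means "possibly present".
  mightContain(key) {
    return this.positions(key).every(
      (pos) => (this.bits[pos >> 3] & (1 << (pos & 7))) !== 0
    );
  }
}

// Index every prefix of every word so that "be" can find "Ben".
function indexCard(filter, displayName) {
  for (const word of displayName.toLowerCase().split(/\s+/).filter(Boolean)) {
    for (let len = 1; len <= word.length; len++) {
      filter.add(word.slice(0, len));
    }
  }
}
```

A negative answer lets the search skip an address book entirely; a positive answer still has to be confirmed by the real search, which is where any false positives would be dropped or sorted down.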
Comment on attachment 8485391 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

Looks good to me, also perf-wise. I tested it on the 20000 contacts AB, and a search for 'a' took about 5 seconds. Any longer string is faster as it yields fewer hits that need to be prioritized.
Attachment #8485391 - Flags: feedback+
Flags: needinfo?(vseerror)
Attachment #8485391 - Attachment is obsolete: true
Attachment #8492788 - Flags: review?(neil)
Comment on attachment 8492788 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

>+      aAddress = aAddress.substring(0, atIdx)
[Nit: slight preference for slice or substr (all three work the same when the first arg is zero)]
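
A quick illustration of the nit, using a made-up address and assuming `atIdx` is the position of the "@": with a zero first argument the three methods agree, and they only diverge once the start index is non-zero. (`substr` is a legacy method in the language specification, which is one reason `slice` is commonly preferred today.)

```javascript
const address = "ben@example.com"; // made-up example address
const atIdx = address.lastIndexOf("@"); // 3

// With a start of 0, all three return the local part "ben".
const viaSubstring = address.substring(0, atIdx);
const viaSlice = address.slice(0, atIdx);
const viaSubstr = address.substr(0, atIdx);

// With a non-zero start they differ: the second argument of substr is a
// length, while for substring and slice it is an end index.
const endIndexResult = address.substring(4, 7); // "exa"
const lengthResult = address.substr(4, 7); // "example"
```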

>+      if (prevCh == " " || prevCh == ":" || prevCh == "." || prevCh == "\"" ||
>+          prevCh == "(" || prevCh == "_" || prevCh == "-" || prevCh == "<") {
[This seems rather arbitrary. What if the search string isn't a word?]

>+      idx = Number.MAX_VALUE;
[As does this. Also, it's unclear that you're reusing the zero result of the indexOf as your "best" score. I think you should explicitly return your score values.]

>     result._collectedValues.set(emailAddress.toLocaleLowerCase(), {
>       value: emailAddress,
>       comment: commentColumn,
>       card: card,
>       isPrimaryEmail: isPrimaryEmail,
>       emailToUse: emailToUse,
>-      popularity: this._getPopularityIndex(directory, card)
>+      popularity: this._getPopularityIndex(directory, card),
>+      score: this._getScore(emailAddress, result.searchString)
[emailAddress.toLocaleLowerCase gets used three times, one here, one for _getScore, and one for _checkDuplicate. Would be nice if we could reduce the duplication!]

>+      result._searchResults.sort(function(a, b) {
>+        // Order by 1) ascending score, then 2) descending popularity,
>+        // then 3) primary email before secondary for the same card, then
>+        // 4) by differing cards sort by email.
>+        return (a.score - b.score) ||
>+               (b.popularity - a.popularity) ||
[This is confusing because a larger popularity is "better" but a larger score is "worse".]
(In reply to neil@parkwaycc.co.uk from comment #77)

>
> [This is confusing because a larger popularity is "better" but a larger
> score is "worse".]

Is there any chance we users could convince you developers that the Address Book is an utter shambles and almost unusable in its present state?  The Address Book needs to be repaired at the most simple, basic level so that it can be used for the simplest of searches.  If we could even use it for simple alphabetical searches, that would be a great help.

There doesn't seem to be any point to exploring exotic frequency-type searches if the Address Book is broken at even the most basic level.  Thanks, JJ
(In reply to José Josephus from comment #78)
> Is there any chance we users could convince you developers that the Address
> Book is an utter shambles and almost unusable in its present state?  The
> Address Book needs to be repaired at the most simple, basic level so that it
> can be used for the simplest of searches.  If we could even use it for
> simple alphabetical searches, that would be a great help.
I am not sure, but bug 1000775 might be a first step
towards fixing what you find broken.
(In reply to neil@parkwaycc.co.uk from comment #77)
> >+      if (prevCh == " " || prevCh == ":" || prevCh == "." || prevCh == "\"" ||
> >+          prevCh == "(" || prevCh == "_" || prevCh == "-" || prevCh == "<") {
> [This seems rather arbitrary. What if the search string isn't a word?]

What else would it be? If someone enters the whole address or something we hardly need this score anyway.
Better?

Try was ok with it - https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=d99827c9f20e
Attachment #8492788 - Attachment is obsolete: true
Attachment #8492788 - Flags: review?(neil)
Attachment #8493272 - Flags: review?(neil)
Comment on attachment 8493272 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

>+      aAddress = aAddress.substr(0, atIdx)
Seems to be missing a semicolon.

>+      return Number.MAX_VALUE;
It might be a good idea to stick with values that fit in signed integer arithmetic?

>+    if (prevCh == " " || prevCh == ":" || prevCh == "." || prevCh == "\"" ||
>+        prevCh == "(" || prevCh == "_" || prevCh == "-" || prevCh == "<") {
How about /[_\W]/.test(prevCh) ?

(In reply to Magnus Melin from comment #80)
> If someone enters the whole address or something we hardly need this score anyway.

Fair enough.
(In reply to neil@parkwaycc.co.uk from comment #82)
> >+    if (prevCh == " " || prevCh == ":" || prevCh == "." || prevCh == "\"" ||
> >+        prevCh == "(" || prevCh == "_" || prevCh == "-" || prevCh == "<") {
> How about /[_\W]/.test(prevCh) ?

No, that would break non-Latin searches, and I don't want "+" included.
But sure, I like the regexp idea in general.
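
A small demonstration of the objection, as a sketch rather than the patch itself. Without the `u` flag, `\W` matches every character outside `[A-Za-z0-9_]`, so any non-Latin letter counts as a "non-word" character and would be treated as a word boundary; "+" matches too. The helper `isWordStart` and the sample name are hypothetical.

```javascript
// The class suggested in review versus an explicit list of separators.
const nonWordClass = /[_\W]/;
const explicitClass = /[ :."(_\-<]/;

// True when the match at idx starts the string or follows a separator.
function isWordStart(text, idx, boundary) {
  return idx === 0 || boundary.test(text[idx - 1]);
}

const name = "Жанна"; // hypothetical Cyrillic display name
const idx = name.indexOf("ан"); // 1, i.e. in the middle of the word

const wrongBoundary = isWordStart(name, idx, nonWordClass); // true: "Ж" matches \W
const noBoundary = isWordStart(name, idx, explicitClass); // false

const plusMatchesNonWord = nonWordClass.test("+"); // true
const plusMatchesExplicit = explicitClass.test("+"); // false
```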
Attachment #8493272 - Attachment is obsolete: true
Attachment #8493272 - Flags: review?(neil)
Attachment #8493937 - Flags: review?(neil)
Hello,

It seems to me that you are complicating things for very little gain!
As a user, I am not asking for such a complicated autocomplete.
What we want is a search on the last name, first name, and display name (and the beginning of the email address!),
not a search on the domain name.
How are you supposed to remember the domain names of your contacts?
I have installed Thunderbird for many users (50+) and nobody asks for a search by domain name.
Please, if you absolutely want to impose it on us, provide an option (extended search!).
For example, a normal search (last name, first name, display name) and then, if needed, an offer of an extended search on the email and/or domain name.

Jean-marie, user and moderator on the Geckozone forum
Dear Thomas et al., switching to this thread... I am now starting to "grok" (Stranger in a Strange Land) your daunting task!   Regardless, for those of us simple folk: an alpha sort starting at the beginning of the line, e.g. 'CAS' = "Caserta" as 1st choice.

re Jose comment 70
I still work around it, having maintained my Address Book of 800 records, indexed only on the Display field.  After using TB for over 10 year, I've skipped the various search & find algorithms and now I depend on an alpha search only, based only on the contents of the Display field.

For me, it's now too late to reorganize my Address Book. I would now like it if Autocomplete would offer to search only on the Display field, in alphabetical order, just as the sidebar listing shows.  I regret that we're long past the time when I could have used Autocomplete as you would like it to be, but it's too late to try to understand what the developers are trying to do, and I'm now too old to start over again.
As one of the few who like finding an address in the address book from the end or middle of the address,
I would suggest searching from the beginning by default AND allowing a search that uses ? in place of characters.  I'm a ham radio operator and have a very large address book of 'hams' by call letters.  These are usually one or two characters followed by numerals and then up to three characters, e.g. W3BNR or KA9XXX.  Normally the last letters are the ones we use when talking about our friends, so a search on XXX would return KA9XXX along with any other XXX calls.  So when considering this as a bug, bear in mind that it does not affect everyone.

A solution would be to search on the first characters by default, while allowing a search such as ?XXX where XXX could be anyplace in the address.
Hello,

Note that there is always the option of searching for an address in the address book. :-)
On that subject, by the way:
- It would be nice if the search string were not cleared when switching address books!
- It would be nice to be able to search across all address books (as an extended search or in the advanced search)!

Jean-Marie
neil: ping on the review
neil: ping on finishing the review
I'm hiding comment 91 & 93 -- please do *not* recommend that people disable updates. Disabling updates is extremely dangerous in internet-exposed programs like browsers & email clients, which receive & render content from arbitrary 3rd parties on the internet.

If you disable updates, you'll likely be opening yourself up to security vulnerabilities down the line, which get publicly disclosed and may then be exploited by hackers & script kiddies.  (You will not have the fix, so you'll be vulnerable & can easily be exploited.)

I feel your frustration, and if you want to disable updates, that's your choice, but please don't recommend it as a quick-and-easy solution to this bug; others who take your advice may not know what they're risking when they do so.

Let's hope that this gets reviewed and fixed soon.  Setting needinfo=neil to indicate next action here.
Flags: needinfo?(neil)
Blocks: 1091675
Unbitrot. 
Neil, are you able to finish this review soon? If not, I'll try to find someone else
Attachment #8493937 - Attachment is obsolete: true
Attachment #8493937 - Flags: review?(neil)
Attachment #8519582 - Flags: review?(neil)
Comment on attachment 8519582 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

>+    if (/[ :."'(-_<&]/.test(prevCh)) {
Sorry for the delay. The only problem I can see here is that as written the - sign will be interpreted as a character range operator. r=me with that fixed.
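
To illustrate the review point: inside a character class, `(-_` is read as the range from "(" (U+0028) to "_" (U+005F), which silently includes digits, uppercase letters, "+", "@" and more. Escaping the hyphen (or moving it to the start or end of the class) restores the intended literal. The second regex below is one possible fix, not necessarily the form that landed.

```javascript
const asWritten = /[ :."'(-_<&]/; // "(-_" is a range U+0028..U+005F
const hyphenEscaped = /[ :."'(\-_<&]/; // "-" is a literal hyphen

const rangeMatchesUpper = asWritten.test("A"); // true (unintended)
const rangeMatchesDigit = asWritten.test("7"); // true (unintended)
const rangeMatchesPlus = asWritten.test("+"); // true (unintended)

const fixedMatchesUpper = hyphenEscaped.test("A"); // false
const fixedMatchesHyphen = hyphenEscaped.test("-"); // true
const fixedMatchesPlus = hyphenEscaped.test("+"); // false
```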
Flags: needinfo?(neil)
Attachment #8519582 - Flags: review?(neil) → review+
https://hg.mozilla.org/comm-central/rev/98414962486a -> FIXED
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 36.0
Comment on attachment 8519582 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

[Approval Request Comment]
Regression caused by (bug #): bug 529584 
User impact if declined: The autocomplete results ordering is not what one would expect
Testing completed (on c-c, etc.): landed on c-c
Risk to taking this patch (and alternatives if risky): should not be risky
Attachment #8519582 - Flags: approval-comm-esr31?
Attachment #8519582 - Flags: approval-comm-beta?
Attachment #8519582 - Flags: approval-comm-aurora?
Comment on attachment 8519582 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

This changes enough that I think it should bake for a bit before heading to esr.
Attachment #8519582 - Flags: approval-comm-aurora? → approval-comm-aurora+
Hi Folks
         So to confirm: you're saying 'resolved fixed' as such, 'all better' :-) ?
Works Great   THANK YOU :-)))
Dear coders
      I spoke too soon.   What I ask for again, as others have, is to put back what was in the prior builds: an exact alpha sort based upon the Display Name!

I repost from September:

Regardless   for those of us simple folk...an alpha sort starting at the beginning of the line..... ..eg  'CAS' = "Caserta" as 1st choice

re Jose comment 70
I still work around it, having maintained my Address Book of 800 records, indexed only on the Display field.  After using TB for over 10 year, I've skipped the various search & find algorithms and now I depend on an alpha search only, based only on the contents of the Display field.

For me, it's now too late to reorganize my Address Book. I would now like it if Autocomplete would offer to search only on the Display field, in alphabetical order, just as the sidebar listing shows.  I regret that we're long past the time when I could have used Autocomplete as you would like it to be, but it's too late to try to understand what the developers are trying to do, and I'm now too old to start over again.
Matt, I do not understand your comment.
I think you have confirmed it works now in the nightly build starting from comment 103. It IS NOT YET in TB 31.x.
So what exactly do you request now?
HI
   It's not working perfectly as before with build 24 and earlier. It's sometimes not doing an exact sort starting with the 1st letter, e.g. I have an entry with the last name "Angiulli" and it's FIRST searching the email address rather than the Display Name, as it's bringing up 1st: Caserta, John <jangeloo@comcast.net>

As you can see, "ang" is the 2nd, 3rd, and 4th letters in the email address :-(

So what I'm asking, and others, as before, is to implement, as in 5-10 years of earlier builds, an 'exact alpha sort from the 'Display Name'  not the Email address!  OR at least, make the 1st result come from the Display Name and not the Email Address !  Pretty Please !
That's the way the patch works, yes: display name takes precedence.
here I see "Status: RESOLVED FIXED" but I'm on the release channel and my TB 31.3.0 says it's "up to date" while the problem is not fixed therein.

I would appreciate it if someone would please tell me how to revert to previous version of TB while waiting for fix to be released.

Once again, I just almost sent an email to the wrong person.
Status tracks current trunk (development builds). Anyway, if this gets approval-comm-esr31+ it will land for the next general release too (31.4.0).
Thanks, Magnus!  Very much looking forward to 31.4.0.  May revert to 31.2.0 from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/releases/31.2.0/win32/en-US/ in the meantime.
Comment on attachment 8519582 [details] [diff] [review]
bug970456_Be_Ben_autocomplete.patch

Looks like this is already on nightly & aurora, so a=me for beta.
Attachment #8519582 - Flags: approval-comm-beta?
Attachment #8519582 - Flags: approval-comm-beta+
Attachment #8519582 - Flags: approval-comm-aurora+
No longer blocks: 1069826
No longer blocks: 1068570
Dear Magnus et al
    Just to be clear:   you said " That's the way the patch work yes: display name take precedence."
But its not.  The 'Display Name ' is NOT the 1st entry in my real world example, but rather the 'email address'.
(In reply to Matt Miller from comment #119)
> Dear Magnus et al
>     Just to be clear:   you said " That's the way the patch work yes:
> display name take precedence."
> But its not.  The 'Display Name ' is NOT the 1st entry in my real world
> example, but rather the 'email address'.

Matt, you're using 31.3.0 and on that version the bug is NOT YET fixed.
You need to install TB 36(!) to test the fix in action!

On TB 36 and above, we check if the entire search string matches at the beginning of word-like substrings of the full address, consisting of Display Name and email. Such matches will score high and become top-listed. According to comment in patch, this is the new sort order:

+        // Order by 1) descending score, then 2) descending popularity,
+        // then 3) primary email before secondary for the same card, then
+        // 4) by differing cards sort by email.
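
As a rough illustration of the word-start check described above, here is a sketch loosely modeled on the fragments quoted in review. It is not the landed patch: the function name `getScore`, the score values 1 and 0, and the choice to ignore everything after the last "@" are assumptions made for this example.

```javascript
// Hypothetical scoring sketch, not the landed patch.
// Separator characters that may precede the start of a "word".
const WORD_BOUNDARY = /[ :."'(\-_<&]/;

// Returns 1 if searchString begins a word in the display name or the
// local part of the address, otherwise 0.
function getScore(fullAddress, searchString) {
  const needle = searchString.toLowerCase();
  if (!needle) {
    return 0;
  }
  // Assumption: drop the domain, keeping only the text before the last "@".
  let haystack = fullAddress.toLowerCase();
  const atIdx = haystack.lastIndexOf("@");
  if (atIdx !== -1) {
    haystack = haystack.slice(0, atIdx);
  }
  // Check every occurrence, since a later one may sit at a word start.
  let idx = haystack.indexOf(needle);
  while (idx !== -1) {
    if (idx === 0 || WORD_BOUNDARY.test(haystack[idx - 1])) {
      return 1;
    }
    idx = haystack.indexOf(needle, idx + 1);
  }
  return 0;
}
```

Under these assumptions, "be" scores 1 against "Ben <ben@example.com>" and 0 against "Daniel Holbert <dholbert@example.com>", where "be" only appears mid-word.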

NB can somebody explain the meaning of 4)?


(In reply to Matt Miller from comment #108)
> So what I'm asking, and others, as before, is to implement, as in 5-10 years
> of earlier builds, an 'exact alpha sort from the 'Display Name'  not the
> Email address!

Please refrain from wrong claims about the previous behaviour still seen in current release version TB 31.3. Exact alpha sort hasn't been the topmost sort criterion for a long time.
Current behaviour sorts FIRST according to highest popularityIndex, and only if that's the same, the fallback (secondary) sort criterion is alphabetical. Which might, depending on your personal data structure and use patterns, *appear* to effect an overall alphabetical sorting, but technically it's NOT, and hasn't been for a long time.

> OR at least, make the 1st result come from the Display Name
> and not the Email Address !  Pretty Please !

Yes, in the new algorithm, starting from TB36, better scores involving display name (as explained above, and in patch comments) will take precedence over popularityIndex. Again, that's not simply alphabetical, but for many simple scenarios and data structures it will have effects that might be similar to alphabetical sorting.

Remember, autocomplete is NOT for showing alphabetical lists; the main purpose of the autocomplete widget in recipient area is to efficiently return a *single* contact, i.e. to find the needle in the haystack. For that sake, alphabetical listings will fail for many real life datasets as soon as you have a few more people having same or similar Display names, which would force you to enter many more characters to narrow down, which is not efficient. Instead, we've designed it so that you can succeed with typing very few unique characters from anywhere in relevant card fields. So e.g. to find Matt Miller, just typing "ma mi" has a very good chance of returning just that single target card, if that card's popularity index is higher than other matching cards. Defining explicit unique nicknames (in nickname field of contact properties) is another way of reliably retrieving cards with your personal short alias, e.g. type "ma#" and ALWAYS get Matt Miller ONLY. We're still working on some bugs in nickname design, but for truly unique nicknames it already works as expected.
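
A minimal sketch of the multi-term idea behind the "ma mi" example, assuming each space-separated term must occur somewhere in the display name or email address. The function name and card fields are hypothetical, and the real matching (including nickname handling) is more involved than this.

```javascript
// Hypothetical sketch: a card matches when every space-separated term
// occurs somewhere in its display name or email address.
function matchesAllTerms(card, query) {
  const haystack = `${card.displayName} ${card.email}`.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean) // ignore empty terms from extra spaces
    .every((term) => haystack.includes(term));
}

const matt = { displayName: "Matt Miller", email: "matt@example.com" };
const mary = { displayName: "Mary Jones", email: "mary@example.com" };

const mattMatches = matchesAllTerms(matt, "ma mi"); // true
const maryMatches = matchesAllTerms(mary, "ma mi"); // false: no "mi" anywhere
```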

If you want to pick names from alphabetical listings, please use address book or, during composition, contacts side bar (F9), which is a miniature AB where strictly alphabetical sorting of matching results is available.
Whiteboard: [NOT yet fixed for TB release channel (TB 31.3.0), but fix pending]
(In reply to Thomas D. from comment #120)
> (In reply to Matt Miller from comment #119)
> > Dear Magnus et al
> >     Just to be clear:   you said " That's the way the patch work yes:
> > display name take precedence."
> > But its not.  The 'Display Name ' is NOT the 1st entry in my real world
> > example, but rather the 'email address'.
> 
> Matt, you're using 31.3.0 and on that version the bug is NOT YET fixed.
> You need to install TB 36(!) to test the fix in action!

TB36 is currently available as Earlybird from https://www.mozilla.org/en-US/thunderbird/channel/
(In reply to Wayne Mery (:wsmwk) from comment #121)
> (In reply to Thomas D. from comment #120)
> > (In reply to Matt Miller from comment #119)
> > > Dear Magnus et al
> > >     Just to be clear:   you said " That's the way the patch work yes:
> > > display name take precedence."
> > > But its not.  The 'Display Name ' is NOT the 1st entry in my real world
> > > example, but rather the 'email address'.
> > 
> > Matt, you're using 31.3.0 and on that version the bug is NOT YET fixed.
> > You need to install TB 36(!) to test the fix in action!
> 
> TB36 is currently available as Earlybird from
> https://www.mozilla.org/en-US/thunderbird/channel/

AHA  :-)))  Thank you!!!!!!!   NOW for "Lightning" to work with this Release  :-)
(In reply to Thomas D. from comment #120)
> +        // Order by 1) descending score, then 2) descending popularity,
> +        // then 3) primary email before secondary for the same card, then
> +        // 4) by differing cards sort by email.
> 
> NB can somebody explain the meaning of 4)?

That was not the best wording, but it meant to say alphabetically ordered (as a last resort).
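
Read literally, the four criteria from the patch comment could be sketched as the comparator below. This is an illustration under assumptions, not the code from the patch: the record fields (`cardId`, `email`, `isPrimary`, `score`, `popularity`) and the helper names are hypothetical.

```javascript
// Plain code-unit string comparison, used as the "alphabetical" last resort.
const byString = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

// Hypothetical result records; all field names are assumptions.
function compareResults(a, b) {
  // 1) descending score
  if (a.score !== b.score) return b.score - a.score;
  // 2) descending popularity
  if (a.popularity !== b.popularity) return b.popularity - a.popularity;
  // 3) same card: primary email before secondary
  if (a.cardId === b.cardId && a.isPrimary !== b.isPrimary) {
    return a.isPrimary ? -1 : 1;
  }
  // 4) last resort: alphabetical by email
  return byString(a.email, b.email);
}

const results = [
  { cardId: 1, email: "zed@example.com", isPrimary: true, score: 1, popularity: 5 },
  { cardId: 2, email: "amy.work@example.com", isPrimary: false, score: 2, popularity: 0 },
  { cardId: 2, email: "amy@example.com", isPrimary: true, score: 2, popularity: 0 },
  { cardId: 3, email: "bob@example.com", isPrimary: true, score: 2, popularity: 0 },
  { cardId: 4, email: "cat@example.com", isPrimary: true, score: 2, popularity: 9 },
];

const ordered = [...results].sort(compareResults).map(r => r.email);
console.log(ordered);
// → ["cat@example.com", "amy@example.com", "amy.work@example.com",
//    "bob@example.com", "zed@example.com"]
```

Note how zed@example.com ends up last despite its popularity of 5: popularity is only consulted among results with equal score.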
(In reply to Matt Miller from comment #122)
> > You need to install TB 36(!) 
> AHA  :-)))  Thank you!!!!!!!  NOW for "Lightning" to work with this Release :-)

TB 36 is an Alpha Release, which you'll use at your own risk (but it usually works fine).
To make Lightning work with that (again, at your own risk, might fail here and there):
1 uninstall lightning addon
2 download lightning addon (let's say the file is called lightning.xpi)
3 make your windows explorer show file extensions (google how)
4 rename lightning.xpi to lightning.zip
5 extract all files into folder lightning-x
6
(In reply to Thomas D. from comment #124)
Premature commit, sorry

This instruction is a Christmas present. We don't usually do support on bugs :)

> (In reply to Matt Miller from comment #122)
> > > You need to install TB 36(!) 
> > AHA  :-)))  Thank you!!!!!!!  NOW for "Lightning" to work with this Release :-)
> 
> TB 36 is an Alpha Release, which you'll use at your own risk (but it usually
> works fine).
> To make Lightning work with that (again, at your own risk, might fail here
> and there), you can try tweaking the TB version number which Lightning will accept to run on:

1 uninstall lightning addon
2 download lightning addon (let's say the install file is called lightning.xpi), make sure to save to your harddisk, not install
3 make your windows explorer show file extensions (google how)
4 rename lightning.xpi to lightning.zip
5 extract all files of lightning.zip into folder lightning-x
6 edit lightning-x/install.rdf with text editor
7 change the version number xxx (probably 31.x) of <em:maxVersion>xxx</em:maxVersion>
to become <em:maxVersion>38</em:maxVersion>

>        <em:id>{3550f703-e582-4d05-9a08-453d09bdfdc6}</em:id> <!-- thunderbird -->
>        <em:minVersion>5.0</em:minVersion>
>        <em:maxVersion>31.x</em:maxVersion>

8 save install.rdf
9 re-zip entire content of lightning-x folder into lightning-new.zip
10 rename lightning-new.zip to lightning-new.xpi
11 install lightning-new.xpi from Addons Tab:
- click on gear icon "tools for all addons" button, upper-right corner of addon-tab
- install addon from file...
- pick lightning-new.xpi (your modified version)
- restart TB

Good luck. If it works, pat yourself on the back :)
Lightning contains binary parts, so unzipping and changing compatibility and all that is not a good idea - it won't work at all most of the time. You probably want an xpi from here: https://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-comm-aurora/
Since this bug has been in existence since February and fixed in November, why was I recently upgraded onto TB 31.3.0 which is useless for Autocomplete and tabbing between elements?

I have 19500 people in my address book and my previous TB version worked just fine!

How am I supposed to install this "patch" thing on my TB 31.3.0.  ?

I even tried reinstalling a lower version, something I'm sure I've done in the past after a problematic new version (I've been with TB for years, sometimes using Beta versions), but I still ended up with 31.3.0.  Guess I've forgotten how to do that or the process has changed.

I already had MoreFunctionsForAddressBook 0.7.1 and I have now ticked the Autocomplete option "For autocomplete match just the beginning of the entries" and this seems to help, but I can't be sure as the problem was intermittent.
(In reply to Peter Reynolds from comment #127)
> Since this bug has been in existence since February and fixed in November,
> why was I recently upgraded onto TB 31.3.0 which is useless for Autocomplete
> and tabbing between elements?
> How am I supposed to install this "patch" thing on my TB 31.3.0.  ?

See comment 111 and the Whiteboard field.
We can't push any random bug fix into the stable release channel before testing it and evaluating the risk on the development channels. So it needs some time. But we want to get it into the next TB31.x.
I appreciate that - I just don't see why 31.3 would be released with such major issues.
Also, the MoreFunctionsForAddressBook fix does not put things back how they were as it only works with the beginning of *email addresses*.  I was previously able to type someone's *name* and get an intelligent response, or type someone's *domain* and get an intelligent response.  In neither case did this involve several minutes wait and much increased CPU and Memory usage as is the case in 31.3.
Why should we hold back TB31.3 with other important fixes just because this one was not yet ready? You hopefully do not want to punish all other users.

If you have several minutes of wait in TB31 please file that as a new bug with specific patterns that you use to autocomplete. Such times are not expected. In our tests we only got SECONDS and try to even reduce those.
I tried TB 36 but couldn't work out what was happening there - I didn't seem to be getting any autocomplete suggestions at all despite having Autocomplete set to look in both the offered places.  So I've reverted to TB 24.8.1 which it appears may be the last release version with the old design.  I hadn't realised the versions in between were Betas and that some had the new design.
Attachment #8519582 - Flags: approval-comm-esr31? → approval-comm-esr31+
This bug may have been related to bug 1107844 https://bugzilla.mozilla.org/show_bug.cgi?id=1107844 ... please make sure the fix of that one doesn't undo the fix of this one.
That's pretty unrelated to this one.
Whiteboard: [NOT yet fixed for TB release channel (TB 31.3.0), but fix pending] → [fixed for thunderbird 31.4.0 and up]
Both bugs changed email addressing result for people using their own longstanding keyboard and mouse click routines to address email.  So both bugs have resulted in people sending email to wrong addresses.
Well 31.4.0 is better than 31.3.0, but doesn't get us back to the autocomplete behavior of 31.2.0.

31.2.0 prioritized matches to the beginning of Nickname.
31.4.0 prioritizes matches to several other fields, and some matches in the middle of fields, over that.

having Nicknames like "aa" and "bb" and "cc" was a great way to make addressing easy.

FWIW.
(In reply to D Holzmman from comment #138)
> 31.2.0 prioritized matches to the beginning of Nickname.
> 31.4.0 prioritizes matches to several other fields, and some matches in the
> middle of fields, over that.
> 
> having Nicknames like "aa" and "bb" and "cc" was a great way to make
> addressing easy.
Ya. That is the issue you see! There was a "not-so-good" design and humans, as they always do,
found a workaround for it. Now, when it's fixed, it'll take some time before users accept it
and once that happens, users will be happy with it :)

Thanks.
You know, Nicknames, being invented by humans, are easy for humans to remember.  So I think they should get priority.  I have a Nickname in my Address Book that begins with "bbbb".  Typing "bb" used to work.  Now I type "bbb" (to get TB to fill in the automatic match).  Must it always be humans versus designers?
(In reply to D Holzmman from comment #140)
> You know, Nicknames, being invented by humans, are easy for humans to
> remember.  So I think they should get priority.  I have a Nickname in my
> Address Book that begins with "bbbb".  Typing "bb" used to work.  Now I type
> "bbb" (to get TB to fill in the automatic match).  Must it always be humans
> versus designers?

The fact is, designers are humans as well. And we trust, you'll find how to use
it efficiently as well ;)
Hi,

starting with 31.4.0 I get a lot of complaints from my users.

I guess it is related to this change, but I am not sure how/why ;)

Sort order of autocompleted addresses is not as expected.

Examples:

I enter "adm"

I get result in this order:

admin@mail.example.com   (from collected addresses, seldom used)
sap-admin <sap-admin@example.com> (LDAP, used often)
MVTec System Administration <admin@example.com> (LDAP, used often)
lc-admin <lc-admin@example.com> (LDAP, seldom used)

I would have expected this order:

MVTec System Administration <admin@example.com> (LDAP, used often) (first because it starts with "adm" and is used more often)
admin@mail.example.com   (from collected addresses, seldom used)
sap-admin <sap-admin@example.com> (LDAP, used often) (after admin@mail.example.com because it only contains "adm")
lc-admin <lc-admin@example.com> (LDAP, seldom used)



I understand why admin@mail.example.com is more to the front than in the past, but I don't understand why it is even in front of "admin@example.com"...

Thanks,
- Oliver
(In reply to Oliver from comment #142)
> I understand why admin@mail.example.com is more to the front than in the
> past, but I don't understand why it is even in front of
> "admin@example.com"...

Ahhh, got it. We are not talking about the beginning of the *email address* but the beginning of *Name* of the address book entry. Bummer. So users get lots and lots of random "collected addresses" in front of the addresses they really use all day... :(

Regards,
- Oliver
(In reply to Oliver from comment #142)
> I get result in this order:
> 
> admin@mail.example.com   (from collected addresses, seldom used)
> sap-admin <sap-admin@example.com> (LDAP, used often)
> MVTec System Administration <admin@example.com> (LDAP, used often)
> lc-admin <lc-admin@example.com> (LDAP, seldom used)

FYI local addresses will always appear before LDAP addresses.
(In reply to Suyash Agarwal (:sshagarwal) from comment #139)
> (In reply to D Holzmman from comment #138)
> > 31.2.0 prioritized matches to the beginning of Nickname.

I'm always surprised how users come up with all sorts of imaginary ideas about the sorting algorithm.
FTR: TB24 (and probably before, too) through to TB 31.4.0 (current), none of these versions has ever "prioritized matches to the beginning of Nickname" in the sorting of results (although it might have appeared so for your particular usage).

> > 31.4.0 prioritizes matches to several other fields, and some matches in the
> > middle of fields, over that.

I think I understand what you're trying to say, but again that statement in itself is somewhat misleading, insufficient, and fuzzy. Ordering of results, after the patch for this bug landed, is only indirectly related to field content, in that we check if the full address "John Doe <johnny@foo.bar>" contains the full search string "John D" and prioritize results where it's found at word-like boundaries inside the full address, e.g. at the beginning of display name. So fields like Display name from which the full address is constructed are involved, but otherwise matches to particular fields are not prioritized as such and it depends much more on the *pattern* of matching.
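
A rough sketch of that idea, for illustration only: the function below classifies where the search string occurs inside the full address. The definition of a "word-like boundary" used here (start of string, or a position preceded by a character that is neither a letter nor a digit) is an assumption, as is the helper name `matchKind`; the actual patch may draw boundaries differently.

```javascript
// Returns "boundary" if the search string occurs at a word-like boundary of
// the full address, "middle" if it only occurs inside words, "none" otherwise.
// Assumption: a boundary is the start of the string or a position preceded by
// a character that is neither a letter nor a digit.
function matchKind(fullAddress, search) {
  const haystack = fullAddress.toLowerCase();
  const needle = search.toLowerCase();
  let index = haystack.indexOf(needle);
  if (index === -1) return "none";
  while (index !== -1) {
    if (index === 0 || !/[\p{L}\p{N}]/u.test(haystack[index - 1])) {
      return "boundary";
    }
    index = haystack.indexOf(needle, index + 1);
  }
  return "middle";
}

const address = "John Doe <johnny@foo.bar>";
console.log(matchKind(address, "John D")); // → "boundary"
console.log(matchKind(address, "foo"));    // → "boundary"
console.log(matchKind(address, "ohn"));    // → "middle"
console.log(matchKind(address, "xyz"));    // → "none"
```

A scoring scheme along these lines would rank "boundary" results above "middle" ones, regardless of which card field the matched text originally came from.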

> > having Nicknames like "aa" and "bb" and "cc" was a great way to make
> > addressing easy.

Indeed. The concept of explicitly user-defined nicknames (manually pre-defined "shortcuts") certainly has its limitations, but it's a very efficient and 100% reliable way of retrieving a limited set of "favorite/preferential/special" addresses (even when favorite not necessarily coincides with most frequently and most recently used). Unfortunately, it's quite broken in TB, see below.

OT: I use the FF equivalent of stable keyword searches for much of my BMO work and it works like a charm with unsurpassed efficiency and reliability (e.g. I would simply type "tba auto comp ben" into the location bar to find this particular bug, and my bookmark with the "tba" search keyword auto-resolves that to "https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL :thun,mailn auto comp ben").

> Ya. That is the issue you see! There was a "not-so-good" design and humans,
> as they always do,
> found a workaround for it. Now, when its fixed, it'll take some time before
> users accept it
> and once that happens, users will be happy with it :)

Suyash, that answer certainly has some general truth but misses at least part of the point :)
D Holzmann reported that full nickname matches for nicknames like "aa", "bb" etc. no longer work as expected because other matching results are toplisted in autocomplete results.
The "no longer" can be explained by potentially expanded result sets for single, short searchwords because we now search *bb* instead of former bb*. So "bb" might now match on "John Doe (House 4, subbasement)" where it didn't before. But that's not the real issue here, it just makes an old bug occur more frequently. The design of the nickname feature has been broken for a long time (and we should really fix that now before it matures into another wave of superfluous bug reports):

Bug 325458 Recipient Autocomplete: Nickname does not get highest precedence for matching address book entries, for searchphrase==nickname [To, CC, addressing field/area, toplisted, priority, results]

Fixing that bug would finally ensure that nicknames actually work as expected per their inherent design as addressing shortcuts. I suspect that fixing that might be relatively easy and it might occur in the very same code context as the patch for this bug.

D Holzmann, until Bug 325458 is fixed, use truly unique nicknames as a workaround, so that there's only one matching result: something like "aaa", "aa#", "+aa", etc.

> Thanks.

Ditto. :)
(In reply to Thomas D. from comment #145)
> Bug 325458 Recipient Autocomplete: Nickname does not get highest precedence
> for matching address book entries, for searchphrase==nickname [To, CC,
> addressing field/area, toplisted, priority, results]
> 
> Fixing that bug would finally ensure that nicknames actually work as
> expected per their inherent design as addressing shortcuts. I suspect that
> fixing that might be relatively easy and it might occur in the very same
> code context as the patch for this bug.

Oh, that's still valid? I thought it was also fixed by this bug itself.
Is someone working on it atm? :D
Hi,

I just lately figured out what's happening. Autocomplete suddenly started to behave completely randomly, showing people I haven't emailed for a year as the first suggestion. 

After the analysis, I believe you have not considered the non-ASCII case properly, and therefore the new solution is horrible for non-English speakers with combinations of transliterated addresses in their contact list.

Here are two addresses
Dušan Something <dusan@domain.com> (whom I email 10x a day)
Dusan Smith <dsmith@otherdomain.com> (whom I last emailed a year ago)


As usual I start typing "dus" which is the beginning of the email of the person. The new algorithm then suggests Dusan Smith, which is definitely not someone I would expect as the first choice.


Is there a switch to get back to the old matching? This is really broken for languages with a lot of transliteration.
WHY DOES THIS BUG REPORT SAY.....

Status: 	RESOLVED FIXED
Whiteboard: 	[fixed for thunderbird 31.4.0 and up] 

I am running  31.4.0 and it is most certainly not fixed.  In fact, somehow.... it has become worse, if that is possible.  Is it possible to GO BACK to a previous version?  

This is a terrible situation!!!
Andraz: if you type dus you get stuff with dus. If you meant Duš, why not write it? Anyway, that's purely luck you (ever) found the Dušan address typing dus (the luck being that dus is in the email address but not in the display name). 

caca33: Because it is, fixed! If you have another issue, file another bug. (and no, there's no pref to revert)
Thank you kindly, Magnus, for the quick reply.  

My question was kind of a rhetorical question because, as I mentioned, I am running the most recent version and am still experiencing the problem.  In fact, about a week ago, when I upgraded to 31.4.0 the problem started manifesting a bit differently, and worse, if that's possible.

May I ask:  beyond running the most up-to-date version, what is there to do to resolve this?  Perhaps I missed something.  

Thank you again....
This has been a fun ride. Anyway, I upgraded to 31.4, and was happy to see that matches at the beginning of names/addresses were being prioritized to the top of the list. 

Until it stopped working. At some point, I noticed that it started working like it did prior to 31.4. For example, typing gs should have been bringing up gsicignano@xyz.com, but instead it started putting gregsolo@****.net.

I tried it multiple times. I created a new message and tried it again. Finally I exited out of TB and restarted, and tried it again. Now it was putting gsicignano@xyz.com at the top.

Is that even possible? Is TB dynamically configuring how this works, randomly, just to mess with our heads?
caca33: file a new bug with detailed description. You can also see if there's any difference compared to nightly builds - http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/

Mark: I don't think it should be possible for the *order* to ever be different.
Magnus, I wouldn't think that would be the case either, but I will keep an eye on it. At first I thought my eyes were playing tricks on me, and so I went through the motions described above. I should have screen snapped it or made a screencast. If it happens again, I will try to do so and create a new bug report.
(In reply to Magnus Melin from comment #149)
> Andraz: if you type dus you get stuff with dus. If you meant Duš, why not
> write it? Anyway, that's purely luck you (ever) found the Dušan address
> typing dus (the luck being that dus is in the email address but not in the
> display name). 

i'm thinking it's because typing special characters (like "š") is difficult.

also, i didn't think TB was supposed to be running on "luck";

and, i would like to see email address matches prioritized over name matches
(sometimes you have to break a tie, no?)
(Internet routers don't read names, they read email addresses.  
Name entries can have typos and emails will still get to their destinations.)
Things are better with 31.4.0 (with TB 31.3.0, I almost sent things meant for my wife to my niece!), but FWIW here’s the beginning  of the list (in order, *<-- with comments) of matches I’m getting now when I type “bb” (the first two characters of the Nickname for my wife):

BBQinvite <BBQinvite> 
BBQinviteMore <BBQinviteMore>
bbxxxx@verizon.com
bbxxx1020@aol.com
bbxxxxxx@xxxxx.xx.gov
bblunch@xxxxxxxxxxxxxx.org
bbxxxx@xxxxxxxxxxxxx.com
Yxxxxe Rxxxx <bbraxxxx@verizon.net>
Elizabeth (Betsy) Bxxxxx <bbxxxxx@xxxxxxxxxxxxxxxxxxxx.xxx>
Bxxxxx Yxxxxx <bbybby@xxxxxxx.xxx>  *<-- sometimes appears first on the list, notably when I backspace over the first try to try again, perhaps because it contains “bb” twice in the Email address?
'bbuxxxxx@xxxxxx.com' <bbuxxxxx@xxxxxx.com>
Barbara Bxxxxx <BBxxxxx@xxxx.org>
Betsy Bxxxxx <bbxxxxx@xxxxxxx.org>
Bxxxxxy Bxxxx <bbxxxxxxx@yahoo.com>
Bob Bxxxxxxxl <bbxxxxxx@xxxxx.xx.gov>
Bonnie Bxxxxxx <bbxxxxxx@xxx.xxxx.edu>
Elizabeth Bxxxx <bbxx@xxxxxxxxxxx.com>
Abby Xxxxxxxx <abbyxxxxxxxx@xxxxx.xxx>  *<-- my niece
Dxxxxx Vxxxxx <jdvxxxxx@xxxxx.com>
Debbie DxxxxXxxxxx <debbie.dxxxxxxxxxx@xxxxxy.org>
Xxxxxx Xxxxx Robb <xxxxxx@xxxxx.com>
Debbie Xxxxxxx <Xxxxxxxxxx@aol.com>
Xxxxxx Xxxx <Xxxxxx@xxxxxxxxxx.xxx>  *<-- this is the address that I was looking for (my wife, whose TB Nickname begins with bb), the address that reliably popped up first in TB 31.2.0
Xxxxxx Xxxxx Robb <xxx@xxxxxxxxxx.org>
Gary X. Xxxbbxx <xxxxbbxx@xxxxxxxxx.com>
Jeff Xxxxbbxxxxx <jeff@xxxxxxxxxxxxxxx.com>
Michael Xxbbxx <michael@xxxxxxxxx.com>
Mike Xxbbxx <mike.xxbbxx.xx@xxxxx.com>
Gary Xxxbbxx <GXxxbbxx@xxxxxxxxx.com>
Abby Xxxxxxxx <xxxxxxxx@xxxxxxxxxx.xxx>  *<-- another address for my niece
(In reply to D Holzmman from comment #154)
> i'm thinking it's because typing special characters (like "š") is difficult.

I haven't tried this, but you can add "Dusan Something" as a nickname for "Dušan Something" and see if it then autocompletes before "Dusan Smith"...
I'm going to chime in here as NOTFIXED for me in 31.4.
(whoops, above comment submitted prematurely... still gathering reproduction data)
Also +1 for frecency: Bug 382415
So my wife (running 31.4.0), who emails me multiple times a day, yet has a few different Joshes in her PAB and CAB, types J and gets a bunch of Joshes (not me) whom she has not emailed in years. She has the same problem with many of the people she emails every day. (she is ready to throw the computer out the window). Oddly enough, I am also running 31.4.0 and I do not have this problem. Is there some sort of manual way of resetting popularityIndex? Either way this feels like a regression to her, and I agree.
An interesting collection of the many recent comments, post "resolved" -- and I raise the attention of recent comments to these bugs/enhancements:
   1) Bug 1067681 - Create MetaSearch rank order via Prefs for Address Autocomplete
and
   2) Bug 1058583 - Address Book Popularity Index needs to age
as solution paths to the most current observations on this bug 970456
Josh: I don't think there's a way to see/reset PopularityIndex, short of deleting and recreating a contact.
If you can reproduce on your computer (if you copy over the .mab files), file a new bug with details.
It appears that Thomas D's recommendation to do wordwise searching (comment 6) is not part of 31.4.0. Given the display name's higher precedence, contacts with display names of the form "Last, First" are effectively demoted when searching for first name, even if they have higher popularity. For example, typing "rich" pre 31.4.0 I got the following display names:

Beta, Rich [used often]
Alpha, Richard [used sometimes]
Richard Gamma [used never]

which is exactly the order I expected. With 31.4.0 I now get:

Richard Gamma [used never]
Beta, Rich [used often]
Alpha, Richard [used sometimes]

startsWith() over contains() made sense, but its use combined with elevated Display Name seems to go too far. Why elevate any match over another? When a user starts typing, how does anyone know if they are trying to match email address, first name, last name, display name, nickname?
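
The reordering reported above can be reproduced with a small sketch. The popularity numbers are made-up stand-ins for "used often / sometimes / never", and the scoring rule (1 if the display name begins with the search string, else 0) is a deliberate simplification of whatever 31.4.0 actually does; `score` and `rank` are hypothetical helpers.

```javascript
// Hypothetical cards; popularity values are assumptions.
const cards = [
  { displayName: "Beta, Rich", popularity: 20 },
  { displayName: "Alpha, Richard", popularity: 5 },
  { displayName: "Richard Gamma", popularity: 0 },
];

// Simplified scoring assumption: 1 if the display name begins with the
// search string, 0 otherwise.
function score(card, search) {
  return card.displayName.toLowerCase().startsWith(search.toLowerCase()) ? 1 : 0;
}

// Rank by popularity only (useScore = false), or by score first and
// popularity second (useScore = true).
function rank(search, useScore) {
  return [...cards]
    .sort((a, b) => {
      if (useScore) {
        const diff = score(b, search) - score(a, search);
        if (diff !== 0) return diff;
      }
      return b.popularity - a.popularity;
    })
    .map(c => c.displayName);
}

console.log(rank("rich", false));
// → ["Beta, Rich", "Alpha, Richard", "Richard Gamma"]
console.log(rank("rich", true));
// → ["Richard Gamma", "Beta, Rich", "Alpha, Richard"]
```

With "Last, First" display names, the first name never sits at the beginning of the display name, so under this simplified rule those cards can only compete on popularity among themselves.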
The autocomplete in 31.4.0 is broken. Even after using it for several weeks, it doesn't ever show a name based on recent use, like the earlier versions did. so I always have to type nearly the complete email address, or scroll down the list to pick the correct one.
I've downgraded to 31.3.0 now, and it's back to normal: I need to type max 3 characters to get the email I want based on recent use.
I tried reverting to 31.2.0, but that didn't seem to fix the problem.
I didn't check whether "recent use" had come to matter more like it should,
but typing the first two characters of "nickname" didn't work like it had before.
I now suspect that the problem persists via code or counters in the address book, rather than (or in addition to) in the program.
So what was wrong with the address book in TB 24 and previous?  Another thing - is there any kind of editor that will streamline one's address book by deleting all the unnecessary never used fields in the header?
(In reply to Peter Reynolds from comment #166)
> So what was wrong with the address book in TB 24 and previous?
Nothing.  I'm wondering if TB 24.3.0 (and maybe 24.4.0) modified it, or put disruptive prioritization information into it.
Whoops.  I mean "31.3.0 (and maybe 31.4.0)"

(In reply to D Holzmman from comment #167)
> (In reply to Peter Reynolds from comment #166)
> > So what was wrong with the address book in TB 24 and previous?
> Nothing.  I'm wondering if TB 24.3.0 (and maybe 24.4.0) modified it, or put
> disruptive prioritization information into it.
(In reply to D Holzmman from comment #167)
> (In reply to Peter Reynolds from comment #166)
> > So what was wrong with the address book in TB 24 and previous?
> Nothing.  I'm wondering if TB 24.3.0 (and maybe 24.4.0) modified it, or put
> disruptive prioritization information into it.

Address book is relatively unchanged between version 24 and 31. Or for that matter between 17 and 24.  The bulk of the work that affected version 31 address completion, both positive and negative, was merely from the move to Toolkit Autocomplete. It's been a challenging process, but it puts us on a modern code platform, with much improved capabilities. 

If you have any issues that still affect you in Thunderbird that don't already have a matching bug report (there are a few yet to be fixed), you should please file a new bug report - so that comments don't happen (here) in an already fixed (this) bug. If you have questions that are not in the nature of a bug, please post in a support forum https://wiki.mozilla.org/Thunderbird/CommunicationChannels
Re ordering, toolkit autocomplete didn't change much. That's a change due to bug 529584.
(In reply to Wayne Mery (:wsmwk) from comment #169)
> If you have any issues that still affect you in Thunderbird that don't
> already have a matching bug report (there are a few yet to be fixed), you
> should please file a new bug report - so that comments don't happen (here)
> in an already fixed (this) bug. If you have questions that are not in the
> nature of a bug, please post in a support forum
> https://wiki.mozilla.org/Thunderbird/CommunicationChannels

I submitted the new bug https://bugzilla.mozilla.org/show_bug.cgi?id=1134986
Blocks: 1134986
FTR: This bug effectively breaks (disregards) popularityIndex for all scenarios where search word happens to match the beginning of some unpopular Display Name. No matter how popular, TB will never toplist the desired entry as long as other, less popular matches get prioritized by this Display Name scoring algorithm.

From user feedback and logical analysis, I believe this algorithm is not sustainable.
Best solution is real "frecency" algorithm (bug 382415) which considers the actual input combined with frecency of chosen result, but we should consider anything which can at least fully re-enable the dull popularityIndex in the meantime.
Whiteboard: [fixed for thunderbird 31.4.0 and up] → [fixed for thunderbird 31.4.0 and up][ignores popularityIndex if less popular display name begins with search word]
I concur, Thomas.   And, in a spate of deja-vu, point back to the two suggestions mentioned in comment 161, on aging the popularity index and allowing a user manageable metasearch rank order.
I found myself on the list for this bug having reported the same issue.  As a database software developer, I have followed this process with interest and am quite impressed with the collaboration.

As often happens, things are more complex than they would first seem and I can definitely see that here.  In version 31.6.0, I do notice one odd thing - if I type the letter A, I get names starting with A that seem to be in popularity order.  But then the list continues in what seems to be some random order to list names that start with different letters.  For example, after all the A's run out, I get a name that starts with P that hasn't been used in well over 10 years.  Very odd...
That is likely bug 1134986.
Hi,
in current TB 31.6.0 I have those 2 addresses:
Personal addressbook: Eva Maya Kerkmeyer <EMK@EMKerkmeyer.com>
Collected addressbook: emkerk@test.com
Normally I use the 1st one, so the score should be high.
Since I have the 2nd one in the other addressbook, autocomplete sets the 2nd one in first position if I type "emk".
Is that as designed? Would it work better if this bug were fixed?
We're still working to improve the algorithm for sorting recipient autocomplete results, e.g. in bug 1134986 which should land for TB38. We do not have the manpower to do all the desirable improvements at once, like introducing a more sophisticated algorithm of "frecency" (Bug 382415; also bug 1058583), similar to Firefox location bar. There are plans/ideas to make autocomplete search results and their order more customizable (e.g. Bug 118624 currently wants to make model query a customizable pref so users can choose which fields to search; bug 1067681 customizable meta order for search results). Please read this comment and relevant bugs linked at the top of this bug before commenting on problems of autocomplete results order here.

(In reply to Graham Smith from comment #174)
> I found myself on the list for this bug having reported the same issue.  As
> a database software developer, I have followed this process with interest
> and am quite impressed with the collaboration.

Thanks :)

> As often happens, things are more complex than they would first seem and I
> can definitely see that here.

So true...

> In version 31.6.0, I do notice one odd thing
> - if I type the letter A, I get names starting with A that seem to be in
> popularity order.  But then the list continues in what seems to be some
> random order to list names that start with different letters.  For example,
> after all the A's run out, I get a name that starts with P that hasn't been
> used in well over 10 years.  Very odd...

Bug 529584 introduced a *foo* matching pattern (to replace the old  foo*  matching pattern and its search failures), so that e.g. the following autocomplete searches will succeed:

man -> Putterman, mailman, etc.
soft -> Microsoft
maria -> Evamaria, Eva-Maria
isler -> Gisler, Ghisler (people DO forget the exact spellings of foreign names, and it'll be bloody hard to find if you forget the h and can't do 'contains' search...)

Also consider that other languages like German might be much more prone to combine different semantic words into a single visible word, which applies to Names, Company Names, and Job descriptions too, all of which might end up concatenated in email recipients. Same for email addresses and domains even in any language including English, where spaces or other obvious semantic delimiters are often omitted, as in Johnny <johnDoe@mozillamessaging.com>. You'd certainly want your search for "Doe" or "messaging" to find that address, which is only possible using 'contains' search (*foo*).
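
The difference between the two patterns can be sketched as follows. This is only an illustration: the helper names and the choice of searched fields are assumptions, and the real search is not implemented this way.

```javascript
// foo*  : the field must begin with the search string.
const prefixMatch = (field, search) =>
  field.toLowerCase().startsWith(search.toLowerCase());

// *foo* : the search string may occur anywhere in the field.
const containsMatch = (field, search) =>
  field.toLowerCase().includes(search.toLowerCase());

// Does any of the searched fields match under the given pattern?
const anyField = (fields, search, matcher) =>
  fields.some(field => matcher(field, search));

// Hypothetical searched fields of the card "Johnny <johnDoe@mozillamessaging.com>".
const fields = ["Johnny", "johnDoe@mozillamessaging.com"];

for (const search of ["john", "Doe", "messaging"]) {
  console.log(
    search,
    "foo*:", anyField(fields, search, prefixMatch),
    "*foo*:", anyField(fields, search, containsMatch)
  );
}
// → john foo*: true *foo*: true
// → Doe foo*: false *foo*: true
// → messaging foo*: false *foo*: true

console.log(containsMatch("Putterman", "man"), prefixMatch("Putterman", "man"));
// → true false
```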

Even before Bug 529584, single-character searches like "a" would return a potentially large number of results: any contact where First Name, Last Name, Display Name, Nick Name, email address etc. starts with "a".
After Bug 529584, searching for just "a" returns every contact in your address books (ABs) that has an "a" anywhere in those searched fields, which is probably most of your AB. That is pretty much expected, because "a" is really not a very unique search term... BUT such a search can still succeed IF there's a frecency algorithm doing the trick for you, namely learning that when you type "a", you've frequently chosen your popular correspondent "Anna" in the recent past. TB does not have "frecency" yet (Bug 382415); we mimic the idea with a dull, eternal frequency counter aka "popularityIndex", which unfortunately neither ages (Bug 1058583) nor systematically correlates with your actual search input.

As a powerful feature, together with Bug 529584 we shipped Bug 558931, which allows Google-style searches using multiple search words to rapidly narrow down your result set. So you could search for "anna micro" or "a soft" to find exactly the "anna" working at Microsoft, or all people with an "a" somewhere who work at some software company. Possibilities are endless, please try it!!! Just type whatever unique parts of your contact you remember best, in any order.
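The multi-word idea can be sketched roughly like this in Python. This is only an illustration under assumptions: the function name and the sample contacts are hypothetical, and matching every word against one concatenated address string is a simplification of whatever fields TB really searches.

```python
def matches_all_words(query: str, address: str) -> bool:
    """Every whitespace-separated search word must occur somewhere
    in the address, in any order ('contains' search per word)."""
    haystack = address.lower()
    return all(word in haystack for word in query.lower().split())


# Hypothetical sample contacts, for illustration only.
contacts = [
    "Anna Example (Microsoft) <anna@example.com>",
    "Bob Sample (Acme Software) <bob@example.com>",
    "Carl Other <carl@example.com>",
]

anna_micro = [c for c in contacts if matches_all_words("anna micro", c)]
a_soft = [c for c in contacts if matches_all_words("a soft", c)]
micro_anna = [c for c in contacts if matches_all_words("micro anna", c)]
```

Here "anna micro" and "micro anna" both narrow the list down to the first contact, while "a soft" keeps the first two and drops the third, which has an "a" but no "soft".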

Now, in certain scenarios, Bug 529584 caused users to see correct but unexpected results for very short search words (mostly one or two characters). Contacts that merely have an "a" somewhere were not shown before, but now they match because of 'contains' search, and they were "correctly" uplifted due to their high popularityIndex (which might be age-old, and which users had forgotten about because the contact never showed up in results for that particular search word). This effect was worsened by the fact that we used to wrongly fill up the "Collected Addresses" AB with rarely or never used addresses solely for the purpose of allowing remote content (now managed outside the AB, Bug 457296).

To alleviate such undesirable effects, this bug 970456 implemented a scoring algorithm which prioritizes:
a) matches at the beginning of word-like parts of the complete address "Display Name (company) <john.doe@bar.com>", so that search words like Disp, Nam, John, doe, company score better (rank higher) than any of "isp, ame, ompa, ohn, oe".
b) matches of the search word(s) against the beginning of the complete address, which score BEST (top-ranked), so searching "Display" will bring that address to the top (even when its popularity is lower than that of other matching addresses).

Both a) and b) ignore/outscore popularityIndex (which was made the secondary sorting criterion after the primary scoring), meaning that after this bug 970456, certain good and expected results could never get to the top regardless of their frequency of use. I had registered my doubts right from the beginning of this bug, but to be honest the devil is really in the detail so it's actually hard to correctly consider all the net effects of a certain design for a potentially unlimited number of scenarios.
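For readers who prefer code, the ranking described in a) and b) could be sketched as follows in Python. This is a simplified illustration, not the patch that landed: the numeric score values, the word-splitting rule, the function names, and the sample contacts with their popularity numbers are all assumptions, and only single search words are handled.

```python
import re
from typing import NamedTuple


class Contact(NamedTuple):
    address: str      # complete address, e.g. "Display Name <a@b.com>"
    popularity: int   # stand-in for popularityIndex


def match_score(query: str, address: str) -> int:
    """Assumed score values: 3 = match at the beginning of the complete
    address (b), 2 = match at the beginning of a word-like part (a),
    1 = inner 'contains' match, 0 = no match."""
    q, a = query.lower(), address.lower()
    if q not in a:
        return 0
    if a.startswith(q):
        return 3
    words = [w for w in re.split(r"\W+", a) if w]
    if any(w.startswith(q) for w in words):
        return 2
    return 1


def rank(query: str, contacts: list[Contact]) -> list[Contact]:
    """Primary criterion: match score. Secondary: popularity."""
    hits = [c for c in contacts if match_score(query, c.address) > 0]
    return sorted(
        hits,
        key=lambda c: (-match_score(query, c.address), -c.popularity),
    )


# Hypothetical contacts and popularity values.
contacts = [
    Contact("Family Holbert <family@example.com>", 50),
    Contact("Ben Example <ben@example.com>", 5),
    Contact("Alice Bennett <alice@example.com>", 1),
    Contact("Betty Example <betty@example.com>", 9),
]

ranked = [c.address for c in rank("be", contacts)]
```

With these made-up numbers, searching "be" lists Betty and Ben first (both match the beginning of the complete address, ordered by popularity), then Alice Bennett (word-begin match), and the very popular Holbert contact last, because its match is an inner one.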

We are very willing to learn from user feedback and from deeper reflection on our own deeds ;) (while our manpower is limited...!)
So I have pointed out a new trend in user feedback (one I had suspected and hinted at long before): there is now another group of users complaining that some matches are never top-listed in spite of their high popularityIndex, which this bug 970456 practically side-lined for many scenarios:

(In reply to Ulf Zibis from comment #176)
> Hi,
> in current TB 31.6.0 I have those 2 addresses:
> Personal addressbook: Eva Maya Kerkmeyer <EMK@EMKerkmeyer.com>
> Collected addressbook: emkerk@test.com
> Normally I use the 1st one, so the score should be high.
> Since I have the 2nd one in the other addressbook, autocomplete sets the 2nd
> one in first position if I type "emk".
> Is that as designed? Would is work better, if this bug is fixed?

That's an unwanted/overlooked/underestimated side-effect of the current intended design after this bug 970456, namely b) as described above in this comment.

The good news is, we intend to eliminate the problem of b) in bug 1134986, so an initial match of the search word(s) against the displayed complete address will NOT top-score any more in TB 38. In other words, as long as your search words match the beginning of *any* word-like part of the complete address (or the nickname field), as described in a) above in this comment, they all rank equally high at first and will then be effectively (second-)sorted by their popularityIndex. So that will strengthen the influence of popularityIndex on the results order for a lot of scenarios, starting from TB 38.

As a caveat, and somewhat contrary to Magnus' claim in bug 1134986 comment 42 that we'll now "let popularity rule", please note that any results based on "inner" matches will still NEVER get to the top regardless of their popularityIndex, because they get down-scored and scoring remains the primary sorting criterion. For details, see my bug 1134986 comment 43.
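To make the difference concrete, here is a rough Python sketch comparing the current design with the one intended for TB 38, using Ulf's two addresses. It is a sketch under assumptions, not the real implementation: the score values, function names, popularity numbers, and the third "Clemke" contact are hypothetical, and the nickname field is left out.

```python
import re


def _word_begin(q: str, a: str) -> bool:
    """True if q matches the beginning of any word-like part of a."""
    return any(w.startswith(q) for w in re.split(r"\W+", a) if w)


def score_current(query: str, address: str) -> int:
    """Design after bug 970456: initial match of the complete address
    top-scores (3), word-begin match (2), inner match (1)."""
    q, a = query.lower(), address.lower()
    if q not in a:
        return 0
    if a.startswith(q):
        return 3
    return 2 if _word_begin(q, a) else 1


def score_tb38(query: str, address: str) -> int:
    """Intended design for TB 38: no extra top score, so all
    word-begin matches tie (2); inner matches stay lower (1)."""
    q, a = query.lower(), address.lower()
    if q not in a:
        return 0
    return 2 if _word_begin(q, a) else 1


def rank(query, contacts, scorer):
    """Sort by score first, then by popularity (both descending)."""
    hits = [c for c in contacts if scorer(query, c[0]) > 0]
    hits.sort(key=lambda c: (-scorer(query, c[0]), -c[1]))
    return [address for address, _ in hits]


# (address, popularity); the popularity values are made up.
contacts = [
    ("emkerk@test.com", 0),
    ("Eva Maya Kerkmeyer <EMK@EMKerkmeyer.com>", 20),
    ("Hans Clemke <hans.clemke@example.com>", 99),  # hypothetical
]

current_order = rank("emk", contacts, score_current)
tb38_order = rank("emk", contacts, score_tb38)
```

Under the current design, "emk" puts emkerk@test.com on top because it matches the beginning of the complete address. Under the TB 38 design, both of Ulf's addresses tie on score, so the more popular Eva Maya Kerkmeyer entry wins. The hypothetical Clemke contact stays last in both cases despite its high popularity, since "emk" only matches inside a word.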

Thanks for your valuable user feedback; rest assured we're working hard to improve TB for you. Remember that we're all unpaid volunteers dedicating our free time to the continuous development of TB. Enjoy! :)
Whiteboard: [fixed for thunderbird 31.4.0 and up][ignores popularityIndex if less popular display name begins with search word] → [read comment 177 before commenting][fixed for thunderbird 31.4.0 and up][makes popularityIndex the secondary sorting criterion after primary scoring based on matching patterns; initial display name top-scoring will be reverted by bug 1134986]