1134986 - Address autocomplete sorting wrong - appears to ignore recent use (popularityindex) information in 31.4.0+

Reporter

Description

•

10 years ago

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
Build ID: 20150122214805

Steps to reproduce:

This is probably related to bug 970456 and is partially discussed there already. The autocomplete in 31.4.0 is broken. Even after using it for several weeks, it never shows a name based on recent use, as earlier versions did, so I always have to type nearly the complete email address or scroll down the list to pick the correct one. I've downgraded to 31.3.0 now, and it's back to normal: I need to type at most 3 characters to get the email I want based on recent use.

Actual results:

Typing a few characters of an email address I often send mail to brings up other addresses first, with the one I'm actually looking for somewhere further down the list.

Expected results:

Typing a few characters of an email address I often send mail to should bring it up at the top of the list.

Magnus Melin [:mkmelin]

Assignee

Updated

•

10 years ago

Keywords: regression

Whiteboard: [regression:TB31.4.0]

Wayne Mery (:wsmwk)

Updated

•

10 years ago

Component: Untriaged → Message Compose Window

Thomas D. (:thomas8)

Updated

•

10 years ago

Depends on: autocompleteFrecency

Thomas D. (:thomas8)

Comment 1

•

10 years ago

This regression is caused by Bug 970456, which (for the first time), added a score for sorting which for certain scenarios supersedes/circumvents popularityIndex. So indeed, depending on matching patterns between searchwords and your contact data, some results will never get to the top, no matter how often you use them. I suspect we'll have to roll back Bug 970456 after morphing popularityIndex into a more functional "frecency" algorithm in bug 382415.

Depends on: 970456

Thomas D. (:thomas8)

Updated

•

10 years ago

Comment 2

•

10 years ago

I also encountered problems with this change; however, I have an idea of what is wrong. Suppose I have this in my collected addresses, which I write to often:

John Doe <jd@viptel.dk>

Now, due to the new algorithm, jd@anywhere.org will always be preferred when typing "jd", because it starts with "jd" and not "John".

Magnus Melin [:mkmelin]

Assignee

Comment 3

•

10 years ago

That particular case should not be a problem. Since John is the first word in the display name we prefer that.

Tomasz Ostrowski

Comment 4

•

10 years ago

We're using Thunderbird at work in a network of 50+ users. We've noticed something strange that looks like it could be related to this regression. Autocomplete behaves differently when you type several letters of an address quickly than when you pause briefly after the first letter.

When you type several letters quickly, address popularity is ignored entirely. As a result, a totally obscure and almost never used address is autocompleted. This is a regression after upgrading from 24.X to 31.X.

But when you wait about half a second after typing the first letter, the autocomplete popup shows, and when you then type the remaining letters, a popular address is autocompleted.

For example:
- type 'tome' fast - an unpopular address containing "tome" is autocompleted;
- type 't', wait a second, type 'ome' - the most popular address containing "tome" is autocompleted.

This is a serious issue, because people often "drive blindly". For example, they remember that when they type "bos", the most popular address, "boss@example.com", is autocompleted. They don't even look at it because it has always worked like this. But after the 31.X upgrade, which cannot be postponed because of security issues in previous releases, this starts to autocomplete to "boson@example.org", somebody to whom they sent one e-mail 2 years ago. This is a real story with multiple real users in my organization - only the names were changed.

Magnus Melin [:mkmelin]

Assignee

Comment 5

•

10 years ago

Confirming this to make tracking easier. Obviously there's something strange happening for some people.

Status: UNCONFIRMED → NEW

Ever confirmed: true

OS: Linux → All

Hardware: x86_64 → All

Summary: Address autocomplete stopped using recent use information in 31.4.0 → Address autocomplete sorting wrong - appears to ignore recent use (popularityindex) information in 31.4.0+

Whiteboard: [regression:TB31.4.0] → [regression:TB31.4.0][DUPETOME]

Magnus Melin [:mkmelin]

Assignee

Comment 9

•

10 years ago

So far I haven't been able to find anything wrong here. I wonder if it's something special with the address books? If someone is willing to share their address book with me, send me a mail with the address book attached and details on what to enter and what the expected result would be. If someone wants to take a closer look code-wise, the code is around here: http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#137

Magnus Melin [:mkmelin]

Assignee

Comment 10

•

10 years ago

(In reply to Tomasz Ostrowski from comment #4)
> For example:
> - type 'tome' fast - unpopular address containing "tome" would autocomplete;
> - type 't', wait a second, type 'ome' - the most popular address containing "tome" would autocomplete.

Code-wise, I don't understand how that would happen unless there's something odd with the query to the address book. The code first searches for what's entered and then just removes non-matches from there. So if you enter "be", it finds all addresses that have "be" somewhere in them. If you go on to "ben", it doesn't do a new query (to the address book) but goes through the results it already got and removes those that no longer match.
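The two code paths described in this comment can be sketched roughly as follows. This is a simplified model, not the code in nsAbAutoCompleteSearch.js: the `addressBook` array and the function names `fullSearch` and `refineSearch` are assumptions made up for illustration.

```javascript
// Hypothetical, simplified model of the two search paths; none of these
// names are Thunderbird APIs.
const addressBook = [
  "ben@example.com",
  "bert@example.org",
  "abe@example.net",
  "carol@example.com",
];

// Full search: query the address book for every entry containing the text.
function fullSearch(text) {
  const needle = text.toLowerCase();
  return addressBook.filter(addr => addr.toLowerCase().includes(needle));
}

// Incremental search: reuse the previous result and only drop entries that
// no longer match. The order of the surviving entries is left untouched.
function refineSearch(previousResult, text) {
  const needle = text.toLowerCase();
  return previousResult.filter(addr => addr.toLowerCase().includes(needle));
}

const afterBe = fullSearch("be");               // queries the address book
const afterBen = refineSearch(afterBe, "ben");  // no new query
console.log(afterBe);
console.log(afterBen);
```

Because the refinement step only filters, whatever ordering the first query produced is carried forward unchanged.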

mpj.jonas

Comment 11

•

10 years ago

(In reply to Tomasz Ostrowski from comment #4)
> We're using Thunderbird at work in a 50+ users network. We've noticed something strange, which looks like can have something to do with this regression. It behaves differently when you type several letters of address fast and differently when you make a short break after first letter.
>
> when you type several letters fast then any address popularity is ignored. This means that automatically is autocompleted a totally obscure and almost never used address. This is a regression after upgrade from 24.X to 31.X.
>
> But when you wait about half a second after typing first letter, then autocomplete popup shows and then, when the rest of letters would be typed then a popular address would autocomplete.
>
> For example:
> - type 'tome' fast - unpopular address containing "tome" would autocomplete;
> - type 't', wait a second, type 'ome' - the most popular address containing "tome" would autocomplete.
>
> This is a serious issue, because people often "drive blindly". For example they remember that when they type "bos", then the most popular address would autocomplete "boss@example.com". They don't even look at it because it always worked like this. But after 31.X upgrade, which is not possible to postpone because of security issues in previous releases, this would start to autocomplete to "boson@example.org", which is somebody to which he sent one e-mail 2 years ago. This is a real story with multiple real users in my organization - only names were changed.

Doesn't work this way for me. Even if I wait after typing first letter results are sorted wrong.

Charles

Comment 12

•

10 years ago

(In reply to Tomasz Ostrowski from comment #4)
> But when you wait about half a second after typing first letter, then autocomplete popup shows and then, when the rest of letters would be typed then a popular address would autocomplete.

There are definitely problems caused by a marked slowdown somewhere in the autocomplete code, and maybe that is causing many of these 'weird' problems people are reporting - and why reproducing them is so hit-and-miss.

That aside - personally, I think that autocomplete should NEVER kick in after typing just one character, under any circumstances. Autocomplete in the address book should not start until at least TWO characters are entered.

Side case: this behavior makes the message list pane filtering infuriatingly slow on large mail stores, because almost every message is likely to contain the single first character you type. On one large folder I have, there is at least a 3-5 second delay between typing the first character and the second one appearing. For people who work fast, this is extremely frustrating. For the Message List Filter, the minimum should be at least 3 or 4 characters (and in both cases it should be made configurable, if possible).

Been meaning to open a bug for this, so I'm off to open one for the latter...

Magnus Melin [:mkmelin]

Assignee

Comment 13

•

10 years ago

Unless you do a new search, the sorting doesn't change as you type further; after that it just filters out things that no longer match. I did wonder if that was the problem, but at least so far I haven't found a case where it wouldn't do what you expect.

Tomasz Ostrowski

Comment 14

•

10 years ago

Attached file abook.mab — Details

I'm attaching a minified address book - I removed most addresses and left only 3:

t…y@b…y.org.pl - very popular
t…y@a…o.waw.pl - less popular
t…k@t…y.pl - not popular at all

When I type "tome" fast - t…k@t…y.pl would autocomplete, bad.
When I type "t", pause, "ome" - t…y@b…y.org.pl would autocomplete, good.

Magnus Melin [:mkmelin]

Assignee

Comment 15

•

10 years ago

Thx, I can reproduce with the test case in attachment 8587956 [details]. It's indeed the case from comment 13 that I wasn't able to figure out.

If you type in "tome":
- tomek: score 100 (since it's a direct match in the display name)
- the other two: score 99 (since "tome" is found, but not as the first thing in the display name)

If you type in "t", all three addresses get score 100 and are correctly sorted by popularity as a second resort. When you then type in the "ome" after that, we do not recalculate the scores, which we apparently must.

Well, you said you actually didn't want the address with "Tomek" as display name first, but that should reasonably be the one to use, since the other addresses have the display name Tomasz instead, and "tome" just in the address.
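The fast-versus-slow discrepancy can be reproduced with a toy model. The two-level scoring below (100 for a display-name prefix, 99 for any other match) is taken from the numbers in this comment; the real `_getScore()` may have more levels. The card data, addresses, popularity values, and helper names are all assumptions for illustration.

```javascript
// Hypothetical cards modelled on the reduced test case; all values are assumptions.
const cards = [
  { displayName: "Tomasz", email: "tome.y@popular.example", popularity: 100 },
  { displayName: "Tomasz", email: "tome.y@less.example", popularity: 10 },
  { displayName: "Tomek", email: "t.k@rare.example", popularity: 0 },
];

// Simplified scoring: display-name prefix beats any other match.
function getScore(card, text) {
  const name = card.displayName.toLowerCase();
  const email = card.email.toLowerCase();
  if (name.startsWith(text)) return 100;
  if (name.includes(text) || email.includes(text)) return 99;
  return 0;
}

// Order by descending score, then descending popularity.
function byScoreThenPopularity(a, b) {
  return (b.score - a.score) || (b.popularity - a.popularity);
}

// Fresh search: score every card against the full text, then sort.
function search(text) {
  return cards
    .map(card => ({ ...card, score: getScore(card, text) }))
    .filter(result => result.score > 0)
    .sort(byScoreThenPopularity);
}

// Refinement as described above: drop non-matches, but keep the scores
// and the order that were computed for the shorter text.
function refineWithoutRescoring(previous, text) {
  return previous.filter(result => getScore(result, text) > 0);
}

const typedFast = search("tome");
const typedSlowly = refineWithoutRescoring(search("t"), "tome");
console.log(typedFast.map(r => r.email));
console.log(typedSlowly.map(r => r.email));
```

In this model, typing "tome" in one go puts the "Tomek" card first, while typing "t" and then refining keeps the popularity order from the first search, which is the inconsistency reported in comment 14.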

Assignee: nobody → mkmelin+mozilla

Tomasz Ostrowski

Comment 16

•

10 years ago

I see.

I'd then suggest treating a direct prefix match in the e-mail address with at least the same score as a prefix match in the display name, so popularity can work. I think typing an e-mail address is so natural in the "To" field that it should not be treated worse than typing a name.

Or give popularity some weight, so that when scores are nearly equal (here it's 100 vs 99) it could switch the order. It could be something like floor(sqrt(popularity)) or floor(ln(popularity)) to limit it.

I don't think recalculating the score with every letter is a good idea. This could reorder autocomplete uncontrollably while typing. It would be confusing as hell - lots of bug reports, as with the current algorithm.
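The first suggestion, scoring an e-mail prefix match as high as a display-name prefix match, might look like this in a toy model. The scoring levels, card data, and function names are assumptions for illustration, not the actual Thunderbird implementation.

```javascript
// Hypothetical cards modelled on the reduced test case; all values are assumptions.
const cards = [
  { displayName: "Tomasz", email: "tome.y@popular.example", popularity: 100 },
  { displayName: "Tomasz", email: "tome.y@less.example", popularity: 10 },
  { displayName: "Tomek", email: "t.k@rare.example", popularity: 0 },
];

// Variant scoring: a prefix match in either field earns the top score.
function getScore(card, text) {
  const name = card.displayName.toLowerCase();
  const email = card.email.toLowerCase();
  if (name.startsWith(text) || email.startsWith(text)) return 100;
  if (name.includes(text) || email.includes(text)) return 99;
  return 0;
}

function search(text) {
  const needle = text.toLowerCase();
  return cards
    .map(card => ({ ...card, score: getScore(card, needle) }))
    .filter(result => result.score > 0)
    // Equal scores fall through to descending popularity.
    .sort((a, b) => (b.score - a.score) || (b.popularity - a.popularity));
}

const results = search("tome");
console.log(results.map(r => `${r.score} ${r.email}`));
```

With all three cards tied on score, popularity alone decides the order, so the most popular address comes first regardless of typing speed.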

mpj.jonas

Comment 17

•

10 years ago

Agreed with Tomasz Ostrowski. In my address book, the majority of frequently used addresses start with some kind of academic title or company name in the display name. So a direct match on the start of the name is mostly irrelevant. Also, from my POV, when I'm entering recipients I think about their e-mail addresses (which I remember for my most used contacts), not about which version of their name I have in the address book.

Thomas D. (:thomas8)

Comment 18

•

10 years ago

(In reply to mpj.jonas from comment #17)
> Agreed with Tomasz Ostrowski. In my addressbook majority of frequently used adresses starts with some kind of academic title or company name in display name. So direct match on start of name is mostly irrelevant.

Good point, showing that our choice of giving display names absolute priority (and permanently overriding popularity) can easily fail for a lot of valid cases.

> Also from my POW when I'm entering recipents I think about their e-mail adresses (which I remember for most used contacts) not about what version of their name I have in addressbook.

Good point again, but certainly other users will have other POVs, and never remember the email, but only the name.

One day, perhaps someone will believe and pick up on what I've said all along, namely that bug 382415 is the only long-term sustainable solution to all of these issues raised by users. It's more than obvious that we'll never get our manual scoring algorithm right, because different users expect different things. The only thing that we can reliably know about ALL types of users is that if they have selected a certain address recently and somewhat frequently after typing a certain combination of characters, they are likely to use the same characters again and expect the same result in the near future. As time goes on, this allows for dynamic changes of ranking, which is better than anything a stiff algorithm can ever do. My FF location bar succeeds at that all the time, from a much bigger database of URL candidates than the typical AB size.

So if mpj.jonas often types "startofemail..." and then tends to pick a certain address for that, we'll remember it. If Doe User types "Start of Display Name..." and then tends to pick a certain address for that, we'll also remember that. And if Any User prefers to type "Smart", which is somewhere at the end of display name "Dr. Joe Smart", and then tends to pick a certain address for that, we'll also remember that.

Everybody happy after a little bit of training. Even when I pick something from somewhere down the results list which TB initially thinks is not so relevant a match. Depending on the frecency algorithm we choose, we could perhaps still make it so that begin-of-display-name matches and/or begin-of-email matches initially score higher, but if the user insists on picking another address, we'll also honor that, because we memorize the combination of "searchword typed" PLUS "result chosen" in a database, together with values for frequency and recency which will be weighted.

(In reply to Thomas D. from comment #1)
> This regression is caused by Bug 970456, which (for the first time), added a score for sorting which for certain scenarios supersedes/circumvents popularityIndex. So indeed, depending on matching patterns between searchwords and your contact data, some results will never get to the top, no matter how often you use them.
>
> I suspect we'll have to roll back Bug 970456 after morphing popularityIndex into a more functional "frecency" algorithm in bug 382415.
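A minimal sketch of the "searchword typed PLUS result chosen" idea follows. It is neither the Firefox location bar algorithm nor a design from bug 382415; the `FrecencyStore` class, the exponential half-life decay, and the example addresses are assumptions chosen only to show frequency and recency being combined into one weight.

```javascript
// Hypothetical frecency store: remembers which address was chosen for which
// typed text. The half-life decay is an assumption made for illustration.
class FrecencyStore {
  constructor(halfLifeDays = 30) {
    this.halfLifeMs = halfLifeDays * 24 * 60 * 60 * 1000;
    this.entries = new Map(); // key -> { weight, lastUsed }
  }

  key(input, address) {
    return `${input.toLowerCase()}\n${address.toLowerCase()}`;
  }

  // Weight of an entry at time `now`: halves every half-life since last use.
  decayed(entry, now) {
    const age = now - entry.lastUsed;
    return entry.weight * Math.pow(0.5, age / this.halfLifeMs);
  }

  // Called when the user picks `address` after typing `input`.
  recordChoice(input, address, now) {
    const k = this.key(input, address);
    const old = this.entries.get(k);
    const weight = (old ? this.decayed(old, now) : 0) + 1;
    this.entries.set(k, { weight, lastUsed: now });
  }

  // Current frecency of an (input, address) pair; 0 if never chosen.
  score(input, address, now) {
    const entry = this.entries.get(this.key(input, address));
    return entry ? this.decayed(entry, now) : 0;
  }
}

const DAY = 24 * 60 * 60 * 1000;
const store = new FrecencyStore(30);
// About a year ago, one address was picked three times for "is"...
for (const day of [0, 1, 2]) store.recordChoice("is", "old@example.org", day * DAY);
// ...and in the last few days another address was picked twice.
for (const day of [360, 361]) store.recordChoice("is", "ismooth@example.com", day * DAY);

const now = 365 * DAY;
const oldScore = store.score("is", "old@example.org", now);
const newScore = store.score("is", "ismooth@example.com", now);
console.log(newScore > oldScore);
```

Because old choices decay, an address that was popular long ago does not stay at the top forever, which is the aging behaviour that a plain popularityIndex counter lacks.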

Thomas D. (:thomas8)

Comment 19

•

10 years ago

(In reply to Magnus Melin from comment #15)
> Thx, I can reproduce with the test case in attachment 8587956 [details]. It's indeed the case from comment 13 I wasn't able to figure out.
>
> If you type in "tome"
> - tomek: score 100 (since it's a direct match in display name)
> - the other two score 99 (since tome is found but not as the first thing in the display name)
>
> If you type in "t" all three addresses get score 100, and are correctly sorted by popularity as a second resort.

How long does the initial display name match have to be atm to be considered?

> When you then type in the "ome" after that we do not recalculate the scores, which we apparently must.

We should be very careful before digging our heels into this even deeper. This sounds resource-hungry, and we already had problems with autocomplete slowness. Is constant reordering of results ok? (Maybe yes, because removing results as you type also changes the visible result set, and we're not here to create good lists, but to narrow down efficiently to exactly ONE address.)

> Well, you said you actually didn't want the address with "Tomek" as display name first but that should reasonably be the one to use since the other addresses has display name Tomasz instead, and "tome" just in the address.

Comment 17 is yet another use case showing that this logic can easily fail if taken as an absolute.

One more thought against stiff sorting algorithms and in favor of a real frecency algorithm per bug 382415:

We implemented Bug 529584 because there was massive evidence that too many legitimate searches were failing for no reason, especially those starting from the middle of technical field values. We deliberately wanted "Maria" to find display name "Santa Maria de Angelo" or first name "Eva-Maria", or "CEO" to find "Jack Boss (CEO)", or "company" to find "Mr. Miller (company)", because we have no way of predicting by which field values or parts thereof our users best remember their addresses. So we wanted to give users the freedom to use whatever they remember about a given contact.

As a result, obviously for very short, single(!) searchwords, and depending on your personal dataset and search patterns, sometimes there might be more potential results. However, that's when "frecency" would normally do the job for you and help you get where you want to go (for special cases and lazy typers, this perhaps initially requires a little training by scrolling down the list and picking what you expect). However, we just have a dull "popularityIndex" which doesn't age (bugs on record). Worse, in the past we abused the "Collected Addresses" AB to store non-contact addresses only for the sake of allowing remote content. These are still cluttering users' ABs even after we moved remote-content control into Options, and we can't fix it for users because we can't tell them apart from normal contacts. And that is why some users started to complain that some of their age-old addresses had reappeared at the top where they didn't expect them.

So to alleviate our own home-made bugs and shortcomings of past and present, we did Bug 970456 as a first-aid fix to reduce the clamor. But it's really just a band-aid fix. If you think about it, Bug 970456 certainly violates the intention of Bug 529584 (which was giving the user more freedom in searchwords), because we are practically limiting the choice of successful searchwords again (where successful = top-level match): if there happens to be an initial display name match, we always prefer that and ignore the popularity of any other matching contact (for good, without any escape except changing your search words just to please TB). So, as users have already reported, we are forcing them to type much more again, and even "pushing" them to type from the beginning of the display name if they want to succeed with less typing.

But the whole point of Bug 529584 was just typing *anything* that you remember and picking from there, and TB should use a concept of frecency against input to come up with the best suggestions next time.

Thomas D. (:thomas8)

Comment 20

•

10 years ago

Having said that, we have improved a lot on some other bugs/shortcomings in this area, so consider whether the following can work for you for better autocomplete results:

* Define unique nicknames and use those as searchwords (will work reliably in TB38); even typing partial nicknames will work if unique enough, but well, better define them short enough to be typed in full.

* Use multi-word search phrases (Bug 558931) to rapidly narrow down your result set. To find "Dr. Paul Samson (Mozilla)", you can use search phrases like these:
"Dr Sam"
"Dr Sam Moz"
"Sam Moz"
"Moz Sam"
etc.
As soon as the combination of first and subsequent searchwords is somewhat unique (usually quickly), there's a good chance of ending up with just a single result from two very short search words. So depending on your datasets, even these might work:
"Dr S"
"Dr Mo"
"Sa Mo"
"Sam M"
"Dr. P"
etc.

* Using the multi-word search, you could also use the nickname field as a tag field, containing multiple tags like nickname, company nick, role nick, etc.:
Nickname: BiG M$ CEO
Nickname: Chris Moz CEO
That's also handy for writing to all CEOs, all MOZ employees, etc. using the contacts sidebar.
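The multi-word narrowing described above can be approximated as "every search word must occur somewhere in the contact's fields, in any order". That matching rule is an assumption about how Bug 558931 behaves, and the `matchesAllWords` helper, the field names, and the e-mail address below are made up for illustration.

```javascript
// Hypothetical contact; the e-mail address and the nickname tags are assumptions.
const contact = {
  displayName: "Dr. Paul Samson (Mozilla)",
  nickName: "Moz CEO",
  email: "psamson@example.com",
};

// True if every whitespace-separated word of the query is found somewhere
// in the contact's searchable fields, regardless of word order.
function matchesAllWords(card, query) {
  const haystack = [card.displayName, card.nickName, card.email]
    .filter(Boolean)
    .join(" ")
    .toLowerCase();
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return words.length > 0 && words.every(word => haystack.includes(word));
}

for (const query of ["Dr Sam", "Moz Sam", "Sa Mo", "Dr. P", "CEO", "Dr Xavier"]) {
  console.log(`${query}: ${matchesAllWords(contact, query)}`);
}
```

Each extra word can only shrink the result set, which is why two very short words are often enough to leave a single candidate.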

Magnus Melin [:mkmelin]

Assignee

Comment 21

•

10 years ago

Regarding comment 14 - comment 16, it still appears perfectly reasonable that the order would be adjusted once you get to the k in tomek - that's the 5th char! It's true a combination of score and popularity could be useful, but out of scope for this bug. Regarding performance concerns, don't worry about the sorting. For an address book with 16000 entries, I typed in "b" and got 533 hits. The sorting is still only 6ms, and likely faster still for an optimized build.

Status: NEW → ASSIGNED

Magnus Melin [:mkmelin]

Assignee

Comment 22

•

10 years ago

Attached patch bug1134986_autocomplete_invstigation.patch (obsolete) — Details — Splinter Review

It's frustrating trying stuff with the test data files, so I refactored test_nsAbAutoCompleteSearch1.js to have the data inline. Turned out I wanted a separate test though...

Attachment #8589855 - Flags: review?(neil)

neil@parkwaycc.co.uk

Comment 23

•

10 years ago

Comment on attachment 8589855 [details] [diff] [review] bug1134986_autocomplete_invstigation.patch

>+ // Add matches into the results array. We re-sort as needed later.
>  result._searchResults.push({
>    value: aPreviousResult.getValueAt(i),
>    comment: aPreviousResult.getCommentAt(i),
>    card: aPreviousResult.getCardAt(i),
>    emailToUse: aPreviousResult.getEmailToUse(i),
>-   popularity: parseInt(aPreviousResult.getCardAt(i).getProperty("PopularityIndex", "0"))
>+   popularity: parseInt(aPreviousResult.getCardAt(i)
>+                        .getProperty("PopularityIndex", "0")),
>+   score: this._getScore(aPreviousResult.getCardAt(i),
>+                         aPreviousResult.getValueAt(i).toLocaleLowerCase(),
>+                         fullString)
>  });
...
>+ // Sort the results. Scoring may have changed so do it even if this is
>+ // just filtered previous results.
>+ result._searchResults.sort(function(a, b) {
>+   // Order by 1) descending score, then 2) descending popularity,
>+   // then 3) primary email before secondary for the same card, then
>+   // 4) by emails sorted alpabetically.
>+   return (b.score - a.score) ||
>+          (b.popularity - a.popularity) ||
>+          ((a.card == b.card && a.isPrimaryEmail) ? -1 : 0) ||
>+          ((a.value < b.value) ? -1 : (a.value == b.value) ? 0 : 1);
>+   // TODO: this should actually use a.value.localeCompare(b.value).
>+ });

I think this fails when the same card matches on both primary and secondary email, because you lose the isPrimaryEmail value.

[Typo: alphabetically]

Tomasz Ostrowski

Comment 24

•

10 years ago

(In reply to Magnus Melin from comment #21)
> It's true a combination of score and popularity could be useful, but out of scope for this bug.

I'm sorry, but I have to disagree. This bug and its several duplicates are all about autocomplete sorting that _appears_ to ignore popularityIndex. You've proven that it is not really ignored. The problem is that score has priority over popularity, so very often popularity appears to be ignored because the score is slightly better for an unpopular card.

As I understand it, your patch would make the autocomplete result consistent: it would not matter whether the text was typed slowly or fast - you'd get the same results. But it would still be an unexpected result, because an unpopular display name would coincidentally match the typed address prefix and get a slightly higher score. Popularity would appear to be ignored as before.

I'd instead add something like `result+=Math.floor(Math.log2(Math.max(popularity,1)))` to _getScore(). And I'd compare only by score, then primary before secondary, and then alphabetically.
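To make the proposal concrete, here is the suggested popularity bonus applied to three hypothetical results. The formula is the one quoted in this comment; the result objects, the `blendedScore` name, and the popularity values are assumptions, and the "primary before secondary" step is simplified to a plain boolean comparison rather than a per-card check.

```javascript
// Match score plus a logarithmic popularity bonus, as proposed above.
function blendedScore(matchScore, popularity) {
  return matchScore + Math.floor(Math.log2(Math.max(popularity, 1)));
}

// Hypothetical results; addresses and popularity values are assumptions.
const results = [
  { value: "tomek@rare.example", matchScore: 100, popularity: 0, isPrimaryEmail: true },
  { value: "tome.y@popular.example", matchScore: 99, popularity: 100, isPrimaryEmail: true },
  { value: "tome.y@less.example", matchScore: 99, popularity: 10, isPrimaryEmail: true },
].map(r => ({ ...r, score: blendedScore(r.matchScore, r.popularity) }));

// Compare only by blended score, then primary before secondary,
// then alphabetically by address.
results.sort((a, b) =>
  (b.score - a.score) ||
  (Number(b.isPrimaryEmail) - Number(a.isPrimaryEmail)) ||
  a.value.localeCompare(b.value));

console.log(results.map(r => `${r.score} ${r.value}`));
```

With these numbers, a popularity of 100 adds 6 points and a popularity of 10 adds 3, so a one-point gap in match score is overturned while a very large gap would not be.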

Magnus Melin [:mkmelin]

Assignee

Comment 25

•

10 years ago

(In reply to neil@parkwaycc.co.uk from comment #23)
> I think this fails when the same card matches on both primary and secondary email, because you lose the isPrimaryEmail value.

(didn't change that code but..) I believe it's correct, and the tests say so too ;) If you'd use something like ((a.card == b.card) ? (a.isPrimaryEmail ? -1 : 1) : 0) || ... then it will not work.

Magnus Melin [:mkmelin]

Assignee

Comment 26

•

10 years ago

Attached patch bug1134986_autocomplete_invstigation.patch (obsolete) — Details — Splinter Review

Slight update, with the secondary address also added.

Attachment #8589855 - Attachment is obsolete: true

Attachment #8589855 - Flags: review?(neil)

Attachment #8590450 - Flags: review?(neil)

neil@parkwaycc.co.uk

Comment 27

•

10 years ago

(In reply to Magnus Melin from comment #25)
> (In reply to comment #23)
> > I think this fails when the same card matches on both primary and secondary email, because you lose the isPrimaryEmail value.
>
> (didn't change that code but..)

Sorry for being unclear, but the code path that queries the full results adds the isPrimaryEmail property to the results, but the code path that just filters them in place does not. To test this you would probably want the secondary email to precede the primary email alphabetically.

Magnus Melin [:mkmelin]

Assignee

Comment 28

•

10 years ago

Attached patch bug1134986_autocomplete_invstigation.patch (obsolete) — Details — Splinter Review

Ah yes, good catch!

Attachment #8590450 - Attachment is obsolete: true

Attachment #8590450 - Flags: review?(neil)

Attachment #8591052 - Flags: review?(neil)

Magnus Melin [:mkmelin]

Assignee

Comment 29

•

10 years ago

(In reply to Tomasz Ostrowski from comment #24)

I just don't accept the idea that people would frequently type something that is not the "real" name they want to find, and actually expect to find what they want as the top hit. And if they do, (for 38) we list a contact at the top if what you type in matches the nickname you have set for that particular contact. So yes, at least for this bug I just want to make it work consistently.

neil@parkwaycc.co.uk

Comment 30

•

10 years ago

Comment on attachment 8591052 [details] [diff] [review] bug1134986_autocomplete_invstigation.patch

>+ let prevCard = aPreviousResult.getCardAt(i);

Might be worth caching the email to use too, it gets used three times.

[Calling it prevCard looks a little odd; it's the current card from the previous result, so swings and roundabouts.]

Attachment #8591052 - Flags: review?(neil) → review+

Tomasz Ostrowski

Comment 31

•

10 years ago

(In reply to Magnus Melin from comment #29)
> I just don't accept the idea that people would frequently type something that is not the "real" name they want to find

But this is what I experience at my work - many complaints about this from many users.

I suppose it is caused by the convenience of a standard email format in an organizational environment. See - all addresses in my organization, and in many others, use the first letter of the first name plus the surname, like this:

Malcolm Reynolds - mreynolds@example.com
Zoe Washburne - zwashburne@example.com
Hoban Washburne - hwashburne@example.com
Inara Serra - iserra@example.com
Simon Tam - stam@example.com
River Tam - rtam@example.com

Even your own email address uses this format.

This makes for very convenient addressing - just type the initials in "To" and 99% of the time it autocompleted to the correct person, as your coworkers are naturally your most popular email recipients. This worked even when there were many persons with the same given name or a similar surname in your address book. Initials are a fairly good micro-hash of names.

And best of all - it was completely automatic - you only needed to reply to a coworker's email once to automatically add them to the address book. There was no need to configure the address book, edit cards, set nicknames for 50+ cards and so on.

And all this is now broken.

Thomas D. (:thomas8)

Comment 32

•

10 years ago

(In reply to Magnus Melin from comment #29)
> (In reply to Tomasz Ostrowski from comment #24)
> I just don't accept the idea that people would frequently type something that is not the "real" name they want to find, and actually expect to find what they want as top hit.

I'm really failing to see how Magnus arrives at this bold but unreasonable theory. There are loads of scenarios where, for a number of different reasons, the official "Display Name" might not be an initial match for the user's favorite search words. On this bug there's a very simple and straightforward example mentioned:

Display Name: Prof. Dr. Ines Smooth <ismooth@foo.bar> (popularity: 100)

In the same address book, there are other people the user rarely writes to:

Isabell Doe <isa.doe@foo.bar> (popularity: 3)
Ines Johnson <isjohnson@foo.bar> (popularity: 3)

The university department has loads of Profs. So typing "Prof" as a searchword does not make sense at all. Users will automatically resort to searching for unique strings of information which they can easily remember. Searching for "Ines" (technically in the middle of the display name), or "is" (the person's initials at the beginning of the email) makes perfect sense and is a completely natural search.

However, with the absolute priority of display name scoring wrongly established in Bug 970456, no matter how often users type "Ines" or "is" and then select "Prof. Ines Smooth" from the results, TB will NEVER learn that Prof. Ines is more popular for those search words, because the two unpopular entries happen to have display names which start with those search words ('Ines' Johnson, 'Is'abell Doe), and will FOREVER be wrongly returned as first matches.

In other words, as long as there happen to be initial display name matches for random reasons, the real popularity of the desired combination {search word - chosen result} is completely ignored, whereas before (with the absolute, dumb popularity index) we at least succeeded in mimicking such popularity behaviour.

> And if they do, (for 38) we top list if what you type in matches the nickname you have set for that particular contact. So yes, at least for this bug I just want to make it work consistently.

Well, defining explicit nicknames is certainly a concept with limitations for bigger setups.

Thomas D. (:thomas8)

Comment 33

•

10 years ago

Magnus, as an incremental remedy for the critical algorithm of Bug 970456, could we at least make it so that if the beginning of an email address matches the search input, it scores just as high as the beginning of the display name? Or perhaps we already have that? (I've lost track of the scoring details...) Searching for the actual email looks like a frequent use case which should not interfere with popularity.

Flags: needinfo?(mkmelin+mozilla)

Thomas D. (:thomas8)

Comment 34

•

10 years ago

(In reply to Magnus Melin from comment #29)
> (In reply to Tomasz Ostrowski from comment #24)
> I just don't accept the idea that people would frequently type something that is not the "real" name they want to find, and actually expect to find what they want as top hit. And if they do, (for 38) we top list if what you type in matches the nickname you have set for that particular contact. So yes, at least for this bug I just want to make it work consistently.

Another very simple example against the oversimplified initial-display-name search hypothesis, from bug 970456 comment 163:

> Given the display name's higher precedence, contacts with display names of the form "Last, First" are effectively demoted when searching for first name, even if they have higher popularity.
> For example, typing "rich" pre 31.4.0 I got the following display names:
>
> Beta, Rich [used often]
> Alpha, Richard [used sometimes]
> Richard Gamma [used never]
>
> which is exactly the order I expected. With 31.4.0 I now get:
>
> Richard Gamma [used never]
> Beta, Rich [used often]
> Alpha, Richard [used sometimes]
>
> startsWith() over contains() made sense, but its use combined with elevated Display Name seems to go too far. Why elevate any match over another? When a user starts typing, how does anyone know if they are trying to match email address, first name, last name, display name, nickname?

We really can't tell. Plus inside any of those fields, there can be multiple words, words in brackets, etc., in any unpredictable order depending on corporate or private logic... Also consider that how Display Names look will often not be defined by the user, but by how those other (international, ...) contacts have defined their own display names, and user just took them over into his ABs by replying.
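The reordering in the quoted "rich" example can be reproduced with a toy model that compares popularity-only sorting with score-first sorting. The numeric popularity values standing in for "often", "sometimes", and "never", the two-level score, and the helper names are all assumptions for illustration.

```javascript
// Display names from the quoted example; popularity numbers are assumptions.
const contacts = [
  { displayName: "Beta, Rich", popularity: 50 },    // used often
  { displayName: "Alpha, Richard", popularity: 5 }, // used sometimes
  { displayName: "Richard Gamma", popularity: 0 },  // used never
];

// Simplified score: a display-name prefix beats a match elsewhere in the name.
function prefixScore(contact, text) {
  const name = contact.displayName.toLowerCase();
  if (name.startsWith(text)) return 100;
  return name.includes(text) ? 99 : 0;
}

// Rank matches either by popularity alone or by score first, then popularity.
function rank(text, useScore) {
  const needle = text.toLowerCase();
  return contacts
    .filter(c => c.displayName.toLowerCase().includes(needle))
    .map(c => ({ ...c, score: useScore ? prefixScore(c, needle) : 0 }))
    .sort((a, b) => (b.score - a.score) || (b.popularity - a.popularity))
    .map(c => c.displayName);
}

const popularityOnly = rank("rich", false);
const scoreFirst = rank("rich", true);
console.log(popularityOnly);
console.log(scoreFirst);
```

Under these assumptions, popularity-only ranking gives the order reported for versions before 31.4.0, and score-first ranking moves the never-used "Richard Gamma" to the top, as reported for 31.4.0.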

Thomas D. (:thomas8)

Comment 35

•

10 years ago

(In reply to Magnus Melin from comment #29)
> (In reply to Tomasz Ostrowski from comment #24)
> I just don't accept the idea that people would frequently type something
> that is not the "real" name they want to find, and actually expect to find
> what they want as top hit. And if they do, (for 38) we top list if what you
> type in matches the nickname you have set for that particular contact. So
> yes, at least for this bug I just want to make it work consistently.

Of course Magnus is right that users cannot *immediately* expect certain "inner" matches as top hits, but what they have rightly come to expect is that *over time*, TB will learn about the higher popularity of those matches with this particular user (which works surprisingly well with the current poor implementation of dull popularityIndex counter). Whereas the display-name based absolute top-scoring introduced by Bug 970456 has made it so that certain matches will NEVER reach the top regardless of their popularity, whereas less popular matches will cling to the top forever because of their accidental display-name top-scoring.

Charles

Comment 36

•

10 years ago

(In reply to Tomasz Ostrowski from comment #31)
> And best of all - it was completely automatic - you only needed to reply to
> coworker's email once to automatically add it to addressbook. There's no
> need to configure addressbook, edit cards, set nicknames for 50+ cards and
> so on.

Only true if you use the brain-dead 'Automatically add outgoing e-mail addresses to my...' misfeature... This is one of the first things I turn off. I have many multiple Address Books, and I only want addresses in one of my Address Books that I intentionally put in there, and the little Star mechanism for managing this is extremely easy and functional.

That said, I totally agree with your argument...

Wayne Mery (:wsmwk)

Comment 37

•

10 years ago

I am in tune with comments 31 and 33. I frequently do not type a person's name. Most often I use the start of their email or something I have put in display name (often a hint to their function, interest, position or company name) for several reasons:
1. because for me these (previously) have produced more predictable results than using a real name,
2. because I frequently don't remember a name,
3. because I want to be presented with several choices from the same organization or group (testers for example).

Also note, frequently a person's company name is not contained in their email address, hence the need to resort to something in display name.

[:jberkus] Josh Berkus

Comment 38

•

10 years ago

Magnus @29: Requiring users to add nicknames to 100% of their contacts in order to get frequently-used contacts to sort correctly is hard to regard as anything but an extreme functionality regression.

Let me give you another example of how this is broken. As someone who works in the OSS nonprofit sphere, I email the treasurers of most of the open source nonprofits and thus have them in my address book. Of these, "treasurer@spi-inc.org" gets emailed 1000X as frequently, because I'm the assistant treasurer there.

Yet, "treasurer@documentfoundation.org", an address I have mailed *once*, two years ago, will sort above "treasurer@spi-inc.org", an address I mail several times a day. Why? Because the contact for SPI treasurer is:

SPI Treasurer <treasurer@spi-inc.org>

... which means that address will NEVER reach the top of the sorting order, regardless of how frequently it is used and how infrequently other addresses are used, unless I remember to type "SPI" instead of "treasurer".

Further, because of this bug, I have accidentally emailed the wrong contact and had to do a lot of backtracking at least 6 times since FF 31 came out. The new behavior is an unalloyed bug and should simply be reverted without further thought.

[:jberkus] Josh Berkus

Comment 39

•

10 years ago

... continued:

I can think of two better solutions for the "HolBErt" problem described in bug 970456.

1) Allow the user to choose in Preferences whether they prefer "text match" or "popularity" to come first.
2) Expand the definition of "begins with" to cover likely search patterns.

To explain (2), take the following contacts:

"Doe, Jane" <jdoe@klezmania.edu>
"SPI treasurer" <treasurer@spi-inc.org>
ltorvalds@linux.com

For Jane Doe, there are 4 searchable tokens: "Jane", "Doe", "jdoe", and "klezmania.edu". "Begins with" should be considered to match if it matches the beginning of any of the four tokens. Likewise, the second address has 3 tokens, one of which is repeated: "SPI", "treasurer", "spi-inc.org". The third address has only two tokens: "ltorvalds" and "linux.com".

If email addresses are divided into tokens like this, then promoting "begins with" matches over "contains" matches starts to make sense; it would actually be an enhancement of TB instead of a regression. There's some argument to be made as to whether the "domain" token after the @ should sort as highly as other matches.
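A minimal sketch of proposal (2), assuming a contact is just a display name plus one address. The function names `tokenize` and `beginsWithAnyToken` are hypothetical and exist only for this illustration; nothing here is taken from the Thunderbird source.

```javascript
// Split a contact into searchable tokens: the words of the display name,
// the local part of the address, and the domain (kept as one token).
// Repeated tokens are dropped, compared case-insensitively.
function tokenize(displayName, email) {
  const [localPart, domain] = email.trim().split("@");
  const nameWords = displayName.split(/[\s,]+/).filter(Boolean);
  const seen = new Set();
  const tokens = [];
  for (const token of [...nameWords, localPart, domain]) {
    const key = token.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      tokens.push(token);
    }
  }
  return tokens;
}

// "Begins with" matches if the query is a prefix of any single token.
function beginsWithAnyToken(tokens, query) {
  const q = query.toLowerCase();
  return tokens.some(t => t.toLowerCase().startsWith(q));
}

console.log(tokenize("Doe, Jane", "jdoe@klezmania.edu"));
// [ 'Doe', 'Jane', 'jdoe', 'klezmania.edu' ]
console.log(tokenize("SPI treasurer", "treasurer@spi-inc.org"));
// [ 'SPI', 'treasurer', 'spi-inc.org' ]
console.log(tokenize("", "ltorvalds@linux.com"));
// [ 'ltorvalds', 'linux.com' ]
```

With this, typing "treas" is a "begins with" match for the SPI contact, so it would compete on popularity with the other treasurer addresses rather than being demoted. Whether the domain token should rank as highly as the others is left open, as noted above.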

ISHIKAWA, Chiaki

Comment 40

•

10 years ago

(In reply to [:jberkus] Josh Berkus from comment #38)
> Magnus @29:
...
>
> Let me give you another example of how this is broken. As someone who works
> in the OSS nonprofit sphere, I email the treasurers of most of the open
> source nonprofits and thus have them in my address book. Of these,
> "treasurer@spi-inc.org" gets emailed 1000X as frequently, because I'm the
> assistant treasurer there.
>

I have similar examples, and

> Yet, "treasurer@documentfoundation.org", an address I have mailed *once*,
> two years ago, will sort above "treasurer@spi-inc.org", an address I mail
> several times a day. Why? Because the contact for SPI treasurer is:
>
> SPI Treasurer <treasurer@spi-inc.org>
>
> ... which means that address will NEVER reach the top of the sorting order,
> regardless of how frequently it is used and how infrequently other addresses
> are used, unless I remember to type "SPI" instead of "treasurer".
>
> Further, because of this bug, I have accidentally emailed the wrong contact
> and had to do a lot of backtracking at least 6 times since FF 31 came out.
> The new behavior is an unalloyed bug and should simply be reverted without
> further thought.

I also had to do this backtracking a few times :-(

I would like to see either the algorithm be reverted to the old (buggy?) one which worked just fine for me, or some suggestions here be incorporated (use of frequency or rather use of recentness as priority, etc.), OR make the algorithm selectable by user preference USE_PRE_FF31_AUTOCOMPLETE, etc.

TIA

St Heine

Reporter

Comment 41

•

10 years ago

I'm happy to see the discussion happening here, but I cannot see why a major change is needed before fixing at least the biggest pain point. I have downgraded back to Thunderbird 31.3.0 and reject any updates since, as that's the last version that was working perfectly fine for me. So I would recommend to revert back the code to what was in 31.3.0 and release that in the next version - then you can discuss major changes to the process.

Magnus Melin [:mkmelin]

Assignee

Comment 42

•

10 years ago

Attached patch bug1134986_autocomplete_invstigation.patch — Details — Splinter Review

Ok so let's let popularity rule instead of the "direct displayname hit" then. Also use localeCompare, the alphabetical results were confusing earlier since capital letters and lower case letters sorted differently.

Attachment #8591052 - Attachment is obsolete: true

Flags: needinfo?(mkmelin+mozilla)

Attachment #8592415 - Flags: review?(neil)

Thomas D. (:thomas8)

Comment 43

•

9 years ago

(In reply to Magnus Melin from comment #42)
> Created attachment 8592415 [details] [diff] [review]
> bug1134986_autocomplete_invstigation.patch
>
> Ok so let's let popularity rule instead of the "direct displayname hit" then.
> Also use localeCompare, the alphabetical results were confusing earlier since
> capital letters and lower case letter sorted differently.

Thanks!!! I believe (and sincerely hope...) that this will improve user experience a lot in the long run.

However, let's be precise in description/wording of what we've done here: Popularity still does NOT "rule" after patch of attachment 8592415 [details] [diff] [review], because it is STILL only the *secondary* sorting criterion *after* the score has done the primary sorting. But this patch makes it a lot better: All the matches of search word(s) against the beginning of *any* word-like parts of the complete email address will score BEST, so these will be toplisted and then effectively (second-)sorted by popularity:

   // We want to treat firstname, lastname and word boundary(ish) parts of
   // the email address the same. E.g. for "John Doe (:xx) <jd.who@example.com>"
+  // all of these should score the same: "John", "Doe", "xx",
+  // ":xx", "jd", "who".
+  if (/[ :."'(\-_<&]/.test(prevCh))
+    return BEST;

+  result._searchResults.sort(function(a, b) {
+    // Order by 1) descending score, then 2) descending popularity,
+    // then 3) primary email before secondary for the same card, then
+    // 4) by emails sorted alphabetically.
+    return (b.score - a.score) ||
+           (b.popularity - a.popularity) ||
+           ((a.card == b.card && a.isPrimaryEmail) ? -1 : 0) ||
+           a.value.localeCompare(b.value);
+  });

This means that any "inner" matches which do NOT match the beginning of obviously visible word-like parts of the complete email address will still always be downgraded because they have lower score which we use for primary sorting, with no chance of ever getting to the top, regardless of their popularity.

So we've fixed the worst part of our well-meant but mistaken attempts in Bug 970456, but in the long run, I suspect that we'll still get legitimate complaints that some desired results from regular semantic search terms can never get to the top of the list.

To understand the remaining problem, please realize that technical word boundaries (spaces, dots, brackets etc), more so for other languages like German, do NOT always coincide with semantic word boundaries. Donaudampfschifffahrtskapitän is a single word in German which is regularly spelled exactly like that, no spaces at all, but obviously consists of several identifiable sub-words, Donau, Dampf, Schifffahrt, Kapitän. For any language including English, we have the same problem with email addresses and domains, where spaces or other easily recognizable semantic word separators are often omitted:

Jonny <JohnDoe@mozillamessaging.com>

There's absolutely no reason why searching for "doe" or "messaging" will still NEVER succeed to toplist that address regardless of high frecency, just because it happens to be an "inner" match because our algorithm can't see the semantic boundaries. But yeah, we're in a catch-22 because not downgrading the inner matches causes other disasters when the non-semantic, potentially undesired inner matches push themselves to the top.

So as the ultimate solution, one day we'll have to do the real "frecency" stuff and develop some smarter algorithm based on frequency and recency of actual search inputs and the results picked after those inputs (Bug 382415, also consider John Ruskin's proposals of bug 1058583).
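To make the remaining limitation concrete, here is a small self-contained sketch. The boundary regex and the comparator are copied from the patch excerpt above; everything else (the `scoreMatch` and `rank` helpers, the numeric score values, scoring only the first occurrence of the query, and the sample cards) is an assumption made for illustration and may differ from the real implementation.

```javascript
// Hypothetical score values; only their relative order matters here.
const BEST = 2;
const INNER = 1;
const NONE = 0;

// Score the first occurrence of the query within the full address string.
function scoreMatch(fullAddress, query) {
  const idx = fullAddress.toLowerCase().indexOf(query.toLowerCase());
  if (idx === -1) return NONE;
  if (idx === 0) return BEST;
  const prevCh = fullAddress.charAt(idx - 1);
  // Boundary characters as in the patch excerpt.
  return /[ :."'(\-_<&]/.test(prevCh) ? BEST : INNER;
}

// Comparator as in the patch excerpt: score, popularity, primary email, name.
function compareResults(a, b) {
  return (b.score - a.score) ||
         (b.popularity - a.popularity) ||
         ((a.card == b.card && a.isPrimaryEmail) ? -1 : 0) ||
         a.value.localeCompare(b.value);
}

function rank(cards, query) {
  return cards
    .map(c => ({ ...c, score: scoreMatch(c.value, query) }))
    .filter(r => r.score > NONE)
    .sort(compareResults);
}

// Invented sample data: the first address is used far more often.
const cards = [
  { card: 1, isPrimaryEmail: true, popularity: 500,
    value: "Jonny <JohnDoe@mozillamessaging.com>" },
  { card: 2, isPrimaryEmail: true, popularity: 1,
    value: "Doe Industries <info@example.org>" },
];

console.log(rank(cards, "doe").map(r => r.value));
// [ 'Doe Industries <info@example.org>',
//   'Jonny <JohnDoe@mozillamessaging.com>' ]
```

Searching for "doe" finds "Doe" inside "JohnDoe" only as an inner match (the preceding character is "n"), so the rarely used "Doe Industries" contact stays on top no matter how large the other popularity value gets.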

neil@parkwaycc.co.uk

Comment 44

•

9 years ago

Comment on attachment 8592415 [details] [diff] [review] bug1134986_autocomplete_invstigation.patch So basically this deprioritises a full prefix match which basically becomes just another word prefix match? Sounds reasonable to me. (I'm not actually sure whether localeCompare sorts case-insensitively on all platforms; I think there used to be cases where it didn't. Still, it's at least as reliable as the previous code.)
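On the case-sensitivity doubt: if relying on the default behaviour of localeCompare turns out to be a problem, one possible alternative is to request case-insensitive comparison explicitly through the standard `Intl.Collator` `sensitivity` option. This is only a sketch of that alternative, not what the patch does; the `sortAddresses` helper and the sample addresses are hypothetical, and the "en" locale is passed just to keep the example reproducible.

```javascript
// Sort addresses so that case differences do not affect the order.
// sensitivity "accent" treats "a" and "A" as equal but keeps "a" != "á".
function sortAddresses(values, locale) {
  const collator = new Intl.Collator(locale, { sensitivity: "accent" });
  return [...values].sort(collator.compare);
}

const addresses = ["bob@example.com", "Zed@example.com", "alice@example.com"];

// Default sort compares UTF-16 code units, so capitals come first.
console.log([...addresses].sort());
// [ 'Zed@example.com', 'alice@example.com', 'bob@example.com' ]

console.log(sortAddresses(addresses, "en"));
// [ 'alice@example.com', 'bob@example.com', 'Zed@example.com' ]
```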

Attachment #8592415 - Flags: review?(neil) → review+

Magnus Melin [:mkmelin]

Assignee

Comment 45

•

9 years ago

https://hg.mozilla.org/comm-central/rev/0f3e3aae3ef2 -> FIXED

Target Milestone: --- → Thunderbird 40.0

Magnus Melin [:mkmelin]

Assignee

Comment 46

•

9 years ago

We want this for 38, but with 38 around the corner I don't think it's esr31 material.

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-thunderbird_esr31: --- → wontfix

tracking-thunderbird38: --- → +

Flags: in-testsuite+

Resolution: --- → FIXED

Charles

Comment 47

•

9 years ago

Thanks guys! 38 is looking to be a really solid release...

[:jberkus] Josh Berkus

Comment 48

•

9 years ago

Thanks, all, I look forward to testing the fix in 38 beta.

Kent James (:rkent)

Updated

•

9 years ago

status-thunderbird38: --- → affected

Kent James (:rkent)

Comment 49

•

9 years ago

Comment on attachment 8592415 [details] [diff] [review] bug1134986_autocomplete_invstigation.patch [Approval Request Comment] Regression caused by (bug #): User impact if declined: Testing completed (on c-c, etc.): Risk to taking this patch (and alternatives if risky):

Attachment #8592415 - Flags: approval-comm-beta?

Attachment #8592415 - Flags: approval-comm-aurora?

Kent James (:rkent)

Comment 50

•

9 years ago

Comment on attachment 8592415 [details] [diff] [review] bug1134986_autocomplete_invstigation.patch http://hg.mozilla.org/releases/comm-aurora/rev/85bcf1bc0b53

Attachment #8592415 - Flags: approval-comm-aurora? → approval-comm-aurora+

Kent James (:rkent)

Updated

•

9 years ago

status-thunderbird39: --- → fixed

Kent James (:rkent)

Comment 51

•

9 years ago

Comment on attachment 8592415 [details] [diff] [review] bug1134986_autocomplete_invstigation.patch https://hg.mozilla.org/releases/comm-beta/rev/53b37cdf7e74

Attachment #8592415 - Flags: approval-comm-beta? → approval-comm-beta+

Kent James (:rkent)

Updated

•

9 years ago

status-thunderbird38: affected → fixed

abook.mab (10 years ago, Tomasz Ostrowski; 1.99 KB, application/x-markaby)
bug1134986_autocomplete_invstigation.patch (10 years ago, Magnus Melin [:mkmelin]; 24.00 KB, patch)
bug1134986_autocomplete_invstigation.patch (10 years ago, Magnus Melin [:mkmelin]; 25.39 KB, patch)
bug1134986_autocomplete_invstigation.patch (10 years ago, Magnus Melin [:mkmelin]; 26.64 KB, patch; neil: review+)
bug1134986_autocomplete_invstigation.patch (10 years ago, Magnus Melin [:mkmelin]; 33.96 KB, patch; neil: review+, rkent: approval-comm-aurora+, rkent: approval-comm-beta+)