Last Comment Bug 558931 - Composition's recipient autocomplete does not / fails to show all matching address book entries (implement partial/substring matching for each of multiple search words; find 2nd words etc. as in Message Quick Filter: *foo* AND *bar* for XXL search power)
: Composition's recipient autocomplete does not / fails to show all matching ad...
Status: RESOLVED FIXED
[tb31features][UX summary comment 109...
: relnote, testcase
Product: Thunderbird
Classification: Client Software
Component: Address Book (show other bugs)
: Trunk
: All All
: -- major with 9 votes (vote)
: Thunderbird 31.0
Assigned To: Suyash Agarwal (:sshagarwal)
:
Mentors: Thomas D. (currently busy elsewhere; needinfo?me)
http://getsatisfaction.com/mozilla_me...
: 234422 405190 599282 719480 747854 775331 779045 793991 824732 (view as bug list)
Depends on: 529584 984875
Blocks: 40046 259289 749097 787608 1000775 1091675
  Show dependency treegraph
 
Reported: 2010-04-12 16:13 PDT by Andreas
Modified: 2015-02-28 02:19 PST (History)
30 users (show)
ryanvm: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
testcase: sample address book card (test address book, import as csv) (600 bytes, text/plain)
2010-08-26 09:03 PDT, Thomas D. (currently busy elsewhere; needinfo?me)
no flags Details
Patch v1 (4.48 KB, patch)
2013-11-29 22:45 PST, Suyash Agarwal (:sshagarwal)
aryx.bugmail: feedback-
Details | Diff | Review
Patch v1 (cleaned up) (2.82 KB, patch)
2013-11-30 10:29 PST, Suyash Agarwal (:sshagarwal)
aryx.bugmail: feedback+
bugzilla2007: feedback+
Details | Diff | Review
Patch v2 (7.46 KB, patch)
2013-12-12 02:28 PST, Suyash Agarwal (:sshagarwal)
bugzilla2007: feedback+
Details | Diff | Review
Patch v2.2 (7.54 KB, patch)
2013-12-12 09:06 PST, Suyash Agarwal (:sshagarwal)
bugzilla2007: feedback+
Details | Diff | Review
Patch v2.5 (7.90 KB, patch)
2013-12-14 23:25 PST, Suyash Agarwal (:sshagarwal)
bugzilla2007: feedback+
Details | Diff | Review
Patch v3 (7.87 KB, patch)
2013-12-15 12:02 PST, Suyash Agarwal (:sshagarwal)
no flags Details | Diff | Review
Patch v3.2 (7.98 KB, patch)
2013-12-22 00:51 PST, Suyash Agarwal (:sshagarwal)
neil: review-
acelists: feedback+
mkmelin+mozilla: feedback+
Details | Diff | Review
Patch v3.6 (11.56 KB, patch)
2014-01-19 00:55 PST, Suyash Agarwal (:sshagarwal)
no flags Details | Diff | Review
Patch v4 (11.77 KB, patch)
2014-01-20 11:45 PST, Suyash Agarwal (:sshagarwal)
no flags Details | Diff | Review
Patch v4.1 (11.77 KB, patch)
2014-01-21 00:02 PST, Suyash Agarwal (:sshagarwal)
no flags Details | Diff | Review
Patch v4.2 (11.83 KB, patch)
2014-01-21 06:46 PST, Suyash Agarwal (:sshagarwal)
bwinton: ui‑review+
Details | Diff | Review
Patch v4.7 (4.2 updated) (11.90 KB, patch)
2014-04-15 03:48 PDT, Suyash Agarwal (:sshagarwal)
syshagarwal: ui‑review+
Details | Diff | Review
Patch v5 (12.16 KB, patch)
2014-04-22 22:02 PDT, Suyash Agarwal (:sshagarwal)
no flags Details | Diff | Review
Patch v5.3 (12.11 KB, patch)
2014-04-23 12:38 PDT, Suyash Agarwal (:sshagarwal)
neil: review+
Details | Diff | Review
Patch for check-in (11.76 KB, patch)
2014-04-23 13:58 PDT, Suyash Agarwal (:sshagarwal)
syshagarwal: review+
syshagarwal: ui‑review+
Details | Diff | Review
Patch for check-in + modified tests (14.54 KB, patch)
2014-04-25 04:16 PDT, Suyash Agarwal (:sshagarwal)
syshagarwal: review+
syshagarwal: ui‑review+
Details | Diff | Review
Patch for check-in + tests modified (14.55 KB, patch)
2014-04-25 04:21 PDT, Suyash Agarwal (:sshagarwal)
syshagarwal: review+
syshagarwal: ui‑review+
Details | Diff | Review
Final patch for check-in (14.53 KB, patch)
2014-04-25 11:14 PDT, Suyash Agarwal (:sshagarwal)
syshagarwal: review+
syshagarwal: ui‑review+
Details | Diff | Review

Description Andreas 2010-04-12 16:13:01 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)

Suppose you have an address book containing multiple persons containing the char-sequence "tin" as display name, name, email-address or anyting else.

If you type "tin" in a new message's To-Field, Thunderbird should show a list containing all address book members where "tin" matches - but it doesn't.

Reproducible: Always

Steps to Reproduce:
1. Crreate a new mail
2. Type a char-sequence into "To-Field" matching multiple address book entries (e.g. "tin")
3. Compare autocomplete-proposals with address book entries
Actual Results:  
Autocomplete does not show all matching address book entries.

e.g. Autocomplete only shows "Tino Foo ....", "Bar, Tina ...."; "Tina XYZ ..." or "Foo bar <tino.xxx@example.com" are missing.

Expected Results:  
Autocomplete shows all matching address book entries - non missing.

The missing entries are always the same. 
I cannot recognize a pattern for the missing entries.

None of my contacts has a lastname ore firstname, all entries only use display name, email-address and phone numbers.

Maybe this is a char-encoding-issue. I do use Zindus to sync my address book with Google contacts.
Comment 1 Andreas 2010-04-12 16:14:27 PDT
Sorry... wrong copy & paste for build identifier.

Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4
Comment 2 Mark Banner (:standard8) 2010-04-13 00:33:38 PDT
Can you give some examples of the missing matches?

Note that autocomplete will only match the beginning of words, so if you have "satin" as a display name, that won't be picked up.
Comment 3 Andreas 2010-04-13 14:21:08 PDT
Wooow. Yes, I can now reproduce it.

Thunder bird only matches the beginning of a field (not a word). So the display name "foo bar" will only match for searches beginning with "fo..." and not with "ba..." because tb uses a search similar to displayname.beginsWith(var).

The reason it looked so strange, was: sometime the match hits the display name and sometimes the mail address - which is vice versa to the display name (name: foo bar, mail: bar.foo@...).

So there are two possibilities for this issue: 

1) Tb should find matches at the beginning of each word (then this is a bug) or
2) Tb behaves as expected (then this is an enhancement).

Unfortunately Google only knows "display name", so I cannot change that without loosing sync. :-|



3 Example entries in my address book

a) Foo Bar <foo.bar@example.com>
b) Bar Tender <bar.tender@example.com>
c) Eve Foo <foo.eve@example.com>

Actual Results:
Autocomplete for "foo": a) match name(1st word) & mail, c) match mail
Autocomplete for "ten": -- no match in name or mail
Autocomplete for "bar": b) match name(1st word) & mail

Expected Results:
Autocomplete for "foo": a) match name & mail, c) match name(2nd word) & mail
Autocomplete for "ten": b) match name (2nd word)
Autocomplete for "bar": a) match name (2nd word), b) match name(1st) & mail
Comment 4 Andreas 2010-04-13 15:08:25 PDT
One thing to add: I just noticed that the match is even more surprising.

1 address book entries:

a) Paris Hilton, mail: <ph@hilton.com>, additional mail: <sweet@example.com>

Autocomplete for "ph": Two(!) entries
 "Paris Hilton <ph@...>" and "Paris Hilton <sweet@...>"

If there is a match, all mails are shown for the address book match. That's a little bit surprising for me, because there is no match for "Paris Hilton <swee...>"?!
Comment 5 Thomas D. (currently busy elsewhere; needinfo?me) 2010-08-26 09:01:12 PDT
From related bug 405190, comment 1:
> Address book previously would match ANY word or partial word in the name or
> email portion of the address book card. Now it will only match the beginnning
> of the name or the beginning of the email address. This has reduced the
> auto-complete functionality immeasurably.

This bug isn't very clear, nor very comprehensive, but it seems to be the most comprehensive out of several bugs about this major problem -> confirming to get it on the radar.

We'll need a new bug to implement partial matches on *any* address-related fields on the card, and I suspect we should do the same as quick filter and treat multiple words in the search as potentially non-adjacent so that users can
narrow down on the contact without bothering about the right syntax as it
happens to be in the book. Iow, it shouldn't matter if I search for "Jane Doe"
or "Doe Jane" (both without quotes, and even "oe ane" should find the thing.
Furthermore, we should be tolerant and accept angle brackets to search for the
beginning or end of email addresses, like "<foo" or "asdf.com>" to find
<foo.bar@asdf.com>.
Comment 6 Thomas D. (currently busy elsewhere; needinfo?me) 2010-08-26 09:03:28 PDT
Created attachment 469478 [details]
testcase: sample address book card (test address book, import as csv)

Here's a testcase that helps to avoid the behaviour analysis traps of a "normal
card" where the display name is composed from First Name and Last Name and thus
you'll believe to get matches for things we don't actually match.

Play with it and see that we're not matching anything when it comes to partial
matches, e.g. we are not matching space-separated second+ words from the
display name (bug 405190): we do NOT actually match "Janette" or "company" if the display name is "Display, Janette (sample company)". But you might not notice because if the first name starts with "Jane", searching for "Jane" will match it because of the first name (not the display name).

With multiple words, we are NOT matching e.g. "Janette Display", but we should.
Comment 7 [:Aureliano Buendía] 2010-09-27 23:46:26 PDT
*** Bug 599282 has been marked as a duplicate of this bug. ***
Comment 8 Thomas D. (currently busy elsewhere; needinfo?me) 2011-01-09 22:34:20 PST
The following momo gs report has been declared canonical:
http://gsfn.us/t/159l6 aka
http://getsatisfaction.com/mozilla_messaging/topics/auto_complete_of_addresses_with_substring

Changed URL of this bmo bug to http://getsatisfaction.com/mozilla_messaging/tags/bug_558931, according to gs-bmo linking policies in
https://wiki.mozilla.org/Thunderbird/Support/Workflow
Comment 9 [:Aureliano Buendía] 2012-01-29 04:29:07 PST
*** Bug 719480 has been marked as a duplicate of this bug. ***
Comment 10 Magnus Melin 2012-02-27 05:06:10 PST
*** Bug 405190 has been marked as a duplicate of this bug. ***
Comment 11 Thomas D. (currently busy elsewhere; needinfo?me) 2012-07-26 06:48:38 PDT
*** Bug 775331 has been marked as a duplicate of this bug. ***
Comment 12 Thomas D. (currently busy elsewhere; needinfo?me) 2012-08-03 00:58:16 PDT
*** Bug 779045 has been marked as a duplicate of this bug. ***
Comment 13 Thomas D. (currently busy elsewhere; needinfo?me) 2012-09-27 08:46:40 PDT
*** Bug 793991 has been marked as a duplicate of this bug. ***
Comment 14 Thomas D. (currently busy elsewhere; needinfo?me) 2012-09-27 08:47:39 PDT
*** Bug 234422 has been marked as a duplicate of this bug. ***
Comment 15 Thomas D. (currently busy elsewhere; needinfo?me) 2012-11-22 00:28:35 PST
*** Bug 747854 has been marked as a duplicate of this bug. ***
Comment 16 Thomas D. (currently busy elsewhere; needinfo?me) 2012-12-21 14:19:52 PST
Partial/substring matching (and split multi-searchword matching) would make address autocomplete by order of magnitudes more useful. I keep running into this as I'm searching for those less-known addresses.

E.g. if I have an addressbook entry
  John Doe (ps-branch) <themanager.park-street@somebank.com>,
where "John Doe (ps-branch)" is the display name, the following autocomplete searches will NOT find that contact (without quotes):
"Doe John","ps-branch","manager","park","street","bank","bank.com", "<theman", "bank.com>"

Expected: For all of these search words, autocomplete should find & list that entry.

Mark (:standard8), could you point out where the autocomplete code lives?
Comment 17 Thomas D. (currently busy elsewhere; needinfo?me) 2013-01-18 12:07:52 PST
*** Bug 824732 has been marked as a duplicate of this bug. ***
Comment 19 freeman3 2013-02-04 14:22:23 PST
this feature is already implemented in address book so i guess it should not be difficult to do
Comment 20 Mark Banner (:standard8) 2013-03-06 07:27:30 PST
(In reply to Thomas D. from comment #16)
> Partial/substring matching (and split multi-searchword matching) would make
> address autocomplete by order of magnitudes more useful. I keep running into
> this as I'm searching for those less-known addresses.
> 
> E.g. if I have an addressbook entry
>   John Doe (ps-branch) <themanager.park-street@somebank.com>,
> where "John Doe (ps-branch)" is the display name, the following autocomplete
> searches will NOT find that contact (without quotes):
> "Doe John","ps-branch","manager","park","street","bank","bank.com",
> "<theman", "bank.com>"
> 
> Expected: For all of these search words, autocomplete should find & list
> that entry.
> 
> Mark (:standard8), could you point out where the autocomplete code lives?

iirc this is a duplicate of existing bugs that discuss the same sort of issues - and I'm fairly sure at least one had a partial patch. Depending on which autocomplete you're looking (compose versus ab) at then I think you want to take a look into preferences as the autocomplete formats are defined there iirc.
Comment 21 Thomas D. (currently busy elsewhere; needinfo?me) 2013-03-09 05:59:03 PST
Tentative, very wide search of autocomplete bugs:

product:core,toolk,mailn,thun,seam su:auto su:complet -su:ldap,css,locat,sess,form,litmus,URL,Uri,awesome  -comp:histo,secur,disab,find,javasc,passw,search,places,graph,form,dom,locat

https://bugzilla.mozilla.org/buglist.cgi?quicksearch=product%3Acore%2Ctoolk%2Cmailn%2Cthun%2Cseam%20su%3Aauto%20su%3Acomplet%20-su%3Aldap%2Ccss%2Clocat%2Csess%2Cform%2Clitmus%2CURL%2CUri%2Cawesome%20%20-comp%3Ahisto%2Csecur%2Cdisab%2Cfind%2Cjavasc%2Cpassw%2Csearch%2Cplaces%2Cgraph%2Cform%2Cdom%2Clocat&list_id=5945837
Comment 22 Thomas D. (currently busy elsewhere; needinfo?me) 2013-03-09 06:02:43 PST
Comment on attachment 469478 [details]
testcase: sample address book card (test address book, import as csv)

>First Name,Last Name,Display Name,Nickname,Primary Email,Secondary Email,Screen Name,Work Phone,Home Phone,Fax Number,Pager Number,Mobile Number,Home Address,Home Address 2,Home City,Home State,Home ZipCode,Home Country,Work Address,Work Address 2,Work City,Work State,Work ZipCode,Work Country,Job Title,Department,Organization,Web Page 1,Web Page 2,Birth Year,Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes,
>Jane Susan,Doe-Miller More,"Display, Janette (sample company)",Nicky Jenny,foo.bar@subdomain.mailserver.com,addy.baz@asdf.com,screeny weeny,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Comment 23 Thomas D. (currently busy elsewhere; needinfo?me) 2013-03-09 06:39:34 PST
(In reply to Mark Banner (:standard8) from comment #20)
> (In reply to Thomas D. from comment #16)
> > Mark (:standard8), could you point out where the autocomplete code lives?
> 
> iirc this is a duplicate of existing bugs that discuss the same sort of
> issues - and I'm fairly sure at least one had a partial patch.

If you could mention those bugs you're thinking of, especially the one with a partial patch, that would be great. Tentative search of comment 21 might be a starting point.

> Depending on
> which autocomplete you're looking (compose versus ab) at then I think you
> want to take a look into preferences as the autocomplete formats are defined
> there iirc.

I don't understand what you mean by "look into preferences"? I was hoping for an MXR-reference to the code that handles autocompletion in composition and/or AB.

Concerning "where-to-search", this is only about composition, because AB already searches for substrings matching "search words". That's the most crucial change advocated by this bug.

Concerning "handle-multiple-search-words", both AB and compose handle multi-word searches as an exact string instead of splitting words and search for each word (like quickfilter). If required, that could be handled in a separate bug.
Comment 24 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 00:50:26 PDT
Aceman, Richard (:paenglab), as we approach TB24 for which features should be ready by 24th June 2013, I'd like to draw your attention to this awesome papercut bug! Perhaps you could give this a try?

I imagine that once we know the position of the code, this might actually be relatively easy to fix... Wayne, can you add Neil to find out where the code lives?

Google, TB global search, TB quick filter, TB AB quick search, you name it - every other search algorithm is better than the current address autocomplete which stubbornly looks at the beginning of fields only, and reads multiple search words as a single string, instead of searching for {contains *word1* anywhere AND contains *word2* anywhere}. For every other search, even in TB, adding more search words helps to refine the search - with address autocomplete, you will get nothing instead unless you type exactly what is there from the beginning of the field...
Comment 25 Suyash Agarwal (:sshagarwal) 2013-06-16 01:12:05 PDT
For multiple words in the search term, what do you want as the output?
That is, the result should contain every word from the search term, or even a single word from the search term will suffice for the result to be displayed.
Comment 26 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 01:49:25 PDT
(In reply to Suyash Agarwal (:sshagarwal) from comment #25)
> For multiple words in the search term, what do you want as the output?
> That is, the result should contain every word from the search term, or even
> a single word from the search term will suffice for the result to be
> displayed.

Result should contain(!) *every* word from the search term, but it should not matter
- whether each word is at the beginning of a searched field (like Display Name) or not
- whether word1 and word2 are adjacent in one field, or both in the same field at all, or not

So it should search for matches having
*word1* in any field AND *word2* in any field
Comment 27 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 01:52:32 PDT
Comment 6 has a testcase AB card (attachment 469478 [details], csv for import as an AB) to expose the behaviour and avoid false positives.

Comment 16 illustrates the problem of this bug that searching for any string that does not match the exact beginning of fields like Display Name, First Name, Last Name will fail, and multiple words are inflexibly treated as a single string which again is only found if at the beginning of fields.

Compare the search algorithms:

Search for:

a) "string1" (one word; without quotes)
b) "string1 string2" (two words; without quotes)

1) Composition's recipient autocomplete (this bug)

a) Match string1* in any target field of AB cards (single string at the beginning of fields)
b) Match "string1 string2*" in any target field (single concatenated string at the beginning of fields)

Flexibility/success of searching: bad

2) AB's address quicksearch

Match "*string1 string2*" in any target field (single concatenated string anywhere in the fields)

Flexibility/success of searching: medium (already much better than 1).


*** Expected result for this bug: **********************************************

3) Message list's Quick Filter (similar to google) - 

Match *string1* in any target field AND *string2* in any target field (both strings must be contained as substrings anywhere in the target, allowing for free-style searches and easy narrowing down of results by adding further clues)

Flexibility/success of searching: best (by order of magnitudes better than 1!).
Comment 28 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 02:25:05 PDT
(In reply to Thomas D. from comment #27)
(Oops, posted too soon before completing the patterns, let's try that again)

Comment 6 has a testcase AB card (attachment 469478 [details], csv for import as an AB) to expose the behaviour and avoid false positives.

Comment 16 illustrates the problem of this bug that searching for any string that does not match the exact beginning of fields like Display Name, First Name, Last Name will fail, and multiple words are inflexibly treated as a single string which again is only found if at the beginning of fields.

Compare the search algorithms:

Search for:

a) "string1" (one word; without quotes)
b) "string1 string2" (two words; without quotes)

1) Composition's recipient autocomplete (this bug)

a) Match string1* in any target field of AB cards (single string at the beginning of fields)
b) Match "string1 string2*" in any target field (single concatenated string at the beginning of fields)

Flexibility/success of searching: bad

2) AB's address quicksearch

a) Match *string1* in any target field (ok)
b) Match "*string1 string2*" in any target field (single concatenated string anywhere in the fields -> cannot narrow down results efficiently)

Flexibility/success of searching: medium (already better than 1).


*** Expected Result for this bug: **********************************************

3) Message list's Quick Filter (similar to google) - (expected result for this bug)

a) Match *string1* in any target field
b) Match *string1* in any target field AND *string2* in any target field (both strings must be contained anywhere in the target -> allowing for free-style searches and easy narrowing down of results by adding further clues)

Flexibility/success of searching: best (by order of magnitudes better than 1, and better than 2).

-------------
E.g. using Quick Filter, you can just search for [Peter holiday] to find messages that have *Peter* e.g. as sender OR recipient, AND *holiday* e.g. as a subject, so it's very simple to narrow down search results quickly

Same here, you'd like to just search for e.g. [Ann moz] as a fuzzy search to find any addresses involving a person you remember vaguely as Anne-Mary (or was it Maryanne, or Anne-Mary, or Marianne?), but you know she works at Mozilla...

So we should find results having:

(*ann* in Display Name OR First Name OR Last Name OR Nick [OR...])
AND
(*moz* in Display Name OR First Name OR Last Name OR Nick [OR...])

e.g. the following matches:

Mary-Anne <ma@insidemozilla.com>
Mary Anne <ma@insidemozilla.com>
Anne-Mary <am@insidemozilla.com>
Anne Mary <am@insidemozilla.com>
Annemary <am@insidemozilla.com>
Maryanne <ma@insidemozilla.com>
Bunny <Marianne@insidemozilla.com>
A News Nerd (ANN) <nerd@mozilla.com>
Anne-Mary (Mozilla) <am@privateaddress.com>
Mary-Anne (Moz Rep) <ma@privateaddress.com>
Comment 29 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 02:31:01 PDT
needinfo? Wayne: pls add/contact Neil -> see comment 24
Comment 30 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 04:30:02 PDT
Bug 529584, attachment 744047 [details] has a wip patch (tryout sketch) trying to address this issue for the primary email field using javascript indexOf() function:

Current algorithm...

(email.toLocaleLowerCase().startsWith(fullString))

...should be replaced with:

(email.toLocaleLowerCase().indexOf(fullString) > -1)

Or since we are using Mozilla's Gecko, suppose we could even use:

(email.toLocaleLowerCase().contains(fullString))

--------
Involved files (per Bug 529584 Comment 22 by :mconley):

(source file)
http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js

> Yes, that is one of the files. The LDAP autocomplete implementation can be
> found under nsLDAPAutoCompleteSession.cpp, but I believe
> nsAbAutoCompleteSearch.js is what you need for standard Mork address books.

(source file)
http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsLDAPAutoCompleteSession.cpp

(mxr search)
http://mxr.mozilla.org/comm-central/find?string=nsLDAPAutoCompleteSession&tree=comm-central&hint=

---------

(In reply to Thomas D. from comment #28)
> *** Expected Result for this bug:
> **********************************************
> So we should find results having:
> 
> (*ann* in Display Name OR First Name OR Last Name OR Nick [OR...])
> AND
> (*moz* in Display Name OR First Name OR Last Name OR Nick [OR...])
> 
> e.g. the following matches:
> 
> Mary-Anne <ma@insidemozilla.com>
> Mary Anne <ma@insidemozilla.com>
[snip]
> Mary-Anne (Moz Rep) <ma@privateaddress.com>

More examples/scenarios and the expected algorithm in Bug 529584 Comment 16:

> {(Firstname contains string1) OR (Lastname contains string1) OR
> (Displayname contains string1) OR (Email1 contains string1) OR ...}
> AND 
> {(Firstname contains string2) OR (Lastname contains string2) OR
> (Displayname contains string2) OR (Email1 contains string2) OR ...}

Bug 529584 is a small subset of this bug covering only the email address fields, so I think it's best to do the work here.
Comment 31 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 04:31:27 PDT
(In reply to Thomas D. from comment #30)
> Or since we are using Mozilla's Gecko, suppose we could even use:
> (email.toLocaleLowerCase().contains(fullString))

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/contains
Comment 32 Mihovil Stanic [:Mikeyy - L10n HR] 2013-06-16 05:31:12 PDT
I would also like to add that sorting of results is also very important.
If there is more then 1 result my suggestion is to sort them:

1st = most used contact
2nd = second most used contact
3rd = third most used contact
etc...

or

1st = most used contact
2nd = first contact when sorting alphabeticaly
3rd = second contact when sorting alphabeticaly
etc...

In any case, it would be good if first result stays most used contact (received/sent emails). I think that's the way it's now.
Comment 33 Wayne Mery (:wsmwk, NI for questions) 2013-06-16 05:43:41 PDT
Thomas, is there anything left for Neil to research?
Comment 34 Ulf Zibis 2013-06-16 06:13:32 PDT
(In reply to Suyash Agarwal (:sshagarwal) from comment #25)
> For multiple words in the search term, what do you want as the output?
> That is, the result should contain every word from the search term, or even
> a single word from the search term will suffice for the result to be
> displayed.

I think, we should prioritize the search results, i.e. list the highest first:
In case of "S1 S2" search for:
1. S1* AND S2*
2. *S1* AND *S2*
3. S1* OR S2*
4. *S1* OR *S2*

The search should be restricted on address card *name and *email fields, maybe company field with 1st priority for S1 as first for "matching words in field" or "first name, name, display name, company" and then a 2nd priority on *name fields before *email and company fields.
Si* is meant as match on the beginning of any "word" in those fields.

So "p m" should find in following order:
Paul Müller
Peter Mayer
Peter Hammer
Ralph Mayer
Happy Worm
Peter Anyone
Ann Pest
Mike Anyone
Ann Müller
Rapper
Hammer

.. and after again prioritize with some weight on "frequently used in near past".
Comment 35 Ulf Zibis 2013-06-16 06:22:06 PDT
(In reply to Mihovil Stanic [:Mikeyy - L10n HR] from comment #32)
> In any case, it would be good if first result stays most used contact
> (received/sent emails). I think that's the way it's now.

I would say, only value *sent* emails.
Imagine, one receives many emails from bugzilla-daemon, then search for "bu" should prioritize buddy, as he never used bugzilla-daemon for sent.
Comment 36 :aceman 2013-06-16 06:52:29 PDT
The search logic already sorts the results by popularity so comment 32 should be covered. Not sure when 'popularity' is increased so that could be looked at.
Comment 37 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 07:03:08 PDT
(In reply to Wayne Mery (:wsmwk) from comment #33)
> Thomas, is there anything left for Neil to research?

Thanks for the followup. No, probably not now. I had already done more research on Bug 529584 than I thought I had done, so it was just a matter of making that information available where it's needed - here! :)
Comment 38 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 07:45:36 PDT
(In reply to Ulf Zibis from comment #34)

Ulf, nice to see you again.

> 3. S1* OR S2*
> 4. *S1* OR *S2*

Sorry, but implementing an OR search to find cards that match just one out of two or more searchwords is clearly NOT part of this bug, and should not be discussed here to avoid further distraction.

Doing an OR search of multiple search words against the card will INCREASE the number of results EXPONENTIALLY when you add more words, which is directly opposed to the intention of finding a SINGLE recipient to add, i.e. starting out broadly and then narrow down results by adding more search terms.

OR search of multiple search words does not make sense for recipient autocomplete in its current shape at all, because it is designed to select a single recipient at a time.

To use your example below, if you want to search for all persons containing "p" like "Peter Anyone", you should search for just "p". If you want to search for all persons having "m" like "Mike Anyone", search for just "m". You can't add them both at the same time, so returning all those results at the same time is entirely useless for a single-row recipient field. The useful and intended use pattern here would be to start out with [pe] which returns...

-> Peter Mayer, Peter Hammer, Peter Anyone, Ann Pest, Rapper

...and then proceed from there adding another unique string like [ma] to make it [pe ma] which returns:

-> Peter Mayer

...done - found your single recipient out of 11 with typing only 4 letters.

I'm not saying there couldn't be merit for multi-term OR searches when adding recipients; it might be useful as an option for other types of searches like AB Quicksearch where you can act on multiple results simultaneously. If you want to add multiple recipients at a time, you can use Composition's Contacts Sidebar (F9), or Address Book.

> So "p m" should find in following order:
> Paul Müller
> Peter Mayer
> Peter Hammer
> Ralph Mayer
> Happy Worm

Above looks good, but I'm not sure how that combines with popularity index, and we need to check the runtime footprint of doing so: Can we afford to run "startsWith" search first and then do "contains" search on the same set in another separate loop?

FTR: Entries below should NOT be found by single-recipient auto-complete when searching for [p m]:

> Peter Anyone
> Ann Pest
> Mike Anyone
> Ann Müller
> Rapper
> Hammer
Comment 39 Ulf Zibis 2013-06-16 07:46:09 PDT
(In reply to Ulf Zibis from comment #34)
Note: This is how it works on Palm handhelds and smart phones since years, with the exception, that "pma" is prioritized as "p ma".

Additionally prioritize "Müller" before "P" in pattern "P Müller".
Comment 40 Ulf Zibis 2013-06-16 07:51:20 PDT
(In reply to Thomas D. from comment #38)
> (In reply to Ulf Zibis from comment #34)
> 
> Ulf, nice to see you again.
> 
> > 3. S1* OR S2*
> > 4. *S1* OR *S2*
> 
> Sorry, but implementing an OR search to find cards that match just one out
> of two or more searchwords is clearly NOT part of this bug, and should not
> be discussed here to avoid further distraction.
>
> > So "p m" should find in following order:
> > Paul Müller
> > Peter Mayer
> > Peter Hammer
> > Ralph Mayer
> > Happy Worm

Accepted!
Comment 41 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 08:38:42 PDT
Codewise, current search algorithm for local search (vs. LDAP) apparently lives here:

http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#158

Search against email address fields:

169   _searchWithinEmails: function _searchWithinEmails(searchQuery, fullString,
170                                                     directory, result) {

Search against other card fields like names:

http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#197

203    * @param fullString  The full search string.
204    * @param firstWord   The first word of the search string.
205    * @param rest        Anything after the first word.

209   _checkEntry: function _checkEntry(card, emailToUse, fullString, firstWord,
210                                     rest) {
211     var i;
212     if (card.isMailList) {
213       return card.displayName.toLocaleLowerCase().startsWith(fullString) ||
214         card.getProperty("Notes", "").toLocaleLowerCase().startsWith(fullString) ||
215         card.getProperty("NickName", "").toLocaleLowerCase().startsWith(fullString);
216 <<<[Snip]>>>

I think we want to do several things here:

* Consider keeping this function, slightly modified, to run multi-word single-string searches enclosed in double quotes: ["John Doe"]

* Create a copy of this funtion for the more flexible use discussed here
- fullString would be enough for input, then we split up into double-quoted phrases (first) and single words (second)
- for each word or quoted phrase as searchterm, go through the loop
- in the loop, swap .startsWith(searchterm) against .contains(searchterm)
- popularityIndex stuff remains untouched and happens automatically after that afasics

With those changes, let's see what we get - I'm expecting much more efficient searches with better results:

* choose both (or all) words with uniqueness in mind, and at least three letters:
[Pet May] to find Peter Mayer
-> first word: more hits than now, but most popular still on top
-> second word: if chosen well, will drastically narrow down results; combined with most popular on top, should deliver the perfect match most of the time, while allowing for a very flexible search where you can use any little piece that you remember, and be sure that nothing escapes you
-> if you're still not happy (too many results), add another little term or two to make it more unique

=> The whole UX of this algorithm is much better than now where for many legitimate usecases we don't find things at all and then need to start a new search, and some terms that happen to be in the middle of card fields (esp. Display Name!), even when they are actually separate words, will never be found.

This algorithm has been tested fireproof, see TB Quick Filter (works like a charme), or BMO quicksearch...

E.g., to find this bug out of more than 10.000 TB and MailNews Core bugs, this simple search will do the trick, by doing {contains(word1) AND contains(word2)} search, i.e. search for *word1* and *word2* anywhere in the summary field:

unique words that you remember:
partial, autocomplete

bmo quicksearch for:
:thun,mail su:artia su:aut
-> 1 result found (out of 10.000+)
Comment 42 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 08:52:42 PDT
(In reply to Thomas D. from comment #41)
> * choose both (or all) words with uniqueness in mind, and at least three
> letters:
> [Pet May] to find Peter Mayer

Where the example was badly chosen because that one already works if Peter doesn't have a middle name... So for this bug, we're trying to make...

[aye][ete] find "Peter John Mayer"

...irrespective of whether partial terms are found in first name, last name, or display name, email, nick, or company/organization, or shared between these fields, and irrespective of their word order inside the fields...

There's even an attempted existing name swap algorithm (1) involving rest string to match reverse combinations of first and last name, but it's very poor and will fail for a lot of cases which we are trying to fix here, e.g. middle names:

226     if (firstWord && rest &&
227         ((firstName.startsWith(firstWord) &&
228           lastName.startsWith(rest)) ||
229          (firstName.startsWith(rest) &&
230           lastName.startsWith(firstWord))))
231       return true;

(1) http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#226
Comment 43 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 08:57:28 PDT
(In reply to Thomas D. from comment #41)
> I think we want to do several things here:
[snip]

that's a very rough recipe which is far from completely thought out, but you get the idea that we want need to construct a new search algorithm from what is there, one that caters for our needs as described in this bug along Comment 30.
Comment 44 :aceman 2013-06-16 09:07:32 PDT
As I said in bug 529584 I do not like searching for strings inside of names (opposite to emails).
So I would agree with:
[Pet May] to find Peter Mayer

but not with:
[aye][ete] find "Peter John Mayer"

I think you get many irrelevant results this way. Also I do not think the brain works this way. I think if you remember a persons name, then you remember it in full or the start of it. But not the middle part. On the other hand, in emails (bug 529584) his name may appear anywhere in the address (e.g. mayer.peter@domain) so we can use 'contains'.
Comment 45 Ulf Zibis 2013-06-16 09:20:25 PDT
(In reply to :aceman from comment #44)
> As I said in bug 529584 I do not like searching for strings inside of names
> (opposite to emails).
> So I would agree with:
> [Pet May] to find Peter Mayer
> 
> but not with:
> [aye][ete] find "Peter John Mayer"

Don't forget: "mar ann" should find:
Mary-Ann
Mary Ann
Marie-Ann
Ann Maria
Ann-Mary
Marianne
Comment 46 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 09:26:34 PDT
(In reply to Thomas D. from comment #30)
> Bug 529584, attachment 744047 [details] has a wip patch (tryout sketch)
> trying to address this issue for the primary email field using javascript
> indexOf() function:
> 
> Current algorithm...
> 
> (email.toLocaleLowerCase().startsWith(fullString))
>
> ...should be replaced with:
>
> (email.toLocaleLowerCase().indexOf(fullString) > -1)

This is flawed again, doing too much bugmail, time for a break!
Of course we need .indexOf(splitStringArray[i]) here or something like that, not fullstring.

> Or since we are using Mozilla's Gecko, suppose we could even use:
> 
> (email.toLocaleLowerCase().contains(fullString))

dito -> splitString, not fullstring

Sorry for too much bugmail...
Comment 47 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-16 11:24:05 PDT
(In reply to :aceman from comment #44)
> As I said in bug 529584 I do not like searching for strings inside of names
> (opposite to emails).
> So I would agree with:
> [Pet May] to find Peter Mayer
> 
> but not with:
> [aye][ete] find "Peter John Mayer"
> 
> I think you get many irrelevant results this way. Also I do not think the
> brain works this way. I think if you remember a persons name, then you
> remember it in full or the start of it. But not the middle part. On the
> other hand, in emails (bug 529584) his name may appear anywhere in the
> address (e.g. mayer.peter@domain) so we can use 'contains'.

Proceed from there and you'll see very soon that the same applies to all name fields... I really don't see any alternative to multi-word parsing (at least), and that's even more powerful combined with substring search (at best).

To pick up your example, you said you want
[Pet May] to find Peter Mayer.

Suppose you just have email field filled on his card, no name fields.
So you actually want
[Pet May] to find <mayer.peter@domain> or <petermayer@domain>

For that really simple one already (where we currently fail), you already need to split up the search words and do a substring search:

if email.contains("Pet") AND email.contains("May") -> found

I.e. you have to search for *Pet* and *May* separately, anywhere in the string.
More so for <petermayer@domain.com> where there is no visible boundary between the names, and yet it's a perfectly useful search.

But what if your card has:

first name: Peter
email: Mayer@domain

OR another card has

last name: Mayer
email: Peter@domain

search words: [Pet May]

So to catch those two cards, you need (reduced for illustration purposes):

{firstname.contains("Pet") OR email.contains("Pet")}
AND
{firstname.contains("May") OR email.contains("May")}

Which is exactly as proposed by this bug.
Anything else simply won't find the cards you are looking for.
Same applies for all name fields:

Search: [Ann Mar]

"First name" field could have:
1) Anne Sophie Marie
2) Anne-Sophie Marie,
3) Anne Marie,
4) Anne-Marie,
5) Annemarie,
6) Annemary...

To catch anything, you need to split searchwords (at least).
To catch them all, there's no way except split multiword AND substring search (using contains).

> I think you get many irrelevant results this way. Also I do not think the
> brain works this way. I think if you remember a persons name, then you
> remember it in full or the start of it. But not the middle part. On the
> other hand, in emails (bug 529584) his name may appear anywhere in the
> address (e.g. mayer.peter@domain) so we can use 'contains'.

As shown above, the need to search for the "middle part" of field is not limited to the email fields at all. Without split-word AND substring search, you will have major limitations regarding flexibity and effectiveness of your search, even if you actually only want to search for the beginnings of names as you say (and no one will force you to search for crazy things like [aye ete] ;)).

I think it's not true that you'll get many irrelevant results this way - several intelligent searches have evolved this way and are successfully using exactly this algorithm, e.g.:

Firefox Awesomebar, TB Quick Filter Bar, BMO quicksearch, etc.

I don't see a big risk for false positives: for short terms, you'll get false positives even if you search only at the beginning of words. For any search terms having 3 or 4+ characters, especially if you search for the "beginning" of those names/emails you remember, it's very unlikely to get false positives.

[Pete Maye] - how many cards will have *both* of these terms as a false positive in the /middle/ of some fields?

The point being that with the algorithm proposed in this bug, we are trying to solve the big problem that depending on data structure in the card (which is unpredictable), even when you search for full names like...

[Peter Mayer]

...current search algorithm will actually *fail* for a big range of frequent cases, e.g. for all of the following:

"Display Name" field on card having:

Hans-Peter Mayer -> not found
Hans Peter Mayer -> not found
Mayer Peter -> not found (partially works for other fields, see comment 42).
Mayer, Peter -> not found
Peter Petermayer -> not found (it's a surname in German, I think...)
Peter Johnson-Mayer -> not found
etc. pp.

Moreover, consider cases where people change their name from single-surname to double-surname upon marriage, and you have both types in your AB.

In fact, the algorithm proposed in this bug works *especially* well for the dataset at hand, because names, emails, nicks, company names all tend to be rather unique in their spellings, and the dataset is relatively small (for private users at least). The new algorithm will actually *assist* you to easily narrow down even on big sets with non-unique properties:

David@gmail.com
David@googlemail.com
David@mozilla.com
David@mozmill.com
David@getmozilla.com
David@microsoft.com

So you want to find that David with a gmail account? Don't remember if it was gmail or googlemail? No problem. Just hack in:

[dav mail] and you're there - dav@moz and dav@ms filtered out.
[moz] to get anything containing moz, anywhere in their name, email, display name, whereever. How likely that "moz" will trigger false positives from the middle of field values in other cards?

It's the smart way of searching.
Comment 48 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-21 03:05:17 PDT
(In reply to Ulf Zibis from comment #45)
> (In reply to :aceman from comment #44)
> > As I said in bug 529584 I do not like searching for strings inside of names
> > (opposite to emails).
> > So I would agree with:
> > [Pet May] to find Peter Mayer
> > 
> > but not with:
> > [aye][ete] find "Peter John Mayer"
> 
> Don't forget: "mar ann" should find:
> Mary-Ann
> Mary Ann
> Marie-Ann
> Ann Maria
> Ann-Mary
> Marianne

+1, good examples why this is useful
Comment 49 Thomas D. (currently busy elsewhere; needinfo?me) 2013-06-21 03:12:43 PDT
(In reply to Thomas D. from comment #48)
> (In reply to Ulf Zibis from comment #45)
> +1, good examples why this is useful

(In reply to :aceman from Bug 529584 Comment 32)
> Comment on attachment 763273 [details] [diff] [review]
> patch v2

That patch, if approved, implements the "contains" logic suggested here (but not yet the split-multiword logic), which would be a significant improvement over the status quo

Aceman says:
> I have no problem with that if the reviewers agree with you. Bwinton,
> mkmelin?

9 duplicates and 20 votes on GS also agree
Comment 50 Wayne Mery (:wsmwk, NI for questions) 2013-06-24 07:41:50 PDT
(In reply to Thomas D. from comment #49)
> Aceman says:
> > I have no problem with that if the reviewers agree with you. Bwinton,
> > mkmelin?
> 
> 9 duplicates and 20 votes on GS also agree

An indication of the strength/need of this request is:
- many of those duplicates are recent bug reports
- I easily found 3 more GS topics (I tagged them appropriately, but stopped searching after finding 3)

If there is concern about too many results, perhaps there can be a slight delay in showing the initial results as long as the user is still typing.  

And I doubt too many results will be a problem for most people I bet the average user has less than 500-1000 contacts ... ldap users being an exception.
Comment 51 Suyash Agarwal (:sshagarwal) 2013-10-29 05:50:28 PDT
So, after bug 529584 has landed, can you please tell what exactly is asked out of this bug?
Comment 52 :aceman 2013-10-29 06:03:30 PDT
There are some feature descriptions starting at comment 27.
Thomas D. will be the mentor here :)
Comment 53 Suyash Agarwal (:sshagarwal) 2013-11-05 04:20:58 PST
(Quoting myself from comment #51)
> So, after bug 529584 has landed, can you please tell what exactly is asked
> out of this bug?
Comment 54 Thomas D. (currently busy elsewhere; needinfo?me) 2013-11-05 04:48:54 PST
(In reply to Suyash Agarwal (:sshagarwal) from comment #53)
> (Quoting myself from comment #51)
> > So, after bug 529584 has landed, can you please tell what exactly is asked
> > out of this bug?

bug 529584 only handles single-word searches in a useful way (I'm not using exact syntax here, just logical sketch):

foo -> field1.contains(foo) OR field2.contains(foo) etc.

Which is an improvement over status quo where we had
foo -> field1.beginswith(foo) OR field2.beginswith(foo) etc.

However, after bug 529584, we are still useless for multi-word searches:
foo bar -> field1.contains("foo bar") OR field2.contains("foo bar")
So only field that have both words combined as a single string will match, which is counterintuitive and unhelpful.

But what users really expect (and much more useful per countless examples in this bug), is this:
foo bar -> field1.contains(foo OR bar) AND field2.contains(foo OR bar) (or any logical equivalent thereof).

Iow, use google syntax where spaces imply logical AND, but each word is searched separately, not all words as a single string.

Codewise, we have to break up the original (multi-word) search string into single words (using spaces) BEFORE we do the real searches against fields.
Comment 55 Suyash Agarwal (:sshagarwal) 2013-11-29 22:45:27 PST
Created attachment 8340602 [details] [diff] [review]
Patch v1

Okay, so let's begin the iterations.
Are these results acceptable?
I am looking for every word separately in all the fields, and any contact having all the punched in words in its contact fields, is added to the result list.
So, this makes the search for first name, last name and email separately redundant.

Though, maybe we will need them back for prioritizing the results, or maybe modification to _getPopularityIndex() will do. That is to be looked at if these results aren't in the order desired.

So, probably we need finest testers who will probably kill this patch at the first hand (I already selected the finest I know, for feedback).

Please let me know if this works.
Thanks.
Comment 56 Mihovil Stanic [:Mikeyy - L10n HR] 2013-11-30 04:17:29 PST
I'm not coder so I don't understand your patch, but have questions.

If I have contact name:
First name: Maric Mara
Last name: PJ Kopanica
Display name: PJ Kopanica, Maric Mara
e-mail: trgovina.kopanica@company.com

When I search for word "kopanica" will it display this contact? Word "kopanica" isn't first word of any string here.
When I seach for word "company" which is present only in email, domain part, will it find this contact?

Thanks for patch.
Comment 57 Suyash Agarwal (:sshagarwal) 2013-11-30 04:25:23 PST
(In reply to Mihovil Stanic [:Mikeyy - L10n HR] from comment #56)
> When I search for word "kopanica" will it display this contact? Word
> "kopanica" isn't first word of any string here.
> When I seach for word "company" which is present only in email, domain part,
> will it find this contact?

Hi! I tried it and this contact was suggested both for "kopanica" and "company".
Thanks for the test case :)
Comment 58 Mihovil Stanic [:Mikeyy - L10n HR] 2013-11-30 04:39:17 PST
I'm sold then. :)

Here's another one, based on last one, just changed first name.

First name: Ann-Marie
Last name: PJ Kopanica
Display name: PJ Kopanica, Ann-Marie
e-mail: trgovina.kopanica@company.com

Does strings
"marie company"
"ari any"

shows this contact?

If it shows, I cannot think of more test cases since this covers whole words and part of words in any place inside the string.
Comment 59 Suyash Agarwal (:sshagarwal) 2013-11-30 05:21:09 PST
(In reply to Mihovil Stanic [:Mikeyy - L10n HR] from comment #58)
> Does strings
> "marie company"
> "ari any"
> shows this contact?

Ya, they do :)

Thanks.
Comment 60 Mihovil Stanic [:Mikeyy - L10n HR] 2013-11-30 06:00:20 PST
Great!
Any chance of fitting this in current version?
This is so good that I want it now. :)
Comment 61 :aceman 2013-11-30 06:54:25 PST
(In reply to Mihovil Stanic [:Mikeyy - L10n HR] from comment #56)
> First name: Maric Mara
> Last name: PJ Kopanica
> Display name: PJ Kopanica, Maric Mara
> e-mail: trgovina.kopanica@company.com
> 
> When I search for word "kopanica" will it display this contact? Word
> "kopanica" isn't first word of any string here.
This should already be matched in TB27 thanks to bug 529584.

The patch here focuses on multi word searches.
Comment 62 :aceman 2013-11-30 07:51:35 PST
So there seems to be some traction here now!

Can we ask everybody who has placed an offer at http://freedomsponsors.org/core/issue/145/autocomplete-does-not-show-all-matching-address-book-entries-implement-partialsubstring-matching-for-each-of-multiple-search-words-find-second-words-etc#sponsors to refresh it please if it is already expired? Please account for enough time to pass for the patch gets into your wished release (starting after the patch is reviewed and accepted which can take weeks from today). E.g. there are about 6-12 weeks until a patch checked into trunk gets into a TB beta.
Comment 63 Sebastian H. [:aryx][:archaeopteryx] 2013-11-30 08:53:58 PST
Comment on attachment 8340602 [details] [diff] [review]
Patch v1

Review of attachment 8340602 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for working on this. There seems to be something wrong with the patch because autocomplete is broken after applying these changes to nsAbAutoCompleteSearch.js. I did it manually, so it could be my fault, but I doubt it because the breakage happens after the changed code due to the modified |searchQuery|.

Error message:
Error: [Exception... "Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIAbDirectory.childCards]"  nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame :: jar:file:///f:/Mozilla/Coding/Thunderbird/Application/central/omni.ja!/components/nsAbAutoCompleteSearch.js :: _searchCards :: line 134"  data: no]
Source File: jar:file:///f:/Mozilla/Coding/Thunderbird/Application/central/omni.ja!/components/nsAbAutoCompleteSearch.js
Line: 134

|this._searchCards(searchQuery, dir, result);| gets called after your modifications and what I got as object when inspecting |this._abManager.getDirectory(directory.URI + searchQuery)| didn't contain |childCards|.

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +8,5 @@
>  
>  const ACR = Components.interfaces.nsIAutoCompleteResult;
>  const nsIAbAutoCompleteResult = Components.interfaces.nsIAbAutoCompleteResult;
>  
> +let cu = Components.utils;

Remove that (together with the commented code which used it before).

@@ +172,5 @@
>    _searchWithinEmails: function _searchWithinEmails(searchQuery, fullString,
>                                                      directory, result) {
>      let childCards =
>        this._abManager.getDirectory(directory.URI + searchQuery).childCards;
> +    

whitespaces on empty line

@@ +409,5 @@
>      {
>        // Construct the search query; using a query means we can optimise
>        // on running the search through c++ which is better for string
>        // comparisons (_checkEntry is relatively slow).
> +      Components.utils.reportError("to search: " + fullString);

Remove the debug code.

@@ +414,5 @@
> +      let list = fullString.split(" ");
> +      let searchQuery = "";
> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(and(IsMailList,=,TRUE)(Notes,c,@V))(PrimaryEmail,c,@V)(SecondEmail,c,@V))";
> +      for (let i = 0; i < list.length; ++i) {
> +        searchQuery = searchQuery + modelQuery.replace(/@V/g, encodeURIComponent(list[i]));

"= searchQuery +" > "+="

@@ +419,3 @@
>        }
> +      searchQuery = "(and" + searchQuery + ")";
> +      // cu.reportError(searchQuery);

Remove debug code.

@@ +432,5 @@
>  
>          if (dir instanceof Components.interfaces.nsIAbDirectory &&
>              dir.useForAutocomplete(params.idKey)) {
>            this._searchCards(searchQuery, dir, result);
> +      //    this._searchWithinEmails(emailSearchQuery, fullString, dir, result);

Remove this line.
Comment 64 Suyash Agarwal (:sshagarwal) 2013-11-30 10:29:49 PST
Created attachment 8340673 [details] [diff] [review]
Patch v1 (cleaned up)

Cleaned up, removed the comments and debugging code.
Comment 65 Suyash Agarwal (:sshagarwal) 2013-11-30 10:31:41 PST
(In reply to Archaeopteryx [:aryx] from comment #63)
> Comment on attachment 8340602 [details] [diff] [review]
> Patch v1
> Thank you for working on this. There seems to be something wrong with the
> patch because autocomplete is broken after applying these changes to
> nsAbAutoCompleteSearch.js. I did it manually, so it could be my fault, but I
> doubt it because the breakage happens after the changed code due to the
> modified |searchQuery|.
> 
> Error message:
> Error: [Exception... "Component returned failure code: 0xc1f30001
> (NS_ERROR_NOT_INITIALIZED) [nsIAbDirectory.childCards]"  nsresult:
> "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame ::
> jar:file:///f:/Mozilla/Coding/Thunderbird/Application/central/omni.ja!/
> components/nsAbAutoCompleteSearch.js :: _searchCards :: line 134"  data: no]
> Source File:
> jar:file:///f:/Mozilla/Coding/Thunderbird/Application/central/omni.ja!/
> components/nsAbAutoCompleteSearch.js
> Line: 134
> 

I don't know why it isn't working for you, its working perfectly fine for me. I too had this error one or two times while writing the patch, but then I corrected the mistakes.
Comment 66 Magnus Melin 2013-12-01 02:56:49 PST
(In reply to :aceman from comment #44)
> As I said in bug 529584 I do not like searching for strings inside of names
> (opposite to emails).
> So I would agree with:
> [Pet May] to find Peter Mayer
> 
> but not with:
> [aye][ete] find "Peter John Mayer"
> 
> I think you get many irrelevant results this way. Also I do not think the
> brain works this way. I think if you remember a persons name, then you
> remember it in full or the start of it. But not the middle part. On the
> other hand, in emails (bug 529584) his name may appear anywhere in the
> address (e.g. mayer.peter@domain) so we can use 'contains'.

+1, I'd even say it's unlikely you'd type in Pet May and expect Peter May****. You'd type in in the full first name.
Comment 67 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-01 14:01:01 PST
(In reply to Magnus Melin from comment #66)

> +1, I'd even say it's unlikely you'd type in Pet May and expect Peter
> May****. You'd type in in the full first name.

If I'm searching for Marianne Mustermann, why would I regularly type the full first name ("Marianne") instead of just typing ma mu? It's a question of habit formation (as already implemented e.g. in awesome bar). As soon as we no longer force users into wasting their time with typing to get exact matches of full multi-word phrases, users will very quickly adapt to that and use more efficient searches. You're still free to type full first name if you insist (which will certainly eliminate any potential false positives for you), but please don't prevent other users from finding what they need in the most efficient way.

I don't know how other people's brain works, but I find it much easier to just type whichever part I remember without worrying about exact matches. I also know that humans have a tendency of (temporarily) forgetting exact spellings or middle names or double names and such and for all of these this bug will address the issue because any bit which you remember will still find you the desired match. For such cases, the brain does work *very much* like "I remember her name was something with ...anne" when I look for Marianne, or Mary-Anne, or Maryanne or you name it. Finding "...anne" in Marianne and Maryanne and Mary-Anne requires the very same algorithm which will eventually happen to succeed using [aye] [ete] to find "Peter John Mayer" but of course these less likely (non-natural) searchwords are just bad examples for a good and successful fuzzy algorithm. Iow, nobody is forcing you to type things like aye ete, but for lots of other users, that very algorithm allows them to find Maryanne by typing "anne" which certainly makes sense. And in fact it even does make a lot of sense to really use non-natural, incomplete parts names if you don't know the exact spelling:

What was his name again?
Wang Shue?
Vang Schuh?
Fang Shu?
Don't rember the first letter? Just skip it. Search for "ang" (as a shorthand for *ang*), and find what you need in no time. Don't remember the exact spelling of last name either? No problem. If all you remember is there was "hu" in his name somewhere, just search [ang hu] and you're there. Nobody is forcing users like Magnus to search that way, but I have not the least of doubt that this is useful for a whole magnitude of scenarios.

Think of fancy letters:
was it Émile, Emile, Èmile, Êmile, or Emilé?
just type "mile" or even "mil" to find the funny thing (just add a few letters of last name if you should have too many matches). And yes, that will obviously correctly find Milean or Milee or whoever too.

Pls also remember there can be more than one word in a single field. I don't think we really want to start making this algorithm even more complicated and less transparent by comparing against the beginning of each word in each field. And I maintain that it will exclude a lot more useful scenarios than potential false positives.

Magnus, I've provided dozens of better examples in many comments on this bug and bug 529584 to prove beyond doubt that 
- this is useful and crucial for finding the right matches for whole range of real-life usecases (omission of which usecases and results is a usability bug at least)
- this will neither in any significant way increase the number of irrelevant results nor otherwise be harmful (and *if* the trade-off was beetween not showing many relevant matches at all or showing a few extra false positives, I'd definitely choose the latter; but after this bug, it's irrelevant because a) the false positives will NOT increase in a significant way and b) both current false positives and potential new false positives will be instantly easily removed by just typing a second search word which will radically shrink the result set, most likely giving you a precise choice of very few matches or even the very match you're looking for.
- the very same search algorithm is already used by many other widely used searches including google, awesome bar(!), TB quick filter, you name it - and they all work like a charme and I'm not aware of any complaints

*Pls re-read all of the comments supplied by me and others on both bugs which prove that this is required and useful exactly as described.*

Searchword having 4 characters or more are highly unlikely to find irrelevant matches; if you only type 3 or less characters, you'll get false positives with *any* algorithm including the current one. 
But let's suppose for a minute that there *were* significantly more irrelevant results; than that's still not an argument against this bug, but an argument in favour of introducing *frecency* into autocomplete (as in awesome bar).

Pls try quickfilter (and I've also recommended that before) and provide real-life examples from your existing address book which will fail after this bug. I can provide plenty real life examples which have failed for me using current algorithm, and others are even putting bounties on this bug because they want it fixed.

*If* you find any significant failures, pls explain why they could not be solved with introducing frecency rather than killing the elephant (block this highly powerful bug) because you want to shoot a fly (avoid potential false positive in very rare edge cases which with the traditional use patterns assumed by you could not even occur).

Pls let's just try the final patch on this bug and then take it from there.
Comment 68 Mihovil Stanic [:Mikeyy - L10n HR] 2013-12-01 14:09:30 PST
+1 for Thomas
I don't see point in downgrading patch functions. It allows everyone to use his own style.
Comment 69 Magnus Melin 2013-12-01 22:52:50 PST
To clarify I wasn't really saying we need to drop that, just I don't think it's not commonly used. Agreed it could help when not remembering exact spellings of names though.
Comment 70 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-02 12:58:31 PST
Comment on attachment 8340673 [details] [diff] [review]
Patch v1 (cleaned up)

Review of attachment 8340673 [details] [diff] [review]:
-----------------------------------------------------------------

Suyash, shamwari (Shona for "friend"), another job well done! :)

This looks really promising. And it's a true shame nobody (including myself) has ever tried to fix this extremely low-cost-high-value bug before...

Some nits of course, and 1 other code path which needs to be massaged in much the same way here (plus LDAP in followup bugs).

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +408,5 @@
>      {
>        // Construct the search query; using a query means we can optimise
>        // on running the search through c++ which is better for string
>        // comparisons (_checkEntry is relatively slow).
> +      let list = fullString.split(" ");

"list" should be something less generic like
"searchwords" (for better readability and less headaches for posterity). Should also have a comment:

// When user's fullstring search expression is "foo bar" (without quotes), search for each word separately, for more versatile search and better, more complete results containing "foo" in any field AND "bar" in any field (see bug 558931 for explanations). Structurally same algorithm as in message quick filter.

@@ +410,5 @@
>        // on running the search through c++ which is better for string
>        // comparisons (_checkEntry is relatively slow).
> +      let list = fullString.split(" ");
> +      let searchQuery = "";
> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(and(IsMailList,=,TRUE)(Notes,c,@V))(PrimaryEmail,c,@V)(SecondEmail,c,@V))";

I like modelQuery, much more readable what this does.

Remove "(and(IsMailList,=,TRUE)", it's just clutter which is no longer needed. Just (NickName,c,@V) is enough, and handles both lists and people correctly, consistently and efficiently.

Fwiw, try this *on Trunk* to see that whatever this was once supposed to do no longer applies, as we no longer have different behaviour for lists and persons wrt to searching the nick field:

1 have person's card with nick = "foo1bar"
2 have mailing list with nick = "foo2bar"
3 compose new msg
4a in recipient field, type "foo" or "bar"
4b in recipient field, type "oo1"
4c in recipient field, type "oo2"

Actual = Expected result:

autocomplete dropdown results showing...
4a ...both the list and the person, because both nicks have "foo" and "bar"
4b ... only the person (so we use .contains search for person's nick)
4c ... only list (so we also use .contains for list's nick)
=> No difference in searching nick name field between cards of type person or mailing list.

And that's good behaviour, no reason to introduce any special cases here and be ux-inconsistent - there's no user-intellegible difference between nicks on persons vs. nicks on lists.

#####

PLUS: The fact that nicks of persons and lists are searched in the same way on trunk already before this patch imo suggests that this C++ search path is currently not used for that scenario. True or not, the patch needs to fix the traditional javascript search as well, found here:

_checkEntry: function...
http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#211
So pls ensure that _checkEntry handles multiple words seperately, too.

I'm not sure why we have two searches for what looks like the same thing, but as long as we have them and until someone can prove beyond doubt that one can be discarded to consolidate them, we should just maintain both of them (think of addons, too). We also just fixed both search constructions in twin bug 529584, and it shouldn't be a big deal.

If at all, I'd recommend to handle potential merging of the two search constructions and respective research in a new bug, not here.

#####

Furthermore: iirc, we still need followup bugs for handling LDAP cases consistently with non-LDAP cases for both this bug 558931 and bug 529584.

@@ +411,5 @@
>        // comparisons (_checkEntry is relatively slow).
> +      let list = fullString.split(" ");
> +      let searchQuery = "";
> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(and(IsMailList,=,TRUE)(Notes,c,@V))(PrimaryEmail,c,@V)(SecondEmail,c,@V))";
> +      for (let i = 0; i < list.length; ++i) {

list -> searchwords (see above)

@@ -437,5 @@
>  
>          if (dir instanceof Components.interfaces.nsIAbDirectory &&
>              dir.useForAutocomplete(params.idKey)) {
>            this._searchCards(searchQuery, dir, result);
> -          this._searchWithinEmails(emailSearchQuery, fullString, dir, result);

OK, you moved and integrated this above which looks reasonable at first sight. However, let's double-check:
- Why was it originally here?
- Did separate searches here originally influence sorting of results? (like general matches first, and email matches later?) - probably not, I think each turn would sort its addresses into the general result set, so we might be ok to consolidate
Comment 71 Ulf Zibis 2013-12-02 16:35:00 PST
(In reply to Thomas D. from comment #70)
> I'm not sure why we have two searches for what looks like the same thing
Do you speak about recipient autocomplete and quick filter here?
I agree, both should use the same algorithm and code.
Comment 72 Sebastian H. [:aryx][:archaeopteryx] 2013-12-03 10:59:01 PST
Comment on attachment 8340673 [details] [diff] [review]
Patch v1 (cleaned up)

Review of attachment 8340673 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +411,5 @@
>        // comparisons (_checkEntry is relatively slow).
> +      let list = fullString.split(" ");
> +      let searchQuery = "";
> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(and(IsMailList,=,TRUE)(Notes,c,@V))(PrimaryEmail,c,@V)(SecondEmail,c,@V))";
> +      for (let i = 0; i < list.length; ++i) {

for (searchword of searchwords) {
searchQuery += modelQuery.replace(/@V/g, encodeURIComponent(searchword);
}
Comment 73 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-03 13:46:14 PST
(In reply to Archaeopteryx [:aryx] from comment #72)
> Comment on attachment 8340673 [details] [diff] [review]
> for (searchword of searchwords) {
> searchQuery += modelQuery.replace(/@V/g, encodeURIComponent(searchword);
> }

Nice! :)
Comment 74 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-03 13:57:13 PST
(In reply to Ulf Zibis from comment #71)
> (In reply to Thomas D. from comment #70)
> > I'm not sure why we have two searches for what looks like the same thing
> Do you speak about recipient autocomplete and quick filter here?

No, mine was about javascript _checkEntry function and the separate searchquery we construct for c++, both found in nsAbAutoCompleteSearch.js and they both seem to do the same thing.

> I agree, both should use the same algorithm and code.

Yes, the *behaviour* of recipient autocomplete, AB quicksearch and main view's quickfilter bar wrt names and addresses should indeed be the same (as far as possible), and this bug goes a long way towards that. I don't think main view's quickfilter bar can have the same *code* because it's another dataset with different structure (the messages; vs. the addressbook).
Comment 75 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-03 14:44:58 PST
Comment on attachment 8340673 [details] [diff] [review]
Patch v1 (cleaned up)

Review of attachment 8340673 [details] [diff] [review]:
-----------------------------------------------------------------

Food for thought (and perhaps another bug):

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +410,5 @@
>        // on running the search through c++ which is better for string
>        // comparisons (_checkEntry is relatively slow).
> +      let list = fullString.split(" ");
> +      let searchQuery = "";
> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(and(IsMailList,=,TRUE)(Notes,c,@V))(PrimaryEmail,c,@V)(SecondEmail,c,@V))";

I think if we agree that for any given search terms, there should not be a difference in search behaviour / result set between AB quicksearch, contacts side bar, and recipient autocomplete, then we can take some inspiration from bug 556490 (which however is currently focused on LDAP): 

Compare:
1) autocomplete modelQuery (per current patch + suggested improvements)
2) AB quicksearch modelQuery (also used for contacts side bar, or vice versa) currently has this modelQuery (hidden pref mail.addr_book.quicksearchquery.format, http://mxr.mozilla.org/comm-central/source/mail/locales/en-US/chrome/messenger/messenger.properties#257):

1) (or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(Notes,c,@V))(PrimaryEmail,c,@V)(SecondEmail,c,@V))
2) (or(PrimaryEmail,c,@V)(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V))

I see no reason why these should be different, except bad design. So I think while we are here, we could unify them using 1) which will fix bug 118624, and then the current patch should just get the modelQuery from pref as seen in http://mxr.mozilla.org/comm-central/source/mail/components/addrbook/content/abContactsPanel.js#135. We'd have to double-check why 2) doesn't use secondary email address so far, if that's sneaked in somewhere else, or if it's just another bug.

Or we could do that in another bug, and think a bit harder which other fields are missing in that common query (along Bug 298438 which needs more thought).

I think I'm trying to say that these major improvements are actually amazingly simple to implement and unify as soon as there's an agreement on the behaviour/fields, and I'm entirely failing to understand why these things haven't been addressed years ago...
Comment 76 Suyash Agarwal (:sshagarwal) 2013-12-03 20:26:28 PST
(In reply to Archaeopteryx [:aryx] from comment #72)
> Comment on attachment 8340673 [details] [diff] [review]
> Patch v1 (cleaned up)
> 
> Review of attachment 8340673 [details] [diff] [review]:
> for (searchword of searchwords)
Oh, this was "of", I was trying "in" and it was failing, thanks :)
Comment 77 Sebastian H. [:aryx][:archaeopteryx] 2013-12-04 00:42:12 PST
(In reply to Suyash Agarwal (:sshagarwal) (unfortunately away till 1st Dec) from comment #76)
> > for (searchword of searchwords)
> Oh, this was "of", I was trying "in" and it was failing, thanks :)
Of course you should add a |let| here.
Comment 78 Suyash Agarwal (:sshagarwal) 2013-12-04 01:33:57 PST
(In reply to Archaeopteryx [:aryx] from comment #77)

Does this patch work for you now?
Comment 79 Sebastian H. [:aryx][:archaeopteryx] 2013-12-04 01:36:34 PST
> Does this patch work for you now?
Yes.
Comment 80 Ulf Zibis 2013-12-04 04:28:19 PST
(In reply to Thomas D. from comment #74)
> Yes, the *behaviour* of recipient autocomplete, AB quicksearch and main
> view's quickfilter bar wrt names and addresses should indeed be the same (as
> far as possible), and this bug goes a long way towards that. I don't think
> main view's quickfilter bar can have the same *code* because it's another
> dataset with different structure (the messages; vs. the addressbook).

Thanks for the quick explanation.
So at least you could share the code for autocomplete with AB quicksearch, and maybe some inner loop (after determining the dataset) with view's quickfilter bar.
Comment 81 Ulf Zibis 2013-12-04 04:39:35 PST
(In reply to :aceman from comment #44)
> As I said in bug 529584 I do not like searching for strings inside of names
> (opposite to emails).
> So I would agree with:
> [Pet May] to find Peter Mayer
> 
> but not with:
> [aye][ete] find "Peter John Mayer"

Could we prioritize the search results in the result list in maybe following order? :
1. search strings match with beginning of words AND there order in fields
2. search strings match with beginning of words
3. search strings match with there order in fields
4. search strings match in any other way
Comment 82 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-04 04:46:30 PST
(In reply to Ulf Zibis from comment #81)
> Could we prioritize the search results in the result list in maybe following
> order? :

Thanks for the idea, but: Probably not (too complex), and definitely not in this bug.

> 1. search strings match with beginning of words AND there order in fields
> 2. search strings match with beginning of words
> 3. search strings match with there order in fields
> 4. search strings match in any other way

And instead of investing time in such algorithms, I'd rather try implementing "frecency", so that most frequent and most recent results get toplisted, after full nick name matches (existing bug).
Not sure if we have a bug for autocomplete "frecency" yet.
Comment 83 :aceman 2013-12-04 05:03:25 PST
I think there is already some infrastructure for "frecency" or popularity in the addressbook.
Comment 84 Mihovil Stanic [:Mikeyy - L10n HR] 2013-12-04 05:10:38 PST
Most used (popular) results first.
I think that's current behaviour.
Comment 85 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-04 06:13:24 PST
OT:

(In reply to :aceman from comment #83)
> I think there is already some infrastructure for "frecency" or popularity in
> the addressbook.

No, nothing for frecency (frequency + recency), only a very simplistic implementation of "popularity", which is absolute frequency at best (just counting how often you send mail to a random card having that email address).

(In reply to Mihovil Stanic [:Mikeyy - L10n HR] from comment #84)
> Most used (popular) results first.
> I think that's current behaviour.

Yes, we have popularityIndex, but it's poorly designed:
- popularityIndex only ever goes up ad infinitum (and will probably break at some point if you try hard enough) [1]
- does not involve recency (so that age-old card which you haven't used for the last 3 years will still appear at the top if it was popular in the past, whilst you'll never see the new card topmost until new card's count is more than old card's count)
- if you have multiple cards with same email address, somewhat randomly first found card gets bumped [2]; restructure your address books or move cards around, and it will probably mess up
- no way to manually control or reset popularity values
- does not correlate the actual strings of user input with results selected by user from that string, so e.g. so when you press m often and recently enough and always pick "mozilla" from the dropdown, FF will learn that and offer "mozilla" at the top (that seems to be the resulting behaviour of FF awesome bar, not sure how they realize it technically).

[1] http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompose.cpp#4860
[2] http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompose.cpp#4817
Comment 86 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-04 06:27:36 PST
OT:

(In reply to Thomas D. from comment #85)
> No, nothing for frecency (frequency + recency), only a very simplistic
> implementation of "popularity"

Courtesy of bugmaster Thomas D., here's the bug for autocomplete "frecency":
Bug 382415 - Popularity index of autocomplete doesn't honor timeline (use frecency for email contacts)

And unfortunately uncertainty in that bug, too, as to how FF frecency works, and broken links to what might have answers.
Comment 87 Suyash Agarwal (:sshagarwal) 2013-12-04 06:42:04 PST
Shall we use a variant of Apriori algorithm ? :)
Comment 88 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-04 07:27:13 PST
(In reply to Suyash Agarwal (:sshagarwal) (unfortunately away till 1st Dec) from comment #87)
> Shall we use a variant of Apriori algorithm ? :)

http://en.wikipedia.org/wiki/Apriori_algorithm
Looks good to me (lol). Pls provide a prototype ;)

OK, looking at the number of comments, we should probably try to refocus on getting this landed and avoid derailing...

Next steps: Comment 70, Comment 72.

I think comment 75, although highly desirable, needs to be handled in its own bug (as it also involves sensitive localized string for mail.addr_book.quicksearchquery.format pref where imo we can't just easily and carelessly change the string/pref name to work around that bloody localization bug requiring new entity names for value changes, so we need to work around that and I also have an idea how).
Comment 89 Samuel Müller 2013-12-04 07:49:11 PST
(In reply to Suyash Agarwal (:sshagarwal) (unfortunately away till 1st Dec) from comment #87)
> Shall we use a variant of Apriori algorithm ? :)

I'm just briefly following this conversation when I had the following idea: It would be great if the algorithm would consider which proposed entry is selected in the end. 
E.g. I have two contacts, John Doe and Davy Jones. When I type "jo" it would suggest "Davy Jones" first because I write more e-mails to Davy. However I _never_ write "jo" when writing a message to Davy, instead I start typing "da". Davy Jones should not be suggested then typing "jo" (at least not as the first suggestion). This kind of learning can be achieved with Bayesian inference (http://en.wikipedia.org/wiki/Bayesian_inference) which is already used in the junk mail filter. I don't know the code, but maybe there is even an existing framework. On the other hand I don't know if the Bayes learning can be combined with the frecency concept. 
Any suggestions?
Comment 90 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-04 08:23:13 PST
(In reply to Samuel Müller from comment #89)
> can be combined with the frecency concept. 
> Any suggestions?

The behaviour you describe is what we ultimately want for autocomplete (as in FF awesome bar), and what we perhaps somewhat fuzzily subsume with "frecency". But pls let's discuss this in Bug 382415, not here...! This bug is only about improving multi-word searches.
Comment 91 :aceman 2013-12-07 12:30:04 PST
Comment on attachment 8340673 [details] [diff] [review]
Patch v1 (cleaned up)

I have not much opinion on this feature.
I just suggest you guys first specify the feature properly and then Suyash implements it. Throwing in random thoughts just kills this bug. Also get UI approval first.
Comment 92 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-07 15:23:38 PST
(In reply to :aceman from comment #91)

??

> Comment on attachment 8340673 [details] [diff] [review]
> Patch v1 (cleaned up)
> 
> I have not much opinion on this feature.
> I just suggest you guys first specify the feature properly

I think the feature has been suffiently specified, as seen in the current patch which generally does the right thing. The rest is technical implementation details which can't be pre-specified, they just need to be tried and tested. Order of results is currently defined by popularityIndex; it's not part of this bug to change that, so things like "frecency" or other changes to autocomplete result order should be done in their own bugs.

> and then Suyash implements it.

I have already pointed out next steps for improving the patch in my comment 88:
> > Next steps: Comment 70, Comment 72.

> Throwing in random thoughts just kills this bug.

Not sure what exactly this refers to, but I agree in general we should avoid further disgressions so as to move this forward more swiftly, as I said in comment 88:

> OK, looking at the number of comments, we should probably try to refocus on getting this landed and
> avoid derailing...

I'm sorry if I myself might have added a thought too much; I've already stated that my Comment 75 needs a new bug because it would overload this bug. 

> Also get UI approval first.

We will ask for UI approval when we have a complete and working patch which addresses relevant comments. This bug is nothing but a logical extension of Bug 529584 which already has ui-review+ for the "contains" logic which is already implemented on all currently searched fields.

So this bug is just the remainder, namely splitting multiword-searches to make the entire search more powerful. E.g. searching for "Hans Mayer" needs to find display name "Hans-Peter Mayer", which is obviously more useful and there are plenty examples on this bug and all the duplicate bugs and getsatisfaction reports (see Comment 50). In fact, user requests are already beyond this bug, asking us to include *all* contact fields into the search (bug 298438), and I've received angry personal mail from commenters voicing their frustration why TB does not address these obvious shortcomings which have been known for years. But of course, we could still agree to disagree with the real-life UX of our users, discuss this to death and just lose more userbase who are sick and tired of the slow evolution of basic features in TB. Perhaps lose long-term contributors, too.

UI reviewers need to see this in action, this can't be evaluated from just reading the code and making random and erroneous assumptions about false positives. Plus, as mentioned before, lots of precedents like google or Mozilla products like FF and TB where exactly the same type of "contains all of multiple search words somewhere" is successfully used because it's obviously the most powerful search for the kind of data we have here. So unless the current identical design of Message list Quick Filter Bar is wrong, there's no way this could fail, as it comes with an inherent ui-r+ from all sorts of perspectives. And fwiw, Message Quick Filter has evolved in exactly the same way, from a dull search using {begins with "all words as one string" in any field} to the powerful quick filter algorithm we have now, after years of campaigning from users like me. See Bug 522985 for an explanation of "contains split multiword anywhere" benefits which are applicable to this bug in exactly the same way...
Comment 93 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-07 16:52:16 PST
(In reply to Thomas D. from comment #67)
> If I'm searching for Marianne Mustermann, why would I regularly type the
> full first name ("Marianne") instead of just typing ma mu? It's a question
> of habit formation (as already implemented e.g. in awesome bar).
> please don't prevent other users from finding what they need
> in the most efficient way.

In support of my points in comment 67 about benefits of intra-word "contains" search (as acknowledged even by previously sceptical :mkmelin in his Comment 69 for scenarios involving spelling variations), consider that other languages have different word formation rules than English:

E.g. in German, it's very possible to have compound words like "Donaudampfschifffahrtskapitän", which is a combined noun consisting of 5 individual nouns: Donau (name of a river), Dampf (steam), Schifffahrt (~shipping), Kapitän (captain). So it's the "captain of a steam boat travelling on Donau river". It's the same with family names, company names, street names, job titles: All of these might be compound single nouns made up from several unique nouns that also exist independently. Obviously, it does make a lot of sense for German speakers to search for those contained nouns regardless of their position in the compound noun. This bug enables users to search for "Donau", "Dampf", "Schiffahrt", "Kapitän", "Dampfschifffahrt", "Dampfschifffahrtskapitän" - all of which are real independent words.

So the AB display name of our captain might be "Donaudampfschifffahrtskapitän Müllermaier", and even the sirname, "Müllermaier", is a compound of two independent sirnames, "Müller", and "Maier", which are common names and somewhat easy to mix up. So if you're searching for that person in your AB, but you only remember his job involved "captain" and his sirname had "Maier" - tough luck: Without this bug (exactly as requested, including intra-word matches), it's all or nothing: Intellegible searches like "Kapitän Maier" or "Dampfschifffahrt Müller" (which sounds like a reasonable company name) won't find the thing unless you search for almost the the entire exact phrase: "Donaudampfschifffahrtskapitän Mül" is currently the shortest search term to get away with. What poor UX, given that, after this bug, as little as "Kap Mü" might be enough to filter out that very person, or "Kapitän Maier" for those of us who prefer full words... The main point being that as soon as you start to enter the first few letters of the second word or substring, chances are that you will have already found what you are looking for. Combine this bug with bug bug 298438, and increase AB search power by order of magnitudes, as we did, once upon a time, for Quick Filter Bar...
Comment 94 Suyash Agarwal (:sshagarwal) 2013-12-11 06:59:48 PST
So, what is required on my part now?
Comment 95 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-11 14:47:19 PST
(In reply to Suyash Agarwal (:sshagarwal) will be away from 14th December from comment #94)
> So, what is required on my part now?

As I mentioned before:
comment 70, and
comment 72

1) change variable name list -> searchwords;
   for loop, use (searchword of searchwords)
2) Add comment before let searchwords = fullstring.split...:
// When user's fullstring search expression is "foo bar" (without quotes), search for each word separately, for more versatile search and better, more complete results containing "foo" in any field AND "bar" in any field (see bug 558931 for explanations). Structurally same algorithm as in message quick filter.
3) remove (and(IsMailList,=,TRUE)
4) modify code in _checkEntry: function... to split up fullstring into words there, too
http://mxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbAutoCompleteSearch.js#211
So pls ensure that _checkEntry handles multiple words seperately, too.
5) add whiteboard entry [needs followup bug for LDAP]

For details of what and why, see comment 70.
4) will bring new code which will need more feedback iterations.
Comment 96 Suyash Agarwal (:sshagarwal) 2013-12-12 02:28:02 PST
Created attachment 8346470 [details] [diff] [review]
Patch v2

Patch addressing comment 70 and comment 72.
Comment 97 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-12 05:52:35 PST
Comment on attachment 8346470 [details] [diff] [review]
Patch v2

Review of attachment 8346470 [details] [diff] [review]:
-----------------------------------------------------------------

Nice! :) I like the creative drive of cleaning up the old clumsy code with gloves off, and intelligent recreation :)

Caveat: I have not tested this code, just proofreading.

I have some naming nits as usual, and a followup bug ("exact quoted string") which will require some modifications again, but for this bug, I think this will do the trick nicely.

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +199,5 @@
>    /**
> +   * Checks a field against all the search terms from
> +   * a multi word search query (assuming space as the
> +   * separator) to see if all the search terms are
> +   * present in the field or not.

Nit: "field" sounds like a field object or field name, but it's just a simple string which will be searched. It's not even one field, but text content of one or more fields, or any targetString for that matter. Let's rephrase this accordingly:

Checks text content of one or more card fields against all the search terms from a multi word search query (assuming space as separator) to see if all the search terms are present in fieldText not.

@@ +201,5 @@
> +   * a multi word search query (assuming space as the
> +   * separator) to see if all the search terms are
> +   * present in the field or not.
> +   *
> +   * @param field       Card field to be searched.

Dito, it's not a field object, nor the name of a field. Suppose we can keep "field" for contextual readability, although it's just plain targetString.

* @param fieldText   Text content of card field(s) to be searched.

@@ +202,5 @@
> +   * separator) to see if all the search terms are
> +   * present in the field or not.
> +   *
> +   * @param field       Card field to be searched.
> +   * @fullString        Multiword search query.

@param is missing here
Suppose we can keep "fullString" for contextual readability (as used elsewhere), although it's just searchString or searchWordsString.

@@ +206,5 @@
> +   * @fullString        Multiword search query.
> +   * @return            True if all the search terms
> +   *                    are present, false otherwise.
> +   */
> +  ifContained: function ifContained(field, fullString) {

The name of this function should indicate wordwise modus of operation, so I'd suggest this:

containsWordwise (fieldText, fullString)

That's my favourite. Alternatively, one of the following:

containsWordwise
containsWords

ifWordsContained
ifContainedWordwise
ifContainsWordwise
ifContainsWords

checkWordsContained
checkContainedWordwise
checkContainsWordwise
checkContainsWords

@@ +207,5 @@
> +   * @return            True if all the search terms
> +   *                    are present, false otherwise.
> +   */
> +  ifContained: function ifContained(field, fullString) {
> +    let searchWords = fullString.split(" ");

Does this work correctly for fullString with trailing, leading, or inner whitespace?

card with Displayname = "afoobarc" (!important)
fullString = "foo bar " (1 trailing)
fullString = "foo bar  " (2 trailing)
fullString = "foo  bar" (2 inner)
fullString = " foo bar" (1 leading)
fullString = "  foo bar" (2 leading)
3 whitespaces should be same as 2, but it's better to check...

All of these should ignore the extra whitespace and find cards that have "foo" and "bar" anywhere, even in the middle of words, like displayname = "afoobarc" (to avoid false positives from leading/trailing/inner spaces).

@@ +209,5 @@
> +   */
> +  ifContained: function ifContained(field, fullString) {
> +    let searchWords = fullString.split(" ");
> +    for (let searchWord of searchWords)
> +      if (!field.contains(searchWord))

Wow, that's nice and good perfomance, return as soon as one word fails! :)

field is misleading, adjust as above.

@@ +231,5 @@
> +    // joining all the fields so that a single search query can be
> +    // fired on all the fields at once. Separating them using spaces
> +    // so that field1=> "abc" and field2=> "def" on joining shouldn't
> +    // return true on search for "bcd".
> +    let card = card.displayName + " " + card.firstName + " " +

"card" (in "let card") sounds like card object, and is confusing because card is used both as input object and output string. We're also not yet searching the entire card (see Bug 298438 for AB quicksearch), but only selected fields.

let fieldText = card.displayName + ...

***** Followup bug: Allow quoted search for "exact string" *****

This is very efficient for multiword search as introduced by this bug. We will probably need to modify this in a followup bug to allow user to search for exact multiword strings using quotes, as we currently do in Message Quick Filter:

[foo bar] -> selected fields contain foo AND bar (new default algorithm, this bug)
["foo bar"] -> selected fields contain "foo bar" (allow current algorithm as an option).

Exact string search will fail with concatenated fieldText because we'll have cross-field false positives as you described them above with "bcd" example (which will yield false positive for "b cd"). But I'd recommend to just land this as-is and do exact string search later in a new bug.

**********

@@ +234,5 @@
> +    // return true on search for "bcd".
> +    let card = card.displayName + " " + card.firstName + " " +
> +               card.lastName + " " + emailToUse + " " +
> +               card.getProperty("Notes", "") + " " +
> +               card.getProperty("NickName", "");

We have card.lastName etc.; did you try if card.Notes and card.NickName can work the same way? This getProperty method looks odd.

@@ +236,5 @@
> +               card.lastName + " " + emailToUse + " " +
> +               card.getProperty("Notes", "") + " " +
> +               card.getProperty("NickName", "");
> +    card = card.toLocaleLowerCase();
> +    if (this.ifContained(card, fullString))

adjust function call as required, see above:

if (this.containsWordwise(fieldText, fullString))

@@ -224,5 @@
> -        lastName.contains(fullString) ||
> -        emailToUse.toLocaleLowerCase().contains(fullString))
> -      return true;
> -
> -    if (firstWord && rest &&

It's great to see this clutter gone (firstword && rest && ...)

@@ +427,1 @@
>        searchQuery = "?" + searchQuery;

In a bmo comment on this bug, could you cite final content of searchQuery variable for fullString = "foo bar" (without quotes)?
Comment 98 Suyash Agarwal (:sshagarwal) 2013-12-12 09:06:54 PST
Created attachment 8346635 [details] [diff] [review]
Patch v2.2

Hope this covers the feedback in comment 97.
Comment 99 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-12 13:15:09 PST
Comment on attachment 8346635 [details] [diff] [review]
Patch v2.2

Review of attachment 8346635 [details] [diff] [review]:
-----------------------------------------------------------------

Much better :)

In addition to nits below, can you please answer all three questions of comment 97? (Search for: "?")

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +204,5 @@
> +   *
> +   * @param textString    Text string to be searched.
> +   * @param searchString  Multiword search query.
> +   * @return              True if all the search terms
> +   *                      are present, false otherwise.

new line for "false otherwise"

@@ +232,5 @@
> +    // search query can be fired on all of them at once. Separating them
> +    // using spaces so that field1=> "abc" and field2=> "def" on joining
> +    // shouldn't return true on search for "bcd".
> +    let cumulativeFieldText = card.displayName + " " + card.firstName + " " +
> +               card.lastName + " " + emailToUse + " " +

cosmetics: one line for each field pls, and left-align all fields

@@ +235,5 @@
> +    let cumulativeFieldText = card.displayName + " " + card.firstName + " " +
> +               card.lastName + " " + emailToUse + " " +
> +               card.getProperty("Notes", "") + " " +
> +               card.getProperty("NickName", "");
> +    card = card.toLocaleLowerCase();

cumulativeFieldText, not card (2x in this line)
Comment 100 Suyash Agarwal (:sshagarwal) 2013-12-14 23:25:06 PST
Created attachment 8347708 [details] [diff] [review]
Patch v2.5

(In reply to Thomas D. from comment #97)
> Does this work correctly for fullString with trailing, leading, or inner
> whitespace?
Ya, this works for all these three sort of inputs.

> We have card.lastName etc.; did you try if card.Notes and card.NickName can
> work the same way? This getProperty method looks odd.
I tried it and it wasn't accessible directly with the .(dot) operator.

> >        searchQuery = "?" + searchQuery;
> 
> In a bmo comment on this bug, could you cite final content of searchQuery
> variable for fullString = "foo bar" (without quotes)?
Sure
?
(and(or(DisplayName,c,foo)(FirstName,c,foo)(LastName,c,foo)
       (NickName,c,foo)(Notes,c,foo)(PrimaryEmail,c,foo)
       (SecondEmail,c,foo)
    )
    (or(DisplayName,c,bar)(FirstName,c,bar)(LastName,c,bar)
       (NickName,c,bar)(Notes,c,bar)(PrimaryEmail,c,bar)
       (SecondEmail,c,bar)
    )
)

I have added white spaces in the query string in an attempt to clarify the operand-operator ordering.

Thanks.
Comment 101 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-15 11:14:15 PST
Comment on attachment 8347708 [details] [diff] [review]
Patch v2.5

Review of attachment 8347708 [details] [diff] [review]:
-----------------------------------------------------------------

Lovely! This looks very correct to me. I tried hard but I can't find any technical flaws (but I'd appreciate others to double-check on the separate email logic which we incorporated into the main search logic, it's easy to get lost there from just reading the code).

So this time it's only cosmetics, and a tiny little bit of performance tuning.

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +204,5 @@
> +   *
> +   * @param textString    Text string to be searched.
> +   * @param searchString  Multiword search query.
> +   * @return              True if all the search terms
> +   *                      are present,

Cosmetics: Better, but not good enough, pls rewrap ;)
-> A line break too much before "are present".

@@ +239,5 @@
> +                              card.firstName + " " +
> +                              card.lastName + " " +
> +                              emailToUse + " " +
> +                              card.getProperty("Notes", "") + " " +
> +                              card.getProperty("NickName", "");

-> For better order and minimally better performance, can we please swap "Notes" and "NickName" field (see below).

@@ +241,5 @@
> +                              emailToUse + " " +
> +                              card.getProperty("Notes", "") + " " +
> +                              card.getProperty("NickName", "");
> +    cumulativeFieldText = cumulativeFieldText.toLocaleLowerCase();
> +    if (this.ifContainsWords(cumulativeFieldText, fullString))

Nice! So this is where the wordwise search magic gets executed to reduce the previous result set as you type along.

@@ +419,5 @@
> +      // Firstly removing any leading or trailing whitespaces.
> +      fullString = fullString.trim();
> +      let searchWords = fullString.split(" ");
> +      let searchQuery = "";
> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(Notes,c,@V)(PrimaryEmail,c,@V)(SecondEmail,c,@V))";

Let's optimize performance here a bit (every bit counts, as we don't know how many dozens, hundreds or thousands of contacts users might have):
Assuming that users are searching for email more often than notes (and notes might have much more data to search):

-> Pls move (Notes,c,@V) to the end of the chain (where it was before). The effect should be that in case of early match on names, nicks, and emails, notes field with more data will not even be searched (I hope).

@@ +420,5 @@
> +      fullString = fullString.trim();
> +      let searchWords = fullString.split(" ");
> +      let searchQuery = "";
> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(Notes,c,@V)(PrimaryEmail,c,@V)(SecondEmail,c,@V))";
> +      for (let searchWord of searchWords) {

Nice again; so this is where the wordwise search magic is constructed for new autocomplete searches.
Comment 102 Suyash Agarwal (:sshagarwal) 2013-12-15 12:02:18 PST
Created attachment 8347775 [details] [diff] [review]
Patch v3

Okay, done!
Comment 103 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-21 12:36:52 PST
Sorry, I introduced an unwanted change into the patch which needs to be backed out:

(In reply to Thomas D. (away till 31st January) from comment #70)
> Comment on attachment 8340673 [details] [diff] [review]
> Patch v1 (cleaned up)
> > +      let searchQuery = "";
> > +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(and(IsMailList,=,TRUE)(Notes,c,@V))(PrimaryEmail,c,@V)(SecondEmail,c,@V))";
> 
> Remove "(and(IsMailList,=,TRUE)", it's just clutter which is no longer
> needed. Just (NickName,c,@V) is enough, and handles both lists and people
> correctly, consistently and efficiently.

My reading of the IsMailList condition was obviously wrong, and nobody complained... :|
I had wrongly assumed that nickname depends on the IsMailList condition, but in fact it's the notes field, which is a different story then... So the original algorithm will only search notes field of mailing lists (exposed in the UI as "description" property), and we are not here to change that (that's bug 298438 territory or the autocomplete variant thereof; see Bug 298438 Comment 24 about why searching notes field without ways of opting out might have too many false positives depending on scenarios and freeform data structure).

So please roll the IsMailList parts back into your new code, like this (whitespace removed for presentation):

> + card.getProperty("Notes", "");
+ // If card is a mailing list, also search notes field aka description
+ (card.isMailList ? card.getProperty("Notes", "") : "")

> +      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(PrimaryEmail,c,@V)(SecondEmail,c,@V)(Notes,c,@V))";
+      let modelQuery = "(or(DisplayName,c,@V)(FirstName,c,@V)(LastName,c,@V)(NickName,c,@V)(PrimaryEmail,c,@V)(SecondEmail,c,@V)(and(IsMailList,=,TRUE)(Notes,c,@V)))";
Comment 104 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-21 13:35:46 PST
For better understanding of this existing function, let's tweak comments a bit:

> +  /**
>    * Checks a card against the search parameters to see if it should be
>    * included in the result.

    * Checks the card of a single entry of a previous autocomplete result set
    * against the search parameters to see if it should still be included
    * in the new result set.

>    *
>    * @param card        The card to check.
>    * @param emailToUse  The email address to check against.

    * @param emailToUse  The single email address to check against (from previous results entry)

>    * @param fullString  The full search string.
>-   * @param firstWord   The first word of the search string.
>-   * @param rest        Anything after the first word.
>    * @return            True if the card matches the search parameters, false
>    *                    otherwise.
Comment 105 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-21 13:44:30 PST
(In reply to Thomas D. (away till 31st January) from comment #104)
> For better understanding of this existing function, let's tweak comments a
> bit:
> 
> > +  /**
> >    * Checks a card against the search parameters to see if it should be
> >    * included in the result.
> 
>     * Checks the card of a single entry of a previous autocomplete result set
>     * against the search parameters to see if it should still be included
>     * in the new result set.

Still not fully correct (still wrong from current documentation), let's try again:

>     * Checks the card and email address of a single entry of a previous autocomplete
>     * result set against the search parameters to see if that entry should still
>     * be included in the new result set.
Comment 106 Suyash Agarwal (:sshagarwal) 2013-12-22 00:51:39 PST
Created attachment 8350970 [details] [diff] [review]
Patch v3.2

Fixed the regression.
(In reply to Thomas D. (away till 31st January) from comment #105)
>  * Checks the card and email address of a single entry of a previous autocomplete
>  * result set against the search parameters to see if that entry should still
>  * be included in the new result set.
I didn't include this part as the _checkEntry() though, is used for this purpose but we aren't doing this previous result part in this function, this function is just getting a card, email and the search string. It has no idea whether the card is from the list of previous autocomplete search result or from somewhere else. Please correct me if I made some wrong assumptions.

Thanks.
Comment 107 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-22 06:07:58 PST
(In reply to Suyash Agarwal (:sshagarwal) from comment #106)
> Created attachment 8350970 [details] [diff] [review]
> b558931.patch
> 
> Fixed the regression.
> (In reply to Thomas D. (away till 31st January) from comment #105)
> >  * Checks the card and email address of a single entry of a previous autocomplete
> >  * result set against the search parameters to see if that entry should still
> >  * be included in the new result set.
> I didn't include this part as the _checkEntry() though, is used for this
> purpose but we aren't doing this previous result part in this function, this
> function is just getting a card, email and the search string. It has no idea
> whether the card is from the list of previous autocomplete search result or
> from somewhere else. Please correct me if I made some wrong assumptions.

I was trying to be more explicit about the difference between these two functions:

_searchCards: initial autocomplete search (new search term);
checks defined fields of a full card to see if that card (potentially having *two* email addresses) should be included in the result set. If card matches and has two email addresses, this will create *two* lines of result entries in the autocomplete dropdown, one for each associated email.

_checkEntry: incremental autocomplete search (as user keeps adding characters to existing search term);
checks a single-line single-mail entry of a previous autocomplete result set to see if that particular entry should still be in the new result set (that's why we spoon-feed a single email address here, while there are *two* email addresses on each card).

>383       // We have successful previous matches, therefore iterate through the
>384       // list and reduce as appropriate
>385       for (var i = 0; i < aPreviousResult.matchCount; ++i) {
>386         if (this._checkEntry(aPreviousResult.getCardAt(i),
>387                              aPreviousResult.getEmailToUse(i), fullString,
>388                              firstWord, rest))
>389           // If it matches, just add it straight onto the array, these will
>390           // already be in order because the previous search returned them
>391           // in the correct order.
Comment 108 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-24 00:52:25 PST
(R)evolution of autocomplete search power:

foo bar* in selected fields (old algorithm, find multi-word fullstring at the beginning of fields)
*foo bar* in selected fields (introduced with bug 529584, find multi-word fullstring anywhere in fields)
*foo* AND *bar* in selected fields (this bug 558931, find search words anywhere in fields)
*foo* AND *bar* in more/all relevant fields (tbd in analogy to bug 298438 for AB quicksearch)
Comment 109 Thomas D. (currently busy elsewhere; needinfo?me) 2013-12-24 08:00:50 PST
Comment on attachment 8350970 [details] [diff] [review]
Patch v3.2

Dear reviewers,

as a result of our cooperative efforts on this bug, we're happy to offer you this patch by Suyash as a Christmas present, designed by popular demand, and covering many real-life usecases which currently fail. This is the logical complement of bug 529584 introducing *foo bar* search which has ui-r+ by :mconley. So here, we just split that up into *foo* AND *bar* for more versatile searching and better result sets.

This implements maximally efficient search algorithm as found in some of the most powerful searches known in- and outside of TB. For example, most of you will remember how good old Quick Search evolved into powerful Quick Filter Bar with exactly the same split multi-word search algorithm presented here (as explained in Bug 522985), and here it's exactly the same story:

(In reply to Thomas D. (away till 31st January) from comment #108)
> (R)evolution of autocomplete search power:
>
> foo bar* in selected fields (old algorithm, find multi-word fullstring at the /beginning/ of fields)
> *foo bar* in selected fields (introduced with bug 529584, find multi-word fullstring /anywhere/ in fields)
> *foo* AND *bar* in selected fields (this bug 558931, find /all search words anywhere/ in fields)
> *foo* AND *bar* in more/all relevant fields (tbd in analogy to bug 298438 for AB quicksearch)

Google, FF awesome bar, BMO quicksearch, TB Quick Filter - you name it: All the really powerful searches use this very same *foo* AND *bar* search algorithm or variants thereof. This works great for the usually limited dataset of ABs and the rather unique content found in most AB fields. And in case you're wondering, we're not touching the current order of results in this bug, so the first sort will still be stiff popularityIndex as it is now (introducing "frecency" is Bug 382415).

We want to *land this asap* so that it can be included in the *TB 31* release, while ensuring quality. Therefore, we've invited several reviewers, so whoever is the fastest to unwrap the present and deliver to the crowds will go down in history as the godfather of autocomplete (r)evolution ;)

*** Benefits / Sample scenarios ***

This bug eliminates a major usability problem which prevents efficient freeform searches using the bits that you remember about the contact you're looking for. Say you've got an AB card having:

Display Name: "Father Christmas (aka Santa Claus)"
Email: stairway.to.heaven@somewhere.foo
Email2: anotherway.to.heaven@somewhere.bar

Most of the following search expressions currently fail without reason [TB 24]. After this bug, just type whatever you remember without caring about exact field contents and word order, and by adding a second search term, you'll narrow down what you're looking for very quickly:
[Santa Claus]   [Claus Santa]   [Father Santa]   [Santa]   [Claus]   [stairway heaven]

It's also helpful when you don't remember the exact content you're looking for:
Was it "stairway" or "highway"? -> just type [way Santa], and you're there.
Was it "Christmas" or "XMas"? -> just type [mas Santa], success!
Was it "somewhere.foo" or "anywhere.foo"? -> just type [Christmas where.foo], and you'll find him anyway!

Of course that's also helpful for those double names, middle names, or foreign names where you forgot the exact spelling, so you just type some part of the name that you're sure of, and add any other bit of contact detail that you remember.
Even works for precision searches: If you want to use the 2nd email address, just type [Father Chris .bar] and in your autocomplete result, you'll probably see just that single contact you're looking for, hassle-free!
Many more examples found in comments on this bug.

For bonus points, patch comes with streamlined code and probably better performance, and hacks like "firstWord" and "rest" and "FN vs. LN" swaps are no longer required (in fact, those hacks were just trying to imitate the behaviour that we fully implement here).

Enjoy! :)

Thomas D.
Comment 110 Magnus Melin 2013-12-29 04:25:37 PST
Comment on attachment 8350970 [details] [diff] [review]
Patch v3.2

Review of attachment 8350970 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, thx

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +209,5 @@
> +  ifContainsWords: function ifContainsWords(textString, searchString) {
> +    // Removing any leading or trailing whitespaces.
> +    searchString = searchString.trim();
> +    let searchWords = searchString.split(" ");
> +    for (let searchWord of searchWords)

please add braces for the for clause

@@ +231,1 @@
>      var i;

unused i
Comment 111 :aceman 2014-01-13 12:16:29 PST
Comment on attachment 8350970 [details] [diff] [review]
Patch v3.2

Review of attachment 8350970 [details] [diff] [review]:
-----------------------------------------------------------------

Nice solution, concatenating all the values into a single string :)
Seems to work fine for me. Finds cards if 1 word is in the name, other word is in the email address. That was probably wanted.

When you now have the possibility to rewrite some of the functions, can you name the arguments properly starting with 'a', e.g. aTextString, aSearchString ?
Thanks

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ -431,5 @@
>  
>          if (dir instanceof Components.interfaces.nsIAbDirectory &&
>              dir.useForAutocomplete(params.idKey)) {
>            this._searchCards(searchQuery, dir, result);
> -          this._searchWithinEmails(emailSearchQuery, fullString, dir, result);

Was it already answered why this can be removed?
Comment 112 Suyash Agarwal (:sshagarwal) 2014-01-13 22:00:34 PST
(In reply to :aceman from comment #111)
> Review of attachment 8350970 [details] [diff] [review]:
> Nice solution, concatenating all the values into a single string :)
:)
> When you now have the possibility to rewrite some of the functions, can you
> name the arguments properly starting with 'a', e.g. aTextString,
> aSearchString ?
> Thanks
Sure.
> @@ -431,5 @@
> > -          this._searchWithinEmails(emailSearchQuery, fullString, dir, result);
> 
> Was it already answered why this can be removed?
I think not. I appended the terms in 'emailSearchQuery' in 'searchQuery'  so I thought _searchWithinEmails() wasn't required. Moreover we are now searching for separate words, so I thought a single searchQuery having email fields too will do. If I am wrong please let me know, I'll put it back.

Thanks.
Comment 113 :aceman 2014-01-14 01:44:01 PST
(In reply to Suyash Agarwal (:sshagarwal) from comment #112)
> > @@ -431,5 @@
> > > -          this._searchWithinEmails(emailSearchQuery, fullString, dir, result);
> > Was it already answered why this can be removed?
> I think not. I appended the terms in 'emailSearchQuery' in 'searchQuery'  so
> I thought _searchWithinEmails() wasn't required. Moreover we are now
> searching for separate words, so I thought a single searchQuery having email
> fields too will do. If I am wrong please let me know, I'll put it back.
I don't know myself I just mentioned it because the reviewer will probably ask it. Or answer it :)
Comment 114 neil@parkwaycc.co.uk 2014-01-18 09:40:07 PST
Comment on attachment 8350970 [details] [diff] [review]
Patch v3.2

>+  ifContainsWords: function ifContainsWords(textString, searchString) {
[Bikeshed: checkForMatch sounds better.]

>+    // Removing any leading or trailing whitespaces.
>+    searchString = searchString.trim();
>+    let searchWords = searchString.split(" ");
You should precompute the search words instead of splitting the string each time. Also trim() doesn't help you with multiple whitespace, you need to split on something more inclusive, otherwise you'll get empty words.

>+    for (let searchWord of searchWords)
>+      if (!textString.contains(searchWord))
>+        return false;
>+
>+    return true;
[There's probably a way to compute this using the array's .all() method.]

>+    let cumulativeFieldText = card.displayName +
>+                              " " + card.firstName +
>+                              " " + card.lastName +
>+                              " " + emailToUse +
>+                              " " + card.getProperty("NickName", "") +
>+                              (card.isMailList ?
>+                                " " + card.getProperty("Notes", "") :
>+                                "");
[Nit: This looks a little ugly. The neatest, but slowest, way would be to build an array of the properties, append the notes for mailing lists, then join the elements together. In this case I would probably stick to building up a string but add the notes in a separate condition. Also append the spaces at the end of the line, that way you can line up the various card references.]

>+    if (this.ifContainsWords(cumulativeFieldText, fullString))
>       return true;
> 
>     return false;
Don't bother with if/then, just return the boolean value directly.

>-                             firstWord, rest))
You've removed all of the uses of firstWord and rest without removing the code that creates them.

>+      fullString = fullString.trim();
>+      let searchWords = fullString.split(" ");
Oh look you're computing the words again.

>+      searchQuery = "(and" + searchQuery + ")";
>       searchQuery = "?" + searchQuery;
Might as well do this in one line.

>-          this._searchWithinEmails(emailSearchQuery, fullString, dir, result);
You didn't remove the _searchWithinEmails method itself.
Comment 115 Suyash Agarwal (:sshagarwal) 2014-01-19 00:55:45 PST
Created attachment 8362191 [details] [diff] [review]
Patch v3.6

(In reply to neil@parkwaycc.co.uk from comment #114)
> You should precompute the search words instead of splitting the string each
> time. Also trim() doesn't help you with multiple whitespace, you need to
> split on something more inclusive, otherwise you'll get empty words.
What should I use for splitting?
As the user is aware of only the spaces, so I have tried to remove the empty words (if any) after splitting the string on the basis of spaces, please let me know if this needs to changed in some way.
> You didn't remove the _searchWithinEmails method itself.
Oh, I left it in case I was asked to put it back, removed it this time, sorry.
Comment 116 neil@parkwaycc.co.uk 2014-01-20 09:01:22 PST
Comment on attachment 8362191 [details] [diff] [review]
Patch v3.6

>+  checkForMatch: function checkForMatch(aTextString, aSearchWords) {
>+    return aSearchWords.every(String.contains, aTextString);
Wow! This looks neat! Unfortunately it doesn't quite work, as String.contains is not the function you're looking for. Either String.prototype.contains or aTextString.contains will work though.
As this function is nice and short and only used once you might as well write it out directly in _checkEntry instead.

>-    // Craft this by hand - we want the first item to contain the full string,
>-    // the second item with just the first word, and the third item with
>-    // anything after the first word.
>+    // Craft this by hand - we want the item to contain the full string.
Even the tweaked comment doesn't really make sense; just remove it.

>     var fullString = aSearchString.toLocaleLowerCase();
>+    fullString = fullString.trim();
>+    let searchWords = fullString.split(" ");
[Would be nice if we can fit this on fewer lines.]

>+    // Removing any empty members from the array.
>+    for (let index = 0; index < searchWords.length; ++index) {
>+      if (!searchWords[index])
>+        searchWords.splice(index, 1);
>     }
Hmm, what happens if there are no words (i.e. you just typed spaces)? I guess you need to improve the bail out condition earlier in startSearch.
Comment 117 Sebastian H. [:aryx][:archaeopteryx] 2014-01-20 10:20:49 PST
Comment on attachment 8362191 [details] [diff] [review]
Patch v3.6

Review of attachment 8362191 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/addrbook/src/nsAbAutoCompleteSearch.js
@@ +168,3 @@
>     */
> +  checkForMatch: function checkForMatch(aTextString, aSearchWords) {
> +    return aSearchWords.every(String.contains, aTextString);

Do you want this code here?
return aSearchWords.every(aSearchWord => aTextString.contains(aSearchWord))

@@ +331,5 @@
> +    let searchWords = fullString.split(" ");
> +    // Removing any empty members from the array.
> +    for (let index = 0; index < searchWords.length; ++index) {
> +      if (!searchWords[index])
> +        searchWords.splice(index, 1);

Here you would skip every second empty element from adjacent whitespaces because you move on with the for loop while removing an array element (and it's a Javascript array, so its members are live).

You can replace the for loop with code like this:
let searchWords = searchWords.filter( searchWord => searchWord != "");
Comment 118 Suyash Agarwal (:sshagarwal) 2014-01-20 11:45:53 PST
Created attachment 8362632 [details] [diff] [review]
Patch v4

Made the changes, thanks to Aryx.
Comment 119 Suyash Agarwal (:sshagarwal) 2014-01-21 00:02:42 PST
Created attachment 8362835 [details] [diff] [review]
Patch v4.1

Splitting using regex.
Thanks to Neil.
Comment 120 neil@parkwaycc.co.uk 2014-01-21 03:51:41 PST
(In reply to Archaeopteryx from comment #117)
> (From update of attachment 8362191 [details] [diff] [review])
> > +    return aSearchWords.every(String.contains, aTextString);
> 
> Do you want this code here?
> return aSearchWords.every(aSearchWord => aTextString.contains(aSearchWord))

No he doesn't want to create a new bound function on every call when he's already got a perfectly good function that can do the job.
Comment 121 Suyash Agarwal (:sshagarwal) 2014-01-21 06:46:00 PST
Created attachment 8362968 [details] [diff] [review]
Patch v4.2

Removing the arrow function due to minor performance issues.
I have still kept the removing of empty members code as that is helpful in returning early if the input is just blank spaces. Moreover the cursor doesn't become red too as if there was some error (this happens when the code is removed).

Thanks.
Comment 122 gwelsh 2014-03-06 18:38:43 PST
(In reply to Thomas D. from comment #109)
> Comment on attachment 8350970 [details] [diff] [review]
> Patch v3.2
> 
> Dear reviewers,
> 
> as a result of our cooperative efforts on this bug, we're happy to offer you
> this patch by Suyash as a Christmas present, designed by popular demand, and
> covering many real-life usecases which currently fail. This is the logical
> complement of bug 529584 introducing *foo bar* search which has ui-r+ by
> :mconley. So here, we just split that up into *foo* AND *bar* for more
> versatile searching and better result sets.
> 
> This implements maximally efficient search algorithm as found in some of the
> most powerful searches known in- and outside of TB. For example, most of you
> will remember how good old Quick Search evolved into powerful Quick Filter
> Bar with exactly the same split multi-word search algorithm presented here
> (as explained in Bug 522985), and here it's exactly the same story:
> 
> (In reply to Thomas D. (away till 31st January) from comment #108)
> > (R)evolution of autocomplete search power:
> >
> > foo bar* in selected fields (old algorithm, find multi-word fullstring at the /beginning/ of fields)
> > *foo bar* in selected fields (introduced with bug 529584, find multi-word fullstring /anywhere/ in fields)
> > *foo* AND *bar* in selected fields (this bug 558931, find /all search words anywhere/ in fields)
> > *foo* AND *bar* in more/all relevant fields (tbd in analogy to bug 298438 for AB quicksearch)
> 
> Google, FF awesome bar, BMO quicksearch, TB Quick Filter - you name it: All
> the really powerful searches use this very same *foo* AND *bar* search
> algorithm or variants thereof. This works great for the usually limited
> dataset of ABs and the rather unique content found in most AB fields. And in
> case you're wondering, we're not touching the current order of results in
> this bug, so the first sort will still be stiff popularityIndex as it is now
> (introducing "frecency" is Bug 382415).
> 
> We want to *land this asap* so that it can be included in the *TB 31*
> release, while ensuring quality. Therefore, we've invited several reviewers,
> so whoever is the fastest to unwrap the present and deliver to the crowds
> will go down in history as the godfather of autocomplete (r)evolution ;)
> 
> *** Benefits / Sample scenarios ***
> 
> This bug eliminates a major usability problem which prevents efficient
> freeform searches using the bits that you remember about the contact you're
> looking for. Say you've got an AB card having:
> 
> Display Name: "Father Christmas (aka Santa Claus)"
> Email: stairway.to.heaven@somewhere.foo
> Email2: anotherway.to.heaven@somewhere.bar
> 
> Most of the following search expressions currently fail without reason [TB
> 24]. After this bug, just type whatever you remember without caring about
> exact field contents and word order, and by adding a second search term,
> you'll narrow down what you're looking for very quickly:
> [Santa Claus]   [Claus Santa]   [Father Santa]   [Santa]   [Claus]  
> [stairway heaven]
> 
> It's also helpful when you don't remember the exact content you're looking
> for:
> Was it "stairway" or "highway"? -> just type [way Santa], and you're there.
> Was it "Christmas" or "XMas"? -> just type [mas Santa], success!
> Was it "somewhere.foo" or "anywhere.foo"? -> just type [Christmas
> where.foo], and you'll find him anyway!
> 
> Of course that's also helpful for those double names, middle names, or
> foreign names where you forgot the exact spelling, so you just type some
> part of the name that you're sure of, and add any other bit of contact
> detail that you remember.
> Even works for precision searches: If you want to use the 2nd email address,
> just type [Father Chris .bar] and in your autocomplete result, you'll
> probably see just that single contact you're looking for, hassle-free!
> Many more examples found in comments on this bug.
> 
> For bonus points, patch comes with streamlined code and probably better
> performance, and hacks like "firstWord" and "rest" and "FN vs. LN" swaps are
> no longer required (in fact, those hacks were just trying to imitate the
> behaviour that we fully implement here).
> 
> Enjoy! :)
> 
> Thomas D.

this is quite an accomplishment, however I now get loads of unwanted suggestions for simple two letter shortcuts I always used every day.
Comment 123 Thomas D. (currently busy elsewhere; needinfo?me) 2014-03-07 02:28:20 PST
(In reply to gwelsh from comment #122)
> (In reply to Thomas D. from comment #109)
> > Comment on attachment 8350970 [details] [diff] [review]
> > Patch v3.2
> > as a result of our cooperative efforts on this bug, we're happy to offer you
> > this patch by Suyash as a Christmas present, designed by popular demand, and
> > covering many real-life usecases which currently fail. This is the logical
> > complement of bug 529584 introducing *foo bar* search which has ui-r+ by
> > :mconley. So here, we just split that up into *foo* AND *bar* for more
> > versatile searching and better result sets.

> this is quite an accomplishment, however I now get loads of unwanted
> suggestions for simple two letter shortcuts I always used every day.

Gwelsh, as explained in the very comment you're replying to, and also in Bug 972690 Comment 2, "contains" logic was introduced in bug 529584, not here, so let's avoid going off-topic here per this bug's current definition (although historically, much of the relevant discussion is here). Fwiw, and also explained on bug 972690, your comment exposes wrong assumptions / understanding of the previous and current algorithms and their shortcomings/bugs. TB/SM have never had a reliably working feature of "two letter shortcuts always bringing up a certain contact topmost of the autocomplete results list", neither before nor after bug 529584. Depending on your particular dataset and use patterns, it might *appear* to work like that if your search word is unique enough and the topmost returned contact has a high popularity index (frequency, i.e. often used by user ever before, even long ago). This bug actually helps your cause by enabling users to rapidly narrow down their search results via adding a second search word, even just 2 or 3 extra characters which make a unique combination with first searchword.

Fwiw, pls find more explanation in Bug 529584 Comment 86.

N.B. Pls avoid quoting long comments!
Comment 124 Blake Winton (:bwinton) (:☕️) (PTO 'til London. Find me there for quick answers!) 2014-04-08 10:09:29 PDT
Comment on attachment 8362968 [details] [diff] [review]
Patch v4.2

Okay, so this is better, but…
I think we need better prioritization.  If I've got "Display, Janette", and "Peter Lairo" in my various address books, I _really_ expect "p" to suggest "Peter Lairo" before "Display, Janette"…

But other than that (kind of thing), this seems like an improvement.

Thanks,
Blake.
Comment 125 Thomas D. (currently busy elsewhere; needinfo?me) 2014-04-09 02:32:19 PDT
(In reply to Blake Winton (:bwinton) from comment #124)
> Comment on attachment 8362968 [details] [diff] [review]
> Patch v4.2
> 
[...] this seems like an improvement.
> 
> Thanks,
> Blake.

Hallelujah! UX lead paves way for autocomplete revolution: Multiword XXL quick search power about to land in recipient autocomplete!

Thanks :)
Comment 126 Thomas D. (currently busy elsewhere; needinfo?me) 2014-04-09 04:19:51 PDT
(In reply to Blake Winton (:bwinton) from comment #124)
> Comment on attachment 8362968 [details] [diff] [review]
> Patch v4.2
> Okay, so this is better, but…
> I think we need better prioritization.  If I've got "Display, Janette", and
> "Peter Lairo" in my various address books, I _really_ expect "p" to suggest
> "Peter Lairo" before "Display, Janette"…
> But other than that (kind of thing), this seems like an improvement.

As hinted by Blake's choice of words ("kind of thing"), prioritization of results is a complex issue which isn't half as straightforward as it may look at first sight. We're exploring that intention in Bug 970456, and I've commented with extensive caveats there against common misunderstandings of current and upcoming behaviour.

The main problem is that as long as we keep popularityIndex as the main sorting criterion (or the refined version thereof aka "Frecency", Bug 382415), that will always eventually override any other secondary sorting criteria, and rightly so. Per concept of frecency (as in FF awesome bar), if after typing "p", user picks "Peter Lairo" often and recently enough, TB should learn about that intention and rank that result higher which will eventually rank "Display, Janette" lower.

So unless you're contesting the currently implemented priority ranking by popularityIndex (which should be refined as "frecency"), any other sorting algorithm will only apply as a secondary sorting for results which happen to have exactly the same popularityIndex/Frecency value, which is more than unlikely except when you start out with a fresh installation of TB.

> I think we need better prioritization.  If I've got "Display, Janette", and
> "Peter Lairo" in my various address books, I _really_ expect "p" to suggest
> "Peter Lairo" before "Display, Janette"…

Fwiw, that example isn't very good because it tends to creates the false impression that searching for single characters like "p" could reliably return certain results, which is never the case, neither in current implementation nor after adding of secondary sorting as proposed by comment 124. Search phrases of one or two letters are simply not unique enough to return reliable results (unless with "Frecency" algorithm as in FF awesome bar). Starting from 3 letters, that's when you can expect to see fewer and more relevant results (with or without frecency). Furthermore, based on which criteria do we prefer one field over another? Should we prioritize on First Name, Last Name, or Display Name? Or on email (which might start with entirely different characters again?)?
We also need to avoid simplistic assumptions on field data structure:
Can we safely assume that it's always the first word of a field that matters for sorting? That assumption already breaks on simple things like double names and company names in Display name, and also on semantic word boundaries inside combined words, much more common in other languages like German.

FN: Anne-Mary LN: Johnson-Patterson -> are .contains matches on "M" for "Mary" or "P" for "Patterson" any less relevant than .beginsWith matches, and do we have any way of telling? (NO).

FN: Annemarie LN: Wildsmith -> are .contains matches on "M" for "Marie" or "s" for "Smith" any less relevant, and can we tell beforehand? (NO).

DN: "John Parker (Airmozilla.com)" -> are .contains matches on "a" for "Airmozilla" or even "m" for "Mozilla" any less relevant than .beginsWith matches for fields or words, and can we tell? (NO)

The only reliable ways of associating single-character search phrases with a certain result are:
- (permanent link) nicknames (Bug 325458): define that "p" should always return "Peter" as topmost result
- (dynamic link) frecency (Bug 382415): after typing "p", pick "Peter" frequently and recently enough, TB will learn that and offer you the same next time

The good news is that after this bug, there's a much more efficient way of finding exactly what you want by incremental multi-word search:

- start out by typing the first three letters of whatever you remember best about the target card:
"Pet" (-> already you'll get a list including all the Peters and Pettersons and petroleum.com)
- then add any unique substring to reduce matches:
"Lai" (-> 98% probability that "Peter Lairo" is the only match. Of course if you also have "Petra Laila Smith", she'll stick around with Peter, and rightly so.)

Example why single-character search (e.g. for "p") usually fails (already in current implementation, not affected by this bug):

User's AB has these cards:

[set1]
Paul 
Pauline 
Patrick 
Pamela 
Peter Lairo
Perry
Phil
...
[set2]
Keith Parker
John Patterson
Zora Peters
Jamie Pullman
...
[set 3]
Displayname: Pink Panther FN:John LN: Müller
Displayname: Pussy Riot FN:Emilia LN: Justina
...
[set4]
John <parkerjohn@foo.bar>
Jane <plymouth.tourism@foo.bar>
Jessy <foo@porsche.com>
Julie <JuliePostman@foo.bar> [semantic 1]
John <foo@airmozillapresentation.com [semantic 2]
Xaver <xaver-the-postman@foo.bar> [semantic 3]
...
[set5]
Display, Janette
Hopper, John
Speidel, Hans
Annemarie Wildsmith [semantic 2]

Now there's the letter "p" in all of these.
When user searches for "p", which results should TB return first?

Should we prioritize results whose...
- Firstname field beginsWith "p" [set1]? -> doesn't help as there might be dozens of firstnames starting with "p"; even when you list them alphabetically, user still needs to choose from the list
- Lastname field beginsWith "p" [set2]? -> doesn't help as there might be dozens of the same
- DisplayName field beginsWith "p" [set3]? -> doesn't help as there might be dozens of the same; and how do sort these in when their FirstName and LastName fields start with entirely different letters?
- Email Address beginsWith "p" [set4]? -> doesn't help because there might be dozens of the same; and how do we sort these in with "field beginsWith" matches on FirstName, LastName, or DisplayName?

Of course, searching for just a single character like "p" will give you false positives, but those false positives are not limited to .contains matches (set 5) in any way, and we have no way of telling apart such false positives from semantic .contains matches where the semantic word boundary is different from the technical word boundary and there's nothing stopping users from searching for beginsWith matches of those "inner" words (semantic 1-3). Iow, the exact data structure of cards is widely unpredictable and we can't assume all users to have simple single-word datasets like "John Doe <john@doe.com>".

Finally, what if search word matches for more than one field of a single entry? Should that be somehow prioritized, too?

In conclusion, imho it is very much impossible to come up with any reliable one-for-all prioritizing algorithm except "frecency" with "full nickname matches toplisted". I'm not denying that there might be edge cases like fresh installation where some secondary sorting algorithm might be useful, but I think it's pretty pointless to waste time on developing such; I'd rather invest that precious time into implementing "Frecency" in Bug 382415 and "Nicknames" in Bug 325458 (popular ux-papercut which will be much more helpful for the scenario of most users who want "p" to return "Peter Lairo").
Comment 127 Suyash Agarwal (:sshagarwal) 2014-04-15 03:48:20 PDT
Created attachment 8406766 [details] [diff] [review]
Patch v4.7 (4.2 updated)

Updating the patch as v4.2 wasn't applying cleanly on trunk.
Increasing the version number to v4.7 as I expect at most 1 iteration (v5) will be ready for check-in :)

Thanks.
Comment 128 Thomas D. (currently busy elsewhere; needinfo?me) 2014-04-22 02:35:07 PDT
Neil, this important patch has all blessings including Blake's ui-r+ and was further improved in some review cycles with you so it's now waiting for your final review to land. Can we pls ensure that this doesn't miss the *TB31 branch deadline on 28th April* (which would delay this for another 10 months) after 10 years of user requests, 4 years of discussion here and 5 months of refining the patch... TIA! :)

Thanks for several reviews of this which you have already done along the way, so it should be familiar and afasics your review comments have all been addressed.
Comment 129 neil@parkwaycc.co.uk 2014-04-22 15:50:28 PDT
Comment on attachment 8406766 [details] [diff] [review]
Patch v4.7 (4.2 updated)

>+    return aSearchWords.every(
>+      String.prototype.contains, cumulativeFieldText);
Nit: I don't like this wrapping, instead I'd prefer
return aSearchWords.every(String.prototype.contains,
                          cumulativeFieldText);

>+    let fullString = aSearchString.toLocaleLowerCase().trim();
Now that you've moved the trim() here, I just noticed that this makes the check for an invalid search string much easier - it will simply be empty at this point, thus avoiding the need for code to remove empty members or check for an empty array. (Also you should prefer to trim before lower casing.)

>+    // Quietly return if there are no members left in the array.
>+    // (Maybe the user was playing with the field entering only spaces?).
>+    if (searchWords.length == 0)
>+      return;
This is actually incorrect behaviour - you need to call onSearchResult before returning. The easiest fix is to move this code block after the params.type check. (Also integrate it with the existing search string check.)

>+        searchQuery += modelQuery.replace(/@V/g, encodeURIComponent(searchWord));
(I think this might have bitrotted. Sorry about that.)
Comment 130 Suyash Agarwal (:sshagarwal) 2014-04-22 22:02:10 PDT
Created attachment 8410764 [details] [diff] [review]
Patch v5

(In reply to neil@parkwaycc.co.uk from comment #129)
Thanks a lot for reviewing this.

> >+        searchQuery += modelQuery.replace(/@V/g, encodeURIComponent(searchWord));
> (I think this might have bitrotted. Sorry about that.)
I updated the tree and it applies cleanly.

I have changed the commit message in the patch as the previous one
was too long. Please let me know if I have to revert it back.

Thanks.
Comment 131 :aceman 2014-04-22 23:49:53 PDT
(In reply to Suyash Agarwal (:sshagarwal) away till 7th of May from comment #130)
> > >+        searchQuery += modelQuery.replace(/@V/g, encodeURIComponent(searchWord));
> > (I think this might have bitrotted. Sorry about that.)
> I updated the tree and it applies cleanly.

It MAY get bitrotted by bug 749097 :) If if you check this in before me then it will not :)
Comment 132 Suyash Agarwal (:sshagarwal) 2014-04-22 23:51:37 PDT
(In reply to :aceman from comment #131)
> It MAY get bitrotted by bug 749097 :) If if you check this in before me then
> it will not :)

Oh, so should I update my patch with yours applied?
Comment 133 :aceman 2014-04-22 23:55:07 PDT
No, if you get the remaining review earlier, then I will update mine :) I still have to get some more reviews.
Comment 134 Thomas D. (currently busy elsewhere; needinfo?me) 2014-04-22 23:56:20 PDT
(In reply to Suyash Agarwal (:sshagarwal) away till 7th of May from comment #130)
> Created attachment 8410764 [details] [diff] [review]
> Patch v5
> I have changed the commit message in the patch as the previous one
> was too long. Please let me know if I have to revert it back.
>
> Bug 558931 - Implement substring match in addressbook entries for composition's recipient autocomplete.

No, that's wrong or misleading at least, because simple substring aka "contains" match for whole search string (*foo bar*), albeit planned and discussed here, was eventually introduced in Bug 529584:

> Bug 529584 - In composition address autocomplete, use "contains" to match the search string
> *anywhere* in the card fields that we currently search, e.g. email addresses or display name.
> r=mconley, r=mkmelin

Btw that commit message is much longer than what we had here...
> Bug 558931 - Implement partial/substring matching for each of multiple search words in addressbook
> entries for composition's recipient autocomplete.

So I suggest this as commit message for this bug:

Bug 558931 - In composition's recipient autocomplete, implement *foo* AND *bar* search pattern for multiple search words
Comment 135 Suyash Agarwal (:sshagarwal) 2014-04-22 23:58:02 PDT
(In reply to :aceman from comment #133)
> No, if you get the remaining review earlier, then I will update mine :) I
> still have to get some more reviews.

(In reply to Thomas D. from comment #134)
> So I suggest this as commit message for this bug:
> 
> Bug 558931 - In composition's recipient autocomplete, implement *foo* AND
> *bar* search pattern for multiple search words

Okay :)
Comment 136 neil@parkwaycc.co.uk 2014-04-23 11:22:22 PDT
Comment on attachment 8410764 [details] [diff] [review]
Patch v5

>Bug 558931 - Implement substring match in addressbook entries for composition's recipient autocomplete.
Perhaps the phrase "match all words" would be applicable?

>+    let fullString = aSearchString.trim().toLocaleLowerCase();
>+    // Array of all the terms from the fullString search query
>+    // (separated on the basis of spaces).
>+    let searchWords = fullString.split(/\s+/);
Nit: don't bother calculating the words if we're ignoring the search string.

>-    if (!aSearchString || aSearchString.contains(",")) {
>+    if (!searchWords.length || !aSearchString ||
>+        aSearchString.contains(",")) {
This check is wrong; searchWords.length will never be zero, it's fullString that you need to check.
Comment 137 neil@parkwaycc.co.uk 2014-04-23 11:24:26 PDT
(In reply to comment #136)
> Nit: don't bother calculating the words if we're ignoring the search string.
(i.e. calculate the words after the check for an ignored search string)

> This check is wrong; searchWords.length will never be zero, it's fullString
> that you need to check.

If you really want to check searchWords then I suppose you could write searchWords = aSearchString.toLocaleLowerCase().match(/\S+/g) as that will then be null (as opposed to an empty array) if there are no words.
Comment 138 Suyash Agarwal (:sshagarwal) 2014-04-23 12:38:34 PDT
Created attachment 8411261 [details] [diff] [review]
Patch v5.3

Does this suffice?
Comment 139 neil@parkwaycc.co.uk 2014-04-23 13:50:37 PDT
Comment on attachment 8411261 [details] [diff] [review]
Patch v5.3

>+    let fullString = aSearchString.trim().toLocaleLowerCase();
[I think I would prefer this moved down to where it is first used.]

>+    if (!fullString || !aSearchString ||
[This works but you don't need to check both fullString and aSearchString.]
Comment 140 Suyash Agarwal (:sshagarwal) 2014-04-23 13:58:44 PDT
Created attachment 8411330 [details] [diff] [review]
Patch for check-in

Thanks :)
Yippee!
Comment 141 Suyash Agarwal (:sshagarwal) 2014-04-23 13:59:37 PDT
Comment on attachment 8411330 [details] [diff] [review]
Patch for check-in

Carrying over review from mkmelin, Neil and ui-r from bwinton.
Comment 142 Ryan VanderMeulen [:RyanVM] 2014-04-24 05:43:48 PDT
https://hg.mozilla.org/comm-central/rev/ac5495ab2fdc
Comment 143 :aceman 2014-04-24 07:07:15 PDT
This may have caused test failures at https://tbpl.mozilla.org/php/getParsedLog.php?id=38405795&tree=Thunderbird-Trunk .
There are even JS errors like "aSearchString is null" in the file this patch touches. Some tests may need update after the changed evaluation of the search expression. Can you do it quickly, or do we back the patch out?
Comment 144 Suyash Agarwal (:sshagarwal) 2014-04-24 07:10:35 PDT
(In reply to :aceman from comment #143)
> Some tests may need update after the changed evaluation of the
> search expression. Can you do it quickly, or do we back the patch out?

Oh :( Let me try.
Thanks for letting me know this.
Comment 146 :aceman 2014-04-24 08:28:20 PDT
The affected tests seem to be:
mailnews/addrbook/test/unit/test_nsAbAutoCompleteSearch1.js
mailnews/addrbook/test/unit/test_nsAbAutoCompleteSearch4.js
Comment 147 Gervase Markham [:gerv] 2014-04-24 09:16:51 PDT
Has anyone done performance measurements before and after this patch, with a decent-sized address book? The performance of autocomplete on large books is already terrible (see bug 984875) and we should not make it worse, particularly just before an ESR.

Gerv
Comment 148 Suyash Agarwal (:sshagarwal) 2014-04-25 04:16:47 PDT
Created attachment 8412523 [details] [diff] [review]
Patch for check-in + modified tests

Carrying over review from Standard8 on IRC.
Tests pass locally for me.
Comment 149 Suyash Agarwal (:sshagarwal) 2014-04-25 04:21:39 PDT
Created attachment 8412525 [details] [diff] [review]
Patch for check-in + tests modified

Oh so as per gerv's comment about the performance. Do we have a bigg addressbook to test? Or can we get this in as is?
Comment 150 :aceman 2014-04-25 05:57:50 PDT
I can send you a big addressbook I got for bug 984875.

From the look of the patch it seems you iterate the query with each word found in the search term. So with single word there is no performance change? You even remove the _searchWithinEmails function which is proposed in bug 984875 as a speed win. So for this scenario this could be no or positive change after this patch.
gerv, does this fit your scenario? Or do you do multiword searches?

Suyash, does any of the tests actually test for this new AND functionality? It looks like the 'search: "l f"' may exercise the new feature and therefore finds more matches.

Aryx, could you run this on try server, xpcshell tests only?
Comment 151 Gervase Markham [:gerv] 2014-04-25 06:07:27 PDT
I'm not saying there isn't a performance win; I'm saying that we should measure rather than assume :-)

It's not about "my scenario", it's about making sure we don't regress performance for anyone.

Gerv
Comment 152 Sebastian H. [:aryx][:archaeopteryx] 2014-04-25 06:16:07 PDT
Pushed to Thunderbird-Try: https://tbpl.mozilla.org/?tree=Thunderbird-Try&showall=1&rev=5110428b4e9f
Comment 153 Suyash Agarwal (:sshagarwal) 2014-04-25 07:02:48 PDT
(In reply to :aceman from comment #150)
> I can send you a big addressbook I got for bug 984875.

Thanks :) 
So, please let me know and please send the addressbook to me if we need to test it.

> Suyash, does any of the tests actually test for this new AND functionality?
> It looks like the 'search: "l f"' may exercise the new feature and therefore
> finds more matches.

Ya, that's it.
Comment 154 Suyash Agarwal (:sshagarwal) 2014-04-25 11:14:46 PDT
Created attachment 8412767 [details] [diff] [review]
Final patch for check-in

Made a suggested change from Neil.

So, I tried an addressbook with 16000 entries.
The time taken to show autocomplete results was less than the time shown by addressbook to show the addressbook entries when selected.

So, I hope we can get this in.
Shall we?
Comment 155 :aceman 2014-04-25 13:53:35 PDT
So in my manual tests this patch reduces the time needed to autocomplete 'a' from 30 seconds to 20 seconds with 2 addressbooks totalling 20000 cards, with many duplicates.
I didn't find a case where it would be slower than current state. I'd say this can go in.

I am already working on some optimizations in bug 984875 that I can put on top of this patch.
Comment 156 Suyash Agarwal (:sshagarwal) 2014-04-25 13:55:27 PDT
Thanks :)
Comment 157 Ryan VanderMeulen [:RyanVM] 2014-04-26 14:33:52 PDT
https://hg.mozilla.org/comm-central/rev/65baf781db3c
Comment 158 Thomas D. (currently busy elsewhere; needinfo?me) 2014-07-21 00:24:41 PDT
[tb31features][relnote]

For all practical purposes, this bug 558931 and bug 529584 are a massive improvement (feature enhancement/addition) over the existing autocomplete algorithm of TB24. In fact, the XXL search power implemented in these two bugs is nothing short of a *new feature* which revolutionizes the way users can search for addresses using flexible multiword autocomplete (contains *foo* and *bar*) to narrow down their search results with much greater reliability, efficiency and precision than before, regardless of their data structure in AB fields. This should definitely be mentioned and explained in release notes so that users can benefit from this, but also to prevent surprises about the new behaviour which for certain edge cases (e.g. single-letter-searches) can initially return different (potentially broader) result sets which might require users to adjust their personal search patterns. Also, these bugs so far apply to local ABs only, LDAP is a different story.
Comment 159 D Merrick 2014-08-07 02:48:20 PDT
This is a really long thread, so apologies for not skimming it.
I notice some detrimental changes that have just come in to the autocomplete. Basically, if i've typed a few letters, I'm happy for partial matching to occur, but (certainly for a single block of characters) I'd like addressbook entries which begin with the letters I've typed to be placed first above entries where it appears in the middle. Perhaps there could be an option to do this (is there already?)
The reason is that I set up addresses so that what I type captures the one I want as the first entry, and this is thrown by the new system.
Hope this makes sense!
so typing 'edin' should capture 'edinburgh-list' above 'group-edinburgh'
david
Comment 160 Suyash Agarwal (:sshagarwal) 2014-08-07 02:53:16 PDT
(In reply to D Merrick from comment #159)
Hi David!
Thanks for showing up, really appreciated.
I just wanted to bring to your notice that we are aware of this issue
and we have bug 970456 in progress for that.

Do let us know if there are any more concerns.
Thanks.
Comment 161 Pat 2014-08-18 15:57:06 PDT
I've just updated Thunderbird to version 31 today and the address autocomplete doesn't work at all. I assumed that if I typed in the nickname or the first few letters of the name I've saved to display that a selection of names satisfying the criteria would show up and I could select one. I am having to go to the address book, find the person, copy their email address etc. 
Can you suggest anything while this bug is being fixed?
Thank you.
Pat
Comment 162 Pat 2014-08-18 15:57:58 PDT
I've just updated Thunderbird to version 31 today and the address autocomplete doesn't work at all. I assumed that if I typed in the nickname or the first few letters of the name I've saved to display that a selection of names satisfying the criteria would show up and I could select one. I am having to go to the address book, find the person, copy their email address etc. 
Can you suggest anything while this bug is being fixed?
Thank you.
Pat
Comment 163 Mihovil Stanic [:Mikeyy - L10n HR] 2014-08-18 22:31:01 PDT
It's probably something else blocking autocomplete, maybe even not related to TB.
Here on Win7 SP1, TB 31, everything is working without problems.

New autocomplete option is limited to compose new message window. Try testing with fresh installation, without any addons to see if some of them are blocking it.
Comment 164 Steve Wendt 2014-09-09 17:09:07 PDT
(In reply to Pat from comment #162)
> I've just updated Thunderbird to version 31 today and the address
> autocomplete doesn't work at all.

Not working for me either, in SeaMonkey 2.29 (Gecko 32).
Comment 165 Steve Wendt 2014-09-09 17:30:42 PDT
At least for me, the failure is caused by an extension (which sadly is no longer maintained):

Timestamp: 9/9/2014 5:26:35 PM
Error: search.autoCompleteResult is undefined
Source File: file:///.../extensions/exchangecalendar@extensions.1st-setup.nl/interfaces/exchangeAutoCompleteSearch/mivExchangeAutoCompleteSearch.js
Line: 202

Any pointers to info on how to patch it for changed interfaces?
Comment 166 Steve Wendt 2014-09-09 19:19:15 PDT
Apologies for the noise on this unrelated issue; for anyone still interested in this, a newer fork is available here:
https://github.com/Ericsson/exchangecalendar/releases

But unfortunately it doesn't fix the autocomplete failure; the actual problem seems to be this:

Error: NS_ERROR_XPC_CI_RETURNED_FAILURE: Component returned failure code: 0x80570015 (NS_ERROR_XPC_CI_RETURNED_FAILURE) [nsIJSCID.createInstance]
Source File: file:///.../extensions/exchangecalendar@extensions.1st-setup.nl/interfaces/exchangeAutoCompleteSearch/mivExchangeAutoCompleteSearch.js
Line: 152

Where it's doing this: 
... createInstance(Ci.mivExchangeAutoCompleteResult)

which appears to be defined in mivExchangeAutoCompleteResult.js; looking at that makes me wonder if a change in Lightning broke it:

// Load main script from lightning that we need.
NSGetFactory.mivExchangeAutoCompleteResult = XPCOMUtils.generateNSGetFactory([mivExchangeAutoCompleteResult]);

Again, apologies for the off-topic chatter, since this is about Lightning and more specifically a third-party addon.
Comment 167 Craig Stephen 2014-12-01 12:41:04 PST
OK, interesting feature idea (matching on substrings of email addresses) - but could you at least implement a preference to re-enable the old behavior?  This is a nontrivial change... suddenly when I want to type an important email to my boss and type the first few letters of his name, i get some seemingly random other autocomplete based on a substring match to the middle of someone else's name.

Or at least place prefix matches first in the autocomplete picklist.
Comment 168 :aceman 2014-12-01 12:55:19 PST
(In reply to Craig Stephen from comment #167)
> Or at least place prefix matches first in the autocomplete picklist.
Yes, that is bug 970456. Done and on the way to TB31.

Note You need to log in before you can comment on or make changes to this bug.