Open Bug 1067681 Opened 10 years ago Updated 2 years ago

Create MetaSearch rank order via Prefs for Address Autocomplete

Categories

(Thunderbird :: Address Book, enhancement)

Tracking

(Not tracked)

UNCONFIRMED

People

(Reporter: educmale, Unassigned)

References

(Depends on 1 open bug)

Details

This bug is an in-depth suggestion for handling various problems with ranking and sorting which arise in the context of address autocomplete.

It addresses problems where different users have different expectations from the autocomplete, either an in-general expectation or hope, or one derived from the way that TBird worked in prior incarnations.

For example: some folks have large lists and want nicknames shown first; others want a straight alphabetical sort of the actual email address.  Some tolerate small (or larger) delays as the autocomplete fills; others would rather do away with some complexity for a quick type response (See, for example, Bug 

This bug/suggestion was originally posted in bug 970456, and is set out here at the suggestion of Thomas D.

----------

There are, now, and in the past have been, a set of MetaSearches; those MetaSearches are ranked, then executed in some sequence, and the results are sorted and displayed by some consequential rank.

The MetaSearches look in different places.  They include searches of:
eMailAddress; NickNames; DomainNames; (TLDs); FirstNames; LastNames; Last&First or First&Last; etc.

The ranking also relies on the AddressUseCount (on this, see Bug 1058583, "Address Book Popularity Index needs to age").  If aged, the AddressUseCount becomes a better PopularityIndex, one that reflects recent use.
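One way to picture an "aged" use count is a decayed sum, where each past use counts for less the older it is. The Python sketch below is for illustration only: the exponential decay, the 30-day half-life, and the name `aged_score` are all assumptions, not what Bug 1058583 specifies and not Thunderbird code.

```python
def aged_score(use_timestamps, now, half_life_days=30.0):
    """Sum of past uses, each weighted by how recent it is.

    use_timestamps: times (in seconds) at which the address was used.
    now: current time in seconds.
    half_life_days: assumed decay constant; a use this old counts as 0.5.
    """
    score = 0.0
    for t in use_timestamps:
        # Clamp so a timestamp slightly in the future cannot count above 1.
        age_days = max(0.0, (now - t) / 86400.0)
        score += 0.5 ** (age_days / half_life_days)
    return score
```

With a weighting like this, an address used once today outranks one used several times a year ago, which a raw count would not do.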

The execution order of the MetaSearches, the aging, or any other sort method is now fixed in stone from the fingertips and mind of the designer/programmer.   My suggestion allows some user control, and addresses many of the problems raised in the bugs referenced here and elsewhere.

This is the essence of my suggestion, here:

For each of the MetaSearches programmed, create a preference which has a value of 0-n.
A value of "0" means that MetaSearch is not undertaken.
A non-zero value means that the search is undertaken.  They are undertaken in the order of the preference -- 1 is first and n is last.  MetaSearchPrefs can share a number, but there would be no expectation of ordering when so set.

Default values are those that the designer/programmer wants; the programmer obviously includes all the possible ones, throws out the ones he hates (setting their prefs to "0"), and crafts an order of undertaking/execution for the rest.

The code for any ranking (whether it is one of the MetaSearches or reliance on AddressUseCount) is executed in the sequence given by its MetaSearchPreference value.  Again, this order defaults to the order set by the designer/programmer.
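The scheme above can be sketched in a few lines of Python, for illustration only. The search names, the contact fields, and `run_meta_searches` are hypothetical and do not correspond to real Thunderbird identifiers or preferences. A value of 0 disables a search; the rest run in ascending order of their value.

```python
from typing import Callable, Dict, List

Contact = Dict[str, str]
Search = Callable[[str, List[Contact]], List[Contact]]


def by_field(field: str) -> Search:
    """Build a search matching contacts whose `field` starts with the query."""
    def search(query: str, contacts: List[Contact]) -> List[Contact]:
        q = query.lower()
        return [c for c in contacts if c.get(field, "").lower().startswith(q)]
    return search


# Hypothetical MetaSearches, keyed by an invented name.
SEARCHES: Dict[str, Search] = {
    "nickname": by_field("nickname"),
    "email": by_field("email"),
    "last_name": by_field("last_name"),
}


def run_meta_searches(query: str, contacts: List[Contact],
                      prefs: Dict[str, int]) -> List[Contact]:
    """Run every search whose pref is non-zero, lowest pref value first."""
    # Searches sharing a value fall back to name order here; the proposal
    # promises no particular order in that case.
    enabled = sorted((rank, name) for name, rank in prefs.items()
                     if rank > 0 and name in SEARCHES)
    results: List[Contact] = []
    seen = set()
    for _rank, name in enabled:
        for contact in SEARCHES[name](query, contacts):
            if contact["email"] not in seen:  # keep the first hit only
                seen.add(contact["email"])
                results.append(contact)
    return results
```

Calling `run_meta_searches("bo", contacts, {"nickname": 1, "email": 2, "last_name": 0})` would list nickname matches first, then email matches, and skip the last-name search entirely.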

Now, here is the beauty of this.  Any user can shut down, or for that matter, reorder all the rankings' executions, by merely changing the preference values, in about:config.   Realistically, someone will quickly create an Add-On which provides some guidance and means to elegantly and simply set values.  Realistically, folks will create directions for changing things to (whatever one wants in a search tool), and they will appear all over the internet.

My programming experience tells me that this global approach should be relatively easy to engage.   Except for the pain of creating the prefs.

An additional beauty is that as the designer search/sort is re-programmed (that's inevitable...), what is really happening is the changing/addition of MetaSearches, the order in which those MetaSearches are presented, and the nature/means of the final sort.   -Each- of those steps occurs in some sequence; the MetaSearchPrefs values define that sequence; the MetaSearchPrefs are -always- accessible to users; and the user is ultimately in control.

With this simple and moderate change (from a fixed sequence to a sequence ordered by MetaSearchPref values), we address ("provide a means to cure") -all- of the problems that crop up in -all- of the bugs, suggestions and complaints that revolve about the autocomplete function.
In my view, the resolution of this bug/suggestion does not firmly depend on the resolution of any other bug.  However, it would be wise to enable this mechanism in the current changes suggested within Bug 970456, and/or other bugs which address problems with the address autocomplete.

Once this backbone is in place, current or other (future) changes would just follow the pattern set up as suggested here.

Thomas: I don't know how to add a SeeAlso in bug 970456.  For some reason, the edit doesn't show up
OS: Windows 8 → All
Hardware: x86_64 → All
John, make sure to CC me if you want me to see certain comments, even needinfo for important questions.

About reciprocal "see also" links, no idea why it didn't work. Perhaps a rights thing. Filed a comment on Bug 553932 and added the missing links. See also: Bug 853479.
See Also: → 1068570
I really like the suggestions here, thanks for taking the time and effort to create this John!
I suspect that once we get autocomplete tuned up that the potential usage for this will be far fewer people than when this bug is first filed.  In any event, this sounds more like addon territory than prefs - prefs is still going to limit you to what programmers have predesigned. And, I suspect it unlikely that such prefs would be approved to be exposed in the first place.
Severity: normal → enhancement
An interesting and valuable comment, Wayne.   

However...while I recognize the problems with exposing preferences, the only people who would actually dive into them would be those with serious and distinctive expectations about their address searches -- I suggest that -those- people would tend to be power users.   Thus the potential for mania by the general population is limited

I suspect that if this bug/suggestion were implemented, the schema changes that people would tend to select or place in service would be: last names high, nicknames high, pure email addresses high, or aged popularity index (see my other bug/suggestion at bug 1058583), with smaller populations looking for other elements.  As such, my perspectives are that programming will make the individual search elements efficient, and that no amount of tuning would cure large blocks of users' expectations and, in contrast, access to some tuning tools will solve lots of issues -- this bug/suggestion is one tuning tool that could be implemented simply (if I understand what is going on)

In my mind, the order of execution of (or perhaps the order of ranking of) search elements is selected by programmers, for the moment.  Adding in the gloss, suggested here, has some elements of simplicity for programming.  Each of the search elements, or ordering, is (converted to) a procedure call.  Each of the procedures is called, in sequence selected by the current programming gurus.   Why not merely call those in the order suggested by a set of preferences?   

Were I skilled in deciphering the current code, and then skilled with the coding bureaucracy and use patterns, I would give this a shot, but, alas, I am a VBA and Fortran coder...

So, I hope that someone will dive into the searching code, and make some global procedural blocks, and add some prefs, and use them.... :)
I doubt the code is all that complicated.  And there are people who will help you. https://developer.mozilla.org/en-US/Add-ons/Thunderbird will help, should you decide you don't want to wait for this bug to be fulfilled. Although, you may need an API to autocomplete
Like Wayne, I agree that we shouldn't be adding preferences here. If we really can't improve autocomplete to the point where everyone is basically happy with the results, then we should just ensure that add-ons have appropriate hooks into this code.

It might be instructive to look at how the Firefox AwesomeBar works. It's often cited as one of the main things keeping people on Firefox, and I have to agree with them: it's really good! One super cool thing about the AwesomeBar is that it *learns* from your usage. If address autocomplete had something like this, it would only take a short time of training before it adjusted to each user's needs, and then *everyone* (even the people who don't futz with settings all the time) would be happy.
Jim and Wayne:  I understand your perspective -- but could it be that the location bar is, in its essence, a frecency that depends on a narrow scope of data, that is, the URL's and their text title, while in comparison, the address seeking has far more combinations of source data that various users lean on, making the email address in TBird a far harder nut to crack?

See, for example, the endless goings on at just this one bug, pre -and- post resolved...

Bug 970456 - Recipient autocomplete: Implement scoring algorithm to prioritize "beginsWith" wordwise matches over "contains" matches: search for "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the middle of word)
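The rule in that bug title can be illustrated with a small Python sketch. This is not the algorithm that landed in Bug 970456; the function names and the 2/1/0 scores are invented for illustration, and splitting words on non-alphanumeric characters is an assumption.

```python
import re


def match_score(query: str, text: str) -> int:
    """2 if any word in `text` begins with `query`, 1 if `text` merely
    contains it, 0 if there is no match. Case-insensitive."""
    q = query.lower()
    lowered = text.lower()
    words = [w for w in re.split(r"[^a-z0-9]+", lowered) if w]
    if any(w.startswith(q) for w in words):
        return 2
    if q in lowered:
        return 1
    return 0


def rank(query: str, names: list) -> list:
    """Drop non-matches, then list higher-scoring names first.
    sorted() is stable, so equal scores keep their input order."""
    hits = [(match_score(query, n), n) for n in names]
    hits = [(s, n) for s, n in hits if s > 0]
    return [n for _s, n in sorted(hits, key=lambda pair: -pair[0])]
```

For the example in the title, `rank("be", ["HolBErt", "Ben", "Betty"])` puts "Ben" and "Betty" ahead of "HolBErt".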

Wayne: a rewrite of the structural essence as I suggest, a rewrite by me particularly, would be so inefficient as I would have endless questions to everyone about how to make GIT and everything else sparkle!

But, perhaps, there is a preference-less solution:

Why not make the structural changes, alone, foregoing the preferences and using internal constants which add-on authors could manipulate

I would suggest placing the search elements and ordering elements into procedures (a total of n elements), and creating two internal constant arrays, of length n -- one refers to the procedure name somehow, and the other array is an index array.  Then, merely, call the procedures in the sequence suggested by the index array, allowing for not calling a procedure if its order is zero.  The only other requirements are that the (current or future) elements should be narrow in the 'normalized' sense, if you will.  Each element should do one small thing to change the order, or provide content lines for the ultimate master list.  Also, any new/added element/schema should also be segregated and provided a pair of constants, and the order/rank re-stated for the whole set of elements.
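A rough Python sketch of the two-array idea follows, for illustration only. The three procedures and the names `PROCEDURES`, `ORDER` and `run_pipeline` are hypothetical; each procedure does one small thing to the running result list, as suggested above.

```python
def add_prefix_matches(query, contacts, results):
    """Append contacts that begin with the query and are not yet listed."""
    q = query.lower()
    extra = [c for c in contacts
             if c.lower().startswith(q) and c not in results]
    return results + extra


def add_substring_matches(query, contacts, results):
    """Append contacts that contain the query and are not yet listed."""
    q = query.lower()
    extra = [c for c in contacts if q in c.lower() and c not in results]
    return results + extra


def sort_alphabetically(query, contacts, results):
    """Reorder the list built so far; adds nothing."""
    return sorted(results, key=str.lower)


# Two parallel constant arrays of length n: the procedures, and their order.
PROCEDURES = (add_prefix_matches, add_substring_matches, sort_alphabetically)
ORDER = (1, 2, 0)  # 0 means "do not call"; an add-on could supply its own


def run_pipeline(query, contacts, procedures=PROCEDURES, order=ORDER):
    """Call each procedure in the sequence given by `order`, skipping zeros."""
    if len(procedures) != len(order):
        raise ValueError("procedures and order must have the same length")
    active = sorted((rank, i) for i, rank in enumerate(order) if rank > 0)
    results = []
    for _rank, i in active:
        results = procedures[i](query, contacts, results)
    return results
```

An add-on would only need to pass a different `order` tuple, for example `(1, 2, 3)` to switch the final alphabetical sort on, without touching the procedures themselves.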

With the programming change I suggest, there would be a quick flurry of folks to write add-ons that do all sorts of tweaks, massaging the order and making funky interface dialogs that make it all work.

I would, by the way, include my "Bug 1058583 - Address Book Popularity Index needs to age" as one of the procedural things that can be turned on/off, this way.
FYI, info on FF location bar frecency scoring algorithm:
   https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Places/Frecency_algorithm
[Thanks, Thomas]
See Also: → 546737
Thomas:  546737 is interesting; one more category of ranking that someone could craft.   My suggested meta ranking algorithm would take that idea up so quickly... and those that were interested in it could summon that special class anywhere in the meta ranking.
(In reply to john ruskin from comment #8)
> Jim and Wayne:  I understand your perspective -- but could it be that the
> location bar is, in its essence, a frecency that depends on a narrow scope
> of data, that is, the URL's and their text title, while in comparison, the
> address seeking has far more combinations of source data that various users
> lean on, making the email address in TBird a far harder nut to crack?

I don't think so. FF frecency is actually pretty complex if you've seen it. It's not just URLs and their text titles; at least they also consider Tags and Keywords (which is nicknames in TB); plus URLs are plenty from history and bookmarks. Their source data set is actually massively bigger than the average TB address book, because there's no limit to the number of URLs searched for people who keep their history and/or have many bookmarks. Plus, there's percentages in weighting, and also there's a database which records user search inputs and the specific results they pick from that input. And that's the key element where FF trumps by achieving dynamically customized rankings for each user.

Autocomplete is for finding a single recipient; the most likely scenario for that is searching for Name or Email. We do and we can allow searching for other fields (I'm even advocating to expand the field list, and/or allow customization), but that approach will still use those other fields just as a tool to ultimately find a unique contact. So it's more like using other fields as tags, and I've also advocated for introducing an explicit "tags" field for that purpose. But the main purpose is really just finding a single recipient from your search input, not creating a results list with refined sorting (which this bug might tempt users to do).

> See, for example, the endless goings on at just this one bug, pre -and- post
> resolved...
> 
> Bug 970456 - Recipient autocomplete: Implement scoring algorithm to
> prioritize "beginsWith" wordwise matches over "contains" matches: search for
> "be" should toplist "Ben" and "Betty" over "HolBErt" (having "be" in the
> middle of word)

A lot of that discussion is just noise, people jumping to the wrong conclusions from problems they see, and people not understanding the inner design and design goals of TB autocomplete algorithms (including its bugs).

That Bug 970456 is more like a bandaid fix (for age-old problems now more exposed) which has already started to blow because we couldn't make up our minds or lacked understanding between fixed and dynamic concepts of sorting, and we had started in that bug to practically disable "popularityIndex" which wasn't a good idea since that's the main design idea currently used. So we have already rolled back a large chunk of Bug 970456, namely the absolute priority for initial display name matches (which breaks every semblance of frecency currently realized as crippled popularityIndex).

> a rewrite of the structural essence as I suggest

The inherent design goal of recipient autocomplete is to find a single (!) contact. So the criterion for the usefulness of this bug 1067681 is in how far it can offer ways of succeeding to get "the right", single(!) contact toplisted. Well-sorted result lists are NOT a goal in itself, but just an intermediate goal to the end goal of toplisting the right contact.

This bug allows users to create a fine-tuned sort order for results based on meta ranking of fields or other sorting algorithms.

I have big doubts that this bug 1067681 is the right and required path for achieving that result, because for most variations the stable result sorting created here will NOT be good enough to toplist the desired, single (!) contact... UNLESS you make "frecency" incl. search-input+result-picked-database your top criterion!

I don't see the point of allowing users to sort the autocomplete results list by field-based criteria, say first on last names, then on first names, etc. Such sorting is NOT helpful to narrow down on a single contact, because you can still have 20 Johns in your AB, so you'll always end up picking manually from the list (whereas autocomplete is meant to find your best match and toplist that). Any user-defined meta ranking which uses fields as top sorting criterion will fail to find a single result unless field contents are unique, which is when they are either nicknames or tags (but never names). Another disadvantage of top-sorting by fields is that it will force you to always use search words which match against that field. In the long run, I doubt that's a natural way of searching. We aren't robots, so we tend to remember/associate different things about different contacts. Back to square one, the most useful and important thing we need is a "frecency"-based algorithm to decide which of the 20 Johns is your favorite under the given circumstances (including predictive analysis of your previous search-and-pick-match behaviour). TB should learn that when you type "tb qa", you've often picked ThomasD after that search. Users who want to pick from a stiff alphabetical list can use the main AB or composition's miniature variant of that, the contacts side bar. We can't spend time to duplicate such functionality.

Bottom line:
1) Recipient autocomplete's inherent design is to find a single contact.
2) Making alphabetical fields like Name, Email etc. the top sort criterion is usually NOT suitable for finding single contact; and we already offer alphabetical results list in AB and contacts side bar.
3) We already offer fixed alias names aka Nicknames, and they will actually work starting from TB38.
4) Contact tags (additional search-words on the contact) would be a nice addition which are somewhat in between fixed alias names (nicks) and dynamic alias names (frecency). Existing bug.
5) Dynamic alias names aka Autocomplete frecency (including predictive analysis of search-input+picked-result patterns) is a superior solution to find single contact because it dynamically adapts to the user's search patterns, even as they change over time, as opposed to forcing the user into fixed schemes of searching to succeed with certain field-based algorithms like alphabetical name, email etc.
6) Because of 1) and 2), this bug 1067681 would depend on a working implementation of Autocomplete Frecency (5), because without such, it can't help much to find a single contact. But it looks like a lot of work and maintenance burden.
7) After getting Autocomplete frecency (5) "right", and allowing contact tagging, there's not much need for this bug 1067681, because TB will actually learn about your search patterns and deliver good results for ALL sorts of people. You can type "Doe" as a last name and TB will learn which of your "Doe" contacts is the right one. You can type "john." as part of an email and TB will associate that with your favorite contact. Anything goes, and eventually succeeds to find single contact after a bit of learning. More so for multiple searchwords, which will rapidly narrow down your search results google-style.
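The learning part of points 5) and 7) can be sketched roughly as follows, in Python for illustration only. `PickHistory` and its methods are hypothetical names. The sketch only counts how often a result was picked for a given search input; a real frecency algorithm would also weight recent picks more heavily, which is left out here.

```python
from collections import defaultdict


class PickHistory:
    """Remembers which result the user picked for each search input."""

    def __init__(self):
        # normalized query -> picked contact -> number of times picked
        self._counts = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _normalize(query: str) -> str:
        return query.strip().lower()

    def record(self, query: str, picked: str) -> None:
        """Call this when the user accepts `picked` after typing `query`."""
        self._counts[self._normalize(query)][picked] += 1

    def rank(self, query: str, candidates: list) -> list:
        """Most-picked candidates for this query first. sorted() is stable,
        so candidates never picked keep their incoming (secondary) order."""
        counts = self._counts.get(self._normalize(query), {})
        return sorted(candidates, key=lambda c: -counts.get(c, 0))
```

After a few calls to `record("tb qa", "ThomasD")`, `rank("tb qa", ...)` toplists ThomasD, while the order it was given still decides the rest of the list, which matters during the training phase.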

Sorry I'm a bit struggling with words...

Basically, I'm arriving at the same conclusion as comment 7.
See Also: autocompleteFrecency
(In reply to john ruskin from comment #5)

> I suspect that if this bug/suggestion were implemented, the schema changes
> that people would tend to select or place in service would be: last names
> high,

last names high is nonsense for autocomplete, because last names are non-unique so they are not suitable to find the needle in the haystack, unless combined with some supporting algorithm like frecency.

> nicknames high, 

Full nickname matches MUST topscore by inherent design as unique alias names, and it'll work in TB38.
See Bug 325458 how/if we handle partial nickname matches.

> pure email addresses high,
We currently have high scores for all matches against beginning of word-like parts of the visible string (display name and email). Email addresses are part of that.
I don't think prioritizing email field in general is good in the long run for any user. For some people, we know their names, for others, we'll remember their email. Always top-listing email field results can break it for you very quickly.

> or aged popularity index

No! It's never "OR" with "aged popularity index". Aged popularity index aka autocomplete frecency (considering your input, too) can only work when it's the top sorting criterion. Putting anything else first will basically break it. And putting frecency first makes all other criteria become cosmetic for sorting the rest of the list (but secondary sorting is still needed for the training phase of frecency where you pick things from further down the list).
This is a completely different animal compared to all the other field-based stuff, and it's very powerful and adaptive as seen in FF awesome bar.

> (see my other bug/suggestion at bug 1058583),

Yes, we definitely need something like that.

> with smaller populations
> looking for other elements.

Unlikely, and addon area.

> As such, my perspectives are that programming
> will make the individual search elements efficient, and that no amount of
> tuning would cure large blocks of users' expectations

No. Intelligent frecency can work for large blocks of users. Many users succeeded even with that crippled version of popularityIndex, as can be seen from the clamor when the results changed.

>  and, in contrast,
> access to some tuning tools will solve lots of issues -- this bug/suggestion
> is one tuning tool that could be implemented simply (if I understand what is
> going on)

Probably no, I don't think anything is simple to implement in that area.

> In my mind, the order of execution of (or perhaps the order of ranking of)
> search elements is selected by programmers, for the moment.

Making "frecency" the top criterion actually returns the ranking definition to the user!
I must wholeheartedly agree with Thomas here... and thanks Thomas for taking the time to try to back up your (our) view on how this should work, and that 'intelligent' frecency simply must be the top priority - and I'd say it should be hard-coded, except that I also like options, so I'd be ok with a user pref that could change that for advanced users...
Severity: normal → S3