Open Bug 1635100 Opened 5 years ago Updated 1 year ago

Introduce new visits transitions

Categories

(Toolkit :: Places, task, P2)

task
Points:
5

Tracking

()

ASSIGNED
Iteration:
81.1 - July 27 - Aug 09

People

(Reporter: mak, Assigned: mak)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Whiteboard: [sng])

I think we may want a few new transitions, both to solve current issues like "everything marked as typed" and to support better algorithms in the future.

I think we should introduce a TRANSITION_SEARCH, for visits that are searches starting from the urlbar.
Then we likely want to disambiguate TYPED. There are 3 possible transitions we could generate from here:
TRANSITION_TYPED: this has actually been typed by the user and it's not a search
TRANSITION_UI_PICKED: this has been picked from any UI, and it's neither a search nor a bookmark
TRANSITION_BOOKMARK: the url was bookmarked at the time of the visit. This transition already exists but it's only used by bookmark views

Of course the scores won't be immediately perfect because we can't fix old entries, but they should adapt over time. In case of a downgrade, unknown transition types will get a zero bonus.
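A minimal sketch of how the proposed transitions and the zero-bonus fallback could fit together (this is not actual Places code). The values for LINK, TYPED and BOOKMARK are assumed to match the existing nsINavHistoryService constants; the values for SEARCH and UI_PICKED, the bonus numbers, and the `visit_bonus` helper are hypothetical and only for illustration.

```python
from enum import IntEnum


class Transition(IntEnum):
    # Existing transitions (values assumed to match nsINavHistoryService).
    LINK = 1
    TYPED = 2
    BOOKMARK = 3
    # Proposed transitions; these numeric values are hypothetical.
    SEARCH = 10
    UI_PICKED = 11


# Hypothetical bonuses, keyed by the raw integer stored with each visit.
BONUS = {
    int(Transition.LINK): 100,
    int(Transition.TYPED): 200,
    int(Transition.BOOKMARK): 150,
    int(Transition.SEARCH): 50,
    int(Transition.UI_PICKED): 120,
}


def visit_bonus(raw_transition: int) -> int:
    """Return the bonus for a stored transition, or 0 if it is unknown.

    An older build reading a database written by a newer one (a downgrade)
    will see values it has no entry for, and those get a zero bonus.
    """
    return BONUS.get(int(raw_transition), 0)
```

With this shape, a build that predates SEARCH and UI_PICKED simply lacks those keys, so visits stored with them contribute nothing instead of being misread as another type.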

There are a few details to define here:

  1. If I pick a url from a history view, and that url is bookmarked, should it take TRANSITION_BOOKMARK or TRANSITION_UI_PICKED (does it matter more that it's bookmarked or where it comes from)? Checking bookmarks before every visit is of course a bit more expensive. If we just use the info we have at hand (for example, in the urlbar we know the bookmarked status for most entries) it will be cheap, but in certain cases we won't properly mark the visit as bookmarked.
  2. The fact an entry is bookmarked is a sign it's important to the user, but does it matter if it WAS bookmarked? The problem is that if we only consider the current status, calculated scores won't be stable (starting from the same data at different times may give different results depending on whether something WAS/IS bookmarked). Hopefully if we have a steep slope, thus a rapid decrease in scores after a few days, the old scores won't matter much so it may become a non-problem.
  3. If I pick autofill, should that be considered TYPED?
  4. What else do we risk marking as typed when it's a url not known to history?
Blocks: 1461844
Blocks: 1330343
Flags: needinfo?(dao+bmo)
Flags: needinfo?(adw)

My main thought is that these transitions overlap. For example, and you mention this, you can type a search, so does it get TYPED or SEARCH? You say that TYPED doesn't include searches, but why not? You can type a bookmarked URL, you can pick a bookmark from the UI, etc.

I think that's because these transitions mix two different dimensions, how and what -- how a transition happens (UI picked, typed) vs. what the URL is (search, bookmark). We should use one or the other, or we could have two transition dimensions/scores that are both applied, one for how and one for what.

After breaking it down into these two dimensions, it follows that we might have transition scores for each RESULT_TYPE (the what), and similarly transition scores for each UI access point (the how). Of course some number of the scores in each dimension could end up being the same, if we're not sure how to weigh bookmarks vs. history for example, or bookmarks sidebar vs. history sidebar.
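As a rough sketch of the two-dimensional idea, each visit could carry a "how" (UI access point) and a "what" (result type), with one score table per dimension. Every name and number below is hypothetical, and multiplying the two scores is just one possible way to apply both.

```python
# Hypothetical per-dimension score tables; several entries share a value,
# reflecting cases where we are not sure how to weigh one against another.
HOW_SCORE = {
    "urlbar": 1.0,
    "bookmarks_sidebar": 1.0,
    "history_sidebar": 1.0,
}
WHAT_SCORE = {
    "search": 0.5,
    "bookmark": 1.5,
    "history": 1.0,
}
BASELINE = 1.0  # used for anything we have no specific score for


def visit_score(how: str, what: str) -> float:
    """Combine the access-point score and the result-type score."""
    return HOW_SCORE.get(how, BASELINE) * WHAT_SCORE.get(what, BASELINE)
```

For example, `visit_score("urlbar", "bookmark")` uses both tables, while an unlisted access point or result type falls back to the baseline.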

Another thought is that TYPED still isn't clear because you can start out by typing something and then pick a result. If you type one or two characters and then pick a URL result or a search result, is that TYPED or UI_PICKED? How many characters would it take to count as TYPED? At least one? Or do you have to type the full search/URL and choose the heuristic result? Should there be TYPED_AND_PICKED_NON_HEURISTIC and TYPED_AND_PICKED_HEURISTIC? Is it even useful to differentiate, or should there be a single URLBAR score in the UI-access-point dimension?

(In reply to Marco Bonardo [:mak] from comment #0)

  2. The fact an entry is bookmarked is a sign it's important to the user, but does it matter if it WAS bookmarked? The problem is that if we only consider the current status, calculated scores won't be stable (starting from the same data at different times may give different results depending on whether something WAS/IS bookmarked). Hopefully if we have a steep slope, thus a rapid decrease in scores after a few days, the old scores won't matter much so it may become a non-problem.

I'd guess that most people aren't bookmarking and unbookmarking things frequently at all... but that's just a guess. So I doubt it matters too much, and we could just use the current bookmarked status.

  4. What else do we risk marking as typed when it's a url not known to history?

Boosting typos and other mistyped things? And as I say, it depends on how we define TYPED. If TYPED means anything you start typing and then pick a result, it will end up applying to lots of transitions again.

Flags: needinfo?(adw)

(In reply to Drew Willcoxon :adw from comment #1)

My main thought is that these transitions overlap. For example, and you mention this, you can type a search, so does it get TYPED or SEARCH? You say that TYPED doesn't include searches, but why not? You can type a bookmarked URL, you can pick a bookmark from the UI, etc.

I think it doesn't matter much for our scope: we are basically assigning a score at the time of the visit rather than later. Of course, if our picks turn out to be wrong in the future, we'll have the same problem as today. But it's potentially not a big deal because scores decay, so defining new ones will just require a short time for the old ones to decay.
With that vision, we'd apply the score that best fits the user action.

I think that's because these transitions mix two different dimensions, how and what -- how a transition happens (UI picked, typed) vs. what the URL is (search, bookmark). We should use one or the other, or we could have two transition dimensions/scores that are both applied, one for how and one for what.

The scope of transitions, as they were defined, is to describe "how we reached this url". In the search example, I can't think of a case where we'd care whether a search is typed or not.
Of course, we have to make a scoring decision at the time of the visit.

After breaking it down into these two dimensions, it follows that we might have transition scores for each RESULT_TYPE (the what), and similarly transition scores for each UI access point (the how).

Let's see which problems we may have when introducing 2 dimensions:

  1. Space: with 300k visits each growing by a further integer, we'll likely need to grow the db by about 500KiB. That's probably OK: we'd lose some history but nothing major, and we have space according to telemetry.
  2. Complexity: when calculating scores we must pick more values. It's probably OK, the added cost should be negligible.
  3. Not everything has 2 dimensions. There's a risk that very few things do, in which case we'd have over-engineered our requirements and ended up with added complexity without added benefit. For this we likely need to put down a table of cases and see how we plan to score them. If the scoring plan ends up being 1-dimensional (for example TYPED SEARCH and PICKED SEARCH get the same score, as I suspect), I'd prefer to avoid this complexity.
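The space estimate in point 1 can be sanity-checked with a quick back-of-the-envelope calculation. This sketch assumes each additional small integer costs between 1 and 2 bytes per row, and it ignores page overhead and indexes; both the per-value cost and the `extra_kib` helper are assumptions for illustration.

```python
def extra_kib(visits: int, bytes_per_value: int) -> float:
    """Extra storage in KiB for one more small integer on every visit row."""
    return visits * bytes_per_value / 1024


low = extra_kib(300_000, 1)   # assumed best case: 1 byte per value
high = extra_kib(300_000, 2)  # assumed worst case: 2 bytes per value
print(f"{low:.0f} KiB to {high:.0f} KiB")  # prints "293 KiB to 586 KiB"
```

Under those assumptions, the roughly 500KiB figure sits inside the estimated range.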

Another thought is that TYPED still isn't clear because you can start out by typing something and then pick a result. If you type one or two characters and then pick a URL result or a search result, is that TYPED or UI_PICKED? How many characters would it take to count as TYPED? At least one? Or do you have to type the full search/URL and choose the heuristic result? Should there be TYPED_AND_PICKED_NON_HEURISTIC and TYPED_AND_PICKED_HEURISTIC? Is it even useful to differentiate, or should there be a single URLBAR score in the UI-access-point dimension?

TYPED is one of my main concerns. As I said, I'm not even sure that the fact something is typed matters that much: whether the user pastes a url or types it, is it more important than a bookmark?
My current idea of TYPED is pretty much just the heuristic result: if the url comes from the heuristic result it's either SEARCH or TYPED, anything else is either SEARCH or UI_PICKED.
Again, I should probably put down a table of possible user actions, mark the "how" and "what" for each, and see where we end up.
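The heuristic-based rule described above could be sketched roughly like this. The function name and its boolean inputs are hypothetical; a real implementation would read them from the picked urlbar result.

```python
def classify_urlbar_pick(is_heuristic: bool, is_search: bool) -> str:
    """Map a picked urlbar result to one of the proposed transitions.

    Searches are always SEARCH. Otherwise only the heuristic result
    counts as TYPED, and any other result is UI_PICKED.
    """
    if is_search:
        return "SEARCH"
    return "TYPED" if is_heuristic else "UI_PICKED"
```

Under this rule, typing a couple of characters and picking a non-heuristic history result is UI_PICKED, which keeps the definition of TYPED strict.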

I'd guess that most people aren't bookmarking and unbookmarking things frequently at all... but that's just a guess. So I doubt it matters too much, and we could just use the current bookmarked status.

Fair point. Now we have the problem of what to mark as bookmarked: every url that is bookmarked? Only urls for which we already have that info? Start with the latter and then see if we can extend?

Boosting typos and other mistyped things? And as I say, it depends on how we define TYPED. If TYPED means anything you start typing and then pick a result, it will end up applying to lots of transitions again.

Absolutely, I'd like a very strict definition of TYPED, pretty much just typed and confirmed with Enter.

Thanks, I'll start with making a table of user actions and possible outcomes.

To be clear, I'm not suggesting we need to come up with a score for each possible result type and each possible UI access point. We may not care for many of them, so we might want a baseline don't-care score (which is maybe zero or one, I don't know). My main point is that I think it's clarifying to think of these scores along those two dimensions. Otherwise things get confusing.

If the scope of transitions is "how we reached this URL" and we want to keep that scope, then we can ignore the what dimension and focus on the how, from my previous comment. So that would suggest we don't want SEARCH or BOOKMARK transitions but instead something like (and I'm being verbose for clarity) URLBAR_TYPED_AND_PICKED_HEURISTIC, URLBAR_TYPED_AND_PICKED_NON_HEURISTIC, URLBAR_TOP_SITES (i.e., no typing), BOOKMARKS_SIDEBAR, BOOKMARKS_TOOLBAR, etc. Or to your question about whether typed is useful, maybe URLBAR_INPUT_AND_PICKED_HEURISTIC and URLBAR_INPUT_AND_PICKED_NON_HEURISTIC to cover any sort of input (e.g. paste), or more simply URLBAR_HEURISTIC, URLBAR_NON_HEURISTIC.

(In reply to Marco Bonardo [:mak] from comment #2)

Fair point, now we have the problem of what to mark as bookmarked, every url if it's bookmarked? only urls for which we already have that info? start with the latter and then see if we can extend?

As I say, if we want to focus on how, then it seems like we want to score based on the UI access point and not whether the URL is bookmarked per se.

(In reply to Drew Willcoxon :adw from comment #3)

As I say, if we want to focus on how, then it seems like we want to score based on the UI access point and not whether the URL is bookmarked per se.

Well, there is also our scope, which is scoring, so marking just the origin is not enough; for example, not annotating whether something is bookmarked doesn't sound good for our scope. I'd also prefer not to have one entry for each piece of UI, because those change and we're unlikely to need to be that fine-grained.
Anyway, I'll start with that table.

Iteration: --- → 78.2 - May 18 - May 31
Assignee: nobody → mak
Status: NEW → ASSIGNED

I'm adding a wip spreadsheet.

Iteration: 78.2 - May 18 - May 31 → 79.2 - June 15 - June 28

More concrete WIP spreadsheet, with an actual proposal for first implementation.
Discussing with MDB, it emerged that we can start with something simple that lays the basis for a future, more complicated system. We may not make perfect assumptions immediately, and that's where experiments and data will help. It's important that at this stage we don't break things too much and that we introduce fallbacks that will pave the way for future changes.
I've taken Drew's suggestion for a 2-dimensional description of visits from the first WIP proposal I posted here some weeks ago.

Drew, I'd appreciate it if you could have a look at this, since I'm planning to start implementing it to see how complex things are in reality and what we missed.
The first things to do will likely be to define the new transitions and fallbacks, then add the new column, and then make the UI properly mark sources.
Entries without a source will fall back to the base one (url), thus they may present a different score, but frecency adapts as time goes by anyway, so it'll just be a temporary problem. Adding more sources and categorization will improve scores over time, and then once we move on with better recalcs (bug 1618605) and an algo with a steeper slope, the problem should pretty much be limited to the slope time (somewhere between 7 and 30 days).
Ideally, experiments should be able to change how visits are categorized (more/less categories), modify the decay slope and recalculate scores on the alt_frecency column, while the frecency column keeps the non-experimental value.
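To illustrate why the fallback is only a temporary problem, here is a small sketch that pairs the source fallback with a simple exponential decay. The exponential model, the half-life parameter, and both helper names are stand-ins chosen for illustration; they are not the actual frecency algorithm.

```python
from typing import Optional

DEFAULT_SOURCE = "url"  # base source used when a visit has none recorded


def effective_source(source: Optional[str]) -> str:
    """Return the visit source, falling back to the base one when missing."""
    return source if source is not None else DEFAULT_SOURCE


def decay_weight(age_days: float, half_life_days: float) -> float:
    """Weight of a visit of the given age under exponential decay."""
    return 0.5 ** (age_days / half_life_days)
```

With an assumed half-life of 7 days, `decay_weight(30, 7)` is already a small fraction of the weight of a fresh visit, so visits categorized with the fallback stop mattering much once they are older than the slope time.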

Also note that by using this we can immediately start fixing the "typed" problem without having a new algo (see the bugs blocked by this), so it will be an incremental improvement.

Looks good to me overall! Picking the expected score category seems pretty hard though. For example, why do the search-related actions end up with the navigation category but other actions in the urlbar are userpicks? I'm guessing it's because searches and search suggestions aren't user data? That might make sense. And some bikeshedding on the "userpicks" and "uipicks" names -- it's not clear how those are different. The distinction between those two categories seems to be primary UI vs. secondary UI? If so, maybe primarypicks and secondarypicks are better names?

This looks like a good foundation to build on.

Flags: needinfo?(adw)

(In reply to Drew Willcoxon :adw from comment #8)

Looks good to me overall! Picking the expected score category seems pretty hard though. For example, why do the search-related actions end up with the navigation category but other actions in the urlbar are userpicks? I'm guessing it's because searches and search suggestions aren't user data?

Because in general we want to push search results down the list, searches tend to not be repeated that often and we have suggestions that can do a better job. These should improve with experiments anyway.

That might make sense. And some bikeshedding on the "userpicks" and "uipicks" names -- it's not clear how those are different. The distinction between those two categories seems to be primary UI vs. secondary UI? If so, maybe primarypicks and secondarypicks are better names?

Yes, names are not great, the categories are described in the WIP document, and I agree there's some confusion to be solved.
My original idea was to just have 3 categories (matters, neutral, doesn't matter), but discussing with the team it emerged that we may want to give bookmarks/typed a bit more importance, thus these 2 categories were born.
I'm not sure primary and secondary are right, and the way I assigned them is not perfect either (picking history in the urlbar vs. picking a bookmark from the toolbar, for example). This is another thing that may require experimentation.
For now I may go with primarypicks and secondarypicks, meaning the user intents rather than directly the UI levels.

Iteration: 79.2 - June 15 - June 28 → 80.1 - June 29 - July 12
Iteration: 80.1 - June 29 - July 12 → 81.1 - July 27 - Aug 09
Flags: needinfo?(dao+bmo)

Another interesting condition to consider is whether we want to give a boost to pages that have a password stored.
There are a few things to consider though:

  1. if passwords are per origin, we'd end up boosting every page in that origin
  2. if passwords are for specific pages, we could end up boosting login pages, which are not that interesting

So, needs more brainstorming.

Depends on: 1842008
Whiteboard: [sng]