Closed Bug 1874068 Opened 5 months ago Closed 4 months ago

Querying for Pocket and AMO suggestions that have multiple keywords with the same first word returns duplicates of those suggestions

Categories

(Application Services :: Suggest, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lina, Assigned: ttran)

Details

(Whiteboard: [disco])

The Rust component uses a different matching strategy for Pocket and AMO keywords:

  • An exact match on the first word of the keyword, and...
  • Either a prefix match (for AMO and low-confidence Pocket keywords) or an exact match (for high-confidence Pocket keywords) on everything after the first word of the keyword.
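As a rough illustration of the two-step strategy above, here is a minimal Rust sketch. The names `SuffixMatch`, `split_keyword`, and `keyword_matches` are hypothetical and are not taken from the component; the sketch also assumes words are separated by a single space.

```rust
/// How the part of a keyword after its first word must match the query.
/// Hypothetical type for illustration; not the component's actual API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SuffixMatch {
    /// AMO and low-confidence Pocket keywords.
    Prefix,
    /// High-confidence Pocket keywords.
    Exact,
}

/// Splits a string into its first word and everything after it.
/// Assumes a single space separates the first word from the rest.
fn split_keyword(s: &str) -> (&str, &str) {
    match s.split_once(' ') {
        Some((first, rest)) => (first, rest),
        None => (s, ""),
    }
}

/// Returns true if `query` matches `keyword` under the given mode:
/// the first words must be equal, and the remainder of the query must
/// either be a prefix of, or equal to, the remainder of the keyword.
fn keyword_matches(query: &str, keyword: &str, mode: SuffixMatch) -> bool {
    let (query_prefix, query_suffix) = split_keyword(query);
    let (keyword_prefix, keyword_suffix) = split_keyword(keyword);
    if query_prefix != keyword_prefix {
        return false;
    }
    match mode {
        SuffixMatch::Prefix => keyword_suffix.starts_with(query_suffix),
        SuffixMatch::Exact => keyword_suffix == query_suffix,
    }
}
```

Under these assumptions, a query of just the first word (empty remainder) prefix-matches every keyword that starts with that word, which is the situation this bug is about.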

These keywords live in the prefix_keywords table, which has a compound primary key over (keyword_prefix, keyword_suffix). Multiple rows can share the same keyword_prefix (the first word), as long as each keyword_suffix (everything after the first word) is unique, so the SELECT in SuggestDao::fetch_suggestions returns rows for every suggestion with a matching first word. (We match everything after the first word in Rust, for both Pocket and AMO.)

The bug comes up when a single suggestion has multiple keywords with the same first word. The query in fetch_suggestions only matches on the first word (keyword_prefix), so it'll return one row per matching keyword, all with the same suggestion_id.

For example, the keywords for the "Search by Image" add-on are ["image finder", "image history", "image investigator", "image query", "image search"], so if the user's query is image, we'll return 5 copies of Search by Image.
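To make the duplication concrete, here is a hedged sketch of one way to collapse such rows to one result per suggestion. It is not necessarily the fix that landed for this bug. `KeywordRow` and `matching_suggestion_ids` are hypothetical names, and the rows are assumed to have already been selected by first word.

```rust
use std::collections::HashSet;

/// One row returned by the first-word lookup.
/// Hypothetical shape for illustration; not the actual table mapping.
#[derive(Clone, Debug, PartialEq)]
struct KeywordRow {
    suggestion_id: i64,
    keyword_suffix: String,
}

/// Prefix-matches the rest of the query against each row's suffix and
/// returns each matching suggestion ID once, in first-seen order.
fn matching_suggestion_ids(rows: &[KeywordRow], query_suffix: &str) -> Vec<i64> {
    let mut seen = HashSet::new();
    rows.iter()
        .filter(|row| row.keyword_suffix.starts_with(query_suffix))
        // `insert` returns false if the ID was already present,
        // which drops the repeated rows for the same suggestion.
        .filter(|row| seen.insert(row.suggestion_id))
        .map(|row| row.suggestion_id)
        .collect()
}
```

With the five "image ..." rows for a single add-on and a query of image (empty remainder), this returns that add-on's ID once instead of five times.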

Another example: let's say a Pocket story has low-confidence keywords like ["studying aardvarks", "studying anteaters", "studying armadillos"], and the user's query is studying a. Their query could match any of these keywords, for the same suggestion. Right now, we don't provide the "full keyword" for Pocket suggestions (bug 1874028), but if we did, which one should we pick?


Hi Drew! I'm wondering if you have input on this case:

Another example: let's say a Pocket story has low-confidence keywords like ["studying aardvarks", "studying anteaters", "studying armadillos"], and the user's query is studying a. Their query could match any of these keywords, for the same suggestion. Right now, we don't provide the "full keyword" for Pocket suggestions (bug 1874028), but if we did, which one should we pick?

You mentioned we're shelving Pocket for now, and AFAICT, AMO suggestions don't have a full keyword (yet?), so maybe this is a bit moot—but in case this comes up again, do you have a preference for which full keyword we pick if the user types just the complete first word?

Flags: needinfo?(adw)
Blocks: 1877306

Hi Lina, sorry for the delay, I'm bad about looking at needinfos these days. Right, AMO doesn't show a full keyword, and there are no plans to change that. In general I don't have a preference, and since there's never really a right answer, I don't think it matters too much. We can choose whichever is most natural/easy for the implementation IMO. IIRC for AMP suggestions desktop chooses the longest matching full keyword.

The only way there'd be a right answer is if we had the concept of "how well does a particular suggestion match the user." All we do so far of course is simple keyword matching, so we're not there yet, but hopefully someday we will be.
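For reference, the "longest matching full keyword" choice mentioned in this comment could look roughly like the sketch below. This is an assumption about the general idea only, not the desktop implementation; `longest_matching_keyword` is a hypothetical helper, and it treats the whole query as a prefix of the whole keyword for simplicity.

```rust
/// Returns the longest keyword that starts with `query`, if any.
/// Hypothetical helper; simplified to whole-string prefix matching.
fn longest_matching_keyword<'a>(query: &str, keywords: &[&'a str]) -> Option<&'a str> {
    keywords
        .iter()
        .copied()
        .filter(|keyword| keyword.starts_with(query))
        .max_by_key(|keyword| keyword.len())
}
```

For the Pocket example in this bug, a query of studying a would pick "studying armadillos", since it is the longest of the three matching keywords.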

Flags: needinfo?(adw)

Tif's PR was merged so I'll close this now, thanks Tif!

Status: NEW → RESOLVED
Closed: 4 months ago
OS: macOS → All
Hardware: x86_64 → All
Resolution: --- → FIXED