Closed Bug 797464 Opened 12 years ago Closed 10 years ago

Name based boosting doesn't appear to be working

Categories

(Marketplace Graveyard :: Search, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: krupa.mozbugs, Assigned: robhudson)

References

()

Details

(Whiteboard: [u=dev c=mkt-search p=2][qa-][incorrect_implementation])

Attachments

(1 file)

Attached image screenshot
steps to reproduce:
1. Load https://marketplace-dev.allizom.org
2. Search for "air"
3. Notice the search results

expected behavior:
name match has a higher precedence over description match

observed behavior:
"airbnb" is listed below "chirp radio"
Depends on: 572453
This definitely still happens. The dependent bug points to a missing spec for search criteria in AMO. AMO search doesn't have to be related to Marketplace search here, but either one needs a spec.

And while doing that, we might want to consider things like locale-aware search (stemming, decompounding, non-whitespace tokenizers and all the rest). This is really a separate project where we probably want UX input.
Whiteboard: [u=dev c=mkt-search p=]
Whiteboard: [u=dev c=mkt-search p=] → [u=dev c=mkt-search p=0]
A more recent example is searching for "wiki":
https://marketplace.firefox.com/search/?q=wiki

Wikipedia should definitely get some boosting because it includes "wiki" in its name, but it does not. Wikipedia could help themselves by adding a description, but that's beside the point.

I looked at the Elasticsearch query results with explain=true, and the only boost Wikipedia got was via the app_slug. I'll have to take a closer look at our queries to see why the query-time boosting isn't working as expected.

We should definitely hone this once we're on the new Marketplace index.
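For reference, a query that boosts name matches over description matches, and asks Elasticsearch to explain each hit's score, might look roughly like the sketch below. The field names (`name`, `description`), the boost values, and the helper `build_search_body` are hypothetical; this is not the actual Marketplace query. The clause types (`match`, `prefix`, `fuzzy` inside a `bool` `should`) and the top-level `explain` flag are standard Elasticsearch query DSL.

```python
import json


def build_search_body(q: str, name_boost: float = 4.0, desc_boost: float = 1.0) -> dict:
    """Hypothetical Elasticsearch request body: name clauses outweigh the
    description clause, and `explain` requests a per-hit score breakdown."""
    term = q.lower()
    return {
        "explain": True,  # same effect as ?explain=true on the search URL
        "query": {
            "bool": {
                "should": [
                    # Analyzed match on the app name, weighted most heavily.
                    {"match": {"name": {"query": q, "boost": name_boost}}},
                    # Prefix match, so "wiki" can hit the token "wikipedia".
                    {"prefix": {"name": {"value": term, "boost": name_boost / 2}}},
                    # Fuzzy match allowing one edit, e.g. "aim" vs "air".
                    {"fuzzy": {"name": {"value": term, "fuzziness": 1,
                                        "boost": name_boost / 2}}},
                    # Description matches still count, with a lower weight.
                    {"match": {"description": {"query": q, "boost": desc_boost}}},
                ]
            }
        },
    }


print(json.dumps(build_search_body("wiki"), indent=2))
```

With `explain` on, each hit's `_explanation` shows which of these clauses contributed, which is how the app_slug-only boost above would show up.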
Priority: -- → P3
No longer depends on: 572453
Summary: Weights for search results need tweaking → Name based boosting doesn't appear to be working
Whiteboard: [u=dev c=mkt-search p=0] → [u=dev c=mkt-search p=2]
Assignee: nobody → robhudson.mozbugs
Priority: P3 → P2
I took a closer look at this. Wikipedia *is* getting a prefix query boost (because "wiki" matches the start of "wikipedia"), but the boost isn't very strong. So the query as we've specified it is working; it's just that other apps (e.g. "wikihow") have a stronger match according to Elasticsearch.

I believe Wikipedia doesn't score very highly for a few reasons:
* A better description containing the word "wiki" would help boost Wikipedia's score.
* The distance between "wiki" and "wikihow" (3 extra characters) vs. "wiki" and "wikipedia" (5 extra characters) also plays a part.
* The wikihow app also contains the substring "wiki" more often than Wikipedia's listing does (which never includes "wiki" as a word on its own).
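To make the character counts above concrete, here is a plain Levenshtein edit distance (insert, delete, and substitute each cost 1). It only illustrates the distances mentioned; how Elasticsearch turns a prefix or fuzzy match into a score is not modeled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, keeping only the previous row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


for name in ("wikihow", "wikipedia"):
    print(name, levenshtein("wiki", name))
# wikihow 3
# wikipedia 5
```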

Another option, which AMO uses for certain add-ons, is to split known compound words. E.g. "firebug" would normally be tokenized as a single word, but AMO added a special case so that "firebug" is tokenized as "fire" and "bug"; searching for "fire" then finds the Firebug add-on. The input is a list of words to decompound on, e.g. [fire, bug, flag, fox, grease, monkey, flash, block, ...]. If Marketplace did something similar, we could use it to help find apps like Wikipedia (by including "wiki") and Airbnb (by including "air" or "bnb").
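Elasticsearch ships a `dictionary_decompounder` token filter configured with a `word_list`. The sketch below is a simplified, hypothetical stand-in for that idea (not Lucene's implementation, which also has subword-length limits): it keeps the original token and additionally emits every listed word found inside it. The word list is illustrative, a few entries from the AMO list above plus "wiki", "air", and "bnb" as suggested.

```python
def decompound(token: str, word_list: list[str]) -> list[str]:
    """Hypothetical simplified decompounder: return the lowercased token,
    followed by every dictionary word it contains, ordered by position."""
    token_l = token.lower()
    words = {w.lower() for w in word_list}
    out = [token_l]  # the original token is always kept
    for start in range(len(token_l)):
        for end in range(start + 1, len(token_l) + 1):
            piece = token_l[start:end]
            # Skip the whole token itself; it is already in the output.
            if piece in words and piece != token_l:
                out.append(piece)
    return out


WORDS = ["fire", "bug", "fox", "grease", "monkey", "wiki", "air", "bnb"]
print(decompound("firebug", WORDS))    # ['firebug', 'fire', 'bug']
print(decompound("Wikipedia", WORDS))  # ['wikipedia', 'wiki']
print(decompound("airbnb", WORDS))     # ['airbnb', 'air', 'bnb']
```

Because the extra subword tokens are indexed alongside the original, an exact-term query for "wiki" or "air" would then hit Wikipedia or Airbnb directly instead of relying on the weaker prefix match.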

> expected behavior:
> name match has a higher precedence over description match

This is indeed what is happening. But "airbnb" is a single word, and "air" only matches it through a prefix match; the word "air" appears nowhere in its name or description. So apps like "AirCombat" and "Air Hockey" get a higher boost. "AirCombat" matches because we tokenize at mixed-case boundaries, so it is indexed as "Air" and "Combat". "Aim Point Pool" is an interesting case: "aim" is a single edit away from "air", so the fuzzy match kicks in, and since the match is in the name it gets a bigger boost. By contrast, the edit distance from "air" to "airbnb" is 3. "Airport" has the word "air" in its description and is scored higher as well.
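The matching behavior described above can be reproduced with a small, hypothetical model: a tokenizer that splits on whitespace and at lowercase-to-uppercase boundaries, plus exact, fuzzy (at most one edit), and prefix checks against the name tokens. The helper names are made up, and the model ignores scoring entirely; it only reports the first match type that applies, checked in the order exact, fuzzy, prefix.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def tokenize(name: str) -> list[str]:
    """Split on whitespace and at lowercase-to-uppercase boundaries, then lowercase."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # "AirCombat" -> "Air Combat"
    return [t.lower() for t in spaced.split()]


def name_match_kind(query: str, name: str, max_edits: int = 1) -> str:
    """Report the first name-match type that applies: exact, fuzzy, prefix, or none."""
    q = query.lower()
    tokens = tokenize(name)
    if q in tokens:
        return "exact"
    if any(levenshtein(q, t) <= max_edits for t in tokens):
        return "fuzzy"
    if any(t.startswith(q) for t in tokens):
        return "prefix"
    return "none"


for app in ["AirCombat", "Air Hockey", "Aim Point Pool", "Airbnb"]:
    print(f"{app}: {name_match_kind('air', app)}")
# AirCombat: exact
# Air Hockey: exact
# Aim Point Pool: fuzzy
# Airbnb: prefix
```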

I'm inclined to close this bug. The query is working as expected, but the data in our search index may not produce the results we'd expect in some circumstances.

We could also morph this bug into wanting to add a dictionary decompounder and feed it a list of words to start off and see how it helps.
Thanks for the great analysis, Rob. I think morphing this bug would just cause confusion, so I'm going to close it. New bugs are welcome to cite it. CCing dbialer to think about the dictionary decompounder option. It sounds useful to me.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [u=dev c=mkt-search p=2] → [u=dev c=mkt-search p=2][qa-]
Whiteboard: [u=dev c=mkt-search p=2][qa-] → [u=dev c=mkt-search p=2][qa-][incorrect_implementation]