Closed Bug 1246459 Opened 5 years ago Closed 5 years ago

prioritize web api reference by default

Categories

(developer.mozilla.org Graveyard :: Search, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lonnen, Assigned: robhudson)

Details

(Keywords: DevAdvocacy, in-triage, Whiteboard: dupeme?)

When I search MDN for 'bind' I get:

1. toolbarBindings.xml
2. WebIDL bindings
3. binding (XUL/binding)
4. binding (XUL/bindings)
5. Binding Implementations(XBL/XBL_1.0_Reference/Binding_Implementations)

Function.prototype.bind() is the 11th result, so it doesn't even show up on the first page.

Function.prototype.call() is the 38th result; virtually unfindable.


Fixing search is a tough technical issue. We may be able to get big wins cheaply if we can boost JavaScript and CSS, or at least restrict non-web API reference by default.
Looks like it's matching on "bindings" because of stemming and that word shows up a lot.

There are a couple of other bugs that are similar in that the search query is a specific javascript/css/html thing which is a keyword and thus case-sensitive and shouldn't be stemmed or otherwise changed. In those cases, exact matches should rise to the top of search results.

Skimming our indexing, it seems like it is prose-centric. I wonder if we could analyze rendered_html twice: once prosey (like we currently have) and once with an analyzer that doesn't ditch case and doesn't do stemming. Then add an additional term match on the new document field. Maybe do this with the title field, too? Seems like an interesting thing to try.
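A rough sketch of the "analyze twice" idea, written as plain Python dicts in the Elasticsearch 1.x mapping style. Only `rendered_html` and `title` come from this bug; the `exact` sub-field, the `exact_case` analyzer, the `code_tokens` tokenizer, the `document` type name, and the use of `snowball` as the existing prose analyzer are all assumptions for illustration and have not been checked against the kuma index.

```python
def build_index_body(doc_type="document"):
    """Index settings plus mapping with a prose field and an exact sub-field."""
    # Hypothetical sub-field: same text, but no lowercasing and no stemming.
    exact = {"type": "string", "analyzer": "exact_case"}
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # Split on non-word characters so "Function.prototype.bind()"
                    # yields the tokens Function, prototype, bind.
                    "code_tokens": {"type": "pattern", "pattern": r"\W+"},
                },
                "analyzer": {
                    # No token filters at all: case is kept, nothing is stemmed.
                    "exact_case": {
                        "type": "custom",
                        "tokenizer": "code_tokens",
                        "filter": [],
                    },
                },
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    # "snowball" stands in for whatever prose analyzer is in use.
                    "title": {
                        "type": "string",
                        "analyzer": "snowball",
                        "fields": {"exact": dict(exact)},
                    },
                    "rendered_html": {
                        "type": "string",
                        "analyzer": "snowball",
                        "fields": {"exact": dict(exact)},
                    },
                }
            }
        },
    }


def build_query(text):
    """Match the prose fields, plus extra term matches on the exact sub-fields."""
    fields = ["title", "rendered_html", "title.exact", "rendered_html.exact"]
    return {"bool": {"should": [{"match": {f: text}} for f in fields]}}
```

With something like this, a document containing the exact token `bind` would score on both the prose and the exact fields, while one that only contains `bindings` would score on the stemmed fields alone.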

Having said that, after you do a search for "bind", clicking on "JavaScript" in the filters causes "Function.prototype.bind()" to show up as the first result. Would you know which filter to use to make your search better? Do users use filters?

Regarding boosting specific topics over other topics, I haven't seen anything about the purpose of MDN search. Is it something like "MDN search helps web developers find web-development-related content in MDN"? If not, what is it? What are the use cases MDN search is currently targeted at? Knowing what MDN search is for would illuminate whether topic boosting is good to do.
> Would you know which filter to use to make your search better? Do users use filters?

My instinct is no but we can probably check with GA.

> Regarding boosting specific topics over other topics, I haven't seen anything about the purpose of MDN search.

As I understand it, we have a few different audiences -- web developers, addon developers, fxos open webapp devs, firefox hackers, maybe others. I'm suggesting we could quickly optimize the search results for one of those audiences at the expense of the others by changing the default filters.
Our largest audience is, by far, Web developers.
FWIW the "Command & Query" search UI would allow users to quickly choose the filters. So searching for "bind #js" would allow you to get to the JavaScript-specific results much faster.

That new search UI has a few bugs outstanding before it is considered ready for launch:
https://bugzilla.mozilla.org/show_bug.cgi?id=1182261#c1
Keywords: in-triage
Whiteboard: dupeme?
That UI sounds really flexible, but changing the defaults is a more powerful measure.
This is extremely important to the devrel team: increasing the relevance of MDN search results for web developers will make education and adoption much easier.
Keywords: DevAdvocacy
Agreed. Web Developers are the primary audience for MDN and should be the default.
I talked to Lonnen about this. I'm going to grab it and work on it this week.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
It'd help to have like 5 different search queries to play with as I'm tinkering with search stuff.

* bind
* ...

What are four more that exhibit this behavior where it exists in multiple categories, but we want web-related ones to show up higher?

If no one can think of others, I'll spend some time doing GA poking. Figured I'd ask in case there were.
These have the desired result on the second page:

* bind
* call

These have the desired results on the first page, but lower than Firefox internals versions of the web api

* window
* promise
* workers
* xmlhttprequest
* apply
* postmessage
* navigator

These have a single good result on the top and the rest of the page is not quality:

* regexp

These have an obscure result in the web api above the obvious common result and may not be worth fixing in this bug:

* slice
I hit a bunch of problems, most of which I got over, but it means I can't finish this in an appropriate amount of time. So I'm going to take what I did, toss it in a branch, and then do a brain dump here and pass it off to Rob, who kindly offered to take this from me.

Brain dump goes like this:

I think we want to use a function_score query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html) to boost the scores for documents with "CSS", "HTML" and "JavaScript" tags. I was going to try with a boost of 8.0 because that looks like it'll bump things around for "bind" well, but figuring out a boost number that makes the searches in comment #10 good will require some trial and error.
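For illustration, the shape of such a query might look like the sketch below, built as a plain Python dict. The `tags` field name and the `boost_web_tags` helper are hypothetical, and the name of the multiplier key is left as a parameter because it differs between Elasticsearch versions; 8.0 is just the starting value mentioned above.

```python
def boost_web_tags(query, tags=("CSS", "HTML", "JavaScript"),
                   boost=8.0, boost_key="weight"):
    """Wrap `query` so documents carrying any of `tags` score `boost` times higher."""
    return {
        "function_score": {
            "query": query,
            "functions": [
                {
                    # Only documents matching this filter get the multiplier.
                    "filter": {"terms": {"tags": list(tags)}},
                    boost_key: boost,
                }
            ],
            # Multiply the original relevance score by the function result.
            "boost_mode": "multiply",
        }
    }


wrapped = boost_web_tags({"match": {"title": "bind"}})
```

Documents without any of the listed tags keep their original score, so the boost only reorders results relative to each other; it does not remove anything from the result set.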

AMO uses function_score query:

https://github.com/mozilla/zamboni/blob/master/mkt/search/filters.py#L103

MDN search has a nice filter system. We'll want to add a FunctionScoreFilterBackend to the list in kuma/search/filters.py. Doing that might let us fix other bugs.

I haven't figured out how to write a test for this other than to verify that the resulting query has the function_score stanza in it. Scoring is a finicky business and my experience has been that writing tests for it is fragile.

Tags are set in the database, so we might want to expose which tags yield the score boost to the admin. I think it's worth hard-coding the list for now to get it out sooner so that we can tinker with it and make sure it works well in prod then write up a bug for making it easier to change via the admin. Maybe we have different groups of tags that get different boosts. I don't know what we'll ultimately need here. Maybe the function score thing is a terrible idea and we want to try other things.

I wrote a harness that will make it easier to see movement in search results as one tweaks the code. I'll stick that in a gist in case it's helpful.
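The gist itself is not reproduced here. As a stand-in, a minimal harness of this kind could compare where the wanted document lands before and after a change. Everything below (function names, the shape of the inputs) is hypothetical; it assumes each run has already been collected as a mapping from query to an ordered list of result titles.

```python
def rank_of(results, wanted):
    """Return the 1-based position of `wanted` in `results`, or None if absent."""
    for position, title in enumerate(results, start=1):
        if title == wanted:
            return position
    return None


def rank_movement(before, after, expected):
    """Map each query to (old_rank, new_rank) for its expected top document."""
    return {
        query: (
            rank_of(before.get(query, []), wanted),
            rank_of(after.get(query, []), wanted),
        )
        for query, wanted in expected.items()
    }


# Made-up result lists, only to show the output shape.
before = {"bind": ["toolbarBindings.xml", "WebIDL bindings", "Function.prototype.bind()"]}
after = {"bind": ["Function.prototype.bind()", "WebIDL bindings"]}
movement = rank_movement(before, after, {"bind": "Function.prototype.bind()"})
```

Tracking rank positions rather than raw scores keeps the comparison meaningful across changes that rescale every score.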

One thing that makes this impossible to work on right now is that we're running Elasticsearch 1.3.2 in production, but the vagrant vm is set up to run 1.7.2. This is a problem because Elasticsearch has undergone significant API changes between those two versions, one of which is whether to use "boost" or "weight" in the function_score query. So before we can work on this, we need to figure that out. I think it's problematic to be developing with one version but running a different one in -prod, so I claim we should change everything to use Elasticsearch 1.3.2 across the board. I think that'll require fixing the vagrant vm provisioning as well as checking what Travis is using. Another alternative that's possibly harder is to switch everything to use 1.7.2. I don't know why webops has us running 1.3.2, but it's not supported and isn't getting security updates, so maybe pushing to upgrade to 1.7.2 is a better path.
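If the two versions did have to coexist for a while, one stopgap would be to pick the parameter name from the cluster version. The sketch below assumes that the per-function multiplier is spelled `boost_factor` before Elasticsearch 1.4 and `weight` from 1.4 onward; that cutoff is a recollection, not something confirmed in this bug, and should be checked against the documentation for both versions. The helper names are hypothetical.

```python
def parse_version(version):
    """Turn a version string such as '1.3.2' into a comparable tuple (1, 3, 2)."""
    return tuple(int(part) for part in version.split("."))


def score_multiplier_key(es_version):
    """Name of the function_score multiplier for the given server version.

    Assumption: "weight" exists from 1.4 on; older servers use "boost_factor".
    """
    if parse_version(es_version) >= (1, 4):
        return "weight"
    return "boost_factor"


prod_key = score_multiplier_key("1.3.2")     # production cluster in this bug
vagrant_key = score_multiplier_key("1.7.2")  # vagrant vm in this bug
```

This only papers over one difference; running the same version everywhere, as proposed above, avoids needing such shims at all.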

So, to move forward with this, I think whoever takes it needs to:

1. figure out whether to downgrade the vagrant vm Elasticsearch to 1.3.2 (need to check Travis, too) or upgrade the -stage and -prod clusters to 1.7.2, write up bugs and complete those

2. continue the research into whether function_score works for all the search terms in comment #10 and whether there are undesired effects, write the code, test it, make a pr, review it, merge it, push it out and celebrate
Assignee: willkg → nobody
Status: ASSIGNED → NEW
I'm actively working on this.

I've discovered that tweaking the function score query isn't going to help much for query terms that match documents really well. E.g. I tested "bind" and boosted relevance for documents tagged "javascript" by 20x, and still the top results were things related to the WebIDL or XUL bindings. I think we will have a difficult time finding a good boost value that provides relevant results for web developers when many documents fit the query terms well yet aren't what a typical "web developer" is looking for.

Following up on this comment:

(In reply to Chris Lonnen :lonnen from comment #2)
> As I understand it, we have a few different audiences -- web developers,
> addon developers, fxos open webapp devs, firefox hackers, maybe others. I'm
> suggesting we could quickly optimize the search results for one of those
> audiences at the expense of the others by changing the default filters.

The direction I'd like to take this bug is this: If no filters are selected, the default filters will be HTML, CSS, and JavaScript. Debatable are "Web Development" and possibly "APIs & DOM". The writers would know best whether those should be included by default.

What this means is that searching from the homepage, or from any search box other than the one on the search results page, would automatically be filtered to documents tagged HTML, CSS, or JavaScript.
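A minimal sketch of that default-filter rule, assuming the filters arrive as a list of slugs. The slugs, the `DEFAULT_TOPICS` name, and the `from_results_page` flag are hypothetical; in particular, treating an empty selection on the results page as "search everything" is one reading of the proposal, not something the patch is confirmed to do.

```python
# Hypothetical filter slugs for the proposed defaults.
DEFAULT_TOPICS = ("css", "html", "js")


def effective_topics(selected, from_results_page=False):
    """Return the topic filters to apply for a search request."""
    if selected:
        # The user chose filters explicitly: respect them as-is.
        return list(selected)
    if from_results_page:
        # Assumption: clearing all filters on the results page means no filtering.
        return []
    # Homepage or any other search box: fall back to the web defaults.
    return list(DEFAULT_TOPICS)


homepage_search = effective_topics([])
```

Hard-coding the defaults in one tuple keeps it easy to add further topics later, such as "APIs & DOM", without touching the selection logic.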

I'm removing bug 1196354 as a dependent bug because doing this no longer requires function scoring.

Also, bug 1182261 would be a very nice follow up bug once this is complete.
Assignee: nobody → robhudson
I think "APIs & DOM" is needed too as this includes all the Web APIs that are not in the JS definition.
Depends on: 1257622
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/4be458dceed0132e4ecad2eb36f3cab98cb333c2
Fix bug 1246459 - Prioritize web by default

https://github.com/mozilla/kuma/commit/7f15829572f2872795363c8d5929ec615cce8c7c
Merge pull request #3814 from robhudson/1246459-search-web

Fix bug 1246459 - Prioritize web by default
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Is there a follow-up bug for APIs & DOM? Definitely need that turned on as well.
Fix deployed to production today. CSS, HTML, and Javascript were added immediately after. I also just now added "APIs & DOM" as requested in comment #14 and #16.
See Also: → 1337461
Product: developer.mozilla.org → developer.mozilla.org Graveyard