[research] evaluate SUMO search APIs for best results given a piece of feedback

Status

RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: willkg, Assigned: aokoye)

Details

(Whiteboard: u=user c=feedback p= s=input.2015q1)

Attachments

(2 attachments)

SUMO has three API endpoints which can take a piece of text, search the knowledge base and questions and return results. We need to figure out which of the three gives us the best results for a piece of feedback.

This bug covers using a script I wrote that pulls feedback from the Input API, queries the three SUMO API endpoints with each piece of feedback, and prints the results so you can mark which endpoint's results you think are best. We can then analyze that data to figure out:

1. how many words should a piece of feedback have before we query SUMO?

2. are there other distinguishing characteristics we can use to increase the likelihood that asking SUMO for things will help the user?

3. which of the three SUMO API endpoints gives us the most useful results?
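The comparison workflow described above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the attached sumo_api_test.py: the HTTP calls to the Input API and the three SUMO endpoints are left out, so each endpoint is represented by a caller-supplied function, and the names `compare_endpoints`, `min_words`, and `top_n` are illustrative. The 7-word default mirrors the thank-you page threshold mentioned in comment 5, and the top-five cutoff matches what the endpoints returned there.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical type: a function that sends text to one SUMO search endpoint
# and returns result titles. The real HTTP calls are not shown in this bug,
# so callers supply their own implementation.
SearchFn = Callable[[str], List[str]]


def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of feedback."""
    return len(text.split())


def compare_endpoints(
    feedback: Iterable[str],
    endpoints: Dict[str, SearchFn],
    min_words: int = 7,
    top_n: int = 5,
) -> List[dict]:
    """For each piece of feedback with at least min_words words, collect the
    top_n results from every endpoint so a reviewer can mark the best one."""
    rows = []
    for text in feedback:
        if word_count(text) < min_words:
            continue  # too short to be worth querying SUMO
        results = {name: search(text)[:top_n] for name, search in endpoints.items()}
        rows.append({"feedback": text, "results": results})
    return rows


if __name__ == "__main__":
    # Fake endpoints standing in for the real SUMO APIs.
    fake_endpoints = {
        "basic": lambda q: [f"basic hit {i}" for i in range(10)],
        "advanced": lambda q: ["advanced hit"],
    }
    sample = ["Firefox crashes every time I open a new tab", "too slow"]
    for row in compare_endpoints(sample, fake_endpoints):
        print(row["feedback"], {k: len(v) for k, v in row["results"].items()})
```

Making `min_words` a parameter is what lets the same data answer question 1: rerun the comparison with different thresholds and see how the share of useful results changes.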
I need to tweak the script a little before I attach it. I'll do that tomorrow.
Created attachment 8553134
sumo_api_test.py

This is the script we should use to figure out which SUMO search API endpoint we should use.
(Assignee)

Comment 3

4 years ago
Created attachment 8554884
results.tar

I just attached a tar of the results directory. The script didn't stop at 100, so there are 120 results included.
Assignee: nobody → aokoye
Adam: How're you doing on conclusions from the data?
(Assignee)

Comment 5

4 years ago
The results of the research lead me to believe that the best API endpoint is SUMO Basic Search. Will's script tested the SUMO Search Suggest, SUMO Basic Search, and SUMO Advanced Search API endpoints by taking feedback with a locale of en-US and sending each piece of feedback to all three endpoints. Each endpoint returned its top five results. For each piece of feedback, I read the feedback and judged which endpoint's results looked best. If more than one endpoint had good results, I noted that, and if none of them had good results, I noted that as well.

In the end I went through results for 108 pieces of feedback that had at least 7 words (the minimum word count that triggers the new thank-you page), and the best endpoint was SUMO Basic Search. Unfortunately, it could potentially have solved only 50.9% of the queries; the SUMO Advanced Search and Search Suggest endpoints could have solved 47.2% and 43.5%, respectively. Of the queries where Basic Search's results weren't useful, 9 would have gotten helpful results from the Advanced Search endpoint.
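Because more than one endpoint could be marked as good for the same feedback, the per-endpoint percentages above are computed independently and need not sum to 100%. A minimal sketch of that tally, assuming each piece of feedback is labeled with the set of endpoints whose results were judged useful (an empty set when none helped); the function names and label format are hypothetical, not taken from the results file. Assuming each percentage is a count out of the 108 pieces of feedback, they correspond to 55, 51, and 47 pieces respectively (for example, 55/108 ≈ 50.9%); those counts are inferred from the rounded percentages, not read from the data.

```python
from typing import Dict, List, Set


def success_rates(labels: List[Set[str]], endpoints: List[str]) -> Dict[str, float]:
    """labels[i] is the set of endpoints judged useful for feedback i.
    Returns, per endpoint, the percentage of feedback it could have solved."""
    if not labels:
        return {name: 0.0 for name in endpoints}
    total = len(labels)
    return {
        name: 100.0 * sum(name in good for good in labels) / total
        for name in endpoints
    }


def rescued_by(labels: List[Set[str]], primary: str, fallback: str) -> int:
    """Count feedback where `primary` was not useful but `fallback` was."""
    return sum(primary not in good and fallback in good for good in labels)


# Inferred counts out of 108 reproduce the reported percentages.
for name, count in [("basic", 55), ("advanced", 51), ("suggest", 47)]:
    print(name, round(100 * count / 108, 1))
```

A `rescued_by(labels, "basic", "advanced")` count is the kind of figure behind the "9 would have gotten helpful results" observation, and it is a quick way to judge whether falling back to a second endpoint is worth the extra request.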
Awesome! We'll go with the SUMO basic search endpoint.

One thing I want to point out is that technically, this isn't an API endpoint; it's a JSON-formatted version of the results from /search. If that ever changes on SUMO, we'll have to update our code, too. If this project makes it past phase 1, we'll want to implement an actual endpoint on SUMO that does what basic search does and switch to that.
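One way to contain that dependency is to build the search URL in a single helper, so that switching to a real endpoint later touches only one function. This is a hypothetical sketch: the host, the locale prefix, and the assumption that the JSON view of /search is selected with a `format=json` query parameter are not confirmed in this bug.

```python
from urllib.parse import urlencode

# Assumptions (not confirmed in this bug): the SUMO host, the locale prefix,
# and that the JSON view of /search is selected with format=json.
SUMO_BASE = "https://support.mozilla.org"


def basic_search_url(text: str, locale: str = "en-US") -> str:
    """Build the URL for SUMO's JSON-formatted /search results.

    Keeping this in one place means only this helper changes if SUMO later
    exposes a dedicated search endpoint."""
    query = urlencode({"q": text, "format": "json"})  # spaces become '+'
    return f"{SUMO_BASE}/{locale}/search?{query}"


print(basic_search_url("firefox crashes on start"))
```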

We've done what we need to do here, so marking as FIXED.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Moving things out of the input.adam sprint.
Whiteboard: u=user c=feedback p= s=input.adam → u=user c=feedback p= s=input.2015q1
Product: Input → Input Graveyard