Closed
Bug 1077686
Opened 11 years ago
Closed 10 years ago
tokenize responses, index that and expose to input api
Categories
(Input Graveyard :: Backend, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: willkg, Unassigned)
Details
(Whiteboard: u=user c=api p= s=)
We've got some ok-ish tokenizing support for English in Input already. We could tokenize feedback when it's getting indexed and then expose the tokens through the Input API. That way, people using the Input API don't have to tokenize the feedback themselves. This is helpful for word trends, word clouds, etc.
Then as we improve the tokenizing (stemming, stop words, synonyms, etc.), everyone downstream will benefit.
Win!
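A rough sketch of what tokenizing at index time could look like. This is not Input's actual tokenizer or index mapping: the stop word list, the regex, the `tokens` field name, and the `build_index_document` helper are all assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real one would be longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "it", "to", "of", "i"})

# Runs of letters/digits, optionally followed by an apostrophe suffix ("don't").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def build_index_document(response_id, description):
    """Build the dict that would be handed to the search index.

    The "tokens" field is an assumption, not an existing Input field.
    """
    return {
        "id": response_id,
        "description": description,
        "tokens": tokenize(description),
    }


doc = build_index_document(1, "The browser is slow and it crashes")
print(doc["tokens"])
```

If the API returned the `tokens` field alongside the raw description, consumers building word trends or clouds could count tokens directly instead of re-tokenizing each response.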
Comment 1 • Reporter • 11 years ago
One of the problems I've been having is that when you do a Porter stemmer pass, you end up with word roots like "hav" that aren't real words.
Gregg said the way he deals with this is that he has a big map of roots -> real words and then runs that on the resulting tokens. I had that in one of the original tokenizing passes I did, but it requires us to maintain a big map.
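For illustration, the hand-maintained map approach might look something like this. The map entries and the `restore_words` name are hypothetical; the real map would be much larger, which is the maintenance cost mentioned above.

```python
# Hypothetical hand-maintained map of stemmer roots -> readable words.
ROOT_TO_WORD = {
    "hav": "have",
    "brows": "browser",
    "instal": "install",
}


def restore_words(stemmed_tokens, root_to_word=ROOT_TO_WORD):
    """Replace each known root with its readable word; leave others as-is."""
    return [root_to_word.get(token, token) for token in stemmed_tokens]


print(restore_words(["hav", "slow", "brows"]))
```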
I had another idea where we run the Porter stemmer pass and whenever we stem something, we add root -> original word to the map and just keep the shortest original word seen for each root. I think that'll work pretty well and it maintains itself. It does have some edge cases plus I think it requires us to write a Porter stemmer, but maybe that's a good idea anyhow.
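A minimal sketch of the self-maintaining map, under some loud assumptions: `toy_stem` is a crude suffix stripper standing in for a real Porter stemmer (it is not the Porter algorithm), and the `StemMap` class and its method names are invented for this example.

```python
def toy_stem(word):
    """Stand-in for a Porter stemmer: strip one of a few common suffixes.

    This is NOT the Porter algorithm, just enough to show the map idea.
    """
    for suffix in ("ing", "es", "ed", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class StemMap:
    """Stem words and remember the shortest original word for each root."""

    def __init__(self, stem=toy_stem):
        self.stem = stem
        self.root_to_word = {}

    def add(self, word):
        """Stem a word; if stemming changed it, record root -> original."""
        root = self.stem(word)
        if root != word:
            current = self.root_to_word.get(root)
            # Keep the shortest original; break length ties alphabetically
            # so the result does not depend on the order words arrive in.
            if current is None or (len(word), word) < (len(current), current):
                self.root_to_word[root] = word
        return root

    def display(self, root):
        """Return the readable word for a root, or the root if unknown."""
        return self.root_to_word.get(root, root)


stems = StemMap()
for w in ["having", "have", "haves"]:
    stems.add(w)
print(stems.display("hav"))
```

One edge case this sketch shares with the idea as described: a word is only recorded when stemming changes it, so if "crash" and "crashes" both show up, the root "crash" maps to "crashes" even though the root is already a real word.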
Comment 2 • Reporter • 11 years ago
Bumping this to 2015q1 because we're out of time for this quarter.
Whiteboard: u=user c=api p= s=input.2014q4 → u=user c=api p= s=input.2015q1
Comment 3 • Reporter • 10 years ago
Bumping this out of the quarterly sprints. When we need this, we can re-schedule it.
Whiteboard: u=user c=api p= s=input.2015q1 → u=user c=api p= s=
Comment 4 • Reporter • 10 years ago
I was thinking about using this, but I don't think anyone else was. I'm going to WONTFIX this now. If there's a compelling reason for the work to be done, someone will write up a new bug for it.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Updated • Assignee • 9 years ago
Product: Input → Input Graveyard