Closed Bug 948925 Opened 11 years ago Closed 11 years ago

bigrams on stage and prod are weird

Categories

(Input Graveyard :: Dashboard, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

(Whiteboard: u=analyzer c=dashboard p=1 s=input.2013q4)

The bigrams being generated are wrong in totally funky ways.


For example:

https://input.allizom.org/en-US/analytics_dashboard/occurrences_comparison?product=Firefox&first_version=24.0.0&first_search_term=&first_start_date=&first_end_date=&second_version=25.0.0&second_search_term=&second_start_date=&second_end_date=

Count 	Count / 0.008 	Bigram
4 	500.0 	minuć window
4 	500.0 	minuć twice
4 	500.0 	feedback twice
4 	500.0 	feedback submit
4 	500.0 	fcić window
2 	250.0 	comy mad
2 	250.0 	alive comy
1 	125.0 	happy testing
1 	125.0 	feedback happy


"minuć window" items should really be "minute window".

If you look through those bigrams, this happens a lot. Further, words are always mangled the same way.

This doesn't happen in my local development environment. It only happens on stage and prod.
Going to work on this today, so grabbing it.

I suspect this is a problem with passing the response descriptions to ES for analysis. The tokens we get back at that point are already mangled. I suspect that stage and prod have different default settings than my local environment. Going to give it 2 points for now, but will adjust as I discover more.
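
To illustrate why mangled tokens show up in every bigram that contains them, here is a minimal sketch of building bigrams from analyzed tokens. It is not fjord's actual code: the function names `bigrams` and `count_bigrams` are hypothetical, and it assumes bigrams are formed from adjacent tokens.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs joined by a space (hypothetical helper)."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def count_bigrams(token_lists):
    """Count bigrams across several token lists, one list per response."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(bigrams(tokens))
    return counts
```

If the analyzer returns "minuć" where the text said "minute", every bigram built from that token inherits the mangled form, which matches the consistent mangling seen in the table.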
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Whiteboard: u=analyzer c=dashboard p= s=input.2013q4 → u=analyzer c=dashboard p=2 s=input.2013q4
It was an easy fix--just needed to specify the analyzer so that it wasn't using the default one. I have no idea why the default analyzer on the stage/prod ES clusters is different.

In a PR: https://github.com/mozilla/fjord/pull/193
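
For reference, a rough sketch of what "specify the analyzer" means for an ES `_analyze` call. This is not the code from the PR. It assumes the older query-string form of the API (`?analyzer=...` with the raw text as the request body), and `build_analyze_request`, the `feedback` index name and the choice of the `standard` analyzer are all hypothetical.

```python
from urllib.parse import urlencode


def build_analyze_request(es_url, index, text, analyzer=None):
    """Return (url, body) for an _analyze call (hypothetical helper).

    With analyzer=None the cluster falls back to whatever default
    analyzer it is configured with, which may differ per environment.
    """
    url = "{}/{}/_analyze".format(es_url.rstrip("/"), index)
    if analyzer is not None:
        # Naming the analyzer makes the result independent of cluster defaults.
        url += "?" + urlencode({"analyzer": analyzer})
    return url, text.encode("utf-8")


# Example: always request the same analyzer on local, stage and prod.
url, body = build_analyze_request(
    "http://localhost:9200", "feedback", "minute window", analyzer="standard"
)
```

This only builds the request, it does not send it. The point is that the analyzer name travels with every call instead of being left to the cluster.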
Whiteboard: u=analyzer c=dashboard p=2 s=input.2013q4 → u=analyzer c=dashboard p=1 s=input.2013q4
Landed in master in https://github.com/mozilla/fjord/commit/f8ed988

Waiting until Monday to push because it's release week.
Pushed to prod just now.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Oops--this requires reindexing. Will do that today.
All set now.
Product: Input → Input Graveyard