Closed Bug 1168475 Opened 10 years ago Closed 10 years ago

drf3 upgrade results in less feedback

Categories

(Input Graveyard :: Submission, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

(Whiteboard: u=user c=feedback p=2 s=input.2015q2)

Attachments

(1 file)

We upgraded django-rest-framework to 3.1 (bug #1132455) and pushed the changes to production on May 20th, 2015 at 9:50am Pacific time. At that point, the total number of happy/sad responses dipped.

The front page dashboard showed no obvious changes for Firefox, Firefox for Android or Firefox OS products.

* https://input.mozilla.org/en-US/?date_start=2015-05-11&date_end=2015-05-25

However, using one of the various alpha quality dashboards, we see a big dip for Loop:

* https://input.mozilla.org/en-US/dashboard/loop?date_start=2015-05-11&date_end=2015-05-25

This bug covers looking into why the number of feedback responses has dropped for Loop.
Attached image input_responses.png
Graph of Input returned response codes.
Grabbing this.

I pulled down the data and looked at it on an hour-by-hour basis and it looks like at 10:00am Pacific time, the total number of incoming responses drops.

         5/18    5/19    5/20    5/21    5/22
00:00      18      19      29       8       5
01:00      35      25      29       7       5
02:00      32      65      30       9       5
03:00      32      42      40      14      10
04:00      36      40      29       8      11
05:00      23      36      45       7      10
06:00      49      35      41      15       9
07:00      60      39      43      18      15
08:00      52      53      60      17      11
09:00      44      56      62      16      26
10:00      57      65  **18**      16      17
11:00      83      76      31      17      23
12:00      77     103      17      18      17
13:00      85      73      14      12      10
14:00      70      53      14       9      13
15:00      28      46      13       8       8
16:00      33      22       5       7       7
17:00      17      18       4       4       8
18:00      20      23       6       6       5
19:00      24      19       8       7       4
20:00      15      20       4      11       2
21:00      16      29       3       2       7
22:00      19      22       4       3
23:00      14      22       1       4       4

So far, it looks like it only affected Loop, but it's possible that it affected other products but it's not as noticeable because they're more heterogeneous in regards to how the feedback gets into Input.

I attached a graph of Input response status codes.

What's happening to the feedback that doesn't make it to Input?

1. Maybe the data validation on Input changed in some subtle way causing problems for Loop but not other products? If so, we should see a spike in HTTP 400 responses around May 20th and forward. But we don't.

2. Maybe the ratelimiting logic changed in some subtle way causing problems for Loop but not other products? If so, we'd see a spike in HTTP 429 responses, but we don't.

3. Ricky posited that there's some codepath through the feedback POST API that doesn't actually save the data to the db. This is a possibility. Maybe related to bad transaction handling code?

4. Maybe loop saves an item then updates it later rather than saving a new item. (No clue how this could happen, but maybe?)

Other events that happened in that time period:

1. Firefox 38 release on 5/12. When did that go full throttle? Were there changes in the Loop client related to feedback?

2. We updated the production Elasticsearch cluster on 5/20, but that happened at 8:00pm Pacific which is 10 hours after the dip, so it's unlikely that's related.
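For anyone wanting to reproduce the hour-by-hour view, here is a minimal sketch of the bucketing step. It is not the script used for this bug; the function names and sample timestamps are made up for illustration, and it assumes the timestamps have already been converted to Pacific time.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count responses per (date, hour) bucket."""
    counts = Counter()
    for ts in timestamps:
        counts[(ts.date(), ts.hour)] += 1
    return counts


def format_table(counts):
    """Render one row per hour and one column per day seen in the data."""
    days = sorted({day for day, _hour in counts})
    header = "hour   " + " ".join(f"{d.month}/{d.day}".rjust(5) for d in days)
    lines = [header]
    for hour in range(24):
        cells = " ".join(str(counts.get((d, hour), 0)).rjust(5) for d in days)
        lines.append(f"{hour:02d}:00  {cells}")
    return "\n".join(lines)


# Hypothetical sample data, not real Input responses.
sample = [
    datetime(2015, 5, 19, 10, 5),
    datetime(2015, 5, 19, 10, 40),
    datetime(2015, 5, 20, 10, 15),
]
counts = hourly_counts(sample)
print(format_table(counts))
```

Laying the days out side by side like this makes a drop at a specific hour stand out against the same hour on earlier days.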
Assignee: nobody → willkg
Status: NEW → ASSIGNED
I spent the last few hours looking at the data, reading the Fjord code, reading the django-rest-framework code and mulling over the changes between django-rest-framework 2 and 3. I can't figure out what's going on here. Given that, I'm backing out the DRF3 upgrade. I'll watch the graphs for the next 24 hours and see what happens.
I isolated the problem. It's exclusively "happy loop feedback". When we upgrade to DRF3, that drops to 0.
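A rough sketch of how this kind of isolation can be done: split responses by (product, happy) and compare counts before and after the push. The tuple layout, function names and sample rows are assumptions for illustration, not the actual analysis script; the cutoff is the push time mentioned at the top of this bug.

```python
from collections import Counter
from datetime import datetime

# Push time from this bug (Pacific); timestamps are assumed to be Pacific too.
DEPLOY = datetime(2015, 5, 20, 9, 50)


def counts_by_segment(responses, cutoff=DEPLOY):
    """Return (before, after) Counters keyed by (product, happy)."""
    before, after = Counter(), Counter()
    for created, product, happy in responses:
        bucket = before if created < cutoff else after
        bucket[(product, happy)] += 1
    return before, after


def vanished_segments(before, after):
    """Segments with responses before the cutoff and none after it."""
    return sorted(seg for seg in before if after[seg] == 0)


# Hypothetical rows: (created, product, happy).
responses = [
    (datetime(2015, 5, 19, 10, 0), "Loop", True),
    (datetime(2015, 5, 19, 11, 0), "Loop", False),
    (datetime(2015, 5, 19, 12, 0), "Firefox", True),
    (datetime(2015, 5, 20, 11, 0), "Loop", False),
    (datetime(2015, 5, 20, 12, 0), "Firefox", True),
]
before, after = counts_by_segment(responses)
print(vanished_segments(before, after))  # [('Loop', True)]
```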
Ricky pointed out that the CharFields that aren't required have a default of u'', but have allow_blank=False. That might be the problem. It's definitely worth fixing regardless. In a PR: https://github.com/mozilla/fjord/pull/585
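To illustrate the theory, here is a toy model of the blank-string check. It is plain Python, not django-rest-framework or Fjord code; the class, helper and field name are hypothetical. It assumes, as I understand DRF 3's CharField, that an explicitly submitted empty string is rejected unless allow_blank=True, while an omitted optional field just falls back to its default.

```python
class TextField:
    """Toy stand-in for a serializer text field. This is NOT DRF code."""

    def __init__(self, required=True, default=None, allow_blank=False):
        self.required = required
        self.default = default
        self.allow_blank = allow_blank

    def clean(self, payload, name):
        if name not in payload:
            if self.required:
                raise ValueError(f"{name}: this field is required")
            # Omitted optional field: the default is used without validation.
            return self.default
        value = payload[name]
        if value == "" and not self.allow_blank:
            raise ValueError(f"{name}: this field may not be blank")
        return value


def accepts(field, payload, name="note"):
    """Return True if the payload passes the field's check."""
    try:
        field.clean(payload, name)
        return True
    except ValueError:
        return False


# Same default, differing only in allow_blank.
strict = TextField(required=False, default="", allow_blank=False)
relaxed = TextField(required=False, default="", allow_blank=True)

print(accepts(strict, {}))             # True: field omitted, default used
print(accepts(strict, {"note": ""}))   # False: explicit blank rejected
print(accepts(relaxed, {"note": ""}))  # True: explicit blank allowed
```

If a client always sends such a field as an empty string for one kind of feedback, every one of those submissions would be rejected under the strict configuration even though the default for the field is itself blank.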
I just pushed this out now. We'll know in like 30 minutes whether this fixed the issue or not.
I pulled the db and ran the analysis scripts I wrote. When I backed out the DRF3 upgrade code, the number of happy loop feedback per hour returned to normal. When we pushed out the fix based on Ricky's theory, the number of happy loop feedback per hour continued to be normal. I'm pretty sure we're ok now. I'll keep an eye on things in case other curious things show up. For now I'll mark as FIXED.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: u=user c=feedback p= s=input.2015q2 → u=user c=feedback p=2 s=input.2015q2
Product: Input → Input Graveyard