NewRelic has been alerting us for a few weeks about appdex warnings (below from IRC #snippets): NR_ALERT: Alert opened on snippets.mozilla.com-python: 'Snippets - FetchSnippets' Apdex score < 0.70. https://rpm.newrelic.com/accounts/263620/applications/2904874/incidents/6091144 :mkelly has already done some debugging on this (below from IRC #snippets): <mkelly> So the apdex appears to have been taking the admin interface and public snippet display into account, which we didn't want because those views are so rarely used and don't really need to be as performant as the main fetch snippets view. [12:15:05] <mkelly> So I added a patch to disable apdex on those views but we still got alerts for reasons I couldn't figure out yet [12:15:19] <mkelly> The response times on fetch_snippets are fine, between 250ms and 400ms or so [12:15:24] <mkelly> Our apdex threshold was at 500ms [12:15:47] <mkelly> Yet we kept on getting notifications, so I temporarily set the threshold to 2 seconds to shut it up since the response times on the view we care about were fine Notwithstanding the 2 second threshold, NewRelic is still notifying us.
So it turns out it's the FetchSnippets key transaction causing the trouble: https://rpm.newrelic.com/accounts/263620/key_transactions/3745 It expects response times from FetchSnippets in under 0.08 seconds, which seems ambitious to me, but considering we were doing fine until recently, might be a fine number. https://rpm.newrelic.com/accounts/263620/applications/2904874?tw[end]=1383339480&tw[start]=1383080280 shows a 3-day period from October 30th to November 1st when our response times for FetchSnippets jumped from around 70ms to 300ms. I'm currently trying to figure out what caused this.
I believe I've found a possible source; around the same time the ramp-up started was when the first Firefox OS animation snippets using the new template went out. This is a rather large template to parse, and AFAIK we aren't caching parsed templates due to how we load them. I also ran a profiling session, which indicates template rendering/parsing took 44% of the time for most requests: https://rpm.newrelic.com/accounts/263620/applications/2904874/profiles/45523 I'm going to implement template caching for snippet templates and see if that helps the load.
Commit pushed to master at https://github.com/mozilla/snippets-service https://github.com/mozilla/snippets-service/commit/f1fbf4c1fba70d52e89775cfedda9416dfdcbfea Bug 938748: Cache compiled snippet templates.
After the change above hit production, we saw a 50% decrease in app server time, which had already dropped down due to another event around 10PM Eastern on November 18th that I have yet to pinpoint. Either way, our average request time is down to 21ms, so I think we can call this issue resolved. :D