Closed Bug 1205667 Opened 9 years ago Closed 9 years ago

Investigate MDN down-time incident 2015-09-17

Categories

(developer.mozilla.org Graveyard :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: groovecoder, Unassigned)

Details

(Keywords: in-triage)

There was a large spike in time spent in $compare transaction(s) before the outage:

https://rpm.newrelic.com/accounts/263620/applications/3172075/transactions#

All of the requests came from the user agent "mozilla", so we can't block them with the user-agent blocking we put in place for the last down-time.

We've seen this spike precede down-times before, so I'm going to take the most expensive part of the $compare transaction (tidying the HTML of the revisions) out of the HTTP request completely. (We had previously moved it to a cache-behind operation; now I'm making it an asynchronous, cache-only operation.)

:jakem - can you dig into why Apache seems to hit max connection limits(?) after long-running transactions like this? Our down-times always show up as a massive spike of "Request Queuing" in New Relic, and we seem to hit Apache connection limits far too often.
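
One possible mechanism, not confirmed anywhere in this bug, is plain worker saturation: by Little's law, the average number of requests in service equals the arrival rate times the mean time per request, so a jump in $compare duration alone could push concurrency past a fixed worker cap and show up as queuing. A rough sketch of that arithmetic follows; the cap, request rate, and timings are all hypothetical numbers, not measurements from MDN.

```python
# All numbers here are hypothetical, chosen only to illustrate the arithmetic.
MAX_WORKERS = 256  # assumed cap on concurrent connections/workers


def busy_workers(requests_per_second, mean_seconds_per_request):
    """Little's law: average requests in service = arrival rate * mean service time."""
    return requests_per_second * mean_seconds_per_request


def will_queue(requests_per_second, mean_seconds_per_request, max_workers=MAX_WORKERS):
    """True when steady-state demand exceeds the worker cap, so requests must wait."""
    return busy_workers(requests_per_second, mean_seconds_per_request) > max_workers


# Same traffic, different mean request time.
normal_load = busy_workers(40, 0.25)  # fast requests
spike_load = busy_workers(40, 8.0)    # slow, tidy-heavy requests

print(normal_load, will_queue(40, 0.25))
print(spike_load, will_queue(40, 8.0))
```

If this is what is happening, the request rate does not need to rise at all for the site to queue; a higher mean request time is enough.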

Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/8c546d9348331136cab4ad48f08bd3bfd8addeb0
bug 1205667 - get_tidied_content can return blank

When a $compare request is made for a large revision,
we want to skip tidy_content and return a warning to the user,
so we don't block requests on the expensive tidy operation.
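
The cache-only behaviour described in this commit message can be sketched roughly as below. This is not the kuma implementation: the dict-based cache, the in-memory job queue, the `tidy_content` placeholder, and the function signatures are all assumptions made for illustration (a real deployment would presumably use its own cache backend and task queue).

```python
from collections import deque

# Hypothetical stand-ins for a real cache backend and a real task queue.
tidy_cache = {}
pending_tidy_jobs = deque()


def tidy_content(html):
    """Placeholder for the expensive tidy step; here it only strips whitespace."""
    return html.strip()


def get_tidied_content(revision_id, html):
    """Return tidied HTML from the cache, or '' after scheduling the work.

    The expensive tidy step never runs inside the request.
    """
    cached = tidy_cache.get(revision_id)
    if cached is not None:
        return cached
    job = (revision_id, html)
    if job not in pending_tidy_jobs:  # avoid queuing the same revision twice
        pending_tidy_jobs.append(job)
    return ""


def run_pending_tidy_jobs():
    """Worker side: tidy queued revisions outside any HTTP request."""
    while pending_tidy_jobs:
        revision_id, html = pending_tidy_jobs.popleft()
        tidy_cache[revision_id] = tidy_content(html)


def compare_message(revision_id, html):
    """Caller side: show a warning when the tidied content is not ready yet."""
    tidied = get_tidied_content(revision_id, html)
    if tidied == "":
        return "Warning: this revision is still being processed; try again shortly."
    return tidied
```

The first request for an uncached revision returns the warning immediately; once the worker has run, later requests get the tidied content straight from the cache.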

https://github.com/mozilla/kuma/commit/315fbf3a13f3205205eb64e8fc590a53f0666c5a
bug 1205667 - tests for get_tidied_content

https://github.com/mozilla/kuma/commit/4f282be57ddfae08e1cb816857682a8e10eb8860
Merge pull request #3497 from mozilla/never-tidy-in-compare-request-1205667

bug 1205667 - get_tidied_content can return blank

As with bug 1203528, this downtime was immediately preceded by a spike in $compare transaction CPU time. Based on https://bugzilla.mozilla.org/show_bug.cgi?id=1203528#c3, I'm going to call both incidents investigated and resolved, knowing that we still haven't quite cleaned up everything.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard