Closed
Bug 1029541
Opened 11 years ago
Closed 11 years ago
developer.mozilla.org is down again
Categories
(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)
x86
macOS
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: davidwalsh, Assigned: nmaul)
Details
(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/422] )
Attachments
(1 file)
5.41 KB, text/x-log
At roughly 9am this morning I discovered that MDN appeared down again, showing a "service unavailable" message. Working with :cyliang and :jezdez to fix.
Comment 1•11 years ago
Smells related to bug 1027052.
ZLB reports being unable to connect to the web heads (both production and dev). The HTTP error log on all three production webheads reports "server reached MaxClients setting, consider raising the MaxClients setting". Production began to recover once I forcibly restarted HTTP to clear connections.
Comment 2•11 years ago
What caused that outage: the toolbar gathers info about the contributors of a document when the document loads. It happens to load them one by one, which can be slow. This is a performance problem, but one that's easy to handle when a small number of contributors have worked on a document. The scenario that caused the crash was multiple people opening a document at around the same time that had 209 contributors.
It was simply too much work for the webheads to keep that many database connections open at once, and it brought the service as a whole to its knees. We're working on a fix now.
I worked with jezdez to determine root cause and a potential fix. He'll implement shortly.
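To illustrate the one-by-one loading described above, here is a minimal sketch using Python's built-in sqlite3 module. It is not the kuma code: the `users` and `revisions` tables, their columns, and the function names are hypothetical and chosen only for this example. It shows how a document with 209 contributors turns into 210 queries per page view.

```python
import sqlite3


def make_db(num_contributors):
    """Build an in-memory database with one document and N distinct contributors.

    The schema is hypothetical and only loosely modelled on the description above.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE revisions (
            id INTEGER PRIMARY KEY,
            document_id INTEGER,
            creator_id INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO users (id, username) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, num_contributors + 1)],
    )
    conn.executemany(
        "INSERT INTO revisions (document_id, creator_id) VALUES (1, ?)",
        [(i,) for i in range(1, num_contributors + 1)],
    )
    conn.commit()
    return conn


def contributors_one_by_one(conn, document_id):
    """Serial pattern: one query for the creator ids, then one query per contributor."""
    queries = 0
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT creator_id FROM revisions WHERE document_id = ?",
            (document_id,),
        )
    ]
    queries += 1
    names = []
    for creator_id in ids:
        row = conn.execute(
            "SELECT username FROM users WHERE id = ?", (creator_id,)
        ).fetchone()
        queries += 1
        names.append(row[0])
    return names, queries


conn = make_db(209)
names, queries = contributors_one_by_one(conn, 1)
print(len(names), queries)
```

The query count grows linearly with the number of contributors, so several simultaneous views of the same page multiply the load on the database.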
Updated•11 years ago
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/422]
Comment 3•11 years ago
https://github.com/mozilla/kuma/pull/2504 has various fixes for those serial queries. It does the ordering by count of revision creators on the database side in two queries (albeit risking large IN queries), and it no longer requires a JOIN on the user_profile table, which was caused by circular code to generate the user's gravatar URL. Push comings soon..
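As a rough sketch of the two-query idea (not the code from the pull request), the following uses Python's built-in sqlite3 module. The `users` and `revisions` tables and the `top_contributors` function are hypothetical names for illustration. The first query lets the database count and order the revision creators; the second fetches all matching users in one IN query.

```python
import sqlite3


def top_contributors(conn, document_id):
    """Return usernames ordered by revision count, using exactly two queries."""
    # Query 1: let the database count and order the revision creators.
    ranked = conn.execute(
        "SELECT creator_id, COUNT(*) AS n FROM revisions "
        "WHERE document_id = ? GROUP BY creator_id "
        "ORDER BY n DESC, creator_id",
        (document_id,),
    ).fetchall()
    ids = [creator_id for creator_id, _ in ranked]
    if not ids:
        return []
    # Query 2: fetch every user at once. The IN list grows with the number
    # of contributors, which is the "large IN query" risk.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, username FROM users WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = dict(rows)
    # Re-apply the ordering from query 1, since IN does not preserve it.
    return [by_id[i] for i in ids]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE revisions (
        id INTEGER PRIMARY KEY,
        document_id INTEGER,
        creator_id INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO users (id, username) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO revisions (document_id, creator_id) VALUES (1, ?)",
    [(2,), (2,), (2,), (1,), (3,), (3,)],
)
conn.commit()

result = top_contributors(conn, 1)
print(result)
```

The number of queries stays at two regardless of how many contributors a document has, at the cost of a parameter list that scales with the contributor count.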
Comment 4•11 years ago
Err, coming..
Updated•11 years ago
Assignee: server-ops-webops → nmaul
Comment 5•11 years ago
The pull request is merged and pushed. MDN is back up.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 6•11 years ago
Confirmed and tested with the page that was causing the issues. All resolved. The page load went from 31.06 seconds (New Relic reporting at the time of the incident) to under 1.32 seconds (New Relic reporting after the patch).
Big win for performance, and big win for the contributor bar!
Status: RESOLVED → VERIFIED
Comment 7•11 years ago
Note: the page causing the issue was: https://developer.mozilla.org/en-US/docs/Midas/Security_preferences
Comment 8•11 years ago
Woohoo!
Updated•7 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard