reviewboard-hg.mozilla.org went offline

Status: NEW
Assignee: Unassigned
Reporter: sal

Description (Reporter), a year ago
@nagios-scl3> (IRC) Wed 22:20:50 UTC [5606] [devservices] reviewboard-hg2.dmz.scl3.mozilla.com:Out of memory - killed process is WARNING: Log errors: May 31 22:17:26 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198232.462511] Out of memory: Kill process 26225 (httpd) score 120 or sacrifice child (http://m.mozilla.org/Out+of+memory+-+killed+process)
15:25:58 <@nagios-scl3> Wed 22:25:59 UTC [5611] [unset] reviewboard-hg.mozilla.org:HTTP is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 651 bytes in 0.030 second response time (http://m.mozilla.org/HTTP)


I wasn't able to get into the box, but here are some logs:


May 31 22:17:26 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198232.462511] Out of memory: Kill process 26225 (httpd) score 120 or sacrifice child
May 31 22:17:26 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198232.463365] Killed process 26225 (httpd) total-vm:2067792kB, anon-rss:1252600kB, file-rss:0kB, shmem-rss:0kB
--
May 31 22:18:24 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198290.878226] Out of memory: Kill process 26211 (httpd) score 120 or sacrifice child
May 31 22:18:24 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198290.878638] Killed process 26211 (httpd) total-vm:2139908kB, anon-rss:1437616kB, file-rss:32kB, shmem-rss:0kB
--
May 31 22:19:21 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198347.657948] Out of memory: Kill process 26591 (httpd) score 126 or sacrifice child
May 31 22:19:21 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198347.658390] Killed process 26591 (httpd) total-vm:2214404kB, anon-rss:1693140kB, file-rss:228kB, shmem-rss:0kB
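
Each kill above took out an httpd process holding well over a gigabyte of anonymous memory. The following is a minimal Python sketch for pulling those numbers out of a syslog excerpt. It assumes the exact "Killed process ..." line format quoted above, and the function name summarize_oom_kills is hypothetical.

```python
import re

# Matches the kernel's "Killed process ..." lines in the format quoted above.
# The preceding "Out of memory: Kill process ..." lines do not match.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:\d+kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def summarize_oom_kills(lines):
    """Return (pid, command, anon_rss_mib) for each OOM-killed process."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            # anon-rss is reported in kB; convert to whole MiB for readability.
            anon_rss_mib = round(int(match.group("anon_rss")) / 1024)
            kills.append((int(match.group("pid")), match.group("comm"), anon_rss_mib))
    return kills


log = [
    "May 31 22:17:26 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198232.462511] "
    "Out of memory: Kill process 26225 (httpd) score 120 or sacrifice child",
    "May 31 22:17:26 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198232.463365] "
    "Killed process 26225 (httpd) total-vm:2067792kB, anon-rss:1252600kB, file-rss:0kB, shmem-rss:0kB",
    "May 31 22:18:24 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198290.878638] "
    "Killed process 26211 (httpd) total-vm:2139908kB, anon-rss:1437616kB, file-rss:32kB, shmem-rss:0kB",
    "May 31 22:19:21 reviewboard-hg2.dmz.scl3.mozilla.com kernel: [6198347.658390] "
    "Killed process 26591 (httpd) total-vm:2214404kB, anon-rss:1693140kB, file-rss:228kB, shmem-rss:0kB",
]

for pid, comm, rss in summarize_oom_kills(log):
    print(f"{comm} pid {pid}: ~{rss} MiB anonymous RSS")
```

On these lines it reports roughly 1.2 to 1.6 GiB per killed httpd process.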



Also, I'm not sure what fixed the issue:

nagios-scl3> Wed 22:43:37 UTC [5636] [devservices] reviewboard-hg2.dmz.scl3.mozilla.com:Swap is OK: SWAP OK - 51% free (1037 MB out of 2046 MB) (http://m.mozilla.org/Swap)
15:45:57 <nagios-scl3> Wed 22:45:58 UTC [5641] [unset] reviewboard-hg.mozilla.org:HTTP is OK: HTTP OK: HTTP/1.1 200 Script output follows - 7429 bytes in 0.231 second response time (http://m.mozilla.org/HTTP)

Comment 1 (Gregory Szorc [:gps])

A bot decided to crawl some expensive pages and was saturating the CPU. Combined with HTTP server limits set far above what the current machine can support, this led to memory exhaustion and swapping, essentially turning the machine into a paperweight.
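
One way to keep a web server's concurrency cap from outrunning the hardware is to derive it from memory: the cap multiplied by the worst-case per-worker footprint has to fit in RAM. The Python sketch below illustrates that arithmetic. The VM's RAM size and the reserved headroom are hypothetical, since this bug does not state them, and max_safe_workers is a made-up name. In Apache httpd the corresponding setting is MaxRequestWorkers (MaxClients before 2.4), though the thread does not say which limit was actually misconfigured.

```python
def max_safe_workers(total_ram_mib, reserved_mib, per_worker_rss_mib):
    """Largest number of workers whose combined RSS fits in the remaining RAM.

    total_ram_mib      -- physical memory of the VM (hypothetical: not stated in this bug)
    reserved_mib       -- headroom for the OS, page cache and other daemons (assumed)
    per_worker_rss_mib -- worst-case resident size of one httpd worker
    """
    available_mib = total_ram_mib - reserved_mib
    if available_mib <= 0 or per_worker_rss_mib <= 0:
        return 0
    # Floor division: a partial worker does not fit.
    return available_mib // per_worker_rss_mib


# Hypothetical 8 GiB VM with 1 GiB reserved, sized against the largest
# anon-rss in the OOM log (1693140 kB, about 1653 MiB).
print(max_safe_workers(8192, 1024, 1653))  # 4
```

Allowing more concurrent workers than this means a burst of expensive requests can push the box into swap and then the OOM killer, as happened here.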

I noticed the poor VM only has 2 CPU cores. We should increase that if possible, as ReviewBoard can issue more than 2 concurrent requests per page load. With only 2 cores, we're artificially slowing down how long reviews take to load. That's not something we want to skimp on.
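
A rough way to see why the core count matters: if every request on a page is CPU-bound, at most one request per core runs at a time, so N requests finish in about ceil(N / cores) "waves". The Python sketch below computes that lower bound. The request count (6) and per-request time (0.5 s) are hypothetical; the thread only says "more than 2". The model also ignores I/O, scheduler overhead and other load on the box.

```python
import math


def cpu_bound_page_time(requests_per_page, cores, seconds_per_request):
    """Lower bound on page load time when every request is CPU-bound.

    Requests run in waves of at most `cores` at a time, so N requests
    need ceil(N / cores) waves. Ignores I/O and scheduling overhead.
    """
    waves = math.ceil(requests_per_page / cores)
    return waves * seconds_per_request


# Hypothetical page issuing 6 CPU-bound requests of 0.5 s each.
for cores in (2, 4, 6):
    print(cores, cpu_bound_page_time(6, cores, 0.5))
```

Going from 2 to 6 cores cuts the bound from 1.5 s to 0.5 s in this made-up case. Hyperthreads are not full cores, so treating hyperthreaded vCPUs as `cores` here is optimistic.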

dividehex: can you help get us more CPU cores?
Flags: needinfo?(jwatkins)

Comment 2

Oh, we temporarily banned the IP causing the problems. Service should be fully restored.

Comment 3 (Reporter), a year ago

Blacklisted 173.239.236.87.

Duplicate of this bug: 1369206

(In reply to Gregory Szorc [:gps] from comment #1)
>
> dividehex: can you help get us more CPU cores?

This should be an easy thing, but it will require a reboot. How many cores do we need?
Flags: needinfo?(jwatkins)

If the cores aren't hyperthreaded, 4 is fine. If they could be hyperthreads, can we go with 6? They also don't need to be dedicated cores: the machine has low utilization, and it would be a waste to have it tie up dedicated cores. We just need the headroom to burst CPU periodically.

Coordinate with mcote for a reboot. FWIW, I think it is acceptable to do this during the day: it should only cause a minute or two of downtime, and disruption should be minimal since ReviewBoard continues to work (albeit degraded) without the Mercurial server. Any unlucky pusher can simply retry until the server comes back.
Depends on: 1370234

As noted in the other bug, this box is more idle than not. Outside of dumb bots, how often do we *actually* need that extra headroom?