went offline





(Reporter: sal, Unassigned)





2 years ago
<@nagios-scl3> (IRC) Wed 22:20:50 UTC [5606] [devservices] "Out of memory - killed process" is WARNING: Log errors: May 31 22:17:26 kernel: [6198232.462511] Out of memory: Kill process 26225 (httpd) score 120 or sacrifice child
15:25:58 <@nagios-scl3> Wed 22:25:59 UTC [5611] [unset] is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 651 bytes in 0.030 second response time

I was not able to get into the box, but here are some logs:

May 31 22:17:26 kernel: [6198232.462511] Out of memory: Kill process 26225 (httpd) score 120 or sacrifice child
May 31 22:17:26 kernel: [6198232.463365] Killed process 26225 (httpd) total-vm:2067792kB, anon-rss:1252600kB, file-rss:0kB, shmem-rss:0kB
May 31 22:18:24 kernel: [6198290.878226] Out of memory: Kill process 26211 (httpd) score 120 or sacrifice child
May 31 22:18:24 kernel: [6198290.878638] Killed process 26211 (httpd) total-vm:2139908kB, anon-rss:1437616kB, file-rss:32kB, shmem-rss:0kB
May 31 22:19:21 kernel: [6198347.657948] Out of memory: Kill process 26591 (httpd) score 126 or sacrifice child
May 31 22:19:21 kernel: [6198347.658390] Killed process 26591 (httpd) total-vm:2214404kB, anon-rss:1693140kB, file-rss:228kB, shmem-rss:0kB
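
For scale, each killed httpd worker was holding well over a gigabyte of anonymous memory. A quick parse of the log lines above (Python, just to pull the numbers out):

```python
import re

# The "Killed process" lines from the kernel log above.
log = """\
May 31 22:17:26 kernel: [6198232.463365] Killed process 26225 (httpd) total-vm:2067792kB, anon-rss:1252600kB, file-rss:0kB, shmem-rss:0kB
May 31 22:18:24 kernel: [6198290.878638] Killed process 26211 (httpd) total-vm:2139908kB, anon-rss:1437616kB, file-rss:32kB, shmem-rss:0kB
May 31 22:19:21 kernel: [6198347.658390] Killed process 26591 (httpd) total-vm:2214404kB, anon-rss:1693140kB, file-rss:228kB, shmem-rss:0kB
"""

# Extract the anonymous RSS of each killed worker, in kB.
rss_kb = [int(m) for m in re.findall(r"anon-rss:(\d+)kB", log)]

# Print each worker's footprint in GB: ~1.19 GB, 1.37 GB, 1.61 GB.
for kb in rss_kb:
    print(f"{kb / 1024 / 1024:.2f} GB")
```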

I'm also not sure what fixed the issue:

nagios-scl3> Wed 22:43:37 UTC [5636] [devservices] is OK: SWAP OK - 51% free (1037 MB out of 2046 MB)
15:45:57 <nagios-scl3> Wed 22:45:58 UTC [5641] [unset] is OK: HTTP OK: HTTP/1.1 200 Script output follows - 7429 bytes in 0.231 second response time
A bot decided to crawl some expensive pages and was saturating the CPU. Combined with the HTTP server's worker limits being set well above what the current machine can support, this led to memory exhaustion and swapping, effectively rendering the machine a paperweight.
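
To illustrate the failure mode with a back-of-the-envelope check (the RAM figure and worker cap below are assumptions for this sketch, not this host's actual config; the per-worker footprint is roughly what the OOM log shows):

```python
# Hypothetical numbers: a small VM with an Apache-style worker limit that was
# sized for a much bigger machine.
ram_gb = 4           # assumed machine RAM
per_worker_gb = 1.5  # roughly the RSS the OOM-killed httpd workers were holding
max_workers = 150    # hypothetical worker cap, far too high for this box

# Worst case, concurrent workers can demand vastly more than physical RAM,
# which means heavy swapping and then the OOM killer.
worst_case_gb = max_workers * per_worker_gb
print(worst_case_gb, worst_case_gb > ram_gb)  # 225.0 True

# A saner cap leaves headroom (say 20%) for the OS and everything else.
safe_workers = int(ram_gb * 0.8 / per_worker_gb)
print(safe_workers)  # 2
```

The point isn't the exact numbers; it's that the worker limit has to be derived from RAM divided by realistic per-worker memory, not left at a default.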

I noticed the poor VM only has 2 CPU cores. We should increase that if possible: ReviewBoard can issue more than 2 concurrent requests per page load, so we're artificially limiting the load time of reviews. That's not something we want to skimp on.

dividehex: can you help get us more CPU cores?
Flags: needinfo?(jwatkins)
Oh, we temporarily banned the IP causing the problems. Service should be fully restored.

Comment 3

2 years ago
Duplicate of this bug: 1369206
(In reply to Gregory Szorc [:gps] from comment #1)
> dividehex: can you help get us more CPU cores?

this should be an easy thing, but will require a reboot. how many cores do we need?
Flags: needinfo?(jwatkins)
If the cores aren't hyperthreaded, 4 is fine. If they could be hyperthreads, can we go with 6? They also don't need to be dedicated cores: the machine has low utilization and it would be a waste to have it eat up dedicated cores. We just need the headroom to burst CPU periodically.
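
As a side note, whether the cores the guest ends up with are hyperthreads can be checked from inside the VM itself; a minimal sketch (the sysfs path is Linux-specific):

```python
import os

# Logical CPUs visible to the OS; on a hyperthreaded host this counts SMT
# threads, not physical cores.
print(os.cpu_count())

# On Linux, the sibling list shows whether a CPU shares a physical core:
# a lone id (e.g. "0") means no SMT; a pair (e.g. "0,4") means hyperthreads.
path = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"
if os.path.exists(path):
    with open(path) as f:
        print(f.read().strip())
```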

Coordinate with mcote for a reboot. FWIW I think it is acceptable to do during the day: it should only result in a minute or two of downtime, and disruption should be minimal since ReviewBoard continues to work (albeit degraded) without the Mercurial server. Any unlucky pusher can simply retry until the server comes back.
as noted in the other bug, this box is more idle than not. outside of dumb bots, how often do we *actually* need that extra headroom?

Comment 8

2 months ago
MozReview is now obsolete. Please use Phabricator instead. Closing this bug.
Last Resolved: 2 months ago
Resolution: --- → INVALID