Closed
Bug 1036519
Opened 11 years ago
Closed 10 years ago
virtualize sumocelery1.webapp.phx1
Categories
(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)
Infrastructure & Operations Graveyard
WebOps: Community Platform
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gcox, Unassigned)
Details
(Keywords: p2v, Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/513] [vm-p2v:1])
sumocelery1.webapp.phx1 is on out-of-warranty hardware. Assuming it's still needed, it looks like a candidate for virtualization, and we'd like to work out a time to take it down and convert.
Comment 1•10 years ago
Gathering some stats for suggestions as to the virtual hardware to allocate, assuming you still need/want this host around:
1 CPU
6 GB RAM
disk - reduce to the default VM size of 40 GB
These are just suggestions - let me know if you have concerns with them, and when a good time to take this down would be, so we can get this off the old hardware. Thanks.
Comment 2•10 years ago
Poking for status: do the suggested specs look acceptable, and can we get a window to take this down and P2V it, so we can get off the hardware as it comes off warranty?
Comment 3•10 years ago
From our email convo, this should take 1-2 hours. I'm game for doing it anytime you all prefer. Having celery tasks backlog for that long is no big deal.
Our rabbitmq queues will still be up and running, correct?
Comment 4•10 years ago
Just give us a heads-up in #sumodev or in this bug, ideally the day before.
Comment 5•10 years ago
Per the #sumodev conversation - did this P2V. Upon reboot things were working, but the box was using every bit of RAM and swap - surprised the OOM killer didn't swoop in. Took it back down and gave it more RAM; things seem much happier now.
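For the record, whether the OOM killer actually fired should show up in the kernel log; a quick check, just as a sketch (the /var/log/messages path is an assumption about this host's syslog setup):

dmesg | grep -i 'out of memory'              # OOM messages since boot
grep -i 'killed process' /var/log/messages   # same check against persisted syslog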
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/513] → [kanban:https://kanbanize.com/ctrl_board/4/513] [vm-p2v:1]
Reporter
Comment 6•10 years ago
gcox@sumocelery1.webapp.phx1:~$ ps auxwww|grep "/data/www/support.mozilla.com/kitsune/virtualenv/bin/python /data/www/support.mozilla.com/kitsune/manage.py celeryd"|wc -l ; date ; free -m
130
Sun Sep 21 09:02:20 PDT 2014
             total       used       free     shared    buffers     cached
Mem:         11912      11646        265          0         37        176
-/+ buffers/cache:      11432        480
Swap:         4095       2097       1998
That seems like a lot of celeryd worker processes blocked.
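A sketch for keeping an eye on this (the match string is just lifted from the ps invocation above, and 60 seconds is an arbitrary interval): the worker count and memory headroom can be watched together with

watch -n 60 'pgrep -fc "manage.py celeryd"; free -m'   # prints process count, then memory usage, every minute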
Comment 7•10 years ago
This should get better without any VM changes... we're dropping the number of celery workers. :cturra has the specifics.
Comment 8•10 years ago
(In reply to Jake Maul [:jakem] from comment #7)
> This should get better without any VM changes... we're dropping the number
> of celery workers. :cturra has the specifics.
I reduced the number of available celery workers from 128 to 96. Watching the RabbitMQ queues on this node before making the change, there never seemed to be any unacknowledged jobs in the queue, which suggests we were already over-provisioned with workers. Will keep an eye on this to see whether 96 behaves the same.
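For reference, a sketch of both checks (the rabbitmqctl query is standard; the --concurrency flag is the usual knob for old-style celeryd, though the actual SUMO setting may live elsewhere; paths are copied from the ps output in comment 6):

# per-queue backlog and unacknowledged counts
rabbitmqctl list_queues name messages_ready messages_unacknowledged

# pool size for django-celery's celeryd - 96 matching the change above
/data/www/support.mozilla.com/kitsune/virtualenv/bin/python \
    /data/www/support.mozilla.com/kitsune/manage.py celeryd --concurrency=96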
Reporter
Comment 9•10 years ago
:cknowles finished this and it's been behaving. With fewer cores it occasionally sets off load alarms on transient load spikes, but :ashish said he'll "bump the threshold or increase the delay for notification".
Since this is just the p2v bug, and it's verified working correctly, closing this out.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard