Closed Bug 1036519 Opened 11 years ago Closed 10 years ago

virtualize sumocelery1.webapp.phx1

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: gcox, Unassigned)

Details

(Keywords: p2v, Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/513] [vm-p2v:1])

Greg Cox [:gcox]

Reporter

Description

•

11 years ago

sumocelery1.webapp.phx1 is on out-of-warranty hardware. Assuming it's still needed, it looks like a candidate for virtualization, and we'd like to work out a time to take it down and convert.

:kanban

Updated

•

11 years ago

Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/513]

Chris Knowles [:cknowles]

Comment 1

•

11 years ago

Gathering some stats for suggestions as to virtual hardware to allocate, assuming that you still need/want it around:

1 CPU
6G RAM
disk - reduce to default vm of 40G

These are just suggestions - let me know if you have concerns with these, and when a good time to take this down would be, so that we can get this off of old hardware. Thanks.

Chris Knowles [:cknowles]

Comment 2

•

10 years ago

Poking for status, if the suggested specs look acceptable, and if we can have a window to take this down and P2V, so we can get off the hardware as it comes off warranty.

Ricky Rosario [:rrosario, :r1cky]

Comment 3

•

10 years ago

From our email convo, this should take 1-2 hours. I'm game for doing it anytime you all prefer. Having celery tasks backlog for that long is no big deal. Our rabbitmq queues will still be up and running, correct?

Ricky Rosario [:rrosario, :r1cky]

Comment 4

•

10 years ago

Just give us a headsup in #sumodev or in this bug, ideally the day before.

Chris Knowles [:cknowles]

Comment 5

•

10 years ago

Per #sumodev conversation - did this P2V. Upon reboot things were working, but it was using EVERY bit of RAM and swap - surprised OOM_killer-man didn't swoop in. Took it back down and gave it more RAM; things seem much happier.

Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/513] → [kanban:https://kanbanize.com/ctrl_board/4/513] [vm-p2v:1]

Greg Cox [:gcox]

Reporter

Comment 6

•

10 years ago

gcox@sumocelery1.webapp.phx1:~$ ps auxwww|grep "/data/www/support.mozilla.com/kitsune/virtualenv/bin/python /data/www/support.mozilla.com/kitsune/manage.py celeryd"|wc -l ; date ; free -m
130
Sun Sep 21 09:02:20 PDT 2014
             total       used       free     shared    buffers     cached
Mem:         11912      11646        265          0         37        176
-/+ buffers/cache:      11432        480
Swap:         4095       2097       1998

That seems like a lot of threads blocked.
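For readers who want to turn the `free -m` figures above into percentages, here is a minimal Python sketch. It assumes the older procps layout shown in this comment (with a "-/+ buffers/cache" row); the function name `parse_free` and the dictionary keys are hypothetical, chosen only for illustration.

```python
# Sample text copied from the `free -m` output in this comment
# (older procps layout that includes a "-/+ buffers/cache" row).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:         11912      11646        265          0         37        176
-/+ buffers/cache:      11432        480
Swap:         4095       2097       1998
"""


def parse_free(text):
    """Return a dict of MB figures parsed from old-style `free -m` output.

    Hypothetical helper for illustration; key names are assumptions.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            stats["mem_total"] = int(fields[1])
            stats["mem_used"] = int(fields[2])
            stats["mem_free"] = int(fields[3])
        elif fields[0] == "-/+":
            # Row reads: "-/+ buffers/cache: <used> <free>"
            stats["app_used"] = int(fields[2])
            stats["app_free"] = int(fields[3])
        elif fields[0] == "Swap:":
            stats["swap_total"] = int(fields[1])
            stats["swap_used"] = int(fields[2])
    return stats


stats = parse_free(FREE_OUTPUT)
# Memory held by processes, excluding buffers and page cache.
app_pct = 100 * stats["app_used"] / stats["mem_total"]
swap_pct = 100 * stats["swap_used"] / stats["swap_total"]
print(f"application memory: {app_pct:.1f}% of RAM")
print(f"swap in use: {swap_pct:.1f}%")
```

The "-/+ buffers/cache" row is the one to watch here, since it excludes memory the kernel can reclaim; on this host almost all RAM is held by processes rather than cache.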

Jake Maul [:jakem]

Comment 7

•

10 years ago

This should get better without any VM changes... we're dropping the number of celery workers. :cturra has the specifics.

Chris Turra [:cturra]

Comment 8

•

10 years ago

(In reply to Jake Maul [:jakem] from comment #7)
> This should get better without any VM changes... we're dropping the number
> of celery workers. :cturra has the specifics.

i reduced the number of available celery workers from 128 -> 96. watching the rabbit queues on this node before making the change, there never seemed to be any unacknowledged jobs in the queue, which meant we were already over provisioned with workers. will keep an eye on this to see if 96 results in the same.
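As a rough back-of-the-envelope check on what dropping from 128 to 96 workers might free up, the sketch below combines the figures from comment 6 with the worker counts here. It assumes, purely for illustration, that all non-cache memory belongs to the celeryd processes and is split evenly among them; neither is confirmed in this bug. The count of 130 may also include the grep process itself and a parent process, so the per-process figure is approximate. The function name is hypothetical.

```python
# Figures taken from comments 6 and 8; everything else is an assumption.
APP_USED_MB = 11432       # "-/+ buffers/cache" used column in comment 6
MATCHED_PROCESSES = 130   # `wc -l` result in comment 6 (may include grep itself)
OLD_WORKERS = 128
NEW_WORKERS = 96


def estimate_used_mb(app_used_mb, process_count, old_workers, new_workers):
    """Estimate memory use after removing workers.

    Assumes memory is spread evenly across the matched processes,
    which is a simplification and not something the bug confirms.
    """
    per_process_mb = app_used_mb / process_count
    removed = old_workers - new_workers
    estimated_mb = app_used_mb - removed * per_process_mb
    return estimated_mb, per_process_mb


estimated_mb, per_process_mb = estimate_used_mb(
    APP_USED_MB, MATCHED_PROCESSES, OLD_WORKERS, NEW_WORKERS
)
print(f"approx. memory per process: {per_process_mb:.0f} MB")
print(f"approx. memory in use with {NEW_WORKERS} workers: {estimated_mb:.0f} MB")
```

Under these assumptions the reduction would free a few GB, which is in line with the expectation that the host improves without further VM changes; actual savings depend on how much memory the workers share.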

Greg Cox [:gcox]

Reporter

Comment 9

•

10 years ago

:cknowles finished this; it's been behaving. With reduced cores it occasionally has lit off load alarms because of transient load, but :ashish said he'll "bump the threshold or increase the delay for notification". Since this is just the p2v bug, and it's verified working correctly, closing this out.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

7 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Categories

(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)
