Closed
Bug 1061879
Opened 11 years ago
Closed 11 years ago
virtualize upload1.dmz.scl3
Categories
(Infrastructure & Operations :: Change Requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gcox, Assigned: gcox)
References
Details
(Keywords: p2v, Whiteboard: [vm-p2v:1])
When: a TCW, 1 hour duration
System: upload1.dmz.scl3
Impact: Dealer's choice: disruptive cutover/cutback to using upload2 during the p2v, or builds can't upload during p2v.
Plan: established p2v practices
Notifs: standard TCW notices
Point: gcox
Why: Opportunistically p2v'ing a critical system before the warranty runs out.
Can be deferred if there's conflicting work.
Updated•11 years ago
Flags: cab-review?
Nick - sounds like this is fine by us to happen during a TCW. Would we need to shut down any of our processes, or is it okay to just let them recover and retrigger after the fact? And would you recommend a hard tree closure or a soft one?
Flags: needinfo?(nthomas)
Comment 2•11 years ago
A drain on upload1 probably wouldn't be too disruptive, just a few builds that would fail to upload if they're using the post_upload.py method (which connects multiple times and relies on stuff it puts in /tmp; affects Firefox desktop, Fennec, and logs from masters, IIRC). Not a huge difference from a hard cut if we're in a TCW anyway.
What about the spec of the VM? A nice big network pipe to the Netapp would be super.
Flags: needinfo?(nthomas)
Comment 3•11 years ago
My spec stab: 2 vCPU*, 8G vRAM**, 40G disk ('default'), standard ethernet***
* load doesn't really go over 2 unless there's collisions on rsyncs, and then it's just catchup.
** 16G is what it has now, but it's ALL cache, so that's quite trimmable.
*** when we put in the vNIC it'll get 10GbE and be one switch away.
Updated•11 years ago
Flags: cab-review? → cab-review+
Comment 4•11 years ago
(In reply to Greg Cox [:gcox] (plz don't needinfo me) from comment #3)
> My spec stab: 2 vCPU*, 8G vRAM**, 40G disk ('default'), standard ethernet***
Fwiw we currently have:
/dev/sda3 67G 18G 46G 29% /
on this host, where I wouldn't be shocked if we easily chew up 22+ GB extra in /tmp on occasion in high load-spikey moments.
(just informative, not considering it as any reason to alter the plan-of-record)
Comment 5•11 years ago
OK, over on upload2 I shrank / to be smaller, but broke /tmp out to be a separate 40G VMDK so it can't crush the root disk. We'll do the same for upload1.
Assignee: server-ops → gcox
Comment 6•11 years ago
Nice catch Callek.
Comment 7•11 years ago
p2v done during TCW, 0900-0950.
/tmp put on its own partition, added DRS rules to ensure separation from upload2.
Left a note for the oncall to watch for alerts in case we misjudged the size needs.
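
A minimal sketch of the kind of check that note implies, assuming a simple fill-level threshold on the separate /tmp filesystem. The function names and the 80% threshold are hypothetical and not taken from the actual monitoring setup:

```python
import shutil


def is_over(used, total, warn_fraction=0.8):
    """Return True when used/total meets or exceeds warn_fraction."""
    if total <= 0:
        # Nothing sensible to report for an empty or unknown filesystem.
        return False
    return used / total >= warn_fraction


def tmp_headroom(path="/tmp", warn_fraction=0.8):
    """Return (used_fraction, over_threshold) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return used_fraction, is_over(usage.used, usage.total, warn_fraction)


if __name__ == "__main__":
    fraction, warn = tmp_headroom()
    suffix = " (over threshold)" if warn else ""
    print(f"/tmp is {fraction:.0%} full{suffix}")
```

With a 40G /tmp volume, a 22G spike like the one described in comment 4 would sit at 55% and stay under this hypothetical threshold.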
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [vm-p2v:1]
Updated•11 years ago
Product: mozilla.org → Infrastructure & Operations
Updated•10 years ago
Change Request: --- → approved
Flags: cab-review+