datazilla1.webapp.scl3 is on hardware that will be going out-of-warranty soon. It appears to be lightly utilized and a good candidate for a p2v conversion. Assuming it's still needed, we'd like to get a window to take it down for a little while to convert it.
AFAIK, dev svcs has had no part in maintaining datazilla. I don't know who has been responsible for it, though.
Took a stab at filing it based on the creation bugs from moons ago. :jeads is listed as the admin in mana. I'm totally open to it being shuffled around.
I'm the primary developer on datazilla. Not entirely sure what "p2v conversion" implies. Is that moving the webservice+database to internal virtualized nodes? The webservice should be fine, but the database, especially talos_objectstore_1/talos_perftest_1, might be a bit large for that. We are migrating it to treeherder but that will carry over into Q4. I think virtualizing the non-talos databases in datazilla should be fine. What is the timeline for this?
p2v is "this box is a physical piece of hardware, let's take a snapshot of it and turn it into a VM that runs in ESX." The timeline is "now-ish/soon as you'll let us": warranty on the webapp box ran out on 2014-8-23, so, the sooner we can do it, the better we protect it from hardware failure. Downtime is about an hour where we'll turn off apache and stuff, do the conversion, and then bring it back up as a VM. The database boxes are on the 'spring cleaning' list: they're out of warranty, but their impending cutover to treeherder means we're leaving them alone. This is only focusing on the webapp box, datazilla1.webapp.scl3. What we need is to know when we can take the downtime to convert it and who to notify. We've got decent coverage, pretty easily 0400-1600 PDT weekdays, and any other time with some coordination.
Not a dupe. This one's prod, the other is stage. Following sheeri's lead and ni'ing :wlach about timing of an outage.
How long a maintenance window are we talking about? There are three primary users of datazilla currently:
(1) dzAlerts -- alerts sent out based on regressions detected in talos data (Kyle L is working on this). This is soon to be taken down (hopefully by end of Q4)
(2) b2g performance numbers -- probably the most important use of datazilla right now, we use this to track startup times as commits are made to gaia, etc. (Dave Hunt maintains this, I think?)
(3) Benchmarking Firefox vs. other browsers (Dan Minor maintains this, I think?)
If we can avoid it, I'd prefer if we could postpone this migration until (1) is completed and we can turn datazilla submission for talos off (otherwise we'll need to close the trees so our jobs don't turn orange). For (2) and (3) CC'ing the owners of these components to figure out what the implications of turning datazilla off would be.
Ballpark downtime for something this small is quoted at 'an hour', reality is usually about 30mins. We missed the official cutoff for the tree-closing window on 12/20, but we could probably get an exception for this if we ask by Wednesday, 9am PT. Keep in mind, given that you're on an HP blade that's now out of warranty, you're gambling with there not being a hardware failure before this migrates, so, we'd like to do it asap, but, your box your call.
(In reply to William Lachance (:wlach) from comment #7)
> How long a maintenance window are we talking about? There are three primary
> users of datazilla currently:
>
> (1) dzAlerts -- alerts sent out based on regressions detected in talos data
> (Kyle L is working on this). This is soon to be taken down (hopefully by end
> of Q4)
> (2) b2g performance numbers -- probably the most important use of datazilla
> right now, we use this to track startup times as commits are made to gaia,
> etc. (Dave Hunt maintains this, I think?)
> (3) Benchmarking Firefox vs. other browsers (Dan Minor maintains this, I
> think?)
>
> If we can avoid it, I'd prefer if we could postpone this migration until (1)
> is completed and we can turn datazilla submission for talos off (otherwise
> we'll need to close the trees so our jobs don't turn orange). For (2) and
> (3) CC'ing the owners of these components to figure out what the
> implications of turning datazilla off would be.
Bug 1110270 tracks moving Mozbench from datazilla. By end of day today I should no longer be reporting anything to datazilla.
Deferring to Eli, who is actively working on Firefox OS performance, although I suspect we can tolerate the downtime that's predicted.
We can certainly tolerate that downtime. +1 from me.
Based on checking with :hwine and the other signoffs I've seen, I'm proposing we do this during this Saturday's treeclosing window. If there's a reason not to, let me know, otherwise I'll take it to the change board tomorrow.
as long as the trees are closed, I have no objections
CAB approved, aiming for really early in the window (~0900PT 20 Dec).
p2v completed. Downtime was 0900-~0930 PT. Based on usage over the last week, downsized to 1 core and 8G of RAM (was using nothing and 5G), and 40G disk. Nagios shows green.