Closed Bug 1313498 Opened 8 years ago Closed 7 years ago

Spin up VMs for mozreview stage environment

Categories

(Infrastructure & Operations :: Virtualization, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dividehex, Assigned: cknowles)

Details

(Whiteboard: [vm-create:12][vm-delete:6])

Mozreview devs have requested a staging environment for mozreview. They will need 1 admin node, 2 web heads and 2 hg nodes.
Please spin up new VMs with identical resources and configurations to their current production nodes.

See the mana page for more details on the current VM setup:
https://mana.mozilla.org/wiki/display/websites/reviewboard.mozilla.org

The current hosts in production are as follows:
reviewboardadm.private.scl3.mozilla.com
reviewboard1.webapp.scl3.mozilla.com
reviewboard2.webapp.scl3.mozilla.com
reviewboard-hg1.dmz.scl3.mozilla.com
reviewboard-hg2.dmz.scl3.mozilla.com

Thanks!
Just a few questions.

1) normally we've not been spinning a separate admin box for stage environments.  Are you needing one here, or ??
2) You don't mention any DB hosts, did you want those included?
3) naming - for reviewboardfu.VLAN.scl3 - name the dev version reviewboardfu.dev.VLAN.scl3?  or some other convention?  (so, reviewboard-hg1.dmz.scl3 becomes reviewboard-hg1.dev.dmz.scl3)
(In reply to Chris Knowles [:cknowles] from comment #1)
> Just a few questions.
> 
> 1) normally we've not been spinning a separate admin box for stage
> environments.  Are you needing one here, or ??
We can probably skip a separate admin.  I'll just need to make sure netflows are open from the current host to the new hosts.

> 2) You don't mention any DB hosts, did you want those included?
Yep, I assumed that would be spawned from a separate bug.  I filed bug 1313504 for creating the DB instances.

> 3) naming - for reviewboardfu.VLAN.scl3 - name the dev version
> reviewboardfu.dev.VLAN.scl3?  or some other convention?  (so,
> reviewboard-hg1.dmz.scl3 becomes reviewboard-hg1.dev.dmz.scl3)

Sounds good to me!
Ok, striking the admin host, and adding reviewboard{1,2}.db.scl3.mozilla.com.  Also making that DB bug dependent on this one, as the Data team would certainly configure things, but the virt-folk would create it.

It's late in my day today - but I will start work on this first thing Monday morning.
Alright, following created, items in the same vlan moved around so they aren't sharing storage or underlying hardware, disks created/upsized as noted.  Added to tracking sheets, and added as "pre-prod" to nagios.  Initial puppetization completed.

reviewboard1.dev.webapp.scl3.mozilla.com 2CPU 4G RAM 
reviewboard2.dev.webapp.scl3.mozilla.com 2CPU 4G RAM 
reviewboard-hg1.dev.dmz.scl3.mozilla.com 2CPU 6G RAM / upsized to 80G
reviewboard-hg2.dev.dmz.scl3.mozilla.com 2CPU 6G RAM / upsized to 80G
reviewboard1.dev.db.scl3.mozilla.com     1CPU 2G RAM 40G /data added
reviewboard2.dev.db.scl3.mozilla.com     1CPU 2G RAM 40G /data added

Please reach out if there are concerns or questions.
Assignee: server-ops-virtualization → cknowles
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [vm-create:6]
In further conversation and requests - it appears that I made an error when I suggested reviewboard*.dev.VLAN - and I should have said "stage" instead of "dev".

With Jake's approval, tearing down the .dev. VMs and bringing up the .stage. versions.
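
The naming convention discussed in this bug (insert an environment label between the short hostname and the VLAN component) can be sketched as follows. This is only an illustration, assuming every production name has the form name.VLAN.scl3.mozilla.com; the function name `env_hostname` is hypothetical and not part of any tooling mentioned here.

```python
def env_hostname(prod_fqdn: str, env: str = "stage") -> str:
    """Insert an environment label after the first DNS label.

    Example: reviewboard-hg1.dmz.scl3.mozilla.com
          -> reviewboard-hg1.stage.dmz.scl3.mozilla.com
    """
    # Split the short hostname from the rest of the domain.
    host, sep, rest = prod_fqdn.partition(".")
    if not sep or not rest:
        raise ValueError(f"not a fully qualified name: {prod_fqdn!r}")
    return f"{host}.{env}.{rest}"


if __name__ == "__main__":
    for name in (
        "reviewboard1.webapp.scl3.mozilla.com",
        "reviewboard-hg1.dmz.scl3.mozilla.com",
    ):
        print(env_hostname(name))
```

Passing `env="dev"` reproduces the names that were originally created and then torn down.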
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
removed the .dev. boxes from nagios.  Once that's done propagating, will power off and continue with the replace.
Alright, following created, items in the same vlan moved around so they aren't sharing storage or underlying hardware, disks created/upsized as noted.  Added to tracking sheets, and added as "pre-prod" to nagios.  Initial puppetization completed.

reviewboard1.stage.webapp.scl3.mozilla.com 2CPU 4G RAM 
reviewboard2.stage.webapp.scl3.mozilla.com 2CPU 4G RAM 
reviewboard-hg1.stage.dmz.scl3.mozilla.com 2CPU 6G RAM / upsized to 80G
reviewboard-hg2.stage.dmz.scl3.mozilla.com 2CPU 6G RAM / upsized to 80G
reviewboard1.stage.db.scl3.mozilla.com     1CPU 2G RAM 40G /data mounted manually
reviewboard2.stage.db.scl3.mozilla.com     1CPU 2G RAM 40G /data mounted manually

Please reach out if there are concerns or questions.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Whiteboard: [vm-create:6] → [vm-create:12][vm-delete:6]
Well this is embarrassing. reviewboard-hg[12].stage.dmz need to be rebuilt as CentOS 7, not RHEL6. :-( Nothing is live on them, so it's safe to do at any time (really, really).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
OK, I'll be blowing away those VMs shortly and will start work on rebuilding.  They're down in nagios for 1 day.
I think we got wires crossed.

I definitely wanted reviewboard-hg[12] rebuilt (thanks!!), but I'm not sure that we want reviewboard[12].stage.webapp on centos7. :glob, what say ye?
Flags: needinfo?(glob)
Oh crap - I blew everything away.

I can attempt a restore for the hg and db boxes - but need to know soon if that's desired.

mea culpa
Argh.  I meant to say "webapp and db boxes"
Definitely on the db boxes, since matt had already loaded data there. I'm fairly certain on the webapp nodes, since they want to test a reviewboard upgrade, but I'd like glob to weigh in. If you can save the snapshots that'd be swell, but if you can't save them *and* that would make fixing them vastly easier then go ahead and restore the webheads.
db boxes are back - they are giving an error on the puppet runs (no idea if this was there before) - per :fubar cc'ing in :mpressman to take a look:

Error: /Stage[main]/Mysql2::Server/File[/usr/local/etc/reviewboard.db.mozilla.com.key]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/secrets/certs/db/reviewboard_stage/reviewboard.db.mozilla.com.key
Error: /Stage[main]/Mysql2::Server/File[/usr/local/etc/reviewboard.db.mozilla.com.crt]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/secrets/certs/db/reviewboard_stage/reviewboard.db.mozilla.com.crt
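
To see at a glance which files Puppet could not fetch, error lines like the two above can be reduced to (target file, environment, source URL) tuples. This is a minimal sketch that assumes the messages keep exactly the shape shown in this comment; the regular expression and the name `missing_sources` are illustrative and not part of Puppet.

```python
import re

# Matches lines of the form:
#   Error: <resource path>/File[<path>]: Could not evaluate: Could not retrieve
#   information from environment <env> source(s) <url>
PATTERN = re.compile(
    r"^Error: .*/File\[(?P<path>[^\]]+)\]: Could not evaluate: "
    r"Could not retrieve information from environment (?P<env>\S+) "
    r"source\(s\) (?P<source>\S+)$"
)


def missing_sources(log_text):
    """Return (path, environment, source) for each matching error line."""
    found = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            found.append((match["path"], match["env"], match["source"]))
    return found
```

Lines that do not match the pattern are skipped, so the function can be fed the full output of a puppet run.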

And just to be safe/repair my idiocy - I'll be restoring the reviewboard[12].stage.webapp shortly.  If :glob wants centos7, I'm happy to redo that work.
Oh!  and the reviewboard-hg[12].stage.dmz.scl3 boxes have been rebuilt and repuppeted.

So there's *some* success in this nightmare.
I'm pretty certain the puppet errors are due to the refactoring. We encountered a similar problem testing mysql server upgrades using ansible. Although I didn't see the puppet error - Pythian was doing the work - I was notified that it wasn't reading from secrets to set the password. I planned to investigate puppet by this Friday in order to sign off on the ansible playbook.
The issue in Comment 14 seems to correlate with what the ansible testing uncovered, as that only gets executed when mysql hasn't been initialized with the user grants applied.
Since the mysql2 module has not been modified in quite some time, I think just finding the correct reference to secrets will solve the problems.
Alright - and reviewboard[12].stage.webapp.scl3.mozilla.com have also been restored.  

They too are seeing some puppet errors - but from what I'm seeing I'm not clear that this isn't expected.  Take a look, let me know how I can help.

At this point, I think we're roughly at "Stop helping Chris, you've done enough damage".  The -hg's have been created as centos7 and the db and webheads have been restored from Tuesday night backups.

I'm leaving :glob's NI in place, so he can let me know if he wants centos7 webheads - or if they need other attention.

Feel free to shame me as needed.  I apologize for the (likely) extra work this will incur.
(In reply to Chris Knowles [:cknowles] from comment #17)
> I'm leaving :glob's NI in place, so he can let me know if he wants centos7
> webheads - or if they need other attention.

all i want is for stage to match production :)

~$ ssh reviewboard1.webapp.scl3.mozilla.com cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.8 (Santiago)
~$ ssh reviewboard1.stage.webapp.scl3.mozilla.com cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.8 (Santiago)

lgtm.
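
The manual check above could be scripted for each production/stage pair. This is a rough sketch, assuming non-interactive SSH access to both hosts; `fetch_release` and `same_release` are hypothetical helper names, not existing tools.

```python
import subprocess


def fetch_release(host):
    """Return the contents of /etc/redhat-release on host, read over ssh."""
    result = subprocess.run(
        ["ssh", host, "cat", "/etc/redhat-release"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ssh or cat fails
    )
    return result.stdout.strip()


def same_release(prod_host, stage_host, fetch=fetch_release):
    """True when both hosts report the same release string.

    The fetch argument can be replaced with a stub for testing.
    """
    return fetch(prod_host) == fetch(stage_host)
```

For example, `same_release("reviewboard1.webapp.scl3.mozilla.com", "reviewboard1.stage.webapp.scl3.mozilla.com")` would run the same two commands shown above and compare their output.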

i haven't yet had time to start provisioning rb onto the servers as i only got access to them late yesterday, so it's too early to say if they need other attention.  but i can follow up with fubar if that's required.
Flags: needinfo?(glob)
Alright then, closing this out.  I apologize for the repeated confusion and noise in this bug.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → FIXED