Closed Bug 999435 Opened 11 years ago Closed 11 years ago

Setup new Ubuntu 14.04 nodes for Mozmill CI in qa.scl3.mozilla.com

Categories

(Infrastructure & Operations :: Virtualization, task)

All
Linux
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: cknowles)

References

Details

(Whiteboard: [vm-create:18][vm-delete:8][qa-automation-blocked])

Ubuntu 14.04 has been released, and we want to upgrade our production machines to 14.04. Given that it is an LTS release, we can replace all of the existing Ubuntu nodes, which means 12.04 (32/64) and 13.10 (32/64). So can we please get new Ubuntu 14.04 VM templates generated? Thanks. We want to use that version to get started with Puppet on bug 973535.
The machines we need created are:
mm-ub-1404-32-1.qa.scl3.mozilla.com
mm-ub-1404-32-2.qa.scl3.mozilla.com
mm-ub-1404-32-3.qa.scl3.mozilla.com
mm-ub-1404-32-4.qa.scl3.mozilla.com
mm-ub-1404-64-1.qa.scl3.mozilla.com
mm-ub-1404-64-2.qa.scl3.mozilla.com
mm-ub-1404-64-3.qa.scl3.mozilla.com
mm-ub-1404-64-4.qa.scl3.mozilla.com
Pre-installation instructions can be found here: https://mana.mozilla.org/wiki/display/websites/QA+Automation+ESX+Service#QAAutomationESXService-Linux%28Ubuntu%29
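The requested production host names follow a regular pattern (four nodes per architecture). A minimal Python sketch, assuming that pattern holds, that generates the expected names and flags duplicates or omissions in a hand-typed list. The helper names `expected_hosts` and `check_request` are made up for illustration and are not part of any Mozilla tooling.

```python
from collections import Counter

DOMAIN = "qa.scl3.mozilla.com"


def expected_hosts(archs=("32", "64"), count=4):
    """Build the production host names, e.g. mm-ub-1404-32-1.qa.scl3.mozilla.com."""
    return [f"mm-ub-1404-{arch}-{i}.{DOMAIN}"
            for arch in archs
            for i in range(1, count + 1)]


def check_request(requested):
    """Return (duplicates, missing) for a hand-typed list of host names."""
    counts = Counter(requested)
    # Names that were typed more than once.
    duplicates = sorted(h for h, n in counts.items() if n > 1)
    # Names the pattern predicts but the list does not contain.
    missing = sorted(set(expected_hosts()) - set(requested))
    return duplicates, missing
```

Running `check_request` on a pasted list before creating VMs is a cheap way to catch a copy-and-paste slip in a numbered series.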
These will need to be kickstarted into PuppetAgain. This may become a much more common request, from people without easy vCenter console access. Should we have a quick meeting to talk about how to do this, and work out any kinks? The 14.04 kickstart process isn't set up yet, so we can't KS these VMs yet anyway.
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> These will need to be kickstarted into PuppetAgain. This may become a much
> more common request, from people without easy vCenter console access.
> Should we have a quick meeting to talk about how to do this, and work out
> any kinks?
That may be fine with me, but we would first need feedback from Adrian or whoever else will set up those VMs. Better that we arrange a good time via IRC or email then.
> The 14.04 kickstart process isn't set up yet, so we can't KS these VMs yet
> anyway.
How long do you think this will take?
Sorry, that was for a meeting with the virtualization folks. We're planning to use Ubuntu-14.04 for OpenStack as well, so I'm working on the KS process now.
Dustin - so sorry for the delay - I'd be happy to meet up with you. How's the puppetagain work for the 14.04 coming? My schedule is likely more wide open than yours, feel free to ping me on irc whenever. CJK
OK, upshot from the brief meeting - kickstarting puppetagain is understood reasonably well... from a "click here, do that" level. So that's not in the way. However work on 14.04 is ongoing, and "not quite ready" for kickstarting yet. Let me know when we can move forward on it. Thanks for the time today.
Alright, I see that the blocking bug for the 14.04 is now closed - starting on this. :whimboo - can you add these to the puppetagain node definitions so that the puppetagain kickstart can fully complete? CJK
Assignee: server-ops-virtualization → cknowles
Chris, on bug 1020659 I'm currently working on QA-specific node definitions. With the patch attached there we will be able to recognize Ubuntu 14.04 for staging machines. The hosts which I pointed out in the initial comment are for production. So what we also need are the machines for staging. Those would be:
mm-ub-1404-32.qa.scl3.mozilla.com
mm-ub-1404-64.qa.scl3.mozilla.com
Once the patch on the other bug has landed, both machines will be able to pull their configuration from our qa puppetmaster. So I would suggest we start with those 2 machines and test how it works. Does that sound good?
Status: NEW → ASSIGNED
Sounds fine, other than that I just kickstarted 6 of the 8 machines. :/ (my timing problem, not yours) I'll power them back down, and set up to be ready to kickstart the two you suggest. Let me know when I'm clear to begin.
I have noticed that! So I went ahead and also added all the production nodes to the config which has been pushed to production now. If you want, you can start them all! Nothing should fail at the moment. The only problem which persists is bug 1006891, which also installed Apache. We have to get this fixed. So adding as dependency.
Depends on: 1006891
Alright, I've spun up mm-ub-1404-[32,64].qa.scl3 and kickstarted them. However, these are stuck at the post boot splash screen - which I had been informed implied a problem with puppetization - let me know if I'm clear to proceed on the rest, or how I should modify the procedure to work better... CJK
Logging into these, it looks like they need to have the releng hg repo merged, as the current QA repo still specifies puppet-3.4.2, which doesn't exist for trusty. They'll continue retrying until that's done (and probably sending you email?)
I did the merge from build/puppet now. So let's see how it works. Sadly I still get tons of emails for:
Error 400 on SERVER: Could not find default node or by name with 'mm-ub-1404-64.qa.scl3.mozilla.com' on node mm-ub-1404-64.qa.scl3.mozilla.com
Not sure why it doesn't fetch the node. Maybe something is wrong in the node regex.
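The "Could not find default node or by name" error means that no node definition matched the host's name. One way this can happen with regex-based node definitions is a pattern that only covers the numbered production hosts. The sketch below illustrates that failure mode in Python. Both patterns are hypothetical and are not taken from the actual QA puppet repository, and Puppet itself evaluates node patterns as Ruby regular expressions, which behave the same way for this simple case.

```python
import re

# Hypothetical pattern that requires a "-N" suffix: matches production nodes only.
PRODUCTION_ONLY = re.compile(r"^mm-ub-1404-(32|64)-\d+\.qa\.scl3\.mozilla\.com$")

# Hypothetical pattern with an optional suffix: also matches the staging nodes.
WITH_STAGING = re.compile(r"^mm-ub-1404-(32|64)(-\d+)?\.qa\.scl3\.mozilla\.com$")


def unmatched(pattern, hosts):
    """Return the hosts that a node pattern would fail to match."""
    return [h for h in hosts if not pattern.match(h)]


hosts = [
    "mm-ub-1404-64.qa.scl3.mozilla.com",    # staging
    "mm-ub-1404-64-1.qa.scl3.mozilla.com",  # production
]
print(unmatched(PRODUCTION_ONLY, hosts))  # the staging host is left over
print(unmatched(WITH_STAGING, hosts))     # nothing is left over
```

Checking candidate host names against the pattern offline like this avoids waiting for the next puppet run (and the resulting email) to find out whether a node definition applies.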
I totally messed up this merge. Chris, I'm sorry, but can you please shut down all machines? Otherwise I will have a full inbox on Tuesday when I come back.
No need to shut down the machines, I think - I killed the puppetize.sh processes on them so they shouldn't spam over the weekend.
Given the "I think" I decided to shut them off. Let me know when I'm clear to start powering them up again. CJK
Per our IRC conversation, I just started the kickstart of mm-ub-1404-[32,64].qa.scl3. I'll let you know what I see.
Alright, the kickstarts have gone off, and are again sitting at the splashscreen - none of my credentials are working for SSHing in as root, so I can't see what's going on in the logs - let me know what you'd like me to try next. CJK
I have a problem connecting to the machines, given that on my Linux machine I cannot resolve the DNS names. I have to wait until I'm back at home for further inspection. But what I see so far is promising. All went fine this time, except a single error for apt-get right after puppetizing the machine. I will file a separate bug for that, at least for investigation.
Ok, so I can log in via SSH with the root account, but not with my own username. So something went wrong with the initial puppet run. I will have to check what we actually do when adding those users. Maybe I can find what's wrong. Not sure if bug 1024938 has any effect here.
So the admin_users we define in qa-config.pp are only for the puppetmaster? https://hg.mozilla.org/qa/puppet/file/2dea7e8f1dcc/manifests/qa-config.pp#l39 If yes, what needs to be done to also have them available on the slave nodes?
These are questions I think that are best directed towards Dustin, as I have to poke him with questions when things go awry. Dustin, do you have any input on this?
Chris, sorry, that was my fault. Those questions were meant for Dustin. I revisited the current bugs and figured out that we should move this discussion over to bug 973535. Everything that could be done on this bug has been done. I have added those 8 new nodes to our ESX documentation on Mana: https://mana.mozilla.org/wiki/display/websites/QA+Automation+ESX+Service Chris also showed me on Thursday how to kickstart a machine with Ubuntu 14.04. I did that for a 32bit and a 64bit one, and all works fine. So I think all the work that could be done here has been finished, and we can close the bug.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [qa-automation-blocked]
Just to follow up, admin users are present on all toplevel::server nodes, which does *not* include slaves.
Whiteboard: [vm-create:8]
Chris, I will have to re-open this bug given that the PuppetAgain process is still ongoing and we weren't able to finish it off yet. As I read last week, Ubuntu has even released 14.10, and we still don't have 14.04 live on our machines! We talked about that in our team and decided that we cannot wait until PuppetAgain is ready for us on Ubuntu. So let us do the remaining steps here:
1. I need new templates for both 14.04 releases (32/64). Please ensure that these are installed fresh and not upgraded from a former Ubuntu release template.
2. I will do the necessary customizations for the templates.
3. Once customization is done, we have to re-create the already existing 14.04 CI machines based on the templates.
Severity: normal → major
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [vm-create:8] → [vm-create:8][qa-automation-blocked]
So, let me rephrase, and ask a few questions to make sure I understand the request. You'd like me to delete mm-ub-1404-32-{1..4} and mm-ub-1404-64-{1..4} - and recreate them from infrastructure templates. - which are installs of 1404. do you want these puppeted? If this is far from correct, perhaps a new bug with a full, clear request, and without all the ancient history that is in here would better serve.
(In reply to Chris Knowles [:cknowles] from comment #26)
> You'd like me to delete mm-ub-1404-32-{1..4} and mm-ub-1404-64-{1..4} - and
> recreate them from infrastructure templates. - which are installs of 1404.
> do you want these puppeted?
What do the infrastructure templates contain? Are those bare installations, or have they already received customizations? If they are plain, we can duplicate them into templates we could use. For all the other releases, and also for Windows, we have our own templates. You might be able to find them in vSphere; I was not able to with my privileges. Those will not get any PuppetAgain-related customization! That will happen later, when I'm done with that for our purposes. It may still take a bit. I'm not sure whether you have to delete the existing mm-ub-1404-* hosts. Do whatever you think is best so that the hosts can later be created from the customized template.
> If this is far from correct, perhaps a new bug with a full, clear request,
> and without all the ancient history that is in here would better serve.
Comment 0 and comment 1 still apply, simply without puppet. Our estimate for getting it running directly with PuppetAgain was too optimistic. Sorry.
Well, if comment1 applies, you *do* want new boxes to replace the mm-ub-1404-32-{1..4} and mm-ub-1404-64-{1..4} ones that were created earlier. Please confirm that remove. The infra templates do have some added packages for our puppetizing pleasure - also keys and other access related elements for the datacenters and to allow IT access to the created VMs. However, puppet is not applied out of the box. So, I'm still a little confused - is this the request to create mm-ub-1404-32-{1..4} and mm-ub-1404-64-{1..4} - or is this a request to create a 32 and 64 bit template - to create those ?
(In reply to Chris Knowles [:cknowles] from comment #28)
> Well, if comment1 applies, you *do* want new boxes to replace the
> mm-ub-1404-32-{1..4} and mm-ub-1404-64-{1..4} ones that were created
> earlier. Please confirm that remove.
Alright! Then let's do that when the template is ready to be distributed to the hosts that are to be replaced. None of those nodes are in use. We will not replace mm-ub-1404-32 and mm-ub-1404-64, which are hosts in our staging instance and which I use for testing puppet.
> The infra templates do have some added packages for our puppetizing pleasure
> - also keys and other access related elements for the datacenters and to
> allow IT access to the created VMs.
>
> However, puppet is not applied out of the box.
I think that should be ok for now.
> So, I'm still a little confused - is this the request to create
> mm-ub-1404-32-{1..4} and mm-ub-1404-64-{1..4} - or is this a request to
> create a 32 and 64 bit template - to create those ?
First we need the templates to be created and customized before we can create the nodes.
Alright, per our conversation this morning: I'll spin up two VMs - mm-ub-1404-32-template.qa.scl3.mozilla.com and mm-ub-1404-64-template.qa.scl3.mozilla.com - and let you do all your customizations to them. Once that's done, we'll convert those into templates in the QA space, and then we can spin out the actual worker VMs.
Releng is running 14.04 hosts in production .. what's still to do?
:whimboo, the machines are:
ubuntu-14.04-64-template.qa.scl3.mozilla.com
ubuntu-14.04-32-template.qa.scl3.mozilla.com
The only changes I've made to them from the default desktop setup are:
1) set apt-proxy to the dc proxies - per your docs
2) installed open-vm-tools-desktop for proper vm support
3) added standard IT keys to the mozauto ssh account.
4) installed and enabled ssh server on there - so you can SSH to them as mozauto, with the password we discussed on IRC earlier.
Let me know what else I can do.
Whiteboard: [vm-create:8][qa-automation-blocked] → [vm-create:10][qa-automation-blocked]
My early morning brain on that day made a mistake - the '.' in 14.04 was causing inventory to misinterpret things as a subdomain. So, with permission, I took these down and renamed them, simply removing the '.' - feeling that a '-' would be weird.
ubuntu-1404-32-template.qa.scl3.mozilla.com
ubuntu-1404-64-template.qa.scl3.mozilla.com
Any problems or concerns, let me know.
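To illustrate why the '.' caused trouble: DNS names are split into labels at each dot, so anything that treats the first label as the host name sees `ubuntu-14` as the host and `04-64-template` as an extra subdomain level. A small Python sketch of that splitting follows; how the inventory tool itself parses names is an assumption here, and `split_fqdn` is an illustrative helper only.

```python
def split_fqdn(fqdn):
    """Split a fully qualified name into (host label, remaining domain labels)."""
    labels = fqdn.split(".")
    return labels[0], labels[1:]


# Original name: the dot inside "14.04" creates an unintended extra level.
print(split_fqdn("ubuntu-14.04-64-template.qa.scl3.mozilla.com"))

# Renamed host: the whole template name stays in the first label.
print(split_fqdn("ubuntu-1404-64-template.qa.scl3.mozilla.com"))
```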
Chris, I'm not able to connect to those machines via SSH. Nor can I find them in vSphere in our VLAN. So can you please install an SSH server? Thanks.
Status: REOPENED → ASSIGNED
Flags: needinfo?(cknowles)
Per the following, they've already got SSH servers installed, and are on the QA vlan (VLAN73).
Last login: Mon Nov 3 16:31:05 on ttys000
cknowles-20405:~ cknowles$ ssh mozauto@ubuntu-1404-32-template.qa.scl3.mozilla.com
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic i686)
 * Documentation: https://help.ubuntu.com/
225 packages can be updated.
99 updates are security updates.
Last login: Mon Nov 3 04:32:30 2014 from 10-22-248-146.vpn.scl3.mozilla.com
mozauto@ubuntu-14:~$ uptime
 03:35:21 up 23:04, 1 user, load average: 0.00, 0.01, 0.05
mozauto@ubuntu-14:~$ exit
logout
Connection to ubuntu-1404-32-template.qa.scl3.mozilla.com closed.
cknowles-20405:~ cknowles$ ssh mozauto@ubuntu-1404-64-template.qa.scl3.mozilla.com
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
 * Documentation: https://help.ubuntu.com/
224 packages can be updated.
98 updates are security updates.
Last login: Mon Nov 3 04:31:52 2014 from 10-22-248-146.vpn.scl3.mozilla.com
mozauto@ubuntu-14:~$ uptime
 03:35:34 up 23:04, 1 user, load average: 0.00, 0.01, 0.05
mozauto@ubuntu-14:~$ exit
logout
Connection to ubuntu-1404-64-template.qa.scl3.mozilla.com closed.
cknowles-20405:~ cknowles$
Flags: needinfo?(cknowles)
Oops, totally my fault. I tried to get the IP address via my people SSH connection, but actually also tried to SSH into the above VMs from that location. That clearly fails. I can connect now.
Alright. Both VMs have been updated and customized for our needs. Chris, you can now convert them back to templates, and replace our 4 existing machines each for 32bit and 64bit with ones based on the new templates. Thanks.
Alright, will be shutting down:
mm-ub-1404-32-1.qa.scl3.mozilla.com
mm-ub-1404-32-2.qa.scl3.mozilla.com
mm-ub-1404-32-3.qa.scl3.mozilla.com
mm-ub-1404-32-4.qa.scl3.mozilla.com
mm-ub-1404-64-1.qa.scl3.mozilla.com
mm-ub-1404-64-2.qa.scl3.mozilla.com
mm-ub-1404-64-3.qa.scl3.mozilla.com
mm-ub-1404-64-4.qa.scl3.mozilla.com
Destroying and redeploying from template - will let you know when that's complete.
Also, re-reading the bug history - will you be needing the staging boxes as well? mm-ub-1404-32.qa.scl3.mozilla.com mm-ub-1404-64.qa.scl3.mozilla.com
Alright the 8 have been respun from your template.
mm-ub-1404-32-1.qa.scl3.mozilla.com
mm-ub-1404-32-2.qa.scl3.mozilla.com
mm-ub-1404-32-3.qa.scl3.mozilla.com
mm-ub-1404-32-4.qa.scl3.mozilla.com
mm-ub-1404-64-1.qa.scl3.mozilla.com
mm-ub-1404-64-2.qa.scl3.mozilla.com
mm-ub-1404-64-3.qa.scl3.mozilla.com
mm-ub-1404-64-4.qa.scl3.mozilla.com
They're all responding to SSH, and seem to be healthy, let me know of any concerns. And let me know if you need those staging ones respun as well.
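A quick way to confirm that freshly deployed VMs are "responding to SSH" is to test whether TCP port 22 accepts connections. The following is a minimal Python sketch under that assumption; it only checks that the port is open, not that a login works, and the function names are illustrative. The host names resolve only from inside the relevant network.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_all(hosts, port=22, timeout=5.0):
    """Map each host name to whether its SSH port accepted a connection."""
    return {host: ssh_port_open(host, port, timeout) for host in hosts}


# Example usage (requires network access to the QA VLAN):
# hosts = [f"mm-ub-1404-{arch}-{i}.qa.scl3.mozilla.com"
#          for arch in ("32", "64") for i in range(1, 5)]
# for host, ok in check_all(hosts).items():
#     print(host, "ok" if ok else "unreachable")
```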
Flags: needinfo?(hskupin)
(In reply to Chris Knowles [:cknowles] from comment #39)
> Also, re-reading the bug history - will you be needing the staging boxes as
> well?
Nope, we can keep them. No need to re-deploy them. They are based on Puppet and will be used for testing. Thanks!
(In reply to Chris Knowles [:cknowles] from comment #40)
> Alright the 8 have been respun from your template.
Great. I will check that soonish and reply back if I see something suspicious.
Whiteboard: [vm-create:10][qa-automation-blocked] → [vm-create:18][vm-delete:8][qa-automation-blocked]
Chris, for those VMs we do not have the checkbox for auto-upgrading VMware tools enabled. Can you please make sure to enable it for the templates?
It's not needed, and in fact may cause problems. Longer: Starting with the modern versions of the Linuxes (RHEL and CentOS 7, as well as ubuntu 14.04) vmware provided tools are now deprecated, and the open-vm-tools that are included with the distros are considered the canonical source. So for ubuntu, an "apt-get update;apt-get dist-upgrade" will set you with the latest open-vm-tools, as they're already installed on there. The checkbox tries to mount the tools CD image and install the vmware version - which may or may not cause issues. Also, note that for the linuxes, we have scripting in place that can and should manage tools upgrades for the environment and should keep them reasonably up to date.
Oh! That is news to me. That is really good to hear. So ok, I will get the checkbox disabled again as the first action tomorrow morning. Thanks for the info, Chris!
Flags: needinfo?(hskupin)
So what I have done so far:
* Updated all the machines for auto-upgrade of VMware tools
* Connected all 32bit machines
* Connected all 64bit machines
Todo:
* I still see problems with the Flash installer not being able to download the real binaries. This is all due to proxy settings. This needs to be fixed, so I will take care of it tomorrow.
I removed the auto-upgrade check from all Ubuntu 14.04 VMs. So this is clean now. Further, I investigated the Flash issue a bit more, and given that it's not only 14.04 that suffers from it, I will take care of it on bug 949427. So all is done on this bug. Thanks a lot, Chris, for all the help!
Status: ASSIGNED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations