Closed Bug 659512 Opened 13 years ago Closed 13 years ago

create new staging master for auto-tools team

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: anodelman, Assigned: arich)

tools-staging-master appears to now be dead in the water due to updates to buildbot-custom/python/twisted/etc.  I have been unable to resurrect it to a working state, and I believe that my system and the production master environment have now diverged enough that this is a fool's errand.

The machine that my vm was based upon (talos-staging-master) has been decommissioned so there is no working machine to compare mine to.

I need a working staging master to attach my slaves to so that I can continue with my testing.
Blocks: 648114
Alice, Zandr believes that we can easily build you an image off of the new buildbot masters that he recently set up for releng.  Would that unhork things here?

This would get the master itself updated.  Are there other issues that are preventing these masters from working?
zandr - that sounds like that would get me unblocked.  Though remember that my master needs to be on the mv network, not the build network.
OK, I had a look at the link aki pointed to, and I think I can pull that off.  Here's the plan:

 1. new VM (amy)
 2. puppetize (me)
 3. install a staging master (me, maybe with puppet, which just runs setup-master)
 4. get someone else in releng to look and make sure I didn't hork it up
 5. reset passwords and hand off to anode

The new VM:
  talos-staging-master2.mv.mozilla.com (or whatever suffix "mv network" means)
  centos-5.5
  4GB RAM
  40GB HDD
  KVM Cluster: mtv1
Assignee: nobody → arich
(In reply to comment #4)
>   talos-staging-master2.mv.mozilla.com (or whatever suffix "mv network"
> means)

Ack, that should be tools-staging-master2.mv.mozilla.com, or (since that exists) 3, or 4, ..
anode: tools-staging-master02.mv.mozilla.com already exists in DNS and DHCP, but the host does not respond to ping.  Should I be re-using this hostname, or should I, as Dustin suggests, move onto tools-staging-master03.mv.mozilla.com?
I believe that 02 was created but never used.  I'm fine with re-using that hostname, but you should check to see if an actual 02 master needs to be deleted to get it out of the way.
I have built a new vm according to your specs, and it's sitting on the build vlan for now.  I think zandr would like to move it to a different vlan when we're able to get tagged vlans working on the kvm server in mtv1, but this at least allows you the opportunity to get the host puppetized and set up.
Assignee: arich → dustin
(In reply to comment #4)
>  1. new VM (amy)
check

>  2. puppetize (me)
check

>  3. install a staging master (me, maybe with puppet, which just runs
> setup-master)
check

>  4. get someone else in releng to look and make sure I didn't hork it up
check

>  5. reset passwords and hand off to anode
back to relops to move this to the correct VLAN.  Ping me in IRC for the passwords.  Once this is in your hands, you'll find the master under /builds/buildbot/tests1.  Most of the buildbot stuff is under the master/ directory.  You'll want to adjust the list of buildslaves in master/localconfig.py, and probably some other things.  Check with the master-wranglers in #build for help from here on out.
Assignee: dustin → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
And moving to the right vlan is... hm. Don't have a firm ETA on that, stay tuned.
Amy, Dustin, Zandr, et al., I really appreciate the status reports in here.  Thanks for your help on this.
To clarify comment 10, there is a networking bug in RHEL6 that prevents us from having VLAN tagged bridges for VMs. The solution is going to be upgrading the host OS to 6.1 (unconfirmed if it works) or rebuilding on Ubuntu (known working, it's what we use everywhere else).

This is work we have to do anyway, but it wasn't blocking until now. cc: bkero so he can give an update on his RHEL6.1 testing.
Blocks: 657987
So what does this all mean? This is currently blocking a few bugs anode needs to be working on (657987, 648114, and 617762). I'm going to need a relatively firm ETA please.
Blocks: 592793
Blocks: 609111
Further nudge to find out ETA here.
If we're waiting for me on this, the ETA for my KVM bonding/bridging/vlan tagging tests will be in an hour, just as soon as I finish uploading this virtual machine to test with.

If you're waiting for a new VM from releng-ops, there's not much I can do from that end besides offer them resources.
It was my understanding that releng had a vm configured and ready and we were waiting on IT to shift it to the MV network where it will be reachable by the auto tools team.
This is blocked on the KVM bonding/bridging/vlan tagging issue.  The vm already exists on the kvm servers in mtv1.
I've confirmed the KVM bonding/bridging/vlan redhat bug was fixed in the RHEL 6.1 update, which has already been applied to the kvm.build.mtv1 servers.

I imagine the next step should be to install VLANs on the hosts, then add the VM to it.  This will require a VM reboot in order to make the network changes.

arich/zandr: which VLAN needed to be added?  We'll have to pass it off to netops/dumitru for adding a single tagged vlan.
This machine needs to be on vlan200 10.250.0.0

I believe everything else on that host is vlan500 10.250.48.0
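
A quick way to check which of the two networks above a given address belongs to is sketched below. This is only an illustration: the bug gives the network addresses but not their prefix lengths, so the /24 masks are an assumption, and `VLANS` and `vlan_for` are hypothetical names.

```python
import ipaddress

# ASSUMPTION: /24 prefixes. The bug names only the network addresses,
# not their sizes, so adjust these to the real netmasks.
VLANS = {
    "vlan200": ipaddress.ip_network("10.250.0.0/24"),
    "vlan500": ipaddress.ip_network("10.250.48.0/24"),
}


def vlan_for(address, vlans=VLANS):
    """Return the name of the VLAN whose network contains address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in vlans.items():
        if ip in network:
            return name
    return None
```

Under the /24 assumption, an address such as 10.250.48.20 would be reported as vlan500, which is one way to spot a VM that has not yet been moved to vlan200.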
This bug is hard to follow but in talking with Clint offline the dependent bug is really the only outstanding item here.  That bug was filed today. 

Will make sure that gets wrapped up Thursday.
bkero: While you have tools-staging-master02 down to modify the bridge, could you export it?

We'll need to import a copy of this in scl1 for bug 617762.
I can certainly export it.  Just to confirm, this host should be taken down to move it to vlan200, correct?
(In reply to comment #22)
> I can certainly export it.  Just to confirm, this host should be taken down
> to move it to vlan200, correct?

Correct.
The VM has been moved to vlan200, and I've exported a clone to bring up in scl1.
(In reply to comment #20)
> This bug is hard to follow but in talking with Clint offline the dependent
> bug is really the only outstanding item here.  That bug was filed today. 
> 
> Will make sure that gets wrapped up Thursday.

What's left to do here?
arr was bringing up the clone of the new VM (tools-staging-master01) in scl1, as far as I know that's all that's left to do here.
We need to get the build-vpn extended to the new vlan, and that may require a downtime of the build-vpn to reload (netops is investigating whether or not a downtime will be required).  netops has bug 664885, and I've cc'ed jhford (as build duty next week) so he and they can coordinate any downtime if necessary.
talos-staging-master02.mv.mozilla.com has been cloned and imported as talos-addon-master1.amotest.scl1.mozilla.com
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee: server-ops-releng → arich
Does this mean that this machine is on the special amo network?

This vm should be on mv, not amotest.  amotest is just for the amo vm + amo talos slaves.
Too many similar names in too many similar bugs.

talos-addon-master1.amotest is in amotest.

talos-staging-master02.mv is in mv.
I can't ping talos-staging-master02.mv when connected to the mv office vpn.  Could you get me the IP address so that I can determine whether this is a DNS issue?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sorry, I pasted the wrong hostname in the update (too much talos vs tools back and forth in my head).

tools-staging-master02.mv.mozilla.com has address 10.250.48.254 (mv)

talos-addon-master1.amotest.scl1.mozilla.com has address 10.12.47.20 (scl1, requires build vpn)
Gah, even I did that while I was trying not to. Yay target fixation. (comment 30)
Okay, I still can't get ping using the machine name.  It does resolve using the ip, but I am getting ssh timeout.  Is the vm up?
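
The two symptoms above (name not resolving, ssh timing out) can be checked separately. Here is a minimal sketch, assuming the IP address is already known; `diagnose` is a hypothetical helper, and the `resolver` and `connector` parameters exist only so that the lookups can be swapped out. It tests a TCP connection to port 22 instead of ICMP ping, since ssh is the service that timed out.

```python
import socket


def diagnose(hostname, ip, port=22, timeout=5.0,
             resolver=socket.gethostbyname,
             connector=socket.create_connection):
    """Report whether the name resolves and whether the IP accepts TCP."""
    result = {"resolved_ip": None, "dns_matches": False, "tcp_reachable": False}

    # Step 1: does the name resolve at all, and to the address we expect?
    try:
        result["resolved_ip"] = resolver(hostname)
        result["dns_matches"] = (result["resolved_ip"] == ip)
    except socket.gaierror:
        pass  # name does not resolve: points at DNS (or the VPN's resolver)

    # Step 2: independent of DNS, can we open a TCP connection to the IP?
    try:
        conn = connector((ip, port), timeout=timeout)
        conn.close()
        result["tcp_reachable"] = True
    except OSError:
        pass  # covers timeouts and refused connections

    return result
```

If `dns_matches` is false but `tcp_reachable` is true, the problem is on the DNS side; if both are false, the VM may be down or on a network the VPN cannot reach.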
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
I'd like to request clarification on which network this machine should be on.  Should it be on the same vlan as the current tools-staging-master and be accessed in the same way, or should it be on the generic mv vlan and not have access to the same things that the current tools-staging-master has access to?  I want to make sure we wind up putting the vm in a place that's not going to break your workflow.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Based on our conversation, I've moved tools-staging-master02 to vlan200 and verified that you can connect to it as root on its new IP.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations