Closed Bug 528281 Opened 15 years ago Closed 15 years ago

Image 9 reclaimed mac minis for production build/unittest pool

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: dmoore)

References

Details

(Whiteboard: needs MPT DNS)

While these are 1.83 GHz, they all have 2 GB RAM, so they can't be used for Talos. Imaging as builders:

moz2-darwin9-slave29
moz2-darwin9-slave30
moz2-darwin9-slave31
moz2-darwin9-slave32
moz2-darwin9-slave33
moz2-darwin9-slave34
moz2-darwin9-slave35
moz2-darwin9-slave36
moz2-darwin9-slave37
please add following to dns, inventory, nagios, etc.

moz2-darwin9-slave29 mac#00:16:cb:af:2c:36 serial#g88190olyl3 asset#02342
moz2-darwin9-slave30 mac#00:16:cb:af:2c:37 serial#g88190rfyl3 asset#02344
moz2-darwin9-slave31 mac#00:16:cb:ae:ae:c7 serial#g88190r3yl3 asset#02345
moz2-darwin9-slave32 mac#00:16:cb:af:5d:25 serial#g88190r6yl3 asset#02343
moz2-darwin9-slave33 mac#00:16:cb:b0:37:aa serial#g882726pyl3 asset#00796
moz2-darwin9-slave34 mac#00:16:cb:ab:d0:ea serial#g87412bxyl3 asset#00648
moz2-darwin9-slave35 mac#00:16:cb:b0:83:cb serial#g8827282yl3 asset#00797
moz2-darwin9-slave36 mac#00:16:cb:b0:63:43 serial#g882728syl3 asset#00795
moz2-darwin9-slave37 mac#00:16:cb:b0:82:ab serial#g882728gyl3 asset#00794
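For anyone loading the list above into DNS, inventory, or Nagios by script, here is a minimal sketch of a parser for these lines. The function name `parse_inventory`, the field names, and the `SAMPLE` constant are hypothetical; only the line format and the two sample lines come from this bug.

```python
import re

# One inventory line looks like:
#   <name> mac#<mac> serial#<serial> asset#<asset>
LINE_RE = re.compile(
    r"^(?P<name>\S+) mac#(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) "
    r"serial#(?P<serial>\S+) asset#(?P<asset>\d+)$"
)


def parse_inventory(text):
    """Parse inventory lines into a list of dicts (hypothetical helper)."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized inventory line: {line!r}")
        records.append(match.groupdict())
    return records


# Two lines copied from the list above, for illustration only.
SAMPLE = """\
moz2-darwin9-slave29 mac#00:16:cb:af:2c:36 serial#g88190olyl3 asset#02342
moz2-darwin9-slave30 mac#00:16:cb:af:2c:37 serial#g88190rfyl3 asset#02344
"""
```

Asset numbers are kept as strings so that leading zeros survive.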
Assignee: server-ops → phong
ping on the dns, inventory, nagios, etc?
Severity: normal → critical
Everything is done except Nagios, which is being tracked in a separate bug, bug 527814.
Status: NEW → RESOLVED
Closed: 15 years ago
Depends on: 527814
Resolution: --- → FIXED
may need to reboot these for them to pick up the correct DHCP address.
DNS still isn't set up, according to zandr
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
--- moz2-darwin9-slave29.mv.mozilla.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 25.978/28.824/33.502/3.333 ms
host-4-198:dhcpconfig phong$ ping moz2-darwin9-slave32
PING moz2-darwin9-slave32.mv.mozilla.com (10.250.48.209): 56 data bytes
64 bytes from 10.250.48.209: icmp_seq=0 ttl=63 time=35.224 ms
64 bytes from 10.250.48.209: icmp_seq=1 ttl=63 time=27.751 ms

Are you on the VPN to MV instead of MPT?
Oh, I see what's going on. I think we have some crossed wires, though. These machines are going to be production build slaves, and need to be inside of the build network.

Is that possible?
dmoore: i think this might be a firewall issue.  these need access to the production master on vlan71
Assignee: phong → dmoore
Firewall issues have been addressed, but we're still working through DNS.
dmoore/phong: what's blocking here? Anything we can do to unjam?
Depends on: 525357
Associated IP addresses:

moz2-darwin9-slave29 10.250.48.206
moz2-darwin9-slave30 10.250.48.207
moz2-darwin9-slave31 10.250.48.208
moz2-darwin9-slave32 10.250.48.209
moz2-darwin9-slave33 10.250.48.210
moz2-darwin9-slave34 10.250.48.211
moz2-darwin9-slave35 10.250.48.212
moz2-darwin9-slave36 10.250.48.213
moz2-darwin9-slave37 10.250.48.214
Just met with dmoore/mrz, here's a quick summary:

1) Fixed firewall. It was set to just allow from Castro->MPT, but would not allow MPT->Castro. Firewall rules now changed to bidirectional. We believe that access to all necessary systems like stage, p-m, p-m02, hg, etc is now in place.

2) There is a bigger project in progress in bug#525357, to unify the DNS namespaces. I'll leave the dep.bug in place for now, but if the following workaround works, then we can remove the depbug.

3) In the interim, as a possible workaround, can you try adding these new slave machine names / IP addresses to the local hosts file on production-master and production-master02, and see if the master can find the slaves and allocate work as usual? For example:
10.250.48.206 moz2-darwin9-slave29.mv.mozilla.com moz2-darwin9-slave29.build.mozilla.org
...

and see if this works? Note the way each IP address is given 2 aliases, so they will continue to work as bug#525357 is worked on.


(If it does *not* work, can you comment here with approx times, as this would help dmoore with forensics)
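A minimal sketch of generating the full set of hosts-file lines for the workaround in 3), using the slave names and IP addresses listed under "Associated IP addresses" in this bug. The names `SLAVES`, `DOMAINS`, and `hosts_lines` are hypothetical, and the output is only printed; nothing here writes to /etc/hosts.

```python
# Slave name -> IP address, copied from the "Associated IP addresses" list.
SLAVES = {
    "moz2-darwin9-slave29": "10.250.48.206",
    "moz2-darwin9-slave30": "10.250.48.207",
    "moz2-darwin9-slave31": "10.250.48.208",
    "moz2-darwin9-slave32": "10.250.48.209",
    "moz2-darwin9-slave33": "10.250.48.210",
    "moz2-darwin9-slave34": "10.250.48.211",
    "moz2-darwin9-slave35": "10.250.48.212",
    "moz2-darwin9-slave36": "10.250.48.213",
    "moz2-darwin9-slave37": "10.250.48.214",
}

# The two domain suffixes used in the example line above.
DOMAINS = ("mv.mozilla.com", "build.mozilla.org")


def hosts_lines(slaves, domains=DOMAINS):
    """Return one hosts-file line per slave, with one alias per domain."""
    lines = []
    for name, ip in sorted(slaves.items()):
        aliases = " ".join(f"{name}.{domain}" for domain in domains)
        lines.append(f"{ip} {aliases}")
    return lines


if __name__ == "__main__":
    print("\n".join(hosts_lines(SLAVES)))
```

The printed lines can be reviewed and then appended by hand to the hosts file on each master. Keeping both aliases on one line matches the example above, so either name keeps resolving while bug#525357 is in progress.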
These slaves still don't have access to at least the following:
* production-puppet (tcp/8140, ssh)
* staging-master (tcp/9010)
* staging-stage (ssh)
* aus2-staging (ssh)

I've gone through our configs and setup docs and I *think* that's it, but it's hard to know for sure until they're actually running. There are definitely a lot of hosts that we'll have to add to /etc/hosts to work around the DNS problem.

...which brings me back to my request to have these put in the build network. I'm not convinced there's any other way to have these slaves in line with the rest of them.
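To make the access list above easy to re-test from a slave after each firewall change, here is a rough sketch of a TCP connect check. It assumes the short host names resolve from the slave (or have been added to /etc/hosts) and that "ssh" means the default port 22. The names `TARGETS`, `can_connect`, and `report` are hypothetical.

```python
import socket

# (host, port) pairs from the list above. Host names are used exactly as
# written in this bug; "ssh" is assumed to mean the default port 22.
TARGETS = [
    ("production-puppet", 8140),
    ("production-puppet", 22),
    ("staging-master", 9010),
    ("staging-stage", 22),
    ("aus2-staging", 22),
]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def report(targets, timeout=5.0):
    """Return one 'host:port ok' or 'host:port FAILED' line per target."""
    return [
        f"{host}:{port} {'ok' if can_connect(host, port, timeout) else 'FAILED'}"
        for host, port in targets
    ]
```

Running `print("\n".join(report(TARGETS)))` on a slave gives one line per service. A FAILED line does not distinguish a firewall block from a DNS failure, so it is worth repeating the check by IP address when a name fails.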
> ...which brings me back to my request to have these put in the build network.

Build network is what, "MPT Build" or "Castro Build" ?  They are in the latter right now.

The DNS issues should be resolved by the end of this week.
(In reply to comment #15)
> > ...which brings me back to my request to have these put in the build network.
> 
> Build network is what, "MPT Build" or "Castro Build" ?  They are in the latter
> right now.

Castro Build, I guess, since that's where they are. When I say "build network" I expect that it implies at least the following: has identical firewalling to the build machines in mpt; slaves are foo.build.mozilla.org; dns setup is identical.

My worry with the current setup is that every time we need to tweak the firewall, open a port, add a new service - or whatever - we're going to forget about some of the slaves.
 
> The DNS issues should be resolved by the end of this week.

great!
 
> Castro Build, I guess, since that's where they are. When I say "build network"
> I expect that it implies at least the following: has identical firewalling to
> the build machines in mpt; slaves are foo.build.mozilla.org; dns setup is
> identical.

Pretty sure we're just hitting a couple implementation problems with doing a remote build network over an IPSEC VPN.  Once all those issues are resolved I wouldn't expect additional hosts to have repeat problems.
(In reply to comment #17)
> > Castro Build, I guess, since that's where they are. When I say "build network"
> > I expect that it implies at least the following: has identical firewalling to
> > the build machines in mpt; slaves are foo.build.mozilla.org; dns setup is
> > identical.
> 
> Pretty sure we're just hitting a couple implementation problems with doing a
> remote build network over an IPSEC VPN.  Once all those issues are resolved I
> wouldn't expect additional hosts to have repeat problems.

Ah, OK. That's pretty reassuring. Thanks Matthew.
mrz/dmoore: any ETA on this? We just tried again today, and are still unable to reach these machines by name.
(In reply to comment #19)
> mrz/dmoore: any ETA on this? We just tried again today, and are still unable to
> reach these machines by name.

We're also unable to reach the ones mentioned in comment #14 by IP address. Specifically:
* production-puppet (tcp/8140, ssh)
* staging-master (tcp/9010)
* staging-stage (ssh)
* aus2-staging (ssh)
Whiteboard: needs MPT DNS
(In reply to comment #20)
> (In reply to comment #19)
> > mrz/dmoore: any ETA on this? We just tried again today, and are still unable to
> > reach these machines by name.
> 
> We're also unable to reach the ones mentioned in comment #14 by IP address.
> Specifically:
> * production-puppet (tcp/8140, ssh)
> * staging-master (tcp/9010)
> * staging-stage (ssh)
> * aus2-staging (ssh)

These are all working now.

I was working on getting them hooked up to the Puppet server and hit a new one, though: can't mount the NFS share on 10.2.71.136. We serve some files over this share with Puppet and need to mount it on the slaves. It would be *really* preferable to mount it with this IP address too - once installed, Puppet manages fstab, which mounts the share with that IP address.
Ben,

This is a simple fix, we just need to update the export permissions on the NFS server. Which directories are you attempting to mount? From the existing permissions, it appears you're likely to need access to:

/export/build_tree
/export/buildlogs

Anything else?
(In reply to comment #22)
> Ben,
> 
> This is a simple fix, we just need to update the export permissions on the NFS
> server. Which directories are you attempting to mount? From the existing
> permissions, it appears you're likely to need access to:
> 
> /export/build_tree
> /export/buildlogs
> 
> Anything else?

/export/buildlogs/puppet-files is the only one we need on the slaves
NFS export is now available to the Castro build network.
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #24)
> NFS export is now available to the Castro build network.

WFM. I'll be running through the rest of the setup tomorrow. Hopefully this was the last bit.
Product: mozilla.org → mozilla.org Graveyard