This will mean adding interfaces for admin1a/b on the build network, pointing some staging hosts at them as done in bug 704329, then pointing all hosts to them. (ns1/2 are still doing DHCP, so we can't kill them yet.)
We traded: Matt's going to set up the rabbitmq hosts.
Assignee: mlarrain → dustin
vlan 48 interfaces are already set up. The VIP is 10.12.75.27. Up next, I'll need a list of hosts in scl1 that I can point to this nameserver. As an aside, this will also fix the DNS lookup problems we've had during scl1 outages -- these nameservers are authoritative for all mozilla addresses. Of course, we'll still have connection errors during the outages. This is exactly the same config as used for the mtv1 nameservers (bug 704329), so I'm pretty sure it will work just fine. So let's give it a few days (say, until Friday morning) with the subset of hosts, then turn it on fully.
scl1 hosts: linux-ix-slave05 linux64-ix-slave01 w32-ix-slave01 w64-ix-slave41 talos-r3-fed-064 talos-r3-fed64-001
w64-ix-slave41 actually uses the DCs as its resolver, so I didn't change that. No need to worry about those hosts. linux-ix-slave05 is in mtv1 - is there a staging linux host in scl1? Also, I reverted the change for the moment, as the VIP isn't working the way I wanted it to.
d'oh linux-ix-slave03 is pp and in scl1
OK, I have a puppet module to allow a host to have multiple VIPs. I'll commit it at a time I'm not about to walk out the door. That should allow us to serve DNS on a VIP in the build network.
The VIPs are running now:
  10.12.75.11
  10.12.48.19
so I made the corresponding DHCP changes for the hosts listed above. Let's see what happens.
Duplicate of this bug: 714539
I'm increasing the priority here as this is likely (but not certain) to be a fix for ongoing hiccups in scl1. I'll evaluate the experimental results with buildduty on Tuesday, and if no failures are seen, turn this on dc-wide at that time.
Severity: normal → major
Assignee: dustin → mlarrain
Bear, do you see any DNS-related problems on any of these hosts?
cool -- Matt, let's point all of the hosts configured with DHCP toward the vlan48 VIP.
Added a commented-out option for this. I'll turn it on later in the week, per bear's OK, as the build network is in the middle of a beta release.
Status: NEW → ASSIGNED
Duplicate of this bug: 712207
We almost ran with this today, until I noticed that the VIP was misconfigured, and that admin1b didn't have its VLAN trunking set up correctly. Both have now been remedied. Now, with admin1b as VIP master and named stopped on admin1a:

[cltbld@linux-ix-slave30 ~]$ nslookup talos-r4-snow-001.build.mozilla.org 10.12.48.19
Server:         10.12.48.19
Address:        10.12.48.19#53

talos-r4-snow-001.build.mozilla.org     canonical name = talos-r4-snow-001.build.scl1.mozilla.com.
Name:   talos-r4-snow-001.build.scl1.mozilla.com
Address: 10.12.51.29

And the same with admin1a/b reversed. So I'm happy with the failover characteristics here now. Matt, Bear, go ahead and deploy this.
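For anyone who wants to repeat the failover check without running nslookup by hand, here is a minimal sketch in Python (standard library only). It is not what was used in this bug. The function names (build_query, parse_header, vip_answers) and the fixed query ID are made up for illustration, and it only checks that the server returns a successful response with at least one answer. It does not verify the returned address.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID, chosen for illustration only


def build_query(hostname: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS A-record query in RFC 1035 wire format."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(response: bytes) -> dict:
    """Decode the fields of the 12-byte DNS header that we care about."""
    ident, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),
        "rcode": flags & 0x000F,
        "answers": an,
    }


def vip_answers(server: str, hostname: str, timeout: float = 2.0) -> bool:
    """Return True if `server` gives a successful answer for `hostname`."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            response, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable networks
            return False
    if len(response) < 12:
        return False
    info = parse_header(response)
    return (
        info["id"] == QUERY_ID
        and info["is_response"]
        and info["rcode"] == 0
        and info["answers"] > 0
    )
```

Calling vip_answers("10.12.48.19", "talos-r4-snow-001.build.mozilla.org") from a host on the build network, once with named stopped on admin1a and once with it stopped on admin1b, would approximate the manual test above.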
Nameservers have been changed over.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations