Closed Bug 1001705 (opened 10 years ago, closed 10 years ago)

Move p3 panda rack

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Unassigned)

References

Details

Attachments

(1 file)

Move one panda rack from scl1 -> scl3 to verify setups work correctly.

Rack 201-3-scl1 has been identified as the candidate.
Documenting the steps taken to remove it from service:

* got list of systems from csv export of rack in inventory:
   https://inventory.mozilla.org/en-US/systems/racks/?site=19&status=&rack=250&allocation=2
* massaged the list to make hostname column 1 and remove double quotes (a sketch of this step follows the list below)
* used braindump/it-related/get_enabled_hosts to record current status in slavealloc
* used braindump/it-related/disable_all_hosts to disable in slavealloc
* grep'd to extract the foopy hostnames into a new text file
* used braindump/it-related/manage_hosts to generate fabric file
* added the following to the fab file to create and list the shutdown files:
    def shutdown_pandas():
        with warn_only():
            run('''for f in $(ls -1d /builds/panda-*); do
                    touch $f/shutdown.stamp
                done''')

    def pandas_going_down():
        with warn_only():
            run('''ls -1d /builds/panda-*/shutdown.stamp''')

* will use the autogenerated 'shut_down' fab command to shut down the foopies on Sunday.
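For reference, a minimal sketch of the massaging step above, assuming Python; the "hostname" column name and the file names are guesses, since the exact inventory export format isn't recorded here:

    import csv

    # Hypothetical file names; the real export comes from the inventory
    # rack page linked above.
    with open("rack-201-3-scl1.csv", newline="") as src, \
            open("rack-201-3-hosts.txt", "w") as dst:
        for row in csv.DictReader(src):
            # DictReader handles the quoting, so the double quotes from
            # the export are already stripped; emit the hostname first.
            dst.write(row["hostname"] + "\n")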
Status: NEW → ASSIGNED
List of pandas in rack 201-3 which were previously enabled in slavealloc.

Can be used, if desired, when bringing the rack back online.
All foopies shut down via script from comment 1
$ ./manage_hosts rack-foopy-201-3.txt ping
foopy59.p3.releng.scl1.mozilla.com not pingable
foopy56.p3.releng.scl1.mozilla.com not pingable
foopy57.p3.releng.scl1.mozilla.com not pingable
foopy54.p3.releng.scl1.mozilla.com not pingable
foopy55.p3.releng.scl1.mozilla.com not pingable
foopy53.p3.releng.scl1.mozilla.com not pingable

Done.
Assignee: hwine → nobody
Status: ASSIGNED → NEW
This correlates to the p3 rack and all of the p3 tabs as defined in https://docs.google.com/a/mozilla.com/spreadsheets/d/1Z_1fGFD3lg53YiRlIFti_GV6RGOgKRzlPoepNBVYloU
Summary: Move 1 panda rack → Move p3 panda rack
Vinh: we're still waiting on the rest of the information in the p3 tabs of https://docs.google.com/a/mozilla.com/spreadsheets/d/1Z_1fGFD3lg53YiRlIFti_GV6RGOgKRzlPoepNBVYloU to be complete before we can modify inventory.  Please coordinate with uberj for those changes, as they need to be made after we shut things down, but before we power up any of the gear at scl3.
Blocks: 981169
No longer blocks: 981124
Flags: needinfo?(vhua)
Flags: needinfo?(juber)
Just want to double check that rack 201-3 pandas are ready to be uncabled/unracked?  

arr: I'll have the inventory updated in the p3 tabs today.
Flags: needinfo?(vhua)
All the hosts in rack 201-3 have been unracked.  WPR has scheduled professional movers to transport the pandas to SCL3 tomorrow (Tues 4/29) @ 10am.
The info for the p3 tabs has been filled out.
arr:  We're not seeing the vlans 216 and 303 in core1.releng.scl3.mozilla.net.
:vinh: I've tagged dave from netops with a needinfo about the core switch.
Flags: needinfo?(dcurado)
Hi -- I am about to add the vlans for the pandas to the core switch.
Should have that done shortly.
However, those are vlans 301-310.
What is vlan 216 expected to be used for?

Thanks.
Flags: needinfo?(juber)
Flags: needinfo?(dcurado)
I just added vlans 301-310 to core1.releng.scl3.
vlan 216 already exists there -- so you should be able to see it.
Please let me know if that is not the case.
Thanks.
I see the added vlans.  Thanks dcurado!
interfaces for vlans 301-310 have been added to fw1.releng.scl3.
bootp helper (for dhcp) has been enabled on these interfaces.
security policies have been updated on fw1.releng.scl3 and fw1.scl3, giving the new panda vlans
the same access to ... stuff... that the panda vlans in scl1 have.

One exception: the policy giving the panda vlans access to various buildbot systems 
appears to reference a lot of buildbot systems in scl1, which have apparently all
been shut down.  If the panda vlans require access to buildbot systems, please 
add a comment here with the specifications for it.  Thanks.
Received the following nagios alerts - is this related?

Wed 09:59:42 PDT [4707] panda-relay-025.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Wed 09:59:42 PDT [4708] panda-relay-023.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Wed 09:59:42 PDT [4709] panda-relay-024.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Wed 09:59:42 PDT [4710] panda-relay-028.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Wed 09:59:42 PDT [4711] panda-relay-027.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Wed 09:59:42 PDT [4712] panda-relay-029.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Wed 09:59:42 PDT [4713] panda-relay-026.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Wed 09:59:42 PDT [4714] panda-relay-030.p3.releng.scl3.mozilla.com:Mozpool relay health is UNKNOWN: UNKNOWN: Traceback (most recent call last): (http://m.mozilla.org/Mozpool+relay+health)
Flags: needinfo?(jwatkins)
Flags: needinfo?(dustin)
I'd say it is a safe bet that it is related.
We tried to boot/install one server last night, but I had to update firewall policies in a
different part of the SCL3 data center to get it all to work.
I did not want to rush through that (the work has been done as of this morning) but 
arr is in transit until friday AM.  

So, I recommend ack'ing the nagios alert for 3 days if you can?
Sorry 'bout that.
Dave
No worries. :)

Ack'd:
nagios-releng: ack 4707 bug 1001705
nagios-releng: ack 4708 bug 1001705
nagios-releng: ack 4709 bug 1001705
nagios-releng: ack 4710 bug 1001705
nagios-releng: ack 4711 bug 1001705
nagios-releng: ack 4712 bug 1001705
nagios-releng: ack 4713 bug 1001705
nagios-releng: ack 4714 bug 1001705
Looks like reverse zones for the new IP ranges in scl3 didn't get created. Leaving this comment here to remind myself to do that.
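A stdlib-only sketch of how the gap could be spotted; the address range below is purely illustrative, not the actual new scl3 range:

    import ipaddress
    import socket

    # Walk an address range and flag anything without a PTR record.
    for ip in ipaddress.ip_network("10.26.100.0/24").hosts():
        try:
            socket.gethostbyaddr(str(ip))
        except socket.herror:
            print(f"{ip}: no PTR record")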
Also:

Wed 11:50:29 PDT [4792] mobile-imaging-003.p3.releng.scl3.mozilla.com:http is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 568 bytes in 0.005 second response time (http://m.mozilla.org/http)

ack'd:
nagios-releng: ack 4792 bug 1001705
panda health checks in nagios are failing since mozpool is not running on the panda-associated imaging server (mobile-imaging-003.p3.releng.scl3.m.c in this case).

Mozpool is not running because the imaging server has yet to be reimaged (kickstarted) to pick up its new fqdn.
Flags: needinfo?(jwatkins)
also, system.imaging_server.0 != system_imaging_server.0

This k/v needs to be fixed on all the pandas in p3 before inv sync to mozpool will work.
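For illustration only, the rename on one host's key/value pairs might look like the sketch below; the rename direction and the dict-shaped input are assumptions, since the inventory API isn't shown here:

    # Hypothetical: fix the misnamed key on a single host's k/v dict.
    # Direction of the rename is assumed (underscore form -> dotted form).
    def fix_imaging_server_key(kv: dict) -> dict:
        if "system_imaging_server.0" in kv:
            kv["system.imaging_server.0"] = kv.pop("system_imaging_server.0")
        return kv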
I fixed the key value store.
uberj: the CNAME for mobile-imaging-003-mgmt.build.mozilla.org was still pointing at the scl1 address.  I've fixed this manually, but the script didn't dtrt.

I've successfully reimaged foopy53-foopy59.  I am unable to reimage mobile-imaging-003 because I can't get an ikvm connection to work.  I suspect it's related to the DNS issues mentioned above, and I'll try again after mobile-imaging-003-mgmt.build.mozilla.org resolves and I can use the short name.
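A sketch of the DNS check being waited on; the expected 10.26. prefix is an assumed stand-in for the scl3 range:

    import socket

    # Resolve the management CNAME and confirm it no longer points at scl1.
    canonical, _aliases, addrs = socket.gethostbyname_ex(
        "mobile-imaging-003-mgmt.build.mozilla.org")
    print(canonical, addrs)
    if not all(a.startswith("10.26.") for a in addrs):
        print("still resolving to the old scl1 address?")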
Flags: needinfo?(dustin) → needinfo?(juber)
The issue with mobile-imaging-003's ikvm was that the bare hostname couldn't be used.  Once DNS caught up, I was able to reimage it.
I did the mobile-imaging host by hand because it had that extra CNAME. I obviously wasn't thorough with my changes :P
Flags: needinfo?(juber)
mobile-imaging-003 looks good after the kickstart.  Mozpool is running and the db has been synced up with the updated inventory data.
So should these pandas be available for production now?  I have set up foopy57 with the directories again for panda307-319, but I get timeouts trying to connect to my dev master, dev-master1.srv.releng.scl3.mozilla.com, on port 9036.  These are the pandas I use for staging purposes.
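For what it's worth, the timeout can be reproduced outside buildbot with a plain TCP connect; a minimal sketch:

    import socket

    # A firewall drop shows up as a timeout rather than a refused connection.
    try:
        socket.create_connection(
            ("dev-master1.srv.releng.scl3.mozilla.com", 9036), timeout=5).close()
        print("port 9036 reachable")
    except OSError as exc:
        print(f"port 9036 unreachable: {exc}")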
Flags: needinfo?(jwatkins)
Hi, please see comment 15.
Never got any specifications.
So, if you know which buildbot servers the pandas should be able to talk to, I'll gladly set that
up for you.
Thanks -- Dave
Dave, every slave (panda and foopy in this case) should have flows to every buildbot master, over the ports as per every other slave.

See, e.g. 

Bug 972508, which had a similar issue with new masters not being added as per other masters.
Flags: needinfo?(dcurado)
Here's the thing:  I don't really know what "every buildbot master" means.
That's why I asked for a specification back in comment 15.
Seemed like a reasonable request to me.

Are they all in the 10.26.48/22 subnet?  
Does this list look complete?

address-set buildbot-master-set {
    address buildbot-master30;
    address buildbot-master31;          
    address buildbot-master32;
    address buildbot-master33;
    address buildbot-master36;
    address buildbot-master37;
    address buildbot-master38;
    address preproduction-master;
    address buildbot-master81;
    address buildbot-master82;
    address buildbot-master83;
    address buildbot-master84;
    address buildbot-master88;
    address buildbot-master89;
    address buildbot-master85;
    address buildbot-master86;
    address buildbot-master87;
    address buildbot-master102;
    address dev-master1;
    address buildbot-master103.srv.releng.scl3;
    address buildbot-master104.srv.releng.scl3;
    address buildbot-master105.srv.releng.scl3;
    address buildbot-master106.srv.releng.scl3;
    address buildbot-master107.srv.releng.scl3;
    address buildbot-master108.srv.releng.scl3;
    address buildbot-master109.srv.releng.scl3;
    address buildbot-master110.srv.releng.scl3;
    address buildbot-master111.srv.releng.scl3;
    address buildbot-master112.srv.releng.scl3;
}

If I'm left to just guess at this stuff, then what are the chances you're opening another
bug tomorrow because you can't reach some buildbot master that you expected to be able
to reach?  

Thanks in advance for your help in helping you.
Flags: needinfo?(dcurado)
(In reply to Justin Wood (:Callek) from comment #30)
> Dave, every slave (panda and foopy in this case) should have flows to every
> buildbot master, over the ports as per every other slave.
> 
> See, e.g. 
> 
> Bug 972508, which had a similar issue with new masters not being added as
> per other masters.

Callek: Can you provide an explicit list of buildbot masters?  I think that is what dave is looking for here, since the current list in the policy has outdated systems listed.

Also, bug 972508 seems to be about buildbot master not having a flow to the db vips.  Not sure if that is relevant here.
I think what is confusing matters here is we (releng) believe there is a master network plan for all of our gear which specifies both IP addresses and flows. This was established as we moved into scl3 and has been followed in each AWS region.

This work is documented at https://mana.mozilla.org/wiki/display/NOC/VLAN+assignments#VLANassignments-Releng -- which, as far as we know, remains authoritative as a planning guide and policy. All inventory assignments MUST fit into the correct buckets.

I know :tinfoil wanted some further decomposition of some of those netranges to simplify the firewall rules (when we switched to "deny all" on the releng firewall). Adding him here to clarify if there is a separate place where such documentation is kept.

Our (releng) interest is in being able to specify "set up another machine-of-type-x" and having all the IT documentation tied to that. We thought we had that -- what do we need to do to get back to that model?
Flags: needinfo?(mhenry)
Flags: needinfo?(dcurado)
> Callek: Can you provide an explicit list of buildbot masters?  I think that
> is what dave is looking for here since current list in the policy has
> outdated systems listed.
> 


I'm not Callek or hwine, but there is an explicit list of buildbot masters in puppet:
http://hg.mozilla.org/build/puppet/file/edd3c290bfc9/manifests/moco-nodes.pp#l238

If it's in puppet it should be canonical, right?

Hope that helps.
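If it helps, a rough sketch of scraping that list out of the manifest; the node-definition pattern is an assumption about how moco-nodes.pp is written, not a real puppet parser:

    import re

    # Pull buildbot-master hostnames out of the puppet manifest by regex.
    with open("moco-nodes.pp") as fh:
        masters = sorted(set(
            re.findall(r'node\s+"(buildbot-master\d[^"]*)"', fh.read())))
    for name in masters:
        print(name)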
I think the only change I suggested to the VLAN structure was the creation of a separate VLAN for build servers.  This is because the current SRV subnet hosts machines of multiple types, including build servers.  Add to that, build servers are created rather frequently, which requires frequent firewall policy changes.

Put build servers into their own VLAN and give all the build servers (the subnet) a common policy.

This would save both RelEng and NetOps work while maintaining respectable network segmentation.

Aside from that I don't have any suggestions.
Flags: needinfo?(mhenry)
Laura wins -- puppet is the perfect place to be looking for this list.  Can you build the flows from that list?  I believe it's similar to the list I gave you in one of the myriad other bugs requiring a list of masters, but also includes dev-master1 which I unintentionally omitted.

Dave, you and I need to sit down and talk about this when I'm back online, because it's been a substantial source of friction recently.  I'm familiar with what releng needs, but not how best to configure that.  I think things are the other way 'round for you, so hopefully we can meet in the middle.
Dustin, 
I don't know what you're talking about, and I feel like I'm being forced to 
defend myself here.

I've worked here for three months, have resolved 145 bugs, and only 4 have been from you.
(974888, 974970, 976534, 1000440)
Out of those, 2 were to clean up decommissioned hardware, 2 were to open new flows.
None of these bugs contained a list of hosts. 
Out of all the bugs I've worked on for opening flows, the requester has always 
supplied a hostname(s) and/or IP addresses, along with port numbers.
But in this case, people have been reluctant to provide that information.
I've been left scratching my head.  (again, see comment 15)

As for substantial friction, I don't know what you're talking about.
And, you're singling me out as the source.  I'm baffled and not a little hurt.

I've been working with Hal Wine and Amy Rich a fair amount, and I think they've been
happy with the work I've been doing.  Hal has sent email out to all of releng explaining
the clean up work I'm doing, and how we've shared a spreadsheet with a schedule of events.
Heck, I got push back about doing this because to some it looked like I was changing
the CAB process.  I argued that it does not cost us anything to go above and beyond
to increase releng's comfort with the network changes we're doing.
At this point, I'm not feeling like I got much of a return on that investment.

Here's a quote from me on one of the bugs you opened:
"Resolved.  Thanks to Dustin for the patience, help, and education!"
Flags: needinfo?(dcurado)
I think there was some substantial miscommunication with comments 36 and 37.  Dave, I want to be clear that no one was saying you were doing anything wrong or you were the cause of friction here. On the contrary, we appreciate how fast you've come up to speed and all the help you've been giving us.  Let's take this to vidyo today where we have more bandwidth to communicate clearly.
No need to defend -- it's just an issue to work through, and not one you've caused or contributed to.  Sorry about any implication otherwise.  And from everything I can tell your hard work has been *very* valuable on the panda move.

I probably should have put my second paragraph in an email or meeting invite, instead.  I'll do that now.
(In reply to Hal Wine [:hwine] (use needinfo) from comment #33)
> I think what is confusing matters here is we (releng) believe there is a
> master network plan for all of our gear which specifies both IP addresses
> and flows. This was established as we moved into scl3 and has been followed
> in each AWS region.

So, in offline chats, it turns out I was confused. There was a plan, but it wasn't really documented; it was lore held by people who have since left the company.

My initial concern was that we had regressed a process -- we have not. So, 2 paths forward:

 1. start a separate off-bug discussion to get the process I think we _all_ want.
    Looks like Amy is starting this in comment 38 - please include me as optional (but do
    not block)

 2. move this panda bug forward -- as :laura & :dustin indicated, puppet is the best
    source of truth on this.

Dave - does comment 34 unblock you? If not, please ping me on irc and we'll get what you need today.
Flags: needinfo?(dcurado)
Flags: needinfo?(jwatkins)
:kmoir: we think we've got the buildbot master policies worked out now.  Can you try testing again please?
Here are the policies that have been added.  We found that some of the buildbot machines listed
in the puppet configuration file were missing, and they have also been added.
If anyone glancing at the list of hosts notices anything amiss, please let me
know.  Thanks.

Policy: build-master--buildbot, action-type: permit, State: enabled, Index: 2430, Scope Policy: 0
  Policy Type: Configured
  Sequence number: 1
  From zone: pod, To zone: srv
  Source addresses:
    any-ipv4: 0.0.0.0/0 
    any-ipv6: ::/0
  Destination addresses:
    buildbot-master102.srv: 10.26.48.24/32 
    buildbot-master101.srv: 10.26.48.20/32 
    buildbot-master100.srv: 10.26.48.19/32 
    buildbot-master99.srv: 10.26.48.18/32 
    buildbot-master112.srv.releng.scl3: 10.26.48.40/32 
    buildbot-master111.srv.releng.scl3: 10.26.48.39/32 
    buildbot-master110.srv.releng.scl3: 10.26.48.38/32 
    buildbot-master109.srv.releng.scl3: 10.26.48.37/32 
    buildbot-master108.srv.releng.scl3: 10.26.48.36/32 
    buildbot-master107.srv.releng.scl3: 10.26.48.32/32 
    buildbot-master106.srv.releng.scl3: 10.26.48.30/32 
    buildbot-master105.srv.releng.scl3: 10.26.48.28/32 
    buildbot-master104.srv.releng.scl3: 10.26.48.27/32 
    buildbot-master103.srv.releng.scl3: 10.26.48.26/32 
    dev-master1: 10.26.48.25/32 
    buildbot-master87: 10.26.48.57/32 
    buildbot-master86: 10.26.48.56/32 
    buildbot-master85: 10.26.48.55/32 
    buildbot-master89: 10.26.48.59/32 
    buildbot-master88: 10.26.48.58/32 
    buildbot-master84: 10.26.48.54/32 
    buildbot-master83: 10.26.48.53/32 
    buildbot-master82: 10.26.48.52/32 
    buildbot-master81: 10.26.48.51/32 
    preproduction-master: 10.26.48.21/32 
    buildbot-master30: 10.26.48.17/32
  Application: buildbot
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8001-8006]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8101-8106]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8201-8206]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8000-8999]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7001-7006]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7101-7106]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7201-7206]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7301-7306]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7000-7999]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9101-9106]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9201-9206]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9301-9306]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0]          
      Destination port range: [9001-9020]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9000-9999]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9110-9110]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9100-9100]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9120-9120]

-------------------------------------------------------------------
Policy: aws-build-master--buildbot, action-type: permit, State: enabled, Index: 2432, Scope Policy: 0
  Policy Type: Configured
  Sequence number: 1
  From zone: pod, To zone: vpc
  Source addresses:
    any-ipv4: 0.0.0.0/0 
    any-ipv6: ::/0
  Destination addresses:
    buildbot-master06.srv.releng.usw2: 10.132.48.18/32 
    buildbot-master05.srv.releng.usw2: 10.132.48.17/32 
    buildbot-master04.srv.releng.usw2: 10.132.48.16/32 
    buildbot-master03.srv.releng.use1: 10.134.48.10/32 
    buildbot-master02.srv.releng.use1: 10.134.48.9/32 
    buildbot-master01.srv.releng.use1: 10.134.48.8/32 
    buildbot-master98.srv.releng.use1: 10.134.48.51/32 
    buildbot-master97.srv.releng.usw2: 10.132.50.51/32 
    buildbot-master96.srv.releng.use1: 10.134.48.53/32 
    buildbot-master95.srv.releng.usw2: 10.132.49.197/32 
    buildbot-master94.srv.releng.use1: 10.134.48.86/32 
    buildbot-master93.srv.releng.usw2: 10.132.50.156/32 
    buildbot-master92.srv.releng.use1: 10.134.49.214/32 
    buildbot-master91.srv.releng.usw2: 10.132.48.136/32 
    buildbot-master90.srv.releng.use1: 10.134.49.133/32 
    buildbot-master80.srv.releng.usw2: 10.132.49.150/32 
    buildbot-master79.srv.releng.usw2: 10.132.49.117/32 
    buildbot-master78.srv.releng.usw2: 10.132.49.158/32 
    buildbot-master77.srv.releng.use1: 10.134.49.77/32 
    buildbot-master76.srv.releng.use1: 10.134.48.40/32 
    buildbot-master75.srv.releng.use1: 10.134.48.158/32 
    buildbot-master74.srv.releng.usw2: 10.132.49.94/32 
    buildbot-master73.srv.releng.usw2: 10.132.49.181/32 
    buildbot-master72.srv.releng.usw2: 10.132.50.54/32 
    buildbot-master71.srv.releng.use1: 10.134.49.111/32 
    buildbot-master70.srv.releng.use1: 10.134.49.94/32 
    buildbot-master69.srv.releng.use1: 10.134.48.106/32 
    buildbot-master68.srv.releng.usw2: 10.132.50.89/32 
    buildbot-master67.srv.releng.use1: 10.134.49.70/32 
    buildbot-master66.srv.releng.usw2: 10.132.50.247/32 
    buildbot-master64.srv.releng.usw2: 10.132.50.44/32 
    buildbot-master63.srv.releng.use1: 10.134.48.196/32 
    buildbot-master62.srv.releng.use1: 10.134.48.236/32 
    buildbot-master61.srv.releng.use1: 10.134.49.62/32 
    buildbot-master60.srv.releng.usw2: 10.132.50.56/32 
    buildbot-master59.srv.releng.use1: 10.134.49.223/32 
    buildbot-master58.srv.releng.usw2: 10.132.49.125/32 
    buildbot-master57.srv.releng.use1: 10.134.48.228/32 
    buildbot-master56.srv.releng.usw2: 10.132.50.142/32 
    buildbot-master55.srv.releng.use1: 10.134.49.93/32 
    buildbot-master54.srv: 10.132.50.240/32 
    buildbot-master53.srv: 10.132.49.154/32 
    buildbot-master52.srv: 10.134.48.234/32 
    buildbot-master51.srv: 10.134.48.31/32
  Application: buildbot
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8001-8006]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8101-8106]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8201-8206]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [8000-8999]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7001-7006]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7101-7106]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7201-7206]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7301-7306]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [7000-7999]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9101-9106]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9201-9206]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9301-9306]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9001-9020]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9000-9999]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9110-9110]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9100-9100]
    IP protocol: tcp, ALG: 0, Inactivity timeout: 86400
      Source port range: [0-0] 
      Destination port range: [9120-9120]
Flags: needinfo?(dcurado)
Depends on: 1007342
Depends on: 1008219
:kmoir: do we have confirmation that the pandas in the moved rack are working correctly? On 2014-05-15, we'll make the call on whether or not to start moving more racks, based on whether or not this one was successful and we have all of the dependencies nailed down and fixed.
Flags: needinfo?(kmoir)
Blocks: 981124
Yes they look good.  As an aside, I documented the changes that are needed on the releng side of things when moving pandas
https://wiki.mozilla.org/ReleaseEngineering/How_To/Android_Tegras#Releng_changes_required_when_moving_panda_racks_from_one_data_centre_to_another
Flags: needinfo?(kmoir)
foopy53 through foopy59 all have this problem:

[cltbld@foopy58.p3.releng.scl3.mozilla.com ~]$ nc -vz -w 2 blobupload.elasticbeanstalk.com 443
nc: connect to blobupload.elasticbeanstalk.com port 443 (tcp) timed out: Operation now in progress

which results in this on connected panda jobs:
17:00:58     INFO -  (blobuploader) - INFO - Uploading /builds/panda-0291/test/build/blobber_upload_dir/7e4b1970-c2d9-b528-6db27876-414f5889.dmp ...
17:00:58     INFO -  (blobuploader) - INFO - Using https://blobupload.elasticbeanstalk.com
17:00:58     INFO -  (blobuploader) - INFO - Uploading, attempt #1.
17:01:19     INFO -  (blobuploader) - CRITICAL - Unexpected error in client: Traceback (most recent call last):
17:01:19     INFO -  ConnectionError: HTTPSConnectionPool(host='blobupload.elasticbeanstalk.com', port=443): Max retries exceeded with url: /blobs/sha512/193b31764e2bf54edafc18b35c2ca13bdbad439ef238c723212771b7fd8dae9b3242eee8fec859730b846bf08b34c6938cc4b120906809232871d668078065cd (Caused by <class 'socket.error'>: [Errno 110] Connection timed out)

We probably need a new firewall rule for the moved foopies. Do we have any existing rules with destination 10.134.84.138/443?
Callek filed bug 1013029.
Depends on: 1013029
Component: Other → Platform Support
QA Contact: coop
done long ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
