Closed Bug 743916 Opened 12 years ago Closed 11 years ago

move api-dev out of the community VLAN

Categories

(Developer Services :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: dparsons, Assigned: fox2mike)

References

Details

Due to an oversight, the ESX blade in sjc1 that was hosting api.bugzilla.mozilla.org was removed today. I have transferred the VM to the ESX cluster in scl3 and got it working. It's in the "Infra" group.

The following things need to be done:

(1) Rename the VM something appropriate

(2) Figure out which VLAN it should be in and assign it a new public IP (in sjc1, it was in "Community Give"; I have no idea whether this maps to "Community" in scl3)

(3) Edit /etc/network/interfaces to assign the new IP to the current NIC (eth2 I believe)

(4) Update DNS with the new IP

(5) Have someone test the API functionality it provides to make sure everything is working properly
By the way, the new root password for this VM is in passwords.txt.gpg
Blocks: 743827
This is pretty critical infrastructure - it's the BzAPI machine. It's required for TBPL orange-starring and other things.

See bug 743992 and bug 743827, plus IRC, for people complaining about its loss.

Gerv
Severity: normal → critical
Seems the actual URL is http(s)://api-dev.bugzilla.mozilla.org according to bug 743827.
Working on this.
No http, only https.

Gerv
Name in inventory: api-dev (dns)
Renamed the VM to api-dev (but not the actual hostname, yet) for consistency.
Assigned a public IP from the community vlan and from a bugzilla allocation found in the DNS file.

From the list:
1. Done
2. Left as is (could be updated later if needed).
3. Done
4. Done
5. Waiting for DNS update then will check.

Main goal is/was to get server online.
DNS resolving, need port 443 open on api-dev.bugzilla.mozilla.org (63.245.223.32)
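One way to confirm that the flow is in place is to attempt a TCP connection to port 443. This is a minimal Python sketch, not something taken from this bug: `port_is_open` is a hypothetical helper name, and the only value from the thread is the host name above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it checks reachability only.
    """
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures are all OSError subclasses.
        return False
```

For example, `port_is_open("api-dev.bugzilla.mozilla.org", 443)` should return True once the port is open. A successful connect only shows that the port is reachable; it does not show that the API itself is answering correctly.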
Port open. Attempting to follow up on IRC; if there is no response, hopefully e-mail reaches them.
VM was renamed to api-dev.community.scl3.mozilla.com and added to inventory.

[:mjessome] confirmed that everything appears to be working.

Bug can be re-opened if any issues arise.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
A few problems here:
 1. As gerv said, this is critical infra for Mozilla, not a community host, so this is the wrong VLAN
 2. Space on this network is tight, and .32 squats the only unreserved /27 in there

So we'll need to renumber this.  Better to do it now than later, IMHO.

For the record, there *is* a replacement in the works for this, by :dkl, but it's not ready yet (726710 comment 7).

I would suggest a /32 in the DMZ VLAN, but that's sort of my go-to.
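To illustrate the addressing point, here is a small sketch using Python's standard `ipaddress` module. It assumes the /27 in question is the one that starts at 63.245.223.32 (the thread gives only the host address and the prefix length), and `enclosing_block` is a hypothetical helper name.

```python
import ipaddress


def enclosing_block(ip: str, prefix: int) -> ipaddress.IPv4Network:
    """Return the IPv4 network of the given prefix length that contains ip."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


block = enclosing_block("63.245.223.32", 27)
first, last = block[0], block[-1]

# The host sits on the first address of the /27, so the block cannot be
# handed out whole while the host keeps that address.
print(block, first, last, block.num_addresses)

# A /32, as suggested for the DMZ VLAN, covers exactly one address.
single = ipaddress.ip_network("63.245.223.32/32")
print(single.num_addresses)
```

The /27 spans 63.245.223.32 through 63.245.223.63, which is why a single host at .32 ties up the whole block.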
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Oh, and it should be named bugzilla-api1, or something to indicate that it's bugzilla-related, and the machine's internal hostname should be updated, too.  Part of the confusion here is that it has too many names.
The api works for me now - imagine you'll set to fixed once the cleanup work is done.
Status: REOPENED → NEW
Jeff, yes.  We'll need to renumber, but we'll try to give some warning this time :)
Severity: critical → normal
:dkl, I can't find a bug for your bzapi work, but I think the best way out of this mess is to accelerate that project, get it running in production, and then turn this VM off -- rather than try to renumber this VM again.

Do you have a bug and/or a timeline for that?
Assignee: afernandez → dustin
Component: Server Operations: Virtualization → Server Operations
QA Contact: dparsons → phong
Summary: Fix up dm-bugzilla-api02 → Fix up dm-bugzilla-api02/api-dev/bugzilla-api01
(In reply to Dustin J. Mitchell [:dustin] from comment #15)
> :dkl, I can't find a bug for your bzapi work, but I think the best way out
> of this mess is to accelerate that project, get it running in production,
> and then turn this VM off -- rather than try to renumber this VM again.
> 
> Do you have a bug and/or a timeline for that?

Hi. I have not been tracking its progress in a bug (probably should) since it has been a side project of mine for quite some time, and I have not needed to collaborate on it with anyone else until now. That said, it is a stretch goal for our department (Ateam) for Q2, but I cannot guarantee it will make it, as we have several other projects that are much higher priority at the moment. I definitely do not want anyone scheduling their workload around it just yet, as it is definitely not production ready IMO, although not far off if I can just find the time.

I will create a bug for tracking, but setting any kind of strict time frame may take some effort. When are the current bzapi VMs slated to be moved?

dkl
api-dev is de facto in production, so we're in a tough spot, and this needs a higher priority.

We'll need to move the VM again in about a month, and that will come with some downtime.

If we can put your work into production at that point instead, maybe that would be better?  How does the "production ready" status of the work you've done to date compare with the "production ready" status of api-dev?  How much will that change in the next month?  How compatible are the interfaces?
If dkl were to put his work up in the Bugzilla BZR repo, we could look and see :-)

Gerv
(In reply to Gervase Markham [:gerv] from comment #18)
> If dkl were to put his work up in the Bugzilla BZR repo, we could look and
> see :-)

Finally created an official bug (bug 747201) and patch coming soon.

dkl
Summary: Fix up dm-bugzilla-api02/api-dev/bugzilla-api01 → move api-dev out of the community VLAN
I'm going to punt this back to the server ops queue for triage.  The summary is that there's a VM in the community network which is the production instance of the current version of the Bugzilla API, and is critical to a few production services.  It's also sitting on an IP that the community systems need.  Since the replacement API isn't coming soon, I suspect we should just move the host back into the internal network, and maybe try to puppetize it and/or get some documentation of it for oncall.
Assignee: dustin → server-ops
Assignee: server-ops → afernandez
Current set-up:
* api-dev.bugzilla.mozilla.org is a CNAME to api-dev.community.scl3.mozilla.com (63.245.223.32)
* 63.245.223.32 Public open ports: 443

Proposed changes:
1. Rename api-dev.community.scl3.mozilla.com -> bugzilla-api1.dmz.scl3.mozilla.com
2. Obtain a public IP from netops (NAT), plus a flow for port 443.
3. Remove the api-dev.bugzilla.mozilla.org CNAME and add an A record for the NAT IP.
4. Clean up (remove obsolete r/dns); rename in inventory/vmware.

Caveats:
1. Host won't be accessible from the community jumphost (a flow could be added, if netops is OK with it).
2. We could puppetize, but I'm not sure whether something will break, in which case it would be best to rebuild the host from scratch and go from there (since we'd need manifests etc.).
3. Minor downtime because of the DNS update.

Request:
:dustin please update root password for host.

Please let me know if the above is OK, so that the NAT request to netops can be made and a day scheduled for the change.
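After a change like this, one could check whether the public name still resolves through an alias or directly to an address. The sketch below is an assumption-laden illustration, not part of the proposal: `describe_resolution` is a hypothetical helper, the result of `socket.gethostbyname_ex` depends on the system resolver, and the NAT address is not known here, so none is shown.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Same shape as the return value of socket.gethostbyname_ex:
# (canonical name, alias list, IPv4 address list).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def describe_resolution(name: str,
                        resolver: Resolver = socket.gethostbyname_ex) -> Dict[str, object]:
    """Summarise how a name resolves: canonical name, alias use, addresses."""
    canonical, _aliases, addresses = resolver(name)
    return {
        "canonical": canonical,
        # If the canonical name differs from what was asked for, the lookup
        # most likely went through a CNAME.
        "via_alias": canonical.rstrip(".").lower() != name.rstrip(".").lower(),
        "addresses": sorted(addresses),
    }
```

With the current set-up, `describe_resolution("api-dev.bugzilla.mozilla.org")` would be expected to report the community host as the canonical name; after step 3, `via_alias` would be expected to be False. The resolver is injectable so the function can be exercised without touching live DNS.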
I still need access to this box to fix/update/monitor/debug the BzAPI software. It doesn't have to be via the community jumphost but there needs to be some way. As long as that's possible, I have no objections.

Gerv
I don't have the root password.  My involvement here was "hey what's this doing in the community VLAN".  Dan said it's in passwords.txt.gpg in comment 2, although I don't see it there.

My understanding was that we prefer Zeus over NAT where possible now.  Unless that's changed or there's a reason not to, I think we should use Zeus (and it's a lot easier to set up, anyway).

This isn't community-related at all, so a flow from the community jumphost would not be the right approach (and would also be hard).  I believe MPT-VPN has dmz access already, but if not, that could be plumbed pretty easily.

It's my understanding, per bug 722795 comment 3 and others, that this is a relatively temporary machine until :dkl finishes BzAPI's replacement.  So puppetizing doesn't make sense.

That said, progress on bug 747201 (the new API, originally expected in 2012Q1) seems stalled, so IMHO a move makes sense, since we need the space in the community network.

And maybe it will prod folks into replacing this de-facto critical SPOF with something more reliable :)
Assignee: afernandez → shyam
Component: Server Operations → Server Operations: Developer Services
QA Contact: phong → shyam
Shyam asked me to say what BzAPI needs.

The documentation for installing it is here:
https://wiki.mozilla.org/Bugzilla:REST_API#Your_Own_Installation
That gives details of what packages are needed.

The Apache config which makes Catalyst run properly can be copied from the existing box. That's a bit of black magic to me - I got it working once and haven't messed with it since.

BzAPI has a test suite - we can install it on the new box and run the suite (against a test database, e.g. on landfill) to make sure everything is working properly.

The CPU requirements are minimal; it does very little processing. We need plenty of RAM, because we often have several instances going at once (different versions). We log every request, which needs a reasonable amount of disk. (The new box should have log rotation set up properly; I don't know how to do that, so the current one doesn't.)

Does it make most sense to bring the new box up as api.bugzilla.mozilla.org and ask people to transition? We can then move people using the production services (the ones pointed at bugzilla.mozilla.org) to the new box, and keep the test services (landfill, allizom etc.) on the dev box. That would eliminate the "-dev" in the name.

If, once the new one is up and running, we want to create a new dev box from scratch (or by cloning the production setup) rather than upgrading and patching the OS on the current one, that would probably be OK. We should decide what the plan is there.

Gerv
I'd like to start with the reverse: get a new dev box going first, have you test it to make sure everything is working fine, and then roll out a new production box and ask people to use api.bugzilla.mozilla.org.

Then we can let you use the new dev box for your dev work/testing, and bring up a new stage box that IT can push to so you can verify stuff, after which IT pushes to prod (just like every other project we manage).

How does that sound?
If you prefer that way, that sounds good to me :-)

Gerv
Let me know what else you need from me.

Gerv
(In reply to Gervase Markham [:gerv] from comment #27)
> Let me know what else you need from me.
> 
> Gerv

Nothing for now.

Just to set expectations, this isn't the only thing I have on my plate, but I hope to get this done before the quarter is out.
Whiteboard: [2013Q1]
The BMO team will be bringing the Bugzilla native API up to speed in Q2 2013, therefore I'm not investing time and effort into this at the moment.
Status: NEW → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → WONTFIX
Whiteboard: [2013Q1]
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services