Closed Bug 749471 Opened 12 years ago Closed 10 years ago

dev, staging environments for autoland

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P4)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: dustin, Unassigned)

Details

(Whiteboard: [triaged 20120907][waiting][releng])

Autoland is already set up as a webapp, so deployment of new nodes is easy.  It currently only has a production instance in phx1.

This should get dev and staging instances, each a single VM.  If these are in scl3, they'll need their own onboard rabbitmq servers (since there's no generic celery cluster in scl3).  They should also have a *different* SSH key that is not scm level 1.

This can wait until after the sjc1 evac is complete, or at least until the virtualization team is less overloaded.
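
For illustration, a minimal shell sketch of what standing up one such VM might involve, assuming RHEL6-style packaging; the package names and key path below are assumptions, not the actual provisioning steps:

  # Node-local RabbitMQ broker, since scl3 has no shared celery cluster
  sudo yum -y install rabbitmq-server
  sudo chkconfig rabbitmq-server on
  sudo service rabbitmq-server start

  # Dedicated dev/stage deploy key that is NOT the scm level 1 key
  ssh-keygen -t rsa -b 4096 -C "autoland dev/stage deploy key" -f ~/.ssh/autoland_dev_id_rsa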
Autoland isn't dev services.
Assignee: server-ops-devservices → server-ops-releng
Component: Server Operations: Developer Services → Server Operations: RelEng
QA Contact: shyam → arich
Hal, neither Lukas nor Marc is working for releng anymore.  Who, if anyone, has taken over this project?  My understanding is that it's currently non-functional, even in prod.  Should we work on removing the service entirely, like you've done with briar patch, WONTFIX this until there are releng resources available to maintain it, or...?
Autoland is, at the moment, community supported, coordinated by Lukas. (None of the production machinery is on the build-vpn.) Adding Lukas to comment on dev/staging machine needs & timetables.
This was not set up as a community service.  Community members have no access to these machines, since they sit inside the Mozilla internal infrastructure.
Bad choice of terms - I didn't mean the traditional "community" - I meant "non-releng folk who already have access to the internal infrastructure".
Stealing this into WebOps, because...

This is hosted on 2 servers, autoland1.shared.phx1 and autoland2.shared.phx1.

autoland1 is a webapp module admin node. It's got one env configured, "autoland". Members are itself and autoland2. It has a slightly weird "update" script, but a normal "deploy" script, and uses the normal src/www git deployment mechanism. There appears to be no auto-deploy mechanism, so I presume folks are logging in to this node to do an update/deploy.
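
For the record, a sketch of the presumed manual flow from the admin node; the /data paths below are assumptions based on the generic webapp layout, not verified on this host:

  # On autoland1 (the admin node): refresh the checkout, then push it to the env members
  ssh autoland1.shared.phx1
  /data/autoland/update    # pulls the latest code into src/www via git (script path assumed)
  /data/autoland/deploy    # syncs the tree out to the env members, autoland1 and autoland2 (script path assumed)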

autoland2 is a more normal web node. It lacks an /etc/motd, however, which makes me wonder how much of it is really puppetized like a normal webapp node. It might need some massaging.

I found some documentation for the app here:
https://wiki.mozilla.org/ReleaseEngineering:Autoland

I don't know what the domain/URL for this app is; it may not have anything obvious (user-accessible), because it's intended to pull from bugzilla and push to hg, and it doesn't necessarily have a user interface of its own (as far as I can tell from those docs).


Adding a single dev and a single stage node should be *relatively* straightforward, provided the basic webapp stuff is actually puppetized. They could potentially be identical, with -dev having a cron to update itself; stage would work just like prod, providing a way to test a deployment before it actually goes to prod. A rough sketch of such a cron follows.
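
What that -dev cron might look like, reusing the same update/deploy pair described above; the cron file name, paths, and interval are assumptions:

  # /etc/cron.d/autoland-dev-autodeploy (hypothetical)
  # Pull and redeploy every 15 minutes on the dev environment only
  */15 * * * * root /data/autoland-dev/update && /data/autoland-dev/deploy >> /var/log/autoland-dev-deploy.log 2>&1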


One thing I don't know about is dependencies. Can anyone provide some documentation on what this app requires? Is everything local to a web node in puppet, and are the external dependencies (bugzilla, mysql, whatever) documented somewhere?
Assignee: server-ops-releng → server-ops-webops
Component: Server Operations: RelEng → Server Operations: Web Operations
QA Contact: arich → cshields
I set this up - I had a meeting with dev services a while back about it, but they didn't want to take it on.

Both nodes are puppetized, but I don't think I added an /etc/motd to the files section.

Yes, pushes have been manual so far.

There's no website - the webheads aren't even running Apache.  Which makes it a bit odd for webops :)

The better docs are on mana - kinda surprised you didn't look there!
  https://mana.mozilla.org/wiki/display/websites/Autoland

Yeah, setting up new environments should be easy, although I'm not sure it's worth it.
We do plan to have a web component for this eventually, though: both a dashboard viewable by users and an API that we can script against.
Whiteboard: [pending triage]
@dustin: why do you say you're not sure it's worth having dev/stage environments? I don't have any prior knowledge of this service, so I'm not in a position to say how much it would or wouldn't benefit.

I'm perfectly willing to bow to RelOps' input on this... if you think it's not worthwhile, we can WONTFIX this. :)
I'd actually defer to Lukas on the need for dev/stage - realistically, how much work is this going to see over the next, say, 6mo, and how likely is it that such work will need to be segregated into environments, vs. just going into the only-sorta-working production environment?

The work involved is minimal, but I hate to think we'd spin up two new hosts that won't be used.  That's why I had stalled on this initially.
Priority: -- → P4
Whiteboard: [pending triage] → [triaged 20120907][waiting][releng]
Lukas, we need feedback from you on this before it will get moved into a work queue.

Thanks
autoland1 went down after failing to get a DHCP lease. After bringing it back up manually via the console, it is now barfing on puppet/package configs. It's also way out of date on package updates.

Trying to get DHCP helper/relay addresses fixed to bring these boxes back up properly.

info: Applying configuration version '48962'
err: /Stage[main]/Webapp::Python/Package[python-libs]/ensure: change from 2.6.6-29.el6_3.3 to 2.6.6-29.el6_2.2 failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y downgrade python-libs-2.6.6-29.el6_2.2' returned 1: Error: Package: python-libs-2.6.6-29.el6_2.2.x86_64 (rhel-x86_64-server-6)
           Requires: python = 2.6.6-29.el6_2.2
           Installed: python-2.6.6-29.el6_3.3.x86_64 (@rhel-x86_64-server-6)
               python = 2.6.6-29.el6_3.3
           Available: python-2.6.5-3.el6.i686 (rhel-x86_64-server-6)
               python = 2.6.5-3.el6
           Available: python-2.6.5-3.el6_0.2.i686 (rhel-x86_64-server-6)
               python = 2.6.5-3.el6_0.2
           Available: python-2.6.6-20.el6.x86_64 (rhel-x86_64-server-6)
               python = 2.6.6-20.el6
           Available: python-2.6.6-29.el6.x86_64 (mozilla)
               python = 2.6.6-29.el6
           Available: python-2.6.6-29.el6_2.2.x86_64 (rhel-x86_64-server-6)
               python = 2.6.6-29.el6_2.2
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
 at /etc/puppet/modules/webapp/manifests/python.pp:21
err: /Stage[main]/Webapp::Python/Package[python]/ensure: change from 2.6.6-29.el6_3.3 to 2.6.6-29.el6_2.2 failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y downgrade python-2.6.6-29.el6_2.2' returned 1: Error: Package: python-libs-2.6.6-29.el6_3.3.x86_64 (@rhel-x86_64-server-6)
           Requires: python = 2.6.6-29.el6_3.3
           Removing: python-2.6.6-29.el6_3.3.x86_64 (@rhel-x86_64-server-6)
               python = 2.6.6-29.el6_3.3
           Downgraded By: python-2.6.6-29.el6_2.2.x86_64 (rhel-x86_64-server-6)
               python = 2.6.6-29.el6_2.2
           Available: python-2.6.5-3.el6.i686 (rhel-x86_64-server-6)
               python = 2.6.5-3.el6
           Available: python-2.6.5-3.el6_0.2.i686 (rhel-x86_64-server-6)
               python = 2.6.5-3.el6_0.2
           Available: python-2.6.6-20.el6.x86_64 (rhel-x86_64-server-6)
               python = 2.6.6-20.el6
           Available: python-2.6.6-29.el6.x86_64 (mozilla)
               python = 2.6.6-29.el6
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
 at /etc/puppet/modules/webapp/manifests/python.pp:21
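
For reference, the failure above is yum refusing to downgrade python-libs while the matching python package stays at the newer release (and vice versa); the two have to move in lockstep. Two usual ways out, sketched here; the manifest path comes from the error output, and the exact version to pin is an assumption:

  # Option 1: downgrade both packages in one transaction so their versions match
  sudo yum -y downgrade python-2.6.6-29.el6_2.2 python-libs-2.6.6-29.el6_2.2

  # Option 2: bump the version pinned in /etc/puppet/modules/webapp/manifests/python.pp (line 21)
  # to the installed 2.6.6-29.el6_3.3, then re-run the agent
  sudo puppet agent --test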
Wrong bug for that, but I'll take a look.  The host was only built, what, 5 months ago? How can it be that out of date?
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Autoland is no more. I will be filing a bug to decommission the prod server. Therefore there is no need to complete this work.

Thanks everybody.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard