Closed
Bug 731763
Opened 12 years ago
Closed 12 years ago
Set up production autoland instance
Categories
(mozilla.org Graveyard :: Server Operations: Projects, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mjessome, Assigned: dustin)
References
Details
(Whiteboard: [tools])
Attachments
(1 file)
114 bytes, text/plain
We can puppetize on releng puppet servers. I think we had said:
- 1 VM to run our bugzilla scraping tool / producer, autoland_queue.
- 1 VM for running our worker instances, hgpusher.

We currently have 2 staging servers, autoland-staging{01,02}. We can drop one of those once this is set up, but we will still need a staging server. We also need access to a rabbitmq server, schedulerDb, and the releng MySQL server.
Updated•12 years ago
Whiteboard: [autoland][tools]
Updated•12 years ago
Whiteboard: [autoland][tools] → [tools]
Assignee
Comment 1•12 years ago
Docs: https://wiki.mozilla.org/BugzillaAutoLanding

From talking to Amy, I think it will be best to set this up *outside* of the build network. Nothing here (including schedulerdb) is within the build network, so there's no access loss involved.

Per our discussions yesterday, production will wait for scl3, which means the next few weeks as the remaining requirements appear in that datacenter. Staging's already running, so let's keep rolling with that. Those hosts are in scl1, so we can re-create staging in scl3 at our leisure.

The pieces I see here are:
- VMs - waiting on scl3 ESX
- schedulerdb access - just ACL changes
- "releng MySQL server" - I'm reading this as AutolandDB, a different DB on the same cluster, so this is free
- bugzilla API - unrestricted, no issues
- rabbitmq - see below
- I'll get a page up in the websites space documenting this, pointing to the wiki page linked above.
- monitoring - we'll need some details on how best to monitor this, and what to do when it breaks

Open questions:

Brandon: this won't really be a web app -- it just quietly does its thing -- but it needs rabbitmq. Can it use a vhost on the generic cluster's staging/prod rabbitmq instances? If it's any consolation, in a later phase it may have at least a small dashboard/management UI that would probably fit well in the generic cluster.

Marc: can you verify that your AMQP client library can connect to a list of servers sequentially, and that it can set the necessary flags for HA messages? I know Catlee's had trouble with this.

I expect these will go on the DMZ VLAN. Does anyone know differently?
Assignee: server-ops-releng → dustin
Comment 2•12 years ago
AIUI the VM infra in SCL3 is up now. What's the ETA for these VMs?
Assignee
Comment 3•12 years ago
Still waiting for feedback from Brandon and Marc.
Reporter
Comment 4•12 years ago
It isn't a problem to connect to a list of servers in sequence. Looking into the documentation for our library (pika), it seems like it does support declaring HA/mirrored queues. I've tested declaring a queue in that manner and it works; however, I do not have a cluster to properly try it on.
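For reference, a minimal sketch of both behaviors with pika: trying a list of brokers in order, and declaring a mirrored queue via the pre-policy `x-ha-policy` queue argument. The host names, queue name, and the injectable `connect` hook are illustrative assumptions, not autoland code.

```python
def connect_first_available(hosts, port=5672, connect=None):
    """Try each broker in order; return a connection to the first that answers.

    `connect` is injectable for testing; by default it opens a real
    pika.BlockingConnection (assumed installed on the worker VM).
    """
    if connect is None:
        import pika  # only needed for the real default
        connect = lambda h: pika.BlockingConnection(
            pika.ConnectionParameters(host=h, port=port))
    last_err = None
    for host in hosts:
        try:
            return connect(host)
        except Exception as err:
            last_err = err
    raise last_err  # no broker in the list answered


def declare_mirrored_queue(channel, name):
    # Pre-RabbitMQ-3.0 style: ask the broker to mirror this queue on all
    # cluster nodes via the x-ha-policy queue argument.
    return channel.queue_declare(queue=name, durable=True,
                                 arguments={"x-ha-policy": "all"})
```

With this shape, `connect_first_available(["rabbit-a", "rabbit-b"])` falls through to the second host when the first refuses the connection.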
Assignee
Comment 5•12 years ago
You can try it against rabbit-[a..h].build.mozilla.org. It'd be good if you can also try it against rabbit1-dev -- we've been stalled on catlee for a few weeks waiting for him to test that.
Comment 6•12 years ago
Dustin, I missed that you were waiting on input from me :/ I'll ping you on irc today.
Assignee
Comment 7•12 years ago
From IRC: using the dev/staging/prod generic clusters' rabbitmq instances is fine. Brandon, can you set up a single vhost (named, say, /autoland) with a single user ("autoland" has a nice ring to it) and a password to be communicated to Marc? I'll open a bug with Dan to get the VMs set up.
Comment 8•12 years ago
So we normally make a vhost with the env name in it, and the username matches. I've set up vhosts for -dev and -stage; info below.

-dev
Host: generic-celery1.dev.seamicro.phx1.mozilla.com
Username: autoland_dev

-stage
Host: generic-celery1.stage.seamicro.phx1.mozilla.com
Username: autoland_stage

Passwords will be relayed via IRC.
Assignee
Comment 9•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> I'll open a bug with Dan to get the VMs set up.

That's bug 736650. Marc, please have a look and make sure the specs are OK.
Comment 10•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #1)
> From talking to Amy, I think it will be best to set this up *outside* of the
> build network. Nothing here (including schedulerdb) is within the build
> network, so there's no access loss involved.

These will need to be in the build network, since the machines will have a user account on them that has permissions to push to scm_level3 hg repos as well as to make changes to bugzilla and eventually buildapi.
Assignee
Comment 11•12 years ago
None of that requires being in the build network.
Assignee
Comment 12•12 years ago
From an IRC conversation, there's been a secreview based on information different from what I was given two weeks ago -- namely, that the hosts would be in the build network. Putting the hosts there means:

* stopping bug 736650 until I'm back online
* using limited build network VM resources
* placing the app further from the view of the SREs

Lukas told me this is a Q1 goal and that getting the hosts up quickly is critical, which I take to mean that she will re-visit the secreview to see if the network isolation was required, and that we will continue deploying these hosts as planned. I haven't heard back in IRC to verify either of those.
Assignee
Comment 13•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #1)
> I expect these will go on the DMZ VLAN. Does anyone know differently?

I was wrong here -- DMZ's not the right place for boxes with sensitive info like Lukas mentioned in comment 10. I *think* this means the private VLAN, but I'd like verification -- fox2mike? justdave?
Comment 14•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #12)
> From an IRC conversation, there's been a secreview based on information
> different from what I was given two weeks ago - namely, that the hosts would
> be in the build network. Putting the hosts there means:
>
> * stopping bug 736650 until I'm back online
> * using limited build network VM resources
> * placing app further from the view of the SREs
>
> Lukas told me this is a Q1 goal and that getting the hosts up quickly is
> critical, which I take to mean that she will re-visit the secreview to see
> if the network isolation was required, and that we will continue deploying
> these hosts as planned. I haven't heard back in IRC to verify either of
> those.

After discussing with Amy, this is not necessary -- please continue as planned with getting these VMs online. This is indeed a Q1 goal, and not only do we need these VMs up but we will need a point person to work with Marc to get our puppet manifests to work with the IT puppet setup. Also, does IT have a secure password storage location that can store the user info for autoland?
Assignee
Comment 15•12 years ago
I'd be the point person if I were around. Someone else may be able to substitute, or this may need to wait until I'm available. Yes, we have password storage (via GPG).
Comment 16•12 years ago
Not sure why this was referred to me for questions (other than just tribal knowledge or something), but here's my take on it. This is based on the understanding that:

1) This system contains user accounts that shouldn't be accessible to the Internet, which means it shouldn't live in the DMZ, and won't have inbound Internet access except via VPN.
2) This system should also not be accessible to community build helpers who access via the build VPN.

This sounds like it should either be in vlan72 (corporate resources), or we need a new VPN for non-community administrative resources for build.
Comment 17•12 years ago
Or vlan275? I'm told that might be separate from build-vpn as well.
Assignee
Comment 18•12 years ago
I'd prefer vlan72 (corp.scl3). I'll cross-post that to bug 736650.
Reporter
Comment 19•12 years ago
Got into a discussion today about inodes/disk space, and I'm starting to second-guess the 20G disk allocation on these for production. I'll explain the situation:

On our worker VM, we keep a clean clone of each supported branch to lower external resource usage and increase our efficiency. At this moment, we can foresee supporting mozilla-central, mozilla-inbound, mozilla-beta, mozilla-aurora, and mozilla-esr. In addition to these clean clones, each instance of our worker tool (we hope to run 2) keeps a working copy of the repository being worked on. This means we are looking at storing at least 7 clones at any point in time. A clone & checkout of mozilla-central itself takes ~1.4G and has somewhere around 160K files, so inodes _will_ run out:

[root@autoland-staging02 clean]# du -hs mozilla-central
1.4G    mozilla-central
[root@autoland-staging02 clean]# find mozilla-central/ | wc -l
160228

Note that this applies to only one of the VMs: we have 1 worker VM which needs these repositories, and another which does not store any repositories.
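The sizing arithmetic works out roughly as follows, a back-of-envelope sketch using only the figures quoted in the comment above (every mozilla-* clone is assumed to be about the size of mozilla-central, which overstates the smaller repos):

```python
# 5 clean clones (central, inbound, beta, aurora, esr) plus 1 working copy
# per hgpusher instance (2 planned) = 7 clones on the worker VM.
clean_clones = 5
working_copies = 2
clone_gb = 1.4            # measured: du -hs mozilla-central
files_per_clone = 160228  # measured: find mozilla-central/ | wc -l

total_clones = clean_clones + working_copies
print(round(total_clones * clone_gb, 1))  # ~9.8 GB of repository data alone
print(total_clones * files_per_clone)     # ~1.12 million inodes
```

That leaves little headroom on a 20G volume once patches, logs, and the OS are added, which is consistent with the 50G request that follows.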
Comment 20•12 years ago
So how much disk space do you need?
Reporter
Comment 21•12 years ago
I would say 20G will be good on one, and 50G for the other. I think that should allow for enough overhead as to not run into trouble. Thanks.
Comment 22•12 years ago
(In reply to Marc Jessome [:mjessome] from comment #21)
> I would say 20G will be good on one, and 50G for the other. I think that
> should allow for enough overhead as to not run into trouble. Thanks.

If it's not too much space hogging, I'd like it if we created both with 50GB. It would be great to have a consistent ref image for these VMs, as we are going to add more builders to be hg_pushers over time.
Assignee
Comment 24•12 years ago
(In reply to Lukas Blakk [:lsblakk] from comment #22)
> If it's not too much space hogging, I'd like if we created both with 50GB.
> It would be great to have a consistent ref image for these vms as we are
> going to add more builders to be hg_pushers over time.

We don't use ref images; this will be done in puppet. Speaking of which, if you can point me to the necessary pieces in build/puppet-manifests, I can start working on that in sysadmins puppet.
Reporter
Comment 25•12 years ago
The puppet manifests were never landed in build/puppet-manifests, since we realized that we would be using the sysadmins puppet. I have them in my user repo at:
https://hg.mozilla.org/users/mjessome_mozilla.com/puppet-manifests/

Files of note are:
- buildmaster-production.pp - where the machine instances are used.
- modules/autoland/* - instance & dependency information, config files & templates.
- secrets.pp.template - template for password storage; bug 723998 has the patch if you want to see that.

With regards to secrets.pp.template, we will need to talk about how passwords should be stored and how I can get those to you.
Comment 26•12 years ago
The VMs have been created. Let me know if there's anything else I need to do.

autoland1.corpdmz.scl3.mozilla.com
autoland2.corpdmz.scl3.mozilla.com
Assignee
Comment 27•12 years ago
No, sir. I'll get them set up with puppet!
Assignee
Comment 28•12 years ago
Marc -- I had a look at the puppet manifests. The normal practice in infra is to use RPMs to install Python libraries to site-packages, rather than virtualenvs, so I'll cook that up.

As for password storage, sysadmins puppet doesn't have an equivalent to secrets.pp. I think the best solution will be to add a local.ini with passwords in it on each machine, by hand. We'll keep a backup copy of the necessary passwords in our password store for easy re-creation in a disaster. I assume I can find those values on autoland-staging02 now, so that should be sufficient.

On a side note, I notice that you're using 'source ...bin/activate' in a lot of places -- keep in mind that for virtualenvs, activate is just a convenient way to edit PATH in your interactive shell, and isn't useful for automated stuff. Instead of

command=/bin/sh -c 'source <%=basedir%>/bin/activate && cd <%=script_dir%> && python <%=script_dir%>/hgpusher.py'

you should use

command=/bin/sh -c '<%= basedir %>/bin/python <%=script_dir%>/hgpusher.py'
Assignee
Comment 29•12 years ago
https://hg.mozilla.org/users/mjessome_mozilla.com/puppet-manifests/file/a2bc3ef7f1c5/modules/autoland/manifests/instance.pp#l47 lists:
- SQLAlchemy==0.7.2
- argparse==1.2.1
- mercurial==1.9.1
- pysqlite==2.6.3
- pika==0.9.5
- simplejson==2.1.6
- urllib3==1.0.2
- python-ldap==2.3.13
- Twisted==11.0.0
- MySQL-python==1.2.3
- requests==0.10.8

all for Python 2.6. I need an assist building RPMs for these. Jeremy, I hear you have an egg-to-RPM tool?
Assignee
Comment 30•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #29)
> - SQLAlchemy==0.7.2
needed
> - argparse==1.2.1
/data/mrepo/www/6-x86_64/RPMS.epel/python-argparse-1.2.1-2.el6.noarch.rpm
> - mercurial==1.9.1
needed
> - pysqlite==2.6.3
not sure why you need this with Python 2.6, but OK :)
> - pika==0.9.5
/data/mrepo/www/6-x86_64/RPMS.epel/python-pika-0.9.5-2.el6.noarch.rpm
> - simplejson==2.1.6
/data/mrepo/www/6-x86_64/RPMS.mozilla/simplejson-2.3.2-1.x86_64.rpm (if the different version is OK? It's simplejson, after all!)
> - urllib3==1.0.2
needed
> - python-ldap==2.3.13
needed
> - Twisted==11.0.0
needed
> - MySQL-python==1.2.3
/data/mrepo/www/6-x86_64/RPMS.updates/MySQL-python-1.2.3-0.3.c1.1.el6.x86_64.rpm
> - requests==0.10.8
needed
Assignee
Comment 31•12 years ago
After some discussion, this will be more flexible -- and more similar to other webops deployments -- if the Python libraries that aren't compiled are in a vendor directory, similar to https://github.com/mozilla/balrog/tree/master/vendor

Of what remains, pysqlite isn't required on 2.6 (it's built in). So that leaves MySQL-python, in the RPMS.updates repo, and Twisted, for which I'll build an RPM and add it to mrepo.
Assignee
Comment 32•12 years ago
The Twisted package is created and in mrepo now, and the spec file is checked into subversion. Marc's working on a vendor lib for the others. I'll work on puppet manifests.
Assignee
Comment 33•12 years ago
Rather than a local 'autoland' user, I'm going to use the 'autolanduser' user already in LDAP. Also, your manifests list rabbitmq-server, but from the above you don't need a rabbitmq server. That brings lots of deps with it, so I'd rather leave it out. Is that OK? I haven't converted autoland::instance yet, so I may have more questions tomorrow.
Reporter
Comment 34•12 years ago
Leaving out rabbitmq-server isn't a problem, since we'll be using what is provided. The switch to "autolanduser" is not a problem either.
Assignee: dustin → nobody
Component: Server Operations: RelEng → Server Operations: Projects
QA Contact: arich → mrz
Reporter
Comment 35•12 years ago
I've finished the vendor libs. This also makes me think that we can get rid of the virtualenv that was being used in my puppet setup. Looking into it, it seems the python-ldap module is also compiled, so we will require an RPM for that as well.
Assignee
Updated•12 years ago
Assignee: nobody → dustin
Assignee
Updated•12 years ago
Summary: Set up VMs for autoland → Set production autoland instance
Assignee
Updated•12 years ago
Summary: Set production autoland instance → Set up production autoland instance
Assignee
Comment 36•12 years ago
(In reply to Marc Jessome [:mjessome] from comment #34)
> That isn't a problem leaving out rabbitmq-server, since we'll be using what
> is provided. The switch to "autolanduser" is not a problem.

Awesome x 2

(In reply to Marc Jessome [:mjessome] from comment #35)
> I've finished the vendor libs. This also makes me think that we can get rid
> of virtualenv that was being used in my puppet setup.

Indeed - I'll work that into the autoland::instance changes.

> Looking into it, it seems like the python-ldap module is also compiled, so
> we will require an RPM for that as well.

Good point - I'll get that.
Reporter
Comment 37•12 years ago
Here is how the two VMs we'll have differ:

The bugzilla poller (runs autoland_queue and schedulerDbPoller):
- This is the 20G VM
- Requires the cron job that runs schedulerDbPoller
- Requires supervisord.conf.erb to contain the [program:autoland_queue] section
- Requires netflow to tm-b01-slave01 (for access to the scheduler database)

The hg-facing tools (runs hgpushers):
- This is the 50G VM
- Requires supervisord.conf.erb to contain the [program:hgpusher] section
- Requires ssh keys
- Requires an hgrc

As for configuration, it can be kept the same on both unless there are issues with unwanted secrets on a second machine. In that case, secrets could be flagged out in the same way as will be done in supervisord.conf.erb.
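A sketch of how that flagging could look inside supervisord.conf.erb, assuming a boolean template variable like the `poll_schedulerdb` instance parameter that appears elsewhere in this bug; the variable name and command lines here are illustrative, not the actual template:

```erb
<% if poll_schedulerdb %>
[program:autoland_queue]
command=/bin/sh -c '<%= basedir %>/bin/python <%= script_dir %>/autoland_queue.py'
<% else %>
[program:hgpusher]
command=/bin/sh -c '<%= basedir %>/bin/python <%= script_dir %>/hgpusher.py'
<% end %>
```

One template then serves both VMs, with puppet deciding per host which program section is emitted.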
Reporter
Comment 38•12 years ago
The outline for what is required in the secrets.ini file. I haven't included variable names since I'm not sure what will be used in the end.
Assignee
Comment 39•12 years ago
OK, I think we're at a good stopping point for the day. The puppet manifests are in and working, so here's what's left:

* get secrets.ini and the ssh key on the machine - I can do this when Marc's around tomorrow
* land the code in hg.m.o/build/tools
* figure out the right instance parameters. I have

autoland::instance {
    "/data/autoland/autoland-env":
        code_tag => "staging",
        user => "autolanduser",
        attachment_url => "https://landfill.bugzilla.org/bzapi_sandbox/attachment.cgi?id=",
        api_url => "https://api-dev.bugzilla.mozilla.org/test/latest/",
        bz_url => "https://landfill.bugzilla.org/show_bug.cgi?id=",
        config_flags => "staging=1",
        poll_schedulerdb => false;
}

* fire it up and see what happens :)

We'll get back to it tomorrow.
Assignee
Comment 40•12 years ago
I should add to that list:
* flows for MySQL
* use rabbitmq as set up by Brandon
Assignee
Comment 41•12 years ago
bug 740006 and bug 740007 for the flows
Assignee
Comment 42•12 years ago
OK, changes made today:
* sorted out secrets and ssh keys, using autoland-staging02 as a source
* code changes to use secrets.ini
* only start hgpusher if poll_schedulerdb is false, and only start autoland_queue if it's true
* set up both instances, using instance config from mjessome
* used code from github, which is not actually accessible from corpdmz, so this will need to change (to hg) before this goes into production
* code changes to specify the rabbitmq host in config

And then we discovered that the rabbitmq hosts are in phx1 and the VMs are in scl3, which isn't going to work. The easiest fix is to move the hosts to phx1, I think. I'll open a bug for that, and update the flow and ACL bugs.
Assignee
Comment 43•12 years ago
bug 740198 for the move
Assignee
Comment 44•12 years ago
I also added

try_syntax=-b do -p all -u none -t none

to config.ini, and built an SQLAlchemy RPM.
Reporter
Comment 45•12 years ago
Would it be possible to see a generated version of the config at this point? I would like to make sure that there isn't anything else missing from there.
Assignee
Comment 46•12 years ago
(done in irc)
Reporter
Comment 47•12 years ago
Added the dependency for the move to phx1. Next steps are:
- find a permanent home for the staging repository
- land production code to hg.m.o/build/tools
- test the puppet deployment
Assignee
Updated•12 years ago
Assignee
Comment 49•12 years ago
Marc, can you get the production and staging code landed today? Once the flows are in, we should be ready to fire this up and see how it works.
Reporter
Comment 50•12 years ago
Dustin: I'm currently working on that and hopefully will have it soon. Since nothing in hg.m.o/build/tools uses autoland code, we're going to land in a separate repository, build/autoland. This means that we'll be using a "production" tag, which should be checked out on the production master, with tip on staging.

One thing that comes up with this is that autoland depends on build/tools, so we need that checked out and made available. Would cloning hg.m.o/build/tools and then using the setup.py provided there to install the libraries be a good way of doing this?

Some things that need to be added to config.ini:

[bz]
webui_url=https://bugzilla-stage-tip.mozilla.org/jsonrpc.cgi
webui_login=autoland-try@mozilla.bugs

[ldap]
branch_api=https://hg.mozilla.org/repo-group?repo=/

secrets.ini:

[bz]
webui_password=password
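One plausible way the two files combine, sketched with Python's stdlib ConfigParser: reading the secrets file after the main config merges its options into the same sections. The merge-on-second-read behavior shown here is an assumption about how autoland might load config, not something the source confirms.

```python
from configparser import ConfigParser

cfg = ConfigParser()
# Non-secret settings, as listed above for config.ini.
cfg.read_string("""
[bz]
webui_url=https://bugzilla-stage-tip.mozilla.org/jsonrpc.cgi
webui_login=autoland-try@mozilla.bugs

[ldap]
branch_api=https://hg.mozilla.org/repo-group?repo=/
""")
# Reading secrets.ini afterwards adds webui_password to the existing [bz].
cfg.read_string("""
[bz]
webui_password=password
""")

print(cfg.get("bz", "webui_login"))
print(cfg.get("bz", "webui_password"))
```

Keeping the secret in a second file means config.ini can live in puppet while secrets.ini is installed by hand, matching the local.ini approach discussed in comment 28.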
Assignee
Comment 51•12 years ago
I added bug 742495 to get mrepo to work, as it's currently breaking puppet.
Assignee
Comment 52•12 years ago
Config changes:

commit 2c2c48913a9c5a6630fcbc093844cf1a4cded4c4
Author: Dustin J. Mitchell <dustin@mozilla.com>
Date:   Wed Apr 4 14:55:56 2012 -0500

    Bug 731763: more configuration for autoland

diff --git a/modules/autoland/templates/config.ini.erb b/modules/autoland/templates/config.ini.erb
index b115cce..a4eb21a 100644
--- a/modules/autoland/templates/config.ini.erb
+++ b/modules/autoland/templates/config.ini.erb
@@ -9,6 +9,7 @@ base_url=ssh://hg.mozilla.org/
 username=autolanduser@mozilla.com
 # this ssh key is not handled by puppet and must be installed manually
 ssh_key=/home/autoland/.ssh/id_rsa
+branch_api=https://hg.mozilla.org/repo-group?repo=/

 [mq]
 host=<%= mq_host %>
@@ -27,6 +28,8 @@ api_url=<%=api_url%>
 url=<%=bz_url%>
 # poll frequency in seconds
 poll_frequency=180
+webui_url=https://bugzilla-stage-tip.mozilla.org/jsonrpc.cgi
+webui_login=autoland-try@mozilla.bugs

 [ldap]
 bind_dn=autolanduser,ou=logins,dc=mozilla
diff --git a/modules/autoland/templates/secrets.ini.template b/modules/autoland/templates/secrets.ini.template
index 6353eaa..bfddd28 100644
--- a/modules/autoland/templates/secrets.ini.template
+++ b/modules/autoland/templates/secrets.ini.template
@@ -9,3 +9,6 @@ scheduler_db_url=

 [mq]
 password=
+
+[bz]
+webui_password=password

(and I'll get the appropriate secret installed locally)
Assignee
Comment 53•12 years ago
I also changed the manifests to check out $basedir/tools (-r default) and $basedir/autoland (-r $autoland_tag). The tools dir is provided in the config file in

[defaults]
tools=...

I can't test this until puppet starts running again, but it's most likely ready to roll.
Assignee
Comment 54•12 years ago
OK, I made some significant changes today on :solarce's advice. This now looks *very* much like a webapp, which will make webops and the SREs happy. At least, that's the idea. That means that autoland can be deployed using this process:
https://mana.mozilla.org/wiki/display/websites/Home
and in particular that we can have dev boxes updating automatically every 15 minutes, with staging and prod updated on request (usually with bugs, but we can work that out). The mechanics of the automatic updates aren't quite worked out yet, but it's on my list.

At this point, schedulerDbPoller.py seems to work fine (no errors).

autoland_queue fails with

Traceback (most recent call last):
  File "autoland_queue.py", line 31, in <module>
    db = DBHandler(config['databases_autoland_db_url'])
  File "/data/www/autoland-service/autoland/utils/db_handler.py", line 18, in __init__
    self.scheduler_db_meta.reflect(bind=self.engine)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/schema.py", line 2355, in reflect
    conn = bind.contextual_connect()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 2328, in contextual_connect
    self.pool.connect(),
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 209, in connect
    return _ConnectionFairy(self).checkout()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 370, in __init__
    rec = self._connection_record = pool._do_get()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 757, in _do_get
    return self._create_connection()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 174, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 255, in __init__
    self.connection = self.__connect()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py", line 315, in __connect
    connection = self.__pool._creator()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py", line 275, in connect
    return self.dbapi.connect(*cargs, **cparams)
sqlalchemy.exc.OperationalError: (OperationalError) unable to open database file None None

which I assume is because the configured URL is sqlite:///data/autoland_live.sqlite - should that be mysql instead? Or somewhere else?

All I have in the hgpusher log is

2012-04-05 16:02:19,022 hgpusher main An error occurred: [Errno 13] Permission denied: 'build'

Other than fixing the above, my list is:
* ganglia
* nagios
* system to automatically restart supervisord services after deploy
* automatic updates for dev environments (waiting until we *have* a dev env)
* system docs (I really should have done this months ago)
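For what it's worth, the failing URL is a relative sqlite path: in SQLAlchemy's sqlite URLs everything after `sqlite:///` is the filename, so three slashes give a path relative to the process's working directory and four give an absolute path. A tiny illustrative helper (not autoland code) makes the difference visible:

```python
def sqlite_db_path(url):
    """Return the filesystem path an sqlite:/// URL points at."""
    prefix = "sqlite:///"
    assert url.startswith(prefix)
    return url[len(prefix):]  # a fourth leading slash makes this absolute

print(sqlite_db_path("sqlite:///data/autoland_live.sqlite"))   # relative to CWD
print(sqlite_db_path("sqlite:////data/autoland_live.sqlite"))  # absolute /data path
```

So "unable to open database file" is consistent with the process lacking a writable ./data directory, which matches the workdir fix applied in comment 58.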
Assignee
Comment 55•12 years ago
Oh, we should put the autoland DB on mysql too, right?
Assignee
Comment 56•12 years ago
Docs added: https://mana.mozilla.org/wiki/display/websites/Autoland
Reporter
Comment 57•12 years ago
I just saw this now for some reason, so sorry for the late reply.

(In reply to Dustin J. Mitchell [:dustin] from comment #54)
> At this point, the schedulerDbPoller.py seems to work fine (no errors).
>
> autoland_queue fails with
> ...
> sqlalchemy.exc.OperationalError: (OperationalError) unable to open database
> file None None
>
> which I assume is because the configured URL is
> sqlite:///data/autoland_live.sqlite - should that be mysql instead? Or
> somewhere else?

The database is a sqlite db, and I would be wary of switching over to mysql without a bit of testing and tweaking, since a few things may need to be changed. The file autoland/data/autoland_sqlite.sql needs to be used to generate a new db:

sqlite3 data/autoland_live.sqlite < autoland_sqlite.sql

> All I have in the hgpusher log is
>
> 2012-04-05 16:02:19,022 hgpusher main An error occurred: [Errno
> 13] Permission denied: 'build'

In the config there is

[default]
work_dir=build

which points to where the working directory should be (it contains checked-out repos, patch files, etc). This will need to be somewhere writable by the user.
Assignee
Comment 58•12 years ago
OK, I added a workdir (/data/workdir) to all three services, created by puppet.

autoland_queue
- updated in config.ini:
  autoland_db_url=sqlite:////data/workdir/autoland_queue/autoland_live.sqlite
- db generated as per above
with the result that autoland_queue runs and connects to rabbitmq, but

2012-04-09 15:37:25,159 bz_utils request REQUEST ERROR: <urlopen error [Errno 110] Connection timed out>: https://api-dev.bugzilla.mozilla.org/test/latest/bug/?whiteboard=\[autoland.*\]&whiteboard_type=regex&include_fields=id,whiteboard&username=release@mozilla.com&password=5EgfR97CVW

hgpusher
- updated in config.ini:
  work_dir=/data/workdir/hgpusher
with the result that it holds steady at:

2012-04-09 15:38:40,541 hgpusher main Working directory: hgpusher.0

schedulerdbpoller
- added --cache-dir=$workdir/schedulerdbpoller/cache to the invocation per bug 743001. I can add extra arguments as you add them to the source.
with the (expected for now) result

IOError: [Errno 13] Permission denied: '/data/www/autoland-service/autoland/schedulerDBpoller.log'

Do we need more flows for the Bugzilla API? Can you make a comprehensive flow list, 'cuz I think netops is going to start black-holing my requests soon :)
Comment 59•12 years ago
marking as moco confidential because of password leakage
Group: mozilla-corporation-confidential
Reporter
Comment 60•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #58)
> autoland_queue
> - updated in config.ini:
>   autoland_db_url=sqlite:////data/workdir/autoland_queue/autoland_live.sqlite
> - db generated as per above
> with the result that autoland_queue runs and connects to rabbitmq, but
>
> 2012-04-09 15:37:25,159 bz_utils request REQUEST ERROR: <urlopen
> error [Errno 110] Connection timed out>:
> https://api-dev.bugzilla.mozilla.org/test/latest/bug/?whiteboard=\[autoland.
> *\]&whiteboard_type=regex&include_fields=id,
> whiteboard&username=release@mozilla.com&password=5EgfR97CVW

I've got a new password, which I'll have to get to you for the config file. This is an issue on the BzApi end, and has affected the current live setup as well.

> schedulerdbpoller
> - added --cache-dir=$workdir/schedulerdbpoller/cache to the invocation per
> bug 743001. I can add extra arguments as you add them to the source.
> with the (expected for now) result
>
> IOError: [Errno 13] Permission denied:
> '/data/www/autoland-service/autoland/schedulerDBpoller.log'
>
> Do we need more flows for the Bugzilla API? Can you make a comprehensive
> flow list, 'cuz I think netops is going to start black-holing my requests
> soon :)

I'm pretty sure all we need is flows to the SchedulerDb and the rabbitmq hosts. I'm not sure how locked down these machines are, but we also need to be able to access hg, ldap, and bugzilla.
Comment 61•12 years ago
password changed, removing confidential flag
Group: mozilla-corporation-confidential
Assignee
Comment 62•12 years ago
Ugh, sorry, and thanks. Passwords in URLs :'-( I'll get the new pw from Marc and update autoland1/2.

(In reply to Marc Jessome [:mjessome] from comment #60)
> I'm pretty sure all we need is flows to the SchedulerDb and the rabbitmq
> hosts. I'm not sure how locked down these machines are, but we also need to
> be able to access hg, ldap, bugzilla.

They're very locked-down: anything not explicitly allowed is forbidden, basically. So that will be hg.m.o for both ssh and http; the LDAP server in the config; the various bugzilla VIPs in the config (including backup sites); tbpl; and self-serve. I'll see if I can suss out exactly what those flows should be, and get a bug filed.
Assignee
Comment 63•12 years ago
With the updated password, and flows in, I'm still getting a 400 in the autoland poller logs. The same URL in my browser says "message: Invalid username or password". I verified this is using the new password you supplied. Is this related to the api-dev move?
Reporter
Comment 64•12 years ago
https://api-dev.bugzilla.mozilla.org/test/latest/
should be
https://api-dev.bugzilla.mozilla.org/tip/

I also made sure passwords were changed on /tip/.
Comment 65•12 years ago
(In reply to Marc Jessome [:mjessome] from comment #64)
> https://api-dev.bugzilla.mozilla.org/test/latest/
> should be
> https://api-dev.bugzilla.mozilla.org/tip/
>
> I also made sure passwords were changed on /tip/ .

Is this for access to the production bmo Bugzilla instead? If so, you probably want https://api-dev.bugzilla.mozilla.org/latest/ instead. If you're testing, /tip/ is fine, but that's probably not the best use of a production service. :)
Assignee
Comment 66•12 years ago
Are the source code changes ready to roll? And, please clarify which api-dev URL I should use - I'm not clear from the last few comments.
Assignee
Comment 67•12 years ago
With the latest code, and /tip/, I get

2012-04-12 18:22:26,128 autoland_queue bz_search_handler Flagged for landing on branches: [u'users/mjessome_mozilla.com/mozilla-central', u'try']
2012-04-12 18:22:26,129 autoland_queue bz_search_handler Branch users/mjessome_mozilla.com/mozilla-central does not exist.
2012-04-12 18:22:26,130 autoland_queue bz_search_handler Branch try does not exist.

hgpusher seems happy as before. schedulerdbpoller gives:

ConfigParser.NoOptionError: No option 'posted_bugs' in section: 'log'

which seems like a new bug! I'd like to get this code into production as-is (with the necessary modifications to support that) before continuing to make changes. Let me know what's next.
Reporter
Comment 68•12 years ago
posted_bugs will need to be in the [log] section of the config, as in the new config.ini-dist in the hg repository. It will need to point somewhere writable by the user running schedulerDBpoller.
Reporter
Comment 69•12 years ago
Just to clarify: it should be a writable file path, so /data/postedbugs.log, for example.
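A minimal repro of the fix, using Python 3's configparser in place of the Python 2 ConfigParser module the traceback came from. The path is the one later given in comment 78; this only illustrates the expected config shape, not schedulerDBpoller itself.

```python
import configparser  # Python 2's ConfigParser, renamed in Python 3

# The [log] section schedulerDBpoller expects; without posted_bugs it
# raises NoOptionError as seen in comment 67.
cfg = configparser.ConfigParser()
cfg.read_string("""
[log]
posted_bugs = /data/workdir/schedulerdbpoller/postedbugs.log
""")
print(cfg.get("log", "posted_bugs"))
```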
Assignee
Comment 70•12 years ago
schedulerdbpoller looks good

-sh-4.1$ cd /data/www/autoland-service/autoland && python schedulerDBpoller.py -b try -c config.ini -u None -p None --verbose --cache-dir=/data/workdir/schedulerbpoller/cache/
[RabbitMQ] Established connection to generic-celery1.stage.seamicro.phx1.mozilla.com.
-sh-4.1$

I think we're in good shape? What's next?
Assignee
Comment 71•12 years ago
A bit more work in IRC brings us to hgpusher trying to clone from https://hg.mozilla.org, based on an entry I had added to the DB on request:

10:25 < mjessome> oh, and we need to enable try branch in the database. If you `sqlite3 /path/to/autoland_live.sqlite`, and "INSERT INTO branches VALUES (1, 'try', 'https://hg.mozilla.org/try', 80, 'enabled', 0, 0);"

I just now changed that to http://... I'll file a bug to get the https flow, since that's probably smarter all around.
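The one-off edit above amounts to rewriting the branch's clone URL scheme. A standalone sketch of that rewrite (illustrative only, not part of autoland's code):

```python
from urllib.parse import urlparse, urlunparse

def downgrade_to_http(url):
    """Rewrite an https:// clone URL to http:// (a stopgap until the
    https flow is opened); other schemes pass through unchanged."""
    parts = urlparse(url)
    if parts.scheme == "https":
        parts = parts._replace(scheme="http")
    return urlunparse(parts)

print(downgrade_to_http("https://hg.mozilla.org/try"))
```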
Reporter
Comment 72•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #71)
> A bit more work in IRC brings us to hgpusher trying to clone from
> https://hg.mozilla.org, based on an entry I had added to the DB on request:
>
> 10:25 < mjessome> oh, and we need to enable try branch in the database. If
> you `sqlite3 /path/to/autoland_live.sqlite`, and "INSERT INTO branches
> VALUES (1, 'try', 'https://hg.mozilla.org/try', 80, 'enabled', 0, 0);"
>
> I just now changed that to http://... I'll file a bug to get the https
> flow, since that's probably smarter all around.

Will this also give ssh:// flows to hg.mozilla.org? We can't push over https, only pull -- so ssh:// is required for pushing.
Assignee
Comment 73•12 years ago
SSH (and http) is already set up to hg.m.o.
Reporter
Comment 74•12 years ago
Any news on those https flows? Could you add it as a blocker? Once that is ready, I hope we can do a test try-landing. To make those test runs a bit easier, is there some way I can access the logs without having to bug you to pastebin them? Thanks
Assignee
Comment 75•12 years ago
The https flows were fixed last week. We can probably set something up with screen to look at the logs - look me up in IRC.
Assignee
Comment 76•12 years ago
We made some small fixes (regarding expansion of ~ in pathnames) and got hgpusher pushing to hg. I'm going to add a bug to get SSH flows into the systems from MPT-VPN and allow mjessome to log in.
Assignee
Comment 77•12 years ago
mjessome and lsblakk have sudo access on the host now (so just the flow remains - I'll copy you both on that bug).
Reporter
Comment 78•12 years ago
/home/mjessome/production on autoland1 contains the autoland_live.sqlite and postedbugs.log that can be dropped into /data/workdir/autoland_queue/autoland_live.sqlite and /data/workdir/schedulerdbpoller/postedbugs.log, respectively.

The production config is located at /home/mjessome/production/config.ini-production and still contains passwords; I wasn't sure whether you'd want to drop the config in or puppetize it, so I left them there. I'm not sure how the ssh key is being handled, so just a reminder about it!
Assignee
Comment 79•12 years ago
:solarce, can you set up a production celery account for this? comment 8 only lists dev and stage.
Comment 80•12 years ago
This should be ready to use now:

[root@node339.seamicro.phx1 ~]# rabbitmqctl list_vhosts | grep autoland
autoland_prod

The password follows the same convention as dev and stage.
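With dev, stage, and prod rabbitmq endpoints now in play, the open question from the bug description — whether the AMQP client can try a list of servers sequentially — boils down to a failover loop like the following sketch. `connect_any` is a made-up helper, shown generically over any connect callable rather than a specific AMQP library:

```python
def connect_any(hosts, connect):
    """Try each AMQP host in order, returning the first live connection;
    re-raise the last error if every host is down."""
    last_err = None
    for host in hosts:
        try:
            return connect(host)
        except OSError as err:
            last_err = err  # remember the failure and fall through
    raise last_err

# Illustrative use with a fake connect function:
def fake_connect(host):
    if host != "rabbit2.example.com":
        raise OSError("connection refused")
    return "connected:" + host

print(connect_any(["rabbit1.example.com", "rabbit2.example.com"], fake_connect))
```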
Assignee
Comment 81•12 years ago
I re-enabled schedulerdbpoller, and moved the hosts to a new prod cluster. I'll need to do some re-jiggering for the cluster change.
Assignee
Comment 82•12 years ago
OK, this is now in place, and according to the logfiles, it's running fine. Can you verify and close?

Things to be done on other bugs:
- refactor manifests to sit inside modules/webapp
- update docs to talk about puppet configs
- include the revision in the update script
- dev/staging VMs (probably in scl3)
- automatically restart daemons in update-www.sh
- nagios and ganglia
Assignee
Comment 83•12 years ago
So, from autoland2:

[root@autoland2.shared.phx1 ~]# wget https://bugzilla.mozilla.org/attachment.cgi?id=618738
--2012-04-26 11:50:59--  https://bugzilla.mozilla.org/attachment.cgi?id=618738
Resolving bugzilla.mozilla.org... 63.245.217.60
Connecting to bugzilla.mozilla.org|63.245.217.60|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bug749284.bugzilla.mozilla.org/attachment.cgi?id=618738 [following]
--2012-04-26 11:50:59--  https://bug749284.bugzilla.mozilla.org/attachment.cgi?id=618738
Resolving bug749284.bugzilla.mozilla.org... 63.245.217.61
Connecting to bug749284.bugzilla.mozilla.org|63.245.217.61|:443... ^C

which, I suspect, means we need moar flows. I'd like to get justdave's input here to try to cast a future-proof network of bugzilla IPs (including the impending scl3 cluster).
Comment 84•12 years ago
Won't know 'em till we get 'em. But yeah, the attachments are on a different IP because not enough people support SNI yet and it has a different SSL cert.
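One way to spot which extra hosts need flows is to pull the hostname out of the 302's Location header, as in this small sketch (`redirect_host` is a made-up name; the URL is the one from the wget transcript above):

```python
from urllib.parse import urlparse

def redirect_host(location):
    """Return the hostname a firewall flow must reach, given the value
    of a 302 Location: header."""
    return urlparse(location).hostname

print(redirect_host("https://bug749284.bugzilla.mozilla.org/attachment.cgi?id=618738"))
```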
Assignee
Comment 85•12 years ago
17:31 < mjessome> dustin: all is working :D

I'll get new bugs open for the follow-on stuff. Less than 100 comments, woo!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee
Comment 86•12 years ago
Bugs filed for the follow-on:

* bug 749469 (puppet implementation fixes)
* bug 749470 (better deployment automation)
* bug 749471 (dev/staging)

I'll be talking to the dev services group about how autoland works, so there's even more shared knowledge.
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard