Closed
Bug 810393
Opened 13 years ago
Closed 12 years ago
deploy release runner
Categories
(Release Engineering :: Release Automation, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: rail)
References
Details
(Whiteboard: [shipit])
Attachments
(1 file)
637 bytes, patch (bhearsum: review+)
We need to deploy the release runner portion of the kickoff system. Our current idea is to run it on buildbot-master36.
Assignee • Updated • 13 years ago
Whiteboard: [kickoff]
Assignee • Updated • 13 years ago
Priority: -- → P2
Assignee • Comment 1 • 13 years ago
I deployed the current dev version on bm36.
ssh cltbld@buildbot-master36
# install supervisord
su -
yum install supervisor
chkconfig supervisord on
# Add the following section to /etc/supervisor.conf:
[program:releaserunner]
command=/home/cltbld/release-runner/build-tools/buildfarm/release/release-runner.sh
exitcodes=0
user=cltbld
log_stderr=true
log_stdout=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/release-runner.log
# as cltbld
cd ~
mkdir release-runner && cd release-runner
virtualenv-2.6 --no-site-packages $PWD/venv
source venv/bin/activate
pip install simplejson
pip install fabric
pip install buildbot
git clone git://github.com/rail/build-tools.git
cd build-tools
git checkout -b release-runner origin/release-runner-comments
# Set up ~cltbld/.release-runner.ini
service supervisord start
Reporter • Comment 2 • 13 years ago
This isn't quite working. Here are the problems I've encountered:
* Killing supervisord doesn't terminate release runner (at least, not when it's doing something like cloning a repository). I end up with the .sh process dead and a release-runner.py in sleep state. It eventually dies, presumably after it finishes retrying its current operation.
* We're not getting all of the output from release runner. We get a lot of output when polling release kickoff, but almost nothing when cloning repositories. For example:
==> /var/log/supervisor/release-runner.log <==
Buildbot version: 0.8.7p1
Twisted version: 12.3.0
2012-12-28 11:42:32,666 - DEBUG - Fetching release requests
2012-12-28 11:42:32,712 - INFO - Got a new release request: {'status': 'Pending', 'product': 'fennec', 'name': 'Fennec-20.0-build1', 'dashboardCheck': False, 'buildNumber': 1, 'ready': True, 'l10nChangesets': '{\r\n "ca": {\r\n "revision": "3e911ef81869",\r\n "platforms": ["android"]\r\n },\r\n "cs": {\r\n "revision": "4d8963178613",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "da": {\r\n "revision": "65d017f4f5fe",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "de": {\r\n "revision": "d6ff03c97175",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "es-AR": {\r\n "revision": "03c68802bdc4",\r\n "platforms": ["android"]\r\n },\r\n "es-ES": {\r\n "revision": "d2185a7e4f7f",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "fi": {\r\n "revision": "68cb3de2b609",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "fr": {\r\n "revision": "6513cd7d17ae",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "fy-NL": {\r\n "revision": "0bb46089e086",\r\n "platforms": ["android"]\r\n },\r\n "ga-IE": {\r\n "revision": "b611f1be732c",\r\n "platforms": ["android"]\r\n },\r\n "gl": {\r\n "revision": "6a40cee822a6",\r\n "platforms": ["android"]\r\n },\r\n "it": {\r\n "revision": "d986d54e6074",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "ja": {\r\n "revision": "066c7401bfdc",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "ko": {\r\n "revision": "de1302d7a7f9",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "lt": {\r\n "revision": "c2087557d498",\r\n "platforms": ["android"]\r\n },\r\n "nb-NO": {\r\n "revision": "1a967b024168",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "nl": {\r\n "revision": "3aec3a20b84b",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "pa-IN": {\r\n "revision": "4afbb88e0ccf",\r\n "platforms": ["android"]\r\n },\r\n "pl": {\r\n "revision": "9ba4c01e429f",\r\n "platforms": ["android", 
"android-multilocale"]\r\n },\r\n "pt-BR": {\r\n "revision": "f230d9a3c797",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "pt-PT": {\r\n "revision": "25ed21fde549",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "ru": {\r\n "revision": "ccfccb0a343d",\r\n "platforms": ["android", "android-multilocale"]\r\n },\r\n "sk": {\r\n "revision": "e371468ef6c9",\r\n "platforms": ["android"]\r\n },\r\n "sl": {\r\n "revision": "07442918d3aa",\r\n "platforms": ["android"]\r\n },\r\n "uk": {\r\n "revision": "e54754531f14",\r\n "platforms": ["android"]\r\n },\r\n "zh-CN": {\r\n "revision": "65723862ba6d",\r\n "platforms": ["android"]\r\n },\r\n "zh-TW": {\r\n "revision": "a5dac614368e",\r\n "platforms": ["android"]\r\n }\r\n}\r\n', 'version': '20.0', 'branch': 'releases/mozilla-beta', 'submitter': 'rail', 'mozillaRevision': 'default', 'complete': False}
warning: hg.mozilla.org certificate not verified (check web.cacerts config setting)
pulling from https://hg.mozilla.org/users/bhearsum_mozilla.com/buildbot-configs
searching for changes
no changes found
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
abort: no suitable response from remote hg!
The latter makes it very difficult to debug other problems.
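One way to get the missing output into the log is to have the runner read each child command's stdout and stderr and pass every line to its own logger. The following is only a minimal sketch of that idea, not the code release runner actually uses; `run_logged` and the logger name are hypothetical names made up for illustration.

```python
import logging
import subprocess
import sys

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("release-runner-sketch")


def run_logged(cmd):
    """Run cmd and send each line of its stdout and stderr to the logger.

    Returns the exit code of the command.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors land in the same log
        universal_newlines=True,   # read text lines rather than bytes
    )
    for line in proc.stdout:
        log.info("%s: %s", cmd[0], line.rstrip())
    proc.stdout.close()
    return proc.wait()


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s - %(levelname)s - %(message)s",
    )
    # A stand-in command; a real caller would pass something like an hg clone.
    code = run_logged([sys.executable, "-c", "print('cloning...')"])
    log.info("exit code: %d", code)
```

Because stderr is merged into stdout, a failure such as the "abort: no suitable response from remote hg!" line above would show up with a timestamp next to the output that preceded it.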
Reporter • Comment 3 • 12 years ago
Release runner hit an ISE 500 on the 24th and e-mailed us to say that it will sleep for 259200 seconds before retry. A few problems here:
1) ISE 500 from the server shouldn't necessarily cause a sleep like that. I think you fixed this already by retrying requests to ship it?
2) Nothing in the server log indicating that it would sleep - you'd only know that if you got the e-mail.
3) More than 259200 seconds (3 days) have gone by since then, and it's still not polling again.
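For point 1, a short retry loop with growing delays is one alternative to a single long sleep (259200 seconds is 3 × 24 × 3600, i.e. three days). The sketch below is an assumption about how such a retry could look, not the fix that landed; `ServerError`, `fetch_with_retry` and the `fetch` callable are hypothetical stand-ins for whatever release runner uses to talk to the server.

```python
import logging
import time

log = logging.getLogger("release-runner-sketch")


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response from the server."""


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    The last failure is re-raised so the caller can decide what to do.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ServerError as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning(
                "attempt %d/%d failed (%s); retrying in %.1f s",
                attempt, attempts, exc, delay,
            )
            sleep(delay)
```

Passing `sleep` as a parameter keeps the function easy to exercise without actually waiting.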
Assignee • Comment 4 • 12 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> Release runner hit an ISE 500 on the 24th and e-mailed us to say that it
> will sleep for 259200 seconds before retry. A few problems here:
> 1) ISE 500 from the server shouldn't necessarily cause a sleep like that. I
> think you fixed this already by retrying requests to ship it?
This should be fixed by bug 833062.
> 2) Nothing in the server log indicating that it would sleep - you'd only
> know that if you got the e-mail.
The attached patch prints a line which goes to logs.
> 3) More than 259200 seconds (3 days) have gone by since then, and it's still
> not polling again.
Hmmm, not sure what happens here... I suspect supervisord is not restarting the process when it exits 0. I'll try to reproduce this.
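As an illustration of the reply to point 2, a line like the following could be written to the log before a long sleep. This is not the attached patch, only a sketch of the idea; `describe_sleep` and `sleep_and_log` are hypothetical names.

```python
import logging
import time
from datetime import timedelta

log = logging.getLogger("release-runner-sketch")


def describe_sleep(seconds):
    """Build a log message giving the sleep in seconds and in readable form."""
    return "sleeping for %d seconds (%s)" % (seconds, timedelta(seconds=seconds))


def sleep_and_log(seconds, sleep=time.sleep):
    """Log the upcoming sleep so it is visible in the log, then sleep."""
    log.warning(describe_sleep(seconds))
    sleep(seconds)
```

For the value from comment 3, `describe_sleep(259200)` produces "sleeping for 259200 seconds (3 days, 0:00:00)".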
Attachment #708142 - Flags: review?(bhearsum)
Reporter • Updated • 12 years ago
Attachment #708142 - Flags: review?(bhearsum) → review+
Assignee • Comment 5 • 12 years ago
I added autorestart=true to the config. It looks like this solves problem 3); at least it worked fine with a simple test script.
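A quick way to confirm that the setting is present is to parse the program section and check the option. The sketch below assumes the section looks like the one from comment 1 with autorestart=true added; `SAMPLE` and `restarts_on_clean_exit` are hypothetical names. It also assumes the behaviour described in the Supervisor 3 documentation, where autorestart=true restarts the process regardless of exit code; the version installed on bm36 may differ.

```python
import configparser

# Assumed shape of the section after the change; only autorestart is new.
SAMPLE = """\
[program:releaserunner]
command=/home/cltbld/release-runner/build-tools/buildfarm/release/release-runner.sh
exitcodes=0
user=cltbld
autorestart=true
"""


def restarts_on_clean_exit(config_text, section="program:releaserunner"):
    """Return True if the section sets autorestart=true.

    With exitcodes=0 and no autorestart=true, an exit code of 0 may be
    treated as expected and the process may not be restarted.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    value = parser.get(section, "autorestart", fallback="")
    return value.strip().lower() == "true"
```

This only checks the configuration text; it does not talk to a running supervisord.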
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter • Updated • 12 years ago
Whiteboard: [kickoff] → [shipit]
Updated • 12 years ago
Product: mozilla.org → Release Engineering