Closed
Bug 505512
Opened 15 years ago
Closed 14 years ago
Make infrastructure related problems turn the tree a color other than red
Categories: Release Engineering :: General, defect, P5
Tracking: Not tracked
Status: RESOLVED FIXED
People
(Reporter: blassey, Assigned: bhearsum)
Attachments (4 files, 4 obsolete files)
- 26.87 KB, patch | catlee: review+ | bhearsum: checked-in+
- 6.72 KB, patch | catlee: review+ | bhearsum: checked-in+
- 59.75 KB, patch | catlee: review+ | bhearsum: checked-in+
- 748 bytes, patch | rail: review+ | mozilla: checked-in+
We started this discussion at the all hands a while back. It seems that we've had a lot of infrastructure-related problems turning the tree red lately, and I'm worried that this is numbing developers to seeing the tree red (the same way random oranges numb us to seeing orange). Fundamentally, the issues also need to be addressed by different people. When the build breaks, the developer needs to either fix or back out his or her patch. When the infrastructure fails, IT or RelEng need to figure out what the issue is and fix it. There is also a different lead time in getting the fix landed (10 seconds to back out versus a week or more for a maintenance window).
It was suggested at the all hands that purple would be the most appropriate color since it is used somewhere else to identify infrastructure issues.
Reporter
Comment 2•15 years ago
I'm going to reopen this bug, since bug 476656 is RESOLVED FIXED and we still turn the tree red for infrastructure exceptions. It sounds like it's a dependency, not a dupe.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Updated•14 years ago
Component: Tinderbox → Release Engineering
Product: Webtools → mozilla.org
QA Contact: tinderbox → release
Version: Trunk → other
Assignee
Comment 3•14 years ago
Yeah, agreed. I haven't seen anyone stepping up to grab this, so I'm bumping the priority.
Priority: -- → P5
Assignee
Comment 4•14 years ago
Going to try to look at this this quarter.
Assignee: nobody → bhearsum
Assignee
Comment 5•14 years ago
This patch depends on the upstream Buildbot patch here (or something like it): http://github.com/bhearsum/buildbot/commit/ddd6cf1dc2436efcb0b3e70161c24fdafc4dcaf4
I haven't tested this patch, but it should catch most of the hg errors that we hit. I was hoping to avoid creating a bunch of custom BuildSteps, but after writing this patch I realized that unless we do, we're going to have to copy and paste all of the calls to regex_log_evaluator around. Regardless of which way we go, the upstream patch would be good to have.
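A minimal sketch of the kind of regex-based log evaluation discussed here, assuming a list of `(regex, result)` tuples that can only escalate a step's result when they match. This is not Buildbot's `regex_log_evaluator`: the result names, the `HG_ERRORS` patterns, `worst_status` and `evaluate_log` are all hypothetical, and the hg error strings are assumed rather than copied from real logs.

```python
import re

# Hypothetical result names, ordered from "worst" to "best".
SUCCESS, WARNINGS, FAILURE, EXCEPTION, RETRY = (
    "success", "warnings", "failure", "exception", "retry")
PRECEDENCE = (RETRY, EXCEPTION, FAILURE, WARNINGS, SUCCESS)


def worst_status(a, b):
    """Return whichever of a and b comes first in PRECEDENCE."""
    for status in PRECEDENCE:
        if status in (a, b):
            return status
    raise ValueError("unknown status: %r / %r" % (a, b))


# Hypothetical hg error patterns: a list of (compiled regex, result) tuples.
HG_ERRORS = [
    (re.compile(r"abort: HTTP Error 5\d\d"), RETRY),
    (re.compile(r"abort: error: Connection (refused|reset)"), RETRY),
    (re.compile(r"abort: HTTP Error 404"), FAILURE),
]


def evaluate_log(exit_code, log_text, regexes):
    """Start from the exit-code result, then let every matching regex
    escalate it (a match never downgrades the result)."""
    result = SUCCESS if exit_code == 0 else FAILURE
    for regex, status in regexes:
        if regex.search(log_text):
            result = worst_status(result, status)
    return result
```

Keeping the patterns as plain data means every hg-using step can share one list instead of each step re-implementing the same checks.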
Attachment #466443 -
Flags: feedback?(catlee)
Comment 6•14 years ago
Comment on attachment 466443 [details] [diff] [review]
simple checking for hg errors
Looks sane, though I think hg_errors needs to be a list of tuples.
Attachment #466443 -
Flags: feedback?(catlee) → feedback+
Assignee
Comment 7•14 years ago
Here's a more polished version of the previous patch. I tested this locally by changing the repo_path of a build to "mozilla-central2" to cause a 404 error. This caused the build to turn purple (http://tinderbox.mozilla.org/MozillaTest/?noignore=1, in the "OS X 10.5.2 mozilla-central build" column) and be retried.
Builds without errors had no change in behaviour.
I can test this on all platforms in staging if desired, but I think it's safe enough to just land. It depends on this upstream commit: http://github.com/buildbot/buildbot/commit/87fbf3d84711a1d3471ddb36fa16e5eae8bc9464.
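The behaviour described above (an hg error turns the build purple and it is retried, while clean builds are unaffected) can be summarised as a small lookup. This sketch assumes the conventional Tinderbox colours and uses hypothetical result names and a hypothetical `report` helper; which results actually trigger a retry is decided by the Buildbot configuration, not by code like this.

```python
# Hypothetical result names mapped to Tinderbox-style colours.
COLOURS = {
    "success": "green",
    "warnings": "orange",
    "failure": "red",       # developer's problem: fix or back out
    "exception": "purple",  # infrastructure problem for RelEng/IT
    "retry": "purple",      # infrastructure problem, build requeued
}


def report(result):
    """Return (colour, should_retry) for a finished build."""
    if result not in COLOURS:
        raise ValueError("unknown result: %r" % (result,))
    return COLOURS[result], result == "retry"
```

The point of the split is that red keeps meaning "a patch broke something", while purple routes the problem to the people who can fix the infrastructure.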
Attachment #466443 -
Attachment is obsolete: true
Attachment #468727 -
Flags: review?(catlee)
Comment 8•14 years ago
Comment on attachment 468727 [details] [diff] [review]
turn builds with hg errors purple (and retry them)
Everything looks ok except for this:
>-class MozillaTryServerHgClone(Mercurial):
>+class MozillaTryServerHgClone(EvaluatingMercurial):
> haltOnFailure = True
> flunkOnFailure = True
>
> def __init__(self, baseURL="http://hg.mozilla.org/", mode='clobber',
> defaultBranch='mozilla-central', timeout=3600, **kwargs):
> # repourl overridden in startVC
> Mercurial.__init__(self, baseURL=baseURL, mode=mode,
> defaultBranch=defaultBranch, timeout=timeout,
You need to update the call to Mercurial.__init__ here I think.
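The reviewer's point is that once the class inherits from `EvaluatingMercurial`, its constructor should upcall to that class rather than straight to `Mercurial`; otherwise whatever `EvaluatingMercurial.__init__` sets up is silently skipped. A minimal sketch with stand-in classes (the bodies of `Mercurial` and `EvaluatingMercurial` here are hypothetical simplifications, not the real Buildbot or buildbotcustom code):

```python
class Mercurial(object):
    """Hypothetical simplification of a Mercurial source step."""
    def __init__(self, baseURL=None, mode="update", defaultBranch=None,
                 timeout=1200, **kwargs):
        self.baseURL = baseURL
        self.mode = mode
        self.defaultBranch = defaultBranch
        self.timeout = timeout


class EvaluatingMercurial(Mercurial):
    """Hypothetical subclass that sets up log evaluation in __init__."""
    def __init__(self, log_eval_func=None, **kwargs):
        Mercurial.__init__(self, **kwargs)
        self.evaluates_logs = True
        self.log_eval_func = log_eval_func


class MozillaTryServerHgClone(EvaluatingMercurial):
    haltOnFailure = True
    flunkOnFailure = True

    def __init__(self, baseURL="http://hg.mozilla.org/", mode="clobber",
                 defaultBranch="mozilla-central", timeout=3600, **kwargs):
        # Upcall to the direct parent, not Mercurial, so that
        # EvaluatingMercurial's own setup is not skipped.
        EvaluatingMercurial.__init__(self, baseURL=baseURL, mode=mode,
                                     defaultBranch=defaultBranch,
                                     timeout=timeout, **kwargs)
```

Calling `Mercurial.__init__` directly would still populate `mode` and `timeout`, which is why this kind of bug is easy to miss: only the evaluation setup goes missing.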
Attachment #468727 -
Flags: review?(catlee) → review+
Assignee
Comment 9•14 years ago
We're upgrading the masters to the new Buildbot today, this patch is going to land along with that.
Assignee
Comment 10•14 years ago
Comment on attachment 468727 [details] [diff] [review]
turn builds with hg errors purple (and retry them)
changeset: 917:0ba8a3c89102
Attachment #468727 -
Flags: checked-in+
Assignee
Comment 11•14 years ago
Comment on attachment 468727 [details] [diff] [review]
turn builds with hg errors purple (and retry them)
Got backed out due to errors in the upstream patch.
Attachment #468727 -
Flags: checked-in+ → checked-in-
Assignee
Comment 12•14 years ago
Turns out I forgot to commit the fixes to the upcall you suggested. This patch fixes that.
I'll be attaching the upstream diff that we need as well.
Attachment #468727 -
Attachment is obsolete: true
Attachment #469523 -
Flags: review?(catlee)
Assignee
Comment 13•14 years ago
Here are all the upstream changesets we need to make this work bug-free:
http://github.com/buildbot/buildbot/commit/5764bd6edf7b639fb91bfe0e5732aefbf0bb6c5e
http://github.com/buildbot/buildbot/commit/87fbf3d84711a1d3471ddb36fa16e5eae8bc9464
http://github.com/buildbot/buildbot/commit/548d1ace6115c070b4659917536ea7e37e7aa31d
http://github.com/buildbot/buildbot/commit/9b5af09f7fa776ef2fad91c2e58dd2e6a4dde4d5
I've tested this and the buildbotcustom patch in staging. You can see the results on MozillaTest under "OS X 10.6.2 mozilla-central build". twistd.log was clear of relevant exceptions (there were some DB ones and a bunch of HTTP 403s from trying to poll shadow-central).
Attachment #469565 -
Flags: review?(catlee)
Assignee
Updated•14 years ago
Updated•14 years ago
Attachment #469523 -
Flags: review?(catlee) → review+
Comment 14•14 years ago
Comment on attachment 469565 [details] [diff] [review]
round up of upstream changesets we need
Looks good. Can you write some tests upstream for this?
Attachment #469565 -
Flags: review?(catlee) → review+
Assignee
Updated•14 years ago
Priority: P5 → P3
Assignee
Comment 15•14 years ago
Comment on attachment 469523 [details] [diff] [review]
updated buildbotcustom patch
changeset: 950:c5881ee2525a
Attachment #469523 -
Flags: checked-in+
Assignee
Comment 16•14 years ago
Comment on attachment 469565 [details] [diff] [review]
round up of upstream changesets we need
Landed across:
changeset: 91:da98221aa3bb
and
changeset: 92:5e4ed40eafd2
Attachment #469565 -
Flags: checked-in+
Assignee
Comment 17•14 years ago
Updated Buildbot on the masters with:
cd ~cltbld/buildbot
hg pull
hg up
unset PYTHONHOME
cd master
/tools/buildbot-0.8.0/bin/python setup.py install
Assignee
Comment 18•14 years ago
This looks like it's going to stick. I successfully made a build retry after an hg error. (I faked it by causing a 404, which made me realize we shouldn't be retrying on 404s.)
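The realization about 404s suggests splitting HTTP errors by class: server-side (5xx) errors are plausibly transient infrastructure problems worth retrying, while a 404 usually means a wrong repository path, which a retry will not fix. A hypothetical classifier, limited to HTTP status codes and assuming hg reports them as `abort: HTTP Error NNN` (an assumed message shape, not taken from this bug):

```python
import re

# Assumed shape of hg's HTTP failure message, e.g.
# "abort: HTTP Error 500: Internal Server Error"
HTTP_ERROR = re.compile(r"abort: HTTP Error (\d{3})")


def should_retry(log_text):
    """Retry only on 5xx responses; 4xx responses (including 404) are
    treated as real failures, since repeating the request won't help."""
    match = HTTP_ERROR.search(log_text)
    if match is None:
        return False
    return 500 <= int(match.group(1)) < 600
```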
Priority: P3 → P5
Comment 20•14 years ago
The L? link is almost invisible against that dark purple background. Adding the following to userContent.css made it much more distinct:
/*
* make text legible on purple boxes of tinderbox.mozilla.org
*/
@-moz-document domain(tinderbox.mozilla.org) {
  td[bgcolor="770088"] {
    background-color: #F4F !important;
  }
}
Assignee
Comment 21•14 years ago
(In reply to comment #20)
> The L? link is almost invisible against that dark purple background. Adding the
> following to userContent.css made it much more distinct:
>
> /*
> * make text legible on purple boxes of tinderbox.mozilla.org
> */
> @-moz-document domain(tinderbox.mozilla.org) {
> td[bgcolor="770088"]
> { background-color: #F4F !important
> }
> }
This bug is tracking the Buildbot integration bit, but I filed bug 593341 for this.
Assignee
Comment 22•14 years ago
There's quite a bit going on here:
- Replace the ShellCommand/SetProperty/Trigger/Mercurial steps with our own versions
- Get rid of the now-superfluous EvaluatingMercurial
- Delete the unused l10n.py
- Fix some classes to use super_class
- Fix a bunch of subclasses' evaluateCommand to upcall properly
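Upcalling properly means a subclass's `evaluateCommand` starts from its parent's verdict and only adds to it, rather than computing a result from scratch and discarding the parent's checks. A self-contained sketch, assuming simplified stand-ins for the command object and base step; `FakeCommand`, `DiskAwareShellCommand`, `CompileStep` and the choice to map a full disk to an exception result are all hypothetical:

```python
import re

SUCCESS, WARNINGS, FAILURE, EXCEPTION, RETRY = (
    "success", "warnings", "failure", "exception", "retry")
PRECEDENCE = (RETRY, EXCEPTION, FAILURE, WARNINGS, SUCCESS)


def worst_status(a, b):
    """Return whichever of a and b comes first in PRECEDENCE."""
    for status in PRECEDENCE:
        if status in (a, b):
            return status
    raise ValueError("unknown status: %r / %r" % (a, b))


class FakeCommand(object):
    """Hypothetical stand-in for a finished remote command."""
    def __init__(self, rc, output):
        self.rc = rc
        self.output = output


class ShellCommand(object):
    """Simplified stand-in for a base step class."""
    def evaluateCommand(self, cmd):
        return SUCCESS if cmd.rc == 0 else FAILURE


class DiskAwareShellCommand(ShellCommand):
    """Adds a global out-of-disk-space check on top of the parent."""
    DISK_FULL = re.compile(r"No space left on device")

    def evaluateCommand(self, cmd):
        # Upcall first so the parent's rules still apply...
        result = ShellCommand.evaluateCommand(self, cmd)
        # ...then only ever escalate the result.
        if self.DISK_FULL.search(cmd.output):
            result = worst_status(result, EXCEPTION)
        return result


class CompileStep(DiskAwareShellCommand):
    """A further subclass that also upcalls, so the disk check survives."""
    def evaluateCommand(self, cmd):
        result = DiskAwareShellCommand.evaluateCommand(self, cmd)
        if "warning:" in cmd.output:
            result = worst_status(result, WARNINGS)
        return result
```

If `CompileStep` skipped the upcall, the disk-space check would quietly stop applying to every compile step, which is the class of bug this patch cleans up.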
I'm testing this along with the upstream changeset noted in bug 595027 on staging build & test masters now. So far, so good.
Attachment #474768 -
Flags: review?(catlee)
Comment 23•14 years ago
Comment on attachment 474768 [details] [diff] [review]
catch out of disk space globally, purge errors, fix hg errors
As per IRC, the patch needs to be updated with the new Mercurial stuff.
Attachment #474768 -
Flags: review?(catlee)
Assignee
Comment 24•14 years ago
This is the same as the last patch, modulo the Mercurial stuff and fixing up DisconnectStep to use super_class; I had a reconfig issue with it in staging.
Attachment #474768 -
Attachment is obsolete: true
Attachment #475100 -
Flags: review?(catlee)
Updated•14 years ago
Attachment #475100 -
Flags: review?(catlee) → review+
Assignee
Comment 25•14 years ago
Sorry to throw up yet another version of this, Chris, but I hit some reconfig issues today and wanted to make sure we avoid them in production. Specifically, I had issues with CompareBloatLogs, but I applied the super_class workaround to pretty much everything. Nothing else changed in this patch.
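One way `super()` can break across a reconfig: the explicit Python 2 form `super(ClassName, self)` looks up `ClassName` as a module global at call time. If a reconfig reloads the module, that name is rebound to a new class object, and instances created before the reload are no longer instances of it, so `super()` raises `TypeError`. Keeping a direct reference to the parent class avoids the lookup. The sketch below simulates the reload by rebinding globals; it illustrates the idea, and the exact shape of buildbotcustom's `super_class` workaround is assumed here rather than quoted. (Python 3's zero-argument `super()` binds the defining class at compile time and does not hit this particular problem.)

```python
class BaseStep(object):
    def describe(self):
        return "base"


class FragileStep(BaseStep):
    def describe(self):
        # Resolves the global name FragileStep at call time.
        return super(FragileStep, self).describe()


class RobustStep(BaseStep):
    # Hypothetical workaround: remember the parent class directly.
    super_class = BaseStep

    def describe(self):
        return self.super_class.describe(self)


def simulate_reload():
    """Create instances, then rebind the module-level class names to
    brand-new class objects, roughly what reloading a module does."""
    global FragileStep, RobustStep
    old_fragile, old_robust = FragileStep(), RobustStep()
    FragileStep = type("FragileStep", (BaseStep,), {})
    RobustStep = type("RobustStep", (BaseStep,), {})
    return old_fragile, old_robust
```

After `simulate_reload()`, the old `RobustStep` instance still works, while calling `describe()` on the old `FragileStep` instance raises `TypeError`.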
Attachment #475100 -
Attachment is obsolete: true
Attachment #476319 -
Flags: review?(catlee)
Updated•14 years ago
Attachment #476319 -
Flags: review?(catlee) → review+
Assignee
Comment 26•14 years ago
This patch is ready to go; I've run it in staging for a long time without issue. It will land in the next RelEng downtime.
Assignee
Updated•14 years ago
Blocks: releng-downtime
Assignee
Updated•14 years ago
Flags: needs-treeclosure+
Assignee
Comment 27•14 years ago
Comment on attachment 476319 [details] [diff] [review]
fix a bunch more potential reconfig issues
Landed in 1fd614e8c662 and 17a88ee7a7aa.
I'm going to consider this bug fixed now; we don't catch all infrastructure errors yet but the framework is there to easily add more. We'll do those in follow-up bugs.
Attachment #476319 -
Flags: checked-in+
Assignee
Updated•14 years ago
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Resolution: --- → FIXED
Comment 28•14 years ago
Attachment #476319 [details] [diff] tries to execute self.finished(EXCEPTION), but EXCEPTION is not imported:
2010-10-10 15:32:14-0700 [HTTPPageGetter,client] Unhandled Error
Traceback (most recent call last):
File "/builds/buildbot/builder-master/sandbox/lib/python2.5/site-packages/Twisted-10.1.0-py2.5-linux-i686.egg/twisted/internet/defer.py", line 441, in _runCallbacks
self.result = callback(self.result, *args, **kw)
File "/builds/buildbot/builder-master/sandbox/lib/python2.5/site-packages/Twisted-10.1.0-py2.5-linux-i686.egg/twisted/internet/defer.py", line 664, in _cbDeferred
self.callback(self.resultList)
File "/builds/buildbot/builder-master/sandbox/lib/python2.5/site-packages/Twisted-10.1.0-py2.5-linux-i686.egg/twisted/internet/defer.py", line 318, in callback
self._startRunCallbacks(result)
File "/builds/buildbot/builder-master/sandbox/lib/python2.5/site-packages/Twisted-10.1.0-py2.5-linux-i686.egg/twisted/internet/defer.py", line 424, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/builds/buildbot/builder-master/sandbox/lib/python2.5/site-packages/Twisted-10.1.0-py2.5-linux-i686.egg/twisted/internet/defer.py", line 441, in _runCallbacks
self.result = callback(self.result, *args, **kw)
File "/builds/buildbot/buildbotcustom/buildbotcustom/steps/test.py", line 489, in postFinished
self.finished(EXCEPTION)
exceptions.NameError: global name 'EXCEPTION' is not defined
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 29•14 years ago
This patch fixes the missing import. I couldn't find any other files that were also missing imports.
Attachment #482531 -
Flags: review?(rail)
Updated•14 years ago
Attachment #482531 -
Flags: review?(rail) → review+
Assignee
Updated•14 years ago
Flags: needs-treeclosure+ → needs-reconfig?
Updated•14 years ago
Flags: needs-reconfig? → needs-reconfig+
Comment 30•14 years ago
Comment on attachment 482531 [details] [diff] [review]
fix missing import
http://hg.mozilla.org/build/buildbotcustom/rev/31a22bd3816e
Attachment #482531 -
Flags: checked-in+
Assignee
Comment 31•14 years ago
Masters have been updated with the last patch. Let's track further issues in follow-up bugs.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Updated•14 years ago
Flags: needs-reconfig+
Updated•11 years ago
Product: mozilla.org → Release Engineering