Closed Bug 852357 Opened 11 years ago Closed 7 years ago

Reporting System for Build/Infrastructure Issues

Tracking

(Not tracked)

Status:

RESOLVED INCOMPLETE

People

(Reporter: k0scist, Unassigned)

Details

(Whiteboard: [buildfaster:?])

Jeff Hammel

Reporter

Description

•

11 years ago

Reporting System for Build/Infrastructure Issues

CC: :bhearsum, :edmorley, :RyanVM, :jgriffin, :jmaher
Whiteboard: [buildfaster:?]

As part of builds (in the buildbot sense used here, so this could
also be a test run), the slave environment is introspected and
modified to set up (and sanity-check) the build steps. E.g. an existing
hg clone could be updated, or, given insufficient disk space or a bad
repo state, cloned afresh.

With our existing infrastructure, if there are issues with setup
steps, there are basically two (easy) possibilities:

1. turn the job orange - you broke it!
2. print to TinderboxPrint - but it is likely no one would see it

For cases where the build may safely proceed (a non-preferred fallback
was used, an actual result differs from the expected one, or statistics
such as timings are being gathered for later action), turning the job
orange may be overkill, yet it may be desirable to note the issue
somewhat more visibly than via TinderboxPrint.

Disclaimer: this is a blue sky idea; it is also not (necessarily) a
trivial project.  I also don't know to what extent this already exists
in the build system or to what extent it is cared about.  This is a
rough proposal at best, if it's useful, not a request.

Possible use cases:
* noting issues that may require machine reimaging/maintenance
  (e.g. disk space)
* noting systematic timings/slowdowns (and other machine stats)
* noting prevalence of non-fatal problems or potential problems
* noting (excessive?) number of retries (e.g. hg.m.o timeouts)
* noting slow downloads
* noting when a fallback method is used that takes longer than the
  preferred case or is otherwise less desirable
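As an illustration of the retry and timing use cases above, here is a rough sketch of a wrapper that counts attempts and measures elapsed time so that a caller could report them. The function name `run_with_retries`, its parameters, and the 60 second threshold are assumptions made for the example, not existing automation code.

```python
import time


def run_with_retries(operation, max_attempts=3, slow_after_seconds=60.0,
                     clock=time.monotonic):
    """Run operation(), retrying on exception; return (value, stats).

    stats records attempts and elapsed time so a caller can report
    excessive retries or slow runs without failing the job.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            value = operation()
            break
        except Exception:
            if attempts >= max_attempts:
                raise  # out of retries: this one really is fatal
    elapsed = clock() - start
    stats = {
        "attempts": attempts,
        "elapsed_seconds": elapsed,
        "retried": attempts > 1,
        "slow": elapsed > slow_after_seconds,
    }
    return value, stats
```

A job that succeeds on the third attempt stays green, but `stats` still carries the information that two retries were needed.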

The no-tech (or at least no-infra) solution is to have each particular
piece that is cared about generate and send notification.  A more
complete solution would entail a universal way of noting that there is
an issue (and what it is) as well as a place to put it. Note that
while the no-tech solution is easy for a particular case, multiple cases
will involve copy+pasting code and will probably discourage notifying
on a particular issue since each is roll-your-own.  At the other end
of the spectrum, a precisely tailored solution will involve an
excessive amount of time to spec and craft.  Both extremes give
perfect elasticity, though some middle ground is likely more pragmatic
in terms of overall gain.
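One possible shape for the middle ground, sketched under assumptions: a single issue record plus a reporter that fans issues out to pluggable sinks, so individual build steps do not each roll their own notification. `Issue`, `IssueReporter`, `log_line`, and `to_json` are hypothetical names, and the field set is only a guess at what would be useful.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Issue:
    kind: str      # e.g. "disk_space", "retries" (hypothetical categories)
    message: str
    machine: str
    job: str


class IssueReporter:
    """Collects non-fatal issues and fans them out to registered sinks."""

    def __init__(self):
        self._sinks = []
        self.issues = []

    def add_sink(self, sink):
        self._sinks.append(sink)

    def note(self, issue):
        self.issues.append(issue)
        for sink in self._sinks:
            try:
                sink(issue)
            except Exception:
                # Reporting must never break the build itself.
                pass


def log_line(issue):
    """Format an issue as a TinderboxPrint-style log line."""
    return "TinderboxPrint: [%s] %s" % (issue.kind, issue.message)


def to_json(issue):
    """Serialize an issue for a sink that uploads or POSTs it somewhere."""
    return json.dumps(asdict(issue), sort_keys=True)
```

A sink is just a callable, so printing to the log would be `reporter.add_sink(lambda i: print(log_line(i)))`, and a sink that sends `to_json(issue)` to some service could be added later without touching the build steps.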

Noting the issue could be done in any number of ways, for example
scanning the logs, POSTing to some service (e.g. bugzilla), emailing
some parties, uploading a file (somewhere), pulse, or leveraging
TinderboxPrint or similar and/or TBPL 2.0 equivalent thereof via an
additional piece.

The place to put it could be Bugzilla (likely, since it is our issue
tracker, whether or not it is ideal for this particular purpose, and
this class of bugs could be harvested to make an additional
dashboard), a mailing list, Nagios, or a yet-to-exist web service.

IFF this is something worth doing, first steps would be prioritizing
based on added value and deciding the actual form of the solution
based on a convolution of (need) and (bang for the buck).

Idea from https://bugzilla.mozilla.org/show_bug.cgi?id=851270#c31

Jeff Hammel

Reporter

Updated

•

11 years ago

Whiteboard: [buildfaster:?]

Ed Morley [:emorley]

Comment 1

•

11 years ago

The new treeherder generic metadata fields sound like a good place to store this; all we then need is a UI for it, separate from the normal treeherder-ui view.

Nobody; OK to take it and work on it

Assignee

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Chris AtLee [:catlee]

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INCOMPLETE

Nobody; OK to take it and work on it

Assignee

Updated

•

6 years ago

Component: General Automation → General


Categories

(Release Engineering :: General, defect)
