Closed Bug 601028 (waronorange) Opened 10 years ago Closed 6 years ago

Create Hbase system for buildbot logs in order to analyze intermittent failure data

Categories

(Testing :: General, defect)


Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: cmtalbert, Unassigned)

We want to create an HBase log storage system where we can store all our buildbot logs so that they can be queried and analyzed.  The first thing we will analyze them for is data on the intermittent failures that we see in the automated test framework.  We will create a dashboard to show the frequency of these failures as a whole over time, as well as the frequency of any single failure over time.  We can use this system to help drive an effort to fix the intermittent failures.

= The Steps we need to do (these may become dependent bugs of this one at some point) =
* Get the log slurper to slurp logs from FTP into HBase - this needs to happen ASAP, because every day we lose data to the aging policy of the buildbot FTP area
* Create log parser to parse logs and output interesting information that we'd like to query for later - this is nearly done, you can see the output of the log parser here: http://people.mozilla.org/~jgriffin/logs, code at http://hg.mozilla.org/automation/logparser
* Create a dashboard atop this that tracks frequency data for intermittent test failures.  We are working off our preliminary dashboard at http://jmaher.couchone.com/orange_factor/_design/woo/orange.html and experimenting with the calculations needed to flag significant changes in the frequency of intermittent failures
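
As a rough illustration of the last step, here is a minimal sketch of one way to count failures per day and flag a jump in frequency. The actual calculation was still being worked out when this bug was filed, so everything here is an assumption: the function names, the (day, test name) record shape, the 7-day windows, and the 2x ratio threshold are hypothetical and do not come from the log parser or the dashboard.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(failures, test_name=None):
    """Count failures per day; restrict to one test when test_name is given.

    `failures` is an iterable of (day, test_name) pairs (assumed shape).
    """
    return Counter(
        day for day, name in failures
        if test_name is None or name == test_name
    )


def window_mean(counts, end, days):
    """Mean failures per day over the `days` days ending at `end`, inclusive."""
    total = sum(counts.get(end - timedelta(days=i), 0) for i in range(days))
    return total / days


def flag_change(counts, today, days=7, ratio=2.0):
    """True when the recent window's mean is at least `ratio` times the
    mean of the window just before it (threshold is an arbitrary choice)."""
    recent = window_mean(counts, today, days)
    previous = window_mean(counts, today - timedelta(days=days), days)
    if previous == 0:
        # No earlier failures: any recent failure counts as a change.
        return recent > 0
    return recent / previous >= ratio
```

A comparison of adjacent windows is about the simplest possible test; a real dashboard would probably also want to normalize by the number of test runs per day, which this sketch ignores.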

Once we have this set of code in place, we'll close this bug and open new bugs to track features against the existing system. We'll use the system to help drive an effort to eliminate/reduce the number and frequency of intermittent test failures.

This effort is also called "The War on Orange" (orange refers to the color of the test failure indicator on our build & test tracker: http://tests.themasta.com/tinderboxpushlog/).

Feel free to treat this bug as a tracking bug for work to get us to that initial system rollout.
Can we, either additionally or just plain, upload the real buildbot logs?

The mangling we're doing to send those logs to tinderbox is a bug based on historical constraints and not a promising way forward, IMHO.
Depends on: 601216
Depends on: 600413
(In reply to comment #1)
> Can we, either additionally or just plain, upload the real buildbot logs?
> 
> The mangling we're doing to send those logs to tinderbox is a bug based on
> historical constraints and not a promising way forward, IMHO.

Can you point at this mangling so we can see what you're talking about?
Well, hardly, as we throw that data away rather quickly, if you can access it at all (it requires more or less tough VPNs).

Anyway, in buildbot, the logs are:

- one instance per step
- log is interleaved chunks of data, keeping apart "headers", stdout, stderr.

Headers, for example, are the environment dumps we have all over our logs.
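
To make that structure concrete, here is a small sketch of a per-step log that keeps the three kinds of chunks apart while preserving their interleaved order. It is only a model of the description above, not buildbot's own classes or on-disk format; `Channel`, `StepLog`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Channel(Enum):
    """The three kinds of chunk described above (names are illustrative)."""
    HEADER = "header"
    STDOUT = "stdout"
    STDERR = "stderr"


@dataclass
class StepLog:
    """One log instance for one build step."""
    step_name: str
    # (Channel, text) pairs kept in arrival order, so the interleaving survives.
    chunks: list = field(default_factory=list)

    def add(self, channel, text):
        self.chunks.append((channel, text))

    def text(self, *channels):
        """Join chunks from the given channels in their original order.

        With no channels given, return everything.
        """
        wanted = set(channels) or set(Channel)
        return "".join(t for c, t in self.chunks if c in wanted)
```

With the channels kept apart like this, a parser could skip the environment dumps by asking for stdout and stderr only, which is not possible once everything has been flattened into a single stream.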
Um, I think this was solved with OF; can we mark this as WFM now?

We also have the data in Treeherder, so there are lots of ways to get our logs and do interesting things with them.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME