Document the types of sources for tinderbox failures

RESOLVED FIXED

Status

Release Engineering
General
P3
major
RESOLVED FIXED
10 years ago
5 years ago

People

(Reporter: dholbert, Assigned: coop)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(URL)

(Reporter)

Description

10 years ago
When a tinderbox log has status "testfailed" (i.e. when it's orange), it should print some unique string* near the line that it considers a failure.

An example unique string would be "TINDERBOX_TEST_FAILED".  It also doesn't necessarily need to print it *on* the testfailure line, but perhaps at the bottom of that test's section.  (e.g. if a reftest fails, print TINDERBOX_TEST_FAILURE in the summary below the reftest section.  Similarly for mochitests, UI tests, leaks, etc.)

This would make it SO much easier to diagnose tinderbox oranges.
(Reporter)

Comment 1

10 years ago
Here's an example of an orange 'testfailed' log that I still can't find the error in:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1214836575.1214845810.6373.gz

I used a merge tool to compare this log to the last green log before it...

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1214782511.1214791900.17452.gz

... and I could not immediately find any significant differences.  All the differences seem to be due to a) different components being recompiled, b) different datestamps & time values, or c) different memory addresses.

I challenge anyone to quickly spot the cause of this orange... I sure as heck couldn't.  It'd be so much easier if I could just do a quick search for a particular string to at least get me to the relevant section where a failure is occurring.
(Reporter)

Comment 2

10 years ago
Here are some other recent orange logs that required much more inspection than should be necessary.

These associated error messages are marked with "FAIL", "WARNING", and "ERROR FAIL".  Note that these words all occur normally in many other places within the tinderbox logs (with the exception of the full phrase "error fail"), and they are by no means always present when failures occur, so they're not particularly helpful as markers.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1214159640.1214160970.30815.gz
("FAIL: failed to initialize browser" at bottom of log)

http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla2/1214690238.1214697033.433.gz
("WARNING leaked 1 instance of nsSimpleNestedURI with size 64 bytes" at bottom of log)

http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla2/1213892192.1213898469.19535.gz
("ERROR FAIL leaked ..." towards bottom of log, but not all the way at the bottom)
We have an errorparser that is supposed to highlight test failure lines, but some bugs have cropped up lately. See for example:
bug 441550
bug 438324
Notably, the latter two of what you pasted there should be covered by bug 441550. I've never seen the first one, we could get a bug on file for that.
Oh, that first one is a Talos failure, so it's not using the unittest error parser. It's getting parsed by this: http://mxr.mozilla.org/mozilla/source/webtools/tinderbox/ep_unix.pl

So the solution here is to use "Error: " to mark error lines. You could file a bug on Talos about that, or you could file a bug about getting a new talos error parser.
(Reporter)

Comment 6

10 years ago
(In reply to comment #3)
> We have an errorparser that is supposed to highlight test failure lines, but
> some bugs have cropped up lately. See for example:
> bug 441550
> bug 438324

Even with a working error parser, I tend to think we should still mark up error failure lines (or sections) with some unique string.

The code that determines the log status ("testfailure" vs "success") can and should be marking the lines that trigger its decision, IMHO, for easier understandability by humans *and* errorparsers.
Buildbot determines the test failure by scanning the log after the fact. If you want a unique string there, you should try to get the various test suites to agree on unique failure strings.
(Reporter)

Comment 8

10 years ago
(In reply to comment #5)
> So the solution here is to use "Error: " to mark error lines.

... except that there are already many occurrances of this string (and other
similar ones) in *successful* logs.

e.g. try searching for "Error: " in the successful log:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1214782511.1214791900.17452.gz

There are 86 hits, of this form:

*** 17027 INFO expected error in todo testcase | Test threw exception:
TypeError: entReference is null
(Reporter)

Comment 9

10 years ago
(In reply to comment #7)
> Buildbot determines the test failure by scanning the log after the fact.

Right, and for each section of the log, Buildbot has *some* way of determining whether that section had a failure.  Right?  So why can't it just add an annotation when it makes that determination?

> If you
> want a unique string there, you should try to get the various test suites to
> agree on unique failure strings.

I agreed that this would be ideal, but I'm not sure that's feasible at this point.

I think it'd be *so much* easier to have Buildbot, which already knows how to check each test suite for failure, just add an annotation when it determines one has failed. Right?
(In reply to comment #9)
> (In reply to comment #7)
> > Buildbot determines the test failure by scanning the log after the fact.
> 
> Right, and for each section of the log, Buildbot has *some* way of determining
> whether that section had a failure.  Right?  So why can't it just add an
> annotation when it makes that determination?

I think it's bogus to have the program running the tool stick extra things into the log. That will lead to confusion with people not knowing where the output comes from. We already control the test output and the error parsers in tinderbox, we just have some impedance mismatch here. I don't think making buildbot stick extra lines in there is the right solution. Making our test output more consistent is a noble goal.
(In reply to comment #8)
> (In reply to comment #5)
> > So the solution here is to use "Error: " to mark error lines.
> 
> ... except that there are already many occurrances of this string (and other
> similar ones) in *successful* logs.
> 
> e.g. try searching for "Error: " in the successful log:
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1214782511.1214791900.17452.gz
 

That's a unit test box, and it's using a different error parser:
http://mxr.mozilla.org/mozilla/source/webtools/tinderbox/ep_unittest.pl
(note that this was one of the reasons I wrote the unit test parser in the first place.) You can see what error parser the box is requesting from the top of the log file.
(Reporter)

Comment 12

10 years ago
(In reply to comment #11)
> That's a unit test box, and it's using a different error parser:
> http://mxr.mozilla.org/mozilla/source/webtools/tinderbox/ep_unittest.pl
> (note that this was one of the reasons I wrote the unit test parser in the
> first place.) You can see what error parser the box is requesting from the top
> of the log file.

Ah, sorry -- I misinterpreted "the first one" in comment 5 to refer to the original log I linked in comment 1, rather than the first link in comment 2.  I see what you mean now.

I still stand by comment 8 though -- I think "Error: " is a bad unique identifier for test failures in logs, given that it already occurs so frequently in some (if not all) logs.  Though maybe it's better than nothing.
(Reporter)

Comment 13

10 years ago
(In reply to comment #10)
> I think it's bogus to have the program running the tool stick extra things into
> the log. That will lead to confusion with people not knowing where the output
> comes from.

I see your point... Perhaps this could be partly alleviated by only inserting annotations if a special runtime flag is used?

If a runtime flag is explicitly required for this to happen, and if the inserted text is unique enough (something like "BUILDBOT_TEST_FAILURE"?), I don't see why there'd be much confusion.

> Making our test
> output more consistent is a noble goal.

Agreed. 
I think this is WFM after the changes in bug 443090, but that only covered unit tests, not Talos, so it's up to you. You can now search a log for TEST-UNEXPECTED- to find any unit test failure (including mochitest leaks).
(Reporter)

Comment 15

10 years ago
I still think it'd be awesome to have either:
  a) one unique string that uniquely identifies orange-triggering lines/sections, across all tinderboxen (talos included)
**or, (if not that):**
  b) a documentation page, linked off of Tinderbox, with an up-to-date list of the 3-5 unique strings for the different things that might cause oranges. ("TEST-UNEXPECTED-", "ERROR FAIL", ...)

This would make diagnosing oranges much less annoying.
(Reporter)

Comment 16

10 years ago
(In reply to comment #14)
> I think this is WFM after the changes in bug 443090, but that only covered unit
> tests, not Talos, so it's up to you.

I'd prefer to leave it open until either (a) or (b) from comment 15 (or something else entirely that's determined to be better) is in place.
Ok. I don't know anything about Talos, so you'd have to ask alice about that. Unit tests now only require you to search for one string, so I'm happy with that.
(Reporter)

Comment 18

10 years ago
(cc'ing Alice per comment 17)
Alice -- do you know if there's any work underway to make consistent & unique labels for Talos errorlog lines that cause orange-failures?

Or, if the labels are already pretty consistent/unique there (I don't know much about talos), could you give a list of the strings that would uniquely identify these lines?

For background, see specifically Comment 2's first example, comment 4-5, and comment 14-15.
Talos has lines that result in tinderbox orange start with 'FAIL: '.  All the error handling in talos goes through one central error handler, so this is consistently reported.

I guess that what might be missing is adding some more notation to situations that cause an error to be thrown (like last web site loaded, etc).  Right now, since talos calls it quits once a single error has been thrown, you can just go the end of the log to see the last thing that it was testing before it quit.

Comment 20

10 years ago
Bouncing to m.o for documentation updates as it doesn't sound like you're going to get all tinderbox clients or equivalents to use a single string for failures other than 'Error:'.
Assignee: nobody → server-ops
Component: Tinderbox → Server Operations: Tinderbox Maintenance
Product: Webtools → mozilla.org
QA Contact: tinderbox → mrz
Version: Trunk → other
Assignee: server-ops → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
QA Contact: mrz → release
(Assignee)

Updated

10 years ago
Assignee: nobody → ccooper
Priority: -- → P3
(Assignee)

Comment 21

10 years ago
It would be nice to converge on common error strings for all the test harnesses, but I think it's a sucker's game. Everytime we add a new test suite, we'll need to do more work. Our parser should be doing the work (and is!), although I agree with Ted that I do *not* want the parser also adding data to the stream. That gets confusing quick.

Given that, where's the right place for the documentation page suggested in comment 15 b to live? There are some existing build pages for tinderbox in wiki.m.o, but somewhere under http://developer.mozilla.org/En/Tinderbox might make more sense. 

In the meantime, I'll start tracking down error messages or contributors for same.
(Assignee)

Comment 22

10 years ago
I've setup https://wiki.mozilla.org/Build:TinderboxErrors to track tinderbox errors/warnings and added some starter content for unittests. I've tried to set it up so error classes can be quickly linked to from the table of contents.

Alice: can I get you to add some Talos error classes please?
I'd added in the Talos errors that are reported to the waterfall - is there anything else that would be of interest?  There is sometimes additional information in the log itself (such as the last page successfully loaded before a browser crash).
(Assignee)

Comment 24

10 years ago
(In reply to comment #23)
> I'd added in the Talos errors that are reported to the waterfall - is there
> anything else that would be of interest?  There is sometimes additional
> information in the log itself (such as the last page successfully loaded before
> a browser crash).

Addition info for debugging is already helpful. If that info already exists elsewhere, a link would be fine.
(Assignee)

Comment 25

10 years ago
We have a place for this error data to live now. I'll try to be vigilant about getting people to keep it up-to-date.

Lukas: if you have time to flesh out with the unit test section, that would be great.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Reporter)

Comment 26

10 years ago
Thanks, Chris & Alice!

(Updating bug summary to reflect the final resolution of this bug)
Summary: When tinderbox is orange, it should identify the line(s) that causes orange with a *unique string* → Document the types of sources for tinderbox oranges
(Reporter)

Updated

10 years ago
Summary: Document the types of sources for tinderbox oranges → Document the types of sources for tinderbox failures
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.