Closed Bug 443329 Opened 16 years ago Closed 14 years ago

Consolidated test failure reporting

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtschrep, Assigned: murali)

References

Details

(Keywords: intermittent-failure)

Attachments

(5 files, 6 obsolete files)

(spinoff of Bug 443090)

Bug 443090 should be dedicated to unifying the logging of our test suites to make them easier to parse. This bug will be about getting those test failures into one unified log to make them easier to analyze.

This requires:

1) Scraping all of the build logs
2) Taking any test failures and unifying them into something like:

BUILDID/BUILDTIME | TESTTIME | TESTNAME | MACHINE(from this we can map to OS) | MSG | link to full log

A single text file per tree (accessible over HTTP) with 1 line per test failure would be fantastic.  I'll take other solutions if they are easier... 

Lukas you are already doing most of this so I'll send this your way :-)
Blocks: 443323
FWIW, this will go a long way towards fixing some of the issues in bug 431900 that prevent developers from caring about the output of these boxes. If it's easier o find real failures, these machines may become useful again.
This would be huge help for two (related) problems I've had to deal with:

1) Figuring out if a failure is just a 1-off glitch, or something that happens repeatedly (albeit infrequently). Bug 433097 is an example of something that seemed to just happen briefly, but it's hard to have high confidence that it was really a one-time glitch without spending an inordinate amount of time trolling through logs.

2) Figuring when an intermittent test failure started. Example: bug 425987 was a problem that grew worse slowly over time, and eventually the box started failing constantly. Pulling data out of historical logs was a huge pain. In this case the result was a trend, but in other cases it might help point to a specific regression range.

Similarly, having a single log that's easily accessible would make it possible to spot patterns of failures earlier. I suspect we miss quite a few things, because most people only closely watch logs when they're checking in or sherriffing, and so it can take a while before we notice that a 1-time glitch has been happening daily.
Haven't gotten to work on this in a long time.  Back in the pool.
Assignee: lukasblakk → nobody
John Ford is taking a look at this. Also, linked to bug#438871 which gives more examples of intermittent tests.
Assignee: nobody → johnhford
Blocks: 438871
I have investigated two possibilities for adding the test time information to log output.  One was modifying BuildBot and the other was modifying make files to spit include timing information.  I have written and will attach a patch which adds timestamps to xpcshell based tests.  I haven't yet gotten browser/components/feeds/ tests to spit out timestamps, though I have written some JS code that will work with the timestamp formatting string I am using.  I was also thinking to control the output, it might be a good idea to write my own timestamp generator so I can adapt it for JS, Python, etc.
Attached file This is a sample log from this patch (obsolete) —
This is the output from make -k check within my objdir.  It is not the final output.
Attachment #362234 - Attachment mime type: application/octet-stream → text/plain
Attached patch Cleaned up patch v2 (obsolete) — Splinter Review
This patch is updated to allow someone to manually enable timestamps in their mozconfig file, specify a timestamp generator and specify a timestamp format (passed to timestamp generator as argv[1]).

This patch does not enable timestamps for mozilla-central/toolkit/components/feeds/ and other tests.

Suggested mozconfig additions:
ac_add_options --enable-timestamp
ac_add_options --with-timestamp-prog=date
ac_add_options --with-timestamp-fmt=+%s
Attachment #362233 - Attachment is obsolete: true
Attachment #362234 - Attachment is obsolete: true
Attachment #362836 - Flags: review?
Attachment #362836 - Flags: review? → review?(ted.mielczarek)
Attached file logparser-v1 WIP
This is my very much work in progress parser.  I want to post it here to track my work.  It is no where near complete
Attachment #362836 - Flags: review?(ted.mielczarek) → review-
Comment on attachment 362836 [details] [diff] [review]
Cleaned up patch v2

A few comments:
1) Is there any reason to bother letting people pass a --with-timestamp-prog? Is anyone going to use anything other than `date`? No need to overengineer here, let's just build something simple that does what you need.

2) --enable-timestamp is a very generic sounding option. I'd recommend "--enable-unittest-timestamps" or something.

3) Do you expect to actually use different timestamp formats in your work? I think you should just pick a format and hardcode it. Again, no sense in overengineering. (Plus, when you go to put this into Mochitest, you'll probably find that getting data into the mochitest script is a pain, so hardcoding it will be easier.)

 check::
 	@$(EXIT_ON_ERROR) \
+	export TIMESTAMP=$(TIMESTAMP) ; \
+	export TIMESTAMP_PROG=$(TIMESTAMP_PROG) ; \
+	export TIMESTAMP_FMT=$(TIMESTAMP_FMT) ; \

Makefile variables are automatically set as environment variables in the environment of commands that are run from a rule, so no need to do this.
Attached patch cleaned up patch v3 (obsolete) — Splinter Review
Cleaned up patch as per Ted's recommendations.  Left mozconfig option but renamed it from --enable-timestamps to --enable-testing-timestamps.  The timestamp-prog and timestamp-fmt aren't crucial to me, it was an attempt at flexibility.  They are no longer part of this patch.  My next step is to make a reftest and mochitest patch for the same effect.  I am fine with removing the autoconfig option all together if it is OK with everyone.
Attachment #362836 - Attachment is obsolete: true
Attachment #365376 - Flags: review?(ted.mielczarek)
Whiteboard: [orange]
Attachment #365376 - Flags: review?(ted.mielczarek)
Basically, this patch adds a timestamp before each "REFTEST " in output.  It surrounds fail messages with >>>>>>> and <<<<<<< on their own line to make it easier to parse out fail messages.  Sample lines:
1236879009.202 | REFTEST TEST-KNOWN-FAIL | file:///Users/jhford/mozilla/mozilla-reftest/layout/xul/base/src/grid/reftests/scrollable-rows.xul | 
1236879009.282 | REFTEST TEST-PASS | file:///Users/jhford/mozilla/mozilla-reftest/layout/xul/base/src/grid/reftests/sizing-2d.xul | 
1236878990.43 | REFTEST TEST-KNOWN-FAIL(EXPECTED RANDOM) | file:///Users/jhford/mozilla/mozilla-reftest/layout/reftests/text/wordwrap-02.html | 
1236878904.745 | REFTEST TEST-UNEXPECTED-FAIL | file:///Users/jhford/mozilla/mozilla-reftest/layout/reftests/mathml/table-width-1.xhtml | 
>>>>>>>
1236878904.745 | REFTEST   IMAGE 1 (TEST): data:image/png;base64,iVBORw0KGgoAAA...
1236878904.822 | REFTEST   IMAGE 2 (REFERENCE): data:image/png;base64,iVBORw0KGgoAAAANSUhEU...
1236878904.903 | REFTEST number of differing pixels: 1
<<<<<<<
Attachment #367076 - Attachment description: add timestamps to output → reftest output v1
Sample log for reftest output
Is there a reason that the MIME type of the attachment in #14 is "application/octet-stream"?
Attachment #367152 - Attachment mime type: application/octet-stream → text/plain
I didn't set that, but there are base64 encoded images which use valid data: uri's, so I am going to guess that some internal bugzilla system is picking it up as binary data.  Comment 13 has a sample of what I am talking about

   data:image/png;base64,iVBORw0KGgoA.... 

There are whole reftest images embedded in the log.  They were in the logs before I touched the output.  These files can be viewed perfectly fine as plain text.
Attachment #365377 - Attachment mime type: application/octet-stream → text/plain
Adds an autoconf option to disable unit test timestamps.  Seperated from xpcshell patch as this is not specifically dependent or required for that patch.
Attachment #365377 - Attachment is obsolete: true
Attachment #367537 - Flags: review?(ted.mielczarek)
Attached patch xpcshell harness patch (obsolete) — Splinter Review
PCShell unit test harness patch.  Rewritten to take into account rewrite of test_*.sh.  Autoconf option is not specifically hooked into this code, though, I could check environmental variables before printing timestamps.  I have also added a --no-timestamps option which could given to the runxpcshelltests.py program when the environmental variable is set.
Attachment #367538 - Flags: review?(ted.mielczarek)
Attachment #365376 - Attachment is obsolete: true
Attachment #367537 - Flags: review?(ted.mielczarek) → review-
Comment on attachment 367537 [details] [diff] [review]
adds an autoconf option

+TESTING_TIMESTAMPS = @TESTING_TIMESTAMPS@

Convention is to have boolean flags in autoconf.mk named "MOZ_FOO", so please rename this to MOZ_TESTING_TIMESTAMPS.

+MOZ_ARG_DISABLE_BOOL(testing-timestamps, [--disable-testing-timestamps disbles timestamps on tests],
+    TESTING_TIMESTAMPS=0, TESTING_TIMESTAMPS=1)

When it's not set, convention is to leave it blank, so:
MOZ_ARG_DISABLE_BOOL(testing-timestamps, [--disable-testing-timestamps disbles timestamps on tests],
   TESTING_TIMESTAMPS=, TESTING_TIMESTAMPS=1)

Also you have a typo in your help text there.

You're intending for this to be the default option, as well? In that case, does it even make sense to have this as an option? Why would anyone care to turn it off?

r- just to get an answer there (and an updated patch with nits addressed).
Comment on attachment 367538 [details] [diff] [review]
xpcshell harness patch

+      if timestamps: print str(time()) + " |",

Please put the body of the if statement on the next line, indented. I don't like single-line if statements in Python. I'm also not wild about the comma to avoid the newline. Could you use sys.stdout.write() instead there?

r+ with that fix.
Attachment #367538 - Flags: review?(ted.mielczarek) → review+
Depends on: 479976
(In reply to comment #0)

> BUILDID/BUILDTIME | TESTTIME | TESTNAME | MACHINE(from this we can map to OS) |
> MSG | link to full log

What is the interest in TESTTIME ? Isn't BUILDID/BUILDTIME enough ?

> A single text file per tree (accessible over HTTP) with 1 line per test failure
> would be fantastic.  I'll take other solutions if they are easier... 

Do we actually want this format in (each/every line of) the logs ?
Thus needing to check/modify all test harnesses, error parsers, buildbot scripts, etc ?

I thought the current "TEST FAILURE | TEST NAME | ADDITIONAL INFORMATION" format was fine,
and that the extended format summary would be the result of parsing the former log.

In addition to or instead of such a summary file, I would think the goal would be to have a searchable database, +/- like http://office.smedbergs.us:8080/.
Serge,

As far as the test time for each test goes, I could go either way.  I will leave that to others to decide.  

I have a log parser which stores the log output to a DB and currently shows logs in a simplified format.  The goal of my project is to allow searching.  I am in the process of making a demonstration video for what I have, but as it stands I have hooked up buildbot to my parser through buildbotcustom's unittest steps
Updated as per Ted's request.
Attachment #367538 - Attachment is obsolete: true
Depends on: 489728
Depends on: 490255
No longer depends on: 489728
No longer depends on: 479976
I haven't looked at this in a while, moving to future queue.
Assignee: jford → nobody
Component: Release Engineering → Release Engineering: Future
Does this mean that nobody will be looking at it for a while?  Seems odd for a critical-severity bug with a reviewed patch attached and [orange] in the whiteboard. :-(
About this bug: We're looking at different approaches.  QA has a test tracking database, and Ted has a test-tracking database, and something like that seems more appropriate/useful as a fix.  We'd like to track the status of each individual test every test run (not just failures) so we know when tests are introduced or skipped, so neither existing database is the ideal solution yet.  But again, they're closer to what we're looking for.

This bug would make for a good quarterly goal at some point; the question is what gets bumped if it becomes a Q4 goal.  We're waiting for the outcome of the goals meetings for a better view of what our Q4 will look like.

About the Future component: there seems to be an assumption that "Future" means "never".  We're contemplating renaming it "Next" or something of the sort; it just means this bug is not in our immediate view, and we don't want to keep stumbling across it during daily triage and risk burying new bugs that do need triage or immediate action.  Should this bug make it onto our Q4 goals instead of an existing proposed goal, we can move it back and assign it to someone.  And we do look at and re-triage the "Future" queue on a periodic basis; just not daily or weekly.
(In reply to comment #26)
> QA has a test tracking database, and Ted has a test-tracking database

Are any of these two databases (results) open to the community?
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: P2 → P3
(In reply to comment #25)
> Does this mean that nobody will be looking at it for a while?  Seems odd for a
> critical-severity bug with a reviewed patch attached and [orange] in the
> whiteboard. :-(
(reviewed patch was obsoleted, but no r? on later patch; not sure what happened there)


Stepping back, as this bug has morphed a bit since being filed in July2008. 

The idea here is to look through test logs for any failures, and be able to do this repeatedly over time, so as to spot frequency and trending of test failures. I believe this is already possible in http://brasstacks.mozilla.com/topfails. 

Is there anything else needed, or can we just close this bug as WORKS_FOR_ME_WHEN_I_USE_TOPFAILS?
For some context here - topfails is "ted's database" as noted in comment 26.  The "QA Database" refers to our old Log Compare effort that we started for Fennec (http://brasstacks.mozilla.org/logcompare, note that it is currently down at the moment as we are not actively maintaining that effort due to both a lack of people and a decision to look at the problem of tracking our results data in a more holistic fashion.  We have some current plans to start on a new version of the logcompare aka "track everything" database - passes/fails/todo/new tests etc this summer or fall.
(In reply to comment #29)

> I believe this is already possible in
> http://brasstacks.mozilla.com/topfails. 
> 
> Is there anything else needed, or can we just close this bug as
> WORKS_FOR_ME_WHEN_I_USE_TOPFAILS?

That works for me, specific followups can be filed separately if there are remaining needs. Brasstacks seems sufficient for comment 2.

Not sure what the patch here was doing though. More data wouldn't hurt, if it's still landable!
(In reply to comment #31)
> (In reply to comment #29)
> 
> > I believe this is already possible in
> > http://brasstacks.mozilla.com/topfails. 
> > 
> > Is there anything else needed, or can we just close this bug as
> > WORKS_FOR_ME_WHEN_I_USE_TOPFAILS?
> 
> That works for me, specific followups can be filed separately if there are
> remaining needs. Brasstacks seems sufficient for comment 2.
cool.

> Not sure what the patch here was doing though. More data wouldn't hurt, if it's
> still landable!
The patch looks like it was adding timestamp info but has merge conflicts in it, and was never flagged for review, so not landable, imho. If this patch is to be made useable, lets file a new separate bug, as thats not directly related to this bug.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Let's give Murali some credit for getting TopFails back up and running. :)
Assignee: nobody → mnandigama
Resolution: WORKSFORME → FIXED
Whiteboard: [orange]
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: