We currently get relatively little information about our mozmill test failures. I propose we do the following, in broad strokes: 1) When a failure is reported, gather information about the state of Thunderbird: a) Always gather some data that can't hurt. For example, render a canvas-based screenshot of each window and characterize broad UI state (what window is focused, what widget in that window is focused, high-level tab info which may not be obvious from the screenshot). b) Provide "event timeline" data for the time period shortly before the start of the test up through declaration of the failure. The canonical example of such events is folderEventLogHelper.js which provides an nsIMsgFolderListener and reports everything it hears using logHelper's mark_action. This data would normally be written into an in-memory buffer and discarded in cases where a test does not fail. c) Provide hooks for specific tests/test families to provide additional information. 2) "Report" that information by either writing it to a file or printing it out on stdout. The stdout idea is for tinderbox. We can scrape the data out of the buildlog after the fact and do something useful with it without first having to stand up a reliable fancy data store. (We can later move to having the run directly POST its results to a reliable infrastructure server if so desired.)
Since this is not making things worse on my various try pushes, I think it makes sense to start reviewing this. The patch, regrettably combines a number of actual test fixes with the debug improving logic. I'm going to claim this is okay because it's clear the patch makes a number of oranges (on linux at least) go away, and has the demonstrated side-effect of structured output. To aid in discussing the relevant bits, here are what I believe to be the limitations in this patch, but which also seem fairly reasonable: 1) It always dumps structured meta-data out on stdout. Tinderbox or buildbot will love this, but humans who don't want the log output may not. Specifically, those data:url screenshots can get long. This is somewhat mitigated by the structured dumps taking place before the summary output generated by the python driver. 2) Magic logging only happens if test-folder-display-helpers.js is loaded. This was a pragmatic decision based on the fact that test-folder-display-helpers is the one that loads the asyncTestUtils stack which is where most of the logHelper stuff comes from. test-window-helpers is fancy logging enabled and can work without test-folder-display-helpers being loaded by virtue of having stub functions that are replaced when test-folder-display-helpers is initialized. Although this is clearly an argument for refactoring some of that responsibility out, I would argue we want to hold on doing that until we convert to the newer release of mozmill that bundles the cuddlefish loader and allows us to use CommonJS-style require("blah") dependency management instead of the ugly ugly thing we are doing right now. (Of course, there still remains the problem that the async test utils stack is designed to run in xpcshell too, and that shows no signs of gaining require() support... I guess we could rejigger that test layer to be JS modules instead?) 3) The python test runner accumulates the rich failure information as the tests run (and fail), but only outputs them when the tests are done. This means potentially increased memory usage and that failure cases where buildbot kills the process or mozmill kills itself in certain ways will not result in any log output. (Note that logsploder is capable of realtime monitoring, so that is an alternative...) The performance impact of this patch seems to be negligible. This could partially just be that because I hacked around some bad tests that were causing compulsory timeouts (but not failures) under 1.4.2, netting us a performance gain on the test changes side of the house that nearly offsets the performance slowdown from the speculative logging costs. (We snapshot objects to JSON-able representations when logging, which provides us with useful data if we detect a failure and choose to send the data across. Without this, we would only see the state of referenced objects at the time of the failure, which would be decidedly less useful and a bit of a memory bloat concern.)
Additional info on the processing side of the house including some screenshots can be found here: http://www.visophyte.org/blog/?p=722 The server stuff is not consistently available right now, but should be up with a reasonable level-of-service in a few days. Because the server just scrapes stuff out of the logs, the availability of the server does not affect the mozmill runners, and indeed allows the server to go back in time as needed (and logs remain available).
Here's an updated version that just managed to get a completely green set of runs on ThunderbirdTry! The set of changes is relatively minor, I believe.
Comment on attachment 518248 [details] [diff] [review] v2 minor test changes, what just ran green on try Looks good! Especially like the green linux :-)
Attachment #518248 - Flags: review?(bugzilla) → review+
Woo! pushed to trunk: http://hg.mozilla.org/comm-central/file/dd7768eaabf2 I will start to pursue standing up an official momo scraper doodle, but for the time being, results can be seen on http://arbpl.visophyte.org/ and specifically the Thunderbird tree: http://arbpl.visophyte.org/?tree=Thunderbird It's also possible to run it locally if you have hbase setup. I doubt most people will have hbase setup, so I will look into having it use a SQLite store, although that will still require people to have node.js and SQLite...
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Er, and I am expecting we will still see some intermittent oranges, hopefully this will make them easier to track down...
Target Milestone: --- → Thunderbird 3.3a3
You need to log in before you can comment on or make changes to this bug.