Closed Bug 511174 Opened 16 years ago Closed 14 years ago

Fennec unit test parsing is broken

Categories

(Firefox for Android Graveyard :: General, defect)

Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: mozilla, Unassigned)

Details

Currently:

* There is a flag for known failure on unit tests that applies to all browsers and platforms, but is essentially Firefox- and desktop-specific.
* AIUI, a number of unit test suites lack manifests of any kind.
* We track unit test errors by counting the number of TEST-UNEXPECTED-FAIL lines, and subtracting an arbitrarily set, threshold-green number of additional expected Fennec-on-device failures from that count.
** Even if this worked perfectly, it's difficult to track which new tests failed, which old tests have been fixed, and which tests are flapping; this requires digging through the logs.
** This doesn't work perfectly. We run unit tests in "chunks" on Maemo to avoid crashing on OOM. A number of chunks die in the middle, meaning a number of tests in each suite are never run. This makes the number of TEST-UNEXPECTED-FAIL lines vary wildly, as well as the total number of tests run.

It seems one solution could be a database that stores results per test (per platform, per branch) over time. Another solution could involve manifests of known test failures for Fennec on various devices -- this could also allow us to avoid running tests that will never work on Fennec.
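The counting scheme described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual buildbot code; EXPECTED_DEVICE_FAILURES stands in for the arbitrary threshold mentioned in the comment.

```python
# Hypothetical sketch of the threshold-based green/orange decision.
# EXPECTED_DEVICE_FAILURES is an illustrative stand-in for the arbitrarily
# set allowance of expected Fennec-on-device failures.
EXPECTED_DEVICE_FAILURES = 5

def count_unexpected_failures(log_text):
    """Count TEST-UNEXPECTED-FAIL lines in a raw tinderbox log."""
    return sum(1 for line in log_text.splitlines()
               if "TEST-UNEXPECTED-FAIL" in line)

def is_green(log_text, threshold=EXPECTED_DEVICE_FAILURES):
    """Treat the run as green if the failure count does not exceed the
    pre-set allowance -- note this says nothing about WHICH tests failed,
    which is exactly the weakness described above."""
    return count_unexpected_failures(log_text) <= threshold
```

Because a chunk that dies mid-run simply produces fewer TEST-UNEXPECTED-FAIL lines, this scheme can report green while entire suites went unexecuted.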
(In reply to comment #0)
> Currently:
>
> * There is a flag for known failure on unit tests, that applies to all browsers
> and platforms, but is essentially Firefox- and desktop-specific.
>
> * aiui, a number of unit test suites lack manifests of any kind.

All the mochi* tests and xpcshell tests have no manifests at all. That's something we can look into imposing on those tests, and it would definitely help us in the long run. In the short run, post-processing the logs has always seemed like the quickest, biggest-bang-for-the-buck option.

> It seems a solution could be a database that stores results per test (per
> platform per branch) over time. Another solution could involve manifests for
> known test failures for Fennec on various devices -- this could also allow us
> to avoid running tests that will never work on Fennec.

Right, and that is what we are working on with Joel's Build Comparison tool, which is coming online now at http://brasstacks.mozilla.com/buildcompare. The site is still in development and has virtually no UI at the moment, but you can see that it is coming. Joel, want to comment on this?
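For concreteness, a manifest of known Fennec failures might look something like the fragment below. The syntax is modeled loosely on reftest manifests; the fennecDevice condition and the test names are purely illustrative, not an existing format.

```
# Hypothetical known-failure manifest, reftest-style (illustrative only).
skip-if(fennecDevice)            == test-flash-plugin.html  ref-flash-plugin.html
fails-if(fennecDevice == "n810") == test-canvas-large.html  ref-canvas-large.html
```

A skip annotation would also let the harness avoid running tests that will never work on Fennec, as suggested in comment 0.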
There is a lot of talk of a manifest file, but no specifics. Building a manifest file similar to reftest's would help a lot, but it wouldn't guarantee resolving our issues in the first iteration. Even with a working manifest file solution in place, we will still want to track results at a higher level. A database solution will give us more granularity and insight into what is happening build by build. It will also help us define our manifest files. Finally, once we have a solution for manifest files, we can easily track their status and look for trends, random-orange tests, etc. with a db.

In order to get a db solution up and running, we need:
* a better way to parse the log files for accurate test counting (right now the tinderbox logs contain duplicate data)
* integration into the tinderbox scripts to upload data to the database
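A minimal sketch of the per-test, per-platform, per-branch results database floated in this thread, using SQLite. The table and column names are hypothetical and do not reflect the actual buildcompare schema.

```python
import sqlite3

def init_db(path=":memory:"):
    """Create a tiny per-test results store (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS test_result (
            test_name TEXT NOT NULL,
            platform  TEXT NOT NULL,   -- e.g. 'maemo'
            branch    TEXT NOT NULL,   -- e.g. 'mozilla-central'
            build_id  TEXT NOT NULL,
            status    TEXT NOT NULL    -- 'PASS', 'FAIL', or 'NOT-RUN'
        )
    """)
    return conn

def record(conn, test, platform, branch, build_id, status):
    conn.execute("INSERT INTO test_result VALUES (?, ?, ?, ?, ?)",
                 (test, platform, branch, build_id, status))

def history(conn, test, platform, branch):
    """Status of one test over time: new failures, fixes, and flapping
    tests become visible without digging through raw logs."""
    return conn.execute(
        "SELECT build_id, status FROM test_result "
        "WHERE test_name=? AND platform=? AND branch=? ORDER BY build_id",
        (test, platform, branch)).fetchall()
```

A NOT-RUN status row per missing test would also distinguish "chunk died mid-run" from "test passed", which the raw failure count cannot.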
Since I filed this bug, there have been a lot of changes and a lot of improvements across the board. I think our teams are aware of the issues involved. I don't feel like this bug is helping, and is potentially a source of incremental noise. If we have a specific actionable item we can file a new bug.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME