tscroll is a bit outdated and could use some updating of its current tests as well as new coverage.
Lets use this bug to figure out what we want to test and when we know that we can figure how who can do it and when.
Avi: I assume your work for tscroll has helped out here?
Following bug 845943, tscroll now iterate using requestAnimationFrame.
My suggestion was to run the full tscroll suite twice, as different tests:
- Once normally (tscroll).
- Once with layout.frame_rate=10000 (which will cause rAF to iterate as fast as possible). e.g. tscrollR (for Rapid iterations).
The first run will simulate normal scroll working conditions, and we'd expect the iterations to be around 16.7 ms, though it's unlikely that this test detects small regressions, and it's variance is likely to be very low (because small pref changes don't affect it much).
The second run will do scroll iterations as fast as possible, therefore could detect small regressions, but is likely to have non negligible variance (preliminary results show about 10% variance over several runs on the same machine).
As I understand, there's some concern about increased test duration as a result of that extra tscrollR run. Depending on how acute this concern is, we could drop some of the subtests of tscrollR (I would drop those which show fastest iterations - best performance) and keep those with worse performance.
Another possible enhancement is adding some real-world pages (for instance, from tp5) instead the current synthetic pages.
I think it's important to make a distinction between two different aspects of "more adequate coverage" (for tscroll or any other test):
1. What we test: this mostly means intelligently selecting the samples we want to measure. This could be adding or removing pages at the test, modifying content of pages to make it better represent known/past/hard issues at the code, etc.
The suggestion at comment 2 to add tp5 pages to tscroll falls into this category, though admittedly as a suggestion only and without backing it up with real data other than (my) common sense.
2. How we test: that's the procedure we use in order to get good correlation of the result to "what we test".
The suggestion at comment 2 to run tscroll twice belongs in this category, and I did consider it deeply and am fairly certain of its value.
These two categories are not necessarily linked, and even if changes at any of them affect the result values, I don't think we should limit our progress by requiring both to be improved together (though obviously this would be best).
(In reply to Avi Halachmi (:avih) from comment #3)
> 1. What we test: this mostly means intelligently selecting the samples we
> want to measure. This could be adding or removing pages at the test,
> modifying content of pages to make it better represent known/past/hard
> issues at the code, etc.
> 2. How we test: that's the procedure we use in order to get good correlation
> of the result to "what we test".
We've got this covered:
1. tscrollx uses rAF and ASAP mode.
2. There's no real need to "run twice" - (once without ASAP) because:
a. ASAP proved good enough to represent the "non-ASAP" mode.
b. It's highly HW dependent and doesn't represent any real world case anyway.
3. It's mostly binary and doesn't detect regressions continuously.
3. We've got the "better coverage" part with tp5o_scroll - which takes the tscrollx test code and duplicates it to run over the tp5o pages (about 50).
So I'm closing it as RESOLVED. Reopen if you think this still needs some work.