Closed Bug 313812 Opened 19 years ago Closed 17 years ago

Trender test implementation

Categories

(Core Graveyard :: GFX, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: vlad, Assigned: vlad)

References

Details

Attachments

(1 file, 3 obsolete files)

This is the test framework that I'm proposing we use for the Trender tests.  It's loosely based on our current Tdhtml test, with some modifications.

Some knobs we can tweak: currently it's set to do 21 redraws per test, throwing away the first 4 results (to account for various X craziness; I need to make sure that this isn't needed on win32).  It also uses a 1024x768 window size.  There's a commented-out move to 0,0 that should probably be turned on for the tinderbox test.
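A minimal sketch of the discard-then-summarize step described above. The constants come from this comment; the function name and the choice to average the remaining samples are assumptions for illustration, not the code in Trender.xml.

```python
from statistics import mean

REDRAWS_PER_TEST = 21  # redraws per test, as described above
WARMUP_DISCARD = 4     # leading results thrown away

def summarize_redraws(times_ms, discard=WARMUP_DISCARD):
    """Average the redraw timings that remain after dropping warm-up samples.

    Hypothetical helper: the real harness may summarize differently.
    """
    if len(times_ms) <= discard:
        raise ValueError("need more samples than the discard count")
    return mean(times_ms[discard:])
```

With 21 redraws and 4 discarded, each test average would be taken over 17 samples.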

The tests themselves should be one per directory, each with a dirname/index.html.  The dirnames should be listed in testList.

I started doing some work to also test scrolling the page; is that worth implementing? (scroll, redraw, scroll, redraw, etc.)

The main thing we're missing is test page content...
Attached file Trender.xml (XHTML) (obsolete) —
Trender.xml (XHTML)
Assignee: general → vladimir
Status: NEW → ASSIGNED
The results seem stable on Windows, although some things render so fast that I'm getting some NaN errors; working on those. :p
Attached file Trender.xml (obsolete) —
Fixed; it actually was working fine, but there was a stray src=.. that was inserted into the <frame> element when I saved the file locally to test!
Attachment #200799 - Attachment is obsolete: true
Scrolling is not so interesting IMHO... The hard scrolling cases are when we have to repaint the window, and that's just the usual rendering path.
Attached file Trender.xml (obsolete) —
Improvements: tests are now easier to specify, and there's a concept of test sets, which identify specific tests whose results are summed/averaged together.  This lets us add individual tests without rendering old numbers meaningless: the existing test set continues to exist and be reported, and we just add a new set composed of the old tests plus the new ones, which can be graphed/tracked separately.  It also lets us examine individual chunks in more detail; e.g., one test set could be SVG tests, another could be very text-heavy, etc.
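A rough sketch of the test-set idea, assuming sets are simply named lists of test directory names. The set names, the "svg-sample" test, and the function are hypothetical; the point is that adding a new set leaves the old set's number unchanged.

```python
# Hypothetical set definitions: the second set extends the first with a new test,
# so the first set's total stays comparable with older runs.
TEST_SETS = {
    "set-v1": ["en-planet1", "hebrew-msn"],
    "set-v2": ["en-planet1", "hebrew-msn", "svg-sample"],
}

def set_totals(per_test_avg_ms, test_sets=TEST_SETS):
    """Sum the per-test averages for every named set."""
    return {
        set_name: sum(per_test_avg_ms[test] for test in tests)
        for set_name, tests in test_sets.items()
    }
```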

I've had really good results by just using our Save As.. Web Page, complete functionality to create tests; I've just been going to various sites and sucking down a page in that fashion, saving it as "index.html" inside a folder appropriately named.  I've been running the tests in offline mode, to make sure we don't load content that may not have been downloaded locally (a few images here and there, etc.) and getting very consistent results.
Attachment #200809 - Attachment is obsolete: true
I have an updated version now that calculates the final per-set values using a geometric mean of the averages of the tests in that set.  The results are very stable on windows and on the mac (but very horrifying on the mac, particularly for non-latin character sets), but linux is all over the place still, e.g.:

en-planet1 3,2,2,3,6,3,2,11,2,3,181,3,2,3,2,2,2,3,2,202,3
hebrew-msn 23,20,27,20,28,57,22,43,10,11,61,11,10,54,11,65,11,58,10,11,56
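A sketch of the per-set calculation described above: average each test's samples, then take the geometric mean of those averages. The function names are assumptions, and it isn't stated whether the listed samples already exclude the discarded warm-up redraws, so `discard` is left as a parameter.

```python
import math
from statistics import mean

def geometric_mean(values):
    """Geometric mean via logs; rejects non-positive input, which has no log."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def set_score(samples_by_test, discard=0):
    """Geometric mean of the per-test averages in one test set."""
    averages = [mean(samples[discard:]) for samples in samples_by_test.values()]
    return geometric_mean(averages)

# The two linux sample runs quoted above, in the order given.
LINUX_SAMPLES = {
    "en-planet1": [3, 2, 2, 3, 6, 3, 2, 11, 2, 3, 181, 3, 2, 3, 2, 2, 2, 3, 2, 202, 3],
    "hebrew-msn": [23, 20, 27, 20, 28, 57, 22, 43, 10, 11, 61, 11, 10, 54, 11, 65, 11, 58, 10, 11, 56],
}

linux_score = set_score(LINUX_SAMPLES)
```

One reason to prefer a geometric mean here is that a single slow page weighs less in the set value than it would in a plain sum.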

The aggregate results are again very stable for osx/mac, and not so stable under linux (maybe <1% variance with osx/mac, ~10% for linux), though I think that's ok; we should still be able to see gross improvements under linux.
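One way to put a number on "how stable" is the spread of the aggregate score across repeated runs relative to its mean. This is only a sketch of such a measure (a coefficient of variation); the comment above doesn't say how the <1% and ~10% figures were obtained, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def relative_spread(run_scores):
    """Population standard deviation of the scores divided by their mean."""
    if not run_scores:
        raise ValueError("need at least one run")
    return pstdev(run_scores) / mean(run_scores)
```

Under this measure, identical scores across runs give 0, and two runs at 90 and 110 give 0.1 (10%).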

What's the right way to get this set up as an actual test?  I have about 40 pages' worth of content right now, with about 25 english topsite pages (yahoo, amazon, cnn, espn, aol, that sort of thing), and the rest comprised of various non-latin script languages (chinese, thai, arabic, russian, hebrew, japanese, etc.), so I don't think this should be checked into the tree.  I would like to be able to easily add new tests/test sets as time goes on, though I guess that won't happen so often that we couldn't just maintain this on the 3 tinderboxes that will be running Trender.
(In reply to comment #6)
> linux is all over the place still, e.g.:
> 
> en-planet1 3,2,2,3,6,3,2,11,2,3,181,3,2,3,2,2,2,3,2,202,3
> hebrew-msn 23,20,27,20,28,57,22,43,10,11,61,11,10,54,11,65,11,58,10,11,56

Any idea what's causing this? I wonder what's running during the 200ms that en-planet1 is stopped.

> What's the right way to get this set up as an actual test?  I have about 40
> pages' worth of content right now, with about 25 english topsite pages (yahoo,
> amazon, cnn, espn, aol, that sort of thing), and the rest comprised of various
> non-latin script languages (chinese, thai, arabic, russian, hebrew, japanese,
> etc.), so I don't think this should be checked into the tree.  I would like to
> be able to add new tests/test sets as time goes on easily, though I guess that
> won't happen all that often that we couldn't just maintain this on the 3
> tinderboxes that will be running Trender..

It's copyrighted content I suppose?
(In reply to comment #7)
> (In reply to comment #6)
> > linux is all over the place still, e.g.:
> > 
> > en-planet1 3,2,2,3,6,3,2,11,2,3,181,3,2,3,2,2,2,3,2,202,3
> > hebrew-msn 23,20,27,20,28,57,22,43,10,11,61,11,10,54,11,65,11,58,10,11,56
> 
> Any idea what's causing this? I wonder what's running during the 200ms
> en-planet1 is stopped

No idea; I'm going to retry on my desktop, as all this was on my laptop.  I had the CPU speed locked down to 1.6 GHz, but who knows what else may have been going on.  Though this seems consistent to me with X buffering: planet doesn't have a lot of data, so it can send a bunch of frames down the pipe until eventually it blocks for the X server to catch up.  MSN has a lot more data to send down, so while it can send the data down quickly, it has to block every third frame or so for the X server to catch up.  Does that sound plausible?


> It's copyrighted content I suppose?

Yeah, it is.. I wonder if we can make some sort of fair-use claim here (since we're basically saving the pages as-is, with no modifications [except to remove some code that forces the page to jump out of an iframe in some]), because it would be nice to make this generally available.  But if not, we can always keep the bulk of it mozilla-internal, and publish a subset of tests that are not copyright encumbered.

There are a few synthetic tests I need to create as well (e.g. pages with lots of opacity blending), and I'm still hunting for some good SVG tests.


Attached file Trender.xml
Final code, with tinderbox print bits.

Find me on IRC if you want the full .zip; also, let me know if you have any ideas for synthetic tests to include in here.  Right now I have a transparency test, rounded-borders test, lots-of-text test, and small image tiling test.

A number of these tests are misrendered by the current cairo builds; I'm not sure how to flag those, or even if we should bother doing that.  I think we should just let it go for now, and as we fix the tests we'll let Tgfx do whatever with a note saying that a test was fixed.

A few of these pages have various JS code, usually dealing with ads, that causes strange rendering when forced into offline mode.  I've been cleaning up the HTML as much as I can, though a few are /really/ weird.

This test harness could also be used to implement layout correctness tests, with the addition of a small xpcom component (via a compiled-in extension) that would do the image diffing.  Shouldn't be hard to do; I'll work on that at some point.
This was done/fixed/etc, then got backed out of tinderbox, and is now back in as part of Talos.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard