Open Bug 372452 Opened 18 years ago Updated 2 years ago

reftest should be able to compare with reference image, not just reference html


(Testing :: Reftest, defect)



(Not tracked)


(Reporter: ray, Unassigned)



(1 file, 2 obsolete files)

Attached patch reftest.js (obsolete) — Splinter Review
I have a patch that makes this work. It changes reftest to do the following:

1) when there is a comparison between two html files that fails, it takes the png for the reference and the png file for the test page and writes them both to the file system, as png images files.

2) One can have this in the manifest:

   == bugs/9458-valign-2.html bugs/9458-valign-2-ref.png

When a png is encountered, the test html page is rendered, and its image is compared with the image in the reference.

There are several obvious questions raised by this and I have not tried to guess them all. Can there be a better way to name and/or store the image files? Probably. Will the images be platform-specific? Yes. Will they be monitor resolution specific? Probably. Can the recording me made optional? Yes. I can think of several other questions.

There are uses for this, though. For example, a glyph should render a specific way on a specific platform. We could check for that. There are ways the issues could be dealt with and opportunities it makes available.
I think the idea of reftest was that you can't predict what the glyph rendering will be. I suppose this can be useful for things that don't involve text (although I think there are weirdnesses not just with text), but I'd rather have a way to compare rendering across different builds on the same machine.
But if you need to display an aleph, for example, and you want to verify you did not display an ee-krotkoye, you cannot compare some html to other html. If you want to verify that your fonts that are supposed to have serifs actually have serifs, you cannot do this with html. One needs to do this by verifying that what gets rendered is exactly what is supposed to be rendered.

There is a class of problems for which reftest was designed by dbaron. No surprise, that would be layout. There is a completely different class of problems addressable by image comparison.
Attachment #257096 - Flags: review?(dbaron)
Comment on attachment 257096 [details] [diff] [review]

So I definitely don't want the code in here to spit out the failure images to separate files.  If you want that, it's easy enough to post-process the data: URLs in the output, but I don't think we want that by default.  data: URLs are much easier to transfer in logs.

After considering the image reference code some more, I think I don't want that either (despite what I said to you on Friday).  My primary reason for that is that I can't think of any interesting image test with exact comparison that would have remained valid between 1.8 and 1.9.  We've drastically changed the way we lay out text (which affects things like measurement, kerning, ligatures, and antialiasing), changed the image scaling algorithms we use, etc.  I think most of the interesting things to test with reference images would require something that does approximate comparison, not exact comparison, and by that point I think we're better off with a separate tool.  I'd rather keep reftest focused on what it was originally intended for.

(For what it's worth, if we did want this, in StartCurrentURI, if you're going to set gPart2Key, you should check that aState is what you think it is.  And then you could call directly into the comparison code in DocumentLoaded (moved into its own function) rather than actually loading the URL.)
Attachment #257096 - Flags: review?(dbaron) → review-
Spitting out the failure images every time is not the goal here.

Sidje, who owns the MathML module, was pointing out that reference images could be generated when a manifest used something like "<<" at its front.

For example:

    << bugs/test.html bugs/test.png

This could unconditionally generate the image file. Using an existing manifest, the behavior could be exactly the same. So, a module owner would run the "comparison manifest" to check the images and run the "generating manifest" to create new images.

There will need to be a workflow for generating and judging new images. There are many ways to handle this. For example, between major upgrades, the entire suite of images may need to be regenerated. But this should not need to be done in minor updates. As we increase the number of minor updates we ship, our danger of introducing regressions is increasing. There is a significant class of problems that this would catch.

I will make some of the code changes suggested above and attach a new patch.
changed to core/layout so the right people can see it.
Component: Testing → Layout
OS: Mac OS X → All
QA Contact: testing → layout
Hardware: PC → All
This patch does two things.

First, it makes it possible to compare a loaded test page with a PNG that is a snapshot of the Canvas of the test page.

Second, per the suggestion of rbs (of the MathML project), it adds a "<<" directive to the manifest file.

When a reftest is run and it may encounter such as this:

  << dir/file.html dir/file.html

Reftest will load the first file (and the second and ignore that) and create a snap of the first file, automatically saving it at dir/file.html.png.

rbs suggested that one would thus be able to run a "generation reftest" when something changed the visual rendering of the pages, and then one would be able to run the "normal reftest", which would use:

  == dir/file.html dir/file.html.png

Existing reftest manifests and existing reftests run exactly the same. This patch does not change anything about the processing of current reftests.
Attachment #257096 - Attachment is obsolete: true
Attachment #263954 - Flags: review?(dbaron)
Comment on attachment 263954 [details] [diff] [review]
patch to include "<<" in manifest and comparison with PNG file

I don't think we want either of these:
 * I don't think reftest needs more than one way of outputting images, and printing data URLs works much better for logs
 * Comparison to a reference image should be an image comparison; a saved image could be the same data encoded differently.  Given that, I don't see why we wouldn't just want to render the image (e.g., in a very simple HTML page with zero margins -- you could even potentially write one that took the image in a query string -- if there were test that needed it, which I haven't seen yet)

Also, in future, please attach diff -u (preferably with 8 or more lines of context).
Attachment #263954 - Flags: review?(dbaron) → review-
Other people see the reason for this type of test case.

Given that this does not change current functionality, but only adds functionality which anyone can choose to use or not use, are there any problems with the actual code?
Attachment #263954 - Attachment is obsolete: true
Attachment #264034 - Flags: review?(dbaron)
The proposed additions would be helpful in an enabling/empowering way. Right now, MathML (and I presume SVG too) is tested using the so-called "eye-ball testing", where one looks at a rendered page to see if it renders as expected by traditional mathematical notation. This way of testing does not scale and becomes quickly manual/labour intensive and prone to mistakes/omissions. Whereas reftest is quite handy, and with these simple additions, it will become even more handy for other contexts like MathML (and I would think SVG and even XUL/XBL), as Ray described in some of his previous comments. Other alternatives of achieving the same effect seem over-stretched compared to this simple approach of leveraging reftest.

As for the patch itself, I am not familiar well enough with the constraints in place, but what I was wondering about was why Ray used "<< dir/file.html dir/file.html" instead of something like "<< dir/file.html dir/file.html.png".
I wish I did not have to use "<< dir/file.html dir/file.html" also. But that is what I could make work.

The reftest functionality seems to be:
1) load one document
2) capture the image of the canvas of it in memory
3) load another document
4) capture the image of that canvas into memory
5) compare them and write output

If this is what the reftest is doing, the logic is a bit obfuscated. It is also fragile. I am, though, relatively new to javascript. I have asked for assistance in crafting this patch in a better way.

I tried to make the manifest line  of the form "<< dir/file.html dir/file.png" and it would always hang.

The patch as I have it works. If anyone wants to suggest ways to make it work better, I am completely open to that. If I can figure out why it hangs, I can change it and re-submit the patch.
The problems with the code are:
 * that it backs out most of the recent changes to the files you're changing
 * that it adds a good bit of code for two features that can be done in simpler and more robust ways
 * that I don't see how you're going to write useful tests, passing on multiple platforms, for the features you're adding
There are many aspects of the setup that can affect the precise rendering:

* OS
  * Which fonts are installed (especially for MathML).
  * Text smoothing settings, such as "Turn off text smoothing for font sizes [8] and smaller" on Mac.
  * Scrollbar button location (Mac).
  * Scrollbar width (see bug 373456).

* Display
  * DPI, especially if it's so high that 1 CSS pixel is 2 device pixels.
  * LCD vs. CRT, unless subpixel rendering is disabled.

* Firefox settings
  * Default font.
  * Minimum font size.
  * Default link and text color.
Because there are so many possible combinations of settings that affect rendering, I think that storing one PNG "reference rendering" in CVS per platform will not work.  Instead, I think any "stored-image" reftests will be limited to detecting *changes* on a specific machine.

(For certain features that can be tested without text, such as the CSS block model or SVG shapes, it might be possible to make a single reference image.  But not for MathML...)
Note that this is not about storing the .png files in CVS. It is mostly about complementing reftest with a little extra that is useful for some situations (assuming of course that the patch is technically OK and doesn't undo anything that you guys want).
Two things:

1) Sorry about uploading a patch that backed out earlier changes. I will be more careful about that.

2) Even if image-comparison testing is only reliable per machine, it will still be worthwhile. Lots of things about automated testing are machine-specific. That is why most automated testing is done on machines dedicated to automated testing and set up and maintained for that purpose. As MoCo does with the QA lab machines.

If tests are machine specific, that certainly means image do not get checked in, but there are other workflows that will still work.

I have a very clean patch that just does the image comparison. I am trying to clean up the version that does the image generation. It is harder to do this right while I am rowing against the river, so to speak. But if that is how I must do it, that is how I will do it. A new patch will be coming soon.
Component: Layout → Reftest
Product: Core → Testing
Version: Trunk → unspecified
This is a mass change. Every comment has "assigned-to-new" in it.

I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.