Open Bug 1029348 Opened 5 years ago Updated 5 years ago

Integrate reftest statistic into ReftestManifest

Categories

(Testing :: Reftest, defect)


Tracking

(Not tracked)

People

(Reporter: u459114, Unassigned)

References

Details

Attachments

(1 file)

Generate statistics for all reftest test cases listed in layout/reftests/reftest.list, from ReftestManifest.

The benefit of this change is that it becomes easier to see how many test items are skipped on a specific platform, such as B2G, so that we can understand the health of reftest on each platform.

Output of this summary would look like:
## reftest healthy report ##
* Total tests: 1020
* skip on B2G: 200
  {list of skip-if(B2G) tests}
* skip on B2G and browserIsRemote: 50
  {list of skip-if(B2G&&browserIsRemote) tests}
* fail on Android: 40
  {list of fails-if(Android) tests}
* random on cocoaWidget: 20
  {list of random-if(cocoaWidget) tests}
Listing all failed items at build time might be too verbose.

Having a parameter, such as "verbose", to output the detailed list would be better:
./xxx.py -verbose layout/reftests/reftest.list 

By default, only dump summary data.
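A minimal sketch of what such a script could look like (the regex and the counting are my simplifications; real reftest.list files also contain include lines, defaults, and nested-paren conditions that this does not handle):

import re
import sys
from collections import defaultdict

# Matches annotations such as skip-if(B2G) or fails-if(Android).
# Simplification: conditions containing nested parentheses are not
# handled, and include/url-prefix lines are counted as tests.
ANNOTATION_RE = re.compile(r'(skip-if|fails-if|random-if)\(([^)]*)\)')

def summarize(manifest_path, verbose=False):
    counts = defaultdict(list)
    total = 0
    with open(manifest_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            total += 1
            for failure_type, condition in ANNOTATION_RE.findall(line):
                counts[(failure_type, condition)].append(line)
    print('## reftest healthy report ##')
    print('* Total tests: %d' % total)
    for (failure_type, condition), tests in sorted(counts.items()):
        print('* %s on %s: %d' % (failure_type, condition, len(tests)))
        if verbose:
            for test in tests:
                print('    %s' % test)

if __name__ == '__main__':
    summarize(sys.argv[-1], verbose='--verbose' in sys.argv[1:-1])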
Related: bug 996183.
See Also: → 996183
We also now have a generic reftest manifest parser in the tree. Bug 1027215.

reftest conditions are JavaScript, so calculating skipped tests from Python would be... not fun.

I'm not sure if we want to make a push towards converting reftests to the widely-used .ini format. I seem to recall someone (dbaron or Ms2ger?) raising objections when I tossed the idea around a year or so ago. FWIW, I agree that aspects of the reftest manifests are nicer than ini files. But there is a push towards machine readability from Python. And the reftest manifests are arguably not that.
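To illustrate the contrast, here is a hypothetical test expressed both ways (the .ini key names below are made up for illustration; they are not actual mozinfo attributes):

# reftest manifest: the condition is a JS expression evaluated at runtime
skip-if(B2G&&browserIsRemote) == test-001.html test-001-ref.html

# .ini manifest: the condition is a restricted expression evaluated by
# Python against build-time mozinfo data
[test-001.html]
skip-if = b2g && browser_remote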
(In reply to Gregory Szorc [:gps] from comment #3)
> We also now have a generic reftest manifest parser in the tree. Bug 1027215.
> 
> reftest conditions are JavaScript, so calculating skipped tests from Python
> would be... not fun.
Sorry, I don't quite follow this point. By "reftest conditions", do you mean failure types such as skip-if/random-if/fails-if? If so, I don't understand why we can't calculate the count of skip-if test items in Python. Would you explain more? Thanks.
> I'm not sure if we want to make a push towards converting reftests to the
> widely-used .ini format. I seem to recall someone (dbaron or Ms2ger?) raising
> objections when I tossed the idea around a year or so ago. FWIW, I agree
> that aspects of the reftest manifests are nicer than ini files. But there is
> a push towards machine readability from Python. And the reftest manifests
> are arguably not that.
Flags: needinfo?(gps)
There are two use cases for generating this statistical data:
1. Local testing.
   As a developer, I want to know how many test cases are skipped on B2G or Android. I don't really want to manually look through every reftest.list; having a tool here would save me a lot of time.
2. Build system: Bug 996183 comment 1

Bug 996183 comment 2 is another interesting extension of this tool.
$ ./mach reftest --fails-if|--random-if|--skip-if
With these options on the command line, a developer does not need to generate a temporary reftest manifest to rerun failing test cases.

For all of this, the first thing we should do is modify the manifest parser (bug 1027215) to generate tables for each failure type; see the sketch below.
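A rough sketch of what those tables could look like on top of the parsed manifest (TestEntry here is a made-up stand-in; the real parser from bug 1027215 exposes its own data structures):

from collections import defaultdict

class TestEntry(object):
    """Made-up stand-in for one parsed manifest entry."""
    def __init__(self, path, annotations):
        self.path = path
        # annotations: list of (failure_type, condition) pairs
        self.annotations = annotations

def build_failure_tables(entries):
    # tables['skip-if']['B2G'] -> list of test paths
    tables = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        for failure_type, condition in entry.annotations:
            tables[failure_type][condition].append(entry.path)
    return tables

entries = [
    TestEntry('bugs/100.html', [('skip-if', 'B2G')]),
    TestEntry('bugs/200.html', [('fails-if', 'Android')]),
]
tables = build_failure_tables(entries)
print(tables['skip-if']['B2G'])  # ['bugs/100.html']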
(In reply to C.J. Ku[:cjku] from comment #4)
> (In reply to Gregory Szorc [:gps] from comment #3)
> > We also now have a generic reftest manifest parser in the tree. Bug 1027215.
> > 
> > reftest conditions are JavaScript, so calculating skipped tests from Python
> > would be... not fun.
> Sorry, I don't quite follow this point. By "reftest conditions", do you mean
> failure types such as skip-if/random-if/fails-if? If so, I don't understand
> why we can't calculate the count of skip-if test items in Python. Would you
> explain more? Thanks.

The contents of the () in fails-if(), random-if(), skip-if(), etc., are JavaScript expressions, evaluated in the sandbox constructed in https://hg.mozilla.org/mozilla-central/file/a19e0434ea52/layout/tools/reftest/reftest.js#l598
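In practice that means only a trivial subset of the conditions (bare identifiers joined by && or ||) could even be approximated from Python; everything else is opaque without a JS engine. A sketch of that split (the regex is my simplification and ignores things like ! negation):

import re

# Bare identifiers joined by && or || could be matched against a known
# set of platform flags; anything else is opaque JavaScript.
SIMPLE_RE = re.compile(
    r'^[A-Za-z_][A-Za-z0-9_]*(\s*(&&|\|\|)\s*[A-Za-z_][A-Za-z0-9_]*)*$')

def classify(condition):
    return 'simple' if SIMPLE_RE.match(condition) else 'opaque JS'

print(classify('B2G&&browserIsRemote'))       # simple
print(classify('Function("return true")()'))  # opaque JS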

> > I'm not sure if we want to make a push towards converting reftests to the
> > widely-used .ini format. I seem to recall someone (dbaron or Ms2ger?) raising
> > objections when I tossed the idea around a year or so ago. FWIW, I agree
> > that aspects of the reftest manifests are nicer than ini files. But there is
> > a push towards machine readability from Python. And the reftest manifests
> > are arguably not that.

I'm not sure what I said a year ago, although I may well still agree with it even if I can't think of it at the moment.

One of the reasons I like the format is that having a large and easily-extensible vocabulary of failure conditions lets us describe the actual failure conditions rather than just writing down which build machines the test fails on. This is much better for developers running the tests locally, since they're less likely to get a bunch of unexpected failures related to the ways their configuration differs from one of our very small number of testing configurations, although it has degraded quite a bit as people have depended more and more on try and less on running tests locally. (For example, fails-if(layersOMTC) is useful if I'm testing with OMTC enabled on Linux, even if our build machines have OMTC off on Linux.) I'm not sure what the set of failure conditions we use in the .ini files is, but wanting to parse the conditions with Python essentially contradicts the desire for dynamically-correct conditions that I describe here.

I don't mind the .ini syntax; I was more opposed to earlier proposals for much less human-editable syntaxes.
(In reply to David Baron [:dbaron] (UTC-7) (needinfo? for questions) from comment #7)
> enabled on Linux, even if our build machines have OMTC off on Linux.)  I'm
> not sure what the set of failure conditions we use in the .ini files is, but
> wanting to parse the conditions with Python essentially contradicts the
> desire for dynamically-correct conditions that I describe here.

The set of conditions you can test against is documented here:
https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/buildsystem/mozinfo.html#mozinfo-attributes

It is, as you presume, limited to things we can statically know about at build time (most of it is a translation from configure/Makefile variables). If there are conditions that reftest wants to test that are only knowable at runtime then that's not a thing we can currently support. That being said, I think enforcing a known state (by way of preferences or other things) makes sense, so I would argue that having all our state be statically knowable is a nice benefit of that.
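For contrast with the JS case, here is a toy illustration of why conditions over a static dict are tractable from Python (the real manifestparser has its own expression language and parser; the eval() and the Python operators here are only for illustration):

# mozinfo boils down to a flat dict of build-time facts, e.g.:
mozinfo = {'os': 'android', 'debug': False, 'bits': 32}

def evaluate(condition, info):
    # Toy version: the real parser is stricter and uses its own syntax
    # rather than Python's; the point is only that the whole condition
    # language is decidable over a static dict.
    return bool(eval(condition, {'__builtins__': {}}, dict(info)))

print(evaluate("os == 'android' and not debug", mozinfo))  # True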

> I don't mind the .ini syntax; I was more opposed to earlier proposals for
> much less human-editable syntaxes.

FWIW, while I wish we only had one manifest format, I agree that the reftest manifests, being specialized to reftest, offer a more concise representation. I do think that they fall down when you get beyond the basics though (having multiple conditionals, for example).
Now that we have the .ini format for mochitest with multiple conditions, I think it is only partially better than the reftest manifest, but it does offer a bit of future flexibility. Either way, we have 2 formats, and that is better than 3.
Comment #7 answered my needinfo.
Flags: needinfo?(gps)
Thanks for all the comments here. I learned a lot from them.

1. Execute a specific <failure-type> on the local machine:
./mach reftest --fails-if content/test/reftest/reftest.list
This can be done by modifying reftest.js. A small change, I think, compared to #2.

2. Python parser
My original idea was to generate a health report for reftest, such as how many fails-if(B2G) test items there are in the reftest manifests.
But since we have this kind of condition:
fails-if(Function("try{Function('let\x20x=5;');return(1,eval)('let\x20x=3;\\'x\\'\x20in\x20this');}catch(e){return(true);}")())
my plan does not work. :(

I suppose we can do this kind of static analysis with the .ini syntax, since it's not as flexible.
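Even so, a health report over the reftest manifests could still handle the common conditions and just bucket the opaque ones. A sketch of that fallback (the heuristic for spotting dynamic JS is my assumption):

from collections import Counter

def health_counts(annotations):
    # annotations: (failure_type, condition) pairs pulled from the
    # manifests, where condition is the raw text inside the parens.
    counts = Counter()
    for failure_type, condition in annotations:
        # Heuristic: parens or quotes inside the condition mean real JS.
        if any(ch in condition for ch in '("\''):
            counts[(failure_type, '<dynamic JS>')] += 1
        else:
            counts[(failure_type, condition)] += 1
    return counts

sample = [
    ('skip-if', 'B2G'),
    ('skip-if', 'B2G&&browserIsRemote'),
    ('fails-if', 'Function("return true")()'),
]
for key, n in sorted(health_counts(sample).items()):
    print('%s(%s): %d' % (key[0], key[1], n))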