Bug 551617 (HomelessReftests)

Tool to detect test files not used by reftest.list manifest

NEW
Assigned to

Status

9 years ago
9 years ago

People

(Reporter: murali, Assigned: murali)

Tracking

(Depends on: 5 bugs)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 5 obsolete attachments)

(Assignee)

Description

9 years ago
Created attachment 431775 [details]
source file for review

Please find enclosed a script that scans a given directory for all reftest.list files and then identifies if there are any files that are in the 'reftest' directories but neither called by other files nor used in any manifest files.

Those files could be newly added test cases which were not updated in the reftest.list files or some thing else.

How to use the script :

The script takes two arguments.

First one is the path of your Firefox code base.
Second one is the string 'reftest.list'

Usage :: python mchecker.py /path/to/your/src/directory reftest.list

It generates two text files for now .

found.txt :: Which shows what all the files that are used as part of other files or present in manifest lists.
notfound.txt :: which shows all files that are not called by any other files or referenced in reftest.list files.

I have yet to add the usage and commandline arguments validation code which I will complete soon.

In the mean time, can you please comment on the logic and utility

Updated

9 years ago
Summary: Please review this code → Tool to detect test files not used by reftest.list manifest

Updated

9 years ago
Attachment #431775 - Attachment mime type: text/x-python → text/plain

Updated

9 years ago
Attachment #431775 - Flags: feedback?(ctalbert)

Updated

9 years ago
Attachment #431775 - Flags: feedback?(jmaher)
I will leave out style and polish comments for now.

If the intention is to identify files that are on the filesystem, but not in a manifest file, then I don't understand why the first for loop is for each file in thereftestfileslist (which that variable name must get changed!)

I would expect to see something like:

reftestFiles = parseManifest(rootManifest) #re use/port reftest.js parser
existingFiles = parseFilesystem(rootDir) #some kind of os.walk 
for each f in existingFiles:
  if reftestFiles.find(f) == None:
    print "HOMELESS TEST %(s) needs a loving manifest" % f

keep in mind the .find() is made up, but can consist of a regex match on the leaf name or some simple function that searches leaf name and ensures singularity based on path name.

Another thing which I don't understand is all_files.  I don't understand the argument list as the second argument is a pattern, but also a .list file?  I am not sure if this is intended to be a recursive function, but never called recursively.

Comment 2

9 years ago
you might want to check out dmandelin's check manifest in js/src/tests/jstests.py
It seems like a script like this would have a lot of false positives due to support files (images, subframes, style sheets, scripts, etc.); I'm not sure if it's too many to be useful.  It might be possible to get a much more accurate list by comparing the list of files that are served when all tests are run over HTTP (can the server take logs?) (once bug 544385 lands) with the list of files packaged by the test packaging.
(Assignee)

Comment 4

9 years ago
Dbaron,

I need more clarity on your comments. Can you please elaborate!

Here is my understanding on how the reftests directory is organized.

There are three classes of files in the reftest.

Super-class: The reftest.list files which call other lists via 'include' or they call test cases in them.

First-class: These are the actual test cases that are referenced by Super-class files

Second-class: These are support files which are not directly referenced by Super-class files but are used in the First-class files.


So, my script is doing three passes.

First-pass : Scan the full ROOT directory structure and get the list of reftest.list files 

Second-pass : In each such directory where a reftest.list file resides, get the list of all other files in that directory and see if they are included in any of the Super-class files.

Third-pass : If a file is not included  by any of the Super-class files, then check if that file is being included in any other files in its own directory or any of sub-directories in its own directory.


In that process I am generating two files ... as you see ...

The found.txt is a useless artifact of my multi-pass scanning.

The notfound.txt is what matters. it lists all those files that are not included in any files or lists.


I agree that my algorithm is not Big(O) optimized. But you seem to think that the output is erroneous.


It could be possible that I totally screwed up in ways that I have not yet realized. Can you please elaborate more on your comments.
(Assignee)

Comment 5

9 years ago
(In reply to comment #2)
> you might want to check out dmandelin's check manifest in
> js/src/tests/jstests.py

Thanks BD ..  I am on it
(Assignee)

Comment 6

9 years ago
(In reply to comment #5)
> (In reply to comment #2)
> > you might want to check out dmandelin's check manifest in
> > js/src/tests/jstests.py
> 
> Thanks BD ..  I am on it

Oops .. I meant BC
(In reply to comment #4)
> Third-pass : If a file is not included  by any of the Super-class files, then
> check if that file is being included in any other files in its own directory or
> any of sub-directories in its own directory.

I didn't realize you were doing that.

(That said, does the way you do this count file C as found if a test A in the reftest.list references file B which in turn references file C?)
(Assignee)

Comment 8

9 years ago
(In reply to comment #2)
> you might want to check out dmandelin's check manifest in
> js/src/tests/jstests.py

Hi BC, I have checked the script and here are my findings :

The logic used in dmandelin's 'def' is very similar to what I have done ( or vice-versa ).

However, there are some simplification-factors available in jsreftest that are ( unfortunately ) unavailable with reftest.

Looks like there is a single manifest file in dmandelin's check and it is assumed that all files that have a '.js' extension should be in that or else missing. 

In my case, we have a graph of manifest files. Also the structure is slightly complicated as explained in Comment #4. 

I am going through multiple passes and pretty much applying similar logic as that of the  "def check_manifest(test_list)"

Again, since I am new to the Python language ..  I may have totally missed the mark here. Please correct me if you find that I am wrong.
(Assignee)

Comment 9

9 years ago
> (That said, does the way you do this count file C as found if a test A in the
> reftest.list references file B which in turn references file C?)

Yes I do. as long as any file in the ROOT path  can be traced through any number of hoops to any reftest.list I mark it as 'found'
I think dbaron was asking if a file includes another file, but it is not in the manifest.  For example:
in the manifest file there is just:
http://mxr.mozilla.org/mozilla-central/source/layout/reftests/font-face/reftest.list#53

but in the file:
http://mxr.mozilla.org/mozilla-central/source/layout/reftests/font-face/cross-iframe-1.html?force=1#28

it references the *-inner.html files.  

Thinking more about an earlier comment dbaron made regarding http server logging would be a good way to do this.  I assume we could turn on verbose logging on a webserver and write a quick log parser to determine what files were loaded.
(Assignee)

Comment 11

9 years ago
> (That said, does the way you do this count file C as found if a test A in the
> reftest.list references file B which in turn references file C?)

Yes I do. as long as any file in the ROOT path  can be traced through any number of hoops to any reftest.list
(Assignee)

Comment 12

9 years ago
> in the manifest file there is just:
> http://mxr.mozilla.org/mozilla-central/source/layout/reftests/font-face/reftest.list#53
> 
> but in the file:
> http://mxr.mozilla.org/mozilla-central/source/layout/reftests/font-face/cross-iframe-1.html?force=1#28
> 
> it references the *-inner.html files.  

I am not sure about the point you are making. 

My logic fails only if there is a cyclic reference like A refers to B and B refers to A and both are not in any reftest.list. In such a case, instead of reporting that both are missing from reftest.list I mark them as found.

If we have such a case, we need to examine that.
(Assignee)

Updated

9 years ago
Assignee: nobody → mnandigama
(In reply to comment #10)
> Thinking more about an earlier comment dbaron made regarding http server
> logging would be a good way to do this.  I assume we could turn on verbose
> logging on a webserver and write a quick log parser to determine what files
> were loaded.

The http server logging does have one area it will miss:  files that exist only to be the "wrong" file, i.e., to make sure we don't load a certain file in some case.

Comment 14

9 years ago
Created attachment 431996 [details]
output (notfound.txt)

Most of the output seems to be real problems.

False positives:
* passinner.png is used to test relative URLs.  You can ignore-list it.
* LICENSE, README, readme.txt can be ignore-listed.
* Please ignore-list ".DS_Store".  Mac OS X litters these all over.
* I think you want to ignore any file whose name ends with ^headers^.

Comment 15

9 years ago
For the rest of the output, I'll file bugs (or report false positives here) as appropriate.

Updated

9 years ago
Alias: HomelessReftests
(Assignee)

Comment 16

9 years ago
Created attachment 432175 [details]
version 2 of the script incorporating all review comments

I have incorporated all comments from jmaher and Jessey. Please review and approve this script.
Attachment #431775 - Attachment is obsolete: true
Attachment #431775 - Flags: feedback?(jmaher)
Attachment #431775 - Flags: feedback?(ctalbert)
(Assignee)

Updated

9 years ago
Attachment #432175 - Flags: review?(jmaher)
Attachment #432175 - Flags: feedback?(dbaron)
Attachment #432175 - Attachment mime type: text/x-python → text/plain
(Assignee)

Comment 17

9 years ago
(In reply to comment #1)

Big guy !!  

I will Big(O) optimize my script using lists and only two scans across the directories as you suggested.

Please stay tuned for the 3rd review some time next week.

For now, this script gives what you want !! the result ...


> I will leave out style and polish comments for now.
> 
> If the intention is to identify files that are on the filesystem, but not in a
> manifest file, then I don't understand why the first for loop is for each file
> in thereftestfileslist (which that variable name must get changed!)
> 
> I would expect to see something like:
> 
> reftestFiles = parseManifest(rootManifest) #re use/port reftest.js parser
> existingFiles = parseFilesystem(rootDir) #some kind of os.walk 
> for each f in existingFiles:
>   if reftestFiles.find(f) == None:
>     print "HOMELESS TEST %(s) needs a loving manifest" % f
> 
> keep in mind the .find() is made up, but can consist of a regex match on the
> leaf name or some simple function that searches leaf name and ensures
> singularity based on path name.
> 
> Another thing which I don't understand is all_files.  I don't understand the
> argument list as the second argument is a pattern, but also a .list file?  I am
> not sure if this is intended to be a recursive function, but never called
> recursively.
Comment on attachment 432175 [details]
version 2 of the script incorporating all review comments

Not clear what feedback is being requested here; I don't think there's any need for me to review this.  I'd only note that:

># The Initial Developer of the Original Code is
># Murali Krishna Nandigama

If you're a Mozilla Corp. employee, this should be the Mozilla Foundation.

># Portions created by the Initial Developer are Copyright (C) 2010-11
># the Initial Developer. All Rights Reserved.

Should be copyright 2010.

>def main():
>    
>    rootDirectory = sys.argv[1]
>    manifestFileName = sys.argv[2]

>if __name__ == '__main__':
>     main() 

This doesn't seem to serve much useful modularity purpose.  Perhaps it would be more useful if it looked something like:

def find_unlisted_files(rootDirectory, manifestFileName):
...

if __name == '__main__':
    find_unlisted_files(sys.argv[1], sys.argv[2])
Attachment #432175 - Flags: feedback?(dbaron)
Comment on attachment 432175 [details]
version 2 of the script incorporating all review comments


>
>def main():
>    
>    rootDirectory = sys.argv[1]
>    manifestFileName = sys.argv[2]
>    

this was mentioned by dbaron that you should have a different function name other than main.

I would keep main which verifies the input parameters and errors out if they are invalid while providing help.  Then call your main function with your validated input parameters.

>    theRefTestFilesList = list(find_all_files(rootDirectory,manifestFileName))
>    

what about calling this 'reftestFiles'?

>    for file in theRefTestFilesList:
>        print file
>        # Get the base directory of the reftest.list file and scan all files.
>        dirpath= os.path.dirname(file)

we don't use file anymore inside this loop, just dirpath.  if we are concerned about dirpath, make a shorter list to loop on.


                    infile=open(myfile,"r")
                    text=infile.read()
                    infile.close()
                    index = text.find(eachFile)

can you make a function here like 'findTextInFile(text, filename)' which returns an index or None?


Lets get this cleaned up with the couple of comments you have on this version and give it another round of reviews.
Attachment #432175 - Flags: review?(jmaher) → review-

Comment 20

9 years ago
Please make it use a top-level manifest (e.g. central/layout/reftests/reftest.list) rather than scanning an entire source tree for files named reftest.list.
(Assignee)

Comment 21

9 years ago
ok tough guys ;)

I am implementing the suggestions.

Comment 22

9 years ago
Created attachment 432250 [details]
Annotated output

I'll submit a big patch to try-server and then mozilla-central to fix the obvious test problems.  For the four that confuse me, I'll file bugs or email developers.
Attachment #431996 - Attachment is obsolete: true

Updated

9 years ago
Depends on: 553833

Updated

9 years ago
Depends on: 553834

Updated

9 years ago
Depends on: 553835

Updated

9 years ago
Depends on: 553836

Updated

9 years ago
Depends on: 553837

Comment 23

9 years ago
http://hg.mozilla.org/mozilla-central/rev/eee895e24866 fixes a bunch of the tool's warnings.

A few of the tests turned out to have been omitted from manifests because they failed.  I filed bugs on those.

Updated

9 years ago
Depends on: 553879

Updated

9 years ago
Depends on: 553880

Updated

9 years ago
Depends on: 553881

Updated

9 years ago
Depends on: 553883

Comment 24

9 years ago
All the problems found with reftests have been fixed or filed as bugs blocking this one.  Murali, lemme know when you have a new version of the tool and I'll run it for crashtests too.
(Assignee)

Comment 25

9 years ago
Created attachment 434050 [details]
Generified manifest checker for review

This version has made the changes recommended by reviewers. Now the script takes only one argument. the fully qualified reftest.list ot crashtest.list file name.

Starting from the file, the list of list files is build and searching is done.

I have not implemented jmaher's suggestion of creating a def for opening file and returning  the search index. That , I felt was too much of modularizing for the sake of doing it. I can live with a little bit of disorder :)


Usage : python generic-mchecker.py /usr/home/src/layout/reftests/reftest.list
Attachment #432175 - Attachment is obsolete: true
(Assignee)

Updated

9 years ago
Attachment #434050 - Attachment mime type: text/x-python → text/plain
(Assignee)

Comment 26

9 years ago
Created attachment 434062 [details]
crashtest needs-home list

Crashtest not found list
(Assignee)

Comment 27

9 years ago
Created attachment 434063 [details]
reftest needs home list

reftest needs home list

CAUTION: my codebase on which I ran these queries is not up to date.

Script to run : 

python generic-mchecker.py /home/mnandigama/src/testing/crashtest/crashtests.list 

python generic-mchecker.py /home/mnandigama/src/layout/reftests/reftest.list

Comment 28

9 years ago
(In reply to comment #27)
> Created an attachment (id=434063) [details]
> reftest needs home list
> 
> reftest needs home list
> 
> CAUTION: my codebase on which I ran these queries is not up to date.
> 
Murali, please do not confuse things by running the script on an out-of-date build tree and attaching the output.  It'd be much more helpful to run the tool with Jesse's patches which correct all the true errors and then it should find only the four tests that have been removed from manifests in order to be disabled.
(Assignee)

Comment 29

9 years ago
Point taken ..Thanks
(Assignee)

Comment 30

9 years ago
Created attachment 434622 [details]
Latest crashtest homeless tests list

This list is latest and all tests found in this list are not bein gused by any manifest file
Attachment #434062 - Attachment is obsolete: true
(Assignee)

Comment 31

9 years ago
Created attachment 434623 [details]
Latest reftests needhome tests list

This list contains 26 tests. This is a reduction from previous 45 count. Even if there are open bugs filed on these 26 files, these keep appearing in the missing list as long as they are not fixed.
Attachment #434063 - Attachment is obsolete: true
(Assignee)

Comment 32

9 years ago
Comment on attachment 434623 [details]
Latest reftests needhome tests list

There seems a strange repeat happening on one test file /home/mnandigama/src/layout/reftests/reftest-sanity/prefix-suffix.html which appeared 4 times. I will investigate the root-cause

Comment 33

9 years ago
I fixed all the crashtests listed in comment 30's attachment.
You need to log in before you can comment on or make changes to this bug.