./mach subcommand to print tests categorized by bugzilla components

NEW
Unassigned

Status

Testing
General
7 months ago
a month ago

People

(Reporter: armenzg, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

Version 3
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

MozReview Requests

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(3 attachments)

(Reporter)

Description

7 months ago
We want to see the sources of information with regards to which tests are associated to which manifests, components and map that to intermittency.

We want to investigate the various source and produce summary data.
May you add an example of a "manifest" and a "component"?  Links to either, maybe?
I view components as bugzilla components, which loosely maps to source directories.

One example I did back in December with made up data is:
https://people-mozilla.org/~jmaher/dashboard/dashboard.html

I view the bugzilla components as someone who is a triage owner:
https://bugzilla.mozilla.org/page.cgi?id=triage_owners.html


Currently when viewing the source tree and using mach:
$ ./mach file-info bugzilla-component layout/printing/**
Core :: Layout
  layout/printing/PrintTranslator.cpp
  layout/printing/PrintTranslator.h
  layout/printing/crashtests
  layout/printing/ipc
  layout/printing/moz.build
  layout/printing/nsIPrintProgress.idl
  layout/printing/nsIPrintProgressParams.idl
  layout/printing/nsIPrintStatusFeedback.idl
  layout/printing/nsPagePrintTimer.cpp
  layout/printing/nsPagePrintTimer.h
  layout/printing/nsPrintData.cpp
  layout/printing/nsPrintData.h
  layout/printing/nsPrintEngine.cpp
  layout/printing/nsPrintEngine.h
  layout/printing/nsPrintObject.cpp
  layout/printing/nsPrintObject.h
  layout/printing/nsPrintPreviewListener.cpp
  layout/printing/nsPrintPreviewListener.h

or more specifically:
./mach file-info bugzilla-component layout/printing/crashtests/509839-1.html
Core :: Layout
  layout/printing/crashtests/509839-1.html


I find the mapping tests in-tree -> bugzilla components gets us a path to ownership (via triage owners) and if we can give those owners a good set of data so they understand what code and tests they have along with any exceptions or issues to look at, then the bet is that we are providing value by giving useful data for teams to make decisions and focus on the right things.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Reporter)

Comment 6

7 months ago
My latest push to MozReview has working code.

You can test with this command:
> ./mach file-info test-bug-components-associations testing/mochitest/

It will show a summary and create a file with all the data.

Running against the topsrcdir runs slowly. I will have to optimize.
it appears my 'windows' environment is not enjoying working with this:
$ ./mach file-info test-bug-components-associations layout/
We have 594 tests under this/these paths: ['layout/']
Error running mach:

    ['file-info', 'test-bug-components-associations', 'layout/']

The error occurred in the implementation of the invoked mach command.

This should never occur and is likely a bug in the implementation of that
command. Consider filing a bug for this issue.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

KeyError: u'layout\\style\\test\\test_bug499655.html'

  File "c:\Users\elvis\mozilla-inbound\python/mozbuild/mozbuild/frontend/mach_commands.py", line 164, in associations
    m = self._get_files_info([path])[path].get('BUG_COMPONENT', 'UNKNOWN')



I suspect this is something goofy with the \ vs / on windows. let me poke around a bit.
and a small hack like this works:
diff --git a/python/mozbuild/mozbuild/frontend/mach_commands.py b/python/mozbuild/mozbuild/frontend/mach_commands.py
--- a/python/mozbuild/mozbuild/frontend/mach_commands.py
+++ b/python/mozbuild/mozbuild/frontend/mach_commands.py
@@ -155,18 +155,19 @@ class MozbuildFileCommands(MachCommandBa
             if 'relpath' in metadata and metadata['relpath'] not in tests_by_path:
                 tests_by_path[metadata['relpath']] = metadata

         print('We have %d tests under this/these paths: %s' % (len(tests_by_path), paths))

         # Let's create a data structure containing the bug components
         components_to_files_metadata = defaultdict(list)
         for path in tests_by_path:
+            p = path.replace('\\', '/')
             # TODO: This might be suboptimal
-            m = self._get_files_info([path])[path].get('BUG_COMPONENT', 'UNKNOWN')
+            m = self._get_files_info([p])[p].get('BUG_COMPONENT', 'UNKNOWN')
             if m != 'UNKNOWN':
                 m = '%s::%s' % (m.product, m.component)
             components_to_files_metadata[m].append(path)

         print('We have found tests in %d components' % len(components_to_files_metadata))

         with open('tests_in_bug_components.json', 'w') as fd:
             fd.write(json.dumps(components_to_files_metadata, indent=4))

Comment 9

7 months ago
mozreview-review
Comment on attachment 8855917 [details]
Bug 1352495 - File-info mach subcommand to dump tests by bug component.

https://reviewboard.mozilla.org/r/127804/#review130514

This seems pretty reasonable!

::: python/mozbuild/mozbuild/frontend/mach_commands.py:151
(Diff revision 3)
> +            # NOTE: Metadata generated by resolve_tests() can have multiple items for
> +            # the same test file. For instance, a test file can have multiple entries
> +            # with differening 'ancestor_manifest'. Not sure if this is a bug or expected
> +            # We're making the call of just picking the first appearance

It is a feature. Some test files have different configurations. Each test configuration is represented as its own test entry in the test resolver.

Since bug metadata is file-based, de-duping entries based on path is the correct thing to do.

::: python/mozbuild/mozbuild/frontend/mach_commands.py:163
(Diff revision 3)
> +        print('We have %d tests under this/these paths: %s' % (len(tests_by_path), paths))
> +
> +        # Let's create a data structure containing the bug components
> +        components_to_files_metadata = defaultdict(list)
> +        for path in tests_by_path:
> +            # TODO: This might be suboptimal

This is definitely sub-optimal. `get_files_info()` has to run some somewhat expensive code. The evaluation essentially batches up files in the same directory. So passing the full list/set of paths you are interested in and invoking once should be faster than invoking N times.

I wouldn't be surprised if there were some algorithmic improvements that could be made to `get_files_info()`: that function is intended to be run with tens or hundreds of inputs, not thousands. But unless we're doing something really dumb (which would be obvious by running in a profiler), it will almost certainly be faster than N invocations.

::: python/mozbuild/mozbuild/frontend/mach_commands.py:172
(Diff revision 3)
> +            components_to_files_metadata[m].append(path)
> +
> +        print('We have found tests in %d components' % len(components_to_files_metadata))
> +
> +        with open('tests_in_bug_components.json', 'w') as fd:
> +            fd.write(json.dumps(components_to_files_metadata, indent=4))

We also typically sort JSON files so output is deterministic and more human readable.
Comment hidden (mozreview-request)
(Reporter)

Comment 11

7 months ago
Focusing the bug on the current feature being developed.
Summary: Investigate finding metadata about test files, associated them to bug components and manifest files → ./mach subcommand to print tests categorized by bugzilla components
this seems to run faster, but still fails for me on windows :)

I also noticed we do not have reftests associated here- I noticed in layout/reftests/moz.build we define REFTEST_MANIFESTS and associate all the reftest.list files with that.  Possibly handle this in a followup bug :)
Depends on: 1355163
(Reporter)

Updated

7 months ago
Depends on: 1334525
Comment hidden (mozreview-request)
(Reporter)

Comment 14

6 months ago
mozreview-review-reply
Comment on attachment 8855917 [details]
Bug 1352495 - File-info mach subcommand to dump tests by bug component.

https://reviewboard.mozilla.org/r/127804/#review130514

> This is definitely sub-optimal. `get_files_info()` has to run some somewhat expensive code. The evaluation essentially batches up files in the same directory. So passing the full list/set of paths you are interested in and invoking once should be faster than invoking N times.
> 
> I wouldn't be surprised if there were some algorithmic improvements that could be made to `get_files_info()`: that function is intended to be run with tens or hundreds of inputs, not thousands. But unless we're doing something really dumb (which would be obvious by running in a profiler), it will almost certainly be faster than N invocations.

Having made the changes you mentioned on how to call _get_files_info() helped (all paths at once instead of calling N times), however, yesterday I fixed the issue where I would not be processing reftests ('relpath' vs ' file_relpath') and that bumped the number of files from 16k to 42k.
The script now takes forever. I will do some profiling.
and there are also web_platform_tests to consider, they are defined in testing/web-platform/*.

Not an easy problem to solve, but it sounds like you are doing the right things here Armen!
Armen,

From your mail:

> The data TestResolver uses comes from the build generated "all-tests.pkl" file.

Can you read the "all-tests.pkl" file directly? What does TestResolver do? It appears to be coded assuming a particular access pattern, or use case; your case appears to not match.

May you send me a copy of an "all-tests.pkl" file?  I am very curious about what that file is about.

Thank you!
Flags: needinfo?(armenzg)
(Reporter)

Comment 17

6 months ago
Created attachment 8857057 [details]
Performance for 10000 paths that use 'relpath'

Processing between tests using 'relpath' vs 'file_relpath' is not necessarily the issue.

Going from 10k test paths to 15k paths does not increase the number of function calls in 50% or increase the processing time approximately 50%.

We have a non-linear growth with number of inputs.

~15k entries:
> 232813041 function calls (232803206 primitive calls) in 126.765 seconds

~10k entries:
>  78787958 function calls (78782017 primitive calls) in 41.454 seconds

That's rougly 300% increase.
(Reporter)

Comment 18

6 months ago
Created attachment 8857058 [details]
Performance for 15000 paths that use 'relpath'

Here are the top 4 tottime calls between the two.

~15k entries:
>  4738162   17.521    0.000   26.463    0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:329(normpath)
>    14796   12.716    0.001   93.053    0.006 /Users/armenzg/repos/firefox/python/mozbuild/mozbuild/frontend/reader.py:1401(test_defaults_for_path)
> 13086004   12.058    0.000   18.874    0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:120(dirname)
>  2330408    9.416    0.000   63.748    0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:424(relpath)

~10k entries:
>  7059225    6.306    0.000    9.818    0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:120(dirname)
>     9796    5.859    0.001   22.748    0.002 /Users/armenzg/repos/firefox/python/mozbuild/mozbuild/frontend/reader.py:1401(test_defaults_for_path)
>  7005202    3.073    0.000   14.327    0.000 /Users/armenzg/repos/firefox/python/mozbuild/mozpack/path.py:51(dirname)
>   532526    2.181    0.000    3.326    0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:329(normpath)
Flags: needinfo?(armenzg)
(Reporter)

Comment 19

6 months ago
gps: Do you see any obvious perf changes?
Should we look at another approach? (e.g. generate a file with bugzilla components at build time)
(Reporter)

Comment 20

6 months ago
ekyle: the file gets generated as part of the build process under the objdir.

I've uploaded it for you:
https://drive.google.com/open?id=0B3E5wl4QTSYKUXZlbndLNUh0X1U

I don't know what specifically generates it. You can find more about it in here:
https://dxr.mozilla.org/mozilla-central/search?q=all-tests.pkl&=mozilla-central
(Reporter)

Updated

6 months ago
Blocks: 1355568
Depends on: 1355630
As expected, there was some sub-optimal code in metadata reading. The patches I just attached in bug 1355630 reduce execution time for `mach file-info test-bug-components-associations dom` (9131 tests) from ~27.3s to ~3.7s. When running on the entire repo (42+k files), execution time dropped from minutes to ~24s.

There are probably some other optimizations to be found. But I think I chopped off the big and easy ones.
(Reporter)

Comment 22

6 months ago
That's fantastic! Thanks for your prompt help!
I had a quick look at this. It worked well for me and seemed fast. 

On first run I found myself wondering, "does this include skipped tests?" and "which test suites/types are included?"

I did not like that it created a json file...I know I'll forget about that and leave json files all over. Consider just printing the summary (as you have it currently) and adding a flag or using an alternate subcommand to dump the full json to stdout.

Comment 24

6 months ago
mozreview-review
Comment on attachment 8855917 [details]
Bug 1352495 - File-info mach subcommand to dump tests by bug component.

https://reviewboard.mozilla.org/r/127804/#review131754

This is mostly good. r- because it isn't quite ready to land.

::: python/mozbuild/mozbuild/frontend/mach_commands.py:121
(Diff revision 5)
>              print('%s :: %s' % (component.product, component.component) if component else 'UNKNOWN')
>              for f in sorted(files):
>                  print('  %s' % f)
>  
> +
> +    @SubCommand('file-info', 'test-bug-components-associations',

`test-bug-components-associations` doesn't read well to me.

As I was thinking of better names, I realized we already have a `bugzilla-component` sub-command. And, the command you've implemented here is a special case of a more generic command that aggregates/counts files by bug component.

How do you feel about rewriting this so it is a special mode of execution of the `bugzilla-component` sub-command. For example `mach file-info bugzilla-component --counts`?

::: python/mozbuild/mozbuild/frontend/mach_commands.py:131
(Diff revision 5)
> +        """Show metadata for each test file.
> +
> +        Given a requested path print a break down of bug components to test files
> +        """
> +        from mozbuild.testing import TestResolver
> +        resolver = self._spawn(TestResolver)

I'm not sure if it has been called out explicitly, but the test resolver will only find test files that are registered with the current build. So, there may be missing files not known to the build/resolver.

::: python/mozbuild/mozbuild/frontend/mach_commands.py:178
(Diff revision 5)
> +            components_to_files_metadata[m].append(path)
> +
> +        print('We have found tests in %d components' % len(components_to_files_metadata))
> +
> +        # Order the list to guarantee deterministic output when dumping to file
> +        for k, _ in components_to_files_metadata.items():

Use `itervalues()` here and call `.sort()` on instances directly. This will return object references and the sort is done in place. So there's no need to care about the key.

::: python/mozbuild/mozbuild/frontend/mach_commands.py:181
(Diff revision 5)
> +        with open('tests_in_bug_components.json', 'w') as fd:
> +            fd.write(json.dumps(components_to_files_metadata, indent=4, sort_keys=True))

I'm not a fan of unconditionally doing this. It needs to be put behind a flag, preferably also accepting an explicit output file name.
Attachment #8855917 - Flags: review?(gps) → review-
(Reporter)

Comment 25

6 months ago
Not working on this at the moment.
I've refocused on Quantum talos work.
Assignee: armenzg → nobody
(Reporter)

Comment 26

a month ago
Dave: this is the bug I was mentioning to you.

We can expand the data structure (or create a new one) which would contain all tests and under which configuration/platforms are run or not.

Publishing these artifacts to the index would allow building tools around it.
You need to log in before you can comment on or make changes to this bug.