Open Bug 1352495 Opened 3 years ago Updated 5 months ago
./mach subcommand to print tests categorized by bugzilla components
59 bytes, text/x-review-board-request
5.16 KB, text/plain
5.67 KB, text/plain
We want to see the sources of information with regards to which tests are associated to which manifests, components and map that to intermittency. We want to investigate the various source and produce summary data.
May you add an example of a "manifest" and a "component"? Links to either, maybe?
I view components as bugzilla components, which loosely maps to source directories. One example I did back in December with made up data is: https://people-mozilla.org/~jmaher/dashboard/dashboard.html I view the bugzilla components as someone who is a triage owner: https://bugzilla.mozilla.org/page.cgi?id=triage_owners.html Currently when viewing the source tree and using mach: $ ./mach file-info bugzilla-component layout/printing/** Core :: Layout layout/printing/PrintTranslator.cpp layout/printing/PrintTranslator.h layout/printing/crashtests layout/printing/ipc layout/printing/moz.build layout/printing/nsIPrintProgress.idl layout/printing/nsIPrintProgressParams.idl layout/printing/nsIPrintStatusFeedback.idl layout/printing/nsPagePrintTimer.cpp layout/printing/nsPagePrintTimer.h layout/printing/nsPrintData.cpp layout/printing/nsPrintData.h layout/printing/nsPrintEngine.cpp layout/printing/nsPrintEngine.h layout/printing/nsPrintObject.cpp layout/printing/nsPrintObject.h layout/printing/nsPrintPreviewListener.cpp layout/printing/nsPrintPreviewListener.h or more specifically: ./mach file-info bugzilla-component layout/printing/crashtests/509839-1.html Core :: Layout layout/printing/crashtests/509839-1.html I find the mapping tests in-tree -> bugzilla components gets us a path to ownership (via triage owners) and if we can give those owners a good set of data so they understand what code and tests they have along with any exceptions or issues to look at, then the bet is that we are providing value by giving useful data for teams to make decisions and focus on the right things.
My latest push to MozReview has working code. You can test with this command: > ./mach file-info test-bug-components-associations testing/mochitest/ It will show a summary and create a file with all the data. Running against the topsrcdir runs slowly. I will have to optimize.
it appears my 'windows' environment is not enjoying working with this: $ ./mach file-info test-bug-components-associations layout/ We have 594 tests under this/these paths: ['layout/'] Error running mach: ['file-info', 'test-bug-components-associations', 'layout/'] The error occurred in the implementation of the invoked mach command. This should never occur and is likely a bug in the implementation of that command. Consider filing a bug for this issue. If filing a bug, please include the full output of mach, including this error message. The details of the failure are as follows: KeyError: u'layout\\style\\test\\test_bug499655.html' File "c:\Users\elvis\mozilla-inbound\python/mozbuild/mozbuild/frontend/mach_commands.py", line 164, in associations m = self._get_files_info([path])[path].get('BUG_COMPONENT', 'UNKNOWN') I suspect this is something goofy with the \ vs / on windows. let me poke around a bit.
and a small hack like this works: diff --git a/python/mozbuild/mozbuild/frontend/mach_commands.py b/python/mozbuild/mozbuild/frontend/mach_commands.py --- a/python/mozbuild/mozbuild/frontend/mach_commands.py +++ b/python/mozbuild/mozbuild/frontend/mach_commands.py @@ -155,18 +155,19 @@ class MozbuildFileCommands(MachCommandBa if 'relpath' in metadata and metadata['relpath'] not in tests_by_path: tests_by_path[metadata['relpath']] = metadata print('We have %d tests under this/these paths: %s' % (len(tests_by_path), paths)) # Let's create a data structure containing the bug components components_to_files_metadata = defaultdict(list) for path in tests_by_path: + p = path.replace('\\', '/') # TODO: This might be suboptimal - m = self._get_files_info([path])[path].get('BUG_COMPONENT', 'UNKNOWN') + m = self._get_files_info([p])[p].get('BUG_COMPONENT', 'UNKNOWN') if m != 'UNKNOWN': m = '%s::%s' % (m.product, m.component) components_to_files_metadata[m].append(path) print('We have found tests in %d components' % len(components_to_files_metadata)) with open('tests_in_bug_components.json', 'w') as fd: fd.write(json.dumps(components_to_files_metadata, indent=4))
Comment on attachment 8855917 [details] Bug 1352495 - File-info mach subcommand to dump tests by bug component. https://reviewboard.mozilla.org/r/127804/#review130514 This seems pretty reasonable! ::: python/mozbuild/mozbuild/frontend/mach_commands.py:151 (Diff revision 3) > + # NOTE: Metadata generated by resolve_tests() can have multiple items for > + # the same test file. For instance, a test file can have multiple entries > + # with differening 'ancestor_manifest'. Not sure if this is a bug or expected > + # We're making the call of just picking the first appearance It is a feature. Some test files have different configurations. Each test configuration is represented as its own test entry in the test resolver. Since bug metadata is file-based, de-duping entries based on path is the correct thing to do. ::: python/mozbuild/mozbuild/frontend/mach_commands.py:163 (Diff revision 3) > + print('We have %d tests under this/these paths: %s' % (len(tests_by_path), paths)) > + > + # Let's create a data structure containing the bug components > + components_to_files_metadata = defaultdict(list) > + for path in tests_by_path: > + # TODO: This might be suboptimal This is definitely sub-optimal. `get_files_info()` has to run some somewhat expensive code. The evaluation essentially batches up files in the same directory. So passing the full list/set of paths you are interested in and invoking once should be faster than invoking N times. I wouldn't be surprised if there were some algorithmic improvements that could be made to `get_files_info()`: that function is intended to be run with tens or hundreds of inputs, not thousands. But unless we're doing something really dumb (which would be obvious by running in a profiler), it will almost certainly be faster than N invocations. ::: python/mozbuild/mozbuild/frontend/mach_commands.py:172 (Diff revision 3) > + components_to_files_metadata[m].append(path) > + > + print('We have found tests in %d components' % len(components_to_files_metadata)) > + > + with open('tests_in_bug_components.json', 'w') as fd: > + fd.write(json.dumps(components_to_files_metadata, indent=4)) We also typically sort JSON files so output is deterministic and more human readable.
Focusing the bug on the current feature being developed.
Summary: Investigate finding metadata about test files, associated them to bug components and manifest files → ./mach subcommand to print tests categorized by bugzilla components
this seems to run faster, but still fails for me on windows :) I also noticed we do not have reftests associated here- I noticed in layout/reftests/moz.build we define REFTEST_MANIFESTS and associate all the reftest.list files with that. Possibly handle this in a followup bug :)
Comment on attachment 8855917 [details] Bug 1352495 - File-info mach subcommand to dump tests by bug component. https://reviewboard.mozilla.org/r/127804/#review130514 > This is definitely sub-optimal. `get_files_info()` has to run some somewhat expensive code. The evaluation essentially batches up files in the same directory. So passing the full list/set of paths you are interested in and invoking once should be faster than invoking N times. > > I wouldn't be surprised if there were some algorithmic improvements that could be made to `get_files_info()`: that function is intended to be run with tens or hundreds of inputs, not thousands. But unless we're doing something really dumb (which would be obvious by running in a profiler), it will almost certainly be faster than N invocations. Having made the changes you mentioned on how to call _get_files_info() helped (all paths at once instead of calling N times), however, yesterday I fixed the issue where I would not be processing reftests ('relpath' vs ' file_relpath') and that bumped the number of files from 16k to 42k. The script now takes forever. I will do some profiling.
and there are also web_platform_tests to consider, they are defined in testing/web-platform/*. Not an easy problem to solve, but it sounds like you are doing the right things here Armen!
Armen, From your mail: > The data TestResolver uses comes from the build generated "all-tests.pkl" file. Can you read the "all-tests.pkl" file directly? What does TestResolver do? It appears to be coded assuming a particular access pattern, or use case; your case appears to not match. May you send me a copy of an "all-tests.pkl" file? I am very curious about what that file is about. Thank you!
Processing between tests using 'relpath' vs 'file_relpath' is not necessarily the issue. Going from 10k test paths to 15k paths does not increase the number of function calls in 50% or increase the processing time approximately 50%. We have a non-linear growth with number of inputs. ~15k entries: > 232813041 function calls (232803206 primitive calls) in 126.765 seconds ~10k entries: > 78787958 function calls (78782017 primitive calls) in 41.454 seconds That's rougly 300% increase.
Here are the top 4 tottime calls between the two. ~15k entries: > 4738162 17.521 0.000 26.463 0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:329(normpath) > 14796 12.716 0.001 93.053 0.006 /Users/armenzg/repos/firefox/python/mozbuild/mozbuild/frontend/reader.py:1401(test_defaults_for_path) > 13086004 12.058 0.000 18.874 0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:120(dirname) > 2330408 9.416 0.000 63.748 0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:424(relpath) ~10k entries: > 7059225 6.306 0.000 9.818 0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:120(dirname) > 9796 5.859 0.001 22.748 0.002 /Users/armenzg/repos/firefox/python/mozbuild/mozbuild/frontend/reader.py:1401(test_defaults_for_path) > 7005202 3.073 0.000 14.327 0.000 /Users/armenzg/repos/firefox/python/mozbuild/mozpack/path.py:51(dirname) > 532526 2.181 0.000 3.326 0.000 /Users/armenzg/.pyenv/versions/2.7.12/lib/python2.7/posixpath.py:329(normpath)
gps: Do you see any obvious perf changes? Should we look at another approach? (e.g. generate a file with bugzilla components at build time)
ekyle: the file gets generated as part of the build process under the objdir. I've uploaded it for you: https://drive.google.com/open?id=0B3E5wl4QTSYKUXZlbndLNUh0X1U I don't know what specifically generates it. You can find more about it in here: https://dxr.mozilla.org/mozilla-central/search?q=all-tests.pkl&=mozilla-central
As expected, there was some sub-optimal code in metadata reading. The patches I just attached in bug 1355630 reduce execution time for `mach file-info test-bug-components-associations dom` (9131 tests) from ~27.3s to ~3.7s. When running on the entire repo (42+k files), execution time dropped from minutes to ~24s. There are probably some other optimizations to be found. But I think I chopped off the big and easy ones.
That's fantastic! Thanks for your prompt help!
I had a quick look at this. It worked well for me and seemed fast. On first run I found myself wondering, "does this include skipped tests?" and "which test suites/types are included?" I did not like that it created a json file...I know I'll forget about that and leave json files all over. Consider just printing the summary (as you have it currently) and adding a flag or using an alternate subcommand to dump the full json to stdout.
Comment on attachment 8855917 [details] Bug 1352495 - File-info mach subcommand to dump tests by bug component. https://reviewboard.mozilla.org/r/127804/#review131754 This is mostly good. r- because it isn't quite ready to land. ::: python/mozbuild/mozbuild/frontend/mach_commands.py:121 (Diff revision 5) > print('%s :: %s' % (component.product, component.component) if component else 'UNKNOWN') > for f in sorted(files): > print(' %s' % f) > > + > + @SubCommand('file-info', 'test-bug-components-associations', `test-bug-components-associations` doesn't read well to me. As I was thinking of better names, I realized we already have a `bugzilla-component` sub-command. And, the command you've implemented here is a special case of a more generic command that aggregates/counts files by bug component. How do you feel about rewriting this so it is a special mode of execution of the `bugzilla-component` sub-command. For example `mach file-info bugzilla-component --counts`? ::: python/mozbuild/mozbuild/frontend/mach_commands.py:131 (Diff revision 5) > + """Show metadata for each test file. > + > + Given a requested path print a break down of bug components to test files > + """ > + from mozbuild.testing import TestResolver > + resolver = self._spawn(TestResolver) I'm not sure if it has been called out explicitly, but the test resolver will only find test files that are registered with the current build. So, there may be missing files not known to the build/resolver. ::: python/mozbuild/mozbuild/frontend/mach_commands.py:178 (Diff revision 5) > + components_to_files_metadata[m].append(path) > + > + print('We have found tests in %d components' % len(components_to_files_metadata)) > + > + # Order the list to guarantee deterministic output when dumping to file > + for k, _ in components_to_files_metadata.items(): Use `itervalues()` here and call `.sort()` on instances directly. This will return object references and the sort is done in place. So there's no need to care about the key. ::: python/mozbuild/mozbuild/frontend/mach_commands.py:181 (Diff revision 5) > + with open('tests_in_bug_components.json', 'w') as fd: > + fd.write(json.dumps(components_to_files_metadata, indent=4, sort_keys=True)) I'm not a fan of unconditionally doing this. It needs to be put behind a flag, preferably also accepting an explicit output file name.
Attachment #8855917 - Flags: review?(gps) → review-
Not working on this at the moment. I've refocused on Quantum talos work.
Assignee: armenzg → nobody
Dave: this is the bug I was mentioning to you. We can expand the data structure (or create a new one) which would contain all tests and under which configuration/platforms are run or not. Publishing these artifacts to the index would allow building tools around it.
You need to log in before you can comment on or make changes to this bug.