Closed Bug 1453056 Opened 2 years ago Closed 2 years ago
test-verify should have the ability to run in chunks depending on the incoming tests
I see many cases in-tree where we are moving tests or importing them- this causes them to run in different jobs or contexts. For example last week I saw us migrating tests in volume from mochitest -> xpcshell for web extensions; also this past weekend I saw about 30 reftests being moved to a w3c submitted folder and some slight modifications at the same time. We can add chunks to test-verify with some logic- specifically looking for file patterns for each job type (misc, web-platform-tests) and chunking based on the maximum number of tests to run per chunk.
This could also be useful for per-test coverage.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Comment on attachment 8966755 [details] [diff] [review] run test-verify in chunks Review of attachment 8966755 [details] [diff] [review]: ----------------------------------------------------------------- Lots of implementation nits here, but nothing too serious. The use of file patterns in taskcluster to estimate the number of modified tests is ingenious; it will work well for lots of cases. There will also be mismatches: Cases where non-test files are counted as tests in taskcluster (more chunks spawned than necessary: inefficient, but not a big deal if it happens infrequently and the number of chunks is not excessive) and cases where test files are missed in taskcluster (fewer chunks spawned than required to verify all tests: not all tests are verified, like today). I am not convinced of the need for chunking for test-verify. What is the goal? Chunking will verify more tests, but I don't know that we need that. Boiling the ocean -- running every test through TV -- does not really move us closer to running TV as tier 1 (see https://bugzilla.mozilla.org/show_bug.cgi?id=1405428#c5). If it's just for code coverage, let's apply it to code coverage and keep verification simple and efficient (I thought we had agreed on that). The downside of chunking is that we use more resources. TV-ing a test means a good 20x more test machine time than just running the test once. That makes me uneasy, but capping the number of tests verified on a push makes it seem more sane. I don't want to get rid of that cap without a good reason. ::: taskcluster/taskgraph/transforms/tests.py @@ +764,5 @@ > them and assigning 'this-chunk' appropriately and updating the treeherder > symbol.""" > for test in tests: > + if test['suite'].startswith('test-verify'): > + test['chunks'] = split_perfile_chunks(config, test['test-name']) Consider capping the number of chunks to a maximum. If some mass linting change modifies thousands of tests, do you want to start up hundreds of chunks?? @@ +786,5 @@ > > yield chunked > > > +def split_perfile_chunks(config, type): nit: split_perfile_chunks is not actually splitting anything. Maybe "get_perfile_total_chunks"? Also, with an eye to code-coverage, etc in future, maybe there is too much verify-specific logic here. Could split out logic just for count_modified_test_files(file_patterns), then if test['suite'].startswith('test-verify'): if test['test-name'].startswith('test-verify-wpt'): file_patterns = 'testing/web-platform/tests' else: file_patterns = ... num_files = count_modified_test_files(config, file_patterns) test['chunks'] = int(math.ceil(num_files/10.0)) @@ +787,5 @@ > yield chunked > > > +def split_perfile_chunks(config, type): > + # Determine the number of chunks to run test-verify in I would like more explanation here, saying that file_patterns are being used to approximate the number of modified test files, to estimate the number of chunks required. @@ +792,5 @@ > + > + # TODO: Make this flexible based on coverage vs verify || test type > + tests_per_chunk = 10.0 > + > + if type == 'test-verify-wpt-e10s': type.startswith('test-verify-wpt') ? @@ +793,5 @@ > + # TODO: Make this flexible based on coverage vs verify || test type > + tests_per_chunk = 10.0 > + > + if type == 'test-verify-wpt-e10s': > + file_patterns = ['testing/web-platform/tests/**/*.htm*'] Why not testing/web-platform/tests/** ? @@ +794,5 @@ > + tests_per_chunk = 10.0 > + > + if type == 'test-verify-wpt-e10s': > + file_patterns = ['testing/web-platform/tests/**/*.htm*'] > + elif type == 'test-verify-e10s' or type == 'test-verify': type.startswith('test-verify') @@ +804,5 @@ > + test_files =  > + for pattern in file_patterns: > + for path in changed_files: > + if mozpackmatch(path, pattern): > + test_files.append(path) nit: Don't need to keep track of the test_files -- could just increment a counter here. ::: testing/mozharness/mozharness/mozilla/testing/verify_tools.py @@ +105,5 @@ > ('chrome', 'clipboard') : 'chrome-clipboard', > ('plain', 'clipboard') : 'plain-clipboard', > ('browser-chrome', 'devtools') : 'mochitest-devtools-chrome', > + ('browser-chrome', 'screenshots') : 'browser-chrome-screenshots', > + # below should be on test-verify-gpu job Make this change in bug 1425929. @@ +205,5 @@ > else: > self._find_misc_tests(dirs, changed_files) > > + # verify mode doesn't run normally, we split tests up in _find_*_tests > + # and then pass each test into the harness on the cli I do not understand this comment. What does "run normally" mean? _find_*_tests does not split tests up. @@ +210,5 @@ > + > + # chunk files > + total_tests = sum([len(self.verify_suites[x]) for x in self.verify_suites]) > + > + files_per_chunk = total_tests / (int(self.config.get('total_chunks', 1)) * 1.0) nit: float() instead of int() * 1.0 @@ +224,5 @@ > + current += 1 > + if current >= start and current < end: > + if suite not in suites: > + suites[suite] =  > + suites[suite].append(test) nit: consider break'ing once current >= end
addressed all the nits, added supporting comments. Regarding test-verify vs coverage- I think treating per_file style testing the same is a good idea, I also agree we shouldn't open the door to too full chunking. I made a choice to set the max chunks at 3 with a supporting comment; if that is not good we can leave it at one and then the infrastructure is in place for more :)
figured out why chunking wasn't working: 212c212 < + elif not self.config.get('verify'): --- > + elif self.config.get('verify') is not True: 234c234 < + elif c.get('total_chunks') and c.get('this_chunk') and not c.get('verify'): --- > + elif c.get('total_chunks') and c.get('this_chunk') and c.get('verify') is not True: 256c256 < + elif not c.get('verify'): --- > + elif c.get('verify') is not True: :gbrown, is it ok to carry forward your r+ with those specific changes?
Yes. Also keep in mind that marco is changing this sort of check to something like self.verify_enabled -- probably safer.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/c26bbed87ab0 test-verify should have the ability to run in chunks depending on the incoming tests. r=gbrown
https://hg.mozilla.org/mozilla-central/annotate/c26bbed87ab0/taskcluster/taskgraph/transforms/tests.py#l822 added a call to get_changed_files(), which makes requests to hg.m.o every time taskgraph generation is run. This makes it impossible to run taskgraph generation completely locally. Is there another way we can get the information you need without hitting hg.m.o each time when taskgraph generation is run locally?
I hadn't thought about the hg.m.o hits -- that is troubling and I hope :jmaher can think of a way around that. > This makes it impossible to run taskgraph generation completely locally. I'm not sure I understand. I can run 'mach taskgraph tasks' locally and it completes OK.
(In reply to Geoff Brown [:gbrown] from comment #11) > I hadn't thought about the hg.m.o hits -- that is troubling and I hope > :jmaher can think of a way around that. > > > This makes it impossible to run taskgraph generation completely locally. > > I'm not sure I understand. I can run 'mach taskgraph tasks' locally and it > completes OK. I should have said 'impossible to run without hitting the network'
thanks for mentioning this :catlee, I will look into this and see what options we have.
apologies for taking so long, seems that a work week requires a lot of time to catch up and finish what was already started. Looking into this, I had copied the general logic from the optimize step: https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/optimize.py#357 I see that is @memoize, possibly we could do that for perfile_number_of_chunks. In SETA (used in optimize), we have a two calls (treeherder and hg) to external services: https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/util/seta.py I guess, there was enough prior art for so long that I didn't think this was a problem? Would time be better spent figuring out how to cache the data from a hg call, I am thinking to share the data between the optimize step and the perfile_number_of_chunks step? Or would time be better spent figure out a flow in the code that when we are not on a network that we can still push successfully?
Flags: needinfo?(jmaher) → needinfo?(catlee)
I haven't dug too deeply into it, but it looks like https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/optimize.py#357 doesn't normally get run when doing taskgraph generation locally. If I have a parameters file, then the only remote request is via this new call in tests.py
when bug 1400895 lands, this patch will apply to it and solve this problem :)
Depends on: 1400895
this is simple, but when I turn my network off, this traps the failure and continues to run mach properly :)
Comment on attachment 8980022 [details] [diff] [review] allow localhost only runs of mach taskgraph Review of attachment 8980022 [details] [diff] [review]: ----------------------------------------------------------------- Thanks
Attachment #8980022 - Flags: review?(ahal) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/cbf4d07d0e14 allow mach taskgraph runs locally with no network. r=ahal
You need to log in before you can comment on or make changes to this bug.