Closed Bug 1337903 Opened 8 years ago Closed 8 years ago

Pull python-tests out of 'make check' on OSX

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: ahal, Assigned: dustin)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Bug 1320194 and bug 1003417 are close to landing which means we'll be able to run python-tests in their own taskcluster tasks on Linux. There still isn't a great story for OSX and Windows though, as the taskcluster support isn't quite there. I think OSX is going to be the higher priority due to the data centre migration (?), so this bug will only focus on that for now. We'll either need to get taskcluster run_task support working on OSX, or else stand up some buildbot jobs that do it. Obviously the former would be preferable, but I'm not sure what dependencies (other than implementing the in-tree configs) it has.
Ahal, can you follow up with dustin about what TaskCluster support needs to be added?
Flags: needinfo?(ahalberstadt)
Depends on: 1342230
On OSX taskcluster uses something called "taskcluster-worker" instead of "docker-worker". From the taskcluster service side of things, we should be able to use taskcluster-worker tasks right now, though for the moment caching doesn't work. Greg Arndt said this is mostly just because it hasn't been prioritized yet. So the good news is that we should be able to get it working only with in-tree changes to the taskcluster configs side (sans caches). I don't think anyone is fully aware of what will be needed, but I did some digging and found we'll at least need to do the following: 1) Go through the files in this directory: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/job Then pull out "docker-worker" specific logic/configs, and share the rest with a "taskcluster-worker" run_using entry point. 2) Add a "taskcluster-worker" payload builder here: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/task.py#430 I started working on a patch and I think I mostly know how to do 1). But I gave up once I realized that I needed to do 2) as well. There may be more steps beyond 2). Overall I think this is pretty do-able, though it probably requires support from the taskcluster team. If I were working on this, I'd estimate 2 weeks to working prototype and a month to complete.
Flags: needinfo?(ahalberstadt)
tc-worker is super-flexible, so we are building its payloads on a per-engine basis. The native engine is the one we use on OS X (since it runs tasks natively, not in containers or VMs). Happily, that payload builder already exists https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/task.py#253 https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/task.py#591 so, I think (2) is largely done for you. And you're right about (1). I think you'd need to add @run_job_using("native-engine", "run-task", schema=run_task_schema) similar to https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/run_task.py#42 and then add a new alternative for mozbase, using `implementation: native-engine`: https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/source-test/python-tests.yml#37
Assignee: nobody → dustin
So run-task is fairly linux-specific. In particular, it does lots of chowny stuff on the assumption that it's running as root, which it is not on a mac. So that's pretty annoying. It also has logic to fetch hg fingerprints from the secrets service, but we don't have a taskcluster proxy on OS X, so it doesn't have access to that service. I think I will just rely on the fallback behavior for that, for the interim.
I think I have the tests running. They are "succeeding" in that they exit with status 0, but seem to not be actually doing anything. https://public-artifacts.taskcluster.net/Urcrfsx9RzCbnnlreuoV5w/0/public/logs/live_backing.log [task 2017-03-06T20:30:09.163251Z] Installed /Users/cltbld/checkouts/gecko/obj-x86_64-apple-darwin14.5.0/_virtualenv/lib/python2.7/site-packages/blessings-1.6-py2.7.egg [task 2017-03-06T20:30:09.163298Z] Finished processing dependencies for mozlog==3.4 [task 2017-03-06T20:30:09.163331Z] Error in atexit._run_exitfuncs: [task 2017-03-06T20:30:09.163364Z] Traceback (most recent call last): [task 2017-03-06T20:30:09.163425Z] File "/tools/python27/lib/python2.7/atexit.py", line 24, in _run_exitfuncs [task 2017-03-06T20:30:09.163458Z] func(*targs, **kargs) [task 2017-03-06T20:30:09.163525Z] File "/tools/python27/lib/python2.7/multiprocessing/util.py", line 284, in _exit_function [task 2017-03-06T20:30:09.163557Z] info('process shutting down') [task 2017-03-06T20:30:09.163592Z] TypeError: 'NoneType' object is not callable [task 2017-03-06T20:30:09.163618Z] Error in sys.exitfunc: [task 2017-03-06T20:30:09.163651Z] Traceback (most recent call last): [task 2017-03-06T20:30:09.163713Z] File "/tools/python27/lib/python2.7/atexit.py", line 24, in _run_exitfuncs [task 2017-03-06T20:30:09.163743Z] func(*targs, **kargs) [task 2017-03-06T20:30:09.163813Z] File "/tools/python27/lib/python2.7/multiprocessing/util.py", line 284, in _exit_function [task 2017-03-06T20:30:09.163845Z] info('process shutting down') [task 2017-03-06T20:30:09.163886Z] TypeError: 'NoneType' object is not callable [task 2017-03-06T20:30:09.163898Z] [task 2017-03-06T20:30:09.261528Z] cat: backend.TestManifestBackend.in: No such file or directory [task 2017-03-06T20:30:09.264755Z] Build configuration changed. Regenerating backend. [task 2017-03-06T20:30:09.651903Z] No build detected, test metadata may be incomplete. [task 2017-03-06T20:30:09.677458Z] 0:05.72 Return code from mach python-test: 0 so maybe that's what's missing? At any rate this seems a bit out of my expertise at this point. And, in fact, I don't see anything called "backend.TestManifestBackend.in"
Flags: needinfo?(dave.hunt)
Attachment #8844141 - Flags: review?(ahalberstadt)
(I'd like to get :gps's review on the run-task changes, since he wrote it, but he's away for a few more weeks)
Comment on attachment 8844125 [details] Bug 1337903: install robustclone on workers https://reviewboard.mozilla.org/r/117672/#review119386 just a nit in commit message, lgtm ::: commit-message-0d94f:3 (Diff revision 1) > +Bug 1337903: install robustclone on workers; r?wcosta r?grenade > + > +This installes the extension everywhere, unconditionally. Because we do nit: "installs"
Attachment #8844125 - Flags: review?(wcosta) → review+
Dave, just clarifying your needinfo here. I told Dustin to needinfo you because it looked like there was a problem setting up mozlog (due to the fact we now setup.py install it), and was wondering if you had seen a traceback like that before. But on closer inspection it looks like the error is happening after mozlog is installed.. so I guess just confirm you haven't that traceback before. The 'cat: backend.TestManifestBackend.in: No such file or directory' message is harmless, and actually expected in this case. We should probably file a bug to suppress it, as it does add confusion. The real problem must be those tracebacks that look like they are coming out of the 'virtualenv install' step from the build. Though, I can't be sure and have no idea where specifically they are originating.. Mike, have you seen anything like those tracebacks in comment 16 before?
Flags: needinfo?(mshal)
Another possibility is that those tracebacks are also harmless, and you are simply trying to run tests that require a build to get detected. If that's the case, then everything is working as expected (minus cryptic tracebacks in the log).
I guess that wouldn't quite be "as expected" since there's no build in this case, and there should be one?
Attachment #8844125 - Flags: review?(rthijssen) → review+
> Dave, just clarifying your needinfo here. I told Dustin to needinfo you because it looked like there was a problem setting up mozlog (due to the fact we now setup.py install it), and was wondering if you had seen a traceback like that before. But on closer inspection it looks like the error is happening after mozlog is installed.. so I guess just confirm you haven't that traceback before. Thanks for the clarification. I can confirm that I've not seen that traceback before, and as you say it doesn't appear to be from mozlog.
Flags: needinfo?(dave.hunt)
(In reply to Dustin J. Mitchell [:dustin] from comment #23) > I guess that wouldn't quite be "as expected" since there's no build in this > case, and there should be one? No, there shouldn't be a build. Actually, just noticed that you are running the mozbase tests. Unless you modified their subsuites in the manifests to include OSX, e.g: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/tests/manifest.ini Then, they won't get picked up. Try adding `|| os == "mac"` there and see if it gets picked up.
(In reply to Andrew Halberstadt [:ahal] from comment #21) > The real problem must be those tracebacks that look like they are coming out > of the 'virtualenv install' step from the build. Though, I can't be sure and > have no idea where specifically they are originating.. Mike, have you seen > anything like those tracebacks in comment 16 before? I have not seen that before, though it looks like somehow the info function in multiprocessing/util.py is getting overridden with None. I don't know how that would happen without some intentional overwriting of sys.modules or something. Why is blessings being downloaded as part of the virtualenv setup in the first place? It's already in the tree as python/blessings. We don't want to rely on external servers for things like that.
Flags: needinfo?(mshal)
Comment on attachment 8844140 [details] Bug 1337903: add and use a job_try_name attribute; https://reviewboard.mozilla.org/r/117676/#review119748 Great cleanup, thanks! ::: commit-message-76bf7:19 (Diff revision 1) > +The tasks remain un-selectable in try, as before (they always run, but can be > +optimized away by when.files-changed). They actually can be selected with -j.. It's just undocumented and not supported by 'mach try', but we should fix both those things. ::: taskcluster/ci/source-test/mozlint.yml:1 (Diff revision 1) > -mozlint-eslint/opt: > +mozlint-eslint: Thanks for getting 'opt' out of the task name.. that was bugging me :) ::: taskcluster/taskgraph/transforms/source_test.py:34 (Diff revision 1) > + # The platform on which this task runs. This will be used to set up attributes > + # (for try selection) and treeherder metadata (for display). If given as a list, > + # the job will be "split" into multiple tasks, one with each platform. > + Required('platform'): Any(basestring, [basestring]), I wonder if we should even support a string here.. Could just require a list which makes it more obvious how to add more.. but guess it's not a big deal.
Attachment #8844140 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8844141 [details] Bug 1337903: add support for OS X in run-task; https://reviewboard.mozilla.org/r/117678/#review119760 Thanks so much for doing this looks great! Just some minor comments and questions. Agreed it would have been ideal for gps to review this, but not worth delaying this for several more weeks. ::: taskcluster/ci/source-test/python-tests.yml:57 (Diff revision 1) > > mozbase: > description: testing/mozbase unit tests > platform: > - linux64/opt > + - macosx64/opt I'd be in favour of leaving the changes in this file for a follow-up bug. We can deal with getting the subsuites right and investigate those tracebacks there as well. That said it's useful to include this here for now for testing purposes. ::: taskcluster/docker/recipes/run-task:105 (Diff revision 1) > + # Obtain certificate fingerprints. Without this, the checkout will use the fingerprint > + # on the system, which is (hopefully) managed some other way So how is it managed for native-worker? ::: taskcluster/docker/recipes/run-task:188 (Diff revision 1) > + # expand ~ in some paths > + if args.vcs_checkout: > + args.vcs_checkout = os.path.expanduser(args.vcs_checkout) > + if args.tools_checkout: > + args.tools_checkout = os.path.expanduser(args.tools_checkout) I almost want to say use a for loop with 'getattr', but their formats are all just different enough that I won't. ::: taskcluster/taskgraph/transforms/job/mach.py:33 (Diff revision 1) > run = job['run'] > > # defer to the run_task implementation > - run['command'] = 'cd /home/worker/checkouts/gecko && ./mach ' + run['mach'] > + run['command'] = 'cd ~/checkouts/gecko && ./mach ' + run['mach'] > run['checkout'] = True > del run['mach'] This could be factored out to a 'run_mach' function. ::: taskcluster/taskgraph/transforms/job/run_task.py:79 (Diff revision 1) > + if run['checkout']: > + support_vcs_checkout(config, job, taskdesc) > + > + if run['requires-build']: > + add_build_dependency(config, job, taskdesc) A lot of common code here as well.
Attachment #8844141 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8844140 [details] Bug 1337903: add and use a job_try_name attribute; https://reviewboard.mozilla.org/r/117676/#review119748 > They actually can be selected with -j.. It's just undocumented and not supported by 'mach try', but we should fix both those things. Oh, interesting -- this change would make them selectable with `-j linux64`, which is probably not what you want. Currently `-j` selects jobs by looking for tasks with a kind in JOB_KINDS and with `build_platform` matching the `-j` option. Should I set `build_platform` to mozbase for these tasks? That seems wrong (the build platform is not, in fact, mozbase).
Comment on attachment 8844141 [details] Bug 1337903: add support for OS X in run-task; https://reviewboard.mozilla.org/r/117678/#review120086 ::: taskcluster/ci/source-test/python-tests.yml:57 (Diff revision 1) > > mozbase: > description: testing/mozbase unit tests > platform: > - linux64/opt > + - macosx64/opt If I'm understanding correctly, you're saying to omit this additional platform, since the resulting tests aren't ready yet, and landing the rest? ::: taskcluster/docker/recipes/run-task:105 (Diff revision 1) > + # Obtain certificate fingerprints. Without this, the checkout will use the fingerprint > + # on the system, which is (hopefully) managed some other way With Puppet. I'm not sure why I wrote "(hopefully)"! ::: taskcluster/taskgraph/transforms/job/mach.py:33 (Diff revision 1) > run = job['run'] > > # defer to the run_task implementation > - run['command'] = 'cd /home/worker/checkouts/gecko && ./mach ' + run['mach'] > + run['command'] = 'cd ~/checkouts/gecko && ./mach ' + run['mach'] > run['checkout'] = True > del run['mach'] Oh, good point, I didn't see that they had converged to be almost identical!
(In reply to Dustin J. Mitchell [:dustin] from comment #30) > Oh, interesting -- this change would make them selectable with `-j linux64`, > which is probably not what you want. Currently `-j` selects jobs by looking > for tasks with a kind in JOB_KINDS and with `build_platform` matching the > `-j` option. Should I set `build_platform` to mozbase for these tasks? > That seems wrong (the build platform is not, in fact, mozbase). Yeah let's not do that, it sounds like maybe -j needs to be re-worked. Practically no one uses -j, so maybe we can just leave it as is for now and file a follow-up to make it look at the 'label' (or name or whatever it ends up as) instead. Or if the fix is simple, you could throw it in here too.
I think I came up with a fix for that.
I filed bug 1345554 for mozreview's confusion over which of those has and has not been reviewed.
Comment on attachment 8844141 [details] Bug 1337903: add support for OS X in run-task; https://reviewboard.mozilla.org/r/117678/#review120186 Thanks, looks good! ::: taskcluster/ci/source-test/python-tests.yml:57 (Diff revisions 1 - 2) > > mozbase: > description: testing/mozbase unit tests > platform: > - linux64/opt > - - macosx64/opt > + # macosx64/opt -- not ready yet This will be removed before landing right? Otherwise I could picture this deterring people from working on it under ths assumption there are mysterious blockers :)
Attachment #8844141 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8844141 [details] Bug 1337903: add support for OS X in run-task; https://reviewboard.mozilla.org/r/117678/#review120186 > This will be removed before landing right? Otherwise I could picture this deterring people from working on it under ths assumption there are mysterious blockers :) Sorry, guess I should have clarified my last issue. By useful for "testing" purposes, I just mean while we were debugging this patch.. Please uncommit this file.
Comment on attachment 8844140 [details] Bug 1337903: add and use a job_try_name attribute; https://reviewboard.mozilla.org/r/117676/#review120262 ::: taskcluster/taskgraph/try_option_syntax.py:573 (Diff revision 2) > - elif attr('kind') in JOB_KINDS: > + elif attr('kind') in JOB_KINDS_MATCHING_BUILD_PLATFORM: > # This will add 'job' tasks to the target set even if no try syntax was specified. > if not self.jobs: > return True > if attr('build_platform') in self.jobs: > return True Would it be hard to get 'toolchain' and 'android-stuff' to also use job_try_name and set it to the build_platform via transform? Then you could remove this block. Consider this optional.
Comment on attachment 8845003 [details] Bug 1337903: handle platforms for source-test distinctly; https://reviewboard.mozilla.org/r/118250/#review120266 In case it wasn't clear, I properly reviewed the 2nd changeset as well.
Attachment #8845003 - Flags: review?(ahalberstadt) → review+
(In reply to Andrew Halberstadt [:ahal] from comment #39) > Sorry, guess I should have clarified my last issue. By useful for "testing" > purposes, I just mean while we were debugging this patch.. Please uncommit > this file. To be clear, you're asking me to remove all hunks in this file, and change the commit message from "add, and use, support for OS X in run-task" to just "add support for OS X in run-task"? Then we'll actually use it in a followup, based on comment 29? > Would it be hard to get 'toolchain' and 'android-stuff' to also use > job_try_name and set it to the build_platform via transform? Then you could > remove this block. > > Consider this optional. I looked at it briefly, and it wasn't obvious what try names were expected. For toolchain, I think I can ask glandium. 'android-stuff' is sort of a graveyard for unloved tasks, so I could probably change that with no deep concern. I made a follow-on: bug 1345863.
Keywords: leave-open
(In reply to Dustin J. Mitchell [:dustin] from comment #42) > To be clear, you're asking me to remove all hunks in this file, and change > the commit message from "add, and use, support for OS X in run-task" to just > "add support for OS X in run-task"? Then we'll actually use it in a > followup, based on comment 29? Yes. I'll file the bug to get mozbase pulled out on OSX. Just felt it would be cleaner to tackle separately as there are several other non-taskcluster related things that will need to be changed. > I looked at it briefly, and it wasn't obvious what try names were expected. > For toolchain, I think I can ask glandium. 'android-stuff' is sort of a > graveyard for unloved tasks, so I could probably change that with no deep > concern. I made a follow-on: bug 1345863. Couldn't you just set it to 'build-platform' and then it would have the same behaviour as now? Or is it more complicated than that? Either way, it's not a huge priority and I'm happy with the follow-up on file. Thanks!
Initially I had filed this as a meta bug, but now it contains the necessary precursor work. So moving bug 1342230 from dependency to blocked by.
Blocks: 1342230
No longer depends on: 1342230
Blocks: 1345874
Pushed by dmitchell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/52d4c68eddd2 handle platforms for source-test distinctly; r=ahal https://hg.mozilla.org/integration/autoland/rev/f725dc76b398 add and use a job_try_name attribute; r=ahal https://hg.mozilla.org/integration/autoland/rev/774a218c2c4c add support for OS X in run-task; r=ahal
Ah, I was leaving open, but bug 1345874 is the follow-on.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → mozilla55
Depends on: 1368733
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: