Closed Bug 1278990 Opened 9 years ago Closed 9 years ago

update mozconfig.cache to set SCCACHE_BUCKET and SCCACHE_NAMESERVER for taskcluster builders

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox50 fixed)

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: pmoore, Assigned: grenade)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 4 obsolete files)

When running builds under taskcluster, there is no buildprops.json file, and therefore build/mozconfig.cache shouldn't assume there is one, and use it to determine platform. This results in it adding option --with-ccache to windows builds, since $platform gets value "". build/mozconfig.cache is lower down in the stack than buildbot, so it shouldn't have any knowledge of buildbot (the dependency is in the wrong direction). Rather, there should be a defined way to specify platform, so that both buildbot and taskcluster and a user running a job outside of any CI can specify the platform in the same way, such that the mozconfig does not need to understand abstraction levels above it, only beneath it.
Don't Linux taskcluster builds have a buildprops.json?
:glandium: they do, but :dustin mentioned on irc that he intends to rip them out. It was left in for convenience, rather than intentionally.
Attached patch bug1278990_gecko_v1.patch (obsolete) — Splinter Review
I shuffled things around a little in this file, but the logic should be the same for buildbot builds and developer builds. The main difference now is that if you set SCCACHE_BUCKET and SCCACHE_NAMESERVER, you will call the same logic that previously was only available to buildbot builds. The idea is that for the Windows builds running in taskcluster, we'll set these two vars in the build when sccache should get used (i.e. when running on try/mozilla-inbound/fx-team).
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8762143 - Flags: review?(mh+mozilla)
Let me know if you'd like to meet up this week in London and go through this patch. Thanks!
Flags: needinfo?(mh+mozilla)
Comment on attachment 8762143 [details] [diff] [review] bug1278990_gecko_v1.patch Review of attachment 8762143 [details] [diff] [review]: ----------------------------------------------------------------- ::: build/mozconfig.cache @@ +14,5 @@ > + > + if test -z "$SCCACHE_DISABLE" -a -z "$no_sccache" -a -z "$MOZ_PGO_IS_SET" -a -z "$MOZ_PGO"; then > + if test -z "${SCCACHE_BUCKET}"; then > + # Respect SCCACHE_BUCKET if already set > + bucket="${SCCACHE_BUCKET}" What is going to set SCCACHE_BUCKET? If it's not in-tree, then I'm afraid this is r- for me. We can talk about it some time this week.
Attachment #8762143 - Flags: review?(mh+mozilla)
Flags: needinfo?(mh+mozilla)
Guys, what do you think about this approach? I'm depending on existing environment variables and only doing so when buildprops are not available. The patch also addresses us-west-1 (which isn't normally used by buildbot slaves, but is supported by tc workers/provisioner). The mozilla-releng-s3-cache-us-west-1-prod bucket may still need to be created and populated, that was not mentioned in mozconfig.cache prior to this patch so it may or may not already exist.
Assignee: pmoore → rthijssen
Attachment #8762143 - Attachment is obsolete: true
Attachment #8766352 - Flags: feedback?(pmoore)
Attachment #8766352 - Flags: feedback?(mh+mozilla)
I spoke to rail and have now created the mozilla-releng-s3-cache-us-west-1-prod bucket. Hopefully, that works as expected.
Attachment #8766352 - Flags: feedback?(mh+mozilla) → review?(mh+mozilla)
Attachment #8766352 - Flags: review?(mh+mozilla) → review+
ready for checkin to m-i
Attachment #8766352 - Attachment is obsolete: true
Attachment #8766352 - Flags: feedback?(pmoore)
Attachment #8767117 - Flags: checkin+
So the problem I hit is below (eg: https://treeherder.mozilla.org/#/jobs?repo=try&revision=66d04f33570a03547d5399e19a964dbb0c5a1aca): 10:36:40 INFO - Traceback (most recent call last): 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\server.py", line 384, in run_command 10:36:40 INFO - for result in _run_command(job): 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\server.py", line 309, in _run_command 10:36:40 INFO - storage = Storage.from_environment() 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\storage.py", line 62, in from_environment 10:36:40 INFO - Storage._storage = Storage._iter.next() 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\storage.py", line 76, in _iter_storages 10:36:40 INFO - dns_server=os.environ.get('SCCACHE_NAMESERVER')) 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\storage.py", line 233, in __new__ 10:36:40 INFO - return S3Storage(bucket_name=bucket_name, dns_server=dns_server) 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\storage.py", line 151, in __init__ 10:36:40 INFO - https_connection_factory=(self._https_connection_class, ())) 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\boto\s3\connection.py", line 176, in __init__ 10:36:40 INFO - validate_certs=validate_certs) 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\boto\connection.py", line 559, in __init__ 10:36:40 INFO - host, config, self.provider, self._required_auth_capability()) 10:36:40 INFO - File "z:\task_1467452691\build\src\sccache\boto\auth.py", line 875, in get_auth_handler 10:36:40 INFO - 'Check your credentials' % (len(names), str(names))) 10:36:40 INFO - NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials 10:36:40 INFO - z:/task_1467452691/build/src/config/rules.mk:878: recipe for target 'host_ListCSSProperties.obj' failed I don't know enough about how s3 buckets and how sccache works to know what the problem is but I have some theories: - the worker instance needs to belong to the same aws account as the s3 bucket? (testing this with a try push: https://hg.mozilla.org/try/rev/57c5f5c67e488a701c34b07511a3b018a1a965ff#l1.40) - the s3 bucket needs to have some configuration to allow it to accept io from tc workers (need help/advice if this is the case) - there are some credentials required for connecting to the bucket (I didn't find anything interesting in the cltbld .boto files on bb builders, so not sure where the creds would need to go) Our previous green builds have used a hack in mozconfig.cache which meant that sccache bucket was not set or used, so we have yet to produce a win tc build that makes use of sccache.
Flags: needinfo?(mh+mozilla)
I tested using buckets in the tc account instead of releng (https://hg.mozilla.org/try/rev/813b221b4c50eb6e95d2cf17b776fac2527d57c1#l1.40) and got the same error. I also tested using the .boto config that releng bb workers use (https://hg.mozilla.org/try/rev/813b221b4c50eb6e95d2cf17b776fac2527d57c1#l2.12) and got the same error.
(In reply to Rob Thijssen (:grenade - GMT) from comment #13) > - the worker instance needs to belong to the same aws account as the s3 > bucket? (testing this with a try push: > https://hg.mozilla.org/try/rev/57c5f5c67e488a701c34b07511a3b018a1a965ff#l1. > 40) AFAIK, it doesn't have to, but there needs to be permissions in place to allow it. > - the s3 bucket needs to have some configuration to allow it to accept io > from tc workers (need help/advice if this is the case) > - there are some credentials required for connecting to the bucket (I didn't > find anything interesting in the cltbld .boto files on bb builders, so not > sure where the creds would need to go) Builbot slaves use ephemeral credentials through IAM roles.
Flags: needinfo?(mh+mozilla)
Thanks Mike! John, I've created the instance profile and iam role to support sccache on windows worker types. Do you know how I can get the iam role associated with spot instances? The aws cli reference shows a parameter like: '--iam-instance-profile Name="taskcluster-level-1-sccache"' on the create-instance command. Is there already syntax support in the provisioner that would allow me to specify the instance profile in the workertype definition in the tc aws provisioner?
Flags: needinfo?(jhford)
Depends on: 1187257
Summary: build/mozconfig.cache requires buildprops.json to determine platform, which only works for buildbot builds → Get TC windows builds to use sccache
renamed bug to reflect ultimate objective that it aims to solve
renamed again to reduce scope of this bug to just the mozconfig.cache updates (making it work requires further changes to tc workers and iam config, which is more than I'm trying to achieve here)
Summary: Get TC windows builds to use sccache → update mozconfig.cache to set SCCACHE_BUCKET and SCCACHE_NAMESERVER for taskcluster builders
Attachment #8767117 - Attachment is obsolete: true
Mike, sorry to make you review again. This latest patch is slightly different in that it uses the taskcluster buckets (instead of buildbot/releng) which are in a different aws account. This makes IAM roles easier to manage and gives the TC provisioner an easier job of assigning and revoking roles. The changes in the patch don't care whether roles or credentials are used for access to the buckets and either should work. I've tested in try using a credentials file on the instance.
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; https://reviewboard.mozilla.org/r/62390/#review59316 ::: build/mozconfig.cache:54 (Diff revision 1) > + if test -z "$SCCACHE_DISABLE" -a -z "$no_sccache" -a -z "$MOZ_PGO_IS_SET" -a -z "$MOZ_PGO"; then > + > + # set S3 bucket according to tree (level) > + case "${GECKO_HEAD_REPOSITORY}" in > + *hg.mozilla.org/try*) > + bucket=taskcluster-level-1-sccache-${availability_zone%?} Please add a comment somewhere that availability_zone is of the form <region><letter> where region is e.g. us-west-2, and that thus, ${availability_zone%?} is expected to be the region. Assigning to a region variable would make that more explicit, too. ::: build/mozconfig.cache:81 (Diff revision 1) > if ! test -e $topsrcdir/sccache/sccache.py; then > echo "Sccache missing in the tooltool manifest" >&2 > exit 1 > fi > mk_add_options "export SCCACHE_BUCKET=$bucket" > - case "$master" in > + if [[ $master == *.us[ew][12].mozilla.com ]] || [ -n "${availability_zone}" ]; then I'd rather avoid bashisms here. Please keep the case. Why not define a fake master in the taskcluster block above?
Attachment #8768012 - Flags: review?(mh+mozilla)
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/62390/diff/1-2/
Attachment #8768012 - Flags: review?(mh+mozilla)
(In reply to Rob Thijssen (:grenade - GMT) from comment #16) > Thanks Mike! > > John, I've created the instance profile and iam role to support sccache on > windows worker types. Do you know how I can get the iam role associated with > spot instances? The aws cli reference shows a parameter like: > '--iam-instance-profile Name="taskcluster-level-1-sccache"' on the > create-instance command. Is there already syntax support in the provisioner > that would allow me to specify the instance profile in the workertype > definition in the tc aws provisioner? Hey. We talked about this on IRC a little, but yes, you should be able to specify that as part of the LaunchSpecification supplied in the worker type definition. When you tried that, it broke. Inside the provisioner, we pass that through as a string property on a JS Object which gets serialised inside the aws-sdk module. The error you were getting looked as if it were somehow converting the string provided into a char array, which was being sent as the object representation of the array of chars. E.g. the aws-sdk was converting {a: 'abc'} into {a: {0: 'a', 1: 'b', 2: 'c'}}. This was freaking out the client or the api and didn't work. This same pattern is used to specify things like the ImageId, so it shouldn't be broken. Let me know if this is important and I can try to dig into it and see what's up here.
Flags: needinfo?(jhford)
Just an update that the try build for the revised patch, pending review, produced green builds in both buildbot and taskcluster. TC builds were a little slower, presumably due to empty cache buckets. I guess we would expect subsequent builds to take less time once the caches are seeded. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e3b8f9d38b83e379d3e9f8e24ac79e21186a9181 I have re-triggered the one failure, which I think was unrelated to sccache.
(In reply to John Ford [:jhford] from comment #23) > Let me know if this is important and I can try to dig into it and see what's > up here. We've got a workaround for now, so rather than invest time fixing it, I think it's fine to leave the workaround in place till we implement the tc-auth stuff in the generic-worker.
https://reviewboard.mozilla.org/r/62390/#review59430 ::: build/mozconfig.cache:11 (Diff revision 2) > > # Avoid duplication if the file happens to be included twice. > if test -z "$bucket" -a -z "$NO_CACHE"; then > > +# buildbot (or builders that use buildprops.json): > +if [ -f $topsrcdir/../buildprops.json ]; then We'll need to be careful if checking for this is what determines if it's a buildbot build or taskcluster and does the right thing.. In the image used for firefox linux builds, there is a stubbed out buildprops.json but it looks like it's in the users home directory, not a directory up from topsrcdir. https://dxr.mozilla.org/mozilla-central/source/testing/docker/desktop-build/Dockerfile#24 ::: build/mozconfig.cache:49 (Diff revision 2) > > +# taskcluster: > +else > + # timeout after 1 second, and don't retry (failure indicates instance is not in ec2 or wget, network issue) > + # availability_zone is of the form <region><letter> where region is e.g. us-west-2, and az is us-west-2a > + availability_zone=$(wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone) So at least for docker-worker, and I would imagine generic-worker, the zone is known on worker start up. We set this in an env variable for every task that runs within docker-worker. I'm not sure if that's better than doing a wget for each task to the meta-data service api. https://github.com/taskcluster/docker-worker/blob/master/lib/task.js#L327 We call it worker group though because the AZ is the worker group for ec2 instances. ::: build/mozconfig.cache:56 (Diff revision 2) > + region=${availability_zone%?} > + if test -z "$SCCACHE_DISABLE" -a -z "$no_sccache" -a -z "$MOZ_PGO_IS_SET" -a -z "$MOZ_PGO"; then > + # set S3 bucket according to tree (level) > + case "${GECKO_HEAD_REPOSITORY}" in > + *hg.mozilla.org/try*) > + bucket=taskcluster-level-1-sccache-${region} I like that we're just substituting the region in here rather than doing another case/switch on it!
https://reviewboard.mozilla.org/r/62390/#review59430 > We'll need to be careful if checking for this is what determines if it's a buildbot build or taskcluster and does the right thing.. > > In the image used for firefox linux builds, there is a stubbed out buildprops.json but it looks like it's in the users home directory, not a directory up from topsrcdir. > > https://dxr.mozilla.org/mozilla-central/source/testing/docker/desktop-build/Dockerfile#24 I've spent a fair bit of time bouncing this code through reviews (it's the last issue preventing enabling of windows builds in taskcluster and the review process has stretched over several weeks). As far as I'm aware, the changes introduced by this patch only allow the sccache bucket and name server to be set (where they otherwise would remain unset) if several special conditions are true. Those conditions are: 1) the presence of an availability zone at an ec2 specific url. 2) the presence and specific contents of the GECKO_HEAD_REPOSITORY variable being set to well known values. 3) the presence and specific contents of a well known Windows specific environment variable. It's unlikely that any of these scenarios could exist in a non-deliberate or accidental scenario. Prior to this patch, fewer scenarios (existence of certain values in the buildprops.json file) was the only criteria that could cause sccache to be configured for use. We're just adding an extra scenario that can cause sccache to be available for use. I can't think of any reason this should be dangerous, but I'll entertain any corrections to my reasoning. > So at least for docker-worker, and I would imagine generic-worker, the zone is known on worker start up. We set this in an env variable for every task that runs within docker-worker. I'm not sure if that's better than doing a wget for each task to the meta-data service api. > > https://github.com/taskcluster/docker-worker/blob/master/lib/task.js#L327 > > We call it worker group though because the AZ is the worker group for ec2 instances. I went with the wget call because the original patch was r- due to using env vars that were not set in-tree. Since the mechanisms that can do this during worker instantiation are not in-tree I had to put logic to determine the region reliably into this code. I don't have any other ideas for how to determine region from a variable set in-tree but I'm open to ideas. > I like that we're just substituting the region in here rather than doing another case/switch on it! Me too :)
09:53 <grenade> glandium: is it safe for me to assume an r+ on https://reviewboard.mozilla.org/r/62390? I haven't got my head around mozreview, but I've made the changes you asked for to my earlier patch. garndt cleared the review flag (I don't know what that means in mozreview parlance) 09:54 <glandium> grenade: I can't answer for garndt 10:00 <grenade> glandium: does https://reviewboard.mozilla.org/r/62390/diff/1-2/ address your concerns from your review of my earlier attempt? 10:01 <glandium> grenade: yes
Keywords: checkin-needed
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/cf971ca7f132 configure sccache for taskcluster ec2 builders; r=glandium
Keywords: checkin-needed
https://reviewboard.mozilla.org/r/62390/#review59430 > I went with the wget call because the original patch was r- due to using env vars that were not set in-tree. Since the mechanisms that can do this during worker instantiation are not in-tree I had to put logic to determine the region reliably into this code. I don't have any other ideas for how to determine region from a variable set in-tree but I'm open to ideas. I don't see anything in the gecko tree that is using `TASKCLUSTER_WORKER_GROUP`: https://dxr.mozilla.org/mozilla-central/search?q=TASKCLUSTER_WORKER_GROUP&redirect=false although I'm happy to add this as a feature to generic worker. Based on https://bugzilla.mozilla.org/show_bug.cgi?id=1278990#c8 though, I think this won't help us in this instance, like Rob says.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Depends on: 1285529
(In reply to Rob Thijssen (:grenade - GMT) from comment #27) > https://reviewboard.mozilla.org/r/62390/#review59430 > I've spent a fair bit of time bouncing this code through reviews (it's the > last issue preventing enabling of windows builds in taskcluster and the > review process has stretched over several weeks). I apologize, I was not implying that this hasn't been vetted through reviews nor suggesting that it should derail the already long stretch this patch has gone down. I was only presenting another side to things to take into considering but no action was needed on it. > I can't think of any reason this should be dangerous, but I'll entertain any > corrections to my reasoning. The reason I mentioned the buildprops file is because we already have one in the linux environment for other reasons (mozharness), although not in a location that this patch would cause issue with. However, developers are allowed to change this build environment and by placing that file at "$topsrcdir/../buildprops.json" could cause sccache to be disabled within taskcluster without possibly realizing it. Unlikely that this would happen as we are phasing out buildbot and hopefully the use of this file, which is why I suggested that we be careful in the future, not that it should cause this patch to be changed in anyway. I didn't mean to imply that if it came off that way.
(In reply to Pete Moore [:pmoore][:pete] from comment #30) > https://reviewboard.mozilla.org/r/62390/#review59430 > > > I went with the wget call because the original patch was r- due to using env vars that were not set in-tree. Since the mechanisms that can do this during worker instantiation are not in-tree I had to put logic to determine the region reliably into this code. I don't have any other ideas for how to determine region from a variable set in-tree but I'm open to ideas. > > I don't see anything in the gecko tree that is using > `TASKCLUSTER_WORKER_GROUP`: > https://dxr.mozilla.org/mozilla-central/ > search?q=TASKCLUSTER_WORKER_GROUP&redirect=false > although I'm happy to add this as a feature to generic worker. Based on > https://bugzilla.mozilla.org/show_bug.cgi?id=1278990#c8 though, I think this > won't help us in this instance, like Rob says. That environment variable is not used in production currently because sccache is not enabled for linux builds using temp s3 creds from taskcluster auth. If there are no issues with using wget on all our build platforms, then double thumbs up. I was just suggesting that we are providing that information within docker-worker environments in case we want to use it, not that it was mandatory to use it.
Wes, can you please backout. This failed on try (l10n) with: 11:43:11 INFO - Error loading mozconfig: /builds/slave/try-m64-l10n-00000000000000000/build/try/.mozconfig 11:43:11 INFO - Evaluation of your mozconfig exited with an error. This could be triggered 11:43:11 INFO - by a command inside your mozconfig failing. Please change your mozconfig 11:43:11 INFO - to not error and/or to catch errors in executed commands. 11:43:11 INFO - mozconfig output: 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - AUTOCLOBBER=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --enable-crashreporter 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --enable-release 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --enable-js-shell 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_BUILD_SYMBOLS=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_L10N_CHECK=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_PACKAGE=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_PACKAGE_TESTS=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_INSTALLER=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_UPDATE_PACKAGING=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_UPLOAD=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_UPLOAD_SYMBOLS=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_SDK=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - AUTOCLOBBER=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --enable-crashreporter 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --enable-release 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --enable-js-shell 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_BUILD_SYMBOLS=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_L10N_CHECK=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_PACKAGE=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_PACKAGE_TESTS=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_INSTALLER=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_UPDATE_PACKAGING=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_UPLOAD=1 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_UPLOAD_SYMBOLS=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_MK_OPTION 11:43:11 INFO - export MOZ_AUTOMATION_SDK=0 11:43:11 INFO - ------END_MK_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --with-l10n-base=../../l10n 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --enable-update-channel=nightly 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --with-branding=browser/branding/nightly 11:43:11 INFO - ------END_AC_OPTION 11:43:11 INFO - ------BEGIN_AC_OPTION 11:43:11 INFO - --with-macbundlename-prefix=Firefox 11:43:11 INFO - ------END_AC_OPTION 11:43:11 ERROR - Return code: 1 11:43:11 ERROR - 1 not in success codes: [0] 11:43:11 WARNING - setting return code to 2 11:43:11 FATAL - Halting on failure while running ['python2.7', 'mach', 'configure'] 11:43:11 FATAL - Running post_fatal callback... At least for OSX. I imagine it will break nightly l10n as well. Per IRC: 2:39 PM <mshal> looks like mine is failing because of this line: availability_zone=$(wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone)
Status: RESOLVED → REOPENED
Flags: needinfo?(wkocher)
Resolution: FIXED → ---
Yeah, comment 34 is exactly one of the issue we fixed in bug 1285529.
Callek: please can you provide a link to the rest of the failure log? I want to resubmit the patch, but it would be super helpful to know the rest of the environment configuration in the failure scenario. Cheers!
Flags: needinfo?(bugspam.Callek)
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/62390/diff/2-3/
Attachment #8768012 - Flags: review?(mh+mozilla) → review?(garndt)
(In reply to Rob Thijssen (:grenade - GMT) from comment #37) > Callek: please can you provide a link to the rest of the failure log? I want > to resubmit the patch, but it would be super helpful to know the rest of the > environment configuration in the failure scenario. Cheers! I noticed the bustage on my own try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3d6c071826f9b80bf8f2fcaa2590e21cb43309a0 I then added the OSX l10n job to the try push from c#24: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e3b8f9d38b83e379d3e9f8e24ac79e21186a9181 As well as backed out that cset and pushed to try with my patchset again: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f82a43eea4bcdcdb649d57cc3f38b11ad1243ca1 *NOTEWORTHY* the timeouts are expected for those Buildbot-based l10n jobs (like OSX). (the timeout for my push wasn't a concern of mine, but for your testing you may want to trim locales) -- Instructions at https://wiki.mozilla.org/ReleaseEngineering/TryServer#Desktop_l10n_jobs
Thank you Callek! I was specifically interested in the presence of the CCACHE_DIR env var, which we can use to determine if sccache config should be attempted (I believe the presence of that var is a good indication that we want the --with-ccache flag set and don't need the sccache bucket and nameserver settings). The latest patch should skip the wget call entirely if that var is set (which it appears to be in the linked logs).
Flags: needinfo?(bugspam.Callek)
Attachment #8768012 - Flags: review?(garndt) → review?(bugspam.Callek)
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; I can't claim to know the intent of this bug well enough to mark a full r+. But it does look like it would be landable without breaking OSX L10n at first glance. I'm going to leave the real review request alone to garndt (or someone else at your discretion) though
Attachment #8768012 - Flags: review?(garndt)
Attachment #8768012 - Flags: review?(bugspam.Callek)
Attachment #8768012 - Flags: feedback+
Attachment #8768012 - Flags: feedback+ → review?(pmoore)
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; Removing r?, looks like pmoore was later flagged for review.
Attachment #8768012 - Flags: review?(garndt)
Attachment #8768012 - Flags: review?(pmoore)
I'll take a review from anyone at this point. Half a dozen folks have looked at it and said it looks fine. Earlier incarnations were reviewed and folks said they were happy but nobody has actually r+ed it. Apologies that I keep changing the reviewer, that was in an effort to catch someone working this morning in BST, but I seem to have just confused things.
I haven't been following the progression of the patch, but I tried it out locally. If I add '. $topsrcdir/build/mozconfig.cache' to my .mozconfig, it adds 13 seconds to configure time. Unfortunately trying to add in echo statements to mozconfig.cache breaks configure, so debugging is a little tricky. However, I replaced wget with a version that creates a text file based on its PID in a tmp directory. A single run of './mach configure' ends up running the 'wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone' 13 times, which accounts for the additional 13 second delay. So I have a few questions: 1) (for ted) When developers start using sccache on their local machines, are they supposed to use build/mozconfig.cache? Or is that only for automation? 2) (for glandium) Do you know off-hand why this would be executed so many times? Is there a way we can cache the wget output so that it is only run once? Ugh, I wish I could force needinfo when someone blocks it. Maybe glandium knows part 1 as well.
Flags: needinfo?(mh+mozilla)
https://reviewboard.mozilla.org/r/62390/#review60330 ::: build/mozconfig.cache:50 (Diff revisions 2 - 3) > -# taskcluster: > -else > +# builds without buildprops (eg: taskcluster or non-buildbot) and without ccache env config and without sccache disabled: > +elif test -z "$CCACHE_DIR" -a -z "$SCCACHE_DISABLE" -a -z "$no_sccache" -a -z "$MOZ_PGO_IS_SET" -a -z "$MOZ_PGO"; then > + > # timeout after 1 second, and don't retry (failure indicates instance is not in ec2 or wget, network issue) > # availability_zone is of the form <region><letter> where region is e.g. us-west-2, and az is us-west-2a > - availability_zone=$(wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone) > + availability_zone=$(wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone || true) I know that glandium looked over the original wget portion, but what mshal brings up definitely should be looked into. ::: build/mozconfig.cache:11 (Diff revision 3) > > # Avoid duplication if the file happens to be included twice. > if test -z "$bucket" -a -z "$NO_CACHE"; then > > +# buildbot (or builders that use buildprops.json): > +if [ -f $topsrcdir/../buildprops.json ]; then Maybe it's just my view in mozreview, but shouldn't everythign in the following block be indented, at least to visually see the buildbot stuff is only run when buildprops.json is found. ::: build/mozconfig.cache:46 (Diff revision 3) > ;; > esac > fi > > +# builds without buildprops (eg: taskcluster or non-buildbot) and without ccache env config and without sccache disabled: > +elif test -z "$CCACHE_DIR" -a -z "$SCCACHE_DISABLE" -a -z "$no_sccache" -a -z "$MOZ_PGO_IS_SET" -a -z "$MOZ_PGO"; then Why do we (taskcluster tasks) need to check for the presence of $CCACHE_DIR but buildbot jobs don't? ::: build/mozconfig.cache:65 (Diff revision 3) > + *hg.mozilla.org/integration/mozilla-inbound*|*hg.mozilla.org/integration/fx-team*) > + bucket=taskcluster-level-3-sccache-${region} > + ;; > + esac > + > + # set a dummy master I know this was added in a previous review, but I'm just curious, what does this "dummy master" stuff mean here? Why is it important to set these?
(In reply to Greg Arndt [:garndt] from comment #45) > Maybe it's just my view in mozreview, but shouldn't everythign in the > following block be indented, at least to visually see the buildbot stuff is > only run when buildprops.json is found. I tend to agree but the pre-existing lines around reading the buildprops file can't be indented (<<EOF ... EOF), so I opted to leave the whole block un-indented for readability. > Why do we (taskcluster tasks) need to check for the presence of $CCACHE_DIR previously, jobs that didn't get a bucket set from the buildprops logic, didn't get a second shot at getting sccache config. Now that they do (with the wget/az logic), some (which don't use sccache) need to be filtered out. the ccache_dir being set is the best indication (i could think of) that should exclude these jobs from attempting sccache config. > I know this was added in a previous review, but I'm just curious, what does > this "dummy master" stuff mean here? Why is it important to set these? glandiums comment from the previous review suggested it was less evil than using bashisms, eg: if [[ $master == *.us[ew][12].mozilla.com ]] || [ -n "${availability_zone}" ]; then
(In reply to Rob Thijssen (:grenade - GMT) from comment #46) > (In reply to Greg Arndt [:garndt] from comment #45) > > > Maybe it's just my view in mozreview, but shouldn't everythign in the > > following block be indented, at least to visually see the buildbot stuff is > > only run when buildprops.json is found. > > I tend to agree but the pre-existing lines around reading the buildprops > file can't be indented (<<EOF ... EOF), so I opted to leave the whole block > un-indented for readability. If you use <<- instead of << then it should work with indents (see bash man page, and look for 'Here Documents').
(In reply to Michael Shal [:mshal] from comment #44) > 2) (for glandium) Do you know off-hand why this would be executed so many > times? I have a utility for showing what is importing mozconfigs, which should help establish why it gets pulled in 13 times for you. You point it to the mozconfig you are using and your topsrcdir, and it will generate your "effective" mozconfig by inlining all the imports and showing where it inlined them from. https://github.com/petemoore/myscrapbook/blob/master/explode_mozconfig.sh
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/62390/diff/3-4/
Attachment #8768012 - Flags: review?(mshal)
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/62390/diff/4-5/
(In reply to Pete Moore [:pmoore][:pete] from comment #48) > I have a utility for showing what is importing mozconfigs, which should help > establish why it gets pulled in 13 times for you. You point it to the > mozconfig you are using and your topsrcdir, and it will generate your > "effective" mozconfig by inlining all the imports and showing where it > inlined them from. > > https://github.com/petemoore/myscrapbook/blob/master/explode_mozconfig.sh Thanks for the utility - seems quite useful. Unfortunately it looks like in this case that the file gets included once as a result of mozconfig evaluation (the mozconfig is a single line including build/mozconfig.cache). I think perhaps it is getting evaluated multiple times by make or configure somewhere along the way. Running strace on configure shows the .mozconfig file (and subsequently, mozconfig.cache) opened many times.
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; https://reviewboard.mozilla.org/r/62390/#review60378 The patch looks reasonable to me, but unfortunately it doesn't seem to fix the multiple-wget call issue. I'm not sure why (perhaps every time mozconfig is evaluated is in a different shell?) Here is what I was using to test: $ cat ~/tbin/wget #! /bin/sh echo "Running wget: wget $@" > $HOME/tbin/$$.txt $ rm -rf obj-x86_64-pc-linux-gnu/; PATH=$HOME/tbin:$PATH ./mach configure $ cat ~/tbin/*.txt Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Running wget: wget -T 1 -t 1 -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone Are you able to reproduce the issue with this? (Note that the wrapper script doesn't actually invoke wget, it just prints a message to a file.)
Attachment #8768012 - Flags: review?(mshal)
(In reply to Michael Shal [:mshal] from comment #44) > 1) (for ted) When developers start using sccache on their local machines, > are they supposed to use build/mozconfig.cache? Or is that only for > automation? Local developers are not supposed to use in-tree mozconfigs. Never. Except for b2g builds because... b2g, I guess. Anyways, when developers start using sccache on their local machines the expectations won't change. They won't be using build/mozconfig.cache (except b2g developers) > 2) (for glandium) Do you know off-hand why this would be executed so many > times? Is there a way we can cache the wget output so that it is only run > once? mozconfig is evaluated way too many times. It used to be even worse before bug 1283052.
Flags: needinfo?(mh+mozilla)
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; Review request updated; see interdiff: https://reviewboard.mozilla.org/r/62390/diff/5-6/
Attachment #8768012 - Flags: review?(mshal)
https://reviewboard.mozilla.org/r/62390/#review60378 I haven't been able to repro due to my setup/lack of knowledge but I've submitted an update to the patch which runs an `export availability_zone` in an effort to cache/persist the result of the wget call.
Attachment #8768012 - Flags: review?(mshal) → review+
Comment on attachment 8768012 [details] Bug 1278990 - configure sccache for taskcluster ec2 builders; https://reviewboard.mozilla.org/r/62390/#review60732 Can you file a followup bug to either find a way to limit the number of mozconfig evaluations, or run the wget outside of the mozconfig so that it only happens once? ::: build/mozconfig.cache:81 (Diff revision 6) > + us-west-2) > + master=dummy.usw2.mozilla.com > + ;; > + esac > + fi > + export availability_zone Unfortunately this doesn't have the intended effect. Can you remove the export before landing? It might just add confusion to future development.
r+'ed version of the patch ready for landing
Attachment #8768012 - Attachment is obsolete: true
bug 1286326 created to address the repeated wget call issue
Status: REOPENED → ASSIGNED
Keywords: checkin-needed
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4c8316172145 Configure sccache for taskcluster ec2 builders. r=mshal
Keywords: checkin-needed
What would be the options for using sccache with this mozconfig.cache file in an automation environment that restricts network access, including specifically access to AWS meta-data and user-data endpoints?
garndt: we'd have to add handling for that scenario. my guess is we'd have to accept that the only way to determine region in that case would be to read an environment variable set outside of tree (like the ones set by the tc-worker or generic-worker) because if an in-tree process can't access metadata over the network, you have very limited options for determining ec2 region. you could possibly use the ip address of the public interface and include address ranges in tree, but that feels a lot stinkier than just accepting that the info comes from outside of tree. if network access is restricted as far as blocking access to the regional s3 bucket, then the only option is to set the SCCACHE_DIR variable to a local folder (or network store mapped to a local folder) and use sccache's local cache functionality. at that point though, its not shared cache anymore so the value is lessened.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
(In reply to Michael Shal [:mshal] from comment #44) > 1) (for ted) When developers start using sccache on their local machines, > are they supposed to use build/mozconfig.cache? Or is that only for > automation? That's only for automation. I intend to add a `--with-sccache` configure option for local developers (bug 1266717).
in comments 16 and 23 above we made some references to problems associating instances with iam roles. i just want to document here that the syntax required to make that work (now implemented for windows try builders) is: "launchSpec": { "IamInstanceProfile": { "Arn": "arn:aws:iam::692406183521:instance-profile/taskcluster-level-1-sccache" } } the hint for the correct syntax came from: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-request-examples.html#spot-launch-specification2. the IamInstanceProfile property is actually an object rather than a string as suggested elsewhere in ec2 docs.
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: