Bug 1145774 (Closed) - Opened 6 years ago, Closed 5 years ago

figure out how to do release av scans when we move to s3

Categories: Release Engineering :: Release Automation: Other (defect)
Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: bhearsum; Assigned: nthomas

Attachments

(11 files, 6 obsolete files)

19.24 KB, patch (rail: review+)
53 bytes, text/x-github-pull-request (dustin: review+)
487 bytes, patch (nthomas: review+)
1.48 KB, patch (rail: review+; dustin: feedback+; nthomas: checked-in+)
841 bytes, patch (rail: review+; nthomas: checked-in+)
7.29 KB, patch (jlund: review+; nthomas: checked-in+)
1.07 KB, patch (rail: review+; nthomas: checked-in+)
3.50 KB, patch (rail: review+)
9.43 KB, patch (nthomas: review+; nthomas: checked-in+)
1.21 KB, patch (Callek: review+)
53 bytes, text/x-github-pull-request (rail: review+)
We currently run clamd on upload1...that's not going to work with s3. We should also make sure we do something that doesn't process all of the release artifacts serially.
One idea (that I think someone else came up with) is to monitor the release bucket(s) for new files and scan them as they come in. This would take the AV scan out of the critical path and potentially save tons of time during release automation.

Something mshal suggested today was to upload artifacts to the nightly bucket initially, and then have the AV scan process them and put them in the release bucket. This might solve the retention issue at the same time, and could probably be done in combination with the above. Depending on what happens in bug 1147142 this may not be possible. E.g.: if we serve files directly off of s3, we may need to know their final location at build time to have balrog submission work.
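
For illustration only, a minimal sketch of that scan-then-promote flow using boto 2 (the bucket names and the clamdscan invocation here are assumptions, not what release automation eventually did):

import subprocess
import tempfile

import boto

# Illustrative bucket names; the real ones would come from release config.
NIGHTLY_BUCKET = "net-mozaws-stage-delivery-nightly"
RELEASE_BUCKET = "net-mozaws-stage-delivery-firefox"

def scan_and_promote(prefix):
    """Scan every object under `prefix` in the nightly bucket with clamd,
    and copy it into the release bucket only if the scan comes back clean."""
    conn = boto.connect_s3()
    src = conn.get_bucket(NIGHTLY_BUCKET)
    for key in src.list(prefix=prefix):
        with tempfile.NamedTemporaryFile(suffix="-av") as f:
            key.get_contents_to_filename(f.name)
            # clamdscan exits 0 for clean, 1 for infected, 2 on errors.
            rc = subprocess.call(["clamdscan", "--no-summary", f.name])
        if rc != 0:
            raise Exception("AV scan failed for %s (rc=%d)" % (key.name, rc))
        key.copy(RELEASE_BUCKET, key.name)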
Blocks: 1147140
http://aws.amazon.com/lambda/ is a perfect match for this job! :)
(In reply to Rail Aliiev [:rail] from comment #2)
> http://aws.amazon.com/lambda/ is a perfect match for this job! :)

If we go this route we're going to have to think about how to make sure we actually verify that everything has passed AV prior to releasing. The fact that AV is a node in the release DAG right now ensures that. If we're doing something disconnected from the automation we'll need something else. It doesn't necessarily have to be part of the DAG - it could be as simple as something that makes sure the AV lambda jobs aren't stuck, or anything along those lines.
(In reply to Rail Aliiev [:rail] from comment #2)
> http://aws.amazon.com/lambda/ is a perfect match for this job! :)

I'm not so sure Lambda is a good fit after looking at it. AFAICT you can only run JS code, and don't have control over what's installed on the compute platform. Given that, I'm not sure how we'd be able to run clamd or any other AV scan on it. Am I missing something? The docs suggest that Beanstalk might be an alternative. Given that Beanstalk is web-app centric, I suppose we might be able to make use of it if we implement antivirus-as-a-web-service. I'm not sure where the glue that makes requests to such a service after each s3 upload would live, though.
Yeah, Lambda is not ready yet. ATM it's JS/node-only and limited to 60s per task.

We can use TC with custom scheduler listening for S3 PUT events.
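
A hedged sketch of what "listening for S3 PUT events" could look like, assuming boto3 and an SQS queue that is already permitted to receive the bucket's notifications (all names and ARNs below are placeholders):

import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/av-scan-events"
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:av-scan-events"
BUCKET = "net-mozaws-stage-delivery-firefox"

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# One-time setup: publish ObjectCreated (PUT etc.) events to the SQS queue.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*"],
        }]
    },
)

# The "custom scheduler" loop: poll for events, create a scan task per object.
while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20,
                               MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        # s3:TestEvent messages have no "Records" key, hence the .get().
        for record in json.loads(msg["Body"]).get("Records", []):
            key = record["s3"]["object"]["key"]
            print("would create a TaskCluster scan task for", key)
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])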
I was chatting with Pete to get some additional TC understanding and we ended up talking about this. Now that I have a better understanding of things, I think you're right that TC would make a lot of sense. We could do something like:
* Docker Worker w/ image that has clam installed to do scanning (we'd want to make sure that they know how to update their definitions though)
* External process that creates task graphs in response to things being put into the release bucket (I think this is what you mean by "custom scheduler")

The trick is going to be knowing when all of the AV scans are done. Pete suggested that we could keep extending the task graph as deliverables come in, which would let us watch the task queue for completion. I don't think that will work because there's not a constant flow of them - you could have gaps where you've scanned the existing ones but are still waiting for more to be generated, and AFAIK there's no way to have gaps like that in the task graph.

Rail and I talked about this a bit more and I think we can get what we need by setting extra metadata on the objects in s3 when AV completes, and have a Scheduler or job in Buildbot that makes sure all of the objects with a certain prefix have that metadata set before "ready for release" fires.

There's still a bit of trickiness in making sure you don't start scanning the s3 bucket too soon because partial MARs will soon be moved to Funsize - and Buildbot won't know when those jobs complete. We're not 100% sure what to do here yet, but we talked about the idea of another job in Buildbot that is fed the task ids for the partials, and watches those queues. Once all of those complete, and any jobs from Buildbot that generate deliverables, you could start looking for the antivirus metadata.
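
To make the extra-metadata idea above concrete, a sketch in boto 2 (the "av-scanned" metadata key is a made-up name): S3 object metadata can't be edited in place, so marking means copying the object onto itself, and the readiness check is a listing that confirms every object under the release prefix carries the marker.

import boto

AV_FLAG = "av-scanned"  # hypothetical metadata key name

def mark_scanned(bucket, keyname):
    """Record a clean scan. S3 metadata is immutable in place, so we copy
    the object onto itself with the extra metadata merged in."""
    key = bucket.get_key(keyname)  # HEAD request; populates key.metadata
    metadata = dict(key.metadata)
    metadata[AV_FLAG] = "true"
    key.copy(bucket.name, keyname, metadata=metadata, preserve_acl=True)

def ready_for_release(bucket, prefix):
    """The check a scheduler/Buildbot job could run before 'ready for
    release' fires: every object under the prefix has been marked."""
    return all(
        bucket.get_key(key.name).get_metadata(AV_FLAG) == "true"
        for key in bucket.list(prefix=prefix)
    )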
(In reply to Ben Hearsum [:bhearsum] from comment #6)
> There's still a bit of trickiness in making sure you don't start scanning
> the s3 bucket too soon because partial MARs will soon be moved to Funsize -
> and Buildbot won't know when those jobs complete. We're not 100% sure what
> to do here yet, but we talked about the idea of another job in Buildbot that
> is fed the task ids for the partials, and watches those queues. Once all of
> those complete, and any jobs from Buildbot that generate deliverables, you
> could start looking for the antivirus metadata.

The buildbot<->tc bridge from bug 1135192 might be what we need here...
(In reply to Ben Hearsum [:bhearsum] from comment #7)
> (In reply to Ben Hearsum [:bhearsum] from comment #6)
> > There's still a bit of trickiness in making sure you don't start scanning
> > the s3 bucket too soon because partial MARs will soon be moved to Funsize -
> > and Buildbot won't know when those jobs complete. We're not 100% sure what
> > to do here yet, but we talked about the idea of another job in Buildbot that
> > is fed the task ids for the partials, and watches those queues. Once all of
> > those complete, and any jobs from Buildbot that generate deliverables, you
> > could start looking for the antivirus metadata.
> 
> The buildbot<->tc bridge from bug 1135192 might be what we need here...

Talked about this with Catlee a bit yesterday. Turns out that the buildbot bridge only works if ALL of the scheduling is in taskcluster, so it won't help us out unless we block this on moving that there. We want to do that eventually anyway, so I'm going to see how much effort it would take. If we can do that quickly enough, we don't need an external watcher, or even extra metadata on the s3 objects. If we can't do it quickly enough, we should stick with comment #6 and no buildbot bridge for now, unless someone comes up with something better in the meantime.
Depends on: 1150162
See Also: → beetmover
No longer depends on: 1150162
<rail> so we are basically changing http://hg.mozilla.org/build/buildbotcustom/file/2bfdd8a7fea9/process/release.py#l1432 and the script
<rail> also we need to spin up this builder, using a golden AMI for this type of builder which runs freshclam on boot
<nthomas> sounds good. I can work on the script part
<rail> I'll work on puppet (extend existing build.pp, add a cronjob for golden ami), slavealloc (add slaves), cloud-tools (new type configs)
<nthomas> bug 1174145 has a nice template for the download from s3, using multiprocessing
NI myself just in case. :)
Flags: needinfo?(rail)
Attached patch av-linux64-puppet.diff (obsolete) — Splinter Review
* not tested
* this is basically our regular build slave + clamd
* the freshclam run is "optimized" (@reboot cron) for slave-like instances that don't live too long. We could also run it as a daemon, but that would require an initscript and probably a config file
Attachment #8649584 - Flags: feedback?(bugspam.Callek)
Comment on attachment 8649584 [details] [diff] [review]
av-linux64-puppet.diff

Review of attachment 8649584 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/clamav/manifests/daemon.pp
@@ +15,5 @@
> +        require => Class["packages::clamd"];
> +    }
> +    file {
> +      "/etc/crond.d/freshclam":
> +        content => "@reboot root /usr/bin/freshclam --quiet --daemon-notify\n";

should the exec use --quiet? (said as someone who has never manually run freshclam)

::: modules/packages/manifests/clamd.pp
@@ +6,5 @@
> +    case $::operatingsystem {
> +        CentOS: {
> +            package {
> +                "clamd":
> +                    ensure => latest;

I'm curious, how are we trying to ensure clamd is actually "new enough"? It's a program where newer versions tend to matter more, but CentOS is generally "no new versions unless for critical sec reasons, without feature changes". Are we planning on building newer versions on our end?

(I do note, I'm generally against |ensure=>latest;| but it's probably the right idea for clamd)

::: modules/toplevel/manifests/slave/releng/build/av.pp
@@ +3,5 @@
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +class toplevel::slave::releng::build::av inherits toplevel::slave::releng::build::mock {
> +    include clamav::daemon
> +}

If I'm reviewing the final patch, I probably want a chat about the goals of this instance/actual implementation. Because as it stands I can't imagine why we need/want "mock" on here, nor why we need to populate the shared repos.
Attachment #8649584 - Flags: feedback?(bugspam.Callek) → feedback+
(In reply to Justin Wood (:Callek) from comment #13)
> should the exec use --quiet? (said as someone who has never manually run
> freshclam)

It doesn't really matter because it will be run only once when a golden AMI is created.
 
> I'm curious, how are we trying to ensure clamd is actually "new enough"?
> It's a program where newer versions tend to matter more, but CentOS is
> generally "no new versions unless for critical sec reasons, without feature
> changes". Are we planning on building newer versions on our end?

We use whatever version we have in the repos. If we need something fresher, we can add a repo for this.

> If I'm reviewing the final patch, I probably want a chat about the goals of
> this instance/actual implementation. Because as it stands I can't imagine
> why we need/want "mock" on here, nor why we need to populate the shared
> repos.

I thought about "forking" mock.pp, but given the scope of the project (it will be replaced) I didn't want to bother myself. :)
Attached patch av-linux64-puppet-1.diff (obsolete) — Splinter Review
I reworked the initial patch to make freshclam more robust:
* no more cronjobs to update the db
* added an initscript to run freshclam as a daemon. It will update the DB on boot and will be checking for updates hourly (Checks 24)
* modified vanilla freshclam.conf to:
  * use PID
  * notify daemon on update
* "subscribe" the initial freshclam to package installation, so we run it only once, not on every puppet run

Technically freshclam depends on clamav (not clamd), but in this case I didn't want to add extra code.
Attachment #8649584 - Attachment is obsolete: true
Attachment #8651107 - Flags: review?(dustin)
Comment on attachment 8651107 [details] [diff] [review]
av-linux64-puppet-1.diff

Review of attachment 8651107 [details] [diff] [review]:
-----------------------------------------------------------------

Good to go once the toplevel inheritance is straightened out

::: modules/toplevel/manifests/slave/releng/build/av.pp
@@ +1,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +class toplevel::slave::releng::build::av inherits toplevel::slave::releng::build::mock {

If this inherits from mock, then it should be `toplevel::slave::releng::build::mock::av`
Attachment #8651107 - Flags: review?(dustin) → review+
Agree, toplevel::slave::releng::build::mock::av makes more sense. Renamed. Carry over r+
Attachment #8651107 - Attachment is obsolete: true
Attachment #8651135 - Flags: review+
Attachment #8651136 - Flags: review?(dustin)
Attachment #8651136 - Flags: review?(dustin) → review+
Nick, let me know when you want to land this.

Next steps:

* Add new slave type to production.py in bb configs
* change release.py to use this slave type
* land puppet and cloud-tools changes
* wait (or force) for a new golden AMI (I added it to DNS and created a new IAM Role (a copy of bld-linux64))
* reconfig
* add new slaves to slavealloc

Anything else?
I was going to land the patches above today, but it's probably safer to wait until we ship this chemspill. There is no rush in any case.
Comment on attachment 8651136 [details] [review]
Add av-linux64 to build-tools

merged
Attachment #8651136 - Flags: checked-in+
A small tweak to ensure that clamd is running, so the initial freshclam's notify doesn't barf.
Attachment #8653105 - Flags: review?(nthomas)
Attachment #8653105 - Flags: review?(nthomas) → review+
I forced golden AMI generation and it worked fine. Let's see how it goes overnight.
Rail, I had a go at spinning up an instance for staging but wondered if we need configs/dev-av-linux64 to do that properly. What would you recommend?
Attached patch [gecko] antivirus script, wip (obsolete) — Splinter Review
I've been testing this on a dev-linux64-ec2 with clamd retrofitted, example call:

python2.7 scripts/release/antivirus.py --product firefox --version 41.0 --build-number 1 --bucket-name net-mozaws-stage-delivery-firefox -d 10 -s 4 --tools-revision FIREFOX_41_0_RELEASE --tools-repo https://hg.mozilla.org/users/stage-ffxbld/tools

Runs for ~2 minutes for a 4 locale subset, ~1 minute of that is scanning. Need to fix the Windows installers though, e.g.
04:33:09     INFO -  Command failed for the following files:
04:33:09     INFO -    /home/cltbld/av-dev/scripts/cache/win32/ja/Firefox Setup Stub 41.0.exe
04:33:09     INFO -    /home/cltbld/av-dev/scripts/cache/win32/de/Firefox Setup Stub 41.0.exe
and some of the scanning is suspiciously fast.

It shares a lot with bug 1181542 and bug 1174145. If this code were going to live a long time, it would make sense to have a stage-tasks.py in mozharness.
(In reply to Nick Thomas [:nthomas] from comment #27)
> Rail, I had a go at spinning up an instance for staging but wondered if we
> need configs/dev-av-linux64 to do that properly. What would you recommend?

I think so. At least you don't need to install/tweak clamd in this case.
Comment on attachment 8654808 [details] [diff] [review]
[gecko] antivirus script, wip

Review of attachment 8654808 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozharness/scripts/release/antivirus.py
@@ +129,5 @@
> +        so that we can unpack various files for clam to scan them."""
> +        remote_file = "{}/raw-file/{}/stage/extract_and_run_command.py".format(self.config["tools_repo"],
> +                                                                               self.config["tools_revision"])
> +        self.download_file(remote_file, file_name="extract_and_run_command.py")
> +        remote_file = "{}/raw-file/{}/buildfarm/utils/mar.py".format(self.config["tools_repo"],

maybe use https://pypi.python.org/pypi/mar instead?
BTW, I just stumbled upon https://github.com/bloomreach/s4cmd, maybe it's worth a try:

* Multi-threaded/multi-connection implementation for enhanced performance on all commands.
* recursive get
* md5 check of content
* wildcard support
You were right about needing p7zip, which extract_and_run_command.py uses to unpack our windows installers. This is an untested attempt to add it, but I'm not sure if anchors should be used in p7zip.pp.
Attachment #8656964 - Flags: feedback?(rail)
Attached patch [gecko] antivirus script (obsolete) — Splinter Review
This is working on a dev slave (retrofitted with clam & p7zip). Not quite sure how to spin up one of the new instances properly, so that's TODO.

Thanks for the suggestions, rail. I swapped to catlee's mar package, unfortunately it still emits
22:24:47     INFO -  Exception IOError: 'File not open for writing' in <bound method BZ2MarFile.__del__ of <mar.BZ2MarFile instance at 0x1df50e0>> ignored

s4cmd doesn't support exclusions, which is a shame. I think I'll just stick to brute forcing download perf with high request parallelism.
Attachment #8654808 - Attachment is obsolete: true
Comment on attachment 8656964 [details] [diff] [review]
[puppet] Add p7zip package

Anchors are only needed when a custom class calls another custom class, as Puppet doesn't otherwise draw the right arrows in the dependency graph.  `package` is a built-in, so no need for an anchor.
Attachment #8656964 - Flags: feedback?(rail) → feedback+
Rail, could you please verify the p7zip patch? I spun up an instance based on the av AMI, but my puppet-foo for getting happy certs has evaporated.

TODO
> * add new slaves to slavealloc
> * Add new slave type to production.py in bb configs
> * change release.py to use this slave type

* patch for release.py to call the new script (like attachment 8657074 [details] [diff] [review] but different args)
* land p7zip fix and get it into the golden AMI
* test staging release against production VM
NI myself
Flags: needinfo?(rail)
Comment on attachment 8656964 [details] [diff] [review]
[puppet] Add p7zip package

Ship it!

I created an instance by:
  cd /builds/aws_manager/cloud-tools/scripts
  /builds/aws_manager/bin/python aws_create_instance.py -c /builds/aws_manager/cloud-tools/configs/av-linux64 -r us-east-1 -s aws-releng -k /builds/aws_manager/secrets/aws-secrets.json --ssh-key /home/buildduty/.ssh/aws-ssh-key -i /builds/aws_manager/cloud-tools/instance_data/us-east-1.instance_data_prod.json --ignore-subnet-check av-linux64-ec2-golden

Then logged in and ran puppet against my env:
  puppet agent --test --server releng-puppet2.srv.releng.scl3.mozilla.com --environment raliiev

Verified by:

# rpm -q p7zip
p7zip-9.20.1-2.el6.x86_64
# rpm -e p7zip
# rpm -e p7zip
error: package p7zip is not installed
# puppet agent --test --server releng-puppet2.srv.releng.scl3.mozilla.com --environment raliiev                                        
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for av-linux64-ec2-golden.build.releng.use1.mozilla.com
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
   (at /usr/lib/ruby/site_ruby/1.8/puppet/type/package.rb:430:in `default')
Info: Applying configuration version 'de152ef94d46+'
Notice: /Stage[main]/Packages::P7zip/Package[p7zip]/ensure: created
Notice: Finished catalog run in 29.71 seconds
# rpm -q p7zip
p7zip-9.20.1-2.el6.x86_64
Flags: needinfo?(rail)
Attachment #8656964 - Flags: review+
Comment on attachment 8656967 [details] [diff] [review]
[gecko] antivirus script

Tested with Firefox and Thunderbird, on a modded up build slave.

TODO - buildbotcustom patch to drive this, and verify av-linux64 AMI is good.
Attachment #8656967 - Flags: review?(rail)
Comment on attachment 8656967 [details] [diff] [review]
[gecko] antivirus script

Let me redirect this jlund's way - he is the pro! :)
Attachment #8656967 - Flags: review?(rail) → review?(jlund)
Comment on attachment 8656967 [details] [diff] [review]
[gecko] antivirus script

Review of attachment 8656967 [details] [diff] [review]:
-----------------------------------------------------------------

sweet. so this will download all the release artifacts from s3 and scan them in parallel. will there be another script that pushes them back up or changes some of the s3 metadata? Or am I missing some high level understanding?

::: testing/mozharness/scripts/release/antivirus.py
@@ +88,5 @@
> +        super(AntivirusScan, self)._pre_config_lock(rw_config)
> +
> +        # This default is set here rather in the config because default
> +        # lists cannot be completely overidden, only appended to.
> +        if not self.config.get("excludes"):

wfm.

I suppose the other option is to make excludes a self attr defined in __init__ and put dest_dir there too:
    self.excludes = self.config.get('excludes', DEFAULT_EXCLUDES)
    self.dest_dir = self.config.get('dest_dir', DEFAULT_DEST_DIR)

@@ +154,5 @@
> +        def worker(item):
> +            source, destination = item
> +
> +            self.info("Downloading {} to {}".format(source, destination))
> +            dest_path = os.path.dirname(destination)

looks like dest_path isn't used

@@ +156,5 @@
> +
> +            self.info("Downloading {} to {}".format(source, destination))
> +            dest_path = os.path.dirname(destination)
> +            key = bucket.get_key(source)
> +            return retry(key.get_contents_to_filename,

cool, fyi mh has a self.retry (from BaseScript) that might be good here. probably not worth the hassle this time since you have tested this though

@@ +175,5 @@
> +                    if not os.path.isdir(dest_dir):
> +                        os.makedirs(dest_dir)
> +                    yield (keyname, destination)
> +
> +        pool = ThreadPool(self.config["download_parallelization"])

I am a noob with multiprocessing in python. From what I gather though, shouldn't we be wrapping any pool calls within an `if __name__ == '__main__'` guard, or else importing the worker function and the iterated list from another module, to avoid unwanted side effects like recursive process inits?

I know this is still definitely an issue on windows according to the docs. Not sure if they fixed this for unix-like machines:
https://docs.python.org/2/library/multiprocessing.html#windows
https://pymotw.com/2/multiprocessing/basics.html#importable-target-functions
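
For reference, the shape of the code under discussion, reduced to a standalone sketch (boto 2 plus redo's retry, with a hypothetical `items` list of (source key, destination path) pairs). Note that multiprocessing.pool.ThreadPool uses threads rather than child processes, so the Windows re-import caveat shouldn't bite here, but the `__main__` guard is cheap insurance:

from multiprocessing.pool import ThreadPool

import boto
from redo import retry

def download_all(bucket_name, items, parallelization=6):
    """Download (source_key, destination_path) pairs in parallel."""
    bucket = boto.connect_s3().get_bucket(bucket_name)

    def worker(item):
        source, destination = item
        key = bucket.get_key(source)
        # redo.retry re-runs the download on transient S3 failures.
        return retry(key.get_contents_to_filename, args=(destination,))

    pool = ThreadPool(parallelization)
    pool.map(worker, items)
    pool.close()
    pool.join()

if __name__ == "__main__":
    # Hypothetical example pair; a real run would build this list from a
    # bucket listing, as the script's find_release_files() does.
    download_all("net-mozaws-stage-delivery-firefox",
                 [("pub/firefox/releases/41.0/README.txt", "README.txt")])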
Depends on: 1206585
(In reply to Jordan Lund (:jlund) from comment #40)
> sweet. so this will download all the release artifacts from s3 and scan them
> in parallel. will there be another script that pushes them back up or
> changes some of the s3 metadata? Or am I missing some high level
> understanding?

In this iteration we're just scanning, there's no signature or other upload afterwards. In the event of any failure we'd follow up with an upload to virustotal.com (by hand), to see if it's just ClamAV or more widespread.

> I suppose the other option is to make excludes a self attr defined in
> __init__ and put dest_dir there too:
>     self.excludes = self.config.get('excludes', DEFAULT_EXCLUDES)
>     self.dest_dir = self.config.get('dest_dir', DEFAULT_DEST_DIR)

That might be a little cleaner, I'll try it.
 
> > +            self.info("Downloading {} to {}".format(source, destination))
> > +            dest_path = os.path.dirname(destination)
> 
> looks like dest_path isn't used

Removed. I was going to create the path/to/destination at this point, but doing it in find_release_files avoids races.
 
> > +        pool = ThreadPool(self.config["download_parallelization"])
> 
> I am a noob with multiprocessing in python.

Can't claim much expertise on this either. Will check if that applies, but it looks like it's Windows-only per the main multiprocessing docs. We're only planning to run this on linux.
I contemplated adding an entry to moco-nodes.pp to allow the staging instance to talk to puppet, to make sure freshclam runs on boot. But don't the golden spot images have puppet disabled on the client side?
Attachment #8663589 - Flags: review?(rail)
Also:
Sep 21 03:32:44 dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com puppet-agent[30186]: Could not retrieve catalog from remote server: Error 400 on SERVER: This host used to have trust level core, and cannot be changed to try without a reimage at /etc/puppet/production/modules/slave_secrets/manifests/init.pp:16 on node dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com
Sep 21 03:32:44 dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com puppet-agent[30186]: Not using cache on failed catalog
Sep 21 03:32:44 dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com puppet-agent[30186]: Could not retrieve catalog; skipping run
Attachment #8663589 - Flags: review?(rail) → review+
(In reply to Nick Thomas [:nthomas] from comment #42)
> Created attachment 8663589 [details] [diff] [review]
> [puppet] Add password to buildbot masters
> 
> I contemplated adding an entry to moco-nodes.pp to allow the staging
> instance to talk to puppet, to make sure freshclam runs on boot. But don't
> the golden spot images have puppet disabled on the client side?

Yeah, we run puppet on golden AMIs and then disable it, so the spot instances don't use it. But we still apply puppet changes to these machines.
(In reply to Nick Thomas [:nthomas] from comment #43)
> Also:
> Sep 21 03:32:44 dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com
> puppet-agent[30186]: Could not retrieve catalog from remote server: Error
> 400 on SERVER: This host used to have trust level core, and cannot be
> changed to try without a reimage at
> /etc/puppet/production/modules/slave_secrets/manifests/init.pp:16 on node
> dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com
> Sep 21 03:32:44 dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com
> puppet-agent[30186]: Not using cache on failed catalog
> Sep 21 03:32:44 dev-av-linux64-ec2-nthomas.dev.releng.use1.mozilla.com
> puppet-agent[30186]: Could not retrieve catalog; skipping run

Hmmm, sounds like it tries to use the trust level set in slavealloc, but dev-av-linux64-ec2-nthomas is not there.
Comment on attachment 8656967 [details] [diff] [review]
[gecko] antivirus script

Review of attachment 8656967 [details] [diff] [review]:
-----------------------------------------------------------------

clearing r? as it sounds like there will be a follow up patch
Attachment #8656967 - Flags: review?(jlund)
rail, to scan Firefox 42.0b2 we'd need about 80G to store the files, and about 60G for 41.0.1. Can we do that easily with the existing AMI? Say 120G at / instead of 50G. I had a quick look at shifting the model from 'download all, scan all' to 'for each file: download, scan, discard (with multiple in parallel)', but the logs are very jumbled, so I'd have to slay that dragon.
Flags: needinfo?(rail)
50G is the minimum of what the slaves can get in theory. There is this script, http://hg.mozilla.org/build/puppet/file/4b182b11d6b0/modules/aws/files/manage_instance_storage.py, which is responsible for doing LVM magic similar to the commands in https://github.com/mozilla/build-cloud-tools/pull/119, except that it operates on the instance (local) storage that comes for free with instances.

http://aws.amazon.com/ec2/instance-types/ and https://github.com/mozilla/build-cloud-tools/blob/master/configs/watch_pending.cfg#L128 tell me that the minimum we get is 80G. Add that to the 50G on the AMI and it'll give you 130G in total.

We don't use this approach on loaners because instance storage is ephemeral, and if you use it as part of an LVM volume, the volume will be "corrupt" after a stop/start.
Flags: needinfo?(rail)
This cleans up the exclude default as requested, as well as polishing up some help messages. I've left the multiprocessing as is because I think it's OK after reading the docs. There's a __main__ guard and we're on linux, and I haven't seen anything funky in lots of test runs.

It also deletes the cache of files at the end, just in case we get another av job on the same slave before we recycle the instance. I did look at using PurgeMixin.purge_builds() early in the job, but would have had to work around the default of not deleting release directories. We'll be running on spot instances and I doubt we'll have a chance to investigate any failures before an instance gets the chop from aws-manager.
Attachment #8668207 - Flags: review?(jlund)
Attachment #8656967 - Attachment is obsolete: true
We try to use catlee's mardor first, and fall back to the mar.py nearby in tools if that fails. This can't break current production because there's a copy of extract_and_run_command.py in svn, which gets deployed to stage.m.o by puppet; I don't think I'll bother updating that copy.
Assignee: nobody → nthomas
Status: NEW → ASSIGNED
Attachment #8668213 - Flags: review?(rail)
This is all just to make the slaves available to the antivirus job. It does mean that some other jobs may run on av slaves (if connected), e.g. 'ready for release' and other notifications, or starting uptake checks, but I think that's harmless.
Attachment #8668235 - Flags: review?(rail)
Three hours should be long enough for a whole release, as a 4 locale x 4 platform test case takes 4m 30s.
Attachment #8668236 - Flags: review?(rail)
Attachment #8668213 - Flags: review?(rail) → review+
Attachment #8668235 - Flags: review?(rail) → review+
Comment on attachment 8668236 [details] [diff] [review]
[buildbotcustom] Use new slaves, drive new script

Review of attachment 8668236 [details] [diff] [review]:
-----------------------------------------------------------------

::: process/release.py
@@ +123,5 @@
>          if p == 'b2g':
>              continue
>          platform_slaves = branchConfig['platforms'][p].get('slaves', [])
>          all_slaves.extend(platform_slaves)
> +        if 'av' in p:

Maybe use "av-linux" to be more specific? Otherwise LGTM.
Attachment #8668236 - Flags: review?(rail) → review+
Blocks: 1210538
Comment on attachment 8668207 [details] [diff] [review]
[gecko] antivirus script, v2

Review of attachment 8668207 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozharness/scripts/release/antivirus.py
@@ +34,5 @@
> +            "help": "List of filename patterns to exclude. See script source for default",
> +        }],
> +        [["-d", "--download-parallelization"], {
> +            "dest": "download_parallelization",
> +            "default": 6,

10 was too much power? :)

@@ +82,5 @@
> +            config={
> +                "virtualenv_modules": [
> +                    "boto",
> +                    "redo",
> +                    "mar",

neat
Attachment #8668207 - Flags: review?(jlund) → review+
(In reply to Jordan Lund (:jlund) [pto until oct 5th] from comment #56)
> 10 was too much power? :)

I did some quick timings and there wasn't much speedup beyond 4, presumably because S3 and the connection to it (from an AWS instance) are pretty fast. Just padded a bit to make sure the link is saturated. Thanks for the review.
Keywords: leave-open
Add linux64-av to Buildslaves.py.template too, carrying rail's r+.
Attachment #8668235 - Attachment is obsolete: true
Attachment #8672543 - Flags: review+
Removes the linux64-av platform on try, so test-masters.sh passes. Carrying review over again.

https://hg.mozilla.org/build/buildbot-configs/rev/b3331e7e1413
Attachment #8672543 - Attachment is obsolete: true
Attachment #8675963 - Flags: review+
Attachment #8675963 - Flags: checked-in+
Added av-linux64-ec2-001 through 004 to slavealloc, two each in us-east-1 and us-west-2.
AV didn't work as expected for 42.0b9 for several reasons:

1) the slaves were disabled in slavealloc (fixed)
2) the regexp needs a tweak: https://github.com/mozilla/build-cloud-tools/pull/124 (works with local change)
3) we don't have any "-spot-" slaves in the buildbot configs, so I bumped the on-demand limits in the watch_pending config

TODO:
1) Merge the patch
2) add "-spot-" slaves to slavealloc and bb-configs
Attached patch av.diffSplinter Review
Attachment #8678223 - Flags: review?
Attachment #8678223 - Flags: review? → review+
I added 4 slaves to slavealloc and now one of them has taken a job! \o/
Bug 1218763 is about handling more exceptions for download errors.

There's one outstanding issue. A full Firefox release fills up the disk while caching the files. Rail, how do you feel about removing {c3,m3,r3}.xlarge so that we're guaranteed at least 160G of space? Would that work out of the box? Otherwise we could change the worker to download+scan in one go.
Status: ASSIGNED → NEW
Depends on: 1218763
Flags: needinfo?(rail)
Sounds good to me. The instances will automatically use all existing ephemeral storage (it will be added to the LVM volume used by /).

The only thing that we may need to tweak is the bid price.
Flags: needinfo?(rail)
Another approach could be downloading, scanning and deleting files in batches.
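
For illustration, a rough sketch of that batch idea (the helpers and batch size are assumptions; clamdscan's --multiscan flag and its exit codes are the only ClamAV specifics relied on). Peak disk usage is bounded by the batch size rather than the whole release:

import os
import shutil
import subprocess
import tempfile

def scan_in_batches(keys, batch_size=50):
    """Download `batch_size` objects at a time (`keys` is assumed to be a
    list of boto Key objects), scan each batch with a single clamdscan
    invocation, then delete the local copies before the next batch."""
    for i in range(0, len(keys), batch_size):
        batch_dir = tempfile.mkdtemp(prefix="av-batch-")
        try:
            for key in keys[i:i + batch_size]:
                dest = os.path.join(batch_dir, os.path.basename(key.name))
                key.get_contents_to_filename(dest)
            # Raises CalledProcessError on infections (rc 1) or errors (rc 2).
            subprocess.check_call(["clamdscan", "--multiscan", batch_dir])
        finally:
            shutil.rmtree(batch_dir)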
Attachment #8682872 - Flags: review?(rail) → review+
I merged that PR but we're not launching an instance to do the 43.0b1 build2 av job. There's this in aws_watch_pending.log:

2015-11-04 00:26:04,202 - av-linux64 - started 0 spot instances for slaveset None; need 1
2015-11-04 00:26:04,202 - need 1 ondemand av-linux64 for slaveset None
2015-11-04 00:26:04,221 - 0 av-linux64 ondemand instances running globally
2015-11-04 00:26:04,480 - Using cached slaves.json
2015-11-04 00:26:05,097 - getting all spot requests for us-west-2
2015-11-04 00:26:13,104 - No free IP available in None for subnets [u'subnet-d748dabe', u'subnet-a848dac1', u'subnet-ad48dac4', u'subnet-c74f48b3']
2015-11-04 00:26:13,105 - No free IP available for av-linux64 in None
2015-11-04 00:26:13,472 - av-linux64 - started 0 instances for slaveset None; need 1

Those two 'None' look suspicious but I'm not going to try to fix that on the fly.

Landed
 https://github.com/mozilla/build-cloud-tools/commit/5b1b103dd99ced48f33f2f5e723b62dc840c0a77
as a workaround. It re-enables spot instances, bumps the bid up to 0.40, and renormalises the performance_constant values.
Using the bigger instances has been fine.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open