Closed Bug 1207023 Opened 9 years ago Closed 9 years ago

delivery: set best instance type for upload.ffxbld.productdelivery.prod.mozaws.net

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: oremj)

References

Details

This is just for the ffxbld upload host in production; everything else is much lower traffic. First, let's bring over a comment from bug 1186297, where rail said:

I investigated multiple options to figure out what would be the optimal instance type for the upload host.

One of the ideas was to simulate load comparable to what we have now.

The first step was to figure out the current upload rates. I tried to use graphite, but the data is too coarse - tx/rx rates are not what we need.

I took this approach to figure out the upload rates we experience:

* find all files modified in a particular period of time. I applied this to Firefox, Fennec and B2G files living on stage.m.o, modified within 24h on a busy day with multiple releases in flight (around Sep 15)
* Generate time series and analyze the rates. In our case the max is most important because we have to plan for peaks.

The results are below:

30s max: 874 Mbps
1m max: 658 Mbps
3m max: 477 Mbps
5m max: 439 Mbps
10m max: 378 Mbps
1h max: 320 Mbps
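
For illustration, a rough sketch of how rates like these could be derived from file mtimes (a hedged reconstruction, not the exact commands used; the stage paths and the 60-second bucketing are assumptions):

  # sum bytes landed per 1-minute bucket over the last 24h, report the peak
  find /pub/mozilla.org/firefox /pub/mozilla.org/mobile /pub/mozilla.org/b2g \
      -type f -mmin -1440 -printf '%T@ %s\n' |
    awk '{ bytes[int($1/60)] += $2 }
         END { for (b in bytes) if (bytes[b] > max) max = bytes[b]
               printf "1m max: %.0f Mbps\n", max * 8 / 60 / 1e6 }'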

Load simulation is a tricky task which may take a lot of resources. We thought that we could use taskcluster to spin up a lot of clients and upload some files. This would require some extra work to prep proper images with all needed secrets baked in and to write custom scripts to generate traffic.

From our past experience with proxxy, we will need quite a beefy instance (assuming we can't use multiple instances in parallel) to meet the needed network performance. Per http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-ec2-config.html m3.2xlarge might be what we need.
----------

Rail, during the S3 migration meeting today we wondered
* are you selecting based on IOps, or network bandwidth ?
* the newer m4 instances have enhanced networking, but EBS instead of SSD; it could be worth checking them

I also (wondered aloud) about a ramdisk instead of physical disk, because this is all ephemeral data. I'm thinking now this isn't very helpful, because matching the 40G we have on stage:/tmp is m4.4xlarge or r3.2xlarge, and that's before any change in data-at-rest timing (it'll take longer to sequentially upload from scl3, before running post_upload).
Flags: needinfo?(rail)
(In reply to Nick Thomas [:nthomas] from comment #0)
> Rail, during the S3 migration meeting today we wondered
> * are you selecting based on IOps, or network bandwidth ?

Network bandwidth. IOps can be adjusted if needed (you can use IOps-optimized EBS).
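
A hedged example of what adjusting that later could look like - the size, IOPS count and availability zone below are placeholders, not anything we have provisioned:

  # create a provisioned-IOPS (io1) EBS volume to attach if disk becomes the bottleneck
  aws ec2 create-volume --availability-zone us-west-2a \
      --size 200 --volume-type io1 --iops 2000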

> * the newer m4 instances have enhanced networking, but EBS instead of SSD;
> it could be worth checking them

I believe you can choose SSD (it's still EBS though). I don't have any strong opinion about m3 vs m4. It's mostly about prices. In the worst-case scenario we can switch from m3 to m4. It will require an instance shutdown, but no migration is needed.

> I also (wondered aloud) about a ramdisk instead of physical disk, because
> this is all ephemeral data. I'm thinking now this isn't very helpful,
> because matching the 40G we have on stage:/tmp is m4.4xlarge or
> r3.2xlarge, and that's before any change in data-at-rest timing (it'll take
> longer to sequentially upload from scl3, before running post_upload).

This is actually a great idea. If you have enough RAM we can definitely use tmpfs as $TMP.
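
A minimal sketch of what that could look like (the mount point, the size and the use of TMPDIR are assumptions about the wiring, not an existing config):

  mkdir -p /mnt/upload-tmp
  # needs matching RAM headroom on the instance
  mount -t tmpfs -o size=40g tmpfs /mnt/upload-tmp
  export TMPDIR=/mnt/upload-tmp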
Flags: needinfo?(rail)
The price difference between m3.2xlarge and m4.2xlarge is minimal ($0.532/hr vs $0.502/hr in us-west-2), and the former has local SSD disk. I can't check what we have on upload.ffxbld.productdelivery.prod.mozaws.net right now, but on the stage equivalent it's a t2.medium with what must be an EBS disk. The m3's have varying amounts of SSD, and we may want to downgrade the instance later on as we stop uploading this way, so it may be better to go with an m4 for the optimized EBS. Does that sound right ?

re tmpfs, probably too late at this point to re-engineer the upload host. Looks like we'd end up spending more on the instance too, if we want to match the 40G of space in stage:/tmp. Typically we are using a lot less than that, maybe just a few GB at any time, but if something goes wrong the space gets used up quickly.
Our AMI mounts ephemeral storage to /media/ephemeral0/. Where in /tmp are you uploading now and can it be changed?
Flags: needinfo?(nthomas)
Most things are doing a 'mktemp -d', via https://dxr.mozilla.org/mozilla-central/source/build/upload.py#208. There are a few places where that is reimplemented that we'd need to track down too. In theory we could modify that to be 'mktemp -d -p <somepath>'.
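
A minimal sketch of that change (the /media/ephemeral0/tmp directory is hypothetical, and the real fix would live in upload.py rather than being run by hand):

  mkdir -p /media/ephemeral0/tmp
  # point the temp dir at ephemeral storage instead of the default /tmp
  mktemp -d -p /media/ephemeral0/tmp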
Flags: needinfo?(nthomas)
Does the question imply that local SSD would be much preferable to EBS/optimised-EBS ?
Blocks: 1213790
I'm going to update ffxbld hosts to m4.xlarge, which has high performance networking. Let's see if this improves speed enough.
Prod and stage are now m4.xlarge. Let's reopen if we need to change again.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED