Closed Bug 1023477 Opened 10 years ago Closed 10 years ago

reduce EBS writes by tweaking writeout

Categories

(Release Engineering :: General, defect)

Platform: x86_64 Windows 8.1
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: taras.mozilla, Assigned: rail)

References

Details

Attachments

(3 files, 3 obsolete files)

According to 
https://wiki.archlinux.org/index.php/Solid_State_Drives#Disabling_Journaling_on_the_filesystem

This could cut our clobber times/cost (and probably other operations with lots of small files) by 40%.
We should set the barrier=0 mount option while at it.
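A minimal sketch of what setting barrier=0 could look like; the device name and mount point here are made up for illustration:

# Hypothetical /etc/fstab entry (device and mount point assumed):
#   /dev/xvdb  /builds  ext4  defaults,noatime,barrier=0  0 0
# Apply to a live mount without rebooting:
sudo mount -o remount,barrier=0 /builds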
https://hg.mozilla.org/build/cloud-tools/rev/ba4c4560fcd6
https://hg.mozilla.org/build/cloud-tools/rev/999666860211

We still need to regenerate base AMIs and update the configs we use for spot instances to point to the new AMIs.
Assignee: nobody → rail
Just to clarify, by disabling journaling I mean something like 
mkfs or tune2fs -O ^journal
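A minimal sketch of those commands, assuming the filesystem lives on /dev/xvdb (an assumption) and is unmounted; the actual ext4 feature name is has_journal:

# Remove the journal from an existing, unmounted ext4 filesystem:
sudo tune2fs -O ^has_journal /dev/xvdb
sudo e2fsck -f /dev/xvdb
# Or create the filesystem without a journal in the first place:
sudo mkfs.ext4 -O ^has_journal /dev/xvdb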
Oh, I thought that "data=writeback" does the magic.
Tried playing with laptop mode. I could not get IO down to 0; I'm sure there is some knob I'm missing somewhere, but I got it down to almost nothing with the following:

root@ip-10-33-6-234:/home/ubuntu# dstat -dnyc -N eth0 -C total -f 60
--dsk/xvda----dsk/xvdb- --net/eth0- ---system-- ----total-cpu-usage----
 read  writ: read  writ| recv  send| int   csw |usr sys idl wai hiq siq
 678k   16k:  12k   45B|   0     0 | 427   557 |  1   1  96   1   0   0
1114k  819k:8414B 1291M| 345k 1481B| 675   506 |  7   1  87   6   0   0
 137B 1980B:  68B   12M|2021k   10k|1100   612 | 18   1  80   1   0   0
   0     0 :   0  3087k|2766k   14k| 559   107 | 12   0  87   1   0   0
  68B    0 : 137B 3380k|2512k   11k| 966   593 | 14   1  84   1   0   0
   0     0 : 205B 5253k|2283k 6761B|1528  1163 | 22   1  76   0   0   0
   0     0 : 273B 6333k|1850k 4436B|1560  1169 | 23   2  74   1   0   0
   0     0 : 341B 7585k|2694k 9750B|1675  1159 | 21   2  77   1   0   0
 937k   73k:  10k   14M|  65k 1501B|2340  2115 | 27   6  60   7   0   0
  39k   92k:  68B   13M| 490B 2408B|2178  2276 | 63   7  27   4   0   0
  91k 5120B:   0  2232k| 364B 1439B|1589  1974 | 90   8   2   0   0   0
 164B    0 :   0  1881k| 198B  574B|1417  1835 | 92   7   1   0   0   0

 
 
 before ^
 
 --dsk/xvda----dsk/xvdb- --net/eth0- ---system-- ----total-cpu-usage----
 read  writ: read  writ| recv  send| int   csw |usr sys idl wai hiq siq
1495k   33k:  28k  103B|   0     0 | 979  1170 |  2   3  92   3   0   0
1117k  807k:8414B 1302M| 662k 2202B|1208   911 | 14   1  76   9   0   0
 137B    0 :   0  2731B|2527k   12k| 628   193 | 13   0  86   0   0   0
   0     0 :   0  2458B|2634k   17k| 650   166 | 12   0  87   0   0   0
  68B    0 : 205B  260k|2447k 9981B|1157   835 | 16   1  83   0   0   0
   0     0 : 273B  386k|2155k 6767B|1472  1152 | 23   1  76   0   0   0
   0     0 : 273B  556k|1922k 5945B|1459  1166 | 22   2  77   0   0   0
1707B  614B: 341B  451k|2186k 9934B|1380  1044 | 19   1  79   0   0   0
 947k   82k:  13k 2445k| 382B 1846B|2813  2673 | 26   7  63   4   0   0
  34k   83k:  68B 3059k| 531B 2520B|1703  2057 | 72   7  19   2   0   0
  91k 2321B:   0   325k| 265B  934B|1481  1924 | 92   7   1   0   0   0
1229B    0 :   0   452k| 192B  592B|1390  1727 | 93   7   1   0   0   0
   0  2926B:   0   156k| 175B  589B|1609  1863 | 87   9   4   0   0   0
after ^ using the following settings:

	echo "echo 360000 > /proc/sys/vm/dirty_writeback_centisecs" | sudo sh
	echo "echo 360000 > /proc/sys/vm/dirty_expire_centisecs" | sudo sh
	echo "echo 95 > /proc/sys/vm/dirty_ratio" | sudo sh
	echo "echo 95 > /proc/sys/vm/dirty_background_ratio" | sudo sh
Summary: turn off journalling in ext4 → reduce EBS writes by removing journal, tweaking writeout
Attached patch sysctl-puppet.diff (obsolete) — Splinter Review
For builders only for now.
Attachment #8439380 - Flags: review?(dustin)
I pushed https://hg.mozilla.org/build/cloud-tools/rev/00022070cb49 to disable the EXT4 journal and regenerated the AMIs. The spot AMIs will be regenerated automatically later today. We still need to land the puppet change to make the sysctl configs persistent.
Comment on attachment 8439380 [details] [diff] [review]
sysctl-puppet.diff

Review of attachment 8439380 [details] [diff] [review]:
-----------------------------------------------------------------

puppet pieces look good, assuming you want this on *every* CentOS and Ubuntu build slave, not just AWS!

That said, I don't have any clue about the sysctl vars or values, so get additional review from "someone" on what those should be, and you're good to land (doesn't have to be a puppet peer).
Attachment #8439380 - Flags: review?(dustin)
Attachment #8439380 - Flags: review?
Attachment #8439380 - Flags: review+
Apply it only to virtual machines.
Attachment #8439380 - Attachment is obsolete: true
Attachment #8439380 - Flags: review?
Attachment #8439412 - Flags: review?(bugspam.Callek)
Attachment #8439412 - Flags: review?(bugspam.Callek) → review+
Followup to narrow it down to /xen/ only ("facter virtual" returns "xen" on Ubuntu and "xenhvm" on builders)

remote:   https://hg.mozilla.org/build/puppet/rev/f6095fab3f53
remote:   https://hg.mozilla.org/build/puppet/rev/d1ee77767258
Depends on: 1024961
Depends on: 1024962
I deleted the current AMIs and reverted the configs to use base AMIs with journal enabled.
https://hg.mozilla.org/build/cloud-tools/rev/4d3d3ea90517

Some ideas to test:
* use sync before shutting down the AMI instances
* play with mount parameters to see what may have caused the issues
Rail - The patch that landed above is suspected of causing FxOS build bustage in bug 1024961. Can you back out & get the builds that are red on tbpl right now respun?

https://tbpl.mozilla.org/?tree=Mozilla-Aurora&showall=1&rev=8897bc43f59b
Flags: needinfo?(rail)
Rail mentioned in IRC that the regressing patch has been backed out. Simone has respun the build.
yeah, I think we are ok now...
Flags: needinfo?(rail)
Just looked at the stop idle logs now. Tons of instances have been shut down due to their impaired status...
Comment on attachment 8439412 [details] [diff] [review]
tweaks::filesystem

I reverted the changes because I suspect they caused bug 1025842

remote:   https://hg.mozilla.org/build/puppet/rev/42b776622e26
remote:   https://hg.mozilla.org/build/puppet/rev/ef257b25e88d


Some of the on-demand instances may still have this change. We can either ignore it (because they will be using the same AMIs soon) or create a change to make sure we reset the values to the defaults.
Attachment #8439412 - Flags: checked-in+ → checked-in-
Depends on: 1025842
We had a lot of impaired instance shutdowns over the weekend :/
Reverting the changes helped with the impaired-instances situation: 0 warnings since the revert was deployed...
Let's try a more gradual approach.
echo 180000 > /proc/sys/vm/dirty_writeback_centisecs
echo 180000 > /proc/sys/vm/dirty_expire_centisecs

This should cause a flush every 30 min (180000 centiseconds = 30 min * 60 s/min * 100 centiseconds/s).
Amusingly, someone has done this on a t1.micro with good success:
http://www.thaivisa.com/forum/topic/660187-reducing-disk-writes-and-reads-to-minimum/
Ok, so after some more research:

The best way to monitor IOPS is:
sar -d 1
The tps column provides per-device IOPS.

IO activity on Linux is controlled by two knobs: ext4 writeout, set by the commit= mount flag (default of 5s), and the /proc/sys/vm/dirty_*_centisecs (heh heh) variables (defaults of 5 and 30 seconds).


So by that logic, the minimum useful thing to test would be:

/proc/sys/vm/dirty_writeback_centisecs set to 3000 (to match /proc/sys/vm/dirty_expire_centisecs)
and -o noatime,nodiratime,commit=30

This should cut down the amount of IO by doing writeouts every 30s instead of every 5s. Once we can get things working reliably at 30s, we can extend this interval to hours.
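A minimal sketch of applying that combination by hand, assuming a hypothetical /builds mount on an existing EBS volume (the puppet/AMI changes are the real mechanism):

# Raise the ext4 commit interval and drop atime updates (mount point assumed):
sudo mount -o remount,noatime,nodiratime,commit=30 /builds
# Match the writeback timer to the 30s expiry default:
echo 3000 | sudo tee /proc/sys/vm/dirty_writeback_centisecs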
Summary: reduce EBS writes by removing journal, tweaking writeout → reduce EBS writes by tweaking writeout
Note: as far as I can tell, the machines were hanging because the 'sync' operation at shutdown was taking dozens of minutes to complete. It takes a while to flush many gigs of RAM over EBS :)
Rail,
can you deploy the 30s write delay? We can bump it incrementally over this week.
Flags: needinfo?(rail)
sure, we'll see the results tomorrow morning.
Flags: needinfo?(rail)
(In reply to Rail Aliiev [:rail] from comment #28)
> So far
> https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.
> html?name=bld-linux64-spot- looks very healthy! \o/

Is that because you didn't set 
/proc/sys/vm/dirty_writeback_centisecs to 3000 yet? :)
Attached patch fs-puppet-1.diffSplinter Review
set vm.dirty_writeback_centisecs on CentOS slaves only
Attachment #8441572 - Flags: review?(dustin)
Comment on attachment 8441572 [details] [diff] [review]
fs-puppet-1.diff

All virtual machines?  Or just EC2?
Attachment #8441572 - Flags: review?(dustin) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #31)
> Comment on attachment 8441572 [details] [diff] [review]
> fs-puppet-1.diff
> 
> All virtual machines?  Or just EC2?

Just EC2. If there is a better way than "=~ /xen/", I can use that instead.
Ah, Facter-2.0.0 has some EC2 facts, but I think they're broken in 1.7.5.
Yeah, just looked at /usr/lib/ruby/site_ruby/1.8/facter/ec2.rb:

if (Facter::Util::EC2.has_euca_mac? || Facter::Util::EC2.has_openstack_mac? ||
    Facter::Util::EC2.has_ec2_arp?) && Facter::Util::EC2.can_connect?
  metadata
  userdata
else
  Facter.debug "Not an EC2 host"
end

I bet it doesn't pass that check
# facter -d ec2_hostname
Relative directory ./facter removed from search path.
value for lsbdistid is still nil
Not an EC2 host
Hmm, for some reason I don't see the change in /etc/sysctl.conf. I'll investigate it today.
Attached file EBS Cost Summary for past hour (obsolete) —
If my code is correct, according to EBS stats from the past hour, we should switch our 15 & 20 GB instances to SSD EBS.

I assume those are tests, and we should try to switch them to run with instance storage. Who knew those were the biggest contributors to our IOPS bill :(
Note: the above stats are from us-west-2. https://github.com/tarasglek/spotlog/blob/master/cloudwatch.js is the code.
US-East numbers are similar:
20 GB nodes are killing us in IOPS costs

time nodejs cloudwatch.js config_ebs_cost_us-east-1.json
Storage cost: $1 ; IOPS cost: $1.84; { Size: '20',
  Stats:
   { Volumes: 587,
     VolumeWriteOps: 26011844,
     VolumeReadOps: 10829220 } }
Storage cost: $7.5 ; IOPS cost: $1.63; { Size: '150',
  Stats:
   { Volumes: 167,
     VolumeWriteOps: 23565040,
     VolumeReadOps: 9059006 } }
Storage cost: $12.5 ; IOPS cost: $0.2; { Size: '250',
  Stats: { Volumes: 53, VolumeWriteOps: 1310870, VolumeReadOps: 2637553 } }
Storage cost: $0.75 ; IOPS cost: $0.02; { Size: '15',
  Stats: { Volumes: 111, VolumeWriteOps: 317021, VolumeReadOps: 4214 } }
Storage cost: $1.75 ; IOPS cost: $0.01; { Size: '35',
  Stats: { Volumes: 28, VolumeWriteOps: 6861, VolumeReadOps: 135474 } }
Storage cost: $5 ; IOPS cost: $0; { Size: '100',
  Stats: { Volumes: 6, VolumeWriteOps: 90815, VolumeReadOps: 144 } }
Storage cost: $0.4 ; IOPS cost: $0; { Size: '8',
  Stats: { Volumes: 7, VolumeWriteOps: 27849, VolumeReadOps: 2 } }
Storage cost: $1.5 ; IOPS cost: $0; { Size: '30', Stats: { Volumes: 2, VolumeWriteOps: 6258 } }
Storage cost: $22.5 ; IOPS cost: $0; { Size: '450',
  Stats: { Volumes: 3, VolumeWriteOps: 89, VolumeReadOps: 13 } }
Storage cost: $4 ; IOPS cost: $0; { Size: '80', Stats: { Volumes: 2 } }
Cost calculations were a bit misleading above. Normalized storage costs look more realistic now:
time nodejs cloudwatch.js config_ebs_cost_us-east-1.json
Storage cost: $0.8 ; IOPS cost: $1.73; { Size: '20',
  Stats:
   { Volumes: 587,
     VolumeWriteOps: 23211694,
     VolumeReadOps: 11370698 } }
Storage cost: $1.72 ; IOPS cost: $1.62; { Size: '150',
  Stats:
   { Volumes: 167,
     VolumeWriteOps: 23788569,
     VolumeReadOps: 8531713 } }
Storage cost: $0.91 ; IOPS cost: $0.18; { Size: '250',
  Stats: { Volumes: 53, VolumeWriteOps: 1266935, VolumeReadOps: 2278493 } }
Storage cost: $0.09 ; IOPS cost: $0.02; { Size: '35',
  Stats: { Volumes: 37, VolumeWriteOps: 14056, VolumeReadOps: 350765 } }
Storage cost: $0.11 ; IOPS cost: $0.02; { Size: '15',
  Stats: { Volumes: 111, VolumeWriteOps: 300389, VolumeReadOps: 3671 } }
Storage cost: $0.04 ; IOPS cost: $0; { Size: '100',
  Stats: { Volumes: 6, VolumeWriteOps: 89344, VolumeReadOps: 65 } }
Storage cost: $0 ; IOPS cost: $0; { Size: '8',
  Stats: { Volumes: 7, VolumeWriteOps: 34675, VolumeReadOps: 2 } }
Storage cost: $0 ; IOPS cost: $0; { Size: '30', Stats: { Volumes: 2, VolumeWriteOps: 6705 } }
Storage cost: $0.09 ; IOPS cost: $0; { Size: '450',
  Stats: { Volumes: 3, VolumeWriteOps: 146, VolumeReadOps: 13 } }
Storage cost: $0.01 ; IOPS cost: $0; { Size: '80', Stats: { Volumes: 2 } }

time nodejs cloudwatch.js config_ebs_cost_us-west-2.json
Storage cost: $0.88 ; IOPS cost: $1.82; { Size: '20',
  Stats:
   { Volumes: 642,
     VolumeWriteOps: 25188720,
     VolumeReadOps: 11150635 } }
Storage cost: $0.42 ; IOPS cost: $1.28; { Size: '15',
  Stats:
   { Volumes: 407,
     VolumeWriteOps: 18668651,
     VolumeReadOps: 7012907 } }
Storage cost: $2.1 ; IOPS cost: $1.08; { Size: '150',
  Stats:
   { Volumes: 204,
     VolumeWriteOps: 12930667,
     VolumeReadOps: 8586155 } }
Storage cost: $0.29 ; IOPS cost: $0.66; { Size: '35',
  Stats: { Volumes: 123, VolumeWriteOps: 619646, VolumeReadOps: 12658079 } }
Storage cost: $0.05 ; IOPS cost: $0.01; { Size: '100',
  Stats: { Volumes: 8, VolumeWriteOps: 168253, VolumeReadOps: 1674 } }
Storage cost: $0 ; IOPS cost: $0; { Size: '8', Stats: { Volumes: 5, VolumeWriteOps: 4562 } }
Storage cost: $0 ; IOPS cost: $0; { Size: '10', Stats: { Volumes: 2, VolumeWriteOps: 834 } }
Storage cost: $0 ; IOPS cost: $0; { Size: '4', Stats: { Volumes: 1, VolumeWriteOps: 224 } }
Storage cost: $0.06 ; IOPS cost: $0; { Size: '450', Stats: { Volumes: 2, VolumeWriteOps: 124 } }
Storage cost: $0 ; IOPS cost: $0; { Size: '12', Stats: { Volumes: 1, VolumeWriteOps: 45 } }
Storage cost: $0.84 ; IOPS cost: $0; { Size: '250', Stats: { Volumes: 49 } }
Storage cost: $0.01 ; IOPS cost: $0; { Size: '30', Stats: { Volumes: 6 } }
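For anyone reading along, a rough sanity check of the 20 GB row in the us-east-1 run above; the prices are assumptions based on the standard (magnetic) EBS rates at the time, roughly $0.05 per GB-month and $0.05 per million I/O requests:

# Storage: 587 volumes * 20 GB * $0.05/GB-month, prorated to one hour of a ~730h month
awk 'BEGIN { printf "$%.2f\n", 587 * 20 * 0.05 / 730 }'               # ~$0.80
# IOPS: (writes + reads) at ~$0.05 per million requests
awk 'BEGIN { printf "$%.2f\n", (23211694 + 11370698) / 1e6 * 0.05 }'  # ~$1.73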
Attachment #8442447 - Attachment is obsolete: true
(In reply to Rail Aliiev [:rail] from comment #38)
> Hmm, for some reason I don't see the change in /etc/sysctl.conf. I'll
> investigate it today.

Just verified that everything is OK.
If everything is ok, 

let's set commit to 60,
dirty_writeback_centisecs and dirty_expire_centisecs to 6000
vm.dirty_ratio = 40
vm.dirty_background_ratio = 1

This should have an impact on catlee's IOPS graphs. If it doesn't, we need to wait until we stop rebooting to benefit from these settings.
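In sysctl.conf form, those values would look something like the sketch below; the commit=60 part belongs in the mount options/AMI rather than sysctl:

# /etc/sysctl.conf (sketch of the proposed values)
vm.dirty_writeback_centisecs = 6000
vm.dirty_expire_centisecs = 6000
vm.dirty_ratio = 40
vm.dirty_background_ratio = 1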
Attached patch fs.diff (obsolete) — Splinter Review
Attachment #8443602 - Flags: review?(dustin)
Base AMI (commit=60) pushed to https://hg.mozilla.org/build/cloud-tools/rev/a081690469a6
Attached patch fs.diffSplinter Review
Attachment #8443602 - Attachment is obsolete: true
Attachment #8443602 - Flags: review?(dustin)
Attachment #8443632 - Flags: review?(dustin)
Attachment #8443632 - Flags: review?(dustin) → review+
The current settings have worked fine so far.

Taras, would you like to tune more of these?
Flags: needinfo?(taras.mozilla)
Let's give this another week. Looking at the difference between test and bld IOPS in https://www.hostedgraphite.com/da5c920d/af587ddb-3e87-432e-8d10-7ed541694a6a/graphite/render/?width=1611&height=820&_salt=1403132195.304&lineMode=connected&target=sum%28ebs.bld-linux64.Volume*Ops%29&target=sum%28ebs.tst-*.Volume*Ops%29&target=sum%28ebs.bld-linux64.*.Volume*Ops%29&tz=America/Los_Angeles&from=-6days

I think bld might be spiking less than before relative to tst, but it's hard to tell so far.
Flags: needinfo?(taras.mozilla)
sounds good to me
How does it look now?
Flags: needinfo?(taras.mozilla)
(In reply to Rail Aliiev [:rail] from comment #52)
> How does it look like now?

Judging from https://www.hostedgraphite.com/da5c920d/af587ddb-3e87-432e-8d10-7ed541694a6a/graphite/render/?width=1611&height=820&_salt=1403132195.304&lineMode=connected&target=sum%28ebs.bld-linux64.Volume*Ops%29&target=sum%28ebs.tst-*.Volume*Ops%29&target=sum%28ebs.bld-linux64.*.Volume*Ops%29&tz=America/Los_Angeles&from=-20days we don't have enough data. 

I *think* there is more of a difference between tst & bld IOPS now, but the only way to tell would be to turn off this optimization for a week and see if things look worse.

I think the lack of a clear win here means we are better off abandoning this approach for now and focusing on ebs-nuking.
Flags: needinfo?(taras.mozilla)
Fair enough. Let's close this bug for now then.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: General Automation → General