Closed Bug 1179858 Opened 9 years ago Closed 9 years ago

Figure out why the Heroku DB performance is worse than production

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)


Ed Morley [:emorley]

Assignee

Description

•

9 years ago

I'm happy to dig into this.
We'll need access to the RDS instance on the IT AWS account.

Ed Morley [:emorley]

Assignee

Updated

•

9 years ago

Depends on: 1179860

Ed Morley [:emorley]

Assignee

Comment 1

•

9 years ago

If you look at say:
https://rpm.newrelic.com/accounts/677903/applications/7385291/transactions?type=app#id=5b225765625472616e73616374696f6e2f46756e6374696f6e2f747265656865726465722e7765626170702e6170692e61727469666163743a4172746966616374566965775365742e637265617465222c22225d

...then the DB does seem to be the slowest link (vs app).

We're using db.m3.xlarge which has 4 vCPU and 15GiB of RAM:
http://aws.amazon.com/rds/details/

Now that I have access to the RDS instance settings, I can confirm it's in us-east-1e (secondary zone is us-east-1d) - which should be fine, given Heroku US is in us-east-1:
https://devcenter.heroku.com/articles/regions#data-center-locations

And we're definitely using the US Heroku (rather than EU):
https://dashboard.heroku.com/apps/treeherder-heroku/settings

RDS CPU utilisation is under 20%, and free RAM rarely seems to drop below 3GB.

Wonder if IOPS might be the problem? We're using general purpose storage, which gives us 3 IOPS per 1GB of storage, so 300 IOPS - and it looks like we're typically using around 250 IOPS:
https://console.aws.amazon.com/rds/home?region=us-east-1#dbinstances:id=treeherder-heroku;sf=all;v=mm
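
For reference, the IOPS headroom estimate above can be sketched as a quick calculation. This is only a rough illustration: the 100GB storage size is an assumption inferred from the 300 IOPS figure (3 IOPS per GB), and the function names are made up for this example.

```python
# Baseline for general purpose storage as quoted in this comment: 3 IOPS per GB.
BASELINE_IOPS_PER_GB = 3


def baseline_iops(storage_gb):
    """Baseline IOPS allowance for a volume of the given size."""
    return BASELINE_IOPS_PER_GB * storage_gb


def utilisation(observed_iops, storage_gb):
    """Fraction of the baseline allowance that is currently in use."""
    return observed_iops / baseline_iops(storage_gb)


# Assumption: 100GB allocated, inferred from the 300 IOPS baseline.
storage_gb = 100
observed_iops = 250  # typical figure read off the RDS console graphs

print(baseline_iops(storage_gb))                         # 300
print(f"{utilisation(observed_iops, storage_gb):.0%}")   # 83%
```

At roughly 83% of the baseline there is not much headroom, so spikes above the typical rate could plausibly be throttled.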

Ed Morley [:emorley]

Assignee

Comment 2

•

9 years ago

Things seem much better post bug 1182201; many of the metrics (see last link in comment 8) improved by 30-60% around the 10th July (comparing weekday to weekday; the major dips are weekends).

eg...

Network transmit throughput: ~6.5 MB/s -> ~3.5 MB/s
Network receive throughput: ~1.0 MB/s -> ~0.4 MB/s
Write throughput: ~5 MB/s -> ~3 MB/s
Write latency: ~30ms -> ~10ms
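
As a sanity check on the percentages, the before/after figures listed above work out as follows. This is a minimal sketch using only the approximate values quoted in this comment; the helper name is hypothetical.

```python
# Approximate before/after values from the list above (weekday vs weekday).
metrics = {
    "Network transmit throughput (MB/s)": (6.5, 3.5),
    "Network receive throughput (MB/s)": (1.0, 0.4),
    "Write throughput (MB/s)": (5.0, 3.0),
    "Write latency (ms)": (30.0, 10.0),
}


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


for name, (before, after) in metrics.items():
    print(f"{name}: {percent_reduction(before, after):.0f}% lower")
```

With these rounded inputs the reductions come out at about 46%, 60%, 40% and 67% respectively.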

I'm guessing we were just overloading the network/disk IO quotas.

You can see the massive drop around the 10th July here:
https://rpm.newrelic.com/accounts/677903/applications/7385291/datastores?tw%5Bend%5D=1436959663&tw%5Bstart%5D=1436354863#/overview/All?value=total_call_time_per_minute

Comparing the DB response times for Heroku vs prod shows them to be much more similar now:

Heroku: https://rpm.newrelic.com/accounts/677903/applications/7385291/datastores#/overview/MySQL?value=average_response_time
Prod: https://rpm.newrelic.com/accounts/677903/applications/4180461/datastores#/overview/MySQL?value=average_response_time

Removing the objectstore (bug 1140349) should reduce the DB write churn and rate of reads even more.

As is, I think we can call this fixed for now :-)

Status: NEW → RESOLVED

Closed: 9 years ago

Depends on: 1182201, 1140349

Resolution: --- → FIXED


Categories

(Tree Management :: Treeherder, defect, P3)
