Bug 1330738 (Closed): log parser queue backlogs on stage and prod
Opened 8 years ago · Closed 8 years ago
Categories: Tree Management :: Treeherder: Infrastructure (defect, P1)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: emorley, Assigned: emorley
Attachments: 1 file
Over the last few days I've had multiple CloudAMQP queue alerts for the various stage log parser queues.
Bumping the number of workers didn't appear to help (bug 1292720 comment 6).
Looking at the AWS console it seems that stage RDS is under much higher load than prod, even taking into account stage being an m4.xlarge vs prod's m4.2xlarge (we'd intentionally kept stage smaller since it doesn't have much API load from users).
The cause is likely a combination of:
1) our data ingestion causing most of the DB load (and more than previously, post ORM migration), which means stage RDS carries almost the same load as prod even though no one is hitting its API (so the m4.xlarge vs m4.2xlarge reasoning doesn't hold as well)
2) stage RDS having 500GB provisioned vs prod's 750GB, so it has a baseline performance of 1500 IOPS rather than 2250 IOPS (see http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#Concepts.Storage.GeneralSSD)
3) some queries no longer fitting in RAM on stage RDS, since it has half the RAM of prod's m4.2xlarge (16GB vs 32GB)
Bug 1330728 will look into #1.
In this bug I'll also try raising the stage storage from 500GB to 750GB to get the increased baseline IOPS performance (see the sketch below). This would have happened at some point in the future anyway, the next time we reset stage to prod (since restoring from a snapshot means inheriting the snapshot's allocated storage size).
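For reference, gp2 baseline performance scales at 3 IOPS per provisioned GiB, which is where the 1500 and 2250 figures come from. A minimal Terraform illustration of that arithmetic (not taken from the devservices-aws config; names here are made up):

# gp2 baseline IOPS is 3 IOPS per provisioned GiB (see the AWS docs linked above).
locals {
  stage_allocated_storage = 750                               # GiB; previously 500
  stage_baseline_iops     = 3 * local.stage_allocated_storage # 2250; previously 1500
}

output "stage_baseline_iops" {
  value = local.stage_baseline_iops
}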
Failing that, we can increase stage to an m4.2xlarge to match prod; however:
* it means we're less likely to catch DB perf issues on stage, since it will then have more headroom than prod (given no user API load)
* bug 1315329 converted dev+stage+prod to reserved instances, which may end up being wasted (hopefully another project or the read-only replica can use them instead)
Comment 1 • 8 years ago (Assignee)
Attachment #8826410 - Flags: review?(klibby)
Comment 2 • 8 years ago (Assignee)
I've had 10+ alerts from stage today, and a handful from prod too.
Updated • 8 years ago (Assignee)
Summary: log parser queue backlogs on stage → log parser queue backlogs on stage and prod
Comment 3 • 8 years ago (Assignee)
Comment on attachment 8826410 [details] [review]
devservices-aws PR #31: Increase stage RDS storage to 750GB
https://github.com/mozilla-platform-ops/devservices-aws/commit/ca99123da94fafb4ec222354d66c5021cdf8c077
Attachment #8826410 - Flags: checkin+
Comment 4 • 8 years ago
sekrit (master)$ terraform apply
data.terraform_remote_state.base: Refreshing state...
...
aws_db_instance.treeherder-stage-rds: Modifying...
allocated_storage: "500" => "750"
...
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
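The corresponding resource definition would look roughly like this (a hypothetical sketch inferred from the apply output above; the real definition lives in the devservices-aws repository):

resource "aws_db_instance" "treeherder-stage-rds" {
  allocated_storage = 750             # raised from 500; lifts gp2 baseline IOPS from 1500 to 2250
  storage_type      = "gp2"           # general purpose SSD, baseline 3 IOPS/GiB
  instance_class    = "db.m4.xlarge"  # stage stays one size below prod's db.m4.2xlarge
  # engine, identifier, credentials, networking, etc. omitted
}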
Updated • 8 years ago (Assignee)
Attachment #8826410 - Flags: review?(klibby) → review+
Updated • 8 years ago (Assignee)
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED