Plan for moving BMO to AWS

Status: RESOLVED FIXED
Product: bugzilla.mozilla.org
Component: Infrastructure
Reported: 4 years ago
Last modified: 2 years ago

People

(Reporter: mcote, Assigned: fubar)

Tracking

Production

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [data:db_mig])

(Reporter)

Description

4 years ago
This is a tracking bug for investigations into how a move to AWS could be accomplished while maintaining (or improving) the current system's performance and maintainability.
(Reporter)

Comment 1

4 years ago
Clarifying the title. Also, the move would need to occur by the end of Q1 2016.
Summary: Investigate moving BMO to AWS → Plan for moving BMO to AWS
Comment 2

From a Bugzilla developer's perspective, as long as the deployment "in the cloud" matches our DC-hosted deployment, there's nothing we need to do to support AWS/DO/whatever.

Is the plan to move both the PHX1 and SCL3 clusters?
Flags: needinfo?(mcote)
(Reporter)

Comment 3

4 years ago
Seems that itself is up in the air.
Flags: needinfo?(mcote)
Comment 4

Great question! tl;dr yes.

Long version:
Ideally we would host both; however, we want to maintain redundancy, so we'd have to be careful to do that properly.

There's always the possibility that it's more expensive to host in the cloud than on hardware, in which case we'd roll back.

We would move the phx1 cluster first because its hardware is expiring first. It'd be great to field-test it before it needs to be used as failover, because the last thing we need is to realize that the AWS infrastructure can't handle the load when we're in the midst of an SCL3 Bugzilla meltdown.

So... yes, given testing and the ability to be redundant at the data-center level.
Whiteboard: [db mig]
Depends on: 1064287
Whiteboard: [data:db_mig]
Depends on: 1160929
Comment 5

Is there a more detailed timeline for this? We want to have this completed in 6-9 months...
(Reporter)

Comment 6

3 years ago
I assume you mean "for phx1", as we have not settled on plans for BMO in scl3 to move to AWS.

The process involves more than the BMO team, but we're aiming to have the necessary dev work done for phx->AWS by mid-August if not sooner.
Comment 7

Correct, I meant for phx hardware.
Comment 8

(In reply to Mark Côté [:mcote] from comment #6)
> I assume you mean "for phx1", as we have not settled on plans for BMO in
> scl3 to move to AWS.
> 
> The process involves more than the BMO team, but we're aiming to have the
> necessary dev work done for phx->AWS by mid-August if not sooner.

Hi Mark,
Does that mean you will have a backup/passive Bugzilla in AWS?
I am trying to understand the failover (RPO and RTO) for this design, as well as the level of effort to redesign Bugzilla altogether for AWS multi-region.

Sheeri, today the RPO/RTO is 30 minutes, I believe?
(Reporter)

Comment 9

3 years ago
(In reply to SylvieV from comment #8)
> (In reply to Mark Côté [:mcote] from comment #6)
> > I assume you mean "for phx1", as we have not settled on plans for BMO in
> > scl3 to move to AWS.
> > 
> > The process involves more than the BMO team, but we're aiming to have the
> > necessary dev work done for phx->AWS by mid-August if not sooner.
> 
> Hi Mark
> Does that mean you will have a backup/passive buzgilla in AWS?
> I am trying to understand the failover (RPO and RTO) for this design as well
> as the level of effort to redesign bugzilla altogether for AWS multi-region.

Yes, that is the idea. We have to move the BMO failover out of phx, and scl3 isn't an option since the production system is already there, so we're trying AWS. We can use it as a case study for potentially moving production to AWS some day.

For instance, we may hit some performance issues with the move; they won't be critical for a failover system, but we'd have to spend some time on them if we want to move production there. The only crucial development effort for getting out of phx, which will mitigate the worst performance issues, is bug 1160929; we can investigate other ways of making Bugzilla "cloudier" after the move.
(Assignee)

Comment 10

3 years ago
Taking this and cc'ing :r2, since we've been working on this since Whistler.

Suggest using this as a tracker bug from here on.
Assignee: nobody → klibby

Comment 11

3 years ago
Per the meeting that fubar, glob, and r2 had today, I am proposing we remove ElasticSearch from scope for this initial migration. Bugzilla is already a complex application and adding ElasticSearch increases the risk that we would not be able to migrate it in time.

Comment 12

3 years ago
I'm exploring ways to provide SMTP service for Bugzilla in AWS, including Amazon Simple Email Service and Google Apps SMTP relay. How many emails does Bugzilla send in a day? To approximately how many unique recipients in a day?
Flags: needinfo?(glob)
Comment 13

I only have logs for bugmail (email generated by bug creation/updates). These values exclude email generated by flags and administration requests (new account verification, password resets, etc.). Because we relay through an SMTP server in the DC, infra should be able to provide more accurate numbers if required; however, bugmail accounts for the vast majority of the email we send, so these values should be good enough as a ballpark figure.

In June we averaged 145k emails per day. We send a lot more on weekdays; excluding weekends brings the average up to 182k emails per day.

Over all of June we averaged 3,680 unique recipients per day, increasing to an average of 4,080 unique recipients when weekends are excluded.

Over the course of the whole month we sent 4,344,962 emails to 19,839 unique recipients.
Flags: needinfo?(glob)
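(Editor's note: as a sanity check, 4,344,962 emails over a 30-day month works out to roughly 145k per day, matching the average above. For illustration of the Amazon SES option raised in comment 12, here is a minimal Python sketch of relaying one bugmail through the SES SMTP interface. This is an assumption-laden sketch rather than the production setup: BMO itself is Perl and relays through a data-center MTA, and the region, credentials, and addresses below are placeholders.)

```python
import smtplib
from email.message import EmailMessage

# Placeholder SES SMTP endpoint and credentials (real values would be IAM SMTP
# credentials generated in the AWS console); the region is an assumption.
SES_HOST = "email-smtp.us-west-2.amazonaws.com"
SES_PORT = 587  # STARTTLS submission port
SES_USER = "AKIAEXAMPLESMTPUSER"
SES_PASS = "example-smtp-password"

def send_bugmail(sender, recipient, subject, body):
    """Relay a single bugmail through SES over an authenticated STARTTLS session."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    with smtplib.SMTP(SES_HOST, SES_PORT) as smtp:
        smtp.starttls()
        smtp.login(SES_USER, SES_PASS)
        smtp.send_message(msg)

# Hypothetical example call:
send_bugmail("bugzilla-daemon@mozilla.org", "developer@example.com",
             "[Bug 1234567] Example bugmail", "Example body.")
```

One likely approach would be to keep the web and job-queue nodes pointed at a local relay and reconfigure only that relay to submit to SES, so the application itself would not need changes; the main thing to verify against the ~145-182k messages/day volume is the account's SES sending quota and rate limits.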

Comment 14

3 years ago
How much storage will be needed for attachments when they are moved to S3?
Flags: needinfo?(glob)
Comment 15

(In reply to Richard Weiss [:r2] from comment #14)
> How much storage will be needed for attachments when they are moved to S3?

We currently store 120 GB of attachment content.
Flags: needinfo?(glob)
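(Editor's note: for the attachment move discussed in comments 14 and 15, roughly 120 GB of content, the usual storage pattern is one S3 object per attachment. The sketch below is illustrative only: the bucket name and key scheme are hypothetical, and Bugzilla's actual attachment storage code is Perl, so this Python is just a sketch of the pattern, not the implementation.)

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "bmo-attachments"  # hypothetical bucket name

def put_attachment(attach_id: int, data: bytes, content_type: str) -> None:
    """Store one attachment body under a key derived from its attachment id."""
    s3.put_object(
        Bucket=BUCKET,
        Key=f"attachment/{attach_id}",
        Body=data,
        ContentType=content_type,
    )

def get_attachment(attach_id: int) -> bytes:
    """Fetch the raw attachment body back from S3."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"attachment/{attach_id}")
    return obj["Body"].read()
```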
Comment 16

Resolving this as FIXED, since we've moved BMO to AWS for the failover. If we need to reopen it for something in particular, please do so.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED