Plan for moving BMO to AWS

Status: RESOLVED FIXED
Product: bugzilla.mozilla.org
Component: Infrastructure
Reported: 4 years ago
Last modified: 2 years ago

People

(Reporter: mcote, Assigned: fubar)

Tracking

Production

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [data:db_mig])

(Reporter)

Description

4 years ago
This is a tracking bug for investigations into how a move to AWS could be accomplished while maintaining (or improving) the current system's performance and maintainability.
(Reporter)

Comment 1

4 years ago
Clarifying the title. Also, the move would need to occur by the end of Q1 2016.
Summary: Investigate moving BMO to AWS → Plan for moving BMO to AWS
Comment 2

From a Bugzilla developer's perspective, as long as the deployment "in the cloud" matches our DC-hosted deployment, there's nothing we need to do to support AWS/DO/whatever.

Is the plan to move both the PHX1 and SCL3 clusters?
Flags: needinfo?(mcote)
(Reporter)

Comment 3

4 years ago
Seems that itself is up in the air.
Flags: needinfo?(mcote)
Comment 4

Great question! tl;dr yes.

Long version:
Ideally we would host both; however, we want to maintain redundancy, so we'd have to be careful to do that properly.

There's always the possibility that it's more expensive to host in the cloud than on hardware, in which case we'd roll back.

We would move the phx1 cluster first because its hardware is expiring first. It'd be great to field-test it before it needs to be used as failover, because the last thing we need is to realize that the AWS infrastructure can't handle the load when we're in the midst of an SCL3 Bugzilla meltdown.

So... yes, given testing and the ability to be redundant at the data-center level.
Whiteboard: [db mig]
Depends on: 1064287
Whiteboard: [data:db_mig]
Depends on: 1160929
Comment 5

Is there a more detailed timeline for this? We want to have this completed in 6-9 months...
(Reporter)

Comment 6

3 years ago
I assume you mean "for phx1", as we have not settled on plans for BMO in scl3 to move to AWS.

The process involves more than the BMO team, but we're aiming to have the necessary dev work done for phx->AWS by mid-August if not sooner.
Comment 7

Correct, I meant for phx hardware.
Comment 8

(In reply to Mark Côté [:mcote] from comment #6)
> I assume you mean "for phx1", as we have not settled on plans for BMO in
> scl3 to move to AWS.
> 
> The process involves more than the BMO team, but we're aiming to have the
> necessary dev work done for phx->AWS by mid-August if not sooner.

Hi Mark,
Does that mean you will have a backup/passive Bugzilla in AWS?
I am trying to understand the failover (RPO and RTO) for this design, as well as the level of effort to redesign Bugzilla altogether for AWS multi-region.

Sheeri, today the RPO/RTO is 30 minutes, I believe?
(Reporter)

Comment 9

3 years ago
(In reply to SylvieV from comment #8)
> (In reply to Mark Côté [:mcote] from comment #6)
> > I assume you mean "for phx1", as we have not settled on plans for BMO in
> > scl3 to move to AWS.
> > 
> > The process involves more than the BMO team, but we're aiming to have the
> > necessary dev work done for phx->AWS by mid-August if not sooner.
> 
> Hi Mark
> Does that mean you will have a backup/passive buzgilla in AWS?
> I am trying to understand the failover (RPO and RTO) for this design as well
> as the level of effort to redesign bugzilla altogether for AWS multi-region.

Yes, that is the idea. We have to move the BMO failover out of phx, and scl3 isn't an option since the production system is already there, so we're trying AWS. We can use it as a case study for potentially moving production to AWS some day.

For instance, we may hit some performance issues with the move; they won't be critical for a failover system, but we'd have to spend some time on them if we want to move production there. The only crucial development effort for getting out of phx, which will mitigate the worst performance issues, is bug 1160929; we can investigate other ways of making Bugzilla "cloudier" after the move.
(Assignee)

Comment 10

3 years ago
Taking this and cc'ing :r2, since we've been working on this since Whistler.

Suggest using this as a tracker bug from here on.
Assignee: nobody → klibby

Comment 11

3 years ago
Per the meeting that fubar, glob, and r2 had today, I am proposing we remove ElasticSearch from scope for this initial migration. Bugzilla is already a complex application and adding ElasticSearch increases the risk that we would not be able to migrate it in time.

Comment 12

3 years ago
I'm exploring ways to provide SMTP service for Bugzilla in AWS, including Amazon Simple Email Service and Google Apps SMTP relay. How many emails does Bugzilla send in a day? To approximately how many unique recipients in a day?
Flags: needinfo?(glob)
Comment 13

I only have logs for bugmail (email generated by bug creation/updates). These values exclude email generated by flags and administration requests (new account verification, password resets, etc.). Because we relay through an SMTP server in the DC, infra should be able to provide more accurate numbers if required; however, bugmail accounts for the vast majority of the email we send, so these values should be good enough as a ballpark figure.

In June we averaged 145k emails per day. We send a lot more on weekdays; excluding weekends brings the average up to 182k emails per day.

Over all of June we averaged 3,680 unique recipients per day, increasing to an average of 4,080 unique recipients when weekends are excluded.

Over the course of the whole month we sent 4,344,962 emails to 19,839 unique recipients.
Flags: needinfo?(glob)
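(Editor's note: as a sanity check, 4,344,962 emails over a 30-day month works out to roughly 145k per day, matching the average above. For illustration of the Amazon SES option raised in comment 12, here is a minimal Python sketch of relaying one bugmail through the SES SMTP interface. This is an assumption-laden sketch rather than the production setup: BMO itself is Perl and relays through a data-center MTA, and the region, credentials, and addresses below are placeholders.)

```python
import smtplib
from email.message import EmailMessage

# Placeholder SES SMTP endpoint and credentials (real values would be IAM SMTP
# credentials generated in the AWS console); the region is an assumption.
SES_HOST = "email-smtp.us-west-2.amazonaws.com"
SES_PORT = 587  # STARTTLS submission port
SES_USER = "AKIAEXAMPLESMTPUSER"
SES_PASS = "example-smtp-password"

def send_bugmail(sender, recipient, subject, body):
    """Relay a single bugmail through SES over an authenticated STARTTLS session."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    with smtplib.SMTP(SES_HOST, SES_PORT) as smtp:
        smtp.starttls()
        smtp.login(SES_USER, SES_PASS)
        smtp.send_message(msg)

# Hypothetical example call:
send_bugmail("bugzilla-daemon@mozilla.org", "developer@example.com",
             "[Bug 1234567] Example bugmail", "Example body.")
```

One likely approach would be to keep the web and job-queue nodes pointed at a local relay and reconfigure only that relay to submit to SES, so the application itself would not need changes; the main thing to verify against the ~145-182k messages/day volume is the account's SES sending quota and rate limits.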

Comment 14

3 years ago
How much storage will be needed for attachments when they are moved to S3?
Flags: needinfo?(glob)
Comment 15

(In reply to Richard Weiss [:r2] from comment #14)
> How much storage will be needed for attachments when they are moved to S3?

We currently store 120 GB of attachment content.
Flags: needinfo?(glob)
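(Editor's note: for the attachment move discussed in comments 14 and 15, roughly 120 GB of content, the usual storage pattern is one S3 object per attachment. The sketch below is illustrative only: the bucket name and key scheme are hypothetical, and Bugzilla's actual attachment storage code is Perl, so this Python is just a sketch of the pattern, not the implementation.)

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "bmo-attachments"  # hypothetical bucket name

def put_attachment(attach_id: int, data: bytes, content_type: str) -> None:
    """Store one attachment body under a key derived from its attachment id."""
    s3.put_object(
        Bucket=BUCKET,
        Key=f"attachment/{attach_id}",
        Body=data,
        ContentType=content_type,
    )

def get_attachment(attach_id: int) -> bytes:
    """Fetch the raw attachment body back from S3."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"attachment/{attach_id}")
    return obj["Body"].read()
```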
Comment 16

Resolving this as FIXED, since we've moved BMO to AWS for the failover. If we need to reopen it for something in particular, please do so.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED