Closed Bug 765044 Opened 12 years ago Closed 12 years ago

Set up solitude in prod

Categories

(Cloud Services :: Operations: Marketplace, task, P1)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andy+bugzilla, Assigned: oremj)

Details

We'll need to set up solitude in production. This will be a separate database server and web head from the rest of the marketplace cluster. Setup on -dev in bug 765043.
Assignee: server-ops → oremj
Do we need any additional information or are there other dependencies that will stall this bug?
I believe we still need to have a meeting about architecture before we set this up.
I can coordinate a meeting to discuss architecture if you can tell me who the people are that need to be involved.
Mark Mayo, me, Raymond Forbes, Andy Mckay, and oremj at a minimum. Mark probably has others to include. Thanks.
Would like to get this moving next week if possible.
Have we decided whether or not to use an intermediate proxy?
Haven't decided yet. It's probably a good idea.
I was trying to schedule a meeting to discuss all this, still necessary? Getting everyone at the same time has been challenging.
I met with some folks today and we've got a plan for next steps:
1) Raymond to clarify some security questions (email has been dispatched)
2) Mark+Wil coordinate an HA evaluation of the code and come up with a list of changes
3) Wil+Andy to organize getting the list of changes done
We're currently on step 1.
Have we heard back from Raymond?
Nope! The email was just a link to https://etherpad.mozilla.org/solitude-prod
Raymond, can you take a look?
Whiteboard: Waiting on Raymond.
Blocks: 777543
Whiteboard: Waiting on Raymond. → (cleared)
Planning to corner bobm and get a plan for this whilst at the apps work week.
We are still waiting on Raymond to speak with OPSEC about VLAN requirements. Once we have an idea on those, we can start with deployment.
I have just added kang and joe to the bug and asked them to respond directly.
I've added my reply on the etherpad. Basically, as Marketplace is now svc, the VLAN setup should be like other svc setups: the db has its own VLAN, web has its own VLAN, and the proxy has its own VLAN. From an OpSec point of view this is better than the other setups and is thus recommended.
So we should have:
solitude{1..}.db.phx1.mozilla.com
web{1..}.solitude.phx1.mozilla.com
web{1..}.solitude-proxy.phx1.mozilla.com
Where "solitude" and "solitude-proxy" are new VLANs and db is the existing db VLAN. Guillaume, does that sound alright?
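The naming scheme above can be expanded mechanically, one pattern per VLAN. This is a minimal Python sketch for illustration only: the `hostnames` helper and the `PATTERNS` table are hypothetical, and the host count per role is an assumption, since the bug leaves it open as `{1..}`.

```python
# Hypothetical helper: expand the proposed solitude naming scheme into
# concrete hostnames. Host counts are illustrative, not taken from the bug.
DOMAIN = "phx1.mozilla.com"

# role -> (hostname prefix, VLAN/subdomain label), per the proposal above
PATTERNS = {
    "db": ("solitude", "db"),             # existing db VLAN
    "web": ("web", "solitude"),           # new "solitude" VLAN
    "proxy": ("web", "solitude-proxy"),   # new "solitude-proxy" VLAN
}


def hostnames(role: str, count: int) -> list[str]:
    """Return hostnames numbered 1..count for the given role."""
    prefix, label = PATTERNS[role]
    return [f"{prefix}{i}.{label}.{DOMAIN}" for i in range(1, count + 1)]
```

For example, `hostnames("proxy", 2)` yields `web1.solitude-proxy.phx1.mozilla.com` and `web2.solitude-proxy.phx1.mozilla.com`.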
By "existing db VLAN" you mean a db VLAN set up specifically for solitude, right? Otherwise, that sounds alright. Also, note that svc usually separates stage and prod VLANs, which we also recommend.
This needs to be operational by 8/28.
Whiteboard: Due: 8/28
(In reply to Jeremy Orem [:oremj] from comment #19)
> This needs to be operational by 8/28.
That should be 7 days earlier, to give time for the QA flow.
Whiteboard: Due: 8/28 → Due: 8/21
Guillaume, does memcache also need its own VLAN or can it live in the same VLAN as the web servers?
Ping Guillaume for comment 21.
Is there any performance concern if we were to request it to be in a different VLAN (such as the db one)? If so, we would be OK with having it in the web heads' VLAN.
From the app developer's point of view, caching isn't (at the moment) used for performance, just as a place to persist information between requests. This might change in the future, but I'd be cool with a slightly slower cache.
Let's keep them in the same VLAN to avoid putting more pressure on the firewalls.
PayPal isn't happening for the 28th, and BlueVia doesn't need this. No rush on this anymore.
Whiteboard: Due: 8/21 → (cleared)
I'm OK with keeping it in the same VLAN on the basis that other services actually do that (i.e., acking comment 25). Otherwise, in general, it belongs in the db VLAN.
Let's close out this solitude stuff and reopen when/if it becomes important again.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Reopening - solitude is back. Mark is figuring out what needs to happen regarding hardware. Payments is set to go live Oct 28th, so we need this asap. Thanks.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Whiteboard: [waiting on hardware]
Blocks: 794651
PO has been signed, hardware on the way
In particular, server hardware is due to arrive Nov 14th.
Here are OpSec requirements for Solitude network security.
Solitude Network Security Requirements
* Access controls for this zone must default to DENY.
* Access to systems in this zone must go through internal load balancer or application proxy, unless direct access is required for operation or to overcome protocol limitations. Access and any exceptions must be approved by Security Assurance: OpSec.
* Systems must not have access to the internet, unless required for operation. Exception to this must be approved by Security Assurance: OpSec and will require the use of a proxy server.
* Systems in this network must use Mozilla internal DNS servers.
* Encrypted services must be used. Use of unencrypted services must be approved by Security Assurance: OpSec.
* Systems in this zone must be managed with puppet, and have OpSec required security controls in place. Any exception to these controls must be approved by Security Assurance: OpSec.
* Systems in this zone must be scanned by OpSec vulnerability scanners on a weekly basis.
* This zone must implement network monitoring (OpSec and NetOps do this):
  * Flow data collection
  * Network traffic capture
  * Signature-based security alerts
These requirements are taken from the HIGH Security zone section of the Network Security Policy.
https://mana.mozilla.org/wiki/display/SECURITY/Network+Security+Policy
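The default-DENY requirement means traffic passes only if it matches an explicit allow rule. The following is a hypothetical Python sketch of that evaluation logic, not the actual firewall configuration. The zone names and ports in `ALLOW` are assumptions (3306 assumes a MySQL backend), and real enforcement lives in the network ACLs and host firewalls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes Rule hashable, so it can live in a set
class Rule:
    src_zone: str
    dst_zone: str
    port: int


# Hypothetical allow rules; zone names and ports are illustrative only.
ALLOW = {
    Rule("internal-lb", "solitude-web", 443),  # access via internal load balancer
    Rule("solitude-web", "db", 3306),          # assumed MySQL port
}


def is_allowed(src_zone: str, dst_zone: str, port: int) -> bool:
    """Default DENY: only traffic matching an explicit allow rule passes."""
    return Rule(src_zone, dst_zone, port) in ALLOW
```

Anything not listed, including direct access from outside the internal load balancer, falls through to a deny, which matches the "default to DENY" wording.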
> * Systems must not have access to the internet, unless required for
> operation. Exception to this must be approved by Security Assurance:
> OpSec and will require the use of a proxy server.
The service is split in two. One part has database access and should not have access to the internet. The other has no database access but can reach a whitelisted set of domains (basically the payment providers: Bango, PayPal, etc.) so it can complete payment operations.
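The internet-facing half of that split amounts to a domain allowlist on the proxy. Here is a minimal Python sketch, assuming the proxy sees full request URLs. `may_forward` is a hypothetical name, and the hosts in `ALLOWED_HOSTS` are placeholders because the bug does not list the real payment-provider endpoints.

```python
from urllib.parse import urlsplit

# Placeholder hosts; the real payment-provider endpoints are not given here.
ALLOWED_HOSTS = {"api.bango.example", "api.paypal.example"}


def may_forward(url: str) -> bool:
    """Return True if the proxy should forward an outbound request to url."""
    parts = urlsplit(url)
    if parts.scheme != "https":  # encrypted services only
        return False
    # .hostname is already lowercased and stripped of port/userinfo
    return (parts.hostname or "") in ALLOWED_HOSTS
```

Exact host matching is the conservative choice. A suffix check such as `endswith("bango.example")` would also admit `evilbango.example`.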
Here are the OpSec proposed requirements for Solitude system security. Note that a large part is provided simply by using Infra- or Services-supplied systems.
System Security Zone "High" from https://mana.mozilla.org/wiki/display/SECURITY/System+Security+Policy
The list, from the "High" zone linked above, is:
* Must successfully pass an Operation Security review
* Must configure auditing using the "high" audit set of rules
* Should define with OpSec the custom rules, if any (to be decided)
* Must explicitly disallow forwarding of network facilities (in particular, SSH port forwarding/tunneling should be forbidden)
* Must be updated on a regular basis with security updates. Security updates should not be delayed by more than 7 days
* Must adhere to the minimal target
* Must provide a host-based firewall that is set up
* OpSec can offer support in configuring the firewall
* Must only accept packaged and signed binaries (from a trusted authority)
* Must be managed by a configuration management tool
* Must direct all system logs (i.e. syslog) to the corresponding central syslog server for the datacenter, and should use TLS or DTLS.
* It must be possible to replace/change authentication data automatically across all systems
* Must adhere to the mount settings
* Should hash any system password with a strong hash, such as sha512
* Must implement the "infrasec" system user that is used for vulnerability scanning
* Should not allow direct root or administrator access, or should provide a proper audit trail (no shared login)
* Access must be authenticated using strong authentication (one or more of):
  * Public/Private key pair (minimum of 1024bit RSA, ECDSA recommended)
  * Strong, complex password that uses a no-plain-text-equivalent mechanism, such as SRP
  * Two factor authentication (with at least one strong authentication factor)
* Any administrative authentication data must be stored encrypted using GnuPG and the encrypted files must be backed up
Depends on: 815410
Are there any updates here?
Whiteboard: [waiting on hardware] → [waiting on hardware] u=dev c=pmt p=3
Priority: -- → P1
Blocks: 815410
No longer depends on: 815410
We have been working on an AWS deployment for this. We should have a working "stage" instance there next week.
(In reply to Jeremy Orem [:oremj] from comment #36)
> We have been working on an AWS deployment for this. We should have a working
> "stage" instance there next week.
What's the status of this?
We are able to deploy an instance at AWS now. We are currently working on monitoring/logging/graphite/sentry.
(In reply to Jeremy Orem [:oremj] from comment #38)
> We are able to deploy an instance at AWS now. We are currently working on
> monitoring/logging/graphite/sentry.
What's the latest ETA? We've missed our January milestone, so we're in "as soon as possible" territory now.
The Jan milestone was today. I was asking about it yesterday, and was told it was pushed off to an unknown date in the future. We are having a few app issues, but can work those out today if a dev is available.
I'm ready to close this, but want to confirm with Andy that everything is working properly. We can do this in the morning.
Sounds good. If you set BANGO_ENV = 'test', then you should be able to run this script:
https://github.com/mozilla/solitude/blob/master/samples/bango-basic.py
That will hit Bango, the database and all sorts of stuff (you'll need to change the root on line 9). After you've done that, wipe the db and we should be good to go.
Before we close this, we should verify that instances have been setup as described in comments 32 and 34. Are you ready for us to review?
Mark is going to review this. After we address any issues that he finds, we will submit it for an OpSec review.
(In reply to Jeremy Orem [:oremj] from comment #44)
> Mark is going to review this. After we address any issues that he finds, we
> will submit it for a opsec review.
Please keep us up to date on review ETAs. This is a top priority. Thanks.
Bug 831576 is also a blocker for going live.
Depends on: 831576
Depends on: 832019
Depends on: 832021
(In reply to Wil Clouser [:clouserw] from comment #45)
> (In reply to Jeremy Orem [:oremj] from comment #44)
> > Mark is going to review this. After we address any issues that he finds, we
> > will submit it for a opsec review.
>
> Please keep us up to date on review ETAs. This is a top priority. Thanks.
It's been two days with no ETA or status. Can someone update this bug? Thanks.
Whiteboard: [waiting on hardware] u=dev c=pmt p=3 → [waiting on hardware]
The environment is ready, so I'm going to close this out. We have other bugs filed against this to tidy stuff up. Before we can really go live with this, bug 831576 will need to be fixed and a stable/tested production release will need to be cut.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [waiting on hardware] → (cleared)
Proxy is currently working on -dev, happy to start tagging solitude and getting ready for testing prod pushes.
Component: Server Operations: AMO Operations → Operations: Marketplace
Product: mozilla.org → Mozilla Services