Closed
Bug 894085
Opened 11 years ago
Closed 11 years ago
addons.mozilla.org and marketplace.firefox.com production hardware migration
Categories
(Infrastructure & Operations :: Change Requests, task)
Infrastructure & Operations
Change Requests
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jason, Assigned: jason)
References
Details
(Whiteboard: July 23rd, 2013, 16:00 PDT, 4 hours)
* Date, time, duration of maintenance: July 17th, 2013, 10:00 AM PDT, 4 hours
* System(s) affected: systems associated with addons.mozilla.org, services.addons.mozilla.org, and marketplace.firefox.com
* End-user impact: addons.mozilla.org will be in a read-only state. Users will not be able to add or modify addons.mozilla.org content. Marketplace will be unavailable, and users will be redirected to hardhat.mozilla.net.
* Maintenance plan and timeline: https://infra.etherpad.mozilla.org/marketplace-cutover-20130717
* Rollback plan / rollback point: https://infra.etherpad.mozilla.org/marketplace-cutover-20130717
* Notification mechanisms: email to everyone@mozilla.org.
* Who will be point, who else will be involved: Jason Thomas and Jeremy Orem
Assignee
Updated•11 years ago
Flags: cab-review?
Comment 1•11 years ago
Can we do this around 4pm or so PDT? That would get us to midnight in Europe, which is where most of the marketplace users are.
Comment 2•11 years ago
Folks - I understand and support the need to move to the dedicated new HA hardware. But I'm expecting this is the LAST TIME that we take this cluster down for maintenance, as the users are starting to ramp (phones are selling) and we are moving to an environment where we are online all the time. Thanks for making this happen. Rick.
Comment 3•11 years ago
Is this an emergency move? I see the request is for a move on the 17th, the CAB meets to approve this on the 17th at 9:00am, and we ask for 2 weeks notice on a change[1]. Given that this is a new process and we are ironing out the kinks I am happy to make an exception, but even so same-day approval after a CAB meeting might be tough if there are exceptions, not to mention no time to make appropriate notifications as this window is less than 48 hours from now. [1] - https://wiki.mozilla.org/IT/Maintenance#Approval_Process
Comment 4•11 years ago
Bug 888989 needs to be done in this window as well to add redundancy. We were assuming that object storage would be on S3 after this move, so we haven't pushed for this previously, but since that is not the case, it needs to be done during the downtime. Thanks!
Comment 5•11 years ago
(In reply to Corey Shields [:cshields] from comment #4)
> Bug 888989 needs done in this window as well to add redundancy. We were assuming that object storage would be on S3 after this move so we haven't pushed for this previously, but that not being the case it needs done during the downtime. Thanks!

We have been working with gcox and will move to the new volume during the migration.

(In reply to Corey Shields [:cshields] from comment #3)
> Is this an emergency move? I see the request is for a move on the 17th, the CAB meets to approve this on the 17th at 9:00am, and we ask for 2 weeks notice on a change[1]. Given that this is a new process and we are ironing out the kinks I am happy to make an exception, but even so same-day approval after a CAB meeting might be tough if there are exceptions, not to mention no time to make appropriate notifications as this window is less than 48 hours from now.
>
> [1] - https://wiki.mozilla.org/IT/Maintenance#Approval_Process

Sorry about this. We found out we needed to file a CAB request on 7/9, and we need to hit a rushed date to get everything moved over before new markets come online.
Comment 6•11 years ago
Cool, emailing the CAB members to bring this to their attention now, sooner than the meeting.
Comment 7•11 years ago
This bug mentions "hardware migration". I'm sorry, but I can't get much context from this. Will someone fill me in? Are you moving from VMs to hardware? AWS to hardware? Old hardware to "dedicated new HA hardware"? Thanks.
Comment 8•11 years ago
Old hardware to new hardware.
Comment 9•11 years ago
(In reply to Wil Clouser [:clouserw] from comment #1)
> Can we do this around 4pm or so PDT? That would get us to midnight in
> Europe which is where most of the marketplace users are

:oremj, any objections to starting at 4pm per Wil's suggestion?
Comment 10•11 years ago
(In reply to Mark Mayo [:mmayo] from comment #9)
> :oremj, any objections to starting at 4pm per Wil's suggestion?

4pm works for us. We need to confirm the time with dbops and the storage team tomorrow.
Comment 11•11 years ago
dbeng can do 4 pm tomorrow.
Comment 12•11 years ago
Team,

Though I do not want to wait, I think going live is risky given the activities we have going on in production:
- Ensuring receipt checking passes QA is vital (Krupa)
- We are currently stabilizing Poland in prod
- Colombia is going live this week in prod

I think next Tuesday after 4pm would be good: it's after Colombia payments go live, but before Colombia goes to market. Other thoughts?
Comment 13•11 years ago
Tuesday @ 4pm works for us. Has Rick approved the time change?
Comment 14•11 years ago
Rick has delegated to Caitlin. Just make it middle of the night in Europe.
Comment 15•11 years ago
Alright, it's settled. We'll execute the cutover plan 7/23 @ 4pm.
Updated•11 years ago
Whiteboard: July 23rd, 2013, 16:00 PDT, 4 hours
Updated•11 years ago
Flags: cab-review? → cab-review+
Assignee
Comment 16•11 years ago
This was completed yesterday.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 17•11 years ago
(In reply to Jason Thomas [:jason] from comment #16)
> This was completed yesterday.

There were some misses on the BI/DW side: the intake of data from add-ons into Hadoop. Reopening so that Daniel can confirm everything is done and that the runbooks are updated to include this tie-in to data warehousing.
Updated•11 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 18•11 years ago
Daniel, has this been fixed by now?
Comment 19•11 years ago
We are getting what appears to be 100% of the traffic; however, there is still an open issue with the Netscaler not supporting the inclusion of the DNT header value in the access logs. That also affects the inclusion of B2G device details in the marketplace logs, but we have a workaround in place for those: we collect logs directly from the webheads rather than from the Netscaler. I think it would also be useful to have a Nagios alert of some sort to detect a drop in log volume from the Netscaler log collection daemon. That might be something that already exists.
Comment 20•11 years ago
Anurag, Annie - where are we with closing this one?
Comment 21•11 years ago
SylvieV -

addons.mozilla.org:
We are getting 100% of the logs for AMO, but header information is still missing. Jason is working with Citrix to get a software update that will start collecting the headers for AMO DNT data, and he will update us when the patch is in place.

marketplace.firefox.com:
This is done. Logs for m.f.c are being collected via nginx with header support and pushed to our filers every night.
Assignee
Comment 22•11 years ago
(In reply to Anurag Phadke[:aphadke@mozilla.com] from comment #21)
> addons.mozilla.org:
> We are getting 100% logs for AMO, header information is still missing. Jason
> is working with Citrix in terms of getting a software update to start
> collecting the headers for AMO for DNT data and will update us when the
> patch is in place.

We are waiting on Netscaler firmware 11.0 to fix stability issues in Netscaler 10.1 (bug 900984). Citrix has not provided a final release date, but support has stated that it might be out within the next two months. Bug 897732 has been opened to track progress.
Assignee
Comment 23•11 years ago
Netscaler firmware was upgraded to latest stable release (bug 929110) which included support for custom headers. DNT headers have been configured for addons.mozilla.org netscaler logs.
Assignee: server-ops → jthomas
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Infrastructure & Operations
Updated•9 years ago
Change Request: --- → approved
Flags: cab-review+