Bug 1446516
Opened 7 years ago
Closed 7 years ago
notify users of upcoming outage
Categories
(Socorro :: General, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: willkg, Unassigned)
We're in the process of finishing up a new infrastructure. In order to switch from the old infrastructure to the new one, things will be in "transition".
This bug covers:
1. figuring out what will be working and what will be broken during the transition period
2. estimating roughly how long the period will last
3. figuring out whom to notify and how to notify them in advance
4. notifying users
Comment 1 • 7 years ago (Reporter)
In the meeting on Wednesday, Miles said that it's likely we can keep the Crash Stats site up during the migration. However, we'll stop processing, so incoming crashes will continue to get saved to S3, but won't get processed and won't show up in Crash Stats search, signature report, etc.
Miles also said the time it takes to do the migration is pretty dependent on how long an incremental S3DistCp takes. He's experimenting with S3DistCp and -stage -> -new-stage. We'll know more next week.
I think we should email the stability list. I don't know offhand who's on that list, though. We don't want to miss people. Mike said we should make sure to notify relman.
Crash Stats has a feature that lets us post a status message site-wide. We should do that, too.
We're pretty sure the "let's do it!" date is March 26th. We should figure out a time.
I think as soon as Miles has a better feel for how long the migration will take and we figure out a time and date, we should post a status message on the site and send emails.
Comment 3 • 7 years ago (Reporter)
Tagging Miles for "how long do we estimate the migration will take?" and "are the details in comment #1 correct?"
Flags: needinfo?(miles)
Comment 4 • 7 years ago (Reporter)
We talked about this at the meeting today.
We're thinking of a 4-hour outage, during which we won't be processing crashes, but the webapp will probably work fine. We're thinking we're still on for Monday, March 26th, but that might slip depending on how long it takes for the initial -prod S3DistCp to run.
We want to notify everyone who uses the webapp. I created a bug covering the migration. We can link to that in a status message on the webapp. Something like:
"""
We're moving to a new house! On Monday, March 26th 2018, we'll have a 4-hour outage during which Crash Stats will work, but we won't be processing crashes. For more details, see <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1447748">bug 1447748</a>.
"""
We want to make sure we notify relman. I checked the stability list subscribers and the relman folks are all on it, so I think we can notify that list, encourage them to forward to others, and we'll be ok. Something like:
"""
We're switching to a new infrastructure for Crash Stats and all of the Mozilla crash ingestion pipeline.
We're hoping to do the work to migrate to the new infrastructure on Monday, March 26th. If we miss this date, we'll do it the next day.
We'll send an email to the stability list before we start the migration with details as well as after the migration has completed.
We expect to have a 4-hour outage for the migration. During the outage, the following will be true:
1. The collector will continue collecting incoming crashes--we will not lose any crashes.
2. We will shut off crash processing during the outage window, so the Crash Stats site will not have access to crashes that come in during that time.
3. The Crash Stats webapp will probably continue to work, but might throw occasional errors.
4. After the migration has completed, we will turn crash processing back on and process all the crashes in the queue.
"""
Miles, Brian, Lonnen, Mike: Does that sound right? Anything else to add?
No longer blocks: 1391034
Flags: needinfo?(mkelly)
Flags: needinfo?(chris.lonnen)
Flags: needinfo?(bpitts)
Comment 6 • 7 years ago
I think a 4 hour total "outage" window is a good estimate.
lgtm.
Flags: needinfo?(miles)
Comment 7 • 7 years ago (Reporter)
Status message posted on Crash Stats. Email sent to stability. Woo hoo!
Marking this as FIXORATED!
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(chris.lonnen)
Flags: needinfo?(bpitts)
Resolution: --- → FIXED