Closed Bug 1819151 Opened 2 years ago Closed 2 years ago

validate stage infra and deploy pipeline

Categories

(Eliot :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

After the stage environment has been built, we need to validate the stage infrastructure and deploy pipeline.

This bug covers building a validation plan and executing it.

Rough validation plan:

  1. validate deploys
    1. can we trigger deploys?
    2. do deploys notify the channel?
    3. do deploys complete?
    4. is Eliot in stage updated after a deploy? -- check /__version__ for the commit sha
  2. validate infrastructure
    1. does logging work?
    2. does monitoring work? is there a dashboard in Grafana for Eliot on stage?
    3. does Sentry work?
    4. does symbolication API work? does it return correct responses?
    5. do dashboards look correct?
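
As an illustration of the /__version__ check in step 1.4, here is a minimal sketch. It assumes the endpoint returns a Dockerflow-style JSON object with a "commit" key; the helper names and the base URL passed to fetch_version are hypothetical and not taken from the validation tooling.

```python
import json
from urllib.request import urlopen


def deployed_commit(version_body: str) -> str:
    """Return the commit sha from a /__version__ JSON body.

    Assumes a Dockerflow-style body with a "commit" key.
    """
    data = json.loads(version_body)
    return data["commit"]


def is_deployed(version_body: str, expected_sha: str) -> bool:
    """Return True if the deployed commit matches the expected sha.

    Abbreviated shas are accepted, but only when they are at least
    7 characters long, to avoid accidental prefix matches.
    """
    deployed = deployed_commit(version_body)
    return len(expected_sha) >= 7 and deployed.startswith(expected_sha)


def fetch_version(base_url: str) -> str:
    """Fetch the raw /__version__ body from a (hypothetical) stage base URL."""
    url = f"{base_url.rstrip('/')}/__version__"
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

After a deploy completes, something like is_deployed(fetch_version(stage_url), sha_of_merged_commit) would answer "is Eliot in stage updated?" without opening a browser.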

We can re-use the symbolication API bits we did in 2021h2 in bug #1674102.
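
For reference, a rough sketch of what a symbolication API check could look like. It assumes the v5 request shape (a "jobs" list, each with "memoryMap" and "stacks") and a response with a matching "results" list; the /symbolicate/v5 path, the function names, and the checks themselves are assumptions for illustration, not the actual tooling from the earlier bug.

```python
import json
from urllib.request import Request, urlopen


def build_payload(memory_map, stacks):
    """Build a single-job symbolication request.

    memory_map: list of [debug_filename, debug_id] pairs
    stacks: list of stacks; each stack is a list of [module_index, offset]
    """
    return {"jobs": [{"memoryMap": memory_map, "stacks": stacks}]}


def check_response(payload, response):
    """Return a list of structural problems; empty means the shape looks right.

    This only checks that every job, stack, and frame in the request has a
    counterpart in the response. It does not check symbol correctness.
    """
    results = response.get("results")
    if not isinstance(results, list) or len(results) != len(payload["jobs"]):
        return ["results count does not match jobs count"]

    problems = []
    for job_index, (job, result) in enumerate(zip(payload["jobs"], results)):
        resp_stacks = result.get("stacks", [])
        if len(resp_stacks) != len(job["stacks"]):
            problems.append(f"job {job_index}: stack count mismatch")
            continue
        for stack_index, (req_stack, resp_stack) in enumerate(
            zip(job["stacks"], resp_stacks)
        ):
            if len(req_stack) != len(resp_stack):
                problems.append(
                    f"job {job_index} stack {stack_index}: frame count mismatch"
                )
    return problems


def symbolicate(base_url, payload):
    """POST the payload to the (assumed) v5 endpoint and return parsed JSON."""
    request = Request(
        f"{base_url.rstrip('/')}/symbolicate/v5",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Checking "does it return correct responses?" fully would still need known-good symbol data to compare against; the shape check above only catches gross failures.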

I updated the validation document and fleshed it out with the things I think we need to look at. We can adjust it as we go along.

https://docs.google.com/document/d/1W61N9Ki0zwy-zitiN2AyXAbuPxgwSo-kc8_62-UnOpQ/edit#

I'm setting up the tools I need to validate and load test the stage environment. I'm reusing as much as I can from the tecken-loadtests repo.

https://github.com/mozilla-services/tecken-loadtests/

I'm currently waiting on Sentry access. Once I have that, I can start working on things.

Current status:

Done:

  1. Deploys to stage are working.
  2. Logging is working. There's a link to the logs in Confluence, and I have access to them.
  3. There's an app monitoring dashboard and an infra monitoring dashboard.
  4. We verified the HTTP response headers are correct. Eliot GCP stage gets an A+ in the Mozilla Observatory.
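
A header check like the one in item 4 can also be scripted. The sketch below is only illustrative: the list of expected headers is an assumption based on headers the Mozilla Observatory commonly scores, not the list actually used for this validation, and the function name is hypothetical.

```python
# Assumed set of security headers to look for; adjust to what the
# service is actually expected to send.
EXPECTED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
)


def missing_headers(headers, expected=EXPECTED_HEADERS):
    """Return the expected header names that are absent from the response.

    headers: mapping of response header name to value. Header names are
    compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in headers}
    return [name for name in expected if name.lower() not in present]
```

An empty list means every expected header was present; it says nothing about whether the header values are strict enough, which is what the Observatory grade covers.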

Not done:

  1. Verify Sentry works. https://mozilla-hub.atlassian.net/browse/DSRE-1236
  2. Validate stage environment. https://mozilla-hub.atlassian.net/browse/DSRE-1242
  3. Load test stage environment. https://mozilla-hub.atlassian.net/browse/DSRE-1243
  4. Verify the app and infra dashboards are working.

Getting there.

Current status:

Done:

  1. I verified that Sentry events make it to the Eliot GCP stage Sentry project.

Not done:

  1. Validate stage environment. https://mozilla-hub.atlassian.net/browse/DSRE-1242
  2. Load test stage environment. https://mozilla-hub.atlassian.net/browse/DSRE-1243
  3. Verify the app and infra dashboards are working.

I finished validating the stage environment. I documented (roughly) the tools I used. I updated the repo with the latest versions of everything.

https://github.com/mozilla-services/tecken-loadtests/commit/714d09a2fd83cd351f2532a1f4e18c6047d46daf

Final validation document: https://docs.google.com/document/d/1W61N9Ki0zwy-zitiN2AyXAbuPxgwSo-kc8_62-UnOpQ/edit#

Everything looks good.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED