1251307 - Please deploy loop-server 0.19.3 to PRODUCTION

============================
PRE-DEPLOYMENT:
============================

Here's what's currently in production:

Placed several calls successfully between Nightly (47.0a1) and GR (44.0.2) using production loop-server 0.18.2 stack.

----------------------------
E2E TESTS
----------------------------

TESTS
messaging - OK
Tab & window-sharing - OK
Video/audio mute/unmute - OK
Room notifications - OK
end-2-end test calls - OK

----------------------------
URL CHECKS (PROD)
----------------------------

curl https://loop.services.mozilla.com | python -m json.tool
{
    "description": "The Mozilla Loop (WebRTC App) server",
    "endpoint": "https://loop.services.mozilla.com",
    "fakeTokBox": false,
    "fxaOAuth": true,
    "homepage": "https://github.com/mozilla-services/loop-server/",
    "i18n": {
        "defaultLang": "en-US"
    },
    "name": "mozilla-loop-server",
    "version": "0.18.2"
}

curl https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
    "fxaVerifier": true,
    "provider": true,
    "storage": true
}

curl https://loop.services.mozilla.com/push-server-config | python -m json.tool
{
    "pushServerURI": "wss://push.services.mozilla.com"
}

JP Schneider [:jp]

Assignee

Comment 4

•

9 years ago

New stack:
"ELBDNSName": "loopsvrprod1-l-ELB-573C03V9CF29-1020574114.us-west-2.elb.amazonaws.com",
"ELBFQDN": "loopsvrprod1-l-ELB-573C03V9CF29-1020574114.us-west-2.elb.amazonaws.com"

JP Schneider [:jp]

Assignee

Comment 5

•

9 years ago

Old stack: dualstack.loopsvrprod1-l-elb-14rdb0b303rer-691625340.us-west-2.elb.amazonaws.com.

Chris Hartjes [:grumpy][:chartjes]

Comment 6

•

9 years ago

============================================
PRE-PRODUCTION (INCOMING) STACK VERIFICATION
============================================

E2E tests and stack check okay. Heartbeat check was showing intermittent errors.

As per conversation with :bobm and :jp, ops to follow up with Tarek's team to modify heartbeat check for push. Halting this release.

Notice from Sentry:

Regression on Loop-Server loopserver-prod

Error: Heartbeat: {"storage":true,"provider":true,"push":false,"fxaVerifier":true}

Tags
level = error
logger = root
server_name = ip-172-31-34-117

Exception
Error: Heartbeat: {"storage":true,"provider":true,"push":false,"fxaVerifier":true}
  File "/data/loop-server/loop/routes/home.js", line 59, in returnStatus
    logError(new Error("Heartbeat: " + JSON.stringify(data)));
  File "/data/loop-server/loop/routes/home.js", line 82, in null.<anonymous>
    returnStatus(storageStatus, tokboxError, pushStatus, verifierStatus);
  File "/data/loop-server/loop/routes/home.js", line 30, in Request._callback
    if (error) return callback(error);
... (6 additional frame(s) were not displayed)
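
For anyone triaging these Sentry notices, here is a minimal sketch of pulling the failing dependency out of the heartbeat message. The helper name `failing_checks` is hypothetical and not part of loop-server; it only assumes the message carries the JSON payload after the "Heartbeat: " prefix, as in the notice above.

```python
import json


def failing_checks(sentry_message):
    """Return the sorted names of heartbeat checks that reported false.

    Hypothetical helper for triage; it is not part of loop-server.
    """
    prefix = "Heartbeat: "
    # Keep only the JSON payload that follows the "Heartbeat: " prefix.
    start = sentry_message.index(prefix) + len(prefix)
    status = json.loads(sentry_message[start:])
    return sorted(name for name, ok in status.items() if not ok)


message = 'Error: Heartbeat: {"storage":true,"provider":true,"push":false,"fxaVerifier":true}'
print(failing_checks(message))  # prints ['push']
```

In this notice only the push check is failing; storage, provider and fxaVerifier all report true.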

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Resolution: --- → WONTFIX

Bob Micheletto [:bobm]

Comment 7

•

9 years ago

Consider adding a load balancer specific health check to the service. See: https://bugzilla.mozilla.org/show_bug.cgi?id=1246008 for a similar request.

Richard Pappalardo [:rpapa][:rpappalardo]

Comment 8

•

9 years ago

(In reply to Bob Micheletto [:bobm] from comment #7)
> Consider adding a load balancer specific health check to the service. See:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1246008 for a similar request.

Thanks, Bob. Per our vidyo this morning, we'll need to have a reliable __heartbeat__ check in place for push before we can re-deploy to PROD.

:natim, :tarek, is this something you guys might be able to add to heartbeat check?

Flags: needinfo?(tarek)

Flags: needinfo?(rhubscher)

Rémy Hubscher (:natim)

Comment 9

•

9 years ago

Can we release this one without that change and add it to the next release which will happen during this week with new features?

Flags: needinfo?(rhubscher)

Richard Pappalardo [:rpapa][:rpappalardo]

Comment 10

•

9 years ago

(In reply to Rémy Hubscher (:natim) from comment #9)
> Can we release this one without that change and add it to the next release
> which will happen during this week with new features?

It's a good point as I believe push was just going to be added to the heartbeat w/ this release so, in theory, we're not breaking anything that was working before. Though we would have to choose to ignore the push heartbeat til the next release.

but I defer to Ops for this one :jp, :bobm?

Flags: needinfo?(jschneider)

Flags: needinfo?(bobm)

Rémy Hubscher (:natim)

Comment 11

•

9 years ago

We could also increase the timeout value on the push heartbeat call.

Rémy Hubscher (:natim)

Comment 12

•

9 years ago

The default value of the heartbeatTimeout config is 2000ms. We may want to wait one more second before reporting the push endpoint as broken.
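
To illustrate why a slightly longer timeout could stop the intermittent failures, here is a rough sketch of a timeout-bounded dependency check. It is not loop-server's implementation (which is Node.js). `check_with_timeout` and `slow_push_probe` are hypothetical names, and the 250ms delay is an invented stand-in for a slow push server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def check_with_timeout(probe, timeout_ms):
    """Return True only if probe() answers truthy within timeout_ms."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(probe)
    try:
        return bool(future.result(timeout=timeout_ms / 1000.0))
    except FutureTimeout:
        # A slow dependency is reported as down, even if it is healthy.
        return False
    except Exception:
        # A probe that raises is also reported as down.
        return False
    finally:
        # Do not block the caller on a probe that is still running.
        executor.shutdown(wait=False)


def slow_push_probe(delay_s=0.25):
    """Hypothetical stand-in for a push server that answers slowly."""
    time.sleep(delay_s)
    return True
```

With this sketch, a probe that takes 250ms is reported as down under a 100ms timeout and as up under a 1000ms timeout, although the dependency itself behaves the same in both cases.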

Bob Micheletto [:bobm]

Comment 13

•

9 years ago

(In reply to Richard Pappalardo [:rpapa][:rpappalardo] from comment #10)
> It's a good point as I believe push was just going to be added to the
> heartbeat w/ this release so, in theory, we're not breaking anything that
> was working before. Though we would have to choose to ignore the push
> heartbeat til the next release.
>
> but I defer to Ops for this one :jp, :bobm?

I defer to jp!

Flags: needinfo?(bobm)

JP Schneider [:jp]

Assignee

Comment 14

•

9 years ago

While I don't love having TCP healthchecks on load balancers, it's how we're currently running, so I won't block on it. As a heads up, until we make an lbheartbeat endpoint which doesn't exercise resource dependencies to give a 200 OK, we run the risk of having unhealthy nodes in our load balancer.

JP Schneider [:jp]

Assignee

Comment 15

•

9 years ago

I fished this bit out of our documentation in mana:

"/__heartbeat__
Should return a 200 if the service is healthy, and a 500 otherwise. This should check dependent services like the database connection to ensure that they are healthy

/__lbheartbeat__
Should respond 200 if the service is up, 500 otherwise. This is for load balancer checks and should not check dependent services."

Right there we all get what we want. :)
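
The split described in that mana excerpt could be sketched roughly as follows. This is an illustration only, not loop-server code: the function names, the `checks` mapping, and the response bodies are assumptions. Only the 200/500 status codes and the rule about dependent services come from the quoted documentation.

```python
import json


def heartbeat(checks):
    """Deep check: run every dependency probe; 200 only if all pass.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is healthy (hypothetical shape).
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises counts as an unhealthy dependency.
            results[name] = False
    status = 200 if all(results.values()) else 500
    return status, json.dumps(results, sort_keys=True)


def lbheartbeat():
    """Shallow check for the load balancer: no dependencies are touched."""
    return 200, "{}"
```

With this shape, a slow or failing push check would turn /__heartbeat__ into a 500 for monitoring, while /__lbheartbeat__ would keep answering 200 so the load balancer does not pull an otherwise working node.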

Flags: needinfo?(jschneider)

Richard Pappalardo [:rpapa][:rpappalardo]

Comment 16

•

9 years ago

OK, sounds good. :jp, let's try to sync up w/ :chartjes tomorrow and make a plan.

Flags: needinfo?(tarek)

Richard Pappalardo [:rpapa][:rpappalardo]

Comment 17

•

9 years ago

Ops has given OK to move forward with 0.19.3 so re-opening ticket.

Deployment will be tomorrow, Thurs. 3/3 @: 9am PST / 11am CST / 12pm EST

:jp to follow-up w/ :natim to modify heartbeat prior to next deploy

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

JP Schneider [:jp]

Assignee

Comment 18

•

9 years ago

I've built the new stack, and it's ready at "ELBDNSName": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com", "ELBFQDN": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com"}

Rémy Hubscher (:natim)

Comment 19

•

9 years ago

The __lbhealthcheck__ feature for loop-server is getting implemented in Bug 1253257

Rémy Hubscher (:natim)

Comment 20

•

9 years ago

JP Schneider, it is strange that the version displayed on this instance [0] is set to 0.20.0-dev.

[0] https://loopsvrprod1-l-elb-lg8xpe596n1v-2051374085.us-west-2.elb.amazonaws.com/

Rémy Hubscher (:natim)

Comment 21

•

9 years ago

Apparently it is an error in my tag: https://github.com/mozilla-services/loop-server/blob/0.19.3/package.json#L4

I am going to fix that in a 0.19.4 release if that is ok for you.
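
A pre-tag check along these lines might catch this kind of mismatch before a release is cut. This is only a sketch: `version_matches_tag` is a hypothetical helper, not part of loop-server's release tooling, and it assumes tags are bare version strings such as 0.19.3.

```python
import json


def version_matches_tag(package_json_text, tag):
    """Return True if the "version" field of package.json equals the tag."""
    declared = json.loads(package_json_text)["version"]
    return declared == tag


# The situation in this bug: tag 0.19.3, but package.json says 0.20.0-dev.
package_json = '{"name": "mozilla-loop-server", "version": "0.20.0-dev"}'
print(version_matches_tag(package_json, "0.19.3"))  # prints False
```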

Rémy Hubscher (:natim)

Comment 22

•

9 years ago

Here is the 0.19.4 release with the fix: https://github.com/mozilla-services/loop-server/releases/tag/0.19.4

Rémy Hubscher (:natim)

Comment 23

•

9 years ago

Here is the diff of what changed: https://github.com/mozilla-services/loop-server/compare/0.19.3...0.19.4

Chris Hartjes [:grumpy][:chartjes]

Comment 24

•

9 years ago

============================
PRE-DEPLOYMENT:
============================

Here's what's currently in pre-production:

Placed several calls successfully between Nightly (47.0a1) and GR (44.0.2) using pre-production loop-server 0.19.3 stack.

----------------------------
E2E TESTS
----------------------------

TESTS
messaging - OK
Tab & window-sharing - OK
Video/audio mute/unmute - OK
Room notifications - OK
end-2-end test calls - OK

----------------------------
URL CHECKS (PRE-PRODUCTION)
----------------------------

It's a known issue that the devs accidentally tagged this release as 0.20.0-dev. It is 0.19.3 that is up in pre-production.

curl -k https://loop.services.mozilla.com | python -m json.tool
{
    "description": "The Mozilla Loop (WebRTC App) server",
    "endpoint": "https://loop.services.mozilla.com",
    "fakeTokBox": false,
    "fxaOAuth": true,
    "homepage": "https://github.com/mozilla-services/loop-server/",
    "i18n": {
        "defaultLang": "en-US"
    },
    "name": "mozilla-loop-server",
    "version": "0.20.0-dev"
}

NOTE: Known issue that heartbeat is intermittently reporting that push is down when in fact the system is working correctly. :natim indicated a fix is in the works

~ ᐅ curl -k https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
    "fxaVerifier": true,
    "provider": true,
    "push": true,
    "storage": true
}

~ ᐅ curl -k https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
    "fxaVerifier": true,
    "provider": true,
    "push": false,
    "storage": true
}

~ ᐅ curl -k https://loop.services.mozilla.com/push-server-config | python -m json.tool
{
    "pushServerURI": "wss://push.services.mozilla.com"
}

QA approved. Ready for DNS switch to production at scheduled deployment time of 12:00 Eastern Time.

Status: REOPENED → ASSIGNED

JP Schneider [:jp]

Assignee

Comment 25

•

9 years ago

New stack:
"ELBDNSName": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com",
"ELBFQDN": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com"}

We've got stackdriver constantly alerting due to the new heartbeat issue we know about. I'm going to disable that check for now.

Rémy Hubscher (:natim)

Comment 26

•

9 years ago

Please consider changing heartbeatTimeout rather than deactivating the healthcheck. It is a loop-server configuration value that seems to be too low to give the push server time to answer.

We can configure: heartbeatTimeout: 3000

Rémy Hubscher (:natim)

Comment 27

•

9 years ago

> QA approved. Ready for DNS switch to production at scheduled deployment time of 12:00 Eastern Time.

I would rather not switch to production with the wrong version number displayed.

Richard Pappalardo [:rpapa][:rpappalardo]

Comment 28

•

9 years ago

Met w/ Dev/Ops/QA on vidyo and decided to move ahead with 0.19.3 since we have a 0.20.0 tag (to be deployed next Thurs.)

JP Schneider [:jp]

Assignee

Comment 29

•

9 years ago

Original IPs:
52.24.142.188
52.88.50.37
54.68.81.130

Original Stack: loopsvrprod1-l-elb-14rdb0b303rer-691625340.us-west-2.elb.amazonaws.com.

Switching to new stack: loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com

Switched at 17:10:00 UTC

Chris Hartjes [:grumpy][:chartjes]

Comment 30

•

9 years ago

============================
PRODUCTION:
============================

Placed several calls successfully between Nightly (47.0a1) and GR (44.0.2) using production loop-server 0.19.3 stack.

----------------------------
E2E TESTS
----------------------------

TESTS
messaging - OK
Tab & window-sharing - OK
Video/audio mute/unmute - OK
Room notifications - OK
end-2-end test calls - OK

----------------------------
URL CHECKS (PROD)
----------------------------

It's a known issue that the devs accidentally labelled this release as 0.20.0-dev. It is 0.19.3 that is up in production

~ ᐅ curl https://loop.services.mozilla.com | python -m json.tool
{
    "description": "The Mozilla Loop (WebRTC App) server",
    "endpoint": "https://loop.services.mozilla.com",
    "fakeTokBox": false,
    "fxaOAuth": true,
    "homepage": "https://github.com/mozilla-services/loop-server/",
    "i18n": {
        "defaultLang": "en-US"
    },
    "name": "mozilla-loop-server",
    "version": "0.20.0-dev"
}

NOTE: Known issue that heartbeat is intermittently reporting that push is down when in fact the system is working correctly. :natim has suggested a fix in the timeout length on the push server to fix it.

~ ᐅ curl https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
    "fxaVerifier": true,
    "provider": true,
    "push": true,
    "storage": true
}

~ ᐅ curl https://loop.services.mozilla.com/push-server-config | python -m json.tool
{
    "pushServerURI": "wss://push.services.mozilla.com"
}

QA approved.

Status: ASSIGNED → RESOLVED

Closed: 9 years ago → 9 years ago

Resolution: --- → FIXED

Chris Hartjes [:grumpy][:chartjes]

Comment 31

•

9 years ago

Forgot to move verified. One grumpy thumbs up.

Status: RESOLVED → VERIFIED