Closed
Bug 1251307
Opened 9 years ago
Closed 9 years ago
Please deploy loop-server 0.19.3 to PRODUCTION
Categories
(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: cloud-services-qa, Assigned: jschneider)
References
Details
------------------
RELEASE NOTES
------------------
https://github.com/mozilla-services/loop-server/releases
COMPARISONS
https://github.com/mozilla-services/loop-server/compare/0.19.1...0.18.2
https://github.com/mozilla-services/loop-server/compare/0.18.2...0.19.2
https://github.com/mozilla-services/loop-server/compare/0.19.2...0.19.3
TAGS
https://github.com/mozilla-services/loop-server/releases/tag/0.19.3
https://github.com/mozilla-services/loop-server/commit/d2c2f6d661e21f41c0cc3aa372d2750a2affec25
CHANGELOG
0.19.3 (2016-02-12)
-------------------
- Add a way to log the loop-client versions in use. (#362)
Comment 1•9 years ago
Tentatively scheduled for release on Tues., 3/1 @: 1pm PST / 4pm EST
Comment 2•9 years ago
re-scheduled for today @ 10am PST / 1pm EST
Comment 3•9 years ago
============================
PRE-DEPLOYMENT:
============================
Here's what's currently in production:
Placed several calls successfully between Nightly (47.0a1) and GR (44.0.2) using production loop-server 0.18.2 stack.
----------------------------
E2E TESTS
----------------------------
TESTS
messaging - OK
Tab & window-sharing - OK
Video/audio mute/unmute - OK
Room notifications - OK
end-2-end test calls - OK
----------------------------
URL CHECKS (PROD)
----------------------------
curl https://loop.services.mozilla.com | python -m json.tool
{
"description": "The Mozilla Loop (WebRTC App) server",
"endpoint": "https://loop.services.mozilla.com",
"fakeTokBox": false,
"fxaOAuth": true,
"homepage": "https://github.com/mozilla-services/loop-server/",
"i18n": {
"defaultLang": "en-US"
},
"name": "mozilla-loop-server",
"version": "0.18.2"
}
curl https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
"fxaVerifier": true,
"provider": true,
"storage": true
}
curl https://loop.services.mozilla.com/push-server-config | python -m json.tool
{
"pushServerURI": "wss://push.services.mozilla.com"
}
| Assignee | ||
Comment 4•9 years ago
New stack: "ELBDNSName": "loopsvrprod1-l-ELB-573C03V9CF29-1020574114.us-west-2.elb.amazonaws.com", "ELBFQDN": "loopsvrprod1-l-ELB-573C03V9CF29-1020574114.us-west-2.elb.amazonaws.com"
| Assignee | ||
Comment 5•9 years ago
Old stack: dualstack.loopsvrprod1-l-elb-14rdb0b303rer-691625340.us-west-2.elb.amazonaws.com.
Comment 6•9 years ago
============================================
PRE-PRODUCTION (INCOMING) STACK VERIFICATION
============================================
E2E tests and stack check okay. Heartbeat check was showing intermittent errors.
As per conversation with :bobm and :jp, ops to follow up with Tarek's team to modify heartbeat check for push.
Halting this release.
Notice from Sentry:
Regression on Loop-Server loopserver-prod
Error: Heartbeat: {"storage":true,"provider":true,"push":false,"fxaVerifier":true}
Tags
level = error logger = root server_name = ip-172-31-34-117
Exception
Error: Heartbeat: {"storage":true,"provider":true,"push":false,"fxaVerifier":true}
File "/data/loop-server/loop/routes/home.js", line 59, in returnStatus
logError(new Error("Heartbeat: " + JSON.stringify(data)));
File "/data/loop-server/loop/routes/home.js", line 82, in null.<anonymous>
returnStatus(storageStatus, tokboxError, pushStatus, verifierStatus);
File "/data/loop-server/loop/routes/home.js", line 30, in Request._callback
if (error) return callback(error);
...
(6 additional frame(s) were not displayed)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
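For anyone triaging similar Sentry alerts, here is a minimal sketch of pulling the failing checks out of a heartbeat error message like the one above. The helper name `failingChecks` is hypothetical and is not part of loop-server; it only assumes the message keeps the "Heartbeat: " prefix followed by the JSON status shown in the trace.

```javascript
// Hypothetical helper (not loop-server code): list the checks that did not
// report `true` in a "Heartbeat: {...}" error message.
function failingChecks(message) {
  const prefix = "Heartbeat: ";
  if (!message.startsWith(prefix)) {
    return null; // not a heartbeat error
  }
  const status = JSON.parse(message.slice(prefix.length));
  // Keep the name of every check whose value is anything other than true.
  return Object.keys(status).filter((name) => status[name] !== true);
}

// Message text copied from the Sentry notice above.
const sentryMessage =
  'Heartbeat: {"storage":true,"provider":true,"push":false,"fxaVerifier":true}';
console.log(failingChecks(sentryMessage)); // [ 'push' ]
```

In this case only the push check is failing, which matches the decision to follow up on the push part of the heartbeat.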
Comment 7•9 years ago
Consider adding a load balancer specific health check to the service. See: https://bugzilla.mozilla.org/show_bug.cgi?id=1246008 for a similar request.
Comment 8•9 years ago
(In reply to Bob Micheletto [:bobm] from comment #7)
> Consider adding a load balancer specific health check to the service. See:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1246008 for a similar request.
Thanks, Bob.
Per our vidyo this morning, we'll need to have a reliable __heartbeat__ check in place for push before we can re-deploy to PROD.
:natim, :tarek, is this something you guys might be able to add to heartbeat check?
Flags: needinfo?(tarek)
Flags: needinfo?(rhubscher)
Comment 9•9 years ago
Can we release this one without that change and add it to the next release which will happen during this week with new features?
Flags: needinfo?(rhubscher)
Comment 10•9 years ago
(In reply to Rémy Hubscher (:natim) from comment #9)
> Can we release this one without that change and add it to the next release
> which will happen during this week with new features?
It's a good point as I believe push was just going to be added to the heartbeat w/ this release so, in theory, we're not breaking anything that was working before. Though we would have to choose to ignore the push heartbeat til the next release.
but I defer to Ops for this one :jp, :bobm?
Flags: needinfo?(jschneider)
Flags: needinfo?(bobm)
Comment 11•9 years ago
We could also increase the timeout value on the push heartbeat call.
Comment 12•9 years ago
The default value of the heartbeatTimeout config is 2000 ms. We may want to wait one more second before declaring the push endpoint broken.
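To illustrate why one extra second could matter, here is a toy model of a check with a deadline. It is a sketch only: the function names are hypothetical, the latencies are made up for illustration and were not measured, and it does not reproduce how loop-server actually applies heartbeatTimeout.

```javascript
// Toy model (assumption): a push check slower than the timeout is
// reported as "push": false even though the push server did answer.
function pushCheckPasses(responseTimeMs, heartbeatTimeoutMs) {
  return responseTimeMs <= heartbeatTimeoutMs;
}

// Count how many sampled checks would be reported as failures.
function countFalseAlarms(responseTimesMs, heartbeatTimeoutMs) {
  return responseTimesMs.filter(
    (ms) => !pushCheckPasses(ms, heartbeatTimeoutMs)
  ).length;
}

// Made-up response times in milliseconds, for illustration only.
const samples = [350, 1200, 2400, 900, 2100, 2800];
console.log(countFalseAlarms(samples, 2000)); // 3
console.log(countFalseAlarms(samples, 3000)); // 0
```

If the real push latencies sit just above 2000 ms, a 3000 ms timeout would hide the intermittent failures; if they are much higher, a longer timeout would not help.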
Comment 13•9 years ago
(In reply to Richard Pappalardo [:rpapa][:rpappalardo] from comment #10)
> It's a good point as I believe push was just going to be added to the
> heartbeat w/ this release so, in theory, we're not breaking anything that
> was working before. Though we would have to choose to ignore the push
> heartbeat til the next release.
>
> but I defer to Ops for this one :jp, :bobm?
I defer to jp!
Flags: needinfo?(bobm)
| Assignee | ||
Comment 14•9 years ago
While I don't love having TCP healthchecks on load balancers, it's how we're currently running, so I won't block on it.
As a heads up, until we make an lbheartbeat endpoint which doesn't exercise resource dependencies to give a 200 OK, we run the risk of having unhealthy nodes in our load balancer.
| Assignee | ||
Comment 15•9 years ago
I fished this bit out of our documentation in mana:
"/__heartbeat__
Should return a 200 if the service is healthy, and a 500 otherwise. This should check dependent services like the database connection to ensure that they are healthy
/__lbheartbeat__
Should respond 200 if the service is up, 500 otherwise. This is for load balancer checks and should not check dependent services."
Right there we all get what we want. :)
Flags: needinfo?(jschneider)
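As a rough sketch of the split described in that documentation, the two endpoints could be modelled as below. The function names and return shape are hypothetical and are not taken from loop-server; the dependency names simply mirror the heartbeat JSON seen in this bug.

```javascript
// Hypothetical model of /__heartbeat__: healthy only if every dependency is.
function heartbeat(checks) {
  // checks: e.g. { storage: true, provider: true, push: false, fxaVerifier: true }
  const healthy = Object.values(checks).every((ok) => ok === true);
  return { status: healthy ? 200 : 500, body: checks };
}

// Hypothetical model of /__lbheartbeat__: no dependency checks at all.
// If the process can answer, it reports 200.
function lbHeartbeat() {
  return { status: 200, body: {} };
}

const checks = { storage: true, provider: true, push: false, fxaVerifier: true };
console.log(heartbeat(checks).status); // 500
console.log(lbHeartbeat().status); // 200
```

With this split, a flaky push check would still show up in monitoring through the first endpoint without causing the load balancer to pull nodes that are otherwise serving traffic.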
Comment 16•9 years ago
OK, sounds good. :jp let's try and sync-up w/ :chartjes tomorrow and make a plan.
Flags: needinfo?(tarek)
Comment 17•9 years ago
Ops has given OK to move forward with 0.19.3 so re-opening ticket.
Deployment will be tomorrow, Thurs. 3/3 @: 9am PST / 11am CST / 12pm EST
:jp to follow-up w/ :natim to modify heartbeat prior to next deploy
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
| Assignee | ||
Comment 18•9 years ago
I've built the new stack, and it's ready at "ELBDNSName": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com", "ELBFQDN": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com"}
Comment 19•9 years ago
The __lbhealthcheck__ feature for loop-server is getting implemented in Bug 1253257
Comment 20•9 years ago
JP Schneider, it is strange that the version displayed on this instance [0] is set to 0.20.0-dev.
[0] https://loopsvrprod1-l-elb-lg8xpe596n1v-2051374085.us-west-2.elb.amazonaws.com/
Comment 21•9 years ago
Apparently it is an error in my tag: https://github.com/mozilla-services/loop-server/blob/0.19.3/package.json#L4
I am going to fix that in a 0.19.4 release if that is ok for you.
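A small pre-tag check could catch this kind of mismatch before a release is cut. The sketch below is an assumption about how such a check might look, not something that exists in the loop-server repository; `versionMatchesTag` is a hypothetical name and the package.json text is a trimmed example.

```javascript
// Hypothetical release check: does the "version" field in package.json
// match the tag that is about to be pushed?
function versionMatchesTag(packageJsonText, tag) {
  const version = JSON.parse(packageJsonText).version;
  return version === tag;
}

// Trimmed example mirroring what the 0.19.3 tag reported.
const pkg = '{"name": "mozilla-loop-server", "version": "0.20.0-dev"}';
console.log(versionMatchesTag(pkg, "0.19.3")); // false
```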
Comment 22•9 years ago
Here is the 0.19.4 release with the fix: https://github.com/mozilla-services/loop-server/releases/tag/0.19.4
Comment 23•9 years ago
Here is the diff of what changed: https://github.com/mozilla-services/loop-server/compare/0.19.3...0.19.4
Comment 24•9 years ago
============================
PRE-DEPLOYMENT:
============================
Here's what's currently in pre-production:
Placed several calls successfully between Nightly (47.0a1) and GR (44.0.2) using pre-production loop-server 0.19.3 stack.
----------------------------
E2E TESTS
----------------------------
TESTS
messaging - OK
Tab & window-sharing - OK
Video/audio mute/unmute - OK
Room notifications - OK
end-2-end test calls - OK
----------------------------
URL CHECKS (PRE-PRODUCTION)
----------------------------
It's a known issue that the devs accidentally tagged this release as 0.20.0-dev. It is 0.19.3 that is up in pre-production.
curl -k https://loop.services.mozilla.com | python -m json.tool
{
"description": "The Mozilla Loop (WebRTC App) server",
"endpoint": "https://loop.services.mozilla.com",
"fakeTokBox": false,
"fxaOAuth": true,
"homepage": "https://github.com/mozilla-services/loop-server/",
"i18n": {
"defaultLang": "en-US"
},
"name": "mozilla-loop-server",
"version": "0.20.0-dev"
}
NOTE: Known issue that heartbeat is intermittently reporting that push is
down when in fact the system is working correctly. :natim indicated a fix
is in the works
~ ᐅ curl -k https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
"fxaVerifier": true,
"provider": true,
"push": true,
"storage": true
}
~ ᐅ curl -k https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
"fxaVerifier": true,
"provider": true,
"push": false,
"storage": true
}
~ ᐅ curl -k https://loop.services.mozilla.com/push-server-config | python -m json.tool
{
"pushServerURI": "wss://push.services.mozilla.com"
}
QA approved. Ready for DNS switch to production at scheduled deployment time of
12:00 Eastern Time.
Status: REOPENED → ASSIGNED
| Assignee | ||
Comment 25•9 years ago
New stack : "ELBDNSName": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com", "ELBFQDN": "loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com"}
We've got stackdriver constantly alerting due to the new heartbeat issue we know about. I'm going to disable that check for now.
Comment 26•9 years ago
Please consider changing heartbeatTimeout rather than deactivating the healthcheck.
It is a configuration for loop-server that seems to be too low to let pushServer answer.
We can configure:
heartbeatTimeout: 3000
Comment 27•9 years ago
> QA approved. Ready for DNS switch to production at scheduled deployment time of
> 12:00 Eastern Time.
I would rather not switch to production with the wrong version number displayed.
Comment 28•9 years ago
Met w/ Dev/Ops/QA on vidyo and decided to move ahead with 0.19.3 since we have a 0.20.0 tag (to be deployed next Thurs.)
| Assignee | ||
Comment 29•9 years ago
Original IPs
52.24.142.188
52.88.50.37
54.68.81.130
Original Stack:
loopsvrprod1-l-elb-14rdb0b303rer-691625340.us-west-2.elb.amazonaws.com.
Switching to new stack:
loopsvrprod1-l-ELB-LG8XPE596N1V-2051374085.us-west-2.elb.amazonaws.com
Switched at 17:10:00 UTC
Comment 30•9 years ago
============================
PRODUCTION:
============================
Placed several calls successfully between Nightly (47.0a1) and GR (44.0.2) using production loop-server 0.19.3 stack.
----------------------------
E2E TESTS
----------------------------
TESTS
messaging - OK
Tab & window-sharing - OK
Video/audio mute/unmute - OK
Room notifications - OK
end-2-end test calls - OK
----------------------------
URL CHECKS (PROD)
----------------------------
It's a known issue that the devs accidentally labelled this release as 0.20.0-dev. It is 0.19.3 that is up in production
~ ᐅ curl https://loop.services.mozilla.com | python -m json.tool
{
"description": "The Mozilla Loop (WebRTC App) server",
"endpoint": "https://loop.services.mozilla.com",
"fakeTokBox": false,
"fxaOAuth": true,
"homepage": "https://github.com/mozilla-services/loop-server/",
"i18n": {
"defaultLang": "en-US"
},
"name": "mozilla-loop-server",
"version": "0.20.0-dev"
}
NOTE: Known issue that heartbeat is intermittently reporting that push is
down when in fact the system is working correctly. :natim has suggested increasing the heartbeat timeout for the push check to fix it.
~ ᐅ curl https://loop.services.mozilla.com/__heartbeat__ | python -m json.tool
{
"fxaVerifier": true,
"provider": true,
"push": true,
"storage": true
}
~ ᐅ curl https://loop.services.mozilla.com/push-server-config | python -m json.tool
{
"pushServerURI": "wss://push.services.mozilla.com"
}
QA approved.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED