[MSISDN] Please re-deploy msisdn-gateway 0.5.1 to STAGE - CONFIG UPDATES

RESOLVED WONTFIX

People

(Reporter: rpapa, Unassigned)

QA Contact: rpappalardo
These stacks now exist under
  msisdn.stage.mozaws.net
  msisdn-isoprod.stage.mozaws.net

They have numerous internal changes from an infrastructure perspective:
We've changed the build pipeline around a lot.
The datadog agent has been added to the build.
We've updated the AMI (and so all the base OS packages).
We now use the Amazon provided SSL policies.
Version information is exposed at the stack, ASG and instance level.

From a code/QA test suite perspective, hopefully nothing has changed.
Hey Remy, it's been a while since I've tested msisdn and I just want to make sure I'm testing correctly:

When using msisdn-cli (pointing to PROD), I get:
$ msisdn-cli --host=https://msisdn.services.mozilla.com  -c 315 -n +1<US area-code + number>
HTTPError: 503 Server Error: Service Unavailable


I also tried:
$ msisdn-cli --host=https://msisdn-isoprod.stage.mozaws.net  -c 315 -n +1<US area-code + number>
Please enter the code that you will get by SMS from Mozilla: 

But I never receive any response on my Flame.

I also tried the verifier client for both hosts: http://mozilla-services.github.io/msisdn-verifier-client/
but again it just sends the request into the ether and my Flame never receives a message.

Am I missing something?

Both msisdn (PROD) and (ISOPROD) are coming up in browser:
{
  "name": "mozilla-msisdn-gateway",
  "description": "The Mozilla MSISDN Gateway",
  "version": "0.5.1",
  "homepage": "https://github.com/mozilla-services/msisdn-gateway/",
  "endpoint": "https://msisdn.services.mozilla.com"
}
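
The browser check above could also be scripted. A minimal sketch, assuming the root endpoint returns JSON shaped like the response shown; `check_service_info` is a hypothetical helper name, and the sample payload is copied from the response above rather than fetched live:

```python
import json


def check_service_info(payload, expected_version):
    """Return a list of problems found in the root-endpoint JSON (empty if none)."""
    info = json.loads(payload)
    problems = []
    if info.get("name") != "mozilla-msisdn-gateway":
        problems.append("unexpected name: %r" % info.get("name"))
    if info.get("version") != expected_version:
        problems.append(
            "expected version %s, got %r" % (expected_version, info.get("version"))
        )
    return problems


# Sample payload copied from the response above (not fetched live).
SAMPLE = """{
  "name": "mozilla-msisdn-gateway",
  "description": "The Mozilla MSISDN Gateway",
  "version": "0.5.1",
  "homepage": "https://github.com/mozilla-services/msisdn-gateway/",
  "endpoint": "https://msisdn.services.mozilla.com"
}"""

print(check_service_info(SAMPLE, "0.5.1"))
```

This only confirms which version is deployed. A healthy root response says nothing about whether SMS delivery works.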
Flags: needinfo?(rhubscher)
Hello Richard,

I've got a 503 as well when trying the production server. We should look at the log to understand what is going on. I have checked the SMS service balance and it looks ok.

I tried the isoprod server, but it looks like the message is never sent to Nexmo; it goes to omxen instead.

http://omxen.dev.mozaws.net/receive?to=1<US area-code + number>
Flags: needinfo?(rhubscher)
Flags: needinfo?(dwilson)
Flags: needinfo?(bobm)
The stage msisdn is at - rhubscher@ec2-54-76-112-192.eu-west-1.compute.amazonaws.com
cd /media/ephemeral0
find

And nothing appears in the logs. I did the same check on the isoprod and that also shows no errors in the logs.
Flags: needinfo?(bobm)
We don't see errors in stage or isoprod; the 503 errors occur only on the production server.
We have a number of -

TypeError: Cannot read property 'body' of undefined
    at Request._callback (/data/msisdn-gateway/msisdn-gateway/sms/outbound/nexmo.js:29:27)
    at self.callback (/data/msisdn-gateway/node_modules/request/request.js:123:22)
    at Request.emit (events.js:95:17)
    at ClientRequest.self.clientErrorHandler (/data/msisdn-gateway/node_modules/request/request.js:244:10)
    at ClientRequest.emit (events.js:95:17)
    at CleartextStream.socketErrorListener (http.js:1547:9)
    at CleartextStream.emit (events.js:95:17)
    at Socket.onerror (tls.js:1440:17)
    at Socket.emit (events.js:117:20)
    at net.js:833:16
    at process._tickCallback (node.js:419:13)
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]
[Error: READONLY You can't write against a read only slave.]

We have a multi-node Redis cluster in stage as well, so I'd expect to see similar issues in stage, but they don't appear in the logs there.

I'm investigating the sentry configs now too.
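
On the TypeError above: the frames show the callback at nexmo.js:29 being reached through request's clientErrorHandler after a socket error, which suggests it ran without a response object and then tried to read `body` from it. The nexmo.js source is not quoted in this bug, so the following is only a language-neutral sketch (written in Python, with hypothetical names) of checking the error argument before touching the response:

```python
def handle_gateway_reply(error, response):
    """Error-first callback: look at `error` before reading the response.

    Hypothetical helper; it mirrors the (error, response) callback shape only.
    """
    if error is not None or response is None:
        # Transport-level failure: there is no response body to read.
        reason = str(error) if error is not None else "no response"
        return {"ok": False, "reason": reason}
    return {"ok": True, "body": response.get("body")}


# A socket error arrives with no response object.
print(handle_gateway_reply(OSError("socket hang up"), None))
# A normal reply carries a body.
print(handle_gateway_reply(None, {"body": "{}"}))
```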
Flags: needinfo?(dwilson)
Dean, I think we're getting closer.  Though the "smsGateways" now contains a "nexmo" entry, it looks like the isoprod endpoint value is still pointing to the omxen URL instead of the one for Nexmo.  I'm pretty sure we also need an apiKey and apiSecret for that, i.e. all values should be identical to production.

The production.json on the isoprod STAGE server has the following entry:
    "smsGateways": {
        "nexmo": {
            "apiKey": "",
            "apiSecret": "",
            "endpoint": "http://omxen.dev.mozaws.net/send",
            "priority": 10
        }
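
A mismatch like this could be caught before anyone waits for an SMS. A rough sketch, assuming the "smsGateways" layout quoted above; `find_gateway_problems` and its `fake_hosts` parameter are hypothetical, not part of msisdn-gateway:

```python
def find_gateway_problems(sms_gateways, fake_hosts=("omxen.dev.mozaws.net",)):
    """List reasons each gateway entry would not reach a real SMS provider."""
    problems = []
    for name, cfg in sorted(sms_gateways.items()):
        for key in ("apiKey", "apiSecret"):
            if not cfg.get(key):
                problems.append("%s: %s is empty" % (name, key))
        endpoint = cfg.get("endpoint", "")
        if any(host in endpoint for host in fake_hosts):
            problems.append(
                "%s: endpoint points at a fake gateway (%s)" % (name, endpoint)
            )
    return problems


# Entry as quoted above from production.json on the isoprod STAGE server.
isoprod = {
    "nexmo": {
        "apiKey": "",
        "apiSecret": "",
        "endpoint": "http://omxen.dev.mozaws.net/send",
        "priority": 10,
    }
}

for problem in find_gateway_problems(isoprod):
    print(problem)
```

For the entry above this reports the empty apiKey, the empty apiSecret, and the omxen endpoint.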

~~~~~~~~~~~~

When I place calls to https://msisdn-isoprod.stage.mozaws.net

The verification codes are going through to omxen:

browser:
http://omxen.dev.mozaws.net/receive?to=16505757343:

output:
[{"text": "Your verification code is: 103536", "from": "Mozilla"}]
Flags: needinfo?(dwilson)
Just to be clear, I forgot to add: the verification code is never received on my phone (because it's going to omxen, the fake endpoint).
We should chat about this with natim later today, but I think that the isoprod environment should have the fake details. I know this isn't what's currently planned, but I have a reason, honest!

I think it makes more conceptual sense (and cleaner puppet code) for the standard stage stack to use the same provider as prod. They should both do the same thing and isoprod should be the special snowflake.

I also think we should rename isoprod to msisdnloadtest or something like that to be much more obvious to anyone new to the project.
I have just released 0.6.0 with the fix for the heartbeat endpoint.

https://github.com/mozilla-services/msisdn-gateway/releases/tag/0.6.0
After meeting today, our plan is to redeploy the msisdn-gateway STAGE environment so that our primary stage looks exactly like PROD.
Also, going forward, the instance on which we do loadtesting will be clearly labelled as such:

[A]. msisdn.stage.mozaws.net          --> Nexmo (real endpoint)
[B]. msisdn-loadtest.stage.mozaws.net --> Omxen (fake endpoint)

To avoid spinning up stacks prematurely, the new (and in future, automated) pipeline
would execute something like this:

[1]. spin up msisdn.stage.mozaws.net  --> Nexmo (real endpoint)
[2]. manually verify (or run deploy-verify script)
[3]. manual e2e test (or run e2e script)

if everything works as expected, then proceed:
[4]. spin up msisdn-loadtest.stage.mozaws.net --> Omxen (fake endpoint)
[5]. manually verify (or run deploy-verify script)
[6]. manually run loadtest (or automated run)
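
The gating in steps [1] to [6] could be sketched as below. This is only an illustration of the ordering, assuming each step can be reduced to a check that returns True or False; `run_pipeline` and the placeholder checks are hypothetical, and real steps would deploy a stack or run a script:

```python
def run_pipeline(steps):
    """Run (label, check) pairs in order; stop at the first check that fails.

    Returns the labels of the steps that completed successfully.
    """
    completed = []
    for label, check in steps:
        if not check():
            break
        completed.append(label)
    return completed


# Placeholder checks standing in for the real deploy and test steps.
steps = [
    ("spin up msisdn.stage (Nexmo)", lambda: True),
    ("deploy-verify stage", lambda: True),
    ("e2e test stage", lambda: False),  # simulate a failing e2e run
    ("spin up msisdn-loadtest.stage (Omxen)", lambda: True),
    ("deploy-verify loadtest", lambda: True),
    ("run loadtest", lambda: True),
]

print(run_pipeline(steps))
```

With the simulated e2e failure, the loadtest stack is never spun up, which is the point of ordering the steps this way.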

Dean will re-deploy to STAGE on Mon. and I'll retest. 

Thank you, Dean!  (and thanks, Remy, for fixing the heartbeat!)
Flags: needinfo?(dwilson)
Closing this out so we can include the minor fix to the heartbeat.

see:
Bug 1165985 - Please deploy release 0.6.0 to msisdn-gateway STAGE
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX