Closed
Bug 1020899
Opened 11 years ago
Closed 11 years ago
Deploy the msisdn server to stage
Categories
(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: tarek, Assigned: mostlygeek)
References
Details
(Whiteboard: [qa+])
Please puppetize/deploy msisdn on aws
the repo is at : https://github.com/mozilla-services/msisdn-gateway
There's a deployed aws dev. Rémy is the main contact.
Thanks!
Reporter | ||
Comment 1•11 years ago
|
||
proposed stage url: https://stage-msisdn.services.mozilla.com
prod url: https://msisdn.services.mozilla.com
Comment 2•11 years ago
|
||
I believe stage urls are <projectname>.stage.mozaws.net
Reporter | ||
Comment 3•11 years ago
|
||
ok
Comment 4•11 years ago
|
||
Yes, :alexis that is fairly standard now:
Dev:
http://loop.dev.mozaws.net
http://msisdn.dev.mozaws.net
Stage:
https://loop.stage.mozaws.net
http://msisdn.stage.mozaws.net <---- we could do this.
Others:
Content Server: https://accounts.stage.mozaws.net
Auth Server: https://api-accounts.stage.mozaws.net
TokenServer: https://token.stage.mozaws.net
Verifier: https://verifier.stage.mozaws.net
Sync: https://<NODE>.stage.mozaws.net
This makes sense to me for Prod: prod url: https://msisdn.services.mozilla.com
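The naming pattern listed above can be sketched as a small helper. This is only an illustration: `env_url` and its parameters are hypothetical names, not part of any existing tooling, and the `<project>.<env>.mozaws.net` pattern is taken from the examples in this comment.

```python
def env_url(project, env, scheme="https"):
    """Build a <project>.<env>.mozaws.net URL (hypothetical helper)."""
    if env not in ("dev", "stage"):
        # Only dev and stage are shown following this pattern in the thread;
        # production uses a different domain.
        raise ValueError("unsupported environment: %r" % (env,))
    return "%s://%s.%s.mozaws.net" % (scheme, project, env)


print(env_url("msisdn", "stage"))
print(env_url("loop", "dev", scheme="http"))
```

Production is deliberately left out, since the proposed prod URL lives under services.mozilla.com rather than mozaws.net.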
Whiteboard: [qa+]
Reporter | ||
Comment 5•11 years ago
|
||
whatever you guys think is the best pick
Comment 6•11 years ago
|
||
Well loadtests are ready so feel free to deploy asap.
Comment 7•11 years ago
|
||
Added :bobm and :mostlygeek - not sure who will be doing the actual deploy.
Priority: -- → P1
Reporter | ||
Comment 8•11 years ago
|
||
deadline is before the end of june
Comment 9•11 years ago
|
||
Well, hopefully we can just do it next week ;-)
Assignee | ||
Comment 10•11 years ago
|
||
I have a PR https://github.com/mozilla-services/msisdn-gateway/pull/84 that I would like merged before deploying the staging version.
Also can somebody provide me with the relevant config options for the staging environment?
Assignee | ||
Updated•11 years ago
|
Flags: needinfo?(tarek)
Assignee | ||
Comment 11•11 years ago
|
||
Two PRs we'll need on the cloudops side to merge before we can deploy to stage:
- https://github.com/mozilla-services/puppet-config/pull/596
- https://github.com/mozilla-services/svcops/pull/166
Assignee | ||
Comment 13•11 years ago
|
||
:tarek thanks for your help w/ the config.js PR.
I'm not sure about the SMS configuration for stage. Does somebody have SMS gateway information for me?
Comment 14•11 years ago
|
||
:mostlygeek for stage we should use the omxen mock to be able to loadtest it.
https://github.com/mozilla-services/omxen/
Comment 15•11 years ago
|
||
So, do we need omxen to be part of this Stage stack?
Much like we have https://loop-delayed-response.stage.mozaws.net for Loop-Server Stage?
Comment 16•11 years ago
|
||
That is one solution, but I think you can directly use the one that is already deployed.
Can you put an omxen.dev.mozaws.net domain name on it please?
The current server is at http://ec2-54-203-73-122.us-west-2.compute.amazonaws.com/
Flags: needinfo?(bwong)
Comment 18•11 years ago
|
||
Ok, thank you, I will check tomorrow.
Comment 19•11 years ago
|
||
:natim can you get me access to ec2-54-203-73-122.us-west-2.compute.amazonaws.com ?
Are you using a specific .pem file for the credentials?
Also, how was it deployed? OPs did it? or awsbox/awsboxen? or you created your own AWS instance?
Finally, I can ping ec2-54-203-73-122.us-west-2.compute.amazonaws.com
but I can not ping omxen.dev.mozaws.net
I was expecting it to redirect but instead I get a "server not found"
Status: NEW → ASSIGNED
Reporter | ||
Comment 20•11 years ago
|
||
> Also, how was it deployed? OPs did it? or awsbox/awsboxen? or you created your own AWS instance?
It's a custom instance
> I can not ping omxen.dev.mozaws.net
The host is still unknown. It seems the DNS record has not propagated yet. Once it has, we'll probably need an nginx config change.
Comment 21•11 years ago
|
||
Yes same here:
> Host omxen.dev.mozaws.net not found: 3(NXDOMAIN)
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → bwong
Assignee | ||
Comment 22•11 years ago
|
||
omxen.dev.mozaws.net should be working now.
[bwong@ip-10-101-151-161 ~]$ dig omxen.dev.mozaws.net
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.23.rc1.el6_5.1 <<>> omxen.dev.mozaws.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17941
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;omxen.dev.mozaws.net. IN A
;; ANSWER SECTION:
omxen.dev.mozaws.net. 45 IN CNAME ec2-54-203-73-122.us-west-2.compute.amazonaws.com.
ec2-54-203-73-122.us-west-2.compute.amazonaws.com. 172785 IN A 54.203.73.122
;; Query time: 0 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Wed Jun 18 19:07:56 2014
;; MSG SIZE rcvd: 117
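For anyone who wants to script the same check instead of running dig by hand, here is a minimal sketch using only the Python standard library. The `resolve` function and its `lookup` parameter are hypothetical names introduced for illustration; `lookup` exists only so the function can be exercised without network access.

```python
import socket


def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if it does not resolve."""
    try:
        return lookup(hostname)
    except socket.gaierror:
        # Raised for name-resolution failures such as NXDOMAIN.
        return None
```

Calling `resolve("omxen.dev.mozaws.net")` returns an address string once the record exists and `None` while the name is still unknown.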
Comment 23•11 years ago
|
||
Yes tested here, it works thanks.
Comment 24•11 years ago
|
||
ping omxen.dev.mozaws.net
does now resolve to ec2-54-203-73-122.us-west-2.compute.amazonaws.com
In the browser, http://omxen.dev.mozaws.net/
returns "OMXEN SMS GATEWAY"
curl http://omxen.dev.mozaws.net/
OMXEN SMS GATEWAY
curl -I http://omxen.dev.mozaws.net/
HTTP/1.1 200 OK
Server: nginx/1.4.6 (Ubuntu)
Date: Wed, 18 Jun 2014 21:32:43 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Connection: keep-alive
In the browser, https://omxen.dev.mozaws.net/
returns (eventually) "The connection to omxen.dev.mozaws.net was interrupted while the page was loading."
curl -I https://omxen.dev.mozaws.net/
curl: (35) Server aborted the SSL handshake
We should probably address this...
Comment 25•11 years ago
|
||
For the ping it works here:
> $ ping omxen.dev.mozaws.net
> PING ec2-54-203-73-122.us-west-2.compute.amazonaws.com (54.203.73.122) 56(84) bytes of data.
There is no HTTPS for omxen yet. Do you want me to set it up on the server?
Flags: needinfo?(bwong)
Comment 26•11 years ago
|
||
:natim well, we should probably address it one way or another.
We are all used to hitting https servers now, so it is possible that people might type in or use https://omxen.dev.mozaws.net/ by accident...
Assignee | ||
Comment 27•11 years ago
|
||
:natim ping works now. You have to allow ICMP traffic in your security group.
Flags: needinfo?(bwong)
Assignee | ||
Comment 28•11 years ago
|
||
Also for omxen:
- should be in us-east-1 (where all of our stage stacks are), unless you want to test east => west AWS latencies
- if you want SSL, use an ELB + (*.stage.mozaws.net) wildcard. This is only for SSL termination, without moving certs/keys around.
- I can map omxen.stage.mozaws.net DNS to the ELB
Comment 29•11 years ago
|
||
Fernando, you can point at https://msisdn-dev.stage.mozaws.net/ for now in dev.
Future endpoints will be https://msisdn.services.mozilla.com for production and https://msisdn.stage.mozaws.net (not deployed yet).
Assignee | ||
Comment 30•11 years ago
|
||
OK it's in stage now.
- app version 0.3.0-0snap201406261346git1c01a2, github commit: 1c01a2
- 1 x m3.medium (to verify it works/configured correctly)
- puppet-config version: 20140627181006
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 31•11 years ago
|
||
It's here btw: https://msisdn.stage.mozaws.net/
Comment 33•11 years ago
|
||
I have configured the Nexmo number +12182967993 for stage. It should work from the US for you to test.
Comment 34•11 years ago
|
||
I'd like to run a loadtest to make sure everything works.
Can you tell me which omxen URL is currently in use?
Flags: needinfo?(bwong)
Assignee | ||
Comment 35•11 years ago
|
||
:natim
It's configured with the nexmo credentials you emailed me.
You can ssh into the server via the bastion host. In us-east-1 search for msisdn-gateway-stage in the EC2 instance list.
The app is configured at /data/msisdn_gateway
Comment 36•11 years ago
|
||
So far, so good on the verification of deployment.
There is a bug in the log file names/locations. There seem to be some duplicates:
/var/log/hekad/msisdn_gateway.stderr.log
/var/log/hekad/msisdn_gateway.stdout.log
/var/log/msisdn-gateway_err.log
/var/log/msisdn-gateway_out.log
According to :mostlygeek, these two:
/var/log/msisdn-gateway_err.log
/var/log/msisdn-gateway_out.log
should actually be here:
/media/ephemeral0/msisdn_gateway/
Comment 37•11 years ago
|
||
:natim:
loadtests/loadtest.py: omxen_url = "http://ec2-54-203-73-122.us-west-2.compute.amazonaws.com"
Assignee | ||
Comment 38•11 years ago
|
||
I made a PR to fix the logging locations for circus: https://github.com/mozilla-services/puppet-config/pull/639
Assignee | ||
Comment 39•11 years ago
|
||
:natim
The server starts with a lot of these warnings:
Bad locale=[en_US] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/en_US/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/en_US/messages.json')
Bad locale=[en_ZA] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/en_ZA/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/en_ZA/messages.json')
Bad locale=[eo] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/eo/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/eo/messages.json')
Bad locale=[es] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/es/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/es/messages.json')
Bad locale=[es_AR] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/es_AR/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/es_AR/messages.json')
Bad locale=[es_CL] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/es_CL/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/es_CL/messages.json')
Bad locale=[es_MX] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/es_MX/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/es_MX/messages.json')
Bad locale=[et] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/et/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/et/messages.json')
Bad locale=[eu] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/eu/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/eu/messages.json')
Bad locale=[fa] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/fa/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/fa/messages.json')
Bad locale=[ff] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/ff/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/ff/messages.json')
Bad locale=[fi] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/fi/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/fi/messages.json')
Bad locale=[fr] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/fr/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/fr/messages.json')
Bad locale=[fy] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/fy/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/fy/messages.json')
Bad locale=[fy_NL] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/fy_NL/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/fy_NL/messages.json')
Bad locale=[ga] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/ga/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/ga/messages.json')
Bad locale=[ga_IE] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/ga_IE/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/ga_IE/messages.json')
Bad locale=[gd] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/gd/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/gd/messages.json')
Bad locale=[gl] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/gl/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/gl/messages.json')
Bad locale=[gu] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/gu/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/gu/messages.json')
Bad locale=[gu_IN] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/gu_IN/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/gu_IN/messages.json')
Bad locale=[he] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/he/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/he/messages.json')
Bad locale=[hi_IN] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/hi_IN/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/hi_IN/messages.json')
Bad locale=[hr] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/hr/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/hr/messages.json')
Bad locale=[ht] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/ht/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/ht/messages.json')
Bad locale=[hu] missing .key-value-json files in [/data/msisdn-gateway/app/i18n/hu/messages.json]. See locale/README (Error: Cannot find module '/data/msisdn-gateway/app/i18n/hu/messages.json')
How do I get rid of them? I couldn't figure out how to generate the language files and bake them into the RPM for deployment.
Flags: needinfo?(rhubscher)
Comment 40•11 years ago
|
||
This is a configuration variable in the msisdn-gateway server: https://github.com/mozilla-services/msisdn-gateway/blob/master/msisdn-gateway/config.js#L258
For now we only have EN and FR translations, which live here: https://github.com/mozilla-services/msisdn-gateway-l10n
The system is the same as for the fxa-content-server-l10n repository.
Flags: needinfo?(rhubscher)
Assignee | ||
Comment 41•11 years ago
|
||
:natim could you give me some instructions on how to fix it?
- what do I run to generate /data/msisdn-gateway/app/i18n/en_US/messages.json?
- how do I pull the sources in from mozilla-services/msisdn-gateway-l10n and convert those into the .json files?
I can update the config so only "en_US" and "fr" are defined.
Flags: needinfo?(rhubscher)
Comment 42•11 years ago
|
||
Here are the steps to compile the messages:
- you've got to copy the locale directory from https://github.com/mozilla-services/msisdn-gateway-l10n/
- Then you can run *make compile-message* or *./node_modules/.bin/compile-json locale app/i18n*
Ok to configure with en_US and fr for now.
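After compiling, a quick way to confirm which configured locales still lack a compiled file is sketched below. It assumes the layout shown in the warnings above (`<i18n dir>/<locale>/messages.json`); `missing_locales` is a hypothetical helper, not something shipped with msisdn-gateway.

```python
import os


def missing_locales(i18n_dir, locales):
    """Return the locales that have no compiled messages.json under i18n_dir."""
    return [
        locale for locale in locales
        if not os.path.isfile(os.path.join(i18n_dir, locale, "messages.json"))
    ]
```

For example, `missing_locales("/data/msisdn-gateway/app/i18n", ["en_US", "fr"])` should return an empty list once both translations are in place.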
Flags: needinfo?(rhubscher)
Assignee | ||
Updated•11 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 43•11 years ago
|
||
> [rhubscher@ip-10-80-127-202 ~]$ ssh ec2-54-197-86-149.compute-1.amazonaws.com
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Flags: needinfo?(bwong)
Comment 44•11 years ago
|
||
OK. I verified access to the new instance: ec2-54-197-86-149
But, it is unclear to me if the necessary fixes are in
see https://bugzilla.mozilla.org/show_bug.cgi?id=1020899#c41
and https://bugzilla.mozilla.org/show_bug.cgi?id=1020899#c42
:mostlygeek what's our status?
:natim not sure why you don't have access. Did you hop on the Stage Bastion host first?
Comment 45•11 years ago
|
||
Yes I don't know either.
Assignee | ||
Comment 46•11 years ago
|
||
- Waiting on 1032270 to be finished to get the RPM built w/ l10n files baked in.
- natim: if that server is part of the stage cluster I built you need to ssh through our bastion host first: ssh bastion.shared.us-east-1.dev.mozaws.net -p 2222
Flags: needinfo?(bwong)
Assignee | ||
Comment 47•11 years ago
|
||
OK stage has been updated:
- Accounts created by default for natim (rhubscher), alexis and tarek
- Languages/l10n/i18n files are now baked into the RPM
- configs updated for i18n appropriately
- stage now uses http://omxen.dev.mozaws.net
Once :jbonacci verifies w/ tests I'll push it to prod and point https://msisdn.services.mozilla.com at it
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 48•11 years ago
|
||
:mostlygeek that will be tomorrow at the earliest
I need several hours for this very first test of msisdn-gateway in Stage.
Adjust your schedule accordingly ;-)
Will start looking at this after the QA meeting.
Comment 49•11 years ago
|
||
Took a bit of hunting around on the single instance, but I assume the important information is stored here:
/data/msisdn-gateway/config/production.json
Comment 50•11 years ago
|
||
OK, besides the above config file, I verified the following:
AWS CF stack, ELB, and a single m3.medium instance: ec2-54-204-72-36
Versions installed:
msisdn-gateway-svcops 0.3.0-0snap201407021023gite74cc9 x86_64 49842124
puppet-config-msisdn 20140702184605-1 x86_64 9669
Checked out the processes and the files.
Checked out all the new logs:
/media/ephemeral0/msisdn-gateway/msisdn-gateway_err.log
/media/ephemeral0/msisdn-gateway/msisdn-gateway_out.log
/media/ephemeral0/nginx/logs/default.access.log (not in use)
/media/ephemeral0/nginx/logs/default.error.log (not in use)
/media/ephemeral0/nginx/logs/msisdn-gateway.access.log
/media/ephemeral0/nginx/logs/msisdn-gateway.error.log
/var/log/circus.log
/var/log/hekad/msisdn_gateway.stderr.log
/var/log/hekad/msisdn_gateway.stdout.log
curl https://msisdn.stage.mozaws.net
returns
{"name":"mozilla-msisdn-gateway","description":"The Mozilla MSISDN Gateway","version":"0.3.0-DEV","homepage":"https://github.com/mozilla-services/msisdn-gateway/","endpoint":"http://msisdn.stage.mozaws.net"}
curl -I https://msisdn.stage.mozaws.net
returns
HTTP/1.1 200 OK
Content-length: 207
Content-Type: application/json; charset=utf-8
Date: Thu, 03 Jul 2014 00:38:55 GMT
ETag: W/"cf-2959630074"
Timestamp: 1404347935320
Connection: keep-alive
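One detail worth noting in the JSON body above is that the advertised `endpoint` uses `http://` even though the request was made over `https://`. The sketch below parses the body exactly as pasted and compares the two schemes; the variable names are illustrative only, and whether the mismatch matters depends on the server configuration.

```python
import json

# Response body copied from the curl output above.
body = (
    '{"name":"mozilla-msisdn-gateway","description":"The Mozilla MSISDN Gateway",'
    '"version":"0.3.0-DEV","homepage":"https://github.com/mozilla-services/msisdn-gateway/",'
    '"endpoint":"http://msisdn.stage.mozaws.net"}'
)

info = json.loads(body)
requested = "https://msisdn.stage.mozaws.net"

# Compare only the scheme part of the advertised endpoint with the requested URL.
advertised_scheme = info["endpoint"].split("://", 1)[0]
requested_scheme = requested.split("://", 1)[0]
scheme_matches = advertised_scheme == requested_scheme

print(info["version"], info["endpoint"], scheme_matches)
```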
Comment 51•11 years ago
|
||
OK. Some progress here. But I need help debugging.
Assuming the deploy is correct and the configuration file is correct (see https://bugzilla.mozilla.org/show_bug.cgi?id=1020899#c49)
I tried this:
make test SERVER_URL=https://msisdn.stage.mozaws.net
and got the following error:
IndexError: list index out of range Traceback:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 331, in run
testMethod()
File "loadtest.py", line 43, in test_all
message = self.read_message()
File "loadtest.py", line 144, in read_message
return messages[0]["text"]
A bit more detail: https://jbonacci.pastebin.mozilla.org/5510856
I will hold here until we can figure this out...
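The `IndexError` comes from indexing an empty `messages` list, meaning no message had been returned when the test read it. A minimal sketch of a more defensive read follows, assuming a hypothetical `fetch` callable that returns the list of pending messages (the real loadtest.py may be structured differently). The 2-second default and the `sleep`/`clock` parameters are assumptions added for illustration and testability.

```python
import time


def read_message(fetch, timeout=2.0, interval=0.1,
                 sleep=time.sleep, clock=time.time):
    """Poll fetch() until it returns a non-empty list, then return the first text.

    fetch is a hypothetical callable returning a list of {"text": ...} dicts.
    Returns None if nothing arrives before the timeout expires.
    """
    deadline = clock() + timeout
    while True:
        messages = fetch()
        if messages:
            return messages[0]["text"]
        if clock() >= deadline:
            return None
        sleep(interval)
```

This does not address why no message arrives; it only turns the bare `IndexError` into an explicit "nothing received" result that the test can report.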
Comment 52•11 years ago
|
||
I can confirm that I now have access to the stage virtual machine.
Also the production.json configuration file is missing the `protocol: "https"` configuration for Hmac to work.
I can reproduce James's error, so I will investigate today to see why omxen doesn't grab our messages.
Thank you very much guys, we are almost there.
Comment 53•11 years ago
|
||
Ok James, I have fixed this.
It came down to configuration: we were actually configuring Leonix for French numbers with no credentials. I made a patch for that.
Comment 54•11 years ago
|
||
We probably want to add
stdout_stream.time_format = [%Y/%m/%d | %H:%M:%S]
and
stderr_stream.time_format = [%Y/%m/%d | %H:%M:%S]
To our circus.ini files.
Also you can drop these settings, which no longer exist:
stdout_stream.refresh_time = 0.5
stderr_stream.refresh_time = 0.5
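To see what prefix the proposed `time_format` would produce, here is a short sketch. It assumes the value is interpreted as a strftime-style pattern, and the timestamp is an arbitrary example.

```python
from datetime import datetime

TIME_FORMAT = "[%Y/%m/%d | %H:%M:%S]"

# Arbitrary example timestamp, used only to show the resulting prefix.
stamp = datetime(2014, 7, 3, 7, 42, 28).strftime(TIME_FORMAT)
print(stamp)  # [2014/07/03 | 07:42:28]
```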
Flags: needinfo?(bwong)
Comment 55•11 years ago
|
||
Here is my first loadtests attempt:
OMXEN_URL=http://omxen.dev.mozaws.net ./venv/bin/loads-runner --config=./config/bench.ini --server-url=https://msisdn.stage.mozaws.net loadtest.TestMSISDN.test_all
USING http://omxen.dev.mozaws.net OMXEN endpoint
[==============================================================================================] 100%
Duration: 60.05 seconds
Hits: 5275
Started: 2014-07-03 07:42:28.304685
Approximate Average RPS: 87
Average request time: 0.23s
Opened web sockets: 0
Bytes received via web sockets : 0
Success: 786
Errors: 0
Failures: 0
Slowest URL: http://54.203.73.122:80/receive?to=33340639441 Average Request Time: 1.497538
Stats by URLs (10 slowests):
- http://54.203.73.122:80/receive?to=33340639441 Average request time: 1.497538 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33926114841 Average request time: 1.29362 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33862351510 Average request time: 1.249569 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33050170256 Average request time: 1.238933 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33168810187 Average request time: 1.235262 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33674270298 Average request time: 1.224954 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33376007350 Average request time: 1.219644 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33709582369 Average request time: 1.217724 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33531442687 Average request time: 1.214007 Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=33538209298 Average request time: 1.210075 Hits success rate: 1.0
Custom metrics:
- mt-flow : 382
- ask-for-certificate : 511
- try-wrong-code : 282
- try-right-code : 513
- momt-flow : 416
Flags: needinfo?(bwong)
Comment 56•11 years ago
|
||
Investigating why the loadtests don't work on the loads cluster.
Comment 57•11 years ago
|
||
Ok I found it, the date wasn't accurate on loads-master and loads-slave.
It is now working: https://loads.services.mozilla.com/run/856a1599-9a20-4b9c-b0fb-dc23139915ca
Reporter | ||
Comment 58•11 years ago
|
||
Comment 59•11 years ago
|
||
Ok I fixed the server and ran a 30 minutes loadtest with no errors:
https://loads.services.mozilla.com/run/0d1a9d4e-f9ab-421d-9ec0-a6bc177fdeeb
We need to make sure the configuration is right.
I am releasing 0.3.0 so we can set it live for production.
Comment 60•11 years ago
|
||
msisdn-gateway 0.3.0 released — https://github.com/mozilla-services/msisdn-gateway/releases/tag/0.3.0
Assignee | ||
Comment 61•11 years ago
|
||
OK new version of stage deployed
- added "protocol" configuration
- tweaked other config ops to match :natim's changes on the old stage server
- deployed 0.3.0-1 (official release) of app
Comment 62•11 years ago
|
||
Verified the new instance is up
Version:
msisdn-gateway-svcops 0.3.0-1 x86_64 49828972
puppet-config-msisdn 20140703171505-1 x86_64 9919
Everything else looks good, so I am focusing on load testing...
Comment 63•11 years ago
|
||
This is working now:
make test SERVER_URL=https://msisdn.stage.mozaws.net
`\o/`
:natim, where can I get the complete list of custom metrics?
mt-flow
ask-for-certificate
try-wrong-code
try-right-code
momt-flow
That is all I have seen so far (and not all of them all the time)...
Comment 64•11 years ago
|
||
I am glad to see this works as well:
make bench SERVER_URL=https://msisdn.stage.mozaws.net
./venv/bin/loads-runner --config=./config/bench.ini --server-url=https://msisdn.stage.mozaws.net loadtest.TestMSISDN.test_all
[==============================================================================================] 100%
Duration: 300.04 seconds
Hits: 45578
Started: 2014-07-03 19:41:11.607557
Approximate Average RPS: 151
Average request time: 0.12s
Opened web sockets: 0
Bytes received via web sockets : 0
Success: 6839
Errors: 0
Failures: 0
Slowest URL: https://msisdn.stage.mozaws.net/sms/momt/nexmo_callback?msisdn=BLAH Average Request Time: 8.436277
Stats by URLs (10 slowests):
- https://msisdn.stage.mozaws.net/sms/momt/nexmo_callback?msisdn=BLAH&text=%2Fsms%2Fmomt%2Fverify+BLAH
Average request time: 8.436277
Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=BLAH
Average request time: 0.504357
Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=BLAH
Average request time: 0.501866
Hits success rate: 1.0
- https://msisdn.stage.mozaws.net/sms/momt/nexmo_callback?msisdn=BLAH&text=%2Fsms%2Fmomt%2Fverify+BLAH
Average request time: 0.485666
Hits success rate: 1.0
- https://msisdn.stage.mozaws.net/sms/momt/nexmo_callback?msisdn=BLAH&text=%2Fsms%2Fmomt%2Fverify+BLAH
Average request time: 0.439516
Hits success rate: 1.0
- https://msisdn.stage.mozaws.net/sms/momt/nexmo_callback?msisdn=BLAH&text=%2Fsms%2Fmomt%2Fverify+BLAH
Average request time: 0.43656
Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=BLAH
Average request time: 0.413988
Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=BLAH
Average request time: 0.410544
Hits success rate: 1.0
- https://msisdn.stage.mozaws.net/sms/momt/nexmo_callback?msisdn=BLAH&text=%2Fsms%2Fmomt%2Fverify+BLAH
Average request time: 0.404122
Hits success rate: 1.0
- http://54.203.73.122:80/receive?to=BLAH
Average request time: 0.385496
Hits success rate: 1.0
Custom metrics:
- mt-flow : 3369
- ask-for-certificate : 4499
- momt-flow : 3483
- try-right-code : 4499
- try-wrong-code : 2345
Comment 65•11 years ago
|
||
OK, the first 10min run against the Loads Cluster was a success:
Link: https://loads.services.mozilla.com/run/ab42cba8-ef46-4f4e-b510-dde144e15f9a
Tests over 31470
Successes 31470
Failures 0
Errors 0
TCP Hits 209556
Opened web sockets 0
Total web sockets 0
Bytes/websockets 0
Requests / second (RPS) 341
Custom metrics
mt-flow 15434
ask-for-certificate 20472
momt-flow 16115
try-right-code 20485
try-wrong-code 11023
Comparing the metrics breakdown against what is defined in the loadtest:
https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py#L19-L22
PERCENTAGE_OF_MT_FLOW = 50 # Remining are MOMT flows
PERCENTAGE_OF_WRONG_CODES = 34 # Remining are valid ones.
PERCENTAGE_OF_SHORT_CODES = 50 # Remining are right ones.
MAX_OMXEN_TIMEOUT = 2 # Seconds to poll from omxen.
Not really easy to do a side-by-side, so asking Dev to look at those metrics and see if that is about what is expected for a 10min test. Looks about right to me.
Moving on to a 30min, then a 60min, followed by a log review.
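A rough side-by-side can be computed from the counters in this comment. The sketch below assumes each counter is incremented once per test that takes the corresponding branch, which is a guess about loadtest.py rather than something confirmed here; `share` is a hypothetical helper.

```python
# Counters copied from the 10min run in this comment.
metrics = {
    "mt-flow": 15434,
    "momt-flow": 16115,
    "try-right-code": 20485,
    "try-wrong-code": 11023,
}


def share(part, other):
    """Fraction that `part` represents of `part + other`."""
    return part / float(part + other)


mt_share = share(metrics["mt-flow"], metrics["momt-flow"])
wrong_share = share(metrics["try-wrong-code"], metrics["try-right-code"])

print("MT flows:    %.1f%% (configured 50%%)" % (100 * mt_share))
print("wrong codes: %.1f%% (configured 34%%)" % (100 * wrong_share))
```

Under that assumption the run comes out at roughly 48.9% MT flows and 35.0% wrong codes, close to the configured 50 and 34.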
Comment 66•11 years ago
|
||
And the 30min:
https://loads.services.mozilla.com/run/d9a11da3-66ff-4048-a5e1-0ea09a683eac
Results
Tests over 91318
Successes 91318
Failures 0
Errors 0
TCP Hits 607780
Opened web sockets 0
Total web sockets 0
Bytes/websockets 0
Requests / second (RPS) 335
Custom metrics
mt-flow 45458
ask-for-certificate 59581
momt-flow 45945
try-right-code 59593
try-wrong-code 31774
Comment 67•11 years ago
|
||
Not sure of the cause, but early into the 60min load test, I am seeing all kinds of errors/failures:
Tests over 83340
Successes 54171
Failures 29165
Errors 4
TCP Hits 390165
Opened web sockets 0
Total web sockets 0
Bytes/websockets 0
Requests / second (RPS) 375
Custom metrics
omxen-message-collision 7
And also:
4 occurrences:
No JSON object could be decoded
File "/usr/lib/python2.7/unittest/case.py", line 332, in run
testMethod()
File "loadtest.py", line 30, in test_all
self.register()
File "loadtest.py", line 85, in register
sessionToken = resp.json()['msisdnSessionToken']
File "/home/ubuntu/loads/local/lib/python2.7/site-packages/requests-2.2.1-py2.7.egg/requests/models.py", line 741, in json
return json.loads(self.text, **kwargs)
File "/usr/lib/python2.7/json/__init__.py", line 328, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 383, in raw_decode
raise ValueError("No JSON object could be decoded")
Comment 68•11 years ago
|
||
And this, of course:
addFailure 29165
Killing the test to investigate...
Comment 69•11 years ago
|
||
Trolling the Logs:
Nothing useful in /media/ephemeral0/msisdn-gateway
msisdn-gateway_err.log is 0
msisdn-gateway_out.log has all the mocked call data
/media/ephemeral0/nginx/logs
default.access.log and default.error.log are 0
msisdn-gateway.error.log is 0
msisdn-gateway.access.log has the usual 200s and heartbeat messages
plus a high number of 400s for the following requests: /sms/verify_code
Example:
1404416475.120 "24.7.94.153" "POST /sms/verify_code HTTP/1.1" 400 46 "-" "python-requests/2.3.0 CPython/2.7.5 Darwin/13.3.0" 0.006 0.006 "-"
about 73704 of these, if I am counting correctly
4652 of these are from this IP: 24.7.94.153
36599 of these are from this IP: 54.218.210.15
32453 of these are from this IP: 54.245.44.231
and a few 401s
/var/log/circus.log is normal
/var/log/hekad/msisdn_gateway.stdout.log is 0
/var/log/hekad/msisdn_gateway.stderr.log has the usual heka messaging
That's all I can find. Not sure why 30min load tests were more or less clean and 24min into a 60, I see all the errors and addFailures.
Dev to debug...
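The per-IP counts of 400s can be reproduced with a short script. This sketch assumes every access-log line follows the layout of the example line above (timestamp, quoted client IP, quoted request, status, and so on); `count_status_by_ip` is a hypothetical helper, and a line with unbalanced quotes would make `shlex.split` raise `ValueError`.

```python
import shlex
from collections import Counter


def count_status_by_ip(lines, status="400"):
    """Count access-log lines with the given status, grouped by client IP."""
    counts = Counter()
    for line in lines:
        fields = shlex.split(line)
        # Layout assumed from the sample line: timestamp, IP, request, status, ...
        if len(fields) >= 4 and fields[3] == status:
            counts[fields[1]] += 1
    return counts
```

Passing an open file handle for msisdn-gateway.access.log as `lines` yields a mapping from IP to the number of matching requests.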
Reporter | ||
Comment 70•11 years ago
|
||
Looks like we are missing logs in the msisdn app when we get "collisions" at least. What do you think Remy?
Flags: needinfo?(rhubscher)
Comment 71•11 years ago
|
||
Hello,
About the metrics, you've got all the metrics here: https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py
mt-flow
momt-flow
try-wrong-code
try-right-code
ask-for-certificate
omxen-message-collision
The addFailure is added automatically by loads.
Also, we don't have all the debug information because the loads dashboard doesn't explain failures.
The random configuration helps to exercise all kinds of scenarios when running test_all.
It is normal to have 400 in case of wrong code scenario.
Collisions are not handled by the msisdn app but by the loadtest
(In case we tried a right-code but get a 400 invalid code)
To me this looks like you've got the wrong omxen configured in your loadtest file.
Flags: needinfo?(rhubscher)
Comment 72•11 years ago
|
||
jbonacci, could you try with the latest version of loadtests.py?
Comment 73•11 years ago
|
||
Reporter | ||
Comment 74•11 years ago
|
||
note to everyone: loads.services.mozilla.com is password protected - let me know if you want access
Comment 75•11 years ago
|
||
:natim
Hmmmmm, I followed your link in Comment 73. It's interesting that you got the load test to run and I did not. Hopefully, this means the latest fix is in and working as you expected.
So, yea, I will try the latest loadtests.py.
Looking at the repo, I am assuming you mean the latest based on this commit:
https://github.com/mozilla-services/msisdn-gateway/commit/d1d2f4fa3c17273db287d8c76246fe7231550f31
I will give it a try on Monday!
Comment 76•11 years ago
|
||
Yes that's it :)
Comment 77•11 years ago
|
||
It seems that the circus configuration has not been updated.
- Removing refresh_time and adding time_format
- stdout_stream.refresh_time = 0.5
+ stdout_stream.time_format = [%Y/%m/%d | %H:%M:%S]
- stderr_stream.refresh_time = 0.5
+ stderr_stream.time_format = [%Y/%m/%d | %H:%M:%S]
Comment 78•11 years ago
|
||
:natim - is the above (Comment 77) in a new commit?
Or, in other words, is the load test ready to be used?
Comment 79•11 years ago
|
||
Comment 77 is not directly related to the loadtest but related to the circus configuration.
Also it may help if we need to investigate loadtest errors.
Assignee | ||
Comment 80•11 years ago
|
||
(In reply to Rémy Hubscher (:natim) from comment #77)
> It seems that the circus configuration has not been updated.
>
> - Removing refresh_time and adding time_format
>
> - stdout_stream.refresh_time = 0.5
> + stdout_stream.time_format = [%Y/%m/%d | %H:%M:%S]
> - stderr_stream.refresh_time = 0.5
> + stderr_stream.time_format = [%Y/%m/%d | %H:%M:%S]
Won't do (for now). We have a standard circus module that we deploy with. If these are not affecting operation we'll change them later as multiple projects depend on the core configuration. For example we'd need to change the heka configurations as well to match.
Comment 81•11 years ago
|
||
That's fine.
:natim - I think this bug is getting too long and deviating from its original intent. ;-)
I will run load tests today and open new bugs as needed related directly to
1. the load test
2. the msisdn-gateway code
:mostlygeek - I think we hammered enough on Stage to show that the deploy was good.
Let's move on to Production.
Status: RESOLVED → VERIFIED
Comment 82•11 years ago
|
||
BLEH! bug 1035459
I should have checked before marking this one Verified.