Closed
Bug 1035960
Opened 11 years ago
Closed 11 years ago
Deploy Release 0.3.1 to msisdn-gateway Stage
Categories
(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: jbonacci, Assigned: mostlygeek)
References
Details
(Whiteboard: [qa+])
| Reporter | ||
Updated•11 years ago
Whiteboard: [qa+]
| Assignee | ||
Comment 1•11 years ago
Stage stack: https://msisdn.stage.mozaws.net updated to 0.3.1
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
| Reporter | ||
Comment 2•11 years ago
OK, verified the deployment to a single m3.medium instance: ec2-54-198-16-11
Versions
msisdn-gateway-svcops 0.3.1-1 x86_64 49860025
puppet-config-msisdn 20140703171505-1 x86_64 9919
Process
app node msisdn-gateway/index.js
/data/msisdn-gateway/config/production.json
is set up to use the mock server http://omxen.dev.mozaws.net/
https://msisdn.stage.mozaws.net
returns:
{"name":"mozilla-msisdn-gateway","description":"The Mozilla MSISDN Gateway","version":"0.3.1",
"homepage":"https://github.com/mozilla-services/msisdn-gateway/","endpoint":"https://msisdn.stage.mozaws.net"}
https://msisdn.stage.mozaws.net/__heartbeat__
returns:
{"storage":true}
Will set up some load testing soon...
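The two manual checks above (version on the root URL, storage status on `__heartbeat__`) could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and treating "every heartbeat value is true" as healthy is an assumption, not something the service documents in this bug.

```python
import json
import urllib.request

STAGE_URL = "https://msisdn.stage.mozaws.net"  # Stage endpoint from this comment
EXPECTED_VERSION = "0.3.1"                     # release being verified


def check_root(body: str, expected_version: str = EXPECTED_VERSION) -> bool:
    """Return True if the root JSON document reports the expected version."""
    info = json.loads(body)
    return info.get("version") == expected_version


def check_heartbeat(body: str) -> bool:
    """Return True if the heartbeat document is non-empty and all values are true.

    Assumption: a backend is healthy only when its value is the JSON literal true.
    """
    status = json.loads(body)
    return bool(status) and all(value is True for value in status.values())


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the decoded body (performs a real network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def main() -> None:
    """Run both checks against Stage; not called automatically."""
    print("version ok:", check_root(fetch(STAGE_URL + "/")))
    print("heartbeat ok:", check_heartbeat(fetch(STAGE_URL + "/__heartbeat__")))
```

Calling `main()` by hand runs both checks against Stage; the two `check_*` functions can also be fed saved responses.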
| Reporter | ||
Comment 3•11 years ago
'make test' and 'make bench' from Mac to Stage were successful.
Will attempt a short 'make megabench' load test a bit later this evening.
Opened some bugs against Ubuntu and/or RHEL for load test issues...
| Reporter | ||
Comment 4•11 years ago
Two short load tests look good.
I will continue this tomorrow (Wednesday).
| Reporter | ||
Comment 5•11 years ago
OK, well, the second short load test ran to completion, but there were errors:
https://loads.services.mozilla.com/run/f969bcbb-035e-4c5b-ace8-504c5fd32b7f
I will investigate tomorrow...
| Reporter | ||
Comment 6•11 years ago
First 5 runs (all short) - in chronological order
RUN 1: https://loads.services.mozilla.com/run/36437a16-cf44-4d65-8b4c-0dd84bb994cd
NO APPARENT ERRORS
Results
Tests over 10000
Successes 10000
Failures 0
Errors 0
RUN 2: https://loads.services.mozilla.com/run/f969bcbb-035e-4c5b-ace8-504c5fd32b7f
385 POSSIBLE ERRORS
Results
Tests over 100000
Successes 99615
Failures 0
Errors 0
addFailure 385
RUN 3: https://loads.services.mozilla.com/run/dab52058-a7c2-4e38-99d8-d730557ed290
2616 POSSIBLE ERRORS
Results
Tests over 10000
Successes 7384
Failures 0
Errors 0
addFailure 2616
RUN 4: https://loads.services.mozilla.com/run/394f0abd-2096-4b37-9774-018663cb4a5e
NO APPARENT ERRORS
Results
Tests over 49850
Successes 49850
Failures 0
Errors 0
RUN 5: https://loads.services.mozilla.com/run/6038bad3-824e-4810-9b66-a0e39de2adad
2899 POSSIBLE ERRORS
Results
Tests over 10000
Successes 7101
Failures 0
Errors 0
addFailure 2899
I will investigate results in the logs.
After that, it looks like I will need to run a longer load test while monitoring the logs to see what I catch...
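To compare the five runs on the same scale, the addFailure counts can be turned into percentages. A small sketch using only the numbers listed above (the tuple layout and function names are illustrative):

```python
# (run number, tests, successes, addFailure count) copied from the five runs above.
RUNS = [
    (1, 10000, 10000, 0),
    (2, 100000, 99615, 385),
    (3, 10000, 7384, 2616),
    (4, 49850, 49850, 0),
    (5, 10000, 7101, 2899),
]


def failure_rate(tests: int, add_failures: int) -> float:
    """Return the addFailure count as a percentage of all tests in the run."""
    return 100.0 * add_failures / tests


def summarize(runs):
    """Build one summary line per run, flagging runs whose counts do not add up."""
    lines = []
    for run, tests, successes, add_failures in runs:
        consistent = successes + add_failures == tests
        note = "" if consistent else " (counts do not add up)"
        rate = failure_rate(tests, add_failures)
        lines.append(f"RUN {run}: {add_failures}/{tests} = {rate:.3f}% addFailure{note}")
    return lines


for line in summarize(RUNS):
    print(line)
```

By this measure runs 3 and 5 failed on roughly 26% and 29% of tests, while run 2 failed on under 0.4%, so the short 10000-test runs are where most of the failures show up.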
| Reporter | ||
Comment 7•11 years ago
OK, well yuck. This will need to be looked at:
/media/ephemeral0/msisdn-gateway/msisdn-gateway_err.log
has a traceback in it:
events.js:72
throw er; // Unhandled 'error' event
^
TypeError: Cannot call method 'toString' of undefined
at SecretBox.self.decrypt (/data/msisdn-gateway/node_modules/sodium/lib/secretbox.js:149:30)
at Object.decrypt (/data/msisdn-gateway/msisdn-gateway/encrypt.js:40:14)
at /data/msisdn-gateway/msisdn-gateway/index.js:573:34
at /data/msisdn-gateway/msisdn-gateway/storage/redis.js:211:13
at try_callback (/data/msisdn-gateway/node_modules/redis/index.js:573:9)
at RedisClient.return_reply (/data/msisdn-gateway/node_modules/redis/index.js:661:13)
at HiredisReplyParser.<anonymous> (/data/msisdn-gateway/node_modules/redis/index.js:309:14)
at HiredisReplyParser.emit (events.js:95:17)
at HiredisReplyParser.execute (/data/msisdn-gateway/node_modules/redis/lib/parser/hiredis.js:43:18)
at RedisClient.on_data (/data/msisdn-gateway/node_modules/redis/index.js:534:27)
at Socket.<anonymous> (/data/msisdn-gateway/node_modules/redis/index.js:91:14)
at Socket.emit (events.js:95:17)
at Socket.<anonymous> (_stream_readable.js:748:14)
at Socket.emit (events.js:92:17)
at emitReadable_ (_stream_readable.js:410:10)
at emitReadable (_stream_readable.js:406:5)
at readableAddChunk (_stream_readable.js:168:9)
at Socket.Readable.push (_stream_readable.js:130:10)
at TCP.onread (net.js:528:21)
And, well, this is interesting. As of the moment I checked, the nginx logs had not been updated for 20 hours. So, it looks like some logging is not working right in Stage?
Are there any other/new logs that I don't know about?
So, that's it for first round of Stage testing - I can't pass this w/o some understanding of what's going on with the traceback and the logging (or lack thereof).
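For triage, it can help to separate application frames from dependency and Node core frames in a traceback like the one above. The following is a rough sketch, not an existing tool: the regular expression, the `Frame` type, and the three category names are all assumptions made for illustration. It only uses the first five frames quoted in this comment.

```python
import re
from typing import List, NamedTuple, Optional

# First five frames of the traceback quoted in this comment.
TRACE = """\
    at SecretBox.self.decrypt (/data/msisdn-gateway/node_modules/sodium/lib/secretbox.js:149:30)
    at Object.decrypt (/data/msisdn-gateway/msisdn-gateway/encrypt.js:40:14)
    at /data/msisdn-gateway/msisdn-gateway/index.js:573:34
    at /data/msisdn-gateway/msisdn-gateway/storage/redis.js:211:13
    at try_callback (/data/msisdn-gateway/node_modules/redis/index.js:573:9)
"""

# Matches both "at func (path:line:col)" and "at path:line:col".
FRAME_RE = re.compile(
    r"^\s*at (?:(?P<func>.+?) \()?(?P<path>[^()]+?):(?P<line>\d+):(?P<col>\d+)\)?$"
)


class Frame(NamedTuple):
    func: Optional[str]  # None for anonymous frames
    path: str
    line: int
    kind: str            # "app", "dependency", or "node-core"


def classify(path: str) -> str:
    """Classify a frame by its file path (heuristic, assumed for this sketch)."""
    if "/node_modules/" in path:
        return "dependency"
    if path.startswith("/"):
        return "app"
    return "node-core"


def parse_frames(trace: str) -> List[Frame]:
    """Parse 'at ...' lines into Frame records, skipping lines that do not match."""
    frames = []
    for raw in trace.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            path = match.group("path")
            frames.append(
                Frame(match.group("func"), path, int(match.group("line")), classify(path))
            )
    return frames


app_frames = [frame for frame in parse_frames(TRACE) if frame.kind == "app"]
for frame in app_frames:
    print(frame.func or "<anonymous>", frame.path, frame.line)
```

On these frames, the application-level entries are `Object.decrypt` in `encrypt.js`, followed by anonymous callbacks in `index.js` and `storage/redis.js`, which is where the devs would probably want to start looking.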
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 8•11 years ago
re: logging. Yes, there was a log_rotate bug. See: https://github.com/mozilla-services/puppet-config/pull/664
I'll need to redeploy the box to fix it. Nothing else has changed.
re: stack dump, that looks like something for the devs.
| Assignee | ||
Comment 9•11 years ago
OK, msisdn-stage has been redeployed, which fixes the logging bug.
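One way to catch logs that silently stop updating (as the nginx logs did for about 20 hours) is to compare each file's modification time against a threshold. A minimal sketch; the one-hour default is an arbitrary assumption, and the function names are hypothetical:

```python
import os
import time
from typing import Optional


def log_age_hours(path: str, now: Optional[float] = None) -> float:
    """Return how many hours ago the file at `path` was last modified."""
    current = time.time() if now is None else now
    return (current - os.path.getmtime(path)) / 3600.0


def is_stale(path: str, max_age_hours: float = 1.0, now: Optional[float] = None) -> bool:
    """Return True if the file has not been written to within `max_age_hours`.

    Assumption: a log on a box under load should be updated at least this often.
    """
    return log_age_hours(path, now) > max_age_hours
```

For example, `is_stale("/media/ephemeral0/msisdn-gateway/msisdn-gateway_err.log")` would check the error log mentioned earlier in this bug; an error log can legitimately sit idle, so the threshold needs tuning per file.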
| Reporter | ||
Comment 10•11 years ago
OK, thanks for the logging fix.
So, I am just blocked on the traceback info shown in
https://bugzilla.mozilla.org/show_bug.cgi?id=1035960#c7
Comment 11•11 years ago
We need the failure details in the loads dashboard, and we need to put the timestamps back in the circus logs.
| Reporter | ||
Comment 12•11 years ago
:natim, so you want more information than what is in https://bugzilla.mozilla.org/show_bug.cgi?id=1035960#c6 ?
Comment 13•11 years ago
Yes, but loads doesn't provide it for you. It is tracked here: https://github.com/mozilla-services/loads/issues/268
Comment 14•11 years ago
I just merged https://github.com/mozilla-services/msisdn-gateway/commit/ef4a34291392a0719a4fc012c7d0b1f4b06a5961
We should probably try running the load tests with DynamoDB configured.
Also :tarek fixed the failure display inside the loads cluster.
Comment 15•11 years ago
OK, so I was able to reproduce the issue and see the problem.
https://loads.services.mozilla.com/run/b3cbd9f1-b8f2-415b-981b-54fcac0a7f7b
We've got a 302 on this call: https://github.com/mozilla-services/msisdn-gateway/blob/master/loadtests/loadtest.py#L113-L120
| Assignee | ||
Comment 16•11 years ago
(In reply to Rémy Hubscher (:natim) from comment #14)
> I just merged
> https://github.com/mozilla-services/msisdn-gateway/commit/ef4a34291392a0719a4fc012c7d0b1f4b06a5961
>
> We should probably test to run loadtests with DynamoDB configured.
>
> Also :tarek fixed the failure display inside the loads cluster.
0.4.0? I don't think we should mix up the deploy tickets. 0.3.0 is in prod and it is broken, so we should at least get this one out before blocking ourselves with testing a DynamoDB backend.
| Assignee | ||
Updated•11 years ago
Assignee: nobody → bwong
| Reporter | ||
Comment 17•11 years ago
Yea, I prefer one change at a time since we have not released to Prod in a while.
But, I also prefer to wait for a possible fix to the issue I found in Comment 7, reproduced in Comment 15.
Comment 18•11 years ago
Yeah, I agree. That said, the dynamo backend won't really get in the way if the config points to the redis one.
| Reporter | ||
Comment 19•11 years ago
OK, so for the traceback and 302s, I opened this new bug: bug 1037604
Consider it a blocker for any further deployments and testing in Stage and Production.
Separately, let's hold off on the changes for dynamo support unless this is a high-priority change.
Comment 20•11 years ago
I think you mean 502 rather than 302, right?
| Reporter | ||
Comment 21•11 years ago
That reference was pulled from here: https://bugzilla.mozilla.org/show_bug.cgi?id=1035960#c15
But, yea, based on your test results - https://loads.services.mozilla.com/run/b3cbd9f1-b8f2-415b-981b-54fcac0a7f7b - it should be 502, not 302.
Two issues to address:
The traceback I found in the logs
The 502s you found in the loads dashboard
| Reporter | ||
Comment 22•11 years ago
Not going to morph this bug.
Resolving as incomplete so we can move on to 0.4.1.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → INCOMPLETE
| Reporter | ||
Comment 23•11 years ago
Switching over to Verified since we are quickly deploying this to Prod and moving right to 0.4.1.
Status: RESOLVED → VERIFIED
| Reporter | ||
Updated•11 years ago
Resolution: INCOMPLETE → FIXED