Bug 1094587
Opened 11 years ago
Closed 11 years ago
Please deploy server-syncstorage 1.5.11 to stage
Categories
(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: rfkelly, Assigned: bobm)
References
Details
(Whiteboard: [qa+])
This version of syncstorage tweaks some logging and adds a potential workaround for an apparent TokuDB bug:
Bug 1063284 - Log at higher level when killing interrupted sql commands
Bug 1057892 Comment 32 - workaround for TokuDB errors on PUT /meta/global
This will hopefully reduce the level of 503s due to Bug 1057892 to close to zero.
Please deploy to stage and do a loadtest run in preparation for production deployment. If possible let's prioritize this over the tokenserver deploy in Bug 1091313 so that we can get the workaround shipped.
Reporter
Comment 1•11 years ago
Also worth noting, we should go back to the prod version of MariaDB for this test, not the updated version from Bug 1089945. Otherwise, if the errors do stop, we won't know what actually fixed them.
Assignee
Updated•11 years ago
Assignee: nobody → bobm
Status: NEW → ASSIGNED
Comment 2•11 years ago
Do we have an ETA when this deployment will be done? Would be nice to know. Thanks.
Reporter
Comment 3•11 years ago
ni? :bobm, but I think he's out for a conference this week.
Flags: needinfo?(bobm)
Comment 4•11 years ago
It's 8 days later and there is still no update. Is Bob really the only person who can deploy changes? We would really appreciate it if this could go live soon. Thanks.
Reporter
Comment 6•11 years ago
Sorry for the delay here Henrik, we're running light on QA resources at the moment so deploys are taking longer than usual to go through. Bob and I will see about doing a bit of our own smoketesting of this deploy in stage.
Are you able to redirect the failing TPS tests to our stage server, to see if this change helps with the issues you are seeing? The stage sync endpoint is https://token.stage.mozaws.net/1.0/sync/1.5, and the stage FxA endpoints are listed at https://developer.mozilla.org/en-US/Firefox_Accounts#Stage
Flags: needinfo?(hskupin)
Comment 7•11 years ago
So the only thing I would have to do is to set the PUBLIC_URL environment variable to
https://token.stage.mozaws.net/1.0/sync/1.5 as instructed in the following comment?
https://github.com/mozilla/fxa-python-client/issues/25#issuecomment-60857774
Flags: needinfo?(hskupin)
Assignee
Updated•11 years ago
QA Contact: kthiessen
Reporter
Comment 8•11 years ago
> So the only thing I would have to do is to set the PUBLIC_URL environment variable to
> https://token.stage.mozaws.net/1.0/sync/1.5 as instructed in the following comment?
No, this will change the FxA instance used by the tests, but not the sync instance. (And now that I think about it, using production FxA with stage sync should work just fine).
How do your tests discover which sync server to use? Do they just use the default value built into Firefox?
Reporter
Comment 9•11 years ago
(I mean specifically the failing tests in Bug 1066493)
Reporter
Comment 10•11 years ago
Assuming it's actually done through Firefox, this would involve setting the about:config preference "services.sync.tokenServerURI" to https://token.stage.mozaws.net/1.0/sync/1.5
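If the tests run against a scripted profile rather than a hand-edited one, the same preference could probably be set through the profile's user.js file. A minimal sketch, assuming the harness lets you supply a profile directory; the helper names and the temporary directory are hypothetical and not part of the TPS tooling:

```python
import json
import tempfile
from pathlib import Path

# Stage tokenserver endpoint quoted in this bug.
STAGE_TOKEN_SERVER = "https://token.stage.mozaws.net/1.0/sync/1.5"


def render_user_pref(name, value):
    """Return one user.js line that sets a string preference."""
    # json.dumps yields a double-quoted, escaped string literal.
    return "user_pref({}, {});".format(json.dumps(name), json.dumps(value))


def write_user_js(profile_dir, prefs):
    """Write prefs (dict of name -> string value) to <profile_dir>/user.js."""
    path = Path(profile_dir) / "user.js"
    lines = [render_user_pref(name, value) for name, value in sorted(prefs.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical profile directory; a real run would use the test profile.
    with tempfile.TemporaryDirectory() as profile_dir:
        path = write_user_js(
            profile_dir, {"services.sync.tokenServerURI": STAGE_TOKEN_SERVER}
        )
        print(path.read_text(encoding="utf-8"), end="")
```

Whether the failing tests in Bug 1066493 actually read this preference is still the open question above, so treat this only as a starting point.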
Reporter
Comment 11•11 years ago
After a few hiccups getting loads to work, I have finally kicked off a basic tokenserver+sync loadtest against this stack:
https://loads.services.mozilla.com/run/cfe5936f-187a-41bc-bade-f728cd8b0d01
Reporter
Comment 12•11 years ago
Aaand it's giving a bunch of errors, so I killed it. Will dig in...
Reporter
Comment 13•11 years ago
I'm getting DNS errors while running the loadtest, and it appears to be trying to connect to the following endpoint:
https:sync-1-us-east-1.stage.mozaws.net/1.5/17194
Which is malformed; it should be "https://" rather than just "https:". Bob, can you please check the node definitions in the tokenserver database?
Assignee
Comment 14•11 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #13)
> I'm getting DNS errors while running the loadtest, and it appears to be
> trying to connect to the following endpoint:
>
> https:sync-1-us-east-1.stage.mozaws.net/1.5/17194
>
> Which is malformed; it should be "https://" rather than just "https:". Bob,
> can you please check the node definitions in the tokenserver database?
It was just sync 1 for some reason. It has been fixed.
+-------------------------------------------+
| node                                      |
+-------------------------------------------+
| https://sync-0-us-east-1.stage.mozaws.net |
| https://sync-1-us-east-1.stage.mozaws.net |
| https://sync-2-us-east-1.stage.mozaws.net |
| https://sync-3-us-east-1.stage.mozaws.net |
| https://sync-4-us-east-1.stage.mozaws.net |
+-------------------------------------------+
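A quick sanity check over the node list might catch this kind of entry before a loadtest does. A rough sketch using only the standard library; the function names are hypothetical, and it assumes the node URLs have already been fetched from the tokenserver database by some other means:

```python
from urllib.parse import urlsplit


def is_well_formed_node(url):
    """True if url has an http(s) scheme and a non-empty host part."""
    parts = urlsplit(url)
    # "https:host/path" parses with scheme "https" but an empty netloc,
    # which is the malformed shape reported in comment 13.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def malformed_nodes(urls):
    """Return the subset of urls that fail the well-formedness check."""
    return [u for u in urls if not is_well_formed_node(u)]


if __name__ == "__main__":
    nodes = ["https://sync-%d-us-east-1.stage.mozaws.net" % i for i in range(5)]
    # Add the broken form back in for illustration.
    nodes.append("https:sync-1-us-east-1.stage.mozaws.net")
    for bad in malformed_nodes(nodes):
        print("malformed node:", bad)
```

This only checks the URL shape; it does not confirm that the host resolves or that the node is healthy.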
Reporter
Comment 15•11 years ago
Thanks Bob, local tests look healthier, I've kicked off a fresh loads run here:
https://loads.services.mozilla.com/run/dc12e2d5-43fb-444b-95e8-d39b31038e14
Reporter
Comment 16•11 years ago
Loadtest shows a number of 503s from the tokenserver, but no errors from the sync nodes. I'll try an isolated loadtest run against sync-1 for comparison; see https://loads.services.mozilla.com/run/439750b5-4d14-4ded-8907-26b4ea6678e6
Reporter
Comment 17•11 years ago
The single-node test looks good, with zero errors and solidly above 350 RPS. Errors in the combined test are most likely due to something on the tokenserver side, which we can dive back into in QA for Bug 1091313.
Reporter
Comment 18•11 years ago
Marking this fixed; after discussion with Bob and Karl in IRC today, we're happy to roll this to production.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 19•11 years ago
Hi Ryan. Sorry that I haven't had time over the last few days to check this change with our CI. It's a bit complicated to get the staging sync server to be used. I wonder if I should still do a check, or if I should have a look at our tests once the change on this bug has been deployed to production?
Flags: needinfo?(rfkelly)
Comment 20•11 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #18)
> Marking this fixed, after discussion with Bob and Karl in IRC today we're
> happy to roll this to production.
In most other projects I'm involved in here, this means that I should now file the production deploy ticket. Is that true here as well, or is there a different process to follow?
Reporter
Comment 21•11 years ago
No worries Henrik, at this stage it sounds like it'll be easier to just wait for the production deploy. Hopefully it will actually have the desired effect.
Karl, yes, please mark this as RESOLVED/VERIFIED and file a follow-up production bug.
Flags: needinfo?(rfkelly)