Closed
Bug 1515231
Opened 6 years ago
Closed 6 years ago
Slow response from cloud-mirror.taskcluster.net
Categories
(Taskcluster :: Operations and Service Requests, task)
Taskcluster
Operations and Service Requests
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: nthomas, Unassigned)
References
Details
Transcript from #taskcluster:
14:26 <nthomas> https://cloud-mirror.taskcluster.net/v1/redirect/s3/us-east-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FAhRzGGkrQ1yNq1YaomBESg%2F0%2Fpublic%2Fparameters.yml seems to take ~30s to return a 302
14:27 <nthomas> this is from a releng AWS instance making an initial request for https://queue.taskcluster.net/v1/task/AhRzGGkrQ1yNq1YaomBESg/artifacts/public/parameters.yml
14:42 <bstack> ooh, I haven't seen cloud-mirror in a while
14:42 <bstack> let me try looking into it
14:43 <nthomas> wondered if it's a dyno that's run out of memory, or similar
14:44 <bstack> "redis connection failed" in logs
14:44 <bstack> I'll restart. give me a sec
14:44 <bstack> this happens with terraform for this service so it's a bit more than a quick click
14:44 <nthomas> huh interesting, we've had that in release-services too
14:47 <bstack> I think heroku has been doing redis maintenance recently
14:47 <bstack> some of our services are smart about it and others aren't
14:51 <bstack> restarting now
14:57 <bstack> nthomas: does it look ok now?
14:57 <nthomas> bstack: seems about the same, unfortunately
14:58 <bstack> shoot
14:58 <bstack> I appear to have lost access to the heroku cloud mirror bits
14:58 <bstack> if those still exist
14:58 <nthomas> it's non-blocking, just seemed out of normal
14:58 <bstack> it definitely does feel unusual
14:58 <bstack> mind making a bug and cc jh.ford?
14:59 <nthomas> how much does cloud-mirror get used these days ?
14:59 <bstack> if it is non blocking he can look in his morning
14:59 <nthomas> sure
14:59 <bstack> I'm honestly not sure
14:59 <bstack> this is a part of tc that I don't have much familiarity with
jhford, any idea what the problem is?
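The "redis connection failed" crash described in the transcript is the classic failure mode when a managed Redis undergoes maintenance and the client never reconnects; the services bstack calls "smart about it" retry with backoff instead of dying. A minimal, library-agnostic sketch of that behavior (all names here are hypothetical, not actual Taskcluster code):

```python
import time

def with_reconnect(op, connect, retries=5, base_delay=0.5, sleep=time.sleep):
    """Run op(conn); on ConnectionError, rebuild the connection with
    exponential backoff instead of letting the whole service crash."""
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return op(conn)
        except ConnectionError:
            if attempt == retries:
                raise  # give up only after the final retry
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            conn = connect()  # fresh connection after the maintenance blip
```

For example, `with_reconnect(lambda r: r.get("key"), make_redis_client)` would survive a brief Redis restart, while a service that holds one connection for its whole lifetime would hit the crash loop seen in the logs above.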
Flags: needinfo?(jhford)
Comment 1•6 years ago
This is almost certainly copier nodes being down, but I'm not sure how to diagnose that.
I cannot access Papertrail because it seems to be redirecting me to some other site and I don't know how else to see logs for the copier nodes. I tried running terraform, but it is generating a bunch of 403 and other permissions problems.
Brian, any hints on how I could diagnose this?
One question about the static service deploys that use terraform: does it verify that the hosts are redeployed when credentials change, or is it just checking that I have N copies of the copier running, with that being the mechanism for creating a new one?
Do we have anything running in the background to ensure that crashed services are restarted?
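On the redeploy-on-credential-change question: in Terraform's model, `count` only guarantees that N copies of a resource exist; redeployment happens when a change to an attribute forces the instance to be destroyed and recreated on the next `terraform apply`. A hypothetical sketch of how that typically looks for credentials baked into `user_data` (all names invented, not the actual Taskcluster config):

```hcl
# Hypothetical copier definition. `count` only ensures N instances exist;
# it does not, by itself, redeploy them when their configuration changes.
resource "aws_instance" "copier" {
  count         = var.copier_count
  ami           = var.copier_ami
  instance_type = "m5.large"

  # Credentials rendered into user_data: any credential change changes
  # this attribute. On AWS provider v4+, user_data_replace_on_change
  # makes that change force a destroy-and-recreate; behavior on older
  # provider versions differs, so check the provider's changelog.
  user_data = templatefile("${path.module}/copier-init.sh.tpl", {
    redis_url = var.redis_url
  })
  user_data_replace_on_change = true
}
```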
Flags: needinfo?(jhford)
Comment 2•6 years ago
SolarWinds announced a couple of months back that they were switching up the login pages. I had to reset my password to be able to log in; otherwise it's the same account and everything.
What 403 are you getting with tf?
It _should_ redeploy if any creds etc. change. Last night it did deploy two new instances; looking at the logs today, they both appear to have crashed later. We've started them again today.
Not really sure what is going on with them.
Comment 3•6 years ago
Not sure if it's related, but cloud-mirror response times seem to be back to normal. Not really sure what the root cause was here.
I did terminate the instances that were running yesterday, since they were definitely not working; maybe that was enough to convince them to be redeployed?
I've emailed you logs of what happens.
Updated•6 years ago
Component: Operations → Operations and Service Requests
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME