Closed Bug 1515231 Opened 6 years ago Closed 6 years ago

Slow response from cloud-mirror.taskcluster.net

Categories

(Taskcluster :: Operations and Service Requests, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: nthomas, Unassigned)

Transcript from #taskcluster:

14:26 <nthomas> https://cloud-mirror.taskcluster.net/v1/redirect/s3/us-east-1/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Ftaskcluster-public-artifacts%2FAhRzGGkrQ1yNq1YaomBESg%2F0%2Fpublic%2Fparameters.yml seems to take ~30s to return a 302
14:27 <nthomas> this is from a releng AWS instance making an initial request for https://queue.taskcluster.net/v1/task/AhRzGGkrQ1yNq1YaomBESg/artifacts/public/parameters.yml
14:42 <bstack> ooh, I haven't seen cloud-mirror in a while
14:42 <bstack> let me try looking into it
14:43 <nthomas> wondered if it's a dyno that's run out of memory, or similar
14:44 <bstack> "redis connection failed" in logs
14:44 <bstack> I'll restart. give me a sec
14:44 <bstack> this happens with terraform for this service so it's a bit more than a quick click
14:44 <nthomas> huh interesting, we've had that in release-services too
14:47 <bstack> I think heroku has been doing redis maintenance recently
14:47 <bstack> some of our services are smart about it and others aren't
14:51 <bstack> restarting now
14:57 <bstack> nthomas: does it look ok now?
14:57 <nthomas> bstack: seems about the same, unfortunately
14:58 <bstack> shoot
14:58 <bstack> I appear to have lost access to the heroku cloud mirror bits
14:58 <bstack> if those still exist
14:58 <nthomas> it's non-blocking, just seemed out of normal
14:58 <bstack> it definitely does feel unusual
14:58 <bstack> mind making a bug and cc jh.ford?
14:59 <nthomas> how much does cloud-mirror get used these days ?
14:59 <bstack> if it is non blocking he can look in his morning
14:59 <nthomas> sure
14:59 <bstack> I'm honestly not sure
14:59 <bstack> this is a part of tc that I don't have much familiarity with

jhford, any idea what the problem is?
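For anyone wanting to reproduce the "~30s to return a 302" measurement, here is a minimal sketch in Python that times a single GET without following the redirect. The function name `time_first_response` and its return shape are made up for illustration and are not part of any Taskcluster tooling; it uses only the standard library.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_first_response(url, timeout=60.0):
    """Issue one GET without following redirects.

    Returns (status, location_header, elapsed_seconds). Hypothetical helper
    for illustration only.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # port is None when the URL has no explicit port; http.client then
    # falls back to the scheme's default (80 or 443).
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)

    # urlsplit leaves percent-encoding untouched, so an encoded S3 URL
    # embedded in the path is sent exactly as written.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    start = time.monotonic()
    try:
        conn.request("GET", path)
        resp = conn.getresponse()  # returns once status line and headers arrive
        elapsed = time.monotonic() - start
        return resp.status, resp.getheader("Location"), elapsed
    finally:
        conn.close()
```

Calling it with the queue artifact URL and then with the cloud-mirror redirect URL from the transcript would show which hop is slow, since each call reports only the time until that hop's own response headers arrive.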
Flags: needinfo?(jhford)
See Also: → 1515230
This is almost certainly copier nodes being down, but I'm not sure how to diagnose that. I cannot access Papertrail because it seems to be redirecting me to some other site, and I don't know how else to see logs for the copier nodes. I tried running terraform, but it produces a number of 403s and other permissions errors. Brian, any hints on how I could diagnose this? One question about the static service deploys using terraform: if credentials change, does it verify that the hosts are redeployed, or does it only check that N copies of the copier are running and know how to create a new one? Do we have this running in the background to ensure that services which crash are restarted?
Flags: needinfo?(jhford)
SolarWinds announced a couple of months back that they were switching up the login pages. I had to reset my password to be able to log in. Otherwise it's the same account and everything. What 403 are you getting with tf? It _should_ redeploy if any creds, etc. change. Last night it did deploy two new instances. Looking at the logs today, they both appear to have crashed later. We've started them again today. Not really sure what is going on with them.
Not sure if it's related, but cloud-mirror response times seem to be back to normal. Not really sure what the root cause was here. I did terminate the instances which were running yesterday since they were definitely not working; maybe that was enough to convince them to be redeployed? I've emailed you logs of what happens.
Component: Operations → Operations and Service Requests
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME