WINNT 5.2 mozilla-inbound leak test build taking over 20min (almost 40) to upload artifacts!



3 years ago
(Reporter: jlal, Unassigned)





3 years ago
The problem is that uploads are very, very slow... A potential workaround is to teach the mozharness artifact upload code (to TC) to reclaim the task every 10 minutes or so, which would tolerate arbitrarily slow uploads. Generally, though, if you're taking over 20 minutes to upload a 100-500 MB artifact from SCL3 -> us-west-2, something is very wrong.
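The reclaim workaround described above could be sketched as a background keepalive that periodically re-claims the task while a slow upload runs, so the claim never expires mid-upload. This is only an illustrative sketch, not the actual mozharness code; the `upload` and `reclaim` callables stand in for the real artifact-upload step and the TaskCluster queue's reclaim call, which are assumptions here.

```python
import threading


def with_periodic_reclaim(upload, reclaim, interval=600):
    """Run a possibly slow `upload()` while calling `reclaim()` every
    `interval` seconds in a background thread, so the task claim is
    renewed no matter how long the upload takes.
    """
    stop = threading.Event()

    def keepalive():
        # Event.wait() returns False on timeout, True once stop is set,
        # so this loop reclaims on each interval until the upload ends.
        while not stop.wait(interval):
            reclaim()

    t = threading.Thread(target=keepalive, daemon=True)
    t.start()
    try:
        return upload()
    finally:
        stop.set()
        t.join()
```

In the real setup, `reclaim` would wrap something like the TaskCluster queue's reclaim endpoint for the current task/run; the sketch only shows the timing structure.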

That's four different slaves, one of which had been doing the same job successfully all day today. These fall smack in the middle of the bug 1160608 power outage window (I can't see that bug, but I'm told that's the number), though I don't know why having half the power out would make for slow uploads.
The IT Systems team handles internal infrastructure and is generally at dinner or asleep around this time of night, with coverage varying from US/Pacific to US/Eastern. So I'm moving this bug to the MOC queue, since they have 24/7 response, as opposed to the business-day-only coverage in the "Infrastructure: ___" components. If you were led to this component by documentation somewhere, please let us know so we can correct it ASAP.
Assignee: infra → nobody
Component: Infrastructure: Other → MOC: Service Requests
QA Contact: jdow → lypulong
Some switches went down. Might be related.
Is it still an issue?
Flags: needinfo?(jlal)
:van, :johnb: I hear that you fixed something with the firewalls last night? Can you provide more detail, please? Slow uploads from scl3 to aws have been an issue for us for a while, but there wasn't a known cause.
Flags: needinfo?(jbircher)

Comment 6

3 years ago
That was a pair of busy nights.

Question: Are you still experiencing slowness from SCL3 to AWS? We never received feedback after the initial notification of slowness.

We didn't do anything with the firewall or core switches that night that would have resolved any slowness. When we took a look, there was a false positive between the cores and the firewalls that looked like it might have caused slowness, but everything was actually functioning as it should have been. The work we did on the firewall resolved a redundancy issue, not a throughput issue.
Flags: needinfo?(jbircher)
Probably not still seeing it, and you don't want to know details behind that being only "probably." Something bad happened while the power was out, and we don't have any way of knowing what at this point.
Last Resolved: 3 years ago
Flags: needinfo?(jlal)
Resolution: --- → INCOMPLETE