Closed Bug 1107462 Opened 10 years ago Closed 10 years ago

Trees CLOSED - Various test fail with command timed out: 1800 seconds without output, attempting to kill

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: x86 / All
Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cbook, Unassigned)

Details

Possibly a result of the FTP problems last night, but a lot of tests seem to be failing with:

01:44:29 INFO - retry: Calling <bound method Proxxy._download_file of <mozharness.mozilla.proxxy.Proxxy object at 0x00D4DB70>> with args: ('https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-pgo/1417674611/firefox-37.0a1.en-US.win32.tests.zip', 'C:\\slave\\test\\build\\firefox-37.0a1.en-US.win32.tests.zip'), kwargs: {}, attempt #1
command timed out: 1800 seconds without output, attempting to kill
This is spreading, so I'm closing the integration trees and m-c.
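
For readers unfamiliar with the "seconds without output" message, the idea of a no-output watchdog can be sketched roughly as follows. This is a hypothetical illustration only, not the build harness's actual implementation: the function name `run_with_output_timeout` and its parameters are assumptions made up for this example.

```python
import subprocess
import threading
import time


def run_with_output_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Hypothetical helper for illustration; returns (returncode, lines, timed_out).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = []
    last_output = [time.monotonic()]  # mutable cell shared with the reader thread

    def reader():
        # Every line of output resets the idle clock.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > idle_timeout:
            # Analogous to "N seconds without output, attempting to kill".
            timed_out = True
            proc.kill()
            break
        time.sleep(0.05)

    proc.wait()
    thread.join(timeout=1)
    return proc.returncode, lines, timed_out
```

A process that keeps running but stays silent (for example, a download that logs nothing while bytes trickle in) would be killed by a watchdog like this even though it has not technically failed.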
Severity: critical → blocker
Summary: Various test fail with command timed out: 1800 seconds without output, attempting to kill → Trees CLOSED - Various test fail with command timed out: 1800 seconds without output, attempting to kill
Trees reopened at about 7:30am Pacific.

Hal, Jakem, do you want to update this bug before we close it as fixed?
Flags: needinfo?(nmaul)
Flags: needinfo?(hwine)
See Also: → 1107156
Timeouts look like carry-overs that started during the outage.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(nmaul)
Flags: needinfo?(hwine)
Resolution: --- → FIXED
Recapping the issue & timeline from the IRC logs:

- releng & webops paged at 03:39 PT
- initial reports hinted at collateral from bug 1107156
- escalated to 2nd level webops @ 04:30 PT
- confirmed as a zlb blocking/rate-limiting issue
- 04:40 PT added scl3 hosts to whitelist; did not resolve the issue
- 04:50 PT backed off the "general protection class settings" set yesterday as part of bug 1107156
- 05:15 PT AWS hosts added to whitelist; seems to resolve the issues for AWS
- 05:30 PT scl3 issues continue
- log correlations imply an extremely low transfer rate within scl3
- 06:12 PT escalated to catlee
- 06:50 PT confirmed extremely slow transfer rate of 2kb/s
- 07:00 PT paged netops
- jake discovers the zlb rate-limiting rules interaction is not working as advertised
- 07:18 PT all rate throttling disabled for ftp.m.o VIP
- 07:19 PT netops dismissed with thanks

No more problems in scl3
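
As a rough back-of-the-envelope check on why the 06:50 PT transfer rate would trip the 1800-second timeout, the arithmetic can be sketched as below. Two assumptions are baked in and are not confirmed by the logs: that the download step prints nothing while a transfer is in progress, and that "2kb/s" means roughly 2 kilobytes per second (the kilobit reading is shown for comparison). The function name is hypothetical.

```python
def bytes_before_idle_timeout(rate_bytes_per_s, timeout_s):
    """Upper bound on bytes transferred before a no-output timeout fires."""
    return rate_bytes_per_s * timeout_s


TIMEOUT_S = 1800  # from "1800 seconds without output"

# Assumption: "2kb/s" read as 2 kilobytes per second.
as_kilobytes = bytes_before_idle_timeout(2 * 1000, TIMEOUT_S)
# Alternative reading: 2 kilobits per second, i.e. 250 bytes per second.
as_kilobits = bytes_before_idle_timeout(2 * 1000 // 8, TIMEOUT_S)

print(f"kilobyte reading: {as_kilobytes / 1e6:.2f} MB")  # 3.60 MB
print(f"kilobit reading:  {as_kilobits / 1e6:.2f} MB")   # 0.45 MB
```

Under either reading, only a few megabytes at most could arrive before the job is killed, so any larger archive would fail to download in time under these assumptions.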
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard