Closed Bug 936248 Opened 12 years ago Closed 12 years ago

EC2 jobs failing due to lost network traffic

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

All
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 940356

People

(Reporter: KWierso, Unassigned)

Details

Attachments

(1 file)

Some of these were network timeouts, too. This seems to be limited to use1, so I suspect this is the usual use1 connection problem. Arzhel's seeing "%-RT_IPSEC_REPLAY: Replay packet detected on IPSec tunnel" suggesting that the Amazon end is re-transmitting packets and that both the original and re-transmitted packets are eventually arriving.
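For context on the "Replay packet detected" message: IPsec ESP receivers run an anti-replay check (RFC 4303, section 3.4.3). The receiver tracks the highest ESP sequence number it has accepted, plus a sliding bitmap of recent ones. A packet carrying a sequence number that is already marked in the window is dropped as a replay. A packet older than the window is dropped as well. The sketch below is a minimal, illustrative model of that check, assuming a 64-packet window (the default RFC 4303 recommends). It is not the tunnel endpoint's actual implementation. It ignores extended sequence numbers and integrity verification, and the class and method names (`ReplayWindow`, `check_and_update`) are hypothetical.

```python
class ReplayWindow:
    """Illustrative sliding-window anti-replay check modeled on RFC 4303 sec. 3.4.3.

    Hypothetical helper: 32-bit sequence numbers only, no ESN, no ICV check.
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size   # window width in packets (assumed default: 64)
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set => sequence number (highest - i) already seen

    def check_and_update(self, seq: int) -> bool:
        """Return True if the packet is accepted, False if it is a replay or too old."""
        if seq == 0:
            return False  # ESP sequence numbers start at 1
        if seq > self.highest:
            # Slide the window forward; bits pushed past `size` fall off.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # older than the window: dropped
        if self.bitmap & (1 << offset):
            return False  # duplicate sequence number: logged as a replay and dropped
        self.bitmap |= 1 << offset  # late but in-window packet: accept and mark it
        return True
```

Feeding it sequence numbers 1 through 4 and then 3 and 4 again accepts the first four and rejects the two duplicates. This is the situation the log line reports: the same ESP packet arriving twice.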
Summary: EC2 jobs failing due to Error <urlopen error timed out> while getting http://pypi.pvt.build.mozilla.org/pub/blobuploader-1.0b.tar.gz (from http://pypi.pvt.build.mozilla.org/pub/) error → EC2 jobs failing due to lost network traffic
Does this seem solvable? Do we still need to gather more data? Should releng continue to set up in-house buildbot masters to minimize cross-colo traffic (bug 927129)?
(In reply to Chris Cooper [:coop] from comment #3)
> Does this seem solvable? Do we still need to gather more data?
>
> Should releng continue to set up in-house buildbot masters to minimize
> cross-colo traffic (bug 927129)?

I believe this was a different part of our system: an EC2 host trying to read from an in-house host:

tst-linux32-ec2-134 -> http://pypi.pvt.build.mozilla.org

A better bug would be to set up a pypi host on EC2.
That's true, but difficult, since pypi is part of the releng web cluster and is thus based on infra puppet and assumes the presence of a Zeus load balancer and a NetApp backend. Following this line of thought, you'd need to move *everything* into AWS: clobberer, npm-mirror, ftp, hgmo, gitmo, and so on. Maybe that is the right solution, but it's not a decision we should back into lightly or without serious thought. It will be a lot of work, and far *more* work if we don't approach it systematically.
Arzhel is debugging this in bug 940356.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard