Closed Bug 1004674 Opened 11 years ago Closed 11 years ago

Fix rsync://releases-rsync.mozilla.org/

Categories

(Infrastructure & Operations Graveyard :: WebOps: Product Delivery, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gozer, Assigned: gozer)

Attachments

(1 file)

Looks like the rsync service is broken, as in:

$> rsync rsync://releases-rsync.mozilla.org/
rsync: did not see server greeting
rsync error: error starting client-server protocol (code 5) at main.c(1635) [Receiver=3.1.0]

Looking at the rsync boxes, we are being hit by xinetd rate limiting the ZLBs:

[root@rsync1.dmz.scl3 ~]# tail -f /var/log/messages
May 1 13:16:48 rsync1 xinetd[3146]: START: rsync pid=29416 from=::ffff:10.22.74.212
May 1 13:16:48 rsync1 xinetd[3146]: START: rsync pid=29417 from=::ffff:10.22.74.210
May 1 13:16:49 rsync1 xinetd[3146]: EXIT: rsync status=12 pid=29406 duration=3(sec)
May 1 13:16:49 rsync1 xinetd[3146]: EXIT: rsync status=12 pid=29407 duration=2(sec)
May 1 13:16:49 rsync1 xinetd[3146]: FAIL: rsync per_source_limit from=::ffff:10.22.74.208
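To see which sources xinetd is refusing, the FAIL lines in the log excerpt above can be tallied per source address. The following is only a rough sketch, not something that was run on the rsync boxes: the function name `count_per_source_failures` and the `SAMPLE` list are made up for illustration, and the regular expression assumes the log format is exactly as in the excerpt.

```python
import re
from collections import Counter

# Matches xinetd lines of the form seen in the excerpt, e.g.
#   "... xinetd[3146]: FAIL: rsync per_source_limit from=::ffff:10.22.74.208"
# (assumption: other xinetd versions may format these lines differently)
FAIL_RE = re.compile(
    r"xinetd\[\d+\]: FAIL: (?P<service>\S+) per_source_limit from=(?P<source>\S+)"
)

def count_per_source_failures(lines):
    """Return a Counter mapping source address -> number of per_source_limit failures."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            counts[match.group("source")] += 1
    return counts

# Hypothetical sample, copied from the log lines quoted in this bug.
SAMPLE = [
    "May 1 13:16:48 rsync1 xinetd[3146]: START: rsync pid=29416 from=::ffff:10.22.74.212",
    "May 1 13:16:49 rsync1 xinetd[3146]: EXIT: rsync status=12 pid=29406 duration=3(sec)",
    "May 1 13:16:49 rsync1 xinetd[3146]: FAIL: rsync per_source_limit from=::ffff:10.22.74.208",
]

if __name__ == "__main__":
    for source, failures in count_per_source_failures(SAMPLE).most_common():
        print(source, failures)
```

If the refused sources turn out to be only the load balancer addresses, that would be consistent with the ZLBs themselves tripping the per-source limit rather than outside clients.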
Assignee: server-ops-webops → gozer
Status: NEW → ASSIGNED
Blocks: 1003265
Another problem was that Zeus was running a health check every 5 seconds, causing lots of rsync process churn. Switching to a calmer health check seems to have done the trick and quiesced the rsync boxes.
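For illustration, a probe of an rsync daemon only needs to connect and read the greeting banner, which begins with "@RSYNCD:". The sketch below is not the Zeus health monitor that was actually configured; `looks_like_rsync_greeting` and `probe_rsync` are hypothetical names, and the port (873, the standard rsync port) and timeout are assumptions. Each such probe still makes xinetd start an rsync process, which is why the check interval matters.

```python
import socket

# An rsync daemon opens the conversation with a line starting "@RSYNCD:".
RSYNC_GREETING_PREFIX = b"@RSYNCD:"

def looks_like_rsync_greeting(data: bytes) -> bool:
    """True if the bytes received from the server start with the rsync banner."""
    return data.startswith(RSYNC_GREETING_PREFIX)

def probe_rsync(host: str, port: int = 873, timeout: float = 5.0) -> bool:
    """Connect once, read the first bytes, and report whether a greeting was seen.

    Simplification: a single recv() is assumed to return the whole prefix.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            return looks_like_rsync_greeting(sock.recv(64))
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A failed probe here corresponds to the "did not see server greeting" error that the rsync client reports.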
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Not sure about the structure of the rsyncs, but are multiple daemons supposed to be running? This is creating a "high" load on both nodes:

rsync1.dmz.scl3.mozilla.com:Load is CRITICAL: CRITICAL - load average: 26.66, 26.55, 25.80
Fri 09:50:56 PDT [5106] rsync2.dmz.scl3.mozilla.com:Load is CRITICAL: CRITICAL - load average: 28.55, 25.31, 20.37
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
As per IRC, both alerts have been downtimed for 1 day (24 hours).
Rsyncd processes are managed by xinetd, and since Zeus isn't checking aggressively anymore, I need to look at whether this is normal usage or caused by some problem. Need to do a bit more digging.
Looking at these, it looks like it's not a bug, just many clients doing a fairly slow rsync of our content. I am assuming that, since we've been broken for a while, a few folks out there are finally playing catch-up. For now, I'll keep this bug open and keep watching. I would hope this is just a symptom of many mirrors out there having to sync up lots of content. I'll have to wait and see.
Sat 11:06:53 PDT [5844] rsync2.dmz.scl3.mozilla.com:Load is CRITICAL: CRITICAL - load average: 28.47, 27.63, 25.24 (http://m.mozilla.org/Load)
Sat 11:08:54 PDT [5845] rsync1.dmz.scl3.mozilla.com:Load is CRITICAL: CRITICAL - load average: 28.14, 28.10, 27.89
The service is working now, and we can expect up to 30 rsync clients per box, so I just cranked up the Nagios load check's limits. In most cases, these will be relatively idle processes blocked on the client's ability to suck down data.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard