Closed Bug 568005 Opened 15 years ago Closed 15 years ago

connection to mail, mpt-vpn, build-vpn, and others very slow

Categories

(mozilla.org Graveyard :: Server Operations, task)

Platform: x86
OS: macOS
Type: task
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: dmoore)

Details

This has caused some build failures, too. It's been happening for at least the last 2 hours.
Derek and mrz have been looking into this.
Assignee: server-ops → dmoore
Oddly, mail is coming to my phone (wtf?) much quicker than to my desktop here in Toronto. Dunno if that helps with diagnosis.
Some additional diagnosis:
- Toronto office to speedtest in SJ (smugmug) is fine
- ssh direct from office to people.mozilla.org is slow
- ssh through off.net to people.mozilla.org is fine
Hope that helps!
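(Illustrative aside, not from the original comment: if you want to quantify "slow" rather than eyeball it, a minimal Python sketch that times a plain TCP connect to port 22 gives a rough proxy for the ssh observation above. The host name is taken from the comment; the timeout is an arbitrary choice.)

# Minimal sketch: measure TCP connect latency to a host. This only times the
# TCP handshake on port 22, which approximates "ssh is slow" without doing a
# full ssh key exchange.
import socket
import time

def connect_latency_ms(host, port=22, timeout=10.0):
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0

if __name__ == "__main__":
    host = "people.mozilla.org"  # host from the comment above
    try:
        print(f"{host}: TCP connect to port 22 took {connect_latency_ms(host):.1f} ms")
    except OSError as exc:
        print(f"{host}: connection failed ({exc})")

Running it from the office and again through a jump host would reproduce the direct-vs-off.net comparison above with actual numbers.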
It helps. We're working through problems with our upstream providers, and it's very dependent on where you're connecting from.
Some traceroutes, as requested by shaver:

direct from toronto office to people.mozilla.org:
mtr from off.net to people.mozilla.org:

Host                                  Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. ca-gw1.ca.mozilla.com              0.0%    93    2.1    2.5    1.6   26.0    2.6
 2. 66.207.206.177                     0.0%    93    3.7    4.2    3.2   35.2    3.4
 3. 76-9-207-1.beanfield.net           0.0%    92    3.3    9.1    2.7  207.7   26.3
 4. 72.15.49.2                         0.0%    92   16.2   17.9   15.6   96.2    9.2
 5. 72.15.49.58                        0.0%    92   15.6   15.9   15.4   17.8    0.5
 6. nyiix.layer42.net                  0.0%    92   16.4   16.6   15.7   29.6    2.1
 7. xe3-2.core1.mpt.layer42.net        0.0%    92  100.1  104.5   99.4  148.8   11.0
 8. 216-129-125-182.cust.layer42.net  39.1%    92  112.6  131.6  101.5  359.9   56.8
 9. v9.core1.sj.mozilla.com           28.3%    92   95.1   95.7   94.8  100.1    0.9
10. ???

traceroute to people.mozilla.org (63.245.208.169), 30 hops max, 40 byte packets
 1  161.136.196.67.static.heavycomputing.ca (67.196.136.161)  1.055 ms  0.976 ms  1.033 ms
 2  gw-he.torontointernetxchange.net (198.32.245.112)  9.848 ms  9.820 ms  10.014 ms
 3  10gigabitethernet1-2.core1.nyc5.he.net (72.52.92.165)  17.317 ms  17.591 ms  17.547 ms
 4  10gigabitethernet1-4.core1.nyc1.he.net (72.52.92.153)  17.499 ms  17.476 ms  17.426 ms
 5  10gigabitethernet1-1.core1.nyc4.he.net (72.52.92.45)  28.246 ms  28.315 ms  28.394 ms
 6  10gigabitethernet5-3.core1.lax1.he.net (72.52.92.226)  79.816 ms  81.964 ms  78.760 ms
 7  10gigabitethernet1-3.core1.lax2.he.net (72.52.92.122)  78.719 ms  78.688 ms  78.645 ms
 8  mozilla.com.any2ix.coresite.com (206.223.143.109)  86.136 ms  86.167 ms  86.136 ms
 9  v8.core1.sj.mozilla.com (63.245.208.49)  86.081 ms  86.048 ms  86.086 ms
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
Ugh, mispasted above: the mtr is from the toronto office to people, the traceroute is from off.net to people (don't have mtr on that box). Looks like the problem is at layer42.net, which isn't on off.net's route.
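(Illustrative aside, not something used in this bug: the check being done by hand here can be scripted. A short Python sketch that runs mtr in report mode and flags hops with packet loss above a threshold; mtr's --report and --report-cycles flags are standard, though the exact report layout can vary by version and mtr may need elevated privileges on some systems.)

# Sketch: run mtr in report mode and flag hops whose packet loss exceeds a
# threshold, automating the "which hop is dropping packets" check above.
import subprocess

def lossy_hops(target, cycles=30, loss_threshold=5.0):
    out = subprocess.run(
        ["mtr", "--report", "--report-cycles", str(cycles), target],
        capture_output=True, text=True, check=True,
    ).stdout
    flagged = []
    for line in out.splitlines():
        if "|--" not in line:        # skip the HOST and column-header lines
            continue
        fields = line.split()
        host, loss = fields[1], float(fields[2].rstrip("%"))
        if loss >= loss_threshold:
            flagged.append((host, loss))
    return flagged

if __name__ == "__main__":
    for host, loss in lossy_hops("people.mozilla.org"):
        print(f"{host}: {loss:.1f}% loss")

Worth remembering that loss at an intermediate hop can just be ICMP rate limiting on that router; loss that persists through to the final reachable hop, as it does to v9.core1.sj.mozilla.com in the mtr above, is the more meaningful signal.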
Here's what I'm seeing from home, too. All mozilla-related infra is painfully slow, and I'm seeing timeouts to mail (since 7am-ish Eastern).

My traceroute [v0.72]
cuttlefish (0.0.0.0)                                       Tue May 25 11:18:32 2010
Resolver: Received error response 2. (server failure)

Host                                  Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. gw-home.deadsquid.com              0.0%    29    0.5    0.6    0.4    4.2    0.7
 2. xplr-142-46-160-1.xplornet.com     0.0%    29   71.7   64.7   19.5  152.8   39.6
 3. 174.35.131.81                      0.0%    29   41.7   55.2   24.2  164.7   30.6
 4. 142.46.4.25                        0.0%    29   35.8   62.4   22.2  190.1   40.3
 5. 142.46.128.9                       0.0%    29   56.6   60.7   22.2  258.7   55.5
 6. gw-wbsconnect.torontointernetxch   0.0%    29   68.0   89.5   38.6  258.7   60.9
 7. te-1-4.bmf1.sjc1.gt-t.net          3.6%    29  236.1  159.7  113.4  316.8   51.6
 8. 98.124.130.254                     3.6%    28  185.3  158.1  123.3  292.9   41.7
 9. xe3-2.core1.mpt.layer42.net        3.6%    28  268.6  202.6  115.6  520.6  111.4
10. 216-129-125-182.cust.layer42.net  39.3%    28  337.6  171.7  104.8  379.5   86.8
11. v9.core1.sj.mozilla.com           46.4%    28  281.3  167.2  107.9  305.0   68.8
12. ???
I'm not getting packet loss, but couldn't get to bugzilla a second ago. Seeing a jump in latency between torontointernetxchange.net and te-1-4.bmf1.sjc1.gt-t.net, like kev:

Host                                        Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. 192.168.1.1                              0.0%    44    0.8    0.9    0.6    2.0    0.2
 2. 10.76.64.1                               0.0%    44    7.8    9.8    7.1   19.5    2.9
 3. d226-8-177.home.cgocable.net             0.0%    44   15.0   15.4   13.0   33.7    3.1
 4. 113-0-226-24.cgocable.net                0.0%    44   21.5   22.2   19.0   33.1    3.2
 5. gw-wbsconnect.torontointernetxchange.net 0.0%    44   38.1   44.4   32.8  206.3   32.0
 6. te-1-4.bmf1.sjc1.gt-t.net                0.0%    43  112.4  113.4  110.5  126.7    3.3
 7. 98.124.130.254                           0.0%    43  119.2  120.8  118.8  128.8    1.7
 8. ge2-24.core2.mpt.layer42.net             0.0%    43  128.7  125.5  119.5  162.6   10.4
 9. 216-129-125-186.cust.layer42.net         0.0%    43  119.9  143.8  117.7  336.7   56.6
10. v8.core1.sj.mozilla.com                  0.0%    43  121.3  120.3  118.2  126.2    1.8
11. dyna-bugzilla.acelb.sj.mozilla.com       0.0%    43  120.5  121.2  118.5  137.2    3.1
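(Illustrative aside, not from the original comment: the "where does the latency jump" question can be answered mechanically from the Avg column. A small Python sketch using the averages pasted above; the 50 ms jump threshold is an arbitrary choice.)

# Sketch: compute hop-to-hop increases in average RTT from an mtr run so the
# big jump (here, into te-1-4.bmf1.sjc1.gt-t.net) stands out. The numbers are
# the Avg values from the mtr pasted above.
HOPS = [
    ("192.168.1.1", 0.9),
    ("10.76.64.1", 9.8),
    ("d226-8-177.home.cgocable.net", 15.4),
    ("113-0-226-24.cgocable.net", 22.2),
    ("gw-wbsconnect.torontointernetxchange.net", 44.4),
    ("te-1-4.bmf1.sjc1.gt-t.net", 113.4),
    ("98.124.130.254", 120.8),
    ("ge2-24.core2.mpt.layer42.net", 125.5),
    ("216-129-125-186.cust.layer42.net", 143.8),
    ("v8.core1.sj.mozilla.com", 120.3),
    ("dyna-bugzilla.acelb.sj.mozilla.com", 121.2),
]

for (prev_host, prev_avg), (host, avg) in zip(HOPS, HOPS[1:]):
    delta = avg - prev_avg
    marker = "  <-- jump" if delta > 50 else ""
    print(f"{prev_host} -> {host}: {delta:+.1f} ms{marker}")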
Seems much better now, fwiw:

Host                                  Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. ca-gw1.ca.mozilla.com              0.0%    33   17.8    6.2    1.9   22.1    5.8
 2. 66.207.206.177                     0.0%    33   22.5   11.5    3.1   75.3   14.6
 3. 76-9-207-1.beanfield.net           0.0%    33    3.6   10.3    3.1   89.5   15.7
 4. 72.15.49.2                         0.0%    32   18.2   24.4   15.7  171.9   27.4
 5. 72.15.49.58                        0.0%    32   70.2   24.5   15.5   84.7   15.6
 6. nyiix.layer42.net                  0.0%    32   17.1   21.7   15.6   73.4   11.4
 7. ge2-24.core2.mpt.layer42.net       0.0%    32  107.3  106.5   99.6  125.6    7.9
 8. 216-129-125-186.cust.layer42.net   0.0%    32   96.5  126.0   93.0  293.1   49.6
 9. v8.core1.sj.mozilla.com            0.0%    32  101.3  101.8   93.0  142.8   11.9
10. ???
We've been taking aggressive steps to work around the problematic providers throughout the night, but we need to maintain a minimum number of connections or we run into congestion issues. That balancing act is still underway, but we're making progress.
We believe this is sorted. A combination of provider outages (at Level3 and Mzima) and equipment capacity limits contributed to this in many interesting, interleaved ways. Our datacenter network was reconfigured to facilitate better debugging, and we're in the process of returning it to full production readiness. The rest of the recovery process should be transparent to end users.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard