Closed Bug 431631 Opened 16 years ago Closed 16 years ago

www.mozilla.com down in Japan

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gen, Assigned: mrz)

Details

(Whiteboard: researching RFO, needs-nagios)

We in Japan cannot access www.mozilla.com.

We are being sent to the China servers, and the page does not load.


traceroute to www-mozilla-com.glb.mozilla.com (59.151.50.31), 64 hops max, 40 byte packets

 9  202.232.8.130 (202.232.8.130)  54.381 ms  54.087 ms *
10  202.97.33.49 (202.97.33.49)  55.338 ms  55.317 ms  54.651 ms
11  202.97.33.29 (202.97.33.29)  54.695 ms  56.019 ms  55.201 ms
12  202.97.34.133 (202.97.34.133)  79.846 ms  79.676 ms  79.585 ms
13  202.97.57.218 (202.97.57.218)  79.685 ms  80.268 ms  103.860 ms
14  bj141-130-114.bjtelecom.net (219.141.130.114)  80.401 ms  79.582 ms  80.701 ms
15  219.142.8.26 (219.142.8.26)  79.705 ms  79.464 ms  79.663 ms
16  211.151.224.182 (211.151.224.182)  80.303 ms  79.565 ms  79.773 ms
17  211.151.227.86 (211.151.227.86)  81.012 ms  82.383 ms  84.993 ms
18  59.151.50.31 (59.151.50.31)  79.877 ms  79.968 ms  80.420 ms
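
One way to read the hop times above is to take the median of each hop's probes and look for the largest step between consecutive hops. Below is a minimal Python sketch that assumes the standard `traceroute` line layout (`hop host (ip) rtt ms ...`, with `*` for a lost probe). `parse_hops` and `largest_jump` are hypothetical helpers, not part of any tool used in this bug.

```python
import re
from statistics import median

# Hop lines from the traceroute above (hops 1-8 were not included in the report).
TRACE = """\
 9  202.232.8.130 (202.232.8.130)  54.381 ms  54.087 ms *
10  202.97.33.49 (202.97.33.49)  55.338 ms  55.317 ms  54.651 ms
11  202.97.33.29 (202.97.33.29)  54.695 ms  56.019 ms  55.201 ms
12  202.97.34.133 (202.97.34.133)  79.846 ms  79.676 ms  79.585 ms
13  202.97.57.218 (202.97.57.218)  79.685 ms  80.268 ms  103.860 ms
14  bj141-130-114.bjtelecom.net (219.141.130.114)  80.401 ms  79.582 ms  80.701 ms
15  219.142.8.26 (219.142.8.26)  79.705 ms  79.464 ms  79.663 ms
16  211.151.224.182 (211.151.224.182)  80.303 ms  79.565 ms  79.773 ms
17  211.151.227.86 (211.151.227.86)  81.012 ms  82.383 ms  84.993 ms
18  59.151.50.31 (59.151.50.31)  79.877 ms  79.968 ms  80.420 ms
"""

# hop number, hostname, (ip), then the probe results
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+) ms")

def parse_hops(text):
    """Return a list of (hop, host, ip, median_rtt_ms) tuples."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header or blank line
        rtts = [float(x) for x in RTT_RE.findall(m.group(4))]  # "*" is skipped
        hops.append((int(m.group(1)), m.group(2), m.group(3),
                     median(rtts) if rtts else None))
    return hops

def largest_jump(hops):
    """Return (hop_number, increase_ms) for the biggest median-RTT rise."""
    best = None
    for prev, cur in zip(hops, hops[1:]):
        if prev[3] is None or cur[3] is None:
            continue
        delta = cur[3] - prev[3]
        if best is None or delta > best[1]:
            best = (cur[0], delta)
    return best

hops = parse_hops(TRACE)
print(largest_jump(hops))
```

On this trace the largest step is at hop 12 (202.97.34.133), about 24 ms. The final hop, 59.151.50.31, answers at about 80 ms. The network path to the China VIP was up, which points at the service rather than the route.
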
Assignee: nobody → server-ops
Component: www.mozilla.com → Server Operations
Product: Websites → mozilla.org
QA Contact: www-mozilla-com → justin
Version: unspecified → other
Assignee: server-ops → mrz
I'm poking at this as well; the reassign was a knee-jerk move to keep it from paging me, but I paged mrz anyway because it feels like a NetScaler issue.

The backend servers in .cn all seem to be serving the correct content. It looks like Nagios over there isn't monitoring the outside VIP on the NetScaler, though.
If I stick the China IP in my hosts file for www.mozilla.com, I get "The connection to the server was reset while the page was loading." in Firefox.
I was able to duplicate this. After rebooting the NetScaler (cnlb01), I can't duplicate it anymore.

As Dave said, we're not doing a good job of monitoring the external VIPs; we'll work on that.
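
Monitoring the outside VIP amounts to automating the hosts-file test. The check connects to the VIP's IP, sends `Host: www.mozilla.com`, and alarms on a reset, refusal, timeout, or bad status. Below is a minimal Python sketch in the style of a Nagios plugin (one status line, exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN). `check_vip`, its thresholds, and the script name are hypothetical. The monitoring that was actually added is tracked in bug 432127.

```python
import http.client
import time

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_vip(ip, host, port=80, warn_s=2.0, timeout=10.0):
    """Probe a VIP directly while presenting the public hostname.

    Returns (exit_code, status_line) following the Nagios plugin convention.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Connect to the VIP's address but send the site's Host header,
        # the same thing the hosts-file override does for a browser.
        conn.request("GET", "/", headers={"Host": host})
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        # Connection resets, refusals and timeouts all end up here.
        return CRITICAL, f"VIP CRITICAL - {ip} ({host}): {type(exc).__name__}"
    finally:
        conn.close()
    elapsed = time.monotonic() - start
    if not 200 <= status < 400:
        return CRITICAL, f"VIP CRITICAL - {ip} ({host}): HTTP {status}"
    if elapsed > warn_s:
        return WARNING, f"VIP WARNING - {ip} ({host}): HTTP {status} in {elapsed:.2f}s"
    return OK, f"VIP OK - {ip} ({host}): HTTP {status} in {elapsed:.2f}s"

def main(argv):
    """Plugin entry point: print one status line and return the exit code."""
    if len(argv) != 3:
        print("usage: check_vip.py <vip-ip> <hostname>")
        return UNKNOWN
    code, line = check_vip(argv[1], argv[2])
    print(line)
    return code
```

A plugin script would end with `sys.exit(main(sys.argv))` and be invoked as, for example, `check_vip.py 59.151.50.31 www.mozilla.com`. A stock alternative is the standard `check_http` plugin, with `-I` set to the VIP address and `-H` to the virtual host name.
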
Ok, we're back in Japan.  Thanks for the quick work.
I just did a traceroute and I am still sent to "www-mozilla-com.glb.mozilla.com (59.151.50.31), 64 hops max, 40 byte packets"

Is that right?
Yes - that's right.  You are best served out of that colo (based on RTT probes).
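
Choosing a colo by round-trip-time probes can be sketched in a few lines of Python. This illustrates lowest-RTT selection in general and is not the NetScaler GSLB's actual algorithm. The colo names and RTT figures are hypothetical. The optional `healthy` filter is an assumption: it shows where a VIP health check (the `needs-nagios` item) could stop a closest-but-broken colo from being chosen.

```python
from statistics import median

def pick_colo(rtt_ms, healthy=None):
    """Pick the colo with the lowest median RTT.

    rtt_ms maps colo name -> list of probe RTTs in ms (None = lost probe).
    healthy, if given, is the set of colos whose VIP passes a health check;
    any other colo is skipped even if it is the closest.
    """
    best, best_rtt = None, float("inf")
    for colo, samples in rtt_ms.items():
        if healthy is not None and colo not in healthy:
            continue
        replies = [s for s in samples if s is not None]
        if not replies:
            continue  # no probe answered; treat the colo as unreachable
        rtt = median(replies)
        if rtt < best_rtt:
            best, best_rtt = colo, rtt
    return best

# Hypothetical probe results for a client in Japan (illustrative numbers only).
samples = {
    "san-jose": [112.0, 110.2, None],
    "amsterdam": [260.0, 255.1, 258.3],
    "china": [80.1, 79.9, 80.4],
}
print(pick_colo(samples))                                      # china
print(pick_colo(samples, healthy={"san-jose", "amsterdam"}))  # san-jose
```

With these numbers the China colo wins on RTT alone, which matches what the reporter saw. Once it is marked unhealthy, the next-closest colo is chosen instead.
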
Whiteboard: researching RFO, needs-nagios
The timezone on that NetScaler is a bit screwed up. The only relevant log entries are:

424605 13220 Monitor_http-moz_of_svc-group-static_10.6.80.12_80(10.6.80.12:80): DOWN; Last response: Failure - TCP syn sent, reset received. Thu May  1 07:12:18 2008

424606     0 'server_serviceGroup_NSSVC_HTTP_10.6.80.12:80(svc-group-static_10.6.80.12_80)' DOWN Thu May  1 07:12:18 2008

424607 14781 Monitor_http-moz_of_svc-group-static_10.6.80.13_80(10.6.80.13:80): DOWN; Last response: Failure - TCP syn sent, reset received. Thu May  1 07:12:33 2008
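
For the RFO timeline, the state-change entries can be extracted mechanically. Below is a minimal Python sketch. It assumes every entry follows the layout of the three lines above: leading numeric fields, an ip:port, an UP/DOWN state, and a trailing ctime-style timestamp in the box's own clock. `parse_entries` is a hypothetical helper, not a NetScaler tool.

```python
import re
from datetime import datetime

# The three entries from the NetScaler log above.
LOG = """\
424605 13220 Monitor_http-moz_of_svc-group-static_10.6.80.12_80(10.6.80.12:80): DOWN; Last response: Failure - TCP syn sent, reset received. Thu May  1 07:12:18 2008
424606     0 'server_serviceGroup_NSSVC_HTTP_10.6.80.12:80(svc-group-static_10.6.80.12_80)' DOWN Thu May  1 07:12:18 2008
424607 14781 Monitor_http-moz_of_svc-group-static_10.6.80.13_80(10.6.80.13:80): DOWN; Last response: Failure - TCP syn sent, reset received. Thu May  1 07:12:33 2008
"""

# First ip:port in the line (the "10.6.80.12_80" form has no colon, so it is skipped).
IP_PORT = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3}:\d+)")
STATE = re.compile(r"\b(UP|DOWN)\b")
# ctime-style stamp at the end of each entry; the day of month is space-padded.
STAMP = re.compile(r"(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*$")

def parse_entries(text):
    """Yield (service ip:port, state, naive timestamp) for each state-change line."""
    for line in text.splitlines():
        svc, state, stamp = IP_PORT.search(line), STATE.search(line), STAMP.search(line)
        if not (svc and state and stamp):
            continue  # not a monitor/service state-change line
        when = datetime.strptime(stamp.group(1), "%a %b %d %H:%M:%S %Y")
        yield svc.group(1), state.group(1), when

for svc, state, when in parse_entries(LOG):
    print(when.isoformat(), svc, state)
```

On these entries it shows both backends in the static service group (10.6.80.12:80 and 10.6.80.13:80) going DOWN 15 seconds apart, each after a TCP reset. The timestamps are in the box's skewed clock, so they still need converting before they line up with anything else.
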

It's 11am Pacific and that box is reporting 19:03 BST, so that event was about 12 hours ago, very close to 11pm Pacific.
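
The time arithmetic in the comment above can be checked in Python. This minimal sketch makes three assumptions: the box's "BST" means UTC+1, Pacific time on 1 May 2008 was PDT (UTC-7), and the 19:03 reading was taken on 1 May. Fixed offsets are used so the example needs no timezone database.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets: the box labels its clock BST (UTC+1); Pacific is PDT (UTC-7).
BST = timezone(timedelta(hours=1), "BST")
PDT = timezone(timedelta(hours=-7), "PDT")

box_now = datetime(2008, 5, 1, 19, 3, tzinfo=BST)     # what the box reported
event = datetime(2008, 5, 1, 7, 12, 18, tzinfo=BST)   # first DOWN entry in the log

print(box_now.astimezone(PDT))  # 2008-05-01 11:03:00-07:00
print(event.astimezone(PDT))    # 2008-04-30 23:12:18-07:00
print(box_now - event)          # 11:50:42
```

The elapsed time (just under 12 hours) depends only on the log and the box clock sharing an offset, so it holds even though the box's timezone setting is off. The box's 19:03 also converts to 11:03 PDT, matching the wall clock, which supports reading the event as about 11:12pm Pacific.
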
Depends on: 432127
I was too quick to reboot and didn't grab any diagnostic info first, so there's little information to go on.

I'm upgrading the two NetScalers in China from 8.0.50 to 8.0.54.6 to pick up a lot of bug fixes (mostly around weblogging under load) and performance fixes, some of which we've already picked up in the 7.0.61 train we're running in Amsterdam and San Jose.

Nagios monitoring is being tracked in bug 432127.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard