Closed Bug 801148 Opened 13 years ago Closed 13 years ago

Several reports of slow response for mozilla.org

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: rik, Assigned: nmaul)

References

Details

(Whiteboard: [triaged 20121012])

Attachments

(2 files)

graph showing recent response times (Jake Maul [:jakem], 13 years ago, 50.52 KB, image/png)
www.mozilla.org response times, 10/15 (Jake Maul [:jakem], 13 years ago, 84.06 KB, image/png)

Anthony Ricaud (:rik)

Reporter

Description

13 years ago

In the last couple of days, I've seen a bunch of people complaining about the loading time of www.mozilla.org. Some of them even saw an error message, something like "Service unavailable" or "Service not responding" (I can't remember, sorry). I've CC'd two of them: Pascal is in France, Tomer is in Israel.

Jake Maul [:jakem]

Assignee

Comment 1

13 years ago

We've had a number of odd networking issues lately which likely contributed to this, so I suspect it may be very hard to track down. Some of these *definitely* contributed heavily (Catchpoint monitoring was highly annoyed at www.mozilla.org's performance). I'm inclined not to investigate much further until we can say with some certainty that those networking issues were not the underlying cause here. One thing that will definitely help is anything we can do regarding bug 742890. Now is probably not the time to work on this extensively (given the stub installer stuff that's ongoing at the moment), but if there's anything we can do easily (cleanly! don't want to rush bad code to production), it might be worth a quick glance.

Priority: -- → P4

Whiteboard: [triaged 20121012]

Tomer Cohen :tomer

Comment 2

13 years ago

I have some connection problems from my location: sometimes pages load very slowly, and sometimes not at all. I think this issue appeared a few days ago. Is it possible this happens because of heavy load when everyone gets their browser upgrades? When I monitored the main page, it took more than a minute to load, and I found a few HTTP 500 errors on some of the images and scripts attached to the main page. Monitoring https://www.mozilla.org/en-US/ using cURL showed that sometimes the page loads in around 15 seconds, while sometimes it fails with an error after a very long time. About 1 in 10 of my tests failed with this error message:

$ time curl -vv https://www.mozilla.org/en-US/
* About to connect() to www.mozilla.org port 443 (#0)
* Trying 63.245.213.92... connected
* successfully set certificate verify locations:
* CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using RC4-SHA
* Server certificate:
* subject: businessCategory=Private Organization; 1.3.6.1.4.1.311.60.2.1.3=US; 1.3.6.1.4.1.311.60.2.1.2=California; serialNumber=C2543436; C=US; ST=California; L=Mountain View; O=Mozilla Foundation; OU=IT Operations; CN=www.mozilla.org
* start date: 2011-12-15 20:35:47 GMT
* expire date: 2013-12-16 21:23:08 GMT
* subjectAltName: www.mozilla.org matched
* issuer: C=US; O=GeoTrust Inc; OU=See www.geotrust.com/resources/cps (c)06; CN=GeoTrust Extended Validation SSL CA
* SSL certificate verify ok.
> GET /en-US/ HTTP/1.1
> User-Agent: curl/7.22.0 (i686-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: www.mozilla.org
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Date: Fri, 12 Oct 2012 23:21:15 GMT
< Connection: close
< Content-Type: text/html
<
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>Service Unavailable</title>
<style type="text/css">
body, p, h1 { font-family: Verdana, Arial, Helvetica, sans-serif; }
h2 { font-family: Arial, Helvetica, sans-serif; color: #b10b29; }
</style>
</head>
<body>
<h2>Service Unavailable</h2>
<p>The service is temporarily unavailable. Please try again later.</p>
</body>
</html>
* Closing connection #0
* SSLv3, TLS alert, Client hello (1):

real 1m13.357s
user 0m0.020s
sys 0m0.012s
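The test above was a manual loop of `time curl`. Below is a minimal sketch of automating the same check in Python, assuming only the standard library. The function names `fetch_once`, `monitor` and `summarize`, the 90-second timeout and the run counts are hypothetical choices, not anything Mozilla used. It records how long each request took and whether it succeeded (an HTTP 500 counts as a failure), then reports the failure rate and the median time of successful loads.

```python
import http.client
import statistics
import time
import urllib.request

# URL taken from the report above; any page could be substituted.
URL = "https://www.mozilla.org/en-US/"


def fetch_once(url, timeout=90.0):
    """Fetch `url` once and return (elapsed_seconds, ok).

    ok is False for HTTP errors such as the 500 above, for timeouts and for
    connection failures. `timeout` is an assumed value; urllib applies it to
    each socket operation, not to the request as a whole.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # urllib.error.HTTPError, URLError and socket timeouts are all OSError subclasses.
        ok = False
    return time.monotonic() - start, ok


def monitor(url=URL, runs=10, pause=5.0):
    """Call fetch_once `runs` times, sleeping `pause` seconds between requests."""
    samples = []
    for i in range(runs):
        samples.append(fetch_once(url))
        if i < runs - 1:
            time.sleep(pause)
    return samples


def summarize(samples):
    """Summarize a list of (elapsed_seconds, ok) pairs."""
    ok_times = [t for t, ok in samples if ok]
    failures = sum(1 for _, ok in samples if not ok)
    return {
        "count": len(samples),
        "failure_rate": failures / len(samples) if samples else 0.0,
        # Median load time of successful requests only; None if none succeeded.
        "median_ok_seconds": statistics.median(ok_times) if ok_times else None,
    }
```

For example, `print(summarize(monitor(URL, runs=10, pause=5.0)))` would make ten requests five seconds apart. Failures are reported as a rate rather than folded into the timing figure, and the median keeps one unusually slow success from dominating the load-time number.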

Anthony Ricaud (:rik)

Reporter

Comment 3

13 years ago

Jake: I'd argue that this is more important than the stub installer. With those issues, a lot of people can't download Firefox. That's more important than a test.

Jake Maul [:jakem]

Assignee

Comment 4

13 years ago

Attached image: graph showing recent response times

I understand and don't disagree that it's important. However, please see the attached graph. That's why I'm hesitant to spend time investigating this more thoroughly right now. The statistical data strongly indicates that this is vastly improved as of approximately 3pm PT today. The 4 red dots around 30 seconds or so are Munich, Rome, London, and Amsterdam. This may indicate a problem in the EU... or just be bad luck (4 data points don't make much of a trend). I can look at the AMS1 load balancer cluster specifically, or even disable it and send that traffic back to PHX1/SCL3 like everything else... but there's a decent chance that would actually be *worse* overall, rather than better.

Jake Maul [:jakem]

Assignee

Comment 5

13 years ago

Looking further back in Catchpoint, I see that those red ~30s timeouts started around October 8 and are largely concentrated in Europe. I believe this is somewhat related to problems in SCL3: according to the Zeus cluster in AMS1, the SCL3 origin is flapping (alternately reachable and unreachable) when the cluster tries to reach it. For the time being, I have disabled the SCL3 origin in the AMS1 Zeus cluster. In a few hours it should become obvious whether this has a noticeable impact.
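"Flapping" here means the AMS1 cluster's view of the SCL3 origin kept alternating between up and down. The sketch below is a hypothetical Python illustration of one common way to detect that, by counting up/down transitions over the last few health checks. It is not how Zeus itself decides; the class name, window size and threshold are all assumptions.

```python
from collections import deque


class FlapDetector:
    """Hypothetical flap detector for a single origin.

    Keeps the last `window` health-check results and reports the origin as
    flapping once the number of up/down transitions in that window reaches
    `max_transitions`.
    """

    def __init__(self, window=10, max_transitions=4):
        self.history = deque(maxlen=window)  # oldest results fall off automatically
        self.max_transitions = max_transitions

    def record(self, healthy):
        """Append one health-check result (True = origin answered)."""
        self.history.append(bool(healthy))

    def transitions(self):
        """Count adjacent pairs in the window whose state differs."""
        h = list(self.history)
        return sum(1 for a, b in zip(h, h[1:]) if a != b)

    def is_flapping(self):
        return self.transitions() >= self.max_transitions
```

A pool manager could stop sending traffic to an origin while `is_flapping()` is true, which is roughly what disabling the SCL3 origin in the AMS1 cluster did by hand. Counting transitions, rather than reacting to each individual failed check, avoids constantly adding and removing an origin that is only intermittently reachable.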

Jake Maul [:jakem]

Assignee

Comment 6

13 years ago

Attached image: www.mozilla.org response times, 10/15

This seems to have helped drastically. I can't be 100% sure that there haven't been network improvements around the same time, so we might want to flip back and forth once or twice to be sure, but it definitely seems much better now.
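One way to do the back-and-forth check is to collect response times for several alternating periods with the SCL3 origin enabled and disabled, then ask whether the improvement shows up every time rather than just once. Here is a rough Python sketch, assuming such per-period samples exist. The function names and the "every disabled period beats every enabled period" rule are hypothetical heuristics, not a statistical test.

```python
from statistics import median


def per_period_medians(periods):
    """periods: list of (origin_enabled, response_times_in_seconds).

    Returns a list of (origin_enabled, median_response_time), one per period.
    """
    return [(enabled, median(times)) for enabled, times in periods]


def consistent_improvement(periods):
    """True if every period with the origin disabled has a lower median
    response time than every period with it enabled.

    Requires at least one period in each state; otherwise returns False.
    """
    meds = per_period_medians(periods)
    on = [m for enabled, m in meds if enabled]
    off = [m for enabled, m in meds if not enabled]
    return bool(on) and bool(off) and max(off) < min(on)
```

If the disabled periods are consistently faster across several flips, a coincidental network improvement becomes a less likely explanation; a single before/after pair cannot rule that out.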

Anthony Ricaud (:rik)

Reporter

Comment 7

13 years ago

Thanks for taking a look at this late on a Friday!

Jake Maul [:jakem]

Assignee

Comment 9

13 years ago

I've spoken to our netops folks, and there was indeed an issue in SCL3 from 10/8 to 10/12, which has now been fixed. I've re-enabled the SCL3 node in the AMS1 backend pool, and all seems well after several hours. Calling this one fixed. Thanks!

Assignee: server-ops-webops → nmaul

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

12 years ago

Component: Server Operations: Web Operations → WebOps: Other

Product: mozilla.org → Infrastructure & Operations

BMO Automation

Updated

7 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P4)

Security

(public)
