Closed
Bug 467502
Opened 16 years ago
Closed 16 years ago
QA AMO behind Zeus ZXTM load balancer
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: mrz, Assigned: oremj)
Attachments
(1 file)
2.52 KB,
text/plain
Need some QA on AMO behind Zeus ZXTM. Site is staged at 63.245.209.107 and only answers https right now.
Reporter | ||
Comment 1•16 years ago
I'll save Reed the trouble and add him.
Assignee: server-ops → oremj
Group: infra
Reporter | ||
Comment 2•16 years ago
Works for http too, redirects.

Desktop-Computer:~ mrz$ curl -H'Host: addons.mozilla.org' -v http://addons.mozilla.org/
* About to connect() to addons.mozilla.org port 80 (#0)
* Trying 63.245.209.107... connected
* Connected to addons.mozilla.org (63.245.209.107) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
> Accept: */*
> Host: addons.mozilla.org
>
< HTTP/1.1 301 Moved Permanently
< Content-Length: 0
< Date: Tue, 02 Dec 2008 07:22:04 GMT
< Connection: Keep-Alive
< Location: https://addons.mozilla.org/
< Content-Type: text/html
<
* Connection #0 to host addons.mozilla.org left intact
* Closing connection #0
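The same redirect check can be scripted. The sketch below is a minimal Python equivalent of the curl call above, assuming the staging IP from the description is reachable from wherever it runs. The helper names `is_https_redirect` and `fetch_with_host` are hypothetical and not part of any AMO or Zeus tooling.

```python
import http.client


def is_https_redirect(status, location, host):
    """Return True if the response is a 301/302 pointing at https://<host>/..."""
    if status not in (301, 302):
        return False
    return location is not None and location.startswith("https://" + host + "/")


def fetch_with_host(ip, host, path="/", timeout=10):
    """Send a plain-HTTP GET to `ip` while presenting `host` in the Host header.

    Returns (status, Location header or None). Redirects are not followed,
    so the caller sees the load balancer's own answer.
    """
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        # Passing an explicit Host header stops http.client from adding its own.
        conn.request("GET", path, headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_with_host("63.245.209.107", "addons.mozilla.org")` and passing the result to `is_https_redirect` would be expected to mirror the 301 shown in the curl trace, but only while the staged site is still up.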
Comment 3•16 years ago
curl -H'Host: addons.mozilla.org' -v -k https://63.245.209.107/en-US/firefox/

wfm. Did you duplicate the cookie rules that the NS currently has for AMO?
Comment 4•16 years ago
Not looking good thus far:

[1] Taking upward of ~40 seconds once I've clicked on the "Add to Firefox" button
[2] Taking even longer --60 seconds or so-- to get the file-failed-to-download message of:
[3] "Firefox could not install the file at https://addons.mozilla.org/en-US/firefox/downloads/file/42123/adblock_plus-1.0-fx+sm+tb.xpi because: Download error -228"

Maybe I should've added services.addons.mozilla.org, too, to my HOSTS file?

https://services.addons.mozilla.org/en-US/firefox/api/1.1/search/live%20http%20headers/all/10/WINNT/3.0.4

More tomorrow; bed now :-P
Reporter | ||
Comment 5•16 years ago
(In reply to comment #3)
> Did you duplicate the cookie rules that the NS currently has for AMO?

Those were specific to the Netscaler's version of caching (or rather how it chose to ignore/break things). As such those rules aren't directly "duplicate-able" and I'm not sure if they're needed.

This bug is here to track all of that though :)
Reporter | ||
Comment 6•16 years ago
oremj - this is on zxlb04 / 10.2.10.72. Still using a self-signed SSL cert, feel free to replace it with a moco signed *.mozilla.org one or something.
Comment 7•16 years ago
Comment 8•16 years ago
On prod, I get the following two things right after that 302:

https://addons.mozilla.org/en-US/firefox/

GET /en-US/firefox/ HTTP/1.1
Host: addons.mozilla.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: https://addons.mozilla.org/en-US/firefox/
Cookie: __utma=164683759.1720830908.1228207447.1228255323.1228255654.11; __utmz=164683759.1228207447.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmb=164683759; __utmc=164683759; AMOappName=firefox

HTTP/1.x 200 OK
Date: Tue, 02 Dec 2008 21:47:41 GMT
Content-Length: 10575
Connection: Keep-Alive
Via: NS-CACHE-6.0: 4
Server: Apache/2.2.3 (Red Hat)
X-Powered-By: PHP/5.1.6
X-AMO-ServedBy: mrapp06
Content-Type: text/html; charset=UTF-8
Cache-Control: private
Content-Encoding: gzip
(In reply to comment #9)
> I don't see an addon cookie in your logout request in comment 7.

Yeah; I didn't modify anything--that's the straight output :-(
Assignee | ||
Comment 11•16 years ago
I think I fixed that issue. Let me know if logging in/out is working for you now.
Comment 12•16 years ago
Hey guys - what is the schedule/timeline for this? Would like someone on webdev to give stephend a hand with this. What are you guys thinking as a switch date?
Assignee | ||
Comment 13•16 years ago
Until we buy some hardware to front zeus I believe this is just going to be a backup for the netscaler. Mrz, what's the layer 4 load balancer evaluation timeline?
I'm still seeing the problems I mentioned in comment 4, along with: [4] When I log in, the page doesn't reflect that fact; it still displays "Log in", until I load another link or click Refresh. Log out, though, seems to now work correctly.
Reporter | ||
Comment 15•16 years ago
(In reply to comment #12)
> Hey guys - what is the schedule/timeline for this? Would like someone on
> webdev to give stephend a hand with this. What are you guys thinking as a
> switch date?

Right now I want a fall back plan for Dec 16 release. I think we have enough stuff shifted around for MU on Thursday though it would be great to have AMO ready to go on Zeus should that be necessary.

(In reply to comment #13)
> Until we buy some hardware to front zeus I believe this is just going to be a
> backup for the netscaler.

Zeus is "production" for vamo & fxfeeds. Moving those took a significant load off the Netscalers.

> Mrz, what's the layer 4 load balancer evaluation timeline?

Just got that Cisco ACE try-n-buy signed today, should get hardware soon.
(In reply to comment #14)
> I'm still seeing the problems I mentioned in comment 4, along with:
>
> [4] When I log in, the page doesn't reflect that fact; it still displays "Log
> in", until I load another link or click Refresh.
>
> Log out, though, seems to now work correctly.

Jeremy has fixed the performance and failure-to-download error (-228); the load balancer was returning https://releases..., rather than http://releases...

Issue [4], however, still remains: I'm getting the cached homepage view after I log in to AMO (it shows "Log in" as a link, rather than "My Account", "Developer Tools", and "Log out").
Assignee | ||
Comment 17•16 years ago
Do you have a way to reliably reproduce Issue 4?
(In reply to comment #17)
> Do you have a way to reliably reproduce Issue 4?

Yes, 100%.

1. Just click "Log in", log in, and then look at the page -- it'll still appear as though you're logged out, though if you refresh the page, you'll clearly see you're logged in.
2. If you click "Log out" from this page (once that link is visible after a reload/refresh), you can keep reproducing this.

http://pastebin.com/m461f37d2 has the headers
OK, yet-another-update:

At 2:26pm today, Jeremy did something that appears to have fixed issue # 4. Since then, I've tested:

* Account creation
* Password reset
* Different application views

Additionally, I've thrown my two Selenium testcases--search and the more general one--at it, with no errors.

Next up, the Dev CP...
Comment 20•16 years ago
What kind of performance improvement does the zeus box have over the netscalers?
Reporter | ||
Comment 21•16 years ago
(In reply to comment #20)
> What kind of performance improvement does the zeus box have over the
> netscalers?

Hard to quantify but it's a more scalable solution than Netscaler is, which is just two boxes. Zeus ZXTM runs on commodity hardware (on top of RHEL5) and I can keep adding ZXTM nodes to grow it.

The only thing Netscaler does in specialized hardware is SSL offload. The Netscaler 12k on paper can do 28,000 SSL tps but we see closer to 8,000 before it fails. Zeus claims a dual quad-core L5450 Xeon server can do about 12,000 SSL tps. At 1/20th the cost of Netscaler, I can get a couple boxes to handle 28k and can exceed that a lot easier.

(For production, I'm looking at dual E5460 Xeons)
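A back-of-the-envelope version of that sizing, as a rough Python sketch. It uses only the figures quoted in this comment and assumes SSL throughput scales linearly with the number of ZXTM nodes, which is an assumption and not something measured here. `nodes_needed` is a hypothetical helper name.

```python
import math


def nodes_needed(target_tps, per_node_tps):
    """Smallest whole number of nodes whose combined rate meets target_tps."""
    return math.ceil(target_tps / per_node_tps)


NETSCALER_PAPER_TPS = 28_000     # Netscaler 12k on-paper figure from the comment
NETSCALER_OBSERVED_TPS = 8_000   # rate seen before it fails, per the comment
ZEUS_NODE_TPS = 12_000           # Zeus's claim for one dual quad-core L5450 box

print(nodes_needed(NETSCALER_OBSERVED_TPS, ZEUS_NODE_TPS))  # 1
print(nodes_needed(NETSCALER_PAPER_TPS, ZEUS_NODE_TPS))     # 3
```

Under the linear-scaling assumption, one node at the vendor's claimed rate already covers the 8,000 tps actually observed on the Netscaler, and three nodes cover the 28,000 tps paper figure. Real capacity would depend on cipher mix and traffic, which this sketch ignores.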
Comment 22•16 years ago
Sweet, that sounds great.
I've tested uploading an add-on, as well as changing its description and screenshots, now, with no issues. I'd feel more comfortable if a few webdev folks could also poke at AMO, too :-P
Comment 24•16 years ago
I clicked around and it worked fine for me.
Reporter | ||
Comment 25•16 years ago
That's great! I'd like to do a performance test run and shift AMO over to ZXTM Tuesday night and into at least Wednesday morning/afternoon to pick up peak traffic. Maybe even Tuesday - Thursday. This will affect log processing for the duration. Trying to gauge how well ZXTM does under AMO load, how Gomez views external performance and how large the eventual ZXTM cluster might need to be to handle (right now it's 80% idle with vamo & fxfeeds). Any show stoppers to that?
Comment 26•16 years ago
(In reply to comment #25)
> This will affect log processing for the duration.

What's that mean? I think anything that affects our statistics is a show stopper.
Reporter | ||
Comment 27•16 years ago
Means deinspanjer will need to process the ZXTM log dir for AMO instead of (or in addition to) the normal AMO log dir. Anything else look at those logs?
Comment 28•16 years ago
(In reply to comment #27)
> Anything else look at those logs?

Yes, AMO has its own log processing, which went critical yesterday due to other log problems.
Comment 29•16 years ago
AMO's log parse scripts will need to be updated for the VAMO and AMO log changes, as logs from both sites are processed. Considering VAMO already moved to ZXTM earlier, the VAMO part of the addons processing scripts has been broken since then.
Comment 30•16 years ago
I don't understand how this keeps happening. The information on reconfiguring the log dirs and re-parsing the old logs is here: https://wiki.mozilla.org/Update:Developers/Statistics
Reporter | ||
Comment 31•16 years ago
(In reply to comment #30)
> I don't understand how this keeps happening.

Probably because there's no monitor to remind anyone about it when it stops grabbing current data. Is that something easy to do?
Comment 32•16 years ago
(In reply to comment #31)
> (In reply to comment #30)
> > I don't understand how this keeps happening.
>
> Probably because there's no monitor to remind anyone about it when it stops
> grabbing current data. Is that something easy to do?

I said in comment #28 that the current AMO log processing monitor I wrote for this purpose went critical yesterday, but it looks like it was ignored (at least from my point of view).

I have to agree with fligtar here, though. This continues to happen every time something happens to the logs (I can personally count at least 3-4+ times). What needs to be done to make sure that everybody is pulled in every time something happens or needs to happen to any AMO and/or VAMO logs?

deinspanjer was notified in bug 467412 about this change, but it doesn't seem like anybody told fligtar.
Comment 33•16 years ago
It's not even necessary that I be told - it was my understanding that we handed the log configuration aspect off to IT when that documentation was written and we had trouble last time and made the integrity check script and nagios monitor.
Assignee | ||
Comment 34•16 years ago
(In reply to comment #30)
> I don't understand how this keeps happening.
>
> The information on reconfiguring the log dirs and re-parsing the old logs is
> here: https://wiki.mozilla.org/Update:Developers/Statistics

Sorry, we switched over to Zeus in an emergency situation, so stats were secondary to keeping the service up. I did get a page yesterday(?):

"14:15 <@nagios> [79] dm-stats01:addons stats is CRITICAL: [FAIL][Adblock Plus] [Update Pings] Count from Wednesday changed by 85% from 2008-11-26 count of 5202612 [FAIL][Adblock Plus] [Update Pings] Count from Wednesday changed by 82% from 2008-11-19 count of 4348990 [FAIL][NoScript] [Update Pings] Count from Wednesday changed by 89% from 2008-11-26 count of 1818583 [FAIL][NoScript] [U".

I think I've come up with a solution that will fix the logs without changing the configs. I'll start running the script in Attachment 35183 [details] [diff] which will consolidate all the zeus logs to one place.
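For readers unfamiliar with this kind of alert, the nagios page above reports week-over-week swings in update ping counts. The real integrity check script is not shown in this bug, so the following is only a guess at the general shape of such a check. The function names and the 50% threshold are assumptions made for illustration.

```python
def percent_change(previous, current):
    """Absolute change of `current` relative to `previous`, in percent."""
    if previous == 0:
        raise ValueError("previous count must be non-zero")
    return abs(current - previous) / previous * 100.0


def check_count(previous, current, threshold_pct=50.0):
    """Return 'FAIL' when the count moved by more than threshold_pct, else 'OK'.

    threshold_pct is a hypothetical value; the real monitor's limit is unknown.
    """
    return "FAIL" if percent_change(previous, current) > threshold_pct else "OK"
```

With the 2008-11-26 Adblock Plus count of 5202612 as `previous`, any `current` count that had dropped by roughly 85% would return "FAIL" under this threshold, which is the pattern expected when a log directory silently stops being parsed.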
Assignee | ||
Comment 35•16 years ago
Oops, meant Attachment 351833 [details].
Comment 36•16 years ago
Thanks Jeremy - can you update the update ping counter config to parse that directory and re-run the update ping script for last Wednesday (2008-12-03)?
Comment 37•16 years ago
It's probably stating the obvious but when the AMO test happens at 9pm tonight the log parsing scripts need to continue to work with the Zeus logs.
Assignee | ||
Comment 38•16 years ago
(In reply to comment #37)
> It's probably stating the obvious but when the AMO test happens at 9pm tonight
> the log parsing scripts need to continue to work with the Zeus logs.

I'm hardlinking the logs into the directory the stats scripts are already looking at, so hopefully everything will just work.
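The attached consolidation script is not reproduced in this bug, so the following is not that script. It is a minimal Python sketch of the hardlinking idea, assuming the Zeus log directory and the stats directory sit on the same filesystem (hard links cannot cross filesystems). `hardlink_logs` and its `prefix` parameter are hypothetical names.

```python
import os


def hardlink_logs(src_dir, dst_dir, prefix=""):
    """Hard-link every regular file in src_dir into dst_dir.

    Files that already exist in dst_dir are skipped, so the function can be
    re-run safely (for example from cron). Returns the link paths created.
    """
    os.makedirs(dst_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, prefix + name)
        if not os.path.isfile(src) or os.path.exists(dst):
            continue
        os.link(src, dst)  # same inode, no data copied
        created.append(dst)
    return created
```

Because a hard link is just another name for the same inode, the stats scripts would see the log contents without any change to their configured directory, which matches the goal stated above of fixing the logs without changing the configs.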
Assignee | ||
Comment 39•16 years ago
Site has been live behind the ZXTM boxes since ~9:10pm.
Still looking good so far.
Reporter | ||
Comment 41•16 years ago
From the vserver | Content Compression:

The following table holds a list of the MIME types for the content that will be compressed.

MIME Type
text/css
text/plain
text/html

Default Mime-types don't include javascript. There was some bug some time ago to include javascript. The matching Netscaler rule is:

add policy expression moz_javascript "RES.HTTP.HEADER Content-Type CONTAINS javascript"

Pretty sure I did a CONTAINS because I didn't want to have to match on these:

text/javascript
application/x-javascript

Added both of those to the addons-ssl vserver (sure wish there was a global setting).
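The difference between the two matching styles can be shown with a small Python sketch. It assumes the ZXTM list is compared against the media type exactly and that the Netscaler CONTAINS test is a plain substring match; neither behavior is confirmed in this bug, and the lowercasing is a further assumption. All names below are hypothetical.

```python
DEFAULT_COMPRESSIBLE = {"text/css", "text/plain", "text/html"}
ADDED_FOR_AMO = {"text/javascript", "application/x-javascript"}


def media_type(content_type):
    """Strip parameters such as '; charset=UTF-8' and lowercase the type."""
    return content_type.split(";", 1)[0].strip().lower()


def exact_match(content_type, allowed):
    """List-style matching: compress only exactly listed media types."""
    return media_type(content_type) in allowed


def contains_match(content_type, needle="javascript"):
    """Substring matching in the spirit of the CONTAINS rule quoted above."""
    return needle in content_type.lower()
```

With these definitions, `exact_match("application/javascript", DEFAULT_COMPRESSIBLE | ADDED_FOR_AMO)` is false while `contains_match("application/javascript")` is true, so an explicit list needs a new entry for every JavaScript media type a backend might send, whereas the substring rule picks them all up.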
Assignee | ||
Comment 42•16 years ago
Test complete.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Verified -- for my part, anyway, which was what this bug was about :-)
Status: RESOLVED → VERIFIED
Comment 44•16 years ago
Jeremy - can you take a look at comment #39? Has that script been run?
Comment 45•16 years ago
(In reply to comment #44)
> Jeremy - can you take a look at comment #39? Has that script been run?

You sure that's the right comment? Are we caught up on stats crunching now? I'm still hearing reports about broken stats.
Comment 46•16 years ago
I meant comment #36, but I'm filing another bug now because we now have 2 weeks of broken stats.
Comment 47•16 years ago
Is this fixed now? The stats are still way down. Even if we're not retroactively parsing the missed stats (yet?), the current stats don't look right either. E.g. https://addons.mozilla.org/en-US/firefox/statistics/addon/1865 fell off a cliff after Nov 25, and continues to drop despite a huge spike in downloads (due to a new release).
Comment 48•16 years ago
The new bug for the broken stats is bug 469376
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard