Closed Bug 524157 Opened 15 years ago Closed 14 years ago

Implement SiteSpect on AMO and SUMO [ref:00D7JfQw.5007API2S:ref]

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

All
Other
task
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bcutler, Assigned: oremj)

Details

(Whiteboard: Upgrade 2/4 @ 7pm)

Attachments

(1 file)

We are working with Tal Cohen to add a new domain (AMO) and subdomain (SUMO) to SiteSpect.  Some notes from Tal:

* Addons.mozilla.org needs to be set up as a new Site in SiteSpect. The reason is that the user tracking done by SiteSpect is largely cookie based. The cookies that SiteSpect assigns are bound to the domain configured in the Site configuration. Currently, that is set to .mozilla.com, not .org. I can walk you through this process so that you can do this on your own next time.

* Support.mozilla.com can either be added onto the existing site or be set up as a new site. It comes down to whether or not the support subdomain needs to be isolated (from a testing standpoint) from the www.mozilla.com site.

* Either way, [Tal] would suggest adding 2 additional IP addresses to our SiteSpect appliance to accommodate the new subdomains.
Please update the title to include the following string so that we can track this in our internal tracking system:

[ ref:00D7JfQw.5007API2S:ref ]
Summary: Implement SiteSpect on AMO and SUMO → Implement SiteSpect on AMO and SUMO [ref:00D7JfQw.5007API2S:ref]
Blake -

1. AMO & SUMO are in San Jose & Amsterdam.  Do you need the test run out of both places?

2. Both AMO & SUMO are behind ZXTM and not the Netscaler so there will be additional engineering effort to duplicate the Netscaler config on a different load balancer platform.

3. What's your timeline?
Assignee: server-ops → jeremy.orem+bugs
1) What are the implications of running tests out of only San Jose?  For the time being, we are only testing en-US pages.

2) Ideally, we would have SiteSpect implemented within 2-3 weeks.  That said, there is no pressing deadline.
Blake, what's the logic flow for this? 


New user, no cookie set 
 -> load balance between origin servers & sitespect

User, cookie equals $x
 -> send to sitespect

What is $x?  

AMO/SAMO has 24 servers.  By default SiteSpect would get 1/25 of all new traffic.  Is that fine?  Or should we do this on some percentage basis instead?
The affinity cookie logic should be:
   SSLB=1 -> Send to SiteSpect
   SSLB=0 -> Do not send to SiteSpect
   SSLB does not exist -> Split the traffic and set SSLB=0 if sending to non-SiteSpect server
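The affinity rules above can be read as a three-way decision on the SSLB cookie. The following is a minimal Python sketch of that decision, not the ZXTM TrafficScript attached to this bug; the `Route` enum and its pool names are hypothetical, and treating any SSLB value other than "1" as "origin" is an assumption.

```python
from enum import Enum
from http.cookies import SimpleCookie


class Route(Enum):
    SITESPECT = "sitespect"  # hypothetical pool containing only SiteSpect
    ORIGIN = "origin"        # hypothetical pool of origin web servers
    SPLIT = "split"          # new visitor: the load balancer splits the traffic


def route_for(cookie_header: str) -> Route:
    """Map the SSLB affinity cookie in a Cookie header to a routing decision."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get("SSLB")
    if morsel is None:
        return Route.SPLIT       # SSLB does not exist
    if morsel.value == "1":
        return Route.SITESPECT   # SSLB=1
    return Route.ORIGIN          # SSLB=0 (other values treated the same, by assumption)
```

For example, `route_for("lang=en-US; SSLB=1")` returns `Route.SITESPECT`, while a request with no SSLB cookie gets `Route.SPLIT`.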
>    SSLB does not exist -> Split the traffic and set SSLB=0 if sending to
> non-SiteSpect server

How's that currently working?  Are the origin webservers setting that?  I don't see that in the Netscaler config. (oremj?)
SiteSpect sets all of that. We never set an SSLB cookie.
I have to check my notes, but don't you do session-based affinity in addition to checking the SSLB values?
We don't.  Requires too much state info on the load balancer and takes a performance hit.
OK, so based on the https://bugzilla.mozilla.org/show_bug.cgi?id=509591 thread, it looks like users are either going to be served from cache (where the cache honors the SSLB cookie logic) or served directly from the VIP. When serving from the VIP, SiteSpect sets SSLB cookies and the web servers do not. However, users remain sticky to whatever web server they hit for 15 minutes.
Attached file: trafficscript rule
First draft of a possible ZXTM TrafficScript rule.

The first if-else block selects a pool based on the SSLB Cookie.

The second half of the code assumes this is a new user. For some percentage of the time, users will go to SiteSpect (this is different from a percentage of -users-, which I'm not sure how to do). The current SiteSpect setup for www.mozilla.com has 4 origin servers + SiteSpect, which means SiteSpect gets 1/5 of the traffic.

AMO has 24 servers.  Since 1/25 might not be right and isn't very flexible I went with the percent of time.
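To make the two options concrete: as one more equally weighted pool member, SiteSpect gets 1/(N+1) of new traffic (1/5 with 4 origin servers, 1/25 with 24), whereas a percentage rule can be tuned freely. The Python sketch below is illustrative only; `round_robin_share` and `pick_pool` are hypothetical names, and the per-request roll mirrors the "percentage of time" behaviour described above, not a per-user split.

```python
import random
from fractions import Fraction


def round_robin_share(origin_servers: int) -> Fraction:
    """Share of new traffic SiteSpect receives as one extra, equally weighted pool member."""
    return Fraction(1, origin_servers + 1)


def pick_pool(percent_to_sitespect: float, rng: random.Random) -> str:
    """Per-request split for visitors without an SSLB cookie.

    Each request is rolled independently, so this controls the percentage of
    requests (time) sent to SiteSpect, not the percentage of users.
    """
    return "sitespect" if rng.random() * 100 < percent_to_sitespect else "origin"
```

With a seeded `random.Random`, roughly 10% of 20,000 calls to `pick_pool(10, rng)` land on "sitespect"; `pick_pool(0, ...)` never does and `pick_pool(100, ...)` always does.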

On the Netscaler we specifically do not cache objects from SiteSpect.  

  add cache policy "mozcom-sitespect-no2" -rule "REQ.HTTP.HEADER Host CONTAINS www.mozilla.com && REQ.HTTP.HEADER Cookie CONTAINS SSLB=1" -action NOCACHE
  
This is a limitation of the Netscaler - it can't cache two different objects that have the same URI.

On ZXTM there are two ways to do this:

1. mimic the Netscaler and set http.cache.disable()
2. cache and set a unique http.cache.setkey()

For a high traffic site like AMO, I think #2 is desirable.

From the docs:

http.cache.setkey()

Allows multiple variants of the same URL to be considered distinct objects, even if the standard 'Vary' RFC semantics would consider the pages identical. Cached objects will be stored with this key, and subsequent requests for the same URL will only match if the same key is provided. An example use is to provide different cached content based on a portion of the User-Agent field of the request.
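Option #2 amounts to storing cached objects under (URI, variant key) rather than URI alone, so the SiteSpect and origin versions of a page can coexist. This toy Python cache only illustrates that idea; it is not ZXTM's implementation, and `VariantCache` and `variant_key` are hypothetical. The substring test deliberately mirrors the Netscaler rule's `Cookie CONTAINS SSLB=1` and shares its looseness.

```python
from typing import Callable, Dict, Tuple


class VariantCache:
    """Toy cache keyed by (URI, variant key), in the spirit of http.cache.setkey()."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], bytes] = {}

    def get_or_fetch(self, uri: str, key: str, fetch: Callable[[], bytes]) -> bytes:
        entry = (uri, key)
        if entry not in self._store:
            self._store[entry] = fetch()  # miss: fetch from the backend and cache it
        return self._store[entry]


def variant_key(cookie_header: str) -> str:
    # Assumption: same loose substring match as the Netscaler policy above.
    return "sitespect" if "SSLB=1" in (cookie_header or "") else "origin"
```

Two requests for /en-US/firefox/, one with SSLB=1 and one without, produce two separate cache entries; a later SSLB=1 request is served from the SiteSpect entry without refetching. Option #1 would instead skip caching entirely for SSLB=1 requests.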

oremj, what do you think of this rule?  Can you test it out in staging?
Blake, when can we test this on preview.amo?
Tal, would you recommend testing this setup on preview.amo?
Yes, I do recommend fully testing any infrastructure changes in a preview/test environment before pushing to a live site. Let me know once it is available and how I can access it. I'll run it through some validation tests on my end. Also, let me know if you need assistance setting up the SiteSpect side.
I'd rather wait to activate the rule until after SiteSpect is configured. Is it?
No, not yet.

Do you want to add additional IPs to SiteSpect, or do you want to reuse the existing IP address, but a different port?
different port is easiest on me :)
OK, I'm working on it now.

I'm setting up addons.mozilla.org in SiteSpect to listen on 10.2.80.232 port 81. 

What is the IP address that SiteSpect should proxy requests to?

Also, what is the URL/port that should be used to run a preview? For example, in the www.mozilla.com site, we have http://www.mozilla.com:9081/en-US/ configured as a channel that goes to 10.2.80.232 port 80. Can you set up http://addons.mozilla.com:9081/en-US/ to go to 10.2.80.232 port 81?
Any update on where I should proxy traffic to? (See my previous comment.)
Matt / Jeremy / Blake,
  I'm still waiting for a response to my question from 11/6
I'm hung up on one detail: addons only works over SSL, so do I point SiteSpect at an SSL VIP?
Feels like you need to make another VIP (10.2.80.222) and mimic the real AMO VIP minus the sitespect rules.
https://addons.mozilla.org:9081/ -> 10.2.80.232:81

Sitespect should proxy to 10.2.80.222 port 80.

Anything I missed?
OK, I've setup AMO on SiteSpect.

SiteSpect is listening on 10.2.80.232:81 and proxying to 10.2.80.222:80

There are 2 issues that need to be addressed:

1. There is an issue when previewing. When running a preview, SiteSpect sends a request to the following URL (the numeric values will differ):

https://addons.mozilla.org:9081/en-US/firefox/?PREVIEW=1259005667711618536&Site_ID=3&VariationGroup_ID=890&TestCampaign_ID=-999&FullPreview=0&EnablePreviewPopUp=1&IgnoreSiteVariations=0&IgnoreHigherSeqSiteVariations=1&IgnoreQuickChanges=0&Referer=&UsePreviewAssistant=0. 

When that request hits the load balancer, the load balancer proxies it correctly to SiteSpect. SiteSpect proxies it to 10.2.80.222:80. The response that SiteSpect receives is a redirection to 

http://addons.mozilla.org:9081/en-US/firefox/

Since it is being redirected to http instead of https, the request fails.

If you then modify the protocol to https and reissue the request it works correctly. 

This will be an issue for your business users.
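One plausible reading of this failure: SSL ends at the load balancer, so the origin VIP (10.2.80.222:80) sees plain HTTP and builds its absolute redirect with the http scheme. The Python sketch below illustrates that cause and one possible mitigation, rewriting the Location scheme for SSL-only hosts; both helpers are hypothetical, and the thread does not say how (or whether) this was actually fixed.

```python
from typing import AbstractSet
from urllib.parse import urlsplit, urlunsplit


def backend_redirect(request_scheme: str, host: str, path: str) -> str:
    """Hypothetical app behaviour: build an absolute Location header from the
    scheme of the connection the app itself received."""
    return f"{request_scheme}://{host}{path}"


def force_https(location: str, ssl_only_hosts: AbstractSet[str]) -> str:
    """Rewrite http:// redirects that point at hosts served only over SSL."""
    parts = urlsplit(location)
    if parts.scheme == "http" and parts.netloc in ssl_only_hosts:
        return urlunsplit(parts._replace(scheme="https"))
    return location
```

Here `backend_redirect("http", "addons.mozilla.org:9081", "/en-US/firefox/")` reproduces the broken redirect, and passing it through `force_https` with `{"addons.mozilla.org:9081"}` yields the https URL that works when reissued by hand.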

2. The second issue is related to failover. Your load balancer is sending heartbeats to /en-US/firefox/. When these fail, what is the expected behavior of the load balancer? When I tested, requests just died.
2. The :9081 virtual server only has SiteSpect in its traffic pool, so if you pull SiteSpect out, the virtual server will just die.
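The difference between the preview port and a production VIP can be sketched as pool selection with an optional fallback. This Python example is illustrative only; the member names are hypothetical, and the assumption is that failing over SiteSpect means falling back to the origin servers, while a pool with only SiteSpect and no fallback has nothing left to serve.

```python
from typing import Dict, List


def healthy_members(pool: List[str], health: Dict[str, bool]) -> List[str]:
    """Members whose most recent health check passed (unknown members count as down)."""
    return [m for m in pool if health.get(m, False)]


def choose_members(preferred: List[str], fallback: List[str],
                   health: Dict[str, bool]) -> List[str]:
    """Healthy members of the preferred pool, or of the fallback pool if none are up."""
    up = healthy_members(preferred, health)
    return up if up else healthy_members(fallback, health)
```

With SiteSpect down, `choose_members(["sitespect"], [], health)` returns an empty list (the :9081 case: requests die), while `choose_members(["sitespect"], ["web1", "web2"], health)` returns the origin servers.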
Do you need anything from me on this right now?
Well, I'd like to test failover before you introduce this into production. Otherwise there is no way to know what the overall behavior on your network will be when SiteSpect fails over.
(In reply to comment #27)
> Well, I'd like to test failover before you introduce this into production.
> Otherwise there is no way to know what the overall behavior on your network
> will be when SiteSpect fails over.

Did we test this already?
This was tested in the www.mozilla.org site. Since AMO and SUMO are new Sites in SiteSpect, and I assume need new Netscaler configurations, I'd feel more comfortable if we validate that this works in AMO and SUMO.
(In reply to comment #29)
> This was tested in the www.mozilla.org site. Since AMO and SUMO are new Sites
> in SiteSpect, and I assume need new Netscaler configurations, I'd feel more
> comfortable if we validate that this works in AMO and SUMO.

Comment #27 said you wanted to test failover; I want to know whether you tested that.

AMO/SUMO aren't behind the Netscaler.  They are behind Zeus ZXTM.  

Our side is already setup, looking to see what's holding this bug up.
First, sorry for the delayed response (holidays, audits, etc).

Second, from my perspective everything is set up. I was waiting on validating the ZXTM config, specifically failover. That said, if you are comfortable with the existing config, you can push it live. Let me know when it goes live so that I can run a round of validation against the live site.

Third: I can start working on SUMO. As with AMO, I'll set up SiteSpect to use the existing IP address of 10.2.80.232 and listen on a different port. Where should SiteSpect proxy traffic to for content?

Just an FYI, since you have the system in-house, you do have the ability to create these sites the same way that I do. I'm more than happy to provide training to you on how to configure the system and validate it.

Last, and on an unrelated note, I noticed that you are running an older version of SiteSpect, version 3.3.6. I recommend updating to our latest version, 3.3.8, which has security and feature updates. We will perform the update for you. The update process takes 30 to 60 minutes, during which time the system will be failed over, so no live traffic will route through SiteSpect and the control panel will be unavailable. Let me know when you would like to have this scheduled.
IT - when can you direct traffic away from this so the sitespect upgrade can be done?
(In reply to comment #32)
> IT - when can you direct traffic away from this so the sitespect upgrade can be
> done?

Any time you say. However, comment #31 suggests that during the upgrade the box will be unavailable anyway and will fail the health checks until it's back.

Does this need to be manually disabled anyway? When is the upgrade scheduled?
I can place the system into failover mode, which fails heartbeats. At that point I can perform the rest of the update remotely. Let me know when you want the update scheduled.
Can you upgrade it this Thursday @ 7?
Yes, I'll update the system at 7PM EST on 2/4/2010 

Tal
Whiteboard: Upgrade 2/4 @ 7pm
What's the next step?
addons.mozilla.org is configured in SiteSpect.
SiteSpect is listening on 10.2.80.232 port 81 for traffic and will proxy requests to 10.2.80.222 port 80. 
Your load balancer should be sending heartbeat requests to /en-US/firefox/?sshealth=1 (right now it is sending health checks to /en-US/firefox/).

After updating your load balancer heartbeat, you should be able to turn on traffic for addons. Let me know once you do so that I can run validation tests.

support.mozilla.com has not yet been provisioned. I just need to know where you want to proxy traffic to on your end (ip and port).
A quick note on the www.mozilla.com site in SiteSpect:

Right now we are setting cookies to the .mozilla.com domain. In order to deploy support.mozilla.com, I need to change how these cookies are set. The change will be to set the cookies to use the www.mozilla.com domain (and support.mozilla.com). This will impact your testing ability if you test across multiple subdomains. Please let me know if this is the case.
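Why the cookie domain matters, here and for AMO: a cookie set with Domain=.mozilla.com is sent to every *.mozilla.com host (including support.mozilla.com) but never to addons.mozilla.org, while Domain=www.mozilla.com no longer reaches support.mozilla.com. The Python function below is a simplified version of RFC 6265 domain matching for cookies that carry a Domain attribute (it ignores IP hosts and public-suffix rules); it is a sketch for illustration, not SiteSpect's code.

```python
def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain matching for a cookie with a Domain attribute."""
    host = request_host.lower()
    domain = cookie_domain.lower()
    if domain.startswith("."):
        domain = domain[1:]  # RFC 6265 ignores a single leading dot
    return host == domain or host.endswith("." + domain)
```

So `domain_match("support.mozilla.com", ".mozilla.com")` is true, whereas `domain_match("addons.mozilla.org", ".mozilla.com")` and `domain_match("support.mozilla.com", "www.mozilla.com")` are both false, which is why a test spanning subdomains would be affected by the change.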
Looking at your existing traffic, changing the cookies may not work.

Can we get on the phone and discuss this issue and possible solutions?

I am available today at 1:00PM EST, 2:00PM EST, and 4:00PM EST.
Blake, do we test across multiple domains?
We will test on multiple domains, but will not run a single test across multiple domains.
Is there an ETA on this?
AMO is pretty much ready whenever. SUMO needs to be moved to Zeus first. We should be able to do that tomorrow.
I've turned on sitespect for AMO.
Blocks: 549981
I've set SUMO traffic to go through sitespect.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard