Closed Bug 613620 Opened 14 years ago Closed 13 years ago

Would like to be able to specify a backup region for a given region

Categories

(Webtools :: Bouncer, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justdave, Assigned: brandon)

References

Details

When we set a region to less than 100% throttle, the remaining traffic gets sent to the global pool. I'd like the option to specify a "fallback region" other than the global pool for this use. For example, we have a *lot* of mirrors (probably enough to actually handle the global traffic) just in Europe, whereas North America has far fewer mirrors than it needs for the number of users here. We'd get a much better user experience if the failover from North America all went to Europe than if it went into the global pool.
And as another example for this one, we have quite a number of single-country regions now because of countries that have very good internal infrastructure, but have limited access to the outside world. Those countries would be better served by having their failover traffic go to the parent continent region or in some cases a country region for a neighboring country that they have better connectivity to.
This is high on the user experience wishlist. We get complaints when people in North America have to download from a mirror in Cambodia, for example. :)
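To make the behaviour described above concrete, here is a minimal sketch of a throttle split, written in Python for illustration. It is not Bouncer's actual code: the function name, the `GLOBAL_POOL` constant, and the idea of comparing a random draw against the throttle percentage are all assumptions about how such a split could work.

```python
import random

# Hypothetical name for the catch-all pool; not taken from Bouncer's code.
GLOBAL_POOL = "global"


def route_by_throttle(region, throttle_percent, rng=random.random):
    """Return the pool that serves a request from `region`.

    `throttle_percent` is the share of the region's traffic (0-100) that
    stays in the region; the remainder goes to the global pool.
    `rng` must return a float in [0, 1); it is injectable for testing.
    """
    if rng() * 100 < throttle_percent:
        return region
    return GLOBAL_POOL
```

With a throttle of 60, roughly 60% of North American requests would stay in North America and the rest would land in the global pool, which is the behaviour this bug asks to make configurable.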
Severity: normal → major
Priority: -- → P2
Assignee: nobody → anthony
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: [Rik Q3]
This would also fix the problem we created in bug 646076, where we tried to create an internal mirror, and ended up sending global traffic to it.
Any update? I ask because of bug#646076.
QA is also waiting for this bug to be fixed. With a backup region and the bouncer set to internal mirrors we could reduce the time for our update tests during release testing drastically. Thanks for any update on this bug.
I've finally found the time to tackle some Bouncer bugs lately. I'm gonna try to come up with a first pass by the end of this week.
Any update here? This is holding up a cascade of build network isolation bugs for releng/IT.
(In reply to Chris Cooper [:coop] from comment #8)
> Any update here? This is holding up a cascade of build network isolation
> bugs for releng/IT.
ping?
Anthony is looking forward to working on this but still has a few bugs to tie up on his current project before he can get started here. I expect it to be another weekish before he can have a serious look at this. Thanks for bearing with us!
We've been blocking bug 617414 on this for nearly 6 months now. What's the latest?
QA is also blocked on this; it would let us drastically speed up the update tests for releases (external mirrors vs. internal servers). We would appreciate some positive feedback, and confirmation that it can be implemented in the near future.
(In reply to Ben Hearsum [:bhearsum] from comment #11)
> We've been blocking bug 617414 on this for nearly 6 months now. What's the
> latest?
per offline discussion with LauraT, she will investigate and get back to us with a timeline.
(In reply to John O'Duinn [:joduinn] from comment #13)
> per offline discussion with LauraT, she will investigate and get back to us
> with a timeline.
LauraT, do you have an update here? Would be great to see a timeline.
(In reply to Henrik Skupin (:whimboo) from comment #14)
> (In reply to John O'Duinn [:joduinn] from comment #13)
> > per offline discussion with LauraT, she will investigate and get back to us
> > with a timeline.
>
> LauraT, do you have an update here? Would be great to see a timeline.
rik/LauraT: ping?
All right, sorry for the delay. Sitting down with wenzel and Rik tomorrow to get briefed. After that it's probably a week of work + a week review/test/QA. Let's commit to having this ready to go in about two weeks' time.
Assigning this to Brandon for now, he and Laura got this.
Assignee: anthony → bsavage
Whiteboard: [Rik Q3]
(In reply to Laura Thomson :laura from comment #16)
> All right, sorry for the delay. Sitting down with wenzel and Rik tomorrow
> to get briefed. After that it's probably a week of work + a week
> review/test/QA. Let's commit to having this ready to go in about two weeks'
> time.
(In reply to Fred Wenzel [:wenzel] from comment #17)
> Assigning this to Brandon for now, he and Laura got this.
ping - any revised ETA?
Brandon's been working on this, and is not done yet. He's out until Tuesday but he may see this and give us an ETA before then.
I worked on this before I left but didn't get quite finished. It'll be done middle of next week when I return.
(In reply to Brandon Savage [:brandon] from comment #20)
> I worked on this before I left but didn't get quite finished. It'll be done
> middle of next week when I return.
cool, thanks Brandon. We'll plan work on our side accordingly.
Just to be clear, do you want a single backup for a region or the ability to specify multiple backups? This affects how I complete my work.
Pull request here, for single region fallback. https://github.com/fwenzel/tuxedo/pull/8
(In reply to Brandon Savage [:brandon] from comment #22)
> Just to be clear, do you want a single backup for a region or the ability to
> specify multiple backups? This affects how I complete my work.
Dave, what say you?
Not to influence you, but there is already a patch for specifying a single backup region. Multiple backup regions would take longer.
Single backup works for 90% of our use cases.
(In reply to Ben Hearsum [:bhearsum] from comment #24)
> (In reply to Brandon Savage [:brandon] from comment #22)
> > Just to be clear, do you want a single backup for a region or the ability to
> > specify multiple backups? This affects how I complete my work.
>
> Dave, what say you?
(In reply to Brandon Savage [:brandon] from comment #25)
> Not to influence you, but there is already a patch for specifying a single
> backup region. Multiple backup regions would take longer.
(In reply to Dave Miller [:justdave] from comment #26)
> Single backup works for 90% of our use cases.
justdave: ok, but is this enough to unblock bug#646076?
(In reply to John O'Duinn [:joduinn] from comment #27)
> (In reply to Ben Hearsum [:bhearsum] from comment #24)
> > (In reply to Brandon Savage [:brandon] from comment #22)
> > > Just to be clear, do you want a single backup for a region or the ability to
> > > specify multiple backups? This affects how I complete my work.
> >
> > Dave, what say you?
>
> (In reply to Brandon Savage [:brandon] from comment #25)
> > Not to influence you, but there is already a patch for specifying a single
> > backup region. Multiple backup regions would take longer.
>
> (In reply to Dave Miller [:justdave] from comment #26)
> > Single backup works for 90% of our use cases.
>
> justdave: ok, but is this enough to unblock bug#646076?
Brandon, does having a single backup mean that IP blocks can be completely restricted from going outside of their primary/backup, even if both are heavily loaded?
The way that the system currently works is that when a request is throttled, the request is forwarded to the global pool. Under the new code I've written, if a request is throttled and there exists a backup region, the request is forwarded there. If the request does not have a backup region, or no suitable mirrors for the backup region can be found, the request is forwarded to the global pool as a sanity check.
Bouncer makes no examination or distinction about load when it calculates a mirror. It doesn't test the mirror to find out how loaded it is. While an acceptable backup with a returned mirror would in fact prevent it from going outside the primary/secondary, it wouldn't be related to load.
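The selection order described in this comment can be sketched as follows. This is an illustrative Python model only, not the code from the pull request: the `Region` class, its field names, and `pick_pool_when_throttled` are hypothetical, and "suitable mirrors" is simplified to "the backup region has at least one mirror".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Region:
    # Hypothetical model; field names are not taken from Bouncer/Tuxedo.
    name: str
    mirrors: List[str] = field(default_factory=list)
    backup: Optional[str] = None  # name of the backup region, if any


def pick_pool_when_throttled(
    region: Region,
    regions: Dict[str, Region],
    global_mirrors: List[str],
) -> Tuple[str, List[str]]:
    """Choose where a throttled request from `region` is forwarded.

    Order: the backup region if one is set and has mirrors, otherwise
    the global pool. Mirror load is deliberately not considered.
    """
    backup = regions.get(region.backup) if region.backup else None
    if backup is not None and backup.mirrors:
        return backup.name, backup.mirrors
    # No backup configured, or the backup has no usable mirrors.
    return "global", global_mirrors
```

Note that nothing here inspects how busy a mirror is, which matches the point above that staying inside the primary/secondary pair is a consequence of configuration, not of load.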
(In reply to Brandon Savage [:brandon] from comment #29)
> The way that the system currently works is that when a request is throttled,
> the request is forwarded to the global pool. Under the new code I've
> written, if a request is throttled and there exists a backup region, the
> request is forwarded there. If the request does not have a backup region, or
> no suitable mirrors for the backup region can be found, the request is
> forwarded to the global pool as a sanity check.
>
> Bouncer makes no examination or distinction about load when it calculates a
> mirror. It doesn't test the mirror to find out how loaded it is. While an
> acceptable backup with a returned mirror would in fact prevent it from going
> outside the primary/secondary, it wouldn't be related to load.
I had a long conversation with Brandon about this, and I don't think the current implementation will address the use case we want for bug 646076. According to him, even with a backup specified requests will fall back to the global pool if an acceptable mirror isn't found in the primary or backup region. That's definitely the right behaviour for the real world (IMO), but for internal mirror purposes having some sort of way to make certain IP blocks or regions not fall back to the global region would be good. Maybe that's a follow-up bug, though?
Pull request, including the new feature in Comment 30, here: https://github.com/fwenzel/tuxedo/pull/9/commits Waiting for review and testing.
(In reply to Brandon Savage [:brandon] from comment #31)
> Pull request, including the new feature in Comment 30, here:
> https://github.com/fwenzel/tuxedo/pull/9/commits Waiting for review and
> testing.
I don't know the Tuxedo code, but it looks like this will do what we need, whee! Thank you!
The code changes (which were r+'ed) include a flag, which when set, prevents global pool failover.
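As a rough illustration of what such a flag changes, the sketch below builds the ordered list of pools a request may be served from. It is an assumption-laden Python model, not the merged code: `fallback_chain` and the `prevent_global_fallback` parameter name are hypothetical, and the real flag's name and storage are not stated in this bug.

```python
from typing import List, Optional


def fallback_chain(
    primary: str,
    backup: Optional[str],
    prevent_global_fallback: bool,
) -> List[str]:
    """Return the pools to try, in order, for a request from `primary`.

    The global pool is appended last unless the (hypothetically named)
    prevent_global_fallback flag is set for the region.
    """
    chain = [primary]
    if backup:
        chain.append(backup)
    if not prevent_global_fallback:
        chain.append("global")
    return chain
```

For an internal-mirror region with the flag set and no backup, the chain contains only the region itself, so traffic from those IP blocks never reaches the global pool.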
(In reply to Brandon Savage [:brandon] from comment #33)
> The code changes (which were r+'ed) include a flag, which when set, prevents
> global pool failover.
Sweet to see that r+'d patch landed in github! Thank you Brandon. So... of course the next question is: how long will this take to get tested and deployed to production?
Brandon, would you mind giving me a reply if that also helps Mozilla QA with the update testing of Firefox as mentioned in comment 6? I assume I should file a new bug which depends on the resolution of that one.
(In reply to Henrik Skupin (:whimboo) from comment #35)
> Brandon, would you mind giving me a reply if that also helps Mozilla QA with
> the update testing of Firefox as mentioned in comment 6? I assume I should
> file a new bug which depends on the resolution of that one.
AFAICT, your use case is the same as RelEng's.
(In reply to John O'Duinn [:joduinn] from comment #34)
> (In reply to Brandon Savage [:brandon] from comment #33)
> > The code changes (which were r+'ed) include a flag, which when set, prevents
> > global pool failover.
>
> Sweet to see that r+'d patch landed in github! Thank you Brandon.
>
> So... of course the next question is: how long will this take to get tested
> and deployed to production?
ping?
Yes, this will fit the use case in Comment 6.
John: I'll merge some code for bug 700482 today and open a bug to get this deployed.
Depends on: 740002
(In reply to Anthony Ricaud (:rik) from comment #39)
> John: I'll merge some code for bug 700482 today and open a bug to get this
> deployed.
:rik, Thanks for that. I see bug#740002 is now closed as FIXED. Anything left to do here?
This still needs to be staged, QA'ed, and released. Fred and Rik probably have a better idea than I do about how to get that process going.
This is staged. You can check on https://tuxedo.stage.mozilla.com/. I can create accounts for the admin if you need to test the features. We need to test this since it strongly impacts the download experience. After you've tested this, we'll open a bug for pushing to production.
Thanks to Rik I've got an account on tuxedo stage now. I've set up the region/mirror/ip blocks/country like we want them for bug 646076, and once releng-mirror01 is up and running again (bug 741774) I can verify the disable-global-fallback part of this bug.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 749207
Target Milestone: --- → 2.0
Web QA has temporarily deferred testing on this functionality, and tested heavily (positive/negative) around the changes that we shipped, tonight, to Bouncer 2.0. We will revisit this at a later date, TBD, and may need Ben's help :-)