Closed
Bug 807839
Opened 13 years ago
Closed 11 years ago
Replace zeus trafficscript rule for splitting between reg/node
Categories
(Cloud Services Graveyard :: Server: Sync, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: rfkelly, Unassigned)
References
Details
Attachments
(3 files)
11.16 KB,
patch
|
telliott
:
review+
|
Details | Diff | Splinter Review |
5.22 KB,
patch
|
telliott
:
review+
|
Details | Diff | Splinter Review |
1004 bytes,
patch
|
telliott
:
review+
|
Details | Diff | Splinter Review |
Per email discussion, the implementation of the user API exposed on auth.services.mozilla.com is currently split between two services:
Requests to /user/1.0/<username>/node/weave are directed to node*.reg.smc
All other requests are directed to web*.reg.smc
The splitting is currently achieved by a Zeus TrafficScript rule. We're planning to migrate away from Zeus, so we need a new way to handle this.
I see two options:
1) Continue to handle this at the auth.smc VIP level, by copying the trafficscript logic into whatever replaces Zeus.
2) Send all traffic through to web*.reg.smc initially, and have it explicitly proxy to node*.reg.smc when needed.
I have a slight preference for option (2) as I think it makes the code and the deployment a bit simpler to understand overall. Option (1) has the advantage of less overhead since each node request has one less hop to travel through.
Ultimately this should be an Ops call, depending on which option makes the deployment simpler/cleaner/easier. I believe atoll has expressed a preference for option (2) via email, but I wanted to move the final decision into a bug for posterity.
I prefer (2), since that maintains an existing method (web.reg forwards to various backend VIPs - in this case, web.sreg and node.reg), rather than anything the app developers can't see and can't test in (1).
Reporter | ||
Comment 2•13 years ago
|
||
Thinking this through some more, I wonder if the node*.reg servers are actually useful at all. They were originally intended as a separate external entry point for the client, but it doesn't sound like we will ever actually implement that in production.
So what if we have reg talk directly to both sreg and snode to fulfill its duties?
In other words, rather than something like this:
reg --> sreg
|
-------> node --> snode
Just cut out the middleman and have:
reg --> sreg
|
-------> snode
I think the original theory for the node servers was that non-Sync products might want to request nodes, and two or three steps away from that on some future paths involves "node allocation SLA". In my opinion, reg + snode is just as easily scaled to provide "node allocation SLA" - if we ever desire to someday - rather than reg + node + snode, which is more moving pieces for no useful effect in any case.
Reporter | ||
Comment 4•13 years ago
|
||
(In reply to Richard Soderberg [:atoll] from comment #3)
> I think the original theory for the node servers was that non-Sync products
> might want to request nodes
I think (hope?) that by the time we need this, it'll be handled by tokenserver and friends rather than the existing node-assignment servers. So if we can simplify stuff in the meantime, I think that'll be a win.
I'll also note that taking node.reg out of the equation also removes the principal benefit for option (1) listed above - there is now no 'extra hop' cost to doing it inside server-reg rather than with a separate service.
The attached patch implements the basic idea, which is similar to the code currently in server-sreg - load an optional 'nodes' backend, delegate node-assignment calls to it, but continue to offer a "fallback_node" option for folks who want to skip all this hassle on smaller deployments.
Attachment #677650 -
Flags: review?(telliott)
Comment 5•13 years ago
|
||
Yeah, this was done in preparation for tokenserver, where the roles of account-creation and node assignment would be done by two separate services.
That, plus the idea that only we needed this particular chunk of code, so splitting it out and keeping it away from users who wanted to set up a 1-node system made their lives a little easier.
Updated•13 years ago
|
Attachment #677650 -
Flags: review?(telliott) → review+
What sections and/or section parameters from node "nodes.cfg" would need to be merged into reg "sync.cfg" as part of this patch?
Reporter | ||
Comment 7•13 years ago
|
||
(In reply to Richard Soderberg [:atoll] from comment #6)
> What sections and/or section parameters from node "nodes.cfg" would need to
> be merged into reg "sync.cfg" as part of this patch?
Just the "nodes" section. I will go through the dev and stage config files and make sure there's nothing lurking in any other sections that would be relevant here.
Reporter | ||
Comment 8•13 years ago
|
||
Committed in http://hg.mozilla.org/services/server-reg/rev/84105730250c
I have looked through the config files and I believe that copying the [nodes] section from node
"nodes.cfg" into reg "sync.conf" should be the only necessary configuration change. The reg machines will also need to be able to send traffic to the snode machines.
I'll make a tag and deployment bug for trying this out on stage.
Reporter | ||
Comment 9•13 years ago
|
||
Actually, my initial patch is incomplete. Node-assignment backends may, instead of just returning the assigned node, raise an AlreadyWrittenError with the node as payload.
That's a bit yuck to deal with without introducing a hard dependency on server-node-assignment, since you need to exception class in order to catch it properly. In the attached patch I've chosen to stub it out if mozsvcnodes can't be imported.
(I'm not sure I see the point of AlreadyWrittenError, but I'd rather change things in one place rather than fiddling with both server-reg and server-node-assignment)
Attachment #678934 -
Flags: review?(telliott)
Comment 10•13 years ago
|
||
AlreadyWrittenError exists because the code deployed on node and snode are the same thing. Therefort, when snode does the write, the function that proxies needs to be able to return "good, but take no action" as a result.
Reporter | ||
Comment 11•13 years ago
|
||
Ah, I see - so the controller doesn't try to write it into LDAP twice. Thanks.
Updated•13 years ago
|
Attachment #678934 -
Flags: review?(telliott) → review+
Reporter | ||
Comment 12•13 years ago
|
||
Reporter | ||
Comment 13•13 years ago
|
||
It turns out that the zeus script was also re-writing "/node/weave" to "/node/sync" when forwarding traffic to the node servers. So we have to use "sync" as the product name when using the MozSvcNodes API directly, rather than "weave". Patch attached.
Attachment #681813 -
Flags: review?(telliott)
Comment 14•13 years ago
|
||
Comment on attachment 681813 [details] [diff] [review]
server-reg patch to use "sync" as product name, not "weave"
It'd probably be a good idea to just search the entire codebase for weave references at this point.
Attachment #681813 -
Flags: review?(telliott) → review+
Reporter | ||
Comment 15•13 years ago
|
||
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•2 years ago
|
Product: Cloud Services → Cloud Services Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•