Closed Bug 788272 Opened 12 years ago Closed 11 years ago

[tracking] Split versioncheck into a separate repository

Categories

(addons.mozilla.org Graveyard :: API, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: andy+bugzilla, Unassigned)

We've got versioncheck split out into a separate repo here:

https://github.com/mozilla/versioncheck/
http://versioncheck.readthedocs.org/en/latest/

This is a tracking bug to figure out how to make this completely independent.
Depends on: 783382
Depends on: 788283
Depends on: 788302
Assignee: nobody → amckay
So, we're looking at a system with a huge number of reads, a very small number of writes, and a number of distinct responses that I don't know yet.

Let's assume for now that the system doesn't have access to the central db. The current SQL is pretty damn amazing, but is probably the hardest approach to scale, and the most resource intensive. So, versioncheck gets some sort of update ping, or an item put in its processing queue, from the central data.

Possibilities for serving off the top of my head to be thought about, in general order of efficiency:

1) Static files, using mod_rewrite or similar to map requests. Are there too many combinations to make this viable? Are there parameters coming from the client that make this impossible? Regenerating a set of static files each time an app gets updated doesn't seem like too much work, so it's a question of data size scope.

2) Grabbing relevant pieces out of memcache. This is unlikely to work if the prior one doesn't, unless there's one hashed piece of data with a whole lot of possible values, or several variables that combine into a large number of columnA-columnB-columnC permutations. The latter doesn't strike me as likely in this case. So this would be a case of doing a couple of lookups and pouring them into a template. Then, the feed keeps the memcached data handy.

3) Feeding the data queue into a denormalized db or two. One row per result is basically solution #1, but getting the thing down to, say, one highly-optimized join might be preferable. 

4) MySQL replication and the current db setup. This is what we currently have.
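
To make option 1 concrete, here is a minimal Python sketch of how a request's query string might be normalized into a static file path that a rewrite rule could target. The parameter names (reqVersion, id, appID, appVersion, appOS) are the ones from the log analysis later in this bug; the function names, the "_" placeholder for blank values, the path layout and the ".rdf" suffix are all assumptions for illustration, not anything versioncheck actually does.

```python
from urllib.parse import parse_qs, quote

# Parameters that are assumed to determine the response.
FIELDS = ("reqVersion", "id", "appID", "appVersion", "appOS")


def major_version(app_version: str) -> str:
    """Collapse '15.0.1' to '15'; return '' when there is no leading integer."""
    head = app_version.split(".", 1)[0]
    return head if head.isdigit() else ""


def static_key(query_string: str) -> str:
    """Map a query string to a relative file path (hypothetical layout)."""
    params = parse_qs(query_string, keep_blank_values=True)
    values = {f: params.get(f, [""])[0] for f in FIELDS}
    values["appVersion"] = major_version(values["appVersion"])
    parts = []
    for f in FIELDS:
        # Percent-encode everything, including dots, so a value can never
        # act as a path separator or as a '..' segment.
        part = quote(values[f], safe="").replace(".", "%2E")
        parts.append(part or "_")  # '_' stands in for a blank value
    return "/".join(parts) + ".rdf"
```

Collapsing appVersion to its major number is what keeps the number of files small; whether that is safe depends on whether any response really varies by minor version.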

It's also worth noting that with the additional attention we're giving to versioncheck right now, it might be a good time to be thinking about the future. The current API doesn't seem to be serving anyone's needs - it's pretty operationally expensive from our side and given the number of parameters we just ignore, it's not doing what the client needs. Who would we talk to about figuring out a better API overall? Obviously we're going to need to support the old one for some time, but if we can do better by having a cleaner API, that's also worth exploring.
It also occurs to me that we could do some cool hybrid approaches. For example, what if we pregenerated the most frequent, say, 100K requests and put them in flat files? We could then use mod_rewrite and a 404 handler to take care of the rest. I don't know what sort of percentage that would be, but a few log files would tell us that.
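
A rough Python sketch of that hybrid idea, with the web server's rewrite rule and 404 handler stood in for by a single function: serve the pregenerated file when it exists, otherwise fall back to building the response dynamically. The `serve` function, its arguments and the "static"/"dynamic" labels are hypothetical, and `key` is assumed to be an already sanitized relative path.

```python
from pathlib import Path
from typing import Callable, Tuple


def serve(key: str, static_root: Path,
          fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (source, body) for a request key.

    `key` is assumed to be a sanitized relative path. The fallback
    plays the role of the 404 handler that does the expensive lookup.
    """
    candidate = static_root / key
    if candidate.is_file():
        # Hot path: one of the pregenerated flat files.
        return "static", candidate.read_text()
    # Cold path: everything that was not pregenerated.
    return "dynamic", fallback(key)
```

Counting how often each branch is taken against a real log sample would give the hit percentage mentioned above.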

This is a good opportunity to be creative :)
Did some logcrunching today. Got a logfile with 14.5M queries in it.

Blank requests accounted for 18% of all queries. Those are easy.

The top 50 ids accounted for 60% of the traffic. 

Looking at the top id, there were 33 combinations of reqVersion, appID and int(appVersion), with the bulk of the combinations being due to appVersion (reqVersion+appID only produced 4 combinations). Even there, the top 4 combinations accounted for well over 75% of requests.
Also worth noting: Darwin, Linux, WINNT and blank make up 99.something percent of appOS. Heck, WINNT and blank account for like 97%.

That means we could get the bulk of hits with ~ 100 ids * 5 appIDs * 2 reqVersions * ~20 versions * 2 appOS. That's 40K files. 
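
The 40K figure is just the product of those rough counts. A quick Python check, using the approximate factors from the estimate above (the dictionary keys are only labels):

```python
from math import prod

# Approximate counts taken from the estimate above.
factors = {
    "ids": 100,
    "appIDs": 5,
    "reqVersions": 2,
    "appVersions": 20,
    "appOS": 2,
}

total_files = prod(factors.values())
print(total_files)  # 40000
```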

Another way to look at it: the top 10K combinations of reqVersion, id, appid, appversion and appOS account for 78% of all queries (and blank for another 18%, so you're up to 96%). That's got to be cachable. Sample the logs once a week and regenerate the files.
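
For the weekly sampling, the coverage number can be recomputed with something as small as the Python sketch below. It assumes each log line has already been parsed into a tuple of (reqVersion, id, appID, appVersion, appOS); the parsing itself and the `top_n_coverage` name are hypothetical and not part of any existing tooling.

```python
from collections import Counter
from typing import Iterable, Tuple


def top_n_coverage(keys: Iterable[Tuple[str, ...]], n: int) -> float:
    """Fraction of all requests covered by the n most frequent keys."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total
```

The same `Counter.most_common(n)` list is also the set of files to regenerate.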
As we chatted about in IRC, the goal of this bug is the separation of the code base and deployment.

Happy to address rewrites/caching/rearchitecture in another bug. And there are some good ideas and number crunching in there.
Component: General → API
Product: Marketplace → addons.mozilla.org
Assignee: amckay → nobody
I don't think this is going to happen. If someone wants to go for it, please do.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Product: addons.mozilla.org → addons.mozilla.org Graveyard