788272 - [tracking] Split versioncheck into a separate repository

Reporter · Description · 12 years ago

We've got versioncheck split out into a separate repo here:

https://github.com/mozilla/versioncheck/
http://versioncheck.readthedocs.org/en/latest/

This is a tracking bug to figure out how to make this completely independent.

Andy McKay (Reporter) · Updated · 12 years ago

Depends on: 783382

Andy McKay (Reporter) · Updated · 12 years ago

Depends on: 788283

Andy McKay (Reporter) · Updated · 12 years ago

Depends on: 788302

Andy McKay (Reporter) · Updated · 12 years ago

Assignee: nobody → amckay

Toby Elliott [:telliott] · Comment 1 · 12 years ago

So, we're looking at a system with huge numbers of reads, a very small number of writes, and an unknown (to me) number of variable responses.

Let's assume for now that the system doesn't have access to the central DB. The current SQL is pretty damn amazing, but it's probably the hardest approach to scale, and the most resource-intensive. So versioncheck gets some sort of update ping, or an item put in its processing queue, from the central data.
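
A minimal sketch of that push model, assuming the central data side puts one event per changed add-on on a queue. The `UpdateEvent` shape and the `drain_updates` helper are hypothetical, a plain `dict` stands in for whatever local store versioncheck would actually use, and `queue.Queue` is only the in-process standard-library queue, not a proposal for the real transport.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class UpdateEvent:
    """Hypothetical message sent by the central data side when an add-on changes."""
    addon_id: str
    payload: dict = field(default_factory=dict)  # whatever versioncheck needs to answer requests


def drain_updates(events: queue.Queue, store: dict) -> int:
    """Apply all pending update events to the local store without blocking.

    Later events for the same add-on overwrite earlier ones (last write wins).
    Returns the number of events applied.
    """
    applied = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return applied
        store[event.addon_id] = event.payload
        events.task_done()
        applied += 1
```

Reads would then consult only the local store and never touch the central DB.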

Some possibilities for serving to think about, off the top of my head, roughly in order of efficiency:

1) Static files, using mod_rewrite or similar to map requests. Are there too many combinations to make this viable? Are there parameters coming from the client that make this impossible? Regenerating a set of static files each time an app gets updated doesn't seem like too much work, so it's a question of the size of the data.

2) Grabbing relevant pieces out of memcache. This is unlikely to work if the prior one doesn't, unless there's one hashed piece of data with a whole lot of possible values, or several variables that combine into a large number of columnA-columnB-columnC permutations. The latter doesn't strike me as likely in this case. So this would be a case of doing a couple of lookups and pouring the results into a template, with the update feed keeping the memcached data current.

3) Feeding the data queue into a denormalized DB or two. One row per result is basically solution #1, but getting the thing down to, say, one highly optimized join might be preferable.

4) MySQL replication and the current DB setup. This is what we have now.
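
For option 1, the core question is whether each request maps deterministically to one file. A minimal sketch, assuming the response depends only on the five parameters that come up in the log analysis later in this bug (reqVersion, id, appID, appVersion, appOS). `static_path`, `KEY_PARAMS`, the `/srv/versioncheck` root and the `response` filename are all hypothetical; a mod_rewrite rule would need to produce the same path, and that rule is not shown here.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, quote

# Hypothetical: the query parameters assumed to determine the response.
KEY_PARAMS = ("reqVersion", "id", "appID", "appVersion", "appOS")


def static_path(query_string: str, root: str = "/srv/versioncheck") -> PurePosixPath:
    """Map a versioncheck query string to a pregenerated static file path.

    Parameters outside KEY_PARAMS are ignored, so requests that differ only in
    ignored parameters share one file. Missing or blank values become "_".
    """
    params = parse_qs(query_string, keep_blank_values=True)
    segments = []
    for name in KEY_PARAMS:
        value = params.get(name, [""])[0]
        # Percent-encode everything, including "/", so ids such as "{guid}" or
        # "name@example.com" become single safe path segments.
        segment = quote(value, safe="") or "_"
        if segment in (".", ".."):
            segment = segment.replace(".", "%2E")  # never emit a relative path segment
        segments.append(segment)
    return PurePosixPath(root, *segments, "response")
```

The same tuple of values could double as the memcache key for option 2.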

It's also worth noting that with the additional attention we're giving to versioncheck right now, it might be a good time to be thinking about the future. The current API doesn't seem to be serving anyone's needs; it's pretty operationally expensive from our side, and given the number of parameters we just ignore, it's not doing what the client needs. Who would we talk to about figuring out a better API overall? Obviously we're going to need to support the old one for some time, but if we can do better by having a cleaner API, that's also worth exploring.

Toby Elliott [:telliott] · Comment 2 · 12 years ago

It also occurs to me that we could do some cool hybrid approaches. For example, what if we pregenerated the most frequent, say, 100K requests and put them in flat files? We could then use mod_rewrite and a 404 handler to take care of the rest. I don't know what percentage of requests that would cover, but a few log files would tell us.
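
A sketch of the hybrid idea, assuming each request has already been reduced to a hashable key (for example a tuple of the relevant parameters). `pregenerate` and `serve` are hypothetical names; the dict plays the role of the flat files and `render_dynamic` plays the role of the 404 handler.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Mapping


def pregenerate(request_keys: Iterable[Hashable], top_n: int,
                render: Callable[[Hashable], str]) -> dict:
    """Render responses for the top_n most frequent request keys seen in a log sample."""
    counts = Counter(request_keys)
    return {key: render(key) for key, _ in counts.most_common(top_n)}


def serve(key: Hashable, pregenerated: Mapping[Hashable, str],
          render_dynamic: Callable[[Hashable], str]) -> tuple[str, str]:
    """Return ("static", body) on a hit, else ("dynamic", body) from the fallback.

    The fallback branch corresponds to the 404 handler: only requests outside
    the pregenerated set pay the cost of dynamic rendering.
    """
    body = pregenerated.get(key)
    if body is not None:
        return "static", body
    return "dynamic", render_dynamic(key)
```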

This is a good opportunity to be creative :)

Toby Elliott [:telliott] · Comment 3 · 12 years ago

Did some log crunching today. Got a log file with 14.5M queries in it.

Blank requests accounted for 18% of all queries. Those are easy.

The top 50 ids accounted for 60% of the traffic. 

Looking at the top id, there were 33 combinations of reqVersion, appID and int(appVersion), with the bulk of the combinations being due to appVersion (reqVersion+appID only produced 4 combinations). Even there, the top 4 combinations accounted for well over 75% of requests.
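
A sketch of how the first two figures could be computed, assuming each log line has already been parsed into a dict of query parameters. The bug doesn't define a "blank" request; here it is assumed to mean a request with no `id`, and the top-k share is taken over all requests, blanks included. `traffic_shares` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Mapping


def traffic_shares(requests: Iterable[Mapping[str, str]], top_k: int = 50) -> dict:
    """Return the share of blank requests and the share taken by the top_k ids.

    Both shares are fractions of all requests seen.
    """
    total = 0
    blank = 0
    per_id = Counter()
    for req in requests:
        total += 1
        addon_id = req.get("id", "")
        if not addon_id:
            blank += 1  # assumed definition of a "blank" request
        else:
            per_id[addon_id] += 1
    if total == 0:
        return {"blank": 0.0, "top_ids": 0.0}
    top = sum(n for _, n in per_id.most_common(top_k))
    return {"blank": blank / total, "top_ids": top / total}
```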

Toby Elliott [:telliott] · Comment 4 · 12 years ago

Also worth noting: Darwin, Linux, WINNT and blank make up 99.something percent of appOS values. Heck, WINNT and blank alone account for about 97%.

That means we could get the bulk of hits with ~100 ids × 5 appIDs × 2 reqVersions × ~20 versions × 2 appOS. That's 40K files.
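
Multiplying out the estimate (the "~" values are rough, so this is an order-of-magnitude figure):

```latex
\underbrace{100}_{\text{ids}} \times \underbrace{5}_{\text{appIDs}} \times \underbrace{2}_{\text{reqVersions}} \times \underbrace{20}_{\text{versions}} \times \underbrace{2}_{\text{appOS}} = 40{,}000 \text{ files}
```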

Another way to look at it: the top 10K combinations of reqVersion, id, appID, appVersion and appOS account for 78% of all queries (and blank for another 18%, so you're up to 96%). That's got to be cacheable. Sample the logs once a week and regenerate the files.
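
A sketch for choosing the number of files from a log sample, assuming requests are reduced to (reqVersion, id, appID, appVersion, appOS) tuples: it returns the smallest N whose top-N keys reach a target share of all requests. `top_n_for_coverage` is a hypothetical name. Blank requests can be passed in as their own key so they count toward the total, matching how the 78% and 18% figures above are both shares of all queries.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_n_for_coverage(keys: Iterable[Hashable], target: float) -> tuple[int, float]:
    """Smallest N such that the N most frequent keys cover at least `target` of all requests.

    Returns (N, achieved_coverage). If the target can't be reached (target > 1),
    every distinct key is returned with coverage 1.0.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0 or target <= 0:
        return 0, 0.0
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return n, covered / total
    return len(counts), 1.0
```

Run weekly on a fresh sample, the returned N would be the number of files to regenerate.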

Andy McKay (Reporter) · Comment 5 · 12 years ago

As we chatted about on IRC, the goal of this bug is the separation of the code base and deployment.

Happy to address rewrites/caching/rearchitecture in another bug. And there's some good number crunching in there.

Wil Clouser [:clouserw] · Updated · 11 years ago

Component: General → API

Product: Marketplace → addons.mozilla.org

Andy McKay (Reporter) · Updated · 11 years ago

Assignee: amckay → nobody

Andy McKay (Reporter) · Comment 6 · 11 years ago

I don't think this is going to happen. If someone wants to go for it, please do.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WONTFIX

Nobody; OK to take it and work on it (Assignee) · Updated · 8 years ago

Product: addons.mozilla.org → addons.mozilla.org Graveyard


Categories: addons.mozilla.org Graveyard :: API, defect

Tracking: Not tracked

People: andy+bugzilla (Reporter), Unassigned

Security: public
