Bug 626564 (Closed): Opened 14 years ago, Closed 12 years ago

Serve addons from the CDN instead of mirrors

Category: addons.mozilla.org Graveyard :: Administration (defect)
Severity: normal
Priority: Not set
Tracking: Not tracked
Status: RESOLVED DUPLICATE of bug 806615
People: Reporter: nthomas; Assignee: Unassigned

It looks like every version of an addon ever released is being carried by our core mirrors, e.g. http://releases.mozilla.org/pub/mozilla.org/addons/1865/?C=M;O=A They now take up more than 10G of space, spread over 8000+ directories and 37000+ files. I bet most install requests would be met by carrying only the last, or last two, releases for each addon. That would help sync times (fewer stats for rsync to do on an NFS mount!) and lower the disk usage - the 10G used is 10% of the size we ask our largest mirrors to carry, and we have to fit firefox/mobile/thunderbird/seamonkey/xulrunner/etc in as well. Could we serve the older versions from a mozilla server instead? We have machines like dm-download02 which share the NFS mount the mirrors sync from.
We can adjust this in our code to match whatever timing/version criteria we agree on. The trouble is that we have no way of knowing whether the add-on is actually on the mirror, so if it happens not to be, the person gets a super unfriendly popup in Fx and their add-on doesn't install. Right now our logic is:

> If add-on is public:
>     If add-on has been in the system for > 30 minutes:
>         Redirect to mirror
>     else:
>         Serve locally

I'm happy to adjust that, but I'll need some recommendations on what we change it to and how to deal with edge cases (e.g. someone uploads 3 versions of their add-on within 30 minutes). Additionally, you can't depend on the timestamp to determine the latest 2 versions; you'll either need to parse the versions from the filenames with Mozilla's crazy versioning regex, or talk to the AMO db.
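For concreteness, here is a minimal Python sketch of that decision. The model fields, helper names, and the local-serving URL are illustrative placeholders rather than AMO's actual code; only the mirror path follows the releases.mozilla.org layout shown in comment 0.

    # Sketch of the current redirect logic (field names and URLs are
    # placeholders, not AMO's real model or routes).
    from collections import namedtuple
    from datetime import datetime, timedelta

    AddonFile = namedtuple("AddonFile", "id addon_id filename is_public created")

    # The 30-minute window is a proxy for "have the mirrors had time to sync?"
    MIRROR_SYNC_WINDOW = timedelta(minutes=30)

    def download_url(f, now=None):
        """Pick where to send a download request for one add-on file."""
        now = now or datetime.utcnow()
        if f.is_public and (now - f.created) > MIRROR_SYNC_WINDOW:
            # Old enough that the mirrors should have synced it.
            return ("http://releases.mozilla.org/pub/mozilla.org/addons/"
                    "%s/%s" % (f.addon_id, f.filename))
        # Too new (or not public): serve from AMO itself so a not-yet-synced
        # mirror can't 404 the install.
        return "/downloads/file/%s/local" % f.id

    f = AddonFile(1, 1865, "example-1.0.xpi", True,
                  datetime.utcnow() - timedelta(hours=2))
    print(download_url(f))  # old enough -> mirror URL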
Something like bouncer is the ideal case from a user perspective, since it yields a list of active mirrors for any file, modulo the refresh interval. A check per addon version per mirror is going to be pretty expensive in time, though.

10G isn't a lot of space in an absolute sense, just compared to the 100G limit we ask the mirrors to carry. So we could perhaps split addons out into a new rsync module that only the top-tier mirrors carry (this doesn't solve the 30-minute sync assumption). Or we could have a few self-hosted boxes that do all the serving. What do you think, justdave?
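To put rough numbers on "pretty expensive", here is a back-of-the-envelope estimate. The mirror count and per-check latency are assumptions for illustration, not measurements of our infrastructure; the file count comes from comment 0.

    # Rough cost of a bouncer-style per-file, per-mirror availability check.
    files = 37_000        # from the directory counts in comment 0
    mirrors = 50          # hypothetical number of active mirrors
    secs_per_check = 0.2  # hypothetical latency per HEAD-style check

    checks_per_sweep = files * mirrors                       # 1,850,000 checks
    hours_per_sweep = checks_per_sweep * secs_per_check / 3600
    print(checks_per_sweep, round(hours_per_sweep, 1))       # ~103 hours if serial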
I already sync addons separately to the Chinese mirror, because it falls more than 30 minutes behind and people over there complain about 404s on addons every time we have a Firefox release:

rsync -av --timeout=900 --delete --delete-before stage-rsync.mozilla.org::mozilla-releases/addons/ /opt/releases.mozilla.org/pub/mozilla.org/addons > $logfile

and

rsync -av --timeout=900 --delete --delete-before --exclude=/addons stage-rsync.mozilla.org::mozilla-releases/ /opt/releases.mozilla.org/pub/mozilla.org > $logfile

in two separately-cronned jobs.
Is this bug still valid? If so, what are the next steps?
We moved the Firefox/Thunderbird/SeaMonkey downloads from the mirror network to a CDN a few weeks ago, and the major user of the mirror network is now AMO, serving addons from the 'tier 1' mirrors in releases.m.o. Perhaps we should be looking to move addon serving off the mirrors too, which would allow us to deprecate that system. Do we have any way to quantify how much traffic installs from AMO generate?
As far as I know we have no good way to quantify the bandwidth used by AMO downloads to releases.mozilla.org. AMO Ops or Metrics might be able to hint at a hit count, though. Another benefit of moving AMO downloads to a CDN: no rsync replication, which means no replication lag. The app could refer to the CDN resource immediately and wouldn't need any logic around timing (assuming resources don't *change* once created).
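Assuming an origin-pull CDN and files that never change once created, the earlier sketch collapses to a plain redirect. CDN_HOST and the URL layout below are placeholders (not a real hostname or AMO route), reusing the AddonFile stand-in from the sketch above.

    # With a CDN in front, the 30-minute window disappears entirely.
    CDN_HOST = "https://addons.cdn.example.net"  # hypothetical placeholder

    def download_url(f):
        if f.is_public:
            # No replication lag to wait out: the CDN fetches from origin on a miss.
            return "%s/addons/%s/%s" % (CDN_HOST, f.addon_id, f.filename)
        # Non-public files still come straight from AMO.
        return "/downloads/file/%s/local" % f.id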
https://addons.mozilla.org/en-US/statistics/addons_downloaded/?last=30 says we are averaging 1.1 to 1.2 million downloads a day. Can you calculate the average size of an add-on based on the files on the server?
Probably, but I'm not sure how much good that would be... it would be "average file size", not "average size of each download". I suspect a large % of the bandwidth comes from a handful of addons. Also, I believe old versions of addons are in the same directory tree... not sure how trivial it would be to exclude them.

In any case, there are 70,836 .xpi files, based on the directory structure at /mnt/netapp_amo/addons.mozilla.org/files, with an average file size of 268,529 bytes. At 1.2M hits/day, that's approximately 300GB/day transferred, or around 30 Mbps average (given the caveats above, of course... this could be wildly misleading). It appears to be slightly higher right around a Firefox release (~1.3M hits/day), which isn't too surprising.

Based on this, I believe adding this to a CDN would be an insignificant amount of bandwidth (compared to Firefox itself).
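For reference, a quick sketch of how those numbers fall out. The xpi_stats helper is just one way one might measure average file size on the NFS mount (the path is the one quoted above); the constants below reuse the figures reported in this comment and the AMO stats page.

    import os

    def xpi_stats(root="/mnt/netapp_amo/addons.mozilla.org/files"):
        """Count .xpi files under root and return (count, average size in bytes)."""
        sizes = [os.path.getsize(os.path.join(dirpath, name))
                 for dirpath, _, names in os.walk(root)
                 for name in names if name.endswith(".xpi")]
        return len(sizes), sum(sizes) / len(sizes)

    count, avg_bytes = 70_836, 268_529        # values reported above
    hits_per_day = 1_200_000                  # ~1.2M downloads/day from AMO stats
    bytes_per_day = hits_per_day * avg_bytes  # ~322 GB/day
    avg_mbps = bytes_per_day * 8 / 86_400 / 1e6
    print(round(bytes_per_day / 1e9), "GB/day,", round(avg_mbps), "Mbps average")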
We could switch all addon downloads to the CDN for a few hours to get real numbers. Also, I'm all for serving them from the CDN instead of the mirror network.
Ok, morphing the bug to switch to the CDN. What are our next steps here?
Summary: Serve old versions of addons from mozilla server → Serve addons from the CDN instead of mirrors
This was done on bug 806615.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Product: addons.mozilla.org → addons.mozilla.org Graveyard