Closed Bug 799155 Opened 12 years ago Closed 12 years ago

SSL CDN for developer.mozilla.org

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P4)

All
Other

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nmaul, Assigned: nmaul)

Details

(Whiteboard: [triaged 20121008])

Summary says it all... let's get this set up on Akamai.

Name will be developer.cdn.mozilla.net
    should include query strings and obey cache headers

Origin will be developer-origin.cdn.mozilla.net
    should be on the prod MDN cluster, separate vhost, aliases to /media/, etc
    need .htaccess for cache headers so webdev can manage it
CC'ing folks on the MDN team

This will allow us to eliminate the dynect + AMS1 caching layer that MDN uses, and provide better worldwide performance for MDN.
Priority: -- → P4
Whiteboard: [triaged 20121008]
Yes please! We got a big performance gain when we switched to Django. The biggest (smallest?) bottleneck on performance now is high latency for international users, including the audience in Firefox OS target markets whom we're actively recruiting to translate Firefox OS and Apps docs.
Akamai property submitted for creation (takes ~8hrs) and matching origin vhost committed to puppet.

Next step is to make sure we have sane Cache-Control headers in /media/ and /admin-media/, and whatever else you want to be made available on the CDN. Once that's straightened out it should be pretty straightforward to just edit MEDIA_URL in settings_local.py to include the CDN prefix (https://developer.cdn.mozilla.net/).
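As a rough illustration of the MEDIA_URL change described above, here is a minimal sketch of how a CDN prefix ends up in asset URLs. The helper name `media_url` and the `build_id` parameter are hypothetical and are not taken from the kuma code; only the CDN host name and the `/media/` path come from this bug.

```python
# Sketch only: shows how a MEDIA_URL prefix could turn into full asset URLs.
# CDN_PREFIX is the CDN name from this bug; media_url() is a hypothetical helper.
CDN_PREFIX = "https://developer.cdn.mozilla.net"
MEDIA_URL = CDN_PREFIX + "/media/"


def media_url(path, build_id=None):
    """Join MEDIA_URL with a relative asset path, optionally adding ?build=."""
    url = MEDIA_URL + path.lstrip("/")
    if build_id:
        url += "?build=" + build_id
    return url


if __name__ == "__main__":
    print(media_url("css/mdn-min.css", build_id="dcfe923"))
```

With MEDIA_URL left at '/media/', the same helper would produce site-relative URLs, which is why flipping this one setting is enough to move static assets onto the CDN.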
cc'ing :openjck and :teoli who will probably be very happy to hear about this!
It seems there is a .htaccess file in /media/ already, so this may be usable as-is. I forgot to make DNS entries earlier, but that's done now... once it propagates this should be testable. The only concern I might have is if/when these things change, will the way they're accessed change as well? Query strings / build/version numbers, filenames, etc... that may or may not be a concern.


# Set far-future Expires headers for static media
ExpiresActive on

ExpiresDefault "access plus 1 week"
ExpiresByType text/css "access plus 1 year"
ExpiresByType text/javascript "access plus 1 year"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/vnd.microsoft.icon "access plus 1 month"
ExpiresByType video/webm "access plus 1 week"
ExpiresByType video/ogg "access plus 1 week"
ExpiresByType video/x-flv "access plus 1 week"
ExpiresByType application/x-shockwave-flash "access plus 1 week"
Blocks: 783527
This is really good news! It made my day! 

I have partial answers to Jake's questions:
- Most of our css and js in /media/ are revved (e.g. mdn-min.css?build=dcfe923). So each time we push a new MDN release, the name changes. (I never tested the interaction between the '?' and the Expires header, but it should work.)
- We have two exceptions: mdn-print.css (our print css), which is not important (it is not really pretty right now and we should rev it, see bug 783852), and our Template:CustomCSS, which is not under /media/ and is therefore not yet affected (we are looking for a definitive solution for it in bug 770195).
- CKEditor css/js are revved too, so this bug should solve bug 783859
- Our theme images are not revved but as we don't use sprites, this should not be a problem: we almost never modify them and when adding some they will have a new name. This doesn't affect content images which are not under /media.
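To make the point about revved assets concrete, here is a small sketch of why a ?build= query string works as a cache buster when the CDN includes query strings in its cache key (as requested in the bug description). The `cache_key` function is a hypothetical stand-in and does not reflect Akamai's actual key logic; the second build id is made up for the example.

```python
from urllib.parse import urlsplit


def cache_key(url, include_query=True):
    """Build a simplified cache key from host, path and (optionally) query."""
    parts = urlsplit(url)
    key = parts.netloc + parts.path
    if include_query and parts.query:
        key += "?" + parts.query
    return key


OLD = "https://developer.cdn.mozilla.net/media/css/mdn-min.css?build=dcfe923"
NEW = "https://developer.cdn.mozilla.net/media/css/mdn-min.css?build=0000000"  # made-up build id

if __name__ == "__main__":
    # Different builds map to different cache entries only when the query is kept.
    print(cache_key(OLD) != cache_key(NEW))
    print(cache_key(OLD, include_query=False) == cache_key(NEW, include_query=False))
```

Under this assumption, a file with a 1-year TTL is safe as long as its URL changes on every push; an unrevved file with the same TTL would keep being served from the old cache entry.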

Changes I _propose_ to the listed Expires headers:
1) Add an entry (1 month) for SVG images (we don't have any, but we may in the future). [Maybe configure gzip for SVG images at the same time; we need to do this for our content anyway, see bug 785801.]
2) Add an entry (1 month) for image/x-icon, which is the MIME type we use for our favicon (just checked the response I get for it).
3) I'm a little bit concerned by the default "access plus 1 week". That means a cached text/html response couldn't be refreshed for a week, which is pretty long. So even if text/html shouldn't be under /media/, for "security" I would add a text/html entry of 15 minutes. This shouldn't change anything, as there shouldn't be an html file there anyway.
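For reference, the three proposals can be sketched as a lookup table of TTLs per MIME type. This is only a model of the resulting policy, not the .htaccess itself: the names `EXPIRES_BY_TYPE` and `max_age` are hypothetical, a month is approximated as 30 days and a year as 365 days, and only a few of the existing entries are repeated.

```python
from datetime import timedelta

WEEK = timedelta(weeks=1)
MONTH = timedelta(days=30)   # approximation used for this sketch
YEAR = timedelta(days=365)   # approximation used for this sketch

# A few existing entries plus the three proposed ones.
EXPIRES_BY_TYPE = {
    "text/css": YEAR,
    "text/javascript": YEAR,
    "image/png": MONTH,
    "image/svg+xml": MONTH,              # proposal 1
    "image/x-icon": MONTH,               # proposal 2
    "text/html": timedelta(minutes=15),  # proposal 3
}
DEFAULT_TTL = WEEK  # mirrors ExpiresDefault "access plus 1 week"


def max_age(mime_type):
    """Return the TTL in seconds for a MIME type, falling back to the default."""
    return int(EXPIRES_BY_TYPE.get(mime_type, DEFAULT_TTL).total_seconds())


if __name__ == "__main__":
    for mime in ("text/html", "image/x-icon", "video/webm"):
        print(mime, max_age(mime))
```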
Blocks: 783859, 783851, 792974
Added Jeremie and David in CC: as they also have experience in that domain and will be interested too.
I'm generally fine with all of those changes... the rules are in github, so it's a webdev-controlled thing:

https://github.com/mozilla/kuma/blob/master/media/.htaccess

Feel free to add/change as desired. My only request/comment is that shorter TTLs can hurt CDN cache hit rate, so it's good to stay as high as is feasible (although past a point it doesn't matter much anymore).

I am slightly concerned about text/html, because if we do this as a "just in case" measure, we're less likely to ever figure out that there *is* a problem. However I won't block on this, and my previous statements stand... feel free to do it anyway if you prefer. :)
Let us know when you're ready to try this out... generally we just need to change MEDIA_URL in settings_local.py.

We can do it on dev or stage first for testing... although note that it will be pointed at the prod instance's /media/ content, so that might be slightly wonky if they're not identical.
Flags: needinfo?
Whiteboard: [triaged 20121008] → [triaged 20121008][waiting][webdev]
let's test it on stage ASAP. we just did a push so the media files should be identical.
Flags: needinfo?
This is in place... I just changed MEDIA_URL in settings_local.py:

MEDIA_URL = '/media/'
to
MEDIA_URL = 'https://developer.cdn.mozilla.net/media/'

Seems to work for me. Want to verify and then we can disable in stage and enable in prod? One thing I did notice is that there are a few CSS files on the main /en-US/ page that have 1-year-long TTLs, but don't have a ?build= query string:

https://developer.cdn.mozilla.net/media/css/mdn-print.css
https://developer.cdn.mozilla.net/media/css/demos.css
https://developer.cdn.mozilla.net/media/js/mdn/jquery.hoverIntent.minified.js
cc'ing Craig Cook and David Walsh, who might know whether we left the ?build= query string off those frontend assets for a reason.
This was pushed to prod on Friday... no ill effects reported to Ops so far!

I'm looking at scatterplots in Catchpoint monitoring, and some regions show a noticeably faster "webpage response" time than before. In particular South America, Japan, and South Korea all show nice improvement. Australia is somewhat improved as well.

Hit rate is lower than I expected (around 80%). I suspect this will climb a little, but we're past the break-in period and it's not likely to get more than a few % better.

I was curious why this was not higher, and I believe I've found a potential culprit. According to Akamai's reporting, some files have a *very bad* hit rate, and are called frequently. Specifically these two are very commonly accessed:

developer-origin.cdn.mozilla.net/media/fonts/BebasNeue-webfont.woff
developer-origin.cdn.mozilla.net/media/fonts/League_Gothic-webfont.woff

The hit rate on these two files is around 4%. There are other files with a similarly poor hit rate... all of them in /media/fonts/.

I believe this is due to the presence of a "Vary: Referer" header, being added in kuma/media/fonts/.htaccess:


# block hotlinking to .woff and .eot files
RewriteCond "%{HTTP_REFERER}" "!https?://.*mozilla\.(com|org)/.*$"
RewriteRule \.(woff|eot)$ - [F,NC,L]

<FilesMatch "\.(ttf|woff|eot)$">
    Header append vary "Referer"
    ExpiresActive On
    ExpiresDefault "access plus 2 weeks"
    Header set Access-Control-Allow-Origin "*"
</FilesMatch>


The problem is the Referer header actually varies on virtually every single page where the font is needed... it's not a good trigger.

Can we remove this extra protection? I understand the desire to stop hotlinking, but I think we're impacting performance unnecessarily because of this. The Referer varies with *every single page* you view, not just the domain you come from.

If we really want to keep this then I can move this logic into the CDN config. I'm not a huge fan of this (I'd rather not bother at all), but I'd rather move this into the CDN config than leave it as-is, where it's hurting our hit rate.

Or alternatively perhaps we can set the CORS header better and eliminate Referer checking altogether... ? Do we know of anything other than https://developer.mozilla.org that should be using those fonts directly?

Another alternative... can we key off of Origin instead of Referer? Origin contains only the protocol and host name, so the list of potential cases is much, much smaller, and this should still work to prevent hotlinking for any CORS-aware browser.
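The effect of "Vary: Referer" versus "Vary: Origin" on hit rate can be sketched with a toy cache that keys each entry on the URL plus the value of the varied request header. Everything here is a simplified assumption: the request list is synthetic, `hit_rate` is a hypothetical helper, and a real CDN has many edge nodes, expiry, and eviction that this ignores.

```python
def hit_rate(requests, vary_header):
    """Replay requests against a cache keyed on (url, value of the varied header)."""
    cache = set()
    hits = 0
    for req in requests:
        key = (req["url"], req["headers"].get(vary_header, ""))
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # first request for this key is a miss
    return hits / len(requests)


FONT = "/media/fonts/BebasNeue-webfont.woff"

# 100 synthetic page views: every view comes from a different MDN page,
# so the Referer differs each time while the Origin stays the same.
REQUESTS = [
    {
        "url": FONT,
        "headers": {
            "Referer": f"https://developer.mozilla.org/en-US/docs/page-{i}",
            "Origin": "https://developer.mozilla.org",
        },
    }
    for i in range(100)
]

if __name__ == "__main__":
    print("Vary: Referer ->", hit_rate(REQUESTS, "Referer"))
    print("Vary: Origin  ->", hit_rate(REQUESTS, "Origin"))
```

In this toy model every request misses under Referer, while under Origin only the first one does, which is the same pattern as the very low hit rate reported for /media/fonts/.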
Flags: needinfo?
I'm just going to remove that .htaccess file. We can re-assess if we ever see tons of hot-linking on the fonts.
Flags: needinfo?
Good catch!
Copying this from github: Are we using any non-open fonts (i.e. for the Mozilla wordmark, etc.)? I seem to remember these font controls were put into place across Mozilla on demand from the company that licensed the web fonts, under threat of having the license (and right to use the fonts) revoked. It didn't matter whether hotlinking *was* taking place; the possibility that it *might* take place ran afoul of the license.
I think I found the original: bug 540859. We should make completely sure we don't expose any fonts whose licenses forbid making them available without "reasonable measures" to prevent hotlinking.
I find the conclusions in that bug to be rather suspect. The referer-based restrictions were settled upon as a reasonable way to allow caching while still preventing the obvious hotlinking problem. However in reality (as shown by Akamai's reporting) it does *not* in fact allow for any significant caching at all.

CORS is comparatively new (newer than that bug, at least in terms of having widespread support), and solves essentially the same problem. It should be possible to have decent caching with a "Vary: Origin" header instead of "Vary: Referer", and have the CORS header set properly with some simple SetEnvIf + mod_headers rules...

<FilesMatch "\.woff$">
    SetEnvIf Origin "^(.*\.mozilla\.com|.*\.mozilla\.org)$" ORIGIN_SUB_DOMAIN=$1
    Header append vary Origin
    Header set Access-Control-Allow-Origin "%{ORIGIN_SUB_DOMAIN}e" env=ORIGIN_SUB_DOMAIN
</FilesMatch>

With that in place, anything that obeys CORS should allow or disallow the fonts as intended. That covers almost all recent versions of all major browsers. I believe that's enough to prevent any significant hotlink problems. Non-malicious hotlinkers will quickly discover that it is extremely unreliable due to CORS, and we've already established (in bug 540859) that we can do virtually nothing about malicious usage... anyone can download the font and host it themselves. Therefore I believe this meets all of the requirements: CORS is a reasonable measure.
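To see which origins the SetEnvIf pattern above would accept, here is a small sketch that applies the same regular expression in Python. The function `allow_origin_header` is a hypothetical stand-in for the SetEnvIf + Header pair and is not how Apache evaluates the rules internally; the sample origins are made up.

```python
import re

# Same pattern as in the SetEnvIf line above.
ALLOWED_ORIGIN = re.compile(r"^(.*\.mozilla\.com|.*\.mozilla\.org)$")


def allow_origin_header(origin):
    """Return the Access-Control-Allow-Origin value, or None if it is omitted."""
    if origin and ALLOWED_ORIGIN.match(origin):
        return origin  # echo the matching origin back
    return None  # no CORS header, so a CORS-aware browser refuses the font


if __name__ == "__main__":
    for origin in (
        "https://developer.mozilla.org",
        "https://example.com",
        "https://developer.mozilla.org.example.com",
        "https://mozilla.org",
    ):
        print(origin, "->", allow_origin_header(origin))
```

One thing this makes visible: because the pattern requires a dot before "mozilla", a bare https://mozilla.org origin is not matched, which may or may not be intended.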


If for some reason we're still unhappy with that, the fallback would be to enforce the Referer checking within the CDN, instead of via .htaccess rules on the origin. The obvious downside is this converts the CDN from a dumb caching-reverse-proxy into an intelligent entity with built-in rules about content serving that are not at all evident from looking at the site code or config. If we do this, I believe we should *keep* the .htaccess rules, but change them to trigger only if the Host: requested was developer.mozilla.org (as opposed to developer-origin.cdn.mozilla.net). This will help to keep things a bit more obvious for future ops/devs. A comment in the file would be good too. This is still weird, but at least we'd have a chance at diagnosing any problems that arise.
I did https://github.com/mozilla/kuma/pull/671/files before I read this comment. Which way do you suggest?
 RewriteCond "%{HTTP_REFERER}" "!https?://.*mozilla\.(com|org)/.*$"
-RewriteRule \.(woff|eot)$ - [F,NC,L]
+RewriteRule \.(woff|eot)$ - [F,NC,L,E=!CORS]

 <FilesMatch "\.(ttf|woff|eot)$">
-    Header append vary "Referer"	
+    Header append vary "Origin"

     ExpiresActive On
     ExpiresDefault "access plus 2 weeks"

-    Header set Access-Control-Allow-Origin "*"	
+    Header set Access-Control-Allow-Origin "*" env=CORS
 </FilesMatch>


The middle change is okay. I don't understand the first and 3rd ones though... what are we trying to do here? In the first change, we unset the CORS env var... and in the 3rd change, we send a CORS header only when the CORS env var is set. But we've never set it in the first place, so by my reading it will never go out.

Also, the initial RewriteCond is still based on HTTP_REFERER, rather than HTTP_ORIGIN. This might result in some confusion. I think it might be okay, but I'm having some trouble stepping through the logic.


Side note: on the Expires stuff, we might want to consider moving to media/.htaccess, where all the other Expires directives live.
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/f1eb1ac9c7313667ed033f2973c0958b71fe9414
bug 799155 - smarter font access and caching controls

https://github.com/mozilla/kuma/commit/832e48844d66554f67ae744315f87eb99466154d
Merge pull request #671 from groovecoder/cdn-fonts-fix-799155

bug 799155 - smarter font access and caching controls
I added that because of https://github.com/mozilla/kuma/blob/master/configs/htaccess-without-mindtouch#L33 up in the main htaccess. But now I see that config only enables CORS for the mwsgi requests. So this change effectively changes the cache vary to Origin, but doesn't use CORS to allow hot-linking from mozilla.org or mozilla.com. Which is fine because it should really only link from the same domain.

Anyway, let's keep an eye on that cache hit rate to see if this works/fixes it.
This seems to be fixed! Hit rate is currently about 97% for the last day. All files are hitting pretty consistently... no consistent misses that I can see.

It would be nice if someday we could figure out how to get URLs like this on the CDN also:

https://developer.mozilla.org/en-US/jsi18n/build:6cf0f69

But that's merely an incremental optimization, and not something we need to keep this bug around for.

Thanks all!
Assignee: server-ops-webops → nmaul
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [triaged 20121008][waiting][webdev] → [triaged 20121008]
And the effect is huge. Thanks to this and the settings of the cache headers, the average page load in Western Europe went from 5-7s to 2-3.5s. :-)

\o/ Well done!
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard