Closed Bug 508093 Opened 15 years ago Closed 15 years ago

Install mod_concat on all prod. and staging servers

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Hardware: All
OS: Other
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rdoherty, Assigned: oremj)


Details

A lot of our performance work for Mozilla websites involves concatenating CSS & JS files to reduce HTTP requests (the #1 way to speed up websites). This usually means writing a custom build script that has to be run before every release.

http://code.google.com/p/modconcat/ is an Apache module that lets developers concatenate files simply by requesting a URL that lists the files to combine (mozilla.com/js/??file.js,file2.js,file3.js). This would let any website we maintain concatenate its files easily.

Can this module be installed on our servers?
That's a cool idea.  If I'm reading it right, we'd still have to maintain a revision number to make the URL unique, right?
Build docs from http://modconcat.googlecode.com/files/mod_concat.pdf

You will need to first download the code from the SVN repository. The repository
is here: http://modconcat.googlecode.com/svn/trunk/mod_concat/ 

Once you have the code, just compile it via

    $ apxs -c mod_concat.c 

Alternatively, you could just run

    $ apxs -i -a -c mod_concat.c 

That builds mod_concat, installs it in the default location, and activates it in the configuration for you, so it just works the next time you restart Apache.

Once you have built it, you will need to configure it:

    LoadModule concat_module modules/mod_concat.so
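
Once the module is loaded, a page can pull several files in one request using the ?? URL syntax from the bug description. A minimal sketch of emitting such a URL from a PHP template (the file names are hypothetical):

    <?php
    // Hypothetical list of scripts this page needs; mod_concat joins them server-side.
    $scripts = array('jquery.js', 'init.js', 'analytics.js');

    // One <script> tag (and one HTTP request) instead of three.
    echo '<script src="/js/??' . implode(',', $scripts) . '"></script>' . "\n";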
(In reply to comment #1)
> That's a cool idea.  If I'm reading it right, we'd still have to maintain a
> revision number to make the URL unique, right?

Correct. If we still set far-future Expires headers on our static content (which we should), we still need to change the URL whenever it is updated. Perhaps another Apache or PHP module can help us with that.

I'm not 100% sure how this interacts with .htaccess files that set the Expires header, since mod_concat uses the last-modified time of the most recently modified file when it serves the concatenated response.
If we want to try this out somewhere small, I'd like to use http://onebillion.stage.mozilla.com/ (and then the real site).  I'll have a patch that takes advantage of modconcat on there soon.
Could we get an ETA?  This seems pretty simple -- any reason why we can't do this?
Severity: minor → normal
The PDF on the site also mentions :

now a word of warning. if you use security/access permissions on your site, you will need to be very careful, as a clever 
person might be able to subvert this module to read the contents of this file.

Which is sort of vague. What exactly is the author referring to? And what are the possible security implications?
My other concern is that maintaining the module and keeping it updated is going to be hard if we just build it from source. We're probably going to have to create RPMs for this as well if we're going to push it to production.

Jeremy, comments?
I'd also recommend against using this in production, for the same reasons Shyam mentioned (just putting that on the record).  Being able to retrieve arbitrary files by crafting a URL that concatenates them with something you are allowed to see doesn't sound too safe to me.
(In reply to comment #6)
> Which is sort of vague. What exactly is the author referring to? And what are
> the possible security implications?

I think the situation in question is this: if your app has login-protected files in the same directory as public files, someone could get around those restrictions with this module.  We only want to expose this for public resources, so it shouldn't be a problem.  All our code is open anyway.
(In reply to comment #8)
> I'd also recommend against using this in production, for the same reasons Shyam
> mentioned (just putting that on the record).  Being able to retrieve random
> files by passing a URL that makes it concatenate with something else you're
> allowed to see doesn't sound too safe to me.

I'm assuming mod_concat is subject to the same restrictions as the rest of Apache, so it shouldn't open us up to any new attacks.  The entire source code for the module is here: http://code.google.com/p/modconcat/source/browse/trunk/mod_concat/mod_concat.c

It's written by an Apache contributor and was created at AOL, so I have more faith in it than in some random code on the internet.
(In reply to comment #6)
> The PDF on the site also mentions :
> 
> now a word of warning. if you use security/access permissions on your site, you
> will need to be very careful, as a clever 
> person might be able to subvert this module to read the contents of this file.
> 
> Which is sort of vague. What exactly is the author referring to? And what are
> the possible security implications?

It's clearly spelled out in this bug: http://code.google.com/p/modconcat/issues/detail?id=1
I'm not enabling this globally; I'd prefer to be able to enable it for specific directories.  Unfortunately, the config only lets you disable it in a directory, not enable it.  So if I disable it globally, there's no way to re-enable it for specific directories.  This is backwards.  We can't expect people to remember to put "disable_concat" in every vhost we create on every server, and many of the vhosts that share these servers *do* have authenticated areas on them.
(In reply to comment #11)
> (In reply to comment #6)
> > The PDF on the site also mentions :
> > 
> > now a word of warning. if you use security/access permissions on your site, you
> > will need to be very careful, as a clever 
> > person might be able to subvert this module to read the contents of this file.
> > 
> > Which is sort of vague. What exactly is the author referring to? And what are
> > the possible security implications?
> 
> It's clearly spelled out in this bug:
> http://code.google.com/p/modconcat/issues/detail?id=1

That's correct, and it's because the module doesn't use subrequests to get at the content of the files it concatenates; it reads them directly from disk. That's probably for performance reasons, but it has major implications, including the security ones pointed out there. It also matters if any of the URLs being concatenated aren't plain static files: for instance, if your CSS file is actually a PHP script, or has SSI includes in it, it won't be processed at all.
(In reply to comment #12)
> I'm not enabling this globally, I'd prefer to be able to enable it for specific
> directories.  Unfortunately, the config only lets you disable it in a
> directory, not enable it.  So if I disable globally, there's no way to
> re-enable it on specific directories.  This is backwards. 

Very much so, IMO. Plus, the module is enabled by default everywhere as soon as you load it. Of course, it can easily be changed to work the other way around.

# Not necessary, the default is off
#<Location />
#  ConcatFiles Off
#</Location>

<Location /static>
  ConcatFiles On
</Location>

<Location /static/old>
  ConcatFiles Off
</Location>

Sounds much better.

> We can't expect
> people to rememeber to put "disable_concat" in every vhost we create on every
> server, and many of the vhosts that share these servers *do* have authenticated
> areas on them.

The problems highlighted so far are quite valid. However, since this would be handy for momo sites as well, I am tempted to take this over and provide the necessary patches for the module. Shouldn't be very long.

The only issue is that if the patches can't make it upstream, we end up with our own forked mod_concat. I'm not sure that would be a great thing, especially for folks trying to run their own copies of our sites.

Might be simpler to just have small scripts to 'build' the concatenated files or use some of the other PHP-based concatenation solutions.

 - http://rakaz.nl/code/combine
 - http://www.ejeliot.com/blog/72
(In reply to comment #14)
> The problems highlighted so far are quite valid. However, since this would be
> handy for momo sites as well, I am tempted to take this over and provide the
> necessary patches for the module. Shouldn't be very long.

That would be very cool.  This module will give us easy performance wins on mozilla.com and other static-ish marketing sites that are pulling in lots of resources.

We could also try using Perlbal, where this concat feature started, but I'm guessing that would be even more hassle.
(In reply to comment #14)
> The only issue is that if the patches can't make it upstream, we end up with
> our own forked mod_concat, not sure that would be a great thing, especially for
> folks trying to run their own copies of our sites.

It's a valid point; we should at least try contacting them when we have a fix. 

If we can't fix the security issues, a last resort is a PHP-based script that does the same thing.
Can you quantify the performance gains?
Step #1 of http://www.ryandoherty.net/2008/10/12/optimizing-openspacebook/

50% decrease in page load time when concatenating files.

Step #4 http://blog.patrickmeenan.com/2009/05/optimization-impact.html

60% decrease in page load time when concatenating files.

http://www.elctech.com/tutorials/reduce-http-requests-combine-files

56% decrease in page load time when concatenating files.
I talked to Ryan & Jeff.  We'll manually test this on the onebillion site but will wait until Jeremy gets back on Thursday.
Assignee: server-ops → oremj
Depends on: 508305
(In reply to comment #12)
> I'm not enabling this globally, I'd prefer to be able to enable it for specific
> directories.  Unfortunately, the config only lets you disable it in a
> directory, not enable it.  So if I disable globally, there's no way to
> re-enable it on specific directories.  This is backwards.

I've got a slightly modified version of mod_concat.c over here:

<http://svn.ectoplasm.org/projects/c/mod_concat/mod_concat.c>

That adds a Concat On/Off directive that defaults correctly to Off, so one can now do:

<Location /static>
  Concat On
</Location>

If you check out <http://svn.ectoplasm.org/projects/c/mod_concat>

you can easily build/bundle this module with:

$> autoreconf -v -i
$> ./configure --with-apxs=/usr/sbin/apxs
$> make
$> make test
$> make rpm
I'd really prefer not to run a forked version of the module, or even the original module. It could potentially interfere with Apache upgrades in the future. 

In the original version I'm scared of something like https://oursite.mozilla.com??includes/config.php, which looks possible.

Seems like we could go for a much simpler approach and cron a build script that does "cat foo.js bar.js baz.js > site.js; etc; etc;".  Then, if JS/CSS files are added, it's just a matter of updating build.sh instead of updating all the header files.
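
A minimal sketch of that kind of pre-generation step, written here as a PHP CLI script (rather than the shell one-liner) since our sites are already PHP; the bundle targets and source file names are hypothetical:

    <?php
    // build.php -- hypothetical cron/deploy step that pregenerates combined files.
    // Each target is built by concatenating its source files in order.
    $bundles = array(
        'js/site.js'   => array('js/foo.js', 'js/bar.js', 'js/baz.js'),
        'css/site.css' => array('css/reset.css', 'css/layout.css'),
    );

    foreach ($bundles as $target => $sources) {
        $combined = '';
        foreach ($sources as $src) {
            $combined .= file_get_contents($src) . "\n";
        }
        file_put_contents($target, $combined);
    }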
(In reply to comment #21)
> I'd really prefer not to run a forked version of the module or even the
> original module. It could potentially interfere with apache upgrades in the
> future. 

I agree with you there.

> In in the original version I'm scared of something like
> https://oursite.mozilla.com??includes/config.php, which looks possible.

Yup, and that's definitely a problem.

> Seems like we could go for a much simpler approach and cron a build script that
> does "cat foo.js bar.js baz.js > site.js; etc; etc;".  Then, if js/css files
> are added, it will just be updating build.sh vs updating all the header files.

Yes, if pregenerating the combined files is easy/possible, it's a much better approach, IMO. Serving straight static files (even if they are built) is a much simpler proposition.
(In reply to comment #21)
> Seems like we could go for a much simpler approach and cron a build script that
> does "cat foo.js bar.js baz.js > site.js; etc; etc;".  Then, if js/css files
> are added, it will just be updating build.sh vs updating all the header files.

I'd rather not do this for a few reasons:

* Custom build script for every site
* Requires more coding than just changing a URL
* Room for error: someone could forget to run it and we'd have all sorts of weird errors (old CSS & JS files being served with new HTML)

My goal for improving performance of all our websites is to make it almost effortless. Many of our sites are maintained and updated by contractors and external agencies. We need to make it as easy as possible for them to improve the performance of their websites.

If mod_concat isn't usable, I think we should find something else that can do the same thing.
If there were a way to limit concat to just CSS or JS files I would feel better about it.  An alternative could be "minst" as a build/deploy app to do this for us, and it'd be required when deploying anything.  We could make it effortless that way -- as long as we agreed on a common directory format and all that.  Let's talk about it more.
I'm sure we can figure out how to limit concatenation to js and css files.

I still don't want a build script that requires a specific format or directory structure. We need to be able to apply this to already existing sites.
I think the only way to use this module is to host all static files from another domain. Besides a build script, do you know of any alternatives?
Hosting static files from another (cookieless) domain would be cool, that's usually good for client-side performance.
(In reply to comment #27)
> Hosting static files from another (cookieless) domain would be cool, that's
> usually good for client-side performance.

This sort of thing falls into the realm of a CDN, which would be totally awesome (default it to gzip, minify, and set the Expires header).

My other idea is to create a PHP library that can do this and make it an SVN external for projects. Could that work?
My other idea is also to create a PHP library. It isn't optimal to run everything through PHP, but NS/Zeus caching should factor that out. If we are going to create a library, I'd prefer it to be statically configured, i.e.:
$static_files = array( "js" => array("file1", "file2",...),
                       "js2" => array("otherfile1", ...) );

Then they could be referenced with something like: http://.../static.php?fileset=js2
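
A minimal sketch of what such a static.php endpoint could look like with that static configuration (fileset names and file paths are hypothetical; minification and Expires headers are left out):

    <?php
    // static.php -- hypothetical concatenation endpoint: /static.php?fileset=js2
    // Only filesets listed here can be served, so arbitrary paths cannot be requested.
    $static_files = array(
        'js'  => array('js/file1.js', 'js/file2.js'),
        'js2' => array('js/otherfile1.js'),
    );

    $fileset = isset($_GET['fileset']) ? $_GET['fileset'] : '';
    if (!isset($static_files[$fileset])) {
        header('HTTP/1.1 404 Not Found');
        exit;
    }

    // A CSS fileset would need text/css here instead.
    header('Content-Type: application/javascript');
    foreach ($static_files[$fileset] as $file) {
        readfile($file);
        echo "\n";
    }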
 
> This sort of thing falls into the realm of a CDN. Which would be totally
> awesome (default it to gzip, minify and set expires header).

We won't do a CDN like you're thinking, but we can still serve from another domain.  We'll have to be careful, though, because if AMO is under an EV SSL cert, we'll want this other domain to have an EV SSL cert too (ideally).  

(In reply to comment #29)
> My other idea is also to create a PHP library. 

I like that idea.  The JS/CSS files shouldn't be changing often, right?  So those could be cached with a long expiry?
(In reply to comment #30)
> We won't do a CDN like you're thinking but we can still serve from another
> domain.  We'll have to be careful though because if AMO is under an EV SSL
> cert, we'll want this other domain to have an EV SSL cert (ideally).  

That's cool :)

> I like that idea.  The js/css files shouldn't be changing often right?  So
> those could be cached with a long expire?

JS/CSS files usually change with every release of any site. The ideal setup is to make the URL unique per release (e.g. use the SVN revision number in the URL); that way you can set a far-future Expires header. We do this on AMO and can probably write a library to do it automatically with the concat script.
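
A minimal sketch of such a helper, assuming the release's SVN revision is exposed as a constant set at deploy time (the constant name and the query-string scheme are hypothetical; a path-based scheme plus a rewrite rule would work just as well):

    <?php
    // Hypothetical: set once per release, e.g. from `svnversion` during deploy.
    define('STATIC_REVISION', '12345');

    // Returns a static URL that changes with every release, so a
    // far-future Expires header is safe to use on the response.
    function static_url($path) {
        return $path . '?rev=' . STATIC_REVISION;
    }

    // Templates call this when emitting script/link tags:
    echo static_url('/js/site.js');  // prints /js/site.js?rev=12345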
Shall we scrap this bug and start a new one for the library, or just rename this one?
(In reply to comment #32)
> Shall we scrap this bug and start a new one for the the library or just rename
> this one?

Let's scrap this one and make a new bug to track building a PHP library that will concatenate, minify, and set far-future Expires headers on the files. I'll cc everyone from this bug.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard