Closed Bug 1001517 Opened 9 years ago Closed 9 years ago

Stand up nginx caching proxy PoC in scl3 on seamicro hw

Categories

(Infrastructure & Operations :: Storage, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: taras.mozilla, Assigned: gozer)

References

Details

+++ This bug was initially created as a clone of Bug #962298 +++
Let's do this on a seamicro node with 9 or 10 oldtimey 80GB Intel G2 SSDs :) Bonus points if you can avoid RAID and do JBOD (nginx is pretty good that way, in that caching is done by a hash of the filename, using the first few characters as subdirectories).
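For illustration, a minimal nginx sketch of that kind of layout; the paths, zone names and sizes below are placeholders, with one plain (non-RAID) filesystem and cache path per SSD:

# nginx derives the on-disk subdirectories from a hash of the cache key
# (the levels= setting), so each SSD just needs a mount point of its own.
proxy_cache_path /data/ssd1/cache levels=1:2 keys_zone=ssd1:100m max_size=70g inactive=14d;
proxy_cache_path /data/ssd2/cache levels=1:2 keys_zone=ssd2:100m max_size=70g inactive=14d;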
Mike,
Can you provide a url to s3 to proxy?
Flags: needinfo?(mh+mozilla)
gozer: do you want to set up the seamicros yourself, or do you want us to assist?  If the latter, what do you want for an OS?
Assignee: server-ops-storage → gozer
(In reply to Taras Glek (:taras) from comment #1)
> Mike,
> Can you provide a url to s3 to proxy?

If your question is "does sccache support asymmetric storage where it GETs on one URL and PUTs on another", the answer is no, not currently.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #3)
> (In reply to Taras Glek (:taras) from comment #1)
> > Mike,
> > Can you provide a url to s3 to proxy?
> 
> If your question is "does sccache support asymmetric storage where it GETs on
> one URL and PUTs on another", the answer is no, not currently.

irc summary:
Gozer, we want to mirror URLs like:
https://s3-us-west-2.amazonaws.com/mozilla-releng-s3-cache-us-west-2-try/0/0/0/00000381cf28d38b840d12c3ce1bb19adde1d777

So the baseurl is https://s3-us-west-2.amazonaws.com/mozilla-releng-s3-cache-us-west-2-try/
The ultimate awesome would be: http://host/mozilla-releng-ceph-cache-scl3-try/ reverse proxies to https://s3-us-west-2.amazonaws.com/mozilla-releng-s3-cache-us-west-2-try/ for GET and PUTs.

This would allow switching the configuration at the boto level, which means it wouldn't depend on changes propagating from m-c to try. It also means there wouldn't be problems in the future if, during some bisection after ceph is decommissioned for sccache, someone pushes a changeset that's still configured to use ceph.
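As a rough sketch of that mapping on the nginx side (zone name and details are illustrative, not the deployed config):

# GETs under the ceph-cache path are served from cache and filled from the
# S3 bucket; PUTs are passed straight through (nginx only caches GET/HEAD
# by default).
location /mozilla-releng-ceph-cache-scl3-try/ {
    proxy_cache ssd1;
    proxy_pass  https://s3-us-west-2.amazonaws.com/mozilla-releng-s3-cache-us-west-2-try/;
}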
(In reply to Amy Rich [:arich] [:arr] from comment #2)
> gozer: do you want to set up the seamicros yourself

I've never done that before, so it would probably be a smallish waste of my time to figure it out.

> or do you want us to
> assist?  If the latter, what do you want for an OS?

RHEL/CentOS 5/6 would be just fine.
(In reply to Philippe M. Chiasson (:gozer) from comment #6)
> (In reply to Amy Rich [:arich] [:arr] from comment #2)

And in any case, I'd need to know which seamicro node I can grab for this.
I'd suggest kickstarting them into puppetagain as toplevel::server, then manually configuring from there.  That leaves a quick path to full puppet configuration.
I assigned hp4.relabs.releng.scl3.mozilla.com for this.  Jake's going to KS it with CentOS 6.5.
OK, it's up and running, and gozer, you've got root via SSH key.
Done, and for your money, you get Varnish (preferred) and nginx:

[nginx] http://hp4.relabs.releng.scl3.mozilla.com:80/mozilla-releng-ceph-cache-scl3-try/0/0/0/00000381cf28d38b840d12c3ce1bb19adde1d777

[varnish] http://hp4.relabs.releng.scl3.mozilla.com:81/mozilla-releng-ceph-cache-scl3-try/0/0/0/00000381cf28d38b840d12c3ce1bb19adde1d777

#Varnish
[root@boris ~]# ab -c 100 -n 1000 http://hp4.relabs.releng.scl3.mozilla.com:81/mozilla-releng-ceph-cache-scl3-try/0/0/0/00000381cf28d38b840d12c3ce1bb19adde1d777

Requests per second:    1271.37 [#/sec] (mean)
Time per request:       78.655 [ms] (mean)
Time per request:       0.787 [ms] (mean, across all concurrent requests)
Transfer rate:          26633.97 [Kbytes/sec] received

#nginx
[root@boris ~]# ab -c 100 -n 1000 http://hp4.relabs.releng.scl3.mozilla.com:80/mozilla-releng-ceph-cache-scl3-try/0/0/0/00000381cf28d38b840d12c3ce1bb19adde1d777
Requests per second:    2375.17 [#/sec] (mean)
Time per request:       42.102 [ms] (mean)
Time per request:       0.421 [ms] (mean, across all concurrent requests)
Transfer rate:          50821.57 [Kbytes/sec] received
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Mike Hommey [:glandium] from comment #5)
> The ultimate awesome would be:
> http://host/mozilla-releng-ceph-cache-scl3-try/ reverse proxies to
> https://s3-us-west-2.amazonaws.com/mozilla-releng-s3-cache-us-west-2-try/
> for GET and PUTs.

So it turns out this doesn't work awesomely, because of how S3 authentication works.
Could you make http://host/ reverse proxy to https://s3-us-west-2.amazonaws.com/ instead?
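A host-root mapping like that could look roughly like this (again just a sketch; whether the signatures the clients compute still validate would need checking):

# Proxy the whole host to S3 so the request path, and therefore the signed
# resource, is the same on both sides.
location / {
    proxy_cache      ssd1;
    proxy_set_header Host s3-us-west-2.amazonaws.com;
    proxy_pass       https://s3-us-west-2.amazonaws.com/;
}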
gozer,
can you add my and glandium's ssh keys so we can ssh in?

Can you make sure that varnish/nginx retain objects for at least 2 weeks without any S3 traffic (e.g., no HTTP HEAD)?

From glandium's testing, it doesn't look like the cache is hanging around long enough (or at all?).
(In reply to Taras Glek (:taras) from comment #13)
> gozer,
> can you add my and glandium's ssh keys so we can ssh in?

with root access.
(In reply to Taras Glek (:taras) from comment #13)
> gozer,
> can you add my and glandium's ssh keys so we can ssh in?

Yes, done.

> Can you make sure that varnish/nginx retain objects for at least 2 weeks
> without any S3 traffic (e.g., no HTTP HEAD)?

That's because Varnish respects Expires/Cache-Control headers. The objects in S3
don't seem to have anything in there, so it falls back to caching for its configured
TTL of 120 seconds.

> From glandium's testing, it doesn't look like the cache is hanging around
> long enough (or at all?).

Instead of making the cache cheat, wouldn't it be possible to just set a Max-Age on the S3 objects?
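For reference, one way to do that for existing objects is an in-place copy that rewrites the metadata, e.g. with the AWS CLI (the key below is the one from comment 5; the two-week max-age is just an example):

# Copy the object onto itself, replacing its metadata with a Cache-Control header.
aws s3 cp s3://mozilla-releng-s3-cache-us-west-2-try/0/0/0/00000381cf28d38b840d12c3ce1bb19adde1d777 \
          s3://mozilla-releng-s3-cache-us-west-2-try/0/0/0/00000381cf28d38b840d12c3ce1bb19adde1d777 \
          --metadata-directive REPLACE --cache-control "max-age=1209600"

New uploads would set the same header at PUT time instead.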
(In reply to Philippe M. Chiasson (:gozer) from comment #15)

> Instead of making the cache cheat, wouldn't it be possible to just set a
> Max-Age on the S3 objects?

That's a good idea, but it will take a while to deploy. In the meantime, can you make it cheat? This seems like a common reverse-proxy tweak.
gozer, I was thinking about this some more: there is no reason for nginx/Varnish object expiry to match S3's. Varnish has LRU eviction; S3 only has expiry. Technically, the object cache should never expire anything; it should only push objects out based on LRU logic. This isn't like a web cache where the content can change: here the name of the object reflects its content, and the object is only useful as long as there are requests for it.

So please set expiry to be infinite.
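On the nginx side, something along these lines would approximate "never expire, evict by LRU only" (values are placeholders; the Varnish equivalent would be a long default TTL or a TTL override in VCL):

# Ignore upstream cache headers, treat cached 200s as valid for a very long
# time, and let max_size/inactive drive eviction instead of expiry.
proxy_ignore_headers Cache-Control Expires;
proxy_cache_valid    200 10y;
proxy_cache_path     /data/ssd1/cache levels=1:2 keys_zone=ssd1:100m max_size=70g inactive=1y;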
Blocks: 1007976
It doesn't seem the server is reporting its load/cpu/network/etc. usage to graphite/hosted graphite. Can that be added?
Gozer, I was looking at getting the following stats out of Varnish:

* Varnish:time_firstbyte
* %b Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a '-' rather than a 0 when no bytes are sent.
* %D Time taken to serve the request, in microseconds.


Then I noticed these are only available in Varnish 3+. Can we upgrade this to Varnish 4.0?
err, I meant latest 3.0
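With 3.0 in place, a varnishncsa invocation along these lines should expose those fields (the format string is illustrative; varnishncsa(1) documents the exact specifiers):

# Log URL, response size in bytes, total service time in microseconds, and
# time to first byte.
varnishncsa -F '%U %b %D %{Varnish:time_firstbyte}x'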
Blocks: 1010316
(In reply to Mike Hommey [:glandium] from comment #18)
> It doesn't seem the server is reporting its load/cpu/network/etc. usage to
> graphite/hosted graphite. Can that be added?

I've enabled collectd for this node. I had to change the relabs config and run puppet against my puppet env, since there is currently no way to do this on a per-node basis and collectd is disabled by default in the relabs org config. The puppet cron run is also disabled so as not to clobber collectd.

This node will report under .test.relabs. as opposed to .hosts.
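For context, the graphite side of a collectd setup like this is typically a write_graphite block along these lines; the endpoint below is a placeholder, not the actual relabs puppet config:

LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "graphite">
    # Placeholder endpoint; the prefix matches the .test.relabs. reporting
    # prefix mentioned above.
    Host "graphite.example.com"
    Port "2003"
    Protocol "tcp"
    Prefix "test.relabs."
  </Node>
</Plugin>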
Blocks: 1024651
Product: mozilla.org → Infrastructure & Operations