Closed Bug 751171 Opened 12 years ago Closed 12 years ago

build distributed ES cluster

Categories

(Cloud Services :: Operations: Marketplace, task)

x86
macOS
task
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cshields, Assigned: dmaher)

References

Details

I'm not sure if ES has the concept of "local vs remote", but if it does, we should make sure that it understands the difference between the 4 nodes that will be in SCL3 and the 4 nodes in PHX1.

We have the 4 nodes in SCL3 already kickstarted, elasticsearch[1-4].amopod1.addons.scl3

The 4 matching nodes in PHX1 have hit a racking delay that would hold up this proof of concept for too long, so we are going to test with some lower-RAM nodes for now and swap those out for the prod nodes in a few weeks.  Filed bug 751169 to have those done.
Severity: normal → major
Status: NEW → ASSIGNED
WIP.
Preliminary tests of site-awareness are positive; a test index spread across 2 sites of 4 nodes each was balanced correctly.  Further tests of shard/replica combinations are forthcoming, and following that: performance testing.
The cluster is up with four nodes each at SCL3 and PHX1.  ES will automatically spread the shards across the two sites, and given an appropriate combination of shards and replicas, it is possible to create indices with different levels of redundancy with respect to data centre and node loss.
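For reference, a minimal sketch of how the site-awareness can be expressed in the node configuration, assuming shard allocation awareness is the mechanism in play; the attribute name "datacenter", the file path, and the values are illustrative rather than what is actually deployed on these hosts:

# /etc/elasticsearch/elasticsearch.yml (path and attribute name are assumptions)
# on the SCL3 nodes:
node.datacenter: scl3
# on the PHX1 nodes:
node.datacenter: phx1
# on all nodes, tell the allocator to balance across that attribute:
cluster.routing.allocation.awareness.attributes: datacenter

With that in place, ES tries to keep a shard and its replicas on nodes with different values of the attribute, which is what produces the balanced spread between the two sites.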
06:22:10 <@cshields> can you sync up with jason and oremj - get the amo stage environment to use this new cluster for testing
I think we need to raise the number of open file limits for the elastic search instances on these hosts:

java.io.IOException: Too many open files
	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
	at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:227)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)

[root@elasticsearch-tmp2.addons.phx1 elasticsearch]# cat /proc/8322/limits | grep file
Max file size             unlimited            unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max open files            1024                 1024                 files     
Max file locks            unlimited            unlimited            locks 

http://www.elasticsearch.org/tutorials/2011/04/06/too-many-open-files.html
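In case it's useful, a minimal sketch of the usual fix, assuming ES runs as a dedicated "elasticsearch" user (user name, limit value, and file paths are assumptions):

# /etc/security/limits.conf (or a drop-in under /etc/security/limits.d/)
elasticsearch  soft  nofile  65535
elasticsearch  hard  nofile  65535

# alternatively, raise the limit in the init script just before launching the JVM:
ulimit -n 65535

# verify against the running process:
cat /proc/$(pgrep -f elasticsearch | head -n 1)/limits | grep 'open files'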
Indeed - I've not actually adjusted any of the system settings yet, as I was strictly interested in the mechanics of a multi-homed ES cluster last week.  This week I'm going to tune things properly, then set up a testing programme to put a few different shard / replica combinations through some automated testing.  Once that's done, the same tests will be repeated with the cluster in various states of degradation.  I'm not going to reinvent the wheel, as we've already done a fair amount of testing[1], but there are some scenarios that require additional study.

Once the multi-homed model has been validated we can extend AMO staging and see how it fares against that use case.  I'd also be highly interested in extending some nodes into the Amazon cloud, just to see if and how that element can come into play - but this latter task isn't a top priority (afaik).

[1] https://mana.mozilla.org/wiki/display/websites/Elastic+Search#ElasticSearch-Testing
Okay, great. I was trying to hook up AMO staging for testing earlier today; however, I was blocked by the "too many open files" error shown in comment 5 when trying to create indexes for BAMO and AMO.
Automated tests are currently underway.  I'm going to let them run through the night - these are just the baseline tests, so the results won't be particularly interesting.  Tomorrow I'll initiate the degradation testing, which should produce much more enlightening results.  When (if?) we're satisfied with that, we'll go ahead and roll AMO staging over to the new cluster; target early next week (11 June 2012).
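To give an idea of what a degradation step looks like in practice, it's essentially "remove a node, then watch the shards re-allocate".  The host name below comes from the SCL3 pod and the service name is an assumption, so treat this as a sketch rather than the actual test harness:

# take one node out of the cluster (init script name is an assumption)
ssh elasticsearch1.amopod1.addons.scl3 'service elasticsearch stop'

# from any surviving node, watch the cluster status and shard re-allocation
curl 'http://localhost:9200/_cluster/health?pretty=true'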
After roughly 24 hours of automated testing, the averages are as follows :

default :
_all.primaries.docs.count : 2444
_all.primaries.indexing.index_time_in_millis : 8793
_all.primaries.store.size_in_bytes : 142516776
_all.total.docs.count : 25933
_all.total.indexing.index_time_in_millis : 24295
_all.total.store.size_in_bytes : 218378509

critical :
_all.primaries.docs.count : 4266
_all.primaries.indexing.index_time_in_millis : 8623
_all.primaries.store.size_in_bytes : 135713329
_all.total.docs.count : 5550
_all.total.indexing.index_time_in_millis : 80195
_all.total.store.size_in_bytes : 175593968

bigtime :
_all.primaries.docs.count : 2745
_all.primaries.indexing.index_time_in_millis : 15879
_all.primaries.store.size_in_bytes : 100333238
_all.total.docs.count : 4480
_all.total.indexing.index_time_in_millis : 148960
_all.total.store.size_in_bytes : 143752669
Please ignore the statistics in comment #9; I made an error in the portion of my testing script that creates the indexes, and as a result, they were not reliably being created with the desired shard / replica combinations.  I fixed the error and re-ran the tests this week-end, and the results are actually quite interesting :

default :
_all.primaries.docs.count : 2025
_all.primaries.indexing.index_time_in_millis : 58847
_all.primaries.store.size_in_bytes : 114995479
_all.total.docs.count : 9894
_all.total.indexing.index_time_in_millis : 28532
_all.total.store.size_in_bytes : 99228557

critical :
_all.primaries.docs.count : 2631
_all.primaries.indexing.index_time_in_millis : 30927
_all.primaries.store.size_in_bytes : 141211684
_all.total.docs.count : 41183
_all.total.indexing.index_time_in_millis : 36273
_all.total.store.size_in_bytes : 757185744

bigtime :
_all.primaries.docs.count : 11716
_all.primaries.indexing.index_time_in_millis : 18094
_all.primaries.store.size_in_bytes : 75169101
_all.total.docs.count : 17566
_all.total.indexing.index_time_in_millis : 94147
_all.total.store.size_in_bytes : 155106212

Interestingly, the "critical" combination was able to index the largest number of documents during each test run.
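For anyone who wants to poke at one of these combinations by hand: the shard / replica combination is set when the index is created, and the figures above come out of the index stats API.  The index name and numbers below are illustrative, not the exact "default" / "critical" / "bigtime" profiles:

# create an index with an explicit shard/replica combination
curl -XPUT 'http://localhost:9200/test_index' -d '{
  "settings" : {
    "index.number_of_shards" : 4,
    "index.number_of_replicas" : 2
  }
}'

# pull the indexing/store statistics (includes _all.primaries.* and _all.total.*)
curl 'http://localhost:9200/_stats?pretty=true'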
Reducing priority since we need to deal with getting new hardware up for the existing single-homed clusters.
Related :
* bug 754547
* bug 763975
* bug 771532
* bug 763923
Severity: major → minor
Status: ASSIGNED → NEW
At this point do we know what it will take to build a distributed ES cluster? If we do, I'd like to close this bug out and open bugs for individual clusters, as needed.
Yes, we know what it will take, and we can build them out as necessary.  Closing this bug.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: AMO Operations → Operations: Marketplace
Product: mozilla.org → Mozilla Services