Bug 977760 (closed): opened 11 years ago, closed 11 years ago

Elasticsearch capacity for DXR

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED
People

(Reporter: erik, Assigned: bburton)

Details

DXR is on track to switch to an Elasticsearch backend (from NetApp NFS mounts) in the next 4-8 weeks (https://wiki.mozilla.org/DXR_Storages#Elasticsearch). Can we get some ES capacity allocated, either on an existing cluster or a new one? (I believe fubar has already started investigating, so I'm CCing him here, and he can claim the ticket if he's the right person to do it.)

Our indices will start at around 50GB (an order-of-magnitude estimate; I'll update this once I have a better number) and grow as we add more trees. Currently, we've indexed 2 of the roughly 50 trees planned.

The biggest draw on resources will be reindexing: we rebuild and reindex both of the current trees every 6 hours and would like to maintain a comparable schedule. I could be argued down to every 24 hours, but it's still a lot of data flying around: 50GB in a big flood every time we reindex, if my above number is correct. And that will grow with the number of trees.

Many thanks! Let me know if you need to know more about our data access patterns or anything.
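As a back-of-envelope check on the reindexing load described above, the sustained ingest rate can be sketched as below. The 50GB figure is the reporter's own order-of-magnitude estimate, and linear scaling from 2 to 50 trees is an assumption for illustration, not a number from the bug:

```python
# Back-of-envelope sustained ingest rates for the reindex cycle above.
# 50 GB is the order-of-magnitude estimate from the bug; scaling it
# linearly to 50 trees is an assumption made here for illustration.

def sustained_rate_mb_s(index_gb, interval_hours):
    """Average ingest rate if index_gb is pushed once per interval."""
    return index_gb * 1024 / (interval_hours * 3600)

current_gb = 50                      # ~50 GB today, covering 2 trees
projected_gb = current_gb / 2 * 50   # assumed linear growth to 50 trees

for label, gb in [("today (2 trees)", current_gb),
                  ("projected (50 trees)", projected_gb)]:
    for hours in (6, 24):
        print(f"{label}, every {hours}h: "
              f"{sustained_rate_mb_s(gb, hours):.1f} MB/s average")
```

On a 6-hour cycle this averages out to roughly 2.4 MB/s today and about 59 MB/s at 50 trees under the linear-growth assumption; the actual bursts during each "big flood" would be much higher than these averages.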
(In reply to Erik Rose [:erik][:erikrose] from comment #0)

We can get you started with the existing production cluster in PHX1 (https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=5112369&src=search#Elasticsearch%28webapp%29-PHX1Production), accessible through elasticsearch-zlb.vlan81.phx1.mozilla.com.

This is currently 0.20.5 but should be moved to 0.90.x very soon; we're coordinating downtime with Input and SUMO.

I'll need to file a bug for network flows for the following hosts, though:

* dxradm.private.phx1.mozilla.com
* dxr-processor1.private.phx1.mozilla.com
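A rebuild-and-reindex cycle like the 6-hourly one requested above is commonly done by building each new index under a fresh name and then atomically repointing an alias, so searches never see a half-built index. A minimal sketch, assuming the elasticsearch-py client; the alias and index names here are hypothetical, not DXR's actual naming scheme:

```python
# Hedged sketch of an alias-swap reindex rotation. The alias "dxr-hot"
# and the versioned index names are hypothetical examples; `es` is
# assumed to be an elasticsearch-py client instance.

def rotate_index(es, alias, new_index, old_index=None):
    """Point `alias` at `new_index` in one atomic call, then drop the old index."""
    actions = [{"add": {"index": new_index, "alias": alias}}]
    if old_index:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # A single update_aliases request applies all actions atomically.
    es.indices.update_aliases(body={"actions": actions})
    if old_index:
        # Reclaim the space held by the superseded index.
        es.indices.delete(index=old_index)
```

Because the swap happens in one `update_aliases` call, readers move from the old index to the new one atomically, and the 50GB flood of each rebuild never disrupts queries in flight.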
Assignee: server-ops-webops → bburton
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard