Bug 977760
Opened 11 years ago
Closed 11 years ago
Elasticsearch capacity for DXR
Categories: Infrastructure & Operations Graveyard :: WebOps: Other (task)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: erik; Assignee: bburton
Details
DXR is on track to switch to an Elasticsearch backend (from NetApp NFS mounts) in the next 4-8 weeks (https://wiki.mozilla.org/DXR_Storages#Elasticsearch). Can we get some ES capacity allocated, either on an existing cluster or a new one? (I believe fubar has already started investigating, so I'm CCing him here, and he can claim the ticket if he's the right person to do it.)
Our indices will start at around 50GB (an order-of-magnitude estimate--I'll update this once I have a better number) and grow as we add more trees. Currently, we've indexed 2 of about 50 trees planned.
The biggest draw on resources will be reindexing: we rebuild and reindex both of our current trees every 6 hours and would like to maintain a comparable schedule. I could be argued down to every 24 hours, but it's still a lot of data flying around: 50GB in a big flood every time we reindex, if my above number is correct. And that will grow with the number of trees.
Many thanks! Let me know if you need to know more about our data access patterns or anything.
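The arithmetic behind the reindexing concern can be sketched as follows. This is a rough back-of-envelope calculation using the figures above; the 50GB index size is an order-of-magnitude estimate, and the linear scaling with tree count is an assumption, not a measured growth curve:

```python
# Back-of-envelope reindex data volume for the ES cluster,
# using the rough figures from the bug description.

INDEX_SIZE_GB = 50    # current total index size (order-of-magnitude estimate)
REINDEX_HOURS = 6     # desired reindex interval
TREES_NOW = 2         # trees indexed today
TREES_PLANNED = 50    # trees planned eventually

reindexes_per_day = 24 // REINDEX_HOURS

# Data pushed through the cluster per day at today's size:
daily_gb_now = INDEX_SIZE_GB * reindexes_per_day

# Naive projection, assuming index size scales linearly with tree count:
daily_gb_planned = daily_gb_now * (TREES_PLANNED / TREES_NOW)

print(daily_gb_now)      # 200 GB/day at the current 2 trees
print(daily_gb_planned)  # 5000.0 GB/day if all 50 trees scale linearly
```

Even at today's size, that's roughly 200GB of writes per day hitting the cluster, which is why the reindex schedule (6 vs. 24 hours) matters for capacity planning.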
Comment 1•11 years ago
(In reply to Erik Rose [:erik][:erikrose] from comment #0)
We can get you started with the existing production cluster in PHX1, https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=5112369&src=search#Elasticsearch%28webapp%29-PHX1Production, accessible through elasticsearch-zlb.vlan81.phx1.mozilla.com.
This is currently 0.20.5, but should be moved to 0.90.x very soon; we're coordinating downtime with Input and SUMO.
I'll need to file a bug for flows for the following hosts, though:
* dxradm.private.phx1.mozilla.com
* dxr-processor1.private.phx1.mozilla.com
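Once those flows are in place, a quick sanity check from either host is to hit the cluster-health endpoint through the load balancer. This is a sketch, not a documented procedure: port 9200 is Elasticsearch's default HTTP port and is an assumption here, and the command only works from inside the PHX1 network.

```shell
# Run from dxradm or dxr-processor1 after the firewall flows open.
# Assumes the default ES HTTP port (9200); adjust if the ZLB maps it differently.
curl -s 'http://elasticsearch-zlb.vlan81.phx1.mozilla.com:9200/_cluster/health?pretty=true'
```

A "green" or "yellow" status in the response means the hosts can reach the cluster and it's serving requests.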
Assignee: server-ops-webops → bburton
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard