Closed Bug 860256 Opened 11 years ago Closed 11 years ago

Populate OrangeFactor's dev PHX ES DB with data from the ES SCL3 production instance

Categories

(Tree Management Graveyard :: OrangeFactor, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: emorley, Unassigned)

Details

* OrangeFactor currently uses the Metrics ES instance, to which I have direct access using MPT-VPN.
* Bug 772503 wants us to move to a new SCL3 prod ES instance, to which we're not being allowed access (bug 849161 comment 15 onwards).
* In order to work on OrangeFactor I need access to recent production data - so we'll need to periodically mirror the prod ES DB to dev.

TBD:
* Automatic cron vs a script I can run by hand on brasstacks (the machine on which the prod OF instance runs, which does have access to prod).
* How many weeks' worth of records to copy across (I think we'll only need the last 2-4 weeks' worth of some of the tables, or whatever ES calls them).
adding jakem (webops) and removing the DBAs, the ES db isn't one that the db team administers, webops does.
Assignee: server-ops-database → server-ops-webops
Component: Server Operations: Database → Server Operations: Web Operations
QA Contact: cshields → nmaul
Flags: needinfo?(emorley)
Whiteboard: [triaged 20130411]
(In reply to Ed Morley [:edmorley UTC+1] from comment #0)
> TBD:
> * Automatic cron vs a script I can run by hand on brasstacks (the machine on
> which the prod OF instance runs, which does have access to prod).
> * How many weeks' worth of records to copy across (I think we'll only need
> the last 2-4 weeks' worth of some of the tables, or whatever ES calls them).

For the moment, let's make this a manual script that can be run from brasstacks.

The last 4 weeks' worth of records would be great.

jgriffin, I'm guessing some of the ES tables will be needed wholesale? I can't find a schema checked in anywhere; do you know which we'll need all of?
Flags: needinfo?(emorley) → needinfo?(jgriffin)
ES doesn't use schemas, per se.  The indices OrangeFactor uses are:  logs, tbpl, bugs, and bzcache.
Flags: needinfo?(jgriffin)
So I'm guessing we'll need all of bugs and bzcache, but just the last 4 weeks of logs and tbpl?
We should just need all of bzcache; the bugs index contains a different view of the starred bugs from TBPL, so we can just copy the most recent 4 weeks of that, as well as logs and tbpl.
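Putting the two comments above together, the copy plan could be captured in a small table: all of bzcache, and only the most recent 4 weeks of bugs, logs, and tbpl. A minimal sketch, assuming this plan; the `COPY_PLAN` structure and `cutoff_date` helper are hypothetical illustrations, not part of OrangeFactor:

```python
# Per-index copy plan from the discussion above: bzcache is copied
# wholesale, while bugs, logs, and tbpl only need the last 4 weeks.
# COPY_PLAN and cutoff_date are hypothetical, not OrangeFactor code.

import datetime

COPY_PLAN = {
    "bzcache": None,  # None means copy the whole index
    "bugs": 4,        # number of weeks of history to copy
    "logs": 4,
    "tbpl": 4,
}

def cutoff_date(weeks, today=None):
    """Return the earliest date (inclusive) to copy, or None for a full copy."""
    if weeks is None:
        return None
    today = today or datetime.date.today()
    return today - datetime.timedelta(weeks=weeks)
```

A manual script run from brasstacks could iterate over `COPY_PLAN` and build a date-range filter for each index that has a week limit.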
From IRC, before I lose it:

phrawzty: you can't really, like, "mirror" elasticsearch indexes.
phrawzty: assuming the raw data is still available, you can re-index it into a new instance, for example.
phrawzty: then you have two indices
phrawzty: with the same data :)
phrawzty: in terms of keeping them live, generally speaking the most sane approach is to have the client - which is to say the thing that's actually feeding ES - communicate with both instances
phrawzty: if re-indexing from the original data is not possible, then there are two options
phrawzty: one is to literally copy the entire file system contents of a complete index.  this is fine for small indices that aren't sharded across multiple nodes
phrawzty: in The Real World, however, that's not generally possible
edmorley: yeah I can imagine, particularly given the size of the indices here
phrawzty: which leads to the second option, which is performing a scroll re-index from one ES instance directly to another
phrawzty: this generally works, but it can be.. quirky, especially if strange things are being done to the source index.  it works, but it can be finicky, is what i'm trying to say.
phrawzty: does.. does that help at all ?
edmorley: yes thank you :-)
edmorley: I'll have a think about this
edmorley: maybe modifying the client is the easiest thing here
edmorley: and then scheduling something to purge the dev instance periodically
phrawzty: so you can set an expire time on indexed documents in ES
edmorley: oh
phrawzty: dunno if that is interesting to your use case, but there you go
edmorley: it may indeed be, thank you for that
phrawzty: http://www.elasticsearch.org/guide/reference/mapping/ttl-field/
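The scroll re-index approach phrawzty describes can be sketched directly against ES's HTTP API (the scan/scroll endpoints from the 0.90-era ES that was current at the time). The hostnames are the prod/dev ones given in this bug; the `date` field name and all of the helper functions are assumptions for illustration, not OrangeFactor code:

```python
# Sketch of a scroll re-index: scan documents out of the production
# instance in batches and bulk-index each batch into dev. Stdlib only;
# index layout, the "date" field, and the helpers are hypothetical.

import json
import urllib.request

PROD = "http://elasticsearch-zlb.webapp.scl3.mozilla.com:9200"
DEV = "http://elasticsearch-zlb.dev.vlan81.phx.mozilla.com:9200"

def range_query(field, since):
    """Match documents whose `field` is on or after the `since` date string."""
    return {"query": {"range": {field: {"gte": since}}}}

def bulk_payload(hits):
    """Convert scroll hits into a newline-delimited _bulk request body."""
    lines = []
    for hit in hits:
        lines.append(json.dumps({"index": {"_index": hit["_index"],
                                           "_type": hit["_type"],
                                           "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"

def http(method, url, body=None):
    """Send one JSON-over-HTTP request and decode the response."""
    req = urllib.request.Request(url, data=body and body.encode(), method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def scroll_copy(index, query):
    """Scan `index` on prod and bulk-index each batch of hits into dev."""
    resp = http("POST", "%s/%s/_search?search_type=scan&scroll=5m" % (PROD, index),
                json.dumps(query))
    while True:
        resp = http("POST", "%s/_search/scroll?scroll=5m" % PROD,
                    resp["_scroll_id"])
        hits = resp["hits"]["hits"]
        if not hits:
            break
        http("POST", "%s/_bulk" % DEV, bulk_payload(hits))
```

If the dev copies should also be purged periodically, the legacy `_ttl` mapping field linked above could be set on the copied documents instead of scheduling a separate cleanup job.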

--

Moving back to OF so we can decide on the best strategy for this. In addition, I think I'm going to not make this block bug 772503 / bug 848834 any longer, given that OF has been holding them up long enough as it is.
Assignee: server-ops-webops → nobody
Component: Server Operations: Web Operations → Orange Factor
Product: mozilla.org → Testing
QA Contact: nmaul
Version: other → Trunk
Summary: Periodically mirror OrangeFactor's production ES SCL3 DB to the dev PHX ES instance → Populate OrangeFactor's dev PHX ES DB with data from the ES SCL3 production instance
Whiteboard: [triaged 20130411]
Production: elasticsearch-zlb.webapp.scl3.mozilla.com:9200
Dev: elasticsearch-zlb.dev.vlan81.phx.mozilla.com:9200
If we opt for the write-everything-to-two-ES-instances approach, we'll need to update logparser and bzcache as well so they know how to do this.
And technically this *does* block the move because we need log data for OF to be useful.
(In reply to Mark Côté ( :mcote ) from comment #9)
> And technically this *does* block the move because we need log data for OF
> to be useful.

This bug is only about the dev/staging instance, which we're not currently using, so it doesn't block :-)
Yeah, agreed that this shouldn't block us from getting off of the metrics cluster, except that this bug was still blocking on bug 848834, which itself blocks bug 772503. :)  I've cleared the blocker list, since as you say we can really do this anytime.  We'll come back to this sometime after the migration.

Also agreed that we probably don't have to actually import any data to the dev cluster (unlike the prod cluster (bug 8705590)), as long as we get tbpl and the logparser writing to the dev cluster as well as prod, and (optionally) get the bzcache refresh script writing to dev as well (but only for the last 4 weeks, not 6 months).  We also need to be sure that tbpl, logparser, and the bzcache refresher can all gracefully handle failed writes of various types (host not found, port not open, errors when submitting data, etc.), since this will be a *dev* system and may not always be up or fully functional.
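The dual-write requirement above (prod writes must never be broken by a flaky dev cluster) could be wrapped in a small client-side shim. A minimal sketch, assuming a writer object with an `index` method; the `DualWriter` class and writer interface are hypothetical, not the actual tbpl/logparser/bzcache code:

```python
# Sketch of the dual-write idea: the client writes to prod and also
# attempts dev, but any dev failure is logged and swallowed so a down
# or broken dev cluster never breaks the prod write path.
# DualWriter and its writer interface are hypothetical.

import logging

log = logging.getLogger("dualwrite")

class DualWriter(object):
    def __init__(self, prod, dev):
        self.prod = prod  # must succeed; exceptions propagate
        self.dev = dev    # best effort; failures are logged and ignored

    def index(self, doc):
        self.prod.index(doc)  # a prod failure is still fatal
        try:
            self.dev.index(doc)
        except Exception:
            # dev may be unreachable, refusing connections, or
            # rejecting the write; log it and carry on
            log.warning("dev ES write failed; continuing", exc_info=True)
```

The same wrapper pattern would apply to tbpl, logparser, and the bzcache refresher, covering the failure modes listed above (host not found, port not open, errors when submitting data).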
No longer blocks: 848834
I think we just need to give up on this idea for now - short/medium term much easier just to tunnel to prod data :-)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE
Product: Testing → Tree Management
Product: Tree Management → Tree Management Graveyard