Closed Bug 824065 (ops-es-mdn) Opened 13 years ago Closed 13 years ago

set up ES for MDN

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P1)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: groovecoder, Unassigned)

Details

(Whiteboard: [triaged 20130104][waiting][853185])

The MDN dev, stage, and prod clusters need access to the ES cluster so we can start using it for site search.
Blocks: 823723
Severity: major → normal
Priority: -- → P2
Sounds good to me; the SCL3 ES cluster should be ready to use. All we should need is some netflows from your nodes to the ES cluster IPs, and then probably some settings_local.py entries.

The source IPs for this will be the web nodes, at least... possibly the celery node(s) at some point, but that's still not production at the moment.

* developer1.webapp.scl3.mozilla.com - 10.22.81.18
* developer2.webapp.scl3.mozilla.com - 10.22.81.19
* developer3.webapp.scl3.mozilla.com - 10.22.81.20

@phrawzty: let us know what destination IP(s) and port(s) need to be allowed. When we have that, we'll open a dependent ACL bug.
Flags: needinfo?(dmaher)
Priority: P2 → P3
Whiteboard: [triaged 20130104]
I missed a note from triage (and comment 0)... the info above is for prod only. Obviously we'll want dev and stage too. I'm okay with doing a cross-DC ACL for dev and stage, if NetOps is willing to do it. I'm also okay with those living on the same ES cluster, just using different indexes. If anyone is concerned about this, please speak up. :)

Those source nodes are:
* developer1.dev.webapp.scl3.mozilla.com - 10.22.81.16
* developer1.stage.webapp.scl3.mozilla.com - 10.22.81.17
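Sharing one cluster across dev, stage, and prod works as long as each environment reads and writes its own index names. The stage settings quoted later in this bug set ES_INDEX_PREFIX = 'mdnstage' and ES_INDEXES = {'default': 'main_index'}. A minimal sketch of the idea follows. It assumes (hypothetically) that the prefix and index name are joined with an underscore, and that dev and prod use prefixes like 'mdndev' and 'mdnprod', neither of which this bug states. The helper `index_name` is illustrative, not MDN's actual code.

```python
# Stage value as shown in its settings_local.py later in this bug.
ES_INDEXES = {"default": "main_index"}

def index_name(prefix: str, alias: str = "default") -> str:
    """Return the physical index name for one environment.

    Hypothetical helper: the "<prefix>_<index>" join is an assumption,
    not necessarily how MDN's search code builds the name.
    """
    return f"{prefix}_{ES_INDEXES[alias]}"

# "mdndev" and "mdnprod" are hypothetical; only "mdnstage" appears in this bug.
ENV_PREFIXES = ("mdndev", "mdnstage", "mdnprod")

# Distinct prefixes give each environment its own indexes on the shared cluster.
names = {env: index_name(env) for env in ENV_PREFIXES}
assert len(set(names.values())) == len(ENV_PREFIXES)
```

The point of the design is isolation by naming: a reindex on stage only touches `mdnstage_*` indexes, so dev and prod searches are unaffected even though they hit the same nodes.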
Hi :jakem (et al.),

As a general rule, nodes in the webapp VLAN can all talk to each other without requiring additional flows to be opened, though this is true only within a given data centre. Thus, the MDN (prod) machines can already communicate with the (prod) ES cluster at SCL3:

    $ for i in developer1.dev developer1.stage developer1; do ssh $i.webapp.scl3.mozilla.com 'nc -vz elasticsearch-zlb.webapp.scl3.mozilla.com 9200'; done
    Connection to elasticsearch-zlb.webapp.scl3.mozilla.com 9200 port [tcp/wap-wsp] succeeded!
    Connection to elasticsearch-zlb.webapp.scl3.mozilla.com 9200 port [tcp/wap-wsp] succeeded!
    Connection to elasticsearch-zlb.webapp.scl3.mozilla.com 9200 port [tcp/wap-wsp] succeeded!

However, as :jakem noted, for the dev and stage machines at SCL3 to communicate with the dev & stage cluster at PHX1, a flow will need to be opened:

    $ for i in developer1.dev developer1.stage; do ssh $i.webapp.scl3.mozilla.com 'nc -w 1 -vz elasticsearch-zlb.dev.vlan81.phx.mozilla.com 9200'; done
    nc: connect to elasticsearch-zlb.dev.vlan81.phx.mozilla.com port 9200 (tcp) timed out: Operation now in progress
    nc: connect to elasticsearch-zlb.dev.vlan81.phx.mozilla.com port 9200 (tcp) timed out: Operation now in progress

In addition, it can be useful (though not always necessary) for the admin node to be able to communicate with the ES cluster(s):

    $ ssh developeradm.private.scl3.mozilla.com 'nc -w 1 -vz elasticsearch-zlb.dev.vlan81.phx.mozilla.com 9200'
    nc: connect to elasticsearch-zlb.dev.vlan81.phx.mozilla.com port 9200 (tcp) timed out: Operation now in progress
    $ ssh developeradm.private.scl3.mozilla.com 'nc -w 1 -vz elasticsearch-zlb.webapp.scl3.mozilla.com 9200'
    nc: connect to elasticsearch-zlb.webapp.scl3.mozilla.com port 9200 (tcp) timed out: Operation now in progress

In summary, flows from these machines:
* developer1.dev.webapp.scl3.m.c
* developer1.stage.webapp.scl3.m.c
* (optional) developeradm.private.scl3.m.c

will need to be opened to:
* elasticsearch-zlb.dev.vlan81.phx.m.c : 9200/tcp
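The `nc -vz` / `nc -w 1 -vz` checks above only test whether a TCP handshake to port 9200 completes. Where nc isn't installed, the same check can be scripted; a minimal sketch using Python's standard `socket.create_connection`, with hypothetical function names and the destination hosts copied from the transcripts:

```python
import socket

# Destinations from the nc checks above; 9200 is the ES HTTP port used throughout this bug.
ES_DESTINATIONS = (
    "elasticsearch-zlb.webapp.scl3.mozilla.com",
    "elasticsearch-zlb.dev.vlan81.phx.mozilla.com",
)

def port_open(host: str, port: int = 9200, timeout: float = 1.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout.

    Roughly the check `nc -w 1 -vz host port` performs: connect, send nothing, close.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS lookup failed
        return False

def check_flows(hosts=ES_DESTINATIONS, port: int = 9200) -> dict:
    """Map each destination host to whether it is reachable from this node."""
    return {host: port_open(host, port) for host in hosts}
```

Run `check_flows()` on each source node (developer1.dev, developer1.stage, developeradm); a `False` for the PHX1 host corresponds to the "timed out" lines above. Like nc, this cannot distinguish a filtered port from a host that is simply down.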
Flags: needinfo?(dmaher)
Alias: ops-es-mdn
Blocks: 839214
Bump. We need to set this up to unblock ES site search work.
Severity: normal → major
(In reply to Daniel Maher [:phrawzty] (AFK through 4 March 2013) from comment #3)
> In summary, these are the flows from these machines :
> * developer1.dev.webapp.scl3.m.c
> * developer1.stage.webapp.scl3.m.c
> * (optional) developeradm.private.scl3.m.c
>
> will need to be opened to :
> * elasticsearch-zlb.dev.vlan81.phx.m.c : 9200/tcp

Is there a net-ops bug for this? It's the only IT blocker besides adding IP addresses to local configs.
I'm knocking this down to Normal so it doesn't continue to page me. I have alerted WebOps to its escalated severity.
Severity: major → normal
(In reply to James Socol [:jsocol, :james] from comment #5)
> (In reply to Daniel Maher [:phrawzty] (AFK through 4 March 2013) from comment #3)
> > In summary, these are the flows from these machines :
> > * developer1.dev.webapp.scl3.m.c
> > * developer1.stage.webapp.scl3.m.c
> > * (optional) developeradm.private.scl3.m.c
> >
> > will need to be opened to :
> > * elasticsearch-zlb.dev.vlan81.phx.m.c : 9200/tcp
>
> Is there a net-ops bug for this? It's the only IT blocker besides adding IP addresses to local configs.

I'll find or file an ACL bug today, and we can work together next week to roll out the config changes.
Assignee: server-ops-webops → bburton
Priority: P3 → P1
Wait, I just noticed this, but do we intend to open a route from scl3 to phx? Don't we have dev ES infra up in scl3?
(In reply to James Socol [:jsocol, :james] from comment #8)
> Wait, I just noticed this, but do we intend to open a route from scl3 to phx? Don't we have dev ES infra up in scl3?

At this time the only ES dev cluster is in PHX1. If an SCL3 cluster is required, we may be able to get one set up down the line; for now, PHX1 is the only option.
We'll need this for MDN in the next week or so. Which will be available?
* Open flows from SCL3 dev, stage, adm nodes to PHX ES cluster
or
* Set up dev ES in SCL3
(In reply to Luke Crouch [:groovecoder] from comment #10)
> We'll need this for MDN in the next week or so. Which will be available?
>
> * Open flows from SCL3 dev, stage, adm nodes to PHX ES cluster

Of the two options presented, opening the network flows is the only realistic one.
What product/component do we use to file the NetOps bug? We'll mark it as blocking this one.
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=846934 to get the rest of the flows in place; it should be looked at Monday.

In the meantime, do you want to try to dark launch it for prod? We have flows for prod already, so maybe I can push the ES config and we can try some manage.py commands?
Flags: needinfo?(lcrouch)
Thanks. Need to merge some more ES code to test on stage. Will ping back here.
Flags: needinfo?(lcrouch)
(In reply to Luke Crouch [:groovecoder] from comment #14)
> Thanks. Need to merge some more ES code to test on stage. Will ping back here.

Sounds good. I'm idling in #mdndev, so please ping me if I can help test anything with this or celery.
Depends on: 848870
Blocks: 853185
Assignee: bburton → server-ops-webops
Whiteboard: [triaged 20130104] → [triaged 20130104][waiting][853185]
Blocks: 868506
Been iterating on dev for a while, and now we're ready to put stage and prod onto ES. I tried to index on stage and got:

    AttributeError: 'Settings' object has no attribute 'ES_INDEXING_TIMEOUT'

So I think we need to copy the same ES_* values from dev's settings_local.py to stage's settings_local.py.
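The AttributeError above means stage's settings never define a name that the indexing code reads. Before a deploy, that kind of gap can be caught by comparing the top-level ES_* assignments in dev's and stage's settings_local.py. A minimal sketch, assuming copies of both files are available locally as plain Python source; it parses them with the standard `ast` module instead of importing them, so no Django setup is needed. All function names here are hypothetical.

```python
import ast
from pathlib import Path

def es_names(source: str) -> set:
    """Return top-level names starting with ES_ that the settings source assigns.

    Only simple assignments (ES_X = ... or ES_X: T = ...) are recognized;
    tuple unpacking and names created via exec/globals() are not.
    """
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id.startswith("ES_"):
                names.add(target.id)
    return names

def missing_es_names(reference: str, target: str) -> list:
    """ES_* names assigned in reference (e.g. dev) but not in target (e.g. stage)."""
    return sorted(es_names(reference) - es_names(target))

def compare_settings_files(dev_path: str, stage_path: str) -> list:
    """File-based wrapper; the paths point at local copies of each settings_local.py."""
    return missing_es_names(Path(dev_path).read_text(), Path(stage_path).read_text())
```

Parsing rather than importing is deliberate: a settings_local.py usually does `from settings import *` or similar and may fail outside the deployed project, whereas `ast.parse` only needs the file to be syntactically valid.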
As per bug 853185c12 the ES config for stage has been set on the admin server. You may trigger a deployment via Chief at your leisure.
Got this on stage when I tried to search: AttributeError: 'Settings' object has no attribute 'ES_URLS'
On the admin server:

    $ grep ^ES_ settings_local.py
    ES_DISABLED = False
    ES_INDEXES = {'default': 'main_index'}
    ES_INDEX_PREFIX = 'mdnstage'
    ES_LIVE_INDEX = True
    ES_INDEXING_TIMEOUT = 30
    ES_URLS = ['http://elasticsearch-zlb.dev.vlan81.phx1.mozilla.com:9200']

Did you push stage before attempting your search?
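The grep confirms the settings exist on the admin server; whether the host in ES_URLS actually answers HTTP from the web nodes is a separate question. Elasticsearch's cluster health API (`GET /_cluster/health`) returns JSON whose `status` field is `green`, `yellow`, or `red`. A minimal standard-library sketch, assuming the node is reachable without a proxy or authentication; `cluster_status` is a hypothetical name:

```python
import json
import urllib.request

# Value copied from the stage settings_local.py shown above.
ES_URLS = ["http://elasticsearch-zlb.dev.vlan81.phx1.mozilla.com:9200"]

def cluster_status(base_url: str, timeout: float = 5.0) -> str:
    """Return the "status" field (green/yellow/red) from GET <base_url>/_cluster/health.

    Raises OSError (urllib.error.URLError is a subclass) if the node
    cannot be reached or answers with an HTTP error.
    """
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["status"]
```

Usage on a web node: `for url in ES_URLS: print(url, cluster_status(url))`. A `yellow` status is common on a small cluster where replica shards cannot be allocated and does not by itself block searching.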
Prod push, reindex, and search looks good. Thanks!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard