Closed Bug 1045750 Opened 11 years ago Closed 11 years ago

Update statsd settings

Categories

(Infrastructure & Operations :: IT-Managed Tools, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ericz, Assigned: cliang)

References

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/642] )

We need to update the statsd destination server for all of our webapps. In PHX1, we recently switched the statsd server from graphite1.private.phx1.mozilla.com to graphite6.private.phx1.mozilla.com. I've scoured puppet but there are still some servers using non-Puppet settings to send to graphite1. Please switch them to graphite6.private.phx1.mozilla.com. Same port, 8125.
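For anyone wanting to confirm that a host can reach the new destination, here is a minimal sketch that sends one counter over UDP using the plain statsd line format (`name:value|c`). The function name and the metric name `webapp.statsd_check` are made up for illustration; only the port, 8125, comes from this bug. UDP is fire-and-forget, so a successful send only shows the packet left the host, not that statsd accepted it.

```python
import socket


def send_statsd_counter(host, port=8125, metric="webapp.statsd_check", value=1):
    """Send a single statsd counter packet over UDP and return the payload sent.

    The metric name is a hypothetical placeholder, not one used by any webapp here.
    """
    payload = f"{metric}:{value}|c".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

Example call, assuming the new destination named above: `send_statsd_counter("graphite6.private.phx1.mozilla.com")`.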
Blocks: 919038
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/642]
Assignee: server-ops-webops → cliang
ausadm.private.phx1.mozilla.com - none
bounceradm.private.phx1.mozilla.com - none
bugzillaadm.private.scl3|phx1.mozilla.com - none
captainadm.private.scl3.mozilla.com - none
datazillaadm.private.scl3.mozilla.com - none
developeradm.private.scl3.mozilla.com - none
dxradm.private.phx1.mozilla.com - none
etherpadadm.private.phx1.mozilla.com - none
geoipadm.private.scl3.mozilla.com - none
[inputadm.private.phx1.mozilla.com - not able to connect?]

Found these references and changed them -- but did not push the changes out:

bedrockadm.private.phx1
/data/bedrock/src/www.mozilla.org-django/bedrock/bedrock/settings/local.py:STATSD_HOST = '10.8.75.93' # was 10.8.74.135

supportadm.private.phx1.mozilla.com
/data/support/www/support.mozilla.org/kitsune/kitsune/settings_local.py:STATSD_HOST = '10.8.75.93'
Bugzilla doesn't hyperlink the closing paren in the above comment so you'll need to cut and paste rather than click on it.
pluginsadm.private.phx1.mozilla.com - none
reviewboardadm.private.scl3.mozilla.com - none
>> socorro-bixieadm.private.phx1.mozilla.com - none
socorroadm.private.phx1.mozilla.com - commented out already
startpageadm.private.scl3.mozilla.com - none
staticadm.private.phx1.mozilla.com - none
treeherderadm.private.scl3.mozilla.com - none

Found these references, changed them, and pushed the changes out via the ./deploy command:

* genericadm.private.phx1.mozilla.com
./genericrhel6-dev/src/mozillians-dev.allizom.org/mozillians/mozillians/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com' #was 10.8.74.135
./genericrhel6-stage/src/mozillians.allizom.org/mozillians/mozillians/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com' #was 10.8.74.135
./genericrhel6/src/mozillians.org/mozillians/mozillians/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com' #was 10.8.74.135
./genericrhel6-dev/src/affiliates-dev.allizom.org/affiliates-app/affiliates/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com'
./genericrhel6-stage/src/affiliates.allizom.org/affiliates-app/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com'
./genericrhel6/src/affiliates.mozilla.org/affiliates-app/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com'
./genericrhel6-stage/src/basket.allizom.org/basket/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com' # was 10.8.74.135
./genericrhel6/src/basket.mozilla.org/basket/settings/local.py:STATSD_HOST = 'graphite1.private.phx1.mozilla.com' # was 10.8.74.135

* supportadm.private.phx1.mozilla.com
./support-dev/src/support-dev.allizom.org/kitsune/kitsune/settings_local.py:STATSD_HOST = '10.8.75.93'
./support-stage/src/support.allizom.org/kitsune/kitsune/settings_local.py:STATSD_HOST = '10.8.75.93'
./support/src/support.mozilla.org/kitsune/kitsune/settings_local.py:STATSD_HOST = '10.8.75.93'
Forgot to add - on bedrockadm.private.phx1
/data/bedrock-stage/www/www.allizom.org-django/bedrock/bedrock/settings/local.py:STATSD_HOST = '10.8.75.93' # was 10.8.74.135

Changed the reference and pushed it out. It looks like the earlier change I made on bedrock has also been pushed out.
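The per-admin-host search above could be scripted roughly as follows. This is a sketch, not the procedure actually used in this bug: it walks a directory tree and reports `STATSD_HOST` assignments in `.py` files whose value is one of the old destinations. The set of "stale" values is an assumption taken from the bug description (the graphite1 hostname) and from the `# was 10.8.74.135` comments; the function and constant names are hypothetical. Only the assigned value is checked, so a line that merely mentions the old IP in a trailing comment is not flagged, and commented-out assignments are skipped.

```python
import os
import re

# Assumed old destinations, inferred from this bug; adjust as needed.
STALE_HOSTS = {"graphite1.private.phx1.mozilla.com", "10.8.74.135"}

# Matches an uncommented assignment such as: STATSD_HOST = '10.8.75.93'
ASSIGNMENT = re.compile(r"""^\s*STATSD_HOST\s*=\s*['"]([^'"]+)['"]""")


def find_stale_statsd_hosts(root, stale_hosts=STALE_HOSTS):
    """Return (path, line number, value) for each stale STATSD_HOST under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        match = ASSIGNMENT.match(line)
                        if match and match.group(1) in stale_hosts:
                            hits.append((path, lineno, match.group(1)))
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
    return hits
```

Example call on an admin host: `find_stale_statsd_hosts("/data")`. A settings file that has been fixed on disk can still be in use by a long-running process that loaded the old value, so an empty result does not by itself mean traffic to the old host has stopped.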
Comparing against a version of the graph in comment 2 that I took this morning, things have dropped a little (115K to 112.5K), but not nearly as much as either of us would have liked. Is it worth trying to get data from the NetOps folks about what boxes are still shipping data to graphite1.private.phx1?
Flags: needinfo?(eziegenhorn)
No need yet, I can see what is connecting to the box like I did before. I'll update this when I get a chance.
Flags: needinfo?(eziegenhorn)
At the moment I see:
generic-celery1.webapp.phx1.mozilla.com
generic-celery2.webapp.phx1.mozilla.com
input2.webapp.phx1.mozilla.com
input3.webapp.phx1.mozilla.com
glow1.dmz.phx1.mozilla.com
glow1.dev.dmz.phx1.mozilla.com
support1.webapp.phx1.mozilla.com
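A list like this can be produced from a packet capture on the old server (tcpdump is mentioned later in this bug). As a sketch only: the helper below tallies source addresses from text lines in the usual IPv4 output of something like `tcpdump -nn udp port 8125`. The line format the regex expects (`IP <src>.<sport> > <dst>.<dport>: UDP, ...`) is an assumption about that output, and the function name is hypothetical. It reports IP addresses, so mapping them back to hostnames is a separate step.

```python
import re
from collections import Counter

# Assumed tcpdump -nn line shape, e.g.:
# 14:28:01.123456 IP 192.0.2.10.53211 > 192.0.2.1.8125: UDP, length 23
UDP_LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP"
)


def count_statsd_senders(lines, port=8125):
    """Count packets per source IP among captured UDP lines sent to the given port."""
    senders = Counter()
    for line in lines:
        match = UDP_LINE.search(line)
        if match and int(match.group("dport")) == port:
            senders[match.group("src")] += 1
    return senders
```

Fed from a saved capture (for example `count_statsd_senders(open("capture.txt"))`, where `capture.txt` is a hypothetical file of tcpdump output), `most_common()` on the result puts the noisiest remaining senders first.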
See Also: → 1055620
generic-celery1.webapp.phx1.mozilla.com
generic-celery2.webapp.phx1.mozilla.com
restarted celery processes for basket and affiliates, restarted mozillians apache process

glow - opened separate bug with devs; should be fixed.

input2/input3 --> inputadm.private.phx1.mozilla.com
/data/input/src/input.mozilla.org/input/fjord/settings/local.py:STATSD_HOST = '10.8.75.93' # was 10.8.74.135 see bug 933280
Changed this reference, pushed out the change, did an apachectl graceful.

On the support webheads, I stopped some extra celery processes that might have been sending information to graphite1.

I'm only seeing marginal improvement in the graphs. =\
generic-celery3.webapp.phx1.mozilla.com is sending to graphite1.
Also support1.dev.webapp.phx1.mozilla.com and generic102.webapp.phx1.mozilla.com.
* Restarted celery processes on generic-celery3.webapp.phx1, generic-celery1.dev.webapp.phx1, generic-celery1.stage.webapp.phx1, and generic-celerybeat1.webapp.phx1
* Restarted celery processes on sumocelery1.webapp.phx1, support1.stage.webapp.phx1, and support-celery1.dev.webapp.phx1
* Restarted celery processes on input-celery[1-2].webapp.phx1
* Graceful restart of HTTP on generic102.webapp.phx1.mozilla.com, generic1.dev.webapp.phx1, generic[1-2].stage.webapp.phx1
* Graceful restart of HTTP on support1.dev.webapp.phx1 and support1.stage.webapp.phx1

Minor drop in graphs. Clearly, I'm still missing things. =\
Based on discussion in IRC, we're going to close this bug for now and it can be reopened if more stragglers are found.

ericz 2:28
I'm just tcpdumping since again it doesn't log anything
And I think that graph I showed you is probably showing non-statsd traffic now
Which, I don't know where that is coming from, but need to find out.
Port 8125 udp seems dead quiet now.
So I think you can reso fix your bug and I'll continue tracking down straggler traffic elsewhere.
Thanks for all your help!

cyliang 2:30
Groovey. Reopen if you find something else hitting that port.

ericz 2:30
Sounds good
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED