Bug 1093757
Opened 10 years ago
Closed 10 years ago
Install a RabbitMQ monitoring plugin for New Relic on stage and prod
Categories
(Tree Management :: Treeherder: Infrastructure, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: fubar, Assigned: fubar)
Ensure we're reporting data to New Relic for memcached, RabbitMQ, etc. so devs can have more insight into the production environment, both for development and for supporting ops.
Comment 1 (Assignee) • 10 years ago
The New Relic plugin agent is also not actually collecting data from Apache: the plugin config has port 80 hardcoded, but Apache is listening on 8080.
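A quick way to confirm a port mismatch like the one in comment 1 is to probe both ports from the host itself. This is only a sketch: the helper name `port_is_open` is made up for illustration, and it checks that something accepts TCP connections on the port, not that it is Apache or that the status page is enabled.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `port_is_open("localhost", 80)` and `port_is_open("localhost", 8080)` on a web node would show which of the two ports the agent can actually reach.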
Comment 2 (Assignee) • 10 years ago
memcached is now reporting: https://rpm.newrelic.com/accounts/677903/plugins/13559
Updated • 10 years ago
Priority: -- → P2
Comment 3 (Assignee) • 10 years ago
all staging hosts are now reporting to newrelic correctly (proxy acl was blocking outbound data).
apache was also configured to listen on port 80 so that the agent could collect data.
Comment 4 • 10 years ago
We hit another situation today where two of the processors had stopped taking tasks (even though we hadn't deployed) resulting in:
log_parser 19702
log_parser_fail 375
log_parser_hp 16337
Having the queues in new relic would mean we could (presumably) set up email alerts, and so not have to wait until the sheriffs say "is there a problem with log parsing", by which time there is a 35000 job backlog - which takes a fair time to clear even after a |restart-jobs -p log|.
Also - is it expected that everything other than the webapp nodes has "0 rpm" on https://rpm.newrelic.com/accounts/677903/applications/4180461 ? Is there any way we can get that to report the actual number of tasks handled per second?
OS: Mac OS X → All
Priority: P2 → P1
Hardware: x86 → All
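The kind of alert asked for in comment 4 boils down to comparing queue depths against a threshold. A minimal sketch using the queue sizes quoted in that comment follows; the function names and the threshold of 1000 are hypothetical, since sensible threshold values had not been decided in this bug.

```python
from typing import Dict, List

# Queue depths as quoted in comment 4.
QUEUE_DEPTHS: Dict[str, int] = {
    "log_parser": 19702,
    "log_parser_fail": 375,
    "log_parser_hp": 16337,
}


def total_backlog(depths: Dict[str, int]) -> int:
    """Sum the pending messages across all queues."""
    return sum(depths.values())


def queues_over_threshold(depths: Dict[str, int], threshold: int) -> List[str]:
    """Return the names of queues deeper than threshold, sorted by name."""
    return sorted(name for name, depth in depths.items() if depth > threshold)


if __name__ == "__main__":
    # The threshold of 1000 is an illustrative assumption, not a recommended value.
    print(total_backlog(QUEUE_DEPTHS))
    print(queues_over_threshold(QUEUE_DEPTHS, 1000))
```

With these numbers the two log parser queues exceed the assumed threshold while log_parser_fail does not, so an alert keyed on per-queue depth would have fired well before the backlog reached its final size.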
Comment 5 • 10 years ago
Is this rabbitmq new relic plugin what we need?
https://rpm.newrelic.com/accounts/677903/plugins/directory/95
Comment 6 • 10 years ago
:edmorley the webapp nodes should have rpm == 0 for non-web transactions and rpm > 0 for web transactions.
The opposite is true for all the other nodes: rpm == 0 for web transactions and rpm > 0 for non-web transactions.
Comment 7 • 10 years ago
(In reply to Mauro Doglio [:mdoglio] from comment #6)
> :edmorley the webapp nodes should have rpm == 0 for non-web transactions and
> rpm > 0 for web transactions.
> The opposite is true for all the other nodes: rpm == 0 for web transactions
> and rpm > 0 for non-web transactions.
The table on https://rpm.newrelic.com/accounts/677903/applications/4180461 has 0 rpm for all nodes apart from webapp, so seems like something needs tweaking.
Comment 8 • 10 years ago
Some options for alerts:
http://celery.readthedocs.org/en/latest/userguide/monitoring.html#monitoring-munin
Updated • 10 years ago
Priority: P1 → P2
Comment 9 • 10 years ago
Please can we install either of these:
https://rpm.newrelic.com/accounts/677903/plugins/directory/25
https://rpm.newrelic.com/accounts/677903/plugins/directory/95
The former is what is used on the Mozilla General New Relic account:
https://rpm.newrelic.com/accounts/263620/plugins/11697
...so failing any other ideas, shall we go with that one?
Added bonus: once this is set up, we can set up alerts for message queue sizes that don't require access to Nagios (plus when the alerts _do_ fire, they'll link to the pretty graphs).
Summary: newrelic monitoring for memcache, rabbitmq, etc → Install a RabbitMQ monitoring plugin for New Relic on stage and prod
Comment 10 (Assignee) • 10 years ago
It's been installed and is apparently failing to connect:
ERROR 2015-03-10 19:22:41,395 27769 MainProcess MainThread newrelic_plugin_agent.agent send_components L235 : Error reporting stats: HTTPSConnectionPool(host='platform-api.newrelic.com', port=443): Max retries exceeded with url: /platform/v1/metrics (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 403 Forbidden',)))
which is messed up because I can connect to that directly. newrelic has fast become my least favorite part of this project.
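The 403 in the log line above is reported for the proxy's tunnel, not for platform-api.newrelic.com itself, which would explain why a direct connection test succeeds while the agent fails. A rough sketch for spotting this kind of failure in agent logs is below; the helper name and the regular expression are illustrative assumptions based only on the single log line quoted in this comment.

```python
import re
from typing import Optional

# Matches the ProxyError text in the quoted log line and captures the
# HTTP status code the proxy returned when the tunnel was refused.
_TUNNEL_FAILURE = re.compile(r"ProxyError\(.*Tunnel connection failed: (\d{3}) ")


def proxy_tunnel_status(log_line: str) -> Optional[int]:
    """Return the proxy's status code if the line reports a failed tunnel, else None."""
    match = _TUNNEL_FAILURE.search(log_line)
    return int(match.group(1)) if match else None
```

A non-None result points at the proxy (for example an ACL rejecting the destination) rather than at the New Relic endpoint.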
Comment 11 (Assignee) • 10 years ago
proxy fixed and rabbitmq is finally reporting.
Comment 12 • 10 years ago
That's great - thank you :-)
@sheriffs:
Check this page if you ever think tasks are getting behind:
https://rpm.newrelic.com/accounts/677903/dashboard/6293241/page/4
Have filed bug 1141993 for setting up new relic alerts once we know what sensible values are for the thresholds.
Assignee: nobody → klibby
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED