Install a RabbitMQ monitoring plugin for New Relic on stage and prod

RESOLVED FIXED

Status

P2
normal
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: fubar, Assigned: fubar)

Tracking

(Blocks: 1 bug)

(Assignee)

Description

4 years ago
ensure we're reporting data to newrelic for memcached, rabbitmq, etc. so devs can have more insight into the production environment for development and supporting ops
(Assignee)

Comment 1

4 years ago
newrelic plugin agent is also not actually collecting data from apache. the plugin config has port 80 hardcoded, but apache's on 8080.
(Assignee)

Comment 2

4 years ago
memcached is now reporting: https://rpm.newrelic.com/accounts/677903/plugins/13559

Updated

4 years ago
Priority: -- → P2

Updated

4 years ago
Blocks: 1059325
(Assignee)

Comment 3

4 years ago
all staging hosts are now reporting to newrelic correctly (proxy acl was blocking outbound data). 

apache was also configured to listen on port 80 so that the agent could collect data.
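
For anyone re-checking this on another host, here is a minimal Python sketch of the check. It assumes the agent scrapes Apache's mod_status machine-readable page at /server-status?auto, which this bug does not confirm; the function names, the default host and the port list are illustrative only.

```python
from urllib.request import urlopen


def parse_server_status(text):
    """Parse the "Key: value" lines of Apache's mod_status ?auto output."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # ignore anything that is not a "Key: value" line
            stats[key.strip()] = value.strip()
    return stats


def first_responding_port(host="localhost", ports=(80, 8080),
                          path="/server-status?auto"):
    """Return (port, stats) for the first port serving the status page, else None.

    Assumption: mod_status is enabled and reachable at `path`.
    """
    for port in ports:
        try:
            with urlopen(f"http://{host}:{port}{path}", timeout=5) as resp:
                body = resp.read().decode("utf-8", "replace")
                return port, parse_server_status(body)
        except OSError:
            continue  # refused, timed out or HTTP error: try the next port
    return None
```

Running first_responding_port() on a node shows which of the two ports actually serves the status page, which is the mismatch described in comment 1.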

Comment 4

4 years ago
We hit another situation today where two of the processors had stopped taking tasks (even though we hadn't deployed) resulting in:
log_parser      19702
log_parser_fail 375
log_parser_hp   16337

Having the queues in new relic would mean we could (presumably) set up email alerts, and so not have to wait until the sheriffs say "is there a problem with log parsing", by which time there is a 35000 job backlog - which takes a fair time to clear even after a |restart-jobs -p log|.
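
Until alerts exist in New Relic, the same check can be sketched in a few lines of Python. This is only an illustration: it assumes the RabbitMQ management plugin is enabled so that /api/queues returns a JSON list with "name" and "messages" fields, and the threshold of 5000 is a made-up value, not an agreed one.

```python
import json


def queue_depths(api_queues_json):
    """Map queue name -> message count from a RabbitMQ /api/queues response body."""
    return {q["name"]: q.get("messages", 0) for q in json.loads(api_queues_json)}


def over_threshold(depths, threshold):
    """Return only the queues whose backlog exceeds the threshold."""
    return {name: count for name, count in depths.items() if count > threshold}


# Queue sizes reported in this comment.
depths = {"log_parser": 19702, "log_parser_fail": 375, "log_parser_hp": 16337}

# Hypothetical threshold; sensible values still need to be worked out.
backlogged = over_threshold(depths, threshold=5000)
total_backlog = sum(depths.values())
```

With the numbers above, log_parser and log_parser_hp would both trip a 5000-message alert long before the combined backlog reached its current size.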

Also - is it expected that everything other than the webapp nodes have "0 rpm" on https://rpm.newrelic.com/accounts/677903/applications/4180461 ? Is there any way we can get that to report the actual number of tasks handled per second?
OS: Mac OS X → All
Priority: P2 → P1
Hardware: x86 → All

Comment 5

4 years ago
Is this rabbitmq new relic plugin what we need?
https://rpm.newrelic.com/accounts/677903/plugins/directory/95

Comment 6

4 years ago
:edmorley the webapp nodes should have rpm == 0 for non-web transactions and rpm > 0 for web transactions.
The opposite is true for all the other nodes: rpm == 0 for web transactions and rpm > 0 for non-web transactions.

Comment 7

4 years ago
(In reply to Mauro Doglio [:mdoglio] from comment #6)
> :edmorley the webapp nodes should have rpm == 0 for non-web transactions and
> rpm > 0 for web transactions.
> The opposite is true for all the other nodes: rpm == 0 for web transactions
> and rpm > 0 for non-web transactions.

The table on https://rpm.newrelic.com/accounts/677903/applications/4180461 has 0 rpm for all nodes apart from webapp, so it seems like something needs tweaking.

Updated

4 years ago
Priority: P1 → P2

Comment 9

4 years ago
Please can we install either of these:
https://rpm.newrelic.com/accounts/677903/plugins/directory/25
https://rpm.newrelic.com/accounts/677903/plugins/directory/95

The former is what is used on the Mozilla General New Relic account:
https://rpm.newrelic.com/accounts/263620/plugins/11697

...so failing any other ideas, shall we go with that one?

Added bonus: once this is set up, we can set up alerts for message queue sizes that don't require access to Nagios (plus when the alerts _do_ fire, they'll link to the pretty graphs).
Summary: newrelic monitoring for memcache, rabbitmq, etc → Install a RabbitMQ monitoring plugin for New Relic on stage and prod
(Assignee)

Comment 10

4 years ago
It's been installed but is apparently failing to connect:

ERROR      2015-03-10 19:22:41,395 27769  MainProcess     MainThread newrelic_plugin_agent.agent                   send_components           L235   : Error reporting stats: HTTPSConnectionPool(host='platform-api.newrelic.com', port=443): Max retries exceeded with url: /platform/v1/metrics (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 403 Forbidden',)))

which is messed up because I can connect to that directly. newrelic has fast become my least favorite part of this project.
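
A rough way to reproduce the 403 outside the agent is to ask the proxy for the same CONNECT tunnel by hand. The sketch below uses only the Python standard library; check_tunnel and its parameters are hypothetical names, and the proxy host and port have to be supplied by whoever runs it since they are not given in this bug.

```python
import http.client


def check_tunnel(proxy_host, proxy_port,
                 target_host="platform-api.newrelic.com", target_port=443,
                 timeout=10):
    """Request a CONNECT tunnel through the proxy; return (ok, detail)."""
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=timeout)
    conn.set_tunnel(target_host, target_port)
    try:
        # connect() sends CONNECT to the proxy and raises OSError if the
        # proxy refuses the tunnel or the TLS handshake fails afterwards.
        conn.connect()
        return True, "tunnel established"
    except OSError as exc:
        return False, str(exc)
    finally:
        conn.close()
```

If the proxy ACL is the culprit, the detail string should carry the same "Tunnel connection failed: 403 Forbidden" text as the agent log, whereas a direct connection from the same host succeeds.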
(Assignee)

Comment 11

4 years ago
proxy fixed and rabbitmq is finally reporting.

Updated

4 years ago
Blocks: 1141993
That's great - thank you :-)

@sheriffs:
Check this page if you ever think tasks are getting behind:
https://rpm.newrelic.com/accounts/677903/dashboard/6293241/page/4

Have filed bug 1141993 for setting up new relic alerts once we know what sensible values are for the thresholds.
Assignee: nobody → klibby
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED