1093757 - Install a RabbitMQ monitoring plugin for New Relic on stage and prod

Assignee

Description

10 years ago

Ensure we're reporting data to New Relic for memcached, RabbitMQ, etc., so devs have more insight into the production environment, both for development and for supporting ops.

Kendall Libby [:fubar] (he/him)

Assignee

Comment 1

10 years ago

The newrelic plugin agent is also not actually collecting data from Apache: the plugin config has port 80 hardcoded, but Apache is on 8080.
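The underlying problem is a hardcoded port. Below is a minimal sketch of the alternative. It assumes the agent polls Apache's mod_status page in its machine-readable `?auto` form. The status URL is built from a configurable port (8080 here, matching this deployment), and the `Key: value` lines are parsed. The helper names `status_url` and `parse_mod_status`, the `DEFAULT_APACHE_PORT` constant, and the sample payload are hypothetical. They are not taken from the newrelic plugin agent's own code.

```python
from urllib.parse import urlunsplit

# Hypothetical default: Apache in this deployment listens on 8080, not 80.
DEFAULT_APACHE_PORT = 8080


def status_url(host: str = "localhost", port: int = DEFAULT_APACHE_PORT) -> str:
    """Build the mod_status URL from a configurable port instead of hardcoding 80."""
    return urlunsplit(("http", f"{host}:{port}", "/server-status", "auto", ""))


def parse_mod_status(body: str) -> dict:
    """Parse mod_status '?auto' output ("Key: value" lines) into a dict.

    Values that look numeric become int or float; anything else (e.g. the
    Scoreboard string) stays a str. Lines without ': ', such as a bare
    hostname line, are skipped.
    """
    stats = {}
    for line in body.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "Key: value" line
        value = value.strip()
        for cast in (int, float):
            try:
                stats[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            stats[key] = value  # neither int nor float: keep the raw string
    return stats


sample = "localhost\nTotal Accesses: 120\nReqPerSec: .5\nBusyWorkers: 2\nScoreboard: __W_"
print(status_url())              # http://localhost:8080/server-status?auto
print(parse_mod_status(sample))
```

Keeping the port as a parameter means a change like moving Apache to 8080 only touches configuration. The workaround actually used in this bug (making Apache also listen on port 80) is not needed in that case.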

Kendall Libby [:fubar] (he/him)

Assignee

Comment 2

10 years ago

memcached is now reporting: https://rpm.newrelic.com/accounts/677903/plugins/13559

Ed Morley [:emorley]

Updated

9 years ago

Priority: -- → P2

Ed Morley [:emorley]

Updated

9 years ago

Blocks: 1059325

Kendall Libby [:fubar] (he/him)

Assignee

Comment 3

9 years ago

All staging hosts are now reporting to New Relic correctly (a proxy ACL was blocking outbound data).

Apache was also configured to listen on port 80 so that the agent could collect data.

Ed Morley [:emorley]

Comment 4

9 years ago

We hit another situation today where two of the processors had stopped taking tasks (even though we hadn't deployed), resulting in:
log_parser      19702
log_parser_fail 375
log_parser_hp   16337

Having the queues in New Relic would mean we could (presumably) set up email alerts, rather than waiting until the sheriffs ask "is there a problem with log parsing?", by which time there is a 35000-job backlog, which takes a fair time to clear even after a |restart-jobs -p log|.
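Below is a minimal sketch of the kind of queue-depth check such an alert would perform. It assumes the input is the JSON list returned by the RabbitMQ management plugin's `/api/queues` endpoint. Each entry in that list carries a `name` and a `messages` count (ready plus unacknowledged). The function name `backlogged_queues` and the `BACKLOG_THRESHOLD` value are hypothetical, and no real threshold had been agreed at this point. The sample data reuses the queue sizes listed above.

```python
import json

# Hypothetical per-queue alert threshold; sensible values are still undecided.
BACKLOG_THRESHOLD = 5000


def backlogged_queues(payload: str, threshold: int = BACKLOG_THRESHOLD) -> dict:
    """Return {queue_name: depth} for every queue deeper than `threshold`.

    `payload` is assumed to be the JSON list from the RabbitMQ management
    API's /api/queues endpoint. Queues without a "messages" field yet
    (no stats collected) are treated as empty.
    """
    queues = json.loads(payload)
    return {
        q["name"]: q.get("messages", 0)
        for q in queues
        if q.get("messages", 0) > threshold
    }


# Queue sizes from the incident above.
sample = json.dumps([
    {"name": "log_parser", "messages": 19702},
    {"name": "log_parser_fail", "messages": 375},
    {"name": "log_parser_hp", "messages": 16337},
])
print(backlogged_queues(sample))  # {'log_parser': 19702, 'log_parser_hp': 16337}
```

A per-queue threshold like this would have fired long before the combined backlog reached roughly 36000 jobs. An alerting plugin would run the same comparison against its own polled data.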

Also, is it expected that everything other than the webapp nodes has "0 rpm" on https://rpm.newrelic.com/accounts/677903/applications/4180461 ? Is there any way we can get that to report the actual number of tasks handled per second?

OS: Mac OS X → All

Priority: P2 → P1

Hardware: x86 → All

Ed Morley [:emorley]

Comment 5

9 years ago

Is this rabbitmq new relic plugin what we need?
https://rpm.newrelic.com/accounts/677903/plugins/directory/95

Mauro Doglio [:mdoglio]

Comment 6

9 years ago

:edmorley the webapp nodes should have rpm == 0 for non-web transactions and rpm > 0 for web transactions.
The opposite is true for all the other nodes: rpm == 0 for web transactions and rpm > 0 for non-web transactions.

Ed Morley [:emorley]

Comment 7

9 years ago

(In reply to Mauro Doglio [:mdoglio] from comment #6)
> :edmorley the webapp nodes should have rpm == 0 for non-web transactions and
> rpm > 0 for web transactions.
> The opposite is true for all the other nodes: rpm == 0 for web transactions
> and rpm > 0 for non-web transactions.

The table on https://rpm.newrelic.com/accounts/677903/applications/4180461 has 0 rpm for all nodes apart from webapp, so seems like something needs tweaking.

Ed Morley [:emorley]

Comment 8

9 years ago

Some options for alerts:
http://celery.readthedocs.org/en/latest/userguide/monitoring.html#monitoring-munin

Ed Morley [:emorley]

Updated

9 years ago

Priority: P1 → P2

Ed Morley [:emorley]

Comment 9

9 years ago

Please can we install either of these:
https://rpm.newrelic.com/accounts/677903/plugins/directory/25
https://rpm.newrelic.com/accounts/677903/plugins/directory/95

The former is what is used on the Mozilla General New Relic account:
https://rpm.newrelic.com/accounts/263620/plugins/11697

...so failing any other ideas, shall we go with that one?

Added bonus: once this is set up, we can set up alerts for message queue sizes that don't require access to Nagios (plus when the alerts _do_ fire, they'll link to the pretty graphs).

Summary: newrelic monitoring for memcache, rabbitmq, etc → Install a RabbitMQ monitoring plugin for New Relic on stage and prod

Kendall Libby [:fubar] (he/him)

Assignee

Comment 10

9 years ago

It's been installed but is apparently failing to connect:

ERROR      2015-03-10 19:22:41,395 27769  MainProcess     MainThread newrelic_plugin_agent.agent                   send_components           L235   : Error reporting stats: HTTPSConnectionPool(host='platform-api.newrelic.com', port=443): Max retries exceeded with url: /platform/v1/metrics (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 403 Forbidden',)))

Which is messed up, because I can connect to that directly. New Relic has fast become my least favorite part of this project.

Kendall Libby [:fubar] (he/him)

Assignee

Comment 11

9 years ago

Proxy fixed, and RabbitMQ is finally reporting.

Ed Morley [:emorley]

Updated

9 years ago

Blocks: 1141993

Ed Morley [:emorley]

Comment 12

9 years ago

That's great - thank you :-)

@sheriffs:
Check this page if you ever think tasks are getting behind:
https://rpm.newrelic.com/accounts/677903/dashboard/6293241/page/4

Have filed bug 1141993 for setting up new relic alerts once we know what sensible values are for the thresholds.

Assignee: nobody → klibby

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P2)

Tracking

(Not tracked)

People

(Reporter: fubar, Assigned: fubar)

Security

(public)