Closed
Bug 947938
Opened 11 years ago
Closed 10 years ago
debug why newrelic pgbouncer isn't working
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: scabral, Assigned: mpressman)
Details
(Whiteboard: [2014q2] April)
Matt - could you check the pgbouncer auth files for the machines listed at https://rpm.newrelic.com/accounts/263620/plugins?type=6283 and figure out why newrelic (on the scalebase hosts) cannot connect to pgbouncer?
Reporter
Updated•11 years ago
Whiteboard: [2014q1]
Reporter
Updated•10 years ago
Assignee: server-ops-database → mpressman
Reporter
Updated•10 years ago
Whiteboard: [2014q1] → [2014q2] April
Assignee
Comment 1•10 years ago
The link in the description just brings up an error page and doesn't provide a list of machines. Is there a list of the affected machines somewhere else?
Assignee
Comment 2•10 years ago
Also, since pgbouncer is only running on the socorro machines, I went in and verified the pgbouncer auth files. I was able to connect using the credentials provided, but only locally, which confirms that the password matches the hash in the auth files.
Assignee
Comment 3•10 years ago
Finally, the pg_hba.conf only provides newrelic access from scalebase1, and the only postgres plugin running there showed no pgbouncer ports. There was an additional plugin config in the postgresql dir named pgbouncer.cfg, and its pgbouncer section did not contain a password for access. The postgresql section listings did contain the proper password for the newrelic user. The pgbouncer section looked like this:
pgbouncer:
  - host:
    name:
    port:
    user: newrelic
Assignee
Comment 4•10 years ago
One last thing, while I'm flying blind without debug messages or the machines at issue: the newrelic pgbouncer plugin may be failing because of a permissions/role issue. We only allow three roles access to the SHOW commands, and newrelic is neither listed nor a member of the allowed roles. Adding the newrelic role to stats_users in the pgbouncer config files will allow newrelic to run SHOW commands.
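The stats_users theory in this comment can be checked offline before touching a host. What follows is a minimal Python sketch, assuming the pgbouncer config is an INI file with a [pgbouncer] section and comma-separated admin_users and stats_users settings. The function name, the sample config text, and the role names monitor_a and monitor_b are hypothetical placeholders, not the real Socorro values.

```python
import configparser


def may_run_show(ini_text: str, user: str) -> bool:
    """Return True if `user` is listed in admin_users or stats_users
    of the [pgbouncer] section of the given INI text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    if not parser.has_section("pgbouncer"):
        return False
    allowed = set()
    for key in ("admin_users", "stats_users"):
        raw = parser.get("pgbouncer", key, fallback="")
        # Both settings are comma-separated lists of user names.
        allowed.update(name.strip() for name in raw.split(",") if name.strip())
    return user in allowed


# Hypothetical config text: only the newrelic role name comes from this bug.
SAMPLE_BEFORE = """
[pgbouncer]
listen_port = 6432
admin_users = postgres
stats_users = monitor_a, monitor_b
"""

# The proposed fix: append newrelic to stats_users.
SAMPLE_AFTER = SAMPLE_BEFORE.replace("monitor_b", "monitor_b, newrelic")
```

Calling may_run_show(SAMPLE_BEFORE, "newrelic") and may_run_show(SAMPLE_AFTER, "newrelic") shows the effect of the change on the sample text. This only inspects the file; a running pgbouncer still needs its config reloaded before the new list applies.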
Reporter
Comment 5•10 years ago
scalebase1.db.phx1.mozilla.com is running newrelic. Check /var/log for the logs so you can see the error messages. It's probably the pgbouncer ACLs you mention in comment 4. Is that something that's under puppet control?
Assignee
Comment 6•10 years ago
I did check the logs in /var/log, both /var/log/newrelic and /var/log/newrelic-plugin-agent-supervisor.log[1-3], and there isn't any mention of pgbouncer. The current process list shows the newrelic_plugin_agent using the config file /usr/local/newrelic/postgresql/pg_newrelic_config.cfg. The same directory, /usr/local/newrelic/postgresql/, also contains pgbouncer.cfg. I assume that is the plugin that isn't working, and that it is simply disabled because it hasn't been working.
Regardless, I'm with you and think the issue has to do with the pgbouncer ACLs. Without privileges on pgbouncer's special administration database, the newrelic user cannot connect and the plugin will fail. I checked the pgbouncer logs while manually trying to connect as the newrelic user and received the following message:
WARNING C-0x1725748: pgbouncer/newrelic@unix:6432 Pooler Error: not allowed
Also, yes, the pgbouncer configs are under puppet control. I have updated the config and added the newrelic user (revision 86437). On socorro1.stage.db.phx1 I manually reloaded the pgbouncer configs and retested connecting as the newrelic user, and it succeeded. The next step will be to re-enable the pgbouncer New Relic plugin.
Reporter
Comment 7•10 years ago
The config is in /usr/local/newrelic/pgbouncer/newrelic_plugin_agent.cfg - I put socorro3 in it. You can run it by doing:
/usr/bin/newrelic_plugin_agent -c /usr/local/newrelic/pgbouncer/newrelic_plugin_agent.cfg -f
(-c is the location of the config file, -f is to run in the foreground; you can press Ctrl-C to get out of it)
I'm getting:
INFO 2014-04-25 15:35:04 31336 MainProcess MainThread clihelper run L382 : newrelic_plugin_agent 1.1.0 started
CRITICAL 2014-04-25 15:35:04 31336 MainProcess MainThread newrelic_plugin_agent.plugins.postgresql poll L256 : Could not connect to PgBouncer, skipping stats run: could not connect to server: Connection refused
Is the server running on host "socorro3.db.phx1.mozilla.com" and accepting TCP/IP connections on port 6000?
I got the same result when I tried socorro1.db.phx1.mozilla.com.
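A "Connection refused" error like the one above usually means nothing is listening on that host and port. That can be checked without the agent. Here is a minimal Python sketch, assuming plain TCP reachability is enough to tell a wrong port from an auth problem. The function name port_accepts_tcp is hypothetical, and the check says nothing about whether pgbouncer would then accept the newrelic login.

```python
import socket


def port_accepts_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Connection refused, timeouts, and DNS failures all count as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, comparing port_accepts_tcp("socorro3.db.phx1.mozilla.com", 6000) with the same call on port 6432, the port seen in the pgbouncer log in comment 6, would show whether the port in the agent config matches a port pgbouncer is actually listening on.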
Assignee
Comment 8•10 years ago
YAY - It worked on stage where I had reloaded the config. I also had to put the password in it, but here are the results:
newrelic_plugin_agent -c /usr/local/newrelic/pgbouncer/newrelic_plugin_agent.cfg -f
INFO 2014-04-25 21:37:05 19236 MainProcess MainThread clihelper run L382 : newrelic_plugin_agent 1.1.0 started
INFO 2014-04-25 21:37:05 19236 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.05 seconds
INFO 2014-04-25 21:37:05 19236 MainProcess MainThread newrelic_plugin_agent.agent send_components L209 : Sending 31 metrics to NewRelic
INFO 2014-04-25 21:37:10 19236 MainProcess MainThread newrelic_plugin_agent.agent process L122 : All stats processed in 4.44 seconds, next wake in 55.56
CINFO 2014-04-25 21:37:33 19236 MainProcess MainThread clihelper run L746 : CTRL-C caught, shutting down
INFO 2014-04-25 21:37:33 19236 MainProcess MainThread clihelper stop L463 : Attempting to stop the process
INFO 2014-04-25 21:37:33 19236 MainProcess MainThread clihelper run L761 : clihelper.run exiting cleanly
Assignee
Comment 9•10 years ago
I'll go on to the rest of the socorro hosts and reload the pgbouncer config, which now has the newrelic user added to stats_users.
Assignee
Comment 10•10 years ago
All socorro hosts report success on port 6432 (the pgbouncer-web instance):
socorro1.db.phx1
socorro2.db.phx1
socorro3.db.phx1
socorro1.stage.db.phx1
socorro-reporting1.db.phx1
Assignee
Comment 11•10 years ago
All socorro hosts report success on port 6433 (the pgbouncer-processor instance).
Assignee
Comment 12•10 years ago
I added all the socorro hosts in comment 10 to /usr/local/newrelic/pgbouncer/newrelic_plugin_agent.cfg, both the web instance and the processor instance, and the output appears successful:
newrelic_plugin_agent -c /usr/local/newrelic/pgbouncer/newrelic_plugin_agent.cfg -f
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread clihelper run L382 : newrelic_plugin_agent 1.1.0 started
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.01 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.00 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.01 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.00 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.01 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.00 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.01 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.01 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.01 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.plugins.base finish L141 : PgBouncer poll successful, completed in 0.00 seconds
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.agent send_components L209 : Sending 274 metrics to NewRelic
INFO 2014-04-25 22:00:04 22281 MainProcess MainThread newrelic_plugin_agent.agent process L122 : All stats processed in 0.56 seconds, next wake in 59.44
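With ten instances configured, counting the poll results by eye gets tedious. The following is a rough Python sketch for summarizing a captured foreground run, assuming the agent writes one entry per line in the "LEVEL date time pid ... : message" shape seen in this bug. The function name and the returned keys are hypothetical, and the message strings it matches are taken only from the output quoted in comments 7, 8 and 12.

```python
import re

# LEVEL, timestamp, then everything up to the first " : " is treated as context.
ENTRY_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s.*?\s:\s(?P<msg>.*)$"
)
SENT_RE = re.compile(r"Sending (\d+) metrics")


def summarize_agent_log(log_text: str) -> dict:
    """Count successful and failed PgBouncer polls and report the last
    'Sending N metrics' value found (None if there is none)."""
    ok = failed = 0
    metrics_sent = None
    for line in log_text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue  # continuation lines such as the libpq hint are skipped
        msg = match.group("msg")
        if "poll successful" in msg:
            ok += 1
        elif match.group("level") == "CRITICAL" and "Could not connect" in msg:
            failed += 1
        sent = SENT_RE.search(msg)
        if sent:
            metrics_sent = int(sent.group(1))
    return {"polls_ok": ok, "polls_failed": failed, "metrics_sent": metrics_sent}
```

Fed the output of a run against all configured instances, a polls_ok count equal to the number of instances and a polls_failed count of zero would match what is reported in this comment.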
Assignee
Comment 13•10 years ago
The link at https://rpm.newrelic.com/accounts/263620/plugins?type=6283 is now populated and all hosts are green, indicating "Component normal".
Assignee
Comment 14•10 years ago
I think we're good here. I'll leave this bug open. Please close if it is working, otherwise please let me know what I'm missing.
Reporter
Comment 15•10 years ago
The point was to debug why it's not working, and it's working now, so the goal of this bug is met. Resolving.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Data & BI Services Team