Closed Bug 973903 Opened 10 years ago Closed 10 years ago

Cannot connect to database from One and Done prod paas instance

Categories

(Infrastructure & Operations :: IT-Managed Tools, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: osmose, Assigned: cturra)

When I try to do anything involving the database on One and Done on the prod paas, I get the following error:

_mysql_exceptions.OperationalError: (1130, "Host '10.22.93.33' is not allowed to connect to this MySQL server")

This occurs even when I tried to backup the database with the dbexport command.
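For reference, 1130 is MySQL's ER_HOST_NOT_PRIVILEGED server error: the server refuses the client's host before any query runs, which is why dbexport fails the same way as the app. A minimal sketch of how application code could tell this case apart from other connection failures, assuming the exception carries (code, message) in its args as in the traceback above. The OperationalError class and the is_host_denied helper here are hypothetical stand-ins so the sketch runs without MySQLdb installed; they are not part of One and Done.

```python
# MySQL server error code for "Host '...' is not allowed to connect to this MySQL server".
ER_HOST_NOT_PRIVILEGED = 1130


class OperationalError(Exception):
    """Hypothetical stand-in for _mysql_exceptions.OperationalError."""


def is_host_denied(exc):
    """Return True if exc carries MySQL error 1130 as its first argument."""
    return bool(exc.args) and exc.args[0] == ER_HOST_NOT_PRIVILEGED


# Rebuild the error from the report and classify it.
err = OperationalError(
    1130, "Host '10.22.93.33' is not allowed to connect to this MySQL server"
)
print(is_host_denied(err))
```

A check like this only identifies the failure; the fix still has to happen on the MySQL side, where the connecting host is granted or denied access.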
Restarting did nothing to help, but recreating the app (while not deleting the database service) was able to fix it. Perhaps if I had just re-bound the database service, things would've been okay? Could this have been related to the recent paas update?
Yes, I believe this was directly related to the maintenance last night. I saw the same thing happen on the prod nucleus instance: https://bugzilla.mozilla.org/show_bug.cgi?id=973793
:mkelly - i am copy+pasting my response from the bug :jgmize mentions in his comment:

""
i have a couple theories here, but will share what i saw and my thoughts on them. after the patching was completed last night, the services needed restarts. after the mysql service was restarted, the services node itself could no longer connect with the mysql server(s) (managed by our DBAs). i worked with :sheeri to flush the user and get that auth working again.

my working theory here is the application containers were still trying to connect through the mysql services node with some sort of cached credentials. i did manually try restarting the [oneanddone] app last night, but that made no difference. 

i did check after these changes last night that all of the dea nodes could manually connect to the mysql servers.

 stackato@stackato-dea3:/home/cturra$ mysql -u stackato -p --host paas-mysql-vip.db.scl3.mozilla.com -e "select 1\G"
 Enter password:
 *************************** 1. row ***************************
 1: 1


at this point, we're going to need to wait and see, since there are no indicators of issues persisting at this time.
""
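The manual "select 1" check quoted above could be scripted so it is easy to repeat on each dea node after maintenance. This is only a sketch under assumptions: the can_query_mysql helper and its run parameter are hypothetical, the mysql client must be installed on the node, and the password is passed through the MYSQL_PWD environment variable to avoid the interactive prompt (fine for a one-off check, not recommended for long-lived processes).

```python
import os
import subprocess


def can_query_mysql(host, user, password, run=subprocess.run):
    """Run `select 1` through the mysql client; True if it exits with status 0."""
    cmd = [
        "mysql",
        "-u", user,
        "--host", host,
        "--connect-timeout=10",  # fail fast instead of hanging on an unreachable host
        "-e", "select 1",
    ]
    # Copy the current environment and add the password for the mysql client.
    env = dict(os.environ, MYSQL_PWD=password)
    result = run(cmd, env=env, capture_output=True, text=True)
    return result.returncode == 0


# Example call (assumed credentials, run from the node being checked):
# can_query_mysql("paas-mysql-vip.db.scl3.mozilla.com", "stackato", "<password>")
```

The check only covers the host it runs on, so like the manual version it would need to be run from every dea node. The run parameter exists so the function can be exercised without a real database.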
Component: WebOps: Other → WebOps: IT-Managed Tools
i am pretty confident the change we made to the services node described in my comment above, plus the repush of the application has cleared this up.

i am going to mark this bug as r/fixed, but as usual, if this crops up again, please don't hesitate to re-open.
Assignee: server-ops-webops → cturra
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED