844912 - SUMO database inaccessible for a few minutes this morning.

Reporter

Description

•

12 years ago

I'm not sure if this is the right place to file this. I got 11 emails this morning between 05:42 PST and 05:46 PST that indicated one of the SUMO servers wasn't able to communicate with its database. The site does not appear to be negatively affected, and all these errors come from a cron job that runs once a minute. It looks to me like every cron job failed for a few minutes.

Sheeri checked mysql logs and nagios, and didn't notice anything, so it sounds like it might have been a network glitch.

Here is one of the emails I got. It is pretty uninteresting except for two things: the last line, which says it failed, and that it appears to have been doing a write operation, not a read.

Traceback (most recent call last):
  File "manage.py", line 49, in <module>
    execute_manager(settings)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/core/management/__init__.py", line 459, in execute_manager
    utility.execute()
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django-cronjobs/cronjobs/management/commands/cron.py", line 38, in handle
    registered[script](*args)
  File "/data/support-stage/www/support.allizom.org/kitsune/apps/customercare/cron.py", line 80, in collect_tweets
    tweet.save()
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/db/models/base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/db/models/base.py", line 551, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/db/models/query.py", line 1593, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/db/models/sql/compiler.py", line 912, in execute_sql
    cursor.execute(sql, params)
  File "/data/support-stage/www/support.allizom.org/kitsune/vendor/src/django/django/db/backends/mysql/base.py", line 114, in execute
    return self.cursor.execute(query, args)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
django.db.utils.DatabaseError: (2013, 'Lost connection to MySQL server during query')
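For anyone who wants a once-a-minute job like this to ride out a brief connection drop instead of emailing on every run, here is a minimal sketch of a retry wrapper. It is not what kitsune or django-cronjobs does. The `DatabaseError` class is a local stand-in for `django.db.utils.DatabaseError` so the example runs on its own, and `retry_on_lost_connection`, `is_transient`, and the `attempts`/`delay` parameters are hypothetical names. The two error codes are the MySQL client codes for a dropped connection (2013 is the one in the traceback above). The sketch is written for a current Python 3, not the Python 2.6 shown in the traceback.

```python
import time

# MySQL client error codes that indicate a dropped connection.
CR_SERVER_GONE_ERROR = 2006  # "MySQL server has gone away"
CR_SERVER_LOST = 2013        # "Lost connection to MySQL server during query"
TRANSIENT_CODES = {CR_SERVER_GONE_ERROR, CR_SERVER_LOST}


class DatabaseError(Exception):
    """Stand-in for django.db.utils.DatabaseError (assumption, for a standalone demo)."""


def is_transient(exc):
    """Return True if the exception's first argument is a dropped-connection code."""
    return bool(exc.args) and exc.args[0] in TRANSIENT_CODES


def retry_on_lost_connection(operation, attempts=3, delay=0.0):
    """Call operation(), retrying only when the connection was dropped.

    Any other DatabaseError, or a dropped connection on the final
    attempt, is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DatabaseError as exc:
            if not is_transient(exc) or attempt == attempts:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    calls = []

    def flaky_save():
        # Fails twice with error 2013, then succeeds.
        calls.append(1)
        if len(calls) < 3:
            raise DatabaseError(2013, "Lost connection to MySQL server during query")
        return "saved"

    print(retry_on_lost_connection(flaky_save), "after", len(calls), "calls")
```

Two caveats on the design: a connection lost during an INSERT may or may not have been committed, so blind retries are only safe when the write is idempotent, and a real Django job would probably also need to discard the dead connection before retrying, which this sketch does not show.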

Sheeri Cabral [:sheeri]

Comment 1

•

12 years ago

:uberj reported getting a 'Lost connection to MySQL server during query' error at 5:50 AM when trying to connect to dev-zeus-rw.db.phx1.mozilla.com.

Sheeri Cabral [:sheeri]

Comment 2

•

12 years ago

cc'ing Jake

Will Kahn-Greene [:willkg]

Comment 3

•

12 years ago

Just to clarify, this was the support.allizom.org server which is -stage and not -prod.

Michael Cooper [:mythmon]

Reporter

Comment 4

•

12 years ago

-dev is also represented in the emails, but I didn't receive any for -prod. This isn't happening any more, so it is certainly not a fire, just a curiosity.

Sheeri Cabral [:sheeri]

Comment 5

•

12 years ago

Checked the stage database server (same as in comment 1) and there are no MySQL errors there, and Nagios has no problems, not even "soft" states, reported today. MySQL error logs for today have nothing unusual.

Jake Maul [:jakem]

Comment 6

•

12 years ago

That path (/data/support-stage/www/...) is the path on the admin node (supportadm.private.phx1), rather than the web nodes. Therefore this probably won't be in Sentry. Indeed I see nothing there. Just throwing this out as another data point... it seems likely to have been cron- or deploy-related, because that should be all that happens on the admin node.

Sheeri Cabral [:sheeri]

Comment 7

•

12 years ago

This was over a month ago, and we haven't seen this recur, so I'm going to close this. If this is in fact still an issue, please re-open.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WORKSFORME

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Data & BI Services Team

Categories

(Data & BI Services Team :: DB: MySQL, task)

Tracking

(Not tracked)

People

(Reporter: mythmon, Unassigned)

Security

(public)
