Closed
Bug 619419
Opened 14 years ago
Closed 14 years ago
SQL connection errors from SUMO
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jsocol, Assigned: justdave)
Details
Attachments
(1 file: 43.30 KB, image/png)
We seem to get a relatively low but steady stream of stack traces from SUMO production that end with this error from MySQL:
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/backends/mysql/base.py", line 297, in _cursor
self.connection = Database.connect(**kwargs)
File "/usr/lib64/python2.6/site-packages/MySQLdb/__init__.py", line 81, in Connect
return Connection(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 188, in __init__
super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (2013, "Lost connection to MySQL server at 'reading authorization packet', system error: 0")
I'm not sure what the issue is, or if it's solvable, but if it is solvable, we should solve it.
Updated•14 years ago
Assignee: server-ops → justdave
Reporter
Comment 1•14 years ago
Any insight or ideas here?
Reporter
Comment 2•14 years ago
We've had 70 of these in less than 24 hours. It is by far the most common issue with SUMO.
Comment 3•14 years ago
(In reply to comment #1)
> Any insight or ideas here?
Hrrrm. That looks like it has an issue connecting to the phx slaves. I'm not even sure sumo's using those.
Dave?
Assignee
Comment 4•14 years ago
(In reply to comment #3)
> (In reply to comment #1)
> > Any insight or ideas here?
>
> Hrrrm. That looks like it has an issue connecting to the phx slaves. I'm not
> even sure sumo's using those.
Why PHX? SUMO is using Zeus as a proxy to the databases, and that error message *usually* means Zeus didn't have any backends available when the connection was attempted. Zeus has a max connections setting of its own; it's possible we're hitting that limit.
Do you know whether the connection in question is for the master or the slave databases? They're different pools in Zeus.
Comment 5•14 years ago
(In reply to comment #4)
> Do you know whether the connection in question is for the master or the slave
> databases? They're different pools in Zeus.
James, alternatively, if we can find out whether the stack trace was from a write or a read-only op, that'd help too (if not the server name).
Reporter
Comment 6•14 years ago
Here's a full stack trace. I can tell that this one was a read to the slaves (we read from the master immediately after writes to avoid replication lag).
Traceback (most recent call last):
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/core/handlers/base.py", line 100, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/views/decorators/http.py", line 37, in inner
return func(request, *args, **kwargs)
File "/data/www/support.mozilla.com/kitsune/apps/wiki/views.py", line 128, in document
return jingo.render(request, 'wiki/document.html', data)
File "/data/www/support.mozilla.com/kitsune/vendor/src/jingo/jingo/__init__.py", line 78, in render
rendered = render_to_string(request, template, context)
File "/data/www/support.mozilla.com/kitsune/vendor/src/jingo/jingo/__init__.py", line 96, in render_to_string
return template.render(**get_context())
File "/usr/lib/python2.6/site-packages/jinja2/environment.py", line 891, in render
return self.environment.handle_exception(exc_info, True)
File "/data/www/support.mozilla.com/kitsune/apps/wiki/templates/wiki/document.html", line 10, in top-level template code
{% set localizable_url = url('wiki.document', document.parent.slug, locale=settings.WIKI_DEFAULT_LANGUAGE) %}
File "/data/www/support.mozilla.com/kitsune/apps/wiki/templates/wiki/base.html", line 15, in top-level template code
{% set top_text = _('Firefox Help') %}
File "/data/www/support.mozilla.com/kitsune/templates/layout/base.html", line 58, in top-level template code
{% block content_area %}
File "/data/www/support.mozilla.com/kitsune/apps/wiki/templates/wiki/base.html", line 34, in block "content_area"
{% block content %}
File "/data/www/support.mozilla.com/kitsune/apps/wiki/templates/wiki/document.html", line 15, in block "content"
{% if related %}
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/query.py", line 112, in __nonzero__
iter(self).next()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/query.py", line 106, in _result_iter
self._fill_cache()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/query.py", line 760, in _fill_cache
self._result_cache.append(self._iter.next())
File "/data/www/support.mozilla.com/kitsune/vendor/src/django-cache-machine/caching/base.py", line 127, in __iter__
obj = iterator.next()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/query.py", line 269, in iterator
for row in compiler.results_iter():
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/sql/compiler.py", line 672, in results_iter
for rows in self.execute_sql(MULTI):
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/sql/compiler.py", line 726, in execute_sql
cursor = self.connection.cursor()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/backends/__init__.py", line 75, in cursor
cursor = self._cursor()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/backends/mysql/base.py", line 297, in _cursor
self.connection = Database.connect(**kwargs)
File "/usr/lib64/python2.6/site-packages/MySQLdb/__init__.py", line 81, in Connect
return Connection(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 188, in __init__
super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (2013, "Lost connection to MySQL server at 'reading authorization packet', system error: 0")
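The read-after-write pinning mentioned above is typically done with a database router. This is a minimal sketch of that general pattern, not SUMO's actual code; the class and alias names (`MasterSlaveRouter`, `'default'`, `'slave1'`, etc.) are hypothetical:

```python
import random
import threading

# Thread-local flag: set after a write so subsequent reads in the
# same thread/request go to the master instead of a lagged slave.
_local = threading.local()

class MasterSlaveRouter(object):
    """Route writes to the master and reads to a random slave,
    except immediately after a write (to avoid replication lag)."""

    SLAVES = ['slave1', 'slave2', 'slave3']  # hypothetical aliases

    def db_for_read(self, model, **hints):
        if getattr(_local, 'pinned_to_master', False):
            return 'default'  # master alias
        return random.choice(self.SLAVES)

    def db_for_write(self, model, **hints):
        _local.pinned_to_master = True  # pin this thread's reads
        return 'default'
```

In a real Django setup the flag would be cleared at the end of each request (for example, by middleware), so the pinning only lasts for the request that performed the write.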
Comment 7•14 years ago
AFAICT, they're coming from more than just one webhead:
(~7:20AM pacific)
'platform.name': 'pm-app-sumo02.mozilla.org',
'REMOTE_ADDR': '10.2.81.102',
'SERVER_ADDR': '10.2.81.141',
(~2:50 AM pacific)
'platform.name': 'pm-app-sumo03.mozilla.org',
'REMOTE_ADDR': '10.2.81.100',
'SERVER_ADDR': '10.2.81.142',
Assignee
Comment 8•14 years ago
OK, there is no max connections limit set up on the sumo database pools. There is, however, a 4-second timeout on connections to the backends. I bumped that to 10 seconds on the master pool, since there's only one backend there. On the slave pool it'll cycle through the backends if that 4 seconds expires hitting one of them.
There are only 3 backends in the slave pool, and you said these are slave connections, so I suppose it's possible that all three slaves were slow at some point, so I've bumped it up to 10 seconds there as well. Let me know if it still happens at all.
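One application-side mitigation for transient connect failures like MySQL error 2013 is a bounded retry around the connect call. Nothing in this bug indicates SUMO did this; the sketch below is generic, and the function name `retry_connect` is hypothetical:

```python
import time

def retry_connect(connect, attempts=3, delay=0.5, retryable=(2013,)):
    """Call connect() and retry on MySQL 'lost connection'-style
    errors whose first args element is a retryable errno."""
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:
            errno = exc.args[0] if exc.args else None
            # Re-raise immediately on non-retryable errors or last attempt.
            if errno not in retryable or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # linear backoff
```

Retries mask the symptom rather than fix the underlying pool exhaustion, so they are at best a stopgap while the load balancer issue is diagnosed.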
Reporter
Comment 9•14 years ago
I've seen it happen at least 4 times since you made the change, so about the same rate.
Occasionally we see a similar type of error from search, though I don't know if we're connecting to Sphinx through Zeus or Netscaler. If it's Zeus, might that point to trouble with Zeus?
Here's the error; it looks like sock.recv(8) is failing. It might be a red herring, but I figure more information is better.
File "/data/www/support.mozilla.com/kitsune/apps/search/sphinxapi.py", line 223, in _GetResponse
(status, ver, length) = unpack('>2HL', sock.recv(8))
error: unpack requires a string argument of length 8
Comment 10•14 years ago
I think there are more now than before, actually.
Last one (~2:11pm):
Traceback (most recent call last):
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/manager.py", line 132, in get
return self.get_query_set().get(*args, **kwargs)
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/query.py", line 336, in get
num = len(clone)
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/query.py", line 81, in __len__
self._result_cache = list(self.iterator())
File "/data/www/support.mozilla.com/kitsune/vendor/src/django-cache-machine/caching/base.py", line 127, in __iter__
obj = iterator.next()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/query.py", line 269, in iterator
for row in compiler.results_iter():
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/sql/compiler.py", line 672, in results_iter
for rows in self.execute_sql(MULTI):
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/models/sql/compiler.py", line 726, in execute_sql
cursor = self.connection.cursor()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/backends/__init__.py", line 75, in cursor
cursor = self._cursor()
File "/data/www/support.mozilla.com/kitsune/vendor/src/django/django/db/backends/mysql/base.py", line 297, in _cursor
self.connection = Database.connect(**kwargs)
File "/usr/lib64/python2.6/site-packages/MySQLdb/__init__.py", line 81, in Connect
return Connection(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 188, in __init__
super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (2013, "Lost connection to MySQL server at 'reading authorization packet', system error: 0")
Reporter
Comment 11•14 years ago
The rate of this error has definitely gone up in the past few hours.
Comment 12•14 years ago
Is it possible the problem is locking up resources? That would explain why the increased timeout made things worse. Since it's on slaves, I imagine not; reads are non-blocking, right?
Assignee
Comment 13•14 years ago
some interesting spikes and dips in here...
Assignee
Comment 14•14 years ago
oh, image didn't include the legend....
The green line is the RO pool, the purple one is the RW pool.
Assignee
Comment 15•14 years ago
How's it looking so far, any change? Do the errors you're still getting seem to come in clusters or are they evenly spread out?
Comment 16•14 years ago
I have actually seen *nothing* today. See screenshot. All times are PDT, yesterday.
http://grab.by/8lUg
Looks promising. You can close this if you want. Hopefully there will be nothing new by push time tomorrow, either.
Reporter
Comment 17•14 years ago
In general, I wouldn't say they came in clusters, but they rarely came alone. It was typical to see 1-3 of these in a 2-3 minute period, but it wasn't like every request would cause it for 2 minutes.
I haven't seen any since around 4:15pm PT yesterday.
Reporter
Comment 18•14 years ago
Haven't seen this in a week. I'm not sure what y'all did, but thanks!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard