[traceback] MemcacheServerError: object too large for cache
Categories: Socorro :: Webapp, defect, P2
Tracking: Not tracked
People: Reporter: willkg; Assigned: willkg
Attachments: 1 file
Traceback:
MemcacheServerError: b'object too large for cache'
File "django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "django/core/handlers/base.py", line 181, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "crashstats/api/views.py", line 119, in _clear_empty_session
ret = fun(request, *args, **kwargs)
File "crashstats/api/views.py", line 148, in _no_csrf
ret = fun(request, *args, **kwargs)
File "ratelimit/decorators.py", line 24, in _wrapped
return fn(request, *args, **kw)
File "crashstats/crashstats/decorators.py", line 110, in inner
response = view(request, *args, **kwargs)
File "crashstats/crashstats/utils.py", line 244, in wrapper
response = f(request, *args, **kw)
File "crashstats/api/views.py", line 253, in model_wrapper
result = function(**form.cleaned_data)
File "crashstats/crashstats/models.py", line 437, in get
return self._get(expect_json=expect_json, **kwargs)
File "crashstats/crashstats/models.py", line 490, in _get
return self.fetch(
File "crashstats/crashstats/models.py", line 299, in inner
result = method(*args, **kwargs)
File "crashstats/crashstats/models.py", line 387, in fetch
cache.set(cache_key, result, self.cache_seconds)
File "django/core/cache/backends/memcached.py", line 82, in set
if not self._cache.set(key, value, self.get_backend_timeout(timeout)):
File "pymemcache/client/hash.py", line 358, in set
return self._run_cmd('set', key, False, *args, **kwargs)
File "pymemcache/client/hash.py", line 334, in _run_cmd
return self._safely_run_func(
File "pymemcache/client/hash.py", line 214, in _safely_run_func
result = func(*args, **kwargs)
File "pymemcache/client/base.py", line 462, in set
return self._store_cmd(b'set', {key: value}, expire, noreply,
File "pymemcache/client/base.py", line 1107, in _store_cmd
self._raise_errors(line, name)
File "pymemcache/client/base.py", line 940, in _raise_errors
raise MemcacheServerError(error)
This is happening in the processed crash API with crash id 44f5c1dc-0881-47ba-8d54-59b560220203. I'm not sure whether it's happening with other crash ids, but I didn't see any when spot-checking the Sentry issue.
Comment 1 (Assignee) • 3 years ago

Comment 2 (Assignee) • 3 years ago
willkg merged PR #5996: "bug 1753550: recover after cache.set error" in f3e0969.
I'll add a graph for that metric (webapp.crashstats.models.cache_set_error) so we can see how often this happens. If it happens periodically, we should add logging to capture which crash ids it happens with; that was too difficult to figure out in this pass.
Also, once this is in prod, I can look at the crash data and determine what's going on with that crash report.
Comment 3 (Assignee) • 3 years ago
I deployed this to prod just now in bug #1753554.
I added a graph to the dashboard tracking cache_set_error. There are only a couple of instances. Seems like the problem was isolated to this crash report.
I downloaded the data for this crash report. It's got a normal-looking raw crash, but the processed crash has a very large memory_report section. Sizes in bytes:
- raw_crash: 18,705
- processed_crash: 9,840,796
I don't understand why I was having problems retrieving the raw crash. I wonder if the /api/RawCrash/ endpoint actually pulls both the raw and processed crash reports or something like that. Otherwise I don't see how it triggered a 1 MB max size issue.
I'll look into this more tomorrow.
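For context on the numbers above, this is a small illustration (not Socorro code) of why the processed crash fails to cache: memcached rejects any single value larger than 1 MB (1,048,576 bytes) by default, unless the server is started with `-I` to raise the limit.

```python
# memcached's default max item size; configurable with the server's -I flag.
MEMCACHED_DEFAULT_MAX_VALUE = 1_048_576  # bytes

# Sizes observed for this crash report (from the comment above).
sizes = {
    "raw_crash": 18_705,
    "processed_crash": 9_840_796,
}

for name, size in sizes.items():
    fits = size <= MEMCACHED_DEFAULT_MAX_VALUE
    print(f"{name}: {size:,} bytes -> {'fits' if fits else 'too large for cache'}")
```

The processed crash is roughly 9.4x the default limit, so no client-side change short of raising the server limit or compressing/splitting the value would let it cache.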
Updated (Assignee) • 3 years ago

Comment 4 (Assignee) • 3 years ago
When upgrading Django to 3.2 this week, we switched from python-memcached to pymemcache as the Python library used to access memcache.
It looks like python-memcached ignores set failures. For example, if the value is too large, python-memcached's .set() returns non-zero on success and zero on failure. There are a couple of cases where it raises an exception, but it looks like that's only for network issues and it otherwise ignores the server response.
pymemcache .set(), on the other hand, checks the server response for SERVER_ERROR and raises a MemcacheServerError if it's an error.
I suspect that Crash Stats has always had crash reports that exceed the 1 MB value limit. This issue comes up now because we've switched libraries and the new one is more informative about what just happened.
I think the fix in PR 5996 is the right way to go. Crash Stats uses caching as an optimization measure so it doesn't have to re-fetch the values from the crash storage. If the value can't be cached, that's fine--Crash Stats will just refetch it the next time.
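The recovery approach described above can be sketched as follows. This is a hypothetical illustration, not Socorro's actual code: the FakeCache class, the error-counter variable, and the wrapper function are stand-ins, and the exception class mimics pymemcache's MemcacheServerError.

```python
class MemcacheServerError(Exception):
    """Stand-in for pymemcache's MemcacheServerError."""


class FakeCache:
    """Simulates a memcached backend with a 1 MB value limit."""

    MAX_VALUE_SIZE = 1_048_576

    def __init__(self):
        self.store = {}

    def set(self, key, value, timeout):
        if len(value) > self.MAX_VALUE_SIZE:
            raise MemcacheServerError(b"object too large for cache")
        self.store[key] = value


# Stand-in for the webapp.crashstats.models.cache_set_error metric.
cache_set_errors = 0


def cache_set_safely(cache, key, value, timeout):
    """Try to cache the value; on failure, count the error and move on."""
    global cache_set_errors
    try:
        cache.set(key, value, timeout)
    except MemcacheServerError:
        # Caching is only an optimization; the caller still has the value
        # in hand, and the next request will simply refetch it from
        # crash storage.
        cache_set_errors += 1


cache = FakeCache()
cache_set_safely(cache, "small", b"x" * 100, 3600)        # cached normally
cache_set_safely(cache, "huge", b"x" * 9_840_796, 3600)   # error swallowed
```

The key design point is that a failed cache.set is treated exactly like a cache miss: the request still succeeds, and the metric tells us how often it happens.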
Jason said we could increase the max value size. If we did that, I don't think we need to do anything with the client--it should just work. I don't think I want to do this now, but we can keep it in mind especially if we find ourselves with lots of failures.
One thing we should do that we aren't doing currently is setting network timeouts. From the pymemcache docs:

"Always set the connect_timeout and timeout arguments in the pymemcache.client.base.Client constructor to avoid blocking your process when memcached is slow. You might also want to enable the no_delay option, which sets the TCP_NODELAY flag on the connection’s socket."
I'll look at doing that soon.
For now, marking this as FIXED.