Closed
Bug 1163421
Opened 10 years ago
Closed 10 years ago
processor - OperationalError: out of memory
Categories: (Socorro :: Backend, task)
Tracking: (Not tracked)
Status: RESOLVED FIXED
People
(Reporter: ashish, Assigned: lars)
References
Details
Found this traceback in socorro-processor.log on sp-processor07.phx1:
> 2015-05-10 11:56:39,049 CRITICAL - Thread-16 - socorro.external.postgresql.connection_context transaction error eligible for retry
> Traceback (most recent call last):
> File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/database/transaction_executor.py", line 114, in __call__
> result = function(connection, *args, **kwargs)
> File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 189, in _save_processed_transaction
> self._save_processed_crash(connection, processed_crash)
> File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 231, in _save_processed_crash
> execute_no_results(connection, upsert_sql, values)
> File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/dbapi2_util.py", line 62, in execute_no_results
> a_cursor.execute(sql, parameters)
> OperationalError: out of memory
> DETAIL: Cannot enlarge string buffer containing 0 bytes by 1546842743 more bytes.
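The "Cannot enlarge string buffer" detail points at PostgreSQL's per-allocation cap: a single string buffer on the server cannot grow past 1 GiB - 1 (1073741823 bytes), and here a single statement tried to add ~1.5 GB, so the server rejects it no matter how much RAM is free. A hypothetical client-side guard (not part of Socorro; names and limit check are illustrative) could flag oversized parameters before they ever reach `cursor.execute`:

```python
# Hypothetical pre-insert guard. PostgreSQL's internal string buffer is
# capped at MaxAllocSize (1 GiB - 1 bytes); any single parameter larger
# than that makes the server raise "out of memory" regardless of free RAM.
PG_MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1073741823 bytes

def oversized_values(values, limit=PG_MAX_ALLOC_SIZE):
    """Return the indexes of parameters whose encoded size exceeds `limit`."""
    bad = []
    for i, value in enumerate(values):
        data = value if isinstance(value, bytes) else str(value).encode("utf-8")
        if len(data) > limit:
            bad.append(i)
    return bad
```

Logging the offending crash ID alongside the oversized index would make it easy to tell whether specific crashes are responsible.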
Comment 1•10 years ago
Let's put in the summary that this happened on a processor.
Summary: OperationalError: out of memory → processor - OperationalError: out of memory
(Reporter)
Comment 2•10 years ago
Recurred at 2015-05-10 17:34:03,497
Comment 3•10 years ago
(In reply to Ashish Vijayaram [:ashish] from comment #0)
> > OperationalError: out of memory
> > DETAIL: Cannot enlarge string buffer containing 0 bytes by 1546842743 more bytes.
I've seen this testing on local systems but haven't noticed it on production before.
Ashish, did this processor need to be restarted, or did it just log this and then go back to normal?
Flags: needinfo?(ashish)
Comment 4•10 years ago
I looked across all the processors and I see 1 on sp-processor05, 18 on sp-processor07 and 62 on sp-processor08 in the past 24 hours.
It looks like this has happened sporadically in the past too. We should see if it is particular crashes that are causing it.
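Counting these errors across processors, as comment 4 does, can be scripted. A small triage sketch (log format assumed from the excerpts in this bug; the helper name is hypothetical) that tallies the retry-eligible transaction errors per thread from a socorro-processor.log:

```python
import re
from collections import Counter

# Matches the CRITICAL lines quoted in this bug, e.g.:
# 2015-05-10 11:56:39,049 CRITICAL - Thread-16 - socorro.external.postgresql.connection_context ...
LINE_RE = re.compile(
    r"^\S+ \S+ CRITICAL - (?P<thread>Thread-\d+) - "
    r"socorro\.external\.postgresql\.connection_context"
)

def count_oom_errors(lines):
    """Tally retry-eligible connection_context errors per processor thread."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("thread")] += 1
    return counts
```

Running this over each host's log would show whether the errors cluster on particular threads (and, by correlating timestamps with the job lines, particular crashes).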
Comment 5•10 years ago
Just got an alert for sp-processor08.phx1. There have been various other alerts noted in the oncall log for the Socorro processors recently.
2015-05-11 03:27:27,708 INFO - Thread-12 - finishing successful job: a3b7ce56-48e0-4240-9ddb-9ec7f2150511
2015-05-11 03:27:27,879 INFO - Thread-18 - finishing successful job: d0e59562-f908-4abf-a3a2-322392150511
2015-05-11 03:27:27,880 CRITICAL - Thread-25 - socorro.external.postgresql.connection_context transaction error eligible for retry
Traceback (most recent call last):
File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/database/transaction_executor.py", line 114, in __call__
result = function(connection, *args, **kwargs)
File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 189, in _save_processed_transaction
self._save_processed_crash(connection, processed_crash)
File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 231, in _save_processed_crash
execute_no_results(connection, upsert_sql, values)
File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/dbapi2_util.py", line 62, in execute_no_results
a_cursor.execute(sql, parameters)
OperationalError: out of memory
DETAIL: Cannot enlarge string buffer containing 0 bytes by 1146103493 more bytes.
2015-05-11 03:27:28,784 DEBUG - QueuingThread - RabbitMQCrashStorage acking with delivery_tag 38516
Comment 6•10 years ago
:rhelmer, when this happened today on processor08.phx1 I assumed the process had died and been restarted automatically; I did not restart it myself.
It appears I was incorrect in that assumption: it's been running since May 10th:
[pradcliffe@sp-processor08.phx1 ~]$ ps auxww | fgrep proc
1988 2804 0.0 0.0 100952 684 pts/0 S+ 04:26 0:00 fgrep proc
socorro 31834 87.9 64.5 18838380 15878968 ? SNsl May10 1332:01 /data/socorro/socorro-virtualenv/bin/python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/etc/socorro/processor.ini
No longer blocks: 1163566
Flags: needinfo?(ashish)
Comment 7•10 years ago
sp-processor02.phx1
socorro 386 31.7 96.1 34069868 23652772 ? SNsl May10 514:33 /data/socorro/socorro-virtualenv/bin/python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/etc/socorro/processor.ini
The processor is taking up nearly all the memory on the machine, which is heavily loaded from swapping:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
386 socorro 30 10 32.5g 22g 960 S 71.4 96.3 514:29.37 python
(Reporter)
Comment 8•10 years ago
(In reply to Robert Helmer [:rhelmer] from comment #3)
> (In reply to Ashish Vijayaram [:ashish] from comment #0)
> > > OperationalError: out of memory
> > > DETAIL: Cannot enlarge string buffer containing 0 bytes by 1546842743 more bytes.
>
> I've seen this testing on local systems but haven't noticed it on production
> before.
>
> Ashish, did this processor need to be restarted, or did it just log this and
> then go back to normal?
I did not restart. The processor times out about 10 mins before the traceback and resumes processing.
Updated•10 years ago
Assignee: nobody → lars
Status: NEW → ASSIGNED
Component: General → Backend
Comment 10•10 years ago
Comment 11•10 years ago
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/44a9490c0a5c73ce1add0af94e261149ac7adac8
Merge pull request #2779 from twobraids/OOM-not-retriable
fixes Bug 1163421 moved PG OperationalError to Conditional list
Updated•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 12•10 years ago
We're going to push this out to production momentarily - the problem will still happen but not cause retries-with-backoff anymore.
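The shape of the fix, as described in the commit message for PR #2779, is to stop treating this OperationalError as transient. A minimal sketch of that idea (assumed structure, not the actual Socorro transaction executor; names are illustrative): classify the error before retrying, since a server-side OOM on an oversized value will fail identically on every retry.

```python
# Sketch of a conditional-retry policy: "out of memory" from the server
# is deterministic for a given oversized statement, so retrying with
# backoff only wastes time. Other operational errors stay retriable.
NON_RETRIABLE_MARKERS = ("out of memory",)

def is_retriable(exc):
    """Return False for errors that no amount of retrying can fix."""
    message = str(exc).lower()
    return not any(marker in message for marker in NON_RETRIABLE_MARKERS)

def run_with_retry(fn, retries=3):
    """Run fn, retrying only when the failure is classified as transient."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            if not is_retriable(exc) or attempt == retries - 1:
                raise
```

Under this policy the oversized-crash insert still fails (the crash itself remains a separate problem), but the processor raises immediately instead of backing off and retrying, which matches the behavior described above.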