Closed Bug 1163421 Opened 10 years ago Closed 10 years ago

processor - OperationalError: out of memory

Categories

(Socorro :: Backend, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ashish, Assigned: lars)

References

Details

Found this traceback in socorro-processor.log on sp-processor07.phx1:

> 2015-05-10 11:56:39,049 CRITICAL - Thread-16 - socorro.external.postgresql.connection_context transaction error eligible for retry
> Traceback (most recent call last):
>   File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/database/transaction_executor.py", line 114, in __call__
>     result = function(connection, *args, **kwargs)
>   File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 189, in _save_processed_transaction
>     self._save_processed_crash(connection, processed_crash)
>   File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 231, in _save_processed_crash
>     execute_no_results(connection, upsert_sql, values)
>   File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/dbapi2_util.py", line 62, in execute_no_results
>     a_cursor.execute(sql, parameters)
> OperationalError: out of memory
> DETAIL: Cannot enlarge string buffer containing 0 bytes by 1546842743 more bytes.
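For context (this explanation is not from the bug itself): the DETAIL message comes from PostgreSQL's internal string-buffer allocator, which refuses to grow any single buffer beyond roughly 1 GiB (its MaxAllocSize). A parameter of 1546842743 bytes (about 1.44 GiB) can therefore never be inserted, no matter how much RAM the server has free. A minimal sketch of that size check; the constant and function name are illustrative, not Socorro code:

```python
# PG_MAX_ALLOC mirrors PostgreSQL's MaxAllocSize (illustrative name).
PG_MAX_ALLOC = 2**30 - 1  # 1 GiB minus 1 byte

def fits_in_pg_string_buffer(num_bytes):
    """Return True if a value of this size could fit in a single
    PostgreSQL string buffer; larger values fail deterministically."""
    return num_bytes <= PG_MAX_ALLOC

# The failing insert tried to enlarge a buffer by 1546842743 bytes:
print(fits_in_pg_string_buffer(1546842743))  # False: ~1.44 GiB over the cap
print(fits_in_pg_string_buffer(1024))        # True
```

Because the failure is a function of the payload size, retrying the same insert can never succeed, which matters for the fix later in this bug.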
Let's put in the summary that this happened on a processor.
Summary: OperationalError: out of memory → processor - OperationalError: out of memory
Recurred at 2015-05-10 17:34:03,497
(In reply to Ashish Vijayaram [:ashish] from comment #0)
> > OperationalError: out of memory
> > DETAIL: Cannot enlarge string buffer containing 0 bytes by 1546842743 more bytes.

I've seen this testing on local systems but haven't noticed it on production before.

Ashish, did this processor need to be restarted, or did it just log this and then go back to normal?
Flags: needinfo?(ashish)
I looked across all the processors: in the past 24 hours I see 1 occurrence on sp-processor05, 18 on sp-processor07, and 62 on sp-processor08. It looks like this has happened sporadically in the past too. We should check whether particular crashes are causing it.
Just got an alert for sp-processor08.phx1. There have been various other alerts noted in the oncall log for the socorro processors recently.

2015-05-11 03:27:27,708 INFO - Thread-12 - finishing successful job: a3b7ce56-48e0-4240-9ddb-9ec7f2150511
2015-05-11 03:27:27,879 INFO - Thread-18 - finishing successful job: d0e59562-f908-4abf-a3a2-322392150511
2015-05-11 03:27:27,880 CRITICAL - Thread-25 - socorro.external.postgresql.connection_context transaction error eligible for retry
Traceback (most recent call last):
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/database/transaction_executor.py", line 114, in __call__
    result = function(connection, *args, **kwargs)
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 189, in _save_processed_transaction
    self._save_processed_crash(connection, processed_crash)
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/crashstorage.py", line 231, in _save_processed_crash
    execute_no_results(connection, upsert_sql, values)
  File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/external/postgresql/dbapi2_util.py", line 62, in execute_no_results
    a_cursor.execute(sql, parameters)
OperationalError: out of memory
DETAIL: Cannot enlarge string buffer containing 0 bytes by 1146103493 more bytes.
2015-05-11 03:27:28,784 DEBUG - QueuingThread - RabbitMQCrashStorage acking with delivery_tag 38516
Blocks: 1163566
:rhelmer when this happened today on processor08.phx1 I assumed the process had died and been restarted automatically; I did not restart it myself. It appears I was incorrect in that assumption: it's been running since May 10th.

[pradcliffe@sp-processor08.phx1 ~]$ ps auxww | fgrep proc
1988      2804  0.0  0.0 100952    684 pts/0  S+   04:26    0:00 fgrep proc
socorro  31834 87.9 64.5 18838380 15878968 ?  SNsl May10 1332:01 /data/socorro/socorro-virtualenv/bin/python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/etc/socorro/processor.ini
No longer blocks: 1163566
Flags: needinfo?(ashish)
sp-processor02.phx1:

socorro    386 31.7 96.1 34069868 23652772 ?  SNsl May10  514:33 /data/socorro/socorro-virtualenv/bin/python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/etc/socorro/processor.ini

The processor is taking up all the memory on the machine, which is loaded by swapping to death.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  386 socorro   30  10 32.5g  22g  960 S 71.4 96.3 514:29.37 python
(In reply to Robert Helmer [:rhelmer] from comment #3)
> (In reply to Ashish Vijayaram [:ashish] from comment #0)
> > > OperationalError: out of memory
> > > DETAIL: Cannot enlarge string buffer containing 0 bytes by 1546842743 more bytes.
>
> I've seen this testing on local systems but haven't noticed it on production
> before.
>
> Ashish, did this processor need to be restarted, or did it just log this and
> then go back to normal?

I did not restart. The processor times out about 10 minutes before the traceback and then resumes processing.
Timeout is documented in Bug 1163566
See Also: → 1163566
Assignee: nobody → lars
Status: NEW → ASSIGNED
Component: General → Backend
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/44a9490c0a5c73ce1add0af94e261149ac7adac8
Merge pull request #2779 from twobraids/OOM-not-retriable

fixes Bug 1163421 moved PG OperationalError to Conditional list
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
We're going to push this out to production momentarily. The problem will still happen, but it will no longer trigger retries-with-backoff.
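The commit message above says psycopg2's OperationalError was moved to the "Conditional" list, meaning the transaction executor retries it only when the failure looks transient. A minimal sketch of that idea, assuming illustrative names (the exception class is a stand-in for psycopg2.OperationalError, and this is not the actual Socorro transaction-executor code):

```python
class OperationalError(Exception):
    """Stand-in for psycopg2.OperationalError (illustrative)."""

def is_retriable(exc):
    """Conditional retry decision: a dropped connection may well
    recover on retry, but a hard 'out of memory' caused by an
    oversized parameter fails identically every time, so backing
    off and retrying only wastes processor throughput."""
    if isinstance(exc, OperationalError):
        return "out of memory" not in str(exc)
    return False

print(is_retriable(OperationalError("server closed the connection unexpectedly")))  # True
print(is_retriable(OperationalError("out of memory")))                              # False
```

This matches the observed behavior after the fix: the oversized crash still fails to save, but the processor moves on immediately instead of retrying with backoff.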