Closed
Bug 1041263
Opened 10 years ago
Closed 10 years ago
processing broken on stage
Categories
(Socorro :: Backend, task)
Socorro
Backend
Tracking
(Not tracked)
RESOLVED
FIXED
95
People
(Reporter: rhelmer, Unassigned)
Details
Lars, I think we broke hbase processing while getting Ceph processing working ;)
2014-07-19 20:00:55,918 ERROR - Thread-12 - Error in processing a job
Traceback (most recent call last):
File "/data/socorro/application/socorro/lib/threaded_task_manager.py", line 336, in run
function(*args, **kwargs) # execute the task
File "/data/socorro/application/socorro/processor/processor_app.py", line 166, in transform
crash_id
File "/data/socorro/application/socorro/external/crashstorage_base.py", line 465, in save_raw_and_processed
crash_id
File "/data/socorro/application/socorro/external/crashstorage_base.py", line 170, in save_raw_and_processed
self.save_raw_crash(raw_crash, dumps, crash_id)
File "/data/socorro/application/socorro/external/crashstorage_base.py", line 879, in save_raw_crash
self.wrapped_crashstore.save_raw_crash(raw_crash, dumps, crash_id)
File "/data/socorro/application/socorro/external/hb/crashstorage.py", line 253, in save_raw_crash
return transaction()
File "/data/socorro/application/socorro/external/hb/crashstorage.py", line 104, in <lambda>
return lambda *args, **kwargs: self.transaction(lambda conn_ctx: f(conn_ctx.client, *args, **kwargs))
File "/data/socorro/application/socorro/database/transaction_executor.py", line 104, in __call__
result = function(connection, *args, **kwargs)
File "/data/socorro/application/socorro/external/hb/crashstorage.py", line 104, in <lambda>
return lambda *args, **kwargs: self.transaction(lambda conn_ctx: f(conn_ctx.client, *args, **kwargs))
File "/data/socorro/application/socorro/external/hb/crashstorage.py", line 166, in transaction
for key, dump in dumps.iteritems():
AttributeError: 'NoneType' object has no attribute 'iteritems'
Reporter | ||
Comment 1•10 years ago
|
||
Lonnen, we should not release master to prod until this is either fixed or backed out
Flags: needinfo?(chris.lonnen)
Reporter | ||
Comment 2•10 years ago
|
||
Also, I think it is bad that neither our testing nor nagios noticed this... I first noticed in while testing processing with FS so I am surprised the integration test script didn't fail.
Comment 3•10 years ago
|
||
The plan --
back the changes out
write a test to check for this in the future
reland when the test is passing
Flags: needinfo?(chris.lonnen)
Comment 4•10 years ago
|
||
there's nothing to backout here. The problem is in the benchmarking code in a unfortunate interaction with HBase. At the end of processing a crash, we call a method called "save_raw_and_processed" rather than just "save_processed". Most crashstores make do with the base class version of this method which calls the "save_raw" and "save_processed" methods in turn. HBase, though has it's own "save_raw_and_processed" because the raw crash is already in HBase - we don't want to save it a second time (HBase doesn't really delete, we'd end up with wasted space with two copies of the raw crash in HBase).
The Benchmarking code subverted HBase's self protection, by overriding HBase's "save_raw_and_processed" with the base class "save_raw_and_processed" and causing a second storage attempt at the raw_crash.
Meanwhile, the processor knows too much and is being clever in a bad way. It knows that we're using the "save_raw_and_processed" instead of the simple "save_processed" that is used it the past. It also knows that HBase isn't going to save the raw crash. It knows that PG is only going to save the json part of the raw crash. So the too-clever-for-its-own-good processor doesn't send the dumps into the "save_raw_and_processed" call. With the BenchmarkingCrashStore by-passing HBase's dont-save-the-raw-crash intent, HBase gets a raw crash to store without the dumps. This causes HBase to lose its lunch...
the solution:
1) implement "save_raw_and_processed" in the BenchmarkingCrashStore.
2) stop being clever in the processor and pass the dumps along with the raw crash into
the "save_raw_and_processed"
why wasn't any of this stuff caught? We don't IntegrationTest the benchmarking class. The problem will not appear in any other situation.
Comment 5•10 years ago
|
||
Actually, I take back my comment that the processor is being too clever for its own good. The reason the processor doesn't pass the dumps to the save method is that it doesn't have them. It order to pass them to the save routine, it would have to first fetch them from HBase. Since it knows that it won't be saving them back to HBase and PG doesn't want them, it doesn't go through the overhead of fetching them only to just throw them away.
Reporter | ||
Comment 6•10 years ago
|
||
(In reply to K Lars Lohn [:lars] [:klohn] from comment #4)
> why wasn't any of this stuff caught? We don't IntegrationTest the
> benchmarking class. The problem will not appear in any other situation.
Ah! Thanks lars, I didn't think that enabling benchmarking on stage *caused* this, I assumed it was just a regression in the main code (which definitely should have been caught by the integration test).
Lowering severity, we should fix but isn't as critical as I had thought when I saw stage failing to process.
I do wish our monitoring could catch this, though.
Severity: blocker → normal
Comment 7•10 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/33dd2c55f4428b239722b7078569d312ea8cbb7e
Remove extra whitespace in crashstorage tests.
Fixes bug 1041263
Updated•10 years ago
|
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Target Milestone: --- → 95
You need to log in
before you can comment on or make changes to this bug.
Description
•