Closed Bug 623659 Opened 11 years ago Closed 11 years ago

Middleware bridge to search two different HBase instances for a crash

Categories

(Socorro :: General, task)

Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: laura, Assigned: lars)

Details

If xstevens and dre can't get all the crashes to PHX before the migration, we may find ourselves in the position of having to search both PHX and SJC for a crash.

This will require:
- changes to GetCrash
- addition of a config option for a secondary HBase instance

We'll also need Metrics to set up a second instance for us to test against (on the research cluster?) - do you need a bug, Daniel/Xavier?

We may also need to get holes poked in the firewall from PHX middleware to SJC HBase.  Jabba, can you clarify if this is needed?
OK, I've implemented and started testing a dual HBase scheme for the middleware.  When a request for an ooid (meta, dump or processed) comes in through the web service, the web service first looks in the primary HBase instance.  If it fails to find the ooid there, it tries the secondary HBase instance.
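
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in the branch: `CrashNotFound`, `InMemoryStore` and `DualStore` are hypothetical names, and the in-memory dictionaries stand in for real HBase instances.

```python
class CrashNotFound(Exception):
    """Raised when an ooid is absent from a store (hypothetical name)."""


class InMemoryStore:
    """Stand-in for one HBase instance; a real store would wrap an HBase client."""

    def __init__(self, crashes):
        self._crashes = dict(crashes)

    def get(self, ooid):
        try:
            return self._crashes[ooid]
        except KeyError:
            raise CrashNotFound(ooid)


class DualStore:
    """Try the primary store first, then fall back to the secondary."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def get(self, ooid):
        try:
            return self.primary.get(ooid)
        except CrashNotFound:
            # Not in the primary: the secondary either returns the crash
            # or raises CrashNotFound itself.
            return self.secondary.get(ooid)


# Each ooid exists in only one of the two stores.
primary = InMemoryStore({"new-ooid": {"product": "Firefox"}})
secondary = InMemoryStore({"old-ooid": {"product": "Thunderbird"}})
bridge = DualStore(primary, secondary)
```

Asking `bridge` for an ooid that exists in only one store exercises both paths, and an ooid missing from both still raises `CrashNotFound`.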

As it turns out, however, the middleware is not the only place that will have to know about this dual scheme.  The processor also fetches meta and raw dump data from HBase.

If the two HBase instances are arranged such that the older crashes are in the secondary instance and new crashes are going into the primary HBase, most of the processor's work will come from the primary HBase.  However, if a request for a priority job comes in from beyond primary HBase's threshold age, the processor will have to go to the secondary HBase for fulfillment.

Programmatically, this is not difficult.  I just want it known that the ramifications of this are not confined to the middleware.

I'll soon post a patch here that implements this change.  

While I'm arrogant and assume my code is golden by default, it would be wise to test this.  Got any suggestions how I can test?
Have some code that points at both the dev and staging instances.  Ask for crashes that only exist in one or the other.
Lars, status?
The "bridge" exists in svn as the lars-176dev branch.  

I will make another for 175x as lars-175xdev.  That way we'll have code for a bridge in either direction.
Also need to add functionality to search two different reports tables in the same instance (for sharding, basically).
Now coding the variation where the two instances of HBase are really the same instance.  In this case, one of them will have to use a table name other than 'crash_reports'.

I'm making the assumption that I should make the code work for both read/write operations with an alternate primary table name.  If I were to make only the read operations respond to the alternate table name, then we'd have a very confusing hack.  See * below for the alternative.

Having the read/write capability means that there need to be index tables for the alternate table.  Right now, if the primary table is 'crash_reports', the index table names will be 'crash_reports_index...' where '...' is the name of the index topic.  I'm assuming that the index names for the alternate tables will follow the same pattern.  Here is a list of the parameterized names that I will be templatizing; comment if anything is missing:

crash_reports
crash_reports_index_legacy_unprocessed_flag 
crash_reports_index_legacy_submitted_time
crash_reports_index_hang_id_submitted_time
crash_reports_index_hang_id
crash_reports_index_submitted_time
crash_reports_index_unprocessed_flag
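
As a rough sketch of the templatization, assuming the index tables always follow the '<primary>_index_<topic>' pattern shown in the list above (the `table_names` helper and `INDEX_TOPICS` constant are hypothetical names, not taken from the branch):

```python
# Index topics taken from the table list above.
INDEX_TOPICS = (
    "legacy_unprocessed_flag",
    "legacy_submitted_time",
    "hang_id_submitted_time",
    "hang_id",
    "submitted_time",
    "unprocessed_flag",
)


def table_names(primary_table="crash_reports"):
    """Return the primary table name followed by its index table names."""
    names = [primary_table]
    names.extend("%s_index_%s" % (primary_table, topic) for topic in INDEX_TOPICS)
    return names
```

Calling `table_names()` reproduces the seven names listed above; passing an alternate primary name such as `table_names("crash_reports_alt")` yields the matching set for the second table.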

This leaves the issue of the 'metrics' table.  Should that be settable, too?

* If we were to want to go the way of read-only table name templatization, I could implement it as a subclass.  That way I can halt any attempt at write operations by raising a NotImplementedError exception.  Then I could create a new crashStorage wrapper class that could be used in the DualHbaseCrashStorageSystem as the secondary store.
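
A minimal sketch of that read-only alternative, assuming a simple store keyed by ooid. `CrashStore` and `ReadOnlyCrashStore` are hypothetical names, and the dictionary stands in for the HBase table:

```python
class CrashStore:
    """Hypothetical read/write store with a configurable table name."""

    def __init__(self, table_name="crash_reports"):
        self.table_name = table_name
        self._rows = {}  # stands in for the HBase table

    def put(self, ooid, crash):
        self._rows[ooid] = crash

    def get(self, ooid):
        return self._rows[ooid]


class ReadOnlyCrashStore(CrashStore):
    """Same reads as CrashStore, but every write attempt is refused."""

    def __init__(self, table_name, rows):
        super().__init__(table_name)
        self._rows = dict(rows)  # pre-existing data, never modified here

    def put(self, ooid, crash):
        raise NotImplementedError("%s is read-only" % self.table_name)


old_store = ReadOnlyCrashStore("crash_reports_old", {"old-ooid": {"product": "Firefox"}})
```

Reads such as `old_store.get("old-ooid")` work as usual, while `old_store.put(...)` raises `NotImplementedError`, so a store like this could only ever serve as the secondary.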
this code exists in the lars-1761dev branch in googlecode.  It was not deployed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
this is suddenly hot again for use in staging.  It now resides in lars-177dev5 in googlecode.  Need it integrated into 177 in one week.
Severity: normal → major
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Target Milestone: 1.7.6 → 1.7.7
it is my understanding that this feature is wanted only for use of two HBases for the middleware.  However, I suspect that it might be needed in the processor, too.  Consider this scenario:

An ooid is requested; it is not found in the primary HBase, but it is found in the secondary.  Further, in the secondary, the crash has not been processed, so the middleware queues the crash for priority processing.  If the processor doesn't also know about the secondary HBase, the processor will be unable to find the ooid.

What is the use case in staging where we need this double-barreled HBase approach?  Is it true that we will not need this in production?  How do we certify a version in staging when the production system will have a different configuration and run different code?
(In reply to comment #9)
> it is my understanding that this feature is wanted only for use of two HBases
> for the middleware.  However, I suspect that it might be needed in the
> processor, too.  Consider this scenario:
> 
> an ooid is requested, but it is not found in the primary HBase, but it is found
> in the secondary.  Further, in the secondary, the crash has not been processed,
> so the middleware queues the crash for priority processing.  If the processor
> doesn't also know about the secondary HBase, the processor will be unable to
> find the ooid.
> 
> What is the use case in staging where we need this double-barreled HBase
> approach.  Is it true that we will not need this in production?  How do we
> certify a version in staging when the production system will have a different
> configuration and run different code?

The use case in staging is as follows:
The HBase instance we have in staging contains much less data than the PG instance.  This means that we may try to load a raw crash that isn't there, which makes it hard to test UI features at times.

The limitations of this specific implementation are, I think, that we only need the bridge for GetCrash.  We may expand this later, but this will work for now.
checked in to trunk as r3027.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro