The crash mover works fine if the primary storage is the file system. However, it does not work well as the resubmission mechanism if HBase is the primary storage. The problem is that the crash mover is long-running and meant to go as fast as it can: as soon as it sees a crash in the fallback storage, it tries to submit it to HBase. That gives HBase no time to recover from the problem that caused the crash to go into fallback storage in the first place. The hbaseResubmit cron has a built-in delay and is much more appropriate for the job. It is well tested and has been used in production for months. It is my intent to bring it forward from 175 into 176.
This is bogus. I'd forgotten that newCrashMover has a backoff retry on the HBase connection. If HBase is in fail mode, newCrashMover may indeed try to insert immediately, but if that fails, it will back off, holding on to that crash until it can insert it. We'll revisit this if we encounter problems.
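
For anyone reading along, here is a rough sketch of the backoff behaviour described above. It is not the actual newCrashMover code: the function name, the `insert` callable, the `delays` values, and the use of IOError as the failure signal are all assumptions made for illustration.

```python
import itertools
import time


def insert_with_backoff(insert, crash_id, delays=(10, 30, 60), sleep=time.sleep):
    """Hypothetical helper: keep hold of one crash until insert() succeeds.

    `insert` stands in for whatever call writes a crash to HBase, and
    `delays` (in seconds) is an invented backoff schedule.
    """
    # Walk through the schedule once, then keep reusing the last delay.
    waits = itertools.chain(delays, itertools.repeat(delays[-1]))
    attempts = 0
    while True:
        attempts += 1
        try:
            # The first attempt happens immediately, with no delay.
            insert(crash_id)
            return attempts
        except IOError:
            # Insert failed: back off before retrying the same crash.
            sleep(next(waits))
```

The point is that the crash is never dropped or skipped: the mover just waits longer between attempts, which gives HBase the recovery time I was worried about.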