Closed Bug 821335 Opened 12 years ago Closed 9 years ago

Enhance debug logging for Thrift client connections

Categories

(Socorro Graveyard :: Middleware, defect, P2)

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: dre, Unassigned)

Bug 819881 describes frequent Thrift timeouts on a cronjob. Currently, neither side logs enough information to diagnose the root cause.

I'd like to recommend we evaluate adding any combination of the following data points to the client debug log to help troubleshoot this and similar issues in the future:

* Client method that caused a connection error
** Which table is being accessed
** Which column families are being accessed
** Read or Write operation
** Rowkey responsible for the failure (allows us to narrow the troubleshooting to a particular region)

* Type of connection error when one occurs

* Time until the next retry whenever a connection error occurs
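
As a rough sketch of what the log line could look like (not the existing client code), a wrapper along these lines would capture the data points above. Everything here is an assumption for illustration: the logger name, the `call_with_logging` helper, the `family:qualifier` column format, and the retry delays are all hypothetical.

```python
import logging
import time

logger = logging.getLogger("thrift_client_debug")  # hypothetical logger name


def call_with_logging(method, table, row_key, columns, is_write,
                      retry_delays=(1, 5), sleep=time.sleep):
    """Call method(table, row_key, columns), logging context on each failure."""
    delays = list(retry_delays)
    while True:
        try:
            return method(table, row_key, columns)
        except Exception as exc:  # real code would catch only connection errors
            # Next retry delay in seconds, or None when retries are exhausted.
            wait = delays.pop(0) if delays else None
            # Assumes columns are given as "family:qualifier" strings.
            families = sorted({c.split(":", 1)[0] for c in columns})
            logger.debug(
                "connection error: method=%s table=%s families=%s op=%s "
                "rowkey=%r error=%s next_retry_s=%s",
                getattr(method, "__name__", repr(method)), table, families,
                "write" if is_write else "read", row_key,
                type(exc).__name__, wait,
            )
            if wait is None:
                raise
            sleep(wait)
```

Logging the rowkey with `%r` keeps binary or non-printable keys readable, which matters if the key is used to track the failure back to a region.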

Additionally, logging these items either periodically or after a failure condition would be helpful for general monitoring and for evaluating the severity of observed errors:

* Number of open Thrift connections
** In the last 1/5/15 minutes
*** per process
*** per thread

* Number of refused connections
** In the last 1/5/15 minutes
*** per process
*** per thread

* Number of closed connections
** In the last 1/5/15 minutes
*** per process
*** per thread

* Number of timed-out connections
** In the last 1/5/15 minutes
*** per process
*** per thread

* Number of successful operations
** In the last 1/5/15 minutes
*** per process
*** per thread
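
One possible way to keep these counts, sketched under assumptions rather than taken from the client: an in-process counter that stores event timestamps per thread and reports totals for the 1/5/15 minute windows. The `EventCounter` class and the event names ("open", "refused", "closed", "timeout", "success") are hypothetical.

```python
import collections
import threading
import time

WINDOWS_S = (60, 300, 900)  # the 1/5/15 minute windows, in seconds


class EventCounter:
    """Counts named events per thread within one process."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._lock = threading.Lock()
        # (event name, thread id) -> timestamps of recent occurrences
        self._events = collections.defaultdict(collections.deque)

    def record(self, event):
        now = self._clock()
        with self._lock:
            stamps = self._events[(event, threading.get_ident())]
            stamps.append(now)
            # Drop timestamps older than the largest window.
            while stamps and stamps[0] <= now - max(WINDOWS_S):
                stamps.popleft()

    def snapshot(self, event):
        """Return {window_seconds: {"process": n, "threads": {tid: n}}}."""
        now = self._clock()
        result = {}
        with self._lock:
            for window in WINDOWS_S:
                per_thread = {}
                for (name, tid), stamps in self._events.items():
                    if name == event:
                        per_thread[tid] = sum(1 for t in stamps if t > now - window)
                # The per-process count is the sum over this process's threads.
                result[window] = {"process": sum(per_thread.values()),
                                  "threads": per_thread}
        return result
```

The client would call `record("timeout")` (or another event name) at the relevant point, and a periodic task or the failure handler would write `snapshot(...)` to the debug log.
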
Depends on: 822069
We're no longer using Thrift at all.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Product: Socorro → Socorro Graveyard