Closed
Bug 550535
Opened 15 years ago
Closed 15 years ago
Sisyphus - fail to reconnect to database
Categories
(Testing Graveyard :: Sisyphus, defect)
Testing Graveyard
Sisyphus
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bc, Assigned: bc)
References
Details
Attachments
(4 files, 1 obsolete file)
|
16.46 KB,
patch
|
cmtalbert
:
review+
|
Details | Diff | Splinter Review |
|
22.74 KB,
patch
|
cmtalbert
:
review+
|
Details | Diff | Splinter Review |
|
908 bytes,
patch
|
cmtalbert
:
review+
|
Details | Diff | Splinter Review |
|
4.71 KB,
patch
|
cmtalbert
:
review+
|
Details | Diff | Splinter Review |
Currently, database related function calls attempt to filter connection related exceptions by looking for conn_request in the exception text. This can result in failures to recover from database connection errors when different exceptions are thrown. Also, sisyphus.couchdb.connectToDatabase() does not terminate until a connection is reestablished. This can cause problems if the couchdb call enters a permanent failure mode independent of the actual database availability.
This patch eliminates the exception check for conn_request and the limits the number of attempts that connectToDatabase() will attempt to reconnect before failing.
Attachment #430682 -
Flags: review?(ctalbert)
| Assignee | ||
Updated•15 years ago
|
Blocks: sisyphus-automation
| Assignee | ||
Comment 1•15 years ago
|
||
I missed some 'conn_request' conditionals in the worker.
Attachment #430682 -
Attachment is obsolete: true
Attachment #431112 -
Flags: review?(ctalbert)
Attachment #430682 -
Flags: review?(ctalbert)
Attachment #431112 -
Flags: review?(ctalbert) → review+
The only issue I see here (and I can't really see it in the diff view) is that removing some of these "raise" statements might have some unintended consequences, so just be sure this is well tested. I think it is a good approach though, overall to do the connection retries this way instead of guessing at "which error should I reconnect on"
| Assignee | ||
Comment 3•15 years ago
|
||
I'll double check tonight. Definitely the CouchDBExceptions such as update or delete conflicts should be handled properly and not temporarily ignored. The right way to do this is to audit couchquery and httplib2 for all connection related exceptions and only retry for them. Or I could check for /couchquery/ or /httplib2/ in the exceptionValue. That might be the preferred method. See http://qm-crash-linux01:5984/crashtest/_design/default/_view/logs?startkey=%222010-03-08T08%22 for this list of exceptions since I pushed these recent changes to the server. Let me redo this with the path checks and see how that behaves.
| Assignee | ||
Comment 4•15 years ago
|
||
Ok, I think this is more robust.
Attachment #431367 -
Flags: review?(ctalbert)
Comment on attachment 431367 [details] [diff] [review]
patch v3
I think so too.
Attachment #431367 -
Flags: review?(ctalbert) → review+
| Assignee | ||
Comment 6•15 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 7•15 years ago
|
||
When a connection related error occurs, recreate the httplib2 Http object.
Attachment #442033 -
Flags: review?(ctalbert)
Attachment #442033 -
Flags: review?(ctalbert) → review+
| Assignee | ||
Comment 8•15 years ago
|
||
| Assignee | ||
Comment 9•15 years ago
|
||
This doesn't quite work when the scripts restart and the last logMessage can not recover. The fix is to recreate the couchquery Database object when the database connection fails.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 10•15 years ago
|
||
I tested this by running several workers and killing the couchdb server then restarting it after a delay. They do reconnect fine.
Attachment #452081 -
Flags: review?(ctalbert)
Attachment #452081 -
Flags: review?(ctalbert) → review+
| Assignee | ||
Comment 11•15 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Product: Testing → Testing Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•