Closed Bug 420809 Opened 17 years ago Closed 17 years ago

Refactor current mutual exclusion method

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: morgamic, Assigned: lars)

References

Details

Attachments

(1 file)

locking workflow 17 years ago Michael Morgan [:morgamic] 48.02 KB, image/png		Details

Michael Morgan [:morgamic]

Reporter

Description

•

17 years ago

The current monitor is flawed because it achieves mutual exclusion through handling an SQLException, which could happen for any number of reasons. The current code assumes that if an SQLException happens it is because of an attempted insert on a duplicate reports.id, and this is the wrong way to have concurrency between multiple instances of the processor. Use of Gamin was thwarted by NFS and its inability to be polled by a fs monitoring lib like Gamin. An alternative locking mechanism using the database should suffice (using existing db connections) without creating too much additional overhead.

Michael Morgan [:morgamic]

Reporter

Comment 1

•

17 years ago

Attached image locking workflow — Details

Michael Morgan [:morgamic]

Reporter

Updated

•

17 years ago

Priority: -- → P1

K Lars Lohn [:lars] [:klohn]

Assignee

Comment 2

•

17 years ago

Any exclusion done in the database would end being a variant of what is already done. I would suggest that just fixing the current method would be quickest and most appropriate course of action. The current system catches the exception SQLError and assumes that it exists because of a duplicated ID in the database. The class SQLError (actually renamed DBAPIError in more recent versions of SQLAlchemy) broadly covers all server errors. However, it is possible to determine the actual cause of the error through a component within DBAPIError called "orig". This is an instance of the original exception raised by the underlying database API. This exception could be checked to see if it id a "Duplicate Index" or some such error. While it is true that will, perhaps, tie the code tightly to the underlying database, such will be the case anyway if SQLAlchemy is retired.

Michael Morgan [:morgamic]

Reporter

Comment 3

•

17 years ago

There is another problem with this method. If report insertion is successful and then insertion fails for modules or frames we have a zombie report record and the processor always assumes this record has been previously processed. This is a problem with not using a transactional way to process dumps, but I think changing our locking mechanism would help this. In addition to that, the report_id auto increment sequence gets updated when failures happen, so for every successful report the id gets incremented hundreds to thousands of times.

(not currently active) Ted Mielczarek

Comment 4

•

17 years ago

I believe lars has fixed this with his processor refactoring.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Michael Morgan [:morgamic]

Reporter

Comment 5

•

17 years ago

Want to keep this open until it's deployed. Still have to stage it in prod environment.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Michael Morgan [:morgamic]

Reporter

Comment 6

•

17 years ago

Update from Lars (from email) -- Lars could you add to this if something is missing: * processor.py now responds to SIGTERM as well as ^C to terminate. On responding to one of those signals, it shuts down in an orderly manner by unregistering itself from the database. After a processor.py terminates, monitor.py will reallocate any pending jobs. * both programs now have a complete command line interface for changing options * the config.py has been split into two parts, one for each script: socorro/monitor/config.py and socorro/processor/config.py -- the original socorro/lib/config.py is deprecated and unused. * the format for the config.py files has changed to a more friendly format for handling parameters as command line options. It should be fairly self explanatory. * all writing directly to stdout and stderr has been removed in favor using Python Logging. Both programs will log ERROR and CRITICAL to the stderr while at the same time logging all error/warning/info/debug levels to rotating log files. This behavior is configurable in the config.py files or the command line (or environment variables). The failure of monitor.py to move failed json and dump files on 'stage' is unresolved. In spite of hours of trying, I cannot get it to fail in the same way on 'khan'. When I mess around with file permissions to force a failure, I get proper error reporting behavior. This silent failure stumps me. I have instrumented monitor.py with a --debug=True option that will let tell the program to verify the file relocation of the .json and .dump files. This should, at least, catch the problem in the act.

Michael Morgan [:morgamic]

Reporter

Updated

•

17 years ago

Status: REOPENED → ASSIGNED

Michael Morgan [:morgamic]

Reporter

Updated

•

17 years ago

Blocks: 422584

Michael Morgan [:morgamic]

Reporter

Updated

•

17 years ago

Blocks: 422581

Michael Morgan [:morgamic]

Reporter

Comment 7

•

17 years ago

Pushed, 4/3 @ 8pm PST. Nice job Lars!

Status: ASSIGNED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

14 years ago

Component: Socorro → General

Product: Webtools → Socorro

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Refactor current mutual exclusion method

Categories

(Socorro :: General, task, P1)

Tracking

(Not tracked)

People

(Reporter: morgamic, Assigned: lars)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Updated

Updated

Comment 7

Updated

Attachment

General

Description

File Name

Content Type