Bug 59994
Opened 25 years ago
Closed 25 years ago
handle CGI timeouts/aborts by aborting database calls
Categories: Bugzilla :: Bugzilla-General, defect, P3
Status: RESOLVED FIXED
Target Milestone: Bugzilla 2.12
People: Reporter: dmosedale, Assigned: endico
Some (many? all?) webservers abort CGI scripts if they output no text for long
periods of time. In the case of Apache, this seems to be 300 seconds (5
minutes). For bugzilla.mozilla.org, this is probably a good value, because we
don't want the shadow db tied up for much longer than that (syncs cannot happen
while searches are going on).
It turns out that the current mechanism for aborting the scripts doesn't cause
any in-progress SELECTs to be aborted, which means that truly bad searches can
and do continue for very long periods of time (hours even -- thus causing the
shadow_db to remain out of sync). Bugzilla needs to have abort-handlers that
kill any appropriate running SQL calls (not necessarily all SQL calls, however,
as we can't afford to leave the db in an inconsistent state).
Comment 1 (Reporter) • 25 years ago
After much groveling through documentation (as well as the source code to
Apache), I think I've figured out the strategy for this.
It appears that when Apache times out a CGI, it sends the process a SIGTERM
and then, three seconds later, a SIGKILL, so this is an opportunity to clean up.
Ideally, we'd use the DBI statement cancel function. Unfortunately, DBD::mysql
doesn't implement that, but presumably it could be added. Alternatively, we
could find the $dbh->{'thread_id'} and then use system() to fork off a
mysqladmin to kill the thread in question, or perhaps even exec() mysqladmin in
our own process slot.
Note, however, that "man DBI" tells us that:
The first thing to say is that signal handling in Perl is
currently not safe. There is always a small risk of Perl
crashing and/or core dumping when, or after, handling a
signal. (The risk was reduced with 5.004_04 but is still
present.)
There is also experimental code included with 5.005 (Thread::Signal) which
runs signal handlers on a separate thread. However, 5.005 doesn't appear
to me to be built multithreaded by default, so this seems overly risky.
In any case, we'll need to encourage folks to upgrade to 5.004_04. And crashing
in a signal handler will usually be better than running forever and leaving the
shadow database out of sync.
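As a rough illustration of the SIGTERM approach described above, the following Perl sketch installs a handler that hands a "mysqladmin kill" command for the connection's thread to a callback. The helper names (kill_command, install_term_handler) and the callback are hypothetical, the thread id is assumed to have already been read from the database handle, and the handler does the bare minimum because signal handling in Perl is not fully safe.

```perl
use strict;
use warnings;

# Build the argument list for "mysqladmin kill <id>".
# The id is validated so nothing unexpected reaches the command line.
sub kill_command {
    my ($thread_id) = @_;
    die "bad thread id\n"
        unless defined $thread_id && $thread_id =~ /\A\d+\z/;
    return ('mysqladmin', 'kill', $thread_id);
}

# Install a SIGTERM handler that passes the kill command to $on_term.
# Apache follows SIGTERM with SIGKILL a few seconds later, so the
# handler does as little work as possible.
sub install_term_handler {
    my ($thread_id, $on_term) = @_;
    my @cmd = kill_command($thread_id);   # computed up front, not in the handler
    $SIG{TERM} = sub {
        $SIG{TERM} = 'IGNORE';            # avoid re-entering the handler
        $on_term->(@cmd);
    };
    return;
}
```

In a CGI script the callback might be something like sub { exec @_ }, which would replace the dying process with mysqladmin, as suggested above. Connection options for mysqladmin are omitted here.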
Probably the short-term fix is to just use a cronjob script that runs
'mysqladmin processlist' every so often and kills off any long-running threads.
In fact, we're likely to want to do this anyway, given the signal-handler
lossage. It also wouldn't be difficult to patch MySQL to cause long_queries to
be killed when they hit the time limit.
With both of these quick fixes, though, I'm a bit afraid of one really evil
SELECT (which might take > 5 mins on its own) significantly slowing down other
SELECTs (which might only take 3 mins on their own) and then having the entire
batch of threads get killed. Probably the answer to this (easy to implement in
the cronjob) is to never kill more than 1 thread per x minute interval, for some
value of x.
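A minimal sketch of the selection logic such a cronjob might use is shown below. It assumes that 'mysqladmin processlist' prints a table with the columns Id, User, Host, db, Command, Time, State and Info; the function names and parameters (parse_processlist, pick_victim, $max_secs, $min_gap) are hypothetical. Only SELECT queries are considered, and at most one thread is chosen per interval.

```perl
use strict;
use warnings;

# Parse the table printed by "mysqladmin processlist" (assumed layout:
# | Id | User | Host | db | Command | Time | State | Info |).
sub parse_processlist {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\|\s*\d+\s*\|/;   # skip borders and the header
        # A limit of 9 keeps any '|' inside the query text in the last field.
        my (undef, $id, $user, $host, $db, $command, $time, $state, $info)
            = split /\s*\|\s*/, $line, 9;
        $info = '' unless defined $info;
        $info =~ s/\s*\|\s*\z//;                 # drop the closing border
        push @rows, { id => $id, command => $command,
                      time => $time, info => $info };
    }
    return @rows;
}

# Return the id of the longest-running SELECT over $max_secs, or undef.
# Nothing is chosen if the previous kill was under $min_gap seconds ago.
sub pick_victim {
    my ($rows, $max_secs, $now, $last_kill, $min_gap) = @_;
    return if $now - $last_kill < $min_gap;
    my @slow = sort { $b->{time} <=> $a->{time} }
               grep { $_->{command} eq 'Query'
                      && $_->{info} =~ /^\s*SELECT\b/i
                      && $_->{time} > $max_secs } @$rows;
    return @slow ? $slow[0]{id} : undef;
}
```

A cronjob built on this would need to remember the time of its last kill between runs (in a file, say) and then either run 'mysqladmin kill' on the returned id or just report it.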
Updated (Reporter) • 25 years ago
Status: NEW → ASSIGNED
Comment 2 (Reporter) • 25 years ago
I've got a test version of a thread-killer script that kills at most one thread
every five minutes. It's running on bugzilla.mozilla.org, but right now it just
sends me mail rather than killing anything, so that I can verify that it's
really doing the right thing.
Comment 3 • 25 years ago
The script also sent me about 30 mails today... could that be fixed?
Comment 4 • 25 years ago
Dan: could you also add a few extra lines to this script to check the health
of the database (bug 54818, which I'll mark as depending on this)? It could work
as follows: if it can't connect to the database ->
$sleep = 5;
$pid = `cat /opt/mysql/var/lounge.pid`; # no chop needed; the pidfile has no trailing newline
system("kill -TERM $pid"); sleep($sleep);
system("sh /etc/init.d/mysql.server start");
system("echo database restarted | /bin/mailx -s mysql root");
Depends on: 54818
Comment 5 (Reporter) • 25 years ago
rko: I think the 30 mails were the result of a bug. I'll work on integrating
the MySQLd killing code as well.
Comment 6 (Reporter) • 25 years ago
I've checked in the initial version of this code, which has been running on
lounge for a while. I haven't integrated Risto's mysqld checking suggestion
yet; that's next.
Comment 7 (Reporter) • 25 years ago
Mass reassign of my Bugzilla bugs to endico, as I'm switching groups to work on
mozilla LDAP integration full time.
Assignee: dmose → endico
Status: ASSIGNED → NEW
Comment 8 (Assignee) • 25 years ago
Marking this bug fixed and removing the dependency on 54818, which is actually
a request for an extra feature in this script.
Comment 9 • 24 years ago
In search of accurate queries.... (sorry for the spam)
Target Milestone: --- → Bugzilla 2.12
Comment 10 • 24 years ago
Moving closed bugs to Bugzilla product
Component: Bugzilla → Bugzilla-General
Product: Webtools → Bugzilla
Version: other → unspecified
Updated • 13 years ago
QA Contact: matty_is_a_geek → default-qa