Bug 668664 - kill pending sql query on thread exit
Status: VERIFIED FIXED
Whiteboard: [needs loadtest][qa+]
Product: Cloud Services
Classification: Client Software
Component: Server: Core
Version: unspecified
Hardware: All, OS: All
Priority: P1, Severity: normal
Target Milestone: ---
Assigned To: Ryan Kelly [:rfkelly]
QA Contact: James Bonacci [:jbonacci]
Mentors:
Depends on:
Blocks: 907479 623604 676423 784598
Reported: 2011-06-30 15:57 PDT by Tarek Ziadé (:tarek)
Modified: 2014-03-14 11:56 PDT (History)
CC: 6 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
kill pending queries (6.49 KB, patch)
2011-07-11 09:36 PDT, Tarek Ziadé (:tarek)
no flags
kill pending *select* queries (6.67 KB, patch)
2011-07-11 11:08 PDT, Tarek Ziadé (:tarek)
no flags
kill pending *select* queries (6.43 KB, patch)
2011-07-21 17:18 PDT, Tarek Ziadé (:tarek)
no flags
patch to kill the current query when execution is interrupted (2.94 KB, patch)
2013-06-02 22:25 PDT, Ryan Kelly [:rfkelly]
telliott: review+
patch to kill the current query when execution is interrupted (3.46 KB, patch)
2013-06-03 15:34 PDT, Ryan Kelly [:rfkelly]
telliott: review+
sync15-cleanup-query-on-error.diff (5.17 KB, patch)
2014-03-05 20:55 PST, Ryan Kelly [:rfkelly]
telliott: review+
tarek: feedback+

Description Tarek Ziadé (:tarek) 2011-06-30 15:57:24 PDT
If the current thread gets killed, we need to kill any pending SQL query sent to mysql.

notes to myself

1. register queries being executed in a threadlocal
2. catch SIGINT (maybe SIGHUP -- need to check gunicorn)
3. see how to kill the query
4. quit and let the lib close the socket
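The four steps above can be sketched roughly as follows. This is an illustrative outline only, not the actual patch; the `register_query`, `kill_pending`, and `admin_execute` names are invented for the sketch, and the MySQL thread id would come from the driver's connection object:

```python
import signal
import threading

# Step 1: registry of the query each worker thread is currently executing,
# keyed by Python thread id.
_registry = {}          # python thread id -> (mysql thread id, sql)
_registry_lock = threading.Lock()

def register_query(mysql_thread_id, sql):
    """Record the query a worker thread is about to execute."""
    with _registry_lock:
        _registry[threading.get_ident()] = (mysql_thread_id, sql)

def unregister_query():
    """Forget the current thread's query once it completes."""
    with _registry_lock:
        _registry.pop(threading.get_ident(), None)

def kill_pending(admin_execute):
    """Step 3: issue a KILL for every registered query.

    `admin_execute` is any callable that runs SQL on a separate
    "backdoor" connection, so we never block on the busy pool.
    """
    with _registry_lock:
        pending = list(_registry.values())
    for mysql_thread_id, sql in pending:
        admin_execute("KILL %d" % mysql_thread_id)

def install_signal_handler(admin_execute):
    """Step 2: run the cleanup when gunicorn interrupts the worker,
    then exit and let the library close the socket (step 4)."""
    def handler(signum, frame):
        kill_pending(admin_execute)
        raise SystemExit(1)
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
```

Separating `kill_pending` from the signal handler keeps the KILL logic testable without a live MySQL server.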
Comment 1 Richard Soderberg [:atoll] 2011-06-30 15:59:00 PDT
mysql(1) binary has a SIGINT handler that does this, in a way that actually tells the server to abort the query, and gets back a confirmation.  we should investigate how this is done.
Comment 2 Tarek Ziadé (:tarek) 2011-06-30 15:59:11 PDT
+ timed out queries. probably via a decorator
Comment 3 Tarek Ziadé (:tarek) 2011-07-11 09:33:16 PDT
I have a first prototype that will trigger query KILLS on SIGINT/SIGTERM I would like to try on high load on the load infra.

The goal is to check that when the request > 30 seconds and Gunicorn restarts the worker, all pending requests are killed in MySQL.

If the prototype works as expected, I'll finish the code + tests in a cleaner version.

Pete, do you think we can do a load session with this patch?
Comment 4 Tarek Ziadé (:tarek) 2011-07-11 09:36:55 PDT
Created attachment 545187 [details] [diff] [review]
kill pending queries

Just a first prototype to validate the idea. Don't look at the code :-)
Comment 5 Toby Elliott [:telliott] 2011-07-11 09:46:40 PDT
This actually seems kind of dangerous. Do we really want to kill a delete if it takes longer than the timeout?
Comment 6 Tarek Ziadé (:tarek) 2011-07-11 09:54:46 PDT
I think we want to kill any operation that lasts longer than the Zeus timeout, in fact.

Having a query last more than 30 seconds while the web app that triggered it is already dead (killed by Zeus/Nginx/Gunicorn) is already a lost cause for the client side: the client already gets a 5xx.
Comment 7 Richard Soderberg [:atoll] 2011-07-11 09:55:27 PDT
Probably not. Some filtering of what to kill would be sensible - like, initially, SELECTs only. Especially since MySQL rollbacks are 10x as slow as letting the query finish under any sort of load (until recent, fixed versions).
Comment 8 Richard Soderberg [:atoll] 2011-07-11 09:57:36 PDT
It's okay if the client gets a 5xx for a DELETE and then later on we finish their DELETE. That way in 5-10 minutes it'll be done when they try to DELETE again!
Comment 9 Tarek Ziadé (:tarek) 2011-07-11 10:00:59 PDT
(In reply to comment #8)
> It's okay if the client gets a 5xx for a DELETE and then later on we finish
> their DELETE. That way in 5-10 minutes it'll be done when they try to DELETE
> again!

What are the queries that are on the edge right now ? 

select(s) or batch post(s)? Or do we have a mix of all kinds under high load? I made the assumption that the batch posts were the longest ones.
Comment 10 Richard Soderberg [:atoll] 2011-07-11 10:04:41 PDT
The three I know of are, in order of estimated(!) frequency, below.  This is from memory, so any actual evidence takes precedence.

DELETE FROM collection WHERE uid=?
(often; any time a user resets sync or clears sync data)

SELECT MAX(timestamp) GROUP BY collection
(rarely; unless we flushed memcache, in which case continuously for several hours)

INSERT INTO collection (), (), (), (), (), (), ()
(unknown; usually requires write saturation of db, which is an incident-class event anyways)
Comment 11 Richard Soderberg [:atoll] 2011-07-11 10:05:16 PDT
For the SELECT case, we currently have helper scripts that detect that specific query and terminate it at 100 seconds.  We don't touch DELETE or INSERT in any automated fashion.
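Those helper scripts aren't shown in this bug, but the selection rule described above (SELECTs only, terminated at 100 seconds, DELETE/INSERT untouched) can be sketched as a filter over `SHOW FULL PROCESSLIST` output. The tuple row format here is an assumption for illustration:

```python
THRESHOLD_SECS = 100  # the 100-second cutoff mentioned above

def queries_to_kill(processlist_rows):
    """Given rows of (thread_id, command, seconds_running, sql),
    return the thread ids of long-running SELECTs only.

    DELETE and INSERT are deliberately left alone: rolling back a
    long DELETE can take far longer than letting it finish.
    """
    to_kill = []
    for thread_id, command, seconds, sql in processlist_rows:
        if command != "Query" or sql is None:
            continue  # idle connections have no query to kill
        if seconds >= THRESHOLD_SECS and sql.lstrip().upper().startswith("SELECT"):
            to_kill.append(thread_id)
    return to_kill
```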
Comment 12 Tarek Ziadé (:tarek) 2011-07-11 10:10:48 PDT
Mmm so:

1- the goal is to kill some queries so the DB does not get saturated when the web app is force-restarted at 30s

2- the SELECT seems to be a rare event

What would be the danger of killing all pending queries on the kill/restart, even DELETEs? I can't think of a case where it could be a problem
Comment 13 Richard Soderberg [:atoll] 2011-07-11 10:12:21 PDT
Rolling back a 100 second DELETE can take up to 1000 seconds. Terrifically unsafe.

The SELECT is a condition that crops up when the DB is saturated and/or unexpectedly busy.  Which occurs any time we trigger more traffic than usual, through either memcache or storage upgrade or etc. events.
Comment 14 Tarek Ziadé (:tarek) 2011-07-11 10:45:19 PDT
(In reply to comment #13)
> Rolling back a 100 second DELETE can take up to 1000 seconds. Terrifically
> unsafe.

That's very long indeed.

> The SELECT is a condition that crops up when the DB is saturated and/or
> unexpectedly busy.  Which occurs any time we trigger more traffic than
> usual, through either memcache or storage upgrade or etc. events.

Fair enough. Adding some filtering options.
Comment 15 Tarek Ziadé (:tarek) 2011-07-11 11:08:44 PDT
Created attachment 545216 [details] [diff] [review]
kill pending *select* queries
Comment 16 Pete Fritchman [:petef] 2011-07-14 12:51:52 PDT
Comment on attachment 545216 [details] [diff] [review]
kill pending *select* queries

kill_pending_queries looks like it hardcodes localhost.  Don't we have to run kill_pending_queries on every sqluri we know about?
Comment 17 Tarek Ziadé (:tarek) 2011-07-21 17:18:12 PDT
Created attachment 547567 [details] [diff] [review]
kill pending *select* queries
Comment 18 Tarek Ziadé (:tarek) 2011-09-05 04:02:09 PDT
Pete, can we work on this this week? The patch is getting a bit old and stale.
Comment 19 James Bonacci [:jbonacci] 2012-08-25 18:56:25 PDT
:atoll :tarek :rfkelly
Any specific load testing still needed for this?
Comment 20 Tarek Ziadé (:tarek) 2012-09-26 07:03:37 PDT
I did all the work but people lost interest - to be reopened on the next sync fire
Comment 21 Ryan Kelly [:rfkelly] 2013-06-02 17:18:44 PDT
CC'ing Mark and Bob since this topic has come up in IRC discussions.
Comment 22 Ryan Kelly [:rfkelly] 2013-06-02 17:38:21 PDT
Tentatively re-opening this - we need to solve this situation, but we may be able to do it via some mysql-level scripting rather than by hacking around in the connection.
Comment 23 Ryan Kelly [:rfkelly] 2013-06-02 22:25:52 PDT
Created attachment 757229 [details] [diff] [review]
patch to kill the current query when execution is interrupted

Here's an updated patch based on IRC discussions of our current needs.  Rather than killing all running queries at shutdown time, it tries to handle it at the level of individual calls to safe_execute().

Basic idea is that if query execution is interrupted by a control-flow exception (such as KeyboardInterrupt or gevent.Timeout) then we open a backdoor connection to mysql and kill the running query.  We depend on some external trigger to generate the interrupt, for example the timeout logic in Bug 878668.

For now, I've made it kill any kind of query rather than just SELECT queries.  The rationale is simple: if there has been an interrupt, then there's nothing around to call COMMIT at the end of the query, so it will be rolled back anyway when the connection is returned to the pool.

IIUC this is the behavior we're seeing in production - the query runs to completion but then gets rolled back at the end.  It should be simple to filter so it only kills certain types of query; I'll let Ops decide what's best in that regard based on current production behavior.
Comment 24 Toby Elliott [:telliott] 2013-06-03 02:17:49 PDT
Comment on attachment 757229 [details] [diff] [review]
patch to kill the current query when execution is interrupted

assuming the kill command doesn't somehow get wedged, this should do the trick. Please double-check that the permissions make this work right.
Comment 25 Ryan Kelly [:rfkelly] 2013-06-03 03:19:55 PDT
I'll await feedback from Ops before committing this, re: whether to kill just the SELECTs or all interrupted queries.
Comment 26 Ryan Kelly [:rfkelly] 2013-06-03 03:27:23 PDT
(In reply to Toby Elliott [:telliott] from comment #24)
> assuming the kill command doesn't somehow get wedged, this should do the
> trick. Please double-check that the permissions make this work right.

Noting that my local testing shows this working correctly, both with "root" and "sync" mysql user accounts.
Comment 27 Ryan Kelly [:rfkelly] 2013-06-03 15:34:15 PDT
Created attachment 757683 [details] [diff] [review]
patch to kill the current query when execution is interrupted

Two small tweaks to the previous patch:  only kill SELECT/INSERT/UPDATE queries and not DELETEs, and force the connection to be removed from the pool after cleanup.
Comment 28 Toby Elliott [:telliott] 2013-06-03 15:53:57 PDT
Comment on attachment 757683 [details] [diff] [review]
patch to kill the current query when execution is interrupted

Patch will work (hopefully), but will break if we ever stick a comment at the start of the query string. Worth noting somewhere.
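The fragility Toby describes is easy to demonstrate: a plain prefix check for SELECT/INSERT/UPDATE stops matching as soon as a comment is prepended to the query string. (The eventual sync1.5 regex, quoted in comment 43 below, tolerates one optional leading `/* ... */` for exactly this reason; the `queryName` annotation string here is made up for illustration.)

```python
import re

# Naive prefix check, as in the patch under review here.
NAIVE = re.compile(r"^\s*(SELECT|INSERT|UPDATE)\s", re.I)
# The later version, tolerating one leading comment.
COMMENT_AWARE = re.compile(r"^\s*(/\*.*\*/)?\s*(SELECT|INSERT|UPDATE)\s", re.I)

plain = "SELECT id FROM bso WHERE userid = 42"
annotated = "/* queryName=get_items */ SELECT id FROM bso WHERE userid = 42"

assert NAIVE.match(plain)
assert NAIVE.match(annotated) is None      # the breakage predicted above
assert COMMENT_AWARE.match(annotated)      # the later fix
```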
Comment 29 Ryan Kelly [:rfkelly] 2013-06-03 17:18:39 PDT
Committed with extra comments in http://hg.mozilla.org/services/server-core/rev/ccb65155bf49

Leaving bug open while we test this out in the real world...
Comment 30 Ryan Kelly [:rfkelly] 2013-06-05 15:51:48 PDT
Thinking on this for a while, there are a couple of things that may go wrong with this in production.  Not worth panicking about, but worth noting for reference:

  * this depends on the ability to quickly create a fresh connection to the server, in order to execute the out-of-band kill operation.  If the server is really overloaded, we may not be able to do that quickly.  We could consider taking one at startup time to use as an "administration channel" and just holding it open until we need it.

  * we should try to measure whether the KILL command itself will get caught up in the MySQL backlog and fail to run in a timely manner.
Comment 31 Ryan Kelly [:rfkelly] 2013-06-05 18:20:07 PDT
This failed in stage due to a PyMySQL bug, where it incorrectly reports the thread id once ids get above a certain size:

  https://github.com/petehunt/PyMySQL/pull/150

I'll include a local fix for this in the next deployment.
Comment 32 Toby Elliott [:telliott] 2013-06-06 01:42:46 PDT
(In reply to Ryan Kelly [:rfkelly] from comment #30)

>   * we should try to measure whether the KILL command itself will get caught
> up in the MySQL backlog and fail to run in a timely manner.

It is quite possible for a KILL command to take minutes/hours to run (though I don't know that we have anything likely to take that long here). Don't assume KILL is instant.
Comment 33 Tarek Ziadé (:tarek) 2013-06-06 04:42:40 PDT
Another approach would be to build a separate application that gets notified of queries to kill - so it can deal with them asynchronously without adding more workload to the main app
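A minimal sketch of that idea, assuming a simple in-process queue and a pluggable `execute_kill` callable standing in for the separate application's own MySQL connection (all names here are illustrative, not from any patch in this bug):

```python
import queue
import threading

class AsyncQueryKiller:
    """Background worker that receives (server, thread_id) kill requests
    and executes them off the request path, so the main app never blocks
    on an overloaded database."""

    def __init__(self, execute_kill):
        self._execute_kill = execute_kill  # callable(server, sql)
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def request_kill(self, server, thread_id):
        """Non-blocking: the web app just enqueues and moves on."""
        self._queue.put((server, thread_id))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                break
            server, thread_id = item
            try:
                self._execute_kill(server, "KILL %d" % thread_id)
            except Exception:
                pass  # best-effort; the query may already be gone

    def close(self):
        """Drain the queue, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

Because the queue is drained in order, `close()` guarantees all earlier kill requests were attempted before shutdown.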
Comment 34 Ryan Kelly [:rfkelly] 2013-06-06 15:27:56 PDT
(In reply to Toby Elliott [:telliott] from comment #32)
> (In reply to Ryan Kelly [:rfkelly] from comment #30)
> >   * we should try to measure whether the KILL command itself will get caught
> > up in the MySQL backlog and fail to run in a timely manner.
> It is quite possible for a KILL command to take minutes/hours to run (though
> I don't know that we have anything likely to take that long here). Don't
> assume KILL is instant.

Do you mean that executing the actual "KILL X" command can take a long time, or it can take a long time for the query to actually be killed?  From what I understand of the implementation of KILL, it just sets a flag somewhere and the query is killed asynchronously.  So the actual sql query should run pretty fast...?


(In reply to Tarek Ziadé (:tarek) from comment #33)
> Another approach would be to build a separate application that gets notified
> of queries to kill - so it can deal with in in an asynchronously manner
> without adding more workload to the main app

Nice, this would neatly solve the two issues I mentioned above.  We could also combine it with the reaping functionality currently used to kill super-long-running queries.  Something to keep in mind if this goes pear-shaped in production.
Comment 35 Toby Elliott [:telliott] 2013-06-06 15:38:04 PDT
(In reply to Ryan Kelly [:rfkelly] from comment #34)
> (In reply to Toby Elliott [:telliott] from comment #32)
> > (In reply to Ryan Kelly [:rfkelly] from comment #30)
> > >   * we should try to measure whether the KILL command itself will get caught
> > > up in the MySQL backlog and fail to run in a timely manner.
> > It is quite possible for a KILL command to take minutes/hours to run (though
> > I don't know that we have anything likely to take that long here). Don't
> > assume KILL is instant.
> 
> Do you mean that executing the actual "KILL X" command can take a long time,
> or it can take a long time for the query to actually be killed?  From what I
> understand of the implementation of KILL, it just sets a flag somewhere and
> the query is killed asynchronously.  So the actual sql query should run
> pretty fast...?
> 

Yes, the latter. You can't magically assume the query has gone away once you issue the kill - it can get caught in the backlog. I imagine a box could get sufficiently wedged that the kill command itself got stuck, but I suspect the only thing that's going to fix that box is pushing the red button on the front of it.
Comment 36 Ryan Kelly [:rfkelly] 2013-06-06 15:38:35 PDT
James, Bob and I took this for a spin in stage yesterday, and it seemed to behave itself under our simulated load conditions.  To force queries to time out, we basically:

  * dropped the gunicorn '-t <timeout>' argument to two seconds
  * ran a loadtest pointing at a single db machine
  * did a lot of disk I/O on the db machine

Timeouts were raised, queries were killed, good times were had by all.  Such a simulation ain't the real world of course, but it's a good start.
Comment 37 James Bonacci [:jbonacci] 2013-06-10 09:28:14 PDT
Clearing NeedInfo field since Ops has already worked with this in Stage, at least.
:rfkelly - what are the next steps?
Comment 38 Ryan Kelly [:rfkelly] 2013-06-10 15:15:02 PDT
(In reply to James Bonacci [:jbonacci] from comment #37)
> :rfkelly - what are the next steps?

I'm not real confident rolling this out until it's had a full 24-hour loadtest in stage.  We need to nurse stage back to health for long enough to run this thing.
Comment 39 James Bonacci [:jbonacci] 2013-06-10 15:26:15 PDT
:rfkelly I was afraid you'd say that ;-)
See my email to the group on that subject...
Not sure how w/o some real work 
:bomb ^^
Comment 40 Ryan Kelly [:rfkelly] 2014-03-05 20:55:02 PST
Created attachment 8386556 [details] [diff] [review]
sync15-cleanup-query-on-error.diff

We should try to get this into sync1.5.  I've updated the patch for the new codebase, and addressed a couple of nits from the previous review.  I've tested this locally by simulating KeyboardInterrupt errors and it appears to be working as expected.

We'd also need to get the gunicorn timeout logic from Bug 878668 into sync1.5 prod in order to make this a complete solution.
Comment 41 Toby Elliott [:telliott] 2014-03-07 11:30:30 PST
Comment on attachment 8386556 [details] [diff] [review]
sync15-cleanup-query-on-error.diff

Review of attachment 8386556 [details] [diff] [review]:
-----------------------------------------------------------------

::: syncstorage/storage/sql/dbconnect.py
@@ +553,5 @@
> +                # getting the threadid is specific to the PyMySQL driver.
> +                # Other drivers will cause an AttributeError, failing through
> +                # to the "finally" clause at the end of this block.
> +                thread_id = connection.connection.server_thread_id[0]
> +                self.logger.debug("  killing connection %d", thread_id)

It's a little strange to log this before we know that we're actually going to do anything (for non-mySQL implementation). I guess we don't really know until we try, though.
Comment 42 Ryan Kelly [:rfkelly] 2014-03-09 15:27:30 PDT
Non-mysql implementations fail at `thread_id=connection.connection.server_thread_id[0]` and so won't actually hit the log line below.
Comment 43 Tarek Ziadé (:tarek) 2014-03-10 08:16:45 PDT
Comment on attachment 8386556 [details] [diff] [review]
sync15-cleanup-query-on-error.diff

>diff --git a/syncstorage/storage/sql/dbconnect.py b/syncstorage/storage/sql/dbconnect.py
>index 71459ce..98f2ee8 100644
>--- a/syncstorage/storage/sql/dbconnect.py
>+++ b/syncstorage/storage/sql/dbconnect.py
>@@ -19,6 +19,7 @@ This behaviour is off by default; pass shard=True to enable it.
> 
> import os
> import re
>+import sys
> import copy
> import urlparse
> import traceback
>@@ -43,8 +44,15 @@ from syncstorage.storage.sql import (queries_generic,
>                                      queries_mysql)
> 
> 
>+# Regex to match safe dataase field/column names.

s/dataase/database

> SAFE_FIELD_NAME_RE = re.compile("^[a-zA-Z0-9_]+$")
> 
>+# Regex to match specific kinds of query that are safe to kill.
>+# It's a SELECT, INSERT or UPDATE with optional leading comment.
>+SAFE_TO_KILL_QUERY = r"^\s*(/\*.*\*/)?\s*(SELECT|INSERT|UPDATE)\s"

shouldn't we have re.M ? I suspect we can have queries with multiple lines

>+SAFE_TO_KILL_QUERY = re.compile(SAFE_TO_KILL_QUERY, re.I)
>+
>+# The ttl to use for rows that are never supposted to expire.

s/supposted/supposed

> MAX_TTL = 2100000000
> 
> metadata = MetaData()
>@@ -493,7 +501,7 @@ class DBConnection(object):
>             # successfully used as part of this transaction.
>             try:
>                 query_str = self._render_query(query, params, annotations)
>-                return connection.execute(sqltext(query_str), **params)
>+                return self._exec_with_cleanup(connection, query_str, **params)
>             except DBAPIError, exc:
>                 if not is_retryable_db_error(self._connector.engine, exc):
>                     raise
>@@ -507,7 +515,7 @@ class DBConnection(object):
>                 transaction = connection.begin()
>                 annotations["retry"] = "1"
>                 query_str = self._render_query(query, params, annotations)
>-                return connection.execute(sqltext(query_str), **params)
>+                return self._exec_with_cleanup(connection, query_str, **params)
>         finally:
>             # Now that the underlying connection has been used, remember it
>             # so that all subsequent queries are part of the same transaction.
>@@ -515,6 +523,65 @@ class DBConnection(object):
>                 self._connection = connection
>                 self._transaction = transaction
> 
>+    def _exec_with_cleanup(self, connection, query_str, **params):
>+        """Execution wrapper that kills queries if it is interrupted.
>+
>+        This is a wrapper around connection.execute() that will clean up
>+        any running query if the execution is interrupted by a control-flow
>+        exception such as KeyboardInterrupt or gevent.Timeout.
>+
>+        The cleanup currently works only for the PyMySQL driver.  Other
>+        drivers will still execute fine, they just won't get the cleanup.
>+        """
>+        try:
>+            return connection.execute(sqltext(query_str), **params)
>+        except Exception:
>+            # Normal exceptions are passed straight through.
>+            raise
>+        except BaseException:
>+            # Control-flow exceptions trigger the cleanup logic.
>+            exc, val, tb = sys.exc_info()
>+            self.logger.debug("query was interrupted by %s", val)
>+            # Only cleanup SELECT, INSERT or UPDATE statements.
>+            # There are concerns that rolling back DELETEs is too costly.
>+            if not SAFE_TO_KILL_QUERY.match(query_str):
>+                msg = "  refusing to kill unsafe query: %s"
>+                self.logger.debug(msg, query_str[:100])
>+                raise
>+            try:
>+                # The KILL command is specific to MySQL, and this method of
>+                # getting the threadid is specific to the PyMySQL driver.
>+                # Other drivers will cause an AttributeError, failing through
>+                # to the "finally" clause at the end of this block.
>+                thread_id = connection.connection.server_thread_id[0]
>+                self.logger.debug("  killing connection %d", thread_id)
>+                cleanup_query = "KILL %d" % (thread_id,)
>+                # Use a freshly-created connection so that we don't block
>+                # waiting for something from the pool.  Unfortunately this
>+                # requires use of a private API and raw cursor access.
>+                cleanup_conn = self._connector.engine.pool._create_connection()
>+                try:
>+                    cleanup_cursor = cleanup_conn.connection.cursor()
>+                    try:
>+                        cleanup_cursor.execute(cleanup_query)
>+                        msg = "  successfully killed %d"
>+                        self.logger.debug(msg, thread_id)
>+                    except Exception:
>+                        msg = "  failed to kill %d"

Failed to kill *or* failed to log.. I would put the logger.debug call in the finally statement - or after


>+                        self.logger.exception(msg, thread_id)
>+                        raise
>+                    finally:
>+                        cleanup_cursor.close()
>+                finally:
>+                    cleanup_conn.close()
>+            finally:
>+                try:
>+                    # Don't return this connection to the pool.
>+                    connection.invalidate()
>+                finally:
>+                    # Always re-raise the original error.
>+                    raise exc, val, tb
>+
>     def _render_query(self, query, params, annotations):
>         """Render a query into its final string form, to send to database.
>
Comment 44 Ryan Kelly [:rfkelly] 2014-03-10 19:28:40 PDT
Thanks Tarek!

> shouldn't we have re.M ? I suspect we can have queries with multiple lines

We might have a single query spanning multiple lines, but I don't think we want the re.M semantics in this case - we still want '^' to match only the very start of the query.  We do not send multiple queries in a single string.
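The distinction is easy to check. `re.M` changes where `'^'` can match; since the patch anchors with `.match`, a single multi-line query already matches without it. Using `search` to make the flag's effect visible (the query strings are illustrative):

```python
import re

# The pattern from the patch under review (comment 43).
PATTERN = r"^\s*(/\*.*\*/)?\s*(SELECT|INSERT|UPDATE)\s"
SAFE = re.compile(PATTERN, re.I)
SAFE_M = re.compile(PATTERN, re.I | re.M)

# A single query spanning multiple lines matches fine without re.M,
# because '^' already matches the very start of the string.
multiline = "SELECT id, payload\nFROM bso\nWHERE userid = 42"
assert SAFE.match(multiline)

# With re.M, '^' would also match after every embedded newline, so a
# dangerous statement could slip through if SELECT starts a later line.
sneaky = "DELETE FROM bso;\nSELECT 1"
assert SAFE.search(sneaky) is None
assert SAFE_M.search(sneaky)
```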
Comment 45 Ryan Kelly [:rfkelly] 2014-03-10 19:40:13 PDT
Committed with Tarek's suggestions in https://github.com/mozilla-services/server-syncstorage/commit/7dc7ccb716f7eedb43e799985cdf1107b0472cdb
Comment 46 James Bonacci [:jbonacci] 2014-03-14 11:56:22 PDT
Verified in code plus latest sync-related load tests...
