Closed Bug 470550 Opened 17 years ago Closed 16 years ago

Database replication died in the C01 cluster due to overfilling sphinx tables

Categories

(support.mozilla.org :: General, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justdave, Assigned: nkoth)

References

Details

(Whiteboard: sumo_only)

Attachments

(1 file, 1 obsolete file)

Replication on all of the database slaves died this morning with this error: "Error 'The table 'se_words' is full' on query. Default database: 'support_mozilla_org'." Since this is apparently only cache data for sphinx, the problem was resolved by truncating the table on all three slaves (and then on the master). We need to look into how this is set up to prevent it from happening again. As far as I can tell, max_heap_table_size was set the same on the slaves as on the master. However, the table status showed the master having a larger max_data_size on that table than the slaves did, and changing max_heap_table_size on the slaves didn't seem to affect that max_data_size at all.
Assignee: nobody → nelson
Target Milestone: --- → 0.8.1
Blocks: 460213
AFAIK, the max_data_size of a table is set at creation, based on the max_heap_table_size setting at the time. Once a table has been created, changing max_heap_table_size has no effect on it. I could try to remove the dependency on the in-memory (heap) table by using a real table. Any comments on using a real table?
Memory tables get lost any time the server restarts, both on the master and the slaves. Any dependency on that table already existing, regardless of whether replication is in use, should be removed if you expect the app to survive a database server reboot. The table should be created when you need it and dropped as soon as you're finished with it. If it needs to last longer than a minute or two, you're increasing the risk of a restart hosing it.

Leaving it sitting around for quick access to something is, of course, doable, as long as you check that it actually exists first and recreate it if it doesn't, and you only access it on the master database. Slaves (especially when we have more than one behind a load balancer and can rotate them in and out of service to deal with load) are quite likely to crash the replication thread if any update is made to an in-memory table that was created before the slave last restarted.

Likewise, updating other tables based on things you put in an in-memory table (like selecting out of the in-memory table to fill another one) is not recommended in a replicated environment at all, unless you're doing the create it, populate it, do what you needed, then immediately drop it thing. And even that's risky, but it's less of a risk (how likely is it that a server would get rebooted right during the few seconds it takes you to do that?).
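The "check that it actually exists first and recreate it if it doesn't" advice can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MySQL master, and the helper names and the column layout of se_words are hypothetical, since the real schema is not shown in this bug.

```python
import sqlite3

# Hypothetical schema: only the table name se_words comes from this bug.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS se_words (
    word TEXT PRIMARY KEY,
    hits INTEGER NOT NULL DEFAULT 0
)
"""

def ensure_se_words(conn):
    """Create the cache table if a restart (or anything else) removed it."""
    conn.execute(CREATE_SQL)

def record_word(conn, word):
    """Never assume the table survived: make sure it exists before each use."""
    ensure_se_words(conn)
    conn.execute("INSERT OR IGNORE INTO se_words (word) VALUES (?)", (word,))
    conn.execute("UPDATE se_words SET hits = hits + 1 WHERE word = ?", (word,))

def hits_for(conn, word):
    """Return the cached hit count, or 0 if the word (or the table) is gone."""
    ensure_se_words(conn)
    rows = conn.execute(
        "SELECT hits FROM se_words WHERE word = ?", (word,)
    ).fetchall()
    return rows[0][0] if rows else 0
```

Because every access goes through ensure_se_words, a lookup after the table has vanished simply sees an empty cache instead of raising an error. The same idea in MySQL would be a CREATE TABLE IF NOT EXISTS issued on the master before the table is touched.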
Attached file: use a real table for se_words (obsolete)
We should convert se_words to a real table. Let's do this with the 0.8.1 push. Tests on my development server show that the table will be somewhat over 6 MB (not too big), but the large number of updates on the table means that first-time indexing increases from 230 seconds to about 320 seconds. Of that, 150 seconds is attributable to selects that have to be run anyway; 150 seconds is also the time taken for subsequent indexing if no pages have changed. These times are not a big deal for a batch-mode operation run at most once a day.
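For reference, the cost of the switch can be worked out from the figures quoted above. This is a small sketch using only those numbers; the function and constant names are illustrative.

```python
# Timings quoted in the comment above (development server, in seconds).
FIRST_INDEX_MEMORY_TABLE = 230  # first-time indexing with the in-memory table
FIRST_INDEX_REAL_TABLE = 320    # first-time indexing with a real table (approx.)
SELECT_BASELINE = 150           # selects that run even if no pages have changed

def overhead_seconds(before, after):
    """Absolute slowdown when moving from `before` to `after`."""
    return after - before

def overhead_percent(before, after):
    """Relative slowdown as a percentage of the original time."""
    return 100.0 * (after - before) / before

extra = overhead_seconds(FIRST_INDEX_MEMORY_TABLE, FIRST_INDEX_REAL_TABLE)
pct = overhead_percent(FIRST_INDEX_MEMORY_TABLE, FIRST_INDEX_REAL_TABLE)
write_cost_real = FIRST_INDEX_REAL_TABLE - SELECT_BASELINE

print(f"extra time: {extra} s ({pct:.1f}% slower)")
print(f"time beyond the unavoidable selects: {write_cost_real} s")
```

So the real table adds roughly 90 seconds (about 39%) to a first-time index, and a run in which no pages have changed stays at the 150-second baseline.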
Attachment #355606 - Flags: review?(laura)
Attachment #355606 - Attachment is patch: true
Attachment #355606 - Attachment mime type: application/octet-stream → text/plain
Attachment #355606 - Attachment is patch: false
Attachment #355606 - Flags: review?(laura) → review+
Attachment #355606 - Flags: review?(justdave)
Comment on attachment 355606: use a real table for se_words
Will running this kind of drop table followed by create table cause any problems with replication? I have absolutely no idea, so I'm asking you just in case.
Attachment #355606 - Flags: review?(justdave) → review-
Comment on attachment 355606: use a real table for se_words
I would use "DROP TABLE IF EXISTS `se_words`;" because if you try to drop it without the conditional and it doesn't exist, it'll break the same way.
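The difference between the conditional and unconditional drop can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (both accept DROP TABLE IF EXISTS), and the column layout is hypothetical; the attached patch itself is not reproduced here.

```python
import sqlite3

# Hypothetical column layout: only the table name comes from this bug.
MIGRATION = [
    "DROP TABLE IF EXISTS se_words",
    "CREATE TABLE se_words ("
    " word TEXT PRIMARY KEY,"
    " hits INTEGER NOT NULL DEFAULT 0)",
]

def run_migration(conn):
    """Replace any existing se_words table; safe whether or not it exists."""
    for statement in MIGRATION:
        conn.execute(statement)
    conn.commit()

def unconditional_drop_fails(conn):
    """Return True if a plain DROP TABLE errors out because se_words is missing.

    Note: if the table does exist, this drops it and returns False.
    """
    try:
        conn.execute("DROP TABLE se_words")
    except sqlite3.OperationalError:
        return True
    return False
```

With the conditional form, the migration can be run repeatedly, or against a server where the table never existed, without raising an error. The unconditional form fails in exactly that case, which on a replicated setup is the kind of statement error that can stop the replication thread.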
Is this OK, Dave?
Attachment #355606 - Attachment is obsolete: true
Attachment #356203 - Flags: review?(justdave)
Attachment #356203 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 356203: replace any existing se_words table with real table
Yep, looks good to me.
Attachment #356203 - Flags: review?(justdave) → review+
Status: NEW → RESOLVED
Closed: 16 years ago
Keywords: push-needed
Resolution: --- → FIXED
Keywords: push-needed
Whiteboard: sumo_only