Closed Bug 378804 Opened 14 years ago Closed 14 years ago

AMO downloads table needs a new index

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: morgamic, Assigned: oremj)

References

Details

Attachments

(1 file, 1 obsolete file)

The downloads table in the remora database is missing an index on created and count that is causing the weekly and total download count queries to timeout.

We need to update the table to have the correct indexes to reduce query run time.

I suggest an index on (created,counted) since those are the two columns sorted by that are not yet indexed.

Runnin on khan-vm, the output was:
mysql> create index date_counted on downloads(created,counted);
Query OK, 12595841 rows affected (43 min 2.03 sec)
Records: 12595841  Duplicates: 0  Warnings: 0

This was using a dump from 04/25, 8am.  Production should take considerably less time to run given the hardware.

Since we don't want to lock downloads and installs, we need a way to add this index.  Shaver had some ideas about redirecting download traffic to another db temoprarily.  I think we could also have a maint window and run the query while the install logging is temporarily disabled (< 15 min ballpark estimate).

Either way, the sooner we can get this fixed the sooner we can get download counts back and operational.

Index query as stated above is:
create index date_counted on downloads(created,counted);

Thoughts?
Assignee: server-ops → oremj
Blocks: 375278
Thoughts: Extremely good catch, morgamic.

I like the idea to reconnect to a different DB (or maybe only a table "downloads2" or so). Locking the whole download table for 15 minutes isn't so great. Unless this won't break downloading? (But I think it will give you errors on the front end, right?)
(In reply to comment #1)
>(But I think it will give you errors on the front end, right?)
The error would be silent, I believe -- but it could potentially mess up the redirect... would have to look.

Wanted to note that Shaver was concerned about whether or not the indexes would help enough.  It saved about 20 minutes on khan-vm for the weekly/total queries but in retrospect I think khan-vm is not a good test bed for it because the VM starts swapping at a certain point, which is why the weekly and total downloads queries both take over 2 hours. khan is also a shared resource.  :\

In general, though, I can't see myself arguing against putting an index on queried columns in a table w/ 12 million rows... even if it's tricky to add the index post-release.

If we want to be 100% sure the best test is dumping the AMO database on comparable hardware to test the perf increase we get from the index.

The two pain queries are... weekly:
            SELECT
                downloads.addon_id as addon_id,
                COUNT(downloads.addon_id) as seven_day_count
            FROM
                `downloads`
            WHERE
                `created` >= DATE_SUB(NOW(), INTERVAL 7 DAY)
            GROUP BY
                downloads.addon_id
            ORDER BY
                downloads.addon_id

And total:
            SELECT
                downloads.addon_id as addon_id,
                COUNT(downloads.addon_id) as count
            FROM
                `downloads`
            WHERE
                `counted`=0
            GROUP BY
                downloads.addon_id
            ORDER BY
                downloads.addon_id

I agree, while we don't know if it *helps enough*, this is hardly an argument *against* an index.

Without one we'll be well off limits anyway.

If it helped breaking it into smaller chunks, we could run a cron job midnight-ish, counting for a single day. Making the other values would then end up being much cheaper. (But I am unsure if the daily cron job would be any faster)

I also wonder if the (created, counted) index you want to create will be appropriate given we are using created and counted only by themselves in the WHERE clause. Don't we need two indexes (one on each) instead?
(In reply to comment #3)
> I also wonder if the (created, counted) index you want to create will be
> appropriate given we are using created and counted only by themselves in the
> WHERE clause. Don't we need two indexes (one on each) instead?

That's exactly what prompted me to ask about whether we saw the effects we wanted. :)
Fred, yeah, looks like created/counted aren't used in an AND so you're right.  We should create two separate indexes.  It was still helping on the weekly but not the total... mysql will still optimize the first of a multiple column index.

So yeah, my fault -- we should split them up.  Jeremy -- have you had a chance to test this on a box that has enough memory?
(In reply to comment #4)
> That's exactly what prompted me to ask about whether we saw the effects we
> wanted. :)
Dude... most people ask this question when they try to solve problems (including me).  Let's just get this bug fixed.
The index I will create will: create index date_created on downloads (`created`,`addon_id`);

We should work out a time to shut off download inserts.
This should allow use to do a simple table copy (structure only) and then set the table name in config.php so we can redirect inserts to this table while the index is building for approx 1 hour.

After this, we can re-insert these into the downloads table.  Can someone take a look-see at the patch?
Added this -- clouserw pointed out there is a DELETE statement for file deletion in the dev component.
Attachment #263377 - Attachment is obsolete: true
Couple of notes:
* we'd have to push a prod update to merge in this patch (after it's committed)
* oremj would have to copy the downloads structure to another table named 'downloads-tmp'
* oremj would then have to update config.php to have define('DOWNLOADS_TABLE','downloads-tmp');
* we would disable maintenance.php while the index was being created
* as far as I know, the downloads table isn't touched besides the dev component, maintenance.php and the downloads controller
* we could then run the create index statement on the master db without fear of locking downloads
* then it'd be a matter of re-running the collected downloads in downloads-tmp without the id part of the INSERT
We worked on this today and updated the indexes while redirecting downloads to a temp table.  We then reinserted the temp downloads into the main and reran the script.  Counts should be updated, and the script ran in < 1 min.  (woot).
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.