Closed Bug 1025735 Opened 10 years ago Closed 10 years ago

Timestamp-sorted queries are "using filesort" rather than the index

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rfkelly, Assigned: rfkelly)

References

Details

(Whiteboard: [qa?])

Attachments

(1 file, 1 obsolete file)

On sync-2.stage, when I ask mysql to explain a query from its slow-query log:


MariaDB [weave0]> explain SELECT bso.userid, bso.collection, bso.id, bso.sortindex, bso.modified, bso.payload, bso.payload_size, bso.ttl FROM bso WHERE bso.userid = 960137 AND bso.collection = 10 AND bso.modified > 1400907104000 AND bso.ttl > 1401232339 ORDER BY bso.modified DESC, bso.id DESC;
+------+-------------+-------+------+-----------------------------------------+---------+---------+-------------+------+-----------------------------+
| id   | select_type | table | type | possible_keys                           | key     | key_len | ref         | rows | Extra                       |
+------+-------------+-------+------+-----------------------------------------+---------+---------+-------------+------+-----------------------------+
|    1 | SIMPLE      | bso   | ref  | PRIMARY,bso_ttl_idx,bso_usr_col_mod_idx | PRIMARY | 8       | const,const |    1 | Using where; Using filesort |
+------+-------------+-------+------+-----------------------------------------+---------+---------+-------------+------+-----------------------------+


We really want this query to be using the bso_user_col_mod index, and we really *don't* want it to be "using filesort".  Bleh.

The cause is the secondary "bso.id DESC" in the ORDER BY clause, as shown by:


MariaDB [weave0]> explain SELECT bso.userid, bso.collection, bso.id, bso.sortindex, bso.modified, bso.payload, bso.payload_size, bso.ttl  FROM bso  WHERE bso.userid = 960137 AND bso.collection = 10 AND bso.modified > 1400907104000 AND bso.ttl > 1401232339 ORDER BY bso.modified DESC;
+------+-------------+-------+------+-----------------------------------------+---------------------+---------+-------------+------+-------------+
| id   | select_type | table | type | possible_keys                           | key                 | key_len | ref         | rows | Extra       |
+------+-------------+-------+------+-----------------------------------------+---------------------+---------+-------------+------+-------------+
|    1 | SIMPLE      | bso   | ref  | PRIMARY,bso_ttl_idx,bso_usr_col_mod_idx | bso_usr_col_mod_idx | 8       | const,const |    1 | Using where |
+------+-------------+-------+------+-----------------------------------------+---------------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)


That's much nicer.

I suspect this was a side-effect of a recent mysql engine upgrade or switch or something.  AFAICT this *used* to work in our sync1.1 setup, per https://bugzilla.mozilla.org/show_bug.cgi?id=735102#c14

I *think* the right thing to do here is to explicitly add the "id" column on to the index, reproducing what MySQL apparently used to do implicitly:


MariaDB [test]> drop index bso_usr_col_mod_idx on bso;Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [test]> create index bso_usr_col_mod_idx on bso (userid, collection, modified, id);Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [test]> explain SELECT bso.userid, bso.collection, bso.id, bso.sortindex, bso.modified, bso.payload, bso.payload_size, bso.ttl  FROM bso  WHERE bso.userid = 960137 AND bso.collection = 10 AND bso.modified > 1400907104000 AND bso.ttl > 1401232339 ORDER BY bso.modified DESC, id DESC;
+------+-------------+-------+------+-----------------------------------------+---------------------+---------+-------------+------+-------------+
| id   | select_type | table | type | possible_keys                           | key                 | key_len | ref         | rows | Extra       |
+------+-------------+-------+------+-----------------------------------------+---------------------+---------+-------------+------+-------------+
|    1 | SIMPLE      | bso   | ref  | PRIMARY,bso_ttl_idx,bso_usr_col_mod_idx | bso_usr_col_mod_idx | 8       | const,const |    1 | Using where |
+------+-------------+-------+------+-----------------------------------------+---------------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)


We could also just remove "id" from the ORDER BY clause.  It's there to ensure that limit/offset always produces a consistent ordering, but clients are not using offset yet anyway.  I discovered this because I need it for the new offset-driven pagination thing in 1007987 and we would need to keep it for that patch to work correctly.
Patch to include id in app-created table index.

For prod the corresponding patch would be https://github.com/mozilla-services/puppet-config/pull/603
Assignee: nobody → rfkelly
Attachment #8440496 - Flags: review?(telliott)
I have concerns here. You're gaining a nice speedup in the read, but it's at the cost of an additional index, which is going to hurt the write performance and overall disk usage. It may not be much of a win.

I think step 1, at least is to simply remove the sort-on-id and see if that alone obviates the need for internal paging. I also *suspect* that the id being part of the pk means that it'll consistently return values correctly in those corner cases where the timestamp crosses the pagination barrier (or produce inconsistencies infrequently enough that we don't worry about it).

(Also, minor nit: that last query at the end points to id rather than bso.id)
Sorry, s/additional/longer/
Whiteboard: [qa?]
OK.  Here's a version that doesn't use the id for ordering, that we can experiment with.
Attachment #8440998 - Flags: review?(telliott)
Comment on attachment 8440998 [details] [diff] [review]
sync15-no-order-by-id.diff

Review of attachment 8440998 [details] [diff] [review]:
-----------------------------------------------------------------

Worth a shot.
Attachment #8440998 - Flags: review?(telliott) → review+
Specifically, this was rolled out to Sync-1 and Sync-2, but not Sync-3 or Sync-4.
I ran 30min load tests on Sync-1 and Sync-3 and saw no appreciable difference other than disk I/O (as shown in StackDriver.

:rfkelly will continue to look at these results...
Status: NEW → ASSIGNED
I've been running some more write-heavy tests, and see basically the same as :jbonacci reports above.  Sync-1 (which has the extra column) showed around 1M/s more disk IO than Sync-3 (which does not) but it also had an average RPS of 52 versus 47 which might account for some of it.
And unsurprisingly, the overall db size is bigger on sync-1 than sync-3 - by several GB after all our loadtest runs today.
Sync Server 1.5.6 has been rolled out to stage.  The extra column has been removed.
(In reply to Bob Micheletto [:bobm] from comment #10)
> Sync Server 1.5.6 has been rolled out to stage.  The extra column has been
> removed.

Correction: The id index has been removed.
Depends on: 1026346
Comment on attachment 8440496 [details] [diff] [review]
sync15-include-id-in-timestamp-index.diff

Obsoleting the previous patch that we decided not to run with
Attachment #8440496 - Attachment is obsolete: true
Attachment #8440496 - Flags: review?(telliott)
This went out in the 1.5.6 deploy (Bug 1025735) so we'll see what kind of effect it has.  To verify, we should grab a SELECT query or two from the prod slow-query logs, run EXPLAIN on them, and check that they are indeed using the correct index.

I'm going to close this out, we can re-visit the issue of ordering/offset-management when (if?) we get around to rolling out Bug 1007987 or similar.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I see your Resolved this. Excellent.

But since you said this, we have to do it before I can Verify the ticket:
"This went out in the 1.5.6 deploy (Bug 1025735) so we'll see what kind of effect it has.  To verify, we should grab a SELECT query or two from the prod slow-query logs, run EXPLAIN on them, and check that they are indeed using the correct index."

:-)
Yep.  You'll need :bobm for that one :-)
Flags: needinfo?(bobm)
MariaDB [weave0]> EXPLAIN SELECT bso.userid, bso.collection, bso.id, bso.sortindex, bso.modified, bso.payload, bso.payload_size, bso.ttl 
    -> FROM bso 
    -> WHERE bso.userid = XXXXXXX AND bso.collection = 7 AND bso.modified > 1403222235690 AND bso.ttl > 1403297438 ORDER BY bso.sortindex DESC 
    ->  LIMIT 5001;
+------+-------------+-------+-------+-----------------------------------------+---------------------+---------+------+------+----------------------------------------------------+
| id   | select_type | table | type  | possible_keys                           | key                 | key_len | ref  | rows | Extra                                              |
+------+-------------+-------+-------+-----------------------------------------+---------------------+---------+------+------+----------------------------------------------------+
|    1 | SIMPLE      | bso   | range | PRIMARY,bso_ttl_idx,bso_usr_col_mod_idx | bso_usr_col_mod_idx | 16      | NULL |   17 | Using index condition; Using where; Using filesort |
+------+-------------+-------+-------+-----------------------------------------+---------------------+---------+------+------+----------------------------------------------------+
1 row in set (0.01 sec)
Flags: needinfo?(bobm)
> ORDER BY bso.sortindex DESC 

Yep, unfortunately the sortindex ones will still have to use a filesort.  The good news is that it's using the bso_usr_col_mod_idx for selecting the actual rows, rather than doing a full scan of all the user's items.

Bob, are there any "ORDER BY bso.modified" queries in the slow log, and if so can you please also EXPLAIN one of them?  It should *not* be using filesort.
Flags: needinfo?(bobm)
(In reply to Ryan Kelly [:rfkelly] from comment #17)
> Bob, are there any "ORDER BY bso.modified" queries in the slow log, and if
> so can you please also EXPLAIN one of them?  It should *not* be using
> filesort.

MariaDB [weave0]> EXPLAIN SELECT bso.userid, bso.collection, bso.id, bso.sortindex, bso.modified, bso.payload, bso.payload_size, bso.ttl  FROM bso  WHERE bso.userid = XXXXXXX AND bso.collection = 4 AND bso.modified > 1403218499710 AND bso.ttl > 1403545706 ORDER BY bso.modified DESC LIMIT 5001;

+------+-------------+-------+-------+-----------------------------------------+---------------------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys                           | key                 | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+-----------------------------------------+---------------------+---------+------+------+-------------+
|    1 | SIMPLE      | bso   | range | PRIMARY,bso_ttl_idx,bso_usr_col_mod_idx | bso_usr_col_mod_idx | 16      | NULL |  179 | Using where |
+------+-------------+-------+-------+-----------------------------------------+---------------------+---------+------+------+-------------+
1 row in set (0.32 sec)
Flags: needinfo?(bobm)
:thumbsup:
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: