Closed Bug 606018 Opened 15 years ago Closed 14 years ago

probably faster queries for username, collection, MAX(modified)

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: Atoll, Unassigned)

Details

(Whiteboard: [python: 624542])

Attachments

(1 file)

> select t1.username, t1.collection, (select t2.modified from wbo87 t2 where t2.username = t1.username and t2.collection = t1.collection order by t2.username desc, t2.collection desc, t2.modified desc limit 1) as modified from wbo87 t1 where username = 1099987 group by username, collection order by null; In english, this query takes the list of collections for a given user from the primary key index, and then for each collection found, it looks up the newest modified for that collection using the modified/usr_col_mod_pay index. The reasoning for this is that MySQL can read a multi-column index either forwards (ASC) or backwards (DESC), and GROUP BY a, b implies ORDER BY a ASC, b ASC. There is a simpler form of the query that has to scan more rows, but returns the same result: > select username, collection, modified from wbo87 where username = 1099987 group by username, collection order by username desc, collection desc, modified desc; This works on the theory that mysql returns the first row found for a given GROUP BY, when a non-aggregate column is returned. MySQL query optimizer isn't smart enough to turn MIN() and MAX() into this sort of thing, so we have to do it the hard way. It is not clear how much faster either of the above queries are. Need to have a stable test platform and a lot of testing to be absolutely certain how much better. But I imagine it'll be significant.
Note that, for a user with a few thousand rows, the subselect version uses an index to speed up the group by and does not require a temporary table, unlike the simpler query. MariaDB [weave0]> explain select t1.username, t1.collection, (select t2.modified from wbo87 t2 where t2.username = t1.username and t2.collection = t1.collection order by t2.username desc, t2.collection desc, t2.modified desc limit 1) as modified from wbo87 t1 where username = 1099987 group by username, collection order by null; +----+--------------------+-------+-------+----------------------------------+----------+---------+-----------+------+---------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+-------+----------------------------------+----------+---------+-----------+------+---------------------------------------+ | 1 | PRIMARY | t1 | range | PRIMARY,modified,usr_col_mod_pay | PRIMARY | 6 | NULL | 1 | Using where; Using index for group-by | | 2 | DEPENDENT SUBQUERY | t2 | ref | PRIMARY,modified,usr_col_mod_pay | modified | 6 | func,func | 35 | Using where; Using index | +----+--------------------+-------+-------+----------------------------------+----------+---------+-----------+------+---------------------------------------+ MariaDB [weave0]> explain select username, collection, modified from wbo87 where username = 1099987 group by username, collection order by username desc, collection desc, modified desc; +----+-------------+-------+------+----------------------------------+----------+---------+-------+------+-------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+----------------------------------+----------+---------+-------+------+-------------------------------------------+ | 1 | SIMPLE | wbo87 | ref | PRIMARY,modified,usr_col_mod_pay | modified | 4 | const | 744 | Using where; Using index; Using temporary | +----+-------------+-------+------+----------------------------------+----------+---------+-------+------+-------------------------------------------+
Toby, is the collection count change is related ? or are we just planning to ditch collection_counts in the next API rev. because the clients don't use it ?
The new query works fine. I've also the new PDO::ATTR_TIMEOUT by adding random TCP delays for the queries during the functional tests, and it works as expected (503s) Shouldn't we add the same timeout in reg-server for LDAP ? LDAP_OPT_NETWORK_TIMEOUT seems to be supported by PHP.
Attachment #484886 - Flags: review?(tarek) → review+
Tarek, Toby, could you please confirm the following: - the new query has been proposed for PHP, but has NOT been applied? - the new query works for Python, and HAS been applied? It's not clear from the comments whether those are true or not. I'm primarily interested in Python but if this is in the PHP code, even better.
Whiteboard: [python: 624542]
Can confirm that this is not in the php server - we're still using $select_stmt = '/*[queryName=get_collection_list_with_all]*/ select collection, max(modified) as timestamp, count(*) as ct from ' . $this->_db_name . ' where username = :username and ttl > :ttl group by collection';
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: