Closed
Bug 861224
Opened 11 years ago
Closed 11 years ago
Collections queries causing spikes in database traffic for addons
Categories
(addons.mozilla.org Graveyard :: Collections, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
2013-06-20
People
(Reporter: bjohnson, Assigned: robhudson)
References
Details
(Whiteboard: [qa-])
We're seeing general spikes on the addons database when it comes to gathering some information from the collections tables. Graph: http://goo.gl/ZSmvE With just over 3 million rows in the table, in looking at this, we could speed it up a bit on the database end by creating an index on collections.author_id. Right now author_id is indexed, but as part of a composite index of: author_id and slug, which is a varchar(30) field. However, it appears when we're getting the spikes, it's literally several hundred of queries like this one being run simultaneously: SELECT `collections`.`id` FROM `collections` LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) WHERE ("en-us"="en-us" AND (`collections`.`author_id` = X OR `collections_users`.`user_id` = X ));
Reporter | ||
Comment 1•11 years ago
|
||
happened again. Also noticed this query during the spike. Can you guys by chance identify where this came from? SELECT `addons`.`id`, `addons`.`created`, `addons`.`modified`, `addons`.`guid`, `addons`.`slug`, `addons`.`app_slug`, `addons`.`name`, `addons`.`defaultlocale`, `addons`.`addontype_id`, `addons`.`status`, `addons`.`higheststatus`, `addons`.`icontype`, `addons`.`homepage`, `addons`.`supportemail`, `addons`.`supporturl`, `addons`.`description`, `addons`.`summary`, `addons`.`developercomments`, `addons`.`eula`, `addons`.`privacypolicy`, `addons`.`the_reason`, `addons`.`the_future`, `addons`.`averagerating`, `addons`.`bayesianrating`, `addons`.`totalreviews`, `addons`.`weeklydownloads`, `addons`.`totaldownloads`, `addons`.`hotness`, `addons`.`average_daily_downloads`, `addons`.`average_daily_users`, `addons`.`sharecount`, `addons`.`last_updated`, `addons`.`ts_slowness`, `addons`.`inactive`, `addons`.`trusted`, `addons`.`viewsource`, `addons`.`publicstats`, `addons`.`prerelease`, `addons`.`adminreview`, `addons`.`admin_review_type`, `addons`.`sitespecific`, `addons`.`externalsoftware`, `addons`.`dev_agreement`, `addons`.`auto_repackage`, `addons`.`outstanding`, `addons`.`nominationmessage`, `addons`.`target_locale`, `addons`.`locale_disambiguation`, `addons`.`wants_contributions`, `addons`.`paypal_id`, `addons`.`charity_id`, `addons`.`suggested_amount`, `addons`.`total_contributions`, `addons`.`annoying`, `addons`.`enable_thankyou`, `addons`.`thankyou_note`, `addons`.`get_satisfaction_company`, `addons`.`get_satisfaction_product`, `addons`.`premium_type`, `addons`.`manifest_url`, `addons`.`app_domain`, `addons`.`current_version`, `addons`.`backup_version`, `addons`.`make_public`, `addons`.`mozilla_contact`, `addons`.`is_packaged` FROM `addons` INNER JOIN `versions` ON (`addons`.`current_version` = `versions`.`id`) INNER JOIN `appsupport` ON (`addons`.`id` = `appsupport`.`addon_id`) WHERE ("de"="de" AND NOT (`addons`.`status` = 11 ) AND (`versions`.`id` IS NOT NULL AND `addons`.`inactive` = 0 AND `addons`.`status` IN (4)) AND `appsupport`.`app_id` = 1 AND `addons`.`id` IN (304, 57674, 8552, 133, 15061, 143, 56194, 588, 15018, 388, 1957, 2008, 367341, 6079, 6637, 52, 146375, 8848, 410606, 4735, 7400, 235806, 5642, 6138, 270728, 5396, 161775, 162124, 113508, 176002, 8289, 58517, 2685, 334573, 118646, 5240, 155815, 375335, 8147, 347597, 404489, 4305, 1200, 10976, 337334, 12546, 422108, 12313, 422058, 390779, 109232, 390771, 421400, 14164, 368953, 278893, 67648, 355823, 3825, 258236, 354772, 146364, 162158, 66584, 90787, 11213, 370506, 309955, 46519, 6485, 4156, 9488, 344, 2962, 9511, 4386, 71410, 2232, 5555, 3131, 2703, 14642, 7484, 13905, 1252, 9322, 8624, 368099, 3190, 2266, 337479, 59191, 2886, 355807, 4513, 2294, 526, 324975, 1579, 5986, 10091, 331634, 181920, 269245, 146348, 314124, 8743, 195789, 2707, 8360, 311926, 337533, 5478, 3002, 1672, 421, 3199, 295894, 306531, 12330, 270822, 4336, 11778, 2050, 182864, 2620, 620, 14120, 349222, 162141, 13848, 991, 361746, 157767, 316906, 142879, 979, 180649, 371987, 58189, 371677, 423804, 337658, 337263, 315874, 364735, 368682, 369147, 324987, 4571, 398980, 335328, 7481, 235796, 393928, 359598, 62) AND `appsupport`.`min` <= 20000100200100 AND `addons`.`id` IN (422058, 337658, 12313, 4156));
Updated•11 years ago
|
Summary: Collections queries causing spikes in database traffic. → Collections queries causing spikes in database traffic for addons
Comment 2•11 years ago
|
||
Adding Wil to this bug. The query Brandon mentioned in comment 1 happened 224 times in the hour between 11 pm and midnight Pacific time last night, with an average query time of 6.02 seconds (total query time 1347 seconds). *************************** 1. row *************************** id: 1 select_type: SIMPLE table: collections partitions: NULL type: index possible_keys: author_id key: author_id key_len: 98 ref: NULL rows: 3109028 Extra: Using index *************************** 2. row *************************** id: 1 select_type: SIMPLE table: collections_users partitions: NULL type: ref possible_keys: collection_id key: collection_id key_len: 4 ref: addons_mozilla_org.collections.id rows: 1 Extra: Using where; Using index 2 rows in set (0.00 sec) It's using an index, but the query isn't optimal. Changing the OR to a UNION makes queries MUCH faster: from SELECT `collections`.`id` FROM `collections` LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) WHERE ("ja"="ja" AND (`collections`.`author_id` = 3283336 OR `collections_users`.`user_id` = 3283336 )); to SELECT `collections`.`id` FROM `collections` LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) WHERE ("ja"="ja" AND `collections`.`author_id` = 3283336) UNION SELECT `collections`.`id` FROM `collections` LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) WHERE ("ja"="ja" AND `collections_users`.`user_id` = 3283336 ); Is that something you can change? I ran each query 3 times in a row and query time goes down from over 5 seconds for the first query with OR to 0.02 seconds for the 2nd query with the UNION.
Comment 3•11 years ago
|
||
(database load paged again last night at 11:51 pm Pacific, or 2:51 am my time. These are early warnings for the database falling over)
Comment 4•11 years ago
|
||
FWIW the UNION instead of the OR makes the query more than 175 times faster.
Assignee | ||
Comment 5•11 years ago
|
||
For collections the code that seems to produce this query is here: https://github.com/mozilla/zamboni/blob/master/apps/bandwagon/models.py#L67-L71 What's tricky about converting this to a UNION is this is in the Django model manager and returns a queryset which allows other filters to be applied to it later. The only way I can think to still achieve this is to do the UNION query and then do a separate `ID IN (...)` query to get all the collections as a queryset that Django can pass along to be filtered more later. But I've heard "IN" queries can be slow also if they're large enough.
Comment 6•11 years ago
|
||
Index is added in https://github.com/mozilla/zamboni/commit/0a8b45ea40c0eafe09b0b13914117cd1eae8cea1 and will go out on Thursday
Comment 7•11 years ago
|
||
Wil - that index already exists on collections, why are you adding it again? That will make writes to the collections table even slower. Please revert that patch.
Comment 8•11 years ago
|
||
(In reply to Rob Hudson [:robhudson] from comment #5) > For collections the code that seems to produce this query is here: > https://github.com/mozilla/zamboni/blob/master/apps/bandwagon/models.py#L67- > L71 > > What's tricky about converting this to a UNION is this is in the Django > model manager and returns a queryset which allows other filters to be > applied to it later. The only way I can think to still achieve this is to do > the UNION query and then do a separate `ID IN (...)` query to get all the > collections as a queryset that Django can pass along to be filtered more > later. But I've heard "IN" queries can be slow also if they're large enough. IN does make things slow if there are a lot of results, and of course we don't know how many results this kind of query *might* have. The problem here is that the OR with 2 different filters really makes it 2 different queries (because you want to use 2 different indexes). Is there a way to just query the table twice instead of once? There will be more network latency, but running 2 separate queries will be much faster in this case, and then the extra filters could just be applied to both queries?
Comment 9•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #7) > Wil - that index already exists on collections, why are you adding it again? > That will make writes to the collections table even slower. Please revert > that patch. Brandon asked me to add it. Please coordinate and tell me what you'd like to do.
Reporter | ||
Comment 10•11 years ago
|
||
After we had talked in IRC following comment 1, it seemed making this 2 queries was not able to be done. That was why the recommendation to add the index went up. It will slow down writes by 4 bytes per record, but increase the performance of this specific select. Doing 2 separate queries (or the UNION, as that's effectively what it's doing) here is the optimal method. If we can do that, let's do it instead. :)
Comment 11•11 years ago
|
||
(I agree with Brandon FWIW)
Comment 12•11 years ago
|
||
Ok, I'll leave the index in.
Comment 13•11 years ago
|
||
Wil - In comment 8 Rob implies that we can split it into 2 queries or use a UNION. We'd rather go down that path. The index has not been tested on addons stage, so we don't actually know what kind of a speedup it will bring (the speedup is based on theory, so we know it will speed up if we do that, but not by how much). And splitting it is a much better option - in theory, and as you can see by the speedup (175 times faster), in practice. Additionally, leaving the index in will create *more* work when a write occurs, so there's a tradeoff. The tradeoff for 2 queries is it's a bit harder to code, but there's no performance tradeoff, like there is for adding an index. Please take the index out and let's work on a way we can use a UNION or have 2 queries for this.
Comment 14•11 years ago
|
||
:robhudson - can we talk about having a UNION or 2 separate queries here?
Assignee | ||
Comment 15•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #14) > :robhudson - can we talk about having a UNION or 2 separate queries here? Happy to. This doesn't have an priority on it and I'm focusing on P1 and P2 bugs. Can we get this prioritized? Using Django's `raw` query support we can probably do the UNION and still return `Collection` objects back, which is what the rest of the code expects. If that doesn't work it sounds like 2 queries with IN would be the 2nd choice.
Updated•11 years ago
|
Assignee: nobody → robhudson.mozbugs
Priority: -- → P3
Comment 16•11 years ago
|
||
FWIW it's happening again today. Just logging it here so we get a sense of frequency. Seems like once every 6 weeks or so, just after a FF release perhaps?
Reporter | ||
Comment 17•11 years ago
|
||
Happening right now. :) Killing them. Machine was pegged at 100% on all cores. Killing them.
Assignee | ||
Comment 18•11 years ago
|
||
work in progress patch I did awhile ago: https://github.com/robhudson/zamboni/compare/861224-collections-union I had some trouble with cache-machine and got busy with other prioritized work and never got back to this.
Assignee: robhudson.mozbugs → nobody
Reporter | ||
Comment 19•11 years ago
|
||
Is there any chance we could bump the priority on this? Every time it happens it requires direct intervention from a DBA and completely halts the system otherwise. This is a production system and this load is dangerous.
Comment 20•11 years ago
|
||
(In reply to Brandon Johnson [:cyborgshadow] from comment #19) > Is there any chance we could bump the priority on this? Every time it > happens it requires direct intervention from a DBA and completely halts the > system otherwise. This is a production system and this load is dangerous. Why don't we do the fix from comment 6 as a stop gap?
Comment 21•11 years ago
|
||
Wil - comment 6 was not a fix. The index is already there, comment 6 added it again, which makes writes slower and has little affect on reads (but if any, it makes them a tad slower).
Comment 22•11 years ago
|
||
Happening again right now.
Comment 23•11 years ago
|
||
Bumping up the priority to P2-critical, because if we don't jump on the machine and kill the queries when they show up, the machine would lock up and no queries would be able to run. We have given suggestions on the fix in comment 2 and comment 8, which can be done as per Rob in comment 15, and a WIP patch in comment 18.
Severity: normal → critical
Priority: P3 → P2
Updated•11 years ago
|
Assignee: nobody → amckay
Target Milestone: --- → 2013-06-20
Comment 24•11 years ago
|
||
It's happening again right now.
Assignee: amckay → nobody
Target Milestone: 2013-06-20 → ---
Comment 25•11 years ago
|
||
Do we have an ETA on the fix for this?
Assignee | ||
Comment 26•11 years ago
|
||
This is the error I get when I use the new query: Traceback (most recent call last): File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/core/handlers/base.py", line 111, in get_response response = callback(request, *callback_args, **callback_kwargs) File "/Users/rob/git/zamboni/apps/amo/decorators.py", line 32, in wrapper return func(request, *args, **kw) File "/Users/rob/git/zamboni/apps/bandwagon/views.py", line 357, in ajax_list for collection in collections: File "/Users/rob/git/zamboni/apps/amo/models.py", line 101, in __iter__ self._result_cache = list(super(RawQuerySet, self).__iter__()) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 249, in __iter__ for obj in CacheMachine(sql, iterator): File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 108, in __iter__ self.cache_objects(to_cache) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 116, in cache_objects invalidator.cache_objects(objects, query_key, query_flush) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/invalidation.py", line 95, in cache_objects for key in map(flush_key, obj._cache_keys()): File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 236, in _cache_keys fks = dict((f, getattr(self, f.attname)) for f in self._meta.fields File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 237, in <genexpr> if isinstance(f, models.ForeignKey)) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query_utils.py", line 100, in __get__ instance._state.db).get(), File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 361, in get num = len(clone) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 85, in __len__ self._result_cache = list(self.iterator()) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 108, in __iter__ self.cache_objects(to_cache) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 116, in cache_objects invalidator.cache_objects(objects, query_key, query_flush) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/invalidation.py", line 95, in cache_objects for key in map(flush_key, obj._cache_keys()): File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 236, in _cache_keys fks = dict((f, getattr(self, f.attname)) for f in self._meta.fields File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 237, in <genexpr> if isinstance(f, models.ForeignKey)) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query_utils.py", line 100, in __get__ instance._state.db).get(), File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 361, in get num = len(clone) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 85, in __len__ self._result_cache = list(self.iterator()) File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 108, in __iter__ self.cache_objects(to_cache) ... RuntimeError: maximum recursion depth exceeded while calling a Python object
Comment 27•11 years ago
|
||
And it's happening again. Killed the queries so the machine wouldn't fall over.
Comment 28•11 years ago
|
||
Hi Rob, Can you clarify what query you used in comment 27? I used the query in comment 2 and it returned immediately on a production addons machine, so I'm not sure where the Python object recursion is coming from. MariaDB [addons_mozilla_org]> SELECT `collections`.`id` -> FROM `collections` -> LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) -> WHERE ("ja"="ja" -> AND `collections`.`author_id` = 3283336) -> UNION -> SELECT `collections`.`id` -> FROM `collections` -> LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) -> WHERE ("ja"="ja" -> AND `collections_users`.`user_id` = 3283336 ); +--------+ | id | +--------+ | 100139 | +--------+ 1 row in set (0.00 sec)
Assignee | ||
Comment 29•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #28) > Hi Rob, > > Can you clarify what query you used in comment 27? I used the query in > comment 2 and it returned immediately on a production addons machine, so I'm > not sure where the Python object recursion is coming from. It's the code I posted in comment 18.
Comment 30•11 years ago
|
||
Hrm, I'm not sure I can debug that kind of thing, but the file itself is models.py, and I'm not sure if it's related to this at all: File "/Users/rob/git/zamboni/apps/amo/models.py", line 101, in __iter__ self._result_cache = list(super(RawQuerySet, self).__iter__()) The code *looks* like it should work to me, but I don't really have experience debugging python ORMs. The only suggestions I have: 0) trying to remove the backtics (`) because maybe they're causing a parsing issue? 1) Separating into 2 queries instead of 1, so instead of changing self.filter(Q(author=user.id) | Q(users=user.id)) into a union, run 2 different queries: self.filter(Q(author=user.id)) self.filter(Q(users=user.id)) Or however that works. (my apologies, I'm *really* a novice when it comes to developing)
Assignee | ||
Comment 31•11 years ago
|
||
https://github.com/mozilla/zamboni/commit/a891d2b
Assignee: nobody → robhudson.mozbugs
Whiteboard: [qa-]
Target Milestone: --- → 2013-06-20
Assignee | ||
Comment 32•11 years ago
|
||
This will get pushed to prod on 6/20.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 33•11 years ago
|
||
w00t!
Updated•8 years ago
|
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•