Closed Bug 861224 Opened 11 years ago Closed 11 years ago

Collections queries causing spikes in database traffic for addons

Categories

(addons.mozilla.org Graveyard :: Collections, defect, P2)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED
2013-06-20

People

(Reporter: bjohnson, Assigned: robhudson)

References

Details

(Whiteboard: [qa-])

We're seeing general spikes on the addons database when it comes to gathering some information from the collections tables.

Graph:
http://goo.gl/ZSmvE

With just over 3 million rows in the table, in looking at this, we could speed it up a bit on the database end by creating an index on collections.author_id. Right now author_id is indexed, but as part of a composite index of: author_id and slug, which is a varchar(30) field.

However, it appears when we're getting the spikes, it's literally several hundred of queries like this one being run simultaneously:

SELECT `collections`.`id` FROM `collections` LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) WHERE ("en-us"="en-us" AND (`collections`.`author_id` = X  OR `collections_users`.`user_id` = X ));
happened again. Also noticed this query during the spike. Can you guys by chance identify where this came from?

SELECT `addons`.`id`, `addons`.`created`, `addons`.`modified`, `addons`.`guid`, `addons`.`slug`, `addons`.`app_slug`, `addons`.`name`, `addons`.`defaultlocale`, `addons`.`addontype_id`, `addons`.`status`, `addons`.`higheststatus`, `addons`.`icontype`, `addons`.`homepage`, `addons`.`supportemail`, `addons`.`supporturl`, `addons`.`description`, `addons`.`summary`, `addons`.`developercomments`, `addons`.`eula`, `addons`.`privacypolicy`, `addons`.`the_reason`, `addons`.`the_future`, `addons`.`averagerating`, `addons`.`bayesianrating`, `addons`.`totalreviews`, `addons`.`weeklydownloads`, `addons`.`totaldownloads`, `addons`.`hotness`, `addons`.`average_daily_downloads`, `addons`.`average_daily_users`, `addons`.`sharecount`, `addons`.`last_updated`, `addons`.`ts_slowness`, `addons`.`inactive`, `addons`.`trusted`, `addons`.`viewsource`, `addons`.`publicstats`, `addons`.`prerelease`, `addons`.`adminreview`, `addons`.`admin_review_type`, `addons`.`sitespecific`, `addons`.`externalsoftware`, `addons`.`dev_agreement`, `addons`.`auto_repackage`, `addons`.`outstanding`, `addons`.`nominationmessage`, `addons`.`target_locale`, `addons`.`locale_disambiguation`, `addons`.`wants_contributions`, `addons`.`paypal_id`, `addons`.`charity_id`, `addons`.`suggested_amount`, `addons`.`total_contributions`, `addons`.`annoying`, `addons`.`enable_thankyou`, `addons`.`thankyou_note`, `addons`.`get_satisfaction_company`, `addons`.`get_satisfaction_product`, `addons`.`premium_type`, `addons`.`manifest_url`, `addons`.`app_domain`, `addons`.`current_version`, `addons`.`backup_version`, `addons`.`make_public`, `addons`.`mozilla_contact`, `addons`.`is_packaged` FROM `addons` INNER JOIN `versions` ON (`addons`.`current_version` = `versions`.`id`) INNER JOIN `appsupport` ON (`addons`.`id` = `appsupport`.`addon_id`) WHERE ("de"="de" AND NOT (`addons`.`status` = 11 ) AND (`versions`.`id` IS NOT NULL AND `addons`.`inactive` = 0  AND `addons`.`status` IN (4)) AND `appsupport`.`app_id` = 1  AND `addons`.`id` IN (304, 57674, 8552, 133, 15061, 143, 56194, 588, 15018, 388, 1957, 2008, 367341, 6079, 6637, 52, 146375, 8848, 410606, 4735, 7400, 235806, 5642, 6138, 270728, 5396, 161775, 162124, 113508, 176002, 8289, 58517, 2685, 334573, 118646, 5240, 155815, 375335, 8147, 347597, 404489, 4305, 1200, 10976, 337334, 12546, 422108, 12313, 422058, 390779, 109232, 390771, 421400, 14164, 368953, 278893, 67648, 355823, 3825, 258236, 354772, 146364, 162158, 66584, 90787, 11213, 370506, 309955, 46519, 6485, 4156, 9488, 344, 2962, 9511, 4386, 71410, 2232, 5555, 3131, 2703, 14642, 7484, 13905, 1252, 9322, 8624, 368099, 3190, 2266, 337479, 59191, 2886, 355807, 4513, 2294, 526, 324975, 1579, 5986, 10091, 331634, 181920, 269245, 146348, 314124, 8743, 195789, 2707, 8360, 311926, 337533, 5478, 3002, 1672, 421, 3199, 295894, 306531, 12330, 270822, 4336, 11778, 2050, 182864, 2620, 620, 14120, 349222, 162141, 13848, 991, 361746, 157767, 316906, 142879, 979, 180649, 371987, 58189, 371677, 423804, 337658, 337263, 315874, 364735, 368682, 369147, 324987, 4571, 398980, 335328, 7481, 235796, 393928, 359598, 62) AND `appsupport`.`min` <= 20000100200100  AND `addons`.`id` IN (422058, 337658, 12313, 4156));
Depends on: 861089
Summary: Collections queries causing spikes in database traffic. → Collections queries causing spikes in database traffic for addons
Adding Wil to this bug.

The query Brandon mentioned in comment 1 happened 224 times in the hour between 11 pm and midnight Pacific time last night, with an average query time of 6.02 seconds (total query time 1347 seconds).

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: collections
   partitions: NULL
         type: index
possible_keys: author_id
          key: author_id
      key_len: 98
          ref: NULL
         rows: 3109028
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: collections_users
   partitions: NULL
         type: ref
possible_keys: collection_id
          key: collection_id
      key_len: 4
          ref: addons_mozilla_org.collections.id
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.00 sec)

It's using an index, but the query isn't optimal. Changing the OR to a UNION makes queries MUCH faster:

from
SELECT `collections`.`id` FROM `collections` LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) WHERE ("ja"="ja" AND (`collections`.`author_id` = 3283336  OR `collections_users`.`user_id` = 3283336 ));

to

SELECT `collections`.`id` 
FROM `collections`
 LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) 
WHERE ("ja"="ja" 
AND `collections`.`author_id` = 3283336)
UNION
SELECT `collections`.`id` 
FROM `collections`
 LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) 
WHERE ("ja"="ja" 
AND `collections_users`.`user_id` = 3283336 );

Is that something you can change? I ran each query 3 times in a row and query time goes down from over 5 seconds for the first query with OR to 0.02 seconds for the 2nd query with the UNION.
(database load paged again last night at 11:51 pm Pacific, or 2:51 am my time. These are early warnings for the database falling over)
FWIW the UNION instead of the OR makes the query more than 175 times faster.
For collections the code that seems to produce this query is here:
https://github.com/mozilla/zamboni/blob/master/apps/bandwagon/models.py#L67-L71

What's tricky about converting this to a UNION is this is in the Django model manager and returns a queryset which allows other filters to be applied to it later. The only way I can think to still achieve this is to do the UNION query and then do a separate `ID IN (...)` query to get all the collections as a queryset that Django can pass along to be filtered more later. But I've heard "IN" queries can be slow also if they're large enough.
Wil - that index already exists on collections, why are you adding it again? That will make writes to the collections table even slower. Please revert that patch.
(In reply to Rob Hudson [:robhudson] from comment #5)
> For collections the code that seems to produce this query is here:
> https://github.com/mozilla/zamboni/blob/master/apps/bandwagon/models.py#L67-
> L71
> 
> What's tricky about converting this to a UNION is this is in the Django
> model manager and returns a queryset which allows other filters to be
> applied to it later. The only way I can think to still achieve this is to do
> the UNION query and then do a separate `ID IN (...)` query to get all the
> collections as a queryset that Django can pass along to be filtered more
> later. But I've heard "IN" queries can be slow also if they're large enough.

IN does make things slow if there are a lot of results, and of course we don't know how many results this kind of query *might* have.

The problem here is that the OR with 2 different filters really makes it 2 different queries (because you want to use 2 different indexes). Is there a way to just query the table twice instead of once? There will be more network latency, but running 2 separate queries will be much faster in this case, and then the extra filters could just be applied to both queries?
(In reply to Sheeri Cabral [:sheeri] from comment #7)
> Wil - that index already exists on collections, why are you adding it again?
> That will make writes to the collections table even slower. Please revert
> that patch.

Brandon asked me to add it.  Please coordinate and tell me what you'd like to do.
After we had talked in IRC following comment 1, it seemed making this 2 queries was not able to be done. That was why the recommendation to add the index went up. It will slow down writes by 4 bytes per record, but increase the performance of this specific select. 

Doing 2 separate queries (or the UNION, as that's effectively what it's doing) here is the optimal method. If we can do that, let's do it instead. :)
(I agree with Brandon FWIW)
Ok, I'll leave the index in.
Wil - In comment 8 Rob implies that we can split it into 2 queries or use a UNION. We'd rather go down that path.

The index has not been tested on addons stage, so we don't actually know what kind of a speedup it will bring (the speedup is based on theory, so we know it will speed up if we do that, but not by how much). And splitting it is a much better option - in theory, and as you can see by the speedup (175 times faster), in practice.

Additionally, leaving the index in will create *more* work when a write occurs, so there's a tradeoff. The tradeoff for 2 queries is it's a bit harder to code, but there's no performance tradeoff, like there is for adding an index.

Please take the index out and let's work on a way we can use a UNION or have 2 queries for this.
:robhudson - can we talk about having a UNION or 2 separate queries here?
(In reply to Sheeri Cabral [:sheeri] from comment #14)
> :robhudson - can we talk about having a UNION or 2 separate queries here?

Happy to. This doesn't have an priority on it and I'm focusing on P1 and P2 bugs. Can we get this prioritized?

Using Django's `raw` query support we can probably do the UNION and still return `Collection` objects back, which is what the rest of the code expects. If that doesn't work it sounds like 2 queries with IN would be the 2nd choice.
Assignee: nobody → robhudson.mozbugs
Priority: -- → P3
FWIW it's happening again today. Just logging it here so we get a sense of frequency. Seems like once every 6 weeks or so, just after a FF release perhaps?
Happening right now. :) Killing them.


Machine was pegged at 100% on all cores. Killing them.
work in progress patch I did awhile ago:
https://github.com/robhudson/zamboni/compare/861224-collections-union

I had some trouble with cache-machine and got busy with other prioritized work and never got back to this.
Assignee: robhudson.mozbugs → nobody
Is there any chance we could bump the priority on this? Every time it happens it requires direct intervention from a DBA and completely halts the system otherwise. This is a production system and this load is dangerous.
(In reply to Brandon Johnson [:cyborgshadow] from comment #19)
> Is there any chance we could bump the priority on this? Every time it
> happens it requires direct intervention from a DBA and completely halts the
> system otherwise. This is a production system and this load is dangerous.

Why don't we do the fix from comment 6 as a stop gap?
Wil - comment 6 was not a fix. The index is already there, comment 6 added it again, which makes writes slower and has little affect on reads (but if any, it makes them a tad slower).
Happening again right now.
Bumping up the priority to P2-critical, because if we don't jump on the machine and kill the queries when they show up, the machine would lock up and no queries would be able to run.

We have given suggestions on the fix in comment 2 and comment 8, which can be done as per Rob in comment 15, and a WIP patch in comment 18.
Severity: normal → critical
Priority: P3 → P2
Assignee: nobody → amckay
Target Milestone: --- → 2013-06-20
It's happening again right now.
Assignee: amckay → nobody
Target Milestone: 2013-06-20 → ---
Do we have an ETA on the fix for this?
This is the error I get when I use the new query:

Traceback (most recent call last):
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/Users/rob/git/zamboni/apps/amo/decorators.py", line 32, in wrapper
    return func(request, *args, **kw)
  File "/Users/rob/git/zamboni/apps/bandwagon/views.py", line 357, in ajax_list
    for collection in collections:
  File "/Users/rob/git/zamboni/apps/amo/models.py", line 101, in __iter__
    self._result_cache = list(super(RawQuerySet, self).__iter__())
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 249, in __iter__
    for obj in CacheMachine(sql, iterator):
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 108, in __iter__
    self.cache_objects(to_cache)
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 116, in cache_objects
    invalidator.cache_objects(objects, query_key, query_flush)
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/invalidation.py", line 95, in cache_objects
    for key in map(flush_key, obj._cache_keys()):
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 236, in _cache_keys
    fks = dict((f, getattr(self, f.attname)) for f in self._meta.fields
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 237, in <genexpr>
    if isinstance(f, models.ForeignKey))
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query_utils.py", line 100, in __get__
    instance._state.db).get(),
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 361, in get
    num = len(clone)
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 85, in __len__
    self._result_cache = list(self.iterator())
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 108, in __iter__
    self.cache_objects(to_cache)
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 116, in cache_objects
    invalidator.cache_objects(objects, query_key, query_flush)
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/invalidation.py", line 95, in cache_objects
    for key in map(flush_key, obj._cache_keys()):
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 236, in _cache_keys
    fks = dict((f, getattr(self, f.attname)) for f in self._meta.fields
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 237, in <genexpr>
    if isinstance(f, models.ForeignKey))
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query_utils.py", line 100, in __get__
    instance._state.db).get(),
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 361, in get
    num = len(clone)
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/django/db/models/query.py", line 85, in __len__
    self._result_cache = list(self.iterator())
  File "/Users/rob/.virtualenvs/zamboni/lib/python2.6/site-packages/caching/base.py", line 108, in __iter__
    self.cache_objects(to_cache)
...
RuntimeError: maximum recursion depth exceeded while calling a Python object
And it's happening again. Killed the queries so the machine wouldn't fall over.
Hi Rob,

Can you clarify what query you used in comment 27? I used the query in comment 2 and it returned immediately on a production addons machine, so I'm not sure where the Python object recursion is coming from.

MariaDB [addons_mozilla_org]> SELECT `collections`.`id` 
    -> FROM `collections`
    ->  LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) 
    -> WHERE ("ja"="ja" 
    -> AND `collections`.`author_id` = 3283336)
    -> UNION
    -> SELECT `collections`.`id` 
    -> FROM `collections`
    ->  LEFT OUTER JOIN `collections_users` ON (`collections`.`id` = `collections_users`.`collection_id`) 
    -> WHERE ("ja"="ja" 
    -> AND `collections_users`.`user_id` = 3283336 );
+--------+
| id     |
+--------+
| 100139 |
+--------+
1 row in set (0.00 sec)
(In reply to Sheeri Cabral [:sheeri] from comment #28)
> Hi Rob,
> 
> Can you clarify what query you used in comment 27? I used the query in
> comment 2 and it returned immediately on a production addons machine, so I'm
> not sure where the Python object recursion is coming from.

It's the code I posted in comment 18.
Hrm, I'm not sure I can debug that kind of thing, but the file itself is models.py, and I'm not sure if it's related to this at all:

  File "/Users/rob/git/zamboni/apps/amo/models.py", line 101, in __iter__
    self._result_cache = list(super(RawQuerySet, self).__iter__())

The code *looks* like it should work to me, but I don't really have experience debugging python ORMs. 

The only suggestions I have:

0) trying to remove the backtics (`) because maybe they're causing a parsing issue?

1) Separating into 2 queries instead of 1, so instead of changing 
self.filter(Q(author=user.id) | Q(users=user.id)) 

into a union, run 2 different queries:

self.filter(Q(author=user.id))
self.filter(Q(users=user.id))

Or however that works. (my apologies, I'm *really* a novice when it comes to developing)
https://github.com/mozilla/zamboni/commit/a891d2b
Assignee: nobody → robhudson.mozbugs
Whiteboard: [qa-]
Target Milestone: --- → 2013-06-20
This will get pushed to prod on 6/20.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
w00t!
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.