Closed Bug 608739 Opened 15 years ago Closed 14 years ago

[tinder] tbpl is slow, one query sucks, but why?

Categories

(Webtools Graveyard :: Elmo, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Pike, Assigned: peterbe)

References

()

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

We should figure out what's hurting # Time: 101101 8:10:14 # User@Host: l10nuser[l10nuser] @ cm-vpn01.mozilla.org [10.2.72.11] # Query_time: 32.825318 Lock_time: 0.000000 Rows_sent: 89 Rows_examined: 307499 SET timestamp=1288624214; SELECT DISTINCT `mbdb_property`.`id`, `mbdb_property`.`name`, `mbdb_property`.`source`, `mbdb_property`.`value` FROM `mbdb_property` INNER JOIN `mbdb_build_properties` ON (`mbdb_property`.`id` = `mbdb_build_properties`.`property_id`) WHERE (`mbdb_property`.`name` IN ('locale', 'tree', 'slavename') AND `mbdb_build_properties`.`build_id` >= 102540 AND `mbdb_build_properties`.`build_id` <= 102796 ); Call-site is the evaluation of pq, http://hg.mozilla.org/l10n/django-site/file/ca51e0545d5c/tinder/views.py#l97. FTR, the hand-written SQL we have in there is not the bad piece, even though it's close and intermingled. Gandalf, any idea? django-debug-toolbar claims it's doing an intermediate table.
Keywords: perf
wow, what a fascinating case. It seems that mysql optimizer kicks in and does everything in the reverse order. It's pretty complex but the result is that it uses mbdb_property for the first query, and only then it follow ups with a much more complex query on mbdb_build_properties: *************************** 1. row *************************** id: 1 select_type: SIMPLE table: mbdb_property type: range possible_keys: PRIMARY,mbdb_property_name key: mbdb_property_name key_len: 62 ref: NULL rows: 104 filtered: 100.00 Extra: Using where; Using temporary *************************** 2. row *************************** id: 1 select_type: SIMPLE table: mbdb_build_properties type: ref possible_keys: build_id,property_id_refs_id_621d2a84 key: property_id_refs_id_621d2a84 key_len: 4 ref: l10n_site.mbdb_property.id rows: 57 filtered: 100.00 Extra: Using where; Distinct 2 rows in set, 1 warning (0.00 sec) If only could I reverse... ah! I can reverse the order or the queries by bastardizing the syntax against MYSQL will and forcing it onto the compiler by using STRAIGHT_JOIN: SELECT STRAIGHT_JOIN DISTINCT `mbdb_property`.`id`, `mbdb_property`.`name`, `mbdb_property`.`source`, `mbdb_property`.`value` FROM `mbdb_build_properties` INNER JOIN `mbdb_property` ON (`mbdb_property`.`id`=`mbdb_build_properties`.`property_id`) WHERE (`mbdb_property`.`name` in ('locale', 'tree', 'slavename') and `mbdb_build_properties`.`build_id` >= 102540 and `mbdb_build_properties`.`build_id`<= 102796); *************************** 1. row *************************** id: 1 select_type: SIMPLE table: mbdb_build_properties type: range possible_keys: build_id,property_id_refs_id_621d2a84 key: build_id key_len: 4 ref: NULL rows: 6496 filtered: 100.00 Extra: Using where; Using index; Using temporary *************************** 2. row *************************** id: 1 select_type: SIMPLE table: mbdb_property type: eq_ref possible_keys: PRIMARY,mbdb_property_name key: PRIMARY key_len: 4 ref: l10n_site.mbdb_build_properties.property_id rows: 1 filtered: 100.00 Extra: Using where 2 rows in set, 1 warning (0.00 sec) ugly you say? I agree, but... what numbers are we talking about? How much could we save by such a hack? Old way: 89 rows in set (21.97 sec) Hacked way: 89 rows in set (0.05 sec) Yeah... 20000% - MySQL is such an amazing beast. disclaimer: I believe we can try to achieve similar result by working on indexes in order to trigger the compiler to do the right stuff.
STRAIGHT_JOIN replacing INNER JOIN will also do the trick.
Component: Infrastructure → Elmo
Product: Mozilla Localizations → Webtools
QA Contact: infrastructure → elmo
Summary: [dashboard][tinder] tbpl is slow, one query sucks, but why? → [tinder] tbpl is slow, one query sucks, but why?
Version: unspecified → 1.0
Either: 1. override the orm with manual query as per gandalf 2. solve problem (as per mysql bugzilla) by upgrading to mysql 5.5 Others?
Assignee: nobody → peterbe
Priority: -- → P3
3. Instead of trying to get all properties in one go with pq = Property.objects.filter(name__in=props) iterate over props, and do it one by one. Need to investigate if that'd help.
Interestingly enough, the total dataset of the properties for all three is just 107 items long at this point, and returns quickly. The best and correct fix is to not restrict that query to builds apparently. Yac.
PS: Doing just one property is a good deal better (2.5 secs each instead of 26 for all three), but the complete list of properties just takes 0.4 secs. That's remote from Bln talking to sjc.
This does not change any functionality on the surface. The old function uses a large INNER JOIN which causes Mysql to use a temporary table (extremely slow the first time, super fast afterwards) even for MySQL 5.5. My new function introduces 1 more SQL query (3 in total) but all 3 are able to use indexes and Python helps to intersect two sets instead. In the new function ALL queries are able to use indexes and thus very fast and best of all: 1) never slow on the first run and 2) does not become slower when you increase the number of sourcestamps. At 10 source stamps and after the temporary table has been allocated both functions take about 50ms. At 100 source stamps, the original one takes 500-2500ms to complete whereas the new function stays around 50-60ms ALL the time! Tests included in the patch to proof it.
Attachment #533969 - Flags: review?(l10n)
Also removed some excessive imports that aren't used.
Attachment #533969 - Attachment is obsolete: true
Attachment #533981 - Flags: review?(l10n)
Attachment #533969 - Flags: review?(l10n)
Comment on attachment 533981 [details] [diff] [review] Puts the docstring on pmap() back in Review of attachment 533981 [details] [diff] [review]: ----------------------------------------------------------------- I had a hard time reading that patch, and thus I'll be heavy on the comment side. Also, to call out the actual optimizations here: The query over Property.object doesn't restrict itself anymore to the builds in question, but gets all Properties. That's ok for our use-case, as the values for the properties we use are a defined and somewhat small set. If some other code would want to get revision or something, all hell would break loose. The second optimization, AFAICT, is the second query to Properties. The first get's all Properties for the given names, but only the IDs. The second only keeps track of those our builds actually use, and then get the full data for those. It'd still be good if we could go without the assumption for the first one here, but that's more of an academic problem. There's on issue that I saw, and that's there's no more restriction to the actual build ids in the return value, so I've suggested a spot to fix that. I really think this code would be much better off with a Build_Properties non-managed model like we have for other ManyToMany. Hadn't made up that hack back when we wrote this code. But that's for a different bug. r=me with the following comments. ::: apps/tinder/tests.py @@ +255,5 @@ > + props = ('gender', 'age') > + result = pmap(props, [build1.id, build2.id]) > + > + ok_(hasattr(result, 'keys')) > + ok_(hasattr(result, 'values')) I'd rather do a ok_(isinstance(result, dict)) ::: apps/tinder/views.py @@ +70,5 @@ > + """Create a map of build ids to dicts with the requested properties.""" > + if not bld_ids: > + return {} > + > + rv = defaultdict(dict) Move this down to where it's populated? Also, I'd like to get some comments back in to the code, so that I get it next time I read it. I'll add them here as comments where I see fit. # Set up SQL query to get build-property mappings @@ +83,5 @@ > `%(t)s`.`%(p)s` IN (%(ps)s));''' > + > + args['bs_min'] = min(bld_ids) > + args['bs_max'] = max(bld_ids) > + # Get all IDs for properties with the given names. @@ +92,3 @@ > cursor.execute(pattern % args) > + props = defaultdict(list) > + _all_pids = set() comments after the lines? props = defaultdict(list) # property IDs for each build _all_pids = set() # used property IDs, subset of property_ids @@ +95,1 @@ > for bid, pid in cursor.fetchall(): I'd drop all 'bid's that are not in bld_ids.
Attachment #533981 - Flags: review?(l10n) → review+
Landed on origin/develop https://github.com/mozilla/elmo/commit/ef52374d5868c5c2073d90e39297b1a9e1e26159 Thanks Pike for the review!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.2
Product: Webtools → Webtools Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: