Closed Bug 823054 Opened 12 years ago Closed 10 years ago

DatabaseError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

Categories

(addons.mozilla.org Graveyard :: Code Quality, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: tarek, Unassigned)

References

Details

(Keywords: perf)

Seen on high load (> 500 RPS) on stage.

DatabaseError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

Stacktrace (most recent call last):

  File "django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "amo/decorators.py", line 160, in wrapper
    return f(*args, **kw)
  File "amo/decorators.py", line 152, in wrapper
    return f(*args, **kw)
  File "amo/decorators.py", line 33, in wrapper
    return func(request, *args, **kw)
  File "mkt/submit/decorators.py", line 37, in wrapper
    return f(request, *args, **kw)
  File "mkt/submit/decorators.py", line 21, in wrapper
    return f(request, *args, **kw)
  File "django/db/transaction.py", line 209, in inner
    return func(*args, **kwargs)
  File "mkt/submit/views.py", line 92, in manifest
    is_packaged=form.is_packaged())
  File "addons/models.py", line 430, in from_upload
    addon = Addon(**dict((k, v) for k, v in data.items() if k in fields))
  File "addons/models.py", line 323, in __init__
    super(Addon, self).__init__(*args, **kw)
  File "amo/models.py", line 226, in __init__
    super(OnChangeMixin, self).__init__(*args, **kw)
  File "django/db/models/base.py", line 355, in __init__
    setattr(self, field.name, rel_obj)
  File "translations/fields.py", line 125, in __set__
    value = self.translation_from_dict(instance, lang, value)
  File "translations/fields.py", line 175, in translation_from_dict
    trans = self.translation_from_string(instance, locale, string)
  File "translations/fields.py", line 158, in translation_from_string
    translation = self.model.new(string, lang)
  File "translations/models.py", line 84, in new
    SET id=LAST_INSERT_ID(id + @@global.auto_increment_increment)""")
  File "django/db/backends/mysql/base.py", line 114, in execute
    return self.cursor.execute(query, args)
  File "MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
What is the value for "innodb_lock_wait_timeout"?

* It looks like raising it a bit could fix the issue.
* We also need to look at the longest requests under high load to see which requests take so long.
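For reference, a minimal sketch of how the variable can be checked and raised, assuming a MySQL build where it is dynamic (older builds need the change in my.cnf plus a restart); the 120-second value is only an illustration, not a tested recommendation:

-- Check the current lock wait timeout (in seconds).
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Raise it globally (requires SUPER); new connections pick up the new value.
SET GLOBAL innodb_lock_wait_timeout = 120;

-- Or raise it for the current session only.
SET SESSION innodb_lock_wait_timeout = 120;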
Severity: normal → critical
Priority: -- → P1
Related: Bug 756242
OK, it seems that this 'magic' autoinc does not scale well above 500 RPS. Sheeri, Kumar, can I get more background on this feature?
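For context, a rough sketch of the pattern the traceback points at in translations/models.py; the table name below is illustrative, not copied from the codebase:

-- Allocate the next translation id from a single sequence row
-- ("translations_seq" is a placeholder name for that table).
UPDATE translations_seq
   SET id = LAST_INSERT_ID(id + @@global.auto_increment_increment);
SELECT LAST_INSERT_ID();

Every translation insert has to update that single sequence row, and the row lock is held until the surrounding transaction commits (the submit view runs inside one, per the traceback). Under load, concurrent requests queue behind that lock, and once a waiter has been blocked for more than innodb_lock_wait_timeout seconds it fails with error 1205.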
(In reply to Tarek Ziadé (:tarek) from comment #1)
> What is the value for "innodb_lock_wait_timeout"?
>
> * It looks like raising it a bit could fix the issue.
> * We also need to look at the longest requests under high load to see which
> requests take so long.

Just to clarify, this is happening on our stage cluster during load testing.

addons1.stage.db.phx1.mozilla.com

mysql> show variables like '%innodb_lock_wait%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+
1 row in set (0.00 sec)

50 seconds seems pretty high already. We can try raising it to see if it helps under high load, but I will have to defer to @sheeri.
Well, that's what load testing on stage is for, right? Let's see if increasing the timeout helps, if that's the right variable to tweak, and we can also see what's taking so long by looking at the slow query log. Grab me on IRC and we can do this.
Looking at the slow query logs (we keep today's and yesterday's, so starting 121219 4:45:08), here's what we have:

[root@addons2.stage.db.phx1 mysql]# grep Query.time mysql-slow.log.* | cut -f2 -d: | cut -f1 -d\. | sort -n | tail
7
9
9
10
11
12
34
35
61
62

So there are some 60-second queries. This is on addons2.stage, one of the slaves. Note that the thing they have most in common is that they examine LOTS of rows (see Rows_examined), as opposed to, say, Lock_time; the queries themselves are taking a while.

Here are the 4 queries that take over 20 seconds:

# User@Host: addons_dev[addons_dev] @ [10.8.70.202]
# Query_time: 35.398121 Lock_time: 0.000036 Rows_sent: 1 Rows_examined: 13729345
SET timestamp=1355985339;
SELECT SUM(`download_counts`.`count`) AS `sum` FROM `download_counts` WHERE `download_counts`.`date` < '2012-12-20';

# Time: 121219 21:35:39
# User@Host: addons_dev[addons_dev] @ [10.8.70.202]
# Query_time: 34.801077 Lock_time: 0.000054 Rows_sent: 24338 Rows_examined: 13738531
use marketplace_altdev_allizom_org;
SET timestamp=1355981739;
SELECT addon_id, AVG(count), SUM(count) FROM download_counts USE KEY (`addon_and_count`) GROUP BY addon_id ORDER BY addon_id;

# Time: 121219 23:46:06
# User@Host: addons_stage[addons_stage] @ [10.8.70.202]
# Query_time: 61.873735 Lock_time: 0.000048 Rows_sent: 921027 Rows_examined: 7666361
use addons_allizom_org;
SET timestamp=1355989566;
SELECT `stats_addons_collections_counts`.`addon_id`, `stats_addons_collections_counts`.`collection_id`, SUM(`stats_addons_collections_counts`.`count`) AS `sum` FROM `stats_addons_collections_counts` GROUP BY `stats_addons_collections_counts`.`addon_id`, `stats_addons_collections_counts`.`collection_id` ORDER BY NULL;

# User@Host: addons_dev[addons_dev] @ [10.8.70.202]
# Query_time: 62.015101 Lock_time: 0.000047 Rows_sent: 919481 Rows_examined: 7644040
use addons_dev_allizom_org_new;
SET timestamp=1355989566;
SELECT `stats_addons_collections_counts`.`addon_id`, `stats_addons_collections_counts`.`collection_id`, SUM(`stats_addons_collections_counts`.`count`) AS `sum` FROM `stats_addons_collections_counts` GROUP BY `stats_addons_collections_counts`.`addon_id`, `stats_addons_collections_counts`.`collection_id` ORDER BY NULL;
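As a starting point for digging in, a sketch of checking the plan for the first offender (the query is copied from the log above; nothing here is a claim about the actual plan on stage):

EXPLAIN SELECT SUM(`download_counts`.`count`) AS `sum`
FROM `download_counts`
WHERE `download_counts`.`date` < '2012-12-20';

Rows_examined in the millions with Rows_sent of 1 usually points at a full scan or a very unselective index, which matches what the log shows.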
All stats queries, ugh. I'm trying to kill these, at least by moving stats off of zamboni and into another project.
Severity: critical → normal
Priority: P1 → --
@wil: all our load test work is blocked by this, because we can't run even small loads without locking MySQL and putting the app in a broken state. How should we express that in terms of priority and severity?
Any time we see USE KEY in a query we might as well kill that query cold. There is no way MySQL can optimize around it. Chances are the key is wrong.
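To make that concrete, here is the download_counts query from the slow log with and without the hint; whether the optimizer actually picks a better index once the hint is dropped would still need an EXPLAIN to confirm:

-- As logged, with the forced hint:
SELECT addon_id, AVG(count), SUM(count)
FROM download_counts USE KEY (`addon_and_count`)
GROUP BY addon_id ORDER BY addon_id;

-- Hint removed, letting the optimizer choose:
SELECT addon_id, AVG(count), SUM(count)
FROM download_counts
GROUP BY addon_id ORDER BY addon_id;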
kumar++ for comment 9. I agree.
Blocks: 825280
Bug 825280 is for fixing slow queries, which sounds like a great idea.
Whiteboard: [waiting on monolith]
Monolith is alive; we can start thinking about this.
Keywords: perf
Priority: -- → P3
Whiteboard: [waiting on monolith]
Meaning, thinking about going through the code and reworking the queries that use "USE KEY" or any other index hints? Or something else?
(In reply to Sheeri Cabral [:sheeri] from comment #13)
> Meaning, thinking about going through the code and reworking the queries
> that use "USE KEY" or any other index hints? Or something else?

Monolith replaces all of those.
It sounds like this is a Monolith thing now, and may be fixed. Is this good to close?
Most of those look like they are AMO queries, and we haven't done any work getting those stats out of AMO and into Monolith. It's in the plans for the future, but the priority was to get Marketplace onto Monolith first. I don't think we should close this until AMO uses Monolith.
No longer blocks: tarako-marketplace
Blocks: 992365
No longer blocks: 992365
Component: General → Code Quality
Product: Marketplace → addons.mozilla.org
Version: 1.0 → unspecified
Thanks for filing this. Due to resource constraints, we are closing bugs which we won't realistically be able to fix. If you have a patch that applies to this bug, please reopen. For more info, see http://micropipes.com/blog/2014/09/24/the-great-add-on-bug-triage/
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Product: addons.mozilla.org → addons.mozilla.org Graveyard