
cycle_data is failing on stage/production

Status: REOPENED
Product: Tree Management
Component: Treeherder: Infrastructure
Priority: P1
Severity: normal
Reported: 7 months ago
Modified: 9 days ago

People: (Reporter: emorley, Assigned: emorley)

Tracking: (Depends on: 2 bugs)

Attachments: (2 attachments)
(Assignee)

Description

7 months ago
Breaking out from bug 1284432 comment 35.

`cycle_non_job_data()` is failing during:

used_machine_ids = set(Job.objects.values_list(
            'machine_id', flat=True).distinct())
Machine.objects.exclude(id__in=used_machine_ids).delete()

The SQL being used for this is roughly:

used_machine_ids -> "SELECT DISTINCT `job`.`machine_id` FROM `job`"

Then 

"SELECT `machine`.`id`, `machine`.`name` FROM `machine` WHERE NOT (`machine`.`id` IN (1, 2, 3, 4, 5, ...))"

This doesn't work when there are 2.6 million distinct machine ids used in the jobs table (i.e. it tries to pass a 2.6-million-item tuple).

Instead of flattening the ids into a Python collection, we should pass the queryset directly, e.g. something like:

Machine.objects.exclude(id__in=Job.objects.values('machine_id')).delete()
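
For illustration only, a minimal sketch of the two approaches side by side (the function names here are hypothetical; the models are the existing treeherder.model.models ones):

from treeherder.model.models import Job, Machine


def delete_unused_machines_materialised():
    # Current approach: evaluates the queryset in Python, building a
    # multi-million-element set that then gets inlined into the
    # DELETE's WHERE ... IN (...) clause.
    used_machine_ids = set(Job.objects.values_list(
        'machine_id', flat=True).distinct())
    Machine.objects.exclude(id__in=used_machine_ids).delete()


def delete_unused_machines_subquery():
    # Proposed approach: the unevaluated queryset is rendered as a SQL
    # subquery, so the ids never have to be fetched into Python at all.
    Machine.objects.exclude(
        id__in=Job.objects.values('machine_id')).delete()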
(Assignee)

Comment 1

7 months ago
$ ./manage.py shell
...
>>> from treeherder.model.models import Job, Machine
>>> print Machine.objects.exclude(id__in=Job.objects.values('machine_id')).query
SELECT `machine`.`id`, `machine`.`name` FROM `machine` WHERE NOT (`machine`.`id` IN (SELECT U0.`machine_id` FROM `job` U0))

> EXPLAIN SELECT `machine`.`id`, `machine`.`name` FROM `machine` WHERE NOT (`machine`.`id` IN (SELECT U0.`machine_id` FROM `job` U0))

******************** 1. row *********************
           id: 1
  select_type: PRIMARY
        table: machine
         type: index
possible_keys: 
          key: machine_name_4a8b45973a00cd64_uniq
      key_len: 302
          ref: 
         rows: 5340061
        Extra: Using where; Using index
******************** 2. row *********************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: U0
         type: index_subquery
possible_keys: job_a9374927
          key: job_a9374927
      key_len: 4
          ref: func
         rows: 3
        Extra: Using index
2 rows in set


-> Looks good to me.
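
(Aside: on newer Django versions, 2.1 and later, the same check can be done without dropping to the MySQL shell; a rough sketch:)

from treeherder.model.models import Job, Machine

qs = Machine.objects.exclude(id__in=Job.objects.values('machine_id'))
print(qs.query)      # the SQL Django will generate
print(qs.explain())  # the database's EXPLAIN output (QuerySet.explain(), Django 2.1+)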

Comment 2

7 months ago
Created attachment 8846450 [details] [review]
[treeherder] mozilla:cycle-non-job-data-timeouts > mozilla:master
(Assignee)

Updated

7 months ago
Attachment #8846450 - Flags: review?(wlachance)
(Assignee)

Comment 3

7 months ago
I don't think we need to chunk this query. Even in the worst-case scenario (it hasn't run for a month, it's running on prototype's slower m4.xlarge, bug 1346565 not yet fixed, etc.), it was still manageable on prototype:

19:02:03    DELETE FROM `machine` WHERE NOT (`machine`.`id` IN (SELECT U0.`machine_id` FROM `job` U0))
  1687275 row(s) affected 69.625 sec
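
(For contrast, chunking the delete would look roughly like the sketch below; the helper name and chunk size are illustrative, not what landed.)

from treeherder.model.models import Job, Machine

CHUNK_SIZE = 2000


def delete_unused_machines_in_chunks():
    unused = Machine.objects.exclude(id__in=Job.objects.values('machine_id'))
    while True:
        # Sliced querysets can't be delete()d directly, so fetch a small
        # batch of ids and delete by primary key instead.
        ids = list(unused.values_list('id', flat=True)[:CHUNK_SIZE])
        if not ids:
            break
        Machine.objects.filter(id__in=ids).delete()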
Comment 4

Comment on attachment 8846450 [details] [review]
[treeherder] mozilla:cycle-non-job-data-timeouts > mozilla:master

This is great, thank you!
Attachment #8846450 - Flags: review?(wlachance) → review+

Comment 5

6 months ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/b12defb0cba0c945ed65c86342474f5b6c684cc4
Bug 1346567 - Rewrite inefficient queries in cycle_non_job_data()

Previously the select was evaluated independently of the delete,
causing timeouts when attempting to return 2.6 million machine ids
from the jobs table.

Now the select queryset isn't evaluated on its own, and is instead only
used to generate the subquery in e.g.:
  SELECT `machine`.`id`, `machine`.`name` FROM `machine` WHERE NOT
        (`machine`.`id` IN (SELECT U0.`machine_id` FROM `job` U0));
(Assignee)

Updated

6 months ago
Status: ASSIGNED → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED
(Assignee)

Comment 6

6 months ago
Still getting timeouts:
https://papertrailapp.com/systems/treeherder-prod/events?centered_on_id=778402766182834248&q=program%3A%2Frun.8132
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 7

2 months ago
cycle_data is now failing due to running out of RAM (Error R15), e.g.:
https://papertrailapp.com/systems/treeherder-prod/events?q=program%3Aworker_default.2%20-sample&focus=829935121451872309

Aug 04 01:15:34 treeherder-prod app/worker_default.2: [2017-08-04 00:15:34,572: INFO/Worker-6] Reporting to: https://rpm.newrelic.com/accounts/677903/applications/14179757 
Aug 04 02:11:59 treeherder-prod heroku/worker_default.2: Process running mem=1155M(112.9%) 
Aug 04 02:11:59 treeherder-prod heroku/worker_default.2: Error R14 (Memory quota exceeded) 
Aug 04 02:12:22 treeherder-prod heroku/worker_default.2: Process running mem=1918M(187.3%) 
Aug 04 02:12:22 treeherder-prod heroku/worker_default.2: Error R14 (Memory quota exceeded) 
Aug 04 02:12:43 treeherder-prod heroku/worker_default.2: Process running mem=2270M(221.7%) 
Aug 04 02:12:43 treeherder-prod heroku/worker_default.2: Error R15 (Memory quota vastly exceeded) 
Aug 04 02:12:43 treeherder-prod heroku/worker_default.2: Stopping process with SIGKILL 
Aug 04 02:12:44 treeherder-prod heroku/worker_default.2: State changed from up to crashed 
Aug 04 02:12:44 treeherder-prod heroku/worker_default.2: Process exited with status 137 

In addition to fixing the RAM issue, we should also:
* improve the logging (since there's nothing in the log that actually says this task started, I'm having to read between the lines after noticing cycle_data isn't listed in New Relic's non-web transactions); see the sketch after this list
* consider moving these long-running tasks, which only run a few times a day, to the Heroku scheduler instead (bug 1176492)
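
A rough idea of the kind of start/finish logging meant in the first bullet (the function names and the call shown are illustrative placeholders, not Treeherder's actual task code):

import logging
import time

logger = logging.getLogger(__name__)


def run_cycle_data(chunk_size, sleep_time):
    logger.info("cycle_data starting (chunk_size=%s, sleep_time=%s)",
                chunk_size, sleep_time)
    started = time.time()
    try:
        cycle_non_job_data(chunk_size, sleep_time)  # placeholder for the real work
    except Exception:
        logger.exception("cycle_data failed after %.1fs", time.time() - started)
        raise
    logger.info("cycle_data finished in %.1fs", time.time() - started)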
Summary: Refactor cycle_non_job_data() to prevent 'MySQL server has gone away' → cycle_data is failing on stage/production
(Assignee)

Updated

2 months ago
Depends on: 1176492
(Assignee)

Updated

2 months ago
Depends on: 1387543
(Assignee)

Comment 8

2 months ago
I'm running a one-off cycle_data to try to clear some of the backlog (and to see how bad the RAM usage really is):
`thp run:detached --size=performance-l -- ./manage.py cycle_data --debug --sleep-time 0  --chunk-size 2000`

Logs:
https://papertrailapp.com/systems/treeherder-prod/events?q=program%3A%2Frun.6241
(Assignee)

Comment 9

2 months ago
Created attachment 8893997 [details]
cycle-data log
(Assignee)

Updated

2 months ago
Attachment #8893997 - Attachment description: cycle-data.log → cycle-data log
Attachment #8893997 - Attachment mime type: text/x-log → text/plain
(Assignee)

Updated

9 days ago
Component: Treeherder → Treeherder: Infrastructure