Closed Bug 1302869 Opened 8 years ago Closed 7 years ago

Remove redundant indexes

Categories

(Tree Management :: Treeherder, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(1 file)

eg:

> SELECT * FROM sys.schema_redundant_indexes

...
******************** 5. row *********************
              table_schema: treeherder
                table_name: bug_job_map
      redundant_index_name: bug_job_map_d697ea38
   redundant_index_columns: job_id
redundant_index_non_unique: 1
       dominant_index_name: bug_job_map_job_id_700648fc2e943c6b_uniq
    dominant_index_columns: job_id,bug_id
 dominant_index_non_unique: 0
            subpart_exists: 0
            sql_drop_index: ALTER TABLE `treeherder`.`bug_job_map` DROP INDEX `bug_job_map_d697ea38`
******************** 6. row *********************
              table_schema: treeherder
                table_name: build_platform
      redundant_index_name: build_platform_os_name_3e460886b60cf2ce_uniq
   redundant_index_columns: os_name,platform,architecture
redundant_index_non_unique: 0
       dominant_index_name: build_platform_os_name_2b1ee6b83c271ea_uniq
    dominant_index_columns: os_name,platform,architecture
 dominant_index_non_unique: 0
            subpart_exists: 0
            sql_drop_index: ALTER TABLE `treeherder`.`build_platform` DROP INDEX `build_platform_os_name_3e460886b60cf2ce_uniq`
Install the mysql helper scripts using:
1) vagrant destroy -f && vagrant up && vagrant ssh
2) cd ~/
3) git clone https://github.com/mysql/mysql-sys
4) cd mysql-sys
5) mysql -u root -p < ./sys_56.sql

Then against the Vagrant DB:

> SELECT CONCAT(table_schema, '.', table_name) AS 'table', dominant_index_columns AS dominant_index, redundant_index_columns AS redundant_index FROM sys.schema_redundant_indexes

******************** 1. row *********************
          table: treeherder.bug_job_map
 dominant_index: job_id,bug_id
redundant_index: job_id

******************** 2. row *********************
          table: treeherder.build_platform
 dominant_index: os_name,platform,architecture
redundant_index: os_name

******************** 3. row *********************
          table: treeherder.failure_line
 dominant_index: job_log_id,line
redundant_index: job_log_id

******************** 4. row *********************
          table: treeherder.failure_line
 dominant_index: signature,test,created
redundant_index: signature

******************** 5. row *********************
          table: treeherder.failure_match
 dominant_index: failure_line_id,classified_failure_id,matcher_id
redundant_index: failure_line_id

******************** 6. row *********************
          table: treeherder.job_group
 dominant_index: name,symbol
redundant_index: name

******************** 7. row *********************
          table: treeherder.job_log
 dominant_index: job_id,name,url
redundant_index: job_id

******************** 8. row *********************
          table: treeherder.job_type
 dominant_index: name,symbol
redundant_index: name

******************** 9. row *********************
          table: treeherder.machine
 dominant_index: name
redundant_index: name

******************** 10. row *********************
          table: treeherder.machine_platform
 dominant_index: os_name,platform,architecture
redundant_index: os_name

******************** 11. row *********************
          table: treeherder.option_collection
 dominant_index: option_collection_hash,option_id
redundant_index: option_collection_hash

******************** 12. row *********************
          table: treeherder.performance_alert
 dominant_index: summary_id,series_signature_id
redundant_index: summary_id

******************** 13. row *********************
          table: treeherder.performance_datum
 dominant_index: repository_id,job_id,result_set_id,signature_id,push_timestamp
redundant_index: repository_id,job_id

******************** 14. row *********************
          table: treeherder.performance_signature
 dominant_index: repository_id,framework_id,signature_hash
redundant_index: repository_id

******************** 15. row *********************
          table: treeherder.repository
 dominant_index: name
redundant_index: name

******************** 16. row *********************
          table: treeherder.text_log_error
 dominant_index: step_id,line_number
redundant_index: step_id

******************** 17. row *********************
          table: treeherder.text_log_step
 dominant_index: job_id,started_line_number,finished_line_number
redundant_index: job_id
Some are not our fault, eg Django creates redundant indexes for a FlexibleForeignKey even if there is a unique_together:

class TextLogError(models.Model):
    id = BigAutoField(primary_key=True)
    step = FlexibleForeignKey(TextLogStep, related_name='errors')
    line = models.TextField()
    line_number = models.PositiveIntegerField()

    class Meta:
        db_table = "text_log_error"
        unique_together = ('step', 'line_number')


Others are definitely our fault. eg specifying both unique=True and db_index=True.

For the Django at fault ones, we should still file upstream bugs.
(In reply to Ed Morley [:emorley] from comment #2)
> Some are not our fault, eg Django creates redundant indexes

See:
https://groups.google.com/forum/#!topic/django-developers/3ywugkcaxqs
https://code.djangoproject.com/ticket/24082
Assignee: nobody → emorley
Depends on: 1303763
(In reply to Ed Morley [:emorley] from comment #2)
> Some are not our fault, eg Django creates redundant indexes for a FlexibleForeignKey

More specifically:
* Django 1.10 and earlier creates a redundant index for ForeignKeys that overlap with a unique_together (see example in https://emorley.pastebin.mozilla.org/8933048).
* This appears to be fixed in Django master.
* The workaround is to add an explicit db_index=False to the ForeignKey().
* However this workaround causes a suboptimal migration if applied to existing tables, since it not only drops the redundant index, but actually removes the foreign key entirely before adding an identical one. This is bad since dropping an index is an online DDL operation, but adding a constraint is not (causes the table to be locked).

For this I've filed:
https://code.djangoproject.com/ticket/27558

Discussion at:
https://groups.google.com/forum/#!topic/django-developers/3ywugkcaxqs

The takeaway lesson here is that when we're performing migrations on our larger tables we probably should run a `./manage.py model <migration_name>` to show the SQL being run to understand how much of a performance implication it will really have.
A `./manage.py sqlmigrate <model> <migration_name>` even.
(In reply to Ed Morley [:emorley] from comment #4)
> For this I've filed:
> https://code.djangoproject.com/ticket/27558

I've opened some Django PRs to fix that ticket:
https://github.com/django/django/pull/7647
https://github.com/django/django/pull/7648
(In reply to Ed Morley [:emorley] from comment #6)
> I've opened some Django PRs to fix that ticket:

My PRs were actually merged in time to make the Django 1.10.4 release.
Once we're on Django 1.10 we'll have the fix for this :-)

https://docs.djangoproject.com/en/1.10/releases/1.10.4/
Comment on attachment 8825211 [details] [review]
[treeherder] mozilla:rm-redundant-indexes > mozilla:master

This fixes most of the cases. The others will ether be fixed as part of migrations file squashing (bug 1320136; since there is cruft in 0001_inital...) or else by later PRs in this bug.
Attachment #8825211 - Flags: review?(wlachance)
Comment on attachment 8825211 [details] [review]
[treeherder] mozilla:rm-redundant-indexes > mozilla:master

Awesome!
Attachment #8825211 - Flags: review?(wlachance) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/97825a259ba6a8b9852b76f3da0e3c943b7c352d
Bug 1302869 - Remove redundant MySQL indexes

In cases where one index overlaps with the start of another (larger)
composite index, the first index is redundant and can be removed.

Found using the sys.schema_redundant_indexes helper:

          table: treeherder.build_platform
 dominant_index: os_name,platform,architecture
redundant_index: os_name

          table: treeherder.job_group
 dominant_index: name,symbol
redundant_index: name

          table: treeherder.job_type
 dominant_index: name,symbol
redundant_index: name

          table: treeherder.machine_platform
 dominant_index: os_name,platform,architecture
redundant_index: os_name

          table: treeherder.option_collection
 dominant_index: option_collection_hash,option_id
redundant_index: option_collection_hash

          table: treeherder.performance_datum
 dominant_index: repository_id,job_id
redundant_index: repository_id

          table: treeherder.performance_datum
 dominant_index: repository_id,job_id,push_id,signature_id
redundant_index: repository_id,job_id
Depends on: 1320136
After bug 1320136, there's only one redundant index returned by the `sys.schema_redundant_indexes` helper.

> SELECT * FROM sys.schema_redundant_indexes

******************** 1. row *********************
              table_schema: treeherder
                table_name: failure_line
      redundant_index_name: failure_line_signature_idx
   redundant_index_columns: signature
redundant_index_non_unique: 1
       dominant_index_name: failure_line_signature_test_idx
    dominant_index_columns: signature,test,created
 dominant_index_non_unique: 1
            subpart_exists: 1
            sql_drop_index: ALTER TABLE `treeherder`.`failure_line` DROP INDEX `failure_line_signature_idx`
1 rows in set

However the indexes in question are:
signature(50)
signature(25),test(50),created

...so I'm presuming the non-composite index still has use, since it's longer than the `signature` part of the composite index.

I would file an issue against the mysql-sys helpers, but the GitHub repo has issues disabled and so bugs have to be filed against the Oracle mysql.com tracker, which is an utter waste of time given previous experience.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Decided to handle updating dev/stage/prod here rather than piecemeal in the table by table deps of bug 1303763.

Statements generated from:

SELECT table_name, redundant_index_columns, dominant_index_columns, sql_drop_index FROM sys.schema_redundant_indexes WHERE subpart_exists != 1

Run against dev/stage:

ALTER TABLE `treeherder`.`bug_job_map` DROP INDEX `bug_job_map_d697ea38`;
ALTER TABLE `treeherder`.`commit` DROP INDEX `commit_e13f0cea`;
ALTER TABLE `treeherder`.`failure_line` DROP INDEX `failure_line_48d13415`;
ALTER TABLE `treeherder`.`failure_match` DROP INDEX `failure_match_4b7d144b`;
ALTER TABLE `treeherder`.`job_exclusion` DROP INDEX `idx_name`;
ALTER TABLE `treeherder`.`job_log` DROP INDEX `job_log_d697ea38`;
ALTER TABLE `treeherder`.`job` DROP INDEX `job_084f3fc9`;
ALTER TABLE `treeherder`.`machine` DROP INDEX `idx_name`;
ALTER TABLE `treeherder`.`performance_alert_summary` DROP INDEX `performance_alert_summary_81ee1a31`;
ALTER TABLE `treeherder`.`performance_alert` DROP INDEX `performance_alert_548b2096`;
ALTER TABLE `treeherder`.`performance_datum` DROP INDEX `performance_datum_81ee1a31`;
ALTER TABLE `treeherder`.`performance_signature` DROP INDEX `performance_signature_81ee1a31`;
ALTER TABLE `treeherder`.`text_log_error` DROP INDEX `text_log_error_bef491d2`;
ALTER TABLE `treeherder`.`text_log_step` DROP INDEX `text_log_step_d697ea38`;

Will wait a bit before doing the same on prod.
Blocks: 1303763
No longer depends on: 1303763
No change in stage DB perf; have applied against prod too.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: