The logical relation between the `option` table & the `option_collection` table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM. Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data. More tech details on the current implementation: An `option_collection` row can have multiple `option` rows associated with it. An `option` can be associated to multiple `option_collection` rows. When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows. There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits. We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way.
Bug 1616212 Comment 0 Edit History
Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.
The logical relation between the `option` table & the `option_collection` table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM. Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data. More tech details on the current implementation: An `option_collection` row can have multiple `option` rows associated with it. An `option` can be associated to multiple `option_collection` rows. When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows. There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits. We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way, to enable building elaborate SQL queries in a single go.
The logical relation between the [option](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L329) table & the [option_collection](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L329) table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM. Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data. More tech details on the current implementation: An `option_collection` row can have multiple `option` rows associated with it. An `option` can be associated to multiple `option_collection` rows. When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows. There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits. We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way, to enable building elaborate SQL queries in a single go.
The logical relation between the [option](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L81) table & the [option_collection](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L329) table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM. Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data. More tech details on the current implementation: An `option_collection` row can have multiple `option` rows associated with it. An `option` can be associated to multiple `option_collection` rows. When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows. There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits. We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way, to enable building elaborate SQL queries in a single go.