Bug 1616212 Comment 0 Edit History

Original comment by

on 2020-02-18 02:25:49 PST

The logical relation between the `option` table & the `option_collection` table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM.

Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data.

More tech details on the current implementation:
An `option_collection` row can have multiple `option` rows associated with it.
An `option` can be associated to multiple `option_collection` rows.

When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows.
There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits.

We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way.

Revision 1 by

Ionuț Goldan [:igoldan]

on 2020-02-18 02:26:36 PST

The logical relation between the `option` table & the `option_collection` table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM.

Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data.

More tech details on the current implementation:
An `option_collection` row can have multiple `option` rows associated with it.
An `option` can be associated to multiple `option_collection` rows.

When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows.
There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits.

We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way, to enable building elaborate SQL queries in a single go.

Revision 2 by

Ionuț Goldan [:igoldan]

on 2020-02-18 02:27:25 PST

The logical relation between the [option](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L329) table & the [option_collection](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L329) table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM.

Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data.

More tech details on the current implementation:
An `option_collection` row can have multiple `option` rows associated with it.
An `option` can be associated to multiple `option_collection` rows.

When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows.
There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits.

We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way, to enable building elaborate SQL queries in a single go.

Revision 3 by

Ionuț Goldan [:igoldan]

on 2020-02-18 02:27:46 PST

The logical relation between the [option](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L81) table & the [option_collection](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L329) table is many-to-many. In normal circumstances, relations like these leverage an extra table (something like `option_option_collection_table`). But Treeherder's implementation uses a custom hash map; this old & custom implementation doesn't allow developers to build more elaborate SQL queries in a single go, using Django's ORM.

Developers need to build the first part of the query to fetch some data, come back to Python application code & use that data as input to the custom hash implementation to get other missing data **and then** they can build the last part of the SQL query to fetch all the data.

More tech details on the current implementation:
An `option_collection` row can have multiple `option` rows associated with it.
An `option` can be associated to multiple `option_collection` rows.

When a new `option_collection` is registered, 1st a SHA hash is computed based on all the composing *(and sorted)* `option` names. Then, for each of the composing option, a new `option_collection` row is created, using a new unique id, that hash & the current `option`'s id. Basically, if an `option_collection` has 5 `option`s, there will be 5 `option_collection` rows.
There's a custom [OptionCollectionManager](https://github.com/mozilla/treeherder/blob/31a8e2bcee51d272e616cb49ade7451ad0c3aad7/treeherder/model/models.py#L305) just to pick that information up. The manager starts from one of the ids of a particular option_collection, fetches the associated hash, then looks for all `option_collection` rows which share that hash; all these rows point back to each of the composing `options`. This feels very counter-intuitive & without any obvious benefits.

We need to revisit this implementation & refactor it such that we're defining the many-to-many relation in a more standard way, to enable building elaborate SQL queries in a single go.