Closed
Bug 1311185
Opened 8 years ago
Closed 8 years ago
Store push data inside main treeherder db using ORM
Categories
(Tree Management :: Treeherder, defect)
Tree Management
Treeherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: wlach, Assigned: wlach)
References
Details
Attachments
(2 files)
This bug will track storing and using result set (push) data inside the main db.
Comment 1•8 years ago
|
||
Assignee | ||
Comment 2•8 years ago
|
||
Rough deployment plan for this:
1. Add push and commit models and tables to central database, migrate initial batch of data
2. Switch on ingestion of push and commit and data in central db, but don't use it for anything.
3. Switch external endpoints to show old push data.
4. Switch off ingestion of push and commit data into datasource db, remove old code related to it.
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → wlachance
Assignee | ||
Comment 3•8 years ago
|
||
Comment on attachment 8803478 [details] [review]
[treeherder] wlach:1311185 > mozilla:master
Hey, could you take a look at this (ideally sometime in the next couple days, I'm hoping to get the ds migration project wrapped up sooner than later). It is split up into multiple commits to facilitate the migration plan outlined in comment 2, but I think for review purposes it just makes sense to look at it as a unit.
I plan on testing deployment of this on prototype and then stage before trying it on master.
Attachment #8803478 -
Flags: review?(emorley)
Attachment #8803478 -
Flags: feedback?(cdawson)
Comment 4•8 years ago
|
||
(In reply to William Lachance (:wlach) from comment #2)
> Rough deployment plan for this:
>
> 1. Add push and commit models and tables to central database, migrate
> initial batch of data
> 2. Switch on ingestion of push and commit and data in central db, but don't
> use it for anything.
> 3. Switch external endpoints to show old push data.
> 4. Switch off ingestion of push and commit data into datasource db, remove
> old code related to it.
This looks like a good plan. Though I suspect step 3 was a typo and you mean you'll switch the external endpoints to show ORM push data?
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to Cameron Dawson [:camd] from comment #4)
> (In reply to William Lachance (:wlach) from comment #2)
> > Rough deployment plan for this:
> >
> > 1. Add push and commit models and tables to central database, migrate
> > initial batch of data
> > 2. Switch on ingestion of push and commit and data in central db, but don't
> > use it for anything.
> > 3. Switch external endpoints to show old push data.
> > 4. Switch off ingestion of push and commit data into datasource db, remove
> > old code related to it.
>
> This looks like a good plan. Though I suspect step 3 was a typo and you
> mean you'll switch the external endpoints to show ORM push data?
Yes :) Sorry about that, so many moving parts to keep track of here.
Comment 6•8 years ago
|
||
Comment on attachment 8803478 [details] [review]
[treeherder] wlach:1311185 > mozilla:master
Looks good to me, though Cameron's more familiar with the resultset ingestion than I :-)
Attachment #8803478 -
Flags: review?(emorley)
Attachment #8803478 -
Flags: review?(cdawson)
Attachment #8803478 -
Flags: feedback?(cdawson)
Attachment #8803478 -
Flags: feedback+
Comment 7•8 years ago
|
||
I will review this a little later today. I need to get past a few things on the Taskcluster Auth before I switch gears.
Assignee | ||
Comment 8•8 years ago
|
||
So I'm starting to experiment with deploying this on stage, pending review. For the most part it's pretty smooth, but migrating some of the tables is pretty slow because we're adding a new column with foreign key constraints. I'm particularly concerned about the amount of time adding a foreign key constraint on the new (initially NULL) column is taking (which seems to by default involve creating a temporary table, blocking new inserts until it's done).
Pythian: is there any way to just ignore integrity checks when initially adding the foreign key constraint to the newly added column in mysql? There is no way the value of the column would equal anything other than NULL.
Flags: needinfo?(team73)
Comment 9•8 years ago
|
||
I hope this answer your question:
You can use "column name" "data type" DEFAULT NULL
Ex:
`property_name` varchar(256) DEFAULT NULL,
Flags: needinfo?(team73)
Assignee | ||
Comment 10•8 years ago
|
||
(In reply to Pythian Team73 from comment #9)
> I hope this answer your question:
> You can use "column name" "data type" DEFAULT NULL
>
> Ex:
> `property_name` varchar(256) DEFAULT NULL,
Not quite. Let me be more explicit. I'm adding the following column on a large (134 million rows) table:
`push_id` int(11) DEFAULT NULL
Adding this column is fairly fast and does not block inserts. Adding the following constraint, however, is rather slow (mysql wants to create a temporary table to perform the operation), and any inserts are blocked:
CONSTRAINT `performance_datum_push_id_f0fbb265c6a3fd_fk_push_id` FOREIGN KEY (`push_id`) REFERENCES `push` (`id`)
There is no way push_id would be anything other than NULL for now. Given that, is there a way to speed up adding this constraint?
Flags: needinfo?(team73)
Comment 11•8 years ago
|
||
Mysql creates another table for both cases, not just when adding a constraint. It is about foreign key constraints adding additional locking to make sure no changes can happen that would break consistency. Setting foreign_key_checks=0 helps you in this case.
Kindly, if you can share details we can assist you on this.
https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html
Flags: needinfo?(team73)
Assignee | ||
Comment 12•8 years ago
|
||
I think disabling the foreign key checks might be enough here, validating that right now.
Deploying this patch is going to be complex enough that I decided to create an etherpad describing the migration process (based on my experiences deploying to prototype):
https://public.etherpad-mozilla.org/p/bug-1311185-deployment-plan
Will re-validate these steps on stage, before deploying to production.
Updated•8 years ago
|
Attachment #8803478 -
Flags: review?(cdawson) → review+
Assignee | ||
Comment 13•8 years ago
|
||
This doesn't make sense to land all at once on master, I'm going to start by landing the innocuous piece that adds the db models and migration scripts. I think it makes sense to deploy that to production pretty quickly (assuming all goes well on stage), then we can look at rolling out the other pieces incrementally according to the linked plan.
Comment 14•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/7b7471f255198dad50cc17cfc0175648dfbf68a4
Bug 1311185 - Add push and commit objects, migration script
Comment 15•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/aa991be35bdf7cbe6d36add2622328abfd778e2c
Bug 1311185 - Handle duplicate revisions in legacy databases
Comment 16•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/d757fb6bad14b40eec49c7607d5ec90915c8c4a8
Bug 1311185 - Start ingesting ORM pushes / commits
But don't use them for anything yet
Comment 17•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/8698b8fae44e2cf7df753960aad8e1c6ab12ff06
Bug 1311185 - Fix error in alerts when data is migrating
Comment 18•8 years ago
|
||
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/f024daba7b5e6923816ca0fca0270d097ebf72bc
Bug 1311185 - Enforce uniqueness on push id's for performance alerts
https://github.com/mozilla/treeherder/commit/8ccc91ee9524b207414b962146e8dead03cb0b72
Bug 1311185 - Make API's return push id's, not result set id's
For now the resultset and job API endpoints still return result_set_id,
for backwards compatibility.
Comment 19•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/4bba979c04430cc8488f65d8cc8da0a861361a14
Bug 1311185 - Make "push" a required property of every job and perf datum
Correspondingly, result_set_id is now optional. Add a migration script to
change this for legacy datasource data.
Comment 20•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/301218c2f04bb92d369ec3611c35192586cb47f8
Bug 1311185 - Fix error that could happen when updating an existing job
Comment 21•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/982e506f9e210c18284e320a893ab5a54374b410
Bug 1311185 - Fix logviewer
Comment 22•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/75eaa9ad0a5ccdb6087647aefe5b42b3216b4074
Bug 1311185 - Add `result_set_id` return value to job endpoint
This mirrors the value of `push_id`, for backwards compatibility with
tools which were depending on that value.
Comment 23•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/3a6be476fc96a79231729778037efd213653c5c4
Bug 1311185 - Fix ordering of commits when viewing push details
Comment 24•8 years ago
|
||
I'm switching the prototype Heroku instance over to the new treeherder-dev DB, which was created from the `treeherder-prod-2016-11-10-07-05` backup snapshot:
https://github.com/mozilla-platform-ops/devservices-aws/blob/e698ae2a5d7ce7b7a222137d81b20d3b0eb33d19/treeherder/rds.tf#L88
However this snapshot doesn't include the last migration or two (one due to timing of when the backup was taken, the other due their being migrations on master that are not on prod), and when the migrations ran, they errored:
Applying model.0040_push_and_commit_orm_2... OK
Applying perf.0022_alert_summary_push_uniqueness...Traceback (most recent call last):
...
django.db.utils.IntegrityError: (1062, "Duplicate entry '14-1-7187-7188' for key 'performance_alert_summary_repository_id_62c12ee1b497800f_uniq'")
(https://papertrailapp.com/systems/treeherder-prototype/events?focus=733697239926726684)
Did you have to manually fix this up on stage/prod? Or did turning off the unique constraints mean we hid real errors? (It's hard to tell from the etherpad which tables the operations were performed against when the foreign constraints were turned off).
One option is asking fubar to reset treeherder-dev back to a newer prod backup snapshot, once what is on master is on prod (the backups occur once a day).
Flags: needinfo?(wlachance)
Assignee | ||
Comment 25•8 years ago
|
||
(In reply to Ed Morley [:emorley] from comment #24)
>
> Did you have to manually fix this up on stage/prod? Or did turning off the
> unique constraints mean we hid real errors? (It's hard to tell from the
> etherpad which tables the operations were performed against when the foreign
> constraints were turned off).
Oh yes, I remember that problem. Some older result set ids mapped to the same push id (duplicates in the old db) and this polluted the alert summary table as well. I fixed up stage and prod manually.
> One option is asking fubar to reset treeherder-dev back to a newer prod
> backup snapshot, once what is on master is on prod (the backups occur once a
> day).
I think this is the best option.
Flags: needinfo?(wlachance)
Comment 26•8 years ago
|
||
In the meantime I've truncated treeherder.performance_alert_summary (and treeherder.performance_alert) on treeherder-dev RDS, so the migrations can complete and James is unblocked.
As such, there's no real rush to recreate treeherder-dev again immediately, so can wait until some more manual datasource changes have landed :-)
Comment 27•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/5c3344f70890c43948c60eaca6a73fc4f2445d18
Bug 1311185 - Fix creation of performance alerts from UI
Comment 28•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/1c0c32cec0302f1b40549acb5190614609ee2dd1
Bug 1311185 - Turn off remaining result set ingestion and use
Assignee | ||
Comment 29•8 years ago
|
||
Just need to finish last db modifications on production (to disallow null pushes for all data) and then we'll be done. Will do next week.
Comment 30•8 years ago
|
||
Assignee | ||
Comment 31•8 years ago
|
||
Comment on attachment 8810500 [details] [review]
[treeherder] mozilla:1311185-fix-embed > mozilla:master
This fixes the embed view (tested locally and on stage), thanks for noticing! We should write a unit test as a followup so this doesn't happen again.
Attachment #8810500 -
Flags: review?(emorley)
Comment 32•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/716574d9633ce6a4efde988d8fecde176fb5d0e0
Bug 1311185 - Remove more unused result set / datasource code
Updated•8 years ago
|
Attachment #8810500 -
Flags: review?(emorley) → review+
Assignee | ||
Comment 33•8 years ago
|
||
I'm going to call this done. Filed bug 1318469 to add a unit test for the embedding API.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•