Closed Bug 1148576 Opened 9 years ago Closed 9 years ago

add new field ``tier`` to all ``job`` tables in treeherder stage and production

Categories

(Tree Management :: Treeherder: Infrastructure, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: camd)

References

Details

(Keywords: treeherder)

Please add this field to all the repo databases on stage and production for treeherder.

the new field is:

  `tier` int(10) unsigned DEFAULT 1,

and idx:

  KEY `idx_tier` (`tier`),
Blocks: 1113322
I believe this is the right cmd for each database:

ALTER TABLE `job` 
ADD COLUMN `tier` INT(10) UNSIGNED NULL DEFAULT '1' AFTER `running_eta`;
Sorry to drop by (just been glancing at my email whilst on PTO to check there has been nothing that will block the imminent TBPL EOL), but wanted to say this before we add the field, given it's non-trivial.

Are we sure adding this to the jobs table is the best way to go about this? It seems more on par with having a separate table a la the job_eta and using the signature hash to correlate? Mauro, thoughts? :-) (I don't know if you've both discussed this, but there's nothing in any bugs that I can see)

Opening this bug up since it doesn't need to be confidential, and also adding the treeherder component alias to the CC list, since otherwise people don't get notifications of this bug existing :-)
Group: metrics-private
Keywords: treeherder
OS: Mac OS X → All
Hardware: x86 → All
Mauro and I discussed this in a vidyo conversation.  Part of the decision to go this route was speed and ease of implementation.  This approach means the jobs get set on ingestion, but we can determine the tier of the job via the exclusion profile.  This approach means we don't need yet another join to get the tier of a job on fetch.  And setting it during ingestion via the special exclusion profile meant we didn't need another UI for editing which jobs are tier 1 vs. 2.

I think a better implementation later would be something like what you speak of.  Perhaps even having that value reside in the ``reference_data_signature`` table itself.  The user could set in there what tier the job belongs to via a new UI (or enhanced django admin ui).

So, yeah.  This approach was primarily for expediency to get it done by end of quarter.  Adding the field, as it ends up, is actually pretty easy with the ``run_sql`` mgmt command.
Cool thank you for explanation :-) 

I guess one or my concerns was that if we were doing a simpler, but not our ideal choice of implementation just to get this done by end of quarter, then I think we should defer the goal (like mine was) rather than rush it through just to meet an arbitrary list of goals, given this involves schema changes (if it didn't I'd be more "let's not let perfect be the enemy of good"). I'm starting to really kick myself that I didn't think twice about how overoptimistic (and new feature focused) our goals were for Q1 hehe.

I know we have the run_sql command, but I'd be surprised if this can be run without taking the nodes offline and doing a fallback, hence my "non-trivial" comment earlier.

Anyway, if you feel this is the best implementation for now, I'm happy to continue - I'll leave it to you both to make the call :-)
This solution is in my opinion good enough and covers the requirements.
Ed--  I appreciate your input on this.  You make many excellent points.  And I don't disagree that I wish we could make this a more robust solution.  However there's a fair bit of urgency on getting basic Tier-2 support working at least this well.  So I think we'll need to move forward with it as is.  But thanks again for the input.  :-)
Here's the SQL I executed (am executing) with ``run_sql`` on dev right now.

ALTER TABLE `job` 
ADD COLUMN `tier` INT(10) DEFAULT '1' AFTER `running_eta`,
ADD KEY `idx_tier` (`tier`);
Note that in many circumstances, adding a column and an index is online in MySQL:
http://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html

I would start with treeherder stage and verify that it indeed is an online operation. If you run into trouble we can kill the ALTER TABLE on the server and roll back the change.
Looks like i can just do it with:

python2.7 manage.py run_sql -f migrate.sql

interestingly, even though python 2.7.8 is the default python, I still needed to call this explicitly with ``python2.7`` to get it to work.
I ran the above sql migration on dev, stage and prod.  All appears complete and correct.  Closing as fixed now.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee: server-ops-database → cdawson
Component: Database Operations → Treeherder: Infrastructure
Product: Data & BI Services Team → Tree Management
QA Contact: scabral → laura
Version: other → ---
(In reply to Cameron Dawson [:camd] from comment #9)
> interestingly, even though python 2.7.8 is the default python, I still
> needed to call this explicitly with ``python2.7`` to get it to work.

Python 2.6 is the default Python and we can't change that (eugh RHEL), but in bug 1133138 I requested aliases be added for root and the treeherder user so python would use 2.7. However this doesn't help user accounts, which is presumably where this was run from.
As a short term measure you could add a similar alias to your user .bashrc, or perhaps we could see if there was an automated way to add it to all users both current and future, on treeherder machines (though likely not worth it, given the move to Heroku/..).
Depends on: 1304096
You need to log in before you can comment on or make changes to this bug.