Bug 1278711 (Closed) - Add missing indexes to the jobdetail table
Opened 9 years ago • Closed 9 years ago
Categories: Tree Management :: Treeherder: Data Ingestion, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: camd; Assigned: camd
Attachments
(5 files, 4 obsolete files)
* 47 bytes, text/x-github-pull-request | garndt: review+
* 47 bytes, text/x-github-pull-request | wlach: review+
* 47 bytes, text/x-github-pull-request | wlach: review+
* 47 bytes, text/x-github-pull-request | emorley: review+
* 203.51 KB, image/jpeg
Here's an example of the error on New Relic:
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553a8f-83e1d9fc-2c4a-11e6-b947-b82a72d22a14
treeherder.model.models:MultipleObjectsReturned: get() returned more than one JobDetail -- it returned 2!
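For context, here is a minimal sketch of the race that produces this error, assuming a Django model roughly like treeherder.model.models.JobDetail with job, title, value and url fields (not the project's actual ingestion code):

# Sketch only: two ingestion workers storing the same job detail concurrently.
from treeherder.model.models import JobDetail

def store_detail(job, title, value, url):
    # get_or_create() does a SELECT followed by an INSERT. Without a
    # database-level unique constraint, two workers can both miss on the
    # SELECT and both INSERT, leaving two identical rows.
    detail, created = JobDetail.objects.get_or_create(
        job=job, title=title, value=value, url=url)
    return detail

# The next call for the same detail then finds two rows, and the get()
# inside get_or_create() raises JobDetail.MultipleObjectsReturned -- the
# error in the New Relic trace above.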
Assignee
Updated•9 years ago
Assignee: nobody → cdawson
Comment 1•9 years ago
Assignee
Updated•9 years ago
Attachment #8760982 -
Flags: review?(wlachance)
Comment 2•9 years ago
Comment on attachment 8760982 [details] [review]
[treeherder] mozilla:jobdetail-unique-together > mozilla:master
Should have added this way in the first place, sorry! Thanks for fixing it.
Attachment #8760982 -
Flags: review?(wlachance) → review+
Comment 3•9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/4e377daf5aa8357ccee9ea6d2d841dfe41ee70e3
Bug 1278711 - Add unique together index to JobDetail model/table (#1573)
During job ingestion, we can sometimes get an error of
MultipleObjectsReturned when creating a JobDetail record. Adding
an index for uniqueness will prevent us from getting that while
using the get_or_create model function.
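For readers following along, the change amounts to something like this on the Django side (a sketch with field sizes taken from the table definition quoted later in this bug, not the exact patch):

# Sketch of a unique-together constraint on the JobDetail model.
from django.db import models

class JobDetail(models.Model):
    job = models.ForeignKey('Job', on_delete=models.CASCADE)
    title = models.CharField(max_length=100, null=True)
    value = models.CharField(max_length=512)
    url = models.URLField(max_length=512, null=True)

    class Meta:
        db_table = 'job_detail'
        # Enforce uniqueness in the database so concurrent get_or_create()
        # calls cannot insert duplicate (job, title, value) rows.
        unique_together = ('job', 'title', 'value')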
Assignee
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 4•9 years ago
Comment 5•9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/12567801ce236844cf267337869a6a724b66121a
Revert "Bug 1278711 - Add unique together index to JobDetail model/table (#1573)" (#1577)
This reverts commit 4e377daf5aa8357ccee9ea6d2d841dfe41ee70e3.
Comment 6•9 years ago
Assignee
Comment 7•9 years ago
Comment on attachment 8761357 [details] [review]
[treeherder] mozilla:add-jobdetail-unique-together-idx > mozilla:master
Hey Will-- What do you think of this approach? :)
Attachment #8761357 -
Flags: review?(wlachance)
Comment 8•9 years ago
Comment on attachment 8761357 [details] [review]
[treeherder] mozilla:add-jobdetail-unique-together-idx > mozilla:master
As discussed on irc, I'd like to see:
1. Update the get_or_create statement to put "url" in a defaults dict (lots of examples of this elsewhere in treeherder)
2. Update the comment to make it clear that the behaviour is what we want, rather than just a workaround.
Attachment #8761357 -
Flags: review?(wlachance) → review-
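For reference, the defaults pattern requested in point 1 of the review above looks roughly like this (field names assumed from the earlier discussion, not the actual patch):

from treeherder.model.models import JobDetail

def store_detail(job, title, value, url):
    # Only (job, title, value) identify the row; url is data to set,
    # not part of the lookup, so a differing url no longer causes a
    # second row to be created.
    detail, created = JobDetail.objects.get_or_create(
        job=job,
        title=title,
        value=value,
        defaults={'url': url},
    )
    return detail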
Assignee
Comment 9•9 years ago
I think there is likely an additional problem beyond what we're seeing. We probably already have duplicate rows in the table, and that will cause the creation of this index to fail. I'm running a query to find duplicates on stage now, but it's taking a loooooong time. Keeps timing out... sigh...
Sooo... after googling, I think our solution MAY be to accept the migration as I've got it written. But instead of executing it as-is, we need to run the SQL by hand. I am SURE it'll time out anyway....
But the SQL for the migration will be:
ALTER TABLE `job_detail` MODIFY `title` varchar(100) NULL;
ALTER TABLE `job_detail` ADD CONSTRAINT `job_detail_job_id_f2203261c91c576_uniq` UNIQUE (`job_id`, `title`, `value`);
However, we can still make use of "ALTER IGNORE TABLE", which will delete duplicates for us as it creates the index. This syntax is apparently available in MySQL 5.6, which we're on. It's being taken out of 5.7.4 because it causes Hepatitis. Actually... not sure why...
Will: did you have any better ideas?
Status: RESOLVED → REOPENED
Flags: needinfo?(wlachance)
Resolution: FIXED → ---
Assignee
Comment 10•9 years ago
My last comment may have been unclear. We should replace this line:
ALTER TABLE `job_detail` ADD CONSTRAINT `job_detail_job_id_f2203261c91c576_uniq` UNIQUE (`job_id`, `title`, `value`);
with this:
ALTER IGNORE TABLE `job_detail` ADD CONSTRAINT `job_detail_job_id_f2203261c91c576_uniq` UNIQUE (`job_id`, `title`, `value`);
Comment 11•9 years ago
(In reply to Cameron Dawson [:camd] from comment #9)
> I think there is likely an additional problem to what we're seeing. We
> probably already have duplicate rows in the table. That will cause the
> creation of this index to fail. Currently running a query to find
> duplicates on stage now, but it's taking a loooooong time. Keeps timing
> out... sigh...
It might be faster to create an index on job_id, value, title before running the query to find the duplicates?
Are we not able to add the constraint without eliminating the duplicates first? Does mysql verify the table matches the constraint when you add it?
> Sooo... after googling, I think our solution MAY be to accept the migration
> as I've got it written. But instead of executing it as-is, we need to run
> the sql by hand. I am SURE it'll timeout anyway....
Yeah, definitely this is stuff that should be done outside of a deployment. :)
Flags: needinfo?(wlachance)
Comment 12•9 years ago
If making manual changes, please also update the new stage RDS instance too :-)
(I'd recommend adding SCL3 x 2 + RDS x 3 to your MySQLWorkbench configs - makes it pretty quick to connect to each to make changes :-))
Assignee
Comment 13•9 years ago
Dang, I'm a bit stuck here so far. Need to talk with a DBA. So the ``value`` field is a varchar(512) and I can't create an index on that field even by itself. I still get "Specified key was too long; max key length is 767 bytes"
The table is utf8, and MySQL sizes index entries from the maximum bytes per character (3 bytes for its utf8 charset), so a varchar(512) needs up to 1536 bytes -- well over the 767-byte limit. We'd probably have to truncate the field to 125 or so to fit it comfortably in an index. And that's just the one field.
We CAN avoid the exception just with a ``try:`` block, but that won't prevent us from creating duplicate records.
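For reference, the kind of ``try:`` workaround being ruled out here would look something like this (a sketch only; it hides the error but leaves the duplicate rows in the table):

from treeherder.model.models import JobDetail

def store_detail(job, title, value, url):
    # Rejected approach: swallow the exception instead of preventing
    # duplicates at the database level.
    try:
        detail, created = JobDetail.objects.get_or_create(
            job=job, title=title, value=value, url=url)
    except JobDetail.MultipleObjectsReturned:
        # Fall back to an arbitrary matching row; the duplicates remain.
        detail = JobDetail.objects.filter(
            job=job, title=title, value=value).first()
    return detail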
Assignee
Comment 14•9 years ago
After more googling, it looks like there's a setting for the InnoDB engine that may help us:
http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_large_prefix
Assignee
Comment 15•9 years ago
While this error *does* happen in connection with data ingestion from pulse jobs, it can happen with ingestion via the API as well, so it's not specific to the pulse ingestion work. While important to fix, I'm removing it as a blocker from that bug.
No longer blocks: 1266229
Assignee
Comment 16•9 years ago
Hey Sheeri-- Sorry to pull you in on this, but I'm a bit out of my league here. The TL;DR is that we want to create a unique index on 3 columns in our DB that are varchar(100), varchar(512) and int. However, I'm hitting the error in comment 13. I'm wondering if the use of innodb_large_prefix is viable as a solution to this?
I'm working on this db, fwiw: treeherder-stage-rw-vip.db.scl3.mozilla.com
Table: treeherder_stage.job_detail
Flags: needinfo?(scabral)
Assignee
Comment 17•9 years ago
Sheeri: specifically, would you recommend adding these to our my.cnf?
innodb_large_prefix=on
innodb_file_format=barracuda
innodb_file_per_table=true
Comment 18•9 years ago
innodb_file_per_table is standard.
We can put those other 2 in the my.cnf. They *should* work. The first 2 can also be put in as global values while the server is running.
The file format is for new tables only. I'm not sure if an ALTER TABLE will do the right thing in that case, but that's what stage is for, right?
However, job_detail is 11G. Let me set up some tests on stage.
Flags: needinfo?(scabral)
Comment 19•9 years ago
mysql> show variables like 'innodb_large%';
+---------------------+-------+
| Variable_name | Value |
+---------------------+-------+
| innodb_large_prefix | OFF |
+---------------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'innodb_file%';
+--------------------------+----------+
| Variable_name | Value |
+--------------------------+----------+
| innodb_file_format | Antelope |
| innodb_file_format_check | ON |
| innodb_file_format_max | Antelope |
| innodb_file_per_table | ON |
+--------------------------+----------+
4 rows in set (0.00 sec)
mysql> set global innodb_large_prefix=on;
Query OK, 0 rows affected (0.00 sec)
mysql> set global innodb_file_format=barracuda;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@hostname;
+---------------------------------------+
| @@hostname |
+---------------------------------------+
| treeherder1.stage.db.scl3.mozilla.com |
+---------------------------------------+
1 row in set (0.00 sec)
mysql> show variables like 'innodb_large%'; show variables like 'innodb_file%';
+---------------------+-------+
| Variable_name | Value |
+---------------------+-------+
| innodb_large_prefix | ON |
+---------------------+-------+
1 row in set (0.00 sec)
+--------------------------+-----------+
| Variable_name | Value |
+--------------------------+-----------+
| innodb_file_format | Barracuda |
| innodb_file_format_check | ON |
| innodb_file_format_max | Antelope |
| innodb_file_per_table | ON |
+--------------------------+-----------+
4 rows in set (0.00 sec)
mysql> use treeherder_stage;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table jd like job_detail; insert into jd select * from job_detail limit 100000;
Query OK, 0 rows affected (0.17 sec)
Query OK, 100000 rows affected (2.63 sec)
Records: 100000 Duplicates: 0 Warnings: 0
---- meanwhile, on the shell:
[root@treeherder1.stage.db.scl3 ~]# ls -lh /var/lib/mysql/treeherder_stage/jd.ibd
-rw-rw---- 1 mysql mysql 40M Jun 15 12:33 /var/lib/mysql/treeherder_stage/jd.ibd
---- back in the db:
mysql> ALTER IGNORE TABLE jd ADD CONSTRAINT UNIQUE (`job_id`, `title`, `value`);
ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.
Hrm, not sure how to force this table to be barracuda. Will look into it.
Comment 20•9 years ago
restarted treeherder1.stage with those parameters.
mysql> create table jd (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`title` varchar(100) COLLATE utf8_bin DEFAULT NULL,
`value` varchar(512) COLLATE utf8_bin NOT NULL,
`url` varchar(512) COLLATE utf8_bin DEFAULT NULL,
`job_id` bigint(20) NOT NULL, PRIMARY KEY (`id`),
KEY `job_detail_d697ea38` (`job_id`),
KEY `job_detail_title` (`title`),
FOREIGN KEY (`job_id`) REFERENCES `job` (`id`),
UNIQUE KEY (`job_id`, `title`, `value`)
) engine=innodb default charset=utf8 collate=utf8_bin row_format=Compressed;
That worked - putting row_format=Compressed in forced Barracuda:
mysql> show variables like 'innodb_file_format%';
+--------------------------+-----------+
| Variable_name | Value |
+--------------------------+-----------+
| innodb_file_format | Barracuda |
| innodb_file_format_check | ON |
| innodb_file_format_max | Barracuda |
+--------------------------+-----------+
3 rows in set (0.00 sec)
innodb_file_format_max is a high water mark, so it went from Antelope to Barracuda, because the table "jd" has it now.
Let me test more...
Comment 21•9 years ago
tl;dr - turn the global params on, and both adds and alters that use row_format=compressed do the trick:
mysql> set global innodb_large_prefix=on; set global innodb_file_format=barracuda;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> use treeherder_stage;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table jd ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `title` varchar(100) COLLATE utf8_bin DEFAULT NULL, `value` varchar(512) COLLATE utf8_bin NOT NULL, `url` varchar(512) COLLATE utf8_bin DEFAULT NULL, `job_id` bigint(20) NOT NULL, PRIMARY KEY (`id`), KEY `job_detail_d697ea38` (`job_id`), KEY `job_detail_title` (`title`), FOREIGN KEY (`job_id`) REFERENCES `job` (`id`), UNIQUE KEY (`job_id`, `title`, `value`) ) engine=innodb default charset=utf8 collate=utf8_bin row_format=Compressed;
Query OK, 0 rows affected (0.19 sec)
let's see if ALTER TABLE works:
mysql> drop table jd; create table jd like job_detail; alter table jd add UNIQUE KEY (`job_id`, `title`, `value`), row_format=compressed;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
table jd dropped.
The proof of concept is complete.
Comment 22•9 years ago
Assignee
Comment 23•9 years ago
Comment on attachment 8762951 [details] [review]
[treeherder] mozilla:innodb-large-prefix > mozilla:master
Ed-- Do you know if this is the only place I need to make this change? How do we set this on Heroku/RDS?
Looking at the RDS dashboard, perhaps I need to create a custom "DB Parameter Group"? I fiddled with it, but I don't apparently have access to do that, atm... :)
Attachment #8762951 -
Flags: review?(emorley)
Comment 24•9 years ago
Comment on attachment 8762951 [details] [review]
[treeherder] mozilla:innodb-large-prefix > mozilla:master
The MySQL config file in the repo currently only affects the Vagrant environment. We also need to modify for:
* Travis (see .travis.yml for James' current modification)
* SCL3-{stage,prod}
* RDS {prototype,stage,prod}
For Travis/Vagrant, IMO we should use a shared file (to replace the manual edits in .travis.yml) - we should create a new, mostly empty file for this (the current in-repo file is crufty and dates from the MySQL 5.5 / MyISAM era).
For RDS yeah we'll need to use a parameter group, and Kendall would need to set this up for you in Terraform etc for all three RDS instances.
Before we do this - do we definitely need such a large index? Can we reduce the size of some of the fields (if needs be by truncating edge-case existing data) and/or do we need the index on the 'value' attribute?
ie: I would posit that there should only be one instance per job_id-title combination, and the value shouldn't come into it? In which case the index shouldn't need to cover value, right? (sorry kinda lost track with all the discussion so far, I may be missing something)
Attachment #8762951 -
Flags: review?(emorley)
Comment 25•9 years ago
(In reply to Ed Morley [:emorley] from comment #24)
> ie: I would posit that there should only be one instance per job_id-title
> combination, and the value shouldn't come into it? In which case the index
> shouldn't need to cover value, right? (sorry kinda lost track with all the
> discussion so far, I may be missing something)
Sadly we can have multiple job details with the same title. The most common example I've seen is "artifact uploaded".
In lieu of modifying the database config, we could consider capping the length of "value" at 155 characters (so title + value would only be 255 characters long). Kind of an arbitrary limitation, but probably legit.
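For the record, the arithmetic behind that suggestion, assuming MySQL's utf8 charset (3 bytes per character) and the 767-byte key length limit from comment 13:

title (100 chars) + value (155 chars) = 255 chars
255 chars x 3 bytes/char = 765 bytes, which fits even if title and value had to share a single 767-byte budget; individually, each column (300 bytes and 465 bytes) is comfortably under the limit.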
Comment 26•9 years ago
(In reply to Cameron Dawson [:camd] from comment #15)
> While this error *does* happen in connection with data ingestion with pulse
> jobs, it can happen with ingestion via API as well. So it's orthogonal to
> the pulse ingestion work. While important to fix, I'm removing it as a
> blocker from that bug.
I don't see this occurring at all on prod -- are you sure this isn't only tickled when pulse data ingestion is active?
Some recent stage occurrences:
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-2d42fd6f-41f0-11e6-bd66-b82a72d22a14
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553a8f-c32d44b2-41ee-11e6-bd66-b82a72d22a14
Updated•9 years ago
Flags: needinfo?(cdawson)
Comment 27•9 years ago
Instances where this occurs:
* SCL3-stage
* Heroku-stage
Instances where Pulse data ingestion is enabled:
* SCL3-stage
* Heroku-stage
So I think for now we need to treat this as blocking bug 1266229.
Blocks: 1266229
Assignee
Comment 28•9 years ago
I had thought that perhaps we could just make a unique index on job_id, title, value. And, indeed, after trimming the size of ``value`` to 125, I was able to create an index with all 3. And having that index allowed me to query for duplicates with those 3 fields.
My hope was that any records that were dups in those 3 fields were just redundant copies we could safely delete. But they're not. Well, these might be anomalies:
https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=411f7ae1704d12648fab77210488692d23f42b74&filter-searchStr=e42fd356fb0d7e405385e12f4c4ca5c528ad67de&filter-tier=1&filter-tier=2&filter-tier=3
If you look in the Job Details tab, there are lots of dupes. I'm still hunting for another example that doesn't look like it was a fluke. :)
Flags: needinfo?(cdawson)
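A sketch of the kind of duplicate check being described, expressed with the Django ORM (the hand-run SQL would group on the same three columns):

# Sketch: list (job_id, title, value) triples that appear more than once.
from django.db.models import Count
from treeherder.model.models import JobDetail

dupes = (JobDetail.objects
         .values('job_id', 'title', 'value')
         .annotate(copies=Count('id'))
         .filter(copies__gt=1)
         .order_by('-copies'))

for row in dupes[:20]:
    print(row)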
Comment 29•9 years ago
That seems like a bug. That job is from autophone.
Bob, are the duplicate entries on the job details tab here (eg "artifact uploaded", "Config: s1s2-twitter-remote.ini") expected?
https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=411f7ae1704d12648fab77210488692d23f42b74&filter-searchStr=e42fd356fb0d7e405385e12f4c4ca5c528ad67de&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=23564721
Flags: needinfo?(bob)
Comment 30•9 years ago
Cameron, I'm seeing these exceptions on SCL3 stage:
django.db.utils:DataError: (1406, "Data too long for column 'value' at row 1")
in the log-parser task, during artifacts.py store_job_details():
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/556195-b6d98887-4361-11e6-bd66-b82a72d22a14
Is this due to the change just made?
Comment 31•9 years ago
Ed, no those duplicates aren't expected. I looked at several other jobs during that time frame and several of the cancelled jobs had duplicate entries like the example you gave. I didn't see any examples with jobs which were not cancelled. My first impression would be there is something wrong with the way that I cancel jobs that results in multiple job details being submitted.
Flags: needinfo?(bob)
Assignee
Comment 32•9 years ago
Talked to Ed and Will in our TH meeting and decided this doesn't block transition to Pulse ingestion.
These errors happen when we get the same job from BOTH pulse and api. The API submission on stage was accidentally switched on, which caused these errors.
Nevertheless, we need to fix this issue. So the plan is to:
1. Make some fields shorter so they can fit into a unique index (title, value, job_id)
2. Delete job_detail entries that are in violation of the uniqueness in item 1
3. Create that unique index from item 1
4. If we get a dup with a different url, we will just update the url value for the title/value/job_id entry
5. If users want multiple urls, they need to ensure they submit them with a unique title/value combination
To do this, I'll need to
Updated•9 years ago
Attachment #8761357 -
Attachment is obsolete: true
Updated•9 years ago
Attachment #8761321 -
Attachment is obsolete: true
Updated•9 years ago
Attachment #8760982 -
Attachment is obsolete: true
Updated•9 years ago
Attachment #8761357 -
Attachment is obsolete: false
Updated•9 years ago
Attachment #8762951 -
Attachment is obsolete: true
Assignee
Comment 33•9 years ago
Oops, I never finished my thought from comment 32... :)
My plan is to have a few PRs in stages.
PR 1
====
1. Change field lengths so they can fit into the desired index.
2. Change the get_or_create to an update_or_create
SQL
===
delete duplicate records
create a unique index on title, value, job_id (coming in PR 2)
fake out the migration
PR 2
====
1. Create the migration for Unique index on title, value and job_id
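Roughly what those two PRs amount to, as a sketch (migration and app names here are placeholders, and the exact new field lengths are whatever PR 1 settles on; comment 28 mentions trimming ``value`` to 125):

# PR 1 (sketch): key the write on (job, title, value) and treat url as data.
from treeherder.model.models import JobDetail

def store_detail(job, title, value, url):
    # Resubmitting the same detail with a different url updates the
    # existing row instead of creating a duplicate.
    detail, _ = JobDetail.objects.update_or_create(
        job=job, title=title, value=value,
        defaults={'url': url},
    )
    return detail

# PR 2 (sketch): the unique index, expressed as a Django migration.
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('model', '00xx_previous_migration')]  # placeholder name
    operations = [
        migrations.AlterUniqueTogether(
            name='jobdetail',
            unique_together={('job', 'title', 'value')},
        ),
    ]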
Comment 34•9 years ago
Assignee
Updated•9 years ago
Attachment #8769313 -
Flags: review?(garndt)
Updated•9 years ago
Attachment #8769313 -
Flags: review?(garndt) → review+
Comment 35•9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/4cc00cf784b7ff4deb8cf37acf544f605cebb81e
Bug 1278711 - Set pulse schema jobdetails field lengths higher (#1665)
This sets the field lengths to what they will be in a later PR for
the job_details model. But these are still within the constraints
of the current field lengths for that table.
Comment 36•9 years ago
Assignee
Updated•9 years ago
Attachment #8769377 -
Flags: review?(wlachance)
Assignee
Comment 37•9 years ago
I will likely have to shuttle this migration through prod (maybe stage, too) by hand, since the job_detail table is quite large there.
Assignee
Comment 38•9 years ago
I will remove duplicates on the job_detail table with this sql query I'll run by hand prior to creating the unique index:
delete
from job_detail using job_detail,
job_detail jt1
where job_detail.id < jt1.id
and job_detail.title = jt1.title
and job_detail.value = jt1.value
and job_detail.job_id = jt1.job_id
Comment 39•9 years ago
Assignee
Comment 40•9 years ago
Comment on attachment 8769393 [details] [review]
[treeherder] mozilla:job_detail_unique_together_idx > mozilla:master
This is a follow-on to the other PR in this bug. So please review that one first. This one branches off that one, so please only compare the last commit on this PR.
Attachment #8769393 -
Flags: review?(wlachance)
Comment 41•9 years ago
Comment on attachment 8769377 [details] [review]
[treeherder] mozilla:jobdetail-uniqueness-fix > mozilla:master
I'd like us to consider using Django's _meta API here. Other than that this looks good.
Attachment #8769377 -
Flags: review?(wlachance) → review-
Comment 42•9 years ago
Comment on attachment 8769393 [details] [review]
[treeherder] mozilla:job_detail_unique_together_idx > mozilla:master
One nit, but this basically looks fine.
Attachment #8769393 -
Flags: review?(wlachance) → review+
Assignee
Comment 43•9 years ago
Comment on attachment 8769377 [details] [review]
[treeherder] mozilla:jobdetail-uniqueness-fix > mozilla:master
Yep, that's a great improvement. Done. I'll have to keep ``_meta`` in mind for the future.
Attachment #8769377 -
Flags: review- → review?(wlachance)
Comment 44•9 years ago
(In reply to Cameron Dawson [:camd] from comment #38)
> I will remove duplicates on the job_detail table with this sql query I'll
> run by hand prior to creating the unique index:
>
> delete
> from job_detail using job_detail,
> job_detail jt1
> where job_detail.id < jt1.id
> and job_detail.title = jt1.title
> and job_detail.value = jt1.value
> and job_detail.job_id = jt1.job_id
I'm worried that this is going to be super slow unless you have an index (since my guess is that this is an O(n^2) operation). You may want to create a temporary non-unique index on job_id, title and value before you do this.
Updated•9 years ago
Attachment #8769377 -
Flags: review?(wlachance) → review+
Comment 45•9 years ago
Pushes like
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4244fed90395&selectedJob=24994251
are exhibiting this behaviour. The Task Group shows good results:
https://tools.taskcluster.net/task-group-inspector/#PjaFt0T3TwS2ERtQoYzkPQ//
Assignee
Comment 46•9 years ago
Nick, it turns out this is a similar, but not-quite-the-same, issue as the one I believe you were seeing. I created bug 1291882 for that one. I have fixed the DB and I think it's resolved now. Please let me know if you see it again.
Comment 47•9 years ago
Assignee
Updated•9 years ago
Attachment #8778474 -
Flags: review?(emorley)
Updated•9 years ago
Attachment #8778474 -
Flags: review?(emorley) → review+
Comment 48•9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/0f2fd75019b1cdfacf40a27be9d2db5e7c9a8826
Bug 1278711 - Lower field lengths in job_detail table (#1666)
This is required in order to create a unique index on title,
value and job_id to prevent duplicates. The index will be
created in a later PR.
This also uses update_or_create instead of get_or_create as
this will be the mechanism going forward to prevent duplicates.
Comment 49•9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/13943f6e2113924c83aa1944c7c6fd336fc177b9
Bug 1278711 - Add unique_together index for job_detail (#1667)
Assignee
Comment 50•9 years ago
The field lengths have been changed and the commit for that (0f2fd75019b1cdfacf40a27be9d2db5e7c9a8826) has been pushed to production now. The migration for adding the unique index has already been faked in the migrations table. Pythian is going to begin the command to create that index now.
I will confirm it's been created tomorrow, then we can close this bug.
Comment 51•9 years ago
Clarifying summary / adding an explanation, since the tree closure message is pointing at this bug and there seems to be some confusion from people as to what to expect.
In order to fix occasional race conditions during data ingestion, an index had to be added to one of the larger database tables used by Treeherder. This was tested on stage, but took a while to complete, so for production Pythian suggested using their schema modification tool, which allows adding an index without locking the table.
However this index addition (which is occurring in the dep bug 1293500) appears to have still caused a perf impact (judging from #treeherder IRC scrollback, there aren't logs for #data and I can't see any bug comments), meaning job ingestion started backing up at 01:30 UTC today.
As such, Cameron pushed a branch that temporarily disabled Taskcluster job ingestion and all log parsing, so that the DB index creation could complete faster.
Looking at the CPU graphs I don't think the index creation has finished yet:
https://rpm.newrelic.com/accounts/677903/servers/6106888
Once it has, ingestion can be re-enabled, albeit there is a 64,000 taskcluster job ingestion backlog, and a 20,000 buildbot log parsing backlog.
Summary: Getting JobDetail errors during job ingestion MultipleObjectsReturned → Add missing indexes to the jobdetail table
Comment 52•9 years ago
(In reply to Ed Morley [:emorley] from comment #51)
> pushed a branch that temporarily disabled Taskcluster job ingestion at,
At 04:00 UTC
Comment 53•9 years ago
The index creation has now completed; job ingestion was re-enabled at 09:45 UTC.
In addition, I've temporarily blocked the ActiveData-ETL user agent from making requests to the API, since it was responsible for 75% of total web requests, including ones made to the table being altered (which will be under heavy use now that data ingestion has resumed).
The DB CPU usage is maxed out at 100%, but is otherwise handling the backlog fine.
In the 40 mins since data ingestion resumed, the queues have fallen as follows:
* Taskcluster job ingestion queue: 64,000 -> 0
* Log parsing queues: 22,000 -> 33,000
Note that whilst the log parsing queue has risen, that's because the 64K Taskcluster jobs would have each put at least one log parsing task in the other queue.
The trees can now be reopened, since the log parsing backlog is for successful jobs only (the parsing of logs for failed jobs takes priority), so doesn't affect sheriffing (and will catch up shortly anyway).
Updated•9 years ago
Attachment #8761357 -
Attachment is obsolete: true
Comment 54•9 years ago
Log parser queue for successful jobs' logs is now also down to zero.
Comment 55•9 years ago
(In reply to Ed Morley [:emorley] from comment #53)
> In addition, I've temporarily blocked the ActiveData-ETL user agent from
> making requests to the API, since it was responsible for 75% of total web
> requests, and included ones made to the table being altered (which will now
> be under heavy use now that data ingestion has resumed).
Not the DB load has stabilised, I've pushed to prod again reversing the block.
Comment 56•9 years ago
s/Not/Now/
Comment 57•9 years ago
Assignee
Comment 58•9 years ago
We can finally call this one fixed now. Thanks Ed, Will and everyone for your help and efforts on this issue.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED