Closed Bug 1239185 · Opened 9 years ago · Closed 9 years ago

Stop using the reference data model for job ingestion

Categories

(Tree Management :: Treeherder, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

References

Details

Attachments

(3 files)

The reference data model is overengineered compared to just using the Django ORM to look up (or create) the objects we need for job ingestion. I took a stab at starting to remove it, going to post my work to date. I think I'm mostly done.
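For illustration, a minimal sketch of the direction being described here: looking up (or creating) a job's reference objects directly with the Django ORM's get_or_create rather than going through the reference data model. The model and field names below are assumptions for the sake of the example, not the actual patch.

```
# Hedged sketch only -- model/field names are assumed, not taken from the patch.
from django.db import transaction


def get_or_create_job_metadata(job_blob):
    """Look up (or create) the reference rows a job needs, using the ORM directly."""
    from treeherder.model.models import JobGroup, JobType, Machine  # assumed import path

    with transaction.atomic():
        job_group, _ = JobGroup.objects.get_or_create(
            symbol=job_blob["group_symbol"],
            defaults={"name": job_blob.get("group_name", "unknown")},
        )
        job_type, _ = JobType.objects.get_or_create(
            symbol=job_blob["job_symbol"],
            defaults={"name": job_blob.get("job_name", "unknown"),
                      "job_group": job_group},
        )
        machine, _ = Machine.objects.get_or_create(name=job_blob["machine"])
    return job_group, job_type, machine
```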
Blocks: 1231361
Comment on attachment 8707255 [details] [review]
[treeherder] wlach:1239185 > mozilla:master

So this is mostly working, except `test_job_list_filter_fields` with a parameter of `job_group_id` (tests/webapp/api/test_jobs_api.py) doesn't work, because it looks like a stray JobGroup object from `test_ingest_job_with_updated_job_group` (tests/model/derived/test_jobs_model.py) is sticking around, which throws off the id of the ingested job group by 1. At least on my local machine (still waiting for Travis to finish).

Mauro, do you know what's going on? I thought we were supposed to be flushing the db between tests? I tried some techniques from http://pytest-django.readthedocs.org/en/latest/database.html and they didn't seem to help. :(

Aside from that, I think this is pretty close to being able to land. Some nice code deletion action -- after this lands, we'll be pretty close to being able to remove the reference data model altogether.
Attachment #8707255 - Flags: feedback?(mdoglio)
I confirm that the reference data tables should be truncated at the end of each test with a series of TRUNCATE TABLE statements. I spent a couple of hours on this but couldn't work out what the problem is; I'll need to spend more time to track it down.
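For context, a minimal sketch of the two standard pytest-django mechanisms being discussed here (this is not Treeherder's actual conftest.py): the plain django_db marker rolls each test back in a transaction, while transaction=True flushes the tables afterwards, which is what's needed when the code under test commits its own transactions.

```
# Hedged sketch of pytest-django's standard database markers, for reference.
import pytest


@pytest.mark.django_db
def test_rollback_cleanup():
    # runs inside a transaction that pytest-django rolls back afterwards
    ...


@pytest.mark.django_db(transaction=True)
def test_flush_cleanup():
    # TransactionTestCase-style behaviour: tables are flushed (truncated) after
    # the test, so data committed by the code under test is cleaned up
    ...
```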
Comment on attachment 8707255 [details] [review]
[treeherder] wlach:1239185 > mozilla:master

Oops, I forgot to clear the f? flag yesterday.
Attachment #8707255 - Flags: feedback?(mdoglio)
Comment on attachment 8707255 [details] [review]
[treeherder] wlach:1239185 > mozilla:master

Ok, I figured it out! Well, partly. We were incorrectly creating the second job group in the unit test in question (test_ingest_job_with_updated_job_group), which somehow got persisted, so fixing that makes everything pass. I added a check to the unit test to make sure the second job group doesn't get created, so our testing is at least a little more comprehensive now. There's still the question of why things didn't get cleaned up, but that's almost certainly not a new issue.

I've done some basic checking locally to make sure that ingestion still seems to work reasonably -- it does. I suspect the thing to do is to experiment with landing this on stage for a couple of hours tomorrow and see how things go. Does that sound ok?
Attachment #8707255 - Flags: review?(mdoglio)
This has been running fine on stage since last Thursday; there appear to be no noticeable performance differences from doing things this way.
Comment on attachment 8707255 [details] [review]
[treeherder] wlach:1239185 > mozilla:master

Fix it, then ship it!
Attachment #8707255 - Flags: review?(mdoglio) → review+
Unfortunately there are some new errors on stage that seem related to this:
https://rpm.newrelic.com/accounts/677903/applications/5585473/filterable_errors?tw%5Bend%5D=1458044844&tw%5Bstart%5D=1457958444#/heatmap?top_facet=transactionUiName&primary_facet=error.message&barchart=barchart&_k=mwmdpc

e.g.:

/treeherder.webapp.api.jobs:JobsViewSet.create
django.db.utils:DataError: (1265, "Data truncated for column 'review_status' at row 1")
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-6a433091-eaa9-11e5-a1e3-b82a72d22a14

/treeherder.webapp.api.jobs:JobsViewSet.create
treeherder.model.models:MultipleObjectsReturned: get() returned more than one Machine -- it returned 2!
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-62fd7783-eaa9-11e5-a1e3-b82a72d22a14

/treeherder.webapp.api.artifact:ArtifactViewSet.create
jsonschema.exceptions:ValidationError: 'name' is a required property
Failed validating 'required' in schema['properties']['suites']['items']: ...
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-228fd653-ea8a-11e5-a1e3-b82a72d22a14

(though this last one, whilst previously unseen, may be unrelated to this bug)
Thanks for following up on this, Ed!

(In reply to Ed Morley [:emorley] from comment #10)
> /treeherder.webapp.api.jobs:JobsViewSet.create
> django.db.utils:DataError: (1265, "Data truncated for column 'review_status' at row 1")
> https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-6a433091-eaa9-11e5-a1e3-b82a72d22a14

This looks like a problem with the database (probably caused by lax enforcement of the schema when we were using datasource). I'd be tempted to solve the problem by just removing the review_status and review_timestamp fields, since they are currently unused. Thoughts?

> /treeherder.webapp.api.jobs:JobsViewSet.create
> treeherder.model.models:MultipleObjectsReturned: get() returned more than one Machine -- it returned 2!
> https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-62fd7783-eaa9-11e5-a1e3-b82a72d22a14

Hmm, likewise this looks more like an existing problem that the patch exposed. Should we really have two machines with the same name in the db? I'm tempted to remove the duplicate entry and add a unique constraint to the schema.

> /treeherder.webapp.api.artifact:ArtifactViewSet.create
> jsonschema.exceptions:ValidationError: 'name' is a required property
> Failed validating 'required' in schema['properties']['suites']['items']: ...
> https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-228fd653-ea8a-11e5-a1e3-b82a72d22a14
> (though this last one, whilst previously unseen, may be unrelated to this bug)

Yes, that's bug 1256408. :)
> This looks like a problem with the database (probably caused by lax enforcement of the schema when we were using datasource). I'd be tempted to solve the problem by just removing the review_status and review_timestamp fields, since they are currently unused. Thoughts?

Sounds good to me :-)
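For illustration, a rough sketch of what a followup migration along these lines could look like (app label, dependency, and max_length are assumptions; the real change is in the followup attachment below):

```
# Hedged sketch only -- not the migration that actually landed.
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [("model", "0001_initial")]  # placeholder dependency

    operations = [
        # drop the unused review fields
        migrations.RemoveField(model_name="referencedatasignatures",
                               name="review_status"),
        migrations.RemoveField(model_name="referencedatasignatures",
                               name="review_timestamp"),
        # enforce machine-name uniqueness at the schema level
        migrations.AlterField(model_name="machine", name="name",
                              field=models.CharField(max_length=50, unique=True)),
    ]
```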
Comment on attachment 8730877 [details] [review]
[treeherder] wlach:1239185-followup > mozilla:master

The machine uniqueness problem will need a manual fix on stage before this lands, which I'll take care of separately.
Attachment #8730877 - Flags: review?(emorley)
Comment on attachment 8730877 [details] [review]
[treeherder] wlach:1239185-followup > mozilla:master

Looks good, thank you :-)
Attachment #8730877 - Flags: review?(emorley) → review+
I think I fixed all the issues with duplicate machines on stage (there were no dupes on prod), and the migrations above should ensure this doesn't happen again. I'm going to call this fixed - please let me know if you encounter further issues.
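The exact cleanup commands used on stage aren't recorded in this bug; as an illustration only, one way the duplicate Machine rows could be collapsed before the unique constraint is applied (assumed import path, and assuming any foreign keys pointing at the duplicates have already been repointed):

```
# Hedged sketch of a possible dedup pass -- not the commands actually run on stage.
from django.db.models import Count, Min

from treeherder.model.models import Machine  # assumed import path

dupes = (Machine.objects.values("name")
         .annotate(n=Count("id"), keep_id=Min("id"))
         .filter(n__gt=1))
for dupe in dupes:
    # keep the oldest row for each name, delete the rest
    Machine.objects.filter(name=dupe["name"]).exclude(id=dupe["keep_id"]).delete()
```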
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I received a high error-rate alert email from New Relic this morning (stage is at a 20% exception rate). There are two new exceptions:

1) treeherder.webapp.api.jobs:JobsViewSet.create
django.db.utils:OperationalError (1054, "Unknown column 'reference_data_signatures.review_timestamp' in 'field list'")
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb4-e27bfe8d-eb63-11e5-a249-b82a72d22a14

2) treeherder.webapp.api.jobs:JobsViewSet.create
django.db.utils:DataError (1406, "Data too long for column 'platform' at row 1")
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/553eb7-e7ecfc4c-eb62-11e5-a249-b82a72d22a14

#1 is due to the migration from the followup fix above (comment 17) having been run when Cameron tested the (rebased on master) revision hash feature branch on stage. Following that, stage was reverted back to the 'stage' branch, which is behind master and so doesn't include the code changes that need to go along with the new schema. Fixing this should be as simple as pushing current master to the stage branch so the comment 17 commits get re-deployed (which I'll do now). When rolling back changes on stage or testing feature branches, we need to keep an eye out for migrations and adjust branch points accordingly.

#2 seems new and possibly related to this bug - Will, would you mind taking a look?

Thank you for battling through this! :-)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Thanks to the recent improvements to the custom request attributes reported to New Relic, we can tell that the #2 case is being submitted by Taskcluster, with an example payload of:

{u'project': u'try', u'job': {u'build_platform': {u'platform': u'android-4-0-armv7-api15-partner1', u'os_name': u'-', u'architecture': u'-'}, u'submit_timestamp': 1458123450, u'build_system_type': u'taskcluster', u'name': u'[TC] Android armv7 API 15+ part ...

(Custom attributes are truncated to 256 chars by the New Relic Python agent for perf reasons.)

Perhaps we were just silently truncating 'android-4-0-armv7-api15-partner1' before?
(In reply to Ed Morley [:emorley] from comment #20)
> Thanks to the recent improvements to the custom request attributes reported to New Relic, we can tell that the #2 case is being submitted by Taskcluster, with an example payload of:
>
> {u'project': u'try', u'job': {u'build_platform': {u'platform': u'android-4-0-armv7-api15-partner1', u'os_name': u'-', u'architecture': u'-'}, u'submit_timestamp': 1458123450, u'build_system_type': u'taskcluster', u'name': u'[TC] Android armv7 API 15+ part ...
>
> (Custom attributes are truncated to 256 chars by the New Relic Python agent for perf reasons.)
>
> Perhaps we were just silently truncating 'android-4-0-armv7-api15-partner1' before?

Quite possibly... I'll submit a PR to bump the platform length to 50 (it is currently 25, and 'android-4-0-armv7-api15-partner1' is 32 characters).
Comment on attachment 8731288 [details] [review]
[treeherder] wlach:1239185-machinelength > mozilla:master

I wound up setting it to 100, which sounds a bit excessive, but is the same as in the reference data signature model.
Attachment #8731288 - Flags: review?(emorley)
Comment on attachment 8731288 [details] [review]
[treeherder] wlach:1239185-machinelength > mozilla:master

r+ with the migration script change added :-)
Attachment #8731288 - Flags: review?(emorley) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/56621187bc2329b00d69c303945c9be85d8c71a3
Bug 1239185 - Bump maximum platform length to 100 characters in all cases

It was 100 in the reference data signature, but only 25 in the build and machine platform models.
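For reference, the shape of such a change as a Django migration sketch (model names and the dependency are assumptions based on the commit message; the actual migration is in the commit above):

```
# Hedged sketch only -- the real migration lives in the commit linked above.
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [("model", "0002_previous_migration")]  # placeholder

    operations = [
        migrations.AlterField(model_name="buildplatform", name="platform",
                              field=models.CharField(max_length=100)),
        migrations.AlterField(model_name="machineplatform", name="platform",
                              field=models.CharField(max_length=100)),
    ]
```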
Is this new? (Not looked at the code, eating dinner at the moment.)
https://rpm.newrelic.com/accounts/677903/applications/5585473/filterable_errors#/show/553eb4-0414c2ca-ebb4-11e5-a249-b82a72d22a14/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&_k=ecuett

treeherder.webapp.api.jobs:JobsViewSet.create

...
File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/webapp/api/utils.py", line 105, in use_jobs_model
File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/webapp/api/jobs.py", line 230, in create
File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/model/derived/jobs.py", line 1006, in store_job_data
File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/model/derived/jobs.py", line 1419, in _set_data_ids
exceptions:KeyError: u'3724a051484fba3b5a4bc5826a3286270b0ccb9e'
(In reply to Ed Morley [:emorley] from comment #27)
> Is this new? (not looked at the code, eating dinner at the moment)
> https://rpm.newrelic.com/accounts/677903/applications/5585473/filterable_errors#/show/553eb4-0414c2ca-ebb4-11e5-a249-b82a72d22a14/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&_k=ecuett
>
> treeherder.webapp.api.jobs:JobsViewSet.create
>
> ...
> File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/webapp/api/utils.py", line 105, in use_jobs_model
> File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/webapp/api/jobs.py", line 230, in create
> File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/model/derived/jobs.py", line 1006, in store_job_data
> File "/data/www/treeherder.allizom.org/treeherder-service/treeherder/model/derived/jobs.py", line 1419, in _set_data_ids
> exceptions:KeyError: u'3724a051484fba3b5a4bc5826a3286270b0ccb9e'

Hmm, the line in question is related to `revision_hash` and `result sets`:
https://github.com/mozilla/treeherder/blob/326b918/treeherder/model/derived/jobs.py#L1419

I'd suspect camd's recent testing of his revision_hash work is the more likely culprit here.
I'm pretty sure this is done now; things have been running smoothly on stage for a while.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED