Closed Bug 1227552 Opened 4 years ago Closed 4 years ago

fetch-allthethings task exception "IntegrityError: (1062, "Duplicate entry 'W3C Web Platform Reftests-Wr' for key 'uni_name_symbol'")"

Categories

(Tree Management :: Treeherder, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: adusca)

Attachments

(1 file)

47 bytes, text/x-github-pull-request
camd: review+
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/52946a-6ba9dfc0-9281-11e5-900f-b82a72d22a14

https://emorley.pastebin.mozilla.org/8852963
https://emorley.pastebin.mozilla.org/8852961

IntegrityError: (1062, "Duplicate entry 'W3C Web Platform Reftests-Wr' for key 'uni_name_symbol'") 

<emorley> adusca: the uni_name_symbol key is presumably from the unique together - perhaps those two fields are not unique on their own?
<adusca> emorley: it looks like the job group for this job is W on prod and Wr on stage?
<adusca> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=1e426f4d6f36&filter-searchStr=w3c%20web%20platform%20reftests
<adusca> https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=1e426f4d6f36&filter-searchStr=w3c%20web%20platform%20reftests
<emorley> ah that sounds like bug 1081600
<firebot> https://bugzil.la/1081600 — DUPLICATE, nobody@mozilla.org — Treeherder service isn't grouping "W3C Web Platform Reftests" even though buildbot.py was updated
<emorley> there's an awful lot broken with the existing reference_data_signatures and related tables, IMO we need to rip them all out and come up with a better schema
<emorley> camd has been looking into this area a bit


-> I think we need to have a whiteboard session in Orlando, where we sketch out how the reference_data_signature schema (and others, plus the whole "cramming OS vs platform vs product into one string" problem, a la bug 1060769) should look if we were starting from scratch, and then see if there's a way to get there from where we are now.

There are just quite a few bugs now coming out of this, and it would be good to stamp out the root cause.
Attached file Using default
Attachment #8691407 - Flags: review?(emorley)
Comment on attachment 8691407 [details] [review]
Using default

Cameron understands these tables better than I do.

During the initial review I hadn't seen that these were get_or_create() calls; I thought fetch-allthethings was only modifying the runnable_job table, not the others too.
Attachment #8691407 - Flags: review?(emorley) → review?(cdawson)
Assignee: nobody → alicescarpa
With the patch in this bug, new rows are still added to other tables (build platform, machine platform, job type, job group, option collection) by fetch-allthethings. I think this makes sense, because a job is runnable as soon as it appears in allthethings.json, and that could happen before a job is triggered for the first time. 

Right now (and in the proposed fix) we never *change* rows in the other tables, just create them if necessary. However, if adding new rows to those tables is also undesirable, an alternative is to just skip the affected runnable jobs and not import them. What do you prefer?
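
For readers following along, here is a minimal sketch of the failure mode and of the "create if necessary, never change" behaviour described above. It is not the Treeherder code: it uses the standard-library sqlite3 module instead of Django and MySQL, and the table layout, the text-valued job_group column, and both helper functions are hypothetical. It also assumes that the original lookup included the job group, which the bug does not state explicitly.

```python
import sqlite3


def make_db():
    """Build an in-memory table with a unique (name, symbol) pair (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_type ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " symbol TEXT NOT NULL,"
        " job_group TEXT NOT NULL,"
        " UNIQUE (name, symbol))"
    )
    # Pre-existing row: in this sketch the job is already grouped under "W".
    conn.execute(
        "INSERT INTO job_type (name, symbol, job_group) VALUES (?, ?, ?)",
        ("W3C Web Platform Reftests", "Wr", "W"),
    )
    return conn


def get_or_create_strict(conn, name, symbol, job_group):
    """Look up on all three fields and insert when there is no exact match."""
    row = conn.execute(
        "SELECT id, job_group FROM job_type"
        " WHERE name = ? AND symbol = ? AND job_group = ?",
        (name, symbol, job_group),
    ).fetchone()
    if row is not None:
        return row, False
    # A row with the same (name, symbol) but another group makes this insert fail.
    cur = conn.execute(
        "INSERT INTO job_type (name, symbol, job_group) VALUES (?, ?, ?)",
        (name, symbol, job_group),
    )
    return (cur.lastrowid, job_group), True


def get_or_create_with_default(conn, name, symbol, default_job_group):
    """Look up on the unique fields only; the group is used only when creating."""
    row = conn.execute(
        "SELECT id, job_group FROM job_type WHERE name = ? AND symbol = ?",
        (name, symbol),
    ).fetchone()
    if row is not None:
        return row, False  # the existing row is returned unchanged
    cur = conn.execute(
        "INSERT INTO job_type (name, symbol, job_group) VALUES (?, ?, ?)",
        (name, symbol, default_job_group),
    )
    return (cur.lastrowid, default_job_group), True
```

With the pre-existing row in place, calling get_or_create_strict with a different group raises sqlite3.IntegrityError, while get_or_create_with_default returns the existing row and leaves its group untouched.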
Flags: needinfo?(cdawson)
Comment on attachment 8691407 [details] [review]
Using default

Looks good.  Thanks for the fix.  :)
Flags: needinfo?(cdawson)
Attachment #8691407 - Flags: review?(cdawson) → review+
(In reply to Alice Scarpa [:adusca] from comment #3)
> With the patch in this bug, new rows are still added to other tables (build
> platform, machine platform, job type, job group, option collection) by
> fetch-allthethings. I think this makes sense, because a job is runnable as
> soon as it appears in allthethings.json, and that could happen before a job
> is triggered for the first time.

Assuming we can trust that allthethings is correct, then I think creating them like this is fine.  It will save us that step during ingestion later on.

This feeds into the issue in bug 1215587 to decouple the group from the job type at some point.  But that's a different can of worms...  :)

> Right now (and in the proposed fix) we never *change* rows in the other
> tables, just create them if necessary. However, if adding new rows to those
> tables is also undesirable, an alternative is to just skip the affected
> runnable jobs and don't import them. What do you prefer?
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/ab7cae899905935ca69a7697aba64a1c1062d259
Bug 1227552: Don't force a new job_group on fetch_allthethings

Since "W3C Web Platform Reftests" is being grouped as "W" instead
of "Wr" on production, we are getting a duplicate symbol error.

https://github.com/mozilla/treeherder/commit/5ef28f0110f062797b4a898e879dae06e43f72c2
Merge pull request #1163 from adusca/runnable-fix

Bug 1227552: Don't force a new job_group on fetch_allthethings
This has merged, but won't be on production until the next deploy. See:
http://whatsdeployed.io/?owner=mozilla&repo=treeherder&name[]=Stage&url[]=https://treeherder.allizom.org/revision.txt&name[]=Prod&url[]=https://treeherder.mozilla.org/revision.txt
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED