Closed Bug 1214531 Opened 5 years ago Closed 5 years ago

Autophone - submitting Talos jobs to Treeherder causes 500 Server errors.

Categories

(Testing :: Autophone, defect)

Tracking

(firefox44 affected)

RESOLVED FIXED

People

(Reporter: bc, Assigned: jmaher)

Attachments

(1 file)

We have been getting 500 Server Errors when submitting Talos jobs to Treeherder. It appears to be related to the job_group_id as some of the errors returned a response {"detail": "job_group_id"}.

The two Talos jobs use the following:

job_name = Autophone Tp4m
job_symbol = tpn
group_name = Autophone
group_symbol = A


[treeherder]
job_name = Autophone Tsvg
job_symbol = svg
group_name = Autophone
group_symbol = A

emorley: Do we need to register these job/group names and symbols before we begin submitting jobs to Treeherder?
Flags: needinfo?(emorley)
This is presumably something weird with the job signatures/job groups - Cameron knows this code better than I - could you take a look? :-)
Flags: needinfo?(emorley) → needinfo?(cdawson)
There's a conflict with the symbols/names:

Jobs:
id      job_group_id    symbol  name
153     7               tpn     "Talos tp nochrome"
4284    24              tpn     "Autophone Tp4m"

Groups:
id      symbol  name
7       T       "Talos Performance"
24      A       Autophone

Though this SHOULD just cause it to choose the wrong group (T).  Not sure yet where the 500 is coming from.
I'm trying to build a test for that.
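
For illustration, the conflict can be reproduced from the rows listed above. This is only a sketch: `find_symbol_conflicts` and `first_match_group` are hypothetical helpers, not Treeherder code, and the symbol-only lookup is an assumption about how the wrong group could get picked.

```python
from collections import defaultdict

# Rows copied from the listing above: (id, job_group_id, symbol, name)
JOB_TYPES = [
    (153, 7, "tpn", "Talos tp nochrome"),
    (4284, 24, "tpn", "Autophone Tp4m"),
]
# Group id -> (symbol, name), also from the listing above
GROUPS = {7: ("T", "Talos Performance"), 24: ("A", "Autophone")}


def find_symbol_conflicts(job_types):
    """Return {symbol: [rows]} for symbols shared by more than one job type."""
    by_symbol = defaultdict(list)
    for row in job_types:
        by_symbol[row[2]].append(row)
    return {sym: rows for sym, rows in by_symbol.items() if len(rows) > 1}


def first_match_group(job_types, symbol):
    """Hypothetical lookup keyed on symbol alone: first matching row wins."""
    for _id, group_id, sym, _name in job_types:
        if sym == symbol:
            return GROUPS[group_id][0]
    return None


conflicts = find_symbol_conflicts(JOB_TYPES)
print(sorted(conflicts))                     # symbols used by more than one job type
print(first_match_group(JOB_TYPES, "tpn"))   # group a symbol-only lookup would pick
```

With a lookup like this, the Autophone "tpn" job would land in group T rather than A, which matches the "wrong group" expectation but does not by itself explain a 500.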
Flags: needinfo?(cdawson)
Interesting.  The entry for "svg" in the job_type table has a job_group_id of NULL.  This really shouldn't be possible.  It should have the id of the "?" group if there was no group.

So I'm trying to create a test case that can cause this, but no luck yet.  Here's what I'm seeing on stage, however:

id      job_group_id    symbol  name
4343    NULL            ym1-60  "MSE YouTube Playback Medium1 60min"
4344    NULL            svg     "Autophone Tsvg"
4345    NULL            m-e10s  "MSE Video Playback (e10s)"

Interesting that they're all consecutive ids.  Hrm...
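
A check along these lines would surface such rows. It is a sketch only: it uses an in-memory SQLite table as a stand-in, and it assumes just the job_type columns shown in the listings in this bug (id, job_group_id, symbol, name); the real Treeherder schema and database engine may differ.

```python
import sqlite3

# In-memory stand-in for the job_type table; not the real Treeherder schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_type ("
    "id INTEGER PRIMARY KEY, job_group_id INTEGER, symbol TEXT, name TEXT)"
)
rows = [
    (153, 7, "tpn", "Talos tp nochrome"),
    (4343, None, "ym1-60", "MSE YouTube Playback Medium1 60min"),
    (4344, None, "svg", "Autophone Tsvg"),
    (4345, None, "m-e10s", "MSE Video Playback (e10s)"),
]
conn.executemany("INSERT INTO job_type VALUES (?, ?, ?, ?)", rows)

# Job types that have no group at all, which should not be possible.
orphans = conn.execute(
    "SELECT id, symbol, name FROM job_type "
    "WHERE job_group_id IS NULL ORDER BY id"
).fetchall()

for job_id, symbol, name in orphans:
    print(job_id, symbol, name)
```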
On production, the only record that has a NULL job_group_id is "X19" [TC] Mulet XPCShell

I went ahead and fixed that record on stage.  Perhaps it was a fluke that data was being ingested right when we pushed new code and it got vexed?  Really hard to say.

I'm not sure what group the other symbols should belong to, so I didn't fix those.  

Ahh data integrity errors.  What fun...
Lastly, there are 2 other changes which allow submission to allizom:
1)  "./manage.py add_perf_framework autophone" on the server
2) adjusting the json object sent from autophone to have "framework: {name: autophone}" instead of what was defined in https://bugzilla.mozilla.org/show_bug.cgi?id=1175295#c3 as "framework: autophone"
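
The shape change in 2) can be sketched as follows. `normalize_framework` is a hypothetical helper, and the `suites` key is a placeholder assumption; only the two `framework` forms come from this bug.

```python
import json


def normalize_framework(perf_blob):
    """Return a copy where a bare-string framework becomes {"name": <string>}."""
    fixed = dict(perf_blob)
    framework = fixed.get("framework")
    if isinstance(framework, str):
        fixed["framework"] = {"name": framework}
    return fixed


# Old form: "framework: autophone". "suites" is a placeholder, not from the bug.
old_blob = {"framework": "autophone", "suites": []}
new_blob = normalize_framework(old_blob)

print(json.dumps(new_blob, sort_keys=True))
```

A blob that already carries the object form passes through unchanged, so the helper is safe to apply to either shape.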

I still have to verify that svg and rck3 work. It sounds like the data integrity investigation that camd did will remove one more headache!
Blocks: 1170685
Assignee: bob → jmaher
Attachment #8675044 - Flags: review?(bob)
Attachment #8675044 - Flags: review?(bob) → review+
https://github.com/mozilla/autophone/commit/4d12f12e6ac099bc68eb4d38d4d9c1a820a26cbf
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Pulled on the server, though not activated yet.