Closed
Bug 1248172
Opened 9 years ago
Closed 7 years ago
Autophone - treeherder job collections created with invalid job_guid
Categories
(Testing Graveyard :: Autophone, defect)
Tracking
(firefox47 affected)
RESOLVED
WONTFIX
People
(Reporter: bc, Unassigned)
References
Details
Attachments
(5 files)
In bug 1216578 we started queuing treeherder submissions to the treeherder table in the jobs database.
Overnight, it appears several entries were created with a null job_guid. This completely stalled the submission of other pending treeherder jobs.
I also did not receive any email notifications that there was a problem.
The error in the log was:
Traceback (most recent call last):
  File "/mozilla/autophone/autophone/autophonetreeherder.py", line 91, in post_request
    client.post_collection(project, job_collection)
  File "/mozilla/autophone/venv/local/lib/python2.7/site-packages/thclient/client.py", line 923, in post_collection
    collection_inst.validate()
  File "/mozilla/autophone/venv/local/lib/python2.7/site-packages/thclient/client.py", line 529, in validate
    d.validate()
  File "/mozilla/autophone/venv/local/lib/python2.7/site-packages/thclient/client.py", line 62, in validate
    cb(prop.split('.'), required_properties[prop], prop)
  File "/mozilla/autophone/venv/local/lib/python2.7/site-packages/thclient/client.py", line 117, in validate_existence
    raise TreeherderClientError(msg, [])
TreeherderClientError: TreeherderJob structure validation errors detected for property:job.job_guid
Value not defined for job.job_guid
I am not sure of the root cause, but there were apparently network issues at the time, with downloads failing because they came back incomplete.
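One way to keep rows like this out of the queue would be to check the job before it is written to the treeherder table. The following is only a rough sketch in modern Python, not Autophone's actual code: `REQUIRED_FIELDS`, `InvalidJobError`, `new_job_guid` and `validate_job_for_queue` are hypothetical names, and a plain dict stands in for the real job record.

```python
import uuid

# Hypothetical: only the field named in the traceback is checked here.
# The real treeherder job schema requires more than this.
REQUIRED_FIELDS = ("job_guid",)


class InvalidJobError(ValueError):
    """Hypothetical error for a job that must not be queued."""


def new_job_guid():
    """Return a fresh guid string (assumes a random UUID is acceptable)."""
    return str(uuid.uuid4())


def validate_job_for_queue(job):
    """Raise InvalidJobError if any required field is missing, None or empty.

    Returns the job unchanged so the call can be chained before the insert.
    """
    missing = [name for name in REQUIRED_FIELDS if not job.get(name)]
    if missing:
        raise InvalidJobError("missing required fields: " + ", ".join(missing))
    return job
```

Calling `validate_job_for_queue({"job_guid": None})` raises at queue time, so the failure would surface next to whatever produced the bad job rather than later in `post_collection`.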
Reporter | ||
Comment 1•9 years ago
|
||
Reporter | ||
Comment 2•9 years ago
|
||
Reporter | ||
Comment 3•9 years ago
|
||
Reporter | ||
Comment 4•9 years ago
|
||
Apart from not creating invalid jobs, when we hit fatal structural errors that are not due to transient network or treeherder server issues, we should send email notification of the problem and not block on the bad jobs. We can either delete the structurally bad jobs from the database or at least skip over them so the other jobs are submitted in a timely fashion.
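To make the "skip over them" option concrete, here is a rough sketch of a queue-draining loop, written in modern Python. It is an illustration under assumptions, not the Autophone implementation: the two exception classes, `drain_pending`, and the `submit` and `notify` callables are all hypothetical stand-ins for the real submission and email code.

```python
class TransientSubmitError(Exception):
    """Hypothetical: network or server trouble; worth retrying later."""


class StructuralSubmitError(Exception):
    """Hypothetical: the job itself is malformed; retrying cannot help."""


def drain_pending(pending, submit, notify):
    """Submit queued jobs in order without letting a bad job block the rest.

    pending -- list of job records, oldest first
    submit  -- callable that sends one job and raises one of the errors above
    notify  -- callable that takes a message string (e.g. sends an email)

    Returns (submitted, skipped, remaining).
    """
    submitted, skipped = [], []
    for index, job in enumerate(pending):
        try:
            submit(job)
        except StructuralSubmitError as exc:
            # Malformed job: report it, set it aside, keep going.
            notify("Skipping malformed job %r: %s" % (job, exc))
            skipped.append(job)
        except TransientSubmitError as exc:
            # Server or network problem: stop and leave the rest queued.
            notify("Submission paused: %s" % exc)
            return submitted, skipped, list(pending[index:])
        else:
            submitted.append(job)
    return submitted, skipped, []
```

Whether the skipped jobs are then deleted from the database or only marked is left to the caller; the sketch just returns them separately.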
Comment 5•9 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #0)
> I also did not receive any email notifications that there was a problem.
Neither did I. That's odd, since there is code to send email here, and we've seen that type of email recently:
Phone: nexus-6p-2
TreeherderClientError: HTTPSConnectionPool(host='treeherder.mozilla.org', port=443): Max retries exceeded with url: /api/project/fx-team/jobs/ (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f1310187d90>, 'Connection to treeherder.mozilla.org timed out. (connect timeout=120)'))
Last attempt: None
Response: 2016-02-09T14:49:28.689539
(I note as an aside that the Response and Last attempt are reversed.)
Updated•9 years ago
|
Assignee: nobody → gbrown
Reporter | ||
Comment 6•9 years ago
|
||
Heh. I haven't looked closer into what happened, but I wonder if the failures in downloading and unzipping the build caused us to hit a deadlock for some reason. I mostly wanted to file this so we wouldn't lose the datum and could think about the potential causes at our leisure. heh. ;-)
Comment 7•9 years ago
|
||
I'm not sure what conditions we want to use to decide when to discard a job - let's discuss.
In the meantime, I noticed these 3 issues related to email notification.
Attachment #8720027 -
Flags: review?(bob)
Reporter | ||
Comment 8•9 years ago
|
||
Comment on attachment 8720027 [details] [diff] [review]
improve mail notification
Review of attachment 8720027 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm
Attachment #8720027 -
Flags: review?(bob) → review+
Comment 9•9 years ago
|
||
https://github.com/mozilla/autophone/commit/d6823b825c7cd0db532f24efb61cabff0b1baf7a
There's more to do here: figure out when to retry and when to discard a failed submission.
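As a starting point for that discussion, one possible policy is sketched below in modern Python: discard immediately on a structural error, otherwise retry with exponential backoff up to a cap. Nothing here was agreed in this bug; `MAX_ATTEMPTS`, `BASE_DELAY_SECONDS` and `next_action` are hypothetical names and the numbers are placeholders.

```python
MAX_ATTEMPTS = 5         # hypothetical cap on retries for transient failures
BASE_DELAY_SECONDS = 60  # hypothetical delay before the first retry


def next_action(is_structural, attempts):
    """Decide what to do with a submission that just failed.

    is_structural -- True if the job itself is malformed (e.g. no job_guid)
    attempts      -- number of retries already made for this job

    Returns ("discard", None) or ("retry", delay_in_seconds).
    """
    if is_structural or attempts >= MAX_ATTEMPTS:
        return ("discard", None)
    # Double the wait after each failed retry.
    return ("retry", BASE_DELAY_SECONDS * 2 ** attempts)
```

With these placeholder values a job that keeps timing out would be retried after 60, 120, 240, 480 and 960 seconds and then discarded.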
Assignee: gbrown → nobody
Reporter | ||
Comment 10•7 years ago
|
||
Autophone is going away. Resolving these to wontfix.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Updated•3 years ago
|
Product: Testing → Testing Graveyard