Bug 1285127
Opened 9 years ago
Closed 9 years ago
Crash summary job for 2nd July is failing
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rvitillo, Assigned: rvitillo)
Details
User Story
It seems that for the 2nd July any Scala based job eventually times out. I looked at the logs and found the following:
- A bunch of warnings like the following:
  WARN heka.Dataset$: Failure to read file telemetry-2/20160702/telemetry/4/main/Firefox/nightly/41.0a1/20150603030208/20160702112128.029_ip-172-31-5-69: Unable to execute HTTP request: Timeout waiting for connection from pool
- The thread dump of a running executor shows that it's waiting (TIMED_WAITING) at com.mozilla.telemetry.utils.S3Store$.getKey(S3.scala:43)

Apparently Bucket.getObject gets a connection handler from a pool and doesn't release it even when garbage collected, which in turn causes tasks to wait for a very long time.
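The failure mode described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the patch attached to this bug: Pool, PooledStream and Loan.withStream are hypothetical names that stand in for the HTTP connection pool and the object handle. It shows that a handle which is never closed keeps its pool slot, so later callers time out, and that closing the handle in a finally block returns the slot. In the AWS SDK for Java the analogous step would be closing the returned S3Object, which implements Closeable.

```scala
import java.io.{ByteArrayInputStream, Closeable, InputStream}
import java.util.concurrent.{Semaphore, TimeUnit}

// Hypothetical stand-in for an HTTP connection pool with a fixed number of slots.
final class Pool(size: Int) {
  private val slots = new Semaphore(size)

  // Lease a connection, or fail if no slot is released within the timeout.
  def lease(timeoutMs: Long): PooledStream = {
    if (!slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS))
      throw new RuntimeException("Timeout waiting for connection from pool")
    val body = new ByteArrayInputStream("payload".getBytes("UTF-8"))
    new PooledStream(body, () => slots.release())
  }

  // Number of slots currently free.
  def available: Int = slots.availablePermits()
}

// Hypothetical handle: the slot goes back to the pool only when close() is called.
final class PooledStream(val in: InputStream, release: () => Unit) extends Closeable {
  private var closed = false
  def close(): Unit = if (!closed) { closed = true; in.close(); release() }
}

object Loan {
  // Loan pattern: the handle is always closed, even if f throws.
  def withStream[A](pool: Pool, timeoutMs: Long)(f: InputStream => A): A = {
    val handle = pool.lease(timeoutMs)
    try f(handle.in) finally handle.close()
  }
}
```

With a pool of two slots, ten sequential reads through Loan.withStream all succeed and both slots are free afterwards, whereas calling lease twice without closing makes the third call fail with the same "Timeout waiting for connection from pool" message seen in the logs.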
Attachments
(1 file)
No description provided.
Updated•9 years ago
Assignee: nobody → rvitillo
Severity: normal → blocker
Priority: -- → P1
Comment 1•9 years ago
Attachment #8768724 -
Flags: review?(mreid)
Updated•9 years ago
Attachment #8768724 -
Flags: review?(mreid) → review?(mdoglio)
Updated•9 years ago
User Story: (updated)
Updated•9 years ago
Attachment #8768724 -
Flags: review?(mdoglio) → review+
Comment 2•9 years ago
The missing day has been back-filled successfully.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard