Intermittent: Update verify, Task timeout after 7200 seconds. Force killing container.

RESOLVED FIXED

Status

defect
Last year
8 months ago

People

(Reporter: jlorenzo, Assigned: jlorenzo)

Tracking

unspecified

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

Assignee

Description

Last year
It seems the downloads took too long to finish within the 2-hour task limit:

> Saving to: ‘Firefox Setup 62.0b3.exe’
>      0K ........ ........ ........ ........ ........ ........  7% 2.42M 15s
>   3072K ........ ........ ........ ........ ........ ........ 15% 6.41M 9s
>   6144K ........ ........ ........ ........ ........ ........ 23% 6.46M 7s
>   9216K ........ ........ ........ ........ ........ ........ 31% 7.08M 6s
>  12288K ........ ........ ........ ........ ........ ........ 38% 6.96M 5s
>  15360K ........ ........ ........ ........ ........ ........ 46% 7.72M 4s
>  18432K ........ ........ ........ ......
> [taskcluster:error] Task timeout after 7200 seconds. Force killing container.


https://tools.taskcluster.net/groups/IO1RufPdQVCe7YN5QbgFaA/tasks/XWliNlKYR-asfE0v0yX3YQ/runs/0/logs/public%2Flogs%2Flive_backing.log#L20573
Assignee

Updated

Last year
Summary: Intermittent: Task timeout after 7200 seconds. Force killing container. → Intermittent: Update verify, Task timeout after 7200 seconds. Force killing container.
Assignee

Comment 1

Last year
Update verify: double number of chunks
Looking at the other update-verify tasks on that build[1] (deved and firefox), it looks like all of the successful tasks take <45m, with all but a few running in 30±5m. That strongly suggests that *something* was going wrong in that task (since it appears to be taking at least 4x as long). I'm not sure that just doubling the number of chunks is appropriate.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=801112336847960bbb9a018695cf09ea437dc137&filter-searchStr=update%20verify&selectedJob=186062304
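
To make the "at least 4x as long" estimate concrete, here is a minimal sketch of the arithmetic. It assumes only the numbers quoted in this bug (the 7200-second timeout and typical runtimes between 25 and 45 minutes); the function name is hypothetical.

```python
# Timeout reported in the task log, in seconds.
TIMEOUT_SECONDS = 7200


def slowdown_factor(typical_minutes: float, timeout_seconds: int = TIMEOUT_SECONDS) -> float:
    """Return how many times slower than typical a task must be to hit the timeout."""
    return timeout_seconds / (typical_minutes * 60)


# Typical runtimes taken from the comment above (30±5m, all under 45m).
for typical in (25, 30, 35, 45):
    print(f"{typical} min typical -> at least {slowdown_factor(typical):.1f}x slower")
```

With a 30-minute baseline the task has to run 4 times longer than usual before the timeout fires, which is why a one-off timeout points at a stuck or degraded worker rather than at chunks that are too large.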
Assignee

Comment 6

Last year
Thanks for looking more into it, Tom! I agree, something's up with both jobs. As you said, in the first job the longest gap is I/O related, which may indicate a bad hard drive.

The second job is odder: it seems to have gotten stuck in the middle of an extraction[1] and stayed there for 50 minutes.

So, splitting up the jobs is not the right long-term fix. I wonder, though, whether it would make the first kind of slowness less visible. Because this test job is not testing the product, we wouldn't be sweeping a genuine product slowness under the rug. It may also speed the tests up, so rerunning a failed job would be less expensive (time-wise).

What do you think about it? 

[1] https://tools.taskcluster.net/groups/epjv71NBS5-1HekFjXKskw/tasks/TTv8CAEbSiylqRxZsnCC5g/runs/0/logs/public%2Flogs%2Flive_backing.log#L11129-11153
Flags: needinfo?(mozilla)
These typically run for 25-50 minutes. So, if they are taking longer than that, there is likely a problem.
It looks like we haven't run into this since this was originally reported. I realize that this isn't masking an issue in product performance, but it would be masking an issue in rel-eng performance. I think we should cut down the runtime, and if the issue keeps occurring, we can investigate the cause.
Flags: needinfo?(mozilla)
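
As an illustration of the reasoning above, the sketch below flags runs that exceed the usual range. It is not the patch itself: the function name, the chunk labels, and the sample durations are made up, and the 50-minute limit is simply the upper end of the "25-50 minutes" range quoted in this bug.

```python
from typing import Dict, List

# Assumption: upper end of the "25-50 minutes" range quoted in this bug.
TYPICAL_MAX_MINUTES = 50


def find_slow_runs(durations: Dict[str, float],
                   limit_minutes: float = TYPICAL_MAX_MINUTES) -> List[str]:
    """Return the labels of runs longer than the limit, slowest first."""
    slow = [label for label, minutes in durations.items() if minutes > limit_minutes]
    return sorted(slow, key=lambda label: durations[label], reverse=True)


# Made-up durations in minutes, for illustration only.
sample = {"chunk-1": 28.0, "chunk-2": 33.5, "chunk-3": 120.0, "chunk-4": 47.0}
print(find_slow_runs(sample))
```

A lower maximum runtime has the same effect at the task level: a run that is far outside the normal range fails early and visibly instead of holding a worker for the full two hours.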
Comment on attachment 8993484 [details]
Bug 1472930: [release] Decrease max runtime for update-verify; r?jlorenzo

Johan Lorenzo [:jlorenzo] has approved the revision.

https://phabricator.services.mozilla.com/D2252
Attachment #8993484 - Flags: review+

Comment 10

11 months ago
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/803bf61b6073
[release] Decrease max runtime for update-verify; r=jlorenzo

Comment 11

11 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/803bf61b6073
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED

Updated

10 months ago
Attachment #8990293 - Attachment is obsolete: true
Depends on: 1499265