Closed Bug 895466 Opened 12 years ago Closed 12 years ago

Increase mozpool request time

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: Callek)

References

Details

Attachments

(2 files)

[mozharness] Up duration of request to 4 hours for android 12 years ago Justin Wood (:Callek) 882 bytes, patch	mozilla : review+	Details \| Diff \| Splinter Review
[mozharness] do it for android as intended 12 years ago Justin Wood (:Callek) 1.10 KB, patch	gbrown : review+	Details \| Diff \| Splinter Review

Geoff Brown [:gbrown]

Reporter

Description

•

12 years ago

There is concern that the mozpool request_device duration at: http://mxr.mozilla.org/build/source/mozpool/mozpoolclient/mozpoolclient/mozpoolclient.py#182 may be too short, and that may be causing, for instance, the rc2 timeouts tracked in bug 883539. We should consider increasing this to 60 minutes or more. (:jmaher also notes that there are buildbot timeouts to consider...he thinks they are 60 minutes.)

Armen [:armenzg]

Comment 1

•

12 years ago

If I understand it correctly, that is *only* a timeout for requesting a device to become available. This timeout could only happen at the beginning of the job, before even one line of the tests would be run. Could you please paste a log to see? Just to make sure that it is as I say or if I got it wrong.

Justin Wood (:Callek)

Assignee

Comment 2

•

12 years ago

(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) (EDT/UTC-4) from comment #1) > If I understand it correctly, that is *only* a timeout for requesting a > device to become available. I checked with dustin and jake about this, the duration param to that function is indeed a value letting mozpool how long we 'want' the device for. And has nothing to do with the initial request timeout. > Could you please paste a log to see? Just to make sure that it is as I say > or if I got it wrong. The logs are not as informative (or easy to splice), atm, I need to convince the mozpool people that the way I envision it is worthwhile! :-)

Dustin J. Mitchell [:dustin] (he/him)

Comment 3

•

12 years ago

Once the request duration runs out, the request is closed, and whatever user requested the device no longer "owns" the device. At that point, Mozpool is free to do whatever it would like to the device - reboot, selftest, reimage, mine bitcoins, etc. You can either request a sufficiently-long duration up front, or renew the request before it expires.

Armen [:armenzg]

Comment 4

•

12 years ago

Knowing that, fixing it should fix the issues pointed out.

Justin Wood (:Callek)

Assignee

Comment 5

•

12 years ago

Attached patch [mozharness] Up duration of request to 4 hours for android — Details — Splinter Review

So currently, the no-output timeout on these jobs is set to 40-minutes, while the max-time they are allowed to run is 4 hours. (This includes all setup and teardown of mozharness) This patch makes the mozpool request last 4 hours, in theory this is fine even for short jobs since we should always be releasing the request when we're done with the job (even on failed jobs) - anyplace that doesn't hold true will need to be dealt with though.

Assignee: nobody → bugspam.Callek

Status: NEW → ASSIGNED

Attachment #779337 - Flags: review?(aki)

Aki Sasaki (not active)

Updated

•

12 years ago

Attachment #779337 - Flags: review?(aki) → review+

Justin Wood (:Callek)

Assignee

Comment 6

•

12 years ago

Landed and deployed to production: https://hg.mozilla.org/build/mozharness/rev/e79596c26aae

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

Justin Wood (:Callek)

Assignee

Comment 7

•

12 years ago

So, based on findings in Bug 883539 and a chat with :gbrown on IRC it turns out that I didn't actually do this bug right, I merely increased duration for *b2g* and not android. I'm sad that I made that mistake, and even sadder that it wasted time on a few fronts and wasn't caught until now. I'll spin up a new patch today.

Status: RESOLVED → REOPENED

Flags: needinfo?(bugspam.Callek)

Resolution: FIXED → ---

Justin Wood (:Callek)

Assignee

Comment 8

•

12 years ago

Attached patch [mozharness] do it for android as intended — Details — Splinter Review

Attachment #8368663 - Flags: review?(gbrown)

Flags: needinfo?(bugspam.Callek)

Geoff Brown [:gbrown]

Reporter

Updated

•

12 years ago

Attachment #8368663 - Flags: review?(gbrown) → review+

Justin Wood (:Callek)

Assignee

Comment 9

•

12 years ago

merged to production

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

Geoff Brown [:gbrown]

Reporter

Comment 10

•

12 years ago

Unfortunately, this did not fix the robocop 30 minute retries: https://tbpl.mozilla.org/php/getParsedLog.php?id=33894396&tree=Try&full=1#error0. Thanks anyway -- it was worth a try.

Justin Wood (:Callek)

Assignee

Comment 11

•

12 years ago

actually it did help! 11:40:17 INFO - Waiting for request 'ready' stage. Current state: 'contact_lifeguard' 11:41:17 INFO - Waiting for request 'ready' stage. Current state: 'pending' 11:42:17 INFO - Waiting for request 'ready' stage. Current state: 'pending' 11:43:17 INFO - Running command: ['/tools/buildbot/bin/python', '/builds/sut_tools/verify.py'] ... 12:15:00 INFO - 42 INFO SimpleTest FINISHED 12:15:00 INFO - INFO | automation.py | Application ran for: 0:01:40.592312 12:15:00 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpFIKeeHpidlog 12:15:00 INFO - /data/anr/traces.txt not found 12:15:01 INFO - WARNING | leakcheck | refcount logging is off, so leaks can't be detected! 12:15:01 INFO - runtests.py | Running tests: end. 12:15:09 INFO - MochitestServer : launching [u'/builds/panda-0270/test/build/hostutils/bin/xpcshell', '-g', '/builds/panda-0270/test/build/hostutils/xre', '-v', '170', '-f', '/builds/panda-0270/test/build/hostutils/bin/components/httpd.js', '-e', "const _PROFILE_PATH = '/tmp/tmpcRRPj1'; const _SERVER_PORT = '30270'; const _SERVER_ADDR = '10.12.130.18'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', './server.js'] 12:15:09 INFO - runtests.py | Server pid: 16259 12:15:10 INFO - runtests.py | Running tests: start. 12:16:36 INFO - Robocop process name: org.mozilla.fennec 12:17:27 INFO - Traceback (most recent call last): .... 12:18:01 INFO - Request 'http://mobile-imaging-003.p3.releng.scl1.mozilla.com/api/request/860091/' deleted on cleanup Mozpool agrees with that: 2014-01-31T11:42:40 statemachine sending imaging result 'complete' to Mozpool 2014-01-31T11:42:40 statemachine entering state ready 2014-01-31T12:22:53 sut connecting to SUT agent 2014-01-31T12:33:03 sut connecting to SUT agent 2014-01-31T12:43:04 sut connecting to SUT agent So that is 33 minutes instead of 30-on-the-dot. With mozpool never checking on the device until ~4 minutes after our request was cleaned up, after the job was done. So while _this_ is fixed, it certainly spells of some other issue in the test harness/tests

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: General Automation → General

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Increase mozpool request time

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

People

(Reporter: gbrown, Assigned: Callek)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Updated

Comment 6

Updated

Comment 7

Comment 8

Updated

Comment 9

Comment 10

Comment 11

Updated

Attachment

General

Description

File Name

Content Type