Closed
Bug 936615
Opened 11 years ago
Closed 11 years ago
Pandas should start buildbot when mozpool thinks the device is ready
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: Callek)
References
Details
Attachments
(1 file)
23.58 KB,
patch
|
armenzg
:
review+
pmoore
:
feedback+
|
Details | Diff | Splinter Review |
We currently run through a set of android sanity checks for pandas, and never contact mozpool. Mozpool however could leave a device in the "self test" image and declare it ready, which is not suiteable to pass the verify.py script. We should make verify.py call out to mozpool for state and go on to starting buildbot when it is in said ready state.
Comment 1•11 years ago
|
||
CCing sheriffs since this is the bug that will fix the panda pool shortage. See bug 932231 comment 2 for more context :-)
Assignee | ||
Comment 2•11 years ago
|
||
My current patch has the following annoyances, but IMHO is still an improvement over the current status, so I will move forward with it: * Increases end to end time by ~10 minutes when mozpool needs to image a device to android from the self test image * Increases end to end time by ~5 minutes when verify.py/cleanup needs to do tasks (e.g. uninstalling fennec) * Some instances in staging errors of: 19:18:11 INFO - 11/11/2013 19:18:11: INFO: Uninstalling org.mozilla.fennec... 19:18:12 INFO - 11/11/2013 19:18:12: INFO: Calling PDU powercycle for panda-0309, panda-relay-027.p3.releng.scl1.mozilla.com:2:4 --- followed by the device failing to properly come up/be sane. This may not be widespread but something to pay attention to, it could easily be related to not using mozpool for the reboots. It does however allow us to start buildbot even when we're not "Android" though, so I'm happy.
Assignee | ||
Comment 3•11 years ago
|
||
Pete is no python expert but has touched much of this code, armenzg is my favorite for being familiar with mozpoolclient. I'm open to bikeshed color improvement suggestions in this as well. (I'm not a big fan of most of my color choices)
Attachment #830612 -
Flags: review?(armenzg)
Attachment #830612 -
Flags: feedback?(pmoore)
Assignee | ||
Comment 4•11 years ago
|
||
Comment on attachment 830612 [details] [diff] [review] [tools] mozpoolclient Note that mozpoolclient-0.1.4/* is a dump from http://puppetagain.pub.build.mozilla.org/data/python/packages/mozpoolclient-0.1.4.tar.gz -- so is a static import of the version of the release mozpoolclient package, no review is necessary in that code
Comment 5•11 years ago
|
||
Comment on attachment 830612 [details] [diff] [review] [tools] mozpoolclient Review of attachment 830612 [details] [diff] [review]: ----------------------------------------------------------------- I think we have room to do more in the cases for MOZPOOL_STATE_ERROR and MOZPOOL_STATE_MISSING, however, getting this in should already give us benefits. Thanks Callek. ::: sut_tools/verify.py @@ +21,5 @@ > EXPECTED_DEVICE_SCREEN = 'X:1024 Y:768' > EXPECTED_DEVICE_SCREEN_ARGS = {'width': 1024, 'height': 768, 'type': 'crt'} > > +MOZPOOL_CNAME = "mobile-imaging" > +# CONSTS corresponding to mozpool related return codes. If you could paste a URL to the codes on the actual mozpool code that would be a bonus. I say this as they might have more info as to what each means. @@ +71,5 @@ > break # we're done here > return True > > +def isMozpoolReady(device): > + """ Checks if we mozpool is available and is ready "available" and "ready" sound very similar in this context. Are you referring to "mozpool, the host, is available" and "the device is on the ready" state? I think so from the rest of the code. s/we// @@ +82,5 @@ > + import socket > + default_timeout = socket.getdefaulttimeout() > + socket.setdefaulttimeout(5) # Don't let networking delay us too long > + try: > + dummy = socket.gethostbyname(MOZPOOL_CNAME) I don't think we need "dummy" at all. It doesn't bother me though. @@ +84,5 @@ > + socket.setdefaulttimeout(5) # Don't let networking delay us too long > + try: > + dummy = socket.gethostbyname(MOZPOOL_CNAME) > + except: > + log.info("No mozpool server in this VLAN") Would log.warning() (or higher) make more sense?
Attachment #830612 -
Flags: review?(armenzg) → review+
Assignee | ||
Comment 6•11 years ago
|
||
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #5) > Comment on attachment 830612 [details] [diff] [review] > [tools] mozpoolclient > > Review of attachment 830612 [details] [diff] [review]: > ----------------------------------------------------------------- > > I think we have room to do more in the cases for MOZPOOL_STATE_ERROR and > MOZPOOL_STATE_MISSING, however, getting this in should already give us > benefits. > > Thanks Callek. > > ::: sut_tools/verify.py > @@ +21,5 @@ > > EXPECTED_DEVICE_SCREEN = 'X:1024 Y:768' > > EXPECTED_DEVICE_SCREEN_ARGS = {'width': 1024, 'height': 768, 'type': 'crt'} > > > > +MOZPOOL_CNAME = "mobile-imaging" > > +# CONSTS corresponding to mozpool related return codes. > > If you could paste a URL to the codes on the actual mozpool code that would > be a bonus. > I say this as they might have more info as to what each means. These are not actually return codes from mozpool itself, just the verify function I added here, I reworded to "# CONSTS corresponding to mozpool check return codes." > @@ +71,5 @@ > > break # we're done here > > return True > > > > +def isMozpoolReady(device): > > + """ Checks if we mozpool is available and is ready > > "available" and "ready" sound very similar in this context. > Are you referring to "mozpool, the host, is available" and "the device is on > the ready" state? > I think so from the rest of the code. > > s/we// Indeed on both points, fixed. > @@ +82,5 @@ > > + import socket > > + default_timeout = socket.getdefaulttimeout() > > + socket.setdefaulttimeout(5) # Don't let networking delay us too long > > + try: > > + dummy = socket.gethostbyname(MOZPOOL_CNAME) > > I don't think we need "dummy" at all. It doesn't bother me though. removed dummy > @@ +84,5 @@ > > + socket.setdefaulttimeout(5) # Don't let networking delay us too long > > + try: > > + dummy = socket.gethostbyname(MOZPOOL_CNAME) > > + except: > > + log.info("No mozpool server in this VLAN") > > Would log.warning() (or higher) make more sense? As mentioned on irc, this code path is hit for tegras as well, so I didn't want to log as a warning or worse when the state is expected for a portion of our pool for now, we can always change in the future though. -- Also fixed up pep8 fallacies, an .hgignore mistake (.egg-info) and added .egg-info for mozpoolclient to patch. Landed with https://hg.mozilla.org/build/tools/rev/9c1045ced0c6
Assignee | ||
Comment 7•11 years ago
|
||
This is now deployed, feel free to file bugs or ping me if you see anything "weird" suspected of being caused by this bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 8•11 years ago
|
||
Great - thank you :-) Could you file a bug for optimising the end-to-end time increases highlighted in comment 2 please?
Flags: needinfo?(bugspam.Callek)
Comment 10•11 years ago
|
||
Comment on attachment 830612 [details] [diff] [review] [tools] mozpoolclient Review of attachment 830612 [details] [diff] [review]: ----------------------------------------------------------------- Nice work Callek. :) Should we take any steps to ensure that the cloned mozpoolclient code stays up-to-date in this repo?
Attachment #830612 -
Flags: feedback?(pmoore) → feedback+
Comment 11•11 years ago
|
||
(In reply to Pete Moore [:pete][:pmoore] from comment #9) > Done - Bug 938060 Thank you :-)
Blocks: 938060
Updated•6 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•