Closed Bug 936615 Opened 11 years ago Closed 11 years ago

Pandas should start buildbot when mozpool thinks the device is ready

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: Callek)

References

Details

Attachments

(1 file)

We currently run through a set of android sanity checks for pandas, and never contact mozpool.

Mozpool however could leave a device in the "self test" image and declare it ready, which is not suiteable to pass the verify.py script. We should make verify.py call out to mozpool for state and go on to starting buildbot when it is in said ready state.
Depends on: 936616
CCing sheriffs since this is the bug that will fix the panda pool shortage. See bug 932231 comment 2 for more context :-)
My current patch has the following annoyances, but IMHO is still an improvement over the current status, so I will move forward with it:

* Increases end to end time by ~10 minutes when mozpool needs to image a device to android from the self test image
* Increases end to end time by ~5 minutes when verify.py/cleanup needs to do tasks (e.g. uninstalling fennec)
* Some instances in staging errors of:
19:18:11     INFO -  11/11/2013 19:18:11: INFO: Uninstalling org.mozilla.fennec...
19:18:12     INFO -  11/11/2013 19:18:12: INFO: Calling PDU powercycle for panda-0309, panda-relay-027.p3.releng.scl1.mozilla.com:2:4
--- followed by the device failing to properly come up/be sane. This may not be widespread but something to pay attention to, it could easily be related to not using mozpool for the reboots.

It does however allow us to start buildbot even when we're not "Android" though, so I'm happy.
Pete is no python expert but has touched much of this code, armenzg is my favorite for being familiar with mozpoolclient.

I'm open to bikeshed color improvement suggestions in this as well. (I'm not a big fan of most of my color choices)
Attachment #830612 - Flags: review?(armenzg)
Attachment #830612 - Flags: feedback?(pmoore)
Comment on attachment 830612 [details] [diff] [review]
[tools] mozpoolclient

Note that mozpoolclient-0.1.4/* is a dump from http://puppetagain.pub.build.mozilla.org/data/python/packages/mozpoolclient-0.1.4.tar.gz -- so is a static import of the version of the release mozpoolclient package, no review is necessary in that code
Comment on attachment 830612 [details] [diff] [review]
[tools] mozpoolclient

Review of attachment 830612 [details] [diff] [review]:
-----------------------------------------------------------------

I think we have room to do more in the cases for MOZPOOL_STATE_ERROR and MOZPOOL_STATE_MISSING, however, getting this in should already give us benefits.

Thanks Callek.

::: sut_tools/verify.py
@@ +21,5 @@
>  EXPECTED_DEVICE_SCREEN = 'X:1024 Y:768'
>  EXPECTED_DEVICE_SCREEN_ARGS = {'width': 1024, 'height': 768, 'type': 'crt'}
>  
> +MOZPOOL_CNAME = "mobile-imaging"
> +# CONSTS corresponding to mozpool related return codes.

If you could paste a URL to the codes on the actual mozpool code that would be a bonus.
I say this as they might have more info as to what each means.

@@ +71,5 @@
>              break  # we're done here
>      return True
>  
> +def isMozpoolReady(device):
> +    """ Checks if we mozpool is available and is ready

"available" and "ready" sound very similar in this context.
Are you referring to "mozpool, the host, is available" and "the device is on the ready" state?
I think so from the rest of the code.

s/we//

@@ +82,5 @@
> +    import socket
> +    default_timeout = socket.getdefaulttimeout()
> +    socket.setdefaulttimeout(5) # Don't let networking delay us too long
> +    try:
> +        dummy = socket.gethostbyname(MOZPOOL_CNAME)

I don't think we need "dummy" at all. It doesn't bother me though.

@@ +84,5 @@
> +    socket.setdefaulttimeout(5) # Don't let networking delay us too long
> +    try:
> +        dummy = socket.gethostbyname(MOZPOOL_CNAME)
> +    except:
> +        log.info("No mozpool server in this VLAN")

Would log.warning() (or higher) make more sense?
Attachment #830612 - Flags: review?(armenzg) → review+
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #5)
> Comment on attachment 830612 [details] [diff] [review]
> [tools] mozpoolclient
> 
> Review of attachment 830612 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think we have room to do more in the cases for MOZPOOL_STATE_ERROR and
> MOZPOOL_STATE_MISSING, however, getting this in should already give us
> benefits.
> 
> Thanks Callek.
> 
> ::: sut_tools/verify.py
> @@ +21,5 @@
> >  EXPECTED_DEVICE_SCREEN = 'X:1024 Y:768'
> >  EXPECTED_DEVICE_SCREEN_ARGS = {'width': 1024, 'height': 768, 'type': 'crt'}
> >  
> > +MOZPOOL_CNAME = "mobile-imaging"
> > +# CONSTS corresponding to mozpool related return codes.
> 
> If you could paste a URL to the codes on the actual mozpool code that would
> be a bonus.
> I say this as they might have more info as to what each means.

These are not actually return codes from mozpool itself, just the verify function I added here, I reworded to "# CONSTS corresponding to mozpool check return codes."

> @@ +71,5 @@
> >              break  # we're done here
> >      return True
> >  
> > +def isMozpoolReady(device):
> > +    """ Checks if we mozpool is available and is ready
> 
> "available" and "ready" sound very similar in this context.
> Are you referring to "mozpool, the host, is available" and "the device is on
> the ready" state?
> I think so from the rest of the code.
> 
> s/we//

Indeed on both points, fixed.

> @@ +82,5 @@
> > +    import socket
> > +    default_timeout = socket.getdefaulttimeout()
> > +    socket.setdefaulttimeout(5) # Don't let networking delay us too long
> > +    try:
> > +        dummy = socket.gethostbyname(MOZPOOL_CNAME)
> 
> I don't think we need "dummy" at all. It doesn't bother me though.

removed dummy

> @@ +84,5 @@
> > +    socket.setdefaulttimeout(5) # Don't let networking delay us too long
> > +    try:
> > +        dummy = socket.gethostbyname(MOZPOOL_CNAME)
> > +    except:
> > +        log.info("No mozpool server in this VLAN")
> 
> Would log.warning() (or higher) make more sense?

As mentioned on irc, this code path is hit for tegras as well, so I didn't want to log as a warning or worse when the state is expected for a portion of our pool for now, we can always change in the future though.

-- Also fixed up pep8 fallacies, an .hgignore mistake (.egg-info) and added .egg-info for mozpoolclient to patch.

Landed with https://hg.mozilla.org/build/tools/rev/9c1045ced0c6
This is now deployed, feel free to file bugs or ping me if you see anything "weird" suspected of being caused by this bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Great - thank you :-)
Could you file a bug for optimising the end-to-end time increases highlighted in comment 2 please?
Flags: needinfo?(bugspam.Callek)
Done - Bug 938060
Flags: needinfo?(bugspam.Callek)
Comment on attachment 830612 [details] [diff] [review]
[tools] mozpoolclient

Review of attachment 830612 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work Callek. :)

Should we take any steps to ensure that the cloned mozpoolclient code stays up-to-date in this repo?
Attachment #830612 - Flags: feedback?(pmoore) → feedback+
(In reply to Pete Moore [:pete][:pmoore] from comment #9)
> Done - Bug 938060

Thank you :-)
Blocks: 938060
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: