mozpool - A failing device request should retrigger the job

RESOLVED FIXED

Status

Release Engineering
General
RESOLVED FIXED
6 years ago
2 months ago

People

(Reporter: armenzg, Assigned: armenzg)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

6 years ago
Created attachment 694515
retry on failure to request a device

https://tbpl.mozilla.org/php/getParsedLog.php?id=18132101&tree=Cedar&full=1#error0

08:43:59     INFO - #####
08:43:59     INFO - ##### Running request-device step.
08:43:59     INFO - #####
08:43:59     INFO - Getting output from command: ['/builds/panda-0119/test/build/venv/bin/python', '-c', 'from distutils.sysconfig import get_python_lib; print(get_python_lib())']
08:43:59     INFO - Copy/paste: /builds/panda-0119/test/build/venv/bin/python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"
08:43:59     INFO - Reading from file tmpfile_stdout
08:43:59     INFO - Output received:
08:43:59     INFO -  /builds/panda-0119/test/build/venv/lib/python2.7/site-packages
08:43:59     INFO - Request POST http://mobile-imaging-001.p1.releng.scl1.mozilla.com/api/device/panda-0119/request/...
08:43:59    ERROR - Bad return status from http://mobile-imaging-001.p1.releng.scl1.mozilla.com/api/device/panda-0119/request/: 500!
Traceback (most recent call last):
  File "scripts/scripts/b2g_panda.py", line 140, in <module>
    pandaTest.run()
  File "/builds/panda-0119/test/scripts/mozharness/base/script.py", line 730, in run
    self._possibly_run_method(method_name, error_if_missing=True)
  File "/builds/panda-0119/test/scripts/mozharness/base/script.py", line 687, in _possibly_run_method
    return getattr(self, method_name)()
  File "scripts/scripts/b2g_panda.py", line 95, in request_device
    b2gbase=b2gbase, pxe_config=None)
  File "/builds/panda-0119/test/scripts/mozharness/mozilla/testing/mozpool.py", line 294, in request_device
    check_mozpool_status(status)
  File "/builds/panda-0119/test/scripts/mozharness/mozilla/testing/mozpool.py", line 70, in check_mozpool_status
    raise MozpoolException('mozpool status not ok, code %s' % pprint.pformat(status))
mozharness.mozilla.testing.mozpool.MozpoolException: mozpool status not ok, code 500
program finished with exit code 1
elapsedTime=2.393243
Attachment #694515 - Flags: review?(aki)
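The traceback shows the request-device step dying inside a status check that raises MozpoolException when mozpool answers with a non-OK HTTP status. Below is a minimal, self-contained sketch of that pattern. It is not the actual mozharness code. MozpoolException is a local stand-in, and the rule that only 2xx statuses count as OK is an assumption; the real check_mozpool_status in mozpool.py may accept a different range.

```python
import pprint


class MozpoolException(Exception):
    """Local stand-in for mozharness.mozilla.testing.mozpool.MozpoolException."""


def check_mozpool_status(status):
    # Assumption: any 2xx status is "ok"; everything else (e.g. 500) is an error.
    if not 200 <= status < 300:
        raise MozpoolException('mozpool status not ok, code %s' % pprint.pformat(status))
```

With this sketch, check_mozpool_status(500) raises with the same message seen in the log ("mozpool status not ok, code 500"), which is why the script aborted with exit code 1 instead of asking for a retry.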
(Assignee)

Updated

6 years ago
Assignee: nobody → armenzg
Blocks: 819492

Comment 1

6 years ago
Comment on attachment 694515
retry on failure to request a device

Maybe

                self.buildbot_status(TBPL_RETRY)
                self.fatal("We could not request the device: %s" % str(e))

?
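The suggestion above catches the failed request, marks the build as RETRY so the job is retriggered, and then exits fatally. A minimal sketch of that control flow follows, assuming simplified stand-ins: PandaTestSketch and request_fn are hypothetical names, buildbot_status here only records the status (mozharness prints a TBPL marker), and fatal raises SystemExit to emulate mozharness logging a FATAL line and exiting. The value 'RETRY' for TBPL_RETRY is also an assumption.

```python
TBPL_RETRY = 'RETRY'  # assumed value; mozharness defines its own constant


class MozpoolException(Exception):
    """Local stand-in for the exception raised by the mozpool status check."""


class PandaTestSketch(object):
    """Hypothetical stand-in for the script class in b2g_panda.py."""

    def __init__(self, request_fn):
        self.request_fn = request_fn  # callable that performs the device request
        self.status = None            # last status passed to buildbot_status

    def buildbot_status(self, status):
        # Simplified: just remember the status instead of printing a TBPL marker.
        self.status = status

    def fatal(self, message):
        # Emulate a fatal error by exiting with the message.
        raise SystemExit(message)

    def request_device(self):
        try:
            self.request_fn()
        except MozpoolException as e:
            # Tell buildbot to retrigger the job, then stop this run.
            self.buildbot_status(TBPL_RETRY)
            self.fatal("We could not request the device: %s" % str(e))
```

Setting the status before calling fatal matters: once fatal exits, nothing else in the script runs, so the RETRY marker has to be emitted first.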
Attachment #694515 - Flags: review?(aki) → review+
(Assignee)

Comment 2

6 years ago
dustin, even though we got a 500 status code, the requests did go through.
In fact, they are still showing in mozpool:
http://mobile-imaging-001.p1.releng.scl1.mozilla.com/ui/mozpool.html

* We request a device [1]
* We then check the status returned by mozpool [2]

Somehow we got a 500 return status code.

[1] http://hg.mozilla.org/build/mozharness/file/tip/mozharness/mozilla/testing/mozpool.py#l291
[2] http://hg.mozilla.org/build/mozharness/file/tip/mozharness/mozilla/testing/mozpool.py#l294
http://hg.mozilla.org/build/mozharness/file/tip/mozharness/mozilla/testing/mozpool.py#l59
(Assignee)

Comment 3

6 years ago
Comment on attachment 694515
retry on failure to request a device

I also landed an import of MozpoolException from mozpool.py.
Attachment #694515 - Flags: checked-in+
(Assignee)

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Comment 4

6 years ago
15:26 armenzg: dustin: is dbcron something new that got added?
15:26 dustin: armenzg: there's a bug to add one
15:26 dustin: yeah, it was added late last week
15:26 armenzg: is this what caused the 500 issue?
15:27 dustin: yes
15:28 dustin: basically the tables to insert the log entries into weren't there
15:28 dustin: oh, no bug yet, but in my TODO list - "newbug - nagios check to monitor mozpool partitions"
15:28 dustin: bug 819186 introduced dbcron
15:28 bugbot: Bug https://bugzilla.mozilla.org/show_bug.cgi?id=819186 normal, --, ---, dustin, RESOLVED FIXED, use a crontask on the admin host, rather than a MySQL Scheduled Task, to create new log partitions
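This exchange explains the observation in comment 2: the device request was recorded, but writing the accompanying log entry failed because its table did not exist, so the client still saw a 500. The toy model below illustrates that ordering. It is not mozpool's code. Mozpool uses MySQL log partitions, simulated here with an ordinary table that may be missing. handle_request, the requests table, and the log-table names are all hypothetical.

```python
import sqlite3


def handle_request(conn, device, log_table):
    """Toy request handler: persist the request first, then write a log entry."""
    # Step 1 (assumed ordering): the request is committed before any logging.
    conn.execute("INSERT INTO requests (device) VALUES (?)", (device,))
    conn.commit()
    # Step 2: log the event. The table name is interpolated for illustration
    # only; this is not safe for untrusted input.
    try:
        conn.execute(
            "INSERT INTO %s (device, message) VALUES (?, ?)" % log_table,
            (device, "requested"),
        )
        conn.commit()
    except sqlite3.OperationalError:
        # e.g. "no such table": the log destination was never created.
        conn.rollback()
        return 500
    return 200
```

With a missing log table, handle_request returns 500 even though the row in requests has already been committed. That matches a client seeing an error while the request still shows up in the mozpool UI.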
Bug 823661 is the fix to this particular issue (dbcron wasn't running because the 'mysql' command wasn't installed), and bug 823666 is the bug to monitor the partitions.  Sheeri recommended this a few weeks ago, and I had it in my TODO but hadn't implemented it yet.  Shame on me!
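Bug 823666 is about monitoring the partitions so that a stalled dbcron gets noticed before inserts start failing. The following is a hypothetical sketch of what such a Nagios-style check could compute; it does not describe how that bug was actually implemented. Representing partitions as a set of datetime.date values and requiring coverage for a few days ahead are assumptions. The exit codes 0/1/2 are the standard Nagios plugin codes for OK/WARNING/CRITICAL.

```python
import datetime

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_partitions(existing, today, days_ahead=3):
    """Return (exit_code, message) given the dates that have log partitions.

    existing: set of datetime.date objects (hypothetical representation).
    """
    wanted = [today + datetime.timedelta(days=i) for i in range(days_ahead + 1)]
    missing = [d for d in wanted if d not in existing]
    if today in missing:
        # Inserts for today's log entries would fail right now.
        return CRITICAL, "CRITICAL: no log partition for today (%s)" % today.isoformat()
    if missing:
        # Still working, but dbcron has fallen behind.
        return WARNING, "WARNING: missing upcoming partitions: %s" % ", ".join(
            d.isoformat() for d in missing)
    return OK, "OK: log partitions exist through %s" % wanted[-1].isoformat()
```

A plugin wrapper would print the message and call sys.exit with the code. The WARNING level is what would have caught the dbcron failure here, because it fires while today's partition still exists but future ones are not being created.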
Product: mozilla.org → Release Engineering
Component: General Automation → General
Product: Release Engineering → Release Engineering