Closed Bug 920153 Opened 10 years ago Closed 9 years ago

Cloning of hg.mozilla.org/build/tools and hg.mozilla.org/integration/gaia-central often times out, as does downloading/unzipping test zips

Categories

(Release Engineering :: General, defect)

Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: mozilla)

References

Details

(Keywords: intermittent-failure, sheriffing-P1)

Attachments

(8 files, 1 obsolete file)

e.g.:
https://tbpl.mozilla.org/php/getParsedLog.php?id=28094665&tree=Mozilla-Inbound

{
08:00:05     INFO - Running command: ['tar', 'zxf', '/builds/slave/test/build/emulator.tar.gz'] in /builds/slave/test/build/emulator
08:00:05     INFO - Copy/paste: tar zxf /builds/slave/test/build/emulator.tar.gz
08:00:10     INFO - Return code: 0
08:00:10     INFO - retry: Calling <bound method B2GEmulatorTest._get_revision of <__main__.B2GEmulatorTest object at 0x1fa3f50>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x201b450>, '/builds/slave/test/build/tools'), kwargs: {}, attempt #1
08:00:10     INFO - Setting /builds/slave/test/build/tools to http://hg.mozilla.org/build/tools.
08:00:10     INFO - Cloning http://hg.mozilla.org/build/tools to /builds/slave/test/build/tools.
08:00:10     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'http://hg.mozilla.org/build/tools', '/builds/slave/test/build/tools']
08:00:10     INFO - Copy/paste: hg --config ui.merge=internal:merge clone http://hg.mozilla.org/build/tools /builds/slave/test/build/tools

command timed out: 1200 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1246.914894
========= Finished '/tools/buildbot/bin/python scripts/scripts/b2g_emulator_unittest.py ...' failed (results: 2, elapsed: 20 mins, 55 secs) (at 2013-09-19 08:20:19.665482) =========
}

Expected:
* A few retries of the hg clone
* "Automation Error: Unable to clone build tools repo" (and no "command timed out: 1200 seconds without output, attempting to kill").

I thought bug 840305 had fixed this..?
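
For illustration, the expected behaviour above (retry a few times, then fail with a TBPL-parsable message before buildbot's 1200-second kill fires) could be sketched as below. This is a minimal sketch and not mozharness's actual code: the names `run_with_output_timeout` and `clone_with_retries` and their parameters are hypothetical, and the default of 1000 seconds only mirrors the `output_timeout` value seen in the logs in this bug.

```python
import subprocess
import sys
import threading
import time


def run_with_output_timeout(cmd, output_timeout, poll_interval=0.05):
    """Run cmd and kill it if it prints nothing for output_timeout seconds.

    Returns the exit code, or None if the command was killed for being silent.
    (Hypothetical helper; only the direct child process is killed.)
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    last_output = [time.monotonic()]  # list so the reader thread can update it

    def pump():
        # Echo each output line and record when it arrived.
        for line in proc.stdout:
            last_output[0] = time.monotonic()
            sys.stdout.write(line.decode(errors="replace"))

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > output_timeout:
            proc.kill()
            timed_out = True
            break
        time.sleep(poll_interval)

    proc.wait()
    reader.join(timeout=1)
    return None if timed_out else proc.returncode


def clone_with_retries(cmd, attempts=3, output_timeout=1000, sleep_time=0,
                       failure_message="Automation Error: Unable to clone build tools repo"):
    """Try cmd up to `attempts` times; print failure_message if all attempts fail."""
    for attempt in range(1, attempts + 1):
        print("retry: attempt #%d" % attempt)
        if run_with_output_timeout(cmd, output_timeout) == 0:
            return True
        # A real harness would also remove any partial clone here, since
        # hg refuses to clone into a non-empty destination directory.
        time.sleep(sleep_time)
    print(failure_message)
    return False
```

With something like this, the clone command from the log would be passed as `cmd`, with `attempts * output_timeout` chosen so the harness always reports its own error before the outer "1200 seconds without output" limit is reached.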
Summary: Cloning of build tools does not retry and/or output a TBPL compatible failure message → Cloning of build tools does not retry and/or output a TBPL compatible failure message ("command timed out: 1200 seconds without output, attempting to kill")
Blocks: 840305
There's also this (non-retrying) failure mode:

https://tbpl.mozilla.org/php/getParsedLog.php?id=28035195&tree=Mozilla-Inbound

{
05:26:02     INFO - #####
05:26:02     INFO - ##### Running pull step.
05:26:02     INFO - #####
05:26:02     INFO - Running pre-action listener: _resource_record_pre_action
05:26:02     INFO - Running main action method: pull
05:26:02     INFO - Changing directory to /builds/slave/talos-slave/test/build.
05:26:02     INFO - retry: Calling <bound method DesktopUnittest._get_revision of <__main__.DesktopUnittest object at 0x1010dc7d0>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x100409fd0>, 'tools'), kwargs: {}, attempt #1
05:26:02     INFO - Setting /builds/slave/talos-slave/test/build/tools to http://hg.mozilla.org/build/tools.
05:26:02     INFO - Cloning http://hg.mozilla.org/build/tools to /builds/slave/talos-slave/test/build/tools.
05:26:02     INFO - Getting output from command: ['/builds/slave/talos-slave/test/build/venv/bin/python', '-c', 'from distutils.sysconfig import get_python_lib; print(get_python_lib())']
05:26:02     INFO - Copy/paste: /builds/slave/talos-slave/test/build/venv/bin/python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"
05:26:02     INFO - Copy/paste: /builds/slave/talos-slave/test/build/venv/bin/python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"
05:26:02     INFO - Reading from file tmpfile_stdout
05:26:02     INFO - Output received:
05:26:02     INFO -  /builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages
05:26:02     INFO - retry: Calling <built-in function remove> with args: ('tmpfile_stderr',), kwargs: {}, attempt #1
05:26:02     INFO - retry: Calling <built-in function remove> with args: ('tmpfile_stdout',), kwargs: {}, attempt #1
05:26:02     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'http://hg.mozilla.org/build/tools', '/builds/slave/talos-slave/test/build/tools']
05:26:02     INFO - Copy/paste: hg --config ui.merge=internal:merge clone http://hg.mozilla.org/build/tools /builds/slave/talos-slave/test/build/tools
05:26:02     INFO - Calling ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'http://hg.mozilla.org/build/tools', '/builds/slave/talos-slave/test/build/tools'] with output_timeout 1000
05:42:42     INFO - mozprocess timed out
05:42:42    ERROR - timed out after 1000 seconds of no output
05:42:42    ERROR - Return code: 9
}
Keywords: sheriffing-P1
Chris, is there someone that could take a look at this for us? :-)
Flags: needinfo?(catlee)
Should we add --verbose and --debug to the clone command to diagnose what is going on?
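
As a rough sketch of what that could look like, assuming the command is built as a list like the one in the logs above (the function name `build_clone_command` and its `diagnostics` parameter are made up for this example; `--verbose` and `--debug` are Mercurial's global options):

```python
def build_clone_command(repo, dest, diagnostics=False):
    """Build the hg clone command list, optionally with diagnostic flags."""
    cmd = ["hg", "--config", "ui.merge=internal:merge"]
    if diagnostics:
        # Global hg options: more output about what the clone is doing.
        cmd += ["--verbose", "--debug"]
    cmd += ["clone", repo, dest]
    return cmd
```

For example, `build_clone_command("http://hg.mozilla.org/build/tools", "/builds/slave/test/build/tools", diagnostics=True)` yields the same command as in the log with the two flags added before `clone`. The extra output may also show how far the clone got before it went silent.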
I was going to file a separate bug on the gaia tests cloning https://hg.mozilla.org//integration/gaia-central, but I guess we long since decided to just throw them in here.
Summary: Cloning of build tools does not retry and/or output a TBPL compatible failure message ("command timed out: 1200 seconds without output, attempting to kill") → Cloning of build tools and gaia repo does not retry and/or output a TBPL compatible failure message ("command timed out: 1200 seconds without output, attempting to kill")
The gaia clone also doesn't use hg share, the hg mirror, or a bundle (which admittedly doesn't exist). That is far from ideal given that gaia-central's .hg is more than a GB in size, which approaches repos like mozilla-central.
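
To make the bundle option concrete, here is a sketch of the command sequence a bundle-seeded checkout might use instead of a full clone over the network. It is an assumption-laden illustration only: as noted, no such bundle exists, so `bundle_path` is hypothetical, as is the function name. The hg subcommands themselves (`init`, `unbundle`, `pull`, `update`) are standard Mercurial.

```python
def bundle_seeded_clone_commands(repo_url, dest, bundle_path, revision=None):
    """Return the hg commands for seeding dest from a local bundle, then pulling.

    bundle_path is a hypothetical pre-downloaded bundle file.
    """
    cmds = [
        ["hg", "init", dest],                         # create an empty repo
        ["hg", "-R", dest, "unbundle", bundle_path],  # load history from the bundle
        ["hg", "-R", dest, "pull", repo_url],         # fetch only what is newer
    ]
    update = ["hg", "-R", dest, "update"]
    if revision:
        update += ["-r", revision]                    # check out the wanted revision
    cmds.append(update)
    return cmds
```

The idea is that only the changesets newer than the bundle travel over the connection to hg.mozilla.org, which should shrink the window in which the transfer can stall.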
But at least it gives us a handy indicator for the slightly reduced phase of bug 957502, in which we don't get the failures downloading from ftp.m.o, but we do see this failure go from one every eight hours to eight an hour.
Blocks: 960072
Breaking out the "output a TBPL compatible failure message & attempt to clone more than once before giving up" parts to other bugs, one for build tools cloning:

https://tbpl.mozilla.org/php/getParsedLog.php?id=32237843&tree=Mozilla-Inbound
{
12:50:20     INFO - retry: Calling <bound method B2GDesktopTest._get_revision of <__main__.B2GDesktopTest object at 0x23f45d0>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x2403910>, '/builds/slave/test/build/tools'), kwargs: {}, attempt #1
12:50:20     INFO - Setting /builds/slave/test/build/tools to http://hg.mozilla.org/build/tools.
12:50:20     INFO - Cloning http://hg.mozilla.org/build/tools to /builds/slave/test/build/tools.
12:50:20     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'http://hg.mozilla.org/build/tools', '/builds/slave/test/build/tools']
12:50:20     INFO - Copy/paste: hg --config ui.merge=internal:merge clone http://hg.mozilla.org/build/tools /builds/slave/test/build/tools

command timed out: 1200 seconds without output, attempting to kill
}

And the other for gaia-central cloning:
https://tbpl.mozilla.org/php/getParsedLog.php?id=33150237&tree=B2g-Inbound
{
22:01:35     INFO - #####
22:01:35     INFO - ##### Running pull step.
22:01:35     INFO - #####
22:01:35     INFO - Running pre-action listener: _resource_record_pre_action
22:01:35     INFO - Running main action method: pull
22:01:35     INFO - retry: Calling <bound method GaiaIntegrationTest.load_json_from_url of <__main__.GaiaIntegrationTest object at 0x995bfcc>> with args: ('https://hg.mozilla.org/integration/b2g-inbound/raw-file/328bad2599f2/b2g/config/gaia.json',), kwargs: {}, attempt #1
22:01:41     INFO - Changing directory to /builds/slave/test.
22:01:41     INFO - retry: Calling <bound method GaiaIntegrationTest._get_revision of <__main__.GaiaIntegrationTest object at 0x995bfcc>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x99352ec>, '/builds/slave/test/gaia'), kwargs: {}, attempt #1
22:01:41     INFO - Setting /builds/slave/test/gaia to https://hg.mozilla.org//integration/gaia-central revision 9e00ea980c1de438590396b5d990b3a567d2edc6.
22:01:41     INFO - Cloning https://hg.mozilla.org//integration/gaia-central to /builds/slave/test/gaia.
22:01:41     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'https://hg.mozilla.org//integration/gaia-central', '/builds/slave/test/gaia']
22:01:41     INFO - Copy/paste: hg --config ui.merge=internal:merge clone https://hg.mozilla.org//integration/gaia-central /builds/slave/test/gaia

command timed out: 1200 seconds without output, attempting to kill
}
Summary: Cloning of build tools and gaia repo does not retry and/or output a TBPL compatible failure message ("command timed out: 1200 seconds without output, attempting to kill") → Cloning of hg.mozilla.org/build/tools and hg.mozilla.org/integration/gaia-central often times out with "command timed out: 1200 seconds without output, attempting to kill"
Depends on: 961048