Closed Bug 961048 Opened 6 years ago Closed 6 years ago

Mozharness' vcs_checkout() should attempt repo cloning more than once & output a TBPL compatible failure message

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: aki)

References

(Blocks 1 open bug)

Details

(Keywords: sheriffing-P1)

Attachments

(1 file)

To avoid having to open the full log (and to differentiate between the various buildbot "command timed out: 1200 seconds without output, attempting to kill" failures), we should:

1) Attempt the clone/pull more than once, so temporary network glitches are less likely to cause job failures.

2) Add a TBPL-compatible failure message (e.g. "Automation Error: Repo sync failed ...").

Happy to defer #1 to another bug if needed.
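
As a rough illustration of the two points above, here is a minimal sketch that retries a clone command and prints a single parseable failure line once all attempts are exhausted. The helper name `clone_with_retries` and its parameters are hypothetical, not mozharness' actual `vcs_checkout()` API, and the `timeout` used here is a total wall-clock limit per attempt rather than an idle (no-output) timeout.

```python
import subprocess
import time


def clone_with_retries(cmd, attempts=3, sleep_seconds=30, timeout=None,
                       sleep=time.sleep):
    """Run cmd up to `attempts` times; return True on the first success.

    Hypothetical helper for illustration only; not part of mozharness.
    """
    for attempt in range(1, attempts + 1):
        print("retry: running %s, attempt #%d" % (cmd, attempt))
        try:
            # subprocess.run kills the child and raises if `timeout` expires.
            if subprocess.run(cmd, timeout=timeout).returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            print("timed out after %s seconds" % timeout)
        if attempt < attempts:
            sleep(sleep_seconds)
    # One line that a log parser can match without opening the full log.
    print("Automation Error: Repo sync failed after %d attempts" % attempts)
    return False
```

It could be called as, say, `clone_with_retries(['hg', 'clone', 'http://hg.mozilla.org/build/tools', 'tools'], timeout=600)`. A real implementation would probably also need to clean up a partially created destination directory between attempts.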

Current failures are of the form:

For build tools cloning:
https://tbpl.mozilla.org/php/getParsedLog.php?id=32237843&tree=Mozilla-Inbound
{
12:50:20     INFO - retry: Calling <bound method B2GDesktopTest._get_revision of <__main__.B2GDesktopTest object at 0x23f45d0>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x2403910>, '/builds/slave/test/build/tools'), kwargs: {}, attempt #1
12:50:20     INFO - Setting /builds/slave/test/build/tools to http://hg.mozilla.org/build/tools.
12:50:20     INFO - Cloning http://hg.mozilla.org/build/tools to /builds/slave/test/build/tools.
12:50:20     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'http://hg.mozilla.org/build/tools', '/builds/slave/test/build/tools']
12:50:20     INFO - Copy/paste: hg --config ui.merge=internal:merge clone http://hg.mozilla.org/build/tools /builds/slave/test/build/tools

command timed out: 1200 seconds without output, attempting to kill
}

For gaia-central cloning:
https://tbpl.mozilla.org/php/getParsedLog.php?id=33150237&tree=B2g-Inbound
{
22:01:35     INFO - #####
22:01:35     INFO - ##### Running pull step.
22:01:35     INFO - #####
22:01:35     INFO - Running pre-action listener: _resource_record_pre_action
22:01:35     INFO - Running main action method: pull
22:01:35     INFO - retry: Calling <bound method GaiaIntegrationTest.load_json_from_url of <__main__.GaiaIntegrationTest object at 0x995bfcc>> with args: ('https://hg.mozilla.org/integration/b2g-inbound/raw-file/328bad2599f2/b2g/config/gaia.json',), kwargs: {}, attempt #1
22:01:41     INFO - Changing directory to /builds/slave/test.
22:01:41     INFO - retry: Calling <bound method GaiaIntegrationTest._get_revision of <__main__.GaiaIntegrationTest object at 0x995bfcc>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x99352ec>, '/builds/slave/test/gaia'), kwargs: {}, attempt #1
22:01:41     INFO - Setting /builds/slave/test/gaia to https://hg.mozilla.org//integration/gaia-central revision 9e00ea980c1de438590396b5d990b3a567d2edc6.
22:01:41     INFO - Cloning https://hg.mozilla.org//integration/gaia-central to /builds/slave/test/gaia.
22:01:41     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'https://hg.mozilla.org//integration/gaia-central', '/builds/slave/test/gaia']
22:01:41     INFO - Copy/paste: hg --config ui.merge=internal:merge clone https://hg.mozilla.org//integration/gaia-central /builds/slave/test/gaia

command timed out: 1200 seconds without output, attempting to kill
}

The current retry isn't working:
http://hg.mozilla.org/build/mozharness/file/3f764317c8db/mozharness/base/vcs/vcsbase.py#l81
    81         return self.retry(
    82             self._get_revision,
    83             error_level=error_level,
    84             error_message="Can't checkout %s!" % kwargs['repo'],
    85             args=(vcs_obj, kwargs['dest']),
    86         )
Blocks: 778688
Depends on: 965519
I think https://bugzilla.mozilla.org/show_bug.cgi?id=920153#c702 shows that 1) is already the case, when we set an idle timeout (to be done in bug 920153).

Now that bug 965519 landed, I can add the idle timeout to the other scripts in mozharness without having to install mozprocess in a venv first.  I'll add the "Automation Error:" magic string there too.
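
For context on why an idle timeout matters here: a retry wrapper can only start a second attempt once the first call returns or raises, and in the logs above the clone appears to hang silently until buildbot kills the whole job. Below is a minimal standard-library sketch of an idle (no-output) timeout. It is not the mozprocess API that mozharness uses; the name `run_with_idle_timeout` and its return shape are assumptions made for illustration.

```python
import queue
import subprocess
import sys
import threading


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd; kill it if it prints nothing for idle_timeout seconds.

    Returns (returncode, timed_out). Hypothetical helper, not mozprocess.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()

    def pump():
        # Forward each output line to the main thread; None marks EOF.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    timed_out = False
    while True:
        try:
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            # No output for idle_timeout seconds: give up on this attempt.
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
        sys.stdout.write(line)
    proc.wait()
    return proc.returncode, timed_out
```

With an idle timeout shorter than buildbot's 1200 seconds, a hung clone would return control to the caller, which could then retry or log an "Automation Error:" line.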
Assignee: nobody → aki
Actually, this is a one-liner that's easy to attach here.
I'll resolve once this is landed/merged, and deal with the rest of the retries in the other bug.
Attachment #8372783 - Flags: review?(catlee)
Attachment #8372783 - Flags: review?(catlee) → review+
in production
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: General Automation → Mozharness