Closed Bug 961042 Opened 10 years ago Closed 10 years ago

b2g_build.py checkout_sources() should attempt |repo sync| more than once & output a TBPL compatible failure message

Categories

(Release Engineering :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: sheriffing-P1)

Attachments

(1 file)

To avoid having to open the full log (and to differentiate between the various buildbot "command timed out: 1200 seconds without output, attempting to kill" failures), we should:

1) Attempt |repo sync| more than once, so temporary network glitches are less likely to cause job failures.

2) Add a TBPL compatible failure message (eg: "Automation Error: Repo sync failed ...").

Happy to defer #1 to another bug if needed.
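
For illustration only, the two requests above could look roughly like the sketch below. It is not the mozharness code: the function name `sync_with_retries`, the `attempts`, `delay` and `run` parameters, and the per-attempt log line are all hypothetical, and it assumes the command reports failure through a non-zero exit code.

```python
import subprocess
import time


def sync_with_retries(cmd, attempts=3, delay=0, run=subprocess.call):
    """Run cmd up to `attempts` times; return True on the first zero exit code.

    `run` is injectable (hypothetical parameter) so the retry logic can be
    exercised without a real checkout.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return True
        print("repo sync attempt %d/%d failed" % (attempt, attempts))
        if attempt < attempts:
            time.sleep(delay)  # give a temporary network glitch time to clear
    # A single greppable line, in the form suggested in point 2 above.
    print("Automation Error: Repo sync failed after %d attempts" % attempts)
    return False
```

A caller would pass something like `["./repo", "sync"]` as `cmd` and fail the job when the function returns False.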

Current failures are of the form:

b2g_b2g-inbound_nexus-4_dep
https://tbpl.mozilla.org/php/getParsedLog.php?id=33144750&tree=B2g-Inbound
{
19:17:46     INFO - Running command: ['script', '-q', '-c', '/builds/slave/b2g_b2g-in_nexus-4_dep-0000000/build/repo sync'] in /builds/slave/b2g_b2g-in_nexus-4_dep-0000000/build
19:17:46     INFO - Copy/paste: script -q -c "/builds/slave/b2g_b2g-in_nexus-4_dep-0000000/build/repo sync"
19:17:46     INFO -  Fetching project fake-libdvm
19:17:46     INFO -  Fetching project device/generic/armv7-a-neon
19:17:46     INFO -  Fetching project device-mako
19:17:46     INFO -  Fetching project device/lge/mako-kernel
...
...
19:24:47     INFO - Fetching projects:  94% (123/130)  Fetching project gonk-misc
19:24:48     INFO -  Receiving objects:  88% (2596/2927), 51.30 MiB | 116 KiB/s   
19:24:48     INFO - 
19:24:48     INFO - Fetching projects:  95% (124/130)  Fetching project platform_build
19:24:49     INFO -  Receiving objects:  88% (2596/2927), 51.44 MiB | 118 KiB/s   
19:24:49     INFO - 
19:24:49     INFO - Fetching projects:  96% (125/130)  Fetching project moztt
19:24:50     INFO -  Fetching project rilproxy

command timed out: 3600 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=4219.647320
========= Finished 'scripts/scripts/b2g_build.py --target ...' failed (results: 2, elapsed: 1 hrs, 10 mins, 19 secs) (at 2014-01-16 20:24:50.839430) =========
}

As far as I can tell, the relevant code is at:
http://hg.mozilla.org/build/mozharness/file/3f764317c8db/scripts/b2g_build.py#l527
Blocks: 778688
This is full of suck.

We have to run 'repo' inside of a tool called 'script' to work around bug 857158 and not have git clones permafail. It turns out that 'script' always exits with 0, so aside from log parsing, we have no way to know if 'repo sync' succeeded.

I've been experimenting with tmux as a wrapper to provide a pty to repo instead. It seems to return a proper exit code at least!
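
As a rough illustration of the idea (give the command a pty, but still get its real exit code back), here is a sketch that uses Python's standard `pty` module rather than tmux. This is a swapped-in technique, not what was tested above; `run_with_pty` is a hypothetical name, and the code is Unix-only.

```python
import os
import pty
import subprocess


def run_with_pty(argv):
    """Run argv attached to a pseudo-terminal; return (exit_code, output)."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd,
                            stderr=slave_fd, close_fds=True)
    os.close(slave_fd)  # the child keeps its own copy of the slave end
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            # Linux raises EIO once every slave descriptor has been closed.
            break
        if not data:
            # Other platforms signal the same condition with an empty read.
            break
        chunks.append(data)
    os.close(master_fd)
    # Unlike 'script', this hands back the child's own exit code.
    return proc.wait(), b"".join(chunks).decode(errors="replace")
```

For example, `run_with_pty(["sh", "-c", "exit 3"])` should report exit code 3 instead of the unconditional 0 described above.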
Depends on: 965519
(In reply to Chris AtLee [:catlee] from comment #1)
> This is full of suck.
> 
> We have to run 'repo' inside of a tool called 'script' to work around bug
> 857158 and not have git clones permafail. It turns out that 'script' always
> exits with 0, so aside from log parsing, we have no way to know if 'repo
> sync' succeeded.
> 
> I've been experimenting with tmux as a wrapper to provide a pty to repo
> instead. It seems to return a proper exit code at least!

Hi Chris - I don't suppose there's a bug filed for this work (or is it this one?) - and have you had any luck with it? :-)
Flags: needinfo?(catlee)
Actually, I think the bulk of this was fixed as part of bug 970918. We're not running inside script any more, and we are retrying. What's left?
Flags: needinfo?(catlee)
Ah, great to know :-) The only thing left is that, judging from the latest logs I see in bug 873928, it looks like we're not retrying?

eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=37522885&tree=B2g-Inbound

Thanks :-)
Depends on: 970918
That looks like a buildbot timeout.
The buildbot timeout is set to 3600 seconds right now, so let's set the timeout for config.sh to 2700 (45 minutes).
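
A minimal sketch of why the inner timeout should be shorter than buildbot's: the script then gets a chance to report the failure itself before buildbot kills it. This is not the attached patch. `run_with_deadline` and the message text are hypothetical, and `subprocess.run(timeout=...)` is a wall-clock limit, whereas buildbot's 3600 seconds is a limit on time without output.

```python
import subprocess


def run_with_deadline(cmd, timeout_s=2700):
    """Run cmd, giving up after timeout_s seconds of wall-clock time.

    2700 s (45 minutes) stays below the 3600 s buildbot timeout, so the
    error line below is printed before buildbot would kill the job.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        print("Automation Error: %s timed out after %d seconds"
              % (" ".join(cmd), timeout_s))
        return 1
```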
r+ from aki on irc
Attachment #8411965 - Flags: review+
Attachment #8411965 - Flags: checked-in+
mozharness patch is in production: http://hg.mozilla.org/build/mozharness/rev/9accabdd4358 :)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: General Automation → General