B2g bumper bot problems - hasn't made commits for a while - Gaia-feeding trees closed until bumper bot is fixed

RESOLVED FIXED

Status

Infrastructure & Operations
CIDuty
--
blocker
RESOLVED FIXED
4 years ago
a month ago

People

(Reporter: Tomcat, Unassigned)

Tracking

Details

(Whiteboard: [tree-closing])

(Reporter)

Description

4 years ago
Gaia-feeding trees are closed until the bumper bot is fixed, because the b2g bumper hasn't landed anything in nearly an hour. B2g-inbound, aurora, b2g30, and b2g28 are closed.
vcs-sync appears to be working OK (the changes are showing up in the various Gaia trees on hg.mozilla.org). There are alerts in #buildduty about b2g_bumper.stamp's age. I don't see any recent changes to b2g-manifest other than one to v1.4 that appears to have stuck OK.
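For reference, a minimal sketch of the kind of age check behind the b2g_bumper.stamp alerts. The real alert is not shown in this bug, so the one-hour threshold and the function names here are assumptions for illustration only.

```python
import os
import time


def stamp_age_seconds(stamp_path, now=None):
    """Return how many seconds ago the stamp file was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(stamp_path)


def is_stale(stamp_path, max_age_seconds=3600, now=None):
    """True if the stamp is older than max_age_seconds (assumed threshold)."""
    return stamp_age_seconds(stamp_path, now) > max_age_seconds
```

A check like this only says that the bumper has not completed a run recently; it does not say why.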
(Reporter)

Comment 2

4 years ago
closed gaia on git now as well
Would that be why the gaia-try-hook wasn't triggered after I updated https://github.com/mozilla-b2g/gaia/pull/21848?
There were git commands hanging on buildbot-master66. As a consequence, the bumper execution was hanging, and new runs started from crontab could not execute because they could not acquire the lock file.
:pmoore (thanks for your help here) and I are now trying to unlock the situation.
Whiteboard: [tree-closing]
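To illustrate the failure mode described above, here is a rough sketch of a non-blocking lock plus a command timeout. This is not how b2g_bumper.py does its locking (that code is not shown in this bug); try_lock and run_with_timeout are hypothetical helpers, and the sketch assumes a POSIX host with Python 3.

```python
import fcntl
import subprocess
import sys


def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is held elsewhere."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes this fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit code, or None if it was killed after the timeout."""
    try:
        return subprocess.run(command, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

With a timeout around each git call, a hung fetch would end the run and release the lock instead of blocking every later cron run. Note that subprocess.run only kills the direct child on timeout, so a helper such as git-remote-https could still be left behind.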
from #releng

07:57 < fubar> I see gitweb hits to b2g-manifest up until 1156UTC, and then a ~75 minute gap, then another ~70 minute gap. (for gitweb. there are ssh writes from vcs-sync)
07:58 < fubar> I don't see any errors, or 500s, though
07:58 < fubar> if it happens again, shout and don't kill the proc
08:00 < RyanVM|sheriffduty> that fits the timeline of when things appear to have died
Everything seems to be caught up now and builds are all running, so I've reopened all the trees.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
no errors or 500s on gitmo. minor memory bump around then, but nothing out of the ordinary. 

zlb logs don't show anything bad. there was one pull from AWS at 0611PDT that stands out, only because all others were from SCL3.
We found the following line in b2g_bumper log (buildbot-master66.srv.releng.usw2.mozilla.com:/builds/b2g_bumper/v1.4/logs/log_info.log, all times PDT):

04:50:15     INFO -  2014-07-17 04:50:15,199 command: git fetch -q https://git.mozilla.org/b2g/b2g-manifest.git +refs/heads/*:refs/remotes/origin/*

This was the last entry in the log file for 2h20m, so we had a look at the process table and found several hung processes.

The process table contained:

cltbld   23285 23111  0 04:50 ?        00:00:00 git ls-remote https://git.mozilla.org/external/sprd-aosp/platform/external/bluetooth/bluez refs/heads/sprdb2g_gonk4.0_6821
cltbld   23552 23111  0 04:50 ?        00:00:00 git ls-remote https://git.mozilla.org/external/sprd-aosp/platform/external/sqlite refs/tags/android-4.0.4_r2.1
cltbld   21228     1  0 May19 ?        00:00:00 git ls-remote https://git.mozilla.org/external/caf/platform/external/e2fsprogs refs/tags/android-4.4.2_r1
cltbld   21229     1  0 May19 ?        00:00:00 git ls-remote https://git.mozilla.org/external/caf/platform/external/elfutils refs/tags/android-4.4.2_r1
cltbld   21230 21228  0 May19 ?        00:03:49 git-remote-https https://git.mozilla.org/external/caf/platform/external/e2fsprogs https://git.mozilla.org/external/caf/platform/external/e2fsprogs
cltbld   21231 21229  0 May19 ?        00:03:50 git-remote-https https://git.mozilla.org/external/caf/platform/external/elfutils https://git.mozilla.org/external/caf/platform/external/elfutils
cltbld   21232     1  0 May19 ?        00:00:00 git ls-remote https://git.mozilla.org/external/caf/platform/external/expat refs/tags/android-4.4.2_r1
cltbld   21233 21232  0 May19 ?        00:03:53 git-remote-https https://git.mozilla.org/external/caf/platform/external/expat https://git.mozilla.org/external/caf/platform/external/expat
cltbld   21234     1  0 May19 ?        00:00:00 git ls-remote https://git.mozilla.org/external/caf/platform/external/fdlibm refs/tags/android-4.4.2_r1
cltbld   21237 21234  0 May19 ?        00:03:52 git-remote-https https://git.mozilla.org/external/caf/platform/external/fdlibm https://git.mozilla.org/external/caf/platform/external/fdlibm
cltbld   23153 23147  0 04:50 ?        00:00:00 git fetch -q https://git.mozilla.org/b2g/b2g-manifest.git +refs/heads/*:refs/remotes/origin/*
cltbld   23154 23153  0 04:50 ?        00:00:00 git-remote-https https://git.mozilla.org/b2g/b2g-manifest.git https://git.mozilla.org/b2g/b2g-manifest.git
cltbld   23285 23111  0 04:50 ?        00:00:00 git ls-remote https://git.mozilla.org/external/sprd-aosp/platform/external/bluetooth/bluez refs/heads/sprdb2g_gonk4.0_6821
cltbld   23297 23285  0 04:50 ?        00:00:00 git-remote-https https://git.mozilla.org/external/sprd-aosp/platform/external/bluetooth/bluez https://git.mozilla.org/external/sprd-aosp/platform/external/b
cltbld   23552 23111  0 04:50 ?        00:00:00 git ls-remote https://git.mozilla.org/external/sprd-aosp/platform/external/sqlite refs/tags/android-4.0.4_r2.1
cltbld   23553 23552  0 04:50 ?        00:00:00 git-remote-https https://git.mozilla.org/external/sprd-aosp/platform/external/sqlite https://git.mozilla.org/external/sprd-aosp/platform/external/sqlite
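Several of the git processes above date from May19 and have been reparented to PID 1. A small sketch for picking those out of `ps -ef` output follows; it assumes the eight-column layout shown in the table (UID, PID, PPID, C, STIME, TTY, TIME, CMD), and the helper names are made up for illustration.

```python
from typing import NamedTuple


class PsEntry(NamedTuple):
    user: str
    pid: int
    ppid: int
    stime: str
    cmd: str


def parse_ps_line(line):
    """Parse one `ps -ef` line; the command keeps its internal spaces."""
    user, pid, ppid, _c, stime, _tty, _time, cmd = line.split(None, 7)
    return PsEntry(user, int(pid), int(ppid), stime, cmd)


def orphaned_git(entries):
    """Return git processes whose parent is PID 1, i.e. whose caller has exited."""
    return [e for e in entries if e.ppid == 1 and e.cmd.startswith("git ")]
```

This only flags candidates for a closer look; whether a given process is actually hung still needs to be checked by hand.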

We killed the following processes:
cltbld   23153 23147  0 04:50 ?        00:00:00 git fetch -q https://git.mozilla.org/b2g/b2g-manifest.git +refs/heads/*:refs/remotes/origin/*
cltbld   23285 23111  0 04:50 ?        00:00:00 git ls-remote https://git.mozilla.org/external/sprd-aosp/platform/external/bluetooth/bluez refs/heads/sprdb2g_gonk4.0_6821
cltbld   23552 23111  0 04:50 ?        00:00:00 git ls-remote https://git.mozilla.org/external/sprd-aosp/platform/external/sqlite refs/tags/android-4.0.4_r2.1

This was not enough to remove the stale lock file, so we needed to kill the calling python processes as well:

cltbld   23111 23109  0 04:50 ?        00:00:01 python ../mozharness/scripts/b2g_bumper.py -c b2g_bumper/v1.3t.py
cltbld   23106 23104  0 04:50 ?        00:00:00 python ../mozharness/scripts/b2g_bumper.py -c b2g_bumper/v1.4.py

We determined the working directory of the process by running:

[cltbld@buildbot-master66.srv.releng.usw2.mozilla.com b2g_bumper]$ cat /proc/23106/environ | xargs -0 -n1 echo
SHELL=/bin/sh
OLDPWD=/builds/b2g_bumper
USER=cltbld
PATH=/usr/local/bin:/usr/bin:/bin
PWD=/builds/b2g_bumper/v1.4
HOME=/home/cltbld
SHLVL=2
LOGNAME=cltbld
_=/usr/bin/python
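The same lookup can be sketched in Python. On Linux, /proc/<pid>/environ holds NUL-separated KEY=VALUE entries; parse_environ and process_env below are hypothetical helper names, not part of mozharness.

```python
def parse_environ(raw):
    """Split NUL-separated environ bytes into a dict of strings."""
    env = {}
    for item in raw.split(b"\0"):
        if not item:
            continue  # skip the empty chunk after the trailing NUL
        key, _, value = item.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_env(pid):
    """Read the environment a process was started with (Linux only)."""
    with open("/proc/%d/environ" % pid, "rb") as f:
        return parse_environ(f.read())
```

For example, process_env(23106)["PWD"] would have returned /builds/b2g_bumper/v1.4 here. The environ file reflects the environment at process start, so if a process may have changed directory since then, the /proc/<pid>/cwd symlink is the more direct source.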
Would buildbot-master66 be connecting to gitmo directly, or through v-1030.fw1.releng.scl3?

For hits on b2g-manifest, I only see hits from v-1030.fw1 between 0400-0500PDT. Looks like the same for the other repos.
See Also: → bug 1230586

Updated

a month ago
Product: Release Engineering → Infrastructure & Operations