Closed Bug 1166516 Opened 9 years ago Closed 7 years ago

dead job with 'Error, must supply who' error cropping up

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: selenamarie, Unassigned)

Details

We have two dead jobs. Currently investigating in #buildduty.

Running [u'/builds/buildbot/try1/bin/python', u'/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/postrun.py', u'-c', u'/builds/buildbot/try1/master/postrun.cfg', u'--master-name', u'buildbot-master76.bb.releng.use1.mozilla.com:/builds/buildbot/try1/master', u'--master-incarnation', u'pid6500-boot1429379580', u'/builds/buildbot/try1/master/try-linux64-pgo/0', u'70069616']
2015-05-19 10:59:45,756 - Loading build pickle
2015-05-19 10:59:45,937 - Build info: {'platform': 'linux64-pgo', 'product': 'firefox', 'branch': 'try'}
2015-05-19 10:59:45,937 - uploading log
2015-05-19 10:59:45,937 - Build info: {'platform': 'linux64-pgo', 'product': 'firefox', 'branch': 'try'}
2015-05-19 10:59:45,937 - Build info: {'platform': 'linux64-pgo', 'product': 'firefox', 'branch': 'try'}
2015-05-19 10:59:45,938 - Running ['/builds/buildbot/try1/bin/python', '/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/log_uploader.py', '-r', '2', '-t', '10', '--master-name', u'bm76-try1', '--try', '--product', 'firefox', '--platform', 'linux64-pgo', '--branch', 'try', '--user', u'trybld', '-i', u'/home/cltbld/.ssh/trybld_dsa', u'stage.mozilla.org', '/builds/buildbot/try1/master/try-linux64-pgo', '0']
2015-05-19 10:59:45,938 - command: START
2015-05-19 10:59:45,938 - command: /builds/buildbot/try1/bin/python /builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/log_uploader.py -r 2 -t 10 --master-name bm76-try1 --try --product firefox --platform linux64-pgo --branch try --user trybld -i /home/cltbld/.ssh/trybld_dsa stage.mozilla.org /builds/buildbot/try1/master/try-linux64-pgo 0
2015-05-19 10:59:45,938 - command: stdin: <open file '/dev/null', mode 'r' at 0x29ca9c0>
2015-05-19 10:59:45,938 - command: cwd: /
Traceback (most recent call last):
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/log_uploader.py", line 363, in <module>
    print ssh(user=options.user, identity=options.identity, host=host, remote_cmd=post_upload_cmd)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/log_uploader.py", line 46, in ssh
    return retry(do_cmd, attempts=retries + 1, sleeptime=retry_sleep, args=(cmd,))
  File "/builds/buildbot/try1/tools/lib/python/util/retry.py", line 33, in retry
    return action(*args, **kwargs)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/log_uploader.py", line 37, in do_cmd
    cmd, retcode, output))
Exception: Command ['ssh', '-l', 'trybld', '-i', '/home/cltbld/.ssh/trybld_dsa', '-p', '22', 'stage.mozilla.org', 'post_upload.py -b try -p firefox --revision e3c7e03823c5 --builddir try-linux64-pgo --release-to-try-builds /tmp/tmp.di3E3x2VxX /tmp/tmp.di3E3x2VxX/try-linux64-pgo-bm76-try1-build0.txt.gz'] returned non-zero exit code 1:
sys.argv: ['/usr/local/bin/post_upload.py', '-b', 'try', '-p', 'firefox', '--revision', 'e3c7e03823c5', '--builddir', 'try-linux64-pgo', '--release-to-try-builds', '/tmp/tmp.di3E3x2VxX', '/tmp/tmp.di3E3x2VxX/try-linux64-pgo-bm76-try1-build0.txt.gz']
Error, must supply who
2015-05-19 11:00:26,936 - Process returned 1
Traceback (most recent call last):
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/postrun.py", line 379, in <module>
    main()
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/postrun.py", line 376, in main
    post_runner.processBuild(options, build_path, request_ids)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/postrun.py", line 306, in processBuild
    log_url = self.uploadLog(build)
  File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/postrun.py", line 104, in uploadLog
    output = get_output(cmd, stdin=devnull)
  File "/builds/buildbot/try1/tools/lib/python/util/commands.py", line 207, in get_output
    raise error
subprocess.CalledProcessError: Command '['/builds/buildbot/try1/bin/python', '/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/bin/log_uploader.py', '-r', '2', '-t', '10', '--master-name', u'bm76-try1', '--try', '--product', 'firefox', '--platform', 'linux64-pgo', '--branch', 'try', '--user', u'trybld', '-i', u'/home/cltbld/.ssh/trybld_dsa', u'stage.mozilla.org', '/builds/buildbot/try1/master/try-linux64-pgo', '0']' returned non-zero exit status 1

Result: 1, Elapsed: 43.2 seconds

bm83 (win32-pgo) and bm76 (linux64-pgo) each have a dead job from the same rev: https://hg.mozilla.org/try/rev/e3c7e03823c5

I'm not seeing any obvious recent mozharness (mh) or postrun.py/post_upload.py changes. Interesting that both are pgo.

Investigation results from IRC:
14:42:12 <jlund> selenamarie|buildduty: hrm, looks like it's jmaher's push for the first dead item: https://hg.mozilla.org/try/rev/e3c7e03823c5
14:42:30 <jlund> not sure how that broke postrun.py or we are missing the who property
14:42:39 <jlund> has it changed recently?
14:43:25 <jlund> http://hg.mozilla.org/build/buildbotcustom/log/99b03dfbc0c1/bin/postrun.py says no
14:45:07 <jlund> http://hg.mozilla.org/build/tools/log/f4abaf6b1148/stage/post_upload.py nothing changed
14:48:47 <jlund> http://mxr.mozilla.org/build/source/tools/stage/post_upload.py#505 is what's getting hit
14:49:18 <jlund> try builds pass --release-to-try-builds so that's why it's getting hit. not sure why who is unknown though.
14:47:30 <•Callek> jlund: post-upload.py in tools iirc is a mirror of the actual copy in infra puppet
14:48:59 <•Callek> jlund: ook https://callek.pastebin.mozilla.org/8834005 is the existing copy
14:55:34 <jlund> treeherder is going to think this job never finished: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e3c7e03823c5&filter-searchStr=pgo
14:58:25 <jlund> http://buildbot-master76.bb.releng.use1.mozilla.com:8101/builders/Linux%20x86-64%20try%20pgo-build/builds/0 is one of the jobs

It looks like the jobs do pass --who:
10:50:39     INFO -  Running post-upload command: post_upload.py --who nobody@example.com --builddir try-linux64-pgo --tinderbox-builds-dir nobody@example.com-e3c7e03823c5 -p firefox -i 20150519094043 --revision e3c7e03823c5 --release-to-try-builds

So I'm not sure how we are hitting:
http://mxr.mozilla.org/build/source/tools/stage/post_upload.py#505

I've moved the dead items to /dev/shm/queue/commands/bug_1166516 to quiet down #buildduty.

I won't spend too much time tracking down the cause if this turns out to affect only one rev.
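
For reference, here is a minimal sketch of the kind of check that would print this error. It is not the actual post_upload.py code: it assumes the script rejects --release-to-try-builds when --who is absent, which is what the error output and the IRC discussion suggest. The function name and the option meanings noted in the comments are assumptions. The two argument lists are copied from the commands quoted in this bug. Note that the post_upload.py command run by log_uploader.py (in the traceback) carries no --who, while the build's own post-upload command does.

```python
import argparse


def check_try_upload_args(argv):
    """Return the error messages a post_upload.py-style check might emit.

    Hypothetical helper: only the options seen in this bug are declared,
    and their meanings (e.g. -b as branch, -i as build id) are guesses.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-b", dest="branch")    # meaning assumed
    parser.add_argument("-p", dest="product")   # meaning assumed
    parser.add_argument("-i", dest="buildid")   # meaning assumed
    parser.add_argument("--who")
    parser.add_argument("--revision")
    parser.add_argument("--builddir")
    parser.add_argument("--tinderbox-builds-dir")
    parser.add_argument("--release-to-try-builds", action="store_true")
    # Trailing file paths are not declared, so they land in the extras list.
    options, _files = parser.parse_known_args(argv)

    errors = []
    if options.release_to_try_builds and not options.who:
        errors.append("Error, must supply who")
    return errors


# Arguments from the failing ssh command issued by log_uploader.py.
LOG_UPLOADER_ARGV = [
    "-b", "try", "-p", "firefox", "--revision", "e3c7e03823c5",
    "--builddir", "try-linux64-pgo", "--release-to-try-builds",
    "/tmp/tmp.di3E3x2VxX",
    "/tmp/tmp.di3E3x2VxX/try-linux64-pgo-bm76-try1-build0.txt.gz",
]

# Arguments from the build's own post-upload command.
BUILD_ARGV = [
    "--who", "nobody@example.com", "--builddir", "try-linux64-pgo",
    "--tinderbox-builds-dir", "nobody@example.com-e3c7e03823c5",
    "-p", "firefox", "-i", "20150519094043",
    "--revision", "e3c7e03823c5", "--release-to-try-builds",
]

if __name__ == "__main__":
    print(check_try_upload_args(LOG_UPLOADER_ARGV))  # ['Error, must supply who']
    print(check_try_upload_args(BUILD_ARGV))         # []
```

If the real check behaves like this, the failure would come from the log upload step rather than from the build's upload step.
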
15:23:53 <catlee> jlund: were those triggered via self-serve perhaps?
15:29:55 <catlee> jlund: " Self-serve: Requested by jmaher@mozilla.com "
15:30:08 <catlee> so maybe there's something funky with how he requested it
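
One way the self-serve theory could play out, as a purely hypothetical sketch: if whatever assembles the post_upload.py command only adds --who when the build carries a "who" value, a request with no such value would silently drop the flag. The function and property names below are invented for illustration and are not taken from log_uploader.py or postrun.py.

```python
def build_post_upload_cmd(props, files):
    """Assemble a post_upload.py command line from a build-properties dict.

    Hypothetical: the real scripts may derive these values differently.
    """
    cmd = ["post_upload.py", "-b", props["branch"], "-p", props["product"]]
    who = props.get("who")
    if who:
        # Only present when the build carries a submitter.
        cmd += ["--who", who]
    cmd += [
        "--revision", props["revision"],
        "--builddir", props["builddir"],
        "--release-to-try-builds",
    ]
    return cmd + list(files)


# Values taken from this bug; the dict layout itself is an assumption.
base_props = {
    "branch": "try",
    "product": "firefox",
    "revision": "e3c7e03823c5",
    "builddir": "try-linux64-pgo",
}

pushed = build_post_upload_cmd(dict(base_props, who="nobody@example.com"), [])
self_serve = build_post_upload_cmd(base_props, [])  # no "who" available

if __name__ == "__main__":
    print("--who" in pushed)      # True
    print("--who" in self_serve)  # False
```

Whether a self-serve request actually leaves the submitter empty is not established in this bug.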

jmaher: what API call or request did you make for this rev?
Flags: needinfo?(jmaher)
I requested a pgo build via the self-serve API. We have pgo builders on try, and I want to figure out how to use them :)
Flags: needinfo?(jmaher)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Component: General Automation → General