Try server builders hitting "abort: HTTP Error 504: Gateway Time-out" trying to clone

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
--
critical
RESOLVED FIXED
7 years ago
3 years ago

People

(Reporter: philor, Assigned: aravind)

(Reporter)

Description

7 years ago
The last successful clones of hg.m.o/try by Linux builders started around 23:00 2010/06/19. For Mac builders, it was around 14:00 2010/06/19. For some reason, Windows has been much more variable: the opt builds either are fine or fail for some other reason, while the debug builds alternate between hitting this and not hitting it.

For the failures, a random example:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1277032505.1277032706.7375.gz

/tools/python/bin/hg clone --verbose --noupdate --rev c17baa9f8bc23ddc67ae6606946c752f226c1d48 http://hg.mozilla.org/try /builds/slave/tryserver-macosx-debug/build
 in dir /builds/slave/tryserver-macosx-debug (timeout 3600 secs)
 watching logfiles {}
 argv: ['/tools/python/bin/hg', 'clone', '--verbose', '--noupdate', '--rev', 'c17baa9f8bc23ddc67ae6606946c752f226c1d48', u'http://hg.mozilla.org/try', '/builds/slave/tryserver-macosx-debug/build']
 environment:
  Apple_PubSub_Socket_Render=/tmp/launch-EenRfI/Render
  CVS_RSH=ssh
  DISPLAY=/tmp/launch-M9NHML/:0
  HOME=/Users/cltbld
  LOGNAME=cltbld
  PATH=/tools/buildbot/bin:/tools/python/bin:/opt/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/slave/tryserver-macosx-debug
  SHELL=/bin/bash
  SSH_AUTH_SOCK=/tmp/launch-mxjbfC/Listeners
  TMPDIR=/var/folders/TL/TLg3RrMbFAur2hBCXvCeqk+++TM/-Tmp-/
  USER=cltbld
 closing stdin
 using PTY: False
requesting all changes
abort: HTTP Error 504: Gateway Time-out
elapsedTime=180.156681
program finished with exit code 255
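
To triage logs like the one above in bulk, a minimal sketch along these lines could pull out the abort status and the elapsed time. The function name `summarize_clone_failure` and the regular expressions are assumptions for illustration, based only on the two lines visible in this log (`abort: HTTP Error ...` and `elapsedTime=...`). They are not part of buildbot or Mercurial.

```python
import re

# Patterns modelled on the log excerpt in this bug; other log formats may differ.
ABORT_RE = re.compile(r"^abort: HTTP Error (\d{3}): (.+)$", re.MULTILINE)
ELAPSED_RE = re.compile(r"^elapsedTime=([0-9.]+)$", re.MULTILINE)


def summarize_clone_failure(log_text):
    """Return status, reason and elapsed seconds for an aborted clone, or None."""
    abort = ABORT_RE.search(log_text)
    if abort is None:
        return None  # no HTTP abort line in this log
    elapsed = ELAPSED_RE.search(log_text)
    return {
        "status": int(abort.group(1)),
        "reason": abort.group(2).strip(),
        "elapsed_secs": float(elapsed.group(1)) if elapsed else None,
    }


# Tail of the log quoted in this bug.
SAMPLE = """requesting all changes
abort: HTTP Error 504: Gateway Time-out
elapsedTime=180.156681
program finished with exit code 255
"""

print(summarize_clone_failure(SAMPLE))
```

In this log the command gave up after roughly 180 seconds, far short of the 3600-second buildbot timeout. That is consistent with a timeout somewhere upstream of the builder, although it does not prove one.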
(Assignee)

Comment 1

7 years ago
This could be due to the caching servers we are now using to serve hg.m.o.  Investigating.  I was able to replicate the problem a few times, but after one successful clone, I can't replicate it anymore.  Is this the behavior you see as well?
Assignee: server-ops → aravind
(Assignee)

Comment 2

7 years ago
I bumped up the backend timeout in varnish to 5 minutes (for the first byte); I think this was the root cause of these problems.  Please comment here (and re-open the bug) if you see any more of these error messages.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
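
One way to check a first-byte limit like the one described in comment 2 from the client side is to time how long a request waits before any response arrives. The following is a rough sketch only: `time_to_first_byte`, `FIRST_BYTE_LIMIT_SECS`, and the injectable `opener` parameter are hypothetical names, and a plain HTTP GET does not reproduce the requests Mercurial makes during a clone.

```python
import time
import urllib.error
import urllib.request

# The 5-minute figure from comment 2; treat it as an assumption elsewhere.
FIRST_BYTE_LIMIT_SECS = 300.0


def time_to_first_byte(url, timeout=FIRST_BYTE_LIMIT_SECS,
                       opener=urllib.request.urlopen):
    """Return (http_status, seconds waited) for the start of the response.

    HTTP error statuses such as 504 are reported rather than raised.
    Socket-level timeouts and connection errors propagate to the caller.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read(1)  # block until at least one body byte (or EOF)
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 504 from a proxy that gave up on the backend
    return status, time.monotonic() - start
```

A 504 returned after a wait close to the proxy's configured limit would suggest the backend is the slow part; a fast 504 would point elsewhere.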
(Reporter)

Comment 3

7 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1277071849.1277072043.21844.gz same thing again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Once bug 573403 started off builds, most of them failed with the Gateway Time-out error (40 out of 44). They're making requests like this:
hg clone --verbose --noupdate --rev b40891e0d672ef026fbdead5b90c876ecb73c352 http://hg.mozilla.org/try /builds/slave/tryserver-linux/build

We don't use the --rev argument for non-try branches, and 10 mozilla-central builds cloned fine (at the same time as the try clones were failing).
Aravind pulled the varnish proxy out of production.
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard