Closed Bug 1218735 Opened 9 years ago Closed 8 years ago

Integrations Trees closed for taskcluster timeouts across all trees

Categories

(Taskcluster :: General, defect)

defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: cbook, Unassigned)

References

()

Details

like https://treeherder.mozilla.org/logviewer.html#?job_id=16318598&repo=mozilla-inbound

   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                  Dload  Upload   Total   Spent    Left  Speed

 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    29  100    29    0     0     33      0 --:--:-- --:--:-- --:--:--    33
100    29  100    29    0     0     33      0 --:--:-- --:--:-- --:--:--    33

 
  1 1783M    1 20.9M    0     0  14.3M      0  0:02:04  0:00:01  0:02:03 14.3M
  3 1783M    3 70.1M    0     0  28.3M      0  0:01:02  0:00:02  0:01:00 48.9M
  4 1783M    4 84.9M    0     0  24.4M      0  0:01:12  0:00:03  0:01:09 31.9M
  5 1783M    5 95.2M    0     0  21.3M      0  0:01:23  0:00:04  0:01:19 24.7M
  5 1783M    5  104M    0     0  19.1M      0  0:01:33  0:00:05  0:01:28 20.9M
  6 1783M    6  119M    0     0  18.4M      0  0:01:36  0:00:06  0:01:30 19.6M
  7 1783M    7  133M    0     0  17.8M      0  0:01:39  0:00:07  0:01:32 12.6M
  7 1783M    7  140M    0     0  16.6M      0  0:01:47  0:00:08  0:01:39 11.1M
  8 1783M    8  148M    0     0  15.6M      0  0:01:53  0:00:09  0:01:44 10.6M
  8 1783M    8  157M    0     0  15.0M      0  0:01:58  0:00:10  0:01:48 10.5M
  9 1783M    9  171M    0     0  14.9M      0  0:01:59  0:00:11  0:01:48 10.4M
 10 1783M   10  188M    0     0  15.1M      0  0:01:57  0:00:12  0:01:45 11.1M
 11 1783M   11  207M    0     0  15.4M      0  0:01:55  0:00:13  0:01:42 13.4M
 13 1783M   13  231M    0     0  16.0M      0  0:01:51  0:00:14  0:01:37 16.6M
 13 1783M   13  245M    0     0  15.8M      0  0:01:52  0:00:15  0:01:37 17.6M
 14 1783M   14  251M    0     0  15.2M      0  0:01:56  0:00:16  0:01:40 16.1M
 14 1783M   14  253M    0     0  14.5M      0  0:02:02  0:00:17  0:01:45 13.0M
 14 1783M   14  259M    0     0  14.0M      0  0:02:06  0:00:18  0:01:48 10.3M
 15 1783M   15  269M    0     0  13.8M      0  0:02:08  0:00:19  0:01:49 7636k
 15 1783M   15  284M    0     0  13.9M      0  0:02:08  0:00:20  0:01:48 7963k
 16 1783M   16  302M    0     0  14.0M      0  0:02:06  0:00:21  0:01:45 10.1M
 17 1783M   17  316M    0     0  14.0M      0  0:02:06  0:00:22  0:01:44 12.4M
 18 1783M   18  323M    0     0  13.7M      0  0:02:09  0:00:23  0:01:46 12.7M
 18 1783M   18  329M    0     0  13.4M      0  0:02:12  0:00:24  0:01:48 12.0M
 18 1783M   18  338M    0     0  13.2M      0  0:02:14  0:00:25  0:01:49 10.7M
 19 1783M   19  348M    0     0  13.1M      0  0:02:15  0:00:26  0:01:49 9378k
 20 1783M   20  360M    0     0  13.1M      0  0:02:16  0:00:27  0:01:49 8999k
 20 1783M   20  373M    0     0  13.1M      0  0:02:16  0:00:28  0:01:48  9.9M
 21 1783M   21  380M    0     0  12.9M      0  0:02:17  0:00:29  0:01:48 10.2M
 21 1783M   21  387M    0     0  12.7M      0  0:02:20  0:00:30  0:01:50  9.9M
 22 1783M   22  393M    0     0  12.5M      0  0:02:22  0:00:31  0:01:51 9336k
 22 1783M   22  402M    0     0  12.3M      0  0:02:24  0:00:32  0:01:52 8388k
 24 1783M   24  432M    0     0  12.9M      0  0:02:17  0:00:33  0:01:44 11.8M
 25 1783M   25  454M    0     0  13.1M      0  0:02:15  0:00:34  0:01:41 14.6M
 26 1783M   26  470M    0     0  13.2M      0  0:02:14  0:00:35  0:01:39 16.6M
 27 1783M   27  481M    0     0  13.2M      0  0:02:15  0:00:36  0:01:39 17.5M
 27 1783M   27  487M    0     0  12.9M      0  0:02:17  0:00:37  0:01:40 17.5M
 27 1783M   27  492M    0     0  12.7M      0  0:02:19  0:00:38  0:01:41 11.9M
 28 1783M   28  501M    0     0  12.6M      0  0:02:20  0:00:39  0:01:41 9631k
 28 1783M   28  509M    0     0  12.5M      0  0:02:21  0:00:40  0:01:41 7770k
 28 1783M   28  510M    0     0  12.3M      0  0:02:24  0:00:41  0:01:43 5908k
 28 1783M   28  511M    0     0  12.0M      0  0:02:28  0:00:42  0:01:46 4986k
 28 1783M   28  516M    0     0  11.8M      0  0:02:30  0:00:43  0:01:47 5016k
 29 1783M   29  526M    0     0  11.8M      0  0:02:30  0:00:44  0:01:46 5256k
 30 1783M   30  535M    0     0  11.7M      0  0:02:31  0:00:45  0:01:46 5473k
 30 1783M   30  549M    0     0  11.8M      0  0:02:30  0:00:46  0:01:44 7989k
 31 1783M   31  567M    0     0  11.9M      0  0:02:29  0:00:47  0:01:42 11.1M
 32 1783M   32  583M    0     0  12.0M      0  0:02:28  0:00:48  0:01:40 13.3M
 33 1783M   33  591M    0     0  11.9M      0  0:02:29  0:00:49  0:01:40 12.9M
 33 1783M   33  599M    0     0  11.8M      0  0:02:30  0:00:50  0:01:40 12.7M
 34 1783M   34  610M    0     0  11.8M      0  0:02:30  0:00:51  0:01:39 12.2M
 34 1783M   34  619M    0     0  11.8M      0  0:02:31  0:00:52  0:01:39 10.4M
 35 1783M   35  626M    0     0  11.7M      0  0:02:32  0:00:53  0:01:39 8828k
 35 1783M   35  634M    0     0  11.6M      0  0:02:33  0:00:54  0:01:39 8837k
 39 1783M   39  699M    0     0  12.6M      0  0:02:21  0:00:55  0:01:26 20.1M
 41 1783M   41  736M    0     0  13.0M      0  0:02:16  0:00:56  0:01:20 25.1M
 42 1783M   42  757M    0     0  13.1M      0  0:02:15  0:00:57  0:01:18 27.5M
 43 1783M   43  776M    0     0  13.2M      0  0:02:14  0:00:58  0:01:16 29.9M
 44 1783M   44  786M    0     0  13.2M      0  0:02:14  0:00:59  0:01:15 30.3M
 44 1783M   44  794M    0     0  13.1M      0  0:02:15  0:01:00  0:01:15 18.9M
 45 1783M   45  803M    0     0  13.0M      0  0:02:16  0:01:01  0:01:15 13.3M
 45 1783M   45  816M    0     0  13.0M      0  0:02:16  0:01:02  0:01:14 11.7M
 46 1783M   46  834M    0     0  13.1M      0  0:02:15  0:01:03  0:01:12 11.5M
 47 1783M   47  849M    0     0  13.1M      0  0:02:15  0:01:04  0:01:11 12.6M
 48 1783M   48  863M    0     0  13.1M      0  0:02:15  0:01:05  0:01:10 13.7M
 49 1783M   49  879M    0     0  13.2M      0  0:02:14  0:01:06  0:01:08 15.3M
 49 1783M   49  890M    0     0  13.1M      0  0:02:15  0:01:07  0:01:08 14.7M
 50 1783M   50  904M    0     0  13.2M      0  0:02:15  0:01:08  0:01:07 13.9M
 51 1783M   51  914M    0     0  13.1M      0  0:02:15  0:01:09  0:01:06 13.0M
 51 1783M   51  922M    0     0  13.0M      0  0:02:16  0:01:10  0:01:06 11.7M
 52 1783M   52  933M    0     0  13.0M      0  0:02:16  0:01:11  0:01:05 10.7M
 53 1783M   53  949M    0     0  13.1M      0  0:02:16  0:01:12  0:01:04 11.9M
 54 1783M   54  968M    0     0  13.1M      0  0:02:15  0:01:13  0:01:02 12.9M
 55 1783M   55  994M    0     0  13.3M      0  0:02:13  0:01:14  0:00:59 15.9M
 57 1783M   57 1023M    0     0  13.5M      0  0:02:11  0:01:15  0:00:56 20.1M
 59 1783M   59 1059M    0     0  13.8M      0  0:02:08  0:01:16  0:00:52 25.2M
 61 1783M   61 1099M    0     0  14.1M      0  0:02:05  0:01:17  0:00:48 29.8M
 64 1783M   64 1144M    0     0  14.5M      0  0:02:02  0:01:18  0:00:44 35.1M
 67 1783M   67 1195M    0     0  15.0M      0  0:01:58  0:01:19  0:00:39 40.3M
 70 1783M   70 1251M    0     0  15.5M      0  0:01:54  0:01:20  0:00:34 45.6M
 73 1783M   73 1312M    0     0  16.1M      0  0:01:50  0:01:21  0:00:29 50.6M
 77 1783M   77 1381M    0     0  16.7M      0  0:01:46  0:01:22  0:00:24 56.5M
 81 1783M   81 1454M    0     0  17.4M      0  0:01:42  0:01:23  0:00:19 61.9M
 85 1783M   85 1521M    0     0  18.0M      0  0:01:39  0:01:24  0:00:15 65.0M
 88 1783M   88 1583M    0     0  18.5M      0  0:01:36  0:01:25  0:00:11 66.5M
 93 1783M   93 1659M    0     0  19.1M      0  0:01:32  0:01:26  0:00:06 69.2M
 97 1783M   97 1733M    0     0  19.8M      0  0:01:29  0:01:27  0:00:02 70.3M
100 1783M  100 1783M    0     0  20.2M      0  0:01:28  0:01:28 --:--:-- 70.6M

 [taskcluster-vcs] run end : curl -L -o /home/worker/.tc-vcs/clones/hg.mozilla.org/integration/mozilla-inbound/.tar.gz https://queue.taskcluster.net/v1/task/mmHCxwmfT4Ccb_K7M52-4g/artifacts/public/hg.mozilla.org/integration/mozilla-inbound/.tar.gz (0) in 88136 ms

 

 /usr/local/lib/node_modules/taskcluster-vcs/build/bin/tc-vcs.js:57

         throw err;

               ^

 Error: connect ETIMEDOUT

     at errnoException (net.js:905:11)

     at Object.afterConnect [as oncomplete] (net.js:896:19)

 ========= Finished Clone gecko tc-vcs checkout workspace/gecko https://hg.mozilla.org/integration/mozilla-inbound/ https://hg.mozilla.org/integration/mozilla-inbound/ 66d59599465c708d4dc985c774c99ccc26aca69d 66d59599465c708d4dc985c774c99ccc26aca69d (results: 8, elapsed: 217 secs) (at 2015-09-27 10:09:41.280000) =========

 [
Seeing if I can reproduce locally in a docker container, based on "Running Locally" section of https://tools.taskcluster.net/task-inspector/#WeGY1ORTQ-CmT_wZS8NfHw/

    #!/bin/bash -e
    # WARNING: this is experimental mileage may vary!

    # Fetch docker image
    docker pull 'taskcluster/builder:0.5.9'

    # Find a unique container name
    export NAME='task-WeGY1ORTQ-CmT_wZS8NfHw-container'

    # Run docker command
    docker run -ti \
      --name "${NAME}" \
      -e GECKO_BASE_REPOSITORY='https://hg.mozilla.org/integration/mozilla-inbound/' \
      -e TOOLTOOL_REPO='https://git.mozilla.org/build/tooltool.git' \
      -e B2G_DEBUG='0' \
      -e GECKO_HEAD_REPOSITORY='https://hg.mozilla.org/integration/mozilla-inbound/' \
      -e TOOLTOOL_MANIFEST='b2g/config/emulator-x86-kk/releng-emulator-kk.tt' \
      -e GECKO_HEAD_REV='66d59599465c708d4dc985c774c99ccc26aca69d' \
      -e GECKO_HEAD_REF='66d59599465c708d4dc985c774c99ccc26aca69d' \
      -e TOOLTOOL_REV='master' \
      -e TARGET='emulator-x86-kk' \
      taskcluster/builder:0.5.9 \
      /bin/bash -c $'checkout-gecko workspace && cd ./workspace/gecko/testing/taskcluster/scripts/builder && buildbot_step \'Build\' ./build-emulator-x86.sh $HOME/workspace\n' \

    # Extract Artifacts
    mkdir -p 'public/build'
    docker cp "${NAME}:/home/worker/artifacts/" 'public/build'

    # Delete docker container
    docker rm -v "${NAME}"
I'm suspicious of https://github.com/taskcluster/taskcluster-vcs/pull/27 as this is the most recent tc-vcs change, and it looks like a tc-vcs bug.

The PR landed 14 hours ago, and problems starting arising only 30 mins ago - but at the moment i'm not sure if tc-vcs is burned into the docker image, or it is downloaded on startup, or whether maybe a new docker image has been published and is now being used. In short, it may or may not be related - having the docker image locally should help me establish which version of tc-vcs is being used, and how/when it got there.
So the version used is 2.3.12 according to
/usr/local/lib/node_modules/taskcluster-vcs/package.json
inside 'taskcluster/builder:0.5.9'

This would therefore be this version:
https://github.com/taskcluster/taskcluster-vcs/tree/2.3.12
(In reply to Pete Moore [:pmoore][:pete] from comment #2)
> I'm suspicious of https://github.com/taskcluster/taskcluster-vcs/pull/27 as
> this is the most recent tc-vcs change, and it looks like a tc-vcs bug.

It can't be this, since this is not in version 2.3.12.
So the curl is successful, and the crash is immediately afterwards. This would be the time tc-vcs then pulls changes from hg.

I was able to rerun a task, and it ran successfully, which would be using exactly the same version of tc-vcs and other tools frozen in the docker image. Therefore it seems to be something intermittent rather than a fundamental programming error that would cause something consistently to break.

I therefore now suspect it might be an hg issue, but that tc-vcs is doing a really bad job of logging the cause, and just crashing. This would explain why it is has just started happening despite there being no recent tc-vcs changes, and why it might be intermittent.

No doubt we have to sort out tc-vcs, rewrite it or at least improve logging, but for now, it is probably not the *root* cause of the problem we are currently experiencing...
trees were reopen because we were hoping this is just a intermittent problem
Briefly scanned through the code, and based on where this is at in the task logs, and what should be going on, I suspect this might be where the error is cropping up.

https://github.com/taskcluster/taskcluster-vcs/blob/2.3.12/src/vcs/detect_remote.js#L44

Not only could these calls be a little more robust (retry+backoff) but since we know more about what operation failed, we could put that in the log prefixed with [taskclsuter-vcs:error] so that this displays properly in treeherder as well.
This seemed temporary. We're moving ahead with a replacement for tc-vcs so not going to dig into this any further.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.