Open Bug 1694407 Opened 4 years ago Updated 2 years ago

Gecko Decision task times out when pushing to try from release or beta

Categories

(Developer Infrastructure :: Try, defect)

Tracking

(Not tracked)

People

(Reporter: jstutte, Unassigned)

References

(Blocks 1 open bug)

Details

On a given patch set, the decision task times out reliably.

Flags: needinfo?(mcastelluccio)

The bugbug HTTP service is trying to pull many changesets that are not part of the push, and the pull times out:

Feb 23 14:29:06 bugbug app/worker.3: 2021-02-23 14:29:06,037:INFO:rq.worker:default: bugbug_http.models.schedule_tests('try', 'c79f27f66498130b4133c0052fa3770227670f16') (c6a25f1974764f318b35e0225a984cb8)
Feb 23 14:29:12 bugbug app/worker.3: 2021-02-23 14:29:11,743:INFO:root:Processing schedule_tests:try_c79f27f66498130b4133c0052fa3770227670f16...
Feb 23 14:29:12 bugbug app/worker.3: 2021-02-23 14:29:11,743:INFO:root:Pulling commits from the remote repository...
Feb 23 14:29:35 bugbug app/worker.3: pulling from https://hg.mozilla.org/try/
Feb 23 14:29:35 bugbug app/worker.3: using https://hg.mozilla.org/try/
Feb 23 14:29:35 bugbug app/worker.3: sending capabilities command
Feb 23 14:29:35 bugbug app/worker.3: using ca certificates from certifi
Feb 23 14:29:35 bugbug app/worker.3: using /version-control-tools/third_party/python/certifi/certifi/cacert.pem for CA file
Feb 23 14:29:35 bugbug app/worker.3: preparing listkeys for "bookmarks"
Feb 23 14:29:35 bugbug app/worker.3: sending batch command
Feb 23 14:29:35 bugbug app/worker.3: sending 91 bytes
Feb 23 14:29:35 bugbug app/worker.3: received listkey for "bookmarks": 3080 bytes
Feb 23 14:29:35 bugbug app/worker.3: query 1; heads
Feb 23 14:29:35 bugbug app/worker.3: sending batch command
Feb 23 14:29:35 bugbug app/worker.3: sending 1011 bytes
Feb 23 14:29:35 bugbug app/worker.3: searching for changes
Feb 23 14:29:35 bugbug app/worker.3: taking initial sample
Feb 23 14:29:35 bugbug app/worker.3: query 2; still undecided: 38, sample size is: 38
Feb 23 14:29:35 bugbug app/worker.3: sending known command
Feb 23 14:29:35 bugbug app/worker.3: sending 1563 bytes
Feb 23 14:29:35 bugbug app/worker.3: 2 total queries in 5.1258s
Feb 23 14:29:35 bugbug app/worker.3: sending getbundle command
Feb 23 14:29:35 bugbug app/worker.3: sending 1367 bytes
Feb 23 14:29:35 bugbug app/worker.3: bundle2-input-bundle: with-transaction
Feb 23 14:29:35 bugbug app/worker.3: bundle2-input-part: "changegroup" (params: 1 mandatory 1 advisory) supported
Feb 23 14:29:35 bugbug app/worker.3: adding changesets
Feb 23 14:29:35 bugbug app/worker.3: add changeset 976dd158ef7f
Feb 23 14:29:35 bugbug app/worker.3: add changeset 18b27fa2be84
Feb 23 14:29:35 bugbug app/worker.3: add changeset 26a17424c310
Feb 23 14:29:35 bugbug app/worker.3: add changeset 08cd11c22095
Feb 23 14:29:35 bugbug app/worker.3: add changeset 7e26ca8db92b
Feb 23 14:29:35 bugbug app/worker.3: add changeset 1a53b79ea529
Feb 23 14:29:35 bugbug app/worker.3: add changeset 822bc5cbc8f4
Feb 23 14:29:35 bugbug app/worker.3: add changeset 50de8c1763e2
Feb 23 14:29:35 bugbug app/worker.3: add changeset 17666746e8cc
Feb 23 14:29:35 bugbug app/worker.3: add changeset 18416a172146
Feb 23 14:29:35 bugbug app/worker.3: add changeset 19d48b5f0ca1 
...
Feb 23 14:59:53 bugbug app/worker.1: add changeset c79f27f66498 
Feb 23 14:59:53 bugbug app/worker.1: adding manifests 
Feb 23 14:59:53 bugbug app/worker.1: bundle2-input-bundle: 1 parts total 
Feb 23 14:59:53 bugbug app/worker.1: transaction abort! 
Feb 23 14:59:53 bugbug app/worker.1: rollback completed 
Feb 23 14:59:53 bugbug app/worker.1: (sent 5 HTTP requests and 4725 bytes; received 156164853 bytes in responses) 
Feb 23 14:59:53 bugbug app/worker.1: killed! 
Feb 23 14:59:53 bugbug app/worker.1: 2021-02-23 14:59:53,288:ERROR:rq.worker:Traceback (most recent call last): 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/rq/worker.py", line 975, in perform_job 
Feb 23 14:59:53 bugbug app/worker.1:     rv = job.perform() 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 696, in perform 
Feb 23 14:59:53 bugbug app/worker.1:     self._result = self._execute() 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 719, in _execute 
Feb 23 14:59:53 bugbug app/worker.1:     return self.func(*self.args, **self.kwargs) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/bugbug_http/models.py", line 128, in schedule_tests 
Feb 23 14:59:53 bugbug app/worker.1:     repository.pull(REPO_DIR, branch, rev) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/bugbug/repository.py", line 1349, in pull 
Feb 23 14:59:53 bugbug app/worker.1:     trigger_pull() 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 333, in wrapped_f 
Feb 23 14:59:53 bugbug app/worker.1:     return self(f, *args, **kw) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 423, in __call__ 
Feb 23 14:59:53 bugbug app/worker.1:     do = self.iter(retry_state=retry_state) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 372, in iter 
Feb 23 14:59:53 bugbug app/worker.1:     raise retry_exc.reraise() 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 189, in reraise 
Feb 23 14:59:53 bugbug app/worker.1:     raise self.last_attempt.result() 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result 
Feb 23 14:59:53 bugbug app/worker.1:     return self.__get_result() 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result 
Feb 23 14:59:53 bugbug app/worker.1:     raise self._exception 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 426, in __call__ 
Feb 23 14:59:53 bugbug app/worker.1:     result = fn(*args, **kwargs) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/bugbug/repository.py", line 1338, in trigger_pull 
Feb 23 14:59:53 bugbug app/worker.1:     p.wait(timeout=180) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 208, in sentry_patched_popen_wait 
Feb 23 14:59:53 bugbug app/worker.1:     return old_popen_wait(self, *a, **kw) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/subprocess.py", line 1079, in wait 
Feb 23 14:59:53 bugbug app/worker.1:     return self._wait(timeout=timeout) 
Feb 23 14:59:53 bugbug app/worker.1:   File "/usr/local/lib/python3.8/subprocess.py", line 1796, in _wait 
Feb 23 14:59:53 bugbug app/worker.1:     raise TimeoutExpired(self.args, timeout) 
Feb 23 14:59:53 bugbug app/worker.1: subprocess.TimeoutExpired: Command '['hg', 'pull', b'-rc79f27f66498130b4133c0052fa3770227670f16', b'--debug', b'--', b'https://hg.mozilla.org/try/']' timed out after 180 seconds 

I think the issue is that the patch was based on a commit from "release", so the service was trying to pull everything from "release" (the service's local clone is an "autoland" clone).

A possible fix would be to use a "unified" clone in the service. A mismatch will remain when running "mach try auto" on a "release" commit, since the tests the service knows about might differ from the ones available on release, but at least the pull will not fail with a timeout.
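
For illustration, here is a minimal sketch of the pull-with-timeout pattern visible in the traceback above. It is not the bugbug code: `build_pull_command` and `run_with_timeout` are hypothetical names, and only the `hg pull` arguments and the 180-second limit are taken from the log. With a "unified" local clone, the ancestors from "release" would presumably already be present, so the same command should have far less to transfer before the timeout.

```python
import subprocess
import sys
from typing import List, Optional

# Same limit as the `p.wait(timeout=180)` call in the traceback.
PULL_TIMEOUT_SECONDS = 180


def build_pull_command(repo_url: str, rev: str) -> List[str]:
    """Build an `hg pull` limited to one revision (hypothetical helper).

    Mercurial also has to fetch every ancestor of `rev` that is missing
    locally, which is why the choice of local clone matters.
    """
    return ["hg", "pull", f"-r{rev}", "--debug", "--", repo_url]


def run_with_timeout(
    cmd: List[str],
    cwd: Optional[str] = None,
    timeout: float = PULL_TIMEOUT_SECONDS,
) -> int:
    """Run `cmd` and return its exit code (hypothetical helper).

    If the command exceeds `timeout` seconds, kill the child process,
    reap it, and re-raise subprocess.TimeoutExpired.
    """
    proc = subprocess.Popen(cmd, cwd=cwd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed child so it does not linger
        raise


if __name__ == "__main__":
    # Print the command only; running it requires hg and a local clone.
    print(build_pull_command(
        "https://hg.mozilla.org/try/",
        "c79f27f66498130b4133c0052fa3770227670f16",
    ))
```

The sketch leaves out retries (the traceback shows the real code wraps the pull with tenacity) and only demonstrates how a slow pull turns into `subprocess.TimeoutExpired`.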

Flags: needinfo?(mcastelluccio)
Summary: Gecko Decision task times out → Gecko Decision task times out when pushing to try from release or beta

Is there a better component to track bugbug issues such as this? It's not a task configuration bug.

Flags: needinfo?(mcastelluccio)

We can use Developer Infrastructure::Try.

Component: Task Configuration → Try
Flags: needinfo?(mcastelluccio)
Product: Firefox Build System → Developer Infrastructure
Duplicate of this bug: 1833439
Severity: -- → S3