Closed Bug 1399168 Opened 8 years ago Closed 7 years ago

Various Talos tests fail to retrieve files from https://queue.taskcluster.net

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: markco, Assigned: markco)

Details

Refer to: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c67d7678964c9030f97985f538361529c3d5bc67&selectedJob=130121199 and https://docs.google.com/spreadsheets/d/1GqDuxFpPVDSUvjCbjklh2I2VqGSiKzGtxkpgMQc5j0U/edit#gid=997564298

Seeing a socket error on multiple tests:

https://public-artifacts.taskcluster.net/NDd3uze_R261C7X3qcvJBg/0/public/logs/talos_raw.log

    22:59:03 WARNING - Socket error when accessing https://queue.taskcluster.net/v1/task/RmKFBTigRO2EIs9njrXZ8g/artifacts/public/build/target.zip: ('The read operation timed out',)
    23:00:28 ERROR - Return code: -1073741515
    23:00:28 ERROR - Return code: -1073741515
    23:00:28 FATAL - Uncaught exception: Traceback (most recent call last):
    23:00:28 FATAL -   File "C:\Users\task_1505170357\mozharness\mozharness\base\script.py", line 2058, in run
    23:00:28 FATAL -     self.run_action(action)
    23:00:28 FATAL -   File "C:\Users\task_1505170357\mozharness\mozharness\base\script.py", line 1997, in run_action
    23:00:28 FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
    23:00:28 FATAL -   File "C:\Users\task_1505170357\mozharness\mozharness\base\script.py", line 1937, in _possibly_run_method
    23:00:28 FATAL -     return getattr(self, method_name)()
    23:00:28 FATAL -   File "C:\Users\task_1505170357\mozharness\mozharness\mozilla\testing\talos.py", line 436, in setup_mitmproxy
    23:00:28 FATAL -     self.setup_py3_virtualenv()
    23:00:28 FATAL -   File "C:\Users\task_1505170357\mozharness\mozharness\mozilla\testing\talos.py", line 453, in setup_py3_virtualenv
    23:00:28 FATAL -     self.py3_venv_configuration(python_path=self.py3_path, venv_path='py3venv')
    23:00:28 FATAL -   File "C:\Users\task_1505170357\mozharness\mozharness\base\python.py", line 815, in py3_venv_configuration
    23:00:28 FATAL -     [self.py3_python_path, '--version'], env=self.query_env()).split()[-1]
    23:00:28 FATAL - AttributeError: 'NoneType' object has no attribute 'split'
    23:00:28 FATAL - Running post_fatal callback...
    23:00:28 FATAL - Exiting -1

test-windows7-32/opt-talos-svgr-e10s

https://public-artifacts.taskcluster.net/EuJ20i6uSemAriccNAOwWQ/0/public/logs/talos_raw.log

    20:31:41 FATAL - Can't download from https://queue.taskcluster.net/v1/task/RmKFBTigRO2EIs9njrXZ8g/artifacts/public/build/target.common.tests.zip
    20:31:41 FATAL - Caught exception: ('The read operation timed out',)
    20:31:41 FATAL - Caught exception: ('The read operation timed out',)
    20:31:41 FATAL - Caught exception: ('The read operation timed out',)
    20:31:41 FATAL - Caught exception: ('The read operation timed out',)
    20:31:41 FATAL - Caught exception: ('The read operation timed out',)
    20:31:41 FATAL - Running post_fatal callback...
    20:31:41 FATAL - Exiting -1
    20:31:41 WARNING - Blob upload gear skipped. Missin
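The first log mixes two problems: the artifact read timeout, and a crash in setup_py3_virtualenv where the Python 3 version query returned None, so `.split()` raised AttributeError. As a hedged aside, the return code -1073741515 reinterpreted as an unsigned 32-bit value is 0xC0000135, the Windows NTSTATUS code STATUS_DLL_NOT_FOUND, which may mean the interpreter never started; the bug itself does not confirm this. The sketch below is not mozharness code: `decode_return_code` and `query_python_version` are hypothetical helpers that decode such codes and fail with a descriptive error instead of an AttributeError.

```python
import subprocess

# A few well-known Windows NTSTATUS values (hypothetical lookup table; extend as needed).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def decode_return_code(code):
    """Reinterpret a signed or unsigned Windows exit code as a 32-bit NTSTATUS value.

    Only meaningful for Windows exit codes; on POSIX a negative code means a signal.
    """
    unsigned = code & 0xFFFFFFFF  # two's-complement view of the 32-bit value
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


def query_python_version(python_path, timeout=60):
    """Return the version from `<python_path> --version`, raising a descriptive error on failure."""
    try:
        result = subprocess.run(
            [python_path, "--version"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        raise RuntimeError(f"could not run {python_path}: {exc}") from exc
    if result.returncode != 0:
        raise RuntimeError(
            f"{python_path} --version failed with {decode_return_code(result.returncode)}"
        )
    # Python 3.4+ prints the version on stdout; Python 2 printed it on stderr.
    output = (result.stdout or result.stderr).strip()
    if not output:
        raise RuntimeError(f"{python_path} --version printed nothing")
    return output.split()[-1]
```

For example, `decode_return_code(-1073741515)` gives `"0xC0000135 (STATUS_DLL_NOT_FOUND)"`. Raising early keeps the real cause in the log rather than surfacing it later as a `'NoneType' object has no attribute 'split'`.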
Blocks: 1391009
I am able to connect: the request initially receives a 303 response and the download begins, but it never completes.

    C:\Users\task_1505172480>wget https://queue.taskcluster.net/v1/task/RmKFBTigRO2EIs9njrXZ8g/artifacts/public/build/target.zip
    --2017-09-12 17:49:55--  https://queue.taskcluster.net/v1/task/RmKFBTigRO2EIs9njrXZ8g/artifacts/public/build/target.zip
    Resolving queue.taskcluster.net (queue.taskcluster.net)... 54.243.104.249, 23.23.151.4, 23.23.190.202
    Connecting to queue.taskcluster.net (queue.taskcluster.net)|54.243.104.249|:443... connected.
    HTTP request sent, awaiting response... 303 See Other
    Location: https://public-artifacts.taskcluster.net/RmKFBTigRO2EIs9njrXZ8g/0/public/build/target.zip [following]
    --2017-09-12 17:49:56--  https://public-artifacts.taskcluster.net/RmKFBTigRO2EIs9njrXZ8g/0/public/build/target.zip
    Resolving public-artifacts.taskcluster.net (public-artifacts.taskcluster.net)... 52.84.239.190
    Connecting to public-artifacts.taskcluster.net (public-artifacts.taskcluster.net)|52.84.239.190|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 52732423 (50M) [application/x-zip-compressed]
    Saving to: 'target.zip'

    target.zip 22%[===> ] 11.33M 3.25MB/s eta 13s

And it just hangs there.

Dustin: any thoughts or suggestions on where to start, or whom to ask about this?
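The wget trace shows two hops: queue.taskcluster.net answers 303 See Other with a Location on public-artifacts.taskcluster.net, and that host returns 200 OK and the body. To check the first hop on its own, a minimal sketch can issue one GET without following the redirect; it relies on the fact that the standard library's `http.client` never follows redirects. `resolve_artifact_redirect` is a hypothetical helper, and the hostnames from this 2017 bug may no longer resolve.

```python
import http.client
from urllib.parse import urlsplit


def resolve_artifact_redirect(url, timeout=30):
    """Send a single GET to `url` without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    # netloc may include ":port"; HTTPConnection parses it out of the host string.
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        # A 303 comes back as-is, so the redirect target can be logged separately.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Knowing the Location separately makes it possible to time the second hop (the artifact host) in isolation from the queue.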
Flags: needinfo?(dustin)
If it's getting the 303 and downloading 11.33M, then it's talking directly to S3; no Taskcluster (TC) code is involved in that transaction :( S3 has a nonzero failure rate, but its failures usually aren't hangs and usually aren't repeatable. If this is repeatable, with both wget and Python, that suggests something in the host's networking stack is at fault: the Windows firewall, maybe, or the NIC driver, ...
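To repeat the download from Python as suggested, a minimal sketch (not mozharness's downloader; `download_with_resume` is hypothetical) can stream the artifact with a socket timeout, report how many bytes arrived before each stall, and resume with an HTTP Range request. It assumes the artifact host honours Range (S3 does for GET); if a server answers 200 instead of 206, the sketch restarts from zero. The `timeout` given to `urlopen` bounds each blocking socket operation, not the whole transfer.

```python
import http.client
import urllib.request


def download_with_resume(url, dest_path, timeout=30.0, max_attempts=5, chunk_size=64 * 1024):
    """Stream `url` into `dest_path`, resuming with a Range request after a stall or early close."""
    received = 0  # bytes written to dest_path so far
    total = None  # full size, once a response has told us
    with open(dest_path, "wb") as out:
        for attempt in range(1, max_attempts + 1):
            headers = {"Range": f"bytes={received}-"} if received else {}
            request = urllib.request.Request(url, headers=headers)
            try:
                # urlopen follows the 303 redirect (the Range header is carried over);
                # `timeout` bounds the connect and each blocking socket read.
                with urllib.request.urlopen(request, timeout=timeout) as resp:
                    if resp.status == 206:
                        # e.g. "bytes 1000-52732422/52732423"
                        tail = resp.headers.get("Content-Range", "").rpartition("/")[2]
                        total = int(tail) if tail.isdigit() else None
                    else:
                        # Full body: first attempt, or the server ignored Range.
                        out.seek(0)
                        out.truncate()
                        received = 0
                        length = resp.headers.get("Content-Length")
                        total = int(length) if length and length.isdigit() else None
                    while True:
                        chunk = resp.read(chunk_size)
                        if not chunk:
                            break
                        out.write(chunk)
                        received += len(chunk)
            except (OSError, http.client.HTTPException) as exc:
                # Read timeouts, connection resets and URLError are all OSError subclasses.
                print(f"attempt {attempt}: stalled after {received} bytes: {exc!r}")
                continue
            if total is None or received >= total:
                return received
            # http.client returns b"" on an early close instead of raising, so check the size.
            print(f"attempt {attempt}: connection closed at {received}/{total} bytes")
    raise RuntimeError(f"gave up after {max_attempts} attempts with {received} bytes")
```

Run against the queue URL from the wget trace, the per-attempt byte counts show whether the transfer stalls at a similar point each time on this host; comparing with another host would help separate a host networking problem from an S3 one.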
Flags: needinfo?(dustin) (cleared)
Assignee: relops → mcornmesser
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED