[traceback] ConnectionError HTTPSConnectionPool(...): Read timed out.
Categories
(Tecken :: Symbolication, defect, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
References
Details
Attachments
(1 file)
Eliot got a ConnectionError when trying to download something from S3. It should be using a session that's set up to retry, so I'm puzzled.
Sentry: https://sentry.prod.mozaws.net/operations/symbols-stage/issues/13176320/
timeout: The read operation timed out
File "urllib3/response.py", line 436, in _error_catcher
yield
File "urllib3/response.py", line 518, in read
data = self._fp.read(amt) if not fp_closed else b""
File "http/client.py", line 462, in read
n = self.readinto(b)
File "http/client.py", line 506, in readinto
n = self.fp.readinto(b)
File "python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "python3.9/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "python3.9/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ReadTimeoutError: HTTPSConnectionPool(host='s3.us-west-2.amazonaws.com', port=443): Read timed out.
File "requests/models.py", line 758, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "urllib3/response.py", line 575, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "urllib3/response.py", line 540, in read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "urllib3/response.py", line 441, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
ConnectionError: HTTPSConnectionPool(host='s3.us-west-2.amazonaws.com', port=443): Read timed out.
File "falcon/app.py", line 361, in falcon.app.App.__call__
File "markus/main.py", line 515, in _timer_decorator
return fun(*args, **kwargs)
File "eliot/symbolicate_resource.py", line 500, in on_post
results.append(self.symbolicate(stacks, modules))
File "eliot/symbolicate_resource.py", line 325, in symbolicate
self.get_symcache(module_info)
File "eliot/symbolicate_resource.py", line 257, in get_symcache
sym_file = self.download_sym_file(debug_filename, debug_id)
File "eliot/symbolicate_resource.py", line 183, in download_sym_file
data = self.downloader.get(debug_filename, debug_id, sym_filename)
File "eliot/downloader.py", line 163, in get
return source.get(debug_filename, debug_id, filename)
File "eliot/downloader.py", line 61, in _time_download_fun
rv = fun(*args, **kwargs)
File "eliot/downloader.py", line 114, in get
resp = self.session.get(url, allow_redirects=True)
File "requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "raven/breadcrumbs.py", line 341, in send
resp = real_send(self, request, *args, **kwargs)
File "requests/sessions.py", line 677, in send
history = [resp for resp in gen]
File "requests/sessions.py", line 677, in <listcomp>
history = [resp for resp in gen]
File "requests/sessions.py", line 237, in resolve_redirects
resp = self.send(
File "raven/breadcrumbs.py", line 341, in send
resp = real_send(self, request, *args, **kwargs)
File "requests/sessions.py", line 697, in send
r.content
File "requests/models.py", line 836, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "requests/models.py", line 765, in generate
raise ConnectionError(e)
I briefly looked at the urllib3 retry code and it's possible we need to pass in an additional parameter specifying how many times to retry on a connection error:
I think the requests library still vendors urllib3. If that's the case, we need to check what the vendored version is doing.
Anyhow, this bug covers figuring out what's going on and fixing it. We can't be having connection errors messing everything up.
| Assignee | ||
Updated•4 years ago
|
| Assignee | ||
Comment 1•4 years ago
|
||
I think what's going on is that it is retrying after a ConnectionError, but it's hitting the max retries and reraising the error.
This error is from September 17th, 2021. We were using urllib3 1.25.11 at the time.
The retry code is here:
When Eliot creates a Retry instance, it doesn't specify read, so read is set in the __init__ as None. However, this code is looking for False. I have no idea when read could ever be False. Also, the type hints from their master tip suggest the value is either int or None. Curious.
In our case, read is set to None and so it should skip the bit about read and retry total number of times. However, if that happens, then we should have gotten a MaxRetryError at the top of the error stack and we didn't.
Looking at the session code and the adapter code, it sure looks like the HTTPAdapterWithTimeout only covers sending the HTTP request and once a request has been sent, it generates a response instance and it's done. Thus the read timeout isn't covered in the retry setup we built.
I need to look into this more. Probably by writing an HTTP server that accepts the HTTP request and then starts to issue a response and then sits on it. Then I can set up a client with the session_with_retries session and see what it does.
| Assignee | ||
Comment 2•4 years ago
|
||
Here's a script that reproduces the issue:
from eliot.librequests import session_with_retries
import logging
import http
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
req_log = logging.getLogger('urllib3')
req_log.setLevel(logging.DEBUG)
req_log.propagate = True
session = session_with_retries()
print(session.get("https://httpbin.org/drip?duration=60&numbytes=2"))
Result:
... lots of traceback output ...
Traceback (most recent call last):
File "/app/eliot-service/ret.py", line 17, in <module>
print(session.get("https://httpbin.org/drip?duration=60&numbytes=2"))
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 697, in send
r.content
File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 836, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 765, in generate
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='httpbin.org', port=443): Read timed out.
httpbin drip will send send the response headers and then send numbytes of data (2 here) over duration seconds (60). That exceeds the 5 second timeout session_with_retries defaults to. We get the exact error as in the Sentry issue and no mention of retrying.
Let's switch it to httpbin delay which gets the HTTP request, then waits X seconds, and then sends the whole response:
from eliot.librequests import session_with_retries
import logging
import http
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
req_log = logging.getLogger('urllib3')
req_log.setLevel(logging.DEBUG)
req_log.propagate = True
session = session_with_retries()
print(session.get("https://httpbin.org/delay/60"))
Result:
... lots of retry output for 10 attempts ...
Traceback (most recent call last):
File "/app/eliot-service/ret.py", line 18, in <module>
print(session.get("https://httpbin.org/delay/60"))
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/app/eliot-service/eliot/librequests.py", line 37, in send
return super().send(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /delay/60 (Caused by ReadTimeoutError("HTTPSConnectionPool(host='httpbin.org', port=443): Read timed out. (read timeout=5.0)"))
Ergo, I'm right on and the session_with_retries setup doesn't cover timeouts when reading the body of the HTTP response and I need a different way of doing things.
I think I could probably return a session wrapper that does what I want it to, but it'll be messy because it's got multiple things "retrying". Peter used to use backoff. I'll look at that first.
| Assignee | ||
Comment 3•4 years ago
|
||
| Assignee | ||
Comment 4•4 years ago
|
||
willkg merged PR #2445: "bug 1739202: rework requests wrappers to handle content errors" in 94b5d32.
I ended up reworking things and moving the "retryable" stuff off to backoff. That allows me to wrap a method that sends the HTTP request and also reads the response payload as a retryable block.
| Assignee | ||
Comment 5•4 years ago
|
||
This deployed to prod in bug #1739601. Marking as FIXED.
Description
•