Intermittent Wd | UnknownErrorException: unknown error (500): Too many open files (os error 24)
Categories
(Testing :: geckodriver, defect, P3)
Tracking
(firefox81 fixed)
Tracking | Status | |
---|---|---|
firefox81 | --- | fixed |
People
(Reporter: whimboo, Assigned: jgraham)
References
Details
Attachments
(1 file)
I was running all the wdspec tests locally via headless mode, and at the end the following failure was reported for a couple of tests:
/webdriver/tests/get_window_handles/user_prompts.py
ERROR test_dismiss_and_notify[capabilities0-alert] - setup error
capabilities = {'unhandledPromptBehavior': 'dismiss and notify'}
configuration = {'capabilities': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/obj/opt/dist/Nigh...ww1.xn--lve-6lad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}}, 'host': '127.0.0.1', 'port': 55606}
request = <SubRequest 'session' for <Function 'test_dismiss_and_notify[capabilities0-alert]'>>
@pytest.fixture(scope="function")
def session(capabilities, configuration, request):
"""Create and start a session for a test that does not itself test session creation.
By default the session will stay open after each test, but we always try to start a
new one and assume that if that fails there is already a valid session. This makes it
possible to recover from some errors that might leave the session in a bad state, but
does not demand that we start a new session per test."""
global _current_session
# Update configuration capabilities with custom ones from the
# capabilities fixture, which can be set by tests
caps = copy.deepcopy(configuration["capabilities"])
caps.update(capabilities)
caps = {"alwaysMatch": caps}
# If there is a session with different capabilities active, end it now
if _current_session is not None and (
caps != _current_session.requested_capabilities):
_current_session.end()
_current_session = None
if _current_session is None:
_current_session = webdriver.Session(
configuration["host"],
configuration["port"],
capabilities=caps)
try:
> _current_session.start()
capabilities = {'unhandledPromptBehavior': 'dismiss and notify'}
caps = {'alwaysMatch': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/obj/opt/dist/Night...lad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}, 'unhandledPromptBehavior': 'dismiss and notify'}}
configuration = {'capabilities': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/obj/opt/dist/Nigh...ww1.xn--lve-6lad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}}, 'host': '127.0.0.1', 'port': 55606}
request = <SubRequest 'session' for <Function 'test_dismiss_and_notify[capabilities0-alert]'>>
testing/web-platform/tests/webdriver/tests/support/fixtures.py:141:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
testing/web-platform/tests/tools/webdriver/webdriver/client.py:435: in start
value = self.send_command("POST", "session", body=body)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Session (disconnected)>, method = 'POST', url = 'session'
body = {'capabilities': {'alwaysMatch': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/o...ad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}, 'unhandledPromptBehavior': 'dismiss and notify'}}}
timeout = None
def send_command(self, method, url, body=None, timeout=None):
"""
Send a command to the remote end and validate its success.
:param method: HTTP method to use in request.
:param uri: "Command part" of the HTTP request URL,
e.g. `window/rect`.
:param body: Optional body of the HTTP request.
:return: `None` if the HTTP response body was empty, otherwise
the `value` field returned after parsing the response
body as JSON.
:raises error.WebDriverException: If the remote end returns
an error.
:raises ValueError: If the response body does not contain a
`value` key.
"""
response = self.transport.send(
method, url, body,
encoder=protocol.Encoder, decoder=protocol.Decoder,
session=self, timeout=timeout)
if response.status != 200:
err = error.from_response(response)
if isinstance(err, error.InvalidSessionIdException):
# The driver could have already been deleted the session.
self.session_id = None
> raise err
E UnknownErrorException: unknown error (500): Too many open files (os error 24)
body = {'capabilities': {'alwaysMatch': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/o...ad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}, 'unhandledPromptBehavior': 'dismiss and notify'}}}
err = <UnknownErrorException http_status=500>
method = 'POST'
response = <Response status=500 error=<UnknownErrorException http_status=500>>
self = <Session (disconnected)>
timeout = None
url = 'session'
Running ulimit
shows that on my MacOS system the number of file descriptors is limited to 256.
James, are these actually files or socket connections we may not correctly clean-up?
Assignee | ||
Comment 1•5 years ago
|
||
Open sockets require a file descriptor, so both open sockets and files will contribute to the limit.
Comment 2•5 years ago
|
||
The severity field is not set for this bug.
:whimboo, could you have a look please?
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 3•5 years ago
|
||
I will keep the needinfo for now and will try again this week if I can reproduce this failure.
Reporter | ||
Comment 4•5 years ago
|
||
When I observe the geckodriver process with the Activity monitor I can clearly see that ports are getting accumulated. As the tab says that those are open ports and files
those must still remain open?
[..]
130
localhost:50976->localhost:50212
131
localhost:50976->localhost:50596
132
localhost:50976->localhost:50825
Given that the tool just runs sudo lsof -Fn -p [pid]
I did it myself in the command line with the following result:
[..]
f81
nlocalhost:58400->localhost:58987
f82
nlocalhost:58400->localhost:59093
f83
nlocalhost:58400->localhost:59195
f84
nlocalhost:58400->localhost:59297
f85
nlocalhost:58400->localhost:59403
f86
nlocalhost:58400->localhost:59695
And to get details what the port at the end of the line actually is I run sudo lsof -i :59695
:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python2.7 591 henrik 83u IPv4 0x3cc2caa6ffd56527 0t0 TCP localhost:59695->localhost:58400 (ESTABLISHED)
geckodriv 592 henrik 86u IPv4 0x3cc2caa6ffd53da7 0t0 TCP localhost:58400->localhost:59695 (ESTABLISHED)
So it seems to look like that wptrunner is leaking the ports?
Assignee | ||
Comment 5•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 6•5 years ago
|
||
Can you validate the patch on macOS?
Reporter | ||
Comment 7•5 years ago
|
||
Applied and run locally. We no longer exceed 89 files handles for a geckodriver process. So this looks great, and it's the first time I actually can run all the tests at once. I'm sure that this problem was part of the connection failures we have seen for a lot of wdspec tests in CI. So lets see if results improve.
Can you please push a try build?
Assignee | ||
Comment 8•5 years ago
|
||
I had, I just didn't post the link :) https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=d47d5729aefbf290e955861368fce284e6ae3d8a
Comment 11•5 years ago
|
||
bugherder |
Description
•