Closed Bug 1653294 Opened 4 years ago Closed 4 years ago

Intermittent Wd | UnknownErrorException: unknown error (500): Too many open files (os error 24)

Categories

(Testing :: geckodriver, defect, P3)

defect

Tracking

(firefox81 fixed)

RESOLVED FIXED
81 Branch
Tracking Status
firefox81 --- fixed

People

(Reporter: whimboo, Assigned: jgraham)

References

Details

Attachments

(1 file)

I was running all the wdspec tests locally via headless mode, and at the end the following failure was reported for a couple of tests:

/webdriver/tests/get_window_handles/user_prompts.py
  ERROR test_dismiss_and_notify[capabilities0-alert] - setup error
capabilities = {'unhandledPromptBehavior': 'dismiss and notify'}
configuration = {'capabilities': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/obj/opt/dist/Nigh...ww1.xn--lve-6lad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}}, 'host': '127.0.0.1', 'port': 55606}
request = <SubRequest 'session' for <Function 'test_dismiss_and_notify[capabilities0-alert]'>>

    @pytest.fixture(scope="function")
    def session(capabilities, configuration, request):
        """Create and start a session for a test that does not itself test session creation.

        By default the session will stay open after each test, but we always try to start a
        new one and assume that if that fails there is already a valid session. This makes it
        possible to recover from some errors that might leave the session in a bad state, but
        does not demand that we start a new session per test."""
        global _current_session

        # Update configuration capabilities with custom ones from the
        # capabilities fixture, which can be set by tests
        caps = copy.deepcopy(configuration["capabilities"])
        caps.update(capabilities)
        caps = {"alwaysMatch": caps}

        # If there is a session with different capabilities active, end it now
        if _current_session is not None and (
                caps != _current_session.requested_capabilities):
            _current_session.end()
            _current_session = None

        if _current_session is None:
            _current_session = webdriver.Session(
                configuration["host"],
                configuration["port"],
                capabilities=caps)
        try:
>           _current_session.start()

capabilities = {'unhandledPromptBehavior': 'dismiss and notify'}
caps       = {'alwaysMatch': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/obj/opt/dist/Night...lad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}, 'unhandledPromptBehavior': 'dismiss and notify'}}
configuration = {'capabilities': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/obj/opt/dist/Nigh...ww1.xn--lve-6lad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}}, 'host': '127.0.0.1', 'port': 55606}
request    = <SubRequest 'session' for <Function 'test_dismiss_and_notify[capabilities0-alert]'>>

testing/web-platform/tests/webdriver/tests/support/fixtures.py:141:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
testing/web-platform/tests/tools/webdriver/webdriver/client.py:435: in start
    value = self.send_command("POST", "session", body=body)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Session (disconnected)>, method = 'POST', url = 'session'
body = {'capabilities': {'alwaysMatch': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/o...ad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}, 'unhandledPromptBehavior': 'dismiss and notify'}}}
timeout = None

    def send_command(self, method, url, body=None, timeout=None):
        """
            Send a command to the remote end and validate its success.

            :param method: HTTP method to use in request.
            :param uri: "Command part" of the HTTP request URL,
                e.g. `window/rect`.
            :param body: Optional body of the HTTP request.

            :return: `None` if the HTTP response body was empty, otherwise
                the `value` field returned after parsing the response
                body as JSON.

            :raises error.WebDriverException: If the remote end returns
                an error.
            :raises ValueError: If the response body does not contain a
                `value` key.
            """

        response = self.transport.send(
            method, url, body,
            encoder=protocol.Encoder, decoder=protocol.Decoder,
            session=self, timeout=timeout)

        if response.status != 200:
            err = error.from_response(response)

            if isinstance(err, error.InvalidSessionIdException):
                # The driver could have already been deleted the session.
                self.session_id = None

>           raise err
E           UnknownErrorException: unknown error (500): Too many open files (os error 24)

body       = {'capabilities': {'alwaysMatch': {'moz:firefoxOptions': {'args': ['--headless'], 'binary': '/Users/henrik/code/gecko/o...ad.not-web-platform.test,xn--lve-6lad.www2.not-web-platform.test'}}, 'unhandledPromptBehavior': 'dismiss and notify'}}}
err        = <UnknownErrorException http_status=500>
method     = 'POST'
response   = <Response status=500 error=<UnknownErrorException http_status=500>>
self       = <Session (disconnected)>
timeout    = None
url        = 'session'

Running ulimit shows that on my MacOS system the number of file descriptors is limited to 256.

James, are these actually files or socket connections we may not correctly clean-up?

Flags: needinfo?(james)

Open sockets require a file descriptor, so both open sockets and files will contribute to the limit.

Flags: needinfo?(james)

The severity field is not set for this bug.
:whimboo, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(hskupin)

I will keep the needinfo for now and will try again this week if I can reproduce this failure.

Severity: -- → S3

When I observe the geckodriver process with the Activity monitor I can clearly see that ports are getting accumulated. As the tab says that those are open ports and files those must still remain open?

[..]
130
localhost:50976->localhost:50212
131
localhost:50976->localhost:50596
132
localhost:50976->localhost:50825

Given that the tool just runs sudo lsof -Fn -p [pid] I did it myself in the command line with the following result:

[..]
f81
nlocalhost:58400->localhost:58987
f82
nlocalhost:58400->localhost:59093
f83
nlocalhost:58400->localhost:59195
f84
nlocalhost:58400->localhost:59297
f85
nlocalhost:58400->localhost:59403
f86
nlocalhost:58400->localhost:59695

And to get details what the port at the end of the line actually is I run sudo lsof -i :59695:

COMMAND   PID   USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
python2.7 591 henrik   83u  IPv4 0x3cc2caa6ffd56527      0t0  TCP localhost:59695->localhost:58400 (ESTABLISHED)
geckodriv 592 henrik   86u  IPv4 0x3cc2caa6ffd53da7      0t0  TCP localhost:58400->localhost:59695 (ESTABLISHED)

So it seems to look like that wptrunner is leaking the ports?

Flags: needinfo?(hskupin) → needinfo?(james)
Assignee: nobody → james
Status: NEW → ASSIGNED

Can you validate the patch on macOS?

Flags: needinfo?(james) → needinfo?(hskupin)

Applied and run locally. We no longer exceed 89 files handles for a geckodriver process. So this looks great, and it's the first time I actually can run all the tests at once. I'm sure that this problem was part of the connection failures we have seen for a lot of wdspec tests in CI. So lets see if results improve.

Can you please push a try build?

Flags: needinfo?(hskupin) → needinfo?(james)
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/29be769939c2
Ensure HTTP connections are closed in websocket tests, r=webdriver-reviewers,whimboo
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/25183 for changes under testing/web-platform/tests
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch
Upstream PR merged by moz-wptsync-bot
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: