Marionette connection timeout when running mochitest with --valgrind

RESOLVED FIXED in Firefox 46

Status

Testing
Mochitest
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: ahal, Assigned: ahal)

Tracking

unspecified
mozilla47
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox46 fixed, firefox47 fixed)

Details

MozReview Requests

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

2 years ago
This is fallout from bug 1231784.

See:
https://public-artifacts.taskcluster.net/RDU46_RkTIqPn3ifDzr9qg/0/public/logs/live_backing.log

Apparently using --valgrind can make Gecko *really* slow to start up. My theory is that the default 60 second connection timeout in marionette isn't sufficient:
https://dxr.mozilla.org/mozilla-central/source/testing/marionette/driver/marionette_driver/marionette.py#1130

Ted points out that it may also be insufficient when running with certain debuggers attached. If this is the case, we'll need to bump that timeout when running with these types of tools.
Created attachment 8719442 [details] [diff] [review]
demo patch (do not commit)

Just to confirm, per irc w/ ahal on Thursday (?), increasing the
marionette startup timeout from 60 seconds to 900 seconds fixed the
problem for me, when running on Valgrind.  As ahal suspected it would.
Blocks: 1245566
(Assignee)

Comment 2

2 years ago
Excellent. I'll add a hidden argument for controlling the marionette timeout. Then I'll automatically bump it up higher if --valgrind or --debugger is set.
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Created attachment 8719760 [details] [diff] [review]
bug1248056-1.diff

I found a second timeout that needs to be adjusted.  So here's a
revised reference patch that shows both changes.
Attachment #8719442 - Attachment is obsolete: true
(Assignee)

Comment 4

2 years ago
Created attachment 8719781 [details]
MozReview Request: Bug 1248056 - Increase marionette connection timeouts in mochitest when running with valgrind, r?chmanchester

Review commit: https://reviewboard.mozilla.org/r/35113/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35113/
Attachment #8719781 - Flags: review?(cmanchester)
Attachment #8719781 - Flags: review?(cmanchester) → review+
Comment on attachment 8719781 [details]
MozReview Request: Bug 1248056 - Increase marionette connection timeouts in mochitest when running with valgrind, r?chmanchester

https://reviewboard.mozilla.org/r/35113/#review31729

::: testing/mochitest/mochitest_options.py:565
(Diff revision 1)
> +        [["--marionette-port-timeout"],
> +         {"default": 60,
> +          "help": "Timeout while waiting for the marionette port to open.",
> +          "suppress": True,
> +          }],
> +        [["--marionette-socket-timeout"],
> +         {"default": 360.0,
> +          "help": "Timeout while waiting to receive a message from the marionette server.",
> +          "suppress": True,
> +          }],

Consider leaving these un-set when they're not needed so we don't have these defaults both here and in marionette.

::: testing/mochitest/runtests.py:2356
(Diff revision 1)
>              marionette_args = {
>                  'symbols_path': options.symbolsPath,
> +                'socket_timeout': options.marionette_socket_timeout,
> +                'port_timeout': options.marionette_port_timeout,

These args end up in the constructor for marionette, right? I don't see port_timeout as a valid keyword argument there.
(Assignee)

Comment 6

2 years ago
https://reviewboard.mozilla.org/r/35113/#review31729

> Consider leaving these un-set when they're not needed so we don't have these defaults both here and in marionette.

I was trying to avoid having to wrap all my marionette calls in if statements as passing in None to them overrides the default. But I'll fix this by making a change to marionette as well.

> These args end up in the constructor for marionette, right? I don't see port_timeout as a valid keyword argument there.

It gets popped back off before being passed to the constructor. I did it this way to avoid changing the signature of runApp (which meant futzing with b2g and android). But if you feel particularly strongly about it, I can pass it in via the runApp signature instead of piggy-backing off marionette_args.
(Assignee)

Comment 7

2 years ago
Comment on attachment 8719781 [details]
MozReview Request: Bug 1248056 - Increase marionette connection timeouts in mochitest when running with valgrind, r?chmanchester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/35113/diff/1-2/

Comment 8

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf27afd3c5c6
Does it even make sense to have a timeout for an 'interactive' debugger? The debugger could just be waiting for the user to give the command to start the target, why would timing out make sense in that case?
(Assignee)

Comment 10

2 years ago
That's a good point. Marionette currently polls for the server every 100ms, and there isn't any way to disable the timeout completely (though that could be changed). Though one benefit of a timeout is to prevent this from accidentally getting run indefinitely in a screen session or similar.

But even so, with this patch developers have 15 minutes to start the target. And if they somehow get distracted and miss this timeout, it's just a matter of re-running the command. For those reasons this use case isn't terribly compelling to me. Though I'm open to further debate.
15 minutes certainly sounds better already. I've had multiple instances where I was looking up where to set a breakpoint only to have my run killed once I switched back to my terminal. I still don't think that that timeout makes sense, but I guess I'll just set a huge one using the --marionette-*-timeout arguments.

Comment 12

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/bf27afd3c5c6
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox47: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(Assignee)

Comment 13

2 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/e8842acd6224
status-firefox46: --- → fixed
(In reply to Andrew Halberstadt [:ahal] from comment #7)
> Comment on attachment 8719781 [details]
> MozReview Request: Bug 1248056 - Increase marionette connection timeouts in
> mochitest when running with valgrind, r?chmanchester

The patch as landed here was not that ideal and doesn't correspond with the commit message as given for mozreview. It actually bumped up the default socket timeout for each and every Marionette based test harness. As result also tests running on opt builds had this timeout set, which increased our test execution times drastically in case of socket failures. I have seen this a lot lately due to all sort of e10s issues. 

Please keep in mind that I reverted the default value back to 60s on bug 1284457 to get rid of those long wait times on build and test machines. So if any other test harness or developers want to use a different timeout they will have to specify that via the command line option --socket-timeout.
See Also: → bug 1284457
You need to log in before you can comment on or make changes to this bug.