Closed
Bug 1207125
Opened 9 years ago
Closed 7 years ago
Marionette._send_message() mixes up response values of different calls
Categories
(Remote Protocol :: Marionette, defect, P3)
Remote Protocol
Marionette
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: whimboo, Unassigned)
References
Details
(Keywords: pi-marionette-client)
File "/mozilla/code/gecko/testing/marionette/client/marionette/runner/base.py", line 530, in gather_debug
rv['screenshot'] = marionette.screenshot()
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 1241, in using_context
self.set_context(scope)
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 1216, in set_context
assert(context == self.CONTEXT_CHROME or context == self.CONTEXT_CONTENT)
AssertionError
[..]
Traceback (most recent call last):
File "/mozilla/code/gecko/testing/marionette/client/marionette/marionette_test.py", line 296, in run
testMethod()
File "/mozilla/code/firefox-ui-tests/firefox_ui_tests/update/direct/test_direct_update.py", line 26, in test_update
self._test_update()
File "/mozilla/code/firefox-ui-tests/firefox_ui_tests/update/direct/test_direct_update.py", line 20, in _test_update
self.download_and_apply_available_update(force_fallback=False)
File "/mozilla/code/firefox-ui-tests/firefox_ui_harness/testcases/update.py", line 158, in download_and_apply_available_update
self.updates[self.current_update_index]['patch'] = about_window.patch_info
File "/mozilla/code/firefox-ui-tests/firefox_puppeteer/ui/about_window/window.py", line 51, in patch_info
patch = self._software_update.patch_info
File "/mozilla/code/firefox-ui-tests/firefox_puppeteer/api/software_update.py", line 295, in patch_info
info = {'channel': self.update_channel.channel}
File "/mozilla/code/firefox-ui-tests/firefox_puppeteer/api/software_update.py", line 401, in channel
return self.prefs.get_pref('app.update.channel', True)
File "/mozilla/code/firefox-ui-tests/firefox_puppeteer/api/prefs.py", line 70, in get_pref
""", script_args=[pref_name, default_branch, interface])
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 1241, in using_context
self.set_context(scope)
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 1216, in set_context
assert(context == self.CONTEXT_CHROME or context == self.CONTEXT_CONTENT)
AssertionError
The extra assert line in the code doesn't give that much information about the actual scope to be set. If you remove it and let the next line throw a ValueError you will see that the scope to be set is "nightly"!
Reporter | ||
Comment 1•9 years ago
|
||
Adding some debug code to using_context shows that very strange output:
current scope: /tmp/tmpAkdPyG.firefox-ui-update/binary.backup/defaults/pref/channel-prefs.js
or
current scope: nightly
None of those scopes get set somewhere. So I wonder if the following method can return bogus values:
self._send_message("getContext", key="value")
Reporter | ||
Comment 2•9 years ago
|
||
Here some Marionette debug output:
************************** using context *******
** wanted: chrome
1442926688558 Marionette DEBUG conn1 -> {"name":"executeScript","parameters":{"scriptTimeout":null,"newSandbox":true,"args":["app.update.channel",true,null],"filename":"prefs.py","script":"\n Components.utils.import(\"resource://gre/modules/Services.jsm\");\n\n let pref_name = arguments[0];\n let default_branch = arguments[1];\n let interface = arguments[2];\n\n let prefBranch;\n if (default_branch) {\n prefBranch = Services.prefs.getDefaultBranch(\"\");\n }\n else {\n prefBranch = Services.prefs;\n }\n\n // If an interface has been set, handle it differently\n if (interface !== null) {\n return prefBranch.getComplexValue(pref_name,\n Components.interfaces[interface]).data;\n }\n\n let type = prefBranch.getPrefType(pref_name);\n\n switch (type) {\n case prefBranch.PREF_STRING:\n return prefBranch.getCharPref(pref_name);\n case prefBranch.PREF_BOOL:\n return prefBranch.getBoolPref(pref_name);\n case prefBranch.PREF_INT:\n return prefBranch.getIntPref(pref_name);\n case prefBranch.PREF_INVALID:\n return null;\n }\n ","sandbox":"default","line":70}}
1442926688560 Marionette DEBUG conn1 client <- {"value":"nightly"}
**current: nightly
The code shows that we call using_context('chrome') to get a preference value from Firefox via execute_script. In this case the return value is "nightly". Somehow the using_context() method gets this value via the call to getContext and stores it internally for the current scope. Then once execute_script has been finished and the original state has to be restored it will accidentally restore the scope to "nightly" but not "content" or "chrome".
Reporter | ||
Comment 3•9 years ago
|
||
Just to add that all the above can only be seen when you hit Ctrl+C while the tests are running.
Another failure which falls into this bucket is:
Traceback (most recent call last):
File "/mozilla/code/gecko/testing/marionette/client/marionette/runner/base.py", line 529, in gather_debug
with marionette.using_context(marionette.CONTEXT_CHROME):
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 1234, in using_context
scope = self._send_message("getContext", key="value")
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/decorators.py", line 36, in _
return func(*args, **kwargs)
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 722, in _send_message
return self._unwrap_response(resp.get(key))
AttributeError: 'list' object has no attribute 'get'
Adding debugging code to _send_message gives me the following output:
> **** response for getContext
> [u'3', u'14']
That means for getContext we return the list of open chrome window handles.
So I feel that the _send_message method is busted in those scenarios and returns a value which was set by a different invocation?
Here the steps to reproduce:
1. Clone the firefox-ui-tests repo (https://github.com/mozilla/firefox-ui-tests)
2. Run python setup.py develop
3. Execute firefox-ui-update --binary %path% --update-direct-only
4. After Firefox restarted the first time and the about window opens, quickly switch to the console and hit Ctrl+C
This reliably fails for me.
Summary: Firefox Update tests can cause Marionette to fail with: "Unknown context: nightly" → Marionette._send_message() mixes up response values of different calls
Reporter | ||
Comment 4•9 years ago
|
||
Here some debugging output for responses we get:
**** response for executeScript
**** key: value
**** reponse: {u'value': True}
^C**** response for getContext
**** key: value
**** reponse: {u'value': u'14'}
**** response for setContext
**** key: None
**** reponse: {u'value': u'chrome'}
**** response for executeScript
**** key: value
**** reponse: {}
**** response for getChromeWindowHandles
**** key: None
**** reponse: {u'value': u'nightly'}
**** response for getContext
**** key: value
**** reponse: [u'3', u'14']
As you can see all is fine until the last call to executeScript. Here an empty dict is returned first but the real return value of it ('nightly') comes back for getChromeWindowHandles. And the handles actually for getContext.
Reporter | ||
Comment 5•9 years ago
|
||
Here some debug output with more context:
**** command: {"name": "getChromeWindowHandles"}
**** key for getChromeWindowHandles: None
**** reponse for getChromeWindowHandles: [u'3', u'14']
**** command: {"name": "executeScript", "parameters": {"scriptTimeout": null, "newSandbox": true, "args": [], "filename": "software_update.py", "script": "\n Components.utils.import('resource://gre/modules/Services.jsm');\n\n let file = Services.dirsvc.get('PrfDef', Components.interfaces.nsIFile);\n file.append('channel-prefs.js');\n\n return file.path;\n ", "sandbox": "default", "line": 389}}
**** key for executeScript: value
**** reponse for executeScript: {u'value': u'/tmp/tmpoFtz1S.firefox-ui-update/binary.backup/defaults/pref/channel-prefs.js'}
**** command: {"name": "executeScript", "parameters": {"scriptTimeout": null, "newSandbox": true, "args": [], "filename": "software_update.py", "script": "\n Components.utils.import('resource://gre/modules/Services.jsm');\n\n let file = Services.dirsvc.get('GreD', Components.interfaces.nsIFile);\n file.append('update-settings.ini');\n\n return file.path;\n ", "sandbox": "default", "line": 95}}
^C**** command: {"name": "getContext"}
**** key for getContext: value
**** reponse for getContext: {u'value': u'/tmp/tmpoFtz1S.firefox-ui-update/binary.backup/update-settings.ini'}
**** command: {"name": "setContext", "parameters": {"value": "chrome"}}
**** key for setContext: None
**** reponse for setContext: {u'value': u'chrome'}
See ^C at the beginning of the line which immediately sends a getContext command while the old executeScript command has not been returned yet. As result the response we get for getContext is actually the one for executeScript. :(
Comment 6•9 years ago
|
||
Thanks for that last debug round with more information, that’s incredibly useful!
I’m guessing there’s an interrupt handler in the Marionette runner that on SIGTERM aborts the current operation and sends a getContext message and blocks on receiving a response, but when the response comes back it’s from the previous executeScript command.
This stems from a complete lack of synchronous enforcement in Marionette: The server gladly executes any command you request it to, even if there an existing command is in progress. There is an expectation that the local end blocks until it receives a response back before issuing a new command.
Updated•9 years ago
|
Keywords: ateam-marionette-server
Summary: Marionette._send_message() mixes up response values of different calls → Implement a synchronous, blocking command queue for Marionette (Marionette._send_message() mixes up response values of different calls)
Reporter | ||
Comment 7•9 years ago
|
||
(In reply to Andreas Tolfsen (:ato) from comment #6)
> I’m guessing there’s an interrupt handler in the Marionette runner that on
> SIGTERM aborts the current operation and sends a getContext message and
> blocks on receiving a response, but when the response comes back it’s from
> the previous executeScript command.
So far the runner only listens for SIGINT (bug 1130220) but not SIGTERM (bug 1201871). The getContext might come from a using_context inside our specialized Firefox UI tests harness. But not sure out of the box.
> This stems from a complete lack of synchronous enforcement in Marionette:
> The server gladly executes any command you request it to, even if there an
> existing command is in progress. There is an expectation that the local end
> blocks until it receives a response back before issuing a new command.
Or the _send_message should handle all the exceptions + interrupts correctly and cancel the processing of the command?
Comment 8•9 years ago
|
||
I looked closer at this and the issue seems to be that when a unittest is interrupted, it forcefully shuts down the thread that the test runs on, sometimes causing the socket recv() call from a request to not empty the pending data in the socket buffer.
When the next command is called, i.e. getContext from :whimboo’s log above that gets called in gather_debug by moztest, that actually reads the response from the previous request and thereby causes a race condition where response packets are delayed by a delta of +1. Theoretically this will eventually lead to a buffer overflow if it happens repeatedly.
Summary: Implement a synchronous, blocking command queue for Marionette (Marionette._send_message() mixes up response values of different calls) → Marionette._send_message() mixes up response values of different calls
Comment 9•9 years ago
|
||
One solution is to poll with a timeout (select()) on the socket for read, and iterate byte by byte until the there is no more data.
The issue can be reproduced, if transport’s send is modified to no implicitly receive data, with this test case:
import json
from marionette import MarionetteTestCase
get_current_url = json.dumps({"name": "getCurrentUrl"})
class TestBlocking(MarionetteTestCase):
def test_flush_buffer(self):
self.marionette.client.send(get_current_url)
resp = self.marionette.execute_script("return 'foo'")
self.assertEqual("foo", resp)
Assignee: nobody → ato
Status: NEW → ASSIGNED
Reporter | ||
Comment 10•9 years ago
|
||
I assume you accidentally reverted the summary. Fixing it.
Summary: Marionette._send_message() mixes up response values of different calls → Implement a synchronous, blocking command queue for Marionette (Marionette._send_message() mixes up response values of different calls)
Comment 11•9 years ago
|
||
Experimental patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e7d06753663
(In reply to Henrik Skupin (:whimboo) from comment #10)
> I assume you accidentally reverted the summary. Fixing it.
I think we can address this by emptying the buffer before sending, which will fix this particular issue.
But you’re right that there’s an overall bigger issue here that we need to make the server blocking. I’ve filed bug 1207564 about fixing this.
Summary: Implement a synchronous, blocking command queue for Marionette (Marionette._send_message() mixes up response values of different calls) → Marionette._send_message() mixes up response values of different calls
Updated•9 years ago
|
Keywords: ateam-marionette-server → ateam-marionette-client
Reporter | ||
Comment 12•9 years ago
|
||
So I tested this patch and it doesn't work for me on Linux with my steps. I see a huge perf decrease and when hitting ctrl+c it hangs forever in:
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 1535, in execute_script
rv = self._send_message("executeScript", body, key="value")
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/decorators.py", line 36, in _
return func(*args, **kwargs)
File "/mozilla/code/gecko/testing/marionette/driver/marionette_driver/marionette.py", line 686, in _send_message
resp = self.client.request(packet)
File "/mozilla/code/gecko/testing/marionette/transport/marionette_transport/transport.py", line 126, in request
self.send(data)
File "/mozilla/code/gecko/testing/marionette/transport/marionette_transport/transport.py", line 109, in send
self._empty()
File "/mozilla/code/gecko/testing/marionette/transport/marionette_transport/transport.py", line 99, in _empty
read, _, _ = select.select([self.sock], [], [], 0.1)
KeyboardInterrupt
Comment hidden (obsolete) |
Reporter | ||
Comment 14•9 years ago
|
||
Sorry, the last comment was not meant for this bug.
Flags: needinfo?(ato)
Comment 15•8 years ago
|
||
Unassigning myself as this hasn’t been so much of an issue lately. (Possibly fixed?)
Status: ASSIGNED → NEW
Updated•8 years ago
|
Assignee: ato → nobody
Updated•7 years ago
|
Priority: -- → P3
Reporter | ||
Comment 17•7 years ago
|
||
I tried several methods in reproducing it but wasn't able to. I'm sure I have seen something like that still maybe in wpt tests, but unless I have an example lets close this bug as incomplete.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(hskupin)
Resolution: --- → INCOMPLETE
Updated•2 years ago
|
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•