Closed
Bug 1308453
Opened 8 years ago
Closed 8 years ago
Firefox-UI update tests failing after fix in bug 1302707, with MarionetteException: Not a Number
Categories
(Remote Protocol :: Marionette, defect)
Tracking
(firefox50 wontfix, firefox51 fixed, firefox52 fixed)
RESOLVED
FIXED
mozilla52
People
(Reporter: FlorinMezei, Assigned: ato)
References
Details
Attachments
(1 file)
All ondemand update tests failed on Beta 5: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=49776d31766d&group_state=expanded&filter-tier=3&filter-searchStr=Fxup-beta-cdntest(. All failures are: TEST-UNEXPECTED-ERROR | test_direct_update.py TestDirectUpdate.test_update | MarionetteException: MarionetteException: Not a Number. This was not showing in Beta 4, so the most likely regressor is: https://bugzilla.mozilla.org/show_bug.cgi?id=1302707 (https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=f4b45ea3c0c8&tochange=49776d31766d).
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(ato)
Comment 1•8 years ago
|
||
This is not only for ondemand update tests but also happens permanently on Aurora and Nightly. Problem here is: 03:52:54 INFO - File "c:\jenkins\workspace\ondemand_update\build\venv\lib\site-packages\marionette\marionette_test.py", line 440, in setUp 03:52:54 INFO - self.marionette.reset_timeouts() 03:52:54 INFO - File "c:\jenkins\workspace\ondemand_update\build\venv\lib\site-packages\marionette_driver\marionette.py", line 740, in reset_timeouts 03:52:54 INFO - self.set_page_load_timeout(30000) 03:52:54 INFO - File "c:\jenkins\workspace\ondemand_update\build\venv\lib\site-packages\marionette_driver\marionette.py", line 1208, in set_page_load_timeout 03:52:54 INFO - self._send_message("timeouts", {"page load": timeout}) 03:52:54 INFO - File "c:\jenkins\workspace\ondemand_update\build\venv\lib\site-packages\marionette_driver\decorators.py", line 42, in _ 03:52:54 INFO - return func(*args, **kwargs) 03:52:54 INFO - File "c:\jenkins\workspace\ondemand_update\build\venv\lib\site-packages\marionette_driver\marionette.py", line 690, in _send_message 03:52:54 INFO - self._handle_error(err) 03:52:54 INFO - File "c:\jenkins\workspace\ondemand_update\build\venv\lib\site-packages\marionette_driver\marionette.py", line 723, in _handle_error 03:52:54 INFO - raise errors.lookup(error)(message, stacktrace=stacktrace) Looks like `reset_timeouts()` is not working correctly.
Summary: Ondemand Update tests failing after fix in bug 1302707, with MarionetteException: Not a Number → Firefox-UI update tests failing after fix in bug 1302707, with MarionetteException: Not a Number
Comment 2•8 years ago
|
||
To add, this is actually blocking QA from signing off for Firefox 50 Beta 5. So we should get this fixed ASAP.
Severity: normal → critical
Comment 3•8 years ago
|
||
We have concerns with the 50.0beta stability in general. Because beta 5 contains important fixes, I would like us to move fast. Can we backout changes in bug 1302707?
Assignee | ||
Comment 4•8 years ago
|
||
I’m guessing the out-of-repo (?) ondemand_update uses an unpinned version of the Marionette Python client that tries to communicate with a Firefox that does not have the protocol changes in bug 1302707. The patches in bug 1302707 change the server’s Marionette:timeouts command to be backwards compatible with earlier clients, but the changes in the client aren’t backwards compatible with earlier servers. Since we aren’t particularly attached to the changes in bug 1302707, it’s fine to back them out of beta.
Flags: needinfo?(ato)
Comment 5•8 years ago
|
||
(In reply to Andreas Tolfsen ‹:ato› from comment #4) > I’m guessing the out-of-repo (?) ondemand_update uses an unpinned version of > the Marionette Python client that tries to communicate with a Firefox that > does not have the protocol changes in bug 1302707. Those update tests (here for beta) are testing updates from older releases of Firefox eg. Firefox 49 Beta X or even earlier. As base those tests make use of the test package for the target Firefox version. It means that for the above case Firefox 49 b3 uses a test package with the related timeout changes included. > The patches in bug 1302707 change the server’s Marionette:timeouts command > to be backwards compatible with earlier clients, but the changes in the > client aren’t backwards compatible with earlier servers. The came client is used here for both versions, but different servers. So that is causing the problem and also reflects why we had partial test bustage on mozilla-central and mozilla-aurora for a couple of days. Once the source build contained all the changes server and client wise the failure is gone. But keep in mind that this will not be the case for our release update tests, at least not for the next 3+ releases! We have to find a way to properly fix it, or we can scratch the update tests. > Since we aren’t particularly attached to the changes in bug 1302707, it’s > fine to back them out of beta. It would only be a temporary fix as stated above, which would allow us to run update tests for beta 5. But it would also mean that any upcoming update tests which use Beta 5 as source build will fail too because of the backout - at least if we aren't able to fix it properly until the next beta release.
Assignee | ||
Comment 6•8 years ago
|
||
Yes, I agree to all of your points above. I also wrote a little summary in https://bugzilla.mozilla.org/show_bug.cgi?id=1302707#c72 about the current situation with making changes to the Marionette client. (In reply to Henrik Skupin (:whimboo) from comment #5) > (In reply to Andreas Tolfsen ‹:ato› from comment #4) > > Since we aren’t particularly attached to the changes in bug 1302707, it’s > > fine to back them out of beta. > > It would only be a temporary fix as stated above, which would allow us to > run update tests for beta 5. But it would also mean that any upcoming update > tests which use Beta 5 as source build will fail too because of the backout > - at least if we aren't able to fix it properly until the next beta release. Should we use this bug to track the compatibility fix? I think it should look something like this: def set_page_load_timeout(self, timeout): try: self._send_message("timeouts", {"page load": timeout}) except MarionetteException: self._send_message("timeouts", {"type": "page load", "ms": ms}
Comment 7•8 years ago
|
||
(In reply to Andreas Tolfsen ‹:ato› from comment #6) > Should we use this bug to track the compatibility fix? That sounds fine to me. > I think it should look something like this: > > def set_page_load_timeout(self, timeout): > try: > self._send_message("timeouts", {"page load": timeout}) > except MarionetteException: > self._send_message("timeouts", {"type": "page load", "ms": ms} I assume we cannot do anything regarding the protocol level? Or has this also been bumped for this API change? I'm asking because in case of another change in that area, we could end-up with spaghetti-code trying to many different ways in setting this timeout.
Assignee | ||
Comment 8•8 years ago
|
||
I don’t normally bump the protocol level unless there are more fundamental, structural changes to the data model. We could of course do that regularly whenever there is an API or type change, but I don’t think a magical number is any better than the above: def some_function(self): if self.protocol <= 14: # do something elfi self.protocol >= 23: # something else
Comment 9•8 years ago
|
||
Whatever you think is better. But I think we should at least add a Firefox version to the fallback case, so we can remove this code whenever we no longer have to support it.
Updated•8 years ago
|
Assignee | ||
Comment 10•8 years ago
|
||
Maybe it’s more reasonable to encode and use the Gecko version number when managing shims such as these, so we know exactly in which version an API changed without having to jump through the hoops of looking up in the code when some API disappeared or which Marionette protocol version is tied to what release of Firefox? def set_page_load_timeout(self, timeout): if self._gecko > 52: self._send_message("timeouts", {"page load": timeout}) else: self._send_message("timeouts", {"type": "page load", "ms": timeout}) A possible improvement over this is having some sort of “bridge”, where we detach the protocol implementation from the client API and choose what bridge level we want at a much earlier point. We should do this if we really care about the code quality of the Marionette client: class Gecko34Bridge(Bridge): def timeout(self, type, ms): self._send_message("timeouts", {"type": type, "ms": ms}) class Gecko52Bridge(Gecko34Bridge): def timeout(self, type, ms): self._send_message("timeouts", {type: ms}) def start_session(self): … self.bridge = # some negotiation code … … def set_page_load_timeout(self, timeout): self.bridge.timeouts("page load", timeout) But I don’t think I particularly want to put in the effort to code something like that.
Comment 11•8 years ago
|
||
(In reply to Andreas Tolfsen ‹:ato› from comment #10) > Maybe it’s more reasonable to encode and use the Gecko version number when > managing shims such as these, so we know exactly in which version an API > changed without having to jump through the hoops of looking up in the code > when some API disappeared or which Marionette protocol version is tied to > what release of Firefox? > > def set_page_load_timeout(self, timeout): > if self._gecko > 52: > self._send_message("timeouts", {"page load": timeout}) > else: > self._send_message("timeouts", {"type": "page load", "ms": > timeout}) Sounds good bug it would still cause a failure for update tests in the next couple of days as long as the source build is on the same version number but does not have the changes included. So only a combination of version and buildnumber would satisfy that. > A possible improvement over this is having some sort of “bridge”, where we > detach the protocol implementation from the client API and choose what > bridge level we want at a much earlier point. We should do this if we > really care about the code quality of the Marionette client: [..] > But I don’t think I particularly want to put in the effort to code something > like that. I feel the same way. Maybe we stay with try/catch for now?
Assignee | ||
Comment 12•8 years ago
|
||
There is such a thing as a Services.appinfo.platformBuildId too which I presume is more specific, but I don’t know how much we really care about intra-version incompatibility? I guess I’m trying to imagine how this would fail in the future if we follow this model, disregarding this current fallout.
Comment 13•8 years ago
|
||
The platform buildid should not be used currently. See bug 1298328. Otherwise it's just a question how complicated a check like that should be. I think in the case that we know about a breaking API change it might be fine that tests are broken. But it should definitely not happen for Beta and Release builds.
Comment hidden (Intermittent Failures Robot) |
Comment 15•8 years ago
|
||
Andreas, I assume it's fine to assign this bug to you.
Assignee: nobody → ato
Severity: critical → major
Status: NEW → ASSIGNED
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 18•8 years ago
|
||
mozreview-review |
Comment on attachment 8801722 [details] Bug 1308453 - Backwards compat for old Marionette:timeouts API; https://reviewboard.mozilla.org/r/86394/#review85108 ::: testing/marionette/client/marionette_driver/marionette.py:1240 (Diff revision 2) > > """ > + try: > - self._send_message("timeouts", {"script": timeout}) > + self._send_message("timeouts", {"script": timeout}) > + except errors.MarionetteException as e: > + if "Not a Number" in e.message: Mind adding a comment here which refers 52.0a? Otherwise no-one will most likely remove it when 45.0esr gets deprecated. Same for the other changes.
Attachment #8801722 -
Flags: review?(hskupin) → review+
Comment hidden (mozreview-request) |
Comment 20•8 years ago
|
||
mozreview-review |
Comment on attachment 8801722 [details] Bug 1308453 - Backwards compat for old Marionette:timeouts API; https://reviewboard.mozilla.org/r/86394/#review85112 ::: testing/marionette/client/marionette_driver/marionette.py:1240 (Diff revisions 2 - 3) > > """ > try: > self._send_message("timeouts", {"script": timeout}) > except errors.MarionetteException as e: > + # remove when 52.0a is stable That is not true. We still support 45esr until 52.2.0.
Comment 21•8 years ago
|
||
mozreview-review |
Comment on attachment 8801722 [details] Bug 1308453 - Backwards compat for old Marionette:timeouts API; https://reviewboard.mozilla.org/r/86394/#review85112 ::: testing/marionette/client/marionette_driver/marionette.py:1240 (Diff revisions 2 - 3) > > """ > try: > self._send_message("timeouts", {"script": timeout}) > except errors.MarionetteException as e: > + # remove when 52.0a is stable That is not true. We still support 45esr until 52.2.0.
Comment 22•8 years ago
|
||
Pushed by atolfsen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bf8f96812076 Backwards compat for old Marionette:timeouts API; r=whimboo
Comment 23•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bf8f96812076
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment 24•8 years ago
|
||
This is a test-only change which we would like to see uplifted to aurora to get a regression fixed. Thanks.
Whiteboard: [checkin-needed-aurora]
Comment 25•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/59930726515c
Comment 26•8 years ago
|
||
To reduce confusion we should set status-firefox50 to wontfix. This patch never landed for that release, and the regressor got backed-out.
Updated•1 year ago
|
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•