Closed Bug 1631362 Opened 5 months ago Closed 4 months ago

JavaScript error: resource://gre/modules/Prompter.jsm, line 642: NS_ERROR_NOT_AVAILABLE: prompt aborted by user

Categories

(Firefox :: Security, defect)

defect

Tracking

()

RESOLVED FIXED
Firefox 78
Tracking Status
firefox-esr68 --- unaffected
firefox76 --- unaffected
firefox77 + wontfix
firefox78 --- fixed

People

(Reporter: whimboo, Assigned: pbz)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

There are new test failures for Marionette and webdriver spec tests. They all have the following failure in common:

[task 2020-04-17T19:37:17.347Z] 19:37:17 INFO - JavaScript error: resource://gre/modules/Prompter.jsm, line 642: NS_ERROR_NOT_AVAILABLE: prompt aborted by user

Looks like a regression from bug 1615588.

Flags: needinfo?(pbz)
Priority: -- → P3
Blocks: 1629147

I'll look into this. Thanks!

Assignee: nobody → pbz
Status: NEW → ASSIGNED
Flags: needinfo?(pbz)
Blocks: 1631326
Blocks: 1631471
Blocks: 1631567
Blocks: 1631710
Blocks: 1631836

I think this might a side effect of regular teardown. When a tab / browser is destroyed it calls abortPrompt on CommonDialog.jsm which causes Prompter.jsm to throw here. We should probably differentiate this more and only throw for actual user prompt aborts like navigation.

Blocks: 1632371

Paul, do you have an update? This problem is causing a fair amount of intermittent failures, some even reaching the top intermittent list. Thanks!

Flags: needinfo?(pbz)

Hi! Sorry for the delay, I was busy with 77 work. I'll take a look today and tomorrow and will keep you updated here. As a quick fix it might be possible to just ignore these exceptions, since they don't really indicate a problem in the test.
I'm still a bit confused why this throws more often with my patch, given that the old code also threw on prompt close / teardown. It's also interesting that there are barely any failures on linux.

I suspect that this became intermittent because I fixed a bug with my patch. The for loop here was faulty https://searchfox.org/mozilla-central/rev/74bd4471abc8545c9a12424b33c6d90889b864aa/browser/actors/PromptParent.jsm#123 it did not iterate over the right object and thus never aborted any prompts => no exception thrown on teardown. Now with the fixed for loop it throws an exception when the browser (+ PromptParent window actor) is destroyed.
Since throwing is the correct behavior I suggest that we ignore the exception in these tests or make sure to always fully close all prompts before doing any kind of teardown. I'll run a few experiments on try and see how I can fix this.

Paul, can you please explain when exactly the error is thrown? When some code calls accept() or dismiss() on the prompt object?

If that is the case, the code affected in Marionette is:

https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/testing/marionette/driver.js#3295-3327

And hereby we always issue a click event on the appropriate buttons.

Note that this problem mainly happens for MacOS debug and Windows 7 opt builds. Why don't we see it on Linux or other kind of builds. This feel kinda in-consequent if we expect this error to be thrown all the time. Then I would expect a perma failure.

Please note that I'm not able to reproduce this failure locally even not with testing/web-platform/tests/webdriver/tests/execute_script/execute.py running in a loop for 50 times.

Ok, so what's interesting here are those lines:

[task 2020-04-30T04:43:43.213Z] 04:43:43     INFO - PID 1678 | 1588221823183	webdriver::server	DEBUG	-> POST /session/c0b0a745-6571-5844-9016-25179d3d44cd/execute/sync {"args": [], "script": "window.alert('Hello'); return 1;"}
[task 2020-04-30T04:43:43.214Z] 04:43:43     INFO - PID 1678 | 1588221823185	Marionette	DEBUG	0 -> [0,342,"WebDriver:ExecuteScript",{"args":[],"script":"window.alert('Hello'); return 1;"}]
[task 2020-04-30T04:43:43.279Z] 04:43:43     INFO - PID 1678 | 1588221823250	Marionette	TRACE	Received observer notification tabmodal-dialog-loaded
[task 2020-04-30T04:43:43.280Z] 04:43:43     INFO - PID 1678 | 1588221823275	Marionette	DEBUG	0 <- [1,342,null,{"value":null}]
[task 2020-04-30T04:43:43.298Z] 04:43:43     INFO - PID 1678 | 1588221823288	Marionette	TRACE	Received event DOMModalDialogClosed
[task 2020-04-30T04:43:43.317Z] 04:43:43     INFO - PID 1678 | JavaScript error: resource://gre/modules/Prompter.jsm, line 1178: NS_ERROR_NOT_AVAILABLE: prompt aborted by user
[task 2020-04-30T04:43:43.317Z] 04:43:43     INFO - PID 1678 | 1588221823312	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2020-04-30T04:43:43.318Z] 04:43:43     INFO - PID 1678 | 1588221823314	webdriver::server	DEBUG	-> POST /session/c0b0a745-6571-5844-9016-25179d3d44cd/alert/accept {}
[task 2020-04-30T04:43:43.335Z] 04:43:43     INFO - PID 1678 | 1588221823319	Marionette	DEBUG	0 -> [0,343,"WebDriver:AcceptDialog",{}]
[task 2020-04-30T04:43:43.335Z] 04:43:43     INFO - PID 1678 | 1588221823321	Marionette	DEBUG	0 <- [1,343,{"error":"no such alert","message":"","stacktrace":"WebDriverError@chrome://marionette/content/error.js:175:5\nNoSuchA ... t@chrome://marionette/content/server.js:249:9\n_onJSONObjectReady/<@chrome://marionette/content/transport.js:501:20\n"},null]

The call to WebDriver:ExecuteScript causes a tabmodal-dialog-loaded event, which means the alert prompt has been opened. Then the command within Marionette returns just fine. But why is that followed with DOMModalDialogClosed immediately? There is clearly NO user interaction here. Closing the alert happens around 33ms later via WebDriver:AcceptDialog.

In working cases it looks like:

6:53.63 pid:70610 1588230028090	Marionette	DEBUG	0 -> [0,100,"WebDriver:ExecuteScript",{"args":[],"script":"window.alert('Hello'); return 1;"}]
 6:53.67 pid:70610 1588230028121	Marionette	TRACE	Received observer notification tabmodal-dialog-loaded
 6:53.67 pid:70610 1588230028123	Marionette	DEBUG	0 <- [1,100,null,{"value":null}]
 6:53.67 pid:70610 1588230028133	webdriver::server	DEBUG	<- 200 OK {"value":null}
 6:53.67 pid:70610 1588230028134	webdriver::server	DEBUG	-> POST /session/f3874870-b931-864e-8b16-47dfe6bb409c/alert/accept {}
 6:53.68 pid:70610 1588230028136	Marionette	DEBUG	0 -> [0,101,"WebDriver:AcceptDialog",{}]
 6:53.68 pid:70610 1588230028141	Marionette	TRACE	Received event DOMModalDialogClosed
 6:53.68 pid:70610 1588230028143	Marionette	TRACE	Received DOM event DOMModalDialogClosed for [object XULFrameElement]
 6:53.69 INFO STDOUT: PASSED
 6:53.70 pid:70610 1588230028155	Marionette	DEBUG	0 <- [1,101,null,{"value":null}]

Same for this Marionette test:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=300161070&repo=autoland&lineNumber=38716-38721

[task 2020-04-30T07:33:36.963Z] 07:33:36     INFO -  1588232016960	Marionette	DEBUG	48 -> [0,2,"WebDriver:ExecuteAsyncScript",{"scriptTimeout":null,"newSandbox":true,"args":["alert","foo alert"],"filename":"Z:\\task ... .return_value = null;\n            window.return_value = window[arguments[0]](arguments[1]);","sandbox":"default","line":45}]
[task 2020-04-30T07:33:36.970Z] 07:33:36     INFO -  1588232016967	Marionette	TRACE	Received observer notification tabmodal-dialog-loaded
[task 2020-04-30T07:33:36.970Z] 07:33:36     INFO -  1588232016968	Marionette	DEBUG	48 <- [1,2,null,{"value":null}]
[task 2020-04-30T07:33:36.971Z] 07:33:36     INFO -  1588232016969	Marionette	TRACE	Received event DOMModalDialogClosed
[task 2020-04-30T07:33:36.974Z] 07:33:36     INFO -  JavaScript error: resource://gre/modules/Prompter.jsm, line 1178: NS_ERROR_NOT_AVAILABLE: prompt aborted by user
[task 2020-04-30T07:33:36.977Z] 07:33:36     INFO -  1588232016973	Marionette	DEBUG	48 -> [0,3,"WebDriver:GetTitle",{}]
[task 2020-04-30T07:33:36.977Z] 07:33:36     INFO -  1588232016974	Marionette	DEBUG	48 <- [1,3,null,{"value":"Marionette Test"}]

The tab modal gets opened and immediately closed. As such the call to WebDriver:GetTitle fails because it expects an alert prompt to be open.

So something clearly closes the opened tab modal prompt.

CC'ing both Johann and Matthew because they reviewed those patches, and maybe we can spot that problem earlier.

The exception is thrown when abortPrompt is called on CommonDialog.jsm. This happens in a couple cases:

  • "unload" and "TabClose" events here
  • Switching away from a tab with an active beforeunload prompt here
  • The PromptParent window actor calls abortPrompt on destroy and when the content fires pagehide

There could be a race condition with how we handle pagehide events. Since they are fired in the content process, in the new implementation they are sent to the parent via an async actor message. The old implementation handled this with an event listener in Prompter.jsm in the content process. So given that browser (+ JSWindowActor pair) is reused it could be side effects from a preceding test.

I've seen a similar issue in another test: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=300037356&repo=try&lineNumber=14207

For those affected tests we don't do any of those above mentioned cases.

Henrik if it is decided to backout based on convo here https://bugzilla.mozilla.org/show_bug.cgi?id=1629147#c8, please contact sherrifs on duty on riot in #sheriffs, or by ni the sheriffing alias. thank you.

A few general questions before we rush into a backout:

  • How many tests is this causing to fail per day/week?
  • Are these tests tier 1 or tier 2?
  • What's the urgency to back out quickly instead of waiting another week for a better resolution?

We would obviously like to avoid backing out a patch that has landed in central a while ago and now lives beneath a bunch of other work.

Flags: needinfo?(hskupin)
Flags: needinfo?(apavel)

(In reply to Johann Hofmann [:johannh] from comment #17)

A few general questions before we rush into a backout:

  • How many tests is this causing to fail per day/week?
  • Are these tests tier 1 or tier 2?
  • What's the urgency to back out quickly instead of waiting another week for a better resolution?

We would obviously like to avoid backing out a patch that has landed in central a while ago and now lives beneath a bunch of other work.

This caused 59 total failures in the last 7 days in bug 1629147, tier1.
In bug 1631112, there are 15 total failures in the last 7 days, tier1.

We usually backout if there is too much noise on the trees, or if the backout was requested. We (sheriffs) can wait for a patch if this is actively being worker on. Leaving the ni for Henrik if he has something to add here.

Flags: needinfo?(apavel)

Nothing more to add from my side beside that I would like to see this fixed sooner than later. And yes, it passed already a week but not because it was reported that late. So hopefully we can this sort out soon.

Flags: needinfo?(hskupin)

Johann is there an ETA for the fix here?

Flags: needinfo?(jhofmann)

I'm looking to get it fixed this week.
Currently running with verbose logging on try and also seeing if I can reproduce it locally, on a macOS and Windows VM. I have a suspect but I want to be able to reproduce it reliably before landing a fix.

For verbose logging of Marionette you can add -vv --gecko-log - to the mach command, which will enable trace logging. For wdspec tests use --webdriver-arg=-vv.

I was able to reproduce it locally on macOS. It looks like it is a race condition with the pagehide event. So far the try runs look good. I'll submit a patch later today.

Component: Marionette → Security
Priority: P3 → --
Product: Testing → Firefox
Version: Version 3 → unspecified
Attachment #9145831 - Attachment description: Bug 1631362 - Bind prompts to JSWindowActor lifetime instead of closing them on pagehide. r=johannh! → Bug 1631362 - Bind prompts to JSWindowActor lifetime instead of closing them on pagehide. r=MattN!
Attachment #9145831 - Attachment description: Bug 1631362 - Bind prompts to JSWindowActor lifetime instead of closing them on pagehide. r=MattN! → Bug 1631362 - Bind prompts to JSWindowActor lifetime instead of closing them on pagehide. r=gijs
Attachment #9145831 - Attachment description: Bug 1631362 - Bind prompts to JSWindowActor lifetime instead of closing them on pagehide. r=gijs → Bug 1631362 - Updated prompt implementation to use correct BrowsingContext and only send "pagehide" messages while prompts are open. r=gijs!
Attachment #9145831 - Attachment description: Bug 1631362 - Updated prompt implementation to use correct BrowsingContext and only send "pagehide" messages while prompts are open. r=gijs! → Bug 1631362 - Bind prompts to JSWindowActor lifetime instead of closing them on pagehide. r=gijs!
Pushed by pzuhlcke@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1dc249e62e8a
Bind prompts to JSWindowActor lifetime instead of closing them on pagehide. r=Gijs
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 78
Duplicate of this bug: 1632805
Flags: needinfo?(pbz)
Flags: needinfo?(jhofmann)

All the Marionette and Webdriver tests are looking fine now! Thanks a lot.

Paul, can you please request uplift so we can also get this fixed for 77 beta? Maybe we should even track the release for it?

Flags: needinfo?(pbz)

Glad its fixed!

We haven't really seen issues anywhere outside of the test environment so I'd rather have this ride the train for now. Please let me know if you have concerns about that.

Flags: needinfo?(pbz)

(In reply to Paul Zühlcke [:pbz] from comment #29)

Glad its fixed!

We haven't really seen issues anywhere outside of the test environment so I'd rather have this ride the train for now. Please let me know if you have concerns about that.

Sorry that I missed this comment and an important note: One thing you did not know is that Marionette and geckodriver are tools, which are part of several web testing frameworks. As such they are not only used by ourselves in CI but also heavily by web developers via the Selenium ecosystem to test web pages, and to automate certain tasks. While we do not know what exactly gets tested, it's fair to assume that there will be a lot of tests for user prompts (alert, confirm, prompt) and also including beforeunload.

Given the amount of intermittent failures we had with this bug being present, we can assume that with not fixing it for Firefox 77 a lot of users will also face it. Also with not having a preference to turn the new behavior off, there would even be no workaround for users, or even us to somewhat hide the failure by implementing a temporary workaround in geckodriver.

As such we may have to re-evaluate the importance of the attached patch and that we should ship it with Firefox 77. I know it is very late and we have the 77.0 RC3 build out. But nevertheless I have to bring this up for consideration by the release drivers. Not sure who will be best to be notified here, so setting needinfo for a couple of folks.

Flags: needinfo?(sledru)
Flags: needinfo?(ryanvm)
Flags: needinfo?(pbz)
Flags: needinfo?(pascalc)

Sorry but I can't help. i am no longer involved in release management

Flags: needinfo?(sledru)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #30)

Sorry that I missed this comment and an important note: One thing you did not know is that Marionette and geckodriver are tools, which are part of several web testing frameworks. As such they are not only used by ourselves in CI but also heavily by web developers via the Selenium ecosystem to test web pages, and to automate certain tasks. While we do not know what exactly gets tested, it's fair to assume that there will be a lot of tests for user prompts (alert, confirm, prompt) and also including beforeunload.

Given the amount of intermittent failures we had with this bug being present, we can assume that with not fixing it for Firefox 77 a lot of users will also face it. Also with not having a preference to turn the new behavior off, there would even be no workaround for users, or even us to somewhat hide the failure by implementing a temporary workaround in geckodriver.

As such we may have to re-evaluate the importance of the attached patch and that we should ship it with Firefox 77. I know it is very late and we have the 77.0 RC3 build out. But nevertheless I have to bring this up for consideration by the release drivers. Not sure who will be best to be notified here, so setting needinfo for a couple of folks.

Hi Henrik,

Unfortunately this is way too late to include it in Tuesday's release with all of our European QA on PTO on Monday, I am going to track it for a potential 77 dot release, if you can monitor external feedback for this issue, that would be good. We could add a mention in our release notes as a known issue if you can provide the wording.

Flags: needinfo?(pascalc) → needinfo?(hskupin)
Flags: needinfo?(ryanvm)

(In reply to Pascal Chevrel:pascalc from comment #32)

Unfortunately this is way too late to include it in Tuesday's release with all of our European QA on PTO on Monday, I am going to track it for a potential 77 dot release, if you can monitor external feedback for this issue, that would be good. We could add a mention in our release notes as a known issue if you can provide the wording.

I cannot explain the underlying details in Firefox. Paul would be the best person here.

Flags: needinfo?(hskupin)

(In reply to Paul Zühlcke [:pbz] from comment #14)

There could be a race condition with how we handle pagehide events. Since they are fired in the content process, in the new implementation they are sent to the parent via an async actor message. The old implementation handled this with an event listener in Prompter.jsm in the content process. So given that browser (+ JSWindowActor pair) is reused it could be side effects from a preceding test.

Given that removing the pagehide window actor message fixed the issue this was most likely the cause for the intermittent tests.
That was the switch from an event listener in the content process here: https://searchfox.org/mozilla-central/rev/74bd4471abc8545c9a12424b33c6d90889b864aa/toolkit/components/prompts/src/Prompter.jsm#608
to the async window actor message here: https://searchfox.org/mozilla-central/rev/3075dbd453f011aaf378bcac6b2700dccfcf814c/browser/actors/PromptChild.jsm#18

TLDR: In 77 there could be situations in the test environment where the pagehide event of the previous page could cause a prompt spawned in the new page to be closed early. However, only if the window (+ the associated window actor) of the old page is reused for the new page.

Flags: needinfo?(pbz)

Thanks Paul! I have added an item to the developer release notes of Marionette:
https://wiki.developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/77#WebDriver_conformance_Marionette

Also Paul mentioned that it might be better to not fix it in a dot release of Firefox 77 but let it ride the train. Lets see how many users of Webdriver are affected here, before we actually escalate the problem. I will keep an eye on feedback from the Selenium community.

You need to log in before you can comment on or make changes to this bug.