Closed Bug 1788894 Opened 2 years ago Closed 2 years ago

Move the Realm cache to the WindowGlobal message manager

Categories

(Remote Protocol :: WebDriver BiDi, enhancement, P2)

enhancement
Points:
2

Tracking

(firefox114 fixed)

RESOLVED FIXED
114 Branch
Tracking Status
firefox114 --- fixed

People

(Reporter: whimboo, Assigned: Sasha, NeedInfo)

References

(Blocks 4 open bugs)

Details

(Whiteboard: [webdriver:m7])

Attachments

(1 file)

With bug 1788657 we will get a basic implementation of the script.realmCreated event, which will already send internal events but suffers from the fact that the window global message handler is not automatically created. As such the event of the realm for the default window global is not emitted.

With the work on bug 1788657 we will get the possibility to define the script module to be auto-enabled on message handler global startup. Once available we will be able to send this missing realm event.

At the same time we should keep a realm cache in the root's script module so that commands eg. script.getRealms could rely on the information and allows a quick retrieval of the corresponding browsing context.

Maybe we should also wait until bug 1788659 has been implemented so that no-longer existing realms will be reported as well.

Blocks: 1788926
Priority: -- → P2
Whiteboard: [webdriver:backlog:blocked]
Blocks: 1750546

Note that the cache of realms we already have in the windowglobal script module.

But as I noticed when trying to send log.entryAdded events with a DOM Node included in the payload the serialization requires a realm. Given that there is no possibility right now to get access to the existing realms from the log module, we most likely want to move the Realm cache to the windowglobal message handler. This can be done because the Realm class doesn't contain any WebDriver BiDi specific code.

Summary: Use a Realm cache in the script root module → Move the Realm cache to the WindowGlobal message manager
Blocks: 1731589

(In reply to Julian Descottes [:jdescottes] from bug 1731589 comment #14)

My initial idea was that Realms should be provided by the MessageHandler class, and not by a specific module.

Most of the Realm logic which is at the moment in windowglobal/script could move to WindowGlobalMessageHandler, which would then expose getters such as getRealm and getRealms. We can have a dedicated file/class to handle this, so that WindowGlobalMessageHandler doesn't become huge. Realms are under webdriver-bidi at the moment, but there is nothing bidi-specific about them so I think it should be fine.

The other option would be to keep this in the Script module, but expose #getRealm (etc...) as internal commands so that other modules can use it. But my overall feeling is that commands are quite verbose, and I think it's best to avoid them for communication which necessarily happens within a single MessageHandler "node".

So my suggestion would be: move Realms handling to a dedicated class owned by MessageHandler. We can have a meeting to discuss or we can try to fit that during a regular triage meeting. I'll flag the bug for triage, we can decide if it needs its own meeting or not.

(for reference there was some discussion about this in https://bugzilla.mozilla.org/show_bug.cgi?id=1742589 as well)

Whiteboard: [webdriver:backlog:blocked] → [webdriver:backlog:blocked][webdriver:triage]

As discussed in the triage meeting we will move forward with this bug and put it into the message handler. Each WebDriver session should have it's own list of Realms.

Whiteboard: [webdriver:backlog:blocked][webdriver:triage] → [webdriver:backlog:blocked]
Blocks: 1788657, 1788659, 1788760
No longer blocks: 1788926
No longer depends on: 1788657, 1788659, 1788760
Whiteboard: [webdriver:backlog:blocked] → [webdriver:backlog]
Points: --- → 2
Priority: P2 → P1
Whiteboard: [webdriver:backlog] → [webdriver:m7]
Priority: P1 → P2
Assignee: nobody → aborovova
Status: NEW → ASSIGNED
Pushed by aborovova@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a24a3cdae60f Move Realm related logic from script module to the WindowGlobal message manager. r=webdriver-reviewers,jdescottes

Backed out for causing wd failures on script.sys.mjs

[task 2023-04-26T10:07:10.017Z] PID 607 | 1682503630016	RemoteAgent	TRACE	Module windowglobal/script.jsm found for WINDOW_GLOBAL
[task 2023-04-26T10:07:10.017Z] PID 607 | 1682503630016	RemoteAgent	DEBUG	WebDriverBiDiConnection 6a4be93f-67d3-4ab7-a88b-19c0606a70de <- {"id":98,"result":{}}
[task 2023-04-26T10:07:10.018Z] PID 607 | 1682503630018	RemoteAgent	DEBUG	WebDriverBiDiConnection 6a4be93f-67d3-4ab7-a88b-19c0606a70de <- {"id":99,"error":"unknown error","message":"TypeError: can't access property \"column\", stack is null","stacktrace":"#buildExceptionDetails@chrome://remote/content/webdriver-bidi/modules/windowglobal/script.sys.mjs:86:7\n#buildReturnValue@chrome://remote/content/webdriver-bidi/modules/windowglobal/script.sys.mjs:160:56\n"}
[task 2023-04-26T10:07:10.077Z] TEST-UNEXPECTED-FAIL | Evaluation specs Page.evaluate should throw error with detailed information on exception inside promise  (evaluation.spec.js) | expected PASS
[task 2023-04-26T10:07:10.086Z] TEST-INFO took 88ms
[task 2023-04-26T10:07:10.086Z] PID 607 | ["fail",{"title":"should throw error with detailed information on exception inside promise ","fullTitle":"Evaluation specs Page.evaluate should throw error with detailed information on exception inside promise ","file":"/builds/worker/checkouts/gecko/remote/test/puppeteer/test/build/evaluation.spec.js","duration":45,"currentRetry":0,"err":"expect(received).toContain(expected) // indexOf\n\nExpected substring: \"Error in promise\"\nReceived string:    \"Protocol error (script.callFunction): unknown error TypeError: can't access property \\\"column\\\", stack is null #buildExceptionDetails@chrome://remote/content/webdriver-bidi/modules/windowglobal/script.sys.mjs:86:7\n#buildReturnValue@chrome://remote/content/webdriver-bidi/modules/windowglobal/script.sys.mjs:160:56\n\"","stack":"Error: expect(received).toContain(expected) // indexOf\n\nExpected substring: \"Error in promise\"\nReceived string:    \"Protocol error (script.callFunction): unknown error TypeError: can't access property \\\"column\\\", stack is null #buildExceptionDetails@chrome://remote/content/webdriver-bidi/modules/windowglobal/script.sys.mjs:86:7\n#buildReturnValue@chrome://remote/content/webdriver-bidi/modules/windowglobal/script.sys.mjs:160:56\n\"\n    at Context.<anonymous> (/builds/worker/checkouts/gecko/remote/test/puppeteer/test/src/evaluation.spec.ts:499:29)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)"}]
Flags: needinfo?(aborovova)

Hi Olli,

We have an unexpected issue with JSWindowActors and the windows they are created for.

To simplify when our JSWindowActor is created (actorCreated), we want to enable async stacktraces on the windowglobal, and when it is destroyed (didDestroy), we disable async stacktraces.

But this doesn't work as expected, because typically when we go from about:blank to any regular web page, the following will happen:

  • a first JSWindowActor is created (for innerWindowId eg equal to 24)
  • a second JSWindowActor is created (for innerWindowId eg equal to 6442450945)
  • the first JSWindowActor is destroyed, but at that point this.contentWindow is null, and all the other getters to get a window (eg this.manager.window, this.document.defaultView etc...) all point to the window with innerWindowId 6442450945

(I'm guessing we have 2 actors created because of the temporary about:blank?)

We knew that those getters might not point to the correct document (Bug 1669961), but what is surprising is that in the destroy, we don't use the getters, instead we use references to the window object which we created in the constructor. Are all those window objects only proxies? If that's correct, is there a way to get a reference to the real window object? Or should we just assume that if contentWindow is null it's too late to perform any cleanup and the window is gone anyway.

Flags: needinfo?(smaug)
Flags: needinfo?(aborovova)
Pushed by aborovova@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ab69eb7e83f9 Move Realm related logic from script module to the WindowGlobal message manager. r=webdriver-reviewers,jdescottes
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch

I assume needinfo can be cleared given that this bug is closed.

And yes, window objects are proxies.
https://html.spec.whatwg.org/#the-windowproxy-exotic-object
https://html.spec.whatwg.org/#dom-document-defaultview

If you need to store something inner window specific, could you use WindowContext?

Flags: needinfo?(smaug)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #10)

I assume needinfo can be cleared given that this bug is closed.

And yes, window objects are proxies.
https://html.spec.whatwg.org/#the-windowproxy-exotic-object
https://html.spec.whatwg.org/#dom-document-defaultview

If you need to store something inner window specific, could you use WindowContext?

Thanks for the feedback, the needinfo was still valid, we just worked around this problem to be able to land, but we'll need to address this :)

We don't need to store something on window objects, in a destroy codepath we need to cleanup a state which is applied on the window (enable/disableAsyncStack).

But since window objects are just proxies, we always run the risk of performing this cleanup on a different window than the one where we called enableAsyncStack. It seems that the only way to check if a window object is still the same global is to store something else, eg the innerWindowID, and check that it still matches before performing the cleanup.

This also brings the question of "when" we should run this cleanup. Right now we cleanup everything on the destroy of the JSWindowActor, but maybe we need to split this logic and do all the cleanup related to the window earlier, before the window changes?

Flags: needinfo?(smaug)

What sorts of cleanups need to happen? Could you just do cleanup with a new global is created?
And perhaps store whatever data you need using innerwindowid as the key?

Flags: needinfo?(smaug)

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #12)

What sorts of cleanups need to happen? Could you just do cleanup with a new global is created?

The cleanup in question is to disable async stacktraces, which needs to be called on the global itself (https://searchfox.org/mozilla-central/rev/566f69826b829767d0373169eef6c50cc7d2241b/remote/webdriver-bidi/Realm.sys.mjs#183,187).

The idea was: on JSWindowActor creation, enable async stack traces on the global, and on JSWindowActor destroy, disable stack traces on the same global.

Let's say we have a navigation between from page1 to page2, with the globals global1 and global2, and JSWindowActors actor1 and actor2.

In practice what happens:

  • actor1 is created for page1 + global1, we enable stack traces for global1
  • start the navigation from page1 to page2
  • actor2 is created for page2 + global2, we enable stack traces for global2
  • actor1 is destroyed for page1, but at that point the references we had for its window now point to global2. So when we try disable async traces, we actually disable them for global2, after actor2 already enabled it.

Doing cleanup when the new global is created doesn't really work because at that time global1 is already gone.
My guess is that in this scenario, doing the cleanup is pointless because the global is effectively gone. But in other scenarios (eg when automation stops) we still want to disable async stacktraces. Which means we need to special case the destroy codepaths where the innerWindowId no longer matches the one from the actor's window.

And perhaps store whatever data you need using innerwindowid as the key?

Right, we can come up with some way of checking that the global is still valid by using the innerwindowid.
Thanks.

hmm, what is the reason to disable async stacktraces? If it is needed, could DOM code do that when it replaces the inner window?

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #14)

hmm, what is the reason to disable async stacktraces? If it is needed, could DOM code do that when it replaces the inner window?

I missed the question. So the "overall" reason we want to disable async stacktraces is to cleanup the state when our automation session ends. If async stack traces are not supposed to be enabled by default, then that's how it should be once the browser is no longer under automation.

All our modules implement a "destroy" method that should be responsible for cleaning listeners, restoring state, etc... When the session ends, we should call "destroy" on all those modules. In this particular example, this should disable async stack traces, so that the window goes back to its regular state. And that should work fine, there's almost no chance we will run into the race conditions mentioned in this bug when we close the session.

However, since the lifecycle of our (content process) modules is tied to the lifecycle of the windowglobal, we also call "destroy" when the corresponding JSWindowActor gets destroyed. And when this happens during a navigation, we run into the issues I described. In this specific case, I am not sure disabling async stack straces is needed, since the window gets destroyed. It's just that we try to always cleanup everything in our "destroy" methods, even if some of the cleanup might not be needed when the "destroy" is triggered by the destruction of the windowglobal.

If it is needed, could DOM code do that when it replaces the inner window?

It might be useful. In practice, this gets set on the realm at https://searchfox.org/mozilla-central/rev/ef61a8e576acebd871dbb38f363f0380503c4fb3/js/src/debugger/Debugger.cpp#6330
I don't know if this state persists after a bfcache navigation for instance? I am thinking about the following scenario:

  • start automation session
  • navigate to pageA
  • navigate to pageB (pageA goes in bfcache)
  • stop automation session (this disables async stacks for pageB)
  • go back

Here we would need pageA to no longer have async stack traces enabled. Was the realm for pageA also saved in bfcache, so does it retain isAsyncStackCapturingEnabled?

I will need to test this, so ni? myself for that, but maybe you know already the answer so adding a ni? back for you as well.
If it turns out to be the case, then yes we will need DOM code to do the cleanup, because it seems we can't do it effectively from JSWA's didDestroy.

Flags: needinfo?(smaug)
Flags: needinfo?(jdescottes)

bfcache is perhaps a bit complicated right now since Android uses still the old implementation. On desktop innerwindow is not changed because of bfcache, since the whole browsing context is changed.
Hmm, but yeah, stopping automation would need to disable stuff on the page which is still in the bfcache.
It might be easiest to add some native code which deals with this all.

Flags: needinfo?(smaug)

Bug 1847097 has been filed for the follow-up work.

See Also: → 1847097
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: