[gv-junit-fis] NavigationDelegate tests: `session.evaluateJS` times out due to no open ports available in GeckoSessionTestRule
Categories
(GeckoView Graveyard :: Sandboxing, defect, P3)
Tracking
(firefox120 wontfix, firefox121 fixed)
People
(Reporter: agi, Assigned: m_kato)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fission:android:m2] [geckoview:2022q3][geckoview:m115?][geckoview:m119?][fxdroid][foundation])
Attachments
(2 files)
This bug tracks enabling all NavigationDelegateTest to run on Fission.
| Reporter | ||
Updated•5 years ago
|
| Reporter | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Tracking this bug for Android Fission milestone M2 (pass tests with Fission enabled).
Updated•4 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 2•3 years ago
|
||
Moving Android Fission bugs to the new GeckoView::Sandboxing component.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 3•2 years ago
|
||
I am converting this bug. We actually have three errors in NavigationDelegate tests, and let us have this bug documenting one of them.
This error lies in our test framework. To assert on some things, we use session.evaluateJS method, which relies on the test-helper web extension. That web extension essentially populates the list of open ports (mPorts) in GeckoSessionTestRule on pageshow event, but on Fission that doesn't work due to the process switch. I checked the production code in Web Extension framework - everything there works correctly. We might want to rethink the way we populate the list of open ports in GeckoSessionTestRule, or possibly find another way to do the assertions.
Currently the affected tests are:
- NavigationDelegateTest#loadFileNotFound
- NavigationDelegateTest#bypassHTTPSOnlyError
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
| Assignee | ||
Comment 4•2 years ago
|
||
test-support.js attaches the event listener for pageshow, but it is attached on old window object... Is there a way to reload content script of web extension after switching process?
Comment 5•2 years ago
|
||
(Unassigning myself because I'm focusing on translations at the moment.)
I have in my notes that the event listener can catch a focus event in test-support.js. Not sure if that observation could be useful for finding the right window or reloading or anything, but thought I'd mention it on the off-chance it was useful.
Comment 6•2 years ago
•
|
||
(In reply to Makoto Kato [:m_kato] from comment #4)
test-support.jsattaches the event listener for pageshow, but it is attached on old window object... Is there a way to reload content script of web extension after switching process?
Can we consider some other events? Are there more global events, that attach to the parents of those window objects? Can we try the focus event that Olivia mentions?
Re: the way to reload the content script - I would consult Robwu on this, he mentioned something like that when I was debugging this. Rob, do you have ideas, can this be done?
Comment 7•2 years ago
|
||
Makoto, I'll assign the bug to you, thank you for your help!
Comment 8•2 years ago
|
||
(In reply to Makoto Kato [:m_kato] from comment #4)
test-support.jsattaches the event listener for pageshow, but it is attached on old window object... Is there a way to reload content script of web extension after switching process?
This claim is dubious - test-support.js is a content script, whose lifetime is tied to the document it's running in. If there is a process switch, that would imply that there is a new document that replaces the previous one, which in turn implies that a new port should be opened (from the new content script).
If you're debugging, it is possible for the "old" window (inner window) to be around, if the document is stored in the bfcache (and before that happens, "pagehide" is fired). If the document is revived by navigating to it, then "pageshow" is fired in that original content script (with the "old" window being the current window again).
I took a quick look at GeckoSessionTestRule to see if anything stood out, and the logic seems to be correct, specifically at:
MessageDelegateto receive ports and process messages: lines 2323-2392webExtensionApiCall: Connecting to the test-support extension: lines 2758-2799waitForMessageto wait for the response: lines 2494-2521- The wait at line 2496 should eventually return because
MessageDelegatehas logic in detach to release the message. And the port is removed upon disconnect, so at worst line 2446 or line 2797 would be a NPE if somehow the port doesn't exist.
- The wait at line 2496 should eventually return because
(In reply to [:owlish] 🦉 PST from comment #6)
Re: the way to reload the content script - I would consult Robwu on this, he mentioned something like that when I was debugging this. Rob, do you have ideas, can this be done?
Are you asking the right question? I think that it's not a question of how to reload the content script, but how to detect the load completion (or at least get the ports to connect).
Given the circumstances (content script loads without fission, and the apparent observation of the port not connecting), I wonder whether it is possible for the content script to have run too late (after the "pageshow"). The `"run_at": "document_start" declaration of test-support.js in manifest.json requests immediate execution at document_start. But if the extension loads after the document has been initialized, it's possible for the content script to be executed after that point, potentially even after "pageshow" has already fired. So I suggest to check whether this is happening, e.g. by adding logging.
Comment 9•2 years ago
|
||
Thank you so much, Rob!
But if the extension loads after the document has been initialized, it's possible for the content script to be executed after that point, potentially even after "pageshow" has already fired.
According to my notes, that is exactly what happens. That is why in the test itself we have two page loads - one is to load the conent script so that it sets the pageshow listeners, and the second one to get those listeners to fire. Speaking of which, Makoto, another idea for you: what if we changed that flow in the test to first load a normal page, and only then the error (instead of the two error loads), would that help?
after switching process
Sorry, I only now realized - there should be no process switch on the fresh rebase. I merged a patch to prevent that from happening specifically on Android when loading error pages as a temporary solution.
| Assignee | ||
Comment 10•2 years ago
•
|
||
Since I don't know web extension code, I have a question. When setting content_scripts property, where is this script loaed in gecko's code? I should debug web extension code internally for GeckoView.
This log is example when this occurs. pageshow is fired, but content scripts doesn't receive it. I suspect GeckoView's web extensions implementation if this doesn't occur on Desktop.
09-19 14:40:57.468 7681 7696 I Isolated Web Content: loading test-support.js
...
09-19 14:40:57.529 7547 7566 D GeckoViewXUL: WillChangeBrowserRemoteness
09-19 14:40:57.530 7547 7566 D GeckoViewMediaControl: onDestroyBrowser
09-19 14:40:57.531 7681 7696 I Isolated Web Content: pagehide on test-support.js
09-19 14:40:57.531 7681 7696 E Isolated Web Content: [JavaScript Error: "TypeError: can't access property "disconnect", backgroundPort is null" {file: "moz-extension://ea2bd5a4-08f2-4bcf-a708-d3a7a2302198/test-support.js" line: 57}]
09-19 14:40:57.531 7681 7696 E Isolated Web Content: @moz-extension://ea2bd5a4-08f2-4bcf-a708-d3a7a2302198/test-support.js:57:3
09-19 14:40:57.532 7681 7696 D GeckoViewSelectionActionDelegate[C]: handleEvent: visibilitychange
09-19 14:40:57.532 7681 7696 D GeckoViewClipboardPermissionChild[C]: handleEvent: pagehide
09-19 14:40:57.533 7681 7696 D GeckoViewAutoFill[C]: handleEvent: pagehide
...
09-19 14:40:57.535 7547 7566 D GeckoViewXUL: DidChangeBrowserRemoteness
09-19 14:40:57.536 7547 7566 D GeckoViewNavigation: sessionContextId=null
09-19 14:40:57.536 7547 7566 D GeckoViewMediaControl: onInitBrowser
...
09-19 14:40:57.542 7547 7566 D GeckoViewXUL: receiveMessage GeckoView:ContentModuleLoaded {}
...
09-19 14:40:57.545 7547 7566 E GeckoConsole: [JavaScript Error: "HTTPS-Only Mode: Upgrading insecure request “https://expired.example.com/” failed. (M21-C8181)"]
09-19 14:40:57.547 7681 7696 D GeckoViewContent[C]: handleEvent: pageshow
09-19 14:40:57.547 7681 7696 D GeckoViewProgressDelegate[C]: handleEvent: pageshow
09-19 14:40:57.548 7681 7696 D GeckoViewAutoFill[C]: handleEvent: pageshow
09-19 14:40:57.549 7547 7566 D GeckoViewProgress: receiveMessage: pageshow
| Assignee | ||
Comment 11•2 years ago
|
||
(Or I suspect GeckoView's navigation, so I have to make sense web extension's load logic with content script).
Comment 12•2 years ago
•
|
||
(In reply to [:owlish] 🦉 PST from comment #9)
Thank you so much, Rob!
But if the extension loads after the document has been initialized, it's possible for the content script to be executed after that point, potentially even after "pageshow" has already fired.
According to my notes, that is exactly what happens. That is why in the test itself we have two page loads - one is to load the conent script so that it sets the
pageshowlisteners, and the second one to get those listeners to fire.
If you are indeed encountering the issue of the content script executing after "pageshow" (which seems to be the case according to the log from comment 10), then I suggest to change the implementation as follows:
- In test-support/manifest.json, replace
document_startwithdocument_end. The script only uses pageshow/pagehide, and these events do not require very early registration. The use of document_start has unwanted side effects such as blocking the DOM parser until the scripts are ready to run. - Ordinarily, the first
pageshowfires afterwindow.onload, which in turn fires afterDOMContentLoaded. The latter event matches withdocument_end, which also why I'm suggesting to replace (unnecessarily early) document_start with document_end. - Use the following logic in test-support.js:
let backgroundPort = null;
let nativePort = null;
function connect() {
if (nativePort) {
return; // Already initialized
}
// Rest of logic from https://searchfox.org/mozilla-central/rev/c56cc57d547a5febcc34160e581f9d58c0326aa7/mobile/android/geckoview/src/androidTest/assets/web_extensions/test-support/test-support.js#9-42
}
window.addEventListener("pageshow", connect);
window.addEventListener("pagehide", () => {
backgroundPort.disconnect();
nativePort.disconnect();
nativePort = backgroundPort = null;
});
// Connect ASAP.
connect();
Alternatively, if you don't want the port to connect at DOMContentLoaded, but at the original point (i.e. around window.onload), wrap the last call in a document.readyState === "complete" check OR use document_idle instead of document_end.
load a normal page, and only then the error (instead of the two error loads), would that help?
The content script is only run in documents that match <all_urls>, usually http(s) and file:. They don't run in error pages. If you want it to run in more contexts, you have to add the about:-URL explicitly to the matches section in manifest.json.
(In reply to Makoto Kato [:m_kato] from comment #10)
Since I don't know web extension code, I have a question. When setting
content_scriptsproperty, where is this script loaed in gecko's code? I should debug web extension code internally for GeckoView.
Not sure how much that would help - adding dump(); or console.log calls in the content script (test-support.js) should provide enough logs in the console. To directly answer your question, see https://searchfox.org/mozilla-central/rev/c56cc57d547a5febcc34160e581f9d58c0326aa7/toolkit/components/extensions/ExtensionContent.sys.mjs#561-564,580-584
| Assignee | ||
Comment 13•2 years ago
•
|
||
Thanks for more information.
Best workaround is that loading content script after DidChangeBrowserRemoteness, but content process won't receive this notification. Even if document_end, this issue isn't resolved. After document element is created (initial-document-element-inserted) then onDOMContentLoaded is fired, WillChangeBrowserRemoteness is fired (It means that switching process occurs). After WillChangeBrowserRemoteness is fired, window object seems to be re-created with new window ID... pageshow is dispatched with new window id when this issue occurs.
And, when this problem doesn't occur (extremely low probability), content script is loaded after DidChangeBrowserRemoteness.
Reproduced test throws handleLoadError then switching process. GeckoView has delegation for handleLoadError, so I suspect that this causes something issues if this issue doesn't occur on Desktop... So I need to try to create a sample for desktop if possible.
| Assignee | ||
Comment 14•2 years ago
|
||
Use the following logic in test-support.js:
When this occur, 'pagehide' is received at first. So since native port is closed, we need something trigger to open native port by something way. "pageshow" is dispatched with new window id, so no listener...
| Assignee | ||
Comment 15•2 years ago
|
||
And this issue may not occur if bug 1849060 is fixed. But I am not sure.
Comment 16•2 years ago
|
||
(In reply to Makoto Kato [:m_kato] from comment #15)
And this issue may not occur if bug 1849060 is fixed. But I am not sure.
I doubt this issue is blocked on that bug. You can try replacing the current HTTP page with a resource type URL or even use regular Gecko's about: error page to see if that helps.
| Assignee | ||
Comment 17•2 years ago
•
|
||
(In reply to [:owlish] 🦉 PST from comment #16)
(In reply to Makoto Kato [:m_kato] from comment #15)
And this issue may not occur if bug 1849060 is fixed. But I am not sure.
I doubt this issue is blocked on that bug. You can try replacing the current HTTP page with a
resourcetype URL or even use regular Gecko'sabout:error page to see if that helps.
This seems to be related to error page handing. If we don't replace error page with something by onLoadError, I cannot reproduce this even if switching process. Both tests replace error page. Then in error page, since content script isn't loaded correctly, evaluateJS won't work. When loading replaced error URL, document's insertion event isn't fired (nsContentSink::NotifyDocElementCreate isn't called), so extension component doesn't insert content script... I need to debug DOM side and GV's LoadURIDelegate and this handling in docshell.
| Assignee | ||
Comment 18•2 years ago
|
||
- If onLoadError returns null, document is nothing. It means that document element isn't created then web extension won't work.
- If we return about:* URL etc with "match_about_blank: true", content script is loaded correctly. But it need to modify content script like comment #12.
| Assignee | ||
Comment 19•2 years ago
|
||
| Assignee | ||
Comment 20•2 years ago
|
||
GeckoView can hook error page by NavigationDelegate.onLoadError. But
when this delegation returns null, test-support Web extensions won't
be loaded correctly.
A content script of Web extensions is loaded when first document element
is inserted. But when onLoadError returns null, there is no document
element. So content script won't be loaded on this situation.
If not fission, content script is already loaded. But if fission, content
script may not be loaded if switching process by loading new URL.
So we should return non-null string if we use evaluateJS.
Also, document.reloadWithHttpsOnlyException may causes that the document
will be unloaded immediately before native message isn't handled. So I
use setTimeout to call it.
Comment 21•2 years ago
|
||
| Assignee | ||
Comment 22•2 years ago
|
||
(I request to back out since gv-junit failure rate is incremented due to PortDisconnectException)
Comment 23•2 years ago
|
||
Backed out as requested by dev for causing gv-junit failures
| Assignee | ||
Comment 24•2 years ago
|
||
Also org.mozilla.geckoview.test.NavigationDelegateTest#redirectLoadIframe is sometimes failure after landing this. I guess that this is another bug (we need to consdier HTTPS first?)
Comment 25•2 years ago
•
|
||
Do you think we need to block this one on some other bug, Makoto? What the next steps would be for this ticket?
| Assignee | ||
Comment 26•2 years ago
•
|
||
- redirectLoadIframe is failure high frequencies. This test uses another URL in automation, but I guess that it can fix by modifying iframe_redirect_automation.html. (http -> https). This was bug 1735786.
- Timing issue of
document.reloadWithHttpsOnlyException(). this might causes page navigation before addon replies response.
| Assignee | ||
Comment 27•2 years ago
|
||
Timing issue of document.reloadWithHttpsOnlyException(). this might causes page navigation before addon replies response.
Hmm, when this occurs, content script isn't executed via extensions. No GeckoView:WebExtension:PortMessage message?
Comment 28•2 years ago
•
|
||
Makoto, must we use a web extension for evaluateJS (maybe not the whole of evaluateJS, but in these specific cases)? Is there any other way to achieve this?
Comment 29•2 years ago
|
||
Zak, since you just recently worked with web extensions - do you have any ideas?
| Assignee | ||
Comment 30•2 years ago
|
||
(In reply to [:owlish] 🦉 PST from comment #28)
Makoto, must we use a web extension for
evaluateJS(maybe not the whole ofevaluateJS, but in these specific cases)? Is there any other way to achieve this?
This failure is communication port is immediately closed by executing JS. Even if document.reloadWithHttpsOnlyException() is executed on content script, the navigation by it is stopped unfortunately. When I try many workarounds, I will add simple workaround that is better.
https://treeherder.mozilla.org/jobs?repo=try&revision=6d235179f652fc0ea07ab2c8d46da863b97674f0 seems to fix it.
Updated•2 years ago
|
| Assignee | ||
Comment 31•2 years ago
|
||
When landing previous fix, redirectLoadIframe's failure rate is
incremented. I think this is same as bug 1735786.
Depends on D189217
Comment 32•2 years ago
|
||
Comment 33•2 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/e10f0ce61cc0
https://hg.mozilla.org/mozilla-central/rev/d0c4fdebcd36
Updated•2 years ago
|
Updated•2 years ago
|
Updated•1 year ago
|
Description
•