browserAction Pop-up failure
Categories
(Core :: DOM: File, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox-esr91 | --- | fixed |
firefox93 | --- | unaffected |
firefox94 | --- | unaffected |
firefox95 | --- | fixed |
People
(Reporter: erosman, Assigned: rpl)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression, Whiteboard: [addons-jira])
Attachments
(7 files)
70.65 KB,
application/x-xpinstall
|
Details | |
4.48 KB,
image/jpeg
|
Details | |
102.40 KB,
image/jpeg
|
Details | |
132.23 KB,
image/jpeg
|
Details | |
159.98 KB,
image/jpeg
|
Details | |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-esr91+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-esr91+
|
Details | Review |
Since possibly 95.0a1 (2021-10-10) (64-bit), the browserAction pop-up & sidebar has been failing.
Once Firefox starts, they work and gradually start to fail. It has been the same for multiple unsigned addons that I have installed.
- When FF starts, pop-ups work
- After a while they stop working
- Opening a new window, pop-ups don't work
- Opening a new Private window and pop-ups work
- After a while, pop-ups also start to fail in Private window
Error messages
Layout was forced before the page was fully loaded. If stylesheets are not yet loaded this may cause a flash of unstyled content.
FrameData missing for ******* page moz-extension://************/content/popup.html ExtensionPageChild.jsm:429
initExtensionContext resource://gre/modules/ExtensionPageChild.jsm:429
initExtensionDocument resource://gre/modules/ExtensionProcessScript.jsm:322
I noticed that loading the popup or sidebar without the stylesheet
solves the problem.
Does not work
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Popup</title>
<link rel="stylesheet" href="popup.css">
<script type="module" src="popup.js"></script>
</head>
Works
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Popup</title>
<!-- <link rel="stylesheet" href="popup.css"> -->
<script type="module" src="popup.js"></script>
</head>
Reporter | ||
Comment 1•3 years ago
|
||
How pop-ups show
Reporter | ||
Comment 2•3 years ago
|
||
Reporter | ||
Comment 3•3 years ago
|
||
Updated•3 years ago
|
Comment 4•3 years ago
|
||
(In reply to erosman from comment #0)
Once Firefox starts, they work and gradually start to fail. It has been the same for multiple unsigned addons that I have installed.
What does "gradually" mean? After a few seconds? minutes? hours?
Reporter | ||
Comment 5•3 years ago
•
|
||
They start to fail after launching the pop-ups a few times.
Notes:
- Regardless of the STR, the issue exists but could be specific to certain conditions
- The Toolbox - Extension - Console errors (FrameData missing for ...) starts to show error after Firefox starts but even WITHOUT opening the pop-up
- The errors should confirm the existence of an issue
- The issue relates to loading of stylesheet
- The issue is affected by the conditions of Private window
- Mouse-over toolbar icon results in Layout was forced before the page was fully loaded. If stylesheets are not yet loaded this may cause a flash of unstyled content.` message on Toolbox - Extension - Console and sometimes FrameData missing for ... error
- Above message also appear on Private window toolbar icon hover
- The issue exists when switching from
<link rel="stylesheet" href="popup.css">
to<style>@import 'popup.css';</style>
- The issue exists with only ONE add-on enabled
- Disabling and re-enabling the add-on makes no difference on the issue
I will do more testing and update this post with the findings.
Installed 95.0a1 (2021-10-10) (64-bit)
http://ftp.mozilla.org/pub/firefox/nightly/2021/10/2021-10-10-21-49-47-mozilla-central/firefox-95.0a1.en-US.win64.installer.exe
- Mouse-over toolbar icon results in Layout was forced before the page was fully loaded. If stylesheets are not yet loaded this may cause a flash of unstyled content.` message on Toolbox - Extension - Console
- The error message was different
TypeError: data is undefined ExtensionPageChild.jsm:588:52
initExtensionContext resource://gre/modules/ExtensionPageChild.jsm:588
initExtensionDocument resource://gre/modules/ExtensionProcessScript.jsm:321
- Despite above errors, the pop-up (& sidebar) works fine so far
screenshot to follow
Reporter | ||
Comment 6•3 years ago
|
||
2021-10-10-21-49-47-mozilla-central
Comment 7•3 years ago
|
||
Hello,
I’ve attempted to bisect the issue and find the regressing issue, however without success. I ran a bisection spanning Firefox 94 until the present day, but all tested builds were good i.e. both the pop-up and the sidebar, in the default window, a new window and a private window, opened each time without problems (opened and closed both pop-up and sidebar around 20 times each).
I did encounter the errors mentioned in the original description, but the pop-up and sidebar did not start to fail once I started testing them.
Reporter | ||
Comment 8•3 years ago
•
|
||
Installed 95.0a1 (2021-10-09) (64-bit) 2021-10-09-21-26-22
- Mouse-over toolbar icon results in
Layout was forced before the page was fully loaded. If stylesheets are not yet loaded this may cause a flash of unstyled content.
message on Toolbox - Extension - Console - There is no
TypeError: data is undefined
error - pop-up (& sidebar) seem to work fine so far
Installed previous versions one by one and the result is:
- 2021-10-09-21 has
Layout was forced before..
on mouse-over but no error & pop-up worked while testing (limited time) - 2021-10-10-21 has
Layout was forced before..
on mouse-over plusTypeError: data is undefined
error & pop-up worked while testing (limited time) ?! - 2021-10-11-21 has
Layout was forced before..
on mouse-over plusTypeError: data is undefined
error & pop-up starts to fail - 2021-10-12-09 has
Layout was forced before..
on mouse-over plusTypeError: data is undefined
error & pop-up starts to fail - 2021-10-12-21 has
Layout was forced before..
on mouse-over plusTypeError: data is undefined
error & pop-up starts to fail - 2021-10-13-03 has
Layout was forced before..
on mouse-over plusTypeError: data is undefined
error & pop-up starts to fail - 2021-10-13-21 has
Layout was forced before..
on mouse-over plusFrameData missing for
error & pop-up starts to fail
It is possible that the problem is related to the error TypeError: data is undefined
& later FrameData missing for
. Tracing the errors might shed some light on the issue.
Nonetheless, it is worthwhile eliminating the error (as well as the notice) and hopefully the pop-up issue could get solves as a result.
Reporter | ||
Comment 9•3 years ago
|
||
Mouse-over browserAction creates even more errors on Multiprocess Browser Console
NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIDocShell.domWindow] 33 window-global.js:408
get window resource://devtools/server/actors/targets/window-global.js:408
_onDocShellDestroy resource://devtools/server/actors/targets/window-global.js:938
observe resource://devtools/server/actors/targets/window-global.js:876
Exception { name: "NS_ERROR_NOT_AVAILABLE", message: "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIDocShell.domWindow]", result: 2147746065, filename: "resource://devtools/server/actors/targets/window-global.js", lineNumber: 1734, columnNumber: 0, data: null, stack: "watch@resource://devtools/server/actors/targets/window-global.js:1734:28\n_watchDocshells@resource://devtools/server/actors/targets/window-global.js:712:28\n_attach/<@resource://devtools/server/actors/targets/window-global.js:694:42\nexports.makeInfallible/<@resource://devtools/shared/ThreadSafeDevToolsUtils.js:103:22\n", location: XPCWrappedNative_NoHelper }
ThreadSafeDevToolsUtils.js:82:13
reportException resource://devtools/shared/ThreadSafeDevToolsUtils.js:82
makeInfallible resource://devtools/shared/ThreadSafeDevToolsUtils.js:109
Error while calling actor 'windowGlobalTarget's method 'attach' JSWindowActorChild.sendAsyncMessage: JSWindowActorChild cannot send at the moment Actor.js:92:13
DOMException: JSWindowActorChild.sendAsyncMessage: JSWindowActorChild cannot send at the moment Actor.js:99:15
NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIDocShell.domWindow] window-global.js:1774
InvalidStateError: JSWindowActorChild.sendAsyncMessage: JSWindowActorChild cannot send at the moment DevToolsFrameChild.jsm:437
Error when attaching target: Error: Connection closed, pending request to server1.conn0.windowGlobal8589934743/windowGlobalTarget1, type attach failed
Request stack:
request@resource://devtools/shared/protocol/Front.js:292:14
generateRequestMethods/</frontProto[name]@resource://devtools/shared/protocol/Front/FrontClassWithSpec.js:46:19
attach/this._attach<@resource://devtools/client/fronts/targets/window-global.js:120:36
attach@resource://devtools/client/fronts/targets/window-global.js:133:7
_attachAndInitThread@resource://devtools/client/fronts/targets/target-mixin.js:526:20
attachAndInitThread@resource://devtools/client/fronts/targets/target-mixin.js:505:40
_onTargetAvailable@resource://devtools/shared/commands/target/target-command.js:187:25
_emit@resource://devtools/shared/event-emitter.js:244:34
emit@resource://devtools/shared/event-emitter.js:190:18
emit@resource://devtools/shared/event-emitter.js:342:18
_onTargetAvailable@resource://devtools/client/fronts/watcher.js:60:10
_emit@resource://devtools/shared/event-emitter.js:244:34
emit@resource://devtools/shared/event-emitter.js:190:18
emit@resource://devtools/shared/event-emitter.js:342:18
onPacket@resource://devtools/shared/protocol/Front.js:336:13
onPacket@resource://devtools/client/devtools-client.js:482:13
send/<@resource://devtools/shared/transport/local-transport.js:68:25
exports.makeInfallible/<@resource://devtools/shared/ThreadSafeDevToolsUtils.js:103:22
exports.makeInfallible/<@resource://devtools/shared/ThreadSafeDevToolsUtils.js:103:22
baseFrontClassDestroy resource://devtools/shared/protocol/Front.js:106
_onTargetDestroyed resource://devtools/shared/commands/target/target-command.js:307
_emit resource://devtools/shared/event-emitter.js:244
emit resource://devtools/shared/event-emitter.js:190
emit resource://devtools/shared/event-emitter.js:342
_onTargetDestroyed resource://devtools/client/fronts/watcher.js:74
_emit resource://devtools/shared/event-emitter.js:244
emit resource://devtools/shared/event-emitter.js:190
emit resource://devtools/shared/event-emitter.js:342
onPacket resource://devtools/shared/protocol/Front.js:336
onPacket resource://devtools/client/devtools-client.js:482
send resource://devtools/shared/transport/local-transport.js:68
makeInfallible resource://devtools/shared/ThreadSafeDevToolsUtils.js:103
makeInfallible resource://devtools/shared/ThreadSafeDevToolsUtils.js:103
target-command.js:189:15
Comment 10•3 years ago
|
||
Thank you for the detailed error report erosman, this is the same as bug 1735347, but this really helps confirming it's a real regression from bug 1708243 part 4.
(I might ask you to test/confirm something there in a bit)
Reporter | ||
Comment 11•3 years ago
•
|
||
Bug 1735347 deals with the errors and it is not confirmed yet that the browserAction Pop-up failure is due to the errors.
Furthermore, the browserAction Pop-up failure is still happening in 95.0a1 (2021-10-17) (64-bit).
IMHO, the bug should be kept open until the browserAction Pop-up failure is resolved. Currently (& for the last 9 days), I am forced to restart the browser to get the pop-ups working for a short time, which considerably disrupts the browsing experience.
Furthermore, extension development has become impossible.
Reporter | ||
Comment 12•3 years ago
|
||
Since yesterday, I am noticing the CSS on extension's options pages also started to fail a few times.
It is becoming more likely that there is an issue with loading stylesheets for extensions which is getting more pronounced.
Options page message is from view-source:resource://devtools/server/actors/inspector/node.js
Layout was forced before the page was fully loaded. If stylesheets are not yet loaded this may cause a flash of unstyled content.
node.js:356
Previous CSS message was from the popup.html
Layout was forced before the page was fully loaded. If stylesheets are not yet loaded this may cause a flash of unstyled content.
popup.html
Once again, when Firefox starts, pop-ups and option pages work fine and they start to fail over time. Just now, I have tried to load an extension's options page for over 20 times and it failed every time.
Assignee | ||
Comment 13•3 years ago
|
||
(In reply to erosman from comment #12)
Since yesterday, I am noticing the CSS on extension's options pages also started to fail a few times.
It is becoming more likely that there is an issue with loading stylesheets for extensions which is getting more pronounced.
...
Once again, when Firefox starts, pop-ups and option pages work fine and they start to fail over time. Just now, I have tried to load an extension's options page for over 20 times and it failed every time.
Given that you seem to be able to reproduce that consistently, it would really help if you could try to bisect the possible regressing change using mozregression, which would help us to confirm if the issue that is preventing the popup to be opened correctly it is a separate issue or not.
Reporter | ||
Comment 14•3 years ago
•
|
||
I did try to bisect it by manually downloading and installing previous versions of Nightly as in comment #8.
mozregression does not run on my Win7 OS and that has been the case for many years. There are Python errors. Installing various versions of Python didn't help either.
Looking at Developer Tools ➜ Style Editor, it may !!? relate to CSS @import
as I don't see the files loaded. I also use @import
in the browserAction pop-ups. Although, the pop-ups that don't have @import
also fail.
As mentioned in the first post, commenting out the <link rel="stylesheet" href="popup.css">
will result in the pop-up showing (but looking bad). This point does not relate to the issues in bug 1735347.
Please note that there are specific setting in about:config
that relate to Win7. There are a few OS-specific locked entries as I found out when discussing anther matter in #amo:mozilla.org which may explain why the issue in not happening for everyone.
N.B. As I type and as pop-ups fail and as extension option pages fail, I opened a new Private window and everything works fine in that private window.
After a while, that private window will also start to experience the same failures and I would be forced to restart Firefox in order to access pop-up & options.
PS. In case this has a bearing on the issue, all my extensions are under development and are loaded unpacked as it has been for many years.
Update:
It has already been 10 days and it appears that the failures, when happen, only happen for the above mentioned unpacked add-ons. I have 2 other addons (uBlock Origin & uMatirx) and their pop-ups haven't failed yet. Considering that loading stylesheets seems to be a factor, could there have been some changes to the stylesheet loading process on Oct 9th-10th that is affecting the loading from unpacked & out of Firefox own filesystem location?
Reporter | ||
Updated•3 years ago
|
Assignee | ||
Comment 15•3 years ago
•
|
||
(In reply to erosman from comment #14)
I did try to bisect it by manually downloading and installing previous versions of Nightly as in comment #8.
mozregression does not run on my Win7 OS and that has been the case for many years. There are Python errors. Installing various versions of Python didn't help either.
Looking at Developer Tools ➜ Style Editor, it may !!? relate to CSS
@import
as I don't see the files loaded. I also use@import
in the browserAction pop-ups. Although, the pop-ups that don't have@import
also fail.As mentioned in the first post, commenting out the
<link rel="stylesheet" href="popup.css">
will result in the pop-up showing (but looking bad). This point does not relate to the issues in bug 1735347.Please note that there are specific setting in
about:config
that relate to Win7. There are a few OS-specific locked entries as I found out when discussing anther matter in #amo:mozilla.org which may explain why the issue in not happening for everyone.N.B. As I type and as pop-ups fail and as extension option pages fail, I opened a new Private window and everything works fine in that private window.
After a while, that private window will also start to experience the same failures and I would be forced to restart Firefox in order to access pop-up & options.PS. In case this has a bearing on the issue, all my extensions are under development and are loaded unpacked as it has been for many years.
Update:
It has already been 10 days and it appears that the failures, when happen, only happen for the above mentioned unpacked add-ons. I have 2 other addons (uBlock Origin & uMatirx) and their pop-ups haven't failed yet. Considering that loading stylesheets seems to be a factor, could there have been some changes to the stylesheet loading process on Oct 9th-10th that is affecting the loading from unpacked & out of Firefox own filesystem location?
The only one that I can think of right now is https://bugzilla.mozilla.org/show_bug.cgi?id=1706594, which got on mozilla-central on Oct 11:
- The stylesheets listed in an html document do trigger a speculative load, which was one of the kind of requests that were able to trigger that bug
- the fix explicitly cancel streams that are related to channels that were already canceled and so in theory it shouldn't be preventing the page in the popup from opening correctly
- but the bug was also more likely to happen for unpacked extensions (I couldn't find a way to reproduce it with packed ones)
and so there are 2 parts of your STR that seems to have some kind of connection with that fix.
It would be great if we could confirm if that is actually the case, would you mind to send me the about:support information and any specific setting in about:config
that you already know that relates only to Window 7?
Being able to reproduce the issue locally would allow me to confirm if there is any actual connection between the issue you are experiencing and the fix we landed for Bug 1706594, and in either case (related or not to Bug 1706594) to dig more into it and get a better picture of what is really going on.
Alternatively (or in addition to that), I create a backout commit locally and push it to try, this way you may try on your system a build that includes everything that is already in Nightly besides that single change, you can download the custom build from here:
Let me know if you are unable to reproduce the issue in this build but you are able to reproduce it on the last nightly build.
Reporter | ||
Comment 16•3 years ago
•
|
||
- I was able to reproduce the bug on a clean profile on the Latest Nightly
- I was not able to reproduce the bug on the build in comment #15
Update
- I was able to reproduce the bug on latest Nightly with only one unpacked add-on enabled
Notes
- Set-up:
Windows 7 Ultimate, latest Nightly, 17 enabled unpacked installed addons, 5 enabled standard installed addons, 9 disabled standard installed addons - The failure only occurs on unpacked addons
- The failure affects all add-on pages but mostly on popup & sidebar and much less on options page
- Commenting out
<link rel="stylesheet" href="popup.css">
results in failed pop-up showing - After failure, it is togglable, i.e. without-CSS-shows, with-CSS-fails
@import
appears to be much slower than<link rel="stylesheet" href="popup.css">
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 17•3 years ago
|
||
I was more and more convinced that the errors logged and the popup issue are two different bugs which happens to be triggerable with a similar STR (which leverages the special "popup extension page preloaded in an hidden browser on mouseover and swapped with the visible browser if the browserAction is clicked or preload hidden browser destroyed if the user move the mouse out of the browserAction")
But unlike Bug 1735347 (which is specifically related to the error logged, and a regression from Bug 1708243), this issue can only be triggered by an unpacked extension (it doesn't matter if it is loaded temporarily or permanently).
STR to reproduce the popup issue consistently
Over the last couple of day I spent some more time gathering some more details about the STR originally shared in comment 1 by discussing it with erosman over Matrix, and I have been finally able to trigger the bug consistently (and just a few seconds) by using the following tweaked STR:
- run firefox with the "nsCSSLoader" logs enabled:
export MOZ_LOG="nsCSSLoader:5,sync"
(or evenMOZ_LOG="nsCSSLoader:5,sync" mach run
if running a local Nightly build) - download, unpack in a local directory the xpi file of the FireMonkey extension and install it temporarily from about:debugging (I originally used some more addons all installed as unpacked, but then I have verified that I can also trigger it with just the unpacked FireMonkey extension)
- move the mouse over and out of the FirefoxMonkey browserAction (which happen to be also how kernp5 reproduced consistently Bug 1735347, see Bug 1735347 comment 15, but this popup issue will only be triggered if the extension is unpacked, unlike the error logged that Bug 1735347 is already tracking)
- the bug has been triggered when the nsCSSLoader logs shows that there is a FireMonkey popup.css stylesheets cache entry that gets stuck into the
Loading
state:
[Child 1598065: Main Thread]: D/nsCSSLoader css::Loader::LoadSheet(aURL, aObserver) api call
[Child 1598065: Main Thread]: D/nsCSSLoader Non-document sheet uri: 'moz-extension://8132eec0-bdce-4534-af99-4edb3415c02c/content/popup.css'
[Child 1598065: Main Thread]: D/nsCSSLoader css::Loader::CreateSheet(moz-extension://8132eec0-bdce-4534-af99-4edb3415c02c/content/popup.css)
[Child 1598065: Main Thread]: D/nsCSSLoader KeyEquals(moz-extension://8132eec0-bdce-4534-af99-4edb3415c02c/content/popup.css)
[Child 1598065: Main Thread]: D/nsCSSLoader KeyEquals(moz-extension://8132eec0-bdce-4534-af99-4edb3415c02c/content/popup.css)
[Child 1598065: Main Thread]: D/nsCSSLoader Hit cache with state: Loading
- after the "Hit cache with state: Loading" has been logged at least once for the popup.css url, opening the browserAction will not fully render the FireMonkey popup with its intended content, on the contrary it gets rendered as the one in the screenshot attached in Bug 1735899 comment 1
- from the BrowserToolbox (by setting a breakpoint on the devtools internals that are observing the destroyed window) I have also verified that the extension page document in the browserAction popup is stuck in the "interactive" state (which I think it is due to the fact that it will be forever waiting for the cache entry stuck into the Loading state)
- by opening the same FireMonkey popup moz-extension url in a Firefox tab, the tab is stuck into the loading state and nothing gets rendered in the tab. Explicitly stopping the loading tab makes the extension page able to be rendered, but without the stylesheet that was stuck into the Loading state
Actual regressing change
This, as I was already suspecting when I started to investigate it, points suspiciously to the same kind of race issues triggered by the SimpleChannel created by ExtensionProtocolHandler for the unpacked extensions: Bug 1706594.
I'm pretty sure now that this issue was not happening before we landed Bug 1706594 because before that change Gecko would have kept loading data for requests even if technically cancelled (due to the related document being destroyed in the meantime) and so the preloaded stylesheet entries loads would complete and not stay stuck in the Loading state.
As an additional confirmation, I tried to reproduce the issue on a local build where I backed out the patch landed from Bug 1706594 (hg backout -r 2c21101c4b3b -m "TMP: temporarily revert Bug 1706594 patch"
) and on that build I was not able to trigger the issue anymore.
Next steps
As soon as I got a better picture of the issue yesterday, we (Tomislav and I) agreed on the following next steps:
- Update this bug with more details about how to reproduce the issue consistently and what seems to be going on based on the details we have been able to collect so far
- Needinfo Nika and Valentin for their perspective and opinions about this issue, and evaluate with them if we should backout Bug 1706594 patch from Nightly 95 and reopen it (so that we can investigate this issue and land it again after we have got to its root cause and prepared a fix to land along with the original Bug 1706594 patch)
- Needinfo Emilio for his perspective on the preloaded stylesheet entry that gets stuck, and pointers / ideas that could help us to investigate it more (and then find out where we are missing to propagate the fact that the channel has been cancelled, so that the cache entries would be invalidated accordingly)
- Reduce the STR described in this comment into an automated test and try to record it with rr (and submit to pernosco)
Motivations for backing out Bug 1706594 patch from Nightly 95
- As Tomislav did mention to me when we discussed this yesterday, an additional reason to backout Bug 1706594 would be to ensure that this issue is not hiding any other issue that the fission-related changes Tomislav landed in Firefox 95 from Bug 1735347 may have introduced.
- This bug shows us that there are some side-effects of the fix landed in Bug 1706594 that we missed to notice until now, and so spending some more time investigating this issue may also help to ensure us that we didn't miss other similar ones
What makes a user more likely to be impacted by this bug (and Bug 1706594)
Both this bug and Bug 1706594 can be triggered only on unpacked extensions (because those are the ones that will get a stream asynchronously when a moz-extension url is being loaded), and so the users that are more likely to trigger this issue are:
- extension developers (because they are more likely be installing and testing unpacked extensions)
- Linux users that install distro-packaged Firefox and extensions (which are also unpacked as I did mention in Bug 1706594 comment 32)
Impact comparison for Bug 1706594 and Bug 1735899
- Bug 1706594: for the impacted user that happen to trigger that bug, all extensions would not be able to run any extension code anymore, because the extension child process main thread is kept busy by a "stream pump infinite loop"
- Bug 1735899:
- for the impacted user that happen to trigger this bug, for the extension that triggers this bug any extension page that include the css file that gets stuck will never load anymore (unless the extension process is killed and restarted), disabling and re-enabling the extension is not enough.
- Based on erosman comments it seems that the issue then starts to impact other extensions over the time, but I've not been able to verify this yet
But this impact analysis is based only on the known side-effects of the fix we have landed for Bug 1706594, be sure that there aren't other side effects that we don't know about yet would require some more investigation.
I don't have stats or a complete rationale to bake the following statement, but based on my current understanding of both the bugs (Bug 1706594, Bug 1735899), it looks that they may have more or less the same kind of chances to be triggered by a user.
Assignee | ||
Comment 18•3 years ago
|
||
Hi Nika and Valentin,
I investigated this bug and it looks that this issue did become "triggerable" because of the changes we introduced in Bug 1706594.
In comment 17 I added details from the investigation (include an STR that can reproduce it consistently) and some preliminary evaluations about possible next steps (included evaluating with you if we should backout Bug 1706594 patch from 95, what makes a user more likely to be affected etc.)
We would like to hear your opinions and perspective (about the issue and if we should back out Bug 1706594 from Nightly 95).
Assignee | ||
Comment 19•3 years ago
|
||
Hi Emilio,
It looks that the fix I landed to fix Bug 1706594 is having a side-effect that is now triggering this issue.
As described in more details in comment 17, a preloaded stylesheet cache entry gets stuck into the Loading state and it never completes (likely because with Bug 1706594 changes the stream pump has been cancelled, previously it would have been left to pump data, which I guess would have allowed those preloaded stylesheet cache entry to be fully loaded and never trigger this issue).
The nsCSSLoader log in comment 17 shows that when a cache entry gets stuck then the issue got triggered and the extension pages that include that CSS will never finish loading, which is clearly a side-effect and not the actual root cause.
I'm needinfo-ing you because I'd like to hear your perspective about this issue, and in particular any additional idea or pointers to details related to the CSS loader internals that I may look more deeply into to try to get a better picture and pin point where the issue is actually triggered to then lead to this issue.
Comment 20•3 years ago
|
||
Where is the load canceled? The CSS loader generally coalesced loads, so it's not sound to cancel a load if there's another document loading the same url. Assuming that's not the case however, and we really just want to cancel the load altogether, it seems that the issue is that StreamLoader::OnStopRequest
is not being called for that stylesheet.
This call is what eventually ends up in SheetComplete
, and thus removing the load from the table (if it failed or was canceled), or marking the load as successful and inserting it into the cache. In any case, without that call the stylesheet will remain as loading for ever, which seems like what you're hitting.
So presumably you're missing an OnStopRequest(NS_BINDING_ABORTED)
somewhere in the code you added?
Assignee | ||
Comment 21•3 years ago
|
||
Thanks a lot Emilio, your comment confirmed me that I was looking in the direction and while investigating it I already verified that is the case (those cache entries never gets a OnStopRequest, from what I recall they don't get an OnStartRequest neither but I have to take another look to double-check that was the case) and that made me realize which may be the scenario that may make the OnStopRequest to never be sent.
I'm going to double-check that (and very likely asking your perspective after I have verified if that is the case).
Assignee | ||
Comment 22•3 years ago
|
||
I confirmed that my theory about what was missing to call OnStopRequest
for the preloaded stylesheet cache entries was actually matching what is actually going on:
- When
ExtensionStreamGetter::OnStream
is called we are currently passing theExtensionStreamGetter
instance'smListener
(annsCOMPtr
to ansIStreamListener
) to thensInputStreamPump's AsyncOpen
method - if the call to
mPump
'sAsyncOpen
did not fail, then we assume thatnsInputStreamPump
will be always taking care of notifyingOnStartRequest
/OnStopRequest
to thatnsIStreamListener
- and so in
ExtensionStreamGetter::Cancel
we just callmPump
'sCancel
method if themPump
exists and we assumed that was always going to take care of callingOnStopRequest
(after having calledOnStartRequest
if it didn't yet) as we do inExtensionStreamGetter::Cancel
if the cancellable request got cancelled before we actually got a stream
- and so in
- unfortunately, that isn't the case and there is still a chance that
ExtensionStreamGetter::Cancel
may be called after we successfully created thensInputStreamPump
and called itsAsyncOpen
method, but before thatnsInputStreamPump
has got to notifyOnStartRequest
to thensIStreamListener
passed to it (actually when the race behind this bug is triggered, we are getting to cancel thensInputStreamPump
before it ever got toOnInputStreamReady
).
Locally I tried the following approach and confirmed that it consistently prevented this issue for me locally, and the resulting patch seems to be small and clear enough:
- Tweak
ExtensionStreamGetter
to also implementnsIStreamListener
- When handling an extension protocol's
SimpleChannel
that needs to retrieve the data through an asynchronous stream (in other words: "when we are loading a moz-extension url from an unpacked extension"), pass tomPump
'sAsyncOpen
theExtensionStreamGetter
itself as ansIStreamListener
(and keeps the references tomListener
andmChannel
) - Under normal conditions:
- the
ExtensionStreamGetter
will just forward tomListener
the calls toOnStartRequest
,OnStopRequest
andOnDataAvailable
, - once
ExtensionStreamGetter::OnStopRequest
is called, the stream getter instance can forget both themListener
andmChannel
- the
- Under the race that triggers this bug:
- ExtensionStreamGetter keeps track internally if OnStartRequest has ever been called and so it knows if mPump did never start to pump any data when the channel is being cancelled
- and so if
ExtensionStreamGetter::Cancel
is called beforemPump
calledExtensionStreamGetter::OnStartRequest
, then we know that we need to take care of notify that tomListener
(by explicitly calling itsOnStartRequest
/OnStopRequest
, as we were already doing if get got to create thensInputStreamPump
) as part of the cancellation.
I'm going to attach to this issue a draft patch that implements this strategy in a minute, to evaluate with Nika and Valentin if this is a reasonable strategy also from their perspective.
Making sure that nsInputStreamPump will call mListener's OnStartRequest/OnStopRequest doesn't seem the right way to fix the issue
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 23•3 years ago
|
||
Assignee | ||
Comment 24•3 years ago
|
||
Hi Nika and Valentin,
In comment 22 I tried to provide a shorter summary of what is actually triggering this issue and a proposed approach to handle that in ExtensionStreamGetter (which is the approach drafted in the patch attached in comment 23).
In Comment 17 you can find some more details about the issue (in particular the STR I used to trigger it locally, and which classes of users may be impacted and what kind of impact we may expect based on the picture we got so far).
The motivations to consider backing out Bug 1706594 still stands, unless we are in agreement that the approach in the attached patch seems a reasonable approach and we prefer to aim to get that patch ready to be landed in the next few days (and before the merge to beta) instead.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Comment 25•3 years ago
|
||
Set release status flags based on info from the regressing bug 1706594
Assignee | ||
Comment 26•3 years ago
|
||
Push to try for the attached patch:
Comment 27•3 years ago
|
||
(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #22)
Making sure that nsInputStreamPump will call mListener's OnStartRequest/OnStopRequest doesn't seem the right way to fix the issue
Why doesn't that feel like the right solution? In general nsIChannel
/nsIRequest
instances should always fire OnStartRequest
/OnStopRequest
pairs once they've been opened, and fixing the situation here should fix it for other users of nsInputStreamPump
as well.
Do you know what code path is being taken in the nsInputStreamPump
which is causing us to never end up firing the OnStartRequest
/OnStopRequest
pair?
Assignee | ||
Comment 28•3 years ago
|
||
(In reply to Nika Layzell [:nika] (ni? for response) from comment #27)
(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #22)
Making sure that nsInputStreamPump will call mListener's OnStartRequest/OnStopRequest doesn't seem the right way to fix the issue
Why doesn't that feel like the right solution? In general
nsIChannel
/nsIRequest
instances should always fireOnStartRequest
/OnStopRequest
pairs once they've been opened, and fixing the situation here should fix it for other users ofnsInputStreamPump
as well.
Do you know what code path is being taken in thensInputStreamPump
which is causing us to never end up firing theOnStartRequest
/OnStopRequest
pair?
To be honest, my initial instinct was that too: nsInputStreamPump should be always fire OnStartRequest
/OnStopRequest
, because in the end it does technically implement nsIRequest.
Based on my current understanding ([1]) nsInputStreamPump's current implementation seems to be guarantying that only if the pump has at least got a call to OnInputStreamReady (which internally will call OnStartRequest/OnStopRequest based on the state transitioning of the pump).
If the pump never got a ready stream then mListener OnStartRequest/OnStopRequest will never be called (there doesn't seem to be any other way to get to them at the moment).
I haven't checked in the detail how many other ways nsInputStreamPump may still not calling mListener's OnStartReqeust/OnStopRequest methods, but I got the impression that nsInputStreamPump's OnInputStreamReady/OnStateStart/OnStateTransfer/OnStateStop may still not call those callbacks in certain error scenarios.
This is mainly what motivated me to also look into "how handling that in ExtensionStreamGetter would have looked like".
And so, more than "nsInputStreamPump doesn't seem the right place where the issue should be fixed" I should have wrote "fixing the issue in nsInputStreamPump may turn out to be harder than expected and may need some more rework of the logic class, and after that some more investigations in the side-effects that changes to that logic may be introducing (in case we may introduce some other regressions due to that)".
As additional side notes:
-
the option to backout Bug 1706594 from Firefox 95 (and spend some more time to look into other approaches to fix this scenario before relanding it in Firefox 96) is definitely still on the table
-
I completely agree that if we end up fixing this scenario in ExtensionStreamGetter, then we should also at least look if a similar scenario has to be handled in the other two classes that we had to adapt in Bug 1706594 (but I deferred that to evaluate if we have a better approach to look into), and we would still have the problem that we may introduce the same issue elsewhere in the future (and I can't exclude that the same issue may already exists somewhere else and a similar issue may be triggered due to that race).
[1]: I've still a lot learn about these components to really have a complete picture, and I assume that I can be wrong.
Assignee | ||
Comment 29•3 years ago
|
||
Nika and I continued to discuss about this issue over Matrix:
- Nika did notice that RemoteLazyInputStream::Close doesn't seems to be handling closing the remote lazy input stream correctly
- which is confirmed by trying to don't use the remote lazy input stream for the requests loading from a file as a quick experiment ([1])
And so we agreed to look into fixing RemoteLazyInputStream::Close
first ([2]).
Also clearing the needinfo assigned to Valentin, while I'm looking into the approach Nika and I agreed on.
[1]: verified by applying a temporary 2 line change on ipc/glue/IPCStreamUtils.cpp and then checking that using the STR from comment 17 I couldn't reproduce the issue anymore
[2]: Bug 1706594 backout is not completely off the table, we may still have to do that as a last resort (e.g. if we need more time to investigate the issue and have a fix we are confident about).
Updated•3 years ago
|
Assignee | ||
Comment 30•3 years ago
|
||
I've updated the attach patch to make it aim for the fix Nika suggested (as briefly described in comment 29).
I tried this version of the fix locally and confirmed that I was unable to trigger the issue using the STR from comment 17.
I'm currently pushed it to try to double-check if in its current form there is any suspicious test failure to look into:
Updated•3 years ago
|
Assignee | ||
Comment 31•3 years ago
|
||
Setting leave-open, Nika and I agreed to push the fix to autoland, but we will still try to write a test case based on the STR (I'll remove the leave-open and close this issue if the resulting test case isn't able to trigger the issue consistently enough).
Comment 32•3 years ago
|
||
Comment 33•3 years ago
|
||
bugherder |
Comment 34•3 years ago
|
||
Luca, this is marked as P1/S2 and is a new regression in 95, should we uplift the fix in beta?
Assignee | ||
Comment 35•3 years ago
|
||
(In reply to Pascal Chevrel:pascalc from comment #34)
Luca, this is marked as P1/S2 and is a new regression in 95, should we uplift the fix in beta?
The patch landed (comment 33) is already in Firefox 95 (the same release where Bug 1706594 patch has also been landed).
I haven't closed the bugzilla issue yet because I want to try to reduce the STR into a test case and see if that is stable enough (and not too intermittent) to be landed in tree (and that test case could be eventually uplifted to beta).
Unfortunately I have not been able to get back to this yet (I was planning to try to get back to it by the end of this week).
I can definitely file a separate follow up if we want to close this issue as fixed in the meantime (especially if keeping it open doesn't make it clear what is the actual status of this).
Comment 36•3 years ago
|
||
Thanks, marking the status as fixed for firefox95 will remove that from our relman queries :)
Assignee | ||
Comment 37•3 years ago
|
||
This patch includes a new test file to cover a scenario similar to the one
fixed in Bug 1735899, and possibly some other similar ones that may slip
through unnoticed.
This test triggers the issue consistently on a build where the fix
is not included and pass consistently in build including the fix,
but even if it doesn't include any arbitrary timeouts to trigger the
issue there is still a chance that the test may start to fail
intermittently for some other reasons.
Keeping it as a separate test file will make it easier to disable it
as a temporary measure if turns out to be necessary.
Assignee | ||
Comment 38•3 years ago
|
||
While putting together this test I did set for myself a couple of base requirements:
- try to not tight it too much to the issue we fixed, but instead try to make it potentially able to trigger and caught other similar issues due to races that may be triggered by starting to preload the popup on mouseover and then cancelling it on mouseout
- no use of arbitrary timeout to trigger the mouseover-mouseout race
- should fail consistently on builds without the fix
- should pass consistently on builds with the fix
And the test attached in comment 37 does match all of those requirements (well, it doesn't have arbitrary timeouts but I had to use an imported css file with an arbitrary size big enough to make the issue we fixed more likely to be triggered with a less amount of mouseover/mouseout sequences, and an arbitrary amount of these mouseover/mouseout sequences), but I'm still a bit on the fence about landing it on central.
I looked into how stable the desired behavior of this test was going to be with some push to try:
- it is going to fail consistently on builds without the fix (backed out manually in a separate temporary patch to verify that): https://treeherder.mozilla.org/jobs?repo=try&revision=afb55ab4b7a43f2917d49f9112019d6acd2ca49f
- it is going to pass consistently on builds with the fix: https://treeherder.mozilla.org/jobs?repo=try&revision=ce8f6562c616becc9d5611ac0511406ed52620f9
I restricted these try pushes to only run on linux64 while I was still figuring out how stable the test was going to be, and so I pushed one more for win32 and macosx64 that is still running:
It is also worth a mention that this test does still fails in debug builds with and without the fix (due to some assertion failures and shutdown leaks that are being triggered often enough, and seems worth to be investigated further in a separate follow up because they may be an hint of other issues we may have not noticed yet), and so I guess there may be also some more that asan/tsan build may also be able to detect.
Assignee | ||
Updated•3 years ago
|
Comment 39•3 years ago
|
||
Comment 40•3 years ago
|
||
bugherder |
Comment 41•3 years ago
|
||
Change the status for beta to have the same as nightly and release.
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 42•3 years ago
|
||
Comment on attachment 9247501 [details]
Bug 1735899 - Make sure RemoteLazyInputStream::Close calls mInputStreamCallback OnInputStreamReady method. r?nika!
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: This fix is needed on builds where Bug 1706594 has been fixed, and so the case for ESR consideration is that we should be uplifting this fix along with Bug 1706594 if that uplift request is accepted.
- User impact if declined: For the impacted user, once an extension triggers this issue ([1]) any preloaded css file that got stuck will never load anymore ([1]), disabling and re-enabling the extension is not enough.
[1]: which is more likely when the extension page has a browserAction popup, especially if the extension popup html page does include a css that is using @import rules.
[2]: unless the extension process is killed and restarted
- Fix Landed on Version: 95
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The fix was quite small, and it has been in release for a while (without any further issue or regression related to the change being reported to us).
It has been also applied to the Firefox ESR build that Debian OS packages and it doesn't seem that any issue has been reported by Debian users being using it up until now.
NOTE: If accepted, this fix is meant to be uplifted along (and applied on top) of the fix for Bug 1706594
(same as done in the Debian's Firefox ESR package).
Updated•3 years ago
|
Comment 43•3 years ago
|
||
Comment on attachment 9247501 [details]
Bug 1735899 - Make sure RemoteLazyInputStream::Close calls mInputStreamCallback OnInputStreamReady method. r?nika!
Approved for 91.8esr.
Updated•3 years ago
|
Comment 44•3 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-esr91/rev/f329065ab096
https://hg.mozilla.org/releases/mozilla-esr91/rev/9d49bb0cdd2a
Description
•