Closed Bug 1249929 Opened 8 years ago Closed 7 years ago

The nsIContentFrameMessageManager I added addMessageListener to, becomes defunct, in Firefox 45.2 e10s a/b experiment beta

Categories

(Core :: DOM: Content Processes, defect, P5)

45 Branch
x86_64
Windows 10
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
e10s + ---

People

(Reporter: noitidart, Unassigned)

References

Details

(Whiteboard: btpp-followup-2016-03-08, triaged)

User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160215141016

Steps to reproduce:

I discussed this issue over at discourse - https://discourse.mozilla-community.org/t/content-frame-message-manager-changes-during-life-of-tab/7148

I am using Firefox 45.2 and am in the 10-day "Multi-process Firefox A/B Test 45.2".

I have an addon that has a privileged chrome:// page. On load of this page I call addMessageListener on the nsIContentFrameMessageManager, which I obtain like this:

var gCFMM = window.QueryInterface(Ci.nsIInterfaceRequestor)
                              .getInterface(Ci.nsIDocShell)
                              .QueryInterface(Ci.nsIInterfaceRequestor)
                              .getInterface(Ci.nsIContentFrameMessageManager);

Now this works fine for the life of the tab. HOWEVER, if I pin the tab and restart the browser, things behave as normal for some random amount of time. It always starts up properly and is good for at least 5 minutes. But sometime after 5 minutes, that nsIContentFrameMessageManager becomes defunct. Calling sendAsyncMessage throws an error:

NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIMessageSender.sendAsyncMessage]

And the listener I added via addMessageListener does not respond to anything.
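One way to make the failure visible, rather than letting the exception escape from a click handler, is to wrap the call. This is only a minimal sketch: `trySendAsyncMessage` and the `onDefunct` callback are hypothetical names, not part of any Firefox API, and `mm` is assumed to be the object obtained via getInterface above. It does not prevent the disconnect; it only detects it.

```javascript
// Hypothetical helper (not a Firefox API): returns true if the message was
// handed to the message manager, false if sendAsyncMessage threw, for
// example with NS_ERROR_ILLEGAL_VALUE as in the report above.
function trySendAsyncMessage(mm, name, data, onDefunct) {
  try {
    mm.sendAsyncMessage(name, data);
    return true;
  } catch (ex) {
    // Let the caller decide what to do: log, show a notice, or try to
    // re-acquire a message manager.
    if (typeof onDefunct === 'function') {
      onDefunct(ex);
    }
    return false;
  }
}
```

In the page this could be called as trySendAsyncMessage(gCFMM, messageName, payload, logError), where logError is whatever reporting function the addon already has.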



Actual results:

To test this you can install my beta version of the addon here - https://addons.mozilla.org/en-US/firefox/addon/profilist/versions/beta

Then go to about:profilist?html and pin this tab. It's a profile manager, clicking any of the profiles launches it. If any profile is running the icon is filled in black and clicking it focuses the profile.

Restart the browser, and after some time you will see that clicking things gives you the sendAsyncMessage error I mentioned above, and the state of the GUI remains unchanged.


Expected results:

The nsIContentFrameMessageManager should remain connected. If for some reason it is destroyed, the tab should be reconnected to the same one.
Component: Untriaged → Security
Component: Security → Security: Process Sandboxing
OS: Unspecified → Windows 10
Product: Firefox → Core
Hardware: Unspecified → x86_64
Hi,
What makes you think this is Security: Process Sandboxing?
The content process sandbox on Windows is only on Nightly at the moment, so it can't be affecting Beta.
Flags: needinfo?(noitidart)
(In reply to Bob Owen (:bobowen) from comment #1)
> Hi,
> What makes you think this is Security: Process Sandboxing?
> The content process sandbox on Windows is only on Nightly at the moment, so
> it can't be affecting Beta.

Oh shoot. I'm just learning how to triage stuff. I matched it to another one of my topics. It totally can be wrong, do you have any advice on what would be a better category?
Flags: needinfo?(noitidart)
I don't know this area all that well, but I think the message manager stuff for the content process belongs here.
Component: Security: Process Sandboxing → DOM: Content Processes
smaug, mccr8 thinks you may have insight here.
Flags: needinfo?(bugs)
Whiteboard: btpp-followup-2016-03-01
Assuming you're using multiprocess FF, is the relevant page in the tab up and running?
Or are you using single process FF?


So the addon itself isn't using message manager? At least I couldn't find any message manager use.
I loaded it from https://addons.mozilla.org/en-US/firefox/addon/profilist/versions/beta.
Or am I looking at wrong version. I downloaded Version 3.0b3
Flags: needinfo?(bugs)
Without seeing the relevant code here, I wonder if it is possible to add a message listener to inprocesstabchildglobal, and then we switch the tab to use remote browser so we end up disconnecting the existing message managers. (IIRC we do something like that, but I'm not familiar with that setup, since it is all in Firefox UI code.)
Flags: needinfo?(mconley)
Hi Olli,
Thanks for the reply. I'm pretty sure I am in e10s. I had the message manager set up for quite some time (I have been on the beta channel for the last couple of years), and I only noticed this issue as the A/B testing for e10s started.

You can see the code on github too here for the two about pages:

about:profilist - https://github.com/Noitidart/Profilist/blob/master/resources/scripts/cp.js
about:profilist?html - https://github.com/Noitidart/Profilist/blob/master/resources/scripts/html.js

Those are the scripts and these are the actual xhtml pages:
https://github.com/Noitidart/Profilist/tree/master/resources/pages

I register those pages here with the usual about page registration method:
https://github.com/Noitidart/Profilist/blob/master/bootstrap.js#L71-L117

If you would like a simpler example I can write one up sometime.

Here is a screenshot of the experiment my beta is in right now: http://i.imgur.com/QVWyFPb.png

The beta just ended it seems (the screenshot says "less then one day ago"), I just tested my addon page and it seems to still be connected. I'll update you if it happens while I am off this experiment. :)
Oh a note, the addon has been using the message manager (and my many other addons use same technique) without issue since whatever was in beta on August 2015, so even in the non-e10s versions. However I had never pinned a tab. Two ago is the first time I pinned a tab that used the message manager. I think it might have something to do with on startup.
I should also add that in the last couple months I did thorough testing on dev edition and nightly (even though I mainly don't use them) and it wasn't causing issues, but one constant though is I never pinned those tabs so it wasn't there on startup.
If I understand comment 0 correctly, a chrome:// page (so, one that should never load in the content process) is being restored automatically in a pinned tab after session restore, and then somehow the message manager is getting disconnected...

Do I have that correct, noitidart? The page in question is a chrome:// page in the pinned tab?
Flags: needinfo?(mconley) → needinfo?(noitidart)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #10)
> If I understand comment 0 correctly, a chrome:// page (so, one that should
> never load in the content process) is being restored automatically in a
> pinned tab after session restore, and then somehow the message manager is
> getting disconnected...
> 
> Do I have that correct, noitidart? The page in question is a chrome:// page
> in the pinned tab?

Thanks Mike. That's true by default however not in this case as I force it to load in the content process on registration of the about:page -

I set URI_CAN_LOAD_IN_CHILD here - https://github.com/Noitidart/Profilist/blob/master/bootstrap.js#L80

This page shows that if I set URI_CAN_LOAD_IN_CHILD then it will load in the browser's content process - https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/Which_URIs_load_where


I actually used to set URI_MUST_LOAD_IN_CHILD (which makes it take a child process aside from the browser process), however in the Fx 47 nightlies the page just wouldn't load so I changed to URI_CAN_LOAD_IN_CHILD.
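For readers comparing the two configurations, the flag choice can be sketched as below. This is an illustration under assumptions, not the code from bootstrap.js: `aboutPageFlags` and `makeAboutModule` are hypothetical names, `Ci` is passed in as a parameter (in an addon it would be Components.interfaces), and combining the flag with ALLOW_SCRIPT is assumed rather than taken from the addon. Only getURIFlags is shown; a real nsIAboutModule also needs newChannel.

```javascript
// Compute the flags returned by an about module's getURIFlags.
// `Ci` is injected so the logic can be read outside Firefox.
function aboutPageFlags(Ci, loadInChild) {
  // Assumption: the page needs script; the real addon may use other flags.
  var flags = Ci.nsIAboutModule.ALLOW_SCRIPT;
  if (loadInChild) {
    // Permit (but do not force) loading in a content process.
    flags |= Ci.nsIAboutModule.URI_CAN_LOAD_IN_CHILD;
  }
  return flags;
}

// Hypothetical, partial about module: only getURIFlags is sketched here.
function makeAboutModule(Ci, loadInChild) {
  return {
    getURIFlags: function (aURI) {
      return aboutPageFlags(Ci, loadInChild);
    }
  };
}
```

Keeping the choice behind a single boolean makes it easy to test the same pinned tab with and without URI_CAN_LOAD_IN_CHILD.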
Flags: needinfo?(noitidart)
Flags: needinfo?(mconley)
Whoops, it seems to happen in the non-experiment as well with pinned tab.
(In reply to noitidart from comment #12)
> Whoops, it seems to happen in the non-experiment as well with pinned tab.

non-experiment, as in, with e10s disabled? Can you confirm by checking about:support, under Multiprocess Windows in the "Application Basics" section?
Flags: needinfo?(mconley) → needinfo?(noitidart)
Yep, with e10s disabled as well.

about:support says - 
Multiprocess Windows 	0/1 (default: false)
Flags: needinfo?(noitidart)
baku, you wrote about:profiles, any thoughts here?
Flags: needinfo?(amarchesini)
Whiteboard: btpp-followup-2016-03-01 → btpp-followup-2016-03-08
(In reply to Andrew Overholt [:overholt] from comment #15)
> baku, you wrote about:profiles, any thoughts here?

Thanks for bumping this Andrew. I'm pretty sure this is independent of profile work. I need to write a simplified example so we can verify. If a tab is there on startup (due to a pin possibly), the addMessageListener eventually disconnects (usually within a few minutes). I'll try to write a verifiable example. I'm sure it's not my code, but the simple example should help verify, I will try to get to that soon.
(In reply to noitidart from comment #16)
> (In reply to Andrew Overholt [:overholt] from comment #15)
> > baku, you wrote about:profiles, any thoughts here?
> 
> Thanks for bumping this Andrew. I'm pretty sure this is independent of
> profile work. I need to write a simplified example so we can verify. If a
> tab is there on startup (due to a pin possibly), the addMessageListener
> eventually disconnects (usually within a few minutes). I'll try to write a
> verifiable example. I'm sure it's not my code, but the simple example should
> help verify, I will try to get to that soon.

Thanks, a simple example will be great. I asked baku just because he's dealt with about:profiles + {non-,}e10s (and another addon) so I thought he may have thoughts.
If you can create a simple example, I can take a look. NI me when you upload it. Thanks!
Flags: needinfo?(amarchesini) → needinfo?(noitidart)
I'll make a demo tonight. I'm testing a fix I landed. I was using the `URI_CAN_LOAD_IN_CHILD` flag from - https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/Which_URIs_load_where - in order to force the page to load inside a content process rather than the default of the main process.

I removed that and it seems to have fixed things. I'll test it more and then write an example.
Update - I worked to reproduce this today, but I haven't been able to. It might just be two of my addons have something in common. I did find a side bug that is e10s related as well - https://bugzilla.mozilla.org/show_bug.cgi?id=1257201

Apologies in advance if this turns out to be a false alarm. :( Will continue tomorrow
Flags: needinfo?(noitidart)
Blocks: e10s-addons
tracking-e10s: --- → +
I was thinking this:

1) Because the tab was pinned, the page was loading and asking for information from the ChromeWorker before the ChromeWorker started up

I tested this however it can't be the issue. The ChromeWorker is definitely there before the page is loaded because data on the page initially DOES load. Later on the data does NOT load.

I'll keep trying to reproduce this.
Update, I'm still working on reproducing this. Will keep you all updated.
The part I don't understand is simply doing:

    gCFMM.sendAsyncMessage(core.addon.id, 'rawr')

Throws this:

[Exception... "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIMessageSender.sendAsyncMessage]"  nsresult: "0x80070057 (NS_ERROR_ILLEGAL_VALUE)"  location: "JS frame :: debugger eval code :: <TOP_LEVEL> :: line 1"  data: no]

I don't know how this is possible. Does anyone have any ideas what can cause this?
What is the value of core.addon.id in this test?
Flags: needinfo?(noitidart)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #24)
> What is the value of core.addon.id in this test?

Thanks Mike! The id in one is Profilist@jetpack and in the other it is Topick@jetpack

I found a curious thing while testing. I have reproduced it. If you install this simple test addon: 

https://github.com/Noitidart/Topick/tree/bug1249929

Then open about:topick

You will see a counter - 
4/8/2016, 8:45:20 AM -- 1460130320163
page loaded: 4/8/2016, 8:27:14 AM -- 1460129234651

The first line is the updated time. It updates every second. Pin the tab, then restart the browser. Leave it running and eventually it will stop.

The very interesting thing is, if you install profilist from here - https://github.com/Noitidart/Profilist/tree/bug1249929

Then go to about:profilist?html, ALSO pin this tab, and restart the browser

You will also see a timer:

http://i.imgur.com/YK6xlZ1.png

Eventually BOTH stop updating at the exact same time! That is very interesting
Flags: needinfo?(noitidart)
Oh, by the way, the first repo, the Topick one, is made specifically for this bug. It is very simple: just an about page that updates every second. So that is the reproducible example.
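Because the stall can take minutes or hours to appear, it may help to record when the counter stops instead of watching the tab. A minimal sketch of such a stall detector follows; `createHeartbeat` and its methods are hypothetical names, and it assumes the page calls beat() each time an update arrives through the message manager. The clock is injectable so the logic can be checked without waiting.

```javascript
// Hypothetical stall detector: remembers when the last update arrived and
// reports a stall once no update has been seen for more than `maxGapMs`.
function createHeartbeat(maxGapMs, now) {
  var clock = now || Date.now; // injectable clock, defaults to wall time
  var last = clock();
  return {
    // Call whenever an update arrives.
    beat: function () { last = clock(); },
    // True once the gap since the last update exceeds the threshold.
    isStalled: function () { return clock() - last > maxGapMs; },
    // Timestamp (ms) of the last update, useful for logging.
    lastBeat: function () { return last; }
  };
}
```

With a threshold of a few seconds, polling isStalled() from a timer in the page would give the approximate time at which the message manager stopped delivering updates.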

I'm going to go test on a clean profile. I only reproduced on my main profile so far.
Also, sometimes it doesn't lock up for hours. Currently it's been going for 8 hours, in my main profile and in a clean profile.
I found a workaround. I now load a framescript into all tabs, and when the page matches my page, I Cu.exportFunction sendAsyncMessage to it. Now the content frame message manager is not going defunct.

https://github.com/Noitidart/Profilist/blob/bug1249929/resources/scripts/MainFramescript.js#L208-L220
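The shape of that workaround can be sketched roughly as follows. This is not the code from MainFramescript.js: `installBridge`, `sendToChrome` and `pagePrefix` are hypothetical names, and the frame script's global and `Cu` (Components.utils) are passed in as parameters so the logic can be read in isolation. Listening for DOMContentLoaded is also an assumption about when to detect the page.

```javascript
// Sketch of the frame-script side of the workaround.
// `global` stands for the frame script's global object, which provides
// addEventListener and sendAsyncMessage; `Cu` stands for Components.utils.
function installBridge(global, Cu, pagePrefix) {
  global.addEventListener('DOMContentLoaded', function (event) {
    var win = event.target.defaultView;
    if (!win || String(win.location.href).indexOf(pagePrefix) !== 0) {
      return; // not the addon's page, leave it alone
    }
    // Expose a function on the page that forwards to the frame script's own
    // sendAsyncMessage, so the page never holds a message manager itself.
    Cu.exportFunction(function (name, data) {
      global.sendAsyncMessage(name, data);
    }, win, { defineAs: 'sendToChrome' });
  }, false);
}
```

Inside an actual frame script this would be invoked along the lines of installBridge(this, Components.utils, 'about:profilist'), after which the page calls sendToChrome(name, data) instead of using a message manager it obtained on its own.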

So the problem is for sure when I get the content frame message manager from a privileged page like this:

aContentWindow.QueryInterface(Ci.nsIInterfaceRequestor)
              .getInterface(Ci.nsIDocShell)
              .QueryInterface(Ci.nsIInterfaceRequestor)
              .getInterface(Ci.nsIContentFrameMessageManager);

Does anyone know why this is?
Putting this at lower priority for investigating the cause, but please update if a scenario comes up that can't be worked around.
Priority: -- → P5
Whiteboard: btpp-followup-2016-03-08 → btpp-followup-2016-03-08, triaged
@shellescalante - I worked around this with a very different method: I am injecting a framescript into all loaded tabs, and when it detects my page it exposes the nsIContentFrameMessageManager from the framescript to the page via the MessageChannel API.

This bug still exists, but reproducing it is tricky. Lower priority makes sense.
With Firefox 57 only WebExtensions are permitted and are, by default, e10s compatible.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE