Closed Bug 1257154 Opened 4 years ago Closed 4 years ago

Room context keeps getting updated in e10s mode when you are in a conversation

Categories

(Hello (Loop) :: Client, defect, P1)

defect

Tracking

(e10s?, firefox47 fixed, firefox48 fixed)

RESOLVED FIXED
mozilla48
Iteration:
48.3 - Apr 25
Tracking Status
e10s ? ---
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: standard8, Assigned: standard8)

References

Details

(Whiteboard: [btpp-fix-now])

Attachments

(1 file, 2 obsolete files)

When we're in e10s mode, for some reason the room context keeps getting updated - this is probably some sort of cycle/constant listener. It doesn't happen in a non-e10s window.

STR:

1) Set Firefox to start in E10s mode, set "loop.remote.autostart" to true and restart the browser.
2) Open up the "Browser Content Toolbox"
3) Enter a conversation on the link generator side.

=> The console on the content toolbox settles down and stays quiet.

4) Have a link clicker join the conversation

=> After the initial chatter, the following messages keep being repeated on the content toolbox's console:

[Dispatcher] Dispatching action Object { newRoomDescription: "Planet Mozilla", newRoomThumbnail: "…", newRoomURL: "http://planet.mozilla.org/", roomToken: "pWr2FJ64hu0", name: "updateRoomContext" }dispatcher.js:95:9
[Dispatcher] Dispatching action Object { name: "updateRoomContextDone" }
Rank: 15
Priority: -- → P1
Whiteboard: [btpp-fix-now]
I think I've found out what's happening here:

- In e10s mode, there's at two message managers. There's the global one for a window, and there's a "group" one for social.
- According to http://mxr.mozilla.org/mozilla-central/source/dom/base/nsIMessageManager.idl?force=1 a message sent to the "social" message manager propagates up to the tree.

- In our code, we're listening for a DOMTitleChanged from gBrowser.
- When a DOMTitleChanged is fired, we get the details of the current browser window.
- Once the details are obtained, the conversation window sets the document title.
- Setting the document title happens on the socialchat browser, which has the "social" group message manager.
- The social api sets the conversation window title correctly, but then:
- DOMTitleChanged propagates up the tree, back to our gBrowser listener....

and the cycle starts again.
Dave Townsend pointed out to me on irc that the add-on compatibility shims might be getting in the way:

https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/Limitations_of_chrome_scripts#Compatibility_shims

and that we should turn on multiprocess compatible so that we're behaving properly:

https://developer.mozilla.org/en-US/Add-ons/Install_Manifests#multiprocessCompatible

This means making the DOMTitleChanged listener happen via a frame script loaded into the content process:

https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/Limitations_of_chrome_scripts#DOM_Events
https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/Message_Manager

I've got something hacked together now that works, I'll tidy it up tomorrow.
Assignee: nobody → standard8
Ooooh, exciting! Looking forward to the patch.
Comment on attachment 8732170 [details] [review]
[loop] Standard8:bug-1257154-title > mozilla:master

I think I need to write a test for this, but the patch is now functional. Mike, can you give me some feedback whilst I'm still writing the test please?
Attachment #8732170 - Flags: feedback?(mdeboer)
Comment on attachment 8732170 [details] [review]
[loop] Standard8:bug-1257154-title > mozilla:master

This could easily become an r+ when all comments are addressed. Good stuff! (I didn't know we had a DOMTitleChanged event listener there...)
Attachment #8732170 - Flags: feedback?(mdeboer) → feedback+
Blocks: 1258335
Rank: 15 → 9
Comment on attachment 8732170 [details] [review]
[loop] Standard8:bug-1257154-title > mozilla:master

Updated for the feedback comments and added a unit test.
Attachment #8732170 - Flags: review?(mdeboer)
Attachment #8732170 - Flags: review?(mdeboer) → review+
This has landed in m-c with 1.2.3 (bug 1259245).
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Per bug 1259245, this was backed out of central due to crashes & leaks:

> Backed out in https://hg.mozilla.org/mozilla-central/rev/d5f3da0cfe7c for
> very frequent 10.10 opt e10s mochitest-5 crashes like
> https://treeherder.mozilla.org/logviewer.html#?job_id=8211122&repo=fx-team

> This also caused leaks in other Linux bc tests and in Windows 7 debug
> M-e10s(dt1):
> https://treeherder.mozilla.org/logviewer.html#?job_id=8210013&repo=fx-team

> This also caused leaks in other Linux bc tests and in Windows 7 debug
> M-e10s(dt1):
> https://treeherder.mozilla.org/logviewer.html#?job_id=8210013&repo=fx-team
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Mark Banner (:standard8) from comment #11)
> > This also caused leaks in other Linux bc tests and in Windows 7 debug
> > M-e10s(dt1):
> > https://treeherder.mozilla.org/logviewer.html#?job_id=8210013&repo=fx-team
> 
> > This also caused leaks in other Linux bc tests and in Windows 7 debug
> > M-e10s(dt1):
> > https://treeherder.mozilla.org/logviewer.html#?job_id=8210013&repo=fx-team

My best guess for these: The frame scripts don't get cleaned up (and can't be, as they don't have that functionality), so this is causing extra leaks in the processes. However, the logs say that the tab process has a threshold of 10000 bytes, but its only leaking 2564 ish bytes.

So I'm confused. I certainly can't see anything we're doing wrong with this patch - we might need to bump the leak threshold, but I can't see where we'd do that.

Justin, any ideas who to ask here?
Flags: needinfo?(dolske)
Try push with fresh fx-team baseline:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f2ebdabfcea

Try push with only enabling Loop as a multiprocess compatible add-on:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1a53c51c753b
Interestingly, just adding this to install.rdf is enough to cause the leaks:

<em:multiprocessCompatible>true</em:multiprocessCompatible>

I've now got another build in progress which drops everything in the add-on's startup & shutdown functions, to see if its likely to be something we're doing, or if there's an issue in the platform:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b328c195636
Flags: needinfo?(dolske)
(In reply to Mark Banner (:standard8) from comment #15)
> Interestingly, just adding this to install.rdf is enough to cause the leaks:
> 
> <em:multiprocessCompatible>true</em:multiprocessCompatible>

Sure seems like it! Perhaps Dave has a hint as to why this singular manifest setting might cause a system add-on to leak the world?
Status: REOPENED → ASSIGNED
Flags: needinfo?(dtownsend)
(In reply to Mike de Boer [:mikedeboer] from comment #16)
> (In reply to Mark Banner (:standard8) from comment #15)
> > Interestingly, just adding this to install.rdf is enough to cause the leaks:
> > 
> > <em:multiprocessCompatible>true</em:multiprocessCompatible>
> 
> Sure seems like it! Perhaps Dave has a hint as to why this singular manifest
> setting might cause a system add-on to leak the world?

That manifest change switches off the special shims that attempt to make non-e10s safe code work in an e10s world. It's possible that those shims were protecting you from leaks somehow.
Flags: needinfo?(dtownsend)
(In reply to Dave Townsend [:mossop] from comment #17)
> (In reply to Mike de Boer [:mikedeboer] from comment #16)
> > (In reply to Mark Banner (:standard8) from comment #15)
> > > Interestingly, just adding this to install.rdf is enough to cause the leaks:
> > > 
> > > <em:multiprocessCompatible>true</em:multiprocessCompatible>
> > 
> > Sure seems like it! Perhaps Dave has a hint as to why this singular manifest
> > setting might cause a system add-on to leak the world?
> 
> That manifest change switches off the special shims that attempt to make
> non-e10s safe code work in an e10s world. It's possible that those shims
> were protecting you from leaks somehow.

I did a push to try server on Friday that turned off basically everything in Loop's bootstrap.js.

That basically gets us to loading the add-on, and a few overrides/other items mentioned in chrome.manifest.

However try server is still seeing leaks :-(
Unassigning myself, as I feel I don't have the time to work on fixing the other issues this week. I'm trying to find another owner.
Assignee: standard8 → nobody
Status: ASSIGNED → NEW
Bill, Dave said you might be able to help with understanding what the memory leaks are here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b328c195636&selectedJob=18927614

(bc4/bc5 e10s oranges).
Flags: needinfo?(wmccloskey)
Loop team is asking for some help here.
Ian Bicking gave me some ideas earlier today. I'm currently trying a try build without the em:multiprocessCompatible flag set. So far it seems to be working:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6f39926d9c3

However, that still means there's an issue with dropping those shims. Although I tried killing most of bootstrap before, I didn't kill everything. So there may still be something interacting badly. Its concerning that it is non-obvious if that's the case.

If the try build continues successfully, I'll land the reduced patch in the morning, however, I'll be spinning a separate bug or two for the issues with the multiprocessCompatible flags. So any help in the meantime would be appreciated.
Blocks: 1260811
This is the same as the previous patch, but without the multiprocessCompatible flag to remove the shims. Try server said this was good, so I'm going to land it and file a new bug for adding the flag.
Attachment #8738655 - Flags: review+
Attachment #8737170 - Attachment is obsolete: true
Attachment #8732170 - Attachment is obsolete: true
Assignee: nobody → standard8
Iteration: 48.2 - Apr 4 → 48.3 - Apr 18
Blocks: 1262560
I've moved the em:multiprocessCompatible issues out to bug 1262560.
Flags: needinfo?(wmccloskey)
https://hg.mozilla.org/mozilla-central/rev/1725b460c3e0
Status: NEW → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8738655 [details] [diff] [review]
Switch to getting DOM title updates via a frame script to work better with e10s.

Approval Request Comment
[Feature/regressing bug #]: Firefox Hello / e10s
[User impact if declined]: Performance is impacted whilst in a conversation in e10s mode (could cause flickering as well). 
[Describe test coverage new/current, TreeHerder]: Landed in m-c, has tests for the new work.
[Risks and why]: Low - moving to existing well-known frame script mechanism.
[String/UUID change made/needed]: None
Attachment #8738655 - Flags: approval-mozilla-aurora?
Comment on attachment 8738655 [details] [diff] [review]
Switch to getting DOM title updates via a frame script to work better with e10s.

Improves Hello + e10s experience, Aurora47+
Attachment #8738655 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Mark Banner (:standard8) from comment #27)
> Also landed in Loop repo:
> 
> https://github.com/mozilla/loop/commit/
> 36aa310233988caf716469f7397c0e7a95cd7d1d

It appears I forgot to add the new file, which we detected on re-export:

https://github.com/mozilla/loop/commit/3365b697a32e5160d9cbf4d3b1694268d64716b6
Blocks: 1265865
You need to log in before you can comment on or make changes to this bug.