<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Brad Lassey [:blassey] (use needinfo?)

Comment 1

•

9 years ago

With CPOWs removed on window closing (bug 1171708) and shutdown (bug 1177310) we don't need the SyncHandler anymore, correct. I'm looking forward to that day :)

Updated

•

9 years ago

Assignee: nobody → mconley

Priority: -- → P1

Assignee

Updated

•

9 years ago

Blocks: 1185674

Assignee

Updated

•

9 years ago

Depends on: 1177310

Assignee

Updated

•

9 years ago

Status: NEW → ASSIGNED

Assignee

Comment 2

•

9 years ago

Attached file MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert — Details

Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r?ttaubert

Attachment #8694367 - Flags: review?(ttaubert)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ebcc2e1d0396

Assignee

Comment 3

•

9 years ago

Comment 4

•

9 years ago

Comment on attachment 8694367 [details] MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert https://reviewboard.mozilla.org/r/26733/#review24183 ::: browser/components/sessionstore/TabState.jsm:62 (Diff revision 1) > + browserReady(browser) { > + // Even if we've already got the browser in _latestMessageID, > + // we reset the latest ID to 0 because the browser has reloaded > + // the content script, and therefore reset its own internal notion > + // of the most recently sent update ID. > this._latestMessageID.set(browser.permanentKey, 0); > }, We don't need to keep this and can get rid of |_latestMessageID|. (more on that below) ::: browser/components/sessionstore/TabState.jsm:74 (Diff revision 1) > + if (!this._latestMessageID.has(browser.permanentKey)) { > + // We've never seen this browser before, so initialize the latest > + // seen ID at 0 so that we can ensure that we only ever handle > + // updates with increasing IDs. > + this._latestMessageID.set(browser.permanentKey, 0); > + } > + > // Only ever process messages that have an ID higher than the last one we > // saw. This ensures we don't use stale data that has already been received > // synchronously. > if (id > this._latestMessageID.get(browser.permanentKey)) { > this._latestMessageID.set(browser.permanentKey, id); > TabStateCache.update(browser, data); > } We don't actually need this with the MessageQueue simplifications, and we can remove the check in |update()| now too. Just always call |TabStateCache.update()| directly. ::: browser/components/sessionstore/content/content-sessionStore.js (Diff revision 1) > - flush: function (id) { > - MessageQueue.flush(id); > - }, MessageQueue.flush() can be removed, nothing uses it anymore. MessageQueue.send() doesn't need the |sync| and |id| options anymore. It should never use |sendRpcMessage|. The whole MessageQueue can be simplified as we don't need to account for sync flushes while we're waiting for an async update message to arrive. We don't need |_lastUpdated| anymore. |send()| simply iterates over |_data| and clears it. Guess that means we can remove |_id| as well.

Attachment #8694367 - Flags: review?(ttaubert)

Assignee

Comment 5

•

9 years ago

Comment on attachment 8694367 [details] MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26733/diff/1-2/

Attachment #8694367 - Flags: review?(ttaubert)

Comment 6

•

9 years ago

Comment on attachment 8694367 [details] MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert https://reviewboard.mozilla.org/r/26733/#review24411 LGTM if try agrees. ::: browser/components/sessionstore/content/content-sessionStore.js (Diff revision 2) > - let sync = options && options.sync; > - let startID = (options && options.id) || this._id; The function comment needs to be adjusted, |sync| and |id| references should be removed.

Attachment #8694367 - Flags: review?(ttaubert) → review+

Assignee

Comment 7

•

9 years ago

Comment on attachment 8694367 [details] MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26733/diff/2-3/

Attachment #8694367 - Attachment description: MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r?ttaubert → MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1275f0d1f6e

Assignee

Comment 8

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e59e94384a0d

Assignee

Comment 9

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbab41305297

Assignee

Comment 10

•

9 years ago

Assignee

Comment 11

•

9 years ago

Comment on attachment 8694367 [details] MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26733/diff/3-4/

:Felipe Gomes (needinfo for replies!)

Assignee

Comment 12

•

9 years ago

Attached file MozReview Request: Bug 1195295 - Fix browser_tab_detach_restore.js and browser_popupNotifications_4.js r?felipe — Details

Bug 1195295 - Fix browser_tab_detach_restore.js and browser_popupNotifications_4.js r?felipe These tests seemed to rely on the synchronous infrastructure that the now defunct SyncHandler for SessionStore provided. This moves them to safer, more modern alternatives.

Attachment #8697176 - Flags: review?(felipc)

Updated

•

9 years ago

Attachment #8697176 - Flags: review?(felipc) → review+

:Felipe Gomes (needinfo for replies!)

Comment 13

•

9 years ago

Comment on attachment 8697176 [details] MozReview Request: Bug 1195295 - Fix browser_tab_detach_restore.js and browser_popupNotifications_4.js r?felipe https://reviewboard.mozilla.org/r/27527/#review24823

Assignee

Comment 14

•

9 years ago

Comment on attachment 8694367 [details] MozReview Request: Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26733/diff/4-5/

Assignee

Comment 15

•

9 years ago

Comment on attachment 8697176 [details] MozReview Request: Bug 1195295 - Fix browser_tab_detach_restore.js and browser_popupNotifications_4.js r?felipe Review request updated; see interdiff: https://reviewboard.mozilla.org/r/27527/diff/1-2/

Assignee

Comment 16

•

9 years ago

Attached file MozReview Request: Bug 1195295 - Remove race-y SDK test that relied on slow content-sessionStore.js load. r=Mossop — Details

Bug 1195295 - Remove race-y SDK test that relied on slow content-sessionStore.js load. r=Mossop

Assignee

Comment 17

•

9 years ago

Comment on attachment 8697424 [details] MozReview Request: Bug 1195295 - Remove race-y SDK test that relied on slow content-sessionStore.js load. r=Mossop Mossop r+'d the SDK test removal over IRC.

Attachment #8697424 - Flags: review+

Assignee

Comment 18

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/18eec0849a1f53377704d2602593561ef631a50a Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert https://hg.mozilla.org/integration/fx-team/rev/5d5885a15ca7ec5649ab207e39a1ba589d3985bb Bug 1195295 - Fix browser_tab_detach_restore.js and browser_popupNotifications_4.js r=felipe https://hg.mozilla.org/integration/fx-team/rev/05b98fe2ac4f8d4aade367623f2931b652f543d0 Bug 1195295 - Remove race-y SDK test that relied on slow content-sessionStore.js load. r=Mossop

https://hg.mozilla.org/mozilla-central/rev/18eec0849a1f https://hg.mozilla.org/mozilla-central/rev/5d5885a15ca7 https://hg.mozilla.org/mozilla-central/rev/05b98fe2ac4f https://hg.mozilla.org/mozilla-central/rev/fff7c04182f9

Assignee

Comment 19

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/fff7c04182f9f5ca24e9656e1458efa100b016cc Bug 1195295 - Follow-up: remove other race-y SDK test that I failed to remove in 05b98fe2ac4f. r=Mossop

Carsten Book [:Tomcat]

Comment 20

•

9 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox45: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 45

Phil Ringnalda (:philor)

Comment 21

•

9 years ago

Here's something fun: https://treeherder.mozilla.org/#/jobs?repo=fx-team&tochange=05b98fe2ac4f&fromchange=6320b1962b04&filter-searchStr=win%20debug%20browser-chrome-7

Depends on: 1220517

Flags: needinfo?(mconley)

Assignee

Comment 22

•

9 years ago

Huh. Alright. I'll poke at this in bug 1220517 - thanks for letting me know, philor.

Flags: needinfo?(mconley)

Comment 23

•

9 years ago

Backing out commit 18eec0849a1f fixes the permaorange on Aurora (bug 1234697): https://treeherder.mozilla.org/#/jobs?repo=try&revision=557502e5b88a

Comment 24

•

9 years ago

I backed out 18eec0849a1f for causing intermittent leaks. If it sticks, I'll get Aurora approval for backing it out there.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Comment 25

•

9 years ago

Attached patch back out 18eec0849a1f for causing permaleaks on Aurora — Details — Splinter Review

Approval Request Comment [Feature/regressing bug #]: this bug [User impact if declined]: I'm not sure what the actual user impact would be, but this patch is causing a permanent orange on TreeHerder. [Describe test coverage new/current, TreeHerder]: There are tests for this feature. I've already backed this patch out on inbound, though the tests haven't finished yet, so I'm not sure if it will stick. [Risks and why]: It will reintroduce whatever session store issue this was supposed to fix, but this bug was around for a while before it was fixed so hopefully it isn't too bad. It is also possible some other changes depend on this, but hopefully nothing too bad if my landing is okay on inbound. [String/UUID change made/needed]: none

Attachment #8701604 - Flags: approval-mozilla-aurora?

Comment 26

•

9 years ago

Comment on attachment 8701604 [details] [diff] [review] back out 18eec0849a1f for causing permaleaks on Aurora Apparently the chunking works differently on Aurora, so I'm not 100% sure yet this fixes the leak. I'll reland or something if I was wrong, but I'm clearing the Aurora flag for now. backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/76d888a191cf

Attachment #8701604 - Flags: approval-mozilla-aurora?

Assignee

Updated

•

9 years ago

Flags: needinfo?(mconley)

Comment 27

•

9 years ago

Comment on attachment 8701604 [details] [diff] [review] back out 18eec0849a1f for causing permaleaks on Aurora Approval Request Comment (see comment 25)

Attachment #8701604 - Flags: approval-mozilla-aurora?

Joel Maher ( :jmaher ) (UTC -8)

Comment 28

•

9 years ago

I triggered bc7 for my Aurora try run, and it is green, so hopefully the backout fixes bug 1234697.

Comment 29

•

9 years ago

bummer we had to back this out- we had a perf win in the linux64 sessionrestore tests which showed up as a regression in talos when we backed this out. maybe we can fix this later and get the win back:)

Sylvestre Ledru [:Sylvestre]

Comment 30

•

9 years ago

Comment on attachment 8701604 [details] [diff] [review] back out 18eec0849a1f for causing permaleaks on Aurora ok, let's try that. thanks

Attachment #8701604 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 31

•

9 years ago

Backout needed on aurora.

Keywords: checkin-needed

Updated

•

9 years ago

status-firefox46: --- → affected

Assignee

Comment 32

•

9 years ago

I'll wait for :ting to sort out things in bug 1220517 before attempting to reland this.

Flags: needinfo?(mconley)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 33

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #29) > bummer we had to back this out- we had a perf win in the linux64 > sessionrestore tests which showed up as a regression in talos when we backed > this out. maybe we can fix this later and get the win back:) This is great to hear. We expected that and that was partly the reason we wanted to remove it. Still great to confirm :)

Comment 34

•

9 years ago

backout bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/ac8bea113074

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Updated

•

9 years ago

status-firefox45: fixed → affected

https://treeherder.mozilla.org/#/jobs?repo=try&revision=16b981ba5ac4

Assignee

Comment 35

•

9 years ago

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

Target Milestone: Firefox 45 → ---

Comment 36

•

9 years ago

The reason why attachment 8694367 [details] causes leak easily was explained in bug 1220517 comment 53, which it creates a lot more CC logs. From Try [1] [2], the number of processing count of CCAnalyzer without and with the patch: Without: 03:01:17 INFO - CCAnalyzer.processingCount=28998 With: 07:15:57 INFO - CCAnalyzer.processingCount=98373 I downloaded the cc/gc edge logs [3] from both Try runs. The only thing I can tell now is there're unknown edges to keep some DOMEventTargetHelper alive, which point to the functions in content-sessionStore.js when the patch is applied. For example: ting@ting-pc:~/w/fx/tools/heapgraph/cc$ python find_roots.py ~/Desktop/cc-edges.2168.log 133C1C20 Parsing /home/ting/Desktop/cc-edges.2168.log. Done loading graph. Reversing graph. Done. 19B73200 [DOMEventTargetHelper ] --[mAnonymousGlobalScopes[i]]--> 266E26E0 [JS Object (Block)] --[enclosing_environment]--> 266E2680 [JS Object (NonSyntacticVariablesObject)] --[MessageQueue]--> 133C1C20 [JS Object (Object)] Root 19B73200 is a ref counted object with 1 unknown edge(s). known edges: 1EE80E50 [XPCWrappedNative (ContentFrameMessageManager)] --[mIdentity]--> 19B73200 [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa52faa7074a [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=487f96a26495 [3] https://goo.gl/lhaB08

Assignee

Comment 37

•

9 years ago

I think I know what's going on here - I think I'm supposed to be clearing the functions in the MessageQueue data map after a flush, and that stuff got removed. Let me see if that clears the orange. Gimme a few minutes to rebase and push.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=95c11e746e0d

Assignee

Comment 38

•

9 years ago

Assignee

Comment 39

•

9 years ago

I'm afraid that theory was bunk. Exploring other scenarios now.

Comment 40

•

9 years ago

(In reply to Mike Conley (:mconley) - Needinfo me! from comment #39) > I'm afraid that theory was bunk. Exploring other scenarios now. I tried [1] to clear |_data| also in the early return [2] and I can't see the functions from content-sessionStore.js, but CCAnalyzer processed more entries (>100000). [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=8709210e62c0&selectedJob=16035233 [2] https://dxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/content/content-sessionStore.js#754

Comment 41

•

9 years ago

(In reply to Ting-Yu Chou [:ting] from comment #36) > The reason why attachment 8694367 [details] causes leak easily was explained > in bug 1220517 comment 53, which it creates a lot more CC logs. From Try [1] > [2], the number of processing count of CCAnalyzer without and with the patch: > > Without: 03:01:17 INFO - CCAnalyzer.processingCount=28998 > With: 07:15:57 INFO - CCAnalyzer.processingCount=98373 > > I downloaded the cc/gc edge logs [3] from both Try runs. The only thing I > can tell now is there're unknown edges to keep some DOMEventTargetHelper > alive, which point to the functions in content-sessionStore.js when the > patch is applied. For example: > > ting@ting-pc:~/w/fx/tools/heapgraph/cc$ python find_roots.py > ~/Desktop/cc-edges.2168.log 133C1C20 > Parsing /home/ting/Desktop/cc-edges.2168.log. Done loading graph. > Reversing graph. Done. > > 19B73200 [DOMEventTargetHelper ] > --[mAnonymousGlobalScopes[i]]--> 266E26E0 [JS Object (Block)] > --[enclosing_environment]--> 266E2680 [JS Object > (NonSyntacticVariablesObject)] > --[MessageQueue]--> 133C1C20 [JS Object (Object)] > > Root 19B73200 is a ref counted object with 1 unknown edge(s). > known edges: > 1EE80E50 [XPCWrappedNative (ContentFrameMessageManager)] > --[mIdentity]--> 19B73200 > > [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa52faa7074a > [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=487f96a26495 > [3] https://goo.gl/lhaB08 Andrew, do you think you could look at these logs and try to figure out what's going on?

Flags: needinfo?(continuation)

Comment 42

•

9 years ago

mconley looked at them a little bit with some of my guidance. I'm not sure if he came to any conclusions. I'll try to look later in the week.

Assignee

Comment 43

•

9 years ago

I have been poking at this without too much success, though I think I've made some maybe interesting discoveries. I didn't get too far examining the GC / CC logs, primarily because I wasn't exactly sure what I was looking for, patterns or otherwise. What I ended up doing was breaking up my patch into 5 atomic chunks to see which one would cause the orange to rear its head. After breaking up the chunks, I landed them on try, one on top of the other, and here's what I got: Chunk 1: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bd0c471ec83e Chunk 2: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1c170a77644 CULPRIT Chunk 3: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0f9c2f1ec43 Chunk 4: https://treeherder.mozilla.org/#/jobs?repo=try&revision=54e9ee9cbf6f Chunk 5: https://treeherder.mozilla.org/#/jobs?repo=try&revision=54e9ee9cbf6f So the second chunk, when overlaid on top of the first, seems to introduce the issue. I thought that was useful. Then, I started to break Chunk 2 down into smaller bits, again overlaying them on top of one another to see which chunk causes the orange to appear. Remove _latestMessageID: https://treeherder.mozilla.org/#/jobs?repo=try&revision=61c5c0b25240 Make setSyncHandler a no-op: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1dc6b98cdfd6 Don’t respond to setupSyncHandler message: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba95840be976 Don’t listen for setupSyncHandler on closed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=67bc80826d4d Don’t listen for setupSyncHandler at all: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f129ec2ca7f6 This one is interesting, because making setSyncHandler a no-op seems to make the orange appear, but _not responding_ to the SessionStore:setupSyncHandler message seems to make the orange _much_ more frequent, though I'm not certain why. So I feel like I'm narrowing in on it, but I honestly have _no idea_ how these little chunks might be introducing a need for so much cycle collection.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 44

•

9 years ago

The test browser/base/content/test/general/browser_syncui.js can reproduce the issue (comment 36) consistently.

Comment 45

•

9 years ago

https://khuey.pastebin.mozilla.org/8858809

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f84ea86934f9

Assignee

Comment 46

•

9 years ago

Assignee

Comment 47

•

9 years ago

Doing a try build with MOZ_CC_ALL_TRACES=1 and MOZ_CC_LOG_ALL=1 resulted in green tests. :/ https://treeherder.mozilla.org/#/jobs?repo=try&revision=a1d8688865fd&selectedJob=16437082

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6906bb9f296d

Assignee

Comment 48

•

9 years ago

Comment 49

•

9 years ago

I looked at this a little bit. The DETH in comment 36 is whatever is implementing ContentFrameMessageManager. You can tell because an XPCWN (ContentFrameMessageManager) has an mIdentity field that points to it, so that's the underlying native for the XPCWN. I think. I was looking at a different set of objects, JS Functions for SpecialPowersAPI.prototype.* methods, but they were also being held alive by ContentFrameMessageManager, through mAnonymousGlobalScopes. I'm not sure why all of this JS is being entrained there.

Flags: needinfo?(continuation)

Assignee

Comment 50

•

9 years ago

So one thing I've noticed looking at some of my earlier try runs, is that there are a lot more occurences of Promises throwing with messages like "A coding exception was thrown in a Promise resolution callback." in my green runs than there are in my orange runs. I found some of the tests that were triggering those Promise exceptions, and disabled them ham-fistedly to see if that'd make any kind of impact. Here's a try build with just one of the tests disabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=868c894fd7a4 I might be imagining it, but it looks like things have gotten slightly better. I wonder if somehow the mysterious anonymous scopes are these in-flight Promises that haven't yet resolved or rejected. I've disabled more Promise-throwing tests in this push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1536ac8ea9f4

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9e3038110c01

Assignee

Comment 51

•

9 years ago

Assignee

Comment 52

•

9 years ago

Forcing a CC / GC between tests doesn't appear to help either: https://treeherder.mozilla.org/#/jobs?repo=try&revision=25f217365060

Comment 53

•

9 years ago

From ting's log in comment 36, without the patch, there are 52 content script scopes (I think that is what mAnonymousGlobalScopes is). With the patch, there are 87. It looks like they are never cleared until the message manager goes away.

Comment 54

•

9 years ago

I'm trying to write a patch that will let us see what those content scripts are.

Assignee

Comment 55

•

9 years ago

Hey ting, I've been trying to reproduce this locally, and you mentioned in comment 44 that browser_syncui.js would do the trick, but I'm not seeing the nsBaseAppShell leak on my Windows machine in a debug build when running that test. Or did you mean that when we run that test, we seem to increase the amount of cycle collection that we do?

Flags: needinfo?(janus926)

Comment 56

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #53) > From ting's log in comment 36, without the patch, there are 52 content > script scopes (I think that is what mAnonymousGlobalScopes is). With the > patch, there are 87. It looks like they are never cleared until the message > manager goes away. It is odd to see more anon global scopes, but it is expected that whatever scopes we have, stay there until the end. Might be useful to log in nsMessageManagerScriptExecutor::LoadScriptInternal which all scripts get loaded to anon scopes.

Comment 57

•

9 years ago

I applied the first two patches from comment 43 and ran tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5f6f4f7690d (without patches from this bug) https://treeherder.mozilla.org/#/jobs?repo=try&revision=b55a77320e3a (with patches) A diff of the frame scripts seen by the cycle collector between two bc7 runs: --- /dev/fd/63 2016-02-09 19:24:31.065413771 -0500 +++ /dev/fd/62 2016-02-09 19:24:31.105414148 -0500 @@ -1,10 +1,16 @@ +chrome://browser/content/content.js chrome://browser/content/content-sessionStore.js chrome://browser/content/content-UITour.js +chrome://browser/content/tab-content.js +chrome://global/content/browser-content.js chrome://global/content/manifestMessages.js chrome://mochikit/content/tests/BrowserTestUtils/content-task.js chrome://mochikit/content/tests/BrowserTestUtils/content-utils.js chrome://mochikit/content/tests/SimpleTest/AsyncUtilsContent.js chrome://satchel/content/formSubmitListener.js +chrome://specialpowers/content/MozillaLogger.js +chrome://specialpowers/content/specialpowersAPI.js +chrome://specialpowers/content/specialpowers.js data:,(function mGlobal mListenerManager (The data:,(function URL comes from browser_windowactivation.js, in case you were wondering.) So it looks like we are entraining some fairly significant scripts with the patches applied. Apologies for the dumb question, but why are we seeing this problem in non-e10s mochitests? Everything should be in the same process in that case, right...?

Comment 58

•

9 years ago

(In reply to Mike Conley (:mconley) - Needinfo me! from comment #55) > Or did you mean that when we run that test, we seem to increase the amount > of cycle collection that we do? Yes, this is what I meant.

Flags: needinfo?(janus926)

Comment 59

•

9 years ago

(In reply to Nathan Froyd [:froydnj] from comment #57) > > A diff of the frame scripts seen by the cycle collector between two bc7 runs: Perhaps CC isn't too interesting here, but comment 53. Why we end up getting so much more frame scripts with the patch? And which ones (urls) we have with the patch? > Apologies for the dumb question, but why are we seeing this problem in > non-e10s mochitests? Everything should be in the same process in that case, > right...? Not sure if I understand the question, but... non-e10s uses in-process message managers so that the setup can be as similar as possible to the e10s case.

Comment 60

•

9 years ago

(In reply to Olli Pettay [:smaug] (high review load) from comment #59) > (In reply to Nathan Froyd [:froydnj] from comment #57) > > > > A diff of the frame scripts seen by the cycle collector between two bc7 runs: > Perhaps CC isn't too interesting here, but comment 53. Why we end up getting > so much more frame scripts with the patch? > And which ones (urls) we have with the patch? All the URLs present in that diff are the ones we load; we load them multiple times, so that accounts for the 52 vs. 87 number in comment 53. To be unambiguously clear, we're only loading: chrome://browser/content/content-sessionStore.js chrome://browser/content/content-UITour.js chrome://global/content/manifestMessages.js chrome://mochikit/content/tests/BrowserTestUtils/content-task.js chrome://mochikit/content/tests/BrowserTestUtils/content-utils.js chrome://mochikit/content/tests/SimpleTest/AsyncUtilsContent.js chrome://satchel/content/formSubmitListener.js data:,(function without the patch and: chrome://browser/content/content.js chrome://browser/content/content-sessionStore.js chrome://browser/content/content-UITour.js chrome://browser/content/tab-content.js chrome://global/content/browser-content.js chrome://global/content/manifestMessages.js chrome://mochikit/content/tests/BrowserTestUtils/content-task.js chrome://mochikit/content/tests/BrowserTestUtils/content-utils.js chrome://mochikit/content/tests/SimpleTest/AsyncUtilsContent.js chrome://satchel/content/formSubmitListener.js chrome://specialpowers/content/MozillaLogger.js chrome://specialpowers/content/specialpowersAPI.js chrome://specialpowers/content/specialpowers.js data:,(function with the patch.

Comment 61

•

9 years ago

The ContentFrameMessageManager was marked gray during GC, but without the patch, it will be unmarked to black when FixWeakMappingGrayBitsTracer::trace() is called for the WeakMap |_syncHandlers|: https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.cpp#3791 https://dxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.cpp#3478 which winds up calling JS::UnmarkGrayGCThingRecursively() and unmark everything SyncHandler instances reference to: https://dxr.mozilla.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp#269 This is why there're much less CC participants without the patch. But I don't know should CC check all these things when the patch is applied, it's a lot of them. So it's not there're more anon global scopes, it's just they are marked gray or black. For example, from the log of comment 36, without the patch: cc-edges.2548.log: 18AAA600 [rc=1] DOMEventTargetHelper > 18083900 mAnonymousGlobalScopes[i] > 180B4340 mAnonymousGlobalScopes[i] > 18C0AEA0 mAnonymousGlobalScopes[i] > 18C2C020 mAnonymousGlobalScopes[i] > 18C2CE40 mAnonymousGlobalScopes[i] > 194697E0 mAnonymousGlobalScopes[i] > 1B153C00 mAnonymousGlobalScopes[i] gc-edges.2548.log: 18083860 B mAnonymousGlobalScopes[i] 18083900 G mAnonymousGlobalScopes[i] 18083980 B mAnonymousGlobalScopes[i] 18083F00 B mAnonymousGlobalScopes[i] 180B41C0 B mAnonymousGlobalScopes[i] 180B4340 G mAnonymousGlobalScopes[i] 183D7940 B mAnonymousGlobalScopes[i] 18C0A360 B mAnonymousGlobalScopes[i] 18C0AEA0 G mAnonymousGlobalScopes[i] 18C2C020 G mAnonymousGlobalScopes[i] 18C2CE40 G mAnonymousGlobalScopes[i] 194697E0 G mAnonymousGlobalScopes[i] 1B153C00 G mAnonymousGlobalScopes[i]

Comment 62

•

9 years ago

That's very interesting! Olli, is there something that would be holding alive these message managers we could check, and then mark these mAnonymousGlobalScopes black?

Flags: needinfo?(bugs)

Comment 63

•

9 years ago

We do call nsInProcessTabChildGlobal::MarkForCC() normally and that should optimize out mAnonymousGlobalScopes from CC graph. Are we possibly leaking a nsInProcessTabChildGlobal temporarily? If anyone has STR (not tryserver run) to get mAnonymousGlobalScopes showing up in the graph, I'd be happy to look at it. But I'll upload a patch which _might_ help here anyhow.

Flags: needinfo?(bugs)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=439ef7a7aea4

Assignee

Comment 64

•

9 years ago

Comment 65

•

9 years ago

(In reply to Olli Pettay [:smaug] (high review load) from comment #63) > But I'll upload a patch which _might_ help here anyhow. I managed to upload it to wrong bug bug 1220517#c78

Comment 66

•

9 years ago

(In reply to Olli Pettay [:smaug] (high review load) from comment #63) > Are we possibly leaking a nsInProcessTabChildGlobal temporarily? I am also wondering. When is the expected timing of destroying nsInProcessTabChildGlobal after calling nsInProcessTabChildGlobal::Disconnect()? There're some survived 3 rounds of GC/CC from CCAnalyzer.

Assignee

Comment 67

•

9 years ago

Just doing some guesswork here - I think this is the lifetime of an nsInProcessTabChildGlobal - Creation: https://dxr.mozilla.org/mozilla-central/rev/6ea654cad929c9bedd8a4161a182b6189fbeae6a/dom/base/nsFrameLoader.cpp#2607 When the FrameLoader is destroyed, and once all messages from the message manager have been dispatched / serviced, we enter here: https://dxr.mozilla.org/mozilla-central/rev/6ea654cad929c9bedd8a4161a182b6189fbeae6a/dom/base/nsFrameLoader.cpp#1522 Which calls ::Disconnect. Then, _I think_, when CC starts, one of the phases it goes through involves marking message managers for CC: https://dxr.mozilla.org/mozilla-central/rev/6ea654cad929c9bedd8a4161a182b6189fbeae6a/dom/base/nsCCUncollectableMarker.cpp#447 It might be worth logging how many times we call nsFrameLoader::Destroy, nsInProcessTabChildGlobal::Disconnect, and how many times we call the nsInProcessTabChildGlobal destructor...

Comment 68

•

9 years ago

(In reply to Olli Pettay [:smaug] (high review load) from comment #63) > If anyone has STR (not tryserver run) to get mAnonymousGlobalScopes showing > up in the graph, I'd be happy to look at it. nsInProcessTabChildGlobal::MarkForCC() does not get called for every instance, I can see it by running: ./mach mochitest browser/base/content/test/general/browser_syncui.js :mconley's patch here is not required. Note some instances (and nsFrameLoader instances) are destructed very late when nsCycleCollector::Shutdown(). I don't know if it is expected.

Comment 69

•

9 years ago

That is not expected. nsInProcessTabChildGlobal::MarkForCC() is called for those objects which are still bound to the frameloader, in other words when the relevant xul:browser is still in DOM tree. If we have nsInProcessTabChildGlobal after that in CC graph, it means some kind of leak. Good thing is that apparently those are released during shutdown so analyzing the leak should be easier. Need to track who is calling Release.

Comment 70

•

9 years ago

Does it matter that nsInProcessTabChildGlobal's CC traverse doesn't call NS_IMPL_CYCLE_COLLECTION_TRAVERSE_SCRIPT_OBJECTS? I don't think adding that in affects mconley's patches, but it might make debugging easier...?

Comment 71

•

9 years ago

DOMEventTargetHelper has NS_IMPL_CYCLE_COLLECTION_TRAVERSE_SCRIPT_OBJECTS

Comment 72

•

9 years ago

(In reply to Olli Pettay [:smaug] (high review load) from comment #69) > nsInProcessTabChildGlobal::MarkForCC() is called for those objects which are > still bound to the frameloader, in other words when the relevant xul:browser > is still in DOM tree. I could clarify this. MarkForCC does not happen during shutdown CC, even if InProcessTabChildGlobal was in DOM tree. Also, at some point CC needs to traverse/unlink InProcessTabChildGlobal, so it is expected to see those objects in CC graph after closing a tab. But after that they should get deleted. Is there a testcase showing that after closing a window or a tab an instance of InProcessTabChildGlobal stays in the CC graph?

Comment 73

•

9 years ago

And if there is, a way to debug that is to capture stack traces for each nsInProcessTabChildGlobal::Release calls. Some of them probably shows what is keeping the object alive too long.

Assignee

Comment 74

•

9 years ago

(In reply to Olli Pettay [:smaug] (HIGH REVIEW LOAD) from comment #72) > (In reply to Olli Pettay [:smaug] (high review load) from comment #69) > > nsInProcessTabChildGlobal::MarkForCC() is called for those objects which are > > still bound to the frameloader, in other words when the relevant xul:browser > > is still in DOM tree. > I could clarify this. MarkForCC does not happen during shutdown CC, even if > InProcessTabChildGlobal was in DOM tree. > Also, at some point CC needs to traverse/unlink InProcessTabChildGlobal, so > it is expected to see those objects > in CC graph after closing a tab. But after that they should get deleted. > > > Is there a testcase showing that after closing a window or a tab an instance > of InProcessTabChildGlobal stays in the CC graph? I believe so - according to ting, that's browser/base/content/test/general/browser_syncui.js (with or without my patch).

Comment 75

•

9 years ago

How do I run that so that browser isn't shutdown automatically, but the relevant tabs have been removed? ./mach mochitest browser/base/content/test/general/browser_syncui.js just runs the test and closes the browser.

Comment 76

•

9 years ago

Ok, #0 0x00007f5e3339a6a0 in nsInProcessTabChildGlobal::Release() (this=0x7f5e062954f0) at /home/smaug/mozilla/hg/inbound2/dom/base/nsInProcessTabChildGlobal.cpp:177 #1 0x00007f5e3339a6f9 in non-virtual thunk to nsInProcessTabChildGlobal::Release() () at /home/smaug/mozilla/hg/inbound2/dom/base/nsInProcessTabChildGlobal.cpp:175 #2 0x00007f5e3337de6d in nsCOMPtr<nsIInProcessContentFrameMessageManager>::assign_assuming_AddRef(nsIInProcessContentFrameMessageManager*) (this=0x7f5e061eea08, aNewPtr=0x0) at /home/smaug/mozilla/hg/inbound2/ff_build/dist/include/nsCOMPtr.h:380 #3 0x00007f5e3337dd43 in nsCOMPtr<nsIInProcessContentFrameMessageManager>::assign_with_AddRef(nsISupports*) (this=0x7f5e061eea08, aRawPtr=0x0) at /home/smaug/mozilla/hg/inbound2/ff_build/dist/include/nsCOMPtr.h:1066 #4 0x00007f5e333529ef in nsCOMPtr<nsIInProcessContentFrameMessageManager>::operator=(nsIInProcessContentFrameMessageManager*) (this=0x7f5e061eea08, aRhs=0x0) at /home/smaug/mozilla/hg/inbound2/ff_build/dist/include/nsCOMPtr.h:578 #5 0x00007f5e33350a79 in ImplCycleCollectionUnlink<nsIInProcessContentFrameMessageManager>(nsCOMPtr<nsIInProcessContentFrameMessageManager>&) (aField=...) at /home/smaug/mozilla/hg/inbound2/ff_build/dist/include/nsCOMPtr.h:1045 #6 0x00007f5e33324a3d in nsFrameLoader::cycleCollection::Unlink(void*) (this=0x7f5e3b0034e8 <nsFrameLoader::_cycleCollectorGlobal>, p=0x7f5e061ee9d0) at /home/smaug/mozilla/hg/inbound2/dom/base/nsFrameLoader.cpp:131 looks like a problematic release call. Why do we have frameloader pointing to the tabchildglobal still so late. My guess is currently that testing framework, or test does something unexpected so that the MarkForCC optimization can't kick in.

Updated

•

9 years ago

Depends on: 1249226

Comment 77

•

9 years ago

I've spotted two places leak nsFrameLoader, not sure yet are they all the leaks. Separated this to bug 1249226.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=55f4f86d8dd3

Assignee

Comment 78

•

9 years ago

Assignee

Comment 79

•

9 years ago

SUCCESS! Along with the patches that smaug landed on inbound earlier today for bug 1249439 and bug 1249451, as well as this patch he supplied over IRC: https://hg.mozilla.org/try/rev/43ea3c4fdb9d We've got ourselves a green try build! https://treeherder.mozilla.org/#/jobs?repo=try&revision=73570b27131d Huzzah! Wow! Thanks so much everybody! I'm doing a second try build[1] without the patch from smaug just to see if the other patches that smaug landed today ended up fixing things on their own. Hey smaug - assuming that the last patch (the one that clears the mMessageManager and mChildMessageManager) is helping to clear the orange here, is that patch for another bug, or should we make it part of this one? [1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=db90213e0e76

Assignee

Comment 80

•

9 years ago

ni? for comment 79.

Flags: needinfo?(bugs)

Comment 81

•

9 years ago

I think it would make sense to land that nsFrameLoader patch in a separate bug. It tends to simplify regression tracking when landing somewhat unrelated patches in separate bugs.

Flags: needinfo?(bugs)

Updated

•

9 years ago

Depends on: 1249795

Assignee

Comment 82

•

9 years ago

Looking at the trybuild without bug 1249795, smaug's patch seems, at least by eyeballing, to make the orange far less likely. So I guess I'll wait for that to land before (finally) landing this thing.

Comment 83

•

9 years ago

if that is the case, we should definitely fix also bug 1249226. Not for this bug, but to just not leak frameloaders by keeping them alive in JS. It is so easy to leak in JS :/

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ddb5861e2da4

Assignee

Comment 84

•

9 years ago

Assignee

Comment 85

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/7eb1382d6cef49a27f7ced970a2b135f0f20f014 Bug 1195295 - Remove SessionStore's SyncHandler since all tab and window flushing is now async. r=ttaubert

https://hg.mozilla.org/mozilla-central/rev/7eb1382d6cef

Updated

•

9 years ago

No longer depends on: 1249226

Phil Ringnalda (:philor)

Comment 86

•

9 years ago

bugherder

Status: REOPENED → RESOLVED

Closed: 9 years ago → 9 years ago

status-firefox47: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 47

Carsten Book [:Tomcat]

Comment 87

•

9 years ago

is there anything need to be done for 45/46 ?

Flags: needinfo?(sledru)

Flags: needinfo?(mconley)

Assignee

Comment 88

•

9 years ago

Nothing needs to be done for 45. For 46, uplifting is contingent on the dependencies for the frameloader / browser leaks also being uplifted, which we might want for the leaks they address. IMO, we might want to uplift the leak fixes just because they're leak fixes. mccr8, as someone who deals with leaks pretty regularly, do you think it's worth uplifting those leak fixes to aurora? Or should we just let them ride the trains as per usual? The actual removal of the SyncHandler is not something that I think warrants uplift.

Flags: needinfo?(mconley) → needinfo?(continuation)

Comment 89

•

9 years ago

(In reply to Mike Conley (:mconley) - Needinfo me! from comment #88) > mccr8, as someone who deals with leaks pretty regularly, do you think it's > worth uplifting those leak fixes to aurora? Or should we just let them ride > the trains as per usual? These don't seem like very severe leaks, if they are only showing up in our testing, and only in a few tests. I don't entirely understand the patches, though, so the patch authors may disagree.

Flags: needinfo?(continuation)

Sylvestre Ledru [:Sylvestre]

Comment 90

•

9 years ago

bug 1249795 is a bit risky, or not risk-free at least, so give it at least couple of days testing time on Nightly. bug 1249439 should be quite safe.

Comment 91

•

9 years ago

"Nothing needs to be done for 45" I like when people are talking to me like that :p

status-firefox45: affected → unaffected

Flags: needinfo?(sledru)

Assignee

Comment 92

•

9 years ago

(In reply to Sylvestre Ledru [:sylvestre] from comment #91) > "Nothing needs to be done for 45" I like when people are talking to me like > that :p :) Note however that we _do_ still have that sync message still being sent in 45 and 46. I just don't think we want to uplift given the risk of the patches that this one relies on in order to cause oranges on CI.

status-firefox45: unaffected → wontfix

status-firefox46: affected → wontfix