Closed
Bug 1261143
Opened 9 years ago
Closed 8 years ago
Ghost windows on web.telegram.org leading to long CC times
Categories
(Core :: Storage: IndexedDB, defect)
Tracking
()
RESOLVED
FIXED
mozilla50
People
(Reporter: wcpan, Assigned: wcpan)
References
(Blocks 1 open bug)
Details
(Whiteboard: btpp-followup-2016-05-06)
Attachments
(4 files, 9 obsolete files)
381.70 KB,
text/plain
|
Details | |
9.83 KB,
text/plain
|
Details | |
1.03 KB,
text/html
|
Details | |
1.01 KB,
patch
|
khuey
:
review+
lizzard
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
If you need concise or verbose CC/GC log I can upload it to somewhere.
Assignee | ||
Comment 1•9 years ago
|
||
Log in browser console while javascript.options.mem.log=true .
Updated•9 years ago
|
Component: General → JavaScript: GC
Product: Firefox → Core
Assignee | ||
Comment 2•9 years ago
|
||
Attachment #8736831 -
Attachment is obsolete: true
Attachment #8736834 -
Attachment is obsolete: true
Assignee | ||
Comment 3•9 years ago
|
||
find_roots result: 0x17d1029c0 [DOMEventTargetHelper ] --[mScriptOwner]--> 0x1234c6920 [JS Object (Window)] --[UnwrapDOMObject(obj)]--> 0x1254d6400 [nsGlobalWindow #37 inner https://web.telegram.org/#/im?p=@YSITD] Root 0x17d1029c0 is a ref counted object with 1 unknown edge(s). known edges: 0x17d853190 [IDBObjectStore] --[mTransaction]--> 0x17d1029c0
Assignee | ||
Comment 4•9 years ago
|
||
Leak statistics when enabled XPCOM_MEM_REFCNT_LOG
Assignee | ||
Comment 5•9 years ago
|
||
Parsed from attachment 8745913 [details]
Assignee | ||
Comment 6•9 years ago
|
||
What I do is: 1. open Firefox with a clean profile 2. set memory.ghost_window_timeout_seconds to 1 3. login to web.telegram.org 4. randomly scrolling chats, clicking links on the chats 5. close the tab 6. open about:memory 7. GC & CC 8. measure memory usage Sometimes it will cause 2 ghost windows, sometimes it doesn't. Just repeat the steps until the ghost windows appear.
Assignee | ||
Comment 7•9 years ago
|
||
Can not reproduce on Linux (Nightly). The above logs were generated on Mac OS (Gecko 46).
Assignee | ||
Comment 8•9 years ago
|
||
Can not reproduce on Nightly and Mac OS too. Maybe this is fixed in m-c?
Updated•9 years ago
|
Summary: Cycle collection takes long time (> 1000ms) → Ghost windows on web.telegram.org leading to long CC times
Updated•9 years ago
|
Blocks: GhostWindows
Component: JavaScript: GC → DOM: IndexedDB
Comment 10•9 years ago
|
||
I'll try to find some more time tomorrow to look at this. (sorry wpan, had to run to a meeting today) If this is FF46 only, finding the fix-range (not regression range this time) would be great, so that we can push the fixes to FF47 (assuming it leaks too) and to FF46 (leaks are bad, so even release builds should get those fixes asap, if possible, IMO)
Flags: needinfo?(wpan)
Assignee | ||
Comment 12•9 years ago
|
||
Does not leak on Linux with 46, maybe a Mac OS only problem. For Mac OS, 47 still got leaks, but only one ghost-window each time, not two. 48 does not seems to leak, need to check if it is an e10s-related issue.
Assignee | ||
Comment 13•9 years ago
|
||
Seems not e10s-related. I guess now I have to bisect then.
Comment 14•9 years ago
|
||
ahaa, I don't have OSX with me now. Andrew, could you perhaps help here? Looks like telegram needs some phonenumber to register to, but a prepaid number was ok too.
Flags: needinfo?(continuation)
Updated•9 years ago
|
Whiteboard: btpp-followup-2016-05-06
Assignee | ||
Comment 15•9 years ago
|
||
Another refcnt allocation log during bisecting.
Assignee | ||
Comment 16•9 years ago
|
||
This log shows a normal, balanced version of attachment 8745914 [details].
This object should be cycle collectable via IDBObjectStore, but I don't know why sometimes it doesn't work.
Assignee | ||
Comment 17•9 years ago
|
||
I guess I've found which object is holding those IDBTransaction objects. They are hold by CycleCollectedJSRuntime. For some reasons IDBTransaction will be created while RecursionDepth() is 0, but CycleCollectedJSRuntime::ProcessMetastableStateQueue never run at aRecursionDepth == 0 until its destruction, so they don't have a chance to leave the queue. Bug 1195755 also related to this I guess. I've tried bholley's testcase in that bug, while it does not leak nsGlobalWindow, it still caused shutdown crash. I'll find another reduced testcase later. The attached log is the leaked object's refcnt.log. Do you have any suggestion, Kyle?
Attachment #8745899 -
Attachment is obsolete: true
Attachment #8745911 -
Attachment is obsolete: true
Attachment #8745913 -
Attachment is obsolete: true
Attachment #8745914 -
Attachment is obsolete: true
Attachment #8748045 -
Attachment is obsolete: true
Attachment #8750164 -
Attachment is obsolete: true
Flags: needinfo?(khuey)
Assignee | ||
Comment 18•9 years ago
|
||
BTW, to reproduce this on web.telegram.org, I've found an easier way: 1. open Firefox with a clean profile 2. set memory.ghost_window_timeout_seconds to 1 3. login to web.telegram.org 4. find some groups that have unread messages 5. keep clicking those groups 6. close the tab 7. open about:memory 8. GC & CC 9. measure memory usage
Can you get the stack for where we create the IDBTransaction with RecursionDepth() == 0?
Flags: needinfo?(khuey) → needinfo?(wpan)
Assignee | ||
Comment 20•9 years ago
|
||
I disabled the ifndef and let it crashes at MOZ_ASSERT (the code is based on the most recent mozilla-beta).
Flags: needinfo?(wpan)
Comment 21•9 years ago
|
||
It looks like wcpan and khuey are investigating this.
Flags: needinfo?(continuation)
Yuck. We should probably do a microtask checkpoint (i.e. call http://mxr.mozilla.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp#1368 after processing Cocoa events). Unfortunately I don't know where that is ... probably somewhere in the Mac appshell implementation. I *think* on other platforms this doesn't happen (i.e. we call into the native event loop from the XPCOM event loop, not the other way around) but we should verify that.
Assignee | ||
Comment 23•9 years ago
|
||
This is a reduced testcase. It looks weird because the original site uses jQuery/AngularJS, that has their own event/promise implementation. This is the closest code path that I can come up with.
Assignee | ||
Updated•9 years ago
|
OS: Unspecified → Mac OS X
Assignee | ||
Comment 24•9 years ago
|
||
I should correct my previous inspection, obviously this will affect on all branches, in non-e10s mode. In e10s mode, content processes will spin another event loop so they won't cause this problem. (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #22) > Yuck. We should probably do a microtask checkpoint (i.e. call > http://mxr.mozilla.org/mozilla-central/source/xpcom/base/ > CycleCollectedJSRuntime.cpp#1368 after processing Cocoa events). > Unfortunately I don't know where that is ... probably somewhere in the Mac > appshell implementation. > > I *think* on other platforms this doesn't happen (i.e. we call into the > native event loop from the XPCOM event loop, not the other way around) but > we should verify that. From my inspection, both CycleCollectedJSRuntime::RunInMetastableState and CycleCollectedJSRuntime::ProcessMetastableStateQueue are called from the XPCOM event loop on Linux and Windows. So we have several options: 1. Call CycleCollectedJSRuntime::AfterProcessTask in somewhere. I have tried this in several places (in nsAppShell::ProcessGeckoEvents ... etc.), but I always got it crashes. Maybe I tried wrong places. 2. Tweak the event loop to make it behaves like other platforms. (i.e. always call from the XPCOM event loop) I personally prefer this option, but I guess this will introduce some side effects, just like option 1. 3. Hack the recursion count Set it to mRecursionDepth + 1 if it is less then mRecursionDepth. Seems passed all tests. I think this is easier to back-port to earlier branches. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e422d5bb09fa
I think we should do #3 for the moment. If you were waiting on a response from me before proceeding, you should have used needinfo.
Assignee | ||
Comment 26•8 years ago
|
||
OK, I'll remember that. I've tested this patch in my daily use, didn't see any ghost window again. Also passed tests on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=320f10229392 I'll send a review request after you are back.
Assignee | ||
Comment 27•8 years ago
|
||
Attachment #8762838 -
Flags: review?(khuey)
Comment 28•8 years ago
|
||
Comment on attachment 8762838 [details] [diff] [review] Bug-1261143-Hack-recursion-depth-in-metastable-state.patch Review of attachment 8762838 [details] [diff] [review]: ----------------------------------------------------------------- Could we include a comment describing why we need to do this? The nested event loop situation on Mac is kind of obscure and it's not immddiately clear what symptom this code is addressing.
Is it possible to increase the recursion depth in the cocoa appshell? I would prefer that to papering over it here ...
Flags: needinfo?(wpan)
We talked in person, and the answer is no.
Flags: needinfo?(wpan)
Comment on attachment 8762838 [details] [diff] [review] Bug-1261143-Hack-recursion-depth-in-metastable-state.patch Review of attachment 8762838 [details] [diff] [review]: ----------------------------------------------------------------- Please add a comment explaining this.
Attachment #8762838 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 32•8 years ago
|
||
Add comment that points to this bug.
Attachment #8762838 -
Attachment is obsolete: true
Attachment #8763774 -
Flags: review?(khuey)
Attachment #8763774 -
Flags: review?(khuey) → review+
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 33•8 years ago
|
||
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e2b3bf3ebafe Hack recursion depth in metastable state for Mac OS. r=khuey
Keywords: checkin-needed
Comment 34•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e2b3bf3ebafe
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Updated•8 years ago
|
Assignee: nobody → wpan
I would be ok with uplifting it to Aurora.
Flags: needinfo?(khuey)
Comment 37•8 years ago
|
||
Wei-Chen, do you mind getting this uplifted to aurora 49? We want to uplift since it leads to more memory leaks on mac.
status-firefox47:
--- → wontfix
status-firefox48:
--- → wontfix
status-firefox49:
--- → affected
tracking-firefox49:
--- → ?
Flags: needinfo?(wpan)
Assignee | ||
Comment 38•8 years ago
|
||
Comment on attachment 8763774 [details] [diff] [review] Bug-1261143-Hack-recursion-depth-in-metastable-state.patch Approval Request Comment [Feature/regressing bug #]: bug 1264566 [User impact if declined]: Mac users may suffer from memory leaks on certain sites and lead to bad experience. [Describe test coverage new/current, TreeHerder]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=320f10229392 [Risks and why]: It does not break IndexedDB or cause crash, I think it's safe to uplift. [String/UUID change made/needed]: None.
Flags: needinfo?(wpan)
Attachment #8763774 -
Flags: approval-mozilla-aurora?
Assignee | ||
Updated•8 years ago
|
Attachment #8763774 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 39•8 years ago
|
||
Comment on attachment 8763774 [details] [diff] [review] Bug-1261143-Hack-recursion-depth-in-metastable-state.patch Approval Request Comment [Feature/regressing bug #]: bug 1261143 [User impact if declined]: Mac users may suffer from memory leaks on certain sites and lead to bad experience. [Describe test coverage new/current, TreeHerder]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=320f10229392 [Risks and why]: It does not break IndexedDB or cause crash, I think it's safe to uplift. [String/UUID change made/needed]: None.
Attachment #8763774 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 40•8 years ago
|
||
Revised the bug number.
Comment 42•8 years ago
|
||
Comment on attachment 8763774 [details] [diff] [review] Bug-1261143-Hack-recursion-depth-in-metastable-state.patch Regression from 45, let's uplift this memory leak fix.
Attachment #8763774 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 43•8 years ago
|
||
This would be good to verify in aurora, maybe with the test case here and the one in bug 1195755 (or known affected sites)
Flags: needinfo?(cornel.ionce)
Comment 44•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/8902dc8b9ab1
Comment 45•8 years ago
|
||
I tried reproducing this bug on multiple stations using Fx 48.0a1, build ID 20160411030231 and 20160331030231 but could not do so. Can you please verify the fix?
Flags: needinfo?(cornel.ionce)
Assignee | ||
Comment 46•8 years ago
|
||
I can verify this on 48.0b6.
1. open attachment 8751693 [details]
2. click the button
3. close the tab
4. GC & CC in about:memory
5. measure and search bmoattachments, you should see something marked as detached
6. wait until it becomes ghost-window (somehow memory.ghost_window_timeout_seconds does not seems to work in this case)
You need to log in
before you can comment on or make changes to this bug.
Description
•