Closed
Bug 639262
Opened 13 years ago
Closed 10 years ago
pinwheeling after resume due to GC/CC
Categories
(Firefox :: General, defect)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: dietrich, Unassigned)
References
Details
(Keywords: hang, perf, Whiteboard: [snappy:p2])
Attachments
(3 files)
I started using a Mac more often recently, and experience this routinely: 1. use browser 2. close lid 3. open lid 4. try to use browser Expected: Can interact with browser. Actual: Browser is non-responsive, pinwheel shows up, for up to 20 seconds. Then browser is usable again. Other applications are responsive during this period of time.
Comment 1•13 years ago
|
||
Can you grab a profile while this is happening? I don't think I see this on my MBP...
Reporter | ||
Comment 2•13 years ago
|
||
This is a Spin Control report from a Macbook Air 2.13ghz/4gb/10.6.6. I can reproduce consistently after a suspend/resume cycle of significant time. Going to sleep and immediately resuming doesn't trigger it. https://people.mozilla.com/~dietrich/hang-on-resume.txt
Comment 3•13 years ago
|
||
Pretty sure this is caused by F1. I have the same thing happening, and it goes away when I disable it. (Adding in some F1 people on CC)
Comment 4•13 years ago
|
||
:dietrich, are you using f1?
Reporter | ||
Comment 5•13 years ago
|
||
I did have it installed and enabled. I disabled it yesterday, while trying to narrow down a memory leak. But I have not had a sleep/resume cycle since, so can't say if it's related yet.
Comment 6•13 years ago
|
||
Might have been a false alarm as far as F1 goes, sorry about that. I disabled it, and the hang went away, but after waking up my computer today without F1 enabled, I got the same hang. Attached is a sample of the process. I assume we should do this sample with a build with debug symbols for it to be any use? Let me know what I can do to help track this down. (the sample was 8MB, so had to create a zip file of it)
Reporter | ||
Comment 7•13 years ago
|
||
FWIW, I've now resumed from sleep a couple of times. No hang with F1 disabled. However, I disabled a bunch of other add-ons as well at the time that I disabled F1.
Comment 8•13 years ago
|
||
For F1 we're using bug 642629 to track down the problem and believe to have it fixed in bug 646598. A hotfix of the add-on code should be pushed out fairly soon. If it is the F1 issue you'll need to restart your browser at some point to clear out the hang but it will likely come back until the new add-on is pushed out.
Comment 9•13 years ago
|
||
We've got a new version of the add-on out with the fix in, if you can update to that and give your Firefox a restart we can start to rule out F1 as the cause.
Reporter | ||
Comment 10•13 years ago
|
||
I was definitely getting beachballs without F1 enabled over the last week. Not as much though, so it feels like it was contributing to the problem. I've just now updated to the latest F1 and have enabled it.
Reporter | ||
Comment 12•13 years ago
|
||
yes! i need to disable all add-ons, try to narrow it down.
Reporter | ||
Updated•13 years ago
|
Whiteboard: [tsnap] → [snappy]
Updated•13 years ago
|
Whiteboard: [snappy] → [snappy:p2]
Reporter | ||
Comment 13•13 years ago
|
||
i see the files for session restore, localStorage and cookies all written-to while the app is frozen. this bug could be a combination of many parts of the app all handling the firing of timers/events after resuming the OS, each doing synchronous work or disk IO. or it could be mostly one of the features listed above. i'll continue to investigate. cookies writes many times during the frozen period, but supposedly writes its file async and shouldn't block while writing. it could be doing too much synchronous work prior to writing, like some of the session restore bugs we've found.
Reporter | ||
Comment 14•12 years ago
|
||
Reporter | ||
Comment 15•12 years ago
|
||
looks like it could very well be bug 722646. basically, we come back from resume, and the idle service considers that time "idle" since no keyboard/mouse interaction, and then Places immediately does a crap-ton of synchronous I/O before you have time to even hit a key.
Depends on: 722646
Comment 16•12 years ago
|
||
(In reply to Dietrich Ayala (:dietrich) from comment #15) > then Places immediately > does a crap-ton of synchronous I/O before you have time to even hit a key. Just to clarify all the Places work is asynchronous on idle, though it assumes user is idle. if you are not it's likely something tries to use a synchronous api and contends the mutex with the async work.
Reporter | ||
Comment 17•12 years ago
|
||
I installed memchaser recently, so am always seeing GC/CC times in the add-on bar (which is a scary way to live). https://addons.mozilla.org/en-US/firefox/addon/memchaser/ This morning, I opened up the laptop, got the long pinwheel again, and noticed that GC was ~20 seconds.
Comment 18•12 years ago
|
||
Dietrich has about 400 unrestored tabs, which is probably unique. Maybe our scheduling is failing.
Assignee: nobody → continuation
Comment 19•12 years ago
|
||
Bill, Olli, can you think of any reason why resuming with a ton of unrestored tabs would cause hideous GC and CC times? (Resuming meaning resuming the whole computer from sleep, the browser is just staying there).
Comment 20•12 years ago
|
||
Dietrich, what kinds of CC times have you seen after resume? Is it possible that because of idle service announcing idleness we try to do lots of I/O and while doing that we also GC/CC and that all uses lots of memory and we end up tons of page faults? Does about:memory show page faults on OSX?
Reporter | ||
Comment 21•12 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #20) > Dietrich, what kinds of CC times have you seen after resume? I've seen up to 11 seconds. The only screenshot I have is one that's 1.5 seconds: http://cl.ly/2m1r1S36023s0V3j3I0r > Is it possible that because of idle service announcing idleness we try to do > lots of I/O and while > doing that we also GC/CC and that all uses lots of memory and we end up tons > of page faults? > Does about:memory show page faults on OSX? Current value is below, but I've just restarted the browser. 288 ── page-faults-hard 913,319 ── page-faults-soft What would be a lot? Should I look after a resume that pinwheels?
Comment 22•12 years ago
|
||
Hmm. If I added stuff about the size of the CC graph to the error console, then we could see if there is a huge amount of stuff in the graph (eg resuming somehow causes us to look at a lot more stuff in the CC), or if the CC is just super slow. If it is the latter, then maybe we could delay the CC by an extra 10 seconds like we do on startup.
Comment 23•12 years ago
|
||
mitchell is seeing this too. She's seated next to me and will show me the next time it happens. Is there anything I should look for that would tell me if it was the same problem Dietrich is seeing?
Comment 24•12 years ago
|
||
If you go to about:config and set javascript.options.mem.log to true, then you can look in the error console and see GC and CC times. Alternatively, you can install the MemChaser addon ( https://addons.mozilla.org/en-US/firefox/addon/memchaser/ ) and that will show the most recent GC and CC times.
Comment 25•12 years ago
|
||
For either option, that has to be done before you resume, it can't be done after the fact. A third option would be installing about:telemetry addon ( https://addons.mozilla.org/en-US/firefox/addon/abouttelemetry/ ) and seeing there are some really long GCs or CCs, but that just will show you if you've had any since you last started up, or in the last 24 hours.
Reporter | ||
Comment 26•12 years ago
|
||
incremental GC might have fixed this. i have not seen it since.
Reporter | ||
Comment 27•12 years ago
|
||
Nope, still around. Memchaser has been broken for GC since incremental GC landed, but CC post-resume was 4 seconds this time: http://cl.ly/0V292l0u3q3N0v380X1P
Comment 28•12 years ago
|
||
Okay, I'll try to recreate it.
Summary: pinwheeling after resume → pinwheeling after resume due to GC/CC
Comment 29•12 years ago
|
||
Note, incremental GC gets disabled usually, see Bug 728686. But ugh, that is a horrible CC time. Could you perhaps try about:jank addon. Check what the values about before suspending, and then after resume check again.
Comment 30•12 years ago
|
||
Also, as I said before, perhaps page faults cause it. Running CC/GC needs to bring all sorts of objects to memory. So, if you can reproduce this easily, check about:memory before suspending and after resume.
Reporter | ||
Comment 31•12 years ago
|
||
Hard to check before/after easily, since I can't reproduce on-demand and don't really think about this every time i close the lid of my laptop... would quite likely remember *just* after ;) FWIW, here are screenshots of after: about:jank: http://cl.ly/2c3T16420O2L270X191O about:memory 1: http://cl.ly/2L3m3U2s1S0z2f3F3A40 about:memory 2: http://cl.ly/1z1c3F341x2K3W1j382Y
Reporter | ||
Comment 32•12 years ago
|
||
Memchaser makes tracking memory a bit easier, since the pre-resume value is visible: resident before resume: 1127mb resident immediate after resume: 262mb resident 10 seconds after resume: 944mb 2038ms CC, and unknown GC because incremental GC broke Memchaser.
Reporter | ||
Comment 33•12 years ago
|
||
Same pattern today for resident: 1585mb -> pinwheeling -> 350mb -> a few seconds -> 730mb and can use normally thereafter.
Reporter | ||
Comment 34•12 years ago
|
||
This one was serious business: http://cl.ly/3M3X1F1D0W322p3p3O2r
Comment 35•12 years ago
|
||
I wonder what we could do to this. If lots of tabs are open, lots of memory is used. Then, suspending/resuming may cause data to be moved from memory to disk and back to memory. (That is what I guess is happening.) If I/O is slow, moving data back to memory can take time. So, we should reduce memory usage. But we could also try to postpone GC and CC after resuming. That way more data is hopefully in memory when GC and CC actually run.
Comment 36•12 years ago
|
||
Dietrich, if you have a recent (in the last week or so) could you try setting javascript.options.mem.log to true in about:config? Then when this happens go to the error console and copy the line about the CC here. It will look something like this: CC(T+184517.8) duration: 56ms, suspected: 153, visited: 7462 RCed and 5548 GCed, collected: 52 RCed and 2 GCed (54 waiting for GC) ForgetSkippable 4 times before CC, min: 0 ms, max: 13 ms, avg: 3 ms, total: 15 ms, removed: 370 If the visited part is not unusually large (also copy in the information about the CC after it gets back to a normal length), that would suggest that scanning each thing is long (like if it is due to paging), as opposed to their being some weird leak. If you also copy in the information about long GCs that will be in the error console that could be helpful, too.
Comment 37•12 years ago
|
||
If you need more data, I have the exact same issue.
Comment 38•12 years ago
|
||
Kadir, how many tabs do you usually have open? What version of Firefox are you running? Thanks.
Comment 39•12 years ago
|
||
My computer just came up from a fairly long suspend, after I forgot to turn on memchaser logging, so here's some about:telemetry. One thing of note is that I've got an SSD--before I got it I had resume times in the *minutes*. I'm not 100% certain that this is Firefox's fault, I've been having this issue for a pretty long time and I vaguely recall thinking that this *wasn't* Fx's fault because once I resumed and it happened and Firefox wasn't even running. But I know Gnome used to use gecko all through all kinds of things and so even if Firefox wasn't the culprit it may have been some embedded XUL somewhere. (This has been happening for at least 12 months.) At the very least I think I can confirm that something similar happens on Linux.
Attachment #603584 -
Attachment mime type: text/plain → text/html
Reporter | ||
Comment 40•12 years ago
|
||
Andrew, here are the GC/CC entries from the console after a long post-resume hang. Also, these are sent to the console every 3 - 15 seconds, with GCs often around 500ms. CC(T+27710.4) duration: 1745ms, suspected: 6083, visited: 17338 RCed and 5848 GCed, collected: 5670 RCed and 0 GCed (5670 waiting for GC) ForgetSkippable 14 times before CC, min: 4 ms, max: 547 ms, avg: 83 ms, total: 1163 ms, removed: 12571 GC(T+27718.6) TotalTime: 7820.2ms, Type: global, MMU(20ms): 0%, MMU(50ms): 0%, Reason: PAGE_HIDE, NonIncrementalReason: incremental permanently disabled, +chunks: 0, -chunks: 0 mark: 6992.0, mark-roots: 309.2, mark-other: 352.1, sweep: 735.1, sweep-obj: 64.2, sweep-string: 10.3, sweep-script: 2.6, sweep-shape: 69.0, discard-code: 28.1, discard-analysis: 134.4, xpconnect: 53.4, deallocate: 3.1
Reporter | ||
Comment 41•12 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #35) > I wonder what we could do to this. If lots of tabs are open, lots of memory > is used. > Then, suspending/resuming may cause data to be moved from memory to disk and > back to memory. What data, exactly? Is there a way to tell? A very large transfer of data out of and back into memory sounds exactly like the pattern I see when I experience this problem: pre-resume at 1gb resident, then after resuming and hanging, resident is <200mb, and boom a few seconds later it's back up to ~800mb resident.
Comment 42•12 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #38) > Kadir, how many tabs do you usually have open? What version of Firefox are > you running? Thanks. loaded tabs: ~15 undloaded tabs: ~30 Firefox 12.0a2 (2012-03-13) on Mac OS 10.7
I found a page that shows how we can be notified when a machine resumes: https://developer.apple.com/library/mac/#qa/qa1340/_index.html Maybe when this happens, we should kill any existing GC and CC timers and schedule new ones in 30 seconds, as Olli suggested.
Comment 44•12 years ago
|
||
I'm going to WFM this. Please reopen if you still see this.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 45•12 years ago
|
||
Just had a 33 second GC after resuming.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Updated•12 years ago
|
Keywords: testcase-wanted
Comment 46•11 years ago
|
||
Any better now that bug 765215 is fixed in Firefox 20?
Flags: needinfo?(dietrich)
Updated•11 years ago
|
Comment 46 is private:
false
Reporter | ||
Comment 47•11 years ago
|
||
I've seen it pretty bad in the last few days actually.
Flags: needinfo?(dietrich)
Comment 48•10 years ago
|
||
I'm not really sure what's going on with this one, but leaving it open doesn't seem very useful at this point.
Assignee: continuation → nobody
Status: REOPENED → RESOLVED
Closed: 12 years ago → 10 years ago
Resolution: --- → INCOMPLETE
Updated•9 years ago
|
Keywords: testcase-wanted
You need to log in
before you can comment on or make changes to this bug.
Description
•