Closed Bug 639262 Opened 14 years ago Closed 11 years ago

pinwheeling after resume due to GC/CC

Categories

(Firefox :: General, defect)

Product:

Component:

Platform:

x86

macOS

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

()

Status:

RESOLVED INCOMPLETE

People

(Reporter: dietrich, Unassigned)

References

Details

(Keywords: hang, perf, Whiteboard: [snappy:p2])

Attachments

(3 files)

Sample of Firefox at ~90% CPU after resume 14 years ago Alex Limi (:limi) — Firefox UX Team 143.83 KB, application/zip		Details
another sample 14 years ago Dietrich Ayala (:dietrich) 81.26 KB, text/plain		Details
about:telemetry from just after a long restart 13 years ago Brandon W Maister 263.69 KB, text/html		Details

Dietrich Ayala (:dietrich)

Reporter

Description

•

14 years ago

I started using a Mac more often recently, and experience this routinely: 1. use browser 2. close lid 3. open lid 4. try to use browser Expected: Can interact with browser. Actual: Browser is non-responsive, pinwheel shows up, for up to 20 seconds. Then browser is usable again. Other applications are responsive during this period of time.

Dietrich Ayala (:dietrich)

Reporter

Updated

•

14 years ago

Keywords: hang

Whiteboard: [tsnap]

Justin Dolske [:Dolske]

Comment 1

•

14 years ago

Can you grab a profile while this is happening? I don't think I see this on my MBP...

Dietrich Ayala (:dietrich)

Reporter

Comment 2

•

14 years ago

This is a Spin Control report from a Macbook Air 2.13ghz/4gb/10.6.6. I can reproduce consistently after a suspend/resume cycle of significant time. Going to sleep and immediately resuming doesn't trigger it. https://people.mozilla.com/~dietrich/hang-on-resume.txt

Alex Limi (:limi) — Firefox UX Team

Comment 3

•

14 years ago

Pretty sure this is caused by F1. I have the same thing happening, and it goes away when I disable it. (Adding in some F1 people on CC)

David Ascher (:davida)

Comment 4

•

14 years ago

:dietrich, are you using f1?

Dietrich Ayala (:dietrich)

Reporter

Comment 5

•

14 years ago

I did have it installed and enabled. I disabled it yesterday, while trying to narrow down a memory leak. But I have not had a sleep/resume cycle since, so can't say if it's related yet.

Alex Limi (:limi) — Firefox UX Team

Comment 6

•

14 years ago

Attached file Sample of Firefox at ~90% CPU after resume — Details

Might have been a false alarm as far as F1 goes, sorry about that. I disabled it, and the hang went away, but after waking up my computer today without F1 enabled, I got the same hang. Attached is a sample of the process. I assume we should do this sample with a build with debug symbols for it to be any use? Let me know what I can do to help track this down. (the sample was 8MB, so had to create a zip file of it)

Dietrich Ayala (:dietrich)

Reporter

Comment 7

•

14 years ago

FWIW, I've now resumed from sleep a couple of times. No hang with F1 disabled. However, I disabled a bunch of other add-ons as well at the time that I disabled F1.

Bryan Clark (DevTools PM) [:clarkbw]

Comment 8

•

14 years ago

For F1 we're using bug 642629 to track down the problem and believe to have it fixed in bug 646598. A hotfix of the add-on code should be pushed out fairly soon. If it is the F1 issue you'll need to restart your browser at some point to clear out the hang but it will likely come back until the new add-on is pushed out.

Bryan Clark (DevTools PM) [:clarkbw]

Comment 9

•

14 years ago

We've got a new version of the add-on out with the fix in, if you can update to that and give your Firefox a restart we can start to rule out F1 as the cause.

Dietrich Ayala (:dietrich)

Reporter

Comment 10

•

14 years ago

I was definitely getting beachballs without F1 enabled over the last week. Not as much though, so it feels like it was contributing to the problem. I've just now updated to the latest F1 and have enabled it.

Wayne Mery (:wsmwk)

Comment 11

•

14 years ago

still problems deitrich?

Keywords: perf

Dietrich Ayala (:dietrich)

Reporter

Comment 12

•

14 years ago

yes! i need to disable all add-ons, try to narrow it down.

Dietrich Ayala (:dietrich)

Reporter

Updated

•

14 years ago

Whiteboard: [tsnap] → [snappy]

Asa Dotzler [:asa]

Updated

•

14 years ago

Whiteboard: [snappy] → [snappy:p2]

Dietrich Ayala (:dietrich)

Reporter

Comment 13

•

14 years ago

i see the files for session restore, localStorage and cookies all written-to while the app is frozen. this bug could be a combination of many parts of the app all handling the firing of timers/events after resuming the OS, each doing synchronous work or disk IO. or it could be mostly one of the features listed above. i'll continue to investigate. cookies writes many times during the frozen period, but supposedly writes its file async and shouldn't block while writing. it could be doing too much synchronous work prior to writing, like some of the session restore bugs we've found.

Dietrich Ayala (:dietrich)

Reporter

Comment 14

•

14 years ago

Attached file another sample — Details

Dietrich Ayala (:dietrich)

Reporter

Comment 15

•

13 years ago

looks like it could very well be bug 722646. basically, we come back from resume, and the idle service considers that time "idle" since no keyboard/mouse interaction, and then Places immediately does a crap-ton of synchronous I/O before you have time to even hit a key.

Depends on: 722646

Marco Bonardo [:mak]

Comment 16

•

13 years ago

(In reply to Dietrich Ayala (:dietrich) from comment #15) > then Places immediately > does a crap-ton of synchronous I/O before you have time to even hit a key. Just to clarify all the Places work is asynchronous on idle, though it assumes user is idle. if you are not it's likely something tries to use a synchronous api and contends the mutex with the async work.

Dietrich Ayala (:dietrich)

Reporter

Comment 17

•

13 years ago

I installed memchaser recently, so am always seeing GC/CC times in the add-on bar (which is a scary way to live). https://addons.mozilla.org/en-US/firefox/addon/memchaser/ This morning, I opened up the laptop, got the long pinwheel again, and noticed that GC was ~20 seconds.

Andrew McCreight [:mccr8]

Comment 18

•

13 years ago

Dietrich has about 400 unrestored tabs, which is probably unique. Maybe our scheduling is failing.

Assignee: nobody → continuation

Andrew McCreight [:mccr8]

Comment 19

•

13 years ago

Bill, Olli, can you think of any reason why resuming with a ton of unrestored tabs would cause hideous GC and CC times? (Resuming meaning resuming the whole computer from sleep, the browser is just staying there).

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 20

•

13 years ago

Dietrich, what kinds of CC times have you seen after resume? Is it possible that because of idle service announcing idleness we try to do lots of I/O and while doing that we also GC/CC and that all uses lots of memory and we end up tons of page faults? Does about:memory show page faults on OSX?

Dietrich Ayala (:dietrich)

Reporter

Comment 21

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #20) > Dietrich, what kinds of CC times have you seen after resume? I've seen up to 11 seconds. The only screenshot I have is one that's 1.5 seconds: http://cl.ly/2m1r1S36023s0V3j3I0r > Is it possible that because of idle service announcing idleness we try to do > lots of I/O and while > doing that we also GC/CC and that all uses lots of memory and we end up tons > of page faults? > Does about:memory show page faults on OSX? Current value is below, but I've just restarted the browser. 288 ── page-faults-hard 913,319 ── page-faults-soft What would be a lot? Should I look after a resume that pinwheels?

Andrew McCreight [:mccr8]

Comment 22

•

13 years ago

Hmm. If I added stuff about the size of the CC graph to the error console, then we could see if there is a huge amount of stuff in the graph (eg resuming somehow causes us to look at a lot more stuff in the CC), or if the CC is just super slow. If it is the latter, then maybe we could delay the CC by an extra 10 seconds like we do on startup.

Asa Dotzler [:asa]

Comment 23

•

13 years ago

mitchell is seeing this too. She's seated next to me and will show me the next time it happens. Is there anything I should look for that would tell me if it was the same problem Dietrich is seeing?

Andrew McCreight [:mccr8]

Comment 24

•

13 years ago

If you go to about:config and set javascript.options.mem.log to true, then you can look in the error console and see GC and CC times. Alternatively, you can install the MemChaser addon ( https://addons.mozilla.org/en-US/firefox/addon/memchaser/ ) and that will show the most recent GC and CC times.

Andrew McCreight [:mccr8]

Comment 25

•

13 years ago

For either option, that has to be done before you resume, it can't be done after the fact. A third option would be installing about:telemetry addon ( https://addons.mozilla.org/en-US/firefox/addon/abouttelemetry/ ) and seeing there are some really long GCs or CCs, but that just will show you if you've had any since you last started up, or in the last 24 hours.

Dietrich Ayala (:dietrich)

Reporter

Comment 26

•

13 years ago

incremental GC might have fixed this. i have not seen it since.

Dietrich Ayala (:dietrich)

Reporter

Comment 27

•

13 years ago

Nope, still around. Memchaser has been broken for GC since incremental GC landed, but CC post-resume was 4 seconds this time: http://cl.ly/0V292l0u3q3N0v380X1P

Andrew McCreight [:mccr8]

Comment 28

•

13 years ago

Okay, I'll try to recreate it.

Summary: pinwheeling after resume → pinwheeling after resume due to GC/CC

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 29

•

13 years ago

Note, incremental GC gets disabled usually, see Bug 728686. But ugh, that is a horrible CC time. Could you perhaps try about:jank addon. Check what the values about before suspending, and then after resume check again.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 30

•

13 years ago

Also, as I said before, perhaps page faults cause it. Running CC/GC needs to bring all sorts of objects to memory. So, if you can reproduce this easily, check about:memory before suspending and after resume.

Dietrich Ayala (:dietrich)

Reporter

Comment 31

•

13 years ago

Hard to check before/after easily, since I can't reproduce on-demand and don't really think about this every time i close the lid of my laptop... would quite likely remember *just* after ;) FWIW, here are screenshots of after: about:jank: http://cl.ly/2c3T16420O2L270X191O about:memory 1: http://cl.ly/2L3m3U2s1S0z2f3F3A40 about:memory 2: http://cl.ly/1z1c3F341x2K3W1j382Y

Dietrich Ayala (:dietrich)

Reporter

Comment 32

•

13 years ago

Memchaser makes tracking memory a bit easier, since the pre-resume value is visible: resident before resume: 1127mb resident immediate after resume: 262mb resident 10 seconds after resume: 944mb 2038ms CC, and unknown GC because incremental GC broke Memchaser.

Dietrich Ayala (:dietrich)

Reporter

Comment 33

•

13 years ago

Same pattern today for resident: 1585mb -> pinwheeling -> 350mb -> a few seconds -> 730mb and can use normally thereafter.

Dietrich Ayala (:dietrich)

Reporter

Comment 34

•

13 years ago

This one was serious business: http://cl.ly/3M3X1F1D0W322p3p3O2r

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 35

•

13 years ago

I wonder what we could do to this. If lots of tabs are open, lots of memory is used. Then, suspending/resuming may cause data to be moved from memory to disk and back to memory. (That is what I guess is happening.) If I/O is slow, moving data back to memory can take time. So, we should reduce memory usage. But we could also try to postpone GC and CC after resuming. That way more data is hopefully in memory when GC and CC actually run.

Andrew McCreight [:mccr8]

Comment 36

•

13 years ago

Dietrich, if you have a recent (in the last week or so) could you try setting javascript.options.mem.log to true in about:config? Then when this happens go to the error console and copy the line about the CC here. It will look something like this: CC(T+184517.8) duration: 56ms, suspected: 153, visited: 7462 RCed and 5548 GCed, collected: 52 RCed and 2 GCed (54 waiting for GC) ForgetSkippable 4 times before CC, min: 0 ms, max: 13 ms, avg: 3 ms, total: 15 ms, removed: 370 If the visited part is not unusually large (also copy in the information about the CC after it gets back to a normal length), that would suggest that scanning each thing is long (like if it is due to paging), as opposed to their being some weird leak. If you also copy in the information about long GCs that will be in the error console that could be helpful, too.

Kadir Topal [:atopal]

Comment 37

•

13 years ago

If you need more data, I have the exact same issue.

Andrew McCreight [:mccr8]

Comment 38

•

13 years ago

Kadir, how many tabs do you usually have open? What version of Firefox are you running? Thanks.

Brandon W Maister

Comment 39

•

13 years ago

Attached file about:telemetry from just after a long restart — Details

My computer just came up from a fairly long suspend, after I forgot to turn on memchaser logging, so here's some about:telemetry. One thing of note is that I've got an SSD--before I got it I had resume times in the *minutes*. I'm not 100% certain that this is Firefox's fault, I've been having this issue for a pretty long time and I vaguely recall thinking that this *wasn't* Fx's fault because once I resumed and it happened and Firefox wasn't even running. But I know Gnome used to use gecko all through all kinds of things and so even if Firefox wasn't the culprit it may have been some embedded XUL somewhere. (This has been happening for at least 12 months.) At the very least I think I can confirm that something similar happens on Linux.

Updated

•

13 years ago

Attachment #603584 - Attachment mime type: text/plain → text/html

Dietrich Ayala (:dietrich)

Reporter

Comment 40

•

13 years ago

Andrew, here are the GC/CC entries from the console after a long post-resume hang. Also, these are sent to the console every 3 - 15 seconds, with GCs often around 500ms. CC(T+27710.4) duration: 1745ms, suspected: 6083, visited: 17338 RCed and 5848 GCed, collected: 5670 RCed and 0 GCed (5670 waiting for GC) ForgetSkippable 14 times before CC, min: 4 ms, max: 547 ms, avg: 83 ms, total: 1163 ms, removed: 12571 GC(T+27718.6) TotalTime: 7820.2ms, Type: global, MMU(20ms): 0%, MMU(50ms): 0%, Reason: PAGE_HIDE, NonIncrementalReason: incremental permanently disabled, +chunks: 0, -chunks: 0 mark: 6992.0, mark-roots: 309.2, mark-other: 352.1, sweep: 735.1, sweep-obj: 64.2, sweep-string: 10.3, sweep-script: 2.6, sweep-shape: 69.0, discard-code: 28.1, discard-analysis: 134.4, xpconnect: 53.4, deallocate: 3.1

Dietrich Ayala (:dietrich)

Reporter

Comment 41

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #35) > I wonder what we could do to this. If lots of tabs are open, lots of memory > is used. > Then, suspending/resuming may cause data to be moved from memory to disk and > back to memory. What data, exactly? Is there a way to tell? A very large transfer of data out of and back into memory sounds exactly like the pattern I see when I experience this problem: pre-resume at 1gb resident, then after resuming and hanging, resident is <200mb, and boom a few seconds later it's back up to ~800mb resident.

Kadir Topal [:atopal]

Comment 42

•

13 years ago

(In reply to Andrew McCreight [:mccr8] from comment #38) > Kadir, how many tabs do you usually have open? What version of Firefox are > you running? Thanks. loaded tabs: ~15 undloaded tabs: ~30 Firefox 12.0a2 (2012-03-13) on Mac OS 10.7

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 43

•

13 years ago

I found a page that shows how we can be notified when a machine resumes: https://developer.apple.com/library/mac/#qa/qa1340/_index.html Maybe when this happens, we should kill any existing GC and CC timers and schedule new ones in 30 seconds, as Olli suggested.

Andrew McCreight [:mccr8]

Comment 44

•

13 years ago

I'm going to WFM this. Please reopen if you still see this.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → WORKSFORME

Dietrich Ayala (:dietrich)

Reporter

Comment 45

•

13 years ago

Just had a 33 second GC after resuming.

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Andrew McCreight [:mccr8]

Updated

•

13 years ago

Keywords: testcase-wanted

Wayne Mery (:wsmwk)

Comment 46

•

12 years ago

Any better now that bug 765215 is fixed in Firefox 20?

Flags: needinfo?(dietrich)

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Comment 46 is private: false

Dietrich Ayala (:dietrich)

Reporter

Comment 47

•

12 years ago

I've seen it pretty bad in the last few days actually.

Flags: needinfo?(dietrich)

Andrew McCreight [:mccr8]

Comment 48

•

11 years ago

I'm not really sure what's going on with this one, but leaving it open doesn't seem very useful at this point.

Assignee: continuation → nobody

Status: REOPENED → RESOLVED

Closed: 13 years ago → 11 years ago

Resolution: --- → INCOMPLETE

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Keywords: testcase-wanted

Andrew McCreight [:mccr8]

Updated

•

9 years ago

See Also: → 1310264

You need to log in before you can comment on or make changes to this bug.