152528 - Chimera sometimes sucks all resources (high CPU usage) on wake-up

Reporter

Description

•

23 years ago

Sometimes, when waking up a PowerBook G4, Chimera starts to take all CPUs. It looks like it makes the system take all CPU because neither top nor ProcessViewer display and unusual load, just XLoad. The system is indeed less responsive (but still kind of livable) and quitting Chimera does exactly fix this (hence the bug is attributed to Chimera...). This fact is a real problem as Chimera has no notion of "restore all the windows I had opened when quitting" (which would be a nice feature). Please find attached (as soon as I get a chance to it) a sample.

Paul Libbrecht

Reporter

Comment 1

•

23 years ago

Attached file A sample when the bug occured. Obtained with "sample 630 30" — Details

saari (gone)

Comment 2

•

23 years ago

Hmm, I haven't seen this, do you have a particular page loaded when this happens? Any windows open, etc?

Assignee: saari → pinkerton

Greg K.

Updated

•

23 years ago

Severity: normal → critical

Garret Huntress

Comment 3

•

23 years ago

I just experienced this bug on my work computer (Beige G3 300). I left chimera open over night, and it had probably been open for well over 12 hours doing nothing at all (no open windows or anything), yet it was consuming 80-90% of the processor (according to top). Chimera was still usable, and could browse the web, but the system wasn't very responsive. I tend to leave Chimera open for long periods of time after I leave work or leave home, but this is the first time I've seen this happen. will attempt to grab a sample the next time I see this... *IF* I see it again. Occured with Chimera 8-5

jacob berkman

Comment 4

•

22 years ago

chimera 0.4 and the nightly from aug. 12 both occasionally consume all resources. removing the flash plugin from my system plugin dir has made this happen less frequently, but it's also happened when i hide it when visiting a page with a quicktime plugin. afaict, other times it has just happened at random - i haven't been able to find a pattern. it usually goes away after a bit - i can't tell if it is me doing something or not.

Greg K.

Comment 5

•

22 years ago

Remember that just because Chimera appears to take a lot of CPU time sometimes, that doesn't mean it's hogging the CPU. I suspect Chimera, like Mozilla, still calls WaitNextEvent a lot.

Paul Libbrecht

Reporter

Comment 6

•

22 years ago

Well, that only convinces me half-way: -> the machine still gets unresponsive (though indeed, less than a real infinite loop happening in a faulty program, at least 60% of CPU load is real) -> and... I see no reason that a "user" would know about this: if this is a bug in the OS, a workaround has to be found or updates have to be expected... But you're not the first supposing the event-pump is the one eating everything. At least, the event-pump shouldn't do so and possibly should try to measure if it needs to work (it still does when Chimera is hidden) and, if takes too much CPU time (what to do in this case is unknown to me though). Oh, and about the question of a particular page, the answer is no, I don't. It happens just about everywhere, spuriously, only when waking up in my case (which often involves network changes and, these days, I'm experiencing a lot of Lookupd hangs (where lookupd is so hung that even a ps -aux locks and no cocoa app can be launched), could this be related ?). Paul

Mike Pinkerton (not reading bugmail)

Comment 7

•

22 years ago

Chimera is a cocoa app and _never_ calls waitNextEvent.

Greg K.

Comment 8

•

22 years ago

Sorry for the misunderstanding, Mike.

Paul Libbrecht

Reporter

Comment 9

•

22 years ago

May I request that a Chimera build has a little flag somewhere (an environment variable or a command-line startup-option) that would print, say, every minute, the type of event that's mostly used ? I think using such a feature would allow me to say more. (note, it goes on on 0.4). Paul

Greg K.

Comment 10

•

22 years ago

Paul, you can use the Sampler application to record what Chimera's doing during these resource-hogging episodes.

Mike Pinkerton (not reading bugmail)

Comment 11

•

22 years ago

i did recently on my powerbook, and it's just sitting in the cocoa main loop (CFRunLoop). nothing else shows up in the sample. the fact that it eventually disappears leads me to believe it's probably a bug in the OS with apps going to sleep with running timers. maybe we should see if it gets any better in shaguar.

jacob berkman

Comment 12

•

22 years ago

i haven't noticed this bug lately, however i haven't been using tabs as much. formerly, when i would get this all the time, i had ~6 tabs in the window, 2 of them had some tinderbox pages up (which automatically refresh)

Mike Pinkerton (not reading bugmail)

Comment 13

•

22 years ago

i never use tabs, and see it about once a week on my powerbook at home, though i'm still using 0.4 there, or a build just after it perhaps.

Greg K.

Comment 14

•

22 years ago

Whoops, sorry, Paul. I see you already sampled it.

Simon Fraser [no longer active]

Assignee

Updated

•

22 years ago

Summary: Chimera sometimes sucks all resources on wake-up → Chimera sometimes sucks all resources (high CPU usage) on wake-up

Simon Fraser [no longer active]

Assignee

Comment 15

•

22 years ago

*** Bug 168439 has been marked as a duplicate of this bug. ***

Ben Harper

Comment 16

•

22 years ago

This happens to me on my powerbook g4 800 running 10.2.1 and 10.2. It's usually after it's been running for an extended period of time (12+ hours), and i've noticed it after going to sleep. It relably occurs after an extended period of time, running on my machine. --Ben

Simon Fraser [no longer active]

Assignee

Comment 17

•

22 years ago

I think this problem is caused by the slack timer adjustment code in TimerThread.cpp. When the machine is sleeping, no threads are running (much) and few timers fire. On wake, the repeating timers therefore all think that they are hugely behind schedule, so fire multiple times in a short space of time to "catch up". TimerThread::UpdateFilter is the code that's to blame here, I think.

Simon Fraser [no longer active]

Assignee

Comment 18

•

22 years ago

Note: to reproduce this, sleep with a blinking caret in a gecko text field (e.g. load google.com and then sleep). It appears that pthreads get to run every few minutes while the machine is asleep, but that the main threads doesn't. So the timer thread is posting timer events which are never handled, and they pile up on the main thread. At wake time, we service all those delayed timers. In addition, the code that tries to adjust timer firing frequencies (TimerThread::UpdateFilter) gets confused because the timers are all firing very late.

Simon Fraser [no longer active]

Assignee

Comment 19

•

22 years ago

I can't find an easy solution here. I think we need to fix 2 things: 1. TimerThread::UpdateFilter() should ignore really big negative slack valules (i.e. timers that are more than, say, 60 seconds late). 2. More importantly, TimerThread::Run needs to be fixed to avoid processing the same repeating precise timer over and over again, because of the 'goto next' clause. It would be good not to process a timer that has an outstanding PLEvent, but maintaining this info for each timer, in a threadsafe manner, seems hard.

Brendan Eich [:brendan]

Comment 20

•

22 years ago

> 1. TimerThread::UpdateFilter() should ignore really big negative slack valules > (i.e. timers that are more than, say, 60 seconds late). Negative slack is clamped at 100ms already: http://lxr.mozilla.org/mozilla/source/xpcom/threads/TimerThread.cpp#144 I think something's busted at the OS level if the main thread doesn't run while other threads do, when the machine is asleep. Is this a known hazard on OS/X? /be

Brendan Eich [:brendan]

Comment 21

•

22 years ago

I meant -100ms, of course. /be

Simon Fraser [no longer active]

Assignee

Comment 22

•

22 years ago

I have a fix for this that uses SleepQInstall(), and, on wake, bumps up the mTimeout of all existing timers by the lenght of time that the machine slept. One bummer is that calling PR_IntervalNow() from the SleepQProc (which is called at wake) gives the wrong answer; it seems that gettimeofday() hasn't been updated yet. sleep(2) in there fixes that.

Simon Fraser [no longer active]

Assignee

Comment 23

•

22 years ago

Attached patch Timer patch (obsolete) — Details — Splinter Review

This patch installs a SleepQ function that gets called on wake, and calls SetDelay() on all the timers, to reset their timeout relative to the current time.

Mike Pinkerton (not reading bugmail)

Comment 24

•

22 years ago

- if (!TIMER_LESS_THAN(now, timeout)) + if (!TIMER_LESS_THAN(now, timeout)) { goto next; + } leave the braces off + if ((gSleepQEntry.sleepQProc = NewSleepQUPP(TimerSleepQProc)) != NULL) { won't this leak the UPP every time? I never see you dispose of it. + timer->SetDelay(timer->GetDelay()); what does this accomplish? a comment might be nice.

richard smykla

Comment 25

•

22 years ago

I agree with Garrett Huntress comments: I too generally leave Chimera running for long periods. Just now, reading an article, no rendering going on, checked processor use with top showing 80-90% cpu usage. By the time I started filing this bug report a couple of minutes later, it was back down to almost nothing - strange. Here's an example: PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE 1475 top 0.0% 0:00.01 1 14 18 284K 324K 580K 13.6M 1466 smbspool 0.0% 0:00.21 1 12 88 5.20M 732K 5.66M 19.1M 1465 pstops 0.0% 0:00.05 1 9 15 116K 316K 364K 2.18M 1464 cgpdftops 0.0% 0:00.50 1 18 52 452K 2.10M 1.29M 20.7M 1422 Print Cent 0.0% 7:42.61 2 92 241 3.53M 8.70M 20.7M 53.0M 1418 Navigator 88.3% 11:40.72 7 106 440 18.4M 26.8M 52.2M 95.6M

Paul Libbrecht

Reporter

Comment 26

•

22 years ago

Looks like there's some work going on... the nightly of 2002-10-16 04 has at least one crashed on wake-up. Sorry I did not file that comment earlier, it crashed another time (on an evil page) in the meantime so a crash-log is no more available.

Tyler B

Comment 27

•

22 years ago

I too have noticed this bug on my powerbook, but lately after wake from sleep when the CPU usage is ~100% I run top and find that it is the Kernel Task (PID 0 kernel_tas) that appears to be using all spare cycles. Quitting Chimera returns the kernel task usage to normal, which got me thinking about what might be happening. When I ran top -a, which is an event counting mode, for 30 secs on a basically idle system I get very interesting results. Here the terminal app is in the foreground running top and all other apps are idling in the background. Look how much Chimera's event counts differ from the other idle apps! [jagwire:~] % top -as 10 Processes: 62 total, 2 running, 60 sleeping... 164 threads 0:00:30 Load Avg: 0.07, 0.13, 0.13 CPU usage: 3.3% user, 2.6% sys, 94.1% idle Networks: 1 ipkts/0K 7 opkts /1K Disks: 0 reads/0K 8 writes/43K VM: 0 pageins 0 pageouts PID COMMAND %CPU TIME MSGS_SENT MSGS_RCVD BSDSYSCALL MACHSYSCALL CSWITCH 2580 top 0.3% 0:00.10 2190 2178 294 2553 59 2568 iChat 0.0% 0:00.00 0 0 0 0 0 2483 tcsh 0.0% 0:00.00 0 0 0 0 0 2482 login 0.0% 0:00.00 0 0 0 0 0 2459 Navigator 0.1% 0:00.05 891 677 3024 15823 7096 2425 Help Viewe 0.0% 0:00.00 0 0 0 0 0 2347 slpd 0.0% 0:00.00 0 90 78 180 105 2320 lookupd 0.0% 0:00.00 0 0 0 0 0 2115 Interface 0.0% 0:00.00 0 0 0 0 0 2114 Address Bo 0.0% 0:00.00 0 0 0 0 0 2113 Script Edi 0.0% 0:00.00 0 0 0 0 0 2103 OmniWeb 0.0% 0:00.00 0 0 0 0 0 2033 tcsh 0.0% 0:00.00 0 0 0 0 0 2032 login 0.0% 0:00.00 0 0 0 0 0 2031 Terminal 0.3% 0:00.10 79 69 29 279 153 1849 Disk Copy 0.0% 0:00.00 0 0 0 0 0 1640 hdi_agent 0.0% 0:00.00 0 0 0 0 0

Paul Libbrecht

Reporter

Comment 28

•

22 years ago

I am not aware of what was committed or not. The nightly of 2002102204 seems however to be fine... I had one cpu-crunch-on-wake-up only since three days (it is always on, with applets, animated gifs, forms, and flash-things) and just un-hiding it killed the cpu-crunch. I am not sure I can say it's fixed (give me a week). But lasting three days is already quite good. Paul

Simon Fraser [no longer active]

Assignee

Comment 29

•

22 years ago

This isn't fixed. To see the CPU eating, put your machine to sleep when chimera shows a blinking caret in a text field. Taking.

Assignee: pinkerton → sfraser

Simon Fraser [no longer active]

Assignee

Updated

•

22 years ago

Status: NEW → ASSIGNED

Simon Fraser [no longer active]

Assignee

Comment 30

•

22 years ago

Attached patch Final timer patch — Details — Splinter Review

This patch does the trick. It fixes an issue with the previous patch, where there was a time window after waking the machine, but before we got the wake notification, during which timer events could still pile up, causing a spurt of high CPU usage. The current patch disables all timer firing between sleep and wake.

Simon Fraser [no longer active]

Assignee

Updated

•

22 years ago

Attachment #103111 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 31

•

22 years ago

Patch checked in. We probably need something similar on the Macho trunk.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

A sample when the bug occured. Obtained with "sample 630 30" 23 years ago Paul Libbrecht 11.56 KB, text/plain		Details
Timer patch 22 years ago Simon Fraser [no longer active] 3.44 KB, patch		Details \| Diff \| Splinter Review
Final timer patch 22 years ago Simon Fraser [no longer active] 4.91 KB, patch		Details \| Diff \| Splinter Review