Closed Bug 152528 Opened 23 years ago Closed 22 years ago

Chimera sometimes sucks all resources (high CPU usage) on wake-up

Categories

(Camino Graveyard :: General, defect)

PowerPC
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: paul, Assigned: sfraser_bugs)

References

Details

Attachments

(2 files, 1 obsolete file)

Sometimes, when waking up a PowerBook G4, Chimera starts to take all CPUs. It looks like it makes the system take all CPU because neither top nor ProcessViewer display and unusual load, just XLoad. The system is indeed less responsive (but still kind of livable) and quitting Chimera does exactly fix this (hence the bug is attributed to Chimera...). This fact is a real problem as Chimera has no notion of "restore all the windows I had opened when quitting" (which would be a nice feature). Please find attached (as soon as I get a chance to it) a sample.
Hmm, I haven't seen this, do you have a particular page loaded when this happens? Any windows open, etc?
Assignee: saari → pinkerton
Severity: normal → critical
I just experienced this bug on my work computer (Beige G3 300). I left chimera open over night, and it had probably been open for well over 12 hours doing nothing at all (no open windows or anything), yet it was consuming 80-90% of the processor (according to top). Chimera was still usable, and could browse the web, but the system wasn't very responsive. I tend to leave Chimera open for long periods of time after I leave work or leave home, but this is the first time I've seen this happen. will attempt to grab a sample the next time I see this... *IF* I see it again. Occured with Chimera 8-5
chimera 0.4 and the nightly from aug. 12 both occasionally consume all resources. removing the flash plugin from my system plugin dir has made this happen less frequently, but it's also happened when i hide it when visiting a page with a quicktime plugin. afaict, other times it has just happened at random - i haven't been able to find a pattern. it usually goes away after a bit - i can't tell if it is me doing something or not.
Remember that just because Chimera appears to take a lot of CPU time sometimes, that doesn't mean it's hogging the CPU. I suspect Chimera, like Mozilla, still calls WaitNextEvent a lot.
Well, that only convinces me half-way: -> the machine still gets unresponsive (though indeed, less than a real infinite loop happening in a faulty program, at least 60% of CPU load is real) -> and... I see no reason that a "user" would know about this: if this is a bug in the OS, a workaround has to be found or updates have to be expected... But you're not the first supposing the event-pump is the one eating everything. At least, the event-pump shouldn't do so and possibly should try to measure if it needs to work (it still does when Chimera is hidden) and, if takes too much CPU time (what to do in this case is unknown to me though). Oh, and about the question of a particular page, the answer is no, I don't. It happens just about everywhere, spuriously, only when waking up in my case (which often involves network changes and, these days, I'm experiencing a lot of Lookupd hangs (where lookupd is so hung that even a ps -aux locks and no cocoa app can be launched), could this be related ?). Paul
Chimera is a cocoa app and _never_ calls waitNextEvent.
Sorry for the misunderstanding, Mike.
May I request that a Chimera build has a little flag somewhere (an environment variable or a command-line startup-option) that would print, say, every minute, the type of event that's mostly used ? I think using such a feature would allow me to say more. (note, it goes on on 0.4). Paul
Paul, you can use the Sampler application to record what Chimera's doing during these resource-hogging episodes.
i did recently on my powerbook, and it's just sitting in the cocoa main loop (CFRunLoop). nothing else shows up in the sample. the fact that it eventually disappears leads me to believe it's probably a bug in the OS with apps going to sleep with running timers. maybe we should see if it gets any better in shaguar.
i haven't noticed this bug lately, however i haven't been using tabs as much. formerly, when i would get this all the time, i had ~6 tabs in the window, 2 of them had some tinderbox pages up (which automatically refresh)
i never use tabs, and see it about once a week on my powerbook at home, though i'm still using 0.4 there, or a build just after it perhaps.
Whoops, sorry, Paul. I see you already sampled it.
Summary: Chimera sometimes sucks all resources on wake-up → Chimera sometimes sucks all resources (high CPU usage) on wake-up
*** Bug 168439 has been marked as a duplicate of this bug. ***
This happens to me on my powerbook g4 800 running 10.2.1 and 10.2. It's usually after it's been running for an extended period of time (12+ hours), and i've noticed it after going to sleep. It relably occurs after an extended period of time, running on my machine. --Ben
I think this problem is caused by the slack timer adjustment code in TimerThread.cpp. When the machine is sleeping, no threads are running (much) and few timers fire. On wake, the repeating timers therefore all think that they are hugely behind schedule, so fire multiple times in a short space of time to "catch up". TimerThread::UpdateFilter is the code that's to blame here, I think.
Note: to reproduce this, sleep with a blinking caret in a gecko text field (e.g. load google.com and then sleep). It appears that pthreads get to run every few minutes while the machine is asleep, but that the main threads doesn't. So the timer thread is posting timer events which are never handled, and they pile up on the main thread. At wake time, we service all those delayed timers. In addition, the code that tries to adjust timer firing frequencies (TimerThread::UpdateFilter) gets confused because the timers are all firing very late.
I can't find an easy solution here. I think we need to fix 2 things: 1. TimerThread::UpdateFilter() should ignore really big negative slack valules (i.e. timers that are more than, say, 60 seconds late). 2. More importantly, TimerThread::Run needs to be fixed to avoid processing the same repeating precise timer over and over again, because of the 'goto next' clause. It would be good not to process a timer that has an outstanding PLEvent, but maintaining this info for each timer, in a threadsafe manner, seems hard.
> 1. TimerThread::UpdateFilter() should ignore really big negative slack valules > (i.e. timers that are more than, say, 60 seconds late). Negative slack is clamped at 100ms already: http://lxr.mozilla.org/mozilla/source/xpcom/threads/TimerThread.cpp#144 I think something's busted at the OS level if the main thread doesn't run while other threads do, when the machine is asleep. Is this a known hazard on OS/X? /be
I meant -100ms, of course. /be
I have a fix for this that uses SleepQInstall(), and, on wake, bumps up the mTimeout of all existing timers by the lenght of time that the machine slept. One bummer is that calling PR_IntervalNow() from the SleepQProc (which is called at wake) gives the wrong answer; it seems that gettimeofday() hasn't been updated yet. sleep(2) in there fixes that.
Attached patch Timer patch (obsolete) — Splinter Review
This patch installs a SleepQ function that gets called on wake, and calls SetDelay() on all the timers, to reset their timeout relative to the current time.
- if (!TIMER_LESS_THAN(now, timeout)) + if (!TIMER_LESS_THAN(now, timeout)) { goto next; + } leave the braces off + if ((gSleepQEntry.sleepQProc = NewSleepQUPP(TimerSleepQProc)) != NULL) { won't this leak the UPP every time? I never see you dispose of it. + timer->SetDelay(timer->GetDelay()); what does this accomplish? a comment might be nice.
I agree with Garrett Huntress comments: I too generally leave Chimera running for long periods. Just now, reading an article, no rendering going on, checked processor use with top showing 80-90% cpu usage. By the time I started filing this bug report a couple of minutes later, it was back down to almost nothing - strange. Here's an example: PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE 1475 top 0.0% 0:00.01 1 14 18 284K 324K 580K 13.6M 1466 smbspool 0.0% 0:00.21 1 12 88 5.20M 732K 5.66M 19.1M 1465 pstops 0.0% 0:00.05 1 9 15 116K 316K 364K 2.18M 1464 cgpdftops 0.0% 0:00.50 1 18 52 452K 2.10M 1.29M 20.7M 1422 Print Cent 0.0% 7:42.61 2 92 241 3.53M 8.70M 20.7M 53.0M 1418 Navigator 88.3% 11:40.72 7 106 440 18.4M 26.8M 52.2M 95.6M
Looks like there's some work going on... the nightly of 2002-10-16 04 has at least one crashed on wake-up. Sorry I did not file that comment earlier, it crashed another time (on an evil page) in the meantime so a crash-log is no more available.
I too have noticed this bug on my powerbook, but lately after wake from sleep when the CPU usage is ~100% I run top and find that it is the Kernel Task (PID 0 kernel_tas) that appears to be using all spare cycles. Quitting Chimera returns the kernel task usage to normal, which got me thinking about what might be happening. When I ran top -a, which is an event counting mode, for 30 secs on a basically idle system I get very interesting results. Here the terminal app is in the foreground running top and all other apps are idling in the background. Look how much Chimera's event counts differ from the other idle apps! [jagwire:~] % top -as 10 Processes: 62 total, 2 running, 60 sleeping... 164 threads 0:00:30 Load Avg: 0.07, 0.13, 0.13 CPU usage: 3.3% user, 2.6% sys, 94.1% idle Networks: 1 ipkts/0K 7 opkts /1K Disks: 0 reads/0K 8 writes/43K VM: 0 pageins 0 pageouts PID COMMAND %CPU TIME MSGS_SENT MSGS_RCVD BSDSYSCALL MACHSYSCALL CSWITCH 2580 top 0.3% 0:00.10 2190 2178 294 2553 59 2568 iChat 0.0% 0:00.00 0 0 0 0 0 2483 tcsh 0.0% 0:00.00 0 0 0 0 0 2482 login 0.0% 0:00.00 0 0 0 0 0 2459 Navigator 0.1% 0:00.05 891 677 3024 15823 7096 2425 Help Viewe 0.0% 0:00.00 0 0 0 0 0 2347 slpd 0.0% 0:00.00 0 90 78 180 105 2320 lookupd 0.0% 0:00.00 0 0 0 0 0 2115 Interface 0.0% 0:00.00 0 0 0 0 0 2114 Address Bo 0.0% 0:00.00 0 0 0 0 0 2113 Script Edi 0.0% 0:00.00 0 0 0 0 0 2103 OmniWeb 0.0% 0:00.00 0 0 0 0 0 2033 tcsh 0.0% 0:00.00 0 0 0 0 0 2032 login 0.0% 0:00.00 0 0 0 0 0 2031 Terminal 0.3% 0:00.10 79 69 29 279 153 1849 Disk Copy 0.0% 0:00.00 0 0 0 0 0 1640 hdi_agent 0.0% 0:00.00 0 0 0 0 0
I am not aware of what was committed or not. The nightly of 2002102204 seems however to be fine... I had one cpu-crunch-on-wake-up only since three days (it is always on, with applets, animated gifs, forms, and flash-things) and just un-hiding it killed the cpu-crunch. I am not sure I can say it's fixed (give me a week). But lasting three days is already quite good. Paul
This isn't fixed. To see the CPU eating, put your machine to sleep when chimera shows a blinking caret in a text field. Taking.
Assignee: pinkerton → sfraser
Status: NEW → ASSIGNED
This patch does the trick. It fixes an issue with the previous patch, where there was a time window after waking the machine, but before we got the wake notification, during which timer events could still pile up, causing a spurt of high CPU usage. The current patch disables all timer firing between sleep and wake.
Attachment #103111 - Attachment is obsolete: true
Patch checked in. We probably need something similar on the Macho trunk.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: