Chimera sometimes sucks all resources (high CPU usage) on wake-up

RESOLVED FIXED

Status

Camino Graveyard
General
--
critical
RESOLVED FIXED
16 years ago
16 years ago

People

(Reporter: Paul Libbrecht, Assigned: Simon Fraser)

Tracking

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

16 years ago
Sometimes, when waking up a PowerBook G4, Chimera starts to take all CPUs.
It looks like it makes the system take all CPU because neither top nor
ProcessViewer display and unusual load, just XLoad. The system is indeed less
responsive (but still kind of livable) and quitting Chimera does exactly fix
this (hence the bug is attributed to Chimera...).

This fact is a real problem as Chimera has no notion of "restore all the windows
I had opened when quitting" (which would be a nice feature).
Please find attached (as soon as I get a chance to it) a sample.
(Reporter)

Comment 1

16 years ago
Created attachment 88096 [details]
A sample when the bug occured. Obtained with "sample 630 30"

Comment 2

16 years ago
Hmm, I haven't seen this, do you have a particular page loaded when this
happens? Any windows open, etc?
Assignee: saari → pinkerton

Updated

16 years ago
Severity: normal → critical

Comment 3

16 years ago
I just experienced this bug on my work computer (Beige G3 300).  I left chimera
open over night, and it had probably been open for well over 12 hours doing
nothing at all (no open windows or anything), yet it was consuming 80-90% of the
processor (according to top).  Chimera was still usable, and could browse the
web, but the system wasn't very responsive.  I tend to leave Chimera open for
long periods of time after I leave work or leave home, but this is the first
time I've seen this happen.  will attempt to grab a sample the next time I see
this... *IF* I see it again.

Occured with Chimera 8-5

Comment 4

16 years ago
chimera 0.4 and the nightly from aug. 12 both occasionally consume all resources.

removing the flash plugin from my system plugin dir has made this happen less
frequently, but it's also happened when i hide it when visiting a page with a
quicktime plugin.

afaict, other times it has just happened at random - i haven't been able to find
a pattern.  it usually goes away after a bit - i can't tell if it is me doing
something or not.

Comment 5

16 years ago
Remember that just because Chimera appears to take a lot of CPU time sometimes,
that doesn't mean it's hogging the CPU. I suspect Chimera, like Mozilla, still
calls WaitNextEvent a lot.
(Reporter)

Comment 6

16 years ago
Well, that only convinces me half-way:
-> the machine still gets unresponsive (though indeed, less than a real infinite
loop happening in a faulty program, at least 60% of CPU load is real)
-> and... I see no reason that a "user" would know about this: if this is a bug
in the OS, a workaround has to be found or updates have to be expected...

But you're not the first supposing the event-pump is the one eating everything.
At least, the event-pump shouldn't do so and possibly should try to measure if
it needs to work (it still does when Chimera is hidden) and, if takes too much
CPU time (what to do in this case is unknown to me though).

Oh, and about the question of a particular page, the answer is no, I don't. It
happens just about everywhere, spuriously, only when waking up in my case (which
often involves network changes and, these days, I'm experiencing a lot of
Lookupd hangs (where lookupd is so hung that even a ps -aux locks and no cocoa
app can be launched), could this be related ?).

Paul
Chimera is a cocoa app and _never_ calls waitNextEvent.

Comment 8

16 years ago
Sorry for the misunderstanding, Mike.
(Reporter)

Comment 9

16 years ago
May I request that a Chimera build has a little flag somewhere (an environment
variable or a command-line startup-option) that would print, say, every minute,
the type of event that's mostly used ?
I think using such a feature would allow me to say more.
(note, it goes on on 0.4).

Paul

Comment 10

16 years ago
Paul, you can use the Sampler application to record what Chimera's doing during
these resource-hogging episodes.
i did recently on my powerbook, and it's just sitting in the cocoa main loop
(CFRunLoop). nothing else shows up in the sample. the fact that it eventually
disappears leads me to believe it's probably a bug in the OS with apps going to
sleep with running timers.

maybe we should see if it gets any better in shaguar.

Comment 12

16 years ago
i haven't noticed this bug lately, however i haven't been using tabs as much.

formerly, when i would get this all the time, i had ~6 tabs in the window, 2 of
them had some tinderbox pages up (which automatically refresh)
i never use tabs, and see it about once a week on my powerbook at home, though
i'm still using 0.4 there, or a build just after it perhaps.

Comment 14

16 years ago
Whoops, sorry, Paul. I see you already sampled it.
(Assignee)

Updated

16 years ago
Summary: Chimera sometimes sucks all resources on wake-up → Chimera sometimes sucks all resources (high CPU usage) on wake-up
(Assignee)

Comment 15

16 years ago
*** Bug 168439 has been marked as a duplicate of this bug. ***

Comment 16

16 years ago
This happens to me on my powerbook g4 800 running 10.2.1 and 10.2.  It's usually
after it's been running for an extended period of time (12+ hours), and i've
noticed it after going to sleep.  It relably occurs after an extended period of
time, running on my machine.

--Ben
(Assignee)

Comment 17

16 years ago
I think this problem is caused by the slack timer adjustment code in
TimerThread.cpp.

When the machine is sleeping, no threads are running (much) and few timers fire.
On wake, the repeating timers therefore all think that they are hugely behind
schedule, so fire multiple times in a short space of time to "catch up".
TimerThread::UpdateFilter is the code that's to blame here, I think.
(Assignee)

Comment 18

16 years ago
Note: to reproduce this, sleep with a blinking caret in a gecko text field (e.g.
load google.com and then sleep).

It appears that pthreads get to run every few minutes while the machine is
asleep, but that the main threads doesn't. So the timer thread is posting timer
events which are never handled, and they pile up on the main thread.

At wake time, we service all those delayed timers. In addition, the code that
tries to adjust timer firing frequencies (TimerThread::UpdateFilter) gets
confused because the timers are all firing very late.
(Assignee)

Comment 19

16 years ago
I can't find an easy solution here. I think we need to fix 2 things:

1. TimerThread::UpdateFilter() should ignore really big negative slack valules
   (i.e. timers that are more than, say, 60 seconds late).

2. More importantly, TimerThread::Run needs to be fixed to avoid processing the
   same repeating precise timer over and over again, because of the 'goto next'
   clause. It would be good not to process a timer that has an outstanding
   PLEvent, but maintaining this info for each timer, in a threadsafe manner,
   seems hard.
> 1. TimerThread::UpdateFilter() should ignore really big negative slack valules
>    (i.e. timers that are more than, say, 60 seconds late).

Negative slack is clamped at 100ms already:
http://lxr.mozilla.org/mozilla/source/xpcom/threads/TimerThread.cpp#144

I think something's busted at the OS level if the main thread doesn't run while
other threads do, when the machine is asleep.  Is this a known hazard on OS/X?

/be
I meant -100ms, of course.

/be
(Assignee)

Comment 22

16 years ago
I have a fix for this that uses SleepQInstall(), and, on wake, bumps up the
mTimeout of all existing timers by the lenght of time that the machine slept.

One bummer is that calling PR_IntervalNow() from the SleepQProc (which is called
at wake) gives the wrong answer; it seems that gettimeofday() hasn't been
updated yet. sleep(2) in there fixes that.
(Assignee)

Comment 23

16 years ago
Created attachment 103111 [details] [diff] [review]
Timer patch

This patch installs a SleepQ function that gets called on wake, and calls
SetDelay() on all the timers, to reset their timeout relative to the current
time.
-      if (!TIMER_LESS_THAN(now, timeout))
+      if (!TIMER_LESS_THAN(now, timeout)) {
         goto next;
+      }

leave the braces off

+  if ((gSleepQEntry.sleepQProc = NewSleepQUPP(TimerSleepQProc)) != NULL) {

won't this leak the UPP every time? I never see you dispose of it.

+    timer->SetDelay(timer->GetDelay());

what does this accomplish? a comment might be nice.

Comment 25

16 years ago
I agree with Garrett Huntress comments:

I too generally leave Chimera running for long periods. Just now, reading an
article, no rendering going on, checked processor use with top showing 80-90%
cpu usage. By the time I started filing this bug report a couple of minutes
later, it was back down to almost nothing - strange. Here's an example:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
 1475 top          0.0%  0:00.01   1    14    18   284K   324K   580K  13.6M
 1466 smbspool     0.0%  0:00.21   1    12    88  5.20M   732K  5.66M  19.1M 
 1465 pstops       0.0%  0:00.05   1     9    15   116K   316K   364K  2.18M
 1464 cgpdftops    0.0%  0:00.50   1    18    52   452K  2.10M  1.29M  20.7M
 1422 Print Cent   0.0%  7:42.61   2    92   241  3.53M  8.70M  20.7M  53.0M
 1418 Navigator   88.3% 11:40.72   7   106   440  18.4M  26.8M  52.2M  95.6M

(Reporter)

Comment 26

16 years ago
Looks like there's some work going on... the nightly of 2002-10-16 04 has at
least one crashed on wake-up. Sorry I did not file that comment earlier, it
crashed another time (on an evil page) in the meantime so a crash-log is no more
available.

Comment 27

16 years ago
I too have noticed this bug on my powerbook, but lately after wake from sleep
when the CPU usage is ~100%  I run top and find that it is the Kernel Task (PID
 0 kernel_tas) that appears to be using all spare cycles. Quitting Chimera
returns the kernel task usage to normal, which got me thinking about what might
be happening. When I ran top -a, which is an event counting mode, for 30 secs on
a basically idle system I get very interesting results.  Here the terminal app
is in the foreground running top and all other apps are idling in the
background.  Look how much Chimera's event counts differ from the other idle apps!


[jagwire:~] % top -as 10

Processes:  62 total, 2 running, 60 sleeping... 164 threads      0:00:30
Load Avg:  0.07, 0.13, 0.13     CPU usage:  3.3% user, 2.6% sys, 94.1% idle
Networks:         1 ipkts/0K                 7 opkts /1K
Disks:            0 reads/0K                 8 writes/43K
VM:               0 pageins                  0 pageouts

PID COMMAND     %CPU  TIME   MSGS_SENT MSGS_RCVD BSDSYSCALL MACHSYSCALL CSWITCH
2580 top        0.3%  0:00.10 2190       2178       294        2553        59
2568 iChat      0.0%  0:00.00  0          0          0          0           0
2483 tcsh       0.0%  0:00.00  0          0          0          0           0
2482 login      0.0%  0:00.00  0          0          0          0           0
2459 Navigator  0.1%  0:00.05 891        677        3024       15823       7096
2425 Help Viewe 0.0%  0:00.00  0          0          0          0           0
2347 slpd       0.0%  0:00.00  0         90         78         180         105
2320 lookupd    0.0%  0:00.00  0          0          0          0           0
2115 Interface  0.0%  0:00.00  0          0          0          0           0
2114 Address Bo 0.0%  0:00.00  0          0          0          0           0
2113 Script Edi 0.0%  0:00.00  0          0          0          0           0
2103 OmniWeb    0.0%  0:00.00  0          0          0          0           0
2033 tcsh       0.0%  0:00.00  0          0          0          0           0
2032 login      0.0%  0:00.00  0          0          0          0           0
2031 Terminal   0.3%  0:00.10  79         69         29         279         153
1849 Disk Copy  0.0%  0:00.00  0          0          0          0           0
1640 hdi_agent  0.0%  0:00.00  0          0          0          0           0
(Reporter)

Comment 28

16 years ago
I am not aware of what was committed or not. 
The nightly of 2002102204 seems however to be fine... I had one
cpu-crunch-on-wake-up only since three days (it is always on, with applets,
animated gifs, forms, and flash-things) and just un-hiding it killed the cpu-crunch.
I am not sure I can say it's fixed (give me a week). 
But lasting three days is already quite good.
Paul
(Assignee)

Comment 29

16 years ago
This isn't fixed. To see the CPU eating, put your machine to sleep when chimera
shows a blinking caret in a text field.

Taking.
Assignee: pinkerton → sfraser
(Assignee)

Updated

16 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 30

16 years ago
Created attachment 108338 [details] [diff] [review]
Final timer patch

This patch does the trick. It fixes an issue with the previous patch, where
there was a time window after waking the machine, but before we got the wake
notification, during which timer events could still pile up, causing a spurt of
high CPU usage. The current patch disables all timer firing between sleep and
wake.
(Assignee)

Updated

16 years ago
Attachment #103111 - Attachment is obsolete: true
(Assignee)

Comment 31

16 years ago
Patch checked in. We probably need something similar on the Macho trunk.
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.