When debugging with gdb, the WindowServer process starts using lots of CPU.
This seems to be caused by nsToolkit::RegisterForAllProcessMouseEvents presumably causing a large queue of events that are waiting to be delivered.
Oh sweet jesus I'm glad we finally have a lead on the cause of this bug!
This issue has come up a number of times before, and I've even landed
a couple of patches to improve the situation -- one to stop using
event taps in apps that (like Camino) use native context menus (bug
389683), and another to make sure an event tap gets destroyed when
it's released (bug 436897).
But we can't simply get rid of event taps, so I'm not sure what more
can be done.
Bug 536350 is another (fairly) recent bug with a similar complaint.
I also see another behavior which I bet is related to this bug. When I kill Firefox in the middle of a debugging session, my mouse events get dispatched to other windows, which causes windows to move around randomly, and also sometimes other application windows stop responding to any mouse events for quite some time. This is the single most annoying thing that I have to deal with on a daily basis. :(
At the very least, we should have something which one can specify in their mozconfig to exclude the culprit code in their development builds...
(In reply to comment #3)
> Firefox in the middle of a debugging session, my mouse events get dispatched to
> other windows, which causes windows to move around randomly, and also sometimes
Jesse once filed a similar sort of bug about events getting sent to the wrong place, or almost as if mouse buttons were stuck, or something; I can't find the bug right now.
My bug is bug 393664.
(In reply to comment #2)
> But we can't simply get rid of event taps, so I'm not sure what more
> can be done.
We could switch to native context menus.
> We could switch to native context menus.
No, we can't. As I understand it, there are things that Gecko context
menus need to be able to do that native OS X context menus can't do.
(I think Josh is more familiar with the story here than I am.)
But in any case, it's entirely disproportionate to make a major
architectural change to fix this bug.
I see it too -- several times a day. But it's at worst a minor
annoyance. I work around it by clicking on the desktop, then back on
the window where my gdb session is running (sometimes I go back and
forth a couple of times).
Ultimately this is an Apple bug. But I can't report it to Apple
without reliable steps to reproduce, which I've never been able to find.
So here's a challenge to you guys: Get me 100% reliable steps to
reproduce, and I'll not only report this bug to Apple, but I'll spend
a day or two looking for a code-based workaround (so that we no longer
have to click on the desktop and back).
I think there's some confusion about what RegisterForAllProcessMouseEvents() does and what it's needed for. It does two different things:
1. It installs a system-wide Carbon event handler for mouse *move* events.
2. It creates a system tap for mouse *down* events.
Part 1 is necessary for mouseover feedback while our app is in the background. Without it, for example moving the mouse over the bookmarks bar of an inactive Firefox window wouldn't give the bookmark the hover appearance. Context menus are affected in a similar way: Only context menus opened on inactive windows wouldn't show mouse-over feedback. Context menus opened while our application is active would still work.
Part 2 is necessary to work around a 10.4 bug, according to the comment there. We should definitely check whether it's still necessary on 10.5 or 10.6.
The WindowServer CPU usage problem is probably related to part 1. The crash-causes-stuck-mouse-button problem sounds more like part 2.
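To illustrate the hypothesis from the original report (events piling up for a process that can't consume them), here is a toy model in Python. It is not the Cocoa widget code and makes no claim about how WindowServer really works: `Process`, `WindowServerModel`, and the "move"/"down" event kinds are all hypothetical names standing in for the move handler (part 1) and the mouse-down tap (part 2).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for an app that registered for mouse events."""
    name: str
    wants: set                      # event kinds it asked for, e.g. {"move", "down"}
    stopped: bool = False           # True while halted at a gdb breakpoint
    pending: deque = field(default_factory=deque)
    handled: int = 0

    def drain(self):
        # A running process empties its queue; a stopped one cannot.
        if not self.stopped:
            self.handled += len(self.pending)
            self.pending.clear()


class WindowServerModel:
    """Toy dispatcher: queues each event for every interested process."""

    def __init__(self):
        self.listeners = []

    def register(self, proc):
        self.listeners.append(proc)

    def post(self, kind):
        for proc in self.listeners:
            if kind in proc.wants:
                proc.pending.append(kind)
            proc.drain()


server = WindowServerModel()
firefox = Process("firefox", wants={"move", "down"})  # part 1 + part 2
server.register(firefox)

firefox.stopped = True          # halted in the debugger
for _ in range(1000):
    server.post("move")         # the mouse keeps moving in other apps
backlog_while_stopped = len(firefox.pending)

firefox.stopped = False         # "continue" in gdb
server.post("down")
backlog_after_resume = len(firefox.pending)
```

In this model the backlog only grows while the process is stopped and disappears once it runs again. A process that never registered for "move" would accumulate nothing, which is the effect a switch to disable the handlers would aim for.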
(In reply to comment #7)
> > We could switch to native context menus.
> No, we can't. As I understand it, there are things that Gecko context
> menus need to be able to do that native OS X context menus can't do.
> (I think Josh is more familiar with the story here than I am.)
What are these things? and are they that valuable? Not having native OS X context menus has caused us other problems in the past too.
> So here's a challenge to you guys: Get me 100% reliable steps to
> reproduce, and I'll not only report this bug to Apple, but I'll spend
> a day or two looking for a code-based workaround (so that we no longer
> have to click on the desktop and back).
Attaching gdb, breaking and waiting seems to reproduce the problem reliably for me.
> Attaching gdb, breaking and waiting seems to reproduce the problem reliably for
> me.
It doesn't for me. You (or someone else) need to do better than this.
(In reply to comment #8)
Thanks, Markus, for your comment.
> Part 2 is necessary to work around a 10.4 bug, according to the comment there.
> We should definitely check whether it's still necessary on 10.5 or 10.6.
I'll look into this as soon as I can -- probably sometime next week.
(Following up comment #7)
> I see it too -- several times a day.
The problem I see is what Ehsan reports in comment #3 (or very similar
to it). On Markus's analysis, this is "part 2", or "problem #2".
I suspect these problems are ultimately one (Apple) bug, in their
implementation of event taps. If I can actually get rid of event taps
on the trunk (if it turns out they're only needed for OS X 10.4, which
is no longer supported on the trunk), we may be able to find out.
(In reply to comment #7)
> But in any case, it's entirely disproportionate to make a major
> architectural change to fix this bug.
Yes. This is a bug about gdb causing problems -- while there may or may not be other reasons for making such changes, we certainly shouldn't be making big changes just for the sake of nicer debugging with gdb.
But maybe there's another option... Can we have a compile flag (or pref) that disables the stuff causing problems, at the cost of making some things not work correctly/normally? EG, wrt comment 8, I think it would be entirely reasonable to have a way to unfuck gdb at the cost of not being able to do so on 10.4 and resulting in quirky mouse tracking in certain cases.
[Assuming any fix is needed at all; I used to hit this all the time, but I don't remember running into it recently. Better in 10.6, perhaps?]
For what it's worth, I saw the "window server uses lots of cpu" thing all the time on 10.5 (every time I stopped the app in gdb); I haven't seen it once on the newer 10.6 machine. So it's possible that one got fixed in 10.6.
The mouse event issue, though, definitely happens on 10.6. It happens on any crash; gdb need not be involved. It'd be really nice to fix it....
(In reply to comment #14)
> For what it's worth, I saw the "window server uses lots of cpu" thing all the
> time on 10.5 (every time I stopped the app in gdb); I haven't seen it once on
> the newer 10.6 machine. So it's possible that one got fixed in 10.6.
> The mouse event issue, though, definitely happens on 10.6. It happens on any
> crash; gdb need not be involved. It'd be really nice to fix it....
I've also recently seen strange mouse issues when switching between windows, where my mouse starts to think that I'm holding down the button, so every mouse move turns into either a selection or a drag operation. I usually have to switch between several windows multiple times to resolve this (and I remember having to restart my computer twice because it got stuck in this mode and didn't recover).
Should we spin off the broken-mouse-state thing into a separate bug? Especially since the original issue here (high CPU usage) sounds like it may just be a wontfix as people move to 10.6.
> Should we spin off the broken-mouse-state thing into a separate bug?
Let's not do that right away.
The broken-mouse state is actually the only problem I ever see (I
never see the mouse-events-cause-high-CPU problem). And like I said
in comment #12, I suspect both of these problems are one bug -- an
Apple bug in their implementation of event taps (possibly with
different symptoms on OS X 10.6 and 10.5).
As I said above, I want to follow Markus' suggestion from comment #8,
and possibly get rid of event taps on the trunk.
I probably won't get to this next week. But hopefully I can start on
it the week after next.
If I can't get rid of event taps on the trunk, I'll consider adding some kind of setting to turn them off (in addition to the "ui.use_native_popup_windows" setting that already exists).
For what it's worth, I still regularly see high WindowServer CPU usage while debugging Mozilla on 10.6; just a couple of days ago it was fluctuating around 40+% while debugging. What did change for me with 10.6 is that the machine no longer becomes gradually slower and slower until it's unusable. Maybe that indicates that WindowServer wasn't the actual problem in 10.5, or maybe something in WindowServer changed in 10.6 so that it doesn't cause the same problems for the system.
...and now that I just claimed that 10.6 doesn't slow down for me...it's doing exactly that during my current debug session. Apps are currently taking about 1-10s to respond to mouse events although the mouse cursor moves just fine and text input is unaffected. *sigh* I hope this is a coincidental one-off, otherwise I'll be forced back to avoiding debugging stuff...
Could the slowdown somehow be related to stepping through painting code? That would make no sense, right? But that's what I was just doing, and it's something I've not done for a while (maybe why I've not been having this problem recently?).
Jonathan: While your hosed session is still running, could you do 'thread apply all bt' and post the results here? Also post (attach) whatever other debugging info you think might be helpful.
$ thread apply all bt
-bash: thread: command not found
One observation I've just made is that the slowdown seems to occur (or perhaps just deteriorates much more rapidly) as a result of *click* events. Maybe that would explain why some people don't encounter this [so often]? (i.e. those who tend to type gdb commands during their debug sessions will be less likely to encounter the problem, but those that tend to click on Xcode debugger buttons (as I do) would be much more likely to?)
In fact with Firefox stopped in the debugger, I can sit at a bash shell watching the output from 'top -o cpu' and see that WindowServer CPU use will increase both as a result of mouse clicks and mouse moves (mouse interaction in the shell window only). However, there seems to be a significant difference between clicks and mousemove events. With mousemoves, after I stop moving the mouse WindowServer's CPU usage returns to the level it was at prior to me moving the mouse around; with mouse clicks on the other hand, the level will not drop back down to the same level as it was at before I started clicking. As WindowServer's "idle" CPU level increases due to the clicks, the system slowdown seems to get worse.
It did take quite a lot of clicking in the shell window to produce a noticeable slowdown though, whereas during my last debug session it didn't seem to take nearly so long, and a lot fewer clicks I think. So I'm not totally sure about all this.
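To put numbers on the click vs. mousemove difference rather than eyeballing `top`, something like the following Python sketch could record WindowServer's CPU level before and after a burst of clicks. The helper names `parse_cpu` and `sample_cpu` are made up for this example; it relies only on `ps -A -o %cpu,comm`. Note that `ps` and `top` compute CPU percentages differently, so treat the values as relative.

```python
import subprocess


def parse_cpu(ps_output, process_name="WindowServer"):
    """Return the summed %CPU of all rows whose command ends with process_name."""
    total = 0.0
    found = False
    for line in ps_output.splitlines()[1:]:      # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        cpu, command = parts
        if command.strip().endswith(process_name):
            total += float(cpu)
            found = True
    return total if found else None              # None: process not listed


def sample_cpu(process_name="WindowServer"):
    """Take one sample using `ps`; %cpu and comm are standard column keywords."""
    out = subprocess.run(
        ["ps", "-A", "-o", "%cpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_cpu(out, process_name)
```

Calling `sample_cpu()` once while idle, again after moving the mouse, and again after clicking (each time waiting for the value to settle) would show whether the "idle" level really ratchets up only after clicks.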
> -bash: thread: command not found
"thread apply all bt" is a gdb command (to be run in the debugger attached to the app in question).
Created attachment 492417
thread apply all bt
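For anyone who wants to collect the same data without typing at the gdb prompt, here is a rough Python sketch that runs gdb non-interactively. It assumes a GNU gdb that supports the `-p`, `-batch`, and `-ex` options (older or vendor-modified builds may not), and that no other debugger is already attached to the process. The function names and the default output filename are hypothetical.

```python
import subprocess


def backtrace_command(pid):
    """Build a gdb invocation that attaches, dumps all threads, and exits."""
    return [
        "gdb",
        "-p", str(pid),                  # attach to the running process
        "-batch",                        # exit after running the commands
        "-ex", "thread apply all bt",    # backtrace of every thread
    ]


def dump_backtraces(pid, outfile="backtraces.txt"):
    """Run gdb and save its output so it can be attached to the bug."""
    result = subprocess.run(backtrace_command(pid), capture_output=True, text=True)
    with open(outfile, "w") as f:
        f.write(result.stdout)
    return outfile
```

If the app is already stopped in a debugger, the command has to be entered in that session instead, as described above.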
(In reply to comment #21)
> Could the slowdown somehow be related to stepping through painting code?
No, I get this often as well, and I've never debugged the painting code at all. It sometimes slows down things to a degree that I have to force reboot.
I've also noticed memory usage going up very quickly, which swaps other apps out to disk and causes a fair amount of the slowdown.
Jeff, can you please attach a patch to disable this feature for our local builds for the benefit of me and other poor souls who are suffering from this on a daily basis? Thanks!
Created attachment 492713
Disable mouse hooking
This should do the trick.
I hope this is fixed by bug 675208 which removed the mouse move event monitor.
Please reopen this bug if you run into it again.