Closed Bug 773463 Opened 8 years ago Closed 8 years ago

devicemanagerSUT can use nearly 100% CPU while waiting for a response to a command

Categories

(Testing :: Mozbase, defect)


Tracking

(Not tracked)

RESOLVED FIXED
mozilla17

People

(Reporter: wlach, Assigned: wlach)

Details

Attachments

(1 file)

While waiting for a response to a command, dmSUT just keeps calling recv() repeatedly, resulting in the CPU spinning like crazy. This throws off things like Eideticker, which needs a lot of system resources while capturing. I'm sure it doesn't help the foopies in production either.

The solution is to use a simple select() so we only try reading when data is available (or after a timeout).
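
A minimal sketch of the idea, assuming a blocking socket like the one devicemanagerSUT keeps in self._sock (the helper name and buffer size here are illustrative, not the actual patch):

    import select

    def recv_when_ready(sock, timeout=1.0, bufsize=1024):
        # Block in select() until the socket is readable or the timeout
        # expires, instead of calling recv() in a tight loop.
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return None  # timed out; the caller can retry or give up
        return sock.recv(bufsize)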
Assignee: nobody → wlachance
Comment on attachment 641618 [details] [diff] [review]
Use select to only try reading from socket when it's ready for reading (or we timeout)

Review of attachment 641618 [details] [diff] [review]:
-----------------------------------------------------------------

Have you tested this on try? Have you verified that this resolves the CPU usage issue?

::: build/mobile/devicemanagerSUT.py
@@ +211,5 @@
>            if (self.debug >= 4): print "recv'ing..."
>  
>            # Get our response
>            try:
> +            ready = select.select([self._sock], [], [], 1) # Wait up to a second for socket to become ready for reading...

We assign ready, but do we use it? Can we put the comment above this line so we don't have a long wrapped line?

Do you think 1 second is enough?
Attachment #641618 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #2)
> Comment on attachment 641618 [details] [diff] [review]
> Use select to only try reading from socket when it's ready for reading (or
> we timeout)
> 
> Review of attachment 641618 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Have you tested this on try? Have you verified that this resolves the CPU usage issue?

I haven't pushed to try yet, but I have verified that this solves the CPU issue, at least in Eideticker (which is probably the most sensitive to it).
 
> ::: build/mobile/devicemanagerSUT.py
> @@ +211,5 @@
> >            if (self.debug >= 4): print "recv'ing..."
> >  
> >            # Get our response
> >            try:
> > +            ready = select.select([self._sock], [], [], 1) # Wait up to a second for socket to become ready for reading...
> 
> We assign ready, but do we use it? Can we put the comment above this line so
> we don't have a long wrapped line?

Good points. Addressed in a followup patch that I pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=e6794e68e286

> Do you think 1 second is enough?

I think it should be ok. We iterate through the containing loop up to 1000 times until we're done, so between that and the select timeout I think we should be more than fine (another way of looking at it: this is even safer than it was before).
(note: the try run completed everything except for a few crashtests/reftests, but we're fairly confident this will work)
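
Roughly, the bound being described: each pass through the loop now waits at most one second in select() before retrying, and the loop itself is capped, so a hung agent can no longer spin the CPU. A sketch of that structure (the iteration cap, buffer size, and end-of-response marker are illustrative, not the actual devicemanagerSUT code):

    import select

    def collect_response(sock, max_iterations=1000, bufsize=1024):
        data = b""
        for _ in range(max_iterations):
            # Worst case is roughly max_iterations seconds of waiting,
            # since each select() call times out after one second.
            readable, _, _ = select.select([sock], [], [], 1)
            if not readable:
                continue  # nothing to read yet; try again
            chunk = sock.recv(bufsize)
            if not chunk:
                break  # the agent closed the connection
            data += chunk
            if data.endswith(b"$>\x00"):  # illustrative end-of-response marker
                break
        return data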
Didn't make it to mozilla-central before the uplift (merge was blocked on bug 774259). Adjusting milestone accordingly.
Target Milestone: --- → mozilla17
https://hg.mozilla.org/mozilla-central/rev/c1a1b8100952
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I was looking through DeviceManager changes recently and took a closer look at this bug. It doesn't make any sense. Our sockets are in blocking mode, so if there is no data, recv() should block rather than return. Spinning should only be possible if (a) the connection is closed (recv() would return an empty string immediately) or (b) the socket is in non-blocking mode (in which case it would raise an exception if there's no data).

We have since added a fix for (a), which was being ignored before, and (b) is not applicable. This patch is still useful, though, in the case where the agent, or some shell command on the agent, has frozen or is otherwise taking too long. But I would love to investigate the root cause of this bug more closely.
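
For reference, the recv() behaviours being described, sketched as a standalone helper (not DeviceManager code):

    import errno
    import socket

    def recv_once(sock, bufsize=1024):
        # Blocking socket, no data: recv() blocks rather than returning.
        # Peer closed the connection: recv() returns an empty string/bytes.
        # Non-blocking socket, no data: recv() raises socket.error with
        # EAGAIN/EWOULDBLOCK.
        try:
            data = sock.recv(bufsize)
        except socket.error as e:
            if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                return None  # non-blocking and nothing to read yet
            raise
        return data  # an empty result means the peer closed the connection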