Closed Bug 809098 Opened 12 years ago Closed 7 years ago

Find out when do the kernel or hal expect gecko to hold wake locks

Categories

(Firefox OS Graveyard :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: justin.lebar+bug, Unassigned)

References

Details

In bug 804707 comment 9, Gene showed that holding the CPU wake lock when receiving a call makes the dialer app load 5s faster.

But we don't really understand why this happens.

Understanding this is important because if the CPU is going to sleep here, perhaps it's going to sleep elsewhere as well, and perhaps there are other operations we can speed up by holding wake locks (or by modifying the kernel, or something).
This sounds weird. The only reason the CPU should go to sleep when not holding the wake lock is that the kernel's idle function gets called. This implies that there isn't any work to be done, which means that we're waiting for a timer or some type of async I/O to complete.

Furthermore, because holding the wakelock makes it faster, it also seems to imply that either the CPU is going idle a lot (in a previous world, I seem to recall typical suspend/resume times should be on the order of 100 msec - maybe they're longer here), or there is some extreme overhead in performing the suspend/resume.
I just found that the device stays in early suspend until "on" is written to "/sys/power/state".  This write is done by "set_screen_state()", which we use to turn on the screen.  I guess that writing "on" to ".../state" brings the device out of early suspend, and that some of the early-suspend resume functions bring the CPU back to full speed.

Acquiring a wakelock also seems to bring the device back from early suspend.  But the source code I have on hand shows that acquiring a wakelock does not bring the device out of the early suspend state; it only keeps the device awake.  This is weird behavior.  I guess it is a bug in the kernel source code I have on hand.  Gene is trying it.  I will study the source code further.
Gene told me that AlarmAPI has the same issue, only more seriously: the app takes even more time to wake up.  I have learned that libril grabs a wakelock for unsolicited responses.  But it holds it for a very short time, 1 second according to the code in our source tree.  I guess that is why AlarmAPI takes more time to bring apps up.  1 second is not long enough to bring our app up.  I think the original idea of holding the wakelock for 1 second is to provide a short CPU burst before going back to sleep, but it is not enough for our software stack.
I don't understand either. Ringing the phone should have the same effect as |echo on > /sys/power/state|: it should bring the kernel back from suspend or early_suspend state and allow the CPU to run at its full speed. Acquiring a wake lock or not should not affect the performance at all. Let's run a simple benchmark to verify this assumption.
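A busy-loop benchmark of the kind proposed here could be sketched as follows. This is only an illustration, not code from this bug: the function name and the 0.2 s default duration are assumptions. Absolute numbers from an interpreted loop mean little; the point is to compare the result on the same device with and without a wake lock held.

```python
import time


def loops_per_microsecond(duration_s=0.2):
    """Spin in a tight loop for about duration_s seconds.

    Returns the number of loop iterations completed per microsecond
    of wall-clock time. A throttled or frequently suspended CPU
    should yield a noticeably lower figure.
    """
    count = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        count += 1
    elapsed = time.perf_counter() - start
    return count / (elapsed * 1e6)


if __name__ == "__main__":
    print(f"{loops_per_microsecond():.2f} loops/us")
```

Because the loop never blocks, it keeps the CPU busy for its whole run, so it measures clock speed rather than the cost of repeated suspend/resume transitions.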
Could someone provide dmesg output for both cases, with and without the wakelock held, so that we can investigate from the kernel's point of view? Thanks.
None of the transitions among wake<->early_suspend<->suspend are free. The cost may be high due to improper implementations. Is it possible that certain applications do not use up all the CPU time and so cause a large number of power state transitions? That would hurt performance a lot. In that case, a CPU-bound benchmark might not be able to tell the difference between holding a wake lock and not holding one.
So this little benchmark https://gist.github.com/4030106 shows that we run at full speed whether we are in early suspend or awake. My number is about 88 loops/us.

Although the log shows something like

 [3267: kworker/u:0][cpufreq] [1]change policy freq from (1e000,c3500) to (c3500,c3500)
 [3267: kworker/u:0][cpufreq] [0]change policy freq from (c3500,c3500) to (1e000,c3500)

when entering/leaving early suspend, the actual frequency does not change.
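The numbers in those log lines look like hexadecimal kHz values: 0x1e000 is 122880 and 0xc3500 is 800000, which are the lowest and highest frequencies in the time_in_state listing in this comment. A minimal sketch for decoding such a line, assuming each pair is (min, max) for the policy (the log itself does not say so):

```python
import re

LOG_LINE = (
    "[3267: kworker/u:0][cpufreq] [1]change policy freq "
    "from (1e000,c3500) to (c3500,c3500)"
)

# Four hex numbers: the "from" pair and the "to" pair.
PAIR_RE = re.compile(
    r"from \(([0-9a-f]+),([0-9a-f]+)\) to \(([0-9a-f]+),([0-9a-f]+)\)"
)


def decode_policy_change(line):
    """Return ((old_a, old_b), (new_a, new_b)) in kHz, or None if no match.

    Treating each pair as (min, max) is an assumption, not something
    the log line states.
    """
    match = PAIR_RE.search(line)
    if match is None:
        return None
    old_a, old_b, new_a, new_b = (int(g, 16) for g in match.groups())
    return (old_a, old_b), (new_a, new_b)


if __name__ == "__main__":
    print(decode_policy_change(LOG_LINE))
    # prints ((122880, 800000), (800000, 800000))
```

Read that way, one transition raises the lower bound to 800000 kHz and the other drops it back to 122880 kHz, while the upper bound stays at 800000 kHz throughout.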

One interesting thing is that we do support cpufreq scaling but we use the "performance" governor by default.

  $ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
  122880 0
  245760 0
  320000 0
  480000 0
  600000 0
  800000 9372857

We might want to utilize this feature to make the phone more power efficient.
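To make the time_in_state output above easier to read, it can be turned into per-frequency residency fractions. The sketch below embeds the listing from this comment as sample input; the function name is an illustrative assumption. The fractions do not depend on the unit of the second column.

```python
SAMPLE = """\
122880 0
245760 0
320000 0
480000 0
600000 0
800000 9372857
"""


def residency(text):
    """Map each frequency (kHz) to the fraction of recorded time spent there."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ticks[int(parts[0])] = int(parts[1])
    total = sum(ticks.values())
    if total == 0:
        return {freq: 0.0 for freq in ticks}
    return {freq: count / total for freq, count in ticks.items()}


if __name__ == "__main__":
    for freq, share in sorted(residency(SAMPLE).items()):
        print(f"{freq:>7} kHz  {share:6.1%}")
```

On the sample, all recorded time falls in the 800000 kHz row, which is consistent with the "performance" governor pinning the CPU at its top frequency.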
(In reply to Kan-Ru Chen [:kanru] from comment #4)
> early_suspend state and allow the CPU to run at its full speed. Acquiring a
> wake lock or not should not affect the performance at all. Lets run a simple
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is an interesting topic!  What happens if libril.so releases its wakelock (1s) before b2g acquires its own?  This is very likely the case we have now.  Normally, the device should fall into suspend since the screen is off.  Apparently, it falls into a special state instead, not really suspended.  This would not be a problem if no wakelock had been acquired before the screen turned on.  But libril did acquire a lock for 1s.
So the root cause is that the phone is suspended at an unexpected time. See my comment in bug 804707.

We need to find out when do the kernel or the hal (rild, EventHub, etc) expect the upper layer (gecko) to hold wake locks.
Summary: Investigate why holding the CPU wake lock makes loading the dialer significantly faster → Find out when do the kernel or hal expect gecko to hold wake locks
(In reply to Kan-Ru Chen [:kanru] from comment #9)
> We need to find out when do the kernel or the hal (rild, EventHub, etc)
> expect the upper layer (gecko) to hold wake locks.
Actually, libril.so already does that.  It holds a wake lock (radio-interface) for 1s.  The problem is that our stack does not finish handling in 1s.  We need more time.  The info from bug 804707 confirms my guess that the modem sends the RING message periodically, so the device is woken up several times.  But I am not sure whether Alarm does the same thing.  Maybe Gene can provide more information.

I discussed this with Gene yesterday.  We should acquire a wake lock at the first possible moment, i.e. in rilproxy or ril_worker, since the source of libril.so is usually unavailable.  That wake lock should have an expiry time, for example 4s, to leave enough awake time for handling the event.
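A timed wake lock of this kind could be requested from userspace roughly as follows. This is a sketch under assumptions: it relies on the Android-kernel /sys/power/wake_lock file accepting a lock name followed by an optional timeout in nanoseconds, and the lock name "b2g-ril-incoming" is hypothetical. It is not how rilproxy or ril_worker actually do it.

```python
WAKE_LOCK_PATH = "/sys/power/wake_lock"  # assumed to exist on the device


def wake_lock_request(name, timeout_s=None):
    """Build the string to write: '<name>' or '<name> <timeout in ns>'."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("wake lock name must be non-empty without whitespace")
    if timeout_s is None:
        return name
    return f"{name} {int(timeout_s * 1_000_000_000)}"


def acquire_timed_wake_lock(name, timeout_s, path=WAKE_LOCK_PATH):
    """Write the request to the wake lock file (needs suitable permissions)."""
    with open(path, "w") as handle:
        handle.write(wake_lock_request(name, timeout_s))


if __name__ == "__main__":
    # Hypothetical lock name; 4 s matches the example in the comment above.
    print(wake_lock_request("b2g-ril-incoming", 4))
    # prints b2g-ril-incoming 4000000000
```

With a timeout the lock expires on its own, so a crashed or slow handler cannot keep the device awake indefinitely.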
|grep -rs wake_lock_timeout b2g/kernel| turns up a lot of examples.  When the kernel wakes up, the drivers handling the event acquire a wake lock with an expiry time (0.5s ~ 2s).  They expect userspace code to handle the event within that period and then let the device fall asleep again.  I think the modem driver does the same thing.  On otoro it acquires a "qcril" wake lock.
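To see which timeouts the drivers actually use, the grep output can be reduced to the lock and timeout arguments of each call. The sketch below assumes single-line calls of the form wake_lock_timeout(lock, timeout) with the timeout expressed in jiffies; the sample C lines are made up for illustration and are not taken from the b2g kernel tree.

```python
import re

# Matches single-line calls and captures the two arguments.
CALL_RE = re.compile(r"wake_lock_timeout\s*\(\s*([^,]+?)\s*,\s*(.+?)\s*\)\s*;")

# Hypothetical driver snippets, only for demonstration.
SAMPLE_C = """
    wake_lock_timeout(&example_lock, HZ / 2);
    wake_lock_timeout(&other->wlock, 2 * HZ);
"""


def find_timeouts(source):
    """Return a list of (lock expression, timeout expression) pairs."""
    return [(m.group(1), m.group(2)) for m in CALL_RE.finditer(source)]


if __name__ == "__main__":
    for lock, timeout in find_timeouts(SAMPLE_C):
        print(f"{lock:<20} {timeout}")
```

Since HZ is the number of jiffies per second, HZ / 2 and 2 * HZ in the sample correspond to the 0.5s and 2s ends of the range mentioned above.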
(In reply to Thinker Li [:sinker] from comment #10)
> (In reply to Kan-Ru Chen [:kanru] from comment #9)
> > We need to find out when do the kernel or the hal (rild, EventHub, etc)
> > expect the upper layer (gecko) to hold wake locks.
> Actually, libril.so already do that.  It hold a wake lock (radio-interface)
> for 1s.  The problem is our stack does not finish handling in 1s.  We need
> more time.

libril.so is counted as part of hal. It expects someone to react, by adding more locks, in 1s or less. Yes, rilproxy might be a good candidate.
Closing this old B2G bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX