Closed Bug 806363 Opened 7 years ago Closed 7 years ago

hal doesn't release wake locks when process crashes

Categories

(Core :: Hardware Abstraction Layer (HAL), defect)

All
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla19
blocking-basecamp +
Tracking Status
firefox18 --- fixed

People

(Reporter: justin.lebar+bug, Assigned: kanru)

References

Details

Attachments

(1 file, 2 obsolete files)

When a process crashes, it doesn't release its wake locks.  That means if a process crashes while holding the screen or CPU wake lock, we'll never release that lock!
Kan-Ru, would you be interested in taking this bug?  When the child crashes, we'll run ContentParent::ActorDestroy.  The way I might do it is to fire an observer on all shutdowns (right now we only fire an observer on crashes) and listen to that observer from the hal code; that way, we'll always clear the wake locks when the process exits, even if it's not a crash.

You'll also need to change how the wake locks are stored so you know which locks came from which processes, of course.  :)
blocking-basecamp: --- → ?
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → All
If these are the normal android wakelocks that you're talking about, then the only place that this can be dealt with properly is inside the kernel.

Firing observers or even attempting stuff from userspace will only work when the app dies in a controlled manner. If the app dies due to sigkill (say due to OOM) then there is no user-side cleanup opportunity.
> If these are the normal android wakelocks that you're talking about

I guess there are two separate things here.

One is: What happens if a content process dies?  The main process keeps a tally of how many times each wake lock has been acquired, and when a content process dies, that tally should be revised.

But then there's a second question like you say about what happens when the main process dies.  Maybe we can handle that by resetting all of the system wake locks whenever the main process starts?
blocking-basecamp: ? → +
Android wake locks are broken-by-design, we have to work around that in gecko.
(In reply to Justin Lebar [:jlebar] from comment #0)
> When a process crashes, it doesn't release its wake locks.  That means if a
> process crashes while holding the screen or CPU wake lock, we'll never
> release that lock!

Ha! I also thought about this possibility. You beat me to file the bug :)
Assignee: nobody → kchen
I'm not sure whether I'm using LinkedList correctly or not. Old test works as expected. Could I test this by crashing one or two processes purposely?
Attachment #679035 - Flags: review?(justin.lebar+bug)
Comment on attachment 679035 [details] [diff] [review]
Remove wake locks on content-shutdown. v1

>diff --git a/hal/sandbox/PHal.ipdl b/hal/sandbox/PHal.ipdl
>--- a/hal/sandbox/PHal.ipdl
>+++ b/hal/sandbox/PHal.ipdl
>@@ -147,17 +147,17 @@ parent:
>     EnableSystemTimezoneChangeNotifications();
>     DisableSystemTimezoneChangeNotifications();
> 
>     sync SetLight(LightType light, LightConfiguration aConfig)
>       returns (bool status);
>     sync GetLight(LightType light)
>       returns (LightConfiguration aConfig, bool status);
> 
>-    ModifyWakeLock(nsString aTopic, WakeLockControl aLockAdjust, WakeLockControl aHiddenAdjust);
>+    ModifyWakeLock(nsString aTopic, WakeLockControl aLockAdjust, WakeLockControl aHiddenAdjust, uint64_t aID);

aProcessID?  Although I wonder why we need this; can't we get the ID off the
ContentParent, which manages HalParent?

>diff --git a/dom/ipc/ContentParent.h b/dom/ipc/ContentParent.h
>+    uint64_t mChildID;

You should initialize this in the constructor so the value isn't left
uninitialized.  Initializing to -1 would be fine with me; 0 is apparently
already taken.

>diff --git a/hal/Hal.cpp b/hal/Hal.cpp
>--- a/hal/Hal.cpp
>+++ b/hal/Hal.cpp
>@@ -2,16 +2,17 @@
> /* vim: set sw=2 ts=8 et ft=cpp : */
> /* This Source Code Form is subject to the terms of the Mozilla Public
>  * License, v. 2.0. If a copy of the MPL was not distributed with this file,
>  * You can obtain one at http://mozilla.org/MPL/2.0/. */
> 
> #include "Hal.h"
> #include "HalImpl.h"
> #include "HalSandbox.h"
>+#include "HalWakeLock.h"
> #include "mozilla/Util.h"
> #include "nsThreadUtils.h"
> #include "nsXULAppAPI.h"
> #include "mozilla/Observer.h"
> #include "nsIDOMDocument.h"
> #include "nsIDOMWindow.h"
> #include "mozilla/Services.h"
> #include "nsIWebNavigation.h"
>@@ -648,17 +649,23 @@ UnregisterWakeLockObserver(WakeLockObser
> }
> 
> void
> ModifyWakeLock(const nsAString &aTopic,
>                WakeLockControl aLockAdjust,
>                WakeLockControl aHiddenAdjust)
> {
>   AssertMainThread();
>-  PROXY_IF_SANDBOXED(ModifyWakeLock(aTopic, aLockAdjust, aHiddenAdjust));
>+  if (InSandbox()) {
>+    if (!hal_sandbox::IsHalChildLive()) {
>+      hal_sandbox::ModifyWakeLock(aTopic, aLockAdjust, aHiddenAdjust);
>+    }
>+  } else {
>+    ModifyWakeLockParent(aTopic, aLockAdjust, aHiddenAdjust, 0);
>+  }
> }

I don't understand why you renamed hal_impl::ModifyWakeLock to
hal::ModifyWakeLock.  (Is that actually what you did?)  It looks like you did
it for a reason, though.

The flow of control through hal (i.e., hal:: --> hal_sandbox:: --> up one
process --> hal:: --> eventually to --> hal_impl::) is confusing even without
an exception added here, so I'd prefer to remain consistent, unless there's a
good reason to change things.   If there is, can you explain in a comment
somewhere what the reason is?

(Did you do the rename in an attempt to avoid the problem with nested content
processes and the ids not being globally unique?  It's not clear to me that
we're succeeding in this respect.  If we have nested processes A -> B -> C and
C acquires a wake lock, A needs to be informed somehow, but this code, if I
follow it correctly, only informs B.

If it's a pain to make this correct for nested content processes -- and it
looks to me like it is -- I'd rather that we just add an assertion and file a
bug blocking 761935.)

>diff --git a/hal/HalWakeLock.h b/hal/HalWakeLock.h
>--- a/hal/HalWakeLock.h
>+++ b/hal/HalWakeLock.h
>@@ -15,12 +15,16 @@ enum WakeLockState {
>   WAKE_LOCK_STATE_VISIBLE
> };
> 
> /**
>  * Return the wake lock state according to the numbers.
>  */
> WakeLockState ComputeWakeLockState(int aNumLocks, int aNumHidden);
> 
>+void ModifyWakeLockParent(const nsAString &aTopic,
>+                          hal::WakeLockControl aLockAdjust,
>+                          hal::WakeLockControl aHiddenAdjust,
>+                          uint64_t aID);

aProcessID, here and elsewhere?

> } // hal
> } // mozilla
> 
> #endif /* __HAL_WAKELOCK_H_ */
>diff --git a/hal/sandbox/SandboxHal.cpp b/hal/sandbox/SandboxHal.cpp
>--- a/hal/sandbox/SandboxHal.cpp
>+++ b/hal/sandbox/SandboxHal.cpp
>@@ -1,15 +1,16 @@
> /* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> /* vim: set sw=2 ts=8 et ft=cpp : */
> /* This Source Code Form is subject to the terms of the Mozilla Public
>  * License, v. 2.0. If a copy of the MPL was not distributed with this file,
>  * You can obtain one at http://mozilla.org/MPL/2.0/. */
> 
> #include "Hal.h"
>+#include "HalWakeLock.h"
> #include "mozilla/AppProcessPermissions.h"
> #include "mozilla/dom/ContentChild.h"
> #include "mozilla/hal_sandbox/PHalChild.h"
> #include "mozilla/hal_sandbox/PHalParent.h"
> #include "mozilla/dom/TabParent.h"
> #include "mozilla/dom/TabChild.h"
> #include "mozilla/dom/battery/Types.h"
> #include "mozilla/dom/network/Types.h"
>@@ -273,17 +274,17 @@ void
> DisableWakeLockNotifications()
> {
>   Hal()->SendDisableWakeLockNotifications();
> }
> 
> void
> ModifyWakeLock(const nsAString &aTopic, WakeLockControl aLockAdjust, WakeLockControl aHiddenAdjust)
> {
>-  Hal()->SendModifyWakeLock(nsString(aTopic), aLockAdjust, aHiddenAdjust);
>+  Hal()->SendModifyWakeLock(nsString(aTopic), aLockAdjust, aHiddenAdjust, dom::ContentChild::GetSingleton()->GetID());
> }
> 
> void
> GetWakeLockInfo(const nsAString &aTopic, WakeLockInformation *aWakeLockInfo)
> {
>   Hal()->SendGetWakeLockInfo(nsString(aTopic), aWakeLockInfo);
> }
> 
>@@ -700,20 +701,21 @@ public:
>   
>   void Notify(const SensorData& aSensorData) {
>     unused << SendNotifySensorChange(aSensorData);
>   }
> 
>   virtual bool
>   RecvModifyWakeLock(const nsString &aTopic,
>                      const WakeLockControl &aLockAdjust,
>-                     const WakeLockControl &aHiddenAdjust) MOZ_OVERRIDE
>+                     const WakeLockControl &aHiddenAdjust,
>+                     const uint64_t &aID) MOZ_OVERRIDE
>   {
>     // We allow arbitrary content to use wake locks.
>-    hal::ModifyWakeLock(aTopic, aLockAdjust, aHiddenAdjust);
>+    hal::ModifyWakeLockParent(aTopic, aLockAdjust, aHiddenAdjust, aID);
>     return true;
>   }
> 
>   virtual bool
>   RecvEnableWakeLockNotifications() MOZ_OVERRIDE
>   {
>     // We allow arbitrary content to use wake locks.
>     hal::RegisterWakeLockObserver(this);

>diff --git a/hal/HalWakeLock.cpp b/hal/HalWakeLock.cpp
>+namespace {
>+/**
>+ * Per-child lock count
>+ */
>+struct LockCount : public LinkedListElement<LockCount> {

Perhaps ProcessLockCount?

>+  LockCount(uint64_t aID)
>+    : mID(aID)
>+    , numLocks(0)
>+    , numHidden(0)
>+  {}
>+  uint64_t mID;

mProcessID, please.

>+  uint32_t numLocks;
>+  uint32_t numHidden;
>+};
>+typedef LinkedList<LockCount> LockList;
>+typedef nsClassHashtable<nsStringHashKey, LockList> LockTable;

It's not clear to me that we're gaining a lot by using a LinkedList<LockCount>
instead of nsTarray<LockCount>.  RemoveChildFromList is O(# processes) in both
cases.  But with LinkedList, we have manual memory management, which is a bit
scary.

LinkedList really shines when you can do an O(1) removal -- that is, you're
given |T* obj|, and you can do obj->Remove().  But if you have to search the list
to find obj, then it's not much better than a vector.

If you want fast asymptotic behavior here, you'd need to use a hashtable
instead of LockList.  I'm not convinced that's worthwhile, but I'm not going to
r- a patch for being too efficient.  :)  (But note that even with nested
hashtables, it's still O(# wake lock names), since you have to iterate over the
outer table.)

>+static int sActiveListeners = 0;
>+static StaticAutoPtr<LockTable> sLockTable;
>+static bool sInitialized = false;
>+static bool sIsShuttingDown = false;
>+
>+static PLDHashOperator
>+ClearList(const nsAString& aKey, LockList* aList, void* aUserArg)
>+{
>+  LockCount* c = aList->popFirst();
>+  while (c) {
>+    delete c;
>+    c = aList->popFirst();
>+  }

You can do

  LockCount* c;
  while (c = aList->popFirst()) {
    delete c;
  }

>+class CleanupOnContentShutdown MOZ_FINAL : public nsIObserver {
>+public:
>+  NS_DECL_ISUPPORTS
>+  NS_DECL_NSIOBSERVER
>+};
>+
>+NS_IMPL_ISUPPORTS1(CleanupOnContentShutdown, nsIObserver)
>+
>+NS_IMETHODIMP
>+CleanupOnContentShutdown::Observe(nsISupports* aSubject, const char* aTopic, const PRUnichar* data)
>+{
>+  MOZ_ASSERT(!strcmp(aTopic, "ipc:content-shutdown"));
>+
>+  nsCOMPtr<nsIPropertyBag2> props = do_QueryInterface(aSubject);
>+  if (!props) {
>+    NS_WARNING("ipc:content-shutdown message without property bag as subject");
>+    return NS_OK;
>+  }
>+
>+  uint64_t childID = 0;
>+  nsresult rv = props->GetPropertyAsUint64(NS_LITERAL_STRING("childID"),
>+                                           &childID);
>+  if (NS_SUCCEEDED(rv)) {
>+    sLockTable->EnumerateRead(RemoveChildFromList, &childID);
>+  } else {
>+    NS_WARNING("ipc:content-shutdown message without childID property");
>+  }
>+  return NS_OK;
>+}

I don't see you registering this observer anywhere.

>+void
>+GetWakeLockInfo(const nsAString &aTopic, WakeLockInformation *aWakeLockInfo)
>+{
>+  if (sIsShuttingDown) {
>+    NS_WARNING("You don't want to get wake lock information during xpcom-shutdown!");
>+    return;
>+  }
>+  if (!sInitialized) {
>+    Init();
>+  }
>+
>+  uint32_t numLocks = 0;
>+  uint32_t numHidden = 0;
>+  LockList* list = sLockTable->Get(aTopic);
>+  if (!list) {
>+    aWakeLockInfo->numLocks() = 0;
>+    aWakeLockInfo->numHidden() = 0;
>+    aWakeLockInfo->topic() = aTopic;
>+    return;
>+  }
>+  for (LockCount* c = list->getFirst(); c != NULL; c = c->getNext()) {

s/NULL/nullptr/, here and elsewhere.

Maybe move the if statement above so it wraps this for loop so we don't have to
repeat code?  Or you could do

    for (LockCount* c = list ? list->getFirst() : nullptr; ...)

which is probably too clever for its own good.  :)
Attachment #679035 - Flags: review?(justin.lebar+bug) → review-
(In reply to Justin Lebar [:jlebar] from comment #7)
> Comment on attachment 679035 [details] [diff] [review]
> Remove wake locks on content-shutdown. v1
> 
> >diff --git a/hal/sandbox/PHal.ipdl b/hal/sandbox/PHal.ipdl
> >--- a/hal/sandbox/PHal.ipdl
> >+++ b/hal/sandbox/PHal.ipdl
> >@@ -147,17 +147,17 @@ parent:
> >     EnableSystemTimezoneChangeNotifications();
> >     DisableSystemTimezoneChangeNotifications();
> > 
> >     sync SetLight(LightType light, LightConfiguration aConfig)
> >       returns (bool status);
> >     sync GetLight(LightType light)
> >       returns (LightConfiguration aConfig, bool status);
> > 
> >-    ModifyWakeLock(nsString aTopic, WakeLockControl aLockAdjust, WakeLockControl aHiddenAdjust);
> >+    ModifyWakeLock(nsString aTopic, WakeLockControl aLockAdjust, WakeLockControl aHiddenAdjust, uint64_t aID);
> 
> aProcessID?  Although I wonder why we need this; can't we get the ID off the
> ContentParent, which manages HalParent?

Yes, we could. The id was only known to the ContentChild, but my
patch adds it to the ContentParent so I guess it is possible.

> > void
> > ModifyWakeLock(const nsAString &aTopic,
> >                WakeLockControl aLockAdjust,
> >                WakeLockControl aHiddenAdjust)
> > {
> >   AssertMainThread();
> >-  PROXY_IF_SANDBOXED(ModifyWakeLock(aTopic, aLockAdjust, aHiddenAdjust));
> >+  if (InSandbox()) {
> >+    if (!hal_sandbox::IsHalChildLive()) {
> >+      hal_sandbox::ModifyWakeLock(aTopic, aLockAdjust, aHiddenAdjust);
> >+    }
> >+  } else {
> >+    ModifyWakeLockParent(aTopic, aLockAdjust, aHiddenAdjust, 0);
> >+  }
> > }
> 
> I don't understand why you renamed hal_impl::ModifyWakeLock to
> hal::ModifyWakeLock.  (Is that actually what you did?)  It looks like you did
> it for a reason, though.

I shuffled the functions and namespaces, so the patch looks a bit
messy. hal::ModifyWakeLock is still the dispatcher in Hal.cpp,
hal_impl::ModifyWakeLock is undefined. I added a new function
hal::ModifyWakeLockParent. This is moot if we get the child process id
from ContentParent.

> >+  uint32_t numLocks;
> >+  uint32_t numHidden;
> >+};
> >+typedef LinkedList<LockCount> LockList;
> >+typedef nsClassHashtable<nsStringHashKey, LockList> LockTable;
> 
> It's not clear to me that we're gaining a lot by using a
> LinkedList<LockCount>
> instead of nsTarray<LockCount>.  RemoveChildFromList is O(# processes) in
> both   
> cases.  But with LinkedList, we have manual memory management, which is a bit
> scary.

I agree that manual memory management is not very attractive for
LinkedList :)
 
> LinkedList really shines when you can do an O(1) removal -- that is, you're
> given |T* obj|, and you can do obj->Remove().  But if you have to search the
> list
> to find obj, then it's not much better than a vector.
> 
> If you want fast asymptotic behavior here, you'd need to use a hashtable
> instead of LockList.  I'm not convinced that's worthwhile, but I'm not going
> to
> r- a patch for being too efficient.  :)  (But note that even with nested
> hashtables, it's still O(# wake lock names), since you have to iterate over
> the
> outer table.)

I tend to avoid hashtables and vectors when the key space is usually
small but could grow non-deterministically. Do our hashtables and
nsTArray shrink their size?

> >+static int sActiveListeners = 0;
> >+static StaticAutoPtr<LockTable> sLockTable;
> >+static bool sInitialized = false;
> >+static bool sIsShuttingDown = false;
> >+
> >+static PLDHashOperator
> >+ClearList(const nsAString& aKey, LockList* aList, void* aUserArg)
> >+{
> >+  LockCount* c = aList->popFirst();
> >+  while (c) {
> >+    delete c;
> >+    c = aList->popFirst();
> >+  }
> 
> You can do
> 
>   LockCount* c;
>   while (c = aList->popFirst()) {
>     delete c;
>   }

Ah, old habit to avoid assignment in condition. Looks like you are ok
with that :)

> >+  for (LockCount* c = list->getFirst(); c != NULL; c = c->getNext()) {
> 
> s/NULL/nullptr/, here and elsewhere.

Oops, copied from LinkedList.h

Thanks for the review, I'll try to make the patch smaller.
(In reply to Kan-Ru Chen [:kanru] from comment #8)
> > You can do
> > 
> >   LockCount* c;
> >   while (c = aList->popFirst()) {
> >     delete c;
> >   }
> 
> Ah, old habit to avoid assignment in condition. Looks like you are ok
> with that :)
> 

That's borderline bad style.  However,

  while (LockCount* c = aList->popFirst()) {
    delete c;
  }

is even more concise and good style! ;)
> Do our hashtables and nsTArray shrink their size?

Good question!

Looks like nsTArray doesn't shrink itself unless you remove all its elements;
the relevant function is ShiftData, in nsTArray-inl.h.  Maybe we should fix
that, although it's of course tricky to do without thrashing.  You can
definitely call Compact(), though.

It looks like PLDHash, which backs most of our hashtables, does shrink itself.
Search for "Shrink if" in pldhash.cpp.

> I tend to avoid hashtables and vectors when the key space is usually
> small but could grow non-deterministically. 

I'm totally in favor of using a linked list if it gives us asymptotic gains
over a vector / hashtable.  But I don't think any operations are asymptotically
faster here, so I think the case for LinkedList with manual memory management
is more tenuous.
Changed to use hashtable.

hal::ModifyWakeLock calls hal::ModifyWakeLockInternal and passes aProcessID to it so that in a nested processes setup, the leaf child process id will be passed along to chrome process.
Attachment #679035 - Attachment is obsolete: true
Attachment #680561 - Flags: review?(justin.lebar+bug)
This looks great to me.

>diff --git a/hal/Hal.cpp b/hal/Hal.cpp
> void
>-ModifyWakeLock(const nsAString &aTopic,
>+ModifyWakeLock(const nsAString& aTopic,
>                WakeLockControl aLockAdjust,
>                WakeLockControl aHiddenAdjust)
> {
>   AssertMainThread();
>-  PROXY_IF_SANDBOXED(ModifyWakeLock(aTopic, aLockAdjust, aHiddenAdjust));
>+  ModifyWakeLockInternal(aTopic, aLockAdjust, aHiddenAdjust,
>+                         InSandbox() ? dom::ContentChild::GetSingleton()->GetID() : 0);
> }
> 
> void
>-GetWakeLockInfo(const nsAString &aTopic, WakeLockInformation *aWakeLockInfo)
>+ModifyWakeLockInternal(const nsAString& aTopic,
>+                       WakeLockControl aLockAdjust,
>+                       WakeLockControl aHiddenAdjust,
>+                       uint64_t aProcessID)
>+{
>+  AssertMainThread();
>+  PROXY_IF_SANDBOXED(ModifyWakeLockInternal(aTopic, aLockAdjust, aHiddenAdjust, aProcessID));
>+}

This is a lot clearer now.  Thanks.

This will not do the right thing if we have nested content processes, because
the IDs are not globally unique; they're unique only among sibling processes.
(See the contortions the vibrate code goes through to avoid this problem.)

I'm fine having this bug in this code -- we have other problems once we have
nested content processes.  But rather than doing the wrong thing silently, can
we add an assertion somehow?  (Maybe you could change hal::ModifyWakeLock to
proxy if sandboxed; then hal::ModifyWakeLockInternal can assert that it's
called only from the main process.)

I'd also like us to file a follow-up bug on this and mark it blocking bug
761935.

r=me with the added assertion!
sLockTable was cleaned twice on chrome shutdown.

Try again: https://tbpl.mozilla.org/?tree=Try&rev=b779c9f8c36d
patch to commit.
Attachment #680561 - Attachment is obsolete: true
Attachment #680561 - Flags: review?(justin.lebar+bug)
Attachment #682754 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/fc4e7e3a7407
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
This patch doesn't seem to notify observers in the parent that the wakelock has gone away.
(In reply to Dave Hylands [:dhylands] from comment #19)
> This patch doesn't seem to notify observers in the parent that the wakelock
> has gone away.

If this doesn't work properly, please file a new bug and mark it as blocking this bug.
Depends on: 814822
Blocks: 785124
You need to log in before you can comment on or make changes to this bug.