Closed Bug 970307 Opened 10 years ago Closed 10 years ago

Preload in Nuwa process

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

2.2 S2 (19dec)

People

(Reporter: kk1fff, Assigned: kk1fff)

References

Details

Attachments

(5 files, 28 obsolete files)

WIP: find a point for Nuwa process to start to freeze 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 13.33 KB, patch		Details \| Diff \| Splinter Review
WIP 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 19.25 KB, patch		Details \| Diff \| Splinter Review
WIP 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 21.21 KB, patch		Details \| Diff \| Splinter Review
WIP 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 35.02 KB, patch		Details \| Diff \| Splinter Review
Part 1: Collect thread state in nsThreadStatusManager 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 10.16 KB, patch		Details \| Diff \| Splinter Review
Part 2: Report status of each threads. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 11.92 KB, patch		Details \| Diff \| Splinter Review
Part 3: preload in Nuwa and wait until loading finish. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 7.70 KB, patch		Details \| Diff \| Splinter Review
Part 4: copy mSendPermissionUpdates when copying ContentParent 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 2.38 KB, patch		Details \| Diff \| Splinter Review
Patch: Rebuild Parent/Child connection in modules 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 8.86 KB, patch		Details \| Diff \| Splinter Review
Part 1: Collect thread state in nsThreadStatusManager 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 10.09 KB, patch		Details \| Diff \| Splinter Review
Part 2: Report threads' status to nsThreadStatusManager 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 14.81 KB, patch		Details \| Diff \| Splinter Review
Part 1: Collect thread state in nsThreadStatusManager 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 10.71 KB, patch		Details \| Diff \| Splinter Review
Part 2: Report threads' status to nsThreadStatusManager 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 14.82 KB, patch		Details \| Diff \| Splinter Review
Part 1: Collect thread state in nsThreadStatusManager 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 10.64 KB, patch	froydnj : feedback+	Details \| Diff \| Splinter Review
Part 2: Report threads' status to nsThreadStatusManager 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 14.80 KB, patch		Details \| Diff \| Splinter Review
Part 3: make nuwa wait for CanFreeze message and preload in nuwa. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 12.34 KB, patch	cyu : review-	Details \| Diff \| Splinter Review
Part 4: reinitialize AppServiceChild after a new process forked. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 8.94 KB, patch		Details \| Diff \| Splinter Review
Part 1: report status of each thread. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 23.87 KB, patch	froydnj : feedback+	Details \| Diff \| Splinter Review
Part 2: let nuwa wait for all tasks of preload to be done. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 13.42 KB, patch		Details \| Diff \| Splinter Review
Part 3: reinitialize some module after forking. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 9.74 KB, patch	fabrice : review+	Details \| Diff \| Splinter Review
Part 1: report status of each thread. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 30.95 KB, patch	froydnj : feedback+	Details \| Diff \| Splinter Review
Part 2: let nuwa wait for all tasks of preload to be done. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 16.00 KB, patch		Details \| Diff \| Splinter Review
Part 1: report status of each thread. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 32.92 KB, patch		Details \| Diff \| Splinter Review
Part 2: let nuwa wait for all tasks of preload to be done. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 16.01 KB, patch	cyu : feedback+	Details \| Diff \| Splinter Review
Part 1: report status of each thread. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 32.92 KB, patch	froydnj : feedback+	Details \| Diff \| Splinter Review
Part 1: report status of each thread. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 32.93 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 4: bump the leak check threshold 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 1.39 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Part 2: let nuwa wait for all tasks of preload to be done. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 15.75 KB, patch	cyu : review+	Details \| Diff \| Splinter Review
Part 1: report status of each thread. (r=nfroyd) 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 33.06 KB, patch	kk1fff : review+	Details \| Diff \| Splinter Review
Part 2: let nuwa wait for all tasks of preload to be done. (r=cyu) 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 15.81 KB, patch	kk1fff : review+	Details \| Diff \| Splinter Review
Part 3: reinitialize some module after forking. (r=fabrice) 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 9.28 KB, patch		Details \| Diff \| Splinter Review
Part 1b: fix dead lock. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 9.29 KB, patch	froydnj : feedback+	Details \| Diff \| Splinter Review
Part 1b: fix dead lock. 10 years ago Patrick Wang (Chih-Kai Wang) (:kk1fff) 14.42 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Description

•

10 years ago

Receiving for some IPC message after Nuwa is ready could make Nuwa frozen totally, we will need a mechanism to determine when to start to freeze Nuwa, so we can preload whole preload.js in Nuwa process.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 1

•

10 years ago

Attached patch WIP: find a point for Nuwa process to start to freeze (obsolete) — Details — Splinter Review

The basic idea of this patch is to find a point that is safe for Nuwa process to freeze. If there are IPC messages that cause Nuwa's main thread try to dispatch event to some frozen thread, it may acquire a lock that will never be released, and then run into a deadlock.

If we can find a point after when there's no more IPC messages delivered to Nuwa, we can make Nuwa freeze after that point. This patch tests all event loops in chrome process, try to find a point when event loops of all threads are empty. This means that all tasks that started by the message that Nuwa was sent previously are done, and since Nuwa is started before we have network access, we can ignore the possibility of receiving message from network in Nuwa.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

Depends on: 999323

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 2

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Attachment #8403892 - Attachment is obsolete: true

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 3

•

10 years ago

Test with current WIP patch:

Preload in Nuwa process:
                            |     megabytes     |
           NAME   PID  PPID CPU(s) NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
            b2g 28250     1   25.9    0 52.1 58.6 71.2 146.3       0 root
         (Nuwa) 28302 28250    1.6    0  3.8 10.5 25.4  55.7       0 root
     Homescreen 28444 28302    6.0    1 11.6 16.6 30.2  81.0       2 app_28444
Built-in Keyboa 28514 28302    0.7   18  5.8  9.5 21.8  61.8      10 app_28514
(Preallocated a 28536 28302    0.1   18  3.2  5.7 14.1  58.7      10 root

Preload in preallocated process:
                            |     megabytes     |
           NAME   PID  PPID CPU(s) NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
            b2g 28994     1   27.0    0 52.3 57.9 70.2 146.7       0 root
         (Nuwa) 29034 28994    1.1    0  3.9  8.9 20.9  52.5       0 root
     Homescreen 29047 29034    4.4    1 12.5 17.0 30.4  82.7       2 app_29047
Built-in Keyboa 29121 29034    1.5   18  7.3 11.1 23.7  70.3      10 app_29121
(Preallocated a 29136 29034    0.9   18  5.5  8.6 19.9  58.5      10 root

It saves about 2MB in preallocated process and 0.9 ~ 1.5MB in homescreen and keyboard.

[:fabrice] Fabrice Desré

Comment 4

•

10 years ago

Awesome Patrick! Let's get that reviewed and landed on m-c as soon as possible to be sure we are not regressing.

Gabriele Svelto [:gsvelto]

Comment 5

•

10 years ago

(In reply to Patrick Wang (Chih-Kai Wang) (:kk1fff) from comment #3)
> It saves about 2MB in preallocated process and 0.9 ~ 1.5MB in homescreen and
> keyboard.

The savings in the homescreen app seems genuine but the keyboard has a different RSS between the tests so that might account for the difference you're seeing (it's my understanding that this patch should affect the USS values only, not the RSS ones).

In my previous testing I often used ./tools/get_about_memory.py --minimize before looking at USS values to make sure that the RSS were as close as possible between the different runs.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

No longer depends on: 941466

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 6

•

10 years ago

The WIP has issues with timer thread, so it will slow down boot process. Still fixing.

Ting-Yu Chou [:ting] (away)

Comment 7

•

10 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #5)
> In my previous testing I often used ./tools/get_about_memory.py --minimize
> before looking at USS values to make sure that the RSS were as close as
> possible between the different runs.

Good tips, thanks for sharing.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 8

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Check tasks in ThreadPools.

Attachment #8410453 - Attachment is obsolete: true

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 9

•

10 years ago

Test is failure on emulator, still need to deal with worker thread.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

Depends on: 1006336

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 10

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Handle worker threads.
The test cases start to be able to run, but there are still test failures. I think observing thread task queue with thread observer would be more clear in this case, I am going to change the approach of this patch.

Attachment #8413602 - Attachment is obsolete: true

Cervantes Yu [:cyu] [:cervantes]

Updated

•

10 years ago

Comment 11

•

10 years ago

Attached patch Part 1: Collect thread state in nsThreadStatusManager (obsolete) — Details — Splinter Review

Attachment #8417999 - Attachment is obsolete: true

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 12

•

10 years ago

Attached patch Part 2: Report status of each threads. (obsolete) — Details — Splinter Review

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 13

•

10 years ago

Attached patch Part 3: preload in Nuwa and wait until loading finish. (obsolete) — Details — Splinter Review

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 14

•

10 years ago

Attached patch Part 4: copy mSendPermissionUpdates when copying ContentParent (obsolete) — Details — Splinter Review

If permission manager is initialized in Nuwa, mSendPermissionUpdates will be set but the value will not be copied to the new ContentParent when Nuwa forks a new process. Thus the new child will not reveice the permission update from chrome.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 15

•

10 years ago

Still some issues due to initialize modules before fork need to be addressed case by case.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

Depends on: 1032125

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 16

•

10 years ago

Attached patch Patch: Rebuild Parent/Child connection in modules (obsolete) — Details — Splinter Review

Some modules register callbacks to parent when child initialize. If a module's child part is initialized in Nuwa, and it registers callback in parent side, the callbacks won't be clone in parent. The child side need to re-register the callbacks after fork.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 17

•

10 years ago

Attached patch Part 1: Collect thread state in nsThreadStatusManager (obsolete) — Details — Splinter Review

nsThreadStatusManager as a singleton object that each thread report its status (idle/busy) to. It notifies listeners when all of the watched threads become idle.

Attachment #8438988 - Attachment is obsolete: true

Attachment #8455158 - Flags: feedback?(bent.mozilla)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 18

•

10 years ago

Attached patch Part 2: Report threads' status to nsThreadStatusManager (obsolete) — Details — Splinter Review

Attachment #8438989 - Attachment is obsolete: true

Attachment #8455159 - Flags: feedback?(bent.mozilla)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 19

•

10 years ago

Attached patch Part 1: Collect thread state in nsThreadStatusManager (obsolete) — Details — Splinter Review

Attachment #8455158 - Attachment is obsolete: true

Attachment #8455158 - Flags: feedback?(bent.mozilla)

Attachment #8463923 - Flags: feedback?(bent.mozilla)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 20

•

10 years ago

Attached patch Part 2: Report threads' status to nsThreadStatusManager (obsolete) — Details — Splinter Review

Attachment #8455159 - Attachment is obsolete: true

Attachment #8455159 - Flags: feedback?(bent.mozilla)

Attachment #8463924 - Flags: feedback?(bent.mozilla)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 21

•

10 years ago

Attached patch Part 1: Collect thread state in nsThreadStatusManager (obsolete) — Details — Splinter Review

Attachment #8463923 - Attachment is obsolete: true

Attachment #8463923 - Flags: feedback?(bent.mozilla)

Attachment #8478208 - Flags: review?(bent.mozilla)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 22

•

10 years ago

Attached patch Part 2: Report threads' status to nsThreadStatusManager (obsolete) — Details — Splinter Review

Attachment #8463924 - Attachment is obsolete: true

Attachment #8463924 - Flags: feedback?(bent.mozilla)

Attachment #8478211 - Flags: review?(bent.mozilla)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 23

•

10 years ago

Attached patch Part 3: make nuwa wait for CanFreeze message and preload in nuwa. (obsolete) — Details — Splinter Review

Attachment #8438991 - Attachment is obsolete: true

Attachment #8453668 - Attachment is obsolete: true

Attachment #8478212 - Flags: review?(khuey)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 24

•

10 years ago

Attached patch Part 4: reinitialize AppServiceChild after a new process forked. (obsolete) — Details — Splinter Review

Attachment #8446304 - Attachment is obsolete: true

Attachment #8478213 - Flags: review?(fabrice)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 25

•

10 years ago

Comment on attachment 8478212 [details] [diff] [review]
Part 3: make nuwa wait for CanFreeze message and preload in nuwa.

stop preloading protocol-proxy-service since dispatches runnables inside nuwa, which can sometimes frozen Nuwa.

[:fabrice] Fabrice Desré

Comment 26

•

10 years ago

Comment on attachment 8478213 [details] [diff] [review]
Part 4: reinitialize AppServiceChild after a new process forked.

Review of attachment 8478213 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly nits, but I'd really like us to use better names. That helps understanding what's happening!

::: dom/ipc/jar.mn
@@ +8,5 @@
>          content/global/BrowserElementChild.js (../browser-element/BrowserElementChild.js)
>          content/global/BrowserElementChildPreload.js (../browser-element/BrowserElementChildPreload.js)
>  *       content/global/BrowserElementPanning.js (../browser-element/BrowserElementPanning.js)
>          content/global/preload.js (preload.js)
> +        content/global/preload2.js (preload2.js)

I'm not a fan on the preload2.js name. Could we use a more explicit one. If we need to rename preload.js too that's fine.

::: dom/ipc/preload2.js
@@ +11,5 @@
> +  "use strict";
> +
> +  let Cu = Components.utils;
> +  let Cc = Components.classes;
> +  let Ci = Components.interfaces;

you don't need Cc and Ci. Actually, you're using Cu only once so you can just do Components.utils.import(...)

@@ +14,5 @@
> +  let Cc = Components.classes;
> +  let Ci = Components.interfaces;
> +
> +  Cu.import("resource://gre/modules/AppsServiceChild.jsm");
> +  dump("TEST: preload2.js");

remove before landing.

Attachment #8478213 - Flags: review?(fabrice)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 28

•

10 years ago

bent, are you going to review these anytime soon?  Can we redirect this to froydnj?

Flags: needinfo?(bent.mozilla)

Andrew Overholt [:overholt]

Comment 29

•

10 years ago

Comment on attachment 8478212 [details] [diff] [review]
Part 3: make nuwa wait for CanFreeze message and preload in nuwa.

Cervantes, can you take this review?

Attachment #8478212 - Flags: review?(khuey) → review?(cyu)

Andrew Overholt [:overholt]

Comment 30

•

10 years ago

Comment on attachment 8478208 [details] [diff] [review]
Part 1: Collect thread state in nsThreadStatusManager

Nathan, it was suggested to me that you could perhaps take this review.

Attachment #8478208 - Flags: review?(bent.mozilla) → review?(nfroyd)

Andrew Overholt [:overholt]

Comment 31

•

10 years ago

Comment on attachment 8478211 [details] [diff] [review]
Part 2: Report threads' status to nsThreadStatusManager

Nathan, it was suggested to me that you could perhaps take this review.

Attachment #8478211 - Flags: review?(bent.mozilla) → review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 32

•

10 years ago

Comment on attachment 8478208 [details] [diff] [review]
Part 1: Collect thread state in nsThreadStatusManager

Review of attachment 8478208 [details] [diff] [review]:
-----------------------------------------------------------------

The locking problems and the separate class for managing thread statuses are the big issues in the comments below.

I'm also not a fan of having to have this functionality on all the time, for all platforms, when we only need it for Nuwa.  We're also going to have this idleness stuff running all the time even after we freeze, which is pure overhead.  Is there any way we can make this monitoring either apply only for Nuwa purposes and/or turn itself off once we freeze?  (I guess the turning off is what NotifyIfAllThreadsIdle is trying to do, but we still have the overhead of looking up threads and setting them idle or not.)

::: xpcom/threads/nsIThreadStatusListener.idl
@@ +2,5 @@
> +
> +[scriptable, uuid(b9d16aee-065e-4d69-8c5b-de5c4e60210d)]
> +interface nsIThreadStatusListener : nsISupports
> +{
> +  void onAllThreadIdle();

This needs some comments.  It also needs a license header, modelines, and a big warning that this notification really only means that all "normal" threads--and it would be great to have a concrete definition for what you want to call "normal"--were idle at some point in the past, not that they are all idle now.

Actually, why does this even need to be scriptable?  And if it doesn't need to be scriptable, why do we even need IDL for it?  Can't it just be an abstract class in a header somewhere?

::: xpcom/threads/nsThreadStatusManager.cpp
@@ +1,1 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*- */

Please use two-space indentation as per Gecko style.

@@ +18,5 @@
> +{
> +public:
> +    NotifyListenersRunnable() { }
> +
> +    virtual ~NotifyListenersRunnable() { }

Protected, please.

@@ +34,5 @@
> +} // anonymous namespace
> +
> +struct ThreadInfoHolder
> +{
> +    ThreadInfoHolder(nsThread* thread)

Arguments are prefixed with an 'a', so |aThread|.  Please fix this and other arguments throughout the patch.

@@ +84,5 @@
> +        }
> +    }
> +
> +    if (idle) {
> +        NS_DispatchToMainThread(new NotifyListenersRunnable(), NS_DISPATCH_NORMAL);

Please hold an nsRefPtr to the runnable over this call.

@@ +143,5 @@
> +nsThreadStatusManager::AddListener(nsIThreadStatusListener* listener)
> +{
> +    NS_ASSERTION(NS_IsMainThread(), "Need to be run on main thread");
> +    if (!mListeners.Contains(listener)) {
> +        mListeners.AppendElement(listener);

Not locking prior to modification means that you can have multiple threads race to add listeners, but not all of them succeed.  Not good.  Please fix.

@@ +157,5 @@
> +    NS_ASSERTION(NS_IsMainThread(), "Need to be run on main thread");
> +    mListeners.RemoveElement(listener);
> +    if (mListeners.Length() == 0) {
> +        MutexAutoLock lock(mLock);
> +        mWatching = false;

It's not obvious to me what the rules for examining or modifying |mListeners| under the lock here are.  Some places you do, some places you don't.  And then you have this |mWatching| variable which is supposed to represent |mListeners.Length() == 0|.  But duplicating information like this, and examining some parts under a lock and some parts not, is a recipe for races:

1. AddListener on some thread, prepare to AppendElement, context switch to a different thread;
2. RemoveListener checks mListeners.Length() == 0, prepare to lock |mLock|, context switch.
3. AddListener appends, locks |mLock|, sets |mWatching| to true, context switch.
4. Remove Listener locks |mLock|, sets |mWatching| to false.

Now you have a situation where |mWatching| is false|, but you have listeners in |mListeners|.  Not good.

Please eliminate mWatching and clean up the locking.

::: xpcom/threads/nsThreadStatusManager.h
@@ +16,5 @@
> +class nsThread;
> +class nsThreadPool;
> +
> +class ThreadInfoHolder;
> +class nsThreadStatusManager

We should not have another class that does similar things to the existing nsThreadManager.  All the functionality exposed here should be merged into nsThreadManager.

@@ +21,5 @@
> +{
> +public:
> +  ~nsThreadStatusManager();
> +  enum ThreadType {
> +    ThreadTypeNormal,

Some documentation on what exactly these different thread types are and when they're meant to be used would be good.

@@ +22,5 @@
> +public:
> +  ~nsThreadStatusManager();
> +  enum ThreadType {
> +    ThreadTypeNormal,
> +    ThreadTypeThreadpool,

I don't see this used in the places I'd expect it to be used in the second part of this series.  Is this even needed anymore?

@@ +30,5 @@
> +    static nsThreadStatusManager tsm;
> +    return &tsm;
> +  }
> +
> +  void SetThreadType(ThreadType type);

Might be a little nicer on the API client side to say:

SetThreadTypeIgnored() { SetThreadType(ThreadTypeIgnored); }

and so on for each of the types, and then SetThreadType becomes an internal function, and the enum becomes private.  That way, you don't have to qualify the ThreadType enums at the callsites.  (Also prevents clients from accidentally passing bogus data.)

@@ +32,5 @@
> +  }
> +
> +  void SetThreadType(ThreadType type);
> +  void SetThreadIdleStatus(bool isIdle);
> +  void SetThreadIdleStatus(nsIThread *thread, bool isIdle);

This API should not take a boolean argument, it should take an enum instead.  It may also be worthwhile to have it be instead:

SetThreadStatusIdle()
SetThreadStatusWorking()

or similar, so you don't have to qualify the enum name at the callsite.

@@ +39,5 @@
> +  void OnThreadExit(nsThread* thread);
> +
> +  void AddListener(nsIThreadStatusListener* listener);
> +  void RemoveListener(nsIThreadStatusListener* listener);
> +  void GetListener(nsTArray<nsCOMPtr<nsIThreadStatusListener> > *listeners);

Nit: this should be |GetListeners|.

Also nit: We can use >> to close off templates now.  Please do that, here and throughout the patch.

Also, given the recent move constructor support to nsTArray, you could just return an nsTArray here and it will be just as efficient, along with being easier to read.

@@ +44,5 @@
> +
> +private:
> +  nsThreadStatusManager();
> +
> +  void NotifyIfAllThreadIdle();

Nit: this should be |NotifyIfAllThreadsIdle|.

Attachment #8478208 - Flags: review?(nfroyd) → feedback+

Nathan Froyd [:froydnj]

Comment 33

•

10 years ago

Comment on attachment 8478211 [details] [diff] [review]
Part 2: Report threads' status to nsThreadStatusManager

Review of attachment 8478211 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see this again once some of the naming changes suggested in part 1 make it in.  I'll have a much higher confidence that all the calls are right then.

::: content/media/MediaStreamGraph.cpp
@@ +1345,5 @@
>      MonitorAutoLock lock(mMonitor);
>      messageQueue.SwapElements(mMessageQueue);
>    }
> +
> +  nsThreadStatusManager::get()->SetThreadType(nsThreadStatusManager::ThreadTypeIgnored);

Nit: indentation is incorrect with respect to the surrounding code.

::: netwerk/base/src/nsSocketTransportService2.cpp
@@ +736,5 @@
>  
>          do {
> +          // Mark current thread empty when there's no pending events
> +          // and we are going to poll.
> +          if (!pendingEvents) {

Nit: indentation of this block is wrong with respect to the code below.

::: widget/gonk/GonkMemoryPressureMonitoring.cpp
@@ +127,5 @@
>        NuwaMarkCurrentThread(nullptr, nullptr);
>      }
>  #endif
>  
> +    nsThreadStatusManager::get()->SetThreadType(nsThreadStatusManager::ThreadTypeIgnored);

Nit: indentation.

::: xpcom/threads/TimerThread.cpp
@@ +187,5 @@
>      NuwaMarkCurrentThread(nullptr, nullptr);
>    }
>  #endif
>  
> +  nsThreadStatusManager::get()->SetThreadType(nsThreadStatusManager::ThreadTypeIgnored);

Nit: indentation.

Attachment #8478211 - Flags: review?(nfroyd)

Cervantes Yu [:cyu] [:cervantes]

Comment 34

•

10 years ago

Comment on attachment 8478212 [details] [diff] [review]
Part 3: make nuwa wait for CanFreeze message and preload in nuwa.

Review of attachment 8478212 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see the next revision with the issues addressed.

::: dom/ipc/ContentChild.cpp
@@ +1740,5 @@
>      }
>  
> +    if (IsNuwaProcess()) {
> +        SendNuwaWaiting();
> +    }

This has to be #ifdef'd, or it won't build on non-Nuwa platforms.

::: dom/ipc/ContentParent.cpp
@@ +1321,5 @@
>  
>  NS_IMPL_ISUPPORTS(SystemMessageHandledListener,
>                    nsITimerCallback)
>  
> +class NuwaListener : public nsIThreadStatusListener

Nit: NuwaListener doesn't sound like a good name to me. The listener listen to "Nuwa", what does that mean? The reader has to trace the code to see what it does. I would suggest giving it a more self-descriptive name.

@@ +1324,5 @@
>  
> +class NuwaListener : public nsIThreadStatusListener
> +{
> +public:
> +

Nit: no blank line after the public section, please.

@@ +1332,5 @@
> +        mParent(parent) {
> +    }
> +
> +    virtual ~NuwaListener() {
> +    }

This cannot be public, or it will fail the static assertion MOZ_ASSERT_TYPE_OK_FOR_REFCOUNTING(X).

@@ +1341,5 @@
> +        nsThreadStatusManager::get()->RemoveListener(this);
> +        return NS_OK;
> +    }
> +private:
> +    ContentParent *mParent;

I think this has to be a refptr. We register the listener to the singleton of nsThreadStatusManager, which can outlive the ContentParent instance. If the instance mParent points to should ever be destroyed for any reason, we can use a stale pointer and crash.

::: dom/ipc/PContent.ipdl
@@ +447,5 @@
>       * Notify idle observers in the child
>       */
>      NotifyIdleObserver(uint64_t observerId, nsCString topic, nsString str);
> +
> +    NuwaCanFreeze();

Nit: one blank line before starting a new section, please. Also the same in other parts.

@@ +619,5 @@
>      // Notify the parent that the child has finished handling a system message.
>      async SystemMessageHandled();
>  
>      NuwaReady();
> +    NuwaWaiting();

NuwaWaiting() isn't descriptive enough. Please comment it or use a more descriptive name.

::: dom/ipc/preload.js
@@ +53,5 @@
>    Cc["@mozilla.org/network/effective-tld-service;1"].getService(Ci["nsIEffectiveTLDService"]);
>    Cc["@mozilla.org/network/idn-service;1"].getService(Ci["nsIIDNService"]);
>    Cc["@mozilla.org/network/io-service;1"].getService(Ci["nsIIOService2"]);
>    Cc["@mozilla.org/network/mime-hdrparam;1"].getService(Ci["nsIMIMEHeaderParam"]);
> +  // Cc["@mozilla.org/network/protocol-proxy-service;1"].getService(Ci["nsIProtocolProxyService"]);

Why is nsProtocolProxyService removed from preload? It's not loaded in preload2.js after fork. If you are to comment out preload of nsIProtocolProxyService please leave a comment here why you remove disable the preload of it.

::: xpcom/threads/nsTimerImpl.cpp
@@ +558,5 @@
>  
> +  if (IsNuwaProcess() && IsNuwaReady()) {
> +    return;
> +  }
> +

This also has to be #ifdef'd.

Attachment #8478212 - Flags: review?(cyu) → review-

Cervantes Yu [:cyu] [:cervantes]

Comment 35

•

10 years ago

BTW, Send/RecvNuwaForkDone() seems not necessary to me. Is there any reason we need to run preload2.js after parent has received AddNewProcess() message? If you only want to run preload2.js after fork, you may consider using ContentChild.cpp::InitOnContentProcessCreated().

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

10 years ago

Flags: needinfo?(bent.mozilla)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 36

•

10 years ago

Attached patch Part 1: report status of each thread. (obsolete) — Details — Splinter Review

Attachment #8478208 - Attachment is obsolete: true

Attachment #8478211 - Attachment is obsolete: true

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 37

•

10 years ago

Attached patch Part 2: let nuwa wait for all tasks of preload to be done. (obsolete) — Details — Splinter Review

Attachment #8478212 - Attachment is obsolete: true

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 38

•

10 years ago

Attached patch Part 3: reinitialize some module after forking. (obsolete) — Details — Splinter Review

Attachment #8478213 - Attachment is obsolete: true

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

Attachment #8513341 - Attachment description: Bug-970307-Part-1-Report-status-of-each-thread-to-fi.patch → Part 1: report status of each thread.

Attachment #8513341 - Flags: review?(nfroyd)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

Attachment #8513342 - Flags: review?(cyu)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

Attachment #8513345 - Flags: review?(fabrice)

Nathan Froyd [:froydnj]

Comment 39

•

10 years ago

Comment on attachment 8513341 [details] [diff] [review]
Part 1: report status of each thread.

Review of attachment 8513341 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for adding the NS_* wrappers in this patch.  They make things much more readable.

Only f+'ing because I'm unclear on some of the comments and code below.  I think the whole idleness checking stuff is inherently racy, but I am willing to believe that in the controlled Nuwa environment, especially if we don't have any network connections, that when we send the idle notification, it's unlikely lots of threads are going to become un-idle.

::: xpcom/glue/nsThreadUtils.h
@@ +568,5 @@
> +/**
> + * Helpers for thread to report their status when compiled with Nuwa.
> + */
> +#ifdef MOZ_NUWA_PROCESS
> +#ifdef MOZILLA_INTERNAL_API

I realize that you probably did this for consistency with other #ifdef MOZ_NUWA_PROCESS top-level blocks, but I think we'd be much better off with the conditionals turned inside-out:

#ifdef MOZILLA_INTERNAL_API
#ifdef MOZ_NUWA_PROCESS
// declarations
#else
// inline definitions
#endif // MOZ_NUWA_PROCESS
#endif // MOZILLA_INTERNAL_API

::: xpcom/threads/nsThread.cpp
@@ +827,5 @@
>        if (MAIN_THREAD == mIsMainThread) {
>          HangMonitor::NotifyActivity();
>        }
> +#ifdef MOZ_NUWA_PROCESS
> +      nsThreadManager::get()->SetThreadWorking();

NS_SetCurrentThreadWorking?  Then you don't need the #ifdef.

@@ +841,5 @@
>    --mRunningEvent;
>  
> +#ifdef MOZ_NUWA_PROCESS
> +  if ((!mEvents->GetEvent(false, nullptr)) && (mRunningEvent == 0)) {
> +    nsThreadManager::get()->SetThreadIdle();

Likewise, though I think the #ifdef is still valuable for avoiding work when we're not using Nuwa.

::: xpcom/threads/nsThreadManager.cpp
@@ +48,5 @@
>  
>  typedef nsTArray<nsRefPtr<nsThread>> nsThreadArray;
>  
> +#ifdef MOZ_NUWA_PROCESS
> +class NotifyAllThreadsHasIdled: public nsRunnable

I think |NotifyAllThreadsWereIdle| is a better name for this class.

@@ +444,5 @@
> +  SetThreadStatus(true);
> +}
> +
> +void
> +nsThreadManager::AddAllThreadHadIdledListener(AllThreadHadIdledListener *listener)

Nit: aListener (and similarly below, please).

@@ +446,5 @@
> +
> +void
> +nsThreadManager::AddAllThreadHadIdledListener(AllThreadHadIdledListener *listener)
> +{
> +  mThreadsIdledListener.AppendElement(listener);

Please add a MOZ_ASSERT(GetCurrentThreadStatusInfo()->isWorking); here.

@@ +466,5 @@
> +    // after holding a lock.
> +    // There may be threads that has checked |mThreadsIdledListener| before a
> +    // listener is put into it and then is context switched. In this case, we
> +    // might get the value before it has been updated, and the thread won't check
> +    // again after update to value.

What is "the value" here?  Is it whether mThreadsIdledListener has any elements in it?  Or is it mWorking?

@@ +467,5 @@
> +    // There may be threads that has checked |mThreadsIdledListener| before a
> +    // listener is put into it and then is context switched. In this case, we
> +    // might get the value before it has been updated, and the thread won't check
> +    // again after update to value.
> +    // a. If it goes from idle to working, we'll get incorrect idle status since

What is "it" here (and below, in point b)?  Is it the current thread, or is it some other thread that we might be racing with?

@@ +470,5 @@
> +    // again after update to value.
> +    // a. If it goes from idle to working, we'll get incorrect idle status since
> +    //    there's actually something in its queue. However, having not finished
> +    //    the function means the thread that dispatches this task is still
> +    //    marked as working, and that will prevent us from firing an event.

"the thread that dispatches this task"...isn't that this thread?  Or is it the main thread, since we're going to dispatch the notifier there?  I don't see how either of those choices is still marked as working: the current thread can be marked as non-working, by virtue of this function, and the main thread doesn't necessarily have to be working while we're going through the loop below.

::: xpcom/threads/nsThreadManager.h
@@ +22,5 @@
>  {
>  public:
> +#ifdef MOZ_NUWA_PROCESS
> +  struct ThreadStatusInfo;
> +  class AllThreadHadIdledListener {

I think |AllThreadsWereIdleListener| is a better name for this class.

@@ +25,5 @@
> +  struct ThreadStatusInfo;
> +  class AllThreadHadIdledListener {
> +  public:
> +    NS_INLINE_DECL_REFCOUNTING(AllThreadHadIdledListener);
> +    virtual void OnAllThreadHadIdled() = 0;

Similarly, OnAllThreadsWereIdle.

@@ +76,5 @@
> +  void SetThreadIdle();
> +  void SetThreadWorking();
> +
> +  void AddAllThreadHadIdledListener(AllThreadHadIdledListener *listener);
> +  void RemoveAllThreadHadIdledListener(AllThreadHadIdledListener *listener);

Similarly, {Add,Remove}AllThreadsWereIdleListener

@@ +112,5 @@
> +  ThreadStatusInfo* GetCurrentThreadStatusInfo();
> +  void SetThreadStatus(bool isWorking);
> +
> +  unsigned            mThreadStatusInfoIndex;
> +  nsTArray<nsRefPtr<AllThreadHadIdledListener>> mThreadsIdledListener;

Since this is a collection of listeners, it should be named as such: mThreadsIdleListeners.

@@ +114,5 @@
> +
> +  unsigned            mThreadStatusInfoIndex;
> +  nsTArray<nsRefPtr<AllThreadHadIdledListener>> mThreadsIdledListener;
> +  nsTArray<ThreadStatusInfo*> mThreadStatusInfos;
> +  nsAutoPtr<mozilla::ReentrantMonitor> mMonitor;

mozilla::UniquePtr, please.  (Actually, why is this not a plain member?)

::: xpcom/threads/nsThreadPool.cpp
@@ +207,5 @@
>          } else {
>            PRIntervalTime delta = timeout - (now - idleSince);
>            LOG(("THRD-P(%p) waiting [%d]\n", this, delta));
> +#ifdef MOZ_NUWA_PROCESS
> +          nsThreadManager::get()->SetThreadIdle();

NS_SetCurrentThreadIdle?  Then you don't need the #ifdef.

@@ +219,5 @@
>      }
>      if (event) {
>        LOG(("THRD-P(%p) running [%p]\n", this, event.get()));
> +#ifdef MOZ_NUWA_PROCESS
> +      nsThreadManager::get()->SetThreadWorking();

Likewise for this.

::: xpcom/threads/nsTimerImpl.cpp
@@ +557,5 @@
>    }
>  
> +#ifdef MOZ_NUWA_PROCESS
> +  if (IsNuwaProcess() && IsNuwaReady()) {
> +    return;

This condition looks a little odd, can you elaborate on why this is here?

Attachment #8513341 - Flags: review?(nfroyd) → feedback+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 40

•

10 years ago

(In reply to Nathan Froyd (:froydnj) from comment #39)
> 
> @@ +466,5 @@
> > +    // after holding a lock.
> > +    // There may be threads that has checked |mThreadsIdledListener| before a
> > +    // listener is put into it and then is context switched. In this case, we
> > +    // might get the value before it has been updated, and the thread won't check
> > +    // again after update to value.
> 
> What is "the value" here?  Is it whether mThreadsIdledListener has any
> elements in it?  Or is it mWorking?

It's |mWorking|.

> 
> @@ +467,5 @@
> > +    // There may be threads that has checked |mThreadsIdledListener| before a
> > +    // listener is put into it and then is context switched. In this case, we
> > +    // might get the value before it has been updated, and the thread won't check
> > +    // again after update to value.
> > +    // a. If it goes from idle to working, we'll get incorrect idle status since
> 
> What is "it" here (and below, in point b)?  Is it the current thread, or is
> it some other thread that we might be racing with?

It's some other thread that we might be racing with.

> 
> @@ +470,5 @@
> > +    // again after update to value.
> > +    // a. If it goes from idle to working, we'll get incorrect idle status since
> > +    //    there's actually something in its queue. However, having not finished
> > +    //    the function means the thread that dispatches this task is still
> > +    //    marked as working, and that will prevent us from firing an event.
> 
> "the thread that dispatches this task"...isn't that this thread?  Or is it
> the main thread, since we're going to dispatch the notifier there?  I don't
> see how either of those choices is still marked as working: the current
> thread can be marked as non-working, by virtue of this function, and the
> main thread doesn't necessarily have to be working while we're going through
> the loop below.

The first case I try to describe in comment is that a if there's a thread A changing its |mWorking| from false to true while we are checking status of all threads, we could fire a bogus event:
(1) Thread B (working) dispatches a task to Thread A (idle)
(2) A entered |SetThreadStatus| and find |mThreadsIdledListener| is empty (so A won't acquire lock)
(3) something is inserted into |mThreadsIdledListener|
(4) Thread C enters |SetThreadStatus|, find there is something in |mThreadsIdledListener|, acquire lock and check status of other threads.
(5) C checks A, finds it's idle
(6) A updates it status to working
(7) B updates it status to idle
(8) C checks B, finds it's idle.

I was trying to explain that the bogus event won't actually be fired because it's not possible that B set it's |mWorking| to false during our loop of checking |mWorking|s, since B will be blocked by the monitor as it entered this function after we have entered the loop. (But when I was writing this bugzilla comment, I realized that current implementation may not work properly, I have to make sure that B won't start to set itself to idle before A had become working. I will address this in next version.)

> @@ +114,5 @@
> > +
> > +  unsigned            mThreadStatusInfoIndex;
> > +  nsTArray<nsRefPtr<AllThreadHadIdledListener>> mThreadsIdledListener;
> > +  nsTArray<ThreadStatusInfo*> mThreadStatusInfos;
> > +  nsAutoPtr<mozilla::ReentrantMonitor> mMonitor;
> 
> mozilla::UniquePtr, please.  (Actually, why is this not a plain member?)

Actually, I copied the style from the Mutex declared above. Another reason is that the header of ReentrantMonitor doesn't need to be included in this header file.

> ::: xpcom/threads/nsTimerImpl.cpp
> @@ +557,5 @@
> >    }
> >  
> > +#ifdef MOZ_NUWA_PROCESS
> > +  if (IsNuwaProcess() && IsNuwaReady()) {
> > +    return;
> 
> This condition looks a little odd, can you elaborate on why this is here?

It should be put in another part actually. It's because when loading preload in Nuwa, a timer event which fires after Nuwa frozen can freeze main thread as well. Thus I cancelled the timer events which fires after Nuwa frozen.

[:fabrice] Fabrice Desré

Updated

•

10 years ago

Attachment #8513345 - Flags: review?(fabrice) → review+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 41

•

10 years ago

Attached patch Part 1: report status of each thread. (obsolete) — Details — Splinter Review

Update in this version is that a thread needs to enter a per-thread monitor or mutex of target thread before it can dispatch tasks to the target thread. This would make sure that once a thread has a task in its queue, its status is working. Except thread pool, in which the target thread is not decided when dispatching tasks to the queue. For thread pool, once a thread got a task from thead pool's shared queue, it marks itself working.

Attachment #8513341 - Attachment is obsolete: true

Attachment #8517393 - Flags: review?(nfroyd)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 42

•

10 years ago

Attached patch Part 2: let nuwa wait for all tasks of preload to be done. (obsolete) — Details — Splinter Review

Changed as name change in part 1. Also changed the way nuwa process waits for PBackground form a nested event loop in which Nuwa can be blocked to a continuously dispatching a task to check whether PBackground is ready.

Attachment #8513342 - Attachment is obsolete: true

Attachment #8513342 - Flags: review?(cyu)

Attachment #8517397 - Flags: review?(cyu)

Nathan Froyd [:froydnj]

Comment 43

•

10 years ago

Comment on attachment 8517393 [details] [diff] [review]
Part 1: report status of each thread.

Review of attachment 8517393 [details] [diff] [review]:
-----------------------------------------------------------------

The locking in nsThread is a good idea; it makes a number of things easier to reason about.

I apologize for continued comments on the comment block in nsThreadManager.cpp, but I'm having a hard time convincing myself it's describing the situation well.  And since it's the heart of the patch, well...

::: dom/indexedDB/TransactionThreadPool.cpp
@@ +864,5 @@
>    nsRefPtr<FinishCallback> finishCallback;
>    bool shouldFinish = false;
> +#ifdef MOZ_NUWA_PROCESS
> +  mThread = static_cast<nsThread*>(NS_GetCurrentThread());
> +  // Set ourselve as working thread. We can reset later if we found

Nit: "Set ourself"

::: xpcom/threads/nsThreadManager.cpp
@@ +75,5 @@
> +};
> +
> +struct nsThreadManager::ThreadStatusInfo {
> +  bool mWorking;
> +  bool mWillBeWorking;

I would feel much better if these were Atomic<bool>.  I realize that accesses to Atomic variables are not cheap, but the correctness of the notification mechanism depends on seeing cross-thread updates to these variables, and we're definitely not guaranteed that if these aren't Atomic.

@@ +450,5 @@
> +  aInfo->mWillBeWorking = aIsWorking;
> +  if (mThreadsIdledListeners.Length() > 0) {
> +    // If there's a listener, we update our status and check statuses of threads
> +    // after holding a lock.
> +    // There may be threads that has checked |mThreadsIdledListeners| before a

Thank you for the small changes you made to this, and for the additional example you provided.

I tried reading through this again, but got confused (still!).  Below is my attempt to rewrite the whole comment, with more clarity about what "it" is referring to in many places.  Inline comments between [], also:

-----
There may be a thread X that has checked whether there are listeners prior to a listener being placed into |mThreadsIdledListeners|, but is suspended prior to setting its |mWorking|.  In that case, we might get the value of |mWorking| before it has been updated, and thread X won't check whether there are listeners after updating |mWorking|.

There are two transitions to consider for thread X.

a) If the status of thread X is changed from idle to working while we are walking through |mThreadStatusInfos|, we will not send a bogus event.  The thread that is walking through |mThreadStatusInfos| is obviously working, and the thread that we dispatch...

[Please see review comments elsewhere, I don't think I can adequately rewrite the description of this case to make sense.]

b) If the status of thread X is changed from working to idle, we could miss a chance to notify listeners.  This case is addressed by adding the additional status variable |mWillBeWorking|.  If thread X's |mWillBeWorking| is false while |mWorking| is true, then X is becoming idle, and we can legitimately treat the thread as an idle thread.  There are two  cases to consider here:

  1) The thread X is blocked or about to be blocked by the monitor.  If all the threads are idle (or becoming so), then we will dispatch a task to the main thread.  The main thread will then be marked as working, and when thread X finally enters the monitor, it will see that the main thread is working and not send another event.

[Why does thread X necessarily get scheduled prior to the main thread running its event?  It seems perfectly reasonable that the main thread could completely process its event, go back to idling, and then thread X would be scheduled, only to notice that all the threads are idled again, and fire another event.  Are we simply assuming there won't be any more listeners when that second event gets fired?]

  2) The thread X is not entering the monitor at all, i.e. it's entering the |else| branch of the |if| here.  We don't have to worry about a second event being sent here.
-----

Does all that seem correct?

@@ +456,5 @@
> +    // might get the value before it has been updated, and the thread won't check
> +    // again after update to value.
> +    // a. If status of a thread is changed idle to working during we are
> +    //    checking info array, we won't send bogus event since the thread that
> +    //    is dispatch a task to the idle thread is working. It is not possible

This sentence is unclear: "the thread that is dispatch(ing?) a task to the idle thread is working"...isn't that us, running the checking loop below?  If it is us, how do we know that we are marked as working?  What if we're making the transition to an idle thread?

We only ever dispatch tasks to the main thread, how do we know that it's idle?

If you need to talk about multiple threads, then it would be extremely helpful if you gave them names.  The names given in comment 40 are a good start.

@@ +460,5 @@
> +    //    is dispatch a task to the idle thread is working. It is not possible
> +    //    for the thread that is dispatch a task to change status of itself
> +    //    after it dispatched the task, since it will enter this function after
> +    //    us, so it will find there's something in |mThreadsIdledListeners|,
> +    //    and then try to arquire the monitor.

I tried to rewrite this last sentence, but completely failed, probably because "the thread that is dispatch a task to change..." is unclear: are we talking about the thread that is receiving the task, or the thread that is doing the dispatching (i.e. the one running this loop here)?

::: xpcom/threads/nsThreadManager.h
@@ +22,5 @@
>  {
>  public:
> +#ifdef MOZ_NUWA_PROCESS
> +  struct ThreadStatusInfo;
> +  class AllThreadWereIdleListener {

Nit: AllThreadsWereIdleListener.

@@ +25,5 @@
> +  struct ThreadStatusInfo;
> +  class AllThreadWereIdleListener {
> +  public:
> +    NS_INLINE_DECL_REFCOUNTING(AllThreadWereIdleListener);
> +    virtual void OnAllThreadWereIdle() = 0;

Likewise here.

::: xpcom/threads/nsTimerImpl.cpp
@@ +556,5 @@
>      return;
>    }
>  
> +#ifdef MOZ_NUWA_PROCESS
> +  if (IsNuwaProcess() && IsNuwaReady()) {

Please add a comment describing why we are doing this.

Attachment #8517393 - Flags: review?(nfroyd) → feedback+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 44

•

10 years ago

(In reply to Nathan Froyd (:froydnj) from comment #43)
> Comment on attachment 8517393 [details] [diff] [review]
> Part 1: report status of each thread.
> 
> Review of attachment 8517393 [details] [diff] [review]:

Thanks for reviewing this patch, Nathan. I will try to describe each step of how the possible racing occurs and the way I proposed to address them.

The race problem occurs since we don't want threads to try to enter the monitor (nsThreadManager::mMonitor) when no one cares about their status. And thus the race can happen when we put the first listener into |mThreadsIdledListeners|:

(1) Thread A wants to dispatch a task to Thread B.
(2) Thread A checks |mThreadsIdledListeners|, and nothing is in the list. So Thread A decides not to
    enter |mMonitor| when updating B's status.
(3) Thread A is suspended just before it changed status of B.
(4) A listener is added to |mThreadsIdledListeners|
(5) Now is Thread C's turn to run. Thread C finds there's something in |mThreadsIdledListeners|, so it
    enters |mMonitor| and check all thread info structs in |mThreadStatusInfos| while A is in the
    middle of changing B's status.

Then C might find Thread B is an idle thread (which is not correct, because A attempted to change B's status prior to C's checking starts), but the fact that thread A is working (thread A hasn't finished dispatching a task to thread B) can prevent thread C from firing a bogus notification.

It's also possible for A to change status of itself to idle after C has checked B's status (and found that B is idle), , since when A's attempt to change status of itself must be started after thread C entered the loop, and then A will need to enter monitor this time.


If the state transition that happens outside the monitor is the other direction, it will be:

(1) Thread D has just finished its jobs and wants to set its status to idle.
(2) Thread D checks |mThreadsIdledListeners|, and nothing is in the list. So Thread D decides not
    to enter |mMonitor|
(3) Thread D is suspend before it update status of itself.
(4) A listener is put into |mThreadsIdledListeners|.
(5) Thread C wants to changes status of itself. It checks |mThreadsIdledListeners| and finds something
    inside the list, thread C then enters |mMonitor|, updates its status and checks thread info in
    |mThreadStatusInfos| while D is changing status of itself out of monitor.

Thread C will find that thread D is working (D is actually idle and just hasn't update its status), then C returns without firing any notification. Finding that thread D is working can make our checking mechinasm miss a chance to fire a notification: because thread D thought there's nothing in |mThreadsIdledListeners| and thus won't check the |mThreadStatusInfos| after changing the status of itself.

|mWillBeWorking| can be used to address this problem. Say we require each thread to put the value that is going to be set to |mWorking| to |mWillBeWorking| before the thread decide whether it should enter |mMonitor| to change status or not. Thus C finds that D is working while D's |mWillBeWorking| is false, and C realizes that D is just updating and can treat D as an idle thread.

It doesn't matter whether D will check thread status after changing its own status or not. If D checks, which means D will enter the monitor before updating status, thus D must be blocked until C has finished dispatching the notification task to main thread, and D will find that main thread is working and will not fire an additional event. On the other hand, if D doesn't check |mThreadStatusInfos|, it's still ok, because C has treated D as an idle thread already.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 45

•

10 years ago

Attached patch Part 1: report status of each thread. (obsolete) — Details — Splinter Review

I rewrote the comments by giving each participating thread. Hoping this makes it a bit more clear.

Attachment #8517393 - Attachment is obsolete: true

Attachment #8519673 - Flags: review?(nfroyd)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 46

•

10 years ago

Attached patch Part 2: let nuwa wait for all tasks of preload to be done. (obsolete) — Details — Splinter Review

Attachment #8517397 - Attachment is obsolete: true

Attachment #8517397 - Flags: review?(cyu)

Attachment #8519674 - Flags: review?(cyu)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 47

•

10 years ago

Attached patch Part 1: report status of each thread. (obsolete) — Details — Splinter Review

Attachment #8519673 - Attachment is obsolete: true

Attachment #8519673 - Flags: review?(nfroyd)

Attachment #8519707 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 48

•

10 years ago

Comment on attachment 8519707 [details] [diff] [review]
Part 1: report status of each thread.

Review of attachment 8519707 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for working through this with me.  The detailed comment is much better!

Have you done any performance testing with these patches applied?

::: xpcom/threads/nsThreadManager.cpp
@@ +477,5 @@
> +    //
> +    // (1) Thread D has just finished its jobs and wants to set its status to idle.
> +    // (2) Thread D checks |mThreadsIdledListeners|, and nothing is in the list.
> +    //     So Thread D decides not to enter |mMonitor|.
> +    // (3) Thread D is suspend before it update status of itself.

Nit: "is suspended before it updates its own status."

@@ +484,5 @@
> +    //     |mThreadsIdledListeners| and finds something inside the list. Thread C
> +    //     then enters |mMonitor|, updates its status and checks thread info in
> +    //     |mThreadStatusInfos| while D is changing status of itself out of monitor.
> +    //
> +    // Thread C will find that thread D is working (D actaully wants to change its

Nit: "actually"

@@ +487,5 @@
> +    //
> +    // Thread C will find that thread D is working (D actaully wants to change its
> +    // status to idle before C starting to check), then C returns without firing
> +    // any notification. Finding that thread D is working can make our checking
> +    // mechinasm miss a chance to fire a notification: because thread D thought

Nit: "mechanism"

@@ +491,5 @@
> +    // mechinasm miss a chance to fire a notification: because thread D thought
> +    // there's nothing in |mThreadsIdledListeners| and thus won't check the
> +    // |mThreadStatusInfos| after changing the status of itself.
> +    //
> +    // |mWillBeWorking| can be used to address this problem. We require each

Now that |mWorking| is Atomic, I wonder if we can just structure things:

  aInfo->mWorking = aIsWorking;
  if (mThreadStatusInfos.Length() > 0) {
    bool hasWorkingThread = false;
    ReentrantMonitorAutoEnter mon(*mMonitor);
    for (size_t i = 0; i < mThreadStatusInfos.Length(); ++i) {
      ...
    }
  }

Which means that we're doing fewer Atomic accesses, which ought to be a good thing.  I *think* the arguments that you used here apply equally for the above algorithm.

Attachment #8519707 - Flags: review?(nfroyd) → feedback+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 49

•

10 years ago

(In reply to Nathan Froyd (:froydnj) from comment #48)

> Have you done any performance testing with these patches applied?

I've done testing of app startup time. Since the additional synchronization mechanism could affect performance at every where, I think that a trivial testing can reflex the effect. It doesn't seem much difference between startup time of clock with these patches applied and that without patches applied. I've run 5 time for both cases, the results are almost the same.

I've also measured the time spent from b2g startup to the first preallocated being ready on flame. If these patches are not applied, it, from b2g startup, takes about 15.5 secs to Nuwa starting to freeze and takes about 18 secs to the first preallocated process becoming ready. With these patches applied, it takes about 18 secs to Nuwa starting to freeze and takes about 18.5 sec to the first preallocated process becoming ready. With these patches, however, it sometimes costs more time (about 25 sec) from b2g startup to Nuwa starting to freeze. I think this case is due to events from other source are dispatched to chrome process and make threads take longer to consume all of the events.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 50

•

10 years ago

(In reply to Nathan Froyd (:froydnj) from comment #48)
> Now that |mWorking| is Atomic, I wonder if we can just structure things:
> 
>   aInfo->mWorking = aIsWorking;
>   if (mThreadStatusInfos.Length() > 0) {
>     bool hasWorkingThread = false;
>     ReentrantMonitorAutoEnter mon(*mMonitor);
>     for (size_t i = 0; i < mThreadStatusInfos.Length(); ++i) {
>       ...
>     }
>   }
> 
> Which means that we're doing fewer Atomic accesses, which ought to be a good
> thing.  I *think* the arguments that you used here apply equally for the
> above algorithm.

I don't think |mWorking| can be taken out of the monitor. Since this will make |mWorking|s be able to be changed when a thread is walking through the |mThreadStatusInfos|.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 51

•

10 years ago

Attached patch Part 1: report status of each thread. (obsolete) — Details — Splinter Review

just fixing nits...

Attachment #8519707 - Attachment is obsolete: true

Attachment #8521340 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 52

•

10 years ago

Comment on attachment 8521340 [details] [diff] [review]
Part 1: report status of each thread.

Review of attachment 8521340 [details] [diff] [review]:
-----------------------------------------------------------------

Oh, yes, we still need things under the monitor.  Sigh.

Thanks for working through everything here.

Attachment #8521340 - Flags: review?(nfroyd) → review+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 53

•

10 years ago

Thank you for the review, Nathan!

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 54

•

10 years ago

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e4b2f842b6d1

Nuwa creation test failure (I guess it's because it takes more time then before, since some of m5 ended successfully), and try finds that there are leakages.

Cervantes Yu [:cyu] [:cervantes]

Comment 55

•

10 years ago

Comment on attachment 8519674 [details] [diff] [review]
Part 2: let nuwa wait for all tasks of preload to be done.

Review of attachment 8519674 [details] [diff] [review]:
-----------------------------------------------------------------

f+ this patch because the patch looks correct to me. My comments are mostly on code clarity and I'd like to see them addressed.

::: dom/ipc/ContentChild.cpp
@@ +2089,5 @@
>          PreloadSlowThings();
>      }
>  
>  #ifdef MOZ_NUWA_PROCESS
>      if (IsNuwaProcess()) {

Please comment the intention for this message sent and what it is waiting for.

::: dom/ipc/ContentChild.h
@@ +452,5 @@
>      static ContentChild* sSingleton;
>  
>      DISALLOW_EVIL_CONSTRUCTORS(ContentChild);
> +#ifdef MOZ_NUWA_PROCESS
> +    bool mDontFlushMemory;

This variable isn't necessary. It is initialized in the double negation form: false to mDontFlushMemory means true to flush memory. It's true in the Nuwa process and the preallocated process.

if (mDontFlushMemory) { ... }
is equivalent to
if (IsNuwaProcess() || ManagedPBrowserChild().Length() == 0) { ... }

I think using this predicate makes it clearer. Don't forget to comment why you don't want to flush memory in the preallocated process.

::: dom/ipc/ContentParent.cpp
@@ +1358,5 @@
>                    nsITimerCallback)
>  
> +#ifdef MOZ_NUWA_PROCESS
> +class NuwaCanFreezeListener : public nsThreadManager::AllThreadsWereIdleListener
> +{

You need to rename this listener if you renamed NuwaCanFreeze() message.

::: dom/ipc/PContent.ipdl
@@ +505,5 @@
>      /**
>       * Notify windows in the child to apply a new app style.
>       */
>      OnAppThemeChanged();
> +    NuwaCanFreeze();

This name confuses me and I don't think it a good name. At first glance I can't see the difference and temporal relationships between NuwaCanFreeze() and NuwaReadyForFreeze(). We need to express the intention of this message more directly. NuwaFreeze() sounds a better choice.

@@ +718,5 @@
>  
>      NuwaReady();
> +    // Sent when nuwa finished its initialization process and is waiting for
> +    // parent's signal to make it freeze.
> +    NuwaReadyForFreeze();

NuwaReadyForFreeze() isn't the correct name to me because it's not really "ready" for freeze. If we ask the Nuwa process to freeze on receiving this message, we could make it deadlock because the preload.js hasn't actually finished. We need a name like NuwaWaitForFreeze() to show that it waits for the async events in preload.js to be process before freeze can actually happen.

::: dom/ipc/tests/test_NuwaProcessCreation.html
@@ +75,5 @@
>    };
>    let timeout = setTimeout(function() {
>      ok(false, "Nuwa process is not launched");
>      testEnd();
> +  }, 120000);

Why do you double the timeout?

Attachment #8519674 - Flags: review?(cyu) → feedback+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 56

•

10 years ago

(In reply to Cervantes Yu from comment #55)
> ::: dom/ipc/tests/test_NuwaProcessCreation.html
> @@ +75,5 @@
> >    };
> >    let timeout = setTimeout(function() {
> >      ok(false, "Nuwa process is not launched");
> >      testEnd();
> > +  }, 120000);
> 
> Why do you double the timeout?

Waiting for preloading increases the time that Nuwa consumes to become ready. It's not quite obvious at boot time since the task queue contains relatively fewer tasks then, but when we test restarting Nuwa after many event sources are activated, the increase can be obvious. And running in emulator basically amplifies the increase. On try, increasing the timeout can increase the possibility of passing this test. Actually, I am testing with longer timeout to see if this can be passed on try.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 57

•

10 years ago

Attached patch Part 4: bump the leak check threshold — Details — Splinter Review

With part 1, nsThread's size is increased by 28 byte (a void* + a reentrant monitor), and there have been 2 nsThreads leaked before applying this patch. And there are 3 more reentrant monitors counted by reference count tracer after part 1 applied: 2 are with nsThread, another 1 is with nsThreadManager. Thus I'm increasing the threshold by 128 (28*2+24*3) bytes.

Attachment #8525267 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Updated

•

10 years ago

Attachment #8525267 - Flags: review?(nfroyd) → review+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 58

•

10 years ago

Attached patch Part 2: let nuwa wait for all tasks of preload to be done. (obsolete) — Details — Splinter Review

Fix with previous review. I also increased timeout of nuwa's deadlock test case. Timeout of both deadlock test and creation test are 4 minutes, so we still have 1 minute to reset environment if the test is failed.

Attachment #8519674 - Attachment is obsolete: true

Attachment #8525754 - Flags: review?(cyu)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 59

•

10 years ago

Attached patch Part 1: report status of each thread. (r=nfroyd) — Details — Splinter Review

Fixing test failure on try: Moving the routine of adding a |ThreadStatusInfo| into |mThreadStatusInfos| and the routine of removing a |ThreadStatusInfo| from |mThreadStatusInfos| to constructor and destructor of |ThreadStatusInfo|, respectively, to prevent |ThreadStatusInfo| from being left in the array if a thread exits without calling unregister. Carrying r+ as this doesn't change the logic of the patch.

Attachment #8521340 - Attachment is obsolete: true

Attachment #8527420 - Flags: review+

Cervantes Yu [:cyu] [:cervantes]

Comment 60

•

10 years ago

Comment on attachment 8525754 [details] [diff] [review]
Part 2: let nuwa wait for all tasks of preload to be done.

Review of attachment 8525754 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +1988,4 @@
>          // Don't flush memory in the nuwa process: the GC thread could be frozen.
> +        // If there's no PBrowser child, don't flush memory, either. GC writes
> +        // to copy-on-write pages and makes preallocated process take more memory
> +        // before it actually become an app.

s/become/becomes

::: dom/ipc/PContent.ipdl
@@ +507,5 @@
>      /**
>       * Notify windows in the child to apply a new app style.
>       */
>      OnAppThemeChanged();
> +    NuwaFreeze();

nit: please add an empty line above and below this NuwaFreeze(). It's not supposed to be clustered with OnAppThemeChanged(), and there is originally an empty line before starting the parent: section.

Attachment #8525754 - Flags: review?(cyu) → review+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Updated

•

10 years ago

Attachment #8527420 - Attachment description: Part 1: report status of each thread. → Part 1: report status of each thread. (r=nfroyd)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 61

•

10 years ago

Attached patch Part 2: let nuwa wait for all tasks of preload to be done. (r=cyu) — Details — Splinter Review

updated according to previous review.

Attachment #8525754 - Attachment is obsolete: true

Attachment #8527432 - Flags: review+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 62

•

10 years ago

Attached patch Part 3: reinitialize some module after forking. (r=fabrice) — Details — Splinter Review

Fix failure on try by not having process crash (that is, removing the assertion and making it simply return) when calling post fork preload before calling preload. It happens when Nuwa is created since post fork preload is invoked init() of content child. Carrying r+.

Attachment #8513345 - Attachment is obsolete: true

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 63

•

10 years ago

try:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=3eff7ef3e2b0

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 64

•

10 years ago

https://hg.mozilla.org/integration/b2g-inbound/rev/cd9c2aaf1343
https://hg.mozilla.org/integration/b2g-inbound/rev/de1da65301a8
https://hg.mozilla.org/integration/b2g-inbound/rev/4420bb4e5e7c
https://hg.mozilla.org/integration/b2g-inbound/rev/9beb53e53951

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 65

•

10 years ago

fix non-unified build bustage
https://hg.mozilla.org/integration/b2g-inbound/rev/32933803bc74

Ryan VanderMeulen [:RyanVM]

Comment 66

•

10 years ago

Backed out for frequent B2G debug mochitest crashes.
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=888022&repo=b2g-inbound

https://hg.mozilla.org/integration/b2g-inbound/rev/5c4d07e2199e

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 67

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/32933803bc74

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → 2.2 S1 (5dec)

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 68

•

10 years ago

Reopening as the patches are backed out.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 69

•

10 years ago

Attached patch Part 1b: fix dead lock. (obsolete) — Details — Splinter Review

Nathan, could you review this update patch to part 1? Thanks.

This updated patch fixes the deadlock by preventing a thread from trying to hold locks needed by dispatching task to main thread in the |nsThreadManager.mMonitor| and |mainthread.mThreadStatusMonitor|.


The cause of mochitest 11 failure is that a deadlock occurs when main thread is checking the |mThreadStatusInfos| list in |nsThread.mThreadStatusMonitor| and |nsThreadManager.mMonitor| monitors, and another thread (Thread A) whose status we don't care is trying to dispatch a task to main thread. So A attempts to change main thread's status to working, it holds |mainthread.mLock| and is blocked by |mainthread.mThreadStatusMonitor| which is containing main thread who enters the monitor to set status of itself to idle.

Then main thread finds all threads are idle (actually, A is not idle, but we ignore it), then it tries to dispatch a task to main thread, which is itself, so it try to lock |mainthread.mLock|, and then it will be blocked by A, while A is blocked by the monitor which mainthread is in.

Attachment #8528789 - Flags: review?(nfroyd)

Michael Vines [:m1] [:evilmachines]

Updated

•

10 years ago

Blocks: CAF-v2.2-metabug

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 70

•

10 years ago

The milestone was set when first landing and I suppose it should be reset when they were backed out.

No longer blocks: CAF-v2.2-metabug

Target Milestone: 2.2 S1 (5dec) → ---

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 71

•

10 years ago

I removed the dependency of 1063044 because I though the dependency was based on the milestone setting which I think was not accurate. But I should have asked Michael whether or not this bug should still be set to be a blocker of 2.2 meta bug.

Flags: needinfo?(mvines)

Michael Vines [:m1] [:evilmachines]

Comment 72

•

10 years ago

(np, sounds like this'll save ~1-2 MB in the preallocated process which is something we'd very much like in v2.2)

Blocks: CAF-v2.2-metabug

Flags: needinfo?(mvines)

Nathan Froyd [:froydnj]

Comment 73

•

10 years ago

Comment on attachment 8528789 [details] [diff] [review]
Part 1b: fix dead lock.

Review of attachment 8528789 [details] [diff] [review]:
-----------------------------------------------------------------

I apologize for taking so long to review this.  PTO + Portland did not help.

::: xpcom/threads/nsThread.cpp
@@ +868,5 @@
>      }
>    }
> +  if (notifyAllIdleRunnable) {
> +    // If we are on main thread, we need to dispatch the task after we left
> +    // the monitors.

I think this is worth describing in some detail *why* we need to do that.

@@ +1088,5 @@
>  nsThread::SetWorking()
>  {
>    nsThreadManager::get()->SetThreadIsWorking(
> +    static_cast<nsThreadManager::ThreadStatusInfo*>(mThreadStatusInfo),
> +    true, nullptr);

Why not |nsThreadManager::get()->SetThreadWorking()|?  Is it slightly less efficient, but I think it's worth it.  Then we don't need to store mThreadStatusInfo in nsThread, which reduces coupling between it and nsThreadManager.

@@ +1096,5 @@
>  nsThread::SetIdle()
>  {
>    nsThreadManager::get()->SetThreadIsWorking(
> +    static_cast<nsThreadManager::ThreadStatusInfo*>(mThreadStatusInfo),
> +    false, nullptr);

Same question here, but with |->SetThreadIdle()|?

::: xpcom/threads/nsThreadManager.cpp
@@ +88,5 @@
>      ReentrantMonitorAutoEnter mon(*(mMgr->mMonitor));
>      mMgr->mThreadStatusInfos.AppendElement(this);
> +    if (NS_IsMainThread()) {
> +      mMgr->mMainThreadStatusInfo = this;
> +    }

This code should move to GetCurrentThreadStatusInfo, there's no reason that ThreadStatusInfo should bear this responsibility.

@@ +96,5 @@
>      ReentrantMonitorAutoEnter mon(*(mMgr->mMonitor));
>      mMgr->mThreadStatusInfos.RemoveElement(this);
> +    if (NS_IsMainThread()) {
> +      mMgr->mMainThreadStatusInfo = nullptr;
> +    }

Likewise for this, except that it needs to move to DeleteThreadStatusInfo.  We can get the required handle on nsThreadManager there by calling ::get().

Once those two things are moved, we don't need the |mMgr| member either.

@@ +548,5 @@
> +        // If we want to dispatch a notification task, we need to dispatch
> +        // after left mMonitor. But we should set main thread to working
> +        // before we leave.
> +        mMainThreadStatusInfo->mWorking = true;
> +        mMainThreadStatusInfo->mWillBeWorking = true;

This makes me nervous.  Why can't something like the following happen:

0. Main thread is idle, enters this function.
1. Main thread sets itself as |mWillBeWorking = false|;
2. Context switch someplace else.
3. Thread X enters this loop, discovers all threads idle.
4. Thread X sets these two members to |true|.
5. Event is dispatched to the main thread.
6. Main thread resumes, enters the above loop, and sets itself as |mWorking = false|.

Now these two members have gotten out of sync.

I think we must have locking to protect us from doing this sort of thing, but I'm having trouble seeing it at the moment.

@@ +556,5 @@
> +
> +    if (runnable) {
> +      if (NS_IsMainThread()) {
> +        // If we are on main thread, we can't dispatch to ourself while in
> +        // status change monitor.

This comment doesn't sound right to me--even if we're not on the main thread here, we're not dispatching within the status change monitor, right?  (The monitor was released above when we exited the block.)

::: xpcom/threads/nsThreadManager.h
@@ +76,5 @@
>    void SetThreadIdle();
>    void SetThreadWorking();
> +  void SetThreadIsWorking(ThreadStatusInfo* aInfo,
> +                          bool aIsWorking,
> +                          nsIRunnable **aReturnRunnable);

I think this should be instead:

void SetThreadIdle(nsIRunnable** aReturnRunnable);

It needs a comment describing why we have this method and |SetThreadIdle|.

Attachment #8528789 - Flags: review?(nfroyd) → feedback+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 74

•

10 years ago

(In reply to Nathan Froyd (:froydnj) from comment #73)
> @@ +1088,5 @@
> >  nsThread::SetWorking()
> >  {
> >    nsThreadManager::get()->SetThreadIsWorking(
> > +    static_cast<nsThreadManager::ThreadStatusInfo*>(mThreadStatusInfo),
> > +    true, nullptr);
> 
> Why not |nsThreadManager::get()->SetThreadWorking()|?  Is it slightly less
> efficient, but I think it's worth it.  Then we don't need to store
> mThreadStatusInfo in nsThread, which reduces coupling between it and
> nsThreadManager.
> 
This function is used to set the thread which is represented by the nsThread. nsThreadManager::SetThreadWorking can only set status of currently running thread since it gets the ThreadStatusInfo from TLS.

> @@ +548,5 @@
> > +        // If we want to dispatch a notification task, we need to dispatch
> > +        // after left mMonitor. But we should set main thread to working
> > +        // before we leave.
> > +        mMainThreadStatusInfo->mWorking = true;
> > +        mMainThreadStatusInfo->mWillBeWorking = true;
> 
> This makes me nervous.  Why can't something like the following happen:
> 
> 0. Main thread is idle, enters this function.
> 1. Main thread sets itself as |mWillBeWorking = false|;
> 2. Context switch someplace else.
> 3. Thread X enters this loop, discovers all threads idle.
> 4. Thread X sets these two members to |true|.
> 5. Event is dispatched to the main thread.
> 6. Main thread resumes, enters the above loop, and sets itself as |mWorking
> = false|.
> 
> Now these two members have gotten out of sync.
> 
Yes, it can happen. Thank you for catching this. 

> I think we must have locking to protect us from doing this sort of thing,
> but I'm having trouble seeing it at the moment.
> 
> @@ +556,5 @@
> > +
> > +    if (runnable) {
> > +      if (NS_IsMainThread()) {
> > +        // If we are on main thread, we can't dispatch to ourself while in
> > +        // status change monitor.
> 
> This comment doesn't sound right to me--even if we're not on the main thread
> here, we're not dispatching within the status change monitor, right?  (The
> monitor was released above when we exited the block.)
> 
It means |mThreadStatusMonitor|, that we have hold before entering this function to make thread status consistent to the actual thread status. I will make this comment more precise here.

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 75

•

10 years ago

Attached patch Part 1b: fix dead lock. — Details — Splinter Review

Updated patch.

I added a flag |nsThreadManager::mDispatchingToMainThread|, it is set when we found threads are all idle and before the task of notifying listeners has been dispatched. The flag changed and read under |nsThreadManager::mMonitor| is to prevent another notification event from being fired before the task made main thread no longer idle. Thus we don't need to change main thread's status in another thread's turn of checking threads' status, and will prevent status of main thread from being out of sync.

Another change is to turn |DeleteThreadStatusInfo| into a static member of |nsThreadManager|, to access |mThreadStatusInfos| and |mMonitor| of |nsThreadManager| when cleaning up |ThreadStatusInfo|.

And improve comments.

Attachment #8528789 - Attachment is obsolete: true

Attachment #8536302 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 76

•

10 years ago

Comment on attachment 8536302 [details] [diff] [review]
Part 1b: fix dead lock.

Review of attachment 8536302 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/nsThreadManager.cpp
@@ +554,5 @@
> +    if (runnable) {
> +      if (NS_IsMainThread()) {
> +        // We are in main thread's |nsThread::mThreadStatusMonitor|. Dispatching
> +        // a task to ourself put us in dangerous of forming deadlock. Returning
> +        // the task and let our caller to dispatch for us.

"We are holding the main thread's |nsThread::mThreadStatusMonitor|.  If we dispatch a task to ourself, then we are in danger of causing deadlock.  Instead, return the task, and let the caller dispatch it for us."

::: xpcom/threads/nsThreadManager.h
@@ +76,5 @@
> +
> +  // |SetThreadWorking| and |SetThreadIdle| set status of thread that is
> +  // currently running. They get thread status information from TLS and pass
> +  // the information to |SetThreadIsWorking|.
> +  void SetThreadIdle(nsIRunnable **aReturnRunnable);

Nit: |nsIRunnable** aReturnRunnable|.

@@ +87,5 @@
> +  // |ResetIsDispatchingToMainThread| should be invoked after caller on main
> +  // thread dispatched the task to main thread's queue.
> +  void SetThreadIsWorking(ThreadStatusInfo* aInfo,
> +                          bool aIsWorking,
> +                          nsIRunnable **aReturnRunnable);

Nit: |nsIRunnable** aReturnRunnable|

Attachment #8536302 - Flags: review?(nfroyd) → review+

Patrick Wang (Chih-Kai Wang) (:kk1fff)

Assignee

Comment 77

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/b71ccef9c674
https://hg.mozilla.org/integration/mozilla-inbound/rev/adae9be2294d
https://hg.mozilla.org/integration/mozilla-inbound/rev/a80140872ea5
https://hg.mozilla.org/integration/mozilla-inbound/rev/39e167bbd14c

Ryan VanderMeulen [:RyanVM]

Comment 78

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/b71ccef9c674
https://hg.mozilla.org/mozilla-central/rev/adae9be2294d
https://hg.mozilla.org/mozilla-central/rev/a80140872ea5
https://hg.mozilla.org/mozilla-central/rev/39e167bbd14c

Status: REOPENED → RESOLVED

Closed: 10 years ago → 10 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → 2.2 S2 (19dec)

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

10 years ago

Blocks: 1113428

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

10 years ago

Blocks: 1113429

Peter Bylenga [:PBylenga]

Updated

•

9 years ago

Depends on: 1121364

Inder

Updated

•

9 years ago

Blocks: CAF-v3.0-FL-metabug

Inder

Updated

•

9 years ago

No longer blocks: CAF-v3.0-FL-metabug

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

9 years ago

Depends on: 1151672

Cervantes Yu [:cyu] [:cervantes]

Comment 79

•

9 years ago

We need to back the patches out on central and find another solution that is safer and not as fragile. But 1151672 contains the discussion for this decision. While this bug is backed out, bug 1138620 also needs to be backed out.

Pulsebot

Comment 80

•

9 years ago

Backout:
https://hg.mozilla.org/integration/b2g-inbound/rev/78b19f9f8da8
https://hg.mozilla.org/integration/b2g-inbound/rev/4d2839eea957

Cervantes Yu [:cyu] [:cervantes]

Updated

•

9 years ago

Blocks: 1166207

Carsten Book [:Tomcat]

Comment 81

•

9 years ago

(In reply to Pulsebot from comment #80)
> Backout:
> https://hg.mozilla.org/integration/b2g-inbound/rev/78b19f9f8da8
> https://hg.mozilla.org/integration/b2g-inbound/rev/4d2839eea957

(In reply to Cervantes Yu from comment #79)
> We need to back the patches out on central and find another solution that is
> safer and not as fragile. But 1151672 contains the discussion for this
> decision. While this bug is backed out, bug 1138620 also needs to be backed
> out.

sorry had to back this out since this broke ics debug tests like https://treeherder.mozilla.org/logviewer.html#?job_id=1958360&repo=b2g-inbound

Flags: needinfo?(cyu)

Pulsebot

Comment 82

•

9 years ago

Backout:
https://hg.mozilla.org/integration/b2g-inbound/rev/65d98d97db61
https://hg.mozilla.org/integration/b2g-inbound/rev/34495fd71b79

Cervantes Yu [:cyu] [:cervantes]

Comment 83

•

9 years ago

Canceling NI flag. The bustage is fixed and the change is repushed.

Flags: needinfo?(cyu)

Pulsebot

Comment 84

•

9 years ago

Backout:
https://hg.mozilla.org/integration/b2g-inbound/rev/c7c340d2774c

You need to log in before you can comment on or make changes to this bug.