Last Comment Bug 743112 - JS GC: Incremental deferred release
: JS GC: Incremental deferred release
Status: RESOLVED FIXED
[Snappy]
:
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla17
Assigned To: [PTO to Dec5] Bill McCloskey (:billm)
:
: Andrew Overholt [:overholt]
Mentors:
Depends on: 785806 785778 806433
Blocks: 754303
  Show dependency treegraph
 
Reported: 2012-04-05 18:08 PDT by [PTO to Dec5] Bill McCloskey (:billm)
Modified: 2012-10-30 14:01 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (2.82 KB, patch)
2012-04-06 10:58 PDT, [PTO to Dec5] Bill McCloskey (:billm)
no flags Details | Diff | Splinter Review
patch v2 (3.77 KB, patch)
2012-04-06 15:09 PDT, [PTO to Dec5] Bill McCloskey (:billm)
bugs: review+
Details | Diff | Splinter Review
JS engine changes (7.61 KB, patch)
2012-07-26 11:54 PDT, [PTO to Dec5] Bill McCloskey (:billm)
jcoppeard: review+
Details | Diff | Splinter Review
patch v3 (3.14 KB, patch)
2012-07-26 11:56 PDT, [PTO to Dec5] Bill McCloskey (:billm)
bugs: review+
Details | Diff | Splinter Review
patch v4 (8.34 KB, patch)
2012-08-02 14:35 PDT, [PTO to Dec5] Bill McCloskey (:billm)
bugs: review+
ryanvm: feedback+
Details | Diff | Splinter Review

Description [PTO to Dec5] Bill McCloskey (:billm) 2012-04-05 18:08:32 PDT
I've been looking at my own garbage collector time logs as well as some other people's. I've noticed that, on rare occasions, the XPConnect deferred release code takes a really long time. As an example, I'm looking at a GC that took 1841ms, with the deferred release phase taking 1709ms.

I don't know of any reason why we couldn't do deferred release in little slices. Could we inject a task into the event loop that would do deferred release for, say, 10ms. If it had more work to do after that, it could schedule another task.

Olli, does this seem feasible?
Comment 1 Olli Pettay [:smaug] 2012-04-06 01:41:11 PDT
Yes.
Comment 2 [PTO to Dec5] Bill McCloskey (:billm) 2012-04-06 10:58:14 PDT
Created attachment 612945 [details] [diff] [review]
patch

I did some light testing and this seems to work okay.
Comment 3 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-04-06 11:08:53 PDT
Comment on attachment 612945 [details] [diff] [review]
patch

>+  public:
>+    IncrementalReleaseRunnable(nsTArray<nsISupports *> &items) : items(items) {}

FYI you can use SwapElements() here to avoid copying the array.
Comment 4 [PTO to Dec5] Bill McCloskey (:billm) 2012-04-06 11:21:58 PDT
(In reply to ben turner [:bent] from comment #3)
> FYI you can use SwapElements() here to avoid copying the array.

The class stores a reference, so I don't think it's going to copy. Although now that you mention it, this could be a problem. It's possible that one of these events could be outstanding after the XPCJSRuntime has been destroyed. I'm also realizing now that we could maybe get shutdown leaks if we don't release right away during shutdown.

I'll post a new patch in a bit.
Comment 5 Olli Pettay [:smaug] 2012-04-06 14:06:58 PDT
Comment on attachment 612945 [details] [diff] [review]
patch

>+NS_IMETHODIMP
>+IncrementalReleaseRunnable::Run()
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    PRTime started = PR_Now();
>+    int counter = 0;
>+    while (1) {
>+        PRUint32 count = items.Length();
>+        if (!count)
>+            break;
>+
>+        nsISupports *wrapper = items[count-1];
space before and after -

>+        items.RemoveElementAt(count-1);
Same here, though, I wonder if RemoveElementAt shows up in the profiles.
This ends up doing lots of memmoves. Could you replace the existing value with null, and
remove the whole range of null elements at once.

>+        NS_RELEASE(wrapper);
>+
>+        /* We don't want to call PR_Now() too often. */
>+        counter++;
>+        if (counter == 20) {
20 is perhaps a bit low.


>+            counter = 0;
>+            if (PR_Now() - started >= SliceTime)
Perhaps you could use TimeStamp() for this.

Clearing the review request since I assume a new patch is coming.
Comment 6 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-04-06 14:33:49 PDT
(In reply to Olli Pettay [:smaug] from comment #5)
> >+        items.RemoveElementAt(count-1);
> Same here, though, I wonder if RemoveElementAt shows up in the profiles.
> This ends up doing lots of memmoves. Could you replace the existing value
> with null, and
> remove the whole range of null elements at once.

He's removing from the end, so there shouldn't be any memmove going on here.
Comment 7 Olli Pettay [:smaug] 2012-04-06 14:42:03 PDT
Oops, I missed that.
Comment 8 [PTO to Dec5] Bill McCloskey (:billm) 2012-04-06 15:09:08 PDT
Created attachment 613001 [details] [diff] [review]
patch v2

In this version, the runnable has its own list of stuff to release. It uses SwapElements as Ben suggested. And I think it handles shutdown properly. I looked at the code for NS_DispatchToMainThread. It correctly returns an error if the event loop has stopped running. However, it prints some sort of warning in debug builds. Oh well.
Comment 9 Olli Pettay [:smaug] 2012-04-06 15:54:48 PDT
Comment on attachment 613001 [details] [diff] [review]
patch v2


>+NS_IMETHODIMP
>+IncrementalReleaseRunnable::Run()
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+    TimeStamp started = TimeStamp::Now();
>+    int counter = 0;
PRUint32
Comment 10 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-26 11:54:02 PDT
Created attachment 646237 [details] [diff] [review]
JS engine changes

This patch adds a field to the runtime saying whether the most recent GC was incremental or not. It exposes the field via jsfriendapi.h.
Comment 11 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-26 11:56:58 PDT
Created attachment 646238 [details] [diff] [review]
patch v3

The previous patch didn't work because we sometimes GC synchronously and expect certain objects to die right away. This patch fixes the problem by doing the incremental release only when the GC was incremental. It passes tryserver.

Olli, the last time we talked about this, you wanted me to addref nsLayoutStatics, I think. I think that the incremental GC check should take care of your concerns, since we never do incremental GC during shutdown.
Comment 12 Olli Pettay [:smaug] 2012-07-26 13:38:38 PDT
Comment on attachment 646238 [details] [diff] [review]
patch v3


>+class IncrementalReleaseRunnable : public nsRunnable
>+{
>+    nsTArray<nsISupports *> items;
Extra space after nsISupports.
s/items/mItems/

>+
>+    static const PRTime SliceMillis = 10; /* ms */
>+
>+  public:
>+    IncrementalReleaseRunnable(nsTArray<nsISupports *> &items);
>+
>+    NS_DECL_NSIRUNNABLE
>+};
>+
>+IncrementalReleaseRunnable::IncrementalReleaseRunnable(nsTArray<nsISupports *> &items)
ditto, and & goes with type, not with param name. aItem.
(though, xpconnect coding style is odd.)



>+IncrementalReleaseRunnable::Run()
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+    TimeStamp started = TimeStamp::Now();
>+    PRUint32 counter = 0;
>+    while (1) {
while (true) {

>+
>+    items.Compact();
Do we need this?  I think not.


>+
>+    if (items.Length()) {
>+        nsresult rv = NS_DispatchToMainThread(this);
>+        if (NS_FAILED(rv))
>+            Run();
>+    }
>+
>+    return NS_OK;
>+}
>+
>+static void
>+ReleaseIncrementally(nsTArray<nsISupports *> &array)
nsTArray<nsISupports*>& aArray


Because ::Run may dispatch the runnable again, I think 
IncrementalReleaseRunnable should addref/release
layoutstatics.
It is really just
nsLayoutStatics::AddRef(); in ctor and 
nsLayoutStatics::Release(); in dtor


With those, r=me
Comment 13 Andrew McCreight [:mccr8] 2012-07-26 14:08:08 PDT
> (though, xpconnect coding style is odd.)

In theory, XPConnect is supposed to be JS style, so I don't think Bill needs to do the aArg stuff. ;)
Comment 14 Olli Pettay [:smaug] 2012-07-26 14:11:28 PDT
/me argues that all js/* should move to use saner Gecko coding style :)
Comment 15 Jon Coppeard (:jonco) 2012-07-27 09:25:11 PDT
Comment on attachment 646237 [details] [diff] [review]
JS engine changes

Review of attachment 646237 [details] [diff] [review]:
-----------------------------------------------------------------

Look fine to me.
Comment 16 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-27 11:33:02 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5b21cffeebf

I didn't make the syntactic changes to the xpconnect code, since it follows JS style, as Andrew said. I fixed the nsLayoutStatics refcounting.
Comment 17 Ed Morley [:emorley] 2012-07-28 10:56:35 PDT
Unfortunately after RyanVM and myself spent a good 8 hours of our weekend playing avoid the coalescing with retriggers, intermittent but pretty severe mochitest-other leaks were attributed to this landing.

Examples:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941428&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941444&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941458&tree=Mozilla-Inbound

These leaks happen predominantly on WinXP debug & Linux {32,64} debug, and only around ~20-30% of the time.

In addition, there were a few other unexplained failures that occurred on this push, in the course of the final stage of (bisecting) retriggers:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941312&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941290&tree=Mozilla-Inbound

Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29bff59d3bbe

Before this lands again, please can we have a green try run (with URL pasted in-bug so sheriffs can use it for elimination purposes), with at least 10 retriggers of mochitest-other on each of WinXP debug, Linux32 debug and Linux 64 debug.

Out of curiosity, did this get sent to try before landing? (I can't see a try url in-bug)
Comment 19 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-28 13:22:30 PDT
I'm sorry this happened, Ed. I did push the patch to try and it was as green as things ever are nowadays. There was one leak reported, but it was only three windows, which I thought we normally ignore.
  https://tbpl.mozilla.org/?tree=Try&rev=fb75c988363e
The link no longer shows up, though. I don't know why.

If it helps at all, please consider that I'll be spending the next several days toiling away in frustration. These leak reports are almost certainly spurious, but tracking down why they happen is going to take forever.
Comment 20 Andrew McCreight [:mccr8] 2012-07-28 13:42:49 PDT
You could try running with the patches in bug 728294, they supposedly reduce false positives a bit.
Comment 21 Olli Pettay [:smaug] 2012-07-28 15:11:53 PDT
(In reply to Bill McCloskey (:billm) from comment #19)
> I'm sorry this happened, Ed. I did push the patch to try and it was as green
> as things ever are nowadays. There was one leak reported, but it was only
> three windows, which I thought we normally ignore.
You mean 3 nsGlobalWindow objects?
There should be no nsGlobalWindow object leaks ever.
Comment 22 Ed Morley [:emorley] 2012-07-28 15:36:51 PDT
(In reply to Bill McCloskey (:billm) from comment #19)
> I'm sorry this happened, Ed. I did push the patch to try and it was as green
> as things ever are nowadays. There was one leak reported, but it was only
> three windows, which I thought we normally ignore.
>   https://tbpl.mozilla.org/?tree=Try&rev=fb75c988363e
> The link no longer shows up, though. I don't know why.

No worries, these things happen :-) Thank you for using try; the 3 DOMWindows leak is a common intermittent, in retrospect given the low rate of failure, I don't think there was much else you could have done here. (The try repo was reset unexpectedly recently, which is why the try link won;t work; bug 778062).

Not envious of the debugging - good luck!
Comment 23 Olli Pettay [:smaug] 2012-07-28 16:14:16 PDT
(In reply to Ed Morley [:edmorley] from comment #22)
> the 3
> DOMWindows leak is a common intermittent, 
It is? Do we have a bug # for it? It really should be fixed.
Comment 24 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-28 16:18:01 PDT
(In reply to Olli Pettay [:smaug] (limited connectivity July 26-29) from comment #21)
> You mean 3 nsGlobalWindow objects?
> There should be no nsGlobalWindow object leaks ever.

The leaks being discussed are not shutdown leaks. There's a different leak detector that uses the ++DOMWINDOW and --DOMWINDOW output to try to detect leaks before shutdown. It frequently reports false positives and seems pretty broken.
Comment 25 Ed Morley [:emorley] 2012-07-28 16:18:33 PDT
(In reply to Olli Pettay [:smaug] (limited connectivity July 26-29) from comment #23)
> It is? Do we have a bug # for it? It really should be fixed.

We do, sadly it's a pick three from:
https://bugzilla.mozilla.org/showdependencytree.cgi?id=754804&hide_resolved=1

Bug 728294 should cut out any false positives; how many remain after that is yet to be seen.
Comment 26 Olli Pettay [:smaug] 2012-07-28 16:35:18 PDT
(In reply to Bill McCloskey (:billm) from comment #24)
> 
> The leaks being discussed are not shutdown leaks. 
Oh, nm then.
Comment 27 [PTO to Dec5] Bill McCloskey (:billm) 2012-08-02 14:35:44 PDT
Created attachment 648492 [details] [diff] [review]
patch v4

This seems to fix the leak. Here's the try link:
  https://tbpl.mozilla.org/?tree=Try&rev=7206a5fca258
We never leak more than 4 windows in the tests, which I think is normal.

To look for leaks, we force a GC&CC at some point in the tests. This GC is non-incremental, so objects are released synchronously. However, there may have been an incremental GC that happened right before it that released objects asynchronously. In that case, there may still be some objects to be released even after the non-incremental GC finishes, and that's why we were reporting leaks.

This patch keeps a pointer to the outstanding IncrementalReleaseRunnable and ensures that it never lives beyond the next GC. This fixes the problem above, and it also ensures that we don't get into situations where the number of objects to be released incrementally grows without bound.
Comment 28 Olli Pettay [:smaug] 2012-08-02 14:58:55 PDT
Comment on attachment 648492 [details] [diff] [review]
patch v4


>+XPCIncrementalReleaseRunnable::~XPCIncrementalReleaseRunnable()
>+{
>+    nsLayoutStatics::Release();
>+}
Could you add assertion here that 'this' != runtime->mReleaseRunnable


>+
>+void
>+XPCIncrementalReleaseRunnable::ReleaseNow(bool limited)
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+    TimeStamp started = TimeStamp::Now();
>+    PRUint32 counter = 0;
>+    while (1) {
>+        PRUint32 count = items.Length();
>+        if (!count)
>+            break;
>+
>+        nsISupports *wrapper = items[count - 1];
>+        items.RemoveElementAt(count - 1);
>+        NS_RELEASE(wrapper);
>+
>+        /* We don't want to read the clock too often. */
>+        counter++;
I would put 'if (limited) {' before counter and remove limited below

>+        if (counter == 100) {
>+            counter = 0;
>+            if (limited && TimeStamp::Now() - started >= sliceTime)
>+                break;
>+        }
>+    }



         case JSGC_BEGIN:
         {
+            if (self->mReleaseRunnable)
+                self->mReleaseRunnable->ReleaseNow(false);
+
Add a comment why this is needed, and why it is safe to call ReleaseNow()
Comment 29 Ryan VanderMeulen [:RyanVM] 2012-08-02 15:02:15 PDT
Comment on attachment 648492 [details] [diff] [review]
patch v4

4 leaks is normal, yes. LGTM!
Comment 30 [PTO to Dec5] Bill McCloskey (:billm) 2012-08-02 15:37:38 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/f41cd7125dbd
Comment 31 Ed Morley [:emorley] 2012-08-03 07:32:52 PDT
https://hg.mozilla.org/mozilla-central/rev/f41cd7125dbd

Note You need to log in before you can comment on or make changes to this bug.