JS GC: Incremental deferred release

RESOLVED FIXED in mozilla17

Status

()

Core
XPConnect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: billm, Assigned: billm)

Tracking

(Depends on: 1 bug)

unspecified
mozilla17
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Snappy])

Attachments

(2 attachments, 3 obsolete attachments)

(Assignee)

Description

5 years ago
I've been looking at my own garbage collector time logs as well as some other people's. I've noticed that, on rare occasions, the XPConnect deferred release code takes a really long time. As an example, I'm looking at a GC that took 1841ms, with the deferred release phase taking 1709ms.

I don't know of any reason why we couldn't do deferred release in little slices. Could we inject a task into the event loop that would do deferred release for, say, 10ms. If it had more work to do after that, it could schedule another task.

Olli, does this seem feasible?

Comment 1

5 years ago
Yes.
(Assignee)

Comment 2

5 years ago
Created attachment 612945 [details] [diff] [review]
patch

I did some light testing and this seems to work okay.
Attachment #612945 - Flags: review?(bugs)
Comment on attachment 612945 [details] [diff] [review]
patch

>+  public:
>+    IncrementalReleaseRunnable(nsTArray<nsISupports *> &items) : items(items) {}

FYI you can use SwapElements() here to avoid copying the array.
(Assignee)

Comment 4

5 years ago
(In reply to ben turner [:bent] from comment #3)
> FYI you can use SwapElements() here to avoid copying the array.

The class stores a reference, so I don't think it's going to copy. Although now that you mention it, this could be a problem. It's possible that one of these events could be outstanding after the XPCJSRuntime has been destroyed. I'm also realizing now that we could maybe get shutdown leaks if we don't release right away during shutdown.

I'll post a new patch in a bit.

Comment 5

5 years ago
Comment on attachment 612945 [details] [diff] [review]
patch

>+NS_IMETHODIMP
>+IncrementalReleaseRunnable::Run()
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    PRTime started = PR_Now();
>+    int counter = 0;
>+    while (1) {
>+        PRUint32 count = items.Length();
>+        if (!count)
>+            break;
>+
>+        nsISupports *wrapper = items[count-1];
space before and after -

>+        items.RemoveElementAt(count-1);
Same here, though, I wonder if RemoveElementAt shows up in the profiles.
This ends up doing lots of memmoves. Could you replace the existing value with null, and
remove the whole range of null elements at once.

>+        NS_RELEASE(wrapper);
>+
>+        /* We don't want to call PR_Now() too often. */
>+        counter++;
>+        if (counter == 20) {
20 is perhaps a bit low.


>+            counter = 0;
>+            if (PR_Now() - started >= SliceTime)
Perhaps you could use TimeStamp() for this.

Clearing the review request since I assume a new patch is coming.
Attachment #612945 - Flags: review?(bugs)
(In reply to Olli Pettay [:smaug] from comment #5)
> >+        items.RemoveElementAt(count-1);
> Same here, though, I wonder if RemoveElementAt shows up in the profiles.
> This ends up doing lots of memmoves. Could you replace the existing value
> with null, and
> remove the whole range of null elements at once.

He's removing from the end, so there shouldn't be any memmove going on here.

Comment 7

5 years ago
Oops, I missed that.
(Assignee)

Comment 8

5 years ago
Created attachment 613001 [details] [diff] [review]
patch v2

In this version, the runnable has its own list of stuff to release. It uses SwapElements as Ben suggested. And I think it handles shutdown properly. I looked at the code for NS_DispatchToMainThread. It correctly returns an error if the event loop has stopped running. However, it prints some sort of warning in debug builds. Oh well.
Attachment #612945 - Attachment is obsolete: true
Attachment #613001 - Flags: review?(bugs)

Comment 9

5 years ago
Comment on attachment 613001 [details] [diff] [review]
patch v2


>+NS_IMETHODIMP
>+IncrementalReleaseRunnable::Run()
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+    TimeStamp started = TimeStamp::Now();
>+    int counter = 0;
PRUint32
Attachment #613001 - Flags: review?(bugs) → review+
Blocks: 754303
(Assignee)

Comment 10

5 years ago
Created attachment 646237 [details] [diff] [review]
JS engine changes

This patch adds a field to the runtime saying whether the most recent GC was incremental or not. It exposes the field via jsfriendapi.h.
Attachment #613001 - Attachment is obsolete: true
Attachment #646237 - Flags: review?(jcoppeard)
(Assignee)

Comment 11

5 years ago
Created attachment 646238 [details] [diff] [review]
patch v3

The previous patch didn't work because we sometimes GC synchronously and expect certain objects to die right away. This patch fixes the problem by doing the incremental release only when the GC was incremental. It passes tryserver.

Olli, the last time we talked about this, you wanted me to addref nsLayoutStatics, I think. I think that the incremental GC check should take care of your concerns, since we never do incremental GC during shutdown.
Attachment #646238 - Flags: review?(bugs)
Comment on attachment 646238 [details] [diff] [review]
patch v3


>+class IncrementalReleaseRunnable : public nsRunnable
>+{
>+    nsTArray<nsISupports *> items;
Extra space after nsISupports.
s/items/mItems/

>+
>+    static const PRTime SliceMillis = 10; /* ms */
>+
>+  public:
>+    IncrementalReleaseRunnable(nsTArray<nsISupports *> &items);
>+
>+    NS_DECL_NSIRUNNABLE
>+};
>+
>+IncrementalReleaseRunnable::IncrementalReleaseRunnable(nsTArray<nsISupports *> &items)
ditto, and & goes with type, not with param name. aItem.
(though, xpconnect coding style is odd.)



>+IncrementalReleaseRunnable::Run()
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+    TimeStamp started = TimeStamp::Now();
>+    PRUint32 counter = 0;
>+    while (1) {
while (true) {

>+
>+    items.Compact();
Do we need this?  I think not.


>+
>+    if (items.Length()) {
>+        nsresult rv = NS_DispatchToMainThread(this);
>+        if (NS_FAILED(rv))
>+            Run();
>+    }
>+
>+    return NS_OK;
>+}
>+
>+static void
>+ReleaseIncrementally(nsTArray<nsISupports *> &array)
nsTArray<nsISupports*>& aArray


Because ::Run may dispatch the runnable again, I think 
IncrementalReleaseRunnable should addref/release
layoutstatics.
It is really just
nsLayoutStatics::AddRef(); in ctor and 
nsLayoutStatics::Release(); in dtor


With those, r=me
Attachment #646238 - Flags: review?(bugs) → review+
> (though, xpconnect coding style is odd.)

In theory, XPConnect is supposed to be JS style, so I don't think Bill needs to do the aArg stuff. ;)
Whiteboard: [Snappy]
/me argues that all js/* should move to use saner Gecko coding style :)
Comment on attachment 646237 [details] [diff] [review]
JS engine changes

Review of attachment 646237 [details] [diff] [review]:
-----------------------------------------------------------------

Look fine to me.
Attachment #646237 - Flags: review?(jcoppeard) → review+
(Assignee)

Comment 16

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5b21cffeebf

I didn't make the syntactic changes to the xpconnect code, since it follows JS style, as Andrew said. I fixed the nsLayoutStatics refcounting.
Unfortunately after RyanVM and myself spent a good 8 hours of our weekend playing avoid the coalescing with retriggers, intermittent but pretty severe mochitest-other leaks were attributed to this landing.

Examples:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941428&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941444&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941458&tree=Mozilla-Inbound

These leaks happen predominantly on WinXP debug & Linux {32,64} debug, and only around ~20-30% of the time.

In addition, there were a few other unexplained failures that occurred on this push, in the course of the final stage of (bisecting) retriggers:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941312&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13941290&tree=Mozilla-Inbound

Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29bff59d3bbe

Before this lands again, please can we have a green try run (with URL pasted in-bug so sheriffs can use it for elimination purposes), with at least 10 retriggers of mochitest-other on each of WinXP debug, Linux32 debug and Linux 64 debug.

Out of curiosity, did this get sent to try before landing? (I can't see a try url in-bug)
Some more instances of the leak:
https://tbpl.mozilla.org/php/getParsedLog.php?id=13940006&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13938924&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13939362&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13940258&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13935653&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13938206&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13937756&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13930112&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13931939&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13932186&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13932761&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13932761&tree=Mozilla-Inbound
(Assignee)

Comment 19

5 years ago
I'm sorry this happened, Ed. I did push the patch to try and it was as green as things ever are nowadays. There was one leak reported, but it was only three windows, which I thought we normally ignore.
  https://tbpl.mozilla.org/?tree=Try&rev=fb75c988363e
The link no longer shows up, though. I don't know why.

If it helps at all, please consider that I'll be spending the next several days toiling away in frustration. These leak reports are almost certainly spurious, but tracking down why they happen is going to take forever.
You could try running with the patches in bug 728294, they supposedly reduce false positives a bit.
(In reply to Bill McCloskey (:billm) from comment #19)
> I'm sorry this happened, Ed. I did push the patch to try and it was as green
> as things ever are nowadays. There was one leak reported, but it was only
> three windows, which I thought we normally ignore.
You mean 3 nsGlobalWindow objects?
There should be no nsGlobalWindow object leaks ever.
(In reply to Bill McCloskey (:billm) from comment #19)
> I'm sorry this happened, Ed. I did push the patch to try and it was as green
> as things ever are nowadays. There was one leak reported, but it was only
> three windows, which I thought we normally ignore.
>   https://tbpl.mozilla.org/?tree=Try&rev=fb75c988363e
> The link no longer shows up, though. I don't know why.

No worries, these things happen :-) Thank you for using try; the 3 DOMWindows leak is a common intermittent, in retrospect given the low rate of failure, I don't think there was much else you could have done here. (The try repo was reset unexpectedly recently, which is why the try link won;t work; bug 778062).

Not envious of the debugging - good luck!
(In reply to Ed Morley [:edmorley] from comment #22)
> the 3
> DOMWindows leak is a common intermittent, 
It is? Do we have a bug # for it? It really should be fixed.
(Assignee)

Comment 24

5 years ago
(In reply to Olli Pettay [:smaug] (limited connectivity July 26-29) from comment #21)
> You mean 3 nsGlobalWindow objects?
> There should be no nsGlobalWindow object leaks ever.

The leaks being discussed are not shutdown leaks. There's a different leak detector that uses the ++DOMWINDOW and --DOMWINDOW output to try to detect leaks before shutdown. It frequently reports false positives and seems pretty broken.
(In reply to Olli Pettay [:smaug] (limited connectivity July 26-29) from comment #23)
> It is? Do we have a bug # for it? It really should be fixed.

We do, sadly it's a pick three from:
https://bugzilla.mozilla.org/showdependencytree.cgi?id=754804&hide_resolved=1

Bug 728294 should cut out any false positives; how many remain after that is yet to be seen.
(In reply to Bill McCloskey (:billm) from comment #24)
> 
> The leaks being discussed are not shutdown leaks. 
Oh, nm then.
(Assignee)

Comment 27

5 years ago
Created attachment 648492 [details] [diff] [review]
patch v4

This seems to fix the leak. Here's the try link:
  https://tbpl.mozilla.org/?tree=Try&rev=7206a5fca258
We never leak more than 4 windows in the tests, which I think is normal.

To look for leaks, we force a GC&CC at some point in the tests. This GC is non-incremental, so objects are released synchronously. However, there may have been an incremental GC that happened right before it that released objects asynchronously. In that case, there may still be some objects to be released even after the non-incremental GC finishes, and that's why we were reporting leaks.

This patch keeps a pointer to the outstanding IncrementalReleaseRunnable and ensures that it never lives beyond the next GC. This fixes the problem above, and it also ensures that we don't get into situations where the number of objects to be released incrementally grows without bound.
Attachment #646238 - Attachment is obsolete: true
Attachment #648492 - Flags: review?(bugs)
Comment on attachment 648492 [details] [diff] [review]
patch v4


>+XPCIncrementalReleaseRunnable::~XPCIncrementalReleaseRunnable()
>+{
>+    nsLayoutStatics::Release();
>+}
Could you add assertion here that 'this' != runtime->mReleaseRunnable


>+
>+void
>+XPCIncrementalReleaseRunnable::ReleaseNow(bool limited)
>+{
>+    MOZ_ASSERT(NS_IsMainThread());
>+
>+    TimeDuration sliceTime = TimeDuration::FromMilliseconds(SliceMillis);
>+    TimeStamp started = TimeStamp::Now();
>+    PRUint32 counter = 0;
>+    while (1) {
>+        PRUint32 count = items.Length();
>+        if (!count)
>+            break;
>+
>+        nsISupports *wrapper = items[count - 1];
>+        items.RemoveElementAt(count - 1);
>+        NS_RELEASE(wrapper);
>+
>+        /* We don't want to read the clock too often. */
>+        counter++;
I would put 'if (limited) {' before counter and remove limited below

>+        if (counter == 100) {
>+            counter = 0;
>+            if (limited && TimeStamp::Now() - started >= sliceTime)
>+                break;
>+        }
>+    }



         case JSGC_BEGIN:
         {
+            if (self->mReleaseRunnable)
+                self->mReleaseRunnable->ReleaseNow(false);
+
Add a comment why this is needed, and why it is safe to call ReleaseNow()
Attachment #648492 - Flags: review?(bugs) → review+
Comment on attachment 648492 [details] [diff] [review]
patch v4

4 leaks is normal, yes. LGTM!
Attachment #648492 - Flags: feedback+
(Assignee)

Comment 30

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/f41cd7125dbd
https://hg.mozilla.org/mozilla-central/rev/f41cd7125dbd
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Depends on: 785778

Updated

5 years ago
Depends on: 785806

Updated

5 years ago
Summary: Deferred release during JS GC should happen... more later → JS GC: Incremental deferred release
Depends on: 806433
You need to log in before you can comment on or make changes to this bug.