race when returning GC arenas from the background finalizer to the GC arena allocator

RESOLVED FIXED

Status


Core
JavaScript Engine

People

(Reporter: Igor Bukanov, Assigned: Igor Bukanov)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

6 years ago
When the background finalization introduced in bug 627200 returns arenas to the GC arena allocator, it uses the volatile ArenaList::hasToBeFinalized field to signal that the arena list is ready to be scanned for arenas with free things.

But such a volatile variable does not ensure that other writes become visible in a consistent order. In particular, the CPU that allocates things may not see the writes that preceded the code that sets ArenaList::hasToBeFinalized to false on the background thread.
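For illustration, here is a minimal, self-contained sketch of the hazardous pattern described above. The types and helpers are hypothetical stand-ins that only mirror the description, not the actual SpiderMonkey code:

#include <cstddef>

struct ArenaHeader {
    ArenaHeader *next;
    void *freeList;
};

struct ArenaList {
    ArenaHeader *head;
    /*
     * 'volatile' only stops the compiler from caching the flag; it gives
     * no ordering guarantee between this write and other memory writes.
     */
    volatile bool hasToBeFinalized;
};

/* Background finalizer thread: rebuild the free lists, then clear the flag. */
void BackgroundFinalize(ArenaList *al)
{
    for (ArenaHeader *aheader = al->head; aheader; aheader = aheader->next)
        aheader->freeList = aheader + 1;   /* (1) simplified free-list rebuild */
    al->hasToBeFinalized = false;          /* (2) publish "finalization done" */
    /* On a weakly ordered CPU, (2) may become visible to other CPUs before (1). */
}

/* Allocation thread: may observe (2) and yet still read stale data from before (1). */
ArenaHeader *FindArenaWithFreeThings(ArenaList *al)
{
    if (al->hasToBeFinalized)
        return NULL;                       /* still being finalized */
    for (ArenaHeader *aheader = al->head; aheader; aheader = aheader->next) {
        if (aheader->freeList)             /* can observe values from before (1) */
            return aheader;
    }
    return NULL;
}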
(Assignee)

Comment 1

6 years ago
I plan to fix this by appending the finalized arenas to the ArenaList under the GC lock. This increases the number of times RefillTypedFreeList takes the lock by one per arena list from which we allocate things between GC runs, but I suppose this would not be noticeable.
(Assignee)

Comment 2

6 years ago
There is another race in the implementation. Currently the GC waits for the background finalization to end only before it starts the sweeping. So the marking and finalization may run in parallel after the GC has cleared the mark bits.
(Assignee)

Comment 3

6 years ago
Created attachment 528028 [details] [diff] [review]
v1

Work in progress that passes shell tests

Comment 4

I am trying to find out what the actual bug is you want to fix.
So there is the hasToBeFinalized field that indicates whether we have to allocate a new arena or we can try to allocate from the already existing arena list. Arenas that are freed on the background thread can be allocated right away and they are not "protected" by this variable.
My understanding is that the change of this variable might not be visible right away, but it will be visible once the thread flushes the cache, and this will eventually happen.
So you think the bug is 1) that the state change is not visible right away, or 2) that due to the state change not being visible right away we can get a race condition, or 3) that the other threads will never see the state change?

For comment 2:
Where do you see that? We usually wait in js_GC for the sweeping to be done. And for gcPoke we wait again right before we clear the marking bits. I was very conservative with the synchronization and I guess we could get rid of some of it because I don't do delayed sweeping during the LAST_CONTEXT GC, where we actually pay attention to gcPoke.
(In reply to comment #4)
> I am trying to find out what the actual bug is you want to fix.
> So there is the hasToBeFinalized field that indicates whether we have to
> allocate a new arena or we can try to allocate from the already existing
> arenalist. Arenas that are freed on the background thread can be allocated
> right away and they are not "protected" by this variable.
> My understanding is that the change of this variable might not be visible right
> away but it will be visible once the thread flushes the cache and this will
> eventually happen.
> So you think the bug is 1)that the state change is not visible right away or
> 2)due to the fact that the state change is not visible right away we can get a
> race condition or 3)the other threads will never see the state change?

It sounds like Igor is worried about certain architectures where writes made by CPU A can be seen out of order by CPU B. The Alpha was somewhat notorious for this. In this case, the main thread might see hasToBeFinalized get set to zero before it actually sees the writes made by the finalization process (such as setting up the freelist). This would be bad. I don't think this is an issue on x86, but we should probably fix it if we can, I guess.

Comment 6

6 years ago
We have a mips port, so yeah, we should take a patch for this if anyone cares, but I wouldn't actually go and spend time on this without indications that this is a problem on platforms we actively support.
Do we have an ia64 port?  ;)

Comment 8

Actually, I found this article while Googling. It seems to suggest that ARM has this problem. So we should definitely fix it.
  http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

Comment 9

6 years ago
I think we went through this with the operation callback. We use

JS_ATOMIC_SET(&interruptFlags, 1)

which should enforce memory ordering.

Comment 10

(In reply to comment #9)
> I think we went through this with the operation callback. We use
> 
> JS_ATOMIC_SET(&interruptFlags, 1)
> 
> which should enforce memory ordering.

I'm not sure this is correct. In gcc, JS_ATOMIC_SET gets converted to __sync_lock_test_and_set. The gcc docs say that this is not a full barrier operation--only an acquire barrier, which is insufficient for our purposes.
  http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html

And really we need two barriers: one where the flag is written (to ensure that no writes get moved past it) and one where the flag is read (to make sure that no other reads move before it). GCC only seems to support __sync_synchronize(), which is a full barrier. We could use it in both places.

However, it sounds like Igor's solution just uses a lock, so then we wouldn't have to worry about this. But we might want to look at the operation callback again. Is there a bug where this was discussed?
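For illustration, a minimal sketch of the two-barrier pattern described in comment 10, using GCC's __sync_synchronize() full barrier on both sides. The structure and function names are made up for the example and are not part of the engine:

struct Shared {
    int data;
    volatile int ready;
};

/* Writer: make the data writes globally visible before publishing the flag. */
void Publish(Shared *s)
{
    s->data = 42;
    __sync_synchronize();   /* full barrier: the data write cannot sink below it */
    s->ready = 1;
}

/* Reader: do not let the data read hoist above the flag check. */
int Consume(Shared *s)
{
    if (!s->ready)
        return -1;
    __sync_synchronize();   /* full barrier: the data read cannot move above it */
    return s->data;         /* now guaranteed to see the writer's data */
}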
(Assignee)

Comment 11

6 years ago
(In reply to comment #4)
> For comment 2)
> Where do you see that? We usually wait in js_GC for the sweeping do be done.
> And for gcPoke we wait again right before we clear the marking bits. I was very
> conservative with the synchronization and I guess we could get rid of some
> synchronization because I don't do delayed sweeping during LAST_CONTEXT GC
> where we actually pay attention to gcPoke.

You are right, there is no bug in that.
(Assignee)

Comment 12

6 years ago
(In reply to comment #10)
> But we might want to look at the operation callback
> again. Is there a bug where this was discussed?

There is no problem with the operation callback as the code does not depend on any other variables being written besides the callback flag itself. This is very different from the background finalization, where we have to ensure that all writes to the GC structures are finished before we read them.
(Assignee)

Comment 13

6 years ago
(In reply to comment #8)
> Actually, I found this article while Googling. It seems to suggest that ARM has
> this problem. So we should definitely fix it.
>   http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

We have the problem even on x86. The current implementation depends on all reads from the allocation thread happening after the background thread finishes all writes that happen before that thread clears the ArenaList::hasToBeFinalized flag. But according to that article:

---
Speaking of which, on the other hand x86 guarantees that writes won’t be reordered, that architecture is referred to as strongly ordered. This is not to say it doesn’t do any reordering, for instance reads are allowed to happen ahead of writes that come “before” them; this breaks a few algorithms like Peterson’s algorithm. 
---
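As an aside, the classic store/load example of the x86 reordering mentioned in the quote (illustrative code, not from this bug): with ThreadA and ThreadB running on different CPUs, both r1 and r2 can end up 0, because each CPU may perform its read before its own earlier write becomes globally visible.

volatile int x = 0, y = 0;
int r1, r2;

void ThreadA() { x = 1; r1 = y; }   /* store to x, then load y */
void ThreadB() { y = 1; r2 = x; }   /* store to y, then load x */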

Comment 14

The GCC manual mentions "asm volatile ("" : : : "memory");" as a solution.
(Assignee)

Comment 15

6 years ago
(In reply to comment #14)
> The GCC manual mentions "asm volatile ("" : : : "memory");" as a solution.

From http://en.wikipedia.org/wiki/Memory_ordering#Compiler_memory_barrier :

---
These barriers prevent a compiler from reordering instructions, they do not prevent reordering by CPU.
---
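To make the distinction drawn in comments 14 and 15 concrete, a small illustrative sketch (the helper names are invented for the example):

/*
 * Compiler-only barrier: prevents GCC from reordering memory accesses
 * across this point, but emits no instruction, so the CPU may still
 * reorder them.
 */
static inline void CompilerBarrier()
{
    asm volatile("" : : : "memory");
}

/*
 * Full hardware barrier: also emits a real fence instruction
 * (e.g. mfence on x86, dmb on ARM) in addition to constraining the compiler.
 */
static inline void HardwareBarrier()
{
    __sync_synchronize();
}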
(Assignee)

Comment 16

6 years ago
Created attachment 528209 [details] [diff] [review]
v2

This passes the try server.

Changes:

1. The background thread puts the finalized list, under the lock, into the separate ArenaList::finalizedInBackground field. The allocator checks the field and, when it is set, appends the list under the lock to the arena list. In principle the lock could be replaced by a write barrier in the background finalization and a read barrier during the allocation, but that is for another bug.

2. ArenaList::cursor becomes a pointer to an arena pointer instead of just an arena pointer. This way the finalized list can always be appended to the main list and there is no need to keep at least one arena in the list (see the sketch after this comment).

3. The patch copies parts of the patch from bug 589771 (which was never landed as it rotted too much before it could get landing approval) to simplify handling of the background finalization during the GC and the stats collection.

The patch is based on the patch from bug 605029 for simplicity.
Attachment #528028 - Attachment is obsolete: true
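For illustration, a minimal sketch of the pointer-to-pointer cursor from item 2 above, assuming a simplified singly linked arena list; this is not the patch code:

#include <cstddef>

struct ArenaHeader {
    ArenaHeader *next;
    void *freeList;
};

struct ArenaList {
    ArenaHeader *head;
    ArenaHeader **cursor;   /* address of the link to the next arena to scan */

    ArenaList() : head(NULL), cursor(&head) {}

    /*
     * Because the cursor always refers to some 'next' slot (initially
     * &head), finalized arenas can be appended even to an empty list
     * without keeping a dummy arena around.
     */
    void appendFinalized(ArenaHeader *chain)
    {
        ArenaHeader **tailp = &head;
        while (*tailp)
            tailp = &(*tailp)->next;
        *tailp = chain;
    }

    /* Advance the cursor to the next arena that has free things. */
    ArenaHeader *nextWithFreeThings()
    {
        while (ArenaHeader *aheader = *cursor) {
            cursor = &aheader->next;
            if (aheader->freeList)
                return aheader;
        }
        return NULL;
    }
};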
(Assignee)

Updated

6 years ago
Attachment #528209 - Flags: review?(gal)
(Assignee)

Comment 17

6 years ago
To Gregor: 

With the patch compared with the TM tip I see a 0.8% regression in V8, with a slight win in splay and losses in earley-boyer (1.5%) and regexp (0.7%). The sunspider results are not affected.

Now, since the patch takes the GC lock when returning finalized arenas to the ArenaList, the chunk lock can be removed. So if I use the GC lock when releasing the chunks from the background finalization, then V8 shows gains in all the tests compared with the patch that still uses the per-chunk locks. The overall regression in V8 becomes insignificant, with the win in splay becoming 1.5%. Sunspider shows a 0.4% win compared with the current TM tip.

In bug 627200 comment 1 you mentioned that the patch did not affect sunspider, but have you seen regressions in V8 due to the added locking?
(Assignee)

Comment 18

6 years ago
Created attachment 529066 [details] [diff] [review]
v3

The new version removes the per-chunk locks to offset benchmark losses during allocation due to the need to take the GC lock. With these changes V8 no longer regresses and sunspider shows a 0.4% win.
Attachment #528209 - Attachment is obsolete: true
Attachment #528209 - Flags: review?(gal)
Attachment #529066 - Flags: review?(anygregor)
(In reply to comment #17)
> To Gregor: 
> 
> In the bug 627200 comment 1 you mentioned that the patch has not affected
> sunspider, but have you seen regressions in V8 due to the added locking?

I also saw small "unstable" regressions that were present one day but gone the next. My guess is that the benchmark suite has fixed GC points between benchmarks. With background finalization this means that right when a benchmark starts we take the path with the locking and maybe a new arena allocation. So the "regression" depends on the thread scheduling.


BTW: Please stop combining many different fixes into one patch. The time for reviewing increases exponentially.

Comment 20

Puh is there really no other way than splitting the arenalists? What was wrong with the lock idea you had?

I think the code becomes very error-prone because we always have to maintain 2 separate arena lists now, and there is no easy way to just iterate over the arena list because we always have to check whether the second one is around.
(Assignee)

Comment 21

6 years ago
(In reply to comment #20)
> Puh is there really no other way than splitting the arenalists?

Yes, that was not a good solution. I will update the patch without the split.

The observation is that the arena list can be in 3 states: waiting for finalization, just-finalized, and ready. That can be encoded as a volatile enum. When the allocation thread sees either the waiting or the just-finalized state, it takes the GC lock. If under the lock the state is still waiting, it would allocate a new arena as currently.

If the state is just-finalized, then it means that the background thread has finished finalizing the arena list and has updated the tail of the list with the finalized arenas under the GC lock at some point before. In that case the allocation thread transitions the state to ready. In the ready state the allocation thread proceeds to check the arenas under ArenaList::cursor for free things without any locks.
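A sketch of that three-state scheme follows. The BFS_DONE and BFS_JUST_FINISHED names match the later patch; BFS_RUN and the helper function are invented for the illustration:

enum BackgroundFinalizeState {
    BFS_RUN,            /* background finalization still in progress ("waiting") */
    BFS_JUST_FINISHED,  /* finalized arenas appended to the tail under the GC lock */
    BFS_DONE            /* allocation thread has acknowledged the new tail ("ready") */
};

/*
 * Called by the allocation thread with the GC lock held after it has seen
 * a state other than BFS_DONE. Returns true if the arena list should be
 * rescanned for free things, false if a new arena must be allocated.
 */
bool AcknowledgeBackgroundFinalization(volatile BackgroundFinalizeState *state)
{
    if (*state == BFS_JUST_FINISHED) {
        *state = BFS_DONE;   /* the finalized tail is now safe to scan without the lock */
        return true;
    }
    return false;            /* still BFS_RUN: allocate a new arena */
}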
(In reply to comment #21)
> (In reply to comment #20)
> > Puh is there really no other way than splitting the arenalists?
> 
> Yes, that was not the good solution. I will update the patch without the split. 
> 
> The observation is that the arena list can be in 3 states. Waiting for
> finalization, just-finalized and ready. That can be encoded as a volatile enum.
> When the allocation thread see either waiting or just-finalized state, it takes
> the GC lock. If under the lock the state is still waiting, it would allocate a
> new arena as currently.
> 

Yeah it took me some time to figure out your implicit state-machine. An enum sounds much better than the 
reinterpret_cast<ArenaHeader *>(sizeof(ArenaHeader));
(Assignee)

Comment 23

6 years ago
Created attachment 529319 [details] [diff] [review]
v4

The new version brings more comments about the lock policy regarding arena list access.

The patch moves all the logic related to choosing an arena with a non-empty free list into ArenaList::getNextWithFreeList. Initially I tried to avoid a goto there, but that resulted in rather unclear nested loops. I guess managing a 3-state variable accessed both outside and inside the lock leads to a mess in any case.

With the patch the GC lock protects access to both the head and the cursor of the arena list, not just the cursor. Even with ArenaHeader **, not ArenaHeader *, for the cursor, inserting a newly allocated arena into an empty list requires modifying both the head and the cursor. As a result, ArenaList::getNextWithFreeList must keep the GC lock during the insertion to prevent the background thread from manipulating the cursor while the main thread puts the first arena into the list. But that just adds a few CPU instructions to the duration of the lock.
Attachment #529066 - Attachment is obsolete: true
Attachment #529066 - Flags: review?(anygregor)
Attachment #529319 - Flags: review?(anygregor)

Comment 24

I walked through the code a first time and it looks good, but I need more time for a detailed review.
I have a few nits and questions so far:


> #ifdef JS_GCMETER
>-# define METER(x)               ((void) (x))
>+# define METER(x)               x

Why this change?



> 
>-ArenaList *
>-GetFinalizableArenaList(JSCompartment *c, unsigned thingKind) {
>-    JS_ASSERT(thingKind < FINALIZE_LIMIT);
>-    return &c->arenas[thingKind];
>+template <typename T>
>+inline ArenaHeader *
>+ArenaList::getNextWithFreeList(JSContext *cx, unsigned thingKind)
>+{
>+    /*
>+     * Search the list for arenas with free things unless the
>+     * background finalization still runs.
>+     */
>+#ifdef JS_THREADSAFE
>+    if (backgroundFinalizeState == BFS_DONE) {
>+      check_arena_list:
>+#endif
>+        while (ArenaHeader *aheader = *cursor) {
>+            cursor = &aheader->next;
>+            if (aheader->freeList)
>+                return aheader;
>+        }
>+#ifdef JS_THREADSAFE
>+    }
>+
>+    AutoLockGC lock(cx->runtime);
>+
>+  check_state:
>+    if (backgroundFinalizeState == BFS_JUST_FINISHED) {
>+        /*
>+         * The finalization has finished before we took the lock, check
>+         * the arena list again.
>+         */
>+        backgroundFinalizeState = BFS_DONE;
>+        goto check_arena_list;
>+    }
>+#endif

Couldn't you end up in a situation where you get the GC lock,
goto check_arena_list, and no arena has free things so you end up 
getting the GC lock again?

>+
>+    if (ArenaHeader *aheader = AllocateArena<T>(cx, thingKind)) {
>+        /*
>+         * Add arenas to the head of the list before the cursor so they are
>+         * not checked for the free things again.
>+         */
>+        aheader->next = head;
>+        if (cursor == &head)
>+            cursor = &aheader->next;
>+        head = aheader;
>+        return aheader;
>+    }
>+
>+#ifdef JS_THREADSAFE
>+    /* We don't release the lock when allocating the arenas. */

Remove "the" before arenas.


>+    }
>+
>+    /*
>+     * In stats we should not reflect newly the arenas allocated after the GC
>+     * has finished. So we do not add to livearenas the arenas from al->head.
>+     */

In stats we should not reflect newly allocated arenas



> 
> /* The arenas in a list have uniform kind. */
>-struct ArenaList {
>-    ArenaHeader           *head;          /* list start */
>-    ArenaHeader           *cursor;        /* arena with free things */

arena with possible free things
(Assignee)

Comment 25

6 years ago
(In reply to comment #24)
> > #ifdef JS_GCMETER
> >-# define METER(x)               ((void) (x))
> >+# define METER(x)               x
> 
> Why this change?

In an initial version of the patch I used that to write METER(uint32 x = ); But this is not necessary so I will remove the change.

> Couldn't you end up in a situation where you get the GC lock,
> goto check_arena_list, and no arena has free things so you end up 
> getting the GC lock again?

Yes, this is possible. But arena lists in which every arena is full are rare. So the preference here is to release the lock and walk the list outside it, to minimize the amount of time the lock is held.

Comment 26

(In reply to comment #25)
> 
> > Couldn't you end up in a situation where you get the GC lock,
> > goto check_arena_list, and no arena has free things so you end up 
> > getting the GC lock again?
> 
> Yes, this is possible. But free lists that contain just full arenas are rare.
> So the preference here to release the lock and walk the list outside it to
> minimize the amount of time the lock is held.

Is this allowed?
From MDC:
PR_Lock is not reentrant. Calling it twice on the same thread results in undefined behavior.
(Assignee)

Comment 27

6 years ago
(In reply to comment #26)
> Is this allowed?

Hm, but in C++ a goto out of the scope of a variable calls its destructor. For example, consider the following C++ program:

#include <stdio.h>

struct A {
    A() { printf("Construct\n"); }
    ~A() { printf("Destruct\n"); } 
};


int main()
{
    int i = 0;
  label:
    A a;
    if (!i) {
        ++i;
        goto label;
    }
    return 0;
}

With GCC it prints:

Construct
Destruct
Construct
Destruct

Or are there bugs in some compilers that we use that do not call the destructor?
(In reply to comment #27)
> (In reply to comment #26)
> > Is this allowed?
> 
> Or are there bugs in some compilers that we use that do not call the
> destructor?

Yes, you are right. I never used such code and it looked strange.
Such a bug would be almost impossible to find. Can we add an assert?
All the ifdefs and gotos are not very helpful either in this particular function.
I know that you tried to avoid them, but maybe you can look over it again to see whether we can make it more readable :)
(Assignee)

Comment 29

6 years ago
Created attachment 530012 [details] [diff] [review]
v5

In the new version I split getNextWithFreeList into getArenaWithFreeList and searchForFreeArena. getNextWithFreeList no longer tries to share code between the thread-safe and single-threaded versions. That allowed keeping just a single goto, replacing the rest of the control flow with a single loop. In addition, the new version avoids the TriggerGC call when it retries the allocation later after the finalization is done.
Attachment #529319 - Attachment is obsolete: true
Attachment #529319 - Flags: review?(anygregor)
(Assignee)

Comment 30

6 years ago
Created attachment 530017 [details] [diff] [review]
v5 for real

Prev patch was wrong
Attachment #530012 - Attachment is obsolete: true
Attachment #530017 - Flags: review?(anygregor)
Comment on attachment 530017 [details] [diff] [review]
v5 for real

> 
> static inline bool
> CanBeFinalizedInBackground(gc::FinalizeKind kind, Class *clasp)
> {
>+#ifdef JS_THREADSAFE
>     JS_ASSERT(kind <= gc::FINALIZE_OBJECT_LAST);
>     /* If the class has no finalizer or a finalizer that is safe to call on
>      * a different thread, we change the finalize kind. For example,
>      * FINALIZE_OBJECT0 calls the finalizer on the main thread,
>      * FINALIZE_OBJECT0_BACKGROUND calls the finalizer on the gcHelperThread.
>      * kind % 2 prevents from recursivly incrementing the finalize kind because
>      * we can call NewObject with a background finalize kind.
>      */
>     if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_CONCURRENT_FINALIZER))
>         return true;
>+#endif
>     return false;
> }

nice!

Thx!
Attachment #530017 - Flags: review?(anygregor) → review+
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/b892a04bedf7
Note: not marking as fixed because fixed-in-tracemonkey is not present on the whiteboard.
(Assignee)

Comment 33

6 years ago
I forgot to mark the bug as fixed-in-tracemonkey:

http://hg.mozilla.org/tracemonkey/rev/b892a04bedf7
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
(Assignee)

Updated

6 years ago
Duplicate of this bug: 589771