Closed Bug 1407142 Opened 7 years ago Closed 7 years ago

Lock inversion in Nursery::init

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla58

Tracking Flags:

Tracking

Status

firefox57

---

wontfix

firefox58

---

fixed

People

(Reporter: pbone, Assigned: pbone)

References

Details

Attachments

(2 files, 4 obsolete files)

inversion.patch 7 years ago Paul Bone [:pbone] 10.60 KB, patch	jonco : feedback+	Details \| Diff \| Splinter Review
Bug 1407142 (Part 1) - Remove the AutoMaybeStartBackgroundAllocation class 7 years ago Paul Bone [:pbone] 11.12 KB, patch	jonco : review+	Details \| Diff \| Splinter Review
Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation 7 years ago Paul Bone [:pbone] 7.71 KB, patch	jonco : feedback+	Details \| Diff \| Splinter Review
Bug 1407142 (Part 1) - Remove the AutoMaybeStartBackgroundAllocation class r=jonco 7 years ago Paul Bone [:pbone] 11.12 KB, patch	pbone : review+	Details \| Diff \| Splinter Review
Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation 7 years ago Paul Bone [:pbone] 7.38 KB, patch	jonco : review+	Details \| Diff \| Splinter Review
Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation r=jonco 7 years ago Paul Bone [:pbone] 7.37 KB, patch	pbone : review+	Details \| Diff \| Splinter Review

Paul Bone [:pbone]

Assignee

Description

•

7 years ago

Nursery::init takes a AutoLockGC reference and constructs a AutoMaybeStartBackgroundAllocation object inside. This can create a lock inversion since the AutoMaybeStartBackgroundAllocation deconstructor runs before the AutoLockGC destructor.

Naveed Ihsanullah [:naveed]

Updated

•

7 years ago

status-firefox57: --- → wontfix

Priority: -- → P3

Paul Bone [:pbone]

Assignee

Comment 1

•

7 years ago

The lock inversion doesn't seem to affect anything. So for some reason the background allocation is never started in this situation. Therefore I agree with Naveed that we don't need to uplift it to 57. However it should still be fixed.

Assignee: nobody → pbone

Status: NEW → ASSIGNED

status-firefox58: --- → affected

Paul Bone [:pbone]

Assignee

Comment 2

•

7 years ago

Hi sfink, can we make tsan detect situations like this?

Flags: needinfo?(sphink)

Paul Bone [:pbone]

Assignee

Comment 3

•

7 years ago

Attached patch inversion.patch (obsolete) — Details — Splinter Review

This patch makes this particular lock inversion impossible by having the AutoLockGC class itself do the AutoMaybeStartBackgroundAllocation work. I have two questions regarding this code: In https://searchfox.org/mozilla-central/source/js/src/gc/Allocator.cpp#524 Why do we pass rt->gc to tryToStartBackgroundAllocation rather than passing 'this'? In https://searchfox.org/mozilla-central/source/js/src/jsgc.h#307 Some comments refer to "worker thread state lock" is this the same as a helper lock? Thanks.

Attachment #8917247 - Flags: review?(jcoppeard)

Paul Bone [:pbone]

Assignee

Comment 4

•

7 years ago

The try job is https://treeherder.mozilla.org/#/jobs?repo=try&revision=9342207cc750b7761c2db9cd185e23195d69e653

Jon Coppeard (:jonco) (PTO until 10th March)

Comment 5

•

7 years ago

Comment on attachment 8917247 [details] [diff] [review] inversion.patch Review of attachment 8917247 [details] [diff] [review]: ----------------------------------------------------------------- Hey, I like the idea of having a class to do both jobs. It makes sense when you're manipulating the chunk lists to be able to start the background alloc when you're done. And I've always hated the name AutoMaybeStartBackgroundAllocation. I was in two minds about whether we want to add this behaviour to the AutoLockGC class itself or use a separate class for this (probably derived from AutoLockGC). The GC lock is used for other things too and I'd rather keep it simple and have that class *just* lock the GC (like all the other auto locking classes)... however having said that, locking the chunk lists is its main use and I think this is fine. This patch also simplifies the code in a bunch of places which I like. Yes, "worker thread state lock" is the same thing as the helper thread state lock. Actually we should probably change it as "workers" can refer to "web workers" which is totally different. ::: js/src/gc/Allocator.cpp @@ +515,4 @@ > } > > if (wantBackgroundAllocation(lock)) > + lock.tryToStartBackgroundAllocation(rt->gc); Passing |rt->gc| here is a hangover from a previous era... you can just pass |this|. ::: js/src/vm/Runtime.h @@ +1161,4 @@ > return jitPoisonRanges.append(range); > } > > +using gc::GCRuntime; Don't use using at the top level of a header file - it will affect everything that includes it. Usually we just qualify everything in headers (unless it becomes really tedious and then you can use using inside the class definition). @@ +1174,4 @@ > explicit AutoLockGC(JSRuntime* rt > MOZ_GUARD_OBJECT_NOTIFIER_PARAM) > : runtime_(rt) > + , gcForBGAlloc(nullptr) The GCRuntime pointer will always be |&runtime_->gc|, so you don't need a separate pointer for this. Please use a bool instead. @@ +1184,5 @@ > unlock(); > + /* > + * We have to do this after releasing the lock because it may acquire > + * the helper lock which could cause lock inversion if we still held > + * the GC lock. Lock order is checked in debug builds so this would assert. @@ +1201,4 @@ > lockGuard_.reset(); > } > > + void tryToStartBackgroundAllocation(GCRuntime &gc) { While we're here, can you add comment explaining that this is used to kick off background alloc and ensure we have the minimum number of free chunks after manipulating chunks lists with the lock held, or something to that effect?

Attachment #8917247 - Flags: review?(jcoppeard) → feedback+

Paul Bone [:pbone]

Assignee

Comment 6

•

7 years ago

(In reply to Jon Coppeard (:jonco) from comment #5) > Comment on attachment 8917247 [details] [diff] [review] > inversion.patch > > Review of attachment 8917247 [details] [diff] [review]: > ----------------------------------------------------------------- > > Hey, I like the idea of having a class to do both jobs. It makes sense when > you're manipulating the chunk lists to be able to start the background alloc > when you're done. And I've always hated the name > AutoMaybeStartBackgroundAllocation. > > I was in two minds about whether we want to add this behaviour to the > AutoLockGC class itself or use a separate class for this (probably derived > from AutoLockGC). The GC lock is used for other things too and I'd rather > keep it simple and have that class *just* lock the GC (like all the other > auto locking classes)... however having said that, locking the chunk lists > is its main use and I think this is fine. Ah, I didn't think of making a sub class for this task. I like that better because it'll use less memory when you only need AutoLockGC. I'll take a look but I also see how these things are tightly coupled and might as well be in the same class. > ::: js/src/vm/Runtime.h > @@ +1161,4 @@ > > return jitPoisonRanges.append(range); > > } > > > > +using gc::GCRuntime; > > Don't use using at the top level of a header file - it will affect > everything that includes it. Usually we just qualify everything in headers > (unless it becomes really tedious and then you can use using inside the > class definition). I tried putting the using within the class declaration, but gcc didn't like it. I think it thought it was a class member. I"ll update this to use an explicit namespace each time. Thanks. > @@ +1174,4 @@ > > explicit AutoLockGC(JSRuntime* rt > > MOZ_GUARD_OBJECT_NOTIFIER_PARAM) > > : runtime_(rt) > > + , gcForBGAlloc(nullptr) > > The GCRuntime pointer will always be |&runtime_->gc|, so you don't need a > separate pointer for this. Please use a bool instead. I wondered about that but forgot to ask. I'm glad and I'll do that. > @@ +1184,5 @@ > > unlock(); > > + /* > > + * We have to do this after releasing the lock because it may acquire > > + * the helper lock which could cause lock inversion if we still held > > + * the GC lock. > > Lock order is checked in debug builds so this would assert. Good, but I want to keep the comment anyway. It helps anyone reading the code. Is this checked at runtime or some kind of compile time analysis? tsan? Why wasn't the original lock inversion picked up? > @@ +1201,4 @@ > > lockGuard_.reset(); > > } > > > > + void tryToStartBackgroundAllocation(GCRuntime &gc) { > > While we're here, can you add comment explaining that this is used to kick > off background alloc and ensure we have the minimum number of free chunks > after manipulating chunks lists with the lock held, or something to that > effect? Yep, I agree and will add this. Thanks.

Chia-Hung Duan

Comment 7

•

7 years ago

(In reply to Paul Bone [:pbone] from comment #6) > (In reply to Jon Coppeard (:jonco) from comment #5) > > Comment on attachment 8917247 [details] [diff] [review] > > inversion.patch > > > > Review of attachment 8917247 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > Hey, I like the idea of having a class to do both jobs. It makes sense when > > you're manipulating the chunk lists to be able to start the background alloc > > when you're done. And I've always hated the name > > AutoMaybeStartBackgroundAllocation. > > > > I was in two minds about whether we want to add this behaviour to the > > AutoLockGC class itself or use a separate class for this (probably derived > > from AutoLockGC). The GC lock is used for other things too and I'd rather > > keep it simple and have that class *just* lock the GC (like all the other > > auto locking classes)... however having said that, locking the chunk lists > > is its main use and I think this is fine. > > Ah, I didn't think of making a sub class for this task. I like that better > because it'll use less memory when you only need AutoLockGC. I'll take a > look but I also see how these things are tightly coupled and might as well > be in the same class. > > > ::: js/src/vm/Runtime.h > > @@ +1161,4 @@ > > > return jitPoisonRanges.append(range); > > > } > > > > > > +using gc::GCRuntime; > > > > Don't use using at the top level of a header file - it will affect > > everything that includes it. Usually we just qualify everything in headers > > (unless it becomes really tedious and then you can use using inside the > > class definition). > > I tried putting the using within the class declaration, but gcc didn't like > it. I think it thought it was a class member. I"ll update this to use an > explicit namespace each time. Thanks. > > > @@ +1174,4 @@ > > > explicit AutoLockGC(JSRuntime* rt > > > MOZ_GUARD_OBJECT_NOTIFIER_PARAM) > > > : runtime_(rt) > > > + , gcForBGAlloc(nullptr) > > > > The GCRuntime pointer will always be |&runtime_->gc|, so you don't need a > > separate pointer for this. Please use a bool instead. > > I wondered about that but forgot to ask. I'm glad and I'll do that. > > > @@ +1184,5 @@ > > > unlock(); > > > + /* > > > + * We have to do this after releasing the lock because it may acquire > > > + * the helper lock which could cause lock inversion if we still held > > > + * the GC lock. > > > > Lock order is checked in debug builds so this would assert. > > Good, but I want to keep the comment anyway. It helps anyone reading the > code. > > Is this checked at runtime or some kind of compile time analysis? tsan? > Why wasn't the original lock inversion picked up? > It's a runtime check. The references: https://searchfox.org/mozilla-central/source/js/src/threading/Mutex.cpp#45 https://searchfox.org/mozilla-central/source/js/src/vm/MutexIDs.h During Nursery::init(), I guess it won't trigger background allocation, i.e. we don't need to acquire helper lock, that's why the assertion is not fired. > > @@ +1201,4 @@ > > > lockGuard_.reset(); > > > } > > > > > > + void tryToStartBackgroundAllocation(GCRuntime &gc) { > > > > While we're here, can you add comment explaining that this is used to kick > > off background alloc and ensure we have the minimum number of free chunks > > after manipulating chunks lists with the lock held, or something to that > > effect? > > Yep, I agree and will add this. > > Thanks.

Paul Bone [:pbone]

Assignee

Comment 8

•

7 years ago

Attached patch Bug 1407142 (Part 1) - Remove the AutoMaybeStartBackgroundAllocation class (obsolete) — Details — Splinter Review

Attachment #8917247 - Attachment is obsolete: true

Attachment #8917690 - Flags: review?(jcoppeard)

Paul Bone [:pbone]

Assignee

Comment 9

•

7 years ago

Attached patch Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation (obsolete) — Details — Splinter Review

Hi Jon, I've created a subclass of AutoLockGC to add the background allocation code there. Could you check my class definition? Do I need to define a private copy constructor or operator= method as in the parent class, or are the definitions in the parent good enough for this class? Is there an easier or better way to ensure that the unlock / destructor code runs without using virtual methods? If not, is what I've done here worth it or should we just drop this patch? Thanks.

Attachment #8917692 - Flags: review?(jcoppeard)

Jon Coppeard (:jonco) (PTO until 10th March)

Comment 10

•

7 years ago

(In reply to Chia-Hung Duan from comment #7) > It's a runtime check. The references: > > https://searchfox.org/mozilla-central/source/js/src/threading/Mutex.cpp#45 > https://searchfox.org/mozilla-central/source/js/src/vm/MutexIDs.h > > During Nursery::init(), I guess it won't trigger background allocation, i.e. > we don't need to acquire helper lock, that's why the assertion is not fired. Chia-Hung you are correct. The comment in GCRuntime::wantBackgroundAllocation explains that we don't trigger background allocation when the heap size is small, which will be the case during initialisation.

Paul Bone [:pbone]

Assignee

Comment 11

•

7 years ago

(In reply to Jon Coppeard (:jonco) from comment #10) > > The comment in GCRuntime::wantBackgroundAllocation explains that we don't > trigger background allocation when the heap size is small, which will be the > case during initialisation. I was wondering why it wouldn't be intermittent. But this makes total sense. I'm still curious if it could be done statically, possibly via tsan. Maybe we want to do some background allocation for the whole heap during startup. Based on bug 1387950 it sounded like we wanted to shave some time off Firefox startup. Maybe some background allocation could help then. Particularly if some pool of chunks can be shared between different parts of Firefox, not just SpiderMonkey.

Jon Coppeard (:jonco) (PTO until 10th March)

Comment 12

•

7 years ago

Comment on attachment 8917690 [details] [diff] [review] Bug 1407142 (Part 1) - Remove the AutoMaybeStartBackgroundAllocation class Review of attachment 8917690 [details] [diff] [review]: ----------------------------------------------------------------- This looks good. r=me. ::: js/src/gc/Nursery.h @@ +446,1 @@ > AutoLockGC& lock); You can put the two function arguments on the same line now. ::: js/src/vm/Runtime.h @@ +1206,5 @@ > + /* > + * This can be used to start a background allocation task (if one isn't > + * already running) that allocates chunks and makes them available in the > + * free chunks list. This happens after the lock is released in order to > + * avoid lock inversion. "if one isn't already running" - and also only if necessary as judged by some heuristics in GCRuntime::wantBackgroundAllocation(). @@ +1223,5 @@ > MOZ_DECL_USE_GUARD_OBJECT_NOTIFIER > > + // true if we should start a background chunk allocation task after the > + // lock is released. > + bool startBgAlloc; Please remove the trailing space at the end of the line.

Attachment #8917690 - Flags: review?(jcoppeard) → review+

Jon Coppeard (:jonco) (PTO until 10th March)

Comment 13

•

7 years ago

Comment on attachment 8917692 [details] [diff] [review] Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation Review of attachment 8917692 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Runtime.h @@ +1219,5 @@ > +{ > + public: > + explicit AutoLockGCBgAlloc(JSRuntime* rt) > + : AutoLockGC(rt) > + , startBgAlloc(false) { } style nit: the empty method body should be on the next line and doesn't need a space inside, i.e. just "{}" @@ +1226,5 @@ > + * Neither the destructor or the unlock method are virtual. We want to > + * avoid a virtual method table which we can do because we will always > + * call the destructor with the correct static type which will have the > + * correct behaviour. > + */ I don't think we need this comment. We don't use virtual methods that often in SpiderMonkey so the assumption would be that methods are non-virtual anyway. The reason you've run into this I guess is that the destructor class unlock(), but we need to call a different version of unlock() in the subclass. I'd suggest simplifying the semantics to start background alloc on destruction rather than on any unlock. That matches what the original code did when it used AutoMaybeStartBackgroundAllocation. Then you can just do that in the destructor and don't have to have an unlock method here. > Do I need to define a private copy constructor or operator= method as in the parent class, or are the definitions in the parent good enough for this class? You don't need to do anything here. We don't want these methods (the compiler will try and generate them by default) so they are declared as deleted in the parent class. For the subclass, the compiler won't generate these methods because they would need to call the deleted versions in the parent class.

Jon Coppeard (:jonco) (PTO until 10th March)

Updated

•

7 years ago

Attachment #8917692 - Flags: review?(jcoppeard) → feedback+

Steve Fink [:sfink] [:s:]

Comment 14

•

7 years ago

tsan is a dynamic analysis, not a static one, so it would have no more ability to detect this case than the already-existing debug assertions. The rooting hazard analysis is the closest thing we have to something that would be able to check this, and I have been considering implementing something like this. What the hazard analysis has now is that it knows that certain RAII classes either suppress GC or suppress the analysis within their scope. A function that is *only* called within the scope of such a class can never GC, so need not be scanned for hazards. For this, we need the opposite -- we want functions that might *ever* called within the scope of a lock RAII class. If they might take the same lock, we should complain about deadlocking with that single lock. If they might take a different lock, then that determines the expected locking order, and if we observe a different order anywhere else, we can complain. I don't know how doable it is. That analysis is pretty straightforward and ought to be doable, but it might have too many false positives. Those can arise when we unlock via another RAII class or via function calls, and then re-lock at a time when we will never dynamically be holding the lock already. (A subset of those could be known statically.) I suspect there would be a fair amount of fiddling to get it to a usable state. There would be false negatives too, where we use a function call to lock and then later use an RAII class. And that still doesn't cover the case where we use an RAII class to lock, but we pass in which lock to use. That's used in stylo code -- or it was, now they're switching to using a single lock for everything just to make things easier for analyses.

Flags: needinfo?(sphink)

Steve Fink [:sfink] [:s:]

Comment 15

•

7 years ago

Filed bug 1408081 for the analysis idea.

Paul Bone [:pbone]

Assignee

Comment 16

•

7 years ago

(In reply to Jon Coppeard (:jonco) from comment #13) > > The reason you've run into this I guess is that the destructor class > unlock(), but we need to call a different version of unlock() in the > subclass. Yeah, I didn't try any alternatives. It's my understanding of C++ that if I put this behaviour in unlock() then I need to oervide both unlock() and the destructor, so that my destructor called the correct unlock(). > I'd suggest simplifying the semantics to start background alloc on > destruction rather than on any unlock. That matches what the original code > did when it used AutoMaybeStartBackgroundAllocation. Then you can just do > that in the destructor and don't have to have an unlock method here. Yeah, that's better. > > Do I need to define a private copy constructor or operator= method as in the parent class, or are the definitions in the parent good enough for this class? > > You don't need to do anything here. We don't want these methods (the > compiler will try and generate them by default) so they are declared as > deleted in the parent class. For the subclass, the compiler won't generate > these methods because they would need to call the deleted versions in the > parent class. That last bit was what I was wondering. I read about how you can declare methods like this as deleted. I wanted to check what happens when I make a subclass; to make sure it inherrits the deleted-ness. Thanks.

Paul Bone [:pbone]

Assignee

Comment 17

•

7 years ago

Attached patch Bug 1407142 (Part 1) - Remove the AutoMaybeStartBackgroundAllocation class r=jonco — Details — Splinter Review

Rebased onto central, r+ carried forward.

Attachment #8917690 - Attachment is obsolete: true

Attachment #8918838 - Flags: review+

Paul Bone [:pbone]

Assignee

Comment 18

•

7 years ago

Attached patch Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation (obsolete) — Details — Splinter Review

Attachment #8917692 - Attachment is obsolete: true

Attachment #8918839 - Flags: review?(jcoppeard)

Paul Bone [:pbone]

Assignee

Updated

•

7 years ago

Blocks: 1298018

Jon Coppeard (:jonco) (PTO until 10th March)

Comment 19

•

7 years ago

Comment on attachment 8918839 [details] [diff] [review] Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation Review of attachment 8918839 [details] [diff] [review]: ----------------------------------------------------------------- r=me with the problem below fixed. ::: js/src/vm/Runtime.h @@ +1183,3 @@ > > ~AutoLockGC() { > unlock(); I ran the tests against these patches and this call to unlock() is causing assertion failures because the lock may already have been released by the subclass' destructor at this point. I think the fix is just to do |lockGuard_.reset()| here without the assertion at the start of unlock().

Attachment #8918839 - Flags: review?(jcoppeard) → review+

Paul Bone [:pbone]

Assignee

Comment 20

•

7 years ago

(In reply to Jon Coppeard (:jonco) from comment #19) > Comment on attachment 8918839 [details] [diff] [review] > Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation > > Review of attachment 8918839 [details] [diff] [review]: > ----------------------------------------------------------------- > > r=me with the problem below fixed. > > ::: js/src/vm/Runtime.h > @@ +1183,3 @@ > > > > ~AutoLockGC() { > > unlock(); > > I ran the tests against these patches and this call to unlock() is causing > assertion failures because the lock may already have been released by the > subclass' destructor at this point. Ouch. I missed that, earlier I was testing with --enable-debug but I hadn't re-tested that since changing this patch. I missed the implicit call to the base class destructor. Thanks for catching this, I guess you saw it in the try jobs for my other change? > I think the fix is just to do |lockGuard_.reset()| here without the > assertion at the start of unlock(). I think we can keep that assertion provided the AutoLockGC destructor doesn't call unlock(). This way any explicit calls to unlock can keep the assertion. I investigated other options for keeping the assertion in more cases, such as when AutoLockGC is used without AutoLockGCBgAlloc, but either everything I do has a runtime cost or less flexability than this subclass relationship. Thanks.

Paul Bone [:pbone]

Assignee

Comment 21

•

7 years ago

Attached patch Bug 1407142 (Part 2) - Subclass AutoLockGC to handle background allocation r=jonco — Details — Splinter Review

Update based on review feedback. r+ carried forward.

Attachment #8918839 - Attachment is obsolete: true

Attachment #8919097 - Flags: review+

Paul Bone [:pbone]

Assignee

Updated

•

7 years ago

Keywords: checkin-needed

Pulsebot

Comment 22

•

7 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/334054ec2083 Part 1: Remove the AutoMaybeStartBackgroundAllocation class. r=jonco https://hg.mozilla.org/integration/mozilla-inbound/rev/81df983b8560 Part 2: Subclass AutoLockGC to handle background allocation. r=jonco

Keywords: checkin-needed

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1409411

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 23

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/334054ec2083 https://hg.mozilla.org/mozilla-central/rev/81df983b8560

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox58: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla58

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

No longer depends on: 1409411

You need to log in before you can comment on or make changes to this bug.