Closed
Bug 1102525
Opened 6 years ago
Closed 6 years ago
MFBT: Add a SegmentedVector type
Categories
(Core :: MFBT, defect)
Tracking
()
RESOLVED
FIXED
mozilla37
People
(Reporter: njn, Assigned: njn)
References
(Blocks 1 open bug)
Details
Attachments
(4 files, 8 obsolete files)
|
3.12 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
|
13.73 KB,
patch
|
njn
:
review+
|
Details | Diff | Splinter Review |
|
8.97 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
|
13.42 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
nsCycleCollector.cpp has a useful SegmentedArray class. I want to use it in DMD, and it might be useful in other places eventually, so it's worth robustifying it and putting it into MFBT.
Comment 1•6 years ago
|
||
This would be incredibly handy in a couple of places in the GC as well.
| Assignee | ||
Comment 2•6 years ago
|
||
froydnj, some things to note for the review: - I'm using the name SegmentedVector rather than SegmentedArray in order to match mozilla::Vector. - I'm using infallible new for Segment allocation (in SegmentedVector::AppendElement()). You could argue that I should be using an AllocPolicy, but it didn't seem worthwhile. - Please check the move constructor stuff carefully. I've tested it as well as I can, but I'm still not comfortable with how C++ move semantics works. Thank you.
Attachment #8526551 -
Flags: review?(nfroyd)
| Assignee | ||
Comment 3•6 years ago
|
||
smaug, some thing to note for the review: - The patch changes the order of iteration in NS_TRACE_SEGMENTED_ARRAY. Previously, it iterated forward through the segments list but backwards within each segment. Now it iterates forwards through both. I assume/hope this doesn't matter. - I'm not sure if the mValues.Clear() and mObjects.Clear() are needed in Destroy(), now that the destructor does that. Can the clearing happen after the mozilla::DropJSObjects(this) call? - Having a segment capacity of 60 for mValues and mObjects results in a segment size of about 512 (8 * 60 + a bit extra). So changed it to use SegmentedVectorCapacity to get this size more reliably. Thank you.
Attachment #8526552 -
Flags: review?(bugs)
Comment 4•6 years ago
|
||
> Can the clearing happen after the mozilla::DropJSObjects(this) call?
DropJSObjects will go through the array and null out every JS pointer, to enforce that there are no dangling pointers, so it probably more efficient to call clear before DropJSObject(). (Before, the absence of dangling pointers was just checked in debug builds, so the clear would have been needed before Drop)
Comment 5•6 years ago
|
||
Comment on attachment 8526551 [details] [diff] [review] (part 1) - Add SegmentedVector to MFBT Review of attachment 8526551 [details] [diff] [review]: ----------------------------------------------------------------- If JS and GC are interested in this, you don't get the luxury of pretending allocation is infallible. Nor, I think, for an *indexed* collection type should you ever get it anyway. If an allocation fails and you index into that allocation, that's sec-critical territory. Not cool. And if we're discussing SpiderMonkey use, you don't get the ability to not support a custom allocator. We need this to support SpiderMonkey custom allocators, and we likely need it for for memory tracking purposes to tune things like GC.
Attachment #8526551 -
Flags: review?(nfroyd) → review-
| Assignee | ||
Comment 6•6 years ago
|
||
> you don't get the luxury of pretending allocation is infallible.
Isn't mozalloc's |operator new| infallible? What am I missing here?
Comment 7•6 years ago
|
||
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #5) > Nor, I think, for an *indexed* collection type > should you ever get it anyway. If an allocation fails and you index into > that allocation, that's sec-critical territory. Not cool. You crash when the allocation fails.
Comment 8•6 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #3) > - The patch changes the order of iteration in NS_TRACE_SEGMENTED_ARRAY. > Previously, it iterated forward through the segments list but backwards > within each segment. Now it iterates forwards through both. I assume/hope > this doesn't matter. That shouldn't matter > > - I'm not sure if the mValues.Clear() and mObjects.Clear() are needed in > Destroy(), now that the destructor does that. Can the clearing happen after > the mozilla::DropJSObjects(this) call? These days it can. It used to assert hard. http://mxr.mozilla.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp?rev=c3ae42b64ce9&mark=858-866#856 was changed to http://mxr.mozilla.org/mozilla-central/source/xpcom/base/CycleCollectedJSRuntime.cpp?rev=34d52be130aa&mark=886-890#884 But it can be significantly faster to call Clear() before DropJSObjects. > > - Having a segment capacity of 60 for mValues and mObjects results in a > segment > size of about 512 (8 * 60 + a bit extra). So changed it to use > SegmentedVectorCapacity to get this size more reliably. good
Comment 9•6 years ago
|
||
Comment on attachment 8526552 [details] [diff] [review] (part 2) - Replace SegmentedArray with mozilla::SegmentedVector in the cycle collector >+ static const size_t kValuesSegmentSize = 512; >+ static const size_t kObjectsSegmentSize = 512; >+ static const size_t kValuesSegmentCapacity = >+ SegmentedVectorCapacity<JS::Value, kValuesSegmentSize>::value; >+ static const size_t kObjectsSegmentCapacity = >+ SegmentedVectorCapacity<JSObject*, kObjectsSegmentSize>::value; >+ SegmentedVector<JS::Value, kValuesSegmentCapacity> mValues; >+ SegmentedVector<JSObject*, kObjectsSegmentCapacity> mObjects; Could the API somehow hide the capacity parts into its implementation. It would be great if only could just say SegmentedVector<Type, minimumNumberOfEntriesPerSegment> and then the implementation of SegmentedVector might use some higher number than minimumNumberOfEntriesPerSegment for the number of entries in a segment to optimize memory usage. >-#define NS_TRACE_SEGMENTED_ARRAY(_field, _type) \ >- { \ >- auto segment = tmp->_field.GetFirstSegment(); \ >- while (segment) { \ >- for (uint32_t i = segment->Length(); i > 0;) { \ >- js::gc::CallTraceCallbackOnNonHeap<_type, TraceCallbacks>( \ >- &segment->ElementAt(--i), aCallbacks, #_field, aClosure); \ >- } \ >- segment = segment->getNext(); \ >- } \ >+#define NS_TRACE_SEGMENTED_ARRAY(_field, _type) \ >+ { \ >+ for (auto r = tmp->_field.All(); !r.IsEmpty(); r.PopFront()) { \ I don't understand this. Why you Pop anything? Or, apparently Pop means in this context something unusual. It doesn't actually modify the vector. Sounds like the API should be changed. PopFront() should be called to IterateToNext() or some such and Front() maybe Current(). And should All() be Iterator()
Attachment #8526552 -
Flags: review?(bugs) → review-
| Assignee | ||
Comment 10•6 years ago
|
||
> I don't understand this. Why you Pop anything?
> Or, apparently Pop means in this context something unusual. It doesn't
> actually
> modify the vector. Sounds like the API should be changed. PopFront() should
> be called to IterateToNext() or some such and
> Front() maybe Current().
> And should All() be Iterator()
I just copied the existing names from Vector::Range.
Comment 11•6 years ago
|
||
Hmm, bug in Vector then too. I wouldn't copy the bug to the new API. std has for (std::vector<int>::iterator it = myvector.begin() ; it != myvector.end(); ++it) which is quite understandable. No popping anywhere.
Comment 12•6 years ago
|
||
Also, since mozilla/Vector is rarely used API, changing it shouldn't be too tricky.
| Assignee | ||
Comment 13•6 years ago
|
||
I think the names |Range| and |all()| indicate that it's meant to be a conceptual copy of all the elements in the container. With that in mind, popping values makes more sense. I believe Luke wrote that code; maybe he can explain the rationale better.
> Also, since mozilla/Vector is rarely used API, changing it shouldn't be too tricky.
It's used in *lots* of places. grep tells them there are 346 mentions within js/src/ alone. js::HashTable also has a Range type. Searching for "::Range.*all()" I get 85 matches.Flags: needinfo?(luke)
Comment 14•6 years ago
|
||
Comparing to nsTArray, Vector is rarely used. (And Vector is very rarely used outside js/)
| Assignee | ||
Comment 15•6 years ago
|
||
> Could the API somehow hide the capacity parts into its implementation.
> It would be great if only could just say
> SegmentedVector<Type, minimumNumberOfEntriesPerSegment>
> and then the implementation of SegmentedVector might use some higher number
> than
> minimumNumberOfEntriesPerSegment for the number of entries in a segment to
> optimize memory usage.
Doing it like that is difficult. But I can do something similar.
I originally made SegmentedVectorCapacity separate so that you could choose the size of each segment based on either the number of elements or the number of bytes, but in practice we always do the latter because we want to choose a nice power-of-two like 512 or 4096 or 16384 that will minimize slop.
So I'll restrict it to just the SegmentedVector<T, IdealSegmentSize> form. This actually simplifies things slightly, because Segment can be moved within SegmentedVector.| Assignee | ||
Comment 16•6 years ago
|
||
froydnj: I'll post an updated patch on Monday, so no need for you to review now unless you're really curious.
| Assignee | ||
Comment 17•6 years ago
|
||
> froydnj: I'll post an updated patch on Monday
Or now, why not?
froydnj, some things to note for the review:
- I'm using the name SegmentedVector rather than SegmentedArray in order to
match mozilla::Vector.
- I've added the use of AllocPolicy as requested by Waldo, and simplified the
capacity computation as suggested by smaug.
- I've left the Range names untouched, in order to get a second opinion while
we're waiting for Luke's explanation.
- Please check the move constructor stuff carefully. I've tested it as well as
I can, but I'm still not comfortable with how C++ move semantics works.
Thank you.
Attachment #8526765 -
Flags: review?(nfroyd)
| Assignee | ||
Updated•6 years ago
|
Attachment #8526551 -
Attachment is obsolete: true
| Assignee | ||
Comment 18•6 years ago
|
||
As mentioned in the previous comment, I've simplified the capacity computation, but left the Range names unchanged for now.
Attachment #8526766 -
Flags: review?(bugs)
| Assignee | ||
Updated•6 years ago
|
Attachment #8526552 -
Attachment is obsolete: true
Comment 19•6 years ago
|
||
For a while (5 years ago), it looked like "ranges" were going to be added to the STL as an alternative to iterators: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3350.html The basic reasoning is that: 1. A range is an abstraction of a (begin, end) pair which is what we are working with 99% of the time. 2. Ranges are syntactically much nicer to work with (before C++11 for-range came along :) 3. When iterators aren't trivially pointers, it is less efficient and more complicated in the impl to have a sentinel "end" iterator. When I wrote Vector, I felt like iterators were overkill and it wasn't STL-compliant anyway; so I didn't add any Range or Iterator initially, just begin/end which return T*. When I added HashMap/Set, I needed some abstraction because I can't stand callback-based enumeration, so I added Range+all(). Someone later (probably for regularity or some generic algo) added Range+all to Vector. I think it is a good idea to have range be the abstraction exposed by containers, not iterators. For pretty much everything but Vector, the only thing you ever do with iterators is "iterate through every element" and ranges are great for that. For Vectors, we do all manner of crazy things and I think it's way simpler and less verbose to just have T*. But on the naming issue, I am totally happy to do a big sed-rename patch to something more internally consistent and logical. I wrote most of this code in my first few months at Mozilla and I mostly just boosted (pun intended) the names from then-current range proposals. In SM, the convention is to have *Iter classes with operator++ and done() (so our "Iter" classes are actually ranges :). I'd be happy to go with that for HashMap/Set/Vector if others agreed. Ranges apparently didn't make it into C++11 anyway.
Updated•6 years ago
|
Flags: needinfo?(luke)
Comment 20•6 years ago
|
||
Comment on attachment 8526766 [details] [diff] [review] (part 2) - Replace SegmentedArray with mozilla::SegmentedVector in the cycle collector >+++ b/xpcom/base/nsCycleCollector.cpp ... >+class InfallibleAllocPolicy > { ... >+ template <typename T> >+ T* pod_malloc(size_t aNumElems) > { Looks like a wrong place for this class. I'd expect most of the users in Gecko to need infallible allocation. So, move to SegmentedVector.h or something.
Attachment #8526766 -
Flags: review?(bugs) → review+
| Assignee | ||
Comment 21•6 years ago
|
||
> I'd expect most of the users in Gecko to need infallible allocation.
> So, move to SegmentedVector.h or something.
I tried doing that and having InfallibleAllocPolicy as the default instead of MallocAllocPolicy. But moz_xmalloc is in libmozalloc and I had trouble using that from within an MFBT module. But it's probably doable, and glandium might know...
glandium, is using moz_xmalloc from within MFBT a problem? Since this is just a .h file, perhaps it matters more where SegmentedVector.h is used.Flags: needinfo?(mh+mozilla)
| Assignee | ||
Comment 22•6 years ago
|
||
> In SM, the convention is to have *Iter classes with operator++ and done() (so our "Iter" classes are actually ranges :). I'd be happy to go with that for HashMap/Set/Vector if others agreed. Ranges apparently didn't make it into C++11 anyway.
Thank you for the detailed explanation. I'll see how Iter/operator++/done() looks and then probably will file a follow-up for Vector and HashTable to convert them. Though I'm not sure what to do with HashTable::Enum... steal some inspiration from Rust and call it MutIter?
Comment 23•6 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #21) > glandium, is using moz_xmalloc from within MFBT a problem? Since this is > just a .h file, perhaps it matters more where SegmentedVector.h is used. It can only be used in things linked against libmozalloc. Essentially, libxul and gkmedias. *not* js as long as it can be built standalone and/or as a shared lib in a firefox build.
Flags: needinfo?(mh+mozilla)
Comment 24•6 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #6) > Isn't mozalloc's |operator new| infallible? (In reply to Olli Pettay [:smaug] from comment #7) > You crash when the allocation fails. These might be true for Gecko, but they are not true generally for all JSAPI embeddings. And SpiderMonkey does care about getting OOM behavior right for all embeddings, not just Gecko.
| Assignee | ||
Comment 25•6 years ago
|
||
> > glandium, is using moz_xmalloc from within MFBT a problem? Since this is
> > just a .h file, perhaps it matters more where SegmentedVector.h is used.
>
> It can only be used in things linked against libmozalloc. Essentially,
> libxul and gkmedias. *not* js as long as it can be built standalone and/or
> as a shared lib in a firefox build.
Hmm, ok. The reason I wanted to use moz_xmalloc() was so that any OOMs would get annotated with their size nicely. But since that's difficult, I see two options:
- Put an InfallibleAllocPolicy in MFBT that doesn't use moz_xmalloc(), but instead just crashes with MOZ_CRASH.
- Leave the InfallibleAllocPolicy in nsCycleCollector.cpp.| Assignee | ||
Comment 26•6 years ago
|
||
> Hmm, ok. The reason I wanted to use moz_xmalloc() was so that any OOMs would
> get annotated with their size nicely.
Ah, I can put InfallibleAllocPolicy in mozalloc.h. That'll work nicely.| Assignee | ||
Comment 27•6 years ago
|
||
Attachment #8527386 -
Flags: review?(mh+mozilla)
| Assignee | ||
Comment 28•6 years ago
|
||
I've changed |Range| to |Iter| as suggested above. That addresses all the comments so far.
Attachment #8527387 -
Flags: review?(nfroyd)
| Assignee | ||
Updated•6 years ago
|
Attachment #8526765 -
Attachment is obsolete: true
Attachment #8526765 -
Flags: review?(nfroyd)
| Assignee | ||
Comment 29•6 years ago
|
||
Carrying over r+. |Range| has been replaced with |Iter|, and |InfallibleAllocPolicy| is now in mozalloc.
| Assignee | ||
Updated•6 years ago
|
Attachment #8526766 -
Attachment is obsolete: true
| Assignee | ||
Updated•6 years ago
|
Attachment #8527388 -
Flags: review+
| Assignee | ||
Updated•6 years ago
|
Attachment #8527387 -
Attachment is obsolete: true
Attachment #8527387 -
Flags: review?(nfroyd)
Comment 31•6 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #22) > Though I'm not sure what to do with HashTable::Enum... steal > some inspiration from Rust and call it MutIter? That sounds great.
| Assignee | ||
Comment 32•6 years ago
|
||
Had to hack around GCC 4.4's stupidity.
Attachment #8528063 -
Flags: review?(nfroyd)
| Assignee | ||
Updated•6 years ago
|
Attachment #8527400 -
Attachment is obsolete: true
Attachment #8527400 -
Flags: review?(nfroyd)
Updated•6 years ago
|
Attachment #8527386 -
Flags: review?(mh+mozilla) → review+
Comment 33•6 years ago
|
||
Comment on attachment 8528063 [details] [diff] [review] (part 2) - Add SegmentedVector to MFBT Review of attachment 8528063 [details] [diff] [review]: ----------------------------------------------------------------- I think this all looks reasonable, but I think the |Append| and the |mozilla::Array| issue need some discussion. ::: mfbt/SegmentedVector.h @@ +29,5 @@ > +// and not too large so that not too much space is wasted when the final > +// segment is not full). Something like 4096 or 8192 is probably good. > +template<typename T, size_t IdealSegmentSize, > + typename AllocPolicy = MallocAllocPolicy> > +class SegmentedVector : AllocPolicy Can you make this explicitly |: private AllocPolicy|? @@ +53,5 @@ > + mLength++; > + } > + > + uint32_t mLength; > + mozilla::Array<T, SegmentCapacity> mElems; This declaration seems like a bit of a footgun, as we're calling many default constructors when a SegmentImpl is allocated. I don't think that's what I would expect as a Segmented Vector client. Wouldn't it be better to only construct the object when Append() is called (i.e. construction is explicit)? @@ +103,5 @@ > + return n; > + } > + > + // Returns false if the allocation failed. (If you are using an infallible > + // allocation policy, the return value doesn't need to be checked.) We should really make this MOZ_WARN_UNUSED_RESULT, but then what do infallible alloc policies do? Cast to void? MOZ_ASSERT? Or have |Append MOZ_WARN_UNUSED_RESULT| and |InfallibleAppend|? @@ +151,5 @@ > + > + public: > + bool Done() const { return !mSegment; } > + > + T& Get() const WDYT about providing this in |T& Get()| and |const T& Get() const| flavors? The members would probably have to be marked |mutable| of course.
Attachment #8528063 -
Flags: review?(nfroyd) → feedback+
| Assignee | ||
Comment 34•6 years ago
|
||
Here are the requested changes. This version of the patch: - Makes SegmentedVector's inheritance of AllocPolicy |private|. - Uses AlignedStorage for the elements so that default constructors aren't called for them. - Adds InfallibleAppend() and makes Append() use MOZ_WARN_UNUSED_RESULT. - Splits Get() into const and non-const versions.
Attachment #8532989 -
Flags: review?(nfroyd)
| Assignee | ||
Comment 35•6 years ago
|
||
This patch consists of part 2b folded into the already-reviewed part 2.
Attachment #8532991 -
Flags: review?(nfroyd)
| Assignee | ||
Updated•6 years ago
|
Attachment #8528063 -
Attachment is obsolete: true
| Assignee | ||
Comment 36•6 years ago
|
||
SegmentCapacity() is no longer needed.
Attachment #8532994 -
Flags: review?(nfroyd)
| Assignee | ||
Updated•6 years ago
|
Attachment #8532991 -
Attachment is obsolete: true
Attachment #8532991 -
Flags: review?(nfroyd)
| Assignee | ||
Comment 37•6 years ago
|
||
p.s.: I deserve an award for this line:
> new (&(*this)[mLength - 1]) T(mozilla::Forward<U>(aU));
I'll let you decide what the award should be.
Comment 38•6 years ago
|
||
Comment on attachment 8532989 [details] [diff] [review] (part 2b) - Address review comments Review of attachment 8532989 [details] [diff] [review]: ----------------------------------------------------------------- I think the bonus points you gain from providing the interdiff cancel out whatever points you lost from SegmentImpl::Append. ::: mfbt/SegmentedVector.h @@ +8,5 @@ > +// > +// This class should be used in preference to mozilla::Vector or nsTArray when > +// you are simply gathering items in order to later iterate over them, because > +// it avoids those types' need to repeatedly allocate and copy the data into > +// increasingly large buffers. This description sounds incomplete, because one might ask after reading it, "Why shouldn't we just use nsTArray::SetCapacity (or Vector's equivalent)?" thereby avoiding the copying. This nitpicking inspired by https://groups.google.com/a/chromium.org/d/msg/chromium-dev/EUqoIz2iFU4/A8xFIrsKPWUJ We should say something here about it being difficult to determine a good value for the total # of elements being gathered or not wanting to make multiple large-ish allocations for the array's elements. @@ +80,4 @@ > } > > uint32_t mLength; > + mozilla::AlignedStorage<sizeof(T) * SegmentCapacity> mElems; We should have a static_assert here just in case somebody tries using this with T with unusual alignment requirements. Something like: typedef mozilla::AlignedStorage<sizeof(T) * SegmentCapacity> Storage; Storage mElems; static_assert(MOZ_ALIGNOF(T) <= MOZ_ALIGNOF(Storage), "..."); works, I think?
Attachment #8532989 -
Flags: review?(nfroyd) → review+
Comment 39•6 years ago
|
||
Comment on attachment 8532994 [details] [diff] [review] (part 2) - Add SegmentedVector to MFBT Review of attachment 8532994 [details] [diff] [review]: ----------------------------------------------------------------- I'm assuming this is just the previous part 2 + interdiff 2b, so it doesn't need careful review. rs+
Attachment #8532994 -
Flags: review?(nfroyd) → review+
| Assignee | ||
Comment 40•6 years ago
|
||
> I'm assuming this is just the previous part 2 + interdiff 2b, so it doesn't
> need careful review.
Correct.| Assignee | ||
Comment 41•6 years ago
|
||
AlignedStorage only guarantees uint64_t alignment, which on 32-bit machines is 4 bytes, which is insufficient for SegmentedVector<JS::Value> because JS::Value is 8-byte-aligned. So I ended up using a union, similar to how nsTArray does it. I also added an example vector in the test whose element requires 16-byte alignment. https://hg.mozilla.org/integration/mozilla-inbound/rev/c207ff1b40fd https://hg.mozilla.org/integration/mozilla-inbound/rev/96c0e71d648d https://hg.mozilla.org/integration/mozilla-inbound/rev/a251917f5bfc
Comment 42•6 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/c207ff1b40fd https://hg.mozilla.org/mozilla-central/rev/96c0e71d648d https://hg.mozilla.org/mozilla-central/rev/a251917f5bfc
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
You need to log in
before you can comment on or make changes to this bug.
Description
•