Closed Bug 1094564 Opened 10 years ago Closed 10 years ago

Use a segmented vector in SnowWhiteKiller

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(2 files, 1 obsolete file)

SnowWhiteKiller allocates an nsTArray whose size is based on a worst-case
scenario, e.g. if every object in the purple buffer needs to be appended to it.
And it can be quite big -- often 0.5 MiB or more each time. Here are some
sizes from a DMD run:

> Individual block sizes: 913,408; 733,184; 724,992; 716,800 x 2; 499,712;
> 462,848 x 2; 454,656 x 2; 450,560 x 2; 446,464; 442,368; 438,272 x 2; 434,176
> x 2; 430,080 x 2; 425,984; 421,888; 413,696; 409,600; 401,408 x 2; 397,312;
> 393,216; 376,832; 368,640; 364,544 x 2; 233,472 x 5; 167,936; 135,168;
> 57,344; 40,960 x 3; 36,864 x 7; 32,768 x 4; 28,672 x 8; 24,576 x 9; 20,480 x
> 5; 16,384 x 5; 12,288 x 24; 8,192 x 17; 4,096 x 20; ~4,093 x 2

But in practice, most of the time the nsTArray never gets even close to being
filled. In fact, around 95% of the time a 4 KiB (on 32-bit) or 8 KiB (on
64-bit) buffer will suffice. So a better approach is to use a segmented vector
type and grow it as necessary, instead of allocating for the worse case.

Benefits:
- It will greatly reduce the cumulative allocations. 
- It will avoids the need for large allocations, which can be a problem on
  win32 when virtual address space runs low. 
- It might help with fragmentation, because all the allocations will be a
  single nice size (4 or 8 KiB) instead of having all these awkward sizes like
  892 KiB and 716 KiB and 708 KiB.
Attached file profiling results
Here are some more measurements from a different browser session. Each line
shows a SnowWhiteKiller instantiation, the required length and the allocated
capacity for the array. (This is measured in elements, not bytes.)

E.g. this line:

> 3 / 12547 (0.02%)

indicates that we allocated an nsTArray with a capacity of 12547, but we only
ended up using 3 of those elements.

Summary stats:
- 574 cases in total.
- 80 (13.9%) of them ended up using 0 elements.
- 555 (96.7%) of them needed <= 341 elements, which is how many can fit in 4
  KiB on 32-bit and 8 KiB on 64-bit.
- The maximum number of used elements is 7212, which is 173,088 bytes on
  64-bit.
- The maximum allocated capacity is 30,519, which is 732,456 bytes on 64-bit.
This patch implements the change suggested in comment 0. If I start the
browser, open Gmail, CNN and TechCrunch, and the shutdown, I get:

- old: 28,440,888 bytes of cumulative allocations from 210 allocs
- new:  1,351,680 bytes of cumulative allocations from 330 allocs

I considered writing a generic SegmentedVector class, but ended up opting for
code specific to this file, for conciseness.
Attachment #8517878 - Flags: review?(bugs)
Here are cumulative measurements using the current code in an e10s build when
running http://gregor-wagner.com/tmp/mem50:

> PARENT PROCESS
> --------------
> Live {
>   ~1,200 blocks in heap block record 33 of 4,903
>   ~82,550,742 bytes (~80,033,302 requested / ~2,517,440 slop)
>   Individual block sizes: 1,122,304; 1,118,208; 479,232; 434,176; 417,792 x 2; 413,696 x 5; 409,600; 364,544; 323,584; 319,488; 307,200; 294,912 x 2; 266,240; 253,952; 249,856; 245,760; 241,664; 237,568 x 5; 233,472 x 6; 229,376 x 6; 225,280 x 3; 221,184 x 4; 217,088; 212,992 x 3; 204,800 x 2; 200,704 x 2; 196,608; 192,512; 188,416 x 5; 180,224 x 4; 176,128 x 4; 172,032 x 8; 167,936 x 7; 163,840 x 4; 159,744 x 2; 155,648 x 5; 151,552 x 6; 147,456 x 8; 143,360 x 3; 139,264 x 13; 135,168 x 9; 131,072 x 8; 126,976 x 16; 122,880 x 13; 118,784 x 16; 114,688 x 9; 110,592 x 14; 106,496 x 22; 102,400 x 18; 98,304 x 18; 94,208 x 17; 90,112 x 19; 86,016 x 31; 81,920 x 27; 77,824 x 32; 73,728 x 29; 69,632 x 39; 65,536 x 30; 61,440 x 33; 57,344 x 43; 53,248 x 24; 49,152 x 29; 45,056 x 40; 40,960 x 44; 36,864 x 48; 32,768 x 55; 28,672 x 41; 24,576 x 61; 20,480 x 49; 16,384 x 53; 12,288 x 64; 8,192 x 72; 4,096 x 37; ~4,093 x 14
>   0.64% of the heap (60.73% cumulative)
>   Allocated at {
>     #01: nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity(unsigned long, unsigned long) (/home/njn/moz/mi5/co64dmd/toolkit/components/ur
> l-classifier/../../../dist/include/nsTArray-inl.h:135)
>     #02: nsTArray_Impl<SnowWhiteObject, nsTArrayFallibleAllocator>::SetCapacity(unsigned long) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../dist/include/nsTArray.h:1478)
>     #03: SnowWhiteKiller (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2606)
>   }
> }
> 
> CHILD PROCESS
> -------------
> Live {
>   ~1,319 blocks in heap block record 7 of 5,967
>   ~270,606,294 bytes (~267,927,758 requested / ~2,678,536 slop)
>   Individual block sizes: 2,383,872; 2,367,488; 2,301,952; 2,256,896; 2,252,800 x 16; 2,093,056; 2,068,480; 1,998,848; 1,359,872; 1,101,824; 933,888; 786,432; 741,376; 733,184; 716,800; 700,416; 696,320; 692,224; 679,936; 675,840; 667,648; 663,552 x 2; 638,976 x 2; 630,784 x 2; 626,688 x 2; 622,592; 618,496; 614,400 x 2; 610,304; 606,208 x 2; 602,112 x 4; 598,016 x 2; 593,920 x 2; 589,824 x 2; 585,728 x 3; 581,632; 577,536 x 2; 569,344 x 4; 565,248 x 7; 561,152 x 2; 557,056 x 2; 552,960 x 4; 548,864; 544,768 x 3; 540,672; 536,576 x 2; 532,480 x 4; 528,384; 520,192; 512,000; 507,904; 503,808 x 2; 499,712; 495,616; 483,328 x 2; 475,136; 471,040 x 3; 466,944; 462,848 x 2; 458,752 x 2; 454,656 x 2; 450,560 x 2; 446,464 x 3; 442,368; 434,176; 425,984 x 4; 421,888; 417,792 x 2; 413,696 x 3; 405,504 x 3; 401,408 x 3; 397,312 x 2; 393,216 x 2; 389,120 x 3; 385,024 x 2; 376,832 x 2; 372,736 x 4; 368,640 x 4; 364,544 x 5; 360,448 x 5; 352,256 x 2; 348,160 x 5; 344,064 x 3; 339,968 x 3; 335,872 x 6; 331,776 x 6; 327,680; 323,584 x 4; 319,488; 315,392 x 2; 311,296 x 3; 307,200 x 8; 303,104 x 4; 299,008 x 2; 294,912 x 2; 290,816 x 5; 286,720 x 4; 282,624; 278,528 x 6; 274,432 x 5; 270,336 x 4; 266,240 x 5; 262,144 x 6; 258,048 x 7; 253,952 x 2; 249,856 x 4; 245,760 x 3; 241,664 x 6; 237,568 x 7; 233,472 x 7; 229,376 x 5; 225,280 x 8; 221,184 x 8; 217,088 x 9; 212,992 x 14; 208,896 x 14; 204,800 x 19; 200,704 x 15; 196,608 x 13; 192,512 x 8; 188,416 x 13; 184,320 x 10; 180,224 x 18; 176,128 x 19; 172,032 x 17; 167,936 x 18; 163,840 x 22; 159,744 x 18; 155,648 x 14; 151,552 x 31; 147,456 x 28; 143,360 x 15; 139,264 x 30; 135,168 x 32; 131,072 x 27; 126,976 x 9; 122,880 x 21; 118,784 x 14; 114,688 x 17; 110,592 x 7; 106,496 x 22; 102,400 x 18; 98,304 x 11; 94,208 x 22; 90,112 x 29; 86,016 x 19; 81,920 x 5; 77,824 x 16; 73,728 x 6; 69,632 x 15; 65,536 x 14; 61,440 x 16; 57,344 x 17; 53,248 x 19; 49,152 x 29; 45,056 x 20; 40,960 x 30; 36,864 x 29; 32,768 x 19; 28,672 x 22; 24,576 x 28; 20,480 x 34; 16,384 x 9; 12,288 x 12; 8,192 x 32; 4,096 x 25; ~4,093 x 14
>   1.91% of the heap (44.37% cumulative)
>   Allocated at {
>     #01: nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity(unsigned long, unsigned long) (/home/njn/moz/mi5/co64dmd/toolkit/components/ur
> l-classifier/../../../dist/include/nsTArray-inl.h:135)
>     #02: nsTArray_Impl<SnowWhiteObject, nsTArrayFallibleAllocator>::SetCapacity(unsigned long) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../dist/include/nsTArray.h:1478)
>     #03: SnowWhiteKiller (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2606)
>   }
> }
Summary: Used a segmented vector in SnowWhiteKiller → Use a segmented vector in SnowWhiteKiller
(We really need to put a proper segmented array to some public place, like xpcom/* or mfbt/*.)

Couldn't you at least use LinkedList and not ChunkList.
But even better would be if you could reuse SegmentedArray which is already used for JSPurpleBuffer.
It does have magical '60' for the array size, but that could be factored out to be more flexible.
Comment on attachment 8517878 [details] [diff] [review]
Used a segmented vector in SnowWhiteKiller

So unless there is some really good reason to add yet another segmented array, 
we should use the existing one (especially because it is a template class already).
Attachment #8517878 - Flags: review?(bugs) → review-
This patch generalizes SegmentedArray a little, and then uses it instead of
nsTArray in SnowWhiteKiller.
Attachment #8518666 - Flags: review?(bugs)
Attachment #8517878 - Attachment is obsolete: true
Comment on attachment 8518666 [details] [diff] [review]
Used SegmentedArray in SnowWhiteKiller

>+  // Segments are 4 KiB on 32-bit and 8 KiB on 64-bit.
>+  static const size_t kIdealSegmentSize = sizeof(void*) * 1024;
>+  static const size_t kSingleElemSegmentSize =
>+    sizeof(SegmentedArrayElement<SnowWhiteObject, 1>);
>+  static const size_t kSegmentCapacity =
>+    (kIdealSegmentSize - kSingleElemSegmentSize) / sizeof(SnowWhiteObject) + 1;
>+
>+  static const size_t kActualSegmentSize =
>+      sizeof(SegmentedArrayElement<SnowWhiteObject, kSegmentCapacity>);
2 spaces for indentation.

>   ~SnowWhiteKiller()
...
>+    auto segment = mObjects.GetFirstSegment();
Ok, this is about the only case where use o

>+    while (segment) {
>+      for (uint32_t i = 0; i < segment->Length(); i++) {
>+        SnowWhiteObject& o = segment->ElementAt(i);
>+        if (!o.mRefCnt->get() && !o.mRefCnt->IsInPurpleBuffer()) {
>+          mCollector->RemoveObjectFromGraph(o.mPointer);
>+          o.mRefCnt->stabilizeForDeletion();
>+          o.mParticipant->Trace(o.mPointer, *this, nullptr);
>+          o.mParticipant->DeleteCycleCollectable(o.mPointer);
>+        }
>       }
>+      segment = segment->getNext();
>     }
Given the current SegmentedArray setup, you'd need to have mObjects.Clear(); call here.
Don't you get an assertion all the time without it?
(I don't recall why I wanted the explicit .Clear() call and MOZ_ASSERT(IsEmpty()); in SegmentedArray dtor.)



> 
>   void
>   Visit(nsPurpleBuffer& aBuffer, nsPurpleBufferEntry* aEntry)
>   {
>     MOZ_ASSERT(aEntry->mObject, "Null object in purple buffer");
>     if (!aEntry->mRefCnt->get()) {
>       void* o = aEntry->mObject;
>       nsCycleCollectionParticipant* cp = aEntry->mParticipant;
>       CanonicalizeParticipant(&o, &cp);
>       SnowWhiteObject swo = { o, cp, aEntry->mRefCnt };
>-      if (mObjects.AppendElement(swo)) {
>-        aBuffer.Remove(aEntry);
>-      }
>+      mObjects.AppendElement(swo);
>+      aBuffer.Remove(aEntry);
>     }
>   }
> 
>   bool HasSnowWhiteObjects() const
>   {
>-    return mObjects.Length() > 0;
>+    return !mObjects.IsEmpty();
>   }
> 
>   virtual void Trace(JS::Heap<JS::Value>* aValue, const char* aName,
>                      void* aClosure) const
>   {
>     JS::Value val = *aValue;
>     if (val.isMarkable()) {
>       void* thing = val.toGCThing();
>@@ -2696,17 +2717,17 @@ public:
> 
>   virtual void Trace(JS::Heap<JSFunction*>* aFunction, const char* aName,
>                      void* aClosure) const
>   {
>   }
> 
> private:
>   nsCycleCollector* mCollector;
>-  FallibleTArray<SnowWhiteObject> mObjects;
>+  ObjectsArray mObjects;
> };
> 
> class RemoveSkippableVisitor : public SnowWhiteKiller
> {
> public:
>   RemoveSkippableVisitor(nsCycleCollector* aCollector,
>                          uint32_t aMaxCount, bool aRemoveChildlessNodes,
>                          bool aAsyncSnowWhiteFreeing,
Attachment #8518666 - Flags: review?(bugs) → review+
> >+    auto segment = mObjects.GetFirstSegment();
> Ok, this is about the only case where use o

Sorry, I don't understand this comment.

> Given the current SegmentedArray setup, you'd need to have mObjects.Clear();
> call here.
> Don't you get an assertion all the time without it?
> (I don't recall why I wanted the explicit .Clear() call and
> MOZ_ASSERT(IsEmpty()); in SegmentedArray dtor.)

Good catch. I did a try run and got build errors on most platforms, due to having SegmentedArrayElement as a class name and a typedef. Linux-ASAN was one of the few platforms that built and I got a bunch of leaks there, very likely because of this. Shouldn't be hard to fix, I'll definitely do so and re-run on try before landing.

BTW, something I forgot to mention is that adding a new SnowWhiteObject to the array is now infallible. I figure this is probably ok, because if we fail to allocate 4 KiB or 8 KiB then we're in deep trouble anyway. But I could make it fallible if you think it's important.
(In reply to Nicholas Nethercote [:njn] from comment #8)
> > >+    auto segment = mObjects.GetFirstSegment();
> > Ok, this is about the only case where use o
> 
> Sorry, I don't understand this comment.

What happened to my comment...
I think I said 
"Ok, this is about the only case where use of auto is not horrible."

> BTW, something I forgot to mention is that adding a new SnowWhiteObject to
> the array is now infallible. I figure this is probably ok, because if we
> fail to allocate 4 KiB or 8 KiB then we're in deep trouble anyway. But I
> could make it fallible if you think it's important.
I don't think it is important.
Try looks good on an updated patch that calls Clear(): https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=70ae0f061954
https://hg.mozilla.org/mozilla-central/rev/095f5e23c5c5
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: