Closed Bug 1050035 Opened 5 years ago Closed 5 years ago

Most pldhash tables are empty or almost-empty

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect

Tracking


RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: njn, Assigned: njn)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(5 files)

Here's a histogram showing how full our pldhashes are, after opening gmail, TechCrunch and cnn.com:

(  1)     3414 (55.4%, 55.4%): pld: 0 / 16
(  2)     1151 (18.7%, 74.1%): pld: 1 / 16
(  3)      308 ( 5.0%, 79.1%): pld: 2 / 16
(  4)      118 ( 1.9%, 81.0%): pld: 6 / 16
(  5)      100 ( 1.6%, 82.6%): pld: 5 / 16
(  6)       95 ( 1.5%, 84.1%): pld: 4 / 16
(  7)       85 ( 1.4%, 85.5%): pld: 7 / 16
(  8)       84 ( 1.4%, 86.9%): pld: 3 / 16
(  9)       76 ( 1.2%, 88.1%): pld: 30 / 64
( 10)       68 ( 1.1%, 89.2%): pld: 12 / 32
( 11)       60 ( 1.0%, 90.2%): pld: 16 / 32
( 12)       60 ( 1.0%, 91.2%): pld: 8 / 16
( 13)       57 ( 0.9%, 92.1%): pld: 25 / 64
( 14)       57 ( 0.9%, 93.0%): pld: 124 / 256
( 15)       23 ( 0.4%, 93.4%): pld: 10 / 16
( 16)       23 ( 0.4%, 93.8%): pld: 9 / 16
( 17)       20 ( 0.3%, 94.1%): pld: 15 / 32
( 18)       14 ( 0.2%, 94.3%): pld: 13 / 32
( 19)       12 ( 0.2%, 94.5%): pld: 36 / 64
( 20)       12 ( 0.2%, 94.7%): pld: 20 / 32

55.4% are empty! 79.1% have 2 or fewer items!

Here's the same data, but weighted by capacity:

(  1)    54624 (17.9%, 17.9%): pld: 0 / 16
(  2)    32768 (10.8%, 28.7%): pld: 15795 / 32768
(  3)    32768 (10.8%, 39.5%): pld: 14859 / 32768
(  4)    18416 ( 6.0%, 45.5%): pld: 1 / 16
(  5)    14592 ( 4.8%, 50.3%): pld: 124 / 256
(  6)     8192 ( 2.7%, 53.0%): pld: 5006 / 8192
(  7)     8192 ( 2.7%, 55.7%): pld: 5620 / 8192
(  8)     8192 ( 2.7%, 58.4%): pld: 5994 / 8192
(  9)     8192 ( 2.7%, 61.1%): pld: 3683 / 8192
( 10)     4928 ( 1.6%, 62.7%): pld: 2 / 16
( 11)     4864 ( 1.6%, 64.3%): pld: 30 / 64
( 12)     4096 ( 1.3%, 65.6%): pld: 2103 / 4096
( 13)     4096 ( 1.3%, 67.0%): pld: 1773 / 4096
( 14)     4096 ( 1.3%, 68.3%): pld: 1485 / 4096
( 15)     4096 ( 1.3%, 69.7%): pld: 2597 / 4096
( 16)     4096 ( 1.3%, 71.0%): pld: 1223 / 2048
( 17)     3648 ( 1.2%, 72.2%): pld: 25 / 64
( 18)     2176 ( 0.7%, 72.9%): pld: 12 / 32
( 19)     2048 ( 0.7%, 73.6%): pld: 822 / 2048
( 20)     2048 ( 0.7%, 74.3%): pld: 923 / 2048

Empty tables still account for 17.9% of total element capacity. That ignores the fact that different pldhashes have different-sized entries, but still...
Depends on: 1050036
Here's the top 20, with the entry storage size (in bytes) shown in parentheses.

(  1)     1769 (28.0%, 28.0%): pld: 0 / 16 (512)
(  2)      789 (12.5%, 40.5%): pld: 1 / 16 (2048)
(  3)      620 ( 9.8%, 50.4%): pld: 0 / 16 (256)
(  4)      528 ( 8.4%, 58.7%): pld: 0 / 16 (1024)
(  5)      376 ( 6.0%, 64.7%): pld: 0 / 16 (2048)
(  6)      186 ( 2.9%, 67.6%): pld: 1 / 16 (1024)
(  7)      171 ( 2.7%, 70.3%): pld: 0 / 16 (384)
(  8)      136 ( 2.2%, 72.5%): pld: 1 / 16 (256)
(  9)      135 ( 2.1%, 74.6%): pld: 2 / 16 (2048)
( 10)       99 ( 1.6%, 76.2%): pld: 2 / 16 (1024)
( 11)       69 ( 1.1%, 77.3%): pld: 1 / 16 (384)
( 12)       63 ( 1.0%, 78.3%): pld: 7 / 16 (384)
( 13)       61 ( 1.0%, 79.3%): pld: 30 / 64 (8192)
( 14)       59 ( 0.9%, 80.2%): pld: 16 / 32 (2048)
( 15)       58 ( 0.9%, 81.1%): pld: 25 / 64 (4096)
( 16)       58 ( 0.9%, 82.0%): pld: 6 / 16 (384)
( 17)       58 ( 0.9%, 83.0%): pld: 124 / 256 (20480)
( 18)       58 ( 0.9%, 83.9%): pld: 12 / 32 (2048)
( 19)       52 ( 0.8%, 84.7%): pld: 3 / 16 (384)
( 20)       52 ( 0.8%, 85.5%): pld: 4 / 16 (384)

The same data, but weighted by entryStore size:

(  1)  1615872 (11.9%, 11.9%): pld: 1 / 16 (2048)
(  2)  1187840 ( 8.7%, 20.6%): pld: 124 / 256 (20480)
(  3)   905728 ( 6.7%, 27.3%): pld: 0 / 16 (512)
(  4)   770048 ( 5.7%, 32.9%): pld: 0 / 16 (2048)
(  5)   540672 ( 4.0%, 36.9%): pld: 0 / 16 (1024)
(  6)   524288 ( 3.9%, 40.8%): pld: 5006 / 8192 (524288)
(  7)   524288 ( 3.9%, 44.6%): pld: 15877 / 32768 (524288)
(  8)   524288 ( 3.9%, 48.5%): pld: 14895 / 32768 (524288)
(  9)   499712 ( 3.7%, 52.2%): pld: 30 / 64 (8192)
( 10)   393216 ( 2.9%, 55.0%): pld: 5620 / 8192 (393216)
( 11)   393216 ( 2.9%, 57.9%): pld: 4401 / 16384 (393216)
( 12)   276480 ( 2.0%, 60.0%): pld: 2 / 16 (2048)
( 13)   237568 ( 1.7%, 61.7%): pld: 25 / 64 (4096)
( 14)   190464 ( 1.4%, 63.1%): pld: 1 / 16 (1024)
( 15)   163840 ( 1.2%, 64.3%): pld: 2104 / 4096 (163840)
( 16)   158720 ( 1.2%, 65.5%): pld: 0 / 16 (256)
( 17)   131072 ( 1.0%, 66.5%): pld: 960 / 2048 (131072)
( 18)   131072 ( 1.0%, 67.4%): pld: 1552 / 4096 (131072)
( 19)   131072 ( 1.0%, 68.4%): pld: 5994 / 8192 (131072)
( 20)   120832 ( 0.9%, 69.3%): pld: 16 / 32 (2048)

Empty tables account for 18.1% of the total entryStore size. That is 2.44 MiB.
Whiteboard: [MemShrink] → [MemShrink:P2]
Bug 1059056 got rid of about half of the empty tables I was seeing. Beyond that it's a long-tail situation, and I decided further case-by-case fixes weren't worth it.

This bug can be closed.
Depends on: 1059056
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
I have in-progress patches that make the allocation of |mEntryStore| lazy -- the allocation now happens when PL_DHashTableAdd() is first called, rather than when PL_DHashTableInit() is called. This was made possible by bug 1124793, which removed PL_DHashTableLookup(), the function that returned a pointer to an unused entry on lookup failure.
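
Roughly, the shape of the change, as a minimal self-contained sketch -- not the actual patch; the class, the member names and the calloc-based allocation are simplified stand-ins for the pldhash internals:

#include <cstdint>
#include <cstdlib>

// Sketch only: defer the entry-store allocation from Init to the first Add,
// so tables that never receive an element never allocate.
class LazyTable
{
  char* mEntryStore = nullptr;   // allocated on first Add(), not at Init time
  uint32_t mEntrySize;
  uint32_t mCapacity;

public:
  LazyTable(uint32_t aEntrySize, uint32_t aCapacity)
    : mEntrySize(aEntrySize), mCapacity(aCapacity) {}  // "Init" stays infallible

  void* Add()
  {
    if (!mEntryStore) {
      // First insertion: the only point at which allocation can fail.
      mEntryStore = static_cast<char*>(calloc(mCapacity, mEntrySize));
      if (!mEntryStore) {
        return nullptr;          // Add is the fallible operation
      }
    }
    return mEntryStore;          // stand-in for the real hash/search/insert path
  }

  ~LazyTable() { free(mEntryStore); }
};

An empty table then costs only the object itself, which is what makes the mostly-empty tables from comment 0 cheap.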

However, in bug 1050036 comment 2, mccr8 said this about making |mEntryStore| lazy:

> Please don't lazily create it for hash tables that are initialized with a large size.
> PL_DHashtableInit is infallible by default, but the add operation is not, so if we
> lazily initialize large hash tables, then we'll go from having a nice size-annotated
> OOM crash to some other odd failure down the line.  (The specific case I'm concerned about
> is the cycle collector.)

The only hash table I can find in nsCycleCollector.cpp is CCGraph::mPtrToNodeMap. It's created with a length of 16384, which would give an initial capacity of 32768 and an initial entry store of 256 KiB on 32-bit platforms or 512 KiB on 64-bit platforms. mccr8, is this the only case that concerns you? Currently, if the initial allocation fails you'll get a nicely-annotated OOM, but if a subsequent resizing fails (after adding lots of entries) then you'll get an odd failure. So you already have a mix there...
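
For reference, the arithmetic behind those numbers -- assuming the default maximum load factor of 0.75 and an entry consisting of a PLDHashEntryHdr followed by a single pointer; both are assumptions here, not checked against the cycle collector's actual entry type:

// length 16384, max load factor 0.75:
//   smallest power of two >= 16384 / 0.75 (~21846) is 32768  -> capacity 32768
// entry assumed to be { uint32_t keyHash; void* node; }:
//   32-bit: sizeof == 8  -> 32768 *  8 bytes = 256 KiB
//   64-bit: sizeof == 16 -> 32768 * 16 bytes = 512 KiB (the hash is padded to 8)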

If this is really important, I guess I could add a method to PLDHashTable that force-allocates the entry storage, which you could then use in the cycle collector. Do you still think this is important?
Status: RESOLVED → REOPENED
Depends on: 1124793
Flags: needinfo?(continuation)
Resolution: WORKSFORME → ---
Well, really what should be done is to just make these hashtable operations infallible and rip out all the weird OOM gunk, but I haven't really gotten around to it, so just go ahead and do whatever.

It would be nice to have a way to do an infallible add.  I've also considered switching this hashtable over to nsTHashtable, but I'm a little concerned about what perf effects that might have.
Flags: needinfo?(continuation)
I'd also argue that making PLDHashtable operations infallible-by-default would be a step towards making the class more modern. ;)  Though dealing with the OOM crash fallout might be a pain.
(In reply to Andrew McCreight [:mccr8] from comment #5)
> I'd also argue that making PLDHashtable operations infallible-by-default
> would be a step towards making the class more modern. ;)  Though dealing
> with the OOM crash fallout might be a pain.

So...

- Currently, PLDHashTable has: (a) fallible Init, (b) infallible Init (with nice OOM annotation on failure), and (c) fallible Add. All three operations can do large allocations.

- My in-progress patches change that to: (a) infallible Init, and (b) fallible Add. Only (b) can do large allocations.

- I could change my patches to add (c) infallible Add (with nice OOM annotation on failure). This seems like the best combination. And nsTHashtable would end up with the same behaviour because it's layered on top of PLDHashTable.
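
A minimal sketch of what (c) might look like, assuming the current fallible PL_DHashTableAdd() and the NS_ABORT_OOM() annotation helper; the wrapper name and the aEntrySize parameter are illustrative, not part of any proposed API:

#include "pldhash.h"
#include "nsDebug.h"   // NS_ABORT_OOM

static PLDHashEntryHdr*
InfallibleAdd(PLDHashTable* aTable, const void* aKey, size_t aEntrySize)
{
  PLDHashEntryHdr* entry = PL_DHashTableAdd(aTable, aKey);
  if (!entry) {
    // Annotate the abort with a rough allocation size, so OOMs in large
    // tables stay as diagnosable as they were with the old infallible Init.
    NS_ABORT_OOM(aEntrySize);
  }
  return entry;
}
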
This shrinks zero-element hash tables, which are common, and avoids unnecessary
malloc/free pairs for them.

I did some measurements during some basic browsing of a few sites. I found that
35% of all live tables were empty with a few tabs open. And cumulatively, for
the whole session, 45% of tables never had an element added to them.

There is more to be done w.r.t. simplifying initialization, which will occur in
the next patch.
Attachment #8558247 - Flags: review?(nfroyd)
mrbkap: Can you please review the xpconnect changes? In summary,
PL_NewDHashTable() is now infallible, and |new| is already infallible (I'm
pretty sure that's true in xpconnect), so there's a heap of null checking that
can be removed.

froydnj: Can you please review everything else? The non-xpconnect changes
outside of pldhash.{cpp,h} are fairly simple, but please ask for additional
review if you are uncomfortable with any of them.
Attachment #8558257 - Flags: review?(nfroyd)
Attachment #8558257 - Flags: review?(mrbkap)
Because they are now just equivalent to |new PLDHashTable()| +
PL_DHashTableInit() and PL_DHashTableFinish(t) + |delete t|, respectively.

They're only used in a handful of places and obscure things more than they
clarify -- I only recently worked out exactly how they differ from Init()
and Finish().
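
For the affected call sites the change is mechanical; a sketch of the before/after shape -- the ops table, entry type and the remaining PL_DHashTableInit() arguments are placeholders, not copied from a real caller:

// Before:
//   PLDHashTable* t = PL_NewDHashTable(/* ops, entry size, length */);
//   ...
//   PL_DHashTableDestroy(t);
//
// After:
//   PLDHashTable* t = new PLDHashTable();
//   PL_DHashTableInit(t, /* same ops, entry size, length as before */);
//   ...
//   PL_DHashTableFinish(t);
//   delete t;
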
Attachment #8558272 - Flags: review?(nfroyd)
I kept all the existing PL_DHashTableAdd() calls fallible, in order to be
conservative, except for the ones in nsAtomTable.cpp, which were already
immediately followed by an abort on failure.
Attachment #8558274 - Flags: review?(nfroyd)
Attachment #8558276 - Flags: review?(continuation) → review+
Comment on attachment 8558247 [details] [diff] [review]
(part 1) - Lazily allocate PLDHashTable::mEntryStore

Review of attachment 8558247 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is OK.  I'd appreciate a double-check on my thinking for the ::Enumerate change.

Assuming your numbers from comment 0 and comment 1 are still accurate, and I have understood them correctly, this change saves ~1MB of memory for a reasonably common browsing session?

::: xpcom/glue/pldhash.cpp
@@ +571,5 @@
>  PLDHashTable::Add(const void* aKey)
>  {
>    PLDHashNumber keyHash;
>    PLDHashEntryHdr* entry;
> +  uint32_t capacity;

Moving this here is odd, since you don't use it in the allocating case.  Is this done to avoid compiler warnings?

@@ +747,5 @@
>      }
>    }
>  
>    for (uint32_t e = 0; e < capacity; ++e) {
> +    MOZ_ASSERT(mEntryStore);

I think instead of asserting this each time through the loop, we should simply do:

MOZ_ASSERT_IF(capacity > 0, mEntryStore);

outside of the loop.

I think this is safe, because mEntryStore can never become nullptr during the loop (assuming I understand how Remove() works--but even if I don't, I think if we call Remove() in the enumerate callback, we can use-after-free even before this patch), and if somebody is calling PL_DHashTableFinish() during an enumerate callback, they get what they deserve.

@@ +943,5 @@
>    // checks pass, then this method will only iterate through the full capacity
>    // once. If they fail, then this loop may end up returning the early entries
>    // more than once.
>    for (uint32_t e = 0; e < capacity; ++e) {
> +    MOZ_ASSERT(mTable->mEntryStore);

Same rationale as for the Enumerate() case above applies here.

@@ +997,5 @@
>    hash2 = 0;
>    sqsum = 0;
>  
>    for (uint32_t i = 0; i < capacity; i++) {
> +    MOZ_ASSERT(mEntryStore);

...and for all those people using DumpMeter(), we should do the same thing here.
Attachment #8558247 - Flags: review?(nfroyd) → review+
Comment on attachment 8558257 [details] [diff] [review]
(part 2) - Remove the fallible version of PL_DHashTableInit()

Review of attachment 8558257 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/tests/TestPLDHash.cpp
@@ +26,5 @@
> +  // platforms and 1GB on 64-bit platforms.
> +  //
> +  // Ideally we'd also try (a) a too-large capacity, and (b) a large capacity
> +  // combined with a large entry size whose product overflows. But those
> +  // cases would cause the test to abort immediately.

Why don't we (maybe as a followup) write a Unix-only test that does something like:

int pid = fork();
if (pid == -1) {
  abort();
}

if (pid == 0) {
  /* child, do too-large things and expect aborts */
} else {
  /* parent, wait for child to fail */
}
Attachment #8558257 - Flags: review?(nfroyd) → review+
Comment on attachment 8558272 [details] [diff] [review]
(part 3) - Remove PL_NewDHashTable() and PL_DHashTableDestroy()

Review of attachment 8558272 [details] [diff] [review]:
-----------------------------------------------------------------

I assume all these callsites are on the list for the eventual removal of the C-style pldhash API?

Inflating lines of code doesn't seem like a great change, but I suppose that a smaller API surface is worth it.
Attachment #8558272 - Flags: review?(nfroyd) → review+
Attachment #8558274 - Flags: review?(nfroyd) → review+
A descriptive email to dev-platform about these changes would be a good idea.
> Assuming your numbers from comment 0 and comment 1 are still accurate, and I
> have understood them correctly, this change saves ~1MB of memory for a
> reasonably common browsing session?

At least that, yes. The important number from comment 1 was 2.44 MiB, and then bug 1059056's lazy creation of attribute caches got rid of a big chunk of them. (I plan to revert that change once this lands, since it's now just unnecessary complication.)

> > +  uint32_t capacity;
> 
> Moving this here is odd, since you don't use it in the allocating case.  Is
> this done to avoid compiler warnings?

I introduced a new goto, so I had to move this to avoid an error about jumping over a variable initialization.
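
For anyone unfamiliar with that error, a tiny self-contained illustration (hypothetical names, not the pldhash code): C++ rejects a goto that jumps over the initialization of a local, so the declaration has to sit above the jump, with the assignment done separately:

#include <cstdint>

static uint32_t Example(bool aHaveStore)
{
  uint32_t capacity;     // declared (uninitialized) before any goto
  if (!aHaveStore) {
    goto allocate;       // fine: no initialized local is bypassed
  }
  capacity = 16;         // writing |uint32_t capacity = 16;| here instead
                         // would make the goto above ill-formed
  return capacity;
allocate:
  return 0;              // stand-in for the allocation path
}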

> I think instead of asserting this each time through the loop, we should
> simply do:
> 
> MOZ_ASSERT_IF(capacity > 0, mEntryStore);
> 
> outside of the loop.

Sounds fine. I'll do it for the three cases you mentioned.

> I assume all these callsites are on the list for the eventual removal of the C-style pldhash API?

Yes. Bug 1121760. But I'm doing the more substantial refactorings first, and saving the simple renamings for later.
> Inflating lines of code doesn't seem like a great change, but I suppose that
> a smaller API surface is worth it.

 8 files changed, 67 insertions(+), 78 deletions(-)

:)
Attachment #8558257 - Flags: review?(mrbkap) → review+
Depends on: 1129692
Depends on: 1130914
Depends on: 1131097
Component: General → XPCOM
Depends on: 1131386
Blocks: 1129786
I backed this out. Too many mysterious crashes and assertion failures:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a4cf56d0e98f
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1132218
Depends on: 1130088
Depends on: 1130157
Duplicate of this bug: 1131097
Now that we've started a new dev cycle, I have landed a slightly tweaked version of part 1 (which doesn't change the API) again, to see if it still causes problems. Bug 1132802 added some more assertions, which might help track down any that show up.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a9248b3d5c1f
https://hg.mozilla.org/mozilla-central/rev/a9248b3d5c1f
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
If part 1 sticks this time, I will redo parts 2 and 3 in a separate bug.
(In reply to Nicholas Nethercote [:njn] from comment #24)
> If part 1 sticks this time, I will redo parts 2 and 3 in a separate bug.

After almost a month on m-c I haven't heard of any problems.  Chris Lord confirmed in bug 1131386 comment 8 that he's not getting the B2G crash from the previous landing, which was the only problem that reproduced reliably. Hooray. I'll get to relanding parts 2 and 3 some time in Q2.
> ::: xpcom/tests/TestPLDHash.cpp
> @@ +26,5 @@
> > +  // platforms and 1GB on 64-bit platforms.
> > +  //
> > +  // Ideally we'd also try (a) a too-large capacity, and (b) a large capacity
> > +  // combined with a large entry size whose product overflows. But those
> > +  // cases would cause the test to abort immediately.
> 
> Why don't we (maybe as a followup) write a Unix-only test that does
> something like:
> 
> int pid = fork();
> if (pid == -1) {
>   abort();
> }
> 
> if (pid == 0) {
>   /* child, do too-large things and expect aborts */
> } else {
>   /* parent, wait for child to fail */
> }

I finally got around to trying this. When the child process aborted I got a stack trace and the following:

> Sleeping for 300 seconds.
> Type 'gdb /home/njn/moz/mi2/d64/dist/bin/firefox 5945' to attach your debugger to this thread.

A 5-minute pause in proceedings is bad. I wonder if there's a way to avoid that. glandium, any ideas?
Flags: needinfo?(mh+mozilla)
Setting the environment variable MOZ_GDB_SLEEP to 0 prior to the fork is what you want here.
Flags: needinfo?(mh+mozilla)
> Setting the environment variable MOZ_GDB_SLEEP to 0 prior to the fork is
> what you want here.

Thank you for the pointer. It doesn't quite work because MOZ_GDB_SLEEP is read at start-up. But I'm going to try changing it so it's read in ah_crap_handler() instead.
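
A sketch of how that might slot into the fork-based test suggested in the part 2 review -- names and the child's workload are placeholders, and as noted above, whether the child actually skips the sleep depends on when MOZ_GDB_SLEEP ends up being read:

#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
  setenv("MOZ_GDB_SLEEP", "0", /* overwrite */ 1);  // skip the 300-second debugger pause
  pid_t pid = fork();
  if (pid == -1) {
    abort();
  }
  if (pid == 0) {
    // Child: do the too-large PL_DHashTableInit() and expect it to abort.
    // (Placeholder: abort directly so this sketch is self-contained.)
    abort();
  }
  int status;
  waitpid(pid, &status, 0);
  // Parent: the test passes if the child died on a signal, i.e. it aborted.
  return WIFSIGNALED(status) ? 0 : 1;
}
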
(In reply to Nicholas Nethercote [:njn] from comment #25)
> After almost a month on m-c I haven't heard of any problems.  Chris Lord
> confirmed in bug 1131386 comment 8 that he's not getting the B2G crash from
> the previous landing, which was the only problem that reproduced reliably.
> Hooray. I'll get to relanding parts 2 and 3 some time in Q2.

Are there bugs for tracking the outstanding work?
Flags: needinfo?(n.nethercote)
> Are there bugs for tracking the outstanding work?

New versions of parts 2, 3 and 4 all landed a while ago. I don't remember the exact bug numbers, but they will be somewhere in the dependencies of bug 1128407.

mccr8: part 5 never relanded. It looks like the code affected has changed a bit in the meantime. I don't know if you would like to resurrect the patch.
Flags: needinfo?(n.nethercote) → needinfo?(continuation)
(In reply to Nicholas Nethercote [:njn] from comment #30)
> mccr8: part 5 never relanded. It looks like the code affected has changed a
> bit in the meantime. I don't know if you would like to resurrect the patch.

Looking through the hg history, it looks like it landed in bug 1131901, then I backed it out in bug 1144649 because it was causing too many crashes.
Flags: needinfo?(continuation)