
Reduce NSS's heap allocation rate

RESOLVED FIXED

Status

Product: NSS
Component: Libraries
Reported: 4 years ago
Last modified: 3 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 attachments)

(Assignee)

Description

4 years ago
I did some cumulative heap profiling with a modified version of DMD (see bug
1094552) on an e10s build of Firefox. I ran http://gregor-wagner.com/tmp/mem50,
which opens 50 popular websites, one per tab. The parent process cumulatively
allocated 12.9 GiB of heap blocks, and the child process cumulatively allocated
14.2 GiB.

I was surprised to see that more than 2/3 of the allocations done in the parent
process were from NSS. Typically NSS doesn't show up as significant in memory
snapshots (which measure *live* allocations rather than *cumulative*
allocations), so this indicates that NSS is doing lots of short-lived
allocations.

Here are the top six records when we use a stack trace depth of 1, which
gives us the most coarse-grained view. I've added some comments to some of the
records. (Note that the word "cumulative" in these records means "this record
and all the ones before it", and is unrelated to the notion of "cumulative
heap profiling". Apologies for any confusion.)

> # These are almost all from PORT_ArenaAlloc_Util() and
> # nss_zalloc_arena_locked().
> Live {
>   ~630,282 blocks in heap block record 1 of 1,832
>   ~2,580,519,432 bytes (~2,580,420,334 requested / ~99,098 slop)
>   Individual block sizes: 8,192 x 189; 4,096 x 165; ~4,093 x 629,928
>   19.99% of the heap (19.99% cumulative)
>   Allocated at {
>     #01: PL_ArenaAllocate (/home/njn/moz/mi5/co64dmd/nsprpub/lib/ds/../../../../nsprpub/lib/ds/plarena.c:203)
>   }
> }
> 
> Live {
>   ~344,405 blocks in heap block record 2 of 1,832
>   ~1,410,476,390 bytes (~1,163,558,918 requested / ~246,917,472 slop)
>   Individual block sizes: 4,096 x 275,575; ~4,093 x 68,830
>   10.93% of the heap (30.92% cumulative)
>   Allocated at {
>     #01: nss_ZAlloc (/home/njn/moz/mi5/security/nss/lib/base/arena.c:892)
>   }
> }
> 
> Live {
>   ~212,422 blocks in heap block record 3 of 1,832
>   ~1,385,587,010 bytes (~1,039,694,442 requested / ~345,892,568 slop)
>   Individual block sizes: 16,384 x 387; 8,192 x 124,759; 4,096 x 2; ~4,093 x 87,274
>   10.73% of the heap (41.65% cumulative)
>   Allocated at {
>     #01: PORT_ZAlloc_Util (/home/njn/moz/mi5/security/nss/lib/util/secport.c:117)
>   }
> }
> 
> # These are almost all called from PORT_NewArena_Util(), nssArena_Create(),
> # sftk_NewSession(), pk11_CreateNewContextInSlot(), nssCKFWMutex_Create(),
> # sftk_NewObject(), nssPKIObject_NewLock(), PK11_NewSlotList().
> Live {
>   ~296,281 blocks in heap block record 4 of 1,832
>   ~1,212,678,133 bytes (~1,212,678,133 requested / ~0 slop)
>   Individual block sizes: ~4,093 x 296,281
>   9.39% of the heap (51.04% cumulative)
>   Allocated at {
>     #01: PR_NewLock (/home/njn/moz/mi5/co64dmd/nsprpub/pr/src/pthreads/../../../../../nsprpub/pr/src/pthreads/ptsynch.c:142)
>   }
> }
> 
> Live {
>   ~213,135 blocks in heap block record 5 of 1,832
>   ~910,281,079 bytes (~905,540,469 requested / ~4,740,610 slop)
>   Individual block sizes: 20,480 x 2,314; 4,096 x 2; ~4,093 x 210,819
>   7.05% of the heap (58.09% cumulative)
>   Allocated at {
>     #01: PORT_Alloc_Util (/home/njn/moz/mi5/security/nss/lib/util/secport.c:86)
>   }
> }
> 
> Live {
>   ~220,580 blocks in heap block record 6 of 1,832
>   ~902,833,940 bytes (~902,833,940 requested / ~0 slop)
>   Individual block sizes: ~4,093 x 220,580
>   6.99% of the heap (65.09% cumulative)
>   Allocated at {
>     #01: mp_init_size (/home/njn/moz/mi1/security/nss/lib/freebl/mpi/mpi.c:120)
>   }
> }

I've done some digging and found various causes.

- Bug 1094650 fixes some over-allocation in nss_builtins_FindObjectsInit().

- Arenas are used *extremely* heavily. I understand that they can simplify
  freeing, but the frequency of arena creation/destruction is incredibly high.
  For each arena you have to allocate the arena itself, the first chunk, and
  also a lock (which explains most of the high allocation rate from
  PR_NewLock()). See the sketch after this list.

- There appears to be a lot of copying of structs. I suspect this is related to
  the heavy use of arenas, e.g. often a struct needs to be copied from one
  arena into another so it can be destroyed at the right time.

- The number of arbitrary-precision integers created is enormous, hence the
  appearance of mp_init_size() in record 6 above. These appear to commonly be
  512 or 1024 bytes each, which is rather large.
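
As a rough illustration of the arena point, here is a minimal, hypothetical
sketch of the pattern. It is *not* the actual NSPR/NSS code (the real path goes
through functions like PORT_NewArena_Util() and PR_NewLock(), as the records
above show); it only illustrates the cost described above, namely that each new
arena takes roughly three separate heap allocations, so a high
creation/destruction rate multiplies the malloc()/free() traffic:

> /* Hypothetical sketch only -- none of these names exist in NSS. */
> #include <stdlib.h>
> #include <pthread.h>
>
> typedef struct ArenaChunk {
>     struct ArenaChunk *next;
>     size_t used, limit;
>     /* chunk payload would follow this header */
> } ArenaChunk;
>
> typedef struct Arena {
>     ArenaChunk *first;        /* first chunk */
>     pthread_mutex_t *lock;    /* per-arena lock */
>     size_t chunksize;
> } Arena;
>
> static Arena *arena_create(size_t chunksize)
> {
>     Arena *a = malloc(sizeof(*a));                      /* allocation #1: the pool */
>     if (!a)
>         return NULL;
>     a->first = malloc(sizeof(ArenaChunk) + chunksize);  /* allocation #2: first chunk */
>     a->lock = malloc(sizeof(pthread_mutex_t));          /* allocation #3: the lock */
>     if (!a->first || !a->lock) {
>         free(a->first);
>         free(a->lock);
>         free(a);
>         return NULL;
>     }
>     pthread_mutex_init(a->lock, NULL);
>     a->first->next = NULL;
>     a->first->used = 0;
>     a->first->limit = chunksize;
>     a->chunksize = chunksize;
>     return a;
> }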

I certainly don't want to make major changes to NSS. I'm hoping that a small
number of minor changes will suffice to greatly reduce the allocation rates.
This should make things faster (fewer calls to malloc() and free(), and less
memory traffic) and may help with heap fragmentation.
(Assignee)

Comment 1

4 years ago
> The number of arbitrary-precision integers created is enormous

More specifically, during another run of http://gregor-wagner.com/tmp/mem50,
934,262 of them were created. 617,087 of them were 512 bytes (64 digits),
and 317,175 of them were 1024 bytes (128 digits), for a total of 640 MB of
allocations.
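
Rough arithmetic on those counts, for reference:

> 617,087 x   512 bytes = 315,948,544 bytes
> 317,175 x 1,024 bytes = 324,787,200 bytes
> total                 = 640,735,744 bytes (~640 MB)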
(Assignee)

Updated

4 years ago
Depends on: 1095307
(Assignee)

Comment 2

4 years ago
Created attachment 8518670 [details]
DMD output with stack traces of depth 5

This file gives a more detailed breakdown of where the allocations are happening.
(Assignee)

Comment 3

4 years ago
Most of the arbitrary-precision integers come about as a result of this stack trace:

> RSA_PublicKeyOp (/home/njn/moz/mi5/security/nss/lib/freebl/rsa.c:836)
> RSA_CheckSignRecover (/home/njn/moz/mi5/security/nss/lib/freebl/rsapkcs.c:1346)
> NSC_VerifyRecover (/home/njn/moz/mi5/security/nss/lib/softoken/pkcs11c.c:3349)
> PK11_VerifyRecover (/home/njn/moz/mi5/security/nss/lib/pk11wrap/pk11obj.c:648)
> recoverPKCS1DigestInfo (/home/njn/moz/mi5/security/nss/lib/cryptohi/secvfy.c:66)
> vfy_CreateContext (/home/njn/moz/mi5/security/nss/lib/cryptohi/secvfy.c:451)
> vfy_VerifyData (/home/njn/moz/mi5/security/nss/lib/cryptohi/secvfy.c:727)
> mozilla::pkix::VerifySignedData(mozilla::pkix::SignedDataWithSignature const&, mozilla::pkix::Input, unsigned int, void*) (/home/njn/moz/mi5/co64dmd/security/pkix/../../../security/pkix/lib/pkixnss.cpp:157)

In my mem50 workload, there were 39,108 calls to RSA_PublicKeyOp(). But there
were only 546 unique results, and the top 12 most frequent results covered
98.1% of the calls. I wonder if it's worth caching prior results.
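
If one did go down that path, a purely hypothetical sketch of the idea (none of
these names exist in NSS, and a real change would also have to address thread
safety and avoid introducing timing side channels) is a small direct-mapped
cache keyed by a hash of the public key and the input:

> /* Hypothetical sketch only -- not NSS code. */
> #include <stdint.h>
> #include <string.h>
>
> #define CACHE_SLOTS   64
> #define MAX_RSA_BYTES 256            /* enough for 2048-bit keys in this sketch */
>
> typedef struct {
>     int           valid;
>     uint32_t      key_hash;          /* hash of (modulus, exponent, input) */
>     size_t        out_len;
>     unsigned char output[MAX_RSA_BYTES];
> } CacheSlot;
>
> static CacheSlot rsa_cache[CACHE_SLOTS];
>
> /* FNV-1a style hash; chain it over several buffers via |h|. */
> static uint32_t hash_bytes(const unsigned char *p, size_t n, uint32_t h)
> {
>     while (n--) {
>         h ^= *p++;
>         h *= 16777619u;
>     }
>     return h;
> }
>
> /* Returns 1 and copies the cached result into |out| on a hit, 0 on a miss. */
> static int rsa_cache_lookup(uint32_t key_hash, unsigned char *out, size_t out_len)
> {
>     CacheSlot *slot = &rsa_cache[key_hash % CACHE_SLOTS];
>     if (slot->valid && slot->key_hash == key_hash && slot->out_len == out_len) {
>         memcpy(out, slot->output, out_len);
>         return 1;
>     }
>     return 0;
> }
>
> static void rsa_cache_store(uint32_t key_hash, const unsigned char *out, size_t out_len)
> {
>     CacheSlot *slot = &rsa_cache[key_hash % CACHE_SLOTS];
>     if (out_len > MAX_RSA_BYTES)
>         return;                      /* too big for this toy cache */
>     slot->valid    = 1;
>     slot->key_hash = key_hash;
>     slot->out_len  = out_len;
>     memcpy(slot->output, out, out_len);
> }

The caller would hash the modulus, exponent and input together, try
rsa_cache_lookup() before doing the modular exponentiation, and call
rsa_cache_store() only on a miss.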
(Assignee)

Updated

4 years ago
Depends on: 1096741

Comment 4

4 years ago
Before you spend too much time on this, would you mind sending something to mozilla.dev.tech.crypto discussing your plans, both short- and long-term?

I'm very nervous that you're touching very security-sensitive code which is designed NOT to leak timing information, and that may require consistently allocating certain buffers or performing certain operations. Because your hot spots are always going to be at the lowest layers, it would be good to discuss this with the broader NSS community, to understand your objectives, performance criteria, etc.

In particular, I'd like to understand what the objectives and goals are and how success is or isn't measured. Simply reducing allocations for its own sake is not necessarily a reasonable goal (depending on the allocator used).
(Assignee)

Comment 5

4 years ago
I've emailed the list. I also expect that my patches will get the appropriate level of review.
(Assignee)

Comment 6

4 years ago
Created attachment 8520395 [details]
DMD output with stack traces of depth 10
(Assignee)

Updated

4 years ago
Whiteboard: [MemShrink] → [MemShrink:P2]
(Assignee)

Comment 7

4 years ago
Created attachment 8521702 [details]
DMD output with stack traces of depth 24
(Assignee)

Comment 8

3 years ago
With all the dependent bugs closed, this bug can be closed.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Assignee: nobody → n.nethercote