Open Bug 97485 Opened 19 years ago Updated 13 years ago
malloc lock contention
123.12 KB, text/plain
186.12 KB, text/html
12.46 KB, patch
|Details | Diff | Splinter Review|
OS/2 test program to try to load a variable from an executable. Mike Kaply, please pass this to OS/2 kernel development
1.31 KB, text/plain
2.49 KB, patch
|Details | Diff | Splinter Review|
1.49 KB, patch
|Details | Diff | Splinter Review|
The iws folks (email@example.com) found that the webserver is 60% faster when SmartHeap is engaged. SmartHeap is not usable for the most part though, because it breaks the Java garbage collector. This bug is an attempt to address heap contention in NSS specifically. Detailed notes are attached and available in: /u/kirke/nss/collector/HINTS As of this filing, I've created 3 freelists for data structures that are allocated often. Negligable performance improvement running selfserv resulted from these changes, but Wan-Teh asked that I file a bug for the record with my patch. It creates freelists for SECITEM, session, and context which recycle data structures rather than calling malloc() repeatedly. The NSS library addresses heap contention already by using arenas. Larry (firstname.lastname@example.org) instrumented NSPR with lock wrapper code to ascertain the "high-water-mark" of the arenas. He found that we didn't stand to gain anything by fiddling with the arena sizes. See [Bug 87725] Instrument NSS use of PLArena. Solaris Collector/Analyzer proved to be most useful in identifying contention. 2001-0720-0931 To reduce contention on the global lock that lives in malloc, we replace calls to calls with high Inclusive Other sec. Lock contention (LMS_USER_LOCK - see p. 144 of Analyzing Program Performance with WorkShop) is included in this stat. PORT_Alloc() with PORT_ArenaAlloc() malloc() PORT_ZAlloc() with PORT_ArenaZAlloc() calloc() 1. pk11_CreateNewContextInSlot() 110.950 2. pk11_NewSession() 32.390 3. pk11_InitGeneric() 19.010 4. SECTITEM_DupItem() 19.010 Turning on "synctrace" yields "inclusive sync wait time". After instituting the session and context freelists originating from (1) and (2), I discovered how to get synctrace working. killall selfserv rm -rf .test* test* selfserv --optimize --debug collector store filename test.1.er collector store directory /export/kirke/workarea-nss33 collector synctrace threshold 0 collector synctrace on collector show collector enable run -t 8 -d /export/kirke/certs -n "iws-files.red.iplanet.com's Cert-O-Matic II ID" -p 12344 -w iplanet go --restartonly iws-files xxx https://iws-files.red.iplanet.com:12344/stop 2001-0802-1438 Analyzer output (post changes from axilla) is backed up in axilla:/tmp/collector-2001-0801 1. Use exclusive metrics to locate functions that have high metric values. Exclusive metrics for a function are calculated from events which occur inside the function itself; they exclude metrics coming from calls to other functions. mutex_lock 671.741 pthread_mutex_lock 590.656 pthread_cond_wait 162.759 2. Use inclusive metrics to determine which call sequence was responsible for high metric values. Inclusive metrics are calculated from events which occur inside the function and any function it calls; thet include metrics coming from calls to other functions. mutex_lock malloc 346.955 PORT_Alloc 220.448 pk11_CreateNewContextInSlot 239.185 pk11_InitGeneric 89.980 pk11_NewSession 53.553 SECITEM_DupItem 50.668 calloc (PORT_ZAlloc) 120.285 pk11_getKeyFromList 41.699 ssl3_InitState 39.052 ssl3_CreateSecurityInfo 34.271 calloc (PR_NewLock) 36.644 pk11_CreateNewContextInSlot 239.185 pk11_NewSession 53.553 pk11_getKeyFromList 41.699 free 323.514 pthread_mutex_lock PR_Lock 590.158 ssl_Read 981.068 pk11_CreateNewContextInSlot 239.185 pk11_DestroyContext 156.575 pk11_CloneContext 106.912 pthread_cond_wait PORT_Alloc 220.448 3. Use attributed metrics to trace a particular call sequence to the function or functions that are responsible for high metric values. Attributed metricss tell you how much of a metric came from calls from or to another function; they attribute metrics to another function. ssl_Read 981.068 ssl_SecureRecv 942.061 ssl_Do1stHandshake 942.014 .... ssl3_HandleHandshakeMessage 839.295 ssl3_HandleClientHello 710.477 ssl3_InitPendingCipherSpec 398.732 ssl3_GenerateSesssionKeys 256.173 PK11_Derive 133.360 PK11_SymKeyFromHandle 117.690 pk11_CloseSession 64.088 pk11_DestroySession 59.602 (free) PK11_CreateSymKey 53.603 PK11_CreateContextBySymKey 129.557 pk11_getKeyFromList 41.699 PK11_ReferenceSlot 23.690 pk11_CreateSymKey pk11_getKeyFromList 41.699 pk11_getNewSession NSC_OpenSession pk11_NewSession 53.553 ssl3_SendFinished 201.115 pk11_CreateNewContextInSlot 239.185 pk11_context_init 127.106 NSC_SignInit 88.203 pk11_InitGeneric 38.436 PORT_Alloc 87.165 NSC_DigestInit 49.012 pk11_SessionFromHandle 26.961 PR_Lock pk11_InitGeneric 21.414 PORT_Alloc SECITEM_DupItem 50.566 PORT_Alloc 50.668 pk11_GetNewSession 48.251 NSC_OpenSession 89.527 pk11_NewSession 53.553 PR_Lock 35.974 pk11_DestroyContext 156.575 PK11_FreeSlot 89.663 PR_Lock SECITEM_FreeItem 89.663 free 41.572 free 22.721 pk11_CloneContext 106.912 pk11_CreateNewContextInSlot 69.319 pk11_context_init 127.106 NSC_SignInit 88.203 pk11_InitGeneric 38.436 PORT_Alloc 87.165 pk11_InitGeneric 89.980 PORT_Alloc 87.165 pk11_NewSession 53.553 PR_NewLock 35.823 PORT_Alloc 17.730 SECITEM_DupItem 50.668 PORT_Alloc 50.668 pk11_getKeyFromList 41.699 pk11_GetNewSession 41.275 NSC_OpenSession 89.527 pk11_NewSession 53.553 PR_Lock 35.974 ssl3_InitState 39.052 PK11_CreateDigestContext 38.565 pk11_CreateNewContextInSlot 32.836 ssl3_CreateSecurityInfo 34.271 PORT_ZAlloc 17.893
Here are selfserv runs on Linux. The "tip" is for comparison (no change). The "items" isolate the secitem.c change. The "lists" results are with all 3 (context, session, and item freelists). box 010827 tip-delay-full 12.86 13.37 14.46 74.79 box 010827 tip-delay-restart 4.48 4.53 4.71 220.91 box 010827 tip-nodelay-full 12.79 12.87 13.51 77.67 box 010827 tip-nodelay-restart 4.42 4.61 4.71 216.68 box 010827 tip-nodelayclient-full 12.79 12.84 12.96 77.88 box 010827 tip-nodelayclient-restart 4.44 4.56 4.62 219.14 box 010827 tip-nodelayserver-full 12.62 12.77 12.90 78.32 box 010827 tip-nodelayserver-restart 4.52 4.59 4.66 218.02 box 010827 items-delay-full 12.79 12.94 13.65 77.31 box 010827 items-delay-restart 4.46 4.60 4.70 217.47 box 010827 items-nodelay-full 12.71 12.83 13.10 77.92 box 010827 items-nodelay-restart 4.39 4.52 4.60 221.08 box 010827 items-nodelayclient-full 12.83 13.14 14.13 76.12 box 010827 items-nodelayclient-restart 4.52 4.81 4.94 207.76 box 010827 items-nodelayserver-full 12.88 13.00 13.04 76.94 box 010827 items-nodelayserver-restart 4.50 4.57 4.65 218.74 box 010829 lists-delay-full 12.61 12.67 13.42 78.96 box 010829 lists-delay-restart 4.19 4.36 4.53 229.18 box 010829 lists-nodelay-full 12.58 12.71 12.81 78.69 box 010829 lists-nodelay-restart 4.27 4.39 4.47 228.05 box 010829 lists-nodelayclient-full 12.54 12.79 13.35 78.20 box 010829 lists-nodelayclient-restart 4.27 4.37 4.57 228.66 box 010829 lists-nodelayserver-full 12.57 12.69 12.94 78.80 box 010829 lists-nodelayserver-restart 4.23 4.32 4.37 231.57
Status: UNCONFIRMED → NEW
Ever confirmed: true
Placing zone allocator in PORT_Alloc and friends didn't yield performance gain based on selfserv/strsclnt. Entertained the thought of creating a separate malloc lib so we catch freebl and EVERYTHING, but Wan-Teh and I feared we would be like SmartHeap and likely fail Java garbage collection overriding malloc. Also this approach won't work on all platforms. So the NSPR PR_Malloc level is where we would like Nelson's zone allocator. Initial tests show no performance gain. Wondering now if we're stressing like Julien did (with GNAT), driving Enterprise Server. Increasing strsclnt threads from default (8) to 100 degraded performance. Need to settle on appropriate strsclnt/selfserv parameters to reveal the performance gain basically removing the mallocs should yield. The zone allocator is rigged to print out stats on normal exit. Here's an example strsclnt/selfserv run, which was done when the allocator was in the PORT_ memory routines and some collector/analyzer output run under Solaris SunOS iws-files 5.8 Generic sun4u sparc SUNW,Ultra-250 Running: selfserv -t 8 -d /export/kirke/certs -n "iws-files.red.iplanet.com's Cert-O-Matic II ID" -p 12344 -w iplanet (process id 22796) Reading libcollector.so Creating experiment database test.1.er ... Reading libfreebl_hybrid_3.so pool: 1, zone: 0, size: 16, free: 21, hit: 4440, miss: 25, contend: 0 pool: 1, zone: 1, size: 64, free: 37, hit: 25299, miss: 130, contend: 0 pool: 1, zone: 2, size: 256, free: 16, hit: 9329, miss: 79, contend: 0 pool: 1, zone: 3, size: 1024, free: 24, hit: 18020, miss: 44, contend: 2 pool: 1, zone: 4, size: 4096, free: 17, hit: 17987, miss: 76, contend: 1 pool: 2, zone: 0, size: 16, free: 18, hit: 41550, miss: 18, contend: 0 pool: 2, zone: 1, size: 64, free: 13, hit: 17337, miss: 18, contend: 0 pool: 2, zone: 2, size: 256, free: 24, hit: 60049, miss: 26, contend: 1 pool: 2, zone: 3, size: 1024, free: 4, hit: 6924, miss: 4, contend: 0 pool: 2, zone: 4, size: 4096, free: 1, hit: 1153, miss: 6, contend: 0 pool: 2, zone: 6, size: 65536, free: 1, hit: 1153, miss: 1, contend: 0 pool: 3, zone: 0, size: 16, free: 18, hit: 41382, miss: 18, contend: 0 pool: 3, zone: 1, size: 64, free: 9, hit: 17241, miss: 11, contend: 0 pool: 3, zone: 2, size: 256, free: 25, hit: 59698, miss: 27, contend: 1 pool: 3, zone: 3, size: 1024, free: 3, hit: 6897, miss: 3, contend: 0 pool: 3, zone: 4, size: 4096, free: 1, hit: 1149, miss: 7, contend: 0 pool: 3, zone: 6, size: 65536, free: 1, hit: 1149, miss: 1, contend: 0 pool: 4, zone: 0, size: 16, free: 18, hit: 39630, miss: 18, contend: 0 pool: 4, zone: 1, size: 64, free: 13, hit: 16522, miss: 15, contend: 0 pool: 4, zone: 2, size: 256, free: 25, hit: 57238, miss: 26, contend: 1 pool: 4, zone: 3, size: 1024, free: 4, hit: 6604, miss: 4, contend: 0 pool: 4, zone: 4, size: 4096, free: 1, hit: 1100, miss: 1, contend: 0 pool: 4, zone: 6, size: 65536, free: 1, hit: 1100, miss: 1, contend: 0 pool: 5, zone: 0, size: 16, free: 18, hit: 39630, miss: 18, contend: 0 pool: 5, zone: 1, size: 64, free: 9, hit: 16529, miss: 17, contend: 0 pool: 5, zone: 2, size: 256, free: 24, hit: 57261, miss: 27, contend: 0 pool: 5, zone: 3, size: 1024, free: 4, hit: 6604, miss: 4, contend: 0 pool: 5, zone: 4, size: 4096, free: 1, hit: 1100, miss: 10, contend: 0 pool: 5, zone: 6, size: 65536, free: 1, hit: 1100, miss: 1, contend: 0 pool: 7, zone: 0, size: 16, free: 18, hit: 39306, miss: 18, contend: 0 pool: 7, zone: 1, size: 64, free: 13, hit: 16387, miss: 22, contend: 0 pool: 7, zone: 2, size: 256, free: 25, hit: 56774, miss: 27, contend: 0 pool: 7, zone: 3, size: 1024, free: 4, hit: 6550, miss: 4, contend: 0 pool: 7, zone: 4, size: 4096, free: 1, hit: 1091, miss: 10, contend: 0 pool: 7, zone: 6, size: 65536, free: 1, hit: 1091, miss: 1, contend: 0 pool: 8, zone: 0, size: 16, free: 18, hit: 41984, miss: 18, contend: 0 pool: 8, zone: 1, size: 64, free: 13, hit: 17535, miss: 19, contend: 0 pool: 8, zone: 2, size: 256, free: 24, hit: 60582, miss: 26, contend: 1 pool: 8, zone: 3, size: 1024, free: 4, hit: 7002, miss: 4, contend: 0 pool: 8, zone: 4, size: 4096, free: 1, hit: 1165, miss: 5, contend: 0 pool: 8, zone: 6, size: 65536, free: 1, hit: 1165, miss: 1, contend: 0 pool: 9, zone: 0, size: 16, free: 18, hit: 40554, miss: 18, contend: 0 pool: 9, zone: 1, size: 64, free: 9, hit: 16896, miss: 18, contend: 0 pool: 9, zone: 2, size: 256, free: 25, hit: 58605, miss: 26, contend: 0 pool: 9, zone: 3, size: 1024, free: 3, hit: 6759, miss: 3, contend: 0 pool: 9, zone: 4, size: 4096, free: 1, hit: 1126, miss: 11, contend: 0 pool: 9, zone: 6, size: 65536, free: 1, hit: 1126, miss: 1, contend: 0 pool: 10, zone: 0, size: 16, free: 18, hit: 39966, miss: 18, contend: 0 pool: 10, zone: 1, size: 64, free: 13, hit: 16677, miss: 16, contend: 0 pool: 10, zone: 2, size: 256, free: 24, hit: 57730, miss: 28, contend: 2 pool: 10, zone: 3, size: 1024, free: 4, hit: 6660, miss: 4, contend: 0 pool: 10, zone: 4, size: 4096, free: 1, hit: 1109, miss: 4, contend: 0 pool: 10, zone: 6, size: 65536, free: 1, hit: 1109, miss: 1, contend: 0 execution completed, exit code is 0 (/usr/suntools/internal/SUNWspro/bin/../WS6U1/bin/sparcv9/ 2052 exclusive sync wait sec (pthread_mutex_lock) 1219 exclusive sync wait sec (pthread_cond_wait) 591 exclusive sync wait sec (mutex_lock) 2048 inclusive sync wait sec PR_Lock
Status: NEW → ASSIGNED
kirke@iws-files !! | grep so Julien, I thought I should let you know where the NSPR based zone allocator can be found for the record incase you have an opportunity to exercise the web server. We're wondering if my selfserv/strsclnt benchmarking is stressing enough. I built on iws-files: f "*nspr*" | grep so ./mozilla/dist/SunOS5.8_OPT.OBJ/bin/libnspr4.so ./mozilla/dist/SunOS5.8_OPT.OBJ/lib/libnspr4.so ./mozilla/dist/SunOS5.8_DBG.OBJ/bin/libnspr4.so ./mozilla/dist/SunOS5.8_DBG.OBJ/lib/libnspr4.so ./mozilla/nsprpub/pr/src/SunOS5.8_OPT.OBJ/libnspr4.so ./mozilla/nsprpub/pr/src/SunOS5.8_DBG.OBJ/libnspr4.so kirke@iws-files pwd /export/kirke/workarea-nss33 kirke@iws-files
Kirk, I don't have access to much hardware to do my stress testing yet as I was doing before, since our labs aren't setup. I will try to see if the network in my cube is sufficient for measurements but I'm not certain.
Julien, I've built on Axilla under Solaris 2.6: kirke@axilla f "*nspr*so" ./mozilla/nsprpub/pr/src/SunOS5.6_OPT.OBJ/libnspr4.so ./mozilla/dist/SunOS5.6_OPT.OBJ/bin/libnspr4.so ./mozilla/dist/SunOS5.6_OPT.OBJ/lib/libnspr4.so -rwxrwxr-x 1 kirke staff 374480 Oct 4 10:44 ./mozilla/nsprpub/pr/src/SunOS5.6_OPT.OBJ/libnspr4.so*kirke@axilla pwd /export/home8/kirke/workarea-nss33 kirke@axilla My certificate expired for Axilla, and cert-o-matic (troll.red.iplanet.com) has disappeared so I need to create a cert server and generate new certs before I'll have selfserv numbers. Looking forward to see your results. Thanks a bunch for your help.
Note, this patch is a patch to the file prmem.c after the patch in attachment 51740 [details] [diff] [review] has already been applied. This patch also cleans up some indentation problems with the previous patch. Please review this patch. Please retest iWS with this patch applied.
I combined the two patches (attachment 51740 [details] [diff] [review] and attachment 58817 [details] [diff] [review]) and did some editing to follow the NSPR coding style. This patch was generated against the tip of NSPR.
Some comments about the combined patch (attachment 62271 [details] [diff] [review]). 1. The zone allocator currently can only be used on Unix platforms that have pthreads. It can be ported to all platforms that have native threads. 2. To enable the zone allocator, one needs to set the environment variable NSPR_USE_ZONE_ALLOCATOR to 1.
Wan-Teh, I believe we previously agreed to also have another method to turn on the zone allocator - a call to an API. This is especially necessary on NT for services. Otherwise if the system environment variable is used, all applications under NT end up using the zone allocator, even the client, which will waste memory. The recommended way to turn on the allocator should be the call to that API, not setting the environment variable, which should be only for testing application compatibility with the zone allocator.
Julien, You are right. I am going to use an alternate method that achieves the same goal. I will allow an application to define a magic global variable and initialize it to 1 to enable the zone allocator. The advantage of this method over the function call method is that this magic global variable can be checked by NSPR initialization routine, whereas the function call method cannot guarantee that the function is called before NSPR is initialized.
Wan-Teh, I don't think your approach is portable. Trying to access a global variable defined in the executable from a DLL is likely to be impossible on some platforms. I know it works on Unix and NT. But I experimented with this on OS/2. DosQueryModuleHandle is specified to return a module handle only for DLLs. However, I was able to get it to successfully return one by passing "program.exe". Unfortunately, a subsequent DosQueryProcAddr - which is supposed to find an address in a module - returns error 6, invalid module, regardless of whether I try to look by symbol name or ordinal. This might be a question worth asking the OS/2 kernel guys.
I know this started as an NSS issue, but the resolution will be in NSPR. Should we change the component for this bug to NSPR ?
Good point. Changing the product to NSPR.
Status: ASSIGNED → NEW
Component: Libraries → NSPR
Product: NSS → NSPR
QA Contact: sonja.mirtitsch → wtc
Target Milestone: --- → 4.2
Version: 3.4 → 4.1.2
It is definitely not possible to do a DosQueryProcAddr on an entrypoint in an EXE on OS/2. You can only load and call functions in DLLs. The same is true of global variables. I like Julien's idea of an API call to turn it on.
Mike, I'd appreciate if you could go back to Scott Garfinkle and still ask if there is a way to do that, if you haven't yet. I was able to get ahold of the EXE's module handle in my test program. Just not to load any entry points from it as the handle was considered invalid by DosQueryProcAddr. I'm wondering what the EXE handle is good for. About the API call, the issue is complicated. Specifically, there is the problem of using a single allocator throughout the life of a process. Ideally, we would like NSPR to select its allocation method when it gets initialized. The problem is that it usually gets initialized incidentally rather than explicitly. Typically, a C++ object at global scope will invoke some function before main() would get a chance to make the API call to select the desired allocator. However an environment variable works for that implicit NSPR initialization case. One solution we came up with is that we could default to using the current malloc/free allocator in NSPR when the program starts up. Then the user makes the API call, and the allocator is switched. The problem is how to work with existing objects that were allocated before the switchover. One way to do that is that if an object to free does not exist in the allocator's list, we fall back to free. The downside is that we can no longer assert that this is an invalid pointer like we would if we were sure everything was allocated with the zone allocator. So it is not as clean an approach, though it is portable and workable. I think what Wan-Teh is afraid of is that on some platforms the free() of a bad pointer would be a no-op and mask some application bugs that would normally be automatically detected when using the zone allocator.
on NT, you may want to look at the MPHeap sample: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcsample98/html/vcsmpmpheapsamplemultithreadedheapmanager.asp I think that it does the job
There are other heap products out there that work as well. The problem is they all interfere with the Java garbage collection code. With the zone allocator, we don't replace all malloc calls, just those which we use in NSS. Also we really need something that works on all platforms (or most common unix platforms, since that make up the most commonly used servers).
I submit that we now have something that works on all platforms and that satisfies the heap needs of NSS and NSPR without interfering with things like Java's run time which makes platform-dependent assumptions about the heap implementation.
I fixed a few bugs in the previous patch (attachment 62271 [details] [diff] [review]) and a compiler warning. Initialize the zone allocator first during NSPR initialization. Print warning messages if memory blocks from ordinary malloc are handed to the zone allocator.
Maybe the new printfs should be inside #ifdef DEBUG or something similar.
Nelson, I was debating whether I should do that. Since you made the same point, I added #ifdef DEBUG around the warning printfs.
This patch allows an application to define a magic global variable to enable the zone allocator. I have tested it on HP-UX, Linux, and Solaris. To enable the zone allocator, add the following to your main program: PRBool nspr_use_zone_allocator = PR_TRUE; On Solaris, this symbols gets exported automatically. On HP-UX, you need to pass "+ee nspr_use_zone_allocator" to the linker. On Linux, you need to use the -E linker option. This is going to make our release notes complicated. Maybe I should listen to you guys and just add an API call for enabling the zone allocator.
Wan-Teh, Your approach also has another problem, which is that it may not be the main program linking directly with NSPR but some other shared library or Windows DLL. In this case trying to grab the global variable would likely fail. I'd like to see the API call if it can be done without the expense of losing the PR_ASSERT on invalid free after the switch to the zone allocator. I'm still wondering how this can be done. Here is a suggestion : Since the asserts are only for debug builds presumably, we shouldn't care about memory usage for those cases. So in the debug build, when NSPR is implicitly initialized, it could allocate a few extra bytes before the buffer, to flag info about the memory block. Then after the switch is done, the PR_Free could look at those bytes and know even about the blocks that weren't initialized by the zone allocator before the API call, and assert. Of course this means the debug and opt build wil behave somewhat differently. But if the zone allocator is well tested, it could work and let us continue to debug problems regardless of whether the zone allocator is enabled or disabled.
Well actually with my suggestion you would have to return a pointer a few bytes after the flagged data. Then in the free, you would go back a few bytes and check the data. Possibly calling a page fault right there if the pointer is invalid or it is a double free. So it wouldn't exactly be like a PR_ASSERT, but it would still occur only in case of an application bug.
Patrick Beard suggested that a symbol in the main executable program can be looked up as follows: You can get the CFBundle of the main executable CFBundleGetMainBundle(), and then look up a symbol in it using CFBundleGetFunctionPointerForName(). See CFBundle.h.
The target milestone is already released. Resetting target milestone.
Target Milestone: 4.2 → ---
You need to log in before you can comment on or make changes to this bug.