Measure mmapped/library io
Categories
(Core :: Performance, defect)
People
(Reporter: taras.mozilla, Unassigned)
Attachments
(1 file) 250.52 KB, text/plain
Currently, mmapped I/O is good at evading most measurement tools. It is confirmed to be the dominant source of I/O latency on Linux and is likely to be the same on other platforms. We need a good way to measure mmap I/O.
Comment 1•15 years ago
Not build-config.
Comment 2•15 years ago
My DTrace repo lives here: http://github.com/wagerlabs/firefox-startup

Grab d, startup.d, sigcont.d and pagefault.d, then run 'd' in one terminal window, e.g.

./d firefox-bin page-fault.d

Then run something like this in another window:

purge && /Applications/Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2

You should see output like this in the "dtrace" window. Note that I tested on Snow Leopard; kernel structures in Leopard may be different.

dtrace -Zqw -x dynvarsize=64m -x evaltime=exec -p 18743 -s start.d -s page-fault.d -s sigcont.d

vnode_pagein: XUL, offset: 20402176, size: 8192
vnode_pagein: XUL, offset: 19722240, size: 16384
vnode_pagein: XUL, offset: 19738624, size: 16384
vnode_pagein: XUL, offset: 19755008, size: 16384
vnode_pagein: XUL, offset: 19771392, size: 16384
vnode_pagein: XUL, offset: 19804160, size: 16384
vnode_pagein: XUL, offset: 19820544, size: 16384
vnode_pagein: XUL, offset: 19836928, size: 16384
vnode_pagein: XUL, offset: 19857408, size: 16384
vnode_pagein: XUL, offset: 19873792, size: 16384
vnode_pagein: XUL, offset: 19890176, size: 16384
vnode_pagein: XUL, offset: 19906560, size: 16384
vnode_pagein: XUL, offset: 19922944, size: 16384
vnode_pagein: XUL, offset: 19939328, size: 16384
vnode_pagein: XUL, offset: 19955712, size: 16384
vnode_pagein: XUL, offset: 20410368, size: 16384
vnode_pagein: XUL, offset: 19972096, size: 20480
vnode_pagein: XUL, offset: 19992576, size: 20480
vnode_pagein: XUL, offset: 20013056, size: 20480
vnode_pagein: XUL, offset: 20033536, size: 20480
vnode_pagein: XUL, offset: 20054016, size: 20480
vnode_pagein: XUL, offset: 20094976, size: 20480
vnode_pagein: XUL, offset: 20115456, size: 20480
vnode_pagein: XUL, offset: 20135936, size: 20480
vnode_pagein: XUL, offset: 20156416, size: 24576
vnode_pagein: XUL, offset: 20180992, size: 24576
vnode_pagein: XUL, offset: 20205568, size: 24576
vnode_pagein: XUL, offset: 20230144, size: 24576
vnode_pagein: XUL, offset: 20254720, size: 24576
vnode_pagein: XUL, offset: 20303872, size: 24576
vnode_pagein: XUL, offset: 20328448, size: 28672
vnode_pagein: XUL, offset: 20357120, size: 28672
vnode_pagein: XUL, offset: 20393984, size: 8192
vnode_pagein: XUL, offset: 19787776, size: 16384
vnode_pagein: XUL, offset: 20074496, size: 20480
vnode_pagein: XUL, offset: 20279296, size: 24576
vnode_pagein: XUL, offset: 20496384, size: 434176
vnode_pagein: XUL, offset: 20439040, size: 8192
vnode_pagein: XUL, offset: 14548992, size: 1048576
vnode_pagein: XUL, offset: 20385792, size: 8192
vnode_pagein: XUL, offset: 8192, size: 1040384
vnode_pagein: XUL, offset: 10944512, size: 999424
vnode_pagein: XUL, offset: 8921088, size: 1048576
vnode_pagein: XUL, offset: 20426752, size: 4096
vnode_pagein: XUL, offset: 20430848, size: 4096
vnode_pagein: XUL, offset: 20434944, size: 4096
vnode_pagein: XUL, offset: 19853312, size: 4096
vnode_pagein: XUL, offset: 20447232, size: 4096
vnode_pagein: XUL, offset: 20451328, size: 8192
vnode_pagein: XUL, offset: 12963840, size: 1048576
vnode_pagein: XUL, offset: 20459520, size: 36864
vnode_pagein: XUL, offset: 14012416, size: 536576
vnode_pagein: XUL, offset: 11943936, size: 1019904
vnode_pagein: XUL, offset: 1048576, size: 827392
vnode_pagein: XUL, offset: 10285056, size: 659456
vnode_pagein: XUL, offset: 1875968, size: 651264
vnode_pagein: XUL, offset: 9969664, size: 315392
vnode_pagein: XUL, offset: 3645440, size: 1048576
vnode_pagein: XUL, offset: 4694016, size: 819200
vnode_pagein: XUL, offset: 2658304, size: 987136
vnode_pagein: XUL, offset: 5701632, size: 1048576
vnode_pagein: XUL, offset: 7528448, size: 1048576
vnode_pagein: XUL, offset: 5513216, size: 188416
vnode_pagein: XUL, offset: 6934528, size: 593920
vnode_pagein: XUL, offset: 6750208, size: 184320
vnode_pagein: XUL, offset: 2527232, size: 131072
total run time: 5638.591735ms
Comment 3•15 years ago
I want to refine the bug description a bit; this should help explain what I'm doing.

Dynamic libraries are mapped onto a range of memory by mmap. This memory range is backed by a set of virtual memory pages that are loaded from disk upon access to their corresponding memory address. Mac OS X uses 4K pages, and these are fetched from disk on demand by the virtual memory pager.

We are talking about dynamic library pages here, so each page is filled with code for functions that live in the dynamic library. Pages are accessed (and fetched from disk) when a call is made to a function in the dynamic library. Pages should be accessed sequentially for best performance.

It does look from the above output that we are loading multiple pages at a time, e.g. size 1019904 corresponds to 249 pages. Page access looks quite random, though, which is killing us.

Now that we know what the access pattern is, we should first identify the symbols that are being accessed in each set of pages and then make the linker rearrange the code [1] such that page access is more sequential and less random.

[1] http://developer.apple.com/mac/library/documentation/Performance/Conceptual/CodeFootprint/Articles/ImprovingLocality.html#//apple_ref/doc/uid/20001862
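As a cross-check of the page arithmetic above (1019904 bytes = 249 pages), a throwaway Python sketch of the bytes-to-pages conversion:

```python
PAGE_SIZE = 4096  # Mac OS X page size, per the comment above

def bytes_to_pages(nbytes):
    # Round up: a partial page still costs a full page-in.
    return (nbytes + PAGE_SIZE - 1) // PAGE_SIZE

print(bytes_to_pages(1019904))  # 249, matching the 1019904-byte page-in
print(bytes_to_pages(8192))     # 2
```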
Comment 4•15 years ago
Saving it here for posterity: http://wagerlabs.com/post/230853261/tracking-io-patterns-in-memory-mapped-dynamic-libaries

Summary: You are not likely to see page-ins when running this DTrace script unless you have just restarted your Mac and are running Firefox for the first time. This is because the dynamic libraries will be stored in the Unified Buffer Cache (UBC) after first access, and there won't be any subsequent disk access until they are evicted from the cache.

I wrote about hacking the Unified Buffer Cache before [1], but the same technique does not work with mmap-ed data since the virtual memory manager and the cache are the same thing. Evicting the libraries would involve allocating at least twice as much virtual memory as there's RAM and then touching each page to make sure it's cached. This is unlikely to correspond to normal use of Firefox, though.

[1] http://tinyco.de/2008/03/04/hacking-the-unified-buffer-cache.html
Comment 5•15 years ago
Normally, we would just use gprof to give us four different types of symbol orderings:

1) Ordering based on a "closest is best" analysis of the profiling call graph. Routines that call each other frequently are placed close together.
2) Routines sorted by the number of calls made to each routine, largest numbers first.
3) Routines sorted by the order in which they are called.
4) Routines sorted by the amount of CPU time spent, largest times first.

gprof doesn't work with dynamic libraries, though, so the above would require a static build of Firefox. There are some issues with that according to Bug 525013.

In the meantime, 'size -m -l XUL' gives us the size (first number) and offset of each segment and section in XUL, e.g.

Segment __TEXT: 19062784 (vmaddr 0x0 fileoff 0)
Section __const: 1120204 (addr 0x1300 offset 4864)
Section __cstring: 441251 (addr 0x112ad0 offset 1125072)
Section __text: 13853392 (addr 0x17e680 offset 1566336)
Section __symbol_stub1: 9282 (addr 0xeb4950 offset 15419728)
Section __stub_helper: 15486 (addr 0xeb6d92 offset 15429010)
Section __gcc_except_tab: 70540 (addr 0xebaa10 offset 15444496)
Section __unwind_info: 224592 (addr 0xeccd90 offset 15519120)
Section __eh_frame: 3319048 (addr 0xf03ae0 offset 15743712)
total 19053795
Segment __DATA: 1822720 (vmaddr 0x122e000 fileoff 19062784)
Section __nl_symbol_ptr: 1136 (addr 0x122e000 offset 19062784)
Section __la_symbol_ptr: 12376 (addr 0x122e470 offset 19063920)
Section __mod_init_func: 1536 (addr 0x12314c8 offset 19076296)
Section __const: 1574096 (addr 0x1231ae0 offset 19077856)
Section __cfstring: 3296 (addr 0x13b1fb0 offset 20651952)
Section __objc_data: 1760 (addr 0x13b2c90 offset 20655248)
Section __objc_msgrefs: 10544 (addr 0x13b3370 offset 20657008)
Section __objc_selrefs: 432 (addr 0x13b5ca0 offset 20667552)
Section __objc_classrefs: 1560 (addr 0x13b5e50 offset 20667984)
Section __objc_superrefs: 160 (addr 0x13b6468 offset 20669544)
Section __objc_const: 13040 (addr 0x13b6508 offset 20669704)
Section __objc_classlist: 176 (addr 0x13b97f8 offset 20682744)
Section __objc_catlist: 48 (addr 0x13b98a8 offset 20682920)
Section __objc_protolist: 24 (addr 0x13b98d8 offset 20682968)
Section __objc_imageinfo: 8 (addr 0x13b98f0 offset 20682992)
Section __data: 107592 (addr 0x13b9900 offset 20683008)
Section __bss: 61316 (addr 0x13d3d60 offset 0)
Section __common: 29672 (addr 0x13e2d00 offset 0)
total 1818772
Segment __LINKEDIT: 18173952 (vmaddr 0x13eb000 fileoff 20791296)
total 39059456

The __LINKEDIT segment houses the symbol table [1], a huge one in our case (18Mb) and one that cannot be stripped, e.g.

strip XUL1
strip: symbols referenced by indirect symbol table entries that can't be stripped in: /Users/joelr/Work/mozilla/profile/toolkit/library/XUL1

What I'll try to do is convert the offsets and sizes in comment #2 to the segments and sections above. This will give us a bit more insight into how things are accessed. Once that is taken care of, I'll proceed to reorder the code pages for sequential access. I will then try to resolve Bug 525013 by getting a static build going so that we can time profile-guided (gprof) reordering.

[1] http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html
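The offset-to-section conversion described above can be sketched in a few lines of Python. This is not the actual script, just an illustration; the section table is a hand-copied subset of the 'size -m -l XUL' output in this comment:

```python
# Map a vnode_pagein file offset to the segment/section it falls into,
# given (segment, section, file offset, size) rows from 'size -m -l XUL'.
SECTIONS = [
    # A small subset of the 'size' output above, for illustration only.
    ("__TEXT", "__const",   4864,     1120204),
    ("__TEXT", "__cstring", 1125072,  441251),
    ("__TEXT", "__text",    1566336,  13853392),
    ("__DATA", "__const",   19077856, 1574096),
]

def find_section(offset):
    # A linear scan is fine for a few dozen sections.
    for seg, sec, start, size in SECTIONS:
        if start <= offset < start + size:
            return seg, sec
    return None

print(find_section(1566336))  # ('__TEXT', '__text')
```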
Comment 6•15 years ago
This corresponds to the page-ins in comment #2. I'll need to reboot and re-run the page-ins DTrace script, as I don't see __TEXT, __text being hit here.

__DATA, __cfstring = 2 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 7 pages
__DATA, __cfstring = 7 pages
__DATA, __cfstring = 2 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 106 pages
__DATA, __cfstring = 2 pages
__TEXT, __symbol_stub1 = 256 pages
__DATA, __cfstring = 2 pages
__TEXT, __cstring = 254 pages
__TEXT, __symbol_stub1 = 244 pages
__TEXT, __symbol_stub1 = 256 pages
__DATA, __cfstring = 1 pages
__DATA, __cfstring = 1 pages
__DATA, __cfstring = 1 pages
__DATA, __cfstring = 1 pages
__DATA, __cfstring = 1 pages
__DATA, __cfstring = 2 pages
__TEXT, __symbol_stub1 = 256 pages
__DATA, __cfstring = 9 pages
__TEXT, __symbol_stub1 = 131 pages
__TEXT, __symbol_stub1 = 249 pages
__TEXT, __cstring = 202 pages
__TEXT, __symbol_stub1 = 161 pages
__TEXT, __symbol_stub1 = 159 pages
__TEXT, __symbol_stub1 = 77 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 200 pages
__TEXT, __symbol_stub1 = 241 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 46 pages
__TEXT, __symbol_stub1 = 145 pages
__TEXT, __symbol_stub1 = 45 pages
__TEXT, __symbol_stub1 = 32 pages
Comment 7•15 years ago
I'm now purging the caches using this Python script (64 bits):

#!/usr/bin/env python
import mmap
SIZE = 8589934592L # 8Gb
map = mmap.mmap(-1, SIZE)
map[::4096] = '\x01' * (SIZE // 4096)

I then run a small DTrace script that looks for page-ins in XUL, e.g.

dtrace -Zqws pf.d

where the script looks like this:

fbt::vnode_pagein:entry
{
    self->v_name = stringof(((vnode_t)arg0)->v_name);
}

fbt::vnode_pagein:entry
/self->v_name == "XUL"/
{
    printf("vnode_pagein: %s, offset: %u, size: %u\n", self->v_name, arg3, arg4);
}

I then analyze the output from the DTrace script and the list of segments, e.g.

./access-pattern.py xul.segments xul.pageins.2

where access-pattern.py lives at http://github.com/wagerlabs/firefox-startup/blob/master/mmap/access-pattern.py

What I get back is devoid of any code (__TEXT, __text) access (see below). I'm consulting the Darwin deities tonight and will post any suggestions as soon as I have them.

segment: __TEXT, size: 18673664, offset: 0
segment: __DATA, size: 1814528, offset: 18673664
segment: __LINKEDIT, size: 19382272, offset: 20393984

__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__LINKEDIT, __symtab = 4 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__LINKEDIT, __symtab = 2 pages
__TEXT, __symbol_stub1 = 256 pages
__LINKEDIT, __symtab = 9 pages
__TEXT, __symbol_stub1 = 254 pages
__TEXT, __cstring = 131 pages
__TEXT, __symbol_stub1 = 249 pages
__LINKEDIT, __symtab = 2 pages
__LINKEDIT, __symtab = 2 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 4 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 5 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __cfstring = 6 pages
__DATA, __objc_classlist = 6 pages
__LINKEDIT, __symtab = 6 pages
__LINKEDIT, __symtab = 7 pages
__LINKEDIT, __symtab = 7 pages
__LINKEDIT, __symtab = 7 pages
__LINKEDIT, __symtab = 128 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 2 pages
__DATA, __cfstring = 1 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 2 pages
__TEXT, __cstring = 256 pages
__TEXT, __symbol_stub1 = 149 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 199 pages
__TEXT, __symbol_stub1 = 200 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 134 pages
__TEXT, __symbol_stub1 = 130 pages
__TEXT, __symbol_stub1 = 241 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 256 pages
__TEXT, __symbol_stub1 = 46 pages
__TEXT, __symbol_stub1 = 145 pages
__TEXT, __symbol_stub1 = 45 pages
__TEXT, __symbol_stub1 = 4 pages
__TEXT, __symbol_stub1 = 3 pages
Comment 8•15 years ago
Alright, so I'm an idiot; the output should look like this:

segment: __TEXT, size: 18673664, offset: 0
segment: __DATA, size: 1814528, offset: 18673664
segment: __LINKEDIT, size: 19382272, offset: 20393984

__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__LINKEDIT, __symtab = 4 pages
__DATA, __const = 5 pages
__DATA, __const = 5 pages
__DATA, __const = 5 pages
__DATA, __const = 5 pages
__DATA, __const = 5 pages
__DATA, __data = 2 pages
__TEXT, __text = 256 pages
__LINKEDIT, __symtab = 9 pages
__TEXT, __text = 254 pages
__TEXT, __const = 131 pages
__TEXT, __text = 249 pages
__LINKEDIT, __symtab = 2 pages
__LINKEDIT, __symtab = 2 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 4 pages
__DATA, __const = 5 pages
__DATA, __const = 5 pages
__DATA, __const = 5 pages
__DATA, __const = 5 pages
__DATA, __const = 6 pages
__DATA, __const = 6 pages
__DATA, __const = 6 pages
__DATA, __const = 6 pages
__DATA, __const = 6 pages
__DATA, __objc_const = 6 pages
__DATA, __data = 6 pages
__DATA, __data = 7 pages
__DATA, __data = 7 pages
__LINKEDIT, __symtab = 7 pages
__LINKEDIT, __symtab = 128 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 2 pages
__DATA, __const = 1 pages
__LINKEDIT, __symtab = 1 pages
__LINKEDIT, __symtab = 2 pages
__TEXT, __const = 256 pages
__TEXT, __text = 149 pages
__TEXT, __text = 256 pages
__TEXT, __text = 256 pages
__TEXT, __text = 199 pages
__TEXT, __text = 200 pages
__TEXT, __text = 256 pages
__TEXT, __text = 134 pages
__TEXT, __text = 130 pages
__TEXT, __text = 241 pages
__TEXT, __text = 256 pages
__TEXT, __text = 256 pages
__TEXT, __text = 46 pages
__TEXT, __text = 145 pages
__TEXT, __text = 45 pages
__TEXT, __text = 4 pages
__TEXT, __text = 3 pages
Comment 9•15 years ago
And here's an even better output, with offsets:

segment: __TEXT, size: 18673664, offset: 0
segment: __DATA, size: 1814528, offset: 18673664
segment: __LINKEDIT, size: 19382272, offset: 20393984

__DATA, __const = 4 pages, 4843 pages in
__DATA, __const = 4 pages, 4848 pages in
__DATA, __const = 4 pages, 4852 pages in
__DATA, __const = 4 pages, 4856 pages in
__DATA, __const = 4 pages, 4860 pages in
__DATA, __const = 4 pages, 4864 pages in
__DATA, __const = 4 pages, 4868 pages in
__DATA, __const = 4 pages, 4872 pages in
__LINKEDIT, __symtab = 4 pages, 4983 pages in
__DATA, __const = 5 pages, 4876 pages in
__DATA, __const = 5 pages, 4881 pages in
__DATA, __const = 5 pages, 4896 pages in
__DATA, __const = 5 pages, 4901 pages in
__DATA, __const = 5 pages, 4906 pages in
__DATA, __data = 2 pages, 4977 pages in
__TEXT, __text = 256 pages, 3165 pages in
__LINKEDIT, __symtab = 9 pages, 4995 pages in
__TEXT, __text = 254 pages, 2 pages in
__TEXT, __const = 131 pages, 3421 pages in
__TEXT, __text = 249 pages, 2916 pages in
__LINKEDIT, __symtab = 2 pages, 4979 pages in
__LINKEDIT, __symtab = 2 pages, 4981 pages in
__DATA, __const = 4 pages, 4815 pages in
__DATA, __const = 4 pages, 4819 pages in
__DATA, __const = 4 pages, 4823 pages in
__DATA, __const = 4 pages, 4827 pages in
__DATA, __const = 4 pages, 4831 pages in
__DATA, __const = 4 pages, 4835 pages in
__DATA, __const = 4 pages, 4839 pages in
__DATA, __const = 5 pages, 4886 pages in
__DATA, __const = 5 pages, 4891 pages in
__DATA, __const = 5 pages, 4911 pages in
__DATA, __const = 5 pages, 4916 pages in
__DATA, __const = 6 pages, 4921 pages in
__DATA, __const = 6 pages, 4927 pages in
__DATA, __const = 6 pages, 4933 pages in
__DATA, __const = 6 pages, 4939 pages in
__DATA, __const = 6 pages, 4945 pages in
__DATA, __objc_const = 6 pages, 4951 pages in
__DATA, __data = 6 pages, 4957 pages in
__DATA, __data = 7 pages, 4963 pages in
__DATA, __data = 7 pages, 4970 pages in
__LINKEDIT, __symtab = 7 pages, 5004 pages in
__LINKEDIT, __symtab = 128 pages, 5011 pages in
__LINKEDIT, __symtab = 1 pages, 4987 pages in
__LINKEDIT, __symtab = 1 pages, 4988 pages in
__LINKEDIT, __symtab = 1 pages, 4989 pages in
__LINKEDIT, __symtab = 2 pages, 4990 pages in
__DATA, __const = 1 pages, 4847 pages in
__LINKEDIT, __symtab = 1 pages, 4992 pages in
__LINKEDIT, __symtab = 2 pages, 4993 pages in
__TEXT, __const = 256 pages, 3552 pages in
__TEXT, __text = 149 pages, 2767 pages in
__TEXT, __text = 256 pages, 2312 pages in
__TEXT, __text = 256 pages, 259 pages in
__TEXT, __text = 199 pages, 2568 pages in
__TEXT, __text = 200 pages, 1146 pages in
__TEXT, __text = 256 pages, 890 pages in
__TEXT, __text = 134 pages, 2178 pages in
__TEXT, __text = 130 pages, 515 pages in
__TEXT, __text = 241 pages, 649 pages in
__TEXT, __text = 256 pages, 1392 pages in
__TEXT, __text = 256 pages, 1838 pages in
__TEXT, __text = 46 pages, 1346 pages in
__TEXT, __text = 145 pages, 1693 pages in
__TEXT, __text = 45 pages, 1648 pages in
__TEXT, __text = 4 pages, 645 pages in
__TEXT, __text = 3 pages, 256 pages in
Comment 10•15 years ago
Here's the same but with a running commentary.

We start by loading 8 chunks of 4 pages, starting 4843 pages into the file.

__DATA, __const = 4 pages, 4843 pages in

Hit the symbol table for 4 pages.

__LINKEDIT, __symtab = 4 pages, 4983 pages in

Continue loading constants from where we left off, 5 chunks of 5 pages.

__DATA, __const = 5 pages, 4876 pages in
...
__DATA, __const = 5 pages, 4906 pages in

Hit initialized mutable variables, such as writable C strings and data arrays, for 2 pages.

__DATA, __data = 2 pages, 4977 pages in

Load 256 pages of code.

__TEXT, __text = 256 pages, 3165 pages in

Grab 9 pages of symbols from the symbol table.

__LINKEDIT, __symtab = 9 pages, 4995 pages in

Grab some code.

__TEXT, __text = 254 pages, 2 pages in

Grab some constants.

__TEXT, __const = 131 pages, 3421 pages in

Grab some code. This chunk extends up to page 3165 in our last code read.

__TEXT, __text = 249 pages, 2916 pages in

Sequential read of 4 pages from the symbol table.

__LINKEDIT, __symtab = 2 pages, 4979 pages in
__LINKEDIT, __symtab = 2 pages, 4981 pages in

Sequential read of 16 x 4/5/6 pages, last read 4945 pages in.

__DATA, __const = 4 pages, 4815 pages in

ObjC constants.

__DATA, __objc_const = 6 pages, 4951 pages in

More data, sequential reading.

__DATA, __data = 6 pages, 4957 pages in
__DATA, __data = 7 pages, 4963 pages in
__DATA, __data = 7 pages, 4970 pages in

Sequential reading of the symbol table.

__LINKEDIT, __symtab = 7 pages, 5004 pages in
__LINKEDIT, __symtab = 128 pages, 5011 pages in

And a sudden jump back for more sequential reading.

__LINKEDIT, __symtab = 1 pages, 4987 pages in
__LINKEDIT, __symtab = 1 pages, 4988 pages in
__LINKEDIT, __symtab = 1 pages, 4989 pages in
__LINKEDIT, __symtab = 2 pages, 4990 pages in

This page must be feeling lonely!

__DATA, __const = 1 pages, 4847 pages in

Read some more symbols.

__LINKEDIT, __symtab = 1 pages, 4992 pages in
__LINKEDIT, __symtab = 2 pages, 4993 pages in

And issue a bunch of completely random code reads.

__TEXT, __const = 256 pages, 3552 pages in
__TEXT, __text = 149 pages, 2767 pages in
__TEXT, __text = 256 pages, 2312 pages in
__TEXT, __text = 256 pages, 259 pages in
__TEXT, __text = 199 pages, 2568 pages in
__TEXT, __text = 200 pages, 1146 pages in
__TEXT, __text = 256 pages, 890 pages in
__TEXT, __text = 134 pages, 2178 pages in
__TEXT, __text = 130 pages, 515 pages in
__TEXT, __text = 241 pages, 649 pages in
__TEXT, __text = 256 pages, 1392 pages in
__TEXT, __text = 256 pages, 1838 pages in
__TEXT, __text = 46 pages, 1346 pages in
__TEXT, __text = 145 pages, 1693 pages in
__TEXT, __text = 45 pages, 1648 pages in
__TEXT, __text = 4 pages, 645 pages in
__TEXT, __text = 3 pages, 256 pages in
Comment 11•15 years ago
I think that trying to control page-in order is a dead end, at least on (Snow) Leopard.

I reboot to make sure there are no traces of Firefox in memory. I then run Firefox and trace the page-ins. One run may give me (head -10 ...)

__LINKEDIT, __symtab = 2 pages, 5016 pages in
__LINKEDIT, __symtab = 2 pages, 5018 pages in
__DATA, __const = 4 pages, 4850 pages in
__DATA, __const = 4 pages, 4854 pages in
__DATA, __const = 4 pages, 4858 pages in
__DATA, __const = 4 pages, 4862 pages in
__DATA, __const = 4 pages, 4866 pages in

whereas another run gives

__LINKEDIT, __symtab = 2 pages, 5016 pages in
__DATA, __const = 5 pages, 4921 pages in
__DATA, __const = 6 pages, 4956 pages in
__DATA, __data = 7 pages, 4998 pages in
__TEXT, __const = 40 pages, 256 pages in
__TEXT, __cstring = 44 pages, 307 pages in
__TEXT, __text = 73 pages, 2390 pages in

I dug into the source code for vnode_pagein [1] and it appears that the file system starts by trying to fetch 256 x 4096 bytes (256 pages = 1Mb) from disk, but the final number of pages is limited by various factors outside of our control. I wasn't able to figure out why the access pattern is so different, apart from the first two pages of the __LINKEDIT segment always being requested first.

I understand that we are really being hit by IO performance on mobile devices. These are Linux based, though, so shouldn't we try to improve mmap IO performance under Linux instead? I volunteer to look into this.

Also, since Linux performance (mobile!) is so important to us, we may want to spend some time improving the Linux DTrace port. I volunteer for this as well.

[1] xnu-1456.1.26/bsd/vm/vnode_pager.c
Comment 12•15 years ago
It may not be a dead end; I take it back. I think it will require careful study of the Mac OS X dynamic linker, further reading of the Snow Leopard kernel, and writing Mach-O (executable format) tools to advance this further. It will also take a few weeks to reach positive results, in addition to the 2-3 weeks I have spent on this already.

For starters, I will need to prove that the page-in pattern does not change if the start-up path through the code is static. I think it may not be static in our case, and that may be what's causing the varying results. I'm not quite sure about this (more below) but it's worth looking into.

I used DTrace to get a sequential list of function calls that happen during our startup (time order) and told the static linker (ld) to order the symbols accordingly. This did not significantly improve our startup performance, e.g. using Vlad's timing:

Unordered 1762, 1716, 1724, 1726, 1720
Ordered 1746, 1670, 918, 932, 1737

I got these results by rebooting, waiting until my apps came up and then starting Firefox. That's 10 reboots right there, but each one takes about 40 seconds, which beats allocating 8Gb of memory to clear the OS caches (2x physical, 4-5 minutes).

I suspect it will be more efficient to whip up a Python library to parse the Mach-O file format (I know all about it [1]) and then try to shuffle code and data according to the page-in pattern.

[1] http://tinyco.de/2009/01/26/creating-mac-binaries-on-any-platform.html

Last but not least, I'll attach the cause of every XUL page-in for your viewing pleasure: first the user-space backtrace, then the kernel backtrace and, finally, the page-in entry itself. We do get to XUL code towards the end, but note that the first 16 or so page-ins are due to the dynamic linker (dyld). The difference in the page-in pattern during two identical runs is evident in the first 10 page-ins (see comment 11). It will take me a while just to figure out whether this difference is due to the dynamic linker, the virtual memory (VM) pager or the virtual file system (VFS) layer.
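The kind of Python Mach-O tooling mentioned above would start by recognizing the file format. A minimal, hypothetical header-sniffing sketch (not the author's tool); the magic values come from Apple's mach-o/loader.h:

```python
import struct

# Mach-O header magic values, keyed by the value read as a big-endian
# 32-bit integer. A file written on a little-endian Mac shows up as the
# byte-swapped variant when read this way.
MACHO_MAGICS = {
    0xfeedface: "32-bit",
    0xfeedfacf: "64-bit",
    0xcefaedfe: "32-bit (byte-swapped)",
    0xcffaedfe: "64-bit (byte-swapped)",
}

def macho_kind(header_bytes):
    # Inspect the first 4 bytes of a file; returns None for non-Mach-O input.
    (magic,) = struct.unpack(">I", header_bytes[:4])
    return MACHO_MAGICS.get(magic)

print(macho_kind(b"\xcf\xfa\xed\xfe"))  # 64-bit (byte-swapped)
```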
Comment 13•15 years ago
Note that the kernel stack trace is always the same and that the first 16 or so page-ins are due to the dynamic linker (dyld).
Comment 14•15 years ago
Here I'm looking at the differences in the dyld stage, before library initializers run. Following are 3 complete samples of dyld-related page-ins.

I determined that these are the product of pre-initialization dyld work by looking at the user-space stack trace, e.g. this is before initializers:

dyld`ImageLoaderMachOCompressed::rebase(ImageLoader::LinkContext const&)+0x63
dyld`ImageLoader::recursiveRebase(ImageLoader::LinkContext const&)+0x67
dyld`ImageLoader::recursiveRebase(ImageLoader::LinkContext const&)+0x45
dyld`ImageLoader::link(ImageLoader::LinkContext const&, bool, bool, ImageLoader::RPathChain const&)+0x89
dyld`dyld::link(ImageLoader*, bool, ImageLoader::RPathChain const&)+0x76
dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**)+0xb06
dyld`dyldbootstrap::start(macho_header const*, int, char const**, long)+0x31f
dyld`_dyld_start+0x2a
firefox-bin`0x100000000

Once I see a different stack trace, e.g.

libobjc.A.dylib`_mapStrHash+0xb
libobjc.A.dylib`NXMapGet+0x2f
libobjc.A.dylib`addNamedClass+0x26
libobjc.A.dylib`_read_images+0x247
libobjc.A.dylib`map_images_nolock+0x4f2
libobjc.A.dylib`map_images+0x72
dyld`dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*))+0x2fb
dyld`dyld::registerImageStateBatchChangeHandler(dyld_image_states, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*))+0x1ba
libSystem.B.dylib`dyld_register_image_state_change_handler+0x58
libobjc.A.dylib`_objc_init+0x3a
dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&)+0xe4
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0xec
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int)+0x9d
dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&)+0x3a

I know that initialization code in the libraries is being run. I would expect dyld to always be doing the same thing with the same library, but either this is not the case or the kernel is reacting completely differently each time.

Here's iteration #1 (73 pages):

__LINKEDIT, __symtab = 2 pages, 5016 pages in
__LINKEDIT, __symtab = 2 pages, 5018 pages in
__DATA, __const = 4 pages, 4850 pages in
__DATA, __const = 4 pages, 4854 pages in
__DATA, __const = 4 pages, 4858 pages in
__DATA, __const = 4 pages, 4862 pages in
__DATA, __const = 4 pages, 4866 pages in
__DATA, __const = 6 pages, 4956 pages in
__DATA, __const = 6 pages, 4962 pages in
__DATA, __const = 6 pages, 4968 pages in
__DATA, __const = 6 pages, 4974 pages in
__DATA, __const = 6 pages, 4980 pages in
__DATA, __objc_classrefs = 6 pages, 4986 pages in
__DATA, __data = 6 pages, 4992 pages in
__DATA, __data = 7 pages, 4998 pages in

iteration #2 (84 pages):

__LINKEDIT, __symtab = 2 pages, 5018 pages in
__DATA, __const = 6 pages, 4962 pages in
__DATA, __const = 6 pages, 4968 pages in
__DATA, __const = 6 pages, 4974 pages in
__DATA, __const = 6 pages, 4980 pages in
__DATA, __objc_classrefs = 6 pages, 4986 pages in
__DATA, __data = 6 pages, 4992 pages in
__DATA, __data = 7 pages, 4998 pages in
__DATA, __data = 7 pages, 5005 pages in
__LINKEDIT, __symtab = 1 pages, 5024 pages in
__LINKEDIT, __symtab = 2 pages, 5025 pages in
__LINKEDIT, __symtab = 4 pages, 5027 pages in
__LINKEDIT, __symtab = 25 pages, 5031 pages in

and iteration #3 (54 pages):

__LINKEDIT, __symtab = 2 pages, 5016 pages in
__LINKEDIT, __symtab = 2 pages, 5018 pages in
__DATA, __const = 4 pages, 4850 pages in
__DATA, __const = 4 pages, 4854 pages in
__DATA, __const = 4 pages, 4858 pages in
__DATA, __const = 4 pages, 4862 pages in
__DATA, __const = 4 pages, 4866 pages in
__DATA, __const = 4 pages, 4870 pages in
__DATA, __const = 4 pages, 4874 pages in
__DATA, __const = 4 pages, 4878 pages in
__DATA, __const = 4 pages, 4882 pages in
__DATA, __const = 4 pages, 4886 pages in
__DATA, __const = 5 pages, 4916 pages in
__DATA, __const = 5 pages, 4951 pages in
Comment 15•15 years ago
We are faulting in an average of 70 pages, or 280K, before dyld runs library initializers. This is counted over 3 identical runs (see comment 14). We are faulting in an average of 8361 pages, or 32Mb, during library initialization and beyond, i.e. after the above 70 pages. I apologize for calling this a dead end; it looks like an optimization opportunity large enough to drive a truck through.
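The arithmetic behind these figures is just page count times the 4K page size; a quick Python check (70 pages is 280K exactly, 8361 pages is roughly 32Mb):

```python
PAGE = 4096  # Mac OS X page size in bytes

def pages_to_kb(pages):
    # Convert a fault count to kilobytes.
    return pages * PAGE // 1024

print(pages_to_kb(70))            # 280 -- the ~280K faulted before initializers
print(pages_to_kb(8361) // 1024)  # 32  -- the ~32Mb faulted during initialization
```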
Reporter
Comment 16•15 years ago
(In reply to comment #11)
> I understand that we are really being hit by IO performance on mobile devices.
> These are Linux based, though, so shouldn't we try to improve mmap IO
> performance under Linux instead?

We aren't hit by random IO perf on Linux mobile devices; on Linux, flash random IO is fast enough. Random IO is hurting us most on desktop devices with a hard drive and on Windows CE (this isn't backed up by direct evidence, as I don't have any tools that can measure library IO).

> I volunteer to look into this.
> Also, since Linux performance (mobile!) is so important to us, we may want to
> spend some time improving the Linux DTrace port. I volunteer for this as well.

SystemTap works well enough on Linux.
Comment 17•15 years ago
I'm digging further into this, if there are any doubts. I used DTrace to get a sequential list of function calls that happen during our startup (time order) and told the static linker (ld) to order the symbols accordingly. This did not significantly improve our startup performance, e.g. using Vlad's timing:
Unordered: 1762, 1716, 1724, 1726, 1720
Ordered: 1746, 1670, 918, 932, 1737
The problem here is that time-ordering functions is the recommended approach, but it's not improving startup performance.
Comment 18•15 years ago
A few more data points, running from a slow USB HD...
Unordered: 5628, 5606, 5873, 4462, 6062 (after a reboot)
Ordered: 2629, 5997, 6239, 6050, 5994 (right after unordered)
Ordered: 4973, 5360, 4999 (after a reboot)
Ordered: 6348, 4682, 5292 (after ejecting the HD)
As you can see, ordering symbols by use over time (the recommended approach) does not improve startup performance. Either that or my symbol ordering is wrong somehow. Need to keep digging.
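Averaging the runs makes the lack of improvement easy to see. A small sketch, with the timings (assumed to be milliseconds, as in Vlad's timing harness) copied from this comment:

```python
import statistics

# Startup timings (ms) copied from the comment above
unordered    = [5628, 5606, 5873, 4462, 6062]  # after a reboot
ordered_warm = [2629, 5997, 6239, 6050, 5994]  # right after unordered
ordered_cold = [4973, 5360, 4999]              # after a reboot

for label, runs in [("unordered", unordered),
                    ("ordered (warm)", ordered_warm),
                    ("ordered (cold)", ordered_cold)]:
    # Mean startup time per configuration
    print(f"{label}: mean = {statistics.mean(runs):.0f} ms")
```

The means land within a few hundred milliseconds of each other, well inside the run-to-run noise.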
Comment 19•15 years ago
Taras asked for the duration of each page-in, so here it is. What I want to do now is write a Python script that grabs the page-in log and a dynamic library (XUL for starters) and outputs a dynamic library re-arranged for optimal performance. It doesn't look like feeding an ordered list of symbols to the linker does the trick, so I want to go deeper.
segment: __TEXT, size: 4594 pages, offset: 0 pages
segment: __DATA, size: 445 pages, offset: 4594 pages
segment: __LINKEDIT, size: 4433 pages, offset: 5016 pages
__LINKEDIT, __symtab = 2 pages in 0.057769ms, 5016 pages in
__LINKEDIT, __symtab = 2 pages in 0.075856ms, 5018 pages in
__DATA, __const = 4 pages in 0.030841ms, 4850 pages in
__DATA, __const = 4 pages in 0.025206ms, 4854 pages in
__DATA, __const = 4 pages in 0.032988ms, 4858 pages in
__DATA, __const = 4 pages in 0.060483ms, 4862 pages in
__DATA, __const = 4 pages in 0.048479ms, 4866 pages in
__DATA, __const = 4 pages in 0.039833ms, 4870 pages in
__DATA, __const = 4 pages in 0.038306ms, 4874 pages in
__DATA, __const = 4 pages in 0.047224ms, 4878 pages in
__DATA, __const = 4 pages in 0.045886ms, 4882 pages in
__DATA, __const = 4 pages in 0.039886ms, 4886 pages in
__DATA, __const = 4 pages in 0.045787ms, 4890 pages in
__DATA, __const = 4 pages in 0.032182ms, 4894 pages in
__DATA, __const = 4 pages in 0.05377ms, 4898 pages in
__DATA, __data = 7 pages in 0.069645ms, 4998 pages in
__TEXT, __const = 40 pages in 0.10155ms, 256 pages in
__TEXT, __text = 46 pages in 0.090554ms, 3673 pages in
__TEXT, __text = 51 pages in 0.099173ms, 2171 pages in
__TEXT, __text = 45 pages in 0.12043ms, 2845 pages in
__TEXT, __text = 36 pages in 0.14896ms, 1928 pages in
__TEXT, __text = 42 pages in 0.10131ms, 2549 pages in
__TEXT, __text = 47 pages in 0.10006ms, 2629 pages in
__TEXT, __text = 46 pages in 0.12338ms, 3285 pages in
__TEXT, __text = 43 pages in 0.12906ms, 2802 pages in
__TEXT, __text = 42 pages in 0.12084ms, 3508 pages in
__TEXT, __text = 41 pages in 0.13751ms, 2231 pages in
__TEXT, __text = 40 pages in 0.12185ms, 2463 pages in
__DATA, __const = 4 pages in 0.060871ms, 4902 pages in
__LINKEDIT, __symtab = 4 pages in 0.057369ms, 5020 pages in
__DATA, __const = 5 pages in 0.043033ms, 4906 pages in
__DATA, __const = 5 pages in 0.064485ms, 4911 pages in
__DATA, __const = 5 pages in 0.063558ms, 4916 pages in
__DATA, __const = 5 pages in 0.043632ms, 4921 pages in
__DATA, __const = 5 pages in 0.026748ms, 4926 pages in
__DATA, __const = 5 pages in 0.048189ms, 4931 pages in
__DATA, __const = 5 pages in 0.051392ms, 4936 pages in
__DATA, __const = 5 pages in 0.04664ms, 4941 pages in
__DATA, __const = 5 pages in 0.052771ms, 4946 pages in
__DATA, __const = 5 pages in 0.038657ms, 4951 pages in
__DATA, __const = 6 pages in 0.055948ms, 4956 pages in
__DATA, __const = 6 pages in 0.036347ms, 4962 pages in
__DATA, __const = 6 pages in 0.054377ms, 4968 pages in
__DATA, __const = 6 pages in 0.035665ms, 4974 pages in
__DATA, __const = 6 pages in 0.031481ms, 4980 pages in
__DATA, __objc_classrefs = 6 pages in 0.042742ms, 4986 pages in
__DATA, __data = 6 pages in 0.023917ms, 4992 pages in
__DATA, __data = 7 pages in 0.05238ms, 5005 pages in
__LINKEDIT, __symtab = 1 pages in 0.079198ms, 5024 pages in
__LINKEDIT, __symtab = 2 pages in 0.03906ms, 5025 pages in
__LINKEDIT, __symtab = 4 pages in 0.050371ms, 5027 pages in
__LINKEDIT, __symtab = 25 pages in 0.074566ms, 5031 pages in
__TEXT, __cstring = 37 pages in 0.093076ms, 351 pages in
__DATA, __data = 4 pages in 0.055401ms, 5012 pages in
__TEXT, __const = 44 pages in 0.15777ms, 2 pages in
__TEXT, __cstring = 44 pages in 0.082086ms, 307 pages in
__TEXT, __text = 34 pages in 0.041126ms, 388 pages in
__TEXT, __const = 50 pages in 0.048358ms, 176 pages in
__TEXT, __text = 48 pages in 0.14794ms, 3372 pages in
__TEXT, __text = 42 pages in 0.10351ms, 3049 pages in
__TEXT, __text = 44 pages in 0.1259ms, 1532 pages in
__TEXT, __text = 43 pages in 0.1081ms, 1964 pages in
__TEXT, __text = 47 pages in 0.09868ms, 3120 pages in
__TEXT, __text = 43 pages in 0.1234ms, 1834 pages in
__TEXT, __text = 40 pages in 0.1181ms, 3167 pages in
__TEXT, __text = 41 pages in 0.098898ms, 1650 pages in
__TEXT, __text = 39 pages in 0.10108ms, 3223 pages in
__TEXT, __cstring = 11 pages in 0.050122ms, 296 pages in
__TEXT, __const = 30 pages in 0.085229ms, 226 pages in
__TEXT, __text = 70 pages in 0.092412ms, 2007 pages in
__TEXT, __text = 81 pages in 0.21815ms, 672 pages in
__TEXT, __text = 43 pages in 0.1581ms, 2272 pages in
__TEXT, __text = 29 pages in 0.078536ms, 3091 pages in
__TEXT, __text = 76 pages in 0.11776ms, 2700 pages in
__TEXT, __text = 23 pages in 0.061873ms, 3262 pages in
__TEXT, __text = 42 pages in 0.11333ms, 422 pages in
__TEXT, __text = 73 pages in 0.17751ms, 2390 pages in
__TEXT, __text = 83 pages in 0.13849ms, 1698 pages in
__TEXT, __text = 63 pages in 0.14195ms, 2986 pages in
__TEXT, __text = 16 pages in 0.070791ms, 3207 pages in
__TEXT, __text = 79 pages in 0.18594ms, 2079 pages in
__TEXT, __text = 39 pages in 0.088643ms, 3420 pages in
__TEXT, __text = 46 pages in 0.10984ms, 2503 pages in
__TEXT, __text = 60 pages in 0.081019ms, 2926 pages in
__TEXT, __text = 41 pages in 0.11579ms, 3331 pages in
__TEXT, __text = 51 pages in 0.089781ms, 1877 pages in
__TEXT, __text = 49 pages in 0.14999ms, 464 pages in
__TEXT, __text = 12 pages in 0.076134ms, 519 pages in
__TEXT, __text = 147 pages in 0.22823ms, 901 pages in
__TEXT, __text = 49 pages in 0.078693ms, 3459 pages in
__TEXT, __text = 5 pages in 0.053016ms, 1781 pages in
__TEXT, __text = 170 pages in 0.28439ms, 1253 pages in
__TEXT, __text = 63 pages in 0.13733ms, 1587 pages in
__TEXT, __text = 26 pages in 0.088221ms, 2776 pages in
__TEXT, __text = 75 pages in 0.17244ms, 2315 pages in
__TEXT, __text = 48 pages in 0.090419ms, 1786 pages in
__TEXT, __text = 38 pages in 0.070018ms, 2591 pages in
__TEXT, __text = 107 pages in 0.17802ms, 553 pages in
__TEXT, __const = 111 pages in 0.13617ms, 60 pages in
__TEXT, __text = 6 pages in 0.061568ms, 513 pages in
__TEXT, __text = 22 pages in 0.10429ms, 531 pages in
__TEXT, __text = 87 pages in 0.17596ms, 753 pages in
__TEXT, __text = 61 pages in 0.11519ms, 840 pages in
__TEXT, __text = 12 pages in 0.073598ms, 660 pages in
__TEXT, __text = 9 pages in 0.06006ms, 2222 pages in
__TEXT, __text = 132 pages in 0.23864ms, 1048 pages in
__TEXT, __text = 13 pages in 0.073986ms, 2158 pages in
__TEXT, __text = 24 pages in 0.090026ms, 2676 pages in
__TEXT, __text = 36 pages in 0.078139ms, 2890 pages in
__TEXT, __text = 109 pages in 0.12254ms, 1423 pages in
__TEXT, __text = 73 pages in 0.073424ms, 1180 pages in
__TEXT, __unwind_info = 123 pages in 0.13173ms, 3719 pages in
__TEXT, __text = 123 pages in 0.16964ms, 3550 pages in
__TEXT, __text = 2 pages in 0.058368ms, 2077 pages in
__TEXT, __text = 7 pages in 0.051494ms, 1691 pages in
__TEXT, __text = 11 pages in 0.063852ms, 1576 pages in
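The Python script mentioned above doesn't exist yet. As a starting point, here is a hedged sketch of a parser for the log format shown in this comment; the regex and the `summarize` helper are hypothetical names for illustration, not part of any existing tool:

```python
import re
from collections import defaultdict

# Matches one page-in entry of the form:
#   "__TEXT, __text = 46 pages in 0.090554ms, 3673 pages in"
LINE_RE = re.compile(
    r"(?P<seg>__\w+), (?P<sect>__\w+) = (?P<pages>\d+) pages"
    r" in (?P<ms>[\d.]+)ms, (?P<offset>\d+) pages in")

def summarize(log_text):
    """Total pages and milliseconds per (segment, section)."""
    totals = defaultdict(lambda: [0, 0.0])
    for m in LINE_RE.finditer(log_text):
        key = (m.group("seg"), m.group("sect"))
        totals[key][0] += int(m.group("pages"))
        totals[key][1] += float(m.group("ms"))
    return dict(totals)

# Tiny excerpt of the log above, used as sample input
sample = ("__LINKEDIT, __symtab = 2 pages in 0.057769ms, 5016 pages in "
          "__LINKEDIT, __symtab = 2 pages in 0.075856ms, 5018 pages in "
          "__DATA, __const = 4 pages in 0.030841ms, 4850 pages in")
print(summarize(sample))
```

A real rearranging tool would additionally need to map offsets back to symbols in the Mach-O file; this sketch only aggregates the log.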
Comment 20•15 years ago
I just realized that "NNN pages in" may not be clear enough. I'll rename it to "offset = NNN pages" going forward.
Comment 21•15 years ago
The page-in log in comment 19 is for an old and slow USB HD that I'll be using for testing until my Mozilla laptops arrive, same HD as in comment 18.
Reporter
Comment 22•15 years ago
(In reply to comment #21)
> The page-in log in comment 19 is for an old and slow USB HD that I'll be using
> for testing until my Mozilla laptops arrive, same HD as in comment 18.

There is something wrong with that log. It's impossible to have that much hard-drive IO with every individual file request taking <1ms. There have to be some seeks in there. Either we are missing some disk IO, or these ms are actually seconds.
Comment 23•15 years ago
Taras, the timings are originally reported in nanoseconds, which I divide by 1,000,000 to get milliseconds. There are no individual file requests here; these are all requests within the body of XUL.
Reporter
Comment 24•15 years ago
(In reply to comment #23)
> Taras, the timings are originally reported in nanoseconds which I divide by
> 1,000,000 to get millis. There are no individual file requests here, these are
> all requests within the body of XUL.

s/file requests/disk io/. Even fast hard drives spec their seek times at over 8ms, so I'm assuming something is missing here.
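This objection can be put into numbers. A sketch, assuming ~8ms per seek and roughly 120 page-in events in the log from comment 19 (both figures are approximations, not from the log itself):

```python
# If each page-in event required even one real disk seek, the total would
# dwarf every individual timing in the log.
SEEK_MS = 8.0          # assumed seek time for a fast desktop HDD
page_in_events = 120   # rough count of entries in the comment 19 log

min_seek_time_ms = page_in_events * SEEK_MS  # 960 ms if every event seeks
observed_max_ms = 0.28439                    # slowest single event in the log

print(f"lower bound with seeks: {min_seek_time_ms:.0f} ms")
print(f"slowest observed event: {observed_max_ms} ms")
# The gap suggests the reads were served from the page cache, or that the
# probe misses the actual disk time, exactly as this comment suspects.
```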
Comment 25•15 years ago
Here are Shark profiles for a change.
Domain: https://jreymont-howe.strongspace.com
Login: public
Password: 4ZD5CbOBbm
Click on the browse tab and download Firefox-Shark-Time-Profile.zip. Open the 'FF Unordered USB - Time Profile' in the archive. Set View in the bottom right corner to 'Heavy (Bottom-up)', then select the 'main' thread in the drop-down box to the left. mach_msg_trap should show at the top of the list with 67.4%. We don't care about that, so pull up the menu (right mouse click) and select 'Remove callstacks with symbol mach_msg_trap'. You should be able to see what I pasted below, after expanding some items.
That's 598 samples, 10ms each, for a total of ~6s. 9.7% is ours and 8% is on dyld reads, etc. I understand that libjar uses mmap, but pt_Read (nsprpub/pr/src/pthreads/ptio.c) does not use mmap-ed IO and issues plain reads. It looks like we have a 10% optimization opportunity here.
6.4% 6.4% XUL nsZipArchive::GetData(nsZipItem*)
0.0% 6.4% XUL nsJARInputStream::InitFile(nsJAR*, nsZipItem*)
0.0% 6.4% XUL nsJAR::GetInputStreamWithSpec(nsACString_internal const&, char const*, nsIInputStream**)
0.0% 6.4% XUL nsJARInputThunk::EnsureJarStream()
0.0% 6.4% XUL nsJARChannel::Open(nsIInputStream**)
0.0% 4.5% XUL nsXBLService::FetchBindingDocument(nsIContent*, nsIDocument*, nsIURI*, nsIURI*, int, nsIDocument**)
0.0% 4.5% XUL nsXBLService::LoadBindingDocumentInfo(nsIContent*, nsIDocument*, nsIURI*, nsIPrincipal*, int, nsIXBLDocumentInfo**)
0.0% 4.5% XUL nsXBLService::GetBinding(nsIContent*, nsIURI*, int, nsIPrincipal*, int*, nsXBLBinding**, nsTArray<nsIURI*>&)
0.0% 3.2% XUL nsXBLService::GetBinding(nsIContent*, nsIURI*, int, nsIPrincipal*, int*, nsXBLBinding**)
0.0% 1.3% XUL nsXBLService::GetBinding(nsIContent*, nsIURI*, int, nsIPrincipal*, int*, nsXBLBinding**, nsTArray<nsIURI*>&)
0.0% 0.8% XUL CSSLoaderImpl::LoadSheet(SheetLoadData*, StyleSheetState)
0.0% 0.5% XUL nsStringBundle::LoadProperties()
0.0% 0.3% XUL mozJSSubScriptLoader::LoadSubScript(unsigned short const*)
0.0% 0.2% XUL nsExpatDriver::OpenInputStreamFromExternalDTD(unsigned short const*, unsigned short const*, unsigned short const*, nsIInputStream**, nsAString_internal&)
5.0% 5.0% dyld pread
0.0% 5.0% dyld dyld::loadPhase6(int, stat&, char const*, dyld::LoadContext const&)
0.0% 5.0% dyld dyld::loadPhase5(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
0.0% 5.0% dyld dyld::loadPhase4(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
0.0% 5.0% dyld dyld::loadPhase3(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
0.0% 5.0% dyld dyld::loadPhase1(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
0.0% 5.0% dyld dyld::loadPhase0(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
0.0% 5.0% dyld dyld::load(char const*, dyld::LoadContext const&)
0.0% 3.7% dyld dyld::libraryLocator(char const*, bool, char const*, ImageLoader::RPathChain const*)
0.0% 3.7% dyld ImageLoader::recursiveLoadLibraries(ImageLoader::LinkContext const&, bool, ImageLoader::RPathChain const&)
0.0% 2.3% dyld ImageLoader::link(ImageLoader::LinkContext const&, bool, bool, ImageLoader::RPathChain const&)
0.0% 1.3% dyld ImageLoader::recursiveLoadLibraries(ImageLoader::LinkContext const&, bool, ImageLoader::RPathChain const&)
0.0% 1.3% dyld dlopen
3.3% 3.3% libSystem.B.dylib read
0.0% 2.8% libnspr4.dylib pt_Read
0.0% 1.3% XUL nsFileInputStream::Read(char*, unsigned int, unsigned int*)
0.0% 1.0% XUL nsChromeRegistry::ProcessManifest(nsILocalFile*, int)
0.0% 0.5% XUL xptiInterfaceInfoManager::ReadXPTFile(nsILocalFile*, xptiWorkingSet*)
0.0% 0.3% CoreFoundation _CFReadBytesFromFile
0.0% 0.2% XUL IsLoadablePlugin(__CFURL const*)
3.0% 3.0% dyld ImageLoaderMachOCompressed::rebase(ImageLoader::LinkContext const&)
0.0% 3.0% dyld ImageLoader::recursiveRebase(ImageLoader::LinkContext const&)
0.0% 1.8% dyld ImageLoader::recursiveRebase(ImageLoader::LinkContext const&)
0.0% 1.2% dyld ImageLoader::link(ImageLoader::LinkContext const&, bool, bool, ImageLoader::RPathChain const&)
2.2% 2.2% libSystem.B.dylib __memcpy
0.0% 0.8% XUL nsJARInputStream::Read(char*, unsigned int, unsigned int*)
0.0% 0.3% ATS AnnexFileReadTableBytesByIndex
0.0% 0.2% XUL nsCSubstringTuple::WriteTo(char*, unsigned int) const
0.0% 0.2% libmozjs.dylib js_InternalGetOrSet
0.0% 0.2% libSystem.B.dylib fread
0.0% 0.2% CoreGraphics CGDataProviderCreateWithCopyOfData
0.0% 0.2% libmozsqlite3.dylib balance
0.0% 0.2% AppKit -[NSView frame]
Comment 26•15 years ago
Here's another observation... The stack below represents 30 samples and thus 300ms. The dynamic linker (dyld) ONLY uses pread to grab the Mach-O headers located in the first page of a library or executable. We should be able to cut this down drastically just by reducing the number of shared libraries and making Firefox more static (Bug 525013). It seems like the lowest-hanging fruit here; should I take care of it first?
5.0% 5.0% dyld pread
0.0% 5.0% dyld dyld::loadPhase6(int, stat&, char const*, dyld::LoadContext const&)
0.0% 5.0% dyld dyld::loadPhase5(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
0.0% 5.0% dyld dyld::loadPhase4(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
0.0% 5.0% dyld dyld::loadPhase3(char const*, dyld::LoadContext const&, std::vector<char const*, std::allocator<char const*> >*)
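The 300ms figure follows from Shark's 10ms sampling interval (see comment 25). A sketch of that arithmetic, plus a back-of-the-envelope estimate of the possible saving; the library counts below are made up purely for illustration:

```python
SAMPLE_INTERVAL_MS = 10        # Shark's sampling period, per comment 25
samples_in_pread = 30          # samples attributed to dyld pread above
pread_time_ms = samples_in_pread * SAMPLE_INTERVAL_MS
print(pread_time_ms, "ms spent in dyld pread")  # 300 ms

# Hypothetical: if pread cost scales roughly with the number of shared
# libraries (one header read each), merging most of them into XUL would
# save a proportional share. These counts are invented, not measured.
total_libs, merged_libs = 40, 30
estimated_saving_ms = pread_time_ms * merged_libs / total_libs
print(round(estimated_saving_ms), "ms estimated saving")  # 225 ms
```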
Reporter
Updated•14 years ago
Comment 27•2 years ago
The bug assignee didn't log in to Bugzilla in the last 7 months.
:overholt, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 28•2 years ago
I think it is safe to close an 11-year-old performance investigation.
Comment 29•2 years ago
(In reply to Taras Glek from comment #0)
> Currently mmap io is good at evading most measuring tools. [...] We need a good way to measure mmap io.
Taras wrote this 13 years ago.
And just three weeks ago, he blogged about a way to do exactly this, easily: https://taras.glek.net/post/ebpf-mmap-page-fault-tracing/
> Noticed that there wasn't any bpf that I could google for tracing mmap-IO. This is now fun and somewhat easy
I'm happy to close this bug. If people want to do more mmap IO performance investigation, eBPF on Linux is the right way.
Comment 30•2 years ago
Sorry, I meant to close the bug at the same time I was moving it.