Closed
Bug 554421
Opened 15 years ago
Closed 14 years ago
Replace or augment unix wrapper script with a madvise()ing binary for a massive win
Categories
(Toolkit :: Startup and Profile System, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 632404
People
(Reporter: taras.mozilla, Unassigned)
References
Details
(Whiteboard: [ts])
Attachments
(2 files, 3 obsolete files)
1.46 KB,
patch
|
Details | Diff | Splinter Review | |
8.67 KB,
patch
|
Details | Diff | Splinter Review |
By default, the dynamic linker + demand paging is an excellent way to kill our startup perf.
By default the dynamic linker causes random (and frequently backwards) IO through the binary, which is extremely likely to cause pagefaults and disk io. This overhead can be measured in seconds on my system.
By default Linux reads 200K worth of pages per fault which is a lot of seeks/reads in a >10MB binary.
Turns out this sort of stupidity would be avoided if one could call madvise(MADV_WILLNEED) from the dynamic linker right after it mmaps() relevant areas. On my system MADV_WILLNEED cranks up the readahead by 100x to 2MB which massively reduces faulting. Note that madvise has zero overhead if there are no page faults, so it basically has no startup overhead during warm startup.
Since I can't figure out how to make the dynamic linker do madvise(), have to hack around it.
hack a) Call madvise from a function that runs before main() in firefox-bin. This is easy to implement, could be deployed tomorrow. In my testcase it reduced libxul.so reads from 127 to 62. The downside is that by the time my function is called the dynamic linker has already gone and done the worst of the seeky backwards io.
hack b) I wasn't sure if this was going to work, but it did. Write a basic C program that open()s libxul, maps the two rw/rx segments and madvise()s them. Then it forks(in order to keep the memory maps) and starts firefox which then presumably shares the mappings. My script can't measure faults in this situation, but i'm estimating it's 10x better than a). It shaves off 1second of my startup time down to 1.6seconds.
proper-solution c) Modify the dynamic linker so one could specify madvise()s via ld commandline or env variables. What what I know about libc/ld release cycles and difficulty of getting stuff landed, this sounds hard.
Reporter | ||
Updated•15 years ago
|
Whiteboard: [ts]
Updated•15 years ago
|
Attachment #434313 -
Attachment mime type: text/x-csrc → text/plain
Comment 1•15 years ago
|
||
Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s libxul.so, at which point you could madvise() ?
Reporter | ||
Comment 2•15 years ago
|
||
(In reply to comment #1)
> Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s
> libxul.so, at which point you could madvise() ?
Yes. This would work better for the current libxul xulrunner builds. It wont help with static builds in bug 525013.
Comment 3•15 years ago
|
||
(In reply to comment #2)
> Yes. This would work better for the current libxul xulrunner builds. It wont
> help with static builds in bug 525013.
How about mixing both worlds ? Why not make the static builds really a big fat libxul and use a loader/stub that would do the loading and madvise() ? (and also get rid of the shell wrapper scripts)
Reporter | ||
Comment 4•15 years ago
|
||
(In reply to comment #3)
> (In reply to comment #2)
> > Yes. This would work better for the current libxul xulrunner builds. It wont
> > help with static builds in bug 525013.
>
> How about mixing both worlds ? Why not make the static builds really a big fat
> libxul and use a loader/stub that would do the loading and madvise() ? (and
> also get rid of the shell wrapper scripts)
Yeah I could live with that on Linux. Sounds like a reasonable compromise. Got a patch? :)
Comment 5•15 years ago
|
||
d) use a different dynamic linker than ld.so. e.g. a stub dynamic linker that does the madvise and then just forwards everything else to ld.so. But that doesn't sound simple
b) sounds fine with me
How would "static builds a big fat libxul" be any different from our current situation? Just linking in JS, which we could/should probably do anyway?
Reporter | ||
Comment 6•15 years ago
|
||
(In reply to comment #5)
> d) use a different dynamic linker than ld.so. e.g. a stub dynamic linker that
> does the madvise and then just forwards everything else to ld.so. But that
> doesn't sound simple
Yeah shipping a drop-in linker with firefox sounds a little scary.
>
> b) sounds fine with me
>
> How would "static builds a big fat libxul" be any different from our current
> situation? Just linking in JS, which we could/should probably do anyway?
Yes. Link in everything we can into libxul.
Reporter | ||
Comment 7•15 years ago
|
||
Reporter | ||
Comment 8•15 years ago
|
||
I was misinterpreting my logs. Turns out madvise() immediately triggers readahead. This makes linking everything into libxul.so, madvise()ing, then dlopen()ing it a robust way to read our binaries in fast.
Reporter | ||
Comment 9•15 years ago
|
||
Here is another approach(dirty hack) for preloading the binary. This relies on gnu linker's function layout and gcc's initialization order.
This places a marker function on the front of the binary and a startup function on the end. Gcc runs initializers in reverse order so the end function runs before any other initializer. I also use function/variable positions to compute the approximate size of the .text and .data sections.
This loads the firefox into memory(with patches from bug 561842) with ~50 pagefaults. vs near 250 of a stock nightly.
Attachment #434313 -
Attachment is obsolete: true
Attachment #434721 -
Attachment is obsolete: true
Comment 11•14 years ago
|
||
Comment on attachment 447253 [details] [diff] [review]
libxul preloader
This is the best patch I've ever seen. Please never ever commit it to the tree.
Comment 12•14 years ago
|
||
(In reply to comment #11)
> (From update of attachment 447253 [details] [diff] [review])
> This is the best patch I've ever seen. Please never ever commit it to the tree.
Whatever, but this is massively less invasive and complicated than the other stuff Taras had been looking into.
Comment 13•14 years ago
|
||
Actually, I take it back: it could go in fine if it was a good win, I guess. I don't really care about the performance of XULRunner-stubbed builds, since people who distribute that way are basically deciding to make startup slow anyway. (And they tend to correlate with people who want to link against a bazillion system libraries, rather than use the in-tree ones that we can get into a big libxul anyway.)
Reporter | ||
Comment 14•14 years ago
|
||
(In reply to comment #13)
> Actually, I take it back: it could go in fine if it was a good win, I guess. I
> don't really care about the performance of XULRunner-stubbed builds, since
> people who distribute that way are basically deciding to make startup slow
> anyway. (And they tend to correlate with people who want to link against a
> bazillion system libraries, rather than use the in-tree ones that we can get
> into a big libxul anyway.)
I'm not sure why this would negatively affect xulrunner stubs. I would also point out that using system libraries instead of [even statically] linking our own is a performance win as they have a higher chance of being cached already.
Comment 15•14 years ago
|
||
Have you considered using the build system to get the appropriate libxul offsets and compile them into the executable in an approach like comment 1 / attachment 434313 [details] with mmap, madvise, dlopen?
Comment 16•14 years ago
|
||
You could use linker scripts to set symbols with the proper values to get the offsets. Anyways, I don't think we need this yet.
Reporter | ||
Comment 17•14 years ago
|
||
This turned out to be super-easy. This version of the hack use dlopen(as suggested above) to workaround avoid ld.so stupidity.
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•