Bug 554421 - Replace or augment unix wrapper script with a madvise()ing binary for a massive win
Status: RESOLVED DUPLICATE of bug 632404
Whiteboard: [ts]
Product: Toolkit
Classification: Components
Component: Startup and Profile System
Version: unspecified
Platform: x86 Linux
Importance: -- normal with 7 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Mentors:
Depends on: 552864
Blocks: 627591
Reported: 2010-03-23 12:56 PDT by (dormant account)
Modified: 2011-05-16 03:03 PDT
CC List: 32 users

Attachments
proof of concept (1.12 KB, text/plain)
2010-03-23 12:56 PDT, (dormant account)
approach a (1.97 KB, patch)
2010-03-24 16:06 PDT, (dormant account)
libxul preloader (1.38 KB, patch)
2010-05-24 18:39 PDT, (dormant account)
libxul preloader (1.46 KB, patch)
2010-05-24 18:44 PDT, (dormant account)
firefox-bin that dynamically opens libxul.so (8.67 KB, patch)
2010-07-02 16:21 PDT, (dormant account)

Description (dormant account) 2010-03-23 12:56:32 PDT
Created attachment 434313 [details]
proof of concept

By default, the dynamic linker combined with demand paging is an excellent way to kill our startup perf.
The dynamic linker causes random (and frequently backwards) I/O through the binary, which is extremely likely to cause page faults and disk I/O. This overhead can be measured in seconds on my system.
By default Linux reads 200K worth of pages per fault, which adds up to a lot of seeks/reads in a >10MB binary.

Turns out this sort of stupidity would be avoided if one could call madvise(MADV_WILLNEED) from the dynamic linker right after it mmap()s the relevant areas. On my system MADV_WILLNEED cranks up the readahead by 100x to 2MB, which massively reduces faulting. Note that madvise() has zero overhead if there are no page faults, so it adds basically nothing to warm startup.

Since I can't figure out how to make the dynamic linker call madvise(), I have to hack around it.
hack a) Call madvise() from a function that runs before main() in firefox-bin. This is easy to implement and could be deployed tomorrow. In my testcase it reduced libxul.so reads from 127 to 62. The downside is that by the time my function is called, the dynamic linker has already done the worst of the seeky backwards I/O.
hack b) I wasn't sure if this was going to work, but it did. Write a basic C program that open()s libxul, maps the two rw/rx segments and madvise()s them. It then fork()s (in order to keep the memory maps) and starts firefox, which presumably shares the mappings. My script can't measure faults in this situation, but I'm estimating it's 10x better than a). It shaves 1 second off my startup time, bringing it down to 1.6 seconds.
proper-solution c) Modify the dynamic linker so one could specify madvise()s via the ld command line or environment variables. From what I know about libc/ld release cycles and the difficulty of getting stuff landed, this sounds hard.
Comment 1 Mike Hommey [:glandium] 2010-03-23 13:52:01 PDT
Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s libxul.so, at which point you could madvise() ?
Comment 2 (dormant account) 2010-03-23 14:16:36 PDT
(In reply to comment #1)
> Wouldn't it then be much simpler to use the xulrunner stub, which dlopen()s
> libxul.so, at which point you could madvise() ?

Yes. This would work better for the current libxul xulrunner builds. It won't help with static builds in bug 525013.
Comment 3 Mike Hommey [:glandium] 2010-03-23 14:23:54 PDT
(In reply to comment #2)
> Yes. This would work better for the current libxul xulrunner builds. It won't
> help with static builds in bug 525013.

How about mixing both worlds ? Why not make the static builds really a big fat libxul and use a loader/stub that would do the loading and madvise() ? (and also get rid of the shell wrapper scripts)
Comment 4 (dormant account) 2010-03-23 14:25:01 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > Yes. This would work better for the current libxul xulrunner builds. It won't
> > help with static builds in bug 525013.
> 
> How about mixing both worlds ? Why not make the static builds really a big fat
> libxul and use a loader/stub that would do the loading and madvise() ? (and
> also get rid of the shell wrapper scripts)

Yeah I could live with that on Linux. Sounds like a reasonable compromise. Got a patch? :)
Comment 5 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2010-03-24 06:56:17 PDT
d) use a different dynamic linker than ld.so. e.g. a stub dynamic linker that does the madvise and then just forwards everything else to ld.so. But that doesn't sound simple

b) sounds fine with me

How would "static builds a big fat libxul" be any different from our current situation? Just linking in JS, which we could/should probably do anyway?
Comment 6 (dormant account) 2010-03-24 08:44:47 PDT
(In reply to comment #5)
> d) use a different dynamic linker than ld.so. e.g. a stub dynamic linker that
> does the madvise and then just forwards everything else to ld.so. But that
> doesn't sound simple

Yeah shipping a drop-in linker with firefox sounds a little scary.

> 
> b) sounds fine with me
> 
> How would "static builds a big fat libxul" be any different from our current
> situation? Just linking in JS, which we could/should probably do anyway?

Yes. Link in everything we can into libxul.
Comment 7 (dormant account) 2010-03-24 16:06:44 PDT
Created attachment 434721 [details] [diff] [review]
approach a
Comment 8 (dormant account) 2010-04-06 15:57:55 PDT
I was misinterpreting my logs. Turns out madvise() immediately triggers readahead. This makes linking everything into libxul.so, madvise()ing, then dlopen()ing it a robust way to read our binaries in fast.
Comment 9 (dormant account) 2010-05-24 18:39:42 PDT
Created attachment 447252 [details] [diff] [review]
libxul preloader

Here is another approach (a dirty hack) for preloading the binary. It relies on the GNU linker's function layout and GCC's initialization order.

This places a marker function at the front of the binary and a startup function at the end. GCC runs initializers in reverse order, so the end function runs before any other initializer. I also use function/variable positions to compute the approximate size of the .text and .data sections.

This loads Firefox into memory (with patches from bug 561842) with ~50 page faults, versus nearly 250 for a stock nightly.
Comment 10 (dormant account) 2010-05-24 18:44:36 PDT
Created attachment 447253 [details] [diff] [review]
libxul preloader

Correct patch
Comment 11 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-06-04 10:38:20 PDT
Comment on attachment 447253 [details] [diff] [review]
libxul preloader

This is the best patch I've ever seen. Please never ever commit it to the tree.
Comment 12 Jim Blandy :jimb 2010-06-04 10:42:20 PDT
(In reply to comment #11)
> (From update of attachment 447253 [details] [diff] [review])
> This is the best patch I've ever seen. Please never ever commit it to the tree.

Whatever, but this is massively less invasive and complicated than the other stuff Taras had been looking into.
Comment 13 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-06-04 11:39:19 PDT
Actually, I take it back: it could go in fine if it was a good win, I guess.  I don't really care about the performance of XULRunner-stubbed builds, since people who distribute that way are basically deciding to make startup slow anyway.  (And they tend to correlate with people who want to link against a bazillion system libraries, rather than use the in-tree ones that we can get into a big libxul anyway.)
Comment 14 (dormant account) 2010-06-04 12:03:45 PDT
(In reply to comment #13)
> Actually, I take it back: it could go in fine if it was a good win, I guess.  I
> don't really care about the performance of XULRunner-stubbed builds, since
> people who distribute that way are basically deciding to make startup slow
> anyway.  (And they tend to correlate with people who want to link against a
> bazillion system libraries, rather than use the in-tree ones that we can get
> into a big libxul anyway.)

I'm not sure why this would negatively affect xulrunner stubs. I would also point out that using system libraries instead of [even statically] linking our own is a performance win as they have a higher chance of being cached already.
Comment 15 Karl Tomlinson (:karlt) 2010-06-04 14:38:38 PDT
Have you considered using the build system to get the appropriate libxul offsets and compile them into the executable in an approach like comment 1 / attachment 434313 [details] with mmap, madvise, dlopen?
Comment 16 Mike Hommey [:glandium] 2010-06-05 00:06:06 PDT
You could use linker scripts to set symbols with the proper values to get the offsets. Anyways, I don't think we need this yet.
Comment 17 (dormant account) 2010-07-02 16:21:28 PDT
Created attachment 455791 [details] [diff] [review]
firefox-bin that dynamically opens libxul.so

This turned out to be super-easy. This version of the hack uses dlopen() (as suggested above) to work around ld.so stupidity.
Comment 18 Mike Hommey [:glandium] 2011-03-30 01:28:37 PDT

*** This bug has been marked as a duplicate of bug 632404 ***
