Last Comment Bug 549749 - Icegrind: Valgrind plugin to produce an exact chronological log of functions/data accesses during startup
: Icegrind: Valgrind plugin to produce an exact chronological log of functions/...
Status: NEW
:
Product: Core
Classification: Components
Component: General (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal with 1 vote (vote)
: ---
Assigned To: (dormant account)
:
:
Mentors:
http://developer.apple.com/mac/librar...
: 523655 (view as bug list)
Depends on:
Blocks: 531406 622908
  Show dependency treegraph
 
Reported: 2010-03-02 15:03 PST by (dormant account)
Modified: 2012-07-19 15:56 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
lackey hack (2.65 KB, patch)
2010-03-03 13:27 PST, (dormant account)
no flags Details | Diff | Splinter Review
working prototype (11.19 KB, patch)
2010-03-04 17:00 PST, (dormant account)
no flags Details | Diff | Splinter Review
lk_main.c (25.93 KB, text/x-csrc)
2010-04-05 16:55 PDT, (dormant account)
no flags Details
icegrind (21.42 KB, patch)
2010-04-07 14:21 PDT, (dormant account)
no flags Details | Diff | Splinter Review
icegrind (21.34 KB, patch)
2010-06-02 10:26 PDT, (dormant account)
no flags Details | Diff | Splinter Review
ice grind checkpoint (22.83 KB, patch)
2010-06-07 16:00 PDT, (dormant account)
no flags Details | Diff | Splinter Review
New Makefile.am for latest valgrind trunk (2.08 KB, application/octet-stream)
2010-09-17 05:59 PDT, Mike Hommey [:glandium]
no flags Details

Description (dormant account) 2010-03-02 15:03:18 PST
Notes from irc:

need to write an instrumentation function which
a) looks at the IRStmt_IMarks (case Ist_IMark).  when it sees new addresses for the first time, then print something out via one of the various routines in the public header of m_debuginfo (pub_tool_debuginfo.h) will convert it into a name

b) hands every data load/store (Ist_Store, Iex_Load) to an runtime helper function, a la ptrcheck, memcheck, helgrind, lackey that maintains a big bitmap.

The output of the plugin(with postprocessing) will be a set of linker scripts that will produce cold-startup-friendly binaries.
Comment 1 Nicholas Nethercote [:njn] 2010-03-03 12:55:33 PST
The data accesses are given by Lackey's --trace-mem option.  That code is guarded by the 'clo_trace_mem' variable so is easy to extract from Lackey.

As for the function calls, that's a bit trickier.  Well, doing it imperfectly is pretty easy.  Doing it really accurately is harder, at least it is if you want to identify calls *and* returns, because of things like setjmp/longjmp and tail recursion.  Callgrind has some code that tries to do it accurately.
Comment 2 Nicholas Nethercote [:njn] 2010-03-03 13:15:15 PST
Taras, if you give me a reasonably precise spec of what the output should look like, I can hack it up for you early next week.
Comment 3 (dormant account) 2010-03-03 13:27:56 PST
Created attachment 430136 [details] [diff] [review]
lackey hack

Nicholas, 
I appreciate your offer to help. I followed Julian's instructions and ended up with this patch. It captures data accesses as desired, but can't seem to catch any functions. Not sure if I'm doing something wrong there(going to look at callgrind next).

Spec:
I would like to obtain a chronological log which consists of functions(.text) and the .data that they access the first time they are accessed(ie modelling page-ins during application startup). This should then be easy to turn into a linker script to reorder the binary such that it is optimized for sequential reading during cold startup.
Comment 4 (dormant account) 2010-03-03 14:40:49 PST
Note, my problem turned out to be a weird testcase. Everything is working great now, moving on to make this useful.
Comment 5 Nicholas Nethercote [:njn] 2010-03-03 21:18:01 PST
Comment on attachment 430136 [details] [diff] [review]
lackey hack

> static VG_REGPARM(2) void trace_instr(Addr addr, SizeT size)
> {
>-   VG_(printf)("I  %08lx,%lu\n", addr, size);
>+  char fnname[200];
>+  //VG_(printf)("I  %08lx,%lu\n", addr, size);
>+  if (remember_addr(addr) && 
>+      VG_(get_fnname_if_entry)(addr, fnname, sizeof(fnname))) {
>+    VG_(printf)(".text.%s\n", fnname);
>+  }
>+
> }

The function-detection part could be done at instrumentation-time (ie. in lk_instrument()) rather than run-time if you want it to be more efficient, but maybe that's not important for your needs.


> static VG_REGPARM(2) void trace_modify(Addr addr, SizeT size)
> {
>-   VG_(printf)(" M %08lx,%lu\n", addr, size);
>+  //VG_(printf)(" M %08lx,%lu\n", addr, size);
> }

This function is for instructions that modify a memory location, eg. 'addl (%eax), 1'.  You probably want to consider that along with more normal loads.


Other than that, it looks ok and I think it should do what you want it to.
Comment 6 (dormant account) 2010-03-04 10:12:11 PST
(In reply to comment #5)
 
> The function-detection part could be done at instrumentation-time (ie. in
> lk_instrument()) rather than run-time if you want it to be more efficient, but
> maybe that's not important for your needs.

Yeah, this was just a prototype. I've switched to doing function-detection like you/Julian suggested.

> > static VG_REGPARM(2) void trace_modify(Addr addr, SizeT size)
> > {
> >-   VG_(printf)(" M %08lx,%lu\n", addr, size);
> >+  //VG_(printf)(" M %08lx,%lu\n", addr, size);
> > }
> 
> This function is for instructions that modify a memory location, eg. 'addl
> (%eax), 1'.  You probably want to consider that along with more normal loads.

Thanks for the headsup

> Other than that, it looks ok and I think it should do what you want it to.

Unfortunately I ran into another snag. Is Valgrind supposed to work well on Fedora 12 yet? I'm using trunk from yesterday and it seems to fail to start firefox completely. I noticed that my tool gets through a significant portion of startup before firefox exits.

../vg-in-place --tool=none ~/builds/minefield.release/dist/bin/firefox-bin -P crash -no-remote

leads to
../vg-in-place: line 28:  5816 Killed                  VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

If f12 isn't working well, what's a recent distribution that works well?
Comment 7 Nicholas Nethercote [:njn] 2010-03-04 13:03:21 PST
I don't know of any Fedora 12 issues, but Julian might.

Have you read this?

https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind

You'll need --smc-check=all, unless you built Firefox with --enable-valgrind.
Comment 8 (dormant account) 2010-03-04 14:02:07 PST
(In reply to comment #7)
> I don't know of any Fedora 12 issues, but Julian might.

There aren't any, I had my environment paths screwed up. Going to post a working prototype soon.
Comment 9 (dormant account) 2010-03-04 17:00:10 PST
Created attachment 430471 [details] [diff] [review]
working prototype

Now valgrind outputs the linker scripts.
This was underwhelming. Got the exact same speedup as I did with function-only reorder, ie barely 150ms. 
The linker isn't moving the data sections as I expected. Will have to investigate more.
Comment 10 (dormant account) 2010-03-05 11:49:19 PST
Note to self:
Just as my head was about to explode on detecting the types of data sections being used and grouping them in linker scripts, Ian Taylor pointed out the simplified interface being proposed for gold: http://sourceware.org/ml/binutils/2010-03/msg00050.html
Comment 11 (dormant account) 2010-04-05 16:55:40 PDT
Created attachment 437169 [details]
lk_main.c

This is a valgrind plugin that outputs a chronological log of function/data-sections that are accessed by the program. 

The sections are produced by (http://hg.mozilla.org/users/tglek_mozilla.com/startup/file/6453ad2a7906/elflog.cpp) g++ elflog.cpp -g -O3 -o elflog -lelf &&  time ./elflog  --contents  ~/builds/minefield.release/dist/bin/libxul.so > ~/builds/minefield.release/dist/bin/libxul.so.sections

The plugin looks for .sections section-metadata every time a new file is mmap()ed. The section names live in an OSet. When a section is accessed, section name is printed to the log file and the section is removed from the OSet. The resulting section list is then fed into gnu gold to produce binaries with correct order.

This produces a libxul.so that has 2.5-3x less page faults on startup, 30% faster cold startup, and 2mb(out of 14mb) ram savings due to paging in less of libxul.

Posting the file in the hopes that V authors will help me strip the remaining lackey leftovers and package this up into an out-of-tree V plugin.
Comment 12 Nicholas Nethercote [:njn] 2010-04-05 17:58:58 PDT
> This produces a libxul.so that has 2.5-3x less page faults on startup, 30%
> faster cold startup, and 2mb(out of 14mb) ram savings due to paging in less of
> libxul.

Nice!  Glad you've made such good progress.

> Posting the file in the hopes that V authors will help me strip the remaining
> lackey leftovers and package this up into an out-of-tree V plugin.

Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the best idea of what's being used and what isn't.  Or maybe you need help putting into a separate tool?  http://www.valgrind.org/docs/manual/manual-writing-tools.html explains how, including build system stuff, I last updated it for the 3.5.0 release so it should be accurate.

Unfortunately there isn't any good way to package out-of-tree Valgrind plug-ins.  The two options are (a) a patch, or (b) get it included in the source distribution.  The former shouldn't be too painful since most of the changes will be additions in separate files, AFAICT.  The latter would require adding docs, preferably some tests, and a commitment to maintain it.
Comment 13 (dormant account) 2010-04-06 11:42:29 PDT
> 
> > Posting the file in the hopes that V authors will help me strip the remaining
> > lackey leftovers and package this up into an out-of-tree V plugin.
> 
> Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the
> best idea of what's being used and what isn't.  Or maybe you need help putting
> into a separate tool?

Both. My main problem with ripping out code is I do not understand the setup code that
sets up trace_load/modify/etc callbacks. Those are the only parts of lackey that I need.


> http://www.valgrind.org/docs/manual/manual-writing-tools.html explains how,
> including build system stuff, I last updated it for the 3.5.0 release so it
> should be accurate.

Those docs are really good. Got my tool building. It's too bad out-of-tree plugins aren't supported, I'll distribute this as a V patch for now.

> 
> Unfortunately there isn't any good way to package out-of-tree Valgrind
> plug-ins.  The two options are (a) a patch, or (b) get it included in the
> source distribution.  The former shouldn't be too painful since most of the
> changes will be additions in separate files, AFAICT.  The latter would require
> adding docs, preferably some tests, and a commitment to maintain it.

I'm ok with maintaining and all the requirements that come with it if we manage to get this deployed for our nightly builds.
Comment 14 Julian Seward [:jseward] 2010-04-07 05:28:52 PDT
(In reply to comment #13)

> > Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the
> > best idea of what's being used and what isn't.  Or maybe you need help putting
> > into a separate tool?
> 
> Both. My main problem with ripping out code is I do not understand the setup
> code that
> sets up trace_load/modify/etc callbacks. Those are the only parts of lackey
> that I need.

If you put the latest version up, I'll have a poke at it.  Shouldn't
be difficult.

---

ATM this is just for Linux, right?  Do you want this stuff to work for
MacOSX and Win32 too?
Comment 15 (dormant account) 2010-04-07 14:21:03 PDT
Created attachment 437664 [details] [diff] [review]
icegrind

This is now feature-complete. I hooked into valgrind's mmap notifications and life became easy. I would appreciate any suggestions for improvement.

This code makes a mess of mixing Addr,Word,64bit<>32bit conversions, sorry about that, will clean that up before submitting this for inclusion in valgrind.
Comment 16 Nicholas Nethercote [:njn] 2010-04-08 18:16:38 PDT
Comment on attachment 437664 [details] [diff] [review]
icegrind

I had a quick look over the code...

>+/*--------------------------------------------------------------------*/
>+/*--- A tool to produce a chronological log of functions/data access  */
>+/*--------------------------------------------------------------------*/
>+
>+/*
>+  This file is part of Lackey, an example Valgrind tool that does
>+  some simple program measurement and tracing.

You should update the tool description.


>+static void lk_post_clo_init(void)
>+{
>+
>+}

You should change the "lk_" prefixes to "ice_" to match the filename.


>+    case Ist_IMark:
>+           
>+      /* Needed to be able to check for inverted condition in Ist_Exit */
>+      inspect_addr( st->Ist.IMark.addr, st->Ist.IMark.len);
>+      break;

You realise this call will occur at instrumentation-time, not at run-time,
right?  Hopefully you do, but it's easy to get confused between the two.


>Index: coregrind/m_tooliface.c
>===================================================================
>--- coregrind/m_tooliface.c	(revision 11100)
>+++ coregrind/m_tooliface.c	(working copy)
>@@ -330,6 +330,14 @@
>    VG_(tdict).tool_final_IR_tidy_pass = final_tidy;
> }
> 
>+void VG_(needs_mmap_notify)(
>+                            void (*track_mmap)(struct NSegment const * seg),
>+                            void (*track_munmap)(Addr a, SizeT len)
>+                            ) 
>+{
>+  VG_(tdict).track_mmap = track_mmap;
>+  VG_(tdict).track_munmap = track_munmap;
>+}
> /*--------------------------------------------------------------------*/
> /* Tracked events.  Digit 'n' on DEFn is the REGPARMness. */
> 

Do you know about the existing track_new_mem_map and track_die_mem_munmap
events?  This is just replicating them.
Comment 17 (dormant account) 2010-04-20 14:35:47 PDT
*** Bug 523655 has been marked as a duplicate of this bug. ***
Comment 18 (dormant account) 2010-06-02 10:26:46 PDT
Created attachment 448782 [details] [diff] [review]
icegrind

Uploading my working copy, this isn't ready for review.
Comment 19 (dormant account) 2010-06-07 16:00:04 PDT
Created attachment 449748 [details] [diff] [review]
ice grind checkpoint

mmap()s sections, switched ist_mark: case over to record functions at runtime. This results in a 4-5x slowdown, but hopefully a more realistic log.
TODO: mmap() a binary sections file to avoid building datastructures for section descriptors at runtime.
Comment 20 (dormant account) 2010-06-07 16:00:16 PDT
This also handles nested mmaps
Comment 21 Mike Hommey [:glandium] 2010-06-08 03:30:33 PDT
Comment on attachment 449748 [details] [diff] [review]
ice grind checkpoint

>+  while((iterator_mmap = VG_(OSetGen_Next)(mmaps)) != NULL) {
>+    if (iterator_mmap->addr < a->addr && iterator_mmap->end > a->end) {
>+      if (0)
>+        VG_(printf)("nested ");
>+      if (!iterator_mmap->sub_mmaps) {
>+        iterator_mmap->sub_mmaps = VG_(OSetGen_Create)(0, NULL, VG_(malloc), "sub_mmaps", VG_(free));
>+      }
>+      dest_mmaps = iterator_mmap->sub_mmaps;
>+      break;

Wouldn't that be better done with find_mmap?
Comment 22 Mike Hommey [:glandium] 2010-06-08 04:26:22 PDT
Comment on attachment 449748 [details] [diff] [review]
ice grind checkpoint

>+static void track_mmap(NSegment const * seg) {
(...)
>+  VG_(OSetGen_Insert)(mmaps, a);

This should be dest_mmaps instead of mmaps, otherwise nested mmaps don't work. Note, however, that if the new mmap starts in a previous one, but ends after its end, that won't work either. Maybe the best would be to actually cut the original mmap (which is afaiu what happens with real mmaps anyway)
Comment 23 Mike Hommey [:glandium] 2010-09-17 05:59:49 PDT
Created attachment 476231 [details]
New Makefile.am for latest valgrind trunk

To make icegrind properly work with latest valgrind trunk, you need to add a few things to Makefile.am. Here is the new version.
Comment 24 Mike Hommey [:glandium] 2010-09-21 09:25:30 PDT
(In reply to comment #16)
> Do you know about the existing track_new_mem_map and track_die_mem_munmap
> events?  This is just replicating them.

The downside of this is that it doesn't track the original mmap() for the executable binary (though libraries loaded by ld.so are still tracked)
Comment 25 Nicholas Nethercote [:njn] 2010-09-21 15:39:15 PDT
(In reply to comment #24)
> (In reply to comment #16)
> > Do you know about the existing track_new_mem_map and track_die_mem_munmap
> > events?  This is just replicating them.
> 
> The downside of this is that it doesn't track the original mmap() for the
> executable binary (though libraries loaded by ld.so are still tracked)

track_new_mem_startup should get the original binary, IIRC.

Note You need to log in before you can comment on or make changes to this bug.