Closed Bug 549749 Opened 14 years ago Closed 6 years ago

Icegrind: Valgrind plugin to produce an exact chronological log of functions/data accesses during startup

Categories

(Core :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: taras.mozilla, Assigned: taras.mozilla)

References

(Blocks 1 open bug, )

Details

Attachments

(2 files, 5 obsolete files)

Notes from irc:

need to write an instrumentation function which
a) looks at the IRStmt_IMarks (case Ist_IMark).  when it sees new addresses for the first time, then print something out via one of the various routines in the public header of m_debuginfo (pub_tool_debuginfo.h) will convert it into a name

b) hands every data load/store (Ist_Store, Iex_Load) to an runtime helper function, a la ptrcheck, memcheck, helgrind, lackey that maintains a big bitmap.

The output of the plugin(with postprocessing) will be a set of linker scripts that will produce cold-startup-friendly binaries.
The data accesses are given by Lackey's --trace-mem option.  That code is guarded by the 'clo_trace_mem' variable so is easy to extract from Lackey.

As for the function calls, that's a bit trickier.  Well, doing it imperfectly is pretty easy.  Doing it really accurately is harder, at least it is if you want to identify calls *and* returns, because of things like setjmp/longjmp and tail recursion.  Callgrind has some code that tries to do it accurately.
Taras, if you give me a reasonably precise spec of what the output should look like, I can hack it up for you early next week.
Attached patch lackey hack (obsolete) — Splinter Review
Nicholas, 
I appreciate your offer to help. I followed Julian's instructions and ended up with this patch. It captures data accesses as desired, but can't seem to catch any functions. Not sure if I'm doing something wrong there(going to look at callgrind next).

Spec:
I would like to obtain a chronological log which consists of functions(.text) and the .data that they access the first time they are accessed(ie modelling page-ins during application startup). This should then be easy to turn into a linker script to reorder the binary such that it is optimized for sequential reading during cold startup.
Assignee: nobody → tglek
Note, my problem turned out to be a weird testcase. Everything is working great now, moving on to make this useful.
Comment on attachment 430136 [details] [diff] [review]
lackey hack

> static VG_REGPARM(2) void trace_instr(Addr addr, SizeT size)
> {
>-   VG_(printf)("I  %08lx,%lu\n", addr, size);
>+  char fnname[200];
>+  //VG_(printf)("I  %08lx,%lu\n", addr, size);
>+  if (remember_addr(addr) && 
>+      VG_(get_fnname_if_entry)(addr, fnname, sizeof(fnname))) {
>+    VG_(printf)(".text.%s\n", fnname);
>+  }
>+
> }

The function-detection part could be done at instrumentation-time (ie. in lk_instrument()) rather than run-time if you want it to be more efficient, but maybe that's not important for your needs.


> static VG_REGPARM(2) void trace_modify(Addr addr, SizeT size)
> {
>-   VG_(printf)(" M %08lx,%lu\n", addr, size);
>+  //VG_(printf)(" M %08lx,%lu\n", addr, size);
> }

This function is for instructions that modify a memory location, eg. 'addl (%eax), 1'.  You probably want to consider that along with more normal loads.


Other than that, it looks ok and I think it should do what you want it to.
(In reply to comment #5)
 
> The function-detection part could be done at instrumentation-time (ie. in
> lk_instrument()) rather than run-time if you want it to be more efficient, but
> maybe that's not important for your needs.

Yeah, this was just a prototype. I've switched to doing function-detection like you/Julian suggested.

> > static VG_REGPARM(2) void trace_modify(Addr addr, SizeT size)
> > {
> >-   VG_(printf)(" M %08lx,%lu\n", addr, size);
> >+  //VG_(printf)(" M %08lx,%lu\n", addr, size);
> > }
> 
> This function is for instructions that modify a memory location, eg. 'addl
> (%eax), 1'.  You probably want to consider that along with more normal loads.

Thanks for the headsup

> Other than that, it looks ok and I think it should do what you want it to.

Unfortunately I ran into another snag. Is Valgrind supposed to work well on Fedora 12 yet? I'm using trunk from yesterday and it seems to fail to start firefox completely. I noticed that my tool gets through a significant portion of startup before firefox exits.

../vg-in-place --tool=none ~/builds/minefield.release/dist/bin/firefox-bin -P crash -no-remote

leads to
../vg-in-place: line 28:  5816 Killed                  VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"

If f12 isn't working well, what's a recent distribution that works well?
I don't know of any Fedora 12 issues, but Julian might.

Have you read this?

https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind

You'll need --smc-check=all, unless you built Firefox with --enable-valgrind.
(In reply to comment #7)
> I don't know of any Fedora 12 issues, but Julian might.

There aren't any, I had my environment paths screwed up. Going to post a working prototype soon.
Attached patch working prototype (obsolete) — Splinter Review
Now valgrind outputs the linker scripts.
This was underwhelming. Got the exact same speedup as I did with function-only reorder, ie barely 150ms. 
The linker isn't moving the data sections as I expected. Will have to investigate more.
Attachment #430136 - Attachment is obsolete: true
Note to self:
Just as my head was about to explode on detecting the types of data sections being used and grouping them in linker scripts, Ian Taylor pointed out the simplified interface being proposed for gold: http://sourceware.org/ml/binutils/2010-03/msg00050.html
Attached file lk_main.c (obsolete) —
This is a valgrind plugin that outputs a chronological log of function/data-sections that are accessed by the program. 

The sections are produced by (http://hg.mozilla.org/users/tglek_mozilla.com/startup/file/6453ad2a7906/elflog.cpp) g++ elflog.cpp -g -O3 -o elflog -lelf &&  time ./elflog  --contents  ~/builds/minefield.release/dist/bin/libxul.so > ~/builds/minefield.release/dist/bin/libxul.so.sections

The plugin looks for .sections section-metadata every time a new file is mmap()ed. The section names live in an OSet. When a section is accessed, section name is printed to the log file and the section is removed from the OSet. The resulting section list is then fed into gnu gold to produce binaries with correct order.

This produces a libxul.so that has 2.5-3x less page faults on startup, 30% faster cold startup, and 2mb(out of 14mb) ram savings due to paging in less of libxul.

Posting the file in the hopes that V authors will help me strip the remaining lackey leftovers and package this up into an out-of-tree V plugin.
Attachment #430471 - Attachment is obsolete: true
> This produces a libxul.so that has 2.5-3x less page faults on startup, 30%
> faster cold startup, and 2mb(out of 14mb) ram savings due to paging in less of
> libxul.

Nice!  Glad you've made such good progress.

> Posting the file in the hopes that V authors will help me strip the remaining
> lackey leftovers and package this up into an out-of-tree V plugin.

Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the best idea of what's being used and what isn't.  Or maybe you need help putting into a separate tool?  http://www.valgrind.org/docs/manual/manual-writing-tools.html explains how, including build system stuff, I last updated it for the 3.5.0 release so it should be accurate.

Unfortunately there isn't any good way to package out-of-tree Valgrind plug-ins.  The two options are (a) a patch, or (b) get it included in the source distribution.  The former shouldn't be too painful since most of the changes will be additions in separate files, AFAICT.  The latter would require adding docs, preferably some tests, and a commitment to maintain it.
> 
> > Posting the file in the hopes that V authors will help me strip the remaining
> > lackey leftovers and package this up into an out-of-tree V plugin.
> 
> Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the
> best idea of what's being used and what isn't.  Or maybe you need help putting
> into a separate tool?

Both. My main problem with ripping out code is I do not understand the setup code that
sets up trace_load/modify/etc callbacks. Those are the only parts of lackey that I need.


> http://www.valgrind.org/docs/manual/manual-writing-tools.html explains how,
> including build system stuff, I last updated it for the 3.5.0 release so it
> should be accurate.

Those docs are really good. Got my tool building. It's too bad out-of-tree plugins aren't supported, I'll distribute this as a V patch for now.

> 
> Unfortunately there isn't any good way to package out-of-tree Valgrind
> plug-ins.  The two options are (a) a patch, or (b) get it included in the
> source distribution.  The former shouldn't be too painful since most of the
> changes will be additions in separate files, AFAICT.  The latter would require
> adding docs, preferably some tests, and a commitment to maintain it.

I'm ok with maintaining and all the requirements that come with it if we manage to get this deployed for our nightly builds.
(In reply to comment #13)

> > Stripping out the Lackey leftovers shouldn't be hard -- I think you'd have the
> > best idea of what's being used and what isn't.  Or maybe you need help putting
> > into a separate tool?
> 
> Both. My main problem with ripping out code is I do not understand the setup
> code that
> sets up trace_load/modify/etc callbacks. Those are the only parts of lackey
> that I need.

If you put the latest version up, I'll have a poke at it.  Shouldn't
be difficult.

---

ATM this is just for Linux, right?  Do you want this stuff to work for
MacOSX and Win32 too?
Attached patch icegrind (obsolete) — Splinter Review
This is now feature-complete. I hooked into valgrind's mmap notifications and life became easy. I would appreciate any suggestions for improvement.

This code makes a mess of mixing Addr,Word,64bit<>32bit conversions, sorry about that, will clean that up before submitting this for inclusion in valgrind.
Attachment #437169 - Attachment is obsolete: true
Summary: Valgrind plugin to produce an exact chronological log of functions/data accesses during startup → Icegrind: Valgrind plugin to produce an exact chronological log of functions/data accesses during startup
Comment on attachment 437664 [details] [diff] [review]
icegrind

I had a quick look over the code...

>+/*--------------------------------------------------------------------*/
>+/*--- A tool to produce a chronological log of functions/data access  */
>+/*--------------------------------------------------------------------*/
>+
>+/*
>+  This file is part of Lackey, an example Valgrind tool that does
>+  some simple program measurement and tracing.

You should update the tool description.


>+static void lk_post_clo_init(void)
>+{
>+
>+}

You should change the "lk_" prefixes to "ice_" to match the filename.


>+    case Ist_IMark:
>+           
>+      /* Needed to be able to check for inverted condition in Ist_Exit */
>+      inspect_addr( st->Ist.IMark.addr, st->Ist.IMark.len);
>+      break;

You realise this call will occur at instrumentation-time, not at run-time,
right?  Hopefully you do, but it's easy to get confused between the two.


>Index: coregrind/m_tooliface.c
>===================================================================
>--- coregrind/m_tooliface.c	(revision 11100)
>+++ coregrind/m_tooliface.c	(working copy)
>@@ -330,6 +330,14 @@
>    VG_(tdict).tool_final_IR_tidy_pass = final_tidy;
> }
> 
>+void VG_(needs_mmap_notify)(
>+                            void (*track_mmap)(struct NSegment const * seg),
>+                            void (*track_munmap)(Addr a, SizeT len)
>+                            ) 
>+{
>+  VG_(tdict).track_mmap = track_mmap;
>+  VG_(tdict).track_munmap = track_munmap;
>+}
> /*--------------------------------------------------------------------*/
> /* Tracked events.  Digit 'n' on DEFn is the REGPARMness. */
> 

Do you know about the existing track_new_mem_map and track_die_mem_munmap
events?  This is just replicating them.
Attached patch icegrind (obsolete) — Splinter Review
Uploading my working copy, this isn't ready for review.
Attachment #437664 - Attachment is obsolete: true
mmap()s sections, switched ist_mark: case over to record functions at runtime. This results in a 4-5x slowdown, but hopefully a more realistic log.
TODO: mmap() a binary sections file to avoid building datastructures for section descriptors at runtime.
Attachment #448782 - Attachment is obsolete: true
This also handles nested mmaps
Comment on attachment 449748 [details] [diff] [review]
ice grind checkpoint

>+  while((iterator_mmap = VG_(OSetGen_Next)(mmaps)) != NULL) {
>+    if (iterator_mmap->addr < a->addr && iterator_mmap->end > a->end) {
>+      if (0)
>+        VG_(printf)("nested ");
>+      if (!iterator_mmap->sub_mmaps) {
>+        iterator_mmap->sub_mmaps = VG_(OSetGen_Create)(0, NULL, VG_(malloc), "sub_mmaps", VG_(free));
>+      }
>+      dest_mmaps = iterator_mmap->sub_mmaps;
>+      break;

Wouldn't that be better done with find_mmap?
Comment on attachment 449748 [details] [diff] [review]
ice grind checkpoint

>+static void track_mmap(NSegment const * seg) {
(...)
>+  VG_(OSetGen_Insert)(mmaps, a);

This should be dest_mmaps instead of mmaps, otherwise nested mmaps don't work. Note, however, that if the new mmap starts in a previous one, but ends after its end, that won't work either. Maybe the best would be to actually cut the original mmap (which is afaiu what happens with real mmaps anyway)
To make icegrind properly work with latest valgrind trunk, you need to add a few things to Makefile.am. Here is the new version.
(In reply to comment #16)
> Do you know about the existing track_new_mem_map and track_die_mem_munmap
> events?  This is just replicating them.

The downside of this is that it doesn't track the original mmap() for the executable binary (though libraries loaded by ld.so are still tracked)
(In reply to comment #24)
> (In reply to comment #16)
> > Do you know about the existing track_new_mem_map and track_die_mem_munmap
> > events?  This is just replicating them.
> 
> The downside of this is that it doesn't track the original mmap() for the
> executable binary (though libraries loaded by ld.so are still tracked)

track_new_mem_startup should get the original binary, IIRC.
Blocks: 622908
Seems that there isn't a strong interest on that anymore. Closing.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: