Implement SPS stackwalk using the breakpad unwinder

RESOLVED FIXED in mozilla22

Status

()

defect
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: jseward, Assigned: jseward)

Tracking

Trunk
mozilla22
ARM
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments, 28 obsolete attachments)

5.93 KB, patch
Details | Diff | Splinter Review
4.07 KB, patch
glandium
: review+
ted
: checkin+
Details | Diff | Splinter Review
9.43 KB, patch
glandium
: review+
ted
: checkin+
Details | Diff | Splinter Review
2.19 KB, patch
glandium
: review+
ted
: checkin+
Details | Diff | Splinter Review
154.69 KB, patch
BenWa
: review+
Details | Diff | Splinter Review
1.39 KB, text/plain
Details
13.84 KB, text/plain
Details
This supersedes bug 769241, which proposed to implement this
functionality using libunwind.  But that gives two different Dwarf
based CFI based unwinders.  Hence it might be better to abandon
libunwind and concentrate work on using breakpad instead.
Isn't breakpad unwinder going to give worse performance than libunwind? Also, upstream breakpad doesn't support unwinding system libraries and needs this patch:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/breakpad-mq/file/bec0dd47992f/arm-unwind
Another thing to keep in mind is how much more is this going to consume at runtime. Mobile will quickly kill the app if we ask for just a few 100 MB's.
Julian, I'm interested to hear about your plans for reading the CFI using Breakpad's current code. Right now, the dump_syms executable reads all the DWARF and produces a single Breakpad-format symbol file, which is typically gigantic. minidump_stackwalk reads entire files, and hands out translations of the DWARF CFI into a little texty stack language for SimpleCFIWalker to consume. So there are two difficulties:

1) The reader reads entire .debug_frame sections, instead of focusing on the section that we actually are going to use.

2) The reader converts the CFI into an intermediate format which is then re-parsed and interpreted, instead of just using the CFI directly.

I'm guessing that both of these would need to be reworked before this unwinder was practical for use in a profiler.

Isn't the Valgrind unwinder better suited for this application? It's already used for profiling.
s/focusing on the section/focusing on the portion/
(In reply to Jim Blandy :jimb from comment #3)
> Julian, I'm interested to hear about your plans for reading the CFI using
> Breakpad's current code.

I hadn't got that far.  So far I have been grappling with the problem
of whether it's even possible to glue the breakpad pieces together to
create a standalone in-process unwinder, and learning about the top
level structure.

> [...] there are two difficulties: [...]
> I'm guessing that both of these would need to be reworked [...]

I'm getting contradictory opinions re what the best thing to do is.
On the one hand, maintaining two separate unwinders seems wasteful.
On the other, it sounds like breakpad at least as it stands will be a
cpu and resource hog.  So I'm unclear which way to go.

The idea that maintaining two unwinders is wasteful is based on the
assumption that the hard bit is the core CFI unwinding system.  But if
that isn't so -- and instead the plumbing around the outside is the
hard bit -- then maybe it would be better to stick with libunwind for
unwinding.  I don't know.  I begin to suspect that the plumbing is
indeed the hard bit.  Advice welcomed.

If our copy of breakpad has to be extensively modified to be fast and
lightweight enough, then it will be more difficult to keep in sync
with the upstream version.

> Isn't the Valgrind unwinder better suited for this application? It's
> already used for profiling.

It's fast, maintained and (to me at least) familiar.  But it is
license incompatible (GPL2+) and so is out of the game.
Breakpad's overall structure, as a two-phase batch process designed to run on a big machine and operate on big files, really isn't suitable for reuse as a live unwinder. However, having slept on it, I think there are pieces of Breakpad that could be profitably reused. But simply fixing libunwind might be competitive; see what you think.

The overall pipeline is in two phases: symbol-dumping phase, where we run dump_syms to read the ELF/DWARF and write out a Breakpad-format symbol file; and stack-walking phase, where we run minidump_stackwalk to read a minidump, read the relevant Breakpad-format symbol files, and walk the stack.

Considering CFI only, the symbol dumping phase is sort of a pipeline of C++ objects:

- src/common/linux/dump_symbols.cc:LoadDwarfCFI stitches the pieces listed below together and makes them go, given the appropriate DWARF sections in the ELF file, as found by its caller.

- dwarf2reader::CallFrameInfo is the actual CFI parser. The comments in src/common/dwarf/dwarf2reader.h are meant to document it completely. In brief, you give a parser a dwarf2reader::CallFrameInfo::Handler implementation, and the parser will tell the handler about whatever FDEs it finds and the rules they contain.

- google_breakpad::DwarfCFIToModule implements CallFrameInfo::Handler. Given the FDEs and rules it receives, it populates a google_breakpad::Module.

- google_breakpad::Module accumulates all the debugging information extracted from whatever sources the dumper can find --- DWARF, STABS, the ELF symbol table, etc. --- and knows how to write it out as a Breakpad symbol file.

For source position data and functions, the pipeline is similar:

- Similar to the above, src/common/linux/dump_symbols.cc:LoadDwarf stitches the pieces together and drives the process.

- dwarf2reader::CompilationUnit parses one compilation unit, from the .debug_info section, at a time, passing DIE data to a Dwarf2Handler instance. (I didn't write this code.)

- dwarf2reader::DieDispatcher implements Dwarf2Handler, receiving data from the DIE parser, but does no real work other than driving a higher-level handler interface, DIEHandler (and RootDIEHandler). (My intent was for DIEHandler to be an easier interface to implement than Dwarf2Handler. I don't know if it actually is.) Again, the comments in dwarf2reader.h are meant to be complete.

- google_breakpad::DwarfCUToModule implements RootDIEHandler, and has a bunch of helper classes that implement DIEHandler for various kinds of dies. It takes the DWARF DIE data and populates a google_breakpad::Module. In particular, when it gets a DW_TAG_compile_unit with a DW_AT_stmt_list attribute, it passes that off to a LineToModuleFunctor to get the source line information as well.

- DumperLineToModule implements LineToModuleFunctor. It uses dwarf2reader::LineInfo to parse the DWARF .debug_line section contents, passing it a DwarfLineToModule handler to take the line data and populate a Module with it.

- And as above, google_breakpad::Module accumulates all the function and line data and writes it out as a Breakpad-format symbol file.


For live use, we need to cut the breakpad-format symbol files out of the picture. DwarfCFIToModule, DwarfCUToModule, DwarfLineToModule, and Module are too specific to the two-phase batch process to be useful. But I think the actual DWARF parsers, dwarf2reader::CallFrameInfo, dwarf2reader::CompilationUnit, and dwarf2reader::LineInfo, could be readily reused.

CallFrameInfo's unit tests have full code and branch coverage, except for some error handling, and it's been pretty trustworthy. It is presently oriented towards reading an entire .debug_frame or .eh_frame section at a time, so you'd have to add something to read a single FDE. I believe it should be easy to use the current interface to build an address-to-FDE map. It reads a CIE each time it finds an FDE that refers to it, so each FDE's parsing is pretty much independent.

One possible concern is that breakpad makes extensive use of STL classes: std::string, std::vector, std::map, and so on. Those can be told not to throw exceptions, but I seem to recall there were other reasons not to want STL classes in Firefox.

The heart of the stack walking phase is driven by the StackwalkerFoo classes in src/processor/stackwalker_*.cc. The stackwalkers consult concrete subclasses of SymbolSupplier and SourceLineResolver to do all their symbol reading; you could probably write custom implementations of those that actually do the DWARF reading on demand.
That sounds about exactly what I was envisioning.
(In reply to Jim Blandy :jimb from comment #6)

Thanks for the detailed description/analysis.

> The heart of the stack walking phase is driven by the StackwalkerFoo classes
> in src/processor/stackwalker_*.cc. The stackwalkers consult concrete
> subclasses of SymbolSupplier and SourceLineResolver to do all their symbol
> reading; you could probably write custom implementations of those that
> actually do the DWARF reading on demand.

Did you have in mind a specific reason why it would be desirable to
create custom SymbolSupplier/SourceLineResolver implementations rather
than re-using the existing facilities?  [Unclear if you had something
in mind here, or whether it was merely a theoretical observation.]

> One possible concern is that breakpad makes extensive use of STL classes:
> std::string, std::vector, std::map, and so on. Those can be told not to
> throw exceptions, but I seem to recall there were other reasons not to want
> STL classes in Firefox.

A serious related problem that happened with SPS+libunwind, and will
happen here too, is that the profiler deadlocks as soon as we try to
take a sample on a thread which happens to be inside malloc (et al)
and is holding the malloc lock.

That means either -- completely isolating the code we want to run from
the outside world, so it can't do dynamic memory allocation.  Or
having only a simple bit of code run in the signal handler, that ships
register state and raw stack to a different thread that does the
unwind.  Even that's pretty dangerous though.
(In reply to Julian Seward from comment #8)
> Did you have in mind a specific reason why it would be desirable to
> create custom SymbolSupplier/SourceLineResolver implementations rather
> than re-using the existing facilities?  [Unclear if you had something
> in mind here, or whether it was merely a theoretical observation.]

I *believe* the extant implementations of those interfaces are the code that actually reads the Breakpad-format symbol files. We'd want to change them to talk to the DWARF readers directly (or through a cache).

> A serious related problem that happened with SPS+libunwind, and will
> happen here too, is that the profiler deadlocks as soon as we try to
> take a sample on a thread which happens to be inside malloc (et al)
> and is holding the malloc lock.

Yes, signals are lovely, no?

> That means either -- completely isolating the code we want to run from
> the outside world, so it can't do dynamic memory allocation.  Or
> having only a simple bit of code run in the signal handler, that ships
> register state and raw stack to a different thread that does the
> unwind.  Even that's pretty dangerous though.

If malloc (or anything else) needs a global lock, then having a different thread do it won't help: the thread whose stack is being sampled will never progress enough to release the lock.

I believe all the STL classes let one specify the allocator to use. So we could have them run from a separate heap.

If it were possible to distinguish allocations of data retained between samples (say, caches) and allocations of data that was only needed within a sample (say, DWARF reader state), then we could use a simple bump allocator for the latter. If the former class really included only caches, we could preallocate that.
Here's a proof of concept program showing how to do in-process CFI
unwinds on with breakpad.  It contains implementations of parent
classes CodeModule and CodeModules that scoop the necessary
information out of /proc/self/maps.

It also contains an implementation of SymbolSupplier which joins the
front part (what dump_syms would normally do -- reading the Dwarf etc
from ELF objects) to the back part (what minidump_stackwalk does --
unwinding) by the crude mechanism of dumping the symbol file into a
std::ostringstream and handing that to the unwinder.

To assess the costs of doing this, the program also forces aboard a
libxul.so w/ debug info, file size 389MB, and fakes up an unwind from
inside there, so we can see the costs of the serialise-parse step.
With a default "-g -O" build of breakpad on linux-amd64, this is
really bad -- about 300MB and 35 billion instructions.  However, most
of that is avoidable.  With the patch shown below it costs about 70MB
and 3.4 billion instructions.

Index: src/common/linux/dump_symbols.cc
===================================================================
--- src/common/linux/dump_symbols.cc	(revision 999)
+++ src/common/linux/dump_symbols.cc	(working copy)
@@ -539,9 +539,11 @@
     found_debug_info_section = true;
     found_usable_info = true;
     info->LoadedSection(".debug_info");
+#if 0
     if (!LoadDwarf<ElfClass>(obj_file, elf_header, big_endian, module))
       fprintf(stderr, "%s: \".debug_info\" section found, but failed to load "
               "DWARF debugging information\n", obj_file.c_str());
+#endif
   }
 
   // Dwarf Call Frame Information (CFI) is actually independent from
Proof of concept program showing how I'd like to integrate this.

The functions that we can safely call from within a signal handler are
very limited, and certainly don't include malloc/new or any file I/O.
So it's pretty much infeasible to unwind from inside the handler.
Even if it were, we'd also have to show that the unwinder (that is,
large swathes of the breakpad sources) are race-free once we are
sampling more than just one thread.

A safer solution is to have a minimal signal handler, which just
bundles up the top 32kB of stack and the machine registers, and hands
it off to an unwinder thread to do the real work.  This is what this
program demonstrates.  Even this is tricky because we can't even
safely use the usual locking primitives within the signal handler, so
this implementation provides its own spinlocks to do the inter-thread
communication.  It is designed so that the spins are indeed very
short.

Overall the burden of showing this is deadlock-, race- and
segfault-free is much reduced, compared to unwinding in the signal
handler.  The handler is completely self-contained and only needs to
call external function pthread_self (or windows equivalent), which is
pretty much unavoidable once we are sampling more than one thread, but
also probably pretty safe.

This program:

* demonstrates two threads being sampled, and one unwinder thread
* both sampling threads also do malloc/free, to simulate real workload
  and test for malloc-induced deadlocks
* no detected deadlocks in many runs
* no detected memory errors with Valgrind/Memcheck
* no detected data races with Valgrind/Helgrind
* unwinds are successfully done, using Dwarf CFI data
* all of the above verified on x86_64-linux and arm-linux
Attachment #651377 - Attachment is obsolete: true
This is excellent news! I'm a bit concerned about the performance and cache pressure of this approach but it's certainly pragmatic. It may turn out to work well if the overhead of copying is lower then unwinding which wouldn't entirely surprise me.

Do you have any estimates as to what happens when the stack is larger then 32kB? If the unwind stops gracefully it shouldn't be to bad because if we fail to unwind we fill in the stack with instrumented sample label so we still get a meaningful stack.
Demonstrates initial integration into m-c.  Runs ok on x86_64-linux
and is verified as using CFI info to unwind in libxul.so.  Main lack
is that samples are printed out from the unwinder thread and not yet
fed back to the |aProfile| for the thread.  Fixing that is the next
thing to do.  Also shows the required build system changes (pretty
minimal) and the changes required to our in-tree breakpad sources
(some minor importing of newer features from the breakpad svn trunk.)
Attachment #653706 - Attachment is obsolete: true
First working implementation.  Now feeds samples to the |aProfile| for
the thread, so it is possible to see the results in Cleopatra.
Unwinds in libxul.so and other Firefox libs.  Still has problems
picking up split debuginfo for system libraries, although it does
succeed for libc and libpthread on Ubuntu 10.04 (x86_64).  Fixing this
properly is the next thing to do.  Unwinds are currently limited
(arbitrarily) to 16 frames.

If you try it, you will eventually wind up with a lot of lines like
this

QQQ: unwinder: seqNo 1331, buf 3: got 16 frames (16 trustworthy)

which indicate that the unwinder thread is picking up samples from the
sampled thread(s) and unwinding them successfully.  This printing can
be disabled at BreakpadUnwind.cpp:1128.

There are sometimes performance problems due to repeated failing
attempts to load split debuginfo for system libraries.

To see which libraries it does manage to get debuginfo for, pass the
debug output through "| grep File4 | grep sym".
Attachment #656610 - Attachment is obsolete: true
As a sidenote to this, it would be useful to resync our breakpad
sources to the current svn tip for at least the following reasons:

* avoidance of space leaks -- current impl holds onto several tens of
  megabytes of memory that it doesn't need to

* newer breakpads can unwind on x86_64 by stack scanning when
  the CFI trail goes cold -- hence more robust

as well as for the reason that I had to copy in a bunch of changes
from the svn tip merely in order to get this to work at all.
As per previous patch, but more usable:

* can specify multiple directories for split debugobjs, so it
  can read debuginfo for system libraries more effectively

* controllable verbosity in logging, and better diagnostics re
  debuginfo file finding (see top of BreakpadUnwind.cpp)

* unwind depth increased to 32

Seems fairly stable on Ubuntu 10.04 (x86_64).  Analysis with Helgrind
shows there are data races when adding samples to the |aProfile|
objects.  I haven't got a fix for that yet, but it is not causing
obvious crashing, at least.
Attachment #656802 - Attachment is obsolete: true
This patch adds initial support for android-arm (ICS).  It appears to
be stable (deadlock + crash free) but mostly produces one-frame stack
traces.  I am not sure why this is, but I suspect it is because the
Fennec main thread is mostly blocked in sys_futex waiting for some
other thread to hand it events to process, the libc code for sys_futex
has no CFI unwind info, and the in-tree (old-ish) version of breakpad
does not know how to unwind by stack scanning.  Next thing to do is to
look at the feasibility of copying in arm stack scanning code from
breakpad svn trunk.
Attachment #657844 - Attachment is obsolete: true
Adds support for stack scanning on ARM.  Hence is the first version
that might actually be useful on Android.  Segfaults at shutdown,
although I don't know if this is related to this patch or not.
Attachment #658610 - Attachment is obsolete: true
(In reply to comment #17)
> I suspect it is because the
> Fennec main thread is mostly blocked in sys_futex waiting for some
> other thread to hand it events to process, the libc code for sys_futex
> has no CFI unwind info, and the in-tree (old-ish) version of breakpad
> does not know how to unwind by stack scanning.

The way I sometimes would verify this kind of thing would be to modify something like nsGlobalWindow::Alert to do a busy loop which would take a while, and then attempt to profile a page which calls alert, to make sure that the unwindings starts at a place where I can control the unwind info.
I mentioned this on IRC, but bug 791775 covers updating to the latest Breakpad.
Next steps:
* resync with updated breakpad sources (bug 791775)
* look at performance problems
Attachment #662962 - Attachment is obsolete: true
Comment on attachment 664830 [details] [diff] [review]
Implements stack merging with breakpad-generated native stacks

Review of attachment 664830 [details] [diff] [review]:
-----------------------------------------------------------------

I took the time to read your patch, so I thought I'd leave a few notes.

::: tools/profiler/UnwinderThread.cpp
@@ +1290,5 @@
> +
> +///////////////////////////////////////////////////////////////////
> +/* Implement MyCodeModule and MyCodeModules, so they pull the relevant
> +   information about which modules are loaded where out of
> +   /proc/self/maps. */

1) There's Breakpad code that parses /proc/self/maps in LinuxDumper::EnumerateMappings: http://code.google.com/p/google-breakpad/source/browse/trunk/src/client/linux/minidump_writer/linux_dumper.cc#164
2) glandium had some idea to use the DT_DEBUG auxv entry to get the library mappings instead of /proc/self/maps, bug 689178. This might be simpler and faster for your usecase as well.

@@ +1510,5 @@
> +#    endif
> +
> +     std::ostringstream sym_file_in_mem;
> +     bool ok = google_breakpad::WriteSymbolFile(objname,
> +                                                debug_dirs, sym_file_in_mem);

I believe we talked about this, but we could probably get rid of this ugliness at the expense of a bit more code by writing some glue code that took the debug-symbol-parsing bits of Breakpad (like dwarf_cfi_to_module) and used them directly in a new SourceLineResolverInterface subclass, thus cutting out the "huge string" intermediary.

@@ +1611,5 @@
> +                                 UnwinderThreadBuffer* buff,
> +                                 int buffNo /* for debug printing only */ )
> +{
> +# if defined(SPS_PLAT_amd64_linux)
> +  MDRawContextAMD64* context = new MDRawContextAMD64();

I'm pretty sure that lots of these types that you're heap-allocating can just be stack-allocated instead.
> 1) There's Breakpad code that parses /proc/self/maps in
> LinuxDumper::EnumerateMappings:
> http://code.google.com/p/google-breakpad/source/browse/trunk/src/client/
> linux/minidump_writer/linux_dumper.cc#164

See also MapsReporter::ParseMapping() in xpcom/base/MapsMemoryReporter.cpp.
Provides native and pseudo unwind capability, + stack merging, +all
the other stuff (response time measurement) on arm-android and
x86_64-linux.

Also has sampling mode and interval controllable by environment
variables -- I got tired of recompiling to change it:

Profiler: SPS: Environment variable usage:
Profiler: SPS: 
Profiler: SPS:   MOZ_PROFILER_MODE=native    for native unwind only
Profiler: SPS:   MOZ_PROFILER_MODE=pseudo    for pseudo unwind only
Profiler: SPS:   MOZ_PROFILER_MODE=combined  for combined native & pseudo unwind
Profiler: SPS:   If unset, default is 'combined' on native-capable
Profiler: SPS:     platforms, 'pseudo' on others.
Profiler: SPS: 
Profiler: SPS:   MOZ_PROFILER_INTERVAL=<number>   (milliseconds, 1 to 1000)
Profiler: SPS:   If unset, platform default is used.
Profiler: SPS: 
Profiler: SPS:   You can use SPS_MODE     instead of MOZ_PROFILER_MODE
Profiler: SPS:   You can use SPS_INTERVAL instead of MOZ_PROFILER_INTERVAL
Profiler: SPS: 
Profiler: SPS:   This platform supports native unwinding.
Profiler: SPS: 
Profiler: SPS: Use env var SPS_MODE=help for further information.

On Linux, just set these vars before starting.  On Android it is
more complex.  I use a command line like this:

  sh /system/bin/am start -n org.mozilla.fennec_sewardj/.App \
     --es env0 MOZ_LINKER_EXTRACT=1 \
     --es env1 SPS_INTERVAL=100 \
     --es env2 SPS_MODE=combined
Attachment #664830 - Attachment is obsolete: true
Blocks: 758697
Here's a build in case someone wants to play around with this:
http://people.mozilla.com/~bgirard/spsbreakpad.apk
Just want to mention here that getting backtraces has become a very high priority for B2G.  Right now, we basically cannot profile native code, nor can we run heap profilers.  And we very badly need both of these sorts of tools.
> Just want to mention here that getting backtraces has become a very high priority for B2G.

I got kind of crummy but much-better-than-nothing stacks using _Unwind_Backtrace plus -funwind-tables, so we're no longer dying for this atm.
(In reply to Justin Lebar [:jlebar] from comment #30)
> I got kind of crummy but much-better-than-nothing stacks using
> _Unwind_Backtrace plus -funwind-tables, so we're no longer dying for this
> atm.

Unfortunately this doesn't work in SPS because there the backtrace is generated from within a signal handler and _Unwind_Backtrace() stops when it hits the handler's frame (see bug 810526).
What's blocking this from getting in?
We need to upstream a bunch of Breakpad patches, and I've just been dragged off on a bunch of other work repeatedly.
I've got updated patches for this that apply cleanly to the current mozilla-central repo, I haven't posted them yet because I wanted Julian to make sure they were still working as he had intended first.
As Ted mentions, we needed to land a bunch of Breakpad patches to
improve performance of Breakpad to a level usable for profiling.

But also, the core patches here refactored the profiler back end
so that unwinding is done in a different thread from the thread
that is being sampled.  Landing that all at once will break
sampling on Windows and possibly other targets (I'm not sure
which, though).  So I need to rewrite it so that both the old and
new code paths coexist, and we can incrementally transition from
old to new without breaking existing functionality in the
process.
I've started putting some of our patches upstream for review:
https://breakpad.appspot.com/510002/
https://breakpad.appspot.com/511002/
https://breakpad.appspot.com/511003/

I'm going to keep at this and try to get them all up, then pester jimb for reviews.
Looks like there's only https://breakpad.appspot.com/511003/ outstanding.

Can I help with the patch on the mozilla size?
I still have two more patches after this to get up:
http://breakpad.appspot.com/517002

Sorry, it took me a little longer than expected.
Comment on attachment 667968 [details] [diff] [review]
Builds/runs on android and x86_64-linux

Review of attachment 667968 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/TableTicker.cpp
@@ +223,5 @@
>      // being thread safe. Bug 750989.
>      t->SetPaused(true);
>  
>      // Get file path
> +#   if defined(SPS_PLAT_arm_android)

SPS_PLAT_arm_android doesn't mean the same thing as MOZ_WIDGET_ANDROID.
Blocks: 734691
Current WIP patch set:

   (710930) breakpad-local-patches-29-01-2013.cset
   (710932) sps-breakpad-22.cset
   (710936) sps-breakpad-22plus.cset

hg import them in that order.  This gives the existing
functionality unmodified, and gives the breakpad-based backend
for x64-linux and arm-android.  You need SPS_NEW=(anything) to
get the latter functionality.  I use

  ac_add_options --enable-optimize="-g -O"
  ac_add_options --disable-elf-hack
  ac_add_options --enable-profiling
  ac_add_options --disable-jemalloc
  ac_add_options --enable-valgrind

The last two are not necessary for end-use, just useful	for
debugging.
Assignee: nobody → jseward
Blocks: 788022
Attachment #710930 - Attachment is obsolete: true
Attachment #710932 - Attachment is obsolete: true
Attachment #710936 - Attachment is obsolete: true
Hmm, building with the rollup patch:

/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp: In function 'void spinLock_acquire(SpinLock*)':
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp:420: error: 'VALGRIND_HG_MUTEX_LOCK_PRE' was not declared in this scope
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp:427: error: 'VALGRIND_HG_MUTEX_LOCK_POST' was not declared in this scope
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp: In function 'void spinLock_release(SpinLock*)':
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp:433: error: 'VALGRIND_HG_MUTEX_UNLOCK_PRE' was not declared in this scope
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp:439: error: 'VALGRIND_HG_MUTEX_UNLOCK_POST' was not declared in this scope
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp: In function 'void release_full_buffer(ThreadProfile*, UnwinderThreadBuffer*, void*)':
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp:666: error: 'VALGRIND_MAKE_MEM_DEFINED' was not declared in this scope
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp: In function 'void* unwind_thr_fn(void*)':
/home/vladimir/proj/b2g-unagi/gecko/tools/profiler/UnwinderThread2.cpp:1180: error: 'VALGRIND_MAKE_MEM_UNDEFINED' was not declared in this scope


I don't have --enable-valgrind, is this stuff just depending on it?
> I don't have --enable-valgrind, is this stuff just depending on it?

Yes.  Either use that (tricky when crosscompiling), or try the
following (untested):

diff --git a/tools/profiler/UnwinderThread2.cpp b/tools/profiler/UnwinderThread2.cpp
--- a/tools/profiler/UnwinderThread2.cpp
+++ b/tools/profiler/UnwinderThread2.cpp
@@ -10,16 +10,21 @@
 #include <unistd.h>
 #include <pthread.h>
 #include <stdlib.h>
 #include <time.h>
 
 #ifdef MOZ_VALGRIND
 # include <valgrind/helgrind.h>
 # include <valgrind/memcheck.h>
+#else
+# define VALGRIND_HG_MUTEX_LOCK_PRE(_mx,_istry)  /* */
+# define VALGRIND_HG_MUTEX_LOCK_POST(_mx)        /* */
+# define VALGRIND_MAKE_MEM_DEFINED(_addr,_len)   /* */
+# define VALGRIND_MAKE_MEM_UNDEFINED(_addr,_len) /* */
 #endif
 
 // mmap
 #include <sys/mman.h>
 
 // Next 4 needed to keep TableTicker.h happy
 #include "sps_sampler.h"
 #include "platform.h"
Yep, just did that (roughly, had to do some do {} while() and removing some (void) bits)
The next step is to merge the patches in comments 44, 45, 49 and 50
into a single patch.  This should give something which

(1) provides breakpad unwinding on the initially supported platforms
    amd64-linux, arm-android and possibly OSX,

(2) provides the existing SPS functionality unchanged on all
    platforms, and

(3) does not break the build on any Tier-1 platform.

Then it can be tested on try and BenWa/others can start to review the
SPS side of it.
Rollup patch that applies against m-c of 14 Feb, that

* builds on all T-1 platforms: {x86,amd64}-linux, {x86,amd64}-darwin,
  x86-windows, arm-android

* provides breakpad based unwind on: {x86,amd64}-linux, arm-android
  (tested a few days ago), {x86,amd64}-darwin (maybe -- all the bits
  are hooked up, but needs further looking at)

This is the first version that is suitable for serious consideration
for landing.

Does not contain Mike Hommey's extra fixes yet (stack check overflow,
use FP unwind on ARM).
Attachment #712511 - Attachment is obsolete: true
Attachment #712926 - Attachment is obsolete: true
Attachment #713492 - Attachment is obsolete: true
Attachment #713495 - Attachment is obsolete: true
While running this on B2G, I had serious issues until I renamed the SaveProfileTask class to SaveProfileTask2 in TableTicker2.cpp.  TableTicker.cpp has the same class name, and it looked like one of its functions was being called even though the new (v2) one was enabled via env var.  Renaming the class made the problem go away.
Comment on attachment 714299 [details] [diff] [review]
Rollup patch for all Tier-1 platforms

Review of attachment 714299 [details] [diff] [review]:
-----------------------------------------------------------------

I've noted some follow-up bugs as self notes. I will likely address them myself.

::: toolkit/xre/Makefile.in
@@ +133,5 @@
>  
> +ifdef MOZ_ENABLE_PROFILER_SPS
> +SHARED_LIBRARY_LIBS += \
> +  $(DEPTH)/toolkit/crashreporter/google-breakpad/src/processor/$(LIB_PREFIX)breakpad_sps_common_s.$(LIB_SUFFIX) \
> +  $(DEPTH)/toolkit/crashreporter/google-breakpad/src/common/$(LIB_PREFIX)breakpad_common_s.$(LIB_SUFFIX) \

Wont this regress building without MOZ_ENABLE_PROFILER_SPS?

::: tools/profiler/TableTicker.cpp
@@ +81,5 @@
>  #endif
>  
>  static const int DYNAMIC_MAX_STRING = 512;
>  
> +//mozilla::ThreadLocal<PseudoStack *> tlsPseudoStack; // supplied by TableTicker2.cpp

Let's remove this too

@@ +82,5 @@
>  
>  static const int DYNAMIC_MAX_STRING = 512;
>  
> +//mozilla::ThreadLocal<PseudoStack *> tlsPseudoStack; // supplied by TableTicker2.cpp
> +static mozilla::ThreadLocal<TableTicker *> tlsTicker;

Perhaps this should also be supplied by TableTicker2.cpp.

Self followup: Rename static to gName.

@@ +87,5 @@
>  // We need to track whether we've been initialized otherwise
>  // we end up using tlsStack without initializing it.
>  // Because tlsStack is totally opaque to us we can't reuse
>  // it as the flag itself.
> +// bool stack_key_initialized; // supplied by TableTicker2.cpp

Let's remove this.

@@ +881,5 @@
>  }
>  #endif
>  
>  static
> +void doSampleStackTrace(PseudoStack *aStack, ThreadProfile &aProfile, TickSample *sample)

Self follow-up: This could use a better name

@@ +1003,5 @@
>  }
>  
> +bool sps_version2()
> +{
> +  static int version = 0; // Raced on, potentially

This should be initialized in sampler_init.

@@ +1007,5 @@
> +  static int version = 0; // Raced on, potentially
> +
> +  if (version == 0) {
> +    bool allow2 = true; // Is v2 allowable on this platform?
> +#   if defined(XP_WIN)

Has this been tested on mac? If not we should disable this for now.

::: tools/profiler/TableTicker2.cpp
@@ +1,1 @@
> +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */

Self followup: This file needs to be seperated. Perhaps once that done TableTicker1&2 can share a bit more code.

@@ +47,5 @@
> +
> +// Self
> +#include "TableTicker2.h"
> +
> +// HACK; remove me

If we plan on fixing that post landing this needs more information.

@@ +185,5 @@
> +  void doNativeBacktrace(ThreadProfile2 &aProfile, TickSample* aSample);
> +
> +private:
> +  // This represent the application's main thread (SAMPLER_INIT)
> +  ThreadProfile2 mPrimaryThreadProfile2;

This variable shouldn't be named 2.

@@ +322,5 @@
> +  b.DefineProperty(meta, "version", 2);
> +  b.DefineProperty(meta, "interval", interval());
> +  b.DefineProperty(meta, "stackwalk", mUseStackWalk);
> +  b.DefineProperty(meta, "jank", mJankOnly);
> +  b.DefineProperty(meta, "processType", XRE_GetProcessType());

personal follow up: Perhaps store the unwind mode in the meta data.

@@ +407,5 @@
> +// Everything in this section RUNS IN SIGHANDLER CONTEXT
> +
> +// RUNS IN SIGHANDLER CONTEXT
> +static
> +void genProfileEntry2(/*MOD*/UnwinderThreadBuffer* utb,

What's MOD?

@@ +478,5 @@
> +                               PseudoStack *aStack, TickSample *sample)
> +{
> +  // Call genProfileEntry2 to generate tags for each profile
> +  // entry.  Each entry will be bounded by a 'h' 'P' tag to
> +  // mark the start and a 'h' 'Q' tag to mark the end.

What do the letters stand for? They should be documented in sampler.h.

@@ +516,5 @@
> +  /* This could fail, if no buffers are currently available, in which
> +     case we must give up right away.  We cannot wait for a buffer to
> +     become available, as that risks deadlock. */
> +  if (!utb)
> +    return;

Should we have a way of reporting this is occuring? We could have a static count and report that in the profile meta data.

@@ +557,5 @@
> +  // settings in sUnwindMode and sUnwindInterval.
> +  // Add a native-backtrace request, or add pseudo backtrace entries,
> +  // or both.
> +  switch (sUnwindMode) {
> +    case UnwNATIVE: /* Native only */

Perhaps we can use 'getenv("MOZ_PROFILER_MODE")' to represent the max feature that the profiler will report and continue to honor mStackWalk. This means we can runtime switch what mode we use. This is useful for users to control how much of an overhead/memory they want to take while profiling without restarting. This would become && mStackWalk.

Edit: Or I just read below how we handle the interval. If the interval isn't specified we use the value passed to the profiler. We should do the same here for now.

@@ +712,5 @@
> +  const char* strM = getenv("MOZ_PROFILER_MODE");
> +  if (!strM) strM = getenv("SPS_MODE");
> +
> +  const char* strI = getenv("MOZ_PROFILER_INTERVAL");
> +  if (!strI) strI = getenv("SPS_INTERVAL");

Why are there 2 sets of variables?

@@ +755,5 @@
> +  LOGF("SPS:   This platform %s native unwinding.",
> +       nativeAvail ? "supports" : "does not support");
> +  LOG( "SPS: ");
> +  /* Re-set defaults */
> +  sUnwindMode       = nativeAvail ? UnwCOMBINED : UnwPSEUDO;

Likewise if this isn't provided it should use the value provided to ProfilerStart

@@ +756,5 @@
> +       nativeAvail ? "supports" : "does not support");
> +  LOG( "SPS: ");
> +  /* Re-set defaults */
> +  sUnwindMode       = nativeAvail ? UnwCOMBINED : UnwPSEUDO;
> +  sUnwindInterval   = 0;  /* We'll have to look elsewhere */

/* Use the value provided to ProfilerStart */

@@ +889,5 @@
> +const char** mozilla_sampler_get_features2()
> +{
> +  static const char* features[] = {
> +#if defined(MOZ_PROFILING)                                     \
> +    && (defined(USE_NS_STACKWALK) || defined(USE_BREAKPAD))

I don't see this defined in this patch. We should only report this feature where we expect this to have a reasonable chance at working.

@@ +1003,5 @@
> +}
> +
> +void mozilla_sampler_print_location2()
> +{
> +  // FIXME

These can be postponed for now since the default backend will still be mode1.

::: tools/profiler/TableTicker2.h
@@ +1,2 @@
> +
> +/* What does TableTicker2.cpp export?  Should anything be in here? */

TableTicker exports via sps_sampler.h. We shouldn't check this file in and as a follow up TableTicker should be split off into many files since it holds many classes.

::: tools/profiler/UnwinderThread2.cpp
@@ +64,5 @@
> +#define LOGLEVEL 2
> +
> +
> +// The 'else' of this covers the entire rest of the file
> +#if defined(SPS_PLAT_x86_windows)

NIT: We should move this out into a stub file.

@@ +201,5 @@
> +//// BEGIN type UnwindThreadBuffer
> +
> +typedef  unsigned int            UInt;  /* always 32-bit */
> +typedef  unsigned long int       UWord; /* machine word */
> +typedef  unsigned long long int  ULong; /* always 64-bit */

why not use stdint?

@@ +384,5 @@
> +
> +
> +/* Implement machine-word sized atomic compare-and-swap. */
> +/* return 1 if success, 0 if failure */
> +static int do_CASW ( UWord* addr, UWord expected, UWord nyu )

I'm can't provide a good review for the do_* functions. Julian do you think we need to pull in someone who understand this better to review this part?

@@ +562,5 @@
> +   for (i = 0; i < g_stackLimitsUsed; i++) {
> +      /* check for duplicate registration */
> +      assert(g_stackLimits[i].thrId != me);
> +   }
> +   assert(g_stackLimitsUsed < N_SAMPLING_THREADS);

We should handle this failure.

@@ +662,5 @@
> +  buff->state = S_FILLING;
> +  buff->seqNo = g_seqNo;
> +  g_seqNo++;
> +
> +  if (0)

Should this be conditional on a log level?

@@ +823,5 @@
> +
> +
> +// RUNS IN SIGHANDLER CONTEXT
> +void
> +utb_add_profent(/*MOD*/UnwinderThreadBuffer* utb, ProfileEntry2 ent)

profile_entry?

@@ +1469,5 @@
> +
> +/* Find out, in a platform-dependent way, where the code modules got
> +   mapped in the process' virtual address space, and add them to
> +   |mods_|. */
> +static void read_procmaps ( std::vector<MyCodeModule*>& mods_ )

Why is this not using shared-libraries-*.cc? procmaps doesn't work with elfhack.

@@ +1723,5 @@
> +     debug_dirs.push_back("/usr/lib/debug/usr/lib");
> +     debug_dirs.push_back("/usr/lib/debug/lib/x86_64-linux-gnu");
> +     debug_dirs.push_back("/usr/lib/debug/usr/lib/x86_64-linux-gnu");
> +#    elif defined(SPS_PLAT_arm_android)
> +     debug_dirs.push_back("/sdcard/symbols/system/lib");

I wonder if it's worth doing a followup to provide these over the debug protocol. I guess it depends how much this differs from phone to phone.

@@ +1769,5 @@
> +    return;
> +  }
> +
> +  if (n_frames > 0) {
> +    //buff->aProfile->addTag(ProfileEntry2('s', "(root)"));

This should be provided by the pseudostack and can be removed right?

@@ +1838,5 @@
> +  }
> +
> +  delete stack;
> +  delete sw;
> +  //delete modules;

Is this delete elsewhere?

::: tools/profiler/UnwinderThread2.h
@@ +24,5 @@
> +void utb__addEntry(/*MOD*/UnwinderThreadBuffer* utb,
> +                   ProfileEntry2 ent);
> +
> +// Create the unwinder thread.  At the moment there can be only one.
> +void uwt__init();

What does the double underscore represent?

::: tools/profiler/local_debug_info_symbolizer.cc
@@ +26,5 @@
> +
> +namespace google_breakpad {
> +
> +LocalDebugInfoSymbolizer::~LocalDebugInfoSymbolizer() {
> +# if !defined(SPS_PLAT_x86_windows)

NIT: This has a lot of ifdef, we could split out a stub and a non stub version.

::: tools/profiler/platform-macos.cc
@@ +129,5 @@
>    // This is also initialized by the first argument to pthread_create() but we
>    // don't know which thread will run first (the original thread or the new
>    // one) so we initialize it here too.
> +
> +  // BEGIN temp hack for SPS v1-vs-v2

This could use a description on how to get the hack removed such as a follow bug #

::: tools/profiler/shim_mac_dump_syms.mm
@@ +9,5 @@
> +                    google_breakpad::Module** module)
> +{
> +  google_breakpad::DumpSymbols ds(symbol_data);
> +
> +  NSString* obj_file_ns = [NSString stringWithUTF8String:obj_file.c_str()];

Does ds.Read release this?

::: tools/profiler/sps_sampler.h
@@ +48,5 @@
>  
> +/* Returns true if env var SPS_NEW is set to anything, else false. */
> +extern bool sps_version2();
> +
> +#define SAMPLER_INIT() \

Follow-up bug: Replace these macros with inline functions
Please roll up something similar in your patch.
(In reply to Benoit Girard (:BenWa) from comment #55)
This doesn't seem to apply against (m-c + current rollup patch from c52).
In any case the rollup patch already contains what I believe is a suitable
fix.
In reply to Comment on attachment 714299 [details] [diff] [review] (comment 54)

> ::: toolkit/xre/Makefile.in
> @@ +133,5 @@
> >  
> > +ifdef MOZ_ENABLE_PROFILER_SPS
> > +SHARED_LIBRARY_LIBS += \
> > +  $(DEPTH)/toolkit/crashreporter/google-breakpad/src/processor/$(LIB_PREFIX)breakpad_sps_common_s.$(LIB_SUFFIX) \
> > +  $(DEPTH)/toolkit/crashreporter/google-breakpad/src/common/$(LIB_PREFIX)breakpad_common_s.$(LIB_SUFFIX) \
> 
> Wont this regress building without MOZ_ENABLE_PROFILER_SPS?

Don't know.  Need to check.  In any case this all needs to pass try before it
can land, and I don't believe try won't catch this case.

------

> ::: tools/profiler/TableTicker.cpp
> @@ +81,5 @@
> >  #endif
> >  
> >  static const int DYNAMIC_MAX_STRING = 512;
> >  
> > +//mozilla::ThreadLocal<PseudoStack *> tlsPseudoStack; // supplied by TableTicker2.cpp
> 
> Let's remove this too

Done

------

> @@ +82,5 @@
> >  
> >  static const int DYNAMIC_MAX_STRING = 512;
> >  
> > +//mozilla::ThreadLocal<PseudoStack *> tlsPseudoStack; // supplied by TableTicker2.cpp
> > +static mozilla::ThreadLocal<TableTicker *> tlsTicker;
> 
> Perhaps this should also be supplied by TableTicker2.cpp.

It's local to this file.  I'd prefer to minimise the amount of effort
on code sharing between TableTicker.cpp and TableTicker2.cpp since the
former will get removed at some point anyway.

------

> Self followup: Rename static to gName.
> 
> @@ +87,5 @@
> >  // We need to track whether we've been initialized otherwise
> >  // we end up using tlsStack without initializing it.
> >  // Because tlsStack is totally opaque to us we can't reuse
> >  // it as the flag itself.
> > +// bool stack_key_initialized; // supplied by TableTicker2.cpp
> 
> Let's remove this.

Done

-------

> @@ +1003,5 @@
> >  }
> >  
> > +bool sps_version2()
> > +{
> > +  static int version = 0; // Raced on, potentially
> 
> This should be initialized in sampler_init.

It's too late by then.  It is consulted by SAMPLER_INIT() in order to
decide whether to go to sampler_init1() or sampler_init2().  When
TableTicker.cpp goes away, so will this variable, the routine that
sets it, sps_version(), and the macros.  This is just temporary
scaffolding.

------

> @@ +1007,5 @@
> > +  static int version = 0; // Raced on, potentially
> > +
> > +  if (version == 0) {
> > +    bool allow2 = true; // Is v2 allowable on this platform?
> > +#   if defined(XP_WIN)
> 
> Has this been tested on mac? If not we should disable this for now.

It's been tested, it doesn't crash, but I don't believe it works
properly, and needs investigation.  Fixed -- restricted to 
amd64-linux, x86-linux, arm-android for the time being.

------
 
> ::: tools/profiler/TableTicker2.cpp
> @@ +1,1 @@
> > +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> 
> Self followup: This file needs to be seperated. Perhaps once that done
> TableTicker1&2 can share a bit more code.
> 
> @@ +47,5 @@
> > +
> > +// Self
> > +#include "TableTicker2.h"
> > +
> > +// HACK; remove me
> 
> If we plan on fixing that post landing this needs more information.

There's a few assertions (3) in this file that I'd like to keep.  Is
there a standard 'assertion policy' for this part of the code base,
that I should observe?

------
 
> @@ +185,5 @@
> > +  void doNativeBacktrace(ThreadProfile2 &aProfile, TickSample* aSample);
> > +
> > +private:
> > +  // This represent the application's main thread (SAMPLER_INIT)
> > +  ThreadProfile2 mPrimaryThreadProfile2;
> 
> This variable shouldn't be named 2.

Done

------

> @@ +322,5 @@
> > +  b.DefineProperty(meta, "version", 2);
> > +  b.DefineProperty(meta, "interval", interval());
> > +  b.DefineProperty(meta, "stackwalk", mUseStackWalk);
> > +  b.DefineProperty(meta, "jank", mJankOnly);
> > +  b.DefineProperty(meta, "processType", XRE_GetProcessType());
> 
> personal follow up: Perhaps store the unwind mode in the meta data.

------

> @@ +407,5 @@
> > +// Everything in this section RUNS IN SIGHANDLER CONTEXT
> > +
> > +// RUNS IN SIGHANDLER CONTEXT
> > +static
> > +void genProfileEntry2(/*MOD*/UnwinderThreadBuffer* utb,
> 
> What's MOD?

Modified (by this function).  Iow, is neither a pure in- nor out-
parameter.  I changed it to MODIFIED.

------

> @@ +478,5 @@
> > +                               PseudoStack *aStack, TickSample *sample)
> > +{
> > +  // Call genProfileEntry2 to generate tags for each profile
> > +  // entry.  Each entry will be bounded by a 'h' 'P' tag to
> > +  // mark the start and a 'h' 'Q' tag to mark the end.
> 
> What do the letters stand for? They should be documented in
> sampler.h.

'h' is "hint", associated with mTagChar.  For mTagChar 'P' this is
the start of a profile entry, and 'Q' is the end.  I think this is
needed so as to make interleaving (pseudo + native entries) work
in the separate-thread scenario.

They are partially documented in ProfileEntry2.cpp lines 84-92.

This needs to be cleaned up anyway, so as to have a compiler
enforced mapping between tag chars and union variant fields.  ATM
it is ad-hoc.  OK to improve the documentation and fix the code
in a followup?

------

> @@ +516,5 @@
> > +  /* This could fail, if no buffers are currently available, in which
> > +     case we must give up right away.  We cannot wait for a buffer to
> > +     become available, as that risks deadlock. */
> > +  if (!utb)
> > +    return;
> 
> Should we have a way of reporting this is occuring? We could have a static
> count and report that in the profile meta data.

This is reported, but it can't be reported at the point you show
because involves calling printf/logging functions and risking
deadlock.

The unwinder thread keeps track of how many attempts we made to get a
buffer but failed, and prints this periodically in the log -- every
1000 samples.  So it's easy to see if the unwinder thread is falling
behind.  Typically it misses several hundred samples at startup whilst
reading the CFI from libxul.so, but is OK after that.  eg:

Profiler: BPUnw: 19000 total samples, 1090 failed due to buffer unavail

------

> @@ +557,5 @@
> > +  // settings in sUnwindMode and sUnwindInterval.
> > +  // Add a native-backtrace request, or add pseudo backtrace entries,
> > +  // or both.
> > +  switch (sUnwindMode) {
> > +    case UnwNATIVE: /* Native only */
> 
> Perhaps we can use 'getenv("MOZ_PROFILER_MODE")' to represent the max feature
> that the profiler will report and continue to honor mStackWalk. This means we
> can runtime switch what mode we use. This is useful for users to control how
> much of an overhead/memory they want to take while profiling without
> restarting. This would become && mStackWalk.
> 
> Edit: Or I just read below how we handle the interval. If the interval isn't
> specified we use the value passed to the profiler. We should do the same here
> for now.

Can we do a followup where we (re)design how the profiler is to be
controlled, and implement?

------

> @@ +712,5 @@
> > +  const char* strM = getenv("MOZ_PROFILER_MODE");
> > +  if (!strM) strM = getenv("SPS_MODE");
> > +
> > +  const char* strI = getenv("MOZ_PROFILER_INTERVAL");
> > +  if (!strI) strI = getenv("SPS_INTERVAL");
> 
> Why are there 2 sets of variables?

Not sure.  It may be that I thought that the right way was to have
MOZ_FOO_BAR, but I knew I would never remember the names, so added
SPS_BAR too.  This should be rationalised, although not now.

------

> @@ +755,5 @@
> > +  LOGF("SPS:   This platform %s native unwinding.",
> > +       nativeAvail ? "supports" : "does not support");
> > +  LOG( "SPS: ");
> > +  /* Re-set defaults */
> > +  sUnwindMode       = nativeAvail ? UnwCOMBINED : UnwPSEUDO;
> 
> Likewise if this isn't provided it should use the value provided to
> ProfilerStart
> 
> @@ +756,5 @@
> > +       nativeAvail ? "supports" : "does not support");
> > +  LOG( "SPS: ");
> > +  /* Re-set defaults */
> > +  sUnwindMode       = nativeAvail ? UnwCOMBINED : UnwPSEUDO;
> > +  sUnwindInterval   = 0;  /* We'll have to look elsewhere */
> 
> /* Use the value provided to ProfilerStart */

Ditto comments about do this in a followup.

------

> @@ +889,5 @@
> > +const char** mozilla_sampler_get_features2()
> > +{
> > +  static const char* features[] = {
> > +#if defined(MOZ_PROFILING)                                     \
> > +    && (defined(USE_NS_STACKWALK) || defined(USE_BREAKPAD))

> I don't see this defined in this patch. We should only report this feature
> where we expect this to have a reasonable chance at working.

True.  I changed this to be keyed on 
if defined(MOZ_PROFILING) && defined(HAVE_NATIVE_UNWIND).  IMO, we
should change "stackwalk" to some other name as it is to me ambiguous
(does it imply pseudostack or nativestack?  I don't know)

------

> @@ +1003,5 @@
> > +}
> > +
> > +void mozilla_sampler_print_location2()
> > +{
> > +  // FIXME
> 
> These can be postponed for now since the default backend will still
> be mode1.

------

> ::: tools/profiler/TableTicker2.h
> @@ +1,2 @@
> > +
> > +/* What does TableTicker2.cpp export?  Should anything be in here? */
> 
> TableTicker exports via sps_sampler.h. We shouldn't check this file in and as a
> follow up TableTicker should be split off into many files since it holds many
> classes.

Done 

------
> 
> ::: tools/profiler/UnwinderThread2.cpp
> @@ +64,5 @@
> > +#define LOGLEVEL 2
> > +
> > +
> > +// The 'else' of this covers the entire rest of the file
> > +#if defined(SPS_PLAT_x86_windows)
> 
> NIT: We should move this out into a stub file.

This is only scaffolding to get it to compile right now.  It will
have to get reworked anyway once we start using breakpad on Windows.

------

> 
> @@ +201,5 @@
> > +//// BEGIN type UnwindThreadBuffer
> > +
> > +typedef  unsigned int            UInt;  /* always 32-bit */
> > +typedef  unsigned long int       UWord; /* machine word */
> > +typedef  unsigned long long int  ULong; /* always 64-bit */
> 
> why not use stdint?

Doesn't always exist in some MSVC versions (AFAIU).  That said, these
definitions are still problematic and will need to be fixed once we
start compiling this section of the code on Windows, which we
currently don't.

------

> @@ +384,5 @@
> > +
> > +
> > +/* Implement machine-word sized atomic compare-and-swap. */
> > +/* return 1 if success, 0 if failure */
> > +static int do_CASW ( UWord* addr, UWord expected, UWord nyu )
> 
> I'm can't provide a good review for the do_* functions. Julian do you think we
> need to pull in someone who understand this better to review this
> part?

cjones perhaps.

This scheme with the memory barriers isn't exactly correct (I've come
to understand) and will need a followup to fix it.  I think it's OK
for now though.

------

> @@ +562,5 @@
> > +   for (i = 0; i < g_stackLimitsUsed; i++) {
> > +      /* check for duplicate registration */
> > +      assert(g_stackLimits[i].thrId != me);
> > +   }
> > +   assert(g_stackLimitsUsed < N_SAMPLING_THREADS);
> 
> We should handle this failure.

Yeah .. not easy since this is accessed with the buffer-coordinating
spinlock held and so we can't dynamically reallocate it.  My crude
plan was to set N_SAMPLING_THREADS high enough (eg, 100) to cover
most eventualities, at least at first.

------

> 
> @@ +662,5 @@
> > +  buff->state = S_FILLING;
> > +  buff->seqNo = g_seqNo;
> > +  g_seqNo++;
> > +
> > +  if (0)
> 
> Should this be conditional on a log level?

No, it's a debugging printf.  Deleted.

------

> @@ +823,5 @@
> > +
> > +
> > +// RUNS IN SIGHANDLER CONTEXT
> > +void
> > +utb_add_profent(/*MOD*/UnwinderThreadBuffer* utb, ProfileEntry2 ent)
> 
> profile_entry?

I've used 'ent' to mean a profile entry in lots of places .. I think
using profile_entry would make it quite verbose.

------

> @@ +1469,5 @@
> > +
> > +/* Find out, in a platform-dependent way, where the code modules got
> > +   mapped in the process' virtual address space, and add them to
> > +   |mods_|. */
> > +static void read_procmaps ( std::vector<MyCodeModule*>& mods_ )
> 
> Why is this not using shared-libraries-*.cc? procmaps doesn't work with
> elfhack.

Because I wasn't aware of shared-libraries-*.cc until you pointed it
out.  We should definitely move to this in a followup.

------

> 
> @@ +1723,5 @@
> > +     debug_dirs.push_back("/usr/lib/debug/usr/lib");
> > +     debug_dirs.push_back("/usr/lib/debug/lib/x86_64-linux-gnu");
> > +     debug_dirs.push_back("/usr/lib/debug/usr/lib/x86_64-linux-gnu");
> > +#    elif defined(SPS_PLAT_arm_android)
> > +     debug_dirs.push_back("/sdcard/symbols/system/lib");
> 
> I wonder if it's worth doing a followup to provide these over the debug
> protocol. I guess it depends how much this differs from phone to phone.

Definitely.  What we have at the moment is just a getting-started hack.

------

> @@ +1769,5 @@
> > +    return;
> > +  }
> > +
> > +  if (n_frames > 0) {
> > +    //buff->aProfile->addTag(ProfileEntry2('s', "(root)"));
> 
> This should be provided by the pseudostack and can be removed right?

Yes it can be removed (done).  This entry is made now by
unwind_thr_fn(), when doing stack merging.

------

> 
> @@ +1838,5 @@
> > +  }
> > +
> > +  delete stack;
> > +  delete sw;
> > +  //delete modules;
> 
> Is this delete elsewhere?

Err, that variable doesn't even exist any more.  Removed.

------

> ::: tools/profiler/UnwinderThread2.h
> @@ +24,5 @@
> > +void utb__addEntry(/*MOD*/UnwinderThreadBuffer* utb,
> > +                   ProfileEntry2 ent);
> > +
> > +// Create the unwinder thread.  At the moment there can be only one.
> > +void uwt__init();
> 
> What does the double underscore represent?

Nothing particularly, just serves to seperate the common prefix 'uwt'
from the rest of these funciton names.

------

> ::: tools/profiler/local_debug_info_symbolizer.cc
> @@ +26,5 @@
> > +
> > +namespace google_breakpad {
> > +
> > +LocalDebugInfoSymbolizer::~LocalDebugInfoSymbolizer() {
> > +# if !defined(SPS_PLAT_x86_windows)
> 
> NIT: This has a lot of ifdef, we could split out a stub and a non
> stub version.

This is (again) just a temp hack to get it to compile on Windows.
When we come to use breakpad for unwinding on Windows this will have
to be reworked anyway.

------

> ::: tools/profiler/platform-macos.cc
> @@ +129,5 @@
> >    // This is also initialized by the first argument to pthread_create() but we
> >    // don't know which thread will run first (the original thread or the new
> >    // one) so we initialize it here too.
> > +
> > +  // BEGIN temp hack for SPS v1-vs-v2
> 
> This could use a description on how to get the hack removed such as a follow
> bug #

Can only be removed when SPS v1 disappears.

------

> ::: tools/profiler/shim_mac_dump_syms.mm
> @@ +9,5 @@
> > +                    google_breakpad::Module** module)
> > +{
> > +  google_breakpad::DumpSymbols ds(symbol_data);
> > +
> > +  NSString* obj_file_ns = [NSString stringWithUTF8String:obj_file.c_str()];
> 
> Does ds.Read release this?

No, and the ObjC runtime system (or some such) complains loudly about
leakage when this runs on OSX.  Will need to be fixed when OSX-native
unwinding is re-enabled.

------

> ::: tools/profiler/sps_sampler.h
> @@ +48,5 @@
> >  
> > +/* Returns true if env var SPS_NEW is set to anything, else false. */
> > +extern bool sps_version2();
> > +
> > +#define SAMPLER_INIT() \
> 
> Follow-up bug: Replace these macros with inline functions

OK.  I don't think my patch introduced these macros; they were
already present.
Also I've been accumulating a growing list of followups.  Here it
is so far.  LMK what I forgot ..

Urgent:
  add MikeH's overflow-fix patch
    -- done, will be in sps-breakpad-25
  add build fixes for OSX (779291 c 55)
    -- patch doesn't seem to apply, plus it is already in -24
       afaics

Less urgent, small:
  do_SFENCE, do_LFENCE: redo this properly
  re-enable native unwind for MacOS
  clean up and document ProfileEntry
  add MikeH's frame-pointer unwind patch (but needs to do the
    ARM/Thumb thing properly, which Ted said he might look at)
  get rid of fixed limit on number of sampling threads (N_SAMPLING_THREADS)

Less urgent, large:
  make breakpad-windows work
  fixes for how the mode and sample interval are controlled
  use shared-libraries-*.cc
  fixes for how/where debuginfo objects are found
(In reply to Julian Seward from comment #57)
> > @@ +47,5 @@
> > > +
> > > +// Self
> > > +#include "TableTicker2.h"
> > > +
> > > +// HACK; remove me
> > 
> > If we plan on fixing that post landing this needs more information.
> 
> There's a few assertions (3) in this file that I'd like to keep.  Is
> there a standard 'assertion policy' for this part of the code base,
> that I should observe?
> 

What we use in gfx which I followed here is an assert should never be reachable. For the profiler we should use MOZ_ASSERT. This should let you remove those '#undef NDEBUG' and include assert lines.

> > @@ +478,5 @@
> > > +                               PseudoStack *aStack, TickSample *sample)
> > > +{
> > > +  // Call genProfileEntry2 to generate tags for each profile
> > > +  // entry.  Each entry will be bounded by a 'h' 'P' tag to
> > > +  // mark the start and a 'h' 'Q' tag to mark the end.
> > 
> > What do the letters stand for? They should be documented in
> > sampler.h.
> 
> 'h' is "hint", associated with mTagChar.  For mTagChar 'P' this is
> the start of a profile entry, and 'Q' is the end.  I think this is
> needed so as to make interleaving (pseudo + native entries) work
> in the separate-thread scenario.
> 
> They are partially documented in ProfileEntry2.cpp lines 84-92.
> 
> This needs to be cleaned up anyway, so as to have a compiler
> enforced mapping between tag chars and union variant fields.  ATM
> it is ad-hoc.  OK to improve the documentation and fix the code
> in a followup?
> 

yes that's fine

> ------
> 
> > @@ +516,5 @@
> > > +  /* This could fail, if no buffers are currently available, in which
> > > +     case we must give up right away.  We cannot wait for a buffer to
> > > +     become available, as that risks deadlock. */
> > > +  if (!utb)
> > > +    return;
> > 
> > Should we have a way of reporting this is occuring? We could have a static
> > count and report that in the profile meta data.
> 
> This is reported, but it can't be reported at the point you show
> because involves calling printf/logging functions and risking
> deadlock.
> 
> The unwinder thread keeps track of how many attempts we made to get a
> buffer but failed, and prints this periodically in the log -- every
> 1000 samples.  So it's easy to see if the unwinder thread is falling
> behind.  Typically it misses several hundred samples at startup whilst
> reading the CFI from libxul.so, but is OK after that.  eg:
> 
> Profiler: BPUnw: 19000 total samples, 1090 failed due to buffer unavail
> 

Ok, as a follow up perhaps we can include that in the profile save and have the UI print warnings if it detect something that would indicate the profile was not accurate.

> ------
> 
> > @@ +557,5 @@
> > > +  // settings in sUnwindMode and sUnwindInterval.
> > > +  // Add a native-backtrace request, or add pseudo backtrace entries,
> > > +  // or both.
> > > +  switch (sUnwindMode) {
> > > +    case UnwNATIVE: /* Native only */
> > 
> > Perhaps we can use 'getenv("MOZ_PROFILER_MODE")' to represent the max feature
> > that the profiler will report and continue to honor mStackWalk. This means we
> > can runtime switch what mode we use. This is useful for users to control how
> > much of an overhead/memory they want to take while profiling without
> > restarting. This would become && mStackWalk.
> > 
> > Edit: Or I just read below how we handle the interval. If the interval isn't
> > specified we use the value passed to the profiler. We should do the same here
> > for now.
> 
> Can we do a followup where we (re)design how the profiler is to be
> controlled, and implement?
> 

yes that's fine.

> ------
> 
> > @@ +712,5 @@
> > > +  const char* strM = getenv("MOZ_PROFILER_MODE");
> > > +  if (!strM) strM = getenv("SPS_MODE");
> > > +
> > > +  const char* strI = getenv("MOZ_PROFILER_INTERVAL");
> > > +  if (!strI) strI = getenv("SPS_INTERVAL");
> > 
> > Why are there 2 sets of variables?
> 
> Not sure.  It may be that I thought that the right way was to have
> MOZ_FOO_BAR, but I knew I would never remember the names, so added
> SPS_BAR too.  This should be rationalised, although not now.
> 

I think it's worth picking one format before landing. This stuff tends to be really painful to fix once scripts depend on these.

> 
> > @@ +889,5 @@
> > > +const char** mozilla_sampler_get_features2()
> > > +{
> > > +  static const char* features[] = {
> > > +#if defined(MOZ_PROFILING)                                     \
> > > +    && (defined(USE_NS_STACKWALK) || defined(USE_BREAKPAD))
> 
> > I don't see this defined in this patch. We should only report this feature
> > where we expect this to have a reasonable chance at working.
> 
> True.  I changed this to be keyed on 
> if defined(MOZ_PROFILING) && defined(HAVE_NATIVE_UNWIND).  IMO, we
> should change "stackwalk" to some other name as it is to me ambiguous
> (does it imply pseudostack or nativestack?  I don't know)
> 

I've been using stackwalk to only mean nativestack. So feel free to use nativestack or something similar going forward.

> > 
> > @@ +201,5 @@
> > > +//// BEGIN type UnwindThreadBuffer
> > > +
> > > +typedef  unsigned int            UInt;  /* always 32-bit */
> > > +typedef  unsigned long int       UWord; /* machine word */
> > > +typedef  unsigned long long int  ULong; /* always 64-bit */
> > 
> > why not use stdint?
> 
> Doesn't always exist in some MSVC versions (AFAIU).  That said, these
> definitions are still problematic and will need to be fixed once we
> start compiling this section of the code on Windows, which we
> currently don't.

I think the mfbt/StandardInteger.h can be relied upon everywhere.

> 
> ------
> 
> > @@ +562,5 @@
> > > +   for (i = 0; i < g_stackLimitsUsed; i++) {
> > > +      /* check for duplicate registration */
> > > +      assert(g_stackLimits[i].thrId != me);
> > > +   }
> > > +   assert(g_stackLimitsUsed < N_SAMPLING_THREADS);
> > 
> > We should handle this failure.
> 
> Yeah .. not easy since this is accessed with the buffer-coordinating
> spinlock held and so we can't dynamically reallocate it.  My crude
> plan was to set N_SAMPLING_THREADS high enough (eg, 100) to cover
> most eventualities, at least at first.
> 

If we can't handle this initially I think we should run time assert. On optimize builds (where we will profile) we wont trip this assert and I believe it will cause corruption. We shouldn't silently ignore this.

> 
> ------
> 
> > @@ +823,5 @@
> > > +
> > > +
> > > +// RUNS IN SIGHANDLER CONTEXT
> > > +void
> > > +utb_add_profent(/*MOD*/UnwinderThreadBuffer* utb, ProfileEntry2 ent)
> > 
> > profile_entry?
> 
> I've used 'ent' to mean a profile entry in lots of places .. I think
> using profile_entry would make it quite verbose.

maybe prof_ent/profEnt then. profent is very confusing, I googled it thinking it was a single word.

> 
> ------
> 
> > @@ +1469,5 @@
> > > +
> > > +/* Find out, in a platform-dependent way, where the code modules got
> > > +   mapped in the process' virtual address space, and add them to
> > > +   |mods_|. */
> > > +static void read_procmaps ( std::vector<MyCodeModule*>& mods_ )
> > 
> > Why is this not using shared-libraries-*.cc? procmaps doesn't work with
> > elfhack.
> 
> Because I wasn't aware of shared-libraries-*.cc until you pointed it
> out.  We should definitely move to this in a followup.
> 

I think we should either fix before landing this or have an obvious failure if this code is hit and elfhack enabled. There requirements to use this are already complex. We should try to avoid any pitfalls that are going to cause users to lose time debugging.


> > ::: tools/profiler/UnwinderThread2.h
> > @@ +24,5 @@
> > > +void utb__addEntry(/*MOD*/UnwinderThreadBuffer* utb,
> > > +                   ProfileEntry2 ent);
> > > +
> > > +// Create the unwinder thread.  At the moment there can be only one.
> > > +void uwt__init();
> > 
> > What does the double underscore represent?
> 
> Nothing particularly, just serves to seperate the common prefix 'uwt'
> from the rest of these funciton names.
> 

I'd vote to remove it then if we don't need it to prevent collision with another symbol.

> 
> > ::: tools/profiler/shim_mac_dump_syms.mm
> > @@ +9,5 @@
> > > +                    google_breakpad::Module** module)
> > > +{
> > > +  google_breakpad::DumpSymbols ds(symbol_data);
> > > +
> > > +  NSString* obj_file_ns = [NSString stringWithUTF8String:obj_file.c_str()];
> > 
> > Does ds.Read release this?
> 
> No, and the ObjC runtime system (or some such) complains loudly about
> leakage when this runs on OSX.  Will need to be fixed when OSX-native
> unwinding is re-enabled.
> 

Why not add [obj_file_ns release]; then? Or at the very least a TODO.

> ------
> 
> > ::: tools/profiler/sps_sampler.h
> > @@ +48,5 @@
> > >  
> > > +/* Returns true if env var SPS_NEW is set to anything, else false. */
> > > +extern bool sps_version2();
> > > +
> > > +#define SAMPLER_INIT() \
> > 
> > Follow-up bug: Replace these macros with inline functions
> 
> OK.  I don't think my patch introduced these macros; they were
> already present.

Yes, this was just a self note. I've been waiting for these patches to land to finally clean this up :).
(In reply to Benoit Girard (:BenWa) from comment #59)

> > > @@ +47,5 @@
> > > > +
> > > > +// Self
> > > > +#include "TableTicker2.h"
> > > > +
> > > > +// HACK; remove me
> > > 
> > > If we plan on fixing that post landing this needs more information.
> > 
> > There's a few assertions (3) in this file that I'd like to keep.  Is
> > there a standard 'assertion policy' for this part of the code base,
> > that I should observe?
> > 
> 
> What we use in gfx which I followed here is an assert should never be
> reachable. For the profiler we should use MOZ_ASSERT. This should let
> you remove those '#undef NDEBUG' and include assert lines.

Done

------

> > > @@ +712,5 @@
> > > > +  const char* strM = getenv("MOZ_PROFILER_MODE");
> > > > +  if (!strM) strM = getenv("SPS_MODE");
> > > > +
> > > > +  const char* strI = getenv("MOZ_PROFILER_INTERVAL");
> > > > +  if (!strI) strI = getenv("SPS_INTERVAL");
> > > 
> > > Why are there 2 sets of variables?
> > 
> > Not sure.  It may be that I thought that the right way was to have
> > MOZ_FOO_BAR, but I knew I would never remember the names, so added
> > SPS_BAR too.  This should be rationalised, although not now.
> 
> I think it's worth picking one format before landing. This stuff tends
> to be really painful to fix once scripts depend on these.

Done -- deleted SPS_FOO, retained MOZ_PROFILER_FOO

------

> > > @@ +201,5 @@
> > > > +//// BEGIN type UnwindThreadBuffer
> > > > +
> > > > +typedef  unsigned int            UInt;  /* always 32-bit */
> > > > +typedef  unsigned long int       UWord; /* machine word */
> > > > +typedef  unsigned long long int  ULong; /* always 64-bit */
> > > 
> > > why not use stdint?
> > 
> > Doesn't always exist in some MSVC versions (AFAIU).  That said, these
> > definitions are still problematic and will need to be fixed once we
> > start compiling this section of the code on Windows, which we
> > currently don't.
> 
> I think the mfbt/StandardInteger.h can be relied upon everywhere.

Done

------

> > > @@ +562,5 @@
> > > > +   for (i = 0; i < g_stackLimitsUsed; i++) {
> > > > +      /* check for duplicate registration */
> > > > +      assert(g_stackLimits[i].thrId != me);
> > > > +   }
> > > > +   assert(g_stackLimitsUsed < N_SAMPLING_THREADS);
> > > 
> > > We should handle this failure.
> > 
> > Yeah .. not easy since this is accessed with the buffer-coordinating
> > spinlock held and so we can't dynamically reallocate it.  My crude
> > plan was to set N_SAMPLING_THREADS high enough (eg, 100) to cover
> > most eventualities, at least at first.
> > 
> 
> If we can't handle this initially I think we should run time
> assert. On optimize builds (where we will profile) we wont trip this
> assert and I believe it will cause corruption. We shouldn't silently
> ignore this.

Done (runtime assert, incl on opt builds)

------

> > > @@ +823,5 @@
> > > > +
> > > > +
> > > > +// RUNS IN SIGHANDLER CONTEXT
> > > > +void
> > > > +utb_add_profent(/*MOD*/UnwinderThreadBuffer* utb, ProfileEntry2 ent)
> > > 
> > > profile_entry?
> > 
> > I've used 'ent' to mean a profile entry in lots of places .. I think
> > using profile_entry would make it quite verbose.
> 
> maybe prof_ent/profEnt then. profent is very confusing, I googled it
> thinking it was a single word.

Done (changed to prof_ent)

------

> > > @@ +1469,5 @@
> > > > +
> > > > +/* Find out, in a platform-dependent way, where the code modules got
> > > > +   mapped in the process' virtual address space, and add them to
> > > > +   |mods_|. */
> > > > +static void read_procmaps ( std::vector<MyCodeModule*>& mods_ )
> > > 
> > > Why is this not using shared-libraries-*.cc? procmaps doesn't work with
> > > elfhack.
> > 
> > Because I wasn't aware of shared-libraries-*.cc until you pointed it
> > out.  We should definitely move to this in a followup.
> > 
> 
> I think we should either fix before landing this or have an obvious
> failure if this code is hit and elfhack enabled. There requirements to
> use this are already complex. We should try to avoid any pitfalls that
> are going to cause users to lose time debugging.

configure.in automatically disables elfhack when --enable-profiling is
given.  But for added safety I also added a check at profiler start, 
in sps_version2(), which complains in the log if we do appear to have
an elfhacked build, and drops back to 'version 1'.

------

> > > ::: tools/profiler/shim_mac_dump_syms.mm
> > > @@ +9,5 @@
> > > > +                    google_breakpad::Module** module)
> > > > +{
> > > > +  google_breakpad::DumpSymbols ds(symbol_data);
> > > > +
> > > > +  NSString* obj_file_ns = [NSString stringWithUTF8String:obj_file.c_str()];
> > > 
> > > Does ds.Read release this?
> > 
> > No, and the ObjC runtime system (or some such) complains loudly about
> > leakage when this runs on OSX.  Will need to be fixed when OSX-native
> > unwinding is re-enabled.
> > 
> 
> Why not add [obj_file_ns release]; then? Or at the very least a TODO.

Done (added TODO).  Can verify proper fix when the OSX support is reenabled.
Comment on attachment 716003 [details] [diff] [review]
Rollup patch, including fixes for BenWa feedback up to & including c 57

Requesting review for the following 7 functions:

do_CASW
do_PAUSE
do_LFENCE
do_SFENCE
spinLock_acquire
spinLock_release
atomic_INC

The purpose of this code is to add code to do compare-and-swap of
machine words, load and store fences, and spin-wait hints for CPUs
that support them.  On top of that is built support for very simple
spinlocks (spinLock_acquire, spinLock_release) and for atomic
increments of words in memory.

Currently only trying to support these for the following arch/compiler
combinations: x86/gcc, x86_64/gcc, arm/gcc.

ULong/UInt/UWord have been replaced by uint64_t/uint32_t/uintptr_t by
the companion patch in comment 62.

The fences (do_LFENCE, do_SFENCE) and their placement need rework in a
followup.  On x86/amd64 I think they are irrelevant and could be
replaced by a simple compiler fence
(__asm__ __volatile__("":::"memory")).  On ARM they need to be present,
but the load vs store distinction is perhaps unnecessary.

Another followup possibility is to replace these with calls to an
existing library in the tree, if suitable routines can be found.
Attachment #716003 - Flags: review?(mh+mozilla)
Pre-review comment: 732043 will eventually add some atomic functions.
Comment on attachment 716003 [details] [diff] [review]
Rollup patch, including fixes for BenWa feedback up to & including c 57

Review of attachment 716003 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/UnwinderThread2.cpp
@@ +385,5 @@
> +/* Implement machine-word sized atomic compare-and-swap. */
> +/* return 1 if success, 0 if failure */
> +static int do_CASW ( UWord* addr, UWord expected, UWord nyu )
> +{
> +#if defined(SPS_PLAT_amd64_linux) || defined(SPS_PLAT_amd64_darwin)

It's not really helpful for porting efforts that each different platform using the same architecture comes with its own macro.

@@ +387,5 @@
> +static int do_CASW ( UWord* addr, UWord expected, UWord nyu )
> +{
> +#if defined(SPS_PLAT_amd64_linux) || defined(SPS_PLAT_amd64_darwin)
> +   UWord block[4] = { (UWord)addr, expected, nyu, 2 };
> +   __asm__ __volatile__(

Why not just use compiler intrinsics?
Posted patch Rollup patch, passes try (obsolete) — Splinter Review
try "-b do -p all -u all -t none" looks good, produces
https://tbpl.mozilla.org/?tree=Try&rev=b0e9bfb89d4c

Re-requesting MikeH review on the spinlock bits, as per
comment 63.
Attachment #716003 - Attachment is obsolete: true
Attachment #716004 - Attachment is obsolete: true
Attachment #716003 - Flags: review?(mh+mozilla)
Attachment #717361 - Flags: review?(mh+mozilla)
> ::: tools/profiler/UnwinderThread2.cpp
> @@ +385,5 @@
> > +/* Implement machine-word sized atomic compare-and-swap. */
> > +/* return 1 if success, 0 if failure */
> > +static int do_CASW ( UWord* addr, UWord expected, UWord nyu )
> > +{
> > +#if defined(SPS_PLAT_amd64_linux) || defined(SPS_PLAT_amd64_darwin)
> 
> It's not really helpful for porting efforts that each different platform
> using the same architecture comes with its own macro.

True.  These could be compressed into an SPS_ARCH_amd64, in
PlatformMacros.h (added by the patch).  There are some cases where the
factorisation does need to be both arch- and os- specific, most
obviously in pulling register values out of sigcontext structures, but
that doesn't apply here.

> @@ +387,5 @@
> > +static int do_CASW ( UWord* addr, UWord expected, UWord nyu )
> > +{
> > +#if defined(SPS_PLAT_amd64_linux) || defined(SPS_PLAT_amd64_darwin)
> > +   UWord block[4] = { (UWord)addr, expected, nyu, 2 };
> > +   __asm__ __volatile__(
> 
> Why not just use compiler intrinsics?

That would indeed be a simpler approach, although it does expose us to
dependencies on compiler versions (eg, how far back does gcc have
__sync_bool_compare_and_swap), and compiler kinds (eg, MSVC).  The
current scheme avoids that but exposes us to dependencies on inline
assembly syntax instead.  Overall changing to intrinsics is probably
the best approach.  Per 2nd last para in comment 63 this needs rework
anyway since the fencing is not really right.
Comment on attachment 717361 [details] [diff] [review]
Rollup patch, passes try

Review of attachment 717361 [details] [diff] [review]:
-----------------------------------------------------------------

spinlock code looks good to me.
Attachment #717361 - Flags: review?(mh+mozilla)
Carrying over MikeH review request for spinlock bits, as per comment 63.
Attachment #717361 - Attachment is obsolete: true
Attachment #718372 - Flags: review?(mh+mozilla)
.. and a couple of other small cleanups for breakpad.  Applies on top
of the patch in comment 69.
Comment on attachment 718372 [details] [diff] [review]
Rollup patch, addresses comments in comment 65

Review of attachment 718372 [details] [diff] [review]:
-----------------------------------------------------------------

Code style-wise, this probably needs to be adjusted to be more in-line with Mozilla coding style (although there is not one such coding style, there are some things you'll never see, like spaces before and after parenthesis).

::: tools/profiler/UnwinderThread2.cpp
@@ +383,5 @@
> +/* Implement machine-word sized atomic compare-and-swap.  Returns true
> +   if success, false if failure. */
> +static bool do_CASW ( uintptr_t* addr, uintptr_t expected, uintptr_t nyu )
> +{
> +#if defined(SPS_OS_linux) || defined(SPS_OS_android) || defined(SPS_OS_darwin)

This should be a compiler test.

@@ +384,5 @@
> +   if success, false if failure. */
> +static bool do_CASW ( uintptr_t* addr, uintptr_t expected, uintptr_t nyu )
> +{
> +#if defined(SPS_OS_linux) || defined(SPS_OS_android) || defined(SPS_OS_darwin)
> +   return __sync_bool_compare_and_swap(addr, expected, nyu);

I'm not entirely sure for non x86 architectures, but on x86, this doesn't need a memory barrier (except a few Opteron models that have a bug under some circumstances)

@@ +402,5 @@
> +    || defined(SPS_PLAT_x86_android)
> +  __asm__ __volatile__("rep; nop"); // F3 90 -- check this
> +#elif defined(SPS_PLAT_arm_android)
> +  /* Some variant of WFE here, but be careful .. will need SEV to wake
> +     up afterwards. */

I think you should add something here before landing.
Attachment #718372 - Flags: review?(mh+mozilla)
With cleanups to includes for Android ELF headers, and to
windows_help.cc.
Attachment #719080 - Flags: review?(ted)
Posted patch Patch r29 (SPS components) (obsolete) — Splinter Review
Attachment #718372 - Attachment is obsolete: true
Attachment #718465 - Attachment is obsolete: true
I've segregated your local Breakpad changes out into logical changes, I should be able to get them organized into local Breakpad patches by tomorrow (or just directly upstream the simple ones).
(In reply to Mike Hommey [:glandium] from comment #71)
In Patch r29 (SPS components) (comment #73):

> like spaces before and after parenthesis

Fixed (redundant spaces removed, 2-char indent throughout)

> > +#if defined(SPS_OS_linux) || defined(SPS_OS_android) || defined(SPS_OS_darwin)
> 
> This should be a compiler test.

Done.

> > +   return __sync_bool_compare_and_swap(addr, expected, nyu);
> 
> I'm not entirely sure for non x86 architectures, but on x86, this doesn't
> need a memory barrier

It doesn't introduce one, though.  __sync_bool_compare_and_swap
compiles into LOCK;CMPXCHG.

> > +  /* Some variant of WFE here, but be careful .. will need SEV to wake
> > +     up afterwards. */
> 
> I think you should add something here before landing.

Done.  Needed guarding since WFE/SEV are not available on all ARMv6
variants.
Comment on attachment 719084 [details] [diff] [review]
Patch r29 (SPS components)

Re-requesting MikeH review on the spinlock bits, as per
comment 63.
Attachment #719084 - Flags: review?(mh+mozilla)
Comment on attachment 719084 [details] [diff] [review]
Patch r29 (SPS components)

Review of attachment 719084 [details] [diff] [review]:
-----------------------------------------------------------------

Nitpicking.

::: tools/profiler/UnwinderThread2.cpp
@@ +384,5 @@
> +/* Implement machine-word sized atomic compare-and-swap.  Returns true
> +   if success, false if failure. */
> +static bool do_CASW(uintptr_t* addr, uintptr_t expected, uintptr_t nyu)
> +{
> +#if defined(__GNUC__) || defined(__clang__)

clang defines __GNUC__ ;)

@@ +399,5 @@
> +static void do_SPINLOOP_RELAX()
> +{
> +#if defined(SPS_PLAT_amd64_linux) || defined(SPS_PLAT_amd64_darwin) \
> +    || defined(SPS_PLAT_x86_linux) || defined(SPS_PLAT_x86_darwin) \
> +    || defined(SPS_PLAT_x86_android)

Come to think of it, this should be something like
#if (defined(SPS_ARCH_amd64) || defined(SPS_ARCH_x86)) && defined(__GNUC__)

Note we use "x86_64" more often than "amd64".

@@ +406,5 @@
> +# if MOZILLA_ARM_ARCH >= 7
> +  __asm__ __volatile__("wfe");
> +# endif
> +#else
> +#  error "Unhandled platform"

If it's "Not critical if this is a no-op on some targets." as noted in the comment above, no reason for a #error here.

@@ +416,5 @@
> +{
> +#if defined(SPS_PLAT_amd64_linux) || defined(SPS_PLAT_amd64_darwin) \
> +    || defined(SPS_PLAT_x86_linux) || defined(SPS_PLAT_x86_darwin) \
> +    || defined(SPS_PLAT_x86_android)
> +   /* this is a no-op */

#if defined(SPS_ARCH_amd64) || defined(SPS_ARCH_x86)

@@ +422,5 @@
> +# if MOZILLA_ARM_ARCH >= 7
> +  __asm__ __volatile__("sev");
> +# endif
> +#else
> +#  error "Unhandled platform"

lift the #error

@@ +471,5 @@
> +  nanosleep(&req, NULL);
> +}
> +
> +/* Use CAS to implement standalone atomic increment. */
> +static void atomic_INC(uintptr_t* loc)

aha, i missed this one before.

@@ +476,5 @@
> +{
> +  while (1) {
> +    uintptr_t old = *loc;
> +    uintptr_t nyu = old + 1;
> +    int ok = do_CASW( loc, old, nyu );

Why not use __sync_fetch_and_add(loc, 1) ?
Attachment #719084 - Flags: review?(mh+mozilla)
Comment on attachment 719080 [details] [diff] [review]
Patch r29 (all non-SPS components, mostly breakpad changes)

Review of attachment 719080 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not 100% finished with this patch, I have 4 things left to take care of, but I realized I've been accumulating comments here for 2 days now.

::: build/unix/stdc++compat/stdc++compat.cpp
@@ +32,5 @@
>      template istream& istream::_M_extract(double&);
>      template istream& istream::_M_extract(float&);
>      template istream& istream::_M_extract(unsigned int&);
>      template istream& istream::_M_extract(unsigned long&);
> +    template istream& istream::_M_extract(unsigned long long&);

If we land the DIRS changes to build the processor parts of Breakpad separately this should be in that patch, otherwise it can be merged back into the main patch.

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +1367,5 @@
>  #endif
>      }
>  #ifdef MOZ_ENABLE_PROFILER_SPS
>      // Tell the profiler that the runtime is gone
> +    if (PseudoStack *stack = mozilla_get_pseudo_stack())

The changes to this file should get merged back into the main patch.

::: toolkit/crashreporter/Makefile.in
@@ +16,5 @@
>  ifeq  ($(OS_ARCH),WINNT)
> +DIRS += \
> +  google-breakpad/src/common \
> +  google-breakpad/src/processor \
> +  breakpad-windows-libxul \

We're going to have to merge these on top of the changes from bug 784841, but that shouldn't be terrible. I can probably take care of that.

::: toolkit/crashreporter/google-breakpad/src/common/dwarf/dwarf2reader.cc
@@ +899,5 @@
> +                 CFIR_VAL_OFFSET_RULE, CFIR_REGISTER_RULE,
> +                 CFIR_EXPRESSION_RULE, CFIR_VAL_EXPRESSION_RULE };
> +
> +  // Produce the tag that identifies the child class of this object.
> +  virtual CFIRTag getTag() const = 0;

This file's changes are up for upstream review here:
http://breakpad.appspot.com/530002/

I'll fold them into a local Breakpad patch.

::: toolkit/crashreporter/google-breakpad/src/common/dwarf_cfi_to_module.cc
@@ +262,5 @@
>            " describe how to recover register '%s', "
>            " but this translator cannot yet translate DWARF expressions to"
> +          " Breakpad postfix expressions (shown %u times)\n",
> +          file_.c_str(), section_.c_str(), offset, FromUniqueString(reg),
> +          (unsigned int)n_complaints);

Do you think we can instead just add a "bool verbose" flag to the DumpSymbols code and make it trickle all the way down through the DWARF code? Are you OK with just suppressing these messages?

::: toolkit/crashreporter/google-breakpad/src/common/dwarf_cu_to_module.cc
@@ +56,5 @@
>  using std::map;
>  using std::pair;
>  using std::set;
>  using std::vector;
> +using std::sort;

Landed this fix upstream:
http://code.google.com/p/google-breakpad/source/detail?r=1117

::: toolkit/crashreporter/google-breakpad/src/common/linux/dump_symbols.cc
@@ +118,5 @@
>  class MmapWrapper {
>   public:
>    MmapWrapper() : is_set_(false) {}
>    ~MmapWrapper() {
> +    if (is_set_ && base_ != NULL && size_ > 0) {

Landed this fix upstream:
http://code.google.com/p/google-breakpad/source/detail?r=1116

@@ +424,5 @@
> +            obj_file.c_str());
> +    for (it = debug_dirs.begin(); it < debug_dirs.end(); it++) {
> +      const std::string debug_dir = *it;
> +      fprintf(stderr, "  %s/%s\n", debug_dir.c_str(), debuglink);
> +    }

Landed this fix upstream:
http://code.google.com/p/google-breakpad/source/detail?r=1116

@@ +523,5 @@
>    typedef typename ElfClass::Phdr Phdr;
>    typedef typename ElfClass::Shdr Shdr;
>  
> +  BPLOG(INFO) << "";
> +  BPLOG(INFO) << "LoadSymbols: BEGIN " << obj_file;

Do you think we could skip these logging things for now? Mixing the processor logging classes into the symbol dumping code is a little messy.

@@ +626,4 @@
>      }
>    }
>  
> +  if (!found_debug_info_section && !found_usable_info) {

I looked at this again, and I figured out a smarter change here. Here's a diff -w:
http://diff.pastebin.mozilla.org/2185071

I'll roll this into 01-dump-symbols-just-cfi.patch.

::: toolkit/crashreporter/google-breakpad/src/common/module.h
@@ +370,5 @@
>  
>    // Relation for maps whose keys are strings shared with some other
>    // structure.
>    struct CompareStringPtrs {
> +    bool operator()(const string *x, const string *y) const { return *x < *y; }

Landed this fix upstream:
http://code.google.com/p/google-breakpad/source/detail?r=1118

::: toolkit/crashreporter/google-breakpad/src/common/windows_help.h
@@ +19,5 @@
> +char *strtok_r(char *s1, const char *s2, char **lasts);
> +uint64_t strtoull(const char *str, char **endptr, int base);
> +#endif
> +
> +#endif  // COMMON_WINDOWS_HELP_H_

I think I've been able to obsolete this file entirely. I have a base patch to use stdint types everywhere in Breakpad as a start:
https://breakpad.appspot.com/535002/

then a patch to ensure that we're using breakpad_types.h instead of just stdint so this works on Windows, as well as a few #defines to make strtok_r and strtoull work on Windows:
http://breakpad.appspot.com/535003

::: toolkit/crashreporter/google-breakpad/src/google_breakpad/processor/stack_frame_symbolizer.h
@@ +60,3 @@
>      // This indicates non-critical error, such as, no code module found for
>      // frame's instruction, no symbol file, or resolver failed to load symbol.
> +    SR_ERROR,

I have a patch that might fix this in a less invasive fashion:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/breakpad-mq/file/02cbeb362bde/stack-frame-symbolizer-error-macro

Needs testing, but if it works I'll apply it as a local patch and work on upstreaming it.

::: toolkit/crashreporter/google-breakpad/src/processor/cfi_frame_info.cc
@@ +127,5 @@
>    for (std::vector<const UniqueString*>::const_iterator name = rr_names.begin();
>         name != rr_names.end();
>         ++name) {
>      const UniqueString* nm = *name;
> +    Module::Expr rule = register_rules_.find(nm)->second;

I folded this change into 03-unique-string.patch.
Comment on attachment 719080 [details] [diff] [review]
Patch r29 (all non-SPS components, mostly breakpad changes)

Review of attachment 719080 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, I am down to one last thing to sort out, the android elf.h changes.

::: toolkit/crashreporter/google-breakpad/src/common/stabs_to_module.cc
@@ +32,5 @@
>  // dump_stabs.cc --- implement the StabsToModule class.
>  
>  #include <assert.h>
> +#if !defined(ANDROID)
> +# include <cxxabi.h>

I fixed the STABS issues by adding a way to build without STABS support:
http://breakpad.appspot.com/536002

::: toolkit/crashreporter/google-breakpad/src/google_breakpad/processor/stack_frame_symbolizer.h
@@ +60,3 @@
>      // This indicates non-critical error, such as, no code module found for
>      // frame's instruction, no symbol file, or resolver failed to load symbol.
> +    SR_ERROR,

Upon further review I couldn't make this work, so I just landed your patch upstream (although I renamed the enum values):
http://code.google.com/p/google-breakpad/source/detail?r=1120

::: toolkit/crashreporter/google-breakpad/src/processor/basic_source_line_resolver.cc
@@ -36,5 @@
>  #include <stdio.h>
>  #include <string.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> -#include <unistd.h>

Pushed this fix upstream:
http://code.google.com/p/google-breakpad/source/detail?r=1119
Posted patch Breakpad Makefile changes (obsolete) — Splinter Review
After upstreaming and factoring into local patches, I'm left with just this: Makefile.in changes.
Comment on attachment 719080 [details] [diff] [review]
Patch r29 (all non-SPS components, mostly breakpad changes)

This patch has been factored out into a Breakpad update:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/file/8125c6373b46/update-breakpad-r1120

A few additions to our local Breakpad patches:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/file/8125c6373b46/breakpad-local-patches

and the Makefile.in patch I just attached. This built fine for me on Linux64 and Android. I'm testing Windows right now and I'll also push to try to make sure my Breakpad changes don't break anything else. If it all looks good I'll land the Breakpad update tomorrow.
Attachment #719080 - Attachment is obsolete: true
Attachment #719080 - Flags: review?(ted)
These are the bits that I said you should fold back into your main SPS patch, as well as one minor change from my renaming of that Breakpad enum.
Comment on attachment 721014 [details] [diff] [review]
Breakpad Makefile changes

This obviously needs to be rebased on top of the moz.build changes. I'll get to that tomorrow.
Attachment #721014 - Flags: review?(mh+mozilla)
Blocks: 820048
Comment on attachment 721014 [details] [diff] [review]
Breakpad Makefile changes

Review of attachment 721014 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/google-breakpad/src/processor/Makefile.in
@@ +39,5 @@
> +#endif
> +#
> +#ifeq (gonk,$(MOZ_WIDGET_TOOLKIT))
> +#DEFINES += -DELFSIZE=32
> +#endif

Do you mean to keep those commented?

@@ +52,5 @@
> +
> +include $(topsrcdir)/config/rules.mk
> +
> +# See https://bugzilla.mozilla.org/show_bug.cgi?id=741348#c11
> +##file_id.$(OBJ_SUFFIX): STL_FLAGS=

Remainder from copy/paste?

::: toolkit/xre/Makefile.in
@@ +133,5 @@
>  SHARED_LIBRARY_LIBS += \
>    $(DEPTH)/toolkit/crashreporter/breakpad-windows-libxul/$(LIB_PREFIX)google_breakpad_libxul_s.$(LIB_SUFFIX)
>  endif
>  
> +ifdef MOZ_ENABLE_PROFILER_SPS

Do you really mean to remove these libs when SPS is disabled?
Attachment #721014 - Flags: review?(mh+mozilla)
I have a patch that's about to land in Breakpad to switch it to using stdint
types: https://breakpad.appspot.com/535002/

I have this rolled into my local patches for the moment, but this requires
a few local changes to make the build work.

This winds up being a prerequisite of the patches here to make some things
build on Windows.
Attachment #721221 - Flags: review?(mh+mozilla)
Fixed the issues glandium pointed out, also rebased on top of the moz.build changes.
Attachment #721253 - Flags: review?(mh+mozilla)
Attachment #721014 - Attachment is obsolete: true
Comment on attachment 721221 [details] [diff] [review]
fixup for stdint changes in Breakpad

Review of attachment 721221 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/test/dumputils.cpp
@@ +41,5 @@
>    if (!context)
>      return false;
>  
> +  uint64_t instruction_pointer;
> +  if (!context->GetInstructionPointer(&instruction_pointer)) {

This doesn't seem related.
Attachment #721221 - Flags: review?(mh+mozilla) → review+
(In reply to Mike Hommey [:glandium] from comment #87)
> >  
> > +  uint64_t instruction_pointer;
> > +  if (!context->GetInstructionPointer(&instruction_pointer)) {
> 
> This doesn't seem related.

It's not. I just had to touch this line anyway for the fixup, so I figured I'd remove a few lines of code since I landed that method upstream a while ago.
Attachment #721253 - Flags: review?(mh+mozilla) → review+
I pushed the Breakpad update + local Breakpad patches + Breakpad makefile changes to Try:
https://tbpl.mozilla.org/?tree=Try&rev=5a9a423b2f76

I fixed the B2G Panda opt bustage (trivial change) and re-tested that on try:
https://tbpl.mozilla.org/?tree=Try&rev=ba5501dfdc7d

That looked good so I pushed those to inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5fe39d67806e
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6835779f327
Whiteboard: [leave open]
Ted, this is still not compiling for me on try, but only in a debug build.  An optimized build works fine.  That suggests some sort of inlining issue where parts of the code thing a function is inlined and parts don't...  Is that possible fallout from the above changes for these functions:

  google_breakpad::StabsReader::Process, 
  google_breakpad::StabsReader::StabsReader
  google_breakpad::DumpSymbols::LoadCommandDumper::SymtabCommand
  google_breakpad::StabsToModule::StabsToModule

?
I can repro this on Mac with a debug build, that was what I was missing. I think I know how to fix it, I believe the patch I had you try just wasn't quite correct. I should have a fix tomorrow.
OK.  For what it's worth, an --enable-debug --enable-optimize build (which is what "debug" builds are on tbpl) works fine, which explains this mystery.
This broke mac debug non-optimized builds because I didn't add NO_STABS_SUPPORT to the mac code, so it tries to use the stabs code but we don't actually build and link it in. I have a patch that adds NO_STABS_SUPPORT for the Mac code, but this seemed simpler since it doesn't require Breakpad changes. This just builds the stabs code everywhere but Android, even though we don't need it.
Attachment #722236 - Flags: review?(mh+mozilla)
Comment on attachment 722236 [details] [diff] [review]
Build stabs code except for Android

Review of attachment 722236 [details] [diff] [review]:
-----------------------------------------------------------------

Fair enough
Attachment #722236 - Flags: review?(mh+mozilla) → review+
Comment on attachment 723059 [details] [diff] [review]
For final review

Review of attachment 723059 [details] [diff] [review]:
-----------------------------------------------------------------

r+ conditional on me misunderstanding the spinLock

::: toolkit/crashreporter/google-breakpad/src/common/linux/dump_symbols.cc
@@ +625,5 @@
>        found_usable_info = found_usable_info || result;
>      }
>    }
>  
> +  if (!found_debug_info_section && symbol_data != ONLY_CFI) {

Does ted need to look at this?

::: tools/profiler/ProfileEntry2.cpp
@@ +84,5 @@
> +  // There is no compiler enforced mapping between tag chars
> +  // and union variant fields, so the following was derived
> +  // by looking through all the use points of TableTicker.cpp.
> +  //   mTagData   (const char*)  m,c,s
> +  //   mTagPtr    (void*)        d,l,L, S(sp-value)

S stands for 'Start of Stack'. Any tags read before the S are considered to come from a truncated sample and are ignored.

::: tools/profiler/UnwinderThread2.cpp
@@ +123,5 @@
> +static void thread_register_for_profiling ( void* stackTop );
> +
> +// RUNS IN SIGHANDLER CONTEXT
> +// Acquire an empty buffer and mark it as FILLING
> +static UnwinderThreadBuffer* acquire_empty_buffer();

I don't see the benefit of having this intermediate acquire_empty_buffer instead of just having uwt__acquire_empty_buffer() as defined in the header.

@@ +317,5 @@
> +                  ->ents[j % N_PROFENTS_PER_PAGE]
> +     
> +   entsPages[] are allocated on demand.  Because zero can
> +   theoretically be a valid page pointer, use 
> +   ProfEntsPage_INVALID == (ProfEntsPage*)1 to mark invalid pages.

-1?

@@ +469,5 @@
> +}
> +
> +/* Register a thread for profiling.  It must not be allowed to receive
> +   signals before this is done, else the signal handler will
> +   MOZ_ASSERT. */

I don't see the assert. I don't like the idea of having to take a chance deadlocking the process when a thread is being register when for example we start a WebWorker. Gecko is pretty thread heavy so this will make the profiler flaky.

@@ +804,5 @@
> +static void* unwind_thr_fn(void* exit_nowV)
> +{
> +  /* If we're the first thread in, we'll need to allocate the buffer
> +     array g_buffers plus the Buffer structs that it points at. */
> +  spinLock_acquire(&g_spinLock);

This functions grabs the spin_lock which we also do in signal handler context. I don't understand how this is safe.

@@ +891,5 @@
> +        if (ms_to_sleep_if_empty > 1000)
> +          ms_to_sleep_if_empty = 1000;
> +      }
> +      continue;
> +    }

NIT: I think it would be more self contain to reset ms_to_sleep_if_empty here.

@@ +903,5 @@
> +
> +    /* unwind .. in which we can do anything we like, since any
> +       resource stalls that we may encounter (eg malloc locks) in
> +       competition with signal handler instances, will be short
> +       lived since the signal handler is guaranteed nonblocking. */

I don't think this comment really is accurate. It's not safe because they are 'short'. It's safe because the signal handler isn't allowed to use them.

@@ +971,5 @@
> +       combined (both), and neither.  We handle all four cases. */
> +
> +    MOZ_ASSERT( (ix_first_hP == infUW && ix_last_hQ == infUW) ||
> +            (ix_first_hP != infUW && ix_last_hQ != infUW) );
> +    bool have_P = ix_first_hP != infUW;

have_Pseudo_ent. Too many hP, hQ, UW in here makes it hard to follow.

@@ +977,5 @@
> +      MOZ_ASSERT(ix_first_hP < ix_last_hQ);
> +      MOZ_ASSERT(ix_last_hQ <= buff->entsUsed);
> +    }
> +
> +    /* Neither N nor P.  This is very unusual but has been observed to happen.

Do you mean: 'No native unwind or pseudo entries'? When would this actually happen? Off the top of my head this could happen if we're profiling but the pseudostack is empty. I typically try to always have a permanent root tag.

@@ +979,5 @@
> +    }
> +
> +    /* Neither N nor P.  This is very unusual but has been observed to happen.
> +       Just copy to the output. */
> +    if (!need_native_unw && !have_P) {

I think a lot of code duplication could be saved by combing all cases (or at least the first 3) and just iterating over each entry keeping track of what piece it's currently processing.

::: tools/profiler/UnwinderThread2.h
@@ +2,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#ifndef MOZ_UNWINDER_THREAD_2_H

NIT: I don't really think these need the 2 suffix. It's not even used consistently.
Attachment #723059 - Flags: review?(bgirard) → review+
(In reply to Benoit Girard (:BenWa) from comment #99)
> r+ conditional on me misunderstanding the spinLock
> This functions grabs the spin_lock which we also do in signal handler
> context. I don't understand how this is safe.

(as discussed on irc) The sighandler and unwinder threads communicate
via a circular array of buffers.  Each buffer is basically a dump of
the registers and top 32k of stack.  The spinlock protects the buffer
array.

The unwinder thread takes the lock, pulls off the next buffer to
unwind, drops the lock, and does the unwind.  Hence it holds the lock
only very briefly.  It makes forward progress until the array is
empty, at which point it sleeps briefly before rechecking the array.

The sighandler thread takes the lock, and checks if there are any
buffers free.  If so, it copies registers + top of the stack into the
buffer, and drops the lock.  If there are no free buffers, it drops
the lock immediately and returns, hence losing the sample but avoiding
any danger of deadlock.  The sighandler thread is guaranteed always
to make forward progress.

In 'steady state' operation the unwinder thread can keep up with the
sighandler thread, so no samples are lost and only one buffer at most
is ever in use.  Having a few available buffers (by default 8)
provides a small safety margin against lost samples if the unwinder
thread should stall briefly, usually due to having to load CFI from
a newly encountered .so.
My concern was that the sighandler would prevent the sighandler from running not allowing the underwinder thread. I made a sample program which confirms that this is not the case. As a result I am convinced that this implementation does not deadlock (as long as we don't profile the unwinder threads).
Whiteboard: [leave open]
Posted file build error
I tried to build from inbound but got a compile error. I think we might of regressed '--disable-crashreporter' builds.
(In reply to Benoit Girard (:BenWa) from comment #103)
> I tried to build from inbound but got a compile error. I think we might of
> regressed '--disable-crashreporter' builds.

Which means that Fennec can't be built on Mac right now, since Mac users can only build Fennec with --disable-crashreporter. :(
Building with --disable-profiler does not help; I still get:

ld: .../tools/profiler/local_debug_info_symbolizer.o: in function google_breakpad::ustr__ZDcfa():toolkit/crashreporter/google-breakpad/src/common/unique_string.h:227: error: undefined reference to 'google_breakpad::ToUniqueString(std::basic_string<char, std::char_traits<char>, std::allocator<char> >)'

etc.
Ok, that's a different bustage that's probably easier to fix. I'll sort that out and then figure out the --disable-crashreporter bustage.
Would it be worth backing this out until the --disable-crashreporter / --disable-profiler bustage is sorted out...?

I'm worried that this will cause broad local-bustage & confusion if it's merged to central without those issues fixed. (I suspect that "disable-crashreporter" is a pretty common setting in developers' local debug builds, since we don't want to report crashes for our own local patches' issues. And per comment 104, it's apparently required for some configurations.)
This patch was a real bear to get landed, so I'd rather not have it backed out. --disable-crashreporter isn't required to not submit reports, we only enable the crashreporter code at runtime on official builds. I'll fix this ASAP.
(In reply to Daniel Holbert [:dholbert] from comment #107)
> --disable-profiler bustage is sorted out...?

--disable-profiler isn't a configure option.
(I don't directly know about any disable-profiler stuff; I was just quoting that from comment 105 / comment 106)
https://hg.mozilla.org/mozilla-central/rev/cc45fdc389df
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
(In reply to Benoit Girard (:BenWa) from comment #109)
> --disable-profiler isn't a configure option.

--disable-profiling is, though.
Yes but it's not expected to disable the profiler. The profiler will continue to work without full stacks.
(In reply to Benoit Girard (:BenWa) from comment #113)
> Yes but it's not expected to disable the profiler. The profiler will
> continue to work without full stacks.

dholbert points out on IRC that it sounds like I'm arguing the build failure. The build failure is real and should be fixed. I just didn't want anyone to be under the false impression that we had a configure option that would prevent the profiler sources from being compiled which we do not. And while 'disable-profiling' sounds like it would do that, it does not.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #108)
Filed as bug 850089.
Also broke the (hidden) Win64 builds, "Unsupported platform".
And I presume it's what broke the (hidden) ASan builds, https://tbpl.mozilla.org/php/getParsedLog.php?id=20543688&tree=Firefox
Depends on: 850089
Depends on: 850131
Depends on: 850132
As I mentioned in Bug 850089 Comment 3, this just broke Android development on Mac, which is the majority of our Fennec developers. If someone gets to it before me, please back out this bug before the morning.
Backed out on inbound.

https://hg.mozilla.org/integration/mozilla-inbound/rev/e8938a43c31a

Sorry, Ted; I'm sure this was unpleasant to land, but I figured it was better to allow practically our entire mobile development team to continue working tomorrow, rather than wait for a solution.

I filed Bug 850136 to see if we can get some builders running, um, the configuration that pretty much *everyone* actually uses for Android development, so that next time you'll actually get some bustage that you can see in tbpl.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sorry, I was mistaken in thinking we had a --disable-profiler switch that would work as a workaround. In light of that backing this out is ok, I'm working on a build system fix for the --disable-crashreporter case and will re-land when that's ready.
Backout:
https://hg.mozilla.org/mozilla-central/rev/e8938a43c31a

I'll try to do another merge later tonight with the re-landing.
I think this also broke the b2g build:
ia/src/tools/profiler/UnwinderThread2.cpp:1760: error: undefined reference to 'google_breakpad::CallStack::~CallStack()'
collect2: ld returned 1 exit status
make[6]: *** [libxul.so] Error 1
Blocks: 850375
Gregor: broke the B2G build where?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #124)
> Gregor: broke the B2G build where?

I am building on mac where the crashreporter is disabled by default.
Ok, this should be fixed with my relanding on inbound.
https://hg.mozilla.org/mozilla-central/rev/8b366545161d
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Depends on: 850833
Depends on: 853097
Depends on: 855012
Blocks: 855466
You need to log in before you can comment on or make changes to this bug.