Closed Bug 977067 Opened 10 years ago Closed 3 years ago

|mach valgrind-test|: use --soname-synonyms so that --disable-jemalloc isn't required

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(2 files, 1 obsolete file)

glandium told me that Valgrind's --soname-synonyms=somalloc=NONE options (funky syntax with the two '=' symbols, BTW) means that --disable-jemalloc builds are no longer needed. I just confirmed this locally. Cool!

Now, why didn't I know about this earlier? Oh, because the running-Firefox-under-Valgrind docs doesn't mention it. So I will update that page to remove all references to --disable-jemalloc, and add --soname-synonyms=somalloc=NONE to the list of required Valgrind flags.
jseward, can you review the patch and also my plan to update the docs as per
comment 0? Thanks.
Attachment #8382158 - Flags: review?(jseward)
Kyle, I'd like to fix the Valgrind-on-B2G invocation to account for this change. It looks like gonk-misc/default-gecko-config is the file that specifies --disable-jemalloc for B2G builds. How do I submit patches for that? GitHub pull requests?

Also, run-valgrind.sh doesn't specify --vex-iropt-register-updates=allregs-at-mem-access, but it should.
(As well as -soname-synonyms=somalloc=NONE.)
Flags: needinfo?(kyle)
(In reply to Nicholas Nethercote [:njn] from comment #1)

I believe this will work on desktop linuxes, but I seem to recall some
difficulties with it when experimenting on Android.  To which platforms
do your changes to |mach| apply?
I actually already have a separate bug to do this on FxOS, bug 976950. Filed it last night and went to bed so haven't had a chance to work on it, but I can take care of the updates needed this morning. I'll add iropt-register-updates also.
Flags: needinfo?(kyle)
This version also removes the --disable-jemalloc flags from the relevant
in-tree mozconfigs. glandium, can you review that part? Thanks.
Attachment #8382541 - Flags: review?(mh+mozilla)
Attachment #8382541 - Flags: review?(jseward)
Attachment #8382158 - Attachment is obsolete: true
Attachment #8382158 - Flags: review?(jseward)
> I believe this will work on desktop linuxes, but I seem to recall some
> difficulties with it when experimenting on Android.  To which platforms
> do your changes to |mach| apply?

Hmm, ok. The |mach| changes apply to all platforms. I wonder if in the history of the universe anybody has ever run |mach valgrind-test| on Android.
This works fine for me on b2g, but the caveat is that mozjemalloc.c has issues with its valgrind reporting.  I have some local patches to move some of the flags around, but they're not 100% perfect.
Here's a patch that fixes a bunch of bogus errors from being reported from both jemalloc and ipc code when valgrind+jemalloc is being used.
Attachment #8382604 - Flags: review?(kyle)
AIUI --soname-synonyms=somalloc=NONE makes valgrind replace jemalloc with its own allocator.
Comment on attachment 8382604 [details] [diff] [review]
fix jemalloc + valgrind, and ipc + valgrind

Review of attachment 8382604 [details] [diff] [review]:
-----------------------------------------------------------------

I am so not qualified to review either of this. Forwarding to glandium and bent respectively.
Attachment #8382604 - Flags: review?(mh+mozilla)
Attachment #8382604 - Flags: review?(kyle)
Attachment #8382604 - Flags: review?(bent.mozilla)
(That is, glandium for jemalloc, bent for ipc pickle fix)
Comment on attachment 8382604 [details] [diff] [review]
fix jemalloc + valgrind, and ipc + valgrind

Review of attachment 8382604 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/pickle.cc
@@ +513,5 @@
>    header_->payload_size = new_size;
> +
> +#ifdef MOZ_VALGRIND
> +  // pad the trailing end as well, so that valgrind
> +  // doesn't complain when we write the buffer

Is that right? If Valgrind is complaining about writing to the buffer, putting in some marker bytes won't help. Is it actually complaining about reading the buffer? What does the error look like?

::: memory/mozjemalloc/jemalloc.c
@@ +203,5 @@
>  #else
> +#  define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) do { } while(0)
> +#  define VALGRIND_FREELIKE_BLOCK(addr, rzB) do { } while(0)
> +#  define VALGRIND_MAKE_MEM_DEFINED(addr, sizeB) do { } while(0)
> +#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, sizeB) do { } while(0)

Hmm... IIRC, the Valgrind annotations in this file are not even remotely close to being correct, and if you're relying on them then something has gone wrong with the way Valgrind is invoked, i.e. it should be using its own allocators rather than jemalloc...
(In reply to Nicholas Nethercote [:njn] from comment #12)
> Comment on attachment 8382604 [details] [diff] [review]
> fix jemalloc + valgrind, and ipc + valgrind
> 
> Review of attachment 8382604 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/chromium/src/base/pickle.cc
> @@ +513,5 @@
> >    header_->payload_size = new_size;
> > +
> > +#ifdef MOZ_VALGRIND
> > +  // pad the trailing end as well, so that valgrind
> > +  // doesn't complain when we write the buffer
> 
> Is that right? If Valgrind is complaining about writing to the buffer,
> putting in some marker bytes won't help. Is it actually complaining about
> reading the buffer? What does the error look like?

Whoops, sorry, meant "read".  The issue is that while current code explicitly writes pad bytes to get up to the requested alignment of the data element to be written, it *also* always treats the size of a data element as AlignInt(size) (rounds up to next alignment) -- and the extra bytes aren't touched.  So when we call sendmsg(), the buffer often has some uninitialized bytes along the way.  This could be a rare privacy issue (which is I assume why the original pad bytes are always written anyway) if the buffer is reused actually, so maybe this shouldn't be MOZ_VALGRIND

> ::: memory/mozjemalloc/jemalloc.c
> @@ +203,5 @@
> >  #else
> > +#  define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) do { } while(0)
> > +#  define VALGRIND_FREELIKE_BLOCK(addr, rzB) do { } while(0)
> > +#  define VALGRIND_MAKE_MEM_DEFINED(addr, sizeB) do { } while(0)
> > +#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, sizeB) do { } while(0)
> 
> Hmm... IIRC, the Valgrind annotations in this file are not even remotely
> close to being correct, and if you're relying on them then something has
> gone wrong with the way Valgrind is invoked, i.e. it should be using its own
> allocators rather than jemalloc...

I have them much closer to correct now.  But we're definitely using jemalloc if jemalloc is enabled, and relying on these.  I think the problem is that our names are actually moz_malloc etc., and not malloc.  Valgrind doesn't understand those.
moz_malloc is just a level of indirection, it calls malloc under the hood. If valgrind --soname-synonyms=soname=NONE doesn't skip jemalloc for you then there's something wrong.
Hmm. I suspect the compiler is being helpful here maybe, by aggressively inlining or something?  I see traces like:

==17927== Invalid write of size 4
==17927==    at 0x4896904: memset (in /system/lib/valgrind/vgpreload_memcheck-arm-linux.so)
==17927==    by 0x48B7B45: arena_dalloc (jemalloc.c:4553)
==17927==    by 0x64D53DF: js::LifoAlloc::freeAll() (Utility.h:157)
...
==17927==  Address 0xdae3010 is 16 bytes after a recently re-allocated block of size 4,096 alloc'd
==17927==    at 0x48B85F0: arena_malloc (jemalloc.c:4127)
==17927==    by 0x48B8B29: malloc (jemalloc.c:6215)
==17927==    by 0x64D54E5: js::LifoAlloc::getOrCreateChunk(unsigned int) (Utility.h:134)

The allocation is malloc() so should have been caught.  But the free went straight to arena_dalloc.  Unless this is tail-call optimization making backtraces skip the intervening elements?  Either way, definitely not overriding malloc.
I'm sure Julian can help debug this, but this is not supposed to happen.
Flags: needinfo?(jseward)
Vlad: the pickle.cc change looks fine to me.  We appear to be agreed that
this is to avoid memcheck complaining about undefined values in alignment
holes (or similar) when writing structs to a file or socket.

As far as the memory/mozjemalloc/jemalloc.c annotations go, my inclination
is to nuke them and instead make intercepting of the jemalloc entry points
work properly.
Flags: needinfo?(jseward)
(In reply to Mike Hommey [:glandium] from comment #16)
> I'm sure Julian can help debug this, but this is not supposed to happen.

Mike: to explain by analogy: in libc.so, there are always symbols "malloc"
and "free" (etc) to intercept, and there is never any danger of missing
them due to inlining.

For jemalloc, what are the equivalent guaranteed-to-always-be-there-and
-always-be-used symbols called?  moz_{malloc,free} or plain {malloc,free} ?

In particular, since

> moz_malloc is just a level of indirection, it calls malloc under the hood.

what is there to stop malloc being inlined into moz_malloc?
Nick: I'm reluctant to OK this until we have a clearer picture of what
jemalloc guarantees us in terms of availability of interceptable
symbols -- which ones are definitely never inlined nor otherwise
sidestepped.  IOW, what is the guaranteed stable binary-level interface
to jemalloc?  That's a question for Mike, I think.

Once we have this nailed down, I am all in favour of ditching
the --disable-jemalloc requirement.
Comment on attachment 8382604 [details] [diff] [review]
fix jemalloc + valgrind, and ipc + valgrind

Review of attachment 8382604 [details] [diff] [review]:
-----------------------------------------------------------------

The pickle.cc change looks fine to me.

Also, fwiw, chromium upstream does something similar:

  https://code.google.com/p/chromium/codesearch#chromium/src/base/pickle.cc&l=334
Attachment #8382604 - Flags: review?(bent.mozilla) → review+
(In reply to Julian Seward from comment #18)
> > moz_malloc is just a level of indirection, it calls malloc under the hood.
> 
> what is there to stop malloc being inlined into moz_malloc?

The fact that they are not in the same binary. moz_malloc is in libmozalloc. malloc is in the executable (on linux), or libmozglue (!linux)
Comment on attachment 8382541 [details] [diff] [review]
|mach valgrind-test|: use --soname-synonyms so that --disable-jemalloc isn't required.

Review of attachment 8382541 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing review requests until things are clearer.
Attachment #8382541 - Flags: review?(mh+mozilla)
Attachment #8382541 - Flags: review?(jseward)
(In reply to Mike Hommey [:glandium] from comment #21)
> (In reply to Julian Seward from comment #18)
> > > moz_malloc is just a level of indirection, it calls malloc under the hood.
> > 
> > what is there to stop malloc being inlined into moz_malloc?
> 
> The fact that they are not in the same binary. moz_malloc is in libmozalloc.
> malloc is in the executable (on linux), or libmozglue (!linux)

This doesn't seem to be true on b2g, which is going down the !linux path:

$ nm objdir-gecko/dist/bin/b2g | grep malloc
         U malloc
$ nm objdir-gecko/dist/bin/libmozglue.so  | grep malloc
...
000218b9 T malloc

$ nm objdir-gecko/dist/bin/libxul.so | grep malloc
         U malloc
003c7869 T moz_malloc

But, there is code that explicitly calls moz_malloc.  We tell developers to do this, e.g. https://developer.mozilla.org/en-US/docs/Infallible_memory_allocation
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #23)
> (In reply to Mike Hommey [:glandium] from comment #21)
> > (In reply to Julian Seward from comment #18)
> > > > moz_malloc is just a level of indirection, it calls malloc under the hood.
> > > 
> > > what is there to stop malloc being inlined into moz_malloc?
> > 
> > The fact that they are not in the same binary. moz_malloc is in libmozalloc.
> > malloc is in the executable (on linux), or libmozglue (!linux)
> 
> This doesn't seem to be true on b2g, which is going down the !linux path:

Android and B2G *are* !linux as far as many things go, including this one.

> $ nm objdir-gecko/dist/bin/b2g | grep malloc
>          U malloc
> $ nm objdir-gecko/dist/bin/libmozglue.so  | grep malloc
> ...
> 000218b9 T malloc
> 
> $ nm objdir-gecko/dist/bin/libxul.so | grep malloc
>          U malloc
> 003c7869 T moz_malloc
> 
> But, there is code that explicitly calls moz_malloc.  We tell developers to
> do this, e.g.
> https://developer.mozilla.org/en-US/docs/Infallible_memory_allocation

Again, moz_malloc is not in the same lib as malloc, and it's always calling it.

Now, I realize comment 15 was on b2g, and I think the problem is "simply" caused by the fact that libmozglue is LD_PRELOADed and valgrind can't divert that.

Which is not a problem on desktop linux, which this bug is really about, isn't it?
Comment on attachment 8382604 [details] [diff] [review]
fix jemalloc + valgrind, and ipc + valgrind

Review of attachment 8382604 [details] [diff] [review]:
-----------------------------------------------------------------

Let's just not touch jemalloc just yet.
Attachment #8382604 - Flags: review?(mh+mozilla) → review-
(In reply to Mike Hommey [:glandium] from comment #24)
> Now, I realize comment 15 was on b2g, and I think the problem is "simply"
> caused by the fact that libmozglue is LD_PRELOADed and valgrind can't divert
> that.

I'm pretty sure that V can intercept functions in LD_PRELOAD objects.  The
intercept mechanism examines object when they get mapped into the process,
and it doesn't care how they got mapped in.  We might be looking at a
soname-difference problem, though.
(In reply to Mike Hommey [:glandium] from comment #21)
> (In reply to Julian Seward from comment #18)
> > > moz_malloc is just a level of indirection, it calls malloc under the hood.
> > 
> > what is there to stop malloc being inlined into moz_malloc?
> 
> The fact that they are not in the same binary. moz_malloc is in libmozalloc.
> malloc is in the executable (on linux), or libmozglue (!linux)

I don't see that being in different binaries necessarily inhibits
cross-module inlining, if enough of our malloc implementation is
pushed into a header file.  Is it marked __attribute__((noinline)),
or whatever the MSVC equivalent is, assuming it exists?
(In reply to Julian Seward from comment #27)
> I don't see that being in different binaries necessarily inhibits
> cross-module inlining, if enough of our malloc implementation is
> pushed into a header file.  Is it marked __attribute__((noinline)),
> or whatever the MSVC equivalent is, assuming it exists?

We have no part of malloc implementation in a header.
Here's my best understanding of the situation so far.

All Gecko allocation/free is routed through moz_malloc (et al) in
libmozalloc.so.

This calls onwards to malloc (et al) which is either in the
executable, on desktop Linux, or in libmozglue.so, on B2G/Android.

moz_malloc ----------------------> malloc
in libmozalloc.so                  in executable (linux)
                                   in libmozglue.so (B2G/Android)

Hence interception in either the executable or libmozglue.so should
work.  libmozglue.so appears to have the soname "libmozglue.so", and
the executable has no soname and so we have to use the fake name
"NONE".

Hence, V flags:

Desktop:      --soname-synonyms=somalloc=NONE

B2G/Android:  --soname-synonyms=somalloc=libmozglue.so   or
              --soname-synonyms=somalloc=libmozglueZdso

For the B2G/Android case, I am not sure whether V will accept the name
with a dot in, or whether it needs to be escaped to "Zd".  I need to
check.

No action in seven years, this isn't going to happen.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: