Closed
Bug 977067
Opened 10 years ago
Closed 3 years ago
|mach valgrind-test|: use --soname-synonyms so that --disable-jemalloc isn't required
Categories
(Testing :: General, defect)
Testing
General
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(2 files, 1 obsolete file)
3.20 KB,
patch
|
Details | Diff | Splinter Review | |
4.50 KB,
patch
|
glandium
:
review-
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
glandium told me that Valgrind's --soname-synonyms=somalloc=NONE options (funky syntax with the two '=' symbols, BTW) means that --disable-jemalloc builds are no longer needed. I just confirmed this locally. Cool! Now, why didn't I know about this earlier? Oh, because the running-Firefox-under-Valgrind docs doesn't mention it. So I will update that page to remove all references to --disable-jemalloc, and add --soname-synonyms=somalloc=NONE to the list of required Valgrind flags.
Assignee | ||
Comment 1•10 years ago
|
||
jseward, can you review the patch and also my plan to update the docs as per comment 0? Thanks.
Attachment #8382158 -
Flags: review?(jseward)
Assignee | ||
Comment 2•10 years ago
|
||
Kyle, I'd like to fix the Valgrind-on-B2G invocation to account for this change. It looks like gonk-misc/default-gecko-config is the file that specifies --disable-jemalloc for B2G builds. How do I submit patches for that? GitHub pull requests? Also, run-valgrind.sh doesn't specify --vex-iropt-register-updates=allregs-at-mem-access, but it should. (As well as -soname-synonyms=somalloc=NONE.)
Flags: needinfo?(kyle)
Comment 3•10 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #1) I believe this will work on desktop linuxes, but I seem to recall some difficulties with it when experimenting on Android. To which platforms do your changes to |mach| apply?
Comment 4•10 years ago
|
||
I actually already have a separate bug to do this on FxOS, bug 976950. Filed it last night and went to bed so haven't had a chance to work on it, but I can take care of the updates needed this morning. I'll add iropt-register-updates also.
Flags: needinfo?(kyle)
Assignee | ||
Comment 5•10 years ago
|
||
This version also removes the --disable-jemalloc flags from the relevant in-tree mozconfigs. glandium, can you review that part? Thanks.
Attachment #8382541 -
Flags: review?(mh+mozilla)
Attachment #8382541 -
Flags: review?(jseward)
Assignee | ||
Updated•10 years ago
|
Attachment #8382158 -
Attachment is obsolete: true
Attachment #8382158 -
Flags: review?(jseward)
Assignee | ||
Comment 6•10 years ago
|
||
> I believe this will work on desktop linuxes, but I seem to recall some
> difficulties with it when experimenting on Android. To which platforms
> do your changes to |mach| apply?
Hmm, ok. The |mach| changes apply to all platforms. I wonder if in the history of the universe anybody has ever run |mach valgrind-test| on Android.
This works fine for me on b2g, but the caveat is that mozjemalloc.c has issues with its valgrind reporting. I have some local patches to move some of the flags around, but they're not 100% perfect.
Here's a patch that fixes a bunch of bogus errors from being reported from both jemalloc and ipc code when valgrind+jemalloc is being used.
Attachment #8382604 -
Flags: review?(kyle)
Comment 9•10 years ago
|
||
AIUI --soname-synonyms=somalloc=NONE makes valgrind replace jemalloc with its own allocator.
Comment 10•10 years ago
|
||
Comment on attachment 8382604 [details] [diff] [review] fix jemalloc + valgrind, and ipc + valgrind Review of attachment 8382604 [details] [diff] [review]: ----------------------------------------------------------------- I am so not qualified to review either of this. Forwarding to glandium and bent respectively.
Attachment #8382604 -
Flags: review?(mh+mozilla)
Attachment #8382604 -
Flags: review?(kyle)
Attachment #8382604 -
Flags: review?(bent.mozilla)
Comment 11•10 years ago
|
||
(That is, glandium for jemalloc, bent for ipc pickle fix)
Assignee | ||
Comment 12•10 years ago
|
||
Comment on attachment 8382604 [details] [diff] [review] fix jemalloc + valgrind, and ipc + valgrind Review of attachment 8382604 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/base/pickle.cc @@ +513,5 @@ > header_->payload_size = new_size; > + > +#ifdef MOZ_VALGRIND > + // pad the trailing end as well, so that valgrind > + // doesn't complain when we write the buffer Is that right? If Valgrind is complaining about writing to the buffer, putting in some marker bytes won't help. Is it actually complaining about reading the buffer? What does the error look like? ::: memory/mozjemalloc/jemalloc.c @@ +203,5 @@ > #else > +# define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) do { } while(0) > +# define VALGRIND_FREELIKE_BLOCK(addr, rzB) do { } while(0) > +# define VALGRIND_MAKE_MEM_DEFINED(addr, sizeB) do { } while(0) > +# define VALGRIND_MAKE_MEM_UNDEFINED(addr, sizeB) do { } while(0) Hmm... IIRC, the Valgrind annotations in this file are not even remotely close to being correct, and if you're relying on them then something has gone wrong with the way Valgrind is invoked, i.e. it should be using its own allocators rather than jemalloc...
(In reply to Nicholas Nethercote [:njn] from comment #12) > Comment on attachment 8382604 [details] [diff] [review] > fix jemalloc + valgrind, and ipc + valgrind > > Review of attachment 8382604 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: ipc/chromium/src/base/pickle.cc > @@ +513,5 @@ > > header_->payload_size = new_size; > > + > > +#ifdef MOZ_VALGRIND > > + // pad the trailing end as well, so that valgrind > > + // doesn't complain when we write the buffer > > Is that right? If Valgrind is complaining about writing to the buffer, > putting in some marker bytes won't help. Is it actually complaining about > reading the buffer? What does the error look like? Whoops, sorry, meant "read". The issue is that while current code explicitly writes pad bytes to get up to the requested alignment of the data element to be written, it *also* always treats the size of a data element as AlignInt(size) (rounds up to next alignment) -- and the extra bytes aren't touched. So when we call sendmsg(), the buffer often has some uninitialized bytes along the way. This could be a rare privacy issue (which is I assume why the original pad bytes are always written anyway) if the buffer is reused actually, so maybe this shouldn't be MOZ_VALGRIND > ::: memory/mozjemalloc/jemalloc.c > @@ +203,5 @@ > > #else > > +# define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) do { } while(0) > > +# define VALGRIND_FREELIKE_BLOCK(addr, rzB) do { } while(0) > > +# define VALGRIND_MAKE_MEM_DEFINED(addr, sizeB) do { } while(0) > > +# define VALGRIND_MAKE_MEM_UNDEFINED(addr, sizeB) do { } while(0) > > Hmm... IIRC, the Valgrind annotations in this file are not even remotely > close to being correct, and if you're relying on them then something has > gone wrong with the way Valgrind is invoked, i.e. it should be using its own > allocators rather than jemalloc... I have them much closer to correct now. But we're definitely using jemalloc if jemalloc is enabled, and relying on these. I think the problem is that our names are actually moz_malloc etc., and not malloc. Valgrind doesn't understand those.
Comment 14•10 years ago
|
||
moz_malloc is just a level of indirection, it calls malloc under the hood. If valgrind --soname-synonyms=soname=NONE doesn't skip jemalloc for you then there's something wrong.
Hmm. I suspect the compiler is being helpful here maybe, by aggressively inlining or something? I see traces like: ==17927== Invalid write of size 4 ==17927== at 0x4896904: memset (in /system/lib/valgrind/vgpreload_memcheck-arm-linux.so) ==17927== by 0x48B7B45: arena_dalloc (jemalloc.c:4553) ==17927== by 0x64D53DF: js::LifoAlloc::freeAll() (Utility.h:157) ... ==17927== Address 0xdae3010 is 16 bytes after a recently re-allocated block of size 4,096 alloc'd ==17927== at 0x48B85F0: arena_malloc (jemalloc.c:4127) ==17927== by 0x48B8B29: malloc (jemalloc.c:6215) ==17927== by 0x64D54E5: js::LifoAlloc::getOrCreateChunk(unsigned int) (Utility.h:134) The allocation is malloc() so should have been caught. But the free went straight to arena_dalloc. Unless this is tail-call optimization making backtraces skip the intervening elements? Either way, definitely not overriding malloc.
Comment 16•10 years ago
|
||
I'm sure Julian can help debug this, but this is not supposed to happen.
Flags: needinfo?(jseward)
Comment 17•10 years ago
|
||
Vlad: the pickle.cc change looks fine to me. We appear to be agreed that this is to avoid memcheck complaining about undefined values in alignment holes (or similar) when writing structs to a file or socket. As far as the memory/mozjemalloc/jemalloc.c annotations go, my inclination is to nuke them and instead make intercepting of the jemalloc entry points work properly.
Flags: needinfo?(jseward)
Comment 18•10 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #16) > I'm sure Julian can help debug this, but this is not supposed to happen. Mike: to explain by analogy: in libc.so, there are always symbols "malloc" and "free" (etc) to intercept, and there is never any danger of missing them due to inlining. For jemalloc, what are the equivalent guaranteed-to-always-be-there-and -always-be-used symbols called? moz_{malloc,free} or plain {malloc,free} ? In particular, since > moz_malloc is just a level of indirection, it calls malloc under the hood. what is there to stop malloc being inlined into moz_malloc?
Comment 19•10 years ago
|
||
Nick: I'm reluctant to OK this until we have a clearer picture of what jemalloc guarantees us in terms of availability of interceptable symbols -- which ones are definitely never inlined nor otherwise sidestepped. IOW, what is the guaranteed stable binary-level interface to jemalloc? That's a question for Mike, I think. Once we have this nailed down, I am all in favour of ditching the --disable-jemalloc requirement.
Comment on attachment 8382604 [details] [diff] [review] fix jemalloc + valgrind, and ipc + valgrind Review of attachment 8382604 [details] [diff] [review]: ----------------------------------------------------------------- The pickle.cc change looks fine to me. Also, fwiw, chromium upstream does something similar: https://code.google.com/p/chromium/codesearch#chromium/src/base/pickle.cc&l=334
Attachment #8382604 -
Flags: review?(bent.mozilla) → review+
Comment 21•10 years ago
|
||
(In reply to Julian Seward from comment #18) > > moz_malloc is just a level of indirection, it calls malloc under the hood. > > what is there to stop malloc being inlined into moz_malloc? The fact that they are not in the same binary. moz_malloc is in libmozalloc. malloc is in the executable (on linux), or libmozglue (!linux)
Assignee | ||
Comment 22•10 years ago
|
||
Comment on attachment 8382541 [details] [diff] [review] |mach valgrind-test|: use --soname-synonyms so that --disable-jemalloc isn't required. Review of attachment 8382541 [details] [diff] [review]: ----------------------------------------------------------------- Clearing review requests until things are clearer.
Attachment #8382541 -
Flags: review?(mh+mozilla)
Attachment #8382541 -
Flags: review?(jseward)
(In reply to Mike Hommey [:glandium] from comment #21) > (In reply to Julian Seward from comment #18) > > > moz_malloc is just a level of indirection, it calls malloc under the hood. > > > > what is there to stop malloc being inlined into moz_malloc? > > The fact that they are not in the same binary. moz_malloc is in libmozalloc. > malloc is in the executable (on linux), or libmozglue (!linux) This doesn't seem to be true on b2g, which is going down the !linux path: $ nm objdir-gecko/dist/bin/b2g | grep malloc U malloc $ nm objdir-gecko/dist/bin/libmozglue.so | grep malloc ... 000218b9 T malloc $ nm objdir-gecko/dist/bin/libxul.so | grep malloc U malloc 003c7869 T moz_malloc But, there is code that explicitly calls moz_malloc. We tell developers to do this, e.g. https://developer.mozilla.org/en-US/docs/Infallible_memory_allocation
Comment 24•10 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #23) > (In reply to Mike Hommey [:glandium] from comment #21) > > (In reply to Julian Seward from comment #18) > > > > moz_malloc is just a level of indirection, it calls malloc under the hood. > > > > > > what is there to stop malloc being inlined into moz_malloc? > > > > The fact that they are not in the same binary. moz_malloc is in libmozalloc. > > malloc is in the executable (on linux), or libmozglue (!linux) > > This doesn't seem to be true on b2g, which is going down the !linux path: Android and B2G *are* !linux as far as many things go, including this one. > $ nm objdir-gecko/dist/bin/b2g | grep malloc > U malloc > $ nm objdir-gecko/dist/bin/libmozglue.so | grep malloc > ... > 000218b9 T malloc > > $ nm objdir-gecko/dist/bin/libxul.so | grep malloc > U malloc > 003c7869 T moz_malloc > > But, there is code that explicitly calls moz_malloc. We tell developers to > do this, e.g. > https://developer.mozilla.org/en-US/docs/Infallible_memory_allocation Again, moz_malloc is not in the same lib as malloc, and it's always calling it. Now, I realize comment 15 was on b2g, and I think the problem is "simply" caused by the fact that libmozglue is LD_PRELOADed and valgrind can't divert that. Which is not a problem on desktop linux, which this bug is really about, isn't it?
Comment 25•10 years ago
|
||
Comment on attachment 8382604 [details] [diff] [review] fix jemalloc + valgrind, and ipc + valgrind Review of attachment 8382604 [details] [diff] [review]: ----------------------------------------------------------------- Let's just not touch jemalloc just yet.
Attachment #8382604 -
Flags: review?(mh+mozilla) → review-
Comment 26•10 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #24) > Now, I realize comment 15 was on b2g, and I think the problem is "simply" > caused by the fact that libmozglue is LD_PRELOADed and valgrind can't divert > that. I'm pretty sure that V can intercept functions in LD_PRELOAD objects. The intercept mechanism examines object when they get mapped into the process, and it doesn't care how they got mapped in. We might be looking at a soname-difference problem, though.
Comment 27•10 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #21) > (In reply to Julian Seward from comment #18) > > > moz_malloc is just a level of indirection, it calls malloc under the hood. > > > > what is there to stop malloc being inlined into moz_malloc? > > The fact that they are not in the same binary. moz_malloc is in libmozalloc. > malloc is in the executable (on linux), or libmozglue (!linux) I don't see that being in different binaries necessarily inhibits cross-module inlining, if enough of our malloc implementation is pushed into a header file. Is it marked __attribute__((noinline)), or whatever the MSVC equivalent is, assuming it exists?
Comment 28•10 years ago
|
||
(In reply to Julian Seward from comment #27) > I don't see that being in different binaries necessarily inhibits > cross-module inlining, if enough of our malloc implementation is > pushed into a header file. Is it marked __attribute__((noinline)), > or whatever the MSVC equivalent is, assuming it exists? We have no part of malloc implementation in a header.
Comment 29•10 years ago
|
||
Here's my best understanding of the situation so far. All Gecko allocation/free is routed through moz_malloc (et al) in libmozalloc.so. This calls onwards to malloc (et al) which is either in the executable, on desktop Linux, or in libmozglue.so, on B2G/Android. moz_malloc ----------------------> malloc in libmozalloc.so in executable (linux) in libmozglue.so (B2G/Android) Hence interception in either the executable or libmozglue.so should work. libmozglue.so appears to have the soname "libmozglue.so", and the executable has no soname and so we have to use the fake name "NONE". Hence, V flags: Desktop: --soname-synonyms=somalloc=NONE B2G/Android: --soname-synonyms=somalloc=libmozglue.so or --soname-synonyms=somalloc=libmozglueZdso For the B2G/Android case, I am not sure whether V will accept the name with a dot in, or whether it needs to be escaped to "Zd". I need to check.
Assignee | ||
Comment 30•3 years ago
|
||
No action in seven years, this isn't going to happen.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•