Closed Bug 1125091 Opened 10 years ago Closed 10 years ago

Make Nuwa be Valgrind-friendly

Categories

(Core :: mozglue, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: jseward, Assigned: jseward)

Details

Attachments

(1 file, 2 obsolete files)

gecko/mozglue/build/Nuwa.cpp implements a multithreaded fork facility, using some non-standard techniques that cause Valgrind to report huge numbers of false uninitialised value errors in b2g runs. The problem appears to be that Nuwa uses longjmp() to set the stack pointer values of the threads being restored in the child process. The effect is to move the stack pointers down a small amount, typically between 300 and 1000 bytes. Valgrind/Memcheck marks the new area as accessible but undefined, since the change looks to it like allocation of a small-ish stack frame. But in this particular case, the heuristic is inappropriate, because the restored threads then resume, using the newly recreated stacks. And read values from those frames and use them, but they are marked as uninitialised, and so V complains. A solution that appears to fix the problem is is to add a client request for Memcheck, immediately after all setjmp calls that return (in this file), that paints the small area of stack above the stack pointer as defined-if-it-is-accessible. Since we don't know, after the setjmp returns, how far down SP moved, we have to paint some fixed area above it and hope for the best. 4096 bytes appears to be enough, whilst 2048 is not. The best solution would be to use a special version of longjmp which communicates to Valgrind exactly how/when the SP moves. But longjmp is written in assembly, so that would be (wildly) unportable. Patch(es) to follow.
Attached patch bug1125091-nuwa-w-mylongjmp.diff (obsolete) — Splinter Review
WIP patch, do not commit. This actually does fix the problem AFAICS, but contains an implementation of mylongjmp() which is only there for diagnostic purposes and is not used. The final fix will require only the definition and uses of macro POSTSETJMP and nothing else.
Thinker: the fix assumes that the setjmp calls in Nuwa.cpp only move SP down by a few hundred bytes at most. Hence the use of 2048*2 in POSTSETJMP in the patch. But is that true? Are there any cases where either: (1) setjmp could move SP to a higher address, or (2) setjmp could move SP down a large distance (much larger than 2048*2) ?
Flags: needinfo?(tlee)
Flags: needinfo?(tlee)
There are 2 types of stack tricks in Nuwa.cpp: 1. The reservation of 256 (STACK_RESERVED_SZ * 4) bytes in the beginning of the downward growing stack, see http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa.cpp#l613 and http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa.cpp#l627 This is reserved in thread_create_startup() in the Nuwa process and used by thread_recreate_startup() in the forked process to prevent from touching the stack used by _thread_create_start() and downward. In the forked process, when returning to thread_create_startup(), it switches back to thread_recreate_startup() by longjmp(). This longjmp won't move SP too much (upward and < 256 bytes) and works much like the switch between coroutines. 2. The longjmp() into where the stack is frozen in the Nuwa process: http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa.cpp#l1333 I am afraid that the movement of SP by this longjmp() is arbitrary (as much as limited by the stack size). I think this longjmp() will break the assumption in comment #2 for a deep stack (e.g. a thread used by WebRTC, see bug 966802).
(In reply to Cervantes Yu from comment #3) Cervantes, thank you for the information. I have some more questions. > There are 2 types of stack tricks in Nuwa.cpp: > > 1. The reservation of 256 (STACK_RESERVED_SZ * 4) bytes in the beginning of > the downward growing stack, see > [..] > This is reserved in thread_create_startup() in the Nuwa process and used by > thread_recreate_startup() in the forked process to prevent from touching the > stack used by _thread_create_start() and downward. In the forked process, > when returning to thread_create_startup(), it switches back to > thread_recreate_startup() by longjmp(). This longjmp won't move SP too much > (upward and < 256 bytes) and works much like the switch between coroutines. I think this is not a problem. Memcheck interprets "small move of SP upwards" to mean that the uncovered area (sp_old .. sp_new-1) can no longer be accessed safely, which I think is the correct interpretation. --- > 2. The longjmp() into where the stack is frozen in the Nuwa process: > http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa. > cpp#l1333 > > I am afraid that the movement of SP by this longjmp() is arbitrary (as much > as limited by the stack size). I think this longjmp() will break the > assumption in comment #2 for a deep stack (e.g. a thread used by WebRTC, see > bug 966802). Ok, that is interesting to know. I experimented with this a bit and found that the SP moved downwards somewhere between 1024 and 1536 bytes. Two more questions about this case (2.): (a) is the SP movement only "within a thread"? I mean, does this longjmp move SP only inside the calling thread's stack? Or does it move it from on thread's stack to another? (b) if the answer to (a) is "only inside the calling thread's stack", is the movement down only (to lower addresses), up only, or can it be in either direction?
Flags: needinfo?(cyu)
(In reply to Julian Seward [:jseward] from comment #4) > Ok, that is interesting to know. I experimented with this a bit and found > that the SP moved downwards somewhere between 1024 and 1536 bytes. > > Two more questions about this case (2.): > > (a) is the SP movement only "within a thread"? I mean, does this > longjmp move SP only inside the calling thread's stack? Or does it > move it from on thread's stack to another? > Yes, it moves sp only within the thread. We don't juggle the stacks on the thread. > (b) if the answer to (a) is "only inside the calling thread's stack", is > the movement down only (to lower addresses), up only, or can it be > in either direction? It only moves SP downward into the thread stack.
Flags: needinfo?(cyu)
Attached patch bug1125091-2.diff (obsolete) — Splinter Review
A version that seems to work OK on Flame at least, and is closer to something that could be landed.
Attachment #8553692 - Attachment is obsolete: true
Comment on attachment 8608097 [details] [diff] [review] bug1125091-2.diff Cervantes, does the patch look like something that would be acceptable for landing?
Attachment #8608097 - Flags: feedback?(cyu)
Comment on attachment 8608097 [details] [diff] [review] bug1125091-2.diff The use of fprintf() and the like is risky because the implementation is stack-hungry. I checked with gdb and found that deep inside the implementation fprintf(), when __sfvwrite() is called (bionic/libc/stdio/fvwrite.c), the stack grows at least 1700+ bytes, which is far larger than the stack reservation (256 bytes) we have in thread_create_startup(). That means the thread stack will be corrupted with the use of fprintf(). The patch can't land as is since DEBUG_VALGRIND_ANNOTATIONS is on, and fprintf() will be added even on a non-valgrind build. We should increase the stack reservation to a safe value if DEBUG_VALGRIND_ANNOTATIONS is on.
Attachment #8608097 - Flags: feedback?(cyu) → feedback-
Addresses review comments in Comment 8. Will now be completely silent when not running on Valgrind, regardless of build config and of the setting of DEBUG_VALGRIND_ANNOTATIONS. * DEBUG_VALGRIND_ANNOTATIONS is now 0 * No use of fprintf. Instead uses VALGRIND_PRINTF, which guarantees no linking problem and restricts max extra stack use to 12 words. This causes Valgrind itself to print the text. libc fprintf is not used. * No wrapping of LONGJMP any more; that was not necessary. * STACK_RESERVED_SZ is increased from 64 to 96 (256 to 384 bytes on 32 bit targets) so as to accommodate the 12 extra word requirement. * Is restricted to 32-bit arm targets. Will have no effect on any other platform.
Attachment #8608097 - Attachment is obsolete: true
Attachment #8610119 - Flags: feedback?(cyu)
Comment on attachment 8610119 [details] [diff] [review] bug1125091-3.diff Review of attachment 8610119 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me.
Attachment #8610119 - Flags: feedback?(cyu) → feedback+
Attachment #8610119 - Flags: review?(n.nethercote)
Comment on attachment 8610119 [details] [diff] [review] bug1125091-3.diff Review of attachment 8610119 [details] [diff] [review]: ----------------------------------------------------------------- Seems reasonable. ::: mozglue/build/Nuwa.cpp @@ +32,5 @@ > +/** > + * Support for telling Valgrind about the stack pointer changes that Nuwa > + * makes. Without this, Valgrind is unusable in Nuwa child processes due > + * to the large number of false positives resulting from Nuwa's stack > + * pointer changes. See bug 1125091. Nit: I don't think you need a javadoc-style /** comment here. @@ +62,5 @@ > + extra stack is therefore 12 words, that is, 48 bytes on arm32. > +*/ > +#if defined(MOZ_VALGRIND) && defined(VALGRIND_MAKE_MEM_DEFINED_IF_ADDRESSABLE) \ > + && defined(__arm__) && !defined(__aarch64__) > +# define POSTSETJMP_RESTORE(_who) \ Nit: I'd add '_' after POST. @@ +72,5 @@ > + unsigned long int len = 1024*2; \ > + VALGRIND_MAKE_MEM_DEFINED_IF_ADDRESSABLE(sp, len); \ > + if (DEBUG_VALGRIND_ANNOTATIONS) { \ > + VALGRIND_PRINTF("Nuwa: POSTSETJMP_RESTORE: marking [0x%lx, +%ld) as " \ > + "D-if-A, called by %s\n", sp, len, (_who)); \ Nit: expand "D-if-A" to "defined-if-addressable"?
Attachment #8610119 - Flags: review?(n.nethercote) → review+
Assignee: nobody → jseward
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: