Make Nuwa be Valgrind-friendly

RESOLVED FIXED in Firefox 41

Status

()

Core
mozglue
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: jseward, Assigned: jseward)

Tracking

unspecified
mozilla41
ARM
Gonk (Firefox OS)
Points:
---

Firefox Tracking Flags

(firefox41 fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

3 years ago
gecko/mozglue/build/Nuwa.cpp implements a multithreaded fork facility,
using some non-standard techniques that cause Valgrind to report huge
numbers of false uninitialised value errors in b2g runs.

The problem appears to be that Nuwa uses longjmp() to set the stack
pointer values of the threads being restored in the child process.
The effect is to move the stack pointers down a small amount,
typically between 300 and 1000 bytes.  Valgrind/Memcheck marks the new
area as accessible but undefined, since the change looks to it like
allocation of a small-ish stack frame.  But in this particular case,
the heuristic is inappropriate, because the restored threads then
resume, using the newly recreated stacks.  And read values from those
frames and use them, but they are marked as uninitialised, and so V
complains.

A solution that appears to fix the problem is is to add a client
request for Memcheck, immediately after all setjmp calls that return
(in this file), that paints the small area of stack above the stack
pointer as defined-if-it-is-accessible.  Since we don't know, after
the setjmp returns, how far down SP moved, we have to paint some fixed
area above it and hope for the best.  4096 bytes appears to be enough,
whilst 2048 is not.

The best solution would be to use a special version of longjmp which
communicates to Valgrind exactly how/when the SP moves.  But longjmp
is written in assembly, so that would be (wildly) unportable.

Patch(es) to follow.
(Assignee)

Comment 1

3 years ago
Created attachment 8553692 [details] [diff] [review]
bug1125091-nuwa-w-mylongjmp.diff

WIP patch, do not commit.  This actually does fix the problem
AFAICS, but contains an implementation of mylongjmp() which
is only there for diagnostic purposes and is not used.  The final
fix will require only the definition and uses of macro POSTSETJMP
and nothing else.
(Assignee)

Comment 2

3 years ago
Thinker: the fix assumes that the setjmp calls in Nuwa.cpp only
move SP down by a few hundred bytes at most.  Hence the use of
2048*2 in POSTSETJMP in the patch.  But is that true?  Are there
any cases where either:

(1) setjmp could move SP to a higher address, or
(2) setjmp could move SP down a large distance (much larger than 2048*2)

?
Flags: needinfo?(tlee)

Updated

3 years ago
Flags: needinfo?(tlee)
There are 2 types of stack tricks in Nuwa.cpp:

1. The reservation of 256 (STACK_RESERVED_SZ * 4) bytes in the beginning of the downward growing stack, see

http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa.cpp#l613
and http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa.cpp#l627

This is reserved in thread_create_startup() in the Nuwa process and used by  thread_recreate_startup() in the forked process to prevent from touching the stack used by _thread_create_start() and downward. In the forked process, when returning to thread_create_startup(), it switches back to thread_recreate_startup() by longjmp(). This longjmp won't move SP too much (upward and < 256 bytes) and works much like the switch between coroutines.

2. The longjmp() into where the stack is frozen in the Nuwa process:
http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa.cpp#l1333

I am afraid that the movement of SP by this longjmp() is arbitrary (as much as limited by the stack size). I think this longjmp() will break the assumption in comment #2 for a deep stack (e.g. a thread used by WebRTC, see bug 966802).
(Assignee)

Comment 4

3 years ago
(In reply to Cervantes Yu from comment #3)

Cervantes, thank you for the information.  I have some more questions.

> There are 2 types of stack tricks in Nuwa.cpp:
> 
> 1. The reservation of 256 (STACK_RESERVED_SZ * 4) bytes in the beginning of
> the downward growing stack, see
> [..]
> This is reserved in thread_create_startup() in the Nuwa process and used by 
> thread_recreate_startup() in the forked process to prevent from touching the
> stack used by _thread_create_start() and downward. In the forked process,
> when returning to thread_create_startup(), it switches back to
> thread_recreate_startup() by longjmp(). This longjmp won't move SP too much
> (upward and < 256 bytes) and works much like the switch between coroutines.

I think this is not a problem.  Memcheck interprets "small move of SP
upwards" to mean that the uncovered area (sp_old .. sp_new-1) can no
longer be accessed safely, which I think is the correct
interpretation.

---

> 2. The longjmp() into where the stack is frozen in the Nuwa process:
> http://hg.mozilla.org/mozilla-central/file/6d55adbfd24d/mozglue/build/Nuwa.
> cpp#l1333
> 
> I am afraid that the movement of SP by this longjmp() is arbitrary (as much
> as limited by the stack size). I think this longjmp() will break the
> assumption in comment #2 for a deep stack (e.g. a thread used by WebRTC, see
> bug 966802).

Ok, that is interesting to know.  I experimented with this a bit and found
that the SP moved downwards somewhere between 1024 and 1536 bytes.

Two more questions about this case (2.):

(a) is the SP movement only "within a thread"?  I mean, does this
    longjmp move SP only inside the calling thread's stack?  Or does it
    move it from on thread's stack to another?

(b) if the answer to (a) is "only inside the calling thread's stack", is
    the movement down only (to lower addresses), up only, or can it be
    in either direction?
Flags: needinfo?(cyu)
(In reply to Julian Seward [:jseward] from comment #4)
> Ok, that is interesting to know.  I experimented with this a bit and found
> that the SP moved downwards somewhere between 1024 and 1536 bytes.
> 
> Two more questions about this case (2.):
> 
> (a) is the SP movement only "within a thread"?  I mean, does this
>     longjmp move SP only inside the calling thread's stack?  Or does it
>     move it from on thread's stack to another?
> 
Yes, it moves sp only within the thread. We don't juggle the stacks on the thread.

> (b) if the answer to (a) is "only inside the calling thread's stack", is
>     the movement down only (to lower addresses), up only, or can it be
>     in either direction?
It only moves SP downward into the thread stack.
Flags: needinfo?(cyu)
(Assignee)

Comment 6

3 years ago
Created attachment 8608097 [details] [diff] [review]
bug1125091-2.diff

A version that seems to work OK on Flame at least, and is
closer to something that could be landed.
Attachment #8553692 - Attachment is obsolete: true
(Assignee)

Comment 7

3 years ago
Comment on attachment 8608097 [details] [diff] [review]
bug1125091-2.diff

Cervantes, does the patch look like something that would be acceptable
for landing?
Attachment #8608097 - Flags: feedback?(cyu)
Comment on attachment 8608097 [details] [diff] [review]
bug1125091-2.diff

The use of fprintf() and the like is risky because the implementation is stack-hungry. I checked with gdb and found that deep inside the implementation fprintf(), when __sfvwrite() is called (bionic/libc/stdio/fvwrite.c), the stack grows at least 1700+ bytes, which is far larger than the stack reservation (256 bytes) we have in thread_create_startup(). That means the thread stack will be corrupted with the use of fprintf().

The patch can't land as is since DEBUG_VALGRIND_ANNOTATIONS is on, and fprintf() will be added even on a non-valgrind build. We should increase the stack reservation to a safe value if DEBUG_VALGRIND_ANNOTATIONS is on.
Attachment #8608097 - Flags: feedback?(cyu) → feedback-
(Assignee)

Comment 9

3 years ago
Created attachment 8610119 [details] [diff] [review]
bug1125091-3.diff

Addresses review comments in Comment 8.  Will now be completely silent
when not running on Valgrind, regardless of build config and of the
setting of DEBUG_VALGRIND_ANNOTATIONS.

* DEBUG_VALGRIND_ANNOTATIONS is now 0

* No use of fprintf.  Instead uses VALGRIND_PRINTF, which guarantees
  no linking problem and restricts max extra stack use to 12 words.
  This causes Valgrind itself to print the text.  libc fprintf is not
  used.

* No wrapping of LONGJMP any more; that was not necessary.

* STACK_RESERVED_SZ is increased from 64 to 96 (256 to 384 bytes on
  32 bit targets) so as to accommodate the 12 extra word requirement.

* Is restricted to 32-bit arm targets.  Will have no effect on any
  other platform.
Attachment #8608097 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8610119 - Flags: feedback?(cyu)
Comment on attachment 8610119 [details] [diff] [review]
bug1125091-3.diff

Review of attachment 8610119 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #8610119 - Flags: feedback?(cyu) → feedback+
(Assignee)

Updated

3 years ago
Attachment #8610119 - Flags: review?(n.nethercote)
Comment on attachment 8610119 [details] [diff] [review]
bug1125091-3.diff

Review of attachment 8610119 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable.

::: mozglue/build/Nuwa.cpp
@@ +32,5 @@
> +/**
> + * Support for telling Valgrind about the stack pointer changes that Nuwa
> + * makes.  Without this, Valgrind is unusable in Nuwa child processes due
> + * to the large number of false positives resulting from Nuwa's stack
> + * pointer changes.  See bug 1125091.

Nit: I don't think you need a javadoc-style /** comment here.

@@ +62,5 @@
> +   extra stack is therefore 12 words, that is, 48 bytes on arm32.
> +*/
> +#if defined(MOZ_VALGRIND) && defined(VALGRIND_MAKE_MEM_DEFINED_IF_ADDRESSABLE) \
> +    && defined(__arm__) && !defined(__aarch64__)
> +# define POSTSETJMP_RESTORE(_who) \

Nit: I'd add '_' after POST.

@@ +72,5 @@
> +      unsigned long int len = 1024*2; \
> +      VALGRIND_MAKE_MEM_DEFINED_IF_ADDRESSABLE(sp, len); \
> +      if (DEBUG_VALGRIND_ANNOTATIONS) { \
> +        VALGRIND_PRINTF("Nuwa: POSTSETJMP_RESTORE: marking [0x%lx, +%ld) as " \
> +                        "D-if-A, called by %s\n", sp, len, (_who)); \

Nit: expand "D-if-A" to "defined-if-addressable"?
Attachment #8610119 - Flags: review?(n.nethercote) → review+
(Assignee)

Updated

3 years ago
Assignee: nobody → jseward
https://hg.mozilla.org/mozilla-central/rev/85785e257efc
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
You need to log in before you can comment on or make changes to this bug.