Last Comment Bug 732173 - poison write during shutdown in a debug build
: poison write during shutdown in a debug build
Status: RESOLVED FIXED
[snappy:p1]
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: unspecified
: x86_64 Linux
: -- normal with 1 vote (vote)
: mozilla16
Assigned To: Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
:
Mentors:
Depends on: 794178 771083 771920 804828 1054632
Blocks: 662444 773903
  Show dependency treegraph
 
Reported: 2012-03-01 13:31 PST by Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
Modified: 2014-08-16 02:44 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
wip patch (3.52 KB, patch)
2012-03-22 09:37 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details | Diff | Review
os X demo program (9.34 KB, text/plain)
2012-04-27 07:35 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details
os X demo program (8.55 KB, application/x-bzip2)
2012-05-01 13:43 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details
poison write (45.47 KB, patch)
2012-05-08 05:59 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details | Diff | Review
With MozillaRegisterDebugFD/MozillaUnRegisterDebugFD (52.25 KB, patch)
2012-05-11 05:32 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details | Diff | Review
poison write (56.65 KB, patch)
2012-05-22 10:30 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review-
Details | Diff | Review
poison write (51.24 KB, patch)
2012-06-11 20:14 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details | Diff | Review
poison write (49.44 KB, patch)
2012-06-13 13:05 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
Details | Diff | Review
poison write (45.13 KB, patch)
2012-06-21 15:34 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
Details | Diff | Review
poison write (56.68 KB, patch)
2012-06-25 10:58 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
Details | Diff | Review
the upstream breakpad bits (2.73 KB, patch)
2012-06-29 15:35 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags Details | Diff | Review
update upstream bits (2.77 KB, patch)
2012-06-29 15:43 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
Details | Diff | Review

Description Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-03-01 13:31:41 PST

    
Comment 1 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-03-01 13:34:57 PST
We want to make sure that after some point in the shutdown process there are no more writes to disk. The most reliable way to do this is probably to poison the write plt.got entry. On linux x86_64 it looks like this works:

----------------------------------------------
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
#include <assert.h>
#include <stdlib.h>

void abortOnInvalidWrite() {
  abort();
}

static void poisonWrite() {
  unsigned char *plt;
  asm("leaq	write@PLT(%%rip), %0" : "=r" (plt));
  uint16_t opcode = *(uint16_t*) plt;
  assert(opcode == 0x25ff);
  uint32_t offset = *(uint32_t*)(plt + 2);
  void **PltGotAddr = (void**) (plt + offset + 6);
  *PltGotAddr = (void*) abortOnInvalidWrite;
}

int main() {
  write(1, "test\n", 5);
  poisonWrite();
  write(1, "test\n", 5);
  return 0;
}
----------------------------------

Unfortunately we have to extract the address from the plt because the linker rejects a  R_X86_64_JUMP_SLOT in a .o file.
Comment 2 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-03-22 09:37:03 PDT
Created attachment 608365 [details] [diff] [review]
wip patch

Testing the got patching idea found out that glibc skips the got when calling its own functions. Se the patching would work when we call write directly, but not via printf for example.

Another interesting problem found is that there are two write functions, one in libpthread and one in libc. We have to handle both.

What I am currently trying is patching the code. Similar to what a debugger would do to insert a breakpoint. The patch has less than 16 bytes, so we should be able to use an atomic (the wip patch still has a memcpy).

Another option would be to remove the exec bits of the start of write and handle the fault. This would be a bit harder since the function is not  required to be page aligned.
Comment 3 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-03-28 07:13:56 PDT
I did a try push to

https://tbpl.mozilla.org/?tree=Try&rev=76476f19c348

Very simple testing locally found that the last writes were from the StartupCache. Once I have the current safe position for calling poisonWrite I will attach a patch for review.
Comment 4 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-03 07:14:15 PDT
New push at 
https://tbpl.mozilla.org/?tree=Try&rev=e9ca5c834d38
Comment 5 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-05 10:00:13 PDT
https://tbpl.mozilla.org/?tree=Try&rev=7fcb5e09c95b
Comment 6 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-04-27 07:35:34 PDT
Created attachment 619026 [details]
os X demo program

Talking with Jeff Muizelaar he suggested that this might be easier to do on OS X, and I think he is right. The attached program already does a lot more than the linux/posix one. I will integrated it into m-c and see how it goes.
Comment 7 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-01 13:43:56 PDT
Created attachment 620052 [details]
os X demo program

While trying to get asan working for another bug I found an interesting mini lib that it uses for wrapping some library calls. It is similar to what I was doing, but has some big advantages:

* supports 32 and 64 bits
* allocates memory close to the function, so the patch is just a 5 byte jmp.
* knows which functions are safe to copy and copies the old header to an executable buffer. The wrappers can use that buffer instead of repatching the wrapped function.
Comment 8 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-02 20:28:06 PDT
There is some difference in symbols for 10.5, 10.6 and 10.7, but a try push with the symbols common to all 3 is at

https://tbpl.mozilla.org/?tree=Try&rev=be061e339976

I will try to cover all symbols tomorrow.
Comment 9 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-03 08:50:29 PDT
I think I fixed the 32 bit issue:
https://tbpl.mozilla.org/?tree=Try&rev=e67ff636750d
Comment 10 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-04 08:09:40 PDT
This is already find late writes to disk. I will open a separate bug about it.

A new try push is at.
https://tbpl.mozilla.org/?tree=Try&rev=4e9b83049363

This one should be all green, at which point I will clean it up and send for review.
Comment 11 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-07 13:40:56 PDT
I have moved poisoning down a bit to avoid bugs 752642 and 751899. Lets see if it finds more.

https://tbpl.mozilla.org/?tree=Try&rev=5b432c95d12e
Comment 12 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-08 05:59:45 PDT
Created attachment 621946 [details] [diff] [review]
poison write

This patch poisons the 13 write functions of OS X at a point where firefox no longer writes to disk.

Functions that are not used after this point are replaced with abort. Those that have some valid uses (like logging) are replaced with a wrapper.

The idea is to enable this first and then slowly move the point we poison write back as the late writes are moved/removed.

Is this the correct way to import third party code? We don't seem to have a third_party directory. Is the copyright noticed in the new files correct?

https://tbpl.mozilla.org/?tree=Try&rev=9ba969032ebc
Comment 13 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-09 07:27:50 PDT
I found two other valid writes late in the shutdown process. Both from breakpad. The list is getting somewhat big for mozPoinsonWrite.cpp to know them all, so I think I will change this to use a register/unregister interface that clients can use to let us know that a given file descriptor can be used during shutdown. Ted, what do you think?
Comment 14 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-11 05:32:14 PDT
Created attachment 623108 [details] [diff] [review]
With MozillaRegisterDebugFD/MozillaUnRegisterDebugFD

https://tbpl.mozilla.org/?tree=Try&rev=9b6d483029d4

This is similar to the previous patch, but handles the breakpad fds too. It uses the functions MozillaRegisterDebugFD and MozillaUnRegisterDebugFD the know the list of file descriptors that are OK to use during shutdown instead of AbortOnBadWrite knowing where too look.

This new patch is not as cleaner as I was expecting. Mostly because we have to handle writes that happen during static destructors, se we have to leak the vector and mutex. I can fix this by using a fixed sized static buffer and a posix mutex, but would like to know what you think first.
Comment 15 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-05-22 10:30:50 PDT
Created attachment 626079 [details] [diff] [review]
poison write

This is an updated patch. It is using the register/unregister pair, has code to ignore the bogus nss (dbm really) writes and moves the poisoning as early as I could get it so far.

https://tbpl.mozilla.org/?tree=Try&rev=4af3cf41c0d9
Comment 16 Ted Mielczarek [:ted.mielczarek] 2012-06-08 10:44:05 PDT
Comment on attachment 626079 [details] [diff] [review]
poison write

Review of attachment 626079 [details] [diff] [review]:
-----------------------------------------------------------------

r- for the Breakpad stuff, I have a few other comments.

Is there a plan to implement this for other platforms in a followup?

::: toolkit/crashreporter/google-breakpad/src/client/minidump_file_writer.cc
@@ +40,5 @@
>  #include "common/linux/linux_syscall_support.h"
>  #include "common/linux/linux_libc_support.h"
>  #include "client/minidump_file_writer-inl.h"
>  #include "common/string_conversion.h"
> +#include "mozilla/mozPoisonWrite.h"

This isn't going to work. :-/ The Breakpad code is an upstream library, we can't stick calls to Mozilla functions in here.

I'm curious, though. Why is this ever a problem? Minidump writing should only happen on a crash. Were you seeing crashes during shutdown?

We can install a Breakpad callback that happens after the crash but before the minidump gets written. Could we simply disable write poisoning in that case, since we're crashing anyway?

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +520,5 @@
>                        O_WRONLY | O_CREAT | O_TRUNC,
>                        0666);
>  
>      if (fd != -1) {
> +      MozillaRegisterDebugFD(fd);

This also only happens during a crash, so I don't see why it'd be a problem either.

::: tools/trace-malloc/lib/nsTraceMalloc.c
@@ +1768,5 @@
>      TM_SUPPRESS_TRACING_AND_ENTER_LOCK(t);
>  
>      ofp = fopen(pathname, WRITE_FLAGS);
>      if (ofp) {
> +        MozillaRegisterDebugFD(fileno(ofp));

Seems like you almost want an RAII class here to wrap this. There's a template class for RAII classes in mfbt now, if you want to do that:
http://mxr.mozilla.org/mozilla-central/source/mfbt/Scoped.h

::: xpcom/base/nsTraceRefcntImpl.cpp
@@ +1263,3 @@
>      fclose(gBloatLog);
> +    if (fd != 1 && fd != 2)
> +      MozillaUnRegisterDebugFD(fd);

Would be nice to avoid having to write this 5 times here, maybe it's worth making this a function?

::: xpcom/build/mach_override.c
@@ +1,1 @@
> +/* copied from https://github.com/rentzsch/mach_star */

I'm not actually going to review mach_override.{c,h}.

::: xpcom/build/mozPoisonWrite.cpp
@@ +43,5 @@
> +#include <string.h>
> +
> +using namespace mozilla;
> +
> +#ifdef XP_MACOSX

I think either you should put a comment here noting that this entire file is a no-op everywhere but OS X, or you should split the OS X and other implementations into separate files, like mozPoisonWriteMac.cpp and mozPoisonWriteStub.cpp. It's sort of confusing as-is.

@@ +76,5 @@
> +
> +// Define a simple FuncData that just aborts.
> +#define DEFINE_F_DATA_ABORT(X)                                       \
> +static void wrap_ ## X() { abort(); }                                \
> +static FuncData X ## _data = { 0, (void*) wrap_ ## X, (void*) X }

I think it'd make sense to just reuse DEFINE_F_DATA here.

@@ +81,5 @@
> +
> +// Define a simple FuncData that just aborts for a function that needs dlsym.
> +#define DEFINE_F_DATA_ABORT_DYN(X, NAME)                             \
> +static void wrap_ ## X() { abort(); }                                \
> +static FuncData X ## _data = { NAME, (void*) wrap_ ## X }

and DEFINE_F_DATA_DYN here.

@@ +110,5 @@
> +// These exist on 64 bit OS X
> +static ssize_t wrap_write_NOCANCEL(int fd, const void *buf, size_t count);
> +DEFINE_F_DATA_DYN(write_NOCANCEL, "write$NOCANCEL");
> +
> +const static int NumFunctions = 13;

I don't know what our style guide says, but I've been moving away from file-statics in favor of using anonymous namespaces.

@@ +126,5 @@
> +
> +                                             &writev_data,
> +                                             &writev_NOCANCEL_UNIX2003_data,
> +                                             &writev_UNIX2003_data,
> +                                             &writev_NOCANCEL_data};

I think these would be better written as
static FuncData *Functions[] = { ... }
static int NumFunctions = sizeof(Functions) / sizeof(Functions[0]); (not sure if you can use ArrayLength here: http://mxr.mozilla.org/mozilla-central/source/mfbt/Util.h#309)

@@ +131,5 @@
> +
> +static PRLock *getDebugFDsLock() {
> +    // FIXME: have to use something lower level than a mutex. We can
> +    // get recursive in here when called from logging a call to free.
> +    static PRLock *Lock = PR_NewLock();

Any reason to not just make this a file-static?

@@ +150,5 @@
> +        PR_Lock(mL);
> +    }
> +    ~MyAutoLock() {
> +        PR_Unlock(mL);
> +    }

You could use the MFBT Scoped stuff here.

@@ +236,5 @@
> +wrap_write_NOCANCEL_UNIX2003(int fd, const void *buf, size_t count) {
> +    AbortOnBadWrite(fd, buf, count);
> +    write_t old_write = (write_t) write_NOCANCEL_UNIX2003_data.Buffer;
> +    return old_write(fd, buf, count);
> +}

These seem like they could all be generated via macros, since they all have the same body.

@@ +239,5 @@
> +    return old_write(fd, buf, count);
> +}
> +
> +namespace mozilla {
> +    void PoisonWrite() {

Don't indent inside a namespace.

::: xpcom/build/mozPoisonWrite.h
@@ +29,5 @@
> + * and other provisions required by the GPL or the LGPL. If you do not delete
> + * the provisions above, a recipient may use your version of this file under
> + * the terms of any one of the MPL, the GPL or the LGPL.
> + *
> + * ***** END LICENSE BLOCK ***** */

You want to use the new MPL2 boilerplate.

@@ +40,5 @@
> +MOZ_END_EXTERN_C
> +
> +#ifdef __cplusplus
> +namespace mozilla {
> +    void PoisonWrite();

I think our style guide says not to indent inside a namespace block.

::: xpcom/build/nsXPComInit.cpp
@@ +615,5 @@
> +        // #16 0x00000001022a107f in NS_ProcessNextEvent_P (thread=0x1088bf690, mayWait=true) at nsThreadUtils.cpp:245
> +        // #17 0x0000000102337ec5 in nsThread::ThreadFunc (arg=0x1088bf690) at /Users/espindola/mozilla-central/xpcom/threads/nsThread.cpp:289
> +        // #18 0x000000010827e504 in _pt_root (arg=0x1088895f0) at /Users/espindola/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:155
> +        // #19 0x00007fff86c688bf in _pthread_start ()
> +        // #20 0x00007fff86c6bb75 in thread_start ()

This comment doesn't seem to add much value, remove it.
Comment 17 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-11 08:06:14 PDT
> r- for the Breakpad stuff, I have a few other comments.
> 
> Is there a plan to implement this for other platforms in a followup?

Yes. I actually started with linux trying to poison the PLT, but OS X was quiet a bit easier. The current OS X implementation can probably be ported to Linux. Ehsan is taking a look if there is a way to implement something like this on windows.

> ::: toolkit/crashreporter/google-breakpad/src/client/minidump_file_writer.cc
> @@ +40,5 @@
> >  #include "common/linux/linux_syscall_support.h"
> >  #include "common/linux/linux_libc_support.h"
> >  #include "client/minidump_file_writer-inl.h"
> >  #include "common/string_conversion.h"
> > +#include "mozilla/mozPoisonWrite.h"
> 
> This isn't going to work. :-/ The Breakpad code is an upstream library, we
> can't stick calls to Mozilla functions in here.

I had some hopes of being able to put this behind a macro. Maybe one for opening
and closing the file instead of registering/unregestering file descriptors. Do you thing that would work?

If not, we probably need a procedure for keeping a local change. Do you have a suggestion?

> I'm curious, though. Why is this ever a problem? Minidump writing should
> only happen on a crash. Were you seeing crashes during shutdown?

The case I found was the write poisoning itself causing a crash with a failed assert. The expected result is that we get a minidump so that we can debug what is causing the late write, but instead we get into a "crash loop"

1) Write poisoning finds a bug and an assert fails
2) Breakpad tries to write a minidump
3) Write poisoning thinks it found another bug, goto 2

> We can install a Breakpad callback that happens after the crash but before
> the minidump gets written. Could we simply disable write poisoning in that
> case, since we're crashing anyway?

That could work :-)
Do you have a pointer to how to do it?

> ::: toolkit/crashreporter/nsExceptionHandler.cpp
> @@ +520,5 @@
> >                        O_WRONLY | O_CREAT | O_TRUNC,
> >                        0666);
> >  
> >      if (fd != -1) {
> > +      MozillaRegisterDebugFD(fd);
> 
> This also only happens during a crash, so I don't see why it'd be a problem
> either.

Same "crash loop" problem.
 
> ::: tools/trace-malloc/lib/nsTraceMalloc.c
> @@ +1768,5 @@
> >      TM_SUPPRESS_TRACING_AND_ENTER_LOCK(t);
> >  
> >      ofp = fopen(pathname, WRITE_FLAGS);
> >      if (ofp) {
> > +        MozillaRegisterDebugFD(fileno(ofp));
> 
> Seems like you almost want an RAII class here to wrap this. There's a
> template class for RAII classes in mfbt now, if you want to do that:
> http://mxr.mozilla.org/mozilla-central/source/mfbt/Scoped.h

Will take a look, thanks.

> ::: xpcom/base/nsTraceRefcntImpl.cpp
> @@ +1263,3 @@
> >      fclose(gBloatLog);
> > +    if (fd != 1 && fd != 2)
> > +      MozillaUnRegisterDebugFD(fd);
> 
> Would be nice to avoid having to write this 5 times here, maybe it's worth
> making this a function?

OK.

> ::: xpcom/build/mach_override.c
> @@ +1,1 @@
> > +/* copied from https://github.com/rentzsch/mach_star */
> 
> I'm not actually going to review mach_override.{c,h}.

Is this the correct way to add third party code? Should it be in a separated directory or something? Is there someone that you think should review the code itself?
 
> ::: xpcom/build/mozPoisonWrite.cpp
> @@ +43,5 @@
> > +#include <string.h>
> > +
> > +using namespace mozilla;
> > +
> > +#ifdef XP_MACOSX
> 
> I think either you should put a comment here noting that this entire file is
> a no-op everywhere but OS X, or you should split the OS X and other
> implementations into separate files, like mozPoisonWriteMac.cpp and
> mozPoisonWriteStub.cpp. It's sort of confusing as-is.

I will try to split it and select which one to use in the makefile.

> @@ +76,5 @@
> > +
> > +// Define a simple FuncData that just aborts.
> > +#define DEFINE_F_DATA_ABORT(X)                                       \
> > +static void wrap_ ## X() { abort(); }                                \
> > +static FuncData X ## _data = { 0, (void*) wrap_ ## X, (void*) X }
> 
> I think it'd make sense to just reuse DEFINE_F_DATA here.

OK.

> @@ +81,5 @@
> > +
> > +// Define a simple FuncData that just aborts for a function that needs dlsym.
> > +#define DEFINE_F_DATA_ABORT_DYN(X, NAME)                             \
> > +static void wrap_ ## X() { abort(); }                                \
> > +static FuncData X ## _data = { NAME, (void*) wrap_ ## X }
> 
> and DEFINE_F_DATA_DYN here.

OK.

> @@ +110,5 @@
> > +// These exist on 64 bit OS X
> > +static ssize_t wrap_write_NOCANCEL(int fd, const void *buf, size_t count);
> > +DEFINE_F_DATA_DYN(write_NOCANCEL, "write$NOCANCEL");
> > +
> > +const static int NumFunctions = 13;
> 
> I don't know what our style guide says, but I've been moving away from
> file-statics in favor of using anonymous namespaces.

Will give that a try.
 
> @@ +126,5 @@
> > +
> > +                                             &writev_data,
> > +                                             &writev_NOCANCEL_UNIX2003_data,
> > +                                             &writev_UNIX2003_data,
> > +                                             &writev_NOCANCEL_data};
> 
> I think these would be better written as
> static FuncData *Functions[] = { ... }
> static int NumFunctions = sizeof(Functions) / sizeof(Functions[0]); (not
> sure if you can use ArrayLength here:
> http://mxr.mozilla.org/mozilla-central/source/mfbt/Util.h#309)

OK.

> @@ +131,5 @@
> > +
> > +static PRLock *getDebugFDsLock() {
> > +    // FIXME: have to use something lower level than a mutex. We can
> > +    // get recursive in here when called from logging a call to free.
> > +    static PRLock *Lock = PR_NewLock();
> 
> Any reason to not just make this a file-static?

I was using a mutex before, and that one complains if it is constructed from an static initialization. I guess a PRLock would work, but it is better to avoid a static constructor if we can, no?

> @@ +150,5 @@
> > +        PR_Lock(mL);
> > +    }
> > +    ~MyAutoLock() {
> > +        PR_Unlock(mL);
> > +    }
> 
> You could use the MFBT Scoped stuff here.

OK.

> @@ +236,5 @@
> > +wrap_write_NOCANCEL_UNIX2003(int fd, const void *buf, size_t count) {
> > +    AbortOnBadWrite(fd, buf, count);
> > +    write_t old_write = (write_t) write_NOCANCEL_UNIX2003_data.Buffer;
> > +    return old_write(fd, buf, count);
> > +}
> 
> These seem like they could all be generated via macros, since they all have
> the same body.

Similar, I will give it a try.
 
> @@ +239,5 @@
> > +    return old_write(fd, buf, count);
> > +}
> > +
> > +namespace mozilla {
> > +    void PoisonWrite() {
> 
> Don't indent inside a namespace.

OK.

> ::: xpcom/build/mozPoisonWrite.h
> @@ +29,5 @@
> > + * and other provisions required by the GPL or the LGPL. If you do not delete
> > + * the provisions above, a recipient may use your version of this file under
> > + * the terms of any one of the MPL, the GPL or the LGPL.
> > + *
> > + * ***** END LICENSE BLOCK ***** */
> 
> You want to use the new MPL2 boilerplate.

OK.

> @@ +40,5 @@
> > +MOZ_END_EXTERN_C
> > +
> > +#ifdef __cplusplus
> > +namespace mozilla {
> > +    void PoisonWrite();
> 
> I think our style guide says not to indent inside a namespace block.

OK

> ::: xpcom/build/nsXPComInit.cpp
> @@ +615,5 @@
> > +        // #16 0x00000001022a107f in NS_ProcessNextEvent_P (thread=0x1088bf690, mayWait=true) at nsThreadUtils.cpp:245
> > +        // #17 0x0000000102337ec5 in nsThread::ThreadFunc (arg=0x1088bf690) at /Users/espindola/mozilla-central/xpcom/threads/nsThread.cpp:289
> > +        // #18 0x000000010827e504 in _pt_root (arg=0x1088895f0) at /Users/espindola/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:155
> > +        // #19 0x00007fff86c688bf in _pthread_start ()
> > +        // #20 0x00007fff86c6bb75 in thread_start ()
> 
> This comment doesn't seem to add much value, remove it.

I reported a bug with it and forgot to remove it from the file, sorry.

I will try to upload a new patch today.
Comment 18 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-11 20:14:34 PDT
Created attachment 632116 [details] [diff] [review]
poison write

https://tbpl.mozilla.org/?tree=Try&rev=31ad305a2a58

The breakpad changes are still as they were in the previous patch. How do I setup a callback before a minidump is written to disk?

The idea of using RAII in the end only worked in one file, since for breakpad we will do something completely different and another file is C.

A hacked version showing that breakpad is able to write a minidump is at

https://tbpl.mozilla.org/?tree=Try&rev=e843bb6acdc8
Comment 19 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-13 13:05:21 PDT
Created attachment 632827 [details] [diff] [review]
poison write

It looks like Ted's idea of using the callback to disable write poisoning works. A try push is at

https://tbpl.mozilla.org/?tree=Try&rev=2368d69e5873

And a push to show that breakpad is working is at

https://tbpl.mozilla.org/?tree=Try&rev=4ac361a5bb13
Comment 20 Ted Mielczarek [:ted.mielczarek] 2012-06-20 08:29:59 PDT
Comment on attachment 632827 [details] [diff] [review]
poison write

Review of attachment 632827 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, thanks for the changes! Just a few nits.

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +635,5 @@
>    return !(envvar && *envvar);
>  }
>  
> +namespace {
> +  bool Filter(void *context) {

nit: style for this file is * snuggled up to the type name.

::: xpcom/build/mozPoisonWriteMac.cpp
@@ +113,5 @@
> +
> +std::vector<int>& getDebugFDs() {
> +    // We have to use new as some write happen during static destructors
> +    // so an static std::vector might be destroyed while we still need it.
> +    static std::vector<int> *DebugFDs = new std::vector<int>();

So we're just going to leak this then, right? We might want an annotation to tell Valgrind that this is okay.

@@ +170,5 @@
> +    if (std::find(Vec.begin(), Vec.end(), fd) != Vec.end())
> +        return;
> +
> +    // As a really bad hack, accept writes that don't change the on disk
> +    // content. This is needed because dbm doesn't keep track of dirt bits

nit: "dirty".

@@ +172,5 @@
> +
> +    // As a really bad hack, accept writes that don't change the on disk
> +    // content. This is needed because dbm doesn't keep track of dirt bits
> +    // and can end up writing the same data to disk twice. Once when the
> +    // user (nss) asks it to sync and once when closing the database.

This does suck. I thought we weren't actually using dbm anywhere anymore?
Comment 21 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-21 15:34:10 PDT
Created attachment 635508 [details] [diff] [review]
poison write

Julian tells me that valgrind currently has no API for marking a leak as OK, but I have found a way to free the resources.

I have also added an early return in release builds. Sorry I forgot that before. We will want to enable this on release builds (with a runtime option), but it is safer to start only with debug ones.

Sending for a final OK.

https://tbpl.mozilla.org/?tree=Try&rev=79aee4c72014
Comment 22 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-25 10:58:41 PDT
Created attachment 636391 [details] [diff] [review]
poison write

I did a last try run just before committing, and it found a new write to disk :-(

Avoiding the write itself is simple (move the call to poison write), but the interesting thing is that the write was in another process we were talking to via ipc. We were not setting a filter for that case. In fact, breakpad doesn't support filters for child process right now.

This patch fixes both issues. Do you think the breakpad patch is OK? Should I send it upstream first or can I do it in parallel?

https://tbpl.mozilla.org/?tree=Try&rev=20e27ef3c670
Comment 23 Ted Mielczarek [:ted.mielczarek] 2012-06-29 13:09:17 PDT
Comment on attachment 636391 [details] [diff] [review]
poison write

Review of attachment 636391 [details] [diff] [review]:
-----------------------------------------------------------------

If you could give me the Breakpad bits as a patch against Breakpad SVN I could land it upstream for you.

::: toolkit/crashreporter/google-breakpad/src/client/mac/crash_generation/crash_generation_server.h
@@ +64,5 @@
>                                                const std::string &file_path);
>  
>    typedef void (*OnClientExitingCallback)(void *context,
>                                            const ClientInfo &client_info);
> +  typedef bool (*Filter)();

Should name this FilterCallback. Do you think it makes sense to add a context param as well for consistency with the other callbacks?

@@ +69,5 @@
>  
>    // Create an instance with the given parameters.
>    //
>    // mach_port_name: Named server port to listen on.
>    // dump_callback: Callback for a client crash dump request.

Please add a comment describing the new parameter.

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +634,5 @@
>    const char *envvar = PR_GetEnv("MOZ_CRASHREPORTER_NO_REPORT");
>    return !(envvar && *envvar);
>  }
>  
> +namespace {

We should probably move more of this file into an anon namespace. (But we don't have to do that now obviously.)
Comment 24 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-29 15:35:04 PDT
Created attachment 638032 [details] [diff] [review]
the upstream breakpad bits
Comment 25 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-29 15:36:43 PDT
Comment on attachment 638032 [details] [diff] [review]
the upstream breakpad bits

sorry, still have to finish up the patch.
Comment 26 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-29 15:43:11 PDT
Created attachment 638034 [details] [diff] [review]
update upstream bits
Comment 27 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-06-29 15:44:39 PDT
A final try push is at
https://tbpl.mozilla.org/?tree=Try&rev=8564a6b7e46a

I will push as soon as it finishes.
Comment 28 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-07-03 06:17:42 PDT
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=da871640d448
Comment 29 Ryan VanderMeulen [:RyanVM] 2012-07-03 16:08:13 PDT
https://hg.mozilla.org/mozilla-central/rev/da871640d448
Comment 30 Boris Zbarsky [:bz] 2012-07-08 00:55:25 PDT
I've had a number of crashes in my debug build today that seem to be due to this.... Unfortunately, I'm failing to catch them in a debugger so far.  :(

Could we poison with something that would sleep() for 30 seconds before crashing?
Comment 31 Ted Mielczarek [:ted.mielczarek] 2012-07-19 18:08:52 PDT
Comment on attachment 638034 [details] [diff] [review]
update upstream bits

Thanks, I'll land this upstream.
Comment 32 Ted Mielczarek [:ted.mielczarek] 2012-07-20 05:25:21 PDT
Landed upstream:
http://code.google.com/p/google-breakpad/source/detail?r=990
Comment 33 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-07-20 05:50:51 PDT
(In reply to Ted Mielczarek [:ted] from comment #32)
> Landed upstream:
> http://code.google.com/p/google-breakpad/source/detail?r=990

Thanks, and thanks for adding the test!

Note You need to log in before you can comment on or make changes to this bug.