Implement write poisoning on Windows

RESOLVED FIXED in mozilla21

Status

()

RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: Ehsan, Assigned: BenWa)

Tracking

Trunk
mozilla21
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(relnote-firefox -)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

6 years ago
Poison all of the writes!
(Reporter)

Comment 1

6 years ago
So, I took a quick look around and there seems to be only one write function on Windows (weird, right?!) called NtFlushBuffersFile, which is what kernel32's FlushFileBuffers is implemented on top of.

NtFlushBuffersFile is documented here: <http://msdn.microsoft.com/en-us/library/windows/hardware/ff566454%28v=vs.85%29.aspx> (don't get confused by the Zw prefix.)  It is implemented in ntdll.dll and it's what calls the Windows kernel to perform the write.

For intercepting this call, we can use http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsWindowsDllInterceptor.h.  Here's what the code will roughly look like:

WindowsDllInterceptor interceptor;

typedef NTSTATUS (WINAPI* FlushBuffersFile_fn)(HANDLE, PIO_STATUS_BLOCK);

FlushBuffersFile_fn gOriginalFlushBuffersFile;

NTSTATUS WINAPI
patched_FlushBuffersFile(HANDLE aFileHandle, PIO_STATUS_BLOCK aIoStatusBlock)
{
  // poison the write if needed, otherwise:
  return gOriginalFlushBuffersFile(aFileHandle, aIoStatusBlock);
}

void ReplaceFlushBuffersFile()
{
  interceptor.Init("ntdll.dll");
  interceptor.AddHook("NtFlushBuffersFile", reinterpret_cast<intptr_t>(patched_FlushBuffersFile), reinterpret_cast<void**>(&gOriginalFlushBuffersFile));
}
(Reporter)

Comment 2

6 years ago
I promise fast reviews for a patch here (for the Windows bits.)  Rafael should review the rest.
(Assignee)

Updated

6 years ago
Assignee: nobody → bgirard
(Reporter)

Comment 6

6 years ago
Comment on attachment 697745 [details] [diff] [review]
WIP

Review of attachment 697745 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/build/mozPoisonWriteWin.cpp
@@ +27,5 @@
> +  AbortOnBadWrite();
> +  return gOriginalFlushBuffersFile(aFileHandle, aIoStatusBlock);
> +}
> +
> +bool ValidWriteAssert(bool ok)

Can |ok| ever be true?

@@ +39,5 @@
> +
> +  return false;
> +}
> +
> +void AbortOnBadWrite()

The name of this function is very misleading.

@@ +59,5 @@
> +  if (PoisoningDisabled)
> +      return;
> +
> +  // REVIEW Does this need to be static? Examples make this static but I don't see anything in WindowsDllInterceptor that suggests its required
> +  WindowsDllInterceptor interceptor;

It's best to follow suit here.
You might want to take a look at

https://bugzilla.mozilla.org/show_bug.cgi?id=826029#c16

We want to reenable writes just before returning from main, but we try to unload xul, so that can cause problems. For some reason on windowns we only succeed in unloading it in -process-updates runs, which is why I was getting crashes in the update tests.

Since it is not unloaded already in regular runs, maybe the best thing to do is to not try it at all, but we should at least figure out why it is not being unloaded.
(Assignee)

Comment 8

6 years ago
(In reply to :Ehsan Akhgari from comment #6)
> Comment on attachment 697745 [details] [diff] [review]
> WIP
> 
> Review of attachment 697745 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/build/mozPoisonWriteWin.cpp
> @@ +27,5 @@
> > +  AbortOnBadWrite();
> > +  return gOriginalFlushBuffersFile(aFileHandle, aIoStatusBlock);
> > +}
> > +
> > +bool ValidWriteAssert(bool ok)
> 
> Can |ok| ever be true?

It may later. I'm staying consistent with the windows implementation. I may refactor some common code to a mozPoisonWriteBase.cpp but I'm not sharing enough code yet.

> 
> @@ +39,5 @@
> > +
> > +  return false;
> > +}
> > +
> > +void AbortOnBadWrite()
> 
> The name of this function is very misleading.

I was trying to be consistent with the windows implementation. More logic will go in here soon.

> 
> @@ +59,5 @@
> > +  if (PoisoningDisabled)
> > +      return;
> > +
> > +  // REVIEW Does this need to be static? Examples make this static but I don't see anything in WindowsDllInterceptor that suggests its required
> > +  WindowsDllInterceptor interceptor;
> 
> It's best to follow suit here.

Ok. As long as it's not adding a static constructor.
(Assignee)

Comment 9

6 years ago
Created attachment 698103 [details] [diff] [review]
Part 1: Poison writes to NtFlushBuffersFile and share code with mozPoisonWriteBase

https://tbpl.mozilla.org/?tree=Try&rev=a88c3cd7a3af
Attachment #697745 - Attachment is obsolete: true
Attachment #698103 - Flags: review?(respindola)
(Assignee)

Comment 10

6 years ago
Created attachment 698147 [details] [diff] [review]
Part 2: Share LateWrite logging with windows
Comment on attachment 698103 [details] [diff] [review]
Part 1: Poison writes to NtFlushBuffersFile and share code with mozPoisonWriteBase

Review of attachment 698103 [details] [diff] [review]:
-----------------------------------------------------------------

This is going on the right direction, but please check these issues first.

::: toolkit/xre/nsWindowsDllInterceptor.h
@@ +362,5 @@
>            return;
>          }
> +      } else if (origBytes[nBytes] == 0xB8) {
> +        // MOV 0xB8: http://ref.x86asm.net/coder32.html#xB8
> +        nBytes += 5;

Do we have tests for this? Can toolkit/xre/test/win/TestDllInterceptor.cpp be expanded to include it?

::: xpcom/build/mozPoisonWriteBase.cpp
@@ +40,5 @@
> +
> +std::vector<int>& getDebugFDs() {
> +  // We have to use new as some write happen during static destructors
> +  // so an static std::vector might be destroyed while we still need it.
> +  static std::vector<int> *DebugFDs = new std::vector<int>();

This is using a local static, which is guaranteed to have an atomic initialization, but I am not sure if msvc implements this correctly. Could you please check that?

::: xpcom/build/mozPoisonWriteWin.cpp
@@ +32,5 @@
> +}
> +
> +bool ValidWriteAssert(bool ok)
> +{
> +  if (ok) {

Why not just MOZ_ASSERT(ok)?

::: xpcom/build/nsXPComInit.cpp
@@ +525,1 @@
>  

Testing change?
Attachment #698103 - Flags: review?(respindola) → review-
(Assignee)

Updated

6 years ago
Depends on: 761135
(Assignee)

Comment 12

6 years ago
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #11)
> ::: xpcom/build/mozPoisonWriteBase.cpp
> @@ +40,5 @@
> > +
> > +std::vector<int>& getDebugFDs() {
> > +  // We have to use new as some write happen during static destructors
> > +  // so an static std::vector might be destroyed while we still need it.
> > +  static std::vector<int> *DebugFDs = new std::vector<int>();
> 
> This is using a local static, which is guaranteed to have an atomic
> initialization, but I am not sure if msvc implements this correctly. Could
> you please check that?

It's not:
http://stackoverflow.com/questions/10585928/is-static-init-thread-safe-with-vc2010

Let's make sure that debugFDs is always lock, which it currently is. Adding a comment to document this.
(Assignee)

Comment 15

6 years ago
Created attachment 699382 [details] [diff] [review]
patch

I merged the two part since they didn't really apply on their own.
Attachment #698103 - Attachment is obsolete: true
Attachment #698147 - Attachment is obsolete: true
Attachment #699382 - Flags: review?(respindola)
Attachment #699382 - Flags: review?(ehsan)
(Reporter)

Comment 16

6 years ago
Comment on attachment 699382 [details] [diff] [review]
patch

Review of attachment 699382 [details] [diff] [review]:
-----------------------------------------------------------------

I only skimmed over the mozPoisonWrite* stuff since I think they're mostly moving code around.  Please let me know if you want a more in-depth review.
Attachment #699382 - Flags: review?(ehsan) → review+
(Assignee)

Comment 17

6 years ago
Comment on attachment 699382 [details] [diff] [review]
patch

Review of attachment 699382 [details] [diff] [review]:
-----------------------------------------------------------------

Just wanted to point out that there is a code change within a big code move block to make it more obvious to reviewers.

::: xpcom/build/mozPoisonWriteBase.cpp
@@ +116,5 @@
> +
> +    // We want the sha1 of the entire file, so please don't write to fd
> +    // directly; use sha1Stream.
> +#ifdef XP_WIN
> +    // The windows version doesn't open the file so there's a time-of-check-time-of-use issue :(

Note that this #ifdef is not from the move.
(Reporter)

Comment 18

6 years ago
Comment on attachment 699382 [details] [diff] [review]
patch

Review of attachment 699382 [details] [diff] [review]:
-----------------------------------------------------------------

I only skimmed over the mozPoisonWrite* stuff since I think they're mostly moving code around.  Please let me know if you want a more in-depth review.

::: xpcom/build/mozPoisonWriteBase.cpp
@@ +116,5 @@
> +
> +    // We want the sha1 of the entire file, so please don't write to fd
> +    // directly; use sha1Stream.
> +#ifdef XP_WIN
> +    // The windows version doesn't open the file so there's a time-of-check-time-of-use issue :(

Can you please explain this?  The comment doesn't really tell me much about why this is needed.
(Assignee)

Comment 19

6 years ago
mkstemp doesn't exist in windows. _mktemp_s only formats the name but doesn't open the file so a fopen_s is also needed.
Comment on attachment 699382 [details] [diff] [review]
patch

Review of attachment 699382 [details] [diff] [review]:
-----------------------------------------------------------------

Has this found any writes on try?

::: browser/app/nsBrowserApp.cpp
@@ +386,5 @@
>    {
>      ScopedLogging log;
>      result = do_main(argc, argv, xreDirectory);
>    }
>  

This depends on 761135, but it will be awesome to see it go!

::: toolkit/xre/test/win/TestDllInterceptor.cpp
@@ +115,5 @@
>        TestHook("user32.dll", "SetWindowLongA") &&
>        TestHook("user32.dll", "SetWindowLongW") &&
>  #endif
>        TestHook("user32.dll", "TrackPopupMenu") &&
> +      TestHook("ntdll.dll", "NtFlushBuffersFile") &&

Thanks!

::: xpcom/build/mozPoisonWriteBase.cpp
@@ +21,5 @@
> +using namespace mozilla;
> +
> +namespace mozilla {
> +
> +char *sProfileDirectory = NULL;

Please add an anonymous namespace to this file and move as much as you can to it. sProfileDirectory can be moved for example.

@@ +123,5 @@
> +    if (fopen_s(&stream, name, "w") != 0) {
> +      NS_RUNTIMEABORT("Um, how did we get here?");
> +    }
> +    fd = fileno(stream);
> +    // REVIEW is a fclose required on the stream

I think so. Since you have a FILE, you should use fclose on it and let it close the FD.  If you want, you can do a fdopen on the non-windows version so that we always just fclose.

Ehsan, is there something more like mkstemp on windows that returns an open file?
Attachment #699382 - Flags: review?(respindola) → review+
> > +    // REVIEW is a fclose required on the stream
> 
> I think so. Since you have a FILE, you should use fclose on it and let it
> close the FD.  If you want, you can do a fdopen on the non-windows version
> so that we always just fclose.

I just noticed that SHA1Stream calls fdopen, so maybe (depending on what windows apis we find) the best option is to just pass a FILE to SHA1Stream and have the caller call fdopen if needed.
(Reporter)

Comment 22

6 years ago
Comment on attachment 699382 [details] [diff] [review]
patch

Review of attachment 699382 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/build/mozPoisonWriteBase.cpp
@@ +121,5 @@
> +    int fd = _mktemp_s(name, strlen(name) + 1);
> +    FILE *stream;
> +    if (fopen_s(&stream, name, "w") != 0) {
> +      NS_RUNTIMEABORT("Um, how did we get here?");
> +    }

You should do this in a loop to avoid the race condition.
Attachment #699382 - Flags: review+ → review-
(Reporter)

Comment 23

6 years ago
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #20)
> Ehsan, is there something more like mkstemp on windows that returns an open
> file?

Not that I know of.
(Assignee)

Comment 26

6 years ago
Created attachment 701175 [details] [diff] [review]
patch + old mac warning fix
Attachment #699382 - Attachment is obsolete: true
Attachment #701175 - Flags: review?(ehsan)
(Reporter)

Comment 27

6 years ago
Comment on attachment 701175 [details] [diff] [review]
patch + old mac warning fix

Review of attachment 701175 [details] [diff] [review]:
-----------------------------------------------------------------

I know that you're just moving a lot of this code, but you know that I don't give out r+'es that easily.  ;-)  (Sorry!)

::: xpcom/build/mozPoisonWrite.h
@@ +8,5 @@
>  #include <stdio.h>
>  
>  MOZ_BEGIN_EXTERN_C
>    void MozillaRegisterDebugFD(int fd);
> +  void MozillaRegisterDebugFile(FILE *f);

Nit: FILE, to match the Unregister function.

::: xpcom/build/mozPoisonWriteBase.cpp
@@ +47,5 @@
> +}
> +
> +void BaseCleanup() {
> +  if (sProfileDirectory)
> +    PL_strfree(sProfileDirectory);

Null check not needed.

@@ +49,5 @@
> +void BaseCleanup() {
> +  if (sProfileDirectory)
> +    PL_strfree(sProfileDirectory);
> +  sProfileDirectory = nullptr;
> +  delete &getDebugFDs();

This is terrible.  Just make getDebugFDs return a pointer.

@@ +58,5 @@
> +// computes the sha1 of the data that passes through it.
> +class SHA1Stream
> +{
> +public:
> +    SHA1Stream(FILE *stream) {

Nit: explicit.

@@ +59,5 @@
> +class SHA1Stream
> +{
> +public:
> +    SHA1Stream(FILE *stream) {
> +        mFile = stream;

Nit: please use an initializer list.

@@ +110,5 @@
> +                 reinterpret_cast<void*>(&rawStack), 0, nullptr);
> +    Telemetry::ProcessedStack stack = Telemetry::GetStackAndModules(rawStack);
> +
> +    nsPrintfCString nameAux("%s%s", sProfileDirectory,
> +                            "/Telemetry.LateWriteTmpXXXXXX");

Use something like NS_SLASH defined elsewhere in the code base.

@@ +119,5 @@
> +    // directly; use sha1Stream.
> +    FILE *stream;
> +#ifdef XP_WIN
> +    for (;;) {
> +      // The windows version doesn't open the file so there's a time-of-check-time-of-use issue :(

Please make this comment clearer.

@@ +128,5 @@
> +      }
> +      if (fopenResult != 0) {
> +        NS_RUNTIMEABORT("Um, how did we get here?");
> +      }
> +      break;

Hmm, this loop makes my head swing.  I think the following is more readable:

  int result;
  errno_t fOpenResult;
  do {
    result = _mkstemp_s(...);
    fOpenResult = fopen_s(...);
  } while (fOpenResult == EEXIST);

  if (fOpenResult != 0) {
    how did we get here?
  }

What do you think?

Or, even better, just implement your own mkstemp #ifdef XP_WIN earlier in the file and remove this #ifdef and just call it like the Unix code path.

@@ +157,5 @@
> +        //      cmdsize 632
> +        //      segname __TEXT
> +        //      vmaddr 0x0000000100000000
> +        // so to print the line matching the offset 123 one has to run
> +        // atos -o firefox 0x100000123.

This comment seems kind of Mac specific.

::: xpcom/build/mozPoisonWriteBase.h
@@ +5,5 @@
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +// Private interface for code shared between the platforms of mozPoisonWriteXXX
> +
> +#pragma once

Use #include guards.

@@ +20,5 @@
> +
> +struct DebugFDAutoLockTraits {
> +  typedef PRLock *type;
> +  const static type empty() {
> +    return NULL;

nullptr

::: xpcom/build/mozPoisonWriteWin.cpp
@@ +54,5 @@
> +  sNtDllInterceptor.AddHook("NtFlushBuffersFile", reinterpret_cast<intptr_t>(patched_FlushBuffersFile), reinterpret_cast<void**>(&gOriginalFlushBuffersFile));
> +}
> +}
> +
> +// REVIEW How to add a __cleanup method for BaseCleanup
\ No newline at end of file

What?
Attachment #701175 - Flags: review?(ehsan) → review+
(Assignee)

Comment 28

6 years ago
(In reply to :Ehsan Akhgari from comment #27)
> Hmm, this loop makes my head swing.  I think the following is more readable:
> 
>   int result;
>   errno_t fOpenResult;
>   do {
>     result = _mkstemp_s(...);
>     fOpenResult = fopen_s(...);
>   } while (fOpenResult == EEXIST);
> 
>   if (fOpenResult != 0) {
>     how did we get here?
>   }
> 
> What do you think?

Testing for EEXIST is actually the wrong thing since "w" will just destroy the old file.

> 
> Or, even better, just implement your own mkstemp #ifdef XP_WIN earlier in
> the file and remove this #ifdef and just call it like the Unix code path.

We need to get the stream now and not the FD so there's not much to gain. I'll use the do {}

> 
> @@ +157,5 @@
> > +        //      cmdsize 632
> > +        //      segname __TEXT
> > +        //      vmaddr 0x0000000100000000
> > +        // so to print the line matching the offset 123 one has to run
> > +        // atos -o firefox 0x100000123.
> 
> This comment seems kind of Mac specific.

I'll let Raphael deal with this when we support the breakpad IDs.

> 
> ::: xpcom/build/mozPoisonWriteBase.h
> @@ +5,5 @@
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +// Private interface for code shared between the platforms of mozPoisonWriteXXX
> > +
> > +#pragma once
> 
> Use #include guards.

I though #pragma was ok you told me last
(Reporter)

Comment 30

6 years ago
(In reply to comment #28)
> > ::: xpcom/build/mozPoisonWriteBase.h
> > @@ +5,5 @@
> > > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > > +
> > > +// Private interface for code shared between the platforms of mozPoisonWriteXXX
> > > +
> > > +#pragma once
> > 
> > Use #include guards.
> 
> I though #pragma was ok you told me last

No, I was mistaken.  I brought it up on dev-platform and it turns out that it will screw us over if you #include the same file using different cases in its name on case-preserving but insensitive platforms such as Windows and Mac.
(Assignee)

Comment 32

6 years ago
STR:
1) Build mac 64 debug with 'ac_add_options --enable-trace-malloc'
2) Run with 'python leaktest.py -- --trace-malloc malloc.log --shutdown-leaks=sdleak.log'

#0  0x00007fff968e5212 in __pthread_kill ()
#1  0x00007fff9101daf4 in pthread_kill ()
#2  0x00007fff91061dce in abort ()
#3  0x0000000100409b6d in PR_Assert (s=0x1004404bc "0 == rv", file=0x100442e80 "/Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/nsprpub/pr/src/pthreads/ptsynch.c", ln=175) at /Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/nsprpub/pr/src/io/prlog.c:554
#4  0x000000010042ceae in PR_Lock (lock=0x10c9152b0) at /Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/nsprpub/pr/src/pthreads/ptsynch.c:175
#5  0x0000000103853e5b in mozilla::DebugFDAutoLock::DebugFDAutoLock (this=0x7fff5fbff2b8) at mozPoisonWriteBase.h:42
#6  0x0000000103852805 in mozilla::DebugFDAutoLock::DebugFDAutoLock (this=0x7fff5fbff2b8) at mozPoisonWriteBase.h:43
#7  0x00000001038520f7 in MozillaRegisterDebugFD (fd=6) at /Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/xpcom/build/mozPoisonWriteBase.cpp:222
#8  0x000000010392335f in NS_TraceMallocDumpAllocations (pathname=0x106656ec0 "sdleak.log") at /Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/tools/trace-malloc/lib/nsTraceMalloc.c:1780
#9  0x0000000103923171 in NS_TraceMallocShutdown () at /Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/tools/trace-malloc/lib/nsTraceMalloc.c:1526
#10 0x0000000103927151 in TraceMallocShutdown::~TraceMallocShutdown (this=0x106d45cc0) at /Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/tools/trace-malloc/lib/nsTypeInfo.cpp:27
#11 0x0000000103927135 in TraceMallocShutdown::~TraceMallocShutdown (this=0x106d45cc0) at /Volumes/SSD-Mac1/Users/benoitgirard/ssd-mozilla/mozilla-central/tree/tools/trace-malloc/lib/nsTypeInfo.cpp:26
#12 0x00007fff91063307 in __cxa_finalize ()
#13 0x00007fff91064f57 in exit ()
#14 0x00000001000011ab in start ()
(Assignee)

Comment 33

6 years ago
The problem is that we need to keep track of the DebugFD regardless of the state of the write poisoning. Otherwise we could enable and miss from opened DebugFD.
https://hg.mozilla.org/mozilla-central/rev/afbe1011ae1c
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21

Comment 39

6 years ago
Anyone else seeing this error?
mozPoisonWriteBase.cpp
mozPoisonWriteWin.cpp
c:\t1\hg\comm-central\mozilla\xpcom\build\mozPoisonWriteBase.h(24) : error C2065: 'nullptr' : undeclared identifier
c:\t1\hg\comm-central\mozilla\xpcom\build\mozPoisonWriteBase.h(24) : error C2065: 'nullptr' : undeclared identifier
Depends on: 832973
How can I locally disable this?  Can I modify ValidWriteAssert return true?  Doing shutdown CC logging trips this.
Making it return |false| seems to work.
(Assignee)

Comment 42

6 years ago
(In reply to Andrew McCreight [:mccr8] from comment #41)
> Making it return |false| seems to work.

You just need to register the IO as debugging:
http://mxr.mozilla.org/mozilla-central/source/xpcom/build/mozPoisonWrite.h#14
Thanks, that looks like an easy fix. Filed bug  837197 for it.
relnote-firefox: --- → ?
(Assignee)

Updated

6 years ago
Blocks: 845098

Updated

6 years ago
relnote-firefox: ? → -
You need to log in before you can comment on or make changes to this bug.