Report main-thread IO to Telemetry

RESOLVED FIXED in mozilla29

Status

()

defect
RESOLVED FIXED
6 years ago
7 months ago

People

(Reporter: jonasfj, Assigned: aklotz)

Tracking

(Depends on 2 bugs)

unspecified
mozilla29
x86_64
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(8 attachments, 30 obsolete attachments)

6.21 KB, patch
Details | Diff | Splinter Review
53.87 KB, patch
jonasfj
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
3.76 KB, patch
jonasfj
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
66.90 KB, patch
jonasfj
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
31.66 KB, patch
aklotz
: review+
Details | Diff | Splinter Review
14.32 KB, patch
aklotz
: review+
Details | Diff | Splinter Review
15.48 KB, patch
aklotz
: review+
Details | Diff | Splinter Review
5.11 KB, patch
aklotz
: review+
Details | Diff | Splinter Review
We should report main-thread IO to Telemetry. Essentially a list of filenames similar to how slowSQL is report with telemetry.
As a first approach we'll extend IOInterposer from #867762 to support filenames and collect writes on Windows by extending mozPoisonWriteWin.

We'll worry about other operations, platforms and privacy later.
Assignee: nobody → jopsen
Disclaimer: This my first set of patches, mostly I think I know what I've been
doing, but there's also some qualified guess work :)

So I ended up doing this in 3 parts, I've outline summary of changes and remarks
for consideration for each part below. The two first parts is really just
refactoring, the third uses the two first and by activating poison IO earlier
any rare intermittent bugs become more apparent, as does performance issues.

Push to try with all 3 patches available here:
https://tbpl.mozilla.org/?tree=Try&rev=9641de59a7c5

Reviewers are more or less picked out of 'hg log', so feel free to reassign.
I guess this crosses modules, so maybe a super review is needed, anyways, I
assume one starts with a normal review.
Bug 902587 - Part 1: Refactoring IO Interposer with filename support
--------------------------------------------------------------------
In this part the IOInterposer API is refactored, with aim of:
  * Remove activation of IOInterpose reporters (NSPRIOInterposer, TelemetryVFS
    PoisonIOInterposer) from IOInterposer.cpp, no more IOInterposerModule.
  * Hide singleton implementation from consumers, no more GetInstance()
  * Use IsObservedOperation() to enable/disable reporting of I/O, this is better
    suited at multiple consumers, i.e. disable when no observers are registered.
  * Class for representing an Observation, featuring virtual function for
    generating the filename on demand.
  * Use locking to allow (un)registration of observers at any time.
  * Migration from Mutex to PRLock.
  * IOInterposer reports I/O from all threads, not just the main-thread.


Remarks:
  * The class scope is deliberately abused for IOInterposer which only have
    static members (see comments in source).
  * We use PRLock and std::vector<> to allow leaking the IOInterposer.
  * There is no generic distinction of asynchronous I/O methods, other than what
    can be deduced from Observation::Reference(), which is non-trivial.
  * Profiler still only reports main-thread I/O, but this is handled in the
    ProfilerIOInterposeObserver.

We use PRLock instead of an OffTheBooksMutex as we need to unregister late-write
checks on Mac near the very end. At this point the deadlock detector on DEBUG
builds is unloaded. Can't tell if this is a bug in OffTheBooksMutex, or just a
limitation to be expected during shutdown.
Attachment #797144 - Flags: review?(bgirard)
Bug 902587 - Part 2: Refactor late-write-checks as client of IO Interposer
--------------------------------------------------------------------------
In this part the mozPoisonWrite* files are refactored, with the aim of:
  * Separating I/O Poison from late-write checks.
  * I/O Poison is now in the tools/profiler/PoisonIOInterposer* files.
  * Late-write checks are conducted in xpcom/build/LateWriteChecks.(cpp|h)
  * NtReadFile and NtReadFileScatter are now interposed.
  * Filenames can be requested on demand, using NtQueryInformationFile and
    fcntl(fd, F_GETPATH, ...) on Windows and Mac, respectively.

Remarks:
  * PoisonIOInterposerMac.cpp is only tested on try, I don't run Mac.
  * Only writes are interposed on Mac at this point.
  * LateWriteChecks.(cpp|h) is still located in xpcom/build/
  * FileHandle to Filename cache for Windows is in the works, but I fear bugs
    when free() calls NtClose() and I want to lock and purge cache from NtClose.
    (ie. I'm currently playing around with intermittent deadlock and crashes)
  * Tests on try suggests that OnlyReportDirtyWrites() is unnecessary.
  * If OnlyReportDirtyWrites() is called initially, tests will fail by timeout.

Regarding OnlyReportDirtyWrites() then qualified guess work let me to disable
the part of IsValidWrite() that reads the buffer and ensures we only report
dirty writes. The hypothesis being that reading from an arbitrary file
descriptor could be bad, or be a very expensive thing to do on each write.

I don't have Mac so this part probably needs more testing, or we should disable
telemetry IO poison on Mac, hence, only use poison on Mac for late-write checks.
In any event, my prelimiary testing on try, showed that not disabling the last
part of IsValidWrite() caused a lot of intermittent bugs to flare, when telemetry
I/O reporting (Part 3) is activated initially.
Attachment #797147 - Flags: review?(bgirard)
Bug 902587 - Part 3: IO Interpose Reporter for Telemetry
--------------------------------------------------------------------------
This is patch records a limited dataset from the I/O interposers for
telemetry. At present it just needs to be added to the telemetry ping, but the
serverside of things should probably be done along side.

I guess, I'll ask for a review once the two first parts are about to land.
But thought I'd include this for completeness.

Remarks:
  * This adds the Javascript property fileIOReports to nsITelemetry
  * The fileIOReports property is not included in the ping yet
  * mozilla::Telemetry::InitIOReporting() is only called on DEBUG builds
  * Hardcoded list of files to report IO for is still just arbitrary
Quick question I was told maybe harsha could help with:
Can I just add a new property in the telemetry ping payload?

The current data I'm recording looks like this:
https://gist.github.com/jonasfj/a54aa84e320d7dff2791

I'm currently adding a property called 'fileIOReports' next to slowSQL in the
payload. Will this be filtered out or stored so we can query it later?

Files listed in the gist above are just some prelimiary filenames extracted from
a test run. Anybody interested to see what I/O is reported when looking for cute
cats in Firefox should checkout: http://tinyurl.com/q7mr52f
Status: NEW → ASSIGNED
Flags: needinfo?(schintalapani)
I updated the telemetry I/O reporting patch (part 3):
  * fileIOReports are now included in telemetry ping payload,
  * Only I/O from main-thread is reported,
  * Data is reported as an array of arrays, instead of JSON dicts,
  * No array is included with reads, writes and fsyncs as zero,
  * Hardcoded files for which I/O is recorded have been updated.
  * Telemetry I/O Reporting is enabled on nightly, local builds and DEBUG builds.

The list of files for which we record I/O is still pretty arbitrary, as mentioned
in previous comment a complete log of I/O is available here http://tinyurl.com/q7mr52f
I just picked some relatively arbitrary files that do main-thread I/O.
Attachment #797148 - Attachment is obsolete: true
Attachment #797862 - Flags: review?(vdjeric)
Okay, so this update just changes to JSON format for the fileIOReports property.
It is now a dictionary with an entry for each filename containing an array of duration, reads, writes and fsyncs.

The example data have also be updated, see:
https://gist.github.com/jonasfj/a54aa84e320d7dff2791

I would like to include an additional remark for consideration:
  * The TelemetryIOInterposeObserver is constructed as global object
  * Initialization is separate from TelemetryImpl initialization
  * IOInterposer and poison IO interposer have special initialization concerns.
Attachment #797862 - Attachment is obsolete: true
Attachment #797862 - Flags: review?(vdjeric)
Attachment #798440 - Flags: review?(vdjeric)
(In reply to Jonas Finnemann Jensen (:jonsafj) from comment #6)
> Quick question I was told maybe harsha could help with:
> Can I just add a new property in the telemetry ping payload?
> 
> The current data I'm recording looks like this:
> https://gist.github.com/jonasfj/a54aa84e320d7dff2791
> 
> I'm currently adding a property called 'fileIOReports' next to slowSQL in the
> payload. Will this be filtered out or stored so we can query it later?
> 
> Files listed in the gist above are just some prelimiary filenames extracted
> from
> a test run. Anybody interested to see what I/O is reported when looking for
> cute
> cats in Firefox should checkout: http://tinyurl.com/q7mr52f

Hi Jonas,
       If you add a new property to telemetry payload it will get stored in our cluster(hbase). We don't filter out data from telemetry payload before storing into hbase. 
Thanks,
Harsha
Flags: needinfo?(schintalapani)
Sorry Jonas, still haven't gotten to this. I'll need a bit more time.
Comment on attachment 797144 [details] [diff] [review]
p1-io-interposer-refactoring.patch

Review of attachment 797144 [details] [diff] [review]:
-----------------------------------------------------------------

Here's some feedback but since you're touching Aaron' code you should have him review it.

::: storage/src/TelemetryVFS.cpp
@@ +11,5 @@
>  #include "nsThreadUtils.h"
>  #include "mozilla/Util.h"
>  #include "mozilla/dom/quota/QuotaManager.h"
>  #include "mozilla/dom/quota/QuotaObject.h"
> +#include "mozilla/IOInterposer.h"

Make sure it still builds with the profiler compiled out with these changes.

::: tools/profiler/IOInterposer.cpp
@@ +17,3 @@
>  
> +/** Lists of Observers */
> +struct ObserverLists{

space before {

@@ +18,5 @@
> +/** Lists of Observers */
> +struct ObserverLists{
> +  ObserverLists()
> +  {
> +    mObserverListsLock = PR_NewLock();

use MOZ_COUNT_CTOR/DTOR

@@ +45,5 @@
> +  std::vector<IOInterposeObserver*>  mFSyncObservers;
> +};
> +
> +/**
> + * A quick and dirty RAII class to automatically lock a PRLock  

These already exist, don't create a new one.

@@ +59,5 @@
> +    PR_Unlock(mLock);
> +  }
> +};
> +
> +

extra space and below

@@ +103,5 @@
> +                                                  IOInterposeObserver::OpNone;
> +
> +
> +/* static */ void IOInterposer::InitIOInterposer(){
> +  // We can't actually use this assertion because we execute this code

then remove it

@@ +108,5 @@
> +  // after XPCOM is shut down, so NS_IsMainThread() always returns false.
> +  // MOZ_ASSERT(NS_IsMainThread());
> +
> +  // Don't initiallize twice...
> +  if(sObserverLists) {

'if (' and elsewhere in the patch

::: tools/profiler/platform.cpp
@@ +306,5 @@
>  
> +  // Initialize I/O interposing
> +  mozilla::IOInterposer::InitIOInterposer();
> +  // Initialize NSPR I/O Interposing
> +  mozilla::InitNSPRIOInterposing();

Can we avoid initializing this on gecko startup and instead lazy init? We really want to minimize code here for mobile and platforms with little IO, otherwise we're going to be asked to turn off the profiler for those platforms.
Attachment #797144 - Flags: review?(bgirard) → review?(aklotz)
(In reply to Jonas Finnemann Jensen (:jonsafj) from comment #4)
> Created attachment 797147 [details] [diff] [review]
> p2-late-write-checks-refactoring.patch
> 
> Bug 902587 - Part 2: Refactor late-write-checks as client of IO Interposer
> --------------------------------------------------------------------------
> In this part the mozPoisonWrite* files are refactored, with the aim of:
>   * Separating I/O Poison from late-write checks.
>   * I/O Poison is now in the tools/profiler/PoisonIOInterposer* files.
IO poisoning should NOT live in the profiler, we should make it easier for external components to report and embed information into profiles but we shouldn't have the profiler try to do all the monitoring things.
>   * Late-write checks are conducted in xpcom/build/LateWriteChecks.(cpp|h)
>   * NtReadFile and NtReadFileScatter are now interposed.
>   * Filenames can be requested on demand, using NtQueryInformationFile and
>     fcntl(fd, F_GETPATH, ...) on Windows and Mac, respectively.

This patch does way too much and should be a patch queue. Each of these bullet points should be at a minimum a patch. Can you split up the patch to make it more manageable?
(In reply to Benoit Girard (:BenWa) from comment #11)
> ::: tools/profiler/IOInterposer.cpp
> @@ +45,5 @@
> > +  std::vector<IOInterposeObserver*>  mFSyncObservers;
> > +};
> > +
> > +/**
> > + * A quick and dirty RAII class to automatically lock a PRLock  
> 
> These already exist, don't create a new one.

I'm using PRLock so we can leak the internal `ObserverLists` structure and because we cannot acquire `mozilla::OffTheBooksMutex` during shutdown on DEBUG builds, this causes a segfault because the deadlock detector have been deallocated (yes, this is the obscure corner case).
If there already exists an RAII auto-lock that operates on PRLocks, please help me find it?
Note: we need to leak the IOInterposer inorder to use it for late-write checks and on mac we need to disable it just before static destructors kick in, see bug 826029. 

(In reply to Benoit Girard (:BenWa) from comment #12)
I'll look into splitting part 2 into 3 patches:
  * Separation of poison writes and late-write checks
  * Interposing of NtReadFile and NtReadFileScatter
  * Support for filename requests.
I think/hope that should be relatively easy...

Thanks for the feedback, I'll revise and maybe get aklotz to review part 1, next time.
I addressed issues BenWa highlighted, mostly style.
As argued in comment #13 we cannot use an `OffTheBooksMutex` in this case, as we need to leak the `ObserverLists` structure in order to do late-write checks during shutdown.
I think same argument applies to MOZ_COUNT_CTOR/DTOR, as these probably will report the structure as a leak.

> ::: tools/profiler/platform.cpp
> @@ +306,5 @@
> > ...
> > +  mozilla::IOInterposer::InitIOInterposer();
> > +  // Initialize NSPR I/O Interposing
> > +  mozilla::InitNSPRIOInterposing();
> Can we avoid initializing this on gecko startup and instead lazy init?

No, we need to initialize the IO interposer on the main thread while no other threads are running.
This is not new! The existing code does the same thing, only creation of `ProfilerIOInterposeObserver` causes the IOInterposer to be initialized, which in turn initializes the NSPR IO Interposer and the Sqlite IO Interposer.
(Yes, this was hidden in the `IOInterposer::GetInstance();` in the constructor for `ProfilerIOInterposeObserver` (the constructor will not do this with this patch).

So with this patch we're at least honest about the initialization we do at startup.
Also no initialization for the Sqlite IO reporting is necessary anymore.
However, to make things just a tiny bit better I moved initialization of  `ProfilerIOInterposeObserver` to `mozilla_sampler_start` so atleast this dynamic allocation is lazy.

Note: There is not way to globally disable/enable the IO Interposer anymore.
But IO will only be reported if observers are registered, this is better suited at multiple consumers.
Attachment #797144 - Attachment is obsolete: true
Attachment #797144 - Flags: review?(aklotz)
Attachment #800161 - Flags: review?(aklotz)
Comment on attachment 797147 [details] [diff] [review]
p2-late-write-checks-refactoring.patch

I'm currently testing a split this patch into Part 2A, Part 2B and Part 2C.
Attachment #797147 - Flags: review?(bgirard)
Comment on attachment 800161 [details] [diff] [review]
p1-io-interposer-refactoring.patch

Review of attachment 800161 [details] [diff] [review]:
-----------------------------------------------------------------

For the most part I'm happy with this patch, but I would like to see a few changes.

Nit: I've noticed that in this patch there is varying amount of whitespace (1-3 lines) between the end of a block and the start of the next piece of code. Please standardize on one blank line between the end of a block and the beginning of the next.

::: tools/profiler/IOInterposer.cpp
@@ +2,5 @@
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +#include <vector>
> +#include <algorithm>

Can you please list these includes alphabetically?

@@ +8,5 @@
>  #include "IOInterposer.h"
>  #include "NSPRInterposer.h"
> +
> +#include "mozilla/StaticPtr.h"
> +#include "mozilla/Mutex.h"

These ones too.

@@ +41,5 @@
> +
> +  // Lists of observers for read, write and fsync events respectively
> +  // These are implemented as vectors since they are allowed to survive gecko,
> +  // without reporting leaks. This is necessary for the IOInterposer to be used
> +  // for late-write checks.

Thanks for commenting on the rationale for this.

@@ +125,5 @@
> +    // this is probably a case where one consumer clears the IO interposer and
> +    // another consumer still wants events.
> +    MOZ_ASSERT(sObserverLists->mReadObservers.size() == 0);
> +    MOZ_ASSERT(sObserverLists->mWriteObservers.size() == 0);
> +    MOZ_ASSERT(sObserverLists->mFSyncObservers.size() == 0);

Please use empty() instead of checking for size() == 0.

@@ +145,5 @@
> +  //TODO: We only need read access here, so we should consider some sharedlock
> +  //      on the other hand most of the observes who receive this oberservation
> +  //      might appreciate the guarantee that all observations are synchronous.
> +  //      NOTE: We should not rely on this guarantee untill we've declared that
> +  //            we want it, do that in the documentation comment in the header.

Have you measured this to see whether this lock impacts performance? I'd like to base our decision to go with this on some data as opposed to gut feeling.

I'm also concerned about the fact that the lock is held while invoking observers. For right now that should be okay since we're the only ones writing them. OTOH I worry about somebody in the future writing an observer that does something expensive, and ends up blocking other threads because it's taking so long. I'm okay with addressing this as a follow-up, but before I r+ this patch please file a bug and cc: me on it.

@@ +155,5 @@
>    }
> +
> +  // Decide which list of observers to inform
> +  std::vector<IOInterposeObserver*>* observers = nullptr;
> +  if (aObservation.ObservedOperation() == IOInterposeObserver::OpRead) {

How about we use a switch statement here?

@@ +225,3 @@
>    if (aOp & IOInterposeObserver::OpRead) {
> +    VectorRemove(sObserverLists->mReadObservers, aObserver);
> +    if (sObserverLists->mReadObservers.size() == 0) {

s/size/empty

@@ +231,4 @@
>    }
>    if (aOp & IOInterposeObserver::OpWrite) {
> +    VectorRemove(sObserverLists->mWriteObservers, aObserver);
> +    if (sObserverLists->mWriteObservers.size() == 0) {

s/size/empty

@@ +238,4 @@
>    }
>    if (aOp & IOInterposeObserver::OpFSync) {
> +    VectorRemove(sObserverLists->mFSyncObservers, aObserver);
> +    if (sObserverLists->mFSyncObservers.size() == 0) {

s/size/empty

::: tools/profiler/IOInterposer.h
@@ +11,5 @@
>  namespace mozilla {
>  
>  /**
> + * Interface for I/O interposer observers. This is separate from the
> + * IOInterposing because we have multiple uses for these observations.

nit: s/IOInterposing/IOInterposer/

@@ +79,5 @@
> +
> +    virtual ~Observation();
> +  protected:
> +    Operation   mOperation;
> +    double      mDuration;

Can you please modify this class and its constructor so that we store two TimeStamps representing the start and end of the operation, instead of just storing a duration? That way we don't lose information about when the operation began; this is going to be particularly useful when these events are sent to the profiler.

Sorry, I know that I'm kind of springing a new requirement on you. Since you're making significant changes to the observer interface anyway, I think that now is the best time to do it.

@@ +113,5 @@
> + * Remark: Instances of this class will never be created, you should consider it
> + * to be a namespace containing static functions. The class is created to
> + * facilitate to a private static instance variable sObservedOperations.
> + * As we want to access this from an inline static methods, we have to do this
> + * trick.

Thank you for explaining this.

@@ +137,2 @@
>     */
> +  static void InitIOInterposer();

s/InitIOInterposer/Init/ (The class is already named IOInterposer, so I think leaving IOInterposer out of the function name is more concise.)

@@ +140,4 @@
>  
>    /**
>     * This function must be called from the main thread, and furthermore
> +   * it must be called when no other threads are executing. Effectivly

nit: misspelled Effectively

@@ +151,2 @@
>     */
> +  static void ClearIOInterposer();

s/ClearIOInterposer/Clear/

@@ +220,3 @@
>    IOInterposer();
> +public:
> +  static inline void InitIOInterposer()                                   {}

s/InitIOInterposer/Init

@@ +220,4 @@
>    IOInterposer();
> +public:
> +  static inline void InitIOInterposer()                                   {}
> +  static inline void ClearIOInterposer()                                  {}

s/ClearIOInterposer/Clear

::: tools/profiler/NSPRInterposer.cpp
@@ +18,5 @@
> +// Forward declaration of interposed IO methods
> +int32_t   PR_CALLBACK interposedRead(PRFileDesc* aFd, void* aBuf, int32_t aAmt);
> +int32_t   PR_CALLBACK interposedWrite(PRFileDesc* aFd, const void* aBuf,
> +                                      int32_t aAmt);
> +PRStatus  PR_CALLBACK interposedFSync(PRFileDesc* aFd);

Let's throw this stuff into an anonymous namespace.

@@ +24,1 @@
>  using namespace mozilla;

If you're going to use namespace mozilla {} below, let's get rid of this using statement.

::: tools/profiler/NSPRInterposer.h
@@ +11,5 @@
>  
>  /**
> + * Initialize IO interposing for NSPR. This will report NSPR read, writes and
> + * fsyncs to the IOInterposerObserver. It is only safe to call this from the
> + * main-thread when all other threads have been stopped.

nit: s/all other threads have been stopped/no other threads are running/

@@ +18,3 @@
>  
>  /**
> + * Removes interseption of NSPR IO methods as setup by InitNSPRIOInterposing.

nit: s/interseption/interception/
Attachment #800161 - Flags: review?(aklotz) → review-
Bug 902587 - Part 2A: Refactor late-write-checks as client of IO Interposer
--------------------------------------------------------------------------
See: p2a-late-write-checks-refactoring.patch

In this part the mozPoisonWrite* files are refactored, with the aim of:
  * Separating I/O Poison from late-write checks.
  * Late-write checks are conducted in xpcom/build/LateWriteChecks.(cpp|h)

Remarks:
  * PoisonIOInterposerMac.cpp is only tested on try, I don't run Mac.
  * Should LateWriteChecks.(cpp|h) remain in xpcom/build/?
  * Tests on try suggests that OnlyReportDirtyWrites() is unnecessary.
  * If OnlyReportDirtyWrites() is called initially, tests will fail by timeout.

See comment #4 for notes on `OnlyReportDirtyWrites()`.
Attachment #797147 - Attachment is obsolete: true
Attachment #800909 - Flags: review?(bgirard)
Bug 902587 - Part 2B: Interpose NtReadFile and NtReadFileScatter
----------------------------------------------------------------

In this patch PoisonIOInterposerWin.cpp is extended, s.t.
  * NtReadFile and NtReadFileScatter are interposed.

Remarks:
  * Only writes are interposed on Mac at this point.
Attachment #800910 - Flags: review?(bgirard)
Bug 902587 - Part 2C: Filename method for poison IO Interposer 
----------------------------------------------------------------

In this patch PoisonIOInterposer(Win|Mac).cpp is extended, s.t.
  * Filenames can be requested on demand,
  * using NtQueryInformationFile on Windows, and
  * fcntl(fd, F_GETPATH, ...) on Mac.

Remarks:
  * The Mac part is not tested, it builds on try, but I don't have a Mac.
Attachment #800912 - Flags: review?(bgirard)
For completeness I'm attaching a small patch designed to log all IO from a
Firefox session. This is not designed to be merged, but could be useful for:
  * Detailed investigation of Firefox IO, and
  * Testing if filename reporting on Mac works.

To use the patch:
  1. Apply patches above (Part 1, 2A, 2B, 2C and 3)
  2. Apply the attached `moz-all-io-logfile.patch` patch
  3. Let environment variable `MOZ_ALL_IO_LOGFILE` be the absolute path of your
     log file.
  4. `mach run`, now search for "cats".
(For future readers, Part 1, 2A, 2B, 2C and 3 will hopefully be upstreamed).
Attachment #800910 - Flags: review?(bgirard)
Attachment #800912 - Flags: review?(bgirard)
Attachment #798440 - Flags: review?(vdjeric)
Attachment #800909 - Flags: review?(bgirard)
Sorry, about spamming you with review requests... I somehow missing aklotz's review, it won't affect the other patches much, but they will need minor updates. So no need to review them just yet... 

@aklotz, thanks for the awesome feedback, storing start and end timestamp is a great idea.
With respect to the lock issue, I agree that we need to investigate and I'll file a separate bug for that.
I think all issues should be addressed, I filed bug 913653 for discussion of
performance in IOInterposer::Report(). This issue is complicated by the fact that
IOInterposeObservers might also lock for very legitimate purposes.

Anyways, I'm not convinced we have a significant issue at this point, but it's
worth investigating. Especially as more IOInterposeObservers are added.
Attachment #800161 - Attachment is obsolete: true
Attachment #801425 - Flags: review?(aklotz)
Comment on attachment 801425 [details] [diff] [review]
p1-io-interposer-refactoring.patch

Review of attachment 801425 [details] [diff] [review]:
-----------------------------------------------------------------

A few nits. r=me with these fixes.

::: tools/profiler/IOInterposer.cpp
@@ +69,5 @@
> +static StaticAutoPtr<ObserverLists> sObserverLists;
> +
> +/** Find if a vector contains a specific element */
> +template<class T>
> +bool VectorContains(const std::vector<T>& vector, const T& element) {

Nit: { on new line

@@ +71,5 @@
> +/** Find if a vector contains a specific element */
> +template<class T>
> +bool VectorContains(const std::vector<T>& vector, const T& element) {
> +  return std::find(vector.begin(), vector.end(), element) != vector.end();
> +};

Nit: Don't need ';' after '}' here

@@ +75,5 @@
> +};
> +
> +/** Remove element from a vector */
> +template<class T>
> +void VectorRemove(std::vector<T>& vector, const T& element) {

Nit: { on new line

@@ +79,5 @@
> +void VectorRemove(std::vector<T>& vector, const T& element) {
> +  typename std::vector<T>::iterator newEnd = std::remove(vector.begin(),
> +                                                         vector.end(), element);
> +  vector.erase(newEnd, vector.end());
> +};

Nit: Don't need ';' after '}' here

::: tools/profiler/IOInterposer.h
@@ +35,5 @@
> +    Observation()
> +    {
> +    }
> +  public:
> +    Observation(Operation aOperation, TimeStamp aStart, TimeStamp aEnd,

Can we declare the aStart and aEnd parameters as references?

::: tools/profiler/NSPRInterposer.cpp
@@ +13,1 @@
>  using namespace mozilla;

Please remove this using statement.
Attachment #801425 - Flags: review?(aklotz) → review+
Part 1, as r+'ed by aklotz with nits addressed, but keeping "using namespace mozilla;" in NSPRInterposer.cpp as discussed on #pref

A push to try here: https://tbpl.mozilla.org/?tree=Try&rev=21cce0963b54
(Android test aborted, see bug 915481)

I'm flagging this bug checkin-needed, if person who can do that needs anything else please let me know..
(After checking please leave the bug assigned, this is just the part 1 :)
Attachment #801425 - Attachment is obsolete: true
Attachment #803559 - Flags: review+
Attachment #803559 - Flags: checkin?
Keywords: checkin-needed
See notes from comment #17.
Attachment #800909 - Attachment is obsolete: true
Attachment #803567 - Flags: review?(bgirard)
See notes from comment #18.
Attachment #800910 - Attachment is obsolete: true
Attachment #803568 - Flags: review?(bgirard)
See notes from comment #19.
Attachment #800912 - Attachment is obsolete: true
Attachment #803569 - Flags: review?(bgirard)
Attachment #803559 - Flags: checkin? → checkin+
TelemetryVFS.obj : error LNK2019: unresolved external symbol "public: virtual __thiscall mozilla::IOInterposeObserver::Observation::~Observation(void)" (??1Observation@IOInterposeObserver@mozilla@@UAE@XZ),the symbol is referenced in function "public: virtual void * __thiscall mozilla::IOInterposeObserver::Observation::`scalar deleting destructor'(unsigned int)" (??_GObservation@IOInterposeObserver@mozilla@@UAEPAXI@Z)
xul.dll : fatal error LNK1120: one unresolved external symbol

in my build, MOZ_ENABLE_PROFILER_SPS is not defined.
@zhoubcfan,
Sorry, I'm not sure why that didn't show up when I compiled without MOZ_ENABLE_PROFILER_SPS.

I've attached a patch to fix this, by
  1. not declaring virtual destructors in IOInterposeObserver(::Observation), and
  2. not reporting IO `TelemetryVFS.cpp`
when MOZ_ENABLE_PROFILER_SPS is undefined.

I think either (1) or (2) should be enough, but there's no reason to ask for
trouble and performance overhead.

On my end this builds both with and without MOZ_ENABLE_PROFILER_SPS.
A push to try with MOZ_ENABLE_PROFILER_SPS is available here:
https://tbpl.mozilla.org/?tree=Try&rev=7bcf0c66ffe3

If you, or somebody else reading this, can give the patch a review, please do.
(I imagine people don't like having mozilla-central broken, even for corner cases)

Also please, let me know if this doesn't fix the issue. I might be unable to reproduce it.
Without MOZ_ENABLE_PROFILER_SPS the IOInterposer::IsObservedOperation() function
will always return false. So maybe gcc optimizes out the offending part of `TelemetryVFS.cpp`
before linking. Anyway, (2) should take care of that.
 - did you build without optimizations? (just curious)
Attachment #804408 - Flags: review?
(In reply to Jonas Finnemann Jensen (:jonsafj) from comment #32)
> Created attachment 804408 [details] [diff] [review]
> remove-iointerposer-destructor-ifndef-sps.patch
> 
> @zhoubcfan,
> Sorry, I'm not sure why that didn't show up when I compiled without
> MOZ_ENABLE_PROFILER_SPS.
> 
> I've attached a patch to fix this, by
>   1. not declaring virtual destructors in
> IOInterposeObserver(::Observation), and
>   2. not reporting IO `TelemetryVFS.cpp`
> when MOZ_ENABLE_PROFILER_SPS is undefined.
> 
> I think either (1) or (2) should be enough, but there's no reason to ask for
> trouble and performance overhead.
> 
> On my end this builds both with and without MOZ_ENABLE_PROFILER_SPS.
> A push to try with MOZ_ENABLE_PROFILER_SPS is available here:
> https://tbpl.mozilla.org/?tree=Try&rev=7bcf0c66ffe3
> 
> If you, or somebody else reading this, can give the patch a review, please
> do.
> (I imagine people don't like having mozilla-central broken, even for corner
> cases)
> 
> Also please, let me know if this doesn't fix the issue. I might be unable to
> reproduce it.
> Without MOZ_ENABLE_PROFILER_SPS the IOInterposer::IsObservedOperation()
> function
> will always return false. So maybe gcc optimizes out the offending part of
> `TelemetryVFS.cpp`
> before linking. Anyway, (2) should take care of that.
>  - did you build without optimizations? (just curious)

I had used (1) to take care of the problem. And I use vs2013.
Ahh... I just tried it and the problem also exists on linux, if you compile without optimizations.
So that explains why it slipped, I also did no SPS builds on windows, but I probably forgot to redo that having added empty virtual destructors.
Attachment #804408 - Flags: review? → review?(aklotz)
Comment on attachment 804408 [details] [diff] [review]
remove-iointerposer-destructor-ifndef-sps.patch

Review of attachment 804408 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/TelemetryVFS.cpp
@@ +95,5 @@
>                                            (mainThread ? main_ref : other_ref));
>        // Report observation
>        IOInterposer::Report(ob);
>      }
> +#endif /* MOZ_ENABLE_PROFILER_SPS */

I'm okay with this.

::: tools/profiler/IOInterposer.h
@@ +96,3 @@
>  #else
>      const char* Filename() { return nullptr; }
>  #endif /* MOZ_ENABLE_PROFILER_SPS */

This entire #ifdef block is a premature optimization, IMHO, since both implementations of Filename() return nullptr by default. Let's eliminate this added code complexity and just have a single implementation using virtual functions.

FWIW if this were to stay in, you would need to add an #ifndef guard that you missed around the definition of IOInterposerObserver::Observation::~Observation(). This has already become trickier to maintain than it needs to be. In the absence of evidence that says this is a problem, let's just stick with the virtual function in all configurations.

@@ +120,2 @@
>    virtual ~IOInterposeObserver();
> +#endif /* MOZ_ENABLE_PROFILER_SPS */

This wasn't giving anybody trouble in the build, was it? Let's leave this as-is.
Attachment #804408 - Flags: review?(aklotz) → review-
As discussed on IRC, I've defined trivial virtual destructors and trivial filename() implementation in the header.

I've tested it with/without optimizations with/without SPS, and here is try:
https://tbpl.mozilla.org/?tree=Try&rev=6d7c4cd7888e
Attachment #804408 - Attachment is obsolete: true
Attachment #804663 - Flags: review?(aklotz)
Attachment #804663 - Flags: review?(aklotz) → review+
As r+'ed by aklotz, I just:
  * Change patch filename to reflect what it does,
  * Added r=aklotz to patch/commit message, and
  * Update patch/commit message to reflect what it does.

I'm flagging this for checkin, so mozilla-central will build when
MOZ_ENABLE_PROFILER_SPS is undefined and optimizations are disabled.

Patch is the same as this push to try:
https://tbpl.mozilla.org/?tree=Try&rev=6d7c4cd7888e
(only difference is filename and patch/commit message).
Attachment #804663 - Attachment is obsolete: true
Attachment #804862 - Flags: review+
Attachment #804862 - Flags: checkin?
Keywords: checkin-needed
Attachment #804862 - Flags: checkin? → checkin+
Comment on attachment 803567 [details] [diff] [review]
p2a-late-write-checks-refactoring.patch

Review of attachment 803567 [details] [diff] [review]:
-----------------------------------------------------------------

This patch has hg rm/add instead of hg mv. Let's fix that to preserve the history. Also I can't see what changes you've made to the files making it nearly impossible to review. Everything outside of that is just renaming so I'm ok with those.

::: toolkit/xre/nsAppRunner.cpp
@@ -3946,5 @@
>      appInitiatedRestart = true;
> -  } else {
> -    // We will have a real shutdown, let ShutdownXPCOM poison writes to
> -    // find any late ones.
> -    mozilla::EnableWritePoisoning();

Are you removing EnableWritePoisoning during shutdown?
Attachment #803567 - Flags: review?(bgirard) → review-
Comment on attachment 803568 [details] [diff] [review]
p2b-interpose-ntreadfile.patch

Review of attachment 803568 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good but I'm not a winapi expert. Ehsan can you give it a quick check?

::: xpcom/build/PoisonIOInterposerWin.cpp
@@ +37,5 @@
>  /**
> + * Function pointer declaration for internal NT routine to read data from file.
> + * For documentation on the NtReadFile routine, see ZwReadFile on MSDN.
> + */
> +typedef NTSTATUS (WINAPI *NtReadFileFn)(

Aren't we going to miss everything that calls ZwReadFile directly?

@@ +51,5 @@
> +);
> +
> +/**
> + * Function pointer declaration for internal NT routine to read data from file.
> + * No documentation exists, see wine sources for details.

I can't find any specific documentation and I'd like to know what IO calls we expect to go through this function vs. NtReadFile. Can we give a more specific link to this documentation or better yet have a quick inline description why this function is important. 'NtReadFileScatter' isn't very self descriptive.

@@ +156,5 @@
> +{
> +  // Report IO
> +  const char* ref = "NtReadFile";
> +  WinIOAutoObservation timer(IOInterposeObserver::OpRead, ref, aFileHandle);
> +  // Something is badly wring if this function is undefined

wrong*

@@ +187,5 @@
> +{
> +  // Report IO
> +  const char* ref = "NtReadFileScatter";
> +  WinIOAutoObservation timer(IOInterposeObserver::OpRead, ref, aFileHandle);
> +  // Something is badly wring if this function is undefined

wrong*

@@ +299,5 @@
> +    reinterpret_cast<intptr_t>(InterposedNtReadFile),
> +    reinterpret_cast<void**>(&gOriginalNtReadFile)
> +  );
> +  sNtDllInterceptor.AddHook(
> +    "NtReadFileScatter",

Please add these to TestDllInterceptor:
http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/test/win/TestDllInterceptor.cpp#128
Attachment #803568 - Flags: review?(bgirard)
Attachment #803568 - Flags: review+
Attachment #803568 - Flags: feedback?(ehsan)
Comment on attachment 803569 [details] [diff] [review]
p2c-filename-for-poison-io-interposer.patch

Review of attachment 803569 [details] [diff] [review]:
-----------------------------------------------------------------

Yay filenames! I'd like to see the revisions so I wont r+ just yet.

::: xpcom/build/PoisonIOInterposerMac.cpp
@@ +119,5 @@
> +    return mFilename;
> +  }
> +  char filename[MAXPATHLEN];
> +  if (fcntl(mFd, F_GETPATH, filename) != -1) {
> +    mFilename = PL_strdup(filename);

Why not use strdup + free?

@@ +123,5 @@
> +    mFilename = PL_strdup(filename);
> +  } else {
> +    mFilename = nullptr;
> +  }
> +  // Return filename

mHasFilename = true

::: xpcom/build/PoisonIOInterposerWin.cpp
@@ +48,5 @@
> +);
> +/**
> + * Return structure for the FileNameInformation query on NtQueryInformationFile
> + * FileNameLength contains the number of bytes returned and FileName contains
> + * the first 16 bit Unicode character, followed by the remaining characters.

I'm having trouble parsing |first 16 bit Unicode character, followed by the remaining characters|. Is the first character special as this seems to point out?

@@ +50,5 @@
> + * Return structure for the FileNameInformation query on NtQueryInformationFile
> + * FileNameLength contains the number of bytes returned and FileName contains
> + * the first 16 bit Unicode character, followed by the remaining characters.
> + *
> + * Please note, that FileNameLength is the number of bytes returned, to get the

Maybe we could rename this to FileNameByteLength then? Keep the comment since it's useful to warn.

@@ +186,5 @@
> +  // I'm not exactly sure if NT API are unable to return filenames larger
> +  // than MAX_PATH which is 260 chars. In any event 260 characters is
> +  // probably enough for us.
> +  IO_STATUS_BLOCK status;
> +  WCHAR buffer[sizeof(FILE_NAME_INFORMATION) + MAX_PATH];

sizeof will return the number of bytes to accommodate FILE_NAME_INFORMATION which WCHAR will double. you should make buffer of type char. How about this:
char buffer[sizeof(FILE_NAME_INFORMATION) + sizeof(WCHAR) * MAX_PATH]

@@ +189,5 @@
> +  IO_STATUS_BLOCK status;
> +  WCHAR buffer[sizeof(FILE_NAME_INFORMATION) + MAX_PATH];
> +  FILE_NAME_INFORMATION* fileInfo = (FILE_NAME_INFORMATION*)buffer;
> +  // Check that NtQueryInformationFile have been grabbed from ntdll.dll
> +  MOZ_ASSERT(NtQueryInformationFile);

I'm a bit worried that on some versions of windows we might fail to grab NtQueryInformationFile. If you're sure we can get this everywhere then we can leave this. Otherwise we should return a null or empty file name if we failed to grab it instead of aborting. If you think we can get it everywhere then this is fine.

@@ +193,5 @@
> +  MOZ_ASSERT(NtQueryInformationFile);
> +  // Invoke NtQueryInformationFile to get the filename
> +  if (NtQueryInformationFile(
> +                        mFileHandle, &status, fileInfo,
> +                        2 * (sizeof(FILE_NAME_INFORMATION) + MAX_PATH),

I'm guessing 2 is meant for sizeof(WCHAR) which we should just use. But I think sizeof(buffer) should be able to replace this entire expression.

@@ +195,5 @@
> +  if (NtQueryInformationFile(
> +                        mFileHandle, &status, fileInfo,
> +                        2 * (sizeof(FILE_NAME_INFORMATION) + MAX_PATH),
> +                        /* FileNameInformation */ (FILE_INFORMATION_CLASS)9
> +  ) == /* STATUS_SUCCESS */ 0) {

This needs additional indent

@@ +206,5 @@
> +    // Save null in cache if we can't get a filename
> +    mFilename = nullptr;
> +  }
> +  // Return filename
> +  return mFilename;

mHasFilename = true;

@@ +364,5 @@
>    }
>    sIOPoisoned = true;
>  
> +  // Fetch the NtQueryInformationFile function
> +  if(NtQueryInformationFile == 0){

spacing + ! for pointer: if (!NtQueryInformationFile) {
Attachment #803569 - Flags: review?(bgirard) → review-
Comment on attachment 803568 [details] [diff] [review]
p2b-interpose-ntreadfile.patch

Review of attachment 803568 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/build/PoisonIOInterposerWin.cpp
@@ +37,5 @@
>  /**
> + * Function pointer declaration for internal NT routine to read data from file.
> + * For documentation on the NtReadFile routine, see ZwReadFile on MSDN.
> + */
> +typedef NTSTATUS (WINAPI *NtReadFileFn)(

Yep, both entries seem to exist in ntdll.dll.

@@ +302,5 @@
> +  sNtDllInterceptor.AddHook(
> +    "NtReadFileScatter",
> +    reinterpret_cast<intptr_t>(InterposedNtReadFileScatter),
> +    reinterpret_cast<void**>(&gOriginalNtReadFileScatter)
> +  );

I think you should hook the Zw* variants as well.  Even though they do the exact same thing, an application may link against ZwReadFile and not NtReadFile, for example.
Attachment #803568 - Flags: feedback?(ehsan) → feedback+
Rebased and refactored to use `hg mv/cp`. But there's still a little `hg rm/add`.
Unfortunately, the `LateWriteChecks*` and `PoisonIOInterposer*` files are pretty much refactored. In some cases I also took the opportunity to clean up style and things a little bit smarter.

So there's no way around the fact that this is a big patch. I would suggest reading the patched `LateWriteChecks*` and `PoisonIOInterposer*` files instead of completely relying on the diff... With all the change the diff might just be more confusing.
Attachment #803567 - Attachment is obsolete: true
Attachment #814637 - Flags: review?(bgirard)
Since BenWa r+'ed this patch, I've:
 * Interposed ZwReadFile, and
 * Fixed spelling errors.
Attachment #803569 - Attachment is obsolete: true
Attachment #814639 - Flags: review?(bgirard)
In this patch I have:
 * Moved from PL_strdup() to strdup()/free() on Mac (leaving Windows alone),
 * Added `mHasFilename = true` to cache filenames correctly,
 * Fixed WCHAR buffer over-allocation,
 * Removed MOZ_ASSERT(NtQueryInformationFile), but return nullptr instead,
 * Restyled complicated if(NtQueryInformationFile(....)) construction,
 * Clarified doc-comment on FILE_NAME_INFORMATION

I did not rename the `FileNameLength` field on `FILE_NAME_INFORMATION` to `FileNameByteLength` as suggested, because I want the header declarations to match MDSN documentation as much as possible.
(but who knows maybe that's just me going crazy).
Attachment #803568 - Attachment is obsolete: true
Attachment #814644 - Flags: review?(bgirard)
Comment on attachment 814644 [details] [diff] [review]
p2c-filename-for-poison-io-interposer.patch

Review of attachment 814644 [details] [diff] [review]:
-----------------------------------------------------------------

I have some comments on the implementation of this below, but there is a more general issue with this patch.  Did you read the comments in the Remarks section of <http://msdn.microsoft.com/en-us/library/windows/hardware/ff567052%28v=vs.85%29.aspx>?  It seems like this function doesn't give you back a reliable syntax for the file path.  Using a technique similar to what is described in <http://msdn.microsoft.com/en-us/library/windows/desktop/aa366789%28v=vs.85%29.aspx> seems like a much better way of getting this information.  Note that in order to avoid licensing issues, please don't copy the sample code from MSDN, just use the idea to implement your own version.

Thanks!

::: xpcom/build/PoisonIOInterposerMac.cpp
@@ +123,5 @@
> +    mFilename = strdup(filename);
> +  } else {
> +    mFilename = nullptr;
> +  }
> +  mHasFilename = true;

Same comment about mHasFilename.

::: xpcom/build/PoisonIOInterposerWin.cpp
@@ +158,5 @@
>  
> +// Pointer to NtQueryInformationFile, as the NtQueryInformationFile symbol isn't
> +// officially exported we have to grab it manually at runtime.
> +// This will be done by InitPoisonIOInterposer()
> +static NtQueryInformationFileFn NtQueryInformationFile = 0;

Nit: please give this a different name than the name of the function to make it obvious that we're not statically linking against it.

@@ +182,5 @@
>      }
>    }
>  
> +  // Custom implementation of IOInterposeObserver::Observation::Filename
> +  const char* Filename();

Nit: MOZ_OVERRIDE.

@@ +226,5 @@
> +  // I'm not exactly sure if NT API are unable to return filenames larger
> +  // than MAX_PATH which is 260 chars. In any event 260 characters is
> +  // probably enough for us.
> +  IO_STATUS_BLOCK status;
> +  char buffer[sizeof(FILE_NAME_INFORMATION) + sizeof(WCHAR) * MAX_PATH];

MAX_PATH is not guaranteed to be enough here.  You should call this function twice, the first time passing a buffer of sie sizeof(FILE_NAME_INFORMATION) and then read the actual length of the file from FileNameLength and allocate a large enough buffer dynamically and pass that to this function again.  (Welcome to the Win32 API!!!)

@@ +234,5 @@
> +  if (NtQueryInformationFile(mFileHandle, &status, fileInfo, sizeof(buffer),
> +                             FileNameInformation) == STATUS_SUCCESS) {
> +    // Convert filename to UTF-8
> +    NS_ConvertUTF16toUTF8 filename(fileInfo->FileName,
> +                                   fileInfo->FileNameLength / 2);

This will do the wrong thing if your file name contains Unicode characters.  Can't we juse store the file name in a wide char buffer?  If not, you should use WideCharToMultiByte to do a locale aware conversion of the file name (which may convert those characters to '?', which is not great.)

Also, please use sizeof(WCHAR) instead of the hardcoded 2 value.

@@ +241,5 @@
> +  } else {
> +    // Save null in cache if we can't get a filename
> +    mFilename = nullptr;
> +  }
> +  mHasFilename = true;

It's not clear to me why you're using mHasFilename.

@@ +434,5 @@
>    }
>    sIOPoisoned = true;
>  
> +  // Fetch the NtQueryInformationFile function
> +  if (!NtQueryInformationFile){

How can this be non-null?  It seems better to MOZ_ASSERT what we know here.
Attachment #814644 - Flags: review?(bgirard) → review-
@ehsan, thanks for the feedback :)
Windows APIs is not my strong side yet.

I did read the comment section on NtQueryInformationFile, but I think it is acceptable for this use case.
When interposing we want low overhead, if the fullpath is missing or relative/incomplete in a few corner cases that okay.
When logging with telemetry we're only going to use the filename and not the fullpath, so I suggest that we go with
NtQueryInformationFile until more precision is needed.

With respect to the alternative you referenced, then it might make sense to use GetFinalPathNameByHandle on Vista
and later. But if I recall our discussions from August correctly you were saying that:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366789%28v=vs.85%29.aspx
was the wrong way to do this as Windows already has the filename. I tend to agree as using CreateFileMapping,
MapViewOfFile, GetMappedFileName and other hacks to construct filename seems fairly expensive.

The filename was at some point in WCHAR (as Windows returns them), but OS X provides UTF-8, so one of them
would have to converted. Hence, I figured UTF-8 was preferred.
Your comment regarding use of WideCharToMultiByte as opposed to NS_ConvertUTF16toUTF8 seems to suggest that Windows WCHAR*
is local dependent, in which case we should convert it before exposing to IOInterpose consumers, anyway.

I've already started addressing some of the issues you've pointed out, and will continue tomorrow.
Based on your feedback I decided to a quick test of NtQueryInformationFile on Windows 7, and it can indeed return
filenames larger than MAX_PATH (260). However, the call twice approach you outlined seems very reasonable.

NEEDINFO: Do you agree that NtQueryInformationFile is a good fit for this use case?
If not please consider stopping me before I continue to implementing this when I get up tomorrow :)
Flags: needinfo?(ehsan)
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #48)
> I did read the comment section on NtQueryInformationFile, but I think it is
> acceptable for this use case.
> When interposing we want low overhead, if the fullpath is missing or
> relative/incomplete in a few corner cases that okay.
> When logging with telemetry we're only going to use the filename and not the
> fullpath, so I suggest that we go with
> NtQueryInformationFile until more precision is needed.

It's not clear to me how you're making this trade-off.  We don't know if this is only going to not work as we expect in a few corner cases, what if it turns out to not work in most cases?  I mean, we have an alternative way to do this which is more easily provable to be correct after all.  :-)

> With respect to the alternative you referenced, then it might make sense to
> use GetFinalPathNameByHandle on Vista
> and later. But if I recall our discussions from August correctly you were
> saying that:
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa366789%28v=vs.
> 85%29.aspx
> was the wrong way to do this as Windows already has the filename. I tend to
> agree as using CreateFileMapping,
> MapViewOfFile, GetMappedFileName and other hacks to construct filename seems
> fairly expensive.

Well, Windows already *knows* the filename for sure, we just need to convince it to give it to us!

Have you measured the cost of the two approaches?  I don't know which one is more expensive than the other, so I think if you are worried about the cost, it's best to measure.  But we also need a better reason to know if the cases that NtQueryInformationFile will fail for are actually corner cases.

> The filename was at some point in WCHAR (as Windows returns them), but OS X
> provides UTF-8, so one of them
> would have to converted. Hence, I figured UTF-8 was preferred.
> Your comment regarding use of WideCharToMultiByte as opposed to
> NS_ConvertUTF16toUTF8 seems to suggest that Windows WCHAR*
> is local dependent, in which case we should convert it before exposing to
> IOInterpose consumers, anyway.

Hmm, actually now that I think about this again, I think I was wrong the first time around -- your code should work correctly.  But please test it with a file path containing Unicode characters to be sure.  One very easy and common way to hit this is to have a user name with Unicode characters inside it, which gives you a profile path with those characters.

> I've already started addressing some of the issues you've pointed out, and
> will continue tomorrow.

Sounds good!

> Based on your feedback I decided to a quick test of NtQueryInformationFile
> on Windows 7, and it can indeed return
> filenames larger than MAX_PATH (260). However, the call twice approach you
> outlined seems very reasonable.

Yeah in fact this is an idiom of sorts on Windows!

> NEEDINFO: Do you agree that NtQueryInformationFile is a good fit for this
> use case?
> If not please consider stopping me before I continue to implementing this
> when I get up tomorrow :)

I think we should use the other alternative unless you have very good reasons not to.  The cost could be one such reason but you should back that up with some measurements.  :-)
Flags: needinfo?(ehsan)
@ehsan,
Okay, I did a few quick tests, essentially measuring time spent looking up the filename for the first 1000 file operations when starting Firefox on Windows.

With `NtQueryInformationFile` the first 1000 queries took: 0.064s
With `GetFinalPathNameByHandle` (Vista+ only) the first 1000 queries took: 1.110s

And then I haven't even tested the legacy approach MSDN suggests with `CreateFileMapping` and friends.
But I imagine that this can only be slower, there quite a few blog posts around, but a write up here:
http://cbloomrants.blogspot.com/2012/12/12-21-12-file-handle-to-file-name-on.html

Notice, that `GetFinalPathNameByHandle` is Vista+ only and provides the drive letter,
as opposed to `NtQueryInformationFile`. Again, I think we should accept the reduced precision as this
is mainly for developers, we can always add more precision and overhead if we need it.
Attachment #814639 - Flags: review?(bgirard) → review+
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #44)
> So there's no way around the fact that this is a big patch.

In general you should build up a patch queue incrementally of the logically changes your doing. I'll review it this time but dumping a large patch like this make a quality review impossible and it's very uncomfortable for a reviewer to rubber stamp.
@BenWa,
It was actually split in two initially, but in between (1) removing write-poison, (2) adding poison IO interposer and
(3) adding late-write checks there was regressions... In hind sight, maybe it would have been better to do some
extra work maintaining that separation and live with regression between those patches.
(Actually, it would still be easy to split the patch, but history would be impossible to maintain with `hg mv/cp`).

Anyways, don't rubber stamp, if there are things not explained properly let's do a quick session on it,
maybe more comments elaborating on how we hook into nsAppRunner and XPCOMInit could be added.
Comment on attachment 814637 [details] [diff] [review]
p2a-late-write-checks-refactoring.patch

Review of attachment 814637 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if we have a good solution to the leaks.

::: xpcom/build/LateWriteChecks.cpp
@@ +240,5 @@
> +      sLateWriteObserver
> +    );
> +    // Deallocation would not be thread-safe, and StopLateWriteChecks() is
> +    // called at shutdown and only in special cases.
> +    // sLateWriteObserver = nullptr;

Will these leaks be caught by our leak tests? I don't think regressing those is acceptable. Any reason why the previous behavior is not appropriate?

::: xpcom/build/LateWriteChecks.h
@@ +14,2 @@
>  
> +namespace mozilla{

nit: space before {
Attachment #814637 - Flags: review?(bgirard) → review+
(In reply to comment #50)
> @ehsan,
> Okay, I did a few quick tests, essentially measuring time spent looking up the
> filename for the first 1000 file operations when starting Firefox on Windows.
> 
> With `NtQueryInformationFile` the first 1000 queries took: 0.064s
> With `GetFinalPathNameByHandle` (Vista+ only) the first 1000 queries took:
> 1.110s

We cannot use GetFinalPathNameByHandle since we do need to support WinXP, so this performance measurement is moot I'm afraid.

> And then I haven't even tested the legacy approach MSDN suggests with
> `CreateFileMapping` and friends.
> But I imagine that this can only be slower, there quite a few blog posts
> around, but a write up here:
> http://cbloomrants.blogspot.com/2012/12/12-21-12-file-handle-to-file-name-on.html

Well, I wanted to see the performance difference between the two approaches which we _can_ use.  This is not it.  :-)

> Notice, that `GetFinalPathNameByHandle` is Vista+ only and provides the drive
> letter,
> as opposed to `NtQueryInformationFile`. Again, I think we should accept the
> reduced precision as this
> is mainly for developers, we can always add more precision and overhead if we
> need it.

When you say "mainly for developers" do you mean on their own local builds?
@BenWa,

> Will these leaks be caught by our leak tests?
No, IOInterposer, PoisonIOInterposer and LateWriteChecks do not used any leak-test-tracked allocations.
The idea with LateWriteChecks is to find writes during shutdown, as I understand it that also includes
writes while static destructors are executed. Thus, it is necessary to leak it, and it should not use
leak-test-tracked allocations.

> Any reason why the previous behavior is not appropriate?
The old code, flipped a boolean and deallocated some, but not all structures related to poison IO interposing and
late-write checks. Unless there is only one thread running at the call sites of DisableWritePoisoning() then the
old behavior isn't thread-safe.
This code is only used in corner cases, the leak doesn't affect performance, as we only call StopLateWriteChecks()
during shutdown, in case of crash or before static destructors on OS X (bug 826029). Ideally, there shouldn't
be a thread executing inside the LateWriteObserver when StopLateWriteChecks() because there shouldn't be any
late writes.

@ehsan,
Why can't we use `GetFinalPathNameByHandle` on platforms where it is available?
Sure we'll have to look it up at runtime, like we do when we get function pointer to `NtQueryInformationFile`.

Anyways, I'll do some preliminary performance tests for the alternative with file mappings, though,
I fear file mappings will fail for empty files and write-only files. But I'll do the testing before
pushing another iteration of the patch for review.
(In reply to comment #55)
> @ehsan,
> Why can't we use `GetFinalPathNameByHandle` on platforms where it is available?
> Sure we'll have to look it up at runtime, like we do when we get function
> pointer to `NtQueryInformationFile`.

I don't think there is a good reason to have multiple platform specific code paths for this.
As r+'ed by BenWa with Nit fixed. As earlier mentioned, we're not regressing
leak tests, which would also show up on try and in local DEBUG builds.

There a push to try, here: https://tbpl.mozilla.org/?tree=Try&rev=e680eac3b236
Hopefully, I'm the only one working Sunday night, so it should be all green tomorrow morning :)
Attachment #814637 - Attachment is obsolete: true
Attachment #819539 - Flags: review+
Attachment #819539 - Flags: checkin?
Keywords: checkin-needed
After using qimportbz and doing a bit of manually shuffling, I still can't get this to apply in order. Please can you number all of the patches & in a way that means qimport can import correctly? (I can't remember by what method it does the ordering)

[/c/src-gecko/inbound]$ hg qser -v
 0 A nocrash.diff [another bug]
 1 A moz-all-io-logfile.patch
 2 U p1-io-interposer-refactoring.patch
 3 U fix-iointerposer-destructors-ifndef-sps.patch
 4 U p2a-late-write-checks-refactoring.patch
 5 U p2b-interpose-ntreadfile.patch
 6 U p3-telemetry-io-interposing.patch
 7 U p2c-filename-for-poison-io-interposer.patch

[/c/src-gecko/inbound]$ hg qgoto -- p2c-filename-for-poison-io-interposer.patch
applying moz-all-io-logfile.patch
patching file toolkit/xre/nsAppRunner.cpp
Hunk #2 FAILED at 3264
1 out of 2 hunks FAILED -- saving rejects to file toolkit/xre/nsAppRunner.cpp.rej
patching file tools/profiler/moz.build
Hunk #2 succeeded at 87 with fuzz 2 (offset 27 lines).
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh moz-all-io-logfile.patch
Keywords: checkin-needed
@edmorley,

I'm sorry, it's only p2a-late-write-checks-refactoring.patch that is ready for checkin.
Some of the others are merely for references or already checked in.

But looks like it needs a rebase, I'll get around to that later today (hopefully).
Ah sorry missed the per-patch flags, been so long since I've done the checkin-neededs lol
(Ryan is on PTO until Monday so trying to get some of the backlog caught up)
Okay, I rebased the patch from before...
The try run is here: https://tbpl.mozilla.org/?tree=Try&rev=120fc569f459
Attachment #819539 - Attachment is obsolete: true
Attachment #819539 - Flags: checkin?
Attachment #821383 - Flags: review+
Attachment #821383 - Flags: checkin?
Keywords: checkin-needed
Really sorry, that patch doesn't apply cleanly for me:

[/c/src-gecko/inbound]$ qimp 902587
Fetching... done
Parsing...Unknown flag checkin
Unknown flag checkin
Unknown flag checkin
 done
1: p3-telemetry-io-interposing.patch
2: moz-all-io-logfile.patch
3: p1-io-interposer-refactoring.patch
  jopsen: review+, ryanvm: checkin+
4: fix-iointerposer-destructors-ifndef-sps.patch
  jopsen: review+, ryanvm: checkin+
5: p2b-interpose-ntreadfile.patch
  bgirard: review+
6: p2c-filename-for-poison-io-interposer.patch
  ehsan: review-
7: p2a-late-write-checks-refactoring.patch
  jopsen: review+, jopsen: checkin?
Which patches do you want to import? [Default is '1', use '1-7' for all] 7
adding 902587 to series file
renamed 902587 -> p2a-late-write-checks-refactoring.patch
applying p2a-late-write-checks-refactoring.patch
patching file xpcom/build/moz.build
Hunk #1 FAILED at 11
1 out of 1 hunks FAILED -- saving rejects to file xpcom/build/moz.build.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh p2a-late-write-checks-refactoring.patch

(Using inbound tip)
Keywords: checkin-needed
Attachment #821383 - Flags: checkin?
Carrying forward with r+ from BenWa as with previous rebase.

@edmorley,
Okay... Seems I can't keep up with chances, I rebased again :)
Attachment #821383 - Attachment is obsolete: true
Attachment #822055 - Flags: review+
Attachment #822055 - Flags: checkin?
Keywords: checkin-needed
Comment on attachment 822055 [details] [diff] [review]
p2a-late-write-checks-refactoring.patch

I had to rebase again, on top of bug 929905, but looked simple enough (and would have felt bad punting this again) so just rebased & landed it :-)

https://hg.mozilla.org/integration/mozilla-inbound/rev/2d6c5cbcc213
Attachment #822055 - Flags: checkin? → checkin+
Keywords: checkin-needed
Unfortunately this resulted in these failures:
https://tbpl.mozilla.org/php/getParsedLog.php?id=29657567&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=29657919&tree=Mozilla-Inbound

Don't know if my rebase was incorrect, things have shifted under you since the time of the try run in comment 62, or if this is exposing a build system dependency failure (given try server is always clobber builds). Another push to try after a proper rebase will rule out the first two at least :-)

Backout:
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/429806586db9
Attachment #822055 - Flags: checkin+ → checkin-
Actually, another failure was a (cooincidental) clobber:
https://tbpl.mozilla.org/php/getParsedLog.php?id=29658456&tree=Mozilla-Inbound
...so build system dependency issue is ruled out.
Okay, a file slipped while rebasing moz.build, this should be fixed now.
There's a push to try here:
https://tbpl.mozilla.org/?tree=Try&rev=3b4fb34a58aa
Attachment #822055 - Attachment is obsolete: true
Attachment #824204 - Flags: review+
Attachment #824204 - Flags: checkin?
Keywords: checkin-needed
Comment on attachment 824204 [details] [diff] [review]
p2a-late-write-checks-refactoring.patch

This is bitrotted. Please rebase :(
Attachment #824204 - Flags: checkin?
Same as r+'ed long ago...
Try push here: https://tbpl.mozilla.org/?tree=Try&rev=ac3f95d4e897

Maybe it comes through this time :)
Attachment #824204 - Attachment is obsolete: true
Attachment #827139 - Flags: review+
Attachment #827139 - Flags: checkin?
Keywords: checkin-needed
Jonas, is there a way (in a followup bug) to hack this into also reporting to the console main thread IO in unit testing?
Attachment #827139 - Flags: checkin? → checkin+
@Yoric,
Hmm, if you want to use main-thread I/O in unit testing that might be possible, though we would have to look into how
stable the measure is. But flesh it out in a bug and CC me, hooking into IO Interposer should be easier now.
p2a fail the build when MOZ_ENABLE_PROFILER_SPS is not defined.   errors are  "mozilla::IsDebugFile" has already been defined and so on.
@zhoubcfan,
Thanks for noticing. I forgot to test without MOZ_ENABLE_PROFILER_SPS on Windows.
To keep the size of this bug manageable I've filed bug 937201 for this issue.
Anyways, a patch is underway, if this is blocking or annoying anybody,
please set a higher priority on bug 937201 or just post a comment on it.
Assignee: jopsen → aklotz
I unbitrotted the remaining patches and pushed them to try. Here's the complete queue:
https://tbpl.mozilla.org/?tree=Try&rev=cecae39437a6

I resolved the conflicts in their correct patches, and added an extra patch with some updates that were required to make it work.
(The change in patch sequence number from p2c to p2b is intentional; I have reordered these patches in my patch queue.)

Performance test of [GetFinalPathNameByHandle, NtQueryInformationFile, GetMappedFileName], means are in microseconds:
Means: [1715.1111469742862, 25.903408376998, 39.96582780611611]
%diff: [6521.179429411652, 0.0, 54.28791155377611]

This patch does 3 things:

1. Pushes some boilerplate code that was common across the various subclasses of IOInterposeObsever::Observation into the parent class itself;

2. Cleans up the usage of the |aReference| parameter to the IOInterposeObsever::Observation constructor. The purpose of that parameter is to annotate the mechanism by which the Observation was generated, e.g. NSPRInterposer vs SQLiteInterposer vs PoisonIOInterposer. The reference strings being passed in with PoisonIOInterposer were more granular and inconsistent compared to the NSPR and SQLite interposers.

3. Implements handle to filename conversion. Based on the test results, I decided to implement a GetMappedFileName-based solution for Windows. I think that the benefits of using fully-documented APIs outweigh the slight performance hit compared to the NtQueryInformationFile route. The version that I wrote also accepts an offset as a parameter so that we can map the page containing the same offset that the cache manager is referencing.

Ehsan, regarding the mHasFilename issue: I believe that Jonas' intention here is to distinguish between "I haven't obtained the filename yet" and "I tried to obtain the filename but failed." If mHasFilename is true but the filename is nullptr, then we avoid the futility of repeatedly executing a failing query.
Attachment #814644 - Attachment is obsolete: true
Attachment #8360904 - Flags: review?(ehsan)
Compared to the previous revision of this patch, I added additional hooks for NtCreateFile, NtFlushBuffersFile, and NtQueryFullAttributesFile.

I also had to remove the hook for ZwReadFile because in user mode that name is aliased to the same address as NtReadFile, thus hooking the same function twice.
Attachment #814639 - Attachment is obsolete: true
Attachment #8360906 - Flags: review?(ehsan)
Comment on attachment 8360904 [details] [diff] [review]
P2B: Filename for PoisonIOInterposer

Review of attachment 8360904 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on everything except PoisonIOInterposerMac.cpp.  If you need review on that, please ask Benoit.

::: xpcom/build/PoisonIOInterposerWin.cpp
@@ +105,5 @@
> +  do {
> +    // Unfortunately QueryDosDevice doesn't support the idiom for querying the
> +    // output buffer size, so it may require retries.
> +    driveTemplate[0] = *cur;
> +    DWORD targetPathLen = 0;

SetLastError(0)

@@ +131,5 @@
> +    }
> +    // Advance to the next NUL character in logicalDrives
> +    while (*cur++);
> +  } while (cur != end);
> +  // Code for handling UNC paths would go here, if eventually required.

Please use NS_WARNING here.

@@ +146,5 @@
> +typedef Scoped<ScopedMappedViewTraits> ScopedMappedView;
> +struct ScopedHandleTraits
> +{
> +  typedef HANDLE type;
> +  static HANDLE empty() { return NULL; }

nullptr

@@ +149,5 @@
> +  typedef HANDLE type;
> +  static HANDLE empty() { return NULL; }
> +  static void release(HANDLE h) { CloseHandle(h); }
> +};
> +typedef Scoped<ScopedHandleTraits> ScopedHandle;

Hmm it makes me sad that you're reinventing nsAutoHandle here.  I would suggest using that instead, and/or filing a follow-up to sort out this mess.

@@ +168,5 @@
> +  if (!view) {
> +    return false;
> +  }
> +  nsAutoString mappedFilename;
> +  DWORD len = 0;

SetLastError(0), otherwise you may end up in an infinite loop if something sets the last error to E_I_B.

@@ +204,4 @@
>      }
>    }
>  
> +  WinIOAutoObservation(IOInterposeObserver::Operation aOp, nsAString& aFilename)

This constructor is unused as far as I can tell.  Please remove it.

@@ +234,5 @@
>  
>  private:
>    HANDLE              mFileHandle;
> +  LARGE_INTEGER       mOffset;
> +  bool                mHasFilename;

mHasFilename is a bad name.  Please use a better name, such as mHasQueriedFilename, etc.
Attachment #8360904 - Flags: review?(ehsan) → review+
Comment on attachment 8360906 [details] [diff] [review]
P2C: Interpose NtCreateFile, NtReadFile, NtReadFileScatter, NtFlushBuffersFile, and NtQueryFullAttributesFile

Review of attachment 8360906 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/build/PoisonIOInterposerWin.cpp
@@ +358,5 @@
> +{
> +  // Report IO
> +  char16_t* buf = aObjectAttributes ?
> +                    aObjectAttributes->ObjectName->Buffer :
> +                    L"";

MOZ_UTF16()

@@ +363,5 @@
> +  uint32_t len = aObjectAttributes ?
> +                   aObjectAttributes->ObjectName->Length / sizeof(WCHAR) :
> +                   0;
> +  nsDependentString filename(buf, len);
> +  WinIOAutoObservation timer(IOInterposeObserver::OpCreateOrOpen, filename);

Oh I guess this uses the ctor which was unused in the previous patch...

@@ +537,5 @@
> +{
> +  // Report IO
> +  char16_t* buf = aObjectAttributes ?
> +                    aObjectAttributes->ObjectName->Buffer :
> +                    L"";

Ditto.
Attachment #8360906 - Flags: review?(ehsan) → review+
Patch updated with fixes for Ehsan.

Benoit, can you please review the Mac-related part? See comment 78 for more info about this patch.
Attachment #8360904 - Attachment is obsolete: true
Attachment #8362712 - Flags: review?(bgirard)
Comment on attachment 8362712 [details] [diff] [review]
P2B: Filename for PoisonIOInterposer

Review of attachment 8362712 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/IOInterposer.cpp
@@ +109,5 @@
> +  : mOperation(aOperation)
> +  , mStart(aStart)
> +  , mEnd(aEnd)
> +  , mReference(aReference)
> +  , mShouldReport(false)

I don't see anything to make mShouldReport true. How is this constructor useful?

::: xpcom/build/PoisonIOInterposerWin.cpp
@@ +72,5 @@
>  
>  /*************************** Auxiliary Declarations ***************************/
>  
> +// Helper function to convert fully-qualified NT paths to DOS-style paths
> +bool NtPathToDosPath(const nsAString& aNtPath, nsAString& aDosPath)

This code needs test coverage. I'm not familiar with NT paths so I'm not a good person to review this.

@@ +234,5 @@
>    }
>  
>  private:
>    HANDLE              mFileHandle;
> +  LARGE_INTEGER       mOffset;

Why do we need a large integer here?
Attachment #8362712 - Flags: review?(bgirard) → review-
(In reply to Benoit Girard (:BenWa) from comment #83)
> Comment on attachment 8362712 [details] [diff] [review]
> P2B: Filename for PoisonIOInterposer
> 
> Review of attachment 8362712 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: tools/profiler/IOInterposer.cpp
> @@ +109,5 @@
> > +  : mOperation(aOperation)
> > +  , mStart(aStart)
> > +  , mEnd(aEnd)
> > +  , mReference(aReference)
> > +  , mShouldReport(false)
> 
> I don't see anything to make mShouldReport true. How is this constructor
> useful?

With this constructor the Observation object does not report itself. It is passed into IOInterposer::Report by code that is external to the Observation object. It is being used by the SQLite interpose code.

> 
> ::: xpcom/build/PoisonIOInterposerWin.cpp
> @@ +72,5 @@
> >  
> >  /*************************** Auxiliary Declarations ***************************/
> >  
> > +// Helper function to convert fully-qualified NT paths to DOS-style paths
> > +bool NtPathToDosPath(const nsAString& aNtPath, nsAString& aDosPath)
> 
> This code needs test coverage. I'm not familiar with NT paths so I'm not a
> good person to review this.

Ehsan has already r+'d the Windows-centric stuff. I've separated that code into FileUtilsWin.(h|cpp) and added a test to this latest patch revision.

> 
> @@ +234,5 @@
> >    }
> >  
> >  private:
> >    HANDLE              mFileHandle;
> > +  LARGE_INTEGER       mOffset;
> 
> Why do we need a large integer here?

We need that offset later if Filename() is called. It's a LARGE_INTEGER because that's the type used by the NT native APIs for offset parameters.
Attachment #8362712 - Attachment is obsolete: true
Attachment #8364483 - Flags: review?(bgirard)
Attachment #798440 - Attachment is obsolete: true
Attachment #8366340 - Flags: review?(nfroyd)
Comment on attachment 8366340 [details] [diff] [review]
P3: I/O Interpose Observer for Telemetry

Review of attachment 8366340 [details] [diff] [review]:
-----------------------------------------------------------------

Looks generally good, but I have a few questions and observations.

::: toolkit/components/telemetry/Telemetry.cpp
@@ +37,5 @@
>  #include "nsIFile.h"
>  #include "nsIFileStreams.h"
>  #include "nsIMemoryReporter.h"
> +#include "nsIObserver.h"
> +#include "nsIObserverService.h"

What are these for?  They look unused.

@@ -359,5 @@
>  }
>  
> -size_t
> -TelemetryImpl::SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf)
> -{

njn should really look at these memory reporter changes; can you move the memory reporter changes to a separate patch and r?njn,froydnj, please?

@@ +2303,5 @@
> +   */
> +  void Observe(Observation& aOb);
> +
> +  /**
> +   * Reflect recorded file IO statistics into Javascript 

Nit: trailing whitespace.

@@ +2310,5 @@
> +
> +  /**
> +   * Get size of hash table with file stats
> +   */
> +  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) {

This should really be SizeOfIncludingThis because you're only ever accessing TelemetryIOInterposeObserver through a pointer.

@@ +2323,5 @@
> +  // Statistics for each filename
> +  AutoHashtable<FileIOEntryType> mFileStats;
> +  // Container for whitelisted directories
> +  nsTArray<nsString> mSafeDirs;
> +  // nsCOMArray<nsIFile> mSafeDirs;

Remove this?

@@ +2327,5 @@
> +  // nsCOMArray<nsIFile> mSafeDirs;
> +
> +  /**
> +   * Reflect a FileIOEntryType object to a Javascript property on obj with
> +   * filename as key containing array: [totalTime, reads, writes, fsync]

It looks like we reflect a lot more than that.  The comment or the code needs to be changed.

@@ +2333,5 @@
> +  static bool ReflectFileStats(FileIOEntryType* entry, JSContext *cx,
> +                               JS::Handle<JSObject*> obj);
> +
> +  /**
> +   * Stupid size of method for keeping things simple

I don't think we need this comment because...

@@ +2339,5 @@
> +  static size_t SizeOfFileIOEntryTypeExcludingThis(FileIOEntryType*,
> +                                                   mozilla::MallocSizeOf,
> +                                                   void*)
> +  {
> +    return 0;

...we should be measuring the string key size here, if it's not shared.

@@ +2356,5 @@
> +void TelemetryIOInterposeObserver::AddPath(const nsAString& aPath)
> +{
> +  mSafeDirs.AppendElement(aPath);
> +}
> +  

Nit: trailing whitespace.

@@ +2366,5 @@
> +  }
> +
> +  // Get the filename
> +  const char16_t* filename = aOb.Filename();
> +  

Nit: trailing whitespace.

@@ +2374,5 @@
> +  }
> +
> +  bool report = false;
> +  nsDependentString filenameStr(filename);
> +  for (uint32_t i = 0; i < mSafeDirs.Length(); ++i) {

Pull the Length call out of the loop, please.

@@ +2376,5 @@
> +  bool report = false;
> +  nsDependentString filenameStr(filename);
> +  for (uint32_t i = 0; i < mSafeDirs.Length(); ++i) {
> +    uint32_t curSafeDirLen = mSafeDirs[i].Length();
> +    if (curSafeDirLen <= filenameStr.Length()) {

This one too.

@@ +2393,5 @@
> +  if (!report) {
> +    return;
> +  }
> +
> +  // Find the leaf of the filename, that is last slash or backslash + 1

Since you already know the length thanks to |filenameStr|, you could start from the end and work backwards, which would be more efficient.

@@ +2395,5 @@
> +  }
> +
> +  // Find the leaf of the filename, that is last slash or backslash + 1
> +  const char16_t* leaf = nullptr;
> +  const char16_t* c = filename;

Are you guaranteed that filename will be an absolute path here?  Seems like you could wind up with just a basename in filename, and you'd fail to find a leaf, which would be incorrect.

@@ +2408,5 @@
> +    return;
> +  }
> +
> +  // Create a new entry or retrieve the existing one
> +  FileIOEntryType* entry = mFileStats.PutEntry(nsDependentString(leaf));

I assume that the string eventually winds up being copied, so the dependency on the short-lived leaf here is not a problem.

@@ +2441,5 @@
> +{
> +  // Let's not report arrays containing [0, 0, 0, 0, 0, 0]
> +  if (entry->mData.totalTime == 0 && entry->mData.creates == 0 &&
> +     entry->mData.reads == 0 && entry->mData.writes == 0 &&
> +     entry->mData.fsyncs == 0 && entry->mData.stats == 0) {

Nit: the continuation lines should be aligned, like so:

  if (entry->
      entry->
      entry-> ...)

@@ +2901,5 @@
> +
> +  // Initialize IO interposing
> +  IOInterposer::Init();
> +  InitPoisonIOInterposer();
> +  

Nit: trailing whitespace.

@@ +2912,5 @@
> +{
> +  if (!sTelemetryIOObserver) {
> +    return;
> +  }
> +  IOInterposer::Unregister(IOInterposeObserver::OpAll, sTelemetryIOObserver);   

Nit: trailing whitespace.

::: toolkit/components/telemetry/Telemetry.h
@@ +165,5 @@
> +
> +/**
> + * Stop reporting IO, see InitIOReporting()
> + */
> +void ClearIOReporting();

Do we really need to expose this, given that IO reporting is automatically cleared as Telemetry itself shuts down?

::: toolkit/components/telemetry/nsITelemetry.idl
@@ +240,5 @@
>     */
>    void asyncFetchTelemetryData(in nsIFetchTelemetryDataCallback aCallback);
> +
> +  /**
> +   * Get statistics of file IO reports, null, if not recorded.

Please describe the format of the returned object.

@@ +243,5 @@
> +  /**
> +   * Get statistics of file IO reports, null, if not recorded.
> +   */
> +  [implicit_jscontext]
> +  readonly attribute jsval fileIOReports;

You need to update the UUID of the interface or you're gonna have a bad time.

::: toolkit/xre/nsAppRunner.cpp
@@ +3404,5 @@
> +#else
> +  {
> +    const char* releaseChannel = NS_STRINGIFY(MOZ_UPDATE_CHANNEL);
> +    if (strcmp(releaseChannel, "nightly") == 0 ||
> +        strcmp(releaseChannel, "default") == 0) {

Why do we not want this turned on everywhere?  Why do we not want to enable it only when Telemetry.canReport or similar?

@@ +3406,5 @@
> +    const char* releaseChannel = NS_STRINGIFY(MOZ_UPDATE_CHANNEL);
> +    if (strcmp(releaseChannel, "nightly") == 0 ||
> +        strcmp(releaseChannel, "default") == 0) {
> +      mozilla::Telemetry::InitIOReporting(gAppData->xreDirectory);
> +    } 

Nit: trailing whitespace.
Attachment #8366340 - Flags: review?(nfroyd) → feedback+
(In reply to Nathan Froyd (:froydnj) from comment #86)
> Comment on attachment 8366340 [details] [diff] [review]
> P3: I/O Interpose Observer for Telemetry
> 
> Review of attachment 8366340 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks generally good, but I have a few questions and observations.
> 
> ::: toolkit/components/telemetry/Telemetry.cpp
> @@ +37,5 @@
> >  #include "nsIFile.h"
> >  #include "nsIFileStreams.h"
> >  #include "nsIMemoryReporter.h"
> > +#include "nsIObserver.h"
> > +#include "nsIObserverService.h"
> 
> What are these for?  They look unused.

Vestigial. Removed.

> 
> @@ -359,5 @@
> >  }
> >  
> > -size_t
> > -TelemetryImpl::SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf)
> > -{
> 
> njn should really look at these memory reporter changes; can you move the
> memory reporter changes to a separate patch and r?njn,froydnj, please?

OK.

> 
> @@ +2303,5 @@
> > +   */
> > +  void Observe(Observation& aOb);
> > +
> > +  /**
> > +   * Reflect recorded file IO statistics into Javascript 
> 
> Nit: trailing whitespace.

Fixed.

> 
> @@ +2310,5 @@
> > +
> > +  /**
> > +   * Get size of hash table with file stats
> > +   */
> > +  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) {
> 
> This should really be SizeOfIncludingThis because you're only ever accessing
> TelemetryIOInterposeObserver through a pointer.

Done.

> 
> @@ +2323,5 @@
> > +  // Statistics for each filename
> > +  AutoHashtable<FileIOEntryType> mFileStats;
> > +  // Container for whitelisted directories
> > +  nsTArray<nsString> mSafeDirs;
> > +  // nsCOMArray<nsIFile> mSafeDirs;
> 
> Remove this?

Fixed.

> 
> @@ +2327,5 @@
> > +  // nsCOMArray<nsIFile> mSafeDirs;
> > +
> > +  /**
> > +   * Reflect a FileIOEntryType object to a Javascript property on obj with
> > +   * filename as key containing array: [totalTime, reads, writes, fsync]
> 
> It looks like we reflect a lot more than that.  The comment or the code
> needs to be changed.

Fixed comment.

> 
> @@ +2333,5 @@
> > +  static bool ReflectFileStats(FileIOEntryType* entry, JSContext *cx,
> > +                               JS::Handle<JSObject*> obj);
> > +
> > +  /**
> > +   * Stupid size of method for keeping things simple
> 
> I don't think we need this comment because...
> 
> @@ +2339,5 @@
> > +  static size_t SizeOfFileIOEntryTypeExcludingThis(FileIOEntryType*,
> > +                                                   mozilla::MallocSizeOf,
> > +                                                   void*)
> > +  {
> > +    return 0;
> 
> ...we should be measuring the string key size here, if it's not shared.

Fixed in memory reporting patch.

> 
> @@ +2356,5 @@
> > +void TelemetryIOInterposeObserver::AddPath(const nsAString& aPath)
> > +{
> > +  mSafeDirs.AppendElement(aPath);
> > +}
> > +  
> 
> Nit: trailing whitespace.

Fixed.

> 
> @@ +2366,5 @@
> > +  }
> > +
> > +  // Get the filename
> > +  const char16_t* filename = aOb.Filename();
> > +  
> 
> Nit: trailing whitespace.

Fixed.

> 
> @@ +2374,5 @@
> > +  }
> > +
> > +  bool report = false;
> > +  nsDependentString filenameStr(filename);
> > +  for (uint32_t i = 0; i < mSafeDirs.Length(); ++i) {
> 
> Pull the Length call out of the loop, please.

Fixed.

> 
> @@ +2376,5 @@
> > +  bool report = false;
> > +  nsDependentString filenameStr(filename);
> > +  for (uint32_t i = 0; i < mSafeDirs.Length(); ++i) {
> > +    uint32_t curSafeDirLen = mSafeDirs[i].Length();
> > +    if (curSafeDirLen <= filenameStr.Length()) {
> 
> This one too.

Fixed.

> 
> @@ +2393,5 @@
> > +  if (!report) {
> > +    return;
> > +  }
> > +
> > +  // Find the leaf of the filename, that is last slash or backslash + 1
> 
> Since you already know the length thanks to |filenameStr|, you could start
> from the end and work backwards, which would be more efficient.

Done.

> 
> @@ +2395,5 @@
> > +  }
> > +
> > +  // Find the leaf of the filename, that is last slash or backslash + 1
> > +  const char16_t* leaf = nullptr;
> > +  const char16_t* c = filename;
> 
> Are you guaranteed that filename will be an absolute path here?  Seems like
> you could wind up with just a basename in filename, and you'd fail to find a
> leaf, which would be incorrect.

Yes, the interposers are expected to provide absolute paths. Furthermore, any path that didn't conform to this expectation would not pass the check against mSafeDirs and thus would never make it to here anyway.

> 
> @@ +2408,5 @@
> > +    return;
> > +  }
> > +
> > +  // Create a new entry or retrieve the existing one
> > +  FileIOEntryType* entry = mFileStats.PutEntry(nsDependentString(leaf));
> 
> I assume that the string eventually winds up being copied, so the dependency
> on the short-lived leaf here is not a problem.

Correct.

> 
> @@ +2441,5 @@
> > +{
> > +  // Let's not report arrays containing [0, 0, 0, 0, 0, 0]
> > +  if (entry->mData.totalTime == 0 && entry->mData.creates == 0 &&
> > +     entry->mData.reads == 0 && entry->mData.writes == 0 &&
> > +     entry->mData.fsyncs == 0 && entry->mData.stats == 0) {
> 
> Nit: the continuation lines should be aligned, like so:
> 
>   if (entry->
>       entry->
>       entry-> ...)
> 

Fixed.

> @@ +2901,5 @@
> > +
> > +  // Initialize IO interposing
> > +  IOInterposer::Init();
> > +  InitPoisonIOInterposer();
> > +  
> 
> Nit: trailing whitespace.
> 

Fixed.

> @@ +2912,5 @@
> > +{
> > +  if (!sTelemetryIOObserver) {
> > +    return;
> > +  }
> > +  IOInterposer::Unregister(IOInterposeObserver::OpAll, sTelemetryIOObserver);   
> 
> Nit: trailing whitespace.

Fixed.

> 
> ::: toolkit/components/telemetry/Telemetry.h
> @@ +165,5 @@
> > +
> > +/**
> > + * Stop reporting IO, see InitIOReporting()
> > + */
> > +void ClearIOReporting();
> 
> Do we really need to expose this, given that IO reporting is automatically
> cleared as Telemetry itself shuts down?

Agreed.

> 
> ::: toolkit/components/telemetry/nsITelemetry.idl
> @@ +240,5 @@
> >     */
> >    void asyncFetchTelemetryData(in nsIFetchTelemetryDataCallback aCallback);
> > +
> > +  /**
> > +   * Get statistics of file IO reports, null, if not recorded.
> 
> Please describe the format of the returned object.

Fixed.

> 
> @@ +243,5 @@
> > +  /**
> > +   * Get statistics of file IO reports, null, if not recorded.
> > +   */
> > +  [implicit_jscontext]
> > +  readonly attribute jsval fileIOReports;
> 
> You need to update the UUID of the interface or you're gonna have a bad time.

Fixed.

> 
> ::: toolkit/xre/nsAppRunner.cpp
> @@ +3404,5 @@
> > +#else
> > +  {
> > +    const char* releaseChannel = NS_STRINGIFY(MOZ_UPDATE_CHANNEL);
> > +    if (strcmp(releaseChannel, "nightly") == 0 ||
> > +        strcmp(releaseChannel, "default") == 0) {
> 
> Why do we not want this turned on everywhere?  Why do we not want to enable
> it only when Telemetry.canReport or similar?

I consulted with Vladan about this and we agreed that we should probably restrict it temporarily until we know what (if any) side effects are encountered by enabling this stuff. We agreed that I'd file a follow-up bug to shift this to be conditional on Telemetry.canReport once this feature's robustness has been confirmed. Does that work for you?

> 
> @@ +3406,5 @@
> > +    const char* releaseChannel = NS_STRINGIFY(MOZ_UPDATE_CHANNEL);
> > +    if (strcmp(releaseChannel, "nightly") == 0 ||
> > +        strcmp(releaseChannel, "default") == 0) {
> > +      mozilla::Telemetry::InitIOReporting(gAppData->xreDirectory);
> > +    } 
> 
> Nit: trailing whitespace.

Fixed.
Attachment #8366340 - Attachment is obsolete: true
Attachment #8367459 - Flags: review?(nfroyd)
I moved the memory reporter changes to a separate patch as requested by Nathan.
Attachment #8367461 - Flags: review?(nfroyd)
Attachment #8367461 - Flags: review?(n.nethercote)
Comment on attachment 8364483 [details] [diff] [review]
P2B: Filename for PoisonIOInterposer

Review of attachment 8364483 [details] [diff] [review]:
-----------------------------------------------------------------

Nice

::: tools/profiler/IOInterposer.h
@@ +100,5 @@
>        return mReference;
>      }
>  
>      /** Request filename associated with the I/O operation, null if unknown */
> +    virtual const char16_t* Filename()

IMO This should really be a UTF8 encoded string instead of UTF16

::: xpcom/tests/windows/moz.build
@@ +5,5 @@
>  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>  
>  CPP_UNIT_TESTS += [
>      'TestCOM.cpp',
> +    'TestNtPathToDosPath.cpp',

Are these tests part of 'make check' or do we need a separate make target for that?
Attachment #8364483 - Flags: review?(bgirard) → review+
Comment on attachment 8367459 [details] [diff] [review]
P3: I/O Interpose Observer for Telemetry

Review of attachment 8367459 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the explanations.  Yes, please file a followup on the nsAppRunner bit.

::: toolkit/components/telemetry/Telemetry.cpp
@@ +297,5 @@
> +{
> +  nsAutoString xreDirPath;
> +  nsresult rv = aXreDir->GetPath(xreDirPath);
> +  if (NS_SUCCEEDED(rv)) {
> +    mSafeDirs.AppendElement(xreDirPath);

How about calling AddPath here?

@@ +305,5 @@
> +void TelemetryIOInterposeObserver::AddPath(const nsAString& aPath)
> +{
> +  mSafeDirs.AppendElement(aPath);
> +}
> + 

Nit: whitespace.

@@ +315,5 @@
> +  }
> +
> +  // Get the filename
> +  const char16_t* filename = aOb.Filename();
> + 

Nit: whitespace.

@@ +2877,5 @@
> +
> +  // Initialize IO interposing
> +  IOInterposer::Init();
> +  InitPoisonIOInterposer();
> + 

Nit: whitespace.

::: toolkit/components/telemetry/Telemetry.h
@@ +156,5 @@
>                              uint32_t delay);
>  
>  /**
> + * Initialize IO Reporting
> + * This will only record IO for files in the binary and user profile directories.

It looks like we're just initializing for aXreDir; the profile has to come later.  Fix the comment?
Attachment #8367459 - Flags: review?(nfroyd) → review+
Comment on attachment 8367461 [details] [diff] [review]
P4: Telemetry memory reporter changes

Review of attachment 8367461 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me, but njn has more experience here.

Oh, you'll probably want to update the commit comment; this patch isn't really for "Changes to telemetry I/O reporting".

::: toolkit/components/telemetry/Telemetry.cpp
@@ +2336,5 @@
>                                 JS::Handle<JSObject*> obj);
> +
> +  static size_t SizeOfFileIOEntryTypeExcludingThis(FileIOEntryType* aEntry,
> +                                                   mozilla::MallocSizeOf mallocSizeOf,
> +                                                   void* arg)

Nit: you can drop the |arg| name.

@@ +2338,5 @@
> +  static size_t SizeOfFileIOEntryTypeExcludingThis(FileIOEntryType* aEntry,
> +                                                   mozilla::MallocSizeOf mallocSizeOf,
> +                                                   void* arg)
> +  {
> +    return mallocSizeOf(aEntry) +

You shouldn't be measuring the size of the entry itself since this is an *ExcludingThis function.  The higher level hashtable code will take care of this.  (At least, I'm pretty sure that's how this works.)
Attachment #8367461 - Flags: review?(nfroyd) → review+
Comment on attachment 8367461 [details] [diff] [review]
P4: Telemetry memory reporter changes

Review of attachment 8367461 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good, but I'd like another view once the comments are addressed. Thanks.

::: toolkit/components/telemetry/Telemetry.cpp
@@ +2312,5 @@
> +   * Get size of hash table with file stats
> +   */
> +  size_t SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf) {
> +    return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
> +  }

Make this function |const|.

@@ +2314,5 @@
> +  size_t SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf) {
> +    return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
> +  }
> +
> +  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) {

Ditto.

@@ +2317,5 @@
> +
> +  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) {
> +    return mFileStats.SizeOfExcludingThis(SizeOfFileIOEntryTypeExcludingThis,
> +                                          aMallocSizeOf) +
> +           mSafeDirs.SizeOfExcludingThis(aMallocSizeOf);

mSafeDirs is a nsTArray<nsString>, and SizeOfExcludingThis() just measures the storage used by the array itself. You probably also want to measure the strings within the array. Do that by iterating over the elements and measuring them individually. I can't remember if nsString has any kind of SizeOf function... nsTSubstring_CharT has SizeOfExcludingThisMustBeUnshared() and SizeOfExcludingThisIfUnshared(), but I can't remember/work out if nsString inherits from these.

@@ +2338,5 @@
> +  static size_t SizeOfFileIOEntryTypeExcludingThis(FileIOEntryType* aEntry,
> +                                                   mozilla::MallocSizeOf mallocSizeOf,
> +                                                   void* arg)
> +  {
> +    return mallocSizeOf(aEntry) +

Nathan's right: don't measure the entry itself, just the things hanging off it.

@@ +2517,5 @@
> +  n += mSanitizedSQL.SizeOfExcludingThis(nullptr, aMallocSizeOf);
> +  n += mTrackedDBs.SizeOfExcludingThis(nullptr, aMallocSizeOf);
> +  n += mHangReports.SizeOfExcludingThis();
> +  n += mThreadHangStats.sizeOfExcludingThis(aMallocSizeOf);
> +  if (sTelemetryIOObserver) {

Move this |if| below the "It's a bit gross..." comment.
Attachment #8367461 - Flags: review?(n.nethercote) → feedback+
All comments addressed.
Attachment #8367461 - Attachment is obsolete: true
Attachment #8368216 - Flags: review?(n.nethercote)
Comment on attachment 8368216 [details] [diff] [review]
P4: Telemetry memory reporter changes

Review of attachment 8368216 [details] [diff] [review]:
-----------------------------------------------------------------

Lovely. Thanks!

::: toolkit/components/telemetry/Telemetry.cpp
@@ +286,5 @@
> +  }
> +
> +  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) const {
> +    size_t size;
> +    size = mFileStats.SizeOfExcludingThis(SizeOfFileIOEntryTypeExcludingThis,

Nit: Merge the declaration and assignment. Or you could assign 0 in the declaration, and then use +=; that's an idiom that's quite common in SizeOf* functions.
Attachment #8368216 - Flags: review?(n.nethercote) → review+
Rebased P2B patch, carrying forward r+
Attachment #8364483 - Attachment is obsolete: true
Attachment #8369007 - Flags: review+
Rebased P3, carrying forward r+
Attachment #8367459 - Attachment is obsolete: true
Attachment #8369009 - Flags: review+
Rebased P4, carrying forward r+
Attachment #8368216 - Attachment is obsolete: true
Attachment #8369010 - Flags: review+
(In reply to Benoit Girard (:BenWa) from comment #89)
> Comment on attachment 8364483 [details] [diff] [review]
> P2B: Filename for PoisonIOInterposer
> 
> Review of attachment 8364483 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Nice
> 
> ::: tools/profiler/IOInterposer.h
> @@ +100,5 @@
> >        return mReference;
> >      }
> >  
> >      /** Request filename associated with the I/O operation, null if unknown */
> > +    virtual const char16_t* Filename()
> 
> IMO This should really be a UTF8 encoded string instead of UTF16

My opinion differs, obviously, but I'm happy to discuss that as a follow-up.

> 
> ::: xpcom/tests/windows/moz.build
> @@ +5,5 @@
> >  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
> >  
> >  CPP_UNIT_TESTS += [
> >      'TestCOM.cpp',
> > +    'TestNtPathToDosPath.cpp',
> 
> Are these tests part of 'make check' or do we need a separate make target
> for that?

They are part of make check.
I pushed a trivial fixup for mingw:

https://hg.mozilla.org/integration/mozilla-inbound/rev/ba58374ea933

_wcsnicmp expects wchar_t* argument, which is not the same as char16_t on GCC.
Depends on: 973353
Depends on: 978583
Blocks: 983957
Blocks: 1030487
The fstat on every write call is showing up in profiles on OS X as one of the more expensive parts of compositing. Can we not do that?
Flags: needinfo?(aklotz)
Eww! I'll take a closer look at the OSX implementation in bug 1270571.
Flags: needinfo?(aklotz)
Depends on: 1330359
No longer depends on: 1330359
Depends on: 1524574
You need to log in before you can comment on or make changes to this bug.