Make it possible to extract main thread IO data

RESOLVED FIXED in mozilla31

Status

()

RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: Yoric, Assigned: aklotz)

Tracking

({main-thread-io})

unspecified
mozilla31
main-thread-io
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

I'm dreaming of considering main thread I/O just like we consider memory leaks, i.e. as unforgiven bugs that should lead to a TEST-UNEXPECTED-FAIL.

Since we already poison main thread writes, I assume that the best way is to extract this information and make it accessible to the test harness.

Vlad, do you agree with me?
(filed as Telemetry because it is my understanding that this is where the information is available – I may be wrong)
would the ideal situation be where at the end of each test we query the telemetry counters and output them if needed?  Possibly we could put it in do_test_finished (http://mxr.mozilla.org/mozilla-central/source/testing/xpcshell/head.js#949) to pull the telemetry counters and throw an assertion.

I suspect there is a cleaner way to do it, but it would be similar to this.
Flags: needinfo?(vdjeric)
Sounds good to me.
I suspect that we're going to end up with a huge number of true-but-not-due-to-the-current-test positives if/when we turn this on, though.
I bet a lot of false positives will crop up. We could set expected I/O or thresholds in the manifest files. Another way is to look for certain things instead of everything.
(Assignee)

Comment 5

5 years ago
Note that this would overlap quite a bit with the xperf tests in Talos. Once we widen the scope of what's covered by xperf beyond startup, you'd see the same thing.
Some information about main-thread I/O will be present in telemetry, but this is not the place to query it.
You can get information about I/O from various sources, such as Sqlite, NSPR and poisoned IO methods by
implementing `IOInterposeObserver` from `tools/profiler/IOInterposer.h`. The register an instance of
your observer with the IOInterposer and ensure that reporting components are initialized.
(Sqlite, NSPR and posioned IO reporting can safely be initialized more than once, but this must be done during startup).

From bug 902587:
Part 1:  Refactored `IOInterposeObserver` to be more suitable for multiple independent consumers,
Part 2A: Reports poisoned IO writes methods on Windows/OS X through `IOInterposeObserver`
         (and in unrelated news refactors late-write-checks to use `IOInterposeObserver`)
Part 2B: Report poisoned IO read/open/etc.. methods on Windows through `IOInterposeObserver`.
Part 2C: Adds filename support for observations reported by poison IO methods.
(Part 3 is telemetry consumption of this information, not relevant for this bug)

So far Part 1 and Part 2A have landed, Part 2B is r+'ed by might have bit rotten a bit in the wait :)
Either way, this bug shouldn't be blocked by 902587 once Part 2C lands.

@Yoric,
I'm sure there'll be lots of false positives, this thing will report all IO, including libraries being loaded.
So we could get false positives because specific windows version loads a library of different size, or something.
This could probably be mitigated by thresholds, and with filenames, it should be possible detect if a file located
in the profile folder, which I imagine is a red flag.

Note: Functions Mozilla(Un)?RegisterDebug(FD|File) from `PoisonIOInterposer.h` can be used to output debug logs without
setting of any alarms. Also useful for logging IO without causing infinite recursion :)
Flags: needinfo?(vdjeric)
I think we can start moving on this agian. We could use the IOInterposer + profiler infrastructure to capture main-thread IO stacks and compare them against a whitelist
(Assignee)

Updated

5 years ago
Assignee: nobody → aklotz
(Assignee)

Comment 8

4 years ago
Created attachment 8407844 [details] [diff] [review]
Main Thread I/O Logger for IOInterposer (rev 1)

This patch adds a new IOInterposer observer that runs a low-priority logging thread when a specific environment variable is set. As with other IOInterposer related code it avoids using classes that have leak checking etc since it runs until process termination.
Attachment #8407844 - Flags: review?(nfroyd)
Comment on attachment 8407844 [details] [diff] [review]
Main Thread I/O Logger for IOInterposer (rev 1)

Review of attachment 8407844 [details] [diff] [review]:
-----------------------------------------------------------------

Next time, please break things up into smaller, self-contained changes, something like:

1. ObservedOperationString change(s);
2. Adding the staging op change(s);
3. Main thread I/O observer.

::: xpcom/build/IOInterposer.cpp
@@ +413,5 @@
> +      return "close";
> +    case OpNextStage:
> +      return "NextStage";
> +    default:
> +      return nullptr;

I think returning non-null here will be less surprising and less deadly if something goes wrong.

::: xpcom/build/IOInterposerUtil.h
@@ +8,5 @@
> +#define xpcom_build_IOInterposerUtil_h
> +
> +/* This header file contains declarations for helper classes that are
> +   to be used exclusively by IOInterposer and its observers. This header
> +   file is not to be used by anything else and MUST NOT be exported! */

Let's call this file IOInterposerPrivate.h to make that doubly clear.

::: xpcom/build/MainThreadIOLogger.cpp
@@ +35,5 @@
> +    if (filename) {
> +      mFilename = filename;
> +    }
> +  }
> +  

Nit: whitespace.

@@ +63,5 @@
> +  TimeStamp             mLogStartTime;
> +  const char*           mFileName;
> +  PRThread*             mIOThread;
> +  IOInterposer::Monitor mMonitor;
> +  bool                  mShutdownRequired;

Please make this Atomic<bool>.

@@ +143,5 @@
> +      }
> +      // Pull events off the shared array onto a local one
> +      std::vector<ObservationWithStack> observationsToWrite;
> +      observationsToWrite.swap(mObservations);
> +      

Nit: whitespace.

@@ +153,5 @@
> +      while (i != observationsToWrite.end()) {
> +        if (i->mObservation.ObservedOperation() == OpNextStage) {
> +          PR_fprintf(fd, "%f,NEXT-STAGE\n",
> +                     (TimeStamp::Now() - mLogStartTime).ToMilliseconds());
> +          i = observationsToWrite.erase(i);

It seems like it would be significantly more efficient if you just did a standard |for| loop and then .erase()'d at the end.  I don't see how erasing in the middle of the loop is buying you anything.

@@ +179,5 @@
> +            //       (This will be added in a later bug)
> +            profiler_free_backtrace(stack);
> +          }
> +        }
> +        i = observationsToWrite.erase(i);

Here too.
Attachment #8407844 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 10

4 years ago
(In reply to Nathan Froyd (:froydnj) from comment #9)
> Comment on attachment 8407844 [details] [diff] [review]
> Main Thread I/O Logger for IOInterposer (rev 1)
> 
> Review of attachment 8407844 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Next time, please break things up into smaller, self-contained changes,
> something like:
> 
> 1. ObservedOperationString change(s);
> 2. Adding the staging op change(s);
> 3. Main thread I/O observer.

Will do.

> 
> ::: xpcom/build/IOInterposer.cpp
> @@ +413,5 @@
> > +      return "close";
> > +    case OpNextStage:
> > +      return "NextStage";
> > +    default:
> > +      return nullptr;
> 
> I think returning non-null here will be less surprising and less deadly if
> something goes wrong.

Replaced with "unknown".

> 
> ::: xpcom/build/IOInterposerUtil.h
> @@ +8,5 @@
> > +#define xpcom_build_IOInterposerUtil_h
> > +
> > +/* This header file contains declarations for helper classes that are
> > +   to be used exclusively by IOInterposer and its observers. This header
> > +   file is not to be used by anything else and MUST NOT be exported! */
> 
> Let's call this file IOInterposerPrivate.h to make that doubly clear.

Done.

> 
> ::: xpcom/build/MainThreadIOLogger.cpp
> @@ +35,5 @@
> > +    if (filename) {
> > +      mFilename = filename;
> > +    }
> > +  }
> > +  
> 
> Nit: whitespace.

Fixed.

> 
> @@ +63,5 @@
> > +  TimeStamp             mLogStartTime;
> > +  const char*           mFileName;
> > +  PRThread*             mIOThread;
> > +  IOInterposer::Monitor mMonitor;
> > +  bool                  mShutdownRequired;
> 
> Please make this Atomic<bool>.

Could you please clarify your reasoning for this? All accesses to mShutdownRequired occur while the lock is held.

> 
> @@ +143,5 @@
> > +      }
> > +      // Pull events off the shared array onto a local one
> > +      std::vector<ObservationWithStack> observationsToWrite;
> > +      observationsToWrite.swap(mObservations);
> > +      
> 
> Nit: whitespace.

Fixed.

> 
> @@ +153,5 @@
> > +      while (i != observationsToWrite.end()) {
> > +        if (i->mObservation.ObservedOperation() == OpNextStage) {
> > +          PR_fprintf(fd, "%f,NEXT-STAGE\n",
> > +                     (TimeStamp::Now() - mLogStartTime).ToMilliseconds());
> > +          i = observationsToWrite.erase(i);
> 
> It seems like it would be significantly more efficient if you just did a
> standard |for| loop and then .erase()'d at the end.  I don't see how erasing
> in the middle of the loop is buying you anything.

Fair enough.

> 
> @@ +179,5 @@
> > +            //       (This will be added in a later bug)
> > +            profiler_free_backtrace(stack);
> > +          }
> > +        }
> > +        i = observationsToWrite.erase(i);
> 
> Here too.
Flags: needinfo?(nfroyd)
(Assignee)

Comment 11

4 years ago
Created attachment 8409353 [details] [diff] [review]
Main Thread I/O Logger for IOInterposer (rev 2)

Patch updated with the exception of one remark which is needinfo?. Carrying forward r+.
Attachment #8407844 - Attachment is obsolete: true
Attachment #8409353 - Flags: review+
(In reply to Aaron Klotz [:aklotz] from comment #10)
> > @@ +63,5 @@
> > > +  TimeStamp             mLogStartTime;
> > > +  const char*           mFileName;
> > > +  PRThread*             mIOThread;
> > > +  IOInterposer::Monitor mMonitor;
> > > +  bool                  mShutdownRequired;
> > 
> > Please make this Atomic<bool>.
> 
> Could you please clarify your reasoning for this? All accesses to
> mShutdownRequired occur while the lock is held.

You're right!  Sorry about that.  Too many patches where people share variables between threads and assume things will be "ok" with Atomic<>.  Locks are great too, thanks.
Flags: needinfo?(nfroyd)
https://hg.mozilla.org/mozilla-central/rev/83c8cecf8297
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
(Assignee)

Updated

4 years ago
Blocks: 998794
Blocks: 1126790
You need to log in before you can comment on or make changes to this bug.