Pre-open() and send the fd for app process's application.zip

RESOLVED FIXED in Firefox 21

Status

()

RESOLVED FIXED
6 years ago
2 years ago

People

(Reporter: cjones, Assigned: bent.mozilla)

Tracking

unspecified
mozilla21
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:-, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 wontfix, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix)

Details

Attachments

(2 attachments, 11 obsolete attachments)

57.91 KB, patch
jduell
: review+
Details | Diff | Splinter Review
693 bytes, patch
Details | Diff | Splinter Review
With bug 835681 "fixed" and bug 835679 partially fixed, we spend significant idle time in early startup in app processes, about 55 samples (~120ms).  We spend this idle time in an "async-sync wait loop" waiting for the application.zip to come back from the master process.

During this time, the master process is churning away at a lot of expensive work.

We can try to reduce the master-process busy time, but a win up-front (along with those other bugs) would be to pre-send an open()'d fd for an app's application.zip (if it has one).  We know we're going to need it "now" so we might as well eagerly open it and forward it along.
I can take this.
Assignee: nobody → bent.mozilla
blocking-b2g: --- → tef?
This might avoid bug 836198, incidentally.  But we should set things up that so we don't have to rely on timing (excluding degenerate cases).
Created attachment 708164 [details] [diff] [review]
Patch, v1

This is a first stab. It's tricky since I can't test on my build machine (Windows), but jduell's linux xpcshell tests for remoteopenfile are passing on tryserver (https://tbpl.mozilla.org/?tree=Try&rev=ebd26984691a) so I think this is close at least.
Attachment #708164 - Flags: review?(josh)
Attachment #708164 - Flags: review?(jduell.mcbugs)
blocking-b2g: tef? → tef+
Marking status-b2g18 and status-b2g18-v1.0.0 as affected, please update the status to fixed once this is verified landed on v1-train/mozilla-b2g18 and v1.0.0/mozilla-b2g18_v_1_0_0
status-b2g18: --- → affected
status-b2g18-v1.0.0: --- → affected
If we fadvise() properly on
 /system
 /system/b2g
 /system/b2g/webapps
 /data
 /data/local
 /data/local/webapps

is it valid to assume that open()ing any files from there will be basically free?

If so, that would let us simplify the work here a bit (and guarantee the optimization is more likely to kick in).
(In reply to Chris Jones [:cjones] [:warhammer] from comment #5)
> If we fadvise() properly on
>  /system
>  /system/b2g
>  /system/b2g/webapps
>  /data
>  /data/local
>  /data/local/webapps
> 
> is it valid to assume that open()ing any files from there will be basically
> free?

As long as they stay in memory, yes, but they will only stay there if there is no memory pressure.
Well, it depends when you fadvise()
Memory pressure meaning anything that starts throwing things out of pagecache?  If so, do you happen to know the algorithm for pagecache eviction?  The stuff in comment 6 will be a handful of relatively frequently used pages.
I think the kernel just evicts the least recent used pages from pagecache.
Note that if these files are actually used that often, then they will be in the pagecache except the first time they are accessed. Their i/o shouldn't get in the way on warm startup. If they do, then there's just too much memory pressure for all "interesting" files to live in the pagecache forever.
Created attachment 708576 [details] [diff] [review]
Patch, v1.1

Same basic patch but I fixed the things you saw yesterday. I can give you an interdiff if that helps.
Attachment #708164 - Attachment is obsolete: true
Attachment #708164 - Flags: review?(josh)
Attachment #708164 - Flags: review?(jduell.mcbugs)
Attachment #708576 - Flags: review?(jduell.mcbugs)
Comment on attachment 708576 [details] [diff] [review]
Patch, v1.1

Review of attachment 708576 [details] [diff] [review]:
-----------------------------------------------------------------

When I run with this patch on B2G desktop, after I get through FTU the desktop shows w/o icons, and it appears that the homescreen app is getting launched over and over.  I gdb'd one instance and it's failing at TabChild.cpp:1279, i.e. MOZ_ASSERT(!info->mCallback).

More generally, this patch feels at least twice as complicated as it needs to be.  I can't completely put my finger on it, but here's some ideas:

1) If we could determine on the child during app:// loads that the load is coming from the app's own application.zip (i.e. not a Webapps:manage app loading other apps' zip files), we would know that the parent is going to send an fd for us, and we could stop issuing a duplicate, racing request for the fd, which would eliminate a *lot* of the code complexity here.  I want us to think a little more about whether we can determine load-is-from-my-app:  in TabChild::GetCachedFileDescriptor, for instance, can't we simply compare the aPath requested with the path for our own application.zip (which we can construct from our app uuid: if we can't get the basename in the child to get a full pathname match, isn't it enough to strstr for the uuid?  Surely we don't allow same-uuid apps in different directories?)

2) The Refcounting is really hard to follow in places.  I suspect at least part of that is that we're storing addref'd objects (always a RemoteFileOpenChild) as a void * in mContext across some function calls.  As I comment below I think converting that to nsISupports* or just RemoteFileOpenChild* might make this a lot cleaner.

3) Currently we know that we're only getting 1 fd max from the parent, but this patch uses an nsTArray of autoptrs "just in case" we get more.  I suspect the idea is that this is infrastructure for future optimizations?  For a patch we're going to land on 1.0.0 I suspect we're better of keeping things as simple as possible for now.

::: dom/ipc/TabChild.cpp
@@ +125,5 @@
> +class TabChild::CachedFileDescriptorInfo
> +{
> +    struct PathOnlyComparatorHelper
> +    {
> +        bool Equals(const nsAutoPtr<CachedFileDescriptorInfo>& a,

you've got a bunch of ^M (I assume DOS-style line endings), for some reason only in the two CachedFileDescriptorInfo constructors

@@ +139,5 @@
> +                    const CachedFileDescriptorInfo& b) const
> +        {
> +            return a->mPath == b.mPath &&
> +                    a->mCallback == b.mCallback &&
> +                    a->mClosure == b.mClosure;

extra points if you pretty-indent.  Line 'em up!

@@ +147,5 @@
> +public:
> +    nsString mPath;
> +    FileDescriptor mFileDescriptor;
> +    CachedFileDescriptorCallback mCallback;
> +    void* mClosure;

If you change mClosure to a nsCOMPtr<nsISupports> I suspect following the refcounting will be easier in the rest of the code. (Since the only class that you actually use as an mClosure is RemoteOpenFileChild, you could even use a nsRefPtr to that?)

@@ +155,5 @@
> +      : mPath(aPath), mCallback(nullptr), mClosure(nullptr), mCanceled(false)
> +    { }
> +
> +    CachedFileDescriptorInfo(const nsAString& aPath,
> +                              const FileDescriptor& aFileDescriptor)

more pretty indent

@@ +162,5 @@
> +    { }
> +
> +    CachedFileDescriptorInfo(const nsAString& aPath,
> +                              CachedFileDescriptorCallback aCallback,
> +                              void* aClosure)

ditto

@@ +1206,5 @@
> +    MOZ_ASSERT(NS_IsMainThread());
> +    MOZ_ASSERT(!aPath.IsEmpty());
> +
> +    // aFileDescriptor may be invalid here, but the callback will choose how to
> +    // handle it.

Seems to me that aFileDesc can't be invalid here.  Both because we won't send the IPDL message if OpenNSPRFileDesc failed, and because (at least as of a few weeks ago: doubt it's changed) IPDL will actually *kill* the child process if you try to send it an invalid fd (it treats it as a serialization error: it's a bug, really).

@@ +1236,5 @@
> +        // Only close if this is a valid file descriptor.
> +        if (aFileDescriptor.IsValid()) {
> +            nsRefPtr<CloseFileRunnable> runnable =
> +                new CloseFileRunnable(aFileDescriptor);
> +            runnable->Dispatch();

nit: it reads cleaner if you just call Dispatch(new runnable), and it actually saves a few threadsafe (i.e expensive atomic) refcount operations.  Old code couldn't use this idiom because we had to check for new() failing.

::: dom/ipc/TabChild.h
@@ +321,5 @@
>      void GetAppType(nsAString& aAppType) const { aAppType = mAppType; }
>  
> +    typedef void (*CachedFileDescriptorCallback)(const nsAString& aPath,
> +                                                 const FileDescriptor& aFD,
> +                                                 void* aClosure);

I think aClosure should be nsISupports*, per my earlier comment, here and elsewhere in the patch.

@@ +452,5 @@
>      // that exceeds the gesture threshold doesn't happen.
>      CancelableTask* mTapHoldTimer;
> +    // At present only 1 of these is really expected.
> +    nsAutoTArray<nsAutoPtr<CachedFileDescriptorInfo>, 1>
> +        mCachedFileDescriptorInfos;

ditch array for now if we know we're only getting 1 fd?

::: dom/ipc/TabParent.cpp
@@ +66,5 @@
>  // The flags passed by the webProgress notifications are 16 bits shifted
>  // from the ones registered by webProgressListeners.
>  #define NOTIFY_FLAG_SHIFT 16
>  
> +namespace {

don't use anonymous namespace.  Give the namespace some sort of unique-ish name, else give the class one.  Speaking of which, "OpenFileRunnable" is not very descriptive.  How about "OpensFileAndSendsFdToChild"?

@@ +101,5 @@
> +    {
> +        MOZ_ASSERT(!mFD);
> +    }
> +
> +    NS_IMETHOD Run()

So this works because nsRunnable:Run is public, and the virtual call works even with a private Run in this class?  I forget my C++ rules here.  Just make it public?

@@ +118,5 @@
> +    void SendResponse()
> +    {
> +        MOZ_ASSERT(NS_IsMainThread());
> +        MOZ_ASSERT(mTabParent);
> +        MOZ_ASSERT(mEventTarget);

Looks like you can toss in MOZ_ASSERT(mFD) here too.

@@ +171,5 @@
> +
> +        MOZ_ASSERT(mFD);
> +
> +        PR_Close(mFD);
> +        mFD = nullptr;

mFD is an int on Unix/Mac, and even if nullptr compiles, 0 is usually stdin :)  I think you should just #ifdef WINDOZE ... nullptr  #else -1 here.  (That's what nspr does to define INVALID_HANDLE, but it's hidden in FileDescriptor.cpp so we can't use it).

@@ +175,5 @@
> +        mFD = nullptr;
> +    }
> +};
> +
> +} // anonymous namespace

replace with namespace name

@@ +340,5 @@
>      return true;
>  }
>  
>  void
>  TabParent::LoadURL(nsIURI* aURI)

I don't know the codepath(s) that kick off a load of application.zip resources in the child.  If you know for sure that this is the one spot to hook, I'll trust you, otherwise get another reviewer for the DOM/load pathway.

@@ +349,5 @@
>  
>      nsCString spec;
>      aURI->GetSpec(spec);
>  
> +    if (!mShown) {

You're now using aURI unconditionally if !mShown, while the old code checked it for null first.  Make sure that's safe or change to check null before GetSpec call

@@ +360,3 @@
>      unused << SendLoadURL(spec);
> +
> +    // If this app is a packaged app then we need to send over the file

s/we need to send/we can speed up startup by sending/

::: netwerk/ipc/RemoteOpenFileChild.cpp
@@ +129,5 @@
> +  }
> +
> +  if (mTabChild) {
> +    // Add one reference for the cached file descriptor callback.
> +    nsRefPtr<RemoteOpenFileChild> callbackReference = this;

You might as well just call AddRef(): it's the same thing as this nsRefPtr construct-then-forget, but more efficient and less confusing.  But still confusing:  better still would be to have GetCachedFD AddRef() things as needed using safe constructs like nsRefPtr.  Which it could do it you change its void *Context to an nsISupports* (or RemoteOpenFileChild*).

@@ +182,5 @@
> +  self->Done(aFD, /* aFromIPDL */ false);
> +}
> +
> +void
> +RemoteOpenFileChild::Done(const FileDescriptor& aFD, bool aFromIPDL)

There's got to be a better name than "Done".  Even something cheesy like DoNotifyListener() would be better.

The refcounting as things go into Done() via IPDL or not seems buggy to me, but maybe I'm just having trouble following it...

@@ +195,5 @@
> +    if (aFD.IsValid()) {
> +      nsRefPtr<CloseFileRunnable> runnable = new CloseFileRunnable(aFD);
> +      runnable->Dispatch();
> +    }
> +    return;

returning w/o releasing the addRef you did for non-IPDL pathway?

@@ +207,5 @@
> +  nsRefPtr<RemoteOpenFileChild> callbackReference;
> +
> +  if (tabChild) {
> +    // Absorb the extra reference we gave to the callback function.
> +    callbackReference = dont_AddRef(this);

Release()'ing whether aFromIPDL or not.  But AFAICT the IPDL path doesn't have the extra AddRef() (it does AddIPDLReference, but that's matched by the NeckoChild::DeallocPRemoteOpenFile call to ReleaseIPDLReference: so this Release seems like an extra one for the IPDL case?).

@@ +220,5 @@
> +
> +      tabChild->CancelCachedFileDescriptorCallback(
> +                                                  path,
> +                                                  CachedFileDescriptorCallback,
> +                                                  this);

Grossness that we could avoid if we don't have 2 fd messages racing...
Attachment #708576 - Flags: review?(jduell.mcbugs) → review-
(In reply to Jason Duell (:jduell) from comment #12)
> When I run with this patch on B2G desktop, after I get through FTU the
> desktop shows w/o icons, and it appears that the homescreen app is getting
> launched over and over.  I gdb'd one instance and it's failing at
> TabChild.cpp:1279, i.e. MOZ_ASSERT(!info->mCallback).

This is worrisome, but my patch has that assert on a different line. Are you sure you have the right patch applied? And that you did a full rebuild? I haven't been able to make this assertion fail with the most recent patch.

> 1) If we could determine on the child during app:// loads that the load is
> coming from the app's own application.zip (i.e. not a Webapps:manage app
> loading other apps' zip files), we would know that the parent is going to
> send an fd for us, and we could stop issuing a duplicate, racing request for
> the fd, which would eliminate a *lot* of the code complexity here.

I'm not sure... It would remove the need to cancel a pending callback, but that's really the end of the benefits, right? We'd still need all the rest, plus we'd have to add more brittle code to figure out when this was the app's own zip file.

> you've got a bunch of ^M (I assume DOS-style line endings), for some reason
> only in the two CachedFileDescriptorInfo constructors

Bah, MSVC likes doing that... I fixed the ones I found, let me know if I missed anything.

> extra points if you pretty-indent.  Line 'em up!

More editor fail (on linux I use Komodo)! It auto-aligned to the nearest 2-char line.

> If you change mClosure to a nsCOMPtr<nsISupports> I suspect following the
> refcounting will be easier in the rest of the code. (Since the only class
> that you actually use as an mClosure is RemoteOpenFileChild, you could even
> use a nsRefPtr to that?)

I changed to using nsISupports for now, though I really don't like requiring that. It does make the refcounting easier to follow now. That's about as specific a class as I'd like to go though.

The one big downside is that RemoteOpenFileChild multiply-inherits nsISupports and that makes a lot of extra casting necessary. I added a bunch of DEBUG stuff to make sure that the casts are doing the right things and I added some scary comments around the QI function.

Converting from a nsISupports to RemoteOpenFileChild is even scarier... I could add a private QI if you prefer that over the cast madness.

> Seems to me that aFileDesc can't be invalid here.  Both because we won't
> send the IPDL message if OpenNSPRFileDesc failed, and because (at least as
> of a few weeks ago: doubt it's changed) IPDL will actually *kill* the child
> process if you try to send it an invalid fd (it treats it as a serialization
> error: it's a bug, really).

This has changed, bug 831307. All file handles need to be inspected on the receiving side now.

> nit: it reads cleaner if you just call Dispatch(new runnable), and it
> actually saves a few threadsafe (i.e expensive atomic) refcount operations.

It's actually unsafe to do that since it violates one of COM's basic rules (that all in-params have a stable refcount). Depending on how the called function works it could lead to a leak or to a crash.

> ditch array for now if we know we're only getting 1 fd?

Well, we don't know that, really... There's now a simple IPDL message that can cache any number of these things if someone adds another caller. We could make it fail if we ever receive more than one but that seems kind of lame.

> don't use anonymous namespace.

I'll let the style gods work this out elsewhere. I just removed the namespace for now.

> How about "OpensFileAndSendsFdToChild"?

"OpenFileAndSendFDRunnable"?

> So this works because nsRunnable:Run is public, and the virtual call works
> even with a private Run in this class?  I forget my C++ rules here.  Just
> make it public?

Yep, when dealing with a concrete pointer to this thing you can only call Dispatch (though you can get around it by casting to nsIRunnable).

I changed it for you, but then I had to add a comment saying "please don't let anything call this" which seems less good. I like my method better, but you're the reviewer here.

> Looks like you can toss in MOZ_ASSERT(mFD) here too.

Yep!

> mFD is an int on Unix/Mac, and even if nullptr compiles, 0 is usually stdin
> :)  I think you should just #ifdef WINDOZE ... nullptr  #else -1 here. 
> (That's what nspr does to define INVALID_HANDLE, but it's hidden in
> FileDescriptor.cpp so we can't use it).

mFD is a PRFileDesc* here, it's either null or not.

> I don't know the codepath(s) that kick off a load of application.zip
> resources in the child.  If you know for sure that this is the one spot to
> hook, I'll trust you, otherwise get another reviewer for the DOM/load
> pathway.

Yep, I've looked all through it and this is the spot.

> You're now using aURI unconditionally if !mShown, while the old code checked
> it for null first.  Make sure that's safe or change to check null before
> GetSpec call

I made sure it was safe before, but now I've added an assertion just to make it clear.

> s/we need to send/we can speed up startup by sending/

Ok.

> There's got to be a better name than "Done".  Even something cheesy like
> DoNotifyListener() would be better.

I went with "HandleFileDescriptorAndNotifyListener".

> The refcounting as things go into Done() via IPDL or not seems buggy to me,
> but maybe I'm just having trouble following it...
> ...
> returning w/o releasing the addRef you did for non-IPDL pathway?
> ...
> Release()'ing whether aFromIPDL or not.  But AFAICT the IPDL path doesn't
> have the extra AddRef() (it does AddIPDLReference, but that's matched by the
> NeckoChild::DeallocPRemoteOpenFile call to ReleaseIPDLReference: so this
> Release seems like an extra one for the IPDL case?).

Yeah, this is all safe and correct as far as I can tell, though it's not really important now that I made the callback hold the ref automatically. Basically before we were always adding a ref if we had a TabChild, and then we were always removing that ref the first time we got the callback (either because we were being called from the callback or because we were about to cancel the callback).
Created attachment 709043 [details] [diff] [review]
Patch, v1.2
Attachment #708576 - Attachment is obsolete: true
Attachment #709043 - Flags: review?(jduell.mcbugs)
Comment on attachment 709043 [details] [diff] [review]
Patch, v1.2

Review of attachment 709043 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing review for now because we'll have another patch coming since we're seeing crash on b2g18.

I've confirmed that the crash on the b2g18 branch happens because we have 2 outstanding RemoteFileOpen requests to the same file (so we hit the assertion MOZ_ASSERT(!info->mCallback) in GetCachedFileDescriptor().  The file is 

  /profile/webapps/communications.gaiamobile.org/application.zip

and is opened twice by the HomeScreen app (perhaps on inbound it only gets opened once).

On IRC I was worrying about fixing in a way that would result in the child process keeping duplicate mmapped fd's to the same file if we allow 2 independent RemoteOpenFile's to proceed and complete.  But on looking at the code I think that's not an issue: nsJARChannel::OnRemoteFileOpenComplete calls CreateJarInput() which already checks the JAR cache, so we'll use the first mmap and the 2nd or following fd's that we passed from the parent will just get closed when mJarFile goes out of scope (we might want to close them to avoid keeping a useless fd around? We could check the JARCache for the file in OnRemoteFileOpenComplete and close the fd if its already cached).

If you're more ambitious we could stop the extra IPDL traffic by keeping some kind of RemoteOpenFileService that provides "dud" RemoteOpenFile channels (with some mDoNotDoIPDL flag or whatnot), keeps tabs on them, and then when the 1st, "real" RemoteOpenFileChannel for the path completes, it triggers the service to tell all the other dud channels to notify their nsIRemoteOpenFileListeners.  The way nsJARChannel::OnRemoteFileOpenComplete is written they'll all get JARCache hits as long as the 1st channel's listener is triggered first.  

> we could stop issuing a duplicate, racing requests for
> the fd, which would eliminate a *lot* of the code complexity here

OK, I won't be hard-line about this.  Implementor's choice (I always choose to avoid races when possible, but go for it).

> I changed to using nsISupports for now, though I really don't like requiring
> that. It does make the refcounting easier to follow...  The one big downside
> is that RemoteOpenFileChild multiply-inherits nsISupports and that makes a lot
> of extra casting necessary.

Would it be easier just to create a new 'CachedFileDescriptorListener' IDL and use that?  Avoids both void* and multiple inheritance of ISupports.  Passing around a callback function ptr and a context arg seems so old-school.

> It's actually unsafe to do that since it violates one of COM's basic rules
> (that all in-params have a stable refcount)

Thanks for letting me know.

> I changed it for you, but then I had to add a comment saying "please don't let
> anything call this" which seems less good. I like my method better, but you're
> the reviewer here.

Sooo tempting to exert my endless arbitrary power.  But go ahead and make it private.

>> The refcounting as things go into Done() via IPDL or not seems buggy to me
>
> this is all safe and correct as far as I can tell

OK, I'll look over the refcounting logic again when I see your next patch,  Perhaps all we need is a comment explaining how/why it's correct.

Everything else in your response seems fine.
Attachment #709043 - Flags: review?(jduell.mcbugs) → feedback+
Created attachment 709765 [details] [diff] [review]
Patch, v1.3

Ok, this addresses the multi-request from the child. I walked through it with jdm a little and he said it made sense to him! I've never been able to reproduce the multi-request though so please give it a whirl on your machine.

I also changed the callback function/closure to an interface :)
Attachment #709043 - Attachment is obsolete: true
Attachment #709044 - Attachment is obsolete: true
Attachment #709765 - Flags: review?(jduell.mcbugs)
Comment on attachment 709765 [details] [diff] [review]
Patch, v1.3

Oops, wrong patch.
Attachment #709765 - Attachment is obsolete: true
Attachment #709765 - Flags: review?(jduell.mcbugs)
Attachment #709766 - Attachment is obsolete: true
Created attachment 709767 [details] [diff] [review]
Patch, v1.3
Attachment #709767 - Flags: review?(jduell.mcbugs)
Comment on attachment 709767 [details] [diff] [review]
Patch, v1.3

Review of attachment 709767 [details] [diff] [review]:
-----------------------------------------------------------------

The logic in this patch is looking good to me now. I only have nits (and aside from perf issue, I'd +r with those fixed w/o another review needed).

Now for the bad news:  I'm not seeing a performance gain with this patch.  I measured a bunch of apps in an opt, variant=user build on unagi, using |Settings ... Developer options ... Show startup timings| to get numbers:

APP           b2g18    with patch
--------------------------------------
clock         1681     1980
Cost control  2130     2203
Camera        1681     2000
FM radio       977      772
Calendar      2017     1957

So, worse numbers often :(  Haven't had time to look into what's going on here (I traced though gdb and things seemed to be working as expected).  Would also be nice to see if these numbers are replicable: perhaps I'm doing something wrong.

::: dom/ipc/TabChild.cpp
@@ +170,5 @@
> +    PathAndCallbackComparatorHelper PathAndCallbackComparator() const
> +    {
> +        return PathAndCallbackComparatorHelper();
> +    }
> +    

nit: whitespace in empty line.

::: dom/ipc/TabParent.cpp
@@ +94,5 @@
> +    }
> +
> +    // This shouldn't be called directly except by the event loop. Use Dispatch
> +    // to start the sequence.
> +    NS_IMETHOD Run()

Feel free to make private.

@@ +359,5 @@
> +    // If this app is a packaged app then we can speed startup by sending over
> +    // the file descriptor for the "application.zip" file that it will
> +    // invariably request. Only do this once.
> +    if (!mAppPackageFileDescriptorSent) {
> +        mAppPackageFileDescriptorSent = true;

this conflicts with comment from variable declaration: 

  // This will only ever be true for packaged apps (i.e. GetOwnOrContainingApp()
  // returns an app with "app://" manifest URL).

personally I'd just ditch the comment rather than expend the cycles to check for app:// etc each time LoadURI is hit.

::: dom/ipc/nsICachedFileDescriptorListener.h
@@ +24,5 @@
> +
> +class NS_NO_VTABLE nsICachedFileDescriptorListener : public nsISupports
> +{
> +public:
> +  NS_DECLARE_STATIC_IID_ACCESSOR(NS_ICACHEDFILEDESCRIPTORLISTENER_IID)

I'm just going to assume you're following a well-known (but not by me :) recipe for declaring an XPCOM class here w/o using an IDL.  I don't understand all the NO_VTABLE, etc magic here.

What's the benefit of doing things this way versus just using a 2 line IDL file?

::: ipc/glue/FileDescriptorUtils.h
@@ +11,5 @@
> +
> +namespace mozilla {
> +namespace ipc {
> +
> +class CloseFileRunnable : public nsIRunnable

add comment above class name:

// When Dispatch() is called (from main thread) this class arranges to close the provided FileDescriptor on the socket transport thread (to avoid main thread I/O).

::: modules/libjar/nsJARChannel.cpp
@@ +364,5 @@
> +            mOpeningRemote = true;
> +
> +            if (gJarHandler->RemoteOpenFileInProgress(remoteFile, this)) {
> +                // Wait for the previous request to finish, then use the cached
> +                // file.

how about

// JarHandler will trigger OnRemoteFileOpen() after first request for this file completes,
// and we'll get a JAR cache hit.

@@ +935,5 @@
>  nsJARChannel::OnRemoteFileOpenComplete(nsresult aOpenStatus)
>  {
>      nsresult rv = aOpenStatus;
>  
>      if (NS_SUCCEEDED(rv)) {

See my comment in JarProtocolHandler.cpp:

  // NS_ERROR_ALREADY_OPENED here means we'll hit JAR cache in OpenLocalFile()
  if (NS_SUCCEEDED(rv) || rv == NS_ERROR_ALREADY_OPENED) {

::: modules/libjar/nsJARProtocolHandler.cpp
@@ +84,5 @@
> +        listeners->AppendElement(aListener);
> +        return true;
> +    }
> +
> +    mRemoteFileListeners.Put(aRemoteFile, new RemoteFileListenerArray());

comment here that we deliberately don't put the initial remoteFile in the listener array, since it's handled differently.

@@ +114,5 @@
> +    mRemoteFileListeners.Remove(aRemoteFile);
> +
> +    uint32_t count = listeners.Length();
> +    for (uint32_t index = 0; index < count; index++) {
> +        listeners[index]->OnRemoteFileOpenComplete(aStatus);

This works, but it violates the stated API for nsIRemoteOpenFileListener, which guarantees that OpenNSPRFileDesc() will succeed if status is NS_OK. 

Rather than complicate the API for nsIRemoteOpenFileListener, I think the best way forward here is something like this:

   // technically we must fail OnRemoteFileComplete since OpenNSPRFileDesc() won't 
   // succeed here.  So we've trained nsJARChannel to recognize ALREADY_OPENED 
   // in this case as "proceed to JAR cache hit."
   listeners[index]->OnRemoteFileComplete(NS_SUCCEEDED(aStatus) ? NS_ERROR_ALREADY_OPENED : aStatus)

If you find some other solution more pleasing, go for it. (a separate OnRemoteFileAlreadyOpen() method in the IDL could work, but it'd be ambiguous when it occurs unless we add some flag or something).

::: modules/libjar/nsJARProtocolHandler.h
@@ +51,4 @@
>  protected:
>      nsCOMPtr<nsIZipReaderCache> mJARCache;
>      nsCOMPtr<nsIMIMEService> mMimeService;
> +    nsClassHashtable<nsHashableHashKey, RemoteFileListenerArray>

comment: 

// holds lists of RemoteFileOpens (not including the 1st) that are requesting 
// the same file from parent.

::: netwerk/ipc/RemoteOpenFileChild.cpp
@@ +131,5 @@
> +
> +  if (mTabChild) {
> +    if (mTabChild->GetCachedFileDescriptor(path, this)) {
> +      // The file descriptor was already in the cache. No need to do anything
> +      // else here.

// The file descriptor was found in the cache, and OnCachedFileDescriptor() will be called with it.

@@ +198,5 @@
> +
> +  nsRefPtr<TabChild> tabChild;
> +  mTabChild.swap(tabChild);
> +
> +  // If there is a pending callback and we're being called from IPDL then we

"being called from RecvFileOpened" (see my comment in .h file)

::: netwerk/ipc/RemoteOpenFileChild.h
@@ +92,5 @@
> +  virtual void OnCachedFileDescriptor(const nsAString& aPath,
> +                                      const FileDescriptor& aFD) MOZ_OVERRIDE;
> +
> +  void HandleFileDescriptorAndNotifyListener(const FileDescriptor&,
> +                                             bool aFromIPDL);

aFromIPDL is confusingly named.  I was tracing this code in gdb and it's possible to call this function via a stack that leads back to an IPDL receipt, i.e.

  TabChild::RecvCacheFD -> (find child-init'd request) -> FireCallback
   -> RemoteOpenFileChild::OnCachedFd -> this function

There's really 3 paths to this function: 1) when fd is really already cached in child; 2) when parent-initiated fd arrives; and 3) when child-initiated fd arrives.  You only set aFromIPDL=true in case 3 (which is correct AFAICT), so name it something like "aFromRecvFileOpened".  Make sure to get rid of all refs to 'aFromIPDL' in patch (they're in inline comments in function calls)
Attachment #709767 - Flags: review?(jduell.mcbugs) → feedback+
Created attachment 710638 [details] [diff] [review]
Patch, v1.4

Comments addressed.
Attachment #709767 - Attachment is obsolete: true
Attachment #709768 - Attachment is obsolete: true
Attachment #710638 - Flags: review?(jduell.mcbugs)
(In reply to Jason Duell (:jduell) from comment #22)
> Now for the bad news:  I'm not seeing a performance gain with this patch.  I
> measured a bunch of apps in an opt, variant=user build on unagi, using
> |Settings ... Developer options ... Show startup timings| to get numbers:

This patch is only going to speed up cases where the parent is quite busy and the child is just launched (as stated in comment 0). Simulating such a state for manual testing is going to be very error-prone. Also, as I understand it, the method by which the app startup numbers are generated is currently in flux (because they didn't measure exactly what we thought) and the numbers have historically been pretty noisy. Without lots and lots of averaged data showing problems I don't think we should worry.

Chris should weigh in here of course but otherwise I think we're all good! :)
Flags: needinfo?(jones.chris.g)
(In reply to Jason Duell (:jduell) from comment #22)
> What's the benefit of doing things this way versus just using a 2 line IDL
> file?

Oh, forgot about this. Basically leaving this in C++ only keeps it out of JS entirely, doesn't add the interface to XPT files, etc.
The win here (when the directory is in pagecache, so no disk IO) is gained from filling idle time created by other patches in bug 834622.  The amount of win is right at the limit of what a stopwatch can measure, so the best way is to look at profiler results before/after.
Flags: needinfo?(jones.chris.g)
Comment on attachment 710639 [details] [diff] [review]
Interdiff, v1.3 -> v1.4

Review of attachment 710639 [details] [diff] [review]:
-----------------------------------------------------------------

OK, logically this should be a win, and if we see otherwise, we can always back out.

::: modules/libjar/nsJARProtocolHandler.cpp
@@ +123,3 @@
>      uint32_t count = listeners.Length();
>      for (uint32_t index = 0; index < count; index++) {
> +        listeners[index]->OnRemoteFileOpenComplete(status);

I knew I could count on you to take the tertiary operator out of the loop :)
Attachment #710639 - Flags: review+
Comment on attachment 710638 [details] [diff] [review]
Patch, v1.4

Whoops, +r'd the interdiff :)
Attachment #710638 - Flags: review?(jduell.mcbugs) → review+
https://hg.mozilla.org/releases/mozilla-b2g18/rev/1c21e59d64f3

I'm gonna let this bake for a day before landing on 1.0.0.
status-b2g18: affected → fixed
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/278f5973b2c8 for the Win debug crash in test_jarchannel_e10s.js in https://tbpl.mozilla.org/php/getParsedLog.php?id=19518213&tree=Mozilla-Inbound and https://tbpl.mozilla.org/php/getParsedLog.php?id=19519309&tree=Mozilla-Inbound, and the two of the same crash on the b2g18 push. I didn't back it out there, because that tree seems to have different rules for patches that cause permaorange.
bent: you've got a windoze box, right?  You're probably in a better place to debug the windows-only test_jarchannel_e10s.js than I, then.  I doubt it's anything serious--we don't even do real IPDL in the windows case.

Since the orange is windows only, I'm ok with letting this stay on b2g18--if others feel differently, speak up.
There are developer builds made for win32 on b2g18.  Please to be backing out.
Comment on attachment 711175 [details] [diff] [review]
patch to fix on windows

Yep, this looks great.
Attachment #711175 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/1a2e34c9ebc7
https://hg.mozilla.org/mozilla-central/rev/f21da1055c46
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0/rev/8c5ba56817a6
status-b2g18-v1.0.0: affected → fixed
status-firefox19: --- → wontfix
status-firefox20: --- → wontfix
status-firefox21: --- → fixed
Depends on: 839688
Backed out for causing app installs to break (bug 839688):

(ryanVm told me not to back out of inbound, as he'll merge m-c -> inbound soon) 

  https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0/rev/9822d3782c98

Mozilla-central had this split into 2 patches:

  https://hg.mozilla.org/mozilla-central/rev/31d07b2d8234
  https://hg.mozilla.org/mozilla-central/rev/876652a0c8ef

["hg backout" created two patches from my "hg backout" of the single patch that was on b2g18 for some reason: they're both copies of the original patch if you check with "hg diff -c", but if you go to the links here, you see the 2nd patch seems like a small subset of the first.  AFAICT everything is fine on the tree (the orig patch applies fine).  I guess this is what we get from using a source control system named "mercurial"]

  https://hg.mozilla.org/releases/mozilla-b2g18/rev/60454bfc40ac
  https://hg.mozilla.org/releases/mozilla-b2g18/rev/2f65effa5dab
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
status-b2g18: fixed → ---
status-b2g18-v1.0.0: fixed → ---
status-firefox21: fixed → ---
status-b2g18: --- → affected
status-b2g18-v1.0.0: --- → affected
status-firefox21: --- → affected
Target Milestone: mozilla21 → ---
Created attachment 712155 [details] [diff] [review]
Patch, v1.5

Added a .forget() that should fix the crash. Woops.
Attachment #710638 - Attachment is obsolete: true
Attachment #711175 - Attachment is obsolete: true
Attachment #712155 - Flags: review?(jduell.mcbugs)
Attachment #712155 - Attachment is patch: true
Comment on attachment 712155 [details] [diff] [review]
Patch, v1.5

Review of attachment 712155 [details] [diff] [review]:
-----------------------------------------------------------------

didn't need review IMO for one line crashfix, FYI
Attachment #712155 - Flags: review?(jduell.mcbugs) → review+
https://hg.mozilla.org/mozilla-central/rev/1646e649878a
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21

Updated

6 years ago
status-firefox21: affected → fixed
RyanVM, could I trouble you to back this out?  (I don't have a v1_0_0 tree.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backed out on all branches for causing bug 839256.
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf2650bd2e3b
https://hg.mozilla.org/releases/mozilla-b2g18/rev/b08b8af6b73e
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_0/rev/8622fb5eda00
status-b2g18: fixed → affected
status-b2g18-v1.0.0: fixed → affected
status-b2g18-v1.0.1: --- → affected
status-firefox21: fixed → affected
Target Milestone: mozilla21 → ---
Given the risk around landing here, we won't block on this (possible) perf win. We'll need to consider it for v1.0.1 outside of the blocking list.
blocking-b2g: tef+ → -
tracking-b2g18: --- → +
Ok, chatted with cjones about this, since neither jduell nor I can make this crash and the crash reports in the other bugs were kinda wonky we're going to try relanding this and see if problems crop up again.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d7996e9773fd
https://hg.mozilla.org/mozilla-central/rev/d7996e9773fd
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21

Updated

6 years ago
status-firefox21: affected → fixed
Depends on: 843433
bent/cjones:  do you still want to land this bug (and possibly 835575) on the b2g18 branch?  We need to either ask for a? on this bug, or clear the a+ on bug 835575 (since it depend on this).   My sense is we haven't seen enough perf improvement here to merit taking these upstream.
The patches that removed work that exposed the async-wait here (bug 839154, bug 835681,  bug 835679) either bounced or didn't get real fixes in time and won't in b2g18 timeframe, most likely.  With those, this brings pretty significant win, but without we just have more unnecessary work crowding the event loop.  So I don't think cost/benefit is right for uplift.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #54)
> The patches that removed work that exposed the async-wait here (bug 839154,
> bug 835681,  bug 835679) either bounced or didn't get real fixes in time and
> won't in b2g18 timeframe, most likely.  With those, this brings pretty
> significant win, but without we just have more unnecessary work crowding the
> event loop.  So I don't think cost/benefit is right for uplift.

Removing tracking+ then.
status-b2g18: affected → wontfix
status-b2g18-v1.0.0: affected → wontfix
status-b2g18-v1.0.1: affected → wontfix
tracking-b2g18: + → ---
See Also: → bug 1119692
You need to log in before you can comment on or make changes to this bug.