Closed Bug 716174 Opened 13 years ago Closed 11 years ago

nsLocalFile::CopyTo called on the main thread during startup

Categories

(Firefox :: Session Restore, defect)

12 Branch
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 762867

People

(Reporter: dmandelin, Unassigned)

References

Details

(Whiteboard: [Snappy:P2])

Attachments

(5 obsolete files)

A few minutes ago, I started Firefox, and got a pause during startup (while tabs from saved session were being loaded) of 3300 ms. My profiler says that most of that time was spent inside nsLocalFile::CopyTo:

sample 1955098613
    NtWaitForMultipleObjects
    WaitForMultipleObjectsEx
    WaitForMultipleObjects
    GetDurationFormatEx
    CopyFileExW
    CopyFileExW
    CopyFileExW
    nsLocalFile::CopySingleFile
    nsLocalFile::CopyMove
    nsLocalFile::CopyTo
    NS_InvokeByIndex_P
    XPCWrappedNative::CallMethod
    XPC_WN_CallMethod

XPC_WN_CallMethod is as deep as the stack trace gets (it couldn't figure out the rest, apparently), so I really don't know who was calling that. I do know that just before the end of the 3300 ms there was a stack involving sqlite and cookies. Here is the relevant snippet from the middle of that stack:

    sqlite3_exec
    mozilla::storage::Connection::initialize
    mozilla::storage::Service::OpenUnsharedDatabase
    nsCookieService::TryInitDB
    nsCookieService::InitDBStates
    nsCookieService::Init
    nsCookieService::GetSingleton
    nsICookieServiceConstructor
    mozilla::GenericFactory::CreateInstance
    nsComponentManagerImpl::CreateInstance
    nsComponentManagerImpl::GetService
    nsJSCID::GetService
    NS_InvokeByIndex_P
    XPCWrappedNative::CallMethod
    XPC_WN_CallMethod
    js::InvokeKernel
    js::Interpret
    js::RunScript
Dumping the JS stack would be useful here.
Assignee: nobody → ddahl
Attached patch v 1 Patch (obsolete) — Splinter Review
All tests pass, however, is this codepath tested? If not, I can include a test. I am wondering if it is tested as this code seems to execute only after a crash. Of course, you have 687 tests, so I assume it is tested.
Attachment #593225 - Flags: review?(paul)
It's rather inefficient for us to be doing our own I/O here when all we really need is a straight file copy. Bug 563742 would help.
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #4)
> It's rather inefficient for us to be doing our own I/O here when all we
> really need is a straight file copy. Bug 563742 would help.

Is the solution (for now) to wait for the JS File API to land?
As far as I can see that API is all synchronous, so how does it solve this problem?
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #4)
> It's rather inefficient for us to be doing our own I/O here when all we
> really need is a straight file copy. Bug 563742 would help.

What exactly is not efficient about it? Why should we wait on that? This is a trivial fix, and I don't understand why we would not take this improvement on the status quo.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> As far as I can see that API is all synchronous, so how does it solve this
> problem?

That new file API is designed to be usable from Chrome Workers. Maybe Gavin is suggesting sending the file info over to Chrome Worker and having it do the operations?
(In reply to David Dahl :ddahl from comment #3)
> All tests pass, however, is this codepath tested? If not, I can include a
> test. I am wondering if it is tested as this code seems to execute only
> after a crash. Of course, you have 687 tests, so I assume it is tested.

IIRC, this codepath is hit in any scenario in which we resume the session during browser startup.

Please check that there are mozmill tests that cover this. If not, file a bug for having one created. And if you haven't already, please smoke-test by force-killing Firefox and confirming the file is created, as well as testing with normal restoring of session (the "restore my windows and tabs" option in preferences).
OS: Windows 7 → All
Hardware: x86_64 → All
(In reply to Dietrich Ayala (:dietrich) from comment #7)
> What exactly is not efficient about it?

We're reading the file into memory, and then writing it out again, with inefficient XPCOM overhead. We don't need to do any of that to just copy a file on the filesystem - there are system calls (like CopyFileEx on Windows) that will asynchronously copy files directly, and much more efficiently.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> As far as I can see that API is all synchronous

:(

I really don't understand what that API is about, then. "Use it from workers" isn't really going to cut it. I hope I'm missing something.

We need an alternative to nsIFile that implements moving/copying files directly, and asynchronously.
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #10)
> We're reading the file into memory, and then writing it out again, with
> inefficient XPCOM overhead. We don't need to do any of that to just copy a
> file on the filesystem - there are system calls (like CopyFileEx on Windows)
> that will asynchronously copy files directly, and much more efficiently.

Hm, i thought we were reading the input stream directly into the output stream specifically to avoid reading the whole file into memory. Does asyncCopy not actually do that?
There is buffering, you're right. But the overhead is still significant compared to the current nsIFile CopyTo() implementations.

I don't think it's wise to re-write all calls to copyFile() the way this patch does - we should abstract file copying away (FileUtils.jsm?) so that we can easily replace it later with more efficient methods.
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #13)
> There is buffering, you're right. But the overhead is still significant
> compared to the current nsIFile CopyTo() implementations.

Is it worse than blocking the UI for the whole copy?

> I don't think it's wise to re-write all calls to copyFile() the way this
> patch does - we should abstract file copying away (FileUtils.jsm?) so that
> we can easily replace it later with more efficient methods.

I'm not clear on the conclusion.

Do you recommend that we move this into a FileUtils.jsm, or rather not make this change at all?
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #11)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> > As far as I can see that API is all synchronous
> 
> :(
> 
> I really don't understand what that API is about, then. "Use it from
> workers" isn't really going to cut it. I hope I'm missing something.

Yeah I don't really understand what's going on over there either.
IO is synchronous. You can run it in a worker if you want to be async and we'll provide APIs to do that.
(In reply to Dietrich Ayala (:dietrich) from comment #14)
> Is it worse than blocking the UI for the whole copy?

Of course not.

> Do you recommend that we move this into a FileUtils.jsm, or rather not make
> this change at all?

The former. Using a common copyFile helper somewhere convenient (FileUtils.jsm? nsIFile.jsm? I don't care exactly where), and replacing calls to nsIFile::copyTo with it, so that when we have a better alternative, we don't need to hunt down all the callers and change them again. We can do the same with move/delete/etc.
Comment on attachment 593225 [details] [diff] [review]
v 1 Patch

Clearing review based on the discussion. I'm liking FileUtils.jsm.

And then about tests... what Dietrich said. There aren't any mochitests for it, so check Litmus.
Attachment #593225 - Flags: review?(paul)
Attached patch v 2 Patching FileUtils as well (obsolete) — Splinter Review
Mak: I never see a sessionstore backup made when this code runs, and I see it run all the way through NetUtil.asyncCopy() - will my dump statements never make it to the console inside an async callback function? I always thought they would. /me confused
Attachment #593225 - Attachment is obsolete: true
Attachment #593911 - Flags: feedback?(mak77)
Comment on attachment 593911 [details] [diff] [review]
v 2 Patching FileUtils as well

I know there's a lot of trailing whitespace in there, but please don't fix it all! ( I blame emacs :) )
aCallback function would probably be more versatile than aObserverTopic.

Is there a reason to notify "sessionstore-backup-write-complete"? I'm not sure why anyone would care to observe this.
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #21)
> aCallback function would probably be more versatile than aObserverTopic.
> 
> Is there a reason to notify "sessionstore-backup-write-complete"? I'm not
> sure why anyone would care to observe this.

I was taking an example from mobile up a notch, but am also doubtful of the utility. I can imagine the callback being more useful. Great idea.
(In reply to Paul O'Shannessy [:zpao] from comment #20)
> Comment on attachment 593911 [details] [diff] [review]
> v 2 Patching FileUtils as well
> 
> I know there's a lot of trailing whitespace in there, but please don't fix
> it all! ( I blame emacs :) )

Yeah. I was not trying to do so, I should have peeked at the diff before uploading. Won't happen again sir. But, for all that is holy, make a patch to only remove the whitespace, please?;)
Attached patch v 3 Patch. Huh? wha? (obsolete) — Splinter Review
Everything seems to execute, no errors from NetUtil.asyncCopy and yet I never see the backup file materialize. wha wha?
Attachment #593911 - Attachment is obsolete: true
Attachment #593911 - Flags: feedback?(mak77)
Attachment #593984 - Flags: feedback?(paul)
Comment on attachment 593984 [details] [diff] [review]
v 3 Patch. Huh? wha?

Review of attachment 593984 [details] [diff] [review]:
-----------------------------------------------------------------

just drive-by comments

::: browser/components/sessionstore/src/nsSessionStore.js
@@ +157,5 @@
> +XPCOMUtils.defineLazyGetter(this, "FileUtils", function() {
> +  let scope = {};
> +  Cu.import("resource://gre/modules/FileUtils.jsm", scope);
> +  return scope.FileUtils;
> +});

defineLazyModuleGetter

::: toolkit/mozapps/shared/FileUtils.jsm
@@ +54,5 @@
> +XPCOMUtils.defineLazyGetter(this, "NetUtil", function (){
> +  let scope = {};
> +  Cu.import("resource://gre/modules/NetUtil.jsm", scope);
> +  return scope.NetUtil;
> +});

defineLazyModuleGetter

@@ +176,5 @@
> +   *
> +   * @param nsIFile aSourceFile
> +   * @param nsIFile aDestinationFile
> +   * @param function aCallback
> +   * @return void

if there is no return value, no reason to document it

@@ +185,5 @@
> +  {
> +    if (!(aSourceFile instanceof Ci.nsIFile &&
> +          aDestinationFile instanceof Ci.nsIFile)) {
> +      Cu.reportError("FileUtils.copyFile: Invalid argument");
> +    }

you should throw here probably, for sure interrupt the function rather than continuing

@@ +208,5 @@
> +      }
> +      else {
> +        Cu.reportError("FileUtils.copyFile failed");
> +      }
> +    });

maybe you want to callback regardless and pass rc to the callback, otherwise there's no way to detect failures.
Attached patch v 4 ball of wax patch (obsolete) — Splinter Review
Naturally, this behavior in this patch is now a mystery to me - I can create the .bak file but data is never written to it. I see an exception once in a while about a file input stream that loses data or something to that effect. Please enlighten me:)
Attachment #593984 - Attachment is obsolete: true
Attachment #593984 - Flags: feedback?(paul)
Attachment #594318 - Flags: feedback?(mak77)
mak: wil upload a new patch in a moment this one is cursed by whitespace
Attached patch v 4.1 ball of wax patch (obsolete) — Splinter Review
Better patch no whitespace:)
Attachment #594318 - Attachment is obsolete: true
Attachment #594318 - Flags: feedback?(mak77)
Attachment #594329 - Flags: feedback?(mak77)
(In reply to David Dahl :ddahl from comment #26)
> Naturally, this behavior in this patch is now a mystery to me - I can create
> the .bak file but data is never written to it. I see an exception once in a
> while about a file input stream that loses data or something to that effect.
> Please enlighten me:)

My best guess here is that we're replacing/overwriting sessionstore.js while you're reading from it. We call saveState early in SS. That would never have been an issue before since we were copying sync, but now both of these operations are async (using NetUtil.asyncCopy even) so who knows! If that's it, then we'd have to add a flag indicating have a pending save and need to do it in the callback.
Please see bug 724177, this part of code should no longer be a problem as long as that gets reviewed with an r+.  I didn't know about this bug until after I did that bug and fix by the way.
It might still be good to get it off the main thread during startup, but the savings will only be < 1ms now from the other bug.  I'm not 100% sure if you can do it on a non main thread in a safe way though as I'm not familiar with that part of the code.
(In reply to Brian R. Bondy [:bbondy] from comment #30)
> Please see bug 724177, this part of code should no longer be a problem as
> long as that gets reviewed with an r+.  I didn't know about this bug until
> after I did that bug and fix by the way.

Cool. That patch looks like it is Windows only.

I am not surprised as this bug's QA contact is general@fx and bug 724177 contact is core@xpcom

What does Gavin think?
Good point regarding Windows only, yup I can confirm this is in the Windows file code.  I haven't profiled this on other platforms so there may be big gains by this task on other platforms.
The nsLocalFileUnix implementation of copyTo() (only other one that matters, it's used for both Mac and Linux) looks relatively efficient, but it still blocks the main thread. There's still value in making this I/O non-blocking.
Assignee: ddahl → nobody
Component: General → Session Restore
QA Contact: general → session.restore
Assignee: nobody → ddahl
(In reply to Paul O'Shannessy [:zpao] from comment #29)
>  now both of these
> operations are async (using NetUtil.asyncCopy even) so who knows! If that's
> it, then we'd have to add a flag indicating have a pending save and need to
> do it in the callback.

Here the new Schedule API from Yoric would come at hand (bug 692420)
Blocks: 724177
Comment on attachment 594329 [details] [diff] [review]
v 4.1 ball of wax patch

Review of attachment 594329 [details] [diff] [review]:
-----------------------------------------------------------------

I don't have further feedback, you should check what Paul said regarding concurrent async operations though.

::: toolkit/mozapps/shared/test/unit/test_FileUtils.js
@@ +202,5 @@
> +
> +  let fileB = FileUtils.getFile("ProfD", ["copyB"]);
> +
> +  FileUtils.copyFile(fileA, fileB, function (rc) {
> +    do_check_true(rc === 0);

you should check Components.isSuccessCode(rc)
Attachment #594329 - Flags: feedback?(mak77) → feedback+
So after a bunch of debugging with zpao, it seems as though NetUtil.asyncCopy fails silently with a file-input-stream. It only works with a string-input-stream. The file silently part should maybe be fixed too. Any ideas fellow bug travelers?
rs:

Would you be able to peek at this patch? Not sure how to get the correct input stream for this copyFile method I am adding to FileUtils
Whiteboard: [Snappy] → [Snappy:P2]
David, what more needs to happen here? Figure out how to convert a file-input-stream to string-input-stream? Or be able to get a string-input-stream to begin with? Or change asyncCopy to accept file-input-stream as well?
Attachment #594329 - Flags: feedback?(robert.bugzilla)
(In reply to Dietrich Ayala (:dietrich) from comment #39)
> David, what more needs to happen here? Figure out how to convert a
> file-input-stream to string-input-stream? Or be able to get a
> string-input-stream to begin with? Or change asyncCopy to accept
> file-input-stream as well?

I forgot to flag rs for feedback. Whoops. 

Robert:

I was hoping you might know what the best path here might be? We fail silently most of the time with a file-input-stream passed to NetUtil.asyncCopy
Comment on attachment 594329 [details] [diff] [review]
v 4.1 ball of wax patch

I'm kind of buried with reviews atm. The only thing I saw that is of concern to me is that copyFile assumes that the permissions should be the default and it might be better to allow the caller to set them if they want to.
Attachment #594329 - Flags: feedback?(robert.bugzilla) → feedback+
Dave what needs to happen next here?
I think we need a new bug for adding fileInputStream support to NetUtil.asyncCopy - as it does not work that way and fails silently.
Depends on: 735926
Comment on attachment 594329 [details] [diff] [review]
v 4.1 ball of wax patch

I think it would be useful to check aSourceFile.isFile() in copyFile before aDestinationFile is created to avoid starting to copy a directory or other non-file.
This looks a lot like a subset of bug 532150.

Also, is there a good reason to not use OS.File.copy (which is off main thread and uses low-level system copy wherever available)?
(In reply to David Rajchenbach Teller [:Yoric] from comment #45)
> This looks a lot like a subset of bug 532150.

Reading of the file and copying it to a backup are two separate code paths. I don't really care how we split the work up exactly, but obviously the goal is getting rid of all of the sync/main-thread IO here :)

> Also, is there a good reason to not use OS.File.copy (which is off main
> thread and uses low-level system copy wherever available)?

No, this code just predates that.
Assignee: ddahl → nobody
Depends on: 532150
No longer depends on: 735926
Attachment #594329 - Attachment is obsolete: true
(In reply to David Rajchenbach Teller [:Yoric] <away until Jan 7th> from comment #47)
> This is probably solved by bug 532150.

Is this solved now?
I am almost sure that it is. I probably won't have time to test this week, but if someone wants to do this, be my guest.
Is there a function I can use to check whether I'm on the main thread or not?
NS_IsMainThread() should be good for that.
Bug 833286 appears to have already changed the backup of the session store file to an async operation.  This can probably be closed WFM (but I'm not clear on how I would actually verify this, given the relatively vague initial problem report).
Yeah, this has been resolved some time ago, by bug 762867 iirc.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: