Remove main-thread I/O from password manager

VERIFIED FIXED in mozilla32

Status

()

defect
P1
normal
VERIFIED FIXED
6 years ago
3 years ago

People

(Reporter: Dolske, Assigned: Paolo)

Tracking

(Depends on 1 bug, Blocks 4 bugs, {dev-doc-needed})

unspecified
mozilla32
Points:
---
Dependency tree / graph
Bug Flags:
firefox-backlog +
in-testsuite +

Firefox Tracking Flags

(e10s-, firefox31-)

Details

(Whiteboard: [snappy:p1] p=13 s=it-32c-31a-30b.3 [qa!])

Attachments

(1 attachment, 19 obsolete attachments)

93.05 KB, patch
Details | Diff | Splinter Review
Password Manager currently uses SQLite (bug 288040) in the old "dumb and simple" way -- synchronous queries on the main thread. It's time to stop doing that. I'm not beholden to any particular fix, but here are some ideas:

1) Move to async DB queries. We could do this, but we really don't need the power of a full DB. The nsILoginManager interfaces would also need to be changed, because they're expecting an immediate return of data.

2) Jettison SQLite entirely. Instead use JSON as the file format, and keep the contents in memory to eliminate all read traffic after initialization.

  This is basically what password manager did pre-288040. Typical password storage should be fairly small, so this isn't likely to have significant memory usage (and is probably a net-win, compared to how much sqlite uses as a baseline). We'd want the flush-to-disk-on-update to happen off the main thread, of course.

  Note that this wouldn't need to change the nsILoginManager APIs. Things like addLogin() would still be "synchronous", but the actual flush to disk might not complete until a short time later. In fact, we might not even _start_ the flush for a short delay, so that bulk operations (like Sync importing a bunch of logins) wouldn't cause unneeded I/O.

3) Same as #2, but change the nsILoginManager API to be async too.

  Bug 566746 implements a nice API for form history, but that was an easy decision because the old form history "API" was crippled and unusable. The major downsides here are addon-compat (as well as our own API usage), and that synchronous APIs are generally simpler to use. The upside is consistency with the form history API, and possibly a better API for Sync to use (but a Sync rewrite is underway, so that's hazy). I suspect an async API might make things like the Master Password easier to deal with, and (further out) if we want to use OS-provides password storage facilities and async API would give us more flexibility there (eg, if the OS API is async, that's hard to use from a sync API. You have to spin the event loop).


I'd strongly lean towards #1 based on what I know today. It's well-contained (a rewrite of just storage-mozStorage.js), fixes the immediate need quickly, and allows deferring an API rewrite until later (or never).

A couple other observations:

* storage-Legacy.js can probably go away now. We don't need to import Firefox 3.0 profiles. (Although I vaguely recall this being used for something else...)

* We'll want to carefully look at migration to the new storage format, as well as future upgrade/downgrade changes.

* Bug 839961 is currently refactoring some pwmgr code, but is largely independent of this bug and should be complete shortly.
I'm going to dupe a couple of similar bugs to this on, they're older and have stale context that's not so useful now.
Duplicate of this bug: 657007
Duplicate of this bug: 600002
Posted patch work in progress patch (obsolete) — Splinter Review
This patch converts the storage api into a module that does its work asynchronously and uses a task. It has a wrapper that waits for the task to complete so that callers think it is operating synchronously. This is intended as a first step until the callers and tests can be made to work with the api asynchronously. It is unfinished but most unit tests and most parts of the mochitests work.
Depends on: 798844
Posted patch work in progress patch, v2 (obsolete) — Splinter Review
This version of the patch fixes all unit tests and all password manager mochitests save one (test_prompt_async.html). It only changes the database handling to be asynchronous, and as an intermediate step callers spin the event loop so they appear to be synchronous. Some other tests that expect events in certain orders fail because of this. This will be replaced of course, and my current thinking is:

1. Convert login manager interfaces to either take an optional callback, or create a second login manager interface/component that uses asynchronous calls.
2. Convert callers to use this.
3. Convert tests to use this.
4. Later, change to support a different storage model if needed.
 
Is this what was in mind by option 1 in the original comment?
Assignee: nobody → enndeakin
Attachment #729119 - Attachment is obsolete: true
Attachment #733146 - Flags: feedback?
Flags: needinfo?(dolske)
Attachment #733146 - Attachment is obsolete: true
Attachment #733146 - Flags: feedback?
Posted patch Part 2 - remove legacy tests (obsolete) — Splinter Review
This creates another module rather than changing the interface. I'm not convinced that it's the right approach though rather than just changing the interface or something else.
All of the unit tests now work asynchrnously
This patch is a mishmash of working and broken stuff. The issue here is that the tests rely on being able to:

1. add some test logins in an inline script before the load event fires
2. the DOMContentLoaded test fires and the login forms are filled in
3. run the tests

With the other patches, 1 now does not complete until 2 is already run and 2 does not complete until 3 runs. I adjusted some tests to load into a child frame and combined them into a single test.

Much of the rest is experimentation for the other tests. Look at your own risk.
OK, so this is what I have so far. If we go with this approach, the remaining work is to finish the main tests, and convert a few callers in browser/mobile/etc to use the new code.

Since I will be gone for a while, I am unassigning this bug from myself.
Assignee: enndeakin → nobody
Whiteboard: [snappy:p1]
Mano, can you post an update of your current status?
Assignee: nobody → mano
(In reply to Neil Deakin (not available until June 10) from comment #5)

> [...] It only changes the database
> handling to be asynchronous, and as an intermediate step callers spin the
> event loop so they appear to be synchronous. Some other tests that expect
> events in certain orders fail because of this. This will be replaced of
> course, and my current thinking is:
> 
> 1. Convert login manager interfaces to either take an optional callback, or
> create a second login manager interface/component that uses asynchronous
> calls.
> 2. Convert callers to use this.
> 3. Convert tests to use this.
> 4. Later, change to support a different storage model if needed.
>  
> Is this what was in mind by option 1 in the original comment?

Not really, the "option 1" I listed presumed changed the nsILoginManager APIs to be async as well. But if you put spinning the event loop on the table, I suppose it's close. :)

I certainly wouldn't want to have event loop spinning be a permanent solution. I'm somewhat less averse to it as an intermediate step, but it still worries me due to the nested event-loop issues it risks. In particular, I'd want to _carefully_ look at how this would interact with prompts (master password, http auth, and the ones used by TB/SM) since they can also be spinning the event loop, and that could make for some sad-times if things interact poorly.

I actually think it would be less painful/risky to do things in reverse-order:

 1. implement an additional "async" api, using the current nasty mainthread-IO
    and invoking the callbacks immediately / from a setTimeout(0).
 2. Start getting addons (and our own code!) ported to this API
 3. Remove the synchronous API
 4. Switch the backend to actually do async queries OMT.

Looks like Jetpack's API for passwords is already async (though I didn't look closely), so hopefully it can switch to the new new API here without any Jetpack compat issues. And so it'll only be an issue for apps and traditional add-ons...

This actually looks like a really big change for add-ons. :-( I get over 1000 hits on https://mxr.mozilla.org/addons/search?string=nsILoginManager (though the actual number of addons is maybe 5x less). So that's going to take some extra outreach to get developers to update incompatible addons.

It's tempting to consider "option 2" again -- retain the synchronous API, keep all the signon data in memory, and lazily flush to disk OMT...
Flags: needinfo?(dolske)
Setting realistic ETA given Dolske's input and more issue that are coming up as I'm working on this.

As I somewhat started from scratch yet again, I'll try to update this as partial patches so the review could start as I'm working on this.
Whiteboard: [snappy:p1] → [snappy:p1][ETA for working patch: 1/6. WIP: 25/5]
Status: NEW → ASSIGNED
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Whiteboard: [snappy:p1][ETA for working patch: 1/6. WIP: 25/5] → [snappy:p1][ETA for working patch: 1/6. WIP: 25/5, pseudo async api to land ahead of the implementaton]
(In reply to Justin Dolske [:Dolske] from comment #14)
>  1. implement an additional "async" api, using the current nasty
> mainthread-IO
>     and invoking the callbacks immediately / from a setTimeout(0).
>  2. Start getting addons (and our own code!) ported to this API
>  3. Remove the synchronous API
>  4. Switch the backend to actually do async queries OMT.

This is somewhat similar to what we're doing with the search service. There, we have maintained the synchronous API, but added an asynchronous init() method. To port consumers over to the async API, consumers need to call init, wait to be called back, and then proceed using the same (synchronous) API. In the mean time, there is a synchronous fallback path that is invoked if a consumer doesn't call init(). The goal is to over time migrate all consumers to calling init(), and then eventually get rid of the synchronous fallback path.

> It's tempting to consider "option 2" again -- retain the synchronous API,
> keep all the signon data in memory, and lazily flush to disk OMT...

Hmm, yeah. Should we get data about the average size of the password DB? May also be worth looking into the approach localStorage took. IIRC they try to eagerly load stuff into a memory cache OMT, but have a synchronous fallback if they weren't able to finish the load before the first "get". This might work well for pwmgr, assuming it isn't frequently accessed early during startup (?).
We should not duplicate localstorage preloading logic. A proper async approach should be safer.
(In reply to Taras Glek (:taras) from comment #17)
> We should not duplicate localstorage preloading logic. A proper async
> approach should be safer.

Sure, absent any compatibility constraints. It's not yet clear to me how important those are in this case. Probably not enough to be worth the extra complexity, I agree.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #16)

> Hmm, yeah. Should we get data about the average size of the password DB?

I'm pretty sure that must be floating around somewhere, but I don't know the exact answer. I vaguely recall that most users had < 5 entries. But I'd actually want to know what what the long tail looks like.


> IIRC they try to
> eagerly load stuff into a memory cache OMT, but have a synchronous fallback
> if they weren't able to finish the load before the first "get". This might
> work well for pwmgr, assuming it isn't frequently accessed early during
> startup (?).

Yeah, that's the downside of keeping a synchronous API with async load/flush -- something has to do the initial load, and it's sad times if that's not complete when the first sync API call comes in. 

The async init() + callback idea is clever, I suspect that would be easy to addons to adapt to. That could enable a gradual migration path like:

  1) Add async API, backend still sync
  2) Add async init() + require sync callers to use it, backend does lazy load and
     lazy flush.
  3) Remove sync API.

Sounds nice, but I think the details make step 2 are not so useful. It's essentially an entire backend rewrite to do lazy-flushed JSON -- least, I've not heard of a way to load a sqlite DB into memory and I don't know how it would get flushed to disk -- and so once you've got that I'm not sure what the point of moving to an async API is. Maybe if you know in the future you will want to do some hardcore DB stuff and will switch back to sqlite? Or all your data lives in the cloud?

It _would_ work as a single-shot change which rewrites the backend to use lazy-JSON, adds an async init(), keeps sync API (w/o an async API added). No further work needed, and it trades a major API change for a minor API flag day. And it would fix the mainthread I/O problem sooner as a result.

Tempting! Although I do think there's still merit to moving to an async API and so going with comment 14 will be less work overall, even if it takes longer.
Whiteboard: [snappy:p1][ETA for working patch: 1/6. WIP: 25/5, pseudo async api to land ahead of the implementaton] → [snappy:p1][ETA for working patch: 1-Jun. WIP: 25-May, pseudo async api to land ahead of the implementaton]
Hmm, didn't Places run into a problem as they were asyncifying involving a mix of sync and async queries? Anything that would affect a plan to have both an async API (issuing async queries) and a sync API (issuing sync queries) living together?

[I think the problem I'm remembering had something to do with shutdown -- maybe synchronous shutdown interacting poorly with async queries still running? Not sure.]
(In reply to Justin Dolske [:Dolske] from comment #20)
> Hmm, didn't Places run into a problem as they were asyncifying involving a
> mix of sync and async queries? Anything that would affect a plan to have
> both an async API (issuing async queries) and a sync API (issuing sync
> queries) living together?

There is contention between the synchronous and asynchronous Storage APIs, if that's what you mean -- they contend for the same mutex. Adding an async API and keeping the synchronous one around will most likely make performance *worse* by introducing contention.

(See, for context, Bug 686025 Comment 33.)
 
> [I think the problem I'm remembering had something to do with shutdown --
> maybe synchronous shutdown interacting poorly with async queries still
> running? Not sure.]

There are other problems around async queries running at shutdown, of course. The trivial solution of joining the storage thread to the main thread to block shutdown is insufficient for, well, non-trivial modules (i.e., those that need to do anything after the last query has run, particularly running another query). We solved this in FHR by just spinning the event loop, which turns out to be the least evil solution.
I will update the ETA etc one I get green light for this approach.

The attached patch should be (mostly) "landable". Once it lands we can go ahead and fix callers to use the new API, and, in parallel, replace the fake async implementation with neil's work.
Whiteboard: [snappy:p1][ETA for working patch: 1-Jun. WIP: 25-May, pseudo async api to land ahead of the implementaton] → [snappy:p1]
Dolske, any idea on when you could get to this patch? I don't need a detailed review here, just a sign off for the the plan and the APIs.
Comment on attachment 757361 [details] [diff] [review]
Async API + Promise-style wrapper + fake-async implementation

Review of attachment 757361 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/passwordmgr/AsyncLoginManager.js
@@ +47,5 @@
> +  classID: Components.ID("{73D4F1C1-093A-403E-8F3E-92DD43741D5B}"),
> +  QueryInterface: XPCOMUtils.generateQI([Ci.mozIAsyncLoginManager,
> +                                         Ci.nsIInterfaceRequestor]),
> +
> +  addLogin: function ALM_addLogin(aLogin, aCallbackObj) {

Don't need the function name (ALM_*) any more.

@@ +49,5 @@
> +                                         Ci.nsIInterfaceRequestor]),
> +
> +  addLogin: function ALM_addLogin(aLogin, aCallbackObj) {
> +    ExecuteAndCallCallbackFunction(
> +      Services.logins.addLogin.bind(Services.logins, aLogin),

Kind of an ugly convention but the scope and lifetime of this code is limited enough... (I know, famous last words.)

There's also the slightly exciting and temporary problem of what happens should an async caller modify the login between calltime and async execution time. Presumably the final, fully-async would immediately create and execute a statement, making this question moot.

@@ +145,5 @@
> +      aCallbackObj, true);
> +  }
> +};
> +
> +this.NSGetFactory = XPCOMUtils.generateNSGetFactory([AsyncLoginManager]);

Hmm. Not much benefit from having this be XPCOM anymore. Maybe just make it a JSM? I guess that's effectively collapsing this code into LoginManager.jsm?

::: toolkit/modules/LoginManager.jsm
@@ +44,5 @@
> +
> +for (let method of ["addLogin", "removeLogin", "modifyLogin", "removeAllLogins",
> +                    "getLoginSavingEnabled", "setLoginSavingEnabled", "anyLogins",
> +                    "fillForm"]) {
> +  // Careful here.. the for-decalared variable is not static.

bug 449811?

::: toolkit/modules/Services.jsm
@@ +41,4 @@
>    ["io", "@mozilla.org/network/io-service;1", "nsIIOService2"],
>    ["locale", "@mozilla.org/intl/nslocaleservice;1", "nsILocaleService"],
>    ["logins", "@mozilla.org/login-manager;1", "nsILoginManager"],
> +  ["asyncLogins", "@mozilla.org/async-login-manager;1", "mozIAsyncLoginManager"],

Kind of unfortunate that the nice new version gets the ugly name, but that's life. :)
Attachment #757361 - Flags: feedback?(dolske) → feedback+
Duplicate of this bug: 887195
So as it turns out there's only a single c++ consumer for the login manger, nsFormFillController, which uses just a single method, which is AutoCompleteSearch. For that method alone (which I left out of the new module, at least for now) we can create a "bridge" component until nsFormFillController is also converted to JS. We sure don't need the mozIAsyncLoginManager interface (and those callback interfaces) I introduced in the previous revision.
Attachment #757361 - Attachment is obsolete: true
Attachment #767832 - Flags: review?(dolske)
(In reply to Mano from comment #27)
> Created attachment 767832 [details] [diff] [review]
> Async API + Fake-async implementation
> 
> So as it turns out there's only a single c++ consumer for the login manger,
> nsFormFillController, which uses just a single method, which is
> AutoCompleteSearch. For that method alone (which I left out of the new
> module, at least for now) we can create a "bridge" component until
> nsFormFillController is also converted to JS.

Makes sense. That code shouldn't even really be in the login manager, there's no reason for it to be there (per comment above it). pwmgr should just have a separate JS XPCOM nsIAutoCompleteSearch implementation that uses the public LoginManager.jsm APIs, and we should make nsFormFillController use that.
Comment on attachment 767832 [details] [diff] [review]
Async API + Fake-async implementation

Review of attachment 767832 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with these very critical fixes. \o/

::: toolkit/modules/LoginManager.jsm
@@ +19,5 @@
> +        deferred.resolve(rv);
> +      }
> +      catch(ex) {
> +        deferred.reject(ex);
> +      } 

trailing whitespace!!!

@@ +43,5 @@
> +   *        The login to be removed (nsILoginInfo).
> +   * @return {Promise}
> +   */
> +  removeLogin: SimpleFakeAsyncLoginManagerFunction("removeLogin"),
> +  

spaces on empty line!!!
Attachment #767832 - Flags: review?(dolske) → review+
Blocks: 890708
No longer blocks: 890708
Duplicate of this bug: 890708
Attachment #736920 - Attachment is obsolete: true
Attachment #736917 - Attachment is obsolete: true
Attachment #736916 - Attachment is obsolete: true
Attachment #736915 - Attachment is obsolete: true
Attachment #736913 - Attachment is obsolete: true
Attachment #736912 - Attachment is obsolete: true
Mano, can we land attachment 767832 [details] [diff] [review]? It's just step 1, but it seems like it wouldn't hurt to get the new API added (and we can investigate changing some consumers in parallel, while the work to implement an alternate back-end is on hold).
Flags: needinfo?(mano)
Yes, will do so soon. I'll get back to this bug for real once we make some good progress on async places transaction, which are a higher priority right now.
Flags: needinfo?(mano)
Note that we're also working on sqlite for chrome workers. Would this be a good candidate?
FWIW Paolo is investigating converting this storage to json. I checked and I have 145 entries in my signons.sqlite DB, which takes 393,216 bytes on disk.

I then converted this data to a JSON representation, with all the columns from the SQL version, using the full name for the columns on every entry. The JSON file ended with a size of 75,793 bytes.
We'll probably able to reduce this even further once we have bug 854169.
I'd actually worry more about the in-memory usage.

But even if you take the numbers from comment 35 and double them (for UTF8-to-UCS2), and then double again (for some vague JS per-object runtime overhead that I don't know how to estimate), I suspect it's still less than SQLite's own overhead. And because SecretDecoderRing is dumb about how it emits encrypted strings (username/password), there are some potentially large reductions there.

I've got ~267 logins, and about:memory says sqlite is using 0.35MB for signons.sqlite.
True, very good point. I talked a bit with njn about it and filed a bug about getting an API to measure the in-memory usage of that, bug 932152.

For the interested, also filed bug 932156 for reporting it in about:memory.
Posted patch Plain conversion to JSON (obsolete) — Splinter Review
This preliminary patch tests the approach I discussed with Dolkse and Gavin.

The idea is to keep the nsILoginManager API synchronous, while adding an
asynchronous getter that ensures initialization (not present in this patch).
If a synchronous service getter like "Services.logins" is used and a method
on the interface is called, we fall back to synchronous initialization unless
we already initialized asynchronously (similarly to the search service).

This was quite easy to implement, and consumers don't need to be changed,
except that we should remove the code that accesses the database directly
to create a transaction in Sync:

http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/engines/passwords.js#83

This patch has just a field-by-field conversion for now, but I think we can
polish it more. For example, I don't think we need the "id" field anymore.
We can also refactor the code for clarity. In any case, this is expected to
work locally (note that it deletes the old database after import).

Import from the previous database is asynchronous and it does not have a
synchronous fallback. I think it should stay like that, as the only possible
side-effect is that we might not populate password fields during the first
session restore after the update.

We should still figure out how to handle all the possible cases of import
failures and application version rollback, and add more unit tests.
Assignee: mano → paolo.mozmail
Attachment #767832 - Attachment is obsolete: true
Depends on: 940408
This version includes some changes to make unit tests work, though this might not be the final way to handle testing. Some high-level feedback on the patch could be useful while I'm working on adding more tests.

It is still missing tests for the SQLite migration and corrupt database backup. Also, the Sync code dependent on the database connection still needs to be updated.
Attachment #825517 - Attachment is obsolete: true
Attachment #8342300 - Flags: feedback?(dolske)
In addition to Sync, an Android test is affected by the fact that there is no database anymore:

http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/tests/testPasswordProvider.java

I'm not sure what this Java test is checking exactly, how it can be changed or whether it can be removed.

Wes, how do you suggest to proceed here?
Flags: needinfo?(wjohnston)
Hmm... We're currently opening the database in java code using a small sqlite bridge we wrote to use our sqlite libraries. We'd have to rewrite that to access a JSON database instead. Fixable, but non-trivial. Also, we'll have two different places potentially concurrently writing to the same json file which scares me. I haven't looked at the patch. Does this support concurrent access?
Flags: needinfo?(wjohnston)
Paolo: you need to reimplement mobile/android/base/db/PasswordsProvider.java to allow for concurrent read-write to the same backend used by your new password manager impl.

Bear in mind that this will be read and written from a different process than Gecko, perhaps at the same time as Gecko is using it, and in other instances Gecko won't/can't be running.
I don't have the architectural context to understand the purpose of the PasswordsProvider here.

If concurrent access to the passwords data from a process different than Gecko is a requirement, I think we might need to limit the JSON conversion to Desktop only. Alternatively, in Desktop also, we could keep JSON data as a memory cache and write it to SQLite in the background, though this will reduce the efficiency and memory usage gains of serializing to JSON.
(In reply to :Paolo Amadini from comment #44)

> I don't have the architectural context to understand the purpose of the
> PasswordsProvider here.

There's an Android concept called a ContentProvider. It's an abstraction around storage, supporting querying, insert, update, delete.

This is what we use to allow Sync to talk to our data stores. Most of Fennec's data stores are native Java.

(Fennec's browser bits also talk to the content providers; it's not just Sync.)

PasswordsProvider is a ContentProvider that lives in a separate process (with its own loaded NSS and sqlite), which opens the passwords database for access.

 
> If concurrent access to the passwords data from a process different than
> Gecko is a requirement

Alas so, unless you can loosen some of the constraints that apply to password storage (e.g., not needing to use NSS for the crypto parts, and thus being able to run the CP in the same process).

Alternatively, we can alter the Gecko-side password access to talk via IPC to the ContentProvider process, so the only access is from that Java process.

At that point, we can ditch the DB ourselves -- we'd just be implementing the nsILoginManager API on top of Java calls, just as you've implemented it on top of SQLite.

That sounds like the best option, to be honest. Thoughts?


> I think we might need to limit the JSON conversion
> to Desktop only. Alternatively, in Desktop also, we could keep JSON data as
> a memory cache and write it to SQLite in the background, though this will
> reduce the efficiency and memory usage gains of serializing to JSON.

I definitely don't want to impede desktop perf improvements.
Comment on attachment 8342300 [details] [diff] [review]
Including some updates to unit tests

Review of attachment 8342300 [details] [diff] [review]:
-----------------------------------------------------------------

Just a high-level look; didn't really look at the new JSMs in depth, nor some of the mostly-mechanical changes to storage-mozStorage.

::: browser/base/content/browser.js
@@ +1101,5 @@
> +        Services.logins;
> +      } catch (ex) {
> +        Cu.reportError(ex);
> +      }
> +    }, 3000);

It makes me sad to have yet another delayed startup within _delayedStartup(). :( We should really figure out a better solution at some point.

It would also be interesting to add a simple telemetry probe to report how long after startup login manager is inited (either by this call or implicitly elsewhere). Because I suspect 3000ms is just a random guess -- it would be good to know (data!) if this can actually be much later, or if it's commonly created before this timeout even gets to fire.

::: toolkit/components/passwordmgr/LoginImport.jsm
@@ +89,5 @@
> +            let passwordField = row.getResultByName("passwordField");
> +            let encryptedUsername = row.getResultByName("encryptedUsername");
> +            let encryptedPassword = row.getResultByName("encryptedPassword");
> +
> +            // The "guid" field was introduced in schema version 2.

I guess you've already written it, but I think we should probably just drop support for older schema versions...

We added v.5. in bug 718817 (Firefox 13, 2/2012), v.4 in bug 465636 (Firefox 4, 3/2010), and v.3 in bug 316084 (Firefox 3.5+3.6, 3/2009)... I don't even care before that. :)

Of these, v.5 is the only one remotely plausible/interesting to continue supporting (ie, code to upgrade from v.4 to v.5). But it's not actually used on desktop (it was added for Fennec's Sync), and as it's just adding a moz_deleted_logins table, it's not actually relevant to the JSON migration anyway.

So, I'd add an explicit check for the schema version, and fail if it's < 4 (ie, no import, you get a fresh empty JSON version). And remove all the legacy format conversion stuff.

@@ +149,5 @@
> +            Cu.reportError("Error importing login: " + ex);
> +          }
> +        }
> +
> +        rows = yield connection.execute("SELECT * FROM moz_disabledHosts");

At risk of scope-creep... I wonder if this would be a good time to import this data to nsIPermissionManager, instead of continuing to have login manager track them.

It's always been kind of a special case (I suspect it predates the permission manager?), and it would be nice to unify it with all our other permissions.

@@ +173,5 @@
> +            let timeDeleted = row.getResultByName("timeDeleted");
> +
> +            this.store.data.deletedLogins.push({
> +              id: this.store.data.nextId++,
> +              guid: guid,

Hmm, not sure if this is right, but I think it is? Worth checking to make sure the Sync code only cares about the GUID.

I suppose there's also the question as to if the JSON storage should bother with the ID field at all, given we have a GUID. Probably not?

::: toolkit/components/passwordmgr/storage-mozStorage.js
@@ +2,1 @@
>  /* vim: set sw=4 ts=4 et lcs=trail\:.,tab\:>~ : */

This file really doesn't have anything to do with mozStorage (aka SQLite) any more -- I'd suggesting making an hg copy of it to storage-JSON.js, and then hg removing the now-unused storage-mozStorage.js. Or just an hg rename, not sure if there's any advantage one way or the other.

@@ +21,5 @@
> +                                  "resource://gre/modules/LoginImport.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "LoginStore",
> +                                  "resource://gre/modules/LoginStore.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "OS",
> +                                  "resource://gre/modules/osfile.jsm");

Is there actually any benefit to making these lazy? I'd expect that once this file is loaded, it's immediately going to use most/all of those.

@@ +105,5 @@
> +        dump("initWithFile: waiting for initialization\n");
> +
> +        while (!finished) {
> +            Services.tm.currentThread.processNextEvent(true);
> +        }

Need to look carefully about if this can get wedged by a master password prompt.

Can we just call ensureDataReady() here instead? (And kill off any existing task, if it's already running.)

@@ +223,5 @@
>              guid:                loginClone.guid,
>              encType:             encType,
> +            timeCreated:         new Date(loginClone.timeCreated).toJSON(),
> +            timeLastUsed:        new Date(loginClone.timeLastUsed).toJSON(),
> +            timePasswordChanged: new Date(loginClone.timePasswordChanged).toJSON(),

These should remain numeric timestamps, not Date objects.

@@ +248,5 @@
> +        let array = this._store.data.logins;
> +        for (let i = 0; i < array.length; i++) {
> +          if (array[i].id == idToDelete) {
> +            array.splice(i, 1);
> +            this._store.saveSoon();

I'm wary of having the caller needing to call saveSoon(). Can we use a getter/setter or JS proxy to make this happen automagically?

::: toolkit/components/passwordmgr/test/unit/test_storage_mozStorage_5.js
@@ +60,5 @@
>  // sanity check that the file copy worked
>  do_check_true(cfile.exists());
>  
> +// will init mozStorage module with corrupt database
> +storage = LoginTest.reloadStorage(OUTDIR, filename);

I don't know if this test is so directly useful any more. I suppose it's worth checking to see what happens if we try to migrate to the the new JSON backend with a corrupted DB, but the more useful test would be to see what happens if signons.json is corrupted.

::: toolkit/components/passwordmgr/test/unit/test_storage_mozStorage_migrate.js
@@ +16,5 @@
>  const CURRENT_SCHEMA = 5;
>  
>  function run_test() {
>  
> +return;

weak. ;-)
Attachment #8342300 - Flags: feedback?(dolske)
(In reply to Justin Dolske [:Dolske] from comment #46)

> > +            this.store.data.deletedLogins.push({
> > +              id: this.store.data.nextId++,
> > +              guid: guid,
> 
> Hmm, not sure if this is right, but I think it is? Worth checking to make
> sure the Sync code only cares about the GUID.

I think you're safe with just the GUID. Sync on Android does look at the ID, but only for logging. Sync on desktop never looks at the ID.
 

> This file really doesn't have anything to do with mozStorage (aka SQLite)
> any more -- I'd suggesting making an hg copy of it to storage-JSON.js, and
> then hg removing the now-unused storage-mozStorage.js. Or just an hg rename,
> not sure if there's any advantage one way or the other.

hg rename is exactly equivalent to hg copy && hg remove.


> Need to look carefully about if this can get wedged by a master password
> prompt.

Bear in mind that Sync will trigger a MP prompt shortly after the browser starts, and sadness sometimes ensues if two components do so at the same time... worth testing.


> I don't know if this test is so directly useful any more. I suppose it's
> worth checking to see what happens if we try to migrate to the the new JSON
> backend with a corrupted DB, but the more useful test would be to see what
> happens if signons.json is corrupted.

For the record (possibly academic at this point): we expect a greater incidence of corrupt DBs on Android, Because Android®.
(In reply to Richard Newman [:rnewman] from comment #45)
> PasswordsProvider is a ContentProvider that lives in a separate process
> (with its own loaded NSS and sqlite), which opens the passwords database for
> access.
> 
> > If concurrent access to the passwords data from a process different than
> > Gecko is a requirement
> 
> Alas so, unless you can loosen some of the constraints that apply to
> password storage (e.g., not needing to use NSS for the crypto parts, and
> thus being able to run the CP in the same process).

This is still unclear to me, I've searched the wiki for Android Sync but didn't find an overview of which processes are involved and at which time they run. You mentioned that Gecko might not be running even if the process containing the PasswordsProvider is active. The PasswordsProvider is used only by Sync, thus I assume the Sync process cannot load NSS and thus it uses IPC to talk to PasswordsProvider. But, at the time when the Sync process actually calls PasswordsProvider, is it possible that the Gecko process is not running?

> Alternatively, we can alter the Gecko-side password access to talk via IPC
> to the ContentProvider process, so the only access is from that Java process.
> 
> At that point, we can ditch the DB ourselves -- we'd just be implementing
> the nsILoginManager API on top of Java calls, just as you've implemented it
> on top of SQLite.

Using direct IPC is possible, but in practice it would mean making every single method of the Logins API asynchronous. But this is also true for the solution that uses SQLite queries off the main thread.

The only way to keep a synchronous API, and still allow concurrent access from different processes, would imply receiving update notifications in the Gecko process, re-querying the data from the storage in the background after an update notification, and merging the new data into the in-memory storage with synchronous access. Very complicated.

The only simple solution is that the Gecko process is the only one that accesses the storage, and other processes use IPC to query the Gecko process for the data. In that case, we could use any storage back-end in Gecko, including JSON. But this is not possible if Sync might run without the Gecko process, and the Gecko process might start or stop without notice while Sync is running.

That said, unless I'm missing a simpler solution for Android, I think we can move forward on Desktop with the JSON storage back-end for an immediate performance gain with no API change, and keep the main-thread SQLite storage back-end on Android until we implement some of the solutions above (which might or might not require a change to the Logins API).

More thoughts?
(In reply to :Paolo Amadini from comment #48)
> This is still unclear to me, I've searched the wiki for Android Sync but
> didn't find an overview of which processes are involved and at which time
> they run. You mentioned that Gecko might not be running even if the process
> containing the PasswordsProvider is active. The PasswordsProvider is used
> only by Sync, thus I assume the Sync process cannot load NSS and thus it
> uses IPC to talk to PasswordsProvider. But, at the time when the Sync
> process actually calls PasswordsProvider, is it possible that the Gecko
> process is not running?

Yes. Android handles the launch of Sync. We avoided the memory and time overhead of starting Gecko just to write to databases. Sync's can happen on Android while Gecko is or isn't running. Or Gecko can start in the middle of a sync. Just because Sync is running, doesn't necessarily mean Gecko is (or ever has) run.

> The only way to keep a synchronous API, and still allow concurrent access
> from different processes, would imply receiving update notifications in the

Or desktop continues to use sqlite and switches to async OMT api's. It sounds like we're going to try to have both, which scares me a bit. Can we at least make sure the toolkit mochitests are actually running on Android before we do that?

> That said, unless I'm missing a simpler solution for Android, I think we can
> move forward on Desktop with the JSON storage back-end for an immediate
> performance gain with no API change, and keep the main-thread SQLite storage
> back-end on Android until we implement some of the solutions above (which
> might or might not require a change to the Logins API).
> 
> More thoughts?

This isn't the only json db we have where concurrent access is a problem (webapps are also json, and can be accessed/written to from multiple app processes). I wonder if its worth pursuing a real json db solution.
(In reply to :Paolo Amadini from comment #48)

> This is still unclear to me, I've searched the wiki for Android Sync but
> didn't find an overview of which processes are involved and at which time
> they run.

This is Androidy, not Mozillaey, I'm afraid. You won't find much relevant to this particular issue in our docs.


> You mentioned that Gecko might not be running even if the process
> containing the PasswordsProvider is active. The PasswordsProvider is used
> only by Sync, thus I assume the Sync process cannot load NSS and thus it
> uses IPC to talk to PasswordsProvider. But, at the time when the Sync
> process actually calls PasswordsProvider, is it possible that the Gecko
> process is not running?

It's probably worth clarifying: Android controls which processes are running (directed by manifest hints). We do not `fork` anywhere. In fact, Android is in control of when activities run, when SyncAdapters are created, when ContentProviders are instantiated, etc. etc.

There are two processes on Android.

  org.mozilla.fennec runs a bunch of threads, including one for Gecko (if it's running), one for the UI (if it's running), and a bunch for Sync if it's running.

  org.mozilla.f3nn3c runs the PasswordsProvider; we note in the manifest that this needs to be a separate process, so Android starts it there.

If a sync is occurring, and the user is syncing passwords, then both processes are alive. Gecko might not be loaded, and there might be no Gecko thread.


> Using direct IPC is possible, but in practice it would mean making every
> single method of the Logins API asynchronous. But this is also true for the
> solution that uses SQLite queries off the main thread.

ContentProvider calls are synchronous, so the only thing we'd need to pay attention to is the calls into Java from nsILM, which could easily just call via JNI directly. (Java uses threads for concurrency, not callbacks and event loops, so synchronous calls are common.)


> The only simple solution is that the Gecko process is the only one that
> accesses the storage, and other processes use IPC to query the Gecko process
> for the data.

I think the reverse is equivalent: Gecko would make synchronous ContentProvider calls, just like the other consumers of the ContentProviders on Android. Android would take care of the IPC marshaling and the database lifecycle.

So long as we implement nsILoginManager{,Storage} correctly, we could do this today.


> In that case, we could use any storage back-end in Gecko,
> including JSON. But this is not possible if Sync might run without the Gecko
> process, and the Gecko process might start or stop without notice while Sync
> is running.

But the reverse is possible. This is why we settled on this architecture, by the way: fighting the grain of Android is very troublesome.
Thanks a lot for the explanations, this is much clearer now!

(In reply to Richard Newman [:rnewman] from comment #50)
> Gecko would make synchronous
> ContentProvider calls, just like the other consumers of the ContentProviders
> on Android. Android would take care of the IPC marshaling and the database
> lifecycle.
> 
> So long as we implement nsILoginManager{,Storage} correctly, we could do
> this today.

This sounds good! In practice, if I understand correctly, we could already make an nsILoginManagerStorage implementation that calls the PasswordsProvider (that uses synchronous SQLite), and that would not significantly change performance (because Gecko uses synchronous SQLite too). At that point, we could reimplement the PasswordsProvider using an in-memory store (physically in the org.mozilla.f3nn3c process) and a JSON back-end on disk, and that would improve Gecko performance and reduce jank.

This confirms that we need two nsILoginManagerStorage back-ends, one for Desktop and one for Android.
(In reply to :Paolo Amadini from comment #51)

> Thanks a lot for the explanations, this is much clearer now!

Sorry that it's so obtuse!

> This sounds good!

Conveniently enough, I asked Deb to start a project page for this yesterday :D

https://wiki.mozilla.org/Mobile/Projects/Java-side_replacement_for_nsILoginManager

This will also fix some of our Master Password woes (e.g., inability to sync while MP is enabled, and Bug 939900).
 
> This confirms that we need two nsILoginManagerStorage back-ends, one for
> Desktop and one for Android.

I'll file a bug.
(In reply to Justin Dolske [:Dolske] from comment #46)
> > +        rows = yield connection.execute("SELECT * FROM moz_disabledHosts");
> 
> At risk of scope-creep... I wonder if this would be a good time to import
> this data to nsIPermissionManager, instead of continuing to have login
> manager track them.

This is definitely the best time, if we want to do that. Questions:

- Will the nsIPermissionManager interface remain synchronous?
- Are we concerned by the fact we need to enumerate all permissions to get the list of disabled hosts for the nsILoginManager API?
- Does nsIPermissionManager distinguish "http://example.com" and "https://example.com"?
- nsILoginManager distinguishes "http://Example.com" and "http://example.com" (maybe incorrectly). I guess we are fine with changing the behavior of the API to consider it the same host?

> It would also be interesting to add a simple telemetry probe to report how
> long after startup login manager is inited (either by this call or
> implicitly elsewhere). Because I suspect 3000ms is just a random guess -- it
> would be good to know (data!) if this can actually be much later, or if it's
> commonly created before this timeout even gets to fire.

This is a good idea, I think I can add some telemetry in the next iteration.

> We added v.5. in bug 718817 (Firefox 13, 2/2012), v.4 in bug 465636 (Firefox
> 4, 3/2010), and v.3 in bug 316084 (Firefox 3.5+3.6, 3/2009)... I don't even
> care before that. :)
> 
> So, I'd add an explicit check for the schema version, and fail if it's < 4
> (ie, no import, you get a fresh empty JSON version). And remove all the
> legacy format conversion stuff.

I can do that. For now, I've only written a stub for the DownloadImport tests, importing from v2 just because we have that database in the test data folder.

Since we don't have a useful v5 test database, for the final version of the tests I can create a database on the fly like we do for the DownloadImport.jsm tests, and only test v4 and v5.

> > +            let timeDeleted = row.getResultByName("timeDeleted");

Since deletedLogins is only ever relevant for Android Sync, I removed it from the Desktop code.

> > +        while (!finished) {
> > +            Services.tm.currentThread.processNextEvent(true);
> > +        }
> 
> Need to look carefully about if this can get wedged by a master password
> prompt.

This is only ever used by regression tests, I've clarified that in the code.

> @@ +248,5 @@
> > +        let array = this._store.data.logins;
> > +        for (let i = 0; i < array.length; i++) {
> > +          if (array[i].id == idToDelete) {
> > +            array.splice(i, 1);
> > +            this._store.saveSoon();
> 
> I'm wary of having the caller needing to call saveSoon(). Can we use a
> getter/setter or JS proxy to make this happen automagically?

I couldn't find a better way that allows both synchronous and asynchronous callers to behave atomically, all other things I've tried (like calling a function with a Task callback) just seem to be more complicated. But suggestions are welcome!
Posted patch First working version (obsolete) — Splinter Review
Dolske, do you think you can do a more in-depth review here, or do you prefer the final review to be done by someone else?

The ID field of logins is not exposed and could be removed, but at the cost of changing the "storage-json.js" code significantly. We don't have very good regression tests for it, unfortunately. What do you think we should do?
Attachment #8342300 - Attachment is obsolete: true
Attachment #8344655 - Flags: feedback?(dolske)
Bug 947694 might be relevant here, or experience from this much bug be helpful there.
(In reply to Justin Dolske [:Dolske] from comment #55)
> Bug 947694 might be relevant here, or experience from this much bug be
> helpful there.

This depends on whether we would like to store the data in compressed form. I'm not sure about that, one of the advantages of JSON is that it is roughly searchable and editable for debugging. A smaller file size, on the other hand, might be a performance gain.
(In reply to :Paolo Amadini from comment #53)
> > At risk of scope-creep... I wonder if this would be a good time to import
> > this data to nsIPermissionManager, instead of continuing to have login
> > manager track them.
> 
> This is definitely the best time, if we want to do that. Questions:

I talked with Gavin and we observed there might be some future work to make nsIPermissionManager asynchronous. So, the idea is to avoid introducing a new, necessarily synchronous caller here. In the future, it will not be too much effort to set up a migration step if we want to move this to the permissions system.
(In reply to :Paolo Amadini from comment #53)

> > At risk of scope-creep... I wonder if this would be a good time to import
> > this data to nsIPermissionManager, instead of continuing to have login
> > manager track them.
> 
> This is definitely the best time, if we want to do that. Questions:
> 
> - Will the nsIPermissionManager interface remain synchronous?
...
> I talked with Gavin and we observed there might be some future work to make
> nsIPermissionManager asynchronous. So, the idea is to avoid introducing a
> new, necessarily synchronous caller here. In the future, it will not be too
> much effort to set up a migration step if we want to move this to the
> permissions system.

I'm not sure that matters much. This API is only used in a few places and we keep piling other new permissions into the existing permission manager. At least from a quick skim, the existing call sites wouldn't be that bad to make async.

> - Are we concerned by the fact we need to enumerate all permissions to get
> the list of disabled hosts for the nsILoginManager API?

Hmm? Oh, no, I was assuming these APIs would just go away entirely, in favor of the usual permission manager API. Although in the short term it might be easier to implement them on top of the permission manager, to minimize the code that has to be written to land this bug.

> - Does nsIPermissionManager distinguish "http://example.com" and
> "https://example.com"?

No. I think that's fine, it makes permissions more consistent.

> - nsILoginManager distinguishes "http://Example.com" and
> "http://example.com" (maybe incorrectly). I guess we are fine with changing
> the behavior of the API to consider it the same host?

That would be a bug in the current code. Oops. :/ [Can this actually happen? We don't appear to enforce it, but I don't know offhand if something in the normal callchains will already normalize it.]


> > > +        while (!finished) {
> > > +            Services.tm.currentThread.processNextEvent(true);
> > > +        }
> > 
> > Need to look carefully about if this can get wedged by a master password
> > prompt.
> 
> This is only ever used by regression tests, I've clarified that in the code.

Ah. I'll look in the code review, but I wonder if we can just eliminate this entirely (e.g., make the caller pass in a JS object or JSON string -- s/initFromFile/initFromJSON/). That might make testing easier in general, since you wouldn't need to create static data in a file... Hmm!
(In reply to :Paolo Amadini from comment #54)

> Dolske, do you think you can do a more in-depth review here, or do you
> prefer the final review to be done by someone else?

I'll do it.

> The ID field of logins is not exposed and could be removed, but at the cost
> of changing the "storage-json.js" code significantly. We don't have very
> good regression tests for it, unfortunately. What do you think we should do?

I don't feel too strongly about this -- it would be good to do, but if it's a PITA then there's no harm in keeping it.


(In reply to :Paolo Amadini from comment #56)
> (In reply to Justin Dolske [:Dolske] from comment #55)
> > Bug 947694 might be relevant here, or experience from this much bug be
> > helpful there.
> 
> This depends on whether we would like to store the data in compressed form.
> I'm not sure about that, one of the advantages of JSON is that it is roughly
> searchable and editable for debugging. A smaller file size, on the other
> hand, might be a performance gain.

I don't think retaining searching or editability are goals here. In fact, I'd say they're anti-goals. We had quite a few cases of people editing the old .txt file by hand an mangling it.

That said, bug 947694 doesn't even have a final patch yet. So we probably shouldn't block on it, but it would be good to make sure we can easily switch in the future. I mostly mention it as a heads-up.
Comment on attachment 8344655 [details] [diff] [review]
First working version

Review of attachment 8344655 [details] [diff] [review]:
-----------------------------------------------------------------

Still haven't gone through LoginStore.jsm in detail.

::: toolkit/components/passwordmgr/LoginImport.jsm
@@ +78,5 @@
> +    return Task.spawn(function task_DI_import() {
> +      let connection = yield Sqlite.openConnection({ path: this.path });
> +      try {
> +
> +        let rows = yield connection.execute("SELECT * FROM moz_logins");

Minor comment: It might help with readability/indentedness/testability to split this up into a "get everything from the DB" function and a "convert and save into JSON" function.

@@ +112,5 @@
> +                encType = Ci.nsILoginManagerCrypto.ENCTYPE_SDR;
> +              }
> +            }
> +
> +            // The time and count fields were introduced in schema version 4.

The older schema imports should go away.

@@ +141,5 @@
> +              guid: guid,
> +              encType: encType,
> +              timeCreated: new Date(timeCreated).toJSON(),
> +              timeLastUsed: new Date(timeLastUsed).toJSON(),
> +              timePasswordChanged: new Date(timePasswordChanged).toJSON(),

As noted before, these should stay integers, everywhere.

::: toolkit/components/passwordmgr/LoginStore.jsm
@@ +256,5 @@
> +      this.data.disabledHosts = [];
> +    }
> +
> +    // Indicate that the current version of the code has touched the file.
> +    this.data.version = 1;

This should be assigned from a global variable (|const DATA_VERSION = 1| or something like that).

::: toolkit/components/passwordmgr/storage-json.js
@@ +97,5 @@
>       * init
>       *
>       */
>      init : function () {
> +        this._initInternal(null);

Can we kill _initInternal with some ES5 cleverness here?

  init: function (aFileTestPath = null) { ... }

@@ +158,5 @@
> +                    // No need to wait for the file removal.
> +                    OS.File.remove(sqlitePath).then(null, Cu.reportError);
> +                }
> +
> +                Services.prefs.setBoolPref("signon.importedFromSqlite", true);

Can't delete the sqlite file yet. If we do, that means someone trying a build with this code and then downgrading will lose all their logins (because the old version can't read the JSON file).

We should do basically what the current code does -- try to load logins.json, and if it's not there try to import from signons.sqlite. [And if you don't have any logins, store an empty JSON file to prevent trying to import every time.]

In a few releases, we can nuke the old signon.sqlite (and maybe the import code too). See bug 717490 / bug 925101, but let's not wait quite that long. :) Maybe after the next ESR?

@@ +159,5 @@
> +                    OS.File.remove(sqlitePath).then(null, Cu.reportError);
> +                }
> +
> +                Services.prefs.setBoolPref("signon.importedFromSqlite", true);
> +            }.bind(this)).then(null, Cu.reportError);

Wonder if it would be useful to add a "logins-available" notification here, so that addons like a GMail-checker can politely wait for the async load to finish.

(I suppose that might also eliminate the need for the processNextEvent stuff for tests -- they could just init and wait for logins-available?)

@@ +389,5 @@
>  
>      /*
>       * getAllEncryptedLogins
>       *
> +     * Not implemented.

Mmmf. We should just remove this now.

@@ +439,4 @@
>  
> +        let conditions = [];
> +
> +        function match(row) {

Nit: row is kind of a weird name to use here. "login"?

@@ +440,5 @@
> +        let conditions = [];
> +
> +        function match(row) {
> +            for (let field in matchData) {
> +                let doesMatch = true;

|doesMatch| is unused.

@@ +447,5 @@
> +                    // Historical compatibility requires this special case
> +                    case "formSubmitURL":
> +                        if (value != null) {
> +                            if (row.formSubmitURL && row.formSubmitURL != value)
> +                            {

This isn't right when row.formSubmitURL is null.

Should be:

  case "formSubmitURL":
  if (value != null) {
    if (row.formSubmitURL != '' && row.formSubmitURL != value)
      return false;
    break;
  }

When this value is |null| in a login, it means "this login is not for use in forms" (eg, HTTP auth logins). Most form logins will have an actual URL here, but in some case's we didn't save it or don't know it, and so it will be empty-string, which is treated as a wildcard.

@@ +467,5 @@
> +                    case "timeLastUsed":
> +                    case "timePasswordChanged":
> +                    case "timesUsed":
> +                        if (value == null) {
> +                            if (row[field])

This isn't right (consider when row[field] is empty-string). I think all you need here is:

  ...
  case "foo":
  case "bar":
    if (value !== row[field])
      return false;

SQL needed an if/else to correctly match |null| values, but JS can do better.

@@ +473,5 @@
> +                                return false;
> +                            }
> +                        } else {
> +                            switch (field) {
> +                                case "id":

I don't understand why this extra else and switch is here.

@@ +657,5 @@
> +        };
> +        let matchData = { };
> +        for each (let field in ["hostname", "formSubmitURL", "httpRealm"])
> +            if (loginData[field] != '')
> +                matchData[field] = loginData[field];

I don't think this is correct, because formSubmitURL should be handled specially (per original code).
Attachment #8344655 - Flags: feedback?(dolske) → feedback-
(In reply to Justin Dolske [:Dolske] from comment #60)
> > +                    // Historical compatibility requires this special case
> > +                    case "formSubmitURL":
> 
> Most form logins will have an actual URL
> here, but in some case's we didn't save it or don't know it, and so it will
> be empty-string, which is treated as a wildcard.

Before looking into the other mostly mechanical fixes in detail, I'd like to verify if I understand this legacy database condition correctly.

In normal conditions, for the purpose of login autocomplete, HTML forms are identified using both the prePath of the URI on which they are located, and the prePath of the URI where the data will be submitted. This is represented by the hostname and formSubmitURL properties of the stored nsILoginInfo.

When a new login for use in forms is saved (after the user replies to the password prompt), it is always stored with both the hostname and the formSubmitURL (that will be equal to the hostname when the form has no "action" attribute).

When the same form is displayed again, the password is autocompleted. If there is another form on the same site that submits to a different site, it is considered a different form, so the password is not autocompleted, but a new password can be stored for the other form.

However, the login database might contain data for an nsILoginInfo that has a valid hostname, but an empty formSubmitURL. This means that the login applies to all forms on the site, regardless of where they submit data to.

A site can have at most one such login, and in case it is present, then it is not possible to store separate logins for forms on the same site that submit data to different sites.

The only way to have such condition is to be using logins that were initially saved by a very old version of Firefox (or because of data manually added by an extension). Changing the password does not alter this condition. If the stored password is deleted and then entered again, then the normal condition is restored.

Logins stored in this way are still subject to the same security issues as those noted in bug 360493.

I suppose that ideally we should migrate those old logins so that they are stored together with a valid formSubmitURL. The issue is that we're not sure about which formSubmitURL to use. We have two alternatives:

1. Just copy the formSubmitURL from the hostname on migration, assuming that the form submits to the same site. In case we are wrong and the form submits to a different site, password autocomplete may disappear from a form on which it previously worked. The actual password is not really lost, because it can still be recovered by reading it from the Preferences dialog.

2. Populate the formSubmitURL the first time the password is autocompleted, using the actual value from the "action" attribute of the form. This allows us to restore normal behavior for the site, though we should still need to handle the legacy case until all the sites for which we stored passwords have been viewed. I guess we'll eventually do #1 in any case, but we'll have migrated most of the passwords at that time.

Is this correct? What do you think about the possible solutions?
Posted file Test data (obsolete) —
In the meantime, I've created some more current test data, representing the normal use cases. The legacy case of the empty formSubmitURL will be tested separately.

I'll use these for some new regression tests and in particular for the tests of database migration. I'll programmatically attach the metadata like creation time and number of uses before generating the source SQLite file.

I've not included data that doesn't make sense or cannot be created by the current autocomplete and protocol authentication code.

Are the cases exhaustive, or is there some other case I should consider?
Attachment #8348111 - Flags: feedback?(dolske)
Posted patch Work in progress (obsolete) — Splinter Review
I'm in the process of rewriting most of the regression tests so that they work on nsILoginManager instead of nsILoginManagerStorage, and check the expected behavior of the nsILoginManager methods. Now the only test that works directly on nsILoginManagerStorage is the one for the Android-only SQLite update and recovery.

This will allow a more thorough rewriting of the JSON module, as well as testing the consistency of the migration to nsIPermissionManager for the disabled hosts. The latter is a separate process that should be done at the nsILoginManager level, independently from which nsILoginManagerStorage module is active in the browser. In fact, in case a custom storage module is active, data for disabled hosts should be migrated away from it, but migration of login data from SQLite should not happen.
Attachment #8344655 - Attachment is obsolete: true
Depends on: 956332
Depends on: 956550
Whiteboard: [snappy:p1] → [snappy:p1] p=0
Duplicate of this bug: 975988
Duplicate of this bug: 988289
Mass tracking-e10s flag change. Filter bugmail on "2be0fcce-e36a-4e2c-aa80-0e3d33eb5406".
tracking-e10s: --- → +
Posted patch Updated work in progress (obsolete) — Splinter Review
Attachment #8348743 - Attachment is obsolete: true
I'll try to get this to completion in the next 15 days. In case blockers are found for which this is not possible, I'll file them separately for the next Firefox Desktop Iteration.
Flags: firefox-backlog?
Whiteboard: [snappy:p1] p=0 → [snappy:p1] p=13
Comment on attachment 8348111 [details]
Test data

Clearing this old feedback request since it's been handled during the review of bug 956332.
Attachment #8348111 - Flags: feedback?(dolske)
Flags: firefox-backlog? → firefox-backlog+
Whiteboard: [snappy:p1] p=13 → [snappy:p1] p=13 s=it-31c-30a-29b.3 [qa?]
No longer blocks: fxdesktopbacklog
Florin, please make sure this gets verified in this iteration. We'll need to make sure the changes here don't regress legacy Password Manager functionality.
QA Contact: florin.mezei
Whiteboard: [snappy:p1] p=13 s=it-31c-30a-29b.3 [qa?] → [snappy:p1] p=13 s=it-31c-30a-29b.3 [qa+]
QA Contact: florin.mezei → andrei.vaida
(In reply to Justin Dolske [:Dolske] from comment #46)
> I guess you've already written it, but I think we should probably just drop
> support for older schema versions...
> 
> v.5 in bug 718817 (Firefox 13, 2/2012)
> v.4 in bug 465636 (Firefox 4, 3/2010)
> v.3 in bug 316084 (Firefox 3.5+3.6, 3/2009)

It seems from the recent statistics that Benjamin collected that there are a number of users still running Firefox 3.6. While the work in bug 928173 is not going to affect them, I still think that we can facilitate updates from 3.6 by keeping passwords data. It doesn't cost us any effort, since we already have the code and test written, and we can remove the code when the upgrade campaign is finished.

I agree that older versions can be dropped from the upgrade code.
Posted patch The patch (obsolete) — Splinter Review
This version is ready for a detailed review.
Attachment #8348111 - Attachment is obsolete: true
Attachment #8406142 - Attachment is obsolete: true
Attachment #8408276 - Flags: review?(dolske)
The two storage backends share the same structure, so this comparison to see what has actually changed in the JSON backend can be useful, even if some common code has been moved to LoginHeper.jsm so both files are shorter now.

(Note that handling this new JSON backend as a Mercurial copy of the SQLite backend makes patch management and rebasing difficult, since the source file has been modified as well and MQ doesn't handle this case very well).
I started a tryserver build of the latest patch combined with all its dependencies:

https://tbpl.mozilla.org/?tree=Try&rev=71cf5f02c78c
Comment on attachment 8408276 [details] [diff] [review]
The patch

Review of attachment 8408276 [details] [diff] [review]:
-----------------------------------------------------------------

I think I've reviewed everything but storage-json.js and LoginStore.jsm. Posting feedback now since I'm done for the night.

::: services/sync/modules/engines/passwords.js
@@ -169,5 @@
> -
> -    return Utils.runInTransaction(this.DBConnection, function() {
> -      return Store.prototype.applyIncomingBatch.call(this, records);
> -    }, this);
> -  },

A sync peer should review this. Seems surprising this can just be excised.

::: toolkit/components/passwordmgr/LoginHelper.jsm
@@ +26,5 @@
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> +Cu.import("resource://gre/modules/Task.jsm");
> +
> +XPCOMUtils.defineLazyModuleGetter(this, "NetUtil",
> +                                  "resource://gre/modules/NetUtil.jsm");

Task and NetUtil both seem unused. Remove?

::: toolkit/components/passwordmgr/LoginImport.jsm
@@ +74,5 @@
> +    if (this.store.data.logins.length > 0 ||
> +        this.store.data.disabledHosts.length > 0) {
> +      throw new Error("Unable to import saved passwords because some data " +
> +                      "has already been imported or saved.");
> +    }

Hmm, I think there's a race here?

The check here is fine (aiui storage-json will have always loaded any JSON data before attempting an import). But a caller triggering the synchronous path would seem to able to potentially run before the migration.

EG, consider a user upgrading to the Firefox version with this patch, who has a addon that calls findLogin() early on in startup (e.g. a Gmail checker, or something like Sync). If it wins the race, it might think there's no stored login, and either fail or prompt the user to create an account. Both of which would be a "oh fuuuuuuu" moment for the user. :)

Or is this not actually possible?

@@ +118,5 @@
> +            timeCreated = Date.now();
> +            timeLastUsed = Date.now();
> +            timePasswordChanged = Date.now();
> +            timesUsed = 1;
> +          }

This isn't quite right; one can have a DB with schema v.3 but also have these columns (ie, someone who downgraded).

That's why the old code queries for logins where any of these fields are null.

Also, tiny nit: Date.now() isn't guaranteed to return the same value on each of these 3 calls. Assign it to a temporary first, to ensure all 3 are identical.

@@ +121,5 @@
> +            timesUsed = 1;
> +          }
> +
> +          // Data is stored directly in a serializable form, thus dates are
> +          // converted to JSON strings when storing them in memory.

I... don't understand what this comment means. It can probably just be removed, since I'm not sure what needs explaining.

@@ +156,5 @@
> +          });
> +        } catch (ex) {
> +          Cu.reportError("Error importing disabled host: " + ex);
> +        }
> +      }

I don't think it really matters, but maybe this should first fetch data from both sqlite tables, and then add it to the JSON storage? That would help ensure there's no |yield| in the middle, and so the storage would always be in a consistent state (either empty or containing all the imported data).

In practice the data in the two tables are unrelated to each other, but is probably good practice.

::: toolkit/components/passwordmgr/passwordmgr.manifest
@@ +10,5 @@
>  component {8c2023b9-175c-477e-9761-44ae7b549756} storage-mozStorage.js
>  contract @mozilla.org/login-manager/storage/mozStorage;1 {8c2023b9-175c-477e-9761-44ae7b549756}
> +#else
> +component {c00c432d-a0c9-46d7-bef6-9c45b4d07341} storage-json.js
> +contract @mozilla.org/login-manager/storage/mozStorage;1 {c00c432d-a0c9-46d7-bef6-9c45b4d07341}

json != mozStorage (aka sqlite). These are completely different implementations, and shouldn't have the same name.

contract @mozilla.org/login-manager/storage/json;1

You'll need to fiddle a couple other places in the tree to ensure the right one is used.

::: toolkit/components/passwordmgr/storage-mozStorage.js
@@ +17,3 @@
>  
> +XPCOMUtils.defineLazyModuleGetter(this, "LoginHelper",
> +                                  "resource://gre/modules/LoginHelper.jsm");

This doesn't need to be lazy, since it appears to always be immediately used.

@@ -73,5 @@
> -        if (!this.__crypto)
> -            this.__crypto = Cc["@mozilla.org/login-manager/crypto/SDR;1"].
> -                            getService(Ci.nsILoginManagerCrypto);
> -        return this.__crypto;
> -    },

Why is this being moved to LoginHelper? This change has nothing to do with making things async, and is driveby churn. Please undo the LoginHelper.crypto stuff.
Comment on attachment 8408276 [details] [diff] [review]
The patch

Richard, since the new JSON back-end of the Login Manager in Firefox for Desktop does not use transactions anymore, do you think the related code in "services/sync/modules/engines/passwords.js" can be safely removed?
Attachment #8408276 - Flags: review?(rnewman)
(In reply to Justin Dolske [:Dolske] from comment #75)
> ::: toolkit/components/passwordmgr/LoginImport.jsm
> @@ +74,5 @@
> > +    if (this.store.data.logins.length > 0 ||
> > +        this.store.data.disabledHosts.length > 0) {
> > +      throw new Error("Unable to import saved passwords because some data " +
> > +                      "has already been imported or saved.");
> > +    }
> 
> Hmm, I think there's a race here?
> 
> The check here is fine (aiui storage-json will have always loaded any JSON
> data before attempting an import). But a caller triggering the synchronous
> path would seem to able to potentially run before the migration.

The latter is always true, independently from this check.

> EG, consider a user upgrading to the Firefox version with this patch, who
> has a addon that calls findLogin() early on in startup (e.g. a Gmail
> checker, or something like Sync). If it wins the race, it might think
> there's no stored login, and either fail or prompt the user to create an
> account. Both of which would be a "oh fuuuuuuu" moment for the user. :)

In fact, the import process always completes asynchronously, so this case is always possible at the moment of the upgrade, as originally noted in comment 39. We might want to warn add-on authors and recommend that they wait on initializationPromise before making any Login Manager calls.

A possible race involving the import check above may happen if an add-on adds a login before the import process starts, but I think this is rare enough, if it will ever happen at all.

> > +          // Data is stored directly in a serializable form, thus dates are
> > +          // converted to JSON strings when storing them in memory.
> 
> I... don't understand what this comment means. It can probably just be
> removed, since I'm not sure what needs explaining.

Leftover from previous implementation, thanks for catching it!

> ::: toolkit/components/passwordmgr/storage-mozStorage.js
> @@ +17,3 @@
> >  
> > +XPCOMUtils.defineLazyModuleGetter(this, "LoginHelper",
> > +                                  "resource://gre/modules/LoginHelper.jsm");
> 
> This doesn't need to be lazy, since it appears to always be immediately used.

It's now only used in addLogin and modifyLogin. I can still make it eagerly-loaded, but it seems that we should just eagerly load all modules if that is the preferred style.
(In reply to :Paolo Amadini from comment #77)
> In fact, the import process always completes asynchronously, so this case is
> always possible at the moment of the upgrade, as originally noted in comment 39.

Actually, this might be an argument for pushing records to the "logins" array as soon as they are read, so that they can immediately be used, instead of waiting to write them all together at the end.
Comment on attachment 8408276 [details] [diff] [review]
The patch

Review of attachment 8408276 [details] [diff] [review]:
-----------------------------------------------------------------

Review of Sync bits: yes, this is fine, with one assumption.

PasswordStore established a transaction prior to performing pwmgr operations primarily for performance (no need to sequentially perform dozens of transactions), and secondarily to get consistency (both an isolated view of the DB during record application, and either exclusive writing or an explicit failure on commit race, assuming that there was no API-level caching happening).

The assumption in my "yes" is that the API doesn't have a similar construct now -- "I'm inserting/updating/deleting a bunch of passwords, here ya go" -- which would preserve some of those performance and safety desires. I'm guessing that's a safe assumption because we previously donned gloves and reached into the guts to do this, but please correct me if I'm wrong!

The usual caveat applies -- please make sure ./mach xpcshell services/sync/tests/unit passes! It has a rudimentary password store test that should help smoke-test this. Beyond that, QA.

::: services/sync/modules/engines/passwords.js
@@ -166,5 @@
> -    if (!this.DBConnection) {
> -      return Store.prototype.applyIncomingBatch.call(this, records);
> -    }
> -
> -    return Utils.runInTransaction(this.DBConnection, function() {

Utils.runInTransaction can also be deleted. It's only used here.

@@ +160,3 @@
>    applyIncoming: function applyIncoming(record) {
>      Store.prototype.applyIncoming.call(this, record);
>      this._sleep(0); // Yield back to main thread after synchronous operation.

Depending on what you've changed, this sleep (and thus the whole method override) might no longer be necessary.
Attachment #8408276 - Flags: review?(rnewman) → review+
Whiteboard: [snappy:p1] p=13 s=it-31c-30a-29b.3 [qa+] → [snappy:p1] p=13 s=it-32c-31a-30b.1 [qa+]
(In reply to :Paolo Amadini from comment #77)

> > EG, consider a user upgrading to the Firefox version with this patch, who
> > has a addon that calls findLogin() early on in startup (e.g. a Gmail
> > checker, or something like Sync). If it wins the race, it might think
> > there's no stored login, and either fail or prompt the user to create an
> > account. Both of which would be a "oh fuuuuuuu" moment for the user. :)
> 
> In fact, the import process always completes asynchronously, so this case is
> always possible at the moment of the upgrade, as originally noted in comment
> 39. We might want to warn add-on authors and recommend that they wait on
> initializationPromise before making any Login Manager calls.
> 
> A possible race involving the import check above may happen if an add-on
> adds a login before the import process starts, but I think this is rare
> enough, if it will ever happen at all.

I'd have preferred to just leave the import synchronous, which would have made fixing this trivial (ie, both cases would just synchronously import once).

I'm still wary of this causing addon problems, but I don't know if it's worth redoing again at this point.

> > ::: toolkit/components/passwordmgr/storage-mozStorage.js
> > @@ +17,3 @@
> > >  
> > > +XPCOMUtils.defineLazyModuleGetter(this, "LoginHelper",
> > > +                                  "resource://gre/modules/LoginHelper.jsm");
> > 
> > This doesn't need to be lazy, since it appears to always be immediately used.
> 
> It's now only used in addLogin and modifyLogin.

Ah, right, my comment no longer applies with the |LoginHelper.crypto| removal.
Comment on attachment 8408276 [details] [diff] [review]
The patch

Review of attachment 8408276 [details] [diff] [review]:
-----------------------------------------------------------------

kerplop.

::: toolkit/components/passwordmgr/LoginStore.jsm
@@ +217,5 @@
> +      try {
> +        let json = Cc["@mozilla.org/dom/json;1"].createInstance(Ci.nsIJSON);
> +        this.data = json.decodeFromStream(inputStream,
> +                                          inputStream.available());
> +      } finally {

So, the async load() handles charset conversion explicitly (whatever the default is, I assume UTF-8?).

But I don't see any charset handling here. IIRC all this stream stuff just silently discards the high byte of the UCS2 data in JS strings, and that causes problems. See bug 451155 and bug 454708.

Please either explain why that's not a problem here, or add conversion to this path.

It would nice if this and open() could share code through helpers, but looks like that would be infeasible because of the interwoven asyncness.

::: toolkit/components/passwordmgr/storage-json.js
@@ +240,5 @@
> +        let [encUsername, encPassword, encType] = this._encryptLogin(newLogin);
> +
> +        let array = this._store.data.logins;
> +        for (let i = 0; i < array.length; i++) {
> +            if (array[i].id == idToModify) {

|array| is an overly generic name, and I wouldn't assume that the constant |array[i]| dereferencing is always handled efficiently by the JS engine. :/

This can be clarified as simply:

  for (let login of this._store.data.logins) {
    if (login.id == idToModify) {
    ...
  }

@@ +363,5 @@
> +            }
> +            return true;
> +        }
> +
> +        let logins = [], ids = [];

|logins| is confusingly generic here. Call it "foundLogins" or something.

@@ +365,5 @@
> +        }
> +
> +        let logins = [], ids = [];
> +        let array = this._store.data.logins;
> +        for (let i = 0; i < array.length; i++) {

Ditto previous comment about using |for (let login of this._store.data.logins)|.

@@ +369,5 @@
> +        for (let i = 0; i < array.length; i++) {
> +            if (match(array[i])) {
> +                // Create the new nsLoginInfo object, push to array
> +                let login = Cc["@mozilla.org/login-manager/loginInfo;1"].
> +                            createInstance(Ci.nsILoginInfo);

Bah, I was going to suggest making this simply:

  if (match(login)) {
    foundLogins.push(login.clone());
    ids.push(login.id);
  }

But I realized we're storing actual JS objects in the array, and not nsILoginInfos.

I almost wonder if we should consider doing that (obviously in a followup). I think it's a potential source of bugs, because their usage is almost identical (until you try to call a nsILoginInfo method with a JS object, then you're gonna have a bad day). Makes the deserialization a PITA though, so probably not worth it.

@@ +428,5 @@
> +     *
> +     */
> +    getLoginSavingEnabled : function (hostname) {
> +        this.log("Getting login saving is enabled for " + hostname);
> +        return this._queryDisabledHosts(hostname).length == 0

queryDisabledHosts() should just go away. It needlessly duplicates the entire contents for  getLoginSavingEnabled/ setLoginSavingEnabled.

These two functions should just do the obvious twiddling directly with this._store.data.disabledHosts.

@@ +469,5 @@
> +                    hostname: hostname,
> +                });
> +                this._store.saveSoon();
> +            }
> +        }

I don't understand this.

disabledHosts doesn't need to be an object with a monotonic ID, it's just a simple list of strings.

All this should need to do is basically:

  setLoginSavingEnabled : function (hostname, enabled) {
    let i = /* index of hostname in our list */
    let found = (i != -1);
    if (enabled) {
      if (found) /* splice() out index i */
    } else {
      if (!found) /* push() hostname to end */
    }
  }
Attachment #8408276 - Flags: review?(dolske) → review-
Posted patch Updated patch (obsolete) — Splinter Review
(In reply to Richard Newman [:rnewman] from comment #79)
> The assumption in my "yes" is that the API doesn't have a similar construct
> now -- "I'm inserting/updating/deleting a bunch of passwords, here ya go" --
> which would preserve some of those performance and safety desires. I'm
> guessing that's a safe assumption because we previously donned gloves and
> reached into the guts to do this, but please correct me if I'm wrong!

That's correct!

> @@ +160,3 @@
> >    applyIncoming: function applyIncoming(record) {
> >      Store.prototype.applyIncoming.call(this, record);
> >      this._sleep(0); // Yield back to main thread after synchronous operation.
> 
> Depending on what you've changed, this sleep (and thus the whole method
> override) might no longer be necessary.

In fact we don't do any synchronous I/O anymore.

(In reply to Justin Dolske [:Dolske] from comment #81)
> But I don't see any charset handling here. IIRC all this stream stuff just
> silently discards the high byte of the UCS2 data in JS strings, and that
> causes problems. See bug 451155 and bug 454708.
> 
> Please either explain why that's not a problem here, or add conversion to
> this path.

nsIJSON.decodeFromStream handles encoding detection, in this case it always detects UTF-8. I've added a comment to that effect to the patch.

> It would nice if this and open() could share code through helpers, but looks
> like that would be infeasible because of the interwoven asyncness.

Exactly.

> But I realized we're storing actual JS objects in the array, and not
> nsILoginInfos.
> 
> I almost wonder if we should consider doing that (obviously in a followup).
> I think it's a potential source of bugs, because their usage is almost
> identical (until you try to call a nsILoginInfo method with a JS object,
> then you're gonna have a bad day). Makes the deserialization a PITA though,
> so probably not worth it.

Yeah, we handle the "serializable" form internally and the XPCOM form externally. We'll always need a serializable form, though in a future API redesign we may turn the non-serializable form into a regular JavaScript object, so that consumers don't need to worry about QueryInterface anymore.

> disabledHosts doesn't need to be an object with a monotonic ID, it's just a
> simple list of strings.

I guess that we won't ever need to extend disabledHosts with other associated properties, since we're planning to turn the list into permissions in the future, so we can definitely update it to be a simple array of strings.
Attachment #8408276 - Attachment is obsolete: true
Attachment #8408283 - Attachment is obsolete: true
Attachment #8417935 - Flags: review?(dolske)
Whiteboard: [snappy:p1] p=13 s=it-32c-31a-30b.1 [qa+] → [snappy:p1] p=13 s=it-32c-31a-30b.2 [qa+]
Comment on attachment 8417935 [details] [diff] [review]
Updated patch

Review of attachment 8417935 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/browser.js
@@ +1109,5 @@
> +        Services.logins;
> +      } catch (ex) {
> +        Cu.reportError(ex);
> +      }
> +    }, 3000);

Followup fodder...

It would be interesting to add some simple telemetry probes to see how effective this is. Ideas:

* What percentage of the time do we successfully do any async init (vs it being interrupted by a sync init)?

* What is the time distribution (in seconds after browser-startup) for when the first password manager call is made?

* When a sync-init occurs, how long does it take? (Ditto for async, just out of curiosity.)

This would help us know if 3 seconds is a good value or not, and if the sync-init is still a significant source of jank or not.

Seems worth a bug to do this.

::: toolkit/components/passwordmgr/LoginHelper.jsm
@@ +213,5 @@
> +};
> +
> +XPCOMUtils.defineLazyServiceGetter(LoginHelper, "crypto",
> +                                   "@mozilla.org/login-manager/crypto/SDR;1",
> +                                   "nsILoginManagerCrypto");

Please remove this bit too, there's no need to stash a service onto LoginHelper.

::: toolkit/components/passwordmgr/LoginImport.jsm
@@ +64,5 @@
> +
> +  /**
> +   * Imports login-related data from the previous SQLite storage format.
> +   */
> +  import: Task.async(function* () {

More Followup Fodder:

Let's get a bug on file for deleting signons.sqlite after a reasonable period of time. The equivalent of the now-ridiculously-overdue bug 925101. :(

::: toolkit/components/passwordmgr/LoginStore.jsm
@@ +97,5 @@
> +
> +/**
> + * Delay between a change to the login data and the related save operation.
> + */
> +const kSaveDelayMs = 1500;

I'm full of followup ideas!

This one's much less interesting, though: telemetry to see how long the actual interval is between operations.

1500ms seems fine, although I bet data might show there's a bimodal distribution between "a few milliseconds" and "many many seconds-to-hours". IE, either something is processing things in a batch (like sync), or it's from slow user interaction as they login to sites during normal browsing.
Attachment #8417935 - Flags: review?(dolske) → review+
Blocks: 1013935
Blocks: 1013947
While waiting for the next steps in bug 956550 to be defined, I have created a tryserver build that is probably close to the final result:

https://tbpl.mozilla.org/?tree=Try&rev=ea23986693e1

I ran the following test case on Windows to check profile update and data storage:

* Create a new profile using a previous version of Firefox.
* Go to a web page that has a form with a password field.
* Submit the form and choose "Remember password".
* Go to the same web page again.
  - The password should be autocompleted.
* Close Firefox.
* Open the same profile with the tryserver build.
* Go to the same web page again.
  - The password should be autocompleted.
* Change the password in the field, choosing one of a different length.
* Submit the form and choose "Update password".
  - Note that, for this to work, the site should accept the new password without errors.
* Restart the browser.
* Go to the same web page again.
  - The new password (with a different length) should be autocompleted.
I think that the QA team may want to move forward with a preliminary verification of the tryserver build:

https://tbpl.mozilla.org/?tree=Try&rev=ea23986693e1

In general, we're interested in ensuring we haven't introduced regressions in the Login Manager on all platforms, including Android and B2G, by running at least some basic functionality tests.

We should probably add one or two new manual test cases for the next Desktop release, based on comment 84, related to profile upgrade. In general, exploratory testing around profile upgrades and downgrades would be welcome!

When this bug lands, we may make use of those test cases and the results of exploratory testing to better test the Nightly build.
Flags: needinfo?(andrei.vaida)
Paolo, thank you for bringing this to my attention. I will take a look and follow up on this matter as soon as possible.
Flags: needinfo?(andrei.vaida)
Comment on attachment 8417935 [details] [diff] [review]
Updated patch

>new file mode 100644
>--- /dev/null
>+++ b/toolkit/components/passwordmgr/storage-json.js
...
>+    /*
>+     * _checkHostnameValue
>+     *
>+     * Legacy storage prohibited newlines and nulls in hostnames, so we'll keep
>+     * that standard here. Throws on illegal format.
>+     */
>+    _checkHostnameValue : function (hostname) {
>+        // File format prohibits certain values. Also, nulls
>+        // won't round-trip with getAllDisabledHosts().
>+        if (hostname == "." ||
>+            hostname.indexOf("\r") != -1 ||
>+            hostname.indexOf("\n") != -1 ||
>+            hostname.indexOf("\0") != -1)
>+            throw "Invalid hostname";
>+    },

One more trivial fix I noticed while looking at bug 956550...

Can you move this _checkHostnameValue into LoginHelper, and have both storage-json and storage-mozStorage call it? (i.e., also remove the implementation in storage-mozStorage).
Whiteboard: [snappy:p1] p=13 s=it-32c-31a-30b.2 [qa+] → [snappy:p1] p=13 s=it-32c-31a-30b.3 [qa+]
(In reply to :Paolo Amadini (away May 24 - May 26) from comment #84)
> While waiting for the next steps in bug 956550 to be defined, I have created
> a tryserver build that is probably close to the final result:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=ea23986693e1
> 
> I ran the following test case on Windows to check profile update and data
> storage:
> 
> * Create a new profile using a previous version of Firefox.
> * Go to a web page that has a form with a password field.
> * Submit the form and choose "Remember password".
> * Go to the same web page again.
>   - The password should be autocompleted.
> * Close Firefox.
> * Open the same profile with the tryserver build.
> * Go to the same web page again.
>   - The password should be autocompleted.
> * Change the password in the field, choosing one of a different length.
> * Submit the form and choose "Update password".
>   - Note that, for this to work, the site should accept the new password
> without errors.
> * Restart the browser.
> * Go to the same web page again.
>   - The new password (with a different length) should be autocompleted.

The first thing I did was to verify these steps on the tryserver build and Firefox 29.0.1 [1] to downgrade. All the steps met their expected results but I was wondering, if you update the password on the tryserver build, shouldn't Firefox 29.0.1 also autocomplete the form with the updated one? Currently, the old passwords are autocompleted instead of the updated ones on downgraded profiles. I might be looking at this wrong, but please let me know. 

I'm currently knee-deep in determining relevant scenarios for this fix, I'll let you know as soon as things take shape.


[1] Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0
(In reply to Andrei Vaida, QA [:avaida] from comment #88)
> if you update the password on the tryserver build,
> shouldn't Firefox 29.0.1 also autocomplete the form with the updated one?
> Currently, the old passwords are autocompleted instead of the updated ones
> on downgraded profiles. I might be looking at this wrong, but please let me
> know. 

This would be the ideal behavior, but we explicitly decided that we won't support the case of downgrading to previous versions for passwords that have been changed after the upgrade, because of the complexity of supporting this case in a way that does not affect performance significantly.

> I'm currently knee-deep in determining relevant scenarios for this fix, I'll
> let you know as soon as things take shape.

Thanks!
Posted patch Checked inSplinter Review
Pushed to fx-team, including the last requested change:

https://hg.mozilla.org/integration/fx-team/rev/ca4caf0d721c
Attachment #8417935 - Attachment is obsolete: true
Despite we're late in the Aurora cycle, it might make sense for the ESR release to be upgraded to the same back-end format for passwords that we're using in the other releases.

To do this, we may either uplift this patch to Aurora when QA on trunk is finished, or let this change ride the trains and uplift it later to a dot release of the ESR.

Once trunk QA is completed, both uplifts will generally be low risk since the feature is well-covered by regression tests and the Login Manager code has not been changed in the meantime.
sorry had to backout this change for valgrind test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=40537709&tree=Fx-Team
(In reply to Carsten Book [:Tomcat] from comment #92)
> sorry had to backout this change for valgrind test failures like
> https://tbpl.mozilla.org/php/getParsedLog.php?id=40537709&tree=Fx-Team

Ouch, missed the failure in the TBPL interface of the tryserver build.

There used to be a Valgrind suppression for that leak (bug 794358) which is probably caused by the initialization of the storage module of the Login Manager. This was then removed because the leak disappeared:

http://hg.mozilla.org/mozilla-central/rev/7f14d602e0c5#l1.150

It's quite possible that the leak disappeared because the Login Manager was changed to delay the initialization of the storage module. We reverted that behavior in bug 956332, but also removed the initialization of the entire Login Manager.

This bug reintroduces the initialization of the Login Manager some time after startup. Since that operation now always initializes the storage module, the leak has returned.

I'm running a tryserver build with initialization disabled to confirm:

https://tbpl.mozilla.org/?tree=Try&rev=d6b41b0115e7

If that build is green, I think we can just resurrect the suppression. Apparently, the 24 bytes leak in NSS always occurred when initializing the storage module.
Blocks: 1017112
Filed bug 1017112 and re-landed with suppression:

https://hg.mozilla.org/integration/fx-team/rev/a3f383647966
https://hg.mozilla.org/mozilla-central/rev/a3f383647966
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
I executed the test case from Comment 84 along with additional exploratory testing around it, using Nightly 32.0a1 (2014-05-29) and Firefox 29.0.1 (Build ID: 20140506152807). On the same note, I also added a few scenarios to cover this better, you can see them here [1].

At this point, I only managed to run the tests on Windows 7 64-bit [2], I'll post the results for Mac OS X and Ubuntu as soon as soon as possible. Paolo, please take a look at [1] and let me know if there's anything else I should look at here.


[1] http://goo.gl/t07m0K
[2] Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0
(In reply to Andrei Vaida, QA [:avaida] from comment #96)
> On the same note, I also added a few scenarios
> to cover this better, you can see them here [1].

Thank you, they look good!

> please take a look at [1] and let me know if there's anything else I should
> look at here.

For the Sync tests, I would add the scenario where a new profile is created with the newest version of Nightly, and the passwords are subsequently recovered from an existing Sync account.
Blocks: 1017696
(In reply to :Paolo Amadini from comment #91)
> Despite we're late in the Aurora cycle, it might make sense for the ESR
> release to be upgraded to the same back-end format for passwords that we're
> using in the other releases.

I think it is beneficial to have ESR on the same code base for rewrites in order to make maintenance simpler. However, in this case we are pretty late in the Aurora cycle to accept the risk of a rewrite landing. As well, reviewing the fixes that went into the ESR24 cycle [1], I don't see any that are related to password manager. This means that we have had no maintenance work to do in this area for the last 6 ESR releases. Given this, I don't think this change warrants uplift to Aurora only to ensure that it is included in ESR31.

[1] https://www.mozilla.org/security/known-vulnerabilities/firefoxESR.html

> To do this, we may either uplift this patch to Aurora when QA on trunk is
> finished, or let this change ride the trains and uplift it later to a dot
> release of the ESR.

A change like this is not in scope for an ESR dot release. The scope of these releases is limited to sec high/critial and some functional correctness issues.
(In reply to :Paolo Amadini from comment #97)
> (In reply to Andrei Vaida, QA [:avaida] from comment #96)
> > On the same note, I also added a few scenarios
> > to cover this better, you can see them here [1].
> 
> Thank you, they look good!
> 
> > please take a look at [1] and let me know if there's anything else I should
> > look at here.
> 
> For the Sync tests, I would add the scenario where a new profile is created
> with the newest version of Nightly, and the passwords are subsequently
> recovered from an existing Sync account.

Thanks for the feedback Paolo, I added the test case to the suite. I'll post an update for Mac OS X and Ubuntu first thing Monday.
Depends on: 1018121
Depends on: 1018624
I finished testing the fix on Ubuntu 12.04 32-bit [1] and Mac OS X 10.9.2 [2] as well and updated the test suite [3] with my results.

I didn't find any conclusive issues following the regression testing, only potential ones - please take a look at the Actual Result section for test case#1 and test case#2. I'm still looking into those, but as I mentioned in the document, I didn't managed to determine exact STR for them. If you have any thoughts on that note, please let me know.


[1] Mozilla/5.0 (X11; Linux i686; rv:32.0) Gecko/20100101 Firefox/32.0
[2] Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0
[3] http://goo.gl/t07m0K
(In reply to Andrei Vaida, QA [:avaida] from comment #100)

"Saving at least 3 different passwords and then logging into a new website for the first time, does not always trigger the "Remember password" doorhanger, but the password *is* saved despite that. I was unable to determine exact STR for this issue, it needs further investigation due to its rare/intermittent reproducibility rate."

This looks like an existing bug that may be unrelated to the storage back-end, thanks for finding it!

"[Linux] Noticed the following error in autocomplete.xml:901 (Browser Console): NS_ERROR_XPC_CANT_MODIFY_PROP_ON_WN: Cannot modify properties of a WrappedNative. 
I only managed to replicate this *once* on Ubuntu 12.04 32-bit."

Don't know if it's related, but worth mentioning that we have some known re-entrancy issues in autocomplete code, as noted in bug 488812.
(In reply to :Paolo Amadini from comment #101)
> "Saving at least 3 different passwords and then logging into a new website
> for the first time, does not always trigger the "Remember password"
> doorhanger, but the password *is* saved despite that. I was unable to
> determine exact STR for this issue, it needs further investigation due to
> its rare/intermittent reproducibility rate."
> 
> This looks like an existing bug that may be unrelated to the storage
> back-end, thanks for finding it!
> 
> "[Linux] Noticed the following error in autocomplete.xml:901 (Browser
> Console): NS_ERROR_XPC_CANT_MODIFY_PROP_ON_WN: Cannot modify properties of a
> WrappedNative. 
> I only managed to replicate this *once* on Ubuntu 12.04 32-bit."
> 
> Don't know if it's related, but worth mentioning that we have some known
> re-entrancy issues in autocomplete code, as noted in bug 488812.

Thank you for your feedback, Paolo. At this point I was unable to determine a way of replicating any of the above issues - I'm going to mark this issue verified fixed and keep an eye on these two potential ones in the future.
Status: RESOLVED → VERIFIED
Whiteboard: [snappy:p1] p=13 s=it-32c-31a-30b.3 [qa+] → [snappy:p1] p=13 s=it-32c-31a-30b.3 [qa!]
Depends on: 1019885
Depends on: 1020112
Blocks: 1024100
Depends on: 1062336
No longer depends on: 1062336
Depends on: bug 1093546
Maybe a crash case to consider?
Whatever, thank you for all your job guys.
Depends on: 1096787
Depends on: 1328437
You need to log in before you can comment on or make changes to this bug.