Closed Bug 1597358 Opened 5 years ago Closed 4 years ago

logins.json is a 0-byte file. Create a backup during writes to it.

Categories

(Toolkit :: Password Manager, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla79
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- verified

People

(Reporter: ehsan.akhgari, Assigned: prathiksha)

References

Details

(Keywords: dataloss, regression, Whiteboard: [passwords:storage])

Attachments

(2 files, 2 obsolete files)

I turned on my main work laptop today after a couple of weeks of working from home and not using this machine, and found that Firefox had lost all of my saved passwords.

I connected it to Firefox Accounts again and waited for a sync to finish. That only sync'ed up 4 logins, including the Firefox Account login that I saved when reconnecting to FxA, from the hundreds I have synced... Now no matter how many times I press Sync Now nothing more gets synced.

I can confirm from another machine running a different profile that my hundreds of passwords show up there without any problems.

I connected it to Firefox Accounts again and waited for a sync to finish. That only sync'ed up 4 logins,
including the Firefox Account login that I saved when reconnecting to FxA,

Just to check, did you reset your account password at any point during this process?

Please try installing the about:sync addon and then visiting about:sync. It should let you check for any error logs which you can attach here, as well as give you a summary for each synced datatype, including how many records sync believes are on the server and on the client. Do you see anything in there that might give some clues for further debugging?

I can confirm from another machine running a different profile that my hundreds of passwords show up there without any problems.

Is this profile also connected to your Firefox Account?

Flags: needinfo?(ehsan)

(In reply to Ryan Kelly [:rfkelly] from comment #1)

I connected it to Firefox Accounts again and waited for a sync to finish. That only sync'ed up 4 logins,
including the Firefox Account login that I saved when reconnecting to FxA,

Just to check, did you reset your account password at any point during this process?

No, I don't remember the last time I have changed that password.

Please try installing the about:sync addon and then visiting about:sync. It should let you check for any error logs which you can attach here, as well as give you a summary for each synced datatype, including how many records sync believes are on the server and on the client. Do you see anything in there that might give some clues for further debugging?

Can you please tell me how to look for the error log after installing the extension in about:sync?

I can confirm from another machine running a different profile that my hundreds of passwords show up there without any problems.

Is this profile also connected to your Firefox Account?

Yes it is.

Flags: needinfo?(ehsan)

In about:sync, under passwords I see:

Downloaded all 894 records (0 deleted)

from https://sync-758-us-west-2.sync.services.mozilla.com/1.5/125128847/storage/passwords, last modified at Mon Nov 18 2019 13:25:39 GMT-0500 (Eastern Standard Time)

Under validation I see:

clientMissing: 889 problems

The following server records appear on the server but not on the client.

Followed by a giant table of logins.

Can you please tell me how to look for the error log after installing the extension in about:sync?

In the upper right there should be an item labelled "Log Files and Diagnostics", clicking on it will give you options to view the logs locally or to save them as a .zip for easy sharing.

Given that the records appear on the server but not on the client, I'm hopeful the sync error logs will give more clues about why they haven't downloaded successfully.

Ehsan, could you please try resetting the services.sync.passwords.lastSync pref in about:config, syncing again, and seeing if your passwords gets downloaded then? Could you also please check if you have a logins.json.corrupt file in your profile folder, and, if so, check its timestamp, and if anything about the contents looks off? And could you please check the creation time on logins.json?

One explanation for what you're seeing is Firefox automatically replaced logins.json. However, because we store the last sync time in prefs, and not password storage itself, Sync thinks you already have all the records from the server, and only downloads new ones since the last sync. We fixed this for bookmarks and history by storing this metadata in the database, so corruption would trigger a full sync—but haven't given logins the same treatment yet.

Flags: needinfo?(ehsan)
See Also: → 1199077

(In reply to :Lina Cambridge from comment #5)

Ehsan, could you please try resetting the services.sync.passwords.lastSync pref in about:config, syncing again, and seeing if your passwords gets downloaded then?

Thanks so much, Lina. This indeed fixed the problem.

Could you also please check if you have a logins.json.corrupt file in your profile folder, and, if so, check its timestamp, and if anything about the contents looks off? And could you please check the creation time on logins.json?

Yes, I did have that file (a 0-byte file) in my profile directory. Unfortunately my filesystem doesn't seem to record file creation dates, but I don't see anything unusual with the other timestamps on the file...

One explanation for what you're seeing is Firefox automatically replaced logins.json. However, because we store the last sync time in prefs, and not password storage itself, Sync thinks you already have all the records from the server, and only downloads new ones since the last sync. We fixed this for bookmarks and history by storing this metadata in the database, so corruption would trigger a full sync—but haven't given logins the same treatment yet.

That seems like a very plausible explanation of what happened, yes. Should we keep this bug open to store this metadata in the logins database similar to passwords and history? I would expect that if I were a regular user who wouldn't have had access to Bugzilla, I would have considered this a catastrophic bug in Firefox where it had deleted all of my passwords without any explanation... :-/

Thanks again!

Flags: needinfo?(ehsan)

(In reply to :ehsan akhgari from comment #6)

That seems like a very plausible explanation of what happened, yes. Should we keep this bug open to store this metadata in the logins database similar to passwords and history?

Bug 1377100 is our "canonical" bug for this scenario, but maybe we should change the title to be more discoverable?

I would expect that if I were a regular user who wouldn't have had access to Bugzilla, I would have considered this a catastrophic bug in Firefox where it had deleted all of my passwords without any explanation... :-/

Sadly, even fixing this means that non-sync users will still have that experience :( Logins on desktop seem simply unreliable - eg bug 1124553, bug 1295122 and a number of others are nominally reported as sync bugs but in reality they are more fundamental problems. In particular, corrupt or damaged key3.db/key4.db files tends to leave things completely broken until manual removal the file from the profile directory.

But yeah, we will at least bump bug 1377100 up the priority list - it shouldn't be particularly difficult.

The priority flag is not set for this bug.
:markh, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(markh)
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(markh)
Resolution: --- → DUPLICATE

(In reply to :ehsan akhgari from comment #6)

(In reply to :Lina Cambridge from comment #5)

Could you also please check if you have a logins.json.corrupt file in your profile folder, and, if so, check its timestamp, and if anything about the contents looks off? And could you please check the creation time on logins.json?

Yes, I did have that file (a 0-byte file) in my profile directory. Unfortunately my filesystem doesn't seem to record file creation dates, but I don't see anything unusual with the other timestamps on the file...

Ehsan, did you also have a logins.json.tmp file?

I'm going to re-open the bug to handle the corruption of logins.json… I thought we were using the safest options of OS.File.writeAtomic but I see there is a flush option which is supposed to be safer yet so I think we should start. Performance should be fine given that our writes are fairly infrequent and we use a timer to batch them.

Assignee: nobody → MattN+bmo
Status: RESOLVED → REOPENED
Component: Sync → Password Manager
Priority: -- → P1
Product: Firefox → Toolkit
Resolution: DUPLICATE → ---
Summary: Firefox sync can't sync my passwords any more → logins.json is a 0-byte file
Status: REOPENED → ASSIGNED

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #10)

(In reply to :ehsan akhgari from comment #6)

(In reply to :Lina Cambridge from comment #5)

Could you also please check if you have a logins.json.corrupt file in your profile folder, and, if so, check its timestamp, and if anything about the contents looks off? And could you please check the creation time on logins.json?

Yes, I did have that file (a 0-byte file) in my profile directory. Unfortunately my filesystem doesn't seem to record file creation dates, but I don't see anything unusual with the other timestamps on the file...

Ehsan, did you also have a logins.json.tmp file?

I did not.

(In reply to :ehsan akhgari from comment #12)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #10)

(In reply to :ehsan akhgari from comment #6)

(In reply to :Lina Cambridge from comment #5)

Could you also please check if you have a logins.json.corrupt file in your profile folder, and, if so, check its timestamp, and if anything about the contents looks off? And could you please check the creation time on logins.json?

Yes, I did have that file (a 0-byte file) in my profile directory. Unfortunately my filesystem doesn't seem to record file creation dates, but I don't see anything unusual with the other timestamps on the file...

Ehsan, did you also have a logins.json.tmp file?

I did not.

Yoric, do we have tests for the tmpPath option? In any bug reports that reports issues with logins.json, I've never seen a logins.json.tmp file present… I'm wondering if that option even works?

Flags: needinfo?(dteller)

When we were faced with the same kind of problem with Session Restore, flushing didn't help, for several reasons. One of them is that flushing generally doesn't do what people think it does :)

Let me suggest the following strategy, which basically solved the problem with Session Restore:

  • when you write logins.json, write first to an intermediate file before overwriting logins.json (I think that you're already doing that) and backup the existing logins.json to logins-prev.json or something along these lines (OS.File.writeAtomic has an option backupTo exactly for this);
  • when you start Firefox, try loading from logins.json – if it fails, load from logins-prev.json.

Also, please recall that the entire design of JSONFile.jsm is shaky, as it attempts to mix nsIFile (main thread I/O) and OS.File (off main thread I/O). If you have critical data, please don't use JSONFile.jsm.

What do you think, MattN?

Flags: needinfo?(dteller) → needinfo?(MattN+bmo)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #13)

Yoric, do we have tests for the tmpPath option? In any bug reports that reports issues with logins.json, I've never seen a logins.json.tmp file present… I'm wondering if that option even works?

Well, many tests use it, but I don't think that we have any tests that ensure that it is created.

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #14)

  • when you write logins.json, write first to an intermediate file before overwriting logins.json (I think that you're already doing that) and backup the existing logins.json to logins-prev.json or something along these lines (OS.File.writeAtomic has an option backupTo exactly for this);

How is that different than what tmpPath is supposed to do?

If specified, write all data to a temporary file called |tmpPath| and, once this write is complete, rename the file to replace |path|. Performing this additional operation is a little slower but also a little safer.

Also, please recall that the entire design of JSONFile.jsm is shaky, as it attempts to mix nsIFile (main thread I/O) and OS.File (off main thread I/O). If you have critical data, please don't use JSONFile.jsm.

Can you give a concrete example of a bug that this causes? People say this but never explain the reason… nsIFile is only used for the synchronous fallback case and combining that with writeAtomic should be safe.

I was also going to use the backupTo option in another bug but there are privacy issues to address with that. e.g. if a user disables the password manager and removes all logins, we may not want to keep the backup around forever.

Flags: needinfo?(MattN+bmo) → needinfo?(dteller)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #16)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #14)

  • when you write logins.json, write first to an intermediate file before overwriting logins.json (I think that you're already doing that) and backup the existing logins.json to logins-prev.json or something along these lines (OS.File.writeAtomic has an option backupTo exactly for this);

How is that different than what tmpPath is supposed to do?

The intermediate file is tmpPath, which I believe you're already using. The backup is backupTo, which I believe you're not using yet.

Can you give a concrete example of a bug that this causes? People say this but never explain the reason… nsIFile is only used for the synchronous fallback case and combining that with writeAtomic should be safe.

I haven't checked, how does the code make sure that writeAtomic is not running during the synchronous fallback?

I was also going to use the backupTo option in another bug but there are privacy issues to address with that. e.g. if a user disables the password manager and removes all logins, we may not want to keep the backup around forever.

Yes, you'll need to cleanup the backup (and also the tmp file if it exists).

Yoric, do we have tests for the tmpPath option? In any bug reports that reports issues with logins.json, I've never seen a logins.json.tmp file present… I'm wondering if that option even works?

Do the test report a missing logins.json or an empty file? If it's empty, the problem may have happened at OS-level or at battery-level. Or it could be a bug in OS.File, of course :)

Flags: needinfo?(dteller) → needinfo?(MattN+bmo)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #17)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #16)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #14)

  • when you write logins.json, write first to an intermediate file before overwriting logins.json (I think that you're already doing that) and backup the existing logins.json to logins-prev.json or something along these lines (OS.File.writeAtomic has an option backupTo exactly for this);

How is that different than what tmpPath is supposed to do?

The intermediate file is tmpPath, which I believe you're already using. The backup is backupTo, which I believe you're not using yet.

Yeah, I understand they are different options but it sounds like tmpPath should have already prevented the empty file. I was trying to understand if there is some technical reason why backupTo would have prevented this issue where tmpPath would not? My assumption would be that the options are the same except tmpPath doesn't leave a temp. file around if everything is successful?

Can you give a concrete example of a bug that this causes? People say this but never explain the reason… nsIFile is only used for the synchronous fallback case and combining that with writeAtomic should be safe.

I haven't checked, how does the code make sure that writeAtomic is not running during the synchronous fallback?

Why would that be a problem if the write is atomic (according to the name of the method) and we are using tmpPath?

I was also going to use the backupTo option in another bug but there are privacy issues to address with that. e.g. if a user disables the password manager and removes all logins, we may not want to keep the backup around forever.

Yes, you'll need to cleanup the backup (and also the tmp file if it exists).

Yeah, but IIUC, the .tmp would be cleaned up on its own if the write was successful so I don't think we'd need to worry about that nearly as much. Does adding flush: true like the patch on Phabricator improve the situation at all? If so, I think we should still take it and figure out more complicated solutions later. We likely wouldn't switch away from JSONFile as I'm not convinced there is a problem with it and because we will switch to the rust logins crate for storage in the next year or so.

Yoric, do we have tests for the tmpPath option? In any bug reports that reports issues with logins.json, I've never seen a logins.json.tmp file present… I'm wondering if that option even works?

Do the test report a missing logins.json or an empty file? If it's empty, the problem may have happened at OS-level or at battery-level. Or it could be a bug in OS.File, of course :)

I've seen dozens of reports of both cases.

Flags: needinfo?(MattN+bmo)
Flags: needinfo?(dteller)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #18)

Yeah, I understand they are different options but it sounds like tmpPath should have already prevented the empty file. I was trying to understand if there is some technical reason why backupTo would have prevented this issue where tmpPath would not? My assumption would be that the options are the same except tmpPath doesn't leave a temp. file around if everything is successful?

Empty file could be due to any number of factors. In particular, both the OS and the on-disk controller reorder file operations to e.g. save battery or bandwidth, and in case of a OS or hardware-level crash, anything can happen. For instance, while our operations state that we first write, then move (as per tmpPath), it could actually be happening in a different order.

Now, if the empty file appears after a Firefox crash (by opposition to a OS-level or hardware-level crash), it's most likely our fault.

I haven't checked, how does the code make sure that writeAtomic is not running during the synchronous fallback?

Why would that be a problem if the write is atomic (according to the name of the method) and we are using tmpPath?

It's atomic with respect to OS.File only. If you're performing two operations with OS.File, we guarantee that the second will only take place after the first one is complete (barring any OS.File bug, of course). However, there are no guarantees on whatever may happen if operations take place using other APIs. I don't think any OS currently offers primitives that would guarantee complete atomicity (Windows had primitives for this purpose, but they're really, really slow, so they're deprecated, iirc).

So, if nsIFile and writeAtomic write at the same time to the same file, anything can happen. It could be empty or missing. It could be corrupted.

Yes, you'll need to cleanup the backup (and also the tmp file if it exists).

Yeah, but IIUC, the .tmp would be cleaned up on its own if the write was successful so I don't think we'd need to worry about that nearly as much. Does adding flush: true like the patch on Phabricator improve the situation at all? If so, I think we should still take it and figure out more complicated solutions later. We likely wouldn't switch away from JSONFile as I'm not convinced there is a problem with it and because we will switch to the rust logins crate for storage in the next year or so.

Well, a few years ago, my job was to remove all flushes from Firefox because they were killing the performance of both Firefox and the entire OS and they did not make anything observably more reliable, so I would not suggest doing this :) Do we already know when these errors happen? A flush would protect us against some OS-level and hardware-level crashes, while a backupTo would protect us against all of them.

Yoric, do we have tests for the tmpPath option? In any bug reports that reports issues with logins.json, I've never seen a logins.json.tmp file present… I'm wondering if that option even works?

Do the test report a missing logins.json or an empty file? If it's empty, the problem may have happened at OS-level or at battery-level. Or it could be a bug in OS.File, of course :)

I've seen dozens of reports of both cases.

Well, that's worrying. Out of curiosity, what happens currently when the file is corrupted?

Flags: needinfo?(dteller) → needinfo?(MattN+bmo)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #19)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #18)

Yeah, I understand they are different options but it sounds like tmpPath should have already prevented the empty file. I was trying to understand if there is some technical reason why backupTo would have prevented this issue where tmpPath would not? My assumption would be that the options are the same except tmpPath doesn't leave a temp. file around if everything is successful?

Empty file could be due to any number of factors. In particular, both the OS and the on-disk controller reorder file operations to e.g. save battery or bandwidth, and in case of a OS or hardware-level crash, anything can happen. For instance, while our operations state that we first write, then move (as per tmpPath), it could actually be happening in a different order.

I see, that's scary.

Now, if the empty file appears after a Firefox crash (by opposition to a OS-level or hardware-level crash), it's most likely our fault.

I suspect we have some of both and therefore my patch will at least help some cases. I think logins.json is one of the most important pieces of user data so according to the API comments it seems like flush is appropriate, especially given the infrequent writes.

I haven't checked, how does the code make sure that writeAtomic is not running during the synchronous fallback?

Why would that be a problem if the write is atomic (according to the name of the method) and we are using tmpPath?

It's atomic with respect to OS.File only. If you're performing two operations with OS.File, we guarantee that the second will only take place after the first one is complete (barring any OS.File bug, of course). However, there are no guarantees on whatever may happen if operations take place using other APIs. I don't think any OS currently offers primitives that would guarantee complete atomicity (Windows had primitives for this purpose, but they're really, really slow, so they're deprecated, iirc).

So, if nsIFile and writeAtomic write at the same time to the same file, anything can happen. It could be empty or missing. It could be corrupted.

Since JSONFIle.jsm never writes via an API other than writeAtomic we shouldn't have to worry about that.

Yes, you'll need to cleanup the backup (and also the tmp file if it exists).

Yeah, but IIUC, the .tmp would be cleaned up on its own if the write was successful so I don't think we'd need to worry about that nearly as much. Does adding flush: true like the patch on Phabricator improve the situation at all? If so, I think we should still take it and figure out more complicated solutions later. We likely wouldn't switch away from JSONFile as I'm not convinced there is a problem with it and because we will switch to the rust logins crate for storage in the next year or so.

Well, a few years ago, my job was to remove all flushes from Firefox because they were killing the performance of both Firefox and the entire OS and they did not make anything observably more reliable, so I would not suggest doing this :) Do we already know when these errors happen? A flush would protect us against some OS-level and hardware-level crashes, while a backupTo would protect us against all of them.

Yoric, do we have tests for the tmpPath option? In any bug reports that reports issues with logins.json, I've never seen a logins.json.tmp file present… I'm wondering if that option even works?

Do the test report a missing logins.json or an empty file? If it's empty, the problem may have happened at OS-level or at battery-level. Or it could be a bug in OS.File, of course :)

I've seen dozens of reports of both cases.

Well, that's worrying. Out of curiosity, what happens currently when the file is corrupted?

We move the file to logins.json.corrupt and start as if the user never saved anything to login storage (which wouldn't mean an empty file).

Flags: needinfo?(MattN+bmo)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #20)

I see, that's scary.

Yes, it is.

Now, if the empty file appears after a Firefox crash (by opposition to a OS-level or hardware-level crash), it's most likely our fault.

I suspect we have some of both and therefore my patch will at least help some cases. I think logins.json is one of the most important pieces of user data so according to the API comments it seems like flush is appropriate, especially given the infrequent writes.

Can you confirm that it's not called during startup? If so, let's do that.

So, if nsIFile and writeAtomic write at the same time to the same file, anything can happen. It could be empty or missing. It could be corrupted.

Since JSONFIle.jsm never writes via an API other than writeAtomic we shouldn't have to worry about that.

Well, there's a call to moveTo that may end up non-trivial in some cases. Do we have guarantees that it's never called concurrently with writeAtomic? It might be simple to replace this moveTo with OS.File.move.

Also, do we have guarantees that we never write with writeAtomic during a nsIFileInputStream read?

Do the test report a missing logins.json or an empty file? If it's empty, the problem may have happened at OS-level or at battery-level. Or it could be a bug in OS.File, of course :)

I've seen dozens of reports of both cases.

Well, that's worrying. Out of curiosity, what happens currently when the file is corrupted?

We move the file to logins.json.corrupt and start as if the user never saved anything to login storage (which wouldn't mean an empty file).

Ok. For future reference, it might be interesting to have Telemetry on how often this happens and how often the file is empty.

Flags: needinfo?(MattN+bmo)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #21)

Now, if the empty file appears after a Firefox crash (by opposition to a OS-level or hardware-level crash), it's most likely our fault.

I suspect we have some of both and therefore my patch will at least help some cases. I think logins.json is one of the most important pieces of user data so according to the API comments it seems like flush is appropriate, especially given the infrequent writes.

Can you confirm that it's not called during startup? If so, let's do that.

Ugh, that sounds like a recipe for having to remove the flush calls in a few months or years when we get bug reports about the performance problems.

To the best of my knowledge, on Windows at least, a flush call flushes all kernel file buffers. There is no way to just flush what needs to be written to the disk for one file. This means, for example, if your system is running on virtual memory, and pages need to be committed to the backing virtual memory file, your flush call is gonna block until that is done. Or if another application has copied 500MB of files in the background, your flush call is gonna block until all of those buffers touch the disk. This is why there is almost no "cheap" flush calls, and they ruin the performance of both Firefox and other applications running on the system. I believe Yoric knows this problem all too well as he mentioned in comment 19. :-)

This sounds like a typical use case in database applications, where we have some data that needs to get to the disk and we'd like to ensure that happens atomically. There are two very common techniques for handling such a use case, a rollback journal or a write-ahead log. I think the fact that we're talking about using flush instead of the textbook solutions available for this problem is a glaring sign that the choice to store the login database in a JSON file may have been a mistake, and it may be time to revise it perhaps. (I suppose technically you could implement a write-ahead log manually on top of a JSON store but doing so correctly is probably a lot of error prone work.)

Attachment #9113461 - Attachment is obsolete: true

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #21)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #20)

Now, if the empty file appears after a Firefox crash (by opposition to a OS-level or hardware-level crash), it's most likely our fault.

I suspect we have some of both and therefore my patch will at least help some cases. I think logins.json is one of the most important pieces of user data so according to the API comments it seems like flush is appropriate, especially given the infrequent writes.

Can you confirm that it's not called during startup? If so, let's do that.

I can't think of any reason why it would get called during startup but I will instrument it and confirm. Sync and form submission would happen after startup. I guess it could happen at startup for proxy authentication auto-login but that is uncommon.

So, if nsIFile and writeAtomic write at the same time to the same file, anything can happen. It could be empty or missing. It could be corrupted.

Since JSONFIle.jsm never writes via an API other than writeAtomic we shouldn't have to worry about that.

Well, there's a call to moveTo that may end up non-trivial in some cases. Do we have guarantees that it's never called concurrently with writeAtomic?

At least for the logins.json consumer that looks to be the case (except for finalization) because every call to _save (which calls writeAtomic) is preceded by a call to ensureDataReady. The finalization piece isn't as clear to me.

It might be simple to replace this moveTo with OS.File.move.

I doesn't look that straightforward since ensureDataReady needs to be synchronous. JSONFile was made to handle legacy synchronous APIs which is what causes all the sync code paths that don't use OS.File… ideally we would convert all the storage APIs to be async but there is over a decade of code (including that in TB) that have built on top of this so that's not easy. I recently started converting some APIs to be async to support an async storage engine for Fenix. My understanding is that the Rust login store that we want to switch to eventually would also require async APIs so it will have to be done in the next year.

Also, do we have guarantees that we never write with writeAtomic during a nsIFileInputStream read?

For LoginStore.jsm that seems to be the case except again the finalization part is unclear as I'm not very familiar with how that all works.

Do the test report a missing logins.json or an empty file? If it's empty, the problem may have happened at OS-level or at battery-level. Or it could be a bug in OS.File, of course :)

I've seen dozens of reports of both cases.

Well, that's worrying. Out of curiosity, what happens currently when the file is corrupted?

We move the file to logins.json.corrupt and start as if the user never saved anything to login storage (which wouldn't mean an empty file).

Ok. For future reference, it might be interesting to have Telemetry on how often this happens and how often the file is empty.

Yeah, I agree. I filed bug 1599567 on this the other day.

Next year we want to switch to use the Rust logins store instead (which is backed my SQLite) and won't have these problems. I was looking for low-hanging fruit in the short-term but it seems like a flush may not be it. I can morph this bug into handling backupTo in the future when we sort out how to deal with the privacy issues of leaving a backup file around.

One of the issues with deleting the backup is that there is an extension API to disable the password manager and that API isn't really expected to show UI so we might not be able to simply prompt to ask the user if we should delete the backup.

Flags: needinfo?(MattN+bmo)
Whiteboard: [passwords:storage]

(In reply to :ehsan akhgari from comment #22)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #21)

Now, if the empty file appears after a Firefox crash (by opposition to a OS-level or hardware-level crash), it's most likely our fault.

I suspect we have some of both and therefore my patch will at least help some cases. I think logins.json is one of the most important pieces of user data so according to the API comments it seems like flush is appropriate, especially given the infrequent writes.

Can you confirm that it's not called during startup? If so, let's do that.

Ugh, that sounds like a recipe for having to remove the flush calls in a few months or years when we get bug reports about the performance problems.

To the best of my knowledge, on Windows at least, a flush call flushes all kernel file buffers. There is no way to just flush what needs to be written to the disk for one file. This means, for example, if your system is running on virtual memory, and pages need to be committed to the backing virtual memory file, your flush call is gonna block until that is done. Or if another application has copied 500MB of files in the background, your flush call is gonna block until all of those buffers touch the disk. This is why there is almost no "cheap" flush calls, and they ruin the performance of both Firefox and other applications running on the system. I believe Yoric knows this problem all too well as he mentioned in comment 19. :-)

Good to know. I abandoned the flush patch now based on this information.

I think the fact that we're talking about using flush instead of the textbook solutions available for this problem is a glaring sign that the choice to store the login database in a JSON file may have been a mistake, and it may be time to revise it perhaps.

I agree it was a mistake switching desktop (not Fennec, and thus maintaining two backends) from SQLite to JSON in bug 853549 (btw. I just deleted the storage-mozStorage.js SQLite backend yesterday as Fennec is now dead) but this was driven by the war on (main-thread) SQLite back in the day (example thread). We are already planning on switching to the Rust logins store for desktop next year but that is a big project and I thought there may be some low-hanging fruit here to address dataloss in the meantime.

Assignee: MattN+bmo → nobody
Status: ASSIGNED → NEW
Keywords: dataloss
Priority: P1 → P2

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #23)

One of the issues with deleting the backup is that there is an extension API to disable the password manager and that API isn't really expected to show UI so we might not be able to simply prompt to ask the user if we should delete the backup.

Technically, you can delete the backup silently after ~20 seconds, so that's ok.

But yeah, sqlite & co would be a better solution.

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #25)

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #23)

One of the issues with deleting the backup is that there is an extension API to disable the password manager and that API isn't really expected to show UI so we might not be able to simply prompt to ask the user if we should delete the backup.

Technically, you can delete the backup silently after ~20 seconds, so that's ok.

Yeah, this isn't a as much a technical issue… we have users regularly reporting that the password manager was disabled and they didn't do so (I triaged one today)… every time it's because one of the user's extensions disabled it without them knowing. There is also an API for extensions to remove all saved logins and I've seen reports of extensions deleting user logins without them expecting it… in these cases a backup could help the user recover.

On the flip side, if a user intentionally deletes all saved logins from about:logins and disabled the password manager then they will be surprised if we kept a backup around indefinitely (since disabling the password manager could mean we never even load any of the login storage code to know whether to do any cleanups).

I would like backups to protect against cases like the above where extensions or the user deletes some logins but wants them back but I see what you're saying that in the short term we could simply use backupTo for some extra durability but not use it as an actual backup. I think there are still edge cases to consider with that approach and it's not a top priority with all the other things to fix with password manager.

I'm also still curious to test if tmpPath actually works.

(In reply to Matthew N. [:MattN] (PM me if requests are blocking you) from comment #24)

I think the fact that we're talking about using flush instead of the textbook solutions available for this problem is a glaring sign that the choice to store the login database in a JSON file may have been a mistake, and it may be time to revise it perhaps.

I agree it was a mistake switching desktop (not Fennec, and thus maintaining two backends) from SQLite to JSON in bug 853549 (btw. I just deleted the storage-mozStorage.js SQLite backend yesterday as Fennec is now dead) but this was driven by the war on (main-thread) SQLite back in the day (example thread).

Very true. The idea of moving storage backends from sqlite to plain JSON to avoid main-thread IO was pretty common place at the time. I think the history was partly informed by our mozStorage layer at the time not having full support for asynchronous access to sqlite (IIRC even when code was using the API that pretended to be async it would still end up doing a fair amount of main-thread IO back then). These days I believe many (most?/all?) of those issues in mozStorage have been fixed but still async sqlite access is something to achieve with great care.

We are already planning on switching to the Rust logins store for desktop next year but that is a big project and I thought there may be some low-hanging fruit here to address dataloss in the meantime.

Understood, makes sense.

Assignee: nobody → prathikshaprasadsuman
Status: NEW → ASSIGNED
Attachment #9155702 - Attachment is obsolete: true
Depends on: 1647934
Pushed by prathikshaprasadsuman@gmail.com: https://hg.mozilla.org/integration/autoland/rev/4654336b7664 Create a backup of logins.json to use when logins.json is missing or corrupt. r=MattN
Status: ASSIGNED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Summary: logins.json is a 0-byte file → logins.json is a 0-byte file. Create a backup during writes to it.
Flags: qe-verify+
Regressions: 1650003

Hi! Is there any STR that manual QA can use to verify this? Thank you!

Flags: needinfo?(prathikshaprasadsuman)

(In reply to Alexandru Trif, QA [:atrif] from comment #34)

Hi! Is there any STR that manual QA can use to verify this? Thank you!

STR:

  1. Open Nightly and navigate to "about:support". Open the profile folder and check that there's no "logins-backup.json" file.
  2. Now set "signon.backup.enabled" pref to true. Navigate to "about:logins" and add a couple of logins.
  3. When you open the profile folder from "about:support" now, you should be able to see a "logins-backup.json" file.

This bug just creates a backup whenever new logins are added into Lockwise if the backup pref is turned on. We will be adding support to restore lost logins using this backup in Bug 1593467.

Flags: needinfo?(prathikshaprasadsuman)
Attached image logins-backup

Thank you for the detailed information! The logins-backup.json is indeed created as expected on Firefox Nightly (20200721215127) and Firefox 79.0 (20200720193547) after setting signon.backup.enabled pref to true.
But I saw that the last added login is not shown in logins-backup.json file. For example, I create 2 logins for facebook.com and reddit.com. The first created one (facebook.com) appears in logins-backup.json but the one I created for redit.com does not. If I add one more login (twitter.com), the logins-backup.json file now contains the facebook.com and reddit.com logins but not twitter. Is this expected? Thank you!

Flags: needinfo?(prathikshaprasadsuman)

(In reply to Alexandru Trif, QA [:atrif] from comment #36)

Is this expected? Thank you!

Yes, this is expected. The backup contains the data from before the last change that way if you want to undo the last change (e.g. removing all) you can recover from the backup.

Flags: needinfo?(prathikshaprasadsuman)

Thank you!
The issue is verified fixed using Firefox 79.0 (20200720193547) on Windows 10x64, macOS 10.12 and Ubuntu 18.04.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: