Closed Bug 1530660 Opened 6 years ago Closed 5 years ago

[regression] All preferences lost after power loss, OS crash, etc

Categories

(Core :: Preferences: Backend, defect, P2)

67 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla69
Tracking Status
firefox-esr60 --- unaffected
firefox67 --- wontfix
firefox68 + verified
firefox69 + verified

People

(Reporter: jgaunt, Assigned: kmag)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: dataloss, regression)

Attachments

(1 file)

Upon first load of Nightly 67.0a1 (2019-02-25) I noticed the following:

Toolbar UI was reset such that every button was exposed, reload button was reset to default location, default flexible space had been reinserted.

Search preferences were reset to unified search (undesirable) and search suggestions were re-enabled (also undesirable).

Saved stylings for the Stylus addon were purged - I suppose I should make a habit of backing them up

Presumably more was reset to default than this but that's all I've noticed so far. Bit of a nuisance to reset all of these things, I would hate to imagine this happening to a less committed user.

Restore previous session was also toggled back to off by default. I wager all of my customization has been lost.

How about things like your bookmarks and history? Are those still around?

Flags: needinfo?(jgaunt)

Are your history and bookmarks still present? I'm curious if your Nightly is now using a different profile (bug 1474285). Do you run Beta or Release also? See bug 1528309 for example.

annevk reported seeing something similar in #fx-team - do you have an additional profile listed in about:profiles that has a name similar to "default"? If so, if you switch to it, are your customizations back?

thanks for the eyes

:mconley - bookmarks and history are intact

:jaws - I do run Release on the same machine, fwiw with a different profile (named Release) and -P "Release" -no-remote command line switches

also, profile creationDate is 17611, which maps to 2018-03-21... unless a new profile would retain the PCD of the previous it seems like it is not a new one?

Flags: needinfo?(jgaunt)

Got the same exact problem here. Bookmarks and history are indeed intact.

Flags: needinfo?(mconley)
Flags: needinfo?(jaws)

Interesting, okay. So it sounds like at least the preferences file was wiped out, but the rest of the profile stayed roughly intact.

What about your add-ons? If those are gone too, then this sounds like somehow a "profile refresh" was triggered.

I'm not sure where Stylish keeps its saved stylings, but given that it's a WebExtension, I have to assume it's indexedDB, since I believe that's the default now.

At the very least, if you go to about:support, open your Profile Folder, and find the prefs.js file in there... when was it created? Are there any other prefs*.js files in that folder?

Flags: needinfo?(mconley) → needinfo?(jgaunt)

My add-ons are still there, but all their preferences and data are gone. For instance, I had Mind the Time and Textarea Cache installed, and both of their history were completely wiped out. The same for Tampermonkey scripts and Stylus styles.

I have nine different prefs.js files (prefs.js, and prefs-1.js to prefs-8.js). Those which are numbered go from June 9th, 2016 to August 15, 2018. Regarding prefs.js, it was last modified some minutes ago, just when I launched Nightly again.

Moreover, since then, I've reinstalled some of my Stylus/Tampermonkey scripts, and those were not removed as I restarted Nightly, so my guess is that something went wrong during the transitioning from 66.0a1 to 67.0a1.

My prefs got reset again when launching Nightly this morning (67.0a1 (2019-03-03) (64-bit)). No new prefs file was created. Could you please take a look at this issue?

Flags: needinfo?(mconley)

(In reply to Enzo from comment #9)

My prefs got reset again when launching Nightly this morning (67.0a1 (2019-03-03) (64-bit)). No new prefs file was created. Could you please take a look at this issue?

Can you post a copy of the raw data (button there to copy it to the clipboard) from about:support ?

What OS are you using?

Flags: needinfo?(mconley) → needinfo?(spidersouris)

(In reply to :Gijs (he/him) from comment #10)

(In reply to Enzo from comment #9)

My prefs got reset again when launching Nightly this morning (67.0a1 (2019-03-03) (64-bit)). No new prefs file was created. Could you please take a look at this issue?

Can you post a copy of the raw data (button there to copy it to the clipboard) from about:support ?

What OS are you using?

Here's a copy of the raw data: https://hastebin.com/qoromiyayo.json

Using Windows 10, build 18334.

Flags: needinfo?(spidersouris)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #7)

What about your add-ons? If those are gone too, then this sounds like somehow a "profile refresh" was triggered.

There's a way to check if this happened, IIRC using ProfileAge. Matt, can you remind us how to do this? (Should we get a bug on file to have that info included in about:support ? )

Flags: needinfo?(MattN+bmo)

(In reply to Enzo from comment #11)

(In reply to Enzo from comment #9)

My prefs got reset again when launching Nightly this morning (67.0a1 (2019-03-03) (64-bit)). No new prefs file was created. Could you please take a look at this issue?

Here's a copy of the raw data: https://hastebin.com/qoromiyayo.json

There are basically (almost) no modified prefs in here at all, and you did say that the prefs got reset. What specifically do you mean by "no new prefs file was created"? There are definitely changed-from-default prefs, so clearly they are stored somewhere. Are you saying the creation date of the prefs.js file in the profile folder stayed the same? Or just that there were some number N prefs(-42).js files in your profile and there aren't any more of them today? Or something else?

Regardless, this seems like something for :njn as this seems like an issue with the prefs backend if we're not able to reuse the prefs file.

Component: Toolbars and Customization → Preferences: Backend
Flags: needinfo?(n.nethercote)
Product: Firefox → Core

(In reply to :Gijs (he/him) from comment #13)

(In reply to Enzo from comment #11)

(In reply to Enzo from comment #9)

My prefs got reset again when launching Nightly this morning (67.0a1 (2019-03-03) (64-bit)). No new prefs file was created. Could you please take a look at this issue?

Here's a copy of the raw data: https://hastebin.com/qoromiyayo.json

There are basically (almost) no modified prefs in here at all, and you did say that the prefs got reset.

Actually, for instance, since this morning, I reinstalled some Stylus scripts and changed how some addons' behavior since the wipe happened, so that may be why. I should probably copy the raw data directly without modifying anything if the wipe ever happens again.

What specifically do you mean by "no new prefs file was created"? There are definitely changed-from-default prefs, so clearly they are stored somewhere.

It seems like my old prefs.js file which was saving all my addons data as well as Firefox preferences got somehow wiped out and replaced with a totally new one, just like if I had refreshed Firefox (which I didn't do - plus, addons (except their data/prefs), bookmarks and history are still there). After this happened, no other prefs.js file was created (as the oldest one is from August 2018); so the file seems to have been totally erased.

(In reply to Enzo from comment #14)

Actually, for instance, since this morning, I reinstalled some Stylus scripts and changed how some addons' behavior since the wipe happened, so that may be why. I should probably copy the raw data directly without modifying anything if the wipe ever happens again.

Add-on preferences / settings / scripts / whatever are not stored in prefs.js , I don't think. Luca, can you confirm?

Flags: needinfo?(lgreco)

The data stored by an extension using the storage.local API is not in the prefs.js file.

Nevertheless if there are general problems with loading that prefs file, then that issue is likely having an impact on the storage.local API as a side-effect, because we look for the values of some about:config preferences to:

  • detect if the storage.local API should use the IndexedDB backend or the JSONFIle backend (based on the bool preference named "extensions.webextensions.ExtensionStorageIDB.enabled")

  • detect if a particular extension has been already migrated to the storage.local IndexedDB backend (based on bool preferences named like "extensions.webextensions.ExtensionStorageIDB.migrated.<EXTENSION ID>")

As an example, if (because of the issues with the prefs file loading) we are getting false when retrieving the value for "extensions.webextensions.ExtensionStorageIDB.enabled",
then all the extensions are going to choose the storage.local JSONFile backend as the active backend, and for the extensions that were previously migrated to IndexedDB that would look like data loss, because the previous json data file has been migrated to IndexedDB and renamed to "storage.js.migrated" and the JSONFile backend will start from an empty file, whereas the actual extension data stored in the IndexedDB backend will not be used).

Flags: needinfo?(lgreco)

(In reply to :Gijs (he/him) from comment #12)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #7)

What about your add-ons? If those are gone too, then this sounds like somehow a "profile refresh" was triggered.

There's a way to check if this happened, IIRC using ProfileAge. Matt, can you remind us how to do this?

Easiest for most people is to look at times.json in the profile folder (accessible from about:support). Otherwise, you can get a version of it from about:telemetry#environment-data-tab_profile (Environment Data -> Profile)

(Should we get a bug on file to have that info included in about:support ? )

I think it could be useful to show in about:support but maybe shouldn't be in the copied (raw) data since the timestamps can be used as a fairly unique identifier. One thing we've done before is report the timestamps with granularity the older they are and only provide one time bucket like some relative time JS libraries do: e.g. X hours ago => Y days ago => Z months ago => etc.

I filed bug 1532369 for this.

Flags: needinfo?(MattN+bmo)

It seems like my old prefs.js file which was saving all my addons data as well as Firefox preferences got somehow wiped out and replaced with a totally new one, just like if I had refreshed Firefox (which I didn't do - plus, addons (except their data/prefs), bookmarks and history are still there).

Do you have a Firefox account? If so, based on that description, I would guess that a new Nightly profile was created (as per bug 1474285), resulting in prefs being wiped, and then addons, bookmarks and history were re-populated via sync.

Flags: needinfo?(n.nethercote)

(In reply to Nicholas Nethercote [:njn] from comment #18)

It seems like my old prefs.js file which was saving all my addons data as well as Firefox preferences got somehow wiped out and replaced with a totally new one, just like if I had refreshed Firefox (which I didn't do - plus, addons (except their data/prefs), bookmarks and history are still there).

Do you have a Firefox account? If so, based on that description, I would guess that a new Nightly profile was created (as per bug 1474285), resulting in prefs being wiped, and then addons, bookmarks and history were re-populated via sync.

Pretty sure that's never automatic, there would have been prompts.

Something has gone wrong here, we should figure out what it is. The problem is that there's currently no steps to reproduce and no papertrail or other evidence that points out exactly what has gone wrong, AFAICT.

(In reply to :Gijs (he/him) from comment #19)

(In reply to Nicholas Nethercote [:njn] from comment #18)

It seems like my old prefs.js file which was saving all my addons data as well as Firefox preferences got somehow wiped out and replaced with a totally new one, just like if I had refreshed Firefox (which I didn't do - plus, addons (except their data/prefs), bookmarks and history are still there).

Do you have a Firefox account? If so, based on that description, I would guess that a new Nightly profile was created (as per bug 1474285), resulting in prefs being wiped, and then addons, bookmarks and history were re-populated via sync.

Pretty sure that's never automatic, there would have been prompts.

(ie, you would have to manually set up sync for that to happen)

My addons are all there.

My prefs.js is from today. It is the only one there.

Whether those details are still germane to this thread as it has evolved I'm uncertain.

fwiw I do use an account and sync

Flags: needinfo?(jgaunt) → needinfo?(mconley)

Hey njn, are there ever occasions where we'll try to read in a prefs.js file, decide it's corrupt or something, and then throw it out and generate a new one with defaults? In those cases, do we stash a copy of the corrupt prefs.js somewhere? Or do we write log files somewhere about that?

Flags: needinfo?(mconley) → needinfo?(n.nethercote)

Also, fwiw I just realized my snoozed tabs were lost - maybe there's a common store for those and stylus styles

Hey njn, are there ever occasions where we'll try to read in a prefs.js file, decide it's corrupt or something, and then throw it out and generate a new one with defaults? In those cases, do we stash a copy of the corrupt prefs.js somewhere?

Yes and yes.

There is a method Preferences::MakeBackupPrefFile. It copies prefs.js to Invalidprefs.js.
https://searchfox.org/mozilla-central/rev/3e0f1d95fcf8832413457e3bec802113bdd1f8e8/modules/libpref/Preferences.cpp#4045.

MakeBackupPrefFile is only called by Preferences::ReadSavedPrefs in the case where prefs.js is present but could not be read successfully: https://searchfox.org/mozilla-central/rev/3e0f1d95fcf8832413457e3bec802113bdd1f8e8/modules/libpref/Preferences.cpp#4022. In that case, any prefs that were successfully read before the error occurred (e.g. before a parse error) will stay in memory and should be put back into prefs.js the next time it gets written out.

ReadSavedPrefs is only called Preferences::InitializeUserPrefs is called, at startup.

As far as I know, the internals of libpref hasn't change at all recently. I looked around a bit but couldn't see anything suspicious about how it's being used.

Flags: needinfo?(n.nethercote)

Josh, are you on Windows as well? And is there an "invalidprefs.js" file in your profile?

Enzo: same question about the invalid prefs file.

I wonder if the launcher process accesses prefs...

Flags: needinfo?(spidersouris)
Flags: needinfo?(jgaunt)

(In reply to :Gijs (he/him) from comment #25)

Enzo: same question about the invalid prefs file.

I do not have any invalidprefs.js file in my profile folder, nope.

Flags: needinfo?(spidersouris)

Yes, Gijs, I am also on Windows but no, the only .js file in my profile folder is prefs.js

Flags: needinfo?(jgaunt) → needinfo?(gijskruitbosch+bugs)

This is extremely puzzling.

So, what do we know so far?

  1. We have two Windows users who, it seems, have had their prefs.js destroyed and reset to default after an update
  2. There is no invalidprefs.js in the profile folder, which I think means that libpref didn't choke on some invalid prefs

Has something happened recently in Nightly on Windows that would cause the above during an update, or during the early start of Firefox after an update? rstrong, does any of this sound familiar?

Flags: needinfo?(robert.strong.bugs)

There isn't any code in the updater that touches profiles so it is very unlikely that app update could be doing this. Maybe there is something in the installer code that runs after an update that might cause the above so needinfo'ing mhowell.

Flags: needinfo?(robert.strong.bugs) → needinfo?(mhowell)

(In reply to Robert Strong (Robert he/him) [:rstrong] (use needinfo to contact me) from comment #29)

Maybe there is something in the installer code that runs after an update that might cause the above so needinfo'ing mhowell.

Don't think so, the installer PostUpdate stuff doesn't touch anything inside the profile.

(In reply to :Gijs (he/him) from comment #25)

I wonder if the launcher process accesses prefs...

There is a launcher process pref, but the handling for that doesn't seem to be doing anything odd?

Flags: needinfo?(mhowell)

I'm out of ideas on how to investigate further here, or what could be the cause. Without steps to reproduce, I'm not sure what else we can do.

Flags: needinfo?(gijskruitbosch+bugs)

That makes sense, :GIJs - any ideas for what I could do to help better capture the state of transition if I notice this happening again?

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Josh Gaunt [:jgaunt] from comment #32)

That makes sense, :GIJs - any ideas for what I could do to help better capture the state of transition if I notice this happening again?

Not really, except if you have a backup solution of sorts, the states of the prefs file before/after this kind of thing happens, and whether e.g. about:crashes shows us having a (shutdown?) crash when last closing Firefox, or something?

Flags: needinfo?(gijskruitbosch+bugs)
Priority: -- → P3

Bug 1550967 has the same basic symptoms - we've started getting a number of reports of strange sync behaviour in 67, which ends up being the services.sync.username pref being reset

See also bug 1555201.

Summary: search preferences, addon preferences, UI customization all reset to default in 67.0a1 → Preferences and UI customization all reset to default

Like bug 1555201. a number of the sync reporters also reported a power-loss or os-crash type event.

See Also: → 1556161
Summary: Preferences and UI customization all reset to default → All preferences lost after power loss
Severity: normal → critical
Keywords: dataloss, regression
OS: Unspecified → All
Priority: P3 → --
Hardware: Unspecified → All
Summary: All preferences lost after power loss → [regression] All preferences lost after power loss

I can't remember encountering such a problem before, at least not in the last 5 years, and the influx of reports suggests that it's not a coincidence, so it must be a recent code change.

The prefs-1.js name suggests this being deliberate, so whatever code does that, that's the cause. That should be checked first: Any code that tries to move prefs.js to a new name with a unique filename. This is what caused this bug.

I ran my trunk build with my normal use profile. Then, my cleaning robot dragged the power cable out of the wall socket, resulting in a power loss of the computer. I started it again, and started my trunk build.

Now, all my preferences were lost. Everything was back to the defaults. For me, that's catastrophic, because I made a number of critically important security-relevant changes in the profile that absolutely need to be honored, and were not honored, and all the tabs were restored without the prefs. That's a catastrophe from a security standpoint.

I checked the profile folder, and there was a prefs-1.js file which did contain my old preferences. The file was not corrupted. I could simply kill firefox, copy prefs-1.js back to prefs.js, start Firefox again, and everything is fine.

Obviously, that's
a) not something a normal user can do - for him, the loss of preferences is final. For a normal users, this is a data loss bug.
b) too late for me, because all my tabs were loaded without my prefs being honored.

Importance:
Data loss -> Critical

Note: This is not file corruption due to the power outage. The prefs.js file was just fine. This is home-made data loss.

Summary: [regression] All preferences lost after power loss → [regression] All preferences lost after power loss, OS crash, etc

i can fairly consistently reproduce this by running firefox in a windows vm and forcing an improper shutdown (simulating a power cut). after the next boot, the prefs.js file will still be in the profile folder but apparently corrupted/blank if you open it with notepad. launching firefox will regenerate the prefs.js file as it does on first run.
firefox 66 wasn't yet affected this way as far as i can tell...

Has STR: --- → yes

(In reply to [:philipp] from comment #45)

i can fairly consistently reproduce this by running firefox in a windows vm and forcing an improper shutdown (simulating a power cut). after the next boot, the prefs.js file will still be in the profile folder but apparently corrupted/blank if you open it with notepad. launching firefox will regenerate the prefs.js file as it does on first run.
firefox 66 wasn't yet affected this way as far as i can tell...

Maybe you can find a regression range with mozregression then?

Needinfo to njn in case he's not watching the component since I see no priority/assignee set.

Has Regression Range: --- → no
Flags: needinfo?(n.nethercote)
Flags: needinfo?(madperson)

mozregression doesn't work here unfortunately since the steps involve shutting down/powering off a vm. i tried to manually bisect the issue, which is a bit tedious but i think i first saw the issue in nightly build 20190130094929:

(i can't guarantee for the correctness of this result though, since the issue doesn't reproduce 100% reliably and there may be some other variations playing into this - sometimes windows performs some filecheck after a forced shutdown, at other times not for example...)

Flags: needinfo?(madperson)

Thanks for the info, Phillipp. I'm not aware of any recent (or even not-so-recent) changes to pref saving, and I don't see anything obvious in the regression range you posted. I will look into this, but I won't be able to get to it until after the Whistler work week, i.e. the week of June 24.

Flags: needinfo?(n.nethercote)
Priority: -- → P2

(In reply to [:philipp] from comment #45)

i can fairly consistently reproduce this by running firefox in a windows vm and forcing an improper shutdown

Me too. On about 6 tries, 3 caused these symptoms. In 2 of the bad cases, an update was staged (ie, hamburger had the "restart to update" widget) as I powered down, but at least once when that was the case I didn't experience the problem.

When the problem occurred, prefs.js existed after restarting and had the expected size - however, it was full of \0 chars. There was no other "prefs*.js" in the profile directory. Sadly I didn't take a copy of prefs.js immediately before the power-down, so I can't be sure whether the \0 filled file was created before or after the crash.

Sadly I'm also not going to have more time to dig for another week.

Mark Hammond wrote to me by email:

https://searchfox.org/mozilla-central/source/modules/libpref/Preferences.cpp#2968 is where it's written, via an "atomic output file" abstraction at https://searchfox.org/mozilla-central/source/netwerk/base/nsFileStreams.cpp#832. None of the code involved has changed in a long time

Ben wrote:
Ah, that "atomic write" might explain the prefs-1.js . Does the temp file code create "-1" filenames?

Mark wrote:
Yes. But it's not clear that this is actually the problem. Obviously a power-outage while writing that temp file will leave that file behind, but that shouldn't touch the non-temp file. We rely on the OS to perform the atomic rename. When I repro'd it, no -1 files were involved.

Ben responds:
I see. It's becoming clearer to me. Like others, I ran into a situation where my prefs.js was emptied, because of a write during the power outage. I just happened to be more lucky than others, that I had an infact working prefs-1.js left over.

For this to happen, the power outage would have to be at the very moment while the prefs.js is being written. Normally, that should be a very brief moment, and not happen often. It should definitely not happen while nobody is using the computer, because there would be no reason to save prefs. In my case, nobody was using the composer since about 1 hour when this happened.

So, one reason why this appears much more often recently might be that there is something that triggers prefs.js writes a lot lot lot more often than before. It would need to be dramatic increase in save actions for this sudden spike in cases.

prefs.js gets written more than you might think -- any time a pref changes, prefs.js will be saved within 500ms. This is an attempt to minimize the chance of data loss. This behaviour has been in place for a long time, probably a couple of years.

So it's plausible to me that once in a blue moon someone might get unlucky and have their system crash at just the wrong time during a prefs.js save. But I don't know why there is a recent increase in this happening.

I don't think this is related to user actions. It happened to me again today and it was again after my PC failed to wake up from hibernation. Maybe there is some watch that checks PC going to sleep and tries to save something?

On mozilla-inbound, 7ea334118c82e3609be2a1cfa3c5b867017c4937 is definitely bad and cd4c2708c960b0db005a5e26a73417cb66dd16ee has never failed for me after many many attempts.

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=cd4c2708c960b0db005a5e26a73417cb66dd16ee&tochange=7ea334118c82e3609be2a1cfa3c5b867017c4937

Regressed by Bug 1478124.

Regressed by: 1478124

Kris, any chance you can take a look and see how your changes may have affected pref saving? Thanks.

Flags: needinfo?(kmaglione+bmo)

Comparing prefs.js file access events between 66 and 67 using Process Monitor, the series of events and the frequency appear largely the same. However on closer inspection the time delay before the last CloseFile is now shorter than it used to be, before the first bad build it was 100-200ms but after it is 30-40ms (with a HDD). This confirms that Bug 1478124 has changed file access behavior in some way.

I can't think of a reason those changes would be relevant. My best guess is that it's just a coincidence and some timing changes make it more likely for those changes to be lost if there's filesystem corruption. The only way around it I can think of is to fsync after writing prefs files.

Flags: needinfo?(kmaglione+bmo)

Hm. Actually, it looks like we are supposed to be doing an fsync, but my registration conversation script chose the AtomicFileOutputStream constructor for the safe-file-output-stream contract ID.

Assignee: nobody → kmaglione+bmo

Thanks so much, Kestrel, for pin-pointing the exact change that caused it!

(In reply to Ben Bucksch (:BenB) from comment #60)

Thanks so much, Kestrel, for pin-pointing the exact change that caused it!

Glad to help. Phillipp's regression window in Comment 47 was already very close so it wasn't too much work narrowing it down from there.

More details about nsSafeFileOutputStream for those interested:

A safe file output stream that overwrites the destination file only once writing + flushing is complete. This protects against more classes of software/hardware errors than nsAtomicFileOutputStream, at the expense of being more costly to the disk, OS and battery.

Several years ago, I noticed nsSafeFileOuputStream was not as safe as the comment suggested.

I can only recall a very vague memory of the issue: that could have been already fixed. I think it was an issue of either failing to check the return value of write() call that can return an ENOSPC under linux indicating that the file system is full and writing cannot proceed.
However, back then nsSafeFileOutputSream ignored the return value of write() during the write operation and only check
the return value of close(). close() does NOT return error in this case.
Thus, the nsSafeFileOutputStream prteneded that it copied the file intact, but was completely in error.
The issue could have been fixed since now the comment mentions ": flushing": buffering makes the situation more complex since the error only appears when the flushing happens. Now a file close operation ( a la fclose() ) can return ENOSPC. because it involves writing the unflushed data if any.
My memory is hazy and the problem I mentioned could have been with nsAtomicFileOuptutStream.

But if unbuffered output is used by either of the routine still today, we should check the error value of each write() operation because, like I mentioned above, close() may not return the ENOSPC error condition at all while write() does in a full file system.

I noticed the problem with nsSafeFileOutputSream when I went over the file I/O code for TB and created a patch set for using buffered I/O output for TB. In doing so, I noticed the sloppy coding that did not check the return value of write() or close() operation at all and tried to fix those instances.
Currently TB writes 75 character lines without buffering at all. (Typically mime encoded photo, etc. has many lines of 75 octets.)
I have not been able to land the patch set yet due to the sheer size of the patch and due to the subtle semantic difference of file operation under Windows and POSIX (linux/UNIX, etc.) such as we cannot rename/remove files while they are open under Windows while it is OK to rename/remove files under linux, etc. I test locally under linux and only find a problem under windows after a extensive change of the source tree.
The sheer size alone makes it difficult to keep the patch alive in sync with the latest source tree. Recent clang-format change does nto make things easiler a bit for now. :-( I have begun progressing again to keep the patch in sync with the latest source: however, I have to figure out why patch set causes about 30 errors out of 1000+ tests of TB's |make mozmil| test. This started to happen in April-July time frame last year.
Until then, the patch worked beautifully...

Anyway, I would certainly caution anybody to rely on the nsSafeFileOutputStream without checking its code carefully. The problem was definitely there. It could have been fixed in the intervening years.

PreferencesWriter::Write will query an nsISafeOutputStream implementation from nsBufferedOutputStream:
https://searchfox.org/mozilla-central/rev/b3b401254229f0a26f7ee625ef5f09c6c31e3949/modules/libpref/Preferences.cpp#3002-3003,3032

nsSafeFileOutputStream::Finish calls Flush probably because of ensuring the disk write before renaming operation:
https://searchfox.org/mozilla-central/rev/b3b401254229f0a26f7ee625ef5f09c6c31e3949/netwerk/base/nsFileStreams.cpp#893

But nsBufferedOutputStream::Flush does not call Flush on the underlying stream:
https://searchfox.org/mozilla-central/rev/b3b401254229f0a26f7ee625ef5f09c6c31e3949/netwerk/base/nsBufferedStreams.cpp#944-972

So sync will not be called before renaming to the final filename.
https://searchfox.org/mozilla-central/rev/b3b401254229f0a26f7ee625ef5f09c6c31e3949/netwerk/base/nsFileStreams.cpp#229

It will explain comment #49 (\0 filled file.)

Ah, sorry, I missed that nsBufferedOutputStream has its own Finish implementation:
https://searchfox.org/mozilla-central/rev/b3b401254229f0a26f7ee625ef5f09c6c31e3949/netwerk/base/nsBufferedStreams.cpp#976

Please disregard the previous comment.

BTW, I offered this observation. I didn't think it was relevant when I was cc'ed on this bug, but now come to think of it, it does seem relevant.

In the last few weeks, immediately after an upgrade of FF, I ** was offered ** to clean up my profile out of blue (!)
I was in the middle of some server setup maintenance and needed to search some esoteric operation of a server software quickly and
without much thinking I hit [OK], which was basically a disaster for me as well.
** BUT AT LEAST, I WAS OFFERED THE DIALOG, and I ACCEPTED IT. **
So I am not complaining it here, and I thought it was irrelevant to this bug, but now I have this big question/observation.

Could it be that somehow the reporters of the symptom somehow was hidden from the purported popup dialog?
(I have noticed over the years that Thunderbird often failed to show the popup dialog in the foremost position of various windows, i.e., the popup somehow is hidden behind the main window or some other windows, hard to describe, and I have no clue sometimes why the operation is not invoked, or for that matter, why the main window is not responsive because thunderbird is waiting for my input to the hidden dialog.
Since thunderbird shares much of the basic operation with FF, I bet there are cases where the popup may not be shown at the foremost position.)

Of course, people including myself would like to know

  • why TB suddenly asks for cleanup of profile after the upgrade (is it intentional?), or
  • even if the popup is possibly hidden, it does not explain why the user interaction somehow is accepted as OK to proceed with such proposal of profile cleanup.

Just an observation.

  • why TB suddenly asks for cleanup of profile after the upgrade (is it intentional?), or
    I meant to say why FF suddenly asks ...
  • even if the popup is possibly hidden, it does not explain why the user interaction somehow is accepted as OK to proceed with such proposal of profile cleanup.

Chiaki, I don't think the profile cleanup is important to this bug, as most people ran into it without cleanup. Could you please post elsewhere in another bug?

(In reply to Ben Bucksch (:BenB) from comment #67)

Chiaki, I don't think the profile cleanup is important to this bug, as most people ran into it without cleanup. Could you please post elsewhere in another bug?

OK, I will.
TIA.

Flags: needinfo?(jaws)

(In reply to ISHIKAWA, Chiaki from comment #68)

(In reply to Ben Bucksch (:BenB) from comment #67)

Chiaki, I don't think the profile cleanup is important to this bug, as most people ran into it without cleanup. Could you please post elsewhere in another bug?

OK, I will.
TIA.

I filed Bug 1560334
RFC: Please disable unsolicited Firefox profile cleanup unsolicited dialog - Less value for more trouble.

(In reply to Ben Bucksch (:BenB) from comment #50)

Mark Hammond wrote to me by email:

https://searchfox.org/mozilla-central/source/modules/libpref/Preferences.cpp#2968 is where it's written, via an "atomic output file" abstraction at https://searchfox.org/mozilla-central/source/netwerk/base/nsFileStreams.cpp#832. None of the code involved has changed in a long time

While we are at this, we should probably do a "boy scout" clean up of the following part.
Otherwise, the corruption may never be reported to a sizable portion of users who experienced the problem
(possibly with broken file system to begin with).

https://searchfox.org/mozilla-central/source/modules/libpref/Preferences.cpp#2995
I put a * mark at the beginning : the error value of outStream need to be checked and if the error was noticed then that has to be taken care of then and there.

// Sort the preferences to make a readable file on disk.
aPrefs.Sort(CharComparator());

// Write out the file header.
  • outStream->Write(kPrefFileHeader, sizeof(kPrefFileHeader) - 1,
    &writeAmount);

    for (nsCString& pref : aPrefs) {

  •  outStream->Write(pref.get(), pref.Length(), &writeAmount);
    
  •  outStream->Write(NS_LINEBREAK, NS_LINEBREAK_LEN, &writeAmount);
    

    }

    // Tell the safe output stream to overwrite the real prefs file. <--- This is too late if errors occur above.
    // (It'll abort if there were any errors during writing.)
    nsCOMPtr<nsISafeOutputStream> safeStream = do_QueryInterface(outStream);
    MOZ_ASSERT(safeStream, "expected a safe output stream!");
    if (safeStream) {
    rv = safeStream->Finish();
    }

Changing the stream to safeStream type and expects it to work miracles after the fact won't cut it
if the earlier error prevented the data to be written in the first place.

TIA

Reproduction here is:

  1. Firefox runs
  2. Power off computer without shutting down, e.g. by plugging power cable

Result in Firefox 67 and without this patch:
Profile corrupted, preferences file is corrupted
Result in Firefox 66:
Profile works, preferences file is OK
Result in with this patch:
Profile works, preferences file is OK

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69

[Tracking Requested - why for this release]:

This is a data loss bug, it'd be nice to get this out into a point release, especially given that the fix is a one liner.

(In reply to Mark Hammond [:markh] from comment #77)

(In reply to Asif Youssuff from comment #76)

especially given that the fix is a one liner.

I gotta say though, the fact that there's an 'nsAtomicFileOutputStream' class (which turns out to apparently not be atomic for this use-case) and the 'nsSafeFileOutputStream' which should be used here seems a footgun that might bite elsewhere?

It doesn't seem to be used much outside of nsSafeFileOutputStream: https://searchfox.org/mozilla-central/search?q=nsAtomicFileOutputStream&path= .

(In reply to Laurentiu Nicola from comment #78)

It doesn't seem to be used much outside of nsSafeFileOutputStream: https://searchfox.org/mozilla-central/search?q=nsAtomicFileOutputStream&path= .

That's what I mean - should it exist?

[Tracking Requested - why for this release]:

This is a data loss bug, it'd be nice to get this out into a point release, especially given that the fix is a one liner.

Please don't mess with tracking flags...

(In reply to Asif Youssuff from comment #80)

[Tracking Requested - why for this release]:

This is a data loss bug, it'd be nice to get this out into a point release, especially given that the fix is a one liner.

The bug was already tracked for 68 (the + state)! You have just turned that off and re-requested release management to consider it. Please stop adjusting these flags!

(In reply to Ben Bucksch (:BenB) from comment #73)

Comment on attachment 9073110 [details]
Bug 1530660: Fix registration for nsSafeFileOutputStream. r=erahm

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This is a dataloss bug and regression that many users ran into. We got many reports from reddit with the same symptoms. The dataloss reports spiked when the regression was shipped in Firefox 67.
  • User impact if declined: * User loses all preferences.
  • Even security-relevant preferences are reset, which can expose certain users
  • Fix Landed on Version: 69
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch restores the state of Firefox 66, as far as we understand
  • String or UUID changes made by this patch: -

This was not the right flag to set.

Comment on attachment 9073110 [details]
Bug 1530660: Fix registration for nsSafeFileOutputStream. r=erahm

Beta/Release Uplift Approval Request

  • User impact if declined: Users could lose all their prefs in event of a power loss or crash.
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Trivial, one-liner patch.
  • String changes made/needed: None
Attachment #9073110 - Flags: approval-mozilla-esr68? → approval-mozilla-beta?

(In reply to Aaron Klotz [:aklotz] from comment #82)

The bug was already tracked for 68 (the + state)! You have just turned that off and re-requested release management to consider it. Please stop adjusting these flags!

Honest mistake, maybe next time try to be a little kinder or more positive in correcting my error. The documentation on tracking flags did not make it clear what + meant: https://wiki.mozilla.org/BMO/UserGuide#Tracking_Flags - I realize now that I should have looked at the linked page in addition.

Comment on attachment 9073110 [details]
Bug 1530660: Fix registration for nsSafeFileOutputStream. r=erahm

dataloss fix, approved for 68.0b14

Attachment #9073110 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1562047
No longer depends on: 1562047
See Also: → 1562047

(In reply to ISHIKAWA, Chiaki from comment #70)

While we are at this, we should probably do a "boy scout" clean up of the following part.

Could you file a follow-up to cover this part?

Flags: needinfo?(ishikawa)
Flags: qe-verify+

(In reply to :Gijs (he/him) from comment #89)

(In reply to ISHIKAWA, Chiaki from comment #70)

While we are at this, we should probably do a "boy scout" clean up of the following part.

Could you file a follow-up to cover this part?

I am in the middle of building TB using a new source tree updated last Friday or so.
I have encountered a few problems.
https://bugzilla.mozilla.org/show_bug.cgi?id=1562504

Once I get rid of the build issue, I will prepare a patch or something.

I will keep the needinfo uncleared until I can come back with a patch.

TIA

I managed to reproduce the issue on Firefox 67.0a1 (2019-02-25), under Windows 10x64 with force shut down.
The issue is no longer reproducible on Firefox 68.0b14, or on Firefox 69.0a1 (2019-06-30).
Tests were executed under Windows 10x64, macOS 10.11.6 and Ubuntu 16.04x64.

Note that I wasn't able to reproduce the issue by crashing Firefox.
I was focused on bookmarks, history, title bar, menu bar, addons/themes and customization.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

hi, i don't know if adding a comment here is the right thing to do.
anyway, i'm on debian, firefox 74.0.1 and after a power loss i've a corrupt logins.json and i've lost all my passwords and credentials.
firefox preview on my android device appeared to have lost them as well somehow. after trying to sync a few times i got them back, but now (i don't know if that's normal) i have duplicates whenever i add back any credential from desktop.

i hope it's clear what i'm saying.

(In reply to ISHIKAWA, Chiaki from comment #90)

(In reply to :Gijs (he/him) from comment #89)

(In reply to ISHIKAWA, Chiaki from comment #70)

While we are at this, we should probably do a "boy scout" clean up of the following part.

Could you file a follow-up to cover this part?

I am in the middle of building TB using a new source tree updated last Friday or so.
I have encountered a few problems.
https://bugzilla.mozilla.org/show_bug.cgi?id=1562504

Once I get rid of the build issue, I will prepare a patch or something.

I will keep the needinfo uncleared until I can come back with a patch.

TIA

Finally, the source tree has been in a good shape, TB's transition to mochitest from |make mozmill| has settled down (it is more than a year), and althought valgrind cannot run TB's mochitest at all, at least valgrind can run TB's xpcshell tests adequately and I could find some extra bugs thansk to it.
I am trying to clean up my local patch queue in various ways.
Finally, I have time to pay attention to this issue.
And the situation is bad. Not only write() is not checked during the the writing.
Despite the write() uses buffering, the final Flush() return value is not checked.
It is exactly when an error may be encountered.
All so called nsSaveOutputStream::Finish() is trying to make sure the final renaming will be OK, but I doubt if that code is correct after reading the
user experiences reported here.
Anyway, I will try to at least check the failure of write() and Flush() and return the value to callers.
That is a start.
I will file a new bug by cloning this bug.

Flags: needinfo?(ishikawa)
Blocks: 1700420

FWIW, I had suffered from this bug (comment 43 ff). However, since the fix here landed, I had not seen the bug anymore. I had several power outages since then, even one today, but had not suffered dataloss anymore.

The fix here helped a lot. Thank you, Mark Hammond, Kestrel (regression window), Kris Maglione :kmag (actual patch), Chiaki, and everyone else involved.

Maybe the fix was not perfect and can be made even safer, I cannot speak to that. I'm just saying that I appreciate the relatively quick response back then, and the fix here helped a lot. It probably prevented a lot of data loss in the last year for me. Thank you!

(In reply to Ben Bucksch (:BenB) from comment #94)

FWIW, I had suffered from this bug (comment 43 ff). However, since the fix here landed, I had not seen the bug anymore. I had several power outages since then, even one today, but had not suffered dataloss anymore.

The fix here helped a lot. Thank you, Mark Hammond, Kestrel (regression window), Kris Maglione :kmag (actual patch), Chiaki, and everyone else involved.

Maybe the fix was not perfect and can be made even safer, I cannot speak to that. I'm just saying that I appreciate the relatively quick response back then, and the fix here helped a lot. It probably prevented a lot of data loss in the last year for me. Thank you!

Great to hear your experience.

I filed bug 1700420 because there was something fishy in the code.
As it turned out my initial doubt was a bit misguided, Preferences::Write() is not probably buggy.
But as I waded through the code, I found one place where
a runtime error (due to, who knows, maybe a broken file system, temporal network outage when someone uses a remote filesystem, or maybe a program error due to interfering threads, etc. ) is not handled when the last flushing of a pending write accumulated in a write buffer is done.
That is in |nsSafeFileOutputStream::Finish()| of all the places.
That needs to be fixed.

From what I hear from you, I suspect then the majority of the problems that happen AFTER the write completes successfully are gone. So I don't have to worry about it any more probably. That IS great. (What happens AFTER the successful write to a temporary file is something like
copying of the original to a different name (prefs-1.js, I think), renaming the new temporary file (that has just been written) to prefs.js, and
after verifying the successful renaming, the renamed original is removed, etc. )

I need TB to work flawless and robustly to handle many mails for an annual exhibition and technical conference paper submissions (and lots more).
I like its powerful filtering and searching feature.
So, I am trying to make it as robust as possible by occasional patch submission. (TB once ate about half of my input folder during compaction. That prompted me to look into the code and send patches.)
Great to hear that patches from programming community helps users.

Happy Hacking.

Some of my addons' settings got reset after I had a memory corruption BSoD. My firefox profiles are on the ssd. I'm using the ESR version though.

Has Regression Range: no → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: