Closed Bug 477934 Opened 11 years ago Closed 11 years ago

nsSafeFileOutputStream (prefs.js) not safe from system crashes

Categories

(Core :: Networking: File, defect, critical)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: rideau3, Assigned: timeless)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss, fixed1.8.1.22, verified1.9.1)

Attachments

(1 file, 1 obsolete file)

673 bytes, patch
Biesinger
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121622 Ubuntu/8.10 (intrepid) Firefox/3.0.5
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121622 Ubuntu/8.10 (intrepid) Firefox/3.0.5

If the OS crashes or processing otherwise terminates, prefs.js has a tendency to get corrupted, and Firefox just resets the whole thing. Unfortunately, the first thing I do upon startup is load firefox, so I keep forgetting to check to see whether any corruption has occurred, so I can't produce the corrupted version.

Reproducible: Sometimes

Steps to Reproduce:
1. Crash the OS (or hold down the power button)
2. Load Firefox.
Actual Results:  
Many preferences have disappeared, including all my NoScript & Chatzilla options. These are all stored in prefs.js

Expected Results:  
Firefox should not lose my preferences.

I'm on ext4, so a filesystem bug could be the culprit, but the journal should prevent any serious damage to the files. So Firefox must not be writing to the file all at once and have no backups.
Version: unspecified → 3.0 Branch
Update: I managed to look at prefs.js this time, and the file was completely empty, rather than corrupted.

This would imply the failing is somewhere between the fopen call that resets the file, and the actual writing to the file.

The only solution I can think of is to write to a backup file after writing to the main prefs.js, so that if prefs.js is lost, then backup can be used to restore prefs.
See Bug 246722 Comment #2 for history of backups.number_of_prefs_copies.
It appears to be a conflict between the way ext4 handles delayed allocation, and the way Firefox writes files. More information at https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/88
Component: Preferences → Networking: File
Keywords: dataloss
Product: Firefox → Core
QA Contact: preferences → networking.file
Summary: prefs.js not crash-safe → nsSafeFileOutputStream (prefs.js) not safe from system crashes
Version: 3.0 Branch → Trunk
Attached patch flush firstSplinter Review
in theory this should fix it....

otoh, i don't intend to setup linux+ext4 and then intentionally cause my system to crash just to test it.

note that i'm not testing the return value from Flush(). It could be tested, but i'm not sure what's gained from doing that.

Either the user wants a chance to get their new data, or they don't.
Assignee: nobody → timeless
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #367003 - Flags: review?(cbiesinger)
Attachment #367003 - Flags: review?(cbiesinger) → review+
http://hg.mozilla.org/mozilla-central/rev/3f38a00b89cd
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Duplicate of this bug: 318156
Attachment #367003 - Flags: approval1.9.1?
Comment on attachment 367003 [details] [diff] [review]
flush first

This would also make Session Restore more reliable (cf. bug 406378).
Duplicate of this bug: 406378
Comment on attachment 367003 [details] [diff] [review]
flush first

a191=beltzner
Attachment #367003 - Flags: approval1.9.1? → approval1.9.1+
Keywords: checkin-needed
This checkin was in a range identified with a Ts Shutdown regression on Windows:

Regression: Ts Shutdown increase 27.64% on WINNT 5.1 Firefox3.5
   Previous results:
       357.263 from build 20090506145316 of revision 486b76052a94 at 2009-05-06 14:53:00
   New results:
       456.0 from build 20090506155401 of revision 7aa4483585bd at 2009-05-06 15:54:00
http://graphs-new.mozilla.org/graph.html#tests=[{"machine":32,"test":36,"branch":3},{"machine":33,"test":36,"branch":3},{"machine":34,"test":36,"branch":3},{"machine":35,"test":36,"branch":3},{"machine":48,"test":36,"branch":3}]&sel=1241564040,1241736840
Backed out from trunk and branch. If there's no fix for the perf regression and it's considered less severe than the dataloss issue, you might want to reland this as is on trunk and re-request branch approval.
Status: RESOLVED → REOPENED
Keywords: fixed1.9.1
Resolution: FIXED → ---
Attachment #367003 - Flags: approval1.9.1+
So, if it's supposed to be a safe output stream, and we are using it as such, and it's not actually safe, it seems to be a serious issue.  This might just be a performance regression we need to eat.
Flags: blocking1.9.1?
Attached file (obsolete) —
test comment
Attachment #376378 - Flags: approval1.8.1.next?
Attachment #376378 - Attachment description: test description →
Attachment #376378 - Attachment is obsolete: true
Attachment #376378 - Attachment is patch: false
Attachment #376378 - Flags: approval1.8.1.next?
dao: so um. this is a mandatory flush. basically anytime we're seeing a perf penalty here, it's because the system was planning to lose our data if it died.

we're paying for an essential data flush.

if someone wants to rework the flush so that it happens on a thread while we shut down, well, in theory, that might be possible, but given the totally broken state of unix platforms, you probably wouldn't see a win for a while anyway. (oh, and the complexity of such a thing would be amazingly painful, you'd have to create a queue and make sure that renames cooperated sequentially and that you shut down the flusher before you quit.)

now, i don't care, but if you're going to back it out, then i hope that you'll also reland it.

i'm traveling for business and am not going to campaign to save your [firefox] users' data.

to be perfectly frank, i'm quite happy for everyone else to lose their data. I've done my part by providing the necessary fix.
Assignee: timeless → dao
Status: REOPENED → ASSIGNED
Keyword: relnote. Assuming this is not fixed for 3.5, here is the proposed release note:

"Users on the Linux platform may randomly experience loss of preferences when Firefox shuts down. Plan accordingly."
Keywords: relnote
(In reply to comment #16)
> "Users on the Linux platform may randomly experience loss of preferences when
> Firefox shuts down. Plan accordingly."

It's certainly not random. It's a compounding effect from a crash.
(In reply to comment #16)
This actually affects all platforms, not only preferences (e.g. sessionstore.js as well) and only the case of a system crash.
OS: Linux → All
Hardware: x86 → All
Assignee: dao → nobody
Status: ASSIGNED → NEW
Dataloss vs. TsShutdown ... hard call, but I think the arguments in here have swayed me to believe that this is more important.

Please reland on trunk and branch.
Flags: blocking1.9.1? → blocking1.9.1+
Assignee: nobody → timeless
Keywords: checkin-needed
Flags: wanted1.9.0.x?
Keywords: relnote
Re-landed on trunk and 1.9.1.

http://hg.mozilla.org/mozilla-central/rev/6bb086bb8b90
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/5bb3fc3ab02f
Status: NEW → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
Would this be portable to 1.8 for the next Tb 2.x update ?
Flags: wanted1.9.0.x? → wanted1.8.1.x?
I suspect this wants to actually block 1.8.1.next to help Thunderbird profile lossage issues; nominating.
Flags: blocking1.8.1.next?
Oups
Flags: wanted1.9.0.x?
This is coming in too late to block for 1.8.1.22 -- we don't have time to hassle this into the tree. If you want it fixed now rather than two-three months from now we'll take the patch if you request approval and can promise to land it on the 1.8 branch _TODAY_, Friday May 29.
Flags: wanted1.8.1.x? → wanted1.8.1.x+
Comment on attachment 367003 [details] [diff] [review]
flush first

Approved for 1.8.1.22, a=dveditz
Attachment #367003 - Flags: approval1.8.1.next+
Checking in netwerk/base/src/nsFileStreams.cpp;
/cvsroot/mozilla/netwerk/base/src/nsFileStreams.cpp,v  <--  nsFileStreams.cpp
new revision: 1.72.8.2; previous revision: 1.72.8.1
done
Flags: blocking1.8.1.next?
Keywords: fixed1.8.1.22
Depends on: 496825
Verified fixed on trunk and 1.9.1 based on code check-in.

I tried it also with the steps given by the reporter and didn't loose settings. Reporter, can you verify the fix too?
Status: RESOLVED → VERIFIED
This change has resulted in a fairly good chance of triggering a NS_ERROR_FILE_ACCESS_DENIED exceptions when nsSafeFileOutputStream::Finish() is called on my home PC.  The problem is that the Flush() is triggering Norton Internet Security 2009 (and presumably Norton Antivirus 2009) to scan the file which can result in the temporary file being locked when the move occurs, resulting in the changes being lost.

This is resulting in frequent loss of data when saving preferences, installing add-ons, etc.

See bug 496825
Bug 496825 Comment 15 makes me think that we need to get in touch with Symantic.  Perhaps Sam has some insight here?
I contacted them, as I said in that bug... Michael's comment was split off to a new bug, which is where we're tracking that (potential) regression.
Flags: wanted1.9.0.x?
You need to log in before you can comment on or make changes to this bug.