Closed Bug 464486 Opened 16 years ago Closed 15 years ago

lots of bookmark corruption reported via user support

Categories

(Firefox :: Bookmarks & History, defect, P1)

defect

Tracking

()

VERIFIED FIXED
Firefox 3.6a1

People

(Reporter: dietrich, Assigned: dietrich)

References

Details

(Keywords: dataloss, fixed1.9.0.11, verified1.9.1)

Attachments

(1 file, 1 obsolete file)

      No description provided.
Severity: normal → critical
Flags: blocking-firefox3.1?
Keywords: dataloss
Priority: -- → P1
Target Milestone: --- → Firefox 3.1
according to cww:

various reported problems:
- history not working, but bookmarks are
- unfiled bookmarks folder missing (or other root folder)
- random chunks of bookmarks missing

the json backups are valid, and can be used to workaround the problem.
cww also said many people are saying this occurs in conjunction with power loss or sudden system restarts.
Useful information to know:
value of toolkit.storage.synchronous
OS version
Actually, it just occurred to me that one reason we may see a spike is if the force restart that happens with an OS patch may be at fault.  This is purely conjecture.
Also, a common symptom is still toolbar buttons not working AND awesomebar not working AND bookmarks not saving.  (i.e. places.sqlite is so corrupted that it's not letting Firefox save anything to it.)

This bug does NOT cover the other two causes for that problem which we exclude when we get data from users:

1) Norton 360
2) On Linux, out of date sqlite
(In reply to comment #5)
> Also, a common symptom is still toolbar buttons not working

can you provide more information on this? which toolbar buttons? all of them?
In bug 421482, sqlite's SYNCHRONOUS pragma was changed from FULL to NORMAL, which can provide opportunity for corruption in power-outage and hard-restart scenarios.
(In reply to comment #6)
Back/forward/home/that dropdown next to back/forward.  Basically any button that would cause a write to places.sqlite
Definitely need to understand both the scope (in terms of number of users affected) and cause better here before we have confidence in shipping 3.1
Flags: blocking-firefox3.1? → blocking-firefox3.1+
Pretty sure I just hit this on 3.1 nightly (Mac).

Observations:
 - Back/Forward work for me
 - Unfiled bookmarks and bookmarks toolbar gone
 - No toolkit.storage.synchronous pref

What else do you need?
My history is intact, and the "recent tags" menu under bookmarks has the right tag entries, but all of the tag-specific sub-menus come up empty.
i think we need to do two things here:

1. set the SYNCHONOUS pragma back to FULL, ensuring data integrity for places.sqlite

2. we currently only backup the db and restore bookmarks if the connection fails for sqlite-detectable corruption. we should instead do this for other possible failures as well (but not for file-locked, see bug 414715).
(In reply to comment #11)
> My history is intact, and the "recent tags" menu under bookmarks has the right
> tag entries, but all of the tag-specific sub-menus come up empty.

Your scenario does not sound like the same problem as the earlier reports (where back/forward are disabled, etc). Those reports indicate that nsNavHistoryService cannot init.

In your database, bookmarks have been removed, but history and tags have not. I found a codepath that will result in this same database state, and filed bug 464767 for it.
From last week's live chat reports on support.mozilla.com, 3 people confirmed that renaming places.sqlite and places.sqlite-journal fixed the issue with no bookmarks displayed, back/forward disabled, home page not displayed, etc.  Additionally, 3 more confirmed that renaming both files fixed an issue with some bookmarks unable to be deleted, changed, or added - with other bookmarks working normally.

Two people with the first issue (back/forward broken, etc) who left live chat before renaming places.sqlite did have three extensions in common - Yahoo Canada Toolbar, AIM Toolbar, and Fast Video Downloader.  Has anyone who has seen this issue used any of those extensions?
Assignee: nobody → dietrich
(In reply to comment #12)
> i think we need to do two things here:
> 
> 1. set the SYNCHONOUS pragma back to FULL, ensuring data integrity for
> places.sqlite
> 
> 2. we currently only backup the db and restore bookmarks if the connection
> fails for sqlite-detectable corruption. we should instead do this for other
> possible failures as well (but not for file-locked, see bug 414715).

there's not conclusive evidence about power failures or restarts, so it's premature to implement #1 above. the first step will be to implement #2 above, to at least allow the backups to kick-in in this situation.
Depends on: 414715
(In reply to comment #15)
> there's not conclusive evidence about power failures or restarts, so it's
> premature to implement #1 above.

I don't agree, if sqlite developers choosed to make FULL the default they probably have better knowledge than us on their code, and we should trust them. I see that we don't have evidence of power failures, but it's also true users often don't report such events, or that a crash could cause similar situations.
I think taking both can't hurt us, if it will we can go back.

about point 2, what kind of failures do you think makes sense to detect without increasing our Ts? Do you mean detecting other sqlite errors on connection creation?
(In reply to comment #16)
> (In reply to comment #15)
> > there's not conclusive evidence about power failures or restarts, so it's
> > premature to implement #1 above.
> 
> I don't agree, if sqlite developers choosed to make FULL the default they
> probably have better knowledge than us on their code, and we should trust them.
> I see that we don't have evidence of power failures, but it's also true users
> often don't report such events, or that a crash could cause similar situations.
> I think taking both can't hurt us, if it will we can go back.

yes, synchonous=FULL can hurt us on performance. however, the temp tables will mitigate this, and resolving the dataloss issue should be prioritized regardless.
> 
> about point 2, what kind of failures do you think makes sense to detect without
> increasing our Ts? Do you mean detecting other sqlite errors on connection
> creation?

yes, any error that could be returned when opening a connection. we explicitly handle connection error due to corruption, and will explicitly handle when the file is locked when bug 414715 lands, but any other possible error code we do not handle, and the service fails to initialize.
Attached patch v1 (obsolete) — Splinter Review
i'll post to dev.platform to discuss this. also, this should land after bug 414715, as we want to handle the locking error differently.
Attachment #350242 - Flags: review?(sdwilsh)
Whiteboard: [has patch][needs review sdwilsh]
Comment on attachment 350242 [details] [diff] [review]
v1

>+  if (!NS_SUCCEEDED(rv)) {
Just use NS_FAILED please

Have you pushed this to the try server to see how that (lying) talos does?
Attachment #350242 - Flags: review?(sdwilsh) → review+
Whiteboard: [has patch][needs review sdwilsh] → [has patch][has review][can land]
Target Milestone: Firefox 3.1 → Firefox 3.1b3
Not really relevant to this bug but if people are seeing _all_ bookmarks lost from their bookmarks toolbar, they may need to reset localstore.rdf and it's not this issue.
won't this patch threat LOCKED case as an error and try to restore?
(In reply to comment #21)
> won't this patch threat LOCKED case as an error and try to restore?

yes, needs to land after bug 414715.
Whiteboard: [has patch][has review][can land] → [has patch][has review][land after dependent bug]
Adding common-issues+ keyword for tracking.
Keywords: common-issues+
i think a catch-all errors if is not good at all here and could cause us to try to restore bookmarks on errors that are completely unrelated to places.sqlite itself, overwriting or destroying (if the backup is not good) user's bookmarks... see for example the cases we have found for a locked file, we can end up with a locked, busy, or ioerr error for the same issue... So my fear is that places.sqlite is completely valid, but due to another OS error we will still replace it.

these are actually the errors we could receive:

NS_ERROR_FILE_CORRUPTED
in this case we should replace for sure, the db file is corrupt or is not a db.

NS_ERROR_FILE_ACCESS_DENIED
in this case the database has wrong permission or OS doesn't allow access to it, this sounds to me as something that should be seen as LOCKED on our side, moreover we won't probably be able to remove the file.

NS_ERROR_STORAGE_BUSY
in this case the db is locked

NS_ERROR_STORAGE_IOERR
in this case the filesystem has returned an error, the db could be locked

NS_ERROR_FILE_IS_LOCKED
this is a bit cucumbersome since it's the storage translation for "SQLITE_LOCKED /* A table in the database is locked */" we consider the db locked even if probably that's unrelated

NS_ERROR_FILE_READ_ONLY
the db is read only, as above this sounds to me as a locked case and we won't be able to replace the db.

NS_ERROR_FILE_NO_DEVICE_SPACE
In this case again we should consider the db locked, not corrupt.

NS_ERROR_OUT_OF_MEMORY
again, there's no point in replacing the db

NS_ERROR_UNEXPECTED
well, this is unexpected

NS_ERROR_ABORT
we aborted

NS_ERROR_FAILURE
unknown error, really, could be a new sqlite error that storage is not translating for us, so replacing here could not be the right choice

So to me the only valid case to rebuild the db appear to be CORRUPT.
And many other cases should be added to the LOCKED if.
Whiteboard: [has patch][has review][land after dependent bug] → [has patch][has review][see comment 24]
Whiteboard: [has patch][has review][see comment 24] → [needs new patch]
per irc and the previous comment, going to take the pragma setting change, but not other change, as it will cause history to be lost unnecessarily in many cases.
Status: NEW → ASSIGNED
Attached patch v1.1Splinter Review
same as v1 (already reviewed) minus error catching part, with my r+

This can land, hwv would be better land on closed or metered tree to catch eventual perf regressions.
Attachment #350242 - Attachment is obsolete: true
Attachment #353211 - Flags: review+
Whiteboard: [needs new patch] → [has review]
Blocks: 470270
Whiteboard: [has review] → [push on a metered tree to catch eventual perf regressions]
No longer blocks: 470270
the tree is quiet, checked in:

http://hg.mozilla.org/mozilla-central/rev/7c47afbd5675

will close if perf stats check out ok.
Whiteboard: [push on a metered tree to catch eventual perf regressions] → [landed, watching for perf regressions]
Target Milestone: Firefox 3.1b3 → Firefox 3.2a1
graphs look good, closing.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [landed, watching for perf regressions]
Whiteboard: [baking on trunk]
Hello! This is marked resolved fixed, likely by the same individual that caused this problem.  

My problem is when I make any changes to the bookmarks the change is undone when firebox is restarted.

The so called fix you suggest does not work, becuase you have attempted to place different problems under one umbrella and pronouce them cured.  Happy you are not in the medical field.
Steve, you should have a look at the build you are using. This bug is fixed on trunk, means for the over next release of Firefox 3.2. There was no check-in for Firefox 3.1 yet.

If you wanna test this you should download one of the following builds, make a copy of your profile and run a test with the downloaded build:

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/01/2009-01-04-03-mozilla-central/
(In reply to comment #33)
> My problem is when I make any changes to the bookmarks the change is undone
> when firebox is restarted.

check importBookmarksHTML in about:config and set it to false.
on branch: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/f2a6d3e42e9c
Keywords: fixed1.9.1
Whiteboard: [baking on trunk]
Dear Henrick,

I really appreciate your courtesy, thanks.  When will 3.2 be released and is it in Beta?

Unfortunately NIS 2009 adds a boot file which became corrupted so I had to do a system recovery and the files I saved to do a restore will not work because of a HP bug. (Norton tech support was no help).

I had version 3.0.5 and when I checked on updates then and now it shows 3.0.5 as the latest version.  I was able to back up my favorites, but cannot get them to restore.

Amazing how yesterday's problems seem so small.

Best regards,

Steve
Steve, sorry for the delay. A restore will only work, if you have a working profile backup. Please also see comment 35 first. If nothing helps, I can give you instructions but lets do that outside of this bug. You can drop me an email.
verified with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090127 Shiretoko/3.1b3pre
Status: RESOLVED → VERIFIED
Can we consider getting this fix into the 3.0.x branch? This is continuously the most common support issue being reported in the support forum and Live Chat (21 reports last week; I'm willing to bet that for every user that reports about this on SUMO, there are at least ten others that are not reporting it) and the behavior this bug causes is really hurting the user experience. 

The symptoms are really hard for people to describe and varies a great deal, so it's hard for them to find the relevant KB article. And that only applies to the people that actively seek support on SUMO in the first place.
Flags: wanted1.9.0.x?
Comment on attachment 353211 [details] [diff] [review]
v1.1

This patch applies to 1.9.0.

The downside is that performance will be worse for Linux users w/ ext3 filesystem.

I think that the win of data integrity and user trust is well worth the cost of regressing performance for a small set of users.
Attachment #353211 - Flags: approval1.9.0.11?
dietrich: how much "regressing performance" do you talk approximately?
(In reply to comment #42)
> The downside is that performance will be worse for Linux users w/ ext3
> filesystem.
Only who don't upgrade to the latest kernal, where this is greatly mitigated with ext3 fixes.

(In reply to comment #43)
> dietrich: how much "regressing performance" do you talk approximately?
Highly dependent on disk activity on the system.  You can read bug 421482 for more details.
Comment on attachment 353211 [details] [diff] [review]
v1.1

Approved for 1.9.0.11, a=dveditz for release-drivers
Attachment #353211 - Flags: approval1.9.0.11? → approval1.9.0.11+
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.11+
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.310; previous revision: 1.309
done
Keywords: fixed1.9.0.11
Is there a way to reproduce this bug on demand? Looking through the comments? I don't see one.
Definitely happening less than before but not necessarily gone.  Will try to look at recent instances and see if a new bug is warranted.
Keywords: common-issue+
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
You need to log in before you can comment on or make changes to this bug.