Last Comment Bug 464486 - lots of bookmark corruption reported via user support
: lots of bookmark corruption reported via user support
Status: VERIFIED FIXED
: dataloss, fixed1.9.0.11, verified1.9.1
Product: Firefox
Classification: Client Software
Component: Bookmarks & History (show other bugs)
: Trunk
: All All
: P1 critical with 1 vote (vote)
: Firefox 3.6a1
Assigned To: Dietrich Ayala (:dietrich)
:
Mentors:
: 454111 458981 462069 470270 472011 491224 (view as bug list)
Depends on: 414715
Blocks:
  Show dependency treegraph
 
Reported: 2008-11-12 11:12 PST by Dietrich Ayala (:dietrich)
Modified: 2009-11-26 06:12 PST (History)
30 users (show)
mbeltzner: blocking‑firefox3.5+
dveditz: blocking1.9.0.11+
dveditz: wanted1.9.0.x+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
v1 (1.05 KB, patch)
2008-11-26 16:29 PST, Dietrich Ayala (:dietrich)
sdwilsh: review+
Details | Diff | Review
v1.1 (1.05 KB, patch)
2008-12-16 06:57 PST, Marco Bonardo [::mak]
mak77: review+
dveditz: approval1.9.0.11+
Details | Diff | Review

Description Dietrich Ayala (:dietrich) 2008-11-12 11:12:34 PST

    
Comment 1 Dietrich Ayala (:dietrich) 2008-11-12 11:16:28 PST
according to cww:

various reported problems:
- history not working, but bookmarks are
- unfiled bookmarks folder missing (or other root folder)
- random chunks of bookmarks missing

the json backups are valid, and can be used to workaround the problem.
Comment 2 Dietrich Ayala (:dietrich) 2008-11-12 11:18:08 PST
cww also said many people are saying this occurs in conjunction with power loss or sudden system restarts.
Comment 3 Shawn Wilsher :sdwilsh 2008-11-12 11:22:09 PST
Useful information to know:
value of toolkit.storage.synchronous
OS version
Comment 4 [:Cww] 2008-11-12 11:29:35 PST
Actually, it just occurred to me that one reason we may see a spike is if the force restart that happens with an OS patch may be at fault.  This is purely conjecture.
Comment 5 [:Cww] 2008-11-12 11:38:09 PST
Also, a common symptom is still toolbar buttons not working AND awesomebar not working AND bookmarks not saving.  (i.e. places.sqlite is so corrupted that it's not letting Firefox save anything to it.)

This bug does NOT cover the other two causes for that problem which we exclude when we get data from users:

1) Norton 360
2) On Linux, out of date sqlite
Comment 6 Dietrich Ayala (:dietrich) 2008-11-12 11:44:43 PST
(In reply to comment #5)
> Also, a common symptom is still toolbar buttons not working

can you provide more information on this? which toolbar buttons? all of them?
Comment 7 Dietrich Ayala (:dietrich) 2008-11-12 11:55:05 PST
In bug 421482, sqlite's SYNCHRONOUS pragma was changed from FULL to NORMAL, which can provide opportunity for corruption in power-outage and hard-restart scenarios.
Comment 8 [:Cww] 2008-11-12 16:27:28 PST
(In reply to comment #6)
Back/forward/home/that dropdown next to back/forward.  Basically any button that would cause a write to places.sqlite
Comment 9 Mike Beltzner [:beltzner, not reading bugmail] 2008-11-12 16:31:14 PST
Definitely need to understand both the scope (in terms of number of users affected) and cause better here before we have confidence in shipping 3.1
Comment 10 Johnathan Nightingale [:johnath] 2008-11-13 08:04:29 PST
Pretty sure I just hit this on 3.1 nightly (Mac).

Observations:
 - Back/Forward work for me
 - Unfiled bookmarks and bookmarks toolbar gone
 - No toolkit.storage.synchronous pref

What else do you need?
Comment 11 Johnathan Nightingale [:johnath] 2008-11-13 08:06:10 PST
My history is intact, and the "recent tags" menu under bookmarks has the right tag entries, but all of the tag-specific sub-menus come up empty.
Comment 12 Dietrich Ayala (:dietrich) 2008-11-13 08:52:23 PST
i think we need to do two things here:

1. set the SYNCHONOUS pragma back to FULL, ensuring data integrity for places.sqlite

2. we currently only backup the db and restore bookmarks if the connection fails for sqlite-detectable corruption. we should instead do this for other possible failures as well (but not for file-locked, see bug 414715).
Comment 13 Dietrich Ayala (:dietrich) 2008-11-13 13:15:01 PST
(In reply to comment #11)
> My history is intact, and the "recent tags" menu under bookmarks has the right
> tag entries, but all of the tag-specific sub-menus come up empty.

Your scenario does not sound like the same problem as the earlier reports (where back/forward are disabled, etc). Those reports indicate that nsNavHistoryService cannot init.

In your database, bookmarks have been removed, but history and tags have not. I found a codepath that will result in this same database state, and filed bug 464767 for it.
Comment 14 Matthew Middleton (:zzxc) 2008-11-13 13:29:07 PST
From last week's live chat reports on support.mozilla.com, 3 people confirmed that renaming places.sqlite and places.sqlite-journal fixed the issue with no bookmarks displayed, back/forward disabled, home page not displayed, etc.  Additionally, 3 more confirmed that renaming both files fixed an issue with some bookmarks unable to be deleted, changed, or added - with other bookmarks working normally.

Two people with the first issue (back/forward broken, etc) who left live chat before renaming places.sqlite did have three extensions in common - Yahoo Canada Toolbar, AIM Toolbar, and Fast Video Downloader.  Has anyone who has seen this issue used any of those extensions?
Comment 15 Dietrich Ayala (:dietrich) 2008-11-25 09:37:22 PST
(In reply to comment #12)
> i think we need to do two things here:
> 
> 1. set the SYNCHONOUS pragma back to FULL, ensuring data integrity for
> places.sqlite
> 
> 2. we currently only backup the db and restore bookmarks if the connection
> fails for sqlite-detectable corruption. we should instead do this for other
> possible failures as well (but not for file-locked, see bug 414715).

there's not conclusive evidence about power failures or restarts, so it's premature to implement #1 above. the first step will be to implement #2 above, to at least allow the backups to kick-in in this situation.
Comment 16 Marco Bonardo [::mak] 2008-11-26 07:59:34 PST
(In reply to comment #15)
> there's not conclusive evidence about power failures or restarts, so it's
> premature to implement #1 above.

I don't agree, if sqlite developers choosed to make FULL the default they probably have better knowledge than us on their code, and we should trust them. I see that we don't have evidence of power failures, but it's also true users often don't report such events, or that a crash could cause similar situations.
I think taking both can't hurt us, if it will we can go back.

about point 2, what kind of failures do you think makes sense to detect without increasing our Ts? Do you mean detecting other sqlite errors on connection creation?
Comment 17 Dietrich Ayala (:dietrich) 2008-11-26 14:00:58 PST
(In reply to comment #16)
> (In reply to comment #15)
> > there's not conclusive evidence about power failures or restarts, so it's
> > premature to implement #1 above.
> 
> I don't agree, if sqlite developers choosed to make FULL the default they
> probably have better knowledge than us on their code, and we should trust them.
> I see that we don't have evidence of power failures, but it's also true users
> often don't report such events, or that a crash could cause similar situations.
> I think taking both can't hurt us, if it will we can go back.

yes, synchonous=FULL can hurt us on performance. however, the temp tables will mitigate this, and resolving the dataloss issue should be prioritized regardless.
> 
> about point 2, what kind of failures do you think makes sense to detect without
> increasing our Ts? Do you mean detecting other sqlite errors on connection
> creation?

yes, any error that could be returned when opening a connection. we explicitly handle connection error due to corruption, and will explicitly handle when the file is locked when bug 414715 lands, but any other possible error code we do not handle, and the service fails to initialize.
Comment 18 Dietrich Ayala (:dietrich) 2008-11-26 16:29:36 PST
Created attachment 350242 [details] [diff] [review]
v1

i'll post to dev.platform to discuss this. also, this should land after bug 414715, as we want to handle the locking error differently.
Comment 19 Shawn Wilsher :sdwilsh 2008-12-01 17:46:48 PST
Comment on attachment 350242 [details] [diff] [review]
v1

>+  if (!NS_SUCCEEDED(rv)) {
Just use NS_FAILED please

Have you pushed this to the try server to see how that (lying) talos does?
Comment 20 [:Cww] 2008-12-03 11:07:20 PST
Not really relevant to this bug but if people are seeing _all_ bookmarks lost from their bookmarks toolbar, they may need to reset localstore.rdf and it's not this issue.
Comment 21 Marco Bonardo [::mak] 2008-12-04 05:07:40 PST
won't this patch threat LOCKED case as an error and try to restore?
Comment 22 Dietrich Ayala (:dietrich) 2008-12-04 09:14:11 PST
(In reply to comment #21)
> won't this patch threat LOCKED case as an error and try to restore?

yes, needs to land after bug 414715.
Comment 23 [:Cww] 2008-12-07 10:37:30 PST
Adding common-issues+ keyword for tracking.
Comment 24 Marco Bonardo [::mak] 2008-12-12 08:40:44 PST
i think a catch-all errors if is not good at all here and could cause us to try to restore bookmarks on errors that are completely unrelated to places.sqlite itself, overwriting or destroying (if the backup is not good) user's bookmarks... see for example the cases we have found for a locked file, we can end up with a locked, busy, or ioerr error for the same issue... So my fear is that places.sqlite is completely valid, but due to another OS error we will still replace it.

these are actually the errors we could receive:

NS_ERROR_FILE_CORRUPTED
in this case we should replace for sure, the db file is corrupt or is not a db.

NS_ERROR_FILE_ACCESS_DENIED
in this case the database has wrong permission or OS doesn't allow access to it, this sounds to me as something that should be seen as LOCKED on our side, moreover we won't probably be able to remove the file.

NS_ERROR_STORAGE_BUSY
in this case the db is locked

NS_ERROR_STORAGE_IOERR
in this case the filesystem has returned an error, the db could be locked

NS_ERROR_FILE_IS_LOCKED
this is a bit cucumbersome since it's the storage translation for "SQLITE_LOCKED /* A table in the database is locked */" we consider the db locked even if probably that's unrelated

NS_ERROR_FILE_READ_ONLY
the db is read only, as above this sounds to me as a locked case and we won't be able to replace the db.

NS_ERROR_FILE_NO_DEVICE_SPACE
In this case again we should consider the db locked, not corrupt.

NS_ERROR_OUT_OF_MEMORY
again, there's no point in replacing the db

NS_ERROR_UNEXPECTED
well, this is unexpected

NS_ERROR_ABORT
we aborted

NS_ERROR_FAILURE
unknown error, really, could be a new sqlite error that storage is not translating for us, so replacing here could not be the right choice

So to me the only valid case to rebuild the db appear to be CORRUPT.
And many other cases should be added to the LOCKED if.
Comment 25 Dietrich Ayala (:dietrich) 2008-12-15 09:35:49 PST
per irc and the previous comment, going to take the pragma setting change, but not other change, as it will cause history to be lost unnecessarily in many cases.
Comment 26 Marco Bonardo [::mak] 2008-12-16 06:57:13 PST
Created attachment 353211 [details] [diff] [review]
v1.1

same as v1 (already reviewed) minus error catching part, with my r+

This can land, hwv would be better land on closed or metered tree to catch eventual perf regressions.
Comment 27 Marco Bonardo [::mak] 2008-12-20 04:48:09 PST
*** Bug 462069 has been marked as a duplicate of this bug. ***
Comment 28 Marco Bonardo [::mak] 2008-12-20 05:20:55 PST
*** Bug 454111 has been marked as a duplicate of this bug. ***
Comment 29 Bruno Calvignac 2008-12-22 09:12:34 PST
*** Bug 470270 has been marked as a duplicate of this bug. ***
Comment 30 Dietrich Ayala (:dietrich) 2009-01-03 09:29:29 PST
the tree is quiet, checked in:

http://hg.mozilla.org/mozilla-central/rev/7c47afbd5675

will close if perf stats check out ok.
Comment 31 Matthias Versen [:Matti] 2009-01-03 20:29:06 PST
*** Bug 472011 has been marked as a duplicate of this bug. ***
Comment 32 Dietrich Ayala (:dietrich) 2009-01-03 21:08:40 PST
graphs look good, closing.
Comment 33 Steve 2009-01-04 10:47:31 PST
Hello! This is marked resolved fixed, likely by the same individual that caused this problem.  

My problem is when I make any changes to the bookmarks the change is undone when firebox is restarted.

The so called fix you suggest does not work, becuase you have attempted to place different problems under one umbrella and pronouce them cured.  Happy you are not in the medical field.
Comment 34 Henrik Skupin (:whimboo) 2009-01-04 11:22:16 PST
Steve, you should have a look at the build you are using. This bug is fixed on trunk, means for the over next release of Firefox 3.2. There was no check-in for Firefox 3.1 yet.

If you wanna test this you should download one of the following builds, make a copy of your profile and run a test with the downloaded build:

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/01/2009-01-04-03-mozilla-central/
Comment 35 Marco Bonardo [::mak] 2009-01-06 10:26:48 PST
(In reply to comment #33)
> My problem is when I make any changes to the bookmarks the change is undone
> when firebox is restarted.

check importBookmarksHTML in about:config and set it to false.
Comment 36 Dietrich Ayala (:dietrich) 2009-01-06 10:54:14 PST
on branch: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/f2a6d3e42e9c
Comment 37 Steve 2009-01-06 13:11:34 PST
Dear Henrick,

I really appreciate your courtesy, thanks.  When will 3.2 be released and is it in Beta?

Unfortunately NIS 2009 adds a boot file which became corrupted so I had to do a system recovery and the files I saved to do a restore will not work because of a HP bug. (Norton tech support was no help).

I had version 3.0.5 and when I checked on updates then and now it shows 3.0.5 as the latest version.  I was able to back up my favorites, but cannot get them to restore.

Amazing how yesterday's problems seem so small.

Best regards,

Steve
Comment 38 Marco Bonardo [::mak] 2009-01-08 13:06:03 PST
*** Bug 458981 has been marked as a duplicate of this bug. ***
Comment 39 Henrik Skupin (:whimboo) 2009-01-13 12:59:42 PST
Steve, sorry for the delay. A restore will only work, if you have a working profile backup. Please also see comment 35 first. If nothing helps, I can give you instructions but lets do that outside of this bug. You can drop me an email.
Comment 40 Tracy Walker [:tracy] 2009-01-27 09:35:51 PST
verified with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090127 Shiretoko/3.1b3pre
Comment 41 David Tenser [:djst] 2009-04-21 19:50:44 PDT
Can we consider getting this fix into the 3.0.x branch? This is continuously the most common support issue being reported in the support forum and Live Chat (21 reports last week; I'm willing to bet that for every user that reports about this on SUMO, there are at least ten others that are not reporting it) and the behavior this bug causes is really hurting the user experience. 

The symptoms are really hard for people to describe and varies a great deal, so it's hard for them to find the relevant KB article. And that only applies to the people that actively seek support on SUMO in the first place.
Comment 42 Dietrich Ayala (:dietrich) 2009-04-29 11:31:46 PDT
Comment on attachment 353211 [details] [diff] [review]
v1.1

This patch applies to 1.9.0.

The downside is that performance will be worse for Linux users w/ ext3 filesystem.

I think that the win of data integrity and user trust is well worth the cost of regressing performance for a small set of users.
Comment 43 Michael Kohler [:mkohler] 2009-04-29 12:36:33 PDT
dietrich: how much "regressing performance" do you talk approximately?
Comment 44 Shawn Wilsher :sdwilsh 2009-04-29 13:12:56 PDT
(In reply to comment #42)
> The downside is that performance will be worse for Linux users w/ ext3
> filesystem.
Only who don't upgrade to the latest kernal, where this is greatly mitigated with ext3 fixes.

(In reply to comment #43)
> dietrich: how much "regressing performance" do you talk approximately?
Highly dependent on disk activity on the system.  You can read bug 421482 for more details.
Comment 45 Daniel Veditz [:dveditz] 2009-05-01 10:52:20 PDT
Comment on attachment 353211 [details] [diff] [review]
v1.1

Approved for 1.9.0.11, a=dveditz for release-drivers
Comment 46 Dietrich Ayala (:dietrich) 2009-05-01 11:04:02 PDT
Checking in toolkit/components/places/src/nsNavHistory.cpp;
/cvsroot/mozilla/toolkit/components/places/src/nsNavHistory.cpp,v  <--  nsNavHistory.cpp
new revision: 1.310; previous revision: 1.309
done
Comment 47 Al Billings [:abillings] 2009-05-18 16:42:58 PDT
Is there a way to reproduce this bug on demand? Looking through the comments? I don't see one.
Comment 48 Marco Bonardo [::mak] 2009-06-05 05:44:17 PDT
*** Bug 491224 has been marked as a duplicate of this bug. ***
Comment 49 [:Cww] 2009-10-29 09:54:10 PDT
Definitely happening less than before but not necessarily gone.  Will try to look at recent instances and see if a new bug is warranted.
Comment 50 Gervase Markham [:gerv] 2009-11-26 06:12:43 PST
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv

Note You need to log in before you can comment on or make changes to this bug.