Closed Bug 641074 Opened 13 years ago Closed 13 years ago

Broken folder shortcuts should not interrupt a Places result (nsINavHistoryContainerResultNode.childCount throws)

Categories

(Firefox :: Bookmarks & History, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Firefox 6

People

(Reporter: zandr, Assigned: mak)

References

Details

(Whiteboard: [places-next-wanted][fixed-in-places])

Attachments

(6 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0) Gecko/20100101 Firefox/4.0
Build Identifier: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0) Gecko/20100101 Firefox/4.0 ID:20110303140758

I set up a new MacBook on my sync account, and it was working fine last night, but this morning sync always returns an unknown error.

The old machine on the same account is working perfectly.

Sync logs attached.

Reproducible: Always

Steps to Reproduce:
1.Sync ('Sync Now' or automatic, makes no difference)
2.
3.
Actual Results:  
Sync encountered an error while syncing: Unknown error. Sync will automatically retry this action.

Expected Results:  
Sync should complete normally.
Attached file sync.log
Remembered to attach the logs.
Can't repro locally. 

Similar error many moons ago, turned up by Google: http://forums.mozillazine.org/viewtopic.php?p=3187125#p3187125
Attachment #518811 - Attachment mime type: application/octet-stream → text/plain
rnewman was concerned I may have incorrectly truncated the log. This probably overlaps the previous attachments.
Attachment #518813 - Attachment mime type: application/octet-stream → text/plain
Alright, here's the problem.

http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/engines/bookmarks.js#1428

We set containerOpen to true. Then we call .childCount, which throws an error.

That's not supposed to occur:

https://developer.mozilla.org/en/nsINavHistoryContainerResultNode

"The number of child nodes; accessing this throws an NS_ERROR_NOT_AVAILABLE exception of containerOpen is false. Read only."

I added debug printing:

      node.containerOpen = true;

      if (!node.containerOpen) {
        throw "THIS CANNOT BE!";
      }

and that line is hit. The observed error behavior is consistent with the .containerOpen = true line having no effect for record ID 5 (Unfiled Bookmarks).

I'm going to throw this at the Places team on IRC, because this ain't supposed to happen.

Thanks for your help, zandr.
Can someone on the Places team take a look at Comment 5? No answer on IRC.
Summary: Sync errors on new machine, old machine is fine → Sync: error in nsINavHistoryContainerResultNode.childCount
I have reproduced this twice with brand new profiles. On rnewman's advice, I set aside the profile where I found this problem and started fresh. I then walked away from the new machine, and when I came back the problem had recurred.

I don't believe that I created any new bookmarks on other machines in that time, so the only thing I can think that would have been added was history.

Once I saw that, I again created a new profile, set up sync, and the initial sync worked. I added a bookmark on another machine, and the next sync to the new profile failed with the same error.

I hate to say this, but this is starting to feel blocker-ish.
(In reply to comment #7)
> Once I saw that, I again created a new profile, set up sync, and the initial
> sync worked. I added a bookmark on another machine, and the next sync to the
> new profile failed with the same error.

So on this second run, did you do any browsing at all? Or was it simply:

* New profile
* Sync down existing data
* Add bookmark elsewhere, sync
* Sync down new data
* Borked!

?

Not sure I'm any closer to understanding the chain of events that cause the final borkage, but every little helps.

Somehow Places is getting into an inconsistent state, such that opening query results on Unfiled Bookmarks fails. That's not supposed to happen regardless of what Sync does, I think, but I'm open to correction.

zandr, are you able to view your Unfiled Bookmarks in the UI, once you reach this point?

If you're feeling especially helpful, could you:

* create yet another a fresh profile
* turn on logging, bookmarks Trace logging, and turn off log rotation
* relaunch, set up Sync
* reproduce the problem via the two-syncs-to-heaven steps you described
* send me the whole log?

I'm hopeful that this will reveal some odd operation earlier in the chain of repro steps that results in the observed error -- until now we've only seen the failure logs, not the cause, and this will give us logs from an empty profile through to completion.

You can turn off everything but History and Bookmarks sync in Sync Options when you set up Sync. No point in pulling everything else down. (If that results in no problem, then we've got a more interesting bug!)


> I hate to say this, but this is starting to feel blocker-ish.

I can't say I disagree.

Places team, any insight?
Setting "worried flags".
Severity: normal → major
Whiteboard: [possible blocker]
Assigning to Bookmarks to get some action. If it turns out to be a Sync misuse of the Places APIs, throw it back!
Component: Firefox Sync: Backend → Bookmarks & History
Product: Mozilla Services → Firefox
QA Contact: sync-backend → bookmarks
(In reply to comment #8)

> So on this second run, did you do any browsing at all? Or was it simply:
> 
> * New profile
> * Sync down existing data
> * Add bookmark elsewhere, sync
> * Sync down new data
> * Borked!

Exactly this.

> Somehow Places is getting into an inconsistent state, such that opening query
> results on Unfiled Bookmarks fails. That's not supposed to happen regardless of
> what Sync does, I think, but I'm open to correction.

So, one observation is that Unfiled on my 'working' machine has two folders and a smartfolder that show as (no title). And if we ignore those, we're left with the 5 items we saw in unfiled in the logs.

> zandr, are you able to view your Unfiled Bookmarks in the UI, once you reach
> this point?

No, the group is empty on the 'new' machine.

> If you're feeling especially helpful, could you:
...
> * send me the whole log?

emailed
(In reply to comment #11)

> No, the group is empty on the 'new' machine.

OK, so we have a bookmarks folder that, according to SQL queries in sqlite3, has children, but cannot display in the UI... presumably because something is screwed up. That screwed-up-ness means an API call fails unexpectedly, which screws up Sync's ability to populate lazyMap and thus sync bookmarks, but Sync doesn't seem to be at fault here.

(That makes a pleasant change.)


> emailed

Got it, thanks. Will read through this with coffee in the morning, see if I see anything stand out!
Can I get a places.sqlite file to check by mail? There is not enough data to say anything useful.

OT: I see the node.containerOpen = true, but where is the corresponding call to node.containerOpen = false?
Also, that "(no title)" query in unfiled bookmarks seems an indication of broken ui roots, what does it contain?
(In reply to comment #13)
> Can I get a places.sqlite file to check by mail? There is not enough data to
> say anything useful.

zandr?

> OT: I see the node.containerOpen = true, but where is the corresponding call to
> node.containerOpen = false?

As far as I can see, there are no such calls in the Sync codebase.
(In reply to comment #15)
> > OT: I see the node.containerOpen = true, but where is the corresponding call to
> > node.containerOpen = false?
> 
> As far as I can see, there are no such calls in the Sync codebase.

File a bug in Sync please, any container opened that is not needed for caching purposes (so unless you use it to track changes) should be explicitly closed.  Not doing so causes the result's observers to stay alive, till it is cycle collected, that could be much later, these observers can regress global performances.
(In reply to comment #14)
> Also, that "(no title)" query in unfiled bookmarks seems an indication of
> broken ui roots, what does it contain?

One of these might be the record:

(You can search for these in the log by searching for "under 5 as null".)


  Engine.Bookmarks	TRACE	Incoming: { id: 1za3ClLEoiWf
   index: 0
   modified: 1294118703.71
   ttl: undefined
   payload: {"id":"1za3ClLEoiWf","type":"query","folderName":"snowlUserView:Custom","title":null,"parentName":"Unsorted Bookmarks","bmkUri":"place:folder=1138","tags":[],"keyword":null,"loadInSidebar":false,"parentid":"unfiled"}
   collection: bookmarks }
  Engine.Bookmarks	TRACE	Reconcile step 1: Check for conflicts
  Engine.Bookmarks	TRACE	Reconcile step 2: Check for updates
  Store.Bookmarks	TRACE	Number of rows matching GUID 1za3ClLEoiWf: 0
  Engine.Bookmarks	TRACE	Reconcile step 2.5: Don't dupe deletes
  Engine.Bookmarks	TRACE	Reconcile step 3: Find dupes
  Engine.Bookmarks	TRACE	Finding mapping: Unsorted Bookmarks, bplace:folder=1138:null
  Engine.Bookmarks	TRACE	No parent => no dupe.
  Store.Bookmarks	TRACE	Number of rows matching GUID 1za3ClLEoiWf: 0
  Store.Bookmarks	DEBUG	created bookmark 99 under 5 as null place:folder=1138

or 

  Engine.Bookmarks	TRACE	Incoming: { id: 9IbIHLuqEb1I
 index: 1000000
 modified: 1294118703.71
 ttl: undefined
 payload: {"id":"9IbIHLuqEb1I","type":"folder","parentName":"Unsorted Bookmarks","title":null,"children":[],"parentid":"unfiled","hasDupe":true}
 collection: bookmarks }
  Engine.Bookmarks	TRACE	Reconcile step 1: Check for conflicts
  Engine.Bookmarks	TRACE	Reconcile step 2: Check for updates
  Store.Bookmarks	TRACE	Number of rows matching GUID 9IbIHLuqEb1I: 0
  Engine.Bookmarks	TRACE	Reconcile step 2.5: Don't dupe deletes
  Engine.Bookmarks	TRACE	Reconcile step 3: Find dupes
  Store.Bookmarks	TRACE	Number of rows matching GUID 9IbIHLuqEb1I: 0
  Store.Bookmarks	DEBUG	created folder 113 under 5 as null
  Store.Bookmarks	TRACE	Setting GUID of new item 113 to 9IbIHLuqEb1I
  Store.Bookmarks	TRACE	Number of rows matching GUID 9IbIHLuqEb1I: 1
  Store.Bookmarks	DEBUG	Reparenting orphans  to 113

or

  Engine.Bookmarks	TRACE	Incoming: { id: trHp-TP0j_eG
   index: 0
   modified: 1294118703.71
   ttl: undefined
   payload: {"id":"trHp-TP0j_eG","type":"query","folderName":"snowlCollectionsSystem","title":null,"parentName":"Unsorted Bookmarks","bmkUri":"place:folder=1135&OR","tags":[],"keyword":null,"loadInSidebar":false,"parentid":"unfiled"}
   collection: bookmarks }
  Engine.Bookmarks	TRACE	Reconcile step 1: Check for conflicts
  Engine.Bookmarks	TRACE	Reconcile step 2: Check for updates
  Store.Bookmarks	TRACE	Number of rows matching GUID trHp-TP0j_eG: 0
  Engine.Bookmarks	TRACE	Reconcile step 2.5: Don't dupe deletes
  Engine.Bookmarks	TRACE	Reconcile step 3: Find dupes
  Engine.Bookmarks	TRACE	Finding mapping: Unsorted Bookmarks, bplace:folder=1135&OR:null
  Engine.Bookmarks	TRACE	No parent => no dupe.
  Store.Bookmarks	TRACE	Number of rows matching GUID trHp-TP0j_eG: 0
  Store.Bookmarks	DEBUG	created bookmark 220 under 5 as null place:folder=1135&OR
  Store.Bookmarks	TRACE	Setting GUID of new item 220 to trHp-TP0j_eG
  Store.Bookmarks	TRACE	Number of rows matching GUID trHp-TP0j_eG: 1
(In reply to comment #16)

> File a bug in Sync please...

Done: Bug 641531.

Though I don't think there's much worry about waiting for a full GC when Sync is running :D
Attached file Trivial test.
Trivial test attached.

Apparently Places can't query a folder which contains a query bookmark that
refers to a non-existent folder. Sync does not rewrite those query URIs, so one
can break the remote profile by syncing a profile containing just such a record.
According to Marco's recommendation, we got a bit stricter in bug 610501 by making sure we only rewrite queries that have resultType == RESULTS_AS_TAG_CONTENTS. So the question really is, how did Zandr end up with this record:

{"id":"trHp-TP0j_eG",
 "type":"query",
 "folderName":"snowlCollectionsSystem",
 "title":null,
 "parentName":"Unsorted Bookmarks",
 "bmkUri":"place:folder=1135&OR",
 "tags":[],
 "keyword":null,
 "loadInSidebar":false,
 "parentid":"unfiled"}

or 

{"id":"1za3ClLEoiWf",
 "type":"query",
 "folderName":"snowlUserView:Custom",
 "title":null,"parentName":"Unsorted Bookmarks",
 "bmkUri":"place:folder=1138",
 "tags":[],
 "keyword":null,
 "loadInSidebar":false,
 "parentid":"unfiled"}
More specifically, are "snowlCollectionsSystem" and "snowlUserView:Custom" tags in Zandr's profile? What's the "&OR" bit?
Actions derived from IRC:

1. Sync shouldn't upload folder queries in this way.
2. Sync should ignore them when they are downloaded.
3. Places shouldn't barf on bad queries -- this bug can occur even without Sync.

Future action:

1. Don't use Places IDs in URIs _at all_. Use the GUID. That way we can sync these items.
And a comment regarding blockerness: the built-in UI doesn't allow for creating this kind of bookmark. That limits its impact to users with certain add-ons (e.g., Snowl seems to be a culprit). Not a blocker.
Whiteboard: [possible blocker]
What happens in Places:
- we query the parent of the broken folder shortcut
- we start collecting info on each child of this parent
- when we hit the broken shortcut, bookmarks service queries for the corresponding folder, the database doesn't know about that id, and bookmarks service throws here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavBookmarks.cpp#2206

Extract from the stack:
xul.dll!nsNavBookmarks::ResultNodeForContainer
xul.dll!nsNavHistory::QueryRowToResult
xul.dll!nsNavHistory::RowToResult
xul.dll!nsNavBookmarks::ProcessFolderNodeRow
xul.dll!nsNavBookmarks::QueryFolderChildren
xul.dll!nsNavHistoryFolderResultNode::FillChildren()
xul.dll!nsNavHistoryFolderResultNode::OpenContainer()
xul.dll!nsNavHistoryContainerResultNode::SetContainerOpen
xul.dll!nsNavHistoryFolderResultNode::SetContainerOpen

Globally we reach this situation when:
- An advanced user or an add-on (in this case Snowl?) creates a folder shortcut (even if we don't expose a method to do so in the UI, the API allows to do it)
- Sync copies the query uri to another profile, but does not remap the folder id contained in the query uri
- The new synced folder shortcut points to a not-existing id, or to a not-container, or to another random folder.
(In reply to comment #25)

> Globally we reach this situation when: ...

Presumably there's an even shorter way to achieve this: create a folder shortcut, then delete the folder, leaving the query intact.
So, this bug could handle the Places-throws bug, we need another bug to track changing place:folder=XYZ to place:folder=guid(XYZ) and a Sync bug to somehow handle this stuff before we are ready to remap to guid shortcuts.
For the change most likely we could distinguish old ids from guids by length: guids are 12 chars and I doubt we could have a 12-digits id.
(In reply to comment #27)
> So, this bug could handle the Places-throws bug, we need another bug to track
> changing place:folder=XYZ to place:folder=guid(XYZ) and a Sync bug to somehow
> handle this stuff before we are ready to remap to guid shortcuts.
> For the change most likely we could distinguish old ids from guids by length:
> guids are 12 chars and I doubt we could have a 12-digits id.

We can't just change the format for query records here. Older clients (Firefox 4.0, Sync 1.7) will not understand a place:folder=GUID query and will break Places just as much.

What we can do is:
* create a *new* bookmark record type called "query2" or whatever that actually has a more sensible data format (e.g. explodes the places query object into an actual JSON object when syncing up and reassembles it when applying locally).
* ignore "query" records that aren't tag queries or smart bookmarks.
(In reply to comment #28)
> We can't just change the format for query records here. Older clients (Firefox
> 4.0, Sync 1.7) will not understand a place:folder=GUID query and will break
> Places just as much.

From what I can tell Sync is already setting folder=RANDOMINT so I don't see the difference from what we have today, it will set it to folder=RANDOMCHARS.
In future it could just don't download queries containing folder shortcuts if browser version < X.
(In reply to comment #29)
> (In reply to comment #28)
> > We can't just change the format for query records here. Older clients (Firefox
> > 4.0, Sync 1.7) will not understand a place:folder=GUID query and will break
> > Places just as much.
> 
> From what I can tell Sync is already setting folder=RANDOMINT so I don't see
> the difference from what we have today, it will set it to folder=RANDOMCHARS.

Sure, we can do that, but at this point we've left the idea of syncing the original query anyway. So why not sync the query properties if we assemble and disassemble them anyway?

Also, doing folder=RANDOMCHARS will break code from before bug 610501 because we used to do a regex match for \d before that. In bug 610501 we changed this to disassemble the query. I've checked that that actually doesn't throw, it will just ignore:

var queries = {}; var count = {}; var options = {};
Components.utils.import("resource://services-sync/main.js");
Weave.Svc.History.queryStringToQueries("place:folder=FOOBAR&type=7", queries, count, options);
queries.value[0].getFolders()

That last statement returns [], so we're good.

> In future it could just don't download queries containing folder shortcuts if
> browser version < X.

I don't understand this sentence. We can't go back in time and fix released code.
> Also, doing folder=RANDOMCHARS will break code from before bug 610501 because
> we used to do a regex match for \d before that. In bug 610501 we changed this
> to disassemble the query. I've checked that that actually doesn't throw, it
> will just ignore:

Discussed on IRC: one possible solution is to use folder=guid for non-tag queries, and folder=numericID for tag queries.

That's backward-compatible for the tag rewriting in earlier (~1.6.2) versions of Sync, and the patches in Bug 641329 would allow GUID folder queries to sync safely.

When we abandon support for earlier versions, we can switch to using GUIDs everywhere, The Way It Should Be®.
Do you guys need anything additional from me, or can I set about cleaning up these toxic bookmark folders and making sync work again?
(In reply to comment #32)
> Do you guys need anything additional from me, or can I set about cleaning up
> these toxic bookmark folders and making sync work again?

I think we've got a good handle on things.

Snowl is the add-on that caused the grief; removing it (you might have already; it's not compatible with 4.0), deleting its folders, and then wiping your bookmarks on the server should do the trick.
More precisely:

1. disable the addon
2. get rid of the queries it created in your Firefox profile
3. wipe your server data (best done through the account portal)
4. resync
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Whiteboard: [places-next-wanted]
(In reply to comment #34)
> More precisely:
> 
> 1. disable the addon

Long, long gone. It's a bit puzzling as to why this only caused problems now, but at least the root cause is known.

> 2. get rid of the queries it created in your Firefox profile

Done.

> 3. wipe your server data (best done through the account portal)

Reset Sync (replace others) was sufficient.

> 4. resync

Profit! All's well on the new machine now.
Flags: in-testsuite?
OS: Mac OS X → All
Hardware: x86 → All
Summary: Sync: error in nsINavHistoryContainerResultNode.childCount → Broken folder shortcuts should not interrupt a Places result (nsINavHistoryContainerResultNode.childCount throws)
Attached patch patch v1.0 (obsolete) — Splinter Review
Richard, could you check if this plays well with Sync (maybe you want to add a test based on Trivial test above?)
Attachment #527642 - Flags: review?(sdwilsh)
Attachment #527642 - Flags: feedback?
Attachment #527642 - Flags: feedback? → feedback?(rnewman)
> Richard, could you check if this plays well with Sync (maybe you want to add a
> test based on Trivial test above?)

Will take a look today. Thanks for thinking of Sync :D
Attached patch patch v1.1 (obsolete) — Splinter Review
ooooops, I didn't qrefresh.
Attachment #527642 - Attachment is obsolete: true
Attachment #527642 - Flags: review?(sdwilsh)
Attachment #527642 - Flags: feedback?(rnewman)
Attachment #527694 - Flags: review?(sdwilsh)
Attachment #527694 - Flags: feedback?(rnewman)
Comment on attachment 527642 [details] [diff] [review]
patch v1.0

This dies in bad places. Test incoming...
Attachment #527642 - Flags: feedback-
Comment on attachment 527694 [details] [diff] [review]
patch v1.1

Hmm, Bugzilla showed me the old patch.
Attachment #527694 - Flags: feedback?(rnewman) → feedback-
Comment on attachment 527694 [details] [diff] [review]
patch v1.1

I think I've nailed down the problem, unfortunately I have to clobber due to some recent build change, but this finding was really useful. updated patch coming.
Attachment #527694 - Attachment is obsolete: true
Attachment #527694 - Flags: review?(sdwilsh)
Attached patch patch v1.2Splinter Review
now also testing what Richard found.
Please ask review on the Sync test accordingly if it's ready.
Attachment #527763 - Flags: review?(sdwilsh)
Comment on attachment 527763 [details] [diff] [review]
patch v1.2

>-      // this addrefs for us
>+      // This AddRefs for us.
I think that API is broken :(

r=sdwilsh
Attachment #527763 - Flags: review?(sdwilsh) → review+
Comment on attachment 527698 [details] [diff] [review]
Sync test.

Moar tests! \o/
Attachment #527698 - Flags: review+
(In reply to comment #45)
> Comment on attachment 527698 [details] [diff] [review]
> Sync test.
> 
> Moar tests! \o/

Just to clarify: mak, please land this test alongside your fix. It will fail if I land it now :D
(In reply to comment #44)
> Comment on attachment 527763 [details] [diff] [review]
> patch v1.2
> 
> >-      // this addrefs for us
> >+      // This AddRefs for us.
> I think that API is broken :(
> 
> r=sdwilsh

well it's an internal API, but sure, there is some flaw to fix there.
http://hg.mozilla.org/projects/places/rev/ddd40a5b413c
http://hg.mozilla.org/projects/places/rev/bf1fa4fca00f
Whiteboard: [places-next-wanted] → [places-next-wanted][fixed-in-places]
I added my own report at "see also" by advice of philipp@weitershausen.de - it may be a dupe. My results came from my original and also from a freshly created account. If you need more info (places.sqlite), please let me know.
See Also: → 652122
Blocks: 647605
http://hg.mozilla.org/mozilla-central/rev/ddd40a5b413c
http://hg.mozilla.org/mozilla-central/rev/bf1fa4fca00f
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 6
Depends on: 652379
Blocks: 624230
This issue breaks a major functionality of FF 4. Wouldn't it be worth to be solved in FF 5 instead of FF 6 ? (see target milestone)
(In reply to comment #53)
> This issue breaks a major functionality of FF 4. Wouldn't it be worth to be
> solved in FF 5 instead of FF 6 ? (see target milestone)

yes, but we have strict rules on what can be ported to each version, and this doesn't satisfy them (we can only fix exposed security bugs or disable features in FF5, since it's in the stabilization branch Aurora).
But don't worry, there will be a new release of Firefox every 6 weeks, so you won't have to wait long after FF5 to get FF6.
(In reply to comment #54)

> yes, but we have strict rules on what can be ported to each version, and
> this doesn't satisfy them (we can only fix exposed security bugs or disable
> features in FF5, since it's in the stabilization branch Aurora).
> But don't worry, there will be a new release of Firefox every 6 weeks, so
> you won't have to wait long after FF5 to get FF6.

I actually kinda agree with Thommie. I've seen a bunch of reports that all come down to this; mrz just sent me an email with the same symptom.

Users with this issue will have Sync fail with an error bar every single time.

AFAIK, Aurora is OK for landing safe, approved fixes, if features can't be disabled (and this can't, of course). Not sure this change counts as "safe", but waiting for Fx6 would be pretty painful for those affected -- any user who doesn't find Bugzilla is basically in broken-Sync-land for the next three months.

Perhaps we could propose this for Aurora, and if it's rejected we could try to put together a minimal workaround (e.g., skipping bad records in _lazyMap)?
Honestly, bug 647605 is what worries me more, this bug is something a user that is using folder shortcuts (not a newbie) can figure out (even by going to support or filing a bug) but that bug can be hit by anyone just syncing desktop with mobile.
So something I'd really like to see in Aurora is a check on Sync side that the mobile root points to something valid.

Regarding this bug, I absolutely share your thoughts, but a change in generating result nodes needs baking and doesn't look like something that could go to beta stage in 8 days from today.
OK, I understand, 647605 looks as ugly as this one ;-(. If the fixes for both have dependencies and a stable solution is only possible in FF6, I agree with Marcos position.
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.