Closed Bug 1380606 Opened 2 years ago Closed 2 years ago

Uptick of failures in Firefox 54: Error: Invalid value for property 'guid': null

Categories

(Firefox :: Sync, enhancement, P2)

enhancement

Tracking

()

VERIFIED FIXED
Firefox 58
Tracking Status
firefox58 --- fixed

People

(Reporter: markh, Assigned: Lina)

References

Details

Attachments

(2 files, 1 obsolete file)

As 54 went to release, we see the success rate of the bookmarks engine drop from ~99.85% to ~99.62%, almost all related to "Error: Invalid value for property 'guid': null"

https://gist.github.com/mhammond/f07357812fc6be5ab7a6ce42ea710a4b has graphs showing the general trend, but I've somehow broken output from the gist at https://gist.github.com/mhammond/01907397057deb00edec1d6616c0a17c, but it has more detail. No stacks etc though...
Depends on: 1380718
I can't see a way we'd get this error processing an incoming record nor directly creating an outgoing record (although that could just be a lack of imagination) - but certainly both _reconcile and createRecord fail in different ways attempting to use null.

One somewhat way I *could* possibly see this happen is if (say) an addon inserted a record into moz_bookmarks without a GUID - so I did that :) Via about-sync, I executed:

> INSERT INTO moz_bookmarks (fk, type, parent, position, title) VALUES (1, 1, 2, 0, "title")

(where the bookmark menu has an id of 2), and sure enough, syncing fails with:

> 1500016798966	Sync.Engine.Bookmarks	WARN	Error creating record: Error: Invalid value for property 'guid': null (resource://gre/modules/PlacesUtils.jsm:584:17) JS Stack trace: validateItemProperties@PlacesUtils.jsm:584:17 < validateBookmarkObject@Bookmarks.jsm:2036:10 < fetch@Bookmarks.jsm:917:21 < fetch@PlacesSyncUtils.jsm:709:30 < async*createRecord@bookmarks.js:806:22 < async*_createRecord@engines.js:954:24 < async*_createRecord@bookmarks.js:540:24 < async*_uploadOutgoing@engines.js:1645:23 < async*_sync@engines.js:1799:13 < async*WrappedNotify@util.js:161:27 < async*sync@engines.js:722:12 < async*_syncEngine@enginesync.js:203:13 < async*sync@enginesync.js:153:21 < async*onNotify@service.js:1099:13 < async*WrappedNotify@util.js:161:27 < async*WrappedLock@util.js:117:22 < async*_lockedSync@service.js:1092:12 < async*sync/<@service.js:1084:13 < async*WrappedCatch@util.js:91:22 < async*sync@service.js:1073:12

IIRC, we would previously handle a missing GUID and just allocate it one. I guess one option might be to try and catch this error and "fix" everything without a local GUID? /cc Mak incase he has thoughts here...
Priority: -- → P2
Given bug 1380718, In a few weeks we should be able to start looking for the operation most often causing these errors. My money is that it will tell us that fetching a bookmark is the issue, which will imply there are entries without a GUID.

Assuming that turns out to be true, I see 2 possible mitigations:

* Query and fix these items as part of places maintenance - I'd expect that might be relatively inexpensive in the normal case of no entries without GUIDs, but it's still additional work done by all users for something we expect to be rare.

* Perform a schema migration and enforce non-null GUIDs at the database level (obviously adding GUIDs as part of that process)

Mak, any thoughts on this?
Flags: needinfo?(mak77)
(In reply to Mark Hammond [:markh] from comment #2)
> * Query and fix these items as part of places maintenance - I'd expect that
> might be relatively inexpensive in the normal case of no entries without
> GUIDs, but it's still additional work done by all users for something we
> expect to be rare.
> 
> * Perform a schema migration and enforce non-null GUIDs at the database
> level (obviously adding GUIDs as part of that process)
> 
> Mak, any thoughts on this?

I don't have ideas until we reach a point where we know where the problem starts.
I actually think we may still have a problem with bug 1297941, I just changed the JOIN to CROSS JOIN in bug 1354032 cause I discovered that the current behavior only works when the table is ANALYZEd. So that could also be another cause of a broken guids cache.
We are also continuing the work towards removal of itemIds from the codebase, it will still take some time though, at the current pace.

I honestly don't see how entries may end up without a GUID, but I'm sure we can fix them.
Unfortunately the only way to set notnull in a column is when defining the column with the table, so I don't think that's a feasible option unless we want to throw away and rebuild the whole table.
Flags: needinfo?(mak77)
The PlacesSyncUtils methods call `PlacesUtils.bookmarks.{fetch, update, insert, remove}`. Everything except `fetch` takes a `syncId`, not a `guid`, so I'd expect the error to read "Invalid value for property 'syncId'" if Sync tried to insert invalid data. That leaves `fetch` as the likely suspect.

From the error, it's also not clear if Sync is passing the value `null`, or the string `"null"`. That could happen if `pullChanges` fetches a row with a `null` GUID, which is then converted to a string when it's stored as an object key in the changeset. Then Sync essentially does `Object.keys(changeset).map(PlacesSyncUtils.bookmarks.fetch)`, we end up calling `PlacesUtils.bookmarks.fetch("null")`, and there's our exception.

Bug 1380718 and bug 1344282 changed this to use `JSON.stringify`, so we should be able to better see what's going on.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #4)
> From the error, it's also not clear if Sync is passing the value `null`, or
> the string `"null"`.

As you guessed, it appears to be the latter - telemetry is now showing the vast majority look like:

> {u'error': u'Error: Bookmarks.jsm: fetch: Invalid value for property \'guid\': "null"',

while there's a few with:

> {u'error': u'Error: Bookmarks.jsm: fetch: Invalid value for property \'guid\': "0omdMwqX2oQ5\\n"',

*sigh*. I only took a 1% sample, and all cases that were not null had that same GUID with a trailing "\n", so we've got that going for us, which is nice :/
Flags: needinfo?(markh)
FTR, there are ~55k devices failing to sync with an error that includes the message "Invalid value for property" over the last 3 months.
Flags: needinfo?(kit)
Assignee: nobody → kit
There are two parts to this. Missing GUIDs are easier, because other devices can't apply them, assuming the items make it up to the server at all. I like Mark's suggestion to assign GUIDs as part of Places maintenance (we could also do this in `pullChanges`, but items without GUIDs will break more than just Sync), and bump the change counter to ensure they're uploaded on the next sync.

Invalid GUIDs are trickier, because they might already exist on other devices. We'll need to write tombstones for the invalid GUIDs, assign fresh GUIDs, and bump the change counters for the affected items and their children. I think this is mostly safe, even with our current dodgy deduping.

We always dedupe local to remote, so other devices will download the items with the new GUIDs, then change the local invalid GUIDs to the new valid ones. It's also OK if maintenance runs separately on two different devices, and we end up changing the same invalid GUID to two different valid ones. In that case, deduping should kick in, and pick the valid GUID that's on the server when the other device syncs. There are ways this can go wrong, of course, and we'll want to make sure it's well-tested...but this should unstick those users.

WDYT, Mark?
Flags: needinfo?(kit)
For "fun", I looked at all "invalid property" errors we've seen in the last 3 months:

first field: property name
second field: a massaged value. string[xx] is a string xx bytes long, quoted-string[xx] is a string xx bytes plus 2 quote chars, for "modified" I split the numbers into ranges, and everything else numeric appears to be actual numeric values (although stored as a string in the DB).
third field: number of failed syncs with the particular (massaged) value.

I'm not sure if we should attempt to fix all of these in this bug, although I *guess* it makes sense to (and almost certainly makes sense to fix parentGuid, which we'll probably be forced to touch just fixing guid). I'll get back to comment 7 tomorrow.

[((u'guid', 1.0), 1),
 ((u'guid', 2.0), 14),
 ((u'guid', 17.0), 155),
 ((u'guid', 9088.0), 6027),
 ((u'guid', u'"null"'), 361639),
 ((u'guid', 'massive positive number'), 637),
 ((u'guid', u'null'), 14798361),
 ((u'guid', 'quoted-string[10]'), 828),
 ((u'guid', 'quoted-string[11]'), 10709),
 ((u'guid', 'quoted-string[12]'), 99),
 ((u'guid', 'quoted-string[14]'), 17132),
 ((u'guid', 'quoted-string[22]'), 12262),
 ((u'guid', 'quoted-string[26]'), 2936),
 ((u'guid', 'quoted-string[27]'), 38),
 ((u'guid', 'quoted-string[32]'), 30694),
 ((u'guid', 'quoted-string[36]'), 1768),
 ((u'guid', 'quoted-string[38]'), 347),
 ((u'guid', 'quoted-string[4]'), 384),
 ((u'guid', 'quoted-string[6]'), 54),
 ((u'guid', 'quoted-string[9]'), 58),
 ((u'guid', 'string[10]'), 81924),
 ((u'guid', 'string[11]'), 817362),
 ((u'guid', 'string[12]'), 18135),
 ((u'guid', 'string[14]'), 67),
 ((u'guid', 'string[172]'), 773),
 ((u'guid', 'string[22]'), 119332),
 ((u'guid', 'string[26]'), 104721),
 ((u'guid', 'string[27]'), 20671),
 ((u'guid', 'string[32]'), 609054),
 ((u'guid', 'string[36]'), 55257),
 ((u'guid', 'string[38]'), 5762),
 ((u'guid', 'string[4]'), 646),
 ((u'guid', 'string[6]'), 374),
 ((u'guid', 'string[8]'), 191),
 ((u'guid', 'string[9]'), 5054),
 ((u'modified', 'nan'), 413),
 ((u'modified', 'negative number'), 2028797),
 ((u'parentGuid', 'string[22]'), 1),
 ((u'parentGuid', 'string[32]'), 1),
 ((u'status', 1.0), 1),
 ((u'status', 4.0), 29),
 ((u'status', 5.0), 2),
 ((u'url', 'string[5]'), 381)]
Comment on attachment 8915786 [details]
Bug 1380606 - Add a Places maintenance task to fix up items with invalid GUIDs.

https://reviewboard.mozilla.org/r/187004/#review192050

::: toolkit/components/places/PlacesDBUtils.jsm:219
(Diff revision 1)
> +            SELECT id FROM moz_bookmarks
> +            WHERE guid IS NULL OR
> +                  NOT VALID_GUID(guid)`);
> +          for (let row of idsWithInvalidGuidsRows) {
> +            let id = row.getResultByName("id");
> +            PlacesUtils.invalidateCachedGuidFor(id);

I considered adding a `PlacesUtils.invalidateAllCachedGuids` method to clear the entire cache, instead of an extra `invalidateCaches` step to invalidate them one-by-one here...but it seems excessive to clobber the entire cache for a handful of invalid GUIDs.
Thanks for this analysis, Mark!

(In reply to Mark Hammond [:markh] from comment #8)
>  ((u'modified', 'nan'), 413),
>  ((u'modified', 'negative number'), 2028797),

I'd be curious to see if this decreases over time with bug 1368715, or if that fix was incomplete.

>  ((u'status', 1.0), 1),
>  ((u'status', 4.0), 29),
>  ((u'status', 5.0), 2),

This is bizarre. I think `status` comes from https://searchfox.org/mozilla-central/rev/8efd128b48cdae3a6c6862795ce618aa1ca1c0b4/toolkit/components/places/PlacesSyncUtils.jsm#566-573,1982; 4 and 5 aren't valid `SYNC_STATUS_` constants. I'm also puzzled why "1" would be an error, but, given that these occur so infrequently compared to the rest, we can let them slide.
Comment on attachment 8915784 [details]
Bug 1380606 - Add `mozIStorageAsyncConnection::createDeterministicFunction`.

https://reviewboard.mozilla.org/r/187000/#review192318

Apart from the semantic correctness of dterministic VS not, is there any specific advantage of deterministic function your are trying to obtain here?
Off-hand you don't seem to use this function in any of the forbidden ways for non-deterministic functions expressed in https://sqlite.org/deterministic.html and thus it looks like this could just use the existing CreateFunction API.
I don't think there's any specific perf optimization for a deterministic function vs the others (but I don't have certainty).
Comment on attachment 8915785 [details]
Bug 1380606 - Add an `IS_VALID_GUID()` SQL function to Places.

https://reviewboard.mozilla.org/r/187002/#review192320

::: toolkit/components/places/SQLFunctions.cpp:783
(Diff revision 1)
> +  nsresult
> +  IsValidGUIDFunction::create(mozIStorageConnection *aDBConn)
> +  {
> +    RefPtr<IsValidGUIDFunction> function = new IsValidGUIDFunction();
> +    return aDBConn->CreateDeterministicFunction(
> +      NS_LITERAL_CSTRING("valid_guid"), 1, function

any reason to not name it completely as "is_valid_guid"?
Attachment #8915784 - Attachment is obsolete: true
Attachment #8915784 - Flags: review?(mak77)
Comment on attachment 8915784 [details]
Bug 1380606 - Add `mozIStorageAsyncConnection::createDeterministicFunction`.

https://reviewboard.mozilla.org/r/187000/#review192318

Just for correctness. I was thinking we could mark `autocomplete_match`, `hash`, `get_unreversed_host`, `fixup_url`, and the case-folding functions as deterministic for clarity, too...but you're right that we don't use them anywhere where we can use a regular function, and speculative optimizations aren't a good reason to add a new API. I'll just drop this patch.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #7)
...
> WDYT, Mark?

That analysis SGTM, thanks!
Flags: needinfo?(markh)
Comment on attachment 8915785 [details]
Bug 1380606 - Add an `IS_VALID_GUID()` SQL function to Places.

https://reviewboard.mozilla.org/r/187002/#review192698
Attachment #8915785 - Flags: review?(mak77) → review+
Comment on attachment 8915786 [details]
Bug 1380606 - Add a Places maintenance task to fix up items with invalid GUIDs.

https://reviewboard.mozilla.org/r/187004/#review192704

::: toolkit/components/places/PlacesDBUtils.jsm:803
(Diff revision 2)
> +      query:
> +      `INSERT OR IGNORE INTO moz_bookmarks_deleted(guid, dateRemoved)
> +       SELECT guid, :dateRemoved
> +       FROM moz_bookmarks
> +       WHERE syncStatus <> :syncStatus AND
> +             guid NOT NULL AND

I assume we skip null guids because moz_bookmarks_deleted(guid) is a primary key and thus it's not null?

::: toolkit/components/places/PlacesDBUtils.jsm:828
(Diff revision 2)
> +      `UPDATE moz_bookmarks
> +       SET guid = GENERATE_GUID(),
> +           syncChangeCounter = syncChangeCounter + 1
> +       WHERE id IN (SELECT id FROM moz_bookmarks
> +                    WHERE guid IS NULL OR
> +                          NOT IS_VALID_GUID(guid))`,

Is there a reason to not just do UPDATE ... WHERE guid IS NULL OR NOT is_valid_guid(guid)?
Attachment #8915786 - Flags: review?(mak77) → review+
Comment on attachment 8915786 [details]
Bug 1380606 - Add a Places maintenance task to fix up items with invalid GUIDs.

https://reviewboard.mozilla.org/r/187004/#review192704

> I assume we skip null guids because moz_bookmarks_deleted(guid) is a primary key and thus it's not null?

Yes, and also because Sync records can't have a `null` ID.

> Is there a reason to not just do UPDATE ... WHERE guid IS NULL OR NOT is_valid_guid(guid)?

D'oh, that was silly of me. Fixed, thanks!
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/455609e45fe7
Add an `IS_VALID_GUID()` SQL function to Places. r=mak
https://hg.mozilla.org/integration/autoland/rev/479c9304b703
Add a Places maintenance task to fix up items with invalid GUIDs. r=mak
https://hg.mozilla.org/mozilla-central/rev/455609e45fe7
https://hg.mozilla.org/mozilla-central/rev/479c9304b703
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 58
See Also: → 1407922
Telemetry shows a massive drop for this error since mid Oct on 58 - nice work!
Status: RESOLVED → VERIFIED

The bookmarks.get example from the documentation returns invalid guid in Firefox 65.0.1

Not sure if a new issue should be started for this, but this looks directly related.

https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/bookmarks/get

function onFulfilled(bookmarks) {
  console.log(bookmarks);
}

function onRejected(error) {
  console.log(`An error: ${error}`);
}

var gettingBookmarks = browser.bookmarks.get("abcdefghijkl");
gettingBookmarks.then(onFulfilled, onRejected);

Console output:

An error: Error: Bookmarks.jsm: fetch: Invalid value for property 'guid': "TestBookmark" background.js:6:3

It's not related, but good catch! 👀 I've updated the example on MDN to use a valid GUID–though it'll still call onRejected, because GUIDs are randomly generated, and it's unlikely that bookmarkAAAA would exist. Thanks!

You need to log in before you can comment on or make changes to this bug.