Add facility for measuring inconsistent client/server bookmark states

RESOLVED FIXED in Firefox 49

Status

()

Firefox
Sync
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: tcsc, Assigned: tcsc)

Tracking

unspecified
Firefox 49
Points:
---
Dependency tree / graph
Bug Flags:
firefox-backlog +

Firefox Tracking Flags

(firefox49 fixed)

Details

(Whiteboard: [sync-data-integrity])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

2 years ago
As mentioned in bug 1263835, this should land behind a nightly-only pref.

To measure this, it requires grabbing the entire server bookmark tree, so some of the telemetry should measure how long this takes as well (and whether or not we got it all).
(Assignee)

Updated

2 years ago
Assignee: nobody → tchiovoloni
(Assignee)

Comment 1

2 years ago
Created attachment 8743571 [details]
MozReview Request: Bug 1265419 - Implement a validator for bookmarks, that reports errors in the server-side bookmark store, and inconsistencies between server and client r?markh

Review commit: https://reviewboard.mozilla.org/r/47889/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/47889/
(Assignee)

Comment 2

2 years ago
Created attachment 8743572 [details]
MozReview Request: Bug 1265419 - Hook bookmark sync validator into the bookmark sync engine and add a telemetry probe for it f?markh

Review commit: https://reviewboard.mozilla.org/r/47891/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/47891/
(Assignee)

Updated

2 years ago
Attachment #8743571 - Flags: feedback?(markh)
(Assignee)

Updated

2 years ago
Attachment #8743572 - Flags: feedback?(markh)
(Assignee)

Comment 3

2 years ago
WIP checkin. This still needs to add telemetry reporting for the actual results, and I also think it needs to test the validation in some way (shouldn't be too bad, since the validator takes the server and client data in as arguments).

It's also could probably use a little bit of clean up as the function checking the server data is getting up there in size.

(Also, I'm not sure this is the right way to do a nightly-only pref)
https://reviewboard.mozilla.org/r/47889/#review44725

This is looking great - I like the way problems is reported, and can imagine our first cut at telemetry would be to write the exact same structure as you have here, but using the length of each array rather than the individual elements. This was just a first pass and I didn't attempt to check the logic, but I think we definately want tests for this - the tests should be relatively easy given it's already setup to run against a list of records without knowledge of where those records live.

::: services/sync/modules/engines/bookmarks.js:1610
(Diff revision 1)
>  
>    onBeginUpdateBatch: function () {},
>    onEndUpdateBatch: function () {},
>    onItemVisited: function () {}
>  };
> +

I'd be inclined to think this could be a separate module

::: services/sync/modules/engines/bookmarks.js:1697
(Diff revision 1)
> +   *       do not match
> +   *     - cycles (array of array of ids). List of cycles found in the "tree".
> +   *     - orphans (array of ids): List of nodes with no parentid.
> +   *     - missingChildren (array of {parent: id, child: id}):
> +   *       List of parent/children where the child id couldn't be found
> +   *     - missingParents (array of ids): list of records whose parentid

I'm unclear from this comment what missingParents is - is it a list of IDs for existing children for which the parent GUID is missing (ie, orphans), or is it a list of the missing parent IDs (ie, GUIDs which should exist as an existing record points at them, but doesn't)?

If the former, it might be clearer to call it "orphanedChildren" or similar, but if the latter the comment should be updated.

Similarly for "orphans" a few lines above - "list of nodes with no parentid" is subtly different from "list of nodes with a parentid that doesn't exist".

I wonder if we could combine these into something like:

orphans: array of {id, parentid} where id is an existing record and parentid is the ID of a missing parent record, or null if the record did not specify a parent.

(or maybe I'm missing the point entirely :)

::: services/sync/modules/engines/bookmarks.js:1772
(Diff revision 1)
> +        if (new Set(record.children).size !== record.children.length) {
> +          log(`Record "${record.id}" has duplicated entries in it's children array`);
> +          problemData.duplicateChildren.push(record.id)
> +        }
> +
> +        // This whole next part is a huge hack.

lol

::: services/sync/modules/engines/bookmarks.js:1819
(Diff revision 1)
> +      // avoid complaining about stuff like it missing it's childGUIDs later.
> +      root = { id: 'places', phony: true, children: [], type: 'folder', title: '' };
> +      resultRecords.push(root);
> +      idToRecord.set('places', root);
> +    } else {
> +      problemData.rootOnServer = true;

do we really need |phony| given this flag? (ie, it seems an invariant we will have a root, and problemData.rootOnServer tells us whether it's "phony" or not.

(I don't really care, but the item with phony=true sounds like a "problem" when it's actually desired and intended)

::: services/sync/modules/engines/bookmarks.js:1969
(Diff revision 1)
> +   * Returns the problemData described in the inspectServerRecords comment,
> +   * with the following additional fields.
> +   * - clientMissing: Array of ids on the server missing from the client
> +   * - serverMissing: Array of ids on the client missing from the server
> +   * - differences: Array of {id: string, differences: string array} recording
> +   *   the properties that are differente between the client and server

trivial typo - trailing "e" on different

::: services/sync/modules/engines/bookmarks.js:1970
(Diff revision 1)
> +   * with the following additional fields.
> +   * - clientMissing: Array of ids on the server missing from the client
> +   * - serverMissing: Array of ids on the client missing from the server
> +   * - differences: Array of {id: string, differences: string array} recording
> +   *   the properties that are differente between the client and server
> +   * - good: Array of ids of items where the client and server fields are both

do we really need "good"?
https://reviewboard.mozilla.org/r/47891/#review44733

Looks good - I think the prefs thing is easy to fix. I'm unsure about some of my other comments, but thought they are worth starting a discussion about.

::: browser/branding/aurora/pref/firefox-branding.js:39
(Diff revision 1)
>  // Number of usages of the web console or scratchpad.
>  // If this is less than 5, then pasting code into the web console or scratchpad is disabled
>  pref("devtools.selfxss.count", 5);
> +
> +// Whether or not we should run the (costly) client/server bookmark validation.
> +pref("services.sync.bookmarks.validationEnabled", false);

We should juse use a #ifdef in browser/app/profile/firefox.js - search for existing |#ifdef NIGHTLY_BUILD| blocks in there. However, almost all services.sync prefs are in services/sync/services-sync.js - but I've no idea if that is setup to be pre-processed - if so, it should probably go there instead.

Another option is that we just treat your |services.sync.bookmarks.validationInterval| pref as the flag - ie, default it to zero on all other channels and treat 0 as "never"

::: services/sync/modules/engines/bookmarks.js:242
(Diff revision 1)
>  
>      // Expose the exception if something inside the batch failed
>      if (batchEx != null) {
>        throw batchEx;
>      }
> +    if (Svc.Prefs.get("bookmarks.validationEnabled")) {

and this block will already treat 0 as "never" :)

We might need to put some more thought into this scheduling though - eg, maybe it makes sense to be more aggressive when a bookmark Sync actually did something, and the far-less aggressive schedule in other cases? No firm thoughts about this yet though...

Another 1/2-baked thought - it might make sense to do this after sync completes in its entirety, just so we don't make it less likely long syncs actually complete. Off the top of my head, it might make sense to, say:

* Create a base "validator" class that has the getServerState() function for its engine.
* Add a method to SyncEngine in engines.js called, say, getValidator() - only bookmarks returns non-null currently, but later history and passwords for example might. Passwords could create a new instance of a "validator" subclass and pass the engine in.
* services.js waits for sync to complete, and it implements the "how often" logic, and if it decides now, it attempts to get the validator object and call its (say) .validate() method.

As mentioned, a 1/2-baked thought :)

::: services/sync/modules/engines/bookmarks.js:243
(Diff revision 1)
>      // Expose the exception if something inside the batch failed
>      if (batchEx != null) {
>        throw batchEx;
>      }
> +    if (Svc.Prefs.get("bookmarks.validationEnabled")) {
> +      let lastValidation = parseInt(Svc.Prefs.get("bookmarks.lastValidation", "0"));

there's no good reason not to use ints best I can tell, particularly if we store seconds instead of ms (which I think is what we tend to do in other places)

::: services/sync/modules/engines/bookmarks.js:246
(Diff revision 1)
>      }
> +    if (Svc.Prefs.get("bookmarks.validationEnabled")) {
> +      let lastValidation = parseInt(Svc.Prefs.get("bookmarks.lastValidation", "0"));
> +      let validationInterval = Svc.Prefs.get("bookmarks.validationInterval", 1000*60*60*24);
> +      if (Date.now()-lastValidation > validationInterval) {
> +        this.validateState();

we're going to want a try/catch here with a log.error (and once we have a sync-specific telemetry story, record the fact we failed) - but it should ignore an Async.isShutdownException() exception.

::: services/sync/modules/engines/bookmarks.js:536
(Diff revision 1)
> +    // Start waiting for this before we send off the request for the server state
> +    let treePromise = PlacesUtils.promiseBookmarksTree("", {
> +      includeItemIds: true
> +    });
> +    let serverState = this._getServerState();
> +    let clientTree = Async.promiseSpinningly(treePromise);

I think that if possible we should make the validators promise-based, possibly via Task.jsm, and have the hacky promiseSpinningly up in the caller - it will make the inevitable future "remove event-loop spinning" work that bit easier to manage.

(We do still have the problem of Task.jsm implemented stuff not giving good stack traces, so that might be something we regret, but we can't avoid addressing this - I suspect bug 1095267 will fix that problem and that might be right up your alley anyway ;)

::: services/sync/modules/engines/bookmarks.js:538
(Diff revision 1)
> +      includeItemIds: true
> +    });
> +    let serverState = this._getServerState();
> +    let clientTree = Async.promiseSpinningly(treePromise);
> +    let result = BookmarkValidator.compareServerWithClient(
> +      serverState, clientTree, this._log.warn.bind(this._log))

We might want to see how much log spam this generates, especially given the log messages aren't particularly actionable (eg, if a user uploads a log, the messages aren't going to tell us something we can help the user with (ie, we aren't going to be able to tell the user what a specific ID corresponds to) - it's almost as though this log output is more for diagnostics for us, and the logs themselves should default to the same summary info we are thinking about for telemetry. IOW, "5 multiple parents, 3 orphans, 1 cycle" might be enough.

If it transpires that we only tend to see a small number of these log message though it's probably fine. This is the kind of thing we can tweak further after landing.

::: toolkit/components/telemetry/Histograms.json:10109
(Diff revision 1)
>      "description": "Exceptions thrown by a Sync engine. Keyed on the engine name."
>    },
> +  "WEAVE_BOOKMARK_VALIDATION_DURATION": {
> +    "alert_emails": ["fx-team@mozilla.com"],
> +    "bug_numbers": [1265419, 1263835],
> +    "expires_in_version": "never",

We might have trouble justifying "never" here - so let's try 60 ;) FYI, we will need to get signoff on a "data collection peer" here (and for anything we do that touches any telemetry)
(Assignee)

Comment 6

2 years ago
https://reviewboard.mozilla.org/r/47889/#review44725

> I'd be inclined to think this could be a separate module

It would have been, but it need's access to kSpecialIds. Although, possibly that should be public.

> I'm unclear from this comment what missingParents is - is it a list of IDs for existing children for which the parent GUID is missing (ie, orphans), or is it a list of the missing parent IDs (ie, GUIDs which should exist as an existing record points at them, but doesn't)?
> 
> If the former, it might be clearer to call it "orphanedChildren" or similar, but if the latter the comment should be updated.
> 
> Similarly for "orphans" a few lines above - "list of nodes with no parentid" is subtly different from "list of nodes with a parentid that doesn't exist".
> 
> I wonder if we could combine these into something like:
> 
> orphans: array of {id, parentid} where id is an existing record and parentid is the ID of a missing parent record, or null if the record did not specify a parent.
> 
> (or maybe I'm missing the point entirely :)

No, that's basically the distinction. Orphans are the items where the parentid was not specified at all, and missingParents are the nodes where the parentid was specified but no such parent existed.

> lol

I'd love for this not to be the case, but as the comment says, there doesn't seem to be a great way to get, err, "special" ids, from the real guids stored in the childrens array.

I took a bit of time staring at the [iOS sync validation code](https://github.com/mozilla/firefox-ios/blob/master/Storage/SQL/SQLiteBookmarksSyncing.swift#L679) when writing this, which turned out to be just a bunch of sql queries. So either it doesn't handle this, or it's handled when inserting into the database that mirrors the server initially. Meant to dig in further, but decided to get it working, but hacky first.

Also, my earlier statement that the children array is duplicated information isn't exactly true -- the children array is what the order is determined from.

> do we really need |phony| given this flag? (ie, it seems an invariant we will have a root, and problemData.rootOnServer tells us whether it's "phony" or not.
> 
> (I don't really care, but the item with phony=true sounds like a "problem" when it's actually desired and intended)

That's a good point, no, we don't need both of them. 

When I wrote this initially, I thought the root being on the server *was* intentional -- not to mention my first pass was producing a `problems` array of messages along the lines of what aboutsync creates (most of which is preserved as log messages, actually), but obviously that would be less useful than what it produces now.

> do we really need "good"?

No, and I waffled a bit on including it. It makes determining e.g. the % of records that are properly synced easier, but is probably not that useful overall.
(Assignee)

Comment 7

2 years ago
https://reviewboard.mozilla.org/r/47891/#review44733

> We should juse use a #ifdef in browser/app/profile/firefox.js - search for existing |#ifdef NIGHTLY_BUILD| blocks in there. However, almost all services.sync prefs are in services/sync/services-sync.js - but I've no idea if that is setup to be pre-processed - if so, it should probably go there instead.
> 
> Another option is that we just treat your |services.sync.bookmarks.validationInterval| pref as the flag - ie, default it to zero on all other channels and treat 0 as "never"

That sounds good to me. Again, I was unsure what the correct way to go about this was.

> and this block will already treat 0 as "never" :)
> 
> We might need to put some more thought into this scheduling though - eg, maybe it makes sense to be more aggressive when a bookmark Sync actually did something, and the far-less aggressive schedule in other cases? No firm thoughts about this yet though...
> 
> Another 1/2-baked thought - it might make sense to do this after sync completes in its entirety, just so we don't make it less likely long syncs actually complete. Off the top of my head, it might make sense to, say:
> 
> * Create a base "validator" class that has the getServerState() function for its engine.
> * Add a method to SyncEngine in engines.js called, say, getValidator() - only bookmarks returns non-null currently, but later history and passwords for example might. Passwords could create a new instance of a "validator" subclass and pass the engine in.
> * services.js waits for sync to complete, and it implements the "how often" logic, and if it decides now, it attempts to get the validator object and call its (say) .validate() method.
> 
> As mentioned, a 1/2-baked thought :)

Hm, yes, that sounds like it wouldn't be a terrible idea.

I considered abstracting this out into something more generic, but didn't mainly because I wasn't certain it would be necessarially be used. But doing so isn't really problematic, aside from a couple annoying dependencies that BookmarkValidator has on items declared locally in bookmarks.js (`kSpecialIds` mainly).

> there's no good reason not to use ints best I can tell, particularly if we store seconds instead of ms (which I think is what we tend to do in other places)

I used ints initially, but Date.now() overflows 32 bits, which causes Svc.Prefs.set to throw. Truncating before passing it in is an option, but not a great one IMO, since if it ever wraps around (and I have absolutely no idea how far off that would be), it could be a very long time before the validator run unless we handled that explicitly.

I considered saving the date in UTC format, but it would require more work to handle the "missing" case.

> we're going to want a try/catch here with a log.error (and once we have a sync-specific telemetry story, record the fact we failed) - but it should ignore an Async.isShutdownException() exception.

Right, getting the client and server states can still certainly fail (the validation shouldn't fail, except in the case of a bug, although I guess it would be a good idea to catch that too).

> I think that if possible we should make the validators promise-based, possibly via Task.jsm, and have the hacky promiseSpinningly up in the caller - it will make the inevitable future "remove event-loop spinning" work that bit easier to manage.
> 
> (We do still have the problem of Task.jsm implemented stuff not giving good stack traces, so that might be something we regret, but we can't avoid addressing this - I suspect bug 1095267 will fix that problem and that might be right up your alley anyway ;)

The main reason I did it this way was because there didn't appear to be a way to grab the server data asynchronously, although I didn't look that hard. Generally speaking, yeah, there's no reason not to use promises here.

> We might want to see how much log spam this generates, especially given the log messages aren't particularly actionable (eg, if a user uploads a log, the messages aren't going to tell us something we can help the user with (ie, we aren't going to be able to tell the user what a specific ID corresponds to) - it's almost as though this log output is more for diagnostics for us, and the logs themselves should default to the same summary info we are thinking about for telemetry. IOW, "5 multiple parents, 3 orphans, 1 cycle" might be enough.
> 
> If it transpires that we only tend to see a small number of these log message though it's probably fine. This is the kind of thing we can tweak further after landing.

That's not a bad point. These are very much less than ideal. A summary would be easy to generate given the current data. Another option might be to put more information in the logs so that they're actually useful (if that's possible, for some of the logs it certainly is -- e.g. for differences, what the differences are, etc), and only log this stuff out at level debug or trace, and put the summary as a warning.

For what it's worth, I do think the items in a cycle has the possibility of being useful information, since it could tell us if it goes through a root.

> We might have trouble justifying "never" here - so let's try 60 ;) FYI, we will need to get signoff on a "data collection peer" here (and for anything we do that touches any telemetry)

Sounds good, I had no idea what the correct thing to put here was, so I basically just guessed for everything that went in this file.
(In reply to Thom Chiovoloni [:tcsc] from comment #7)
> That sounds good to me. Again, I was unsure what the correct way to go about
> this was.

There's rarely a single answer to that question - it comes down to taste, and TBH I'm generally fine with accepting tastes are subjective - ie, whatever you think is best and clearest is fine with me. I have lots of strong opinions weakly held :)

> > Another 1/2-baked thought - it might make sense to do this after sync completes in its entirety, just so we don't make it less likely long syncs actually complete. Off the top of my head, it might make sense to, say:
> > 
> > * Create a base "validator" class that has the getServerState() function for its engine.
> > * Add a method to SyncEngine in engines.js called, say, getValidator() - only bookmarks returns non-null currently, but later history and passwords for example might. Passwords could create a new instance of a "validator" subclass and pass the engine in.
> > * services.js waits for sync to complete, and it implements the "how often" logic, and if it decides now, it attempts to get the validator object and call its (say) .validate() method.
> > 
> > As mentioned, a 1/2-baked thought :)
> 
> Hm, yes, that sounds like it wouldn't be a terrible idea.
> 
> I considered abstracting this out into something more generic, but didn't
> mainly because I wasn't certain it would be necessarially be used. But doing
> so isn't really problematic, aside from a couple annoying dependencies that
> BookmarkValidator has on items declared locally in bookmarks.js
> (`kSpecialIds` mainly).

Thinking a little more about this though, I'm not sure we will ever end up with many different implementations - bookmarks is a special snowflake as it tries to maintain a tree, but every other collection should map exactly to local items - each "tracker" already has a getAllIDs and a way to convert a local object to a "record" - so ISTM that every other collection could probably share a fairly generic and simple validator (ie, read all server records, get all local objects as records, simple object compare of the contents). It's not even clear that such a validator even makes sense for them TBH, and that a simple "find missing records" might be enough.

So I think we should still consider something like the above, but with the assumption that there aren't going to be many implementations - eg, maybe a modules/validators directory doesn't make sense (but maybe it also doesn't hurt) - but this too is somewhat personal taste, so I wont bikeshed it too much. I do think exposing the dependencies you need directly from bookmarks.js is fine though and that this validator should be split to avoid increasing the complexity of bookmarks.js even further.

> > there's no good reason not to use ints best I can tell, particularly if we store seconds instead of ms (which I think is what we tend to do in other places)
> 
> I used ints initially, but Date.now() overflows 32 bits, which causes
> Svc.Prefs.set to throw. Truncating before passing it in is an option, but
> not a great one IMO, since if it ever wraps around (and I have absolutely no
> idea how far off that would be), it could be a very long time before the
> validator run unless we handled that explicitly.
> 
> I considered saving the date in UTC format, but it would require more work
> to handle the "missing" case.

Saving of seconds is very common - grepping for "/ 1000" in services/sync should give you some idea of that, and often the values are stuck directly in prefs. If we ever manage to overflow a preference with a date stored in seconds we are screwed in many other ways.

> except in the case of a bug, although I guess it
> would be a good idea to catch that too).

Yeah, exactly :)

> That's not a bad point. These are very much less than ideal. A summary would
> be easy to generate given the current data. Another option might be to put
> more information in the logs so that they're actually useful (if that's
> possible, for some of the logs it certainly is -- e.g. for differences, what
> the differences are, etc), and only log this stuff out at level debug or
> trace, and put the summary as a warning.

Most logs are already at "debug", so "trace" might work. But it's also worth considering privacy here - the sync logs already expose a lot of private information (eg, the URLs of their bookmarks), but we should avoid leaking even more of that data without reason (eg, when a bookmark was last visited seems more sensitive than just the URL). In the past, when we are logging clearly undesirable data (such as encryption keys) we've used a different pref rather than a verbose log level - the thinking was that the user flipping a well-named pref is more obvious than just changing a log level (not my idea BTW, but whatever :) So that's another possibility - another pref that logs all the details - possibly even every record so we an reproduce the full tree. We can make this up as we go though.

> For what it's worth, I do think the items in a cycle has the possibility of
> being useful information, since it could tell us if it goes through a root.

Yeah, that's true - but it still doesn't seem actionable - it would be somewhat interesting, but wouldn't change what advice we would give to a user based on that appearing in their logs. Maybe a cycle going through a root should just be explicit problem metadata?

OTOH though, I don't think there's a privacy concern with just reporting IDs - so yeah, whatever makes sense.

> Sounds good, I had no idea what the correct thing to put here was, so I
> basically just guessed for everything that went in this file.

There's a slightly higher bar for collecting info with "never" in terms of justifying the user benefit. The bar isn't *that* high, but I like avoiding administrivia where I can :)
(Assignee)

Comment 9

2 years ago
> There's rarely a single answer to that question - it comes down to taste, and
> TBH I'm generally fine with accepting tastes are subjective - ie, whatever you
> think is best and clearest is fine with me. I have lots of strong opinions
> weakly held :)

Well, the way you're suggesting seems to work more reliably, so I definitely prefer it to what I have.


> So I think we should still consider something like the above, but with the
> assumption that there aren't going to be many implementations - eg, maybe a
> modules/validators directory doesn't make sense (but maybe it also doesn't
> hurt) - but this too is somewhat personal taste, so I wont bikeshed it too
> much.

Hm... Okay. My personal taste is generally towards minimizing unused genericity where possible, since it so frequently doesn't end up being used, in which case it's just a pile of extra complexity (lots of parts of sync seem to be examples of this).

I think a reasonable tradeoff here might be to split bookmark validator out into its own module, but also use the getValidator() approach you mentioned, to allow the scheduling to be performed by something other than the bookmark engine (e.g. services.js), at the very least to remove any temptation to duplicate that logic for when other engines get validators.

> I do think exposing the dependencies you need directly from bookmarks.js is
> fine though and that this validator should be split to avoid increasing the
> complexity of bookmarks.js even further.

Well, they either need to be made into their own module (to avoid a circular dependency between bookmark_validator.js and bookmarks.js). That or they need to be attached to the engine (which would be provided as a constructor argument to the validator).

Honestly, I don't love either of these options, but the first is probably the better option, since there's really no reason the validator needs to have a reference to the engine when doing the actual validation logic, and giving it one for this purpose would just make testing more difficult (even though I'm not a fan of kSpecialId's existence, and don't like to see it get it's own file -- not that it will be the only item in that file).

> Saving of seconds is very common - grepping for "/ 1000" in services/sync
> should give you some idea of that, and often the values are stuck directly in
> prefs. If we ever manage to overflow a preference with a date stored in
> seconds we are screwed in many other ways.

Yes, 2038 will be fun -- but hopefully we've worked out the bugs in bookmark sync by that point :p

> Most logs are already at "debug", so "trace" might work. But it's also worth
> considering privacy here - the sync logs already expose a lot of private
> information (eg, the URLs of their bookmarks), but we should avoid leaking
> even more of that data without reason (eg, when a bookmark was last visited
> seems more sensitive than just the URL). In the past, when we are logging
> clearly undesirable data (such as encryption keys) we've used a different pref
> rather than a verbose log level - the thinking was that the user flipping a
> well-named pref is more obvious than just changing a log level (not my idea
> BTW, but whatever :) So that's another possibility - another pref that logs
> all the details - possibly even every record so we an reproduce the full tree.
> We can make this up as we go though.

Hmm... A pref sounds good, but I'm willing to not log any of the intermediate data and just log a summary for now -- Do you think I should keep the logging code around, or move it out of the validator entirely. (Part of my motivation for keeping it in this way would be to give it to aboutsync, but it also is probably redundant at this point, given that everything in it is also reported in some other way).

> Yeah, that's true - but it still doesn't seem actionable - it would be
> somewhat interesting, but wouldn't change what advice we would give to a user
> based on that appearing in their logs. Maybe a cycle going through a root
> should just be explicit problem metadata?

A cycle going through a root might be a bad example (it's interesting, but honestly having a cycle at all means your bookmarks are really quite broken). A better example (that I actually have have) is where a child has two parents, but the two parents are the same bookmark in every way *except* for id. But I'm not sure that this case is interesting/actionable enough either.
(Assignee)

Comment 10

2 years ago
Comment on attachment 8743571 [details]
MozReview Request: Bug 1265419 - Implement a validator for bookmarks, that reports errors in the server-side bookmark store, and inconsistencies between server and client r?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47889/diff/1-2/
Attachment #8743571 - Flags: feedback?(markh)
Attachment #8743572 - Flags: feedback?(markh)
(Assignee)

Comment 11

2 years ago
Comment on attachment 8743572 [details]
MozReview Request: Bug 1265419 - Hook bookmark sync validator into the bookmark sync engine and add a telemetry probe for it f?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47891/diff/1-2/
(Assignee)

Comment 12

2 years ago
Comment on attachment 8743572 [details]
MozReview Request: Bug 1265419 - Hook bookmark sync validator into the bookmark sync engine and add a telemetry probe for it f?markh

Is this something like what you had in mind for scheduling?

I'm not confident this is the, err, correct place to put or do this. Specifically, I don't know if we should be holding the sync lock/preventing sync while validating (hopefully I understood the code correctly and that is what that does), and I don't know if we should have separate prefs for each engine's validation (doesn't seem like it's likely to be worthwhile, but maybe?).

The thing that's most consistent with how everything works now is probably to offload it to some SyncScheduler or whatever, but that strikes me as overkill.
Attachment #8743572 - Flags: feedback?(markh)
(Assignee)

Updated

2 years ago
Attachment #8743571 - Flags: feedback?(markh)
(Assignee)

Comment 13

2 years ago
Comment on attachment 8743571 [details]
MozReview Request: Bug 1265419 - Implement a validator for bookmarks, that reports errors in the server-side bookmark store, and inconsistencies between server and client r?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47889/diff/2-3/
Attachment #8743571 - Attachment description: MozReview Request: Bug 1265419 - Implement a validator for bookmarks, that reports errors in the server-side bookmark store, and inconsistencies between server and client f?markh → MozReview Request: Bug 1265419 - Implement a validator for bookmarks, that reports errors in the server-side bookmark store, and inconsistencies between server and client r?markh
Attachment #8743571 - Flags: feedback?(markh) → review?(markh)
Attachment #8743572 - Flags: feedback?(markh)
(Assignee)

Comment 14

2 years ago
Comment on attachment 8743572 [details]
MozReview Request: Bug 1265419 - Hook bookmark sync validator into the bookmark sync engine and add a telemetry probe for it f?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47891/diff/2-3/
(Assignee)

Comment 15

2 years ago
Comment on attachment 8743572 [details]
MozReview Request: Bug 1265419 - Hook bookmark sync validator into the bookmark sync engine and add a telemetry probe for it f?markh

Still want feedback on this one, but updated the other to be r? instead of f?, which apparently cancels the feedback requests on both (weird -- next time i'll see if there's a way to avoid this).
Attachment #8743572 - Flags: feedback?(markh)
Flags: firefox-backlog+
Priority: -- → P1
Comment on attachment 8743571 [details]
MozReview Request: Bug 1265419 - Implement a validator for bookmarks, that reports errors in the server-side bookmark store, and inconsistencies between server and client r?markh

https://reviewboard.mozilla.org/r/47889/#review45735

Looks great, thanks. I don't think we should land the "integrate" patch yet though - we should probably get the telemetry bits going first - I don't see much value in having nightly users get these checks run but going to what is (effectively) a black hole (presumably users will still see successful syncs, and they don't write logsfiles by default).

I think there's value in landing this and hooking it up to that addon, so I'd suggest you open a new bug for the integration part and put that second patch there, then land this chunk after a successful try run.

::: services/sync/tests/unit/test_bookmark_validator.js:33
(Diff revision 3)
> +    {id: 'A', type: 'folder', children: ['B'], parentid: 'B'},
> +    {id: 'B', type: 'folder', children: ['A'], parentid: 'A'},
> +  ]).problemData;
> +
> +  equal(c.cycles.length, 1);
> +  if (c.cycles[0]) {

here and below - remove the additional checks as equal will throw so we won't get to the next test anyway - and even if we did, it's probably clearer to let it fail noisily (and in this particular case it would be more correct - c.cycles being an array of |[null]| would obviously be wrong, but it looks like this test would pass.

::: services/sync/tests/unit/test_bookmark_validator.js:254
(Diff revision 3)
> +  test_cswc_serverMissing();
> +  test_cswc_differences();
> +}
> +
> +function run_test() {
> +  test_inspectServerRecords();

Note that you could also write the tests as, eg:

add_test(function name() {
 ...
});

and then there's no need to actually remember to call them - the harness calls them all automagically. add_task(function* ...) also exists and is the preferred way of writing async tests, so add_test/add_task has a nice symmetry.

This is just FYI - I'm not bothered if you change it or not.
Attachment #8743571 - Flags: review?(markh) → review+
https://reviewboard.mozilla.org/r/47891/#review45707

This looks great (but see other comment - let's put this in a new bug and land it a little later)

::: browser/app/profile/firefox.js:1507
(Diff revision 3)
>  
>  pref("extensions.pocket.enabled", true);
> +
> +#ifdef NIGHTLY_BUILD
> +// Enable the (fairly costly) client/server validation on nightly only.
> +pref("services.sync.validationEnabled", true);

let's also create the pref when !nightly - it should make it a tiny bit easier for users who we ask to flip the pref for diagnostic purposes.

::: services/sync/modules/bookmark_validator.js:22
(Diff revision 3)
>  
>  class BookmarkValidator {
>  
> +  constructor() {
> +    this._log = Log.repository.getLogger("Sync.Validator.Bookmarks");
> +    let level = Svc.Prefs.get("log.logger.engine.bookmarks", "Debug");

another option might be a log name of "Sync.Engine.Bookmarks.Tracker", and we should then "inherit" the level of the parent log.

::: services/sync/modules/bookmark_validator.js:493
(Diff revision 3)
> +      });
> +      let serverState = self._getServerState(engine);
> +      let result = self.compareServerWithClient(serverState, clientTree);
> +
> +      let end = Date.now();
> +      let duration = end-start;

style nit - we tend to prefer spaces around operators.

::: services/sync/modules/engines.js:955
(Diff revision 3)
>      // Keep track of what to delete at the end of sync
>      this._delete = {};
>    },
>  
>    /**
>     * A tiny abstraction to make it easier to test incoming record

this comment probably needs to be updated now (might be as simple as adding "validators and" to it)

::: services/sync/modules/engines/bookmarks.js:430
(Diff revision 3)
>      this._log.debug(item.id + " mapped to " + mapped);
>      return mapped;
> +  },
> +
> +  getValidator() {
> +    return new BookmarkValidator(this._log.debug.bind(this._log));

the constructor doesn't take any params - I'm not too bothered if you pass the log or do the log dance in the validator itself, but both caller and callee should agree :)

::: services/sync/modules/service.js:1309
(Diff revision 3)
>        histogram = Services.telemetry.getHistogramById("WEAVE_COMPLETE_SUCCESS_COUNT");
>        histogram.add(1);
>  
> +      // Check and see if we should run validation, but wait until we've
> +      // released the lock.
> +      Utils.nextTick(this._validate.bind(this));

I'm inclined to say we should hold the lock, as there may be non-obvious global state even in this read-only implementation (eg, management of the decryption keys, validating while incoming bookmarks are being written) and later, when we come to actually repairing the damage we are likely to find more (eg, "fixing" a record at the same time Sync is trying to apply the server view of the world.) Also, the observer notifications as we release the lock stop the "spinner" in the UI, and I think it is desired that it spins during validation.

I agree we don't need separate prefs yet.

::: services/sync/services-sync.js:82
(Diff revision 3)
>  pref("services.sync.log.cryptoDebug", false);
>  
>  pref("services.sync.fxa.termsURL", "https://accounts.firefox.com/legal/terms");
>  pref("services.sync.fxa.privacyURL", "https://accounts.firefox.com/legal/privacy");
> +
> +pref("services.sync.validationInterval", 86400);

add a comment here "// 1 day" (assuming my math is correct ;)

Updated

2 years ago
Attachment #8743572 - Flags: feedback?(markh) → feedback+
Just following up on other comments FTR.

(In reply to Thom Chiovoloni [:tcsc] from comment #9)

> Hm... Okay. My personal taste is generally towards minimizing unused
> genericity where possible, since it so frequently doesn't end up being used,
> in which case it's just a pile of extra complexity (lots of parts of sync
> seem to be examples of this).

Sing it! :) Can't agree more with that sentiment (and the approach you took)

> Yes, 2038 will be fun -- but hopefully we've worked out the bugs in bookmark
> sync by that point :p

:)

> Hmm... A pref sounds good, but I'm willing to not log any of the
> intermediate data and just log a summary for now -- Do you think I should
> keep the logging code around, or move it out of the validator entirely.
> (Part of my motivation for keeping it in this way would be to give it to
> aboutsync, but it also is probably redundant at this point, given that
> everything in it is also reported in some other way).

I'm not sure either - let's just go with what you did (well, one of the 2 approaches you took ;) and tweak it later.

(In reply to Thom Chiovoloni [:tcsc] from comment #12)
> Is this something like what you had in mind for scheduling?

Yep, looks good.

> I'm not confident this is the, err, correct place to put or do this.
> Specifically, I don't know if we should be holding the sync lock/preventing
> sync while validating (hopefully I understood the code correctly and that is
> what that does), and I don't know if we should have separate prefs for each
> engine's validation (doesn't seem like it's likely to be worthwhile, but
> maybe?).

As mentioned, we want the lock held IMO.

> The thing that's most consistent with how everything works now is probably
> to offload it to some SyncScheduler or whatever, but that strikes me as
> overkill.

Agreed.

(In reply to Thom Chiovoloni [:tcsc] from comment #15)
> Comment on attachment 8743572 [details]
> MozReview Request: Bug 1265419 - Hook bookmark sync validator into the
> bookmark sync engine and add a telemetry probe for it f?markh
> 
> Still want feedback on this one, but updated the other to be r? instead of
> f?, which apparently cancels the feedback requests on both (weird -- next
> time i'll see if there's a way to avoid this).

heh - welcome to mozreview. There's also no way in mozreview to give it f+ - I believe that even if you only requested feedback, clicking the "ship it" button upgrades it to r+! No big deal though, I can usually work out what is going on (or fix it later when I get it wrong :)
(Assignee)

Comment 19

2 years ago
Comment on attachment 8743571 [details]
MozReview Request: Bug 1265419 - Implement a validator for bookmarks, that reports errors in the server-side bookmark store, and inconsistencies between server and client r?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47889/diff/3-4/
(Assignee)

Updated

2 years ago
Attachment #8743572 - Attachment is obsolete: true
(Assignee)

Comment 20

2 years ago
https://reviewboard.mozilla.org/r/47889/#review45735

This makes sense -- When I opened this bug it wasn't 100% clear to me how to decompose the original problem into smaller bugs, but again, this makes more sense, especially now.

> here and below - remove the additional checks as equal will throw so we won't get to the next test anyway - and even if we did, it's probably clearer to let it fail noisily (and in this particular case it would be more correct - c.cycles being an array of |[null]| would obviously be wrong, but it looks like this test would pass.

Good point. For what it's worth, the test runner *seems* to bail out anyway as soon as it hits it's first falsy condition, so my intent of continuing after errors was pointless anyway.

> Note that you could also write the tests as, eg:
> 
> add_test(function name() {
>  ...
> });
> 
> and then there's no need to actually remember to call them - the harness calls them all automagically. add_task(function* ...) also exists and is the preferred way of writing async tests, so add_test/add_task has a nice symmetry.
> 
> This is just FYI - I'm not bothered if you change it or not.

Ah, alright. Thanks. I'll change it since I'll already be going back to fix the other issues.
This bug is now just about landing the initial validator - I'll open a new bug for integration with Sync itself and telemetry.
Summary: Add telemetry measuring inconsistent client/server bookmark states, and related data → Add facility for measuring inconsistent client/server bookmark states

Updated

2 years ago
Blocks: 1267917
(Assignee)

Updated

2 years ago
Keywords: checkin-needed

Comment 23

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/2c230d7fcf51
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox49: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 49

Updated

2 years ago
Whiteboard: [sync-data-integrity]
You need to log in before you can comment on or make changes to this bug.