Closed Bug 1173359 Opened 5 years ago Closed 3 years ago

History items are duplicated after reenabling history sync

Categories

(Firefox :: Sync, defect, P1)

39 Branch
defect

Tracking

()

VERIFIED FIXED
Firefox 53
Tracking Status
firefox53 --- verified
firefox54 --- verified
firefox55 --- verified

People

(Reporter: VarCat, Assigned: Lina)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(3 files)

Environment:

FF 39.0b4
OS: Win Xp x86, Win 8.1 x64

Prerequisites:

Have 2 Firefox Profile that are using the same Firefox Account that was previously synced.

STR:

1. On profile 1 disable History from about:preferences#sync
2. On profile 1 open some new pages to create new history entries.
3. Sync Now on profile 1
4. Sync Now on profile 2
The new history entries are not synced to profile 2(that is correct behavior)
5. On profile 1 enable History
6. Sync Now on profile 1
7. Sync Now on profile 2

Issue:
The history entries added on step 2 are duplicated on profile 2.

Note:
This bug is reproducible on FF 38.0.5 so it's not a recent regression.
Can you explain more about what it means for them to be duplicated? Visually in the UI? Does this mean that the entries appear twice in profile 2?

It's expected behavior for the entries that were created during the period that History sync was disabled to appear on profile 2 after History sync is re-enabled.
Flags: needinfo?(catalin.varga)
Priority: -- → P4
Hi Chris, I mean that the entries that were created during the period that History sync was disabled appear twice in profile 2.
Flags: needinfo?(catalin.varga)
Yeah, that's weird, and I'm concerned it's a symptom of a larger bug. Since the STR sound simple enough, I'm escalating to P2 to at least investigate.
Priority: P4 → P2
Priority: P2 → P3
Priority: P3 → P2
Priority: P2 → P1
Assignee: nobody → kcambridge
I think part of the issue is that we're syncing hidden visits. I didn't even have to disconnect and reconnect to reproduce this. On profile B, I typed in a URL I'd never visited before ("bbc.com"), and was redirected to "http://www.bbc.com/". Even though my history menu only has one entry for BBC, the tracker recorded two visits:

1479316754194 Sync.Tracker.History  TRACE onVisit: http://bbc.com/
1479316754195 Sync.Tracker.History  TRACE Adding changed ID: x3XsfL2IlPSH, 1479316754.195
1479316754357 Sync.Tracker.History  TRACE onVisit: http://www.bbc.com/
1479316754358 Sync.Tracker.History  TRACE Adding changed ID: TAthr-_WwjO6, 1479316754.358
1479316799804 Sync.Store.History  DEBUG createRecord: We have a record for x3XsfL2IlPSH: {"url":"http://bbc.com/","title":null,"frecency":2000}
1479316799805 Sync.Engine.History TRACE Outgoing: { id: x3XsfL2IlPSH  index: 2000  modified: undefined  ttl: 5184000  payload: {"id":"x3XsfL2IlPSH","histUri":"http://bbc.com/","title":null,"visits":[{"date":1479316754186840,"type":2}]}  collection: history }
1479316799808 Sync.Store.History  DEBUG createRecord: We have a record for TAthr-_WwjO6: {"url":"http://www.bbc.com/","title":"BBC - Homepage","frecency":2000}
1479316799809 Sync.Engine.History TRACE Outgoing: { id: TAthr-_WwjO6  index: 2000  modified: undefined  ttl: 5184000  payload: {"id":"TAthr-_WwjO6","histUri":"http://www.bbc.com/","title":"BBC - Homepage","visits":[{"date":1479316754333185,"type":5}]}  collection: history }

"x3XsfL2IlPSH" is hidden by the redirect, but we upload records for both. We should filter those out before uploading.
Comment on attachment 8811547 [details]
Bug 1173359 - Convert the history tracker tests to use async functions.

https://reviewboard.mozilla.org/r/93630/#review93682

This looks great, but I'd like to better understand why the semantics of that test changed.

::: services/sync/tests/unit/test_history_tracker.js:12
(Diff revision 1)
>  Cu.import("resource://services-sync/constants.js");
>  Cu.import("resource://services-sync/engines/history.js");
>  Cu.import("resource://services-sync/service.js");
>  Cu.import("resource://services-sync/util.js");
>  
> -function onScoreUpdated(callback) {
> +function promiseTopicObserved(topic) {

FWIW, I just landed bug 1316500 which included a promiseOneObserver(), which can probably be used here.

::: services/sync/tests/unit/test_history_tracker.js:94
(Diff revision 1)
> +  await stopTracking();
> +}
> +
> +add_task(async function test_empty() {
>    _("Verify we've got an empty, disabled tracker to work with.");
> -  do_check_empty(tracker.changedIDs);
> +  verifyTrackerEmpty();

you've declared verifyTrackerEmpty() as async, so you probably want an await here for a future when it truly becomes async.

::: services/sync/tests/unit/test_history_tracker.js
(Diff revision 1)
> -  do_check_false(guid in tracker.changedIDs);
> +  await verifyTrackerEmpty();
> -
> -  onScoreUpdated(function() {
> -    do_check_true(guid in tracker.changedIDs);
> -    do_check_attribute_count(tracker.changedIDs, 3);
> -    do_check_eq(tracker.score, SCORE_INCREMENT_XLARGE + 2 * SCORE_INCREMENT_SMALL);

I don't understand why the semantics of this test appears to have changed (ie, 3 tracked items vs one)?
Comment on attachment 8811548 [details]
Bug 1173359 - Exclude hidden pages and framed links from syncing.

https://reviewboard.mozilla.org/r/93632/#review93706

While the patch looks fine to me, I don't understand how this solves any problems. If I'm reading the code correctly, these transition types should end up being applied correctly as "incoming" records, and while we don't explicitly manage .hidden ourselves, it seems History.cpp should update .hidden correctly based on the transition type as the record is being written?

::: services/sync/modules/engines/history.js:102
(Diff revision 1)
> +        for (let i = 0; i < chunkLength; i++) {
> +          statement.bindByIndex(i, modifiedGUIDs[startIndex + i]);
> +        }
> +        let results = Async.querySpinningly(statement, ["guid"]);
> +        for (let { guid } of results) {
> +          this._tracker.removeChangedID(guid);

I wonder if it is worth adding a removeChangedIDs function (or simply allows removeChangeID to take an array) would be worthwhile to avoid the "timer churn" that would happen (IIUC)?

::: services/sync/modules/engines/history.js:212
(Diff revision 1)
>        "WHERE guid = :guid");
>    },
>    _urlCols: ["url", "title", "frecency"],
>  
>    get _allUrlStm() {
> -    return this._getStmt(
> +    return this._getStmt(`

I think a comment pointing at pullNewChanges to explain the visit_type clause would make sense.

::: services/sync/modules/engines/history.js:218
(Diff revision 1)
> -      "SELECT url " +
> -      "FROM moz_places " +
> -      "WHERE last_visit_date > :cutoff_date " +
> -      "ORDER BY frecency DESC " +
> -      "LIMIT :max_results");
> +      SELECT DISTINCT p.url
> +      FROM moz_places p
> +      JOIN moz_historyvisits v ON p.id = v.place_id
> +      WHERE p.last_visit_date > :cutoff_date AND
> +            p.hidden = 0 AND
> +            v.visit_type NOT IN (0, 8)

Should we use PlacesUtils.history.TRANSITIONS.FRAMED_LINK instead of the literal here? (it seems odd places apparently has no constant for 0, but there you go!)
Comment on attachment 8811547 [details]
Bug 1173359 - Convert the history tracker tests to use async functions.

https://reviewboard.mozilla.org/r/93630/#review93682

> I don't understand why the semantics of this test appears to have changed (ie, 3 tracked items vs one)?

I changed the semantics here because the tests weren't independent. A previous test would add a visit, not clean it up, and a later test would expect to find that visit. This makes it harder to comment tests out, and eventually migrate to tracking history changes in Places if we want.

I made sure the new tests passed before writing part 2. If you'd like, I can split the test changes into a separate bug, and we can land that separately.
(In reply to Kit Cambridge [:kitcambridge] from comment #9)
> I changed the semantics here because the tests weren't independent. A
> previous test would add a visit, not clean it up, and a later test would
> expect to find that visit.

I see - that's fine then :)
Mak, could you clarify how these "hidden" Places work, please?

For context: we track and sync all history visits, but it looks like the history menu filters out hidden pages and framed links. Because we don't sync the referrer, we don't mark these Places as hidden on other devices, which means we can easily end up with dupes. The STR in comment 0 involve signing out and signing back in to Sync, but it's pretty easy to reproduce without that. Check out comment 4 and the attached screenshot.

The patch fixes this, but I noticed that just adding a TRANSITION_REDIRECT_PERMANENT visit with a referrer isn't enough to mark the old Place as hidden. I see there's a maintenance query to do that (http://searchfox.org/mozilla-central/rev/62db1c9021cfbde9fa5e6e9601de16c21f4c7ce4/toolkit/components/places/PlacesDBUtils.jsm#702-713), but I'm wondering how this works in real usage.

Is there any value to syncing the hidden visits at all?
Flags: needinfo?(mak77)
(In reply to Kit Cambridge [:kitcambridge] from comment #11)
> Mak, could you clarify how these "hidden" Places work, please?

There are 2 ways to "hide" pages in Places.
Hidden is used to hide pages from history lists.
A 0 frecency value is used to hide pages from the awesomebar.
The 2 things are different because the scopes of the views are different (one is a search we filter for you, the other one a list you filter with your eyes).

In general the idea for views you must filter "manually" is to provide more useful results in a smaller space, for example the 15 most recently visited pages could be filled of pointless redirects and embed visits without hidden.

The things we mark as hidden are the things the user is unlikely to look for in an history view. For example a redirect source, since the landing page has a different url and title from the starting page, and user will likely only notice the landing page values in the UI.
Other things marked as hidden are embed visits (objects in a page) and framed_link visits (visits across frames). In practice all the things where the user is unlikely to notice url nor title and thus unlikely to look for in a list.
When instead the user searches history by searchterms, we also include hidden matches, cause then it's again a search view we filter for the user, rather than a plain long list.

> The patch fixes this, but I noticed that just adding a
> TRANSITION_REDIRECT_PERMANENT visit with a referrer isn't enough to mark the
> old Place as hidden.

Right, marking things as hidden is something we usually only do from History.cpp, that is the point where a visit is initially added from the docshell.
We can't do in other points because the visit type is how you reached a page, not how you moved out of it. Basically TRANSITION_REDIRECT_PERMANENT means this page was the target of a redirect. History doesn't track sources of redirect, the only thing that knows about that is the docshell!

The only way to properly set a redirect as hidden by adding a visit would be to check if there's another visit that has from_visit set to this one... But if you add visits in order that's basically impossible cause the redirect source would be inserted before the target... oops.

So basically there's no easy way from the API to add an hidden visit, if we need it we need to figure out something.

> Is there any value to syncing the hidden visits at all?

there may be, a user may search a redirect source by search terms cause maybe he is used to type a url that always redirects. and in case of FRAMED_LINKS, one may want the visits for visited link coloring. for example suppose you use a newsgroup where the left pane contains threads and the right pane contains bodies, you may want to know across devices which threads you already read.
Flags: needinfo?(mak77)
Thanks so much for the helpful explanation!

If we decide to sync hidden visits, I think we'll want an API to mark them as such, so that they don't show up in the history menu. FWIW, I don't think we sync the frecency, either; we just use it as a hint for record order. That opens up another conversation about whether syncing frecencies even makes sense: the set of sites you visit most often on mobile might be intentionally different than those on desktop!

So, we have a few options to consider:

* Closing this bug means we'll sync all history, but people will see "dupes" for redirects in the menu. This makes for a confusing UX, especially if the source doesn't have a title. (What's the difference between "mozilla.org/", "www.mozilla.org/", and "Internet for people, not profit - Mozilla"?)

* Excluding hidden visits and framed links means the menu will look right, but people won't be able to search those visits, or see accurate visited link coloring. We can think of this as a form of data loss.

* Syncing hidden visits requires adding new fields for "hidden" and "referrer" to the Sync record payload (how to do this is an open question), a Places API for marking visits as hidden, and maybe changes to Android and iOS.
I think Bug 623667 is relevant here.

We're also talking about this in the context of Tofino: https://github.com/mozilla/tofino/issues/663

To summarize, there is value in syncing every URL the browser _starts at_ or _ends up at_, because the user might try to find the history item by one of those URLs. For Comment 4, for example, if you type "http://bbc" you would expect to see a match in your history (and, indeed, for the browser to be smart enough to show you the title of https://www.bbc.co.uk!).

And there's value in syncing more URLs than that: e.g., link coloring for intermediate hops.

But without having good redirect/referrer/etc. chains in place alongside the raw data, syncing every URL can cause a mess. That mess is this bug.


We already want to revisit (haha) history record formats. As discussed in a mail thread about containers (Bug 1288858):

---
The Sync history data format will need to be redesigned, and this is something we seem to only do once per decade. This is a little complicated, and several things will piggyback on this work:

- We should fix the inefficiency in the current format mentioned in Bug 717136.
- There's more data we should sync: Bug 623667 (and Bug 559175), but also probably the visiting device and device type (Bug 1302797). 
- We might want to split things, 'cos one-record-per-URL is expensive for incremental change.
- Sync currently can't delete individual visits. We should definitely fix that: not only does it make "forget recent history" very buggy, but it'll cause cross-container deletion for you.
---


So my proposals:

* In the short term, don't track history items that have only hidden visits, because we don't include enough information to reconstruct useful history on other clients.

* In the longer term (though probably not that long), design a history format that can represent the data that browsers actually record and need to exchange.
Attachment #8811548 - Flags: review?(markh)
Comment on attachment 8811548 [details]
Bug 1173359 - Exclude hidden pages and framed links from syncing.

https://reviewboard.mozilla.org/r/93632/#review93706

> Should we use PlacesUtils.history.TRANSITIONS.FRAMED_LINK instead of the literal here? (it seems odd places apparently has no constant for 0, but there you go!)

Sure, let's use the constant here. It's trickier for the chunked query; for simplicity, I think we should either leave it as is, or embed it into the SQL via `${PlacesUtils.history.TRANSITION_FRAMED_LINK}`.
Comment on attachment 8811547 [details]
Bug 1173359 - Convert the history tracker tests to use async functions.

https://reviewboard.mozilla.org/r/93630/#review95472

awesome!
Attachment #8811547 - Flags: review?(markh) → review+
Comment on attachment 8811548 [details]
Bug 1173359 - Exclude hidden pages and framed links from syncing.

https://reviewboard.mozilla.org/r/93632/#review95476

awesomer!

Do you think there's anything a possible future history validator/repairer might do here? If so, then maybe we should get a bug on file so we don't lose this? Or maybe even a TODO comment in the file? Or maybe it's just not worth bothering about at all?
Attachment #8811548 - Flags: review?(markh) → review+
(In reply to Mark Hammond [:markh] from comment #8)
> (it seems odd places apparently has no constant for 0, but
> there you go!)

We had a constant for 0, and IIRC it was TRANSITION_UNKNOWN. Why should we ever add an unknown source visit beats me, so we just removed it in the early days after the Google team left.
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c55ce9775296
Convert the history tracker tests to use async functions. r=markh
https://hg.mozilla.org/integration/autoland/rev/4c39528f81a9
Exclude hidden pages and framed links from syncing. r=markh
Blocks: 1320185
https://hg.mozilla.org/mozilla-central/rev/c55ce9775296
https://hg.mozilla.org/mozilla-central/rev/4c39528f81a9
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 53
Flags: qe-verify+
I was not able to reproduce this bug since is a server side issue.
I confirm that the issue is no longer reproducible on Firefox 53.0b1, Firefox 54.0a2 (2017-03-13), Firefox 55.0a1 (2017-03-13).
Tests were performed under Windows 8.1 x64, Mac OS X 10.11, Ubuntu 14.04x86.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Depends on: 1366713
Duplicate of this bug: 1376183
Duplicate of this bug: 559175
You need to log in before you can comment on or make changes to this bug.