Bug 1428496 (Open) · Opened 2 years ago · Updated 2 years ago
It's possible for sync to fail to de-dupe incoming duplicate history records
Imagine a set of history records downloaded during sync contains duplicates: two records with the same URI but different GUIDs. Sync's de-duping logic assumes this sort of thing won't happen. A guid->uri map is built from the local database when we start processing records, and we check against this map as downloaded records are processed one by one. If an incoming record isn't found in the map, we queue it for insertion. Eventually, we flush the queue into the database, at which point we also update the guid->uri map.

If duplicates happen to be processed in the same "flush batch" - that is, recordA is processed and queued for insertion, then recordB (a dupe of recordA) is processed and queued as well - both records end up queued and sent over to our ContentProvider for insertion. Our BrowserProvider's bulk history insertion logic will do as told and insert both records. Prior to the addition of the transactional bulk-insert, the regular 'insertHistory' path in BrowserProvider did the same - uniqueness constraints are enforced neither at the schema level nor at the ContentProvider level.
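The window can be sketched in a few lines. This is a hypothetical, simplified model of the flow described above (class and method names are illustrative, not the actual Fennec Sync code): the dupe check consults only the guid->uri map, which is refreshed only at flush time, so two records with the same URI in one batch both pass the check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the de-duping window; not the real Sync classes.
public class HistoryDedupeSketch {
    // guid -> uri map, built once from the local database.
    private final Map<String, String> guidToUri = new HashMap<>();
    // Records queued for insertion, flushed in batches: {guid, uri}.
    private final List<String[]> insertQueue = new ArrayList<>();
    // Stand-in for database rows; no uniqueness constraint, mirroring the bug.
    public final List<String[]> database = new ArrayList<>();

    // Process one downloaded record. The check consults only the map,
    // never the pending queue, so same-batch dupes slip through.
    public void process(String guid, String uri) {
        if (!guidToUri.containsValue(uri)) {
            insertQueue.add(new String[] {guid, uri});
        }
    }

    // Flush the queue into the "database" and only now update the map.
    public void flush() {
        for (String[] rec : insertQueue) {
            database.add(rec);
            guidToUri.put(rec[0], rec[1]);
        }
        insertQueue.clear();
    }

    public static void main(String[] args) {
        HistoryDedupeSketch sync = new HistoryDedupeSketch();
        sync.process("guidA", "https://example.com/");
        sync.process("guidB", "https://example.com/"); // dupe, same batch
        sync.flush();
        // Both rows land: the map was stale for the whole batch.
        System.out.println(sync.database.size()); // prints 2
        // Once flushed, the map catches later dupes across batches.
        sync.process("guidC", "https://example.com/");
        sync.flush();
        System.out.println(sync.database.size()); // still prints 2
    }
}
```

Note that dupes arriving in different flush batches are caught correctly; the bug is confined to records that land in the same batch before the map is refreshed.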
There are two cases where Desktop can write such records today: if the Places database is restored from a backup on startup, or if you connect a new device whose URLs already exist on the server but have different local GUIDs. Desktop doesn't de-dupe history pages: it looks up by URL, not by GUID, and only inserts visits without changing GUIDs. In the first case, Desktop also has no idea that the database was replaced, so it won't rewind its 'lastSync' time. The next time you visit a URL that's already on the server, it'll upload a duplicate history record with the same URL but a different GUID and visits.