Closed Bug 1266232 Opened 8 years ago Closed 8 years ago

Ensure that we don't insert orphaned visits from HistoryExtensionsDB during the local visits migration

Categories

(Firefox for Android Graveyard :: Data Providers, defect)

defect
Not set
normal

Tracking

(firefox48 fixed, fennec48+)

RESOLVED FIXED
Firefox 48
Tracking Status
firefox48 --- fixed
fennec 48+ ---

People

(Reporter: Grisha, Assigned: Grisha)

References

Details

Attachments

(1 file)

HistoryExtensions.db -> visits@browser.db migration (aka "visits migration") currently assumes that FK constraints are enabled. It relies on the constraint checks for performance reasons: when porting over visits from HistoryExtensionsDB, just insert them (and ignore failures) without first checking if HistoryGUID is actually valid (i.e. corresponding record exists in the History table).
 
In SQLite, foreign key constraints are OFF by default, and must be enabled explicitly after connecting via "PRAGMA foreign_keys=ON".

We currently execute that pragma statement in an onOpen callback. However, "This method is called after the database connection has been configured and after the database schema has been created, upgraded or downgraded as necessary" (see [0]). So that means we don't have FK constraints enabled for the visits migration, which results in potentially large number of orphan visits being inserted.

From the docs on foreign_keys pragma: "This pragma is a no-op within a transaction; foreign key constraint enforcement may only be enabled or disabled when there is no pending BEGIN or SAVEPOINT."

We run our migrations in a transaction (or rather, SQLiteOpenHelper does), so we can't enable FKs on per-migration basis.

There's an onConfigure (see [1]), which is called before onCreate, onUpgrade, onDowngrade or onOpen. This could be a good point at which we enable FK constraints. However, I fear that it might cause unexpected issues with our long stack of various data migrations... Also, it's API16+...

Constraint 0: we can't loose any valid visit information, because that will make us a bad Sync user
Assumption 0: we can't enable FK constraints for the visits migration

That leaves us a few different options:
1) cleanup bad visits after they've been inserted. We can get a nice list of bad visits by running: pragma main.foreign_key_check(visits); (returns a list of constraint-violating row IDs). Then we can batch-delete them (up to 999 deletes per one statement)
2) while processing HistoryExtensionsDB, look up each History GUID, and don't insert that record's visits if we don't have the GUID locally. This was done at some point, but was scrapped due to fear of large performance impact (we'll be doing a bunch of extra SELECTs... but on an indexed field).
3) ???


If (1) can be performed outside of migrations entirely - spin off a task to run at some point in the future? - then that's not too bad.
If (2)'s performance impact isn't as large as feared due to the fact that we're doing SELECTs on indexed fields, that might be acceptable - and will save us from doing a ton of INSERTs which we don't need (which are much much more expensive than selects!!). That's especially the case due to asynchronous relationship between # of SELECTs vs # of INSERTs - we might save ourselves from doing up to 20-30 INSERTs by running only 1 SELECT. For heavy (and old) Sync users, this approach is likely to be a net-win in terms of total migration time.

I'm leaning towards (2)...

[0] https://developer.android.com/intl/ja/reference/android/database/sqlite/SQLiteOpenHelper.html#onOpen(android.database.sqlite.SQLiteDatabase)
[1] https://developer.android.com/intl/ja/reference/android/database/sqlite/SQLiteOpenHelper.html#onConfigure(android.database.sqlite.SQLiteDatabase)
Additional background info:

This migration might encounter A LOT of orphaned visits. In margaret's case, 70% of 75k visits where orphaned visits (with bad history GUIDs). Bulk of the time was spent on inserting those visits, so we want to ideally eliminate that.

Now, why those visits are there in the first place - I'm not entirely sure. IIRC history expiration/cleanup/etc didn't actually clean up hist.ext.db on Android, so for a user with an old profile + old hist.ext.db, over time you'll end up with a mass of orphaned visits.

I don't see how else we might end up with this discrepancy, since clients push up visits to Sync by attaching them to history records.
Looking closer at my own code (duh), I see that we actually do a SELECT for each history record from hist.ext.db anyway, in order to figure out how many local visits we need to synthesize...

So that seems like the path forward - do that one SELECT, and use it for both (1) determining if we need to proceed with INSERTs and (2) synthesizing local visits.
Additionally, simplifies visit synthesis and fixes a (not yet encountered) infinite loop which will arise if we encounter `null` instead of a visits json array

Review commit: https://reviewboard.mozilla.org/r/47907/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/47907/
Attachment #8743597 - Flags: review?(michael.l.comella)
tracking-fennec: --- → 48+
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47907/diff/1-2/
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47907/diff/2-3/
Attachment #8743597 - Attachment description: MozReview Request: Bug 1266232 - ensure we don't insert orphan visits during a visits migration r=mcomella → MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

https://reviewboard.mozilla.org/r/47907/#review45041

No r+ yet because I hear there is a revision coming!

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:544
(Diff revision 3)
>                  return;
>              }
>  
> -            final int guidCol = cursor.getColumnIndexOrThrow(columnGuid);
> -            while (!cursor.isAfterLast()) {
> -                final String guid = cursor.getString(guidCol);
> +            // We're wrapping all of the INSERTs into one transaction, and committing them at once.
> +            // Below we're extra careful to ensure our individual inserts won't fail.
> +            db.beginTransaction();

Upgrades already occur inside transactions, right?

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:548
(Diff revision 3)
> -            while (!cursor.isAfterLast()) {
> -                final String guid = cursor.getString(guidCol);
> -                final JSONArray visitsInHistoryExtensionDB = RepoUtils.getJSONArrayFromCursor(cursor, columnVisits);
> +            // Below we're extra careful to ensure our individual inserts won't fail.
> +            db.beginTransaction();
> +
> +            final int guidCol = historyExtensionCursor.getColumnIndexOrThrow(columnGuid);
> +
> +            // Use prepared (aka "compiled") SQL statements because they are much faster when we're inserting

nit: faster than ContentValues. Why? Just because of GC churn or are there other reasons? Add to the comment.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:553
(Diff revision 3)
> +            // We care about ON CONFLICT IGNORE because we want to ensure that in rare case of a (GUID,DATE)
> +            // clash (the UNIQUE constraint), we will not fail the transaction, and just skip conflicting row.

When could there be a (GUID,DATE) clash? Add a comment.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:567
(Diff revision 3)
> +
> +            while (!historyExtensionCursor.isAfterLast()) {
> +                final String guid = historyExtensionCursor.getString(guidCol);
> +
> +                // Sanity check, let's not risk a bad incoming GUID.
> +                if (guid == null || guid.isEmpty()) {

nit: `TextUtils.isEmpty(guid)`

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:568
(Diff revision 3)
> +            while (!historyExtensionCursor.isAfterLast()) {
> +                final String guid = historyExtensionCursor.getString(guidCol);
> +
> +                // Sanity check, let's not risk a bad incoming GUID.
> +                if (guid == null || guid.isEmpty()) {
> +                    historyExtensionCursor.moveToNext();

nit: It's error prone to be updating the position when erroring out.

`Cursor.moveToNext` returns false if the move fails (e.g. attempt move after last) and the `SQLiteDatabase.query` docs claim the Cursor is returned positioned before the first element so:

while (cursor.moveToNext()) {
  ...
}

If you remove the `cursor.moveToFirst` line.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:578
(Diff revision 3)
> +                // We might have a lot of "orphaned visits" with improper GUIDs. so let's avoid doing
> +                // unnecessary work. Note that we don't have foreign key constraints enabled at this point.
> +                // See Bug 1266232 for details.
> +                final Cursor historyCursor = db.query(
> +                        History.TABLE_NAME,
> +                        new String[] {History.GUID}, History.GUID + " = ?", new String[] {guid},

nit: run checkstyle (or tell me if it doesn't work)

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:612
(Diff revision 3)
> -                    baseVisitDateForSynthesis = (Long) ((JSONObject) visitsInHistoryExtensionDB.get(0)).get("date") - 1;
> -                } else {
> -                    baseVisitDateForSynthesis = System.currentTimeMillis() - 1;
> -                }
> +                    }
>  
> -                insertSynthesizedVisits(db,
> +                    final Long date = (Long) visit.get("date");

Any possibility of these casts failing? :P

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:629
(Diff revision 3)
> -                return 0;
> +                        // Strange, SQL string is invalid for some reason.
> +                        // These visit inserts are "best effort", so let's ignore a failure here.

Why are these best efforts? It's my understanding the SQLException is only thrown if the SQL statement is invalid, which sounds like a programmer error that we should try to catch by throwing (e.g. in the worst case, rather than having some users have none of their data copied).

Though I suppose it's possible the types are off, but then I suppose we're really screwed anyway.
Attachment #8743597 - Flags: review?(michael.l.comella)
https://reviewboard.mozilla.org/r/47907/#review45043

btw, consider adding some helper functions for readability – I think the low level of abstraction and length of the method are challenging.
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47907/diff/3-4/
Attachment #8743597 - Flags: review?(michael.l.comella)
Manual tests I've ran through for the updated migration (all from v31):
- no sync, no local visits
- no sync, some local visits
- sync, no local visits
- sync, local visits

-> after each, sync was enabled with new or existing account.

Also, I've created a somewhat large (30k visits) history_extension_database based on data obtained from an old/crusty Sync profile, and upgraded from that. Unfortunately I didn't have an old, slow, rooted device handy to do this with - just an emulator. It worked fine with a small initial delay while the migration ran - 0.5-1 second freeze of top sites.

Ideally this needs a wider variety of profiles, devices, etc.
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

https://reviewboard.mozilla.org/r/47907/#review45279

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:565
(Diff revision 4)
> +            while (!historyExtensionCursor.isAfterLast()) {
> +                final String guid = historyExtensionCursor.getString(guidCol);
> +
> +                // Sanity check, let's not risk a bad incoming GUID.
> +                if (guid == null || guid.isEmpty()) {
> +                    historyExtensionCursor.moveToNext();

As we spoke about on irc, move the Cursor iteration to the `while` loop.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:569
(Diff revision 4)
> +                if (guid == null || guid.isEmpty()) {
> +                    historyExtensionCursor.moveToNext();
>                      continue;
>                  }
>  
> -                debug("Inserting " + visitsInHistoryExtensionDB.size() + " visits from history extension db");
> +                // First, check if history with given GUID exists in the History table.

nit: this could be a good place to have a helper method for readability.

while (iterating over cursor) {
  historyExtGUID = whatever from cursor.
  if (!isGUIDInHistoryTable(historyExtGUID, ...) {
    continue;
  }
  
  // Do stuff assuming GUID in table.
}

---
The reason I'm such a stiff about this is as I read through the code, when I get to the bottom of the method, it's hard to remember what happened at the beginning of the method and I have to look back to remind myself. However, this is not something that I get easily from the existing code and I'm forced to make these abstractions (e.g. isGUIDInHistoryTable) in my head – it'd be helpful to me if it was already in the code! :P

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:570
(Diff revision 4)
> +                    historyExtensionCursor.moveToNext();
>                      continue;
>                  }
>  
> -                debug("Inserting " + visitsInHistoryExtensionDB.size() + " visits from history extension db");
> -                for (int i = 0; i < visitsInHistoryExtensionDB.size(); i++) {
> +                // First, check if history with given GUID exists in the History table.
> +                // We might have a lot of "orphaned visits" with improper GUIDs. so let's avoid doing

nit: `GUIDs so`

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:570
(Diff revision 4)
> -                for (int i = 0; i < visitsInHistoryExtensionDB.size(); i++) {
> -                    final ContentValues cv = new ContentValues();
> +                // We might have a lot of "orphaned visits" with improper GUIDs. so let's avoid doing
> +                // unnecessary work. Note that we don't have foreign key constraints enabled at this point.

nit: For someone without context, I'm afraid it could be unclear what an "orphaned visit" is and what an "improper GUID" is.

I think this means there may be, "a lot of entries in the history extension table whose GUID doesn't match a GUID from the history table" – it'd be good to be more explicit.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:583
(Diff revision 4)
> -                } else {
> -                    baseVisitDateForSynthesis = System.currentTimeMillis() - 1;
>                  }
> -
> -                insertSynthesizedVisits(db,
> -                        generateSynthesizedVisits(
> +                try {
> +                    // No history record found for given GUID - skipping!
> +                    if (!historyCursor.moveToFirst()) {

nit: I think `getCount` is a little clearer.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:1520
(Diff revision 4)
> +
> +        // Sync was set up, history extensions database is present.
> +        // Copy visits over and delete the old database.
> +        if (historyExtensionsDatabase.exists()) {
> -        try {
> +            try {
> -            historyExtensionDb = SQLiteDatabase.openDatabase(
> +                historyExtensionDb = SQLiteDatabase.openDatabase(historyExtensionsDatabase.getPath(), null,

This could throw SQLiteException if it can't be opened, in which case I think we'd want to synthesize.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:1569
(Diff revision 4)
> +            final String insertSqlStatement = "INSERT OR IGNORE INTO " + Visits.TABLE_NAME + "(" +
> +                    Visits.DATE_VISITED + "," +
> +                    Visits.HISTORY_GUID + "," +
> +                    Visits.IS_LOCAL + "," +
> +                    Visits.VISIT_TYPE +
> +                    ") VALUES (?, ?, 1, 1)";

Why is `VISIT_TYPE` 1? Consider binding from a constant, or add a comment.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:1573
(Diff revision 4)
> +                    Visits.VISIT_TYPE +
> +                    ") VALUES (?, ?, 1, 1)";
> +            final SQLiteStatement compiledInsertStatement = db.compileStatement(insertSqlStatement);
> +
> +            // For each history record, insert as many visits as there are recorded in the VISITS column.
>              while (!cursor.isAfterLast()) {

nit: `cursor.moveToNext()`
Attachment #8743597 - Flags: review?(michael.l.comella)
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47907/diff/4-5/
Attachment #8743597 - Flags: review?(michael.l.comella)
https://reviewboard.mozilla.org/r/47907/#review45041

> nit: `TextUtils.isEmpty(guid)`

I personally like explicitness of this more. Other than hiding the null check, what's the difference?

> nit: run checkstyle (or tell me if it doesn't work)

checkstyle passes as is

> Any possibility of these casts failing? :P

i hope not... maybe.

> Why are these best efforts? It's my understanding the SQLException is only thrown if the SQL statement is invalid, which sounds like a programmer error that we should try to catch by throwing (e.g. in the worst case, rather than having some users have none of their data copied).
> 
> Though I suppose it's possible the types are off, but then I suppose we're really screwed anyway.

Going with throw. Agree about programmer error, let's see if this happens. (it shouldn't, but crashes might point us somewhere useful).
(In reply to :Grisha Kruglov from comment #12)
> > nit: `TextUtils.isEmpty(guid)`
> 
> I personally like explicitness of this more. Other than hiding the null
> check, what's the difference?

Advantages:
 * It's safer:
  - if you use it as your go-to, you're less likely to cause a NullPointerException than if your go-to was String.isEmpty
  - if this code gets re-factored, it's less likely to lose a component (e.g. the null check is removed and later the value can become null again)
 * It's more concise:
  - It's one (granted, abstracted) operation as opposed to 2 and you can only hold so many things in your head at once so it eases the burden on your brain
  - Visually, it's more concise, making it faster to read and skim for meaning

The disadvantage is you have to have internalized what `TextUtils.isEmpty` does in order to benefit from it (but it's an Android go-to so I'm not too worried).

On the note of explicitness, I think there's a balance – if we favored pure explicitness, we'd be managing our own memory (which is quite error prone!), or at the extreme, writing binary! That being said, removing all explicitness makes it easy to be untrusting (e.g. does this really do what I think it does?).

For me, I tend to favor high-level abstraction: it provides an easy overview and if I really don't trust what's going on, I can jump to the method definition to find out more. I think this is also helpful to others who come to view the code so they're able to understand the general idea rather than be overwhelmed with null-checking details (which generally aren't that important to see).

If you have another point of view, I'd be interested to hear it! :)

> checkstyle passes as is

Strange. My expectation is `new String [] { whatever, yeah }`, but it's not that important.
(In reply to Michael Comella (:mcomella) from comment #13)
> That being said, removing all
> explicitness makes it easy to be untrusting (e.g. does this really do what I
> think it does?).

And it can make the code less powerful or performant by using an overly general solution (which is a trade-off I often make for code readability).
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

https://reviewboard.mozilla.org/r/47907/#review45335

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:601
(Diff revision 5)
> -                                getNumberOfVisitsToSynthesize(db, guid, visitsInHistoryExtensionDB.size()),
> -                                guid, baseVisitDateForSynthesis
> -                        )
> -                );
> +                    final Long date;
> +                    final Long visitType;
> +                    try {
> +                        date = (Long) visit.get("date");
> +                        visitType = (Long) visit.get("type");
> +                    } catch (Exception e) {

nit: We should be specific as to which error we're looking for -> `catch (ClassCastException ...`

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:633
(Diff revision 5)
> -                Log.e(LOGTAG, "Expected to get history visit count with guid but failed: " + guid);
> -                return 0;
> +            if (!historyCursor.moveToFirst()) {
> +                return false;

nit: you could actually do `return historyCursor.moveToFirst()`

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:1535
(Diff revision 5)
> +                    synthesizeAndInsertVisits(db, false);
> +                }
> +
> +                if (historyExtensionDb != null) {
> +                    copyHistoryExtensionDataToVisitsTable(historyExtensionDb, db);

I'm not sure about the execution rules of java that would ensure if `catch` occurs, `historyExtensionDb` will be null. I'd rather set a `wasCaught` flag or similar and check it later, e.g.

if (historyExtensionDb != null && !wasCaught) {
  copy...
} else {
  synthesize();
}

Error handling is always so icky. :\

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:1606
(Diff revision 5)
> +                // Sanity check.
> +                if (guid == null) {
> +                    continue;
> +                }
> +
> +                // In a strange case that lastVisitedDate is a very low number, let's not introduce

Weird! How common is this?

If super uncommon, sure, let's dump it. Otherwise, it might be worth adding the error handling code.

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:1618
(Diff revision 5)
> +                    final long offsetVisitedDate = lastVisitedDate - i;
> +                    compiledInsertStatement.clearBindings();
> +                    compiledInsertStatement.bindLong(1, offsetVisitedDate);
> +                    compiledInsertStatement.bindString(2, guid);
> +                    if (markAsLocal) {
> +                        compiledInsertStatement.bindLong(3, 1);

Is there a constant we can use here instead of hard-coded values? If not, can we make one? :)
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47907/diff/5-6/
Attachment #8743597 - Flags: review?(michael.l.comella)
https://reviewboard.mozilla.org/r/47907/#review45335

> Weird! How common is this?
> 
> If super uncommon, sure, let's dump it. Otherwise, it might be worth adding the error handling code.

Should be veeerrry uncommon.

> Is there a constant we can use here instead of hard-coded values? If not, can we make one? :)

for 1 & 0, i think a comment is sufficient.
Attachment #8743597 - Flags: review?(michael.l.comella) → review+
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

https://reviewboard.mozilla.org/r/47907/#review45341

lgtm. Thanks for the quick review cycles! :)

::: mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java:1617
(Diff revision 6)
> +                for (int i = 0; i < numberOfVisits; i++) {
> +                    final long offsetVisitedDate = lastVisitedDate - i;
> +                    compiledInsertStatement.clearBindings();
> +                    compiledInsertStatement.bindLong(1, offsetVisitedDate);
> +                    compiledInsertStatement.bindString(2, guid);
> +                    // Very old school, 1 is true and 0 is false :)

Still think this should be a constant but if we change it, we run the risk of breaking things so your call.
Comment on attachment 8743597 [details]
MozReview Request: Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/47907/diff/6-7/
https://hg.mozilla.org/integration/fx-team/rev/37f04460ddb76d6ef4e7c32a8a6b2fbc44cb8776
Bug 1266232 - be extra careful and mindful of performance when migrating visits r=mcomella
https://hg.mozilla.org/mozilla-central/rev/37f04460ddb7
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.