Closed Bug 1339413 Opened 4 years ago Closed 4 years ago

Cap the amount of history we import from Chrome for the next onboarding funnelcake

Categories

(Firefox :: Migration, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 54
Tracking Status
firefox52 --- fixed
firefox-esr52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: dao, Assigned: dao)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

<Gijs> mak: so my basic question is, esp. if we want to uplift to beta, what's the best way forward?
<mak> if we wantto uplift to beta, I don't see any best way forward apart from checking what the observers are doing (and maybe delay that work) and trying to use onManyFrecenciesChanged if updatePlaces is invoked with an array bigger than N
<dao> for beta specifically we could potentially add a hack that's only used for the funnelcake. obviously it can't break tons of add-ons or the places database, but it doesn't need to be perfect either
...
<mak> btw, the manyFrecenciesChanged fix sounds like feasible
<Gijs> yes, I already have a local modification and it seems to not crash and burn terribly
...
<Gijs> that sounds like the maximum we can do for beta, then...
<Gijs> even on my fast machine importing ~24000 items takes 2 minutes right now
<Gijs> which ideally I'd like to improve
<Gijs> but if it can not hang the UI while it's doing it, I guess it might be OK to wait with that
...
<dao> we can still cap the amount of history we import (again, potentially limited to beta)
...
<Gijs> dao: ok! Feel free to ping me with questions or something... I think for Chrome you basically just want to update this query: https://dxr.mozilla.org/mozilla-central/source/browser/components/migration/ChromeProfileMigrator.js#314 to ignore items from before a certain date, and potentially sort by date and limit to a certain number of items, or something
I think these might be reasonable default values for the funnelcake:

browser.migrate.chrome.history.maxAgeInDays = 100
browser.migrate.chrome.history.limit = 10000
Assignee: nobody → dao+bmo
Status: NEW → ASSIGNED
Comment on attachment 8837582 [details]
Bug 1339413 - Implement prefs for capping the amount of history we import from Chrome.

https://reviewboard.mozilla.org/r/112704/#review114154

::: browser/app/profile/firefox.js:1538
(Diff revision 2)
> +pref("browser.migrate.chrome.history.limit", 0);
> +pref("browser.migrate.chrome.history.maxAgeInDays", 0);

I think it makes sense to set some non-0 limit even in other cases than the funnelcake.

::: browser/components/migration/ChromeProfileMigrator.js:325
(Diff revision 2)
> -        let rows = yield MigrationUtils.getRowsFromDBWithoutLocks(historyFile.path, "Chrome history",
> -          `SELECT url, title, last_visit_time, typed_count FROM urls WHERE hidden = 0`);
> +        let MAX_AGE_IN_DAYS = Services.prefs.getIntPref("browser.migrate.chrome.history.maxAgeInDays");
> +        let LIMIT = Services.prefs.getIntPref("browser.migrate.chrome.history.limit");

Nit: const (for both of these lines)

::: browser/components/migration/ChromeProfileMigrator.js:334
(Diff revision 2)
> +        if (MAX_AGE_IN_DAYS) {
> +          let maxAge = dateToChromeTime(new Date() - MAX_AGE_IN_DAYS * 24 * 60 * 60 * 1000);
> +          query += " AND last_visit_time > " + maxAge;
> +        }
> +        if (LIMIT) {
> +          query += " ORDER BY last_visit_time DESC LIMIT " + LIMIT;

Have you done some tests with large amounts of history and whether the `order by` clause causes any negative effects in terms of the time it takes to get this data?

Also, have you checked if this code still works if you run it while Chrome is running (and modifying history) ?
Attachment #8837582 - Flags: review?(gijskruitbosch+bugs)
(In reply to :Gijs from comment #4)
> > +pref("browser.migrate.chrome.history.limit", 0);
> > +pref("browser.migrate.chrome.history.maxAgeInDays", 0);
> 
> I think it makes sense to set some non-0 limit even in other cases than the
> funnelcake.

We can do that in a separate bug. My main concern here is funnelcake and I don't want to uplift anything that affects other builds.

> ::: browser/components/migration/ChromeProfileMigrator.js:325
> (Diff revision 2)
> > -        let rows = yield MigrationUtils.getRowsFromDBWithoutLocks(historyFile.path, "Chrome history",
> > -          `SELECT url, title, last_visit_time, typed_count FROM urls WHERE hidden = 0`);
> > +        let MAX_AGE_IN_DAYS = Services.prefs.getIntPref("browser.migrate.chrome.history.maxAgeInDays");
> > +        let LIMIT = Services.prefs.getIntPref("browser.migrate.chrome.history.limit");
> 
> Nit: const (for both of these lines)

Oops, typo.

> > +        if (MAX_AGE_IN_DAYS) {
> > +          let maxAge = dateToChromeTime(new Date() - MAX_AGE_IN_DAYS * 24 * 60 * 60 * 1000);
> > +          query += " AND last_visit_time > " + maxAge;
> > +        }
> > +        if (LIMIT) {
> > +          query += " ORDER BY last_visit_time DESC LIMIT " + LIMIT;
> 
> Have you done some tests with large amounts of history and whether the
> `order by` clause causes any negative effects in terms of the time it takes
> to get this data?

No, but last_visit_time must be an index, right? SQL is made first and foremost for this kind of thing. It should be fast. (Famous last words.)

> Also, have you checked if this code still works if you run it while Chrome
> is running (and modifying history) ?

I don't understand how modifying the query like I did could make a difference with regards to whether Chrome is running.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Dão Gottwald [:dao] from comment #5)
> (In reply to :Gijs from comment #4)
> > > +pref("browser.migrate.chrome.history.limit", 0);
> > > +pref("browser.migrate.chrome.history.maxAgeInDays", 0);
> > 
> > I think it makes sense to set some non-0 limit even in other cases than the
> > funnelcake.
> 
> We can do that in a separate bug. My main concern here is funnelcake and I
> don't want to uplift anything that affects other builds.

I think we've established we want this everywhere. What do you think the risk is?

> > > +        if (MAX_AGE_IN_DAYS) {
> > > +          let maxAge = dateToChromeTime(new Date() - MAX_AGE_IN_DAYS * 24 * 60 * 60 * 1000);
> > > +          query += " AND last_visit_time > " + maxAge;
> > > +        }
> > > +        if (LIMIT) {
> > > +          query += " ORDER BY last_visit_time DESC LIMIT " + LIMIT;
> > 
> > Have you done some tests with large amounts of history and whether the
> > `order by` clause causes any negative effects in terms of the time it takes
> > to get this data?
> 
> No, but last_visit_time must be an index, right? SQL is made first and
> foremost for this kind of thing. It should be fast. (Famous last words.)

Yeah, um, I would suggest testing, given the trouble we've had with this so far. :-)

> > Also, have you checked if this code still works if you run it while Chrome
> > is running (and modifying history) ?
> 
> I don't understand how modifying the query like I did could make a
> difference with regards to whether Chrome is running.

We run this query without paying attention to locking. Sqlite isn't actually 'safe' for this kind of thing if some other consumer modifies the DB while we're running. We retry this query if it fails outright, but it is also possible for us to miss updates to the DB, or (maybe? Not sure...) to get some parts of those updates but not all of them. I expect that might mess with sorting, but I'm not sure. Something something dark voodoo magic something. A smoketest just seems like a good idea.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to :Gijs from comment #6)
> > > I think it makes sense to set some non-0 limit even in other cases than the
> > > funnelcake.
> > 
> > We can do that in a separate bug. My main concern here is funnelcake and I
> > don't want to uplift anything that affects other builds.
> 
> I think we've established we want this everywhere. What do you think the
> risk is?

Well, you laid out some risks although I don't agree with them. Basically non-funnelcake builds are just not my priority for this week, and the general risk of uplifting new code applies. I'm happy to file the followup bug and write the patch, though, and I won't stop you if you want to make an argument for uplifting that.

> > > Have you done some tests with large amounts of history and whether the
> > > `order by` clause causes any negative effects in terms of the time it takes
> > > to get this data?
> > 
> > No, but last_visit_time must be an index, right? SQL is made first and
> > foremost for this kind of thing. It should be fast. (Famous last words.)
> 
> Yeah, um, I would suggest testing, given the trouble we've had with this so
> far. :-)

Do you still have your large Chrome profile handy to do a test run with that?

> > > Also, have you checked if this code still works if you run it while Chrome
> > > is running (and modifying history) ?
> > 
> > I don't understand how modifying the query like I did could make a
> > difference with regards to whether Chrome is running.
> 
> We run this query without paying attention to locking. Sqlite isn't actually
> 'safe' for this kind of thing if some other consumer modifies the DB while
> we're running. We retry this query if it fails outright, but it is also
> possible for us to miss updates to the DB, or (maybe? Not sure...) to get
> some parts of those updates but not all of them. I expect that might mess
> with sorting, but I'm not sure. Something something dark voodoo magic
> something. A smoketest just seems like a good idea.

Sqlite may have issues here but adding ORDER BY ... LIMIT ... to the query shouldn't make it any worse. Btw, I actually did try this while Chrome was running, but it's not clear what exactly "while Chrome is modifying history" means.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Dão Gottwald [:dao] from comment #7)
> it's not clear what exactly "while Chrome is modifying history" means.

... or how I would reproduce that. I have no insight into and control over when Chrome writes to its database. :/
(In reply to Dão Gottwald [:dao] from comment #8)
> (In reply to Dão Gottwald [:dao] from comment #7)
> > it's not clear what exactly "while Chrome is modifying history" means.
> 
> ... or how I would reproduce that. I have no insight into and control over
> when Chrome writes to its database. :/

This bit is trivial. Open a website of your choice in chrome, set up to be able to import/autoimport, and right before doing so run this in the chrome devtool console:

setInterval(function() { location.hash += "a" }, 2000);


(In reply to Dão Gottwald [:dao] from comment #7)
> (In reply to :Gijs from comment #6)
> > > > I think it makes sense to set some non-0 limit even in other cases than the
> > > > funnelcake.
> > > 
> > > We can do that in a separate bug. My main concern here is funnelcake and I
> > > don't want to uplift anything that affects other builds.
> > 
> > I think we've established we want this everywhere. What do you think the
> > risk is?
> 
> Well, you laid out some risks although I don't agree with them. Basically
> non-funnelcake builds are just not my priority for this week, and the
> general risk of uplifting new code applies. I'm happy to file the followup
> bug and write the patch, though, and I won't stop you if you want to make an
> argument for uplifting that.
> 
> > > > Have you done some tests with large amounts of history and whether the
> > > > `order by` clause causes any negative effects in terms of the time it takes
> > > > to get this data?
> > > 
> > > No, but last_visit_time must be an index, right? SQL is made first and
> > > foremost for this kind of thing. It should be fast. (Famous last words.)
> > 
> > Yeah, um, I would suggest testing, given the trouble we've had with this so
> > far. :-)
> 
> Do you still have your large Chrome profile handy to do a test run with that?

It doesn't have a date range usable for testing whether the date limiting works. I would suggest using your own history file and using SQL to duplicate all the rows in it a few times (appending random bits to the URL every time), and maybe selecting random times for the last visit time. Because this is exponential, you only really have to use sqlitemanager and run the same query to duplicate the rows a few times to get a history db with a large number of rows.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to :Gijs from comment #9)
> (In reply to Dão Gottwald [:dao] from comment #8)
> > (In reply to Dão Gottwald [:dao] from comment #7)
> > > it's not clear what exactly "while Chrome is modifying history" means.
> > 
> > ... or how I would reproduce that. I have no insight into and control over
> > when Chrome writes to its database. :/
> 
> This bit is trivial. Open a website of your choice in chrome, set up to be
> able to import/autoimport, and right before doing so run this in the chrome
> devtool console:
> 
> setInterval(function() { location.hash += "a" }, 2000);

Just to be clear, this guarantees that Chrome is writing to the DB (on disk, not memory) while I import?

> > Do you still have your large Chrome profile handy to do a test run with that?
> 
> It doesn't have a date range usable for testing whether the date limiting
> works.

I tested that it works. Your question was whether ORDER BY makes the query slower, right?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Dão Gottwald [:dao] from comment #10)
> (In reply to :Gijs from comment #9)
> > (In reply to Dão Gottwald [:dao] from comment #8)
> > > (In reply to Dão Gottwald [:dao] from comment #7)
> > > > it's not clear what exactly "while Chrome is modifying history" means.
> > > 
> > > ... or how I would reproduce that. I have no insight into and control over
> > > when Chrome writes to its database. :/
> > 
> > This bit is trivial. Open a website of your choice in chrome, set up to be
> > able to import/autoimport, and right before doing so run this in the chrome
> > devtool console:
> > 
> > setInterval(function() { location.hash += "a" }, 2000);
> 
> Just to be clear, this guarantees that Chrome is writing to the DB (on disk,
> not memory) while I import?

It certainly guarantees the DB is locked as far as sqlite is concerned. How often writes are synced is up to Chrome, but given that it doesn't open the DB until you first navigate / write to history, and given that I've seen slowdown in our import process depending on how small I make the interval in the above snippet, I believe so, yes (so the assumption here is that the import takes at least 2 seconds - feel free to make the interval smaller if that's helpful).

> > > Do you still have your large Chrome profile handy to do a test run with that?
> > 
> > It doesn't have a date range usable for testing whether the date limiting
> > works.
> 
> I tested that it works. Your question was whether ORDER BY makes the query
> slower, right?

Yes.
Flags: needinfo?(gijskruitbosch+bugs)
So I generated some 12000 visits and kept Chrome running while spamming its history. The unlimited query took only 60ms. Limiting the query by result count and/or date only sped up the query further.

If I make my stopwatch include the loop where we call row.getResultByName a few times to generate the "places" array, the time for the unlimited query goes up to about 700ms, and again capping the query cuts the time.
(In reply to Dão Gottwald [:dao] from comment #12)
> So I generated some 12000 visits and kept Chrome running while spamming its
> history. The unlimited query took only 60ms. Limiting the query by result
> count and/or date only sped up the query further.
> 
> If I make my stopwatch include the loop where we call row.getResultByName a
> few times to generate the "places" array, the time for the unlimited query
> goes up to about 700ms, and again capping the query cuts the time.

Great. I think this means this is good to go save for the nits I identified earlier. I'm around for another ~40 minutes to rubberstamp this so you can autoland (if we miss each other, please comment if you would like me to autoland for you assuming I r+ late tonight).
I messed up the third revision. This is the interdiff you want to look at: https://reviewboard.mozilla.org/r/112704/diff/2-4/
Comment on attachment 8837582 [details]
Bug 1339413 - Implement prefs for capping the amount of history we import from Chrome.

https://reviewboard.mozilla.org/r/112704/#review114236
Attachment #8837582 - Flags: review?(gijskruitbosch+bugs) → review+
Pushed by dgottwald@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e4ab837ee587
Implement prefs for capping the amount of history we import from Chrome. r=Gijs
Blocks: 1340115
https://hg.mozilla.org/mozilla-central/rev/e4ab837ee587
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
(In reply to Dão Gottwald [:dao] from comment #2)
> I think these might be reasonable default values for the funnelcake:
> 
> browser.migrate.chrome.history.maxAgeInDays = 100
> browser.migrate.chrome.history.limit = 10000

Looking at https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-01-25&keys=chrome!firefox!edge!360se&max_channel_version=release%252F51&measure=FX_MIGRATION_HISTORY_QUANTITY&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-01-18&table=0&trim=1&use_submission_date=0
we'll probably want a lower limit.
Comment on attachment 8837582 [details]
Bug 1339413 - Implement prefs for capping the amount of history we import from Chrome.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1332225 / bug 1332318
[User impact if declined]: auto-import performance issues with the next onboarding funnelcake
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: as part of the funnelcake
[List of other uplifts needed for the feature/fix]: /
[Is the change risky?]: no
[Why is the change risky/not risky?]: pref'd off by default
[String changes made/needed]: /
Attachment #8837582 - Flags: approval-mozilla-beta?
Attachment #8837582 - Flags: approval-mozilla-aurora?
Comment on attachment 8837582 [details]
Bug 1339413 - Implement prefs for capping the amount of history we import from Chrome.

tweak chrome import for onboarding funnelcake, aurora53+, beta52+
Attachment #8837582 - Flags: approval-mozilla-beta?
Attachment #8837582 - Flags: approval-mozilla-beta+
Attachment #8837582 - Flags: approval-mozilla-aurora?
Attachment #8837582 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.