Closed Bug 1249008 Opened 5 years ago Closed 4 years ago

Importing Chrome history from places organizer hangs for minutes

Categories

(Firefox :: Migration, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox44 --- wontfix
firefox45 --- wontfix
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected

People

(Reporter: pauly, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

[Affected versions]:
- Fx 40 - 47.0a1 (2016-02-16)

[Affected platforms]:
- Win 7 x64

[Steps to reproduce]:
1. Add several bookmarks to Chrome
2. Close Chrome
3. Open Firefox
4. Open Library View
5. Go to Import and Backup/Import Data from another browser
6. Select Chrome and click 'Next'
7. Select all the options and click 'Next'

[Expected result]:
- The import should finish in a couple of seconds. All of Chrome's bookmarks should be imported.

[Actual result]:
- The import takes 2-3 min and the bookmarks are not imported. The following appears in the Browser Console:
"Error: Transaction timeout, most likely caused by unresolved pending work.
Stack trace:
ConnectionData.prototype<.executeTransaction/promise</timeoutPromise</<@resource://gre/modules/Sqlite.jsm:641:33
setTimeout_timer@resource://gre/modules/Timer.jsm:30:5
MU_showMigrationWizard@resource:///modules/MigrationUtils.jsm:615:5
PO_importFromBrowser@chrome://browser/content/places/places.js:366:5
oncommand@chrome://browser/content/places/places.xul:1:1
 Sqlite.jsm:648
some bookmarks did not successfully migrate."

[Regression range]:
- works fine on Fx 39
- no longer works in Fx 40

[Additional notes]:
- bookmarks are imported if Chrome is running during the import
I'll get back with a regression range ASAP.
OS: All → Windows 7
Hardware: All → x86_64
Blocks: 1094876
Flags: needinfo?(mak77)
I investigated this further and found that this is specific to a heavy Chrome profile that I have on my machine, I couldn't reproduce the problem on another machine.
While moving the Chrome data (history, bookmarks, cookies) step by step to another machine to see what exactly breaks the migration, I found that the 'history' file is the culprit.
Let me know if you want to reproduce the problem, and I'll email the file. You only need to put it in "c:\Users\paul.silaghi\AppData\Local\Google\Chrome\User Data\Default" (note it may be Profile1, Profile2 etc instead of 'Default') and then execute the steps 3-7 in comment 0.
(In reply to Paul Silaghi, QA [:pauly] from comment #3)
> While moving the Chrome data (history, bookmarks, cookies) step by step to
> another machine to see what exactly breaks the migration, I found that the
> 'history' file is the culprit.

Hm, it's possible history is blocking bookmarks import, some sort of thread contention.

> Let me know if you want to reproduce the problem, and I'll email the file.

Yes please, it would be useful.

> You only need to put it in
> "c:\Users\paul.silaghi\AppData\Local\Google\Chrome\User Data\Default" (note
> it may be Profile1, Profile2 etc instead of 'Default') and then execute the
> steps 3-7 in comment 0.

Btw, if it's the history view being slow, I suspect it may improve if we'd implement bug 1107364.
Email sent.
(In reply to Marco Bonardo [::mak] from comment #4)
> (In reply to Paul Silaghi, QA [:pauly] from comment #3)
> > You only need to put it in
> > "c:\Users\paul.silaghi\AppData\Local\Google\Chrome\User Data\Default" (note
> > it may be Profile1, Profile2 etc instead of 'Default') and then execute the
> > steps 3-7 in comment 0.
> 
> Btw, if it's the history view being slow, I suspect it may improve if we'd
> implement bug 1107364.

I think this would be good to check. I've found it to impact the speed at which imports run, too.

Paul, if you close the library window before proceeding through the import wizard, does that make the issue go away?
Flags: needinfo?(paul.silaghi)
(In reply to :Gijs Kruitbosch from comment #6)
> (In reply to Marco Bonardo [::mak] from comment #4)
> > (In reply to Paul Silaghi, QA [:pauly] from comment #3)
> > > You only need to put it in
> > > "c:\Users\paul.silaghi\AppData\Local\Google\Chrome\User Data\Default" (note
> > > it may be Profile1, Profile2 etc instead of 'Default') and then execute the
> > > steps 3-7 in comment 0.
> > 
> > Btw, if it's the history view being slow, I suspect it may improve if we'd
> > implement bug 1107364.
> 
> I think this would be good to check. I've found it to impact the speed at
> which imports run, too.
> 
> Paul, if you close the library window before proceeding through the import
> wizard, does that make the issue go away?

... oh, except that works on mac but not Windows. Sigh.

If you go to the options, then Security, then click "saved logins..." and click the import button in there, does that make it work?
(In reply to :Gijs Kruitbosch from comment #7)
> If you go to the options, then Security, then click "saved logins..." and
> click the import button in there, does that make it work?
Still reproducible.
I've noticed the slowness of the library window too, but I guess that's a separate issue.
Flags: needinfo?(paul.silaghi)
The problem appears to be due to a Sqlite.jsm transaction yielding on some promise that never resolves nor rejects, thus the "Transaction timeout, most likely caused by unresolved pending work" error.

When we issue updatePlaces() in ChromeProfileMigrator.js we don't wait for it, so the bookmarks import starts immediately.
If history keeps a transaction open for a long time, it will block bookmark writes. If updatePlaces takes more than 2 minutes (possible for a large history), Sqlite.jsm thinks the bookmarks transaction is hung because it doesn't proceed, and rejects it, discarding some of the bookmarks.

A simple solution could be to wait until updatePlaces is done before proceeding, by returning a promise from the history resource's "migrate" method.

It would be nice if we could also find a more general solution, but it's hard to know if a promise that takes ages to resolve is stale or actually doing something... (see the discussion in bug 1091446)
Flags: needinfo?(mak77)
Note, it could also make sense to bump the timeout in Sqlite.jsm a bit; maybe 3 minutes? 5 minutes? There's no way to pick a "right" value.
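The contention described above (a long-lived history transaction starving the bookmarks writer) can be reproduced in miniature with plain SQLite. This is an illustrative sketch in Python's stdlib `sqlite3`, not the Firefox code; the table name and URLs are made up. One connection holds an open write transaction while a second connection, with a short busy timeout, tries to write and gets "database is locked":

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "places.sqlite")

# isolation_level=None: autocommit mode, so we control transactions explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE bookmarks (url TEXT)")

# Simulate the history import holding a long write transaction.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO bookmarks VALUES ('http://history.example/')")

# A second connection (the bookmarks import) tries to write; with a 0.1 s
# busy timeout it gives up quickly instead of waiting minutes.
other = sqlite3.connect(path, timeout=0.1)
try:
    other.execute("INSERT INTO bookmarks VALUES ('http://bookmark.example/')")
    blocked = False
except sqlite3.OperationalError:
    blocked = True

print(blocked)  # True: the open transaction starves the second writer
writer.execute("COMMIT")
```

In the bug, the analogous "second writer" is the bookmarks transaction, and Sqlite.jsm's 2-minute watchdog rejects it rather than waiting indefinitely.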
Updating summary. Specifically, sounds like this doesn't affect importing history for a new Firefox user migrating data into a new profile?
Summary: Bookmarks are not imported if Chrome is closed → Importing Chrome history from places organizer hangs for minutes
mak: Is there anything we can realistically do in the short term here? Comments 9/10 make this sound like this is probably hard to fix? 

Kind of gross, but if this can take minutes, maybe a popup progress bar so people at least know it's still going?
Flags: needinfo?(mak77)
The problem of bookmarks not being imported should have been fixed by bug 1272652; it was due to the fact that the history import takes a lot of time. A short-term fix for that may also be to limit the amount of history we import: for example, we could import only the N most recent pages instead of everything.

Bug 1107364 is still my best bet for improving view update times, which is relevant when the import runs while the Library is open.

Progress bars are something we have considered for a long time, but we never had time to look into whether they would be worth it; there is still plenty of perf work that could be done in the backend and frontend before I could even think about adding progress bars. But before we can re-evaluate batching and micro-optimizations, we should finish removing the synchronous history API, or our optimizations could be lost in future sync/async changes.
Flags: needinfo?(mak77)
See Also: → 1289217
Mark 51 fix-optional as there are no actions for the moment but still happy to have the fix in 51.
Blocks: 1332318
Depends on: 1340115
Blocks: 1332225
No longer blocks: 1332318
Depends on: 1338522
(This is a drive-by comment; I never intend to discredit people when I accidentally duplicate statements or seem to plagiarize ideas.)

The only approach that has been reliable enough for retrieving data from internal sources that are (or may be) huge is chunking ("chunkification"). (Assuming BCNF for relational DB engine sources.)
1. Optionally lock the tables that we're querying.
2. If possible (depends on the DB engine in use) fetch the total amount of records that will be returned by the query.
3. With SQL, we select using LIMIT with a pre-defined upper bound - say 200; 'LIMIT 0, 200' - to make sure any processing JS doesn't get to block the process forever.
It also keeps memory usage in check - most modern engines prefer to keep at least the indices, but often also frequently-used tables, mapped in memory, and a query basically duplicates the result set onto the JS heap. For other DB engines we'd use different chunking mechanisms, of course.
4. The next chunk will be fetched using 'LIMIT 200, 200', etc, etc.
5. When we already know the total number of records for the query, we know explicitly when to stop making requests to the DB. If not, we stop when a query returns 0 rows.
6. Release the lock.

This way we'll never need to yield using a timeout to forcibly make processing records async, but instead push that work to the DB engine implicitly, because of the very nature of async socket connections.

For UPDATEs we'd work the very same way - except we chunkify the JS data structures and feed them to the DB engine part by part.

Hooks for spinners and other interaction mechanisms can be added by emitting events before a query, after processing each chunk of records, and when finished, using EventEmitter-style thingies. The actual UI can be done later, of course.
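Steps 2-5 above can be sketched with Python's stdlib `sqlite3` (purely illustrative; the table and column names are invented, and the real Places code is JS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, place TEXT)")
conn.executemany("INSERT INTO urls (place) VALUES (?)",
                 [("http://example.com/%d" % i,) for i in range(1000)])

CHUNK = 200

# Step 2: total row count, so we know when to stop (and can drive a progress bar).
(total,) = conn.execute("SELECT COUNT(*) FROM urls").fetchone()

processed = 0
offset = 0
while offset < total:
    # Step 3/4: fetch one bounded chunk at a time.
    rows = conn.execute("SELECT place FROM urls LIMIT ? OFFSET ?",
                        (CHUNK, offset)).fetchall()
    if not rows:
        break  # Step 5 fallback: stop when a chunk comes back empty.
    processed += len(rows)  # hand the chunk to per-record processing here
    offset += CHUNK

print(processed)  # 1000
```

Note the caveat raised in the next comment: SQLite's OFFSET still walks the skipped rows internally, so each successive chunk gets progressively more expensive.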
Unfortunately that system doesn't cope very well with our situation. Locking the tables could still block the main thread, since our connections can unfortunately still be used on both threads. Plus, SQLite doesn't implement a real "OFFSET": when you offset, it still fetches all the records up to the limit and then skips the first N, so you only save some overhead in your DB abstraction. That said, chunking is still a good approach for the invoking code, when it can chunk easily.
But note that in this case chunking doesn't help: these APIs were not designed to do large insertions, and as such they don't do any batching. In practice there's nothing we could chunk yet.
(In reply to Marco Bonardo [::mak] from comment #17)
> unfortunately that system doesn't cope very well with our situation. Locking
> the tables could still block the main-thread, since our connections can
> unfortunately be used on both threads yet. 

Woohar! That's surprising and sounds fixable. 

> Plus Sqlite doesn't implement a
> real "OFFSET", when you offset it basically still fetches all the records up
> to limit, and then skips the first N.

I'm sorry, but I just WTF'd at this out loud. This is the most naive implementation; nobody wins exposing and/ or using it! Is there a bug on file to fix this?

> But note in this case chunking doesn't help, these APIs were not designed to
> do large insertions, and as such they don't do any batching. In practice
> there's nothing we could chunk yet.

True. In many cases, though, chunking can be added to existing APIs without abstraction cost by rewriting the SQL query. Might seem 'dirty' and/ or hacky, but I've seen that a-plenty in the wild.
(In reply to Mike de Boer [:mikedeboer] from comment #18)
> (In reply to Marco Bonardo [::mak] from comment #17)
> > Plus Sqlite doesn't implement a
> > real "OFFSET", when you offset it basically still fetches all the records up
> > to limit, and then skips the first N.
> 
> I'm sorry, but I just WTF'd at this out loud. This is the most naive
> implementation; nobody wins exposing and/ or using it! Is there a bug on
> file to fix this?

That would have to be an SQLite bug - we don't write SQLite ourselves. But also, it's not really relevant here, as the slowness is about inserts, not selects.

> > But note in this case chunking doesn't help, these APIs were not designed to
> > do large insertions, and as such they don't do any batching. In practice
> > there's nothing we could chunk yet.
> 
> True. In many cases, though, chunking can be added to existing APIs without
> abstraction cost by rewriting the SQL query. Might seem 'dirty' and/ or
> hacky, but I've seen that a-plenty in the wild.

I don't understand. You can't 'rewrite' a sql query designed to insert 1 thing to insert 50 things without changing the rest of the code.

We're inserting items literally one by one, and there are custom SQL functions involved that either call back into the main app code or cause other inserts/updates. That's what's slow.

So the problem here isn't that we don't know *how* to make this faster, it's finding time to do it and/or finding the quickest route to success given we want to uplift things.
(In reply to Mike de Boer [:mikedeboer] from comment #18)
> > Plus Sqlite doesn't implement a
> > real "OFFSET", when you offset it basically still fetches all the records up
> > to limit, and then skips the first N.
> 
> I'm sorry, but I just WTF'd at this out loud. This is the most naive
> implementation; nobody wins exposing and/ or using it! Is there a bug on
> file to fix this?

Apparently the way to fix this for selects in SQLite is to create a temp table with an AUTO_INCREMENT key + index, fill it up with the entire query results (NOT using the C-API, just SQL) and then use 'WHERE id > 0 AND id < 200', etc. to chunk. This way you'll also always know the number of rows before chunking starts, which is nice for progress bars.
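The temp-table trick above can be demonstrated with Python's stdlib `sqlite3` (illustrative only; table names are invented). SQLite assigns the temp table sequential rowids at insertion, so rowid-range lookups hit the rowid index and each chunk is cheap, unlike OFFSET:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (url TEXT, visit_date INTEGER)")
conn.executemany("INSERT INTO history VALUES (?, ?)",
                 [("http://example.com/%d" % i, i) for i in range(500)])

# Materialize the query once; the temp table's implicit rowid (1..N, in
# insertion order) serves as the auto-increment key.
conn.execute("""CREATE TEMP TABLE chunked AS
                SELECT url FROM history ORDER BY visit_date DESC""")

# Row count is known up front - handy for progress reporting.
(total,) = conn.execute("SELECT COUNT(*) FROM chunked").fetchone()

CHUNK = 200
seen = 0
for lo in range(0, total, CHUNK):
    # Range scan on rowid: O(chunk) per query, no skipped-row walking.
    rows = conn.execute(
        "SELECT url FROM chunked WHERE rowid > ? AND rowid <= ?",
        (lo, lo + CHUNK)).fetchall()
    seen += len(rows)

print(total, seen)  # 500 500
```

The trade-off is materializing the full result set once up front, which costs temp storage but keeps each subsequent chunk query fast.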
(In reply to :Gijs from comment #19)
> I don't understand. You can't 'rewrite' a sql query designed to insert 1
> thing to insert 50 things without changing the rest of the code.
> 
> We're inserting items literally 1 by 1, and there are custom sql functions
> involved with that that either call back into the main app code or cause
> other inserts/updates. That's what's slow.

OK, I was talking about concatenating INSERTs with `;` to send the batch down the wire/socket in one go. If each INSERT needs to reach into JS, do some more queries and move on... well, I can't comment any more on that.
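The batching described here can be sketched with a single multi-row INSERT, shown in Python's stdlib `sqlite3` for illustration (table name invented). As noted in this thread, this only helps when no per-row callbacks into app code are needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (url TEXT)")

urls = ["http://example.com/%d" % i for i in range(50)]

# One statement, many rows: SQLite supports multi-row VALUES lists,
# so the whole batch crosses the API boundary in a single call.
placeholders = ",".join(["(?)"] * len(urls))
conn.execute("INSERT INTO bookmarks VALUES " + placeholders, urls)
conn.commit()

(count,) = conn.execute("SELECT COUNT(*) FROM bookmarks").fetchone()
print(count)  # 50
```

Even without multi-row VALUES, wrapping many single-row INSERTs in one explicit transaction avoids a per-row fsync, which is usually the dominant cost.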

> So the problem here isn't that we don't know *how* to make this faster, it's
> finding time to do it and/or finding the quickest route to success given we
> want to uplift things.

I totally understand! It's just that I would've felt bad if I hadn't said anything and moved on, personally. :-)
Wontfix for 53 as we're about to release it. Not sure if this is still an active issue or if it may be taken care of in recent related work. 
Paul, can you still reproduce this for 55? (No rush here)
Indeed, I can no longer reproduce the issue on Fx 53.0, 55.0a1 (2017-04-18) Win 7 x64, Win 10 x64.
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(paul.silaghi)
Resolution: --- → WORKSFORME