583821 - Firefox Sync full syncs are really slow after SQLite3.7.0 upgrade

Reporter

Description

•

15 years ago

STEPS TO REPRODUCE: 0. Be using a profile that has lots of history (> 5000 entries) and is set up with latest Firefox Sync "experimental version" (I'm using 1.4.3) 1. Start Firefox 2. Tools menu | Sync | Preferences 3. (optional) For simplicity, tell Sync to just synchronize your history (Minefield will: Use my custom settings" in lowest drop-down menu 4. Click "Manage Account" and then "Reset Sync" 5. Choose the bottom option, "Replace all other devices with your local data" 6. Click Next, Next, and then trigger a sync. ACTUAL RESULTS: The sync hangs for at least a minute on my machine, apparently while querying my local history. REGRESSION RANGE (using .txt files in nightly download dirs): 20100723031340 58101a16aff7 20100724030057 30239e4cebd8 http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58101a16aff7&tochange=30239e4cebd8 This includes both a places change related to history (Bug 580228) as well as a sqlite upgrade (Bug 581000). Filing in Places right now, but it may very well be a sqlite/storage issue.

Daniel Holbert [:dholbert]

Reporter

Comment 1

•

15 years ago

Note that Firefox 4.0 b2 isn't affected by this -- IMHO this should block 4.0b3, or else it'll be a major regression from one beta to the next for Weave users.

blocking2.0: --- → ?

Daniel Holbert [:dholbert]

Reporter

Comment 2

•

15 years ago

A few clarifications on comment 0: - I believe this affects all "full" syncs - I initially discovered it without any "Reset Sync" required. I only include that in my STR because that's a surefire way to get a full sync, and also because it's what I did to find the regression range. - The hang here locks up the UI and spikes the CPU. (whereas in nightlies before this regressed, the CPU usage goes up, but the UI remains responsive. That'd be the 'expected behavior', I guess.)

Shawn Wilsher :sdwilsh

Comment 3

•

15 years ago

What operating system are you on?

Daniel Holbert [:dholbert]

Reporter

Comment 4

•

15 years ago

I've seen this on both Ubuntu 10.04 & 10.10a2. This also might be a dupe of (or related to) Bug 563538. (that's another hang that's marked as blocking Bug 580228, which is the history/places change in this bug's regression range) FWIW, here's the chunk of about:sync-log around the hang (notice that 1:16 elapses before we figure out "5000 outgoing items pre-reconciliation" -- that's the hang): >2010-08-02 11:57:15 Engine.History DEBUG Resetting history last sync time >2010-08-02 11:57:15 Net.Resource DEBUG PUT Length: 563 >2010-08-02 11:57:15 Net.Resource DEBUG PUT success 200 https://phx-sync064.services.mozilla.com/1.0/dholbert/storage/crypto/history >2010-08-02 11:57:15 Engine.History DEBUG First sync, uploading all items >2010-08-02 11:58:31 Engine.History INFO 5000 outgoing items pre-reconciliation >2010-08-02 11:58:31 Collection DEBUG GET success 200 https://phx-sync064.services.mozilla.com/1.0/dholbert/storage/history?full=1

Shawn Wilsher :sdwilsh

Comment 5

•

15 years ago

(In reply to comment #4) > FWIW, here's the chunk of about:sync-log around the hang (notice that 1:16 > elapses before we figure out "5000 outgoing items pre-reconciliation" -- that's > the hang): > >2010-08-02 11:57:15 Engine.History DEBUG Resetting history last sync time > >2010-08-02 11:57:15 Net.Resource DEBUG PUT Length: 563 > >2010-08-02 11:57:15 Net.Resource DEBUG PUT success 200 https://phx-sync064.services.mozilla.com/1.0/dholbert/storage/crypto/history > >2010-08-02 11:57:15 Engine.History DEBUG First sync, uploading all items > >2010-08-02 11:58:31 Engine.History INFO 5000 outgoing items pre-reconciliation > >2010-08-02 11:58:31 Collection DEBUG GET success 200 https://phx-sync064.services.mozilla.com/1.0/dholbert/storage/history?full=1 Mardak - does this tell you anything useful?

Shawn Wilsher :sdwilsh

Comment 6

•

15 years ago

(In reply to comment #4) > This also might be a dupe of (or related to) Bug 563538. (that's another hang > that's marked as blocking Bug 580228, which is the history/places change in > this bug's regression range) Note that that bug has only been seen by people on OS X so far. Knowing that you are on Linux and seeing it would be interesting.

Ed Lee :Mardak

Comment 7

•

15 years ago

Assuming this is latest-weave, it's doing a query SELECT url FROM moz_places_view: http://hg.mozilla.org/services/fx-sync/file/eaffcccd1744/services/sync/modules/engines/history.js#l137 Older versions used history query api and that was locking up the browser, so we switched to sql.

Shawn Wilsher :sdwilsh

Comment 8

•

15 years ago

So that change was made in 1.4a1pre (if I'm reading things correctly) which means dholbert was running that code. Is now a bad time to point out to you that queries against the view, especially with a WHERE clause are going to be *very* slow? As in, create a new temporary table containing the results of the view query, and then querying that (which has no indexes). If you can avoid it, you really /really/ *really* should.

Shawn Wilsher :sdwilsh

Comment 9

•

15 years ago

I should also add that it's possible that the SQLite upgrade regressed querying the view, but we've had a policy in places that we NEVER select from it because it has bad performance. I'd hate to have to start caring about view performance because weave uses it (not to mention that it's likely going away in bug 552023).

Shawn Wilsher :sdwilsh

Comment 10

•

15 years ago

So while this may have been a result of the SQLite upgrade, queries on the view have always been slow; so much so that places specifically forbids using them. Since we are turning on the sync UI in b3 (plan at least), this should block beta 4.

blocking2.0: ? → beta4+

Component: Places → Sync

Keywords: relnote

Product: Toolkit → Weave

QA Contact: places → sync

Version: Trunk → unspecified

Tony Chung [:tchung]

Comment 11

•

15 years ago

flagging for tests when a fix lands

Flags: in-testsuite?

Shawn Wilsher :sdwilsh

Comment 12

•

15 years ago

(Note that this bug is now marked as blocking beta4+ even though it doesn't show up in the UI that way. It does show up in queries for beta 4 blockers.)

Daniel Holbert [:dholbert]

Reporter

Comment 13

•

15 years ago

(In reply to comment #10) > So while this may have been a result of the SQLite upgrade I'm marking this as a regression from that bug (bug 581000), for bookkeeping's sake (even though it sounds like the underlying issue is that Sync shouldn't have been using views in the first place). Also: the SQLite upgrade bug was just backed out for investigation of issues unrelated to this bug. If that backout lasts through the b3 tagging, then this bug won't affect the 4.0b3 release after all.

Blocks: SQLite3.7.0.1

Shawn Wilsher :sdwilsh

Comment 14

•

15 years ago

(In reply to comment #13) > Also: the SQLite upgrade bug was just backed out for investigation of issues > unrelated to this bug. If that backout lasts through the b3 tagging, then this > bug won't affect the 4.0b3 release after all. Assuming that SQLite was the cause of this problem. And it will not be relanding in time for b3.

Daniel Holbert [:dholbert]

Reporter

Comment 15

•

15 years ago

(In reply to comment #14) > Assuming that SQLite was the cause of this problem. FWIW, I just confirmed that it was indeed the cause, by testing an m-c build from just after the sqlite-backout vs. an m-c build with the backout stripped off. (The backout build was fine, and the sans-backout build was affected.)

Daniel Holbert [:dholbert]

Reporter

Updated

•

15 years ago

Summary: Firefox Sync full syncs are really slow since 201007240 m-c nightly → Firefox Sync full syncs are really slow after SQLite3.7.0 upgrade

Mike Connor [:mconnor]

Comment 16

•

15 years ago

(In reply to comment #10) > So while this may have been a result of the SQLite upgrade, queries on the view > have always been slow; so much so that places specifically forbids using them. > Since we are turning on the sync UI in b3 (plan at least), this should block > beta 4. Okay, so what's the recommended alternative to querying the view? It's not fast, but it's correct, data-wise, and doesn't cause hangs.

Marco Bonardo [:mak]

Comment 17

•

15 years ago

(In reply to comment #16) > Okay, so what's the recommended alternative to querying the view? It's not > fast, but it's correct, data-wise, and doesn't cause hangs. https://wiki.mozilla.org/Places/Places_SQL_queries_best_practices or wait for me to kill temp tables... I'm still fighting a couple oranges though.

Shawn Wilsher :sdwilsh

Comment 18

•

15 years ago

(In reply to comment #17) > or wait for me to kill temp tables... I'm still fighting a couple oranges > though. Except that that won't help them exactly because they still support Firefox 3.5 and Firefox 3.6.

Philipp von Weitershausen [:philikon]

Comment 19

•

15 years ago

(In reply to comment #17) > https://wiki.mozilla.org/Places/Places_SQL_queries_best_practices That says not to query the view tables, except that querying the actual tables caused bug 575423. Of course the fix doesn't have a test so there's no way to tell for sure whether it actually solves the problem at hand. :/ (In reply to comment #18) > Except that that won't help them exactly because they still support Firefox 3.5 > and Firefox 3.6. We could have conditional code so that we do the right thing on Firefox 3.5/3.6 and 4.

Marco Bonardo [:mak]

Comment 20

•

15 years ago

(In reply to comment #19) > (In reply to comment #17) > > https://wiki.mozilla.org/Places/Places_SQL_queries_best_practices > > That says not to query the view tables and 5 rows below it says "When using SELECT we must always be sure to query both tables"[...]"To query both tables we can use a special SQL construct"[...]Example.

Shawn Wilsher :sdwilsh

Comment 21

•

15 years ago

Hey look, the blocking flag cannot be changed. Is this fixed now that weave is not querying on the view?

Mike Connor [:mconnor]

Comment 22

•

15 years ago

Is 3.7.0 landed?

Shawn Wilsher :sdwilsh

Comment 23

•

15 years ago

3.7.0.1 landed.

Ed Lee :Mardak

Comment 24

•

15 years ago

see bug 581000 for 3.7.0.1

Mike Beltzner [:beltzner, not reading bugmail]

Updated

•

15 years ago

Assignee: nobody → mitcho

Mike Beltzner [:beltzner, not reading bugmail]

Comment 25

•

15 years ago

Oops, wrong tab! Reassigning to mconnor to resolve for b4; I think we backed out SQLite, though.

Assignee: mitcho → mconnor

Shawn Wilsher :sdwilsh

Comment 26

•

15 years ago

(In reply to comment #25) > Oops, wrong tab! Reassigning to mconnor to resolve for b4; I think we backed > out SQLite, though. And then landed 3.7.0.1 though. I believe that Sync not querying on the views (bug 583852) fixed this, but somebody should confirm that (comment 21)

Daniel Holbert [:dholbert]

Reporter

Comment 27

•

15 years ago

I'll try to confirm. brb.

Daniel Holbert [:dholbert]

Reporter

Comment 28

•

15 years ago

Nope, still broken on a full initial sync (using STR from comment 0). Hangs on the same step as before: >2010-08-16 14:49:55 Engine.History DEBUG First sync, uploading all items >2010-08-16 14:51:20 Engine.History INFO 5000 outgoing items pre-reconciliation That's almost 90-seconds of browser-hanging pain. I'm using Sync 1.4.4 with today's nightly: Mozilla/5.0 (X11; Linux x86_64; rv:2.0b4pre) Gecko/20100816 Minefield/4.0b4pre

Daniel Holbert [:dholbert]

Reporter

Comment 29

•

15 years ago

(In reply to comment #28) > I'm using Sync 1.4.4 with today's nightly FWIW, I tried reproducing with 1.4.4 & an old nightly shortly after this regressed (20100727), but my sync failed in that case (retried a few times), with errors like: 2010-08-16 14:48:37 Engine.Bookmarks DEBUG Total (ms): sync 4377, processIncoming 35, uploadOutgoing 1310, syncStartup 3025, syncFinish 1, createRecord 398, resetLastSync 0, resetClient 1 2010-08-16 14:48:37 RecordMgr DEBUG Failed to import record: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIHttpChannel.requestMethod] Stack trace: Res__request()@resource://services-sync/resource.js:205 < Res_get()@resource://services-sync/resource.js:293 < RecordMgr_import()@resource://services-sync/base_records/wbo.js:119 < RecordMgr_get()@resource://services-sync/base_records/wbo.js:142 < SyncEngine__syncStartup()@resource://services-sync/engines.js:352 < anonymous()@resource://services-sync/engines.js:196 < SyncEngine__sync()@resource://services-sync/engines.js:744 < anonymous()@resource://services-sync/engines.js:196 < WrappedNotify()@resource://services-sync/util.js:115 < Engine_sync()@resource://services-sync/engines.js:206 < WeaveSvc__syncEngine()@resource://services-sync/service.js:1455 < anonymous()@resource://services-sync/service.js:1404 < WrappedNotify()@resource://services-sync/util.js:115 < WrappedLock()@resource://services-sync/util.js:87 < WrappedCatch()@resource://services-sync/util.js:66 < sync()@resource://services-sync/service.js:1289 < anonymous()@sync.js:326 This is all with my "normal" browsing profile. I wonder if the error is due to a recent change in places that makes an older nightly barf? In any case, though, the hanging persists in today's nightly with the most recent "beta channel" version of weave, as noted in comment 28.

Shawn Wilsher :sdwilsh

Comment 30

•

15 years ago

I wonder if turning off automatic_index would help... http://www.sqlite.org/pragma.html#pragma_automatic_index (shooting in the dark here)

Daniel Holbert [:dholbert]

Reporter

Comment 31

•

15 years ago

sdwilsh/mconnor asked me to try "stock" mozilla-central, without the Sync addon installed, to see if that helps. It doesn't. :(

Mike Connor [:mconnor]

Comment 32

•

15 years ago

Okay, so it's not views, since everywhere has that fixed. This looks like a SQLite regression at this point.

Shawn Wilsher :sdwilsh

Comment 33

•

15 years ago

I guess our approach here for the beta is to back out the SQLite upgrade. Can I get some commitment from the Sync team to help me identify the cause of the regression *this week* so I can report it to the SQLite folks so it can hopefully get fixed in time for 3.7.1?

Mike Connor [:mconnor]

Comment 34

•

15 years ago

Whatever help you need from us you'll get.

Shawn Wilsher :sdwilsh

Comment 35

•

15 years ago

(and if this doesn't get backed out by 8 PM PDT, I guess I can do that)

D. Richard Hipp

Comment 36

•

15 years ago

SQLite ticket http://www.sqlite.org/src/info/13f033c865 describes a performance issue in the query planner that was introduced in version 3.7.0 but which has now been fixed. Could this be your problem? A pre-release snapshot of SQLite 3.7.1 is available at http://www.sqlite.org/draft/download.html and the performance regression of ticket [13f033c865] is fixed in that snapshot. Can yall do a test build using the snapshot and see if that clears your problem?

Daniel Holbert [:dholbert]

Reporter

Comment 37

•

15 years ago

sdwi

Daniel Holbert [:dholbert]

Reporter

Comment 38

•

15 years ago

(d'oh, accidentally posted early) sdwilsh: I'd be happy to test a tryserver / custom-patched build with the beta sqlite upgrade that Dr H. referenced above, if you can let me know what I need to do to upgrade.

Daniel Holbert [:dholbert]

Reporter

Comment 39

•

15 years ago

I'm making a local build with the sqlite code from comment 36 and a version-rev in configure.in (don't know if that's required), to match the patches in bug 581000...

Daniel Holbert [:dholbert]

Reporter

Comment 40

•

15 years ago

Yay - that upgrade fixes the issue, in my local build! So, IIUC, all SQLite versions 3.7.0* are affected by this, and 3.7.1 isn't release-ready yet -- should we back out the 3.7.0.1 upgrade and wait until 3.7.1 is ready?

D. Richard Hipp

Comment 41

•

15 years ago

FWIW, we are aiming for a 3.7.1 release early next week - Monday or Tuesday - 2010-08-23 or 2010-09-24.

Daniel Holbert [:dholbert]

Reporter

Updated

•

15 years ago

Assignee: mconnor → nobody

Component: Sync → Storage

Product: Weave → Toolkit

QA Contact: sync → storage

Version: unspecified → Trunk

Mike Beltzner [:beltzner, not reading bugmail]

Comment 42

•

15 years ago

sdwilsh: backout sqlite or go to beta with the slowness; I think we want to back it out, right?

Daniel Holbert [:dholbert]

Reporter

Comment 43

•

15 years ago

beltzner: sounds like it, based on sdwilsh's comment 33 & 34. Sounds like sdwilsh will be AFK until 8pm or later based on comment 34. In the interests of not clashing with other blocker-landings later tonight, I'll go ahead and back out bug 581000 right now.

Daniel Holbert [:dholbert]

Reporter

Comment 44

•

15 years ago

Backed out the sqlite upgrade. (see bug 581000 comment 31) We're back to version 3.6.23.1 for now. Should I resolve this as FIXED by the backout? (with the understanding that the next sqlite version to land will be 3.7.1-based?)

Mike Beltzner [:beltzner, not reading bugmail]

Comment 45

•

15 years ago

Yes, that seems appropriate.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Shawn Wilsher :sdwilsh

Comment 46

•

15 years ago

(In reply to comment #42) > sdwilsh: backout sqlite or go to beta with the slowness; I think we want to > back it out, right? Yes, backout is the right thing to do here. I couldn't get to it until about now, so thanks dholbert for doing this! dholbert - did your own trunk build without the test SQLite show the same issue? Just making sure we've covered all our bases here :)

Daniel Holbert [:dholbert]

Reporter

Comment 47

•

15 years ago

Yes, the trunk build from comment 40 hits this bug if I remove the 3.7.1 SQLite prerelease.

Shawn Wilsher :sdwilsh

Comment 48

•

15 years ago

Awesome! Thanks a ton!

BMO Automation

Updated

•

1 year ago

Product: Toolkit → Core