Form sync generates too many records

RESOLVED FIXED in 0.5

Status

()

RESOLVED FIXED
10 years ago
5 months ago

People

(Reporter: hello, Assigned: anant)

Tracking

unspecified
Points:
---
Bug Flags:
blocking-weave1.0 +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

10 years ago
This is what I get when I enable form sync:

2009-05-26 13:44:50	Engine.Forms         DEBUG	Preparing 2069 outgoing records

And I've seen logs in the forum from people who have many more items (~9K, ~14K), I don't think it's atypical.

I think our sync engine superclass is not designed for that many items.  We should find a solution (either use a different algorithm or figure out how to combine some of these records together), and probably disable form sync in the meantime (in the prefs UI).
(Reporter)

Updated

10 years ago
Flags: blocking-weave1.0+
Target Milestone: -- → 1.0
Component: Weave → General
Product: Mozilla Labs → Weave
QA Contact: weave → general
So, we did add code to do expiration of form history [1], but I don't know if that's enough to mitigate this.  We do have semi-fake frecency in form history, and last used dates, and we should be able to use those to limit further if we need to.

[1] http://blog.mozilla.com/dolske/2009/05/13/happy-form-history-expiration-day/
Component: General → Sync
QA Contact: general → sync
Target Milestone: 1.0 → 0.5

Comment 2

10 years ago
Right now we do..
    let stmnt = this._formDB.createStatement("SELECT * FROM moz_formhistory");

We can easily sort by the form usage count and limit it.
Yeah, we could be that simple too.  Eventually we can co-opt something like https://bugzilla.mozilla.org/show_bug.cgi?id=370117 but count is probably sufficient for our purposes.  Or count > 3 + last 200 saved?
(Assignee)

Comment 4

10 years ago
Per discussion with thunder, we're going to sync the last 200 items (one record per item, even if the field name is the same) and expire each record after a period of 90 days. Let's test and see if this improves performance and offers a reasonable user experience and make revisions as neccessary.
Assignee: nobody → anant
(Assignee)

Comment 5

10 years ago
Some performance boost can be gained is if we figure out a way to generate and maintain GUIDs for each form history item. Currently, we perform a SHA1 on the field+value which guarantees GUIDness but is slow.

Is maintaining a seperate table in moz_formhistory that maps GUIDs to keys worth it?

Comment 6

10 years ago
Why lastUsed instead of timesUsed?

SELECT COUNT(*) FROM (
  SELECT * FROM moz_formhistory ORDER BY lastUsed DESC LIMIT 200)
WHERE id NOT IN (
  SELECT id FROM moz_formhistory ORDER BY timesUsed DESC LIMIT 200);

184 items in the most used/recent aren't in the other most recent/used for me.
(Assignee)

Comment 7

10 years ago
Created attachment 387581 [details] [diff] [review]
Sync only the last 200 form items

Tweak the SQL query, and remove form items older than a month.
Attachment #387581 - Flags: review?
(Assignee)

Comment 8

10 years ago
(In reply to comment #6)
> Why lastUsed instead of timesUsed?

The query in the patch is:
SELECT * FROM moz_formhistory ORDER BY lastUsed DESC, timesUsed DESC LIMIT 200;

We could reverse the order of lastUsed and timesUsed -- but I think lastUsed should trump timesUsed unless timesUsed is relatively large (say >5?)
(Assignee)

Updated

10 years ago
Attachment #387581 - Flags: review? → review?(thunder)

Comment 9

10 years ago
(In reply to comment #8)
> ORDER BY lastUsed DESC, timesUsed DESC LIMIT 200;
The extra timesUsed is basically a no-op... unless you're able to submit multiple forms at the very same microsecond. ;)

Comment 10

10 years ago
How about...

SELECT * FROM moz_formhistory ORDER BY 1.0 * (lastUsed - (SELECT lastUsed FROM moz_formhistory ORDER BY lastUsed ASC LIMIT 1)) / ((SELECT lastUsed FROM moz_formhistory ORDER BY lastUsed DESC LIMIT 1) - (SELECT lastUsed FROM moz_formhistory ORDER BY lastUsed ASC LIMIT 1)) * timesUsed / (SELECT timesUsed FROM moz_formhistory ORDER BY timesUsed DESC LIMIT 1) DESC LIMIT 200;

;)

Which is more readable form..

SELECT * FROM moz_formhistory ORDER BY 1.0 * (lastUsed - minLast) / (maxLast - minLast) * timesUsed / minTimes DESC LIMIT 200

But if not that, why not select top 100 lastUsed union select top 100 timesUsed.
(Assignee)

Comment 11

10 years ago
I like both the alternate queries, and yes the timesUsed in my query was useless :-)

thunder?
(Assignee)

Comment 12

10 years ago
Created attachment 388118 [details] [diff] [review]
Sync only the last 200 (most relevant) form items

Patch with Ed's optimized query. Worked fine on my local tests.
Attachment #387581 - Attachment is obsolete: true
Attachment #388118 - Flags: review?(thunder)
Attachment #387581 - Flags: review?(thunder)
(Reporter)

Updated

10 years ago
Attachment #388118 - Flags: review?(thunder) → review+
(Reporter)

Comment 13

10 years ago
Comment on attachment 388118 [details] [diff] [review]
Sync only the last 200 (most relevant) form items

Looks good.  I didn't test the sql statement, I trust you that it works ;)

r=thunder
(Assignee)

Comment 14

10 years ago
http://hg.mozilla.org/labs/weave/rev/5e0dcfa417f2
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Blocks: 504196
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.