Speed up drop_suggestions
Categories
(Application Services :: Suggest, defect)
Tracking
(firefox126 wontfix, firefox127 wontfix, firefox128 fixed, firefox129 fixed)
People
(Reporter: bdk, Assigned: bdk)
References
Details
(Whiteboard: [disco-])
Attachments
(3 files)
drop_suggestions is called for each record in the database, which I think is around 50 or so. In practice, this seems to be running slow and we should speed it up. This is probably causing the performance issues that the Android team has noticed and might be contributing to the iOS issues as well.
Assignee | ||
Updated•1 month ago
|
Updated•1 month ago
|
Comment 1•1 month ago
|
||
Just wanted to add that I ran a local profile on my M3Max laptop - I intentionally re-ingested all records to see how our injest would handle the extreme case where every record is updated, and got this: https://share.firefox.dev/3Vu73E3
I had to CTRL-C the program because the re-ingestion did not terminate
Looks like more evidence the dropping is taking too long, but ya that's for the extreme case since in practice only small subsets of records should be re-ingested. I'm also attaching a screenshot that shows that the first DELETE, on the suggestions
table is the one taking the longest. Oddly enough, that table does have an index on record_ids, so it's possible that we are simply doing too many deletions
Comment 2•1 month ago
|
||
Assignee | ||
Comment 3•1 month ago
|
||
in practice only small subsets of records should be re-ingested
Each record contains a chunk of 200 suggestions. So even if only a small number of suggestions change, we may end up changing all the records.
Comment 4•1 month ago
•
|
||
The profile Kaya collected on his Android phone was also spending its time in SuggestDao::drop_suggestions
, so the Android perf issue we were seeing is likely the same bug: https://share.firefox.dev/4bJm8aL
Assignee | ||
Updated•1 month ago
|
Assignee | ||
Comment 5•1 month ago
|
||
I've been checking this out a bit and I don't believe the issue is the lack of index on the suggestions
table. It seems more related to the cascading deletes that happen in related tables -- especially full_keywords
.
Weirdly though, it doesn't happen at all when I try to run the same query from the command line. One possibilities is there's a read on that table that's blocking the write, but I haven't found any evidence of this.
Updating the title/description based on this new data.
Assignee | ||
Comment 6•1 month ago
|
||
I believe the issue is this FK: https://github.com/mozilla/application-services/blob/71578bea70b5f9da0b9c35c17e8fe15e9cf2ebc9/components/suggest/src/schema.rs#L30, in combination with the suggestion_id
FK on both tables. I think that means that when we delete suggestions, for each suggestion row SQLITE will do something like:
- Delete the corresponding row from the
full_keywords
table. - Null out the
keywords.full_keyword_id
field. - Delete the corresponding row from the
keywords
table.
For some reason, that nulling out step really slows things down. With my benchmark suite, when I remove the FK linked at the top of this comment, I can successfully run the ingest-again benchmarks. It's slower than the initial ingestion by maybe 25%, but it works. When I keep the FK in, the benchmark runs for a very long time. I've always interrupted it before it could complete.
Assignee | ||
Comment 7•1 month ago
|
||
After more investigation, that FK was only a minor issue. The main issue was that we didn't have indexes on the child key columns, which made foreign key checks much slower. That said, I think that removing that ON DELETE SET NULL
clause will speed up things slightly.
Comment 8•1 month ago
|
||
Updated•1 month ago
|
Comment 9•14 days ago
|
||
This fix has now made it into mozilla-central via the application-services vendoring in bug 1880183.
Pushed by lbutler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4822c98103c2
Vendor the latest Application Services. r=bdk
Updated•13 days ago
|
Comment 10•12 days ago
|
||
Assignee | ||
Comment 11•12 days ago
|
||
Comment on attachment 9409609 [details] [review]
[mozilla/application-services] Bug 1900837 - Speed up re-ingestion (backport #6267) (#6284)
Beta/Release Uplift Approval Request
- User impact if declined: If the remote settings collection is updated, then users may experience excessive CPU usage on Android and shutdown hangs on Desktop.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: Bug 1903819
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): I've tested this on manually by forcing the client to use the dev remote settings server and forcing re-ingestion. It worked fine on my emulator (no crashes, ~20 seconds total time for re-ingestion)
- String changes made/needed:
- Is Android affected?: Yes
Comment 12•11 days ago
|
||
Comment on attachment 9409609 [details] [review]
[mozilla/application-services] Bug 1900837 - Speed up re-ingestion (backport #6267) (#6284)
Approved for AS 128.0.1 which will be bumped in Beta 128.0b9
Comment 13•11 days ago
|
||
Marking 128 as Fixed since this has been merged to beta
Desktop: https://hg.mozilla.org/releases/mozilla-beta/rev/3b26fb0a9b20d6a742f691201285fc6c0efef37f
Android: https://hg.mozilla.org/releases/mozilla-beta/rev/cd1a5821cb0f378dadc6658371375666a20c7f19
Description
•