Closed Bug 1900837 Opened 1 month ago Closed 14 days ago

Speed up drop_suggestions

Categories

(Application Services :: Suggest, defect)

defect

Tracking

(firefox126 wontfix, firefox127 wontfix, firefox128 fixed, firefox129 fixed)

RESOLVED FIXED
129 Branch
Tracking Status
firefox126 --- wontfix
firefox127 --- wontfix
firefox128 --- fixed
firefox129 --- fixed

People

(Reporter: bdk, Assigned: bdk)

References

Details

(Whiteboard: [disco-])

Attachments

(3 files)

drop_suggestions is called for each record in the database, which I think is around 50 or so. In practice, this seems to be running slow and we should speed it up. This is probably causing the performance issues that the Android team has noticed and might be contributing to the iOS issues as well.

Whiteboard: [disco-]

Just wanted to add that I ran a local profile on my M3Max laptop - I intentionally re-ingested all records to see how our injest would handle the extreme case where every record is updated, and got this: https://share.firefox.dev/3Vu73E3

I had to CTRL-C the program because the re-ingestion did not terminate

Looks like more evidence the dropping is taking too long, but ya that's for the extreme case since in practice only small subsets of records should be re-ingested. I'm also attaching a screenshot that shows that the first DELETE, on the suggestions table is the one taking the longest. Oddly enough, that table does have an index on record_ids, so it's possible that we are simply doing too many deletions

Attached image profile.png

in practice only small subsets of records should be re-ingested

Each record contains a chunk of 200 suggestions. So even if only a small number of suggestions change, we may end up changing all the records.

The profile Kaya collected on his Android phone was also spending its time in SuggestDao::drop_suggestions, so the Android perf issue we were seeing is likely the same bug: https://share.firefox.dev/4bJm8aL

Summary: drop_suggestions should use an index → Speed up drop_suggestions

I've been checking this out a bit and I don't believe the issue is the lack of index on the suggestions table. It seems more related to the cascading deletes that happen in related tables -- especially full_keywords.

Weirdly though, it doesn't happen at all when I try to run the same query from the command line. One possibilities is there's a read on that table that's blocking the write, but I haven't found any evidence of this.

Updating the title/description based on this new data.

I believe the issue is this FK: https://github.com/mozilla/application-services/blob/71578bea70b5f9da0b9c35c17e8fe15e9cf2ebc9/components/suggest/src/schema.rs#L30, in combination with the suggestion_id FK on both tables. I think that means that when we delete suggestions, for each suggestion row SQLITE will do something like:

  • Delete the corresponding row from the full_keywords table.
  • Null out the keywords.full_keyword_id field.
  • Delete the corresponding row from the keywords table.

For some reason, that nulling out step really slows things down. With my benchmark suite, when I remove the FK linked at the top of this comment, I can successfully run the ingest-again benchmarks. It's slower than the initial ingestion by maybe 25%, but it works. When I keep the FK in, the benchmark runs for a very long time. I've always interrupted it before it could complete.

After more investigation, that FK was only a minor issue. The main issue was that we didn't have indexes on the child key columns, which made foreign key checks much slower. That said, I think that removing that ON DELETE SET NULL clause will speed up things slightly.

See Also: → 1900928
Assignee: nobody → bdeankawamura
See Also: → 1903819
Blocks: 1904437

This fix has now made it into mozilla-central via the application-services vendoring in bug 1880183.

Pushed by lbutler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4822c98103c2
Vendor the latest Application Services. r=bdk

https://hg.mozilla.org/mozilla-central/rev/4822c98103c2

Status: NEW → RESOLVED
Closed: 14 days ago
Depends on: 1880183
Resolution: --- → FIXED
Target Milestone: --- → 129 Branch
Blocks: 1903819
See Also: 1903819

Comment on attachment 9409609 [details] [review]
[mozilla/application-services] Bug 1900837 - Speed up re-ingestion (backport #6267) (#6284)

Beta/Release Uplift Approval Request

  • User impact if declined: If the remote settings collection is updated, then users may experience excessive CPU usage on Android and shutdown hangs on Desktop.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1903819
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): I've tested this on manually by forcing the client to use the dev remote settings server and forcing re-ingestion. It worked fine on my emulator (no crashes, ~20 seconds total time for re-ingestion)
  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9409609 - Flags: approval-mozilla-beta?

Comment on attachment 9409609 [details] [review]
[mozilla/application-services] Bug 1900837 - Speed up re-ingestion (backport #6267) (#6284)

Approved for AS 128.0.1 which will be bumped in Beta 128.0b9

Attachment #9409609 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: