Closed Bug 1594368 Opened 5 years ago Closed 23 days ago

Query visited status of multiple URIs in the parent process at the same time.

Categories

(Toolkit :: Places, task, P3)

task

Tracking

()

RESOLVED FIXED
134 Branch
Performance Impact low
Tracking Status
firefox134 --- fixed

People

(Reporter: emilio, Assigned: mak)

References

(Blocks 1 open bug)

Details

(Keywords: perf, perf:frontend, Whiteboard: [snt-scrubbed][places-performance])

Attachments

(1 file, 1 obsolete file)

This is a logical follow-up to bug 1593690 that could potentially reduce the amount of I/O considerably.

Keywords: perf
Priority: -- → P2
Whiteboard: [fxperf]

How expensive is this in general? You mention reducing IO - have we ever tried just maintaining a bloom filter of visited URIs to make this cheap for the common case of unvisited URIs?

See Also: → 1503481

(In reply to Doug Thayer [:dthayer] from comment #1)

How expensive is this in general? You mention reducing IO - have we ever tried just maintaining a bloom filter of visited URIs to make this cheap for the common case of unvisited URIs?

The problem with the bloom filter is that it makes it very easy to time whether a URL is probably-visited, which is a privacy issue like the ones I'm trying to eliminate.

Oh I see. It was a bit hard to follow the chain here because of lack of access. Trying to triage for [fxperf] though.

So, we are trying to level the time it takes to compute the visited status for visited vs. unvisited links, which means our best/average-case performance has to go down, but running the queries in aggregate rather than individually makes the whole process more efficient without leaking any information through query timings?

In any case, the query is done off main thread anyway, correct? So we're just affecting overall IO throughput?

(Marking as fxperf:p3 on the assumption that we're not doing major work for this on the main thread as is)

Whiteboard: [fxperf] → [fxperf:p3]

This hasn't been on the main thread for quite some time; it's only about efficiency (one query vs. tens).

Yeah, it also allows the notifications back to the content process to be batched, which allows smarter restyling and such.

Severity: normal → S3

Summarizing, we are currently querying each url separately, generating one query per url.
In this bug we would like to run a single query for multiple urls, and notify back in batches, instead of one notification at a time.
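To make the difference concrete, here is a minimal sketch of the batched approach using the raw SQLite C API. This is an illustration, not the actual Places/mozStorage code: moz_places / visit_count are used as a simplified stand-in for the real visited check, and GetVisitedUrls is a hypothetical helper. Note that building the IN list dynamically means the SQL text changes with the batch size, which is the statement-caching problem discussed later in this bug.

// Sketch only: one statement for the whole batch instead of one per URL.
#include <sqlite3.h>

#include <string>
#include <vector>

std::vector<std::string> GetVisitedUrls(sqlite3* db,
                                        const std::vector<std::string>& urls) {
  std::vector<std::string> visited;
  if (urls.empty()) {
    return visited;
  }

  // One placeholder per URL; the variable length is why this statement cannot
  // simply be prepared once and cached.
  std::string sql =
      "SELECT url FROM moz_places WHERE visit_count > 0 AND url IN (";
  for (size_t i = 0; i < urls.size(); ++i) {
    sql += (i == 0) ? "?" : ",?";
  }
  sql += ")";

  sqlite3_stmt* stmt = nullptr;
  if (sqlite3_prepare_v2(db, sql.c_str(), -1, &stmt, nullptr) != SQLITE_OK) {
    return visited;
  }
  for (size_t i = 0; i < urls.size(); ++i) {
    sqlite3_bind_text(stmt, static_cast<int>(i + 1), urls[i].c_str(), -1,
                      SQLITE_TRANSIENT);
  }
  while (sqlite3_step(stmt) == SQLITE_ROW) {
    visited.emplace_back(
        reinterpret_cast<const char*>(sqlite3_column_text(stmt, 0)));
  }
  sqlite3_finalize(stmt);
  return visited;
}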

Severity: S3 → N/A
Type: enhancement → task
Performance Impact: --- → ?
Priority: P2 → P3
Whiteboard: [fxperf:p3] → [fxperf:p3][snt-scrubbed][places-performance]

:mak, we've set the performance impact to low, and added a keyword. Let us know if you think they should be set differently.

Performance Impact: ? → low
Keywords: perf:frontend
Whiteboard: [fxperf:p3][snt-scrubbed][places-performance] → [snt-scrubbed][places-performance]
Blocks: 1854770
Blocks: 1909883
Summary: Query multiple URIs in the parent process at the same time. → Query visited status of multiple URIs in the parent process at the same time.

The problem is caching the visited statement, as the number of entries to query will be variable.
There are a few ways to solve that:

  1. Bind an array. For this we need to add SQLite support (carray.h); that is feasible, and it supports up to char* (no wide chars). There may be a maximum limit, in which case we'd have to chunk, which complicates things a bit. We can either ask SQLite to copy over the array, or manage the lifetime on our own (SQLite needs the array to stay alive until the next bind, reset or finalize). See the sketch after this list.
  2. Bind using JSON (JSON array => memory view). This is not super safe when the input comes from the outside, like URLs in this case, and it would require building JSON in C++.
  3. Store URLs in a memory table, resolve any pending URLs on idle, then DELETE all the resolved URLs, fetching their status through a RETURNING clause. Storing the URLs may again require binding a variable number of URLs, so it may just move the problem to the insertion statement rather than solve it.
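For option 1, here is a minimal sketch of what binding the URL array through the carray extension could look like. This is an illustration, not the actual patch: it assumes the carray extension (SQLite's carray.c/carray.h) is compiled in and registered on the connection, reuses the simplified stand-in schema from the sketch above, and GetVisitedUrlsCarray is a hypothetical helper name. Because the SQL text no longer depends on the batch size, the statement can be prepared once and cached.

// Sketch only. Assumes `stmt` was prepared once from something like:
//   SELECT url FROM moz_places WHERE visit_count > 0 AND url IN carray(?1)
#include <sqlite3.h>
#include "carray.h"  // sqlite3_carray_bind(), CARRAY_TEXT

#include <string>
#include <vector>

std::vector<std::string> GetVisitedUrlsCarray(sqlite3_stmt* stmt,
                                              std::vector<const char*>& urls) {
  std::vector<std::string> visited;
  // SQLITE_TRANSIENT asks SQLite to copy the array; alternatively pass
  // SQLITE_STATIC and keep `urls` alive until the next bind, reset or
  // finalize, as noted in option 1 above.
  if (sqlite3_carray_bind(stmt, 1, urls.data(), static_cast<int>(urls.size()),
                          CARRAY_TEXT, SQLITE_TRANSIENT) != SQLITE_OK) {
    return visited;
  }
  while (sqlite3_step(stmt) == SQLITE_ROW) {
    visited.emplace_back(
        reinterpret_cast<const char*>(sqlite3_column_text(stmt, 0)));
  }
  sqlite3_reset(stmt);  // keep the cached statement reusable
  return visited;
}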
Depends on: 483318
Assignee: nobody → mak
Status: NEW → ASSIGNED

Instead of executing one query per URI, we execute a single query for the
whole batch of URIs to be checked. The same query is also reused in the
single-URI case.
Note this still notifies link status back one by one; that will be
handled separately.

Attachment #9430352 - Attachment is obsolete: true
Pushed by mak77@bonardo.net: https://hg.mozilla.org/integration/autoland/rev/6107713f0780 Query visited status of multiple URIs at the same time. r=daisuke,places-reviewers
Backout by agoloman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fb0536e2d581 Backed out 3 changesets (bug 1594368, bug 483318) for causing multiple failures. CLOSED TREE

Backed out for causing multiple failures.

Flags: needinfo?(mak)
Pushed by mak77@bonardo.net: https://hg.mozilla.org/integration/autoland/rev/119db0349a21 Query visited status of multiple URIs at the same time. r=daisuke,places-reviewers
Status: ASSIGNED → RESOLVED
Closed: 23 days ago
Resolution: --- → FIXED
Target Milestone: --- → 134 Branch
Flags: needinfo?(mak)

FWIW, this bug or bug 483318 led to a tiny 0.1% increase in NSS section sizes (opt) on Windows Nightly.
This is most probably below the threshold for any alert, but reporting here just as an FYI :)

(In reply to Mayank Bansal from comment #16)

FWIW, this bug or bug 483318 led to a tiny 0.1% increase in NSS section sizes (opt) on Windows Nightly.

If this means nss lib increased a bit, that's expected as we're bundling some small additional code (the carray SQLite extension).

