Closed Bug 1173439 Opened 10 years ago Closed 9 years ago

Remove Cache API text column indices from sqlite schema

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla41

Tracking Flags:

Tracking

Status

firefox41

---

fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 7 obsolete files)

P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 13.14 KB, patch		Details \| Diff \| Splinter Review
P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 13.77 KB, patch		Details \| Diff \| Splinter Review
P2 Parse Response URL query as a separate field. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 11.15 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P3 Use url query field for database matching and Request creation. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 3.37 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P4 Remove Request url field from Cache API database. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 10.02 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P5 Cache should index on a hash instead of the url itself. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 13.72 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 14.45 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 14.84 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
P2 Parse Response URL query as a separate field. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 11.42 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
P3 Use url query field for database matching and Request creation. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 3.22 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
P4 Remove Request url field from Cache API database. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 9.84 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
P5 Cache should index on a hash instead of the url itself. r=ehsan 9 years ago Ben Kelly [:bkelly, not reviewing] 13.71 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review

Ben Kelly [:bkelly, not reviewing]

Assignee

Description

•

10 years ago

One of the database optimizations we should do sooner rather than later is removing the text column indices for url and url_without_query.

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

10 years ago

Assignee: nobody → bkelly

Status: NEW → ASSIGNED

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 1

•

9 years ago

Attached patch P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan (obsolete) — Details — Splinter Review

Switch to storing URLs as UTF8 c-strings. Making the URL strings more compact should help regardless of how we store them. In addition, fetch already does this conversion, so we avoid reconverting them back to UTF16 this way.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 2

•

9 years ago

Attached patch P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan (obsolete) — Details — Splinter Review

I broke this bug up into a number of smaller refactors. The end goal is to store the URL as separate URL-without-query and query columns. Matching willing compare both column unless ignoreSearch is set, then just the URL-without-query will be examined. I can then hash these two columns and index the hashes. The first part is to switch to use UTF8 strings for the URLs as I discussed above.

Attachment #8622537 - Attachment is obsolete: true

Attachment #8622786 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 3

•

9 years ago

Attached patch P2 Parse Response URL query as a separate field. r=ehsan (obsolete) — Details — Splinter Review

Next, parse the URL query out into a separate field and store it in the database.

Attachment #8622787 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 4

•

9 years ago

Attached patch P3 Use url query field for database matching and Request creation. r=ehsan (obsolete) — Details — Splinter Review

Next, use the new request_url_query column along with the existing request_url_no_query column to perform matching. Deserialize Request objects using these fields as well.

Attachment #8622790 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 5

•

9 years ago

Attached patch P4 Remove Request url field from Cache API database. r=ehsan (obsolete) — Details — Splinter Review

The CacheRequest.url() member and request_url column are no longer used. Remove them.

Attachment #8622791 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 6

•

9 years ago

Attached patch P5 Cache should index on a hash instead of the url itself. r=ehsan (obsolete) — Details — Splinter Review

Now remove the old request_url_no_query index and replace it with a single index containing the following columns: cache_id request_url_no_query_hash request_url_query_hash The hash columns are created using the first 8 bytes of the SHA-1 of the URL. The resulting SQL query is: SELECT id, COUNT(response_headers.name) AS vary_count FROM entries LEFT OUTER JOIN response_headers ON entries.id=response_headers.entry_id AND response_headers.name='vary' WHERE entries.cache_id=:cache_id AND entries.request_url_no_query_hash=:url_no_query_hash AND entries.request_url_query_hash=:url_query_hash AND entries.request_url_no_query=:url_no_query AND entries.request_url_query=:url_query GROUP BY entries.id ORDER BY entries.id; I used EXPLAIN QUERY PLAN to verify this will correctly use the new entries index and also the request_headers name index. With all of these changes applied peak disk usage in the test_cache_shrink.html mochitest drops from 672k to 288k. I will examine execution times in automation to see if there are any improvements or regressions. I don't see any changes in the few tests I ran locally. https://treeherder.mozilla.org/#/jobs?repo=try&revision=eefb2c5059bf

Attachment #8622794 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 7

•

9 years ago

Attached patch P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan (obsolete) — Details — Splinter Review

The ASAN tests on try showed that the previous patch had a problem with passing nsACString types to ThrowTypeError(). The var args there specifically want things inheriting from nsAString. Update the patch to fix that.

Attachment #8622786 - Attachment is obsolete: true

Attachment #8622786 - Flags: review?(ehsan)

Attachment #8623041 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 8

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6b725070026

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 9

•

9 years ago

The previous try run suggests that cache-match.https.html in the web-platform-tests slows from an average of 14.3 seconds to 17.8 seconds. That's a ~25% slowdown on those slow windows vms. Although the second try I posted has better numbers at 13.1 seconds. I guess this test has a lot of variance in automation. In any case, I'm going to see if using md5sum instead of sha-1 makes any noticeable difference: https://treeherder.mozilla.org/#/jobs?repo=try&revision=39ac6d818b16

(no longer active)

Updated

•

9 years ago

Attachment #8623041 - Flags: review?(ehsan) → review+

(no longer active)

Comment 10

•

9 years ago

Comment on attachment 8622787 [details] [diff] [review] P2 Parse Response URL query as a separate field. r=ehsan Review of attachment 8622787 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/cache/TypeUtils.h @@ +100,5 @@ > ToInternalHeaders(const nsTArray<HeadersEntry>& aHeadersEntryList, > HeadersGuardEnum aGuard = HeadersGuardEnum::None); > > static void > ProcessURL(nsACString& aUrl, bool* aSchemeValidOut, Nit: please document what this function should return, given an input of "http://example.com/path?foo=bar".

Attachment #8622787 - Flags: review?(ehsan) → review+

(no longer active)

Updated

•

9 years ago

Attachment #8622790 - Flags: review?(ehsan) → review+

(no longer active)

Updated

•

9 years ago

Attachment #8622791 - Flags: review?(ehsan) → review+

(no longer active)

Comment 11

•

9 years ago

Comment on attachment 8622794 [details] [diff] [review] P5 Cache should index on a hash instead of the url itself. r=ehsan Review of attachment 8622794 [details] [diff] [review]: ----------------------------------------------------------------- FWIW, there is also mozilla/SHA1.h which has a better C++ interface in case you want to use it in the future.

Attachment #8622794 - Flags: review?(ehsan) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 12

•

9 years ago

I took some data from the cache-match.https.html test running on windows in automation: https://docs.google.com/spreadsheets/d/19A3Cl9CliLlE3d3qFv7rmtrX3gdXv27ybKxksWfRHv0/edit?usp=sharing It actually shows that md5 is slower than sha1 by 3% to 4% for some reason. There does not appear to be any regression from the currently reviewed patches. So I will go with them without further changes.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 13

•

9 years ago

Attached patch P1 Store URLs as UTF8 strings in Cache instead of UTF16. r=ehsan — Details — Splinter Review

Attachment #8623041 - Attachment is obsolete: true

Attachment #8623403 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 14

•

9 years ago

Attached patch P2 Parse Response URL query as a separate field. r=ehsan — Details — Splinter Review

Attachment #8622787 - Attachment is obsolete: true

Attachment #8623404 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 15

•

9 years ago

Attached patch P3 Use url query field for database matching and Request creation. r=ehsan — Details — Splinter Review

Attachment #8622790 - Attachment is obsolete: true

Attachment #8623405 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 16

•

9 years ago

Attached patch P4 Remove Request url field from Cache API database. r=ehsan — Details — Splinter Review

Attachment #8622791 - Attachment is obsolete: true

Attachment #8623406 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 17

•

9 years ago

Attached patch P5 Cache should index on a hash instead of the url itself. r=ehsan — Details — Splinter Review

Attachment #8622794 - Attachment is obsolete: true

Attachment #8623407 - Flags: review+

Pulsebot

Comment 18

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/3f1616172cd6 https://hg.mozilla.org/integration/mozilla-inbound/rev/818edd204d7b https://hg.mozilla.org/integration/mozilla-inbound/rev/619c2a4e7ce2 https://hg.mozilla.org/integration/mozilla-inbound/rev/12ccbd45dcb5 https://hg.mozilla.org/integration/mozilla-inbound/rev/eefb95916e40

Ryan VanderMeulen [:RyanVM]

Comment 19

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/3f1616172cd6 https://hg.mozilla.org/mozilla-central/rev/818edd204d7b https://hg.mozilla.org/mozilla-central/rev/619c2a4e7ce2 https://hg.mozilla.org/mozilla-central/rev/12ccbd45dcb5 https://hg.mozilla.org/mozilla-central/rev/eefb95916e40

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox41: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla41

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.