Closed Bug 1173439 Opened 5 years ago Closed 5 years ago

Remove Cache API text column indices from sqlite schema

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 7 obsolete files)

14.84 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
11.42 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
3.22 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
9.84 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
13.71 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
One of the database optimizations we should do sooner rather than later is removing the text column indices for url and url_without_query.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Switch to storing URLs as UTF8 c-strings.  Making the URL strings more compact should help regardless of how we store them.  In addition, fetch already does this conversion, so we avoid reconverting them back to UTF16 this way.
I broke this bug up into a number of smaller refactors.

The end goal is to store the URL as separate URL-without-query and query columns.  Matching willing compare both column unless ignoreSearch is set, then just the URL-without-query will be examined.  I can then hash these two columns and index the hashes.

The first part is to switch to use UTF8 strings for the URLs as I discussed above.
Attachment #8622537 - Attachment is obsolete: true
Attachment #8622786 - Flags: review?(ehsan)
Next, parse the URL query out into a separate field and store it in the database.
Attachment #8622787 - Flags: review?(ehsan)
Next, use the new request_url_query column along with the existing request_url_no_query column to perform matching.  Deserialize Request objects using these fields as well.
Attachment #8622790 - Flags: review?(ehsan)
The CacheRequest.url() member and request_url column are no longer used.  Remove them.
Attachment #8622791 - Flags: review?(ehsan)
Now remove the old request_url_no_query index and replace it with a single index containing the following columns:

  cache_id
  request_url_no_query_hash
  request_url_query_hash

The hash columns are created using the first 8 bytes of the SHA-1 of the URL.

The resulting SQL query is:

    SELECT id, COUNT(response_headers.name) AS vary_count
    FROM entries
    LEFT OUTER JOIN response_headers ON entries.id=response_headers.entry_id
                                    AND response_headers.name='vary'
    WHERE entries.cache_id=:cache_id
      AND entries.request_url_no_query_hash=:url_no_query_hash
      AND entries.request_url_query_hash=:url_query_hash
      AND entries.request_url_no_query=:url_no_query
      AND entries.request_url_query=:url_query
    GROUP BY entries.id ORDER BY entries.id;

I used EXPLAIN QUERY PLAN to verify this will correctly use the new entries index and also the request_headers name index.

With all of these changes applied peak disk usage in the test_cache_shrink.html mochitest drops from 672k to 288k.

I will examine execution times in automation to see if there are any improvements or regressions.  I don't see any changes in the few tests I ran locally.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=eefb2c5059bf
Attachment #8622794 - Flags: review?(ehsan)
The ASAN tests on try showed that the previous patch had a problem with passing nsACString types to ThrowTypeError().  The var args there specifically want things inheriting from nsAString.  Update the patch to fix that.
Attachment #8622786 - Attachment is obsolete: true
Attachment #8622786 - Flags: review?(ehsan)
Attachment #8623041 - Flags: review?(ehsan)
The previous try run suggests that cache-match.https.html in the web-platform-tests slows from an average of 14.3 seconds to 17.8 seconds.  That's a ~25% slowdown on those slow windows vms.

Although the second try I posted has better numbers at 13.1 seconds.  I guess this test has a lot of variance in automation.

In any case, I'm going to see if using md5sum instead of sha-1 makes any noticeable difference:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=39ac6d818b16
Attachment #8623041 - Flags: review?(ehsan) → review+
Comment on attachment 8622787 [details] [diff] [review]
P2 Parse Response URL query as a separate field. r=ehsan

Review of attachment 8622787 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/cache/TypeUtils.h
@@ +100,5 @@
>    ToInternalHeaders(const nsTArray<HeadersEntry>& aHeadersEntryList,
>                      HeadersGuardEnum aGuard = HeadersGuardEnum::None);
>  
>    static void
>    ProcessURL(nsACString& aUrl, bool* aSchemeValidOut,

Nit: please document what this function should return, given an input of "http://example.com/path?foo=bar".
Attachment #8622787 - Flags: review?(ehsan) → review+
Attachment #8622790 - Flags: review?(ehsan) → review+
Attachment #8622791 - Flags: review?(ehsan) → review+
Comment on attachment 8622794 [details] [diff] [review]
P5 Cache should index on a hash instead of the url itself. r=ehsan

Review of attachment 8622794 [details] [diff] [review]:
-----------------------------------------------------------------

FWIW, there is also mozilla/SHA1.h which has a better C++ interface in case you want to use it in the future.
Attachment #8622794 - Flags: review?(ehsan) → review+
I took some data from the cache-match.https.html test running on windows in automation:

  https://docs.google.com/spreadsheets/d/19A3Cl9CliLlE3d3qFv7rmtrX3gdXv27ybKxksWfRHv0/edit?usp=sharing

It actually shows that md5 is slower than sha1 by 3% to 4% for some reason.

There does not appear to be any regression from the currently reviewed patches.  So I will go with them without further changes.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.