Open Bug 1209578 Opened 9 years ago Updated 2 years ago

Cache API should consider de-duping header names and values

Categories

(Core :: Storage: Cache API, task)

People

(Reporter: bkelly, Unassigned)

Details

Looking at caches.sqlite for trained-to-thrill I see this distribution of storage:

*** Page counts for all tables with their indices *****************************

RESPONSE_HEADERS.................................. 10 34.5%
SECURITY_INFO..................................... 6 20.7%
ENTRIES........................................... 4 13.8%
STORAGE........................................... 2 6.9%
CACHES............................................ 1 3.4%
REQUEST_HEADERS................................... 1 3.4%
SQLITE_MASTER..................................... 1 3.4%
SQLITE_SEQUENCE................................... 1 3.4%

This is a relatively small site, but already a third of the database is taken up by the response headers alone. Looking at the data, there is a lot of duplication in both header names and values. Some of the values are quite long. For example:

p3p|policyref="https://policies.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"|127

We should consider normalizing header names and values into separate tables, so that each distinct string is stored once and the header rows refer to it by id. This will add some cost to each insert and look-up, but would keep the database from growing so fast. Of course, we'll have to measure the end impact on performance.
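To make the proposal concrete, here is a minimal sketch of what de-duplicated header storage could look like, using Python's sqlite3 module. The table and column names (header_names, header_values, response_headers, entry_id) and the helper functions are hypothetical and chosen for illustration; they are not the actual Cache API schema. The sketch only shows the shape of the idea: each distinct name and value is stored once, and inserts and look-ups pay for an extra indexed lookup or join.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real caches.sqlite layout.
SCHEMA = """
CREATE TABLE header_names (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE header_values (
    id INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE
);
CREATE TABLE response_headers (
    entry_id INTEGER NOT NULL,
    name_id INTEGER NOT NULL REFERENCES header_names(id),
    value_id INTEGER NOT NULL REFERENCES header_values(id)
);
"""


def intern_text(conn, table, column, text):
    """Store text once in the given lookup table and return its row id."""
    # table/column are fixed identifiers from this sketch, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (text,))
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (text,)
    ).fetchone()
    return row[0]


def add_header(conn, entry_id, name, value):
    """Insert one header for an entry, reusing existing name/value rows."""
    name_id = intern_text(conn, "header_names", "name", name)
    value_id = intern_text(conn, "header_values", "value", value)
    conn.execute(
        "INSERT INTO response_headers (entry_id, name_id, value_id) VALUES (?, ?, ?)",
        (entry_id, name_id, value_id),
    )


def headers_for(conn, entry_id):
    """Read back the (name, value) pairs for an entry via two joins."""
    return conn.execute(
        "SELECT n.name, v.value FROM response_headers h "
        "JOIN header_names n ON n.id = h.name_id "
        "JOIN header_values v ON v.id = h.value_id "
        "WHERE h.entry_id = ? ORDER BY h.rowid",
        (entry_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Three entries that share the same long header value.
LONG_VALUE = 'policyref="https://policies.yahoo.com/w3c/p3p.xml"'
for entry_id in (1, 2, 3):
    add_header(conn, entry_id, "p3p", LONG_VALUE)
    add_header(conn, entry_id, "content-type", "text/html")
```

With this layout the long p3p value is stored in one row of header_values regardless of how many entries carry it, while response_headers holds only three integers per header. Whether the extra lookups on insert and the joins on read are acceptable is the performance question that would still need measuring.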
Component: DOM → DOM: Core & HTML
Component: DOM: Core & HTML → Storage: Cache API
No longer blocks: 1110136
Type: defect → task
Severity: normal → S3