Closed Bug 968593 Opened 6 years ago Closed 6 years ago

HTTP cache v2: Make load context key (a.k.a key prefix) safely parsable

Categories

(Core :: Networking: Cache, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

Attachments

(1 file, 3 obsolete files)

Since the partitioning of cache storages should be generic and we need the partitioning attribute values when rebuilding the index, only way is to parse the prefix stored in cache entry files as the cache keys.

Now it's not safe.  The design was originally not meant for parsing.



After discussion with Michal, this is based on his suggestion, I concur:

a1,b1,i3456,:http://foo.com/index.html

= anonymous, inbrowser, appid=3456, url=http://foo.com/index.html

:http://bar.com/

= non anon, not in browser, appid=0 (most common case from daily browsing)

each attribute has a letter as the name, with a default value which should be the most often used so that we can just bypass it from the string.  unknown shall be ignored.  ':' is the stop mark.  ',' shall be escaped as ',,'.

the index now keeps only appid,inbrowser,anon flags.  When we need a new attribute to quickly search the cache via index, we shall introduce a new version of the index and rebuild it from scratch.



Other more complicated option would be to have directories for some of the attributes, e.g. appid.  Then it would be super easy to delete all e.g. app's data even when we don't have index built up at the moment.

But I think it's hard to decide which attributes to use for individual directories, so I tend to leave the design (flat) as is now.
Forgot to mention: we need to somehow deal with the current string format.  It can be backward compatibility or change (actually introduce (!)) versioning of cache entry format.  That's why dep on bug 941698.
Attached patch v1 (obsolete) — Splinter Review
the included CPP test is a draft (passes, but is not compiled).  I know this is being refactored in the index patch, so I will coordinate with it later.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #8374603 - Flags: review?(michal.novotny)
Attached patch v1.1 (obsolete) — Splinter Review
Also gives the cache key + few more nits.
Attachment #8374603 - Attachment is obsolete: true
Attachment #8374603 - Flags: review?(michal.novotny)
Attachment #8375214 - Flags: review?(michal.novotny)
Comment on attachment 8375214 [details] [diff] [review]
v1.1

Review of attachment 8375214 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/TestCachePrefixKeyParser.cpp
@@ +33,5 @@
> +  info = MappingKeyToLoadContextInfo(NS_LITERAL_CSTRING("a,b,:xxx"));
> +  CHECK(info && !info->IsPrivate() && info->IsAnonymous() && info->IsInBrowserElement() && info->AppId() == 0);
> +  info = MappingKeyToLoadContextInfo(NS_LITERAL_CSTRING("a,b,i123,:xxx"));
> +  CHECK(info && !info->IsPrivate() && info->IsAnonymous() && info->IsInBrowserElement() && info->AppId() == 123);
> +  info = MappingKeyToLoadContextInfo(NS_LITERAL_CSTRING("a,i123,b,:xxx"));

I think we must specify an order of the values. Probably the best would be to order them by the first letter. Otherwise both a,b,i123,:xxx and a,i123,b,:xxx would be the same key but they would not point to the same entry, since the hash would be different.
Attachment #8375214 - Flags: review?(michal.novotny) → review-
(In reply to Michal Novotny (:michal) from comment #4)
> Comment on attachment 8375214 [details] [diff] [review]
> v1.1
> 
> Review of attachment 8375214 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/test/TestCachePrefixKeyParser.cpp
> @@ +33,5 @@
> > +  info = MappingKeyToLoadContextInfo(NS_LITERAL_CSTRING("a,b,:xxx"));
> > +  CHECK(info && !info->IsPrivate() && info->IsAnonymous() && info->IsInBrowserElement() && info->AppId() == 0);
> > +  info = MappingKeyToLoadContextInfo(NS_LITERAL_CSTRING("a,b,i123,:xxx"));
> > +  CHECK(info && !info->IsPrivate() && info->IsAnonymous() && info->IsInBrowserElement() && info->AppId() == 123);
> > +  info = MappingKeyToLoadContextInfo(NS_LITERAL_CSTRING("a,i123,b,:xxx"));
> 
> I think we must specify an order of the values. Probably the best would be
> to order them by the first letter. Otherwise both a,b,i123,:xxx and
> a,i123,b,:xxx would be the same key but they would not point to the same
> entry, since the hash would be different.

Hmm.. I don't think we would ever just swap attributes in LoadContextInfoMappingKey w/o any reason.  I'd personally would group them by logic.

Is this the only concern about the patch?
Attached patch v2 (obsolete) — Splinter Review
The test doesn't build, I think we should resolve that in a followup.
Attachment #8375214 - Attachment is obsolete: true
Attachment #8378629 - Flags: review?(michal.novotny)
Comment on attachment 8378629 [details] [diff] [review]
v2

Review of attachment 8378629 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Honza Bambas (:mayhemer) from comment #5)
> Hmm.. I don't think we would ever just swap attributes in
> LoadContextInfoMappingKey w/o any reason.  I'd personally would group them
> by logic.

I'm not against grouping them by logic. I just think that we should parse them only in the same order as we serialize them.

::: netwerk/cache2/CacheFileUtils.cpp
@@ +57,5 @@
> +
> +    // 'Read' the tag name and move to the next char
> +    switch (*caret++) {
> +    case ':':
> +      // optional terminator, followed by optional cache key

Why is ':' optional? There must always be the colon to separate key prefix and URL, since URL must never be an empty string. Let's make the colon an obligatory part of the key prefix, or treat it as a separate part and never pass it to the parser as a part of the key prefix.
Attachment #8378629 - Flags: review?(michal.novotny) → feedback+
(In reply to Michal Novotny (:michal) from comment #7)
> Comment on attachment 8378629 [details] [diff] [review]
> v2
> 
> Review of attachment 8378629 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> (In reply to Honza Bambas (:mayhemer) from comment #5)
> > Hmm.. I don't think we would ever just swap attributes in
> > LoadContextInfoMappingKey w/o any reason.  I'd personally would group them
> > by logic.
> 
> I'm not against grouping them by logic. I just think that we should parse
> them only in the same order as we serialize them.

I think I've caught your idea.  Alphabetical sorting is good.

> 
> ::: netwerk/cache2/CacheFileUtils.cpp
> @@ +57,5 @@
> > +
> > +    // 'Read' the tag name and move to the next char
> > +    switch (*caret++) {
> > +    case ':':
> > +      // optional terminator, followed by optional cache key
> 
> Why is ':' optional? There must always be the colon to separate key prefix
> and URL, since URL must never be an empty string. Let's make the colon an
> obligatory part of the key prefix, or treat it as a separate part and never
> pass it to the parser as a part of the key prefix.

Don't forget we may need to parse also only the key prefix itself, w/o the added ':' + cacheKey.

Maybe just the comments are wrong.  The thing is that when we find ':' there actually must be (here the comment is probably confusing) a cache key then, and we don't expect any ',' after it.  

I thought it would be better to keep the scope key short and less confusing by not adding ':' when we want to keep just the key prefix alone.  What you suggest is IMO a bit risky and might break something.

Idea: treat ':' as a special tag (among the current a b i p set) with following properties:
- is optional
- must be added as last
- must be followed by a 'value' that is treated as the cacheKey that is not ',,' escaped
- value ends at the end of the input string
- not followed (terminated) by ','

This would then actually need just comments update.  Makes sense?
Flags: needinfo?(michal.novotny)
(In reply to Honza Bambas (:mayhemer) from comment #8)
> (In reply to Michal Novotny (:michal) from comment #7)
> > Comment on attachment 8378629 [details] [diff] [review]
> > v2
> > 
> > Review of attachment 8378629 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > (In reply to Honza Bambas (:mayhemer) from comment #5)
> > > Hmm.. I don't think we would ever just swap attributes in
> > > LoadContextInfoMappingKey w/o any reason.  I'd personally would group them
> > > by logic.
> > 
> > I'm not against grouping them by logic. I just think that we should parse
> > them only in the same order as we serialize them.
> 
> I think I've caught your idea.  Alphabetical sorting is good.
> 
> > 
> > ::: netwerk/cache2/CacheFileUtils.cpp
> > @@ +57,5 @@
> > > +
> > > +    // 'Read' the tag name and move to the next char
> > > +    switch (*caret++) {
> > > +    case ':':
> > > +      // optional terminator, followed by optional cache key
> > 
> > Why is ':' optional? There must always be the colon to separate key prefix
> > and URL, since URL must never be an empty string. Let's make the colon an
> > obligatory part of the key prefix, or treat it as a separate part and never
> > pass it to the parser as a part of the key prefix.
> 
> Don't forget we may need to parse also only the key prefix itself, w/o the
> added ':' + cacheKey.
> 
> Maybe just the comments are wrong.  The thing is that when we find ':' there
> actually must be (here the comment is probably confusing) a cache key then,
> and we don't expect any ',' after it.  
> 
> I thought it would be better to keep the scope key short and less confusing
> by not adding ':' when we want to keep just the key prefix alone.  What you
> suggest is IMO a bit risky and might break something.
> 
> Idea: treat ':' as a special tag (among the current a b i p set) with
> following properties:
> - is optional
> - must be added as last
> - must be followed by a 'value' that is treated as the cacheKey that is not
> ',,' escaped
> - value ends at the end of the input string
> - not followed (terminated) by ','
> 
> This would then actually need just comments update.  Makes sense?

Makes sense, but changes in the code are needed for parsing in alphabetical order.
Flags: needinfo?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #9)
> Makes sense, but changes in the code are needed for parsing in alphabetical
> order.

What changes exactly?  Assertions?  I was somewhat thinking about it, but I don't think that enforcing it is wise.  If we load a broken file (can we get that far to parse the context key?) then it will be badly asserting.  But if you believe the assertion is a good think to have, I can assert it.
Note: fail parser on non-alphabetical order.
Note: Fix CacheFileContextEvictor::GetContextFile when landed after bug 977766.
Note: oh, I think the fact that CreateKeyPrefix appends is a bug and a bad thing to count on.  We should either rename it to AppendKeyPrefix or change the behavior.  I prefer the former.
Attached patch v3Splinter Review
- alphabetical order enforced
- '~idExtension,' tag added
- CacheFileUtils::ParseKey now returns idExtension and URI spec separately
- s/CreateKeyPrefix/AppendKeyPrefix/
- CacheFileIOManager::DoomFileInternal updated to use idextension + url properly
- CacheStorageService::CacheFileDoomed rewritten to enter/leave the service lock just once to prevent races

Tested with included CPP test code + manually tested the CacheFileDoomed with reduced cache capacity of 1MB
Attachment #8378629 - Attachment is obsolete: true
Attachment #8400897 - Flags: review?(michal.novotny)
Attachment #8400897 - Flags: review?(michal.novotny) → review+
https://hg.mozilla.org/mozilla-central/rev/6109876e1a22
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Blocks: 1003091
Depends on: 993649
No longer depends on: 993649
You need to log in before you can comment on or make changes to this bug.