Closed Bug 712914 Opened 12 years ago Closed 12 years ago

Disk cache does not verify that it correctly serialized/deserialized security info

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla14
Tracking Status
firefox14 --- fixed
firefox-esr10 - unaffected

People

(Reporter: briansmith, Assigned: u408661)

References

(Blocks 2 open bugs)

Details

(Keywords: addon-compat, Whiteboard: [sg:moderate][advisory-tracking+])

Attachments

(1 file, 19 obsolete files)

8.18 KB, patch
u408661
: review+
Details | Diff | Splinter Review
If the security info is corrupted for whatever reason, we will silently ignore the corruption. This means we may report the wrong security info and/or make same-origin/mixed-content decisions wrongly.

See
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsDiskCacheEntry.cpp?rev=ec7577dec4fc#90

As a consequence, we will probably have to change the disk cache version, blowing away the user's disk cache when they upgrade, when we resolve bug 697781. I am marking bug 697781 as depending on this one because we should fix this and any similar issues before doing anything that would cause us to need to bump the disk cache version again
In this code:

    // Store security info, if it is serializable
    nsCOMPtr<nsISerializable> serializable =
        do_QueryInterface(entry->SecurityInfo());
    if (serializable) {
        nsCString info;
        NS_SerializeToString(serializable, info);
        entry->SetMetaDataElement("security-info", info.get());
    }

We should know whether the security info should be serialized, not by whether it is serializable, but by the scheme of the URL of the item.
Summary: Disk cache does not verify that it correctly deserialized security info → Disk cache does not verify that it correctly serialized/deserialized security info
Sounds like it might be an sg:high bug if we knew how to corrupt security info at will, but I don't think we do so I'm bumping the security severity down a notch.
Whiteboard: [sg:moderate]
Assignee: nobody → hurley
(In reply to Brian Smith (:bsmith) from comment #0)
> As a consequence, we will probably have to change the disk cache version,
> blowing away the user's disk cache when they upgrade, when we resolve bug
> 697781.

We need to blow away the https://* in the the cache because there is a small but real chance that some of those entries have invalid security info in them. Changing the disk cache version number is the easiest way to do that, but it might not be the best. Another way of doing it would be to doom all entries that use the previous security-info format, as was suggested by :hurley in private email. If we do that, it is really important that we fix this bug before fixing bug 697781.
First run of a proposed solution running through try right now, will upload patch if all goes well

https://tbpl.mozilla.org/?tree=Try&rev=66a9de4d328d
*sigh* let's go with this link, instead https://tbpl.mozilla.org/?tree=Try&rev=9f80027ca47c
Attached patch proposed patch (obsolete) — Splinter Review
Here's a first run (try is green with it). It adds a new metadata entry for parity of the security-info and refuses to read any security-info without one. I'm not tied to using parity (given how simple it is), maybe doing a CRC or some other checksum of the security-info is a better idea? We're never going to get cryptographic-level integrity on the security-info, so I'm not even going to try for that.

The only thing we don't actively do is doom the entries, but that's a bigger issue with the cache so far as I can tell (if any entry fails to be properly read, it seems on my quick inspection that we just leave that space used until an eviction event occurs that reaps the unused entry).
Attachment #589887 - Flags: feedback?(bsmith)
Attachment #589887 - Flags: feedback?(bjarne)
Comment on attachment 589887 [details] [diff] [review]
proposed patch

Review of attachment 589887 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good in general.

I agree that parity might be a little simple. You may look at nsHttpChannel::Hash() which is used when comparing cookies and there is also the hash used to map URLs in the disk-cache map. Few CPU-cycles is a goal here, I guess, but some simple profiling may defend a proper hash.

You have a point about handling cache-entries not successfully read. I suggest you enter a new bug to discuss and handle this, which would then take care of dooming "old" https entries. Make sure to fix this new bug before bug #697781, as emphasized in comment #4.

::: netwerk/cache/nsDiskCacheMap.cpp
@@ +788,4 @@
>      nsCacheEntry * entry = binding->mCacheEntry;
>      if (!entry)  return nsnull;
>      
>      // Store security info, if it is serializable

Please consider proposal in comment #2 here.
Attachment #589887 - Flags: feedback?(bjarne)
Comment on attachment 589887 [details] [diff] [review]
proposed patch

Review of attachment 589887 [details] [diff] [review]:
-----------------------------------------------------------------

Oh - and please add a test.
Comment on attachment 589887 [details] [diff] [review]
proposed patch

I spoke with Nick about this IRL. I merely want us to:

1. Check the return value from the serialize/deserialize operations.

2. For TLS-protected cache entries (FTPS, SFTP, HTTPS), instead of silently ignoring a missing security-info metadata element, we should do NS_WARNING and then doom the entry. This requires the cache lookup operation to take a flag indicating whether or not we require a security-info metadata element to be present.
Attachment #589887 - Flags: feedback?(bsmith)
Attached patch patch v2 (obsolete) — Splinter Review
Here's an updated patch to use the scheme to determine whether or not we should be able to serialize the security info. If this approach seems good to you, Brian, I'll r? you for the netwerk/ stuff and others for the tests in their modules as appropriate.
Attachment #589887 - Attachment is obsolete: true
Attachment #600051 - Flags: feedback?(bsmith)
Comment on attachment 600051 [details] [diff] [review]
patch v2

Review of attachment 600051 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, this is exactly what I had in mind when filing the bug.

::: netwerk/cache/nsICacheSession.idl
@@ +72,5 @@
>       * descriptor has been given WRITE access but hasn't validated the entry yet.
>       */
>      nsICacheEntryDescriptor openCacheEntry(in ACString          key,
>                                             in nsCacheAccessMode accessRequested,
> +                                           in boolean           isSecure,

I recommend naming this parameter isSSL.

It would be better to pass symbolic names like "SECURITY_INFO_REQUIRED" and "SECURITY_INFO_NOT_REQUIRED" instead of "true" and "false" when calling this method, because symbol names make the callers easier to read.

Also, "isSecure" is a bad name, because there are many things that go into making something "secure" besides the TLS-related information. securityInfoRequired is a better name.

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +2102,5 @@
>      nsCOMPtr<nsICacheSession> session;
>  
> +    bool entryIsSecure = false;
> +    if (mURI) {
> +        mURI->SchemeIs("https", &entryIsSecure);

See bug 718002.

It seems dangerous to have this logic spread out across the various *Channel classes and across multiple methods in those classes. It would be better to implement a function that takes a scheme and returns "Is TLS," "Is NOT TLS," or "Unknown", as that is needed for other things as well.

::: netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
@@ +705,5 @@
>    if (!cacheSession) 
>      return NS_ERROR_FAILURE;
>  
>    if (aAccessMode == nsICache::ACCESS_WRITE)
> +    rv = cacheSession->OpenCacheEntry(aCacheKey, aAccessMode, false, false,

Often WYCIWYG channels are for HTTPS documents, so we shouldn't be hard-coding false here...

@@ +710,3 @@
>                                        getter_AddRefs(mCacheEntry));
>    else
> +    rv = cacheSession->AsyncOpenCacheEntry(aCacheKey, aAccessMode, false, this);

...or here.
Attachment #600051 - Flags: feedback?(bsmith) → feedback+
Attached patch patch (obsolete) — Splinter Review
Here's the latest patch, it's running through try now https://tbpl.mozilla.org/?tree=Try&rev=42e537f2aa38
I'll request review after (assuming) it goes well
Attachment #600051 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
OK, that last one had some issues. Here's one that looks good on try https://tbpl.mozilla.org/?tree=Try&rev=6f93d841f343

I'm not a big fan of how I named NS_SchemeIsTLS and associated enums, but my head's a bit hazy today, so I'm open to suggestions.
Attachment #602034 - Attachment is obsolete: true
Attachment #602241 - Flags: superreview?(bzbarsky)
Attachment #602241 - Flags: review?(bsmith)
So a question about the IDL changes.  Was there a reason to not put the new argument last and make it optional?  That would leave the APIs compatible with existing JS callers, right?  Seems like that would be way better for extension compat.
My reasoning was twofold: (1) that just felt like the right-ish place for that argument in the parameter list (not the greatest reason, I know), and (2) it feels a little wrong to leave a security-related parameter be optional. If we prefer to go for better extension compat, though, that's fine, it'll make the patch smaller :)
You're adding the default "0" value to a ton of places in your patch.

I think making it optional, defaulting to that behavior, would be perfectly reasonable.

Brian?
Comment on attachment 602241 [details] [diff] [review]
patch

I think we want to keep extension compat here.
Attachment #602241 - Flags: superreview?(bzbarsky) → superreview-
(In reply to Boris Zbarsky (:bz) from comment #15)
> So a question about the IDL changes.  Was there a reason to not put the new
> argument last and make it optional?  That would leave the APIs compatible
> with existing JS callers, right?

Boris, below I will argue that it is better to change the interface in an incompatible way. But, I am interested also in understanding why you think we should keep the interface compatible. Is it just because you think the break would be gratuitous? Or, is there some other reason.

AFAICT, existing JS callers need to change to indicate whether they are expecting the security-info to be there or not. We are going to change break this anyway with other changes waiting for review (bug 722033, bug 695399). Over the course of the next few quarters, we need flexibility in changing the API, so I would like to make almost the entire cache API private (at least warn addons not to use it) until it can become stable again (if ever).

If we need to keep addon compatibility for this change, then the default should be fail-safe: that is, the default should be to require the security-info to be present.

We can reduce the number of times the interface changes by changing (Async)OpenCacheEntry to take a bitfield flags argument. One flag of which would be for this bug (IS_NOT_TLS), and another flag would be for bug 722033 and bug 695399 (LOAD_BYPASS_LOCAL_CACHE_IF_BUSY).
> Is it just because you think the break would be gratuitous?

As far as I can tell from the patch, yes.  Maybe I'm missing context?

> AFAICT, existing JS callers need to change to indicate whether they are expecting the
> security-info to be there or not.

OK.  All the JS callers I saw in the patch at first skim don't expect it, so are passing 0 effectively, right?

> If we need to keep addon compatibility for this change, then the default should be
> fail-safe

That seems reasonable.

> We can reduce the number of times the interface changes by changing
> (Async)OpenCacheEntry to take a bitfield flags argument.

That seems like a good idea, yes.

Can we do it in a way that preserves JS source-level compat by just changing the access mode argument to be the low two bits of this field?  Seems like it would work...

For the rest, I believe this interface is pretty commonly used in extensions, so if we do decide to break it we should give the AMO folks some heads-up here.
(In reply to Boris Zbarsky (:bz) from comment #20)
> > Is it just because you think the break would be gratuitous?
> 
> As far as I can tell from the patch, yes.  Maybe I'm missing context?

For security reasons, the caller needs to determine whether the resource was loaded over a TLS-based connection, and pass that information to the cache. (The cache doesn't understand URLs or schemes.) We can make the change fail-safe, but that would break cache users that open HTTP (vs. HTTPS) entries, which I suspect are the majority of users.

> For the rest, I believe this interface is pretty commonly used in
> extensions, so if we do decide to break it we should give the AMO folks some
> heads-up here.

https://mxr.mozilla.org/addons/search?string=asyncOpenCacheEntry
https://mxr.mozilla.org/addons/search?string=openCacheEntry

Most of them are using the synchronous openCacheEntry method that blocks the main thread on disk I/O :(.
> but that would break cache users that open HTTP (vs. HTTPS) entries

Every single JS caller in the attached patch passes SECURITY_INFO_NOT_REQUIRED, right?  Are they just buggy?

> Most of them are using the synchronous openCacheEntry method

Yep.  If you give people an easy path...
(In reply to Boris Zbarsky (:bz) from comment #22)
> > but that would break cache users that open HTTP (vs. HTTPS) entries
> 
> Every single JS caller in the attached patch passes
> SECURITY_INFO_NOT_REQUIRED, right?  Are they just buggy?

Most of them are tests, and either we don't test storing HTTPS (or FTPS, etc.) entries in the cache or the modifications to the tests are buggy.

The JS changes under browser/* do seem buggy to me. I would expect them to work more like the changes to the channel implementations. Even if the immediate use of a cache entry doesn't require the security info, we must still pass SECURITY_INFO_REQUIRED if the scheme of the URL of the cached resource indicates TLS, in case future accesses to the entry require the security-info.

I just realized that I will need (Asnyc)OpenCacheEntry to also take the host and port, at least, to implement bug 660749 and bug 722034. Perhaps it would be better to do the following:

* Define AsyncOpenCacheEntryForOrigin that takes a (scheme, host, port) and some currently-unused flags in addition to the existing AsyncOpenCacheEntry arguments, checks if the scheme is one of { https, ftps } requiring security-info if so (and, eventually, verifying the certificate against host:port), or otherwise refusing to open the cache entry if the scheme isn't one of { http, ftp }. (i.e. This method would only support caching HTTPS, HTTP, FTP, and FTPS schemes.) (This would subsume the NS_IsSchemeTLS method, which IMO isn't general-purpose enough to be in NetUtils.)

* Define AsyncOpenCacheEntryForNonNetworkResource with the same signature as AsyncOpenCacheEntry + some currently-unused flags.

* Replace callers of AsyncOpenCacheEntry in Gecko and Firefox with AsyncOpenCacheEntryForOrigin.

* Mark AsyncOpenCacheEntry and OpenCacheEntry deprecated, have them warn when used in debug builds, post a follow-up bug for their removal, post a note about their imminent removal to dev.platform, and directly email the developers of the addons known to use these obsolete methods.

Thoughts?
OK, if the JS changes in the patch are just wrong your objections make more sense.  ;)

I'm not the best person to ask about global cache architecture changes, unfortunately....
As in, no useful thoughts, but worth checking with Honza and maybe biesi.
I'm fine with pretty much everything suggested so far. Turning the existing access argument into flags for compatibility sounds pretty good, with the caveat that doing fail-safe will break addons (which is what we're trying to avoid, here).

I can certainly buy some of the changes under browser/ being buggy (specifically the pageinfo.js changes, hence code review), but I can't buy the thumbnail storage changes being buggy (how does a thumbnail have security info?!), unless I'm misunderstanding what's going on in those files (entirely possible).

The cache interface changes Brian suggested in comment 23 may or may not be useful in the long run (I haven't looked at them in depth), but I don't want this to get bogged down in things that aren't necessary to get this bug fixed.
Attached patch proposed idl changes (obsolete) — Splinter Review
Brian, Boris:

Here's my proposal for changes to the cache IDLs to resolve (I think) all the issues with the first patch I put up for review. All the implementation is still the same underneath, but before I put in the effort to go through and change all the tests, etc, I wanted to run this by you guys to see how you feel about it (that way we can get the API mostly nailed down without longer back-and-forth cycles needed for a full patch).

This preserves the existing API, but marks it deprecated, and adds new APIs to do what we need to do here, as well as adding in flags for later usage to expand the API. I added one new synchronous API that's already marked deprecated, because we will only need it until bug 695399 is resolved (which is the bug to remove all synchronous OpenCacheEntry calls). I also marked that API noscript so we don't get addons using that API, making it less painful to remove in the future.
Attachment #603348 - Flags: feedback?(bzbarsky)
Attachment #603348 - Flags: feedback?(bsmith)
I'd really like Brian to OK things before reviewing the details...
Comment on attachment 603348 [details] [diff] [review]
proposed idl changes

Update the documentation for each method (new and old) to indicate that their use by addons is not supported.

I suggest making all the new methods [noscript]. This, combined with the above, will allow us to change the signatures as we need to, unless some popular addon just totally ignores the rules.

With this in mind, maybe we don't need to have separate key and uri parameters for the new functions? If we need to support non-URI keys then I suggest we have the methods take (scheme, host, port) parameters (i.e. an origin) instead of a whole URI, since we'll only be inspecting the origin part anyway.

nsCacheFlags should be defined in the idl it is used in.
Attachment #603348 - Flags: feedback?(bsmith) → feedback+
(In reply to Brian Smith (:bsmith) from comment #29)
> Comment on attachment 603348 [details] [diff] [review]
> proposed idl changes
> 
> Update the documentation for each method (new and old) to indicate that
> their use by addons is not supported.

Was already planning on it, this was just to see how people felt about the interface itself :)

> I suggest making all the new methods [noscript]. This, combined with the
> above, will allow us to change the signatures as we need to, unless some
> popular addon just totally ignores the rules.

We need to leave these scriptable, since they are, in fact, accessed from script in browser/ (and not just for tests). I personally find xpcshell unit tests much nicer and easier to write than C++ unit tests, anyway, so there's another point in favor of scriptability :)

> With this in mind, maybe we don't need to have separate key and uri
> parameters for the new functions? If we need to support non-URI keys then I
> suggest we have the methods take (scheme, host, port) parameters (i.e. an
> origin) instead of a whole URI, since we'll only be inspecting the origin
> part anyway.

The URI and key need to be kept separate. HTTP already makes use of the fact that the key is an opaque (to the cache) string by incorporating the post ID into the cache key (if applicable).

As far as taking just scheme/host/port, I can see where you're coming from, but (for this patch, at least) we're only inspecting the scheme, so if we're going to generalize, we may as well generalize all the way out to the URI. This also centralizes the extraction of the origin info from the URI into one place, so if we ever feel the need to add to what we're currently calling "origin", we're free to do that at will, without breaking even internal consumers of the API.

> nsCacheFlags should be defined in the idl it is used in.

These types (and their values, of which nsCacheFlags currently has none) are all currently defined in nsICache (for whatever reason), and it makes more sense to me to keep them all together, so there's one centralized location for knowing what values are valid. If we were starting from scratch, I'd agree with you, but since we've already got a canonical location, let's keep them there.
What is the status of this bug? I assume this isn't landing for 12. Am I right?
Keywords: addon-compat
(In reply to Jorge Villalobos [:jorgev] from comment #31)
> What is the status of this bug? I assume this isn't landing for 12. Am I
> right?

That's correct, this will not be in 12. New patch should be forthcoming later tonight or tomorrow morning (PDT) assuming try looks good.
Comment on attachment 603348 [details] [diff] [review]
proposed idl changes

>+++ b/netwerk/cache/nsICache.idl
>-[scriptable, uuid(ec1c0063-197d-44bb-84ba-7525d50fc937)]
>+[scriptable, uuid(81b5506c-c2ef-4992-a88b-250150aa35da)]

This change doesn't seem to be needed.

>+++ b/netwerk/cache/nsICacheSession.idl

Something should define what the flags in nsCacheFlags mean, right?

I'm not quite sure what "restricted to http/ftp (and TLS variant) schemes"
means.  What happens if you pass some other nsIURI in?

The rest looks fine.
Attachment #603348 - Flags: feedback?(bzbarsky) → feedback+
Attached patch patch (obsolete) — Splinter Review
Thanks for the feedback, guys. It's all looking good so far, with the exception of WYCIWYG channels. Right now, I'm not seeing any way to determine whether security info is necessary for a wyciwyg channel or not. When writing the cache entry, I could conceivably just say "does this have security info?" and if so, pass in a flag that says "force security info required" to AsyncOpenCacheEntryForNonURI, but when opening a wyciwyg channel for reading, I'm not seeing any way to tell. Both mURI and mOriginalURI in the channel seem to have wyciwyg:// as the scheme, so there's no way to figure out if it corresponds to an https resource, and AsyncOpenCacheEntryForURI is useless, given the scheme isn't one of the ones it handles. I'm not seeing any reference to the "original" channel being kept in the wyciwyg channel (mOwner *may* be it, but I have zero proof of that), so I can't get the URI from there, either.

Right now, the patch uses the NonURI interfaces, and "works" in that all the tests are green. However, I'm concerned that, given how we never really test anything with SSL anyway, this will cause issues in the real world.

I'm marking this iteration of the patch f? for both of you just to make sure it gets on your radars so we can get a solution figured out to this.

Any thoughts on how to proceed next?
Attachment #602241 - Attachment is obsolete: true
Attachment #606710 - Flags: feedback?(bzbarsky)
Attachment #606710 - Flags: feedback?(bsmith)
Attachment #602241 - Flags: review?(bsmith)
Comment on attachment 606710 [details] [diff] [review]
patch

Review of attachment 606710 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable except for the WYCIWYG bits. Please get a review from Michal if he has time.

::: netwerk/cache/nsCacheSession.cpp
@@ +118,5 @@
>      return rv;
>  }
>  
> +
> +static nsresult IsSecurityInfoRequired(nsIURI *uri, bool *result)

namespace {

// XXX: Ideally, we should be able to determine whether any
// support scheme is TLS-protected or not, instead of
// hard-coding these schemes.
nsresult
IsSecurityInfoRequired(nsIURI *uri, bool *result)
{
...
}

}

@@ +139,5 @@
> +
> +
> +NS_IMETHODIMP
> +nsCacheSession::OpenCacheEntryForURI(nsIURI *uri,
> +                                     const nsACString &key,

Key is always uri without the fragment. I think we should just define this method as taking an nsIURI without the key argument.

::: netwerk/cache/nsICacheSession.idl
@@ +126,5 @@
> +     * have a scheme of http, ftp, https, or ftps. Any other scheme
> +     * will result in NS_ERROR_NOT_AVAILABLE being returned.
> +     */
> +    void asyncOpenCacheEntryForURI(in nsIURI            uri,
> +                                   in ACString          key,

Remove the key argument.

::: netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
@@ +705,5 @@
>    if (!cacheSession) 
>      return NS_ERROR_FAILURE;
>  
>    if (aAccessMode == nsICache::ACCESS_WRITE)
> +    rv = cacheSession->OpenCacheEntryForNonURI(aCacheKey,

I also am unsure how to map a WYCIWYG URI to the original URI. Michal and Honza should know.
Attachment #606710 - Flags: feedback?(bsmith) → feedback+
If we don't have any tests that store HTTPS resources then we should add at least one. If there is some problem with creating the test, please let me know how I can help.
Oops! I made a mistake in my feedback:

Do NOT add or implement the synchronous openCacheEntryForURI method. It cannot be implemented correctly, because bug 660749 cannot be fixed (with reasonable effort) for a synchronous API.
Comment on attachment 606710 [details] [diff] [review]
patch

Cancelling feedback until 2 last bits are decided with Brian
Attachment #606710 - Flags: feedback?(bzbarsky)
Depends on: 722033
No longer depends on: 715752
Attached patch patch (obsolete) — Splinter Review
Here's what should be the final patch (modulo perhaps some minor changes, I hope). It still has the synchronous openCacheEntryForURI until nsHttpChannel is fixed (Michal is working on this) and wyciwyg channels, as well (I opened a bug for this a couple days ago, should be an easy fix). However, I don't want to block getting this landed on those things.
Attachment #606710 - Attachment is obsolete: true
Attachment #608347 - Flags: superreview?(bzbarsky)
Attachment #608347 - Flags: review?(bsmith)
And of course, I forgot to put the link to try in: https://tbpl.mozilla.org/?tree=Try&rev=74a67d597608
Comment on attachment 608347 [details] [diff] [review]
patch

Review of attachment 608347 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good to me. Please address the few small issues I noted and upload a new patch.

::: netwerk/cache/nsCacheDevice.h
@@ +63,5 @@
>      virtual nsresult Init() = 0;
>      virtual nsresult Shutdown() = 0;
>  
>      virtual const char *   GetDeviceID(void) = 0;
> +    virtual nsCacheEntry * FindEntry( nsCString * key, bool securityInfoRequired, bool *collision ) = 0;

Please keep line length to 80 chars for the lines you modify/add. Otherwise, splinter doesn't work well.

::: netwerk/cache/nsCacheEntry.h
@@ +223,5 @@
>      bool     RemoveRequest( nsCacheRequest * request);
>      bool     RemoveDescriptor( nsCacheEntryDescriptor * descriptor);
> +
> +    void SetSecurityInfoRequired() { mSecurityInfoRequired = true; }
> +    bool IsSecurityInfoRequired() { return mSecurityInfoRequired; }

GetSecurityInfo() should be made to fail if mSecurityInfoRequired but there is no security info.

::: netwerk/cache/nsCacheService.cpp
@@ +1567,5 @@
>      if (mMaxKeyLength < key->Length()) mMaxKeyLength = key->Length();
>  
>      // create request
> +    *request = new  nsCacheRequest(key, listener, accessRequested,
> +        securityInfoRequired, blockingMode, session);

Remove extra space in "new  " and wrap the second line at the parenthesis (like in the signature to this method).

@@ +1813,5 @@
>      if (!entry) {
>          // search cache devices for entry
>          bool collision = false;
> +        entry = SearchCacheDevices(request->mKey, request->StoragePolicy(),
> +            request->mSecurityInfoRequired, &collision);

wrap at perenthesis.

@@ +1891,5 @@
>  
>  
>  nsCacheEntry *
> +nsCacheService::SearchCacheDevices(nsCString * key, nsCacheStoragePolicy policy,
> +    bool securityInfoRequired, bool *collision)

wrap at parenthesis.

::: netwerk/cache/nsICacheSession.idl
@@ +95,5 @@
> +     * descriptor will be granted.  If 'blockingMode' is set to false, it will
> +     * return NS_ERROR_CACHE_WAIT_FOR_VALIDATION rather than block when another
> +     * descriptor has been given WRITE access but hasn't validated the entry yet.
> +     * The URI passed in must have a scheme of http, ftp, https, or ftps. Any other
> +     * schemes will result in a return of NS_ERROR_NOT_AVAILABLE.

I would replace "Any other schemes..." With "or the function will fail." (We don't need to promise specific error codes.)

@@ +112,5 @@
> +    /**
> +     * Asynchronous cache access. Does not block the calling thread.
> +     * Instead, the listener will be notified when the descriptor is
> +     * available. The uri passed in must have a scheme of http, ftp,
> +     * https, or ftps. Any other scheme will result in NS_ERROR_NOT_AVAILABLE

Some comment on the error case.

::: netwerk/cache/nsMemoryCacheDevice.h
@@ +63,5 @@
>  
>      virtual const char *    GetDeviceID(void);
>  
>      virtual nsresult        BindEntry( nsCacheEntry * entry );
> +    virtual nsCacheEntry *  FindEntry( nsCString * key, bool securityInfoRequired, bool *collision );

Line length.

::: netwerk/protocol/wyciwyg/nsWyciwygChannel.cpp
@@ +148,5 @@
>    mOriginalURI = uri;
> +  mResourceURI = nsnull;
> +
> +  // Get the URI used for the cache here, since nsIOService
> +  // expects to only create a URI on the main thread

This would be clearer if it said "Extract the original URI from the wyciwyg URI by stripping wyciwyg:<whatever else gets stripped>. We must do so here because nsIOService expects..."

By the way, what happens if we have wyciwyg://http://foo but foo is HSTS? In that case, we would have to rewrite http://foo to https://foo. But, also, we never should have created the wyciwyg:// URI in the first place.

@@ +151,5 @@
> +  // Get the URI used for the cache here, since nsIOService
> +  // expects to only create a URI on the main thread
> +  nsCAutoString path;
> +  rv = mURI->GetPath(path);
> +  if (NS_SUCCEEDED(rv)) {

Why not NS_ENSURE_SUCCESS(rv, rv);?

@@ +152,5 @@
> +  // expects to only create a URI on the main thread
> +  nsCAutoString path;
> +  rv = mURI->GetPath(path);
> +  if (NS_SUCCEEDED(rv)) {
> +    PRInt32 secondSlash = path.FindChar('/', 2);

What if there aren't two slashes?

@@ +154,5 @@
> +  rv = mURI->GetPath(path);
> +  if (NS_SUCCEEDED(rv)) {
> +    PRInt32 secondSlash = path.FindChar('/', 2);
> +
> +    rv = NS_NewURI(getter_AddRefs(mResourceURI), path.get() + secondSlash + 1);

Again, why not NS_ENSURE_SUCCESS(rv, rv);?
Attachment #608347 - Flags: review?(bsmith) → review-
Comment on attachment 608347 [details] [diff] [review]
patch

> It'd be best if we knew whether ANY
> * supported scheme is TLS-protected or not.

Add an nsIProtocolHandler flag for this?  Followup bug fine. 

sr=me
Attachment #608347 - Flags: superreview?(bzbarsky) → superreview+
You must apply "for-brian.patch" and then then "for-brian-fixed" before applying this patch.

This patch simplifies the API created by this patch and by Michal's patch in bug 722033.
Attached patch coalesced patch (obsolete) — Splinter Review
This coalesces all the patches into one (with some compile fixes & removal of unnecessary code from earlier versions). Carrying forward sr+ since IDLs are the same.

Michal, can you get to this soon, so we can get this in for Q1, or should I get someone else to review this?
Attachment #608347 - Attachment is obsolete: true
Attachment #609367 - Attachment is obsolete: true
Attachment #609368 - Attachment is obsolete: true
Attachment #609369 - Attachment is obsolete: true
Attachment #609395 - Flags: superreview+
Attachment #609395 - Flags: review?(michal.novotny)
Once again, I forgot the try link: https://tbpl.mozilla.org/?tree=Try&rev=2ae48dcbe615 (still running, will keep an eye on it until it completes)
For future reference, try looks nice & green.
Comment on attachment 609395 [details] [diff] [review]
coalesced patch

Review of attachment 609395 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsCacheSession.cpp
@@ +198,5 @@
> +                            nsCacheAccessMode accessRequested,
> +                            bool securityInfoRequired,
> +                            nsICacheListener *listener,
> +                            bool noWait)
> +{

You must have NS_ENSURE_ARG_POINTER(listener) here because the semantics of the patch for bug 739307 require it.

I did not review the rest of the patch.
Attachment #609395 - Flags: review-
Attached patch coalesced patch (obsolete) — Splinter Review
Adds NS_ENSURE_ARG_POINTER as brian requested.
Attachment #609395 - Attachment is obsolete: true
Attachment #609533 - Flags: superreview+
Attachment #609533 - Flags: review?(michal.novotny)
Attachment #609395 - Flags: review?(michal.novotny)
Thanks Nick! I just hopped over here to tell you that I should be doing it in my patch, instead of you doing it in yours. You can leave the check in your patch and I am sorry I wasted your time making you do my work here!
Comment on attachment 609533 [details] [diff] [review]
coalesced patch

>      // Store security info, if it is serializable
> -    nsCOMPtr<nsISerializable> serializable =
> -        do_QueryInterface(entry->SecurityInfo());
> -    if (serializable) {
> +    if (entry->IsSecurityInfoRequired()) {

The comment is wrong. The security info is stored when it is required and not when it is serializable.


>      // Restore security info, if present

The same as above.


> +    if (securityInfoRequired) {
> +        const char* info = entry->GetMetaDataElement("security-info");
> +        if (!info) {
> +            delete entry;
> +            return nsnull;
> +        }
> +
>          nsCOMPtr<nsISupports> infoObj;
> -        NS_DeserializeObject(nsDependentCString(info), getter_AddRefs(infoObj));
> +        rv = NS_DeserializeObject(nsDependentCString(info), getter_AddRefs(infoObj));
> +        if (NS_FAILED(rv)) {
> +            delete entry;
> +            return nsnull;
> +        }

Returning null won't delete the wrong entry. SearchCacheDevices() won't just see it, so a new duplicate one will be created.


>      /**
>       * Asynchronous cache access. Does not block the calling thread. Instead,
>       * the listener will be notified when the descriptor is available. If
>       * 'noWait' is set to true, the listener will be notified immediately with
>       * status NS_ERROR_CACHE_WAIT_FOR_VALIDATION rather than queuing the request
>       * when another descriptor has been given WRITE access but hasn't validated
>       * the entry yet.
>       */
>      void asyncOpenCacheEntry(in ACString           key,
>                               in nsCacheAccessMode  accessRequested,
> -                             in nsICacheListener   listener,
> -                             [optional] in boolean noWait);
> +                             in nsICacheListener   listener);

What is the reason for removing noWait argument here? BTW you did't remove it from the comment.


> +    /**
> +     * Asynchronous cache access. Does not block the calling thread.
> +     * Instead, the listener will be notified when the descriptor is
> +     * available. The uri passed in must have a scheme of http, ftp,
> +     * https, or ftps, otherwise this will fail.
> +     */
> +    void asyncOpenCacheEntryForURI(in nsIURI            uri,
> +                                   in ACString          key,
> +                                   in nsCacheAccessMode accessRequested,
> +                                   in nsICacheListener  listener,
> +                                   in boolean           noWait);

And you should probably add a description of noWait here.


> +  // Extract the original URI by stripping "wyciwyg://". We do this here
> +  // because nsIOService expects to only create a URI on the main thread.

The code is correct but the comment is wrong. In fact, you are stripping "wyciwyg://123/" from the URI.


These changes need definitely a test. Existing tests won't cover edge cases like opening entry with invalid/missing security info etc.

Otherwise it looks good, but I have probably a totally dump question. Why we must determine if the security info is required on the cache level? Why we just can't fail in nsDiskCacheMap::CreateDiskCacheEntry() when entry->SecurityInfo() is non-null and we can't serialize it? In nsDiskCacheEntry::CreateCacheEntry() we would fail (and doom the entry) when the security info is present but can't be deserialized. And if it's not present but it should be, shouldn't this case be handled on the protocol level (in nsHttpChannel, nsFtpChannel, etc..)? I.e. the cache API wouldn't need any change. I believe that this would be much more elegant and as well as more correct solution.
Attachment #609533 - Flags: review?(michal.novotny) → review-
(In reply to Michal Novotny (:michal) from comment #52)
> Comment on attachment 609533 [details] [diff] [review]
> coalesced patch
> 
> >      // Store security info, if it is serializable
> > -    nsCOMPtr<nsISerializable> serializable =
> > -        do_QueryInterface(entry->SecurityInfo());
> > -    if (serializable) {
> > +    if (entry->IsSecurityInfoRequired()) {
> 
> The comment is wrong. The security info is stored when it is required and
> not when it is serializable.
> 
> 
> >      // Restore security info, if present
> 
> The same as above.

Easy fixes, will do.

> > +    if (securityInfoRequired) {
> > +        const char* info = entry->GetMetaDataElement("security-info");
> > +        if (!info) {
> > +            delete entry;
> > +            return nsnull;
> > +        }
> > +
> >          nsCOMPtr<nsISupports> infoObj;
> > -        NS_DeserializeObject(nsDependentCString(info), getter_AddRefs(infoObj));
> > +        rv = NS_DeserializeObject(nsDependentCString(info), getter_AddRefs(infoObj));
> > +        if (NS_FAILED(rv)) {
> > +            delete entry;
> > +            return nsnull;
> > +        }
> 
> Returning null won't delete the wrong entry. SearchCacheDevices() won't just
> see it, so a new duplicate one will be created.

Ah, indeed. The original (bad) entry will need to be doomed.

> >      /**
> >       * Asynchronous cache access. Does not block the calling thread. Instead,
> >       * the listener will be notified when the descriptor is available. If
> >       * 'noWait' is set to true, the listener will be notified immediately with
> >       * status NS_ERROR_CACHE_WAIT_FOR_VALIDATION rather than queuing the request
> >       * when another descriptor has been given WRITE access but hasn't validated
> >       * the entry yet.
> >       */
> >      void asyncOpenCacheEntry(in ACString           key,
> >                               in nsCacheAccessMode  accessRequested,
> > -                             in nsICacheListener   listener,
> > -                             [optional] in boolean noWait);
> > +                             in nsICacheListener   listener);
> 
> What is the reason for removing noWait argument here? BTW you did't remove
> it from the comment.

I *think* it was removed because we never need to use it in this API any more, now that we've added the ForURI variant (Brian is responsible for this change, so he'll have to verify my thinking). I'll make sure to either fix the comment or re-add the parameter for the final patch, once Brian confirms (or not).

> > +    /**
> > +     * Asynchronous cache access. Does not block the calling thread.
> > +     * Instead, the listener will be notified when the descriptor is
> > +     * available. The uri passed in must have a scheme of http, ftp,
> > +     * https, or ftps, otherwise this will fail.
> > +     */
> > +    void asyncOpenCacheEntryForURI(in nsIURI            uri,
> > +                                   in ACString          key,
> > +                                   in nsCacheAccessMode accessRequested,
> > +                                   in nsICacheListener  listener,
> > +                                   in boolean           noWait);
> 
> And you should probably add a description of noWait here.

Will do.

> > +  // Extract the original URI by stripping "wyciwyg://". We do this here
> > +  // because nsIOService expects to only create a URI on the main thread.
> 
> The code is correct but the comment is wrong. In fact, you are stripping
> "wyciwyg://123/" from the URI.

That's easy to fix, but shows how little I know about both FindChar and wyciwyg URIs. Is the "123" literally in the URI, or is that just an example text, and the URI looks like "wyciwyg://<sometext>/<original uri>"?

> These changes need definitely a test. Existing tests won't cover edge cases
> like opening entry with invalid/missing security info etc.

Yes, I'm working with Brian on getting a setup that makes it easy to test these cases (we need an easy-to-create security info object, which he should be providing me a way to get).

> Otherwise it looks good, but I have probably a totally dump question. Why we
> must determine if the security info is required on the cache level? Why we
> just can't fail in nsDiskCacheMap::CreateDiskCacheEntry() when
> entry->SecurityInfo() is non-null and we can't serialize it? In
> nsDiskCacheEntry::CreateCacheEntry() we would fail (and doom the entry) when
> the security info is present but can't be deserialized. And if it's not
> present but it should be, shouldn't this case be handled on the protocol
> level (in nsHttpChannel, nsFtpChannel, etc..)? I.e. the cache API wouldn't
> need any change. I believe that this would be much more elegant and as well
> as more correct solution.

FWIW, I agree, it would be better to handle the decision about whether or not to use an entry in the channel itself (at least in theory), as it makes the cache very "dumb" in terms of not having to know at all what's being stuck in it. However, we already have broken the pure "dumb"-ness of the cache (it already has to know about security info and serializing it, etc), and doing that logic in each channel would be more invasive and quite probably more error-prone, since each channel's implementation could have its own unique bugs.
(In reply to Michal Novotny (:michal) from comment #52)
> Returning null won't delete the wrong entry. SearchCacheDevices() won't just
> see it, so a new duplicate one will be created.

You are saying that we should doom the entry in this case, right? I agree.

> What is the reason for removing noWait argument here? BTW you did't remove
> it from the comment.

> And you should probably add a description of noWait here.

Like Nick said, we only need noWait on the new API currently. We should move the comment. 

> Otherwise it looks good, but I have probably a totally dump question. Why we
> must determine if the security info is required on the cache level? Why we
> just can't fail in nsDiskCacheMap::CreateDiskCacheEntry() when
> entry->SecurityInfo() is non-null and we can't serialize it? In
> nsDiskCacheEntry::CreateCacheEntry() we would fail (and doom the entry) when
> the security info is present but can't be deserialized. And if it's not
> present but it should be, shouldn't this case be handled on the protocol
> level (in nsHttpChannel, nsFtpChannel, etc..)? I.e. the cache API wouldn't
> need any change. I believe that this would be much more elegant and as well
> as more correct solution.

That is what I originally wanted to do. Nick's patch ensures that every SSL-secured page that is in the cache has valid security-info. If we did the check "does this have security info? If so, deserialize it; if this fails, then fail" then we wouldn't handle the (admittedly extremely unusual) case where the security-info is missing when it should be there. The approach in the patch avoids us even needing to worry about if that is possible or not.
Also, the approach in this bug makes it much easier to fix bug 660749 in an efficient way. In particular, with Nick's patch, I am able to do have control flow like this:

    Cache Thread -> Cert Verification Thread -> Main Thread
 
instead of this:

    Cache Thread -> Main Thread -> Cert Verification Thread -> Main Thread

which helps to minimize the performance regression from doing the certificate verification.
(In reply to Nick Hurley [:hurley] from comment #53)
> > The code is correct but the comment is wrong. In fact, you are stripping
> > "wyciwyg://123/" from the URI.
> 
> That's easy to fix, but shows how little I know about both FindChar and
> wyciwyg URIs. Is the "123" literally in the URI, or is that just an example
> text, and the URI looks like "wyciwyg://<sometext>/<original uri>"?

It is just an example. "sometext" is a version of the document. E.g. first wyciwyg document for the http://foo.com/ is wyciwyg://0/http://foo.com/ and another document.write() for the same URI increases the version.
Attached patch coalesced patch (obsolete) — Splinter Review
Here's the patch with everything in Michal's comments BUT the test.

Brian, is there an ETA for your patch containing the separated TransportSecurityInfo? (In other words, will it be possible to get that AND this landed in Q1, or should we either not land this in Q1 or land this without a test?)

Carrying forward sr+, leaving without review until we determine what we want to do about the test.
Attachment #609533 - Attachment is obsolete: true
Attachment #609878 - Flags: superreview+
(In reply to Brian Smith (:bsmith) from comment #54)
> (In reply to Michal Novotny (:michal) from comment #52)
> > Returning null won't delete the wrong entry. SearchCacheDevices() won't just
> > see it, so a new duplicate one will be created.
> 
> You are saying that we should doom the entry in this case, right? I agree.

Right.


> That is what I originally wanted to do. Nick's patch ensures that every
> SSL-secured page that is in the cache has valid security-info. If we did the
> check "does this have security info? If so, deserialize it; if this fails,
> then fail" then we wouldn't handle the (admittedly extremely unusual) case
> where the security-info is missing when it should be there. The approach in
> the patch avoids us even needing to worry about if that is possible or not.

Hmm, but the more I'm thinking about the new API, the less I like it. I will just sum up what I don't like... One would assume from the name of asyncOpenCacheEntryForURI() method that the entry is identified by the URI but we take from URI just the information whether the security info should be present or not (and then we pass this bool argument to the tons of functions that just forward it to the lower level). We could of course remove the key and take it from the URI but this would not work for wyciwyg channel. Also IsSecurityInfoRequired() is something that really shouldn't be part of the cache code.


> +    /**
> +     * For use by nsWyciwygChannel only. This will be removed after
> +     * bugs 695399 and 737614 are resolved.
> +     */
> +    [noscript,deprecated] nsICacheEntryDescriptor
> +    openWriteCacheEntryForURI(in nsIURI uri, in ACString key);

I don't think we will remove the sync call from the wyciwyg channel, so it shouldn't be marked as deprecated. Instead of removing openCacheEntry() and keeping this method as noscript, I'd prefer to keep openCacheEntry(), but change it so that it fails when it is not called on the cache IO thread or maybe on any background thread.
(In reply to Nick Hurley [:hurley] from comment #53)
> > These changes need definitely a test. Existing tests won't cover edge cases
> > like opening entry with invalid/missing security info etc.
> 
> Yes, I'm working with Brian on getting a setup that makes it easy to test
> these cases (we need an easy-to-create security info object, which he should
> be providing me a way to get).

I assume that you need this to test successful parsing of security info, but you shouldn't need it to test missing or corrupted security info. Or am I missing something?
(In reply to Michal Novotny (:michal) from comment #58)
> Hmm, but the more I'm thinking about the new API, the less I like it. I will
> just sum up what I don't like... One would assume from the name of
> asyncOpenCacheEntryForURI() method that the entry is identified by the URI

Ignoring (for now) the rest of your concerns with the new API, would this portion change if the function were renamed? (I'm certainly not tied to the name, it's just the first thing I came up with.)

> > +    /**
> > +     * For use by nsWyciwygChannel only. This will be removed after
> > +     * bugs 695399 and 737614 are resolved.
> > +     */
> > +    [noscript,deprecated] nsICacheEntryDescriptor
> > +    openWriteCacheEntryForURI(in nsIURI uri, in ACString key);
> 
> I don't think we will remove the sync call from the wyciwyg channel, so it
> shouldn't be marked as deprecated. Instead of removing openCacheEntry() and
> keeping this method as noscript, I'd prefer to keep openCacheEntry(), but
> change it so that it fails when it is not called on the cache IO thread or
> maybe on any background thread.

I've already filed bug 739473 to remove openwritecacheentryforuri from the xpcom interface entirely, and just use it in wyciwyg without xpcom (trying to keep it as a follow-up so we can get this bug landed by the end of the quarter without too much churn). In terms of changing openCacheEntry, that might be OK eventually (though I would prefer to just remove it entirely), but for now we have consumers of that API in browser/ (bugs have also been filed to get those uses removed).

To answer your question in comment 59, I suppose it would be possible to test missing security info without Brian's changes, not sure how to test corrupted security info without either custom-creating the cache directory (is that even possible?) or having some security info class that intentionally corrupts itself when it serializes (which would be even more work).
(In reply to Nick Hurley [:hurley] from comment #60)
> To answer your question in comment 59, I suppose it would be possible to
> test missing security info without Brian's changes, not sure how to test
> corrupted security info

It should be clear from inspecting the code that missing security-info is treated the same as corrupted security info. So, writing a cache entry with NULL security info,  and then opening it, would be a good test.

Please test both the memory cache and disk cache.

(In reply to Michal Novotny (:michal) from comment #58)
> Hmm, but the more I'm thinking about the new API, the less I like it. I will
> just sum up what I don't like... One would assume from the name of
> asyncOpenCacheEntryForURI() method that the entry is identified by the URI
> but we take from URI just the information whether the security info should
> be present or not (and then we pass this bool argument to the tons of
> functions that just forward it to the lower level).

> Also IsSecurityInfoRequired() is something that really shouldn't be
> part of the cache code.

In the patch I am building on top of this one, we extract the hostname and port from the URI to pass to the cert verifier. We could pass (bool isTLS, in ACString hostname, long port) parameters instead of the URI if anyone cares strongly about the above. But, we are running out of time before our deadline so...let's decide one way or the other.

> I don't think we will remove the sync call from the wyciwyg channel, so it
> shouldn't be marked as deprecated. Instead of removing openCacheEntry() and
> keeping this method as noscript, I'd prefer to keep openCacheEntry(), but
> change it so that it fails when it is not called on the cache IO thread or
> maybe on any background thread.

Let's just remove [deprecated] then. If/when we have a more concrete plan for removing it in the next few weeks, we can deprecate and/or remove it before Firefox 14 release, which is more than 12 weeks away.
I hope this patch clarifies my previous comment.
Attachment #610042 - Flags: feedback?(michal.novotny)
Attachment #610042 - Flags: feedback?(hurley)
Comment on attachment 610042 [details] [diff] [review]
Show how host/port is used by patch for bug 660749

Review of attachment 610042 [details] [diff] [review]:
-----------------------------------------------------------------

This all seems sensible enough, to me. It kind of sucks that we have to keep shoving stuff into the cache that it shouldn't care about (in an ideal world), but if it keeps work off the main thread and keeps thread round trips low, that seems like an acceptable tradeoff, IMHO.
Attachment #610042 - Flags: feedback?(hurley) → feedback+
Attached patch coalesced patch (obsolete) — Splinter Review
Here's a patch that contains all the testing that can be done until Brian's changes land. I'll post the try URL once I have it, but I'll note that the previous try run (https://tbpl.mozilla.org/?tree=Try&rev=843f0a8a0c27) is fine, and the new xpcshell test runs fine on my machine. The only reasons I'm doing a new full try run are (1) to make sure the xpcshell test runs fine on other platforms, and (2) to make sure the deleting of disk records in FindEntry (the only native code change in this version) doesn't accidentally break anything. I'm confident those 2 conditions will be met, but I'm also paranoid :)

The only thing left that we have to decide on is what to do about the ForURI async API. My personal inclination is to say leave it (especially given that Brian has work that builds on it), but I'm open to good arguments otherwise, and also to renaming to make it less confusing (if anyone has any suggestions, I'll also mull that over during lunch.)
Attachment #609878 - Attachment is obsolete: true
Attachment #610244 - Flags: superreview+
Attachment #610244 - Flags: review?(michal.novotny)
(In reply to Brian Smith (:bsmith) from comment #61)
> In the patch I am building on top of this one, we extract the hostname and
> port from the URI to pass to the cert verifier.

Where exactly is the certification verified? I can't find it in the patch #610042. The only effect of this 50k patch is that we set host and port of the security info in nsDiskCacheEntry::CreateCacheEntry(), right? Why can't we change security info so that it includes host and port in its serialized form?
(In reply to Michal Novotny (:michal) from comment #66)
> Where exactly is the certification verified? I can't find it in the patch
> #610042. The only effect of this 50k patch is that we set host and port of
> the security info in nsDiskCacheEntry::CreateCacheEntry(), right?

> Why can't we change security info so that it includes host and port in its 
> serialized form?

1. This would be a change in the cache disk format.
2. This would make the metadata larger, which would be (slightly) worse for performance.
3. This patch avoids #1 and #2, so it is a less risky change.
4. When we have the opportunity to make a breaking change to the cache disk format we will completely redo the serialization of the security info to be significantly more compact. This requires this bug to be fixed first.

The certificate verification will happen from nsCacheService::NotifyListener. Here is a portion of that patch:

nsresult
 nsCacheService::NotifyListener(nsCacheRequest *          request,
-                               nsICacheEntryDescriptor * descriptor,
+                               nsCacheEntryDescriptor *  descriptor,
                                nsCacheAccessMode         accessGranted,
                                nsresult                  status)
 {
     NS_ASSERTION(request->mThread, "no thread set in async request!");
 
     // Swap ownership, and release listener on target thread...
     nsICacheListener *listener = request->mListener;
     request->mListener = nsnull;
 
+    if (descriptor && NS_SUCCEEDED(status) && request->mIsTLS) {
+        // Cannot use descriptor->GetSecurityInfo() because we are holding the
+        // cache service lock, and GetSecurityInfo() would attempt to acquire it.
+
+        // If we do NOT have an infoObject, then we must have just created an
+        // empty cache entry, so we have nothing to validate yet. The caller is
+        // expected to set the security info before closing the cache entry to
+        // security info that was already validated.
+        if (descriptor->CacheEntry()->SecurityInfo()) {
+            status = descriptor->CacheEntry()->SecurityInfo()
+                               ->VerifySecurityInfoFromCacheLookup(
+                                          listener, request->mThread,
+                                          descriptor, accessGranted);
+            if (NS_SUCCEEDED(status)) {
+                return status;
+            }
+        }
+    }
+
+    // Do not pass a descriptor to the listener when there was an error. In
+    // particular, we avoided validating the security-info (if any) because of
+    // the error.
+    if (NS_FAILED(status)) {
+        descriptor = nsnull;
+    }
+
     nsCOMPtr<nsIRunnable> ev =
             new nsCacheListenerEvent(listener, descriptor,
                                      accessGranted, status);
 
     return request->mThread->Dispatch(ev, NS_DISPATCH_NORMAL);
 }

VerifySecurityInfoFromCacheLookup() sends a request to the cert verification thread to verify the certificate. When certificate verification is complete, the certificate verification thread will call mRequest->listener->OnCacheEntryAvailable on request->mThread.
(In reply to Brian Smith (:bsmith) from comment #67)
> (In reply to Michal Novotny (:michal) from comment #66)
> > Where exactly is the certification verified? I can't find it in the patch
> > #610042. The only effect of this 50k patch is that we set host and port of
> > the security info in nsDiskCacheEntry::CreateCacheEntry(), right?

Right. This is a prerequisite for the cert verification patch.
Comment on attachment 610244 [details] [diff] [review]
coalesced patch

> -    nsCacheEntry * entry = diskEntry->CreateCacheEntry(this);
> +    bool doomEntry = false;
> +    nsCacheEntry * entry = diskEntry->CreateCacheEntry(this,
> +                                                       securityInfoRequired,
> +                                                       &doomEntry);
> +    if (doomEntry) {
> +        // Make sure this can never be found by FindRecord again
> +#ifdef DEBUG
> +        nsresult rv =
> +#endif
> +            mCacheMap.DeleteRecord(&record);
> +        NS_ASSERTION(NS_SUCCEEDED(rv),"DeleteRecord failed.");
> +        return nsnull;
> +    }


You need to delete storage too, i.e. add mCacheMap.DeleteStorage(&record). Also you don't need to add doomEntry parameter. Simple delete the entry when nsDiskCacheEntry::CreateCacheEntry() returns null. This will delete the entry also in case when metadata can't be unflatten, but that's actually what we want to do.


> +function write_no_secinfo(entry, accessGranted, status) {
> +  write_data(entry);
> +
> +  try {
> +    entry.close();
> +  } catch (e) {
> +    do_throw("Unexpected exception closing bare entry: " + e);
> +  }
> +
> +  test_missing_secinfo();
> +}

You can write invalid security info simply by doing entry.setMetaDataElement("security-info", "blablabla");

Ideally the test should also check that the invalid entry was removed (enumerate all entries in the cache), and that all space was freed (check cache size).
(In reply to Michal Novotny (:michal) from comment #69)
> Ideally the test should also check that the invalid entry was removed
> (enumerate all entries in the cache), and that all space was freed (check
> cache size).

I agree that it makes sense to check that the entry was removed, but I think that the test that the disk space is freed should be covered by other tests (not to be done in this bug) that verify that space is freed when a cache entry is removed.
(In reply to Brian Smith (:bsmith) from comment #70)
> I agree that it makes sense to check that the entry was removed, but I think
> that the test that the disk space is freed should be covered by other tests
> (not to be done in this bug) that verify that space is freed when a cache
> entry is removed.

Such test would point out the missing DeleteStorage(), but I'm fine with omitting this test.
(In reply to Michal Novotny (:michal) from comment #71)
> (In reply to Brian Smith (:bsmith) from comment #70)
> > I agree that it makes sense to check that the entry was removed, but I think
> > that the test that the disk space is freed should be covered by other tests
> > (not to be done in this bug) that verify that space is freed when a cache
> > entry is removed.
> 
> Such test would point out the missing DeleteStorage(), but I'm fine with
> omitting this test.

Gotcha. I forgot that the existing tests don't try to test anything with security-info.
(In reply to Brian Smith (:bsmith) from comment #67)
> +    if (descriptor && NS_SUCCEEDED(status) && request->mIsTLS) {
> +        // Cannot use descriptor->GetSecurityInfo() because we are holding
> the
> +        // cache service lock, and GetSecurityInfo() would attempt to
> acquire it.
> +
> +        // If we do NOT have an infoObject, then we must have just created
> an
> +        // empty cache entry, so we have nothing to validate yet. The
> caller is
> +        // expected to set the security info before closing the cache entry
> to
> +        // security info that was already validated.
> +        if (descriptor->CacheEntry()->SecurityInfo()) {
> +            status = descriptor->CacheEntry()->SecurityInfo()
> +                               ->VerifySecurityInfoFromCacheLookup(
> +                                          listener, request->mThread,
> +                                          descriptor, accessGranted);
> +            if (NS_SUCCEEDED(status)) {
> +                return status;
> +            }
> +        }
> +    }


This whole concept is IMO wrong. We should probably create a new interface nsISecureCacheSession that would be a wrapper for nsICacheSession. This new interface would implement the asyncOpenCacheEntryForURI() method, it would call nsICacheSession::asyncOpenCacheEntry() and listen for the result. In the listener it would do all the security checks (if the secinfo is needed, if it is valid, otherwise doom the entry at this level). Then it would call the original listener. And this code could (or even should) live outside the netwerk/cache directory.

BTW I think that the current patch won't handle following scenario correctly:

1) Open the cache entry using asyncOpenCacheEntry()
2) It is inactive so it is read from the disk without the security info (because of asyncOpenCacheEntry())
3) Open the cache entry using asyncOpenCacheEntryForURI()
4) The request is queued since the entry is in use.
5) The request is processed when the previous descriptor is closed, but since the cache entry is active, the security info isn't deserialized.
6) Now you got a problem in nsCacheService::NotifyListener().
Comment on attachment 610244 [details] [diff] [review]
coalesced patch

I'm clearing the review request for now since I'm really not sure that this is the right way to go.
Attachment #610244 - Flags: review?(michal.novotny)
Attached patch coalesced patch (obsolete) — Splinter Review
Here's the patch with the more robust testing requested by Michal in comment 74.
Attachment #610244 - Attachment is obsolete: true
Attachment #610305 - Flags: superreview+
Attachment #610305 - Flags: review?(michal.novotny)
Go me, not seeing these comments before I upload the most recent patch (you can ignore the r? since it seems to be irrelevant at this point)

(In reply to Michal Novotny (:michal) from comment #73)
> This whole concept is IMO wrong. We should probably create a new interface
> nsISecureCacheSession that would be a wrapper for nsICacheSession. This new
> interface would implement the asyncOpenCacheEntryForURI() method, it would
> call nsICacheSession::asyncOpenCacheEntry() and listen for the result. In
> the listener it would do all the security checks (if the secinfo is needed,
> if it is valid, otherwise doom the entry at this level). Then it would call
> the original listener. And this code could (or even should) live outside the
> netwerk/cache directory.

As much as I hate to admit it (given how much work it's going to create for me + reviewers over the next 2 days), I think you're right, especially in light of the scenario you outline below. Even though I can't imagine that scenario playing out often (if at all) in the current state of things, but it's better to protect against it (in case someone does the Wrong Thing at some point later down the line).

> BTW I think that the current patch won't handle following scenario correctly:
> 
> 1) Open the cache entry using asyncOpenCacheEntry()
> 2) It is inactive so it is read from the disk without the security info
> (because of asyncOpenCacheEntry())
> 3) Open the cache entry using asyncOpenCacheEntryForURI()
> 4) The request is queued since the entry is in use.
> 5) The request is processed when the previous descriptor is closed, but
> since the cache entry is active, the security info isn't deserialized.
> 6) Now you got a problem in nsCacheService::NotifyListener().
Attachment #610305 - Flags: superreview+
Attachment #610305 - Flags: review?(michal.novotny)
Attached patch patch (simple version) (obsolete) — Splinter Review
OK, here's the simple version that we agreed on via email. I added in the cleanup of the on-disk entry if we fail to deserialise properly. That's easy to strip out if we want to make that a follow-up instead (I'm obviously of the opinion that cleaning up is The Right Thing to do). I'll add the try URL once it appears in my inbox. The only thing I can see that *might* go wrong on the try run is breaking android xpcshell tests (since disk cache is disabled there, and the test insists on caching to disk). If that happens, we can just skip-if this test on android until we enable the disk cache there.
Attachment #610305 - Attachment is obsolete: true
Attachment #610413 - Flags: review?(michal.novotny)
Here's that try link I promised: https://tbpl.mozilla.org/?tree=Try&rev=a2393f8c7775
Comment on attachment 610413 [details] [diff] [review]
patch (simple version)

>      nsCacheEntry * entry = diskEntry->CreateCacheEntry(this);
> -    if (!entry)  return nsnull;
> +    if (!entry) goto cleanup;
>      
>      binding = mBindery.CreateBinding(entry, &record);
>      if (!binding) {
>          delete entry;
> -        return nsnull;
> +        goto cleanup;
>      }
>      
>      return entry;
> +
> +cleanup:
> +    (void) mCacheMap.DeleteStorage(&record);
> +    (void) mCacheMap.DeleteRecord(&record);
> +    return nsnull;
>  }

I don't like goto too much. Although it is sometimes reasonable to use goto, I don't think this is the case. Could you please change it to something like this?

  nsCacheEntry * entry = diskEntry->CreateCacheEntry(this);
  if (entry) {
    binding = mBindery.CreateBinding(entry, &record);
    if (!binding) {
      delete entry;
      entry = nsnull;
  }

  if (!entry) {
    (void) mCacheMap.DeleteStorage(&record);
    (void) mCacheMap.DeleteRecord(&record);
  }

  return entry;


>      nsCOMPtr<nsISerializable> serializable =
>          do_QueryInterface(entry->SecurityInfo());
>      if (serializable) {
>          nsCString info;
> -        NS_SerializeToString(serializable, info);
> +        nsresult rv = NS_SerializeToString(serializable, info);
> +        if (NS_FAILED(rv)) return nsnull;
>          entry->SetMetaDataElement("security-info", info.get());
>      }

We should also fail in case when we have a security info but QueryInterface to nsISerializable fails.


> +  visitDevice: function (deviceID, deviceInfo) {
> +    if (deviceID == "disk") {
> +      // We only care about the disk device, since that's the only place we
> +      // actually serialize security info
> +      return true;
> +    }
> +
> +    // We don't care about visiting entries on non-disk devices
> +    return false;
> +  },
> +
> +  visitEntry: function (deviceID, entryInfo) {
> +    if (deviceID == "disk") {
> +      this.nentries++;
> +      this.size += entryInfo.dataSize;
> +      return true;
> +    }

In comment #69 I meant to check size in device info. Only this counter would be affected by missing DeleteStorage(). For example see http://hg.mozilla.org/mozilla-central/annotate/563ea372f000/netwerk/test/unit/test_bug651100.js#l86
Attachment #610413 - Flags: review?(michal.novotny)
Attached patch patch (simple version) (obsolete) — Splinter Review
Here's a version of the patch that addresses your comments, Michal
Attachment #610413 - Attachment is obsolete: true
Attachment #610571 - Flags: review?(michal.novotny)
Comment on attachment 610571 [details] [diff] [review]
patch (simple version)

> +        nsresult rv = NS_SerializeToString(serializable, info);
> +        if (NS_FAILED(rv)) return nsnull;
>          entry->SetMetaDataElement("security-info", info.get());

Maybe we should check the return value of SetMetaDataElement(). I don't know if it can fail. Or more precisely, it can fail iff PR_REALLOC is fallible (in nsCacheMetaData::EnsureBuffer), which I'm not sure it is. I'm sorry that I overlooked this in the previous review.

r+ with this fixed
Attachment #610571 - Flags: review?(michal.novotny) → review+
Attached patch patch (simple version) (obsolete) — Splinter Review
Alright, here's the final version. The try run looks good https://tbpl.mozilla.org/?tree=Try&rev=59e2f556dfbd (still a couple android tests running, but I'm confident they'll be fine, since disk cache is disabled on android).
Attachment #610571 - Attachment is obsolete: true
Attachment #610684 - Flags: review+
Keywords: checkin-needed
Forgot to add r=michal to the commit message in the last patch.
Attachment #610684 - Attachment is obsolete: true
Attachment #610687 - Flags: review+
Ignore that comment about disk cache being disabled on android, I must've dreamt it. Still, try looks good so far, and previous versions of the patch have gone green no problem.
ok, try is all green. can either brian or michal land this? thanks!
https://hg.mozilla.org/mozilla-central/rev/461760e5dbd5
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: mozilla12 → mozilla14
We don't think the ESR is affected by this bug - please let us know if that's not the case.
Attachment #603348 - Attachment is obsolete: true
Attachment #610042 - Attachment is obsolete: true
Attachment #610042 - Flags: feedback?(michal.novotny)
Whiteboard: [sg:moderate] → [sg:moderate][advisory-tracking+]
Group: core-security
You need to log in before you can comment on or make changes to this bug.