Closed Bug 1288104 Opened 6 years ago Closed 5 years ago

The bytecode cache should recall the hashes of the Subresource Integrity checks.

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla52
Tracking Status
firefox50 --- affected
firefox52 --- fixed

People

(Reporter: nbp, Assigned: nbp)

References

Details

Attachments

(4 files, 3 obsolete files)

The bytecode cache gives us the opportunity of saving the source code in a converted & compressed format, as well as the bytecode of the functions used during the start-up of the page.

Unfortunately, such a feature works against the logic of Subresource Integrity [1][2] (SRI) checks, which validate that a script downloaded from a third-party website has not been altered.  It conflicts with this verification procedure because we do not want to iterate over the sources.  Moreover, the sources are converted into a different format, so a hash computed on the converted source would never match the original hashes.

Also, we must always check the SRI, because multiple websites can rely on the same script while using different hashes & algorithms.  Loading from the bytecode cache without checking would be a security issue.

I see multiple options for this issue:
  1. "One script is mostly used with a unique pair of hash & algorithm".  If so, we could just save the provided pair next to the bytecode cache.  Thus, the SRI validation would compare the current hash & algorithm with the stored one.

  2. "Only a few major sets of crypto functions are used in the wild".  If so, we could compute the crypto hashes of the source before saving the bytecode cache, and save multiple hashes next to the cache entry.  Thus, the SRI validation would check that the hash & algorithm from the document match one of the pre-computed ones.

In both cases, if we find the algorithm but the hash does not match, then we have a verification issue.  Otherwise, if we cannot find the algorithm, then we should re-request the source and refrain from loading the bytecode cache entry.
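The decision flow in both options could be sketched as follows; this is a minimal illustration of the logic described above, and the names (CacheDecision, checkCachedSri) are hypothetical, not Gecko APIs:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the option-1 flow: a single (algorithm, hash)
// pair is stored next to the bytecode cache entry and compared against
// the pair the current document requests.
enum class CacheDecision {
  UseBytecodeCache,   // stored algorithm & hash match the document's metadata
  IntegrityFailure,   // same algorithm, different hash: an SRI violation
  RefetchSource,      // unknown algorithm: fall back to fetching the source
};

CacheDecision checkCachedSri(const std::string& storedAlgo,
                             const std::string& storedHash,
                             const std::string& requestedAlgo,
                             const std::string& requestedHash) {
  if (storedAlgo != requestedAlgo) {
    // Hashes from different algorithms cannot be compared: re-request the
    // source and recompute, instead of using the cache entry.
    return CacheDecision::RefetchSource;
  }
  if (storedHash != requestedHash) {
    // Same algorithm but a different digest: genuine verification issue.
    return CacheDecision::IntegrityFailure;
  }
  return CacheDecision::UseBytecodeCache;
}
```

Option 2 would differ only in that the stored side holds several pre-computed pairs and the lookup scans for a matching algorithm.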

François, do you know which of the 2 assumptions above is the most likely to hold, such that we can rely on it most of the time?  I do not think we have any other alternative, but if you can think of a better one, I am open to suggestions.

[1] https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity
[2] http://searchfox.org/mozilla-central/source/dom/base/nsScriptLoader.cpp#1137-1143
Flags: needinfo?(francois)
I don't have data on this, but there are two main use cases for SRI:

1. Site wants to host their own static files on a CDN.
2. Site wants to embed a common JS framework hosted by a popular CDN.

For #1, I would expect that the hash algorithm wouldn't change since there's only one site pulling down that JS file. So option 1 is likely to work very well.

For #2, I would expect that the bulk of the people using SRI will use it as a result of copy/pasting a complete script tag from, say, code.jquery.com, in which case whatever jQuery uses in their example is likely to be the dominant hash algorithm. This also points in the direction of option 1.

For those people manually adding SRI hashes to their scripts, it's worth noting that the spec uses SHA-384 in its examples and srihash.org only returns SHA-384 hashes. So copy/paste developers will end up with that. Again, option 1 is likely to work here.

If I were the one doing this, I'd probably go with option 1 and then add telemetry to note how often we get a non-matching hash algorithm v. how often the one we saved is the right one. That way we can see if our assumptions were correct as more people adopt SRI.
Flags: needinfo?(francois)
The goal of this patch is to be able to encode/decode the SRI hash & algorithm as
part of the TranscodeBuffer (previously named XDRBuffer) while avoiding
making copies as much as possible.

In order to avoid making copies, this patch externalizes the TranscodeBuffer
to the caller, which can then use the same read/write functions of the
TranscodeBuffer to load/store additional information such as the SRI.

This patch also removes all extra arguments of the JS_EncodeScript and
JS_DecodeScript functions, leaving only the TranscodeBuffer (which already
carries a JSContext pointer) and the Handle which has to be transcoded.

While adding the TranscodeBuffer class to jsapi.h, I also nested all
these functions under the JS namespace, to avoid polluting the global
namespace.
Attachment #8773875 - Flags: review?(luke)
Comment on attachment 8773875 [details] [diff] [review]
part 1 - Move XDR buffer initialization to the caller.

Review of attachment 8773875 [details] [diff] [review]:
-----------------------------------------------------------------

I know the XDRBuffer API is pre-existing but I think we should take some time to improve it before sticking it in the public JSAPI because it's problematic.  First, it reimplements Vector's growing algorithm.  Second, it has a super-unclear lifetime/ownership contract.  For example, I see us calling freeBuffer() when codeScript() fails, when codeFunction() *succeeds* and nowhere in any destructor, despite the fact that mozJSLoaderUtils.cpp is unconditionally calling buf.release() on the data passed to the buffer.  Let's build something on Vector with the classic Vector lifetime/ownership contract.
Attachment #8773875 - Flags: review?(luke)
This patch replaces the EncodeScript, EncodeInterpretedFunction, DecodeScript,
and DecodeInterpretedFunction functions with members of the Transcoder class.

The Transcoder class owns a buffer of bytes which contains the bytecode; the
buffer is allocated when the encode/write functions are called, and freed when
the destructor is called.

The buffer which contains the bytecode can be stolen or overwritten with the
extractRawBuffer and replaceRawBuffer member functions. This is used for
transferring the ownership of the bytes to another entity such as a cache entry.
Attachment #8774758 - Flags: review?(luke)
Attachment #8773875 - Attachment is obsolete: true
(In reply to Luke Wagner [:luke] from comment #3)
> Second, it
> has a super-unclear lifetime/ownership contract.  For example, I see us
> calling freeBuffer() when codeScript() fails, when codeFunction() *succeeds*
> and nowhere in any destructor, …

Good catches; somehow I forgot to add the destructor and a negation, but I intended to have the Transcoder own the bits.  The only case where it does not is when we steal the buffer in the Evaluate function of the JS shell.
Comment on attachment 8774758 [details] [diff] [review]
part 1 - Move XDR buffer ownership to the caller.

Review of attachment 8774758 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.h
@@ +5691,5 @@
>  
> +class Transcoder {
> +  public:
> +    explicit Transcoder(JSContext* cx)
> +      : cx_(), buffer_(), cursor_(0) { }

self-nit: cx_(cx)
Comment on attachment 8774758 [details] [diff] [review]
part 1 - Move XDR buffer ownership to the caller.

Review of attachment 8774758 [details] [diff] [review]:
-----------------------------------------------------------------

This is definitely an improvement but, looking at the Gecko uses of this class, I have to wonder whether we couldn't just go all the way and simply have the caller create and own the Vector<uint8_t> directly (or maybe we create a Vector typedef for them in jsapi.h) and then the JS_Encode* functions take a Vector* and the JS_Decode* functions take a const Vector&.  Then we could maintain an entirely *internal* XDRBuffer class in vm/Xdr.h whose ctor took a Vector&.  This has the advantage of keeping the internal helpers (like readCString) out of jsapi.h and letting clients work directly with Vectors.
Add a typedef on top of mozilla::Vector to define the TranscodeBuffer, which
owns the encoded content of a Script / Function.

This modification renames JS_EncodeScript into JS::EncodeScript, and changes its
prototype to take a JSContext, a TranscodeBuffer, and a Handle on the script
that we want to encode.

Similar modifications are made to JS_EncodeInterpretedFunction, and the Decode
variant of these.
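The resulting API shape can be modeled with a small stand-in; this is a toy sketch of the ownership contract only (in the real tree, JS::TranscodeBuffer is a mozilla::Vector<uint8_t> and the Encode/Decode functions work on JSScripts, not strings):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Toy model: the caller owns the buffer outright, Encode* appends into
// it, Decode* reads from it. std::vector stands in for mozilla::Vector,
// and a std::string stands in for a JSScript.
using TranscodeBuffer = std::vector<uint8_t>;

// Stand-in for JS::EncodeScript: append a length-prefixed payload.
bool EncodeScript(TranscodeBuffer& buffer, const std::string& script) {
  uint32_t len = static_cast<uint32_t>(script.size());
  const uint8_t* lenBytes = reinterpret_cast<const uint8_t*>(&len);
  buffer.insert(buffer.end(), lenBytes, lenBytes + sizeof(len));
  buffer.insert(buffer.end(), script.begin(), script.end());
  return true;
}

// Stand-in for JS::DecodeScript: read the payload back out.
bool DecodeScript(const TranscodeBuffer& buffer, std::string& script) {
  if (buffer.size() < sizeof(uint32_t))
    return false;
  uint32_t len;
  std::memcpy(&len, buffer.data(), sizeof(len));
  if (buffer.size() < sizeof(len) + len)
    return false;
  script.assign(buffer.begin() + sizeof(len),
                buffer.begin() + sizeof(len) + len);
  return true;
}
```

With this shape the buffer follows the classic Vector lifetime/ownership contract the review asked for: it is freed by its own destructor, and callers never call an explicit freeBuffer().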
Attachment #8775181 - Flags: review?(luke)
Attachment #8774758 - Attachment is obsolete: true
Attachment #8774758 - Flags: review?(luke)
Comment on attachment 8775181 [details] [diff] [review]
part 1 - Move XDR buffer to the caller.

Review of attachment 8775181 [details] [diff] [review]:
-----------------------------------------------------------------

Great, much simpler, thanks!

::: js/src/shell/js.cpp
@@ +1570,5 @@
>              return false;
> +        }
> +        uint8_t* saveData = saveBuffer.extractOrCopyRawBuffer();
> +        if (!CacheEntry_setBytecode(cx, cacheEntry, saveData, saveLength)) {
> +            js_free(saveData);

Perhaps for a future patch, but it'd be nice if CacheEntry_setBytecode took a UniquePtr to avoid this free-if-failed case.

::: js/xpconnect/loader/mozJSLoaderUtils.cpp
@@ +31,5 @@
>      if (NS_FAILED(rv))
>          return rv; // don't warn since NOT_AVAILABLE is an ok error
>  
> +    JS::TranscodeBuffer buffer;
> +    buffer.replaceRawBuffer(reinterpret_cast<uint8_t*>(buf.release()), len);

For a separate bug, but perhaps you could change cache->GetBuffer() to take a TranscodeBuffer*?

@@ +77,2 @@
>      MOZ_ASSERT(size && data);
> +    nsresult rv = cache->PutBuffer(PromiseFlatCString(uri).get(), reinterpret_cast<char*>(data), size);

I think this code could be simplified to pass buffer.begin() :)  That would also remove the ScopeExit.h #include.

::: js/xpconnect/src/nsXPConnect.cpp
@@ +1125,4 @@
>      MOZ_ASSERT(size && data);
>      rv = stream->Write32(size);
>      if (NS_SUCCEEDED(rv))
> +        rv = stream->WriteBytes(reinterpret_cast<char*>(data), size);

Same comment re: buffer.begin().
Attachment #8775181 - Flags: review?(luke) → review+
This patch is not yet tested (f? instead of r?).  It adds an API which is
used to dump the computed hash into a pre-allocated array, and to read it
back from an array.

This API is made to work with the buffer exposed in the previous patch, such
that one can record the SRI computed hash before the bytecode.  Thus, when
we stream the bytes, the first thing we would get from the alternate-data
(Bug 1231565) of the nsChannel would be the SRI computed hash, instead of
any sources.

This will give us the ability to verify the hash, or restart the nsChannel
with a new request if the hash type differs from the previous
load. (= NS_ERROR_SRI_UNEXPECTED_HASH_TYPE)
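The layout hinted at here could be sketched as follows; the field names and encoding are illustrative, not the actual SRICheckDataVerifier format:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: the serialized SRI hash is written at the front of
// the cache buffer, ahead of any bytecode, so a streaming reader sees the
// hash first.
void exportHash(std::vector<uint8_t>& buf, int8_t hashType,
                const std::vector<uint8_t>& hash) {
  buf.push_back(static_cast<uint8_t>(hashType));
  uint32_t len = static_cast<uint32_t>(hash.size());
  const uint8_t* p = reinterpret_cast<const uint8_t*>(&len);
  buf.insert(buf.end(), p, p + sizeof(len));
  buf.insert(buf.end(), hash.begin(), hash.end());
}

// Returns false (think NS_ERROR_SRI_UNEXPECTED_HASH_TYPE) when the stored
// hash type differs from the one the current document requests, in which
// case the channel would be restarted with a fresh source request.
// On success, cursor points at the first byte after the SRI header.
bool importAndCheckType(const std::vector<uint8_t>& buf,
                        int8_t expectedType, size_t& cursor) {
  if (buf.size() < 1 + sizeof(uint32_t))
    return false;
  int8_t storedType = static_cast<int8_t>(buf[0]);
  uint32_t len;
  std::memcpy(&len, buf.data() + 1, sizeof(len));
  cursor = 1 + sizeof(len) + len;  // bytecode starts here
  return storedType == expectedType && buf.size() >= cursor;
}
```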
Attachment #8776931 - Flags: feedback?(francois)
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Comment on attachment 8776931 [details] [diff] [review]
part 2 - Instrument SRICheckDataVerifier to load/save the computed hash from the bytecode cache.

Review of attachment 8776931 [details] [diff] [review]:
-----------------------------------------------------------------

Overall that looks good to me.

I do find the naming a little confusing. Do you think we could use something like export/import or serialize/deserialize instead of encodehash and decodehash?

i.e. SerializeVerifiedHash() / DeserializeVerifiedHash() or ExportVerifiedHash() / ImportVerifiedHash()

It would be nice if it were clear to most users of this class that they can ignore this stuff and just use Update() & Verify().

::: dom/security/SRICheck.h
@@ +49,5 @@
>    public:
>      SRICheckDataVerifier(const SRIMetadata& aMetadata,
>                           const nsIDocument* aDocument);
>  
> +    // Append the following bytes in the content of the string used to compute

It would be good to group Update() and Verify() together here since they are meant to be used together (and is the main API).

Then further down you could have EncodedHashLength(), DecodeHash() and EncodeVerifiedHash().

@@ +55,3 @@
>      nsresult Update(uint32_t aStringLen, const uint8_t* aString);
> +
> +    // Return the length of the computed hash.

This is the length of the computed hash along with its type and length (i.e. it's not the same as mHashLength).

@@ +73,5 @@
>      nsCOMPtr<nsICryptoHash> mCryptoHash;
>      nsAutoCString           mComputedHash;
>      size_t                  mBytesHashed;
>      int8_t                  mHashType;
> +    uint32_t                mHashLength;

nit: the members are sorted by size from largest to smallest.
Attachment #8776931 - Flags: feedback?(francois) → feedback+
This patch adds new functions to SRICheckDataVerifier. These functions are
made to serialize (export) and deserialize (import) the computed hash in
the bytecode cache of the nsScriptLoader (Bug 900874).

The bytecode cache is a buffer saved in the Necko cache, which contains the
result of the parser from a previous evaluation. The current implementation
(Bug 900874) unconditionally stores the SRI computed hash when available,
and otherwise stores an "empty" hash (mHashLength == 0).

The "empty" hash is used to make the Verify function fail if we attempt to
verify the integrity against non-empty metadata.  As the metadata are part
of the loader, instead of the loaded script, we might have cases where the
loader changed, adding metadata, while the cached script is still
persistent.  In such cases, this would cause the nsScriptLoader to fall
back on the source to compute the SRI hash.

PS: I will also ask you for review on the Bug 900874 parts (so far part 1.0
& part 1.3) which are making use of these new functions.
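The "empty hash" rule above could be sketched like this; the function name and the representation of hashes as byte vectors are illustrative, not the actual SRICheckDataVerifier code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: a cache entry saved with no SRI hash (length 0)
// must fail verification whenever the loading document now carries
// non-empty integrity metadata, which forces a fallback to the source.
bool verifyAgainstMetadata(const std::vector<uint8_t>& cachedHash,
                           const std::vector<uint8_t>& metadataHash) {
  if (metadataHash.empty())
    return true;   // no integrity requested: nothing to check
  if (cachedHash.empty())
    return false;  // an empty stored hash can never satisfy SRI
  return cachedHash == metadataHash;
}
```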
Attachment #8800621 - Flags: review?(francois)
Attachment #8776931 - Attachment is obsolete: true
To avoid making extra copies of the SRI hash, I modified the EncodeScript
function to append to the TranscodeBuffer, instead of overwriting its content
with the output produced by XDR.

This way, in Bug 900874 part 1.3, the SRI hash is written directly into the
buffer which might later be appended to by EncodeScript, to serialize the
JSScripts.
Attachment #8800622 - Flags: review?(luke)
By adding a cursor index, we remove the need to make a byte-by-byte copy
of the entire TranscodeBuffer content when removing/skipping the
TranscodeBuffer header which holds the computed hash needed for SRI checks.
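The cursor idea can be sketched with a toy decoder; instead of memmoving the bytecode to the front of the buffer once the SRI header is consumed, decoding simply starts at a caller-supplied index (the struct and its members are illustrative, not the real XDR decoder):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy decoder that reads from a shared buffer starting at a cursor index,
// so the SRI header at the front can be skipped without copying anything.
struct Decoder {
  const std::vector<uint8_t>& buffer;
  size_t cursor;  // index of the first bytecode byte

  Decoder(const std::vector<uint8_t>& buf, size_t start)
    : buffer(buf), cursor(start) {}

  bool readByte(uint8_t& out) {
    if (cursor >= buffer.size())
      return false;
    out = buffer[cursor++];
    return true;
  }
};
```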
Attachment #8800623 - Flags: review?(luke)
In all the previous comment, read Bug 900784 (js-startup-cache) instead of Bug 900874.
Comment on attachment 8800622 [details] [diff] [review]
part 3 - Always append when encoding scripts & interpreted functions.

Review of attachment 8800622 [details] [diff] [review]:
-----------------------------------------------------------------

I don't currently see a 3rd argument to XDREncoder's ctor, so I'll have to take your word that it appears and works as intended.
Attachment #8800622 - Flags: review?(luke) → review+
Attachment #8800623 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #16)
> I don't currently see a 3rd argument to XDREncoder's ctor, so I'll have to
> take your word it appears and it works as intended.

Sorry, this code is part of part 4; I will swap the 2 patches before landing.
Comment on attachment 8800621 [details] [diff] [review]
part 2 - Instrument SRICheckDataVerifier to load/save the computed hash from the bytecode cache.

Review of attachment 8800621 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. It might be good to introduce another error code to distinguish these failures from the hash mismatches.

::: dom/security/SRICheck.cpp
@@ +430,5 @@
> +
> +  // we expect to always encode an SRI, even if it is empty or incomplete
> +  if (aDataLen < EmptyDataSummaryLength()) {
> +    SRILOG(("SRICheckDataVerifier::DataSummaryLength, encoded length[%u] is too small", aDataLen));
> +    return NS_ERROR_SRI_CORRUPT;

Maybe we should add a new error message?

NS_ERROR_SRI_CORRUPT is the error we return when the hashes don't match. It's not entirely inappropriate in these cases, but it also feels like a bit of stretch.
Attachment #8800621 - Flags: review?(francois) → review+
(In reply to François Marier [:francois] from comment #18)
> NS_ERROR_SRI_CORRUPT is the error we return when the hashes don't match.
> It's not entirely inappropriate in these cases, but it also feels like a bit
> of stretch.

I added NS_ERROR_SRI_IMPORT to report errors related to unexpected values or buffer sizes.
Thus the only remaining uses of NS_ERROR_SRI_CORRUPT are in ::Verify and ::VerifyHash, as before.
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4a8c5061f3b7
part 1 - Move XDR buffer to the caller. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ae084908fc4
part 2 - Instrument SRICheckDataVerifier to load/save the computed hash from the bytecode cache. r=francois
https://hg.mozilla.org/integration/mozilla-inbound/rev/e30472b2861a
part 3 - Give an optional cursor index to DecodeScript function. r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/f9faf2a9c735
part 4 - Always append when encoding scripts & interpreted functions. r=luke