Last Comment Bug 232030 - check the Content-MD5 header
: check the Content-MD5 header
Status: RESOLVED WONTFIX
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All All
: -- enhancement with 8 votes (vote)
: ---
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
http://www.w3.org/Protocols/rfc2616/r...
Depends on: 599164
Blocks: 603503 pipelining-review
  Show dependency treegraph
 
Reported: 2004-01-24 03:28 PST by bugzilla
Modified: 2012-06-11 18:31 PDT (History)
28 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v1 to implement Content-MD5 verification (35.32 KB, patch)
2010-08-26 09:21 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
patch v2 to implement md5 verification (35.68 KB, patch)
2010-09-03 07:51 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
patch v3 to implement md5 verification (40.89 KB, patch)
2010-09-08 17:17 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
Patch to Implement HTTP Content-MD5 Verification (v4) (40.00 KB, patch)
2010-09-09 09:47 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
Patch to Implement HTTP Content-MD5 Verification (v5) (39.98 KB, patch)
2010-09-17 07:26 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
Patch to Implement HTTP Content-MD5 Verification (v6) (31.06 KB, patch)
2010-10-13 14:13 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
patch to imlement http content md5 verification (31.70 KB, patch)
2010-12-03 15:21 PST, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
patch to imlement http content md5 verification (38.35 KB, patch)
2011-01-18 12:35 PST, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
patch to imlement http content md5 verification v9 (38.74 KB, application/octet-stream)
2011-02-08 12:48 PST, Patrick McManus [:mcmanus]
no flags Details
patch to imlement http content md5 verification v10 (38.75 KB, patch)
2011-02-18 18:30 PST, Patrick McManus [:mcmanus]
honzab.moz: review-
Details | Diff | Review
patch to imlement http content md5 verification v11 (43.77 KB, patch)
2011-04-14 09:48 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
v12 (review updated and merged v11) (48.16 KB, patch)
2011-05-29 09:50 PDT, Honza Bambas (:mayhemer)
honzab.moz: review+
Details | Diff | Review
v11-v12 interdiff (12.95 KB, patch)
2011-05-29 09:59 PDT, Honza Bambas (:mayhemer)
no flags Details | Diff | Review
telemetry v1 (6.13 KB, patch)
2011-05-31 15:04 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
telemetry v2 (5.73 KB, patch)
2011-05-31 17:41 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
error page for ns_error_corrupted content v1 [landed as 662414] (12.97 KB, patch)
2011-06-06 10:43 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
v13 (34.42 KB, patch)
2011-06-06 10:44 PDT, Patrick McManus [:mcmanus]
mcmanus: review+
Details | Diff | Review

Description bugzilla 2004-01-24 03:28:46 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624

The Content-MD5 headers, specified in HTTP 1.1 and already sent by some webpages
is not used and compared with the MD5 of the message body. So mozilla doesn't
tell you when the content seems to be broken (which could be very useful in some
cases)

Reproducible: Always

Steps to Reproduce:
1. send a wrong Content-MD5 header
2. does mozilla care about this?

Actual Results:  
nothing, that's the fact!

Expected Results:  
inform me about the fact that the transmission of the page seems to be errorneous.

i think it's moreless a wish than a bug
Comment 1 bugzilla 2004-01-24 03:32:02 PST
i've testet with the moz1.4 (as the bug says) and

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007
Firebird/0.7 (which is moz1.5)
Comment 2 Christian :Biesinger (don't email me, ping me on IRC) 2004-01-24 06:24:19 PST
rfc 2616 (section 14.15) says that Content-MD5 MAY be checked; it is therefore
not a bug that mozilla doesn't.

confirming as enhancement request.
Comment 3 Christopher Hoess (gone) 2004-01-24 23:26:15 PST
Is this related to bug 101743?
Comment 4 Darin Fisher 2006-06-21 16:05:41 PDT
-> default owner
Comment 5 [Baboo] 2010-04-18 22:28:05 PDT
Just wondering if this would give some kind of protection protection against the the CVE-2009-3555 vulnerability (heavily discussed in bug 526689 )?
(of course only on servers sending this header)
Comment 6 Patrick McManus [:mcmanus] 2010-08-26 09:21:30 PDT
Created attachment 469486 [details] [diff] [review]
Patch v1 to implement Content-MD5 verification
Comment 7 Patrick McManus [:mcmanus] 2010-08-26 09:22:04 PDT
Attached is a patch that calculates an entity's md5 if there is a Content-MD5 header in the http response, or if there is a "Trailer: Content-MD5" header in the response telling the client to expect a checksum at the end of the chunked encoding. After the entity is received the calculated sum is compared to the one sent with the entity and if they don't match NS_ERROR_CORRUPTED_CONTENT is generated and a page similar to the one used when a gzip-decode fails is displayed.

The md5 calculation does not change the entity at all and does not require an extra copy of the entity bytes.

draft-nottingham-http-pipeline-00.txt suggests MD5 verification as one way of detecting pipelining corruption. This patch is a necessary precursor to being able to do that.

I don't yet have any idea how to add tests to the build system. I would love a pointer on that. I do have 6 nph scripts to test against that illustrate content-md5 as:
  * a header on a content-length based response 
  * a header on a chunked encoding based response
  * a trailer on a chunked-encoding based response

Each of those three scenarios is represented in a separate script with a valid and invalid checksum.
Comment 8 Patrick McManus [:mcmanus] 2010-09-02 05:02:56 PDT
patch needs to be updated as the digest string used for comparison is in the wrong formant.
Comment 9 Patrick McManus [:mcmanus] 2010-09-03 07:51:22 PDT
Created attachment 471844 [details] [diff] [review]
patch v2 to implement md5 verification

The directive "ContentDigest On" can be added to apache to have it generate md5 checksums. It is in apache core, no module necessary.
Comment 10 Patrick McManus [:mcmanus] 2010-09-08 17:17:39 PDT
Created attachment 473299 [details] [diff] [review]
patch v3 to implement md5 verification

fix a problem with QI, a refcount cycle, and add a unit test

based on bz feedback will use a Tee and an outputstream in next revision
Comment 11 Reed Loden [:reed] (use needinfo?) 2010-09-08 17:44:56 PDT
Comment on attachment 473299 [details] [diff] [review]
patch v3 to implement md5 verification

>+ * The Initial Developer of the Original Code is
>+ * Mozilla Corporation.

Initial Developer should be "Mozilla Foundation", not MoCo.

http://weblogs.mozillazine.org/gerv/archives/2010/02/mpl_initial_developer_for_mozilla_employ.html
Comment 12 Patrick McManus [:mcmanus] 2010-09-09 09:47:45 PDT
Created attachment 473600 [details] [diff] [review]
Patch to Implement HTTP Content-MD5 Verification (v4)

update to use nsStreamListenerTee instead of hand rolling that functionality, and address formatting concerns of jst-review.pl
Comment 13 Patrick McManus [:mcmanus] 2010-09-17 07:26:32 PDT
Created attachment 476242 [details] [diff] [review]
Patch to Implement HTTP Content-MD5 Verification (v5)

Re-merged with current mozilla-central, otherwise unchanged from v4
Comment 14 :Ms2ger 2010-10-12 04:57:44 PDT
Comment on attachment 476242 [details] [diff] [review]
Patch to Implement HTTP Content-MD5 Verification (v5)

Style police:

>diff --git a/netwerk/base/src/nsMD5Stream.cpp b/netwerk/base/src/nsMD5Stream.cpp
>new file mode 100644
>--- /dev/null
>+++ b/netwerk/base/src/nsMD5Stream.cpp
>@@ -0,0 +1,403 @@
>+/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
>+/* ***** BEGIN LICENSE BLOCK *****
>+ * Version: MPL 1.1/GPL 2.0/LGPL 2.1
>+ *
>+ * The contents of this file are subject to the Mozilla Public License Version
>+ * 1.1 (the "License"); you may not use this file except in compliance with
>+ * the License. You may obtain a copy of the License at
>+ * http://www.mozilla.org/MPL/
>+ *
>+ * Software distributed under the License is distributed on an "AS IS" basis,
>+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
>+ * for the specific language governing rights and limitations under the
>+ * License.
>+ *
>+ * The Original Code is Mozilla.
>+ *
>+ * The Initial Developer of the Original Code is
>+ * Mozilla Foundation.
>+ * Portions created by the Initial Developer are Copyright (C) 2010
>+ * the Initial Developer. All Rights Reserved.
>+ *
>+ * Contributor(s):
>+ *   Patrick McManus <mcmanus@ducksong.com>
>+ *
>+ * Alternatively, the contents of this file may be used under the terms of
>+ * either of the GNU General Public License Version 2 or later (the "GPL"),
>+ * or the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
>+ * in which case the provisions of the GPL or the LGPL are applicable instead
>+ * of those above. If you wish to allow use of your version of this file only
>+ * under the terms of either the GPL or the LGPL, and not to allow others to
>+ * use your version of this file under the terms of the MPL, indicate your
>+ * decision by deleting the provisions above and replace them with the notice
>+ * and other provisions required by the GPL or the LGPL. If you do not delete
>+ * the provisions above, a recipient may use your version of this file under
>+ * the terms of any one of the MPL, the GPL or the LGPL.
>+ *
>+ * ***** END LICENSE BLOCK ***** */
>+
>+#include "nsMD5Stream.h"
>+#include "nsIInputStream.h"
>+#include "nsCOMPtr.h"
>+#include "plbase64.h"
>+
>+NS_IMPL_ISUPPORTS1(nsMD5Stream, nsIOutputStream)
>+
>+nsMD5Stream::nsMD5Stream() :
>+    mIsComplete(PR_FALSE)
>+{
>+    MD5Init(&mContext);
>+}
>+
>+void
>+nsMD5Stream::Finalize()
>+{
>+    if (!mIsComplete)
>+    {
>+        mIsComplete = PR_TRUE;
>+        MD5Final(mBinaryDigest, &mContext);
>+    }
>+    return;
>+}

Why the return?

>+
>+nsresult
>+nsMD5Stream::CompleteHex(nsACString &result)
>+{
>+    static char const hexlate[] = "0123456789abcdef";
>+    
>+    Finalize();
>+    
>+    mResult.SetLength(0);

mResult.Truncate();

>+    for (PRInt16 i = 0; i < 16; i ++) {

i++ (or ++i, but no space).

>+        PRInt32 a = mBinaryDigest[i];
>+        mResult.Append(hexlate[(a>>4) & 0xf]);
>+        mResult.Append(hexlate[a & 0xf]);
>+    }
>+    
>+    result = mResult;
>+    return NS_OK;
>+}

Does this need to return nsresult?

>+
>+nsresult
>+nsMD5Stream::CompleteB64(nsACString &result)
>+{
>+    Finalize();
>+    
>+    /* An MD-5 Binary digest is always 16 bytes, and that is always 24 bytes
>+       after being base 64 encoded */
>+    
>+    mResult.SetLength(24);
>+    if (mResult.Length() == 24) {
>+        (char *) PL_Base64Encode ((const char *)mBinaryDigest,
>+                                  16,
>+                                  mResult.BeginWriting());
>+        result = mResult;
>+        return NS_OK;
>+    }
>+    
>+    return NS_ERROR_OUT_OF_MEMORY;
>+}

if (!mResult.SetLength(24)) {
  return NS_ERROR_OUT_OF_MEMORY;
}

// Normal code path

>+
>+
>+// implementation of nsIOutputStream
>+// nsReadFromInputStream and ::WriteFrom are based on in nsBufferedStreams
>+
>+NS_IMETHODIMP nsMD5Stream::Close()
>+{
>+    return NS_OK;
>+}
>+
>+NS_IMETHODIMP nsMD5Stream::Flush()
>+{
>+    return NS_OK;
>+}
>+
>+NS_IMETHODIMP nsMD5Stream::Write(const char *aBuf,
>+                                 PRUint32 aCount,
>+                                 PRUint32 *_retval NS_OUTPARAM)
>+{
>+    MD5Update(&mContext, (unsigned char *) aBuf, aCount);
>+    *_retval = aCount;
>+    return NS_OK;
>+}
>+
>+static NS_METHOD
>+nsReadFromInputStream(nsIOutputStream* outStr,
>+                      void* closure,
>+                      char* toRawSegment,
>+                      PRUint32 offset,
>+                      PRUint32 count,
>+                      PRUint32 *readCount)
>+{
>+    nsIInputStream* fromStream = (nsIInputStream*)closure;
>+    return fromStream->Read(toRawSegment, count, readCount);
>+}
>+
>+NS_IMETHODIMP
>+nsMD5Stream::WriteFrom(nsIInputStream *inStr, PRUint32 count, PRUint32 *_retval)
>+{
>+    return WriteSegments(nsReadFromInputStream, inStr, count, _retval);
>+}
>+
>+NS_IMETHODIMP
>+nsMD5Stream::WriteSegments(nsReadSegmentFun reader,
>+                           void * closure,
>+                           PRUint32 count,
>+                           PRUint32 *_retval)
>+{
>+    nsresult rv;
>+    char buffer[1024];
>+    
>+    *_retval = 0;
>+    
>+    while (count > 0) {
>+        PRUint32 chunk = PR_MIN(count, 1024);

NS_MIN

>+        PRUint32 rd = 0;
>+        
>+        rv = reader(this, closure, buffer, *_retval, chunk, &rd);
>+
>+        if (NS_FAILED(rv)) // If we have written some data, return ok
>+            return (*_retval > 0) ? NS_OK : rv;
>+
>+        MD5Update(&mContext, (unsigned char *) buffer, rd);
>+        
>+        *_retval += rd;
>+        count -= rd;
>+    }
>+    return NS_OK;
>+}
>+
>+NS_IMETHODIMP nsMD5Stream::IsNonBlocking(PRBool *_retval NS_OUTPARAM)
>+{
>+    *_retval  = PR_FALSE;
>+    return NS_OK;
>+}
>+
>+
>+/* MD5 implementation below is derived from the public domain SQLlite 3.7.2 */
>+
>+/*
>+ * Note: this code is harmless on little-endian machines.
>+ */
>+static void byteReverse (unsigned char *buf, unsigned longs)
>+{
>+    uint32 t;

PRUint32 or uint32_t. And why are you mixing `unsigned` and `uint32`?

>+    do {
>+        t = (uint32)((unsigned)buf[3]<<8 | buf[2]) << 16 |
>+            ((unsigned)buf[1]<<8 | buf[0]);
>+        *(uint32 *)buf = t;
>+        buf += 4;
>+    } while (--longs);
>+}
>+
>+#define F1(x, y, z) (z ^ (x & (y ^ z)))
>+#define F2(x, y, z) F1(z, x, y)
>+#define F3(x, y, z) (x ^ y ^ z)
>+#define F4(x, y, z) (y ^ (x | ~z))

Inline functions, and possibly understandable names? Also, don't we have an MD5 implementation yet?

>+
>+/* This is the central step in the MD5 algorithm. */
>+#define MD5STEP(f, w, x, y, z, data, s) \
>+        ( w += f(x, y, z) + data,  w = w<<s | w>>(32-s),  w += x )
>+
>+/*
>+ * The core of the MD5 algorithm, this alters an existing MD5 hash to
>+ * reflect the addition of 16 longwords of new data.  MD5Update blocks
>+ * the data and converts bytes into longwords for this routine.
>+ */
>+static void MD5Transform(uint32 buf[4], const uint32 in[16])
>+{
>+    register uint32 a, b, c, d;
>+    
>+    a = buf[0];
>+    b = buf[1];
>+    c = buf[2];
>+    d = buf[3];
>+
>+    MD5STEP(F1, a, b, c, d, in[ 0]+0xd76aa478,  7);
>+    MD5STEP(F1, d, a, b, c, in[ 1]+0xe8c7b756, 12);
>+    MD5STEP(F1, c, d, a, b, in[ 2]+0x242070db, 17);
>+    MD5STEP(F1, b, c, d, a, in[ 3]+0xc1bdceee, 22);
>+    MD5STEP(F1, a, b, c, d, in[ 4]+0xf57c0faf,  7);
>+    MD5STEP(F1, d, a, b, c, in[ 5]+0x4787c62a, 12);
>+    MD5STEP(F1, c, d, a, b, in[ 6]+0xa8304613, 17);
>+    MD5STEP(F1, b, c, d, a, in[ 7]+0xfd469501, 22);
>+    MD5STEP(F1, a, b, c, d, in[ 8]+0x698098d8,  7);
>+    MD5STEP(F1, d, a, b, c, in[ 9]+0x8b44f7af, 12);
>+    MD5STEP(F1, c, d, a, b, in[10]+0xffff5bb1, 17);
>+    MD5STEP(F1, b, c, d, a, in[11]+0x895cd7be, 22);
>+    MD5STEP(F1, a, b, c, d, in[12]+0x6b901122,  7);
>+    MD5STEP(F1, d, a, b, c, in[13]+0xfd987193, 12);
>+    MD5STEP(F1, c, d, a, b, in[14]+0xa679438e, 17);
>+    MD5STEP(F1, b, c, d, a, in[15]+0x49b40821, 22);
>+
>+    MD5STEP(F2, a, b, c, d, in[ 1]+0xf61e2562,  5);
>+    MD5STEP(F2, d, a, b, c, in[ 6]+0xc040b340,  9);
>+    MD5STEP(F2, c, d, a, b, in[11]+0x265e5a51, 14);
>+    MD5STEP(F2, b, c, d, a, in[ 0]+0xe9b6c7aa, 20);
>+    MD5STEP(F2, a, b, c, d, in[ 5]+0xd62f105d,  5);
>+    MD5STEP(F2, d, a, b, c, in[10]+0x02441453,  9);
>+    MD5STEP(F2, c, d, a, b, in[15]+0xd8a1e681, 14);
>+    MD5STEP(F2, b, c, d, a, in[ 4]+0xe7d3fbc8, 20);
>+    MD5STEP(F2, a, b, c, d, in[ 9]+0x21e1cde6,  5);
>+    MD5STEP(F2, d, a, b, c, in[14]+0xc33707d6,  9);
>+    MD5STEP(F2, c, d, a, b, in[ 3]+0xf4d50d87, 14);
>+    MD5STEP(F2, b, c, d, a, in[ 8]+0x455a14ed, 20);
>+    MD5STEP(F2, a, b, c, d, in[13]+0xa9e3e905,  5);
>+    MD5STEP(F2, d, a, b, c, in[ 2]+0xfcefa3f8,  9);
>+    MD5STEP(F2, c, d, a, b, in[ 7]+0x676f02d9, 14);
>+    MD5STEP(F2, b, c, d, a, in[12]+0x8d2a4c8a, 20);
>+
>+    MD5STEP(F3, a, b, c, d, in[ 5]+0xfffa3942,  4);
>+    MD5STEP(F3, d, a, b, c, in[ 8]+0x8771f681, 11);
>+    MD5STEP(F3, c, d, a, b, in[11]+0x6d9d6122, 16);
>+    MD5STEP(F3, b, c, d, a, in[14]+0xfde5380c, 23);
>+    MD5STEP(F3, a, b, c, d, in[ 1]+0xa4beea44,  4);
>+    MD5STEP(F3, d, a, b, c, in[ 4]+0x4bdecfa9, 11);
>+    MD5STEP(F3, c, d, a, b, in[ 7]+0xf6bb4b60, 16);
>+    MD5STEP(F3, b, c, d, a, in[10]+0xbebfbc70, 23);
>+    MD5STEP(F3, a, b, c, d, in[13]+0x289b7ec6,  4);
>+    MD5STEP(F3, d, a, b, c, in[ 0]+0xeaa127fa, 11);
>+    MD5STEP(F3, c, d, a, b, in[ 3]+0xd4ef3085, 16);
>+    MD5STEP(F3, b, c, d, a, in[ 6]+0x04881d05, 23);
>+    MD5STEP(F3, a, b, c, d, in[ 9]+0xd9d4d039,  4);
>+    MD5STEP(F3, d, a, b, c, in[12]+0xe6db99e5, 11);
>+    MD5STEP(F3, c, d, a, b, in[15]+0x1fa27cf8, 16);
>+    MD5STEP(F3, b, c, d, a, in[ 2]+0xc4ac5665, 23);
>+
>+    MD5STEP(F4, a, b, c, d, in[ 0]+0xf4292244,  6);
>+    MD5STEP(F4, d, a, b, c, in[ 7]+0x432aff97, 10);
>+    MD5STEP(F4, c, d, a, b, in[14]+0xab9423a7, 15);
>+    MD5STEP(F4, b, c, d, a, in[ 5]+0xfc93a039, 21);
>+    MD5STEP(F4, a, b, c, d, in[12]+0x655b59c3,  6);
>+    MD5STEP(F4, d, a, b, c, in[ 3]+0x8f0ccc92, 10);
>+    MD5STEP(F4, c, d, a, b, in[10]+0xffeff47d, 15);
>+    MD5STEP(F4, b, c, d, a, in[ 1]+0x85845dd1, 21);
>+    MD5STEP(F4, a, b, c, d, in[ 8]+0x6fa87e4f,  6);
>+    MD5STEP(F4, d, a, b, c, in[15]+0xfe2ce6e0, 10);
>+    MD5STEP(F4, c, d, a, b, in[ 6]+0xa3014314, 15);
>+    MD5STEP(F4, b, c, d, a, in[13]+0x4e0811a1, 21);
>+    MD5STEP(F4, a, b, c, d, in[ 4]+0xf7537e82,  6);
>+    MD5STEP(F4, d, a, b, c, in[11]+0xbd3af235, 10);
>+    MD5STEP(F4, c, d, a, b, in[ 2]+0x2ad7d2bb, 15);
>+    MD5STEP(F4, b, c, d, a, in[ 9]+0xeb86d391, 21);
>+
>+    buf[0] += a;
>+    buf[1] += b;
>+    buf[2] += c;
>+    buf[3] += d;
>+}
>+
>+/*
>+ * Start MD5 accumulation.  Set bit count to 0 and buffer to mysterious
>+ * initialization constants.
>+ */
>+void
>+nsMD5Stream::MD5Init(nsMD5Stream::Context *ctx)
>+{
>+    ctx->buf[0] = 0x67452301;
>+    ctx->buf[1] = 0xefcdab89;
>+    ctx->buf[2] = 0x98badcfe;
>+    ctx->buf[3] = 0x10325476;
>+    ctx->bits[0] = 0;
>+    ctx->bits[1] = 0;
>+    return;
>+}

Another return.

>+
>+/*
>+ * Update context to reflect the concatenation of another buffer full
>+ * of bytes.
>+ */
>+void
>+nsMD5Stream::MD5Update(nsMD5Stream::Context *ctx,
>+                       const unsigned char *buf,
>+                       unsigned int len)
>+{
>+    uint32 t;

Same comment.

>+    
>+    /* Update bitcount */
>+    
>+    t = ctx->bits[0];
>+    if ((ctx->bits[0] = t + ((uint32)len << 3)) < t)
>+        ctx->bits[1]++; /* Carry from low to high */
>+    ctx->bits[1] += len >> 29;
>+    
>+    t = (t >> 3) & 0x3f;    /* Bytes already in shsInfo->data */
>+    
>+    /* Handle any leading odd-sized chunks */
>+    
>+    if ( t ) {

No spaces around parentheses.

>+        unsigned char *p = (unsigned char *)ctx->in + t;
>+        
>+        t = 64-t;

Put them around the - instead.

>+        if (len < t) {
>+            memcpy(p, buf, len);
>+            return;
>+        }
>+        memcpy(p, buf, t);
>+        byteReverse(ctx->in, 16);
>+        MD5Transform(ctx->buf, (uint32 *)ctx->in);
>+        buf += t;
>+        len -= t;
>+    }
>+    
>+    /* Process data in 64-byte chunks */
>+    
>+    while (len >= 64) {
>+        memcpy(ctx->in, buf, 64);
>+        byteReverse(ctx->in, 16);
>+        MD5Transform(ctx->buf, (uint32 *)ctx->in);
>+        buf += 64;
>+        len -= 64;
>+        }
>+    
>+    /* Handle any remaining bytes of data. */
>+    memcpy(ctx->in, buf, len);
>+    return;
>+}

Another one.

>+
>+/*
>+ * Final wrapup - pad to 64-byte boundary with the bit pattern
>+ * 1 0* (64-bit count of bits processed, MSB-first)
>+ */
>+void
>+nsMD5Stream::MD5Final(unsigned char digest[16], nsMD5Stream::Context *ctx)
>+{
>+    unsigned count;
>+    unsigned char *p;
>+    
>+    /* Compute number of bytes mod 64 */
>+    count = (ctx->bits[0] >> 3) & 0x3F;
>+    
>+    /* Set the first char of padding to 0x80.  This is safe since there is
>+       always at least one byte free */
>+    p = ctx->in + count;
>+    *p++ = 0x80;
>+    
>+    /* Bytes of padding needed to make 64 bytes */
>+    count = 64 - 1 - count;
>+    
>+    /* Pad out to 56 mod 64 */
>+    if (count < 8) {
>+        /* Two lots of padding:  Pad the first block to 64 bytes */
>+        memset(p, 0, count);
>+        byteReverse(ctx->in, 16);
>+        MD5Transform(ctx->buf, (uint32 *)ctx->in);
>+        
>+        /* Now fill the next block with 56 bytes */
>+        memset(ctx->in, 0, 56);
>+    } else {
>+        /* Pad block to 56 bytes */
>+        memset(p, 0, count-8);

Spaces.

>+    }
>+    byteReverse(ctx->in, 14);
>+    
>+    /* Append length in bits and transform */
>+    ((uint32 *)ctx->in)[ 14 ] = ctx->bits[0];
>+    ((uint32 *)ctx->in)[ 15 ] = ctx->bits[1];
>+    
>+    MD5Transform(ctx->buf, (uint32 *)ctx->in);
>+    byteReverse((unsigned char *)ctx->buf, 4);
>+    memcpy(digest, ctx->buf, 16);
>+    return;
>+}

Again.

>+

Trailing newline.

>diff --git a/netwerk/base/src/nsMD5Stream.h b/netwerk/base/src/nsMD5Stream.h
>new file mode 100644
>--- /dev/null
>+++ b/netwerk/base/src/nsMD5Stream.h
>@@ -0,0 +1,85 @@
>+/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
>+/* ***** BEGIN LICENSE BLOCK *****
>+ * Version: MPL 1.1/GPL 2.0/LGPL 2.1
>+ *
>+ * The contents of this file are subject to the Mozilla Public License Version
>+ * 1.1 (the "License"); you may not use this file except in compliance with
>+ * the License. You may obtain a copy of the License at
>+ * http://www.mozilla.org/MPL/
>+ *
>+ * Software distributed under the License is distributed on an "AS IS" basis,
>+ * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
>+ * for the specific language governing rights and limitations under the
>+ * License.
>+ *
>+ * The Original Code is Mozilla.
>+ *
>+ * The Initial Developer of the Original Code is
>+ * Mozilla Foundation.
>+ * Portions created by the Initial Developer are Copyright (C) 2010
>+ * the Initial Developer. All Rights Reserved.
>+ *
>+ * Contributor(s):
>+ *   Patrick McManus <mcmanus@ducksong.com>
>+ *
>+ * Alternatively, the contents of this file may be used under the terms of
>+ * either of the GNU General Public License Version 2 or later (the "GPL"),
>+ * or the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
>+ * in which case the provisions of the GPL or the LGPL are applicable instead
>+ * of those above. If you wish to allow use of your version of this file only
>+ * under the terms of either the GPL or the LGPL, and not to allow others to
>+ * use your version of this file under the terms of the MPL, indicate your
>+ * decision by deleting the provisions above and replace them with the notice
>+ * and other provisions required by the GPL or the LGPL. If you do not delete
>+ * the provisions above, a recipient may use your version of this file under
>+ * the terms of any one of the MPL, the GPL or the LGPL.
>+ *
>+ * ***** END LICENSE BLOCK ***** */
>+
>+#ifndef nsMD5Stream_h___
>+#define nsMD5Stream_h___

All names with two or more consecutive underscores are reserved. Prefer "nsMD5Stream_h", or "mozilla_net_MD5Stream_h" with namespaces.

>+
>+#include "nsCOMPtr.h"
>+#include "nsIOutputStream.h"
>+#include "nsString.h"
>+
>+/*
>+   The nsMD5Stream implements an nsIOutputStream. As data is written
>+   to the stream an MD5 checksum is updated. When all of the data has
>+   been written one of the Complete* functions is called to get the
>+   calculated sum. Complete* may be called multiple times, but no more
>+   data may be written to the stream after being Complete*()'d.
>+*/
>+
>+class nsMD5Stream : public nsIOutputStream
>+{
>+public:
>+    NS_DECL_ISUPPORTS
>+    NS_DECL_NSIOUTPUTSTREAM
>+
>+    nsMD5Stream();
>+    nsresult CompleteHex(nsACString &result);
>+    nsresult CompleteB64(nsACString &result);
>+
>+    struct Context {
>+        PRUint32 buf[4];
>+        PRUint32 bits[2];
>+        PRUint8  in[64];
>+    };
>+
>+    // a binary md5 digest is 16 bytes
>+    static void MD5Init(nsMD5Stream::Context *ctx);
>+    static void MD5Update(nsMD5Stream::Context *ctx,
>+                          const unsigned char *buf, unsigned int len);
>+    static void MD5Final(unsigned char* digest, nsMD5Stream::Context *Ctx);
>+  
>+private:
>+    void Finalize();
>+
>+    struct Context mContext;
>+    nsCAutoString mResult;
>+    PRUint8       mBinaryDigest[16];
>+    PRPackedBool  mIsComplete;

Might prefer bool, if Boris agrees.

>+};
>+
>+#endif

Typically
#endif // nsMD5Stream_h

>diff --git a/netwerk/protocol/http/nsHttpChannel.cpp b/netwerk/protocol/http/nsHttpChannel.cpp
>--- a/netwerk/protocol/http/nsHttpChannel.cpp
>+++ b/netwerk/protocol/http/nsHttpChannel.cpp
>@@ -762,16 +762,40 @@ nsHttpChannel::CallOnStartRequest()
>             NS_SUCCEEDED(mCachePump->PeekStream(CallTypeSniffers, thisChannel));
>         }
>         
>         if (!typeSniffersCalled && mTransactionPump) {
>           mTransactionPump->PeekStream(CallTypeSniffers, thisChannel);
>         }
>     }
> 
>+    // If Content-MD5 is in the response header, or if we are told
>+    // to expect chunked-encoding trailers, then setup a
>+    // stream tee with the nsMD5Stream to calculate the sum of
>+    // what we really receive
>+    if (mResponseHead &&
>+        (mResponseHead->PeekHeader(nsHttp::Content_MD5) ||
>+         mResponseHead->PeekHeader(nsHttp::Trailer))) {
>+
>+        nsresult rv;
>+        nsRefPtr<nsMD5Stream> md5stream = new nsMD5Stream();
>+
>+        if (md5stream) {

new can't fail.

>+            nsCOMPtr<nsIStreamListenerTee> tee =
>+                do_CreateInstance(kStreamListenerTeeCID, &rv);
>+            if (NS_SUCCEEDED(rv)) {
>+                rv = tee->Init(mListener, md5stream, nsnull);
>+                if (NS_SUCCEEDED(rv)) {
>+                    mMD5Stream = md5stream;
>+                    mListener = tee;
>+                }
>+            }
>+        }
>+    }
>+
>     LOG(("  calling mListener->OnStartRequest\n"));
>     nsresult rv = mListener->OnStartRequest(this, mListenerContext);
>     if (NS_FAILED(rv)) return rv;
> 
>     // install stream converter if required
>     rv = ApplyContentConversions();
>     if (NS_FAILED(rv)) return rv;
> 
>@@ -3947,16 +3971,40 @@ nsHttpChannel::OnStopRequest(nsIRequest 
>             conn = mTransaction->Connection();
>             // This is so far a workaround to fix leak when reusing unpersistent
>             // connection for authentication retry. See bug 459620 comment 4
>             // for details.
>             if (conn && !conn->IsPersistent())
>                 conn = nsnull;
>         }
> 
>+        // verify http content-md5 if present
>+        if (mResponseHead && mMD5Stream) {
>+            const char *http_md5 =

httpMD5? Certainly no underscores.

>+                mResponseHead->PeekHeader(nsHttp::Content_MD5);
>+            
>+            if (!http_md5) {
>+                nsHttpHeaderArray *trailers = mTransaction->Trailers();
>+                if (trailers)
>+                    http_md5 = trailers->PeekHeader(nsHttp::Content_MD5);
>+            }
>+            
>+            if (http_md5) {
>+                nsCAutoString md5_b64;
>+                mMD5Stream->CompleteB64(md5_b64);
>+                if (!md5_b64.Equals(http_md5)) {
>+                    LOG(("md5 mismatch - remote %s, local %s\n",
>+                         http_md5,
>+                         md5_b64.get()));
>+                    if (NS_SUCCEEDED(status))
>+                        status = NS_ERROR_CORRUPTED_CONTENT;
>+                }
>+            }
>+        }
>+
>         // at this point, we're done with the transaction
>         mTransaction = nsnull;
>         mTransactionPump = 0;
> 
>         // handle auth retry...
>         if (authRetry) {
>             mAuthRetryPending = PR_FALSE;
>             status = DoAuthRetry(conn);
>diff --git a/netwerk/protocol/http/nsHttpChannel.h b/netwerk/protocol/http/nsHttpChannel.h
>--- a/netwerk/protocol/http/nsHttpChannel.h
>+++ b/netwerk/protocol/http/nsHttpChannel.h
>@@ -44,16 +44,17 @@
> #define nsHttpChannel_h__
> 
> #include "HttpBaseChannel.h"
> 
> #include "nsHttpTransaction.h"
> #include "nsInputStreamPump.h"
> #include "nsThreadUtils.h"
> #include "nsTArray.h"
>+#include "nsMD5Stream.h"
> 
> #include "nsIHttpEventSink.h"
> #include "nsICachingChannel.h"
> #include "nsICacheEntryDescriptor.h"
> #include "nsICacheListener.h"
> #include "nsIApplicationCacheChannel.h"
> #include "nsIPrompt.h"
> #include "nsIResumableChannel.h"
>@@ -339,16 +340,18 @@ private:
>     PRUint32                          mFallingBack              : 1;
>     PRUint32                          mWaitingForRedirectCallback : 1;
>     // True iff this channel is servicing a remote HttpChannelChild
>     PRUint32                          mRemoteChannel : 1;
>     // True if mRequestTime has been set. In such a case it is safe to update
>     // the cache entry's expiration time. Otherwise, it is not(see bug 567360).
>     PRUint32                          mRequestTimeInitialized : 1;
> 
>+    nsRefPtr<nsMD5Stream>             mMD5Stream;
>+
>     nsTArray<nsContinueRedirectionFunc> mRedirectFuncStack;
> 
>     nsresult WaitForRedirectCallback();
>     void PushRedirectAsyncFunc(nsContinueRedirectionFunc func);
>     void PopRedirectAsyncFunc(nsContinueRedirectionFunc func);
> };
> 
> #endif // nsHttpChannel_h__
>diff --git a/netwerk/protocol/http/nsHttpTransaction.cpp b/netwerk/protocol/http/nsHttpTransaction.cpp
>--- a/netwerk/protocol/http/nsHttpTransaction.cpp
>+++ b/netwerk/protocol/http/nsHttpTransaction.cpp
>@@ -644,20 +644,16 @@ nsHttpTransaction::Close(nsresult reason
>     mStatus = reason;
>     mTransactionDone = PR_TRUE; // forcibly flag the transaction as complete
>     mClosed = PR_TRUE;
> 
>     // release some resources that we no longer need
>     mRequestStream = nsnull;
>     mReqHeaderBuf.Truncate();
>     mLineBuf.Truncate();
>-    if (mChunkedDecoder) {
>-        delete mChunkedDecoder;
>-        mChunkedDecoder = nsnull;
>-    }
> 
>     // closing this pipe triggers the channel's OnStopRequest method.
>     mPipeOut->CloseWithStatus(reason);
> }
> 
> //-----------------------------------------------------------------------------
> // nsHttpTransaction <private>
> //-----------------------------------------------------------------------------
>diff --git a/netwerk/protocol/http/nsHttpTransaction.h b/netwerk/protocol/http/nsHttpTransaction.h
>--- a/netwerk/protocol/http/nsHttpTransaction.h
>+++ b/netwerk/protocol/http/nsHttpTransaction.h
>@@ -38,33 +38,33 @@
> 
> #ifndef nsHttpTransaction_h__
> #define nsHttpTransaction_h__
> 
> #include "nsHttp.h"
> #include "nsHttpHeaderArray.h"
> #include "nsAHttpTransaction.h"
> #include "nsAHttpConnection.h"
>+#include "nsHttpChunkedDecoder.h"
> #include "nsCOMPtr.h"
> #include "nsInt64.h"
> 
> #include "nsIPipe.h"
> #include "nsIInputStream.h"
> #include "nsIOutputStream.h"
> #include "nsIInterfaceRequestor.h"
> #include "nsISocketTransportService.h"
> #include "nsITransport.h"
> #include "nsIEventTarget.h"
> 
> //-----------------------------------------------------------------------------
> 
> class nsHttpTransaction;
> class nsHttpRequestHead;
> class nsHttpResponseHead;
>-class nsHttpChunkedDecoder;
> class nsIHttpActivityObserver;
> 
> //-----------------------------------------------------------------------------
> // nsHttpTransaction represents a single HTTP transaction.  It is thread-safe,
> // intended to run on the socket thread.
> //-----------------------------------------------------------------------------
> 
> class nsHttpTransaction : public nsAHttpTransaction
>@@ -118,16 +118,21 @@ public:
>     nsHttpRequestHead     *RequestHead()    { return mRequestHead; }
>     nsHttpResponseHead    *ResponseHead()   { return mHaveAllHeaders ? mResponseHead : nsnull; }
>     nsISupports           *SecurityInfo()   { return mSecurityInfo; }
> 
>     nsIInterfaceRequestor *Callbacks()      { return mCallbacks; } 
>     nsIEventTarget        *ConsumerTarget() { return mConsumerTarget; }
>     nsAHttpConnection     *Connection()     { return mConnection; }
> 
>+    nsHttpHeaderArray     *Trailers()
>+    {
>+        return mChunkedDecoder ? mChunkedDecoder->Trailers() : nsnull;
>+    }
>+
>     // Called to take ownership of the response headers; the transaction
>     // will drop any reference to the response headers after this call.
>     nsHttpResponseHead *TakeResponseHead();
> 
>     // Called to find out if the transaction generated a complete response.
>     PRBool ResponseIsComplete() { return mResponseIsComplete; }
> 
>     void   SetSSLConnectFailed() { mSSLConnectFailed = PR_TRUE; }
>diff --git a/netwerk/test/unit/head_channels.js b/netwerk/test/unit/head_channels.js
>--- a/netwerk/test/unit/head_channels.js
>+++ b/netwerk/test/unit/head_channels.js
>@@ -20,29 +20,31 @@ function read_stream(stream, count) {
>   }
>   return data.join('');
> }
> 
> const CL_EXPECT_FAILURE = 0x1;
> const CL_EXPECT_GZIP = 0x2;
> const CL_EXPECT_3S_DELAY = 0x4;
> const CL_SUSPEND = 0x8;
>+const CL_EXPECT_LATE_FAILURE = 0x10;
> 
> const SUSPEND_DELAY = 3000;
> 
> /**
>  * A stream listener that calls a callback function with a specified
>  * context and the received data when the channel is loaded.
>  *
>  * Signature of the closure:
>  *   void closure(in nsIRequest request, in ACString data, in JSObject context);
>  *
>  * This listener makes sure that various parts of the channel API are
>  * implemented correctly and that the channel's status is a success code
>- * (you can pass CL_EXPECT_FAILURE as flags to allow a failure code)
>+ * (you can pass CL_EXPECT_FAILURE or CL_EXPECT_LATE_FAILURE as flags
>+ * to allow a failure code)
>  *
>  * Note that it also requires a valid content length on the channel and
>  * is thus not fully generic.
>  */
> function ChannelListener(closure, ctx, flags) {
>   this._closure = closure;
>   this._closurectx = ctx;
>   this._flags = flags;
>@@ -125,25 +127,25 @@ ChannelListener.prototype = {
>   onStopRequest: function(request, context, status) {
>     try {
>       if (!this._got_onstartrequest)
>         do_throw("onStopRequest without onStartRequest event!");
>       if (this._got_onstoprequest)
>         do_throw("Got second onStopRequest event!");
>       this._got_onstoprequest = true;
>       var success = Components.isSuccessCode(status);
>-      if ((this._flags & CL_EXPECT_FAILURE) && success)
>+      if ((this._flags & (CL_EXPECT_FAILURE | CL_EXPECT_LATE_FAILURE)) && success)
>         do_throw("Should have failed to load URL (status is " + status.toString(16) + ")");
>-      else if (!(this._flags & CL_EXPECT_FAILURE) && !success)
>+      else if (!(this._flags & (CL_EXPECT_FAILURE | CL_EXPECT_LATE_FAILURE)) && !success)
>         do_throw("Failed to load URL: " + status.toString(16));
>       if (status != request.status)
>         do_throw("request.status does not match status arg to onStopRequest!");
>       if (request.isPending())
>         do_throw("request reports itself as pending from onStopRequest!");
>-      if (!(this._flags & CL_EXPECT_FAILURE) &&
>+      if (!(this._flags & (CL_EXPECT_FAILURE | CL_EXPECT_LATE_FAILURE)) &&
>           !(this._flags & CL_EXPECT_GZIP) &&
>           this._contentLen != -1)
>           do_check_eq(this._buffer.length, this._contentLen)
>     } catch (ex) {
>       do_throw("Error in onStopRequest: " + ex);
>     }
>     try {
>       this._closure(request, this._buffer, this._closurectx);
>diff --git a/netwerk/test/unit/test_md5.js b/netwerk/test/unit/test_md5.js
>new file mode 100644
>--- /dev/null
>+++ b/netwerk/test/unit/test_md5.js
>@@ -0,0 +1,66 @@
>+do_load_httpd_js();
>+
>+var httpserver = new nsHttpServer();
>+var index = 0;
>+var tests = [
>+    {url: "/md5test?valid",
>+     responseheader: [ "content-md5: MI+3bcTXMDYO4zky0vsQVg==" ],
>+     flags : 0,},
>+
>+    {url: "/md5test?invalid",
>+     responseheader: [ "content-md5: ZZZ3bcTXMDYO4zky0vsQVg==" ],
>+     flags : CL_EXPECT_LATE_FAILURE},
>+
>+];
>+
>+function setupChannel(url)
>+{
>+    var ios = Components.classes["@mozilla.org/network/io-service;1"].
>+                         getService(Ci.nsIIOService);
>+    var chan = ios.newChannel("http://localhost:4444" + url, "", null);
>+    var httpChan = chan.QueryInterface(Components.interfaces.nsIHttpChannel);
>+    return httpChan;
>+}
>+
>+function startIter()
>+{
>+    var channel = setupChannel(tests[index].url );
>+    channel.asyncOpen(new ChannelListener(completeIter,
>+                                          channel, tests[index].flags), null);
>+}
>+
>+function completeIter(request, data, ctx)
>+{
>+    if (++index < tests.length ) {
>+        startIter();
>+    } else {
>+        httpserver.stop(do_test_finished);
>+    }
>+}
>+
>+function run_test()
>+{
>+    httpserver.registerPathHandler("/md5test", handler);
>+    httpserver.start(4444);
>+
>+    startIter();
>+    do_test_pending();
>+}
>+
>+function handler(metadata, response)
>+{
>+    var body = "thequickbrownfox";
>+    response.setHeader("Content-Type", "text/plain", false);
>+
>+    var header = tests[index].responseheader;
>+    if (header != undefined) {
>+        for (var i = 0; i < header.length; i++) {
>+            var splitHdr = header[i].split(": ");
>+            response.setHeader(splitHdr[0], splitHdr[1], false);
>+        }
>+    }
>+    
>+    response.setStatusLine(metadata.httpVersion, 200, "OK");
>+    response.bodyOutputStream.write(body, body.length);
>+}
>+ 

Another trailing newline.

Throughout the patch, prefer Foo* bar over Foo *bar, String& aBar over String &aBar and aArgument over argument or _argument. Also, C++-style casts are preferred over C-style ones.
Comment 15 Patrick McManus [:mcmanus] 2010-10-12 06:45:19 PDT
Ms2ger, I sincerely appreciate the comments - thanks.

> Style police:

As a relative noob to the codebase, I'm relieved to know that there is a formatting justice system. This has been very confusing for me. Where is the law book kept? (that sounds kind of snarky - sorry. Its a sincere question.)

I've consulted https://developer.mozilla.org/En/Mozilla_Coding_Style_Guide and https://developer.mozilla.org/index.php?title=en/C%2b%2b_Portability_Guide but they are silent on most of the matters you have raised.

I want to do the right thing but I am at a total loss as to what is expected coding style and what is my personal preference and in turn I'm unable to figure out what in your review is based on policy and what is based on your personal preference differing from mine. Are there better resources for me to consult? 


> >+    return;
> >+}
> 
> Why the return?

I personally favor a return in every function so I can search forward to the return line(s) and quickly delimit the function. I don't add them to existing classes (well, sometimes by habit - but feel free to call me on that), but I thought that was a reasonable personal style attribute in a new class. Is there a guideline for sorting out that kind of decision?

> >+    mResult.SetLength(0);
> 
> mResult.Truncate();

what is the difference? what is the guideline?

> 
> Does this need to return nsresult?

That one no, but complete() methods in general could certainly fail (e.g. insufficient input), so for future consistency I put it there.

> >+        PRUint32 chunk = PR_MIN(count, 1024);
> 
> NS_MIN

ok. How do I know which one to use?

> >+
> >+/* MD5 implementation below is derived from the public domain SQLlite 3.7.2 */

So this is an impt comment :). I've intentionally tried to inline this as directly as possible for a couple reasons:

 * If the reference changes for some reason it simplifies incorporating the changes here
 * It more clearly establishes the provenance of the code. There is another copy floating around with the comment along the lines "PD code from SQLLite, some changes by Google All Rights Reserved" which I think is kind of a nightmare of a comment to run into. (which changes exactly are reserved?) Now if there are good reasons for the changes then so be it, but I intentionally avoided making any style changes for this reason.

If there is something in there that causes a portability problem, please help me flag it and of course we will make that change.

> 
>  Also, don't we have an MD5
> implementation yet?

I cannot locate one. There is a piece of google code imported into moz-central but it is not even compiled and contains fuzzy licensing - I did not use that. I'm happy to take a pointer to something that can be used instead of this.

> Throughout the patch, prefer Foo* bar over Foo *bar, String& aBar over String
> &aBar and aArgument over argument or _argument. Also, C++-style casts are
> preferred over C-style ones.

Why? That is opposite of my personal preference (especially on the first two points), and almost all of the existing netwerk code I'm looking at does it in foo *bar style.. is there a guideline everyone adheres to?
Comment 16 karl155 2010-10-12 10:56:04 PDT
(In reply to comment #15)
> >  Also, don't we have an MD5
> > implementation yet?
> 
> I cannot locate one. There is a piece of google code imported into moz-central
> but it is not even compiled and contains fuzzy licensing - I did not use that.
> I'm happy to take a pointer to something that can be used instead of this.

Just a quick search, not sure that it helps: security/nss/lib/freebl/md5.c
Comment 17 karl155 2010-10-12 11:15:50 PDT
Better one: https://developer.mozilla.org/en/nsICryptoHash
(as used in nsHttpDigestAuth)
Comment 18 Patrick McManus [:mcmanus] 2010-10-12 11:23:34 PDT
(In reply to comment #17)
> Better one: https://developer.mozilla.org/en/nsICryptoHash
> (as used in nsHttpDigestAuth)

That is excellent and sounds promising - I was unable to find that. I will try to update the patch to use that.

Thanks.
Comment 19 Jason Duell [:jduell] (needinfo? me) 2010-10-12 21:11:31 PDT
Re: style changes.  The style guides you refer to are the "canonical" ones.  And yes, they are silent about many of these questions.  Which is a bit tedious.

Here's my understanding of the semi-official (but mysteriously not written down anywhere--sigh) rules:

For new files, we use 2-space indents.  In necko, we put them in mozilla::net namespace.

I personally don't like nestling */& with the type rather than the variable name, but I'm nagged about it a lot.  And most of necko does the opposite.  So I could go either way.  I suppose until we see it writing we can keep our beloved "int *p" style :)

Ms2ger is right about most of the other things.  New() can no longer fail (but malloc and string allocs/reallocs can: see https://developer.mozilla.org/en/Infallible_memory_allocation and https://developer.mozilla.org/En/Mozilla_internal_string_guide).  Putting "return" at the end of function just so your editor can hop to them is discouraged (most editors can just hop to a '}' at the start of a line for this, no?)

I don't care as much about using macros vs inline functions, at least for short things.  

The licensing issue we may want to bump upstream to gerv or someone with quasi-lawyering skillz.
Comment 20 Patrick McManus [:mcmanus] 2010-10-13 14:13:29 PDT
Created attachment 482961 [details] [diff] [review]
Patch to Implement HTTP Content-MD5 Verification (v6)

updated patch to use nsICryptoHash - thanks karl155. This removes the inlined sqlite code and thus makes the point of whether it should have mozilla style or not moot.

Also incorporated the majority of the style comments from c14 && c19
Comment 21 :Ms2ger 2010-10-16 06:53:52 PDT
(In reply to comment #15)
> Ms2ger, I sincerely appreciate the comments - thanks.
> 
> > Style police:
> 
> As a relative noob to the codebase, I'm relieved to know that there is a
> formatting justice system. This has been very confusing for me. Where is the
> law book kept? (that sounds kind of snarky - sorry. Its a sincere question.)
> 
> I've consulted https://developer.mozilla.org/En/Mozilla_Coding_Style_Guide and
> https://developer.mozilla.org/index.php?title=en/C%2b%2b_Portability_Guide but
> they are silent on most of the matters you have raised.
> 
> I want to do the right thing but I am at a total loss as to what is expected
> coding style and what is my personal preference and in turn I'm unable to
> figure out what in your review is based on policy and what is based on your
> personal preference differing from mine. Are there better resources for me to
> consult? 

You found the right documents. Re-reading my comments, in particular the comment about avoiding macros is more of a personal preference, and the include guard is one of my pet peeves. The others are canonical style as I understand it myself; other contributors tend to be less strongly attached to questions of style.

The "style police" was probably too strong; I'm not your reviewer, and you're free to say no to any comments you disagree with.

> > >+    return;
> > >+}
> > 
> > Why the return?
> 
> I personally favor a return in every function so I can search forward to the
> return line(s) and quickly delimit the function. I don't add them to existing
> classes (well, sometimes by habit - but feel free to call me on that), but I
> thought that was a reasonable personal style attribute in a new class. Is there
> a guideline for sorting out that kind of decision?

It felt strange to me, but I don't feel strongly about it.

> > >+    mResult.SetLength(0);
> > 
> > mResult.Truncate();
> 
> what is the difference? what is the guideline?

I was rather surprised that wasn't in the style guide, but at least it's somewhat more self-documenting, and the canonical way to do this (again, AIUI).

> > 
> > Does this need to return nsresult?
> 
> That one no, but complete() methods in general could certainly fail (e.g.
> insufficient input), so for future consistency I put it there.

Fair enough.

> > >+        PRUint32 chunk = PR_MIN(count, 1024);
> > 
> > NS_MIN
> 
> ok. How do I know which one to use?

You couldn't know this, but always NS_MIN. See bug 518502.

> > >+
> > >+/* MD5 implementation below is derived from the public domain SQLlite 3.7.2 */
> 
> So this is an impt comment :). I've intentionally tried to inline this as
> directly as possible for a couple reasons:
> 
>  * If the reference changes for some reason it simplifies incorporating the
> changes here
>  * It more clearly establishes the provenance of the code. There is another
> copy floating around with the comment along the lines "PD code from SQLLite,
> some changes by Google All Rights Reserved" which I think is kind of a
> nightmare of a comment to run into. (which changes exactly are reserved?) Now
> if there are good reasons for the changes then so be it, but I intentionally
> avoided making any style changes for this reason.
> 
> If there is something in there that causes a portability problem, please help
> me flag it and of course we will make that change.

Okay; I wasn't sure how much you'd already changed the code. However, my comments are indeed moot.

> > 
> >  Also, don't we have an MD5
> > implementation yet?
> 
> I cannot locate one. There is a piece of google code imported into moz-central
> but it is not even compiled and contains fuzzy licensing - I did not use that.
> I'm happy to take a pointer to something that can be used instead of this.
> 
> > Throughout the patch, prefer Foo* bar over Foo *bar, String& aBar over String
> > &aBar and aArgument over argument or _argument. Also, C++-style casts are
> > preferred over C-style ones.
> 
> Why? That is opposite of my personal preference (especially on the first two
> points), and almost all of the existing netwerk code I'm looking at does it in
> foo *bar style.. is there a guideline everyone adheres to?

The guide is pretty clear, and in particular for new files I'd prefer following it. The consensus isn't too strong here, though, so following netwerk/ style would be fine. (Excepts for casts, as C-style casts are known to have masked bugs.)

(Sorry I didn't reply earlier; I thought I cc:d myself.)
Comment 22 Patrick McManus [:mcmanus] 2010-12-03 15:21:23 PST
Created attachment 495137 [details] [diff] [review]
patch to imlement http content md5 verification

update bitrot, confrom better to style guide, etc..
Comment 23 Boris Zbarsky [:bz] 2010-12-03 20:34:41 PST
I'm going to read over the code in the near future hopefully, but I had some goals questions.

Since we can't MD5-verify until all the data is received and we do progressive rendering, the user experience here will be that the page is loading along and then suddenly dies, right?  Is that what happens if partway through a gzip stream we get bytes we don't expect too?
Comment 24 Patrick McManus [:mcmanus] 2010-12-04 06:10:45 PST
(In reply to comment #23)
> I'm going to read over the code in the near future hopefully, but I had some
> goals questions.

no worries. This is obviously not 2.0 material, so I'm not anxious - just keeping things up to date as best as I can.

> Since we can't MD5-verify until all the data is received and we do progressive
> rendering, the user experience here will be that the page is loading along and
> then suddenly dies, right?

It will always happen at the end of data transfer (though maybe not completed rendering) in the case of MD5 because the hash is across all the data.

> Is that what happens if partway through a gzip
> stream we get bytes we don't expect too?

yes.
Comment 25 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2010-12-28 20:55:57 PST
This seems very risky and I don't see a clear gain.

Very, very few websites send Content-MD5 now and that is unlikely to change. If we implement the Assoc-Req header then websites that want to opt-in to (more) pipelining would start sending the associated request header which seems like it would work much better than anything that Content-MD5 would provide. 

How many (client-side desktop) proxies (e.g. ad blockers, "accelerators") are there that rewrite content in one way or another? And, how many of them properly recalculate Content-MD5 when they rewrite the content?

How certain are we that these pipelining problems only result in corrupt bodies and not (just) corrupt headers, which Content-MD5 doesn't cover?

If you have a very slow connection, and you download a progressive JPEG, you will see the JPEG (in various levels of detail), but then once the Content-MD5 check fails, it will disappear? Or, you download a giant ISO image, get to 100% done, and the file just disappears with no opportunity for the user to keep it around regardless of the mismatch?

Also note that various government regulations prevent us from using the MD5 algorithm as an anti-tampering check and SSL/TLS already provides stronger anti-tampering/anti-corruption evidence.

As far as the use for improving pipelining support goes, I think we should ship an implementation of the final Assoc-Req and/or other explicit request for pipelining mechanism in a release first, and then re-evaluate whether we need to implement this after we see how widely Assoc-Req (or whatever) is deployed, because Assoc-Req seems to have the least likelihood of causing any regressions.
Comment 26 Unknown W. Brackets 2010-12-29 00:20:58 PST
I agree; I think your arguments are very sound.  And even if MD-5 is a bad hashing algorithm, they are still generally valid for SHA-1.

I'm not really sure why bug #597706 is blocked by this, is it just for potential security win?  That's the only non-pipelining bug.

FWIW, Amazon S3 uses Content-MD5 (but I believe only for uploads?)  I think that would be the biggest example deployment.  Personally I see Content-MD5 useful only as a request header (which might be nice if Mozilla sent with uploaded files.)

-[Unknown]
Comment 27 karl155 2011-01-06 21:10:56 PST
(In reply to comment #25)
> Or, you download a giant ISO image, get to 100%
> done, and the file just disappears with no opportunity for the user to keep it
> around regardless of the mismatch?

This is exactly the point where I see the main benefit in this feature. Now I think that everything worked correct because there is no indication when something went wrong. I sometimes get minor transmission errors and the only chance to note them is to download all files again. This can be very annoying when I don't know in which file the error occurred or not even know that an transmission error occurred at all.

Sure, some kind of question dialog or so, asking what to do, will be the best. But just marking the download as failed is a lot better than completely ignoring that something went wrong.

The next big improvement is, that Firefox will also note other errors in the download (like Bug 237623).
Comment 28 Alex Limi (:limi) — Firefox UX Team 2011-01-07 11:41:42 PST
I believe we wouldn't (shouldn't) implement this for general web content, just for data that is downloaded to a file. That's where these things bite you, not for an image somewhere in a page.

We should of course implement this as some sort of warning (and not delete the file automatically, etc) — so it gives people the ability to troubleshoot corrupted downloads, and try them again easily.
Comment 29 karl155 2011-01-11 19:48:50 PST
Maybe even if the file was checked successfully it should be displayed to the user that the file has a correct md5 hash. (To let the user know that there was any checking mechanism at all.)
Comment 30 Patrick McManus [:mcmanus] 2011-01-18 12:35:41 PST
Created attachment 504820 [details] [diff] [review]
patch to imlement http content md5 verification

Following changes since the last version:

1] Accept broken(?) Content-MD5 response headers that are not padded in B64 to 4 byte increments

2] md5 should be calculated TE removed, but CE still applied. I misunderstood the layering in the Tee stream and calculated it with CE removed.

3] a failed MD5 verification no longer by default disrupts rendering. It logs to the console and provides feedback to the pipelining connection management state (later patch). The preference network.http.content-md.enforce can be set to true (default false) to provide an end to end integrity check.
Comment 31 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-01-18 13:46:44 PST
The code should be modified to check that the "Content-MD5" token is listed in the Trailer header field. Presumably then, a response with "Trailer: Content-MD5" but without a Content-MD5 trailer would be considered corrupt.
Comment 32 Patrick McManus [:mcmanus] 2011-02-08 12:48:32 PST
Created attachment 510734 [details]
patch to imlement http content md5 verification v9

fixes its own test case - which broke when the pref was required to actively reject
Comment 33 Patrick McManus [:mcmanus] 2011-02-18 18:30:45 PST
Created attachment 513666 [details] [diff] [review]
patch to imlement http content md5 verification v10

update for bitrot
Comment 34 :Ms2ger 2011-02-20 01:56:13 PST
Patrick, assuming you didn't mean to unassign this bug.
Comment 35 Honza Bambas (:mayhemer) 2011-03-27 10:23:14 PDT
Boris, I'm stealing review of this bug from you.
Comment 36 Boris Zbarsky [:bz] 2011-03-27 11:25:22 PDT
Sounds good to me.
Comment 37 Honza Bambas (:mayhemer) 2011-03-27 14:45:51 PDT
Comment on attachment 513666 [details] [diff] [review]
patch to imlement http content md5 verification v10

>diff --git a/docshell/base/nsDocShell.cpp b/docshell/base/nsDocShell.cpp
>+        case NS_ERROR_CORRUPTED_CONTENT:
>+            // Invalid Content Found. e.g. Md-5 failure.

Comment should rather be:
// Broken Content Detected. e.g. Content-MD5 check failure.

>diff --git a/modules/libpref/src/init/all.js b/modules/libpref/src/init/all.js
>+// If true generate CORRUPTED_CONTENT errors for entities that
>+// contain an invalid Content-MD5 response header
>+pref ("network.http.content-md5.enforce", false);

A space after |pref|

>diff --git a/netwerk/base/public/nsNetError.h b/netwerk/base/public/nsNetError.h
>+#define NS_ERROR_CORRUPTED_CONTENT \
>+    NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_NETWORK, 29)

The value is a bit questionable.  The highest number used is currently 81.  However, 29 is unused at the moment.  But maybe this is OK...

>diff --git a/netwerk/base/src/nsMD5Stream.cpp b/netwerk/base/src/nsMD5Stream.cpp
>new file mode 100644

For new files and the header please see:
http://www.mozilla.org/MPL/boilerplate-1.1/

>+nsMD5Stream::nsMD5Stream() :
>+mIsComplete(PR_FALSE)
>+{
>+  mCryptoHasher = do_CreateInstance(NS_CRYPTO_HASH_CONTRACTID, &mHashResult);
>+  if (NS_SUCCEEDED(mHashResult))
>+    mHashResult = mCryptoHasher->Init(nsICryptoHash::MD5);
>+}

I would a but rather more see nsMD5Stream::Init() method here.

>+nsMD5Stream::Finalize()

>+  if (NS_SUCCEEDED(mHashResult))
>+    mHashResult = mCryptoHasher->Finish(PR_TRUE, mResult);

Please throw the mCryptoHasher away here.  You will not need it and you will safe resources ASAP.

>+NS_IMETHODIMP nsMD5Stream::Close()

You should call Finalize() in this method.

>+NS_IMETHODIMP nsMD5Stream::Write(const char *aBuf,
>+                                 PRUint32 aCount,
>+                                 PRUint32 *_retval NS_OUTPARAM)
>+{
>+  if (mIsComplete && aCount)
>+    return NS_ERROR_UNEXPECTED;

Please check only for mIsComplete to return.  Even zero-bytes write should fail after you finished the stream.

>+NS_IMETHODIMP
>+nsMD5Stream::WriteSegments(nsReadSegmentFun reader,
>+                           void * closure,
>+                           PRUint32 count,
>+                           PRUint32 *_retval)
>+{
>+  if (mIsComplete && count)
>+    return NS_ERROR_UNEXPECTED;

As well here.

>diff --git a/netwerk/protocol/http/nsHttpChannel.cpp b/netwerk/protocol/http/nsHttpChannel.cpp
>@@ -769,16 +770,41 @@ nsHttpChannel::CallOnStartRequest()
>     LOG(("  calling mListener->OnStartRequest\n"));
>     nsresult rv = mListener->OnStartRequest(this, mListenerContext);
>     if (NS_FAILED(rv)) return rv;
> 
>     // install stream converter if required
>     rv = ApplyContentConversions();
>     if (NS_FAILED(rv)) return rv;
> 
>+    // If Content-MD5 is in the response header, or if we are told
>+    // to expect chunked-encoding trailers, then setup a
>+    // stream tee with the nsMD5Stream to calculate the sum of
>+    // what we really receive with any content-encoding applied
>+    //
>+    // Still calculate it even if enforcement is off by pref so that
>+    // we can log failures and use them as feedback into the
>+    // pipeline connection manager
>+    if (mResponseHead &&
>+        (mResponseHead->PeekHeader(nsHttp::Content_MD5) ||
>+         mResponseHead->PeekHeader(nsHttp::Trailer))) {
>+        nsresult rv;
>+        nsRefPtr<nsMD5Stream> md5stream = new nsMD5Stream();
>+
>+        nsCOMPtr<nsIStreamListenerTee> tee =
>+            do_CreateInstance(kStreamListenerTeeCID, &rv);
>+        if (NS_SUCCEEDED(rv)) {
>+            rv = tee->Init(mListener, md5stream, nsnull);
>+            if (NS_SUCCEEDED(rv)) {
>+                mMD5Stream = md5stream;
>+                mListener = tee;
>+            }
>+        }
>+    }
>+

Adding the MD5 stream after content conversion to the chain is against the RFC:

"The MD5 digest is computed based on the content of the entity-body,
including any content-coding that has been applied, but not including
any transfer-encoding applied to the message-body. If the message is
received with a transfer-encoding, that encoding MUST be removed
prior to checking the Content-MD5 value against the received entity."

Move your new code before call to ApplyContentConversions.  Also maybe have a helper method for it?

And also you should not add the MD5 stream for cached responses.  Maybe check for mTransaction is OK to detect it.

>@@ -3988,16 +4014,62 @@ nsHttpChannel::OnStopRequest(nsIRequest 

Please move this new code to a helper method.

>+        // verify http content-md5 if present
>+        if (mResponseHead && mMD5Stream) {

Doesn't the trailers merge to the response head?  I don't know exactly but I would expect it, worth to check for it.

>+                    // Sometimes a webserver will drop the required =
>+                    // padding. Before rejecting this lets see if that
>+                    // is the case
>+                    nsCAutoString normalizedHttpMD5(httpMD5);
>+                    while (normalizedHttpMD5.Length() < 24)
>+                        normalizedHttpMD5.Append('=');
>+                    if (!md5B64.Equals(normalizedHttpMD5)) {

You probably don't need to do this extra check when the header already has 24 bytes.

>diff --git a/netwerk/protocol/http/nsHttpTransaction.cpp b/netwerk/protocol/http/nsHttpTransaction.cpp

Remove #include "nsHttpChunkedDecoder.h" when you are doing it in the header.

>diff --git a/netwerk/test/unit/test_md5.js b/netwerk/test/unit/test_md5.js

I'll take a look at the test in a second round.

r-
Comment 38 Patrick McManus [:mcmanus] 2011-03-27 15:21:18 PDT
Thanks for taking the time for the review - can I ask you to double check one particular bit?

(In reply to comment #37)
>
> > 
> >+    // If Content-MD5 is in the response header, or if we are told
> >+    // to expect chunked-encoding trailers, then setup a
> >+    // stream tee with the nsMD5Stream to calculate the sum of
> >+    // what we really receive with any content-encoding applied
> >+    //
> >+    // Still calculate it even if enforcement is off by pref so that
> >+    // we can log failures and use them as feedback into the
> >+    // pipeline connection manager
> >+    if (mResponseHead &&
> >+        (mResponseHead->PeekHeader(nsHttp::Content_MD5) ||
> >+         mResponseHead->PeekHeader(nsHttp::Trailer))) {
> >+        nsresult rv;
> >+        nsRefPtr<nsMD5Stream> md5stream = new nsMD5Stream();
> >+
> >+        nsCOMPtr<nsIStreamListenerTee> tee =
> >+            do_CreateInstance(kStreamListenerTeeCID, &rv);
> >+        if (NS_SUCCEEDED(rv)) {
> >+            rv = tee->Init(mListener, md5stream, nsnull);
> >+            if (NS_SUCCEEDED(rv)) {
> >+                mMD5Stream = md5stream;
> >+                mListener = tee;
> >+            }
> >+        }
> >+    }
> >+
> 
> Adding the MD5 stream after content conversion to the chain is against the RFC:
> 
> "The MD5 digest is computed based on the content of the entity-body,
> including any content-coding that has been applied, but not including
> any transfer-encoding applied to the message-body. If the message is
> received with a transfer-encoding, that encoding MUST be removed
> prior to checking the Content-MD5 value against the received entity."
> 
> Move your new code before call to ApplyContentConversions.  

getting this ordering right was one of the main points of the change from rev 7 to rev 8 (comment 30).. I know I even had a live example of it in the wild to draw my attention to it - so I suspect the code is right..


> 
> And also you should not add the MD5 stream for cached responses. 

:) You're right - its probably not going to bitrot in the cache.
Comment 39 Honza Bambas (:mayhemer) 2011-03-28 02:39:00 PDT
(In reply to comment #38)
> getting this ordering right was one of the main points of the change from rev 7
> to rev 8 (comment 30).. I know I even had a live example of it in the wild to
> draw my attention to it - so I suspect the code is right..

Yes.  It is OK, the later added tees gets the un-decoded content.  Sorry, my test was backwards.  I realized this morning :)

> :) You're right - its probably not going to bitrot in the cache.

My goal is not to waste resources on cached responses for which you don't need to check MD5.
Comment 40 Honza Bambas (:mayhemer) 2011-03-28 11:06:12 PDT
(In reply to comment #38)
> getting this ordering right was one of the main points of the change from rev 7
> to rev 8 (comment 30).. I know I even had a live example of it in the wild to
> draw my attention to it - so I suspect the code is right..

Maybe also worth to extend your test for this?
Comment 41 Patrick McManus [:mcmanus] 2011-04-14 09:44:39 PDT
> >diff --git a/netwerk/base/src/nsMD5Stream.cpp b/netwerk/base/src/nsMD5Stream.cpp
> >new file mode 100644
> 
> For new files and the header please see:
> http://www.mozilla.org/MPL/boilerplate-1.1/
> 

As far as I can tell, that is the boilerplate I used. Please help me understand the specific change you would like to see?


> Doesn't the trailers merge to the response head?  I don't know exactly but I
> would expect it, worth to check for it.

confirmed they don't merge.
Comment 42 Patrick McManus [:mcmanus] 2011-04-14 09:48:57 PDT
Created attachment 526030 [details] [diff] [review]
patch to imlement http content md5 verification v11

integrate review comments from c37 and c40 .. modulo clarifications of c38 and c41.
Comment 43 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-14 10:33:05 PDT
I re-iterate my objections from comment 25. We have no evidence that anybody is using Content-MD5 and anybody that would add Content-MD5 to improve the detection of pipelining problems could implement Assoc-Req instead. IMO, we are better off without this extra code.
Comment 44 Patrick McManus [:mcmanus] 2011-04-14 11:14:21 PDT
(In reply to comment #43)
> I re-iterate my objections from comment 25. We have no evidence that anybody is
> using Content-MD5 and anybody that would add Content-MD5 to improve the
> detection of pipelining problems could implement Assoc-Req instead. IMO, we are
> better off without this extra code.

I hope your primary objection (essentially as I understood it, false positives) was addressed in comment 30 - a transaction is not aborted by default for failing an MD5 check. There is a pref if you want the aborting behavior.

By default on failure you get a console log and, most importantly to me, feedback to the pipeline engine to turn off that optimization for requests in that context. That is transparent to the user. Failing an MD5 check won't degrade your browsing experience beyond what it is today - you will continue to see data that is different than what the content owner checksummed, for better or for worse.

As far as MD5 vs Assoc-Req, I think they are complementary. MD5 is standardized and is produced by a commonly available apache module. If it is present it provides useful information. Assoc-Req is just a header in an Internet-Draft which I believe isn't even adopted by a working group; though I do have hopes for it and think we should support it too as an alternate approach. MD5 might get deprecated down the road in favor of something else but as we are just a consumer of that, it isn't really a problem.

MD5 is verifiably used on the Internet today - though sparingly. For instance, it appears hundreds of times in the dataset from the latest run of the http archive project (http://httparchive.org/downloads.php), and I know I have run into it in the wild, although I no longer have the URL, because that is what lead me to pad things out to 24 bytes if the response header didn't contain the expected "=" B64 padding. I have also seen it used in conjunction with Content-Encodings in the wild.
Comment 45 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-14 11:23:01 PDT
(In reply to comment #44)
> I hope your primary objection (essentially as I understood
> it, false positives) was addressed in comment 30

Right, that was my main objection.

> MD5 is verifiably used on the Internet today - though sparingly.

My doubt here is about how often the user will actually go to websites that has Content-MD5, and thus whether users will actually benefit from us having this logic. How many sites from Talos are using Content-MD5 in responses?

Also, are mobile transcoding proxies stripping or recalculating Content-MD5? If not, this could turn off pipelining unnecessarily on mobile networks that do such transcoding, resulting in less pipelining than we have now.
Comment 46 Patrick McManus [:mcmanus] 2011-04-14 11:42:49 PDT
(In reply to comment #45)

> logic. How many sites from Talos are using Content-MD5 in responses?
> 

I'm not familiar with how the talos content is derived - but the httparchive.org link I cited is derived from "Top Alexa/Fortune/Quantcast" sites. http://httparchive.org/about.php#listofurls

> Also, are mobile transcoding proxies stripping or recalculating Content-MD5? If
> not, this could turn off pipelining unnecessarily on mobile networks that do
> such transcoding, resulting in less pipelining than we have now.

My POV is that conservatism is the right approach here. We can loosen things up based on experience. (Another item for telemetry.)
Comment 47 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-14 12:34:08 PDT
(In reply to comment #46)
> I'm not familiar with how the talos content is derived - but the
> httparchive.org link I cited is derived from "Top Alexa/Fortune/Quantcast"
> sites. http://httparchive.org/about.php#listofurls

That sounds like a reasonable source considering our current lack of data. We should look into which sites in that list are providing Content-MD5, and what percentage of the total number of sites is, to judge whether Content-MD5 usage is significant.

> > Also, are mobile transcoding proxies stripping or
> > recalculating Content-MD5? If not, this could turn
> > off pipelining unnecessarily on mobile networks
> > that do such transcoding, resulting in less
> > pipelining than we have now.
> 
> My POV is that conservatism is the right approach
> here. We can loosen things up based on experience.
> (Another item for telemetry.)

It would be a big and unnecessary performance regression if we stopped pipelining on mobile just because of a transcoding proxy. My point is that we need to weigh this performance regression risk against the actual benefit. In Santa Cruz, we agreed that it would be great to get these pipelining patches in ASAP but we also agreed that we need to mitigate the risk of them causing regressions (perf or functionality).
Comment 48 Steve Roussey (:sroussey) 2011-04-14 13:54:39 PDT
> It would be a big and unnecessary performance regression if we stopped
> pipelining on mobile just because of a transcoding proxy

Only a potential performance regression if the site was using Content-MD5 and a transcoding proxy passed the Content-MD5 unaltered, right? So it has to go through two unlikely hoops, and then pipelining will be disabled for that one site, and performance will be the same as things stand today with FF4. Did I summarize that correctly, or am I missing something important?
Comment 49 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-14 14:24:19 PDT
(In reply to comment #48)
> and then pipelining will be disabled for that one site,

If we'd turn off pipelining on a Content-MD5 failure, it would only make sense to do it for all (non-HTTPS) sites, because we'd assume that a bad intermediary (proxy) is to blame, not a bad origin server.

> and performance will be the same as things stand today with FF4.

Firefox Mobile 4 does pipelining by default already.
Comment 50 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-14 14:26:54 PDT
(In reply to comment #49)
> If we'd turn off pipelining on a Content-MD5 failure, it would only make sense
> to do it for all (non-HTTPS) sites, because we'd assume that a bad intermediary
> (proxy) is to blame, not a bad origin server.

A consequence of this would be that that a mischievous server could turn off pipelining simply by purposely sending a bad Content-MD5 (or Assoc-Req, or anything else we use to heuristically detect bad proxies).
Comment 51 Patrick McManus [:mcmanus] 2011-04-14 14:45:29 PDT
(In reply to comment #49)
> (In reply to comment #48)

> 
> If we'd turn off pipelining on a Content-MD5 failure, it would only make sense
> to do it for all (non-HTTPS) sites, because we'd assume that a bad intermediary
> (proxy) is to blame, not a bad origin server.
> 

it is not my intention to do that.
Comment 52 Honza Bambas (:mayhemer) 2011-05-29 09:50:00 PDT
Created attachment 535944 [details] [diff] [review]
v12 (review updated and merged v11)

This is merged and slightly updated patch v11.  Interdiff follows.
Comment 53 Honza Bambas (:mayhemer) 2011-05-29 09:59:37 PDT
Created attachment 535945 [details] [diff] [review]
v11-v12 interdiff

- localizations added to mobile as well
- failure in nsMD5Stream::Init makes nsMD5Stream::Finalize behave like the stream would never be initialized
- in nsHttpChannel::SetupMD5Stream I realized checking for mTransactionPump to see if we don't load from the cache is not enough ; changed to check nsHttpChannel::IsFromCache result check
- nsHttpChannel::VerifyMD5 now makes sure the stream is no longer referenced by the channel ; tries to release resource ASAP
- test_md5.js added to the xpcshell manifest (new) and done some very small tweaks on it (indentions, redundant commans etc.)
- few comments tweaks
Comment 54 Honza Bambas (:mayhemer) 2011-05-29 10:10:02 PDT
As I'm thinking of this more, I am not a big fan of adding this check to the tree.  Not many sites are IMO using this header.  Also I'm a bit worried about time regressions for chunked encoded responses.  We may calculate MD5 (that is not cheap these days!) for chunked response that does not contain Content-MD5 header.  

We need telemetry on how often this can be used to detect problems (i.e. how often we see Content-MD5) and how often (if not more often) it is just a waste of resources (we have chunked response w/ trailers not providing Content-MD5).

Brian, I have heard about making hashing more efficient (not based on NSS) in the future.  How far is that work and what bug (if any) is that being tracked in?
Comment 55 Patrick McManus [:mcmanus] 2011-05-29 11:10:33 PDT
  Not many sites are IMO using this header.  Also I'm a bit worried
> about time regressions for chunked encoded responses.  We may calculate MD5
> (that is not cheap these days!) for chunked response that does not contain
> Content-MD5 header.  
>

I am not worried at all. We will only calculate the checksum either in the presence of a content-md header, or in the presence of a "trailers" header, in which case it is appropriate. In practice trailers is only ever used in practice in conjunction with MD5, as it is the only thing that makes sense to put there. I'm pretty sure I've never seen anything else there.
Comment 56 Patrick McManus [:mcmanus] 2011-05-31 15:04:37 PDT
Created attachment 536429 [details] [diff] [review]
telemetry v1

telemetry patch to figure out more about md5.

Each http transaction is classified as one of:
 kMD5None (no md5 calculated)
 kMD5TrailerAdvertised (calculated because of a trailer header, but no trailers found)
 kMD5TrailerWithout (there was a trailer section, but it didn't contain md5)
 kMDTrailerWith (calculated and compared to a md5 in the trailer)
 kMD5Header (calculated and compared to value in header)

Additionally we track the fraction of passes/fails when we do a comparison.

I've got some questions for taras about the API, but this is a start
Comment 57 Honza Bambas (:mayhemer) 2011-05-31 16:06:24 PDT
UMA_HISTOGRAM_ENUMERATION is probably more useful:

UMA_HISTOGRAM_ENUMERATION("Name", value, valueMax + 1);
Comment 58 Honza Bambas (:mayhemer) 2011-05-31 16:11:10 PDT
BTW: the patch for telemetry should not be dependent on the MD5 header check patch.  As I understand we want the usage numbers (how many sites use this header) to see if it makes sense to land this patch.  If we decide otherwise, then the telemetry should be part of the main patch (so just qfold it).
Comment 59 Patrick McManus [:mcmanus] 2011-05-31 16:47:02 PDT
(In reply to comment #58)
> BTW: the patch for telemetry should not be dependent on the MD5 header check
> patch.  As I understand we want the usage numbers (how many sites use this
> header) to see if it makes sense to land this patch.

I don't agree with that. I think the md5 patch is quite conservative and meets our normal standards and telemetry is not yet useful being preffed off and all. Therefore the feature should not be blocked by the data at this point in our life cycle.

I think gathering the data is good so we can adjust in flight as necessary - there is always another train. but there is nothing here to suggest to me there is anything to be worried about; certainly not a major issue. Check httparchive.org to find the paucity of Trailers headers.
Comment 60 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-31 17:14:26 PDT
This checking of Content-MD5 shouldn't be a blocker for pipelining. Since the vast majority of sites do not use Content-MD5, and we don't expect that to change anytime soon, pipelining must work correctly (approximately) 100% of the time even without Content-MD5, in order for pipelining to be turned on by default. A website that wants to add Content-MD5 to get pipelining would be better off implementing Assoc-Req instead, AFAICT, and only if pipelining were to being an opt-in feature (off by default).

Content-MD5 checking isn't risk-free. Since Content-MD5 can result in false positive indications (e.g. content transforming proxies) for pipelining failures, a Content-MD5 mismatch runs the risk of being an unnecessary performance regression for mobile, where we already have pipelining turned on. 

> Brian, I have heard about making hashing more efficient
> (not based on NSS) in the future.  How far is that work
> and what bug (if any) is that being tracked in?

AFAIK, the overhead of NSS's wrappers around the MD5 implementation is not a performance problem. The performance problem with hashing/hmacing is in Ts, because NSS DLLs are slow to load; that will be fixed when NSS gets merged into libxul.
Comment 61 Patrick McManus [:mcmanus] 2011-05-31 17:27:23 PDT
(In reply to comment #60)

> Content-MD5 checking isn't risk-free. Since Content-MD5 can result in false
> positive indications (e.g. content transforming proxies) for pipelining
> failures, a Content-MD5 mismatch runs the risk of being an unnecessary
> performance regression for mobile, where we already have pipelining turned
> on. 
> 

That's not a false positive - we have identified an intermediary that is not http compliant that is messing with the protocol stream. That's exactly when we want to turn pipelining off.
Comment 62 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-31 17:36:20 PDT
(In reply to comment #61)
> > Content-MD5 checking isn't risk-free. Since Content-MD5 can result in false
> > positive indications (e.g. content transforming proxies) for pipelining
> > failures, a Content-MD5 mismatch runs the risk of being an unnecessary
> > performance regression for mobile, where we already have pipelining turned
> > on. 
> 
> That's not a false positive - we have identified an intermediary that is not
> http compliant that is messing with the protocol stream. That's exactly when
> we want to turn pipelining off.

Content transformation proxies, as nasty as they might be, aren't (necessarily) non-compliant; if they caused problems with pipelining we wouldn't be able to have pipelining enabled on mobile even now, as they are extremely common and are becoming more common as time passes. (They were recently introduced by some US carriers / ISPs, for example.)
Comment 63 Patrick McManus [:mcmanus] 2011-05-31 17:41:48 PDT
Created attachment 536489 [details] [diff] [review]
telemetry v2

use BOOLEAN and ENUMERATION histogram types - which makes a nicer patch, but the output in about:telemetry is the same
Comment 64 Patrick McManus [:mcmanus] 2011-05-31 21:52:49 PDT
(In reply to comment #62)
> (In reply to comment #61)

> Content transformation proxies, as nasty as they might be, aren't
> (necessarily) non-compliant;

If they fail to update the Content-MD5 they are non-compliant. I don't see any wiggle room on that.

A busted MD5 signature is a very strong sign something is amiss. The presence of the hash is a strong indication that the content value is important to the provider and that they want it checked. And its all standards driven (unlike assoc-req, which I'm also a fan of.). It doesn't make any sense to ignore it.

I think you make far too much of the mobile case. I'm the one that turned it on, and if there had been a convenient way to limit it to known good paths at that time I would have. This patch will only impact a few cases and it will make them safer.
Comment 65 Patrick McManus [:mcmanus] 2011-06-06 10:43:00 PDT
Created attachment 537587 [details] [diff] [review]
error page for ns_error_corrupted content v1 [landed as 662414]

error page for ns_error_corrupted_content broken out from real md5 portion of this patch - the error code is used in 597706.
Comment 66 Patrick McManus [:mcmanus] 2011-06-06 10:44:00 PDT
Created attachment 537588 [details] [diff] [review]
v13

carry forward r+ honzab

this is just v12 with the error page for ns_error_corrupted_content broken into separate patch
Comment 67 Jason Duell [:jduell] (needinfo? me) 2011-06-06 15:44:47 PDT
Re: whether to consider this a worthy thing to put in the tree:  I don't have an opinion yet--hopefully y'all can help me get one :)

Here's my understanding of the situation (please correct me if I'm wrong).

This code only affects channels that encounter an MD-5 header (which is rare). We don't turn off pipelining globally if there's an MD-5 mismatch, and we don't even (currently in the patch) report an error upstream, we just turn off pipelining for that site (and re-try the channel w/o pipelining on?).  If all that is correct, it sounds like the only problematic case in terms of performance are sites that use MD-5 but wind up hitting a content-transformation proxy that doesn't rewrite the MD-5 header (in that case such sites won't be pipelined).  Is that all correct?

If so then I agree with Patrick that this patch seems relatively low risk (at least perf-wise).   However, I'm so far inclined to also agree with Honza and Brian that MD-5 checking doesn't seem very crucial (given its infrequent use), shouldn't be a blocker for pipelining in general, and might not be the best use of our precious, precious time (Patrick: are you anticipating wider user of the header?).

That said, we've already got a +r patch.  We can 1) scrap it and mark WONTFIX; 2) collect telemetry data first, and then have data to argue about landing the feature or not; or 3) land it.  I'm really not sure of the answer.  

- Do other browsers implement MD5?

- What telemetry data would convince us one way or another to land or not? Large #s of transforming proxies messing up MD5?  What else would actually be relevant?  

- comment 25: "note that various government regulations prevent us from using the MD5 algorithm as an anti-tampering check":  is that relevant to our check here?
Comment 68 Honza Bambas (:mayhemer) 2011-06-07 08:20:07 PDT
The MD5 check patch is an addition to the pipelining fail-detect code.  We must be able to turn pipelining on by default w/o this patch.

This is a nice to have feature, but not very important (Patrick we have heard your arguments, but you agreed on collecting stats).

Before we decide to land this, we want a telemetry to see how much MD5 header is used in the open Internet.  I believe not widely. But now we can have those numbers.

What I want to do before we land this is to have those numbers.  Before that the MD5 check patch should not be landed.
Comment 69 Patrick McManus [:mcmanus] 2011-11-15 11:08:53 PST
HTTP bis (somewhere before draft 17) dropped MD5 support from HTTP.
http://tools.ietf.org/wg/httpbis/draft-ietf-httpbis-p3-payload/

I will declare defeat and move on.
Comment 70 Matthias-Christian Ott 2012-06-06 17:54:57 PDT
It's still part of HTTP/1.1, so the newer draft changes nothing about the header. Please reopen.
Comment 71 Ant Bryan 2012-06-11 18:31:54 PDT
I believe the Digest header (RFC 3230: Instance Digests in HTTP) is the successor to the deprecated Content-MD5.

you can use any hash from http://www.iana.org/assignments/hash-function-text-names/hash-function-text-names.xml - MD5, SHA-1, SHA-256, SHA-512, etc.

we use Digest in RFC 6249: Metalink/HTTP.

Note You need to log in before you can comment on or make changes to this bug.