Closed Bug 1098004 Opened 10 years ago Closed 10 years ago

implement snappy compression streams

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla37

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

Attachments

(1 file, 5 obsolete files)

Attached patch snappy_stream.patch (obsolete) — Splinter Review
We currently have the snappy compressor in:

  other-licenses/snappy

This core lib provides the compression algorithm for fixed blocks, but no ability to process data streams.  It specifies what the streaming framing protocol should look like, but does not implement it:

  http://dxr.mozilla.org/mozilla-central/source/other-licenses/snappy/src/framing_format.txt

For the ServiceWorker Cache work I would like a fast, streaming compression algorithm that works with nsIInputStream and friends.  Since our existing deflate infrastructure only works with nsStreamListener I chose to implement this framing protocol.

Note, the lz4 compressor is also in the tree and is perhaps faster than snappy.  Unfortunately its streaming API is still experimental and somewhat complex. (I also went quite far down the snappy rabbit hole before learning of lz4.)  It would be nice to provide lz4 stream in the future, if possible, but that does not preclude snappy streams.

The attached patch has my current work-in-progress.  While I've done basic testing with my Cache code, this still needs a dedicated unit test.  I also still need to do some cleanup and documentation.
Here is a cleaned up version with gtests.

Works locally against maple and just started a try against m-c:

  https://tbpl.mozilla.org/?tree=Try&rev=1332b08eaaa3

Some things I wasn't quite sure about:

Should I hide the classes behind factory methods like NS_NewSnappyCompressOutputStream()?

The snappy framing algorithm uses CRC32c which is different from the CRC32 in zlib, png, etc.  I brought the file in from the FreeBSD repo and just stuck it in the same directory.  Should it go somewhere else?

Thanks!
Attachment #8521745 - Attachment is obsolete: true
Attachment #8523049 - Flags: review?(nfroyd)
Fix compile issues on windows.
Attachment #8523049 - Attachment is obsolete: true
Attachment #8523049 - Flags: review?(nfroyd)
Attachment #8523153 - Flags: review?(nfroyd)
Blocks: 940273
Comment on attachment 8523153 [details] [diff] [review]
Implement snappy compression framing protocol as nsI(Input|Output)Streams. (v2)

Review of attachment 8523153 [details] [diff] [review]:
-----------------------------------------------------------------

I think it's not worth hiding things behind factory methods nowadays, since everything lives in libxul.

The crc32c.{c,h} files are fine where they are.

Not cancelling the review to remind myself I still need to look at things...I didn't stare at the compression/decompression loops very carefully yet.  Looks good on the whole, but lots of little comments so far.
.

::: xpcom/io/crc32c.c
@@ +49,5 @@
> +#include "crc32c.h"
> +#include <stddef.h>
> +#include <stdint.h>
> +
> +const uint32_t crc32_tab[] = {

This doesn't seem to be used?

@@ +700,5 @@
> +		crc ^= (*p_buf++) << 24;
> +#else
> +		crc ^= *(const uint32_t *) p_buf;
> +		p_buf += 4;
> +#endif

I realize this is not your code, but:

- We should be using mozilla/Endian.h and the MOZ_{LITTLE,BIG}_ENDIAN macros;
- Actually, even better, this whole thing should just be:

  crc ^= mozilla::LittleEndian::readUint32(p_buf);
  p_buf += 4;

which is more efficient, and avoids aliasing violations.

@@ +723,5 @@
> +		    term1 ^
> +		    sctp_crc_tableil8_o40[term2 & 0x000000FF] ^
> +		    sctp_crc_tableil8_o32[(term2 >> 8) & 0x000000FF];
> +		p_buf += 4;
> +#endif

Same thing here, although it is less obvious to me how to remove the #if-age here.  I guess you'd say:

  uint32_t u32 = LittleEndian::readUint32(p_buf);
  p_buf++;

and just use the little-endian tables?

@@ +753,5 @@
> +{
> +	if (length < 4) {
> +		return (singletable_crc32c(crc32c, buffer, length));
> +	} else {
> +		return (multitable_crc32c(crc32c, buffer, length));

Do you have data on how much faster all these extra tables make things go?  8KB of space, while not a lot, seems like it's not worth it unless the speed increase is just incredible.

::: xpcom/io/crc32c.h
@@ +9,5 @@
> +
> +uint32_t
> +calculate_crc32c(uint32_t crc32c,
> +    const unsigned char *buffer,
> +    unsigned int length);

|size_t length|?  Or is this one of those things where crc32 on > 4GB is not really well-defined?  If so, this should probably just be uint32_t for extra documentation.

::: xpcom/io/nsSnappyCompressOutputStream.cpp
@@ +78,5 @@
> +NS_IMETHODIMP
> +nsSnappyCompressOutputStream::Write(const char* aBuf, uint32_t aCount,
> +                                  uint32_t* aResultOut)
> +{
> +  return WriteSegments(NS_CopySegmentToBuffer, const_cast<char*>(aBuf), aCount,

It makes me sad that we have to do this const_cast...

@@ +92,5 @@
> +NS_IMETHODIMP
> +nsSnappyCompressOutputStream::WriteSegments(nsReadSegmentFun aReader,
> +                                          void* aClosure,
> +                                          uint32_t aCount,
> +                                          uint32_t* aBytesWrittenOut)

Nit: these parameters don't look lined up with the first parameter.  (Actually, this seems to be a problem everywhere you have multi-line parameters in this patch.  Please fix all instances.)

@@ +125,5 @@
> +    uint32_t numRead = 0;
> +
> +    nsresult rv = aReader(this, aClosure, mBuffer.get() + mNextByte,
> +                          *aBytesWrittenOut, numToRead, &numRead);
> +    if (NS_WARN_IF(NS_FAILED(rv)) || numRead < 1) {

Why are we swallowing the error condition here?

@@ +181,5 @@
> +
> +  // Write the compressed buffer out to the base stream
> +  uint32_t numWritten = 0;
> +  rv = WriteAll(mCompressedBuffer.get(), compressedLength, &numWritten);
> +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }

I guess we are guaranteed that NS_SUCCEEDED(rv) <=> numWritten == compressedLength?

@@ +205,5 @@
> +
> +  // Write the compressed buffer out to the base stream
> +  uint32_t numWritten = 0;
> +  rv = WriteAll(mCompressedBuffer.get(), compressedLength, &numWritten);
> +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }

Same comment here.

::: xpcom/io/nsSnappyCompressOutputStream.h
@@ +10,5 @@
> +#include "nsAutoPtr.h"
> +#include "nsCOMPtr.h"
> +#include "nsIOutputStream.h"
> +#include "nsSnappyFrameUtils.h"
> +#include "nsISupportsImpl.h"

Nit: please sort these alphabetically.

@@ +13,5 @@
> +#include "nsSnappyFrameUtils.h"
> +#include "nsISupportsImpl.h"
> +
> +class nsSnappyCompressOutputStream : public nsIOutputStream
> +                                   , public nsSnappyFrameUtils

Perhaps use |protected| inheritance for this, so we're not exposing nsSnappyFrameUtils as part of the stream API?  Maybe even put nsSnappyFrameUtils.h in the detail:: namespace?

@@ +43,5 @@
> +  const size_t mBlockSize;
> +
> +  // Buffer holding copied uncompressed data.  This must be copied here
> +  // so that the compression can be performed on a single flat buffer.
> +  nsAutoArrayPtr<char> mBuffer;

Please use mozilla::UniquePtr.

@@ +49,5 @@
> +  // The next byte in the uncompressed data to copy incoming data to.
> +  uint32_t mNextByte;
> +
> +  // Buffer holding the resulting compressed data.
> +  nsAutoArrayPtr<char> mCompressedBuffer;

Likewise.

@@ +54,5 @@
> +  size_t mCompressedBufferLength;
> +
> +  // Have we flushed any data to the base stream yet?  If not we will
> +  // need to write out a StreamIdentifier chunk.
> +  bool mFirstFlush;

Let's get away from bool usages and do something like:

enum StreamIdentifierStatus {
  StreamIdentifierNeeded,
  StreamIdentifierWritten,
} mStreamIdentifierStatus;

::: xpcom/io/nsSnappyFrameUtils.cpp
@@ +47,5 @@
> +{
> +  aBuf[0] = aVal & 0x000000ff;
> +  aBuf[1] = (aVal & 0x0000ff00) >> 8;
> +  aBuf[2] = (aVal & 0x00ff0000) >> 16;
> +  aBuf[3] = (aVal & 0xff000000) >> 24;

Please use mozilla/Endian.h bits for this.

@@ +56,5 @@
> +  uint32_t val = 0;
> +  val |= static_cast<uint32_t>(aBuf[0]) & 0xff;
> +  val |= (static_cast<uint32_t>(aBuf[1]) & 0xff) << 8;
> +  val |= (static_cast<uint32_t>(aBuf[2]) & 0xff) << 16;
> +  val |= (static_cast<uint32_t>(aBuf[3]) & 0xff) << 24;

Likewise.

@@ +101,5 @@
> +  aDest[5] = 0x4e;
> +  aDest[6] = 0x61;
> +  aDest[7] = 0x50;
> +  aDest[8] = 0x70;
> +  aDest[9] = 0x59;

Maybe static_assert(kHeaderLength + kStreamIdentifierDataLength == 10)?

@@ +103,5 @@
> +  aDest[7] = 0x50;
> +  aDest[8] = 0x70;
> +  aDest[9] = 0x59;
> +
> +  *aBytesWrittenOut = 10;

= kHeaderLength + kStreamIdentifierDataLength?

@@ +112,5 @@
> +// static
> +nsresult
> +nsSnappyFrameUtils::WriteCompressedData(char* aDest, size_t aDestLength,
> +                                      const char* aData, size_t aDataLength,
> +                                      size_t* aBytesWrittenOut)

Nit: some peculiar indentation here.

@@ +124,5 @@
> +
> +  size_t offset = 0;
> +
> +  WriteChunkType(aDest, CompressedData);
> +  offset += 1;

kChunkTypeLength?

@@ +128,5 @@
> +  offset += 1;
> +
> +  // skip length for now and write it out after we know the compressed length
> +  size_t lengthOffset = offset;
> +  offset += 3;

kStoredChunkLength?

@@ +194,5 @@
> +                                        size_t* aBytesReadOut)
> +{
> +  *aBytesWrittenOut = 0;
> +  *aBytesReadOut = 0;
> +  if (NS_WARN_IF(aDataLength != kStreamIdentifierDataLength ||

Shouldn't this be aDataLength < kStreamIdentifierDataLength?

@@ +203,5 @@
> +                 aData[4] != 0x70 ||
> +                 aData[5] != 0x59)) {
> +    return NS_ERROR_CORRUPTED_CONTENT;
> +  }
> +  *aBytesWrittenOut = 0;

Redundant write here?

@@ +257,5 @@
> +size_t
> +nsSnappyFrameUtils::MaxCompressedBufferLength(size_t aSourceLength)
> +{
> +  size_t neededLength = 1; // chunk type
> +  neededLength += 3;         // chunk length

kStoredChunkLength?

@@ +258,5 @@
> +nsSnappyFrameUtils::MaxCompressedBufferLength(size_t aSourceLength)
> +{
> +  size_t neededLength = 1; // chunk type
> +  neededLength += 3;         // chunk length
> +  neededLength += 4;         // crc32

kCRC32Length?

::: xpcom/io/nsSnappyUncompressInputStream.h
@@ +10,5 @@
> +#include "nsSnappyFrameUtils.h"
> +#include "nsAutoPtr.h"
> +#include "nsCOMPtr.h"
> +#include "nsIInputStream.h"
> +#include "nsISupportsImpl.h"

Nit: please sort these alphabetically.

@@ +43,5 @@
> +  nsAutoArrayPtr<char> mCompressedBuffer;
> +
> +  // Buffer storing the resulting uncompressed data.  Exactly snappy::kBlockSize
> +  // bytes long.
> +  nsAutoArrayPtr<char> mBuffer;

Please use mozilla::UniquePtr if you can instead of nsAutoArrayPtr.  I know it's not nearly as convenient for things like this, because there's no implicit conversion to T*.

@@ +60,5 @@
> +  size_t mNextChunkDataLength;
> +
> +  // The stream must begin with a StreamIdentifier chunk.  Are we still
> +  // expecting it?
> +  bool mNeedFirstStreamIdentifier;

Same thing with the bool member as in nsSnappyCompressOutputStream.  Maybe unify the enums and put them in nsSnappyFrameUtils?
Attachment #8523153 - Flags: feedback+
Comment on attachment 8523153 [details] [diff] [review]
Implement snappy compression framing protocol as nsI(Input|Output)Streams. (v2)

Review of attachment 8523153 [details] [diff] [review]:
-----------------------------------------------------------------

The guts of the streams look right, just a few additional comments below.

::: xpcom/io/nsSnappyCompressOutputStream.cpp
@@ +117,5 @@
> +      nsresult rv = FlushToBaseStream();
> +      if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
> +
> +      // Now the entire buffer should be available for copying
> +      remaining = mBlockSize - mNextByte;

Maybe instead:

  MOZ_ASSERT(mNextByte);
  remaining = mBlockSize;

?

::: xpcom/tests/gtest/TestSnappyStreams.cpp
@@ +33,5 @@
> +  uint32_t offset = 0;
> +  while (aNumBytes > 0) {
> +    uint8_t setIndex;
> +    size_t randomSize = PR_GetRandomNoise(&setIndex, sizeof(setIndex));
> +    ASSERT_EQ(sizeof(setIndex), randomSize);

I worry about this a little bit, as test runs will not be deterministic.  WDYT about using something more predictable?

@@ +40,5 @@
> +    const char* dataSet = dataSets[setIndex];
> +    uint32_t dataSetLength = strlen(dataSet);
> +
> +    for (uint32_t i = 0; i < dataSetLength && aNumBytes > 0; ++i) {
> +      aDataOut[offset] = dataSet[i];

I think this is more compactly expressed as:

  uint32_t amount = std::min(dataSetLength, aNumBytes);
  aDataOut.InsertElementsAt(offset, dataSet[i], amount);
  offset += amount;

?  Alternatively, you could use SetCapacity, and use AppendElements() here so you wouldn't have to maintain |offset|.

@@ +127,5 @@
> +  for (uint32_t i = 0; i < inputData.Length(); ++i) {
> +    EXPECT_EQ(inputData[i], outputData.get()[i]) << "Byte " << i;
> +  }
> +}
> +

You should have some tests for error conditions, too, corrupted streams and the like.
Attachment #8523153 - Flags: review?(nfroyd)
Oh, also meant to suggest: you could use nsTArrays for the compression buffers, allocating them with (fallible) SetLength() calls, instead of UniquePtr.  That might make using them a little nicer...
Thanks for the review!  I have a couple questions below, so NI.

> ::: xpcom/io/crc32c.c

As discussed on IRC, I'll strip crc32c.c down to its single table implementation.

> @@ +700,5 @@
> > +		crc ^= (*p_buf++) << 24;
> > +#else
> > +		crc ^= *(const uint32_t *) p_buf;
> > +		p_buf += 4;
> > +#endif
> 
> I realize this is not your code, but:
> 
> - We should be using mozilla/Endian.h and the MOZ_{LITTLE,BIG}_ENDIAN macros;
> - Actually, even better, this whole thing should just be:
> 
>   crc ^= mozilla::LittleEndian::readUint32(p_buf);
>   p_buf += 4;
> 
> which is more efficient, and avoids aliasing violations.

I'll update crc32c to use our macros as well.

> 
> @@ +125,5 @@
> > +    uint32_t numRead = 0;
> > +
> > +    nsresult rv = aReader(this, aClosure, mBuffer.get() + mNextByte,
> > +                          *aBytesWrittenOut, numToRead, &numRead);
> > +    if (NS_WARN_IF(NS_FAILED(rv)) || numRead < 1) {
> 
> Why are we swallowing the error condition here?

Per nsIOutputStream.idl documentation for nsReadSegmentFun:

  "Errors are never passed to the caller of WriteSegments."

I should drop the NS_WARN_IF() here for the same reason.

> @@ +181,5 @@
> > +
> > +  // Write the compressed buffer out to the base stream
> > +  uint32_t numWritten = 0;
> > +  rv = WriteAll(mCompressedBuffer.get(), compressedLength, &numWritten);
> > +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
> 
> I guess we are guaranteed that NS_SUCCEEDED(rv) <=> numWritten ==
> compressedLength?

Since we only support blocking base streams we'll either complete the entire write or the stream will end up closed.  In the closed case we could end up with numWritten != compressedLength.  But we return the error in that case and count on the base stream to continue throwing that error.


> @@ +13,5 @@
> > +class nsSnappyCompressOutputStream : public nsIOutputStream
> > +                                   , public nsSnappyFrameUtils
> 
> Perhaps use |protected| inheritance for this, so we're not exposing
> nsSnappyFrameUtils as part of the stream API?  Maybe even put
> nsSnappyFrameUtils.h in the detail:: namespace?

This was the main thing making me wonder if a factory method was worth it to hide this.  I guess protected inheritence and detail:: namespace should be good enough.

> @@ +54,5 @@
> > +  size_t mCompressedBufferLength;
> > +
> > +  // Have we flushed any data to the base stream yet?  If not we will
> > +  // need to write out a StreamIdentifier chunk.
> > +  bool mFirstFlush;
> 
> Let's get away from bool usages and do something like:
> 
> enum StreamIdentifierStatus {
>   StreamIdentifierNeeded,
>   StreamIdentifierWritten,
> } mStreamIdentifierStatus;

So, I understand trying to avoid bools in argument lists since that leads to confusing code like "DoStuff(true)", but this is not one of those cases.  Its an internal boolean flag that is named for what it does.  I think something like "mFirstFlush = true" is pretty self documenting.  It won't end up with bare boolean usage.

I think removing all use of bools everywhere is excessive and will make the code more obfuscated.  For example, using an enum requires the developer to go find the definition to determine if it only has two states or more, etc.

I'm happy to rename the flag to something like "mStreamIdentifierWritten = true", but I'd prefer to avoid using an enum here.  Are you willing to bend on this one?

> ::: xpcom/io/nsSnappyFrameUtils.cpp
> @@ +47,5 @@
> > +{
> > +  aBuf[0] = aVal & 0x000000ff;
> > +  aBuf[1] = (aVal & 0x0000ff00) >> 8;
> > +  aBuf[2] = (aVal & 0x00ff0000) >> 16;
> > +  aBuf[3] = (aVal & 0xff000000) >> 24;
> 
> Please use mozilla/Endian.h bits for this.

Are the Endian.h utilities guaranteed to work on non-aligned bytes?  I believe the compressed stream can be misaligned due to the variable compressed block size.  For example, you can see that snappy uses memcpy to read pieces of the stream on certain platforms:

  http://dxr.mozilla.org/mozilla-central/source/other-licenses/snappy/src/snappy-stubs-internal.h#196

It looks like they should work for unaligned data, but I'm not sure.

> @@ +194,5 @@
> > +                                        size_t* aBytesReadOut)
> > +{
> > +  *aBytesWrittenOut = 0;
> > +  *aBytesReadOut = 0;
> > +  if (NS_WARN_IF(aDataLength != kStreamIdentifierDataLength ||
> 
> Shouldn't this be aDataLength < kStreamIdentifierDataLength?

No.  The chunk data length is specified in the stream in the chunk header.  If the chunk header does not specify the correct data length for the StreamIdentifier then its a corrupted stream.

> ::: xpcom/tests/gtest/TestSnappyStreams.cpp
> @@ +33,5 @@
> > +  uint32_t offset = 0;
> > +  while (aNumBytes > 0) {
> > +    uint8_t setIndex;
> > +    size_t randomSize = PR_GetRandomNoise(&setIndex, sizeof(setIndex));
> > +    ASSERT_EQ(sizeof(setIndex), randomSize);
> 
> I worry about this a little bit, as test runs will not be deterministic. 
> WDYT about using something more predictable?

Yea, I was thinking about that too.  I'll pre-generate some random data and store it in a static array.
Flags: needinfo?(nfroyd)
(In reply to Ben Kelly [:bkelly] from comment #7)
> > @@ +125,5 @@
> > > +    uint32_t numRead = 0;
> > > +
> > > +    nsresult rv = aReader(this, aClosure, mBuffer.get() + mNextByte,
> > > +                          *aBytesWrittenOut, numToRead, &numRead);
> > > +    if (NS_WARN_IF(NS_FAILED(rv)) || numRead < 1) {
> > 
> > Why are we swallowing the error condition here?
> 
> Per nsIOutputStream.idl documentation for nsReadSegmentFun:
> 
>   "Errors are never passed to the caller of WriteSegments."
> 
> I should drop the NS_WARN_IF() here for the same reason.

That seems sort of bizarre, but OK.  WDYT about rewriting it like so:

  if (NS_FAILED(rv)) {
    // WriteSegments never passes errors to its caller.
    return NS_OK;
  }
  if (numRead < 1) {
    return NS_OK;
  }

Separating things into two conditions seems to read better for me, since there are really two separate things that you're checking even though they both return the same value, but I'm not going to insist.

> > @@ +54,5 @@
> > > +  size_t mCompressedBufferLength;
> > > +
> > > +  // Have we flushed any data to the base stream yet?  If not we will
> > > +  // need to write out a StreamIdentifier chunk.
> > > +  bool mFirstFlush;
> > 
> > Let's get away from bool usages and do something like:
> > 
> > enum StreamIdentifierStatus {
> >   StreamIdentifierNeeded,
> >   StreamIdentifierWritten,
> > } mStreamIdentifierStatus;
>
> I think removing all use of bools everywhere is excessive and will make the
> code more obfuscated.  For example, using an enum requires the developer to
> go find the definition to determine if it only has two states or more, etc.

This is a reasonable objection.  I think for something like this, it's more important that the use sites are self-documenting, rather than the declaration of the member, so I'm not too worried about having to think about multiple enums.

I have found recently that even when I have a logically boolean thing, using a two-valued enum makes the use-sites of that thing slightly more readable, even for things like member variables.  Usually this is because the condition is now something like:

  if (thing.buffering == Supported)

rather than:

  if (thing.mBuffering) // or maybe mBufferingSupported

YMMV.

> I'm happy to rename the flag to something like "mStreamIdentifierWritten =
> true", but I'd prefer to avoid using an enum here.  Are you willing to bend
> on this one?

I do think that's a better name for this particular case.

I'm not going to be strict about it, but I would strongly encourage trying it out in a couple of places.

> > ::: xpcom/io/nsSnappyFrameUtils.cpp
> > @@ +47,5 @@
> > > +{
> > > +  aBuf[0] = aVal & 0x000000ff;
> > > +  aBuf[1] = (aVal & 0x0000ff00) >> 8;
> > > +  aBuf[2] = (aVal & 0x00ff0000) >> 16;
> > > +  aBuf[3] = (aVal & 0xff000000) >> 24;
> > 
> > Please use mozilla/Endian.h bits for this.
> 
> Are the Endian.h utilities guaranteed to work on non-aligned bytes?

Yes.  They use memcpy for all operations, which the compiler is free to optimize.

> > @@ +194,5 @@
> > > +                                        size_t* aBytesReadOut)
> > > +{
> > > +  *aBytesWrittenOut = 0;
> > > +  *aBytesReadOut = 0;
> > > +  if (NS_WARN_IF(aDataLength != kStreamIdentifierDataLength ||
> > 
> > Shouldn't this be aDataLength < kStreamIdentifierDataLength?
> 
> No.  The chunk data length is specified in the stream in the chunk header. 
> If the chunk header does not specify the correct data length for the
> StreamIdentifier then its a corrupted stream.

OK, thanks for the clarification.

> > ::: xpcom/tests/gtest/TestSnappyStreams.cpp
> > @@ +33,5 @@
> > > +  uint32_t offset = 0;
> > > +  while (aNumBytes > 0) {
> > > +    uint8_t setIndex;
> > > +    size_t randomSize = PR_GetRandomNoise(&setIndex, sizeof(setIndex));
> > > +    ASSERT_EQ(sizeof(setIndex), randomSize);
> > 
> > I worry about this a little bit, as test runs will not be deterministic. 
> > WDYT about using something more predictable?
> 
> Yea, I was thinking about that too.  I'll pre-generate some random data and
> store it in a static array.

Great, thank you.
Flags: needinfo?(nfroyd)
I believe this addresses all the issues based on the discussion so far.

Note, I had to expand the assert against sync streams in nsSnappyUncompressInputStream because nsStringInputStream is quite odd.  It claims non-blocking, but its not an nsIAsyncInputStream.  So I have to check if something is blocking and if its not, verify its not async.  I can't just QI to check the async interface because nsPipeInputStream can be blocking, but still implements the async interface.

I also implemented the 24-bit read/write functions using NativeEndian and memcpy.

Thanks again for the review!
Attachment #8523153 - Attachment is obsolete: true
Attachment #8525710 - Flags: review?(nfroyd)
Comment on attachment 8525710 [details] [diff] [review]
Implement snappy compression framing protocol as nsI(Input|Output)Streams. (v3)

Review of attachment 8525710 [details] [diff] [review]:
-----------------------------------------------------------------

General comments:

- I think the |< 1| tests are more idiomatic as |== 0|.
- |mUniquePtr.get() + offset| can be slightly nicer as |&mUniquePtr[offset]|.

Lots more comments below.

::: xpcom/io/crc32c.h
@@ +8,5 @@
> +extern "C" {
> +#endif
> +
> +uint32_t
> +singletable_crc32c(uint32_t crc, const void *buf, size_t size);

I think it's not worth calling this |singletable_crc32c| anymore, since that's just an implementation detail.  Let's call it |ComputeCrc32C|, and provide a brief blurb about which CRC polynomial it uses here?

Bonus points for using Gecko's |a|-prefix for arguments here.

::: xpcom/io/nsSnappyCompressOutputStream.cpp
@@ +20,5 @@
> +
> +nsSnappyCompressOutputStream::nsSnappyCompressOutputStream(nsIOutputStream* aBaseStream,
> +                                                           size_t aBlockSize)
> + : mBaseStream(aBaseStream)
> + , mBlockSize(std::min(aBlockSize, kMaxBlockSize))

Is there a minimum block size for snappy?  MOZ_ASSERT(mBlockSize > 0) might be a good start...

@@ +111,5 @@
> +    }
> +  }
> +
> +  while (aCount > 0) {
> +    // Determine how much space is left in our flat, uncompressed buffer

Nit: sentences always end with a period.  Would be good to fix up other instances, too.

@@ +112,5 @@
> +  }
> +
> +  while (aCount > 0) {
> +    // Determine how much space is left in our flat, uncompressed buffer
> +    uint32_t remaining = mBlockSize - mNextByte;

Probably worth a MOZ_ASSERT somewhere that |mNextByte < mBlockSize|.

mNextByte and mBlockSize are also different types, so it's probably worth either changing mNextByte's type, or using explicit casts to avoid warnings.

@@ +114,5 @@
> +  while (aCount > 0) {
> +    // Determine how much space is left in our flat, uncompressed buffer
> +    uint32_t remaining = mBlockSize - mNextByte;
> +
> +    // If its full, then compress and flush the data to the base stream

Nit: "it's".

@@ +176,5 @@
> +      return NS_ERROR_OUT_OF_MEMORY;
> +    }
> +  }
> +
> +  // The first chunk must be a StreamIdentifier chunk.  Write the out

Did you mean to say "write it out" here?  I'm not sure what "the out" refers to otherwise.

@@ +182,5 @@
> +  nsresult rv = MaybeFlushStreamIdentifier();
> +  if (NS_WARN_IF(NS_FAILED(rv))) { return rv; }
> +
> +  // Compress the data to our internal compressed buffer
> +  size_t compressedLength;

Should we be asserting that we have some amount of data to compress here?

::: xpcom/io/nsSnappyCompressOutputStream.h
@@ +12,5 @@
> +#include "nsIOutputStream.h"
> +#include "nsISupportsImpl.h"
> +#include "nsSnappyFrameUtils.h"
> +
> +class nsSnappyCompressOutputStream : public nsIOutputStream

This should be MOZ_FINAL.

::: xpcom/io/nsSnappyFrameUtils.cpp
@@ +49,5 @@
> +
> +void WriteUInt24(char* aBuf, uint32_t aVal)
> +{
> +  MOZ_ASSERT(!(aVal & 0xff000000));
> +  NativeEndian::swapToLittleEndianInPlace(&aVal, sizeof(aVal));

Please just use a temporary for this swapping, rather than reusing the argument value.

This call as written is also going to do bad things, as the prototype is:

|swapToLittleEndianInPlace(T* aPtr, size_t aCount)|

so you're asking it to swap four values, not one.  Using a temporary means you can just:

uint32_t tmp = swapToLittleEndian(aVal);

@@ +58,5 @@
> +{
> +  uint32_t val = 0;
> +  memcpy(&val, aBuf, 3);
> +  NativeEndian::swapFromLittleEndianInPlace(&val, sizeof(val));
> +  return val;

Same story here with bad code.  Just |return NativeEndian::swapFromLittleEndian(val);| should work.

@@ +63,5 @@
> +}
> +
> +uint32_t MaskChecksum(uint32_t aValue)
> +{
> +  return ((aValue >> 15) | (aValue << 17)) + 0xa282ead8;

A brief comment indicating where these magic constants come from is worthwhile.

@@ +172,5 @@
> +      return ParseCompressedData(aDest, aDestLength, aData, aDataLength,
> +                                 aBytesWrittenOut, aBytesReadOut);
> +
> +    // TODO: support other snappy chunk types
> +    default:

MOZ_ASSERT here, just to make the non-support for other chunk types obvious?

::: xpcom/io/nsSnappyFrameUtils.h
@@ +12,5 @@
> +
> +namespace detail {
> +
> +//
> +// Utility to class providing primitives necessary to build streams based

"Utility class"?

::: xpcom/io/nsSnappyUncompressInputStream.cpp
@@ +67,5 @@
> +    return NS_BASE_STREAM_CLOSED;
> +  }
> +
> +  // current number of uncompressed bytes
> +  *aLengthOut = mBufferFillSize - mNextByte;

We use this expression in several places, WDYT about making it a small utility function, and then we can also MOZ_ASSERT(mBufferFillSize >= mNextByte) in said function?

@@ +135,5 @@
> +      }
> +
> +      *aBytesReadOut += numWritten;
> +      mNextByte += numWritten;
> +      if (mNextByte >= mBufferFillSize) {

Should this really be >=?  Wouldn't > be an error?

@@ +213,5 @@
> +    rv = ReadAll(mCompressedBuffer.get(), firstReadLength, aBytesReadOut);
> +    if (NS_WARN_IF(NS_FAILED(rv)) || *aBytesReadOut < 1) { return rv; }
> +    if (NS_WARN_IF(*aBytesReadOut < firstReadLength)) {
> +      return NS_ERROR_CORRUPTED_CONTENT;
> +    }

This "ReadAll, check various failures" pattern occurs three times, it should be separated out into its own utility function.

::: xpcom/io/nsSnappyUncompressInputStream.h
@@ +12,5 @@
> +#include "nsIInputStream.h"
> +#include "nsISupportsImpl.h"
> +#include "nsSnappyFrameUtils.h"
> +
> +class nsSnappyUncompressInputStream : public nsIInputStream

This should be MOZ_FINAL.

@@ +43,5 @@
> +  mozilla::UniquePtr<char[]> mCompressedBuffer;
> +
> +  // Buffer storing the resulting uncompressed data.  Exactly snappy::kBlockSize
> +  // bytes long.
> +  mozilla::UniquePtr<char[]> mBuffer;

Probably would be good to call this |mUncompressedBuffer| for symmetry.

@@ +46,5 @@
> +  // bytes long.
> +  mozilla::UniquePtr<char[]> mBuffer;
> +
> +  // Number of bytes of uncompressed data in mBuffer.
> +  size_t mBufferFillSize;

|mBufferFillSize| reads oddly to me...WDYT about calling it |mUncompressedBytes|?

::: xpcom/tests/gtest/TestSnappyStreams.cpp
@@ +171,5 @@
> +}
> +
> +TEST(SnappyStream, CompressUncompress_256k_less_13)
> +{
> +  TestCompressUncompress((256 * 1024) - 13);

Where is this magic 13 coming from (here and below)?

@@ +184,5 @@
> +{
> +  TestCompressUncompress((256 * 1024) + 13);
> +}
> +
> +TEST(SnappyStream, UncompressCorrupt)

Can you add a test where we have a valid stream identifier at the start, but a corrupt chunk afterwards?

@@ +188,5 @@
> +TEST(SnappyStream, UncompressCorrupt)
> +{
> +  static const char data[] = "This is not a valid compressed stream";
> +  nsCOMPtr<nsIInputStream> source;
> +  nsresult rv = NS_NewByteInputStream(getter_AddRefs(source), data);

The lack of explicit |length| in this API is slightly frightening...
Attachment #8525710 - Flags: review?(nfroyd) → feedback+
> @@ +213,5 @@
> > +    rv = ReadAll(mCompressedBuffer.get(), firstReadLength, aBytesReadOut);
> > +    if (NS_WARN_IF(NS_FAILED(rv)) || *aBytesReadOut < 1) { return rv; }
> > +    if (NS_WARN_IF(*aBytesReadOut < firstReadLength)) {
> > +      return NS_ERROR_CORRUPTED_CONTENT;
> > +    }
> 
> This "ReadAll, check various failures" pattern occurs three times, it should
> be separated out into its own utility function.

I moved the check for minimum valid number of bytes into ReadAll().  The minimum threshold is passed in as an argument now.

I don't think I can remove any more duplicate code, though.  No matter what I do, these routines will need to check for error and EOF.

> ::: xpcom/tests/gtest/TestSnappyStreams.cpp
> @@ +171,5 @@
> > +}
> > +
> > +TEST(SnappyStream, CompressUncompress_256k_less_13)
> > +{
> > +  TestCompressUncompress((256 * 1024) - 13);
> 
> Where is this magic 13 coming from (here and below)?

I'll add a comment.  I was just trying to pick some value that would not align on the typical power-of-2 boundaries to try to force edge cases.
I believe this addresses all action items to date.  Added tests to cover cases of invalid compressed data header and compressed data content.
Attachment #8525710 - Attachment is obsolete: true
Attachment #8535088 - Flags: review?(nfroyd)
Fix some non-unified and windows build bustage in the try.

  https://tbpl.mozilla.org/?tree=Try&rev=39fe0423d482
Attachment #8535088 - Attachment is obsolete: true
Attachment #8535088 - Flags: review?(nfroyd)
Attachment #8535128 - Flags: review?(nfroyd)
Blocks: 1110482
Comment on attachment 8535128 [details] [diff] [review]
Implement snappy compression framing protocol as nsI(Input|Output)Streams. (v5)

Review of attachment 8535128 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!  Thanks for your patience.
Attachment #8535128 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/mozilla-central/rev/8628a94a6444
https://hg.mozilla.org/mozilla-central/rev/ecdbe4421df0
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: