Open Bug 966024 Opened 6 years ago Updated 3 years ago

Encapsulation for read/write buffer arithmetic

Categories

(Core :: MFBT, defect)

defect
Not set

Tracking

()

People

(Reporter: mayhemer, Unassigned)

Details

Attachments

(3 files, 11 obsolete files)

Since I still see raw ptrs used for reading and writing with a lot of pos += sizeof(xy) craziness added in a new code where using pipes or streams would just be an overkill, I think it would be good to have some simple classes to encapsulate this and make more safe/readable.

I'll create a mockup header to show what I have in mind.
Strongly agree. I wrote a basic form of this when I was hacking on a rewrite of the BMP/DIB decoder, and it made things much much cleaner.
(In reply to Jeff Gilbert [:jgilbert] from comment #1)
> Strongly agree. I wrote a basic form of this when I was hacking on a rewrite
> of the BMP/DIB decoder, and it made things much much cleaner.

Is it in the tree?  Could we just move to mfbt?
(In reply to Honza Bambas (:mayhemer) from comment #2)
> (In reply to Jeff Gilbert [:jgilbert] from comment #1)
> > Strongly agree. I wrote a basic form of this when I was hacking on a rewrite
> > of the BMP/DIB decoder, and it made things much much cleaner.
> 
> Is it in the tree?  Could we just move to mfbt?

No, it's sitting on one of my machines, somewhere. I would have to look for it.
Attached patch wip1 (obsolete) — Splinter Review
seems to compile and work, at least the basic functions.  collectRead, cursor and seek has not been tested.

general feedback welcome.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #8368381 - Flags: feedback?
Attached patch wip2 (obsolete) — Splinter Review
and a slightly better working one
Attachment #8368381 - Attachment is obsolete: true
Attachment #8368381 - Flags: feedback?
Attachment #8368428 - Flags: feedback?
Attached patch wip3 (obsolete) — Splinter Review
and this one with auto-reallocations, also a bit more tested
Attachment #8368428 - Attachment is obsolete: true
Attachment #8368428 - Flags: feedback?
Attachment #8368503 - Flags: feedback?
Attached patch wip4 (obsolete) — Splinter Review
added a lot of comments and some more functionality

one question: how to write a test for this?
Attachment #8368503 - Attachment is obsolete: true
Attachment #8368503 - Flags: feedback?
Attachment #8368883 - Flags: feedback?(benjamin)
Comment on attachment 8368883 [details] [diff] [review]
wip4

Michal, can you take a look at this as well?

I also plan:
- introduce typed writers/readers over an existing <char> based write buffer, so you can write a header and then N items (the index usecase)
- change the cursor()/seek() API to only mark()/rewind() to make it more safe regarding use of collectRead()
- some autoptrs and safe pointers should be introduced as well, e.g. so that when you remember a pointer coming from write() it can be safely used after another call to write() that may trigger a reallocation

or maybe the API will completely change anyway :)  we'll see..
Attachment #8368883 - Flags: feedback?(michal.novotny)
Comment on attachment 8368883 [details] [diff] [review]
wip4

I don't think I'm the correct person to review this: probably Waldo as owner of MFBT or maybe froydnj.

Since this is just a header, mfbt/tests seems like a fine place to test it.
Attachment #8368883 - Flags: feedback?(benjamin)
Attached patch wip5 (obsolete) — Splinter Review
- introduces classes for multi-type write and read to/from a char* based buffer
- introduces a safe Ptr class that keeps a pointer to the buffer safely and independently of automatic buffer reallocation and moving
- has a test
- builds only on windows right now...
Attachment #8368883 - Attachment is obsolete: true
Attachment #8368883 - Flags: feedback?(michal.novotny)
Attachment #8397870 - Flags: feedback?(nfroyd)
Attached patch wip5 (builds on all platforms) (obsolete) — Splinter Review
Comment on attachment 8397870 [details] [diff] [review]
wip5

Review of attachment 8397870 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't go over this with a fine-toothed mfbt/STYLE comb; I just tried to skim and offer comments.

Just canceling feedback rather than f+ or f- because I don't have enough information/context on what this is for.  It's not entirely obvious to me why you want this class, especially if you're not going to handle concurrency.  Can you provide a little more background on places in the code where you see this being useful or new use cases you have for this?

::: mfbt/Buffer.h
@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +/* Encapsulation for simple buffers used to write and read data to and from,
> + * hides the poitner aritmetic, overflow checks and (re)allocations/freeings.

This comment could be expanded some, and some examples would be most excellent.

@@ +13,5 @@
> +
> +#include "Types.h"
> +#include "Assertions.h"
> +#include <string.h>
> +#include <type_traits>

Headers should be alphabetized and use the mozilla/ prefixes.

<type_traits> shouldn't be used currently; you should use mozilla/TypeTraits.h.

@@ +16,5 @@
> +#include <string.h>
> +#include <type_traits>
> +
> +namespace mozilla {
> +namespace Buf {

Everything exported from the header should go in the mozilla namespace, no sub-namespaces required.  Anything not exported (the Allocator* classes?) should be in the mozilla::detail namespace.

@@ +26,5 @@
> +template <typename T>
> +class AllocatorNew
> +{
> +public:
> +  T * alloc(size_t count)

The methods for all of these allocator classes should be static methods.  If they don't get discarded in favor of AllocPolicy-like things.

@@ +33,5 @@
> +  }
> +
> +  T * realloc(T * buf, size_t count, size_t copy)
> +  {
> +    // No native ralloc for new (AFAIK), so alloc, move, and release.

This seems like a pretty major limitation.  Why would we even want AllocatorNew if the Fallible/Infallible versions are better in this regard?  It's not as though these allocators should be used in a context where clients would be worrying about which allocation functions to use.

@@ +74,5 @@
> +{
> +public:
> +  T * alloc(size_t count)
> +  {
> +    return (T*)moz_xalloc(count * sizeof(T));

moz_xmalloc?

I'm not entirely sure we want this header-(private!)symbol conditional in here.

This stuff also seems like it's duplicating the mfbt/AllocPolicy.h stuff...

@@ +94,5 @@
> +/**
> + * Internal helper class storing the actual buffer, its size and the write cursor.
> + */
> +template <typename C>
> +class WriteBufferTraits

It is peculiar to have a class with Traits in its name requiring actual instances.  WriteBufferStorage?

@@ +133,5 @@
> +    writeCursor -= readCursor;
> +    readCursor = 0;
> +  }
> +
> +  inline C * buf_const() const { return buf; }

No inline required.  No _const in the name, either.

@@ +166,5 @@
> +  {
> +    if (writer) {
> +      // Increasing by 2 to prevent WriteBufferTraits::collectRead()
> +      // to move the physical buffer.
> +      writer->activeReaders += 2;

Even with the comment, it is non-obvious why 2 and not 1.  I don't understand why the check in WriteBufferTraits isn't for >0 readers, either.

@@ +243,5 @@
> +/**
> + * Use to store result of TypeWriteBuffer::writeType().
> + */
> +template <typename T>
> +class PtrType : public PtrBase<T, char>

I feel like the existence of these *Type variants of things, when you are already templating on types, is an indication that something is wrong in the API, or how you are storing things underneath the API.  Like WriteBuffer should just deal with void* and explicit sizes and lengths for everything, and the higher-level things should be explicitly typed and pass the necessary information down.

I guess maybe there's some concern if you have a buffer with type T and you want to access things as a different type U.  But that seems solvable.

Or maybe things are unclear because I can't tell what you want to export and what's header-internal.

@@ +268,5 @@
> +/**
> + * When number of elements doesn't fit the buffer, don't reallocate, just
> + * return null as the write buffer target pointer.
> + */
> +static size_t const REALLOC_NONE = 0;

No static constants in header files.  |const| things are automatically given internal linkage in C++ anyway.

@@ +310,5 @@
> +   *    allocated size of the existing buffer
> +   * @param takeover
> +   *    - true: release the buffer when this object goes away, this
> +   *            allows a complete control over the buffer
> +   *    - false: leave the buffer intact when this WriteBuffer goes away

Please don't use boolean parameters for this; use an enum with descriptive names.

@@ +322,5 @@
> +
> +  ~WriteBuffer()
> +  {
> +    if (buf && release)
> +      Allocator().free(buf);

The allocator's free should properly handle null pointers, so you shouldn't need to null-check |buf| here.

@@ +361,5 @@
> +   *    when no automatic reallocation is setup (reallocBy = REALLOC_NONE)
> +   *    or when we would go out of addressable space, or when the buffer is
> +   *    adopted and has to live after this WriteBuffer, nullptr is returned.
> +   */
> +  Ptr<T> write(size_t count, size_t reallocBy = DefaultRealloc)

|write| is the wrong name for this.  It should be |ensureSpace| or somesuch.

@@ +492,5 @@
> +  /**
> +   * Shortcut to get reference to the buffer at the current read position w/o
> +   * shifting the read position.
> +   */
> +  T const * get()

Shouldn't this be PtrConst?
Attachment #8397870 - Flags: feedback?(nfroyd)
(In reply to Nathan Froyd (:froydnj) from comment #12)
> Comment on attachment 8397870 [details] [diff] [review]
> wip5
> 
> Review of attachment 8397870 [details] [diff] [review]:
> -----------------------------------------------------------------
> 

Thanks!

> I didn't go over this with a fine-toothed mfbt/STYLE comb; I just tried to
> skim and offer comments.
> 
> Just canceling feedback rather than f+ or f- because I don't have enough
> information/context on what this is for.  

Oh, I somewhat hoped it's clear.  Sure the comments at the top needs to be enhanced (planned - a bit shame on me to ask for a feedback w/o it, sorry).



What I want is to convert this...


{
  char * buffer = malloc(1024);
  bufferSize = 1024;
  char * bufferPtr = buffer;

  *static_cast<uint32_t>(bufferPtr) = SomethingReturingUint32();
  bufferPtr += sizeof(uint32_t);

  if (sizeof(mystruct) > bufferPtr - buffer - bufferSize) {
    buffer = realloc(2048);
    bufferPtr = buffer + something;
  }
  *bufferPtr = SomethingReturningMystruct();
  bufferPtr += sizeof(Mystruct);

  if (SomethingFailed()) {
    free(buffer);
    return;
  }

  DoSomething(buffer);
  free(buffer);
}


...to just and only...


{
  TypeWriteBuffer b(1024, 2048);
  *b.write<uint32_t>() = SomethingReturningUint32();
  *b.write<mystruct>() = SomethingReturningMystruct();

  DoSomething(ReadBuffer(b).get());
}



I see raw buffers and a complex boundary checks and reallocation being used all around the platform code.  It's not safe, hard to review, hard to maintain.  These classes should make this super-simple.


> It's not entirely obvious to me
> why you want this class, especially if you're not going to handle
> concurrency.  

Concurrency?  You mean thread safety or what?

> Can you provide a little more background on places in the code
> where you see this being useful or new use cases you have for this?

Sure:
http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/protocol/http/SpdyStream3.cpp#l343 - all around this code
http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/cache2/CacheFileMetadata.cpp#l231
http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/cache2/CacheFileMetadata.cpp#l701
http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/cache2/CacheIndex.cpp#l2112

I think it's easy to find other examples...

> 
> ::: mfbt/Buffer.h
> @@ +4,5 @@
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +/* Encapsulation for simple buffers used to write and read data to and from,
> > + * hides the poitner aritmetic, overflow checks and (re)allocations/freeings.
> 
> This comment could be expanded some, and some examples would be most
> excellent.

Planned.

> @@ +16,5 @@
> > +#include <string.h>
> > +#include <type_traits>
> > +
> > +namespace mozilla {
> > +namespace Buf {
> 
> Everything exported from the header should go in the mozilla namespace, no
> sub-namespaces required.  Anything not exported (the Allocator* classes?)

The allocation have to be exported, it's customization of the buffer.

> should be in the mozilla::detail namespace.
> 
> @@ +26,5 @@
> > +template <typename T>
> > +class AllocatorNew
> > +{
> > +public:
> > +  T * alloc(size_t count)
> 
> The methods for all of these allocator classes should be static methods.  If
> they don't get discarded in favor of AllocPolicy-like things.

I just followed the (xpcom) design for nsTArray comparators, static seems good.

> 
> @@ +33,5 @@
> > +  }
> > +
> > +  T * realloc(T * buf, size_t count, size_t copy)
> > +  {
> > +    // No native ralloc for new (AFAIK), so alloc, move, and release.
> 
> This seems like a pretty major limitation.  Why would we even want
> AllocatorNew if the Fallible/Infallible versions are better in this regard? 
> It's not as though these allocators should be used in a context where
> clients would be worrying about which allocation functions to use.

Interesting opinion, I'm not against making this even simpler.

> 
> @@ +74,5 @@
> > +{
> > +public:
> > +  T * alloc(size_t count)
> > +  {
> > +    return (T*)moz_xalloc(count * sizeof(T));
> 
> moz_xmalloc?
> 
> I'm not entirely sure we want this header-(private!)symbol conditional in
> here.

Quick workaround for the test build problem.  This allocator should probably be defined elsewhere or just completely omitted.

> 
> This stuff also seems like it's duplicating the mfbt/AllocPolicy.h stuff...

Good to know.  I'll check on it.

> 
> @@ +94,5 @@
> > +/**
> > + * Internal helper class storing the actual buffer, its size and the write cursor.
> > + */
> > +template <typename C>
> > +class WriteBufferTraits
> 
> It is peculiar to have a class with Traits in its name requiring actual
> instances.  WriteBufferStorage?

I don't much follow and this is only a base class - never instantiate itself.  But I don't care about the name.

> 
> @@ +133,5 @@
> > +    writeCursor -= readCursor;
> > +    readCursor = 0;
> > +  }
> > +
> > +  inline C * buf_const() const { return buf; }
> 
> No inline required.  No _const in the name, either.

I just need to find the right name...

> 
> @@ +166,5 @@
> > +  {
> > +    if (writer) {
> > +      // Increasing by 2 to prevent WriteBufferTraits::collectRead()
> > +      // to move the physical buffer.
> > +      writer->activeReaders += 2;
> 
> Even with the comment, it is non-obvious why 2 and not 1.  I don't
> understand why the check in WriteBufferTraits isn't for >0 readers, either.

Sure, I'll try to make the comments clearer.  Just for info, when activeReaders > 1 means one of:
- there are two ReadBuffers over this WriteBuffer
- there is at least one Ptr* for this WriteBuffer or any of the ReadBuffers

When it's so, we must not allow collectRead() method to move the data in the physical buffer - there is the > 1 check and that's why we increase in Ptrs by 2 (simulate two readers).  I can think of some better way to track this, but this is IMO fine.

> 
> @@ +243,5 @@
> > +/**
> > + * Use to store result of TypeWriteBuffer::writeType().
> > + */
> > +template <typename T>
> > +class PtrType : public PtrBase<T, char>
> 
> I feel like the existence of these *Type variants of things, when you are
> already templating on types, is an indication that something is wrong in the
> API, or how you are storing things underneath the API.  Like WriteBuffer
> should just deal with void* and explicit sizes and lengths for everything,
> and the higher-level things should be explicitly typed and pass the
> necessary information down.
> 
> I guess maybe there's some concern if you have a buffer with type T and you
> want to access things as a different type U.  But that seems solvable.
> 
> Or maybe things are unclear because I can't tell what you want to export and
> what's header-internal.

PtrBase has two template args because the physycal buffer may have a different type then the data a Ptr* instance points to.  Hence 5 classes - yes, a bit of a mess.  But read on...

The void* underlying buffer (or rather char*) is interesting.  It would make a lot of thinks easier!  I started to write this some time ago and only after some parts were already written I realized there would be a need to not just have an array of same-typed elements, but also to store different types one by one (for network and cross platform persistence purposes).

Have an API, where you always specify the type you read or write, like...

WriteBuffer w(100);
*w.write<uint32_t>() = myInt;
*w.write<uint16_t>() = myShort;

...may be just good!


Then we could just have a single Ptr<> where for read the type would simply have to be 'const'.

Note: I don't want to have an API like w.write(myInt).  It would just become a stream...


> > +   * @param takeover
> > +   *    - true: release the buffer when this object goes away, this
> > +   *            allows a complete control over the buffer
> > +   *    - false: leave the buffer intact when this WriteBuffer goes away
> 
> Please don't use boolean parameters for this; use an enum with descriptive
> names.

+1 !

> @@ +361,5 @@
> > +   *    when no automatic reallocation is setup (reallocBy = REALLOC_NONE)
> > +   *    or when we would go out of addressable space, or when the buffer is
> > +   *    adopted and has to live after this WriteBuffer, nullptr is returned.
> > +   */
> > +  Ptr<T> write(size_t count, size_t reallocBy = DefaultRealloc)
> 
> |write| is the wrong name for this.  It should be |ensureSpace| or somesuch.

Hmm...  If you find a better name (sorry, ensureSpace is not what exactly says what this is doing either) then I'll gladly change it.  So far no idea how this should be called.

I like that write() expresses the 'store to the buffer' nature of the code clearly from the first look.

Maybe: store(), push(), consume(), take(), use(), bite(), eat() :))
Attached patch wip6 (obsolete) — Splinter Review
- examples added, tho some comments still need an update
- simplified API: only a single allocator, just a single Ptr<> and one untyped WriteBuffer and ReadBuffer
- Ptr<> now supports safe ++ operators: while(p) { *p++ = foo(); } loops as many times as was the number of elements the ptr was write() for
- write() and read() methods must always have a template argument to decide correctly on the type written/read
- still not sure about a better name for the write() method
Attachment #8397870 - Attachment is obsolete: true
Attachment #8397932 - Attachment is obsolete: true
Attachment #8398217 - Flags: feedback?(nfroyd)
(In reply to Honza Bambas (:mayhemer) from comment #13)
> > I didn't go over this with a fine-toothed mfbt/STYLE comb; I just tried to
> > skim and offer comments.
> > 
> > Just canceling feedback rather than f+ or f- because I don't have enough
> > information/context on what this is for.  
> 
> Oh, I somewhat hoped it's clear.  Sure the comments at the top needs to be
> enhanced (planned - a bit shame on me to ask for a feedback w/o it, sorry).

I think I had gotten a rough idea of what it was, but it's always helpful for the patch submitter to provide the idea in their own words. :)

> {
>   TypeWriteBuffer b(1024, 2048);
>   *b.write<uint32_t>() = SomethingReturningUint32();
>   *b.write<mystruct>() = SomethingReturningMystruct();
> 
>   DoSomething(ReadBuffer(b).get());
> }

This is much nicer!

> I see raw buffers and a complex boundary checks and reallocation being used
> all around the platform code.  It's not safe, hard to review, hard to
> maintain.  These classes should make this super-simple.

Seems like a worthy goal.

> > It's not entirely obvious to me
> > why you want this class, especially if you're not going to handle
> > concurrency.  
> 
> Concurrency?  You mean thread safety or what?

I did.  The underlying question here is really pertaining to the requirement of the ReadBuffer bits.  I guess I'm having a hard time seeing where you would write things to the buffer and then read them back in some completely different way.  It seems much more likely that you'd just send the buffer to disk or to the network or similar.  The requirement to handle reading buffers makes the code rather more complex and I'd like to avoid that if possible.

Now, if you wanted to have a separate "read arbitrary things out of a chunk of memory" API, separate from writing, I could see a use for that.

> > Can you provide a little more background on places in the code
> > where you see this being useful or new use cases you have for this?
> 
> Sure:
> http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/protocol/
> http/SpdyStream3.cpp#l343 - all around this code
> http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/cache2/
> CacheFileMetadata.cpp#l231
> http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/cache2/
> CacheFileMetadata.cpp#l701
> http://hg.mozilla.org/mozilla-central/annotate/bb4dd9872236/netwerk/cache2/
> CacheIndex.cpp#l2112
> 
> I think it's easy to find other examples...

OK, those all seem pretty reasonable.  The Metadata bits could stand some cleanup.

If you really mean to cleanup the SpdyStream stuff, we should probably try to work out an API for writing in specified endiannesses, too...

> > The methods for all of these allocator classes should be static methods.  If
> > they don't get discarded in favor of AllocPolicy-like things.
> 
> I just followed the (xpcom) design for nsTArray comparators, static seems
> good.

Yeah, I think the use cases are a little different for the comparator vs. the allocator case.  In the comparator case, you have data that you might want to compare in different ways: descending order, ascending order, case-insensitive order, etc. etc.  So you need an actual comparator object with (instance) member functions.  But in the allocator case, you don't want to be changing your allocator during the entire lifetime of the object: imagine starting out with an allocates-with-new-and-delete policy and then trying to use an allocates-with-malloc-and-free policy.  Bad things are going to happen.  The allocator, then, is fixed, and using static member functions is appropriate because you never want an actual allocator object.

> > @@ +94,5 @@
> > > +/**
> > > + * Internal helper class storing the actual buffer, its size and the write cursor.
> > > + */
> > > +template <typename C>
> > > +class WriteBufferTraits
> > 
> > It is peculiar to have a class with Traits in its name requiring actual
> > instances.  WriteBufferStorage?
> 
> I don't much follow and this is only a base class - never instantiate
> itself.  But I don't care about the name.

In template code, a class with traits in its name typically describes properties of some type or controls the behavior of some other template class: <type_traits>, std::iterator_traits, mfbt/Scoped.h, etc.  I see what you are using it for here, but I think it'd be better to change the name.

> > @@ +133,5 @@
> > > +    writeCursor -= readCursor;
> > > +    readCursor = 0;
> > > +  }
> > > +
> > > +  inline C * buf_const() const { return buf; }
> > 
> > No inline required.  No _const in the name, either.
> 
> I just need to find the right name...

I would just use |buf|.

> > @@ +166,5 @@
> > > +  {
> > > +    if (writer) {
> > > +      // Increasing by 2 to prevent WriteBufferTraits::collectRead()
> > > +      // to move the physical buffer.
> > > +      writer->activeReaders += 2;
> > 
> > Even with the comment, it is non-obvious why 2 and not 1.  I don't
> > understand why the check in WriteBufferTraits isn't for >0 readers, either.
> 
> Sure, I'll try to make the comments clearer.  Just for info, when
> activeReaders > 1 means one of:
> - there are two ReadBuffers over this WriteBuffer
> - there is at least one Ptr* for this WriteBuffer or any of the ReadBuffers
> 
> When it's so, we must not allow collectRead() method to move the data in the
> physical buffer - there is the > 1 check and that's why we increase in Ptrs
> by 2 (simulate two readers).  I can think of some better way to track this,
> but this is IMO fine.

I will have to read the code a little more closely to see why one reader lets us move the underlying buffer, but two readers does not.  Seems like an unusual invariant.  Perhaps adding something like:

  void registerPointer() {
    activeReaders += 2;
  }
  void registerReader() {
    activeReaders += 1;
  }

to the WriteBuffer would make things clearer and keep the magic constants in one place.  Or an RAII class (WriteBuffer::AutoPointerLock or somesuch) that can be used as a member in Ptr and ReadBuffer.

In any event, if we didn't have to deal with reads and writes to the same buffer, this would all become much simpler.

> > @@ +243,5 @@
> > > +/**
> > > + * Use to store result of TypeWriteBuffer::writeType().
> > > + */
> > > +template <typename T>
> > > +class PtrType : public PtrBase<T, char>
> > 
> > I feel like the existence of these *Type variants of things, when you are
> > already templating on types, is an indication that something is wrong in the
> > API, or how you are storing things underneath the API.  Like WriteBuffer
> > should just deal with void* and explicit sizes and lengths for everything,
> > and the higher-level things should be explicitly typed and pass the
> > necessary information down.
>
> Have an API, where you always specify the type you read or write, like...
> 
> WriteBuffer w(100);
> *w.write<uint32_t>() = myInt;
> *w.write<uint16_t>() = myShort;
> 
> ...may be just good!

I like explicitly specifying things.  I'm in favor of this.

> > @@ +361,5 @@
> > > +   *    when no automatic reallocation is setup (reallocBy = REALLOC_NONE)
> > > +   *    or when we would go out of addressable space, or when the buffer is
> > > +   *    adopted and has to live after this WriteBuffer, nullptr is returned.
> > > +   */
> > > +  Ptr<T> write(size_t count, size_t reallocBy = DefaultRealloc)
> > 
> > |write| is the wrong name for this.  It should be |ensureSpace| or somesuch.
> 
> Hmm...  If you find a better name (sorry, ensureSpace is not what exactly
> says what this is doing either) then I'll gladly change it.  So far no idea
> how this should be called.
> 
> I like that write() expresses the 'store to the buffer' nature of the code
> clearly from the first look.

I guess my problem with |write| is that you're not actually writing data to the buffer: you're just making sure the buffer has space for |count| elements.  The writing/storing comes later, after you've gotten the return value from this function.  Even the idiomatic:

  *buf.write<uint32>() = ...;

doesn't read right to me.  I suggested |ensureSpace| as an analogy to std::vector's |reserve| or nsTArray's |ensureCapacity|.  Maybe |alloc| is closer to what you want?

I will see about giving wip6 a once-over today.
(In reply to Nathan Froyd (:froydnj) from comment #15)
> > Concurrency?  You mean thread safety or what?
> 
> I did.  The underlying question here is really pertaining to the requirement
> of the ReadBuffer bits.  I guess I'm having a hard time seeing where you
> would write things to the buffer and then read them back in some completely
> different way.  It seems much more likely that you'd just send the buffer to
> disk or to the network or similar.  

The purpose is actually to collect all what you need to a single memory buffer and then work with it as a blob, i.e. write or send as a whole.

Similar when reading - you raw-read to a memory the WriteBuffer provides and then 'parse' via the ReadBuffer.

So, I don't think thread safety is something that should be part of this code, IMHO.

> The requirement to handle reading
> buffers makes the code rather more complex and I'd like to avoid that if
> possible.

I'm not 100% sure what you mean.  Two ReadBuffers can easily read concurrently, each has its own readCursor.

I think when the buffer has to be written concurrently, then the thread safety should be up on the consumer, also because I assume usually there will be a need to make numerous calls to the write buffer be atomic as a whole.  That can only be ensured externally.

> 
> Now, if you wanted to have a separate "read arbitrary things out of a chunk
> of memory" API, separate from writing, I could see a use for that.

Can you give some example or maybe just more explain what you mean?

> If you really mean to cleanup the SpdyStream stuff, we should probably try
> to work out an API for writing in specified endiannesses, too...

(For metadata too, we have a bug right now to write it with correct endiannesses too, and buffering seems to be a problem :)  This would solve it.)

So, beside write<>() - or however we end up calling it - we could have |bool writeBigEndianUint32(uint32_t const value, size_t autoReallocBy = ..);| methods on the buffer, something like this?

>   void registerPointer() {
>     activeReaders += 2;
>   }
>   void registerReader() {
>     activeReaders += 1;
>   }

Will be in wip7 (but read on), I'll try to consider an Auto class too.

> In any event, if we didn't have to deal with reads and writes to the same
> buffer, this would all become much simpler.

How would such simpler API look like?

The multiple readers was something I introduced at the start based on a need to return at the buffer to a certain spot and read it again.  But now, we have the Ptr<> stuff that does it better.  So, we may join ReadBuffer and WriteBuffer to a single class (Buffer) potentially.  To access the buffer at random (or "seek" in it) use the Ptr<>s.  Those are safe according reallocation/moving of the buffer.

> value from this function.  Even the idiomatic:
> 
>   *buf.write<uint32>() = ...;
> 
> doesn't read right to me.  I suggested |ensureSpace| as an analogy to
> std::vector's |reserve| or nsTArray's |ensureCapacity|.  Maybe |alloc| is
> closer to what you want?

It's still hard to decide for me.  Probably consume() or get() would be good for me.  alloc() would be IMO missleading since the buffer is already allcoated.  And any re/allocation is about to be hidden from the consumer - purpose of this API.

Also, reserve and ensureCapacity don't move the writeCursor pointer or repeated calls!
(In reply to Honza Bambas (:mayhemer) from comment #16)
> Also, reserve and ensureCapacity don't move the writeCursor pointer or
> repeated calls!

ON repeated calls.
(In reply to Honza Bambas (:mayhemer) from comment #16)
> (In reply to Nathan Froyd (:froydnj) from comment #15)
> > > Concurrency?  You mean thread safety or what?
> > 
> > I did.  The underlying question here is really pertaining to the requirement
> > of the ReadBuffer bits.  I guess I'm having a hard time seeing where you
> > would write things to the buffer and then read them back in some completely
> > different way.  It seems much more likely that you'd just send the buffer to
> > disk or to the network or similar.  
> 
> The purpose is actually to collect all what you need to a single memory
> buffer and then work with it as a blob, i.e. write or send as a whole.

Right, this is my point: it should be unusual to need to access portions of the buffer while you're writing to the buffer, like writing a bunch of uint32_t and then reading them back as uint16_t.  Even after you're done, you shouldn't need to.  You should just need to blast a bunch of various things to a buffer, and then grab the buffer as a completely separate thing and send it off.

I could see a need for keeping Ptrs to portions of the buffer.  Going back and filling in a content-length or similar, for instance.  But needing an entirely separate ReadBuffer should be very unusual; multiple ReadBuffers should be extremely unusual.

Getting the pointer out could be accomplished by a finish() method or something that just gave you a |const void*| or similar and nulled out internal state.  Or you could be very fancy and say:

  ReadBuffer rbuf(mozilla::Move(writebuf));

and let ReadBuffer's move constructor take care of details.

> Similar when reading - you raw-read to a memory the WriteBuffer provides and
> then 'parse' via the ReadBuffer.

Ah, no, I disagree with this.  You should have a separate ReadBuffer that just deals with raw memory underneath, not indirecting through the WriteBuffer.  Maybe it has some limited abstraction it uses, like mfbt/RangedPtr.h, but conflating the needs of reading and writing in WriteBufferStorage just makes things more complicated for minimal gain.

> So, I don't think thread safety is something that should be part of this
> code, IMHO.

I totally agree!

> > Now, if you wanted to have a separate "read arbitrary things out of a chunk
> > of memory" API, separate from writing, I could see a use for that.
> 
> Can you give some example or maybe just more explain what you mean?

I mean that your ReadBuffer should be something just like:

struct ReadBuffer
{
  ReadBuffer(void* p, size_t length) : p_(p), end_(p + length) {}

  template<typename T>
  T
  read() {
    if (p_ + sizeof(T) > end) {
      MOZ_CRASH(); // or similar
    }

    T t;
    memcpy(&t, p, sizeof(T));
    p_ += sizeof(T);
    return t;
  }

  void* p_;
  void* end_;
};

You don't need to deal with collectRead and all that, and your WriteBufferStorage can become simpler because it doesn't have to think about readers.  WriteBufferStorage can just think about logical pointers into the data.

I think we are describing very similar things.

> > If you really mean to cleanup the SpdyStream stuff, we should probably try
> > to work out an API for writing in specified endiannesses, too...
> 
> (For metadata too, we have a bug right now to write it with correct
> endiannesses too, and buffering seems to be a problem :)  This would solve
> it.)
> 
> So, beside write<>() - or however we end up calling it - we could have |bool
> writeBigEndianUint32(uint32_t const value, size_t autoReallocBy = ..);|
> methods on the buffer, something like this?

Actually, wait, you don't need that.  You just need something like:

  BigEndian::writeUint32(buf.write<uint32_t>(), val);

And we already have writeUint32 in Endian.h.  So we don't need anything out of WriteBuffer other than something that makes sure that we have enough space to write a given thing.  Incidentally, this is also a good example of why |write| is IMHO not a good choice. ;)

> > value from this function.  Even the idiomatic:
> > 
> >   *buf.write<uint32>() = ...;
> > 
> > doesn't read right to me.  I suggested |ensureSpace| as an analogy to
> > std::vector's |reserve| or nsTArray's |ensureCapacity|.  Maybe |alloc| is
> > closer to what you want?
> 
> It's still hard to decide for me.  Probably consume() or get() would be good
> for me.  alloc() would be IMO missleading since the buffer is already
> allcoated.  And any re/allocation is about to be hidden from the consumer -
> purpose of this API.

consume (what is being consumed?) and get (getting what?) don't seem right either.

I don't understand your objection to alloc; just because it says |alloc| doesn't necessarily mean that memory is getting allocated.  Think of an fixed-size pool memory allocator: every N pool_alloc() calls, it might need to mmap more pages or malloc() more memory, but it doesn't mean every pool_alloc() call is allocating memory.  You are allocating memory for a particular value of size T; whether actual memory allocation is taking place for any particular call is immaterial.

> Also, reserve and ensureCapacity don't move the writeCursor pointer ON
> repeated calls!

This is true.  I don't know that's necessarily a reason not to use them.  Maybe |push| is closer to the mark?
Comment on attachment 8398217 [details] [diff] [review]
wip6

Review of attachment 8398217 [details] [diff] [review]:
-----------------------------------------------------------------

Canceling this for now, as I feel like we are coming to a somewhat different API.
Attachment #8398217 - Flags: feedback?(nfroyd)
Attached patch wip7a (obsolete) — Splinter Review
Mostly a backup
- merge of read and write to a single Buffer<> class

Plans:
- make BufferStorage a shared reference-counted object among Buffer<> and Ptr<>'s so that when Ptr<> overlives the Buffer<>, we won't crash ; also will turn some methods from returning bool to be just void infallible
Attachment #8398217 - Attachment is obsolete: true
Attached patch wip7 (obsolete) — Splinter Review
Closing to something I would be proud to give for r?

- as wip7a: there is just a single Buffer<> class
- write() renamed to use(): short and IMO descriptive
- the underlying detail::BufferStorage class is now refcounted
- so that when a Ptr<> overlives a Buffer<> it's still safe to work with it
Attachment #8401968 - Attachment is obsolete: true
Attachment #8402405 - Flags: feedback?(nfroyd)
Comment on attachment 8402405 [details] [diff] [review]
wip7

Review of attachment 8402405 [details] [diff] [review]:
-----------------------------------------------------------------

We are getting a lot closer!  Still some things to address; I tried to point out things in comments too, if you're really serious that this is something worth asking for review on.

::: mfbt/Buffer.h
@@ +59,5 @@
> +  // to be written to a file or sent to the network.
> +  fwrite(buffer.head(), buffer.length(), 1, NULL /*placeholder*/);
> +
> +  // Read back the values previously stored
> +  uint32_t const * integer = buffer.read<uint32_t>();

I remain skeptical that we actually need this functionality of being able to read/write the same buffer.

Since the best argument is that code actually wants to do this, can you convert some of the aforementioned netwerk/ instances into code that uses this API?  If read/write on a given buffer is useful there, then we can include it in the API.  If it turns out to be unused there, then we can make the API not support that usecase and then wait for somebody to actually need it.  It'll be better to not land code into mfbt without some use elsewhere in the tree.

@@ +85,5 @@
> + */
> +class AllocatorFallible
> +{
> +public:
> +  static char * alloc(size_t count)

All of these functions should deal in void*, not char*.  Almost everything in this file that uses char* should use void* as well.

@@ +90,5 @@
> +  {
> +    return (char*)MallocAllocPolicy().malloc_(count);
> +  }
> +
> +  static char * realloc(char * buf, size_t count, size_t copy)

This function should be called realloc_ to avoid name clashes.

@@ +95,5 @@
> +  {
> +    return (char*)MallocAllocPolicy().realloc_(buf, copy, count);
> +  }
> +
> +  static void free(char * ptr)

Likewise, this function should be called free_.

@@ +139,5 @@
> +
> +    if (!hasOneRef()) {
> +      // There are Ptr<> instances referencing this buffer, we may
> +      // not shift the buffer data now.
> +      return;

Why is it safe to move the buffer when there is one reference, but not when there are two references?  For instance, I'm pretty sure that something like:

Buffer b ...;
Ptr<uint32_t const> p = b.read(2)
p++;
b.collectRead();

is going to cause problems, because |p| now points to the wrong place in the buffer.  Actually, if I'm reading things right, you don't even need the |p++|.

Seems like a very footgunnish API.  I'm kind of skeptical that any converted code is going to use this safely.

@@ +165,5 @@
> +  // Size of the buffer allocation.
> +  size_t size;
> +
> +private:
> +  FreeFunction autoFree;

I'd just call this |freeFunction|; |autoFree| sounds too much like a RAII member.

@@ +172,5 @@
> +} // namespace detail
> +
> +
> +/**
> + * This pointer is independent of the real buffer relloactions and

"reallocations".  I think I would say the "underlying buffer" instead of the "real buffer".

@@ +175,5 @@
> +/**
> + * This pointer is independent of the real buffer relloactions and
> + * collectRead() calls.
> + * It keeps reference to the BufferStorage (which keeps the physical
> + * buffer that may silently change) and logical offset cursor to this buffer.

Mmm, how about "Instead of a physical pointer into the underlying buffer, it stores a logical cursor into the buffer."  Or something like that.

@@ +178,5 @@
> + * It keeps reference to the BufferStorage (which keeps the physical
> + * buffer that may silently change) and logical offset cursor to this buffer.
> + */
> +template <typename T>
> +class Ptr

This almost seems like it should be a nested class inside Buffer, since it's so tightly connected with it.  Would also avoid adding unnecessary names to the mozilla:: namespace.

@@ +243,5 @@
> +  {
> +    if (!buf || stop == cursor)
> +      return nullptr;
> +
> +    T * b = reinterpret_cast<T *>(buf->buf + cursor);

Hm, this code (and everything else that uses get()) does nothing to ensure that the pointer is appropriately aligned for accesses of type T.  The correctness-conscious version would be something like:

operator void* () const {
  return get();
}

T operator*() const {
  T t;
  memcpy(&t, get(), sizeof(T));
  return T;
}

But that doesn't really help with the operator++ versions (I don't think we want those to return Ptr<T>).  It also makes the class somewhat less convenient to use.

@@ +256,5 @@
> +/**
> + * When number of elements doesn't fit the buffer, don't reallocate, just
> + * return null as the write buffer target pointer.
> + */
> +enum { REALLOC_NONE = 0 };

These REALLOC constants should probably live inside Buffer, to reduce namespace pollution.

@@ +268,5 @@
> +
> +/**
> + * How is the physical buffer referenced.
> + */
> +enum BufferReference

Likewise for this.

@@ +300,5 @@
> +{
> +public:
> +  Buffer() { }
> +
> +  Buffer(size_t count)

This function is going to need a comment.

@@ +313,5 @@
> +
> +  /**
> +   * Create (allocate) a buffer of an initial size of |count| elements.
> +   */
> +  bool create(size_t count)

The comment should also talk about bytes, not elements.

I'm not convinced this should be public.  But I guess that's what actually rewriting code to use this will decide.

@@ +327,5 @@
> +   * @param existing
> +   *    existing buffer to let this Buffer instance work with
> +   * @param count
> +   *    allocated size of the existing buffer
> +   * @param behavior

This name does not match the actual parameter name.

@@ +339,5 @@
> +    buf = new detail::BufferStorage(existing, count, freeFunc(ref));
> +  }
> +
> +  /**
> +   * Releases the current buffer.  When there are no Ptr<> activalely

"actively"

@@ +341,5 @@
> +
> +  /**
> +   * Releases the current buffer.  When there are no Ptr<> activalely
> +   * pointing at the storage the physical buffer is freed if the Buffer
> +   * instance created or adopted it.

Maybe "The physical buffer may not be freed if there are active Ptr<>s referencing it."

@@ +343,5 @@
> +   * Releases the current buffer.  When there are no Ptr<> activalely
> +   * pointing at the storage the physical buffer is freed if the Buffer
> +   * instance created or adopted it.
> +   */
> +  void release()

Likewise with this function and the publicness of it.

@@ +349,5 @@
> +    buf = nullptr;
> +  }
> +
> +  /**
> +   * Returns how many bytes or elements of type T is available to be written

This is not a template function, so there's no T available here.  I think you want to make this private and only expose the templated version.

@@ +373,5 @@
> +   * Write position is then shifted by |count * sizeof(T)|, so that following
> +   * call to use() will return a pointer to the new writing position.
> +   *
> +   * @param count
> +   *    how many elements of type T is demanded to be written to the buffer

"demanded" reads oddly; I usually think of things being "requested".

@@ +378,5 @@
> +   * @param reallocBy
> +   *    if there is not |count| elements available in the remaining buffer
> +   *    space, the buffer can be automatically reallocated
> +   *
> +   *    REALLOC_NONE - don't reallocate automatically, use() returns null when

I'd like to say that I find it amusing that you rejected my suggestion of "alloc" for this function on the grounds that all the allocation is "hidden", but couldn't avoid using "reallocation"/"allocate" continuously in the comment describing this function, the various names in the function interface, and bits of the API elsewhere. ;)

@@ +381,5 @@
> +   *
> +   *    REALLOC_NONE - don't reallocate automatically, use() returns null when
> +   *    no more space to use is available in the buffer
> +   *
> +   *    REALLOC_FIT - prolong only to exactly fit what is demanded to be

"prolong" is an odd word to use in this comment, here and elsewhere.  How about "resize"?

@@ +395,5 @@
> +   *    argument automatically.
> +   *
> +   *    When no automatic reallocation is setup (reallocBy = REALLOC_NONE)
> +   *    or when we would go out of memory, or when the buffer is being depend
> +   *    on and has to live after this Buffer, nullptr is returned.

I don't understand the "when the buffer is being depend on..." bit.  Can you rephrase?

@@ +407,5 @@
> +    // check for overflow
> +    if (count > SIZE_MAX / sizeof(T))
> +      return Ptr<T>();
> +
> +    count *= sizeof(T);

Please don't reuse variables like this.  Declare a new variable |bytesCount| or somesuch.

@@ +418,5 @@
> +  }
> +
> +  /**
> +   * Number of elements available (previously written) in the buffer
> +   * for reading.  Every call to read with non-null |count| decrease

"decreases".  Also, using "elements" in the comment for this function is incorrect, it's really bytes.

@@ +421,5 @@
> +   * Number of elements available (previously written) in the buffer
> +   * for reading.  Every call to read with non-null |count| decrease
> +   * the number of available elements to read.
> +   */
> +  size_t avail() const

I think you actually only want the templated version public, and this should be private.

@@ +436,5 @@
> +    return avail() / sizeof(T);
> +  }
> +
> +  /**
> +   * If available, returns |count| elements from the buffer at the current

Saying that it returns |count| elements is incorrect; it returns a Ptr to |count| elements.

@@ +451,5 @@
> +    // check for overflow
> +    if (count > SIZE_MAX / sizeof(T))
> +      return Ptr<T const>();
> +
> +    count *= sizeof(T);

Likewise with the variable reuse here.

@@ +463,5 @@
> +
> +  /**
> +   * Throws away all that has already been read from the buffer,
> +   * this actually compacts the memory usage so that we don't need
> +   * to reallocate that often.

Please fix this run-on sentence.  (Other instances of this pattern abound in the comments; I'm not going to point them all out.)

@@ +474,5 @@
> +
> +  /**
> +   * Return pointer to the whole buffer, this is independent on how much
> +   * has been read from the buffer so far, always points to the start of
> +   * the physical buffer.

This comment needs trimming and/or some wordsmithing.

@@ +504,5 @@
> +
> +    if (reallocBy == REALLOC_NONE || !buf->adopted())
> +      return false;
> +
> +    size_t missing = count - room();

|missing| doesn't seem like the right name for this.  |allocAmount| seems much more descriptive.

::: mfbt/tests/TestBuffer.cpp
@@ +5,5 @@
> +#include "mozilla/Assertions.h"
> +#include "mozilla/Buffer.h"
> +#include "mozilla/DebugOnly.h"
> +
> +using namespace mozilla;

The usual style in test files is to only declare what you use.  So:

using mozilla::Buffer;
using mozilla::DebugOnly;
using mozilla::Ptr;
Attachment #8402405 - Flags: feedback?(nfroyd) → feedback+
Ben Turner also pointed out that this stuff is very similar in style to the ipc/ Pickle class; it might be worth looking there for inspiration.  Less template magic there, so there's a lot more explicit API surface, but still worth poking at.
(In reply to Nathan Froyd (:froydnj) from comment #22)
> Comment on attachment 8402405 [details] [diff] [review]
> wip7
> 
> Review of attachment 8402405 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> We are getting a lot closer!  Still some things to address; I tried to point
> out things in comments too, if you're really serious that this is something
> worth asking for review on.

Thank you for the quick feedback again!

> 
> ::: mfbt/Buffer.h
> @@ +59,5 @@
> > +  // to be written to a file or sent to the network.
> > +  fwrite(buffer.head(), buffer.length(), 1, NULL /*placeholder*/);
> > +
> > +  // Read back the values previously stored
> > +  uint32_t const * integer = buffer.read<uint32_t>();
> 
> I remain skeptical that we actually need this functionality of being able to
> read/write the same buffer.

Maybe we can split again:

Have a buffer class you write() (or use()) to and that gives you a blob at the end.

Then have a different buffer class given a blob at the start you can read elements from.

Maybe this is something I originally had in mind but until the smart Ptr<> class has been introduced I messed that in a more confusing way.


I think I should expand the examples to show how to write a blob and then how to read it back from say a file.  Outline of reading a blob:

size_t bytesToRead = myFile->AvailableBytes();

Buffer<> b(bytesToRead);
myFile->Read(b.use(bytesToRead), bytesToRead);

Ptr<uint32_t const> firstElement = b.read<uint32_t>();
Ptr<uint64_t const> secondElement = b.read<uint64_t>();

etc...



> 
> Since the best argument is that code actually wants to do this, can you
> convert some of the aforementioned netwerk/ instances into code that uses
> this API?  If read/write on a given buffer is useful there, then we can
> include it in the API.  If it turns out to be unused there, then we can make
> the API not support that usecase and then wait for somebody to actually need
> it.  It'll be better to not land code into mfbt without some use elsewhere
> in the tree.

I understand and agree.

> @@ +139,5 @@
> > +
> > +    if (!hasOneRef()) {
> > +      // There are Ptr<> instances referencing this buffer, we may
> > +      // not shift the buffer data now.
> > +      return;
> 
> Why is it safe to move the buffer when there is one reference, but not when
> there are two references?  For instance, I'm pretty sure that something like:
> 
> Buffer b ...;
> Ptr<uint32_t const> p = b.read(2)
> p++;
> b.collectRead();
> 
> is going to cause problems, because |p| now points to the wrong place in the
> buffer.  Actually, if I'm reading things right, you don't even need the
> |p++|.

Oh, and it doesn't!  That is the point!

Line by line:
> Buffer b ...;
say we have a buffer and something it, at least 2 * 4 bytes so that the following read(2) won't fail

you get a pointer to where you can read your expected value
> Ptr<uint32_t const> p = b.read(2)
here you forget the first uint32_t element you wanted to read - this is actually a bad use of the ptr!
> p++;
at this line the *p will give you the second element in the buffer

collectRead does nothing here, since:
- first reference to the BufferStorage comes from Buffer itself
- second reference comes from the still existing Ptr<>
> b.collectRead();

hence, p points to the correct position.  This worked even in wip6.


> 
> Seems like a very footgunnish API.  I'm kind of skeptical that any converted
> code is going to use this safely.

I hope the lines above shows otherwise.  Also, next time please read the comments (comment 10) I put to bugzilla along with patches it explains what may be missing in comments of the draft patches.

> @@ +178,5 @@
> > + * It keeps reference to the BufferStorage (which keeps the physical
> > + * buffer that may silently change) and logical offset cursor to this buffer.
> > + */
> > +template <typename T>
> > +class Ptr
> 
> This almost seems like it should be a nested class inside Buffer, since it's
> so tightly connected with it.  Would also avoid adding unnecessary names to
> the mozilla:: namespace.

Yeah, good idea.

> 
> @@ +243,5 @@
> > +  {
> > +    if (!buf || stop == cursor)
> > +      return nullptr;
> > +
> > +    T * b = reinterpret_cast<T *>(buf->buf + cursor);
> 
> Hm, this code (and everything else that uses get()) does nothing to ensure
> that the pointer is appropriately aligned for accesses of type T.  The
> correctness-conscious version would be something like:

Doesn't the reinterpret_cast ensure that?

> 
> operator void* () const {
>   return get();
> }
> 
> T operator*() const {
>   T t;
>   memcpy(&t, get(), sizeof(T));
>   return T;
> }

Oh, why memcpy??  Can you explain more??

> 
> But that doesn't really help with the operator++ versions (I don't think we
> want those to return Ptr<T>).  It also makes the class somewhat less
> convenient to use.

I don't follow.  You can always do *buf.use<uint32_t>(1) = myValue; and uint32_t const myValue = *buf.read<uint32_t>(1);  as well as uint32_t const* myValue = buf.read<uint32_t>(1); with fact that the pointer may point to a bad memory when the buffer is being written to or so.

> 
> @@ +256,5 @@
> > +/**
> > + * When number of elements doesn't fit the buffer, don't reallocate, just
> > + * return null as the write buffer target pointer.
> > + */
> > +enum { REALLOC_NONE = 0 };
> 
> These REALLOC constants should probably live inside Buffer, to reduce
> namespace pollution.

I'm not sure I can then use them as a default values for the Buffer<> template.  But yes, I was thinking of moving these inside too.

> @@ +300,5 @@
> > +{
> > +public:
> > +  Buffer() { }
> > +
> > +  Buffer(size_t count)
> 
> This function is going to need a comment.

Is a reference to create() just bellow enough?  Otherwise I would mostly copy its comment here.

> 
> @@ +313,5 @@
> > +
> > +  /**
> > +   * Create (allocate) a buffer of an initial size of |count| elements.
> > +   */
> > +  bool create(size_t count)
> 
> The comment should also talk about bytes, not elements.
> 
> I'm not convinced this should be public.  But I guess that's what actually
> rewriting code to use this will decide.

I think we may want to be able to (re)create the buffer somewhat dynamically too.  And release it as well.  For me a valid use case.

> @@ +343,5 @@
> > +   * Releases the current buffer.  When there are no Ptr<> activalely
> > +   * pointing at the storage the physical buffer is freed if the Buffer
> > +   * instance created or adopted it.
> > +   */
> > +  void release()
> 
> Likewise with this function and the publicness of it.

See my previous comment.  I want to be able to have a single Buffer<> mBuffer; member of a class that may freely allocate with create() when needed and release() when no needed.  I don't want to do AutoPtr<Buffer<> > mBuffer or something like that...  What do you think?

> 
> @@ +349,5 @@
> > +    buf = nullptr;
> > +  }
> > +
> > +  /**
> > +   * Returns how many bytes or elements of type T is available to be written
> 
> This is not a template function, so there's no T available here.  I think
> you want to make this private and only expose the templated version.

Comment might confuse you here.  I want this exposed for convenience.  I want both room() / working on the actual buffer size / and room<SpecificType>() be public.

> 
> @@ +378,5 @@
> > +   * @param reallocBy
> > +   *    if there is not |count| elements available in the remaining buffer
> > +   *    space, the buffer can be automatically reallocated
> > +   *
> > +   *    REALLOC_NONE - don't reallocate automatically, use() returns null when
> 
> I'd like to say that I find it amusing that you rejected my suggestion of
> "alloc" for this function on the grounds that all the allocation is
> "hidden", but couldn't avoid using "reallocation"/"allocate" continuously in
> the comment describing this function, the various names in the function
> interface, and bits of the API elsewhere. ;)

Ah... :) yeah, maybe I misunderstood or forget to address all of it.  This is still a memory buffer and some control might be useful.  But if you think that to just always realloc is good, I will go with it.  Maybe this (what I draft here) is unnecessarily complicated.

> 
> @@ +395,5 @@
> > +   *    argument automatically.
> > +   *
> > +   *    When no automatic reallocation is setup (reallocBy = REALLOC_NONE)
> > +   *    or when we would go out of memory, or when the buffer is being depend
> > +   *    on and has to live after this Buffer, nullptr is returned.
> 
> I don't understand the "when the buffer is being depend on..." bit.  Can you
> rephrase?

Ah, yeah, I should rewrite like:

"When the buffer is referenced by some Ptr<>'s" or something like this.

> @@ +421,5 @@
> > +   * Number of elements available (previously written) in the buffer
> > +   * for reading.  Every call to read with non-null |count| decrease
> > +   * the number of available elements to read.
> > +   */
> > +  size_t avail() const
> 
> I think you actually only want the templated version public, and this should
> be private.

Same arg as for room().  But if you think this is going to cause too much of a confusion, I will hide or remove it.

> @@ +463,5 @@
> > +
> > +  /**
> > +   * Throws away all that has already been read from the buffer,
> > +   * this actually compacts the memory usage so that we don't need
> > +   * to reallocate that often.
> 
> Please fix this run-on sentence.  (Other instances of this pattern abound in
> the comments; I'm not going to point them all out.)

I'm sorry, I don't understand correctly what you want from me here?

> @@ +504,5 @@
> > +
> > +    if (reallocBy == REALLOC_NONE || !buf->adopted())
> > +      return false;
> > +
> > +    size_t missing = count - room();
> 
> |missing| doesn't seem like the right name for this.  |allocAmount| seems
> much more descriptive.

Hmm... missing means how many bytes we are missing to fulfill the use() request so that current+missing is the amount we want to reallocate to.  So allocAmount is not the right one as well I think.


(In reply to Nathan Froyd (:froydnj) from comment #23)
> Ben Turner also pointed out that this stuff is very similar in style to the
> ipc/ Pickle class; it might be worth looking there for inspiration.  Less
> template magic there, so there's a lot more explicit API surface, but still
> worth poking at.

Yup, I checked on that.  Very very similar, but from the chromium code.  Do you suggest anything here?  Should we (I) use Pickle instead and WONTFIX this bug?  Do you have something in mind I should take from that class?  (Personally, the API seems to me a bit confusing on the first sight and Buffer<> has the big advantage of safe Ptr<>'s ; Pickle explicitly says it's not safe to use the pointers after the buffer is written to).  Btw, it has writeXX() methods ;)  Tho, with a different, more stream like signatures.


Btw, I so far did not add anything for network order writes.  Somehow I think that the current API in Endian.h can use this new Buffer API as:

NetworkEndian::writeUint32(buf.use<uint32_t>(), mMyInt32);
(In reply to Honza Bambas (:mayhemer) from comment #24)
> > ::: mfbt/Buffer.h
> > @@ +59,5 @@
> > > +  // to be written to a file or sent to the network.
> > > +  fwrite(buffer.head(), buffer.length(), 1, NULL /*placeholder*/);
> > > +
> > > +  // Read back the values previously stored
> > > +  uint32_t const * integer = buffer.read<uint32_t>();
> > 
> > I remain skeptical that we actually need this functionality of being able to
> > read/write the same buffer.
> 
> Maybe we can split again:
> 
> Have a buffer class you write() (or use()) to and that gives you a blob at
> the end.
> 
> Then have a different buffer class given a blob at the start you can read
> elements from.
> 
> Maybe this is something I originally had in mind but until the smart Ptr<>
> class has been introduced I messed that in a more confusing way.

I think that would be reasonable.  In the first iteration, at least, doing that means that the buffer adoption bits in wip7 could go away (WriteBuffer always returns newly-allocated memory, ReadBuffer always shares ownership--unless converting code demonstrates WriteBuffer needs something more complicated).

> > @@ +139,5 @@
> > > +
> > > +    if (!hasOneRef()) {
> > > +      // There are Ptr<> instances referencing this buffer, we may
> > > +      // not shift the buffer data now.
> > > +      return;
> > 
> > Why is it safe to move the buffer when there is one reference, but not when
> > there are two references?  For instance, I'm pretty sure that something like:
> > 
> > Buffer b ...;
> > Ptr<uint32_t const> p = b.read(2)
> > p++;
> > b.collectRead();
> > 
> > is going to cause problems, because |p| now points to the wrong place in the
> > buffer.  Actually, if I'm reading things right, you don't even need the
> > |p++|.
> 
> Oh, and it doesn't!  That is the point!
>
> collectRead does nothing here, since:
> - first reference to the BufferStorage comes from Buffer itself
> - second reference comes from the still existing Ptr<>

Thanks for explaining this!  I see that I forgot about the Buffer's reference to the underlying Storage.  Though if we split the read/write functionality, I'm not sure we need anything like collectRead() anymore...

> > @@ +178,5 @@
> > > + * It keeps reference to the BufferStorage (which keeps the physical
> > > + * buffer that may silently change) and logical offset cursor to this buffer.
> > > + */
> > > +template <typename T>
> > > +class Ptr
> > 
> > This almost seems like it should be a nested class inside Buffer, since it's
> > so tightly connected with it.  Would also avoid adding unnecessary names to
> > the mozilla:: namespace.
> 
> Yeah, good idea.

BufferStorage could probably move inside Buffer too, once you move Ptr in there.

> > @@ +243,5 @@
> > > +  {
> > > +    if (!buf || stop == cursor)
> > > +      return nullptr;
> > > +
> > > +    T * b = reinterpret_cast<T *>(buf->buf + cursor);
> > 
> > Hm, this code (and everything else that uses get()) does nothing to ensure
> > that the pointer is appropriately aligned for accesses of type T.  The
> > correctness-conscious version would be something like:
> 
> Doesn't the reinterpret_cast ensure that?

Nope.  The reinterpret_cast here simply says to the compiler, "I am converting this thing of type T1 to a completely different and entirely unrelated thing of type T2.  I take full responsibility for any brokenness that occurs."  Just like reinterpret_cast'ing integers to pointers doesn't ensure the resulting pointer points to something valid, reinterpret_cast'ing pointers around doesn't ensure that the resulting pointer is actually useful or meaningful.

So if the underlying storage is pointing at address 0xZZZZZZ00 and we do something like:

  Ptr<uint16_t> p16 = b.use<uint16_t>();
  Ptr<uint32_t> p32 = b.use<uint32_t>();

|p32| is now referencing storage at address 0xZZZZZZ02.  If |uint32_t| is required to be aligned on 4-byte boundaries, and the underlying machine strictly enforces alignment, trying to read or write a |uint32_t| from |p32| will result in an alignment fault.  This sort of thing isn't noticeable on x86 or x86-64 because they handle unaligned accesses, but it's very noticeable when porting to other chips.  Better to avoid the unspecified behavior.

(The standard says that if you are going to do |reinterpret_cast<T2*>(p)| where |p| is of type |T1*|, then T2's alignment requirements must be no stricter than those of T1.  In this case, that's not true: |T1*| is |void*|, which will have the same alignment requirements as |char|.  But on all the machines we care about, nearly everything has stricter alignment requirements than |char|.  Ergo, nearly all interesting cases of use<T>() are unspecified by the standard.  That's not a good thing.)

> > operator void* () const {
> >   return get();
> > }
> > 
> > T operator*() const {
> >   T t;
> >   memcpy(&t, get(), sizeof(T));
> >   return T;
> > }
> 
> Oh, why memcpy??  Can you explain more??

If get() returns |void*|, then we can't know that it has the correct alignment for values of type T.  Therefore, we have to use memcpy, which just copies bytes and doesn't care about alignments.

> > But that doesn't really help with the operator++ versions (I don't think we
> > want those to return Ptr<T>).  It also makes the class somewhat less
> > convenient to use.
> 
> I don't follow.  You can always do *buf.use<uint32_t>(1) = myValue; and
> uint32_t const myValue = *buf.read<uint32_t>(1);  as well as uint32_t const*
> myValue = buf.read<uint32_t>(1); with fact that the pointer may point to a
> bad memory when the buffer is being written to or so.

I do not follow your response, particularly the part about "bad memory".  The whole point of returning Ptr<> is that the buffer can't disappear out from underneath your Ptr<> and therefore you can never read/write to memory you no longer own.  The alignment concern isn't about use-after-free or something like that; it's about segfaults due to alignment traps and unspecified behavior.

> > @@ +300,5 @@
> > > +{
> > > +public:
> > > +  Buffer() { }
> > > +
> > > +  Buffer(size_t count)
> > 
> > This function is going to need a comment.
> 
> Is a reference to create() just bellow enough?  Otherwise I would mostly
> copy its comment here.

That's probably sufficient, assuming create() is actually useful.

> > @@ +343,5 @@
> > > +   * Releases the current buffer.  When there are no Ptr<> activalely
> > > +   * pointing at the storage the physical buffer is freed if the Buffer
> > > +   * instance created or adopted it.
> > > +   */
> > > +  void release()
> > 
> > Likewise with this function and the publicness of it.
> 
> See my previous comment.  I want to be able to have a single Buffer<>
> mBuffer; member of a class that may freely allocate with create() when
> needed and release() when no needed.  I don't want to do AutoPtr<Buffer<> >
> mBuffer or something like that...  What do you think?

I am not sure this is useful.  But I am willing to be convinced by code.

> > @@ +349,5 @@
> > > +    buf = nullptr;
> > > +  }
> > > +
> > > +  /**
> > > +   * Returns how many bytes or elements of type T is available to be written
> > 
> > This is not a template function, so there's no T available here.  I think
> > you want to make this private and only expose the templated version.
> 
> Comment might confuse you here.  I want this exposed for convenience.  I
> want both room() / working on the actual buffer size / and
> room<SpecificType>() be public.

Well, the comment should be fixed for this function in any event.  I'll defer a decision here until we see use cases in client code.

> > @@ +378,5 @@
> > > +   * @param reallocBy
> > > +   *    if there is not |count| elements available in the remaining buffer
> > > +   *    space, the buffer can be automatically reallocated
> > > +   *
> > > +   *    REALLOC_NONE - don't reallocate automatically, use() returns null when
> > 
> > I'd like to say that I find it amusing that you rejected my suggestion of
> > "alloc" for this function on the grounds that all the allocation is
> > "hidden", but couldn't avoid using "reallocation"/"allocate" continuously in
> > the comment describing this function, the various names in the function
> > interface, and bits of the API elsewhere. ;)
> 
> Ah... :) yeah, maybe I misunderstood or forget to address all of it.  This
> is still a memory buffer and some control might be useful.  But if you think
> that to just always realloc is good, I will go with it.  Maybe this (what I
> draft here) is unnecessarily complicated.

For avoidance of doubt, I don't think it's necessarily a bad thing that there are "special" arguments.  I was more just commenting on the naming of things in the API.  If you wanted to rename use() to alloc(), I would be all for that. :)

> > @@ +463,5 @@
> > > +
> > > +  /**
> > > +   * Throws away all that has already been read from the buffer,
> > > +   * this actually compacts the memory usage so that we don't need
> > > +   * to reallocate that often.
> > 
> > Please fix this run-on sentence.  (Other instances of this pattern abound in
> > the comments; I'm not going to point them all out.)
> 
> I'm sorry, I don't understand correctly what you want from me here?

You have two independent sentences:

- "Throws away all that has already been read from the buffer"
- "This actually compacts the memory usage so that we don't need to reallocate that often"

But you've joined them with a comma, which makes the whole thing a run-on sentence.  Either rewrite the single sentence properly or separate into two distinct sentences.

See also http://en.wikipedia.org/wiki/Run-on_sentence

("compacts the memory usage" is also quite awkward, but I'm not immediately sure what the best replacement there is, other than that it should be replaced.)

> Btw, I so far did not add anything for network order writes.  Somehow I
> think that the current API in Endian.h can use this new Buffer API as:
> 
> NetworkEndian::writeUint32(buf.use<uint32_t>(), mMyInt32);

Yup, that would be ideal.
(In reply to Honza Bambas (:mayhemer) from comment #24)
> (In reply to Nathan Froyd (:froydnj) from comment #22)
> > Comment on attachment 8402405 [details] [diff] [review]
> > wip7
> > 
> > Review of attachment 8402405 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > We are getting a lot closer!  Still some things to address; I tried to point
> > out things in comments too, if you're really serious that this is something
> > worth asking for review on.
> 
> Thank you for the quick feedback again!
> 
> > 
> > ::: mfbt/Buffer.h
> > @@ +59,5 @@
> > > +  // to be written to a file or sent to the network.
> > > +  fwrite(buffer.head(), buffer.length(), 1, NULL /*placeholder*/);
> > > +
> > > +  // Read back the values previously stored
> > > +  uint32_t const * integer = buffer.read<uint32_t>();
> > 
> > I remain skeptical that we actually need this functionality of being able to
> > read/write the same buffer.
> 
> Maybe we can split again:
> 
> Have a buffer class you write() (or use()) to and that gives you a blob at
> the end.
> 
> Then have a different buffer class given a blob at the start you can read
> elements from.
> 
> Maybe this is something I originally had in mind but until the smart Ptr<>
> class has been introduced I messed that in a more confusing way.
> 
> 
> I think I should expand the examples to show how to write a blob and then
> how to read it back from say a file.  Outline of reading a blob:
> 
> size_t bytesToRead = myFile->AvailableBytes();
> 
> Buffer<> b(bytesToRead);
> myFile->Read(b.use(bytesToRead), bytesToRead);
> 
> Ptr<uint32_t const> firstElement = b.read<uint32_t>();
> Ptr<uint64_t const> secondElement = b.read<uint64_t>();
> 
> etc...
> 
> 
> 
> > 
> > Since the best argument is that code actually wants to do this, can you
> > convert some of the aforementioned netwerk/ instances into code that uses
> > this API?  If read/write on a given buffer is useful there, then we can
> > include it in the API.  If it turns out to be unused there, then we can make
> > the API not support that usecase and then wait for somebody to actually need
> > it.  It'll be better to not land code into mfbt without some use elsewhere
> > in the tree.
> 
> I understand and agree.
> 
> > @@ +139,5 @@
> > > +
> > > +    if (!hasOneRef()) {
> > > +      // There are Ptr<> instances referencing this buffer, we may
> > > +      // not shift the buffer data now.
> > > +      return;
> > 
> > Why is it safe to move the buffer when there is one reference, but not when
> > there are two references?  For instance, I'm pretty sure that something like:
> > 
> > Buffer b ...;
> > Ptr<uint32_t const> p = b.read(2)
> > p++;
> > b.collectRead();
> > 
> > is going to cause problems, because |p| now points to the wrong place in the
> > buffer.  Actually, if I'm reading things right, you don't even need the
> > |p++|.
> 
> Oh, and it doesn't!  That is the point!
> 
> Line by line:
> > Buffer b ...;
> say we have a buffer and something it, at least 2 * 4 bytes so that the
> following read(2) won't fail
> 
> you get a pointer to where you can read your expected value
> > Ptr<uint32_t const> p = b.read(2)
> here you forget the first uint32_t element you wanted to read - this is
> actually a bad use of the ptr!
> > p++;
> at this line the *p will give you the second element in the buffer
> 
> collectRead does nothing here, since:
> - first reference to the BufferStorage comes from Buffer itself
> - second reference comes from the still existing Ptr<>
> > b.collectRead();
> 
> hence, p points to the correct position.  This worked even in wip6.
> 
> 
> > 
> > Seems like a very footgunnish API.  I'm kind of skeptical that any converted
> > code is going to use this safely.
> 
> I hope the lines above shows otherwise.  Also, next time please read the
> comments (comment 10) I put to bugzilla along with patches it explains what
> may be missing in comments of the draft patches.
> 
> > @@ +178,5 @@
> > > + * It keeps reference to the BufferStorage (which keeps the physical
> > > + * buffer that may silently change) and logical offset cursor to this buffer.
> > > + */
> > > +template <typename T>
> > > +class Ptr
> > 
> > This almost seems like it should be a nested class inside Buffer, since it's
> > so tightly connected with it.  Would also avoid adding unnecessary names to
> > the mozilla:: namespace.
> 
> Yeah, good idea.
> 
> > 
> > @@ +243,5 @@
> > > +  {
> > > +    if (!buf || stop == cursor)
> > > +      return nullptr;
> > > +
> > > +    T * b = reinterpret_cast<T *>(buf->buf + cursor);
> > 
> > Hm, this code (and everything else that uses get()) does nothing to ensure
> > that the pointer is appropriately aligned for accesses of type T.  The
> > correctness-conscious version would be something like:
> 
> Doesn't the reinterpret_cast ensure that?
> 
> > 
> > operator void* () const {
> >   return get();
> > }
> > 
> > T operator*() const {
> >   T t;
> >   memcpy(&t, get(), sizeof(T));
> >   return T;
> > }
> 
> Oh, why memcpy??  Can you explain more??
> 
> > 
> > But that doesn't really help with the operator++ versions (I don't think we
> > want those to return Ptr<T>).  It also makes the class somewhat less
> > convenient to use.
> 
> I don't follow.  You can always do *buf.use<uint32_t>(1) = myValue; and
> uint32_t const myValue = *buf.read<uint32_t>(1);  as well as uint32_t const*
> myValue = buf.read<uint32_t>(1); with fact that the pointer may point to a
> bad memory when the buffer is being written to or so.
> 
> > 
> > @@ +256,5 @@
> > > +/**
> > > + * When number of elements doesn't fit the buffer, don't reallocate, just
> > > + * return null as the write buffer target pointer.
> > > + */
> > > +enum { REALLOC_NONE = 0 };
> > 
> > These REALLOC constants should probably live inside Buffer, to reduce
> > namespace pollution.
> 
> I'm not sure I can then use them as a default values for the Buffer<>
> template.  But yes, I was thinking of moving these inside too.
> 
> > @@ +300,5 @@
> > > +{
> > > +public:
> > > +  Buffer() { }
> > > +
> > > +  Buffer(size_t count)
> > 
> > This function is going to need a comment.
> 
> Is a reference to create() just bellow enough?  Otherwise I would mostly
> copy its comment here.
> 
> > 
> > @@ +313,5 @@
> > > +
> > > +  /**
> > > +   * Create (allocate) a buffer of an initial size of |count| elements.
> > > +   */
> > > +  bool create(size_t count)
> > 
> > The comment should also talk about bytes, not elements.
> > 
> > I'm not convinced this should be public.  But I guess that's what actually
> > rewriting code to use this will decide.
> 
> I think we may want to be able to (re)create the buffer somewhat dynamically
> too.  And release it as well.  For me a valid use case.
> 
> > @@ +343,5 @@
> > > +   * Releases the current buffer.  When there are no Ptr<> activalely
> > > +   * pointing at the storage the physical buffer is freed if the Buffer
> > > +   * instance created or adopted it.
> > > +   */
> > > +  void release()
> > 
> > Likewise with this function and the publicness of it.
> 
> See my previous comment.  I want to be able to have a single Buffer<>
> mBuffer; member of a class that may freely allocate with create() when
> needed and release() when no needed.  I don't want to do AutoPtr<Buffer<> >
> mBuffer or something like that...  What do you think?

It's also worth pointing out that you can get this functionality of create/release through mozilla::Maybe (mfbt/Maybe.h), which is a more elegant way of doing things, IMHO.
(In reply to Nathan Froyd (:froydnj) from comment #25)
> > Maybe this is something I originally had in mind but until the smart Ptr<>
> > class has been introduced I messed that in a more confusing way.
> 
> I think that would be reasonable.  In the first iteration, at least, doing
> that means that the buffer adoption bits in wip7 could go away (WriteBuffer
> always returns newly-allocated memory, ReadBuffer always shares
> ownership--unless converting code demonstrates WriteBuffer needs something
> more complicated).

I am not sure I agree, hence I think I'll stop arguing and updating the patch at this point and rather try to use it in some of the real code.  Then we'll get the answer how much this can be simplified.

> > collectRead does nothing here, since:
> > - first reference to the BufferStorage comes from Buffer itself
> > - second reference comes from the still existing Ptr<>
> 
> Thanks for explaining this!  I see that I forgot about the Buffer's
> reference to the underlying Storage.  Though if we split the read/write
> functionality, I'm not sure we need anything like collectRead() anymore...

I see a clear example for collectRead in our code:
http://hg.mozilla.org/mozilla-central/annotate/25aeb2bc79f2/netwerk/cache2/CacheIndex.cpp#l2052

Here a single buffer is reused, so the buffer to read the blob to is not reallocated again and again.  But a real usage will show.

> > > @@ +243,5 @@
> > > > +  {
> > > > +    if (!buf || stop == cursor)
> > > > +      return nullptr;
> > > > +
> > > > +    T * b = reinterpret_cast<T *>(buf->buf + cursor);
> > > 
> > > Hm, this code (and everything else that uses get()) does nothing to ensure
> > > that the pointer is appropriately aligned for accesses of type T.  The
> > > correctness-conscious version would be something like:
> > 
> > Doesn't the reinterpret_cast ensure that?
> 
> Nope.  The reinterpret_cast here simply says to the compiler, "I am
> converting this thing of type T1 to a completely different and entirely
> unrelated thing of type T2.  I take full responsibility for any brokenness
> that occurs."  Just like reinterpret_cast'ing integers to pointers doesn't
> ensure the resulting pointer points to something valid, reinterpret_cast'ing
> pointers around doesn't ensure that the resulting pointer is actually useful
> or meaningful.
> 
> So if the underlying storage is pointing at address 0xZZZZZZ00 and we do
> something like:
> 
>   Ptr<uint16_t> p16 = b.use<uint16_t>();
>   Ptr<uint32_t> p32 = b.use<uint32_t>();
> 
> |p32| is now referencing storage at address 0xZZZZZZ02.  If |uint32_t| is
> required to be aligned on 4-byte boundaries, and the underlying machine
> strictly enforces alignment, trying to read or write a |uint32_t| from |p32|
> will result in an alignment fault.  This sort of thing isn't noticeable on
> x86 or x86-64 because they handle unaligned accesses, but it's very
> noticeable when porting to other chips.  Better to avoid the unspecified
> behavior.
> 
> (The standard says that if you are going to do |reinterpret_cast<T2*>(p)|
> where |p| is of type |T1*|, then T2's alignment requirements must be no
> stricter than those of T1.  In this case, that's not true: |T1*| is |void*|,
> which will have the same alignment requirements as |char|.  But on all the
> machines we care about, nearly everything has stricter alignment
> requirements than |char|.  Ergo, nearly all interesting cases of use<T>()
> are unspecified by the standard.  That's not a good thing.)

Thanks.  Every day I learn something new.

> 
> > > operator void* () const {
> > >   return get();
> > > }
> > > 
> > > T operator*() const {
> > >   T t;
> > >   memcpy(&t, get(), sizeof(T));
> > >   return T;
> > > }
> > 
> > Oh, why memcpy??  Can you explain more??
> 
> If get() returns |void*|, then we can't know that it has the correct
> alignment for values of type T.  Therefore, we have to use memcpy, which
> just copies bytes and doesn't care about alignments.
> 
> > > But that doesn't really help with the operator++ versions (I don't think we
> > > want those to return Ptr<T>).  It also makes the class somewhat less
> > > convenient to use.
> > 
> > I don't follow.  You can always do *buf.use<uint32_t>(1) = myValue; and
> > uint32_t const myValue = *buf.read<uint32_t>(1);  as well as uint32_t const*
> > myValue = buf.read<uint32_t>(1); with fact that the pointer may point to a
> > bad memory when the buffer is being written to or so.
> 
> I do not follow your response, particularly the part about "bad memory". 
> The whole point of returning Ptr<> is that the buffer can't disappear out
> from underneath your Ptr<> and therefore you can never read/write to memory
> you no longer own.  The alignment concern isn't about use-after-free or
> something like that; it's about segfaults due to alignment traps and
> unspecified behavior.

Yep, OK.  Please forget my response (I don't understand the original concern).

Can you please rephrase you original question?  


Bottom line:

I will try to incorporate this to a real code, our new cache metadata or cache index code is the first I want to try with.  Since that is more work and that exact code right now develops a lot, I won't get to it sooner then next week or even later.  Please stay tuned, I really want to finish this to get to the tree.  So far, thanks for all your great and quick feedback!
Note for me: check if this could help with fixing bug 993591 easier way.
Assignee: honzab.moz → dd.mozilla
I will continue working on this bug.
Attached patch wip8 (obsolete) — Splinter Review
I have made some changes and I apologize in advance if I have missed some comments and I have not made suggested changes.

The patch is not ready for a final review, I just want to get an agreement about API that we want.
So any suggestions are welcome...

In the first patch, there are only changes to  to Buffer.h and TestBuffer.cpp
The second patch is an example how it can be used (CacheFileMetadata class)


A short description:
 - we still have one Buffer for read and write. I have not separated buffer classes to ReadBuffer and WriteBuffer
- we still have a Ptr<> and use<> returns Ptr<T> and read returns Ptr<T const>

- I have added:
  template <typename T>
  bool write(T const* src, size_t count = 1)

which does the necessary checks and do a memcpy of data. This should be use when we want fast data copy


char * useChar(size_t count = 1)
const char * readChar(size_t count = 1)
these two return pointers to the row buffer. There are some examples where we can use them in the second patch

/**
   * Makes free space for writing at the head of the buffer.
   * If there Ptr<> refering to the buffer, the buffer will not be shifted
   * This function will rewindRead;
   */
template <typename T>
Ptr<T> unshift(size_t count = 1)

this function is needed in CacheFileMetadata class, but I am not sure if we should make it available.


And

void rewindRead()

This is also needed in CacheFileMetadata class,
because the buffer structure is 
[hash:uint32]
[data:arbitrary] 
[length]

so first I write all data leaving pointer to [hash:uint32], this is
Ptr<uint32_t> pHash = buf.use<uint32_t>();

then I read data that I need for hash calculation 
calculate hash and write it to pHash

Suggestion from Honza, as a possible solution was to have a Cursor<T> and do something like:

 Ptr<uint32_t> hash = buf.use<uint32_t>();
Cursor<char> hashBegin = buf.writeCursor();

...write the [data]...

Cursor<char> hashEnd = buf.writeCursor();
*hash = CacheHash::Hash(hashBegin.get(), hashEnd – hashBegin);


Honza: please correct me if I quoted you wrong

I am afraid that the API will get to complicated with jet another way to access data.
Attachment #8402405 - Attachment is obsolete: true
Attachment #8452346 - Flags: feedback?(nfroyd)
Attachment #8452346 - Flags: feedback?(honzab.moz)
Comment on attachment 8452346 [details] [diff] [review]
wip8

Review of attachment 8452346 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for updating this.  And for converting bits to use it!  I've attempted to remember everything about this from three months ago; I apologize in advance if I'm repeating myself.

I've just browsed through and commented on things.  There's a lot of comments, but many of them are very small things (if tedious to fix), especially the bits of MFBT style that have changed over the past three months.

f+, but really awaiting the version with separate read/write buffers.  I think that will make things a lot simpler.

::: mfbt/Buffer.h
@@ +9,5 @@
> + */
> +
> +#if 0
> +{
> +  /* Example of use */

MFBT style is to use comments for this, rather than #if 0'd blocks of code (though I do see the appeal of using #if 0 blocks).

@@ +85,5 @@
> +
> +#include "mozilla/AllocPolicy.h"
> +#include "mozilla/Assertions.h"
> +#include "mozilla/Types.h"
> +#include "mozilla/RefPtr.h"

Nit: style is to sort the headers alphabetically.

@@ +98,5 @@
> + */
> +class AllocatorFallible
> +{
> +public:
> +  static void * alloc(size_t count)

Nit: |*| goes with the type, so |static void* alloc(...)|.  Here and other places throughout the file.

@@ +103,5 @@
> +  {
> +    return MallocAllocPolicy().malloc_(count);
> +  }
> +
> +  static void * realloc(void * buf, size_t copy, size_t count)

realloc_, just like MallocAllocPolicy.

@@ +108,5 @@
> +  {
> +    return MallocAllocPolicy().realloc_(buf, copy, count);
> +  }
> +
> +  static void free(void * ptr)

free_, just like MallocAllocPolicy.

@@ +117,5 @@
> +
> +/**
> + * The free() function type.
> + */
> +typedef void(*FreeFunction)(void * ptr);

This typedef should be internal to Buffer or placed in the mozilla::detail namespace.

@@ +142,5 @@
> + *    Define default behavior of the buffer reallocation when write doesn't 
> + *    fit the current buffer space left.
> + */
> +
> +template <size_t DefaultRealloc = REALLOC_NONE,

Please move the REALLOC_* constants into a named enum and require the template argument to be a named enum here.

Also, nit: no space between the |template| and the |<|.  Here and other places.

@@ +143,5 @@
> + *    fit the current buffer space left.
> + */
> +
> +template <size_t DefaultRealloc = REALLOC_NONE,
> +            class Allocator = AllocatorFallible>

Nit: indentation.

I'm not completely convinced we need this class to be templated.

@@ +158,5 @@
> +  /**
> +    * Instructs the buffer to not release or reallocate the buffer
> +    * it was given.
> +    */
> +  BUFFER_SHARE = 0, 

Nit: no trailing whitespace.  Here and other places throughout the file.

@@ +171,5 @@
> +  /**
> +   * Internal helper class storing the actual buffer, its size
> +   * and the write/read cursors.
> +   */
> +  class BufferStorage : public mozilla::RefCounted<BufferStorage>

Using RefCounted is now deprecated; the macros from XPCOM are preferred (NS_INLINE_DECL_REFCOUNTING is what you want here, I believe).

This class should also not be public.

@@ +186,5 @@
> +
> +    ~BufferStorage()
> +    {
> +      if (freeFunction)
> +        freeFunction(buf);

Nit: MFBT style now requires braces around all control structures, like Gecko.  Here and elsewhere.

@@ +254,5 @@
> +    // Size of the allocated memory.
> +    size_t size;
> +
> +  private:
> +    FreeFunction freeFunction;

Nit: MFBT style now prefixes members with an 'm', similar to the main Gecko code.  Here and lots of other places.

@@ +319,5 @@
> +
> +    Ptr& operator+=(int n)
> +    {
> +      if (buf)
> +        cursor += n*sizeof(T);

I think you want to check |cursor >= stop| or similar here.  The desired end result is that |cursor| should always be |<= stop|.

@@ +344,5 @@
> +    Ptr(BufferStorage *bufferStorage, size_t startCursor,
> +        size_t endCursor)
> +      : buf(bufferStorage), cursor(startCursor), stop(endCursor)
> +    {
> +      MOZ_ASSERT((stop - cursor) % sizeof(T) == 0);

You should also assert |cursor <= stop| here.

@@ +356,5 @@
> +    }
> +
> +    RefPtr<BufferStorage> buf;
> +    size_t cursor;
> +    size_t stop;

I think |end| is marginally better than |stop| here.  But we have had a lot of arguments about naming in this bug, so I am willing to be overruled here.

@@ +429,5 @@
> +   */
> +  bool create(size_t count)
> +  {
> +    buf = new BufferStorage(Allocator::alloc(count),
> +                                    count, freeFunc(BUFFER_ADOPT));

Nit: indentation.  Here and other places.

@@ +500,5 @@
> +   *
> +   *    If there is not enough space, nullptr is returned.
> +   */
> +  template <typename T>
> +  Ptr<T> use(size_t count = 1)

I still think this should be called |alloc|.

@@ +507,5 @@
> +      return Ptr<T>();
> +
> +    // check for overflow
> +    if (count > SIZE_MAX / sizeof(T))
> +      return Ptr<T>();

This overflow checking should probably be done with mozilla/CheckedInt.h.  Here and elsewhere.

@@ +547,5 @@
> +   * from src to the buffer. If there is no enough space it return false, 
> +   * otherwise returns true. 
> +   */
> +  template <typename T>
> +  bool write(T const* src, size_t count = 1)

I think you want to make this MOZ_WARN_UNUSED_RESULT, because somebody's probably going to ignore the bool result at the worst possible time.

@@ +677,5 @@
> +    if (buf)
> +      buf->collectRead();
> +  }
> +
> +  void rewindRead()

I don't see why this is necessary if we can hold pointers to earlier positions in the buffer.

@@ +686,5 @@
> +
> +  /**
> +   * Return pointer to the beginning of underlying buffer. 
> +   */
> +  void * head() const

This seems like a bad idea, surely there's utility methods we can add that accomplish what uses of this method would.  (For instance, where we are using this function for memory-reporting purposes, we should instead have a |sizeOfExcludingThis| function that computes the appropriate result internal to this class.)

::: mfbt/tests/TestBuffer.cpp
@@ +3,5 @@
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "mozilla/Assertions.h"
> +#include "mozilla/Buffer.h"
> +#include "mozilla/DebugOnly.h"

Nit: sorted headers.

@@ +5,5 @@
> +#include "mozilla/Assertions.h"
> +#include "mozilla/Buffer.h"
> +#include "mozilla/DebugOnly.h"
> +
> +using namespace mozilla;

Please spell out the used things individually for test files.

@@ +9,5 @@
> +using namespace mozilla;
> +
> +#define CHECK(cond)           \
> +  do {                        \
> +    MOZ_ASSERT(cond);         \

This should be MOZ_RELEASE_ASSERT so it checks in debug builds, too.

@@ +10,5 @@
> +
> +#define CHECK(cond)           \
> +  do {                        \
> +    MOZ_ASSERT(cond);         \
> +    if (!(cond)) return 1;    \

All the return values are unnecessary since we rely on MOZ_RELEASE_ASSERT.

@@ +228,5 @@
> +{
> +  char buf[1];
> +  {
> +    AllocatorTest::buf = buf;
> +    AllocatorTest::freed = false;

After reading the tests, I'm wondering if we shouldn't make Buffer take a MallocAllocPolicy-esque argument and abandon the template argument...
Attachment #8452346 - Flags: feedback?(nfroyd) → feedback+
thanks for the fast feedback.

> f+, but really awaiting the version with separate read/write buffers.  I
> think that will make things a lot simpler.
> 

I am not sure that separate read/write buffer will be simpler. 
Probably a standard use case will be to read data from a file into a buffer an then read the data.
With the single buffer it can be done both first read then write.

also other way around first write data into a buffer then read this buffer and store it into a file or give it for calculating checksum or something.

With read/write buffer, am not sure that adoption stuff will go away, as you wrote sometime ago. 

the adoption stuff can go away if we do not care about other owners of the buffer:
if the buffer is created inside Buffer, it is never given away
or
if Buffer is initiated with an existing buffer created elsewhere we assume that belongs to the Buffer<> now and we can do everything with it

otherwise we will need adoption stuff.


> > +

> > +template <size_t DefaultRealloc = REALLOC_NONE,
> > +            class Allocator = AllocatorFallible>
> I'm not completely convinced we need this class to be templated.
> 

i thought about this too. I will remove |size_t DefaultRealloc = REALLOC_NONE|, this will be parameter for constructor.

the second one |class Allocator = AllocatorFallible|: there is a comment from Honza that then Allocator can be customized, maybe he can give example what he had in mind.



> > +   */
> > +  template <typename T>
> > +  Ptr<T> use(size_t count = 1)
> 
> I still think this should be called |alloc|.
> 

I will change this and the other thinks you wrote


> @@ +677,5 @@
> > +    if (buf)
> > +      buf->collectRead();
> > +  }
> > +
> > +  void rewindRead()
> 
> I don't see why this is necessary if we can hold pointers to earlier
> positions in the buffer.
> 

The example from the CacheFileMetadata:

A stucture that is written is :

[hash]
[metadataHeader]
[key]
[someElements]
[length]

so we write [metadataHeader], [key], [someElements] separately but we want a read pointer (char) that point to all 3 of them to be able to calculate [hash].

so first we write these 3 parts, then go back read written parts for hash calculation 
and in the end we want to read everything from the beginning (rewindRead is call) as char* and sent to CacheFileIO (or something like that)

 
> @@ +686,5 @@
> > +
> > +  /**
> > +   * Return pointer to the beginning of underlying buffer. 
> > +   */
> > +  void * head() const
> 
> This seems like a bad idea, surely there's utility methods we can add that
> accomplish what uses of this method would.  (For instance, where we are
> using this function for memory-reporting purposes, we should instead have a
> |sizeOfExcludingThis| function that computes the appropriate result internal
> to this class.)
> 

we can remove this function.


> @@ +228,5 @@
> > +{
> > +  char buf[1];
> > +  {
> > +    AllocatorTest::buf = buf;
> > +    AllocatorTest::freed = false;
> 
> After reading the tests, I'm wondering if we shouldn't make Buffer take a
> MallocAllocPolicy-esque argument and abandon the template argument...



sorry that i forgot to change some parts that you already commented before.
Nathan, some of my first patches had been split to Read and WriteBuffer.  You didn't like it.  So I've merged to a single Buffer doing both.  Now you don't like it too.  So what is it you would like to see now?  I'm not against to split, just let's make sure we will not have to merge later again...
(In reply to Honza Bambas (:mayhemer) from comment #34)
> Nathan, some of my first patches had been split to Read and WriteBuffer. 
> You didn't like it.  So I've merged to a single Buffer doing both.  Now you
> don't like it too.  So what is it you would like to see now?  I'm not
> against to split, just let's make sure we will not have to merge later
> again...

I admit that I am having a hard time swapping in all the context after a three-month hiatus.  I was mostly going off Dragana's comment 30 that suggested that a forthcoming split between ReadBuffer and WriteBuffer was coming.  I can see how that comment could be interpreted as "I'm going to do this later" (as I did) or "I've not done this, even though it was suggested, because I don't think it's a good idea" (maybe what Dragana meant?), though.

Can you point out places where I asked for a single buffer abstraction?  I'm not seeing them, but it's entirely possible that I'm reading back into my comments what I thought I intended to say rather than what you interpreted them as.

(In reply to Dragana Damjanovic from comment #33)
> thanks for the fast feedback.
> 
> > f+, but really awaiting the version with separate read/write buffers.  I
> > think that will make things a lot simpler.
> 
> I am not sure that separate read/write buffer will be simpler. 
> Probably a standard use case will be to read data from a file into a buffer
> an then read the data.
> With the single buffer it can be done both first read then write.

I have been thinking of accomplishing that use case as:

  size_t bufferSize = ...
  void* bufferMemory = malloc(bufferSize);
  // malloc is infallible in Gecko, don't need to check |bufferMemory|
  fread(bufferMemory, 1, bufferSize, file);
  Buffer buffer(bufferMemory, bufferSize);
  ...proceed with |buffer|...

From the CacheFileMetadata, I guess the read/write API would be:

  ...|buffer| declared previously...
  size_t bufferSize = ...
  buffer.create(bufferSize);
  ...check for failure, since we don't know the underlying allocator...
  fread(buffer.allocChar(bufferSize), 1, bufferSize, file); // currently useChar

I'm not sure that's definitely better.

> also other way around first write data into a buffer then read this buffer
> and store it into a file or give it for calculating checksum or something.

I would think that you would write the data into a buffer and then obtain a pointer to the necessary data.  No need to get a separate buffer for reading.

> the adoption stuff can go away if we do not care about other owners of the
> buffer:
> if the buffer is created inside Buffer, it is never given away
> or
> if Buffer is initiated with an existing buffer created elsewhere we assume
> that belongs to the Buffer<> now and we can do everything with it

I think these are the only cases that matter.  (And we can enforce the second with the newly-added mozilla::UniquePtr, even.)

> > > +template <size_t DefaultRealloc = REALLOC_NONE,
> > > +            class Allocator = AllocatorFallible>
> > I'm not completely convinced we need this class to be templated.
> > 
> 
> i thought about this too. I will remove |size_t DefaultRealloc =
> REALLOC_NONE|, this will be parameter for constructor.
> 
> the second one |class Allocator = AllocatorFallible|: there is a comment
> from Honza that then Allocator can be customized, maybe he can give example
> what he had in mind.

Various people have been working on removing custom allocators from Gecko.  There are still a few around, but they're very specialized.  I have a hard time seeing how a specialized allocator could be called for here.  I guess it would be helpful for testing.

> > @@ +677,5 @@
> > > +    if (buf)
> > > +      buf->collectRead();
> > > +  }
> > > +
> > > +  void rewindRead()
> > 
> > I don't see why this is necessary if we can hold pointers to earlier
> > positions in the buffer.
> > 
> 
> The example from the CacheFileMetadata:
> 
> A stucture that is written is :
> 
> [hash]
> [metadataHeader]
> [key]
> [someElements]
> [length]
> 
> so we write [metadataHeader], [key], [someElements] separately but we want a
> read pointer (char) that point to all 3 of them to be able to calculate
> [hash].
> 
> so first we write these 3 parts, then go back read written parts for hash
> calculation 

Ah, I see what the problem is here.  We can save a Ptr<> to [metadataHeader], but that Ptr<>, when created, must be bounded to the size of [metadataHeader].  And that's not what we want, we want the bounds to include [metadataHeader], [key], and [someElements].  But we don't know those bounds, because we haven't written all the data yet.

Maybe we should just abandon the notion that Ptr<>s are bounded and make them more like "real" pointers?  I guess that gives up some small amount of safety, but...

> and in the end we want to read everything from the beginning (rewindRead is
> call) as char* and sent to CacheFileIO (or something like that)

Aren't we done with the buffer at that point?  Can't we just get the pointer to the beginning (I guess that would be |head()|...) and be done with it, rather than rewinding?
Comment on attachment 8452347 [details] [diff] [review]
wip8 changes to CacheFileMetadata class

Review of attachment 8452347 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFileMetadata.cpp
@@ +259,5 @@
>  
> +  size_t i=0;
> +  while(i<mHashCount) {
> +    *(mWriteBuf.use<CacheHash::Hash16_t>()) = mHashArray[i++];
> +  }

Seems like we should use:

mWriteBuf.write<CacheHash::Hash16_t>(mHashArray, mHashCount)?

?

@@ +868,5 @@
>                     moz_xmalloc(mHashArraySize));
> +    int i = 0;
> +    while (pHash) {
> +      mHashArray[i++] = *(pHash++);
> +    }

Seems unfortunate that we replaced memcpy with this element-by-element copy; can we replace this with:

memcpy(mHashArray, static_cast<void*>(pHash), mHashArray)?

I think the static_cast does what we want here...
(In reply to Nathan Froyd (:froydnj) from comment #35)
> (In reply to Honza Bambas (:mayhemer) from comment #34)
> > Nathan, some of my first patches had been split to Read and WriteBuffer. 
> > You didn't like it.  So I've merged to a single Buffer doing both.  Now you
> > don't like it too.  So what is it you would like to see now?  I'm not
> > against to split, just let's make sure we will not have to merge later
> > again...
> 
> I admit that I am having a hard time swapping in all the context after a
> three-month hiatus.  I was mostly going off Dragana's comment 30 that
> suggested that a forthcoming split between ReadBuffer and WriteBuffer was
> coming.  I can see how that comment could be interpreted as "I'm going to do
> this later" (as I did) or "I've not done this, even though it was suggested,
> because I don't think it's a good idea" (maybe what Dragana meant?), though.
> 
> Can you point out places where I asked for a single buffer abstraction?  I'm
> not seeing them, but it's entirely possible that I'm reading back into my
> comments what I thought I intended to say rather than what you interpreted
> them as.
> 
> (In reply to Dragana Damjanovic from comment #33)
> > thanks for the fast feedback.
> > 
> > > f+, but really awaiting the version with separate read/write buffers.  I
> > > think that will make things a lot simpler.
> > 
> > I am not sure that separate read/write buffer will be simpler. 
> > Probably a standard use case will be to read data from a file into a buffer
> > an then read the data.
> > With the single buffer it can be done both first read then write.
> 
> I have been thinking of accomplishing that use case as:
> 
>   size_t bufferSize = ...
>   void* bufferMemory = malloc(bufferSize);
>   // malloc is infallible in Gecko, don't need to check |bufferMemory|
>   fread(bufferMemory, 1, bufferSize, file);
>   Buffer buffer(bufferMemory, bufferSize);
>   ...proceed with |buffer|...
> 
> From the CacheFileMetadata, I guess the read/write API would be:
> 
>   ...|buffer| declared previously...
>   size_t bufferSize = ...
>   buffer.create(bufferSize);
>   ...check for failure, since we don't know the underlying allocator...
>   fread(buffer.allocChar(bufferSize), 1, bufferSize, file); // currently useChar
>
> I'm not sure that's definitely better.

I thought about this a little more, and can offer a better explanation for why I prefer the first case.  In the second case, we "create" the buffer first (but aren't really able to do anything with it yet) before actually putting it in a usable state (via allocChar/useChar).  Whereas in the first case, we are constructing the block of memory, initializing it, and then handing it to the buffer, which is fully formed and ready to use.  The first case seems easier to follow for me and has a better flow of values.

Admittedly, the read/write buffer model is more powerful and should therefore be expected to be somewhat more complicated.  I'm just not certain that the complexity is actually justified for what we would be using it for.
Attachment #8452347 - Flags: feedback?(honzab.moz)
Comment on attachment 8452346 [details] [diff] [review]
wip8

Review of attachment 8452346 [details] [diff] [review]:
-----------------------------------------------------------------

in general:
- I'm not against to detemplate the Buffer and pass allocators as an arg to constructor or any method where applicable
- I think that we may want to loose a bit on safety to gain some usability like providing raw pointers to the buffer for one-time usage like passing to memcpy etc (mostly happening, but could be done better)
- the original split was like this (see my _wip6 patch_ for basic idea):
  - you created a write buffer
  - you wrote to it, whichever way (type-by-type, whole blob)
  - then, you created a read buffer that was initialized with either the write buffer (so they shared the actual memory) or with a read buffer (based on the same write buffer)
  - a readbuffer initialized with another readbuffer started to read at its read position (a logical clone)
  - you could have unlimited number of readbuffers at the same time bound to the same writebuffer, while the read position was independent in each of them
  - this was a concept of being able to rollback w/o rollback() :)
  - this way you could compute hashes from parts of originally written data reading that data with readbuffer and then store the whole data as a blob using a different readbuffer

The split could be thought a bit better, tho I think the original concept had something we are now IMO missing and trying to workaround with rewindRead()/seek() or alike.

::: mfbt/Buffer.h
@@ +209,5 @@
> +     * Makes free space at the buffer beginning;
> +     * If there Ptr<> refering to the buffer, the buffer will not be shifted
> +     * This function will rewindRead;
> +     */
> +    bool freeSpaceForWriteBufferHead(size_t count)

should be called unshift() as well

@@ +226,5 @@
> +      readCursor = 0;
> +      return true;
> +    }
> +  
> +    void rewindRead()

the concept of multiple separate readbuffers would avoid any rewind methods.

@@ +319,5 @@
> +
> +    Ptr& operator+=(int n)
> +    {
> +      if (buf)
> +        cursor += n*sizeof(T);

I don't agree with nathan's comment here, the check is then made in the deref operators

@@ +500,5 @@
> +   *
> +   *    If there is not enough space, nullptr is returned.
> +   */
> +  template <typename T>
> +  Ptr<T> use(size_t count = 1)

hmm.. but alloc() as a name doesn't express the write cursor is moving forward and that the data are put byte by byte to a single block of memory.  I still think write() is the best.  I caught myself confused with use() so it's really perhaps not a good name.
Attachment #8452346 - Flags: feedback?(honzab.moz)
Comment on attachment 8452347 [details] [diff] [review]
wip8 changes to CacheFileMetadata class

Review of attachment 8452347 [details] [diff] [review]:
-----------------------------------------------------------------

This was a lot of work :)

When looking at the changes, it doesn't seem simpler, but is definitely safer at least according boundary checks and random seeking - or at least it goes that way.  We will also need to put proper checks everywhere, that is another design task ahead of us.  

I think we may want to go a bit backwards and by looking at the code in e.g. CacheFileMetadata (or anywhere else) we may try to find the best API to make the code simpler and safer according mainly (if not even only) boundaries checks and random access.  And only then suggest the ?Buffer? class API according it.


f- as mostly a note for me we are not there yet.

::: netwerk/cache2/CacheFileMetadata.cpp
@@ +254,5 @@
> +    mListener = nullptr;
> +    return NS_OK; 
> +  };
> +  
> +  Buffer<>::Ptr<uint32_t> wi32 = mWriteBuf.use<uint32_t>();

Is there a need for <> after buffer when it's only used as a namespace?

what is "wi32" ?

after looking to the code, I can see this is the place to store the hash to.  please name the pointer correctly so its purpose is clear.

@@ +259,5 @@
>  
> +  size_t i=0;
> +  while(i<mHashCount) {
> +    *(mWriteBuf.use<CacheHash::Hash16_t>()) = mHashArray[i++];
> +  }

agree

@@ +277,3 @@
>  
> +  wi32 = mWriteBuf.use<uint32_t>();
> +  NetworkEndian::writeUint32(wi32, aOffset);

NetworkEndian::writeUint32(mWriteBuf.use<uint32_t>(), aOffset);

or you expect use() here to realloc?

?

@@ +292,5 @@
>      // result from some reason fails during shutdown, we would unnecessarily leak
>      // both this object and the buffer.
> +    Buffer<>::Ptr<char const> data = mWriteBuf.read<char>(dataToRead);
> +    writeBuffer = static_cast<char *>(moz_xmalloc(dataToRead));
> +    memcpy(writeBuffer, data, dataToRead);

hmm... thinking of a clone() method here.

@@ +361,5 @@
>    }
>  
> +  mReadBuf.create(fileSize - metaOffset);
> +  if (!mReadBuf.head()) {
> +    return NS_OK;

really not NS_OK :)

@@ +372,5 @@
>      PR_Close(fd);
>      return NS_ERROR_FAILURE;
>    }
>  
> +  size_t availForWriting = mReadBuf.room<char>();

there is just room() giving the raw buffer available length, I think...

@@ +653,5 @@
>    }
>  
>    // check whether we have read all necessary data
> +  mReadBuf.rewindRead();
> +  mReadBuf.read<char>(mReadBuf.avail() - sizeof(uint32_t));

readbuffer... that would be more elegant, see bellow

@@ +655,5 @@
>    // check whether we have read all necessary data
> +  mReadBuf.rewindRead();
> +  mReadBuf.read<char>(mReadBuf.avail() - sizeof(uint32_t));
> +  Buffer<>::Ptr<uint32_t const> readBufSize = mReadBuf.read<uint32_t>(); 
> +  uint32_t realOffset = NetworkEndian::readUint32(readBufSize);

why not 

uint32_t realOffset = NetworkEndian::readUint32(mReadBuf.read<uint32_t>().get()) ?

I would not be against a get() method or just provide the raw char* here!

@@ +657,5 @@
> +  mReadBuf.read<char>(mReadBuf.avail() - sizeof(uint32_t));
> +  Buffer<>::Ptr<uint32_t const> readBufSize = mReadBuf.read<uint32_t>(); 
> +  uint32_t realOffset = NetworkEndian::readUint32(readBufSize);
> +  // we just needed one information we will read buffer elsewhere 
> +  readBufSize.release();

readBufSize name is not really well chosen.  if you want it to release the buffer (so that some operations work?) just put the code to { }, as you would be working with autolocks.

@@ +684,2 @@
>      uint32_t missing = usedOffset - realOffset;
> +    Buffer<>::Ptr<char> pBuf = mReadBuf.unshift<char>(missing);

check the result for non-null

@@ +688,5 @@
>  
>      LOG(("CacheFileMetadata::OnDataRead() - We need to read %d more bytes to "
>           "have full metadata. [this=%p]", missing, this));
>  
> +    rv = CacheFileIOManager::Read(mHandle, realOffset, (char*)(void*)pBuf, missing, true, this);

this must be a static_cast, btw.  again, better to provide the raw buffer and loose some safety here

@@ +770,5 @@
>  
>    nsresult rv;
>  
> +  if (aBufOffset)
> +    mReadBuf.read<char>(aBufOffset);

how can you be sure you are at the right position before this operation?  an independent readbuffer created from |WriteBuf mReadBuf| (I would change the name to e.g. mLoadBuf) would be safer IMO:

ReadBuffer reader(mLoadBuf, aBufOffset /* offset to start at! */);
ReadBuffer::Ptr<uint32_t> pHashValue = reader.read<uint32_t>();

@@ +831,5 @@
>        return NS_ERROR_FILE_CORRUPTED;
>      }
>    }
> +  
> +  mElemBufSize = mReadBuf.avail<char>() - sizeof(uint32_t);

use mReadBuf.avail()

@@ +832,5 @@
>      }
>    }
> +  
> +  mElemBufSize = mReadBuf.avail<char>() - sizeof(uint32_t);
> +  Buffer<>::Ptr<char const> pElements = mReadBuf.read<char>(mElemBufSize);

what is actually this code about?

@@ +838,5 @@
>    // check metadata hash (data from hashesOffset to metaposOffset)
> +  mReadBuf.rewindRead();
> +  if (aBufOffset) 
> +    mReadBuf.read<char>(aBufOffset);
> +  pHashValue = mReadBuf.read<uint32_t>();

aren't you doing this the second time?

@@ +852,5 @@
>           hashExpected, this));
>      return NS_ERROR_FILE_CORRUPTED;
>    }
>  
> +  mElemBuf = static_cast<char *>(moz_xmalloc(mElemBufSize));

mElemBuf is actually just a raw buffer that we only care to release properly.  It would be good to design the buffer class(s?) to also provide a simple buffer life-time protection (aka auto-ptr-like class) while leaving ability to access the raw buffer and work with it.

@@ +868,5 @@
>                     moz_xmalloc(mHashArraySize));
> +    int i = 0;
> +    while (pHash) {
> +      mHashArray[i++] = *(pHash++);
> +    }

or just something like |bool read<T>(T* dest, size_t count)| ?


or have mHashArray be a new class "AutoBuffer" that will just provide the raw buffer (void/char* operator()), won't realloc, just release at the end, that will be created as a range-clone?

like:
mHashArray.cloneFrom<CacheHash::Hash16_t>(pHash, hashesLen);

@@ +947,5 @@
>    // mHandle reported via CacheFileIOManager.
>    n += mKey.SizeOfExcludingThisIfUnshared(mallocSizeOf);
>    n += mallocSizeOf(mHashArray);
> +  n += mallocSizeOf(mReadBuf.head());
> +  n += mallocSizeOf(mWriteBuf.head());

there should be SizeOf(Ex/In)cludingThis on the Buffer class(s).
Attachment #8452347 - Flags: feedback?(honzab.moz) → feedback-
(In reply to Honza Bambas (:mayhemer) from comment #38)
> @@ +319,5 @@
> > +
> > +    Ptr& operator+=(int n)
> > +    {
> > +      if (buf)
> > +        cursor += n*sizeof(T);
> 
> I don't agree with nathan's comment here, the check is then made in the
> deref operators

This is true, but I think it's better to have the invariant on the data enforced all the time, not just when you dereference it.

> @@ +500,5 @@
> > +   *
> > +   *    If there is not enough space, nullptr is returned.
> > +   */
> > +  template <typename T>
> > +  Ptr<T> use(size_t count = 1)
> 
> hmm.. but alloc() as a name doesn't express the write cursor is moving
> forward and that the data are put byte by byte to a single block of memory. 
> I still think write() is the best.  I caught myself confused with use() so
> it's really perhaps not a good name.

Yeah, use() seemed very confusing when I was reading the CacheFileMetadata patch.  So there's a point of agreement!

I'll be unavailable for reviews for the next two weeks; please have :Waldo or :jcranmer review in my place.
(In reply to Nathan Froyd [AFK through July 21st] (:froydnj) from comment #40)
> (In reply to Honza Bambas (:mayhemer) from comment #38)
> > @@ +319,5 @@
> > > +
> > > +    Ptr& operator+=(int n)
> > > +    {
> > > +      if (buf)
> > > +        cursor += n*sizeof(T);
> > 
> > I don't agree with nathan's comment here, the check is then made in the
> > deref operators
> 
> This is true, but I think it's better to have the invariant on the data
> enforced all the time, not just when you dereference it.

Yep, but the question is what to do here?  Not to shift?  Throw? :)  IMO bad - would be inconsistent with the behavior of the rest of the API.  Have to think.
Sorry for the confusion about implementing read/write buffer. I did not want to implement it before we agree on what we want, because it used to be separate buffer then it was put together...

Reading you comments I prefer separated read/write buffer.

Lets use alloc instead of use.


Just to sum up what at this point can be a API:


So most of the time we will use:

*writeBuf.alloc<char>() = 0;
and
uint32_t something =  *readBuf.read<uint32_t>();

and 
writeBuf.write<CacheHash::Hash16_t>(mHashArray, mHashCount)
readBuf.read<CacheHash::Hash16_t>(mHashArray, mHashCount)

actually we do not need 
  *writeBuf.alloc<char>() = 0;
  uint32_t something =  *readBuf.read<uint32_t>();

and have just:
writeBuf.write<char>(0);
uint32_t something =  readBuf.read<uint32_t>();

and there are allocChar and readChar:
bytesRead = PR_Read(fd, mReadBuf.useChar(availForWriting), availForWriting);
hash = CacheHash::Hash(mWriteBuf.readChar(dataToRead), dataToRead);

of course Read buffer can only read and write buffer can only alloc

we will use pointers only in special cases like with pHash that we need a place holder. We need to wait for other data to be written before calculating and writing onto pHash position.
Maybe we do not need a Ptr<> but just something simpler like a Cursor<> (you can use it just to write into a position no pointer arithmetic)
If we use Cursor not Ptr we can call the function get<> instead of use<>....

I will leave function:
void* head() const
or maybe call it getBuffer()
so we can use it for check if we have a buffer and for, as an example, to write it into a file.

About unshift(), I am not sure. I will leave it for now. But maybe we can decide that we do not implement it and if user needs something like that char* must be used.
I will leave it for now, we can decide later.

rewindRead goes away, we are using separate buffers.

Ptr<> class has method: operator void* () const
So  we can do:
> NetworkEndian::writeUint32(mWriteBuf.use<uint32_t>(), aOffset);

and 

uint32_t realOffset = NetworkEndian::readUint32(mReadBuf.read<uint32_t>())

for now I will not implement get() method for Ptr<>

> @@ +688,5 @@
> >  
> >      LOG(("CacheFileMetadata::OnDataRead() - We need to read %d more bytes to "
> >           "have full metadata. [this=%p]", missing, this));
> >  
> > +    rv = CacheFileIOManager::Read(mHandle, realOffset, (char*)(void*)pBuf, missing, true, this);
> 
> this must be a static_cast, btw.  again, better to provide the raw buffer and loose some safety here

maybe we provide only char* unshift() because it is going to be use in such cases only (probably) 
actually we can use unshift<uint32_t> for instead of pHash from above, but we do not want to do that.

> there should be SizeOf(Ex/In)cludingThis on the Buffer class(s).

we need this too


> (In reply to Honza Bambas (:mayhemer) from comment #38)
> > @@ +319,5 @@
> > > +
> > > +    Ptr& operator+=(int n)
> > > +    {
> > > +      if (buf)
> > > +        cursor += n*sizeof(T);
> > 
> > I don't agree with nathan's comment here, the check is then made in the
> > deref operators
> 
> This is true, but I think it's better to have the invariant on the data
> enforced all the time, not just when you dereference it.

I will ensure that cursor <=stop
so if we can do while(ptr++) 


I will split buffer and try to do anther example(CacheIndex.cpp) to see if this API will work
Looks good, thinks seems to be shaping up!  Btw, you have nothing to apologies for :)  This is simply how stuff evolves - try/fail/go back a bit/try again.

Thanks!
Attached patch wip9Splinter Review
Here is a new version.

:Waldo  If you do not have time, you can leave it to Nathan Froyd to give a feedback. There is a long correspondence to read.

In short:

This is not a final version for review,  we still need to decide about API.
So TestBuffer.cpp does not cover test for all properties, and I did not work on it at all.

Mainly I worked on two examples:
CasheFileMetadata ans CacheIndex

Short overview:
We have WriteBuffer that owns the raw buffer. There is no notion of buffer sharing. I did this because I wanted to make class easier. So purpose is to write data into a buffer ans send raw data to a file or something like that.
And the other possibility is to call buf.GetRawBufferAndRelease() which returns the raw buffer that is not reference any more by WriteBuffer.

WriteBuffer now has dataWritten() function instead of avail()

There is deleteUsedData(size_t collect) that deletes |collect| amount of data only if there is no Ptrs or ReadBuffers pointing to the raw buffer.

ReadBuffer can only be initialized be a WriteBuffer and it is used for reading.
ReadBuffer has avail() (still not read data) and dataRead() (already read data)


We still have Ptr<> which is in the examples used only as place holder so we do not need the pointer arithmetic and we can change it to something like “Cursor” w/o pointer arithmetic.
Attachment #8452346 - Attachment is obsolete: true
Attachment #8456115 - Flags: feedback?(jwalden+bmo)
Attachment #8456115 - Flags: feedback?(honzab.moz)
Attachment #8452347 - Attachment is obsolete: true
Attachment #8456116 - Flags: feedback?(jwalden+bmo)
Attachment #8456116 - Flags: feedback?(honzab.moz)
Attachment #8456117 - Flags: feedback?(jwalden+bmo)
Attachment #8456117 - Flags: feedback?(honzab.moz)
Attachment #8456115 - Flags: feedback?(jwalden+bmo)
Attachment #8456115 - Flags: feedback?(honzab.moz)
Attachment #8456116 - Flags: feedback?(jwalden+bmo)
Attachment #8456116 - Flags: feedback?(honzab.moz)
Attachment #8456117 - Flags: feedback?(jwalden+bmo)
Attachment #8456117 - Flags: feedback?(honzab.moz)
Dragana, for now I will not have time for this.  It would be nice to have a class like this, but it seems that designing the api to achieve all the primarily stated goals is not worth the time.

We can get to this later when new idea pop here.

This was a well spent time tho, we know it's not that simple to have a nice class for this now :)
Assignee: dd.mozilla → nobody
Tentatively taking this one back not loose it from the radar.
Assignee: nobody → honzab.moz
Status: ASSIGNED → NEW
So, it turned out not to be that easy to design an API that would perform well and be safe at the same time.  I will not have time to work on this any time soon.
Assignee: honzab.moz → nobody
You need to log in before you can comment on or make changes to this bug.