Status

()

enhancement
RESOLVED FIXED
a year ago
10 months ago

People

(Reporter: bhackett, Assigned: bhackett)

Tracking

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox62 affected)

Details

(Whiteboard: leave-open)

Attachments

(10 attachments, 5 obsolete attachments)

34.20 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
2.13 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
18.01 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
79.08 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
93.48 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
53.70 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
82.09 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
55.58 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
14.80 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
16.58 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
Assignee

Description

a year ago
This bug is for getting reviews for the core recording/replaying/rewinding logic in Web Replay.  There is a fair amount of overlap here with patches from bug 1207696, but enough has changed and enough of the parts in that bug were not reviewed that I think it's better to start fresh with this (I'll be carrying forward reviews from bug 1207696 for more isolated parts of the codebase).  I'll point out areas that haven't changed significantly for each patch.
Web Replay probably needs it's own component.
Flags: needinfo?(mmorris)
Assignee

Comment 2

a year ago
This is pretty similar to part 1a of bug 1207696. The main differences are:

- Record/replay methods are no-ops except on Mac nightly builds.
- New APIs {Begin,End}CaptureEventStacks, RecordReplayDirective, VirtualThingName, ProcessKind initialization stuff.
- Removed/simplified APIs for hashtables, weak pointers, some other minor stuff.
- CheckpointId type.
- Record/replay allocation policies have moved here.
Assignee: nobody → bhackett1024
Attachment #8981684 - Flags: review?(nfroyd)
Assignee

Comment 3

a year ago
Build file changes for toolkit/recordreplay.
Attachment #8981685 - Flags: review?(nfroyd)
Assignee

Comment 4

a year ago
Utility structures used by recording/replaying and middleman processes.
Attachment #8981689 - Flags: review?(nfroyd)
Assignee

Comment 5

a year ago
This has changed little from code in part 1f of bug 1207696.
Attachment #8981690 - Flags: review?(nfroyd)
Assignee

Comment 6

a year ago
There is some new functionality here for the new parts of the public API, but not much else has changed since part 1f of bug 1207696.
Attachment #8981694 - Flags: review?(nfroyd)
Assignee

Comment 7

a year ago
The redirection code is better integrated with udis86 now than in part 1h of bug 1207696, and the code for finding places to insert jumps in system libraries is more robust, but the interface with the platform redirections hasn't changed otherwise.
Attachment #8981696 - Flags: review?(nfroyd)
Assignee

Comment 8

a year ago
There are some new redirections here since part 1i of bug 1207696.
Attachment #8981697 - Flags: review?(nfroyd)
Assignee

Comment 9

a year ago
Memory snapshots have gone through several rounds of performance improvements --- using splay trees instead of arrays, maintaining a set of free untracked regions --- and some simplifications but haven't changed much otherwise from part 1g of bug 1207696.
Attachment #8981698 - Flags: review?(nfroyd)
Assignee

Comment 10

a year ago
Thread snapshots haven't really changed since part 1g of bug 1207696.
Attachment #8981699 - Flags: review?(nfroyd)
Assignee

Comment 11

a year ago
Posted patch Part 10 - Rewinding logic. (obsolete) — Splinter Review
Parts of the rewinding interface have been changed/simplified --- the middleman tells the process which checkpoints to save, and interim checkpoints have been removed --- but other parts haven't changed much since part 1g of bug 1207696.
Attachment #8981700 - Flags: review?(nfroyd)
FWIW regarding Backtrace in part 3, we do have CodeAddressService already in the tree, as basically a caching wrapper around dladdr, which is used by DMD and refcount logging.
I am attempting to swap in everything I need to remember from the last time I reviewed a bunch of WebReplay code, and try wrap my head around everything here.  I think I'll have the first (ideally useful!) feedback sometime next week.
I'm copied as "need info".  I'm not seeing what is needed from me?  Help!
Flags: needinfo?(mmorris)
(In reply to Marissa Morris from comment #14)
> I'm copied as "need info".  I'm not seeing what is needed from me?  Help!

Comment 1, I was thinking this needs it's own component.
Assignee

Comment 16

a year ago
(In reply to Nathan Froyd [:froydnj] from comment #13)
> I am attempting to swap in everything I need to remember from the last time
> I reviewed a bunch of WebReplay code, and try wrap my head around everything
> here.  I think I'll have the first (ideally useful!) feedback sometime next
> week.

Thanks a lot!  I'm going to attach some updated patches here: after feedback in bug 1465287 and bug 1465294 I'm going to get the sandbox working on recording/replaying and middleman processes, and part of that entails removing all the snapshot files that replaying processes create in order to rewind later.  This is a considerable simplification to some of the patches in this bug, and is an improvement besides (the OS can take care of paging unused memory out to disk).
Assignee

Comment 17

a year ago
Files and their streams no longer need to be templated on the kind of memory they should allocate internally.
Attachment #8981689 - Attachment is obsolete: true
Attachment #8981689 - Flags: review?(nfroyd)
Attachment #8982936 - Flags: review?(nfroyd)
Assignee

Comment 18

a year ago
The thread interface has changed a little, and some initialization code is no longer necessary.
Attachment #8981694 - Attachment is obsolete: true
Attachment #8981694 - Flags: review?(nfroyd)
Attachment #8982937 - Flags: review?(nfroyd)
Assignee

Comment 19

a year ago
All the logic for older heap memory snapshot data out to disk has been removed, along with the logic for keeping the amount of in-memory snapshots within a given limit.
Attachment #8981698 - Attachment is obsolete: true
Attachment #8981698 - Flags: review?(nfroyd)
Attachment #8982938 - Flags: review?(nfroyd)
Assignee

Comment 20

a year ago
Thread state snapshots are now kept in memory instead of on disk.  Some parts of the thread API related to snapshots have moved here as well.
Attachment #8981699 - Attachment is obsolete: true
Attachment #8981699 - Flags: review?(nfroyd)
Attachment #8982939 - Flags: review?(nfroyd)
Assignee

Comment 21

a year ago
The rewinding code now stores the thread snapshots, but doesn't need to keep track of the set of snapshot files anymore.
Attachment #8981700 - Attachment is obsolete: true
Attachment #8981700 - Flags: review?(nfroyd)
Attachment #8982940 - Flags: review?(nfroyd)
Comment on attachment 8981685 [details] [diff] [review]
Part 2 - Record/replay build files.

Review of attachment 8981685 [details] [diff] [review]:
-----------------------------------------------------------------

To be clear, everything being added in toolkit/recordreplay/ is not yet defined at this point in the patch series?

::: toolkit/recordreplay/moz.build
@@ +43,5 @@
> +        'ipc/DisabledIPC.cpp',
> +    ]
> +
> +LOCAL_INCLUDES += [
> +    '!/ipc/ipdl/_ipdlheaders',

This feels a little gross, but OK.
Attachment #8981685 - Flags: review?(nfroyd) → review+
Comment on attachment 8981684 [details] [diff] [review]
Part 1 - Public record/replay API.

Review of attachment 8981684 [details] [diff] [review]:
-----------------------------------------------------------------

All of this seems pretty much OK, although I guess some of the functionality is being added in later patches (e.g. VirtualThingName)?

It's a little hard to comment usefully on how one might do things differently (and I haven't even gotten my head around some of the latter patches), so I have some style-ish suggestions below.

::: mfbt/RecordReplay.h
@@ +28,5 @@
> +// start of the main() routine, and is afterward invariant for the process.
> +// A third process type, middleman processes, are normal content processes used
> +// when replaying to facilitate IPC between recording/replaying processes and
> +// the UI process, and to run debugger code that interacts with the recording
> +// and replaying processes.

This bit about the middleman processes is helpful, but it doesn't seem to belong in the introductory paragraph.  Perhaps it should be moved to its own paragraph (potentially with more information added about controlling determinism between processes) after the discussion of inter-thread non-determinism, below?

@@ +105,5 @@
> +
> +// RAII class for an atomic access. This can also be called directly
> +// (i.e. AutoOrderedAtomicAccess()) to insert an ordering fence that will force
> +// threads to execute in the same order during replay.
> +struct AutoOrderedAtomicAccess

My sense is that it's not good to have an RAII class that you can also use as an unnamed temporary for construction and destruction side effects.  WDYT about adding MOZ_RAII to this class (and the MOZ_GUARD_OBJECT* foo), and having a separate function:

void
OrderAtomicAccessesFence()
{
  AutoOrderedAtomicAccess guard;
  return;
}

so it's more obvious to the reader that you're not making a mistake with an unnamed temporary?

@@ +265,5 @@
> +MFBT_API void ExecuteTriggers();
> +
> +// Return whether execution has diverged from the recording and events may be
> +// encountered that did not happen while recording. Interacting with the system
> +// will generally cause the current debugger operation to fail. For more

To be clear, this "debugger" is the JS debugger, or a "conventional" debugger like GDB?  Can we make this text clearer?

@@ +331,5 @@
> +  MiddlemanReplaying
> +};
> +
> +// Command line option for specifying the record/replay kind of a process.
> +static const char* gProcessKindOption = "-recordReplayKind";

Please make this `static const char gProcessKindOption[] = ...;`

@@ +334,5 @@
> +// Command line option for specifying the record/replay kind of a process.
> +static const char* gProcessKindOption = "-recordReplayKind";
> +
> +// Command line option for specifying the recording file to use.
> +static const char* gRecordingFileOption = "-recordReplayFile";

Same thing here.

@@ +359,5 @@
> +  size_t mNormal;
> +
> +  // How many temporary checkpoints have been generated since the most recent
> +  // normal checkpoint, zero if this represents the normal checkpoint itself.
> +  size_t mTemporary;

Is it worth encapsulating more functionality in this class?  Especially with InvalidCheckpointId/FirstCheckpointId defined outside of the class, it feels like there's some kind of "driver" code that could profitably be moved here.

@@ +365,5 @@
> +  explicit CheckpointId(size_t aNormal = InvalidCheckpointId, size_t aTemporary = 0)
> +    : mNormal(aNormal), mTemporary(aTemporary)
> +  {}
> +
> +  inline bool operator ==(const CheckpointId& o) const {

Nit: I think Gecko style writes operator==.

@@ +369,5 @@
> +  inline bool operator ==(const CheckpointId& o) const {
> +    return mNormal == o.mNormal && mTemporary == o.mTemporary;
> +  }
> +
> +  inline bool operator !=(const CheckpointId& o) const {

Same nit here.

@@ +433,5 @@
> +// saving or restoring checkpoints. All other values refer to memory that is
> +// untracked, and whose contents are preserved when restoring checkpoints.
> +// Different values may be used to distinguish different classes of memory for
> +// diagnosing leaks and reporting memory usage.
> +typedef size_t AllocatedMemoryKind;

I assume we're using a size_t here, rather than a proper enum, because there are going to be custom Kinds defined elsewhere?  Is there an advantage to doing that versus defining them all here?

Actually, presumably the memory APIs dealing with kinds have to have global knowledge of all the kinds anyway, right (or it would be nice if they did)?  So an enum would be the correct thing to use here?
Attachment #8981684 - Flags: review?(nfroyd) → review+
Comment on attachment 8982936 [details] [diff] [review]
Part 3 - Record/replay utilities.

Review of attachment 8982936 [details] [diff] [review]:
-----------------------------------------------------------------

First round of comments on this.  I need to examine File.* closely.

::: toolkit/recordreplay/Backtrace.h
@@ +12,5 @@
> +namespace mozilla {
> +namespace recordreplay {
> +
> +// This file declares some routines for quickly getting a usable backtrace for
> +// associating with a record/replay assertion.

We have a bunch of cross-platform backtrace code with caching in xpcom; it would be lovely if we could figure out how to use that here.

::: toolkit/recordreplay/ChunkAllocator.h
@@ +28,5 @@
> +  // A page sized block holding a next pointer and an array of as many things
> +  // as possible.
> +  struct Chunk
> +  {
> +    uint8_t mStorage[PageSize - sizeof(Chunk*)];

Where is PageSize defined?  It doesn't seem to be in this patch.

@@ +30,5 @@
> +  struct Chunk
> +  {
> +    uint8_t mStorage[PageSize - sizeof(Chunk*)];
> +    ChunkPointer mNext;
> +    Chunk() { PodZero(this); }

Can you make this:

Chunk() : mStorage{}, mNext(0) {}

please, so we don't use PodZero?

@@ +55,5 @@
> +  }
> +
> +public:
> +  // ChunkAllocators are allocated in static storage and should not have
> +  // constructors. Their memory will be initially zero.

Maybe a:

ChunkAllocator() = default;
~ChunkAllocator() = default;

along with deleting copy constructors/assignment would be a good idea?

@@ +67,5 @@
> +    }
> +    return chunk->GetThing(aId);
> +  }
> +
> +  // Get an existing entry from the allocator, or null. This may return an etry

Nit: "...may return an entry"

@@ +76,5 @@
> +
> +  // Create a new entry with the specified ID. This must not be called on IDs
> +  // that have already been used with this allocator.
> +  inline T* Create(size_t aId) {
> +    if (aId < mCapacity) {

Isn't this contravening the second part of the comment before this function?

::: toolkit/recordreplay/Monitor.h
@@ +24,5 @@
> +public:
> +  Monitor() {
> +    AutoEnsurePassThroughThreadEvents pt;
> +    mLock = PR_NewLock();
> +    mCondVar = PR_NewCondVar(mLock);

Is there any way we could make this use detail::{Mutex,CondVar} from mozglue?  That would be somewhat more efficient, and would require less use of NSPR.

::: toolkit/recordreplay/SpinLock.h
@@ +39,5 @@
> +class ReadWriteSpinLock
> +{
> +public:
> +  inline void Lock(bool aRead);
> +  inline void Unlock(bool aRead);

Can we have the public API be ReadLock/ReadUnlock/WriteLock/WriteUnlock, please?  That way we don't have a bunch of /*aRead=*/false comments scattered about.

I am indifferent whether there's a private API underneath that takes boolean parameters, but please use "real" names in the public API.

::: toolkit/recordreplay/SplayTree.h
@@ +25,5 @@
> + * T indicates the type of tree elements, L has a Lookup type and a static
> + * 'ssize_t compare(const L::Lookup&, const T&)' method ordering the elements.
> + */
> +template <class T, class L, class AllocPolicy, size_t ChunkPages>
> +class SplayTree

I'm assuming here that we can't use the MFBT one because it doesn't provide for an AllocPolicy, and trying to shoehorn one into it is tricky?

@@ +33,5 @@
> +    Node* mLeft;
> +    Node* mRight;
> +    Node* mParent;
> +
> +    explicit Node(const T& aItem)

Do you want to provide for constructing Node from T&&  as well?

::: toolkit/recordreplay/ValueIndex.cpp
@@ +15,5 @@
> +ValueIndex::Insert(const void* aValue)
> +{
> +  MOZ_RELEASE_ASSERT(!Contains(aValue));
> +
> +  size_t index = mIndexCount++;

What are the threadsafety guarantees here?  I guess "don't access this on multiple threads"?

@@ +47,5 @@
> +bool
> +ValueIndex::MaybeGetIndex(const void* aValue, size_t* aIndex)
> +{
> +  ValueToIndexMap::const_iterator iter = mValueToIndex.find(aValue);
> +  if (iter != mValueToIndex.end()) {

Is it worth providing another function that just returns the iterator, so Remove() can be more efficient?

::: toolkit/recordreplay/ValueIndex.h
@@ +45,5 @@
> +  // Remove an entry from the map, unless there is no entry for aValue.
> +  void Remove(const void* aValue);
> +
> +  // Get the index for an entry in the map.
> +  size_t GetIndex(const void* aValue);

Please document that the entry must exist in the map.

@@ +55,5 @@
> +  // Return whether there is an entry for aValue.
> +  bool Contains(const void* aValue);
> +
> +  // Get the value associated with an index.
> +  const void* GetValue(size_t aIndex);

Please document that the index must exist in the map.
I am mildly concerned that a lot of these data structures (and maybe other things in these patches) don't appear to have standalone tests.  I understand that the whole thing hangs together, which is good, but standalone tests would raise my confidence in the correctness of what these things do, as well as making it easier for other people to make modifications down the line.

(In reply to Nathan Froyd [:froydnj] from comment #24)
> @@ +47,5 @@
> > +bool
> > +ValueIndex::MaybeGetIndex(const void* aValue, size_t* aIndex)
> > +{
> > +  ValueToIndexMap::const_iterator iter = mValueToIndex.find(aValue);
> > +  if (iter != mValueToIndex.end()) {
> 
> Is it worth providing another function that just returns the iterator, so
> Remove() can be more efficient?

To be clear, an *internal* function.
Assignee

Comment 26

a year ago
(In reply to Nathan Froyd [:froydnj] from comment #24)
> ::: toolkit/recordreplay/Backtrace.h
> @@ +12,5 @@
> > +namespace mozilla {
> > +namespace recordreplay {
> > +
> > +// This file declares some routines for quickly getting a usable backtrace for
> > +// associating with a record/replay assertion.
> 
> We have a bunch of cross-platform backtrace code with caching in xpcom; it
> would be lovely if we could figure out how to use that here.

Sure, I'll see if that can be used here.

> ::: toolkit/recordreplay/ChunkAllocator.h
> @@ +28,5 @@
> > +  // A page sized block holding a next pointer and an array of as many things
> > +  // as possible.
> > +  struct Chunk
> > +  {
> > +    uint8_t mStorage[PageSize - sizeof(Chunk*)];
> 
> Where is PageSize defined?  It doesn't seem to be in this patch.

It is defined in part 5.

> @@ +76,5 @@
> > +
> > +  // Create a new entry with the specified ID. This must not be called on IDs
> > +  // that have already been used with this allocator.
> > +  inline T* Create(size_t aId) {
> > +    if (aId < mCapacity) {
> 
> Isn't this contravening the second part of the comment before this function?

The capacity refers to the amount of space that has been allocated, which can be greater than the last ID that was created due to chunks containing multiple things.

> ::: toolkit/recordreplay/Monitor.h
> @@ +24,5 @@
> > +public:
> > +  Monitor() {
> > +    AutoEnsurePassThroughThreadEvents pt;
> > +    mLock = PR_NewLock();
> > +    mCondVar = PR_NewCondVar(mLock);
> 
> Is there any way we could make this use detail::{Mutex,CondVar} from
> mozglue?  That would be somewhat more efficient, and would require less use
> of NSPR.

Sure, I'll try to use the mozglue structures.

> ::: toolkit/recordreplay/SplayTree.h
> @@ +25,5 @@
> > + * T indicates the type of tree elements, L has a Lookup type and a static
> > + * 'ssize_t compare(const L::Lookup&, const T&)' method ordering the elements.
> > + */
> > +template <class T, class L, class AllocPolicy, size_t ChunkPages>
> > +class SplayTree
> 
> I'm assuming here that we can't use the MFBT one because it doesn't provide
> for an AllocPolicy, and trying to shoehorn one into it is tricky?

Hmm, I think I looked at this but as is the MFBT one doesn't provide some features we need here like lookup types that differ from tree elements, and looking up the closest element to a lookup.  I don't think the memory management issue is a problem, as the MFBT splay tree leaves node allocation up to the caller.

> ::: toolkit/recordreplay/ValueIndex.cpp
> @@ +15,5 @@
> > +ValueIndex::Insert(const void* aValue)
> > +{
> > +  MOZ_RELEASE_ASSERT(!Contains(aValue));
> > +
> > +  size_t index = mIndexCount++;
> 
> What are the threadsafety guarantees here?  I guess "don't access this on
> multiple threads"?

Yeah, this class is not threadsafe.
Comment on attachment 8981696 [details] [diff] [review]
Part 6 - Redirection infrastructure.

Review of attachment 8981696 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/recordreplay/ProcessRedirect.cpp
@@ +151,5 @@
> +    // with redirecting the initial bytes of the function. Ideally we would
> +    // find these backedges with some binary analysis, but this is easier said
> +    // than done, especially since there doesn't seem to be a standard way to
> +    // determine the extent of a symbol's code on OSX.
> +    if ((strstr(startName, "CTRunGetGlyphs") &&

Why are we strstr'ing rather than strcmp'ing here?  Library version numbering goo in the symbol?  Namespacing?  A comment explaining that rationale would be good.

@@ +180,5 @@
> +  va_start(ap, aFormat);
> +  char buf[4096];
> +  VsprintfLiteral(buf, aFormat, ap);
> +  va_end(ap);
> +  buf[sizeof(buf) - 1] = 0;

I think VsprintfLiteral guarantees null termination?

@@ +665,5 @@
> +  void* res = aAssembler.Current();
> +
> +  // On x64 the argument will be in a register, so to add an extra argument for
> +  // the callee we just need to fill in the appropriate register for the
> +  // argument position with the bound argument value.

I have no problem with restricting the argument position required here (not supporting a fully general calling convention makes a lot of things easier), but the documentation for BindFunctionArgument needs to say something about this restriction (and probably also that aArgumentPosition is 1-based, rather than 0-based).

::: toolkit/recordreplay/ProcessRedirect.h
@@ +79,5 @@
> +  // Name of the function being redirected.
> +  const char* mName;
> +
> +  // Address of the function which is being redirected. The code for this
> +  // this function is modified so that attempts to call this function will

Nit: "this function" at the beginning of the line duplicates "this" in the line above.

@@ +722,5 @@
> +
> +// Read/write a success code (where negative values are failure) and errno value on failure.
> +template <typename T>
> +static inline bool
> +RecordOrReplayHadErrorNegative(AutoRecordReplayFunction<T>& aRrf)

I wonder if it'd be better for this function to be RecordOrReplayCallSucceeded, since we're constantly calling it like:

  if (!RecordOrReplayHadErrorNegative(rrf)) {
  }

Or just instead of:

  if (!RecordOrReplayHadErrorNegative(rrf)) {
    ...lots of logic
  }

  return $SIMPLE;

we should invert things:

  if (RecordOrReplayHadErrorNegative(rrf)) {
    return $SIMPLE;
  }

  ...lots of things
Attachment #8981696 - Flags: review?(nfroyd) → review+
Comment on attachment 8981697 [details] [diff] [review]
Part 7 - Darwin redirections.

Review of attachment 8981697 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/recordreplay/ProcessRedirectDarwin.cpp
@@ +521,5 @@
> +    events.CheckInput(initialLengths[i]);
> +  }
> +  delete[] initialLengths;
> +
> +  if (!RecordOrReplayHadErrorNegative(rrf)) {

Mentioned this in the other patch, but it seems that for many of these sort of checks, we want to invert the sense of the test for an early return, even if that means duplicating the return statements.

@@ +658,5 @@
> +
> +static ssize_t
> +RR_munmap(void* aAddress, size_t aSize)
> +{
> +  DeallocateMemory(aAddress, aSize, TrackedMemoryKind);

How does this work in the MAP_ANON/MAP_FIXED case, where we are not AllocateMemory'ing?  Actually, it looks like the !MAP_ANON/MAP_FIXED case also does this...are these combinations rare, or is there just something that I'm missing?

@@ +766,5 @@
> +
> +static ssize_t
> +RR_gettimeofday(struct timeval* aTimeVal, struct timezone* aTimeZone)
> +{
> +  if (HasDivergedFromRecording()) {

This sort of makes sense to me, but could you write a comment on exactly why we're doing this, and what happens if we try to replay the original calls?  Especially why this particular syscall needs to be handled specially?

@@ +1089,5 @@
> +static void
> +RR_start_wqthread(size_t a0, size_t a1, size_t a2, size_t a3, size_t a4, size_t a5, size_t a6, size_t a7)
> +{
> +  // Suspend this thread if system threads should not be running. We are too
> +  // early in initialization of this thread to do anything else.

So we were recording, and saw this call, but during replay, we saw this call *earlier* than when we saw the call when we were recording, and we're just going to hang the thread permanently?  That really works?  Or am I misinterpreting things?

@@ +1121,5 @@
> +// Handle a redirection which releases a mutex, waits in some way for a cvar,
> +// and reacquires the mutex before returning.
> +static ssize_t
> +WaitForCvar(pthread_mutex_t* aMutex, bool aRecordReturnValue,
> +            const std::function<ssize_t()>& aCallback)

Out of curiousity, why not:

template<typename Callback>
static ssize_t
WaitForCvar(..., Callback&& aCallback)
{
  ...
}

So we don't have to create a std::function object?  I assume std::function is efficient and doesn't allocate for this sort of case, but you never know...

@@ +1563,5 @@
> +  "push %rdi;"
> +
> +  // Count how many stack arguments we need to copy.
> +  "movq $64, %rsi;"
> +  "jmp _RecordReplayInvokeObjCMessage_Loop;"

Why are we jumping to the very next instruction here?

@@ +1677,5 @@
> +      TestObjCObjectClass(aArguments->obj, "NSArray"))
> +  {
> +    NSFastEnumerationState* state = (NSFastEnumerationState*)aArguments->arg0;
> +    if (IsReplaying()) {
> +      state->itemsPtr = NewLeakyArray<id>(rval);

How often do we leak things here?

@@ +1701,5 @@
> +      TestObjCObjectClass(aArguments->obj, "NSString"))
> +  {
> +    size_t len = RecordReplayValue(IsRecording() ? strlen((const char*) rval) : 0);
> +    if (IsReplaying()) {
> +      rval = (size_t) NewLeakyArray<uint8_t>(len + 1);

And here?

@@ +1732,5 @@
> +  "movsd %xmm0, 48(%rsp);"
> +
> +  // Count how many stack arguments we need to save.
> +  "movq $64, %rsi;"
> +  "jmp _RR_objc_msgSend_Loop;"

Same question here about jumping to the very next instruction.
Attachment #8981697 - Flags: review?(nfroyd) → review+
Comment on attachment 8981690 [details] [diff] [review]
Part 4 - PLD and PL hashtable stabilization.

Review of attachment 8981690 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/recordreplay/HashTable.cpp
@@ +19,5 @@
> +namespace mozilla {
> +namespace recordreplay {
> +
> +// HashTable stabilization is used with specific hashtable classes which are
> +// based on callbacks. When the table is constructed, if we are

Why do we not care about stabilizing std::unordered_map?

@@ +24,5 @@
> +// recording/replaying then the callbacks are replaced with an alternate set
> +// that produces consistent hash numbers between recording and replay. If
> +// during replay the additions and removals to the tables occur in the same
> +// order that they did during recording, then the structure of the tables and
> +// the order in which elements are visited during iteration will be the same.

Just to be clear, we're doing this stabilizing because between record and reply because for, say, pointer-keyed hashtables, we might have different pointer values for the keys between record and replay, and therefore we might have different hash values between record and replay?  And we can't have that, because then things like iteration order wouldn't be the same, and we would get replay divergences?

If that's the case, then I think the comment here could be a little more clear about what problem we're trying to solve, and how we're solving it here.  e.g. this comment says we "produce[s] consistent hash numbers between recording and replay", but it's not obvious why that's a desirable thing, or even why we would produce inconsistent hash numbers.  (Maybe "equivalent" would be better terminology than "consistent"?  Not sure.)

@@ +56,5 @@
> +  // function) to a vector with all keys sharing that original hash number.
> +  struct HashInfo {
> +    InfallibleVector<KeyInfo> mKeys;
> +  };
> +  typedef std::unordered_map<HashNumber, HashInfo*> HashToKeyMap;

Is it possible to store UniquePtrs in the hash table, so we don't have to manage the memory ourselves?

@@ +70,5 @@
> +  HashNumber mLastNewHash;
> +
> +  // The number of new hashes that have been generated. This increases
> +  // monotonically.
> +  uint32_t mNewHashCount;

What happens when this overflows?

@@ +190,5 @@
> +    BitwiseCast(nfn, aTarget);
> +  }
> +
> +  // Set the last queried key for this table, and generate a new hash number
> +  // for it.

Why are we doing this?  Performance?  Some other reason?  It's not super-clear to me why we're caching this.

@@ +446,5 @@
> +RecordReplayInterface_InternalGeneratePLDHashTableCallbacks(const PLDHashTableOps* aOps)
> +{
> +  PLDHashTableInfo* info = new PLDHashTableInfo(aOps);
> +  Maybe<Assembler> assembler;
> +  info->NewBoundFunction(assembler, PLDHashTableComputeHash, info, 1, &info->mNewOps.hashKey);

This Maybe<Assembler> stuff seems like a really weird API.  Why not just require the assembler to be constructed before calling NewBoundFunction, so the call doesn't have this additional magic side-effect?  Or if the construction of the assembler requires information from `info`, why not have a function in `PLDHashTableInfo` to return an appropriately-initialized one?
Attachment #8981690 - Flags: review?(nfroyd) → review+
Comment on attachment 8982937 [details] [diff] [review]
Part 5 - Core record/replay functionality.

Review of attachment 8982937 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/recordreplay/Lock.cpp
@@ +101,5 @@
> +  }
> +
> +  // Tolerate new locks being created with identical pointers, even if there
> +  // was no DestroyLock call for the old one.
> +  gLocks->erase(aNativeLock);

If this actually erases something, we leak the Lock associated with aNativeLock, yes?  That seems not good.

@@ +190,5 @@
> +{
> +  MOZ_RELEASE_ASSERT(!AreThreadEventsPassedThrough());
> +  gNumLocks = gAtomicLockId;
> +
> +  gAtomicLock = PR_NewLock();

Can we use mozilla::detail::Mutex here, please?  Or do we need to use PR_NewLock here so that we trigger all the code in this file appropriately?

::: toolkit/recordreplay/Lock.h
@@ +43,5 @@
> +  // records the acquire in the lock's acquire order stream. When replaying,
> +  // this is called before the lock has been acquired, and blocks the thread
> +  // until it is next in line to acquire the lock before acquiring it via
> +  // aCallback.
> +  void Enter(const std::function<void()>& aCallback);

It would be great to rework this somehow so the record/replay behavior difference is clearer and/or so the caller doesn't have to manually handle record/replay differences; it's kind of jarring to read:

  if (IsRecording()) {
    DirectLockMutex(aMutex);
  }
  lock->Enter([=]() { DirectLockMutex(aMutex); });

::: toolkit/recordreplay/ProcessRecordReplay.h
@@ +170,5 @@
> +
> +// Check for membership in a vector.
> +template <typename Vector, typename Entry>
> +inline bool
> +VectorContains(const Vector& aVector, const Entry& aEntry)

Why do we have this rather than using std::find or something?

@@ +205,5 @@
> +
> +// Get a current timestamp, in microseconds.
> +double CurrentTime();
> +
> +#define ForEachTimerKind(Macro)                 \

You may be interested in using the macros in mfbt/DefineEnum.h instead.

::: toolkit/recordreplay/WeakPointer.cpp
@@ +16,5 @@
> +
> +namespace mozilla {
> +namespace recordreplay {
> +
> +typedef std::unordered_map<const void*, JS::PersistentRootedObject*> WeakPointerRootMap;

Any chance we could use UniquePtr<PersistentRootedObject> here?
Attachment #8982937 - Flags: review?(nfroyd) → review+

Comment 31

11 months ago
Brian:  if you agree Web Replay probably needs it's own Bugzilla component (comment 1), could you file a bug to open one?  See bug 1415157 for how that's done
Assignee

Comment 32

11 months ago
(In reply to Nathan Froyd [:froydnj] from comment #28)
> @@ +658,5 @@
> > +
> > +static ssize_t
> > +RR_munmap(void* aAddress, size_t aSize)
> > +{
> > +  DeallocateMemory(aAddress, aSize, TrackedMemoryKind);
> 
> How does this work in the MAP_ANON/MAP_FIXED case, where we are not
> AllocateMemory'ing?  Actually, it looks like the !MAP_ANON/MAP_FIXED case
> also does this...are these combinations rare, or is there just something
> that I'm missing?

It's OK if DeallocateMemory is balanced by an mmap call rather than AllocateMemory, as AllocateMemory just calls mmap after checking a free list.

> @@ +1089,5 @@
> > +static void
> > +RR_start_wqthread(size_t a0, size_t a1, size_t a2, size_t a3, size_t a4, size_t a5, size_t a6, size_t a7)
> > +{
> > +  // Suspend this thread if system threads should not be running. We are too
> > +  // early in initialization of this thread to do anything else.
> 
> So we were recording, and saw this call, but during replay, we saw this call
> *earlier* than when we saw the call when we were recording, and we're just
> going to hang the thread permanently?  That really works?  Or am I
> misinterpreting things?

This redirection doesn't interact with the recording, but is kind of a poison pill so that if the function ever executes while replaying, we hang the thread so it can't do anything else.  The problem this is addressing is that by the time the main() routine even runs the OS has already spun up some threads for GCD.  While we don't interact with them while replaying, we also don't want them doing random things so try to neuter them as much as possible.  FWIW this was a much bigger problem back when we could rewind recording processes (which we don't do anymore), and this redirection might not be necessary anymore.

> @@ +1563,5 @@
> > +  "push %rdi;"
> > +
> > +  // Count how many stack arguments we need to copy.
> > +  "movq $64, %rsi;"
> > +  "jmp _RecordReplayInvokeObjCMessage_Loop;"
> 
> Why are we jumping to the very next instruction here?

Sometimes the compiler doesn't place these blocks of code adjacent to each other.  I think it treats them like separate functions, and there is no guarantee you can just fall through.

> @@ +1677,5 @@
> > +      TestObjCObjectClass(aArguments->obj, "NSArray"))
> > +  {
> > +    NSFastEnumerationState* state = (NSFastEnumerationState*)aArguments->arg0;
> > +    if (IsReplaying()) {
> > +      state->itemsPtr = NewLeakyArray<id>(rval);
> 
> How often do we leak things here?

I haven't measured, but I don't think we leak much memory here; all the times we leak will correspond with data that is in the recording, which in general are not large (and if they become large we have a different problem).  I've wanted to add something to keep track of it, but memory used by the replaying process hasn't been much of a problem so it's been a low priority item so far.
Assignee

Comment 33

11 months ago
(In reply to Nathan Froyd [:froydnj] from comment #29)
> ::: toolkit/recordreplay/HashTable.cpp
> @@ +19,5 @@
> > +namespace mozilla {
> > +namespace recordreplay {
> > +
> > +// HashTable stabilization is used with specific hashtable classes which are
> > +// based on callbacks. When the table is constructed, if we are
> 
> Why do we not care about stabilizing std::unordered_map?

Stabilizing all tables would be nice, but it's much easier to do this for PLDHashTable and PLHashTable because they are based on callbacks rather than templates.  We've had to do a little bit of other stabilization stuff for JS hashtables, but otherwise the PLD/PL hashtables cover everything that has been problematic in Gecko.

> @@ +24,5 @@
> > +// recording/replaying then the callbacks are replaced with an alternate set
> > +// that produces consistent hash numbers between recording and replay. If
> > +// during replay the additions and removals to the tables occur in the same
> > +// order that they did during recording, then the structure of the tables and
> > +// the order in which elements are visited during iteration will be the same.
> 
> Just to be clear, we're doing this stabilizing because between record and
> reply because for, say, pointer-keyed hashtables, we might have different
> pointer values for the keys between record and replay, and therefore we
> might have different hash values between record and replay?  And we can't
> have that, because then things like iteration order wouldn't be the same,
> and we would get replay divergences?

Yes, this is correct.

> @@ +70,5 @@
> > +  HashNumber mLastNewHash;
> > +
> > +  // The number of new hashes that have been generated. This increases
> > +  // monotonically.
> > +  uint32_t mNewHashCount;
> 
> What happens when this overflows?

There shouldn't be a problem with overflow, since this field is just used to generate new hash numbers.  I'll change the field name / add a comment.

> @@ +190,5 @@
> > +    BitwiseCast(nfn, aTarget);
> > +  }
> > +
> > +  // Set the last queried key for this table, and generate a new hash number
> > +  // for it.
> 
> Why are we doing this?  Performance?  Some other reason?  It's not
> super-clear to me why we're caching this.

I'll add a comment, but I think this is needed to make sure we keep using the same hash number for new entries that are added into the table.
Assignee

Comment 34

11 months ago
(In reply to Jason Duell [:jduell] (needinfo me) from comment #31)
> Brian:  if you agree Web Replay probably needs it's own Bugzilla component
> (comment 1), could you file a bug to open one?  See bug 1415157 for how
> that's done

Thanks, I filed bug 1470799 to add a new Bugzilla component.
Component: General → Web Replay
Comment on attachment 8982938 [details] [diff] [review]
Part 8 - Heap memory snapshots.

Review of attachment 8982938 [details] [diff] [review]:
-----------------------------------------------------------------

This is pretty gnarly code, even if the ideas behind it are rather simple.  I wish there were some standalone tests for this.

::: toolkit/recordreplay/DirtyMemoryHandler.cpp
@@ +24,5 @@
> +// See AsmJSSignalHandlers.cpp.
> +static const mach_msg_id_t sExceptionId = 2405;
> +
> +// This definition was generated by mig (the Mach Interface Generator) for the
> +// routine 'exception_raise' (exc.defs).

Is this copy-pasted from somewhere?  Could we make that clearer?

@@ +111,5 @@
> +  Thread::SpawnNonRecordedThread(DirtyMemoryExceptionHandlerThread, nullptr);
> +
> +  // Set exception ports on the entire task. Unfortunately, this clobbers any
> +  // other exception ports for the task, and forwarding to those other ports
> +  // is not easy to get right.

Does this mean that this code clobbers wasm/asm.js handlers?

::: toolkit/recordreplay/MemorySnapshot.cpp
@@ +205,5 @@
> +//    then calls ActivateEnd(), blocking in the call.
> +//
> +// 4. Snapshot threads are now unblocked from WaitUntilNoLongerActive(). The
> +//    main thread does not unblock from ActivateEnd() until all snapshot
> +//    threads have left WaitUntilNoLongerActive().

Why does the main thread need to wait until all the other threads have woken up, vs. the main thread just doing the equivalent of a condition variable broadcast?  Can we explain the reasoning behind that here?

@@ +208,5 @@
> +//    main thread does not unblock from ActivateEnd() until all snapshot
> +//    threads have left WaitUntilNoLongerActive().
> +class SnapshotThreadCondition {
> +  Atomic<bool, SequentiallyConsistent, Behavior::DontPreserve> mActive;
> +  Atomic<int32_t, SequentiallyConsistent, Behavior::DontPreserve> mCount;

This structure appears to be manually implementing barriers:

http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_barrier_wait.html

IIUC, we really want two barriers, one for the main thread to wait until all snapshot threads have done things, and then one for the main thread to wait until all the snapshot threads have been woken up?  Or are the exiting semantics subtly different enough from barriers that it's worth rolling our own routine(s) here?

@@ +394,5 @@
> +{
> +  StartCountdown(0);
> +}
> +
> +#ifdef WANT_COUNTDOWN_THREAD

Is this #ifdef intended for future expansion to other systems (i.e. Windows)?  Or for exploration of alternate, more efficient ways of taking snapshots?

@@ +853,5 @@
> +
> +  UpdateNumTrackedRegionsForSnapshot();
> +
> +  // Write protect all tracked memory.
> +  SetMemoryChangesAllowed(false);

We have a lot of paired SetMemoryChangesAllowed calls; is it worth having an RAII helper for managing the pairing?

@@ +924,5 @@
> +  // Look for a free region we can take the next chunk from.
> +  size_t size = ChunkPages * PageSize;
> +  gMemoryInfo->mMemoryBalance[mKind] += size;
> +
> +  mNextChunk = ExtractLockHeld(size);

Something that we've used in other places throughout the tree is a proof-of-lock parameter:

ExtractLockHeld(size_t, const AutoLockThing&);

so you can't call the function without holding a lock.  It's not foolproof, of course, as you could provide the wrong lock, but I think it'd make some of the locking invariants in this code a little more obvious (e.g. MaybeRefillNextChunk, AIUI, needs to be called with the lock held, but nothing makes that explicit).

@@ +1087,5 @@
> +  }
> +}
> +
> +void*
> +AllocateMemoryTryAddress(void* aAddress, size_t aSize, AllocatedMemoryKind aKind)

Does this function need some sort of locking to protect accesses to gMemoryInfo?

@@ +1093,5 @@
> +  MOZ_RELEASE_ASSERT(aAddress == PageBase(aAddress));
> +  aSize = RoundupSizeToPageBoundary(aSize);
> +
> +  if (gMemoryInfo) {
> +    gMemoryInfo->mMemoryBalance[aKind] += aSize;

e.g. mMemoryBalance isn't atomic in any way, shape, or form.

::: toolkit/recordreplay/MemorySnapshot.h
@@ +108,5 @@
> +void InitializeCountdownThread();
> +
> +// This is an alternative to memmove/memcpy that can be called in areas where
> +// faults in write protected memory are not allowed. It's hard to avoid dynamic
> +// code loading when calling memmove/memcpy directly.

This comment seems a little weird; aren't these routines in libc, which should be loaded by the time we do anything webreplay-related?
Attachment #8982938 - Flags: review?(nfroyd) → review+
Comment on attachment 8982936 [details] [diff] [review]
Part 3 - Record/replay utilities.

Review of attachment 8982936 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not totally clear on the use of File/Stream; I think everything is OK, but I'd like the questions on File.h cleared up before I r+ this.

::: toolkit/recordreplay/File.cpp
@@ +221,5 @@
> +// File
> +///////////////////////////////////////////////////////////////////////////////
> +
> +// We expect to find this at every index in a file.
> +static const uint64_t MagicValue = 0xd3e7f5fa + 0;

What is the `+ 0` here for?  Would it be better to have this be an actual 64-bit value, rather than something that fits into 32 bits?

::: toolkit/recordreplay/File.h
@@ +18,5 @@
> +namespace recordreplay {
> +
> +// Structure managing file I/O. Each file contains an index for a set of named
> +// streams, whose contents are compressed and interleaved throughout the file.
> +// Additionally, we directly manage the file handle and all associated memory.

Maybe it would be more obvious if I combed through all the other patches, but it's not obvious here whether File/Stream are for managing I/O that webreplay requires internally, or managing views of file I/O that the application is doing.  Reading the header kind of makes me think we're doing the latter, but the code makes it look more like the former.  Could we be more explicit about which level this code works at?

@@ +21,5 @@
> +// streams, whose contents are compressed and interleaved throughout the file.
> +// Additionally, we directly manage the file handle and all associated memory.
> +// This makes it easier to restore memory snapshots without getting confused
> +// about the state of the file handles which the process has opened. Data
> +// written and read from files is automatically compressed with LZ4.

If this code is for views of application file I/O, we are remembering the data written to/read from for purposes of replay?  Does data written during recording (or even replay) ever actually hit the disk in any fashion?  I think it'd be worth expanding on questions like that here...unless those questions are covered someplace else and I just haven't seen them/remember them.

@@ +23,5 @@
> +// This makes it easier to restore memory snapshots without getting confused
> +// about the state of the file handles which the process has opened. Data
> +// written and read from files is automatically compressed with LZ4.
> +//
> +// File is threadsafe, but Stream is not.

How does that work with multiple threads writing to a single file?  Or with open/write/close on one thread and then open/read/write/close on another?

@@ +74,5 @@
> +  // chunks in the entire stream.
> +  InfallibleVector<StreamChunkLocation> mChunks;
> +
> +  // Data buffer.
> +  char* mBuffer;

Could this be a UniquePtr<char[]>, please?

@@ +93,5 @@
> +  // The number of uncompressed bytes read or written from the stream.
> +  size_t mStreamPos;
> +
> +  // Any buffer available for use when decompressing or compressing data.
> +  char* mBallast;

Could this be a UniquePtr<char[]>, please?

@@ +209,5 @@
> +  // The offset of the last index read or written to the file.
> +  uint64_t mLastIndexOffset;
> +
> +  // All streams in this file, indexed by stream name and name index.
> +  typedef InfallibleVector<Stream*> StreamVector;

Could this be InfallibleVector<UniquePtr<Stream>>?  Would remove some manual memory management.
Comment on attachment 8982939 [details] [diff] [review]
Part 9 - Thread state snapshots.

Review of attachment 8982939 [details] [diff] [review]:
-----------------------------------------------------------------

More gnarly code.

::: toolkit/recordreplay/ThreadSnapshot.cpp
@@ +12,5 @@
> +namespace mozilla {
> +namespace recordreplay {
> +
> +#define THREAD_STACK_TOP_SIZE 2048
> +#define THREAD_STACK_TOP_SIZE_STR "2048"

Surely we could have a STRINGIFY macro that would avoid the need for a separate #define.

@@ +19,5 @@
> +// The contents of this structure are in preserved memory.
> +struct ThreadState {
> +  // Whether this thread should update its state when no longer idle. This is
> +  // only used for non-main threads.
> +  size_t /* bool */ mShouldRestore;

This is being used as a `bool`, but it's not actually a `bool`?  For alignment?  Worth explicating that here, if so.

@@ +25,5 @@
> +  // Register state, as stored by setjmp and restored by longjmp. Saved when a
> +  // non-main thread idles or the main thread begins to save all thread states.
> +  // When |mShouldRestore| is set, this is the state to set it to.
> +  jmp_buf mRegisters; // jmp_buf is 148 bytes
> +  uint32_t mPadding;

Why do we need to manually pad the structure here?

@@ +83,5 @@
> +#define THREAD_STACK_POINTER_OFFSET "160"
> +#define THREAD_STACK_TOP_OFFSET "168"
> +#define THREAD_STACK_TOP_BYTES_OFFSET "2216"
> +#define THREAD_STACK_CONTENTS_OFFSET "2224"
> +#define THREAD_STACK_BYTES_OFFSET "2232"

The aforementioned STRINGIFY macro would also enable these to be defined as proper integers...

@@ +181,5 @@
> +  MOZ_ASSERT(offsetof(ThreadState, mStackPointer) == atol(THREAD_STACK_POINTER_OFFSET));
> +  MOZ_ASSERT(offsetof(ThreadState, mStackTop) == atol(THREAD_STACK_TOP_OFFSET));
> +  MOZ_ASSERT(offsetof(ThreadState, mStackTopBytes) == atol(THREAD_STACK_TOP_BYTES_OFFSET));
> +  MOZ_ASSERT(offsetof(ThreadState, mStackContents) == atol(THREAD_STACK_CONTENTS_OFFSET));
> +  MOZ_ASSERT(offsetof(ThreadState, mStackBytes) == atol(THREAD_STACK_BYTES_OFFSET));

...which would enable these to be static_asserts.

@@ +277,5 @@
> +{
> +  MOZ_RELEASE_ASSERT(Thread::CurrentIsMainThread());
> +
> +  BeginPassThroughThreadEvents();
> +  SetMemoryChangesAllowed(false);

What handles turning this back to `true`?
Attachment #8982939 - Flags: review?(nfroyd) → review+
Assignee

Comment 38

11 months ago
(In reply to Nathan Froyd [:froydnj] from comment #35)
> @@ +111,5 @@
> > +  Thread::SpawnNonRecordedThread(DirtyMemoryExceptionHandlerThread, nullptr);
> > +
> > +  // Set exception ports on the entire task. Unfortunately, this clobbers any
> > +  // other exception ports for the task, and forwarding to those other ports
> > +  // is not easy to get right.
> 
> Does this mean that this code clobbers wasm/asm.js handlers?

Yeah, it does.  We're not compatible with wasm yet.

> ::: toolkit/recordreplay/MemorySnapshot.cpp
> @@ +205,5 @@
> > +//    then calls ActivateEnd(), blocking in the call.
> > +//
> > +// 4. Snapshot threads are now unblocked from WaitUntilNoLongerActive(). The
> > +//    main thread does not unblock from ActivateEnd() until all snapshot
> > +//    threads have left WaitUntilNoLongerActive().
> 
> Why does the main thread need to wait until all the other threads have woken
> up, vs. the main thread just doing the equivalent of a condition variable
> broadcast?  Can we explain the reasoning behind that here?

The main thread is pretty careful when coordinating with the snapshot threads: if the main thread is doing something then the snapshot threads shouldn't be doing anything, and vice versa.  For example, the main use for this structure is when the main thread tells the snapshot threads to restore the contents of all pages in tracked memory to their contents when an earlier checkpoint was saved.  We want to make sure that operation completes before the main thread does anything else in the checkpoint restore process, to ensure no threads are reading data that hasn't been updated yet.  I'll improve the comment here.

> @@ +208,5 @@
> > +//    main thread does not unblock from ActivateEnd() until all snapshot
> > +//    threads have left WaitUntilNoLongerActive().
> > +class SnapshotThreadCondition {
> > +  Atomic<bool, SequentiallyConsistent, Behavior::DontPreserve> mActive;
> > +  Atomic<int32_t, SequentiallyConsistent, Behavior::DontPreserve> mCount;
> 
> This structure appears to be manually implementing barriers:
> 
> http://pubs.opengroup.org/onlinepubs/009695399/functions/
> pthread_barrier_wait.html
> 
> IIUC, we really want two barriers, one for the main thread to wait until all
> snapshot threads have done things, and then one for the main thread to wait
> until all the snapshot threads have been woken up?  Or are the exiting
> semantics subtly different enough from barriers that it's worth rolling our
> own routine(s) here?

Unfortunately, it's really hard to call library functions in this code, even simple ones like memcpy.  When SnapshotThreadConditions are used there is normally a restriction that tracked memory cannot be written to; SnapshotThreadConditions are allocated in untracked memory and they are implemented using Thread routines that also guarantee no accesses to tracked memory.  pthread_barrier_wait doesn't provide us these guarantees.

> @@ +394,5 @@
> > +{
> > +  StartCountdown(0);
> > +}
> > +
> > +#ifdef WANT_COUNTDOWN_THREAD
> 
> Is this #ifdef intended for future expansion to other systems (i.e.
> Windows)?  Or for exploration of alternate, more efficient ways of taking
> snapshots?

This #ifdef is a debugging workaround.  lldb does not interact very well with some of the things we do in recording/replaying processes and attempting to interrupt execution if a process has hung will often cause it to disconnect from the process entirely.  Breakpoints still work, though, so this #ifdef can be used in concert with the StartCountdown routine to trigger a fatal error after the hang starts and break there with lldb.  I'll comment this stuff, this isn't code I'm super proud of.

> @@ +1087,5 @@
> > +  }
> > +}
> > +
> > +void*
> > +AllocateMemoryTryAddress(void* aAddress, size_t aSize, AllocatedMemoryKind aKind)
> 
> Does this function need some sort of locking to protect accesses to
> gMemoryInfo?

gMemoryInfo is written once during startup and accesses to it don't need to be synchronized.

> ::: toolkit/recordreplay/MemorySnapshot.h
> @@ +108,5 @@
> > +void InitializeCountdownThread();
> > +
> > +// This is an alternative to memmove/memcpy that can be called in areas where
> > +// faults in write protected memory are not allowed. It's hard to avoid dynamic
> > +// code loading when calling memmove/memcpy directly.
> 
> This comment seems a little weird; aren't these routines in libc, which
> should be loaded by the time we do anything webreplay-related?

Yeah, but on macOS at least even after the code has been loaded it is initially a stub that dynamically fills itself in the first time it is called, which touches tracked memory and breaking the memory snapshot system.  I tried defeating this by calling memcpy() before we ever enter the snapshot system to make sure the code has been loaded (InitializeThreadSnapshots does this for setjmp/longjmp) but memcpy has other little functions it may or may not call which also dynamically initialize themselves.  Eventually I gave up and wrote this routine.
Assignee

Comment 39

11 months ago
(In reply to Nathan Froyd [:froydnj] from comment #36)
> ::: toolkit/recordreplay/File.h
> @@ +18,5 @@
> > +namespace recordreplay {
> > +
> > +// Structure managing file I/O. Each file contains an index for a set of named
> > +// streams, whose contents are compressed and interleaved throughout the file.
> > +// Additionally, we directly manage the file handle and all associated memory.
> 
> Maybe it would be more obvious if I combed through all the other patches,
> but it's not obvious here whether File/Stream are for managing I/O that
> webreplay requires internally, or managing views of file I/O that the
> application is doing.  Reading the header kind of makes me think we're doing
> the latter, but the code makes it look more like the former.  Could we be
> more explicit about which level this code works at?

Sure, I'll update the comment.  File/Stream are for I/O that web replay does internally.  The only current (and future, I expect) use is for the recording file which stores the actual events and other data needed to record/replay the execution.  (There used to be other uses for memory and thread snapshot data, but those were not compatible with sandboxing and have been removed.)

> @@ +21,5 @@
> > +// streams, whose contents are compressed and interleaved throughout the file.
> > +// Additionally, we directly manage the file handle and all associated memory.
> > +// This makes it easier to restore memory snapshots without getting confused
> > +// about the state of the file handles which the process has opened. Data
> > +// written and read from files is automatically compressed with LZ4.
> 
> If this code is for views of application file I/O, we are remembering the
> data written to/read from for purposes of replay?  Does data written during
> recording (or even replay) ever actually hit the disk in any fashion?  I
> think it'd be worth expanding on questions like that here...unless those
> questions are covered someplace else and I just haven't seen them/remember
> them.

I think this is covered by the comment above (e.g. this is not for I/O being done by the rest of the process, just internal stuff) but when recording, the process does all the operations it normally would do, including all file I/O.  Additionally, it writes to a recording file through the definitions here, to keep track of those operations and any data they produced.  When replaying, the process reads from the same recording file but does not do any other file I/O or operations included in the recording.

> @@ +23,5 @@
> > +// This makes it easier to restore memory snapshots without getting confused
> > +// about the state of the file handles which the process has opened. Data
> > +// written and read from files is automatically compressed with LZ4.
> > +//
> > +// File is threadsafe, but Stream is not.
> 
> How does that work with multiple threads writing to a single file?  Or with
> open/write/close on one thread and then open/read/write/close on another?

The only threadsafe accesses we support on Files are simultaneous write/write or read/read.  I'll update the comment.
Assignee

Comment 40

11 months ago
(In reply to Nathan Froyd [:froydnj] from comment #37)
> ::: toolkit/recordreplay/ThreadSnapshot.cpp
> @@ +12,5 @@
> > +namespace mozilla {
> > +namespace recordreplay {
> > +
> > +#define THREAD_STACK_TOP_SIZE 2048
> > +#define THREAD_STACK_TOP_SIZE_STR "2048"
> 
> Surely we could have a STRINGIFY macro that would avoid the need for a
> separate #define.

I tried when first writing this code but IIRC even the normal preprocessor trick macros for this would still pass the wrong string to the compiler's assembler.  I'll try again and see if I can get it to work.

> @@ +19,5 @@
> > +// The contents of this structure are in preserved memory.
> > +struct ThreadState {
> > +  // Whether this thread should update its state when no longer idle. This is
> > +  // only used for non-main threads.
> > +  size_t /* bool */ mShouldRestore;
> 
> This is being used as a `bool`, but it's not actually a `bool`?  For
> alignment?  Worth explicating that here, if so.

Sure, I'll add a comment.  Since we're digging around in this structure with hand written assembly code we need to know the offsets the compiler will use for the struct, and making this a size_t also makes it simpler to write code that accesses it.

> @@ +25,5 @@
> > +  // Register state, as stored by setjmp and restored by longjmp. Saved when a
> > +  // non-main thread idles or the main thread begins to save all thread states.
> > +  // When |mShouldRestore| is set, this is the state to set it to.
> > +  jmp_buf mRegisters; // jmp_buf is 148 bytes
> > +  uint32_t mPadding;
> 
> Why do we need to manually pad the structure here?

This keeps the structure's fields aligned at 8 bytes.  I'll add a comment.

> @@ +277,5 @@
> > +{
> > +  MOZ_RELEASE_ASSERT(Thread::CurrentIsMainThread());
> > +
> > +  BeginPassThroughThreadEvents();
> > +  SetMemoryChangesAllowed(false);
> 
> What handles turning this back to `true`?

When this function calls RestoreThreadStack(), it will update its stack and registers to their state when SaveThreadState() was called by SaveAllThreads().  Control flow will continue from there, and will call SetMemoryChangesAllowed(true) before returning.
Comment on attachment 8982936 [details] [diff] [review]
Part 3 - Record/replay utilities.

Review of attachment 8982936 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with changes and whatnot talked about previously.
Attachment #8982936 - Flags: review?(nfroyd) → review+
Comment on attachment 8982940 [details] [diff] [review]
Part 10 - Rewinding logic.

Review of attachment 8982940 [details] [diff] [review]:
-----------------------------------------------------------------

mccr8, do you feel comfortable reviewing this, since I think you looked at some code that uses this?  Otherwise perhaps Eric or jld or some combination of all of you can take a look.
Attachment #8982940 - Flags: review?(nfroyd) → review?(continuation)
Comment on attachment 8982940 [details] [diff] [review]
Part 10 - Rewinding logic.

Review of attachment 8982940 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/recordreplay/ProcessRewind.cpp
@@ +23,5 @@
> +namespace recordreplay {
> +
> +// Information about the current rewinding state. The contents of this structure
> +// are in untracked memory.
> +struct RewindInfo {

nit: { on the next line.

@@ +27,5 @@
> +struct RewindInfo {
> +  // The most recent checkpoint which was encountered.
> +  CheckpointId mLastCheckpoint;
> +
> +  // Whether this is the active child process.

It would be good to refer to wherever it was that has the comment explaining how the active child process stuff is done.

@@ +76,5 @@
> +  MOZ_RELEASE_ASSERT(aCheckpoint == gRewindInfo->mLastCheckpoint ||
> +                     CheckpointPrecedes(aCheckpoint, gRewindInfo->mLastCheckpoint));
> +
> +  // Make sure we don't lose pending main thread callbacks due to rewinding.
> +  MOZ_RELEASE_ASSERT(gMainThreadCallbacks.empty());

You aren't holding the monitor when you access gMainThreadCallbacks here, which seems bad.

@@ +139,5 @@
> +  MOZ_RELEASE_ASSERT(Thread::CurrentIsMainThread());
> +  MOZ_RELEASE_ASSERT(!AreThreadEventsPassedThrough());
> +  MOZ_RELEASE_ASSERT(IsReplaying() || !aTemporary);
> +
> +  gBeforeCheckpointHook();

Is this always non-null?

@@ +206,5 @@
> +  return reachedCheckpoint;
> +}
> +
> +static bool gRecordingDiverged;
> +static bool gUnhandledDivergeAllowed;

These look like they should be initialized.

@@ +220,5 @@
> +
> +MOZ_EXPORT bool
> +RecordReplayInterface_InternalHasDivergedFromRecording()
> +{
> +  return gRecordingDiverged && Thread::CurrentIsMainThread();

Should these checks be in the opposite order to prevent a race? (Albeit one that has no ultimate effect on the outcome.)

@@ +240,5 @@
> +} // extern "C"
> +
> +void
> +EnsureNotDivergedFromRecording()
> +{

Should this have a main thread assertion?

@@ +259,5 @@
> +
> +CheckpointId
> +GetLastSavedCheckpoint()
> +{
> +  MOZ_RELEASE_ASSERT(!gRewindInfo->mSavedCheckpoints.empty());

An assert on HasSavedCheckpoint instead might be a little cleaner, and would avoid a null deref if there's no rewind info.

@@ +272,5 @@
> +  return gMainThreadShouldPause;
> +}
> +
> +// Whether there is a PauseMainThreadAndServiceCallbacks frame on the stack.
> +static bool gMainThreadIsPaused = false;

AFAICT This could be a static variable in the function itself, which would make it a little easier to see the precise scope it is used in.

@@ +281,5 @@
> +  MOZ_RELEASE_ASSERT(Thread::CurrentIsMainThread());
> +  MOZ_RELEASE_ASSERT(!AreThreadEventsPassedThrough());
> +  MOZ_RELEASE_ASSERT(!gRecordingDiverged);
> +
> +  if (gMainThreadIsPaused)

nit: brace the body of the if.

@@ +297,5 @@
> +        MonitorAutoUnlock unlock(*gMainThreadCallbackMonitor);
> +        AutoDisallowThreadEvents disallow;
> +        callback();
> +      }
> +      continue;

nit: Maybe use an "else" for the Wait() rather than having the continue? Seems a little odd to continue into a single statement.
Attachment #8982940 - Flags: review?(continuation) → review+

Comment 44

10 months ago
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/35160c4985ff
Part 1 - Public record/replay API, r=froydnj.
Assignee

Updated

10 months ago
Whiteboard: leave-open

Comment 46

10 months ago
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/92c58131d078
Part 2 - Record/replay build files, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab10a592c8e0
Part 3 - Record/replay utilities, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/64c2c205de26
Part 4 - PLD and PL hashtable stabilization, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f4e9d6a808f1
Part 5 - Core record/replay functionality, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/be3e40e703a0
Part 6 - Redirection infrastructure, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/73eb0d202793
Part 7 - Darwin redirections, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d6f36cce92e
Part 8 - Heap memory snapshots, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f2643f04606
Part 9 - Thread state snapshots, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8653d1dfdfe
Part 10 - Rewinding logic, r=mccr8.

Comment 47

10 months ago
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e18b9018442b
Add metadata to file bugs for toolkit/recordreplay into Core::Web Replay. r=test-fix
Assignee

Updated

10 months ago
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.