Closed Bug 1465452 Opened 6 years ago Closed 6 years ago

Web Replay: Gecko recording tweaks

Categories

(Core Graveyard :: Web Replay, defect)

defect
Not set
normal

Tracking

(firefox62 affected)

RESOLVED FIXED
Tracking Status
firefox62 --- affected

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

(Whiteboard: leave-open)

Attachments

(14 files)

3.72 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
5.10 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.69 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
3.92 KB, patch
aosmond
: review+
Details | Diff | Splinter Review
1.21 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
827 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
1.42 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
1.31 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
785 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
846 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
1005 bytes, patch
erahm
: review+
Details | Diff | Splinter Review
923 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
1.23 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
647 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
This bug has changes needed in Gecko to specify whether mutexes or atomic values should be recorded, and adjustments in specific places to record or not record such activity.
Mutexes and monitors have their behavior recorded by default, but this can be changed in the constructor.  This is pretty similar to part 4b of bug 1207696 (some parts of which still exist in the codebase and have had their review carried forward).
Assignee: nobody → bhackett1024
Attachment #8981854 - Flags: review?(nfroyd)
nsISupports threadsafe refcounts are not recorded by default, but this can be overridden with the NS_DECL_THREADSAFE_ISUPPORTS_WITH_RECORDING macro.
Attachment #8981855 - Flags: review?(nfroyd)
Background hang monitor state can be accessed at non-deterministic points (during the interrupt callback I think).
Attachment #8981859 - Flags: review?(nfroyd)
This is pretty similar to part 5f of bug 1207696, but the associated image loader code has changed a fair amount and the fix is also cleaner now.
Attachment #8981860 - Flags: review?(matt.woodrow)
Malloc can be called at non-deterministic points when recording/replaying.
Attachment #8981861 - Flags: review?(n.nethercote)
If the assertion crash flag is recorded then we can another error when an assertion fails that obscures the real problem.
Attachment #8981862 - Flags: review?(nfroyd)
Profiler flags can be accessed during GC and other places where recorded events are not allowed.
Attachment #8981864 - Flags: review?(n.nethercote)
Logging code can be accessed during GC and other places where recorded events are not allowed.
Attachment #8981865 - Flags: review?(nfroyd)
Passing through thread events when NS_WARNING and errors are printed allows these methods to be freely called in the GC and other places where recorded events are not allowed.
Attachment #8981866 - Flags: review?(nfroyd)
I think this atomic is accessed during GC.
Attachment #8981867 - Flags: review?(nfroyd)
Pipe input and output streams have destructors that can perform recorded events, and if their refcount changes are not recorded then the thread which destroys them can vary between recording and replay.
Attachment #8981869 - Flags: review?(erahm)
Some abstract thread subclasses have destructors that can perform recorded events, and if their refcount changes are not recorded then the thread which destroys them can vary between recording and replay.
Attachment #8981870 - Flags: review?(suro001)
Sorry, forgot the comment on the last patch.  I think timer thread state can be accessed during the GC.
Some runnable subclasses have destructors that can perform recorded events, and if their refcount changes are not recorded then the thread which destroys them can vary between recording and replay.
Attachment #8981874 - Flags: review?(nfroyd)
Attachment #8981855 - Flags: review?(nfroyd) → review+
Comment on attachment 8981859 [details] [diff] [review]
Part 3 - Avoid recording some background hang monitor state.

Review of attachment 8981859 [details] [diff] [review]:
-----------------------------------------------------------------

Maybe the rationale ("Background hang monitor state can be accessed at non-deterministic points (during the interrupt callback I think).") should be expanded slightly into a comment somewhere in this file?
Attachment #8981859 - Flags: review?(nfroyd) → review+
Comment on attachment 8981862 [details] [diff] [review]
Part 6 - Don't record assertion crashing flag.

Review of attachment 8981862 [details] [diff] [review]:
-----------------------------------------------------------------

"then we can another error when an assertion fails"?  Does that mean to say that we can record two errors, and the second one obscures the first?  Or something else?
Attachment #8981862 - Flags: review?(nfroyd) → review+
Attachment #8981854 - Flags: review?(nfroyd) → review+
Attachment #8981865 - Flags: review?(nfroyd) → review+
Attachment #8981866 - Flags: review?(nfroyd) → review+
Attachment #8981867 - Flags: review?(nfroyd) → review+
Comment on attachment 8981874 [details] [diff] [review]
Part 14 - Record refcount changes for runnables.

Review of attachment 8981874 [details] [diff] [review]:
-----------------------------------------------------------------

I think I believe that this is necessary, but "...then the thread which destroys them can vary between recording and replay." sounds completely bewildering.  How does that even happen?

I guess that means if we're not sequencing refcount changes on runnables, we might not be running the refcount changes in the same order between record and replay, and therefore thread 1 could see the refcount go to zero during record, but thread 2 sees the refcount go to zero in the replay.  Is that a correct understanding of the situation?
Attachment #8981874 - Flags: review?(nfroyd) → review+
Comment on attachment 8981871 [details] [diff] [review]
Part 13 - Avoid recording some timer thread state.

Review of attachment 8981871 [details] [diff] [review]:
-----------------------------------------------------------------

Some comment about why this is necessary would be great.  (Actually, some comment about why we're DontPreserve'ing things in each patch in this bug wouldn't be a bad idea, similar to the way we attempt to have people using Relaxed memory ordering with atomics justify themselves in the code.)

Timers during GC sounds very plausible.
Attachment #8981871 - Flags: review?(nfroyd) → review+
Attachment #8981860 - Flags: review?(matt.woodrow) → review?(aosmond)
(In reply to Nathan Froyd [:froydnj] from comment #17)
> Comment on attachment 8981862 [details] [diff] [review]
> Part 6 - Don't record assertion crashing flag.
> 
> Review of attachment 8981862 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> "then we can another error when an assertion fails"?  Does that mean to say
> that we can record two errors, and the second one obscures the first?  Or
> something else?

If an assertion fails when we are recording or replaying and are in the GC or another place where recorded events are not allowed, then touching the crashing flag will cause a separate fatal error to be reported by the record/replay system, unrelated to the original failing assertion.  Sorry about the garbled description.
(In reply to Nathan Froyd [:froydnj] from comment #18)
> Comment on attachment 8981874 [details] [diff] [review]
> Part 14 - Record refcount changes for runnables.
> 
> Review of attachment 8981874 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think I believe that this is necessary, but "...then the thread which
> destroys them can vary between recording and replay." sounds completely
> bewildering.  How does that even happen?
> 
> I guess that means if we're not sequencing refcount changes on runnables, we
> might not be running the refcount changes in the same order between record
> and replay, and therefore thread 1 could see the refcount go to zero during
> record, but thread 2 sees the refcount go to zero in the replay.  Is that a
> correct understanding of the situation?

Yes, this is the problem which this patch and the other ones on nsISupports refcounts are fixing.  In principle we should be able to record refcount changes for all (or almost all) nsISupports, but this would add a lot of recording overhead and in most cases is unnecessary, so we default to not recording the changes and then doing spot fixes for classes where that is a problem.
Attachment #8981861 - Flags: review?(n.nethercote) → review+
Attachment #8981864 - Flags: review?(n.nethercote) → review+
Attachment #8981860 - Flags: review?(aosmond) → review+
Attachment #8981869 - Flags: review?(erahm) → review+
Attachment #8981870 - Flags: review?(suro001) → review?(nfroyd)
Attachment #8981870 - Flags: review?(nfroyd) → review+
Component: General → Web Replay
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ff685ebac3e
Part 1 - Allow platform mutexes to specify whether they are recorded, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/16226b867568
Part 2 - Allow nsISupports subclasses to record/replay their refcount changes, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/894b187d288f
Part 3 - Avoid recording some background hang monitor state, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b542f98c02c
Part 5 - Don't record some jemalloc atomics, r=njn.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef7ed9b4ee49
Part 6 - Don't record assertion crashing flag, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/554c3f761c64
Part 7 - Don't record a profiler atomic, r=njn.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3b091c71170a
Part 8 - Don't record logging state, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/25957aa95ba3
Part 10 - Don't record atom table atomic, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2fc6404e8d2a
Part 11 - Record refcount changes for pipe input/output streams, r=erahm.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f01abf3b9dd7
Part 12 - Record refcount changes for abstract threads, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebe594fec36d
Part 13 - Avoid recording some timer thread state, r=froydnj.
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8f82b41f1af
Part 14 - Record refcount changes for runnables, r=froydnj.
Whiteboard: leave-open
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/654209457989
Part 4 - Don't sort image loader requests and frames when recording/replaying, r=aosmond.
https://hg.mozilla.org/integration/mozilla-inbound/rev/15e1646fafb9
Part 9 - Allow warnings and errors to be printed without affecting the recording, r=froydnj.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: