Closed Bug 1465292 Opened 2 years ago Closed 2 years ago

Web Replay: Save script file contents for the replay debugger

Categories

(Core :: Web Replay, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(3 files, 2 obsolete files)

Normally devtools does network queries and so forth to get the contents of sources that have been loaded by a tab.  When replaying an old recording, this is problematic because those sources may no longer exist or may have different contents.  This bug adds an API and users of that to keep track of the complete contents of these sources; just using the ScriptSource is not sufficient because it elides non-script parts of HTML files.
Add public API.  This is implemented in the replay debugger.
Assignee: nobody → bhackett1024
Attachment #8981715 - Flags: review?(jimb)
Keep track of content loaded by de-XDR'ing scripts.
Attachment #8981717 - Flags: review?(nicolas.b.pierron)
Keep track of content loaded from script files.
Attachment #8981719 - Flags: review?(bugs)
Keep track of content loaded from HTML files.
Attachment #8981721 - Flags: review?(hsivonen)
Comment on attachment 8981721 [details] [diff] [review]
Part 4 - Track parsed content from HTML files.

Review of attachment 8981721 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/html/nsHtml5StreamParser.cpp
@@ +290,5 @@
> +  if (recordreplay::IsRecordingOrReplaying()) {
> +    nsAutoCString spec;
> +    aURL->GetSpec(spec);
> +    JS::BeginContentParseForRecordReplay(this, spec.get(), "text/html",
> +                                         JS::SmallestEncoding::Latin1 /* FIXME Guessing here */);

Is the intention to capture bytes and claim they are "Latin1" as a storage hack?

How is the recorded data going to be replayed? Is it going to appear as if it came from the network in the HTML case and the external JS case stores post-decode UTF-16 as a shortcut?

@@ +1179,5 @@
>    }
>  
> +  if (recordreplay::IsRecordingOrReplaying()) {
> +    JS::AddContentParseDataForRecordReplay(this, aBuffer, aLength);
> +  }

This runs off the main thread. I assume calling these APIs is not OK from there. Marking r-, unless it's explicitly OK to do this from the parser thread.
Attachment #8981721 - Flags: review?(hsivonen) → review-
(In reply to Brian Hackett (:bhackett) from comment #1)
> Add public API.  This is implemented in the replay debugger.

Is this API thread-safe?  IS there a bug number where we can read about this API implementation?

(In reply to Brian Hackett (:bhackett) from comment #4)
> Part 4 - Track parsed content from HTML files.
> Part 3 - Track parsed content from loaded script files.
> Part 2 - Track parsed content from XDR'ed scripts.

Wouldn't this monitor 3 times the same data?
(In reply to Nicolas B. Pierron [:nbp] {backlog: 36} from comment #6)
> (In reply to Brian Hackett (:bhackett) from comment #1)
> > Add public API.  This is implemented in the replay debugger.
> 
> Is this API thread-safe?  IS there a bug number where we can read about this
> API implementation?

Yes, this API is thread safe.  The implementation is located here:

https://hg.mozilla.org/projects/ash/file/1262f537e29f/js/src/vm/ReplayDebugger.cpp#l2244

The code is up for review in part 2 of bug 1465289.

> (In reply to Brian Hackett (:bhackett) from comment #4)
> > Part 4 - Track parsed content from HTML files.
> > Part 3 - Track parsed content from loaded script files.
> > Part 2 - Track parsed content from XDR'ed scripts.
> 
> Wouldn't this monitor 3 times the same data?

These are separate paths along which files containing scripts can be loaded by a content process.
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> Comment on attachment 8981721 [details] [diff] [review]
> Part 4 - Track parsed content from HTML files.
> 
> Review of attachment 8981721 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: parser/html/nsHtml5StreamParser.cpp
> @@ +290,5 @@
> > +  if (recordreplay::IsRecordingOrReplaying()) {
> > +    nsAutoCString spec;
> > +    aURL->GetSpec(spec);
> > +    JS::BeginContentParseForRecordReplay(this, spec.get(), "text/html",
> > +                                         JS::SmallestEncoding::Latin1 /* FIXME Guessing here */);
> 
> Is the intention to capture bytes and claim they are "Latin1" as a storage
> hack?
> 
> How is the recorded data going to be replayed? Is it going to appear as if
> it came from the network in the HTML case and the external JS case stores
> post-decode UTF-16 as a shortcut?

Hmm, to more carefully explain this API, it might be better to forget about whether we are recording or replaying an execution and consider the problem of getting the complete contents of an HTML or other source file.  AFAICT the HTML parser never constructs such a buffer: data is read off a stream and processed as it comes in via nsHtml5StreamParser::DoDataAvailable.  So if the devtools asked for a complete copy of the file, it can't get one, because there never was one.  The purpose of this API is to store the data read off the stream so that a complete copy can be provided when the devtools ask for it later.

In the XDR and foo.js load cases, there is a complete buffer constructed at some point, but when the devtools ask for it later it might not no longer exist.

The Latin1 encoding above isn't a storage hack, this is the encoding which will be used when converting the file contents to a string for the devtools code:

https://hg.mozilla.org/projects/ash/file/1262f537e29f/js/src/vm/ReplayDebugger.cpp#l2328

I couldn't figure out what encoding the HTML file contents actually use, though.  Does the |mEncoding| member describe the entire file's contents?

> @@ +1179,5 @@
> >    }
> >  
> > +  if (recordreplay::IsRecordingOrReplaying()) {
> > +    JS::AddContentParseDataForRecordReplay(this, aBuffer, aLength);
> > +  }
> 
> This runs off the main thread. I assume calling these APIs is not OK from
> there. Marking r-, unless it's explicitly OK to do this from the parser
> thread.

This API is thread safe, sorry for not specifying that earlier.
(In reply to Brian Hackett (:bhackett) from comment #7)
> (In reply to Nicolas B. Pierron [:nbp] {backlog: 36} from comment #6)
> > (In reply to Brian Hackett (:bhackett) from comment #4)
> > > Part 4 - Track parsed content from HTML files.
> > > Part 3 - Track parsed content from loaded script files.
> > > Part 2 - Track parsed content from XDR'ed scripts.
> > 
> > Wouldn't this monitor 3 times the same data?
> 
> These are separate paths along which files containing scripts can be loaded
> by a content process.

Would it make sense to only have these hooks in the Parser and XDR? As these are the only locations which can produce a top-level JSScript.
(In reply to Nicolas B. Pierron [:nbp] {backlog: 36} from comment #9)
> (In reply to Brian Hackett (:bhackett) from comment #7)
> > (In reply to Nicolas B. Pierron [:nbp] {backlog: 36} from comment #6)
> > > (In reply to Brian Hackett (:bhackett) from comment #4)
> > > > Part 4 - Track parsed content from HTML files.
> > > > Part 3 - Track parsed content from loaded script files.
> > > > Part 2 - Track parsed content from XDR'ed scripts.
> > > 
> > > Wouldn't this monitor 3 times the same data?
> > 
> > These are separate paths along which files containing scripts can be loaded
> > by a content process.
> 
> Would it make sense to only have these hooks in the Parser and XDR? As these
> are the only locations which can produce a top-level JSScript.

The problem here is that the script parser never sees the full contents of HTML pages that contain inline <script> tags, just the contents of those script tags.  The devtools want to show the full contents of HTML pages that contain scripts.
Comment on attachment 8981719 [details] [diff] [review]
Part 3 - Track parsed content from loaded script files.

I'm a bit lost here. Why this is used for main thread parsing case only? Why not the case when parsing is Off-main-thread parsing, or when we're loading modules etc.

How does this work for inline scripts if one has several <script> elements in the page?
Attachment #8981719 - Flags: review?(bugs) → review-
Comment on attachment 8981717 [details] [diff] [review]
Part 2 - Track parsed content from XDR'ed scripts.

Review of attachment 8981717 [details] [diff] [review]:
-----------------------------------------------------------------

Reading the implementation of the called function, this call-site sounds fine.

I am still surprised by the fact that we do not have a similar instrumentation for all the code path where StartIncrementalEncoding is done:
https://searchfox.org/mozilla-central/search?q=symbol:_ZN2JS24StartIncrementalEncodingEP9JSContextNS_6HandleIP8JSScriptEE&redirect=false

Or from locations where the parser is called.
Attachment #8981717 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Olli Pettay [:smaug] from comment #11)
> Comment on attachment 8981719 [details] [diff] [review]
> Part 3 - Track parsed content from loaded script files.
> 
> I'm a bit lost here. Why this is used for main thread parsing case only? Why
> not the case when parsing is Off-main-thread parsing, or when we're loading
> modules etc.
> 
> How does this work for inline scripts if one has several <script> elements
> in the page?

Off thread JS parsing is disabled when recording/replaying (bug 1207696 part 5i).  I haven't tested with modules so there might be missing cases; in such an event, the devtools won't get the source for the file and will show a 'This file could not be loaded' placeholder or similar.  For this and other sources we might be missing, I'd have to investigate and write a patch, but so far these changes have worked for me and the very few other folks who have been testing Web Replay.

Inline scripts in HTML pages should be handled by part 4, where there will be a single source reported to the devtools that contains all the inline scripts in addition to the rest of the HTML for the page.
(In reply to Nicolas B. Pierron [:nbp] {backlog: 36} from comment #12)
> Comment on attachment 8981717 [details] [diff] [review]
> Part 2 - Track parsed content from XDR'ed scripts.
> 
> Review of attachment 8981717 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Reading the implementation of the called function, this call-site sounds
> fine.
> 
> I am still surprised by the fact that we do not have a similar
> instrumentation for all the code path where StartIncrementalEncoding is done:
> https://searchfox.org/mozilla-central/search?q=symbol:
> _ZN2JS24StartIncrementalEncodingEP9JSContextNS_6HandleIP8JSScriptEE&redirect=
> false
> 
> Or from locations where the parser is called.

It looks to me like StartIncrementalEncoding is used for encoding XDR, whereas here we only need to know where script files are decoded from XDR.  I don't think we need complete coverage on all paths where the JS parser can be invoked, just those where the devtools want to know the source of a file that contains script data, and which can occur while recording or replaying (i.e. scripts cannot be parsed off thread, nor lazily).
(In reply to Brian Hackett (:bhackett) from comment #8)
> The Latin1 encoding above isn't a storage hack, this is the encoding which
> will be used when converting the file contents to a string for the devtools
> code:
> 
> https://hg.mozilla.org/projects/ash/file/1262f537e29f/js/src/vm/
> ReplayDebugger.cpp#l2328

If the purpose of the recording is to get source that's displayable as a string in devtools (as opposed to suitable for use as a synthetic network response), I suggest recording the *output* of the DecodeToUTF16() calls in nsHtml5StreamParser.

> I couldn't figure out what encoding the HTML file contents actually use,
> though.  Does the |mEncoding| member describe the entire file's contents?

It does at the end of the parse. Near the start of the parse, the value of mEncoding may still change, so if you choose to record bytes instead of UTF-16, the value of mEncoding should be attached to the recording at the end of the parse and not at the start.
Thanks for the feedback, here is an updated patch that notifies the replay debugger about HTML sources after they have been decoded to UTF16.
Attachment #8981721 - Attachment is obsolete: true
Attachment #8984293 - Flags: review?(hsivonen)
Attachment #8984293 - Flags: review?(hsivonen) → review+
Comment on attachment 8981719 [details] [diff] [review]
Part 3 - Track parsed content from loaded script files.

I answered your questions in comment 13, do you have other concerns about this patch?
Attachment #8981719 - Flags: review- → review?(bugs)
Comment on attachment 8981715 [details] [diff] [review]
Part 1 - Add public API for tracking parsed content when recording/replaying.

As part of bug 1470795 this API has been moved out of JS and into the record/replay public API.  Part 2 of that bug includes the new API, and part 5 changes the uses of it to refer to the new location.
Attachment #8981715 - Attachment is obsolete: true
Attachment #8981715 - Flags: review?(jimb)
Attachment #8981719 - Flags: review?(bugs) → review+
Component: General → Web Replay
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bbbc2375abc
Part 2 - Track parsed content from XDR'ed scripts, r=nbp.
https://hg.mozilla.org/integration/mozilla-inbound/rev/46db17f25557
Part 3 - Track parsed content from loaded script files, r=smaug.
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d63e1235def
Part 4 - Track parsed content from HTML files, r=hsivonen.
You need to log in before you can comment on or make changes to this bug.