Closed Bug 1328553 Opened 4 years ago Closed 4 years ago

Netmonitor DAMP test should measure time until all request data is received and rendered

Categories

(DevTools :: Netmonitor, defect, P1)

Tracking

(firefox53 fixed)

RESOLVED FIXED
Firefox 53
Iteration:
53.4 - Jan 9

People

(Reporter: jsnajdr, Assigned: jsnajdr)

References

(Blocks 1 open bug)

Details

(Whiteboard: [netmonitor])

Attachments

(3 files)

Improve the netmonitorOpen DAMP test to better measure how long it takes Netmonitor to display all requests issued by a page.

The test currently does this:
1. Measure how long it takes to open the Netmonitor tool. Done by waiting for the promise returned by gDevTools.showToolbox() to resolve.

2. Measure how long it takes to reload the page. Done by waiting for the "load" event on the selected "browser".

Note that this "load" event has little relation to what the Netmonitor frontend is doing. The load is slowed down by the Netmonitor backend actor inspecting the nsHttpChannels in real-time, but the frontend and the RDP connection can be pretty busy even after the load is long finished.

3. Wait for all the outstanding RDP requests to "settle". This part is not measured at all, and it hides important info on how well the RDP connection and the frontend are performing.

The waitForRequestsToSettle method is not very reliable for Netmonitor and can resolve at times that are essentially random. The reason is that Netmonitor info is transferred mainly by a stream of events from the server, and requests from client to the server have a secondary role. It can happen that the server is still very busy sending events to the client, but there is no outstanding request from the client at a given moment.

4. Measure how long it takes to close the toolbox. Done by waiting for the promise returned by gDevTools.closeToolbox() to resolve.

Because waitForRequestsToSettle might not wait long enough, a lot of activity can spill into the "closeToolbox" interval, making the timings unreliable and nondeterministic.

The improvement I'm going to make is:
- wait until request update events stop coming from the server, i.e., until all requests are finished
- record the timing of this wait as a new "netmonitor.requestsFinished" DAMP result
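As a rough sketch of the intended test flow (the function and property names below are illustrative stand-ins, not the actual DAMP helpers): the wait has to start before the reload is triggered so the first request's events are not missed, and the elapsed time is recorded as a separate result:

```javascript
// Hypothetical sketch of the improved DAMP flow described above.
// `tools` is a stand-in for the real harness; none of these names
// are taken from the actual DAMP code.
async function testNetmonitor(label, tools) {
  const toolbox = await tools.openToolboxAndLog(label, "netmonitor");
  const start = Date.now();
  // Start waiting before the reload so the document request is not missed.
  const requestsDone = tools.waitForAllRequestsFinished(toolbox);
  await tools.reloadPageAndLog(label, toolbox);
  await requestsDone;
  return { name: label + ".requestsFinished.DAMP", value: Date.now() - start };
}
```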
Blocks: 1321749
Whiteboard: [netmonitor][triage]
Part 1: Add a NetmonitorController.waitForAllRequestsFinished method. It keeps a record of the first and last update events for network requests and resolves when all requests are in a "finished" state. Works very well for tracking page loads.

I decided to add this method to NetmonitorController code, not to the test code, for two reasons: first, I hope to unify it with NetmonitorView.whenDataAvailable soon (both methods watch the incoming update events and wait until some condition is met). Second, I have an addon that does some performance measurements after pressing a toolbar button and displays the results. It's nice to be able to reuse code between DAMP and addons like this.

Part 2: Use the waitForAllRequestsFinished method to implement a DAMP test.
Assignee: nobody → jsnajdr
There is a similar issue with inspector.OPEN
showToolbox resolves when Inspector.open() resolves over here:
  http://searchfox.org/mozilla-central/source/devtools/client/inspector/inspector.js#272

But the inspector is not done loading at that point; at the very least, we set the currently selected element after that.
It also means the rule view and its dependencies are going to be loaded afterwards, so their load time will be entangled into inspector.RELOAD.

Also note that in your current patch:
-        yield openToolboxAndLog(label + ".netmonitor", "netmonitor");
+        const toolbox = yield openToolboxAndLog(label + ".netmonitor", "netmonitor");
+        const requestsDone = this.waitForNetworkRequests(label + ".netmonitor", toolbox);

The time spent in waitForNetworkRequests is going to be completely ignored. It means that we won't track any regression nor any improvement made around any of these requests.

I'm wondering if we should just ensure that Panel.open only resolves once everything is loaded, and leave DAMP as-is, letting it track just the showToolbox timing.
Do you think we can make it so that NetMonitorPanel resolves only once everything is loaded?
http://searchfox.org/mozilla-central/source/devtools/client/netmonitor/panel.js#31
(In reply to Alexandre Poirot [:ochameau] from comment #5)
> The time spent in waitForNetworkRequests is going to be completely ignored.
> It means that we won't track any regression nor any improvement made around
> any of these requests.

This time is not ignored, it's logged as netmonitor.requestsFinished value. We'll be keeping track of both the "page load" and "requests finished" times. The first one should correlate with the overhead of the backend actor, the second one with the frontend and RDP communication.

> Do you think we can make it so that NetMonitorPanel resolves only once
> everything is loaded?

Netmonitor does not need to load anything when opening. It just watches network requests as they happen. The DAMP test opens the Netmonitor (with empty list) and then must reload the page to see any request. Inspector is different - it needs to load the DOM tree to populate its view. Similarly, console retrieves the cached messages.
The talos runs from try:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=autoland&originalRevision=bc09e56cdc52&newProject=try&newRevision=17cd11206ee6fe9453616f3ce8e0ef309c8b75cb&framework=1&showOnlyImportant=0

show interesting results. The "close" time dropped dramatically not only for Netmonitor, but also for many other panels. I have no idea why.

Also, there is the new netmonitor.requestsFinished timing stat. Looking forward to slashing it down :)
Status: NEW → ASSIGNED
Iteration: --- → 53.4 - Jan 9
Flags: qe-verify?
Priority: -- → P1
Whiteboard: [netmonitor][triage] → [netmonitor]
Comment on attachment 8823580 [details]
Bug 1328553 - Part 1: Add a method to NetmonitorController that waits for requests to finish

https://reviewboard.mozilla.org/r/102122/#review102626

Seems interesting overall!

This seems to require at least one request to be in flight when you call it, or else it may never resolve.

Can it be adapted to handle the case where requests are all already done as well?
Attachment #8823580 - Flags: review?(jryans)
Comment on attachment 8823581 [details]
Bug 1328553 - Part 2: Add a netmonitor.requestsFinished timing to DAMP tests

https://reviewboard.mozilla.org/r/102124/#review102622

The overall approach here seems reasonable to me.  You are giving a more specific label to a set of work, so that it's easier to track and less likely to bleed into other probes.

Can you please do another try run in the following way:

1. No changes applied, run ./mach try -b o -p linux64,macosx64,win32,win64 -u none -t g2,g2-e10s --rebuild-talos 5
2. Apply changes, run ./mach try -b o -p linux64,macosx64,win32,win64 -u none -t g2,g2-e10s --rebuild-talos 5

This will ensure you get multiple runs for both the base and new changeset, which Talos needs for statistical confidence.

Additionally, once you are ready to land, we need to bump the version of the DAMP add-on and upload it for signing.  I can help with this step when we're ready (the signing keys are MoCo only at the moment [this whole process is silly, see bug 1257627]).

::: testing/talos/talos/tests/devtools/addon/content/damp.js:123
(Diff revision 1)
> +    const { NetMonitorController } = toolbox.getCurrentPanel().panelWin;
> +    const start = performance.now();
> +    yield NetMonitorController.waitForAllRequestsFinished();
> +    const end = performance.now();
> +    this._results.push({
> +      name: label + ".requestsFinished.DAMP",

(Why do some labels end in .DAMP and others don't?  I am guessing there is no reason...)
Attachment #8823581 - Flags: review?(jryans) → review+
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #8)
> This seems to require at least one request to be in flight when you call it,
> or else it may never resolve.
> 
> Can it be adapted to handle the case where requests are all already done as
> well?

This is intentional. Finishing the wait after zero requests would lead to an unsolvable race condition. We trigger a page reload and at the same time start waiting for all requests to finish. If we didn't wait for at least one request, the wait would finish before the first request (for the document) was even issued.

I was solving a similar issue in bug 1285173.
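The race can be shown with a toy model (all names below are hypothetical): a wait that is satisfied by zero pending requests resolves immediately when started alongside a reload, before the document request is even issued, while a wait that insists on seeing at least one request completes only after real work has happened:

```javascript
// Toy model of the race discussed above; none of these names come
// from the real Netmonitor code.
function makeMonitor() {
  let pending = 0;
  let waiters = [];
  return {
    startRequest() { pending++; },
    finishRequest() {
      pending--;
      if (pending === 0) {
        waiters.forEach(resolve => resolve("after-requests"));
        waiters = [];
      }
    },
    // Buggy variant: zero pending requests counts as "settled", so a
    // wait started just before a reload resolves right away.
    waitZeroOk() {
      return pending === 0
        ? Promise.resolve("before-any-request")
        : new Promise(resolve => waiters.push(resolve));
    },
    // Variant matching this bug's approach: always wait for at least
    // one request to start and finish.
    waitAtLeastOne() {
      return new Promise(resolve => waiters.push(resolve));
    },
  };
}
```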
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #9)
> Can you please do another try run in the following way:

Started the two try runs, results can be watched here:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=82ed53fbab69c1e30ab20481b950e57f7974f482&newProject=try&newRevision=9c1d586e617e5f3c330132602b81130b683ce210

> Additionally, once you are ready to land, we need to bump the version of the
> DAMP add-on and upload it for signing.  I can help with this step when we're
> ready (the signing keys are MoCo only at the moment [this whole process is
> silly, see bug 1257627]).

Thanks, I didn't know that landing DAMP changes requires additional steps.

> (Why do some labels end in .DAMP and others don't?  I am guessing there is
> no reason...)

I guess it doesn't really matter - it's just a text label shown in Perfherder. I wanted to be consistent with existing labels. Some other labels don't use this convention (e.g., complicated.readHeapSnapshot should really be complicated.memory.readHeapSnapshot.DAMP) and it seems nobody is offended.
Comment on attachment 8823580 [details]
Bug 1328553 - Part 1: Add a method to NetmonitorController that waits for requests to finish

https://reviewboard.mozilla.org/r/102122/#review102684

Okay.  Please clarify in the function comment that it assumes at least one request is either in flight or will begin in the future, and so it will never resolve if all requests are already completed.

I think that makes it a bit harder to use in a generic way, but it seems okay as long the assumption is communicated clearly.
Attachment #8823580 - Flags: review+
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #13)
> Okay.  Please clarify in the function comment that it assumes at least one
> request is either in flight or will begin in the future, and so it will
> never resolve if all requests are already completed.
> 
> I think that makes it a bit harder to use in a generic way, but it seems
> okay as long the assumption is communicated clearly.

I updated the comment to document the assumptions. In the general case, it's impossible to precisely define the moment when "all page requests are finished".

The try/talos results are in:
- for some reason, the linux64 job ran only once, ignoring the --rebuild-talos param. Other platforms are ok
- the overall result shows a significant ~6% regression - that's caused by the new timing being added. Can we somehow "reset" the stats?
- for individual test results, there are no significant changes
- there is a bogus item with name {complicated,simple}.netmonitor.requestsFinished - without the "DAMP" suffix. It comes from some try runs I did with a WIP patch and was saved. Can we get rid of it?

Ryan, if everything is alright, can you help me land this?
Flags: needinfo?(jryans)
Depends on: 1329147
Depends on: 1329158
Depends on: 1329165
(In reply to Jarda Snajdr [:jsnajdr] from comment #16)
> - the overall result shows a significant ~6% regression - that's caused by
> the new timing being added. Can we somehow "reset" the stats?

I am not aware of any way to do so.  In the past, we have landed changes to probes like this and then let :jmaher and others watching the data know that the baseline has changed, so it shouldn't be considered a regression.

> - there is a bogus item with name
> {complicated,simple}.netmonitor.requestsFinished - without the "DAMP"
> suffix. It comes from some try runs I did with a WIP patch and was saved.
> Can we get rid of it?

There are various other bogus labels as well that will never be recorded by the test.  I am not aware of a way to remove them.

:jmaher, is there any way to clean these up?

> Ryan, if everything is alright, can you help me land this?

Yes, I'll bundle up the changes for signing.
Flags: needinfo?(jryans) → needinfo?(jmaher)
Okay, I posted an extra patch with the signed addon including your changes.

Please squash this into your part 2 patch before landing.
Pushed by jsnajdr@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/d68deea3f449
Part 1: Add a method to NetmonitorController that waits for requests to finish r=jryans
https://hg.mozilla.org/integration/autoland/rev/6e3472323a0c
Part 2: Add a netmonitor.requestsFinished timing to DAMP tests r=jryans
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #21)
> Okay, I posted an extra patch with the signed addon including your changes.
> Please squash this into your part 2 patch before landing.

Squashed and landed. Thanks!
No longer depends on: 1329147, 1329158, 1329165
It would be nice to clean these up - I assumed it happened automatically, but probably not.

:wlach...
> - there is a bogus item with name
> {complicated,simple}.netmonitor.requestsFinished - without the "DAMP"
> suffix. It comes from some try runs I did with a WIP patch and was saved.
> Can we get rid of it?

There are various other bogus labels as well that will never be recorded by the test.  I am not aware of a way to remove them.
Flags: needinfo?(jmaher) → needinfo?(wlachance)
https://hg.mozilla.org/mozilla-central/rev/d68deea3f449
https://hg.mozilla.org/mozilla-central/rev/6e3472323a0c
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 53
and some perf wins:
== Change summary for alert #4738 (as of January 06 2017 19:08 UTC) ==

Improvements:

  9%  damp summary windows8-64 pgo e10s     267.47 -> 242.73
  9%  damp summary windows8-64 opt e10s     310.96 -> 282.49
  9%  damp summary windows7-32 pgo e10s     271.8 -> 247.28
  8%  damp summary linux64 opt e10s         328.95 -> 301.79
  8%  damp summary linux64 pgo e10s         268.75 -> 247.84
  4%  damp summary linux64 opt              355.68 -> 340.35
  4%  damp summary osx-10-10 opt            354.84 -> 341.93
  3%  damp summary windows8-64 pgo          298.71 -> 289.06
  3%  damp summary linux64 pgo              292.39 -> 283.78

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=4738
(In reply to Joel Maher ( :jmaher) from comment #26)
> it would be nice to clean these up- I assumed it did that automatically, but
> probably not.
> 
> :wlach...
> > - there is a bogus item with name
> > {complicated,simple}.netmonitor.requestsFinished - without the "DAMP"
> > suffix. It comes from some try runs I did with a WIP patch and was saved.
> > Can we get rid of it?
> 
> There are various other bogus labels as well that will never be recorded by
> the test.  I am not aware of a way to remove them.

There is no way to remove them currently. :( I think we should at least expire dead/unused performance signatures/data on try; I filed bug 1329742 to take care of that. Not sure what to do aside from that: I guess I could manually delete this specific data, though that's not a solution that scales. I'll try to think about this a bit more.
Flags: needinfo?(wlachance)
Is this new test meant to be running on mozilla-central?

I don't see any results for the subtest there, but I do see them on my try run.

My try push is showing a DAMP regression (~6%), but I'm pretty sure it's just because we're not getting any results for requestsFinished in the base revision.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=af8a2573d0f1&newProject=try&newRevision=930cbd170feb459791ee19b691633c8dc2da0a01&framework=1&showOnlyImportant=0
(In reply to Matt Woodrow (:mattwoodrow) from comment #30)
> Is this new test meant to be running on mozilla-central?
> 
> I don't see any results for the subtest there, but I do see them on my try
> run.
> 
> My try push is showing a DAMP regression (~6%), but I'm pretty sure it's
> just because we're not getting any results for requestsFinished in the base
> revision.
> 
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> central&originalRevision=af8a2573d0f1&newProject=try&newRevision=930cbd170feb
> 459791ee19b691633c8dc2da0a01&framework=1&showOnlyImportant=0

Hmm...  The code to log requestsFinished seems to still be present in the tree...  :jmaher, any guess about this?
Flags: needinfo?(jmaher)
(In reply to Matt Woodrow (:mattwoodrow) from comment #30)
> Is this new test meant to be running on mozilla-central?

Yes, it definitely is.

> My try push is showing at DAMP regression (~6%), but I'm pretty sure it's
> just because we're not getting any results for requestsFinished in the base
> revision.

Yes, this regression is expected. The improved version of the test correctly waits until all request data are received and rendered in the UI. The old one abruptly closed the Netmonitor while it was still working.
The question is: why is requestsFinished not in the base revision? I don't see it logged in the perfherder_data section or other summarization, but it is in the mozilla-central code and it shows up as expected in the try push :(

Given that we see it on try server but not on integration branches, I am certain this is because we didn't build a damp.xpi and sign it.  In automation we are running with signed .xpi files, but on try server it is raw source.

here are the instructions we use:
https://wiki.mozilla.org/EngineeringProductivity/HowTo/SignExtensions
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) from comment #33)
> is the question, why is requestsFinished not in the base revision?  I don't
> see it logged out in the perfherder_data section or other summarization- but
> it is in the mozilla-central code and it shows up as expected in the try
> push :(
> 
> Given this data that we see it on try server and not on integration
> branches, I am certain this is because we didn't build a damp.xpi and sign
> it.  In automation we are running with signed .xpi files, but on try server
> it is raw source.
> 
> here are the instructions we use:
> https://wiki.mozilla.org/EngineeringProductivity/HowTo/SignExtensions

It's strange since we did actually sign and update the add-on in this bug, but maybe I built it from an old version...?  :S

Filed bug 1338708 to update add-on again.
Flags: qe-verify? → qe-verify-
Product: Firefox → DevTools