Closed Bug 1321878 Opened 8 years ago Closed 8 years ago

performance timing numbers exclude initial child process and IPC transit time in e10s

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox-esr45 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: bkelly, Assigned: valentin)

References

(Depends on 1 open bug, )

Details

(Whiteboard: [necko-active])

Attachments

(2 files, 3 obsolete files)

I've been working with a partner to optimize their site in Firefox. They wrote a simple benchmark here: https://github.com/samertm/firefox-sw-perf

Measuring this under different configurations resulted in two interesting findings:

1. e10s produced resource load timing values about half of the non-e10s timings.
2. Adding an empty SW (pure overhead) did not increase load timing values and actually seemed to make them go down.

I believe the reason for these results is that HttpChannelChild wipes out the original AsyncOpen() timing measurement in favor of the time when the parent AsyncOpen() is performed: https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelChild.cpp#900

This means the timing measurement excludes all of the work in the child (like the SW) and the IPC transit to the parent process. Obviously this is undesirable, since the site actually experiences delays during this excluded time period. Simply removing the overwrite of the timestamp on line 900 seems to give more reasonable numbers.
Summary: performance timing numbers exclude initial child process and IPC transit time → performance timing numbers exclude initial child process and IPC transit time in e10s
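For illustration, the change being described amounts to something like the following sketch. The struct and member names are stand-ins for the Gecko internals, not the actual patch:

#include "mozilla/TimeStamp.h"

// Illustrative stand-in for the child channel's timing members.
struct ChildTimings {
  mozilla::TimeStamp asyncOpen;   // taken when the page called AsyncOpen() in the child
  mozilla::TimeStamp requestStart, responseStart, responseEnd;
};

// Illustrative stand-in for the parent-side timings arriving over IPC.
struct ParentTimings {
  mozilla::TimeStamp asyncOpen, requestStart, responseStart, responseEnd;
};

void CopyTimingsFromParent(ChildTimings& child, const ParentTimings& parent)
{
  // child.asyncOpen = parent.asyncOpen;  // <-- the overwrite being removed:
  // it hid service worker time and IPC transit from the reported numbers.
  child.requestStart  = parent.requestStart;
  child.responseStart = parent.responseStart;
  child.responseEnd   = parent.responseEnd;
}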
I'm planning to write a test that verifies the service worker FetchEvent processing time is included in the timing measurement.
Comment on attachment 8816561 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=valentin

Review of attachment 8816561 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Ben, it's great that you're working on this. We were actually looking at the telemetry data today and wondering whether e10s being better is a fluke. I generally agree with the approach; my only concern is that we will be diffing mAsyncOpen from the child channel against other timestamps from the parent channel. Before landing this, can we check that it makes sense system-wide, and on all platforms? Otherwise, we'll have to figure out how to sync or skew it to get the correct values.
Attachment #8816561 - Flags: review?(valentin.gosu) → review+
Nathan, do you know if TimeStamp values are comparable across processes? Or if not, who might be a good person to ask? I believe OS-level things like clock_gettime() are comparable, but I'm not sure about our TimeStamp thing.

Valentin, if this requires more work to de-skew values across the processes, I will probably have to punt it back to you to sort out. At least on Windows so far, though, the numbers I am getting look reasonable and make sense.
Flags: needinfo?(nfroyd)
Work-in-progress patch to try to verify that service worker time is included in the timing measurement. I'm having difficulty getting my fetch() calls to show up in Performance entries, though.
Whiteboard: [necko-active]
So, I think I have a straightforward solution even if TimeStamp is not comparable across processes. I think the easiest thing to do would be:

1. Take a TimeStamp::Now() in HttpChannelChild::OnStopRequest() before we copy the timings over.
2. Adjust the parent's ResponseEnd timing to match the current time from step (1).
3. Use this adjustment to shift all the other timing values from the parent.
4. The original child mAsyncOpenTime is unchanged.

This ensures that we include all time spent in the measurements. All time outside of the parent process is effectively treated as occurring *before* the parent timings. So if there is delay in firing the child OnStopRequest() IPC runnable from the parent, that time will be accounted for between mAsyncOpenTime and the first network operation time. I think this is the best we can do if TimeStamp values are not directly comparable across processes.
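Roughly, that plan works out to something like this. It's a sketch under the assumption that timings live in TimeStamp members; the struct and function names are illustrative, not the actual patch:

#include "mozilla/TimeStamp.h"

using mozilla::TimeStamp;
using mozilla::TimeDuration;

// Illustrative stand-in for the parent-side timings copied in OnStopRequest().
struct ParentTimings {
  TimeStamp requestStart, responseStart, responseEnd;
};

// Step 1: take "now" in the child before copying the parent timings.
// Steps 2-3: shift every parent timestamp so the parent's ResponseEnd lands
// at "now" on the child's clock; child-side work and IPC transit then show
// up before the first network phase instead of being dropped.
// Step 4: the child's original mAsyncOpenTime is left untouched elsewhere.
void RebaseParentTimings(ParentTimings& t)
{
  TimeStamp now = TimeStamp::Now();
  TimeDuration shift = now - t.responseEnd;
  t.requestStart  += shift;
  t.responseStart += shift;
  t.responseEnd   += shift;  // now equals the child-side "now"
}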
(In reply to Ben Kelly [:bkelly] from comment #4)
> Nathan, do you know if TimeStamp values are comparable across processes? Or
> if not, who might be a good person to ask? I believe OS level things like
> clock_gettime() are comparable, but not sure about our TimeStamp thing.

bz and I talked about this very thing: http://logs.glob.uno/?c=mozilla%23content&s=11+Nov+2016&e=11+Nov+2016&h=TimeStamp#c408106

I think the decision from that discussion was that on Linux/Android, it works fine. On Mac (where we don't use clock_gettime, IIRC), I *think* it works OK? (We're using mach_absolute_time, which is consistent between processes.) On Windows, it wasn't clear to us whether or not all the values that we use to derive TimeStamps are going to be identical between different processes. I think that means the answer to your question is "no". We could check in with somebody who understands the Windows TimeStamp implementation better than I do...
Flags: needinfo?(nfroyd)
This implements my plan from comment 6. I found that depending on what the actual parent channel did, the "last timing value" could be different. So I wrote this code to inspect each timing value and determine the last one to base the adjustment on.

I also found that adjusting to the end of the response captured a *significant* amount of delay in the IPC messaging from parent to child. We have about 40+ parallel style sheet network requests here. The work triggered by the early responses janks up the main thread, delaying the IPC OnStopRequest for the remaining channels. The first channels have adjustments of ~1ms, but the last channels to complete are adjusted by 100+ms. The delay ramps up linearly, which suggests this is almost certainly delay caused by the child process main thread event queue.

Overall results for the original benchmark on my Mac laptop with these patches are:

Key: /
Loaded 1056 resources
Average duration: 111.17769098674245
jQuery duration: 116.33847854545452

Key: /with_sw
Loaded 1056 resources
Average duration: 113.73764291761361
jQuery duration: 146.80827636363637
Attachment #8816561 - Attachment is obsolete: true
Attachment #8817432 - Flags: review?(valentin.gosu)
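The "determine the last timing value" part mentioned above amounts to something like the helper below. This is illustrative only; the parameter names are assumptions, not the patch:

#include <initializer_list>
#include "mozilla/TimeStamp.h"

using mozilla::TimeStamp;

// Pick the latest non-null parent-side timestamp to base the adjustment on,
// since the last phase that actually ran varies with the channel
// (cache hits, redirects, etc.).
TimeStamp LatestTiming(TimeStamp domainLookupEnd, TimeStamp connectEnd,
                       TimeStamp requestStart, TimeStamp responseStart,
                       TimeStamp responseEnd)
{
  TimeStamp latest;
  for (const TimeStamp& t :
       {domainLookupEnd, connectEnd, requestStart, responseStart, responseEnd}) {
    if (!t.IsNull() && (latest.IsNull() || t > latest)) {
      latest = t;
    }
  }
  return latest;
}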
To compare to non-e10s with the same build on the same machine:

non-e10s:

Key: /
Loaded 1056 resources
Average duration: 63.616684736742435
jQuery duration: 76.60912122727272

Key: /with_sw
Loaded 1056 resources
Average duration: 349.4717153958333
jQuery duration: 641.0320482272728

This shows our e10s channels are worse than non-e10s in the default case. In the case of a no-op service worker, e10s is much better.
(In reply to Nathan Froyd [:froydnj] from comment #7)
> I think that means the answer to your question is "no". We could check in
> with somebody who understands the Windows TimeStamp implementation better
> than I do...

Which is me... If you are on modern HW, it will be identical between processes, always (checked!). If there is jitter detected in values from the high performance counter we use (on way older hardware), it might be different among processes naturally, or TimeStamp may fall back to a lower-res software-based timer (more stable), which "should" be consistent among processes, but that is something I never checked. Generally, you have no knowledge of whether TimeStamp::Now() returns the hi-res or the fallback lo-res thing. Though, we do have the info about "being on modern HW". There is no public API for that, but it can be easily implemented.
Comment on attachment 8817432 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=valentin Let me think a bit more about this given comment 10.
Attachment #8817432 - Flags: review?(valentin.gosu)
So, my interpretation of comment 10 is that depending on TimeStamp equivalence across processes should be mostly OK. We may have some variance on old Windows hardware, but it doesn't seem too critical if these numbers are a bit off. And that problem should decrease in the future as this old hardware is retired.

So I'd like to go back to the simple case of keeping the child mAsyncOpenTime and copying the rest of the parent-side values. This does ignore the time between the end of the nsIChannel processing on the network thread in the parent and actually delivering OnStopRequest to the child main thread. As I discussed in comment 8, that time can be non-trivial. However, we don't actually track that in non-e10s either. Also, it's arguable whether that should be considered "network time" or part of post-network processing internal to the browser. For now I think it makes the most sense to just conform to our non-e10s semantics and exclude this "dispatch back to main thread" delay from these measurements. A separate bug could discuss whether to track this or not.

Honza, what do you think?
Attachment #8817432 - Attachment is obsolete: true
Attachment #8817489 - Flags: review?(honzab.moz)
This test fails with the current mozilla-central and passes with the P1 patch. It works by spinning the event loop for a couple seconds in a particular FetchEvent handler. It then ensures that the measurement of that request shows increased delay.
Attachment #8816612 - Attachment is obsolete: true
Attachment #8818086 - Flags: review?(bugmail)
Comment on attachment 8818086 [details] [diff] [review] P2 Add a wpt test to verify Performance timing API accounts for service worker time. r=asuth

Review of attachment 8818086 [details] [diff] [review]:
-----------------------------------------------------------------

My heart breaks that we need a busy-loop with such a high constant, but I can of course think of nothing better, and the benefit of having this test ensure things are working is far more important to the web than these tests running at light speed.

::: testing/web-platform/tests/service-workers/service-worker/resources/empty-but-slow-worker.js
@@ +1,4 @@
> +addEventListener('fetch', evt => {
> +  if (evt.request.url.endsWith('slow')) {
> +    let start = Date.now();
> +    while(Date.now() - start < 2000);

Technically speaking, I think performance.now() is safer since it's strictly increasing and not subject to ntpdate or other system clock time-warps. I understand that spec- and implementation-wise performance.now() in this context is probably a non-starter, but I think it's still probably appropriate to add a comment here indicating performance.now() is preferred but not available. (And if it is okay, maybe https://developer.mozilla.org/en-US/docs/Web/API/Performance/now needs an update in its worker compatibility matrix.)
Attachment #8818086 - Flags: review?(bugmail) → review+
Comment on attachment 8817489 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer Review of attachment 8817489 [details] [diff] [review]: ----------------------------------------------------------------- Let's try it!
Attachment #8817489 - Flags: review?(honzab.moz) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/efe8228d192b
P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1b7a57826e0
P2 Add a wpt test to verify Performance timing API accounts for service worker time. r=asuth
Let's let this bake on nightly for a few days and then try to uplift. I'd like to get it at least into FF52 since that's the next ESR.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8817489 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer

Approval Request Comment
[Feature/Bug causing the regression]: e10s
[User impact if declined]: Web developers will see incorrect timing measurements.
[Is this code covered by automated tests?]: The P2 patch includes a WPT test.
[Has the fix been verified in Nightly?]: It's been on Nightly for a couple of weeks.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Some risk
[Why is the change risky/not risky?]: Some risk that measurements will still be wrong on older hardware if our timestamps are not equivalent across processes. We have assertions to catch that condition.
[String changes made/needed]: None
Attachment #8817489 - Flags: approval-mozilla-aurora?
Comment on attachment 8818086 [details] [diff] [review] P2 Add a wpt test to verify Performance timing API accounts for service worker time. r=asuth See comment 20.
Attachment #8818086 - Flags: approval-mozilla-aurora?
Comment on attachment 8817489 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer improve network timings for e10s, aurora52+
Attachment #8817489 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8818086 [details] [diff] [review] P2 Add a wpt test to verify Performance timing API accounts for service worker time. r=asuth test-only change for aurora52
Attachment #8818086 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/b67e9aafeaa5e04ec730c0b5551aba36bd76f939
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/0c3414e6f0b2535f7b7668e835fbfe2401c43bb0

I don't want to uplift to beta 51 because we need time to see if the diagnostic triggers on the beta population before merging to release.
Depends on: 1329410
Depends on: 1329913
Also pushed a backout to mozilla-inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/c41f6dabe834dba31143a83a1c1405ae50e33584 I guess we wait to reopen this when that merges to central?
Nah, we can reopen now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla53 → ---
Valentin, since the quick fix did not work, I'd like to pass this back to the necko team to design a better long-term solution. Unfortunately I don't have time to work on anything more involved at the moment. Can you guys pick this up?
Assignee: bkelly → nobody
Flags: needinfo?(valentin.gosu)
So, I tried to see if a simple NTP-like time synchronization between processes would work, based on the algorithm at:
https://en.wikipedia.org/wiki/Network_Time_Protocol#Clock_synchronization_algorithm
The patch is here: https://gist.github.com/valenting/334d399ea6a72e7c6b49e46aafb163d1

The results aren't great. Even though the timestamps on my machine should be the same in both processes, the computed time difference varied from 70 microseconds up to a second and a half. The large delta is caused by the child->parent and parent->child IPC times being different.

Now, I'm not sure about the approach we should take from this point forward. We could either implement a more complicated but reliable way to sync time, or we could implement the time syncing as a fallback, when we detect that timestamps aren't consistent across the IPC boundary. I don't have the cycles to spend on this right now, but we should definitely look into it.

Also, were the failures in bug 1329410 caused by a faulty test, a platform that triggered the timestamp issue, or a bug in the implementation?
Flags: needinfo?(valentin.gosu) → needinfo?(bkelly)
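For reference, the core of that NTP-style estimate looks like the self-contained sketch below. This is not the gist's actual code, and the timestamp values are made up for illustration:

#include <cstdio>

int main() {
  // Four timestamps from one sync round-trip, in seconds:
  //   t0: child sends probe        t1: parent receives probe
  //   t2: parent sends reply       t3: child receives reply
  double t0 = 10.000, t1 = 10.705, t2 = 10.706, t3 = 10.010;

  // Estimated parent-minus-child clock offset, assuming symmetric transit.
  double offset = ((t1 - t0) + (t2 - t3)) / 2.0;
  // Round-trip delay, excluding the parent's processing time.
  double delay = (t3 - t0) - (t2 - t1);

  // If child->parent and parent->child IPC take different amounts of time
  // (as observed in the comment above), that asymmetry becomes error in offset.
  std::printf("offset = %.4f s, delay = %.4f s\n", offset, delay);
  return 0;
}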
(In reply to Valentin Gosu [:valentin] from comment #29)
> So, I tried to see if a simple NTP-like time synchronization between
> processes would work, based on the algorithm at:
> https://en.wikipedia.org/wiki/
> Network_Time_Protocol#Clock_synchronization_algorithm
> The patch is here:
> https://gist.github.com/valenting/334d399ea6a72e7c6b49e46aafb163d1
>
> The results aren't great. Even though the timestamps on my machine should be
> the same in both processes, the computed time difference varied from 70
> microseconds up to a second and a half. The large delta is caused by the
> child->parent and parent->child IPC time being different.

It's very hard with a system like this to know whether the time difference is due to mis-aligned timers or to the IPC runnables being delayed by an event queue backlog.

> Now, I'm not sure about the approach we should take from this point forward.
> We could either implement a more complicated but reliable way to sync time.
> Or we could implement the time syncing as a fallback, when we detect that
> timestamps aren't consistent across the IPC boundary.
> I don't have the cycles to spend on this right now, but we should definitely
> look into it.

OK, but it feels like this should factor into our e10s planning somehow. We're regressing a web-exposed API by enabling e10s right now.

> Also, were the failures in bug 1329410 caused by a faulty test, a platform
> that triggered the timestamp issue, or was there a bug in the implementation?

I'm not sure. Given the assertions, I felt the solution was too simplistic and just backed it all out. We could re-land the test and mark it expected-fail for now.

NI to Brad just so the e10s team is aware that the Performance timeline API is somewhat broken in e10s mode at the moment. A large production site pointed out this strangeness to me, so it's being noticed.
Flags: needinfo?(bkelly) → needinfo?(blassey.bugs)
We could also re-land the fix without the assertion. This would fix the values for Linux, Mac, and Windows with modern hardware. The numbers would be wrong on old hardware, but they are also currently wrong, so maybe this would be an improvement.
Flags: needinfo?(blassey.bugs)
I think I'm OK with landing the fix without the assertion. I'm unsure about what to do for the old platforms. Instead of asserting, we could set mTimingEnabled to false, causing the resource not to be reported in resource timing. Or, when detected, we could flip the pref in the parent channel and disable resource timing completely. But I guess this is a decision better left for a follow-up.
Flags: needinfo?(bkelly)
Do we have data on what hardware/OS the assertion was failing?
Based on the crash reports in bug 1329913, the most common CPUs and OSes are:

CPU:
AuthenticAMD family 15 model 104 stepping 1 | 2
AuthenticAMD family 15 model 107 stepping 2 | 2
GenuineIntel family 6 model 42 stepping 7   | 4
AuthenticAMD family 15 model 72 stepping 2  | 2
AuthenticAMD family 15 model 67 stepping 2  | 2
GenuineIntel family 6 model 60 stepping 3   | 2

OS:
5.1.2600 Service Pack 3 Windows XP
5.2.3790 Service Pack 2 Windows XP
6.3.9600 Windows 8.1
Another alternative is to take a TimeStamp::Now() in the child when we actually call SendPHttpChannelConstructor(). We could then do:

(child send time) - (child async open) + (parent timestamp) - (parent recv time)

This would miss the IPC transit time, but would at least account for child-side service worker time.
Flags: needinfo?(bkelly)
Assignee: nobody → bkelly
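One way to read that formula is the sketch below: only same-process timestamps are ever subtracted, and the result is re-expressed on the child's clock. The helper name and signature are made up for illustration:

#include "mozilla/TimeStamp.h"

using mozilla::TimeStamp;
using mozilla::TimeDuration;

// Re-express a parent-clock timestamp on the child's clock using the
// "constructor sent" (child) / "constructor received" (parent) pair.
// IPC transit time is dropped, but child-side (e.g. service worker) time
// between mAsyncOpenTime and childSendTime is kept in the measurement.
TimeStamp MapParentToChildClock(TimeStamp parentTimestamp,
                                TimeStamp parentRecvTime,
                                TimeStamp childSendTime)
{
  TimeDuration sinceRecv = parentTimestamp - parentRecvTime;  // parent clock only
  return childSendTime + sinceRecv;                           // child clock result
}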
Comment on attachment 8817489 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer Clearing aurora approval flag since this had to be backed out. We can still fix this in 52 during early beta if an updated patch sticks, feel free to nominate for beta then.
Attachment #8817489 - Flags: approval-mozilla-aurora+
Attachment #8818086 - Flags: approval-mozilla-aurora+
I'm sorry, but I'm on the hook for a large amount of work blocking multi-e10s. I won't have time here. It would be really great if someone from the necko team could move this ahead using either comment 31, comment 35, or another solution.
Assignee: bkelly → nobody
Assignee: nobody → valentin.gosu
So, I looked into it, and I believe the initial test failure was because the test was unstable. If I read it correctly, it races two requests, one of them a regular one and the other delayed for 2 seconds, then checks that their times to completion differ by more than 1.5 seconds. However, we have no guarantee that the regular request will be completed immediately.

I am confident that we can re-land the fix without the assertion, and I will file another bug to address the issue for older systems. We also need to write a better, less racy test for this use case.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ea9c1598f049c022b7ba8bb5c0055d8ec6430773
Bug 1321878 P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer
https://hg.mozilla.org/integration/mozilla-inbound/rev/745f2862bc05a9f7fb8355e9ad04a31ffa394ad5
Bug 1321878 P2 Add a wpt test to verify Performance timing API accounts for service worker time. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/7259adb4d564cecb66b418c8937d917ba530b4ed
Bug 1321878 - Disable diagnostic assert and make test less racy r=me
https://hg.mozilla.org/integration/mozilla-inbound/rev/b3f45eae647ff9caf40937a8d6dbe0f465e14c42
Bug 1321878 P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer
https://hg.mozilla.org/integration/mozilla-inbound/rev/a3da42e997403cfd7dc34577b4adccbdf07b7530
Bug 1321878 P2 Add a wpt test to verify Performance timing API accounts for service worker time. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4d61963ad98173d9f4c95e6dab1f0e316e8c0bc
Bug 1321878 - Disable diagnostic assert and make test less racy r=me
James, we talked about keeping the test upstream the other day. Please note that we also re-landed it here.
Flags: needinfo?(james)
(In reply to Ben Kelly [:bkelly] from comment #42)
> James, we talked about keeping the test upstream the other day. Please note
> that we also re-landed it here.

With a few minor changes.
Flags: needinfo?(valentin.gosu)
OK, well the changes will get upstreamed with the next upstream push.
Flags: needinfo?(james)
Depends on: 1336238
This is wontfix for 52 at this stage.
Can we uplift this to 53 at least? Having this fixed further along the train would help us investigate SW performance issues on Dropbox.
Comment on attachment 8817489 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer

Approval Request Comment
[Feature/Bug causing the regression]: Resource timing reporting issue.
[User impact if declined]: Invalid resource timing values when dealing with service workers.
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]:
https://hg.mozilla.org/mozilla-central/rev/b3f45eae647f
https://hg.mozilla.org/mozilla-central/rev/a3da42e99740
https://hg.mozilla.org/mozilla-central/rev/c4d61963ad98
[Is the change risky?]: No
[Why is the change risky/not risky?]: Very simple fix. The only downside is on the old platforms listed in comment 34; they would still have invalid timing values.
[String changes made/needed]: None.
Attachment #8817489 - Flags: approval-mozilla-beta?
Comment on attachment 8817489 [details] [diff] [review] P1 Don't exclude child process and IPC transit time from Performance API network timings. r=mayhemer Fix an invalid resource timing value. Beta53+.
Attachment #8817489 - Flags: approval-mozilla-beta? → approval-mozilla-beta+