1342714 - Eliminate the cost of performance monitoring (js::AutoStopwatch) from scrolling google spreadsheets

Reporter

Description

•

8 years ago

At least on OSX, the thread_info() syscall from http://searchfox.org/mozilla-central/rev/93f1641e394cfbdfbeed44e81f7dab0f2aff7b6f/toolkit/components/perfmonitoring/nsPerformanceStats.cpp#1296 shows up in profiles (so does the mach_thread_self() call.) Also js::RunScript is pretty hot here, and I suspect it's hot on other GSuites pages as well. There's no reason to pay this cost all the time.

(no longer active)

Reporter

Comment 1

•

8 years ago

Attached patch Disable performance monitoring by default — Details — Splinter Review

Attachment #8841284 - Flags: review?(dteller)

(no longer active)

Reporter

Updated

•

8 years ago

Assignee: nobody → ehsan

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 2

•

8 years ago

Comment on attachment 8841284 [details] [diff] [review] Disable performance monitoring by default Review of attachment 8841284 [details] [diff] [review]: ----------------------------------------------------------------- If we disable performance monitoring, this will break the add-on watcher (including add-on performance telemetry), as well as about:performance, so there is a need to maintain monitoring. Are you sure that this actually shows up in GSuite? Normally, this code is called at most once per tick-involving-js. If this code is a problem, I can think of the following avenues: - somehow reduce the number of times we call performance monitoring (e.g. introduce sampling); - somehow get rid of the thread_info call (e.g. use per-process info instead of per-thread info, or only use per-thread info once in a while to correct per-process info).

Attachment #8841284 - Flags: review?(dteller)

(no longer active)

Reporter

Comment 3

•

8 years ago

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #2) > If we disable performance monitoring, this will break the add-on watcher > (including add-on performance telemetry), as well as about:performance, so > there is a need to maintain monitoring. > > Are you sure that this actually shows up in GSuite? Yes, 100%. > Normally, this code is > called at most once per tick-involving-js. If this code is a problem, I can > think of the following avenues: What ensures this? AFAICT this is called in js::RunScript() every time, which, as I said, is hot code: <http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/js/src/vm/Interpreter.cpp#361>. > - somehow reduce the number of times we call performance monitoring (e.g. > introduce sampling); > - somehow get rid of the thread_info call (e.g. use per-process info instead > of per-thread info, or only use per-thread info once in a while to correct > per-process info). No, what we need to ensure is to make sure js::AutoStopwatch doesn't show up *at all* when scrolling a google spreadsheet (bug 1342713). (There's other stuff besides thread_info there that shows up too, although thread_info seems to be the most expensive part). My patch achieves that goal. Can you please provide a different patch that achieves the same result without the downsides you mentioned? Thanks!

Assignee: ehsan → dteller

(no longer active)

Reporter

Updated

•

8 years ago

Summary: Disable performance monitoring by default → Eliminate the cost of performance monitoring (js::AutoStopwatch) from scrolling google spreadsheets

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 4

•

8 years ago

> What ensures this? AFAICT this is called in js::RunScript() every time, which, as I said, is hot code: <http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/js/src/vm/Interpreter.cpp#361>. We are still talking about `thread_info`, right? It's called by `nsPerformanceStatsService::StopwatchStartCallback` and `nsPerformanceStatsService::StopwatchCommitCallback` (see nsPerformanceStats.cpp), which are called respectively here https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.cpp#54-67 and here https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.cpp#137-167. There is a guard to ensure that we're not calling each more than once per tick. At this stage, your profile is unclear to me. Do you have reasons to believe that it is called more than once per tick? If so, my best hypothesis so far is that the iteration number (setup through https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSContext.cpp#3660 ) is somehow wrong. > No, what we need to ensure is to make sure js::AutoStopwatch doesn't show up *at all* when scrolling a google spreadsheet (bug 1342713). (There's other stuff besides thread_info there that shows up too, although thread_info seems to be the most expensive part). My patch achieves that goal. Do you have a profile at hand? Discussing in the abstract is a bit complicated.

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 5

•

8 years ago

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #4) > > What ensures this? AFAICT this is called in js::RunScript() every time, which, as I said, is hot code: <http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/js/src/vm/Interpreter.cpp#361>. > > We are still talking about `thread_info`, right? No, I'm talking about the overhead of performance monitoring (the code I linked to.) > It's called by `nsPerformanceStatsService::StopwatchStartCallback` and > `nsPerformanceStatsService::StopwatchCommitCallback` (see > nsPerformanceStats.cpp), which are called respectively here > https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.cpp#54-67 > and here > https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stopwatch.cpp#137- > 167. There is a guard to ensure that we're not calling each more than once > per tick. > > At this stage, your profile is unclear to me. Do you have reasons to believe > that it is called more than once per tick? If so, my best hypothesis so far > is that the iteration number (setup through > https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSContext. > cpp#3660 ) is somehow wrong. I'm not sure how many times it's called unfortunately. All I know at this point is that it shows up in the profile when scrolling a google spreadsheet. It may be because we're running it too many times, or the expected number of times but it's just too slow. Have you tried to profile this? > > No, what we need to ensure is to make sure js::AutoStopwatch doesn't show up *at all* when scrolling a google spreadsheet (bug 1342713). (There's other stuff besides thread_info there that shows up too, although thread_info seems to be the most expensive part). My patch achieves that goal. > > Do you have a profile at hand? Discussing in the abstract is a bit > complicated. Sure, but it's an Instruments profile so there is no great way to share it. I can upload a screenshot...

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 6

•

8 years ago

Attached image Profile screenshot — Details

(no longer active)

Reporter

Comment 7

•

8 years ago

Attached image Another profile — Details

This one shows the relative cost of this with one of the culprits behind bug 1338802 which is the main reason why we're slow on this test case. That is the js::detail::HashTable... frame (the first one) with a self weight of 38ms.

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 8

•

8 years ago

> Have you tried to profile this? Before I landed this code, ~18 months ago, I spent several days profiling it on every site that I could think of (I'm almost sure that GSuite was included) and AutoStopwatch didn't show up. Nor did it show up on Talos, Arewefastyet, etc. I haven't re-profiled since then, though. > Sure, but it's an Instruments profile so there is no great way to share it. I can upload a screenshot... Ah, and Instruments doesn't show usage counters. That would have been helpful here to determine whether thread_info is somehow called too often. Is there any way you could delve one or two depths further in `AutoStopwatch::AutoStopwatch` in your profiles, to find out exactly what's wrong with this constructor? > This one shows the relative cost of this with one of the culprits behind bug 1338802 which is the main reason why we're slow on this test case. That is the js::detail::HashTable... frame (the first one) with a self weight of 38ms. If I read this correctly, in that case, it's about ~20% wrt to the biggest culprit, right? If that's meaningful, for the moment, the best strategy I can think of atm is to somehow not monitor every tick.

David Teller [:Yoric] - still alive but not very active

Assignee

Updated

•

8 years ago

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 9

•

8 years ago

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #8) > > Have you tried to profile this? > > Before I landed this code, ~18 months ago, I spent several days profiling it > on every site that I could think of (I'm almost sure that GSuite was > included) and AutoStopwatch didn't show up. Nor did it show up on Talos, > Arewefastyet, etc. > > I haven't re-profiled since then, though. Oh no sorry I meant with the Google spreadsheet scrolling test case. I was asking since Instruments makes it next to impossible to share the profile in any decent way. I'm pretty sure this is easy to repro, since I see it every time. > > Sure, but it's an Instruments profile so there is no great way to share it. I can upload a screenshot... > > Ah, and Instruments doesn't show usage counters. That would have been > helpful here to determine whether thread_info is somehow called too often. > Is there any way you could delve one or two depths further in > `AutoStopwatch::AutoStopwatch` in your profiles, to find out exactly what's > wrong with this constructor? The screenshots were sorted bottom up, those are the leaf frames. Usually Instruments can show which lines are hot, although sometimes the feature doesn't work that well. In these profiles nothing shows as particularly hot either in the ctor or in the dtor, it just seems that they're being called... (But I don't trust Instruments here 100% so take this with a grain of salt.) > > This one shows the relative cost of this with one of the culprits behind bug 1338802 which is the main reason why we're slow on this test case. That is the js::detail::HashTable... frame (the first one) with a self weight of 38ms. > > If I read this correctly, in that case, it's about ~20% wrt to the biggest > culprit, right? If that's meaningful, for the moment, the best strategy I > can think of atm is to somehow not monitor every tick. You understand this code a lot better than I do, so I'd be more than happy to trust your judgement. :-)

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 10

•

8 years ago

Another profile: https://perfht.ml/2nATQGR From running this test case: https://jakearchibald.github.io/service-worker-benchmark/ David, can you please propose a plan on how you would like you deal with this? I keep seeing this in various profiles, and I'd like to ensure that we are going to have an actively worked on plan here. Thank you!

Flags: needinfo?(dteller)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 11

•

8 years ago

Unfortunately, for the moment, I'm actively working on something that also has high priority and is entirely unrelated, so I can make no promise. Ehsan, there is one thing that would provide a simple experience, if you have time. In https://dxr.mozilla.org/mozilla-central/source/toolkit/components/perfmonitoring/nsPerformanceStats.cpp?q=nsPerformanceStats.cpp&redirect_type=direct#1288, could you simply replace the `#if defined(XP_MACOSX)` with a `#if 0` and see if there are performance improvements? This would considerably degrade the precision of performance stats on MacOS, but I realize that your work has higher impact than precise performance stats.

Flags: needinfo?(dteller) → needinfo?(ehsan)

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [qf:p1]

Jean Gong :jgong

Comment 12

•

8 years ago

Hello David, we have raised the priority on this bug and it needs get fixed based on the priority of the Quantum Flow. Can you give let me know if you can start on this ASAP? Thanks!

Whiteboard: [qf:p1]

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 13

•

8 years ago

I believe that I can start running experiments next week. No ETA yet, that will depend on the result of experiments.

Jean Gong :jgong

Updated

•

8 years ago

Whiteboard: [qf:p1]

Jean Gong :jgong

Comment 14

•

8 years ago

Apologies. I cleared the whiteboard by accident.

Desigan Chinniah [:cyberdees] [:dees] [London - GMT]

Updated

•

8 years ago

Whiteboard: [qf:p1] → [qf:p1][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleSheets]

(no longer active)

Reporter

Comment 15

•

8 years ago

Yoric and I discussed this today on IRC. Clearing the needinfo.

Flags: needinfo?(ehsan)

Comment hidden (mozreview-request)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 17

•

8 years ago

I did a quick profiling with the attached patch, scrolling continuously on a Google Spreadsheet for a few seconds. AutoStopwatch was profiled as contributing ~300µs of the total, so I'm reasonably optimistic about this patch.

Jan de Mooij [:jandem]

Comment 18

•

8 years ago

Can we just always use the Vector that can store 8 elements inline? Even if we expect many elements, that's still an improvement because we avoid some realloc calls. Having separate small and large cases just makes things more confusing IMO.

Flags: needinfo?(dteller)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 19

•

8 years ago

(In reply to Jan de Mooij [:jandem] from comment #18) > Can we just always use the Vector that can store 8 elements inline? Even if > we expect many elements, that's still an improvement because we avoid some > realloc calls. Having separate small and large cases just makes things more > confusing IMO. Well, for the large vector, we rely upon `swap` at the end of the tick, and this operation is not implemented if there is inline storage. We could replace it with a manual loop + move, but that's slower and more bug-prone. So, unless you insist, I'd rather keep the separation between small and large cases.

Flags: needinfo?(dteller) → needinfo?(jdemooij)

Nathan Froyd [:froydnj]

Comment 20

•

8 years ago

mozreview-review

Comment on attachment 8850459 [details] Bug 1342714 - Reducing allocations in AutoStopwatch; https://reviewboard.mozilla.org/r/123058/#review125960 I'm kind of inclined to agree with Jan here. Implementing the necessary code around the bit in `PerformanceMonitoring::commit` would be a single line: ``` PerformanceGroupVector recentGroups(Move(recentGroups_)); ``` and the only other use of `swap` below could do the same thing, or convert to using `nsTArray`/`nsAutoTArray` or whatever. ::: toolkit/components/perfmonitoring/nsPerformanceStats.cpp:782 (Diff revision 1) > mTopGroup->Dispose(); > mTopGroup = nullptr; > > // Copy references to the groups to a vector to ensure that we do > // not modify the hashtable while iterating it. > - GroupVector groups; > + LargeGroupVector groups; Seems like you would want to use something with stack-allocated storage here, though I'm not sure that using the typedefs really makes things clearer, since the important thing is the stable storage of the pointers, not the smallness/largeness/groupness of the vector. ::: toolkit/components/perfmonitoring/nsPerformanceStats.cpp:1137 (Diff revision 1) > + // Sanity check: make sure that we're not causing heap allocations here. > + MOZ_ASSERT(out.length() <= out.sMaxInlineStorage); This seems like a really weird check. What's the point of doing this? We don't actually care whether we're overflowing the buffer here... ::: toolkit/components/perfmonitoring/nsPerformanceStats.cpp:1367 (Diff revision 1) > - GroupVector alerts; > + LargeGroupVector alerts; > mPendingAlerts.swap(alerts); If `mPendingAlerts` were an `nsAutoTArray`, calling swap here would be perfectly acceptable.

Attachment #8850459 - Flags: review?(nfroyd) → review-

Comment hidden (mozreview-request)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 23

•

8 years ago

mozreview-review-reply

Comment on attachment 8850459 [details] Bug 1342714 - Reducing allocations in AutoStopwatch; https://reviewboard.mozilla.org/r/123058/#review125960 Good idea, I hadn't noticed the `mozilla::Vector`'s move constructor. > Seems like you would want to use something with stack-allocated storage here, though I'm not sure that using the typedefs really makes things clearer, since the important thing is the stable storage of the pointers, not the smallness/largeness/groupness of the vector. Fair enough. > This seems like a really weird check. What's the point of doing this? We don't actually care whether we're overflowing the buffer here... Well, if we exceed the inline buffer, pretty much every single time we enter this compartment, we'll end up heap-allocating. That's the most likely culprit for the performance issue encountered here, so we want to make sure that we don't do this accidentally. > If `mPendingAlerts` were an `nsAutoTArray`, calling swap here would be perfectly acceptable. Should I file a bug to suggest that someone should finish implementing `Vector::swap`?

Nathan Froyd [:froydnj]

Comment 24

•

8 years ago

mozreview-review

Comment on attachment 8850459 [details] Bug 1342714 - Reducing allocations in AutoStopwatch; https://reviewboard.mozilla.org/r/123058/#review126326 Thanks for making the suggested changes. A few comments about the comments below. ::: js/src/vm/Stopwatch.cpp:158 (Diff revision 3) > - PerformanceGroupVector recentGroups; > - recentGroups_.swap(recentGroups); > + // Unless `recentGroups_.length() <= 8`, the `Move` operation below > + // is constant-time. I'm unsure that this comment is actually helpful: why is this particular operation worth calling out vs. all the other things we're doing? The expected, common case is that `recentGroups_.length() <= 8`, too, so the comment is saying that the exceptional situation is constant-time...what about the common case? I'm also not wild about the comment embedding a magic constant from the `PerformanceGroupVector` typedef that might get out of sync with the typedef. ::: toolkit/components/perfmonitoring/nsPerformanceStats.cpp:1366 (Diff revision 3) > - mPendingAlerts.swap(alerts); > + // Unless `mPendingAlerts.length() <= 8`, the `Move` operation below > + // is constant-time. Same comment here as in the JS code.

Attachment #8850459 - Flags: review?(nfroyd) → review+

Jan de Mooij [:jandem]

Comment 25

•

8 years ago

mozreview-review

Comment on attachment 8850459 [details] Bug 1342714 - Reducing allocations in AutoStopwatch; https://reviewboard.mozilla.org/r/123058/#review126354 ::: js/src/vm/Stopwatch.cpp:24 (Diff revision 3) > > #include "gc/Zone.h" > #include "vm/Runtime.h" > > +// Maximal initialization size, in elements for the vector of groups. > +#define MAX_GROUPS_INIT_CAPACITY 1024 Nit: I'd prefer the more modern static const size_t MAX_GROUPS_INIT_CAPACITY = 1024; Then you also don't need the size_t-cast below. (Personally I'd also move it into the function that uses it: having the constant far away makes it easy to forget removing it if we ever remove the consumer for some reason.) ::: js/src/vm/Stopwatch.cpp:158 (Diff revision 3) > - PerformanceGroupVector recentGroups; > - recentGroups_.swap(recentGroups); > + // Unless `recentGroups_.length() <= 8`, the `Move` operation below > + // is constant-time. Agreed with Nathan, but also: moving <= 8 elements is pretty close to constant-time (IMO O(n) vs O(1) only matters when n can be large), so I'd just not mention it.

Attachment #8850459 - Flags: review?(jdemooij) → review+

Jan de Mooij [:jandem]

Updated

•

8 years ago

Flags: needinfo?(jdemooij)

Comment hidden (mozreview-request)

David Teller [:Yoric] - still alive but not very active

Assignee

Comment 27

•

8 years ago

mozreview-review-reply

Comment on attachment 8850459 [details] Bug 1342714 - Reducing allocations in AutoStopwatch; https://reviewboard.mozilla.org/r/123058/#review126326 > I'm unsure that this comment is actually helpful: why is this particular operation worth calling out vs. all the other things we're doing? The expected, common case is that `recentGroups_.length() <= 8`, too, so the comment is saying that the exceptional situation is constant-time...what about the common case? > > I'm also not wild about the comment embedding a magic constant from the `PerformanceGroupVector` typedef that might get out of sync with the typedef. Actually, in this case, we expect that it's going to be much longer. If you're confused by the different uses of `PerformanceGroupVector`, recall that I was just r-ed attempting to give them two different type names :)

Comment hidden (mozreview-request)

Pulsebot

Comment 34

•

8 years ago

Pushed by dteller@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cf506c806cb5 Reducing allocations in AutoStopwatch;r=froydnj,jandem

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 35

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cf506c806cb5

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Whiteboard: [qf:p1][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleSheets] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleSheets]

Disable performance monitoring by default 8 years ago (no longer active) 8.84 KB, patch		Details \| Diff \| Splinter Review
Profile screenshot 8 years ago (no longer active) 481.71 KB, image/png		Details
Another profile 8 years ago (no longer active) 284.25 KB, image/png		Details
Bug 1342714 - Reducing allocations in AutoStopwatch; 8 years ago David Teller [:Yoric] - still alive but not very active 59 bytes, text/x-review-board-request	jandem : review+ froydnj : review+	Details