performance.mark and performance.measure are too slow
Categories
(Core :: DOM: Core & HTML, enhancement, P3)
Tracking

| Tracking | Status |
|---|---|
| firefox94 | fixed |
People
(Reporter: Harald, Assigned: sefeng)
References
(Depends on 1 open bug)
Details
(Keywords: perf)
Attachments
(2 files, 1 obsolete file)
Reported by the React team. React development builds have performance markers enabled: function calls are marked for each component, which can generate hundreds of calls to performance.mark/measure per frame. With the per-mark overhead, performance suffers dramatically.
Comment 3•7 years ago
This test is fast in Nightly (~24ms) and very slow in Chrome (~700ms), and Chrome even crashes with this.
Comment 4•7 years ago
And we're fast because we have the limit for the buffer size.
Comment 5•7 years ago
This is perhaps a slightly better test; we're still faster than Chrome. Profiling shows that at least performance.mark is quite malloc-heavy. But I'll let qdot give his opinion. However, we do need some test here indicating what exactly is slow.
Comment 6•7 years ago
Even just switching from nsTArray to AutoTArray seems to help somewhat.
Updated•7 years ago
Comment 7•7 years ago
Yeah, performance mark/measure both have new calls in them to create objects to return to content. Those objects are tiny, so there are a few ways we could probably optimize this. Have any opinions on the best solution for that, :smaug? I haven't had to deal with creating a lot of DOM objects quickly before, so I'm not quite sure what the best method is here.
Comment 8•7 years ago
nsTArray usage is showing up. If it were AutoTArray, we wouldn't allocate so much. And perhaps we shouldn't create the actual PerformanceEntry objects before the web page actually requests them.
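The nsTArray-vs-AutoTArray point above is about inline storage. As a minimal sketch (illustration only, not Gecko code), the idea behind AutoTArray<T, N> is that the first N elements live inside the object itself, so small arrays never hit malloc, whereas a plain nsTArray heap-allocates its buffer even for a handful of elements:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sketch of the inline-storage idea behind Gecko's AutoTArray<T, N>.
// This is an illustration, not the actual Gecko class.
template <typename T, std::size_t N>
class InlineArray {
 public:
  void Append(const T& aValue) {
    if (mLength < N) {
      mInline[mLength] = aValue;    // inline storage: no heap allocation
    } else {
      mOverflow.push_back(aValue);  // spill to the heap only past N elements
    }
    ++mLength;
  }
  std::size_t Length() const { return mLength; }
  const T& operator[](std::size_t aIndex) const {
    return aIndex < N ? mInline[aIndex] : mOverflow[aIndex - N];
  }

 private:
  T mInline[N]{};
  std::vector<T> mOverflow;  // fallback for the rare large case
  std::size_t mLength = 0;
};
```

For the common case here (a bounded burst of marks per frame), most appends stay in the inline buffer, which is why the switch shows up as a malloc reduction in profiles.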
Updated•7 years ago
Comment 9•7 years ago
Ok, using the test in the attachments here, I managed to improve Mark performance by about 25%, storing marks as AutoTArray<Pair<nsString>, DOMHighResTimestamp> and only creating PerformanceMark objects on calls to GetEntries(). Also optimized GetEntries() so that we only run a sort if we're actually merging resource and user timing entries. We might be able to take this even lower by using a hash table of <nsString, AutoTArray>. Anyway, the question now is how much work we want to put into this compared to other optimization bugs (sync message removal, etc.). Moving to AutoTArray is a quick, easy win; the pair/hashtable approach would take longer due to logic changes. Since we're not changing functionality, our tests should tell us whether things are still basically working.
Comment 10•7 years ago
Also meant AutoTArray<Pair<nsString, DOMHighResTimestamp>> there.
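The lazy-materialization idea in the comments above can be sketched as follows (hypothetical names, not the actual Gecko patch): mark() records only a cheap (name, timestamp) pair, and full entry objects are built on demand when content calls getEntries():

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch of lazy PerformanceMark creation (hypothetical names).
using DOMHighResTimeStamp = double;

struct MarkEntry {
  std::string name;
  DOMHighResTimeStamp startTime;
};

class UserTimingStore {
 public:
  void Mark(std::string aName, DOMHighResTimeStamp aNow) {
    // Hot path: no entry object, just a pair appended to a flat array.
    mRawMarks.emplace_back(std::move(aName), aNow);
  }

  std::vector<MarkEntry> GetEntries() const {
    // Cold path: materialize entry objects only when actually requested.
    std::vector<MarkEntry> entries;
    entries.reserve(mRawMarks.size());
    for (const auto& pair : mRawMarks) {
      entries.push_back(MarkEntry{pair.first, pair.second});
    }
    return entries;
  }

 private:
  std::vector<std::pair<std::string, DOMHighResTimeStamp>> mRawMarks;
};
```

This trades a cheap per-mark path (the case React's instrumentation hammers) for extra work in GetEntries(), which pages call far less often.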
Comment 11•7 years ago
Ehsan/Smaug, priorities on this versus sync message removal? I'm guessing sync is more important right now since we hit that more places? (whoever gets to the ni first can cancel for both)
Comment 12•7 years ago
I'm assuming you're asking about bug 1350640? I honestly have no idea how frequently that one happens, and this one affects react developer builds badly. We have enough time before 57 for both, so I'd be happy with either order. :-)
Comment 13•7 years ago
Totally biased, but I'd choose this one first :-)
Comment 14•7 years ago
We have no information here how bad the performance issue is, so hard to say anything.
Comment 15•7 years ago
'k, well, they're both on my list, and I don't think either is more than a few days, so I'll get them both done. Thanks for the input.
Reporter
Comment 16•7 years ago
This is common in debug builds and affects developer ergonomics. Making marks more lightweight would also help Speedometer v2, at least until they switch their debug builds out for production builds.
Comment 17•7 years ago
Could you be a bit more exact? Debug builds of React, I assume, whatever that means. Do you have some example case here to profile? Saying something is slow without any testcase or example isn't very actionable. Maybe we're optimizing the wrong thing here?
Comment 18•7 years ago
I'm gonna go ahead and make this a metabug. Optimizing performance object creation is taking a while, and it'd be nice to get the AutoTArray optimization in now so we at least get a little better while waiting for bigger gains.
Updated•7 years ago
Comment 19•7 years ago
I found an example of performance.* usage (perhaps overuse) in the https://github.com/ScottWeinstein/react-ng-perf-comparison repo, via https://github.com/facebook/react-devtools/issues/801#issuecomment-307871766

STR for getting loads of performance measures/marks:

1. git clone https://github.com/ScottWeinstein/react-ng-perf-comparison.git
2. cd react-ng-perf-comparison/react-16/
3. npm install
4. npm start
5. open http://localhost:9000/

Note that I filed https://github.com/devtools-html/perf.html/issues/462 because the sheer number of marks overloads perf.html right now.
Comment 20•7 years ago
bug 1159003 landed, removing the 150 limit on usertimings
Updated•7 years ago
Reporter
Comment 21•7 years ago
Would this be a good test case? https://jsperf.com/console-cost-perf-mark In the second test case, which mimics React's use of markers for per-component tracking, Chrome gets 1,486 ops/sec and Firefox 186.
Comment 22•7 years ago
Just replicated those findings over here, so that will be super helpful for figuring out how useful bug 1378541 is. Thanks!
Updated•5 years ago
Updated•5 years ago
Updated•5 years ago
Comment 25•4 years ago
(from bug 1622309)
See this profile: https://perfht.ml/39MrdPy (Performance.measure ends up being 66% of the non-idle time in this worker)
Most of the time is spent in mozilla::dom::Performance::ResolveTimestampFromName. Naively, it seems we could save most of this time by caching the last Performance.mark instead of having to search through them all every time.
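The caching idea suggested above can be sketched like this (hypothetical, not landed code): alongside the flat mark list, keep a map from mark name to the latest timestamp, so a measure() call can resolve a name without scanning every stored mark. The User Timing spec wants the most recent mark with a given name, which is exactly what the map holds:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of a last-mark cache for ResolveTimestampFromName-style lookups.
// Hypothetical names; not the actual Gecko implementation.
using DOMHighResTimeStamp = double;

class MarkCache {
 public:
  void AddMark(const std::string& aName, DOMHighResTimeStamp aTime) {
    mAllMarks.push_back({aName, aTime});
    mLatestByName[aName] = aTime;  // overwrite: the newest mark wins
  }

  // Returns true and fills aOut if a mark with this name exists.
  bool ResolveTimestampFromName(const std::string& aName,
                                DOMHighResTimeStamp* aOut) const {
    auto it = mLatestByName.find(aName);
    if (it == mLatestByName.end()) {
      return false;  // caller falls back to other entry types / throws
    }
    *aOut = it->second;  // O(1) average instead of an O(n) scan
    return true;
  }

 private:
  struct Mark {
    std::string name;
    DOMHighResTimeStamp time;
  };
  std::vector<Mark> mAllMarks;  // still kept for getEntries() and friends
  std::unordered_map<std::string, DOMHighResTimeStamp> mLatestByName;
};
```

The cost is one extra hash-map write per mark, which is cheap next to the linear search it removes from every measure() in this worker's profile.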
Updated•3 years ago
Assignee
Comment 26•3 years ago
There's another profile found by Bas: https://share.firefox.dev/3DAFL47

Given that most of the time is spent in Performance::ResolveTimestampFromName, and assuming the QuickSort usage is the blocker, the way we should improve it is: instead of combining the resource timing entries and the user timing entries together and sorting them afterwards, merge these pre-sorted arrays, so that the time complexity is O(n).

We should also not perform any sorting if only one of the arrays is involved. This is going to be extremely helpful for ResolveTimestampFromName, because only user timing entries are involved there.

Credits to Randell for the above ideas!

To optimize it even further, we could use a map to store user entries, where the keys are the entry names and the values are arrays of entries (this is also what Chrome does). However, we should try the sorting strategy above before changing the data structure.
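The merge idea above can be sketched as follows (illustration only, not the actual patch; timestamps stand in for full entries): since mResourceEntries and mUserEntries are each already sorted by startTime, walking both arrays once and emitting the smaller head each step replaces the O(n log n) concatenate-and-quicksort with an O(n) merge:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// O(n) merge of two pre-sorted timestamp arrays (sketch, not Gecko code).
using DOMHighResTimeStamp = double;

std::vector<DOMHighResTimeStamp> MergePresorted(
    const std::vector<DOMHighResTimeStamp>& aResource,
    const std::vector<DOMHighResTimeStamp>& aUser) {
  std::vector<DOMHighResTimeStamp> merged;
  merged.reserve(aResource.size() + aUser.size());
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < aResource.size() && j < aUser.size()) {
    if (aResource[i] <= aUser[j]) {
      merged.push_back(aResource[i++]);
    } else {
      merged.push_back(aUser[j++]);
    }
  }
  // At most one of these loops runs: copy whatever is left over.
  while (i < aResource.size()) merged.push_back(aResource[i++]);
  while (j < aUser.size()) merged.push_back(aUser[j++]);
  return merged;
}
```

When only one array is involved (the ResolveTimestampFromName case), the result is just that array, so no sorting work happens at all.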
Assignee
Comment 27•3 years ago
We did two optimizations in this patch:
- Since mResourceEntries and mUserEntries are pre-sorted arrays, we can merge them by iterating over both, without performing a quick sort.
- Don't perform a quick sort if an entry type is provided.
Updated•3 years ago
Comment 28•3 years ago
Pushed by sefeng@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/356896ae2ce1 Optimize Performance::GetEntriesByName r=smaug
Comment 29•3 years ago
bugherder
Updated•2 years ago