Closed Bug 1373086 Opened 7 years ago Closed 3 years ago

performance.mark and performance.measure are too slow

Categories: Core :: DOM: Core & HTML, enhancement, P3
Status: RESOLVED FIXED
Target Milestone: 94 Branch
Performance Impact: ?
Tracking Status: firefox94 --- fixed
People: (Reporter: Harald, Assigned: sefeng)
References: (Depends on 1 open bug)
Keywords: perf
Attachments: (2 files, 1 obsolete file)

Reported by the React team.

Developer builds of React have performance markers enabled, marking function calls for each component; this can generate hundreds of calls to performance.mark/measure per frame. With the per-mark overhead, performance suffers dramatically.
By any chance, is there a testcase for this?
Flags: needinfo?(hkirschner)
This was implemented in bug 782751.
Flags: needinfo?(kyle)
Attached file not too useful test (obsolete) —
This test is fast in Nightly (~24ms) and very slow in Chrome (~700ms); Chrome even crashes on it.
We're fast here because we cap the buffer size.
This is perhaps a slightly better test, though we're still faster than Chrome.

Profiling shows that at least performance.mark is quite malloc heavy.
But I'll let qdot weigh in with his opinion.

However, we do need a testcase here showing what exactly is slow.
Even just using AutoTArray instead of nsTArray seems to help somewhat.
Attachment #8877949 - Attachment is obsolete: true
Yeah, performance mark/measure both have `new` calls in them to create objects to return to content. Those objects are tiny, so there are a few ways we could probably optimize this. Any opinions on the best solution for that, :smaug? I haven't had to deal with creating a lot of DOM objects quickly before, so I'm not quite sure what the best method is here.
Flags: needinfo?(kyle) → needinfo?(bugs)
nsTArray usage is showing up in profiles. If it were AutoTArray, we wouldn't allocate so much.
And perhaps we shouldn't create the actual PerformanceEntry objects before the web page actually requests them.
Flags: needinfo?(bugs)
Assignee: nobody → kyle
Ok, using the test in the attachments here, I managed to optimize mark performance by about 25% by storing marks in AutoTArray<Pair<nsString>, DOMHighResTimestamp> and only creating PerformanceMark objects on calls to GetEntries(). I also optimized GetEntries() so that we only run a sort if we're actually merging resource and user entries. We might be able to take this even lower by using a hash table of <nsString, AutoTArray>.

Anyway, now the question is how much work we want to put into this compared to other optimization bugs (sync message removal, etc.). Moving to AutoTArray is a quick, easy win. The pair/hashtable work would take longer due to logic changes. Since we're not changing functionality, our tests should tell us whether things are still basically working.
Also, I meant AutoTArray<Pair<nsString, DOMHighResTimestamp>> there.
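The lazy-creation idea above can be sketched like this (in JS for brevity, not the actual Gecko C++; the class and method names are made up):

```javascript
// Sketch of the optimization: store marks as cheap (name, timestamp)
// pairs, and only materialize full entry objects when GetEntries()
// is actually called.
class LazyMarkStore {
  constructor() {
    this.rawMarks = []; // [name, startTime] pairs, cheap to append
  }
  mark(name, now) {
    this.rawMarks.push([name, now]); // no entry-object allocation per mark
  }
  getEntries() {
    // Build PerformanceMark-like objects only on demand.
    return this.rawMarks.map(([name, startTime]) => ({
      name,
      entryType: 'mark',
      startTime,
      duration: 0,
    }));
  }
}

const store = new LazyMarkStore();
store.mark('a', 1.5);
store.mark('b', 2.5);
console.log(store.getEntries().length); // 2
```

The win is that the common case (lots of marks, few reads) pays only for a small array append per mark, deferring all object creation to the rare GetEntries() call.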
Ehsan/Smaug, priorities on this versus sync message removal? I'm guessing sync is more important right now since we hit that in more places? (Whoever gets to the needinfo first can cancel for both.)
Flags: needinfo?(ehsan)
Flags: needinfo?(bugs)
I'm assuming you're asking about bug 1350640?  I honestly have no idea how frequently that one happens, and this one affects React developer builds badly.  We have enough time before 57 for both, so I'd be happy with either order.  :-)
Flags: needinfo?(ehsan)
Totally biased, but I'd choose this one first :-)
We have no information here on how bad the performance issue is, so it's hard to say anything.
Flags: needinfo?(bugs)
'k, well, they're both on my list, and I don't think either is more than a few days, so I'll get them both done. Thanks for the input.
This is common in debug builds and affects developer ergonomics. Making it more lightweight would also help Speedometer v2 until they switch their debug builds out for production builds.
Flags: needinfo?(hkirschner)
Could you be a bit more exact? Debug builds of React, I assume, whatever that means.
Do you have some example case here to profile?
Saying something is slow without any testcase or example isn't very actionable. Maybe we're optimizing the wrong thing here?
Flags: needinfo?(hkirschner)
I'm gonna go ahead and make this a metabug. Optimizing performance object creation is taking a while, and it'd be nice to get the AutoTArray optimization in now so we at least get a little better while waiting for bigger gains.
No longer blocks: 1378541
Depends on: 1378541
I found an example of performance.* usage (perhaps overuse) in the https://github.com/ScottWeinstein/react-ng-perf-comparison repo via https://github.com/facebook/react-devtools/issues/801#issuecomment-307871766

Here are the steps to reproduce (STR) for generating loads of performance measures/marks.

    git clone https://github.com/ScottWeinstein/react-ng-perf-comparison.git
    cd react-ng-perf-comparison/react-16/
    npm install
    npm start
    open http://localhost:9000/

Note that I filed https://github.com/devtools-html/perf.html/issues/462 because the sheer amount of marks overloads perf.html right now.
Bug 1159003 landed, removing the 150-entry limit on user timings.
See Also: → 1159003
Priority: -- → P2
Would this be a good test case? https://jsperf.com/console-cost-perf-mark

In the second test case, which mimics React's use of markers for per-component tracking, Chrome gets 1,486 ops/sec and Firefox 186.
Flags: needinfo?(hkirschner) → needinfo?(overholt)
Just replicated those findings over here, so that will be super helpful for figuring out how useful bug 1378541 is. Thanks!
I'll let qDot take this when he's back.
Flags: needinfo?(overholt)
Priority: P2 → P3
Component: DOM → DOM: Core & HTML
Assignee: kyle → nobody

(from bug 1622309)

See this profile: https://perfht.ml/39MrdPy (Performance.measure ends up being 66% of the non-idle time in this worker)

Most of the time is spent in mozilla::dom::Performance::ResolveTimestampFromName. Naively, it seems we could save most of this time by caching the last Performance.mark instead of having to search through them all every time.
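The caching idea could look roughly like this (a JS sketch, not Gecko's actual C++; class and method names are made up for illustration):

```javascript
// Sketch: resolving a mark name currently scans the entry buffer;
// keeping a per-name cache of the most recent mark's timestamp makes
// the common measure(name, startMark) lookup O(1).
class MarkResolver {
  constructor() {
    this.entries = [];             // all marks, in insertion order
    this.latestByName = new Map(); // name -> startTime of most recent mark
  }
  addMark(name, startTime) {
    this.entries.push({ name, startTime });
    this.latestByName.set(name, startTime);
  }
  // The User Timing spec resolves a mark name to the *most recent*
  // mark with that name, so caching only the latest timestamp per
  // name is sufficient for this lookup.
  resolveTimestampFromName(name) {
    return this.latestByName.get(name); // O(1) instead of a linear scan
  }
}

const r = new MarkResolver();
r.addMark('start', 10);
r.addMark('start', 20); // a later mark with the same name wins
console.log(r.resolveTimestampFromName('start')); // 20
```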

Keywords: perf
Whiteboard: [qf]

There's another profile found by Bas: https://share.firefox.dev/3DAFL47.

Given that most of the time is spent in Performance::ResolveTimestampFromName, and assuming the QuickSort usage is the blocker, the way we should improve it is this: instead of combining the resource timing entries and user timing entries into one array and sorting it afterwards, merge these pre-sorted arrays directly (the merge step of merge sort), so that the time complexity is O(n).

We should also skip the sorting entirely when only one of the arrays is involved. This is going to be extremely helpful for ResolveTimestampFromName, because only user timing entries are involved there.

Credits to Randell for the above ideas!

To optimize even further, we could use a map to store user entries, where the key is the entry name and the value is an array of entries (this is also what Chrome does). However, we should try the sorting strategy above before changing the data structure.
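The data-structure change floated above could be sketched like this (illustrative JS, not Chrome's or Gecko's actual code):

```javascript
// Sketch: keep user timing entries in a map keyed by name, so a
// by-name lookup returns its bucket directly instead of filtering
// the whole buffer.
class UserEntryBuffer {
  constructor() {
    this.byName = new Map(); // name -> array of entries, in insertion order
  }
  add(entry) {
    let bucket = this.byName.get(entry.name);
    if (!bucket) {
      bucket = [];
      this.byName.set(entry.name, bucket);
    }
    bucket.push(entry);
  }
  getEntriesByName(name) {
    return this.byName.get(name) ?? []; // O(1) lookup, no full scan
  }
}

const buf = new UserEntryBuffer();
buf.add({ name: 'render', startTime: 1 });
buf.add({ name: 'layout', startTime: 2 });
buf.add({ name: 'render', startTime: 3 });
console.log(buf.getEntriesByName('render').length); // 2
```

The trade-off is that queries spanning all names (e.g. getEntries() with no filter) now need to walk every bucket, which is presumably why the comment suggests trying the sorting fix first.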

We did two optimizations in this patch:

  1. Since mResourceEntries and mUserEntries are pre-sorted
    arrays, we can merge them by iterating over them, without
    performing a quicksort.
  2. Don't perform a quicksort if an entry type is provided.
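The merge of two pre-sorted entry arrays can be sketched as follows (a JS illustration of the technique, not the patch's actual C++):

```javascript
// Merge two arrays already sorted by startTime in one O(n) pass,
// instead of concatenating them and re-sorting (O(n log n)).
function mergeSortedEntries(resourceEntries, userEntries) {
  const out = [];
  let i = 0, j = 0;
  while (i < resourceEntries.length && j < userEntries.length) {
    if (resourceEntries[i].startTime <= userEntries[j].startTime) {
      out.push(resourceEntries[i++]);
    } else {
      out.push(userEntries[j++]);
    }
  }
  // At most one of these loops runs; it drains the remaining tail.
  while (i < resourceEntries.length) out.push(resourceEntries[i++]);
  while (j < userEntries.length) out.push(userEntries[j++]);
  return out;
}

const resources = [{ startTime: 1 }, { startTime: 4 }];
const users = [{ startTime: 2 }, { startTime: 3 }];
console.log(mergeSortedEntries(resources, users).map(e => e.startTime));
// [ 1, 2, 3, 4 ]
```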
Assignee: nobody → sefeng
Status: NEW → ASSIGNED
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/356896ae2ce1
Optimize Performance::GetEntriesByName r=smaug
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
Performance Impact: --- → ?
Whiteboard: [qf]
See Also: → 1811750