Open Bug 1373086 Opened 4 years ago Updated 1 year ago

performance.mark and performance.measure are too slow

Categories

(Core :: DOM: Core & HTML, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: Harald, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

Reported by the React team.

Developer builds have performance markers enabled in React which marks function calls for each component which can generate hundreds of calls to performance.mark/measure per frame. With the mark overhead performance suffers dramatically.
By any chance, is there a testcase for this?
Flags: needinfo?(hkirschner)
This was implemented in bug 782751.
Flags: needinfo?(kyle)
Attached file not too useful test (obsolete) —
This test is fast in Nightly (~24ms) and very slow in Chrome (~700ms), and Chrome even crashes with this.
And we're fast because we have the limit for the buffer size.
This is perhaps a bit better test, still faster than Chrome.

Profiling shows that at least performance.mark is quite malloc heavy.
But I'll let qdot to say his opinion.

However, we do need some test here indicating what is so slow.
Even just using AutoTArray and not nsTArray seems to help some.
Attachment #8877949 - Attachment is obsolete: true
Yeah, performance mark/measure both have new calls in them to create objects to return to content. Those objects are tiny, so there's a few ways we could probably optimize this. Have any opinions on the best solution for that, :smaug? I haven't had to deal with creating a lot of DOM objects quickly before so not quite sure what the best method is here.
Flags: needinfo?(kyle) → needinfo?(bugs)
nsTArray usage is showing up. If it was AutoTArray, we wouldn't allocate so much.
And perhaps we shouldn't create the actual PerformanceEntry objects before the web page actually requests for them.
Flags: needinfo?(bugs)
Assignee: nobody → kyle
Ok, using the test in the attachments here, I managed to optimize Mark performance by about 25%, storing marks to AutoTArray<Pair<nsString>, DOMHighResTimestamp> and only creating PerformanceMark objects on calls to GetEntries(). Also optimized GetEntries() so that we only run a Sort if we're actually merging resource and user. Might be able to take this even lower by using an hash table of <nsString, AutoTArray>.

Anyways, now the question is how much work we want to put into this compared to other optimization bugs, sync message removal, etc. Moving to AutoTArray is a pretty quick, easy win. The pair/hashtable stuff would take longer due to logic changes. Since we're not changing functionality, our tests should let us know if things are still basically working.
Also meant AutoTArray<Pair<nsString, DOMHighResTimestamp>> there.
Ehsan/Smaug, priorities on this versus sync message removal? I'm guessing sync is more important right now since we hit that more places? (whoever gets to the ni first can cancel for both)
Flags: needinfo?(ehsan)
Flags: needinfo?(bugs)
I'm assuming you're asking about bug 1350640?  I honestly have no idea how frequently that one happens, and this one affects react developer builds badly.  We have enough time before 57 for both, so I'd be happy with either order.  :-)
Flags: needinfo?(ehsan)
Totally bias but I'd choose this one first :-)
We have no information here how bad the performance issue is, so hard to say anything.
Flags: needinfo?(bugs)
'k, well, they're both on my list, and I don't think either is more than a few days, so I'll get them both done. Thanks for the input.
This is common in debug builds and affects developer ergonomics. Making it more lightweight would help Speedometer v2 as well until they switch out their debug builds with production.
Flags: needinfo?(hkirschner)
Could you be a bit more exact. debug builds of React, I assume, whatever that means.
Do you have some example case here to profile?
Saying something is slow without any testcase or example isn't too actionable. Maybe we're optimizing wrong thing here?
Flags: needinfo?(hkirschner)
I'm gonna go ahead and make this a metabug. Optimizing performance object creation is taking a while, and it's be nice to get the AutoTArray opt in now so we at least get a little better while waiting for bigger gains.
No longer blocks: 1378541
Depends on: 1378541
I found example of performance.* usage (perhaps over usage) from the https://github.com/ScottWeinstein/react-ng-perf-comparison repo via https://github.com/facebook/react-devtools/issues/801#issuecomment-307871766

Here are the STR for getting loads of performance measures / marks.

    git clone https://github.com/ScottWeinstein/react-ng-perf-comparison.git
    cd react-ng-perf-comparison/react-16/
    npm install
    npm start
    open http://localhost:9000/

Note that I filed https://github.com/devtools-html/perf.html/issues/462 because the sheer amount of marks overloads perf.html right now.
bug 1159003 landed, removing the 150 limit on usertimings
See Also: → 1159003
Priority: -- → P2
Would this be a good test case? https://jsperf.com/console-cost-perf-mark

In the second test case that mimics React's use of markers for per component tracking, Chrome gets 1,486 op/sec and Firefox 186.
Flags: needinfo?(hkirschner) → needinfo?(overholt)
Just replicated those findings over here, so that will be super helpful for figuring out how useful bug 1378541 is. Thanks!
I'll let qDot take this when he's back.
Flags: needinfo?(overholt)
Priority: P2 → P3
Component: DOM → DOM: Core & HTML
Assignee: kyle → nobody
Duplicate of this bug: 1622309

(from bug 1622309)

See this profile: https://perfht.ml/39MrdPy (Performance.measure ends up being 66% of the non-idle time in this worker)

Most of the time is spent in mozilla::dom::Performance::ResolveTimestampFromName. Naively, it seems we could save most of this time by caching the last Performance.mark instead of having to search through them all every time.

Keywords: perf
You need to log in before you can comment on or make changes to this bug.