Closed Bug 1346723 Opened 7 years ago Closed 3 years ago

[Meta] Setting input.value when input has focus is 1.1x slower than Chrome/Safari

Categories

(Core :: DOM: Core & HTML, defect, P3)

Desktop
All
defect

Tracking


RESOLVED FIXED
Performance Impact ?

People

(Reporter: jandem, Assigned: masayuki)

References

(Depends on 4 open bugs, Blocks 3 open bugs)

Details

(Keywords: meta, perf)

Attachments

(4 files, 1 obsolete file)

Attached file Test (obsolete) —
I started to look at the very first Speedometer test to figure out where we are much slower than other engines and I didn't have to look far :)

Basically each test starts like this, where newTodo is an <input> element and numberOfItemsToAdd is 100:

    for (var i = 0; i < numberOfItemsToAdd; i++) {
        newTodo.value = 'Something to do ' + i;
        ...
    }

Just this part takes ~10 ms for us, while it only takes 0-1 ms for Chrome/Safari.

For the attached microbenchmark (it does 10000 sets instead of 100) I get:

Nightly: 1882 ms
Chrome:    43 ms
Safari:    46 ms

It's only incredibly slow when the input element has focus. If I comment out the el.focus() call, Nightly is at 12 ms and Chrome/Safari around 8 ms, so we're still a little slower but much better than 30x slower.
Flags: needinfo?(ehsan)
Flags: needinfo?(bzbarsky)
Blocks: Speedometer_V2
No longer blocks: 1245279
(In reply to Jan de Mooij [:jandem] from comment #0)
> Just this part takes ~10 ms for us, while it only takes 0-1 for
> Chrome/Safari.

FWIW, fixing this will probably not affect our overall Speedometer score that much, but it would make it easier for me to investigate other issues. It also seems like this is something that affects real-world websites.
Hmm, lots of node additions and removals. I thought we were reusing the same native anonymous div and a text node inside it, but apparently not.

http://searchfox.org/mozilla-central/rev/ef0b6a528e0e7b354ddf8cdce08392be8b8ca679/editor/libeditor/DeleteNodeTransaction.cpp#72
I wonder if we should skip the TransactionManager stuff entirely when the value is being changed via .value, even if the element is focused.
The profile when the element isn't focused looks more like what I'd expect to happen here: calling the text node's SetText.
Attached file Test
Slightly better testcase: instead of el.focus it now uses autofocus, like the original test, and I also added the placeholder attribute Speedometer has:

  <input id="t" placeholder="What needs to be done?" autofocus>

This makes Chrome/Safari a few ms slower (44-50 ms) but it pushes Nightly over 2 seconds here.
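The obsoleted attachment is no longer viewable inline; a minimal reconstruction of the described testcase might look like this (the 10000 iteration count and the timing output are assumptions based on the comments above):

```html
<!DOCTYPE html>
<meta charset="utf-8">
<input id="t" placeholder="What needs to be done?" autofocus>
<pre id="out"></pre>
<script>
  var el = document.getElementById("t");
  var start = performance.now();
  for (var i = 0; i < 10000; i++) {
    el.value = "Something to do " + i;
  }
  document.getElementById("out").textContent =
      (performance.now() - start).toFixed(1) + " ms";
</script>
```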
Attachment #8846559 - Attachment is obsolete: true
Freaking editor :((

https://perfht.ml/2nxxNAQ

I'll see what I can do.  I have a number of patches in flight so I may not get to this for a number of days (perhaps a week or longer.)  Are you seriously blocked on this, Jan?
Assignee: nobody → ehsan
Flags: needinfo?(ehsan) → needinfo?(jdemooij)
Flags: needinfo?(ehsan)
In fairness, all the editor is doing is removing all the text that's there and setting the new text.  It just does this by removing/adding nodes instead of modifying a textnode in place.  The profile is showing a ton of time in (not under) nsNodeUtils::ContentRemoved/Appended...  Plus time under it in ranges, etc.

That said, 0.2ms per set (2s over 10k sets) is abysmal.  Something weird is going on here.
And to be clear, the ideal goal here is to reduce the time by a factor of 100 or more.  Over here, on the last testcase, we're at 2170ms, while removing the autofocus attr puts it closer to 15ms.  That means anything over 1% of the total profile needs to go.

Quickly looking at the profile that means:

1)  Time spent doing GetValue in HTMLInputElement::SetValue, which goes via TextEditor::OutputToString (1% of profile).

2)  nsFrameSelection::SelectAll in nsTextEditorState::SetValue (3% of profile).

3)  Selection::EndBatchChanges (1.5% of profile).

4)  EditorBase::EndPlaceHolderTransaction (1.7% of profile).

plus of course the node removal/reinsertion business....

For that last bit, what I really wonder is whether we just end up creating a ton of Ranges that are sticking around and giving us O(N^2) behavior.... That's the only way I can explain nsNodeUtils::ContentRemoved/Appended being so slow.  I'll look into that.
So yes, we do create a bunch of ranges.  Specifically:

1)  IMEContentObserver::NotifyContentAdded calls ContentEventHandler::GetFlatTextLengthInRange which creates an nsRange so it can package up some node+offset pairs in it and use it to Init an nsIContentIterator.  Just switching to mutating textnodes won't help here, because IMEContentObserver::CharacterDataChanged calls GetFlatTextLengthInRange too.

2)  Selection::Collapse clears out all its existing ranges, then creates a new empty range at the given position and inserts it.  We call Collapse via _three_ codepaths in this testcase: from DeleteRangeTransaction::DoTransaction (under TextEditor::DeleteSelection, which is called from TextEditRules::WillInsertText), directly from WillInsertText and from nsFrameSelection::TakeFocus, which is called from nsFrameSelection::SelectAll.

3)  SelectAll calls Selection::Extend which creates a new range.  Two actually: it clones the anchorfocus range, and then creates a "difRange".

4)  EditorBase::CreateTxnForDeleteSelection clones the range to delete.

So I tried hacking those in various not-to-be-checked-in ways to avoid the problem.  Specifically, I made IMEContentObserver::NotifyContentAdded return immediately, Selection::Collapse grab a ref to its first range, if any, and reuse it, and added a boolean member to nsRange that indicates it should be "dead" in the sense of not adding itself as a mutation observer, for use for the ranges Selection::Extend creates and the cloning in CreateTxnForDeleteSelection.

The changes for the first three items didn't seem to help much on their own; hacking the CreateTxnForDeleteSelection range to not be live sped things up by 2x or so.  I haven't checked whether undoing the changes for items 1-3 would still leave us at that 2x speedup.

Anyway, with that O(N^2) bit fixed the profile shows a bunch of frame construction as we remove/add content, GetValue taking time so we can update our "are we valid" state, I assume, selection operations of various sorts, etc, etc.  Obviously, deleting existing editor text by selecting it and then deleting the selection is not the way we should be doing things, given all the work selection involves....
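The O(N^2) hypothesis above can be illustrated with a toy model (all names here are hypothetical; this only mirrors the shape of the problem, not actual Gecko code): each .value set clones a live range that registers as a mutation observer, and each mutation then notifies every observer registered so far.

```javascript
// Toy model of why long-lived "live" ranges cause O(N^2) behavior.
// Hypothetical sketch, not Gecko code.
function simulate(setsCount, rangeOutlivesTransaction) {
  const observers = [];   // mutation observers registered on the text control
  let notifications = 0;
  for (let i = 0; i < setsCount; i++) {
    // Each .value set clones a range for the delete transaction,
    // and the range registers itself as a mutation observer...
    const range = { detached: false };
    observers.push(range);
    // ...then the resulting mutation notifies every registered observer.
    notifications += observers.length;
    if (!rangeOutlivesTransaction) {
      // Dropping the range after DoTransaction keeps the observer list short.
      observers.pop();
    }
  }
  return notifications;
}
```

With 1000 sets, a range that outlives each transaction produces 500500 notifications (quadratic), while dropping it after each transaction produces 1000 (linear), which matches the direction of the speedup seen from nulling out the cloned range.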
Flags: needinfo?(bzbarsky)
(In reply to :Ehsan Akhgari from comment #6)
> I'll see what I can do.  I have a number of patches in flight so I may not
> get to this for a number of days (perhaps a week or longer.)  Are you
> seriously blocked on this, Jan?

No I can probably work around it for now. I also have a number of other bugs to fix/land this week anyway.

It would be awesome to have this fixed in 55 given the severity (as Boris said, 0.2ms per set is terrible and it will obviously be worse on slower machines).
Flags: needinfo?(jdemooij)
For practical purposes it's probably .1ms, because the .2ms time involves a huge number of sets all on the same text control without ever hitting the event loop (to create all those ranges and give the O(N^2) notification behavior), which I don't think is likely to happen in practice.

But yes, even .1ms is pretty bad....  Part of the problem is that iirc we don't do lazy frame construction inside the editor.
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #12)
> But yes, even .1ms is pretty bad....  Part of the problem is that iirc we
> don't do lazy frame construction inside the editor.

Yeah, we don't.  I doubt that's easy to change.
Flags: needinfo?(ehsan)
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari from comment #13)
> (In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no
> decent message, automatic r-) from comment #12)
> > But yes, even .1ms is pretty bad....  Part of the problem is that iirc we
> > don't do lazy frame construction inside the editor.
> 
> Yeah, we don't.  I doubt that's easy to change.

Actually, I'm not 100% sure why we need to do eager frame construction any more.  At some point we needed that for caret height calculations IIRC but that changed years ago.  I don't remember any other reasons off the top of my head.  Boris, do you?
Flags: needinfo?(ehsan) → needinfo?(bzbarsky)
I don't remember the reasons at all, sorry.  :(
Flags: needinfo?(bzbarsky)
Whiteboard: [qf:p1]
Attached file Test 2
I just noticed on another Speedometer test that the slowdown also happens when the input is not focused (a change I made to work around this bug), but after we dispatch a keydown event.
Right, the key is that the editor needs to be initialized.   This can be caused by various things including focus, mouse events, key events, certain API operations, etc.
Depends on: 1348073
Priority: -- → P1
I marked this as P1, since it would be good to have this fixed in this release (if possible). It is one of the major pain points on e.g. speedometer.
I'm still planning to look at this, but realistically I won't be able to spend any time on this before April.  Sorry.  :(
I tested how other browsers handle transactions:
https://jsfiddle.net/d_toybox/4zr6jryw/

I tried:
1. type "abc" in the <input>
2. set .value of it by clicking the <button>
3. Ctrl+Z
4. Ctrl+Shift+Z (or Ctrl+Y)

Then, Chrome makes the <input> empty at #3 and restores "abcs value is set (1)" at #4. So their transaction management is really broken, but they do try to keep the undo/redo feature working across setting .value.

On the other hand, Edge doesn't allow undo at #3.
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #9)
> So yes, we do create a bunch of ranges.  Specifically:
> 
> 1)  IMEContentObserver::NotifyContentAdded calls
> ContentEventHandler::GetFlatTextLengthInRange which creates an nsRange so it
> can package up some node+offset pairs in it and use it to Init an
> nsIContentIterator.  Just switching to mutating textnodes won't help here,
> because IMEContentObserver::CharacterDataChanged calls
> GetFlatTextLengthInRange too.

Although there is a performance issue in IMEContentObserver (bug 1250823), I think we can stop using nsRange in ContentEventHandler if we make nsIContentIterator initializable with a node/offset pair.

Alternatively, there could be another range class/struct which just stores 2 points in a DOM tree, and nsRange could inherit from it. Then a lot of nsRange (or nsIDOMRange) users could avoid using nsRange.

> 2)  Selection::Collapse clears out all its existing ranges, then creates a
> new empty range at the given position and inserts it.  We call Collapse via
> _three_ codepaths in this testcase: from
> DeleteRangeTransaction::DoTransaction (under TextEditor::DeleteSelection,
> which is called from TextEditRules::WillInsertText), directly from
> WillInsertText and from nsFrameSelection::TakeFocus, which is called from
> nsFrameSelection::SelectAll.

If we changed this behavior, we *might* break existing web content.

Selection.getRangeAt() returns a reference to a range. So, if the current Selection::Collapse() replaces the range (assuming there is only one range in the selection):

var range = document.getSelection().getRangeAt(0);
document.getSelection().collapse(foo, 0);

then document.getSelection().getRangeAt(0) and range become different instances.

On the other hand, if we just modify the range in Selection::Collapse(), document.getSelection().getRangeAt(0) returns the same range instance.

So modifying |range| has a different meaning if we change the Selection::Collapse() behavior.

That said, Chrome returns a cloned range from Selection.getRangeAt(), so there should be few web apps that would be broken if we changed the behavior.

# I feel replacing with a different range instance doesn't make sense for web apps, though.
# https://w3c.github.io/selection-api/#dom-selection-collapse
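The web-observable difference between the two Collapse() strategies can be sketched with a toy model (SelectionModel is a hypothetical stand-in, not the real DOM Selection):

```javascript
// Minimal model of the two Selection.collapse() strategies discussed above.
// Hypothetical sketch, not the real DOM implementation.
class SelectionModel {
  constructor(mutateInPlace) {
    this.mutateInPlace = mutateInPlace;
    this.ranges = [{ node: null, offset: 0 }];
  }
  getRangeAt(i) {
    return this.ranges[i];
  }
  collapse(node, offset) {
    if (this.mutateInPlace && this.ranges.length === 1) {
      // Reuse the existing range: previously obtained references stay valid.
      Object.assign(this.ranges[0], { node, offset });
    } else {
      // Replace-style behavior: drop all ranges and insert a fresh one,
      // so getRangeAt(0) returns a different instance afterwards.
      this.ranges = [{ node, offset }];
    }
  }
}
```

With mutateInPlace=false, a range obtained before collapse() is no longer the one returned by getRangeAt(0) afterwards; with mutateInPlace=true it still is. That identity difference is exactly what scripts could observe.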

> 3)  SelectAll calls Selection::Extend which creates a new range.  Two
> actually: it clones the anchorfocus range, and then creates a "difRange".

There is the same compatibility issue as above if we changed the behavior.

> 4)  EditorBase::CreateTxnForDeleteSelection clones the range to delete.
> 
> So I tried hacking those in various not-to-be-checked-in ways to avoid the
> problem.  Specifically, I made IMEContentObserver::NotifyContentAdded return
> immediately, Selection::Collapse grab a ref to its first range, if any, and
> reuse it, and added a boolean member to nsRange that indicates it should be
> "dead" in the sense of not adding itself as a mutation observer, for use for
> the ranges Selection::Extend creates and the cloning in
> CreateTxnForDeleteSelection.
> 
> The changes for the first three items didn't seem to help much on their own;

Interesting. I was thinking this was the same bug as bug 1250823 when I saw the report.
OK, so I poked a bit more and I think the key is that for my items 1-3 the ranges are not long-lived.  So they add some overhead, but don't cause the O(N^2) behavior.

For item 4, the range lives as long as the DeleteRangeTransaction does.  But it's unused after DeleteRangeTransaction::DoTransaction.  Just nulling it out in DoTransaction gives me the 2x speedup comment 9 mentions.  Filed bug 1349940 on that.
Depends on: 1349940
So I tried hack-enabling lazy frame construction on top of bug 1349940 and that gives me maybe 20% speedup.  I've verified that the remaining time is NOT spent in frame construction.  So bug 1348073 won't help too much here.

One of the reasons we're messing about with selection at all is bug 518122.  Which is fine, but can we make assumptions about our DOM in the texteditor case and do something smarter than we do now?
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #23)
> One of the reasons we're messing about with selection at all is bug 518122. 
> Which is fine, but can we make assumptions about our DOM in the texteditor
> case and do something smarter than we do now?

Yes, I'm 99% sure that as of a few years ago, it's guaranteed that our plain text editor will always have one text node followed by a <br> node in its anonymous subtree, and we already have code in the tree relying on this.  Unfortunately I can't find the bug number where this work happened right now...

I think Masayuki will be taking a look at this one.
Assignee: ehsan → masayuki
Depends on: 1250823
No longer depends on: 1348073
According to the profile, SelectAll and DeleteSelection are expensive in this situation.  When I replace them with editor->DeleteText, it becomes 2x faster than the original.  This is a plain text editor and the root has at most 2 children (a text node and a br node), so we might need a new transaction like SetTextTransaction for this situation.
Just as a reminder, we need to make this 20-30x faster.  2x is nice, but not enough.  And I suspect that anything involving selection and transactions will be too slow in practice.....
Other browsers (Edge, Chrome and Safari) don't create transactions for undo on input.value = xxxx;.  That means Firefox can undo even when the setter was called from script, while the others cannot.

Currently, the input.value setter creates DeleteSelection and InsertText transactions.  For example, after running the attached micro benchmark, we can undo in the textbox (Something to do 9999 -> Something to do 9998 -> Something to do 9997 -> ...), but other browsers cannot undo when the value was changed by script.

If we stop supporting undo for the input.value setter, like other browsers, we can get a bigger improvement here.

Our input.value setter is 35x slower than Chrome now.  If we don't create transactions for undo, it becomes 4-5x slower according to my monkey patch.  Also, if we create 1 transaction (a new SetTextTransaction?) instead of 2 transactions (DeleteSelectionTransaction and InsertTextTransaction), it is 10x slower.

Should we keep creating transactions for the input.value setter, i.e., for non-user input?
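The single-transaction idea can be sketched with a toy undo stack (SetTextTransaction is a proposed name from the comment above, not a landed Gecko API; everything here is a hypothetical model):

```javascript
// Toy undo stack contrasting the current pair of transactions per .value set
// with a single coalesced one. Hypothetical sketch, not Gecko code.
class SetTextTransaction {
  constructor(state, newText) {
    this.state = state;
    this.oldText = state.text;
    this.newText = newText;
  }
  doTransaction() { this.state.text = this.newText; }
  undoTransaction() { this.state.text = this.oldText; }
}

function setValueWithUndo(state, undoStack, text) {
  // One transaction (and one undo-stack entry) instead of
  // DeleteSelectionTransaction + InsertTextTransaction.
  const txn = new SetTextTransaction(state, text);
  txn.doTransaction();
  undoStack.push(txn);
}
```

A single transaction halves the allocation and bookkeeping per set, and one undo step restores the previous value directly instead of replaying a delete followed by an insert.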
Flags: needinfo?(masayuki)
Good point.

However, I'm worried about autocomplete behavior (both in chrome and content) and/or similar features in chrome. The most important thing here is improving the performance in content, so even if there are some problems in chrome, we could apply this only in content. What about autocomplete in content? (I don't know how the value is set from toolkit's autocomplete.)
Flags: needinfo?(masayuki)
Autocomplete takes a different codepath, iirc (through SetUserInput()) and hence we can give it different behavior from .value sets if we want to.
Depends on: 1358025
Depends on: 1359707
Depends on: 1360137
Depends on: 596501
Depends on: 1360162
Depends on: 1360154
This is much better now (at least 3x faster!) but on Speedometer it still shows up on every test and especially on the short-running ones it adds significant overhead. Any chance we could fix some of the bugs blocking this one?
Flags: needinfo?(m_kato)
(In reply to Jan de Mooij [:jandem] (PTO May 16-18) from comment #30)
> This is much better now (at least 3x faster!) but on Speedometer it still
> shows up on every test and especially on the short-running ones it adds
> significant overhead. Any chance we could fix some of the bugs blocking this
> one?

I can make this at least 3x-4x faster with my idea (create a new transaction type instead of selectAll and insertText, some more optimizations for the placeholder, etc.).  But the Speedometer score improves only 15%-20% even if all the fixes are applied.
Flags: needinfo?(m_kato)
(In reply to Makoto Kato [:m_kato] from comment #31)
> But speedometer score improves 15%-20% only even if all fixes are applied.

15-20% is actually pretty big and definitely worth spending time on. Also, it may affect newer versions of Speedometer more because there will be more short-running tests, so the overhead of this will be bigger.
15-20% would be a massive improvement in any kind of test, microbenchmark or not. While improving innerHTML performance, the work was basically fixing several small issues which each gave perhaps a 1% boost.

Makoto, with your fixes, how would we behave on the tests in this bug compared to other browsers?
Just wondering if the approach gives us the best performance. Though, a new transaction type definitely sounds like a reasonable option.
> Just wondering if the approach gives us the best performance. 

The best performance would be to not allow undo across value set at all, not go through the transaction manager or editor, and directly mutate the textnode.

I have a hard time seeing anything that involves the transaction manager and editor being able to get anywhere close to the performance that direct textnode mutation would give us...

Just to be clear on the numbers, this is what I see as of today.

Firefox nightly:
  Attachment 8846622 [details]: 595
  Attachment 8848015 [details]: 436

Chrome dev:
  Attachment 8846622 [details]: 30
  Attachment 8848015 [details]: 7

WebKit nightly:
  Attachment 8846622 [details]: 41
  Attachment 8848015 [details]: 8

so a 3-4x improvement will still leave us 3-10x slower than the competition on the microbenchmarks.  Whether that's enough to make speedometer no longer be seriously gated on this is a separate question.  If the number in comment 31 is an across-the-board 15-20% improvement to speedometer score, that's pretty huge.

It would be interesting to see what the across-the-board improvement on speedometer is if we make .value sets just no-op altogether; that might tell us what the best outcome we can expect from this bug is, on that test suite.
I agree with Boris.  Based on comment 27, it seems like we are the only browser engine that supports undoing setting of .value through script callers, if I understand correctly, so it should be possible to just stop supporting that.

If my understanding here is correct, I highly recommend looking into that first before spending time on optimizing anything else here, especially if those things fall under the editor/transaction manager triggered code that would just go away if we stopped supporting undoing in this case.
Depends on: 1365874
This shows where we should get to in performance, or at least close to it.
The hack doesn't seem to work too well with Speedometer. There seem to be different kinds of .value setter calls which end up doing something slow.
I will try with my fix again.  Is http://browserbench.org/Speedometer/ still v1.0?
Depends on: 1366218
or https://sm.duct.io/Full.html

(In reply to Olli Pettay [:smaug] from comment #37)
> The hack doesn't seem to work too well with Speedometer. There seems to be
> different kinds of .value setter calls which end up doing something slow
Except that I was profiling the wrong build. Yes, the hacky patch does help, but I'm getting 90 without the patch and 94 with it when running Speedometer, so it isn't that much; definitely something to fix anyhow.
No longer blocks: TimeToFirstPaint_FB
So I'm a little unclear on the current state.  We disabled undo in bug 1366218 but we're still using editor transactions and slow selection manipulation, right?  Again, the goal should be to just directly modify the textnode...
(In reply to Boris Zbarsky [:bz] (work week until 5/26) (if a patch has no decent message, automatic r-) from comment #40)
> So I'm a little unclear on the current state.  We disabled undo in bug
> 1366218 but we're still using editor transactions and slow selection
> manipulation, right?  Again, the goal should be to just directly modify the
> textnode...

Yes, we still use transactions now, and I will remove transactions for the script case in the end.  Before that, we need to improve some slow paths (Selection.collapse and calling GetValue twice).
Transactions will be slow because PlaceHolderTransaction is heavy.  If we can remove that for the script case, I might keep the transaction code for this situation.
Depends on: 1368888
Depends on: 1370806
Depends on: 1371277
Depends on: 1371492
Depends on: 1372761
Tested this again locally, and looks like fixing this would give around 3 points in Speedometer-2
(tested using my hacky patch).
Need to figure out a proper way to bypass pretty much all editor code here.
Depends on: 1375910
Depends on: 1376544
Depends on: 1383079
Depends on: 1383641
Depends on: 1383659
Depends on: 1385324
Depends on: 1385354
Depends on: 1385369
Depends on: 1385384
Depends on: 1385389
Depends on: 1385392
Depends on: 1385395
Depends on: 1385478
Depends on: 1385521
Depends on: 1373476
Depends on: 1385525
Depends on: 1385530
Depends on: 1385533
Depends on: 1385538
This is a profile of the current state of my local build on top of the patches I have for the dependencies I filed today: https://perfht.ml/2v8vLhK
Just so I have something to compare those numbers to...  Is that a profile of the testcase here, or of speedometer?  And if the former, what are the reported times for that build and for Chrome?

Or put another way, what fraction of that profile do we still need to make go away?
(In reply to Boris Zbarsky [:bz] from comment #45)
> Just so I have something to compare those numbers to...  Is that a profile
> of the testcase here, or of speedometer?

This is a profile of https://bugzilla.mozilla.org/attachment.cgi?id=8848015 reloaded a bunch of times.

> And if the former, what are the
> reported times for that build and for Chrome?

For me on trunk on an optimized build + some patches in the dependencies waiting for reviews, I get between 40-45ms.  On Chrome I get 7-10.  Our maximum divided by their minimum shows a 6x difference or so left to close.
Depends on: 1386381
Depends on: 1386411
Depends on: 1386468
Depends on: 1386471
Depends on: 1386472
Depends on: 1386480
Depends on: 1386484
Depends on: 1386485
With the patches I submitted in the dependencies on top of the same branch as comment 46, now I get between 24-27ms locally on attachment 8848015 [details].
Depends on: 1390342
Depends on: 1390349
Depends on: 984690
Depends on: 1390382
Depends on: 1390402
Depends on: 1390527
Whiteboard: [qf:p1] → [qf:p2]
On my environment (Win10), Chrome's score of attachment 8848015 [details] is 7~8, Nightly's score is 27~29.
Depends on: 1395701
Today's Nightly's score is 24~25.
(Given that this is more meta-y, I'm marking as P3 but that should in no way indicate a lack of importance or stature of the excellent work happening here!)
Priority: P1 → P3
Keywords: perf
Whiteboard: [qf:p2] → [qf:p1]
Whiteboard: [qf:p1] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
I'm seeing now 2x on Linux and 2.5x on Windows.
Looks like the recent changes shaved off yet another 10%.
Whiteboard: [qf:f60][qf:p1] → [qf:f61][qf:p1]
Summary: Setting input.value when input has focus is 40x slower than Chrome/Safari → [Meta] Setting input.value when input has focus is 40x slower than Chrome/Safari
Whiteboard: [qf:f61][qf:p1] → [qf:f64][qf:p1]
Whiteboard: [qf:f64][qf:p1] → [qf:meta][qf:p1]
Whiteboard: [qf:meta][qf:p1] → [qf:meta]
(tweaking the title to reflect current state better)
Summary: [Meta] Setting input.value when input has focus is 40x slower than Chrome/Safari → [Meta] Setting input.value when input has focus is 2x slower than Chrome/Safari
With today's build, Chrome's score is 10ms ~ 12ms, Nightly's score is 14ms ~ 15ms.
Summary: [Meta] Setting input.value when input has focus is 2x slower than Chrome/Safari → [Meta] Setting input.value when input has focus is 1.5x slower than Chrome/Safari
Depends on: 1529909
Component: DOM → DOM: Core & HTML

Chrome's score is 15ms ~ 18ms, Nightly's score is 19ms ~ 20ms. Our performance of this is really close to Chrome now.

(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(no pain of the broken bone, but the corset blocks me to concentrate) from comment #54)

> Chrome's score is 15ms ~ 18ms, Nightly's score is 19ms ~ 20ms. Our performance of this is really close to Chrome now.

Great work! It's amazing how far we've come when comparing this to my numbers in comment 0.

Reflecting current numbers from comment 54.

Summary: [Meta] Setting input.value when input has focus is 1.5x slower than Chrome/Safari → [Meta] Setting input.value when input has focus is 1.1x slower than Chrome/Safari

Because a lot of optimizations landed for bug 1597679, today's Nightly build's score on attachment 8848015 [details] is really close to Chrome's, but we need some more optimization for winning.

On my Win10 environment (Ryzen Threadripper 3960X), Nightly's score is 15-18ms on attachment 8848015 [details] and Chrome 79 score is 18-20ms. So, now, Nightly is faster than Chrome at least on my environment. But I've not checked on other environments (Win10 + Intel CPU, macOS, Linux and Android) yet.

On Mac I get pretty noisy numbers, but generally 22-24ms on Chrome and 26-30ms on today's nightly, with outliers down to 18 and up to 30 on both...

(This is a mid-2017 15" MBP with the 3.1 GHz i7-7920HQ, in case that matters.)

Thank you, Boris. I also tested on Win10 on Intel CPU. Then, Nightly is faster than Chrome.

If we're slower only on macOS, I guess the difference could be caused by heap allocation cost. I usually see that as a bottleneck only on macOS when I'm working on Quantum Flow.

On Linux, Nightly: 13-15ms, Chrome: 15-18ms.

But on Android, Nightly (Firefox Preview): 78-88ms, Chrome: 50-60ms. The profile is here: https://perfht.ml/2Qk31vc

Interesting.
Andrew, you might be familiar with compiler optimization settings we use on Android.
Wasn't there some plan to change optimization level?

(If one wants to find the relevant to this bug part of the profile, filter using HTMLInputElement.value)

Flags: needinfo?(acreskey)

(In reply to Olli Pettay [:smaug] from comment #62)

Interesting.
Andrew, you might be familiar with compiler optimization settings we use on Android.
Wasn't there some plan to change optimization level?

(If one wants to find the relevant to this bug part of the profile, filter using HTMLInputElement.value)

https://bugzilla.mozilla.org/show_bug.cgi?id=1591725 is the bug for optimization. For historical reasons, we still use -Os and -Oz due to the package size of Fennec. Other platforms use -O2 / -O3.

Yes, as Makoto mentioned, libxul.so on Android is still -Oz for aggressive size optimizations.
-Os and -O2 are being considered, but we're trying to find a way to measure the impact on retention, etc.
Those options are roughly 10% faster than -Oz, so that would help reduce or remove the gap.

Also, PGO just landed for MacOS (I believe Dec 19), so that should help as well.
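For anyone wanting to experiment locally, the optimization level can be overridden in a mozconfig; a sketch (the flag value is illustrative, and the per-platform defaults are as described in the comments above):

```
# Hypothetical local mozconfig override: build with -O2 instead of the
# Android default -Oz to estimate the speed/size trade-off.
ac_add_options --enable-optimize="-O2"
```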

Flags: needinfo?(acreskey)

Probably thanks to the fix for bug 1591725, the Fenix Nightly score is now 57-78ms on my tablet. But Chrome runs at 50ms-60ms, so we're still a little slower than Chrome.

On my Galaxy Tab S5e, Fenix Nightly is equal to or faster than Chrome. Perhaps stopping deleting the Text node from TextEditor and reorganizing the editor class inheritance improved the score.

Hmm, on my Pixel 3a, Fenix Nightly takes about 60~70ms, but Chrome takes about 50ms. On desktop, though, Firefox Nightly (8ms) is now clearly faster than Chrome (12ms). This meta bug has become too long, so I'll resolve it for Desktop and file a new bug for Fenix.

Status: NEW → RESOLVED
Closed: 3 years ago
Hardware: All → Desktop
Resolution: --- → FIXED
Performance Impact: --- → ?
Whiteboard: [qf:meta]