Closed Bug 1691426 - Opened 4 years ago, Closed 4 years ago

11.42 - 15.21% booking ContentfulSpeedIndex / booking FirstVisualChange / booking fcp (android-hw-p2-8-0-android-aarch64-shippable) regression on push 0b260a20b5834dc1122445d6b9e4f6de68dbf5b8 (Wed February 3 2021)

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

Tracking

RESOLVED FIXED
88 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- unaffected
firefox86 --- unaffected
firefox87 --- wontfix
firefox88 --- fixed

People

(Reporter: Bebe, Assigned: asumu)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 file, 1 obsolete file)

Perfherder has detected a browsertime performance regression from push 0b260a20b5834dc1122445d6b9e4f6de68dbf5b8. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
15% booking fcp android-hw-p2-8-0-android-aarch64-shippable nocondprof warm webrender 202.46 -> 233.25
12% booking FirstVisualChange android-hw-p2-8-0-android-aarch64-shippable nocondprof warm webrender 237.71 -> 267.00
11% booking ContentfulSpeedIndex android-hw-p2-8-0-android-aarch64-shippable nocondprof warm webrender 268.88 -> 299.58
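
(For reference, the ratio column appears to be the relative change between the old and new values; for booking fcp, for example, (233.25 - 202.46) / 202.46 ≈ 0.152, i.e. roughly the 15% reported above.)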

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(idimitriou)
Assignee: nobody → asumu
Flags: needinfo?(idimitriou)

Hi, I'm taking assignment of this issue as I pushed the patches that triggered the alert.

In the push that triggered this, I suspect the first patch (https://hg.mozilla.org/integration/autoland/rev/0b260a20b5834dc1122445d6b9e4f6de68dbf5b8) is the one that caused the regression, because it affects JS code more generally, while the other patch is Wasm-specific.

To test this, I've been running some browsertime tests with the patch reverted, and, if I'm running them correctly, they seem to regain some of the lost performance: https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&originalRevision=2f575c3c4353b2c5adc27519e55ca4f3c125a1d2&newProject=try&newRevision=16568273aa9c8778c1912c958136f6239fa01cea&framework=13

A potential solution is to revert the patch along with some minor changes (to avoid breaking the other patches in the patchset). I think the patch can be reverted this way without breaking the current state of wasm exception handling, since it is needed mainly to accommodate a future feature that is part of bug 1690965.

If we need it we need it, regression or no...

Priority: -- → P1

Set release status flags based on info from the regressing bug 1335652

(In reply to Asumu Takikawa from comment #1)

Can you confirm whether the regression is related to increasing the size of ErrorObject, or to the extra initialization work done in ErrorObject::init? I'd be curious whether the extra field is pushing us into a different object size that is slower, or whether it's possible to not initialize this field in ErrorObject::init and rely on some default value.

(In reply to Ryan Hunt [:rhunt] from comment #4)

That's a good point; I hadn't tested that before, but I have now. I believe this perfherder comparison shows that removing just the initialization (the diff for the change) improves the booking tests: https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&originalRevision=2f575c3c4353b2c5adc27519e55ca4f3c125a1d2&newProject=try&newRevision=cc363575fcec2a883ba77db93033a96e4441ff39&framework=13

In that case, we could leave the slot uninitialized. I believe it would work to have fromWasmTrap() check whether the slot is uninitialized (its contents pass isUndefined()), initialize it to false in that case, and then do the read. That way the initialization should only happen when errors pass through Wasm.
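
A rough sketch of that approach, as a C++ illustration (the slot name WASM_TRAP_SLOT and the setter name setFromWasmTrap() below are placeholders, not necessarily the identifiers the actual patch will use):

// Sketch only: ErrorObject::init no longer touches the reserved slot, so the
// flag accessors materialize it on demand.
bool ErrorObject::fromWasmTrap() {
  Value v = getReservedSlot(WASM_TRAP_SLOT);
  if (v.isUndefined()) {
    // First query: lazily default the flag to false so later reads see a
    // boolean value, then report "not a wasm trap".
    setReservedSlot(WASM_TRAP_SLOT, BooleanValue(false));
    return false;
  }
  return v.toBoolean();
}

void ErrorObject::setFromWasmTrap() {
  // Only errors that actually pass through Wasm pay for the slot write.
  setReservedSlot(WASM_TRAP_SLOT, BooleanValue(true));
}

That way plain JS error creation stays on the fast path, and the cost is confined to the Wasm trap machinery.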

This patch intends to fix a performance regression on some tests caused
by the overhead of initializing the extra wasm trap slot on
ErrorObjects. The slot initialization is delayed until the flag is
read or set.

I've attached a patch to address this bug. I believe it improves performance significantly (~9%) for at least one of the tests in the perfherder alert (the booking fcp android-hw-p2-8-0-android-aarch64-shippable opt nocondprof warm webrender case) and moderately (~5%) for the other two (booking FirstVisualChange and booking ContentfulSpeedIndex on the same platform and options).

This is based on this perfherder comparison: https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&originalRevision=2f575c3c4353b2c5adc27519e55ca4f3c125a1d2&newProject=try&newRevision=8c1b3317be67b524efeb3a9aab13785e12db9d2f&framework=13 (looking at the browsertime tests filtered by booking)

I've tried to improve the latter two tests further, but I didn't see an obvious way to do so without breaking Wasm exception functionality (commenting out the code in WasmBuiltins.cpp that queries the ErrorObject trap flag does yield more speedup).

Ryan, will this problem impact FF88? If so, can we get this reviewed and landed before Monday?

Flags: needinfo?(rhunt)

Yeah, I just reviewed it and I think we should be able to land this soon.

Flags: needinfo?(rhunt)

I've discussed this with Asumu and we're having a hard time proving that any of this work fixes the regression. A perf run that he did with the current patch actually showed a slight regression [1]. Further tweaks to make this even lazier didn't seem to do the trick either [2]. I then did a perf run with the commit completely backed out, and that didn't seem to regress the test, but there was no clear improvement either [3].

At this point, I believe the safest course is to just land the backout patch conservatively and figure this out with more time.

[1] https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=6c316d5193c9ecfeef0450d5ce4a717e19960c0e&newProject=try&newRevision=7e6eca8149edabc51ea4f7fe472d1a8ae4773b97&framework=13
[2] https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=6c316d5193c9ecfeef0450d5ce4a717e19960c0e&newProject=try&newRevision=08912a43754bc63d359756869848b67e33688dc0&framework=13
[3] https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=f91503f7a542269c398e70a5d3e661d50afea5b3&newProject=try&newRevision=1eb049ffdbd4d16e6f0f61b3bf70e0e70c31369f&framework=13

This commit backs out the changes to add an isWasmTrap() flag on
ErrorObject, to fix a performance regression. Some form of the flag will
need to be added back, but for now this is the easiest fix for
release.

If backing it out did not improve perf, then, provided we believe that perfherder gives good results on the Pixel 2, the patch wasn't responsible for the regression in the first place. With that in mind, could it be that other patches in the set are to blame after all? Does backing them all out (assuming that's practical) restore performance?

(And it's not like we won't need the field in the structure. Let's take this up with, e.g., jorendorff + jonco as soon as the freeze is over and, absent any good ideas, plan to reland the original patch after that.)

The full backout of all exception-handling patches (required because they build upon the two patches in the regressing push) yields this perfherder result [1]. There are possibly 1-3% improvements, but confidence is low after 15 retriggers on both sides, so I would almost attribute it to noise.

So at this point, I really don't know if the original patches caused this regression.

[1] https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=f91503f7a542269c398e70a5d3e661d50afea5b3&newProject=try&newRevision=dcaf04d27734e384522694086b5acf5d63dfc0c4&framework=13

I did a perf comparison of the original regressing commits in autoland versus a commit with them immediately backed out, and was able to see the cited regression [1]. So something is magical about that spot in the tree, or about the commits that have happened since, that causes the backout to be effective only there.

Now that the soft freeze is over, I think we should try to land Asumu's patch that removes the slot and see if that makes a difference. If it doesn't, then we should just consider this an acceptable regression, if it even exists. We need this behavior for the wasm exception-handling proposal, and this is the most efficient way to add it.

[1] https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=c1ca5b3dc8afe626ec8b2965d0ffb0877e300847&newProject=try&newRevision=c9f611b375aabbc4c29b00aab08b3672e6925039&framework=13

Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/autoland/rev/1b682d807c40 address perf regression for ErrorObject wasm trap flag r=rhunt
Attachment #9204378 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch

The patch landed in nightly, and beta is affected.
:asumu, is this bug important enough to require an uplift?
If not, please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(asumu)

After the patch landed, I've been looking at whether it had an effect. It seems inconclusive from just comparing the previous mozilla-central push to the one with the patch (it actually looks worse), but looking at the trends it seems more positive:

https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=0&highlightChangelogData=1&highlightedRevisions=916497e295fe&&highlightedRevisions=0b260a20b5834&highlightedRevisions=b422dd886e036&highlightedRevisions=287db0596e0b0&series=mozilla-central,2895566,1,13&series=autoland,2891495,1,13&timerange=2592000
https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightedRevisions=916497e295fe&&highlightedRevisions=0b260a20b5834&highlightedRevisions=287db0596e0b0&highlightedRevisions=b422dd886e036&series=mozilla-central,2895651,1,13&series=autoland,2891544,1,13&timerange=2592000
https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightedRevisions=916497e295fe&&highlightedRevisions=0b260a20b5834&highlightedRevisions=287db0596e0b0&highlightedRevisions=b422dd886e036&series=mozilla-central,2895649,1,13&series=autoland,2891542,1,13&timerange=2592000

(The highlighted revisions are the relevant ones, such as the revision from the original alert and the autoland/central pushes with the patch.)

These plots show the trends (with both mozilla-central and autoland plotted) for the three tests from this bug's alert. It does look like they have trended down around the time the patch landed (especially on autoland), but it's hard to tell exactly.

I'm not sure if that means it is important enough for an uplift. Do you have any thoughts on this, Ryan?

Flags: needinfo?(asumu) → needinfo?(rhunt)

Hmm, there does seem to be a slight improvement in the trend here. But it's slight, and this regression has been so tricky to figure out reliably that I think we should just let this be for now.

Flags: needinfo?(rhunt)

Ok thanks, that makes sense to me as well. I'll set status_firefox87 to wontfix now then.

Has Regression Range: --- → yes