Closed Bug 1274588 Opened 4 years ago Closed 4 years ago

2.45% damp (windows7-32) regression on push 2e29695fee87 (Tue May 17 2016)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 + fixed

People

(Reporter: jmaher, Assigned: nbp)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(4 files)

Talos has detected a Firefox performance regression from push 2e29695fee87. As author of one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=1241

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#DAMP

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win32 -u none -t g2-e10s[Windows 7] --rebuild 5  # add "mozharness: --spsProfile" to generate profile data

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a damp

(add --e10s to run tests in e10s mode)

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:

https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
I am having a hard time getting all the alerts lined up, but a compare to the previous push yields:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=a7f37ab1e0c3&newProject=mozilla-inbound&newRevision=2e29695fee87&framework=1

this isn't pgo though, and the regressions are slightly different there.

what I see in the compare view is;
5% win7 damp
3% winxp session restore [auto restore] e10s


we have a few other damp regressions on the tree around the same time, same with a ts_paint- looking to see if those are related at all.

:nbp, can you take a look at this and determine what is the root cause and how we should resolve this issue?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Joel Maher (:jmaher) from comment #1)
> I am having a hard time getting all the alerts lined up, but a compare to
> the previous push yields:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> inbound&originalRevision=a7f37ab1e0c3&newProject=mozilla-
> inbound&newRevision=2e29695fee87&framework=1

The Base & New are miss-leading as it does not show all the patches in the patch series.

This issue concern all patches from Bug 1261826, except for the part 10.1 (which is not landed yet) and part 5 (which is not going to land).  Bug 1261826 is one step toward enabling a Jit optimization (Bug 1229813).

Part 1, 1.5, 2 and 6 are minimal changes, and I do not expect them to have any major performance impact.

Part 9 and 10, are not supposed to be used today, as these are adding branches for code which is not yet being executed (unless code coverage is enabled by the Debugger).  I would not expect these to have any impact on performances (except code-size wise)

The most notable impact of Bug 1261826 is that the bytecode size increased by 5.8%, as mentioned in Bug 1261826 comment 36.

Another possible reasons might be caused by the lack of inlined functions for the new code which unifies the 2 different ways for doing jumps in the bytecode.  And/or that fact that we chose the most general way of handling instead of specializing cases where there is only a single jump instruction.

If this is the bytecode size (since part 8), then I might have ways to reduce the impact, but not to remove it completely. Checking performances on part 7 should highlight if the issue comes from the part 8 to 10.

If this is the lack of inlined function, then the problem should already exists since the part 4.  Thus checking performances on part 3, and part 7 should highlight if the issue is already present between part 4 and 7.  Note, I would expect talos performances on part 3, to behave the same way as on the "BASE" changeset.

(In reply to Joel Maher (:jmaher) from comment #1)
> :nbp, can you take a look at this and determine what is the root cause and
> how we should resolve this issue?

We should send try runs on part 7 (and 3 to remove doubts if present on part 7), to know if the problems comes from part 4, or part 8.  Once we have results we should decide how to fix it.
I've pushed part 7 to try, in order to compare talos results:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dcf5d8dcfe83a27546304366091f2b889f950201

I will check later today, or on Monday to see how we should continue.
I don't see a difference in the try push, so I assumed part 7 isn't changing numbers.
In the mean time I will work on a patch to fold consecutive jump targets, and if this is still an issue, I will look for removing fallthrough jump.
Try is no longer open for the moment …

(In reply to Nicolas B. Pierron [:nbp] from comment #5)
> In the mean time I will work on a patch to fold consecutive jump targets,
> and if this is still an issue, I will look for removing fallthrough jump.
Comment on attachment 8755926 [details] [diff] [review]
part 0.1 - Add MOZ_MUST_USE to popStatement functions.

Review of attachment 8755926 [details] [diff] [review]:
-----------------------------------------------------------------

Sure.
Attachment #8755926 - Flags: review?(jorendorff) → review+
Tracking 49+ since it would be great to address this perf issue.
Attachment #8756471 - Flags: review?(jorendorff)
Attachment #8756472 - Flags: review?(bhackett1024) → review+
this is exciting- can we get this landed and resolved? :)
(In reply to Joel Maher (:jmaher) from comment #14)
> this is exciting- can we get this landed and resolved? :)

Part 1.0 and 1.1 are the 2 mandatory patches for fixing this issue.
oh sorry, I saw r+ in the last comment and didn't look at the other patches- I will wait :)
Comment on attachment 8756471 [details] [diff] [review]
part 1.0 - Alias consecutive jump targets.

Review of attachment 8756471 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/BytecodeEmitter.h
@@ +379,5 @@
> +    // Check if the last emitted opcode is a jump target.
> +    bool lastOpcodeIsJumpTarget() const {
> +        return offset() - current->lastTarget.offset == ptrdiff_t(JSOP_JUMPTARGET_LENGTH);
> +    }
> +    // JumpTarget should not be part of the emitted statement, as they can be

style nit: blank line before comment

@@ +383,5 @@
> +    // JumpTarget should not be part of the emitted statement, as they can be
> +    // aliased by multiple statements. If we included the jump target as part of
> +    // the statement we might have issues where the enclosing statement might
> +    // not contain all the opcodes of the enclosed statements.
> +    ptrdiff_t lastStmtOffset() const {

This should be called lastNonJumpTargetOffset(), as lastStmtOffset() seems to be saying something else.

Verbose, I know, but it's only called twice.
Attachment #8756471 - Flags: review?(jorendorff) → review+
Attachment #8756473 - Flags: review?(jorendorff) → review+
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd714f1275be
part 0.1 - Add MOZ_MUST_USE to popStatement functions. r=jorendorff
https://hg.mozilla.org/integration/mozilla-inbound/rev/0792b0daef2a
part 1.0 - Alias consecutive jump targets. r=jorendorff
https://hg.mozilla.org/integration/mozilla-inbound/rev/32eb8332d618
part 1.1 - Update code coverage reports to accounts for the aliased jump targets. r=bhackett
https://hg.mozilla.org/integration/mozilla-inbound/rev/e838c11fd532
part 2 - Rely on aliased jump target to remove extra popStatement function. r=jorendorff
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Flags: needinfo?(nicolas.b.pierron)
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
I went to verify this and find alerts to show improvements and the graphs look the same, there are no improvements.

Possibly there is not much else to do?
(In reply to Joel Maher (:jmaher) from comment #20)
> I went to verify this and find alerts to show improvements and the graphs
> look the same, there are no improvements.
> 
> Possibly there is not much else to do?

There is still one option.  Currently we are producing jump-target opcodes for fallthrough cases of jumps, but intuitively this should not be needed if the jump target is only a fallthrough.  Doing so, implies that counters instrumentation should be placed on the branch-side instead of being on the target side.

I opened bug 1278207, to address this issue as this is likely a larger patch.
ok, thanks for the information!  Lets leave this bug open and see how bug 1278207 affects this test.
(In reply to Joel Maher (:jmaher) from comment #22)
> ok, thanks for the information!  Lets leave this bug open and see how bug
> 1278207 affects this test.

While useful to keep tracking this, we should keep in mind that the goal is not the test numbers, but rather making sure that performance tradeoffs implied by a patch are made explicit and accounted for (of course, with possible followups in case the owner thinks it's due).
(In reply to Joel Maher (:jmaher) from comment #20)
> I went to verify this and find alerts to show improvements and the graphs
> look the same, there are no improvements.
> 
> Possibly there is not much else to do?

I fixed another regression caused by the same patch set. (Bug 1273955)  It might be plausible that this help mitigate the regression seen here as well.
Are you thinking that the other fix introduced a regression which nullified the improvement we were expecting?  If that is the case, are we fine with wontfixing this, or should we do more investigation?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Joel Maher (:jmaher) from comment #25)
> Are you thinking that the other fix introduced a regression which nullified
> the improvement we were expecting?  If that is the case, are we fine with
> wontfixing this, or should we do more investigation?

I am thinking that other fix, might also fix this issue.
Flags: needinfo?(nicolas.b.pierron)
sorry for the confusion, I will wait for bug 1278207 then!
Depends on: 1280710
Version: 47 Branch → Trunk
You need to log in before you can comment on or make changes to this bug.