Closed Bug 1521696 Opened 10 months ago Closed 10 months ago

4.19 - 4.4% tp5o_webext responsiveness (linux64, linux64-qr) regression on push fe634797760796afcba95f49d9288c34f3de4865 (Fri Jan 18 2019)

Categories

(Core :: JavaScript Engine, defect, P2)

Unspecified
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 --- fixed
firefox67 --- fixed

People

(Reporter: igoldan, Assigned: arai)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=fe634797760796afcba95f49d9288c34f3de4865

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

4% tp5o_webext responsiveness linux64 opt e10s stylo 1.42 -> 1.48
4% tp5o_webext responsiveness linux64-qr opt e10s stylo 1.98 -> 2.06

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=18833

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Component: General → JavaScript Engine
Product: Testing → Core
Flags: needinfo?(arai.unmht)

:arai if one of the blocked bugs isn't related to this issue, do remove it.

where's the testcase?

and after opening 2 links in comment #1, what should I do?
they list many items, and the set (or maybe just order?) of items differ between them.

Flags: needinfo?(igoldan)

(In reply to Tooru Fujisawa [:arai] from comment #3)

where's the testcase?

You can find more about the tsvgx test from here. This link includes references to the test source code also.

and after opening 2 links in comment #1, what should I do?
they list many items, and the set (or maybe just order?) of items differ between them.

The Gecko profiles should allow you to pinpoint why/where this test regressed. These are our most precise perf debug tools we have.

Flags: needinfo?(igoldan)

:jmaher do you happen to know who's best suited for explaining Gecko profiles? Or maybe point to some docs to help :arai read the before/after profiles?

Flags: needinfo?(jmaher)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #4)

(In reply to Tooru Fujisawa [:arai] from comment #3)

where's the testcase?

You can find more about the tsvgx test from here. This link includes references to the test source code also.

Oh, sorry. I've provided the wrong link. This should have been the correct one for tp5o: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests#Responsiveness

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #4)

(In reply to Tooru Fujisawa [:arai] from comment #3)

where's the testcase?

You can find more about the tsvgx test from here. This link includes references to the test source code also.

Oh, sorry. I've provided the wrong link. This should have been the correct one for tp5o: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests#Responsiveness

...Messed up the buttons.

for profiles, I would talk to :mconley as a starting point- I only know enough to verify a link opens useful data.

:mconley, do you know more about how to use profiles to help understand a perf regression? Maybe any links to examples, docs, or pointers to others?

Flags: needinfo?(jmaher) → needinfo?(mconley)

before the profile, can you explain what the testcase is doing?
I don't know what kind of thing the testcase is doing/calculating, and what kind of score regressed.

this is described on the wiki:
https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests#Responsiveness

I honestly don't know much about the test, If what is on the wiki doesn't help answer your question, then I would ask :jimm

unfortunately the description on the wiki is very high-level of "how", and it doesn't mention about the testcase itself.
so I cannot figure out how/why it's related to my change.

jimm, can you tell me the detail of the testcase?

Flags: needinfo?(jmathies)

this is running a pass through tp5 pages (49 pages from 2011) which loads each page 25 times, then moves onto the next. during that process we collect metrics, one of which is responsiveness.

Thanks.

so the "response time" mentioned in https://wiki.mozilla.org/Performance/Snappy#Current_Infrastructure somehow regressed while loading those pages (or after loaded the page?), and the profiles above are for each page's case, right?

is there any hint which one regressed in those pages? or do we just sum all of them up before calculating the score?
(of course I can investigate all of them one by one, but if there's any hint, that's really helpful.

I think this is one of those unfortunate cases where, even though we have profiles, it's not clear where the problem is.

This is because the regression isn't on a particular page, but on a "global responsiveness" metric that we measure when loading the pages from the tp5o set.

That responsiveness metric is gathered by sending tracer events through the event loop in the parent process (via EventTracer), and then having Talos parse the log for tracer events that took longer than the threshold (20ms) to get serviced.

So what Talos is reporting here, I think, is that the main thread in the parent process is servicing the tracer events more slowly.

Flags: needinfo?(mconley)

thank you for the info.

so looks like I first need to split the change into smaller set and run this test on all of them, to make it easier to investigate further.
fortunately the changes can be split into at least 4 or 5 parts.
I'll do that as a first step.

but I think I cannot solve this within 3 days.
feel free to back both of them out.

Flags: needinfo?(arai.unmht)

apparently bug 1501576 is not related.
I get the regression on bug 1501577 push.

Assignee: nobody → arai.unmht
No longer blocks: 1501576
Status: NEW → ASSIGNED

okay, I figured out.
the refactoring in bug 1501577 has some bug that produces different bytecode.
will fix shortly.

Flags: needinfo?(jmathies)
Priority: -- → P2
Pushed by arai_a@mac.com:
https://hg.mozilla.org/integration/autoland/rev/07d1d28e3b0a
Fix PropertyEmitter not to use JSOP_OBJECT when the objest has accessor. r=jorendorff
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

I'm trying to verify this bug. I think I noticed the improvement, but it's possible another regression landed just a couple of hours before the push from comment 20. If that's the case, I'll mark this as VERIFIED.

can you describe some more about the issue (..if there is. actually I don't understand your comment) ?

Flags: needinfo?(igoldan)

(In reply to Tooru Fujisawa [:arai] from comment #23)

can you describe some more about the issue (..if there is. actually I don't understand your comment) ?

There shouldn't be any issue here. I'm logging the status for the bugs I'm verifying.
While checking this bug, I may have discovered a different regression (caused by another bug) which Perfherder didnt't catch.

Flags: needinfo?(igoldan)

AFAICT, this needs a Beta uplift request?

Flags: needinfo?(arai.unmht)

Comment on attachment 9038760 [details]
Bug 1521696 - Fix PropertyEmitter not to use JSOP_OBJECT when the objest has accessor. r?jorendorff

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

Bug 1501577

User impact if declined

Performance regression in JS execution

Is this code covered by automated tests?

Yes

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Fixed wrong refactoring, and the code should behave just like the pre-rafactoring

String changes made/needed

Flags: needinfo?(arai.unmht)
Attachment #9038760 - Flags: approval-mozilla-beta?

Comment on attachment 9038760 [details]
Bug 1521696 - Fix PropertyEmitter not to use JSOP_OBJECT when the objest has accessor. r?jorendorff

Fix for perf regression in 66, let's uplift for beta 6.

Attachment #9038760 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.