Open Bug 1394761 Opened 7 years ago Updated 2 years ago

Hasal input latency case facebook_ail_scroll_home_1_txt regressed median from 22ms to 44ms on 7/29

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

Performance Impact low
Tracking Status
firefox57 --- affected

People

(Reporter: kanru, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [QRC][QRC_NeedAnalysis])

Whiteboard: [QRC][QRC_NeedAnalysis]
Mark, could you help to create a profile data? Thanks.
Flags: needinfo?(myen)
facebook_ail_scroll_home_1_txt

Description:
0. Logged in with Whitehat account, Facebook Homepage
1. Scroll down

Short Firefox Profiling Data
http://bit.ly/2wjW6Jz
Flags: needinfo?(myen)
(In reply to myen from comment #2)
> facebook_ail_scroll_home_1_txt
> 
> Description:
> 0. Logged in with Whitehat account, Facebook Homepage
> 1. Scroll down
> 
> Short Firefox Profiling Data
> http://bit.ly/2wjW6Jz

It seems that last profile data did not show some information.

Please refer to this new one.


New Short Firefox Profiling Data
http://bit.ly/2xzAS9q
The profile data was recorded on the reference device (ACER laptop) with latest nightly (20170830).
Sean, could you help to analyze the profile data? Thanks.
Assignee: nobody → sstangl
Flags: needinfo?(sstangl)
Do you have a profile that shows the previous 22ms behavior?

If you don't -- would you happen to know a specific commit I could build locally to exhibit that behavior? I would also need steps to reproduce the Hasal testcase.
Flags: needinfo?(bchien)
It's worth noting that the bytecode cache landed in that regression range.

Nicolas, could you take a look at the profile in Comment 3 and see if you notice any time occupied by caching?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Sean Stangl [:sstangl] from comment #6)
> Do you have a profile that shows the previous 22ms behavior?
> 
> If you don't -- would you happen to know a specific commit I could build
> locally to exhibit that behavior? I would also need steps to reproduce the
> Hasal testcase.

Hi Sean, 

Here is some regressed information from Kanru's bug description. 
Regressed commits range: https://hg.mozilla.org/mozilla-central/log?rev=5845151f1a2c%3A2fba314d7de7
Known good: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=5845151f1a2cd00957fdd48e204542ccbdfaba1e&selectedJob=119171754&group_state=expanded
Known bad: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=2fba314d7de77ad8ab693a2ea0112c0cda5dd564&selectedJob=122591464&group_state=expanded

Hi Mark, 

could you help to create a profile data based on known good and known bad builds? Thanks.
Flags: needinfo?(bchien) → needinfo?(myen)
(In reply to Sean Stangl [:sstangl] from comment #7)
> It's worth noting that the bytecode cache landed in that regression range.
> 
> Nicolas, could you take a look at the profile in Comment 3 and see if you
> notice any time occupied by caching?

I do not see any stack reporting any caching from the JSBC.

However, the code coverage is enabled in the profile reported in comment 3:

    Interpret
    \- js::ScriptCounts::maybeGetPCCounts(unsigned __int64)

This is not supposed to happen unless the code coverage is enabled to investigate the coverage of a given test case in our infra, or if a Debugger object is attached, and that it request the code coverage information, or if a given environment variable ask to dump the code coverage in a directory.

If any of these code coverage collection is enabled on the interpreter, then *IonMonkey compilation is disabled*. Which implies that all executions would remain in Baseline.

This is not supposed to happen in a normal build, because the code coverage collection divide by 2 the performance of the interpreter.

(In reply to Bobby Chien [:bchien] from comment #8)
> Here is some regressed information from Kanru's bug description. 
> Regressed commits range:
> https://hg.mozilla.org/mozilla-central/log?rev=5845151f1a2c%3A2fba314d7de7

This new range also excludes Bug 900784 (JSBC), which by the way got backed out since.
Flags: needinfo?(nicolas.b.pierron)
myen, would you be able to set a breakpoint in
  http://searchfox.org/mozilla-central/source/js/src/jscompartment.cpp#1323

And report which one of the 3 conditions is true?
None of these should be set to true, as this slow down the execution of the interpreter by a factor of 2.

Note, that this do exists in both profiles, so this is unlikely to be the primary reason for the slow down you observe.
Flags: needinfo?(myen)
Flags: needinfo?(myen)
(In reply to Nicolas B. Pierron [:nbp] from comment #11)
> myen, would you be able to set a breakpoint in
>   http://searchfox.org/mozilla-central/source/js/src/jscompartment.cpp#1323
> 
> And report which one of the 3 conditions is true?
> None of these should be set to true, as this slow down the execution of the
> interpreter by a factor of 2.
> 
> Note, that this do exists in both profiles, so this is unlikely to be the
> primary reason for the slow down you observe.

Need help!
Flags: needinfo?(sho)
Ok, we will borrow the dev environment from Kan-ru to do the following test. 
Will update the result after we finish the test.
Thanks
Flags: needinfo?(sho)
(In reply to Nicolas B. Pierron [:nbp] from comment #9)
> (In reply to Sean Stangl [:sstangl] from comment #7)
> > It's worth noting that the bytecode cache landed in that regression range.
> > 
> > Nicolas, could you take a look at the profile in Comment 3 and see if you
> > notice any time occupied by caching?
> 
> I do not see any stack reporting any caching from the JSBC.
> 
> However, the code coverage is enabled in the profile reported in comment 3:
> 
>     Interpret
>     \- js::ScriptCounts::maybeGetPCCounts(unsigned __int64)
> 
> This is not supposed to happen unless the code coverage is enabled to
> investigate the coverage of a given test case in our infra, or if a Debugger
> object is attached, and that it request the code coverage information, or if
> a given environment variable ask to dump the code coverage in a directory.
> 
> If any of these code coverage collection is enabled on the interpreter, then
> *IonMonkey compilation is disabled*. Which implies that all executions would
> remain in Baseline.
> 
> This is not supposed to happen in a normal build, because the code coverage
> collection divide by 2 the performance of the interpreter.

I tried to inspect a Nightly build and indeed we call js::ScriptCounts::maybeGetPCCounts(unsigned __int64) a lot!

I found in JSCompartment::collectCoverage(), collectCoverageForPGO() returns true.

Is it left enabled by accident?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #14)
> (In reply to Nicolas B. Pierron [:nbp] from comment #9)
> > However, the code coverage is enabled in the profile reported in comment 3:
> > 
> >     Interpret
> >     \- js::ScriptCounts::maybeGetPCCounts(unsigned __int64)
> > 
> I tried to inspect a Nightly build and indeed we call
> js::ScriptCounts::maybeGetPCCounts(unsigned __int64) a lot!
> 
> I found in JSCompartment::collectCoverage(), collectCoverageForPGO() returns
> true.
> 
> Is it left enabled by accident?

The fact that collectCoverage returns true is normal, this is used to collect code coverage information while we are running in baseline, in which case maybeGetPCCounts is called during code generation, to bake in the counter. This is also used in IonBuilder to annotate block which are unused in order to remove them and improve the compilation time and quality.

The fact that collectCoverageForDebug returns true is not normal. Seeing stacks where the Interpret function calls into these maybeGetPCCounts, implies that collectCoverageForDebug returned true.

http://searchfox.org/mozilla-central/rev/44c693914255638d74bcf1ec3b0bcd448dfab2fd/js/src/vm/Interpreter.cpp#1830
http://searchfox.org/mozilla-central/rev/44c693914255638d74bcf1ec3b0bcd448dfab2fd/js/src/vm/Interpreter.cpp#1842
http://searchfox.org/mozilla-central/search?q=symbol:_ZNK13JSCompartment23collectCoverageForDebugEv&redirect=false

I was not able to reproduce this issue while testing locally.  Locally "collectCoverageForDebug" always returns false.
Flags: needinfo?(nicolas.b.pierron)
I can't reproduce collectCoverageForDebug() returning true either.
The stack I saw looks exactly like

>     Interpret
>     \- js::ScriptCounts::maybeGetPCCounts(unsigned __int64)

And its called from here

https://searchfox.org/mozilla-central/source/js/src/vm/Interpreter.cpp#2020

> CASE(JSOP_TRY)
> CASE(JSOP_JUMPTARGET)
> CASE(JSOP_LOOPHEAD)
> {
>     MOZ_ASSERT(CodeSpec[*REGS.pc].length == 1);
>     COUNT_COVERAGE();
>     ADVANCE_AND_DISPATCH(1);
> }

(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> (In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #14)
> > (In reply to Nicolas B. Pierron [:nbp] from comment #9)
> > > However, the code coverage is enabled in the profile reported in comment 3:
> > > 
> > >     Interpret
> > >     \- js::ScriptCounts::maybeGetPCCounts(unsigned __int64)
> > > 
> > I tried to inspect a Nightly build and indeed we call
> > js::ScriptCounts::maybeGetPCCounts(unsigned __int64) a lot!
> > 
> > I found in JSCompartment::collectCoverage(), collectCoverageForPGO() returns
> > true.
> > 
> > Is it left enabled by accident?
> 
> The fact that collectCoverage returns true is normal, this is used to
> collect code coverage information while we are running in baseline, in which
> case maybeGetPCCounts is called during code generation, to bake in the
> counter. This is also used in IonBuilder to annotate block which are unused
> in order to remove them and improve the compilation time and quality.
> 
> The fact that collectCoverageForDebug returns true is not normal. Seeing
> stacks where the Interpret function calls into these maybeGetPCCounts,
> implies that collectCoverageForDebug returned true.

I think it does not imply collectCoverageForDebug returned true since we also init script counts here

https://searchfox.org/mozilla-central/source/js/src/jit/BaselineCompiler.cpp#108

And like I found in comment 14 in JSCompartment::collectCoverage(), collectCoverageForPGO() returns true.

Once hasScriptCount_ is true we do not reset it back to false. If maybeGetPCCounts is supposed to be called only during code generation, it does end up being called during normal interpreting.
Flags: needinfo?(nicolas.b.pierron)
Depends on: 1398738
As mentioned in comment 11, as this appears in both profiles, this is unlikely to be the source of the regression.

(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #17)
> I think it does not imply collectCoverageForDebug returned true since we
> also init script counts here
> 
> https://searchfox.org/mozilla-central/source/js/src/jit/BaselineCompiler.
> cpp#108

Most of the time it does, the only exception being Baseline scripts which are garbage collected.
I opened Bug 1398738 to solve this code coverage issue.
Flags: needinfo?(nicolas.b.pierron)
Component: General → JavaScript Engine
Whiteboard: [QRC][QRC_NeedAnalysis] → [QRC][QRC_NeedAnalysis][qf]
Priority: -- → P3
Whiteboard: [QRC][QRC_NeedAnalysis][qf] → [QRC][QRC_NeedAnalysis][qf:p3]
Keywords: perf
Performance Impact: --- → P3
Whiteboard: [QRC][QRC_NeedAnalysis][qf:p3] → [QRC][QRC_NeedAnalysis]

The bug assignee didn't login in Bugzilla in the last 7 months.
:sdetar, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: sstangl → nobody
Flags: needinfo?(sdetar)
Flags: needinfo?(sstangl)
Flags: needinfo?(sdetar)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.