Determine correct nglayout.initialpaint.delay for modern platforms

Status: RESOLVED FIXED
Product: Core
Component: Layout
Reported: 10 months ago
Last modified: 3 months ago

People: Reporter: jet; Assignee: neerja; NeedInfo

Tracking: Depends on: 2 bugs; Blocks: 1 bug
Version: Trunk
Target Milestone: mozilla53
Platform: All / Unspecified

Firefox Tracking Flags: firefox48 wontfix, firefox49 wontfix, firefox50 wontfix, firefox51 wontfix, firefox52 wontfix, firefox53 fixed

Attachments: 2 attachments, 1 obsolete attachment

(Reporter)

Description

10 months ago
nglayout.initialpaint.delay is currently set to 250ms for all platforms. That preference is the amount of time we'll allow a page to finish its initial layout before painting to the screen. By setting it to a lower number, the user can potentially see content sooner, though that content may change again (causing FOUT or other visible layout shifts.)

I don't know how we got to 250ms as a default, or if that number even makes sense anymore. I also don't know whether this should remain a constant or be dynamically computed based on system capabilities.
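
For anyone who wants to experiment locally, the pref can be overridden in a profile's user.js. This is a minimal sketch only; the 5 value is just the number proposed later in this bug, and any integer number of milliseconds works:

    // user.js -- local override for testing the initial paint delay.
    // nglayout.initialpaint.delay is read as an integer number of milliseconds.
    user_pref("nglayout.initialpaint.delay", 5);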
(Reporter)

Updated

10 months ago
Blocks: 1283300
(Reporter)

Comment 1

10 months ago
Created attachment 8767332 [details] [diff] [review]
Sets the default value for nglayout.initialpaint.delay to 5ms (was 250 ms) per user research conclusions
Attachment #8767332 - Flags: review?(matt.woodrow)
Since 5ms is substantially smaller than a refresh tick, we might want to follow up by getting rid of the entire delay mechanism.
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #2)
> Since 5ms is substantially smaller than a refresh tick, we might want to
> follow up by getting rid of the entire delay mechanism.

Yeah I agree. Once we've had the pref change on release for a bit without fallout, then we should rip the remaining code out.
Attachment #8767332 - Flags: review?(matt.woodrow) → review+
(Reporter)

Comment 4

10 months ago
Try results coming up:

remote: View your changes here:
remote:   https://hg.mozilla.org/try/rev/9c41428cc88b
remote:   https://hg.mozilla.org/try/rev/ff5c0c16a40e
remote: 
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff5c0c16a40e
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ff5c0c16a40e

Comment 5

10 months ago
Pushed by jvillegas@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/24ec7aba60e7
Sets the default value for nglayout.initialpaint.delay to 5ms (was 250 ms) per user research conclusions. r=mattwoodrow
(Reporter)

Updated

10 months ago
Assignee: nobody → bugs
Some hg blame history:

2001-04-25 - bug 77002 originally defined PAINTLOCK_EVENT_DELAY as 1200 ms.
2001-05-01 - bug 76495 changed it to 1000 ms.
2001-05-31 - bug 78695 changed it back to 1200 ms.
2002-12-26 - bug 180241 changed it to 250 ms "to match Phoenix and Chimera".
2016-07-06 - bug 1283302 (this bug) changed it to 5 ms.
Blocks: 180241
status-firefox48: --- → affected
status-firefox49: --- → affected

Comment 7

10 months ago
Sorry, had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=31325455&repo=mozilla-inbound
Flags: needinfo?(bugs)

Comment 8

10 months ago
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1052561d1faf
Backed out changeset 24ec7aba60e7 for testfailures in 518172-1b.html

Comment 9

10 months ago
this caused a 4% tp5o regression on linux64 and it reverted when this was backed out:
https://treeherder.mozilla.org/perf.html#/alerts?id=1723

please keep talos in mind when looking to reland this!
(Reporter)

Comment 10

10 months ago
(In reply to Pulsebot from comment #8)
> Backout by cbook@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/1052561d1faf
> Backed out changeset 24ec7aba60e7 for testfailures in 518172-1b.html

Thanks, filed bug 1285317 for that.
Flags: needinfo?(bugs)
(Reporter)

Updated

10 months ago
Depends on: 1285317

Comment 11

10 months ago
In addition to the tp5o regression, there is a 13% glterrain regression on win7 with this patch.
(Reporter)

Updated

10 months ago
Depends on: 1285839
(In reply to Joel Maher ( :jmaher - PTO - back on Friday July 15 ) from comment #9)
> this caused a 4% tp5o regression on linux64 and it reverted when this was
> backed out:
> https://treeherder.mozilla.org/perf.html#/alerts?id=1723

I'm somewhat suspicious of the glterrain regression (Windows) and the tp5o_scroll and tscrollx regressions (Linux64).  The rest (except probably the RSS one) seem expected.

Comment 13

10 months ago
the backout shows a 10%+ win for glterrain for win7:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=cc866385dd01&newProject=mozilla-inbound&newRevision=1052561d1faf&framework=1

this follows the pattern on the graph of the regression aligning with the commit and the improvement coming with the backout.  Possibly we can push to try with a --spsProfile flag to learn more of what is going on?
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #12)
> I'm somewhat suspicious of the glterrain regression (Windows) and the
> tp5o_scroll and tscrollx regressions (Linux64).  The rest (except probably
> the RSS one) seem expected.

Pinging this bug. I am wondering how we can move forward here. What would make you feel comfortable landing this fix and setting the value to 5ms? Reading dbaron's comment, it seems that the regressions are mostly expected.
see my comment above. What do we need to move forward here?
Flags: needinfo?(bugs)
(Reporter)

Comment 16

9 months ago
(In reply to Dominik Strohmeier from comment #15)
> see my comment above. What do we need to move forward here?

This change will come with an inevitable increase in measured page load time but may come with a decrease in *perceived* time to initial rendering. This was true the last time we changed this value, see:

https://bugzilla.mozilla.org/show_bug.cgi?id=180241#c17

I recommend we try it early in a cycle, see how it looks/feels, and gather feedback. We'll still want to figure out what's up with the GLTerrain test in comment 13 before we do that.
Flags: needinfo?(bugs)
> I recommend we try it early in a cycle, see how it looks/feels, and gather feedback.

Just to clarify the next steps: Given the SHIELD study confirmed improved perceived performance, what other feedback are we looking for that is blocking this?
Flags: needinfo?(bugs)
(Reporter)

Comment 18

8 months ago
(In reply to Harald Kirschner :digitarald from comment #17)
> > I recommend we try it early in a cycle, see how it looks/feels, and gather feedback.
> 
> Just to clarify the next steps: Given the SHIELD study confirmed improved
> perceived performance, what other feedback are we looking for that is
> blocking this?

We're looking for confirmation that the SHIELD study sample set actually represents the larger population with more diverse configurations. We also need Platform Operations team to approve the measured Load time regression.
Flags: needinfo?(bugs)
ni? jgaunt. Could we check Jet's concern on the study's sample being representative?
Flags: needinfo?(jgaunt)
(Reporter)

Comment 20

8 months ago
(In reply to Joel Maher ( :jmaher) from comment #13)
> the backout shows a 10%+ win for glterrain for win7:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> inbound&originalRevision=cc866385dd01&newProject=mozilla-
> inbound&newRevision=1052561d1faf&framework=1

I believe you :) ...but I can't reproduce the regression on Win7 with or without this change.

> this follows the pattern on the graph of the regression aligning with the
> commit and the improvement coming with the backout.  Possibly we can push to
> try with a --spsProfile flag to learn more of what is going on?

+cc: jgilbert Jeff: this is the bug I mentioned last time we met. You're probably in a better spot to run this Try build and comment. Thx!
Flags: needinfo?(jgilbert)
Jason, can we somehow use Project Presto to see if this regresses Speed Index?
Flags: needinfo?(jduell.mcbugs)
Sadly Presto has been using

   https://www.webpagetest.org/

for our tests, and it uses standard builds from mozilla.org, so there's no easy way to get it to use a custom build.

I do think that the webpagetest SpeedIndex is a good metric for page load responsivity.  

I'm asking Valentin (Presto coder) and Nick Hurley (who had a hand-rolled webpagetest setup at one point: maybe he has a recipe for standing one up) if they have ideas here.
Flags: needinfo?(jduell.mcbugs) → needinfo?(valentin.gosu)
Flags: needinfo?(hurley)
Valentin--how often does webpagetest.org update its nightly build?  Could we throw the patch from this bug onto nightly for a few days till it shows up there, and then use your scripts to compare with nightlies from a few days before (w/o patch)?
Note: Valentin is on PTO so it may be a week or so before he responds.
(In reply to Jason Duell [:jduell] (needinfo me) from comment #22)
> I'm asking Valentin (Presto coder) and Nick Hurley (who had a hand-rolled
> webpagetest setup at one point: maybe he has a recipe for standing one up)
> if they have ideas here.

I didn't hand-roll a wpt setup, I hand-rolled a not-quite-wpt-like-thing-that-doesn't-measure-speedindex. Bob Clary was the one that set up wpt for us, redirecting to him.
Flags: needinfo?(hurley)
Flags: needinfo?(bob)
> it uses standard builds from mozilla.org, so there's no easy way to get it to use a custom build.

We can also reach out to speedcurve to set up one of their instances with a custom build and run benchmarks.

For reference, the BYO-Server solution is described in more detail: https://www.instartlogic.com/blog/configuring-private-instance-webpagetest-browser-extensions

Comment 27

8 months ago
wpt is using the directory:
http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/ to get builds.

which does appear to be updated regularly. I had thought taskcluster killed updating the latest dirs, but perhaps they changed their minds.

https://github.com/WPO-Foundation/webpagetest/blob/master/www/installers/browsers/updateFirefoxNightly.php#L8 pulls the 32bit installer and places it on 

https://www.webpagetest.org/installers/browsers/
https://www.webpagetest.org/installers/browsers/nightly.dat

http://www.webpagetest.org/installers/browsers/nightly-51-1471627440.exe is current.

Doing an experiment on nightly should work.
Flags: needinfo?(bob)
> Doing an experiment on nightly should work.

Testing this on Nightly as a second way of validating the patch in addition to the results from user research sounds like a great path forward.

I suggest that we use the Alexa Top-100 curated list for testing to cover a wide range of popular websites [1]. I think it will also be important to run webpagetest against slower connections.

I found that webpagetest also allows using credentials to log in to pages like FB or Pinterest through scripting, for more valid results beyond landing pages [2][3].




[1] https://github.com/WPO-Foundation/webpagetest/blob/master/bulktest/alexa-curated.txt
[2] https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/scripting
[3] https://www.webpagetest.org/forums/archive/index.php?thread-11504.html
(answering for jgaunt)

Hello!  I designed and implemented the Shield study.

To answer representativeness:

0.  All the concerns are valid.  

1.  It was a self-selected study of EN-* users from Release.  These users self-selected to participate, not knowing what performance enhancements they would receive. It is entirely possible that these users had existing bad performance and that lured them into the study.

2.  At the time we had no good access to merge these packets with Telemetry.

3.  I would be delighted to have some code to compare a bunch of profiles to decide what is representative enough.

4.  A decision here is whether this sample is more representative than a NIGHTLY sample.

I personally doubt that users in Nightly are more representative of the full range of user configs.  I especially doubt that their modal / median / centroid is the same.  But I have no good evidence to back that up.

The evidence we do have:

1.  Surveys in the study population reported better perceived performance than the other settings.
2.  That effect followed a dose-response curve, and got worse as the setting got worse.
3.  Setting the timeout to a value worse than the current one (1000ms) was the worst group.
4.  Participating users uninstalled the addon following that same pattern.

I want to put a hard question forward:  Suppose Nightly shows a trend in the other direction?  Or no effect?  Is Nightly the product we are actually building, or Release?

If we want to move to a standard of evidence that is conclusive and acceptable, I would gladly hear thoughts on what that is.  Until then, we have the evidence we have.
Flags: needinfo?(jgaunt)
Hey Gregg,

thanks for these details. I briefly want to chime in here to clarify a misunderstanding that I read in your comment. The discussion about running tests on Nightly targets the engineering side of things. We do see some regressions with nglayout.initialpaint.delay set to 5ms, and there has been a question about the validity of these regressions in the wild.

So the idea would be to run webpagetest.org against a Nightly version that has the pref set to 5ms and compare to results for the old value of 250ms. That has nothing to do with the results from the SHIELD research.

Second, I agree with what you described regarding the validity and reliability of the SHIELD results. Thanks for clarifying. Does that make sense?
Flags: needinfo?(glind)

Comment 31

8 months ago
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #2)
> Since 5ms is substantially smaller than a refresh tick, we might want to
> follow up by getting rid of the entire delay mechanism.

Please do not. There are negative side effects to too low a setting. I personally keep the delay at 56ms after a bunch of testing. This is usually enough to prevent incorrect rendering of pages. Images usually take around 50ms to start loading, and having the whole page move is annoying.

Similarly, I would definitely suggest testing among people who are not looking to prioritize speed. They may also like the smoother operation better.
> We're looking for confirmation that the SHIELD study sample set actually represents the larger population with more diverse configurations.

We got that from Gregg on a higher level. The assumption is that SHIELD does random sampling on release and is therefore diverse enough.

> We also need Platform Operations team to approve the measured Load time regression.

Looks like we had some general agreement in the Friday meeting that the regression isn't a major concern.
Flags: needinfo?(bugs)
A delay of 50ms works for me, 5ms makes the browser nearly unusable.
(In reply to Spencer Selander [greenknight] from comment #33)
> A delay of 50ms works for me, 5ms makes the browser nearly unusable.

I have had the pref at 5ms for nearly 6 weeks now and haven't had any issues. I am curious what "nearly unusable" means for you.
Flags: needinfo?(spencerselander)
(In reply to Dominik Strohmeier from comment #34)
> (In reply to Spencer Selander [greenknight] from comment #33)
> > A delay of 50ms works for me, 5ms makes the browser nearly unusable.
> 
> I have had the pref at 5ms for nearly 6 weeks now and haven't had any
> issues. I am curious what "nearly unusable" means for you.

It had been a while since I tried it, so I just gave it another shot - no problem now with 5ms. None of the reflows and glitches I got before, even image-heavy pages render very smoothly. I'm leaving it set that way for now.
Flags: needinfo?(spencerselander)
(Reporter)

Comment 36

8 months ago
Created attachment 8784602 [details] [diff] [review]
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn

This modifies the patch to apply only to desktop platforms. Android will remain at 250 ms.
Attachment #8767332 - Attachment is obsolete: true
Flags: needinfo?(bugs)
Attachment #8784602 - Flags: review?(tnikkel)
Attachment #8784602 - Flags: review?(tnikkel) → review+
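
For reference, the change described in comment 36 would look roughly like the following. This is a sketch only; whether the default lives in the prefs init file or in the C++ PAINTLOCK_EVENT_DELAY define, and the exact ifdef used, are assumptions rather than a quote of the attached patch:

    // Illustrative per-platform default (not the actual patch):
    #ifdef MOZ_WIDGET_ANDROID
    pref("nglayout.initialpaint.delay", 250);  // Android keeps the old 250 ms delay
    #else
    pref("nglayout.initialpaint.delay", 5);    // desktop drops to 5 ms
    #endif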
(Reporter)

Comment 37

8 months ago
After conferring with the Platform and Ops teams, we've decided to enable this on mozilla-central for all desktop platforms. For more detailed information on the User Research backing this change, please read this doc:

https://docs.google.com/document/d/1BvCoZzk2_rNZx3u9ESPoFjSADRI0zIPeJRXFLwWXx_4/edit#heading=h.28ki6m8dg30z
(Reporter)

Updated

8 months ago
Attachment #8784602 - Attachment description: Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. → Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn

Comment 38

8 months ago
Pushed by jvillegas@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e68b376d1191
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/ccf3d0b90b0422b169c60d429029cdfdc0e5a5a0 for making a bunch of existing intermittent failures nearly permaorange.

Going to be hard to point out without just a ton of copy-paste, but see in particular the Linux64 reftest-4 and reftest-7 chunks in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&tochange=ccf3d0b90b0422b169c60d429029cdfdc0e5a5a0&fromchange=e59fbcb1da07bb3e330c8c35759e58404981ffa6&filter-searchStr=reftest&group_state=expanded
And some of the same reftests on Win7 (it'll be hours yet before Win8 or WinXP actually run).
Hi Jet,
what's the status here? Any chance that this will still ship in August?

Thanks,
Dominik
Flags: needinfo?(bugs)
(Reporter)

Comment 42

8 months ago
(In reply to Dominik Strohmeier from comment #41)
> Hi Jet,
> what's the status here? Any chance that this will still ship in August?
> 
> Thanks,
> Dominik

mstange & I are looking through the failing tests. Note that we didn't plan to uplift this change (i.e., it was never going to ship in August).
Flags: needinfo?(bugs)
(In reply to Bob Clary [:bc:] from comment #27)
> Doing an experiment on nightly should work.

I'll do a run in the next couple of days so we get a baseline, then we can compare with the build after we land on nightly.
Flags: needinfo?(valentin.gosu)
Jet, what's the status with this bug?

Bug 1297996 landed and got uplifted to Beta, and we're planning to run some tests on page-to-page transitions. In our pretests, we've seen more or less pronounced effects on page-to-page transitions for different initial delay settings.

It would be great if this would land and ride the trains. Thanks.
Flags: needinfo?(bugs)
(Reporter)

Comment 45

7 months ago
(In reply to Dominik Strohmeier from comment #44)
> Jet, what's the status with this bug?

Markus is investigating the test failures this week. 

> Bug 1297996 landed and got uplifted to Beta and we're planning to run some
> tests on page2page transitions. In our pretests, we've seen that there are
> some more or less announced effects for the page2page transitions for
> different initial delay settings. 

I had a look at the patches in bug 1297996 and the checks for suppressed painting will certainly be affected by landing this one. This bug affects all platforms, whereas your experiments in bug 1297996 are non-e10s-only. I wouldn't assume your results will hold true for e10s. 

> It would be great if this would land and ride the trains. Thanks.

Agreed :) though we can't ship with the tests in this current state. I would note that the failing tests are perma-failures on my Linux VM with or without this fix, so that should help debugging at least.
Flags: needinfo?(bugs)
In bug 1311448, we will run a Telemetry Experiment on different page-to-page transitions. In this context, we will also have a test cohort with nglayout.initialpaint.delay set to 0, and one that combines a transition over blank with an initialpaint.delay of 0, since paint delays and the experience of different page-to-page transitions are closely linked.

What's the state of landing this? Any progress on Markus's investigations?

Does it make sense to wait for some more Telemetry data from the Telemetry Experiment and then target to get rid of the delay mechanisms maybe completely?
Flags: needinfo?(bugs)
(Reporter)

Comment 47

6 months ago
(In reply to Dominik Strohmeier from comment #46)
> What's the state of landing this? Any progress on Markus's investigations?

Markus diagnosed the test failures. Most of the fixes will add fuzzing to the affected tests to account for paint invalidation that occurs prior to the reftest screen captures.

Neerja will land the test fixes.

> Does it make sense to wait for some more Telemetry data from the Telemetry Experiment and then target to get rid of the delay mechanisms maybe completely?

I think we'll see the same test failures at 0 delay. It would be good to see what the Telemetry Experiment results say about stability, since we do intend to remove the delay mechanism entirely (see comment 2.)
Flags: needinfo?(bugs) → needinfo?(npancholi)

Comment 48

6 months ago
(In reply to Jet Villegas (:jet) from comment #47)
> I think we'll see the same test failures at 0 delay. It would be good to see
> what the Telemetry Experiment results say about stability, since we do
> intend to remove the delay mechanism entirely (see comment 2.)

Comment 2 has one person suggesting that. I have specifically asked that the preference not be removed, and provided my reasoning: I personally prefer the smoother experience of no flashing on simple sites like xkcd.com, where the image is the primary content and thus there is no need to display anything until the image is ready.

What reason is there to remove the mechanism entirely instead of leaving it there for those of us who might want to change it? It does not seem that it would be possible for an addon to overwrite this change (as there is no delayPaint property in JavaScript or CSS.) 

I would understand it if you were overhauling the entire painting system, and thus you'd need to reimplement the delay mechanism. Then, if few users are using it, it makes sense to remove it. But, otherwise, why not leave it?
(Assignee)

Updated

6 months ago
Assignee: bugs → npancholi
Flags: needinfo?(npancholi)
(Assignee)

Updated

6 months ago
(Assignee)

Updated

6 months ago
No longer blocks: 1260629, 1293550, 1294278, 1295466, 1313772, 1313773
(Assignee)

Updated

6 months ago
(Assignee)

Comment 49

6 months ago
Added fuzzy annotations for the intermittent failures corresponding to bugs 1293550, 1260629, 1313772, 1313773, 1294278, and 1295466, and sent review requests to mstange for each of them.
(These bugs were also added to the 'Depends on' list.)

Started new try run with patch for this bug + patches for all bugs listed above:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d70c7ae79678fb56c1ea09e9d640e719f5c8c22
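
For reference, a fuzzy annotation in a reftest manifest looks roughly like this. It is an illustrative entry, not one of the actual annotations landed for the bugs above; the test names and pixel counts are made up:

    # reftest.list: tolerate up to 2 pixels differing by at most 1 color step on Windows
    fuzzy-if(winWidget,1,2) == invalidation-test.html invalidation-test-ref.html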
(Assignee)

Comment 50

6 months ago
Failure for test invalidate-1.html on windows needed to be looked into more closely:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=20b632bf5f252b8e21789a5e74e4f2c51a07273c&selectedJob=28813830&exclusion_profile=false

I triggered a Try run with "reftest-wait" and a small setTimeout added to this test, and that does fix the issue (the test no longer fails in the try run below).
That means we do *eventually* paint the correct thing; the default reftest snapshot is just happening too quickly.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1aa1e5359ecb84245c545bc9758a58474771fa3c

So the explanation is probably that we've got a bug in one of the following two places:

1. Since delaying the snapshot taken by the test harness by 100ms does make the test pass, there might be a bug in the paint stabilization in the test harness where it doesn’t correctly wait for the second div in the testcase to be painted.  

2. Or the paint stabilization is correct, but the -moz-element invalidation is too delayed for it to catch. E.g. -moz-element invalidates one frame too late: we paint the first div, then discover that we need to invalidate the other div, and trigger another paint which then paints the other div. In that case we may need to modify the -moz-element code so that the invalidation happens in the first paint itself.

mattwoodrow, do you have any thoughts on what the explanation might be?  (since you've worked on -moz-element as well as paint stabilization)

(Thanks very much mstange and dholbert for your help with this!)
Flags: needinfo?(matt.woodrow)
(Slight tangent: IMO, we should feel free to paper over the invalidate-1.html failure using a small setTimeout (since that does fix it), in the short-term interests of getting this bug landed.  And then we can investigate comment 50 more closely in bug 1178202 or a new bug.  A small setTimeout would be basically equivalent to the paint delay that we're already giving this test in current trunk, so in practice it wouldn't be reducing the sensitivity of that test.)
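
For context, the "reftest-wait plus small setTimeout" workaround being discussed looks something like this in the test file. This is a sketch only; the markup and the 100 ms value are assumptions, not the actual invalidate-1.html change:

    <!DOCTYPE html>
    <html class="reftest-wait">
    <body>
      <div>content whose rendering we want to compare</div>
      <script>
        window.onload = function () {
          setTimeout(function () {
            // Removing the class tells the reftest harness it may take its snapshot now.
            document.documentElement.classList.remove("reftest-wait");
          }, 100);
        };
      </script>
    </body>
    </html>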
I did some digging, and I strongly suspect that this is bug 1209751 coming back to bite us.

My guess is that when we go to take the reftest snapshot of the page, the background image hasn't loaded yet.

Reftest snapshots force synchronous image decoding (by not passing DRAWWINDOW_ASYNC_DECODE_IMAGES to drawWindow), so we block (during painting) when we try to draw the first div until the image has loaded and then draw it correctly.

We then try painting the div with -moz-element (which didn't have the intrinsic image size available during layout, as the image was only ready during painting) and then that fails.

Seth's comment suggests that this should only be a problem if the element's size depends on the image's intrinsic size, which shouldn't be the case here (we specify a size ourselves). There could easily be something else depending on the intrinsic size, though, which causes painting to fail.

It looks like this is reproducible, so shouldn't be too hard to confirm.

I'm not sure what the best fix is here. I think we need to detect that the image size has changed and schedule another paint so it can be drawn with the correct layout. The reftest harness should detect this scheduled paint and wait for it.
Flags: needinfo?(matt.woodrow)
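
For readers unfamiliar with the flag Matt mentions: reftest snapshots are taken with the chrome-only canvas drawWindow() API, and decoding is synchronous only because the async-decode flag is omitted. A rough sketch follows; it is paraphrased, not the actual harness code, and the canvas/window variables are placeholders:

    // Snapshot the content window into a canvas. Omitting
    // ctx.DRAWWINDOW_ASYNC_DECODE_IMAGES from the flags forces images to
    // finish decoding synchronously during the paint.
    var ctx = snapshotCanvas.getContext("2d");
    var flags = ctx.DRAWWINDOW_DRAW_VIEW | ctx.DRAWWINDOW_USE_WIDGET_LAYERS;
    ctx.drawWindow(contentWindow, 0, 0, snapshotCanvas.width, snapshotCanvas.height,
                   "rgb(255,255,255)", flags);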
(Assignee)

Updated

6 months ago
Depends on: 1178202
(Assignee)

Updated

6 months ago
No longer depends on: 1260629, 1293550, 1294278, 1295466, 1313772, 1313773
(Assignee)

Updated

6 months ago
No longer depends on: 1178202
(Assignee)

Comment 53

6 months ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #52)
> I did some digging, and I strongly suspect that this is bug 1209751 coming
> back to bite us.
> 
> My guess is that when we go to take the reftest snapshot of the page, the
> background image hasn't loaded yet.
> 
> Reftests snapshots force synchronous image decoding (by not passing
> DRAWWINDOW_ASYNC_DECODE_IMAGES to drawWindow), so we block (during painting)
> when we try to draw the first div until the image has loaded and then draw
> it correctly.
> 
> We then try painting the div with -moz-element (which didn't have the
> instrinsic image size available during layout, as the image was only ready
> during painting) and then that fails.
> 
> Seth's comment suggests that this should only be a problem if the element's
> size depends on the image's intrinsic size, which shouldn't be the case here
> (we specify a size ourselves). There could easily be something else
> depending on the instrinsic size though which causes painting to fail.
> 
> It looks like this is reproducible, so shouldn't be too hard to confirm.
> 
> I'm not sure what the best fix is here. I think we need to detect that the
> image size has changed and schedule another paint so it can be draw with the
> correct layout. The reftest harness should detect this scheduled paint and
> wait for it.

Thanks for looking into this Matt!
(Assignee)

Comment 54

6 months ago
List of all bugs whose tests had to be modified in some way to prevent the paint delay from causing a test failure:
(Note that some of these bugs will remain open since we've simply papered over the failure with a "fuzzy" annotation -- that's why they're not marked as blocking in this bug)

Bug 1293550, Bug 1260629, Bug 1313772, Bug 1313773, Bug 1294278, Bug 1295466, Bug 1178202, Bug 1315834, Bug 1295466, Bug 1313253, Bug 1312951, Bug 1315846

Clean try run -> 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0268d8e0c8def2152d1b9adb390c7687e8b4608c
(Assignee)

Updated

6 months ago
Flags: needinfo?(bugs)
Just spoke with jet -- I'm going to land this on inbound tomorrow, after the test-tweaks (on autoland) have been merged around and have made it to inbound.  --> transferring jet's needinfo to myself

(I think bug 1316430 is the last test-tweak that needed landing, so once that's on inbound, we should be OK to land.)
Flags: needinfo?(bugs) → needinfo?(dholbert)

Comment 56

6 months ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff60dd49591d
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Flags: needinfo?(dholbert)
I had to back this out for frequent intermittent reftest failures like https://treeherder.mozilla.org/logviewer.html#?job_id=39017948&repo=mozilla-inbound

https://hg.mozilla.org/integration/mozilla-inbound/rev/6deb8408930c
Flags: needinfo?(dholbert)
Depends on: 1316762
Thanks -- I've just landed a fix for that particular issue on bug 1316762.

Did you see any others that seem to have been caused by this push? If so, please mention them here, so we can fix them before the next time we re-land. :)  (Per comment 54, we thought we'd caught all the newly-fuzzy tests, but apparently there were a few more hiding.)

I'll hopefully re-land this later tonight or tomorrow morning.
Flags: needinfo?(dholbert) → needinfo?(wkocher)
Flags: needinfo?(dholbert)
Flags: needinfo?(jgilbert)

Comment 59

6 months ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2b359340a84
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
(Hoping it sticks this time around. If there are any new intermittents that look suspiciously tied to this, please ping me before backing this out again -- we can hopefully just land any remaining targeted tweaks as followups, as-needed.)
Flags: needinfo?(wkocher)
Flags: needinfo?(dholbert)

Comment 61

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/b2b359340a84
Status: NEW → RESOLVED
Last Resolved: 6 months ago
status-firefox52: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52

Updated

6 months ago
Depends on: 1317048
Duplicate of this bug: 180241
Duplicate of this bug: 549294
No longer blocks: 180241
See Also: → bug 1271691
@Gregg Lind - Half a year has passed (bug #180241 Comment 73) and still no results from Shield Study 1 & 2.
I think we need to back this out until we better understand the glterrain regression (and possibly modify that test and tsvgx so this change doesn't regress/noisify them).

See bug 1317048 comment 8 and 9 for more.  But for now, I'll be backing this out.
Created attachment 8811055 [details] [diff] [review]
backout patch

Not sure if I need aurora approval to land the backout on Aurora, but here goes just in case.

Approval Request Comment
[Feature/regressing bug #]: This bug. (I'd like to back it out for now.)
[User impact if declined]: Talos regressions; see previous comment.
[Describe test coverage new/current, TreeHerder]: N/A
[Risks and why]:  Extremely low risk. Just reverting a tweak to a paint-delay timeout value that *just* landed last week.  This backout-patch restores us to our previous state.
[String/UUID change made/needed]: None.
Attachment #8811055 - Flags: approval-mozilla-aurora?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 67

6 months ago
Backout by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c43a08d519f
Backed out changeset b2b359340a84 for causing talos regressions.
Aurora backout: https://hg.mozilla.org/releases/mozilla-aurora/rev/f99c3a999e81a017ca3f6026b1e75fe4ec31a50a
status-firefox52: fixed → affected
status-firefox53: --- → affected
Comment on attachment 8811055 [details] [diff] [review]
backout patch

Went ahead and pushed the backout to aurora.
Attachment #8811055 - Flags: approval-mozilla-aurora?
status-firefox48: affected → wontfix
status-firefox49: affected → wontfix
status-firefox50: affected → wontfix
status-firefox51: --- → wontfix
status-firefox52: affected → fix-optional
Target Milestone: mozilla52 → ---
Depends on: 1318530
Depends on: 1067360
Depends on: 1220414
No longer depends on: 1220414
No longer depends on: 1067360
Depends on: 1319458
I'm planning on re-landing this later this afternoon.

(I'm waiting for bug 1319458 to get a few cycles of talos runs on inbound, so that it's easier to distinguish glterrain differences [if any] caused by bug 1319458's patch vs. by this bug's patch.)
Flags: needinfo?(dholbert)

Comment 71

5 months ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a494dd0ae2b0
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Flags: needinfo?(dholbert)
Note: This re-landing might initially be implicated as causing a tsvgx regression on inbound and/or central.  Bug 1318530 (currently on autoland) should address that regression.

After bug 1318530 is merged to a branch that has this paint-suppression reduction, the tsvgx regression should mostly go away, though there may be a small regression in the new "svg_static" suite (which is the tsvgx pieces that we spun off in bug 1318530).  A small regression there is an expected & acceptable consequence of this bug, per comment 16.

Comment 73

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a494dd0ae2b0
Status: REOPENED → RESOLVED
Last Resolved: 6 months ago → 5 months ago
status-firefox53: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53

Comment 74

5 months ago
Hi Daniel, 
Here is the perf alert for the landed patch: https://treeherder.mozilla.org/perf.html#/alerts?id=4345
Do you think all of these are expected?
Thanks, Alison! I think these are all OK. Specifically:

 * The "damp" regressions weren't reported the previous times this was landed, but they don't surprise me. (And they look like they might be sporadic / within the range of noise on the graph too).
 * The tart improvements make sense.
 * The "fileio" ones are *NOT* expected, and almost certainly unrelated to this change. (Looking at the graph for those, it looks like this "regression" is actually us returning to the same value we had a few days ago. Not sure if that's from a backout or from a bimodal test result or what, but I'm confident it's unrelated to this bug's one-liner non-fileio-related patch.
 * The tp5o tsvgx and regressions are expected.
I actually think the fileio regressions are related: they improved when this was backed out, and if you look at the 30-day history you can see when we first landed this, backed it out, and now recently re-landed it:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Bmozilla-inbound,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D&series=%5Bautoland,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D

These are all non-main startup fileio; I am not sure what this is. I would have to push to try with some hacks to talos to make it output data or an artifact which I could use to figure out which files are the problem.  I did this about a month ago for another problem and it turned out we were loading different fonts.  Let me know if you want me to do that and I can get some try pushes put together.
status-firefox52: fix-optional → wontfix
Depends on: 1332558