Closed Bug 1276850 — Opened 8 years ago, Closed 8 years ago

When scrolling back to top of kryogenix.org, screen is entirely white for a second or two

Categories

Product/Component: Core :: Graphics
Type: defect
Version: 48 Branch
Platform: Unspecified
OS: Android
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla49
Tracking Status:
firefox47 --- unaffected
firefox48 --- unaffected
firefox49 --- fixed

People

(Reporter: dholbert, Assigned: kats)

Details

(Keywords: regression)

STR:
 1. Visit http://kryogenix.org/
 2. Scroll to the bottom (or at least down a few screen-worths of space)
 3. Rapidly scroll to the top.

ACTUAL RESULTS:
The top section of the page is just blank white for a second or so when you reach it by scrolling up.

EXPECTED RESULTS:
Top section of the page should repaint a low-res version perhaps, but not white.

I'm hitting this in recent Nightlies on Fennec (Firefox for Android). Working on a regression range.
(In reply to Daniel Holbert [:dholbert] from comment #0)
> EXPECTED RESULTS:
> Top section of the page should repaint a low-res version perhaps, but not
> white.

(Or at least it should repaint something non-blank pretty rapidly.)
mozregression says this was probably a regression from:
 
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=eca3a6b94f7934377b099ddb22ae9c91f340cbfc&tochange=9066ef31441959ae15b83d59395ed34d3ed03c0a
> Kartikaya Gupta — Bug 1192910 - Ensure we flush paints on the main thread during an APZ flush. r=mstange
> Markus Stange — Bug 1192910 - Stop triggering repaints when the displayport margins change without the displayport changing. r=kats

(I say "probably" because this doesn't reproduce 100% of the time -- I had to reload a few times to get it to trigger. The blamed bug's commit message does make it sound related to this, though.)

FWIW, I'm using a OnePlus One phone, running CyanogenMod 13 (Android 6.0.1).
Blocks: 1192910
Component: General → Panning and Zooming
Flags: needinfo?(bugmail.mozilla)
Product: Firefox for Android → Core
Version: unspecified → Trunk
I can reproduce this on Nightly pretty easily, but I can't repro it on Aurora. There we have low-res painting enabled, so that might be masking the problem. However, I also enabled the APZ minimap on both Nightly and Aurora, and there's a clear difference in behaviour there: on Nightly the painted rect stays "stuck" below the top of the page for a few seconds before snapping up, while on Aurora it paints the top much faster. So I'm inclined to mark 48/aurora as unaffected, even though bug 1192910 is on that branch.

I'll investigate to see why this is happening.
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
OS: Unspecified → Android
Summary: [Android] When scrolling back to top of kryogenix.org, screen is entirely white for a second or two → When scrolling back to top of kryogenix.org, screen is entirely white for a second or two
Version: Trunk → 48 Branch
Weird, I can't reproduce this on a local build from m-c tip, with a clean profile. I can still repro in Nightly.
I did some try pushes with logging [1] and I can reproduce on those builds (although they're using my Nightly profile). From the logs it looks like the APZ code is calling RequestContentRepaint as intended, but since it's calling it from the compositor thread, it needs to redispatch to the main thread at [2], and that redispatch doesn't happen until much later. So the main thread must be busy doing something else, though I don't know what. I was unsuccessful in running the profiler on it. I'll try to get a backtrace from gdb.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f6fb8da959f
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp?rev=ddd9b902f9e1#2784
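
(For illustration only: a toy, self-contained C++ sketch, not Gecko code, with made-up names like MainThreadQueue. It shows the delay pattern described above: the compositor thread posts its repaint request promptly, but the request can only run when the main thread returns to its event loop, so a long-running paint on the main thread delays it by the same amount.)

// Toy illustration (NOT Gecko code; MainThreadQueue and the thread roles are
// made up): a "compositor thread" dispatches a repaint task to a "main
// thread" whose event loop is blocked by a long-running paint, so the task
// only runs once that paint finishes -- roughly one second later.
#include <chrono>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class MainThreadQueue {
 public:
  // Called from any thread: queue a task for the "main thread".
  void Dispatch(std::function<void()> aTask) {
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push(std::move(aTask));
  }
  // Called on the "main thread": run everything queued so far.
  void ProcessPending() {
    std::queue<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      std::swap(pending, mTasks);
    }
    while (!pending.empty()) {
      pending.front()();
      pending.pop();
    }
  }
 private:
  std::mutex mMutex;
  std::queue<std::function<void()>> mTasks;
};

int main() {
  MainThreadQueue mainThread;
  auto start = std::chrono::steady_clock::now();
  auto ms = [&] {
    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start).count());
  };

  // "Compositor thread": quickly decides new content must be painted and
  // asks the main thread for a repaint.
  std::thread compositor([&] {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    std::printf("[%4lld ms] compositor: repaint request dispatched\n", ms());
    mainThread.Dispatch([&] {
      std::printf("[%4lld ms] main: repaint request finally runs\n", ms());
    });
  });

  // "Main thread": busy painting an expensive background/gradient, so it
  // only returns to its event loop after ~1000 ms.
  std::this_thread::sleep_for(std::chrono::seconds(1));
  mainThread.ProcessPending();

  compositor.join();
  return 0;
}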
Couldn't get gdb to work on the try build either, since it's built as a Nightly and appears to be non-debuggable.
Ok, I got a profile from a Nightly build (I didn't have my setup correct before).

https://cleopatra.io/#report=f321c5b18da81dc1b7302e01ac66cc896afcc015

We're basically spending a lot of time painting backgrounds and gradients, and that blocks the main thread from processing the new incoming paint requests.
Assignee: bugmail.mozilla → nobody
Component: Panning and Zooming → Graphics
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> Weird, I can't reproduce this on a local build from m-c tip, with a clean profile.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> I did some try pushes with logging [1] and I can reproduce on those builds
> (although they're using my Nightly profile).

(FWIW, I don't think there are any special normal-browsing-profile quirks required to trigger this -- I'm able to reproduce in "guest mode", as well as in mozregression which uses fresh profiles.)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> We're basically spending a lot of time painting backgrounds and gradients,
> and that blocks the main thread from processing the new incoming paint
> requests.

Thanks for taking a look at this!  So, the page does have a complex layered tiled background plus a partially-transparent gradient on the top section (see the rule for #basic).  I imagine that's what we're working to paint here.

Do you know how we managed to paint responsively before, despite this background being expensive?
(In reply to Daniel Holbert [:dholbert] from comment #8)
> Do you know how we managed to paint responsively before, despite this
> background being expensive?

I'm not sure, but considering that I see the issue on try build but not local builds, I suspect there might be something racy or otherwise intermittent going on. I also did a try build with bug 1192910 backed out [1] and I'm still able to reproduce the issue on that build. I wasn't able to install the 47 nightly build just before bug 1192910 landed (I'd have to blow away my profile which I don't want to do right now) so I couldn't verify the regression range, but I suspect it might be off?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=81c027858c2e
Ok, so I verified your regression range, and confirmed that bug 1192910 did make things worse. However the specific thing that bug 1192910 regressed was fixed in bug 1263347. I think it regressed again since then, trying to track down the regression range for that.
The new regression range is https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7389f08fb91c83a239add2be15e648431ed645f1&tochange=d74e9d5a64c352187ab74b60255b571f6192a0e5

Are we now using rust code for any part of painting/layout on fennec?
Blocks: 1220307
No longer blocks: 1192910
Flags: needinfo?(nfroyd)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> The new regression range is
> https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7389f08fb91c83a239add2be15e648431ed645f1&tochange=d74e9d5a64c352187ab74b60255b571f6192a0e5
> 
> Are we now using rust code for any part of painting/layout on fennec?

Uh, no, we aren't.  That's a peculiar regression range.  My best guess is that when we link an empty rust crate, we drag in bits and pieces of support libraries (e.g. what one would find in libgcc) and those pieces get linked into libxul, rather than libxul depending on some outside library.  And then somehow, one of those pieces has a bug?  That seems a little unlikely, but...
Flags: needinfo?(nfroyd)
Hm. It's far-fetched, but it would also explain why try builds demonstrate the problem and my local build does not. Somebody should verify my regression range just to be sure though.

Assuming the range is valid, I don't really know where to go from here - is it useful to bisect within those three changesets? Or should we look at the profile in comment 7 more closely to see if there's things in there that wouldn't have been involved before?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #13)
> Hm. It's far-fetched, but it would also explain why try builds demonstrate
> the problem and my local build does not. Somebody should verify my
> regression range just to be sure though.
> 
> Assuming the range is valid, I don't really know where to go from here - is
> it useful to bisect within those three changesets? Or should we look at the
> profile in comment 7 more closely to see if there's things in there that
> wouldn't have been involved before?

I have two suggestions, one of which is more work than the other:

1. Look at the profile in comment 7 to see if you can find anything unusual, particularly with functions starting with '_' or '__'.  If we can at least identify *something*, we'll have more information to possibly take to the Rust team or even someplace else.

2. Back out the patches; nothing else we're building depends on them, because the only Rust pieces in the tree aren't being built on Android for tedious reasons (bug 1266792).

These are not mutually exclusive, of course.
(In reply to Nathan Froyd [:froydnj] from comment #14)
> 2. Back out the patches; nothing else we're building depends on them,
> because the only Rust pieces in the tree aren't being built on Android for
> tedious reasons (bug 1266792).

Well, at least back out the patch that flips on --enable-rust; the other ones can stay.
Here's a try push with --enable-rust disabled; let's see if it helps any.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ab94a6e08cb
(In reply to Nathan Froyd [:froydnj] from comment #14)
> 1. Look at the profile in comment 7 to see if you can find anything unusual,
> particularly with functions starting with '_' or '__'.  If we can at least
> identify *something*, we'll have more information to possibly take to the
> Rust team or even someplace else.

There are some symbols like __addXf3__, __mulXf3__, and __divdXf3__ which are showing up as pretty high in the profile. They're the top three nontrivial symbols in the profile if you select "invert callstack". Does that help?
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> (In reply to Nathan Froyd [:froydnj] from comment #14)
> > 1. Look at the profile in comment 7 to see if you can find anything unusual,
> > particularly with functions starting with '_' or '__'.  If we can at least
> > identify *something*, we'll have more information to possibly take to the
> > Rust team or even someplace else.
> 
> There are some symbols like __addXf3__, __mulXf3__, and __divdXf3__ which
> are showing up as pretty high in the profile. They're the top three
> nontrivial symbols in the profile if you select "invert callstack". Does
> that help?

Well, it's information, at least. :)  What is gfxContext::Fill doing with adding, multiplying, and dividing floating point numbers?

OK, I don't know the answer to that question.  What I suspect is going on here is this: we call runtime functions __add{s,d}f3__ for doing addition of floating-point numbers, because ARM.  Prior to the --enable-rust push, references to these functions were being resolved to libgcc or similar on the system, which probably uses the floating-point hardware to do the work (possibly with some cost for massaging the data in and out of the integer calling convention registers, I think).  But after the --enable-rust push, those __add{s,d}f3__ references are getting resolved by using Rust's support library; __addXf3__ is the llvm internal implementation of the function we're actually calling.  And Rust's support library on the ARM target that we use is compiled to be fully general and run on any ARM chip...including chips that don't have a floating-point unit.  So the Rust routines are, I think, doing everything in software (!), which as you might imagine is somewhat expensive.

I'm really surprised a regression from this hasn't shown up sooner.

There's a Rust issue open on this: https://github.com/rust-lang/rust/issues/33278  Clearly, we need to use the correct Rust target for our Android builds. :)  I don't think the necessary changes to add an armv7 target have shown up in stable Rust, though, so perhaps the best course of action for now is simply to back out the --enable-rust changes?
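
(To make the cost concrete, here is a deliberately simplified, hypothetical sketch of what a generic software single-precision add along the lines of __addsf3 has to do without an FPU. It is not the actual compiler-rt/Rust support code: it handles only finite, positive, normal inputs, truncates instead of rounding, and skips NaN/Inf and subnormals. Even so, it shows why routing every float add through code like this is much slower than a single hardware VFP add instruction.)

// Hypothetical, simplified soft-float single-precision add, to illustrate
// the kind of work a generic __addsf3 does in software. Assumes both inputs
// are finite, positive, normal IEEE-754 floats; truncates instead of
// rounding; no NaN/Inf/subnormal handling. Real support-library routines do
// all of that and more on every float addition.
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <utility>

static float SoftAddPositive(float aA, float aB) {
  uint32_t a, b;
  std::memcpy(&a, &aA, sizeof a);  // grab the raw IEEE-754 bit patterns
  std::memcpy(&b, &aB, sizeof b);

  // Unpack biased exponents and mantissas (restore the implicit leading 1).
  int32_t ea = (a >> 23) & 0xFF;
  int32_t eb = (b >> 23) & 0xFF;
  uint32_t ma = (a & 0x7FFFFF) | 0x800000;
  uint32_t mb = (b & 0x7FFFFF) | 0x800000;

  // Make (ea, ma) the operand with the larger exponent.
  if (ea < eb) {
    std::swap(ea, eb);
    std::swap(ma, mb);
  }

  // Align the smaller mantissa to the larger exponent (truncating).
  int32_t shift = ea - eb;
  mb = (shift < 24) ? (mb >> shift) : 0;

  // Integer-add the mantissas and renormalize on carry.
  uint32_t m = ma + mb;
  int32_t e = ea;
  if (m & 0x1000000) {
    m >>= 1;
    ++e;
  }

  // Repack sign (0), exponent, and mantissa.
  uint32_t bits = (static_cast<uint32_t>(e) << 23) | (m & 0x7FFFFF);
  float result;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}

int main() {
  float a = 1.5f, b = 2.75f;
  // On a hard-float build, a + b is one FPU instruction; the software path
  // above is dozens of integer operations plus a function call.
  std::printf("software: %g  hardware: %g\n", SoftAddPositive(a, b), a + b);
  return 0;
}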
The try build with rust disabled fixed the issue. After discussion with Nathan on IRC we backed out part 3 of bug 1220307.
Whew, in under the 49 wire so we don't have to ask for uplifts. :)
Assignee: nobody → bugmail.mozilla
Oh whoa, this is indeed quite surprising! The armv7 Android target is in nightly Rust now and I'll try to work on producing nightlies for it once I get back to SF after the work week to ensure they start riding the trains.
This got merged to mozilla-central:

https://hg.mozilla.org/mozilla-central/rev/da10eecd0e76
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49