Closed Bug 1199674 Opened 5 years ago Closed 3 years ago

[Messages][Contacts] White flashing when moving from a document to another document

Categories

(Firefox OS Graveyard :: Gaia::SMS, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: julienw, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(4 files)

When using split views, there is a white flash when entering the next view, before anything is displayed.
Cleopatra direct link: [1].
You can follow the instructions at [2] to run the application in Firefox. Except you use the URL app://sms.gaiamobile.org/views/inbox/index.html. Then open any conversation, or click the "+" button. And then you can come back pressing the "back" button (the one in the application).


When running in Firefox, it's behaving diffently than on a device: it uses the httpd.js extension, it's not _really_ using the app:// protocol, but instead there is a real HTTP request. Moreover HTTP Cache is disabled with httpd.js (with "Expires: -1" headers). So I think the effect in Firefox is bigger than on a device.

I'm fetching a profile from the device right now.

[1] https://people.mozilla.org/~bgirard/cleopatra/#report=eb0269ee1b722b0063fe0e28597a020b4c7fb77a

[2] https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/README.md#how-to-run-the-application-in-firefox
See also in Cleopatra: [1].

I _think_ we may have a sync reflow happening because of gaia-header.

You need the patch from Bug 1183133.

[1] http://people.mozilla.org/~bgirard/cleopatra/#report=778c9ff8c9a25e703034097912039dd156ff3f2d
See Cleopatra: http://people.mozilla.org/~bgirard/cleopatra/#report=b3e634314634208610e958ee051493e538bb973e

Here we don't see the white flashing, and this is what makes me think we have the gaia-header issue: we did some work to actually don't have a sync reflow on this panel, because this is the initial panel.

It's still quite slow though (but we don't use the BFCache yet).
Please tell me if I missed some threads when profiling.
With this patch I think I eliminated the sync reflow happening because of the gaia-header, but I still have a white flash :(

The PR includes the patch from bug 1183133 (and you still have to force rehashing the manifests using instructions in bug 1183133 comment 2).
Boris told me he'd make time to look at this today.
OK, so I poked at this a bit.  I do see the flash in a debug Firefox following the instructions in comment 1 and following the link to the 9-message conversation with "Alan Turing, 102, Claude Shannon, ..."; thank you for those!  I don't see it in an opt build, but that's just because it's faster...  All the testing I did was using gaia rev 1dfcae14eccf7e8ce2c1dae27b34d4b8e9ee04cd in case that matters.

As an experiment, I tried changing the paint suppression timeout from its default value (250ms) to 3000ms (I just changed it in nsPresShell.cpp, but you could set the nglayout.initialpaint.delay pref to test without rebuilding.  With the higher value the flash of white seems to be gone.

I also checked what causes us to start that timer.  In this case it's the fact that the last stylesheet that was linked from the <head> has finished loading and we have seen the <body> tag.  Normally, this would be when we want to start rendering an HTML page, on the assumption that it's a document and we want to render it as it comes in.  But this particular page (conversation/index.html) is blank white until all those <script defer> from the <head> load/run, as far as I can tell.  And unfortunately those scripts take a while to do that.  I threw this into the page right before </body>:

  <script>
     var startTestTimer = new Date;
     addEventListener("DOMContentLoaded", 
                      function() { console.log(new Date - startTestTimer); });
  </script>

and I see times in the multiple-seconds range in my debug build.  In an opt build it's about 170-300ms (quite noisy)... on decent laptop hardware.

Since we end up painting during this time, it's not that the scripts are all loaded and ready to run; that would peg the CPU and block rendering.  We're still waiting on them to load, afaict.

We could eliminate the flash of white by raising the paint suppression timeout (e.g. letting the target page control it), but that would just replace it with showing the previous page until the new one is ready.  That might be OK for a short period of time, but we're already doing it for 250ms.  I assume we wouldn't want to do it for that much longer than that, right?
So looking at the network traffic in Firefox (again, may not quite match what happens on the device), I see us issue a bunch of GETs for CSS and JS and shim_host.html at about the 60-70ms mark.  Then at about the 500ms mark we issue another bunch of GETs for various images (bmp, jpg), some videos (ogv), etc.  Then at the 650ms mark we issue some GETs for .properties file.

Then at the 1.3s mark (this is in an _opt_ build!) we have a bunch more GETs for JS files: navigator_moz_mobilemessage_shim.js, bridge.js, service.js, etc, etc, etc.

It's possible that with the real app:// protocol we don't end up having these connection-limit issues.  It might be interesting to see what the network panel equivalent data looks like on there...
(In reply to Boris Zbarsky [:bz] from comment #9)
> So looking at the network traffic in Firefox (again, may not quite match
> what happens on the device), I see us issue a bunch of GETs for CSS and JS
> and shim_host.html at about the 60-70ms mark.

Yeah, it's quite different on device, we don't concatenate files when run app in Firefox and we have a lot of them :) But I've just realized that we don't optimize resources for the split view at all yet - I'm going to see if I can easily fix that so that we can compare.

"shim_host.html" iframe is something that we added recently - it's _hidden_ iframe that sits inside _every_ view and it loads the same bunch of files (5, ~2200 lines of code in total) for every view as well.

What we (SMS) should probably do from gaia side is to concatenate iframe's js files as well to at least reduce number of GET requests.

What Gecko/System app could do is bug 1192193, so that we can have only one shared window for every view and get rid of iframes.

What Gecko could do is probably - bug 1164389.

> Then at about the 500ms mark we issue another bunch of GETs for various images (bmp, jpg), some videos
> (ogv), etc.

These are only for desktop to simulate MMS - we don't have such files on device.

> Then at the 650ms mark we issue some GETs for .properties file.

It's likely what l10n.js lib does to load localization files, again for every view. I suspect it should be taken from browser cache when we navigate to another view. Also per https://bugzilla.mozilla.org/show_bug.cgi?id=1198266#c5, new l20n.js will likely improve things here.

> Then at the 1.3s mark (this is in an _opt_ build!) we have a bunch more GETs
> for JS files: navigator_moz_mobilemessage_shim.js, bridge.js, service.js,
> etc, etc, etc.

Yeah, those are _lazily loaded_ resources (we load them one by one via adding script node for every resource, quite inefficient, but it should not be in the critical path since we only load them when view is ready for the user aka "visuallyLoaded" performance marker).

> 
> It's possible that with the real app:// protocol we don't end up having
> these connection-limit issues.  It might be interesting to see what the
> network panel equivalent data looks like on there...

I've just tried to open Network tab in WebIDE connected to device and don't see any request at all.

Hey J Ryan, I'm not sure you're the right person to ask about this, but do you have any clue why I don't see any network requests for the app:// resources on device (I have "Disable Cache (when toolbox is opened)" checked)?

Boris, do you have device handy to try there (with the patch from comment 6 + I'll try to workout something quickly to optimize resources for split views)? Also it seems Julien posted relevant profile from device in comment 3.

Let me know if I can provide more info/profiles/etc while Julien is on PTO :)

Thanks!
Flags: needinfo?(jryans)
(In reply to Oleg Zasypkin [:azasypkin][⏰UTC+3] from comment #10)
> But I've just realized that we
> don't optimize resources for the split view at all yet - I'm going to see if
> I can easily fix that so that we can compare.

> I'll try to workout something quickly to optimize resources for split views

Sorry, I was wrong - with PRODUCTION=1, scripts are also optimized for the split views, so it should be OK on device.
I do have some local patches to expose an API on top of mozbrowser to control painting and avoid this white flash when navigating from a page to another.

The API has 2 methods (on the parent):
 DOMRequest iframe.pausePainting();
 DOMRequest iframe.resumePainting();

Which basically map (on the child) to:
 docShell.contentViewer.pausePainting();
 docShell.contentViewer.resumePainting();

For me it seems to prevent the white background to be painted if docShell.contentViewer.pausePainting() is called during a nsIWebProgressListener.onLocationChange:
  if (!(flags & Ci.nsIWebProgressListener.LOCATION_CHANGE_SAME_DOCUMENT)) {
    docShell.contentViewer.pausePainting();
  }

(with the addition of a one line fix in nsPresShell.cpp to make sure the pausing is honored in some cases).

The downside of this method compared to tweaking the nglayout.initialpaint.delay value is that the painting is programmatically enable/disable by Gaia (and so we can introduce bugs).

The good part is that it is not a timer that may render the page at a random (from the point of view of the app) time as there can be more granularity based on some implicit page events.

An ideal scenario would be an explicit mechanism from the page to inform the platform that it is ready to be painted and displayed.
For example I would be happy with an API accessible from the Web Page that does something like:
 navigator.waitForFirstPaint(); // to inform the page, that the right time to paint is controlled by the page itself
 navigator.doFirstPaint();// to tell the page that the right time is now.



Now, whatever the choosen solution is, the end result will look pretty bad. The clicked page may stay in front of the user for a little while, giving the feeling of an unresponsive application.
One can argue that target page can be optimized deeply (don't get me wrong I'm all for it!), but it will take a while (looking at some comments from Oleg it involves at least 4 0r 5 different teams). And once this page is fixed, we will still have a lot of pages of other apps to fix as well (likely more than 50) :/

The situation will likely be a bit better once we will be able to load a SharedWorker in a decent amount of time (bug 1191926) as it will be less duplicate code in all views.
But I'm a bit skeptic that it will be enough to provide the best possible UX.

This is partly why adding prerendering ability to the platform would make a huge difference for Gaia. It will let it sideload other pages and likely shape hundreds of ms (as demo'ed in the last work week in Paris using the prerendering hack of bug I do have some local patches to expose an API on top of mozbrowser to control painting and avoid this white flash when navigating from a page to another.

The API has 2 methods (on the parent):
 DOMRequest iframe.pausePainting();
 DOMRequest iframe.resumePainting();

Which basically map (on the child) to:
 docShell.contentViewer.pausePainting();
 docShell.contentViewer.resumePainting();

For me it seems to prevent the white background to be painted if docShell.contentViewer.pausePainting() is called during a nsIWebProgressListener.onLocationChange:
  if (!(flags & Ci.nsIWebProgressListener.LOCATION_CHANGE_SAME_DOCUMENT)) {
    docShell.contentViewer.pausePainting();
  }

(with the addition of a one line fix in nsPresShell.cpp to make sure the pausing is honored in some cases).

The downside of this method compared to tweaking the nglayout.initialpaint.delay value is that the painting is programmatically enable/disable by Gaia (and so we can introduce bugs).

The good part is that it is not a timer that may render the page at a random (from the point of view of the app) time as there can be more granularity based on some implicit page events.

An ideal scenario would be an explicit mechanism from the page to inform the platform that it is ready to be painted and displayed.
For example I would be happy with an API accessible from the Web Page that does something like:
 navigator.waitForFirstPaint(); // to inform the page, that the right time to paint is controlled by the page itself
 navigator.doFirstPaint();// to tell the page that the right time is now.



Now, whatever the choosen solution is, the end result will look pretty bad. The clicked page may stay in front of the user for a little while, giving the feeling of an unresponsive application.
One can argue that target page can be optimized deeply (don't get me wrong I'm all for it!), but it will take a while (looking at some comments from Oleg it involves at least 4 0r 5 different teams). And once this page is fixed, we will still have a lot of pages of other apps to fix as well (likely more than 50) :/

The situation will likely be a bit better once we will be able to load a SharedWorker in a decent amount of time (bug 1191926) as it will be less duplicate code in all views.
But I'm a bit skeptic that it will be enough to provide the best possible UX.

This is partly why adding prerendering ability to the platform would make a huge difference for Gaia. It will let it sideload other pages and likely shape hundreds of ms (as demo'ed in the last Paris work week using the prerendering hack of bug 1190805).
(In reply to Oleg Zasypkin [:azasypkin][⏰UTC+3] from comment #10)
> Hey J Ryan, I'm not sure you're the right person to ask about this, but do
> you have any clue why I don't see any network requests for the app://
> resources on device (I have "Disable Cache (when toolbox is opened)"
> checked)?

Unfortunately we don't currently include app:// requests in the Network Monitor.  Bug 1076948 is tracking this issue.
Flags: needinfo?(jryans)
Assignee: ehsandelhi → nobody
Depends on: 1211853
Sorry for the lag here; comment 10 somehow slipped through the cracks, and I wasn't needinfod.  :(

> What Gecko could do is probably - bug 1164389.

Worth checking in more realistic conditions whether we think that would help, yeah.

> Yeah, those are _lazily loaded_ resources

OK.

> Boris, do you have device handy to try there (with the patch from comment 6 + I'll try
> to workout something quickly to optimize resources for split views)?

I have a Flame, but it's also my primary phone, so I'd have to be a bit careful trying things on it and would need very explicit instructions for how to get it quickly into the desired test state and then back into a known-good state.

> Also it seems Julien posted relevant profile from device in comment 3.

Yes, but note that this profile doesn't have the white flashing bit this bug is about either, right?

Basically, what I'm seeing on desktop is that we take so long to actually create the relevant DOM and apply the relevant CSS, due to network latency and the way we perform the network requests, that the 250ms paint suppression timer fires and we start painting the page even though there is nothing to paint there yet.

As Vivien says, we could suppress painting for long enough via various mechanisms, but that just replaces the flash of white with longer display of the previous page.  That might be ok if the longer display is not too long, but it's not clear to me how long it would be in practice for us right now.

So what we should probably measure is two things:

1)  How long actually passes between the user click and the time when the target page feels it would unsuppress painting if it got to control that.

2)  What we're doing during that time (both in terms of CPU usage and in terms of idle time while we wait for something).
There actually is a profile with the white flash in comment 2.

But I'm quite sure the issue here is a sync reflow...
Flags: needinfo?(bzbarsky)
Right, I was focusing on the case where the white flash was not well-understood.
Flags: needinfo?(bzbarsky)
Just a note that split views will not be done in 2.5, so we'll resume work on this post-branching given the other work we have now.
Mass closing of Gaia::SMS bugs. End of an era :(
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Mass closing of Gaia::SMS bugs. End of an era :(
You need to log in before you can comment on or make changes to this bug.