White screen when resume crystalskull/cubevid app

RESOLVED FIXED in Firefox 18

Status

Firefox OS
General
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: pchang, Assigned: bjacob)

Tracking

unspecified
B2G C2 (20nov-10dec)
ARM
Gonk (Firefox OS)

Firefox Tracking Flags

(firefox18 fixed, firefox19 fixed, firefox20 fixed)

Details

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
Long press home key to resume crystalskull/cubevid app and will see the white screen displayed. Need to kill process and re-launch again to make it work.

Device: Otoro/Unagi
Failure rate: High
(Reporter)

Comment 1

5 years ago
Found white screen was caused by invalid WebGL context.

During app was on background, WebGL context was destroyed because memory-pressure events received. Resume app works well when disabling the following code.

+++ b/content/canvas/src/WebGLContext.cpp
@@ -58,8 +58,8 @@ WebGLMemoryPressureObserver::Observe(nsISupports* aSubject,
                                      const char* aTopic,
                                      const PRUnichar* aSomeData)
 {
-  if (strcmp(aTopic, "memory-pressure") == 0)
-    mContext->ForceLoseContext();
+//  if (strcmp(aTopic, "memory-pressure") == 0)
+//    mContext->ForceLoseContext();
   return NS_OK;
 }


Checking why MemoryPressure observer was triggered.
(Reporter)

Comment 2

5 years ago
The memory-pressure event was triggered when app was on background.
Therefore, we need to consider which kind of GL resource need to be released.

ProcessPriorityManager::OnGracePeriodTimerFired()
{ 
  LOG("Grace period timer fired; setting priority to %d.",
      PROCESS_PRIORITY_BACKGROUND); 
...
  if (mgr) {
    mgr->MinimizeMemoryUsage(/* callback = */ nullptr);
  } 
...
}
(Reporter)

Updated

5 years ago
Assignee: nobody → pchang
(Reporter)

Comment 3

5 years ago
Created attachment 689605 [details] [diff] [review]
bug-818766-fix

Avoid destroying GLContext when app goes to the background
Attachment #689605 - Flags: review?(bjacob)
(Assignee)

Comment 4

5 years ago
Comment on attachment 689605 [details] [diff] [review]
bug-818766-fix

Review of attachment 689605 [details] [diff] [review]:
-----------------------------------------------------------------

Losing WebGL contexts aggressively is intentional. This typically frees important amounts of system memory.

The right approach to fixing this bug is rather to let this WebGL application handle the "webglcontextlost" and "webglcontextrestored" events.

When a WebGL context gets lost, it receives the "webglcontextlost" event. If it does preventDefault() on it, it will then normally receive a "webglcontextrestored" event where it can re-construct its WebGL state from scratch (re-load its WebGL textures, buffers etc).

This is all explained here:
http://www.khronos.org/webgl/wiki/HandlingContextLost
Attachment #689605 - Flags: review?(bjacob) → review-
The event being sent here is a gecko implementation detail; it's not triggered by OS memory pressure, but rather when applications go foreground->background.  Given that (as we've discussed before) authors are unlikely to handle this properly, as this code does not, we're guaranteed to completely break pretty much every WebGL application in b2g.

Are you confident enough in contextlost handling to sign off on that?  I'm sure not ;).

When we really are facing OS memory pressure, we'll send an event with a different flag that will still kill the contexts.  I think that's going to cause bad breakage but at least it won't be on every foreground->background transition.  We can deal with that separately.
Also, I don't think we should be aggressively killing webgl contexts for memory savings unless we have a mechanism to detect when applications don't restore the contexts and kill the applications if not.  It's a better UX for an app to "crash" than for it to just break with a blank screen.
(Assignee)

Comment 7

5 years ago
First of all, if the event here is actually an implementation detail that isn't related to real memory pressure, then maybe it shouldn't be called "memory-pressure" ?

That said, I actually believe that it's a good idea to lose all WebGL contexts going to the background *after a certain delay* (like, 15 minutes). We plan to gradually get there on all platforms (even on desktop). This is part of a concerted effort to force Web authors to handle context loss. In that vein, we already lose the oldest-used WebGL context when a certain limit is exceeded (and that limit is low on mobile: 2 per principal, 4 globally). Chrome is starting to do the same.

So it's very unfortunate that we ship code in Gaia that doesn't apply the good practices that we're trying to evangelize, and it would be even worse if this led to backing down on our plan to fire more and more context loss events gradually.

I understand (now) that the B2G specific problem here is that B2G triggers "memory-pressure" abusively. If there is really a good reason why we can't fix that, I'll r+ that patch.
(Assignee)

Comment 8

5 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> Also, I don't think we should be aggressively killing webgl contexts for
> memory savings unless we have a mechanism to detect when applications don't
> restore the contexts and kill the applications if not.  It's a better UX for
> an app to "crash" than for it to just break with a blank screen.

If I understand correctly, you're suggesting a change whereby not restoring a lost WebGL context would eventually lead to generating an exception.

That's... interesting, but I am afraid that that could break a significant amount of code out there.

If you're serious about that, you can start a discussion on the public_webgl list.
(In reply to Benoit Jacob [:bjacob] from comment #7)
> First of all, if the event here is actually an implementation detail that
> isn't related to real memory pressure, then maybe it shouldn't be called
> "memory-pressure" ?

It's piggy-backing on existing infrastructure.

> That said, I actually believe that it's a good idea to lose all WebGL
> contexts going to the background *after a certain delay* (like, 15 minutes).
> We plan to gradually get there on all platforms (even on desktop). This is
> part of a concerted effort to force Web authors to handle context loss. In
> that vein, we already lose the oldest-used WebGL context when a certain
> limit is exceeded (and that limit is low on mobile: 2 per principal, 4
> globally). Chrome is starting to do the same.

That's fine, but we need a plan for the inevitable blank-screens-of-death that will ensue.

> So it's very unfortunate that we ship code in Gaia that doesn't apply the
> good practices that we're trying to evangelize, and it would be even worse
> if this led to backing down on our plan to fire more and more context loss
> events gradually.

This isn't core gaia code, it's a third party demo we imported.  Which is precisely my concern.

Do we have any guess at what % of sites properly handle contextrestore?  I understand and agree with the long-term project, but would you kill webgl on b2g v1 to do it?

> I understand (now) that the B2G specific problem here is that B2G triggers
> "memory-pressure" abusively. If there is really a good reason why we can't
> fix that, I'll r+ that patch.

As I said, there should be two different events generated, one when the app goes foreground->background, and another on "real" OS memory pressure.  But let me confirm that with jlebar.
Flags: needinfo?(justin.lebar+bug)
(In reply to Benoit Jacob [:bjacob] from comment #8)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> > Also, I don't think we should be aggressively killing webgl contexts for
> > memory savings unless we have a mechanism to detect when applications don't
> > restore the contexts and kill the applications if not.  It's a better UX for
> > an app to "crash" than for it to just break with a blank screen.
> 
> If I understand correctly, you're suggesting a change whereby not restoring
> a lost WebGL context would eventually lead to generating an exception.
> 
> That's... interesting, but I am afraid that that could break a significant
> amount of code out there.
> 
> If you're serious about that, you can start a discussion on the public_webgl
> list.

This is purely a UA issue, what do you want to achieve through that discussion?

At any rate, the goal is to improve the UX for sites that don't properly handle contextrestore.  Do you agree that leaving the user staring at a blank non-interactive screen is worse than just closing the page?
There are two tiers of WebGL content in this respect.

Commercial-grade WebGL content like Google Maps does tend to handle WebGL context loss.

Random demos on the Web tend not to.

Obviously I don't want to harm WebGL on B2G v1.

So, OK to make an exception here, but there is a little problem:

(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> As I said, there should be two different events generated, one when the app
> goes foreground->background, and another on "real" OS memory pressure.  But
> let me confirm that with jlebar.

In fact, the about:memory minimize memory usage button also sends "heap-minimize". So yes, the issue preexists. The problem is that it's very useful that this button allows to test WebGL context loss.

So I'd r+ a version of this patch making it apply only to B2G. OK?

On this other topic:

(In reply to Chris Jones [:cjones] [:warhammer] from comment #10)
> (In reply to Benoit Jacob [:bjacob] from comment #8)
> > If you're serious about that, you can start a discussion on the public_webgl
> > list.
> 
> This is purely a UA issue, what do you want to achieve through that
> discussion?

I don't see how this is a UA issue. One of the most fundamental things that a Web API has to specify is under what exact circumstances using that API can interrupt the script's control flow with an exception. This seems to fall right into that category.
> As I said, there should be two different events generated, one when the app goes 
> foreground->background, and another on "real" OS memory pressure.  But let me confirm 
> that with jlebar.

When we hit real OS memory pressure, we fire a memory-pressure event with aData.StartsWith('low-memory') (it's actually "low-memory-no-forward", but you shouldn't rely on that).  See GonkMemoryPressureMonitoring.cpp::MemoryPressureRunnable.

In contrast, when we send a process into the background, we fire a memory-pressure event with aData == 'heap-minimize' (nsMemoryReporterManager::MinimizeMemoryUsageRunnable).

I agree that "memory-pressure" isn't a great name for the second event, but you can distinguish between these two cases.
Flags: needinfo?(justin.lebar+bug)
> The problem is that it's very useful that this button allows to test WebGL context loss.

You can't have it both ways -- either the minimize-memory-usage button triggers context loss and we drop contexts when the process goes into the background, or the button doesn't trigger context loss and we don't drop contexts when the process is bg'ed.

I think this situation is pretty reasonable, myself.  Minimize memory usage is doing exactly what it says.
Yes. That's why I believe that the least bad solution here is to take the present patch but #ifdef b2g. That way, about:memory's button will be broken on B2G in that it won't lose WebGL contexts, but at least WebGL content will run on B2G and about:memory won't be broken outside of B2G.
(In reply to Benoit Jacob [:bjacob] from comment #14)
> Yes. That's why I believe that the least bad solution here is to take the
> present patch but #ifdef b2g.

Our MO has been to add prefs instead of #ifdef b2g's.  That way, if Fennec wants the same behavior, it's a simple change for them.
(In reply to Benoit Jacob [:bjacob] from comment #11)
> There are two tiers of WebGL content in this respect.
> 
> Commercial-grade WebGL content like Google Maps does tend to handle WebGL
> context loss.
> 
> Random demos on the Web tend not to.
> 

On what do you base these statements?

> (In reply to Chris Jones [:cjones] [:warhammer] from comment #10)
> > (In reply to Benoit Jacob [:bjacob] from comment #8)
> > > If you're serious about that, you can start a discussion on the public_webgl
> > > list.
> > 
> > This is purely a UA issue, what do you want to achieve through that
> > discussion?
> 
> I don't see how this is a UA issue. One of the most fundamental things that
> a Web API has to specify is under what exact circumstances using that API
> can interrupt the script's control flow with an exception. This seems to
> fall right into that category.

What I'm suggesting is like a slow-script timeout.  UA mechanisms to kill degenerate content has no business living in specs IMHO.
Before we continue arguing: do you agree with the above proposal to take the patch but #ifdef b2g?

(In reply to Chris Jones [:cjones] [:warhammer] from comment #16)
> (In reply to Benoit Jacob [:bjacob] from comment #11)
> > There are two tiers of WebGL content in this respect.
> > 
> > Commercial-grade WebGL content like Google Maps does tend to handle WebGL
> > context loss.
> > 
> > Random demos on the Web tend not to.
> > 
> 
> On what do you base these statements?

The Google Maps statement? On word from Google; but now I just verified it. Go to maps.google.com, enable MapsGL, zoom to any city with 3D buildings in non-satellite view (e.g. Toronto), go to about:memory in another tab, minimize memory usage, go back to Maps tab, watch it recreate its state.

That said... I then checked the WebGL version of Streetview... and it does not restore :-(

> What I'm suggesting is like a slow-script timeout.  UA mechanisms to kill
> degenerate content has no business living in specs IMHO.

I see. That makes sense. In that case, a dialog offering the user to reload the page/app would make even more sense as that is what would fix it in most cases.
(In reply to Justin Lebar [:jlebar] from comment #15)
> (In reply to Benoit Jacob [:bjacob] from comment #14)
> > Yes. That's why I believe that the least bad solution here is to take the
> > present patch but #ifdef b2g.
> 
> Our MO has been to add prefs instead of #ifdef b2g's.  That way, if Fennec
> wants the same behavior, it's a simple change for them.

That makes sense, let me make a patch.
(In reply to Benoit Jacob [:bjacob] from comment #17)
> Before we continue arguing: do you agree with the above proposal to take the
> patch but #ifdef b2g?
> 
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #16)
> > (In reply to Benoit Jacob [:bjacob] from comment #11)
> > > There are two tiers of WebGL content in this respect.
> > > 
> > > Commercial-grade WebGL content like Google Maps does tend to handle WebGL
> > > context loss.
> > > 
> > > Random demos on the Web tend not to.
> > > 
> > 
> > On what do you base these statements?
> 
> The Google Maps statement?

"commercial-grade content tends to"

> That said... I then checked the WebGL version of Streetview... and it does
> not restore :-(
> 

That doesn't inspire much confidence :(.

> > What I'm suggesting is like a slow-script timeout.  UA mechanisms to kill
> > degenerate content has no business living in specs IMHO.
> 
> I see. That makes sense. In that case, a dialog offering the user to reload
> the page/app would make even more sense as that is what would fix it in most
> cases.

Hm yeah, good idea.
Created attachment 689861 [details] [diff] [review]
add pref for whether to lose contexts on heap-minimize. False on B2G, true elsewhere.
Attachment #689861 - Flags: review?(jgilbert)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #19)
> > > On what do you base these statements?
> > 
> > The Google Maps statement?
> 
> "commercial-grade content tends to"

On this and some more examples I don't remember specifically but it would be easy to check that by using the about:memory button on a few of the more high profile WebGL applications out there.
*sigh* so that was from a conversation with a Google engineer, and indeed the only WebGL application that I can find, that handle it, are Google ones.
Comment on attachment 689861 [details] [diff] [review]
add pref for whether to lose contexts on heap-minimize. False on B2G, true elsewhere.

Review of attachment 689861 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if we default to false everywhere.

::: content/canvas/src/WebGLContext.cpp
@@ +57,5 @@
>  WebGLMemoryPressureObserver::Observe(nsISupports* aSubject,
>                                       const char* aTopic,
>                                       const PRUnichar* aSomeData)
>  {
> +    if (strcmp(aTopic, "memory-pressure"))

Please use != 0 here. Since strcmp returns 0 when things match (aka on success), we want to be explicit that we're returning early if they *don't* match.

@@ +62,5 @@
> +        return NS_OK;
> +
> +    bool wantToLoseContext = true;
> +
> +    if (!nsCRT::strcmp(aSomeData, NS_LITERAL_STRING("heap-minimize").get()))

Same thing about !=0 here.

::: modules/libpref/src/init/all.js
@@ +3664,5 @@
>  pref("webgl.msaa-force", false);
>  pref("webgl.prefer-16bpp", false);
>  pref("webgl.default-no-alpha", false);
>  pref("webgl.force-layers-readback", false);
> +pref("webgl.lose-context-on-heap-minimize", true);

Since this is a pref now, we should just default to true. If we want to lose contexts for testing, we're free to twiddle this pref.
Attachment #689861 - Flags: review?(jgilbert) → review+
That context-lost isn't ever handled is mostly a dev-eng problem. As part of the webgl-adverse-mode I want to add a pref for, we should make it aggressively lose contexts.

There is also discussion about making handling context-lost events mandatory in a later iteration of WebGL.
The only way to make it mandatory is to only hand out a context off a context-restore event on startup.  But that would Break The Web and I don't see a way for the game theory to work out among vendors.
http://hg.mozilla.org/integration/mozilla-inbound/rev/c38be1281766
Assignee: pchang → bjacob
Target Milestone: --- → B2G C2 (20nov-10dec)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #25)
> The only way to make it mandatory is to only hand out a context off a
> context-restore event on startup.  But that would Break The Web and I don't
> see a way for the game theory to work out among vendors.

Exactly. We have been discussing that, also on the topic of asynchronous WebGL context creation. We might push that in WebGL 2. Of course, noone is talking about breaking WebGL 1.

Comment 29

5 years ago
https://hg.mozilla.org/mozilla-central/rev/c38be1281766
https://hg.mozilla.org/mozilla-central/rev/a285d038f2c3
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Let's try to get this uplifted to aurora and gecko-18.
Comment on attachment 689861 [details] [diff] [review]
add pref for whether to lose contexts on heap-minimize. False on B2G, true elsewhere.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): since a few months ago we started losing WebGL contexts on memory pressure. Not per se a Gecko bug, but doesn't fit well with the realities of 1) web page not handling lost context events and 2) B2G generating such memory events frequently.
User impact if declined: WebGL apps/pages will stop rendering on B2G very often, like everytime you go back to the home screen.
Testing completed (on m-c, etc.): Been on m-c for a few days
Risk to taking this patch (and alternatives if risky): Not risky.
String or UUID changes made by this patch: None.
Attachment #689861 - Flags: approval-mozilla-beta?
Attachment #689861 - Flags: approval-mozilla-aurora?
Comment on attachment 689861 [details] [diff] [review]
add pref for whether to lose contexts on heap-minimize. False on B2G, true elsewhere.

go ahead with branch uplift and testing on b2g nightlies.
Attachment #689861 - Flags: approval-mozilla-beta?
Attachment #689861 - Flags: approval-mozilla-beta+
Attachment #689861 - Flags: approval-mozilla-aurora?
Attachment #689861 - Flags: approval-mozilla-aurora+
(Assignee)

Updated

5 years ago
status-firefox18: --- → fixed
status-firefox19: --- → fixed
status-firefox20: --- → fixed
You need to log in before you can comment on or make changes to this bug.