Last Comment Bug 839500 - Gaia's identity.js opens a browser process that's stuck in the foreground. Therefore, if you open the marketplace app, you lose ~10% of available app memory until you reboot the phone.
: Gaia's identity.js opens a browser process that's stuck in the foreground. T...
Status: RESOLVED FIXED
[MemShrink:P1]
:
Product: Firefox OS
Classification: Client Software
Component: Gaia::System (show other bugs)
: unspecified
: All All
: -- normal (vote)
: B2G C4 (2jan on)
Assigned To: Jed Parsons (use needinfo, please) [:jedp, :jparsons]
:
Mentors:
Depends on: 841820 899437
Blocks: slim-fast PayId-v1next 769865 837927 851492
  Show dependency treegraph
 
Reported: 2013-02-08 08:13 PST by Justin Lebar (not reading bugmail)
Modified: 2013-08-01 18:22 PDT (History)
19 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
leo+
wontfix
wontfix
fixed
+
fixed
wontfix
wontfix


Attachments
wip gecko patch (8.77 KB, patch)
2013-02-17 00:04 PST, [:fabrice] Fabrice Desré
no flags Details | Diff | Review
wip gaia patch (1.15 KB, patch)
2013-02-17 00:05 PST, [:fabrice] Fabrice Desré
no flags Details | Diff | Review
wip gecko patch v2 (9.70 KB, patch)
2013-02-19 14:12 PST, [:fabrice] Fabrice Desré
justin.lebar+bug: feedback+
Details | Diff | Review
wip gecko patch v3 (10.50 KB, patch)
2013-02-21 14:44 PST, [:fabrice] Fabrice Desré
jed+bmo: feedback+
Details | Diff | Review
close identity dialog iframe when not in use (gaia) (1.45 KB, patch)
2013-02-24 22:20 PST, Jed Parsons (use needinfo, please) [:jedp, :jparsons]
no flags Details | Diff | Review
close identity dialog iframe when not in use (gecko) (17.46 KB, patch)
2013-02-24 22:21 PST, Jed Parsons (use needinfo, please) [:jedp, :jparsons]
fabrice: feedback+
Details | Diff | Review
close identity dialog iframe when not in use (gecko) (v2) (28.38 KB, patch)
2013-02-25 09:41 PST, Jed Parsons (use needinfo, please) [:jedp, :jparsons]
no flags Details | Diff | Review
close identity dialog iframe when not in use (gecko) (v3) (31.03 KB, patch)
2013-03-11 14:42 PDT, Jed Parsons (use needinfo, please) [:jedp, :jparsons]
fabrice: review+
Details | Diff | Review
close identity dialog iframe when not in use (gecko) (v4) (31.02 KB, patch)
2013-03-12 12:29 PDT, Jed Parsons (use needinfo, please) [:jedp, :jparsons]
benadida: review+
Details | Diff | Review
close identity dialog iframe when not in use (gecko) (v5) (31.10 KB, patch)
2013-03-14 08:00 PDT, Jed Parsons (use needinfo, please) [:jedp, :jparsons]
no flags Details | Diff | Review

Description Justin Lebar (not reading bugmail) 2013-02-08 08:13:04 PST
STR:

 * Connect to wifi (cellular data probably also works)
 * In a terminal,
   
   $ watch -n.5 adb shell b2g-ps --oom

 * Load the marketplace app.
 * Notice that two processes start: The "marketplace" process and a "browser" process.  (1)
 * Press the home button.
 * Notice that the marketplace process's OOM_ADJ decreases to 6, but the browser process's OOM_ADJ does not. (2)

Example output of b2g-ps --oom after end of STR:

> APPLICATION       OOM_ADJ  OOM_SCORE  OOM_SCORE_ADJ  USER     PID  
> b2g                  0        481         0          root      4663
> Cost Control         6        501         400        app_4704  4704
> Homescreen           1        194         67         app_4741  4741
> Marketplace          6        531         400        app_4793  4793
> (Preallocated a      6        449         400        root      4819
> Browser              1        163         67         app_5359  5359

I have no idea if the fact that the marketplace app starts a browser process (1) is a bug.  The fact that this process is stuck in the foreground is definitely a bug, and it's one of the factors causing bug 837927.
Comment 1 Justin Lebar (not reading bugmail) 2013-02-08 08:16:10 PST
Process stuck in the fg should block: It can cause us to be unable to load apps.  It also permanently wastes memory on the phone.

Chris, Fabrice, do you know whom I should CC on this bug?
Comment 2 Jason Smith [:jsmith] 2013-02-08 08:18:09 PST
I wonder if any of this is related to the auto-reload stuff marketplace is doing.
Comment 3 Justin Lebar (not reading bugmail) 2013-02-08 08:44:26 PST
Whoa, this is really bad.  It looks like the marketplace app is able to load a nested child process!  There are about eight separate things about that should make that not possible.
Comment 4 Justin Lebar (not reading bugmail) 2013-02-08 08:46:38 PST
Okay, false alarm, this frame is being loaded by the main process.  That's an improvement.
Comment 5 Justin Lebar (not reading bugmail) 2013-02-08 08:54:12 PST
This browser process is being created as a result of a chrome event coming from the identity module.

> Breakpoint 1, nsFrameLoader::TryRemoteBrowser (this=0x48f99150) at ../../../../../ff-git2
> /src/content/base/src/nsFrameLoader.cpp:1977
> 1977	  nsIDocument* doc = mOwnerContent->GetDocument();
> (gdb) call DumpJSStack()
> 0 onMozChromeEvent() ["app://system.gaiamobile.org/js/identity.js":66]
>     this = [object Object]

But the identity window does not show up.
Comment 6 Justin Lebar (not reading bugmail) 2013-02-08 08:56:27 PST
Yeah, this looks like a Gaia bug in identity.js; search for 'open-id-dialog'.  I have no idea why we should be opening this frame eagerly instead of lazily, but we can't leave it in the foreground like this.
Comment 7 Justin Lebar (not reading bugmail) 2013-02-08 08:58:19 PST
Note that this code is also broken because it does not handle crashes of the communication iframe.
Comment 8 Justin Lebar (not reading bugmail) 2013-02-08 09:00:10 PST
(In reply to Jason Smith [:jsmith] from comment #2)
> I wonder if any of this is related to the auto-reload stuff marketplace is
> doing.

Could you elaborate on this?  I'm scared by the phrase "auto-reload"; we've had a few bugs now which involved that.  :)
Comment 9 Jason Smith [:jsmith] 2013-02-08 09:04:52 PST
Hmm...identity.js is Jed's area of expertise. Jed - Thoughts?
Comment 10 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-08 10:49:34 PST
Hi Jason, hi Justin: I can't speak to the auto-reload question, but as to the identity.js question: That iframe holds state for persona.  Once the iframe is instantiated, it continues to be used as the storage for all navigator.id functions across all apps.  So yes, like the main tab in the browser, once it's there, it does stay there.

If we cannot leave this in the foreground, what do you think would be the best approach?

Is there a way to somehow background a Browser process and wake it up when it's needed?

Otherwise, perhaps there's a way we can freeze and thaw the state?  (Persona uses both localStorage and http cookies.)
Comment 11 Justin Lebar (not reading bugmail) 2013-02-08 13:42:38 PST
(In reply to Jed Parsons [:jparsons] from comment #10)
> Hi Jason, hi Justin: I can't speak to the auto-reload question, but as to
> the identity.js question: That iframe holds state for persona.  Once the
> iframe is instantiated, it continues to be used as the storage for all
> navigator.id functions across all apps.  So yes, like the main tab in the
> browser, once it's there, it does stay there.

There's a big difference here: With the browser app, when I close the browser via the cards view, all of its tabs go away.  In contrast, as far as I can tell, this frame never goes away.

> If we cannot leave this in the foreground, what do you think would be the
> best approach?

Sending it to the background when you can would be a good start...

> Is there a way to somehow background a Browser process and wake it up when
> it's needed?

Sure; you can setVisible(false) on the iframe containing the window.  But sending a frame into the background makes it more likely to be killed, so you'll have to deal with that.  But then if you can handle the tab crashing at random points in time, it's not clear to me why you need it at all.

Note that when a tab crashes because of memory pressure, it's usually a /really/ bad idea to start it again immediately.  That's the "auto-reload" thing I was scared about above; we've tried that twice in Gaia, both times to disastrous results.  :)

I doubt you want this frame to be in the foreground even while the marketplace app is open, because then B2G will have no preference between killing the persona frame and killing the Marketplace app.  In practice, it will probably kill the marketplace app first, because the marketplace will use more memory.
 
> Otherwise, perhaps there's a way we can freeze and thaw the state?  (Persona
> uses both localStorage and http cookies.)

We don't have anything like that, sorry.
Comment 12 Justin Lebar (not reading bugmail) 2013-02-09 17:18:40 PST
> There's a big difference here: With the browser app, when I close the browser via the 
> cards view, all of its tabs go away.

Another big difference, in case it's not obvious, is that all of the browser's processes go into the background when the browser is in the bg, so therefore they are candidates for killing on low-memory.

Since the identity frame is always foreground, it will only be killed when it's using more memory than the current "true" fg process, which should essentially be never.

Therefore this bug is essentially "If you open the marketplace app, you lose 10mb (~10%) of available memory until the next time you reboot your phone."
Comment 13 Justin Lebar (not reading bugmail) 2013-02-09 17:47:21 PST
To be explicit about the user impact here: This is by far the worst reproducible b2g memory bug I am aware of, and it negates a lot of the work we did earlier this year.  I know the schedule is really tight at this point, but I'd hate to ship v1.0.1 with this bug, particularly since partners seem to be deeply concerned about our memory usage.
Comment 14 Ben Adida [:benadida] 2013-02-11 13:26:28 PST
documentation from chat with Justin Lebar:
- 10mb lowerbound for a new process, so there's not much room for us to play with
- foreground vs. background: foreground gets killed as close to last resort as possible when OOM. Multiple processes with foreground priority will use the larger ones to kill, so identity process will never get killed.
- if we labeled it background: process could get CPU-starved. Also, it would get killed more eagerly. Can switch back and forth foreground-to-background.
- assuming we keep the current implementation with OOP frame, we have no choice but to either kill the frame or let the frame be killed to address this issue.
- if we kill the frame, we can do so cleanly, and we will maintain cookies and localStorage.
- if we let the frame be killed, there's a chance that cookies/localStorage might not be fsync'ed, talk to Jason Duell or Jonas.
- another option: move identity frame (at least the invisible one) into main process, because overhead would then be much smaller. But that would likely only be okay if we remove the iframe each time after we're done.

how to run the test: do what you want, then run adb shell and see if the process is around.
Comment 15 Alex Keybl [:akeybl] 2013-02-12 09:04:34 PST
(In reply to Justin Lebar [:jlebar] from comment #13)
> To be explicit about the user impact here: This is by far the worst
> reproducible b2g memory bug I am aware of, and it negates a lot of the work
> we did earlier this year.  I know the schedule is really tight at this
> point, but I'd hate to ship v1.0.1 with this bug, particularly since
> partners seem to be deeply concerned about our memory usage.

If this requires an architectural change to resolve, though, we couldn't block on this bug. What's the proposed short term solution here? We'd need resolution this week.
Comment 16 Ben Adida [:benadida] 2013-02-12 11:30:44 PST
(In reply to Alex Keybl [:akeybl] from comment #15)
> If this requires an architectural change to resolve, though, we couldn't
> block on this bug. What's the proposed short term solution here? We'd need
> resolution this week.

we tried the quick fix yesterday (close the frame and reopen it when needed), and it breaks the second-login flow for reasons we have not yet determined. Could be cookie/localStorage fsync issues. In any case, that means that it would be a risky fix and not doable this week, as best as I can tell.
Comment 17 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-12 11:58:41 PST
(In reply to Ben Adida [:benadida] from comment #14)
> - if we kill the frame, we can do so cleanly, and we will maintain cookies
> and localStorage.

Two things to consider regarding killing and reviving the iframe:

First, both login (request) and logout need to interact with the communication iframe, so after killing it post-login, we'd need to resuscitate it just before logout.  (I think this is the problem we saw when we tried the simple kill approach yesterday; logout didn't log the user out of persona, but there may have been more.)  Also, I'm not sure what's going to happen with the watch() callbacks after we do this.  At the very least, I think it's risky.

Second, it cuts us off from potential future persona features such as 'logout everywhere'.

> - if we let the frame be killed, there's a chance that cookies/localStorage
> might not be fsync'ed, talk to Jason Duell or Jonas.

This will be important to know if we consider adopting the background iframe approach, in which the goal is to make the iframe more easily killable.

> - another option: move identity frame (at least the invisible one) into main
> process, because overhead would then be much smaller. But that would likely
> only be okay if we remove the iframe each time after we're done.

What is the base memory footprint of an in-process iframe?
Comment 18 Justin Lebar (not reading bugmail) 2013-02-14 13:20:04 PST
Sorry, I had a reply written here earlier but I must have midaired Jet and lost it.

> Second, [killing the frame post-login] cuts us off from potential future persona features 
> such as 'logout everywhere'.

Why is this?

> What is the base memory footprint of an in-process iframe?

The footprint itself is very low, but that doesn't tell the whole story.  The iframe is going to run JS, which will affect fragmentation in the JS and C++ heaps even after we close the iframe.  But it's hard to say how much of an impact it would have without trying it first.
Comment 19 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2013-02-14 13:31:11 PST
(In reply to Ben Adida [:benadida] from comment #14)
> - another option: move identity frame (at least the invisible one) into main
> process, because overhead would then be much smaller. But that would likely
> only be okay if we remove the iframe each time after we're done.

Sorry, we can't even consider this.  We already had this discussion  Loading arbitrary code loaded over the network into a root process is just not an option.
Comment 20 Ben Adida [:benadida] 2013-02-14 19:49:03 PST
(In reply to Chris Jones [:cjones] [:warhammer] from comment #19)
> (In reply to Ben Adida [:benadida] from comment #14)
> > - another option: move identity frame (at least the invisible one) into main
> > process, because overhead would then be much smaller. But that would likely
> > only be okay if we remove the iframe each time after we're done.
> 
> Sorry, we can't even consider this.  We already had this discussion  Loading
> arbitrary code loaded over the network into a root process is just not an
> option.

Thanks for the confirmation, Chris, we figured as much, but wanted to list out all potential options for completeness of discussion.
Comment 21 Alex Keybl [:akeybl] 2013-02-15 09:07:26 PST
Tracking but not blocking - please nominate for approval if/when this is resolved. Blocking on an issue like this which will likely require architectural changes and has a high probability of being inappropriate for v1.0.1.
Comment 22 Justin Lebar (not reading bugmail) 2013-02-15 09:16:50 PST
Alex, can you please verify for me that our partners (including hw partners) are aware of this bug and have signed off not blocking v1.0.1 on this?

I ask because I frequently get e-mails from hw partners concerned about our memory usage, and I can't explain my way out of this bug.
Comment 23 Ben Adida [:benadida] 2013-02-15 09:18:41 PST
(In reply to Justin Lebar [:jlebar] from comment #22)
> Alex, can you please verify for me that our partners (including hw partners)
> are aware of this bug and have signed off not blocking v1.0.1 on this?
> 
> I ask because I frequently get e-mails from hw partners concerned about our
> memory usage, and I can't explain my way out of this bug.

Justin: one way, and I suspect it's a fairly accurate way, to look at this is that, currently, our identity solution on B2G requires a 10MB resident process. We will optimize it over time, of course. But you do get a feature in return for this memory usage, it's not just a leak, for example.
Comment 24 Justin Lebar (not reading bugmail) 2013-02-15 09:26:31 PST
> currently, our identity solution on B2G requires a 10MB resident process.

I think the bigger concern for them is one of expectations.  They have tested and come to expect that they have X mb available for apps on B2G.  This bug means that, under relatively common circumstances, they actually have X-10mb available.

That's a serious regression, from their point of view.  And it's seriously our fault that we didn't make them aware of this earlier.  So I want to make sure that we've at least informed them as of now.
Comment 25 Ben Adida [:benadida] 2013-02-15 09:29:15 PST
(In reply to Justin Lebar [:jlebar] from comment #24)
> I think the bigger concern for them is one of expectations.  They have
> tested and come to expect that they have X mb available for apps on B2G. 
> This bug means that, under relatively common circumstances, they actually
> have X-10mb available.

I'm not sure I understand this. They tested without logging into marketplace? Is that a valid test?

I understand about perception, but I don't think we should try to meet an expectation that marketplace and identity, in particular persistent identity across apps and marketplace sessions, can be had with no extra memory usage.
Comment 26 Alex Keybl [:akeybl] 2013-02-15 09:33:05 PST
(In reply to Justin Lebar [:jlebar] from comment #22)
> Alex, can you please verify for me that our partners (including hw partners)
> are aware of this bug and have signed off not blocking v1.0.1 on this?

Our partners are not in a position to make a call on whether individual memory bugs should block, but rather whether the overall memory usage and LMK experience is acceptable. Lucas has been working on collecting that information (and criteria).
Comment 27 Justin Lebar (not reading bugmail) 2013-02-15 09:39:21 PST
All I would like is to make sure that our partners are aware that their expectation that we have X mb available for app is now X-10mb, if their expectations had been built around tests which did not involve the marketplace app.

I would like us to communicate this explicitly to them, because this bug represents a major regression from our previous mutual understandings.  Can we do that, please?
Comment 28 Ben Adida [:benadida] 2013-02-15 09:43:23 PST
(In reply to Justin Lebar [:jlebar] from comment #27)
> All I would like is to make sure that our partners are aware that their
> expectation that we have X mb available for app is now X-10mb, if their
> expectations had been built around tests which did not involve the
> marketplace app.
> 
> I would like us to communicate this explicitly to them, because this bug
> represents a major regression from our previous mutual understandings.  Can
> we do that, please?

I leave that decision to Alex and Lucas, since I don't know what those conversations have been.

But I would like to add that if we communicated to partners a memory expectation from tests that did not involve marketplace and identity, then *we* made a big mistake, and we need to fix our process for testing and expectation setting that takes into account complete use cases. Who should take the lead on this? Is this a QA task?
Comment 29 Justin Lebar (not reading bugmail) 2013-02-15 09:50:29 PST
> But I would like to add that if we communicated to partners a memory expectation from tests that did 
> not involve marketplace and identity, then *we* made a big mistake

I totally agree.  We should have found this bug months ago.  In fact, we never should have written the code this way.

I think this was mostly a matter of not having the right people cc'ed on the bug where we created the identity frame.  Someone signed off on mozbrowser usage who wasn't entirely familiar with the consequences of using the API (or didn't think of them at the time).

This is very difficult to test from a black-box perspective, so I don't think it's necessarily appropriate to put it on QA.
Comment 30 Ben Adida [:benadida] 2013-02-15 10:02:42 PST
(In reply to Justin Lebar [:jlebar] from comment #29)
> In fact, we never should have written the code this way.

How can you make this statement when we don't even know yet *how* we might build it differently while retaining the features we need?

Service integration is super tricky. Memory usage is one important issue, but it is by no means the only one. You're assuming we could simply compromise on features to optimize memory. I disagree. Sometimes we have to use memory to get features we need. Where we draw the line is an art, not a science, and we have to keep exploring better ways to successfully integrate services.

But when you assume that we obviously should have done this differently, you're assuming the only hard thing is memory management. It's not. There are other competing interests, and sometimes memory will lose out.
Comment 31 Justin Lebar (not reading bugmail) 2013-02-15 10:15:01 PST
I was tasked with reducing b2g's memory usage a few months ago.  It was one of the highest priorities in the B2G project at the time.

I /never/ would have accepted this change as-is.  You'll have to trust me when I say that I understand there's a trade-off between memory usage and features; I've been living that for the past eighteen months.

This change literally wipes out roughly half of all of the B2G memory work we've done.  Those changes represents years of person-hours.  I'm not going to agree that this is somehow an acceptable trade-off.
Comment 32 Lucas Adamski [:ladamski] 2013-02-15 16:03:38 PST
10 megs seems a lot for a background service of moderate complexity, so we need to find a way to get this down.  I agree that its probably very late to take on this sort of risk for 1.0.1, but we should strive to fix this for leo.

I don't think having an entire process to host a persistent remote frame is a  performant pattern for services in general.  The inability to kill it at will is particularly troubling.

Persistent services should be local and lightweight to minimize resource and network consumption.
Comment 33 Jason Smith [:jsmith] 2013-02-15 16:29:35 PST
Noming for leo per comment 32.
Comment 34 [:fabrice] Fabrice Desré 2013-02-17 00:04:44 PST
Created attachment 714878 [details] [diff] [review]
wip gecko patch

In which we dispatch a mozChromeEvent when the last user of navigator.id.watch() disappears. With the matching gaia patch, this kills the "Browser" process when we close the marketplace application.

This patch probably doesn't do the right thing in non-OOP mode, I'll take a look at that later.
Comment 35 [:fabrice] Fabrice Desré 2013-02-17 00:05:58 PST
Created attachment 714879 [details] [diff] [review]
wip gaia patch

In which we close the hidden identity iframe when we get the 'identity-cleanup' mozChromeEvent.
Comment 36 [:fabrice] Fabrice Desré 2013-02-19 14:12:25 PST
Created attachment 715703 [details] [diff] [review]
wip gecko patch v2

Should now work with non-OOP cases. I can't test on b2g desktop because the online/offline detection is broken there.
Comment 37 Justin Lebar (not reading bugmail) 2013-02-20 10:40:22 PST
Comment on attachment 715703 [details] [diff] [review]
wip gecko patch v2

I just tested this, and it seems to work as advertised.

One issue is that the identity process doesn't go away until the app that's using it is killed.  So this fg frame stays around until we kill the marketplace app due to low memory.  This is not ideal, but I don't think it will cause big problems.
Comment 38 [:fabrice] Fabrice Desré 2013-02-20 10:46:19 PST
(In reply to Justin Lebar [:jlebar] from comment #37)
> Comment on attachment 715703 [details] [diff] [review]
> One issue is that the identity process doesn't go away until the app that's
> using it is killed.  So this fg frame stays around until we kill the
> marketplace app due to low memory.  This is not ideal, but I don't think it
> will cause big problems.

Thanks for testing Justin! Indeed we keep the frame open as long as they are active watchers from the API. If the marketplace can release them early (which I have no idea if it's feasible) we will close the identity frame earlier also.
Comment 39 Ben Adida [:benadida] 2013-02-20 11:58:35 PST
(In reply to Justin Lebar [:jlebar] from comment #37)
> I just tested this, and it seems to work as advertised.

Thanks Fabrice and Justin, we'll take a look to confirm this ASAP. Jed is out sick today, but we'll do this soon.

> One issue is that the identity process doesn't go away until the app that's
> using it is killed.  So this fg frame stays around until we kill the
> marketplace app due to low memory.  This is not ideal, but I don't think it
> will cause big problems.

To be clear, again, this is not a bug. The frame has to stick around to support logout functionality. It is the cost of the feature.

Fabrice may have figured out a good approach for when no app is using Identity, so that's fantastic. But while an app that uses identity is running, the frame must stick around.
Comment 40 Ben Adida [:benadida] 2013-02-20 12:01:31 PST
(In reply to Lucas Adamski from comment #32)
> 10 megs seems a lot for a background service of moderate complexity, so we
> need to find a way to get this down.

Agreed, and I suggest we look at the whole stack of issues, including in particular the cookie partitioning, which is the reason we have to do this in the system app to begin with. Being able to stay in the app's process while touching shared storage, that would make life a LOT easier.

This is a longer term issue, which will need significant help from platform team.

> I don't think having an entire process to host a persistent remote frame is
> a  performant pattern for services in general.

Actually, I'd like to make sure we look at how we can make this a more general pattern. It has many desirable architectural properties. But that is a broader discussion.
Comment 41 Justin Lebar (not reading bugmail) 2013-02-20 12:16:33 PST
> To be clear, again, this is not a bug. The frame has to stick around to support logout 
> functionality. It is the cost of the feature.

Can you explain why we couldn't reload the frame at the point that we wanted to log out?

There is simply no guarantee that this frame will not have crashed at the time you want to log out, so correct code is going to have to be able to handle this frame going away at any point in time.  Given that, I don't understand why it should be impossible to intentionally destroy the frame.

I understand that features use memory, and I've been in the position myself of having to defend a feature's memory usage against attack.  But I'd really appreciate if you'd work with me in trying to figure out how to reduce the memory usage of this feature in the future.  If every time a feature used 10mb of memory we said that was simply a cost of the feature, not a bug, nothing to be done about it, we'd quickly have no memory left on the B2G phones.  Making a usable product on these devices requires cooperation from everyone, at all levels of the product.
Comment 42 Ben Adida [:benadida] 2013-02-20 12:23:29 PST
(In reply to Justin Lebar [:jlebar] from comment #41)
> I understand that features use memory, and I've been in the position myself
> of having to defend a feature's memory usage against attack.  But I'd really
> appreciate if you'd work with me in trying to figure out how to reduce the
> memory usage of this feature in the future.

I believe we are working with you. But we're going to keep advocating for the features, and it's perfectly fine by me if you keep advocating for memory (as long as you stop calling Identity a "regression.")

There are really good reasons for the frame to stick around and not be reloaded on every action, in particular actual user-perceived performance and responsiveness. Fabrice's solution, if it works, is a really nice way to slice it that may have zero downsides. I'm all for it (once we test it.)

I expect we'll find the right middle ground this way.
Comment 43 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-21 09:33:20 PST
(In reply to Fabrice Desré [:fabrice] from comment #36)
> Created attachment 715703 [details] [diff] [review]
> wip gecko patch v2
> 
> Should now work with non-OOP cases. I can't test on b2g desktop because the
> online/offline detection is broken there.

Thanks for the patch, Fabrice.  I'm having trouble getting a device build working with latest b2g18 and v1-train (no keyboard).  I'll test this as soon as I can get that working.
Comment 44 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-21 12:37:17 PST
Hmm, unfortunately I'm not having success with this patch; When I run the navigator.mozId tests in the UI Tests app, which lets you execute a variety of RP calls, I get an error that the message manager for 'watch' is undefined:

E/GeckoConsole(  428): [JavaScript Error: "TypeError: this._mm is undefined" {file: "resource://gre/modules/DOMIdentity.jsm" line: 125}]

I have reconfirmed that without the patch, I don't get the error.
Comment 45 [:fabrice] Fabrice Desré 2013-02-21 14:44:26 PST
Created attachment 716802 [details] [diff] [review]
wip gecko patch v3

Fixed a stupid error.
Comment 46 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-21 15:48:22 PST
Thanks, Fabrice, that certainly fixes the undefined variable error.  But now when I test by quitting to the homescreen from the midst of one signin flow and starting a second flow, the second flow gets stuck.

STR
- in the browser, visit http://notoriousb2g.123done.org
- click sign in
- when the trusted ui appears, escape to the home screen
- open UI Tests
- go to navigator.mozId tests
- click request assertion

The flow in the second dialog will be stuck.  Even after escaping back to the homescreen, going back to notoriousb2g.123done, and finishing the signin, the UI Tests app will continue to be stuck.

(for reference, there's a marketplace story in bug #825362 about the need to quit and resume trusted UI flows)
Comment 47 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-22 10:31:31 PST
(In reply to Justin Lebar [:jlebar] from comment #41)
> Can you explain why we couldn't reload the frame at the point that we wanted
> to log out?

Justin and Ben, I think now that I may have been wrong when I said this would be a problem.  I've got a better prototype working that kills the iframe successfully with no state lost after each dom call; I will adapt it to b2g now and see how it works.  The reference-counting logic of Fabrice's patch will be crucial to make sure that one flow cannot destroy another in-process flow, so the work that's been done so far was certainly not wasted.

Fabrice, so that we don't duplicate work, how would you like to proceed?  Do you want to reassign this bug to me?
Comment 48 [:fabrice] Fabrice Desré 2013-02-22 11:57:13 PST
I have no idea what your patch is doing, if it's a replacement of mine or if they are both needed. But, if my patch is enough to fix the current issue from this bug, you should just review it and we have to push that as soon as possible.
Comment 49 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-24 22:20:33 PST
Created attachment 717742 [details] [diff] [review]
close identity dialog iframe when not in use (gaia)
Comment 50 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-24 22:21:33 PST
Created attachment 717744 [details] [diff] [review]
close identity dialog iframe when not in use (gecko)
Comment 51 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-24 22:42:40 PST
Fabrice, I've adapted and slightly modified your patches to create two new patches that close the identity dialog iframe when it is not in use.  It no longer holds indefinitely onto the chunk of memory used by the hidden iframe.

I've taken the refcount code from your patch, but I've moved it out of the DOM modules into the 'pipe' that the SignInToWebsite module uses to communicate with the UI (since I feel that the iframe is an implementation detail that the DOM code should not know about).  The pipe adds and removes watchers when it is opened and closed.  So the iframes can be cleaned up (and re-created as necessary) before uninit is called in the DOM module.

Looking at this with "watch -n.5 adb shell b2g-ps --oom" now, I see that the iframe goes away immediately after the persona API watch(), request(), and logout() calls.  Additionally, it is possible to have multiple Trusted UIs open concurrently with half-completed authentication: They do not cancel one another.

Can you give me your feedback on this?  Many thanks.
Comment 52 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-24 22:46:05 PST
Comment on attachment 716802 [details] [diff] [review]
wip gecko patch v3

This is great.  The main thing I would change here is to move the tabluation of watchers out of the dom module and into the SignInToWebsite controller; I don't think dom should know about how the persona ui is implemented - whether with an iframe as now, or with some other native code.

Additionally, I would propose a patch in which we can open and close the hidden iframe with each call to the pipe's communicate() method.  I've tested this and it seems to work.
Comment 53 [:fabrice] Fabrice Desré 2013-02-24 23:07:12 PST
Comment on attachment 717744 [details] [diff] [review]
close identity dialog iframe when not in use (gecko)

Review of attachment 717744 [details] [diff] [review]:
-----------------------------------------------------------------

That looks mostly ok, but there are a couple of issues that make we wonder how you tested this (the Identity:RP:Unwatch is never sent).

Also, there should not be any explicit references to "gaia" anywhere in gecko. This should just be "content".

::: b2g/components/SignInToWebsite.jsm
@@ +156,5 @@
> +        log("unwatching", options.id);
> +        this._removeWatchers(options.id, options.messageManager);
> +        break;
> +    }
> +  }, 

Nit: trailing whitespace

@@ +168,5 @@
> +        return;
> +      }
> +    }
> +    this._watchers.push({id: aId, count: 1, mm: aMm});
> +  }, 

Nit: trailing whitespace

@@ +184,5 @@
> +    }
> +
> +    if (index !== -1) {
> +      if (checkId) {
> +        if (--(this._watchers[index]) === 0) {

this should be this.watchers[index].count

@@ +201,5 @@
> +      log('telling gaia to close the dialog');
> +      // tell gaia to close the dialog
> +      GaiaInterface.sendChromeEvent(detail);
> +    }
> +  }, 

Nit: trailing whitespace

@@ +240,5 @@
>        let detail = {
> +        type: kDoneIdentityDialog,
> +        showUI: aOptions.showUI || false,
> +        id:  kDoneIdentityDialog + "-" + uuid,
> +        requestId: aOptions.rpOptions.id 

Nit: trailing whitespace

::: dom/identity/nsDOMIdentity.js
@@ +530,5 @@
> +      "Identity:RP:Unwatch",
> +      {id: this._id}
> +    );
> +  }
> +

This function is never called because you forgot to hook it in the inner-window-destroyed observer.

::: toolkit/identity/MinimalIdentity.jsm
@@ +137,5 @@
>      Services.obs.notifyObservers({wrappedJSObject: options},"identity-controller-watch", null);
>    },
>  
> +  /*
> +   * The RP has gone away; we can remove remove handles to the 

nit: trailing whitespace

@@ +192,5 @@
> +      if (this._rpFlows[key]._mm === messageManager) {
> +        log("** child process shutdown for rp", key, "- deleting flow");
> +        delete this._rpFlows[key];
> +      }
> +    }.bind(this));

Nit: no need to bind, just do forEach(function() {}, this);
Comment 54 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-25 09:39:09 PST
(In reply to Fabrice Desré [:fabrice] from comment #53)
> Comment on attachment 717744 [details] [diff] [review]
> close identity dialog iframe when not in use (gecko)
> 
> Review of attachment 717744 [details] [diff] [review]:
> -----------------------------------------------------------------

Thanks for the speedy feedback, Fabrice!

> That looks mostly ok, but there are a couple of issues that make we wonder
> how you tested this (the Identity:RP:Unwatch is never sent).

You're right: I failed to wire this up to the inner-window-destroyed observer.  Fixed.  Because of the way the communication pipe between chrome and content cleans up the iframe after each interaction with the iframe, it was already cleaned up by the time inner-window-destroyed was received.

I do need to create some tests that crash the requesting app to verify that the iframes and trusted ui close when the parent crashes.

Regarding testing, I have marionette tests that exercise an identity component of the UI Tests:
https://github.com/jedp/gaia-ui-tests/commit/6d8674d6418f072943d9bead8703c182c64e9c81

Additionally, I test manually to ensure that multiple tabs in the browser behave correctly with the same RP.  I also ensure that we can have multiple authentication flows at a time.  (I will try to marionettify these tests, too.)

> Also, there should not be any explicit references to "gaia" anywhere in
> gecko. This should just be "content".

Fixed

> Nit: trailing whitespace

I've re-linted and fixed all.
 
> @@ +184,5 @@
> > +    }
> > +
> > +    if (index !== -1) {
> > +      if (checkId) {
> > +        if (--(this._watchers[index]) === 0) {
> 
> this should be this.watchers[index].count

Wow.  Epic fail.  Thanks for catching that.
 
> ::: dom/identity/nsDOMIdentity.js
> @@ +530,5 @@
> > +      "Identity:RP:Unwatch",
> > +      {id: this._id}
> > +    );
> > +  }
> > +
> 
> This function is never called because you forgot to hook it in the
> inner-window-destroyed observer.

Thank you; fixed.

> @@ +192,5 @@
> > +      if (this._rpFlows[key]._mm === messageManager) {
> > +        log("** child process shutdown for rp", key, "- deleting flow");
> > +        delete this._rpFlows[key];
> > +      }
> > +    }.bind(this));
> 
> Nit: no need to bind, just do forEach(function() {}, this);

Thanks.  Fixed here and elsewhere in dom/identity.
Comment 55 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-02-25 09:41:25 PST
Created attachment 717932 [details] [diff] [review]
close identity dialog iframe when not in use (gecko) (v2)
Comment 56 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-07 09:41:54 PST
Just checking in; I have a patch that does what I want, except it appears to get the RP stuck if navigator.id api is first called if the device is offline.  Since this is a likely occurrence, I want to fix this before submitting for review.  I hope to have it done by the weekend.
Comment 57 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-07 09:42:54 PST
Argh!  Why did the flag reset to leo? - I can't change it back to leo+
Comment 58 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-11 14:42:11 PDT
Created attachment 723652 [details] [diff] [review]
close identity dialog iframe when not in use (gecko) (v3)

The problem described in Comment #56 (navigator.id.watch() hangs when network drops) was not actually introduced by this patch, so I'm going to focus on it instead in bug 769865.

The attached gecko patch closes the trusted ui and persona flow if the parent process is killed.

Fabrice, can I ask for your review?  Cheers,
j
Comment 59 [:fabrice] Fabrice Desré 2013-03-11 17:55:11 PDT
Comment on attachment 723652 [details] [diff] [review]
close identity dialog iframe when not in use (gecko) (v3)

Review of attachment 723652 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with nits fixed

::: dom/identity/DOMIdentity.jsm
@@ +253,5 @@
> +  _unwatch: function DOMIdentity_unwatch(message, targetMM) {
> +    IdentityService.RP.unwatch(message.id, targetMM);
> +  },
> +
> +  _request: function DOMIdentity__request(message, targetMM) {

You don't use targetMM, please remove it.

::: dom/identity/nsDOMIdentity.js
@@ +527,5 @@
> +  uninit: function DOMIdentity_uninit() {
> +    this._log("nsDOMIdentity uninit()");
> +    this._identityInternal._mm.sendAsyncMessage(
> +      "Identity:RP:Unwatch",
> +      {id: this._id}

nit: { id: this._id }

@@ +615,2 @@
>      ];
>      this._messages.forEach((function(msgName) {

Nit : you don't need the () around function(msgName) anymore

::: toolkit/identity/MinimalIdentity.jsm
@@ +60,5 @@
>      }
>    });
>  
> +  // check validity of message structure
> +  if ((typeof options.id === 'undefined') || 

nit: trailing whitespace
Comment 60 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-12 12:28:42 PDT
(In reply to Fabrice Desré [:fabrice] from comment #59)
> Comment on attachment 723652 [details] [diff] [review]
> close identity dialog iframe when not in use (gecko) (v3)
> 
> Review of attachment 723652 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me with nits fixed

Thanks for your review, Fabrice.

> ::: dom/identity/DOMIdentity.jsm
> @@ +253,5 @@
> > +  _unwatch: function DOMIdentity_unwatch(message, targetMM) {
> > +    IdentityService.RP.unwatch(message.id, targetMM);
> > +  },
> > +
> > +  _request: function DOMIdentity__request(message, targetMM) {
> 
> You don't use targetMM, please remove it.

Quite so.  Removed.

> ::: dom/identity/nsDOMIdentity.js
> @@ +527,5 @@
> > +  uninit: function DOMIdentity_uninit() {
> > +    this._log("nsDOMIdentity uninit()");
> > +    this._identityInternal._mm.sendAsyncMessage(
> > +      "Identity:RP:Unwatch",
> > +      {id: this._id}
> 
> nit: { id: this._id }

Fixed

> @@ +615,2 @@
> >      ];
> >      this._messages.forEach((function(msgName) {
> 
> Nit : you don't need the () around function(msgName) anymore

Yup.  Fixed.

> ::: toolkit/identity/MinimalIdentity.jsm
> @@ +60,5 @@
> >      }
> >    });
> >  
> > +  // check validity of message structure
> > +  if ((typeof options.id === 'undefined') || 
> 
> nit: trailing whitespace

Removed
Comment 61 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-12 12:29:59 PDT
Created attachment 724095 [details] [diff] [review]
close identity dialog iframe when not in use (gecko) (v4)

Updated according to fabrice's comments.

Ben, can I ask you for final approval?
Comment 62 Ben Adida [:benadida] 2013-03-13 17:11:44 PDT
Comment on attachment 724095 [details] [diff] [review]
close identity dialog iframe when not in use (gecko) (v4)

Review of attachment 724095 [details] [diff] [review]:
-----------------------------------------------------------------

One comment below. If it's easy to handle, I would prefer it be done, but if it's a real pain, this is not critical and I'm okay with this patch. r=me.

::: b2g/components/SignInToWebsite.jsm
@@ +409,5 @@
>    },
>  
>    doWatch: function SignInToWebsiteController_doWatch(aRpOptions) {
>      // dom prevents watch from  being called twice
> +    let options = {

Initially, I preferred seeing rpOptions from other options as separate arguments for the sake of defensive coding, as those have different trust properties: we trust the RP less than the chrome, of course. I would prefer that we stick to that defensive approach unless that's a big pain.

(this happens a few times, of course, every time we call pipe.communicate)
Comment 63 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-14 07:55:58 PDT
(In reply to Ben Adida [:benadida] from comment #62)
> Comment on attachment 724095 [details] [diff] [review]
> close identity dialog iframe when not in use (gecko) (v4)
> 
> Review of attachment 724095 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> One comment below. If it's easy to handle, I would prefer it be done, but if
> it's a real pain, this is not critical and I'm okay with this patch. r=me.
> 
> ::: b2g/components/SignInToWebsite.jsm
> @@ +409,5 @@
> >    },
> >  
> >    doWatch: function SignInToWebsiteController_doWatch(aRpOptions) {
> >      // dom prevents watch from  being called twice
> > +    let options = {
> 
> Initially, I preferred seeing rpOptions from other options as separate
> arguments for the sake of defensive coding, as those have different trust
> properties: we trust the RP less than the chrome, of course. I would prefer
> that we stick to that defensive approach unless that's a big pain.
> 
> (this happens a few times, of course, every time we call pipe.communicate)

Good point!  Fixed.
Comment 64 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-14 08:00:13 PDT
Created attachment 724915 [details] [diff] [review]
close identity dialog iframe when not in use (gecko) (v5)

r=fabrice, r=benadida
Comment 65 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-14 08:06:00 PDT
I've opened gaia v1-train PR https://github.com/mozilla-b2g/gaia/pull/8648

This gecko patch wants to land together with that gaia patch (same as attachment above).  I hope I'm doing that right; it's been a while since I've landed a b2g patch for both gaia and gecko.  Thanks!
Comment 66 Ryan VanderMeulen [:RyanVM] 2013-03-14 12:34:03 PDT
Gecko patch:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3da6a0ba66ff

Leaving checkin-needed set for the Gaia PR.
Comment 67 Jed Parsons (use needinfo, please) [:jedp, :jparsons] 2013-03-14 12:56:37 PDT
Here is the PR for gaia master: https://github.com/mozilla-b2g/gaia/pull/8654
Comment 68 Ryan VanderMeulen [:RyanVM] 2013-03-15 05:05:40 PDT
https://hg.mozilla.org/mozilla-central/rev/3da6a0ba66ff
Comment 69 Ryan VanderMeulen [:RyanVM] 2013-03-15 05:25:43 PDT
Yay m-cMerge. Comment #67 needs attention as soon as possible.
Comment 70 Julien Wajsberg [:julienw] 2013-03-15 05:53:31 PDT
gaia master: 19d99d5882c1571c63e7f2637f0dc8a64cda7099
Comment 71 Ryan VanderMeulen [:RyanVM] 2013-03-15 06:16:01 PDT
https://hg.mozilla.org/releases/mozilla-b2g18/rev/78be5f74e410

Leaving the status flag set to affected for the v1-train uplift.
Comment 72 Julien Wajsberg [:julienw] 2013-03-15 06:20:23 PDT
gaia v1-train: 4724f7873ead5c4c8863173a46cf98deedb2f198

Note You need to log in before you can comment on or make changes to this bug.