App objects on the child processes sometimes are created with a wrong state

RESOLVED FIXED in mozilla34

Status

RESOLVED FIXED
5 years ago
a year ago

People

(Reporter: carmen.jimenezcabezas, Assigned: ferjm)

Tracking

({regression, verifyme})

unspecified
mozilla34
regression, verifyme
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 37 obsolete attachments)

13.58 KB, patch
amac
: review+
Details | Diff | Splinter Review
56.68 KB, patch
amac
: review+
Details | Diff | Splinter Review
(Reporter)

Description

5 years ago
STR:
1. Connect to any page that installs apps (for example, Mozilla marketplace).
2. Click Install
3. Kill the installer app (marketplace, browser, whatever) before the confirm dialog appears. Note that while I'm killing it intentionally, it could be killed by OOM
4. Click Install on the dialog

Expected results (and what happened before bug 873567 landed):

The app is installed correctly

Actual results:

The app icon appears (or not, half of the times I don't even get the icon, but still the app *is* added to webapps) but the app never finishes installing (in fact it never ever starts downloading). The only thing the user can do (if he got lucky and actually got the icon) is to 'deinstall' the app and try again.

If he didn't get the icon, then any further attempts to install the app will fail silently (since it'll be considered a reinstall). After a reboot of the phone, the icons appear and then he can deinstall the app and try again.
Blocks: 873567
No longer depends on: 873567
The solution for 893800 depends on the solution taken here. FWIW, I don't like much the solution taken on bug 873567. If the intention was to ensure that the onsuccess handler had time to be defined no matter the download speed I think a better approach would have been to queue the event on the child process in the case it was received before the handler was set, and then fire the events immediately if the event handler is set after the event has happened. That way the actual download flow isn't impacted by the child process.
Blocks: 893800
Besides breaking this, the original patch fixed it so the tests run but didn't fix the underlying problem (it's still possible to miss on critical events, leaving the installed apps in funny states till the phone is rebooted).

I think we should revert the change in [1], and then we can fix the original problems by one of those two ways(that I can see):

1. Change Webapps.js so the one-shot event semantics is as follows:
"This event handler will fire when the event happens. In case the event has already happened, it will fire immediately" 

instead of the current one which is: 

"This event handler will fire when the event happens. If the event has already
happened, it will not fire at all"


or 

2. 
  a. Change Gaia so whenever a installPackage/install is called, or wherever a apps.mgmt.onwhatever is defined, the code checks if the event has already happened and calls the event manually. That is, do something like:

var handler = function () {
  // We might need to check if this is the first time we're called or not
}

app.ondownloadapplied = handler();

if (app.installState === 'installed') { 
  // Maybe we set the handler too late... let's call it just in case
  handler();
  // And we unset it.. this should avoid it being called
  app.ondownloadapplied = null;
}

for all the handlers (also the onsuccess, onwhatever).

b. Change the unit tests so they take the current DOMRequest semantics into account.


I've been talking about this with Fabrice on IRC, pending on his decision on which way to go. 


[1] https://hg.mozilla.org/mozilla-central/rev/ce95e81ebee8
Assignee: nobody → amac
Flags: needinfo?(fabrice)
Created attachment 798607 [details] [diff] [review]
Part 1: Rollback the Install:Return:Ack related changes

This just rolls back the changes introduced on Bug 873567 and Bug 839810 that queued the downloads until the child process/launcher returned an ACK.

Note that this patch by itself would fix the problem described in c0, but it would generate again the problems that were fixed on the rolled back bugs.
Attachment #798607 - Flags: feedback?(ferjmoreno)
Attachment #798607 - Flags: feedback?(fabrice)
Created attachment 798608 [details] [diff] [review]
Part 2. Make app event handlers fire immediately when set if already should have fired

This makes all the apps.onwhatever fire immediately when they're set if the related event has happened already. This by itself fixes part of the problems with the tests that Bug 873567 and Bug 839810 fixed originally. The rest of the problems are fixed by fixing the tests themselves.
Attachment #798608 - Flags: review?(ferjmoreno)
Attachment #798608 - Flags: review?(fabrice)
Attachment #798607 - Flags: review?(ferjmoreno)
Attachment #798607 - Flags: review?(fabrice)
Attachment #798607 - Flags: feedback?(ferjmoreno)
Attachment #798607 - Flags: feedback?(fabrice)
Attachment #798607 - Flags: feedback?
Created attachment 798609 [details] [diff] [review]
WIP Part 3. Fixed unit tests for apps

And this fixes the texts for the DOMRequests that I didn't touch in part 2 (DOMRequest events only fire if the handler is set before the event happens).

This is marked as a WIP because I haven't ran the tests yet, so it's up just for feedback.
Attachment #798609 - Flags: feedback?(ferjmoreno)
Attachment #798609 - Flags: feedback?(fabrice)
Comment on attachment 798609 [details] [diff] [review]
WIP Part 3. Fixed unit tests for apps

Review of attachment 798609 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/apps/tests/test_app_update.html
@@ +78,5 @@
>  
>      var request = navigator.mozApps.install(gHostedManifestURL);
> +    request.onerror = mozAppsError.bind(undefined, request);
> +    request.onsuccess = continueTest.bind(undefined, request);
> +    setTimeout(function(request) {

Why is setTimeout needed here and other locations in this file?
(In reply to Jason Smith [:jsmith] from comment #6)
> Comment on attachment 798609 [details] [diff] [review]
> WIP Part 3. Fixed unit tests for apps
> 
> Review of attachment 798609 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/apps/tests/test_app_update.html
> @@ +78,5 @@
> >  
> >      var request = navigator.mozApps.install(gHostedManifestURL);
> > +    request.onerror = mozAppsError.bind(undefined, request);
> > +    request.onsuccess = continueTest.bind(undefined, request);
> > +    setTimeout(function(request) {
> 
> Why is setTimeout needed here and other locations in this file?
Because the original tests used yield/iterator.next as a way to control de flow and wait for event handlers to fire. But the problem is that with the DOMRequests the handlers are only called if they're set before the event happens. So it's possible that on some cases the events won't get fired and the original tests would hang. And you cannot call iterator.next on the same flow that has the yield (cause it gives an exception). setTimeout is just a nice trick to get around that and to keep the flow going even when the handlers don't fire.
Comment on attachment 798607 [details] [diff] [review]
Part 1: Rollback the Install:Return:Ack related changes

Review of attachment 798607 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/apps/src/Webapps.jsm
@@ +2001,5 @@
>        aData.app[aProp] = appObject[aProp];
>       });
>  
>      if (manifest.appcache_path) {
> +      this.startOfflineCacheDownload(manifest, app, aPprofileDir);

This is aProfileDir... seems I missed a qref, sorry about that.
(Assignee)

Comment 9

5 years ago
Comment on attachment 798608 [details] [diff] [review]
Part 2. Make app event handlers fire immediately when set if already should have fired

Review of attachment 798608 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.

Just remember to also change _onprogress at https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.js#464

::: dom/apps/src/Webapps.js
@@ +366,5 @@
>    },
>  
> +  _setEventHandler: function(aType, aHandler) {
> +    this._eventHandlers[aType].handler = aHandler;
> +    if (this._eventHandlers[aType].queued) {

if (aHandler && this._eventHandlers[aType].queued)
Attachment #798608 - Flags: review?(ferjmoreno) → review+
(Assignee)

Comment 10

5 years ago
Comment on attachment 798608 [details] [diff] [review]
Part 2. Make app event handlers fire immediately when set if already should have fired

Review of attachment 798608 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/apps/src/Webapps.js
@@ +364,5 @@
>                             "Webapps:PackageEvent",
>                             "Webapps:CheckForUpdate:Return:OK"]);
>    },
>  
> +  _setEventHandler: function(aType, aHandler) {

I know this is not needed for new files anymore, but not sure about already existing ones, so name the function, please. Also _getEventHandler.
(Assignee)

Updated

5 years ago
Attachment #798607 - Flags: review?(ferjmoreno) → review+
(Assignee)

Comment 11

5 years ago
Comment on attachment 798609 [details] [diff] [review]
WIP Part 3. Fixed unit tests for apps

Review of attachment 798609 [details] [diff] [review]:
-----------------------------------------------------------------

I only did a quick pass. Quite ugly, but it looks like it could work.

I still wonder why DOMRequest has not a queue for events that happen before the handler is set...

::: dom/apps/tests/test_app_update.html
@@ +38,5 @@
>    }
>  
> +  function mozAppsError(request) {
> +    if (request) {
> +      this.onerror = null;

request.onerror

::: dom/apps/tests/test_packaged_app_install.html
@@ +106,5 @@
>        finish();
>      }
>    };
> +  if (req.result && req.onsuccess) {
> +    req.onsucess();

typo: onsuccess

@@ +136,5 @@
>                                 aName) {
>    var req = navigator.mozApps.installPackage(aMiniManifestURL);
>    req.onsuccess = function() {
>      ok(true, "App installed");
> +    req.onsucess = undefined;

typo: onsuccess
Attachment #798609 - Flags: feedback?(ferjmoreno) → feedback+
Created attachment 799299 [details] [diff] [review]
alternate solution

Here's my take on the problem: we track dying processes and force an Ack when we are in confirmInstall() from a dead process. This is a wip, missing cleanup in "normal" case and in denyInstall().

With this patch I was able to install and run an app from the marketplace following the steps in comment #0.
Flags: needinfo?(fabrice)
(In reply to Fabrice Desré [:fabrice] from comment #12)
> Created attachment 799299 [details] [diff] [review]
> alternate solution
> 
> Here's my take on the problem: we track dying processes and force an Ack
> when we are in confirmInstall() from a dead process. This is a wip, missing
> cleanup in "normal" case and in denyInstall().
> 
> With this patch I was able to install and run an app from the marketplace
> following the steps in comment #0.

Going back to the classics:

Procure siempre acertalla
el honrado y principal;
pero si la acierta mal,
defendella y no enmendalla.

Which, on the other hand, can apply to both of us :P.

With your proposed change, we're still linking the download to what the child process do, which isn't really necessary, so as to no acknowledge that we have another underlying problem. Even with detecting child process deaths, what happens if the child process hangs instead of dieing? For example, what happens if I do a 

while (true); 

after doing mozApps.installPackage()? Yes, that's a problem on the caller's part, but the point is the caller should not be able to affect the callee in this way. 

Going by parts: 

* I didn't touch the DOMRequest on my proposed patch, because I understand that sooner than later they'll be replaced with futures, which have a much saner semantics (meaning the .then() fires immediately if it's called after the promise has already been fulfilled).  

* The original problem, as detected on the tests, can be fixed by touching the tests and ensuring the handler is called (by checking request.result) instead of linking the child and parent process in any way. This is in fact what part 3 does because as I said previously, I didn't touch the DOMRequest.

* What I did touch was mozIDOMApplication implementation of the event handlers. mozIDOMApplication isn't even a EventTarget... and most people I've talked to assume that the on* handlers there already behave as they would do after my part 2 patch. 

In fact, part 1 + part 3 would have fixed the problem described in c0 without reintroducing the problems that the rolled back patches originally aimed to fix. But it would have leave open the problem that's happening actually. On Homescreen we have something like (much simplified)

var appMgr = navigator.mozApps.mgmt;

appMgr.oninstall = function oninstall(event) {
  app = event.application;
  if (!app.isBookmark) {
    app.ondownloadapplied = function ondownloadapplied(event) {
      // do something
      app.ondownloadapplied = null;
      app.ondownloaderror = null;
    };
    app.ondownloaderror = function ondownloaderror(event) {
      // Do something else
    };
  }
};

which sounds like a sensate use of the ondownloadapplied event, only with the current semantics it's wrong (and in fact it IS failing: sometimes when the downloaded app is very small, the "downloadapplied" event gets fired before the handler is set, and so the icon stays on the rocket until the phone is rebooted or the homescreen dies). 

That was what part2 fixed.

I still think the send ack/wait for ack is evil. If you don't like changing the semantics of the .on* handlers then I can implement comment 2 solution 2, which still removes the ACK and fixes the problem by changing the client code. That would still mean rolling back the ack change, and then changing the tests and Gaiam as described on comment 2.
> * What I did touch was mozIDOMApplication implementation of the event
> handlers. mozIDOMApplication isn't even a EventTarget... and most people
> I've talked to assume that the on* handlers there already behave as they
> would do after my part 2 patch. 

So, these people believe that when you attach a resize event handler on a window you get the previous resize?
Also, mozIDOMApplication is not a real event target because we could not implement it in JS until recently. As soon as we move the implementation to webidl (see bug 899322) it will be an EventTarget, so there is no way we will land changes to these semantics.

> I still think the send ack/wait for ack is evil. If you don't like changing
> the semantics of the .on* handlers then I can implement comment 2 solution
> 2, which still removes the ACK and fixes the problem by changing the client
> code. That would still mean rolling back the ack change, and then changing
> the tests and Gaiam as described on comment 2.

Yes, let's fix all the call sites first, and then rollback the gecko parts.
Attachment #798607 - Flags: review?(fabrice)
Attachment #798608 - Flags: review?(fabrice)
Attachment #798609 - Flags: feedback?(fabrice)
Created attachment 804104 [details] [diff] [review]
Part 2. Update child apps when they're outdated at construction time

This is mostly done (and tested). Needs some cleaning up, but before that I wanted to check that this is what we talked about yesterday.
Attachment #798608 - Attachment is obsolete: true
Attachment #798609 - Attachment is obsolete: true
Attachment #804104 - Flags: feedback?(fabrice)
Comment on attachment 804104 [details] [diff] [review]
Part 2. Update child apps when they're outdated at construction time

Review of attachment 804104 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/apps/src/Webapps.jsm
@@ +828,5 @@
> +
> +        // If it wasn't registered before, let's update its state
> +        if ((aMsgName === 'Webapps:PackageEvent') ||
> +            (aMsgName === 'Webapps:OfflineCache')) {
> +          if (aApp && aApp.manifestURL) {

nit: if (man)

@@ +941,5 @@
>        case "Webapps:InstallPackage":
>          this.doInstallPackage(msg, mm);
>          break;
>        case "Webapps:RegisterForMessages":
> +        this.addMessageListener(msg.messages, mm, msg.app);

nit: keep the mm as the last argument : (msg.messages, msg.app, mm)
Attachment #804104 - Flags: feedback?(fabrice) → feedback+
Status: NEW → ASSIGNED
Created attachment 804350 [details] [diff] [review]
Part 2. Update child apps when they're outdated at construction time
Attachment #804104 - Attachment is obsolete: true
Attachment #804350 - Flags: review?(fabrice)
Attachment #804350 - Flags: review?(ferjmoreno)
Comment on attachment 804350 [details] [diff] [review]
Part 2. Update child apps when they're outdated at construction time

Review of attachment 804350 [details] [diff] [review]:
-----------------------------------------------------------------

r=me Let's wait for the try build...

::: dom/apps/src/Webapps.jsm
@@ +994,4 @@
>        return;
>      }
>      this.children[aMsgName].forEach(function(mmRef) {
> +      debug("BroadcastMessage: Sending " + aMsgName + ":" + aContent.type + " to " + mmRef.mm);

Please remove these debug messages before landing.
Attachment #804350 - Flags: review?(fabrice) → review+
Attachment #804350 - Flags: review?(ferjmoreno)
Try run for part 2 is at: https://tbpl.mozilla.org/?tree=Try&rev=bafcaa6ca87e

Only part 2 (https://bugzilla.mozilla.org/attachment.cgi?id=804350) must be checked in now. I want to run a few more tests on part 1 (which is the one that actually fixes the problem described in comment 0) before asking to checkin for it. Reviewing the r+ from it now to avoid confusions :).
Keywords: checkin-needed
Whiteboard: [leave open]
I've just remembered I forgot to upload a new version of the patch with the nits fixed. I will upload it in a few and rerequest checkin then.
Keywords: checkin-needed
Whiteboard: [leave open]
Created attachment 804889 [details] [diff] [review]
Part 2. Update child app objects when they're outdated at creation time

r=fabrice

Test run is at https://tbpl.mozilla.org/?tree=Try&rev=0e1e9c92b0a3 and at 
https://tbpl.mozilla.org/?tree=Try&rev=bafcaa6ca87e (the second one has already finished and the only difference between the two is the removal of a couple of debug() calls.
Attachment #804350 - Attachment is obsolete: true
Attachment #804889 - Flags: review+
Requesting checkin since https://tbpl.mozilla.org/?tree=Try&rev=bafcaa6ca87e finished correctly (and this one only removes a couple of debug calls). 

Only attachment 804889 [details] [diff] [review] should be checked in now, I still want to run some more tests on the other one, but this one by itself should help with random test failures for tests relateds with the mozApp API.
Keywords: checkin-needed
Whiteboard: [leave open]
No longer blocks: 893800
Sorry for the long time without news here, had a bad week. I've been checking out the possibility of removing the ACK exchanging (because its the root cause of this bug, and while it fixes the test cases it doesn't really fix the real usage) and I've ran into a wall: installation/download errors.

What I did on the landed patch is to update the child object with the parent state if they were outdated at creation time. And that works well for most cases. But it doesn't work for this one (and the ACK doesn't really help either if there are two child process), when this happens:

T0: Child process 1 registers mozApps.mgmt.oninstall.
T1: Child process 2: calls mozApps.oninstall.
T2: Parent receives the msg to install, starts processing it, broadcasts the PackageEvent, sends the Ack
T3: Child process 2: executes the onsuccess handler, sends the ACK
T4: Child process 2 starts executing the app constructor.
T5: Parent process, receives the ACK, installs the package, gets an error, broadcasts the error.
T6: Child process 2 finishes executing the app constructor. Sets the onerror handler on the app object. T7: Since the The object is outdated, it's updated. But we've lost the error event!

The problem here is that error events or error states are not kept on the parent process at all. They're not part of the object. But that means that there's no real way of making this:

navigator.mozApps.mgmt.oninstall = function(evt) {
  var aApp = evt.application;
  aApp.ondownloaderror = downloaderrorHandler;
  // 2 
}

Because if the error happens before the app object is built at the child process (different child than the one calling install!) then the newly created object will have downloadError = undefined/null, and downloaderror won't be called ever. And doing:

if (aApp.ondownloaderror) {
  // Call error handler
}

at // 2 won't work either since aApp.ondownloaderror will be null/undefined.

If we don't want to cache events, the only solution I can think to this is to keep the last download operation state at the parent also (downloadResult == null means nothing has been done to the object since creation at the parent, otherwise we kept either DOWNLOAD_SUCCESS or the latest error reported and sending that as part of the PackageEvent so the child process can also update downloadError).
Flags: needinfo?(fabrice)
I'm not saying that we don't have issues there, but what is important is that at any time T, all DOM objects have the correct properties. If we ensure that, we are good.

(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #25)
 
> The problem here is that error events or error states are not kept on the
> parent process at all. They're not part of the object. But that means that
> there's no real way of making this:
> 
> navigator.mozApps.mgmt.oninstall = function(evt) {
>   var aApp = evt.application;
>   aApp.ondownloaderror = downloaderrorHandler;
>   // 2 
> }

What is the value of aApp.downloadError before you set ondownloaderror ?
Flags: needinfo?(fabrice)
(In reply to Fabrice Desré [:fabrice] from comment #26)
> I'm not saying that we don't have issues there, but what is important is
> that at any time T, all DOM objects have the correct properties. If we
> ensure that, we are good.

Yeah, that's not happening. From T5 onwards, the child process 2 has an incorrect state.
> 
> (In reply to Antonio Manuel Amaya Calvo (:amac) from comment #25)
>  
> > The problem here is that error events or error states are not kept on the
> > parent process at all. They're not part of the object. But that means that
> > there's no real way of making this:
> > 
> > navigator.mozApps.mgmt.oninstall = function(evt) {
> >   var aApp = evt.application;
> >   aApp.ondownloaderror = downloaderrorHandler;
> >   // 2 
> > }
> 
> What is the value of aApp.downloadError before you set ondownloaderror ?

Null. The problem is that the error happens between the moment the child process is told 'hey, you have to create an app object' and the time the app object is actually created. That means the child hasn't registered its event handlers, and when it does so its state is updated from the parent... but the parent state doesn't currently include the possible download errors.
Which event handlers are you talking about? The web facing ones, or the IPC message listeners?
The message listeners, sorry for the misnaming. The listeners are added at the end of the object constructor, and if by that time there has been an error, that process won't get the error ever. 

I have an idea for a possible patch. Was going to implement it today but got derailed. My idea is basically storing the download errors for apps and on the object creation (when it registers it's listeners) if the child object is outdated then I'll send the last error along with the updated object. That way the child process will have the correct data and error info.
Created attachment 810696 [details] [diff] [review]
WIP, doesn't work reliably yet

This is what I was talking about doing. It does work... more or less (which is another way of saying it doesn't work, yet :P ). Each child process/message manager is notified for outdated objects only once (per manifest URL). And they're notified only if the object being created is obsolete at the time of creation.

Do you feel I should finish this, or is this something that won't be r+'ed ever? I can't think on any other solution, TBH. And what we currently have (the ACKs) make the test work, but still doesn't work reliably on the field.
Attachment #810696 - Flags: feedback?(ferjmoreno)
Attachment #810696 - Flags: feedback?(fabrice)
BTW I think I discovered the problem with the uploaded version during the drive home. If this approach is ok I'll check it and finish it tomorrow.
So I'd like to try a different approach. Currently each dom app object is listening for ipc message even if eg two of them represent the same application. That's a waste of ipc. Instead, what about:
- each process is getting the list of apps, and updates to these. Let's call them cached apps.
- when we create a dom object, it gets its state by proxying the cached app (no copy necessary).
- events are also dispatched to the cached app that relays them to the N dom objects. If the dom object has a content event listener registered, we dispatch the event.

This means that the dom app objects are very lightweight, and mostly just manage event listeners. That will come basically for free with the webidl bindings.

I think that would solve our issues correctly, at the expense of a larger refactoring. Thoughts?
(In reply to Fabrice Desré [:fabrice] from comment #32)
> So I'd like to try a different approach. Currently each dom app object is
> listening for ipc message even if eg two of them represent the same
> application. 
That's why the patch I uploaded yesterday fails some times also (because if there were two objects created at the child process and one of them registered his listeners in time, some times the second object would get two updates because it would see the update for the first and it's own).

> That's a waste of ipc. Instead, what about:
> - each process is getting the list of apps, and updates to these. Let's call
> them cached apps.
Do you mean copying all of the registry to each child process? Or just the apps it has created?

> - when we create a dom object, it gets its state by proxying the cached app
> (no copy necessary).
> - events are also dispatched to the cached app that relays them to the N dom
> objects. If the dom object has a content event listener registered, we
> dispatch the event.
> 
> This means that the dom app objects are very lightweight, and mostly just
> manage event listeners. That will come basically for free with the webidl
> bindings.
> 
> I think that would solve our issues correctly, at the expense of a larger
> refactoring. Thoughts?

Hmm... a not too refactore-y approach could be (dunno if that's the same you were proposing :)):

* Register all the listeners at the constructor of mozApps (that is, WebappsRegistry around [1])
* Remove the registration of each individual object listener ([2])
* When a mozApps create a new new mozIDOMApplication is created, it adds it to the cached app object lists and returns a new object that has a reference to the cached object (and its own handlers). 
* When an app specific event is received, it'll be received by WebappsRegistry and it will:
  * If the app isn't on the cache, discard it
  * Otherwise, call the handlers for all the objects that depend on that cache element

I think that would solve all of the problems we're having... since the listeners would be set *before* any object is created, we won't run into any situation where an object is outdated by its creation time.

If that's the general idea, I can write a patch for that.

[1] https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.js#233
[2] https://mxr.mozilla.org/mozilla-central/source/dom/apps/src/Webapps.js#352
(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #33)
 
> Hmm... a not too refactore-y approach could be (dunno if that's the same you
> were proposing :)):
> 
> * Register all the listeners at the constructor of mozApps (that is,
> WebappsRegistry around [1])

Not exactly. We already keep an updated list of apps per process for the appsService. It doesn't have all the information we need for the DOM api but we must augment this one and not create a new cache.

> * Remove the registration of each individual object listener ([2])

yes

> * When a mozApps create a new new mozIDOMApplication is created, it adds it
> to the cached app object lists and returns a new object that has a reference
> to the cached object (and its own handlers).

I'm not sure to follow the ownership model here. I think the dom object will have a strong ref on the cached app, and the cached app weak refs to the matching dom objects to dispatch events.


> * When an app specific event is received, it'll be received by
> WebappsRegistry and it will:
>   * If the app isn't on the cache, discard it
>   * Otherwise, call the handlers for all the objects that depend on that
> cache element

I would rather not have the registry receive these messages, since you can have several instances of webapps.js per process, but we want to manage only one cached app list per process.
(In reply to Fabrice Desré [:fabrice] from comment #34)
> (In reply to Antonio Manuel Amaya Calvo (:amac) from comment #33)
>  
> > Hmm... a not too refactore-y approach could be (dunno if that's the same you
> > were proposing :)):
> > 
> > * Register all the listeners at the constructor of mozApps (that is,
> > WebappsRegistry around [1])
> 
> Not exactly. We already keep an updated list of apps per process for the
> appsService. It doesn't have all the information we need for the DOM api but
> we must augment this one and not create a new cache.

On AppsServiceChild.jsm, right? I totally forgot about that one.
 
> > * Remove the registration of each individual object listener ([2])
> 
> yes
> 
> > * When a mozApps create a new new mozIDOMApplication is created, it adds it
> > to the cached app object lists and returns a new object that has a reference
> > to the cached object (and its own handlers).
> 
> I'm not sure to follow the ownership model here. I think the dom object will
> have a strong ref on the cached app, and the cached app weak refs to the
> matching dom objects to dispatch events.

Yep, that was my idea. The DOM object refers the cached app (and the cached app cannot get GCed while there's a DOM object pointing to it). The DOM object, though, can and should be GCed even if there are references from the registry/cached app to it.

> > * When an app specific event is received, it'll be received by
> > WebappsRegistry and it will:
> >   * If the app isn't on the cache, discard it
> >   * Otherwise, call the handlers for all the objects that depend on that
> > cache element
> 
> I would rather not have the registry receive these messages, since you can
> have several instances of webapps.js per process, but we want to manage only
> one cached app list per process.

Yeah, as I said, I forgot about the AppsService. I think... s/WebappsRegistry/DOMApplicationRegistry on the previous description.

Is this something you want to do on this bug? Because I almost would prefer finishing what I started (propagating the errors, that is fixing my WIP patch) even if that's something we'll throw away soon so we can get a working version and then do the rest on a followup bug. Mostly because this is something that's failing on 1.1 and 1.2 and it would be easier to ask for approval that way, because it'll be less risky that way, I think.
(Assignee)

Comment 36

5 years ago
(In reply to Fabrice Desré [:fabrice] from comment #32)
> So I'd like to try a different approach. Currently each dom app object is
> listening for ipc message even if eg two of them represent the same
> application. That's a waste of ipc. Instead, what about:
> - each process is getting the list of apps, and updates to these. Let's call
> them cached apps.
> - when we create a dom object, it gets its state by proxying the cached app
> (no copy necessary).
> - events are also dispatched to the cached app that relays them to the N dom
> objects. If the dom object has a content event listener registered, we
> dispatch the event.
> 
> This means that the dom app objects are very lightweight, and mostly just
> manage event listeners. That will come basically for free with the webidl
> bindings.
> 
> I think that would solve our issues correctly, at the expense of a larger
> refactoring. Thoughts?

Yes, please :)

This would be a great follow up for bug 910466.
(Assignee)

Comment 37

5 years ago
Comment on attachment 810696 [details] [diff] [review]
WIP, doesn't work reliably yet

I'd like to try the approach explained in comment 32
Attachment #810696 - Flags: feedback?(ferjmoreno)
(In reply to Fernando Jiménez Moreno [:ferjm] from comment #37)
> Comment on attachment 810696 [details] [diff] [review]
> WIP, doesn't work reliably yet
> 
> I'd like to try the approach explained in comment 32

I would like to do a transient solution based of my latest patch since it would be much less risky to land it on 1.2, and this is something that's failing currently (on 1.0.1, on 1.1 and on 1.2). But if you prefer to implement the refactor directly, feel free to steal the bug :).
Nothing will be backported on 1.2 here. It's too late.
(Assignee)

Comment 40

5 years ago
(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #38)
> (In reply to Fernando Jiménez Moreno [:ferjm] from comment #37)
> > Comment on attachment 810696 [details] [diff] [review]
> > WIP, doesn't work reliably yet
> > 
> > I'd like to try the approach explained in comment 32
> 
> I would like to do a transient solution based of my latest patch since it
> would be much less risky to land it on 1.2, and this is something that's
> failing currently (on 1.0.1, on 1.1 and on 1.2). But if you prefer to
> implement the refactor directly, feel free to steal the bug :).

Stealing it :)
Assignee: amac → ferjmoreno

Updated

5 years ago
Blocks: 928262
(Assignee)

Comment 41

5 years ago
Created attachment 827444 [details] [diff] [review]
WIP

Very early WIP
(Assignee)

Comment 42

5 years ago
Created attachment 827922 [details] [diff] [review]
WIP
Attachment #827444 - Attachment is obsolete: true
(Assignee)

Updated

5 years ago
Attachment #810696 - Flags: feedback?(fabrice)
(Assignee)

Comment 43

5 years ago
Created attachment 828104 [details] [diff] [review]
WIP

Now with the proxy object
Attachment #827922 - Attachment is obsolete: true

Updated

5 years ago
blocking-b2g: --- → koi?
(Assignee)

Comment 44

5 years ago
Rafael, could you explain which are your reasons for nominating this bug for koi, please?

I would strongly suggest not to land this kind of refactors for koi if possible.
That is just not gonna happen.
blocking-b2g: koi? → ---
Summary: App download hangs indefinitely if the child process dies before confirming the install → App objects on the child processes sometimes are created with a wrong state
(Assignee)

Comment 46

5 years ago
Created attachment 8346111 [details] [diff] [review]
WIP

Now with update app state.
Attachment #798607 - Attachment is obsolete: true
Attachment #799299 - Attachment is obsolete: true
Attachment #804889 - Attachment is obsolete: true
Attachment #810696 - Attachment is obsolete: true
Attachment #828104 - Attachment is obsolete: true
(Assignee)

Comment 47

5 years ago
Created attachment 8347354 [details] [diff] [review]
WIP

Fabrice, this is a very basic WIP but it contains the basic idea. It still needs a lot of work, but I would appreciate if you could give me some feedback in the mean time. Thanks!
Attachment #8346111 - Attachment is obsolete: true
Attachment #8347354 - Flags: feedback?(fabrice)
Comment on attachment 8347354 [details] [diff] [review]
WIP

Review of attachment 8347354 [details] [diff] [review]:
-----------------------------------------------------------------

I like it! That looks good overall. Once case I think is not managed by the code is:
- start with a set of apps.
- the content holds a DOM object on app1.
- app1 is deleted, that removes it from DOMApps
- the content can still access properties on app1 and we will fail in interesting ways.

::: dom/apps/src/AppsServiceChild.jsm
@@ +115,5 @@
> +  addDOMApp: function(aApp, aManifestURL, aId) {
> +    let weakRef = Cu.getWeakReference(aApp);
> +    if (!this.DOMApps) {
> +      this.DOMApps = {};
> +    }

That could be done by default in init()

@@ +144,5 @@
> +    };
> +  },
> +
> +  _fireEvent: function(aMessage) {
> +    let msg = aMessage.json;

s/aMessage.json/aMessage.data

::: dom/apps/src/Webapps.js
@@ +548,5 @@
>        }
>      }
>    },
>  
> +  _fireRequestResult: function(aMessage) {

this function is unused as of now

@@ +560,5 @@
> +      req = this.takeRequest(msg.requestID);
> +    }
> +
> +    if (req) {
> +      Services.DOMRequest.fireSuccess(req, msg.manifestURL);

that will not work for the promises, right?
Attachment #8347354 - Flags: feedback?(fabrice) → feedback+
(Assignee)

Comment 49

5 years ago
Thanks for the feedback Fabrice!

(In reply to Fabrice Desré [:fabrice] from comment #48)
> Comment on attachment 8347354 [details] [diff] [review]
> WIP
> 
> Review of attachment 8347354 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I like it! That looks good overall. Once case I think is not managed by the
> code is:
> - start with a set of apps.
> - the content holds a DOM object on app1.
> - app1 is deleted, that removes it from DOMApps
> - the content can still access properties on app1 and we will fail in
> interesting ways.
>

This is now controlled in the DOMApp proxy. Thanks.
 
>
> > +  _fireRequestResult: function(aMessage) {
> 
> this function is unused as of now
>

It is used from AppsServiceChild :)
 
> @@ +560,5 @@
> > +      req = this.takeRequest(msg.requestID);
> > +    }
> > +
> > +    if (req) {
> > +      Services.DOMRequest.fireSuccess(req, msg.manifestURL);
> 
> that will not work for the promises, right?

That's true, but there is no case where we need to use Promises yet.
(Assignee)

Comment 50

5 years ago
Created attachment 8351243 [details] [diff] [review]
WIP
Attachment #8347354 - Attachment is obsolete: true
(Assignee)

Comment 51

5 years ago
Created attachment 8368058 [details] [diff] [review]
WIP

Rebased and including a lot of fixes. Only a few more red tests to fix and we should be good for a first try.
Attachment #8351243 - Attachment is obsolete: true
(Assignee)

Comment 53

5 years ago
Created attachment 8369337 [details] [diff] [review]
v1

This should fix the orange.

New try build: https://tbpl.mozilla.org/?tree=Try&rev=935d6f44fc3e
Attachment #8368541 - Attachment is obsolete: true
(Assignee)

Comment 54

5 years ago
Created attachment 8369909 [details] [diff] [review]
v1
Attachment #8369337 - Attachment is obsolete: true
Attachment #8369909 - Flags: review?(fabrice)
Comment on attachment 8369909 [details] [diff] [review]
v1

Review of attachment 8369909 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! The only part that I'd really like to see fixed is the manifest one.

::: dom/apps/src/AppsServiceChild.jsm
@@ +63,5 @@
> +  },
> +
> +  observe: function(aSubject, aTopic, aData) {
> +    // Clear the cache on memory pressure.
> +    this._cache = { };

We should also trigger a forced GC here, because we don't really control whether we are before or after the GC in the memory-pressure observer list, but we really want everything to be cleared asap.

@@ +122,2 @@
>        this.cpmm.removeMessageListener(aMsgName, this);
>      }).bind(this));

nit: use => here

::: dom/apps/src/Webapps.js
@@ +290,2 @@
>      this._manifest = aApp.manifest;
>      this._updateManifest = aApp.updateManifest;

I'd prefer if we didn't special cased the manifest here. Can you try to get them to be sent like the other properties?

::: dom/apps/src/Webapps.jsm
@@ +61,1 @@
>  #endif

That makes debug() undefined on non DEBUG builds no?
Attachment #8369909 - Flags: review?(fabrice) → feedback+
(Assignee)

Comment 56

5 years ago
Created attachment 8377574 [details] [diff] [review]
v2

Fabrice, this patch should address your feedback. Thanks!
Attachment #8369909 - Attachment is obsolete: true
Attachment #8377574 - Flags: review?(fabrice)
(Assignee)

Comment 57

5 years ago
Comment on attachment 8377574 [details] [diff] [review]
v2

Actually, let me test again before asking for r?
Attachment #8377574 - Flags: review?(fabrice)
(Assignee)

Updated

5 years ago
Attachment #8378246 - Attachment is patch: true
Attachment #8378246 - Flags: review?(fabrice) → review+
(Assignee)

Comment 61

5 years ago
I think this can be closed now :)
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
This should fix the intermittent error from bug 928262. Can you ask for it to be re-enabled, Fernando?
Flags: needinfo?(ferjmoreno)
(Assignee)

Updated

5 years ago
Depends on: 972927
(Assignee)

Comment 63

5 years ago
Yes, trying at bug 972927
Flags: needinfo?(ferjmoreno)
Target Milestone: --- → mozilla30

Updated

5 years ago
Depends on: 977051
(Assignee)

Updated

5 years ago
No longer depends on: 977051

Updated

5 years ago
Depends on: 978089

Updated

5 years ago
Depends on: 977215
This is confirmed to cause bug 977215, which is a smoketest regression.

Adding checkin-needed for a backout.
Keywords: checkin-needed
(In reply to Jason Smith [:jsmith] from comment #64)
> This is confirmed to cause bug 977215, which is a smoketest regression.
> 
> Adding checkin-needed for a backout.

https://hg.mozilla.org/integration/b2g-inbound/rev/061c67d92d57
Status: RESOLVED → REOPENED
Keywords: checkin-needed
Resolution: FIXED → ---
This also likely breaks SMS notifications with this logcat error:

  GeckoConsole: [JavaScript Error: "TypeError: app.manifest is undefined" {file: "app://sms.gaiamobile.org/shared/js/notification_helper.js" line: 16}]
Confirmed, backing this out fixed the issue.
(In reply to Wes Kocher (:KWierso) from comment #65)
> (In reply to Jason Smith [:jsmith] from comment #64)
> > This is confirmed to cause bug 977215, which is a smoketest regression.
> > 
> > Adding checkin-needed for a backout.
> 
> https://hg.mozilla.org/integration/b2g-inbound/rev/061c67d92d57

Backout is now on mozilla-central:
https://hg.mozilla.org/mozilla-central/rev/061c67d92d57
(Assignee)

Comment 69

5 years ago
Created attachment 8396590 [details] [diff] [review]
v3

This should fix the regression reported in bug 978089. Couldn't reproduce bug 977215 or comment 66.
Attachment #8378246 - Attachment is obsolete: true
(Assignee)

Comment 70

5 years ago
Created attachment 8415260 [details] [diff] [review]
v3

Rebased on top of m-i.

I am afraid that with my new assignments I am not able to spend the time that this bug requires so Carmen will be helping here.

Carmen, as we spoke, we need to:

1- Fix the new failing tests introduced in bug 880043. I triggered a try run yesterday that shows the new oranges https://tbpl.mozilla.org/?tree=Try&rev=4606c8b198c8

2- Try to reproduce and fix the regressions reported in bug 977215 and comment 66.

3- Ask for a full QA verification as this bug is quite sensitive.

Thanks so much for your help!
Attachment #8396590 - Attachment is obsolete: true
Flags: needinfo?(cjc)
(Reporter)

Comment 71

5 years ago
Ok!, Taking this, thanks Fernando
Flags: needinfo?(cjc)
(Reporter)

Updated

5 years ago
Assignee: ferjmoreno → cjc
(Reporter)

Comment 72

5 years ago
Created attachment 8426246 [details] [diff] [review]
Proposed patch - v1. Incremental over Ferjm patch

This patch continues Fernando's work and fixes the latest problem which was that applications that included an origin on the manifest couldn't be installed correctly.
Try run is at: 
https://tbpl.mozilla.org/?tree=Try&rev=db6736f4cf96
Attachment #8426246 - Flags: review?(fabrice)
(Reporter)

Updated

5 years ago
Attachment #8426246 - Flags: review?(fabrice)
(Reporter)

Comment 73

5 years ago
Created attachment 8427148 [details] [diff] [review]
v3 patch rebased with lastest Mozilla Central changes
Attachment #8415260 - Attachment is obsolete: true
(Reporter)

Comment 74

5 years ago
With the small change I made on the second patch (Proposed patch - v1. Incremental over Ferjm patch) the mochitests run correctly and after two days of testing I'm unable to reproduce the problems described by Fernando in comment 70. Requesting review
(Reporter)

Updated

5 years ago
Attachment #8426246 - Flags: review?(fabrice)
(Reporter)

Updated

5 years ago
Attachment #8427148 - Flags: review?(fabrice)
Comment on attachment 8426246 [details] [diff] [review]
Proposed patch - v1. Incremental over Ferjm patch

Review of attachment 8426246 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for making an inter-diff!
Attachment #8426246 - Flags: review?(fabrice) → review+
Comment on attachment 8427148 [details] [diff] [review]
v3 patch rebased with lastest Mozilla Central changes

Review of attachment 8427148 [details] [diff] [review]:
-----------------------------------------------------------------

I also tested locally and this looks fine. I'll check with QA before landing to make sure they are aware of things to look at.
Attachment #8427148 - Flags: review?(fabrice) → review+
 https://hg.mozilla.org/integration/b2g-inbound/rev/3986e809158a

Marcia, this is the bug I talked about.
Flags: needinfo?(mozillamarcia.knous)
(Reporter)

Comment 79

5 years ago
Created attachment 8428300 [details] [diff] [review]
Proposed patch - v2. Incremental over Ferjm patch
Attachment #8426246 - Attachment is obsolete: true
Flags: needinfo?(cjc)
(Reporter)

Comment 80

5 years ago
Comment on attachment 8428300 [details] [diff] [review]
Proposed patch - v2. Incremental over Ferjm patch

I think this fixes it, it was just a silly typo on a message name
Attachment #8428300 - Flags: review?(fabrice)
Attachment #8428300 - Attachment is patch: true
Attachment #8428300 - Flags: review?(fabrice) → review+
https://hg.mozilla.org/mozilla-central/rev/09ebd0af27a4
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
Backed out for being the likely cause of the big spike of B2G debug mochitest crashes (as tracked in bug 950653).

https://hg.mozilla.org/mozilla-central/rev/ca042038cdee
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla30 → ---
The other issue with these patches, which I just noticed this morning, is that installing an app and then retrieving its record via mozApps.mgmt.getAll results in the mozIDOMApplication.manifest property being undefined, even though DOMApplicationRegistry._writeManifestFile correctly writes the data to disk, and DOMApplicationRegistry._readManifests correctly reads the manifest data from disk, before getAll returns the app record (so there isn't a file read/write race like in bug 991397).

I rebuilt after the patches were reverted, and the problem went away.
Flags: needinfo?(mozillamarcia.knous)
Carmen, do you still have time to work on that patch?
Flags: needinfo?(cjc)
(Reporter)

Comment 87

4 years ago
At this moment I'm pretty tied up with closing 2.0 blockers and some other things that while not blockers we would like to go into 2.0. So in the near future I don't think I'll have time for this. Releasing it and I'll retake it once the dust has settled if nobody takes it first.
Assignee: cjc → nobody
Flags: needinfo?(cjc)
I am not sure I am handling all the cases here but I am attempting fixes for cancellation states (which I filed separate bugs for) I will try to sync up with fabrice tomorrow.
(In reply to Myk Melez [:myk] [@mykmelez] from comment #85)
> The other issue with these patches, which I just noticed this morning, is
> that installing an app and then retrieving its record via
> mozApps.mgmt.getAll results in the mozIDOMApplication.manifest property
> being undefined, even though DOMApplicationRegistry._writeManifestFile
> correctly writes the data to disk, and DOMApplicationRegistry._readManifests
> correctly reads the manifest data from disk, before getAll returns the app
> record (so there isn't a file read/write race like in bug 991397).
> 
> I rebuilt after the patches were reverted, and the problem went away.

Myk, I can't reproduce with this code:

document.addEventListener("DOMContentLoaded", function() {
  console.log("Document loaded");
  document.getElementById("install-button").addEventListener("click",
  function() {
    console.log("Installing app");
    var req = navigator.mozApps.install("https://m.facebook.com/openwebapp/manifest.webapp");
    req.onsuccess = function() {
      var all = navigator.mozApps.mgmt.getAll();
      all.onsuccess = function() {
        var apps = all.result;
        console.log("Got " + apps.length + " apps");
        var i;
        for (i = 0; i < apps.length; i++) {
          if (!apps[i].manifest) {
            console.error("No manifest for " + apps[i].manifestURL);
          }
        }
      }
    }
  });

});

Is that similar to your test?
Flags: needinfo?(myk)
Created attachment 8449626 [details] [diff] [review]
app-ipc.patch

Rebased on current inbound.
Attachment #8427148 - Attachment is obsolete: true
Attachment #8428300 - Attachment is obsolete: true
(In reply to Fabrice Desré [:fabrice] from comment #89)
> Myk, I can't reproduce with this code:> Is that similar to your test?

Hmm, I'm having trouble remembering exactly what I was testing, but I bet I was testing sideloading apps on Android devices, which calls DOMApplicationRegistry.doInstall/doInstallPackage directly and then calls mozApps.getAll.  And the latest patch appears to have the same problem (or a similar one), as an app I sideloaded and then ran for the first time never finished starting up.  But I'll dig into this further tomorrow to confirm the problem.  Leaving the needinfo request to remind me to do that.
Hold on Myk - I found a race with how we set the manifest properties with the current patch.
(In reply to Fabrice Desré [:fabrice] from comment #92)
> Hold on Myk - I found a race with how we set the manifest properties with
> the current patch.

Ok, canceling the needinfo request in the meantime!
Flags: needinfo?(myk)
Created attachment 8451323 [details] [diff] [review]
Part 1. app-ipc.patch updated

I fixed a few issues and now got a nice green try build (https://tbpl.mozilla.org/?tree=Try&rev=1a5fc0532276) and all my local tests working too.

Fixed:
- a race in GetList where we would not return some manifests.
- a bug in the WrappedManifestCache where we stored undefined values for a key, and afterward failed to update it.
- a typo introduced when rebasing.
Blocks: 972076
https://hg.mozilla.org/mozilla-central/rev/ea4acc8b76dd
Assignee: nobody → ferjmoreno
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
(Assignee)

Comment 97

4 years ago
Thanks Fabrice!
Depends on: 1036143
Since this is causing bug 1036143 can we either get a fix quickly or a backout?

Updated

4 years ago
Depends on: 1035867
Backed out for causing frequent B2G crashtest failures like the log below shows. Retriggers from before and after this landed on b2g-inbound confirm it to be the culprit.
https://hg.mozilla.org/mozilla-central/rev/cd6457486d15

https://tbpl.mozilla.org/php/getParsedLog.php?id=43443107&tree=Mozilla-Central

08:23:42     INFO -  REFTEST TEST-START | http://10.0.2.2:8888/tests/layout/base/crashtests/365909-2.xhtml
08:23:42     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8888/tests/layout/base/crashtests/365909-2.xhtml | 14 / 883 (1%)
08:23:42     INFO -  System JS : ERROR chrome://reftest/content/reftest-content.js:884 - NS_ERROR_UNEXPECTED: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsISyncMessageSender.sendSyncMessage]
08:23:42     INFO -  *** UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1
08:23:42     INFO -  ###################################### forms.js loaded
08:23:42     INFO -  ############################### browserElementPanning.js loaded
08:23:42     INFO -  ######################## BrowserElementChildPreload.js loaded
08:23:42     INFO -  REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/base/crashtests/365909-2.xhtml | load failed: timed out waiting for test to complete (waiting for onload scripts to complete)

It's very possible that bug 984274 will wallpaper over this, for better or for worse.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Kevin Grandon :kgrandon from comment #98)
> Since this is causing bug 1036143 can we either get a fix quickly or a
> backout?

Actually bug 1036143 is bug 1013279, which in turn is bug 997717... which has been unfixed for two months now. I think this last patch just exposed the problem (and if we're not seeing it without it is because we're not really cancelling the download?).
Blocks: 997717
Created attachment 8453622 [details] [diff] [review]
Part 2. Reject the promise when cancelling a downlaod

This is incremental over the other patches. I saw that two of the changes I had proposed on bug 1013279 are already on the patch, but there's one case where the promise isn't rejected (and so it's trying to fulfill it incorrectly).
Attachment #8453622 - Flags: review?(fabrice)
Blocks: 1036847
No longer blocks: 972076
Depends on: 972076
Created attachment 8453888 [details] [diff] [review]
Delay the Webapps updater until the DOMApplicationRegistry is ready

I believe (and the code word there is believe cause I haven't seen it fail) that the crashtests failures are related with the Webapps updater (because on the log Ryan linked there was an updater trace and because otherwise there's no Webapps code that run there).

There are two possible solutions here... one is to just don't run the updater code on the tests, and the other (which is what this patch does) is at least to try and wait until the application registry is ready (although navigator.mozApps should already take care of that, so it's still probably better to take the other solution and just don't launch the updater on the tests).
Attachment #8453888 - Flags: feedback?(fabrice)
Attachment #8453622 - Flags: review?(fabrice) → review+
Comment on attachment 8453888 [details] [diff] [review]
Delay the Webapps updater until the DOMApplicationRegistry is ready

Review of attachment 8453888 [details] [diff] [review]:
-----------------------------------------------------------------

That seems to break xpcshell tests, and I'm not a fan of having consumers of the api do that.
Attachment #8453888 - Flags: feedback?(fabrice) → feedback-
Created attachment 8454341 [details] [diff] [review]
Delay processing mozApps.mgmt.getAll petition until the registry is ready

And second approach. This one makes more sense, actually, and is just to check on Webapps.jsm if the registry is ready before trying to process the petition. 
There's a fresh try run for this on https://tbpl.mozilla.org/?tree=Try&rev=5f5e5820bd64
Attachment #8453888 - Attachment is obsolete: true
Attachment #8454341 - Flags: feedback?(fabrice)
Blocks: 972076
No longer depends on: 972076
Comment on attachment 8454341 [details] [diff] [review]
Delay processing mozApps.mgmt.getAll petition until the registry is ready

Review of attachment 8454341 [details] [diff] [review]:
-----------------------------------------------------------------

I was thinking about doing something similar yesterday... I think we need to do 2 things:
- what you do here because we will call directly getAll() in Bug 972076
- also wait for this.registryReady().then(...) in receiveMessage(), because other methods of the dom api can also be racy.

That may let us get rid of the webapps-registry-ready observer notification, but I need to check all the consumers.
Attachment #8454341 - Flags: feedback?(fabrice) → feedback+
Created attachment 8454576 [details] [diff] [review]
Delay processing DOM API petitions until the application registry is ready

Changed the last version to delay all the petitions until the registry is ready. Emulator try run is at: https://tbpl.mozilla.org/?tree=Try&rev=96d0df0e4571 (it that doesn't explode I'll do a full run later).
Attachment #8454341 - Attachment is obsolete: true
Attachment #8454576 - Flags: feedback?(fabrice)
Created attachment 8454730 [details] [diff] [review]
With 100% less deadlock

So... the last one had a very nice deadlock. Turns out that one of the way the registryReady promise is resolved is via a message that I had made dependant on the promise being resolved. Go figure.

New try is at: https://tbpl.mozilla.org/?tree=Try&rev=a5bf45fcb803

And sorry for all the tries but I can't run the emulator locally. Although TBH I could have catched this also if I had tried it on an actual device instead of just running the tests on desktop. The device not booting would have been a dead giveaway.
Attachment #8454576 - Attachment is obsolete: true
Attachment #8454576 - Flags: feedback?(fabrice)
Created attachment 8455062 [details] [diff] [review]
P4, WIP! Change the way AppsServiceChild.jsm is initialized and work
Attachment #8455062 - Flags: feedback?(fabrice)
Created attachment 8455331 [details] [diff] [review]
Still WIP, doesn't work. Change the initialization of AppsServiceChild.jsm
Attachment #8455062 - Attachment is obsolete: true
Attachment #8455062 - Flags: feedback?(fabrice)
Created attachment 8456164 [details] [diff] [review]
Delay processing requests until the registry is ready

This looked mostly green:

https://tbpl.mozilla.org/?tree=Try&rev=89329e289c92

I've relaunched a full try run at https://tbpl.mozilla.org/?tree=Try&rev=0992d0072e90
Attachment #8454730 - Attachment is obsolete: true
Attachment #8455331 - Attachment is obsolete: true
Attachment #8456164 - Flags: review?(fabrice)
Also, try run only for the emulator, both opt and debug: https://tbpl.mozilla.org/?tree=Try&rev=07181510eb77 So much green :)
Hey Antonio - I noticed in your try run that it breaks a gaia integration test when resuming downloads: https://tbpl.mozilla.org/php/getParsedLog.php?id=43839241&tree=Try#error0

If you get a chance please verify that this use case continues to work. (We are currently trying to get these tests re-enabled on TBPL, and this one is passing for us these days).
Comment on attachment 8456164 [details] [diff] [review]
Delay processing requests until the registry is ready

Review of attachment 8456164 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing review until we have all green builds.
Attachment #8456164 - Flags: review?(fabrice)
(In reply to Kevin Grandon :kgrandon from comment #113)
> Hey Antonio - I noticed in your try run that it breaks a gaia integration
> test when resuming downloads:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=43839241&tree=Try#error0
> 
> If you get a chance please verify that this use case continues to work. (We
> are currently trying to get these tests re-enabled on TBPL, and this one is
> passing for us these days).

Hey Kevin... how do get that log from https://tbpl.mozilla.org/?rev=0992d0072e90&tree=Try ? I assume it's the "B2G Desktop Linux x64 Opt" but I don't see any orange or red there?
Flags: needinfo?(kgrandon)
(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #115)
> (In reply to Kevin Grandon :kgrandon from comment #113)
> > Hey Antonio - I noticed in your try run that it breaks a gaia integration
> > test when resuming downloads:
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=43839241&tree=Try#error0
> > 
> > If you get a chance please verify that this use case continues to work. (We
> > are currently trying to get these tests re-enabled on TBPL, and this one is
> > passing for us these days).
> 
> Hey Kevin... how do get that log from
> https://tbpl.mozilla.org/?rev=0992d0072e90&tree=Try ? I assume it's the "B2G
> Desktop Linux x64 Opt" but I don't see any orange or red there?

It's currently hidden, so to display it we can hack the URL with: https://tbpl.mozilla.org/?rev=0992d0072e90&tree=Try&showall=1

It will be the red `Gij`
Flags: needinfo?(kgrandon)
Well, that explains at least why I didn't see it :) I was going crazy :)

I'll take a look at that tomorrow, thank you.
I couldn't run the test. I've been trying manually on a flame (configured for 273mb also). And I can't make it fail. I've uploaded what I've been testing at:
http://youtu.be/_ElZ_c5blTM (sorry for the Blair's witch effect :P) Is that what the test is supposed to do? I've also tried canceling/restarting/canceling several times with no failures.
Flags: needinfo?(kgrandon)
(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #118)
> I couldn't run the test. I've been trying manually on a flame (configured
> for 273mb also). And I can't make it fail. I've uploaded what I've been
> testing at:
> http://youtu.be/_ElZ_c5blTM (sorry for the Blair's witch effect :P) Is that
> what the test is supposed to do? I've also tried
> canceling/restarting/canceling several times with no failures.

Hey Antonio - have you tried running them manually inside of gaia? You can run the following command to do so:

TEST_FILES="apps/verticalhome/test/marionette/app_packaged_resume_test.js" make test-integration
Flags: needinfo?(kgrandon)
Also in the test we pause/resume the download twice from the homescreen. It's similar to your video. I wonder if we could be causing some bad behavior in b2g desktop?

I've also restarted the test run on tbpl a few times to verify.
Antonio - please let me know if you aren't able to run the tests locally. The make test-integration command should setup most of the things you need from a gaia checkout. Inside the gaia checkout you can use a custom b2g binary compiled with your patches.
Flags: needinfo?(amac.bug)
Yeah, I've also tried stopping/restarting several times in a row (two/four times). The problem with the emulator tests is that I can't run the emulator at all on my usual build machine (Ubuntu VM). And I can't compile on my Mac (it's beyond my patience :)). I'll try getting a precompiled binary for macos and just patch manually the jsm/is files and see how it goes. Do the tests fail also when ran against an actual device?
Flags: needinfo?(amac.bug) → needinfo?(kgrandon)
(In reply to Antonio Manuel Amaya Calvo (:amac) from comment #122)
> Yeah, I've also tried stopping/restarting several times in a row (two/four
> times). The problem with the emulator tests is that I can't run the emulator
> at all on my usual build machine (Ubuntu VM). And I can't compile on my Mac
> (it's beyond my patience :)). I'll try getting a precompiled binary for
> macos and just patch manually the jsm/is files and see how it goes. Do the
> tests fail also when ran against an actual device?

These tests currently only run on b2g desktop, I don't think we have everything quite running using our device runner, but we're working on it. I'll also try to get your patches applied and running against a local b2g instance.
Flags: needinfo?(kgrandon)
Also these tests run using b2g desktop, not the emulator. Not sure if that matters in your case though.
Created attachment 8459009 [details] [diff] [review]
Delay processing DOM API petitions until the application registry is ready

This should fix the failing Gaia integration test (thanks Kevin, BTW!). Try run is at  https://tbpl.mozilla.org/?tree=Try&rev=3da05f1a63f0
Attachment #8456164 - Attachment is obsolete: true
All of the app tests seemed to pass this time. Thank you! :)
Created attachment 8459556 [details] [diff] [review]
Delay processing requests until the registry is ready, v4.

This fixes the oth breakage from the last version. Try run is at

https://tbpl.mozilla.org/?tree=Try&rev=a584875f25d5
Attachment #8459009 - Attachment is obsolete: true
Try run looks good (lots of orange actually, but nothing that seems related nor new), requesting review for part 3.
Attachment #8459556 - Flags: review?(fabrice)
This is a dependency of a CAF bug but isn't blocking 2.0.  Should it be?
(In reply to Lucas Adamski [:ladamski] (plz needinfo) from comment #129)
> This is a dependency of a CAF bug but isn't blocking 2.0.  Should it be?

I don't think so - we've have had trouble landing this previously on trunk, as it's caused smoketest regressions, so I'm concerned that taking this on 2.0 could cause some critical regressions late in the 2.0 cycle.

Fabrice - What do you think?
Flags: needinfo?(fabrice)
Assuming we have a green gaia marionette test run, which we do now, I would like to see this in 2.0. Our gaia integration tests provide fairly deep coverage, much more so than the horizontal homescreen. I feel if we don't take this patch, then we will not be able to solve bug 1037706.
(In reply to Kevin Grandon :kgrandon from comment #131)
> Assuming we have a green gaia marionette test run, which we do now, I would
> like to see this in 2.0. Our gaia integration tests provide fairly deep
> coverage, much more so than the horizontal homescreen. I feel if we don't
> take this patch, then we will not be able to solve bug 1037706.

Let's talk about this offline. I'll ping you in IRC to discuss this.
I already said that I considered that one too risky to uplift. That's a major refactoring.

Why can't the new homescreen use the same tricks as the horizontal one to not be slow at startup?
Flags: needinfo?(fabrice)
(In reply to Fabrice Desré [:fabrice] from comment #133)
> I already said that I considered that one too risky to uplift. That's a
> major refactoring.

FWIW I agree with Fabrice... and that's a pity since our current code tends to fail randomly. I've seen end user 1.3 device (ebay ZTE Open C latest version) where an app installed from the marketplace could only be launched from the search bar (or the marketplace) because somehow the oninstall event was lost and so the homescreen didn't add the icon. And no icon meant no uninstalling it either, so the user's only option was to do a factory reset.

This patch hopefully will avoid these cases (that was the idea after all).
Comment on attachment 8459556 [details] [diff] [review]
Delay processing requests until the registry is ready, v4.

Review of attachment 8459556 [details] [diff] [review]:
-----------------------------------------------------------------

I see some light at the end of the tunnel!

::: dom/apps/src/AppsServiceChild.jsm
@@ +174,5 @@
>          }
>          for (let i = 0; i < apps.length; i++) {
>            let domApp = apps[i].get();
> +          if (!domApp || domApp._window === null) {
> +            apps.splice(i,1);

nit: (i, 1)

@@ +234,5 @@
>      // Get rid of dead weak references.
>      for (let i = 0; i < apps.length; i++) {
> +      let app = apps[i].get();
> +      if (!app || app._window === null) {
> +        apps.splice(i,1);

ditto

::: dom/apps/src/Webapps.jsm
@@ +275,5 @@
>    get registryStarted() {
>      return this._registryStarted.promise;
>    },
>  
> +  // The registry will be safe to clone

nit: add a full stop at the end of the comment.
Attachment #8459556 - Flags: review?(fabrice) → review+
Attachment #8449626 - Attachment is obsolete: true
Created attachment 8460774 [details] [diff] [review]
Part 3. Delay processing requests until the registry is ready, nits addressed

r=fabrice
Attachment #8459556 - Attachment is obsolete: true
Attachment #8460774 - Flags: review+
Attachment #8451323 - Attachment description: app-ipc.patch updated → Part 1. app-ipc.patch updated
Attachment #8453622 - Attachment description: Reject the promise when cancelling a downlaod → Part 2. Reject the promise when cancelling a downlaod
Attachment #8460774 - Attachment description: Delay processing requests until the registry is ready, nits addressed → Part 3. Delay processing requests until the registry is ready, nits addressed
Well, here goes nothing :) Requesting checkin. The patches should be applied in the order specified in the name.
Keywords: checkin-needed
the patches failed to apply as example for part 1

applying app-ipc.patch
patching file dom/apps/src/Webapps.js
Hunk #6 FAILED at 811
1 out of 7 hunks FAILED -- saving rejects to file dom/apps/src/Webapps.js.rej
patching file dom/apps/src/Webapps.jsm
Hunk #17 FAILED at 2574
Hunk #19 FAILED at 2785
Hunk #20 FAILED at 2812
Hunk #24 succeeded at 3556 with fuzz 1 (offset 23 lines).
3 out of 27 hunks FAILED -- saving rejects to file dom/apps/src/Webapps.jsm.rej
patching file dom/apps/tests/test_packaged_app_update.html
Hunk #1 succeeded at 75 with fuzz 1 (offset 0 lines).
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh app-ipc.patch

could you try to update these patches like rebasing against a current tree ?
Keywords: checkin-needed
It seems there has been some changes on Webapps on the last day... I'm going to relaunch the tests and will re-request checking when/if they pass correctly.
Comment on attachment 8453622 [details] [diff] [review]
Part 2. Reject the promise when cancelling a downlaod

Now that bug 997717 landed this doesn't seem to be necessary any more.
Attachment #8453622 - Attachment is obsolete: true
Created attachment 8460887 [details] [diff] [review]
Part 1. Core changes

r=fabrice

Rebased with latest M-C changes

New try run is at: 
https://tbpl.mozilla.org/?tree=Try&rev=3503918f65fa
Attachment #8460887 - Flags: review+
Attachment #8451323 - Attachment is obsolete: true
Try run is green still. Re-requesting checkin. Part 2 is not necessary anymore (I obsoleted it)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/fd84374f63f1
https://hg.mozilla.org/mozilla-central/rev/81f96f02263f
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED

Comment 145

4 years ago
This appears to have regressed bug 1035867 again. Working on verifying. Fabrice, did you test the steps in bug 1035867 ?
Flags: needinfo?(fabrice)
I totally forgot about that other bug (and as I said on bug 1035867 I don't even have an Android phone to test this :)). Anyway I uploaded a possible fix (in case it's caused by the same problem we ran with the B2G debug builds and processNextEvent).
status-firefox33: --- → ?
status-firefox34: --- → fixed
Keywords: verifyme
Flags: needinfo?(fabrice)
Backed out for very-likely being the cause of frequent B2G crashtest timeouts (as noted in bug 1015200).

https://hg.mozilla.org/mozilla-central/rev/8cd9e4240f29
Status: RESOLVED → REOPENED
status-firefox33: ? → ---
status-firefox34: fixed → ---
Resolution: FIXED → ---
Target Milestone: mozilla33 → ---
https://hg.mozilla.org/mozilla-central/rev/5ba4a3f85a2e
https://hg.mozilla.org/mozilla-central/rev/8efaa9f52856
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Depends on: 1015200

Updated

a year ago
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.