Closed Bug 991397 Opened 6 years ago Closed 6 years ago

launching app installed outside Firefox hangs firstrun

Categories

(Firefox for Android :: Web Apps (PWAs), defect, P1, critical)

29 Branch
ARM
Android
defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox29 --- wontfix
firefox30 --- unaffected
firefox31 --- unaffected

People

(Reporter: myk, Assigned: myk)

Details

(Whiteboard: [WebRuntime])

Attachments

(4 files)

Launching an app with Firefox Beta after having installed it from outside that browser (f.e. by sideloading it or installing it using Nightly) causes the app to hang on the splash screen, and the log shows this suspicious error:

 28928           GeckoConsole  E  [JavaScript Error: "TypeError: this._manifest is null" {file: "resource://gre/modules/AppsUtils.jsm" line: 559}]
 28928           GeckoConsole  E  [JavaScript Error: "TypeError: this._manifest is null" {file: "resource://gre/modules/AppsUtils.jsm" line: 559}]

On second run (presumably with an existing Firefox Beta process in the background), the app crashes due to bug 968129.  On third run, it runs as expected.

Nightly doesn't exhibit the same problem, so this seems specific to Beta.  (I haven't tested on Aurora yet.)
Assignee: nobody → jhugman
per triage, demoting to not blocking, still P1
Severity: blocker → critical
Further testing shows that current Aurora, like Nightly, is unaffected.  It's only Beta (Fx 29) that shows the problem.
This is a race between DOMApplicationRegistry.confirmInstall and navigator.mozApps.mgmt.getAll.

WebappManager calls confirmInstall, which installs the app into the registry, including writing its manifest to a file via DOMApplicationRegistry._writeFile.

Afterward, confirmInstall calls its aInstallSuccessCallback argument, which eventually results in WebappRT calling getAll, which reads from the manifest file.

But _writeFile writes the file asynchronously, and confirmInstall doesn't wait for it to complete.  So the file isn't available when getAll reads the file, which makes app.manifest be null in WebappRT.getManifestFor, which passes that value to the ManifestHelper constructor, which throws the exception.

For some reason, this doesn't happen on Central/Aurora, where _writeFile has been reimplemented, even though the new implementation still writes the file asynchronously, and confirmInstall still doesn't wait for it to finish.  Perhaps the new implementation is more performant, so it happens to work on my device.

In any case, the real fix is for confirmInstall to wait until _writeFile finishes before calling the callback.  And I'll fix that issue on Central (even though I can't reproduce this problem there) and uplift that fix to Aurora, which has the same implementation of _writeFile.

For Beta, however, it's safer to leave _writeFile alone, since touching it would affect all its many consumers, and patch getManifestFor to handle this situation where and when it occurs.

I tried several ways of fixing the problem, including passing the manifest from WebappManager to WebappRT (a long and winding road through InstallHelper, WebappImpl, and BrowserApp), and ended up with this one: save the "just installed" manifest in WebappManager and then retrieve it if the manifest is missing.
Assignee: jhugman → myk
Status: NEW → ASSIGNED
Attachment #8409721 - Flags: feedback?(mark.finkle)
Another approach is to try calling getAll again after one second.  But we also have to purge the manifest caches in that case, and the one in Webapps.js can only be purged by notifying memory-pressure.  Which then seems to delay the file write from completing for several more seconds, so we actually end up trying again several more times before succeeding, on my speedy Nexus 5 anyway.
Attachment #8409722 - Flags: feedback?(mark.finkle)
A third option is to update the implementation of _writeFile to the one on Central/Aurora.  But that seems risky to me, since it affects many more consumers, and since that implementation seems likely to suffer the same race condition, even though I haven't managed to reproduce it.
Attachment #8409723 - Flags: feedback?(mark.finkle)
Attachment #8409723 - Flags: feedback?(fabrice)
Comment on attachment 8409723 [details] [diff] [review]
option #3: update _writeFile implementation

I'm not too worried about this impl, but I do want to point out that one reason this might be working is that NetUtils.asyncCopy is not very async.

I think bug 925838 and bug 925831 are suggesting that.

It looks like the _writeFile code used to be implemented using NetUtils.asyncCopy before the callbacks were changed to promises.
http://hg.mozilla.org/mozilla-central/rev/75a010e3f038#l2.631

On trunk, we reverted back to NetUtils.asyncCopy when we stopped using OS.File in the apps code.
http://hg.mozilla.org/mozilla-central/rev/12985e11e4e8#l3.114

f+ from me, but fabrice has more knowledge about the code.
Attachment #8409723 - Flags: feedback?(mark.finkle) → feedback+
And here's a test that should fail because of the race condition, although I can't actually get it to fail on my desktop (perhaps because it's too fast).
Comment on attachment 8409723 [details] [diff] [review]
option #3: update _writeFile implementation

Review of attachment 8409723 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like our best option here and now.
Attachment #8409723 - Flags: feedback?(fabrice) → review+
Comment on attachment 8409723 [details] [diff] [review]
option #3: update _writeFile implementation

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
  Bug 959420.

User impact if declined: 
  Sideloaded apps will hang on firstrun.

Testing completed (on m-c, etc.): 
  The fix essentially reverts a change made by bug 959420.  The change was
  already reverted on Central/Aurora in bug 981085, so this has been tested
  on those branches ever since that bug landed about a month ago (March 25).

Risk to taking this patch (and alternatives if risky): 
  There's some risk, since this only reverts a part of the changes that
  caused the problem (all of which were reverted on Central/Aurora).
  The function is also called by a variety of consumers, and it's possible
  that some of them depend on the existing (buggy) behavior.

  The alternatives are the other two patches in the bug, which are less
  risky in terms of scope, since they don't affect other consumers, but also
  less well exercised, since they haven't been tested on Central/Aurora.

String or IDL/UUID changes made by this patch:
  None.
Attachment #8409723 - Flags: approval-mozilla-beta?
Comment on attachment 8409723 [details] [diff] [review]
option #3: update _writeFile implementation

We cannot take this patch for 29. Sorry. It is too risky and we are not planning to do an other build for the release...
Attachment #8409723 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Ok, wontfixing, as the bug is Fx29-only, and it's wontfix for 29.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Attachment #8409721 - Flags: feedback?(mark.finkle)
Attachment #8409722 - Flags: feedback?(mark.finkle)
Summary: launching app installed outside Firefox Beta hangs firstrun → launching app installed outside Firefox hangs firstrun
You need to log in before you can comment on or make changes to this bug.