Closed Bug 834618 Opened 9 years ago Closed 9 years ago

Unable to boot: JavaScript Error: "TypeError: manifest is null" {file: "app://system.gaiamobile.org/shared/js/manifest_helper.js" line: 12}

Categories

(Core Graveyard :: DOM: Apps, defect)

ARM
Gonk (Firefox OS)
defect
Not set
blocker

Tracking

(blocking-b2g:tef+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 fixed, b2g18-v1.0.1 fixed)

VERIFIED FIXED
mozilla21
blocking-b2g tef+
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- fixed
b2g18-v1.0.1 --- fixed

People

(Reporter: nbp, Assigned: julienw)

References

Details

(Keywords: b2g-testdriver, crash, unagi)

Attachments

(2 files, 1 obsolete file)

Attached file logcat
Last installed apps:
 - https://song-info.appspot.com/
 - Here Maps (from marketplace)
 - Thomasy (from marketplace)

Steps: (?)
 - Attempts to reproduce Bug 834117 (succeeded)
 - Reboot to remove extra icons from the screen
Keywords: crash, unagi
Hi Nicolas,

can you provide qa more information on this issue?
- what build ad phone are you testing on
- can you reproduce this if only just 1 marketplace app
- if you can provide more reproducible details for us to try, that would be helpful

Thanks,
Tony
(In reply to Tony Chung [:tchung] from comment #1)
> can you provide qa more information on this issue?
> - what build ad phone are you testing on

The latest test driver build.

> - can you reproduce this if only just 1 marketplace app

No, because the phone does not boot anymore. (see bug title)

> - if you can provide more reproducible details for us to try, that would be
> helpful

Sorry, I cannot remember all the steps I made to break the boot sequence of my phone.  I only installed 4 apps, magic stones (installed a while ago), song-info installed from the website, Here maps installed from the market place after removing the version which did not get any update, and the next app in the marketplace when searching for "maps".
Minused until we get more info.
blocking-b2g: tef? → -
Sorry, we must have it tef+.

That's not Nicolas' task to give us more information, this is _our_ task to reproduce this bug, when the bug is so serious that it bricks his phone.

So nominating again.
blocking-b2g: - → tef?
from the attached logcat:

E/GeckoConsole(  108): [JavaScript Error: "DOMApplicationRegistry: Could not parse JSON: /data/local/webapps/{6658c560-9a95-499c-83fb-304e24ba14db}/manifest.webapp Error: Webapps.jsm: non-relative URI passed to resolveFromOrigin

# cat \{6658c560-9a95-499c-83fb-304e24ba14db\}/manifest.webapp 

{"name":"Thomasy Map","developer":{"name":"Thomasy","url":"http://thomasy.tw"},"description":"Thomasy Map Web App","icons":{"64":"/images/icon64.png","128":"/images/icon128.png"}}
Some logs that happen at boot time :

E/GeckoConsole(  227): [JavaScript Error: "DOMApplicationRegistry: Could not parse JSON: /system/b2g/webapps/webapps.json TypeError: this.webapps[(intermediate value)] is undefined
E/GeckoConsole(  227): [JavaScript Error: "DOMApplicationRegistry: Could not parse JSON: /system/b2g/webapps/browser.gaiamobile.org/manifest.webapp Error: Webapps.jsm: non-relative URI passed to resolveFromOrigin
E/GeckoConsole(  227): [JavaScript Error: "manifest is null" {file: "resource://gre/modules/Webapps.jsm" line: 595}]
E/GeckoConsole(  227): [JavaScript Error: "TypeError: manifest is null" {file: "app://system.gaiamobile.org/shared/js/manifest_helper.js" line: 12}]


The first logs are very serious, I'm asking fabrice for more information.
In the mean time I try to reproduce on my phone.
Flags: needinfo?(fabrice)
Component: Gaia::System → DOM: Apps
Product: Boot2Gecko → Core
Version: unspecified → Trunk
The logs seem to imply this might be a bug in the mozApps API.

Maybe Fabrice knows what "Error: Webapps.jsm: non-relative URI passed to resolveFromOrigin" means. I'll take a look at the source and see what I find.
In app manifests, all URLs are supposed to be absolute or relative to the origin. So they either begin by / or are absolute. We're supposed to check this at install-time but here it obviously failed at one moment.
I can't reproduce it with the apps that Nicolas has, however I think we should be very protective during our boot sequence to not fail like that. Multiple layers of security is better than only one.
I can't reproduce either.

resolveFromOrigin() failing means that something goes wrong when registering activities or system messages, which could be an error in a 3rd party app manifest, but any testing would have caught that.
Flags: needinfo?(fabrice)
Fabrice, you said in another bug that we should never fail at startup, no matter what.

I mean, a wrong 3rd party app manifest should NOT makes a device in this situation

That's what I'm concerned about here: catch it when resolveFromOrigin throws an exception and do something to recover.

Is that dumb ?
Flags: needinfo?(fabrice)
(In reply to Fabrice Desré [:fabrice] from comment #10)
> resolveFromOrigin() failing means that something goes wrong when registering
> activities or system messages, which could be an error in a 3rd party app
> manifest, but any testing would have caught that.

AFAIK, the only application that I have which register an activity is
  https://song-info.appspot.com/

Which I started at least alone and from the music application.

The author of the application was not able to reproduce this Bug. (ask :MattN)
(In reply to David Scravaglieri [:scravag] from comment #3)
> Minused until we get more info.

Are you really sure we should tef- a bug which is caused by a thrid party app which can cause the phone to become a brick for a non-developer user?  Currently I am just unable to receive/make any call/sms.  The home screen never appear, so I am unable to see any of my installed app, and thus it is impossible to consult any email/sms/calendar application or play with any installed game.

What I have as of today is a *blue screen* with a Firefox OS logo where I can change the volume and display the menu listed with the power button.  A long press on the home button just make the phone vibrate but it does not show anything.  I am able to take screenshots and receive the notification noise, but the *blue screen* remain on top.

To summarize, the phone is a Brick!  Don't you think this should be a blocking bug?
Severity: normal → blocker
tracking-b2g18: --- → ?
Flags: needinfo?(dscravaglieri)
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → ARM
No need to flip the tracking flag here - you've already got the blocker nom here.
tracking-b2g18: ? → ---
QA Contact: jsmith
Skipping triage, but here's my comments just in case:

We should ask the following question:

1. Given the information provided, is there anything we can implement to prevent what might have bricked the phone? If yes, then we should block.

Fabrice a person to ask on this.

I'll investigate this as well.
I'm a bit tired of reading that "we don't want to fix stuff". tef+ bugs have to be fixed and checked in before tonight PST, so without STR there's not much we can do.

If you have str (or another app that shows that), then we'll gladly see what we can do. It's not even clear to me if we should ignore silently failed activities registration or abort the install at this point.
Flags: needinfo?(fabrice)
It's clear that should fix the cause, but this is difficult to do because for that we need to reproduce.

However, I do think we need to fix the symptom too: we should not brick when these errors arrive. Unless they're really unrecoverable...
Fabrice> the logs that I pasted happens when the phone was booting, not when we install the app.
logs in Comment 6, I mean.
(In reply to Fabrice Desré [:fabrice] from comment #16)
> I'm a bit tired of reading that "we don't want to fix stuff". tef+ bugs have
> to be fixed and checked in before tonight PST, so without STR there's not
> much we can do.

Correction, there is no tef+ deadline tonight. Just wanted to clear that up.
We tracked this down to Bug 824670.

It seems that at one point we accepted activities with absolute URI, but now we don't.

Nicolas installed an app with an activity with an absolute URI, then updated his phone, which bricked it, because we now throw an error during the boot because of this app that we accepted to install in a previous version.

The app dev fixed the manifest so that's why we can't reproduce. I made an app that do the same at http://everlong.org/mozilla/hosted-bricked/ and now we refuse the install for such an app.

So, two thoughts here :

1. We should never brick a phone because of an update. If Nicolas is the only one who had this specific problem I can fix it myself with a manual fix.
2. Throwing errors during the boot is a very dangerous thing to do.

A big thank you to Vivien who helped me on this.

I don't know what to do now. Either have some code to recover from such a situation if this happens to other, or just close this bug now.
just verified that removing the absolute uri in the installed manifest.webapp _on the device_ unbricked the phone.
Sounds like we've got STR now.

I'd say we should have a way to recover from such a situation. The reality is we do change things per update. We can't shoot ourselves into a ground on an update.
IMHO, we should not be able to brick the phone with a single manifest from whatever app. There can be some problem with an update the ends up adding a bad manifest, maybe just bit- or byte-shifted due to a download problem, and then only that one app should have a problem, but not the whole system. At least that's my naive view from a high-level view.
funny or not, I just got into this state myself without even looking. The culprit was the marketplace and the problem was also that activities href had an absolute URL.

It's not clear how I got into this state as a "make install-gaia" unbricked the phone in this case (becase we re-flash the marketplace app). But this makes it clear that this should be fixed anyway.
Ok as it's reproducible we block on it.
Assignee: nobody → felash
blocking-b2g: tef? → tef+
Flags: needinfo?(dscravaglieri)
Ok, I reproduced it very easily. You just need to flash a current VARIANT=eng build, finish FTU, and reboot. A patch is coming.

In a VARIANT=user build, the marketplace activities are set in the manifest in /data/local/webapps/marketplace/manifest.webapp but since marketplace is not removable, this manifest is not read and the manifest in /system/b2g is read instead. I believe this is a bug but I don't know what are the effect of this. 
Fabrice, could you comment on this ?
Flags: needinfo?(fabrice)
And BTW the activities for the marketplace are made wrong in gaia build, because there are correct in https://marketplace.firefox.com/telefonica/manifest.webapp. But this helps us reproducing this bug.
Attached patch patch v1 (obsolete) — Splinter Review
Attachment #709228 - Flags: review?(fabrice)
I just tried comment 27 on a user build.

- first launch, FTU not yet finished, we have exactly the same manifest in /system/b2g/webapps/marketplace/manifest.webapp and in /data/local/webapps/marketplace/manifest.webapp. I don't understand why we have something in /data/local/webapps at this point. We also do have webapps.json both in /system/b2g/webapps and in /data/local/webapps.

- after FTU is finished, I believe we're updating the manifest in some way, and now the manifest in /data/local/webapps/marketplace/manifest.webapp has some activities. But they're not used because the Marketplace app is not removable, and therefore the manifest in /system/b2g/webapps/marketplace/manifest.webapp is used instead.

The basePath property isn't used, at least for this.

Fabrice, do you believe this is a bug ? If yes, can you file it, as I don't completely understand the implication ?
In the patch I added a log to see which actual file is read in _readManifest.
Are you sure you flashed a user build from scratch? These builds should *not* have anything in /system/b2g/webapps
Flags: needinfo?(fabrice)
Comment on attachment 709228 [details] [diff] [review]
patch v1

Review of attachment 709228 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/apps/src/Webapps.jsm
@@ +510,5 @@
> +        description.href = manifest.resolveFromOrigin(description.href);
> +      } catch (e) {
> +        debug("Activity href (" + description.href + ") is invalid, skipping " +
> +            "this activity. Error is: " + e);
> +        return activitiesToRegister;

here you stop on the first activity that has an error. We just want to skip it, but still try to register the next ones.

@@ +1953,5 @@
>        file = FileUtils.getFile(baseDir, ["webapps", id, "manifest.json"], true);
>      }
>  
> +    debug('_readManifests: loading ' + file.path + ' for app ' +
> +        app.manifestURL);

Please remove that, and revert the previous change.
Attachment #709228 - Flags: review?(fabrice) → review-
(In reply to Fabrice Desré [:fabrice] from comment #32)
> Are you sure you flashed a user build from scratch? These builds should
> *not* have anything in /system/b2g/webapps

you mean /data/local right ?


Well, I did |./build.sh && ./flash.sh| after changing the VARIANT env in .userconfig. Should I clobber too ?
(In reply to Julien Wajsberg [:julienw] from comment #34)
> (In reply to Fabrice Desré [:fabrice] from comment #32)
> > Are you sure you flashed a user build from scratch? These builds should
> > *not* have anything in /system/b2g/webapps
> 
> you mean /data/local right ?

Ha yes, user builds. But we copy stuff over in /data/local at startup. And non-removable apps can't be updated so I don't understand how you end up with a different manifest in /system and in /data.


> Well, I did |./build.sh && ./flash.sh| after changing the VARIANT env in
> .userconfig. Should I clobber too ?

I usually clobber out/ when changing from user to eng.
Attached patch patch v2Splinter Review
catches all invocations of resolveFromOrigin in Webapps.jsm
Attachment #709228 - Attachment is obsolete: true
Attachment #709690 - Flags: review?(fabrice)
(In reply to Fabrice Desré [:fabrice] from comment #35)
> Ha yes, user builds. But we copy stuff over in /data/local at startup. And
> non-removable apps can't be updated so I don't understand how you end up
> with a different manifest in /system and in /data.

I think I found several things that seems strange, I'm opening another bug about this.
Attachment #709690 - Flags: review?(fabrice) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/704a1ceb0568
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
A bit puzzled on what to test here.

Julien - Can you clarify on what I should look out for?
Flags: needinfo?(felash)
Keywords: verifyme
Jason> the easiest way is to follow the STR in Bug 838102.

Without this fix, a reboot bricks the phone. With this fix, a reboot just ignores the wrong activities.
Flags: needinfo?(felash)
BTW I now believe that Bug 838102 is what triggered this bug initially on Nicolas' phone, he confirmed that the song-info app was updated on his phone (this information was not given before unfortunately).
With a lot of testing with Fabrice's one packaged app web activity patch, I haven't hit this bug, so it looks fixed to me now per the verification on the related bug - bug 838102.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.