Closed Bug 834672 Opened 11 years ago Closed 11 years ago

AppProtocolHandler.js should never throw

Categories

(Core Graveyard :: DOM: Apps, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:leo+, firefox20 wontfix, firefox21 wontfix, firefox22 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix)

RESOLVED FIXED
mozilla22
blocking-b2g leo+
Tracking Status
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- wontfix

People

(Reporter: julienw, Assigned: fabrice)

References

Details

(Keywords: dataloss, Whiteboard: [qa-])

Attachments

(2 files, 2 obsolete files)

In some situations, most notably because Homescreen bugs (yeah, our coders are great but still... :) ), we get to do XHR to app:// URL that have an invalid ID.

This leads to the following logs :

I/Gecko   (  108): -*-*- Webapps.jsm : No webapp for {7120fb68-4942-44f2-a02f-84a6d32efcab}
E/GeckoConsole(  413): [JavaScript Error: "appInfo is null" {file: "jar:file:///system/b2g/omni.ja!/components/AppProtocolHandler.js" line: 69}]
E/GeckoConsole(  413): [JavaScript Error: "NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: '[JavaScript Error: "appInfo is null" {file: "jar:file:///system/b2g/omni.ja!/components/AppProtocolHandler.js" line: 69}]' when calling method: [nsIProtocolHandler::newChannel]" {file: "app://homescreen.gaiamobile.org/js/page.js" line: 158}]
E/GeckoConsole(  413): [JavaScript Error: "NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: '[JavaScript Error: "appInfo is null" {file: "jar:file:///system/b2g/omni.ja!/components/AppProtocolHandler.js" line: 69}]' when calling method: [nsIProtocolHandler::newChannel]" {file: "app://homescreen.gaiamobile.org/js/page.js" line: 158}]

As we can see, AppProtocolHandler tries to get the app informations for a ID that doesn't exist anymore. And then it throws in the calling code.

If that's possible, I'd return a 404 error or something like that.
Or maybe use a try/catch block around the whole block and throw a 50x error if we get an exception. Or maybe both.

But I really don't know how that stuff works so..

The implications for the user code is that if this throws, it means that the phone in a bad state and we may not showing the icons any more, or not showing some pages in the Homescreen, without being able to recover without a full reset.

That's why I'd like to get this fixed, even if we obviously need to fix the other bugs in Homescreen too.

I really would like this to go to shira because I think it would be a small patch with a low risk with big reward. Of course Fabrice could disagree :)
I don't think we can block on this until we know what the implications are. A bad logcat bug with no understanding of the user impact will not block.
Blocks: app-install
blocking-b2g: shira? → ---
Actually I see this as an easy last guard against yet unfound bugs and so that is the reason I think it should block.
blocking-b2g: --- → shira?
Triage: shira is to be as close to tef as possible. before understanding a confirmed STR and user impact, move to leo? for consideration
blocking-b2g: shira? → leo?
blocking-b2g: leo? → leo+
Attached patch patch (obsolete) — Splinter Review
It's hard to verify since we hit that in cases that I suspect to be race conditions. Julien, do you have an way to tweak gaia to hit this?
Assignee: nobody → fabrice
Attachment #716954 - Flags: feedback?(jduell.mcbugs)
IIRC this happens when we uninstall an app and install it again (because homescreen doesn't handle correctly the uninstall event). Maybe it was fixed though, I'll have a look today.

(the feedback to jduell was an error ?)
Flags: needinfo?(felash)
(In reply to Julien Wajsberg [:julienw] from comment #5)
> 
> (the feedback to jduell was an error ?)

No, I want some feedback on the DummyChannel code.
So I'm actually seeing this happening during packaged app updates. When this happens, the app update completes, but the app won't be able to be launched. Which then makes this data loss now. I've hit this twice in a row so far.

02-22 17:09:20.756: E/GeckoConsole(392): [JavaScript Error: "appInfo is null" {file: "jar:file:///system/b2g/omni.ja!/components/AppProtocolHandler.js" line: 68}]
02-22 17:09:20.756: E/GeckoConsole(392): [JavaScript Error: "NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: '[JavaScript Error: "appInfo is null" {file: "jar:file:///system/b2g/omni.ja!/components/AppProtocolHandler.js" line: 68}]' when calling method: [nsIProtocolHandler::newChannel]" {file: "app://homescreen.gaiamobile.org/js/page.js" line: 158}]
02-22 17:09:20.756: E/GeckoConsole(392): [JavaScript Error: "NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: '[JavaScript Error: "appInfo is null" {file: "jar:file:///system/b2g/omni.ja!/components/AppProtocolHandler.js" line: 68}]' when calling method: [nsIProtocolHandler::newChannel]" {file: "app://homescreen.gaiamobile.org/js/page.js" line: 158}]
Blocks: b2g-app-updates
No longer blocks: app-install
blocking-b2g: leo+ → tef?
Component: General → DOM: Apps
Keywords: dataloss
Product: Boot2Gecko → Core
Version: unspecified → Trunk
Comment on attachment 716954 [details] [diff] [review]
patch

Review of attachment 716954 [details] [diff] [review]:
-----------------------------------------------------------------

My gut sense is that you'd be better served by throwing an error in newChannel instead of constructing a dummychannel.  But if it makes life a lot easier to have the error happen later in OnStopRequest, then I'm fine with that.  More details below.

::: netwerk/protocol/app/AppProtocolHandler.js
@@ +54,5 @@
>      // We map app://ABCDEF/path/to/file.ext to
>      // jar:file:///path/to/profile/webapps/ABCDEF/application.zip!/path/to/file.ext
> +    let url = aURI.QueryInterface(Ci.nsIURL);
> +    let appId = aURI.host;
> +    let fileSpec = url.filePath;

So you're relying on nsIURL parsing to treat everything up to first / as host/appID.  I don't know the URL code well enough to know if that's true--sounds like it ought to work.  :bz would know, or maybe you've already looked at the code.

@@ +62,5 @@
>      let uri;
> +
> +    if (!appInfo) {
> +      dump("!! got no appInfo for " + appId + "\n");
> +      return new DummyChannel();

Any reason you don't just return an error here instead of constructing a dummy channel?  It's certainly a cleaner approach.  But if it makes life easier upstream there's nothing inherently wrong with the dummy channel approach.  It's just kludgy.

@@ +89,5 @@
> +  dump("DummyChannel constructor");
> +}
> +
> +DummyChannel.prototype = {
> +  QueryInterface: XPCOMUtils.generateQI([Ci.nsIChannel]),

nsIChannel inherits from nsIRequest, so there's a bunch of members/methods there that you theoretically also need to implement to be a properly functioning nsIChannel.  Whether you actually need to implement them or not depends of course on whether the code upstream actually uses/relies on them.

@@ +115,5 @@
> +    Services.tm.currentThread.dispatch(
> +    {
> +      run: function dc_run() {
> +        aListener.onStartRequest(request, aContext);
> +        aListener.onStopRequest(request, aContext, Cr.NS_ERROR_FILE_NOT_FOUND);

My JS-Fu is poor.  Does this use of aListener in a nested function that gets run in a different event preseve a refcount for aListener the entire time?  A call to AsyncOpen that returns NS_OK is responsible for AddRef'ing the listener and then we must release it after calling OnStopRequest.

For thoroughness's sake, you might also want to set notificationCallbacks=null (after calling OnStopRequest), in case some code set it after the channel was created to avoid possibly ref cycle.

It's also generally the case that if nsIRequest.loadGroup has been set, AsyncOpen calls loadGroup.AddRequest(this) and after OnStopRequest is called we call loadgroup.RemoveRequest.  Don't know if you need it in the case here.  IsPending() would also generally return true in between AsyncOpen and OnStop, and false otherwise.

suspend() would prevent OnStart/Stop from happening until resume() is called (and could be called by aListener.OnStart())  Again maybe no need to implement.  

Perhaps you get a taste for how faking a channel is complicated, and this dummy approach could easily blow up on you later if some new code is added that assumes one of these things about the channel and you haven't implemented it :)  That's why I recommend the simple newChannel error throw if it's possible.
Attachment #716954 - Flags: feedback?(jduell.mcbugs)
(In reply to Jason Duell (:jduell) from comment #8)

> My gut sense is that you'd be better served by throwing an error in
> newChannel instead of constructing a dummychannel.  But if it makes life a
> lot easier to have the error happen later in OnStopRequest, then I'm fine
> with that.  More details below.

In the calling code, we use a system XHR to get a file whose URL is an app:// url. I was suggesting that having a 404 or 50x error would be better than throwing at "open" (what happens now if the app id is not found).

At least we should not throw with a "null value error" but maybe the "50x" error could be generated by the XHR implementation if the protocol handler throws an Error ?
Flags: needinfo?(felash)
(In reply to Jason Smith [:jsmith] from comment #7)
> So I'm actually seeing this happening during packaged app updates. When this
> happens, the app update completes, but the app won't be able to be launched.
> Which then makes this data loss now. I've hit this twice in a row so far.
> 

I can't reproduce this (even if I got other unrelated issues trying to reproduce it). Would you have consistent STR ?
With this patch you can easily reproduce the error.

Don't use it at home as this will break homescreen until this bug is fixed ;)
(In reply to Julien Wajsberg [:julienw] from comment #10)
> (In reply to Jason Smith [:jsmith] from comment #7)
> > So I'm actually seeing this happening during packaged app updates. When this
> > happens, the app update completes, but the app won't be able to be launched.
> > Which then makes this data loss now. I've hit this twice in a row so far.
> > 
> 
> I can't reproduce this (even if I got other unrelated issues trying to
> reproduce it). Would you have consistent STR ?

There isn't really a consistent STR, but you've got about a 1/5 of a chance of hitting this and getting that error. Which means it's probably a timing issue. But I most definitely have seen this happen and have seen the app that gets hit by it become unlaunchable.

And if it's a timing issue, I doubt I'm going to get consistent STR.
(In reply to Jason Smith [:jsmith] from comment #12)
> 
> There isn't really a consistent STR, but you've got about a 1/5 of a chance
> of hitting this and getting that error. Which means it's probably a timing
> issue. But I most definitely have seen this happen and have seen the app
> that gets hit by it become unlaunchable.

Does a reboot fix it ?
With the lack of consistent reproducibility we won't block on tef but will push to leo nomination so it can be considered for blocking that version.
blocking-b2g: tef? → leo?
(In reply to Julien Wajsberg [:julienw] from comment #13)
> (In reply to Jason Smith [:jsmith] from comment #12)
> > 
> > There isn't really a consistent STR, but you've got about a 1/5 of a chance
> > of hitting this and getting that error. Which means it's probably a timing
> > issue. But I most definitely have seen this happen and have seen the app
> > that gets hit by it become unlaunchable.
> 
> Does a reboot fix it ?

Yes.
> we use a system XHR to get a file whose URL is an app:// url. I was suggesting
> that having a 404 or 50x error would be better than throwing at "open" 

If that's easier, then the dummychannel approach should be workable.
(In reply to Jason Duell (:jduell) from comment #16)
> > we use a system XHR to get a file whose URL is an app:// url. I was suggesting
> > that having a 404 or 50x error would be better than throwing at "open" 
> 
> If that's easier, then the dummychannel approach should be workable.

I wonder if throwing an error here, and catching it in the XMLHttpRequest implementation instead, would not be easier ? This would prevent any programming mistake in all protocol handlers to cause a problem to the calling code. Also, this is not unlike how servers deal with programming errors: with errors 50x.
blocking-b2g: leo? → leo+
Attached patch patch v2 (obsolete) — Splinter Review
So if we want to return an HTTP error, we can either use a dummy channel, or tweak the XHR implementation to do something special in our case. Jason, if you think the later is the good thing to do, don't read further and r- this patch.

If not, I implemented the behavior you asked for in your first comment. I checked that gaia was indeed getting 404s.

The changes to the URL decoding code were asked by Gavin in bug 811540, but we finally used another solution so we never landed them.
Attachment #716954 - Attachment is obsolete: true
Attachment #718194 - Flags: review?(jduell.mcbugs)
Attached patch patch v3Splinter Review
Correct version of the patch...
Attachment #718194 - Attachment is obsolete: true
Attachment #718194 - Flags: review?(jduell.mcbugs)
Attachment #718207 - Flags: review?(jduell.mcbugs)
Comment on attachment 718207 [details] [diff] [review]
patch v3

Review of attachment 718207 [details] [diff] [review]:
-----------------------------------------------------------------

Look good.  If you fix up suspend/resume you can mark next patch +r and land w/o another review from me.

::: netwerk/protocol/app/AppProtocolHandler.js
@@ +63,5 @@
> +
> +    if (!appInfo) {
> +      // That should not happen, so dump() inconditionnally.
> +      dump("!! got no appInfo for " + appId + "\n");
> +      return new DummyChannel();

Can you put a comment that explains why we prefer to return a dummy channel instead of just throwing here?

@@ +86,5 @@
>    }
>  };
>  
> +/**
> +  * This dummy channel implementation only provides enough functionnality

functionality

@@ +88,5 @@
>  
> +/**
> +  * This dummy channel implementation only provides enough functionnality
> +  * to return a fake 404 error when the caller asks for an incorrect
> +  * app:// url in the case of an unknonw appId.

"asks for an app:// URL containing an unknown appId"

@@ +110,5 @@
> +  cancel: function dc_cancel() {
> +  },
> +
> +  suspend: function dc_suspend() {
> +    this._suspended = true;

this._suspendCount++

@@ +114,5 @@
> +    this._suspended = true;
> +  },
> +
> +  resume: function dc_resume() {
> +    this._suspended = false;

Suspend/resume can be recursive, so make this 

  if (suspendCount <= 0)
     throw NS_ERROR_UNEXPECTED;

  if (--suspendCount == 0) { if (pending) dispatch() }

@@ +137,5 @@
> +  contentDisposition: Ci.nsIChannel.DISPOSITION_INLINE,
> +  contentDispositionFilename: "",
> +
> +  _pending: false,
> +  _suspended: false,

_suspendCount = 0;
Attachment #718207 - Flags: review?(jduell.mcbugs) → review+
Julien:  I don't actually know XHR well enough to say if hooking there is significantly easier than using a dummy JAR channel.  This seems to work well enough?
Whichever is fine for me as soon as this is fixed, I was only exploring all possible solutions.
hg.mozilla.org/integration/mozilla-inbound/rev/7a79ddc7bedf
Backed out for debug-only bustage during packaging.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a87453b34058

https://tbpl.mozilla.org/php/getParsedLog.php?id=20153551&tree=Mozilla-Inbound

e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\toolkit\mozapps\installer\packager.mk:590:0$ e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/obj-firefox/_virtualenv/Scripts/python.exe e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/toolkit/mozapps/installer/packager.py -DNO_NSPR_10_SUPPORT -DAB_CD=en-US -DMOZ_APP_NAME=firefox -DPREF_DIR=defaults/preferences -DMOZ_DEBUG=1 -D_MSC_VER=1600 -DJAREXT= -DMOZ_ANGLE_RENDERER=1 -DMOZ_D3DCOMPILER_DLL=D3DCompiler_43.dll -DMOZ_CHILD_PROCESS_NAME=plugin-container.exe -DMOZ_MSVC_REDIST=1600 -DMOZ_SHARED_MOZGLUE=1 -DMOZ_JSDEBUGGER -DNECKO_WIFI -DDLL_PREFIX= -DDLL_SUFFIX=.dll -DBIN_SUFFIX=.exe -DBINPATH=bin -DENABLE_MARIONETTE=1 \
	--format omni \
	--removals e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/browser/installer/removed-files.in \
	 \
	 \
	 \
	--optimizejars \
	 \
	package-manifest ../../dist ../../dist/firefox \
	
Executing e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/obj-firefox/dist\bin\xpcshell.exe -g e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\obj-firefox\dist\bin/ -a e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\obj-firefox\dist\bin/ -f e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/toolkit/mozapps/installer\precompile_cache.js -e precompile_startupcache("resource://gre/");
resource://gre/components/ActivityOptions.js
resource://gre/components/ActivityProxy.js
resource://gre/components/ActivityRequestHandler.js
resource://gre/components/ActivityWrapper.js
resource://gre/components/addonManager.js
resource://gre/components/AlarmsManager.js
resource://gre/components/amContentHandler.js
resource://gre/components/amWebInstallListener.js
resource://gre/components/AppProtocolHandler.js
###!!! ASSERTION: Recursive GetService!: 'Error', file e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/xpcom/components/nsComponentManager.cpp, line 1385
xul!CallGetService+0x000000000000003B (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\obj-firefox\xpcom\build\nscomponentmanagerutils.cpp, line 63)
xul!CallGetService<nsIProtocolHandler>+0x0000000000000051 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\obj-firefox\dist\include\nsservicemanagerutils.h, line 98)
xul!nsIOService::GetProtocolHandler+0x0000000000000110 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\netwerk\base\src\nsioservice.cpp, line 418)
xul!nsIOService::NewURI+0x00000000000000E1 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\netwerk\base\src\nsioservice.cpp, line 540)
xul!NS_InvokeByIndex_P+0x0000000000000027 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp, line 71)
xul!XPCWrappedNative::CallMethod+0x0000000000001084 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\xpconnect\src\xpcwrappednative.cpp, line 2385)
xul!XPC_WN_CallMethod+0x00000000000002A1 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\xpconnect\src\xpcwrappednativejsops.cpp, line 1417)
mozjs!js::InvokeKernel+0x000000000000043D (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\src\jsinterp.cpp, line 383)
mozjs!js::Interpret+0x000000000000A142 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\src\jsinterp.cpp, line 2362)
mozjs!js::RunScript+0x00000000000004B8 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\src\jsinterp.cpp, line 340)
mozjs!js::ExecuteKernel+0x00000000000001EC (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\src\jsinterp.cpp, line 530)
mozjs!js::Execute+0x00000000000004A5 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\src\jsinterp.cpp, line 570)
mozjs!JS_ExecuteScript+0x00000000000001E8 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\src\jsapi.cpp, line 5516)
mozjs!JS_ExecuteScriptVersion+0x0000000000000033 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\src\jsapi.cpp, line 5525)
xul!mozJSComponentLoader::ObjectForLocation+0x0000000000000C76 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\xpconnect\loader\mozjscomponentloader.cpp, line 1103)
xul!mozJSComponentLoader::LoadModule+0x000000000000016B (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\xpconnect\loader\mozjscomponentloader.cpp, line 532)
xul!nsComponentManagerImpl::KnownModule::Load+0x000000000000002E (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\xpcom\components\nscomponentmanager.cpp, line 698)
xul!nsComponentManagerImpl::CreateInstanceByContractID+0x0000000000000083 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\xpcom\components\nscomponentmanager.cpp, line 1030)
xul!nsComponentManagerImpl::GetServiceByContractID+0x00000000000002BE (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\xpcom\components\nscomponentmanager.cpp, line 1426)
xul!CallGetService+0x000000000000003B (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\obj-firefox\xpcom\build\nscomponentmanagerutils.cpp, line 63)
xul!CallGetService<nsIProtocolHandler>+0x0000000000000051 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\obj-firefox\dist\include\nsservicemanagerutils.h, line 98)
xul!nsIOService::GetProtocolHandler+0x0000000000000110 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\netwerk\base\src\nsioservice.cpp, line 418)
xul!nsIOService::NewURI+0x00000000000000E1 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\netwerk\base\src\nsioservice.cpp, line 540)
xul!NS_InvokeByIndex_P+0x0000000000000027 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp, line 71)
xul!XPCWrappedNative::CallMethod+0x0000000000001084 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\xpconnect\src\xpcwrappednative.cpp, line 2385)
xul!XPC_WN_CallMethod+0x00000000000002A1 (e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\js\xpconnect\src\xpcwrappednativejsops.cpp, line 1417)
mozjs!js::InvokeKernel+0x000000000000043D (e:\builds\moe:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\toolkit\mozapps\installer\packager.mk:590:0: command 'e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/obj-firefox/_virtualenv/Scripts/python.exe e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/toolkit/mozapps/installer/packager.py -DNO_NSPR_10_SUPPORT -DAB_CD=en-US -DMOZ_APP_NAME=firefox -DPREF_DIR=defaults/preferences -DMOZ_DEBUG=1 -D_MSC_VER=1600 -DJAREXT= -DMOZ_ANGLE_RENDERER=1 -DMOZ_D3DCOMPILER_DLL=D3DCompiler_43.dll -DMOZ_CHILD_PROCESS_NAME=plugin-container.exe -DMOZ_MSVC_REDIST=1600 -DMOZ_SHARED_MOZGLUE=1 -DMOZ_JSDEBUGGER -DNECKO_WIFI -DDLL_PREFIX= -DDLL_SUFFIX=.dll -DBIN_SUFFIX=.exe -DBINPATH=bin -DENABLE_MARIONETTE=1 \
	--format omni \
	--removals e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/browser/installer/removed-files.in \
	 \
	 \
	 \
	--optimizejars \
	 \
	package-manifest ../../dist ../../dist/firefox \
	' failed, return code 1
e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\toolkit\mozapps\installer\packager.mk:617:0: command 'C:/mozilla-build/python27/python.exe e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/build/pymake/pymake/../make.py make-package-internal' failed, return code 2
e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\config\rules.mk:596:0: command 'C:/mozilla-build/python27/python.exe e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/build/pymake/pymake/../make.py libs' failed, return code 2
e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\browser\build.mk:45:0: command 'C:/mozilla-build/python27/python.exe e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/build/pymake/pymake/../make.py -C browser/installer' failed, return code 2
Traceback (most recent call last):
  File "e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/toolkit/mozapps/installer/packager.py", line 366, in <module>
    main()
  File "e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/toolkit/mozapps/installer/packager.py", line 358, in main
    args.source, gre_path, base)
  File "e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/toolkit/mozapps/installer/packager.py", line 140, in precompile_cache
    errors.fatal('Error while running startup cache precompilation')
  File "e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\python\mozbuild\mozpack\errors.py", line 101, in fatal
    self._handle(self.FATAL, msg)
  File "e:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\python\mozbuild\mozpack\errors.py", line 96, in _handle
    raise ErrorMessage(msg)
mozpack.errors.ErrorMessage: Error: Error while running startup cache precompilation
https://hg.mozilla.org/mozilla-central/rev/3ac24f8c6d21
https://hg.mozilla.org/mozilla-central/rev/5f4904d3191f
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Whiteboard: [qa-]
Nom for tef per Bug 851281 comment 30.
blocking-b2g: leo+ → tef?
Even if we found the root cause in Bug 851281, this patch still provides us guards around such unexpected problems.
Flags: needinfo?(mvines)
I agree it's nice to have but doesn't seem mandatory for tef+, so rather not take the risk.  Instead lets, umm, fall on our face the next time this class of problem manifests on v1.0.1 :)
blocking-b2g: tef? → leo+
Flags: needinfo?(mvines)
Michael, I'd like to insist here.

* A homescreen that doesn't show anything is the last thing we want. This patch prevents a whole class of potential bugs that lead to this state.
* this patch landed for nearly 1 month in both m-c and b2g18, we would have caught any existing regression as of now. AFAIK we had none.
* even if the code added seems big, in reality there is not much changed in the existing code: a safer way to split the URL, a 404 error when we would have badly thrown instead.
Flags: needinfo?(mvines)
np.  The class of problems that this bug prevents are ones that would we would not expect the end-user to ever encounter right?  The bug reference in comment 30 looked to be more a build configuration issue.
Flags: needinfo?(mvines)
The fact that we don't expect them does not mean that a user won't trigger it...
Yep, I know.  I'm on your side here!  Help me turn that - into a + with a strong justification for how this bug could manifest in the commercial device.
When the homescreen starts up, it first displays the apps that are in its local indexed db and only then it synchronizes with the Gecko apps registry.

The problem manifests when the homescreen has an unavailable app in its local indexed db. "Unavailable" means here that the app is not on the device anymore (ie not in the apps registry).

I believe that this specific bug will prevent us from having a system update that removes apps from the device. This will prevent a user from removing an app "by hand". Even if that's unexpected, there could be bugs in the homescreen that unsynchronize the local indexed db, and without this patch, the local indexed db could not be resynchronized.

If that's not enough, I won't fight more. I believe we include much less important and much less safe patches in 1.0.1, having to fight about this drives me crazy.
(In reply to Julien Wajsberg [:julienw] from comment #36)
> I believe that this specific bug will prevent us from having a system update
> that removes apps from the device.

Ok, not sure this is even a requirement at the moment.  If this becomes a requirement then I think that system update could also simply include this patch.

> This will prevent a user from removing an app "by hand".

You mean through |adb shell| right?  This is a developer use-case so not really blocking.  

> Even if that's unexpected, there could be bugs in the homescreen that 
> unsynchronize the local indexed db, and without this patch, the local 
> indexed db could not be resynchronized.

I'm *certain* there are bugs but really the concern is whether this patch will fix more bugs than it introduces. :)

> If that's not enough, I won't fight more. I believe we include much less
> important and much less safe patches in 1.0.1, having to fight about this
> drives me crazy.

I understand your frustration.  We had a window to get slightly risker patches in v1.0.1, as well as "would really be nice and very likely low risk" patches like this, but that window has now closed on us.  If this was a week back I'd have tef+'ed it already.  However I'm certainly not the sole arbiter of what lands in v1.0.1 so I do encourage you to tef? this bug again for somebody else to assess if you feel this bug could result in a significant number of support calls or device returns.
re: whether this is tef+ worthy: comment 7 and comment 12 make it sound like w/o this patch, updating a packaged app has a 1/5 chance of getting it into a wedged state when it won't launch.
Flags: in-moztrap-
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.