Closed Bug 1163956 Opened 9 years ago Closed 9 years ago

[nexus-5-l] Can't perform FOTA from 2.2 to 2.2

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(firefox42 fixed)

RESOLVED FIXED
FxOS-S2 (10Jul)
Tracking Status
firefox42 --- fixed

People

(Reporter: gchang, Assigned: etsai)

References

(Depends on 1 open bug)

Details

Attachments

(8 files, 5 obsolete files)

6.23 KB, text/plain
Details
6.67 KB, text/plain
Details
7.76 KB, text/plain
Details
906 bytes, patch
Details | Diff | Splinter Review
906 bytes, patch
Details | Diff | Splinter Review
56 bytes, text/x-github-pull-request
gsvelto
: review+
Details | Review
43 bytes, text/x-github-pull-request
gsvelto
: review+
Details | Review
1.46 KB, patch
Details | Diff | Splinter Review
Trying to upgrade 2.2 to 2.2 using FOTA, but got errors when entering recovery mode. Which will cause me to use adb sideload to update device.
The log shows
"Rebooting into recovery to apply FOTA update: /storage/emulated/legacy/updates/fota/update.zip"

STR:

1. Build 2.2 fota image locally and put the image in local server.
2. Flash nexus-5-l from pvt
3. Change ota url in "Developer" in Settings app
4. Go to Settings > Device information 
5. Tap "Check now" button
6. Perform upgrade in notification bar

### Expected:
1. The device should be upgraded

### Actual:
1. The device failed to upgrade

### Reproduce rate
always
QA Whiteboard: [COM=OTA]
See Also: → 1136638
Two problems to be solved:
1) Command for recovery is set: "/updates/fota/update.zip", path should be fixed
2) When using sideload, the update-binary is too old to support new function. Need to package a new one with update.zip
Assignee: nobody → etsai
Status: NEW → ASSIGNED
For first problem, set RECOVERY_EXTERNAL_STORAGE to /sdcard in full_hammerhead.mk doesn't work on my build or google's LMY47D recovery image, file not found. TWRP will mount /sdcard and there is update.zip in /sdcard/updates/fota, need to know how android L works.
For 1) Using stock recovery, only /cache is mounted, it doesn't match with b2g update service's download path. Per-discussed with :kli, we will modify recovery to mount external storage. Then it will work as b2g.
For 2) copy system/bin/updater to device/, add TARGET_UPDATE_BINARY to BoardConfig.mk.
For 2) Update: our edify script generator uses set_perm and set_perm_recursive, these two functions are deprecated.
(In reply to Eric Tsai from comment #3)
> For 1) Using stock recovery, only /cache is mounted, it doesn't match with
> b2g update service's download path. Per-discussed with :kli, we will modify
> recovery to mount external storage. Then it will work as b2g.
> For 2) copy system/bin/updater to device/, add TARGET_UPDATE_BINARY to
> BoardConfig.mk.

I think you can also just make TARGET_UPDATE_BINARY to be like:
TARGET_UPDATE_BINARY := $(PRODUCT_OUT)/system/bin.updater

Gerry, we cannot debug blindly this kind of error. Next time you should include the content of /cache/recovery/last_log for example.
Flags: needinfo?(gchang)
Attached file last_log
Flags: needinfo?(gchang)
Summarize problems to solve:
1) Command for recovery is set: "/updates/fota/update.zip", path should be fixed
2) updater-script uses "set_perm" and "set_perm_recursive" those are deprecated and replaced by "set_metadata" and "set_metadata_recursive" in new update-binary

:gsvelto, for this bug, my proposed solution is:
1) add RECOVERY_EXTERNAL_STORAGE to device/lge/hammerhead/hammerhead.mk
2) fork bootable/recovery branch 5.1.0_r1 from CAF, patch install.c with the two functions: "set_perm" and "set_perm_recursive". Then modify manifest.xml for nexus-5-l using the new revision.

If it's ok, :kli and I will finish in this week.
Flags: needinfo?(gsvelto)
(In reply to Eric Tsai from comment #7)
> 1) add RECOVERY_EXTERNAL_STORAGE to device/lge/hammerhead/hammerhead.mk

+1

> 2) fork bootable/recovery branch 5.1.0_r1 from CAF, patch install.c with the
> two functions: "set_perm" and "set_perm_recursive". Then modify manifest.xml
> for nexus-5-l using the new revision.

From what I can tell the edify script generator already supports using set_metadata and set_metadata_recursive, you have to add the 'use_set_metadata=1' flag to the META/misc_info.txt for it to work though.
Flags: needinfo?(gsvelto)
Using my proposed solution works to adopt FOTA. But after FOTA, b2g can't start. I use ./build.sh gecko-update-fota-full to generate a full fota package.
It's strange because I can push update.zip and write recovery command as FOTA does, b2g works fine.

No matter download or manually FOTA, the FOTA log is the same, with unmount data partition fail, device busy.

:kli and I did some experiments and we found two ways to recovery from the status:
1. reboot to recovery mode, use vol/power button, reboot to system, then b2g starts.
2. flash clear data or cache partition by fastboot and reboot, then b2g starts

Still trying what can help.
I think :gsvelto said 'use_set_metadata=1' flag to META/misc_info.txt doesn't works for "./build.sh gecko-fota-update-full". Let updater-binary support set_perm should be better?
More experiments on FOTA pcakage
1. Remove all mount/extract/delete command in updater-script, do FOTA->b2g can't start
2. Download and extract to update.zip, don't write command and reboot to recovery. Manually reboot to system or reboot to recovery then to system->b2g starts
3. Download and extract to update.zip, don't write any command but auto reboot to recovery->b2g starts
4. Change EXTERNAL_STORAGE to /cache, but RECOVERY_EXTERNAL_STORAGE in /data/media/0, make a fail update->b2g can't start
5. Change EXTERNAL_STORAGE and RECOVERY_EXTERNAL_STORAGE to /cache and do FOTA->b2g can't start
Some more experiments, when b2g can't start, move /cache/recovery to /cache/recovery2, then b2g starts.
Not sure why but may be I can check bootable/recovery and see something related with /cache/recovery
Only /cache/recovery/last_install will cause b2g can't start. After change recovery.cpp not copy /tmp/last_install to /cache/recovery/last_install, the FOTA works fine and b2g can start. I think the last_install format should be fine but need to figure out why can't start b2g.
Eric, can you be more precise for what happens when you say "b2g cannot start"?

Does the system not even boot?
Does it boots but the process /system/b2g/b2g never gets created?
Does /system/b2g/b2g gets executed but stalls at some point?
Flags: needinfo?(etsai)
Hi Alexandre, in my case b2g starts but crash. Some watchdog will start b2g again. I can use b2g-ps see the PID increase. Screen stay in Google logo, doesn't change to b2g animation.
When I remove (or rename) last_install, b2g starts and I can boot to home screen
I have a back trace log but I can't find the root cause.
Flags: needinfo?(etsai)
Fix RECOVERY_EXTERNAL_STORAGE for nexus 5
Add set_perm for backward compatible for gecko-update-fota-full which will not use set_metadata
Hi Alexandre, the two patches can helps to duplicate my current status.
That's strange, I cannot find any trace of playing with last_install in Gecko
Hm are you sure Gecko is at fault? It's only librecovery and recovery itself that manipulates this last_install file. And you did hack recovery ...
Eric, last_install is being read by librecovery in |getFotaUpdateStatus()| and this is being used by Gecko https://dxr.mozilla.org/mozilla-central/search?q=getFotaUpdateStatus&case=true

Maybe it's this code that is choking ?
Flags: needinfo?(etsai)
I can try this later. But in my experience, I didn't see any related log from logcat.
Flags: needinfo?(etsai)
Alexandre, I did some experiments and find

https://dxr.mozilla.org/mozilla-central/source/b2g/components/RecoveryService.js#145

Accessing cStatus.result cause b2g crash, any idea?
No idea, at all.
(In reply to Eric Tsai from comment #25)
> Alexandre, I did some experiments and find
> 
> https://dxr.mozilla.org/mozilla-central/source/b2g/components/
> RecoveryService.js#145
> 
> Accessing cStatus.result cause b2g crash, any idea?

Vivien suggested maybe we have simply a discrepency with the ctypes ?
Vivien, may I know what's "simply a discrepency with the ctypes" means? I don't have any idea about this.

I print the status object/structure address in 1) RecoveryService.js, after create cStatus, 2) librecovery.c, before leave getFotaUpdateStatus and 3) RecoveryService.js, after calling getFotaUpdateStatus.
1 and 2 are the same address, but 3 is different. Is this why b2g crashes?

> I/Gecko   (  736): -*- RecoveryService: before getFotaUpdateStatus, cStatus.address: FotaUpdateStatus.ptr(ctypes.UInt64("0xb1e4ee10"))
> W/librecovery(  736): before leave getFotaUpdateStatus, status address: 0xb1e4ee10
> I/Gecko   (  736): -*- RecoveryService: after getFotaUpdateStatus, cStatus.address: FotaUpdateStatus.ptr(ctypes.UInt64("0x656d2f61"))
Flags: needinfo?(21)
(In reply to Eric Tsai from comment #28)
> Vivien, may I know what's "simply a discrepency with the ctypes" means? I
> don't have any idea about this.
> 
> I print the status object/structure address in 1) RecoveryService.js, after
> create cStatus, 2) librecovery.c, before leave getFotaUpdateStatus and 3)
> RecoveryService.js, after calling getFotaUpdateStatus.
> 1 and 2 are the same address, but 3 is different. Is this why b2g crashes?
> 
> > I/Gecko   (  736): -*- RecoveryService: before getFotaUpdateStatus, cStatus.address: FotaUpdateStatus.ptr(ctypes.UInt64("0xb1e4ee10"))
> > W/librecovery(  736): before leave getFotaUpdateStatus, status address: 0xb1e4ee10
> > I/Gecko   (  736): -*- RecoveryService: after getFotaUpdateStatus, cStatus.address: FotaUpdateStatus.ptr(ctypes.UInt64("0x656d2f61"))

Eric, what Vivien said is that maybe the ctypes binding are not in sync with your current librecovery's prototype.
Flags: needinfo?(21)
Looks the same in RecoveryService.js
>  let FotaUpdateStatus = new ctypes.StructType("FotaUpdateStatus", [
>                                                { result: ctypes.int },
>                                                { updatePath: ctypes.char.ptr }
>                                              ]);

and librecovery.h
> typedef enum {
>   FOTA_UPDATE_UNKNOWN = 0,
>   FOTA_UPDATE_FAIL = 1,
>   FOTA_UPDATE_SUCCESS = 2
> } FotaUpdateResult;
> 
> typedef struct {
>   FotaUpdateResult result;
>   // The path to the update that was installed
>   char updatePath[PATH_MAX];
> } FotaUpdateStatus;
Eric, could you give a try to this patch on v2.2 branch ? I suspected maybe
there was something wrong with ctypes itself so we checked with :nbp but while
verifying he suspecting the line making use of FotaUpdateStatus struct was quite
strange, so we dig up into Gecko for other uses and it showed that all others
were declaring the struct like it is done right now, but we using it in another
way:
 - either accessing directly the variable declared as ctypes.StructType()
   without any use of '()', 'new' and playing around with '.ptr'
 - or accessing the variable declared as ctypes.StructTypes() with '()', a 'new'
   statement and using '.address()'

This patch should make us in the second case properly, by adding the 'new'
statement. I have no idea why this would not have been caught previously :(
Flags: needinfo?(etsai)
Alexandre, patch from attachment 8621539 [details] [diff] [review] doesn't help :( How could I try the first case?
Flags: needinfo?(etsai)
I run the same code on flame-2.2, cStatus.address() is the same before/after getFotaUpdateStatus. I think if we can figure out why address change on nexus 5, this bug can be solved.
Eric, can you try attachment 8622474 [details] [diff] [review] ?
Flags: needinfo?(etsai)
attachment 8622474 [details] [diff] [review] doesn't work for me and will get the following gecko error:

W/GeckoConsole(  196): [JavaScript Error: "TypeError: expected type pointer, got ctypes.StructType("FotaUpdateStatus", [{ "result": ctypes.int }, { "updatePath": ctypes.char.ptr }]).ptr" {file: "jar:file:///system/b2g/omni.ja!/components/RecoveryService.js" line: 144}]
W/GeckoConsole(  196): [JavaScript Error: "NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "expected type pointer, got ctypes.StructType("FotaUpdateStatus", [{ "result": ctypes.int }, { "updatePath": ctypes.char.ptr }]).ptr" {file: "jar:file:///system/b2g/omni.ja!/components/RecoveryService.js" line: 146}]'[JavaScript Error: "expected type pointer, got ctypes.StructType("FotaUpdateStatus", [{ "result": ctypes.int }, { "updatePath": ctypes.char.ptr }]).ptr" {file: "jar:file:///system/b2g/omni.ja!/components/RecoveryService.js" line: 144}]' when calling method: [nsIRecoveryService::getFotaUpdateStatus]" {file: "jar:file:///system/b2g/omni.ja!/components/nsUpdateService.js" line: 1848}]
W/GeckoConsole(  196): [JavaScript Error: "NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "expected type pointer, got ctypes.StructType("FotaUpdateStatus", [{ "result": ctypes.int }, { "updatePath": ctypes.char.ptr }]).ptr" {file: "jar:file:///system/b2g/omni.ja!/components/RecoveryService.js" line: 146}]'[JavaScript Error: "expected type pointer, got ctypes.StructType("FotaUpdateStatus", [{ "result": ctypes.int }, { "updatePath": ctypes.char.ptr }]).ptr" {file: "jar:file:///system/b2g/omni.ja!/components/RecoveryService.js" line: 144}]' when calling method: [nsIRecoveryService::getFotaUpdateStatus]" {file: "jar:file:///system/b2g/omni.ja!/components/nsUpdateService.js" line: 1848}]
Flags: needinfo?(etsai)
Bobby, could you help me to figure out why cStatus.address() changes after:
https://dxr.mozilla.org/mozilla-central/source/b2g/components/RecoveryService.js#144

I did some experiments and find on my Nexus 5, the status structure's address changes then cause b2g crashes. But this will not happen on flame. RecoveryService.js and librecovery.c are the same between nexus 5 and flame's code base.
Flags: needinfo?(bobbyholley)
cStatus is |new ctypes.StructType("FotaUpdateStatus", ...)| right? That means that it represents a type, not data. You can't take the address of a type. ;-)

You should really rename cStatus to "cStatus_t", and then do "cStatus = new cStatus_t(...)", and take .address() of that.
Flags: needinfo?(bobbyholley)
Boddy, I think FotaUpdateStatus represents a type now, from:
https://dxr.mozilla.org/mozilla-central/source/b2g/components/RecoveryService.js#29
I guess Alexandre's attachment 8621539 [details] [diff] [review] works the same as you said? I've tried this but still get two different addresses, b2g crashes when access cStatus.result.
Flags: needinfo?(bobbyholley)
Looks like the RecoveryService.js jsctype structure doesn't match with librecovery stucture. I fix updatePath with:
 let FotaUpdateStatus = new ctypes.StructType("FotaUpdateStatus", [
                                                { result: ctypes.int },
                                                { updatePath: ctypes.char.array(4096) }
                                              ]);

then everything works good now.
Flags: needinfo?(bobbyholley)
Attachment #8617855 - Attachment is obsolete: true
Change FotaUpdateStatus.updatePath from ctypes.char.ptr to ctyps.char.array(4096), sync with librecovery.h
Attachment #8630335 - Flags: review?(gsvelto)
Attachment #8629267 - Flags: review?(gsvelto)
Attachment #8628672 - Flags: review?(gsvelto)
Eric, did you had time to check on other devices that this was not regressing?
Flags: needinfo?(etsai)
Flags: needinfo?(taz)
I've tested attachment 8630335 [details] [diff] [review] on flame-kk at branch 2.2 only, it works fine so I think will not cause regression. 
As I mentioned in Comment 39, I don't know why original code works on flame but crashes nexus 5. Maybe Gerry can helps to verify attachment 8630335 [details] [diff] [review] for more devices?
Flags: needinfo?(etsai) → needinfo?(gchang)
Comment on attachment 8630335 [details] [diff] [review]
0001-Bug-1163956-Modify-updatePath-to-fixed-char-array.patch

LGTM but it be worth mentioning in a comment that this is due to librecovery.h using PATH_MAX as the size for that particular field so that someone reading the code doesn't wonder where that number is coming from.
Attachment #8630335 - Flags: review?(gsvelto) → review+
Comment on attachment 8628672 [details] [review]
Add RECOVERY_EXTERNAL_STORAGE

LGTM.
Attachment #8628672 - Flags: review?(gsvelto) → review+
Comment on attachment 8629267 [details] [review]
Support USE_SET_METADATA for build fota package

LGTM, I've just added a nit about commenting this change a little more. Our scripts are already fairly obscure and I think we should leave small comments in our code to explain why we're doing certain things otherwise it will be hard for future readers to figure out why we did.
Attachment #8629267 - Flags: review?(gsvelto) → review+
Update with comment for changes to char.array, and where the size from
Attachment #8630335 - Attachment is obsolete: true
(In reply to Eric Tsai from comment #51)
> Created attachment 8630924 [details] [diff] [review]
> [final]0001-Bug-1163956-Modify-updatePath-to-fixed-char-array-v2.patch,
> r=gsvelto
> 
> Update with comment for changes to char.array, and where the size from

Just my 2 cents, maybe to avoid potential issues we should reuse the PATH_MAX define rather than hardcoding 4096 ? Long shot, but still ...
Agree, my original idea is to reuse the PATH_MAX define from limits.h. But I don't have proper way so finally hardcoding 4096.
(In reply to Eric Tsai from comment #53)
> Agree, my original idea is to reuse the PATH_MAX define from limits.h. But I
> don't have proper way so finally hardcoding 4096.

Yeah that would be best but I don't know of any reasonable way to pull a constant such as PATH_MAX defined in a header into JavaScript :-( Maybe we can put it in a global somewhere and read that using jsctypes but that feels like overkill for this patch. We can open a bug for that though, it might be a good first bug (low priority, high reward in terms of learning how our stuff works).
Depends on: 1181492
Flags: needinfo?(gchang)
https://hg.mozilla.org/mozilla-central/rev/2c678f1b4207
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S2 (10Jul)
Flags: needinfo?(taz)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: