Closed Bug 1234898 Opened 6 years ago Closed 6 years ago

Fullimg mar off of taskcluster fails to build appropriately.

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: nhirata, Unassigned)

Details

(Keywords: qablocker, Whiteboard: [dogfood-blocker])

Attachments

(3 files)

Attached file recovery log
1. Update the device with: https://queue.taskcluster.net/v1/task/OnnzQ-rTRLubS0MFR-ECYA/runs/0/artifacts/private/build/fota-aries-update-fullimg.mar

Expected: device updates and boots
Actual: device doesn't update.

logcat: 
W/linker  (  976): could not load library "/system/b2g/libmozglue.so" from LD_PRELOAD for "getprop"; caused by library "/system/b2g/libmozglue.so" not found
E/Diag_Lib(  359): qmi_init:  Not initialized, calling process init sequence
E/Diag_Lib(  359): Setting internal use port to rmnet0
W/linker  ( 1028): could not load library "/system/b2g/libmozglue.so" from LD_PRELOAD for "getprop"; caused by library "/system/b2g/libmozglue.so" not found
D/charger_monitor(  506): init target 500000
W/linker  ( 1035): could not load library "/system/b2g/libmozglue.so" from LD_PRELOAD for "getprop"; caused by library "/system/b2g/libmozglue.so" not found


Note: this works fine when building locally, so something appears to be wrong with the Taskcluster image or build system.
Unpacking the mar file and sideloading the resulting update.zip also fails to install.
Whiteboard: [dogfood-blocker]
Alexandre Lissy and I had pre-investigated the issue, and the issue lies in the Taskcluster builds.
Note: the full zip file is fine when flashing the device, and the OTA mar file is fine too. Something is going wrong in the zip step when producing fota-aries-update-fullimg.mar.

Alexandre told me that the update.zip inside the mar is built from the files for aries.zip.
The mar itself unpacks without error, so I think the main problem lies in the zipping of the update.zip file on Taskcluster.

It works fine locally.
Flags: needinfo?(jopsen)
What's even more odd is that today's build : 
https://tools.taskcluster.net/task-inspector/#W4AHfVjAT0WHdZied1KckA/0
does unmar and sideload.

My guess is that there is some sort of race condition that the build hits every now and then. I'm not sure I understand the difference between the two builds yet.
[ to note, we have a dogfood blocker in today's build so we can't use it ]
I think there's a possibility that the cache might be corrupted.
I built without cache on taskcluster and it seemed to work fine as well.

I'm not sure how to expose this.
Note that the logcat output above can also appear when the update server is misconfigured for a FOTA build.
Having said this, I double-checked and the issue still occurs with this build.
https://tools.taskcluster.net/task-inspector/#HMZj1bt9Sz2y5WqaQmzYow/1 was kicked off with a build.sh -j1 for troubleshooting.
A build made via using mach shows this issue as well : 
https://tools.taskcluster.net/task-inspector/#TakfjYt8RtizBxrjKzdkVw/

Made a new dogfood build using j1 to see if this works:
https://tools.taskcluster.net/task-inspector/#CHzIWbKVTSmtRkfJ9cjXKA/
Even with a j1 the update.zip fails.  It seems like the failure occurs in the packaging of the zip file as the mar file unpacks just fine.
I added a sleep 10 in two places and it seems to work for one try.
Need to do further testing to see if this resolves the issue.
> I added a sleep 10 in two places and it seems to work for one try.
> Need to do further testing to see if this resolves the issue.

Sounds like a race condition in a build system I'm completely unfamiliar with.
Flags: needinfo?(jopsen)
May I suggest we also try adding a sha1sum call on the resulting zip and MAR files, so we can compare the results at the end?
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #8)
> A build made via using mach shows this issue as well : 
> https://tools.taskcluster.net/task-inspector/#TakfjYt8RtizBxrjKzdkVw/
> 
> Made a new dogfood build using j1 to see if this works:
> https://tools.taskcluster.net/task-inspector/#CHzIWbKVTSmtRkfJ9cjXKA/

I took the fota-aries-update-fullimg.mar from your first link and ran mar.py -x -j on it; I get a readable update.zip.

Then adb sideload this.

And then. Well, it works.
In other words, I am starting to doubt there is a bug in the build system itself. On the other hand, I got at least one report from Julien of a failing update. He investigated and found that the update.zip on the device was broken. He then cleaned up and redownloaded the MAR, and it worked.

I also got a report from Ben, but we could not get logs to confirm. However, I am becoming convinced he hit the same condition, since the update applied properly after three retries.
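To confirm a report like this without device logs, the extracted update.zip could be checked for corruption before it is applied. The following is a rough sketch using Python's standard zipfile module; check_zip is a hypothetical helper, not part of mar.py or the updater.

```python
import zipfile


def check_zip(path):
    """Return (ok, detail) for a zip file, based on a CRC check of every member."""
    if not zipfile.is_zipfile(path):
        return False, "not a zip archive"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the first bad name, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as exc:
        return False, "bad zip: %s" % exc
    if bad_member is not None:
        return False, "CRC or header error in %s" % bad_member
    return True, "all members passed the CRC check"
```

A file that fails here after download but passes on the build machine would point at transfer or extraction rather than at the build.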
Flags: needinfo?(felash)
Flags: needinfo?(bfrancis)
Ok, adb sideload works, it starts to flash partition and then it fails with: "E:Error in /tmp/update.zip (Status 0)" during flashing of system partition.
(In reply to Alexandre LISSY :gerard-majax from comment #15)
> Ok, adb sideload works, it starts to flash partition and then it fails with:
> "E:Error in /tmp/update.zip (Status 0)" during flashing of system partition.

Now that is fun, because the device does boot with the new system, but the homescreen is missing.

Could it be a resurgence of the Gaia profile-building race condition?
Homescreen is running but empty
(In reply to Alexandre LISSY :gerard-majax from comment #17)
> Homescreen is running but empty

That might be because of back/forth with new/old build:
> 01-09 15:14:40.489  1147  1147 E Default Home Screen: [JavaScript Error: "NS_ERROR_STORAGE_CONSTRAINT: " {file: "app://homescreen.gaiamobile.org/gaia_build_defer_index.js" line: 800}]
Sorry, I don't have any more logs.

I also don't have the failed update.log anymore (I guess it came from applying update.mar), but I remember there was something about an error while building the zip.
Flags: needinfo?(felash)
Maybe the race is triggered by the multiple forced dependencies on system.img that we have in gonk-misc/Android.mk, which trigger a rebuild of Gecko and Gaia?

This was made in the early days to be extra cautious.
Attached file Gonk-misc PR
Naoki, that would remove every dependency on system.img, which avoids rebuilding that partition and Gecko/Gaia. Maybe this is the root cause of that race?
Flags: needinfo?(nhirata.bugzilla)
I doubt this is the issue I had: likely I downloaded the same .mar in both my failed and successful attempts, only how it was extracted differed... That means this isn't a build issue IMO.
I hit the same issue as you when dealing with this: sideloading a file that was manually extracted from the mar with -x -j finishes with an error at the end and shows the error banner. It doesn't seem like anything in the install actually failed. At the same time, an upgrade that is not clean will freak end users out and is a bad user experience.

It's almost as if something like a misplaced EOF after zipping or marring is causing the error to be thrown. That's just a guess.

I'm not sure why the sleeps seem to help...
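One way to test the misplaced-EOF guess would be to check whether the update.zip has extra bytes after the end of the zip structure, or is cut short. The sketch below is only an assumption about how that could be checked, and trailing_bytes_after_zip is a made-up name. It finds the last end-of-central-directory record (signature PK\x05\x06, 22 fixed bytes plus an optional comment) and compares where the archive should end with the real file size. It reads the whole file into memory and trusts the last signature match, which is a simplification.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record marker
EOCD_FIXED_SIZE = 22            # size of the record without its comment


def trailing_bytes_after_zip(path):
    """Return the number of bytes found after the logical end of a zip file.

    0 means the file ends exactly where the zip structure says it should,
    a positive value means extra trailing data, and a negative value means
    the file is shorter than the record claims (likely truncated).
    """
    with open(path, "rb") as handle:
        data = handle.read()
    position = data.rfind(EOCD_SIGNATURE)
    if position < 0 or position + EOCD_FIXED_SIZE > len(data):
        raise ValueError("no complete end-of-central-directory record found")
    # The comment length is the last 2-byte little-endian field of the record.
    comment_length = struct.unpack_from("<H", data, position + 20)[0]
    expected_end = position + EOCD_FIXED_SIZE + comment_length
    return len(data) - expected_end
```

A non-zero result on a Taskcluster-built update.zip, with zero on a locally built one, would support the guess. A zero on both would suggest looking elsewhere.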
Flags: needinfo?(nhirata.bugzilla)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Flags: needinfo?(bfrancis)