[Buri] OTA updates fails on master/m-c

RESOLVED WONTFIX

Status

RESOLVED WONTFIX
5 years ago
4 years ago

People

(Reporter: tchung, Unassigned)

Tracking

({regression})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: burirun1)

Attachments

(3 attachments)

(Reporter)

Description

5 years ago
Created attachment 793164 [details]
b2g_noOTA_log.txt

I tried forcing an OTA update on a 1.2 master nightly, and it failed to apply one. 

Full logcat attached.

logcat snippet:
---------------
08-20 15:27:55.839: D/HWComposer(140): Render aborted. Nothing was drawn to the screen
08-20 15:27:55.919: I/GeckoUpdater(1666): Progress [ =================================================  ]
08-20 15:27:55.939: I/GeckoUpdater(1666): Progress [ ================================================== ]
08-20 15:27:55.939: I/GeckoUpdater(1666): Finished applying update
08-20 15:27:55.969: D/HWComposer(140): ImageLayer Layer doesn't have a gralloc buffer
08-20 15:27:55.969: D/HWComposer(140): Render aborted. Nothing was drawn to the screen
08-20 15:27:56.069: I/GonkAutoMounter(1666): Mounted /system partition as read-only
08-20 15:27:56.079: I/Gecko(140): *** AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
08-20 15:27:56.089: E/GeckoConsole(140): AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
08-20 15:27:56.089: I/Gecko(140): UpdatePrompt: Update error, state: failed, errorCode: 7
08-20 15:27:56.109: I/Gecko(140): UpdatePrompt: Setting gecko.updateStatus: Install Pending
08-20 15:27:56.139: I/Gecko(140): *** AUS:SVC UpdateManager:refreshUpdateStatus - Notifying observers that the update was staged. state: pending, status: failed: 7
08-20 15:27:56.139: E/GeckoConsole(140): AUS:SVC UpdateManager:refreshUpdateStatus - Notifying observers that the update was staged. state: pending, status: failed: 7

Space on Data Partition:
----------------
root@android:/ # df data
Filesystem             Size   Used   Free   Blksize
data                   161M     3M   157M   4096

Repro:
1) install 1.2 hamachi nightly build (1 day old)
** build 20130819040203
2) force check for updates 
** 08-20 15:41:37.949: E/GeckoConsole(293): AUS:SVC Checker:checkForUpdates - sending request to: http://update.boot2gecko.org/hamachi/1.2.0/nightly/update.xml?force=1
4) verify update returns:
<updates><update type="minor" appVersion="26.0a1" version="26.0a1" extensionVersion="26.0a1" buildID="20130820040204" licenseURL="http://www.mozilla.com/test/sample-eula.html" detailsURL="http://www.mozilla.com/test/sample-details.html"><patch type="complete" URL="http://update.boot2gecko.org/hamachi/1.2.0/nightly/b2g_update_20130820040204.mar?build_id=20130820040204&version=26.0a1" hashFunction="SHA512" hashValue="7b36cd72fe7f49b680e17ea64ca662ce6223e03276c787d5a72649cec6f14b4d633c2ad0abc79dc91d8acb847a9495e293d47ae4984262725f3371b5b2b6f0bb" size="46073387"/></update></updates>

5) drag down notification and apply it
6) Verify update downloads, and fails to apply (UI says apply update error)

Expected:
- update successful
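
For anyone who wants to rule out a corrupt download before blaming the apply step, the `<patch>` element above carries enough information (hashFunction, hashValue, size) to check the downloaded MAR by hand. A minimal sketch, assuming the MAR has already been pulled off the device into memory; `verify_patch` is a made-up helper name, not part of the updater:

```python
import hashlib


def verify_patch(data: bytes, hash_function: str, hash_value: str, size: int) -> bool:
    """Return True if data matches the size and digest advertised in <patch>."""
    # A size mismatch is the cheapest check, so do it first.
    if len(data) != size:
        return False
    # hashFunction is given as "SHA512" in update.xml; hashlib wants lowercase.
    digest = hashlib.new(hash_function.lower(), data).hexdigest()
    return digest == hash_value.lower()
```

With the real file, `data` would be the bytes of the downloaded .mar, and the other three arguments would be copied from the update.xml response.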
Unfortunately, there is so much spew of these messages:

08-20 15:26:24.419: D/HWComposer(140): Layer has a 3D transform or a non-square angle rotation
08-20 15:26:24.419: D/HWComposer(140): Render aborted. Nothing was drawn to the screen

that the beginning of the update isn't even in the log.

I'll see if I can reproduce locally.
OTA updates are part of daily smoketests, so adding the keyword here.
Blocks: 884399
Keywords: smoketest
(Reporter)

Comment 3

5 years ago
(In reply to Dave Hylands [:dhylands] from comment #1)
> Unfortunately, there is so much spew of these messages:
> 
> 08-20 15:26:24.419: D/HWComposer(140): Layer has a 3D transform or a
> non-square angle rotation
> 08-20 15:26:24.419: D/HWComposer(140): Render aborted. Nothing was drawn to
> the screen
> 
> that the beginning of the update isn't even in the log.
> 
> I'll see if I can reproduce locally.

I'll try to reproduce again, and trim the log. Sorry for the noise.
Created attachment 793234 [details]
/data/local/updates/0/update.log

Tip for reducing the log spew:

adb logcat HWComposer:S QCALOG:S Diag_Lib:S ONCRPC:S

will produce a much nicer logcat, although there is so much spew that if you run it after the fact you won't get much output (I run to my terminal window and copy/paste from that).
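
If the logcat has already been saved to a file, the same filtering can be done after the fact. This is only a sketch: the regular expression assumes the "MM-DD HH:MM:SS.mmm: P/Tag(pid): message" layout seen in the snippets in this bug, and `drop_noisy` is a hypothetical helper name.

```python
import re

# Tags silenced by the adb logcat tip above.
NOISY_TAGS = {"HWComposer", "QCALOG", "Diag_Lib", "ONCRPC"}

# Assumed line layout: "08-20 15:27:55.839: D/HWComposer(140): message"
LINE_RE = re.compile(r"^\d\d-\d\d \d\d:\d\d:\d\d\.\d+: [VDIWEF]/([^(]+)\(\s*\d+\): ")


def drop_noisy(lines, noisy=NOISY_TAGS):
    """Return the lines whose tag is not in the noisy set."""
    kept = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1).strip() in noisy:
            continue  # skip spew from a silenced tag
        kept.append(line)  # keep everything else, including unparsed lines
    return kept
```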

The update.log shows the following error:

EXECUTE ADD libnss3.so
### execution failed

which is what is causing the WRITE_ERROR (status of 7).

Now to figure out why extracting libnss3.so is failing.
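
For scripting around this, the status line reported in the logcat ("failed: 7") is easy to split into a state and a numeric code. A minimal sketch, assuming update.status holds a single line of that form; the only code-to-name mapping included is the one mentioned in this bug (7 is WRITE_ERROR), and the helper names are made up:

```python
# Only the mapping confirmed in this bug; other codes are deliberately left out.
KNOWN_CODES = {7: "WRITE_ERROR"}


def parse_update_status(text):
    """Split a status line such as 'failed: 7' into (state, code or None)."""
    state, sep, code = text.strip().partition(":")
    if sep and code.strip():
        return state.strip(), int(code)
    return state.strip(), None


def describe(text):
    """Return a short human-readable form of a status line."""
    state, code = parse_update_status(text)
    if code is None:
        return state
    return "%s (%s)" % (state, KNOWN_CODES.get(code, "code %d" % code))
```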
It appears that /system runs out of space while trying to apply the update.

Obviously we need a better error message.

So, fundamentally, the /system partition shows:

1|root@android:/system/b2g # df /system                                        
Filesystem             Size   Used   Free   Blksize
/system                200M   143M    56M   4096

and the compressed mar file is 46 Mb.

I'd guess that TCL only designed their /system partition for FOTA updates and not our update mechanism, which requires considerably more free space.
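
To make this kind of check repeatable, the `df` output above can be parsed rather than eyeballed. A rough sketch, assuming the two-line layout shown above and sizes with an "M" suffix; both function names are hypothetical:

```python
def parse_df(output):
    """Parse df output (header line plus one row per filesystem) into dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        # Key each row by filesystem name, e.g. "/system".
        rows[fields[0]] = dict(zip(header[1:], fields[1:]))
    return rows


def megabytes(value):
    """Convert a size such as '56M' to an integer number of megabytes."""
    if not value.endswith("M"):
        raise ValueError("unexpected size suffix: %r" % value)
    return int(value[:-1])
```

Note that comparing the free space against the compressed mar size (46 Mb) only gives a lower bound: 56M is more than 46 Mb, and the update still fails, because the files have to be expanded onto /system.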
No longer a smoketest regression then - sounds like the Buri devices are designed for FOTAs, not OTAs.
No longer blocks: 884399
Keywords: smoketest
(Reporter)

Comment 7

5 years ago
FOTA could include OTA system updates, so this is still a legitimate smoketest blocker. And it's highly likely that TCL will take a 1.2 FOTA. Let's leave it out there unless there's nothing we can do about it.
Blocks: 884399
Keywords: smoketest
I don't think that there's any way to make a Full OTA work, unless the size of the OTA can be reduced.

You can definitely do a FOTA (which doesn't need the free space in /system).
And you can definitely do incremental OTA (which needs less space in /system).

And we really need to detect this and produce better error messages.

Updated

5 years ago
Duplicate of this bug: 907788
Unless we're decommissioning the Buri (we're not), QA still needs to be able to apply OTA updates.

dhylands, can you investigate where the new sizing came from and recommend a short term fix?
Assignee: nobody → dhylands
blocking-b2g: koi? → koi+
The 1.1 image from TCL has 52 Mb of free space (I flashed US_20130815.cfg using the TeleWebExt tool)

After flashing https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/nightly/mozilla-central-hamachi/2013/08/2013-08-19-04-02-03/
using QA's Fullflash gecko ril gaia.sh.zip from https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Scripts then there
is 56 Mb of free space

The uncompressed mar file is about 59 Mb.

So, the 1.2 image seems to actually be smaller than the 1.1 image.

Since the system partition wasn't sized for OTA, the fact that we've had OTAs that have worked in the past has been blind luck.

The "fix" is that we should be using FOTA updates since that's what this phone was designed to use.
Assignee: dhylands → nobody
(Reporter)

Comment 12

5 years ago
Are you saying that, for internal testing purposes, we need to set up FOTA in house and replace our current OTA structure on our nightlies?
blocking-b2g: koi+ → koi?
More or less - yes.

If you could find a way to trim a few Mb from the image, then you could get away with using regular OTA, until the image grows again.

But expecting OTA to work on a phone where one of the reasons that the vendor chose to use FOTA was because there wasn't enough space to use OTA, seems like we're just asking for these types of issues.

Setting up FOTA is really just a case of running a different script. The server portion that serves up the update.xml/update.mar file stays the same.

To build the full OTA that we use, I normally do:

./build.sh gecko-update-full

To build a full FOTA update, I have a script called fota-create.sh which contains the following (for my unagi, some of the fields will be different for buri):

#!/bin/bash
SCRIPT_NAME=$(basename $0)
. load-config.sh
mkdir -p fota
tools/update-tools/build-flash-fota.py \
    --system-dir out/target/product/${DEVICE}/system \
    --system-fs-type ext4 \
    --system-location /dev/block/mmcblk0p19 \
    --data-fs-type ext4 \
    --data-location /dev/block/mmcblk0p22 \
    --output fota/update.zip
tools/update-tools/build-fota-mar.py \
    --output fota/update.mar fota/update.zip

And that creates a FOTA version of update.mar. The update.xml file has an extra field to indicate that it's a FOTA update instead of a regular OTA.
(Reporter)

Comment 14

5 years ago
askeing, can you investigate whether this is something you can set up on TW's infrastructure? We'll want to do all the same testing we did for the 1.0.1 -> 1.1 work.
Flags: needinfo?(fyen)
Hi Tony,

I tried the following steps, and the FOTA update failed.
The device reboots and then pops up the message "There was an error while downloading the updates".

1. List the mount info of buri. (Thanks SChien's help)
  $ adb shell cat /proc/mounts
  rootfs / rootfs ro,relatime 0 0
  tmpfs /dev tmpfs rw,nosuid,relatime,mode=755 0 0
  devpts /dev/pts devpts rw,relatime,mode=600 0 0
  proc /proc proc rw,relatime 0 0
  sysfs /sys sysfs rw,relatime 0 0
  none /acct cgroup rw,relatime,cpuacct 0 0
  tmpfs /mnt/asec tmpfs rw,relatime,mode=755,gid=1000 0 0
  tmpfs /mnt/obb tmpfs rw,relatime,mode=755,gid=1000 0 0
  none /dev/cpuctl cgroup rw,relatime,cpu 0 0
  /dev/block/mtdblock1 /system yaffs2 ro,relatime 0 0
  /dev/block/mtdblock5 /data yaffs2 rw,nosuid,nodev,relatime 0 0
  /dev/block/mtdblock6 /persist yaffs2 rw,nosuid,nodev,relatime 0 0
  /dev/block/mtdblock2 /cache yaffs2 rw,nosuid,nodev,relatime 0 0
  /dev/block/vold/179:1 /mnt/sdcard vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0707,dmask=0707,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
  /dev/block/vold/179:1 /mnt/secure/asec vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0707,dmask=0707,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
  tmpfs /mnt/sdcard/.android_secure tmpfs ro,relatime,size=0k,mode=000 0 0

2. Run following script to generate the fota package.
=====
#!/bin/bash
SCRIPT_NAME=$(basename $0)
. load-config.sh
mkdir -p fota
tools/update-tools/build-flash-fota.py \
    --system-dir out/target/product/${DEVICE}/system \
    --system-fs-type yaffs2 \
    --system-location /dev/block/mtdblock1 \
    --data-fs-type yaffs2 \
    --data-location /dev/block/mtdblock5 \
    --output fota/update.zip
tools/update-tools/build-fota-mar.py \
    --output fota/update.mar fota/update.zip
=====
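
The filesystem type and device arguments in the script above were read off the /proc/mounts listing by hand. As a sketch of how that lookup could be automated (the helper names are made up; the four option names are the ones already used with build-flash-fota.py above):

```python
def find_mount(mounts_text, mount_point):
    """Return (device, fs_type) for mount_point from /proc/mounts text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point:
            return fields[0], fields[2]
    return None


def fota_partition_args(mounts_text):
    """Build the partition-related arguments for build-flash-fota.py."""
    args = []
    for name in ("system", "data"):
        found = find_mount(mounts_text, "/" + name)
        if found is None:
            raise ValueError("no mount entry for /" + name)
        device, fs_type = found
        args += ["--%s-fs-type" % name, fs_type, "--%s-location" % name, device]
    return args
```

The input would be the output of `adb shell cat /proc/mounts` from step 1.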

3. Generate update.xml file for fota package.
  $ ./tools/update-tools/build-update-xml.py -c fota/update.mar -o fota/update.xml -v 26 -V 26 -O -u http://people.mozilla.org/~fyen/fota/buri/1.2.0/update.mar

4. Change the update.xml URL.
  $ ./change_OTA_URL.sh -u http://people.mozilla.org/~fyen/fota/buri/1.2.0/update.xml

5. "Settings > Device Info > Check Now".

6. Download and Install the System Update.
Flags: needinfo?(fyen)
(In reply to Askeing Yen[:askeing] from comment #15)
4.5. Upload fota/update.mar and fota/update.xml to people server.

> 5. "Settings > Device Info > Check Now".
> 
> 6. Download and Install the System Update.
Dave, would you mind helping figure this out? I think it makes sense to change the included locales/dictionaries for Buri to take a couple MB off.

If you think somebody else (RelEng, l10n) is in a better position to do this work, please do help by passing the baton.
Flags: needinfo?(dhylands)
(Reporter)

Comment 18

5 years ago
(In reply to Alex Keybl [:akeybl] from comment #17)
> Dave, would you mind helping figure this out? I think it makes sense to
> change the included locales/dictionaries for Buri to take a couple MB off.
> 
> If you think somebody else (RelEng, l10n) who is in better position to do
> this work, please do help by passing the baton.

Dave, just repinging if you can help.   This is critical for QA to test if we are preserving data correctly.   Thanks.
(In reply to Tony Chung [:tchung] from comment #18)
> (In reply to Alex Keybl [:akeybl] from comment #17)
> > Dave, would you mind helping figure this out? I think it makes sense to
> > change the included locales/dictionaries for Buri to take a couple MB off.
> > 
> > If you think somebody else (RelEng, l10n) who is in better position to do
> > this work, please do help by passing the baton.
> 
> Dave, just repinging if you can help.   This is critical for QA to test if
> we are preserving data correctly.   Thanks.

So the problem isn't with the OTA process itself, which is the part that I'm familiar with. The problem is that the image we're using is too big for the space we have.

I haven't had time to investigate why the image is too big, although I suspect that it's related to adding languages/dictionaries.
Flags: needinfo?(dhylands)
So I took a look at a recent image and it looks like we're close.

When I flashed: https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/nightly/mozilla-central-hamachi/2013/09/2013-09-09-11-46-57/ there was 40M of free space.

The update expands to around 42M

The keyboard app currently takes up 24M, and upon further examination, it seems to contain the following dictionaries:

2	ca.dict
2	cs.dict
3	de.dict
3	el.dict
2	en_us.dict
2	es.dict
2	fr.dict
2	hr.dict
4	hu.dict
2	nl.dict
2	pl.dict
2	pt_br.dict
1	ro.dict
3	ru.dict
2	sk.dict
2	sr-Cyrl.dict
2	sr-Latn.dict
2	tr.dict

Pruning this list should free up enough space to allow OTA to work and give some breathing room.
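
To get a feel for how much a given pruning would save, the listing above can be totalled with a few lines of script. This is only an illustration: the keep-list below is arbitrary, not a recommendation, and a later comment in this bug points out that these figures are uncompressed megabytes, so the savings inside the keyboard's zip will be smaller.

```python
LISTING = """\
2 ca.dict
2 cs.dict
3 de.dict
3 el.dict
2 en_us.dict
2 es.dict
2 fr.dict
2 hr.dict
4 hu.dict
2 nl.dict
2 pl.dict
2 pt_br.dict
1 ro.dict
3 ru.dict
2 sk.dict
2 sr-Cyrl.dict
2 sr-Latn.dict
2 tr.dict
"""


def parse_sizes(listing):
    """Turn 'size name' lines into a {name: size} dict."""
    sizes = {}
    for line in listing.splitlines():
        if line.strip():
            size, name = line.split()
            sizes[name] = int(size)
    return sizes


def freed_by_keeping(sizes, keep):
    """Total size of every dictionary that is not in the keep set."""
    return sum(size for name, size in sizes.items() if name not in keep)
```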
Adding needinfo on djf since he probably knows how to prune the list of dictionaries that we install.
Flags: needinfo?(dflanagan)
Hmm. I got my sizes wrong.

The update expands to 74 Mb. Since we have 40M of free space, we're short by (74 - 40) = 34 Mb. We only need to trim about half of that from the image, because trimming helps twice: if the image were 17 Mb smaller, the update would expand to 74 - 17 = 57 Mb and we'd have 40 + 17 = 57M of free space.

So we need to trim at least 17M of space from the image to get things to fit.
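
The arithmetic above generalizes to any pair of numbers. A small sketch, assuming (as in the reasoning above) that every megabyte trimmed from the image both frees a megabyte on /system and shrinks the expanded update by a megabyte; `min_trim_mb` is a made-up name:

```python
import math


def min_trim_mb(expanded_mb, free_mb):
    """Smallest trim t (in Mb) such that free_mb + t >= expanded_mb - t."""
    shortfall = expanded_mb - free_mb
    # Each trimmed Mb counts twice, so only half the shortfall is needed.
    return max(0, math.ceil(shortfall / 2))
```

With the figures from this comment, `min_trim_mb(74, 40)` gives 17.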

The keyboard app still seems like the most likely candidate. The numbers next to dictionaries from comment 20 are uncompressed size in megabytes. Since these get bundled into the keyboard's zip file, it's the compressed size that matters.

Here are the sizes (in kbytes) for the uncompressed keyboard app:

8	./js/settings
16	./js/imes/jspinyin/tests
2116	./js/imes/jspinyin
24	./js/imes/jskanji/dict
344	./js/imes/jskanji/docs
448	./js/imes/jskanji
12	./js/imes/jszhuyin/tools
6868	./js/imes/jszhuyin
32400	./js/imes/latin/dictionaries
32488	./js/imes/latin
41924	./js/imes
42080	./js
52	./style/fonts
200	./style/images
280	./style
20	./resources/sounds
24	./resources
16	./shared/style_unstable/lists
24	./shared/style_unstable
40	./shared/js
52	./shared/style/headers/images/icons
24	./shared/style/headers/images/ui/dark
28	./shared/style/headers/images/ui/organic
12	./shared/style/headers/images/ui/overlay
96	./shared/style/headers/images/ui
152	./shared/style/headers/images
160	./shared/style/headers
40	./shared/style/confirm/images/ui
44	./shared/style/confirm/images
72	./shared/style/confirm
12	./shared/style/switches/images/check
12	./shared/style/switches/images/switch
12	./shared/style/switches/images/radio
40	./shared/style/switches/images
48	./shared/style/switches
308	./shared/style
376	./shared
84	./locales-obj
42992	.
Bug 908286 added a bunch of dictionaries. That was a leo+ bug and the dictionaries were listed as needed for 1.1.

Bug 884752 is a koi+ bug (but on my back burner) for creating build-time configuration options to allow carriers to easily ship with just the keyboard layouts and dictionaries that they need.

Would having this ability to configure dictionaries at build time help with this bug?  If so, I'll make it my top priority for next week.  (Ping me by email?)

It sounds like the specific needinfo question was "what dictionaries can we omit?".  And I don't have an answer to that. You'd have to ask the localization or the product people, I think.

Note that there is a potential problem here if 1.1 ships with more dictionaries than we can deliver updates to because then an OTA update would end up removing functionality from the phone.  Presumably carriers are taking care of that in their own builds, though.
Flags: needinfo?(dflanagan)
I think it would be great if the languages included with the keyboard were determined by the languages included with gaia (or a subset thereof).

Updated

5 years ago
Depends on: 884752
Created attachment 806364 [details]
cleanup.sh

I wrote a script to try and free up a bunch of space on the hamachi.

After running the attached script, I was able to get my free space on /system to go from 41M to 63M, but that isn't quite enough to allow an OTA to occur successfully.

So the next thing is to try and reduce the size of the system.img file.

Updated

5 years ago
Whiteboard: burirun1
(Reporter)

Comment 26

5 years ago
(In reply to Dave Hylands [:dhylands] from comment #25)
> Created attachment 806364 [details]
> cleanup.sh
> 
> I wrote a script to try and free up a bunch of space on the hamachi.
> 
> After running the attached script, I was able to get my free space on
> /system to go from 41M to 63M, but that isn't quite enough to allow an OTA
> to occur successfully.
> 
> So the next thing is to try and reduce the size of the system.img file.

Hi Dave, just poking in the bug to see if there's been any more recent progress.  Thanks!
(In reply to Tony Chung [:tchung] from comment #26)
> (In reply to Dave Hylands [:dhylands] from comment #25)
> > Created attachment 806364 [details]
> > cleanup.sh
> > 
> > I wrote a script to try and free up a bunch of space on the hamachi.
> > 
> > After running the attached script, I was able to get my free space on
> > /system to go from 41M to 63M, but that isn't quite enough to allow an OTA
> > to occur successfully.
> > 
> > So the next thing is to try and reduce the size of the system.img file.
> 
> Hi Dave, just poking in the bug to see if there's been any more recent
> progress.  Thanks!

Hey Tony - I think that bug 884752 is the solution. It will cut about 15 Mb from the image, which translates to needing 30 Mb less space in /system (because the base image will be 15 Mb smaller, and the incoming OTA image will be 15 Mb smaller).

Updated

5 years ago
blocking-b2g: koi? → koi+

Updated

5 years ago
No longer blocks: 884399
Awaiting fix for 884752 to land
884752 has finally landed. Sorry it took so long... No one was able to review it for a long time.
QA Wanted - this should be working with the landed dependent patch now. We need to wait though until one build after the build with bug 884399 gets generated.

On Tuesday (10/22/2013), we should try to address this QA Wanted bug by trying an OTA update from a 10/21/2013 1.2 Buri build to a 10/22/2013 1.2 Buri build.
Keywords: qawanted
(In reply to Jason Smith [:jsmith] from comment #30)
> QA Wanted - this should be working with the landed dependent patch now. We
> need to wait though until one build after the build with bug 884399 gets
> generated.
> 
> On Tuesday (10/22/2013), we should try address this QA Wanted bug by trying
> to an OTA update from a 10/21/2013 1.2 Buri build to a 10/22/2013 1.2 Buri
> build.

Looks like the patch was backed out, so we'll have to wait until it lands again to do qawanted here.
Keywords: qawanted
Bug 884752 is now in trouble with localization. So even once the failing test is fixed, it may not be able to land. I am not able to provide a realistic estimate of when those localization issues will be worked out.

If you need to get this unblocked, I suppose you could just use this bug to remove some autocorrect dictionaries from the build while waiting for a more configurable solution.
(In reply to David Flanagan [:djf] from comment #32)
> Bug 884752 is now in trouble with localization. So even once the failing
> test is fixed, it may not be able to land. I am not able to provide a
> realistic estimate of when those localization issues will be worked out.
> 
> If you need to get this unblocked, I suppose you could just use this bug to
> remove some autocorrect dictionaries from the build while waiting for a more
> configurable solution.

That sounds like the right approach. Could someone get a patch together to do that?
This is just a matter of removing dictionaries from apps/keyboard/js/imes/latin/dictionaries/ and also editing apps/keyboard/js/layout.js to remove the lines that reference those files. It should be easy for whoever produces the builds to do this and remove however many dictionaries are required to get the build size down.
(In reply to David Flanagan [:djf] from comment #34)
> This is just a matter of removing dictionaries from
> apps/keyboard/js/imes/latin/dictionaries/ and also editing
> apps/keyboard/js/layout.js to remove the lines that reference those files.
> It should be easy for whoever produces the builds to do this and remove
> however many dictionaries are required to get the build size down.

Aki - Can we address this in the build generation process on the release engineering side?
Flags: needinfo?(aki)

Comment 36

5 years ago
This should probably be a build.sh change.  We wrap this script, and going in and trying to muck with things after build.sh does everything is probably a bad idea.

This script has been written+maintained by developers.
Flags: needinfo?(aki)
Yuren or Alexandre - Can one of you guys help make the necessary build.sh changes here to get OTA update testing unblocked? See comment 34 for what needs to be changed.
Flags: needinfo?(yurenju.mozilla)
Flags: needinfo?(poirot.alex)
This should definitely be handled by the build system itself.

Having a post-processing step that munges the build output will be fragile and inevitably lead to bustage when upstream code changes the format or location of these files. Developers will have no way of knowing their change will break this post processing.

Can we add a flag of some kind to indicate that the build should strip out dictionaries?
Ok so we are talking about having a temporary and most likely hacky workaround until bug 884752 lands, right? Because It looks like the proper way to filter keyboard is about just redoing what is being done in that bug.

If that's the case, we can simply remove the dictionary files like this:
  zip -d gaia/profile/webapps/keyboard.gaiamobile.org/application.zip js/imes/latin/dictionaries/*
And for the layout.js, we could replace the one in the zip with an hardcoded striped one.

I have no idea how these OTA images are built. Is there a specific script we can patch?
TBH I'd rather make the hacky strip there, but we can also pass a flag to gaia build system in order to tell it to arbitrary strip some keyboard dictionaries. (May be OTA already pass some env variable to gaia Makefile that identify it very precisely?)
Otherwise, do we have an idea if we want to strip all dictionaries, all but english, or ...?
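
For reference, the `zip -d` idea above can also be expressed in Python, which may be easier to drop into an existing build script. This is a sketch only: it rewrites the archive in memory rather than deleting in place (the standard zipfile module has no delete operation), it does nothing about the references in layout.js, and `strip_entries` is a hypothetical helper name.

```python
import io
import zipfile

# Path prefix of the dictionaries inside the keyboard's application.zip.
DICT_PREFIX = "js/imes/latin/dictionaries/"


def strip_entries(src_bytes, prefix=DICT_PREFIX):
    """Return a new zip (as bytes) without entries whose name starts with prefix."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefix):
                continue  # drop the dictionary entry
            # Copy everything else across, keeping its original metadata.
            dst.writestr(info, src.read(info.filename))
    return out.getvalue()
```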

Updated

5 years ago
Flags: needinfo?(yurenju.mozilla)
Flags: needinfo?(poirot.alex)
(In reply to Alexandre Poirot (:ochameau) from comment #39)
> Ok so we are talking about having a temporary and most likely hacky
> workaround until bug 884752 lands, right? Because It looks like the proper
> way to filter keyboard is about just redoing what is being done in that bug.

Yup, we need a temporary workaround until bug 884752 lands.

> 
> If that's the case, we can simply remove the dictionary files like this:
>   zip -d gaia/profile/webapps/keyboard.gaiamobile.org/application.zip
> js/imes/latin/dictionaries/*
> And for the layout.js, we could replace the one in the zip with an hardcoded
> striped one.
> 
> I have no idea how these OTA images are built. Is there a specific script we
> can patch?

Dave Hylands could help answer this question.

> TBH I'd rather make the hacky strip there, but we can also pass a flag to
> gaia build system in order to tell it to arbitrary strip some keyboard
> dictionaries. (May be OTA already pass some env variable to gaia Makefile
> that identify it very precisely?)
> Otherwise, do we have an idea if we want to strip all dictionaries, all but
> english, or ...?

djf could help answer this question.
(In reply to Jason Smith [:jsmith] from comment #40)
> (In reply to Alexandre Poirot (:ochameau) from comment #39)
> > Ok so we are talking about having a temporary and most likely hacky
> > workaround until bug 884752 lands, right? Because It looks like the proper
> > way to filter keyboard is about just redoing what is being done in that bug.
> 
> Yup, we need a temporary workaround until bug 884752 lands.
> 
> > 
> > If that's the case, we can simply remove the dictionary files like this:
> >   zip -d gaia/profile/webapps/keyboard.gaiamobile.org/application.zip
> > js/imes/latin/dictionaries/*
> > And for the layout.js, we could replace the one in the zip with an hardcoded
> > striped one.
> > 
> > I have no idea how these OTA images are built. Is there a specific script we
> > can patch?
> 
> Dave Hylands could help answer this question.

I create local OTA images for testing by doing:

./build.sh gecko-update-full

I honestly have no idea how the OTA images get built for deployment onto our OTA servers. My experience is that releng does their own thing.
Naveed, this bug is about the size of the update, and none of the dev engineers can help. Could you take a look at it?
Flags: needinfo?(nihsanullah)
I replied by email, and I'll repeat my comment here.

Based on the email, I think that we want something like

  ./build.sh gecko-update-full-fota

which would build a fota image which contains exactly the same files that are contained in 

  ./build.sh gecko-update-full

(so no binary blobs. The steps described in comment #13 build a FOTA which contains binary blobs).

I took a quick look at the scripts we already have and couldn't find anything already done which would do exactly what we want (there are helper functions to do most of the work, just not the finished product).

I don't currently have time to work on this, but I'm certainly available to provide guidance.
(Reporter)

Comment 44

5 years ago
+gabriele, who I was told will assist from Engineering.
I can try and help out with this since I already worked in the past on shrinking the images but I have to study the issue in a bit more detail first.
Gabriele,

We don't really need to shrink anything if we switch from OTA to FOTA (that amounts to about a 60-70Mb savings).
(In reply to Dave Hylands [:dhylands] from comment #46)
> We don't really need to shrink anything if we switch from OTA to FOTA (that
> amounts to about a 60-70Mb savings).

From your previous comments that sounds like the right thing to do and probably a more robust solution too. I'll have a look at the scripts tomorrow and ping you if I need some help on how to proceed.
John, this bug is about update on the buri, it looks like we are trying to make OTA update on a device that has been thought for FOTA, OTA update needs more space than FOTA. This creates an issue. Could you handle this please ?
Flags: needinfo?(nihsanullah) → needinfo?(joduinn)
(Reporter)

Comment 49

5 years ago
(In reply to David Scravaglieri [:scravag] from comment #48)
> John, this bug is about update on the buri, it looks like we are trying to
> make OTA update on a device that has been thought for FOTA, OTA update needs
> more space than FOTA. This creates an issue. Could you handle this please ?

What will it take to switch the work over to start using FOTA for doing gecko/gaia updates? Dave has previously said this is possible, and possibly even recommended. If it's a huge amount of work, then we need to revisit another solution. I understand the device is partitioned incorrectly, but we still need some creative solution so QA can test updates on this device. Thanks for getting creative.
I'd guess that it would take 2-3 days (a week at the absolute most), by the time you've done all of the testing. I'm not all that familiar with the scripts, so I'm trying to be ultra conservative.
(Reporter)

Comment 51

5 years ago
(In reply to Dave Hylands [:dhylands] from comment #50)
> I'd guess that it would take 2-3 days (a week at the absolute most), by the
> time you've done all of the testing. I'm not all that familiar with the
> scripts, so I'm trying to be ultra conservative.

Gabriele, is this something you can take a shot at?   reminder, this is a koi+ bug, so we are looking for an assignee and a solution.   Thanks.
(In reply to Tony Chung [:tchung] from comment #51)
> Gabriele, is this something you can take a shot at?   reminder, this is a
> koi+ bug, so we are looking for an assignee and a solution.   Thanks.

I will have time to take a look at it this week though since I haven't touched the generation of those images for a while I cannot guarantee I will be able to come up with a solution immediately.
(Reporter)

Comment 53

5 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #52)
> (In reply to Tony Chung [:tchung] from comment #51)
> > Gabriele, is this something you can take a shot at?   reminder, this is a
> > koi+ bug, so we are looking for an assignee and a solution.   Thanks.
> 
> I will have time to take a look at it this week though since I haven't
> touched the generation of those images for a while I cannot guarantee I will
> be able to come up with a solution immediately.

Hi Gabriele, any update?
What's involved in serving FOTA updates? Are they packaged as a MAR file just like OTA updates, and the on-device updater knows how to handle both types?
(In reply to Chris AtLee [:catlee] from comment #54)
> What's involved in serving FOTA updates? Are they packaged as a MAR file
> just like OTA updates, and the on-device updater knows how to handle both
> types?

So basically, you create a zip file, and some type of script, which just refers to the files needed to perform the update. The update.zip file is signed, and then we wrap it up into an update.mar file.

When the update is downloaded, the update.xml has an attribute (isOSUpdate) that indicates that it's a FOTA update, and we download the update.mar file and unpack it into an update.zip. We then reboot into the recovery ROM and pass it the path to the update.zip.

The recovery ROM unpacks the update.zip and executes the script, which would replace the gecko/gaia files in /system/b2g and would then reboot the phone normally.

As far as serving FOTA updates goes, the only difference is the isOSUpdate attribute associated with the update in the update.xml file. The actual serving up stuff is identical.

This is what the update.xml file looks like for a regular OTA update:

<?xml version="1.0" ?>
<updates>
	<update appVersion="99.0" buildID="20131029154651" detailsURL="http://www.mozilla.com/test/sample-details.html" licenseURL="http://www.mozilla.com/test/sample-eula.html" type="minor" version="99.0">
		<patch URL="http://localhost/update.mar" hashFunction="SHA512" hashValue="ee598e9352fb2855f4f5220cab4f86d5e12aa4b3772a182a3f436a6d574158fa4e11f394b7496aca1240930e5a0322fee7bab04659a8eb7057e97287e81c7c61" size="71742896" type="complete"/>
	</update>
</updates>


This is what the update.xml file looks like for a FOTA update:

<?xml version="1.0" ?>
<updates>
	<update appVersion="99.0" buildID="20131029154651" detailsURL="http://www.mozilla.com/test/sample-details.html" isOSUpdate="true" licenseURL="http://www.mozilla.com/test/sample-eula.html" type="minor" version="99.0">
		<patch URL="http://localhost/update.mar" hashFunction="SHA512" hashValue="d3e7ae731283058f76a6fd1c1e70b4238049c71c69aee689c03d6c74cfe4e2a0658b5e425ccfcadb30964dfeef240b68a564d5aa0346468538856cb10483dc9f" size="108574402" type="complete"/>
	</update>
</updates>


So the update.xml file for the FOTA update has isOSUpdate="true"

The FOTA update in this example is significantly larger because it's updating all of /system, not just /system/b2g
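
When checking what a server is actually handing out, the difference between the two update.xml files above can be tested mechanically. A minimal sketch using the standard library XML parser; it only looks at the first `<update>` element and treats exactly isOSUpdate="true" as FOTA, which is an assumption about how the attribute is interpreted, and `is_fota` is a made-up name:

```python
import xml.etree.ElementTree as ET


def is_fota(update_xml):
    """Return True if the first <update> element is marked isOSUpdate="true"."""
    root = ET.fromstring(update_xml)
    update = root.find("update")
    return update is not None and update.get("isOSUpdate") == "true"
```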
(In reply to Tony Chung [:tchung] from comment #53)
> Hi Gabriele, any update?

Not yet, sorry but I'm still clearing up stuff I had committed for this sprint but I should be done soon.

Anyway the approach I intend to take is to add a new target to build.sh as :dhylands suggested in comment 43 to build a FOTA image without binary blobs and then we'll try to serve that one instead of the OTA. According to comment 55 this shouldn't be very hard.
Looks like bug 884752 just landed - so this might start working now. QA Wanted to test this with a build with the patch from bug 884752 & the build after (i.e. test this by doing an OTA update from 10/31 build to 11/1 build).
Keywords: qawanted
(Reporter)

Comment 58

5 years ago
(In reply to Jason Smith [:jsmith] from comment #57)
> Looks like bug 884752 just landed - so this might start working now. QA
> Wanted to test this with a build with the patch from bug 884752 & the build
> after (i.e. test this by doing an OTA update from 10/31 build to 11/1 build).

It's not clear to me whether fixing bug 884752 is enough on its own, or whether additional work is required to fix this bug.  David, can you confirm if there's more work needed here, or should the updating just work now after this patch lands in the next nightly?
Flags: needinfo?(dflanagan)
Tony,

I don't know anything at all about OTA updates, so I can't really answer your question.  I think the default build size should have gotten a lot smaller now with 884752 landed.  I hope it helps!
Flags: needinfo?(dflanagan)
I've been poking around the code today and I've come up with a few steps needed to implement a 'gecko-update-full-fota' target for our build.

- I'm adding a 'gecko-update-full-fota' target to gonk-misc/Android.mk which invokes modified versions of build-flash-fota.py and build-fota-mar.py to assemble the image

- build-flash-fota.py works pretty much as before except that it relies on extra functionality that I'm adding to update_tools.py when building a Gecko-only FOTA image

- I'm modifying the FlashFotaBuilder class in update_tools.py so that when building a Gecko-only FOTA image it will gather only the files contained under 'system/b2g' instead of 'system' and emit an update script that won't format the '/system' and '/data' partitions when run. Instead the update script mounts the partitions and extracts the archive contents over the target 'system/b2g' directory. This extra functionality is then used to assemble the Gecko-only FOTA image
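
As a rough illustration of the file-gathering part of the steps above, here is a Python sketch. It is not the FlashFotaBuilder code from update_tools.py; the function name collect_image_files, the staging_dir layout and the gecko_only flag are all hypothetical names chosen for this example.

```python
import os


def collect_image_files(staging_dir, gecko_only):
    """List files (relative to staging_dir) that would go into the image.

    gecko_only=True  -> only files under system/b2g (Gecko-only FOTA)
    gecko_only=False -> everything under system (full FOTA)
    """
    prefix = os.path.join("system", "b2g") if gecko_only else "system"
    base = os.path.join(staging_dir, prefix)
    files = []
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Store paths relative to the staging root so they can be
            # extracted over the matching location on the device.
            files.append(os.path.relpath(full_path, staging_dir))
    return sorted(files)
```

The only difference between the two modes in this sketch is the directory the walk starts from; the update script that is emitted alongside (mount and extract instead of format) is not covered here.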

Dave, before I start digging deeper into this does this sound like a reasonable approach to you?
Flags: needinfo?(dhylands)
Hey Gabriele,

Yeah that sounds like the right approach.

We should probably have some ability to add some canned steps to the script (in addition to expanding the files in /system/b2g).

Some things that we should do before expanding the files into /system/b2g:

- If /system/b2g doesn't exist, and /system/b2g.bak exists, then rename b2g.bak to b2g. This covers the case where an aborted OTA update left the system in a state it shouldn't get in.
- Remove /system/b2g/libdmd.so if it exists (so if we changed from a dmd build to a non-dmd build we don't pick up a stale libdmd.so)
- Remove directory tree /system/b2g/updated (if it exists). This might get created from an OTA update that hasn't been applied yet.
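
The three steps above can be sketched in Python as follows. This only illustrates the logic; the real steps would live in the update script run by the recovery ROM, and the function name prepare_b2g_dir and its system_dir parameter are hypothetical.

```python
import os
import shutil


def prepare_b2g_dir(system_dir):
    """Apply the canned cleanup steps before files are expanded into b2g."""
    b2g = os.path.join(system_dir, "b2g")
    backup = os.path.join(system_dir, "b2g.bak")

    # Step 1: recover from an aborted OTA update that left only b2g.bak.
    if not os.path.exists(b2g) and os.path.exists(backup):
        os.rename(backup, b2g)

    # Step 2: drop a stale libdmd.so left behind by an earlier dmd build.
    libdmd = os.path.join(b2g, "libdmd.so")
    if os.path.isfile(libdmd):
        os.remove(libdmd)

    # Step 3: drop a staged OTA update that was never applied.
    staged = os.path.join(b2g, "updated")
    if os.path.isdir(staged):
        shutil.rmtree(staged)
```

Each step checks for existence first, so running the function on an already clean directory does nothing.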
Flags: needinfo?(dhylands)
Keywords: verifyme
We've already got the right keywords here - I flagged qawanted already for testing this.
Keywords: verifyme
Naoki took a look into this and confirmed this still doesn't work with the dependency fixed. The new problem that we're hitting is captured in bug 933897.
Depends on: 933897
Keywords: qawanted

Updated

5 years ago
No longer depends on: 933897
Duplicate of this bug: 933897
Turns out after talking with Naoki, he's discovered that we can still test OTA updates if we use leo with a modified boot.img. So it looks like we aren't blocked entirely from testing OTA updates, it's just we're limited to using the leo device.

Given that we're unblocked here, I don't think we need to continue to block on this bug. However, we should still move forward getting a FOTA solution in place - we're limited on what we can test here device wise. But doesn't necessarily need to block 1.2.
blocking-b2g: koi+ → koi?
Keywords: smoketest
OK, a lot going on here, 56 comments so far! As best as I can tell, the current summary is:

1) comment#65: workaround on leo devices in place, so urgency reduced but still need to change from serving OTA to serving FOTA.
2) comment#43,55,56: Looks like :gsvelto is tackling writing a new target for generating FOTAs. 
3) once the new target is in place, RelEng and QA can test these updates in automation before sending them out to all leo users who currently get OTA. 

Let me know if I missed anything?
Flags: needinfo?(joduinn) → needinfo?(gsvelto)
(In reply to John O'Duinn [:joduinn] from comment #66)
> 2) comment#43,55,56: Looks like :gsvelto is tackling writing a new target
> for generating FOTAs. 

Yes, I'll be working on that, though I'd be glad not to have to do it in a rush. As the various scenarios described by :dhylands in comment 61 show, this is something that needs to be thoroughly tested lest we brick user devices with broken updates.
Flags: needinfo?(gsvelto)

Updated

5 years ago
blocking-b2g: koi? → 1.3+
1.3+ keeping track for 1.3 for FOTA updates.
(Reporter)

Comment 69

5 years ago
(In reply to Preeti Raghunath(:Preeti) from comment #68)
> 1.3+ keeping track for 1.3 for FOTA updates.

I suggest we create a new bug to track FOTA update code migration for 1.3.   This bug still implies that 1.2 OTA isn't working and needs a solution and verification.

I'll file a new bug to do this.  In the meantime, can we retriage for koi? again?
(Reporter)

Updated

5 years ago
blocking-b2g: 1.3+ → koi?
(In reply to Tony Chung [:tchung] from comment #69)
> (In reply to Preeti Raghunath(:Preeti) from comment #68)
> > 1.3+ keeping track for 1.3 for FOTA updates.
> 
> I suggest we create a new bug to track FOTA update code migration for 1.3.  
> This bug still implies that 1.2 OTA isn't working and needs a solution and
> verification.

No, that's not right. We already confirmed OTA updates are working fine with Leo for 1.1 --> 1.2, so we're unblocked here. I don't think pursuing an OTA solution on Buri on 1.2 is worthwhile anymore anyway, as we're hitting space limitations that are causing problems here that really need to be fixed with a FOTA solution. I think we should plan on moving forward with OTA update testing on Leo devices in 1.2 as our plan & implement a FOTA solution for Buri for future releases.

In that case, if we're moving to a new bug, we probably should close this as a WONTFIX.
(In reply to Jason Smith [:jsmith] from comment #70)
> In that case, if we're moving to a new bug, we probably should close this as
> a WONTFIX.

Agreed, making a new bug would also allow us to put up a better title and description to narrow the scope to what was described in comment 60 and comment 61.
Filed bug 935059 to track the FOTA solution. Closing this as a WONTFIX per comment 70.
Status: NEW → RESOLVED
blocking-b2g: koi? → ---
Last Resolved: 5 years ago
Resolution: --- → WONTFIX
I could not update my hamachi with a 1.2.x build from pvtbuilds.
I dug around and couldn't find this bug, thanks to :tchung for pointing me at it on IRC. I have tried dhylands' cleanup script in comment 25 but bootup fails now ;) according to logcat it is missing libmpqstobinder.so:
> Error: Cannot load library: load_library[1091]: Library 'libmpqstobinder.so' not found
This causes a SIGSEGV somewhere and makes the phone start over (i.e. an endless loop of segfaults).

I can reflash now but that brings me back to a version that doesn't do OTA. I really hope this is nothing end-users will see...
The cleanup.sh was probably too aggressive. Although I don't see libmpqstobinder.so as being one of the libraries which is removed, the script may have removed a dependency.

If you get into the place where it doesn't load, can you try:

adb stop b2g
adb shell /system/bin/b2g.sh

and see if any additional clues get printed?
Now that I think about it, the reason for the issue is because it's running out of disk space due to adding the files rather than checking the replacement of the files and updating the device space only after reboot... if there's something the vendcom can do to resolve that issue, this may resolve the issue with the current production devices not being able to update to 1.1...

Though, I guess it might be too late for that notion?
Status: RESOLVED → REOPENED
Component: Gaia::System → Vendcom
Flags: needinfo?(vchen)
Resolution: WONTFIX → ---
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #75)
> Now that I think about it, the reason for the issue is because it's running
> out of disk space due to adding the files rather than checking the
> replacement of the files and updating the device space only after reboot...
> if there's something the vendcom can do to resolve that issue, this may
> resolve the issue with the current production devices not being able to
> update to 1.1...
> 
> Though, I guess it might be too late for that notion?

I don't think it's worthwhile to continue to pursue this path at this point - we've already got a partial FOTA solution in place implemented in a different bug. We just need rel eng to do the server side work to finish this off. OTA won't ever be used by our partners - FOTA will though. As such, I don't think there's value to continue to pursue getting OTA support working on a Buri device.

Does that make sense?
Flags: needinfo?(nhirata.bugzilla)
The underlying impact is not just the OTA; it affects when we flash to a new build.  This is why I reopened the issue.  If you think I morphed the bug, then I will go ahead and open a new bug and close this one out.

Also to note, if I recall correctly, it's still conceivable though less likely that FOTA can still run into the same issue.  The difference is that FOTA does a single file swap out in a different mode than the OTA which does the whole system swap out and backup of the files as it goes.
Flags: needinfo?(nhirata.bugzilla)
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #77)
> The underlying impact is not just the OTA; it affects when we flash to a new
> build.  This is why I reopened the issue.  If you think I morphed the bug,
> then I will go ahead and open a new bug and close this one out.

Let's open a new bug for this.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WONTFIX