Recovery does not dare to wipe gecko cache - Incremental FOTA update may fail

RESOLVED FIXED in Firefox OS v1.4
(NeedInfo from)

Status

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: pehrsons, Assigned: pehrsons, NeedInfo)

Tracking

unspecified
2.1 S4 (12sep)
All
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:1.4+, b2g-v1.4 fixed, b2g-v2.0 fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)

Details

Attachments

(4 attachments, 3 obsolete attachments)

(Assignee)

Description

4 years ago
Steps to reproduce:
1. Create an incremental FOTA update for your device (not explained here)
2. Use the device to visit websites (i.e., populate the cache)
3. Monitor the available space on /cache using `adb shell df`. When it gets below 26MB, install the update.
4. Update will fail, complaining that cache does not have enough size.

(5.) After rebooting back (to the now old version), cache will be wiped.
(6.) Apply update again and it works.

My findings show that the updater will only delete files (not folders, not files in subfolders) in /cache and /cache/recovery/otatest (see https://android.googlesource.com/platform/bootable/recovery/+/android-4.4.2_r2/applypatch/freecache.c#85).


While the cache I have gets populated like so:
$ adb shell ls -lR /cache | head -n40

/cache:
drwxr-x--- root     root              2014-08-26 00:37 Cache
-rw------- root     root            1 2014-08-26 11:48 _CACHE_CLEAN_
drwx------ root     root              2014-08-22 02:00 cache2
drwxrwx--- root     root              2014-08-22 02:00 lost+found
drwxrwx--- system   cache             2014-08-25 19:40 recovery

/cache/Cache:
drwx------ root     root              2014-08-26 10:45 0
drwx------ root     root              2014-08-26 10:45 1
drwx------ root     root              2014-08-26 10:40 2
drwx------ root     root              2014-08-26 10:45 3
drwx------ root     root              2014-08-26 10:45 4
drwx------ root     root              2014-08-26 10:45 5
drwx------ root     root              2014-08-26 10:45 6
drwx------ root     root              2014-08-26 10:27 7
drwx------ root     root              2014-08-26 10:27 8
drwx------ root     root              2014-08-26 10:46 9
drwx------ root     root              2014-08-26 10:45 A
drwx------ root     root              2014-08-26 10:45 B
drwx------ root     root              2014-08-26 10:27 C
drwx------ root     root              2014-08-26 10:27 D
drwx------ root     root              2014-08-26 11:21 E
drwx------ root     root              2014-08-26 10:45 F
-rw------- root     root      4194304 2014-08-26 11:48 _CACHE_001_
-rw------- root     root      4194304 2014-08-26 11:21 _CACHE_002_
-rw------- root     root      8388608 2014-08-26 11:43 _CACHE_003_
-rw------- root     root          276 2014-08-26 00:37 _CACHE_MAP_

/cache/Cache/0:
drwx------ root     root              2014-08-26 01:32 0D
drwx------ root     root              2014-08-26 10:11 16
drwx------ root     root              2014-08-26 00:48 1E
drwx------ root     root              2014-08-26 10:21 25
drwx------ root     root              2014-08-26 10:11 26
drwx------ root     root              2014-08-26 00:48 35
drwx------ root     root              2014-08-26 10:45 39
drwx------ root     root              2014-08-26 00:48 3D


This is not really blocking v1.4, but it's definitely blocking the first update we want to release for v1.4.


I am not sure who is doing this caching, but I am assuming it is gecko.

I see two fixes for this:
* Either have gecko wipe the cache not only on boot, but also right before applying a FOTA update. This would be a blocker for v1.4, unless we do the other fix as well. (Our deadline is today, unfortunately)
* Patch recovery so that the updater can wipe everything in the /cache/Cache directory.
(Assignee)

Comment 1

4 years ago
Michael, Alexandre, please give your opinion on this. If we patch the updater we either need to upstream it to AOSP or fork recovery.
Do you know who would know about the caching, and wiping it before installing an update?

And a small correction. Our deadline is not today, but tomorrow.
(Assignee)

Updated

4 years ago
Flags: needinfo?(mwu)
(Assignee)

Updated

4 years ago
Flags: needinfo?(lissyx+mozillians)
Since you are doing FOTA in recovery, why not ensuring in the edify script that you are wiping the cache ?
Flags: needinfo?(lissyx+mozillians)
(Assignee)

Comment 3

4 years ago
As we discussed on IRC, I don't know if we can safely wipe the entire partition. /cache/recovery is used to keep track of some update metadata for instance.

Patching gecko to clear the cache before rebooting into recovery would be the most future-proof solution. We'd preferably need this landed before our release though (that is tomorrow evening at the very latest). Worst case we can always update to a gecko where this is fixed with a patched updater.

Not sure who to ask for help on the caching. I see that Patrick is off this week so I'm needinfoing the other necko peers.
Flags: needinfo?(cbiesinger)
(Assignee)

Updated

4 years ago
Flags: needinfo?(jduell.mcbugs)
(Assignee)

Updated

4 years ago
Flags: needinfo?(honzab.moz)
(Assignee)

Updated

4 years ago
Flags: needinfo?(michal.novotny)
(Assignee)

Comment 4

4 years ago
[Blocking Requested - why for this release]:

Incremental FOTAs may intermittently fail when gecko's cache has grown too large before installing update.
blocking-b2g: --- → 1.4?
Flags: needinfo?(sworkman)
Flags: needinfo?(hurley)
(Assignee)

Comment 5

4 years ago
See the code at http://dxr.mozilla.org/mozilla-central/source/b2g/components/RecoveryService.js#114

---
 installFotaUpdate: function RS_installFotaUpdate(updatePath) {
#ifdef MOZ_WIDGET_GONK
    // If this succeeds, then the device reboots and this never returns
    if (librecovery.installFotaUpdate(updatePath, updatePath.length) != 0) {
      log("Error: FOTA install failed. Trying again after clearing cache.");
    }
    var cache = Cc["@mozilla.org/netwerk/cache-storage-service;1"].getService(Ci.nsICacheStorageService);
    cache.clear();
    if (librecovery.installFotaUpdate(updatePath, updatePath.length) != 0) {
      log("Error: FOTA install failed again");
    }
#endif
---


My take on this is that phone will reboot already in the first call to librecovery.installFotaUpdate, thus never returning an error so it can clear cache and try again.

mwu, you wrote the cache clearing part, with dhylands as reviewer. Do you guys know more about how this is supposed to work?

I'll patch it on my end to always clear the cache to see if that helps. Shouldn't have any implications as it seems to always be cleared on boot anyway (though I haven't verified if that is by design).
Flags: needinfo?(sworkman)
Flags: needinfo?(michal.novotny)
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(hurley)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(dhylands)
Flags: needinfo?(cbiesinger)
(Assignee)

Comment 6

4 years ago
Looking at librecovery code, it never checks for enough space available on /cache to perform an incremental FOTA (needs to backup the single largest _patched_ file, libxul is 26.8MB).

See https://github.com/mozilla-b2g/librecovery/blob/master/librecovery.c#L185
(Assignee)

Comment 7

4 years ago
Created attachment 8479101 [details] [diff] [review]
[v1.4] Always clear cache when applying B2G FOTA update

Since it looks like cache is always clear on a fresh boot, it should be safe to clear it right before installing a FOTA update.

r? for mwu, dhylands, since you were involved in bug 976450. I couldn't figure out which module this belongs to.

Btw, are there any tests that touch this code? Does a try run make sense?
Assignee: nobody → pehrsons
Status: NEW → ASSIGNED
Attachment #8479101 - Flags: review?(mwu)
Attachment #8479101 - Flags: review?(dhylands)
Flags: needinfo?(mwu)
Flags: needinfo?(dhylands)
(Assignee)

Comment 8

4 years ago
Wayne, sorry to bug you again but I'd appreciate a speedy triage on this one.
Flags: needinfo?(wchang)

Comment 9

4 years ago
Comment on attachment 8479101 [details] [diff] [review]
[v1.4] Always clear cache when applying B2G FOTA update

Review of attachment 8479101 [details] [diff] [review]:
-----------------------------------------------------------------

Originally, we only wanted to clear the cache if we couldn't write into the recovery command file. However, if we need a certain amount of space for FOTA to succeed, it makes sense to clear as much as possible. Didn't realize FOTA uses the cache partition as its scratch space while applying updates.
Attachment #8479101 - Flags: review?(mwu) → review+

Comment 10

4 years ago
I don't know of any tests for this code. BTW, does this also apply to master? This should probably land everywhere we can land it.
I think another alternative method is add one more option --wipe_cache before --update_package in librecovery.
Comment on attachment 8479101 [details] [diff] [review]
[v1.4] Always clear cache when applying B2G FOTA update

This seems reasonable to me.

I think its worthwhile to put a comment in about why we need to clean the cache (because partial FOTAs won't remove the cache files).
Attachment #8479101 - Flags: review?(dhylands) → review+
(Assignee)

Comment 13

4 years ago
Created attachment 8479680 [details] [diff] [review]
[v1.4] Always clear cache when applying B2G FOTA update

Comment added, the error message fixed (removed 'again'), commit message added. Carries r=mwu, r=dhylands. Waiting for 1.4+ to land.
Attachment #8479101 - Attachment is obsolete: true
(Assignee)

Comment 14

4 years ago
Created attachment 8479684 [details] [diff] [review]
[v1.4] Always clear cache when applying B2G FOTA update (carries r=mwu, r=dhylands)

Adding back the comment I accidentally removed in the first patch. Carries r=mwu, r=dhylands.
Attachment #8479680 - Attachment is obsolete: true
Andreas,
Thanks, I think this will also be needed by 2.0, can you request a 2.0 approval (send request to :bajaj) as well?

Thanks
Flags: needinfo?(wchang) → needinfo?(pehrsons)
(Assignee)

Comment 16

4 years ago
Yep, will do. Can't find the flag now. Does the bug have to be resolved, fixed before it shows up?
status-b2g-v1.4: --- → affected
status-b2g-v2.0: --- → affected
status-b2g-v2.1: --- → affected
Flags: needinfo?(pehrsons)
For 2.0 the flags for gaia and gecko are approval‑gaia‑v2.0 and approval‑mozilla‑b2g32 respectively.
(Assignee)

Comment 18

4 years ago
I know, I just can't find where to set them here in bugzilla. I'll follow up and make sure it lands on 2.0 and 2.1 though.

What about the 1.4? ?
(Assignee)

Comment 19

4 years ago
Oh, you set them on the patch, not the bug. Got it.
Andreas, just my two cents. I think the comment in patch could confuse us. 
Actually, recovery can do wipe_cache for us, if we set more option in recovery command when reboot into recovery ROM as comment 11. BTW, either of change in gecko or librecovery can solve the problem.
(Assignee)

Comment 21

4 years ago
Kai-Zhen, could you point me to the code in recovery/updater that does the cache wiping?
Flags: needinfo?(kli)
(Assignee)

Comment 22

4 years ago
Comment on attachment 8479684 [details] [diff] [review]
[v1.4] Always clear cache when applying B2G FOTA update (carries r=mwu, r=dhylands)

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: System updates (incremental fota only) will intermittently fail.
Testing completed: Tested and working fine on v1.4. No changes done since then to this file in 2.0.
Risk to taking this patch (and alternatives if risky): Low. With patch we always do what was previously a fallback.
String or UUID changes made by this patch: None
Attachment #8479684 - Flags: approval-mozilla-b2g32?
Andreas, After I do more study, erase /cache partition in recovery can't meet the requirement. Because cache partition will only be erased after update is success. Details are as below.

In librecovery/librecovery.c, the related command will be written into /cache/recovery/command before reboot into recovery ROM. For example, "--wipe_data" for factory reset and "--update_package=..." for update. We can use "--wipe_cache --update_package=__patch_to_update_package__" to clear cache partition.

In bootable/recovery/recovery.cpp, ‘wipe_cache’ option is handled and /cache will be erased. But it is done after update_package is success.
--
    if (update_package != NULL) {
        status = install_package(update_package, &wipe_cache, TEMPORARY_INSTALL_FILE);
        if (status == INSTALL_SUCCESS && wipe_cache) {
            if (erase_volume("/cache")) {
                LOGE("Cache wipe (requested by package) failed.");
            }
        }
--

Please forget what I said in previous comment, and I think your current solution is the suitable one.
Flags: needinfo?(kli)
(In reply to Michael Wu [:mwu] from comment #10)
> I don't know of any tests for this code. BTW, does this also apply to
> master? This should probably land everywhere we can land it.

I believe that the only testing we do is on the emulator. I don't know if /cache exists on the emulator, and I have a suspicion we only test OTA and not FOTA.

If we're testing FOTA, we could test it by adding a fie to /cache, performing a partial FOTA which adds a dummy file, and verifying that it was removed.
(Assignee)

Comment 25

4 years ago
(In reply to Dave Hylands [:dhylands] from comment #24)
> If we're testing FOTA, we could test it by adding a fie to /cache,
> performing a partial FOTA which adds a dummy file, and verifying that it was
> removed.

We'd have to add the file through the gecko cache though, as that is all the patch touches. Complicates things a bit I assume.

The tests would also have to be for incremental FOTA, not just a full FOTA (based on all the patching I've had to do to get incremental FOTAs working, my faith in there being tests for it is quite limited :) )
(Assignee)

Comment 26

4 years ago
I just saw a gecko with the patch above fail to update due to too little space on /cache.
It worked when I tested it before but this was different as /cache was almost full when I went on to install the update. There was ~28MB used and ~3MB available. For my successful test there was ~20MB available at the start. ~27MB needed.

In the recovery UI after the failure I could browse /cache and saw that there were plenty of gecko cache files left in place. I cannot say if any space had been freed by clear() as /cache wasn't mounted and I was on a user build.

Any reason why CacheStorageService::clear() wouldn't clear all files? Is there anything async fooling us here, or does clear() not delete all files?
Flags: needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(dhylands)
I'm not familiar with the cache code.
Flags: needinfo?(dhylands)
(Assignee)

Comment 28

4 years ago
After a reboot I filled the cache up to the following, and installed the update again.
$ adb shell df
Filesystem               Size     Used     Free   Blksize
/dev                   234.8M   164.0K   234.7M   4096
/sys/fs/cgroup         234.8M    12.0K   234.8M   4096
/mnt/asec              234.8M     0.0K   234.8M   4096
/mnt/obb               234.8M     0.0K   234.8M   4096
/system                231.6M   139.4M    92.2M   4096
/data                   75.4M    25.0M    50.3M   4096
/cache                  31.2M    21.6M     9.6M   4096
/productinfo             1.1M    52.0K     1.1M   4096
/mnt/internal-sdcard    56.7M    14.0M    42.7M   4096

The update succeeded.
Sorry I was supposed to 1.4+ this with my last post... (after that you set approval on the patches for other releases)

Apologies for the confusions.
blocking-b2g: 1.4? → 1.4+
ni?:bajaj for approval-mozilla-b2g32?
Flags: needinfo?(bbajaj)
Bwfore landing on any branch, this needs to land on central and stick there after which uplift request for branches will be approved. Can you please help with that landing first?
Flags: needinfo?(bbajaj)
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #26)
> Any reason why CacheStorageService::clear() wouldn't clear all files? Is
> there anything async fooling us here, or does clear() not delete all files?

Clearing the cache is asynchronous since we cannot block on deleting all cache files. After CacheStorageService::clear() finished no old cache entry can be opened, but the files still do exist on the disk and are deleted on the background.
Flags: needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
Also to note that probably not only the HTTP cache (that we could move somewhere else on B2G) is not in the cache partition, right?  However HTTP cache (please use full names to not cause any confusion) is the biggest offender.

Do we have some way to just wipe the /cache partition and don't cache about HTTP cache / go through its API at all?
(In reply to Honza Bambas (:mayhemer) from comment #33)
> and don't cache about

...and don't CARE about...
(Assignee)

Comment 35

4 years ago
/cache/Cache (HTTP cache) is the biggest offender and I believe the only offender.

I have only seen /cache/cache2 contain two empty folders, "doomed" and "entries", and /cache/recovery holds some metadata about updates/recovery mode, like this:

/cache/recovery:
-rw-r--r-- root     root           38 2014-08-25 19:45 last_install
-rw-r----- root     root         9569 2014-08-25 19:45 last_log
-rw-r----- root     root         6254 2014-08-25 19:24 last_log.1
-rw-r----- root     root         6252 2014-08-25 18:31 last_log.2
-rw-r----- root     root        12555 2014-08-22 02:07 last_log.3
-rw------- system   system      34630 2014-08-25 19:40 log

Michal, Honza, is there any event or callback from the cache, that files have been wiped? Anything we could use?

A definite solution would be to patch the updater in recovery, but that would require us to either upstream to aosp or fork recovery with the fix for all gonk versions. Dave, Michael, any spontaneous ideas?
Flags: needinfo?(mwu)
Flags: needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(dhylands)
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #35)
> /cache/Cache (HTTP cache) is the biggest offender and I believe the only
> offender.
> 
> I have only seen /cache/cache2 contain two empty folders, "doomed" and
> "entries", and /cache/recovery holds some metadata about updates/recovery
> mode, like this:
> 
> /cache/recovery:
> -rw-r--r-- root     root           38 2014-08-25 19:45 last_install
> -rw-r----- root     root         9569 2014-08-25 19:45 last_log
> -rw-r----- root     root         6254 2014-08-25 19:24 last_log.1
> -rw-r----- root     root         6252 2014-08-25 18:31 last_log.2
> -rw-r----- root     root        12555 2014-08-22 02:07 last_log.3
> -rw------- system   system      34630 2014-08-25 19:40 log
> 
> Michal, Honza, is there any event or callback from the cache, that files
> have been wiped? Anything we could use?

No, despite my suggestions.  But it would be easy to implement some.

> 
> A definite solution would be to patch the updater in recovery, but that
> would require us to either upstream to aosp or fork recovery with the fix
> for all gonk versions. Dave, Michael, any spontaneous ideas?
Flags: needinfo?(honzab.moz)
Presumably the update script (which is provided with the update) could also remove stuff from cache (I know that there is a script, but I'm familiar with a all of the details).

I think that gerard-majax is more familiar with this.
Flags: needinfo?(dhylands) → needinfo?(lissyx+mozillians)
(Assignee)

Comment 38

4 years ago
I don't think the updater-script can remove folders recursively but I'll verify.

Comment 39

4 years ago
No new ideas here.
Flags: needinfo?(mwu)
(Assignee)

Comment 40

4 years ago
(In reply to Dave Hylands [:dhylands] [off Fri Aug 29] from comment #37)
> Presumably the update script (which is provided with the update) could also
> remove stuff from cache (I know that there is a script, but I'm familiar
> with a all of the details).
> 
> I think that gerard-majax is more familiar with this.

There's support for deleting recursively (at least since ICS) in the updater but not in edify_generator. We can of course add it.

We can fix it there now since we already own a fork of the platform_build repo, but I'd like to see it fixed in Gecko as well.
Flags: needinfo?(lissyx+mozillians)
Comment on attachment 8479684 [details] [diff] [review]
[v1.4] Always clear cache when applying B2G FOTA update (carries r=mwu, r=dhylands)

Clearing this till the central landings are sorted out, as this is not actionable yet.
Attachment #8479684 - Flags: approval-mozilla-b2g32?
(Assignee)

Comment 42

4 years ago
Created attachment 8481839 [details] [review]
[platform/build b2g-4.4] Clear Gecko HTTP cache before checking free cache space

This adds support for delete_recursive to the edify script generator, and clears all of the gecko cache for incremental updates.
delete_recursive is supported in the recovery updater (/system/bin/updater) since at least ice cream sandwich.

We should still follow up on clearing it properly in gecko with an event or callback. IMHO that would be sufficient to do on master. Michael, what do you think?
Attachment #8479684 - Attachment is obsolete: true
Attachment #8481839 - Flags: review?(mwu)
Attachment #8481839 - Flags: review?(lissyx+mozillians)
Flags: needinfo?(michal.novotny)
(Assignee)

Comment 43

4 years ago
Created attachment 8481840 [details] [review]
[platform/build b2g-4.3] Clear Gecko HTTP cache before checking free cache space
(Assignee)

Comment 44

4 years ago
Created attachment 8481841 [details] [review]
[platform/build b2g-4.2] Clear Gecko HTTP cache before checking free cache space
Comment on attachment 8481839 [details] [review]
[platform/build b2g-4.4] Clear Gecko HTTP cache before checking free cache space

That does looks fine for me, and it means you can do a follow up also to make use of this new method in tools/update-tools/update_tools.py where we already did use directly delete_recursive() calls.
Attachment #8481839 - Flags: review?(lissyx+mozillians) → review+
(Assignee)

Updated

4 years ago
Blocks: 1061134
(Assignee)

Updated

4 years ago
Depends on: 1061165
(Assignee)

Comment 46

4 years ago
I filed follow-up bugs for update_tools.py and for fixing this in gecko. I'll land this when mwu has had a look.

Since this is for platform_build and can be handled by the update, we don't really care about the 1.4+ anymore.

Comment 47

4 years ago
Comment on attachment 8481839 [details] [review]
[platform/build b2g-4.4] Clear Gecko HTTP cache before checking free cache space

Sounds good to me. f+ for the concept - I haven't dug into the actual code too much, but the existing r+ here is enough.
Attachment #8481839 - Flags: review?(mwu) → feedback+
(Assignee)

Comment 48

4 years ago
Comment on attachment 8481839 [details] [review]
[platform/build b2g-4.4] Clear Gecko HTTP cache before checking free cache space

In gecko 32 and higher, the new cache2 is used and it puts files at a different path; /cache/cache2.

I have modified the pull requests so that the updater also deletes /cache/cache2.

f? for :mayhemer to verify if it is safe for the recovery updater to delete both /cache/Cache and /cache/cache2.
Attachment #8481839 - Flags: feedback?(honzab.moz)
Comment on attachment 8481839 [details] [review]
[platform/build b2g-4.4] Clear Gecko HTTP cache before checking free cache space

It's safe, just added a small feedback on one comment line that is not clear.  But generally, this is good to go.
Attachment #8481839 - Flags: feedback?(honzab.moz) → feedback+
(Assignee)

Comment 50

4 years ago
Thanks, I clarified the comment a bit. Please land all three pull requests.
Keywords: checkin-needed
b2g-4.2.2_r1: https://github.com/mozilla-b2g/platform_build/commit/31311bbedfef1167163ee7ac4f17a3b4e2202d45

b2g-4.3_r2.1: https://github.com/mozilla-b2g/platform_build/commit/8986df0f82e15ac2798df0b6c2ee3435400677ac

b2g-4.4.2_r1: https://github.com/mozilla-b2g/platform_build/commit/fe92ddd450e03b38edb2d465de7897971d68ac68
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
status-b2g-v1.4: affected → fixed
status-b2g-v2.0: affected → fixed
status-b2g-v2.1: affected → fixed
status-b2g-v2.2: --- → fixed
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S4 (12sep)
(Assignee)

Comment 52

4 years ago
Created attachment 8487118 [details] [review]
[platform/build master (ics)] Clear Gecko HTTP cache before checking free cache space

We need to land this on master branch as well to not break gecko-update-fota and gecko-update-fota-full on ICS devices when landing bug 1061134. Also, keon and peak use a forked platform/build, so they'll have to land it there as well.
Flags: needinfo?(gp)
(Assignee)

Updated

4 years ago
Attachment #8487118 - Attachment description: [platform/build master (ics)] → [platform/build master (ics)] Clear Gecko HTTP cache before checking free cache space
(Assignee)

Comment 53

4 years ago
Please merge the master branch PR. Clean cherry-pick, carries r=gerard-majax.
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.