Closed Bug 976450 Opened 10 years ago Closed 10 years ago

librecovery doesn't work on full cache partitions

Categories

(Firefox OS Graveyard :: GonkIntegration, defect)

Platform: All, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

(blocking-b2g:1.3+, firefox28 wontfix, firefox29 wontfix, firefox30 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

VERIFIED FIXED
1.4 S3 (14mar)

People

(Reporter: mwu, Assigned: mwu)

Attachments

(1 file)

If the cache partition is full, librecovery can't reset or update the phone since it can't write the recovery command to a file.
blocking-b2g: --- → 1.4?
I'll take a closer look at what's filling the partition when we hit this again. I think it's the cache, but maybe there are other users.
How likely could this problem happen? Do we think this problem has been present in past releases?
On the demo floor with 64 devices, we saw this at least 5-6 times over 3 days. Unfortunately I wasn't able to pick up and investigate another device where this happened. I think I saw this happen on 1.1 devices but I'm not sure.

Tyler, do you remember seeing this on the 1.1 devices?
Flags: needinfo?(tdowner)
I only saw this with the 1.3 phones, mainly the TCL Souls.
Flags: needinfo?(tdowner)
I just reproduced it. It looks like you can just browse websites (imgur is great for this) until the cache is filled, so the http cache size config doesn't seem to be doing the job. Requesting 1.3 blocking since we want people who actually use their phones to be able to update them.
blocking-b2g: 1.4? → 1.3?
Depends on: 979589
This should work, except it doesn't because of bug 979589.
Depends on: 979625
Comment on attachment 8385660 [details] [diff] [review]
Clear cache and try again

I did find conditions in which this actually works, so we should have it even if it doesn't work all the time.
Attachment #8385660 - Attachment description: Clear cache and try again → Link to Github pull-request: https://github.com/mozilla-b2g/librecovery/pull/7
Attachment #8385660 - Flags: review?(dhylands)
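The patch itself is in the pull request linked from the attachment and is not reproduced here. Purely as an illustration of the "clear cache and try again" idea, the sketch below writes a command to a file and, if that fails, empties a disposable cache directory and retries once. The function names `try_write` and `write_recovery_command` and every path passed to them are hypothetical; the real code lives in a native library (librecovery.so) and may differ in what it clears and how it detects failure.

```shell
# Sketch only: names and paths are hypothetical, not taken from librecovery.

# try_write FILE TEXT: write TEXT to FILE and read it back to confirm it landed.
try_write() {
    printf '%s\n' "$2" > "$1" && [ "$(cat "$1" 2>/dev/null)" = "$2" ]
}

# write_recovery_command FILE CACHE_DIR TEXT
# Returns 0 if TEXT ends up in FILE, non-zero otherwise.
write_recovery_command() {
    cmd_file=$1
    cache_dir=$2
    text=$3

    # First attempt: on a partition with free space this is all that happens.
    if try_write "$cmd_file" "$text"; then
        return 0
    fi

    # Assumed recovery step: delete disposable cache data to free space.
    # ${cache_dir:?} aborts instead of expanding to "/*" if the variable is empty.
    rm -rf "${cache_dir:?}"/*

    # Second and final attempt.
    try_write "$cmd_file" "$text"
}
```

The retry can only help when the space was taken by data the function is allowed to delete. If something else filled the partition, the second attempt fails as well and the caller gets a non-zero status.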
Comment on attachment 8385660 [details] [diff] [review]
Clear cache and try again

Not sure how that description got clobbered.
Attachment #8385660 - Attachment description: Link to Github pull-request: https://github.com/mozilla-b2g/librecovery/pull/7 → Clear cache and try again
Comment on attachment 8385660 [details] [diff] [review]
Clear cache and try again

Review of attachment 8385660 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me.

Is there a bug about the cache not respecting the size?

I think it would be good if we can also prevent the cache from completely filling the partition (I have concerns that we might get into a situation where you can't even remove stuff if the file system gets full in just the wrong way - I can't find anything that mentions one way or the other whether this is a problem or not). I think that the FS should be keeping a spare erase block around so that it can't get into that situation (I'm probably remembering stuff from working on JFFS2 - it had lots of problems like this in the early days).
Attachment #8385660 - Flags: review?(dhylands) → review+
(In reply to Dave Hylands [:dhylands] from comment #10)
> Comment on attachment 8385660 [details] [diff] [review]
> Clear cache and try again
> 
> Review of attachment 8385660 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks good to me.
> 
> Is there a bug about the cache not respecting the size?
> 
> I think it would be good if we can also prevent the cache from completely
> filling the partition (I have concerns that we might get into a situation
> where you can't even remove stuff if the file system gets full in just the
> wrong way - I can't find anything that mentions one way or the other whether
> this is a problem or not). I think that the FS should be keeping a spare
> erase block around so that it can't get into that situation (I'm probably
> remembering stuff from working on JFFS2 - it had lots of problems like this
> in the early days).

Agreed. Right now we set the cache size to the partition size minus 1 MB, but the cache partition takes 4 MB just storing one empty directory and a 1-byte file, so that's not an effective default. So I'm not sure the cache is actually ignoring the limit; AFAICT it's doing what we tell it to.
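The arithmetic behind that default can be checked directly. The sketch below uses a 40 MB partition purely as an illustrative figure (an assumption; the comment above does not state a size) together with the 1 MB margin and 4 MB overhead quoted above. Variable names are made up for the example.

```shell
# Illustrative figures in MB. partition_mb is an assumption; the other two
# values come from the comment above.
partition_mb=40
margin_mb=1      # cache limit is set to partition size minus this
overhead_mb=4    # space used by a nearly empty cache partition

cache_limit_mb=$((partition_mb - margin_mb))
usable_mb=$((partition_mb - overhead_mb))
shortfall_mb=$((cache_limit_mb - usable_mb))

echo "cache limit: ${cache_limit_mb} MB"
echo "usable space: ${usable_mb} MB"
echo "limit exceeds usable space by: ${shortfall_mb} MB"
```

The shortfall equals the overhead minus the margin, so it does not depend on the partition size chosen: with these figures the cache is allowed to grow past what the partition can actually hold.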
(In reply to Jason Smith [:jsmith] from comment #3)
> How likely could this problem happen? Do we think this problem has been
> present in past releases?

BTW I do think this may be part of FOTA failures in past releases, especially since it seems very easy to hit.
I just filed bug 979900 for preventing us from running out of cache partition space.
1.3+ for FOTA failures
blocking-b2g: 1.3? → 1.3+
Setting assignee to avoid "blocker lacks an assignee".
Assignee: nobody → mwu
https://hg.mozilla.org/mozilla-central/rev/0f9efd24b1a0
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.4 S3 (14mar)
Please request approval-mozilla-b2g28 on this patch when this is ready for uplift.
Comment on attachment 8385660 [details] [diff] [review]
Clear cache and try again

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: Users won't get FOTA updates.
Testing completed: Tested with bug 979625.
Risk to taking this patch (and alternatives if risky): Minimal. At most, people lose their http cache.
String or UUID changes made by this patch: None
Attachment #8385660 - Flags: approval-mozilla-b2g28?
Keywords: verifyme
I'm not sure if there's an easy way to verify this bug.

Michael - What do you think?
Flags: needinfo?(mwu)
mwu shared how to fill the cache on IRC.

We basically need to compile a build with the patch and replace the librecovery.so file. Once that's done, we can fill the cache from adb shell with "cat /dev/urandom > /cache/foo".
Then try to reset the phone through the menu options.

The phone should reset even though the cache is filled.

I am currently testing this.
Flags: needinfo?(mwu)
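The manual steps above could be scripted roughly as follows. This is a sketch under assumptions: a single device is connected, `adb shell` runs with enough privileges to write to /cache, and the function names are made up for illustration. The reset itself is still triggered by hand from the phone's menu.

```shell
# fill_cache: write random data to /cache/foo until the partition is full.
fill_cache() {
    # cat stops with an error when no space is left; that is the intended result.
    adb shell 'cat /dev/urandom > /cache/foo' || true
    # Show how much space remains.
    adb shell 'df /cache'
}

# cleanup_cache: remove the filler file after the test.
cleanup_cache() {
    adb shell 'rm /cache/foo'
}
```

Run `fill_cache`, trigger the reset from the phone's menu, and run `cleanup_cache` afterwards if the filler file is still present.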
When the cache is filled using cat /dev/urandom > /cache/foo:

root@android:/ # df cache
Filesystem             Size   Used   Free   Blksize
cache                   40M    39M    36K   4096

The device does not reset with the patch.

adb shell rm /cache/recovery/log

After removing the recovery log, the device responded to the reset: it deleted the cache and reset the phone.
Flags: needinfo?(mwu)
Yeah that's as expected. This patch only clears data that was put there by the cache. You'll need to visit some websites first to make sure there's data to remove before filling the device.
Flags: needinfo?(mwu)
I tried browsing a bunch and then filled the rest with "cat /dev/urandom > /cache/foo", and I still got the same result.

When the cache was filled by the web cache alone, the phone rebooted correctly. It was slightly delayed (~20 seconds to reboot).

Marking as verified.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Comment on attachment 8385660 [details] [diff] [review]
Clear cache and try again

Thanks for the verification, Naoki. Looks good to land.
Attachment #8385660 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+