Closed Bug 823695 Opened 7 years ago Closed 7 years ago

OTA updates leave otoro in unbootable state, needing re-flashing

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed)

RESOLVED FIXED
B2G C4 (2jan on)
blocking-basecamp +
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed

People

(Reporter: kairo, Assigned: marshall)

References

Details

Attachments

(6 files, 2 obsolete files)

This might be related to other update bugs, including also bug 821296, but I think this deserves a bug report, as it's worrisome.

In the last week or two, I had multiple updates on my otoro leaving the device not being able to boot.

The last build I tried was 2012-12-18-07-02-00 and yesterday (19th), an update was offered and downloaded. Once that download was finished, it probably didn't finish correctly, in any case, when I looked at the screen again, it showed another notification that an update was available, which I tried to activate but it sat there with no download progress, just like it usually does when applying updates, so I did let it sit like that - actually for a few hours, but it didn't finish.
That's a similar game as to what I had seen with the other updates that left the device unbootable. When I did forcible reboot the device, it again was in a state that wouldn't boot, all it shows is the initial screen with the "mozilla" font.

I think we should never end up with an update getting us into such a state, users cannot get out of this - and given that all that helps is re-flashing, it loses everything stored in the /data partition, which is a lot of the user's data.
blocking-basecamp: --- → ?
Ran into this too, in dupe bug 824002
blocking-basecamp: ? → +
Duplicate of this bug: 824002
In my case I can't reflash, it hangs at boot and flash.sh doesn't connect.  So it really is bricked.
Assignee: nobody → marshall
(In reply to Lucas Adamski from comment #4)
> In my case I can't reflash, it hangs at boot and flash.sh doesn't connect. 
> So it really is bricked.

Hmm, strange, I had similar logcat messages and I could re-flash. Also, when you can get a logcat, you usually should be able to run the flashing as well.
I couldn't get flash.sh to connect whether I started it before the device booted or after.  But it did work after I did a "adb shell reboot".  Doesn't make any sense but there it is.
Target Milestone: --- → B2G C3 (12dec-1jan)
This logcat may be the same as the first one from Lucas, but I thought it's best to give you the post-update one of the same phone that I give you and update logcat of.
This update.log (from /data/local/updates/0/) seems to stop right in the middle of a line.
I took a look at the failed logcat, and it may be failing due to running out of space in /data.

The key information is here:
I/GeckoUpdater( 3103): Progress [ =================================================  ]
I/GeckoUpdater( 3103): Progress [ ================================================== ]
I/GeckoUpdater( 3103): Finished applying update
I/GonkAutoMounter( 3103): Mounted /system partition as read-only
I/Gecko   (  104): *** AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
E/GeckoConsole(  104): AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
I/Gecko   (  104): UpdatePrompt: Update error, state: failed, errorCode: 7
I/Gecko   (  104): UpdatePrompt: Setting gecko.updateStatus: Install Pending

The failed 7 is a write failure, which I also saw when I was seeing disk full errors. Once bug 785124 lands, the update will try to use the sdcard to download to instead of /data which will reduce the amount of space required in /data.

The update.log file also appears to be incomplete. Curiously, it's size is exactly 8192 bytes, which is also suggestive of a block of the log file that was unable to be written to disk.

The "bricked" logcat shows b2g restarting, which typically means that libxul.so failed to load. You would be able to confirm this by doing:

adb shell
stop b2g
/system/bin/b2g.sh

and observe the output in the adb shell terminal.
Unfortunately I can't look at the bricked system right now, as I re-flashed it with a fresh image (verifying bug 821296 in the process).
Target Milestone: B2G C3 (12dec-1jan) → B2G C4 (2jan on)
Priority: -- → P1
(In reply to Dave Hylands [:dhylands] from comment #10)
> I took a look at the failed logcat, and it may be failing due to running out
> of space in /data.

Wrong.


Here's a few df states during an update I ran now:

root@android:/ # df
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   189M    44M   4096
/data                  152M    72M    80M   4096
/persist                 1M   772K   764K   4096
/cache                  57M     7M    49M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096
root@android:/ # df                                                            
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   220M    13M   4096
/data                  152M    72M    80M   4096
/persist                 1M   772K   764K   4096
/cache                  57M     7M    49M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096

So it's /system that becomes smaller and smaller during what ultimately fails, the second df was a short time before it failed (and before the uncompressing would have been finished - we now at least show in UI that we're uncompressing).

This is after it had failed:

root@android:/ # df                                                            
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   154M    79M   4096
/data                  152M    72M    80M   4096
/persist                 1M   772K   764K   4096
/cache                  57M     7M    49M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096
root@android:/ # 

Here's some more info from the state the system is in after that:

root@android:/ # ls -l /system/b2g/                                            
-rw-r--r-- root     root          825 2013-01-01 16:47 Throbber-small.gif
-rw-r--r-- root     root          425 2013-01-01 16:47 application.ini
-rwxr-xr-x root     root        96724 2013-01-01 16:47 b2g
-rw-r--r-- root     root           36 2013-01-01 16:47 chrome.manifest
drwxr-xr-x root     root              2013-01-01 16:47 components
-rw-r--r-- root     root         4003 2013-01-01 16:47 crashreporter.ini
drwxr-xr-x root     root              2013-01-01 16:47 defaults
-rw-r--r-- root     root          126 2013-01-01 16:47 dependentlibs.list
drwxr-xr-x root     root              2013-01-01 16:47 dictionaries
-rw-r--r-- root     root       208716 2013-01-01 16:47 libfreebl3.so
-rw-r--r-- root     root       102300 2013-01-01 16:47 libmozglue.so
-rw-r--r-- root     root       343432 2013-01-01 16:47 libmozsqlite3.so
-rw-r--r-- root     root       129640 2013-01-01 16:47 libnspr4.so
-rw-r--r-- root     root       607344 2013-01-01 16:47 libnss3.so
-rw-r--r-- root     root       390096 2013-01-01 16:47 libnssckbi.so
-rw-r--r-- root     root        95636 2013-01-01 16:47 libnssutil3.so
-rw-r--r-- root     root         8488 2013-01-01 16:47 libplc4.so
-rw-r--r-- root     root         5584 2013-01-01 16:47 libplds4.so 
-rw-r--r-- root     root        86672 2013-01-01 16:47 libsmime3.so
-rw-r--r-- root     root       135316 2013-01-01 16:47 libsoftokn3.so
-rw-r--r-- root     root       141108 2013-01-01 16:47 libssl3.so
-rw-r--r-- root     root         8796 2013-01-01 16:47 libxpcom.so
-rw-r--r-- root     root     17593896 2013-01-01 16:47 libxul.so
drwxr-xr-x root     root              2013-01-01 16:47 modules
-rw-r--r-- root     root      4121343 2013-01-01 16:47 omni.ja
-rw-r--r-- root     root          133 2013-01-01 16:47 platform.ini
-rwxr-xr-x root     root         2732 2013-01-01 16:47 plugin-container
-rw-r--r-- root     root         1133 2013-01-01 16:47 precomplete
-rw-r--r-- root     root           39 2013-01-01 16:47 removed-files
-rw-r--r-- root     root         8915 2013-01-01 16:47 run-mozilla.sh
drwxr-xr-x root     root              2013-01-03 22:27 updated
-rwxr-xr-x root     root        75104 2013-01-01 16:47 updater
drwxr-xr-x root     root              2013-01-01 16:12 webapps
root@android:/ # ls -l /system/b2g/updated/                                    
-rw-r--r-- root     root          425 2013-01-03 22:26 application.ini
-rwxr-xr-x root     root        96724 2013-01-03 22:26 b2g
-rw-r--r-- root     root           36 2013-01-03 22:26 chrome.manifest
drwxr-xr-x root     root              2013-01-03 22:27 components
-rw-r--r-- root     root         4003 2013-01-03 22:26 crashreporter.ini
drwxr-xr-x root     root              2013-01-03 22:26 defaults
-rw-r--r-- root     root          126 2013-01-03 22:26 dependentlibs.list
drwxr-xr-x root     root              2013-01-03 22:27 dictionaries
-rw-r--r-- root     root       208716 2013-01-03 22:26 libfreebl3.so
-rw-r--r-- root     root       102300 2013-01-03 22:26 libmozglue.so
-rw-r--r-- root     root       343432 2013-01-03 22:26 libmozsqlite3.so
-rw-r--r-- root     root       129640 2013-01-03 22:26 libnspr4.so
-rw-r--r-- root     root       607344 2013-01-03 22:26 libnss3.so
drwxr-xr-x root     root              2013-01-03 22:26 modules
-rw-r--r-- root     root         1133 2013-01-03 22:26 precomplete
drwxr-xr-x root     root              2013-01-03 22:27 webapps

So it looks to me like we put some huge temporary stuff during decompressing into /system and that makes us fail to put the actual updated binaries into the updated/ directory.

I'll back /system/b2g/ up before doing a reboot, but I fear we actually end up replacing the real one with the incomplete updated/ one and then the phone is bricked. We should definitely not allow that to happen, is there some way we have to avoid that?
Hmm, interesting, after a reboot I have this:

root@android:/ # ls -l /system/b2g/ 
-rw-r--r-- root     root          825 2013-01-03 22:55 Throbber-small.gif
-rw-r--r-- root     root          425 2013-01-03 22:55 application.ini
-rwxr-xr-x root     root        96724 2013-01-03 22:55 b2g
-rw-r--r-- root     root           36 2013-01-03 22:55 chrome.manifest
drwxr-xr-x root     root              2013-01-03 22:55 components
-rw-r--r-- root     root         4003 2013-01-03 22:55 crashreporter.ini
drwxr-xr-x root     root              2013-01-03 22:55 defaults
-rw-r--r-- root     root          126 2013-01-03 22:55 dependentlibs.list
drwxr-xr-x root     root              2013-01-03 22:55 dictionaries
-rwxr-xr-x root     root       208716 2013-01-03 22:55 libfreebl3.so
-rwxr-xr-x root     root       102300 2013-01-03 22:55 libmozglue.so
-rwxr-xr-x root     root       343432 2013-01-03 22:55 libmozsqlite3.so
-rwxr-xr-x root     root       129640 2013-01-03 22:55 libnspr4.so
-rwxr-xr-x root     root       607344 2013-01-03 22:55 libnss3.so
-rwxr-xr-x root     root       390096 2013-01-03 22:55 libnssckbi.so
-rwxr-xr-x root     root        95636 2013-01-03 22:55 libnssutil3.so
-rwxr-xr-x root     root         8488 2013-01-03 22:55 libplc4.so
-rwxr-xr-x root     root         5584 2013-01-03 22:55 libplds4.so
-rwxr-xr-x root     root        86672 2013-01-03 22:55 libsmime3.so
-rwxr-xr-x root     root       135316 2013-01-03 22:55 libsoftokn3.so
-rwxr-xr-x root     root       141108 2013-01-03 22:55 libssl3.so
-rw-r--r-- root     root         8796 2013-01-03 22:55 libxpcom.so
-rwxr-xr-x root     root     17597000 2013-01-03 22:55 libxul.so
-rw-r--r-- root     root      4121578 2013-01-03 22:55 omni.ja
-rw-r--r-- root     root          133 2013-01-03 22:55 platform.ini
-rwxr-xr-x root     root         2732 2013-01-03 22:55 plugin-container
-rw-r--r-- root     root         1133 2013-01-03 22:55 precomplete
-rw-r--r-- root     root           39 2013-01-03 22:55 removed-files
-rwxr-xr-x root     root         8915 2013-01-03 22:55 run-mozilla.sh
drwxr-xr-x root     root              2013-01-03 22:27 updated
-rwxr-xr-x root     root        75104 2013-01-03 22:55 updater
drwxr-xr-x root     root              2013-01-03 22:55 webapps
root@android:/ # ls -l /system/b2g/updated/                                    
-rw-r--r-- root     root          425 2013-01-03 22:26 application.ini
-rwxr-xr-x root     root        96724 2013-01-03 22:26 b2g
-rw-r--r-- root     root           36 2013-01-03 22:26 chrome.manifest
drwxr-xr-x root     root              2013-01-03 22:27 components
-rw-r--r-- root     root         4003 2013-01-03 22:26 crashreporter.ini
drwxr-xr-x root     root              2013-01-03 22:26 defaults
-rw-r--r-- root     root          126 2013-01-03 22:26 dependentlibs.list
drwxr-xr-x root     root              2013-01-03 22:27 dictionaries
-rw-r--r-- root     root       208716 2013-01-03 22:26 libfreebl3.so
-rw-r--r-- root     root       102300 2013-01-03 22:26 libmozglue.so
-rw-r--r-- root     root       343432 2013-01-03 22:26 libmozsqlite3.so
-rw-r--r-- root     root       129640 2013-01-03 22:26 libnspr4.so
-rw-r--r-- root     root       607344 2013-01-03 22:26 libnss3.so
drwxr-xr-x root     root              2013-01-03 22:26 modules
-rw-r--r-- root     root         1133 2013-01-03 22:26 precomplete
drwxr-xr-x root     root              2013-01-03 22:27 webapps


Now where did I get the updated files in the main directory from? And why is there still an updated/ directory with an incomplete set?
So /system getting full will be addressed by bug 821168

When the updater does the "uncompressing" stage, it remounts /system RW and then makes backups of the /system/b2g files, and unpacks and applies the .mar file.

If it runs out of space, then it removes the backup files, but leaves the updated directory.

The updated directory will get cleaned up by the next update.

So if this bug is about /system getting full, then it's a dup of bug 821168
Interestingly, I didn't end up bricked this time, but possibly because I didn't try to run update again after it failed but just rebooted. And the phone looks like it actually successfully updated even though I saw the same "status: failed: 7" in logcat, see this excerpt:

I/GeckoUpdater(  543): Progress [ ================================================== ]
I/GeckoUpdater(  543): Finished applying update
I/GonkAutoMounter(  543): Mounted /system partition as read-only
I/Gecko   (  104): *** AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
E/GeckoConsole(  104): AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
I/Gecko   (  104): UpdatePrompt: Update error, state: failed, errorCode: 7
I/Gecko   (  104): UpdatePrompt: Setting gecko.updateStatus: Install Pending
I/Gecko   (  104): *** AUS:SVC UpdateManager:refreshUpdateStatus - Notifying observers that the update was staged. state: pending, status: failed: 7
E/GeckoConsole(  104): AUS:SVC UpdateManager:refreshUpdateStatus - Notifying observers that the update was staged. state: pending, status: failed: 7
Once /system/b2g/updated is complete, then the phone reboots and this little dance (I represented it using sh commands, its actually C++ code) at boot time:

rm -rf /system/b2g.bak
mv /system/b2g /system/b2g.bak
mv /system/b2g.bak/updated /system/b2g

The above should only happen if /data/local/updates/0/update.status contains the word "applied", but from the logcat above, it contains a "failed: 7".

If the dance fails for some reason, then on the next reboot, b2g.sh will detect that /system/b2g.bak exists and /system/b2g doesn't and it will rename /system/b2g.bak back to /system/b2g

So, it shouldn't have actually applied the update (the only way to know would be to check /data/local/0/update.status).
I manually rebooted, but as you can see above, /system/b2g/ got "magically" replaced with an updated version at that boot, even though update.status had the wrong value and updated/ wasn't complete. I'm confused as well.
So what could be happening is that one of the reboots is detecting that the apply failed and somehow its managing to try again and suceededing.

Once bug 813770 lands, then it won't automatically try to apply the updates at boot time.
Bug 821168 has landed on inbound, so once it gets uplifted, then the amount of space needed to apply the update will be dramatically reduced and the updates should be working better.
Here's the state of updated/ just before failing today's update:

root@android:/ # df                                                            
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   231M     2M   4096
/data                  152M    82M    70M   4096
/persist                 1M   772K   764K   4096
/cache                  57M    13M    43M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096
root@android:/ # ls -l /system/b2g/updated/                                    
-rw-r--r-- root     root          825 2013-01-06 22:53 Throbber-small.gif
-rw-r--r-- root     root          825 2013-01-06 22:51 Throbber-small.gif.moz-backup
-rw-r--r-- root     root          425 2013-01-06 22:51 application.ini.moz-backup
-rwxr-xr-x root     root        96724 2013-01-06 22:51 b2g.moz-backup
-rw-r--r-- root     root           36 2013-01-06 22:51 chrome.manifest.moz-backup
drwxr-xr-x root     root              2013-01-06 22:52 components
-rw-r--r-- root     root         4003 2013-01-06 22:51 crashreporter.ini.moz-backup
drwxr-xr-x root     root              2013-01-06 22:52 defaults
-rw-r--r-- root     root          126 2013-01-06 22:51 dependentlibs.list.moz-backup
drwxr-xr-x root     root              2013-01-06 22:52 dictionaries
-rwxr-xr-x root     root       208716 2013-01-06 22:51 libfreebl3.so.moz-backup
-rwxr-xr-x root     root       102300 2013-01-06 22:51 libmozglue.so.moz-backup
-rwxr-xr-x root     root       343432 2013-01-06 22:51 libmozsqlite3.so.moz-backup
-rwxr-xr-x root     root       129640 2013-01-06 22:51 libnspr4.so.moz-backup
-rwxr-xr-x root     root       607344 2013-01-06 22:51 libnss3.so.moz-backup
-rwxr-xr-x root     root       390096 2013-01-06 22:51 libnssckbi.so.moz-backup
-rwxr-xr-x root     root        95636 2013-01-06 22:51 libnssutil3.so.moz-backup
-rwxr-xr-x root     root         8488 2013-01-06 22:51 libplc4.so.moz-backup
-rwxr-xr-x root     root         5584 2013-01-06 22:51 libplds4.so.moz-backup
-rwxr-xr-x root     root        86672 2013-01-06 22:51 libsmime3.so.moz-backup
-rwxr-xr-x root     root       135316 2013-01-06 22:51 libsoftokn3.so.moz-backup
-rwxr-xr-x root     root       141108 2013-01-06 22:53 libssl3.so
-rwxr-xr-x root     root       141108 2013-01-06 22:51 libssl3.so.moz-backup
-rw-r--r-- root     root         8796 2013-01-06 22:53 libxpcom.so
-rw-r--r-- root     root         8796 2013-01-06 22:51 libxpcom.so.moz-backup
-rwxr-xr-x root     root     17597132 2013-01-06 22:53 libxul.so
-rwxr-xr-x root     root     17597000 2013-01-06 22:51 libxul.so.moz-backup
-rw-r--r-- root     root      4125123 2013-01-06 22:53 omni.ja
-rw-r--r-- root     root      4121578 2013-01-06 22:51 omni.ja.moz-backup
-rw-r--r-- root     root          133 2013-01-06 22:53 platform.ini
-rw-r--r-- root     root          133 2013-01-06 22:51 platform.ini.moz-backup
-rwxr-xr-x root     root         2732 2013-01-06 22:53 plugin-container
-rwxr-xr-x root     root         2732 2013-01-06 22:51 plugin-container.moz-backup
-rw-r--r-- root     root         1133 2013-01-06 22:53 precomplete
-rw-r--r-- root     root         1133 2013-01-06 22:51 precomplete.moz-backup
-rw-r--r-- root     root           39 2013-01-06 22:53 removed-files
-rw-r--r-- root     root           39 2013-01-06 22:51 removed-files.moz-backup
-rwxr-xr-x root     root         8915 2013-01-06 22:53 run-mozilla.sh
-rwxr-xr-x root     root         8915 2013-01-06 22:51 run-mozilla.sh.moz-backup
-rwxr-xr-x root     root        75104 2013-01-06 22:53 updater
-rwxr-xr-x root     root        75104 2013-01-06 22:51 updater.moz-backup
drwxr-xr-x root     root              2013-01-06 22:52 webapps
drwxr-xr-x root     root              2013-01-06 22:52 webapps

And now after it failed:

root@android:/ # ls -l /system/b2g/updated/                                    
-rw-r--r-- root     root          425 2013-01-06 22:51 application.ini
-rwxr-xr-x root     root        96724 2013-01-06 22:51 b2g
-rw-r--r-- root     root           36 2013-01-06 22:51 chrome.manifest
drwxr-xr-x root     root              2013-01-06 22:53 components
-rw-r--r-- root     root         4003 2013-01-06 22:51 crashreporter.ini
drwxr-xr-x root     root              2013-01-06 22:52 defaults
-rw-r--r-- root     root          126 2013-01-06 22:51 dependentlibs.list
drwxr-xr-x root     root              2013-01-06 22:53 dictionaries
-rwxr-xr-x root     root       208716 2013-01-06 22:51 libfreebl3.so
-rwxr-xr-x root     root       102300 2013-01-06 22:51 libmozglue.so
-rwxr-xr-x root     root       343432 2013-01-06 22:51 libmozsqlite3.so
-rwxr-xr-x root     root       129640 2013-01-06 22:51 libnspr4.so
-rwxr-xr-x root     root       607344 2013-01-06 22:51 libnss3.so
-rw-r--r-- root     root         1133 2013-01-06 22:51 precomplete
drwxr-xr-x root     root              2013-01-06 22:53 webapps
root@android:/ # df
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   154M    79M   4096
/data                  152M    82M    70M   4096
/persist                 1M   772K   764K   4096
/cache                  57M    13M    43M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096

Now those .moz-backup files nicely explain why we take up even more space.


I'll now try to reproduce what I had when it got bricked:
After failing the update, we don't display that to the user, really, we instead show a notification saying "1 update availble. - Tap to download." I'll follow that advice.
What's interesting with that is that it goes to "Uncompressing…" directly in the UI, but logcat doesn't show the usual output it has when uncompressing, and there's no change at all in the updated/ directory.

All I see in logcat is this:

I/Gecko   (  105): *** AUS:SVC Creating Downloader
E/GeckoConsole(  105): AUS:SVC Creating Downloader
I/Gecko   (  105): *** AUS:SVC UpdateService:_downloadUpdate
E/GeckoConsole(  105): AUS:SVC UpdateService:_downloadUpdate
I/Gecko   (  105): *** AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
E/GeckoConsole(  105): AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
I/Gecko   (  105): *** AUS:SVC Downloader:_selectPatch - found existing patch with state: pending
E/GeckoConsole(  105): AUS:SVC Downloader:_selectPatch - found existing patch with state: pending
I/Gecko   (  105): *** AUS:SVC Downloader:_selectPatch - already downloaded and staged
E/GeckoConsole(  105): AUS:SVC Downloader:_selectPatch - already downloaded and staged
I/Gecko   (  105): *** AUS:SVC Downloader:downloadUpdate - no patch to download
E/GeckoConsole(  105): AUS:SVC Downloader:downloadUpdate - no patch to download
I/Gecko   (  105): *** AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
E/GeckoConsole(  105): AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
I/GeckoDump(  105): XXX FIXME : Got a mozContentEvent: update-available-result
D

And no messages following that would be connected to the update.

At this stage, I tried rebooting the device (I initially waited a long time without any change before doing that, but as it looks like actually nothing is happening, I guess I can speed that up).


Interestingly, it came up again with the update applied and working fine.
Hey guys.

The log from the failed update is really strange -- it looks like there was a write failure, but the updater still thinks the update status is "pending". It's possible the updater is overwriting this "failed: 7" status with "pending", and then on the next reboot, continuing on with the update anyway.

I'll be poking around in this code tonight to verify if the update.status has the wrong value or not when the standalone updater fails for some reason.
Yup, it's as I suspected. When the error is WRITE_ERROR, the status is being reset to pending:

https://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#1043

See line 1053 in that if statement :(
OK, I've finally been able to reproduce the majority of the behavior that Robert is seeing here. Here's what I've boiled this down to:

- When any error occurs in the "background apply" (update extraction) process, nsUpdateService will set the update's status to "pending", so that the extraction is tried again the next time the b2g process starts (or equivalently for us, when the phone is rebooted)

- On the next b2g startup, before gecko is even booted, the updater runs again, trying to apply the update. Unfortunately, this code path is still preloading libmozglue, which can cause the /system partition not to remount as read-only since the updater process holds open the filehandle to /system/b2g/libmozglue.so. When this error is encountered, the Gonk updater code doesn't mess around, and issues a system reboot, starting an endless loop of of reboots

- The updater is also apparently using the /system/b2g directory as it's playground when it tries to apply from the b2g process startup. During normal "background updates", we use a staging directory at /system/b2g/updated. 

This is __really__ bad, because any critical errors in updater land can become unrecoverable. If I force the apply process to fail by filling up the /system partition with a large file, the updater will fail catastrophically, leaving moz-backup files around in /system/b2g, and completely bricking the device.
(In reply to Marshall Culpepper [:marshall_law] from comment #23)
> OK, I've finally been able to reproduce the majority of the behavior that
> Robert is seeing here.

Great! I hope we can still get those things under control to make updates a more stable experience for everyone! :)
Component: General → Gaia
Component: Gaia → General
Attachment #702004 - Flags: review?(robert.bugzilla)
Comment on attachment 702004 [details] [diff] [review]
part 2: don't stage updates at b2g startup - v1

># HG changeset patch
># User Marshall Culpepper <marshall@mozilla.com>
>
>Bug 823695: Part 2 - Don't try to stage pending updates at b2g startup time. r=rstrong
>
>diff --git a/toolkit/xre/nsUpdateDriver.cpp b/toolkit/xre/nsUpdateDriver.cpp
>index 0074c8a..1eb5f35 100644
>--- a/toolkit/xre/nsUpdateDriver.cpp
>+++ b/toolkit/xre/nsUpdateDriver.cpp
>@@ -941,16 +941,22 @@ ProcessUpdates(nsIFile *greDir, nsIFile *appDir, nsIFile *updRootDir,
>     nsCOMPtr<nsIFile> versionFile;
>     // Remove the update if the update application version file doesn't exist
>     // or if the update's application version is less than the current
>     // application version.
>     if (!GetVersionFile(updatesDir, versionFile) ||
>         IsOlderVersion(versionFile, appVersion)) {
>       updatesDir->Remove(true);
>     } else {
>+#ifdef MOZ_WIDGET_GONK
>+      if (restart) {
>+        LOG("Deferring pending update staging in Gonk\n");
As elsewhere LOG(("..."))
LOG(("Deferring pending update staging in Gonk\n"));

>+        return NS_OK;
>+      }
>+#endif

r=me as long as tests pass.

It sounds like you might want to always use staging and never fallback to applying an update directly? I haven't put any thought into the ramifications of doing so but if you are fairly certain this is the case please file a bug to add this capability to the update process.
Attachment #702004 - Flags: review?(robert.bugzilla) → review+
Attachment #702003 - Flags: review?(fabrice) → review+
accidentally missed part of this patch, tagging dhylands for the missed part of
the patch.
Attachment #702004 - Attachment is obsolete: true
Attachment #702138 - Flags: review?(dhylands)
This version of the patch moves the unsetenv to the updater process. unsetenv in the parent isn't thread safe, and would cause child apps launched after the updater to fail (since the child apps rely on LD_PRELOAD)
Attachment #702138 - Attachment is obsolete: true
Attachment #702138 - Flags: review?(dhylands)
Attachment #702153 - Flags: review?(jones.chris.g)
Comment on attachment 702153 [details] [diff] [review]
Version which does the unsetenv in updater rather than in the parent.

r=me with removed check for exec() return value, which is redundant.  (It returning at all signals an error.)
Attachment #702153 - Flags: review?(jones.chris.g) → review+
You need to log in before you can comment on or make changes to this bug.