Closed Bug 823695 Opened 12 years ago Closed 12 years ago

OTA updates leave otoro in unbootable state, needing re-flashing

Categories

(Firefox OS Graveyard :: General, defect, P1)

Hardware: ARM
OS: Gonk (Firefox OS)

Tracking

Status: RESOLVED FIXED
Target Milestone: B2G C4 (2jan on)
blocking-basecamp: +
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed

People

(Reporter: kairo, Assigned: marshall)

References

Details

Attachments

(6 files, 2 obsolete files)

This might be related to other update bugs, including bug 821296, but I think it deserves its own report because it's worrisome. In the last week or two, several updates have left my otoro unable to boot.

The last build I had installed was 2012-12-18-07-02-00. Yesterday (the 19th) an update was offered and downloaded, but the download apparently didn't complete correctly: when I looked at the screen again, another notification said an update was available. I tried to apply it, but it sat there with no download progress, just as it usually looks while applying updates, so I left it like that for a few hours and it never finished. That's the same pattern I had seen with the other updates that left the device unbootable. When I then forcibly rebooted the device, it once again wouldn't boot; all it shows is the initial screen with the "mozilla" wordmark.

We should never let an update put the device into this state. Users cannot get out of it themselves, and since the only thing that helps is re-flashing, they lose everything stored in the /data partition, which is a lot of the user's data.
blocking-basecamp: --- → ?
Ran into this too, in dupe bug 824002
blocking-basecamp: ? → +
In my case I can't reflash, it hangs at boot and flash.sh doesn't connect. So it really is bricked.
Assignee: nobody → marshall
(In reply to Lucas Adamski from comment #4)
> In my case I can't reflash, it hangs at boot and flash.sh doesn't connect.
> So it really is bricked.

Hmm, strange, I had similar logcat messages and I could re-flash. Also, when you can get a logcat, you usually should be able to run the flashing as well.
I couldn't get flash.sh to connect whether I started it before the device booted or after. But it did work after I did an "adb shell reboot". Doesn't make any sense, but there it is.
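For anyone else hitting the same hang, the workaround above amounts to rebooting over adb first and only then flashing. A minimal host-side sketch, assuming flash.sh is the usual flash script from the B2G checkout and the device is still visible to adb; the 30-second wait is an arbitrary guess, not a measured value:

# Reboot the device over adb first; flash.sh would not connect otherwise.
adb shell reboot
# Give the device a moment to come back up in a flashable state, then flash.
sleep 30
./flash.sh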
Target Milestone: --- → B2G C3 (12dec-1jan)
This logcat may show the same thing as the first one from Lucas, but I thought it best to attach the post-update logcat from the same phone whose update logcat I'm also giving you.
This update.log (from /data/local/updates/0/) seems to stop right in the middle of a line.
I took a look at the failed logcat, and it may be failing due to running out of space in /data. The key information is here:

I/GeckoUpdater( 3103): Progress [ ================================================= ]
I/GeckoUpdater( 3103): Progress [ ================================================== ]
I/GeckoUpdater( 3103): Finished applying update
I/GonkAutoMounter( 3103): Mounted /system partition as read-only
I/Gecko ( 104): *** AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
E/GeckoConsole( 104): AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
I/Gecko ( 104): UpdatePrompt: Update error, state: failed, errorCode: 7
I/Gecko ( 104): UpdatePrompt: Setting gecko.updateStatus: Install Pending

The "failed: 7" is a write failure, which I also saw when I was seeing disk-full errors. Once bug 785124 lands, the update will try to download to the sdcard instead of /data, which will reduce the amount of space required in /data.

The update.log file also appears to be incomplete. Curiously, its size is exactly 8192 bytes, which is also suggestive of a block of the log file that couldn't be written to disk.

The "bricked" logcat shows b2g restarting, which typically means that libxul.so failed to load. You would be able to confirm this by doing:

adb shell
stop b2g
/system/bin/b2g.sh

and observing the output in the adb shell terminal.
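A host-side variant of the same checks, run entirely through adb (just a sketch; the status-file path is the one shown in the log above):

# See how the last staged update ended up.
adb shell cat /data/local/updates/0/update.status

# Check free space on the partitions the updater touches.
adb shell df

# If b2g keeps restarting, run it by hand to see whether libxul.so loads.
adb shell stop b2g
adb shell /system/bin/b2g.sh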
Unfortunately I can't look at the bricked system right now, as I re-flashed it with a fresh image (verifying bug 821296 in the process).
Target Milestone: B2G C3 (12dec-1jan) → B2G C4 (2jan on)
Priority: -- → P1
(In reply to Dave Hylands [:dhylands] from comment #10)
> I took a look at the failed logcat, and it may be failing due to running out
> of space in /data.

Wrong. Here are a few df states during an update I ran just now:

root@android:/ # df
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   189M    44M   4096
/data                  152M    72M    80M   4096
/persist                 1M   772K   764K   4096
/cache                  57M     7M    49M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096

root@android:/ # df
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   220M    13M   4096
/data                  152M    72M    80M   4096
/persist                 1M   772K   764K   4096
/cache                  57M     7M    49M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096

So it's /system that keeps losing free space during what ultimately fails. The second df was taken a short time before it failed (and before the uncompressing would have finished - we now at least show in the UI that we're uncompressing). This is after it had failed:

root@android:/ # df
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   154M    79M   4096
/data                  152M    72M    80M   4096
/persist                 1M   772K   764K   4096
/cache                  57M     7M    49M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096

Here's some more info on the state the system is in after that:

root@android:/ # ls -l /system/b2g/
-rw-r--r-- root root      825 2013-01-01 16:47 Throbber-small.gif
-rw-r--r-- root root      425 2013-01-01 16:47 application.ini
-rwxr-xr-x root root    96724 2013-01-01 16:47 b2g
-rw-r--r-- root root       36 2013-01-01 16:47 chrome.manifest
drwxr-xr-x root root          2013-01-01 16:47 components
-rw-r--r-- root root     4003 2013-01-01 16:47 crashreporter.ini
drwxr-xr-x root root          2013-01-01 16:47 defaults
-rw-r--r-- root root      126 2013-01-01 16:47 dependentlibs.list
drwxr-xr-x root root          2013-01-01 16:47 dictionaries
-rw-r--r-- root root   208716 2013-01-01 16:47 libfreebl3.so
-rw-r--r-- root root   102300 2013-01-01 16:47 libmozglue.so
-rw-r--r-- root root   343432 2013-01-01 16:47 libmozsqlite3.so
-rw-r--r-- root root   129640 2013-01-01 16:47 libnspr4.so
-rw-r--r-- root root   607344 2013-01-01 16:47 libnss3.so
-rw-r--r-- root root   390096 2013-01-01 16:47 libnssckbi.so
-rw-r--r-- root root    95636 2013-01-01 16:47 libnssutil3.so
-rw-r--r-- root root     8488 2013-01-01 16:47 libplc4.so
-rw-r--r-- root root     5584 2013-01-01 16:47 libplds4.so
-rw-r--r-- root root    86672 2013-01-01 16:47 libsmime3.so
-rw-r--r-- root root   135316 2013-01-01 16:47 libsoftokn3.so
-rw-r--r-- root root   141108 2013-01-01 16:47 libssl3.so
-rw-r--r-- root root     8796 2013-01-01 16:47 libxpcom.so
-rw-r--r-- root root 17593896 2013-01-01 16:47 libxul.so
drwxr-xr-x root root          2013-01-01 16:47 modules
-rw-r--r-- root root  4121343 2013-01-01 16:47 omni.ja
-rw-r--r-- root root      133 2013-01-01 16:47 platform.ini
-rwxr-xr-x root root     2732 2013-01-01 16:47 plugin-container
-rw-r--r-- root root     1133 2013-01-01 16:47 precomplete
-rw-r--r-- root root       39 2013-01-01 16:47 removed-files
-rw-r--r-- root root     8915 2013-01-01 16:47 run-mozilla.sh
drwxr-xr-x root root          2013-01-03 22:27 updated
-rwxr-xr-x root root    75104 2013-01-01 16:47 updater
drwxr-xr-x root root          2013-01-01 16:12 webapps

root@android:/ # ls -l /system/b2g/updated/
-rw-r--r-- root root      425 2013-01-03 22:26 application.ini
-rwxr-xr-x root root    96724 2013-01-03 22:26 b2g
-rw-r--r-- root root       36 2013-01-03 22:26 chrome.manifest
drwxr-xr-x root root          2013-01-03 22:27 components
-rw-r--r-- root root     4003 2013-01-03 22:26 crashreporter.ini
drwxr-xr-x root root          2013-01-03 22:26 defaults
-rw-r--r-- root root      126 2013-01-03 22:26 dependentlibs.list
drwxr-xr-x root root          2013-01-03 22:27 dictionaries
-rw-r--r-- root root   208716 2013-01-03 22:26 libfreebl3.so
-rw-r--r-- root root   102300 2013-01-03 22:26 libmozglue.so
-rw-r--r-- root root   343432 2013-01-03 22:26 libmozsqlite3.so
-rw-r--r-- root root   129640 2013-01-03 22:26 libnspr4.so
-rw-r--r-- root root   607344 2013-01-03 22:26 libnss3.so
drwxr-xr-x root root          2013-01-03 22:26 modules
-rw-r--r-- root root     1133 2013-01-03 22:26 precomplete
drwxr-xr-x root root          2013-01-03 22:27 webapps

So it looks to me like we put some huge temporary data into /system during decompressing, and that makes us fail to put the actual updated binaries into the updated/ directory. I'll back up /system/b2g/ before doing a reboot, but I fear we actually end up replacing the real one with the incomplete updated/ one, and then the phone is bricked. We should definitely not allow that to happen - do we have some way to avoid that?
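For repeating this kind of measurement, a small host-side loop makes the trend easier to see than re-running df by hand. This is just a sketch; the grep pattern is an assumption based on the mount names shown above:

# Print /system and /data usage every few seconds while an update is applied.
while true; do
  date
  adb shell df | grep -E '^/(system|data) '
  sleep 5
done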
Hmm, interesting - after a reboot I have this:

root@android:/ # ls -l /system/b2g/
-rw-r--r-- root root      825 2013-01-03 22:55 Throbber-small.gif
-rw-r--r-- root root      425 2013-01-03 22:55 application.ini
-rwxr-xr-x root root    96724 2013-01-03 22:55 b2g
-rw-r--r-- root root       36 2013-01-03 22:55 chrome.manifest
drwxr-xr-x root root          2013-01-03 22:55 components
-rw-r--r-- root root     4003 2013-01-03 22:55 crashreporter.ini
drwxr-xr-x root root          2013-01-03 22:55 defaults
-rw-r--r-- root root      126 2013-01-03 22:55 dependentlibs.list
drwxr-xr-x root root          2013-01-03 22:55 dictionaries
-rwxr-xr-x root root   208716 2013-01-03 22:55 libfreebl3.so
-rwxr-xr-x root root   102300 2013-01-03 22:55 libmozglue.so
-rwxr-xr-x root root   343432 2013-01-03 22:55 libmozsqlite3.so
-rwxr-xr-x root root   129640 2013-01-03 22:55 libnspr4.so
-rwxr-xr-x root root   607344 2013-01-03 22:55 libnss3.so
-rwxr-xr-x root root   390096 2013-01-03 22:55 libnssckbi.so
-rwxr-xr-x root root    95636 2013-01-03 22:55 libnssutil3.so
-rwxr-xr-x root root     8488 2013-01-03 22:55 libplc4.so
-rwxr-xr-x root root     5584 2013-01-03 22:55 libplds4.so
-rwxr-xr-x root root    86672 2013-01-03 22:55 libsmime3.so
-rwxr-xr-x root root   135316 2013-01-03 22:55 libsoftokn3.so
-rwxr-xr-x root root   141108 2013-01-03 22:55 libssl3.so
-rw-r--r-- root root     8796 2013-01-03 22:55 libxpcom.so
-rwxr-xr-x root root 17597000 2013-01-03 22:55 libxul.so
-rw-r--r-- root root  4121578 2013-01-03 22:55 omni.ja
-rw-r--r-- root root      133 2013-01-03 22:55 platform.ini
-rwxr-xr-x root root     2732 2013-01-03 22:55 plugin-container
-rw-r--r-- root root     1133 2013-01-03 22:55 precomplete
-rw-r--r-- root root       39 2013-01-03 22:55 removed-files
-rwxr-xr-x root root     8915 2013-01-03 22:55 run-mozilla.sh
drwxr-xr-x root root          2013-01-03 22:27 updated
-rwxr-xr-x root root    75104 2013-01-03 22:55 updater
drwxr-xr-x root root          2013-01-03 22:55 webapps

root@android:/ # ls -l /system/b2g/updated/
-rw-r--r-- root root      425 2013-01-03 22:26 application.ini
-rwxr-xr-x root root    96724 2013-01-03 22:26 b2g
-rw-r--r-- root root       36 2013-01-03 22:26 chrome.manifest
drwxr-xr-x root root          2013-01-03 22:27 components
-rw-r--r-- root root     4003 2013-01-03 22:26 crashreporter.ini
drwxr-xr-x root root          2013-01-03 22:26 defaults
-rw-r--r-- root root      126 2013-01-03 22:26 dependentlibs.list
drwxr-xr-x root root          2013-01-03 22:27 dictionaries
-rw-r--r-- root root   208716 2013-01-03 22:26 libfreebl3.so
-rw-r--r-- root root   102300 2013-01-03 22:26 libmozglue.so
-rw-r--r-- root root   343432 2013-01-03 22:26 libmozsqlite3.so
-rw-r--r-- root root   129640 2013-01-03 22:26 libnspr4.so
-rw-r--r-- root root   607344 2013-01-03 22:26 libnss3.so
drwxr-xr-x root root          2013-01-03 22:26 modules
-rw-r--r-- root root     1133 2013-01-03 22:26 precomplete
drwxr-xr-x root root          2013-01-03 22:27 webapps

Now where did I get the updated files in the main directory from? And why is there still an updated/ directory with an incomplete set?
/system getting full will be addressed by bug 821168.

When the updater does the "uncompressing" stage, it remounts /system read-write, makes backups of the /system/b2g files, and then unpacks and applies the .mar file. If it runs out of space, it removes the backup files but leaves the updated directory, which will get cleaned up by the next update.

So if this bug is only about /system getting full, then it's a dup of bug 821168.
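Roughly, the staging step described above amounts to something like the following shell-level approximation. This is only a sketch of why space usage spikes during staging (each patched file temporarily coexists with a .moz-backup copy); the real work is done by the native updater binary, not a script, and the file name below is just an example:

# Rough approximation of the "uncompressing" stage - not the real implementation.
mount -o remount,rw /system
mkdir -p /system/b2g/updated
# Before patching a file, a .moz-backup copy is kept so the update can be rolled back.
cp /system/b2g/updated/libxul.so /system/b2g/updated/libxul.so.moz-backup
# ... apply the .mar patch into /system/b2g/updated ...
# On success (and on failure) the .moz-backup files are removed, but a failed run
# leaves the partially-written updated/ directory behind for the next update to clean up.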
Interestingly, I didn't end up bricked this time, possibly because I didn't try to run the update again after it failed but just rebooted. And the phone looks like it actually updated successfully, even though I saw the same "status: failed: 7" in logcat - see this excerpt:

I/GeckoUpdater( 543): Progress [ ================================================== ]
I/GeckoUpdater( 543): Finished applying update
I/GonkAutoMounter( 543): Mounted /system partition as read-only
I/Gecko ( 104): *** AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
E/GeckoConsole( 104): AUS:SVC readStatusFile - status: failed: 7, path: /data/local/updates/0/update.status
I/Gecko ( 104): UpdatePrompt: Update error, state: failed, errorCode: 7
I/Gecko ( 104): UpdatePrompt: Setting gecko.updateStatus: Install Pending
I/Gecko ( 104): *** AUS:SVC UpdateManager:refreshUpdateStatus - Notifying observers that the update was staged. state: pending, status: failed: 7
E/GeckoConsole( 104): AUS:SVC UpdateManager:refreshUpdateStatus - Notifying observers that the update was staged. state: pending, status: failed: 7
Once /system/b2g/updated is complete, the phone reboots and performs this little dance at boot time (I've written it as sh commands; it's actually C++ code):

rm -rf /system/b2g.bak
mv /system/b2g /system/b2g.bak
mv /system/b2g.bak/updated /system/b2g

The above should only happen if /data/local/updates/0/update.status contains the word "applied", but in the logcat above it contains "failed: 7".

If the dance fails for some reason, then on the next reboot b2g.sh will detect that /system/b2g.bak exists while /system/b2g doesn't, and will rename /system/b2g.bak back to /system/b2g.

So it shouldn't actually have applied the update (the only way to know for sure would be to check /data/local/updates/0/update.status).
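The recovery check described above could be sketched in the same sh style like this. It's a sketch of the logic only, not the actual b2g.sh code, and the remount lines are an assumption about how the script gets write access to /system:

# If a previous apply step was interrupted, fall back to the backup copy.
if [ -d /system/b2g.bak ] && [ ! -d /system/b2g ]; then
  mount -o remount,rw /system
  mv /system/b2g.bak /system/b2g
  mount -o remount,ro /system
fi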
I manually rebooted, but as you can see above, /system/b2g/ got "magically" replaced with an updated version at that boot, even though update.status had the wrong value and updated/ wasn't complete. I'm confused as well.
So what could be happening is that one of the reboots detects that the apply failed and somehow manages to try again and succeed. Once bug 813770 lands, it won't automatically try to apply updates at boot time. Bug 821168 has landed on inbound, so once it gets uplifted, the amount of space needed to apply an update will be dramatically reduced and updates should work better.
Here's the state of updated/ just before today's update failed:

root@android:/ # df
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   231M     2M   4096
/data                  152M    82M    70M   4096
/persist                 1M   772K   764K   4096
/cache                  57M    13M    43M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096

root@android:/ # ls -l /system/b2g/updated/
-rw-r--r-- root root      825 2013-01-06 22:53 Throbber-small.gif
-rw-r--r-- root root      825 2013-01-06 22:51 Throbber-small.gif.moz-backup
-rw-r--r-- root root      425 2013-01-06 22:51 application.ini.moz-backup
-rwxr-xr-x root root    96724 2013-01-06 22:51 b2g.moz-backup
-rw-r--r-- root root       36 2013-01-06 22:51 chrome.manifest.moz-backup
drwxr-xr-x root root          2013-01-06 22:52 components
-rw-r--r-- root root     4003 2013-01-06 22:51 crashreporter.ini.moz-backup
drwxr-xr-x root root          2013-01-06 22:52 defaults
-rw-r--r-- root root      126 2013-01-06 22:51 dependentlibs.list.moz-backup
drwxr-xr-x root root          2013-01-06 22:52 dictionaries
-rwxr-xr-x root root   208716 2013-01-06 22:51 libfreebl3.so.moz-backup
-rwxr-xr-x root root   102300 2013-01-06 22:51 libmozglue.so.moz-backup
-rwxr-xr-x root root   343432 2013-01-06 22:51 libmozsqlite3.so.moz-backup
-rwxr-xr-x root root   129640 2013-01-06 22:51 libnspr4.so.moz-backup
-rwxr-xr-x root root   607344 2013-01-06 22:51 libnss3.so.moz-backup
-rwxr-xr-x root root   390096 2013-01-06 22:51 libnssckbi.so.moz-backup
-rwxr-xr-x root root    95636 2013-01-06 22:51 libnssutil3.so.moz-backup
-rwxr-xr-x root root     8488 2013-01-06 22:51 libplc4.so.moz-backup
-rwxr-xr-x root root     5584 2013-01-06 22:51 libplds4.so.moz-backup
-rwxr-xr-x root root    86672 2013-01-06 22:51 libsmime3.so.moz-backup
-rwxr-xr-x root root   135316 2013-01-06 22:51 libsoftokn3.so.moz-backup
-rwxr-xr-x root root   141108 2013-01-06 22:53 libssl3.so
-rwxr-xr-x root root   141108 2013-01-06 22:51 libssl3.so.moz-backup
-rw-r--r-- root root     8796 2013-01-06 22:53 libxpcom.so
-rw-r--r-- root root     8796 2013-01-06 22:51 libxpcom.so.moz-backup
-rwxr-xr-x root root 17597132 2013-01-06 22:53 libxul.so
-rwxr-xr-x root root 17597000 2013-01-06 22:51 libxul.so.moz-backup
-rw-r--r-- root root  4125123 2013-01-06 22:53 omni.ja
-rw-r--r-- root root  4121578 2013-01-06 22:51 omni.ja.moz-backup
-rw-r--r-- root root      133 2013-01-06 22:53 platform.ini
-rw-r--r-- root root      133 2013-01-06 22:51 platform.ini.moz-backup
-rwxr-xr-x root root     2732 2013-01-06 22:53 plugin-container
-rwxr-xr-x root root     2732 2013-01-06 22:51 plugin-container.moz-backup
-rw-r--r-- root root     1133 2013-01-06 22:53 precomplete
-rw-r--r-- root root     1133 2013-01-06 22:51 precomplete.moz-backup
-rw-r--r-- root root       39 2013-01-06 22:53 removed-files
-rw-r--r-- root root       39 2013-01-06 22:51 removed-files.moz-backup
-rwxr-xr-x root root     8915 2013-01-06 22:53 run-mozilla.sh
-rwxr-xr-x root root     8915 2013-01-06 22:51 run-mozilla.sh.moz-backup
-rwxr-xr-x root root    75104 2013-01-06 22:53 updater
-rwxr-xr-x root root    75104 2013-01-06 22:51 updater.moz-backup
drwxr-xr-x root root          2013-01-06 22:52 webapps
drwxr-xr-x root root          2013-01-06 22:52 webapps

And now after it failed:

root@android:/ # ls -l /system/b2g/updated/
-rw-r--r-- root root      425 2013-01-06 22:51 application.ini
-rwxr-xr-x root root    96724 2013-01-06 22:51 b2g
-rw-r--r-- root root       36 2013-01-06 22:51 chrome.manifest
drwxr-xr-x root root          2013-01-06 22:53 components
-rw-r--r-- root root     4003 2013-01-06 22:51 crashreporter.ini
drwxr-xr-x root root          2013-01-06 22:52 defaults
-rw-r--r-- root root      126 2013-01-06 22:51 dependentlibs.list
drwxr-xr-x root root          2013-01-06 22:53 dictionaries
-rwxr-xr-x root root   208716 2013-01-06 22:51 libfreebl3.so
-rwxr-xr-x root root   102300 2013-01-06 22:51 libmozglue.so
-rwxr-xr-x root root   343432 2013-01-06 22:51 libmozsqlite3.so
-rwxr-xr-x root root   129640 2013-01-06 22:51 libnspr4.so
-rwxr-xr-x root root   607344 2013-01-06 22:51 libnss3.so
-rw-r--r-- root root     1133 2013-01-06 22:51 precomplete
drwxr-xr-x root root          2013-01-06 22:53 webapps

root@android:/ # df
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                234M   154M    79M   4096
/data                  152M    82M    70M   4096
/persist                 1M   772K   764K   4096
/cache                  57M    13M    43M   4096
/mnt/sdcard              1G   214M     1G   4096
/mnt/secure/asec         1G   214M     1G   4096

Those .moz-backup files nicely explain why we take up even more space. I'll now try to reproduce what happened when the device got bricked: after the update fails, we don't really tell the user about the failure; instead we show a notification saying "1 update available. - Tap to download." I'll follow that advice.
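To quantify how much extra space the staging directory plus its .moz-backup copies consume, one could check it directly on the device. A sketch only - du may not be available in the stock toolbox, in which case busybox or adding up the ls -l sizes on the host is needed:

# Size of the staging directory, backups included, in KB.
adb shell du -sk /system/b2g/updated
# Compare with overall /system usage.
adb shell df | grep '^/system '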
What's interesting there is that the UI goes straight to "Uncompressing…", but logcat doesn't show the usual output it has when uncompressing, and there's no change at all in the updated/ directory. All I see in logcat is this:

I/Gecko ( 105): *** AUS:SVC Creating Downloader
E/GeckoConsole( 105): AUS:SVC Creating Downloader
I/Gecko ( 105): *** AUS:SVC UpdateService:_downloadUpdate
E/GeckoConsole( 105): AUS:SVC UpdateService:_downloadUpdate
I/Gecko ( 105): *** AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
E/GeckoConsole( 105): AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
I/Gecko ( 105): *** AUS:SVC Downloader:_selectPatch - found existing patch with state: pending
E/GeckoConsole( 105): AUS:SVC Downloader:_selectPatch - found existing patch with state: pending
I/Gecko ( 105): *** AUS:SVC Downloader:_selectPatch - already downloaded and staged
E/GeckoConsole( 105): AUS:SVC Downloader:_selectPatch - already downloaded and staged
I/Gecko ( 105): *** AUS:SVC Downloader:downloadUpdate - no patch to download
E/GeckoConsole( 105): AUS:SVC Downloader:downloadUpdate - no patch to download
I/Gecko ( 105): *** AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
E/GeckoConsole( 105): AUS:SVC readStatusFile - status: pending, path: /data/local/updates/0/update.status
I/GeckoDump( 105): XXX FIXME : Got a mozContentEvent: update-available-result D

No messages after that are connected to the update. At this point I tried rebooting the device (I initially waited a long time without any change before doing that, but since nothing actually seems to be happening, I guess I can speed that up). Interestingly, it came up again with the update applied and working fine.
Hey guys. The log from the failed update is really strange -- it looks like there was a write failure, but the updater still thinks the update status is "pending". It's possible the updater is overwriting this "failed: 7" status with "pending", and then on the next reboot, continuing on with the update anyway. I'll be poking around in this code tonight to verify if the update.status has the wrong value or not when the standalone updater fails for some reason.
Yup, it's as I suspected. When the error is WRITE_ERROR, the status is being reset to "pending":
https://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#1043
See line 1053 in that if statement :(
OK, I've finally been able to reproduce the majority of the behavior that Robert is seeing here. Here's what I've boiled this down to:

- When any error occurs in the "background apply" (update extraction) process, nsUpdateService sets the update's status to "pending", so the extraction is tried again the next time the b2g process starts (or, equivalently for us, when the phone is rebooted).

- On the next b2g startup, before Gecko is even booted, the updater runs again and tries to apply the update. Unfortunately, this code path still preloads libmozglue, which can prevent the /system partition from being remounted read-only, since the updater process holds an open file handle to /system/b2g/libmozglue.so. When this error is encountered, the Gonk updater code doesn't mess around and issues a system reboot, starting an endless loop of reboots.

- The updater also apparently uses the /system/b2g directory as its playground when it tries to apply from b2g process startup. During normal "background updates", we use a staging directory at /system/b2g/updated. This is __really__ bad, because any critical error in updater land can become unrecoverable. If I force the apply process to fail by filling up the /system partition with a large file, the updater fails catastrophically, leaving moz-backup files around in /system/b2g and completely bricking the device.
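For reference, forcing that failure can be approximated from a root adb shell along these lines. This is only a sketch: the filler file name and size are arbitrary, and it will deliberately break the next update attempt, so only try it on a device you're willing to re-flash:

# Remount /system writable and eat most of its remaining free space.
mount -o remount,rw /system
dd if=/dev/zero of=/system/filler.bin bs=1048576 count=40
# Trigger an OTA update from Settings; the apply step should now hit a write error.
# Clean up afterwards:
rm /system/filler.bin
mount -o remount,ro /system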
(In reply to Marshall Culpepper [:marshall_law] from comment #23) > OK, I've finally been able to reproduce the majority of the behavior that > Robert is seeing here. Great! I hope we can still get those things under control to make updates a more stable experience for everyone! :)
Component: General → Gaia
Component: Gaia → General
Attachment #702004 - Flags: review?(robert.bugzilla)
Comment on attachment 702004 [details] [diff] [review]
part 2: don't stage updates at b2g startup - v1

># HG changeset patch
># User Marshall Culpepper <marshall@mozilla.com>
>
>Bug 823695: Part 2 - Don't try to stage pending updates at b2g startup time. r=rstrong
>
>diff --git a/toolkit/xre/nsUpdateDriver.cpp b/toolkit/xre/nsUpdateDriver.cpp
>index 0074c8a..1eb5f35 100644
>--- a/toolkit/xre/nsUpdateDriver.cpp
>+++ b/toolkit/xre/nsUpdateDriver.cpp
>@@ -941,16 +941,22 @@ ProcessUpdates(nsIFile *greDir, nsIFile *appDir, nsIFile *updRootDir,
>   nsCOMPtr<nsIFile> versionFile;
>   // Remove the update if the update application version file doesn't exist
>   // or if the update's application version is less than the current
>   // application version.
>   if (!GetVersionFile(updatesDir, versionFile) ||
>       IsOlderVersion(versionFile, appVersion)) {
>     updatesDir->Remove(true);
>   } else {
>+#ifdef MOZ_WIDGET_GONK
>+    if (restart) {
>+      LOG("Deferring pending update staging in Gonk\n");

As elsewhere, LOG(("...")):
LOG(("Deferring pending update staging in Gonk\n"));

>+      return NS_OK;
>+    }
>+#endif

r=me as long as tests pass.

It sounds like you might want to always use staging and never fall back to applying an update directly? I haven't put any thought into the ramifications of doing so, but if you are fairly certain this is the case, please file a bug to add this capability to the update process.
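Assuming the patch lands with that log message as written, the new code path can be spot-checked on a device by watching logcat from the host during boot (a sketch; the exact tag/format of the LOG() output may differ):

# Watch for the deferral message while rebooting with a pending update staged.
adb reboot
adb wait-for-device
adb logcat | grep "Deferring pending update staging in Gonk"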
Attachment #702004 - Flags: review?(robert.bugzilla) → review+
Attachment #702003 - Flags: review?(fabrice) → review+
Accidentally missed part of this patch; tagging dhylands to review the missed part.
Attachment #702004 - Attachment is obsolete: true
Attachment #702138 - Flags: review?(dhylands)
This version of the patch moves the unsetenv call into the updater process. Calling unsetenv in the parent isn't thread-safe, and it would cause child apps launched after the updater to fail (since the child apps rely on LD_PRELOAD).
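As a quick sanity check that LD_PRELOAD is still present in a child app's environment (and absent from the updater's), one can inspect /proc from the host. A sketch only, with <PID> standing in for a process id found via ps:

# Find the process ids of b2g and its children.
adb shell ps | grep b2g
# Dump the environment of one of them; LD_PRELOAD should be listed.
adb shell cat /proc/<PID>/environ | tr '\0' '\n' | grep LD_PRELOAD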
Attachment #702138 - Attachment is obsolete: true
Attachment #702138 - Flags: review?(dhylands)
Attachment #702153 - Flags: review?(jones.chris.g)
Comment on attachment 702153 [details] [diff] [review]
Version which does the unsetenv in updater rather than in the parent.

r=me with the check for exec()'s return value removed, which is redundant. (It returning at all signals an error.)
Attachment #702153 - Flags: review?(jones.chris.g) → review+