Closed
Bug 1143607
Opened 9 years ago
Closed 8 years ago
2nd run failure on emulator-x86-kk
Categories
(Firefox OS Graveyard :: Emulator, defect, P1)
Tracking
(tracking-b2g:backlog)
RESOLVED
FIXED
tracking-b2g | backlog |
People
(Reporter: swu, Assigned: jdai)
References
Details
Attachments
(6 files)
32.83 KB,
text/x-log
|
Details | |
58 bytes,
text/x-github-pull-request
|
tzimmermann
:
review+
|
Details | Review |
62 bytes,
text/x-github-pull-request
|
tzimmermann
:
review+
|
Details | Review |
809 bytes,
patch
|
Details | Diff | Splinter Review | |
21.73 KB,
text/plain
|
Details | |
1.08 KB,
patch
|
Details | Diff | Splinter Review |
After building emulator-x86-kk, we are only able to run run-emulator.sh for the first time. Attached is the error log, looks like it's related to device storage or indexedDB.
Reporter | ||
Updated•9 years ago
|
Blocks: b2g-emulator-x86-KK
Comment 2•9 years ago
|
||
(In reply to Shian-Yow Wu [:swu] from comment #0) > Created attachment 8577944 [details] > Error log. > > After building emulator-x86-kk, we are only able to run run-emulator.sh for > the first time. Attached is the error log, looks like it's related to > device storage or indexedDB. I hit the same IndexedDB error. I can get only a black screen emulator.
Assignee | ||
Comment 3•9 years ago
|
||
From my investigation it appears that 2nd run emulator those files become empty. data/local/storage/permanent/chrome/.metadata data/local/storage/default/12+f+app+++keyboard.gaiamobile.org/.metadata data/local/storage/default/21+f+app+++verticalhome.gaiamobile.org/.metadata data/local/storage/default/30+f+app+++system.gaiamobile.org/.metadata I backup those files from 1st run emulator and 2nd run emulator restore it back, I am able to boot up emulator.
Assignee | ||
Comment 4•9 years ago
|
||
The emulator-x86-kk cannot run 2nd time only happen at force close emulator. e.g., ctrl+c, force close emulator, kill emulator process. I close the emulator through emulator grahic panel button, those .metadata files are exist and not empty.
Comment 5•9 years ago
|
||
I've had some success with running `adb shell sync` before halting the emulator. Obviously not a real fix (and doesn't work if, say, the adb server wasn't running when the emulator was started), but it might be a better workaround, at least.
Assignee | ||
Comment 6•9 years ago
|
||
Attachment #8595942 -
Flags: feedback?(jld)
Assignee | ||
Comment 7•9 years ago
|
||
Attachment #8595943 -
Flags: feedback?(jld)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → jdai
Assignee | ||
Updated•9 years ago
|
Attachment #8595942 -
Flags: feedback?(jld) → review?(tzimmermann)
Assignee | ||
Updated•9 years ago
|
Attachment #8595943 -
Flags: feedback?(jld) → review?(tzimmermann)
Assignee | ||
Comment 8•9 years ago
|
||
Hi Thomas, Would you mind help me review this patches? If I assigned to the wrong reviewer person, please help me forward to the correct person. Thanks.
Comment 9•9 years ago
|
||
I tried to see if there's a less dramatic way to deal with this than mounting all of /data sync. This patch seems to avoid bug 1107608 when interrupting and restarting the emulator, although it's not a complete fix for that bug as I understand it. It would be interesting to see if it helps other people who've run into these second-boot problems. I also encountered another failure mode, where the Gaia system app didn't have the various privileged navigator properties defined; this basically breaks the entire system. I couldn't reproduce it in order to find out exactly what went wrong, but I'd guess there's someplace that app permissions are stored that's affected. Also, the profiles.ini apparently isn't fsync()ed, so it can be truncated to 0 length the same way as the QuotaManager metadata, depending on when the first boot is interrupted. But this seems to be harmless — it results in Gecko ignoring the original "default" profile directory and creating a new one, and the emulator seems to work normally.
Attachment #8596362 -
Flags: feedback?(swu)
Attachment #8596362 -
Flags: feedback?(jdai)
Comment 10•9 years ago
|
||
Comment on attachment 8595942 [details] [review] [platform_system_core] pull request 9 [Checkin: comment 28] Hi, Thanks for working on this problem. In general, I'm OK with this fix if it solves the reported problem without much overhead. I just have a few general comments and questions. - Your commit messages just contain the reported bug. It'd be better to state what the patch does. - Does 'sync' increase the overhead significantly? - Could the same effect be achieved by using ext4's 'data=ordered'?
Attachment #8595942 -
Flags: review?(tzimmermann) → review+
Comment 11•9 years ago
|
||
Comment on attachment 8595943 [details] [review] [device_generic_goldfish] pull request 27 [Checkin: comment 28] Same as comment 10.
Attachment #8595943 -
Flags: review?(tzimmermann) → review+
Reporter | ||
Comment 12•9 years ago
|
||
Comment on attachment 8596362 [details] [diff] [review] Gecko patch: fsync new QuotaManager metadata files. With this patch, there is no more error message as in comment 0. But the 2nd run failure problem still happens with different error messages, probably it's same as the other failure case you mentioned. After booted to homescreen at the 1st run, I hit ctrl-c to close the emulator.
Attachment #8596362 -
Flags: feedback?(swu)
Reporter | ||
Comment 13•9 years ago
|
||
Here is the error log for comment 12.
Comment 14•9 years ago
|
||
What's going on here is that we're not syncing writes to the permissions database (/data/local/permissions.sqlite). On first boot or after an update, the code in dom/apps/Webapps.jsm (in loadAndUpdateApps and installSystemApps; also isFirstRun in AppsUtils.jsm) copies permissions from the apps' webapps.manifest files into the permissions DB, and then sets preferences to indicate that that's been done. The prefs.js file, however, is fsync()ed; if the profiles.ini file is lost (as described above) then that doesn't matter because it will never be read, but it's possible for the emulator to be stopped when the profiles.ini has been flushed to storage but the permissions.sqlite hasn't. In that case, the next boot sees the prefs that indicate the installed apps have already been scanned, but the permissions DB is empty, which breaks the system app ("navigator.mozSettings is null" and so on). Adding fflush() and fsync() calls to nsToolkitProfileService::Flush (and applying the QuotaManager patch) makes this reliably reproducible. I'll file a separate bug for this, because this should be fixed regardless of what happens with the B2G emulator.
Comment 15•9 years ago
|
||
I also, in one case, saw corruption of permissions.sqlite, not just data loss — that is, running the sqlite3 utility on it, all commands fail with “Error: database disk image is malformed”. I wasn't able to reproduce this, but the sqlite documentation warns that it's a possibility if `PRAGMA synchronous = OFF` is used and the OS isn't shut down cleanly.
Comment 16•9 years ago
|
||
Probably not useful, but in case it is: for the permissions DB issue, it should be possible to “unbrick” the emulator by using edit_prefs.sh to set dom.apps.reset-permissions to false.
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #10) > Comment on attachment 8595942 [details] [review] > [platform_system_core] pull request 9 > > Hi, > > Thanks for working on this problem. In general, I'm OK with this fix if it > solves the reported problem without much overhead. I just have a few general > comments and questions. > > - Your commit messages just contain the reported bug. It'd be better to > state what the patch does. > I changed the commit message for more descriptions. > - Does 'sync' increase the overhead significantly? > No, it didn't. But I don't know any tool which can measure emulator performance. If you know any tool, please tell me. > - Could the same effect be achieved by using ext4's 'data=ordered'? Using ext4's 'data=ordered' can't solve this issue, because 'data=ordered' is the default mode in ext4[1]. [1] http://man7.org/linux/man-pages/man5/ext4.5.html
Comment 19•9 years ago
|
||
Comment on attachment 8596362 [details] [diff] [review] Gecko patch: fsync new QuotaManager metadata files. Bug 1157979 is going to fix this more thoroughly.
Attachment #8596362 -
Flags: feedback?(jdai)
Comment 20•9 years ago
|
||
All, About the permission DB(permissions.sqlite) issue, there are two cases as follow which have been identified. Please refer to my comment at https://bugzilla.mozilla.org/show_bug.cgi?id=1115619#c23. Case 1: App permission records in table are not completed(bug 1115619) Already land a patch to guarantee all app permissions can be loaded to permissions.sqlite completely and successfully. Case 2: corrupted permissions.sqlite(bug 1124126) The patch of handling corrupted permissions.sqlite is under review.
Comment 21•9 years ago
|
||
(In reply to John Dai[:johnz][:jdai] from comment #17) > (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #10) > > Comment on attachment 8595942 [details] [review] > > [platform_system_core] pull request 9 > > > > Hi, > > > > Thanks for working on this problem. In general, I'm OK with this fix if it > > solves the reported problem without much overhead. I just have a few general > > comments and questions. > > > > - Your commit messages just contain the reported bug. It'd be better to > > state what the patch does. > > > I changed the commit message for more descriptions. > > > - Does 'sync' increase the overhead significantly? > > > No, it didn't. But I don't know any tool which can measure emulator > performance. > If you know any tool, please tell me. > I have no idea which is the best tool for measuring the performance overhead. I just did some experiments based on existing indexDB mochitests. I specifically ran test_deleteDatabase_interactions.html and test_unique_index_update.html . The average time it took for the 1st test before and after applying John's patch was 372s and 374s. For the 2nd test, before and after: 372s and 377s. Any better idea of examining the possible overhead resulting from the patch, Thomas?
Updated•9 years ago
|
Flags: needinfo?(tzimmermann)
Comment 22•9 years ago
|
||
Thanks Hsin-Yi! It was just a question to rule out some obvious problem. But AFAIK syncs in the emulator don't propagate to the actual hard drive. Comment 14 sounds scary to me. We can break a device by modifying files in such a trivial way. (?) Syncing the FS is probably OK for now, but we might want to revert this change when these DB bugs have been resolved.
Flags: needinfo?(tzimmermann)
Comment 23•9 years ago
|
||
Something like Bonnie [1] might be useful for performance testing. [1] http://www.textuality.com/bonnie/
Comment 24•9 years ago
|
||
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #22) > Thanks Hsin-Yi! It was just a question to rule out some obvious problem. But > AFAIK syncs in the emulator don't propagate to the actual hard drive. > > Comment 14 sounds scary to me. We can break a device by modifying files in > such a trivial way. (?) Syncing the FS is probably OK for now, but we might > want to revert this change when these DB bugs have been resolved. Agreed!
Comment 25•9 years ago
|
||
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #22) > Comment 14 sounds scary to me. We can break a device by modifying files in > such a trivial way. (?) Syncing the FS is probably OK for now, but we might > want to revert this change when these DB bugs have been resolved. Comment #14 is now bug 1157950. To summarize: we definitely need to use a journaling mode where the effects of unexpectedly halting the OS are limited to nondurability, and that's going to happen. The synchronization problem with Webapps.jsm is harder; most users of the permissions manager don't need durability, and it doesn't even try to provide it, not even optionally: it's doing updates asynchronously on a background thread, on top of whatever transaction loss we could get from SQLite itself. So that needs more work if we want to truly fix it; just syncing will make the race condition harder to hit by accident, but it will still exist.
Comment 26•9 years ago
|
||
Thanks for this summary.
Updated•9 years ago
|
Updated•9 years ago
|
Blocks: Emulator_L_Local
Updated•9 years ago
|
No longer blocks: Emulator_L_Local
Comment 27•9 years ago
|
||
Hi John, Can you help to land your patch which already r+? Thanks!
Status: NEW → ASSIGNED
Flags: needinfo?(jdai)
Priority: -- → P1
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(jdai)
Keywords: checkin-needed
Comment 28•8 years ago
|
||
b2g-4.4.2_r1: https://github.com/mozilla-b2g/platform_system_core/commit/8586f55fe4b015911b48e731b69c592ad82a0807 b2g-4.4.2_r1: https://github.com/mozilla-b2g/device_generic_goldfish/commit/85f6a2e1e638dbc8b119896e61383e973e764ffd
Keywords: checkin-needed
Updated•8 years ago
|
Attachment #8595942 -
Attachment description: [platform_system_core] pull request 9 → [platform_system_core] pull request 9 [Checkin: comment 28]
Updated•8 years ago
|
Attachment #8595943 -
Attachment description: [device_generic_goldfish] pull request 27 → [device_generic_goldfish] pull request 27 [Checkin: comment 28]
Assignee | ||
Comment 29•8 years ago
|
||
This issue is verified fixed on the latest builds.
Updated•8 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•