Closed Bug 1082001 Opened 11 years ago Closed 11 years ago

Mulet doesn't start correctly

Categories

(Firefox OS Graveyard :: Runtime, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(firefox34 wontfix, firefox35 wontfix, firefox36 fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)

RESOLVED FIXED
2.1 S8 (7Nov)
Tracking Status
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: ochameau, Assigned: gerard-majax)

References

Details

(Whiteboard: [systemsfe])

Attachments

(4 files, 8 obsolete files)

When opening it, shell.html is loaded, but gaia doesn't start and the page stays black. I narrowed it down to an issue in Settings API, where a dom-window-destroyed event is dispatched during shell.html loading... We end up destroying the mozSettings API over here: http://mxr.mozilla.org/mozilla-central/source/dom/settings/SettingsManager.js#386 That explains why gaia doesn't start. But I can't get why we receive such dom-window-destroyed. It is being dispatched from these two callsites: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#1562 http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#2868 Both seems to be very late during window destruction. This is really weird as the window works correctly, whereas the window object should be mostly non-working by the time we receive this event.
Looks like you have been able to find more info than me on this issue ...
After some more debugging, it is being called from http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#2177 WindowStateHolder::~WindowStateHolder() nsGlobalWindow::FreeInnerObjects() --> notify dom-window-destroyed --> break mozSettings (and many other APIs!) Given that most of this codepath is very old and that it used to work... it must be something at a higher level. I would say it can be something in session restore or in the way we load the homepage.
So the way we load the homepage is just about calling loadURI on <browser> during startup, I haven't seen anything really special.
Ok so it regressed since bug 1065128 attachment 8488141 [details] [diff] [review] landed. I'm not sure listening to dom-window-destroyed was the right fix. Listening to inner-window-destroyed from an nsIDOMGlobalPropertyInitializer looks like a common pattern...
Depends on: 1065128
Assignee: nobody → lissyx+mozillians
Attached file settings-mulet.log (obsolete) —
There's a lot of settings requests that fails.
Even if I revert bug 1065128, it goes further but I'm still stuck on the boot logo :(
Flags: needinfo?(poirot.alex)
(In reply to Alexandre LISSY :gerard-majax from comment #6) > Even if I revert bug 1065128, it goes further but I'm still stuck on the > boot logo :( Nevermind, I failed something, it's okay.
Flags: needinfo?(poirot.alex)
Listening to inner-window-destroyed is fine. However, bug 1065128 moved certain calls to dom-window-destroyed because both settings and geolocation were calling sendAsyncMessage from inner-window-destroyed, which doesn't work because at that point there's no window. However, when I fixed this for settings in bug 1065128, I made SettingsManager::cleanup() be called in dom-window-destroyed, which also involves things like unregistering listeners and settings the cached window object to null. What we could do is only make the requires sendAsyncMessage calls in dom-window-destroyed, then do everything else in inner-window-destroyed? Don't know if that'd fix this or not.
Er, meant comment 8 to be on bug 1065128, so reposted there, and I fixed settings and IDENTITY, not geolocation. Ok, back to vacation.
Attached patch bug1082001.diff (obsolete) — Splinter Review
Implementing the changes suggested
Flags: needinfo?(kyle)
Attached file settings-mulet-fixed.log (obsolete) —
log when mulet does not boot with the patch applied
Attachment #8506963 - Attachment is obsolete: true
> $ grep -n 'SettingsManager init' settings-mulet-* > settings-mulet-NOK.log:29:-*- SettingsManager: SettingsManager init > settings-mulet-OK.log:29:-*- SettingsManager: SettingsManager init > settings-mulet-OK.log:1448:-*- SettingsManager: SettingsManager init > settings-mulet-OK.log:3676:-*- SettingsManager: SettingsManager init > settings-mulet-OK.log:11180:-*- SettingsManager: SettingsManager init > settings-mulet-OK.log:11502:-*- SettingsManager: SettingsManager init
Flags: needinfo?(kyle)
Attached file settings-mulet-OK.log (obsolete) —
Attachment #8507893 - Attachment is obsolete: true
Attached file settings-mulet-NOK.log (obsolete) —
After a lot of trying, investigations and discussions, it seems we should just not use dom-window-destroyed as it seems not to be a good idea. See also but 1016952 comment 11.
Attached file Errors :)
With some changes, I have been able to get errors popping when this is failing!
Attached patch bug1082001.diff (obsolete) — Splinter Review
Attachment #8507892 - Attachment is obsolete: true
Attachment #8507951 - Attachment is obsolete: true
Attachment #8507952 - Attachment is obsolete: true
Whiteboard: [systemsfe]
(In reply to Alexandre LISSY :gerard-majax from comment #15) > After a lot of trying, investigations and discussions, it seems we should > just not use dom-window-destroyed as it seems not to be a good idea. > > See also bug 1016952 comment 11. Ok, things are getting clearer now. So we load the system app, and it makes a request to navigator.mozSettings. This triggers a SettingManager.init() call, which gets the window matching the system app. There's a trick using "about:blank" for resources cleanup, so when we unload this page this is triggering a "dom-window-destroyed" event. Since we just have a reference to the window, and not the inner window, we cannot distinguish whether we have an event for the about:blank inner window or for a legit app getting killed. So when we cleanup resources upon this dom-window-destroyed event, we indeed put SettingsManager in an unexpected state where it has no more reference to the window (it was destroyed) but it's still partly active.
Kyle, in light of comment 20, I gave a try to reverting the dom-window-destroyed changes only, keeping the fix for the principals. As far as I could test, I could not reproduce using STRs provided in bug 1065128 comment 0.
Flags: needinfo?(kyle)
Ok, comment 20 makes sense, but I'm not sure what you mean by "keeping the fix for the principals". Can you post a patch of what you've got now? Also, I was thinking about it yesterday... There's a chance the way I'm doing shutdown in SettingsManager is wrong. The reason we need the cleanup in dom-window-destroyed is so that parent process SettingsManagers would clean themselves up, since SettingsRequestManager won't get a child-process-shutdown message from them. However, since SettingsManager and SettingsRequestManager are in the same process already, SettingsRequestManager should be able to track the window object for SettingsManagers in the parent process, and do the same thing on inner-window-destroyed for these SettingsManager objects as it would on receiving child-process-shutdown for apps in B2G. This would mean we would no longer have to rely on SettingsManager running sendAsyncMessage in shutdown at all, since that seems to be causing a lot of problems. I don't think it'd be a ton of work to fix this, but I also don't know what, if anything, it would fix. It just seems like our last ditch effort to kick off transaction finalizing from the child process or SettingsManager objects seems like it's causing more pain than it should.
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(kyle)
Flags: needinfo?(anygregor)
As I was saying yesterday on IRC, once reverted the dom-window-destroyed changes, I would still run into issues with PRODUCTION=1 Gaia builds. As far as I can say, it looks like checking out a revision just prior landing of bug 1049439 fixes the issue.
Depends on: 1049439
Flags: needinfo?(lissyx+mozillians)
(In reply to Alexandre LISSY :gerard-majax from comment #23) > As I was saying yesterday on IRC, once reverted the dom-window-destroyed > changes, I would still run into issues with PRODUCTION=1 Gaia builds. > > As far as I can say, it looks like checking out a revision just prior > landing of bug 1049439 fixes the issue. Ok, maybe it's not related to bug 1049439 in fact.
No longer depends on: 1049439
(In reply to Alexandre LISSY :gerard-majax from comment #24) > (In reply to Alexandre LISSY :gerard-majax from comment #23) > > As I was saying yesterday on IRC, once reverted the dom-window-destroyed > > changes, I would still run into issues with PRODUCTION=1 Gaia builds. > > > > As far as I can say, it looks like checking out a revision just prior > > landing of bug 1049439 fixes the issue. > > Ok, maybe it's not related to bug 1049439 in fact. That's indeed because of bug 941969.
Depends on: 941969
Depends on: 1087967
(In reply to Kyle Machulis (OUT OCT 1-NOV 30) [:kmachulis] [:qdot] (USE NEEDINFO?) from comment #22) > Ok, comment 20 makes sense, but I'm not sure what you mean by "keeping the > fix for the principals". Can you post a patch of what you've got now? > > Also, I was thinking about it yesterday... There's a chance the way I'm > doing shutdown in SettingsManager is wrong. The reason we need the cleanup > in dom-window-destroyed is so that parent process SettingsManagers would > clean themselves up, since SettingsRequestManager won't get a > child-process-shutdown message from them. However, since SettingsManager and > SettingsRequestManager are in the same process already, > SettingsRequestManager should be able to track the window object for > SettingsManagers in the parent process, and do the same thing on > inner-window-destroyed for these SettingsManager objects as it would on > receiving child-process-shutdown for apps in B2G. This would mean we would > no longer have to rely on SettingsManager running sendAsyncMessage in > shutdown at all, since that seems to be causing a lot of problems. > > I don't think it'd be a ton of work to fix this, but I also don't know what, > if anything, it would fix. It just seems like our last ditch effort to kick > off transaction finalizing from the child process or SettingsManager objects > seems like it's causing more pain than it should. This seems to be doing the trick on my mulet builds, whether with OOP enabled or not. Here is a first tentative try: https://tbpl.mozilla.org/?tree=Try&rev=d4b2410b0212 I'm checking the behavior on a real device, too, but I have been struggling with bug 1087967 and bug 969215.
So far my testing on Mulet shows that I cannot reproduce the bug documented in bug 1065128 comment 0.
(In reply to Alexandre LISSY :gerard-majax from comment #27) > So far my testing on Mulet shows that I cannot reproduce the bug documented > in bug 1065128 comment 0. Nor on my device.
From bug 1065128 SettingsManager has been changed to listen the dom-window-destroyed event for its cleanup. However, when running Gaia in Mulet, a race condition is exposed. For B2G, when loading a page, about:blank is first used. This means that window destroyed events will be triggered. However, from the dom-window-destroyed event we cannot distringuish whether this is about:blank or a legit application being closed. SettingsManager gets initialized (i.e., init() called) when the application makes use of navigator.mozSettings. So the chain of event is that we have a SettingsManager living because System app did some request. At this time, about:blank is being unloaded and triggers a dom-window-destroyed event. This makes SettingsManager doing its cleanup, especially freeing the window reference. Then in the meantime, we have the navigator.mozSettings use that is progressing. At some point, SettingsManager has no more window to send messages to, and Gaia is not able to even start. SettingsRequestManager lives on the parent process and SettingsManager lives on the child side. Part of the cleanup performed by SettingsManager was to ensure pending locks on the parent process would but forced to finalize to make sure those are being properly committed. We move this cleanup to SettingsRequestManager and we augment the lock informations with the proper inner window id. This way we can track which lock is attached to which inner window when the lock gets created. And thus we can listen on inner-window-destroyed from SettingsRequestManager to be able to force finalize on any pending lock.
Attachment #8508198 - Attachment is obsolete: true
Comment on attachment 8510417 [details] [diff] [review] Cleanup settings lock from parent itself r=bent Ben since you reviewed the first patch, please have a look at this one. There's a first green try: https://tbpl.mozilla.org/?tree=Try&rev=d4b2410b0212 And a second one is en-route to check on all other platforms, for the sake of making sure, at https://tbpl.mozilla.org/?tree=Try&rev=35e5d0443b83. I have been testing this since this morning on my mulet build, not being able to expose any issue. This is also on one of my devices that I'm using and it behaves properly for now. I'm planning for some more stress testing also.
Attachment #8510417 - Flags: review?(bent.mozilla)
Depends on: 1088683
I have been trying to document and ensure any perf issue, but ran into bug 1088683. Now that I have a workaround for this one, it's Cleopatra that is broken and that fails with: > Error in worker: Exception: out of memory (undefined:undefined)
Comment on attachment 8510417 [details] [diff] [review] Cleanup settings lock from parent itself r=bent Hrm, what guarantees you that the innerWindowID you get in the child (and then send to the parent) matches the innerWindowID you get in the parent? I don't think we synchronize that between processes, do we?
Attachment #8510417 - Flags: review?(bent.mozilla)
From bug 1065128 SettingsManager has been changed to listen the dom-window-destroyed event for its cleanup. However, when running Gaia in Mulet, a race condition is exposed. For B2G, when loading a page, about:blank is first used. This means that window destroyed events will be triggered. However, from the dom-window-destroyed event we cannot distinguish whether this is about:blank or a legit application being closed. SettingsManager gets initialized (i.e., init() called) when the application makes use of navigator.mozSettings. So the chain of event is that we have a SettingsManager living because System app did some request. At this time, about:blank is being unloaded and triggers a dom-window-destroyed event. This makes SettingsManager doing its cleanup, especially freeing the window reference. Then in the meantime, we have the navigator.mozSettings use that is progressing. At some point, SettingsManager has no more window to send messages to, and Gaia is not able to even start. SettingsRequestManager lives on the parent process and SettingsManager lives on the child side. Part of the cleanup performed by SettingsManager was to ensure pending locks on the parent process would be forced to finalize to make sure those are being properly committed. We move this cleanup to SettingsRequestManager and we augment the lock informations with the proper inner window id. This way we can track which lock is attached to which inner window when the lock gets created. And thus we can listen on inner-window-destroyed from SettingsRequestManager to be able to force finalize on any pending lock. Impacted code path are those were we are not running out of process. When we are running out of process, SettingsRequestManager already listens on the child-process-shutdown event to perform the lock finalization.
(In reply to Alexandre LISSY :gerard-majax from comment #33) > Created attachment 8512602 [details] [diff] [review] > Cleanup settings lock from parent itself r=bent > > From bug 1065128 SettingsManager has been changed to listen the > dom-window-destroyed event for its cleanup. However, when running Gaia > in Mulet, a race condition is exposed. For B2G, when loading a page, > about:blank is first used. This means that window destroyed events will > be triggered. However, from the dom-window-destroyed event we cannot > distinguish whether this is about:blank or a legit application being > closed. SettingsManager gets initialized (i.e., init() called) when the > application makes use of navigator.mozSettings. So the chain of event is > that we have a SettingsManager living because System app did some > request. At this time, about:blank is being unloaded and triggers a > dom-window-destroyed event. This makes SettingsManager doing its > cleanup, especially freeing the window reference. Then in the meantime, > we have the navigator.mozSettings use that is progressing. At some > point, SettingsManager has no more window to send messages to, and Gaia > is not able to even start. > > SettingsRequestManager lives on the parent process and SettingsManager > lives on the child side. Part of the cleanup performed by > SettingsManager was to ensure pending locks on the parent process would > be forced to finalize to make sure those are being properly committed. > We move this cleanup to SettingsRequestManager and we augment the lock > informations with the proper inner window id. This way we can track > which lock is attached to which inner window when the lock gets created. > And thus we can listen on inner-window-destroyed from > SettingsRequestManager to be able to force finalize on any pending lock. > Impacted code path are those were we are not running out of process. > When we are running out of process, SettingsRequestManager already > listens on the child-process-shutdown event to perform the lock > finalization. Refactored a little bit the OOP and non OOP code paths, and renaming method to make it more obvious. New try: https://tbpl.mozilla.org/?tree=Try&rev=ef3fdb68102e
Attachment #8510417 - Attachment is obsolete: true
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #32) > Comment on attachment 8510417 [details] [diff] [review] > Cleanup settings lock from parent itself r=bent > > Hrm, what guarantees you that the innerWindowID you get in the child (and > then send to the parent) matches the innerWindowID you get in the parent? I > don't think we synchronize that between processes, do we? Yes but that's not a problem: the issue we fix here is related to running in process. The case for oop is already handled by the child-process-shutdown code path. Given your feedback, I think it's worthwhile to make it more obvious in the patch, so I: - factorized between non OOP and OOP code paths - renamed methods to make it more clear. Tested this on Mulet and device (Flame/KK), no issue spotted so far.
Attachment #8512602 - Flags: review?(bent.mozilla)
Comment on attachment 8512602 [details] [diff] [review] Cleanup settings lock from parent itself r=bent Review of attachment 8512602 [details] [diff] [review]: ----------------------------------------------------------------- Ok this makes much more sense. Thanks! ::: dom/settings/SettingsManager.js @@ +389,5 @@ > observe: function(aSubject, aTopic, aData) { > + if (DEBUG) debug("Topic: " + aTopic); > + if (aTopic == "inner-window-destroyed") { > + let wId = aSubject.QueryInterface(Ci.nsISupportsPRUint64).data; > + if (wId == this.innerWindowID) { Nit: ===, then we won't have to worry about null equaling 0. ::: dom/settings/SettingsRequestManager.jsm @@ +755,5 @@ > } > }, > > + hasLockFinalizeTask: function(lock) { > + let task_index; Nit: This can go in the for() loop initializer right?
Attachment #8512602 - Flags: review?(bent.mozilla) → review+
From bug 1065128 SettingsManager has been changed to listen the dom-window-destroyed event for its cleanup. However, when running Gaia in Mulet, a race condition is exposed. For B2G, when loading a page, about:blank is first used. This means that window destroyed events will be triggered. However, from the dom-window-destroyed event we cannot distinguish whether this is about:blank or a legit application being closed. SettingsManager gets initialized (i.e., init() called) when the application makes use of navigator.mozSettings. So the chain of event is that we have a SettingsManager living because System app did some request. At this time, about:blank is being unloaded and triggers a dom-window-destroyed event. This makes SettingsManager doing its cleanup, especially freeing the window reference. Then in the meantime, we have the navigator.mozSettings use that is progressing. At some point, SettingsManager has no more window to send messages to, and Gaia is not able to even start. SettingsRequestManager lives on the parent process and SettingsManager lives on the child side. Part of the cleanup performed by SettingsManager was to ensure pending locks on the parent process would be forced to finalize to make sure those are being properly committed. We move this cleanup to SettingsRequestManager and we augment the lock informations with the proper inner window id. This way we can track which lock is attached to which inner window when the lock gets created. And thus we can listen on inner-window-destroyed from SettingsRequestManager to be able to force finalize on any pending lock. Impacted code path are those were we are not running out of process. When we are running out of process, SettingsRequestManager already listens on the child-process-shutdown event to perform the lock finalization.
Attachment #8512602 - Attachment is obsolete: true
Comment on attachment 8513326 [details] [diff] [review] Cleanup settings lock from parent itself r=bent Carrying r+ from :bent. Addressed nits.
Attachment #8513326 - Flags: review+
Flags: needinfo?(anygregor)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.1 S8 (7Nov)
Blocks: 1092384
Depends on: 1092941
Comment on attachment 8513326 [details] [diff] [review] Cleanup settings lock from parent itself r=bent NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 1065128 User impact if declined: gaia not booting in some cases: mulet, might affect also some devices Testing completed: living on master since october 29th Risk to taking this patch (and alternatives if risky): low String or UUID changes made by this patch: none Bug 1092941 needs to be uplifted/landed just after this one.
Attachment #8513326 - Flags: approval-mozilla-b2g34?
Attachment #8513326 - Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: