Closed Bug 1065128 Opened 10 years ago Closed 10 years ago

Changing Geolocation settings on flame causes settings observers to fail

Categories

(Core :: DOM: Device Interfaces, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
2.1 S5 (26sep)
blocking-b2g 2.1+
Tracking Status
firefox33 --- wontfix
firefox34 --- verified
firefox35 --- verified
b2g-v1.4 --- unaffected
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: qdot, Assigned: qdot)

References

Details

(Keywords: regression, Whiteboard: [systemsfe])

Attachments

(10 files, 4 obsolete files)

99.19 KB, text/plain
Details
106.02 KB, patch
Details | Diff | Splinter Review
543.45 KB, text/plain
Details
493.21 KB, text/plain
Details
373.68 KB, text/plain
Details
498.70 KB, text/plain
Details
1.74 KB, patch
ferjm
: review+
Details | Diff | Splinter Review
1.84 KB, patch
bent.mozilla
: review+
Details | Diff | Splinter Review
2.76 KB, patch
bent.mozilla
: review+
Details | Diff | Splinter Review
500.44 KB, text/x-log
Details
STR: 1. Flash phone with flame-jb m-c from https://tbpl.mozilla.org/?rev=4d1793da0b96 (pvtbuilds link: https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/nightly/mozilla-central-flame-eng/2014/09/2014-09-09-04-02-19/) 2. Go through FTU 3. Open Settings 4. Open Language Panel 5. Change language. Language should change correctly 6. Close Language Panel. Go to main Settings app panel. 7. Turn off Geolocation 8. Open Language Panel 9. Change language. Expected behavior: Language changes Actual behavior: Language stays the same. On phone reboot, phone will lock up at boot screen due to "settings not found" errors. PLEASE POST LOGCATS WITH ALL PHONE RUNS MENTIONED.
Ok, ran a few more times. I can get settings failures 100% of the time after changing geolocation. I always see IPC failures after that: --- E/GeckoConsole( 279): 1410264816013 Identity.FxAccounts DEBUG watch: {213135a1-2f20-4d3f-aac3-cee77cebf544} E/GeckoConsole( 279): 1410264816014 Identity.FxAccounts DEBUG Current rp flows: 2 I/Gecko ( 279): 1410264816018 FirefoxAccounts ERROR OFFLINE E/GeckoConsole( 279): [JavaScript Error: "1410264816025 Identity.FxAccounts ERROR get silent assertion failed: {"error":"OFFLINE"}" {file: "resource://gre/modules/Log.jsm" line: 749}] I/Gecko ( 279): IPDL protocol error: Handler for AsyncMessage returned error code I/Gecko ( 279): I/Gecko ( 279): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C00A2,name=PContent::Msg_AsyncMessage) Processing error: message was deserialized, but the handler returned false (indicating failure) I/Gecko ( 279): I/Gecko ( 279): [Parent 279] WARNING: waitpid failed pid:1255 errno:10: file ../../../gecko/ipc/chromium/src/base/process_util_posix.cc, line 261 I/Gecko ( 279): IPDL protocol error: Handler for AsyncMessage returned error code I/Gecko ( 279): I/Gecko ( 279): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C00A2,name=PContent::Msg_AsyncMessage) Processing error: message was deserialized, but the handler returned false (indicating failure) I/Gecko ( 279): I/Gecko ( 279): IPDL protocol error: Handler for AsyncMessage returned error code I/Gecko ( 279): I/Gecko ( 279): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C00A2,name=PContent::Msg_AsyncMessage) Processing error: message was deserialized, but the handler returned false (indicating failure) I/Gecko ( 279): E/GeckoConsole( 279): [JavaScript Error: "NS_ERROR_NOT_INITIALIZED: Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIMessageSender.sendAsyncMessage]" {file: "resource://gre/modules/AlarmService.jsm" line: 190}] E/GeckoConsole( 1285): [JavaScript Warning: "Unknown property 'will-change'. Declaration dropped." {file: "resource://gre-resources/full-screen-override.css" line: 24 column: 13 source: " will-change: initial !important;"}] --- However, I'm having problems repro'ing the device not booting afterward. On runs where the device will not run after reboot, I saw IndexedDB transaction errors after these IPC messages, so I have a feeling that may be a race condition based on transaction lifetimes. Will post logcats once I can fully repro.
Whiteboard: [systemfe]
Kyle: Flashing the engineering build using the TPE tool, I am not seeing this issue. Attaching large logcat which includes rebooting. Gaia 4acd3e69b263b54f4111e3586ff4ade84b49b4da SourceStamp 6b8da5940f74 BuildID 20140909040219 Version 35.0a1 base image: v123 The language settings were fine, and after restarting the device I did not get any errors.
Attachment #8486782 - Attachment description: settings.txt → Logcat after fresh flash using TPE tool
Flame-JB, confirming 512 memory. This is not the same device used as in the last attachment. STR: 1. Fresh flash using TPE tool 2. Run through FTU 3. Open Settings - Change language to Italian 4. Close Language panel - Shut off Geo 5. Back to language panel - Change to Romanian 6. Reboot device. Actual: *Language doesn't change. *After Reboot language is set to Romanian. *Phone doesn't lock up at reboot.
Adding qawanted to see if someone else can reproduce.
blocking-b2g: --- → backlog
Keywords: qawanted
I was able to reproduce the issue mentioned at step 9 of original STR where the language does not change after toggling Geolocation button, but NOT the issue where rebooting locks the phone. The language issue repro rate is 3/3. It repro's using both our own flash tool and the flash tool provided at comment 0. It repro's on latest tinderbox eng master build, as well as the nightly eng build provided at comment 0. The reboot lockup issue repro rate is 0/3.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawanted
Actually, un-setting tracking flag since I wasn't able to repro the boot lockup issue which is being used in the title.
I haven't been able to get the settings DB crash to repro again, so downgrading the title a bit and resetting tracking flag. Turned on settings API logging. It looks like whatever the error in Comment #1 is killing also causes the SettingsRequestManager to drop permissions mappings for some principals for some reason, meaning we no longer call observers correctly. This is why it looks like settings doesn't work anymore. Things are getting set, but no observer is ever getting called. This may or may not get divided out into a separate bug.
Summary: Flame will not boot after settings change → Changing Geolocation settings on flame causes settings observers to fail
Attached file Logcat of settings crash (obsolete) —
And of course the second I say I can't repro the settings crash, I do. :| Repro was just "Flash phone, run adb reboot after FTU" comes up. Have attached logcat of phone after reboot to show what settings crash looks like. Unforutnately it seems to have truncated the gecko logs so it doesn't show boot?
Whenever the phone gets into this state it will ALWAYS crash on reboot, so I managed to get a full log.
Attachment #8486882 - Attachment is obsolete: true
Looking at Comment 1 again, I just realized I'm ALWAYS seeing those FxAccounts lines before IPC failures, which look like they're killing message managers when they happen. Trying to debug this now.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
It looks like this is due to things that require principal checks for sendAsyncMessage (in the case of geolocation, identity and settings as part of the findmydevice app) running during inner-window-destroyed observation. Trying to figure out if something in destruction order changed recently, as findmydevice has always done this. Also trying to identify if this is a red herring in relation to the observers disappearing.
QA Wanted - I think we need branch checks here if we can reproduce this.
QA Whiteboard: [QAnalyst-Triage+]
Keywords: qawanted
This branch check is ONLY for changing language fails after toggling Geolocation button issue. I was NOT able to reproduce the reboot phone crash bug so this branch check doesn't include the crash bug. This issue occurs on Flame 2.2 and Open C 2.2. Changing language fails after toggling Geolocation button. Device: Flame 2.2 Master BuildID: 20140910060915 Gaia: f108c706fae43cd61628babdd9463e7695b2496e Gecko: 843332cc69af Version: 35.0a1 (2.2 Master) Firmware: V123 User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0 Device: Open_C 2.2 Master BuildID: 20140910060915 Gaia: f108c706fae43cd61628babdd9463e7695b2496e Gecko: 843332cc69af Version: 35.0a1 (2.2 Master) Firmware: P821A10v1.0.0B06_LOG_DL User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0 -------------- This issue does NOT occur on Flame 2.1. Toggling Geolocation button doesn't affect language changing. Repro rate: 0/4. Device: Flame 2.1 BuildID: 20140910122556 Gaia: 6cb9de2c10070dc638ed9bc083ad5ff9a065db36 Gecko: fcbfbd3cba3f Version: 34.0a2 (2.1) Firmware: V123 User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawantedregression
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
Ok, well, cleaned up the IPC issues, but that wasn't the problem. It seems that the way SettingsRequestManager indexes message manager principals means we only ever store one principal (this is due to me making the mistake of thinking objects can be keys :( ), and if that gets deleted (which happens when findmydevice shuts down when we turn off geolocation), we no longer have any principal to check against. Not only that, this means that we use the permissions of whatever principal is in the list for ALL checks on observers, which is bad. Need to figure out a new way to match message managers to principals. That should fix this right up.
Calling SendAsyncMessage after inner-window-destroyed is bad. Not only do we not have a principal at that point, the permissions check in the parent fails even if we do manage to send the message by caching the principal. Calling this on dom-window-destroyed fixes the error.
Attachment #8488137 - Flags: review?(ferjmoreno)
Same case as patch 1. SendAsyncMessage after inner-window-destroyed is a bad idea. Moving this to dom-window-destroyed fixes permissions issues.
Attachment #8488141 - Flags: review?(bent.mozilla)
Trying to use an object as a hashmap means we converted objects to strings, and since we were trying to use an XPCOM object as the key, it always converted to the same string. This lookup table is O(n) for all operations, but we'll never have THAT many observers.
Attachment #8488142 - Flags: review?(bent.mozilla)
Whitespace and debug message cleanup
Attachment #8488142 - Attachment is obsolete: true
Attachment #8488142 - Flags: review?(bent.mozilla)
Attachment #8488153 - Flags: review?(bent.mozilla)
Comment on attachment 8488153 [details] [diff] [review] Patch 3 (v2) - Fix SettingsRequestManager storage of MessageManager Principals Switching this to use a Map type.
Attachment #8488153 - Flags: review?(bent.mozilla)
MessageManager Principals now uses Map type, which will handle XPCOM objects as keys.
Attachment #8488153 - Attachment is obsolete: true
Attachment #8488201 - Flags: review?(bent.mozilla)
Whiteboard: [systemfe] → [systemsfe]
Comment on attachment 8488201 [details] [diff] [review] Patch 3 (v3) - Fix SettingsRequestManager storage of MessageManager Principals Review of attachment 8488201 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/settings/SettingsRequestManager.jsm @@ +742,3 @@ > }, > > removeMessageManager: function(aMsgMgr){ nit: )<space>{
Fix nits pointed out by :mikeh
Attachment #8488201 - Attachment is obsolete: true
Attachment #8488201 - Flags: review?(bent.mozilla)
Attachment #8488245 - Flags: review?(bent.mozilla)
Josh - Can you provide a blocking triage here?
Flags: needinfo?(jmitchell)
nomming for 2.2 - failing to set the correct language is bad UX at best and possibly a Cert issue
blocking-b2g: backlog → 2.2?
Flags: needinfo?(jmitchell)
qDot: IT PASSES!! \o/ \o/ *ahem* ...confirming that with the above patches applied, I no longer see the issue that was blocking my patch in bug 1053966.
Comment on attachment 8488137 [details] [diff] [review] Patch 1 (v1) - Make Identity API use dom-window-destroyed for shutdown Review of attachment 8488137 [details] [diff] [review]: ----------------------------------------------------------------- Thanks Kyle! ::: dom/identity/nsDOMIdentity.js @@ +707,5 @@ > > // nsIObserver > observe: function nsDOMIdentityInternal_observe(aSubject, aTopic, aData) { > + let window = aSubject.QueryInterface(Ci.nsIDOMWindow); > + if(window != this._window) { nit: if<space>( @@ +753,3 @@ > // innerwindow id to construct the unique id. > this._id = uuidgen.generateUUID().toString(); > this._innerWindowID = util.currentInnerWindowID; It seems that this is not used anymore so it can be remove.
Attachment #8488137 - Flags: review?(ferjmoreno) → review+
Attachment #8488141 - Flags: review?(bent.mozilla) → review+
Comment on attachment 8488245 [details] [diff] [review] Patch 3 (v4) - Fix SettingsRequestManager storage of MessageManager Principals Review of attachment 8488245 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/settings/SettingsRequestManager.jsm @@ +707,5 @@ > + // Make sure we're not leaking since Map requires management by > + // hand. > + let i = 0; > + for (let it of this.mmPrincipals.keys()) { > + i = i + 1; Can't you just use the 'size' property?
Attachment #8488245 - Flags: review?(bent.mozilla) → review+
Comment on attachment 8488137 [details] [diff] [review] Patch 1 (v1) - Make Identity API use dom-window-destroyed for shutdown Approval Request Comment [Feature/regressing bug #]: Bug 1061510 [User impact if declined]: Settings observers never fire [Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug [Risks and why]: None [String/UUID change made/needed]: None
Attachment #8488137 - Flags: approval-mozilla-aurora?
Comment on attachment 8488141 [details] [diff] [review] Patch 2 (v1) - Make SettingsManager use dom-window-destroyed for shutdown Approval Request Comment [Feature/regressing bug #]: Bug 1061510 [User impact if declined]: Settings observers never fire [Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug [Risks and why]: None [String/UUID change made/needed]: None
Attachment #8488141 - Flags: approval-mozilla-aurora?
Comment on attachment 8488245 [details] [diff] [review] Patch 3 (v4) - Fix SettingsRequestManager storage of MessageManager Principals Approval Request Comment [Feature/regressing bug #]: Bug 1061510 [User impact if declined]: Settings observers never fire [Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug [Risks and why]: None [String/UUID change made/needed]: None
Comment on attachment 8488245 [details] [diff] [review] Patch 3 (v4) - Fix SettingsRequestManager storage of MessageManager Principals Approval Request Comment [Feature/regressing bug #]: Bug 1061510 [User impact if declined]: Settings observers never fire [Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug [Risks and why]: None [String/UUID change made/needed]: None
Attachment #8488245 - Flags: approval-mozilla-aurora?
2.1 will be affected once bug 1061510 is uplifted. It just got approval-aurora+'d, so marking this as affected now.
blocking-b2g: 2.2? → 2.1+
Attachment #8488137 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8488141 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8488245 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Blocks: 1063510
Blocks: 1062069
Blocks: 1053793
Target Milestone: --- → 2.1 S5 (26sep)
It looks like this commit made the b2gperf and eideticker integration tests much less reliable, causing the b2g process to crash when we're interacting with the phone using marionette: http://eideticker-ci1.ateam.phx1.mozilla.com:8080/job/b2g.flame-512.b2g-inbound.tests/ http://jenkins1.qa.scl3.mozilla.com:8080/view/Performance/job/flame-319.b2g-inbound.perf.b2gperf/ It looks like the unreliability started within this range: http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=2a7cae8b0b5d&tochange=ee4e51c71b03 And this patch is the most probably culprit. Needinfo'ing :zac to followup
Flags: needinfo?(zcampbell)
(In reply to William Lachance (:wlach) from comment #37) > It looks like this commit made the b2gperf and eideticker integration tests > much less reliable, causing the b2g process to crash when we're interacting > with the phone using marionette: > > http://eideticker-ci1.ateam.phx1.mozilla.com:8080/job/b2g.flame-512.b2g- > inbound.tests/ > http://jenkins1.qa.scl3.mozilla.com:8080/view/Performance/job/flame-319.b2g- > inbound.perf.b2gperf/ > > It looks like the unreliability started within this range: > http://hg.mozilla.org/integration/b2g-inbound/ > pushloghtml?fromchange=2a7cae8b0b5d&tochange=ee4e51c71b03 > > And this patch is the most probably culprit. Needinfo'ing :zac to followup Marking NO_UPLIFT till this is clarified.
Whiteboard: [systemsfe] → [systemsfe][NO_UPLIFT]
OK I've run the b2gperf for the two builds in the regression range of comment #37. The b2gperf test launches the Settings app, measures the launch, kills it, waits 10 seconds launches again, 30 times over. To gain confidence in the last good, I ran 3 times the test run of 30 launches with no failures. On the first failing, it failed after 3 launches. First failing b2g-inbound: build ID 20140915134152 changeset https://tbpl.mozilla.org/?tree=B2g-Inbound&showall=1&rev=d4b6ecaee33c Last good b2g-inbound: Build ID: 20140915131232 changeset: https://tbpl.mozilla.org/?tree=B2g-Inbound&showall=1&rev=3d4b867d6bc7 Logcat coming up.
Flags: needinfo?(zcampbell)
Logcat for b2gperf failures on build 20140915134152
Depends on: 1068962
Moving b2gperf failures to bug 1068962, since trying to follow it here is getting messy and these patches have already landed to m-c. If this gets backed out, announce it on dev-b2g, because you're going to break everyone using nightlies. This is a problem between how the perf tests does app launch/kill at the settings API.
(In reply to Kyle Machulis [:kmachulis] [:qdot] (USE NEEDINFO?) from comment #42) > Moving b2gperf failures to bug 1068962, since trying to follow it here is > getting messy and these patches have already landed to m-c. > > If this gets backed out, announce it on dev-b2g, because you're going to > break everyone using nightlies. This is a problem between how the perf tests > does app launch/kill at the settings API. In that case, it sounds like we can't backout here. We'll just have to resolve bug 1068962 before uplifting.
Dependency fixed, so we're good to uplift here.
Whiteboard: [systemsfe][NO_UPLIFT] → [systemsfe]
Verified fixed on Flame 2.2 (319mb/full flash) and Flame 2.1 (319mb/full flash) Actual result: The language changes after toggling geolocation, no errors displayed upon reboot. Device: Flame 2.2 BuildID: 20141011040204 Gaia: 95f580a1522ffd0f09302372b78200dab9b6f322 Gecko: 3f6a51950eb5 Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf Version: 35.0a1 (2.2) Firmware: V180 User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0 Device: Flame 2.1 BuildID: 20141011000201 Gaia: f5d4ff60ffed8961f7d0380ada9d0facfdfd56b1 Gecko: d813d79d3eae Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf Version: 34.0a2 (2.1) Firmware: V180 User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
This is making Mulet unable to start most of the time, as reported in bug 1082001. I had a look but I could not understand exactly the reason why changing inner-window-destroyed to dom-window-destroyed would fix the present bug. When we revert this bug, we don't experience anymore the mulet breakage reported in bug 1082001. Ben or Kyle, can you help me understanding ?
Flags: needinfo?(kyle)
Flags: needinfo?(bent.mozilla)
Whoops. Replies on the wrong bug. Listening to inner-window-destroyed is fine. However, bug 1065128 moved certain calls to dom-window-destroyed because both settings and identity were calling sendAsyncMessage from inner-window-destroyed, which doesn't work because at that point there's no window. However, when I fixed this for settings in bug 1065128, I made SettingsManager::cleanup() be called in dom-window-destroyed, which also involves things like unregistering listeners and settings the cached window object to null. What we could do is only make the requires sendAsyncMessage calls in dom-window-destroyed, then do everything else in inner-window-destroyed? Don't know if that'd fix this or not.
Flags: needinfo?(kyle)
Flags: needinfo?(bent.mozilla)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: