Changing Geolocation settings on flame causes settings observers to fail

VERIFIED FIXED in Firefox 34, Firefox OS v2.1

Status

()

VERIFIED FIXED
4 years ago
3 years ago

People

(Reporter: qdot, Assigned: qdot)

Tracking

({regression})

unspecified
2.1 S5 (26sep)
x86_64
Linux
regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:2.1+, firefox33 wontfix, firefox34 verified, firefox35 verified, b2g-v1.4 unaffected, b2g-v2.0 unaffected, b2g-v2.0M unaffected, b2g-v2.1 verified, b2g-v2.2 verified)

Details

(Whiteboard: [systemsfe])

Attachments

(10 attachments, 4 obsolete attachments)

99.19 KB, text/plain
Details
106.02 KB, patch
Details | Diff | Splinter Review
543.45 KB, text/plain
Details
493.21 KB, text/plain
Details
373.68 KB, text/plain
Details
498.70 KB, text/plain
Details
1.74 KB, patch
ferjm
: review+
Details | Diff | Splinter Review
1.84 KB, patch
bent.mozilla
: review+
Details | Diff | Splinter Review
2.76 KB, patch
bent.mozilla
: review+
Details | Diff | Splinter Review
500.44 KB, text/x-log
Details
STR:

1. Flash phone with flame-jb m-c from https://tbpl.mozilla.org/?rev=4d1793da0b96 (pvtbuilds link: https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/nightly/mozilla-central-flame-eng/2014/09/2014-09-09-04-02-19/)
2. Go through FTU
3. Open Settings
4. Open Language Panel
5. Change language. Language should change correctly
6. Close Language Panel. Go to main Settings app panel.
7. Turn off Geolocation
8. Open Language Panel
9. Change language.

Expected behavior:

Language changes

Actual behavior:

Language stays the same. On phone reboot, phone will lock up at boot screen due to "settings not found" errors.
 
PLEASE POST LOGCATS WITH ALL PHONE RUNS MENTIONED.
Ok, ran a few more times. I can get settings failures 100% of the time after changing geolocation. I always see IPC failures after that:

---

E/GeckoConsole(  279): 1410264816013	Identity.FxAccounts	DEBUG	watch: {213135a1-2f20-4d3f-aac3-cee77cebf544}
E/GeckoConsole(  279): 1410264816014	Identity.FxAccounts	DEBUG	Current rp flows: 2
I/Gecko   (  279): 1410264816018	FirefoxAccounts	ERROR	OFFLINE
E/GeckoConsole(  279): [JavaScript Error: "1410264816025	Identity.FxAccounts	ERROR	get silent assertion failed: {"error":"OFFLINE"}" {file: "resource://gre/modules/Log.jsm" line: 749}]
I/Gecko   (  279): IPDL protocol error: Handler for AsyncMessage returned error code
I/Gecko   (  279): 
I/Gecko   (  279): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C00A2,name=PContent::Msg_AsyncMessage) Processing error: message was deserialized, but the handler returned false (indicating failure)
I/Gecko   (  279): 
I/Gecko   (  279): [Parent 279] WARNING: waitpid failed pid:1255 errno:10: file ../../../gecko/ipc/chromium/src/base/process_util_posix.cc, line 261
I/Gecko   (  279): IPDL protocol error: Handler for AsyncMessage returned error code
I/Gecko   (  279): 
I/Gecko   (  279): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C00A2,name=PContent::Msg_AsyncMessage) Processing error: message was deserialized, but the handler returned false (indicating failure)
I/Gecko   (  279): 
I/Gecko   (  279): IPDL protocol error: Handler for AsyncMessage returned error code
I/Gecko   (  279): 
I/Gecko   (  279): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C00A2,name=PContent::Msg_AsyncMessage) Processing error: message was deserialized, but the handler returned false (indicating failure)
I/Gecko   (  279): 
E/GeckoConsole(  279): [JavaScript Error: "NS_ERROR_NOT_INITIALIZED: Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIMessageSender.sendAsyncMessage]" {file: "resource://gre/modules/AlarmService.jsm" line: 190}]
E/GeckoConsole( 1285): [JavaScript Warning: "Unknown property 'will-change'.  Declaration dropped." {file: "resource://gre-resources/full-screen-override.css" line: 24 column: 13 source: "  will-change: initial !important;"}]

---


However, I'm having problems repro'ing the device not booting afterward. On runs where the device will not run after reboot, I saw IndexedDB transaction errors after these IPC messages, so I have a feeling that may be a race condition based on transaction lifetimes. Will post logcats once I can fully repro.
Created attachment 8486782 [details]
Logcat after fresh flash using TPE tool

Kyle: Flashing the engineering build using the TPE tool, I am not seeing this issue. Attaching large logcat which includes rebooting.

Gaia   4acd3e69b263b54f4111e3586ff4ade84b49b4da
SourceStamp 6b8da5940f74
BuildID 20140909040219
Version 35.0a1
base image: v123

The language settings were fine, and after restarting the device I did not get any errors.
Attachment #8486782 - Attachment description: settings.txt → Logcat after fresh flash using TPE tool
Created attachment 8486804 [details]
Logcat - trying a second time with same STR

Flame-JB, confirming 512 memory. This is not the same device used as in the last attachment.

STR:
1. Fresh flash using TPE tool
2. Run through FTU
3. Open Settings - Change language to Italian
4. Close Language panel - Shut off Geo
5. Back to language panel - Change to Romanian
6. Reboot device.

Actual: 
*Language doesn't change. 
*After Reboot language is set to Romanian. 
*Phone doesn't lock up at reboot.
Adding qawanted to see if someone else can reproduce.
blocking-b2g: --- → backlog
Keywords: qawanted
Created attachment 8486873 [details]
logcat reproducing language issue but not reboot issue

I was able to reproduce the issue mentioned at step 9 of original STR where the language does not change after toggling Geolocation button, but NOT the issue where rebooting locks the phone.

The language issue repro rate is 3/3. It repro's using both our own flash tool and the flash tool provided at comment 0. It repro's on latest tinderbox eng master build, as well as the nightly eng build provided at comment 0.

The reboot lockup issue repro rate is 0/3.
QA Whiteboard: [QAnalyst-Triage?]
status-b2g-v2.2: --- → affected
Flags: needinfo?(jmitchell)
Keywords: qawanted
Actually, un-setting tracking flag since I wasn't able to repro the boot lockup issue which is being used in the title.
status-b2g-v2.2: affected → ---
I haven't been able to get the settings DB crash to repro again, so downgrading the title a bit and resetting tracking flag.

Turned on settings API logging. It looks like whatever the error in Comment #1 is killing also causes the SettingsRequestManager to drop permissions mappings for some principals for some reason, meaning we no longer call observers correctly. This is why it looks like settings doesn't work anymore. Things are getting set, but no observer is ever getting called. This may or may not get divided out into a separate bug.
status-b2g-v2.2: --- → affected
Summary: Flame will not boot after settings change → Changing Geolocation settings on flame causes settings observers to fail
Created attachment 8486882 [details]
Logcat of settings crash

And of course the second I say I can't repro the settings crash, I do. :|

Repro was just "Flash phone, run adb reboot after FTU" comes up.

Have attached logcat of phone after reboot to show what settings crash looks like. Unforutnately it seems to have truncated the gecko logs so it doesn't show boot?
Created attachment 8486883 [details]
Logcat of settings crash (v2)

Whenever the phone gets into this state it will ALWAYS crash on reboot, so I managed to get a full log.
Attachment #8486882 - Attachment is obsolete: true
Looking at Comment 1 again, I just realized I'm ALWAYS seeing those FxAccounts lines before IPC failures, which look like they're killing message managers when they happen. Trying to debug this now.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
It looks like this is due to things that require principal checks for sendAsyncMessage (in the case of geolocation, identity and settings as part of the findmydevice app) running during inner-window-destroyed observation. Trying to figure out if something in destruction order changed recently, as findmydevice has always done this. Also trying to identify if this is a red herring in relation to the observers disappearing.
QA Wanted - I think we need branch checks here if we can reproduce this.
QA Whiteboard: [QAnalyst-Triage+]
Keywords: qawanted
This branch check is ONLY for changing language fails after toggling Geolocation button issue. I was NOT able to reproduce the reboot phone crash bug so this branch check doesn't include the crash bug.

This issue occurs on Flame 2.2 and Open C 2.2. Changing language fails after toggling Geolocation button.

Device: Flame 2.2 Master
BuildID: 20140910060915
Gaia: f108c706fae43cd61628babdd9463e7695b2496e
Gecko: 843332cc69af
Version: 35.0a1 (2.2 Master)
Firmware: V123
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

Device: Open_C 2.2 Master
BuildID: 20140910060915
Gaia: f108c706fae43cd61628babdd9463e7695b2496e
Gecko: 843332cc69af
Version: 35.0a1 (2.2 Master)
Firmware: P821A10v1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

--------------

This issue does NOT occur on Flame 2.1. Toggling Geolocation button doesn't affect language changing. Repro rate: 0/4.

Device: Flame 2.1
BuildID: 20140910122556
Gaia: 6cb9de2c10070dc638ed9bc083ad5ff9a065db36
Gecko: fcbfbd3cba3f
Version: 34.0a2 (2.1)
Firmware: V123
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0
QA Whiteboard: [QAnalyst-Triage?]
status-b2g-v2.1: --- → unaffected
Flags: needinfo?(jmitchell)
Keywords: qawanted → regression
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
Ok, well, cleaned up the IPC issues, but that wasn't the problem. It seems that the way SettingsRequestManager indexes message manager principals means we only ever store one principal (this is due to me making the mistake of thinking objects can be keys :( ), and if that gets deleted (which happens when findmydevice shuts down when we turn off geolocation), we no longer have any principal to check against. Not only that, this means that we use the permissions of whatever principal is in the list for ALL checks on observers, which is bad.

Need to figure out a new way to match message managers to principals. That should fix this right up.
Created attachment 8488137 [details] [diff] [review]
Patch 1 (v1) - Make Identity API use dom-window-destroyed for shutdown

Calling SendAsyncMessage after inner-window-destroyed is bad. Not only do we not have a principal at that point, the permissions check in the parent fails even if we do manage to send the message by caching the principal. Calling this on dom-window-destroyed fixes the error.
Attachment #8488137 - Flags: review?(ferjmoreno)
Created attachment 8488141 [details] [diff] [review]
Patch 2 (v1) - Make SettingsManager use dom-window-destroyed for shutdown

Same case as patch 1. SendAsyncMessage after inner-window-destroyed is a bad idea. Moving this to dom-window-destroyed fixes permissions issues.
Attachment #8488141 - Flags: review?(bent.mozilla)
Created attachment 8488142 [details] [diff] [review]
Patch 3 (v1) - Fix SettingsRequestManager storage of MessageManager Principals

Trying to use an object as a hashmap means we converted objects to strings, and since we were trying to use an XPCOM object as the key, it always converted to the same string. This lookup table is O(n) for all operations, but we'll never have THAT many observers.
Attachment #8488142 - Flags: review?(bent.mozilla)
Created attachment 8488153 [details] [diff] [review]
Patch 3 (v2) - Fix SettingsRequestManager storage of MessageManager Principals

Whitespace and debug message cleanup
Attachment #8488142 - Attachment is obsolete: true
Attachment #8488142 - Flags: review?(bent.mozilla)
Attachment #8488153 - Flags: review?(bent.mozilla)
Comment on attachment 8488153 [details] [diff] [review]
Patch 3 (v2) - Fix SettingsRequestManager storage of MessageManager Principals

Switching this to use a Map type.
Attachment #8488153 - Flags: review?(bent.mozilla)
Created attachment 8488201 [details] [diff] [review]
Patch 3 (v3) - Fix SettingsRequestManager storage of MessageManager Principals

MessageManager Principals now uses Map type, which will handle XPCOM objects as keys.
Attachment #8488153 - Attachment is obsolete: true
Attachment #8488201 - Flags: review?(bent.mozilla)

Updated

4 years ago
Whiteboard: [systemfe] → [systemsfe]
Comment on attachment 8488201 [details] [diff] [review]
Patch 3 (v3) - Fix SettingsRequestManager storage of MessageManager Principals

Review of attachment 8488201 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/settings/SettingsRequestManager.jsm
@@ +742,3 @@
>    },
>  
>    removeMessageManager: function(aMsgMgr){

nit: )<space>{
Created attachment 8488245 [details] [diff] [review]
Patch 3 (v4) - Fix SettingsRequestManager storage of MessageManager Principals

Fix nits pointed out by :mikeh
Attachment #8488201 - Attachment is obsolete: true
Attachment #8488201 - Flags: review?(bent.mozilla)
Attachment #8488245 - Flags: review?(bent.mozilla)
Josh - Can you provide a blocking triage here?
Flags: needinfo?(jmitchell)
nomming for 2.2 - failing to set the correct language is bad UX at best and possibly a Cert issue
blocking-b2g: backlog → 2.2?
Flags: needinfo?(jmitchell)
qDot: IT PASSES!! \o/ \o/

*ahem* ...confirming that with the above patches applied, I no longer see the issue that was blocking my patch in bug 1053966.
Comment on attachment 8488137 [details] [diff] [review]
Patch 1 (v1) - Make Identity API use dom-window-destroyed for shutdown

Review of attachment 8488137 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Kyle!

::: dom/identity/nsDOMIdentity.js
@@ +707,5 @@
>  
>    // nsIObserver
>    observe: function nsDOMIdentityInternal_observe(aSubject, aTopic, aData) {
> +    let window = aSubject.QueryInterface(Ci.nsIDOMWindow);
> +    if(window != this._window) {

nit: if<space>(

@@ +753,3 @@
>      // innerwindow id to construct the unique id.
>      this._id = uuidgen.generateUUID().toString();
>      this._innerWindowID = util.currentInnerWindowID;

It seems that this is not used anymore so it can be remove.
Attachment #8488137 - Flags: review?(ferjmoreno) → review+
Attachment #8488141 - Flags: review?(bent.mozilla) → review+
Comment on attachment 8488245 [details] [diff] [review]
Patch 3 (v4) - Fix SettingsRequestManager storage of MessageManager Principals

Review of attachment 8488245 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/settings/SettingsRequestManager.jsm
@@ +707,5 @@
> +      // Make sure we're not leaking since Map requires management by
> +      // hand.
> +      let i = 0;
> +      for (let it of this.mmPrincipals.keys()) {
> +        i = i + 1;

Can't you just use the 'size' property?
Attachment #8488245 - Flags: review?(bent.mozilla) → review+
Comment on attachment 8488137 [details] [diff] [review]
Patch 1 (v1) - Make Identity API use dom-window-destroyed for shutdown

Approval Request Comment
[Feature/regressing bug #]: Bug 1061510
[User impact if declined]: Settings observers never fire
[Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug
[Risks and why]: None
[String/UUID change made/needed]: None
Attachment #8488137 - Flags: approval-mozilla-aurora?
Comment on attachment 8488141 [details] [diff] [review]
Patch 2 (v1) - Make SettingsManager use dom-window-destroyed for shutdown

Approval Request Comment
[Feature/regressing bug #]: Bug 1061510
[User impact if declined]: Settings observers never fire
[Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug
[Risks and why]: None
[String/UUID change made/needed]: None
Attachment #8488141 - Flags: approval-mozilla-aurora?
Comment on attachment 8488245 [details] [diff] [review]
Patch 3 (v4) - Fix SettingsRequestManager storage of MessageManager Principals

Approval Request Comment
[Feature/regressing bug #]: Bug 1061510
[User impact if declined]: Settings observers never fire
[Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug
[Risks and why]: None
[String/UUID change made/needed]: None
Comment on attachment 8488245 [details] [diff] [review]
Patch 3 (v4) - Fix SettingsRequestManager storage of MessageManager Principals

Approval Request Comment
[Feature/regressing bug #]: Bug 1061510
[User impact if declined]: Settings observers never fire
[Describe test coverage new/current, TBPL]: Mostly QA hands on due to OOP nature of bug
[Risks and why]: None
[String/UUID change made/needed]: None
Attachment #8488245 - Flags: approval-mozilla-aurora?
2.1 will be affected once bug 1061510 is uplifted. It just got approval-aurora+'d, so marking this as affected now.
status-b2g-v2.1: unaffected → affected

Updated

4 years ago
blocking-b2g: 2.2? → 2.1+

Updated

4 years ago
Attachment #8488137 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

4 years ago
Attachment #8488141 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

4 years ago
Attachment #8488245 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

4 years ago
Blocks: 1063510

Updated

4 years ago
Blocks: 1062069

Updated

4 years ago
Blocks: 1053793
Target Milestone: --- → 2.1 S5 (26sep)
It looks like this commit made the b2gperf and eideticker integration tests much less reliable, causing the b2g process to crash when we're interacting with the phone using marionette:

http://eideticker-ci1.ateam.phx1.mozilla.com:8080/job/b2g.flame-512.b2g-inbound.tests/
http://jenkins1.qa.scl3.mozilla.com:8080/view/Performance/job/flame-319.b2g-inbound.perf.b2gperf/

It looks like the unreliability started within this range: http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=2a7cae8b0b5d&tochange=ee4e51c71b03

And this patch is the most probably culprit. Needinfo'ing :zac to followup
Flags: needinfo?(zcampbell)
(In reply to William Lachance (:wlach) from comment #37)
> It looks like this commit made the b2gperf and eideticker integration tests
> much less reliable, causing the b2g process to crash when we're interacting
> with the phone using marionette:
> 
> http://eideticker-ci1.ateam.phx1.mozilla.com:8080/job/b2g.flame-512.b2g-
> inbound.tests/
> http://jenkins1.qa.scl3.mozilla.com:8080/view/Performance/job/flame-319.b2g-
> inbound.perf.b2gperf/
> 
> It looks like the unreliability started within this range:
> http://hg.mozilla.org/integration/b2g-inbound/
> pushloghtml?fromchange=2a7cae8b0b5d&tochange=ee4e51c71b03
> 
> And this patch is the most probably culprit. Needinfo'ing :zac to followup

Marking NO_UPLIFT till this is clarified.
Whiteboard: [systemsfe] → [systemsfe][NO_UPLIFT]

Comment 40

4 years ago
OK I've run the b2gperf for the two builds in the regression range of comment #37.

The b2gperf test launches the Settings app, measures the launch, kills it, waits 10 seconds launches again, 30 times over.

To gain confidence in the last good, I ran 3 times the test run of 30 launches with no failures.

On the first failing, it failed after 3 launches.

First failing b2g-inbound: 
build ID 20140915134152
changeset https://tbpl.mozilla.org/?tree=B2g-Inbound&showall=1&rev=d4b6ecaee33c

Last good b2g-inbound: 
Build ID: 20140915131232
changeset: https://tbpl.mozilla.org/?tree=B2g-Inbound&showall=1&rev=3d4b867d6bc7

Logcat coming up.
Flags: needinfo?(zcampbell)

Comment 41

4 years ago
Created attachment 8490653 [details]
b2gperf-20140915134152.log

Logcat for b2gperf failures on build 20140915134152
status-b2g-v2.2: affected → fixed
Moving b2gperf failures to bug 1068962, since trying to follow it here is getting messy and these patches have already landed to m-c.

If this gets backed out, announce it on dev-b2g, because you're going to break everyone using nightlies. This is a problem between how the perf tests does app launch/kill at the settings API.
(In reply to Kyle Machulis [:kmachulis] [:qdot] (USE NEEDINFO?) from comment #42)
> Moving b2gperf failures to bug 1068962, since trying to follow it here is
> getting messy and these patches have already landed to m-c.
> 
> If this gets backed out, announce it on dev-b2g, because you're going to
> break everyone using nightlies. This is a problem between how the perf tests
> does app launch/kill at the settings API.

In that case, it sounds like we can't backout here. We'll just have to resolve bug 1068962 before uplifting.
Dependency fixed, so we're good to uplift here.
Whiteboard: [systemsfe][NO_UPLIFT] → [systemsfe]
status-firefox33: --- → wontfix
status-firefox34: --- → fixed
status-firefox35: --- → fixed
Duplicate of this bug: 1060217

Comment 47

4 years ago
Verified fixed on Flame 2.2 (319mb/full flash) and Flame 2.1 (319mb/full flash)

Actual result: The language changes after toggling geolocation, no errors displayed upon reboot.

Device: Flame 2.2
BuildID: 20141011040204
Gaia: 95f580a1522ffd0f09302372b78200dab9b6f322
Gecko: 3f6a51950eb5
Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf
Version: 35.0a1 (2.2)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

Device: Flame 2.1
BuildID: 20141011000201
Gaia: f5d4ff60ffed8961f7d0380ada9d0facfdfd56b1
Gecko: d813d79d3eae
Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf
Version: 34.0a2 (2.1)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
status-b2g-v2.1: fixed → verified
status-b2g-v2.2: fixed → verified
status-firefox34: fixed → verified
status-firefox35: fixed → verified
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
This is making Mulet unable to start most of the time, as reported in bug 1082001. I had a look but I could not understand exactly the reason why changing inner-window-destroyed to dom-window-destroyed would fix the present bug.

When we revert this bug, we don't experience anymore the mulet breakage reported in bug 1082001.

Ben or Kyle, can you help me understanding ?
Flags: needinfo?(kyle)
Flags: needinfo?(bent.mozilla)
Whoops. Replies on the wrong bug.

Listening to inner-window-destroyed is fine. However, bug 1065128 moved certain calls to dom-window-destroyed because both settings and identity were calling sendAsyncMessage from inner-window-destroyed, which doesn't work because at that point there's no window.

However, when I fixed this for settings in bug 1065128, I made SettingsManager::cleanup() be called in dom-window-destroyed, which also involves things like unregistering listeners and settings the cached window object to null. What we could do is only make the requires sendAsyncMessage calls in dom-window-destroyed, then do everything else in inner-window-destroyed? Don't know if that'd fix this or not.
Flags: needinfo?(kyle)
Flags: needinfo?(bent.mozilla)
You need to log in before you can comment on or make changes to this bug.