Closed Bug 989045 Opened 6 years ago Closed 6 years ago

[tps] Registering of observer topics happens too late so 'weave:engine:start-tracking' can be missed during startup

Categories

(Testing Graveyard :: TPS, defect, major)

defect
Not set
major

Tracking

(firefox29 wontfix, firefox30 fixed, firefox31 fixed)

RESOLVED FIXED
mozilla31
Tracking Status
firefox29 --- wontfix
firefox30 --- fixed
firefox31 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Keywords: hang)

Attachments

(2 files)

Executing the add-on tests some are failing after a restart in waiting for 'weave:engine:start-tracking'. Not sure yet, if that is a problem with the tests or the tps framework. I will investigate.

Here the hanging lines:
CROSSWEAVE INFO: Waiting for weave:service:ready...
CROSSWEAVE INFO: weave:service:ready observed!
CROSSWEAVE INFO: Starting phase 3/8
CROSSWEAVE INFO: setting client.name to profile1
CROSSWEAVE INFO: ----------event observed: sessionstore-windows-restored
CROSSWEAVE INFO: starting action: EnsureTracking
CROSSWEAVE INFO: Waiting for weave:engine:start-tracking...
Actually we hang in those phases when EnsureTracking() is called as first action. I'm not exactly sure yet what's the exact purpose of it. Jonathan, any idea?
Summary: [tps] Some add-on tests are hanging after a browser restart → [tps] Some add-on tests hang in EnsureTracking()
Ok, this method has been implemented along with bug 712715 to ensure that sync is definitely active before starting the steps in the phase. So it might be a regression in Sync 1.5. I will do some more testing.
fxaccounts:
CROSSWEAVE INFO: Firefox Accounts enabled: true
1395953994883 addons.manager  WARN  Exception calling callback
CROSSWEAVE INFO: Waiting for weave:service:ready...
CROSSWEAVE INFO: weave:service:ready observed!
CROSSWEAVE INFO: Starting phase 5/16
CROSSWEAVE INFO: setting client.name to profile1
CROSSWEAVE INFO: ----------event observed: sessionstore-windows-restored
CROSSWEAVE INFO: starting action: EnsureTracking
CROSSWEAVE INFO: Waiting for weave:engine:start-tracking...

old:
CROSSWEAVE INFO: Firefox Accounts enabled: false
CROSSWEAVE INFO: Waiting for weave:service:ready...
1395953825539 addons.manager  WARN  Exception calling callback
CROSSWEAVE INFO: weave:service:ready observed!
CROSSWEAVE INFO: Starting phase 5/16
CROSSWEAVE INFO: setting client.name to profile1
CROSSWEAVE INFO: ----------event observed: sessionstore-windows-restored
CROSSWEAVE INFO: starting action: EnsureTracking
CROSSWEAVE INFO: Setting client credentials and login.
CROSSWEAVE INFO: Logging in user: crossweaveservices@mozilla.com
CROSSWEAVE INFO: ----------event observed: weave:service:setup-complete
CROSSWEAVE INFO: ----------event observed: weave:engine:start-tracking

So with fxaccounts we do not login anymore for that phase given that the user is logged in, while for the old sync auth was logged out?

Edwin mind checking test_addon_nonrestartless_xpi.js? Do you see something suspicious? It might be that Sync 1.1 was broken in a way that we needed this method to explicitely login and to wait for tracking. 

For fxaccounts I can remove this call and the test works fine. So I think we can remove this call for fxaccounts but not for the old sync authentication.
Flags: needinfo?(edwong)
Summary: [tps] Some add-on tests hang in EnsureTracking() → [tps] Tests calling EnsureTracking() as first step in a phase will hang with fxaccount authentication
I can look at this tomorrow.
Blocks: 981706
No longer depends on: 981706
For reference these are the affected tests:
> test_addon_restartless_xpi.js
> test_addon_nonrestartless_xpi.js
So from the other bug 712715:

A new action "EnsureTracking" is available and utilized by the addons tests. This action ensures Sync is logged in and that tracking is active. This works around the original issue in this bug. It is required by the addons engine only because the addons reconciler waits for start-tracking before becoming active. Before, we had a race condition between Sync and TPS and the reconciler wasn't fully initialized before TPS starting performing actions. Since it wasn't initialized, performed actions were getting lost. These were discovered on the 2nd sync (after start-tracking was called for the first time) and the engine regained a proper view on the world. So, I'm confident there was no bug inside the addons engine.

So as it looks like we are missing the 'weave:engine:start-tracking' observer notification sent by Sync when you are logged in via Firefox Accounts.
Depends on: 989282
> So with fxaccounts we do not login anymore for that phase given that the
> user is logged in, while for the old sync auth was logged out?

For Sync 1.5, we auth, then we get a token 1 time with ttl of ~12hrs. We don't fetch token again until there's a browser restart or token expiry.
 
> Edwin mind checking test_addon_nonrestartless_xpi.js? Do you see something
> suspicious? It might be that Sync 1.1 was broken in a way that we needed
> this method to explicitely login and to wait for tracking. 

Without digging too deep, it's likely sync doesn't start till we have a sync token so tracking isn't used anymore?  Just guessing.

> 
> For fxaccounts I can remove this call and the test works fine. So I think we
> can remove this call for fxaccounts but not for the old sync authentication.

this sounds good
Flags: needinfo?(edwong)
(In reply to Edwin Wong [:edwong] from comment #7)
> > Edwin mind checking test_addon_nonrestartless_xpi.js? Do you see something
> > suspicious? It might be that Sync 1.1 was broken in a way that we needed
> > this method to explicitely login and to wait for tracking. 
> 
> Without digging too deep, it's likely sync doesn't start till we have a sync
> token so tracking isn't used anymore?  Just guessing.

Tracking is still used. Not sure why it's not send here. Also given the above comment the addons module waits for it to start the reconciler. I imagine it is a problem with Sync. Whom can we ask for more detailed information here?
 
> > For fxaccounts I can remove this call and the test works fine. So I think we
> > can remove this call for fxaccounts but not for the old sync authentication.
> 
> this sounds good

Meanwhile I'm not hesitated to do that right now. As said above I assume there might be a problem with Sync 1.5 here.
With enabled Trace mode I can see the following:

CROSSWEAVE INFO: starting action: EnsureTracking
1396042757354	Sync.Tracker.Tabs	TRACE	onTab event: pageshow
CROSSWEAVE INFO: Waiting for weave:engine:start-tracking...
1396042757790	Sync.RESTResponse	TRACE	Processing response headers.
1396042757794	Sync.BrowserIDManager	DEBUG	Getting a token
1396042759032	Sync.RESTResponse	TRACE	Processing response headers.
1396042759033	Sync.RESTResponse	TRACE	Processing response headers.
1396042759033	Sync.BrowserIDManager	DEBUG	Successfully got a sync token
1396042759035	Sync.BulkKeyBundle	INFO	BulkKeyBundle being created for undefined
1396042759036	Sync.BrowserIDManager	INFO	Background fetch for key bundle done
1396042759036	Sync.Status	DEBUG	Status.login: success.login => success.login
1396042759036	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1396042759748	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1396042766513	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok

So the sync status is set to ok, but we actually never fire start-tracking. Brian do you have an idea or is that part outside of your expertise?
Flags: needinfo?(warner-bugzilla)
Blocks: 989249
Attached file tps.log (trace)
Trace log for the test_addon_nonrestartless_xpi.js test. I will dig through all that by tomorrow if I find the time.
I don't know how Sync's internals work.. my knowledge ends at the point where we fire the onlogin event and offer the keys+assertion to Sync. Sorry :).
Flags: needinfo?(warner-bugzilla)
So 'weave:engine:start-tracking' is currently getting fired at two places:

1. Very early during the start-up phase in onStartup():
http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/service.js#356

Given that with FxAccounts we are logged in, we would have to get into this code and the notification fired. It might be that we are subscribing too late for that notification in TPS.

2. By the 'weave:service:setup-complete' observer:
http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/service.js#469

Given that we run through those steps only once after the fxaccount login, it will never be send again. We usually see this after login, so this should not be the code we are looking for.


I will do some debugging in that module and check why we don't see this notification.
Ok, so I was right with my assumption that TPS is registering for those observer notifications too late. Moving the registering to the new _init() method makes it passing our tests, and no more hang occurs. I will do some more tests and also check with the old sync authentication before uploading the patch.
Component: Server: Sync → TPS
Product: Mozilla Services → Testing
Summary: [tps] Tests calling EnsureTracking() as first step in a phase will hang with fxaccount authentication → [tps] Registering of observer topics happens too late so 'weave:engine:start-tracking' can be missed during startup
Keywords: hang
Comment on attachment 8399561 [details] [diff] [review]
Early registering of observer topics v1

Review of attachment 8399561 [details] [diff] [review]:
-----------------------------------------------------------------

Good detective work!
Attachment #8399561 - Flags: review?(jgriffin) → review+
https://hg.mozilla.org/mozilla-central/rev/ecda58bb90a7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 8399561 [details] [diff] [review]
Early registering of observer topics v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): None
User impact if declined: None, just testing code
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): tps tests can hang
String or IDL/UUID changes made by this patch: None
Attachment #8399561 - Flags: approval-mozilla-aurora?
The following changeset is now in Firefox Nightly:

> ecda58bb90a7 Bug 989045 - [tps] Registering of observer topics happens too late so 'weave:engine:start-tracking' can be missed during startup. r=jgriffin DONTBUILD

Nightly Build Information:

        ID: 20140402030201
 Changeset: 4941a2ac0786109b08856738019b016a6c5a66a6
   Version: 31.0a1
      TBPL: https://tbpl.mozilla.org/?rev=4941a2ac0786
       URL: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-02-03-02-01-mozilla-central

Download Links:

>         Linux x86: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-02-03-02-01-mozilla-central/firefox-31.0a1.en-US.linux-i686.tar.bz2
>      Linux x86_64: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-02-03-02-01-mozilla-central/firefox-31.0a1.en-US.linux-x86_64.tar.bz2
> Linux x86_64 ASAN: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-02-03-02-01-mozilla-central/firefox-31.0a1.en-US.linux-x86_64-asan.tar.bz2
>               Mac: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-02-03-02-01-mozilla-central/firefox-31.0a1.en-US.mac.dmg
>             Win32: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-02-03-02-01-mozilla-central/firefox-31.0a1.en-US.win32.installer.exe
>             Win64: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-02-03-02-01-mozilla-central/firefox-31.0a1.en-US.win64-x86_64.installer.exe

Previous Nightly Build Information:

        ID: 20140401030203
 Changeset: 1417d180a1d8665b1a91b897d1cc4cc31e7980d4
   Version: 31.0a1
      TBPL: https://tbpl.mozilla.org/?rev=1417d180a1d8
       URL: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/04/2014-04-01-03-02-03-mozilla-central
Comment on attachment 8399561 [details] [diff] [review]
Early registering of observer topics v1

Given the low risk here and the affected on Beta, please also uplift there if it can land cleanly otherwise put up a new branch-specific patch for approval.
Attachment #8399561 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.