Closed Bug 712715 Opened 10 years ago Closed 9 years ago

Intermittent failure in test_addon_nonrestartless_xpi.js

Categories

(Firefox :: Sync, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla13

People

(Reporter: gps, Assigned: gps)

References

Details

(Whiteboard: [qa-])

Attachments

(1 file, 2 obsolete files)

test_addon_nonrestartless_xpi.js fails intermittently. This is likely due to bug 710448.
Attached patch Work around TPS race condition (obsolete) — Splinter Review
At least part of the problem is a race condition between the TPS test runner and code in the addons engine. This change works around the race by forcing an additional sync.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Attachment #590976 - Flags: review?(rnewman)
Am I right in thinking that the two syncs are essentially glorified sleep() calls?
And by "two" I mean "extra", of course.
(In reply to Richard Newman [:rnewman] from comment #2)
> Am I right in thinking that the two syncs are essentially glorified sleep()
> calls?

Yup.
OK, then please choose one of these, in order of increasing preference:

* Comment that you're performing a sync for the side-effect of taking time, not because you want to ensure that some synchronization has occurred.

* Add a sleep operation to TPS, so at least the test is explicit and doesn't introduce another fallible network hit.

* Add a timeout-controlled polling operation to TPS, so we can poll Addons.verifyNot for some period of time. The timeout can be much longer than the sleep that we'd be willing to use.

* Directly perform whatever side-effect syncing has. If we need to spin the event loop 9 times to ensure that the addons engine is fully set up, we can do that and say why. See head_helpers.js, waitForZeroTimer, which is AWFUL but necessary, and used six times in our unit tests:

  function waitForZeroTimer(callback) {
    // First wait >100ms (nsITimers can take up to that much time to fire, so
    // we can account for the timer in delayedAutoconnect) and then two event
    // loop ticks (to account for the Utils.nextTick() in autoConnect).

* Somehow tie the appropriate TPS operations to events to achieve a strict order, rather than trying to impose an order by guessing delays. I don't know how feasible this is.

I would love to get to Level 4 in this sequence, but I'll settle for Level 1 and a followup to do better!
Attached patch Make TPS wait for Sync events (obsolete) — Splinter Review
I believe this patch achieves the highest desirable outcome from rnewman's list.

There are effectively 3 parts, but they are all related to the same systemic issue: race conditions between Sync and TPS actions.

1) *all* TPS phases now wait on weave:service:ready before starting. Before, it was possible you could start performing actions before Sync was initialized. I think we just got lucky in that it didn't (most of the time).

2) When logging in and when resetting data, TPS will wait for the start tracking event to fire before proceeding. Before, it did not wait on this.

3) A new action "EnsureTracking" is available and utilized by the addons tests. This action ensures Sync is logged in and that tracking is active. This works around the original issue in this bug. It is required by the addons engine only because the addons reconciler waits for start-tracking before becoming active. Before, we had a race condition between Sync and TPS and the reconciler wasn't fully initialized before TPS starting performing actions. Since it wasn't initialized, performed actions were getting lost. These were discovered on the 2nd sync (after start-tracking was called for the first time) and the engine regained a proper view on the world. So, I'm confident there was no bug inside the addons engine.

I've confirmed all TPS tests pass with this applied. However, I did also have bug 710448 applied, as that addresses buggy behavior exposed by TPS tests. I'm not sure if TPS tests will run cleanly with just this patch.
Attachment #590976 - Attachment is obsolete: true
Attachment #590976 - Flags: review?(rnewman)
Attachment #591313 - Flags: review?(rnewman)
Comment on attachment 591313 [details] [diff] [review]
Make TPS wait for Sync events

Review of attachment 591313 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/sync/tps/extensions/tps/modules/tps.jsm
@@ +547,5 @@
> +      if (!Weave.Status.ready) {
> +        Logger.logInfo("Waiting on Sync service to become ready...");
> +        let cb = Async.makeSyncCallback();
> +        Svc.Obs.add("weave:service:ready", cb);
> +        Async.waitForSyncCallback(cb);

I tend to prefer the brevity and OO of

  let cb = Async.makeSpinningCallback();
  Svc.Obs.add("weave:service:ready", cb);
  cb.wait();

but no big deal.

@@ +557,5 @@
> +      // service:ready event, this is required to ensure all handlers have
> +      // executed.
> +      Utils.nextTick(function() {
> +        this._executeTestPhase(file, phase, settings);
> +      }, this);

Equivalent to

  Utils.nextTick(this._executeTestPhase.bind(this, file, phase, settings));

no?

@@ +572,5 @@
> +   * This is called by RunTestPhase() after the environment is validated.
> +   */
> +  _executeTestPhase: function _executeTestPhase(file, phase, settings) {
> +    try {
> +      // TODO Unregister observers on unload.

Bug number?

@@ +577,4 @@
>        Services.obs.addObserver(this, "weave:service:sync:finish", true);
>        Services.obs.addObserver(this, "weave:service:sync:error", true);
> +      Services.obs.addObserver(this, "weave:engine:start-tracking", true);
> +      Services.obs.addObserver(this, "weave:engine:stop-tracking", true);

Can we add these first? Makes sense from a chronological perspective.

@@ +720,5 @@
> +      let cb = Async.makeSyncCallback();
> +      Svc.Obs.add("weave:engine:start-tracking", cb);
> +      Async.waitForSyncCallback(cb);
> +      Svc.Obs.remove("weave:engine:start-tracking", cb);
> +      Logger.logInfo("weave:engine:start-tracking observed!");

This pattern has come up twice. How about

  function waitForEvent(event) {
    let cb = Async.makeSpinningCallback();
    Svc.Obs.add(event, cb);
    cb.wait();
    Svc.Obs.remove(event, cb);
    Logger.logInfo(event + " observed!");
  }

@@ +725,5 @@
> +    }
> +
> +    let cb = Async.makeSyncCallback();
> +    Utils.nextTick(cb);
> +    Async.waitForSyncCallback(cb);

Ooh, janky.

@@ +796,5 @@
>      Logger.AssertEqual(Weave.Status.service, Weave.STATUS_OK, "Weave status not OK");
>      Weave.Svc.Obs.notify("weave:service:setup-complete");
>      this._loggedIn = true;
> +
> +    this.waitForTracking();

start-tracking is notified in onStartup, which is in charge of the basics like creating IDs and registering engines. It seems really backward to be logging in and wiping the server before this has happened.

Does anything break if you wait for tracking *before* the login call?
Attachment #591313 - Flags: review?(rnewman) → feedback+
(In reply to Richard Newman [:rnewman] from comment #7)
> @@ +725,5 @@
> > +    }
> > +
> > +    let cb = Async.makeSyncCallback();
> > +    Utils.nextTick(cb);
> > +    Async.waitForSyncCallback(cb);
> 
> Ooh, janky.

You are telling me. What would you prefer? A timer?

> 
> @@ +796,5 @@
> >      Logger.AssertEqual(Weave.Status.service, Weave.STATUS_OK, "Weave status not OK");
> >      Weave.Svc.Obs.notify("weave:service:setup-complete");
> >      this._loggedIn = true;
> > +
> > +    this.waitForTracking();
> 
> start-tracking is notified in onStartup, which is in charge of the basics
> like creating IDs and registering engines. It seems really backward to be
> logging in and wiping the server before this has happened.
> 
> Does anything break if you wait for tracking *before* the login call?

I /think/ I tried this and encountered hangs waiting for tracking.

The login/setup code in tps.jsm is really janky. I'm pretty sure it is doing a lot of things wrong. However, the login aspects of service.js are one of the parts of Sync I still don't grok too well, so I can't say anything beyond it tingles my spider senses.
(In reply to Gregory Szorc [:gps] from comment #8)

> > Ooh, janky.
> 
> You are telling me. What would you prefer? A timer?

Heh! Naw. Not a comment about the code; a comment about the architecture

Other than explicitly lifting the event loop spinning code in Async ("Async.spinFor(1);"), there's nothing better :/

Ugly systemic design means ugly code.

> > Does anything break if you wait for tracking *before* the login call?
> 
> I /think/ I tried this and encountered hangs waiting for tracking.

I would encourage you to try again, on the off chance that it works. If it works, it'll avoid a whole class of possible bugs.

(I'm not even sure how it manages to work right now!)

> The login/setup code in tps.jsm is really janky. I'm pretty sure it is doing
> a lot of things wrong. However, the login aspects of service.js are one of
> the parts of Sync I still don't grok too well, so I can't say anything
> beyond it tingles my spider senses.

Having reworked it a couple of times, I can assure you that it's awful and doesn't make any sense.

I'm happy to let this through with the other nits addressed (if it breaks, we'll know about it!), but I'd like you to try the tracking-first approach, make sure your recollection is correct.
(In reply to Richard Newman [:rnewman] from comment #9)
> I'm happy to let this through with the other nits addressed (if it breaks,
> we'll know about it!), but I'd like you to try the tracking-first approach,
> make sure your recollection is correct.

I moved the "wait for tracking" call to before Service.login() and the test runner hangs. I think this is happening because the user prefs aren't defined until the first time Service.login() and thus onStartup() doesn't have a sane environment, so it never emits start-tracking. I could be wrong.
I think I hit all the feedback points.
Attachment #591313 - Attachment is obsolete: true
Attachment #592942 - Flags: review?(rnewman)
Comment on attachment 592942 [details] [diff] [review]
Make TPS wait for Sync events, v2

r+, but please first explore setting the credentials prefs and inverting the operations, see if that works.
Attachment #592942 - Flags: review?(rnewman) → review+
https://hg.mozilla.org/services/services-central/rev/b6968b815726

Will file follow-up for review feedback.
Whiteboard: [fixed in services]
Blocks: 723764
Whiteboard: [fixed in services] → [fixed in services][qa-]
https://hg.mozilla.org/mozilla-central/rev/b6968b815726
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed in services][qa-] → [qa-]
Target Milestone: --- → mozilla13
This was really fixed by bug 712542. The changes in this bug did hopefully address timing issues in the TPS test harness, however.
Depends on: 712542
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.