Closed Bug 1159200 Opened 9 years ago Closed 9 years ago

Long test suites fail due to a timeout in "before each" hook

Categories

(Testing Graveyard :: JSMarionette, defect)

defect
Not set
major

Tracking

(firefox40 affected)

RESOLVED FIXED
Tracking Status
firefox40 --- affected

People

(Reporter: davehunt, Assigned: aus)

References

Details

Attachments

(2 files)

I found this while investigating the failure after landing bug 1145680. It appears that after a certain time the test harness fails to launch the B2G application correctly.

The attached test case simply launches the Setting application each time. Eventually this fails and the homescreen is displayed. This is blocking us from landing bug 1145680, as for some unknown reason the patch there causes this to happen earlier. I suspect sooner or later another patch would have the same effect.

Locally I have had this fail on the 21st iteration of the test. At the point the B2G application is showing the homescreen if you close the application the suite appears to recover. This might suggest that for some reason the application has launched unexpectedly and the harness is simultaneously attempting to launch the application, but times out. The following is the exception details shown on the console:

  1) Repeat "before each" hook:
     
  Error: timeout of 60000ms exceeded
      at null.<anonymous> (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:139:19)
      at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)

  2) Repeat "after each" hook:
     
  Error: Not connected. To write data you must call connect first.
      at TcpSync.send (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:162:15)
      at Object.send (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:459:36)
      at Object.Client._sendCommand (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:509:21)
      at Object.closeDriver (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:768:14)
      at Object.executeHook (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:361:20)
      at Object.Client.runHook (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:379:7)
      at Object.destroySession [as deleteSession] (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:778:12)
      at /Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/node-extensions.js:25:20
      at doResolve (/Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/core.js:91:5)
      at new Promise (/Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/core.js:71:3)
      at /Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/node-extensions.js:17:12
      at Context.<anonymous> (/Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/lib/runtime/hostmanager.js:129:14)
      at callFn (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:223:21)
      at Hook.Runnable.run (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:216:7)
      at next (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runner.js:258:10)
      at Object._onImmediate (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runner.js:275:5)
      at processImmediate [as _immediateCallback] (timers.js:354:15)

  3) Repeat "after all" hook:
     
  Error: timeout of 60000ms exceeded
      at null.<anonymous> (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:139:19)
      at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)
Flags: needinfo?(kgrandon)
Flags: needinfo?(gaye)
@auswerk Can you help Dave with this one? I suspect it's related to your recent research into this...
Flags: needinfo?(kgrandon)
Flags: needinfo?(gaye)
Flags: needinfo?(aus)
@davehunt - Give this a try

marionette('Repeat', function() {

  var client = marionette.client();
  var settingsApp;

  setup(function() {
    settingsApp = new Settings(client);
    settingsApp.launch();
  });

  teardown(function() {
    settingsApp = null;
  });

  suite('Repeat suite', function() {
    for (var i = 0; i < 50; i++) {
      test(i, function() {});
    }
  });
});
Flags: needinfo?(aus)
Were you able to replicate it? It still fails for me with the teardown.
Flags: needinfo?(aus)
Ohhh, interesting! I have used that teardown to fix similar issues with other tests in the past. I have no attempted to replicate it locally yet using your test snippet.
Flags: needinfo?(aus)
:aus pointed out out IRC that this may be a duplicate of bug 1104285.
This is still blocking a couple of bugs. Have you been able to reproduce it with my reduced test case Aus?
Flags: needinfo?(aus)
I'm not sure what you were seeing on your end but here's what I've found so far.

We seem to be racing between starting up and shutting down b2g-desktop. As far as I can tell we end up talking to the wrong client because of this. That's where we get the send failure because we're not connected.

We won't be able to talk to the right client because it won't start. We have a mutex / semaphore check to ensure we don't have more than one instance.

It's possible --no-remote could be used to avoid this kind of problem, but we should really fix that race.

My assumption right now is that whatever code we're using to check that the process has fully exited is _not_ correct.
Flags: needinfo?(aus)
Soooo, we already use -no-remote in the runner itself (which starts the process and such) so ... we shouldn't be seeing that error. There's a bug in our logging that's preventing us from easily diving in deeper, we're going to fix that and see what details emerge.
(In reply to Aus Lacroix [:aus] from comment #8)
> Soooo, we already use -no-remote in the runner itself (which starts the
> process and such) so ... we shouldn't be seeing that error. There's a bug in
> our logging that's preventing us from easily diving in deeper, we're going
> to fix that and see what details emerge.

Is there a bug for the logging issue that we can add as a blocker?
Alright, more digging and more progress here. This may be a red herring but, we should clean up this issue regardless of it is the origin of the issue.

Sometimes, the python runner service will crash and needs to be restarted. When it restarts, it doesn't clean up the previous instance of b2g-bin that was running (and is still running!). If by chance we attempt to reuse the profile via tmp profile generation there will be terrible badness. Mainly that the new process we are trying to connect to via it's marionette-socket will FAIL to start. Then we get a timeout.

Anyway, we'll clean this one up and see what we find afterwards. Maybe it will be the only thing.
(In reply to Dave Hunt (:davehunt) from comment #9)
> (In reply to Aus Lacroix [:aus] from comment #8)
> > Soooo, we already use -no-remote in the runner itself (which starts the
> > process and such) so ... we shouldn't be seeing that error. There's a bug in
> > our logging that's preventing us from easily diving in deeper, we're going
> > to fix that and see what details emerge.
> 
> Is there a bug for the logging issue that we can add as a blocker?

:lightsofapollo has a patch in the works for this, right now there isn't a bug, but we can work around the problem with aggressive node DEBUG logging as well as letting python output to stdout and the likes. Right now we just swallow that output.
Assignee: nobody → aus
Status: NEW → ASSIGNED
See Also: → 1207453
Dave, I think by fixing bug 1195220 this should also be fixed. Enjoy!
Depends on: 1195220
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
This does indeed appear to be fixed, thanks! I'll see if I can unbitrot the patches that were depending on this.
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: