Closed Bug 1159200 Opened 9 years ago Closed 9 years ago

Long test suites fail due to a timeout in "before each" hook

Tracking

(firefox40 affected)

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox40

---

affected

People

(Reporter: davehunt, Assigned: aus)

References

Details

Attachments

(2 files)

Launch the settings app in B2G up to 50 times 9 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 454 bytes, application/x-javascript		Details
Pull Request - Clean up stale b2g instances when python service crashes. 9 years ago Aus Lacroix [:aus] 46 bytes, text/x-github-pull-request		Details \| Review

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Description

•

9 years ago

Attached file Launch the settings app in B2G up to 50 times — Details

I found this while investigating the failure after landing bug 1145680. It appears that after a certain time the test harness fails to launch the B2G application correctly.

The attached test case simply launches the Setting application each time. Eventually this fails and the homescreen is displayed. This is blocking us from landing bug 1145680, as for some unknown reason the patch there causes this to happen earlier. I suspect sooner or later another patch would have the same effect.

Locally I have had this fail on the 21st iteration of the test. At the point the B2G application is showing the homescreen if you close the application the suite appears to recover. This might suggest that for some reason the application has launched unexpectedly and the harness is simultaneously attempting to launch the application, but times out. The following is the exception details shown on the console:

  1) Repeat "before each" hook:
     
  Error: timeout of 60000ms exceeded
      at null.<anonymous> (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:139:19)
      at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)

  2) Repeat "after each" hook:
     
  Error: Not connected. To write data you must call connect first.
      at TcpSync.send (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:162:15)
      at Object.send (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:459:36)
      at Object.Client._sendCommand (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:509:21)
      at Object.closeDriver (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:768:14)
      at Object.executeHook (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:361:20)
      at Object.Client.runHook (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:379:7)
      at Object.destroySession [as deleteSession] (/Users/dhunt/workspace/gaia/node_modules/marionette-client/lib/marionette/client.js:778:12)
      at /Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/node-extensions.js:25:20
      at doResolve (/Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/core.js:91:5)
      at new Promise (/Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/core.js:71:3)
      at /Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/node-extensions.js:17:12
      at Context.<anonymous> (/Users/dhunt/workspace/gaia/node_modules/marionette-js-runner/lib/runtime/hostmanager.js:129:14)
      at callFn (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:223:21)
      at Hook.Runnable.run (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:216:7)
      at next (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runner.js:258:10)
      at Object._onImmediate (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runner.js:275:5)
      at processImmediate [as _immediateCallback] (timers.js:354:15)

  3) Repeat "after all" hook:
     
  Error: timeout of 60000ms exceeded
      at null.<anonymous> (/Users/dhunt/workspace/gaia/node_modules/mocha/lib/runnable.js:139:19)
      at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)

Flags: needinfo?(kgrandon)

Flags: needinfo?(gaye)

James Lal [:lightsofapollo]

Comment 1

•

9 years ago

@auswerk Can you help Dave with this one? I suspect it's related to your recent research into this...

Flags: needinfo?(kgrandon)

Flags: needinfo?(gaye)

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Updated

•

9 years ago

Flags: needinfo?(aus)

Aus Lacroix [:aus]

Assignee

Comment 2

•

9 years ago

@davehunt - Give this a try

marionette('Repeat', function() {

  var client = marionette.client();
  var settingsApp;

  setup(function() {
    settingsApp = new Settings(client);
    settingsApp.launch();
  });

  teardown(function() {
    settingsApp = null;
  });

  suite('Repeat suite', function() {
    for (var i = 0; i < 50; i++) {
      test(i, function() {});
    }
  });
});

Flags: needinfo?(aus)

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 3

•

9 years ago

Were you able to replicate it? It still fails for me with the teardown.

Flags: needinfo?(aus)

Aus Lacroix [:aus]

Assignee

Comment 4

•

9 years ago

Ohhh, interesting! I have used that teardown to fix similar issues with other tests in the past. I have no attempted to replicate it locally yet using your test snippet.

Flags: needinfo?(aus)

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 5

•

9 years ago

:aus pointed out out IRC that this may be a duplicate of bug 1104285.

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 6

•

9 years ago

This is still blocking a couple of bugs. Have you been able to reproduce it with my reduced test case Aus?

Flags: needinfo?(aus)

Aus Lacroix [:aus]

Assignee

Comment 7

•

9 years ago

I'm not sure what you were seeing on your end but here's what I've found so far.

We seem to be racing between starting up and shutting down b2g-desktop. As far as I can tell we end up talking to the wrong client because of this. That's where we get the send failure because we're not connected.

We won't be able to talk to the right client because it won't start. We have a mutex / semaphore check to ensure we don't have more than one instance.

It's possible --no-remote could be used to avoid this kind of problem, but we should really fix that race.

My assumption right now is that whatever code we're using to check that the process has fully exited is _not_ correct.

Flags: needinfo?(aus)

Aus Lacroix [:aus]

Assignee

Comment 8

•

9 years ago

Soooo, we already use -no-remote in the runner itself (which starts the process and such) so ... we shouldn't be seeing that error. There's a bug in our logging that's preventing us from easily diving in deeper, we're going to fix that and see what details emerge.

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 9

•

9 years ago

(In reply to Aus Lacroix [:aus] from comment #8)
> Soooo, we already use -no-remote in the runner itself (which starts the
> process and such) so ... we shouldn't be seeing that error. There's a bug in
> our logging that's preventing us from easily diving in deeper, we're going
> to fix that and see what details emerge.

Is there a bug for the logging issue that we can add as a blocker?

Aus Lacroix [:aus]

Assignee

Comment 10

•

9 years ago

Alright, more digging and more progress here. This may be a red herring but, we should clean up this issue regardless of it is the origin of the issue.

Sometimes, the python runner service will crash and needs to be restarted. When it restarts, it doesn't clean up the previous instance of b2g-bin that was running (and is still running!). If by chance we attempt to reuse the profile via tmp profile generation there will be terrible badness. Mainly that the new process we are trying to connect to via it's marionette-socket will FAIL to start. Then we get a timeout.

Anyway, we'll clean this one up and see what we find afterwards. Maybe it will be the only thing.

Aus Lacroix [:aus]

Assignee

Comment 11

•

9 years ago

(In reply to Dave Hunt (:davehunt) from comment #9)
> (In reply to Aus Lacroix [:aus] from comment #8)
> > Soooo, we already use -no-remote in the runner itself (which starts the
> > process and such) so ... we shouldn't be seeing that error. There's a bug in
> > our logging that's preventing us from easily diving in deeper, we're going
> > to fix that and see what details emerge.
> 
> Is there a bug for the logging issue that we can add as a blocker?

:lightsofapollo has a patch in the works for this, right now there isn't a bug, but we can work around the problem with aggressive node DEBUG logging as well as letting python output to stdout and the likes. Right now we just swallow that output.

Aus Lacroix [:aus]

Assignee

Comment 12

•

9 years ago

Attached file Pull Request - Clean up stale b2g instances when python service crashes. — Details

Assignee: nobody → aus

Status: NEW → ASSIGNED

Gareth Aye [:gaye] (back from PTO)

Updated

•

9 years ago

Blocks: 1161727

yifan [:yifan][:yliao]

Updated

•

9 years ago

Comment 14

•

9 years ago

Dave, I think by fixing bug 1195220 this should also be fixed. Enjoy!

Depends on: 1195220

Aus Lacroix [:aus]

Assignee

Updated

•

9 years ago

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Comment 15

•

9 years ago

This does indeed appear to be fixed, thanks! I'll see if I can unbitrot the patches that were depending on this.

BMO Automation

Updated

•

6 years ago

Product: Testing → Testing Graveyard

You need to log in before you can comment on or make changes to this bug.