Closed Bug 1165765 Opened 9 years ago Closed 8 years ago

Intermittent Gij Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', 'REPORTER=mocha-tbpl-reporter', 'TEST_MANIFEST=./shared/test/integration/tbpl-manifest.json', 'NODE_MODULE_SRC=npm-cache', 'VIRTUALENV_EXISTS=1'

Categories

(Firefox OS Graveyard :: Gaia::SMS, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(firefox40 unaffected, firefox41 unaffected, firefox42 fixed, firefox-esr38 unaffected, b2g-v2.2 unaffected, b2g-master fixed)

RESOLVED FIXED
FxOS-S3 (24Jul)

People

(Reporter: philor, Assigned: julienw)

References

Details

(Keywords: intermittent-failure)

Attachments

(4 files)

If for some reason you wanted to see some of the times this has failed over the last couple of months, you could look at the misstars of it as bug 1013585.
Summary: Intermittent Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', 'REPORTER=mocha-tbpl-reporter', 'TEST_MANIFEST=./shared/test/integration/tbpl-manifest.json', 'NODE_MODULE_SRC=npm-cache', 'VIRTUALENV_EXISTS=1'] → Intermittent Gij Automation Error: mozprocess timed out after 330 seconds running ['make', 'test-integration', 'REPORTER=mocha-tbpl-reporter', 'TEST_MANIFEST=./shared/test/integration/tbpl-manifest.json', 'NODE_MODULE_SRC=npm-cache', 'VIRTUALENV_EXISTS=1'
Hard to believe this and bug 1165758 aren't related.
See Also: → 1165759
Seems odd that this only appears to affect Gij(19).
Flags: needinfo?(kgrandon)
I don't think there's much I can do here. Seems like a runner issue, and may be a dupe? Gareth or Aus - any ideas here?
Flags: needinfo?(kgrandon)
Flags: needinfo?(gaye)
Flags: needinfo?(aus)
It's possible that we have certain tests running that don't output anything for 330 seconds and then cause mozprocess to kill the test. We could bump the timeout up from 330 to 600 seconds (we already use this value for build tests and linting).

:gaye, if that seems reasonable I'll go ahead and write a patch up tomorrow for this.
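
To illustrate the failure mode described above, here is a minimal sketch of an output-inactivity watchdog. This is not mozprocess code; `OutputWatchdog`, `onOutput` and `shouldKill` are hypothetical names, and times are plain numbers of seconds passed in by the caller.

```javascript
// Sketch only: models "kill the process if it prints nothing for N seconds".
// All names here are hypothetical and are not part of mozprocess.
function OutputWatchdog(timeoutSeconds, startTime) {
  this.timeoutSeconds = timeoutSeconds;
  this.lastOutput = startTime;
}

// Call this whenever the child process writes a line of output.
OutputWatchdog.prototype.onOutput = function(now) {
  this.lastOutput = now;
};

// True once the process has been silent for at least the timeout.
OutputWatchdog.prototype.shouldKill = function(now) {
  return now - this.lastOutput >= this.timeoutSeconds;
};
```

With a 330 second limit, a test that logs something at second 300 is still alive at second 600, while a test that stays silent is killed. That is why either more frequent output or a larger limit would avoid the kill.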
Flags: needinfo?(aus)
Wouldn't it be more reasonable to try splitting up the long-running test first?
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #415)
> Wouldn't it be more reasonable to try splitting up the long-running test
> first?

Sure, we can try that but I would imagine that it would still happen since it's likely to be one single test that's running for more than 5.5 minutes without output.

I'm going to go ahead and push a change that increases the number of chunks and we can decide if increasing the process output timeout to 600s from 330s is still necessary.
I guess this is mine now.
Assignee: nobody → aus
Status: NEW → ASSIGNED
Comment on attachment 8635459 [details] [review]
[gaia] nullaus:master > mozilla-b2g:master

r=me , tests running as expected.
Attachment #8635459 - Flags: review+
Commit (master): https://github.com/mozilla-b2g/gaia/commit/d12d12570571aad2c2b4a1e6574d1687345ccc70

Going to go ahead and mark this resolved, we can reopen if we still see it happening on master.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Clearing ni? request for :gaye.
Flags: needinfo?(gaye)
Target Milestone: --- → FxOS-S3 (24Jul)
Better, but still happening. Any suggestions for what to try next?
Flags: needinfo?(aus)
We could bump up the max runtime and see if that does it for the rest of the timeouts.
Flags: needinfo?(aus)
Reopening; this seems to have gotten a lot of hits lately.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looking at the brasstacks link, it seems this is always happening on sms composer_test.js [1]. And that test is quite long. I'm going to disable that test for now, and perhaps someone from sms can take a look at it.

1.) https://github.com/mozilla-b2g/gaia/blob/bc1be88909cb74b0e19507c83d1083c9bbcb4b5a/apps/sms/test/marionette/composer_test.js#L131
Julien could you take a look at this specific test in composer_test.js [1]? It seems it's taking longer than 5.5 minutes to run sometimes in automation, which makes taskcluster fail it. We could either make the test console.log something, or split the test into multiple tests I think.

1.) https://github.com/mozilla-b2g/gaia/blob/bc1be88909cb74b0e19507c83d1083c9bbcb4b5a/apps/sms/test/marionette/composer_test.js#L131
Flags: needinfo?(felash)
Assignee: aus → felash
Flags: needinfo?(felash)
Comment on attachment 8707541 [details] [review]
[gaia] julienw:1165765-split-composer-tests > mozilla-b2g:master

Hey Oleg, what do you think of this simple patch ?
Attachment #8707541 - Flags: review?(azasypkin)
Comment on attachment 8707541 [details] [review]
[gaia] julienw:1165765-split-composer-tests > mozilla-b2g:master

Looks good to me, thanks for the quick fix!

> It seems it's taking longer than 5.5 minutes to run sometimes in automation

Ouch, honestly I'd be very surprised if any _non-trivial app integration_ test can be stable enough in automation if it's that slow :/
Attachment #8707541 - Flags: review?(azasypkin) → review+
master: 612e09a4183157fe7659c5ed4e5336a620f56a58
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
orangefactor looks good so far :)
Grr it would be easier if we could have a screenshot at that moment :/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
On a related note I filed bug 1240557 to get more logs also in mozilla-inbound and friends.
What's weird to me is that it happens only in mozilla-inbound or fx-team lately. Is there anything special about these trees ?
I've seen it on other trees too, like gaia-master and mozilla-inbound (for instance [1]).

1.) https://treeherder.mozilla.org/logviewer.html#?job_id=3788889&repo=b2g-inbound
sorry, meant b2g-inbound ^^
OK thanks, will look into it then :/
Attached image screenshot of the issue
I managed to reproduce this by throttling my computer's CPU. Here is a screenshot. After that point the test seems to wait forever, and I'll explain why in a moment.

I've not seen where the first recipient "g" comes from. The "om" comes from the longer mail "a@b.com" where "a@b.c" was deleted.

Indeed at the end of the test we run this code:

  clearRecipients: function() {
    var recipientsList = this.accessors.recipientsList;
    recipientsList.tap();
    while (recipientsList.text() !== '') {
      recipientsList.sendKeys(this.KEYS.backspace);
    }
  },

But because we focus after "c" and not at the end, we never clear the recipient zone, and then we keep sending "backspace" forever.

I think we need to fix 2 things:
* where we focus
* the "clearRecipients" logic
And bonus points if we find where the "g" comes from.
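
For the second point, here is a minimal sketch of a bounded variant, assuming we would rather fail fast than loop forever. `MAX_BACKSPACES` and `makeFakeList` are hypothetical and only there so the snippet runs on its own; this is not the patch itself.

```javascript
// Sketch only: give up after a fixed number of backspaces instead of looping.
// MAX_BACKSPACES and makeFakeList are hypothetical, not from the Gaia tree.
var MAX_BACKSPACES = 50;

function clearRecipients(recipientsList, backspaceKey) {
  recipientsList.tap();
  for (var i = 0; i < MAX_BACKSPACES; i++) {
    if (recipientsList.text() === '') {
      return i; // number of backspaces that were needed
    }
    recipientsList.sendKeys(backspaceKey);
  }
  if (recipientsList.text() === '') {
    return MAX_BACKSPACES;
  }
  throw new Error('Recipients not cleared, remaining text: "' +
                  recipientsList.text() + '"');
}

// Fake list: tap() puts the caret at a fixed offset, and each key sent
// deletes the character before the caret (a no-op at offset 0).
function makeFakeList(text, caretAfterTap) {
  var caret = 0;
  return {
    tap: function() { caret = caretAfterTap; },
    text: function() { return text; },
    sendKeys: function() {
      if (caret > 0) {
        text = text.slice(0, caret - 1) + text.slice(caret);
        caret--;
      }
    }
  };
}
```

With the caret after "c" in "a@b.com", the fake list ends up holding "om" and the function throws instead of hanging, which would turn the 330 second timeout into an immediate, readable failure.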


Note: on Intel CPUs on Linux we can reduce the CPU power with this command:

sudo sh -c 'echo 25 > /sys/devices/system/cpu/intel_pstate/max_perf_pct'

Don't forget to turn it back to 100 afterwards :)
I also noticed this error but with the fast computer setup:

  1) Messages Composer Messages Composer Test Suite Message char counter and MMS label 2:
     
  AssertionError: Element should be displayed: expected false to be true
      at Function.assert.isTrue (node_modules/chai/lib/chai/interface/assert.js:317:31)
      at assertIsDisplayed (apps/sms/test/marionette/composer_test.js:32:12)
      at Context.<anonymous> (apps/sms/test/marionette/composer_test.js:265:7)
      at Test.MarionetteTest.run (node_modules/marionette-js-runner/lib/ui.js:25:31)

This happens because we check for the mmsLabel right after doing something, but we should instead wait until its displayed status changes. I'll try to fix this as well here.
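
Roughly what I mean by waiting, as a minimal sketch: poll the condition a bounded number of times rather than asserting it once. `waitUntil` and `makeFakeLabel` are hypothetical helpers written for this example, not the marionette client API.

```javascript
// Sketch only: poll a condition instead of asserting it a single time.
// waitUntil and makeFakeLabel are hypothetical helpers for illustration.
function waitUntil(predicate, maxPolls, pause) {
  for (var i = 0; i < maxPolls; i++) {
    if (predicate()) {
      return i + 1; // number of polls it took
    }
    if (pause) {
      pause(); // optional hook to sleep between polls
    }
  }
  throw new Error('Condition not met after ' + maxPolls + ' polls');
}

// Fake label that only reports itself as displayed from the Nth check on.
function makeFakeLabel(visibleAfter) {
  var checks = 0;
  return {
    displayed: function() {
      checks++;
      return checks >= visibleAfter;
    }
  };
}
```

A one-shot assert on `makeFakeLabel(3)` would fail on the first check, whereas `waitUntil` succeeds on the third poll and still fails loudly if the label never shows up.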
Just got another reproduction with "v" instead of "g", which puzzles me even more :)

I think this could come from the fact we focus the "recipient" panel after bug 1064144. But still I don't understand how we can get this "v" at all.
Component: General → Gaia::SMS
Comment on attachment 8711671 [details] [review]
[gaia] julienw:1165765-fix-intermittent-failure > mozilla-b2g:master

Hey Oleg,

what do you think ?
I still don't know why sometimes a "v" or "g" is coming as a recipient. Though now the test code should handle this.
Attachment #8711671 - Flags: review?(azasypkin)
Comment on attachment 8711671 [details] [review]
[gaia] julienw:1165765-fix-intermittent-failure > mozilla-b2g:master

Looks sane to me and we will know for sure only over time, so let's push it forward :)

Regarding "v" and "g" - did you see this in the "Message char counter and MMS label" test or in other tests (like "should not enable send button if the contact is invalid" with "in>>v<<alidContact" and "should display that a non existing contact is invalid" with "non_exisitin>>g<<_contact" recipients) :)? Same as you, I don't have any idea why this could happen.

Thanks!
Attachment #8711671 - Flags: review?(azasypkin) → review+
I saw it in this very same test, it was the reason a@b.com got pushed to the right...
master: b69367cf772114024d4bda70a78b7a8c02485dad

let's see if this is enough !
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open