Closed Bug 1342828 Opened 7 years ago Closed 7 years ago

linux64 debug mozmill test always hit timeout

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: arai, Assigned: tomprince)

References

Details

Attachments

(5 files, 1 obsolete file)

Use two chunks for TB Linux debug mozmill tests 7 years ago aleth [:aleth] 1.70 KB, patch	jlund : review+	Details \| Diff \| Splinter Review
Support chunks in mozmill tests. 7 years ago Tom Prince [:tomprince] 1.56 KB, patch	Fallen : review+ jorgk-bmo : approval-comm-beta+ jorgk-bmo : approval-comm-esr52+	Details \| Diff \| Splinter Review
[treeherder] tp-tc:handle-split-mozmill > mozilla:master 7 years ago GitHub Autolander Bot 47 bytes, text/x-github-pull-request		Details \| Review
Support chunked mozmill tests in try syntax. 7 years ago Tom Prince [:tomprince] 471 bytes, patch	jlund : review+	Details \| Diff \| Splinter Review
Bug 1342828 - Use many chunks in thunderbird debug tests. 7 years ago Tom Prince [:tomprince] 59 bytes, text/x-review-board-request		Details
Bug 1342828 - Use many chunks in thunderbird debug tests. 7 years ago Tom Prince [:tomprince] 59 bytes, text/x-review-board-request	nthomas : review+	Details

Tooru Fujisawa [:arai]

Reporter

Description

•

7 years ago

> command timed out: 7200 seconds elapsed running ['/tools/buildbot/bin/python',
> 'scripts/scripts/desktop_unittest.py', '--cfg', 'unittests/linux_unittest.py',
> '--mozmill-suite', 'mozmill', '--cfg', 'unittests/thunderbird_extra.py',
> '--blob-upload-branch', 'try-comm-central', '--download-symbols', 'true'],
> attempting to kill

it need to be split up into 2 or more chunks, like bug 1248692 for xpcshell test.

:aceman

Comment 1

•

7 years ago

Aleth, would you know how to do this?

Flags: needinfo?(aleth)

Tooru Fujisawa [:arai]

Reporter

Updated

•

7 years ago

Blocks: 1340947

Tooru Fujisawa [:arai]

Reporter

Comment 2

•

7 years ago

can it be separated into many smaller parts?
in that case it's easy to test try-only intermittent.

Tooru Fujisawa [:arai]

Reporter

Comment 3

•

7 years ago

I think currently mozmill test result is not reliable for finding regression, because of this issue.
even if we compare the result between previous run, the set of executed tests can be different, and we'll overlook regression, or get confused by not-new-regression.

Severity: normal → major

aleth [:aleth]

Updated

•

7 years ago

Assignee: nobody → aleth

Flags: needinfo?(aleth)

aleth [:aleth]

Comment 4

•

7 years ago

Attached patch Use two chunks for TB Linux debug mozmill tests — Details — Splinter Review

I'm not sure if mozmill supports totalChunks, but let's try it and see. If it doesn't work we might have to increase the script_maxtime instead.

Attachment #8843727 - Flags: review?(jlund)

Jorg K (CEST = GMT+2)

Comment 5

•

7 years ago

Sorry about the following unrelated comment:
On Linux debug the Xpcshell tests run as X1 and X2 on the treeherder. X1 seem to be TB tests whereas X2 are M-C tests. Could I have that on all platforms and also on opt builds.

aleth [:aleth]

Comment 6

•

7 years ago

(In reply to Jorg K (GMT+1) from comment #5)
> Sorry about the following unrelated comment:
> On Linux debug the Xpcshell tests run as X1 and X2 on the treeherder. X1
> seem to be TB tests whereas X2 are M-C tests. Could I have that on all
> platforms and also on opt builds.

That pattern is a coincidence, as you can see from the fact it's only approximately true (e.g. devtools/ in X1). The test order is more or less alphabetical.

I'm not sure what the resource costs would be of using two xpcshell chunks everywhere.

Jordan Lund (:jlund)

Comment 7

•

7 years ago

Comment on attachment 8843727 [details] [diff] [review]
Use two chunks for TB Linux debug mozmill tests

Review of attachment 8843727 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/thunderbird_config.py
@@ +166,5 @@
>  ]
> +MOZMILL_TWO_CHUNKS = [
> +    ('mozmill', {
> +        'use_mozharness': True,
> +        'script_path': 'scripts/desktop_unittest.py',

hm, I suppose if you wanted to test this on try you hack the desktop_unittest.py script in-tree to pretend as if totalChunks:2 was passed in the script args.

Attachment #8843727 - Flags: review?(jlund) → review+

Markus Adrario [:Taraman]

Comment 8

•

7 years ago

What is the status of this bug?
It has a positive review but no chekin yet...

Flags: needinfo?(jlund)

Flags: needinfo?(aleth)

Jorg K (CEST = GMT+2)

Comment 9

•

7 years ago

Sadly I haven't seen Aleth in weeks :-(

:aceman

Comment 10

•

7 years ago

Or it just needs to land on *central? Can it break anything outside of c-c if it isn't right? In which repo does it go when the file is in mozilla-tests/ ?

Jordan Lund (:jlund)

Comment 11

•

7 years ago

(In reply to Markus Adrario [:Taraman] from comment #8)
> What is the status of this bug?
> It has a positive review but no chekin yet...

I'm not sure. It has been reviewed so it can land on buildbot-configs (outside gecko/central). Someone will have to land and be on point for any fallout.

Flags: needinfo?(jlund)

Jorg K (CEST = GMT+2)

Comment 12

•

7 years ago

Philipp, could you please take a look at this and land it if you think it's right.

Flags: needinfo?(philipp)

Philipp Kewisch [:Fallen] [:📆][:🧩]

Comment 13

•

7 years ago

Looks right to me. I'll take care to back out if there is fallout, just let me know if I miss it.

https://hg.mozilla.org/build/buildbot-configs/rev/555764ee0a26cda0164cb1d54fdb014b2afc6606

Flags: needinfo?(philipp)

Flags: needinfo?(aleth)

Tooru Fujisawa [:arai]

Reporter

Updated

•

7 years ago

Blocks: 1342734

Markus Adrario [:Taraman]

Comment 14

•

7 years ago

When is this change expected to take effect?
until now the problem remains...

Jorg K (CEST = GMT+2)

Comment 15

•

7 years ago

Philipp, can you please back this out again, it seems to have destroyed Mozmill on Linux altogether, see treeherder C-C after Mon May 1, 2017, 22:15:53: ? ? B X1 X2.

Flags: needinfo?(philipp)

Philipp Kewisch [:Fallen] [:📆][:🧩]

Comment 16

•

7 years ago

Backed out in https://hg.mozilla.org/build/buildbot-configs/rev/f3cb566ee39e4a42af2f7713053d73d98f11f7f6
Pending a merge to production and a reconfig

Flags: needinfo?(philipp)

Jorg K (CEST = GMT+2)

Comment 17

•

7 years ago

Nick, could you please take care of the merge to production, etc.

Flags: needinfo?(nthomas)

Jorg K (CEST = GMT+2)

Comment 18

•

7 years ago

Looks like it's in production now.

Flags: needinfo?(nthomas)

Jorg K (CEST = GMT+2)

Comment 19

•

7 years ago

Tom, could you take a look at this one. We put this into production and had to back it out.

Flags: needinfo?(tom.prince)

Tom Prince [:tomprince]

Assignee

Comment 20

•

7 years ago

It looks like the mozmill test runner (https://dxr.allizom.org/comm-central/source/mail/test/mozmill/runtestlist.py) doesn't have support for running in multiple chunks currently, unlike the xpcshell runner (https://dxr.allizom.org/comm-central/source/mozilla/testing/xpcshell/runxpcshelltests.py).

It looks like it would make sense to port runtestlist to use manifestparser, which can then take advantage manifestparser.chunk_by_slice. A quicker solution would be to just duplicate the slicing logic (https://dxr.allizom.org/comm-central/source/mozilla/testing/mozbase/manifestparser/manifestparser/filters.py#170-194) from there.

Tom Prince [:tomprince]

Assignee

Comment 21

•

7 years ago

Attached patch Support chunks in mozmill tests. — Details — Splinter Review

I looked into using manifestparser, but it has more expectations on test layout and test running, so I just implemented chunking in the mozmill runner.

It does look like mozmill has some support for manifestparser (although it doesn't expose filters). It might be worthwhile exploring migrating to using that, but this unblocks chunking the tests pending that work.

Flags: needinfo?(tom.prince)

Philipp Kewisch [:Fallen] [:📆][:🧩]

Comment 22

•

7 years ago

Comment on attachment 8876459 [details] [diff] [review]
Support chunks in mozmill tests.

Looks good to me, thanks for figuring it out! I agree that this is the simpler solution, especially given mozmill is only still used in Thunderbird.

Let's get this patch pushed then retry the buildbot changes. I wonder if the question mark that showed up on treeherder was just a side-effect of the chunk logic missing, or if more work is needed.

Attachment #8876459 - Flags: review+

Philipp Kewisch [:Fallen] [:📆][:🧩]

Updated

•

7 years ago

Assignee: aleth → tom.prince

Keywords: checkin-needed

Jorg K (CEST = GMT+2)

Comment 23

•

7 years ago

(In reply to Philipp Kewisch [:Fallen] from comment #22)
> Let's get this patch pushed then retry the buildbot changes.
I can certainly land this in the next push.

Can someone please enlighten me what the Buildbot changes are for. Are they needed as well? Philipp talks about a "simpler solution" presented in the second patch.

Jorg K (CEST = GMT+2)

Comment 24

•

7 years ago

Oh, looks like the first patch requests chunks and the second patch makes them possible for C-C.

Jorg K (CEST = GMT+2)

Comment 25

•

7 years ago

To see the treeherder question marks Philipp mentioned, please click here:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=34153f64820674523909b789c68f58e8f0c1af01

Tom Prince [:tomprince]

Assignee

Comment 26

•

7 years ago

> I wonder if the question mark that showed up on treeherder was just a side-effect of the chunk logic missing, or if more work is needed.

I had been going to guess that it was just a side effect, but looking at https://github.com/mozilla/treeherder/blob/5e7a78cbbd129b21164609a3d96353dc114ecde8/treeherder/etl/buildbot.py#L642 the trailing '$' in the regex might need to be removed.

:aceman

Comment 27

•

7 years ago

Does this change mozmill itself? May the change get overwritten if we update mozmill? If so, the hunk should be somehow marked so it doesn't get lost as it will not be upstream.

Tom Prince [:tomprince]

Assignee

Comment 28

•

7 years ago

No. The vendored copy of mozmill is at https://dxr.allizom.org/comm-central/source/mail/test/resources/mozmill . This is just the code that invokes mozmill. https://dxr.allizom.org/comm-central/source/mail/test/mozmill/runtest.py is a wrapper around mozmill which is invoked by https://dxr.allizom.org/comm-central/source/mail/test/mozmill/runtestlist.py which is what is being changed.

Philipp Kewisch [:Fallen] [:📆][:🧩]

Comment 29

•

7 years ago

(In reply to Jorg K (GMT+2) from comment #23)
> (In reply to Philipp Kewisch [:Fallen] from comment #22)
> > Let's get this patch pushed then retry the buildbot changes.
> I can certainly land this in the next push.
> 
> Can someone please enlighten me what the Buildbot changes are for. Are they
> needed as well? Philipp talks about a "simpler solution" presented in the
> second patch.

So for checkin, please push the c-c patch. I will take care of the buildbot patch once it is in. With "simpler solution", I just agreed with Tom from comment 20 and 21 that just porting the chunking logic makese sense instead of porting mozmill to use the manifestparser.

Tom, if you are certain that the $ will fix it I think you can go ahead and send a PR for treeherder. Otherwise we can fix this when the builds are in as it seems to just be a cosmetic issue.

Jorg K (CEST = GMT+2)

Comment 30

•

7 years ago

I know I make myself very unpopular by re-reviewing things before landing, but so be it.

Let's assume we have 100 tests and want two chunks. Let's also assume this_chunk is set to 1 and 2, although I can't see where this happens.

this_chunk=1:
+    tests_per_chunk = float(len(tests)) / options.total_chunks
+    start = int(round((options.this_chunk - 1) * tests_per_chunk))
+    end = int(round(options.this_chunk * tests_per_chunk))
We get: tests_per_chunk = 50.
start=0
end=50.

this_chunk=2:
We get: tests_per_chunk = 50.
start=50
end=100.

I think there is a -1 missing on the end calculation. Or am I missing something? Then I apologise in advance ;-)

Keywords: checkin-needed

Tom Prince [:tomprince]

Assignee

Comment 31

•

7 years ago

> I know I make myself very unpopular by re-reviewing things before landing, but so be it.

Easier to fix things before they land than after.

> I think there is a -1 missing on the end calculation. Or am I missing something?

Python (as well as many other languages) uses half-open intervals to represent ranges. That is, a range includes the left endpoint but not the right endpoint. So tests[0:50] selects everything from 0-49 and tests[50:100] selects everything 50-99. They do this to make this kind of code easier to reason about; you know that you've covered a large interval by smaller ones if all the endpoints match, rather than if the endpoints differ by one, for example.

> Then I apologise in advance ;-)

If something is confusing for one person, it is liable to be confusing to others, so better to catch it and improve it, if necessary. I don't *think* that any explanation is needed in the code here, but it seems better to ask than to let something confusing get through.

Jorg K (CEST = GMT+2)

Comment 32

•

7 years ago

(In reply to Tom Prince from comment #31)
> That is, a range includes the left endpoint but not the
> right endpoint. So tests[0:50] selects everything from 0-49 and
> tests[50:100] selects everything 50-99.
I thought you would answer that, I was going to look it up once I returned to my office, but you beat me to it.
This clearly shows my ignorance and the fact that we need a build/release engineer ;-)

> I don't *think* that any explanation is needed in the code here, ...
Well, if it's a language feature, it doesn't need a comment. Code is to be read by people capable of understanding it ;-)

I'll get it landed after the next M-C merge. As I explained elsewhere, M-C land so many changes that in order to see bustage early, I do a push after every M-C merge.

Keywords: checkin-needed

Jorg K (CEST = GMT+2)

Comment 33

•

7 years ago

https://hg.mozilla.org/comm-central/rev/aa78e5d5dc26d9cda6e983589ccb98a7a94054be

Philipp, please take care of the buildbot part. Can we also please fix the treeherder, cosmetic or not ;-)

Flags: needinfo?(philipp)

Keywords: checkin-needed → leave-open

Jorg K (CEST = GMT+2)

Comment 34

•

7 years ago

Mozmill still working ;-)
Just as a matter of interest, where do you increment options.this_chunk?

Tom Prince [:tomprince]

Assignee

Comment 35

•

7 years ago

this_chunk isn't every incremented. The buildbot configuration arranges to call the script twice (in the two different builds), once with --this-chunk 1 and once with --this-chunk 2. (See https://dxr.allizom.org/build-central/source/buildbotcustom/misc.py#665-686 for details)

Jorg K (CEST = GMT+2)

Comment 36

•

7 years ago

Good to know, thanks. Any reason to use allizom?
https://dxr.mozilla.org/build-central/source/buildbotcustom/misc.py#665-686 works, too.

Philipp Kewisch [:Fallen] [:📆][:🧩]

Comment 37

•

7 years ago

https://hg.mozilla.org/build/buildbot-configs/rev/4f9b97ebf944bb3fb0aff829e8450d41ffdc5821

I think buildbot-configs is merged to production and reconfigured at midnight, I always forget if this is automated or requires someone to manually intervene.

Flags: needinfo?(philipp)

Jorg K (CEST = GMT+2)

Comment 38

•

7 years ago

The question marks are back, can we get them fixed, please.

Flags: needinfo?(tom.prince)

Tom Prince [:tomprince]

Assignee

Comment 39

•

7 years ago

https://github.com/mozilla/treeherder/pull/2573 should fix the `?`s.

Flags: needinfo?(tom.prince)

GitHub Autolander Bot

Comment 40

•

7 years ago

Attached file [treeherder] tp-tc:handle-split-mozmill > mozilla:master — Details

Treeherder GitHub Bugbot

Comment 41

•

7 years ago

Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/a0fbe83925d3a9c72363ba71b12822f5111aeae4
Bug 1342828 - Support chunked mozmill jobs (#2573)

Tom Prince [:tomprince]

Assignee

Comment 42

•

7 years ago

It looks like one of the tests is still failing due to timeout. It looks like the number of test per folder is unevenly distributed causing the first chunk to take wildly longer. (The first chunk has over 600 tests and the second has ~250).

Jorg K (CEST = GMT+2)

Comment 43

•

7 years ago

(In reply to Tom Prince from comment #39)
> https://github.com/mozilla/treeherder/pull/2573 should fix the `?`s.
It does, you can see it working on comm-beta right now. Thanks!

Tom Prince [:tomprince]

Assignee

Comment 44

•

7 years ago

It looks like https://bugzilla.mozilla.org/attachment.cgi?id=8876459 should be landed comm-beta too, since the buildbot changes are causing the tests to not run there. (This might mean that this also needs to be applied to ESR releases?)

Jorg K (CEST = GMT+2)

Comment 45

•

7 years ago

Comment on attachment 8876459 [details] [diff] [review]
Support chunks in mozmill tests.

I'll take care of it, thanks!

Attachment #8876459 - Flags: approval-comm-esr52+

Attachment #8876459 - Flags: approval-comm-beta+

Jorg K (CEST = GMT+2)

Comment 46

•

7 years ago

Beta (TB 55): https://hg.mozilla.org/releases/comm-beta/rev/4faffb05b32140d4a30c695f88abacb78812a4d8
TB 52 ESR:    https://hg.mozilla.org/releases/comm-esr52/rev/1bd907ee760875974a35ed455777e51ac345e33f

Tom Prince [:tomprince]

Assignee

Comment 47

•

7 years ago

Attached patch Support chunked mozmill tests in try syntax. — Details — Splinter Review

Tom Prince [:tomprince]

Assignee

Updated

•

7 years ago

Attachment #8880478 - Flags: review?(jlund)

Jordan Lund (:jlund)

Comment 48

•

7 years ago

Comment on attachment 8880478 [details] [diff] [review]
Support chunked mozmill tests in try syntax.

Review of attachment 8880478 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm. do we need to update trychooser front end to add this option?

Attachment #8880478 - Flags: review?(jlund) → review+

Jorg K (CEST = GMT+2)

Comment 49

•

7 years ago

https://hg.mozilla.org/build/buildbotcustom/rev/e32f42e43b34782fde995aff4477c5cbc5d2d967

(In reply to Jordan Lund (:jlund) from comment #48)
> do we need to update trychooser front end to add this option?
Maybe not. Mozmill is not used in Firefox and our TB folks just put |-u mozmill|. That just stopped working due to the chunking so now it will work again.

Are we done here or will we still address comment #42? Tom?

Tom Prince [:tomprince]

Assignee

Comment 50

•

7 years ago

Looking at the latest treeherder push on comm-central, the first mozmill test run is still timing out, so we'll need to something to address that.

An easy workaround would be to up the chunks, but given that the first chunk takes >120m and the second just 45, there should be a better way of balancing the tests.

Jorg K (CEST = GMT+2)

Comment 51

•

7 years ago

I'd go to three chunks and be done with it ;-)

Jorg K (CEST = GMT+2)

Comment 52

•

7 years ago

Nick, sorry for the hassle, did the push from comment #49 get merged to production?

Flags: needinfo?(nthomas)

Nick Thomas [:nthomas] (UTC+12)

Comment 53

•

7 years ago

https://hg.mozilla.org/build/buildbotcustom/pushloghtml?fromchange=07a05ce39ac6&tochange=4e4744e608c6 shows pretty clearly that it did.

Flags: needinfo?(nthomas)

Jorg K (CEST = GMT+2)

Comment 54

•

7 years ago

Sorry, I didn't know what to look for, now I understand.

Jorg K (CEST = GMT+2)

Comment 55

•

7 years ago

Tom, can we do a bit more here, since both Xpcshell and Mozmill are timing out, for example here:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=0170000376576d23e86bd8faa3e51c4e659f85c2

Flags: needinfo?(mozilla)

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/157890/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/157890/

Tom Prince [:tomprince]

Assignee

Updated

•

7 years ago

Attachment #8887142 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/157894/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/157894/

Tom Prince [:tomprince]

Assignee

Comment 58

•

7 years ago

mozreview-review

Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.

https://reviewboard.mozilla.org/r/157892/#review162994

::: mozilla-tests/thunderbird_config.py:171
(Diff revision 1)
>      ('mozmill', {
>          'use_mozharness': True,
>          'script_path': 'scripts/desktop_unittest.py',
>          'extra_args': ['--mozmill-suite', 'mozmill',
>                         '--cfg', 'unittests/thunderbird_extra.py',
>                         '--opt-cfg', 'unittests/thunderbird_buildbot.py'],

This depends on https://reviewboard.mozilla.org/r/155746/ for the trivial reason that they overlap adding this line.

Tom Prince [:tomprince]

Assignee

Comment 59

•

7 years ago

mozreview-review

Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.

https://reviewboard.mozilla.org/r/157894/#review162998

This is probably overkill, but it seems easier to split things up more, and increase the timeouts to figure out what is going on, and then make the bounds tighter later.

Jorg K (CEST = GMT+2)

Comment 60

•

7 years ago

Great!

Nick Thomas [:nthomas] (UTC+12)

Comment 61

•

7 years ago

mozreview-review

Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.

https://reviewboard.mozilla.org/r/157894/#review163254

Attachment #8887143 - Flags: review+

Nick Thomas [:nthomas] (UTC+12)

Comment 62

•

7 years ago

Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.

https://hg.mozilla.org/build/buildbot-configs/rev/26fba14dab2d8a75690dcec4394c1d06ce60f3df
https://hg.mozilla.org/build/buildbot-configs/rev/cc0315e8cf031861b866a972ca5048e93b22a232

Will be used for jobs starting in 55 minutes or so.

Nick Thomas [:nthomas] (UTC+12)

Comment 63

•

7 years ago

If you would like to dig into the timeouts and running them locally is an issue then I suggest getting a loaner, see https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_loaner.

Jorg K (CEST = GMT+2)

Comment 64

•

7 years ago

I think it's time to close this bug and open a new one for follow-up work.

In
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=25b49e872ac53975737e857be31f5c676c586007
there are no more timeouts, the slowest tests are X4 and Z1:

Linux 32bit: X4: 110 min, Z1: 113 min
Linux 64bit: X4:  85 min, Z1:  88 min.

With 'script_maxtime': 3600*3 which is 180 min we should be good for a while, right?

Status: NEW → RESOLVED

Closed: 7 years ago

Flags: needinfo?(mozilla)

Keywords: leave-open

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.