Closed Bug 927550 Opened 7 years ago Closed 7 years ago

allow an xpcshell test to request longer timeout before it is killed

Categories

(Testing :: XPCShell Harness, defect)

defect
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED
mozilla27

People

(Reporter: aceman, Assigned: aceman)

References

Details

(Keywords: dev-doc-needed)

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #794303 Comment 58 +++

In bug 794303 we identified a case where we need to run a long test but it is killed by xpcshell after the standard timeout (seems to be 5 * 60 seconds). See Bug #794303 Comment 68 for the try runs that seem to confirm this problem.

In this bug I propose to add infrastructure to xpcshell for a test to request longer timeout.
With parallel test execution, the entire xpcshell test suite can execute in under 3 minutes on a fast machine (it's actually ~2 minutes when you throw out tests that only run sequentially).

Having very long tests in the xpcshell test suite thus will become long poles during execution and decrease the time required to run the test suite. This in turn decreases automation efficiency.

That's what you need to keep in mind.

To mitigate the impact of long tests in our current automation world, I propose we annotate test manifest files (xpcshell.ini) with something saying a test is a "long" test. Long tests can be scheduled early during concurrent execution to mitigate the risk of them being a long pole. This annotation can also be used to determine the test timeouts. Despite the fact that 95% of xpcshell tests execute in under 10s (even during highly concurrent execution), we have an absurdly long default timeout that is universally applied. We should be able to drastically reduce the default timeout to minimize hang detection times in automation while still supporting long tests.
Attached patch patchSplinter Review
I do not think we have parallel execution. Anyway, this does not affect tests that do not opt-in into the longer timeout. The test in the linked bug takes 160seconds to run on a 3.5Ghz Phenom II CPU, but it is I/O bottlenecked. That is why it may take longer than 5 minutes on the try-servers.

gps, you can still develop your solution (I can't) later in a new bug. But it will not help me right now with the mentioned test for Thunderbird. But I have no problem if you later rename the option for opting in to the "long test" behaviour.
Attachment #818027 - Flags: review?(ted)
Comment on attachment 818027 [details] [diff] [review]
patch

Review of attachment 818027 [details] [diff] [review]:
-----------------------------------------------------------------

I could bikeshed over the key name, but I don't care that much.
Attachment #818027 - Flags: review?(ted) → review+
(In reply to :aceman from comment #2)
> I do not think we have parallel execution.
Sorry, this wanted to be "...parallel execution in comm-central xpcshell tests" .

Thanks for accepting it. You can change the key name when gps continues with his plan. I also do not care about the name, as long as the functionality is there.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/11762cace081
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
This isn't documented anywhere, and it should be.
Keywords: dev-doc-needed

Dear aceman,

I would like to run xpcshell-test under valgrind to weed out any remaining memory-related errors.

But in order to do that I need to run the test with much longer timeout value.

Maybe a multiplication factor of 20-25 would be needed.

What is exactly the format of writing this multiplication factor?

For example, the following is the content of
mozilla/comm/chat/components/src/test/xpcshell.ini
that is included from
moz-obj-dir/objdir-tb3/_tests/xpcshell/xpcshell.ini (this is under my MOZOBJDIR).

--- quote
[DEFAULT]
head =
tail =

[test_accounts.js]
[test_commands.js]
[test_conversations.js]
[test_logger.js]
--- end quote

Where do I put the multiplication factor for timeout and how?

It would be awesome if we can specify the timeout multiplication factor on a directory basis instead of
specifying it for each file, but I can live with individual specification for now.

TIA

Flags: needinfo?(acelists)

Yes, see how the timeout variable is used in tests:
https://searchfox.org/comm-central/search?q=requesttimeoutfactor&path=

Flags: needinfo?(acelists)

(In reply to :aceman from comment #9)

Yes, see how the timeout variable is used in tests:
https://searchfox.org/comm-central/search?q=requesttimeoutfactor&path=

Thank you very much.
I will try to increase the timeout as suggested.

Aceman,

Thank you for the feature to lengthen timeout value.

This is a summary of latest discovery.

valgrind warning goes into stderr or stdout and is not visible unless the log is printed.
So I had to print out the detailed log for every test since xpcshell test does not print stderr/stdout output if there is no error.
(Passing --verbose to xpcshell-test as in |mach xpcshell-test --verbose|)

This makes the output of whole xpcshell-test whoppingly large. About 115MB (!)
[With verbose output of my local patches to enable buffered output, the total will be more than 170MB. A bit too large for easy handling in an editor buffer. Not impossible.]

So I had to run a few tests to see how it goes before producing such a large output.
Then I already found a positive uninitialized value reference in M-C code. I will file a bugzilla on this.

After a little tweak of my setup to invoke valgrind to reduce the output (now I had to disable output for memory leak and mismatched new/delete/malloc/free), I ran the whole C-C test only. [I selectively chose C-C test directories, and lengthened the timeout values appropriately..] This still generates close to 20MB of output.

But the run was worth it.
Surprise. I found a few dozen tests which caused uninitialized value access: there seem to be several causes and a few of them cause a dozen warnings each.
I will file bugzilla entries about them in the next few days.

At least, I can immediately see on of them seems to be serious. It has something to do with the handling of selected messages. I have a feeling some erratic behavior seen by users may be related to this.

All in all, I am glad I asked the question to lengthen the timeout so that I can run xpcshell test with C-C TB running under valgrind.
(I used a fake wrapper to call the real xpcshell binary under valgrind with suitable options and installed this wrapper as |xpcshell| after renaming the original |xpcshell| to |xpcshell-bin|. Then invoked |mach xpcshell-test --verbose| )

I have a feeling that I probably find another few warnings in M-C code once I ran the whole test under valgrind.

TIA

You need to log in before you can comment on or make changes to this bug.