Closed
Bug 1444826
Opened 7 years ago
Closed 7 years ago
Add more logs to help debugging failing DAMP tests
Categories
(DevTools :: General, enhancement, P3)
Tracking
(firefox61 fixed)
RESOLVED
FIXED
Firefox 61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: jdescottes, Assigned: ochameau)
References
(Blocks 2 open bugs)
Details
Attachments
(2 files)
While splitting DAMP in 1441703, there was very little logging when tests had errors. This made debugging issues a bit painful.
We should add minimal logs to know:
- if a given test failed to load
- if a helper is waiting for an async method to finish
We might also want to add a log helper to avoid relying on dump(str + "\n")
Comment 1•7 years ago
|
||
> if a helper is waiting for an async method to finish
As a quick note, I often remarked that in tests involving async methods we nearly never handle errors right. Especially we seldom fail the test in the error callback / in a promise's catch clause. I believe this is a big source of frustration when debugging an async test, more than missing logs. For example if you do a programming mistake (which would yield ReferenceError or TypeError) in an async function (be it a normal callback or a promise chain) it would be silently caught if no `catch` is present.
Assignee | ||
Comment 2•7 years ago
|
||
See also the discussion over bug 1378002 where most DAMP errors ends up being flagged as "Intermittent PROCESS-CRASH | damp | application crashed". I would like to help sheriffs flags various DAMP intermittents/failures with distinct bugs!
This may be slightly different topic but any error should appear correctly in PerfHerder summary without having to manually read the full stdout log!
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → poirot.alex
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•7 years ago
|
||
Joel, thanks to bug 1378002 comment 65, I ended up tweaking the python part.
But:
* instead of doing something only about timeout, I also handle exception/errors.
* I used a per subtest timeout from damp.js (instead of tptimeout).
I had a first approach before this patch, instead of dumping "DAMP ERROR ", I was using "TEXT-UNEXPECTED-FAIL | damp | ".
So that the string would be referenced in treeherder. But it doesn't make Talos python app quit with an error code and so the job is still considered as successful.
In order to be more generic, we could grep for "TEXT-UNEXPECTED-FAIL |" and kill the browser session if we have one.
That would make the python patch more generic and usable by other talos test suites if they want to stop the test and report an error.
Otherwise, I used 15 minutes timeout arbitrarily, I'll do some analysis on how long each subtest last.
Comment 5•7 years ago
|
||
mozreview-review |
Comment on attachment 8963065 [details]
Bug 1444826 - Stop DAMP with explicit error on exceptions and timeouts.
https://reviewboard.mozilla.org/r/231922/#review237488
I have a pending question about the done variable. outside of that, the idea of using TEST-UNEXPECTED-FAIL is interesting- it seems like a good route to go in the future, we would have to ensure the mozharness parsers accept it properly, that seems like a great followup bug.
::: testing/talos/talos/tests/devtools/addon/content/damp.js:25
(Diff revision 1)
> // "normal". Print allocation sites to stdout if DEBUG_DEVTOOLS_ALLOCATIONS is set to
> // "verbose".
> const DEBUG_ALLOCATIONS = env.get("DEBUG_DEVTOOLS_ALLOCATIONS");
>
> +// Maximum time spent in one test, in milliseconds
> +const TEST_TIMEOUT = 15 * 60000;
15 minutes for a single test seems very long- I would think 5 minutes would be enough. Are there specific tests in damp which take a long time? Is this leaving buffer room for profiling or running on debug builds?
to make things easier, I would like to add a comment in test.py where we define damp noting "there is a 15 minute internal timeout for damp". That would help with debugging timeout issues if for some reason the DAMP ERROR is not present in the line.
::: testing/talos/talos/tests/devtools/addon/content/damp.js:290
(Diff revision 1)
> _doneInternal() {
> + // Ignore any duplicated call to this method
> + if (this.done) {
> + return;
> + }
> + this.done = true;
where is this.done defined at? I am old fashioned and code more in python than javascript- does _done == this.done?
Reporter | ||
Comment 6•7 years ago
|
||
mozreview-review |
Comment on attachment 8963065 [details]
Bug 1444826 - Stop DAMP with explicit error on exceptions and timeouts.
https://reviewboard.mozilla.org/r/231922/#review237522
Thanks a lot Alex, that's a great improvement!
I tested it with errors added in both test files and head.js and the test runner stopped in both cases, with a helpful log.
In a followup we could make out logs standout more by prefixing them with somthing like [DAMP]
r+ with the _done/done typo mentioned by Joel fixed.
::: testing/talos/talos/talos_process.py:77
(Diff revision 1)
> self.event.set()
> elif line == 'TART: TIMEOUT':
> self.got_timeout = True
> self.timeout_message = 'TART'
> self.event.set()
> + elif line.startswith('DAMP ERROR: '):
This is probably more a question for Joel, but isn't it weird to have something specific to DAMP here (rather than `TALOS ERROR:` or `TALOS ERROR (DAMP):`)
Maybe we could define a common prefix that all tests could use in a followup.
::: testing/talos/talos/tests/devtools/addon/content/damp.js:231
(Diff revision 1)
> + if (promise && typeof(promise.catch) == "function") {
> + promise.catch(e => {
> + this.exception(e);
> + });
> + }
This makes me think that an interesting follow up would be to:
- change _runNextTest to an async method
- use `await testMethod();`
- run testTeardown() at the end of _runNextTest instead of manually calling it in every test
Attachment #8963065 -
Flags: review?(jdescottes) → review+
Assignee | ||
Comment 7•7 years ago
|
||
Here is the time it takes to run each DAMP subtests.
The slowest if webconsole/custom.js, which takes 27s to run.
Given that we suggest adding new tests to custom pages, it is going to grow over time. I imagine at some point, it would make sense to tweak DAMP harness to be able to timeout individual subtests, may be via `runTest` method.
In the meantime, what about using 5 minutes?
It is totally arbitrary, it gives you 5 minutes to debug damp before it is automatically killed.
To reply to Joel about how to debug DAMP with this new debug mechanism, I don't have a good answer excet making the timeout big enough to be able to debug a bit.
We could add a "--keep-open" argument, somewhat similar to mochitest one, but to disable timeouts? Or disable the timeout if this is local run? Or?
Comment hidden (mozreview-request) |
Assignee | ||
Comment 9•7 years ago
|
||
In the latest patch, I'm using TEST-UNEXPECTED-FAIL, it is correctly picked up by TreeHerder.
Then there is no need to save the error line on the python side,
the fact that it is on stdout is enough for treeherder to pick it up.
The only thing that python code should do is to quit with an error code.
See this failed run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aeb6fcc513f5e62ffa730ca77d9bffe602d07aad&selectedJob=170819262
Also, in damp.js, I added some code to print the time it took to execute each test to help seeing when we reach to timeout limit.
And finally, a green try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b25ea077b1f91d2cd206b4b46ebf69e1ab6f104
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8963065 [details]
Bug 1444826 - Stop DAMP with explicit error on exceptions and timeouts.
https://reviewboard.mozilla.org/r/231922/#review237604
thanks, this is looking good.
::: testing/talos/talos/tests/devtools/addon/content/damp.js:25
(Diff revision 2)
> // "normal". Print allocation sites to stdout if DEBUG_DEVTOOLS_ALLOCATIONS is set to
> // "verbose".
> const DEBUG_ALLOCATIONS = env.get("DEBUG_DEVTOOLS_ALLOCATIONS");
>
> +// Maximum time spent in one test, in milliseconds
> +const TEST_TIMEOUT = 5 * 60000;
one suggestion here is that if running locally via mach (there will be develop mode in the python harness) we could set a pref or send in a url param and the timeout could be much longer.
Attachment #8963065 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 11•7 years ago
|
||
Green try running all talos suites:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f2decb8d6d1f14873f42aa135b000bbfee5dc3b
Just to ensure that none of the other suites were printing TEST-UNEXPECTED-FAIL.
Reporter | ||
Comment 12•7 years ago
|
||
New version looks good to me!
Comment 13•7 years ago
|
||
Pushed by apoirot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/84ba9406ef28
Stop DAMP with explicit error on exceptions and timeouts. r=jdescottes,jmaher
Comment 14•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 61
Updated•6 years ago
|
Product: Firefox → DevTools
You need to log in
before you can comment on or make changes to this bug.
Description
•