Closed Bug 1038279 Opened 6 years ago Closed 6 years ago

Lots of mochitest-dt logspam from a recent fx-team push

Categories

(DevTools :: Inspector, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 33

People

(Reporter: RyanVM, Assigned: jwalker)

References

(Blocks 1 open bug)

Details

The below push to fx-team has caused large amounts of mochitest-dt logspam (confirmed the push prior is clean).
https://tbpl.mozilla.org/?tree=Fx-Team&rev=33f1bc57b49c

https://tbpl.mozilla.org/php/getParsedLog.php?id=43658556&tree=Fx-Team

I discussed this briefly on IRC with jryans and the culprit wasn't immediately obvious so we decided to file for now.
The first log to include the new messages (https://tbpl.mozilla.org/php/getParsedLog.php?id=43765301&tree=Try) points to bug 1035296 as the culprit.

Joe, as that was yours, can you take a look at this?
Flags: needinfo?(jwalker)
Blocks: 1035296
Duplicate of this bug: 1038571
Before:

11:29:22     INFO -  TEST-INFO | checking window state
11:29:22     INFO -  console.error:
11:29:22     INFO -    unknownError

After:

11:28:09     INFO -  TEST-INFO | checking window state
11:28:09     INFO -  console.error:
11:28:09     INFO -    TypeError: win is null


Joe's patch just made the existing errors more visible on TBPL as they now start with an actual error type. I believe that this is a good thing.
All the error messages that I could find point to inspector-related tests, so moving the bug there.
Component: Developer Tools: Framework → Developer Tools: Inspector
(In reply to Panos Astithas [:past] from comment #4)
> Joe's patch just made the existing errors more visible on TBPL as they now
> start with an actual error type. I believe that this is a good thing.

This is undoubtably a good thing in the long run. However, right now these errors are overwhelming the mochitest-dt logs, causing TBPL's log parser to be unable to find real errors (due to hitting internal restrictions on number of failures it'll try to match for bug suggestions). This means we need to open the full logs to find the cause of nearly every dt orange and manually star them.

It seems to me that a more iterative approach (similar to Yoric's work making Promise exceptions fatal) would have avoided this pain, where the patch in question was run through Try, bugs were filed and fixed on existing errors, and then the patch landed. While I'm completely in favor of what this patch is accomplishing, I think we need to backout for now so that mochitest-dt is starrable again.
I agree that we should backout the patch if it disrupts the work of our sheriffs.
Another hackish idea would be to prepend a string in the error message in order to keep the useful error message in the log, but preventing TBPL's parser from matching.
Thanks, backed out :)

https://hg.mozilla.org/mozilla-central/rev/31c5d0a5115d

(In reply to Panos Astithas [:past] from comment #8)
> Another hackish idea would be to prepend a string in the error message in
> order to keep the useful error message in the log, but preventing TBPL's
> parser from matching.

This is the regex TBPL uses for matching failure lines in the log parser:
https://mxr.mozilla.org/webtools-central/source/tbpl/php/inc/GeneralErrorFilter.php#40
Assignee: nobody → jwalker
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(jwalker)
Resolution: --- → FIXED
Target Milestone: --- → Firefox 33
(In reply to Ryan VanderMeulen [:RyanVM UTC-4][PTO 21-July] from comment #9)
> Thanks, backed out :)
> 
> https://hg.mozilla.org/mozilla-central/rev/31c5d0a5115d
> 
> (In reply to Panos Astithas [:past] from comment #8)
> > Another hackish idea would be to prepend a string in the error message in
> > order to keep the useful error message in the log, but preventing TBPL's
> > parser from matching.
> 
> This is the regex TBPL uses for matching failure lines in the log parser:
> https://mxr.mozilla.org/webtools-central/source/tbpl/php/inc/
> GeneralErrorFilter.php#40

The linked code in GeneralErrorFilter.php doesn't check for unknownError at all though, so how does the filtering work? Is it just a case of 'too much' - in which case prepending the error message wouldn't work. Or maybe you're suggesting that if we prepended the error message, then we could add to GeneralErrorFilter.php?

Also CC Patrick in case there is something simple we can to do prevent the inspector from excepting so much.
Flags: needinfo?(ryanvm)
Prepending the string "Protocol error: " appears to not trigger the TBPL heuristics:

https://tbpl.mozilla.org/?tree=Try&rev=59ef40c797cc
(In reply to Joe Walker [:jwalker] from comment #10)
> The linked code in GeneralErrorFilter.php doesn't check for unknownError at
> all though, so how does the filtering work? Is it just a case of 'too much'
> - in which case prepending the error message wouldn't work. Or maybe you're
> suggesting that if we prepended the error message, then we could add to
> GeneralErrorFilter.php?

The issue is that, like you said, unknownError wasn't being flagged, but TypeError was. Mind you, this is just a log parsing algorithm, but it was causing an explosion in "suspected" failures based on what was now being flagged.

Also as an FYI, we would eventually like to make such failures fatal (turning the jobs orange). Doing slow has been very slow-going due to the high number of failures that occur right now. With TypeError, you had the advantage of it being more obviously-visible on TBPL runs, which makes it more likely it'll be noticed and fixed faster. I'm worried that obscuring these errors will make it harder for them to be caught and resolved in a timely manner. As you can see by this bug, we do try to file bugs for these errors when they're flagged :)

How troublesome is fixing the existing failures before landing bug 1035296? Should be fairly easy to test on Try as you go.
Flags: needinfo?(ryanvm)
So how about this:

1. We raise a bug to fix this by preventing TypeErrors hitting protocol.js so often by fixing the underlying problem (if there is one, or sending a different type of error otherwise)
2. We adapt the patch in bug 1035296 to do `s/^TypeError: //` before sending the message out
3. I ask Patrick to fix the bug raised in step 1.

I don't want to prevent bug 1035296 from landing for step 3 because PTO and because it's preventing CSS coverage from being turned on.
In other words, land Panos' patch for now and file a follow-up bug for reverting it and fixing the underlying errors? Works for me.
We should definitely be fixing the underlying inspector-related errors. I believe most (all?) of them are triggered when tests end and window or node objects aren't available anymore but for some reason some actor code still runs and tries to access them.
The fixes for these errors shouldn't be too hard.
Joe, did you file bugs for these already?
(In reply to Patrick Brosset [:pbrosset] [:patrick] from comment #15)
> Joe, did you file bugs for these already?

I didn't.
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.