Closed Bug 1021957 Opened 11 years ago Closed 11 years ago

dzAlerts - how to deal with noisy data (e.g. tp5's alipay.com)

Categories

(Datazilla Graveyard :: Metrics, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ekyle, Unassigned)


The noise should not be a problem for the median test with the large window size (=20), but it still emits a large number of alerts. What's up? I must grab those test results and chart the p-value the median test is generating; maybe there are odd edge effects.
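For anyone wanting to chart that p-value, here is a minimal sketch. It assumes "median test" means Mood's median test applied to the 20 points before and after each candidate position. The function names and the chi-square approximation are illustrative assumptions, not dzAlerts' actual code.

```python
import math
from statistics import median


def median_test_p_value(before, after):
    """Mood's median test for two samples; returns an approximate p-value."""
    pooled = median(list(before) + list(after))
    # 2x2 contingency table: counts above / not above the pooled median.
    a = sum(1 for x in before if x > pooled)
    b = len(before) - a
    c = sum(1 for x in after if x > pooled)
    d = len(after) - c
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of a shift
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def scan(series, window=20):
    """Yield (index, p_value), comparing `window` points before and after each index."""
    for i in range(window, len(series) - window + 1):
        yield i, median_test_p_value(series[i - window:i], series[i:i + window])
```

Plotting the output of `scan` against the raw series should show whether the p-value dips only at real shifts or also wherever the noise happens to cluster on one side of a window boundary.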
In the past, the alipay test was connecting to external resources (and so would hang periodically on desktop talos runs). As I understand it, this was fixed, but it is worth checking whether there are any stragglers.
It appears Alipay.com's performance score has a slight time-of-day dependency that is causing this. This has been mitigated by increasing the window size for alipay.com only, and by using the minimum statistic rather than the median. Inspecting past data, we do get a signal; dzAlerts is still able to detect regressions despite the noise. This is now running on staging.
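One plausible reading of why the minimum helps, sketched under assumptions: if the time-of-day effect only ever adds time to a run, the windowed minimum stays put while the windowed median moves with the mix of fast and slow runs in each window. The data below is synthetic, and every name and number in it is made up for illustration; none of it comes from talos or dzAlerts.

```python
from statistics import median


def slow_period(i, half_period=6):
    """Hypothetical time-of-day effect: runs alternate between fast and slow blocks."""
    return (i // half_period) % 2 == 1


def synthetic_runs(n, base=100.0, penalty=50.0, regress_at=None, regress_by=10.0):
    """Made-up load times: `base`, plus `penalty` in slow periods, plus a step at `regress_at`."""
    runs = []
    for i in range(n):
        value = base + (penalty if slow_period(i) else 0.0)
        if regress_at is not None and i >= regress_at:
            value += regress_by
        runs.append(value)
    return runs


def window_stats(series, window, stat):
    """Apply `stat` to consecutive, non-overlapping windows of `series`."""
    return [stat(series[i:i + window]) for i in range(0, len(series) - window + 1, window)]


quiet = synthetic_runs(48)
print(window_stats(quiet, 8, median))  # moves from window to window with no regression
print(window_stats(quiet, 8, min))     # stays constant

regressed = synthetic_runs(48, regress_at=24)
print(window_stats(regressed, 8, min))  # steps up where the regression starts
```

The minimum only works while each window contains at least one run from the fast period, which is consistent with needing a larger window for this page.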
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Kyle Lahnakoski [:ekyle] from comment #2)
> It appears Alipay.com's performance score has slight time-of-day dependency
> that is causing this. This has been mitigated by increasing window size for
> alipay.com only, and using the minimum statistic rather than the median.
>
> Inspecting past data, we do get a signal; dzAlerts is still able to detect
> regressions despite the noise.
>
> This is now running on staging.

We should still most probably disable Alipay.com unless there's a very good reason for us to keep it*. IMO it's not worth jumping through code/analysis hoops just to make the results of this test usable, unless it really means a lot to us, which I don't think it does.

Regardless, a test with so much noise should be examined carefully at the test itself first, before "fixing" it at the processing side by filtering the hell out of it.

* I've heard the argument that this test specifically serves as a test case for very noisy data, which we'd like to know how to handle. While handling very noisy data might be useful in some cases, our first priority should be to have all our tests reasonably quiet. I think so far all the effort has been on processing the data. That's suboptimal IMO, for this case and other cases as well.
Do we know what the time of day dependency is due to? I'm still concerned we're connecting to external resources when we shouldn't...
(In reply to Ed Morley [:edmorley UTC+0] from comment #4)
> Do we know what the time of day dependency is due to?
> I'm still concerned we're connecting to external resources when we
> shouldn't...

Joel, aren't talos runs blocked from accessing external URLs? Do you know if anyone looked at what this page does and tried to analyze where the noise comes from?
Flags: needinfo?(jmaher)
Should we open a bug to actually deal with the alipay.com noise? For me, this bug was about dealing with noisy data, not about reducing the source of noise.
Summary: alipay.com is noisy → dzAlerts - how to deal with noisy data (e.g. tp5's alipay.com)
Just to summarize what I think we should do in such cases, where a test produces so much noise that it is hard for us to use it effectively:
1. If it's not very important to us, disable/remove the test. No point in putting effort into something we don't really care about.
2. If it is important to us, try to understand why the test is noisy, and fix the test.
3. If we can't fix the test, assess again how much we want it, and if we still really want to be able to use it, try better/different/more-specialized processing to bring a clearer signal out of its noise.
Filtering out the noise should not be our first option, as it's a (good) effort put into something we don't necessarily really care about, and it will never be as effective as making the test produce less noise in the first place.
alipay.com is a local cache of a real webpage; I am not sure what we would fix. With that said, it would be really interesting to figure out what in the page is causing the noise :) That might be a bug in the product we can fix!
Flags: needinfo?(jmaher)
We went through all the pages and verified there was no external data. chmanchester did that with me back in the day. We have high confidence there is no outside network access from the webpage. There could be something odd that happens in the webpage (e.g. time of day in the JavaScript, or maybe a feature of Firefox that calls out to the internet) which could be suspect. Right now we don't do strict blocking of network traffic; maybe this year we will get to it!
(In reply to Joel Maher (:jmaher) from comment #8)
> alipay.com is a local cache of a real webpage, I am not sure what we would
> fix. With that said it would be really interesting to figure out what in
> the page is causing the noise :) That might be a bug in the product we can
> fix!

True, but do we want to fix a bug in that product? Do we have spare time to investigate this? Is there any reason to keep this test other than to "practice" noisy data?

(In reply to Joel Maher (:jmaher) from comment #9)
> we went through all the pages and verified there was no external data.
> chmanchester did that with me back in the day. We have high confidence
> there is no outside network access from the webpage.

Ah, thanks!

> There could be
> something odd that happens in the webpage (i.e. time of day in the
> javascript, or maybe a feature of firefox that calls out to the internet)
> which could be suspect.

So, guys, let's throw this test away please, or keep it running but never report any regressions or improvements on it. Kyle could keep tuning his processing if he so wishes and has time for this, without anyone else being affected by what this test supposedly says. Joel, any objection?
I am fine with getting rid of alipay.com; in fact, that will reduce our alerts. If we want to keep it and not alert, that is fine also.
Steps to get rid of it:
* edit tp5n.zip and adjust the tp5o.manifest file
* hack talos pageloader to not run alipay.com
* hack dzalerts to not alert
I recommend hacking dzalerts and then filing a bug that is not time sensitive to fix tp5n.zip.
(In reply to Joel Maher (:jmaher) from comment #9)
> right now we don't do strict blocking of network traffic- maybe this year we
> will get to it!

Now that bug 995417 has landed, I plan to file bugs against talos, jetpack etc to enable for them too :-)
(In reply to Joel Maher (:jmaher) from comment #11)
> I am fine getting rid of alipay.com, in fact that will reduce our alerts.
> * hack dzalerts to not alert
>
> I recommend hacking dzalerts and then file a bug that is not time sensitive
> to fix tp5n.zip.

Joel, I closed this bug because dzAlerts is effectively dealing with alipay.com: there are no more false alerts coming from this page. We can fix tp5n at our leisure.
(In reply to Joel Maher (:jmaher) from comment #9)
> we went through all the pages and verified there was no external data.
> chmanchester did that with me back in the day. We have high confidence
> there is no outside network access from the webpage.

It seems like there are still external network connection attempts sadly - see bug 720852 comment 10.
Note that bug 1026869 was filed to remove alipay.com from the tp5o manifest.
Wow, good find, Ed! We had loaded alipay.com and dumped the network connections whilst loading all the pages; two sets of eyes did this and yielded success. I suspect it is timing related or based on some other criteria internal to the JavaScript. Either way, as Avi pointed out, that specific page is going to be deprecated in the next day or two!