Closed Bug 1096498 Opened 11 years ago Closed 9 years ago

investigate possible changes in the measurement of activeTicks that landed on Nightly around 2014-10-02

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: bcolloran, Unassigned)

Details

Attachments

(1 file, 2 obsolete files)

Attached image activityIndicatorsForNightly_daily.png (obsolete) —
it looks like a patch landed on Nightly some time around 2014-10-02 that changed the way activeTicks are measured. from email thread on fhr-dev "Quantifying nightly pain for e10s": ==2014-11-04== 1. Usage hours depicted on that dashboard [ https://metrics.mozilla.com/protected/dashboards/firefox/ ] are derived from the "activeTicks" fields in FHR, not from the e.g. the total length of sessions (the time from which Nightly is launched until the program is closed). 2. The drop in usage hours appears to start on 2014-10-02-- it does not coinciding with the "prompt patch" on 2014-09-12. The numbers are basically flat from 2014-10-02 to 2014-09-12, so it doesn't look at all like what I'd expect to see in a case of delayed adoption of e10s or something-- it really looks like a drop on or around 2014-10-02, so I'd take a look at what patches landed around then. 3. There is no corresponding drop in daily active profiles; this is corroborated by looking at blocklist pings for clients on Nightly. 5. There is no corresponding drop in session length. 4. There does not appear to be a similar drop in the number of daily sessions (this is a noisier signal, so it's harder to state conclusively). I don't really buy the story about the world that you'd have to believe to make these observations fit together; rather, I'm inclined to suspect that something has changed in the was we're measuring activeTicks. Of course, I'm not an engineer, so I have no idea how this stuff really works, but from what I understand it seems at least plausible that something having to do with counting activeTicks could get touched while working on multiple process stuff... There are a couple more things I'm going to look into, but that's my opening bid: something is up, but I'm not convinced it's necessarily e10s pain for Nightly users. I'd start looking at patches from around 2014-10-02, and pay especially close attention to stuff that might have touched the active tick measurements. ==2014-11-07== OK, here's a little more. Thanks to Saptarshi, who helped me get better numbers for the count of sessions. In the attached plots, each of the grey dashed vertical lines is a release (not necessarily relevant to nightly, but there fwiw), the red line is on 2014-09-12 the day Dave said the "prompt patch" landed, and the blue line is on 2014-10-02, the last day before the number of active hours drops. If you look at the 7 day averages, you *might* be able to argue that the number of daily active profiles and daily sessions is off a little bit, but those are very small fluctuations compared to the marked drop in "active" hours. Moreover, the number of session hours is dead flat. So, again, my leading hypothesis is that there was some change in the code that measures activeTicks, which is the measurement the ultimately leads to the "active hours" graph that Johnathan was looking at. Anyone looking into this? Should I file a bug? Or do people have alternative hypothesis about what could be going on? -brendan
Doesn't need to be private. I'm going to hand this to Georg first for investigation. He's on vacation starting next week, so we may need to reassign if there isn't an obvious cause/fix. Brendan, could you break this down by OS? The active-ticks code has some OS-specific code, and it would be good to know if this is across all platforms or just on a specific platform. Also, I presume you've verified that this didn't happen on any of the other channels? What's a little strange about this data is that it typically takes several days for a nightly release to propagate, and the graph doesn't show that gradual update quite the way I'd expect. But assuming this is the 10-2 nightly, here's the regression range: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2399d1ae89e9&tochange=5d6ec4dddf14 But I don't see any obvious candidates here for causing this.
Assignee: nobody → georg.fritzsche
Group: mozilla-employee-confidential
Component: Data Collection → Client: Desktop
Flags: needinfo?(bcolloran)
confidential numbers included in plots?
Group: mozilla-employee-confidential
Flags: needinfo?(bcolloran)
yes, i'll get those numbers by OS, etc.
Actually there were two nightlies on 10-2 and that's the regression link for the second one. Here's the first one: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=14665b1de5ee&tochange=2399d1ae89e9 With some candidates: 7c97034feb78 mycoolclub — Bug 506815 - Replace old MouseTrailer code with TrackMouseEvent api. r=jimm d954ed24e795 Bas Schouten — Bug 902952 - Part 2: Use Direct2D 1.1 where available. r=jrmuizel
Group: mozilla-employee-confidential
Ah, Benjamin, I forgot that you can filter by OS on the dashboard that Jonath originally spotted the anomaly on: https://metrics.mozilla.com/protected/dashboards/firefox/ filter by channel: nightly, and then by whatever OS you want. My interpretation of the dashboard is: The drop is clearly evident on Windows; on Mac and Linux, there is not a similarly clear-cut drop, but we have so few Nightly users on Mac or Linux that it's not possible to conclusively state that the behavior differs on those platforms because the signal is so much noisier.
(In reply to Georg Fritzsche [:gfritzsche] from comment #6) > bug 506815 / https://hg.mozilla.org/mozilla-central/rev/7c97034feb78 jimm, do you think this could have a side effect on mouse events we use for "user-interaction-active" and "user-interaction-inactive"? > bug 1073043 / https://hg.mozilla.org/mozilla-central/rev/13740fca92f2 smaug, same thing here. We trigger on mouse events (except NS_MOUSE_ENTER/EXIT and !IsReal()), keyboard & wheel events: http://hg.mozilla.org/mozilla-central/annotate/c0d559389a5c/dom/events/EventStateManager.cpp#l510 Anything morphing those conditions would be interesting.
(In reply to Georg Fritzsche [:gfritzsche] from comment #7) > (In reply to Georg Fritzsche [:gfritzsche] from comment #6) > > bug 506815 / https://hg.mozilla.org/mozilla-central/rev/7c97034feb78 > > jimm, do you think this could have a side effect on mouse events we use for > "user-interaction-active" and "user-interaction-inactive"? > > > bug 1073043 / https://hg.mozilla.org/mozilla-central/rev/13740fca92f2 > > smaug, same thing here. > > We trigger on mouse events (except NS_MOUSE_ENTER/EXIT and !IsReal()), > keyboard & wheel events: > http://hg.mozilla.org/mozilla-central/annotate/c0d559389a5c/dom/events/ > EventStateManager.cpp#l510 > Anything morphing those conditions would be interesting. ... actually setting needinfo.
Flags: needinfo?(jmathies)
Flags: needinfo?(bugs)
(In reply to Georg Fritzsche [:gfritzsche] from comment #7) > > bug 1073043 / https://hg.mozilla.org/mozilla-central/rev/13740fca92f2 > > smaug, same thing here. Shouldn't have any effect on "user-interaction-active" and "user-interaction-inactive". user activity check happens on different level than DOM Event dispatch.
Flags: needinfo?(bugs)
(In reply to Georg Fritzsche [:gfritzsche] from comment #7) > (In reply to Georg Fritzsche [:gfritzsche] from comment #6) > > bug 506815 / https://hg.mozilla.org/mozilla-central/rev/7c97034feb78 > > jimm, do you think this could have a side effect on mouse events we use for > "user-interaction-active" and "user-interaction-inactive"? Also, any other mouse-/keyboard-event related changes that could be in that timeframe?
(In reply to Georg Fritzsche [away Nov 15 - Nov 30] [:gfritzsche] from comment #10) > Also, any other mouse-/keyboard-event related changes that could be in that > timeframe? ... especially from all the recent e10s work?
Do we report the data of parent or child process?
(In reply to Georg Fritzsche [away Nov 15 - Nov 30] [:gfritzsche] from comment #7) > (In reply to Georg Fritzsche [:gfritzsche] from comment #6) > > bug 506815 / https://hg.mozilla.org/mozilla-central/rev/7c97034feb78 > > jimm, do you think this could have a side effect on mouse events we use for > "user-interaction-active" and "user-interaction-inactive"? The only code I could find related to these events filters mouse enter and mouse leave - http://mxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager.cpp#508 Unrelated though, there was a regression bug filed against bug 506815 that has not been fixed, so i think we can back the original landing out temporarily to see if it helps.
Flags: needinfo?(jmathies)
(In reply to Olli Pettay [:smaug] from comment #12) > Do we report the data of parent or child process? Parent AFAICT, i'm not aware of anything to hook things up for the child. Note this though: (brendan c from comment #0) > 2. The drop in usage hours appears to start on 2014-10-02-- it does not > coinciding with the "prompt patch" on 2014-09-12. The numbers are basically > flat from 2014-10-02 to 2014-09-12, so it doesn't look at all like what I'd > expect to see in a case of delayed adoption of e10s or something-- it really > looks like a drop on or around 2014-10-02, so I'd take a look at what > patches landed around then.
(In reply to Jim Mathies [:jimm] from comment #13) > (In reply to Georg Fritzsche [away Nov 15 - Nov 30] [:gfritzsche] from > comment #7) > > (In reply to Georg Fritzsche [:gfritzsche] from comment #6) > > > bug 506815 / https://hg.mozilla.org/mozilla-central/rev/7c97034feb78 > > > > jimm, do you think this could have a side effect on mouse events we use for > > "user-interaction-active" and "user-interaction-inactive"? > > The only code I could find related to these events filters mouse enter and > mouse leave - > > http://mxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager. > cpp#508 > > Unrelated though, there was a regression bug filed against bug 506815 that > has not been fixed, so i think we can back the original landing out > temporarily to see if it helps. That would be helpful - who should we poke about this?
Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
There might still be an open question from comment 15 Jim (not sure if there were talks elsewhere).
Unassigning because i'm on PTO. Sorry, didn't really get this anywhere.
Assignee: georg.fritzsche → nobody
(In reply to Georg Fritzsche [away Nov 15 - Nov 30] [:gfritzsche] from comment #16) > There might still be an open question from comment 15 Jim (not sure if there > were talks elsewhere). I backed bug 506815 out yesterday so if that was the cause of this we'll know soon enough.
NI?bcolloran to check the data next week(s)
Flags: needinfo?(bcolloran)
Attached image measures of activity from FHR (obsolete) —
There is not yet any evidence of a change in the level of the tick measurement within the last week. All other measures of activity remain stable as well. I'll keep an eye on this for at least a couple more weeks.
Good news! now that a couple more weeks of data have rolled in, it does look like the measurements for ticks have returned roughly to the levels they should be at given the changes in other variables (or at least that seems plausible, it's impossible to know for sure). activeHoursPerActiveProfiles still looks a little depressed, but that may be a real phenomenon due to e.g. seasonality or something else. In any case, the rightmost blue vertical dashed line marks 2014-11-18, and the numbers bounce back after that.
Attachment #8520090 - Attachment is obsolete: true
Attachment #8528611 - Attachment is obsolete: true
Flags: needinfo?(bcolloran)
Interesting - so we only have an unexplained short-term drop here? We started to poke a little at the underlying code last week, but only found over-counting so far (definite over-counting of user-activity on Windows per bug 1107782).
Oh, I thought the drop was explained by whatever was going on with bug 506815, and that the return to higher levels was due to Jim backing out the bug on the 18th... not so?
(In reply to brendan c from comment #23) > Oh, I thought the drop was explained by whatever was going on with bug > 506815, and that the return to higher levels was due to Jim backing out the > bug on the 18th... not so? Ah, cool - i missed that comment getting back here when catching up from PTO, sorry. So, if this is confirmed to match the timelines, i guess we can keep watching this a little and then close it out if its fine?
I presume this is no longer of interest.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: