Closed Bug 888765 Opened 11 years ago Closed 11 years ago

Many tests on Win8 on nightly builds fail to start

Categories

(Toolkit :: Telemetry, defect)

x86
Windows 8
defect
Not set
blocker

Tracking

()

RESOLVED FIXED
mozilla25
Tracking Status
firefox24 --- fixed
firefox25 --- fixed

People

(Reporter: philor, Assigned: jimm)

References

()

Details

(Whiteboard: [qa-])

Attachments

(1 file)

https://tbpl.mozilla.org/?rev=c5ce065936fa&showall=1&jobname=6.2.*pgo and then https://tbpl.mozilla.org/?rev=cbb24a4a96af&showall=1&jobname=6.2.*pgo - the first set of tests is on the periodic PGO build, the second plus the retriggers are on the nightly build. The only difference between the two ought to be that the former has MOZ_UPDATE_CHANNEL=default and the latter has MOZ_UPDATE_CHANNEL=nightly, but then you have people like http://mxr.mozilla.org/mozilla-central/source/configure.in#8824 deciding that "nightly" means "shoot ourselves in the foot, but only once a day." Assuming it's always going to happen to at least one suite, the regression range is the rather unwieldy https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8e3a124c9c1a&tochange=c5ce065936fa
https://tbpl.mozilla.org/?tree=Try&rev=845a7ce7ddb0 is just PGO plus MOZ_TELEMETRY_ON_BY_DEFAULT, and I expect it will show the failure (https://tbpl.mozilla.org/?tree=Try&rev=80034f32f9c5 where I blew the talos part of the trychooser syntax, seems to have); https://tbpl.mozilla.org/?tree=Try&rev=35e02f18786b is that with https://hg.mozilla.org/mozilla-central/rev/d8d194d3dcc1 backed out.
Component: General → Telemetry
Product: Core → Toolkit
When this does prove to have been nightly-only bustage caused by telemetry's fondness for abusing MOZ_UPDATE_CHANNEL, this won't be the first time that fondness has caused nightly-only bustage, will it?
Since it wasn't that (the only thing in the range that mentioned telemetry in the commit message), I pushed https://tbpl.mozilla.org/?tree=Try&rev=f5efb2ec7814 with just PGO to make sure we can have a try build that *doesn't* show the bustage.
This needs an owner ASAP. Nightly-only test bustage is not acceptable.
Severity: normal → blocker
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(dtownsend+bugmail)
mozilla-central is closed due to this
And that PGO but not MOZ_TELEMETRY_ON_BY_DEFAULT try was green. Unless reftests and talos are actually inheriting prefs from automation.py.in (which now gets its prefs from testing/profiles/prefs_general.js), and I don't think they are, the way that mochitests don't hit it probably means that having one or both of toolkit.telemetry.prompted and toolkit.telemetry.notifiedOptOut set to 999 avoids it.
Taras, please find an owner for this
Assignee: nobody → taras.mozilla
Flags: needinfo?(dtownsend+bugmail)
(In reply to Dave Townsend (:Mossop) from comment #7)
> Taras, please find an owner for this
Seems to me Taras IS the owner for this, according to the wiki page for telemetry.
(In reply to Phil Ringnalda (:philor) from comment #6)
> mochitests don't hit it probably means that having one or both of
> toolkit.telemetry.prompted and toolkit.telemetry.notifiedOptOut set to 999
> avoids it.
That really shouldn't be the case, since both of those prefs have been unused since bug 829881.
If we don't get an owner for this soon we'll just disable telemetry by default on nightlies
Flags: needinfo?(vdjeric)
Flags: needinfo?(nfroyd)
Flags: needinfo?(irving)
Flags: needinfo?(dteller)
(In reply to Dave Townsend (:Mossop) from comment #11)
> If we don't get an owner for this soon we'll just disable telemetry by
> default on nightlies
Good call! But the wiki page still lists Taras as the Directly Responsible Individual, which sounds to me like he should be the bug owner.
Disabling telemetry by default seems worse than nightly-only test failures.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #13)
> Disabling telemetry by default seems worse than nightly-only test failures.
I don't see how. Getting telemetry on builds we are not doing seems completely useless.
(In reply to Dave Townsend (:Mossop) from comment #11)
> If we don't get an owner for this soon we'll just disable telemetry by
> default on nightlies
Seems like bug 888927 will fix this and Gavin has already posted a patch in that bug.
(In reply to Bill Gianopoulos [:WG9s] from comment #14)
> (In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment
> #13)
> > Disabling telemetry by default seems worse than nightly-only test failures.
>
> I don't see how. getting telemetry on builds we are not doing seems
> completely useless.
How can we be getting telemetry on builds that are not being done?
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd (:froydnj) from comment #15)
> (In reply to Dave Townsend (:Mossop) from comment #11)
> > If we don't get an owner for this soon we'll just disable telemetry by
> > default on nightlies
>
> Seems like bug 888927 will fix this and Gavin has already posted a patch in
> that bug.
My reading is that tests fail when telemetry is enabled, so won't that just make tests fail in all builds, not just nightlies? Happy to hear otherwise.
(In reply to Nathan Froyd (:froydnj) from comment #15)
> (In reply to Dave Townsend (:Mossop) from comment #11)
> > If we don't get an owner for this soon we'll just disable telemetry by
> > default on nightlies
>
> Seems like bug 888927 will fix this and Gavin has already posted a patch in
> that bug.
>
> (In reply to Bill Gianopoulos [:WG9s] from comment #14)
> > (In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment
> > #13)
> > > Disabling telemetry by default seems worse than nightly-only test failures.
> >
> > I don't see how. getting telemetry on builds we are not doing seems
> > completely useless.
>
> How can we be getting telemetry on builds that are not being done?
That was kind of my whole point. I guess you figured it out.
Flags: needinfo?(dteller)
Jim, do any of the Windows 8 commits in the comment 0 regression range stand out to you as a possible culprit?
Flags: needinfo?(jmathies)
On a cursory look through the blame, the only bug that pops out at me is bug 873073. But from a discussion on IRC with khuey, it sounds like if this were the cause it would also happen on inbound builds. FWIW, none of the metro-specific landings here would get in the way of test startup. They are all front-end JS.
Flags: needinfo?(jmathies)
I've been running several try builds in an attempt to bisect:
https://tbpl.mozilla.org/?tree=Try&rev=e393837ecc16
https://tbpl.mozilla.org/?tree=Try&rev=82f9f73cf954
https://tbpl.mozilla.org/?tree=Try&rev=e3323fc7a7d1
https://tbpl.mozilla.org/?tree=Try&rev=5717788a2de4
Assuming that I didn't screw anything up, those builds have telemetry turned on and PGO turned on and are fully green. So anything that went wrong happened after:
https://hg.mozilla.org/mozilla-central/rev/cec705c00777
I would like to push more builds, but Try appears to be busted.
(In reply to Nathan Froyd (:froydnj) from comment #20)
> I've been running several try builds in an attempt to bisect:
Try builds by default don't get --enable-update-channel=nightly, as I understand it, so this might not be reproducible there without some build config changes.
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(vdjeric)
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #21)
> (In reply to Nathan Froyd (:froydnj) from comment #20)
> > I've been running several try builds in an attempt to bisect:
>
> Try builds by default don't get --enable-update-channel=nightly, as I
> understand it, so this might not be reproducible there without some build
> config changes.
Try builds get run with --enable-update-channel=, so changes like:
https://hg.mozilla.org/try/rev/bcaf37d7c0e4
should be sufficient to turn on telemetry. I did screw up my previous builds: they didn't turn on MOZILLA_OFFICIAL, so telemetry was likely not running correctly. I've run several win PGO try builds with the above:
https://tbpl.mozilla.org/?tree=Try&rev=572cb21c69e0
https://tbpl.mozilla.org/?tree=Try&rev=3175952c3a0b
https://tbpl.mozilla.org/?tree=Try&rev=2665c5b13d74
https://tbpl.mozilla.org/?tree=Try&rev=7bac89250f31
https://tbpl.mozilla.org/?tree=Try&rev=d94c47fba470
https://tbpl.mozilla.org/?tree=Try&rev=513e708b1169
https://tbpl.mozilla.org/?tree=Try&rev=533d5c8b74ae (results not in yet, try is very backed up)
and all of them are green. All those pushes are based off changesets in the regression range philor mentioned in comment 1. I suppose it's possible that something that wants MOZ_UPDATE_CHANNEL=nightly isn't getting properly triggered on those builds, which is what's needed to cause the (intermittent?) problems described in comment 1.
You absolutely should be able to repro on try with that (or less, what you did at first with just the s/nightly/default/ was enough) plus PGO, I did so multiple times. I think rather than "backed up" that last one is "thrown on the floor" and you'll have to repush it, though - there's a period during buildbot reconfigs when pushes just get ignored.
Finally got it down to a single push, behold:
good: https://tbpl.mozilla.org/?tree=Try&rev=f939bb960eb6
bad: https://tbpl.mozilla.org/?tree=Try&rev=613bd3e6acd6
The crashtest/reftest oranges seem to be either intermittent or caused/exacerbated by a later push. But the talos reds start showing up with:
https://hg.mozilla.org/mozilla-central/rev/5cd49ff35fb9
and even still appear to be intermittent (e.g. many retriggers of talos tests came up green in the bad push above). We also have pushes with patches after the bad landing:
https://tbpl.mozilla.org/?tree=Try&rev=fc7b16936ac8
https://tbpl.mozilla.org/?tree=Try&rev=6dc7fb98dd99
coming up with red talos. So I'm going to pin the tail on bug 873073. (Which I apparently can't block this bug on because s-g. Lovely.) Jim, do you see how your patch for bug 873073 could be causing those talos hangs?
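For readers following the bisection above, here is a rough sketch of the general approach: binary search over an ordered list of changesets, where each candidate is run several times because the failure is intermittent. This is an illustration only, not the procedure actually used in this bug. The names `first_bad`, `is_bad_once`, and `retriggers` are hypothetical, and `is_bad_once` stands in for "push this revision to try and look at one talos run".

```python
from typing import Callable, Sequence


def first_bad(
    changesets: Sequence[str],
    is_bad_once: Callable[[str], bool],
    retriggers: int = 5,
) -> str:
    """Return the earliest changeset that fails in at least one of `retriggers` runs.

    Assumes changesets are ordered oldest to newest, that changesets[0] is
    known good, and that changesets[-1] is known bad. `is_bad_once` is a
    hypothetical callback representing a single test run on one revision.
    """

    def is_bad(rev: str) -> bool:
        # An intermittent failure may need several runs before it shows up,
        # so a revision only counts as good if every run passes.
        return any(is_bad_once(rev) for _ in range(retriggers))

    lo, hi = 0, len(changesets) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid
        else:
            lo = mid
    return changesets[hi]
```

The main risk with this kind of search is a false "good": if the failure rate is low and `retriggers` is too small, a bad revision can pass every run and the search converges on a later, innocent changeset. Raising `retriggers` trades try capacity for confidence.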
Flags: needinfo?(irving) → needinfo?(jmathies)
I really don't. As I understand it, the only difference between the builds that show the issue and those that don't is official branding plus telemetry. Not sure why creating a temporary window on startup would break things based on those changes. Let's go ahead and back out 873073 all the same and confirm the problem goes away. If it does, I can take a look at that bug again and push to try with different fixes to come up with a new fix that doesn't trigger this.
Flags: needinfo?(jmathies)
Actually I might be able to fix this without the backout based on an idea I have. Need to do some try pushes.
Assignee: taras.mozilla → jmathies
Attached patch fix
Moving this query call from the toolkit to the first toplevel window fixes the problem. https://tbpl.mozilla.org/?tree=Try&showall=0&rev=5a2f9cedb166
Attachment #771677 - Flags: review?(netzen)
Thanks for the bisection work Nathan!
Blocks: 873073
Attachment #771677 - Flags: review?(netzen) → review+
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Today's nightly is looking green! Thanks Jim :)
Comment on attachment 771677 [details] [diff] [review] fix Follow up fix for bug 873073.
Attachment #771677 - Flags: approval-mozilla-aurora?
Attachment #771677 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [qa-]