Permaorange browser-chrome on Aurora Linux64 and Win7 nightly builds since merge of Firefox 20 to Aurora

Status

Product: Testing
Component: BrowserTest
Importance: -- critical
Status: RESOLVED FIXED
Reported: 5 years ago
Modified: 4 years ago

People

Reporter: rnewman
Assignee: Unassigned

Tracking

Version: 20 Branch
Keywords: crash, intermittent-failure
Depends on: 1 bug

Firefox Tracking Flags

tracking-firefox20: -
status-firefox20: wontfix
status-firefox21: affected

Attachments

1 attachment, 1 obsolete attachment

+++ This bug was initially created as a clone of Bug #818990 +++

*** Start BrowserChrome Test Results ***
TEST-INFO | checking window state
TEST-INFO | unknown test url | must wait for focus
TEST-INFO | (browser-test.js) | Console message: PAC file installed from data:text/plain,function%20FindProxyForURL(url,%20host){%20%20var%20origins%20=%20['http://127.0.0.1:80',%20'…
INFO | automation.py | Application ran for: 0:05:36.670928
INFO | automation.py | Reading PID log: /tmp/tmpQDy7Vppidlog
Downloading symbols from: http://ftp-scl3.mozilla.com/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux64/1356092418/firefox-19.0a2.en-US.linux-x86_64.crashreporter-symbols.zip
PROCESS-CRASH | automation.py | application crashed [@ libc-2.11.so + 0xd4aa3]
Crash dump filename: /tmp/tmpXMOrul/minidumps/451a7cee-12ba-4cd4-039396d9-6fa8d400.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
     family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGABRT
Crash address: 0x1f4000008a1

Thread 0 (crashed)
 0  libc-2.11.so + 0xd4aa3
    rbx = 0x00007f3b2a4d4d00   r12 = 0x00000000ffffffff
    r13 = 0x00000034d4ce5160   r14 = 0x0000000000000008
    r15 = 0x00007f3b456596d8   rip = 0x00000034d2ed4aa3
    rsp = 0x00007fff2068fa40   rbp = 0x0000000000000008
    Found by: given as instruction pointer in context
 1  libxul.so!PollWrapper [nsAppShell.cpp:4287525881ec : 35 + 0xd]
    rip = 0x00007f3b42b8409e   rsp = 0x00007fff2068fa70
    Found by: stack scanning
 2  libglib-2.0.so.0.2200.2 + 0x3c9fb
    rbx = 0x00007f3b456596d0   r12 = 0x00007f3b42b84070
    rip = 0x00000034d4a3c9fc   rsp = 0x00007fff2068fa90
    rbp = 0x00007f3b2a4d4d00
    Found by: call frame info
 3  libglib-2.0.so.0.2200.2 + 0x2e4ac7
    rip = 0x00000034d4ce4ac8   rsp = 0x00007fff2068fa98
    Found by: stack scanning
 4  libglib-2.0.so.0.2200.2 + 0x2e4aff
    rip = 0x00000034d4ce4b00   rsp = 0x00007fff2068faa0
    Found by: stack scanning
 5  libpthread-2.11.so + 0x8daf
    rip = 0x00000034d3608db0   rsp = 0x00007fff2068faf8
    Found by: stack scanning
 6  libglib-2.0.so.0.2200.2 + 0x3cd39
    rip = 0x00000034d4a3cd3a   rsp = 0x00007fff2068fb10
    Found by: stack scanning
 7  libxul.so!nsAppShell::ProcessNextNativeEvent(bool) [nsAppShell.cpp:4287525881ec : 135 + 0xa]
    rip = 0x00007f3b42b8405f   rsp = 0x00007fff2068fb40
    Found by: stack scanning
 8  libxul.so!nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) [nsBaseAppShell.cpp:4287525881ec : 139 + 0x5]
    rip = 0x00007f3b42b89ec9   rsp = 0x00007fff2068fb50
    Found by: call frame info
 9  libxul.so!nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) [nsBaseAppShell.cpp:4287525881ec : 298 + 0x4]
    rbx = 0x00007f3b37b53080   r12 = 0x00000000002aa076
    rip = 0x00007f3b42b8a081   rsp = 0x00007fff2068fb80
    rbp = 0x00007f3b45625d40
    Found by: call frame info
10  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:4287525881ec : 600 + 0x7]
    rbx = 0x00007f3b45625d40   r12 = 0x0000000000000001
Don't suppose you have the log/tbpl url? :-)
Sorry, haven't had my coffee yet!

https://tbpl.mozilla.org/php/getParsedLog.php?id=18162696&tree=Mozilla-Aurora
Thank you :-)

Comment 10

5 years ago
AFAICT, this isn't a Talos bug?
Component: Talos → Mochitest Chrome
Charmingly enough, it's not just Linux64 and it's not particularly intermittent either.

https://tbpl.mozilla.org/?tree=Mozilla-Aurora&rev=2f801d18884d was the first rev to build nightlies after 19 merged to Aurora, and it hit this. Since then, there have been only 3 or 4 Linux browser-chrome runs that should have been running against the nightly build which have not hit this (which may mean it's nearly permaorange rather than actually permaorange, or may mean that tests against nightlies don't always actually run on the nightly, I didn't investigate them).

The "crash" signature is different between Linux64 and Linux32, but I don't think that's significant, it's just where they happen to be sitting idling when the timeout kills them. The visible and possibly significant differences between working runs on dep jobs and failing runs on nightlies seem to be that testpilot is enabled and installed, and that the nightlies have that line as shown abbreviated in comment 0 about "TEST-INFO | (browser-test.js) | Console message: PAC file installed from data:text/plain,function%20FindProxyForURL".
Blocks: 784681
Component: Mochitest Chrome → BrowserTest
Hardware: x86_64 → All
Summary: Linux 64: TEST-UNEXPECTED-FAIL | automation.py | application timed out after 330 seconds with no output | followed by PROCESS-CRASH | automation.py | application crashed [@ libc-2.11.so + 0xd4aa3] → Permaorange browser-chrome on Aurora 19 Linux and Linux64 nightly builds: TEST-UNEXPECTED-FAIL | automation.py | application timed out after 330 seconds with no output | followed by PROCESS-CRASH | automation.py | application crashed
Version: Trunk → 19 Branch
Duplicate of this bug: 825246
Repros on try (https://tbpl.mozilla.org/?tree=Try&rev=5ab7bc28b255) with "export MOZ_UPDATE_CHANNEL=aurora" so that testpilot winds up installed (and failed to repro when I took a different and failed approach to getting testpilot installed by just hacking at extension/Makefile.in). Not sure if there are other byproducts of setting MOZ_UPDATE_CHANNEL, though.

https://tbpl.mozilla.org/php/getParsedLog.php?id=18352661&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18352610&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18365999&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18365871&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18385051&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18384978&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18396759&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18396778&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18433842&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18431387&tree=Mozilla-Aurora
Not quite perma, since I only got the one https://tbpl.mozilla.org/php/getParsedLog.php?id=18463151&tree=Mozilla-Aurora out of https://tbpl.mozilla.org/?tree=Mozilla-Aurora&onlyunstarred=1&rev=32dba69af0fa and the linux32 one does appear to have downloaded the nightly.
https://tbpl.mozilla.org/php/getParsedLog.php?id=18499507&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18499093&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18516021&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18516027&tree=Mozilla-Aurora
And not 19, since Aurora 20 is affected, so I'll bet what I really meant was "any build which includes testpilot, but the only ones of those where I see the tests are Aurora nightlies." 

https://tbpl.mozilla.org/php/getParsedLog.php?id=18605589&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18603373&tree=Mozilla-Aurora
Summary: Permaorange browser-chrome on Aurora 19 Linux and Linux64 nightly builds: TEST-UNEXPECTED-FAIL | automation.py | application timed out after 330 seconds with no output | followed by PROCESS-CRASH | automation.py | application crashed → Permaorange browser-chrome on Aurora Linux and Linux64 nightly builds: TEST-UNEXPECTED-FAIL | automation.py | application timed out after 330 seconds with no output | followed by PROCESS-CRASH | automation.py | application crashed
https://tbpl.mozilla.org/php/getParsedLog.php?id=18671165&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=18671463&tree=Mozilla-Aurora
akeybl, this is permaorange on Aurora nightlies. Aurora has now been closed since no one has been forthcoming in fixing it. Could you find someone with some cycles that could take a look?
Severity: critical → blocker
Flags: needinfo?(akeybl)
Along with the Linux permaorange that came in on the 19 merge, Mac and Windows b-c became permaorange on the 20 merge - Mac with "leaking until shutdown" like https://tbpl.mozilla.org/php/getParsedLog.php?id=18817908&tree=Mozilla-Aurora and Windows with those plus failures a la https://tbpl.mozilla.org/php/getParsedLog.php?id=18822117&tree=Mozilla-Aurora which remind me a great deal of the Aurora bustage that turned out to be OOM from too many threads from bug 802239. 

Does testpilot spin up a huge number of threads?

Do we really want to keep this situation where we run tests with testpilot included, but only on Aurora nightlies and not anywhere else?
I'm going to reach out to fx-team, given the reproducible steps in comment 16.
tracking-firefox20: --- → +
Flags: needinfo?(akeybl)
This is easily reproducible locally as well.
So here's what happens.  During testpilot init, we get to this code: <http://mxr.mozilla.org/mozilla-central/source/browser/app/profile/extensions/testpilot@labs.mozilla.com/modules/interface.js#87>.  As part of BrowserToolboxCustomizeDone, the content area gets focused: <http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#3681>.

Later on, when we want to start running the tests, we get to this point: <http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#281>.  waitForWindowsState calls waitForFocus here: <http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#154> and we attempt to wait for focus on the window.  The focus event is never dispatched, because the element to be focused is already focused, and since the focus manager's activeWindow property returns null, we can't detect that case.

nsFocusManager::WindowRaised seems to be responsible for updating mActiveWindow, and when I turn on focus manager logging, WindowRaised is called way after this stuff.
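
The sequence above can be sketched as a toy model. This is only an illustration of the ordering problem as described in this comment; the class and function names are hypothetical and do not correspond to the real code in browser-test.js or nsFocusManager.

```python
# Toy model of the hang described above. All names are illustrative
# assumptions, not Mozilla APIs.

class FocusManager:
    def __init__(self):
        self.active_window = None    # only updated by window_raised()
        self.focused_element = None

    def window_raised(self, window):
        self.active_window = window


def wait_for_focus(fm, window, element, listeners):
    """Return True if the harness can start now, False if it must wait."""
    if fm.active_window is window and fm.focused_element is element:
        return True
    # Not detectably focused: register a listener and wait for a focus event.
    listeners.append(element)
    return False


def set_focus(fm, element, listeners):
    """Focus an element; return True only if a listener sees a focus event."""
    if fm.focused_element is element:
        return False                 # already focused, so no event is dispatched
    fm.focused_element = element
    return element in listeners


fm = FocusManager()
window, content = object(), object()
listeners = []

# 1. The extension's init focuses the content area before the window is raised.
set_focus(fm, content, listeners)

# 2. The harness checks focus: active_window is still None, so it decides to wait.
can_start = wait_for_focus(fm, window, content, listeners)

# 3. Focusing the same element again dispatches nothing.
event_fired = set_focus(fm, content, listeners)

# 4. The window is raised much later, which by itself sends no focus event.
fm.window_raised(window)
```

In this model the harness neither starts immediately nor ever receives the event it is waiting for, which matches the "timed out after 330 seconds with no output" symptom.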
Assignee: nobody → ehsan
Created attachment 703412 [details] [diff] [review]
Work-around

This patch works around the problem by preventing the testpilot extension from trying to customize the toolbar and hence screwing with focus.
Attachment #703412 - Attachment is patch: true
Attachment #703412 - Flags: review?(enndeakin)
Depends on: 831854
Filed bug 831854 as a follow-up to fix this issue for real.  I think I'll go ahead and push the patch pending post-landing review.  I've already wasted enough time on this.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1b1be4ac343f
https://hg.mozilla.org/releases/mozilla-aurora/rev/12f52471747d
status-firefox20: --- → fixed
I've filed bug 832050 for making sure that Nightly-only test breakage is more obvious (they are currently indistinguishable from pgo test results, meaning a later green pgo result implies it was only an intermittent).
After Ehsan's fix and finally getting runs on every platform, our status now is that Linux32, Mac, and WinXP only have "leaked until shutdown" errors from devtools tests, but Win7 and Linux64 have those plus things just like the symptoms of bug 798849 (timeouts in devtools tests, yeah, but get them out of the way and you have to deal with pdfjs timeouts, get them out of the way and you have to deal with addonmgr timeouts and browser_bug666317.js and a host of others) that bug 802239 fixed. Whether testpilot uses (or leaks) a ton of memory, or we're right on the threshold anyway and it pushes us over, or it's something else, we *look* exactly like we do when we're OOM.
Assignee: ehsan → nobody
OS: Linux → All
Summary: Permaorange browser-chrome on Aurora Linux and Linux64 nightly builds: TEST-UNEXPECTED-FAIL | automation.py | application timed out after 330 seconds with no output | followed by PROCESS-CRASH | automation.py | application crashed → Permaorange browser-chrome on Aurora nightly builds
Created attachment 703732 [details] [diff] [review]
Extreme measures

Couple of choices:

You can pass this bug around through your top generalists, khuey and bz and roc and dbaron and bsmedberg and billm and karlt and I'll think of the next set of people who aren't afraid to look at something that could be coming from any part of the codebase when I need to, until you hit on one who wants to land something on aurora badly enough to borrow a slave (since they're unlikely to have a sufficiently hobbled machine to let them repro OOM) and figure out what's going wrong remotely.

Or you can just land this patch, stop building testpilot on the only tree where we actually look at the results of testing with it built, and reopen aurora in a few hours.

Personally, I can't quite decide which choice I'd take, if I were in the unfortunate position to choose.
Attachment #703732 - Flags: review?(akeybl)
status-firefox20: fixed → affected
I installed testpilot, and when I went looking for the active tests that would explain why we are shipping it, it looks like there's one (either active or forgotten) for Thunderbird, and the last active Firefox tests were in the spring of 2011.

I take it back, I *can* decide whether I'd take a weeks-long closure of aurora while burning the time of some of our most expensive developers or stop shipping an addon that hasn't done anything for a year and a half.
So do we have a good regression window for this?  It looks like it started before the last merge (which was January 7?)... which makes me puzzled as to why it's not happening on beta now too.
I've been scrolling way down on https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=Rev3%20Fedora%2012x64%20mozilla-aurora%20pgo%20test%20mochitest-browser-chrome (though I suppose I could have pulled the nightly changeset hashes off FTP); hopefully I'll have an answer at some point.
Actually, I'm guessing it's not showing up on Beta because we don't do nightlies on Beta (or at least there aren't any on tbpl).

And as I scrolled further down, I realized it probably was the previous merge (when 19 merged to aurora), so I pulled:
https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-19-04-20-13-mozilla-aurora/firefox-18.0a2.en-US.linux-x86_64.txt
https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-20-04-20-14-mozilla-aurora/firefox-19.0a2.en-US.linux-x86_64.txt

which led to:
https://tbpl.mozilla.org/?tree=Mozilla-Aurora&rev=edc2aedfaed5
https://tbpl.mozilla.org/?tree=Mozilla-Aurora&rev=5f19747d3410

But I guess philor found that already in comment 14; I should have read more closely.  Why don't I put it in the summary where it belongs so others don't do the same, at least.
Summary: Permaorange browser-chrome on Aurora nightly builds → Permaorange browser-chrome on Aurora Linux nightly builds since merge of Firefox 19 to aurora
Summary: Permaorange browser-chrome on Aurora Linux nightly builds since merge of Firefox 19 to aurora → Permaorange browser-chrome on Aurora Linux nightly builds since merge of Firefox 19 to aurora due to presence of testpilot changing focus
Comment on attachment 703412 [details] [diff] [review]
Work-around

Review of attachment 703412 [details] [diff] [review]:
-----------------------------------------------------------------

nit: would have been better to keep the testpilot prefs near each other (there was already one some rows below)
Attachment #703412 - Flags: review?(enndeakin) → review+
(In reply to Marco Bonardo [:mak] from comment #41)
> Comment on attachment 703412 [details] [diff] [review]
> Work-around
> 
> Review of attachment 703412 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> nit: would have been better to keep the testpilot prefs near each other
> (there was already one some rows below)

(Landed on trunk as https://hg.mozilla.org/integration/mozilla-inbound/rev/7d5fdfc2b165, totally not worth backporting to Aurora)
Comment on attachment 703732 [details] [diff] [review]
Extreme measures

FWIW I'd take this if it gives us all green nightly builds.  AFAIK we're not actually running any user studies through this extension.
So I'd missed comment 27, though I think the new failures likely belong in another bug.

However, to test philor's theory in comment 36 that all of the failures are due to testpilot, which after discussion appears not to have been confirmed, I did two try runs off of aurora, one with testpilot:
https://tbpl.mozilla.org/?tree=Try&rev=09687ee6aec9
and one without:
https://tbpl.mozilla.org/?tree=Try&rev=54bfcb35a934
(at least assuming I did it correctly).
(In reply to David Baron [:dbaron] from comment #44)
> I did two try runs off of aurora, one with testpilot:
> https://tbpl.mozilla.org/?tree=Try&rev=09687ee6aec9

That didn't actually get testpilot for you, because you have to export the env var before http://mxr.mozilla.org/mozilla-aurora/source/browser/config/mozconfigs/linux32/nightly#1 (or redo that line after you export it in the override).
Indeed.  New pair, overriding the configure option directly:

As things are now on aurora, with aurora update channel:
https://tbpl.mozilla.org/?tree=Try&rev=3186beecad30

Plus removal of testpilot:
https://tbpl.mozilla.org/?tree=Try&rev=0d401b02bc4a
(In reply to comment #46)
> Indeed.  New pair, overriding the configure option directly:
> 
> As things are now on aurora, with aurora update channel:
> https://tbpl.mozilla.org/?tree=Try&rev=3186beecad30
> 
> Plus removal of testpilot:
> https://tbpl.mozilla.org/?tree=Try&rev=0d401b02bc4a

Seems like the second push is burning at least on Linux and Mac.
New second try push:
https://tbpl.mozilla.org/?tree=Try&rev=bd8b84c0b9b1
in which:
https://hg.mozilla.org/users/dbaron_mozilla.com/patches-aurora/raw-file/2a9366c139d9/no-aurora-testpilot
replaces attachment 703732 [details] [diff] [review] from comment 36.
Current status:

Aurora is closed because Linux64 and Win7 browser-chrome against nightlies fail in a way which stops the test suite from finishing, making it impossible to tell whether any new failures have been added.

On those two platforms we get a complex of test failures which looks exactly like the bug 798849 OOM failures that we hit in both June/July and October 2012. We have no idea what fixed them in June/July; in October it turned out that we were winding up with ~300 storage threads.
Summary: Permaorange browser-chrome on Aurora Linux nightly builds since merge of Firefox 19 to aurora due to presence of testpilot changing focus → Permaorange browser-chrome on Aurora Linux64 and Win7 nightly builds since merge of Firefox 20 to Aurora
Version: 19 Branch → 20 Branch
Crash Signature: [@ libc-2.11.so@0xd4aa3]
Comment on attachment 703732 [details] [diff] [review]
Extreme measures

I don't know why I'm surprised that this awful method of enabling or disabling building an extension based on overloading an env var instead of using configure leads to confusion and bustage.
Attachment #703732 - Attachment is obsolete: true
Attachment #703732 - Flags: review?(akeybl)
So this pair of try runs:

> As things are now on aurora, with aurora update channel:
> https://tbpl.mozilla.org/?tree=Try&rev=3186beecad30
> 
> Plus removal of testpilot:
> https://tbpl.mozilla.org/?tree=Try&rev=bd8b84c0b9b1

seems to show that disabling testpilot fixes all of the browser-chrome failures.

(Ignore the android builds; the mechanism I used to override the update channel setting and simulate a nightly on try didn't apply to them anyway due to a build system bug that I'll prepare an m-c patch for shortly; I'm not sure why they're orange, though.)
So let me try to summarize the current state of what's going on here:

Our continuous integration testing on TBPL generates builds for pushes, and runs tests on them, occasionally coalescing them.  This happens on all of our active development branches.  On mozilla-central and mozilla-aurora (but not mozilla-beta or mozilla-release, I think), the nightly builds we generate also show up on TBPL, and have unit tests run on them.

This bug covers a set of permanent test failures (perma-oranges) that occur *only* on the unit tests of nightly builds (which differ in some ways from the other builds, most notably by setting the update channel) and not on the unit tests of the push-generated builds.  Furthermore, these test failures are happening only on Aurora, and the Aurora tree is currently closed for those failures.

Disabling the testpilot extension fixes *all* of these failures (see comment 51); the patch to disable it is the patch linked in comment 48.   Since building testpilot is conditional on the update channel being aurora or beta, the only place we run unit tests on builds with testpilot is the unit tests we run of nightly builds on mozilla-aurora.
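
As a rough sketch of why only one kind of job ever exercises testpilot: the function below encodes the "aurora or beta" condition stated above, while the job table is a hypothetical stand-in. The channel values "default" and "nightly" are assumptions made for illustration and are not taken from this bug.

```python
def builds_testpilot(update_channel):
    """testpilot is built only when the update channel is aurora or beta."""
    return update_channel in ("aurora", "beta")


# Hypothetical table of (tree, build kind) -> update channel for the jobs
# whose unit tests show up on TBPL. Only the mozilla-aurora nightly entry
# is stated in this bug; the other channel names are assumed.
JOBS = {
    ("mozilla-central", "push"): "default",
    ("mozilla-central", "nightly"): "nightly",
    ("mozilla-aurora", "push"): "default",
    ("mozilla-aurora", "nightly"): "aurora",
}

# Jobs whose tests actually run against a build that includes testpilot.
covered = [job for job, channel in JOBS.items() if builds_testpilot(channel)]
print(covered)
```

Under these assumptions the only covered job is the mozilla-aurora nightly, so a testpilot-related regression cannot show up anywhere else.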

These failures (again, all fixed by disabling testpilot) were introduced at separate points:

 (a) when Firefox 19 merged to aurora, we introduced a focus-related perma-orange on the browser-chrome tests on Linux.  This permaorange was worked around yesterday by https://hg.mozilla.org/releases/mozilla-aurora/rev/12f52471747d and bug 831854 covers fixing it better.

 (b) when Firefox 20 merged to aurora, additional browser-chrome failures were introduced.  These failures were similar to failures observed twice before (see comment 49).

 (c) There was also a set of leaks from devtools tests, investigated in bug 824016 rather than this bug, which I believe (but am not sure) were also introduced when Firefox 20 merged to aurora.  These tests have been disabled in https://hg.mozilla.org/releases/mozilla-aurora/rev/a8d6394508a3 after a set of attempts to fix them failed.  Since that fix was not included in the with-and-without testpilot comparative try runs in comment 51 (though the previous attempts to fix those failures were), these devtools leaks also appear related to testpilot.



I am aware of three options going forward:

 (1) Decide that our push-based testing is sufficient test coverage and that we're ok reopening the aurora tree with permanent test failures in the tests of *nightly* builds, and reopen mozilla-aurora.  (jlebar and I were advocating this in the thread on dev-platform; ehsan was against, as I think were some others; this was before option (2) was confirmed to be an available solution.)

 (2) Disable the testpilot extension on aurora using the patch in comment 48, and reopen mozilla-aurora.  comment 43 says that we're not currently running any studies using testpilot (and also that ehsan supports this solution).

 (3) Continue to hold mozilla-aurora closed for further investigation of the group (b) failures above.  This does not provide a clear path to reopening or to shipping Firefox 20.
(In reply to David Baron [:dbaron] from comment #52)
>  (c) There was also a set of leaks from devtools tests, investigated in bug
> 824016 rather than this bug, which I believe (but am not sure) were also
> introduced when Firefox 20 merged to aurora.

philor confirms that these were indeed introduced when Firefox 20 merged to aurora.
One other point to add to the summary, actually:  tests of nightlies aren't currently distinguished on tbpl from tests of pgo builds (bug 832050 covers fixing this).  This meant that *all* of the failures described in this bug appeared to be intermittent failures rather than permanent failures unless they were examined very closely.  That's one of the reasons it took so long for these failures to lead to the tree being closed.
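
To illustrate the point, here is a minimal sketch of how merging nightly and pgo results hides a permanent failure. The result records and the classify helper are hypothetical; this is not how tbpl stores or displays results.

```python
from collections import defaultdict


def classify(results, key):
    """Group results by key and label each group by its pass/fail history."""
    groups = defaultdict(list)
    for result in results:
        groups[key(result)].append(result["passed"])
    labels = {}
    for group, outcomes in groups.items():
        if all(outcomes):
            labels[group] = "green"
        elif not any(outcomes):
            labels[group] = "permanent"
        else:
            labels[group] = "intermittent"
    return labels


# Made-up history: every nightly run fails, every pgo run passes.
results = [
    {"suite": "browser-chrome", "build": "nightly", "passed": False},
    {"suite": "browser-chrome", "build": "pgo", "passed": True},
    {"suite": "browser-chrome", "build": "nightly", "passed": False},
    {"suite": "browser-chrome", "build": "pgo", "passed": True},
]

# Without the build kind, the failure looks intermittent.
merged = classify(results, key=lambda r: r["suite"])

# With the build kind, the nightly failure is visibly permanent.
split = classify(results, key=lambda r: (r["suite"], r["build"]))
```

Keying on the build kind as well as the suite is the distinction that bug 832050 asks for.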
I support option 2 in comment 52.
https://hg.mozilla.org/mozilla-central/rev/1b1be4ac343f
Assignee: nobody → ehsan
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Assignee: ehsan → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to :Ehsan Akhgari from comment #55)
> I support option 2 in comment 52.

Agreed - Cheng and Jinghua (the main creators of testpilot surveys) hopefully don't have any urgent surveys in the short term while we continue our investigation. a=akeybl on option 2.
When I say I'm in support of option 2, I am assuming that we'll continue to investigate and find a final resolution allowing testpilot surveys on Aurora soon, of course.
Landed the patch to turn off building testpilot on aurora in https://hg.mozilla.org/releases/mozilla-aurora/rev/c489c87349b5
Target Milestone: mozilla21 → ---
Depends on: 832702
Filed bug 832702 - Reenable building testpilot on mozilla-aurora when it no longer causes test failures, dependent on bug 832703 - testpilot causes browser-chrome leaks on Mac and Linux and bug 832705 - Complex of OOM failures in Linux64 and Win7 browser-chrome tests with testpilot enabled.
Aurora's reopened.
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
And once the light of future merges dawned on me, pushed to m-c in https://hg.mozilla.org/mozilla-central/rev/4919e8091542
We use Test Pilot all the time, and continually deploy new tests on it.  

The situation with its code is bad, and we are trying to decide what the best way to handle this going forward is...

Fix 1.2?  Build 2.0?
Depends on: 840108
Severity: blocker → critical
(In reply to Gregg Lind (User Research - Test Pilot) from comment #63)
> We use Test Pilot all the time, and continually deploy new tests on it.  
> 
> The situation with its code is bad, and we are trying to decide what the
> best way to handle this going forward is...
> 
> Fix 1.2?  Build 2.0?

Do you have an ETA on owners for bug 832703 and bug 832705?
Assignee: nobody → glind
Depends on: 841029
Gregg: this is tracking for Firefox 20 which is now on Beta and will ship in 6 weeks - anything you can do to advance the investigation here (put additional pressure on the 6 day old bug about getting a 64 bit machine?)?
Moving this over to FF21 tracking (current Aurora) as I don't believe there is anything to do here for FF20.
status-firefox20: affected → unaffected
status-firefox21: --- → affected
tracking-firefox20: + → -
tracking-firefox21: --- → +
Well, yes and no - there's absolutely no reason to believe that 20-on-beta isn't leaking and OOMing just because we don't actually run the tests (or to be more painfully accurate, just because we run the tests, on release builds, but absolutely positively not one person ever looks at the results of the tests) that would tell us that it is.

As far as I know we haven't done any investigation about whether any of the test failures, those two or the screwy focus that we "fixed" by insisting that the addon stop customizing the toolbar, were actually things that users would also see.
(Ask: real-time help building and testing on Linux-64-opt)

I am blocked on this, honestly.  I need some real-time help building this on Unix and running the tests.  I have a build host, and have done mach builds on OSX, but unless I can get someone to walk me through the simplest testing / patching path, I really am failing at doing this.  

I want to fix this, and have time authorized to fix this, but the cost of re-figuring out the build/test process without guidance is very very expensive.  Help me lower it :)
Flags: needinfo?
Taking Gregg off this bug for now, since assigning Neil to bug 831854 looks to be the next steps here.  Also marking this tracking again for FF20 since, as philor calls out, we do need this test suite running prior to FF20 release to ensure we are not leaking and OOMing.
Assignee: glind → nobody
status-firefox20: unaffected → affected
tracking-firefox20: - → +
Flags: needinfo?
(In reply to comment #69)
> Taking Gregg off this bug for now, since assigning Neil to bug 831854 looks to
> be the next steps here.  Also marking this tracking again for FF20 since, as
> philor calls out, we do need this test suite running prior to FF20 release to
> ensure we are not leaking and OOMing.

Note that it might be possible to work around the focus issue in testpilot in case we won't have an immediate fix for bug 831854.

Comment 71

4 years ago
What's the status here? The doors are closing pretty soon on 20 and this is still marked for tracking that one...
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #71)
> What's the status here? The doors are closing pretty soon on 20 and this is
> still marked for tracking that one...

Is there any risk here outside of Test Pilot? If not, we can untrack for FF20 at this point (releasing in 2 weeks).
Flags: needinfo?(ehsan)
Nothing outside of testpilot - it needed the flag that doesn't exist, tracking-the-20-betas, since they may or may not have leaked and OOMed, but we don't build or ship testpilot with releases, so at this point it's 20-whatever and on to shipping 21 betas that may or may not leak and OOM.
Yeah, what philor said.
Flags: needinfo?(ehsan)

Updated

4 years ago
status-firefox20: affected → wontfix
tracking-firefox20: + → -

Updated

4 years ago
tracking-firefox21: + → ---
We're untracking in favor of bug 840108.
No longer depends on: 840108