Closed Bug 812729 Opened 12 years ago Closed 11 years ago

Intermittent xperf, tpn "talosError: Graph server unreachable (5 attempts)" while (not) sending missing counters

Categories

(Testing :: Talos, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

Bug 812315 stopped us from failing out of xperf runs when a counter was missing, but instead gave us https://tbpl.mozilla.org/php/getParsedLog.php?id=17127455&tree=Mozilla-Inbound Exception in writing file '['START\nVALUES\ntalos-r3-w7-084,tp5n_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\n0,1680.00,thesartorialist.blogspot.com\n1,1029.00,cakewrecks.blogspot.com\n2,2702.00,baidu.com\n3,968.00,en.wikipedia.org\n4,581.00,twitter.com\n5,474.00,msn.com\n6,378.00,yahoo.co.jp\n7,1853.00,amazon.com\n8,326.00,linkedin.com\n9,218.00,bing.com\n10,1182.00,icanhascheezburger.com\n11,667.00,yandex.ru\n12,813.00,cgi.ebay.com\n13,1108.00,163.com\n14,732.00,mail.ru\n15,893.00,bbc.co.uk\n16,676.00,store.apple.com\n17,591.00,imdb.com\n18,4272.00,mozilla.com\n19,713.00,ask.com\n20,893.00,cnn.com\n21,1095.00,sohu.com\n22,473.00,vkontakte.ru\n23,952.00,youku.com\n24,474.00,myparentswereawesome.tumblr.com\n25,1006.00,ifeng.com\n26,555.00,ameblo.jp\n27,847.00,tudou.com\n28,321.00,chemistry.about.com\n29,550.00,beatonna.livejournal.com\n30,633.00,hao123.com\n31,985.00,rakuten.co.jp\n32,199.00,alibaba.com\n33,813.00,uol.com.br\n34,967.00,cnet.com\n35,482.00,ehow.com\n36,390.00,thepiratebay.org\n37,501.00,page.renren.com\n38,517.00,chinaz.com\n39,900.00,globo.com\n40,779.00,spiegel.de\n41,708.00,dailymotion.com\n42,350.00,goo.ne.jp\n43,435.00,alipay.com\n44,905.00,stackoverflow.com\n45,623.00,nicovideo.jp\n46,383.00,ezinearticles.com\n47,630.00,taringa.net\n48,1755.00,tmall.com\n49,971.00,huffingtonpost.com\n50,615.00,deviantart.com\n51,778.00,media.photobucket.com\n52,601.00,douban.com\n53,897.00,imgur.com\n54,374.00,reddit.com\n55,568.00,digg.com\n56,411.00,filestube.com\n57,1373.00,dailymail.co.uk\n58,267.00,whois.domaintools.com\n59,641.00,indiatimes.com\n60,604.00,rambler.ru\n61,276.00,torrentz.eu\n62,717.00,reuters.com\n63,646.00,foxnews.com\n64,1902.00,xinhuanet.com\n65,942.00,56.com\n66,1655.00,bild.de\n67,666.00,guardian.co.uk\n68,400.00,w3schools.com\n69,142
8.00,naver.com\n70,374.00,blogfa.com\n71,668.00,terra.com.br\n72,367.00,ucoz.ru\n73,717.00,yelp.com\n74,1024.00,wsj.com\n75,593.00,noimpactman.typepad.com\n76,901.00,myspace.com\n77,206.00,google.com\n78,383.00,orange.fr\n79,235.00,php.net\n80,1205.00,zol.com.cn\n81,1036.00,mashable.com\n82,340.00,etsy.com\n83,405.00,gmx.net\n84,1137.00,csdn.net\n85,953.00,xunlei.com\n86,653.00,hatena.ne.jp\n87,490.00,icious.com\n88,865.00,repubblica.it\n89,567.00,web.de\n90,533.00,slideshare.net\n91,529.00,telegraph.co.uk\n92,918.00,seesaa.net\n93,541.00,wp.pl\n94,1086.00,aljazeera.net\n95,423.00,w3.org\n96,813.00,homeway.com.cn\n97,222.00,facebook.com\n98,625.00,youtube.com\n99,649.00,people.com.cn\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_startup_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_startup_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND']' from results_url: 
['START\nVALUES\ntalos-r3-w7-084,tp5n_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\n0,1680.00,thesartorialist.blogspot.com\n1,1029.00,cakewrecks.blogspot.com\n2,2702.00,baidu.com\n3,968.00,en.wikipedia.org\n4,581.00,twitter.com\n5,474.00,msn.com\n6,378.00,yahoo.co.jp\n7,1853.00,amazon.com\n8,326.00,linkedin.com\n9,218.00,bing.com\n10,1182.00,icanhascheezburger.com\n11,667.00,yandex.ru\n12,813.00,cgi.ebay.com\n13,1108.00,163.com\n14,732.00,mail.ru\n15,893.00,bbc.co.uk\n16,676.00,store.apple.com\n17,591.00,imdb.com\n18,4272.00,mozilla.com\n19,713.00,ask.com\n20,893.00,cnn.com\n21,1095.00,sohu.com\n22,473.00,vkontakte.ru\n23,952.00,youku.com\n24,474.00,myparentswereawesome.tumblr.com\n25,1006.00,ifeng.com\n26,555.00,ameblo.jp\n27,847.00,tudou.com\n28,321.00,chemistry.about.com\n29,550.00,beatonna.livejournal.com\n30,633.00,hao123.com\n31,985.00,rakuten.co.jp\n32,199.00,alibaba.com\n33,813.00,uol.com.br\n34,967.00,cnet.com\n35,482.00,ehow.com\n36,390.00,thepiratebay.org\n37,501.00,page.renren.com\n38,517.00,chinaz.com\n39,900.00,globo.com\n40,779.00,spiegel.de\n41,708.00,dailymotion.com\n42,350.00,goo.ne.jp\n43,435.00,alipay.com\n44,905.00,stackoverflow.com\n45,623.00,nicovideo.jp\n46,383.00,ezinearticles.com\n47,630.00,taringa.net\n48,1755.00,tmall.com\n49,971.00,huffingtonpost.com\n50,615.00,deviantart.com\n51,778.00,media.photobucket.com\n52,601.00,douban.com\n53,897.00,imgur.com\n54,374.00,reddit.com\n55,568.00,digg.com\n56,411.00,filestube.com\n57,1373.00,dailymail.co.uk\n58,267.00,whois.domaintools.com\n59,641.00,indiatimes.com\n60,604.00,rambler.ru\n61,276.00,torrentz.eu\n62,717.00,reuters.com\n63,646.00,foxnews.com\n64,1902.00,xinhuanet.com\n65,942.00,56.com\n66,1655.00,bild.de\n67,666.00,guardian.co.uk\n68,400.00,w3schools.com\n69,1428.00,naver.com\n70,374.00,blogfa.com\n71,668.00,terra.com.br\n72,367.00,ucoz.ru\n73,717.00,yelp.com\n74,1024.00,wsj.com\n75,593.00,noimpactman.typepad.com\n76,901.00,myspace.com\n77,206.00,google.com\n78,383
.00,orange.fr\n79,235.00,php.net\n80,1205.00,zol.com.cn\n81,1036.00,mashable.com\n82,340.00,etsy.com\n83,405.00,gmx.net\n84,1137.00,csdn.net\n85,953.00,xunlei.com\n86,653.00,hatena.ne.jp\n87,490.00,icious.com\n88,865.00,repubblica.it\n89,567.00,web.de\n90,533.00,slideshare.net\n91,529.00,telegraph.co.uk\n92,918.00,seesaa.net\n93,541.00,wp.pl\n94,1086.00,aljazeera.net\n95,423.00,w3.org\n96,813.00,homeway.com.cn\n97,222.00,facebook.com\n98,625.00,youtube.com\n99,649.00,people.com.cn\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_startup_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_startup_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND'] FAIL: Graph server unreachable (5 attempts) RETURN:send failed, graph server says: RETURN:to determine average from 'test_run_values' for 20793713 - local variable 'values' referenced before assignment RETURN: File "/var/www/html/graphs/server/pyfomatic/collect.py", line 273, in handleRequest RETURN: average = valuesReader(databaseCursor, databaseModule, inputStream, metadata) RETURN: File "/var/www/html/graphs/server/pyfomatic/collect.py", line 219, in valuesReader RETURN: raise DatabaseException("to determine average from 'test_run_values' for %s - %s" % 
(metadata.test_run_id, str(x))) RETURN: RETURN: Traceback (most recent call last): File "run_tests.py", line 311, in ? main() File "run_tests.py", line 308, in main run_tests(parser) File "run_tests.py", line 284, in run_tests talos_results.output(results_urls, **results_options) File "C:\talos-slave\talos-data\talos\results.py", line 89, in output raise e utils.talosError: 'Graph server unreachable (5 attempts)\nsend failed, graph server says:\nto determine average from \'test_run_values\' for 20793713 - local variable \'values\' referenced before assignment\n File "/var/www/html/graphs/server/pyfomatic/collect.py", line 273, in handleRequest\n average = valuesReader(databaseCursor, databaseModule, inputStream, metadata)\n File "/var/www/html/graphs/server/pyfomatic/collect.py", line 219, in valuesReader\n raise DatabaseException("to determine average from \'test_run_values\' for %s - %s" % (metadata.test_run_id, str(x)))\n\n' program finished with exit code 1 elapsedTime=333.778000 talosError: Graph server unreachable (5 attempts) TinderboxPrint:send failed, graph server says: TinderboxPrint:to determine average from 'test_run_values' for 20793713 - local variable 'values' referenced before assignment TinderboxPrint: File "/var/www/html/graphs/server/pyfomatic/collect.py", line 273, in handleRequest TinderboxPrint: average = valuesReader(databaseCursor, databaseModule, inputStream, metadata) TinderboxPrint: File "/var/www/html/graphs/server/pyfomatic/collect.py", line 219, in valuesReader TinderboxPrint: raise DatabaseException("to determine average from 'test_run_values' for %s - %s" % (metadata.test_run_id, str(x))) which isn't a whole lot nicer.
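The payloads in the log above follow a START / VALUES / header / rows / END layout, and every xperf counter payload has a header but no rows. The graph server's collect.py is not shown in this bug, so the following is only a minimal sketch of the class of failure the traceback points at: a reader that binds `values` only when data rows exist. The function names `split_payload`, `naive_average` and `empty_payloads` are hypothetical and exist neither in Talos nor in the graph server; the field order is assumed from the logged header line.

```python
def split_payload(payload):
    """Split one START/VALUES/.../END payload into (header_fields, rows)."""
    lines = payload.strip().split("\n")
    if len(lines) < 4 or lines[:2] != ["START", "VALUES"] or lines[-1] != "END":
        raise ValueError("unexpected payload framing")
    header = lines[2].split(",")  # assumed order: machine, test name, branch, ...
    rows = [line.split(",") for line in lines[3:-1]]  # index, value, page
    return header, rows


def naive_average(rows):
    """Hypothetical reader: `values` is only bound when rows exist."""
    if rows:
        values = [float(value) for _index, value, _page in rows]
    return sum(values) / len(values)  # UnboundLocalError when rows is empty


def empty_payloads(payloads):
    """Return the test names of payloads that carry no data rows."""
    names = []
    for payload in payloads:
        header, rows = split_payload(payload)
        if not rows:
            names.append(header[1])
    return names
```

A check like `empty_payloads` could run on the client before sending, so that a missing counter is reported by name instead of surfacing as "Graph server unreachable".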
Depends on: 808547
Since emails aren't visible to sheriffs...

----- Original Message -----
> From: "Joel Maher" <jmaher@snip>
> To: "Taras Glek" <tglek@snip>
> Cc: "Ed Morley" <edmorley@snip>, "Jeffrey Hammel" <jhammel@snip>
> Sent: Friday, 16 November, 2012 7:02:53 PM
>
> today we turned off the forced error on missing data from xperf and
> now we get a different error if there is missing data:
> https://bugzilla.mozilla.org/show_bug.cgi?id=808547
>
> The cycle never ends, but we need somebody to look at this xperf data
> or we need to turn it off. I am still busy with mobile and panda
> boards. We could hack talos some more to work around this. But
> this comes back to the argument about what is the point of
> collecting this information if we miss a lot of it. We turned on
> the error for missing counters/data because we found an xperf
> regression that didn't report for a single changeset.
>
> Our resources are limited and running tests that nobody cares to fix
> or even track don't motivate folks to keep tests running.

----- Original Message -----
> From: "Taras Glek" <tglek@snip>
>
> I think you guys are the only people in a position to fix these. We
> need these tests. If you can rule out machine configuration issues,
> we can put a perf developer on it.

----- Original Message -----
> From: "Joel Maher" <jmaher@snip>
>
> I am pretty certain the machines have the xperf stuff installed as we
> are getting other counters.
>
> I suspect 1 of 2 problems:
> 1) xperf isn't collecting data
> 2) the toolchain that parses the xperf.etl file has a corner case and
> we are failing
> 3) if the only counters we are missing are custom firefox counters we
> need to ensure the counters are installed on the machines
>
> I spent a day looking at this a few weeks ago and I could not
> reproduce the problems on my windows 7 vm. All other options I can
> think of is to either:
> 1) get a win7 slave from the pool and try to reproduce
> 2) print a lot of debugging in the logs so we can try to find
> incorrect assumptions or data
> 3) query all machines that exhibit this failure pattern and see if
> there are common slave names.

----- Original Message -----
> From: "Taras Glek" <tglek@snip>
>
> That sounds like the way to go. I can ask Aaron Klotz to learn xperf
> stuff to debug it further once you've done above and still need help.
> He's currently working on some flash stuff that should be wrapping up
> in

----- Original Message -----
> From: "Ed Morley" <emorley@snip>
>
> > On 11/16/2012 11:02 AM, Joel Maher wrote:
> > and now we get a different error if there is missing data:
> > https://bugzilla.mozilla.org/show_bug.cgi?id=808547
>
> That bug is for the output format obscuring the cause of the failure
> in TBPL's annotated summary.
>
> Philor has since filed a bug for the failure itself:
> https://bugzilla.mozilla.org/show_bug.cgi?id=812729
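The last option in the email thread above, checking whether failing runs share slave names, could be approximated from the logged payloads alone. This is a rough sketch that assumes the first comma-separated field of each payload's header line is the slave name (as in `talos-r3-w7-084` in the log in this bug); `slave_name` and `failures_by_slave` are hypothetical helpers, not part of Talos or TBPL.

```python
from collections import Counter


def slave_name(payload):
    """Return the first header field of a START/VALUES/.../END payload."""
    lines = payload.strip().split("\n")
    # Line 0 is START, line 1 is VALUES, line 2 is the header (assumed layout).
    return lines[2].split(",")[0]


def failures_by_slave(failing_payloads):
    """Count how many failing payloads each slave produced."""
    return Counter(slave_name(p) for p in failing_payloads)
```

Calling `failures_by_slave(payloads).most_common()` on payloads gathered from failing logs would list the slaves that fail most often first; a flat distribution would point away from machine configuration.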
Summary: Intermittent xperf "talosError: Graph server unreachable (5 attempts)" while (not) sending missing counters → Intermittent xperf, tpn "talosError: Graph server unreachable (5 attempts)" while (not) sending missing counters
Depends on: 813239
Whiteboard: [orange]
xperf has been hidden on Windows on mozilla-central, mozilla-inbound & due to too many failures. It will be unhidden once fixed. Note: Due to being hidden, there will be no more tbplbot spam to this bug, but that is not an indication that it is fixed.
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #253)
> mozilla-central, mozilla-inbound &
& Try
Blocks: 822813
Depends on: 827348
Depends on: 834366
any objection to closing this bug? It has been 4 months and the toolchain for xperf was updated in late June (after the last comment).
wfm :-)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED