Closed Bug 812729 Opened 12 years ago Closed 11 years ago

Intermittent xperf, tpn "talosError: Graph server unreachable (5 attempts)" while (not) sending missing counters

Categories

(Testing :: Talos, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

Bug 812315 stopped us from failing out of xperf runs when a counter was missing, but instead gave us

https://tbpl.mozilla.org/php/getParsedLog.php?id=17127455&tree=Mozilla-Inbound

Exception in writing file '['START\nVALUES\ntalos-r3-w7-084,tp5n_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\n0,1680.00,thesartorialist.blogspot.com\n1,1029.00,cakewrecks.blogspot.com\n2,2702.00,baidu.com\n3,968.00,en.wikipedia.org\n4,581.00,twitter.com\n5,474.00,msn.com\n6,378.00,yahoo.co.jp\n7,1853.00,amazon.com\n8,326.00,linkedin.com\n9,218.00,bing.com\n10,1182.00,icanhascheezburger.com\n11,667.00,yandex.ru\n12,813.00,cgi.ebay.com\n13,1108.00,163.com\n14,732.00,mail.ru\n15,893.00,bbc.co.uk\n16,676.00,store.apple.com\n17,591.00,imdb.com\n18,4272.00,mozilla.com\n19,713.00,ask.com\n20,893.00,cnn.com\n21,1095.00,sohu.com\n22,473.00,vkontakte.ru\n23,952.00,youku.com\n24,474.00,myparentswereawesome.tumblr.com\n25,1006.00,ifeng.com\n26,555.00,ameblo.jp\n27,847.00,tudou.com\n28,321.00,chemistry.about.com\n29,550.00,beatonna.livejournal.com\n30,633.00,hao123.com\n31,985.00,rakuten.co.jp\n32,199.00,alibaba.com\n33,813.00,uol.com.br\n34,967.00,cnet.com\n35,482.00,ehow.com\n36,390.00,thepiratebay.org\n37,501.00,page.renren.com\n38,517.00,chinaz.com\n39,900.00,globo.com\n40,779.00,spiegel.de\n41,708.00,dailymotion.com\n42,350.00,goo.ne.jp\n43,435.00,alipay.com\n44,905.00,stackoverflow.com\n45,623.00,nicovideo.jp\n46,383.00,ezinearticles.com\n47,630.00,taringa.net\n48,1755.00,tmall.com\n49,971.00,huffingtonpost.com\n50,615.00,deviantart.com\n51,778.00,media.photobucket.com\n52,601.00,douban.com\n53,897.00,imgur.com\n54,374.00,reddit.com\n55,568.00,digg.com\n56,411.00,filestube.com\n57,1373.00,dailymail.co.uk\n58,267.00,whois.domaintools.com\n59,641.00,indiatimes.com\n60,604.00,rambler.ru\n61,276.00,torrentz.eu\n62,717.00,reuters.com\n63,646.00,foxnews.com\n64,1902.00,xinhuanet.com\n65,942.00,56.com\n66,1655.00,bild.de\n67,666.00,guardian.co.uk\n68,400.00,w3schools.com\n69,1428.00,naver.com\n70,374.00,blogfa.com\n71,668.00,terra.com.br\n72,367.00,ucoz.ru\n73,717.00,yelp.com\n74,1024.00,wsj.com\n75,593.00,noimpactman.typepad.com\n76,901.00,myspace.com\n7
7,206.00,google.com\n78,383.00,orange.fr\n79,235.00,php.net\n80,1205.00,zol.com.cn\n81,1036.00,mashable.com\n82,340.00,etsy.com\n83,405.00,gmx.net\n84,1137.00,csdn.net\n85,953.00,xunlei.com\n86,653.00,hatena.ne.jp\n87,490.00,icious.com\n88,865.00,repubblica.it\n89,567.00,web.de\n90,533.00,slideshare.net\n91,529.00,telegraph.co.uk\n92,918.00,seesaa.net\n93,541.00,wp.pl\n94,1086.00,aljazeera.net\n95,423.00,w3.org\n96,813.00,homeway.com.cn\n97,222.00,facebook.com\n98,625.00,youtube.com\n99,649.00,people.com.cn\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_startup_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_startup_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND']' from results_url: 
['START\nVALUES\ntalos-r3-w7-084,tp5n_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\n0,1680.00,thesartorialist.blogspot.com\n1,1029.00,cakewrecks.blogspot.com\n2,2702.00,baidu.com\n3,968.00,en.wikipedia.org\n4,581.00,twitter.com\n5,474.00,msn.com\n6,378.00,yahoo.co.jp\n7,1853.00,amazon.com\n8,326.00,linkedin.com\n9,218.00,bing.com\n10,1182.00,icanhascheezburger.com\n11,667.00,yandex.ru\n12,813.00,cgi.ebay.com\n13,1108.00,163.com\n14,732.00,mail.ru\n15,893.00,bbc.co.uk\n16,676.00,store.apple.com\n17,591.00,imdb.com\n18,4272.00,mozilla.com\n19,713.00,ask.com\n20,893.00,cnn.com\n21,1095.00,sohu.com\n22,473.00,vkontakte.ru\n23,952.00,youku.com\n24,474.00,myparentswereawesome.tumblr.com\n25,1006.00,ifeng.com\n26,555.00,ameblo.jp\n27,847.00,tudou.com\n28,321.00,chemistry.about.com\n29,550.00,beatonna.livejournal.com\n30,633.00,hao123.com\n31,985.00,rakuten.co.jp\n32,199.00,alibaba.com\n33,813.00,uol.com.br\n34,967.00,cnet.com\n35,482.00,ehow.com\n36,390.00,thepiratebay.org\n37,501.00,page.renren.com\n38,517.00,chinaz.com\n39,900.00,globo.com\n40,779.00,spiegel.de\n41,708.00,dailymotion.com\n42,350.00,goo.ne.jp\n43,435.00,alipay.com\n44,905.00,stackoverflow.com\n45,623.00,nicovideo.jp\n46,383.00,ezinearticles.com\n47,630.00,taringa.net\n48,1755.00,tmall.com\n49,971.00,huffingtonpost.com\n50,615.00,deviantart.com\n51,778.00,media.photobucket.com\n52,601.00,douban.com\n53,897.00,imgur.com\n54,374.00,reddit.com\n55,568.00,digg.com\n56,411.00,filestube.com\n57,1373.00,dailymail.co.uk\n58,267.00,whois.domaintools.com\n59,641.00,indiatimes.com\n60,604.00,rambler.ru\n61,276.00,torrentz.eu\n62,717.00,reuters.com\n63,646.00,foxnews.com\n64,1902.00,xinhuanet.com\n65,942.00,56.com\n66,1655.00,bild.de\n67,666.00,guardian.co.uk\n68,400.00,w3schools.com\n69,1428.00,naver.com\n70,374.00,blogfa.com\n71,668.00,terra.com.br\n72,367.00,ucoz.ru\n73,717.00,yelp.com\n74,1024.00,wsj.com\n75,593.00,noimpactman.typepad.com\n76,901.00,myspace.com\n77,206.00,google.com\n78,383
.00,orange.fr\n79,235.00,php.net\n80,1205.00,zol.com.cn\n81,1036.00,mashable.com\n82,340.00,etsy.com\n83,405.00,gmx.net\n84,1137.00,csdn.net\n85,953.00,xunlei.com\n86,653.00,hatena.ne.jp\n87,490.00,icious.com\n88,865.00,repubblica.it\n89,567.00,web.de\n90,533.00,slideshare.net\n91,529.00,telegraph.co.uk\n92,918.00,seesaa.net\n93,541.00,wp.pl\n94,1086.00,aljazeera.net\n95,423.00,w3.org\n96,813.00,homeway.com.cn\n97,222.00,facebook.com\n98,625.00,youtube.com\n99,649.00,people.com.cn\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_startup_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_startup_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_nonmain_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_normal_fileio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND', 'START\nVALUES\ntalos-r3-w7-084,tp5n_main_shutdown_netio_paint,Mozilla-Inbound-Non-PGO,5078cf4f60a3,20121116192202,1353128273\nEND']

FAIL: Graph server unreachable (5 attempts)
RETURN:send failed, graph server says:
RETURN:to determine average from 'test_run_values' for  20793713 - local variable 'values' referenced before assignment
RETURN:  File "/var/www/html/graphs/server/pyfomatic/collect.py", line 273, in handleRequest
RETURN:    average = valuesReader(databaseCursor, databaseModule, inputStream, metadata)
RETURN:  File "/var/www/html/graphs/server/pyfomatic/collect.py", line 219, in valuesReader
RETURN:    raise DatabaseException("to determine average from 'test_run_values' for  %s - %s" % (metadata.test_run_id, str(x)))
RETURN:
RETURN:
Traceback (most recent call last):
  File "run_tests.py", line 311, in ?
    main()
  File "run_tests.py", line 308, in main
    run_tests(parser)
  File "run_tests.py", line 284, in run_tests
    talos_results.output(results_urls, **results_options)
  File "C:\talos-slave\talos-data\talos\results.py", line 89, in output
    raise e
utils.talosError: 'Graph server unreachable (5 attempts)\nsend failed, graph server says:\nto determine average from \'test_run_values\' for  20793713 - local variable \'values\' referenced before assignment\n  File "/var/www/html/graphs/server/pyfomatic/collect.py", line 273, in handleRequest\n    average = valuesReader(databaseCursor, databaseModule, inputStream, metadata)\n  File "/var/www/html/graphs/server/pyfomatic/collect.py", line 219, in valuesReader\n    raise DatabaseException("to determine average from \'test_run_values\' for  %s - %s" % (metadata.test_run_id, str(x)))\n\n'
program finished with exit code 1
elapsedTime=333.778000
talosError: Graph server unreachable (5 attempts)

TinderboxPrint:send failed, graph server says:

TinderboxPrint:to determine average from 'test_run_values' for  20793713 - local variable 'values' referenced before assignment

TinderboxPrint:  File "/var/www/html/graphs/server/pyfomatic/collect.py", line 273, in handleRequest

TinderboxPrint:    average = valuesReader(databaseCursor, databaseModule, inputStream, metadata)

TinderboxPrint:  File "/var/www/html/graphs/server/pyfomatic/collect.py", line 219, in valuesReader

TinderboxPrint:    raise DatabaseException("to determine average from 'test_run_values' for  %s - %s" % (metadata.test_run_id, str(x)))

which isn't a whole lot nicer.
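
For anyone triaging these logs, here is a rough sketch of how the payload above could be checked for counters that arrive with no value rows, which is the case that appears to trip the graph server. The field names in `parse_block`, and the helpers `empty_counters` and `naive_average`, are hypothetical. `naive_average` is not the real `collect.py` code; it only illustrates one way a "referenced before assignment" error can occur when the input is empty.

```python
def parse_block(block):
    """Parse one 'START/VALUES/<header>/<rows>/END' block from the log."""
    lines = block.strip().split("\n")
    if lines[0] != "START" or lines[1] != "VALUES" or lines[-1] != "END":
        raise ValueError("unexpected block framing")
    # Header field names are assumptions; the log only shows the raw values.
    machine, test, branch, revision, build_id, timestamp = lines[2].split(",")
    rows = []
    for line in lines[3:-1]:
        index, value, page = line.split(",", 2)
        rows.append((int(index), float(value), page))
    return {"machine": machine, "test": test, "rows": rows}


def empty_counters(blocks):
    """Return the test names whose blocks carry no value rows at all."""
    parsed = [parse_block(b) for b in blocks]
    return [p["test"] for p in parsed if not p["rows"]]


def naive_average(rows):
    """Illustration only: 'values' is bound inside the loop, so an empty
    'rows' leaves it unbound and the return line raises UnboundLocalError."""
    for index, value, page in rows:
        if index == 0:
            values = []
        values.append(value)
    return sum(values) / len(values)
```

Running `empty_counters` over the list in the log above should flag the seven `tp5n_*_fileio_paint` / `tp5n_*_netio_paint` entries, since only `tp5n_paint` carries rows.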
Depends on: 808547
Since emails aren't visible to sheriffs...

----- Original Message -----
> From: "Joel Maher" <jmaher@snip>
> To: "Taras Glek" <tglek@snip>
> Cc: "Ed Morley" <edmorley@snip>, "Jeffrey Hammel" <jhammel@snip>
> Sent: Friday, 16 November, 2012 7:02:53 PM
> 
> today we turned off the forced error on missing data from xperf and
> now we get a different error if there is missing data:
> https://bugzilla.mozilla.org/show_bug.cgi?id=808547
> 
> The cycle never ends, but we need somebody to look at this xperf data
> or we need to turn it off.  I am still busy with mobile and panda
> boards.  We could hack talos some more to work around this.  But
> this comes back to the argument about what is the point of
> collecting this information if we miss a lot of it.  We turned on
> the error for missing counters/data because we found an xperf
> regression that didn't report for a single changeset.
> 
> Our resources are limited and running tests that nobody cares to fix
> or even track doesn't motivate folks to keep tests running.


----- Original Message -----
> From: "Taras Glek" <tglek@snip>
>
> I think you guys are the only people in a position to fix these. We
> need these tests. If you can rule out machine configuration issues, we
> can put a perf developer on it.


----- Original Message -----
> From: "Joel Maher" <jmaher@snip>
> 
> I am pretty certain the machines have the xperf stuff installed as we
> are getting other counters.
> 
> I suspect 1 of 3 problems:
> 1) xperf isn't collecting data
> 2) the toolchain that parses the xperf.etl file has a corner case and
> we are failing
> 3) if the only counters we are missing are custom firefox counters we
> need to ensure the counters are installed on the machines
> 
> I spent a day looking at this a few weeks ago and I could not
> reproduce the problems on my Windows 7 VM.  The only other options I
> can think of are to:
> 1) get a win7 slave from the pool and try to reproduce
> 2) print a lot of debugging in the logs so we can try to find
> incorrect assumptions or data
> 3) query all machines that exhibit this failure pattern and see if
> there are common slave names.
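
Option 3 above could be approximated with a small script over the payload blocks pulled from failing logs. This is a sketch under assumptions: it presumes the slave name is the first comma-separated field of each block's header line (as in the log in this bug), and `tally_failing_slaves` is a hypothetical helper, not part of Talos.

```python
from collections import Counter


def machine_of(block):
    """Slave name: first field of the header line that follows START and VALUES."""
    return block.strip().split("\n")[2].split(",")[0]


def has_no_rows(block):
    """True when the block is only START, VALUES, header and END."""
    return len(block.strip().split("\n")) == 4


def tally_failing_slaves(blocks):
    """Count, per slave, how many blocks were sent with no value rows."""
    return Counter(machine_of(b) for b in blocks if has_no_rows(b))
```

Calling `tally_failing_slaves(blocks).most_common()` on blocks gathered across many failing runs would show whether a handful of slaves account for most of the missing counters.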


----- Original Message -----
> From: "Taras Glek" <tglek@snip>
>
> That sounds like the way to go. I can ask Aaron Klotz to learn xperf
> stuff to debug it further once you've done above and still need help.
> He's currently working on some flash stuff that should be wrapping up
> in


----- Original Message -----
> From: "Ed Morley" <emorley@snip>
> 
> > On 11/16/2012 11:02 AM, Joel Maher wrote:
> > and now we get a different error if there is missing data:
> > https://bugzilla.mozilla.org/show_bug.cgi?id=808547
> 
> That bug is for the output format obscuring the cause of the failure
> in TBPL's annotated summary.
> 
> Philor has since filed a bug for the failure itself:
> https://bugzilla.mozilla.org/show_bug.cgi?id=812729
Summary: Intermittent xperf "talosError: Graph server unreachable (5 attempts)" while (not) sending missing counters → Intermittent xperf, tpn "talosError: Graph server unreachable (5 attempts)" while (not) sending missing counters
Depends on: 813239
Whiteboard: [orange]
xperf has been hidden on Windows on mozilla-central, mozilla-inbound & due to too many failures. It will be unhidden once fixed.

Note: Due to being hidden, there will be no more tbplbot spam to this bug, but that is not an indication that it is fixed.
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #253)
> mozilla-central, mozilla-inbound & 

& Try
Blocks: 822813
Depends on: 827348
Depends on: 834366
Any objection to closing this bug?  It has been 4 months and the toolchain for xperf was updated in late June (after the last comment).
wfm :-)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED