Closed Bug 458093 Opened 12 years ago Closed 11 years ago

Update Talos to send graph server new test information

Categories

(Webtools Graveyard :: Graph Server, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rdoherty, Assigned: anodelman)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 2 obsolete files)

Since we're storing new data for 1.0, we need to make sure Talos sends everything the graph server needs to store the test results. We'll have to write up some basic info on the API.
Blocks: 456568
Assignee: nobody → anodelman
Depends on: 456529
Reassigning to Coop while Alice is out.
Assignee: anodelman → ccooper
Assignee: ccooper → nobody
Assignee: nobody → anodelman
Duplicate of this bug: 452421
Working on getting this lined up with the new db schema collector script format.
Priority: -- → P2
Duplicate of this bug: 474644
Still setting this up on stage. The basic properties will be:

- ability to send data in the new graph server send data format
- ability to send data to both the new graph server and the old (i.e., complete a test and send full performance results to both graph servers); this should simplify some of the problems surrounding data migration
This is currently up and running on Talos stage. It allows for sending data to both graph servers after the completion of a test run. It does require changes to master.cfg/perfrunner.py to handle extra required variables (patches to come). I did not attempt to fix the older graph server send data code (which is convoluted and weird), but have simply wrapped it up in procedures called 'old_'. The hope is that all of that code can be removed after full acceptance of the new graph server.

There was a little feature creep so this also includes:
- addition of a small widget that tracks how long the talos test cycle took to complete, reported at the top of the waterfall display
- fix to how metrics are collected on mac/linux (no longer abandons collection of a specific counter on a single failure, but tries again post-failure)

There's some question whether some of this code should be pushed into post_file to better isolate the send data code from the rest of the talos code. I'd be willing to do some reshuffling if that would make more sense.
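
For readers following along, the double-send idea described above could be sketched roughly as follows. This is an illustration only, not the code in the attached patch: `send_to_all` and the sender callables are hypothetical names, and the choice to keep going when one server fails is an assumption rather than something stated in this bug.

```python
# Hypothetical sketch of a double send; names are illustrative and do not
# come from the Talos patch attached to this bug.

def send_to_all(results, senders):
    """Send `results` through every sender in `senders`.

    `senders` maps a label to a callable that posts the results somewhere.
    A failure in one sender is recorded and does not stop the others
    (an assumed policy, not one confirmed by the bug).
    Returns a dict of label -> None on success, or the exception raised.
    """
    outcome = {}
    for label, send in senders.items():
        try:
            send(results)
            outcome[label] = None
        except Exception as exc:
            # Record the failure and continue with the next server.
            outcome[label] = exc
    return outcome


if __name__ == "__main__":
    sent = []

    def old_send(results):
        # Stand-in for the wrapped old-format send code.
        sent.append(("old", results))

    def new_send(results):
        # Stand-in for the new-format send code.
        sent.append(("new", results))

    print(send_to_all({"ts": [1, 2]}, {"old": old_send, "new": new_send}))
```

Keeping each sender behind a plain callable is one way to get the isolation mentioned above: the old path can later be dropped by removing a single entry from the mapping.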
Attachment #360183 - Flags: review?(bhearsum)
Comment on attachment 360183 [details] [diff] [review]
[Checked in]send talos results in both old/new graph server formats


>Index: cmanager_mac.py
>===================================================================
>RCS file: /cvsroot/mozilla/testing/performance/talos/cmanager_mac.py,v
>retrieving revision 1.4
>diff -u -8 -p -r1.4 cmanager_mac.py
>--- cmanager_mac.py	3 Dec 2008 00:53:20 -0000	1.4
>+++ cmanager_mac.py	30 Jan 2009 23:05:15 -0000
>@@ -192,11 +192,12 @@ class CounterManager(threading.Thread):
>         # counter[0] is a function that gets the current value for
>         # a counter
>         # counter[1] is a list of recorded values
>         try:
>           self.registeredCounters[counter][1].append(
>             self.registeredCounters[counter][0](self.pid))
>         except:
>           # if a counter throws an exception, remove it
>-          self.unregisterCounters([counter])
>+          #self.unregisterCounters([counter]) #don't remove, let it try and resolve on next cycle
>+          print "Error in collecting counter: " + counter
> 

Do you think it makes sense to disable the counter after a certain number of failures in a row? You know the requirements here better than I do, so I'll leave it up to you!

r=bhearsum if you don't think that's necessary.
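
A rough sketch of what "disable after a certain number of failures in a row" might look like is below. It is a standalone illustration, not a change to cmanager_mac.py: the `CounterSampler` class, its method names, and the default threshold of 3 are all assumptions made for the example.

```python
class CounterSampler:
    """Hypothetical helper; not part of cmanager_mac.py."""

    def __init__(self, max_consecutive_failures=3):
        # The threshold value is an arbitrary choice for this sketch.
        self.max_failures = max_consecutive_failures
        self.counters = {}   # name -> [getter, list of recorded values]
        self.failures = {}   # name -> current run of consecutive failures

    def register(self, name, getter):
        self.counters[name] = [getter, []]
        self.failures[name] = 0

    def sample(self, pid):
        """Collect one value per counter, dropping counters that keep failing."""
        # Iterate over a copy of the keys because entries may be deleted.
        for name in list(self.counters):
            getter, values = self.counters[name]
            try:
                values.append(getter(pid))
                self.failures[name] = 0  # a success resets the streak
            except Exception:
                self.failures[name] += 1
                print("Error in collecting counter: " + name)
                if self.failures[name] >= self.max_failures:
                    # Give up on this counter after too many failures in a row.
                    del self.counters[name]
```

A sporadic error leaves the counter registered, as in the patch, while a counter that fails on every cycle is eventually removed instead of logging forever.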
Attachment #360183 - Flags: review?(bhearsum) → review+
This should get us to a stable state so that what is checked in for stage talos reflects what is running on stage talos.
Attachment #360987 - Flags: review?(catlee)
bhearsum - I can see how it would be bad to be throwing rolling errors. I am seeing the error get thrown sporadically in the logs, but it hasn't run away yet. I'm pretty comfortable going with the code that we have now in terms of risk, but this could bear some more investigation.
Attachment #360987 - Flags: review?(catlee) → review+
Attachment #360987 - Attachment description: update talos stage to use double-send → [Checked in]update talos stage to use double-send
Comment on attachment 360987 [details] [diff] [review]
[Checked in]update talos stage to use double-send 

936:6369b9a4abfc
Doesn't actually change how data is currently being sent, but would allow for landing the pending talos patches.  Mostly just some extra variable passing.

Once the new production graph server is available we'd be able to make a one line change to have data be sent to it.
Attachment #361650 - Flags: review?(catlee)
Attachment #361650 - Flags: review?(catlee) → review+
Forgot the necessary changes to try in my last patch.
Attachment #361650 - Attachment is obsolete: true
Attachment #362316 - Flags: review?(catlee)
Attachment #362316 - Flags: review?(catlee) → review+
Comment on attachment 362316 [details] [diff] [review]
[Checked in]changes to production talos/talos try buildbot-configs 

changeset:   954:4fc64e763bac
Attachment #362316 - Attachment description: changes to production talos/talos try buildbot-configs → [Checked in]changes to production talos/talos try buildbot-configs
Comment on attachment 360183 [details] [diff] [review]
[Checked in]send talos results in both old/new graph server formats

Checking in PerfConfigurator.py;
/cvsroot/mozilla/testing/performance/talos/PerfConfigurator.py,v  <--  PerfConfigurator.py
new revision: 1.8; previous revision: 1.7
done
Checking in cmanager_linux.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_linux.py,v  <--  cmanager_linux.py
new revision: 1.4; previous revision: 1.3
done
Checking in cmanager_mac.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_mac.py,v  <--  cmanager_mac.py
new revision: 1.5; previous revision: 1.4
done
Checking in post_file.py;
/cvsroot/mozilla/testing/performance/talos/post_file.py,v  <--  post_file.py
new revision: 1.6; previous revision: 1.5
done
Checking in run_tests.py;
/cvsroot/mozilla/testing/performance/talos/run_tests.py,v  <--  run_tests.py
new revision: 1.36; previous revision: 1.35
done
Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v  <--  sample.config
new revision: 1.24; previous revision: 1.23
done
Checking in utils.py;
/cvsroot/mozilla/testing/performance/talos/utils.py,v  <--  utils.py
new revision: 1.8; previous revision: 1.7
done
Attachment #360183 - Attachment description: send talos results in both old/new graph server formats → [Checked in]send talos results in both old/new graph server formats
After a talos test cycle, send the results to both graphs.mozilla.org and graphs-new.mozilla.org.

Will affect the waterfall display and possibly make it more confusing to find Talos results.
Attachment #364170 - Flags: review?(catlee)
Attachment #364170 - Flags: review?(catlee) → review+
Comment on attachment 364170 [details] [diff] [review]
[Checked in]turn on double send to send results to both old/new graph servers

967:6663ebe90c11
Attachment #364170 - Attachment description: turn on double send to send results to both old/new graph servers → [Checked in]turn on double send to send results to both old/new graph servers
For fast cycle machines, only the tp test should have the extension '_fast'. Right now it creates nonsense tests like 'tdhtml_fast', 'tsspider_fast', etc.
Attachment #365046 - Flags: review?
Attachment #365046 - Flags: review? → review?(aki)
Comment on attachment 365046 [details] [diff] [review]
fix up how fast cycle Talos machines label their test results

looks good to me.  r=aki
Attachment #365046 - Flags: review?(aki) → review+
The old graph server does not support the new test name extensions. Since I plan on removing this code once the new graph server becomes the only graph server, I've just special-cased it to remove the extension as necessary.
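
The labelling rule and the special case for the old server might be sketched as below. The function names are hypothetical and the actual patch may structure this differently; only the '_fast' extension and the "tp only" rule come from the comments in this bug.

```python
FAST_SUFFIX = "_fast"  # the extension named in the comments above


def label_test(test_name, fast_cycle):
    """Append the fast-cycle extension only to the tp test.

    Hypothetical helper: other tests (tdhtml, tsspider, ...) keep their
    plain names even on fast cycle machines.
    """
    if fast_cycle and test_name == "tp":
        return test_name + FAST_SUFFIX
    return test_name


def name_for_old_server(test_name):
    """Strip the extension for a server that does not support it."""
    if test_name.endswith(FAST_SUFFIX):
        return test_name[: -len(FAST_SUFFIX)]
    return test_name
```

For example, `label_test("tdhtml", True)` leaves the name unchanged, and `name_for_old_server(label_test("tp", True))` gives back the plain `tp` name for the old-format send.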
Attachment #365046 - Attachment is obsolete: true
Attachment #365053 - Flags: review?(aki)
Attachment #365053 - Flags: review?(aki) → review+
Comment on attachment 365053 [details] [diff] [review]
[Checked in]fix up how fast cycle Talos machines label their test results (take 2)

Checking in PerfConfigurator.py;
/cvsroot/mozilla/testing/performance/talos/PerfConfigurator.py,v  <--  PerfConfigurator.py
new revision: 1.11; previous revision: 1.10
done
Checking in run_tests.py;
/cvsroot/mozilla/testing/performance/talos/run_tests.py,v  <--  run_tests.py
new revision: 1.38; previous revision: 1.37
done
Attachment #365053 - Attachment description: fix up how fast cycle Talos machines label their test results (take 2) → [Checked in]fix up how fast cycle Talos machines label their test results (take 2)
Should be matched with running this on the production db:

update tests set name = "tp_fast_pbytes" where name = "tp_pbytes_fast";
update tests set name = "tp_fast_rss" where name = "tp_rss_fast";
update tests set name = "tp_fast_%cpu" where name = "tp_%cpu_fast";
update tests set name = "tp_fast_memset" where name = "tp_memset_fast";
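
If the renames were scripted rather than typed by hand, a sketch along these lines could apply the same four updates in one transaction. It uses Python's `sqlite3` with an in-memory database purely as a stand-in (the production database engine and connection details are not given here), and `rename_tests` and `demo` are hypothetical names. Parameter binding also avoids any quoting questions around the `%` in `tp_%cpu_fast`.

```python
import sqlite3

# Old name -> new name, copied from the update statements above.
RENAMES = {
    "tp_pbytes_fast": "tp_fast_pbytes",
    "tp_rss_fast": "tp_fast_rss",
    "tp_%cpu_fast": "tp_fast_%cpu",
    "tp_memset_fast": "tp_fast_memset",
}


def rename_tests(conn, renames):
    """Apply old -> new renames in one transaction; return rows changed."""
    changed = 0
    with conn:  # commits on success, rolls back if an exception is raised
        for old, new in renames.items():
            cur = conn.execute(
                "UPDATE tests SET name = ? WHERE name = ?", (new, old)
            )
            changed += cur.rowcount
    return changed


def demo():
    """Build a throwaway table shaped like `tests` and run the renames."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tests (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO tests (name) VALUES (?)",
        [(name,) for name in RENAMES] + [("ts",)],
    )
    rename_tests(conn, RENAMES)
    return sorted(row[0] for row in conn.execute("SELECT name FROM tests"))
```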
Attachment #365067 - Flags: review?(rdoherty)
Attachment #365067 - Flags: review?(rdoherty) → review+
Comment on attachment 365067 [details] [diff] [review]
[Checked in]update fast cycle test names to reflect actual

changeset:   202:1f8de99492df
Attachment #365067 - Attachment description: update fast cycle test names to reflect actual → [Checked in]update fast cycle test names to reflect actual
We're doing successful double send across all branches and all talos boxes, so I think that we can call this done. Other graph server and talos interaction work can go into new bugs.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Webtools → Webtools Graveyard