899197 - Add script to report endurance test results to AWSY

Reporter

Description

•

12 years ago

We need to write a script (or find an existing one) that can take results from rwood's endurance tests and report them to AWSY.

Jonathan Griffin (:jgriffin)

Reporter

Comment 1

•

12 years ago

Looks like some existing scripts are at https://github.com/Nephyrin/MozAreWeSlimYet (e.g., https://github.com/Nephyrin/MozAreWeSlimYet/blob/master/create_graph_json.py); we may be able to re-use some of this.

John Schoenick [:johns]

Comment 2

•

12 years ago

As long as this data can be sanely formatted+compressed+rsync/scp'd to AWSY, I'm happy to write a script to import it to the database.

Each dump sent should have:
- Build changeset id
- test name (something to identify what test was run, e.g. B2G-Endurance-Foo)
- test time
- The memory report(s)
- [optional] Whether to replace existing test data for this testname+changeset ID (e.g. the test was re-triggered), or just store multiple runs of the test (which AWSY doesn't visualize yet, but will in the future)

The memory reports are taken at a series of checkpoints during the run (e.g. TabsOpen, TabsClosed, BeforeUnload) potentially run over multiple iterations. Each datapoint should have units (bytes, percent, unitless). Which of these datapoints are included on the graphs is configured after the fact.

AWSY currently stores full about:memory dumps of every checkpoint as its data, but just treats the reporter names as hierarchal key/values, so other data could be included. The pseudo-table of this data looks like:

> Builds
> Build ID | Build name
> ----------------------------------------------------
> 1001     | e70411983e393cc845d44181151dfdd028a2f0bb
>
> Tests
> Test ID | For Build | Test Name     | Test Time
> ------------------------------------------------
> 123     | 1001      | Endurance-Foo | 1375216821
> 204     | 1001      | AWSY-Desktop  | 1375216009
>
> Data
> Test ID | Units   | reporter name                     | value   | Checkpoint:Iteration
> ---------------------------------------------------------------------------------------
> 123     | bytes   | explicit/foo/tab1                 | 1118208 | TabsOpenForceGC:1
> 123     | percent | explicit/js/something             | 98      | StartSettled:1
> 123     | bytes   | pss/shared-libraries/something.so | 720896  | TabsClosedSettled:1
> 204     | bytes   | explicit/foo/tab1                 | 1448204 | TabsOpenForceGC:1
> ...
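The pseudo-table above could be sketched as three SQLite tables. This is only an illustration: the table names, column names, and the split of "Checkpoint:Iteration" into two columns are assumptions, not AWSY's actual database layout. The rows are the sample rows from the pseudo-table.

```python
import sqlite3

# Hypothetical schema mirroring the pseudo-table; names are illustrative only.
SCHEMA = """
CREATE TABLE builds (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE              -- build changeset ID
);
CREATE TABLE tests (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    name     TEXT NOT NULL,
    time     INTEGER NOT NULL              -- Unix timestamp of the test run
);
CREATE TABLE data (
    test_id    INTEGER NOT NULL REFERENCES tests(id),
    units      TEXT NOT NULL CHECK (units IN ('bytes', 'percent', 'unitless')),
    reporter   TEXT NOT NULL,              -- hierarchical reporter name
    value      INTEGER NOT NULL,
    checkpoint TEXT NOT NULL,
    iteration  INTEGER NOT NULL
);
"""


def load_sample(conn):
    """Create the tables and insert the sample rows from the pseudo-table."""
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO builds VALUES (?, ?)",
        (1001, "e70411983e393cc845d44181151dfdd028a2f0bb"),
    )
    conn.executemany(
        "INSERT INTO tests VALUES (?, ?, ?, ?)",
        [
            (123, 1001, "Endurance-Foo", 1375216821),
            (204, 1001, "AWSY-Desktop", 1375216009),
        ],
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)",
        [
            (123, "bytes", "explicit/foo/tab1", 1118208, "TabsOpenForceGC", 1),
            (123, "percent", "explicit/js/something", 98, "StartSettled", 1),
            (123, "bytes", "pss/shared-libraries/something.so", 720896,
             "TabsClosedSettled", 1),
            (204, "bytes", "explicit/foo/tab1", 1448204, "TabsOpenForceGC", 1),
        ],
    )


conn = sqlite3.connect(":memory:")
load_sample(conn)

# Compare one reporter at one checkpoint across the tests run on a build.
rows = conn.execute(
    "SELECT t.name, d.value FROM data d JOIN tests t ON t.id = d.test_id "
    "WHERE d.reporter = ? AND d.checkpoint = ? ORDER BY t.id",
    ("explicit/foo/tab1", "TabsOpenForceGC"),
).fetchall()
```

Keeping the reporter name as a plain text key matches the description of reporters as key/values, so deciding later which datapoints to graph would only be a query change.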

Jonathan Griffin (:jgriffin)

Reporter

Comment 3

•

12 years ago

Thanks John. I'm attaching a b2g memory report; is this what you would need, or would you like it pre-processed somehow?

Jonathan Griffin (:jgriffin)

Reporter

Comment 4

•

12 years ago

Attached file about-memory-new.zip — Details

John Schoenick [:johns]

Comment 5

•

12 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #3)
> Thanks John. I'm attaching a b2g memory report; is this what you would
> need, or would you like it pre-processed somehow?

If the test is going to be doing only one memory-snapshot this would be fine, but some kind of info about what build the test was run against and when the test was run would need to be included (you could just add a simple test-info.txt to the zip).

If the test wants to provide multiple about:memory snapshots per test run, or multiple iterations, they would need to be merged into one archive and labeled somehow (folders in the archive labeled CheckpointOne CheckpointTwo etc would work fine).

As long as everything for one test run is uploaded in one bundle and has the necessary metadata, I can handle mangling it in whatever way in the import script.
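A bundle of the shape described above (one archive per test run, a test-info.txt, and one folder per checkpoint) might be assembled roughly as follows. This is a sketch under assumptions: the key=value layout of test-info.txt, the function name `build_bundle`, and the report file name `memory-report.json` are all hypothetical, since the thread does not fix any of them.

```python
import io
import zipfile


def build_bundle(changeset, test_name, test_time, checkpoints):
    """Return zip bytes holding test-info.txt plus one folder per checkpoint.

    `checkpoints` maps a checkpoint label to {file name: report bytes}.
    The key=value layout of test-info.txt is an assumption.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        info = (
            f"changeset={changeset}\n"
            f"test_name={test_name}\n"
            f"test_time={test_time}\n"
        )
        zf.writestr("test-info.txt", info)
        # One folder per checkpoint keeps the snapshots labeled inside the archive.
        for label, reports in checkpoints.items():
            for file_name, payload in reports.items():
                zf.writestr(f"{label}/{file_name}", payload)
    return buf.getvalue()


bundle = build_bundle(
    "e70411983e393cc845d44181151dfdd028a2f0bb",
    "B2G-Endurance-Foo",
    1375216821,
    {
        "CheckpointOne": {"memory-report.json": b"{}"},  # placeholder payloads
        "CheckpointTwo": {"memory-report.json": b"{}"},
    },
)
names = zipfile.ZipFile(io.BytesIO(bundle)).namelist()
```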

Jonathan Griffin (:jgriffin)

Reporter

Comment 6

•

12 years ago

Great, we'll work on generating such a package with all the necessary metadata and checkpoints.

Jonathan Griffin (:jgriffin)

Reporter

Comment 7

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/9d9856cf1648

Jonathan Griffin (:jgriffin)

Reporter

Comment 8

•

12 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/9d9856cf1648

Sorry, that was completely the wrong bug!

Robert Wood [:rwood]

Comment 9

•

11 years ago

Attached file awsy-2014-02-12_20-15-39-.zip — Details

Hi John,

Please find attached a sample output package from the AWSY automation driver. I was thinking it would be good to plot the memory info for each about-memory folder included in the package, as follows:

start-idle
after-1-cycle
after-10-cycles
after-20-cycles
after-30-cycles
after-40-cycles
after-50-cycles
after-60-cycles

Note that a single 'cycle' refers to one execution of the awsy test on the b2g emulator. The entire test of 60 cycles can be considered as one single AWSY 'iteration'; one 'iteration' will be triggered each time a new tbpl emulator build is made available. One iteration (the entire 60 cycles) takes approximately 6 hours in duration.

You probably guessed, the date and time of the test run is just the name of the .zip. I've included the sources.xml inside, for the emulator tbpl build info.

Please have a look when you have a chance and let me know if this output package would be sufficient to have the results posted on the areweslimyet.com dashboard. Thanks!
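The folder names listed above follow a simple rule: a dump at idle, one after the first cycle, and one after every tenth cycle. A minimal sketch of that rule, where the function name `checkpoint_label` and the use of cycle 0 for the idle dump are assumptions made for illustration:

```python
TOTAL_CYCLES = 60  # one 'iteration' in the sense used above


def checkpoint_label(cycle):
    """Return the about-memory folder name for `cycle`, or None if no dump is taken."""
    if cycle == 0:
        return "start-idle"          # assumed: cycle 0 stands for the idle state
    if cycle == 1:
        return "after-1-cycle"
    if cycle % 10 == 0:
        return f"after-{cycle}-cycles"
    return None


labels = [
    checkpoint_label(c)
    for c in range(TOTAL_CYCLES + 1)
    if checkpoint_label(c) is not None
]
```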

Flags: needinfo?(jschoenick)

Robert Wood [:rwood]

Comment 10

•

11 years ago

:johns ping, thanks

John Schoenick [:johns]

Comment 11

•

11 years ago

(In reply to Robert Wood [:rwood] from comment #9)
> Created attachment 8375596 [details]
> awsy-2014-02-12_20-15-39-.zip
>
> Hi John,
>
> Please find attached a sample output package from the AWSY automation
> driver. I was thinking it would be good to plot the memory info for each
> about-memory folder included in the package, as follows:
>
> start-idle
> after-1-cycle
> after-10-cycles
> after-20-cycles
> after-30-cycles
> after-40-cycles
> after-50-cycles
> after-60-cycles
>
> Note that a single 'cycle' refers to one execution of the awsy test on the
> b2g emulator. The entire test of 60 cycles can be considered as one single
> AWSY 'iteration'; one 'iteration' will be triggered each time a new tbpl
> emulator build is made available. One iteration (the entire 60 cycles) takes
> approximately 6 hours in duration.

Assuming the emulator is not being rebooted between these tests, "cycles" is what current awsy is referring to as "iterations": the test runs N times in the same app session, useful for detecting leaks. If the emulator is being rebooted then these would just be separate test runs against the same build ID, but either can be stored in AWSY.

> You probably guessed, the date and time of the test run is just the name of
> the .zip. I've included the sources.xml inside, for the emulator tbpl build
> info.

The harness looks that up on pushlog for a timestamp for the build, so the other metadata we need is changeset ID (present in sources.xml) and the time the test was run, which looks like it's in the file name.

> Please have a look when you have a chance and let me know if this output
> package would be sufficient to have the results posted on the
> areweslimyet.com dashboard. Thanks!

I believe it would. I need to write a script to process & insert these in the DB, after which the datapoints could be added to the graph config.

We may have issues with once again doubling the data AWSY is processing per build, since it is still on a sqlite backend. That made sense when we ran one test per m-c build that returned a dozen memory reporters. I think we're 4-5 orders of magnitude above that though, and just inserting results takes minutes :-/

When you have the final data format ready let me know and I can file a bug to write an importer for AWSY. The other step on your end would then be having your harness scp or rsync these to the awsy machine (arcus.mv.mozilla.com) as they're produced so the cron job can pick them up.

Flags: needinfo?(jschoenick)

John Schoenick [:johns]

Comment 12

•

11 years ago

(In reply to Robert Wood [:rwood] from comment #9)
> Created attachment 8375596 [details]
> awsy-2014-02-12_20-15-39-.zip
>
> Hi John,
>
> Please find attached a sample output package from the AWSY automation
> driver. I was thinking it would be good to plot the memory info for each
> about-memory folder included in the package, as follows:
>
> start-idle
> after-1-cycle
> after-10-cycles
> after-20-cycles
> after-30-cycles
> after-40-cycles
> after-50-cycles
> after-60-cycles

Also, what exactly is the test doing to produce this? On another bug there was discussion of launching all apps on the device - this might be good for detecting leaks, but is not going to give a line that is meaningful over time -- how much memory every app uses on its homescreen combined depends as much on the app behavior over time as the lower layers.

We could perhaps graph something like after-60-cycles minutes after-1-cycle (which should be a fairly flat line) to keep an eye on regressions

John Schoenick [:johns]

Comment 13

•

11 years ago

Filed bug 977354 for fixing AWSY's database situation. It doesn't block this, but if we suddenly can't keep up on test data on arcus when the imports are turned on, that would change.

Robert Wood [:rwood]

Comment 14

•

11 years ago

:johns

Thanks for the feedback! I read about the current AWSY tests and misinterpreted re: iterations. Anyway, for the AWSY emulator tests, no, the emulator is not being rebooted between cycles/iterations.

Yes, this will be the final data format (except for a minor filename change: I'm going to remove the trailing dash from the .zip filename).

To produce the output, one cycle/iteration consists of: launch a gaia app, wait 10 seconds, press the home button to minimize the app into the background, then repeat for the rest of the main gaia apps. After each cycle there is a 30 second sleep. After every 10th cycle there is an extended sleep of 180 seconds. About-memory dumps are grabbed at the iterations noted in the output package.

I was thinking we could graph at the start-idle, after-1-cycle, and then at each of the checkpoints (i.e. every 10 cycles) but whatever you think is best. Sorry, I'm not really clear on what you mean by "after-60-cycles minutes after-1-cycle". Do you mean graph the after-60-cycles data and also add a new checkpoint for x minutes after-1-cycle? Thanks.

Flags: needinfo?(jschoenick)

John Schoenick [:johns]

Comment 15

•

11 years ago

(In reply to Robert Wood [:rwood] from comment #14)
> :johns
>
> Thanks for the feedback! I read about the current AWSY tests and
> misinterpreted re: iterations. Anyway, for the AWSY emulator tests, no the
> emulator is not being rebooted between cycles/iterations.
>
> Yes this will be the final data format (except a minor filename change, I'm
> going to remove the trailing dash from the .zip filename).
>
> Yes to produce the output, one cycle/iteration consists of: Launch a gaia
> app, wait 10 seconds, press the home button to minimize the app into the
> background, then repeat for the rest of the main gaia apps. After each cycle
> there is a 30 second sleep. After every 10th cycle there is an extended
> sleep of 180 seconds. About memory dumps are grabbed at the iterations noted
> in the output package.
>
> I was thinking we could graph at the start-idle, after-1-cycle, and then at
> each of the checkpoints (i.e. every 10 cycles) but whatever you think is
> best. Sorry I'm not really clear on what you mean by "after-60-cycles
> minutes after-1-cycle". Do you mean graph the after-60-cycles data and also
> add a new checkpoint for x minutes after-1-cycle? Thanks.

Sorry, that was supposed to be "after-60-cycles *minus* after-1-cycle", that is, graph the difference in usage after 60 vs 1 cycle -- but I'm not sure if this helps us much either.

My concern is that how much memory one cycle takes doesn't really mean anything when graphed: Adding new apps or changing app behavior will make it go up or down, regardless of how efficient gecko is being underneath it. On memory constrained devices it's probably always going to be taking all available memory after one cycle, so the line would be more or less flat at 512Megs.
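The "after-60-cycles minus after-1-cycle" idea amounts to a per-reporter subtraction between two checkpoints. A minimal sketch, assuming each checkpoint has already been flattened into a mapping of reporter name to value; the function name `checkpoint_delta`, the reporter names, and the numbers are made up for illustration:

```python
def checkpoint_delta(first, last):
    """Return last - first for every reporter present in both snapshots."""
    return {name: last[name] - first[name] for name in first.keys() & last.keys()}


# Hypothetical flattened snapshots (reporter name -> bytes); values are invented.
after_1_cycle = {"explicit/foo": 1000, "explicit/bar": 500}
after_60_cycles = {"explicit/foo": 1300, "explicit/bar": 500, "explicit/new": 10}

delta = checkpoint_delta(after_1_cycle, after_60_cycles)
```

Reporters that appear in only one snapshot are skipped here; whether to treat them as zero instead is a design choice the thread leaves open.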

Flags: needinfo?(jschoenick)

Jonathan Griffin (:jgriffin)

Reporter

Comment 16

•

11 years ago

We can change the behavior of the tests easily enough, if we can define what would produce interesting data. How do these tests work on Android?

John Schoenick [:johns]

Comment 17

•

11 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #16)
> We can change the behavior of the tests easily enough, if we can define what
> would produce interesting data. How do these tests work on Android?

:kats would be the person to talk to there, but I believe the tests there are similar to desktop - running a static set of pages. The pages being loaded never changing is what makes the desktop/mobile AWSY graphs interesting: How much memory does firefox use to load these exact pages?

However, if both the test (the start page of all gaia apps) and gecko are changing -- and every-gaia-app uses more memory than the emulator has anyway -- then the line we get from plotting total memory use after a cycle doesn't mean much. That said, the memory reports have a lot more data than just total memory usage: How much memory the system process uses, how much non-gecko memory is in use, how long the average GC takes, and so on.

I should note that AWSY stores all data, even un-graphed stuff, so changing what shows up on the graphs later is easy -- we just want to make sure that whatever we're looking to capture is covered by the test.

Jonathan Griffin (:jgriffin)

Reporter

Comment 18

•

11 years ago

So, I'm not sure what we want to do here. It sounds like we may want to report data that we're currently producing, but possibly graph a different set of numbers than we do for desktop/fennec. Alternately, we need to figure out what we're trying to measure total memory for (is it just gecko, or is it some combination of gecko/gaia) and figure out how to write tests for it. Adding :khuey in case he has ideas.

Flags: needinfo?(khuey)

John Schoenick [:johns]

Updated

•

11 years ago

Blocks: 979040

John Schoenick [:johns]

Updated

•

11 years ago

Blocks: 979046

John Schoenick [:johns]

Comment 19

•

11 years ago

(In reply to Robert Wood [:rwood] from comment #14)
> Yes this will be the final data format (except a minor filename change, I'm
> going to remove the trailing dash from the .zip filename).

I filed bug 979040 to add support to the awsy scripts for importing this format, and bug 979046 for actually turning on graphs on the AWSY homepage.

Robert Wood [:rwood]

Comment 20

•

11 years ago

:johns, are there any credentials that I need for the AWSY machine (arcus.mv.mozilla.com) in order to copy the .zip package there? If so could you please email them to me? There's a Jenkins SCP add-on that I'd like to try to setup to have the automation copy over the results. Thanks!

Flags: needinfo?(jschoenick)

John Schoenick [:johns]

Comment 21

•

11 years ago

(In reply to Robert Wood [:rwood] from comment #20)
> :johns, are there any credentials that I need for the AWSY machine
> (arcus.mv.mozilla.com) in order to copy the .zip package there? If so could
> you please email them to me? There's a Jenkins SCP add-on that I'd like to
> try to setup to have the automation copy over the results. Thanks!

If you can generate a SSH keypair for automation and send me the pub key, I can create an account on arcus with permission to SCP to an incoming directory of some sort

Flags: needinfo?(jschoenick)

Robert Wood [:rwood]

Comment 22

•

11 years ago

Sent :johns the public key, sorry for the delay!

Robert Wood [:rwood]

Comment 23

•

11 years ago

Hi :johns, is there anything else you need from me besides the key? Thanks!

Flags: needinfo?(jschoenick)

Joel Maher ( :jmaher ) (UTC -8)

Comment 24

•

11 years ago

This bug has sat idle for some time; are we still waiting on :johns to apply the public key?

John Schoenick [:johns]

Comment 25

•

11 years ago

Sorry, I have been busy with other things and this has been on the back burner. Bug 979040 and Bug 979046 need to be fixed for this data to actually be processed, so I need to set aside some time to do the work on the areweslimyet.com side. I'll see if I can devote some time to this in the coming week.

Jonathan Griffin (:jgriffin)

Reporter

Updated

•

11 years ago

Keywords: ateam-b2g-perf-task

Mike Lee [:mlee]

Updated

•

11 years ago

Keywords: perf

Priority: -- → P2

Whiteboard: [c=automation p= s= u=]

Jonathan Griffin (:jgriffin)

Reporter

Comment 26

•

11 years ago

Monthly ping for status. If this requires a lot of work on the AWSY side, is there some basic data we could report to datazilla in the interim?

John Schoenick [:johns]

Comment 27

•

11 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #26)
> Monthly ping for status. If this requires a lot of work on the AWSY side,
> is there some basic data we could report to datazilla in the interim?

Sorry again for the delay on this, AWSY work got pushed down my queue for more urgent platform stuff. I spoke to rwood in person about this last week -- the blockers for this are:

1) The AWSY machine is overloaded. I just last week got a new machine delivered, so I need to format it and move stuff there.
2) The database backend, sqlite, is woefully inadequate for the current amount of data coming in, and it's not clear that the machine would be able to keep up if we doubled it again.

However, with the new box in question, I think we can give it a shot, and turn it back off if things start lagging. So let's try this: I'll set up the new machine this week and we can start importing data -- if it looks like the harness hasn't ground to a halt we can go ahead and start displaying it, otherwise it may need to be switched back off blocking on bug 977354.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 28

•

11 years ago

I think based on yesterday's discussion we decided to go the other way; reporting the b2g AWSY data to the gaia endurance test stuff?

Flags: needinfo?(khuey) → needinfo?(jgriffin)

Jonathan Griffin (:jgriffin)

Reporter

Comment 29

•

11 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> I think based on yesterday's discussion we decided to go the other way;
> reporting the b2g AWSY data to the gaia endurance test stuff?

Actually, somewhat the opposite. Instead of maintaining two memory-related test frameworks, we're going to refactor gaia-endurance tests and make them take full memory reports, and submit those to AWSY. We then don't need an entirely separate AWSY framework for B2G.

It probably doesn't change the work that's needed on the AWSY site, AFAICT.

Flags: needinfo?(jgriffin)

John Schoenick [:johns]

Comment 30

•

11 years ago

FWIW, I partially set up the new AWSY box earlier this week and looked at how much we're IO bound on sqlite, and I think we don't need to block on PgSQL. Once I get a chance to finish migrating arcus I can hook up the importer script and we can start pushing data over, and maybe push the other problems with AWSY down the road some more :-P

Andrew Overholt [:overholt]

Comment 31

•

11 years ago

Eric took over AWSY from John and likely has more time to help here.

Flags: needinfo?(john) → needinfo?(erahm)

Robert Wood [:rwood]

Comment 32

•

11 years ago

The gaia-ui endurance tests are no longer running/maintained, so this is not required anymore.

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: needinfo?(erahm)

Resolution: --- → WONTFIX

about-memory-new.zip 12 years ago Jonathan Griffin (:jgriffin) 5.24 MB, application/octet-stream		Details
awsy-2014-02-12_20-15-39-.zip 11 years ago Robert Wood [:rwood] 1.66 MB, application/zip		Details