Closed Bug 899197 Opened 9 years ago Closed 7 years ago

Add script to report endurance test results to AWSY

Categories

(Firefox OS Graveyard :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jgriffin, Unassigned)

References

Details

(Keywords: ateam-b2g-perf-task, perf, Whiteboard: [c=automation p= s= u=])

Attachments

(2 files)

We need to write a script (or find an existing one) that can take results from rwood's endurance tests and report them to AWSY.
Looks like some existing scripts are at https://github.com/Nephyrin/MozAreWeSlimYet (e.g., https://github.com/Nephyrin/MozAreWeSlimYet/blob/master/create_graph_json.py); we may be able to re-use some of this.
As long as this data can be sanely formatted+compressed+rsync/scp'd to AWSY, I'm happy to write a script to import it to the database. Each dump sent should have:
- Build changeset id
- test name (something to identify what test was run, e.g. B2G-Endurance-Foo)
- test time
- The memory report(s)
- [optional] Whether to replace existing test data for this testname+changeset ID (e.g. the test was re-triggered), or just store multiple runs of the test (which AWSY doesn't visualize yet, but will in the future)

The memory reports are taken at a series of checkpoints during the run (e.g. TabsOpen, TabsClosed, BeforeUnload), potentially run over multiple iterations. Each datapoint should have units (bytes, percent, unitless). Which of these datapoints are included on the graphs is configured after the fact. AWSY currently stores full about:memory dumps of every checkpoint as its data, but just treats the reporter names as hierarchical key/values, so other data could be included.
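
To illustrate what "hierarchical key/values" means here, a minimal sketch that turns slash-separated reporter names into a nested tree and totals a subtree. The function names are hypothetical and this is not AWSY's code. It assumes no reporter name is both a leaf and a prefix of another, and that values being summed share the same units.

```python
def build_tree(reports):
    """Nest {"a/b/c": value} style reporter names into {"a": {"b": {"c": value}}}."""
    root = {}
    for path, value in reports.items():
        parts = path.split("/")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return root


def subtree_total(node):
    """Sum every leaf value below a node (only meaningful for same-unit leaves)."""
    if isinstance(node, dict):
        return sum(subtree_total(child) for child in node.values())
    return node


# Made-up values for illustration only.
reports = {
    "explicit/foo/tab1": 100,
    "explicit/foo/tab2": 50,
    "explicit/js/heap": 25,
}
tree = build_tree(reports)
explicit_total = subtree_total(tree["explicit"])
```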

The pseudo-table of this data looks like:

> Builds
> Build ID | Build name
> ----------------------------------------------------
> 1001     | e70411983e393cc845d44181151dfdd028a2f0bb

> Tests
> Test ID | For Build | Test Name     | Test Time
> ------------------------------------------------
> 123     | 1001      | Endurance-Foo | 1375216821
> 204     | 1001      | AWSY-Desktop  | 1375216009

> Data
> Test ID | Units   | reporter name                     | value   | Checkpoint:Iteration
> ---------------------------------------------------------------------------------------
> 123     | bytes   | explicit/foo/tab1                 | 1118208 | TabsOpenForceGC:1
> 123     | percent | explicit/js/something             | 98      | StartSettled:1
> 123     | bytes   | pss/shared-libraries/something.so | 720896  | TabsClosedSettled:1
> 204     | bytes   | explicit/foo/tab1                 | 1448204 | TabsOpenForceGC:1
> ...
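
For illustration, the pseudo-table above could map onto sqlite roughly as follows. This is a sketch only: the table and column names are hypothetical and are not AWSY's actual schema, and the "Checkpoint:Iteration" column is split into two columns as an assumption.

```python
import sqlite3

# Hypothetical schema mirroring the pseudo-table; not AWSY's actual schema.
SCHEMA = """
CREATE TABLE builds (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE tests (
    id       INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES builds(id),
    name     TEXT NOT NULL,
    time     INTEGER NOT NULL
);
CREATE TABLE data (
    test_id    INTEGER NOT NULL REFERENCES tests(id),
    units      TEXT NOT NULL,
    reporter   TEXT NOT NULL,
    value      INTEGER NOT NULL,
    checkpoint TEXT NOT NULL,
    iteration  INTEGER NOT NULL
);
"""


def load_example(conn):
    """Create the tables and insert the example rows from the pseudo-table."""
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO builds VALUES (?, ?)",
                 (1001, "e70411983e393cc845d44181151dfdd028a2f0bb"))
    conn.executemany("INSERT INTO tests VALUES (?, ?, ?, ?)", [
        (123, 1001, "Endurance-Foo", 1375216821),
        (204, 1001, "AWSY-Desktop", 1375216009),
    ])
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", [
        (123, "bytes", "explicit/foo/tab1", 1118208, "TabsOpenForceGC", 1),
        (123, "percent", "explicit/js/something", 98, "StartSettled", 1),
        (123, "bytes", "pss/shared-libraries/something.so", 720896,
         "TabsClosedSettled", 1),
        (204, "bytes", "explicit/foo/tab1", 1448204, "TabsOpenForceGC", 1),
    ])


def series(conn, test_name, reporter, checkpoint):
    """Return (build name, value) pairs for one reporter at one checkpoint."""
    rows = conn.execute(
        "SELECT b.name, d.value FROM data d"
        " JOIN tests t ON t.id = d.test_id"
        " JOIN builds b ON b.id = t.build_id"
        " WHERE t.name = ? AND d.reporter = ? AND d.checkpoint = ?"
        " ORDER BY t.time",
        (test_name, reporter, checkpoint))
    return rows.fetchall()
```

Keeping the test name on the tests table is what lets the B2G and desktop runs against the same build sit side by side, as in the example rows.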
Thanks John.  I'm attaching a b2g memory report; is this what you would need, or would you like it pre-processed somehow?
Attached file about-memory-new.zip
(In reply to Jonathan Griffin (:jgriffin) from comment #3)
> Thanks John.  I'm attaching a b2g memory report; is this what you would
> need, or would you like it pre-processed somehow?

If the test is going to be doing only one memory-snapshot this would be fine, but some kind of info about what build the test was run against and when the test was run would need to be included (you could just add a simple test-info.txt to the zip).

If the test wants to provide multiple about:memory snapshots per test run, or multiple iterations, they would need to be merged into one archive and labeled somehow (folders in the archive labeled CheckpointOne CheckpointTwo etc would work fine). As long as everything for one test run is uploaded in one bundle and has the necessary metadata, I can handle mangling it in whatever way in the import script.
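
As a rough sketch of the bundle described above: one zip per test run, a test-info.txt at the top level, and one folder per checkpoint. The key=value layout of test-info.txt, the function name, and the dump file name used in the example are all assumptions for illustration; nothing here is an agreed format.

```python
import io
import zipfile


def build_bundle(changeset, test_name, test_time, checkpoints):
    """Return the bytes of a zip with test-info.txt plus one folder per checkpoint.

    `checkpoints` maps a folder label (e.g. "CheckpointOne") to a dict of
    {file name: file contents as bytes}.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical metadata layout: one key=value pair per line.
        info = "changeset={}\ntest_name={}\ntest_time={}\n".format(
            changeset, test_name, test_time)
        zf.writestr("test-info.txt", info)
        for label, files in checkpoints.items():
            for fname, payload in files.items():
                zf.writestr("{}/{}".format(label, fname), payload)
    return buf.getvalue()


# Example with placeholder dump contents.
bundle = build_bundle(
    "e70411983e393cc845d44181151dfdd028a2f0bb",
    "B2G-Endurance-Foo",
    1375216821,
    {
        "CheckpointOne": {"about-memory.json": b"{}"},
        "CheckpointTwo": {"about-memory.json": b"{}"},
    },
)
```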
Great, we'll work on generating such a package with all the necessary metadata and checkpoints.
(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/9d9856cf1648

Sorry, that was completely the wrong bug!
Hi John,

Please find attached a sample output package from the AWSY automation driver. I was thinking it would be good to plot the memory info for each about-memory folder included in the package, as follows:

start-idle
after-1-cycle
after-10-cycles
after-20-cycles
after-30-cycles
after-40-cycles
after-50-cycles
after-60-cycles

Note that a single 'cycle' refers to one execution of the awsy test on the b2g emulator. The entire test of 60 cycles can be considered as one single AWSY 'iteration'; one 'iteration' will be triggered each time a new tbpl emulator build is made available. One iteration (the entire 60 cycles) takes approximately 6 hours.

As you probably guessed, the date and time of the test run is just the name of the .zip. I've included the sources.xml inside, for the emulator tbpl build info.

Please have a look when you have a chance and let me know if this output package would be sufficient to have the results posted on the areweslimyet.com dashboard. Thanks!
Flags: needinfo?(jschoenick)
:johns ping, thanks
(In reply to Robert Wood [:rwood] from comment #9)
> Created attachment 8375596 [details]
> awsy-2014-02-12_20-15-39-.zip
> 
> Hi John,
> 
> Please find attached a sample output package from the AWSY automation
> driver. I was thinking it would be good to plot the memory info for each
> about-memory folder included in the package, as follows:
> 
> start-idle
> after-1-cycle
> after-10-cycles
> after-20-cycles
> after-30-cycles
> after-40-cycles
> after-50-cycles
> after-60-cycles
> 
> Note that a single 'cycle' refers to one execution of the awsy test on the
> b2g emulator. The entire test of 60 cycles can be considered as one single
> AWSY 'iteration'; one 'iteration' will be triggered each time a new tbpl
> emulator build is made available. One iteration (the entire 60 cycles) takes
> approximately 6 hours in duration. 

Assuming the emulator is not being rebooted between these tests, "cycles" is what current awsy is referring to as "iterations": the test runs N times in the same app session, useful for detecting leaks. If the emulator is being rebooted then these would just be separate test runs against the same build ID, but either can be stored in AWSY.

> You probably guessed, the date and time of the test run is just the name of
> the .zip. I've included the sources.xml inside, for the emulator tbpl buid
> info.

The harness looks that up on pushlog for a timestamp for the build, so the other metadata we need is the changeset ID (present in sources.xml) and the time the test was run, which looks like it's in the file name.

> Please have a look when you have a chance and let me know if this output
> package would be sufficient to have the results posted on the
> areweslimyet.com dashboard. Thanks!

I believe it would. I need to write a script to process & insert these in the DB, after which the datapoints could be added to the graph config. We may have issues with once again doubling the data AWSY is processing per build, since it is still on a sqlite backend. That made sense when we ran one test per m-c build that returned a dozen memory reporters. I think we're 4-5 orders of magnitude above that now, though, and just inserting results takes minutes :-/

When you have the final data format ready let me know and I can file a bug to write an importer for AWSY. The other step on your end would then be having your harness scp or rsync these to the awsy machine (arcus.mv.mozilla.com) as they're produced so the cron job can pick them up.
Flags: needinfo?(jschoenick)
(In reply to Robert Wood [:rwood] from comment #9)
> Created attachment 8375596 [details]
> awsy-2014-02-12_20-15-39-.zip
> 
> Hi John,
> 
> Please find attached a sample output package from the AWSY automation
> driver. I was thinking it would be good to plot the memory info for each
> about-memory folder included in the package, as follows:
> 
> start-idle
> after-1-cycle
> after-10-cycles
> after-20-cycles
> after-30-cycles
> after-40-cycles
> after-50-cycles
> after-60-cycles


Also, what exactly is the test doing to produce this? On another bug there was discussion of launching all apps on the device - this might be good for detecting leaks, but is not going to give a line that is meaningful over time -- how much memory every app uses on its homescreen combined depends as much on the app behavior over time as the lower layers. We could perhaps graph something like after-60-cycles minutes after-1-cycle (which should be a fairly flat line) to keep an eye on regressions
Filed bug 977354 for fixing AWSY's database situation. It doesn't block this, but if we suddenly can't keep up on test data on arcus when the imports are turned on, that would change.
:johns

Thanks for the feedback! I read about the current AWSY tests and misinterpreted re: iterations. Anyway, for the AWSY emulator tests, no, the emulator is not being rebooted between cycles/iterations.

Yes this will be the final data format (except a minor filename change, I'm going to remove the trailing dash from the .zip filename).

Yes, to produce the output, one cycle/iteration consists of: launch a gaia app, wait 10 seconds, press the home button to minimize the app into the background, then repeat for the rest of the main gaia apps. After each cycle there is a 30 second sleep. After every 10th cycle there is an extended sleep of 180 seconds. about:memory dumps are grabbed at the iterations noted in the output package.
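
The schedule above can be sketched as follows. This only computes checkpoint labels and sleep totals; it does not drive the emulator. The function names are hypothetical, and it assumes the 180 second sleep replaces (rather than adds to) the usual 30 second sleep on every 10th cycle, which the description leaves open.

```python
# Cycles after which an about:memory dump is taken, per the output package.
CHECKPOINT_CYCLES = (1, 10, 20, 30, 40, 50, 60)


def checkpoint_after(cycle):
    """Folder label for the dump taken after this cycle, or None if no dump."""
    if cycle not in CHECKPOINT_CYCLES:
        return None
    return "after-1-cycle" if cycle == 1 else "after-{}-cycles".format(cycle)


def sleep_after(cycle):
    """Seconds to sleep after a cycle (assumption: 180 replaces 30 on 10ths)."""
    return 180 if cycle % 10 == 0 else 30


def plan(total_cycles=60):
    """Return (checkpoint labels in order, total seconds spent sleeping)."""
    labels = ["start-idle"]
    total_sleep = 0
    for cycle in range(1, total_cycles + 1):
        label = checkpoint_after(cycle)
        if label is not None:
            labels.append(label)
        total_sleep += sleep_after(cycle)
    return labels, total_sleep


labels, total_sleep = plan()
```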

I was thinking we could graph at the start-idle, after-1-cycle, and then at each of the checkpoints (i.e. every 10 cycles) but whatever you think is best. Sorry I'm not really clear on what you mean by "after-60-cycles minutes after-1-cycle". Do you mean graph the after-60-cycles data and also add a new checkpoint for x minutes after-1-cycle? Thanks.
Flags: needinfo?(jschoenick)
(In reply to Robert Wood [:rwood] from comment #14)
> :johns
> 
> Thanks for the feedback! I read about the current AWSY tests and
> misinterpreted re: iterations. Anyway, for the AWSY emulator tests, no the
> emulator is not being rebooted between cycles/iterations.
> 
> Yes this will be the final data format (except a minor filename change, I'm
> going to remove the trailing dash from the .zip filename).
> 
> Yes to produce the output, one cycle/iteration consists of: Launch a gaia
> app, wait 10 seconds, press the home button to minimize the app into the
> background, then repeat for the rest of the main gaia apps. After each cycle
> there is a 30 second sleep. After every 10th cycle there is an extended
> sleep of 180 seconds. About memory dumps are grabbed at the iterations noted
> in the output package.
> 
> I was thinking we could graph at the start-idle, after-1-cycle, and then at
> each of the checkpoints (i.e. every 10 cycles) but whatever you think is
> best. Sorry I'm not really clear on what you mean by "after-60-cycles
> minutes after-1-cycle". Do you mean graph the after-60-cycles data and also
> add a new checkpoint for x minutes after-1-cycle? Thanks.

Sorry, that was supposed to be "after-60-cycles *minus* after-1-cycle", that is, graph the difference in usage after 60 vs 1 cycle -- but I'm not sure if this helps us much either. My concern is that how much memory one cycle takes doesn't really mean anything when graphed: Adding new apps or changing app behavior will make it go up or down, regardless of how efficient gecko is being underneath it. On memory constrained devices it's probably always going to be taking all available memory after one cycle, so the line would be more or less flat at 512Megs.
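
For concreteness, the "after-60-cycles minus after-1-cycle" idea amounts to a per-reporter subtraction between two checkpoints. A minimal sketch, with a hypothetical function name and made-up values; it assumes each checkpoint has already been reduced to a flat mapping of reporter name to a value in bytes.

```python
def checkpoint_delta(first, last):
    """Per-reporter growth between two checkpoints.

    Only reporters present in both checkpoints are compared, since a reporter
    that appears or disappears has no meaningful difference.
    """
    return {name: last[name] - first[name] for name in first if name in last}


# Made-up numbers for illustration only.
after_1_cycle = {"explicit/foo/tab1": 1000, "explicit/js/something": 400}
after_60_cycles = {"explicit/foo/tab1": 1600, "explicit/js/something": 350,
                   "explicit/new/reporter": 10}
growth = checkpoint_delta(after_1_cycle, after_60_cycles)
```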
Flags: needinfo?(jschoenick)
We can change the behavior of the tests easily enough, if we can define what would produce interesting data.  How do these tests work on Android?
(In reply to Jonathan Griffin (:jgriffin) from comment #16)
> We can change the behavior of the tests easily enough, if we can define what
> would produce interesting data.  How do these tests work on Android?

:kats would be the person to talk to there, but I believe the tests there are similar to desktop - running a static set of pages.

The pages being loaded never changing is what makes the desktop/mobile AWSY graphs interesting: how much memory does Firefox use to load these exact pages? If both the test (the start page of all gaia apps) and gecko are changing -- and every-gaia-app uses more memory than the emulator has anyway -- then the line we get from plotting total memory use after a cycle doesn't mean much. However, the memory reports have a lot more data than just total memory usage: how much memory the system process uses, how much non-gecko memory is in use, how long the average GC takes, and so on.

I should note that AWSY stores all data, even un-graphed stuff, so changing what shows up on the graphs later is easy -- we just want to make sure that whatever we're looking to capture is covered by the test.
So, I'm not sure what we want to do here.  It sounds like we may want to report data that we're currently producing, but possibly graph a different set of numbers than we do for desktop/fennec.

Alternately, we need to figure out what we're trying to measure total memory for (is it just gecko, or is it some combination of gecko/gaia) and figure out how to write tests for it.  Adding :khuey in case he has ideas.
Flags: needinfo?(khuey)
Blocks: 979040
Blocks: 979046
(In reply to Robert Wood [:rwood] from comment #14)
> Yes this will be the final data format (except a minor filename change, I'm
> going to remove the trailing dash from the .zip filename).

I filed bug 979040 to add support to the awsy scripts for importing this format, and bug 979046 for actually turning on graphs on the AWSY homepage.
:johns, are there any credentials that I need for the AWSY machine (arcus.mv.mozilla.com) in order to copy the .zip package there? If so could you please email them to me? There's a Jenkins SCP add-on that I'd like to try to set up to have the automation copy over the results. Thanks!
Flags: needinfo?(jschoenick)
(In reply to Robert Wood [:rwood] from comment #20)
> :johns, are there any credentials that I need for the AWSY machine
> (arcus.mv.mozilla.com) in order to copy the .zip package there? If so could
> you please email them to me? There's a Jenkins SCP add-on that I'd like to
> try to setup to have the automation copy over the results. Thanks!

If you can generate an SSH keypair for automation and send me the pub key, I can create an account on arcus with permission to SCP to an incoming directory of some sort.
Flags: needinfo?(jschoenick)
Sent :johns the public key, sorry for the delay!
Hi :johns, is there anything else you need from me besides the key? Thanks!
Flags: needinfo?(jschoenick)
This bug has sat idle for some time; are we still waiting on :johns to apply the public key?
Sorry, I have been busy with other things and this has been on the back burner. Bug 979040 and Bug 979046 need to be fixed for this data to actually be processed, so I need to set aside some time to do the work on the areweslimyet.com side. I'll see if I can devote some time to this in the coming week.
Keywords: perf
Priority: -- → P2
Whiteboard: [c=automation p= s= u=]
Monthly ping for status.  If this requires a lot of work on the AWSY side, is there some basic data we could report to datazilla in the interim?
(In reply to Jonathan Griffin (:jgriffin) from comment #26)
> Monthly ping for status.  If this requires a lot of work on the AWSY side,
> is there some basic data we could report to datazilla in the interim?

Sorry again for the delay on this, AWSY work got pushed down my queue for more urgent platform stuff. I spoke to rwood in person about this last week -- the blockers for this are:
1) The AWSY machine is overloaded. I just last week got a new machine delivered, so I need to format it and move stuff there.
2) The database backend, sqlite, is woefully inadequate for the current amount of data coming in, and it's not clear that the machine would be able to keep up if we doubled it again. However, with the new box in question, I think we can give it a shot, and turn it back off if things start lagging.

So let's try this: I'll set up the new machine this week and we can start importing data -- if it looks like the harness hasn't ground to a halt we can go ahead and start displaying it, otherwise it may need to be switched back off, blocking on bug 977354.
I think based on yesterday's discussion we decided to go the other way; reporting the b2g AWSY data to the gaia endurance test stuff?
Flags: needinfo?(khuey) → needinfo?(jgriffin)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #28)
> I think based on yesterday's discussion we decided to go the other way;
> reporting the b2g AWSY data to the gaia endurance test stuff?

Actually, somewhat the opposite.  Instead of maintaining two memory-related test frameworks, we're going to refactor gaia-endurance tests and make them take full memory reports, and submit those to AWSY.  We then don't need an entirely separate AWSY framework for B2G.

It probably doesn't change the work that's needed on the AWSY site, AFAICT.
Flags: needinfo?(jgriffin)
FWIW, I partially set up the new AWSY box earlier this week and looked at how much we're IO bound on sqlite, and I think we don't need to block on PgSQL. Once I get a chance to finish migrating arcus I can hook up the importer script and we can start pushing data over, and maybe push the other problems with AWSY down the road some more :-P
Eric took over AWSY from John and likely has more time to help here.
Flags: needinfo?(john) → needinfo?(erahm)
The gaia-ui endurance tests are no longer running/maintained, so this is not required anymore.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(erahm)
Resolution: --- → WONTFIX