Closed Bug 771142 Opened 13 years ago Closed 12 years ago

update talos dirty suite profiles

Categories

Product/Component: Testing :: Talos
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Tracking flags: Not tracked

People

Reporter: jmaher
Assignee: jmaher

Attachments

(1 file)

Our ts_places_med and ts_places_max predefined profiles are out of date. We really need to ensure we are using realistic profiles here, as both Firefox and the web itself have changed a lot in the last few years. We should consider data collected from telemetry when building these profiles.
I have started to document these tests here: https://wiki.mozilla.org/Buildbot/Talos/Tests#ts_places_generated_max

We need to update localstore.rdf to make use of the places.sqlite data; otherwise we are just spinning cycles.

localstore.rdf from a blank profile on Firefox 26:

<?xml version="1.0"?>
<!-- This Source Code Form is subject to the terms of the Mozilla Public
   - License, v. 2.0. If a copy of the MPL was not distributed with this
   - file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
<RDF:RDF xmlns:NC="http://home.netscape.com/NC-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</RDF:RDF>

localstore.rdf that we use:

<?xml version="1.0"?>
<RDF:RDF xmlns:NC="http://home.netscape.com/NC-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="chrome://browser/content/browser.xul#main-window"
                   width="1024"
                   height="768"
                   sizemode="normal"
                   screenX="0"
                   screenY="0" />
  <RDF:Description RDF:about="chrome://browser/content/browser.xul">
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#main-window"/>
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#toolbar-menubar"/>
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#sidebar-box"/>
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#sidebar-title"/>
  </RDF:Description>
  <RDF:Description RDF:about="chrome://browser/content/browser.xul#sidebar-box"
                   sidebarcommand=""
                   width=""
                   src="" />
</RDF:RDF>

Which UI elements should be displayed to take advantage of moz_history*, moz_bookmarks, and moz_places?
Flags: needinfo?(mnoorenberghe+bmo)
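For reference, the tables in question can be inspected directly with Python's sqlite3 module. This is a minimal sketch, assuming the standard places.sqlite schema (moz_places, moz_historyvisits, moz_bookmarks); the profile path is a hypothetical placeholder:

import sqlite3

# Hypothetical path to a generated Talos profile's Places database.
DB_PATH = "profile/places.sqlite"

def summarize_places(db_path):
    """Print row counts for the tables the dirty profiles populate."""
    conn = sqlite3.connect(db_path)
    try:
        for table in ("moz_places", "moz_historyvisits", "moz_bookmarks"):
            (count,) = conn.execute("SELECT COUNT(*) FROM %s" % table).fetchone()
            print("%-20s %d rows" % (table, count))
    finally:
        conn.close()

if __name__ == "__main__":
    summarize_places(DB_PATH)

If a profile reports large row counts here but nothing in startup ever touches these tables, the test exercises nothing Places-related, which is exactly the concern above.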
Taras brought up that we should consider which type of test makes sense to run with different profiles. If startup time with a large profile isn't a big concern for us, due to the optimizations we have made on raw startup, then let's rethink the test.
Looking over the current ts_paint and tspaint_places_generated_[med|max] tests, I see very similar numbers. It is effectively as if we were running the same test three times, with the one exception of winxp, where a few outliers showed up in med|max but did not show up in ts_paint.

ts_paint: https://datazilla.mozilla.org/?start=1378310487&stop=1378915287&product=Firefox&repository=Mozilla-Inbound&test=ts_paint&page=ts_paint&x86=true&x86_64=true&project=talos
tspaint_max: https://datazilla.mozilla.org/?start=1378311549&stop=1378916349&product=Firefox&repository=Mozilla-Inbound&test=tspaint_places_generated_max&page=tspaint_places_generated_max&project=talos
tspaint_med: https://datazilla.mozilla.org/?start=1378311549&stop=1378916349&product=Firefox&repository=Mozilla-Inbound&test=tspaint_places_generated_med&page=tspaint_places_generated_med&project=talos

I would like to know:
1) is there agreement that the current tests are not providing any value?
2) should we consider changing localstore.rdf or any other profile files to make med|max more real-world?
3) should we turn off these tests?
4) should we use these profiles for different tests?
Flags: needinfo?(taras.mozilla)
Looking over 10 weeks of data (I don't think datazilla keeps data older than that), it looks very flat, and unless we expect it to be flat (do we?), I'd say it's not useful as-is. So, here are some guiding questions:
1. Technically, what do these tests measure?
2. Semantically, what should these tests measure?
3. Are we still interested in 2?
4. If 3 is yes, do 1 and 2 match? If not, why, and what should be changed?
1) These tests measure the total time of startup. ts_paint measures it with a blank profile, tspaint_generated_med measures it with a medium-sized profile, and tspaint_generated_max measures it with a large profile. Startup is defined as the time from launching the browser to receiving both the onload and mozafterpaint events; once we receive those events a timestamp is recorded and the browser is shut down (a rough harness-side sketch of this follows below).
2) These tests are intended to show the raw startup time for a user with a filled-up profile, instead of a new user with a blank profile.
3) I don't know. I suspect there is useful data in starting up a browser with a large profile. If we have optimized this use case enough, I would be happy to look at other tests with a large profile, or to forget about this.
4) Depends on 3. They don't match, because nothing we do on startup accesses the tables we have populated. I am not sure whether showing different UI elements would help, or whether we need to perform specific actions after launching the browser.
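To make answer 1 concrete, here is a rough harness-side sketch of that measurement loop. The binary path, profile path, and marker string are assumptions for illustration only; the real Talos harness uses its own reporting protocol and in-browser instrumentation for onload+mozafterpaint:

import subprocess
import time

FIREFOX = "firefox"              # assumed binary on PATH
PROFILE = "testing/profile"      # assumed pre-generated dirty profile
MARKER = "__startup_complete__"  # hypothetical line the test page logs
                                 # after both onload and mozafterpaint fire

def measure_startup():
    """Return ms from process launch until the browser reports startup."""
    start = time.time()
    proc = subprocess.Popen([FIREFOX, "-profile", PROFILE, "-no-remote"],
                            stdout=subprocess.PIPE)
    for line in proc.stdout:
        if MARKER in line.decode("utf-8", "replace"):
            elapsed = (time.time() - start) * 1000.0
            proc.terminate()
            return elapsed
    proc.wait()
    raise RuntimeError("browser exited before reporting startup")

if __name__ == "__main__":
    print("startup: %.1f ms" % measure_startup())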
1) I can say these tests are flawed, but I don't have time for a more detailed investigation at the moment.
2) I don't know how the localstore.rdf stuff affects us at all.
3) Probably reasonable, if these test results correlate exactly.
4) Probably not.
Flags: needinfo?(taras.mozilla)
In late June, I worked with MattN on this and did some experiments:
1) modifying localstore.rdf (http://people.mozilla.org/~jmaher/places_hack.patch)
2) modifying localstore.rdf + waiting for browser-delayed-startup-finished (instead of mozafterpaint+onload)

I ran ts_paint, tspaint_places_generated_med, and tspaint_places_generated_max on all platforms 3 times and took the average of the 3 runs. This generated a table for each test, with values for each platform, documenting the original, scenario #1, and scenario #2: https://docs.google.com/spreadsheet/ccc?key=0ArS97F99-BEZdHVNOEoxYlBLT01oZVVxdjc0MFBaenc&usp=sharing

Looking at the data, you can see some differences, mostly from the localstore.rdf changes. I would normally expect a 1-3% fluctuation in the numbers, depending on the platform (see the sketch below). Maybe this data will help us make a determination.
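To illustrate how the spreadsheet can be read against that 1-3% noise band, here is a small sketch; the per-platform averages are hypothetical placeholders, not the real spreadsheet values:

# Hypothetical 3-run averages (ms) per platform for each scenario.
SCENARIOS = {
    "original":        {"linux": 610.0, "win7": 680.0, "osx": 590.0},
    "localstore_hack": {"linux": 598.0, "win7": 651.0, "osx": 585.0},
    "delayed_startup": {"linux": 604.0, "win7": 660.0, "osx": 588.0},
}
NOISE = 0.03  # treat anything within ~3% as normal fluctuation

base = SCENARIOS["original"]
for scenario in ("localstore_hack", "delayed_startup"):
    for platform, value in sorted(SCENARIOS[scenario].items()):
        delta = (value - base[platform]) / base[platform]
        verdict = "beyond noise" if abs(delta) > NOISE else "within noise"
        print("%-16s %-6s %+6.2f%%  %s" % (scenario, platform, delta * 100, verdict))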
:mak, do you have any thoughts here?
Flags: needinfo?(mak77)
Some points about these tests:
1. We never updated the profiles with real telemetry data, so what we have there is randomly generated content that doesn't represent any real use-case. Moreover, these old profiles must be updated every time the harness runs, spending precious time.
2. We used those tests quite heavily in the past (3-4 years ago) to optimize Places startup and shutdown. Nowadays those paths are quite optimized, so you can't tell much difference between normal/med/max.
3. We already dropped the "min" test some time ago; the reason was that we were interested in keeping some sort of test that would catch us regressing the startup/shutdown path. Med and max are more than enough for that.

From a Places point of view, the only valid use is detecting a regression in the startup path with a large database. For that, even just the max test would be enough. But today we can also rely on telemetry, so we could probably just remove them both.
Flags: needinfo?(mak77)
This patch disables the dirty tests for all branches. I didn't remove any code; I would like to let this stick for a full cycle before we remove the code from config.py and m-c::talos.json.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #807704 - Flags: review?(armenzg)
Flags: needinfo?(mnoorenberghe+bmo)
Attachment #807704 - Flags: review?(armenzg) → review+
https://hg.mozilla.org/build/buildbot-configs/rev/bb57f19fcf73

This will roll out when a reconfig takes place.
The reconfig took place Friday evening.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Marco Bonardo [:mak] from comment #10)
> From a Places point of view, the only valid use is detecting a regression
> in the startup path with a large database. For that, even just the max
> test would be enough. But today we can also rely on telemetry, so we
> could probably just remove them both.

Sorry for joining the party unfashionably late. While telemetry might give us good info and feedback, talos provides much more immediate feedback, much earlier than telemetry would. On top of that, AFAIK telemetry data isn't automatically monitored and doesn't send dev.tree-management regression emails. So it's not unlikely, IMO, that regressions which only show up in telemetry would go unnoticed, possibly for a long time, and by the time they're noticed, backing out the regressing changes would be much harder.

If the max test is still considered useful and unbroken, I think we should keep/reinstate it. Otherwise, if it's broken somehow, we should first fix the test and then reinstate it, IMO.
Flags: needinfo?(mak77)
(In reply to Avi Halachmi (:avih) from comment #14)
> While telemetry might give us good info and feedback, talos provides much
> more immediate feedback, much earlier than telemetry would. On top of
> that, AFAIK telemetry data isn't automatically monitored and doesn't send
> dev.tree-management regression emails. So it's not unlikely, IMO, that
> regressions which only show up in telemetry would go unnoticed, possibly
> for a long time, and by the time they're noticed, backing out the
> regressing changes would be much harder.

Yes, this is something we should probably fix by having some hook on telemetry that posts to tree-management or a dedicated place (see the sketch below).

> If the max test is still considered useful and unbroken, I think we should
> keep/reinstate it. Otherwise, if it's broken somehow, we should first fix
> the test and then reinstate it, IMO.

It's definitely broken (it tests nonsensical, unreal data), but as I said, it has some usefulness in instantly catching something really stupid being done in Places startup; you expressed that well. Honestly, though, the likelihood of that happening today is very low compared to what it was years ago: we don't need to add any additional startup code to support the functionality we have, nor is there any plan to touch that part of the code. It would take a very stupid mistake to regress there, and surely I cannot exclude stupid mistakes. But preventing it has an infrastructure price that could be higher than the likelihood of it happening.
Flags: needinfo?(mak77)
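A minimal sketch of what such a hook might do, assuming daily median values are available from some telemetry data source (the interface here is hypothetical, not an existing telemetry API):

def detect_regression(series, window=7, threshold=0.05):
    """Return the fractional increase if the latest window regressed,
    else None. Compares the mean of the last `window` points against
    the mean of the `window` points before them."""
    if len(series) < 2 * window:
        return None
    previous = series[-2 * window:-window]
    recent = series[-window:]
    prev_mean = sum(previous) / float(window)
    rec_mean = sum(recent) / float(window)
    change = (rec_mean - prev_mean) / prev_mean
    return change if change > threshold else None

# Example: a ~6% jump in daily median startup time would trigger a post
# to dev.tree-management (or wherever the hook reports).
daily_medians = [500, 502, 498, 501, 499, 503, 500,
                 531, 533, 529, 532, 530, 534, 531]
print(detect_regression(daily_medians))  # -> ~0.062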
Following a performance test meeting with taras, vladan, jmaher, bc, and myself, and after looking at some graphs comparing ts and ts_dirty_med/max (which all show pretty much identical curves over the last year or so), it was decided to keep the dirty ts tests disabled.

The main problem is that there apparently is _some_ value in keeping at least one of the dirty tests, but it's hard to decide on keeping it when we don't actually know the cost. For instance, if keeping the dirty_max test enabled costs us a cent a day, then the value would surely outweigh the cost. According to jmaher, running the dirty profile tests costs almost the same as running the tp5 tests (about 85% of tp5's CPU time); considering that tp5 cycles through 50 pages and gives a lot of info, while the dirty tests only measure startup time and are typically unchanging, that 85% resource figure amounts to a pretty disproportionate cost/value ratio.

Another idea which came up, after also getting some tangible cost numbers (in money, time, productivity, etc.), is to "throttle" some tests to run less frequently, or otherwise reduce their runtime or take other measures to reduce cost (see the sketch below). But that is already out of the scope of this bug. For this bug, we're keeping these dirty profile tests disabled for now.
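For reference, the throttling idea could look something like this sketch. The suite names and scheduling interface are illustrative; real throttling would live in the buildbot scheduler configuration:

# Run expensive, slow-changing suites on every Nth push instead of all pushes.
THROTTLE = {
    "tp5": 1,                           # every push
    "tspaint_places_generated_max": 5,  # hypothetical: every 5th push
}

def should_run(suite, push_number):
    return push_number % THROTTLE.get(suite, 1) == 0

for push in range(1, 11):
    suites = [s for s in sorted(THROTTLE) if should_run(s, push)]
    print("push %2d: %s" % (push, ", ".join(suites)))

This would keep some coverage on the startup path at a fraction of the machine time, which is the cost/value trade-off discussed above.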