Closed Bug 771142 Opened 13 years ago Closed 12 years ago

update talos dirty suite profiles

Categories

Product/Component: Testing :: Talos
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Tracking flags: Not tracked

People

Reporter: jmaher
Assignee: jmaher

Attachments

(1 file)

Our ts_places_med and ts_places_max predefined profiles are out of date. We really need to ensure we are using realistic profiles here, as both Firefox and the web itself have changed a lot in the last few years. We should consider data collected from telemetry when building these profiles.
I have started to document these tests here: https://wiki.mozilla.org/Buildbot/Talos/Tests#ts_places_generated_max

We need to update localstore.rdf to make use of the places.sqlite data; otherwise we are just spinning cycles.

localstore.rdf from a blank profile on Firefox 26:

<?xml version="1.0"?>
<!-- This Source Code Form is subject to the terms of the Mozilla Public
   - License, v. 2.0. If a copy of the MPL was not distributed with this
   - file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
<RDF:RDF xmlns:NC="http://home.netscape.com/NC-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</RDF:RDF>

localstore.rdf that we use:

<?xml version="1.0"?>
<RDF:RDF xmlns:NC="http://home.netscape.com/NC-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="chrome://browser/content/browser.xul#main-window"
                   width="1024"
                   height="768"
                   sizemode="normal"
                   screenX="0"
                   screenY="0" />
  <RDF:Description RDF:about="chrome://browser/content/browser.xul">
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#main-window"/>
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#toolbar-menubar"/>
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#sidebar-box"/>
    <NC:persist RDF:resource="chrome://browser/content/browser.xul#sidebar-title"/>
  </RDF:Description>
  <RDF:Description RDF:about="chrome://browser/content/browser.xul#sidebar-box"
                   sidebarcommand=""
                   width=""
                   src="" />
</RDF:RDF>

Which UI elements should be displayed to take advantage of moz_history*, moz_bookmarks, and moz_places?
Flags: needinfo?(mnoorenberghe+bmo)
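For reference, the tables in question can be inspected directly with Python's sqlite3 module. This is a minimal sketch, assuming the standard places.sqlite schema (moz_places, moz_historyvisits, moz_bookmarks); the profile path is a hypothetical placeholder:

import sqlite3

# Hypothetical path to a generated Talos profile's Places database.
DB_PATH = "profile/places.sqlite"

def summarize_places(db_path):
    """Print row counts for the tables the dirty profiles populate."""
    conn = sqlite3.connect(db_path)
    try:
        for table in ("moz_places", "moz_historyvisits", "moz_bookmarks"):
            (count,) = conn.execute("SELECT COUNT(*) FROM %s" % table).fetchone()
            print("%-20s %d rows" % (table, count))
    finally:
        conn.close()

if __name__ == "__main__":
    summarize_places(DB_PATH)

If a profile reports large row counts here but nothing in startup ever touches these tables, the test exercises nothing Places-related, which is exactly the concern above.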
Taras brought up that we should consider which type of test makes sense to run with different profiles. If startup time with a large profile isn't a big concern for us, due to the optimizations we have made on raw startup, then let's rethink the test.
Looking over the current ts_paint and tspaint_places_generated_[med|max] tests, I see very similar numbers. It is effectively as if we were running the same test three times, with the one exception of winxp, where a few outliers showed up in med|max but did not show up in ts_paint.

ts_paint: https://datazilla.mozilla.org/?start=1378310487&stop=1378915287&product=Firefox&repository=Mozilla-Inbound&test=ts_paint&page=ts_paint&x86=true&x86_64=true&project=talos
tspaint_max: https://datazilla.mozilla.org/?start=1378311549&stop=1378916349&product=Firefox&repository=Mozilla-Inbound&test=tspaint_places_generated_max&page=tspaint_places_generated_max&project=talos
tspaint_med: https://datazilla.mozilla.org/?start=1378311549&stop=1378916349&product=Firefox&repository=Mozilla-Inbound&test=tspaint_places_generated_med&page=tspaint_places_generated_med&project=talos

I would like to know:
1) is there agreement that the current tests are not providing any value?
2) should we consider changing localstore.rdf or any other profile files to make med|max more real-world?
3) should we turn off these tests?
4) should we use these profiles for different tests?
Flags: needinfo?(taras.mozilla)
Looking over 10 weeks of data (I don't think datazilla keeps data older than that), it looks very flat, and unless we expect it to be flat (do we?), I'd say it's not useful as-is. So, here are some guiding questions:
1. Technically, what do these tests measure?
2. Semantically, what should these tests measure?
3. Are we still interested in 2?
4. If 3 is yes, do 1 and 2 match? If not, why, and what should be changed?
1) These tests measure the total time of startup. ts_paint measures it with a blank profile, tspaint_generated_med measures it with a medium-sized profile, and tspaint_generated_max measures it with a large profile. Startup is defined as the time from launching the browser to receiving both the onload and mozafterpaint events; once we receive those events a timestamp is recorded and the browser is shut down (a rough harness-side sketch of this follows below).
2) These tests are intended to show the raw startup time for a user with a filled-up profile, instead of a new user with a blank profile.
3) I don't know. I suspect there is useful data in starting up a browser with a large profile. If we have optimized this use case enough, I would be happy to look at other tests with a large profile, or to forget about this.
4) Depends on 3. They don't match, because nothing we do on startup accesses the tables we have populated. I am not sure whether showing different UI elements would help, or whether we need to perform specific actions after launching the browser.
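To make answer 1 concrete, here is a rough harness-side sketch of that measurement loop. The binary path, profile path, and marker string are assumptions for illustration only; the real Talos harness uses its own reporting protocol and in-browser instrumentation for onload+mozafterpaint:

import subprocess
import time

FIREFOX = "firefox"              # assumed binary on PATH
PROFILE = "testing/profile"      # assumed pre-generated dirty profile
MARKER = "__startup_complete__"  # hypothetical line the test page logs
                                 # after both onload and mozafterpaint fire

def measure_startup():
    """Return ms from process launch until the browser reports startup."""
    start = time.time()
    proc = subprocess.Popen([FIREFOX, "-profile", PROFILE, "-no-remote"],
                            stdout=subprocess.PIPE)
    for line in proc.stdout:
        if MARKER in line.decode("utf-8", "replace"):
            elapsed = (time.time() - start) * 1000.0
            proc.terminate()
            return elapsed
    proc.wait()
    raise RuntimeError("browser exited before reporting startup")

if __name__ == "__main__":
    print("startup: %.1f ms" % measure_startup())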
1) I can say these tests are flawed, but I don't have time for a more detailed investigation at the moment.
2) I don't know how the localstore.rdf stuff affects us at all.
3) Probably reasonable, if these test results correlate exactly.
4) Probably not.
Flags: needinfo?(taras.mozilla)
In late June, I worked with MattN on this and did some experiments:
1) modifying localstore.rdf (http://people.mozilla.org/~jmaher/places_hack.patch)
2) modifying localstore.rdf + waiting for browser-delayed-startup-finished (instead of mozafterpaint+onload)

I ran ts_paint, tspaint_places_generated_med, and tspaint_places_generated_max on all platforms 3 times and took the average of the 3 runs. This generated a table for each test, with values for each platform, documenting the original, scenario #1, and scenario #2: https://docs.google.com/spreadsheet/ccc?key=0ArS97F99-BEZdHVNOEoxYlBLT01oZVVxdjc0MFBaenc&usp=sharing

Looking at the data, you can see some differences, mostly from the localstore.rdf changes. I would normally expect a 1-3% fluctuation in the numbers, depending on the platform (see the sketch below). Maybe this data will help us make a determination.
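To illustrate how the spreadsheet can be read against that 1-3% noise band, here is a small sketch; the per-platform averages are hypothetical placeholders, not the real spreadsheet values:

# Hypothetical 3-run averages (ms) per platform for each scenario.
SCENARIOS = {
    "original":        {"linux": 610.0, "win7": 680.0, "osx": 590.0},
    "localstore_hack": {"linux": 598.0, "win7": 651.0, "osx": 585.0},
    "delayed_startup": {"linux": 604.0, "win7": 660.0, "osx": 588.0},
}
NOISE = 0.03  # treat anything within ~3% as normal fluctuation

base = SCENARIOS["original"]
for scenario in ("localstore_hack", "delayed_startup"):
    for platform, value in sorted(SCENARIOS[scenario].items()):
        delta = (value - base[platform]) / base[platform]
        verdict = "beyond noise" if abs(delta) > NOISE else "within noise"
        print("%-16s %-6s %+6.2f%%  %s" % (scenario, platform, delta * 100, verdict))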
:mak, do you have any thoughts here?
Flags: needinfo?(mak77)
Some points about these tests:
1. We never updated the profiles with real telemetry data, so what we have there is randomly generated content that doesn't represent any real use-case. Moreover, these old profiles must be updated every time the harness runs, spending precious time.
2. We used those tests quite heavily in the past (3-4 years ago) to optimize Places startup and shutdown. Nowadays those paths are quite optimized, so you can't tell much difference between normal/med/max.
3. We already dropped the "min" test some time ago; the reason was that we were interested in keeping some sort of test that would catch us regressing the startup/shutdown path. Med and max are more than enough for that.

From a Places point of view, the only valid use is detecting a regression in the startup path with a large database. For that, even just the max test would be enough. But today we can also rely on telemetry, so we could probably just remove them both.
Flags: needinfo?(mak77)
This patch disables the dirty tests for all branches. I didn't remove any code; I would like to let this stick for a full cycle before we remove the code from config.py and m-c::talos.json.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #807704 - Flags: review?(armenzg)
Flags: needinfo?(mnoorenberghe+bmo)
Attachment #807704 - Flags: review?(armenzg) → review+
https://hg.mozilla.org/build/buildbot-configs/rev/bb57f19fcf73

This will roll out when a reconfig takes place.
The reconfig took place Friday evening.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Marco Bonardo [:mak] from comment #10)
> From a Places point of view, the only valid use is detecting a regression
> in the startup path with a large database. For that, even just the max
> test would be enough. But today we can also rely on telemetry, so we
> could probably just remove them both.

Sorry for joining the party unfashionably late. While telemetry might give us good info and feedback, talos provides much more immediate feedback, much earlier than telemetry would. On top of that, AFAIK telemetry data isn't automatically monitored and doesn't send dev.tree-management regression emails. So it's not unlikely, IMO, that regressions which only show up in telemetry would go unnoticed, possibly for a long time, and by the time they're noticed, backing out the regressing changes would be much harder.

If the max test is still considered useful and unbroken, I think we should keep/reinstate it. Otherwise, if it's broken somehow, we should first fix the test and then reinstate it, IMO.
Flags: needinfo?(mak77)
(In reply to Avi Halachmi (:avih) from comment #14)
> While telemetry might give us good info and feedback, talos provides much
> more immediate feedback, much earlier than telemetry would. On top of
> that, AFAIK telemetry data isn't automatically monitored and doesn't send
> dev.tree-management regression emails. So it's not unlikely, IMO, that
> regressions which only show up in telemetry would go unnoticed, possibly
> for a long time, and by the time they're noticed, backing out the
> regressing changes would be much harder.

Yes, this is something we should probably fix by having some hook on telemetry that posts to tree-management or a dedicated place (see the sketch below).

> If the max test is still considered useful and unbroken, I think we should
> keep/reinstate it. Otherwise, if it's broken somehow, we should first fix
> the test and then reinstate it, IMO.

It's definitely broken (it tests nonsensical, unreal data), but as I said, it has some usefulness in instantly catching something really stupid being done in Places startup; you expressed that well. Honestly, though, the likelihood of that happening today is very low compared to what it was years ago: we don't need to add any additional startup code to support the functionality we have, nor is there any plan to touch that part of the code. It would take a very stupid mistake to regress there, and surely I cannot exclude stupid mistakes. But preventing it has an infrastructure price that could be higher than the likelihood of it happening.
Flags: needinfo?(mak77)
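A minimal sketch of what such a hook might do, assuming daily median values are available from some telemetry data source (the interface here is hypothetical, not an existing telemetry API):

def detect_regression(series, window=7, threshold=0.05):
    """Return the fractional increase if the latest window regressed,
    else None. Compares the mean of the last `window` points against
    the mean of the `window` points before them."""
    if len(series) < 2 * window:
        return None
    previous = series[-2 * window:-window]
    recent = series[-window:]
    prev_mean = sum(previous) / float(window)
    rec_mean = sum(recent) / float(window)
    change = (rec_mean - prev_mean) / prev_mean
    return change if change > threshold else None

# Example: a ~6% jump in daily median startup time would trigger a post
# to dev.tree-management (or wherever the hook reports).
daily_medians = [500, 502, 498, 501, 499, 503, 500,
                 531, 533, 529, 532, 530, 534, 531]
print(detect_regression(daily_medians))  # -> ~0.062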
Following a performance test meeting with taras, vladan, jmaher, bc, and myself, and after looking at some graphs comparing ts and ts_dirty_med/max (which all show pretty much identical curves over the last year or so), it was decided to keep the dirty ts tests disabled.

The main problem is that there apparently is _some_ value in keeping at least one of the dirty tests, but it's hard to decide on keeping it when we don't actually know the cost. For instance, if keeping the dirty_max test enabled costs us a cent a day, then the value would surely outweigh the cost. According to jmaher, running the dirty profile tests costs almost the same as running the tp5 tests (about 85% of tp5's CPU time); considering that tp5 cycles through 50 pages and gives a lot of info, while the dirty tests only measure startup time and are typically unchanging, that 85% resource figure amounts to a pretty disproportionate cost/value ratio.

Another idea which came up, after also getting some tangible cost numbers (in money, time, productivity, etc.), is to "throttle" some tests to run less frequently, or otherwise reduce their runtime or take other measures to reduce cost (see the sketch below). But that is already out of the scope of this bug. For this bug, we're keeping these dirty profile tests disabled for now.
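For reference, the throttling idea could look something like this sketch. The suite names and scheduling interface are illustrative; real throttling would live in the buildbot scheduler configuration:

# Run expensive, slow-changing suites on every Nth push instead of all pushes.
THROTTLE = {
    "tp5": 1,                           # every push
    "tspaint_places_generated_max": 5,  # hypothetical: every 5th push
}

def should_run(suite, push_number):
    return push_number % THROTTLE.get(suite, 1) == 0

for push in range(1, 11):
    suites = [s for s in sorted(THROTTLE) if should_run(s, push)]
    print("push %2d: %s" % (push, ", ".join(suites)))

This would keep some coverage on the startup path at a fraction of the machine time, which is the cost/value trade-off discussed above.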