Closed Bug 724692 Opened 13 years ago Closed 12 years ago

Rework graphserver and its schema to be more flexible

Categories

(Webtools Graveyard :: Graph Server, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: k0scist, Unassigned)

References

Details

(Whiteboard: [SfN])

Looking at http://hg.mozilla.org/graphs/file/tip/sql/data.sql, I have been thinking, in the abstract, about what data we really have. I have been focusing on tp(5), since that is the test with many options that I know best. While I don't have any recommendations as far as which database to use -- I am not a database expert, particularly in terms of intuiting performance from a schema -- hopefully this analysis can guide that decision. What we probably want is an entirely new schema, which will probably mean an entirely new piece of software. That said, I will confine the summary to things that should be addressed with the current approach, and we can flesh out what we really want as this goes along.

- get rid of the `machines` table: http://hg.mozilla.org/graphs/file/tip/sql/data.sql#l147
  This should be populated dynamically (case in point, http://hg.mozilla.org/graphs/file/tip/sql/data.sql#l871).
  * the only good thing about having the machines where they are is that the SQL inserts carry an implicit mapping between OS and machine name; however, this is possibly the worst possible place to keep that information! A simple (key, value) mapping flat file would be much more usable, or a python/JSON dict, or....
  * and/or the test machine can upload what OS it is on, possibly with mozinfo, to graphserver. This would obviate https://bugzilla.mozilla.org/show_bug.cgi?id=505803

- tests: see
  * https://wiki.mozilla.org/Buildbot/Talos#Talos_Tests
  * http://hg.mozilla.org/graphs/file/tip/sql/data.sql#l2384

  From a generated .yml file, you get:

  """
  tests:
  - name: tp
    url: '-tp page_load_test/tp3.manifest.develop -tpchrome -tpnoisy -tpformat tinderbox -tpcycles 10'
    resolution: 20
    win_counters: ['Working Set', 'Private Bytes', '% Processor Time']
    w7_counters: ['Working Set', 'Private Bytes', '% Processor Time', 'Modified Page List Bytes']
    linux_counters: ['Private Bytes', 'RSS', 'XRes']
    mac_counters: ['Private Bytes', 'RSS']
    shutdown: True
  """

  A test schema for 'tp5' may be defined as:

  """
  'tp5':  # should probably just be 'tp' since the only difference is the pageset
  {
      # corresponds to '-tpchrome' in the url of the .yml tests section
      'chrome': False,

      # pageset id;
      # ideally this would be a URL to a manifest, e.g.
      # http://hg.mozilla.org/build/talos/raw-file/tip/talos/page_load_test/tp4.manifest
      # additionally, since this corresponds to e.g.
      # '-tp page_load_test/tp3.manifest.develop'
      # in the url of the .yml tests section, these should be unified
      'pageset': "Tp February, 2009 (100 pages)",

      # version of the test
      # at the moment this is a magical key that doesn't mean anything;
      # instead, it should be a URI that actually means something
      # e.g. http://hg.mozilla.org/graphs/file/tip/sql/data.sql#l2523
      'version': 'r',

      # what event the time is measured from (and other statistics?)
      # see https://wiki.mozilla.org/Buildbot/Talos#Paint_Tests
      # could also be 'onload'
      'event': 'MozAfterPaint',

      # test counters
      # see e.g. https://wiki.mozilla.org/Buildbot/Talos#tp5
      'counters': ['pbytes',        # https://wiki.mozilla.org/Buildbot/Talos#Private_Bytes
                   '%cpu',          # https://wiki.mozilla.org/Buildbot/Talos#.25_CPU
                   'rss',           # https://wiki.mozilla.org/Buildbot/Talos#RSS_.28Resident_Set_Size.29
                   'xres',          # https://wiki.mozilla.org/Buildbot/Talos#Xres_.28X_Resource_Monitoring.29
                   'memset',        # https://wiki.mozilla.org/Buildbot/Talos#Working_Set_.28tp5_memset.29
                   'modlistbytes',  # https://wiki.mozilla.org/Buildbot/Talos#Modified_Page_List_Bytes
                   'main_rss',      # (no documentation)
                   'content_rss',   # (no documentation)
                   'shutdown',      # (no documentation)
                   'responsiveness' # (no documentation)
                   ],

      # could also have:
      # 'filter': <type of averaging applied>
      # ideally that would live 100% in graphserver, but if there is a
      # reason we have to hold back we can add it here
      # (though I can't think of any, since we're going to want to change
      # these at the same time anyway)
  }
  """

  The 'counters' are not actually part of the definition of a test (I don't think, unless they alter the core statistics, which ideally they wouldn't). (If 'filter' were part of the dataset, it wouldn't be part of the definition either.) A complete unique test ID is defined by pageset, chrome, event, and version (if present) (for the tp series; other tests may need other attributes).

- a test post would have to specify these parameters to upload data to the graphserver. In addition, it should post each "counter" it collects and what machine it is on, and potentially:
  * what branch it is testing
  * the machine OS
- tests shouldn't have a pointless pretty name; instead, the pretty name and all other test metadata should be explorable through the test definition. In either case there's no real need to store them piecewise in SQL
- counters shouldn't have a pointless pretty name; instead, there should be a single (1:1) mapping somewhere (and *only* one place) where they are explained, usable from all applicable code
- I'm not even going to comment about branches.
It is worth noting, from the point of view of a webserver collecting data, that all of this can be collected dynamically. For instance, let's say you're uploading:

"""
{'pageset': pageset_name_or_url,
 'chrome': False,
 'data': [series, of, data, points],
 'event': 'MozAfterPaint',
 # 'timestamp':  # if we demanded it
 'counters': {'main_rss': [series, of, data, points], ...},
 'machine': 'tegra-263.n',
 ...  # etc
}
"""

[NOTE: this is not necessarily a suggestion to switch to JSON for uploading; I am just using JSON as shorthand for the information we want to upload, regardless of how we choose to POST it.]

The server then:

- ensures that the required keys are there (pageset, chrome, event, machine, data, whatever else we demand)
- ensures that all of the counter data is of the same length (I think)
- (could ensure that the metrics are of appropriate data types)

If these checks fail, we send back a 400 Bad Request of some variety. However, if they (and whatever other preconditions we want) are satisfied, then we can move on to the particulars of inserting into a database. This is not necessarily a vote for some variety of NoSQL, though we could do that; in SQL too you could insert these records for a run (I can dust off my SQL chops if you want me to be more explicit here).

Anyway, this is food for thought that I hope gives people a useful framework for a discussion about what we actually want to do with Talos data. I'd be happy to spec turning this into redis or SQL (or something else, though those two I could figure out pretty easily). For mongo or couch, you can just shove the above data into a DB (or more than one, if you wanted different, ahem, "views"), though again I don't necessarily recommend this (though neither am I saying it is necessarily a bad idea). I concentrated on tp(5) because I thought it would be representative and has a lot of overlap with the other tests we care about. We will also want to upload all of the raw data from the pageloader runs.
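The precondition checks described above could be sketched like this. The `validate_upload` helper and the exact key names are illustrative (the keys follow the example payload), not part of the existing graphserver:

```python
# Sketch of the server-side precondition checks described above.
# validate_upload is a hypothetical helper; a non-empty return value
# would translate into a 400 Bad Request.

REQUIRED_KEYS = ('pageset', 'chrome', 'event', 'machine', 'data')

def validate_upload(payload):
    """Return a list of error strings; empty means the POST passes the
    preconditions and can be handed to the storage layer."""
    errors = []
    for key in REQUIRED_KEYS:
        if key not in payload:
            errors.append("missing required key: %s" % key)
    if errors:
        return errors
    # all counter series must have the same length as the main data
    expected = len(payload['data'])
    for name, series in payload.get('counters', {}).items():
        if len(series) != expected:
            errors.append("counter '%s' has %d points; expected %d"
                          % (name, len(series), expected))
    return errors

upload = {'pageset': 'tp4.manifest', 'chrome': False, 'event': 'MozAfterPaint',
          'machine': 'tegra-263.n', 'data': [1.0, 2.0, 3.0],
          'counters': {'main_rss': [10, 11, 12]}}
assert validate_upload(upload) == []
del upload['machine']
assert validate_upload(upload) == ['missing required key: machine']
```

A type check on the metric values would slot into the same function; the point is that validation is independent of whatever database sits behind it.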
It should be up to a graphserver stats package to determine the "result" number, as long as we want to calculate such things. I will mention in passing that having a single number as a pass/fail indicator metric may not be a good idea and should be reexamined. It should be possible to examine the raw numbers via the graphserver. We won't want to crunch them each time, so the application of statistics filters should be cached; likewise, we won't want to pull up the raw run numbers on each view -- only when requested.
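The "cache the filtered numbers, fetch raw data only on request" idea could look roughly like this. This is a sketch under assumed names: the cache dict, `filtered_result`, and the median filter are stand-ins for whatever stats package and storage graphserver ends up with:

```python
# Sketch: cache the result of a statistics filter per (run, filter)
# so raw numbers are only crunched once. All names are illustrative.

_filter_cache = {}

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def filtered_result(run_id, raw_values, filter_name='median'):
    filters = {'median': median}
    key = (run_id, filter_name)
    if key not in _filter_cache:                 # crunch once...
        _filter_cache[key] = filters[filter_name](raw_values)
    return _filter_cache[key]                    # ...serve cached thereafter

assert filtered_result('run-1', [5, 1, 3]) == 3
assert ('run-1', 'median') in _filter_cache
```

A real version would persist the cache next to the run data and invalidate it when the filter definition changes; the raw series would only be fetched when a view explicitly asks for it.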
graphserver should also probably be a WSGI app and be able to pull in dependencies, IMHO preferably with a setup.py file. If we are going to have a stats package consumed by graphserver but also usable for standalone analysis (see bug 721902), we will need dependencies. IIRC, the existing graphserver also has undocumented dependencies that should be handled this way.
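For reference, the WSGI shape being suggested is roughly this: a hypothetical minimal skeleton, not the actual graphserver code:

```python
# Hypothetical minimal WSGI entry point of the shape suggested above;
# any WSGI server (mod_wsgi, gunicorn, wsgiref) can host it.

def application(environ, start_response):
    """Accept test-result POSTs; reject everything else."""
    if environ.get('REQUEST_METHOD') != 'POST':
        start_response('405 Method Not Allowed',
                       [('Content-Type', 'text/plain')])
        return [b'POST test results here\n']
    # ...validate and store the upload here...
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok\n']
```

A matching setup.py would then declare the currently undocumented dependencies explicitly via `install_requires`, so both graphserver and a standalone stats consumer can install them.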
can we wontfix this?
I have mixed emotions about this. As usual, this will supposedly be fixed with datazilla. Then again, that is our solution to all things; well, that and mozharness. I can live with a wontfix, but I guess there will be no fewer than 100 times before datazilla is deployed when I have to explain to someone what data.sql is and why we're not fixing it.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Product: Webtools → Webtools Graveyard